What we do not know about LLMs
Although Deep Learning in general and Transformers in particular have made tremendous strides in applications and downstream tasks (such as Question Answering, Text Summarization, object detection, etc.), there are still many gaps in our understanding of Deep Learning, and of LLMs in particular. For instance, Generative Large...
From Machine Learning to Large Language Models - A Survey
Starting in the early 2000s, improvements in hardware to support deep learning networks led to a leap in modern deep learning approaches. Deep Learning (Hinton et al. 2006; Bengio et al. 2007), an extension of neural networks, uses models that contain an input layer, an output layer, and a large number of hidden layers between the input and output. This ty...
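As a rough sketch of that layered structure (illustrative code of mine, with arbitrary layer sizes, not code from the post), a deep feed-forward network is just a stack of affine maps with nonlinearities between the input and the output:

```python
# A minimal deep feed-forward network: an input layer, two hidden
# layers, and an output layer, using NumPy only. Sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

sizes = [4, 8, 8, 2]  # 4 inputs, two hidden layers of 8, 2 outputs
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    # Each hidden layer applies an affine map followed by a nonlinearity;
    # the final layer is left linear (raw logits).
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ W + b)
    return x @ weights[-1] + biases[-1]

print(forward(rng.normal(size=4)))
```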
Probability Theory for Natural Language Processing
A lot of work in Natural Language Processing (NLP), such as the creation of Language Models, is based on probability theory. For the purposes of NLP, knowing the probabilities of words can help us predict the next word, understand the rarity of words, and know when to ignore common words in a given context - e.g. articles ...
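For instance (an illustrative sketch of mine, not code from the post), word probabilities can be estimated by relative frequency over a corpus, which immediately shows why common words like articles dominate the counts:

```python
# Estimate P(w) = count(w) / N by relative frequency on a toy corpus.
from collections import Counter

corpus = "the cat sat on the mat and the dog sat on the rug".split()
counts = Counter(corpus)
total = sum(counts.values())

# Frequent words like the article "the" dominate, which is why such
# words are often down-weighted or ignored in some contexts.
for word, c in counts.most_common(3):
    print(f"P({word}) = {c}/{total} = {c / total:.3f}")
```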
The Foundations of Language Models
Language Models are models that are trained to predict the next word, given a set of words that have already been uttered or written. e.g., consider the sentence: “Don’t eat that because it looks…”
The next word following this will most likely be “disgusting” or “bad”, but will probably not be “table” or “chair”. Language Models are models that assi...
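As a concrete toy version of this idea (a sketch of mine using a simple bigram model, not anything from the post), we can estimate P(next word | previous word) by counting pairs:

```python
# A bigram model assigns higher probability to likely continuations,
# as in the "it looks ..." example above.
from collections import Counter, defaultdict

corpus = ("it looks disgusting . it looks bad . it looks fine . "
          "the table looks bad .").split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def p_next(prev, word):
    # P(word | prev) estimated by relative frequency.
    total = sum(bigrams[prev].values())
    return bigrams[prev][word] / total if total else 0.0

print(p_next("looks", "bad"))    # plausible continuation -> 0.5
print(p_next("looks", "table"))  # implausible continuation -> 0.0
```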
The Comprehensive Guide to Logistic Regression
In Natural Language Processing (NLP), Logistic Regression is the baseline supervised ML algorithm for classification. It also has a very close relationship with neural networks (if you are new to neural networks, start with Logistic Regression to understand the basics).
Introduction
Logistic Regression is a discriminative classifier.
Discrim...
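To make the neural-network connection concrete, here is a minimal sketch (illustrative code of mine, not from the post): logistic regression is a single linear layer followed by a sigmoid, trained by gradient descent on the cross-entropy loss - in other words, a one-neuron neural network.

```python
# Logistic regression from scratch on toy binary data.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: the class depends on the sum of the two features.
X = rng.normal(size=(200, 2))
y = (X.sum(axis=1) > 0).astype(float)

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)           # predicted P(y=1 | x)
    grad_w = X.T @ (p - y) / len(y)  # gradient of cross-entropy loss
    grad_b = (p - y).mean()
    w -= lr * grad_w
    b -= lr * grad_b

print("accuracy:", ((sigmoid(X @ w + b) > 0.5) == y).mean())
```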
What is Byte-Pair Encoding for Tokenization?
Tokenization is the process of dividing text into tokens - words (unigrams), groups of words (n-grams), or even characters.
Morphology traditionally defines morphemes as the smallest semantic units. e.g., the word Unfortunately can be broken down as un - fortun - ate - ly:
\([[\text{un } [[\text{fortun(e)}]_{\text{ROOT}} \text{ ate}]_{\text{STEM}}]_{\text{STEM}} \text{ ly}]_{\text{WORD}}\)...
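To give a flavor of the algorithm, here is a compact sketch of the core BPE loop (my own illustration on a toy vocabulary, not the post's code): count adjacent symbol pairs and repeatedly merge the most frequent one.

```python
# Byte-pair encoding: iteratively merge the most frequent adjacent pair.
from collections import Counter

# Words as tuples of symbols, with corpus frequencies.
vocab = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2,
         ("n", "e", "w", "e", "s", "t"): 6, ("w", "i", "d", "e", "s", "t"): 3}

def most_frequent_pair(vocab):
    pairs = Counter()
    for word, freq in vocab.items():
        for pair in zip(word, word[1:]):
            pairs[pair] += freq
    return pairs.most_common(1)[0][0] if pairs else None

def merge(vocab, pair):
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1]); i += 2
            else:
                out.append(word[i]); i += 1
        merged[tuple(out)] = freq
    return merged

for _ in range(3):  # a few merges; real tokenizers learn thousands
    pair = most_frequent_pair(vocab)
    print("merging", pair)
    vocab = merge(vocab, pair)
```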
Managing Machine Learning Experiments
I run Machine Learning experiments for a living, averaging about 50 experiments per stage of a project. For each experiment I write code for training models, identifying the right test cases and metrics, finding the right preprocessors - the list goes on.
So how do I manage all these experiments? Here are a few of my criteria:
Compat...
What is Natural Language Processing (NLP)?
Last year I wrote a highly popular blog post about Natural Language Processing, Machine Learning, and Deep Learning.
In this post, we will break down NLP further and talk about Rule-Based and Statistical NLP. I will discuss why everyone needs to know about NLP and AI (Artificial Intelligence), how Machine Learning (ML) fits into the NLP space (...