Rutu Mulkar
Syntax and Semantics
Probability Theory for Natural Language Processing
Much of the work in Natural Language Processing (NLP), such as the creation of Language Models, is based on probability theory. For the purposes of NLP, knowing about the probabilities of words...
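As a rough illustration of what those word probabilities look like, here is a minimal sketch (toy corpus and whitespace tokenization are made-up assumptions) that estimates unigram probabilities by relative frequency:

```python
from collections import Counter

# Toy corpus; in practice this would be a much larger collection of text.
corpus = "the cat sat on the mat the cat ate".split()

counts = Counter(corpus)
total = sum(counts.values())

# Maximum-likelihood estimate: P(w) = count(w) / total number of tokens.
unigram_prob = {word: count / total for word, count in counts.items()}

print(unigram_prob["the"])  # 3/9 ≈ 0.333
```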
The Foundations of Language Models
Language Models are models that are trained to predict the next word, given a set of words that have already been uttered or written. For example, consider the sentence: “Don’t eat that...
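A minimal sketch of that next-word idea, using a toy bigram model over made-up text (real language models are trained on far more data and far richer statistics):

```python
from collections import Counter, defaultdict

# Toy training text; a real language model would see vastly more data.
tokens = "don't eat that apple don't eat that cake".split()

# Count bigrams: how often each word follows the previous one.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    bigram_counts[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent word observed after `word`."""
    if word not in bigram_counts:
        return None
    return bigram_counts[word].most_common(1)[0][0]

print(predict_next("eat"))   # 'that'
print(predict_next("that"))  # 'apple' (first seen among equally frequent continuations)
```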
The Comprehensive Guide to Logistic Regression
In Natural Language Processing (NLP), Logistic Regression is the baseline supervised ML algorithm for classification. It also has a very close relationship with neural networks (if you are new to...
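For a sense of what that looks like in practice, here is a small sketch of text classification with scikit-learn's LogisticRegression, using a made-up sentiment dataset and bag-of-words features (all data and labels here are illustrative assumptions):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny illustrative dataset: 1 = positive sentiment, 0 = negative sentiment.
texts = ["great movie", "loved the plot", "terrible acting", "boring and slow"]
labels = [1, 1, 0, 0]

# Bag-of-words features feeding a logistic regression classifier.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

clf = LogisticRegression()
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["great plot"])))  # likely [1]
```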
What is Byte-Pair Encoding for Tokenization?
Tokenization is the process of dividing text into tokens: words (unigrams), groups of words (n-grams), or even characters. Morphology traditionally defines morphemes as the smallest semantic units, e.g....
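Here is a minimal sketch of the byte-pair-encoding merge-learning step, using a made-up character-level vocabulary (the word frequencies and the `</w>` end-of-word marker are illustrative assumptions):

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across the vocabulary, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with a single merged symbol."""
    merged = " ".join(pair)
    replacement = "".join(pair)
    return {word.replace(merged, replacement): freq for word, freq in vocab.items()}

# Words are stored as space-separated characters with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}

for _ in range(3):  # learn 3 merges
    pairs = get_pair_counts(vocab)
    best = max(pairs, key=pairs.get)
    vocab = merge_pair(best, vocab)
    print("merged:", best)
```

Each merge turns the most frequent adjacent symbol pair into a new token, which is how BPE ends up with subword units somewhere between characters and whole words.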
Managing Machine Learning Experiments
I run Machine Learning experiments for a living, averaging about 50 experiments per stage of a project. For each experiment I write code for training models,...
What is Natural Language Processing (NLP)?
Last year I wrote a highly popular blog post about Natural Language Processing, Machine Learning, and Deep Learning.
Natural Language Processing vs. Machine Learning vs. Deep Learning
NLP, Machine Learning and Deep Learning are all parts of Artificial Intelligence, which is a part of the greater field of Computer Science. The following image visually illustrates CS, AI...
Online Word2Vec for Gensim
Word2Vec [1] is a technique for creating vectors of word representations to capture the syntax and semantics of words. The vectors used to represent the words have several interesting features,...
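A small usage sketch, assuming gensim 4.x and a made-up toy corpus, showing both an initial training pass and an online vocabulary update:

```python
from gensim.models import Word2Vec

# Toy sentences; real training data would be much larger.
sentences = [["the", "cat", "sat"], ["the", "dog", "barked"]]

# Initial training (gensim 4.x API; older versions use `size` instead of `vector_size`).
model = Word2Vec(sentences=sentences, vector_size=50, window=2, min_count=1)

# Online update: grow the vocabulary and continue training on new text.
new_sentences = [["the", "parrot", "talked"]]
model.build_vocab(new_sentences, update=True)
model.train(new_sentences, total_examples=len(new_sentences), epochs=model.epochs)

print(model.wv.most_similar("cat", topn=3))
```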
Understanding your Data - Basic Statistics
Have you ever had to deal with a lot of data and not known where to start? If so, then this post is for you. In this post I will...
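As a small taste, here is a sketch of the kind of summary numbers involved, computed on made-up data with the standard library:

```python
import statistics

# Made-up sample; note the single large value pulling the mean above the median.
data = [12, 15, 11, 19, 14, 13, 42, 15]

print("mean:   ", statistics.mean(data))    # 17.625
print("median: ", statistics.median(data))  # 14.5
print("stdev:  ", statistics.stdev(data))
print("min/max:", min(data), max(data))
```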
An Introduction to Probability
This post is an introduction to probability theory. Probability theory is the backbone of AI, and this post attempts to cover its fundamentals and bring us to Naive Bayes,...
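A tiny worked example of Bayes' rule, with made-up spam-filter numbers, to show the kind of computation this builds toward:

```python
# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
# Toy spam-filter numbers (invented purely for illustration):
p_spam = 0.2             # P(spam)
p_word_given_spam = 0.5  # P("free" appears | spam)
p_word_given_ham = 0.05  # P("free" appears | not spam)

# Total probability of seeing the word at all (law of total probability).
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)  # 0.1 / 0.14 ≈ 0.714
```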
Build your own search Engine
In this post, I will take you through the steps for calculating the $tf \times idf$ values for all the words in a given document. To implement this, we use...
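Here is a minimal sketch of the computation over a made-up three-document corpus, using one common variant of the definition ($idf = \log(N/df)$; the post may use a different library or smoothing):

```python
import math
from collections import Counter

# Toy corpus of three "documents".
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]

def tf_idf(doc, docs):
    """Return tf(w) * idf(w) for each word in `doc`, with idf = log(N / df)."""
    n_docs = len(docs)
    tf = Counter(doc)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for d in docs if word in d)  # number of documents containing the word
        scores[word] = (count / len(doc)) * math.log(n_docs / df)
    return scores

print(tf_idf(docs[0], docs))
```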
The Math behind Lucene
Lucene is an open-source search engine that you can use on top of your own data to create a custom search engine - like your own personal Google. In this...
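For context, Lucene's classic TFIDFSimilarity documents a practical scoring function roughly along these lines (a paraphrase from memory, not a quotation; newer Lucene versions default to BM25 instead):

$$\text{score}(q,d) = \text{coord}(q,d) \cdot \text{queryNorm}(q) \cdot \sum_{t \in q} \text{tf}(t,d) \cdot \text{idf}(t)^2 \cdot \text{boost}(t) \cdot \text{norm}(t,d)$$

where $\text{tf}(t,d) = \sqrt{\text{freq}(t,d)}$ and $\text{idf}(t) = 1 + \ln\frac{N}{\text{df}(t)+1}$, i.e. the same $tf \times idf$ intuition as above, with extra factors for query and field normalization.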