Tag lucene

Approximate time to read: 6 min
Build your own search Engine

In this post, I will take you through the steps for calculating the $tf \times idf$ values for all the words in a given document. To implement this, we use...
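
As a quick refresher (and not necessarily the exact weighting derived in the post, since Lucene's own similarity adds its own smoothing and normalization), the standard weight of a term $t$ in a document $d$ is

$$tf\text{-}idf(t, d) = tf(t, d) \times \log \frac{N}{df(t)}$$

where $tf(t, d)$ is how often $t$ occurs in $d$, $df(t)$ is the number of documents containing $t$, and $N$ is the total number of documents in the collection.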

Approximate time to read: 6 min
The Math behind Lucene

Lucene is an open-source search engine that you can use on top of custom data to create your own search engine - like your own personal Google. In this...

Tag information_retrieval

Approximate time to read: 6 min
Build your own search Engine

In this post, I will take you through the steps for calculating the $tf \times idf$ values for all the words in a given document. To implement this, we use...

Approximate time to read: 6 min
The Math behind Lucene

Lucene is an open-source search engine that you can use on top of custom data to create your own search engine - like your own personal Google. In this...

Tag probability

Approximate time to read: 11 min
An Introduction to Probability

This post is an introduction to probability theory. Probability theory is the backbone of AI, and this post attempts to cover the fundamentals and bring us to Naive Bayes,...
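
For reference, the identity that Naive Bayes builds on (the post derives this in its own notation) is Bayes' rule:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$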

Tag introduction

Approximate time to read: 7 min
What is Natural Language Processing (NLP)?

Last year I wrote a highly popular blog post about Natural Language Processing, Machine Learning, and Deep Learning.

Approximate time to read: 11 min
An Introduction to Probability

This post is an introduction to probability theory. Probability theory is the backbone of AI, and this post attempts to cover the fundamentals and bring us to Naive Bayes,...

Tag statistics

Approximate time to read: 10 min
Understanding your Data - Basic Statistics

Have you ever had to deal with a lot of data and not known where to start? If yes, then this post is for you. In this post I will...

Tag word2vec

Approximate time to read: 14 min
Online Word2Vec for Gensim

Word2Vec [1] is a technique for creating vectors of word representations to capture the syntax and semantics of words. The vectors used to represent the words have several interesting features,...
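
As a taste of what the post covers, here is a minimal sketch of online (incremental) training with gensim. The parameter names follow current gensim releases (4.x) and the tiny corpus is purely illustrative, so the post's own code may differ:

```python
from gensim.models import Word2Vec

# Initial corpus: a list of tokenized sentences (illustrative only).
sentences = [
    ["natural", "language", "processing", "is", "fun"],
    ["word", "vectors", "capture", "syntax", "and", "semantics"],
]

# Train an initial model; hyperparameter values here are placeholders.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, epochs=10)

# Later, new text arrives: update the vocabulary and continue training
# instead of retraining from scratch.
new_sentences = [["gensim", "supports", "online", "vocabulary", "updates"]]
model.build_vocab(new_sentences, update=True)
model.train(new_sentences, total_examples=len(new_sentences), epochs=model.epochs)

# Query the resulting vectors.
vector = model.wv["language"]
similar = model.wv.most_similar("language", topn=3)
```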

Tag nlp

Approximate time to read: 9 min
Probability Theory for Natural Language Processing

A lot of work in Natural Language Processing (NLP), such as the creation of Language Models, is based on probability theory. For the purpose of NLP, knowing about probabilities of words...

Approximate time to read: 16 min
The Comprehensive Guide to Logistic Regression

In Natural Language Processing (NLP), Logistic Regression is the baseline supervised ML algorithm for classification. It also has a very close relationship with neural networks (if you are new to...
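
In case the formula helps, a binary logistic regression classifier scores a feature vector $x$ as (standard notation, not necessarily the post's)

$$P(y = 1 \mid x) = \sigma(w \cdot x + b) = \frac{1}{1 + e^{-(w \cdot x + b)}}$$

and predicts class 1 when this probability exceeds 0.5.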

Approximate time to read: 7 min
What is Natural Language Processing (NLP)?

Last year I wrote a highly popular blog post about Natural Language Processing, Machine Learning, and Deep Learning.

Approximate time to read: 4 min
Natural Language Processing vs. Machine Learning vs. Deep Learning

NLP, Machine Learning and Deep Learning are all parts of Artificial Intelligence, which is a part of the greater field of Computer Science. The following image visually illustrates CS, AI...

Tag machine_learning

Approximate time to read: 3 min
Managing Machine Learning Experiments

I run Machine Learning experiments for a living, averaging 50 experiments per stage of a project. For each experiment I write code for training models,...

Approximate time to read: 4 min
Natural Language Processing vs. Machine Learning vs. Deep Learning

NLP, Machine Learning and Deep Learning are all parts of Artificial Intelligence, which is a part of the greater field of Computer Science. The following image visually illustrates CS, AI...

Tag deep_learning

Approximate time to read: 4 min
Natural Language Processing vs. Machine Learning vs. Deep Learning

NLP, Machine Learning and Deep Learning are all parts of Artificial Intelligence, which is a part of the greater field of Computer Science. The following image visually illustrates CS, AI...

Tag tokenization

Approximate time to read: 3 min
What is Byte-Pair Encoding for Tokenization?

Tokenization is the concept of dividing text into tokens - words (unigrams), groups of words (n-grams), or even characters. Morphology traditionally defines morphemes as the smallest semantic units, e.g....
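
To make the idea concrete, here is a toy sketch of the BPE merge loop in the spirit of Sennrich et al.'s reference implementation; the vocabulary and number of merges are illustrative, and this is not the code from the post:

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across the (word -> frequency) vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair (as separate symbols) with the merged symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Each word is written as space-separated symbols with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}

for _ in range(10):                   # the number of merges is a hyperparameter
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)  # most frequent adjacent pair, e.g. ('e', 's')
    vocab = merge_pair(best, vocab)
    print(best)
```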

Tag classification

Approximate time to read: 9 min
Probability Theory for Natural Language Processing

A lot of work in Natural Language Processing (NLP), such as the creation of Language Models, is based on probability theory. For the purpose of NLP, knowing about probabilities of words...

Approximate time to read: 16 min
The Comprehensive Guide to Logistic Regression

In Natural Language Processing (NLP), Logistic Regression is the baseline supervised ML algorithm for classification. It also has a very close relationship with neural networks (if you are new to...

Tag language_models

Approximate time to read: 12 min
The Foundations of Language Models

Language Models are models trained to predict the next word, given a set of words that have already been uttered or written, e.g. consider the sentence: “Don’t eat that...
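
One common way to formalise this (the notation in the post may differ) is through the chain rule of probability, together with an n-gram approximation such as the bigram one:

$$P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1}) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})$$

A language model estimates these conditional probabilities, so "predicting the next word" means picking the $w_i$ that maximises $P(w_i \mid w_1, \dots, w_{i-1})$.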