NLP Bytes
Vector Semantics and Embeddings
General Relevant Concepts:
- Distributional hypothesis
- Vector Semantics
- Contextualized Word Representations
- Self-supervised Learning
- Lexical Semantics
Lexical Semantics
- Lemma
- Word Forms
- Word Sense
- Synonymy
- Propositional Meaning
- Word Similarity
- Relatedness
- Semantic Field
- Topic Models
- Hypernymy
- Antonymy
- Meronymy (see the WordNet sketch after this list)
- Semantic Frames and Roles
- Connotations
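A small sketch of exploring several of these sense relations (word senses, synonymous lemmas, hypernymy, meronymy) through NLTK's WordNet interface. It assumes nltk is installed; the word "tree" is just an example:

```python
import nltk
nltk.download("wordnet", quiet=True)          # one-time download of the WordNet data
from nltk.corpus import wordnet as wn

# Word senses (synsets) for the word form "tree"
for s in wn.synsets("tree")[:3]:
    print(s.name(), "-", s.definition())

tree = wn.synset("tree.n.01")
print([l.name() for l in tree.lemmas()])      # synonymous lemmas grouped in this synset
print(tree.hypernyms())                       # more general concepts (hypernymy)
print(tree.part_meronyms())                   # parts of a tree (meronymy)
```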
Vectors
- Vector semantics
- Co-occurrence Matrix
- Term-Document Matrix
- Vector Space
- Vector for a document
- Information Retrieval
- Term-Term Matrix (word-word matrix)
- One Hot Encoding
- TF*IDF
- PMI (Pointwise Mutual Information)
- PPMI (Positive Pointwise Mutual Information)
- Laplace Smoothing (add-k smoothing of the counts before PPMI; see the sketch after this list)
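A minimal NumPy sketch of the weighting schemes above, using a made-up toy count matrix: TF*IDF reweights a term-document matrix, while PPMI (optionally with add-k/Laplace smoothing) reweights co-occurrence counts.

```python
import numpy as np

# Toy term-document count matrix (made-up counts): rows = terms, columns = documents.
terms = ["battle", "good", "fool", "wit"]
C = np.array([
    [1.,   0.,  7., 13.],
    [114., 80., 62., 89.],
    [36.,  58., 1.,  4.],
    [20.,  15., 2.,  3.],
])

# --- TF*IDF: log-scaled term frequency times inverse document frequency ---
tf = np.log10(C + 1.0)
df = np.count_nonzero(C, axis=1)              # number of documents each term appears in
idf = np.log10(C.shape[1] / df)
tfidf = tf * idf[:, None]

# --- PPMI, with optional add-k (Laplace) smoothing of the counts ---
def ppmi(counts, k=0.0):
    counts = counts + k
    total = counts.sum()
    p_ij = counts / total
    p_i = counts.sum(axis=1, keepdims=True) / total   # marginal over rows (words)
    p_j = counts.sum(axis=0, keepdims=True) / total   # marginal over columns (contexts)
    with np.errstate(divide="ignore"):
        pmi = np.log2(p_ij / (p_i * p_j))
    return np.maximum(pmi, 0.0)

# In practice PPMI is applied to a word-word co-occurrence matrix;
# C is reused here only to show the call.
print(np.round(tfidf, 3))
print(np.round(ppmi(C, k=1.0), 3))
```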
Cosine Similarity
- What is Cosine similarity
- Issues with raw dot product
- Normalized Dot Product (see the sketch after this list)
- Unit Vector
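A quick illustration of the issue with the raw dot product (it grows with vector length, so frequent words look similar to everything) and the normalized fix; the vectors are arbitrary examples:

```python
import numpy as np

def cosine(a, b):
    """Normalized dot product: dot(a, b) / (|a| |b|)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

v = np.array([1.0, 2.0, 3.0])
w = np.array([2.0, 4.0, 6.0])       # same direction as v, but twice as long

print(np.dot(v, w))                 # raw dot product rewards the longer vector: 28.0
print(cosine(v, w))                 # cosine ignores length, sees identical direction: 1.0

u_v = v / np.linalg.norm(v)         # unit vectors
u_w = w / np.linalg.norm(w)
print(np.dot(u_v, u_w))             # for unit vectors the dot product equals the cosine: 1.0
```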
Embeddings
- Sparse vs Dense vectors
- Word2Vec and Logistic Regression
- Word2Vec (Skip-Gram with Negative Sampling, SGNS; see the sketch after this list)
- Word2Vec (Continuous Bag of Words, CBOW)
- Semantic Properties of Embeddings
- Relationships:
- First-Order Co-occurrence - Syntagmatic Association
- Second-Order Co-occurrence - Paradigmatic Association
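A rough sketch of a single SGNS update, treating it as binary logistic regression over (target, context) pairs; the dimensions, learning rate, and word ids below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

vocab_size, dim, lr = 1000, 50, 0.025           # made-up sizes and learning rate
W = rng.normal(0, 0.01, (vocab_size, dim))      # target-word embeddings
W_ctx = rng.normal(0, 0.01, (vocab_size, dim))  # context embeddings

def sgns_step(target, context, negatives):
    """One SGNS update: pull the true (target, context) pair together and
    push the sampled negative pairs apart, using logistic-regression gradients."""
    w = W[target]
    grad_w = np.zeros(dim)
    for c, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        err = sigmoid(np.dot(w, W_ctx[c])) - label   # gradient of the log loss
        grad_w += err * W_ctx[c]
        W_ctx[c] -= lr * err * w
    W[target] = w - lr * grad_w

# Hypothetical word ids: one positive context and three negative samples.
sgns_step(target=7, context=42, negatives=[513, 88, 964])
```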
Relevant Concepts
- GloVe Embeddings - method based on ratios of word co-occurrence probabilities
- fastText - computing word embeddings from a bag of character n-grams (subwords)
- LSI (Latent Semantic Indexing)
- LDA (Latent Dirichlet Allocation)
- LSA (Latent Semantic Analysis)
- SVD (Singular Value Decomposition) - used to obtain the dense LSA/LSI vectors (see the sketch after this list)
- PLSI (Probabilistic Latent Semantic Indexing)
- NMF (Non-Negative Matrix Factorization)
- Contextual Embeddings - ELMo, BERT. The representation of a word is contextual: a function of the entire sentence
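A minimal sketch of LSA-style dense vectors via truncated SVD of a made-up term-document matrix; in practice the matrix would be much larger and typically TF*IDF-weighted:

```python
import numpy as np

# Made-up term-document count matrix: rows = terms, columns = documents.
X = np.array([
    [2., 0., 1., 0.],
    [0., 3., 0., 1.],
    [1., 1., 0., 2.],
    [0., 0., 4., 1.],
    [3., 1., 0., 0.],
])

k = 2                                     # number of latent dimensions to keep
U, S, Vt = np.linalg.svd(X, full_matrices=False)
word_vectors = U[:, :k] * S[:k]           # dense k-dimensional term vectors
doc_vectors = Vt[:k, :].T * S[:k]         # dense k-dimensional document vectors

print(word_vectors.shape, doc_vectors.shape)   # (5, 2) (4, 2)
```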