NLP Bytes

Vector Semantics and Embeddings

General Relevant Concepts

  • Distributional hypothesis
  • Vector Semantics
  • Contextualized Word Representations
  • Self-supervised Learning
  • Lexical Semantics

Lexical Semantics

  • Lemma
  • Word Forms
  • Word Sense
  • Synonymy
  • Propositional Meaning
  • Word Similarity
  • Relatedness
  • Semantic Field
  • Topic Models
  • Hypernymy
  • Antonymy
  • Meronymy
  • Semantic Frames and Roles
  • Connotations

Vectors

  • Vector semantics
  • Co-occurrence Matrix
  • Term-Document Matrix
  • Vector Space
  • Vector for a document
  • Information Retrieval
  • Term-Term Matrix
  • One Hot Encoding
  • TF-IDF
  • PMI (Pointwise Mutual Information)
  • PPMI (Positive Pointwise Mutual Information)
  • Laplace Smoothing (add-k smoothing of co-occurrence counts before computing PMI; see the sketch after this list)
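
A minimal numpy sketch of the two weighting schemes above, on invented toy counts (the matrices and values are illustrative assumptions, not from a real corpus): TF-IDF weights a term-document matrix, while PPMI reweights a term-term co-occurrence matrix, optionally after Laplace (add-k) smoothing of the counts.

    import numpy as np

    # Toy term-document count matrix: rows = terms, columns = documents.
    counts = np.array([
        [10., 0., 3.],
        [ 0., 8., 1.],
        [ 7., 2., 0.],
    ])

    # TF-IDF: tf = log10(1 + count), idf = log10(N / df).
    tf = np.log10(1 + counts)
    df = (counts > 0).sum(axis=1)           # documents containing each term
    idf = np.log10(counts.shape[1] / df)
    tfidf = tf * idf[:, None]

    # PPMI over a term-term co-occurrence matrix C.
    def ppmi(C, k=0.0):
        C = C + k                           # Laplace (add-k) smoothing
        p_wc = C / C.sum()                  # joint probability estimates
        p_w = p_wc.sum(axis=1, keepdims=True)   # word marginals
        p_c = p_wc.sum(axis=0, keepdims=True)   # context marginals
        with np.errstate(divide="ignore"):
            pmi = np.log2(p_wc / (p_w * p_c))
        return np.maximum(pmi, 0)           # clip negative PMI to zero

    cooc = np.array([[0., 4., 1.],
                     [4., 0., 6.],
                     [1., 6., 0.]])         # symmetric toy co-occurrence counts
    print(tfidf)
    print(ppmi(cooc, k=1.0))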

Cosine Similarity

  • What is cosine similarity?
  • Issues with the raw dot product (it grows with vector length, so more frequent words get inflated scores)
  • Normalized Dot Product (see the sketch after this list)
  • Unit Vector
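
A short sketch of the points above, with invented vectors: the raw dot product grows with vector length, so a vector that is merely longer (e.g. a more frequent word) scores higher; dividing by both norms, i.e. taking the dot product of the two unit vectors, leaves only the angle.

    import numpy as np

    def cosine(u, v):
        # Normalized dot product: dot(u, v) / (|u| * |v|),
        # equivalently the dot product of the unit vectors u/|u| and v/|v|.
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    a = np.array([1.0, 2.0, 3.0])
    b = 10 * a                            # same direction, ten times the length

    print(np.dot(a, a), np.dot(a, b))     # 14.0 vs 140.0: raw dot product rewards length
    print(cosine(a, b))                   # ~1.0: cosine keeps only the angle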

Embeddings

  • Sparse vs Dense vectors
  • Word2Vec and Logistic Regression
  • Word2Vec (Skip-Gram with Negative Sampling, SGNS; see the training sketch after this list)
  • Word2Vec (Continuous Bag of Words, CBOW)
  • Semantic Properties of Embeddings (e.g. analogy via vector offsets: king - man + woman ≈ queen)
  • Relationships:
    • First-Order Co-occurrence - Syntagmatic Association
    • Second-Order Co-occurrence - Paradigmatic Association
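
A from-scratch sketch that makes the "Word2Vec and Logistic Regression" connection concrete (the corpus, hyperparameters, and uniform negative sampling are simplifying assumptions; real implementations sample negatives from a smoothed unigram distribution): SGNS treats each (target, context) pair as a positive example for a binary logistic classifier, and k randomly drawn words as negatives.

    import numpy as np

    rng = np.random.default_rng(0)

    corpus = "the cat sat on the mat the dog sat on the rug".split()
    vocab = sorted(set(corpus))
    w2i = {w: i for i, w in enumerate(vocab)}
    ids = [w2i[w] for w in corpus]

    V, D = len(vocab), 8                  # vocabulary size, embedding dimension
    W = rng.normal(0, 0.1, (V, D))        # target-word embeddings
    C = rng.normal(0, 0.1, (V, D))        # context-word embeddings

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    lr, window, k = 0.05, 2, 3            # learning rate, window size, negatives per pair
    for epoch in range(200):
        for pos, t in enumerate(ids):
            contexts = ids[max(0, pos - window):pos] + ids[pos + 1:pos + 1 + window]
            for c in contexts:
                # Positive pair: push sigmoid(W[t] . C[c]) toward 1.
                g = sigmoid(W[t] @ C[c]) - 1.0
                dW, dC = g * C[c], g * W[t]
                W[t] -= lr * dW
                C[c] -= lr * dC
                # Negative samples: push sigmoid(W[t] . C[n]) toward 0.
                for n in rng.integers(0, V, size=k):
                    g = sigmoid(W[t] @ C[n])
                    dW, dC = g * C[n], g * W[t]
                    W[t] -= lr * dW
                    C[n] -= lr * dC

    # Nearest neighbors by cosine similarity in the learned target space.
    def most_similar(word, topn=3):
        v = W[w2i[word]]
        sims = (W @ v) / (np.linalg.norm(W, axis=1) * np.linalg.norm(v))
        return [vocab[i] for i in np.argsort(-sims)[:topn]]

    print(most_similar("cat"))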

Other Relevant Concepts

  • GloVe Embeddings - a method based on ratios of word co-occurrence probabilities
  • fastText - computes word embeddings from a bag of character n-grams (subwords)
  • LSI (Latent Semantic Indexing)
  • LDA (Latent Dirichlet Allocation)
  • LSA (Latent Semantic Analysis; see the SVD sketch after this list)
  • SVD (Singular Value Decomposition)
  • PLSI (Probabilistic Latent Semantic Indexing)
  • NMF (Non-Negative Matrix Factorization)
  • Contextual Embeddings - ELMo, BERT. The representation of a word is contextual: a function of the entire sentence.
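
A minimal LSA sketch (the term-document matrix is invented for illustration): truncated SVD factors the matrix into low-rank term and document vectors, which is the core computation behind LSA/LSI.

    import numpy as np

    # Toy term-document matrix: rows = terms, columns = documents.
    X = np.array([
        [2., 0., 1., 0.],
        [0., 3., 0., 1.],
        [1., 0., 2., 0.],
        [0., 1., 0., 2.],
    ])

    # Truncated SVD: X is approximated by U_k S_k V_k^T with k latent dimensions.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    k = 2
    term_vecs = U[:, :k] * S[:k]      # dense k-dimensional term vectors
    doc_vecs = Vt[:k, :].T * S[:k]    # dense k-dimensional document vectors

    print(term_vecs.round(2))
    print(doc_vecs.round(2))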