Word embeddings are dense vector representations of words in a continuous vector space. Unlike simple one-hot encoding, these vectors capture semantic relationships between words. Each dimension in the embedding space represents a latent feature learned from the data.
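As a minimal sketch of the difference (the toy vocabulary and the randomly initialized embedding table below are made up; in practice the table is learned during training):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-word vocabulary and a 4-dimensional embedding table.
vocab = {"king": 0, "queen": 1, "man": 2, "woman": 3}
embedding_dim = 4
embeddings = rng.normal(size=(len(vocab), embedding_dim))   # one dense row per word

# One-hot encoding: a sparse vector as long as the vocabulary, with a single 1.
one_hot = np.zeros(len(vocab))
one_hot[vocab["king"]] = 1.0

# Embedding lookup: the same word as a short dense vector of latent features.
dense = embeddings[vocab["king"]]

print("one-hot:", one_hot)   # [1. 0. 0. 0.]
print("dense:  ", dense)     # four continuous values (random here, learned in practice)
```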
The famous example "king - man + woman ≈ queen" demonstrates how word embeddings capture semantic relationships: arithmetic on the word vectors mirrors analogies in the language.

From raw text to attention outputs, the processing proceeds in a few stages (a code sketch of the first three steps follows the list):

1. Start with raw text converted to numerical vectors (token IDs).
2. Words are mapped to a continuous vector space where similar words cluster together.
3. Words interact through Query, Key, and Value transformations.
4. Multiple attention heads capture different aspects of the relationships between words.
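A compact sketch of the first three steps (the toy tokenizer, the random embedding table, and the projection matrices `W_q`, `W_k`, `W_v` are placeholders rather than a trained model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: raw text to numerical token IDs (a hypothetical three-word vocabulary).
vocab = {"the": 0, "cat": 1, "sat": 2}
tokens = [vocab[w] for w in "the cat sat".split()]

# Step 2: map token IDs into a continuous embedding space.
d_model = 8
embedding_table = rng.normal(size=(len(vocab), d_model))
x = embedding_table[tokens]                       # shape (3, d_model)

# Step 3: project each embedding into Query, Key, and Value vectors.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v               # each of shape (3, d_model)

print(Q.shape, K.shape, V.shape)                  # (3, 8) (3, 8) (3, 8)
```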
Attention mechanisms allow models to focus on relevant parts of input data when making predictions. Multi-head attention splits this process into several parallel "heads," each learning different aspects of relationships in the data.
The standard scaled dot-product attention is:

\[
\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V
\]

Where:

- \(Q\), \(K\), and \(V\) are the Query, Key, and Value matrices, learned linear projections of the input embeddings,
- \(d_k\) is the key dimension, used to scale the dot products before the softmax.

Key Components:

- Query, Key, and Value projections that let each word attend to every other word,
- a softmax over the scaled dot products that turns the scores into attention weights,
- multiple parallel heads whose outputs are concatenated into the final representation.
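A minimal NumPy sketch of this formula and of the multi-head splitting described above (the function names, the random projection matrices, and the head count are illustrative choices; the final output projection used in full Transformer blocks is omitted):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # (n, n) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # (n, d_head) weighted values

def multi_head_attention(x, num_heads, rng):
    """Run several attention heads on x and concatenate their outputs.

    The per-head projections are random rather than learned, and the output
    projection found in real Transformer blocks is omitted for brevity.
    """
    n, d_model = x.shape
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        heads.append(scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v))
    return np.concatenate(heads, axis=-1)               # (n, d_model)

rng = np.random.default_rng(2)
x = rng.normal(size=(5, 16))                            # 5 tokens, d_model = 16
print(multi_head_attention(x, num_heads=4, rng=rng).shape)  # (5, 16)
```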
Time Complexity: \(O(n^2 \cdot d)\)
Space Complexity: \(O(n^2)\)
where \(n\) is the sequence length and \(d\) is the embedding dimension.
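Both bounds come from the \(n \times n\) score matrix \(QK^{T}\): computing it takes on the order of \(n^2 \cdot d\) multiplications, and storing it takes \(n^2\) entries. A quick shape check (the sizes below are arbitrary):

```python
import numpy as np

n, d = 512, 64                  # arbitrary sequence length and embedding dimension
Q = np.zeros((n, d))
K = np.zeros((n, d))

scores = Q @ K.T                # ~n*n*d multiply-adds -> O(n^2 * d) time
print(scores.shape)             # (512, 512): n^2 entries -> O(n^2) memory
```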
Word Embedding Analogy:

\(\text{king} - \text{man} + \text{woman} \approx \text{queen}\)
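A toy demonstration of this analogy (the 2-D vectors and the `closest()` helper below are invented for illustration; real embeddings are learned and have hundreds of dimensions):

```python
import numpy as np

# Hand-crafted 2-D "embeddings": dimension 0 ~ royalty, dimension 1 ~ gender.
# These values are invented for illustration, not taken from a trained model.
vectors = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-0.8, 0.1]),
}

def closest(target, exclude):
    """Return the word whose vector is most cosine-similar to `target`."""
    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

result = vectors["king"] - vectors["man"] + vectors["woman"]
print(closest(result, exclude={"king", "man", "woman"}))   # queen
```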