Content is user-generated and unverified.

Inside the geometry of meaning: How neural embeddings actually work

The most important finding across embedding research is that neural networks encode far more concepts than they have dimensions through superposition—a 768-dimensional space can represent tens of thousands of interpretable features by using non-orthogonal directions, with approximately 90-98% of nominal dimensions being redundant for capturing core linguistic information. Individual dimensions rarely have cleanly interpretable roles; instead, meaning emerges from linear directions that combine many dimensions, and sparse autoencoders have emerged as the breakthrough technique for extracting these hidden features.

This understanding has profound implications: the intrinsic dimensionality of language appears to be 10-100 dimensions depending on the representation type, vastly lower than the 768-4096 dimensions typically used. The extra dimensions provide computational benefits—orthogonal subspaces for interference reduction and capacity for rare features—rather than semantic necessity. Practical applications can often compress embeddings to 10-25% of original dimensionality while retaining over 95% of task performance.

Information lives in directions, not dimensions

Research consistently demonstrates that individual neurons and dimensions in large language models are polysemantic, meaning they respond to multiple unrelated concepts. Anthropic's interpretability work documented a single neuron in a small language model that responded to academic citations, English dialogue, HTTP requests, and Korean text. This polysemanticity is not a bug but a feature: superposition allows networks to represent far more concepts than they have dimensions.

The mathematical mechanism is elegant. Features are encoded as linear directions (combinations of many dimensions) rather than individual coordinates, forming an overcomplete basis for the activation space. When features are sparse—meaning only a few activate simultaneously—networks can pack many features into fewer dimensions with minimal interference, mathematically related to compressed sensing theory. Anthropic's toy model research revealed distinct phases: dense features force orthogonal representation, but sparse features enable dramatically increased superposition, with important features getting dedicated dimensions while less important ones share capacity.
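The sparsity argument above can be checked numerically. The following is a minimal sketch, not a trained model: it assumes random unit-norm feature directions (the function names, feature counts, and dimension sizes are all illustrative choices) and measures how much interference a simple dot-product readout suffers when few versus many features are active at once.

```python
import numpy as np

def random_feature_directions(n_features, n_dims, seed=0):
    """Assumption: features are random unit vectors, not learned directions."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_features, n_dims))
    return W / np.linalg.norm(W, axis=1, keepdims=True)

def readout_error(W, n_active, n_trials=200, seed=1):
    """Mean absolute error when each feature is read off with a dot product."""
    rng = np.random.default_rng(seed)
    n_features = W.shape[0]
    errors = []
    for _ in range(n_trials):
        f = np.zeros(n_features)
        active = rng.choice(n_features, size=n_active, replace=False)
        f[active] = 1.0
        x = f @ W        # superposed activation vector, shape (n_dims,)
        f_hat = W @ x    # linear readout of every feature, shape (n_features,)
        errors.append(np.abs(f_hat - f).mean())
    return float(np.mean(errors))

# 2000 features packed into 256 dimensions (roughly 8 features per dimension).
W = random_feature_directions(n_features=2000, n_dims=256)
sparse_err = readout_error(W, n_active=5)    # few features active at once
dense_err = readout_error(W, n_active=500)   # many features active at once
print(f"sparse readout error: {sparse_err:.3f}, dense readout error: {dense_err:.3f}")
```

With only a handful of active features the readout error stays small, while dense activation swamps the signal with interference. Trained networks can do better than random directions, for example by adding a nonlinearity that filters small interference terms, so this sketch only illustrates the qualitative effect.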

Sparse autoencoders (SAEs) represent the current breakthrough for disentangling superposition. Training an SAE on a 512-neuron MLP layer with expansion factors up to 256× (131,072 features) still discovers new interpretable features, demonstrating these layers encode tens of thousands of concepts despite having only 512 neurons. Crucially, SAE-extracted features are causally meaningful: Anthropic demonstrated that amplifying a "Golden Gate Bridge" feature caused Claude to obsessively discuss that topic, which is strong evidence that features are not merely correlational artifacts.
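To make the SAE idea concrete, here is a minimal forward-pass sketch under stated assumptions: a ReLU encoder, a decoder with unit-norm feature directions, and a loss of reconstruction error plus an L1 sparsity penalty. The class name, the expansion factor of 8, the `l1_coeff` value, and the `steer` helper are illustrative choices, not the configuration used in the work described above, and no training loop is shown.

```python
import numpy as np

class SparseAutoencoder:
    """Untrained sketch: weights are random, only shapes and loss are shown."""

    def __init__(self, n_inputs, expansion, seed=0):
        rng = np.random.default_rng(seed)
        n_features = n_inputs * expansion
        self.W_enc = rng.standard_normal((n_inputs, n_features)) / np.sqrt(n_inputs)
        self.b_enc = np.zeros(n_features)
        W_dec = rng.standard_normal((n_features, n_inputs))
        # Each decoder row is one feature direction, kept at unit norm.
        self.W_dec = W_dec / np.linalg.norm(W_dec, axis=1, keepdims=True)
        self.b_dec = np.zeros(n_inputs)

    def encode(self, x):
        # Non-negative feature activations; most should be zero after training.
        return np.maximum(0.0, (x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f):
        return f @ self.W_dec + self.b_dec

    def loss(self, x, l1_coeff=1e-3):
        f = self.encode(x)
        x_hat = self.decode(f)
        reconstruction = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
        sparsity = np.mean(np.sum(np.abs(f), axis=-1))
        return reconstruction + l1_coeff * sparsity, f

    def steer(self, x, feature_index, scale):
        # Amplify one feature by adding its decoder direction to the activation.
        return x + scale * self.W_dec[feature_index]

sae = SparseAutoencoder(n_inputs=512, expansion=8)   # 4096 candidate features
x = np.random.default_rng(1).standard_normal((4, 512))
total, features = sae.loss(x)
steered = sae.steer(x, feature_index=123, scale=5.0)
```

The `steer` method mirrors the intervention idea in spirit only: adding a scaled decoder direction to the activations is one simple way to amplify a feature, and real experiments may clamp feature activations instead.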

The intrinsic dimensionality of language is surprisingly low

Rigorous measurement reveals that linguistic representations occupy manifolds with far lower dimensionality than their nominal embedding size. Using k-nearest neighbor methods, researchers found Word2Vec and GloVe embeddings (nominally 300 dimensions) have intrinsic dimensionality of approximately 24-25 dimensions—representing 92% redundancy. FastText shows even more compression at 13 dimensions intrinsic (96% redundancy).

| Representation Type | Nominal Dimension | Intrinsic Dimension | Redundancy |
|---|---|---|---|
| Word2Vec/GloVe | 300 | 24-25 | ~92% |
| FastText | 300 | ~13 | ~96% |
| Sentence embeddings | 768-1024 | 10-50 | 93-99% |
| LLM token embeddings | 128-5120 | 25-120 | 72-98% |
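As an illustration of how a k-nearest-neighbor intrinsic dimension estimate can be computed, the sketch below implements the Levina-Bickel maximum-likelihood estimator with the (k - 2) normalization. This is one common kNN-based estimator and is assumed here for illustration; the studies summarized above may have used a different variant. The synthetic data (a 5-dimensional Gaussian embedded in 50 dimensions) and all function names are illustrative.

```python
import numpy as np

def mle_intrinsic_dimension(X, k=20):
    """Levina-Bickel MLE of intrinsic dimension from k-nearest-neighbor distances."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # squared pairwise distances
    d2 = np.maximum(d2, 1e-12)                         # guard against round-off
    np.fill_diagonal(d2, np.inf)                       # exclude each point itself
    knn = np.sqrt(np.sort(d2, axis=1)[:, :k])          # (n, k), ascending distances
    log_ratio = np.log(knn[:, -1:] / knn[:, :-1])      # log(T_k / T_j), j = 1..k-1
    per_point = (k - 2) / np.sum(log_ratio, axis=1)
    return float(np.mean(per_point))

def redundancy(nominal_dim, intrinsic_dim):
    """Fraction of nominal dimensions not needed by the intrinsic estimate."""
    return 1.0 - intrinsic_dim / nominal_dim

# Synthetic check: 5 latent dimensions rotated into a 50-dimensional space.
rng = np.random.default_rng(0)
latent = rng.standard_normal((1000, 5))
basis, _ = np.linalg.qr(rng.standard_normal((50, 5)))  # orthonormal columns
X = latent @ basis.T
estimate = mle_intrinsic_dimension(X, k=20)
print(f"estimated intrinsic dimension: {estimate:.1f} (nominal 50)")
print(f"redundancy for 24.5 of 300 dims: {redundancy(300, 24.5):.1%}")
```

The redundancy helper reproduces the arithmetic behind the table: an intrinsic dimension of about 24.5 out of 300 nominal dimensions corresponds to roughly 92% redundancy.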

Across the Pythia model suite (14M to 12B parameters), redundancy stabilizes at approximately 98% for models with 410M+ parameters, suggesting a fundamental property of language representation. Sentence embeddings can compress to as few as ~10 dimensions while retaining core semantic information. The practical implication: dense retrieval systems using 768-dimensional vectors often capture 99% of variance in the first ~256 dimensions. Reducing 768 dimensions to 256 is only a 3× saving by itself, so compression ratios as high as 48× with less than 4% accuracy degradation depend on pairing dimension reduction with a second technique, such as quantizing the retained values.
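The variance and compression figures can be sanity-checked with a short sketch. It counts how many principal components are needed to reach a target share of variance, and computes a storage compression ratio from dimension counts and bits per value. The synthetic low-rank data and the bit widths in the second example are assumptions for illustration, not measurements from any retrieval system.

```python
import numpy as np

def dims_for_variance(X, target=0.99):
    """Smallest number of principal components explaining `target` of the variance."""
    Xc = X - X.mean(axis=0)
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    variance = singular_values ** 2
    cumulative = np.cumsum(variance) / variance.sum()
    return int(np.searchsorted(cumulative, target) + 1)

def compression_ratio(orig_dims, orig_bits, new_dims, new_bits):
    """Storage ratio between the original and the compressed vector."""
    return (orig_dims * orig_bits) / (new_dims * new_bits)

# Synthetic embeddings: 20 latent factors spread over 100 nominal dimensions.
rng = np.random.default_rng(0)
factors = rng.standard_normal((500, 20))
mixing = rng.standard_normal((20, 100))
X = factors @ mixing + 0.01 * rng.standard_normal((500, 100))
n_components = dims_for_variance(X, target=0.99)

truncation_only = compression_ratio(768, 32, 256, 32)   # float32 kept: 3.0
with_quantization = compression_ratio(768, 32, 128, 4)  # assumed 128 dims at 4 bits: 48.0
```

Dimension truncation alone moves the ratio linearly, so large ratios only appear once the bits per dimension shrink as well. The 128-dimension, 4-bit combination is just one hypothetical way to reach 48×.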

Theoretical bounds from Google DeepMind (2025) establish that the sign-rank of the relevance matrix creates hard lower limits on required embedding dimensions: at any fixed dimension, some combinations of relevant documents mathematically cannot be represented regardless of training quality. This suggests hybrid architectures combining embeddings with other retrieval mechanisms may be necessary for complete coverage.

Probing and intervention reveal what embeddings encode

The methodological toolkit for understanding embeddings has matured substantially, with complementary approaches that distinguish what's encoded from what's actually used.

Linear probes remain foundational: training simple classifiers on frozen representations to predict linguistic properties. The Hewitt & Manning structural probe innovated by learning transformations where squared L2 distance between transformed vectors encodes parse tree distance, successfully recovering entire syntax trees from BERT representations. Edge probing (Tenney et al.) extended this to sub-sentence tasks, revealing that BERT "rediscovers the classical NLP pipeline"—tasks appear in natural layer progression from POS tagging through parsing to coreference.
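For readers who want the structural probe stated symbolically, the distance and training objective can be written as follows. The notation is an assumption chosen for this sketch: h_i is the representation of word i, B is the learned linear transformation, d_T is the distance between two words in the gold parse tree, and |s^l| is the length of sentence l.

```latex
% Squared distance between two word representations after the learned map B
d_B(h_i, h_j)^2 = \bigl(B(h_i - h_j)\bigr)^{\top} \bigl(B(h_i - h_j)\bigr)

% B is fit so that squared probe distances match parse-tree distances
\min_{B} \sum_{\ell} \frac{1}{|s^{\ell}|^{2}} \sum_{i,j}
  \Bigl| \, d_{T^{\ell}}\bigl(w_i^{\ell}, w_j^{\ell}\bigr) - d_B\bigl(h_i^{\ell}, h_j^{\ell}\bigr)^{2} \Bigr|
```

Because B is a single matrix shared across all sentences, a good fit indicates that tree distance is recoverable from the representations through a linear map.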

However, probe accuracy alone doesn't prove models use encoded information. Causal interventions address this gap. Activation patching (from ROME research) runs models on clean and corrupted inputs, then patches single activations to measure which restores correct behavior. This revealed that factual associations localize to MLP modules in specific middle layers at the subject's last token position. Attribution patching uses gradients to approximate these effects with ~30,000× speedup, making fine-grained analysis tractable.
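The patching procedure can be sketched on a toy network. The two-layer model below is a stand-in with random weights, not a language model, and every name in it is hypothetical. It runs a clean and a corrupted input, then copies one clean hidden activation at a time into the corrupted run and records how far the output moves back toward the clean result.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 8, 16
W1 = rng.standard_normal((n_in, n_hidden))   # toy first layer
W2 = rng.standard_normal(n_hidden)           # toy linear readout to a scalar

def run(x, patch=None):
    """Forward pass; `patch=(index, value)` overwrites one hidden activation."""
    h = np.tanh(x @ W1)
    if patch is not None:
        index, value = patch
        h = h.copy()
        h[index] = value
    return float(h @ W2), h

clean_x = rng.standard_normal(n_in)
corrupt_x = clean_x + rng.standard_normal(n_in)   # corrupted version of the input

clean_out, clean_h = run(clean_x)
corrupt_out, _ = run(corrupt_x)

# Patch each clean activation into the corrupted run, one unit at a time.
effects = np.array([
    run(corrupt_x, patch=(i, clean_h[i]))[0] - corrupt_out
    for i in range(n_hidden)
])
most_important = int(np.argmax(np.abs(effects)))
print(f"unit {most_important} restores the most output: {effects[most_important]:+.3f}")
```

In this toy the readout is linear, so the single-unit effects add up exactly to the clean-versus-corrupted gap. In a real transformer the patched activation feeds later nonlinear layers, so effects are generally not additive and each patch needs its own forward pass, which is the cost that gradient-based attribution patching tries to avoid.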

Critical limitations demand attention. The Hewitt & Liang control task methodology showed that complex MLP probes achieve high accuracy even on random control tasks—questioning what high accuracy actually demonstrates. The recommended selectivity metric (linguistic accuracy minus control task accuracy) reveals that popular probes often lack true selectivity. Distribution specificity poses another challenge: findings from simple syntactic prompts may not generalize to arbitrary text, and components may be more polysemantic than specific analyses suggest.
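The selectivity calculation can be illustrated with synthetic data. In this sketch, each token representation is a noisy copy of its word type's vector; the real task label is linearly encoded in the representation, while the control label is assigned at random per word type. The least-squares probe, the data sizes, and all names are assumptions made for illustration, and a linear probe on toy data will look far more selective than the high-capacity probes the critique targets.

```python
import numpy as np

def fit_linear_probe(X, y, ridge=1e-3):
    """Least-squares linear probe on frozen representations; labels in {-1, +1}."""
    Xb = np.hstack([X, np.ones((len(X), 1))])          # append bias column
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def probe_accuracy(w, X, y):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return float(np.mean(np.sign(Xb @ w) == y))

def selectivity(task_acc, control_acc):
    """Task accuracy minus control-task accuracy."""
    return task_acc - control_acc

rng = np.random.default_rng(0)
n_types, dim = 500, 32
type_vecs = rng.standard_normal((n_types, dim))
task_labels = np.where(type_vecs[:, 0] > 0, 1.0, -1.0)   # property encoded linearly
control_labels = rng.choice([-1.0, 1.0], size=n_types)    # random label per word type

def sample_tokens(n):
    types = rng.integers(0, n_types, size=n)
    X = type_vecs[types] + 0.1 * rng.standard_normal((n, dim))
    return X, types

X_train, t_train = sample_tokens(2000)
X_test, t_test = sample_tokens(1000)

w_task = fit_linear_probe(X_train, task_labels[t_train])
w_ctrl = fit_linear_probe(X_train, control_labels[t_train])
task_acc = probe_accuracy(w_task, X_test, task_labels[t_test])
control_acc = probe_accuracy(w_ctrl, X_test, control_labels[t_test])
sel = selectivity(task_acc, control_acc)
print(f"task {task_acc:.2f}, control {control_acc:.2f}, selectivity {sel:.2f}")
```

A probe with enough capacity to memorize word identities would push the control accuracy up and the selectivity down, which is the failure mode the metric is designed to expose.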

Layer-wise representations follow a predictable hierarchy

Transformer layers exhibit consistent specialization patterns, best understood through the residual stream framework from Anthropic. Rather than sequential processors, layers function as readers and writers to a shared communication channel, with each attention head and MLP reading via linear projection, computing transformations, then adding results back additively.

Early layers (bottom ~1/3) handle surface features: tokenization effects, positional information, word morphology, and basic syntax including part-of-speech tags. POS tagging accuracy peaks by layer 2-4 in BERT-base. Middle layers perform the heavy lifting: syntactic parsing, dependency structures, named entity recognition, and critically, factual knowledge storage. ROME research localized facts to MLP modules in layers 14-18 of GPT-2 XL, treating these as key-value memories where keys encode subjects and values encode knowledge. Late layers handle higher semantics: coreference resolution, long-range dependencies, and task-specific refinement toward vocabulary space.

The logit lens technique directly projects intermediate residual stream states into vocabulary space, revealing how the model's "beliefs" evolve. Projections from the earliest layers are often near-nonsense, yet they already resemble guesses about the output rather than echoes of the input: inputs are transformed almost immediately into predictive representations that are progressively refined with depth. The improved tuned lens trains affine probes per layer, showing smooth prediction refinement and revealing that representations may be rotated differently at different depths.
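The mechanics of the residual stream and the logit lens can be sketched with a toy model. Everything here is an assumption for illustration: the weights are random, each "layer" is a single tanh transformation standing in for attention and MLP blocks, and the sizes are arbitrary. The point is only the data flow: every layer reads the stream, adds its result back, and any intermediate state can be pushed through the final normalization and unembedding.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, n_layers = 32, 50, 6
W_E = rng.standard_normal((vocab, d_model)) * 0.1     # toy embedding matrix
W_U = rng.standard_normal((d_model, vocab))           # toy unembedding matrix
layer_weights = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                 for _ in range(n_layers)]

def layer_norm(x):
    return (x - x.mean()) / (x.std() + 1e-5)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def residual_states(token_id):
    """Return the residual stream after the embedding and after each layer."""
    x = W_E[token_id].copy()
    states = [x.copy()]
    for W in layer_weights:
        x = x + np.tanh(layer_norm(x) @ W)   # read, transform, add back
        states.append(x.copy())
    return states

def logit_lens(state):
    """Project an intermediate state straight into vocabulary space."""
    return softmax(layer_norm(state) @ W_U)

states = residual_states(token_id=3)
lens_probs = [logit_lens(s) for s in states]
final_probs = lens_probs[-1]
# KL divergence from the final prediction to each layer's lens prediction.
kl_to_final = [float(np.sum(final_probs * (np.log(final_probs) - np.log(p))))
               for p in lens_probs]
```

In a trained model, the KL divergence to the final prediction typically shrinks with depth; with random weights as used here, no such trend should be expected. A tuned lens would replace the fixed projection with a separately trained affine map per layer.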

The linear representation hypothesis—that features correspond to linear directions in activation space—proves broadly useful but incomplete. Some features are multi-dimensional: circular representations emerge for periodic concepts like days of the week, and RNNs learn magnitude-based (non-linear) representations for certain tasks. Middle layers exhibit maximum superposition and contain the most abstract, general features, making them optimal targets for sparse autoencoder training.

The relationship between dimensionality and performance follows diminishing returns

Empirical evidence shows performance scales logarithmically with embedding dimension, with clear diminishing returns. OpenAI's text-embedding-3-large at only 256 dimensions still outperforms text-embedding-ada-002 at 1536 dimensions on the MTEB benchmark. Microsoft Azure analysis identified 1024 dimensions as the practical sweet spot for text-embedding-3-large, providing essentially identical performance to 3072 dimensions while requiring only one-third the storage.

| Dimension Range | Typical Use Case | Performance Context |
|---|---|---|
| 256-384 | Simple classification, edge deployment | Still competitive with older large models |
| 512-768 | Semantic search, retrieval | Standard production deployments |
| 768-1024 | Complex semantic tasks | Optimal quality-efficiency tradeoff |
| 1536-3072 | Maximum quality applications | Marginal gains over 1024 |

Matryoshka Representation Learning (Kusupati et al., NeurIPS 2022) represents a paradigm shift, training embeddings useful at multiple nested dimensions simultaneously. A single model produces embeddings truncatable from 2048 down to 8 dimensions, frontloading important information in early dimensions. Performance retention is remarkable: models trained with MRL at 8.3% of embedding size preserve 98.37% of performance versus 96.46% for standard training. OpenAI, Nomic, and Alibaba have adopted this technique, enabling runtime dimension selection without retraining.
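A rough sketch of the two halves of the Matryoshka idea follows: truncating an embedding at inference time, and summing a loss over nested prefixes at training time. The in-batch contrastive loss, the temperature, the prefix sizes, and the random data are assumptions for illustration. The original method was formulated with classification losses, and no gradient updates are performed here.

```python
import numpy as np

def truncate_and_normalize(E, dim):
    """Keep the first `dim` coordinates and rescale each row to unit length."""
    T = E[:, :dim]
    return T / np.linalg.norm(T, axis=1, keepdims=True)

def info_nce(q, d, temperature=0.05):
    """In-batch contrastive loss: row i of q should match row i of d."""
    logits = (q @ d.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def matryoshka_loss(Q, D, dims=(8, 64, 256, 768)):
    """Sum the same loss over nested prefixes so every prefix stays useful."""
    return sum(
        info_nce(truncate_and_normalize(Q, m), truncate_and_normalize(D, m))
        for m in dims
    )

rng = np.random.default_rng(0)
D = rng.standard_normal((16, 768))             # stand-in document embeddings
Q = D + 0.1 * rng.standard_normal((16, 768))   # stand-in matching queries
loss = matryoshka_loss(Q, D)

small = truncate_and_normalize(D, 64)          # deployable 64-dimensional vectors
storage_fraction = 64 / 768                    # about 8.3% of the full size
```

Re-normalizing after truncation matters when similarity is computed with dot products, because a truncated prefix no longer has unit length.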

Scaling laws from Chinchilla research establish that model size and training tokens should scale equally (~20 tokens per parameter for compute-optimal training), with embedding dimension scaling alongside overall model capacity. The "double-peak phenomenon" in recommendation systems, in which performance rises, falls, rises, and falls again as dimensions increase, suggests complex interactions between dimensionality, task structure, and data characteristics that resist universal optimization.

Practical implications for embedding space design

For practitioners designing embedding systems, several evidence-based recommendations emerge. Start with established dimensions (768-1024) as baselines, knowing these capture the vast majority of semantic information. Test compression aggressively: PCA reduction from 1024 to 384 dimensions typically loses only 0.5-3.5% accuracy while dramatically reducing storage and compute.

Consider Matryoshka training for systems requiring variable precision—train once at high dimensionality and deploy flexibly across use cases from edge devices to maximum-quality applications. Measure intrinsic dimensionality of your specific task to identify optimal compression limits; LoRA ranks below estimated intrinsic dimensionality cause clear performance drops, while ranks slightly above optimize the compactness-expressivity tradeoff.

For interpretability work, target middle layers with sparse autoencoders to extract the most abstract, general features. Combine probing (what's encoded) with causal interventions (what's used) for robust conclusions. Be skeptical of single-dimension interpretations—information lives in directions across many coordinates, and superposition means individual dimensions serve multiple purposes simultaneously.

Conclusion

The geometry of neural embeddings reveals a sophisticated compression strategy: language models create low-dimensional manifolds of meaning embedded in high-dimensional spaces, using superposition to pack in far more concepts than dimensions. The practical ceiling appears to be around 50-100 intrinsic dimensions for most linguistic tasks (measured values range from roughly 10 to 120 depending on the representation type), with additional dimensions providing computational headroom rather than semantic capacity.

Three methodological advances have transformed the field: sparse autoencoders for extracting interpretable features from superposition, causal interventions for distinguishing encoding from use, and Matryoshka training for flexible deployment. The evidence points toward 1024 dimensions as a reasonable default, aggressive compression as safe for most applications, and interpretability as requiring direction-based rather than dimension-based analysis. The apparent complexity of 768+ dimensional spaces masks an elegant underlying structure—meaning occupies curves and surfaces of much lower dimensionality, organized by the geometry of language itself.
