
The Compression at the Heart of Things: A Universal Data-Reduction Principle from Black Holes to Brains

TL;DR

Across five very different research programs — Maldacena's holographic duality, Hinton and Salakhutdinov's deep autoencoders, a 2026 ISTA discovery that the hippocampus prunes from "full" to sparse, the Johns Hopkins "Universal Weight Subspace Hypothesis" (arXiv 2512.05117v2), and Demis Hassabis's Nobel-lecture conjecture that nature's patterns are learnable — the same structural fact keeps recurring: high-dimensional descriptions of the world collapse onto a much lower-rank manifold that nonetheless retains everything that matters.
The strongest mathematical bridges between these domains are real, not metaphorical: renormalization-group flow is provably monotonic for unitary quantum field theories in two dimensions (Zamolodchikov's c-theorem) and in four (the a-theorem); Swingle's MERA tensor networks give a discrete picture of AdS/CFT in which the holographic radial direction is a coarse-graining scale; and Mehta–Schwab and Koch-Janusz–Ringel connect RG to deep learning, the former through an explicit mapping and the latter through a mutual-information objective. Connecting these to neural pruning and to "low-rank intelligence" is suggestive but not yet rigorous.
For quantum gravity, the practical payoff of this convergence is the It-from-Qubit reframing: spacetime is not the substrate of information but the output of an information-compression process. If that is right, then the field that has the most operational experience designing such compressions, modern machine learning, may have something concrete, not merely poetic, to contribute.

Key Findings

1. The same low-rank shadow falls across the five sources. Maldacena's 1997 paper showed that a (d+1)-dimensional gravitational theory is fully encoded in a d-dimensional conformal field theory: fewer dimensions, all the physics. Hinton and Salakhutdinov's 2006 Science paper showed that a deep autoencoder with a 784-1000-500-250-30 architecture could reduce 784-pixel MNIST images to 30 numbers in its central code layer, achieving an average squared reconstruction error of 3.00 versus 13.87 for standard PCA. The Vargas-Barroso et al. study (Nature Communications, 2026) shows that mouse hippocampal CA3 begins life with a dense, near-random recurrent network and is sculpted, by synaptic pruning and weakening, into a sparse, structured one that maximizes memory capacity. Kaushik et al.'s "Universal Weight Subspace Hypothesis" (arXiv 2512.05117v2) finds empirically that hundreds of independently trained deep networks (500 Mistral-7B LoRAs, 500 Vision Transformers, 50 LLaMA-8B models) converge to a shared low-rank subspace in which a handful of principal directions (often k ≤ 16) capture the majority of the variance. And Hassabis's central claim in Lex Fridman Podcast #475, made in the episode highlight at 00:00:27, is the same statement viewed from the AI side: "Perhaps there is some kind of lower dimensional manifold that can be learned… That's maybe true of most of reality."

2. The connection between RG flow and deep learning is rigorous in places, heuristic elsewhere. Mehta and Schwab (arXiv:1410.3831, 2014) constructed an exact mapping between Kadanoff's variational renormalization group and deep architectures built from Restricted Boltzmann Machines — precisely the building blocks Hinton and Salakhutdinov used. Koch-Janusz and Ringel (Nature Physics 14, 578–582, 2018) trained a neural network to identify the "relevant" slow degrees of freedom for an RG step by maximizing a mutual-information objective, supplying a principled bridge between information bottlenecks and Wilsonian coarse-graining. Zamolodchikov's c-theorem (1986) — and its higher-dimensional cousins (the four-dimensional a-theorem proved by Komargodski–Schwimmer 2011) — proves that a "c-function" counting effective degrees of freedom decreases monotonically along RG flow from UV to IR. This is the cleanest physical theorem we have stating that the world systematically forgets. The bridge to autoencoders and to neural pruning is by analogy at this point, but the analogy is structural: in each case a fine-grained representation is compressed by integrating out, projecting out, or pruning away modes that do not survive a downstream criterion.

3. Holography reframes quantum gravity as a compression problem. The holographic principle of 't Hooft (1993) and Susskind (1995), generalizing Bekenstein's area-law entropy bound for black holes, asserts that the number of independent degrees of freedom inside a region scales not as the volume but as the area of its boundary in Planck units. Maldacena gave this principle a concrete realization: the bulk gravitational physics in AdS₅ × S⁵ is fully encoded in 𝒩 = 4 super-Yang–Mills on the four-dimensional boundary. Ryu and Takayanagi (PRL 2006) made the bridge sharper: the entanglement entropy of a boundary region equals one quarter of the area, in Planck units, of the minimal bulk surface anchored to it. Van Raamsdonk (2010) then argued that "the emergence of classically connected spacetimes is intimately related to the quantum entanglement of degrees of freedom": disentangle the boundary, and the bulk literally pulls apart. Swingle (arXiv:0905.1317, 2009; Phys. Rev. D 86, 065007, 2012) made this geometric in MERA tensor networks: "The discrete geometry that appears at the critical point is nothing but a discrete version of anti de Sitter space (AdS)," and the depth of a MERA, the number of coarse-grainings performed, plays the role of the AdS radial direction. The MERA is an RG circuit; the holographic radial direction is a renormalization scale. This is, in the technical literature, what AdS/CFT's emergent dimension actually means.

4. The empirical neuroscience now agrees with the engineering intuition. Vargas-Barroso et al. mapped >7,000 potential connections in CA3 across three developmental stages in mice and found that in P7–P8 animals "single synaptic events are sufficient to trigger postsynaptic spiking… whereas spatial summation of several inputs is required at later time points." Their computational simulation of a 100,000-neuron network showed that the mature, sparser, weaker, more-distributed configuration maximizes pattern-storage capacity. That is, biology built an over-parameterized initialization and then ran a long, selective pruning routine, loosely the recipe deep-learning engineers stumbled onto with lottery-ticket pruning, dropout, and LoRA. Stringer, Pachitariu, Steinmetz, Carandini & Harris (Nature 571, 361–365, 2019) had already shown that mouse V1 population activity is "high-dimensional, and correlations obeyed an unexpected power law: the nth principal component variance scaled as 1/n," a spectrum that keeps the code as high-dimensional as it can be while remaining smooth. They proved mathematically that any slower decay would destroy the smoothness of the code. Brains, like trained networks, live on the boundary of the manifold hypothesis.

Details

The five sources, accurately

(1) Maldacena (1997), "The Large N Limit of Superconformal Field Theories and Supergravity," Adv. Theor. Math. Phys. 2 (1998) 231 (arXiv:hep-th/9711200). Maldacena's abstract opens by stating that "the large N limit of certain conformal field theories in various dimensions include in their Hilbert space a sector describing supergravity on the product of Anti-deSitter spacetimes, spheres and other compact manifolds." The key conceptual move is the identification of an extra emergent dimension with energy scale: of the radial coordinate U on AdS, Maldacena writes "It seems natural to interpret motion in U as moving in energy scales, going to the IR for small U and to the UV for large U." The boundary CFT is non-gravitational and lives in fewer dimensions; the bulk is gravitational and lives in one more. Crucially, large N (the rank of the gauge group) is what makes the gravity description classical: a huge number of microscopic degrees of freedom on the boundary cohere into a small number of bulk geometric degrees of freedom. This is dimensionality reduction at the most fundamental level of physical theory.

(2) Hinton & Salakhutdinov (2006), "Reducing the Dimensionality of Data with Neural Networks," Science 313, 504–507. "High-dimensional data can be converted to low-dimensional codes," they write, "by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors." The deep autoencoder is, architecturally, a forced bottleneck: in their MNIST experiments, an encoder squeezes 784 pixels through 1000, 500, 250, and finally 30 units in the central code layer, and a symmetric decoder reconstructs the image; the resulting average squared reconstruction error on test digits was 3.00, versus 13.87 for the best linear PCA at the same code size. The 2006 contribution was that RBM-based pretraining made deep autoencoders trainable at all, and that once trainable, they outperform PCA decisively. This is the moment when "the manifold hypothesis" — that natural data live on a low-dimensional submanifold of their ambient space — became an engineering principle rather than a slogan.

(3) Vargas-Barroso, Watson, Navas-Olive, Schlögl & Jonas (2026), "Developmental emergence of sparse and structured synaptic connectivity in the hippocampal CA3 memory circuit," Nature Communications, doi:10.1038/s41467-026-71914-x. This is the primary source behind the PsyPost article. Using multicellular patch-clamp recording of up to eight CA3 pyramidal neurons simultaneously, the ISTA group tested >7,000 potential connections across postnatal day 7–8, 18–25, and 45–50 mice. They report "a developmental transformation from local, dense, and random connectivity to a distributed, sparse, and structured configuration." In their words, "sparse and structured connectivity may emerge via experience-dependent mechanisms." The mature network requires "spatial summation of several inputs" to fire, where the immature one fires from any single synapse. A 100,000-neuron simulation showed the mature configuration is what maximizes auto-associative memory capacity. Peter Jonas, in the ISTA press release: "It follows what we call a pruning model: it starts out full, and then it becomes streamlined and optimized." The key caveats: this is mouse hippocampus; the developmental window the authors connect (cautiously) to the offset of infantile amnesia in humans is at the timescale of weeks-to-months; and the molecular mechanisms of pruning are not yet identified. The popular framing — "brain begins full, prunes down to optimize learning" — is faithful to the paper.

(4) Kaushik, Chaudhari, Vaidya, Chellappa & Yuille, "The Universal Weight Subspace Hypothesis," arXiv:2512.05117v2 (Johns Hopkins, 2025). This paper's actual thesis: "deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces." It is an empirical paper, not a theoretical one, with theory in service of explaining the findings. Across more than 1,100 models — 500 Mistral-7B LoRAs, 500 ViTs, 50 LLaMA-8B finetunes, 177 GPT-2s, plus Flan-T5 — a layer-wise spectral decomposition of the weight matrices reveals that "the majority of variance is captured by the top few spectral components." Striking specifics: 500 ViTs can be replaced by a single universal-subspace model with ~100× memory reduction and no significant loss in classification accuracy; for the Mistral-7B LoRA experiment the authors report that "the majority of the information is present in only 16 (or less) distinct subspace directions for all layers." The authors situate this beside the Neural Tangent Kernel limit (Jacot et al. 2018), mechanistic-interpretability universality (Olah et al. 2020), the lottery-ticket hypothesis (Frankle & Carbin 2019), and Mao et al.'s 2024 PNAS result that "the training process of many deep networks explores the same low-dimensional manifold." The paper itself does not invoke holography or quantum gravity — that connection is ours to draw — but its closing question is striking: "if neural networks systematically collapse into the same subspace … is this lack of diversity a fundamental bottleneck?"

(5) Hassabis on Lex Fridman Podcast #475 (July 23, 2025). The signature "lower-dimensional manifold" passage appears in the episode highlight at 00:00:27 (and is reprised at the start of the "Veo 3 and understanding reality" chapter at 14:26): "Somehow these systems are reverse engineering from just watching YouTube videos. So presumably what's happening is it's extracting some underlying structure around how these materials behave. So perhaps there is some kind of lower dimensional manifold that can be learned if we actually fully understood what's going on under the hood. That's maybe true of most of reality." Earlier in the same conversation Hassabis lays out the underlying conjecture, originally stated in his Nobel lecture "Accelerating Scientific Discovery with AI" delivered 8 December 2024 at the Aula Magna, Stockholm University: "Any pattern that can be generated or found in nature can be efficiently discovered and modeled by a classical learning algorithm." He links this directly to P vs NP — "if you think of physics as informational … the P equals NP question is a physics question" — and proposes a putative new complexity class of "learnable natural systems." Hassabis: "I think information is primary, information is the most sort of fundamental unit of the universe, more fundamental than energy and matter." The framing he offers — that evolution, weathering, and cosmological selection produce systems with learnable low-rank structure — is conjectural; he explicitly notes that abstract problems like factoring large numbers may resist this treatment.
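
A toy version of the spectral analysis described for source (4) can be sketched in plain Python. Everything here is an assumption made for illustration: the synthetic "models" are flat weight vectors built from a shared rank-3 basis plus noise, and the helper names (synthetic_models, top_eigenvalues, explained_variance) are hypothetical. The paper works layer by layer on real weight matrices; this sketch only shows what "the majority of variance is captured by the top few spectral components" means operationally.

```python
import math
import random


def synthetic_models(n_models=20, dim=64, rank=3, noise=0.05, seed=0):
    """Hypothetical stand-in for trained weights: shared low-rank basis plus noise."""
    rng = random.Random(seed)
    basis = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(rank)]
    models = []
    for _ in range(n_models):
        coeffs = [rng.gauss(0, 1) for _ in range(rank)]
        models.append([
            sum(c * b[d] for c, b in zip(coeffs, basis)) + rng.gauss(0, noise)
            for d in range(dim)
        ])
    return models


def top_eigenvalues(matrix, k, iters=500, seed=1):
    """Top-k eigenvalues of a symmetric PSD matrix (power iteration with deflation)."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    rng = random.Random(seed)
    eigenvalues = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(n)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(x * y for x, y in zip(v, av))  # Rayleigh quotient
        eigenvalues.append(lam)
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                a[i][j] -= lam * v[i] * v[j]
    return eigenvalues


def explained_variance(models, k):
    """Fraction of across-model variance captured by the top-k principal directions."""
    n, dim = len(models), len(models[0])
    mean = [sum(m[d] for m in models) / n for d in range(dim)]
    centered = [[m[d] - mean[d] for d in range(dim)] for m in models]
    # The n x n Gram matrix has the same nonzero eigenvalues as the scatter matrix.
    gram = [[sum(x * y for x, y in zip(ci, cj)) for cj in centered] for ci in centered]
    total = sum(gram[i][i] for i in range(n))
    return sum(top_eigenvalues(gram, k)) / total


models = synthetic_models()
for k in (1, 3):
    print(k, round(explained_variance(models, k), 3))
```

Because the toy data are built from three shared directions, nearly all the variance should appear by k = 3. The empirical question the paper asks is whether independently trained real networks behave the same way.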

The defensible spine

Renormalization group as the prototype of irreversible data reduction. Wilson's RG and the c-theorem of Zamolodchikov (1986) make precise the intuition that integrating out short-distance physics is a one-way process. nLab's encyclopedic summary captures it: "An appealing physical interpretation of the c-function is as a kind of entropy of information about the critical system. Under renormalization, information is lost about the short distance behaviour of the correlation functions." The four-dimensional analog, the a-theorem, extends this from two to four spacetime dimensions. The RG is not a metaphor; it is the working physical theory of which degrees of freedom actually matter at a given scale. Together, the c-theorem and the a-theorem ensure that a suitably defined count of effective degrees of freedom is monotonically non-increasing under coarse-graining: physics's most rigorous "data-reduction theorem."
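
The one-way character of coarse-graining can be made concrete with a toy Kadanoff-style block-spin step. This is a sketch under stated assumptions: the majority rule, the 3 x 3 block size, and the function name block_spin are illustrative choices, and counting preimages is not the c-function itself. It only shows why the map from fine to coarse descriptions cannot be inverted.

```python
import itertools


def block_spin(lattice, b=3):
    """One majority-rule coarse-graining step on a square lattice of +1/-1 spins.

    Each b x b block (b odd, so there are no ties) is replaced by a single spin.
    """
    size = len(lattice) // b
    coarse = []
    for i in range(size):
        row = []
        for j in range(size):
            total = sum(lattice[b * i + di][b * j + dj]
                        for di in range(b) for dj in range(b))
            row.append(1 if total > 0 else -1)
        coarse.append(row)
    return coarse


# Count how many distinct 3 x 3 fine-grained blocks collapse onto the coarse spin +1.
preimages = 0
for spins in itertools.product((-1, 1), repeat=9):
    block = [list(spins[0:3]), list(spins[3:6]), list(spins[6:9])]
    if block_spin(block) == [[1]]:
        preimages += 1

print(preimages, "of", 2 ** 9, "fine configurations map to +1")
```

Half of the 512 fine configurations (256) land on each coarse value, so nine bits of microscopic detail become one bit and the step cannot be undone. Applying block_spin repeatedly to a larger lattice gives a discrete picture of successive RG steps.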

Holography as data reduction made geometric. 't Hooft (1993) and Susskind (1995) generalized Bekenstein's bound: in Bousso's compact restatement (hep-th/9911002), "Degrees of freedom that cannot be utilized should not be considered to exist." The number of independent quantum degrees of freedom in a volume V is bounded by the area of ∂V in Planck units. Maldacena turned the principle into a duality. Ryu and Takayanagi (2006) gave it an entanglement-theoretic engine: boundary entanglement entropy = minimal bulk surface area / (4G), with G Newton's constant. Van Raamsdonk (2010) drew the conclusion: entanglement is what connects regions of spacetime. Swingle (2009/2012) made the picture computational via MERA. In his words, "the analog of log z in the lattice setup is simply the layer number or depth which counts how many coarse grainings have been performed"; and "further evidence for the connection between entanglement renormalization and holography comes from the holographic interpretation of the extra gravitational dimension in terms of energy scale in the gauge theory." The MERA is an RG circuit; the holographic radial direction is a renormalization scale.
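
For reference, the area laws invoked above can be written out explicitly. The notation is an assumption of this sketch rather than the essay's: units with ħ = c = k_B = 1, G_N for Newton's constant, γ_A for the minimal bulk surface anchored on the boundary of region A, L for the AdS radius, ℓ for the interval length, and ε for a short-distance cutoff. The last line is the standard AdS₃/CFT₂ consistency check, where the Ryu–Takayanagi prescription reproduces the known entanglement entropy of an interval in a two-dimensional CFT.

```latex
\begin{align}
  S_{\mathrm{BH}} &= \frac{A}{4 G_N}
      && \text{(Bekenstein--Hawking entropy of a horizon of area } A\text{)} \\
  S_A &= \frac{\mathrm{Area}(\gamma_A)}{4 G_N}
      && \text{(Ryu--Takayanagi, classical bulk limit)} \\
  S_A &= \frac{c}{3}\,\log\frac{\ell}{\epsilon},
      \qquad c = \frac{3L}{2 G_N}
      && \text{(AdS}_3\text{/CFT}_2\text{: geodesic length } 2L\log(\ell/\epsilon)\text{)}
\end{align}
```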

The bridge to machine learning. Mehta and Schwab (arXiv:1410.3831) prove an exact correspondence between Kadanoff's variational RG transformation and the layer-by-layer structure of stacked RBMs — "deep learning algorithms may be employing a generalized RG-like scheme to learn relevant features from data." Koch-Janusz and Ringel (Nature Physics 14, 578–582, 2018) make the information-theoretic engine explicit: train a neural network to maximize the mutual information between coarse-grained variables and a distant environment, and you recover Wilson's relevant operators. Tishby and Zaslavsky's information-bottleneck story (2015) frames deep learning itself as an optimization that compresses input X while preserving information about Y — successive layers as successive renormalization steps. These are the rigorous backbones. They do not extend, today, to a proof that hippocampal pruning implements the same operation, or that the Universal Weight Subspace lies in a specific RG fixed-point neighborhood. But they make the conjecture more than poetry.
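
The information-bottleneck trade-off mentioned above can be checked by hand on a tiny discrete example. This is a minimal sketch, not Tishby and Zaslavsky's procedure: the four-state input, the label Y (the high bit of X), the three deterministic encoders, and the value β = 2 are all assumptions chosen so the arithmetic is transparent. The quantity compared is the usual bottleneck objective I(X;T) − β·I(T;Y), where lower is better.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """I(A;B) in bits for a joint distribution given as {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def ib_terms(p_xy, encoder):
    """Return (I(X;T), I(T;Y)) for a deterministic encoder t = encoder(x)."""
    p_xt, p_ty = defaultdict(float), defaultdict(float)
    for (x, y), p in p_xy.items():
        t = encoder(x)
        p_xt[(x, t)] += p
        p_ty[(t, y)] += p
    return mutual_information(p_xt), mutual_information(p_ty)


# Toy world (assumed): X uniform on {0, 1, 2, 3}; the label Y is the high bit of X.
p_xy = {(x, x // 2): 0.25 for x in range(4)}
beta = 2.0

encoders = {
    "identity (no compression)": lambda x: x,
    "keep high bit (relevant)": lambda x: x // 2,
    "keep low bit (irrelevant)": lambda x: x % 2,
}

objective = {}
for name, enc in encoders.items():
    i_xt, i_ty = ib_terms(p_xy, enc)
    objective[name] = i_xt - beta * i_ty
    print(f"{name}: I(X;T)={i_xt:.2f}  I(T;Y)={i_ty:.2f}  objective={objective[name]:.2f}")
```

The encoder that keeps only the high bit discards half the input yet loses nothing about Y, so it scores best. That is the sense in which a bottleneck keeps the "relevant" variables and drops the rest.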

The bridge to neuroscience. Stringer, Pachitariu, Steinmetz, Carandini & Harris (Nature 571, 361–365, 2019) showed that mouse V1 population activity is "high-dimensional, and correlations obeyed an unexpected power law: the nth principal component variance scaled as 1/n," and proved mathematically "that if the variance spectrum was to decay more slowly then the population code could not be smooth, allowing small changes in input to dominate population activity." The visual cortex thus sits at the most-expressive-but-still-smooth point on the manifold-dimensionality spectrum. Vargas-Barroso et al. give the developmental mechanism for memory circuits: start with a dense initialization, prune to optimum. Together these say: biological intelligence and engineered intelligence are both organized as compressions of a high-dimensional sensory stream onto a structured low-rank manifold, and the dimension of that manifold is set by an information-theoretic trade-off.
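
What a 1/n variance spectrum implies about dimensionality can be seen with a few lines of arithmetic. This is a sketch under assumptions, not a reproduction of the Stringer et al. analysis: the 10,000-component cutoff, the comparison exponent of 2, and the function name top_k_variance_fraction are illustrative choices.

```python
def top_k_variance_fraction(alpha, k, n_components=10_000):
    """Fraction of total variance in the first k components when variance_n = n**(-alpha)."""
    spectrum = [n ** -alpha for n in range(1, n_components + 1)]
    return sum(spectrum[:k]) / sum(spectrum)


for alpha in (1.0, 2.0):
    print(alpha, round(top_k_variance_fraction(alpha, k=10), 3))
```

With a 1/n spectrum the first ten components hold only about 0.30 of the variance in this setup, against about 0.94 for the faster 1/n² decay. The 1/n code spreads its variance over many dimensions, which is the sense in which it is as high-dimensional as a smooth code can be.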

What is rigorous, what is heuristic

Connection	Status
AdS/CFT bulk reconstruction from boundary CFT	Rigorous within string theory; precise dictionary for many examples
Ryu–Takayanagi: entanglement entropy ↔ minimal surface area	Proved in the bulk-classical limit; quantum corrections worked out
MERA depth ↔ AdS radial direction ↔ RG scale	Proposed (Swingle 2009), demonstrated for critical 1D CFTs
Zamolodchikov c-theorem / a-theorem: monotonic decrease of effective degrees of freedom under RG	Theorem (2D and 4D, unitary QFTs)
Variational RG ↔ stacked RBMs (Mehta–Schwab)	Exact mathematical mapping
Information bottleneck describes deep-learning training trajectories	Influential framework; debated empirically (Saxe et al. 2018)
"Universal weight subspaces" across independently trained nets	Strong empirical evidence; theoretical explanation conjectural
Hippocampal pruning ≈ autoencoder bottleneck optimization	Structural analogy; no formal mapping yet
"Reality lives on a low-rank manifold" (Hassabis conjecture)	Conjecture; supported by AlphaFold / Veo evidence; not a theorem
Compression principles in ML literally inform quantum gravity	A research program (It-from-Qubit), not a finished result

Armchair Physics: what if, for fun

Clearly flagged as conjecture.

Erik Verlinde (2011) and Ted Jacobson (1995) have both written derivations of Einstein's equations from thermodynamic / entropic premises, treating gravity not as a fundamental force but as the statistical pull of information toward higher-entropy configurations. Jacobson's argument is precise — "the Einstein equation is an equation of state" — but, as Chirco, Haggard, Riello & Rovelli (2014) emphasize, the existence of underlying microscopic degrees of freedom is an interpretation, not a theorem. Verlinde's 2017 emergent-gravity proposal remains controversial; serious technical critiques exist (e.g., Roveto and Muñoz, arXiv:1201.2475).

But step back, and a pattern emerges that the present-day It-from-Qubit Simons Collaboration is taking seriously: spacetime, like a hidden representation in a deep network, may be the output of a compression, the structure that emerges when a high-dimensional state is encoded efficiently. If that is right, then a variational principle on the boundary (an action, an entropy functional, an information bottleneck) generates the bulk geometry. Hassabis's conjecture, that natural systems lie on low-rank learnable manifolds because they have survived some selection process, would, in this framing, be the statement that the universe is constructively compressible in exactly the way machine-learning models exploit. Whether the same variational structure that produces gravity from boundary entanglement also produces representations in a transformer from token sequences is currently a metaphor with three or four serious load-bearing equations under it, and many more that would have to be added before it became a theory.

The most provocative possibility, and the one worth flagging because serious researchers (Susskind, Maldacena, Van Raamsdonk, Carroll, ChunJun Cao, Vijay Balasubramanian) are exploring versions of it: intelligence and gravity may be two specializations of a more general principle of how high-dimensional information becomes low-dimensional structure. The Universal Weight Subspace Hypothesis observes the phenomenon in transformers; AdS/CFT observes it in quantum gravity; CA3 pruning observes it in the brain. There is, at present, no single equation connecting all three. There may yet be.

Recommendations

For an essay aimed at readers who can handle nuance:

Lead with the empirical convergence, not the speculation. The strongest version of the thesis is: "Four very different fields independently discovered that high-dimensional descriptions of natural systems collapse to low-dimensional ones." This is defensible. The unified-principle claim is not yet defensible and should be framed as the question, not the answer.
Use the c-theorem and AdS/CFT–MERA as the load-bearing technical anchors. These are the points at which the analogy becomes mathematics. Mehta–Schwab gives a second anchor on the ML side; Koch-Janusz–Ringel a third.
Cite the Vargas-Barroso et al. Nature Communications paper, not the PsyPost summary. The popular framing is faithful but the primary source has the actual numbers (>7,000 connections tested, three age windows, 100,000-neuron simulation).
Quote Hassabis precisely and locate the quote in the conversation. He is making a structural claim about which problems are tractable by classical learning systems, anchored on the AlphaFold/AlphaGo precedent. He is not claiming the universe is literally a neural network.
Separate the "Armchair Physics" sidebar visually and verbally. Verlinde's entropic gravity in particular has serious technical critics; the Simons "It from Qubit" program is the more responsible flagship for the speculative wing.

Benchmarks that would change these recommendations:

A proof that hippocampal pruning dynamics minimize an information-bottleneck-style objective would upgrade analogy → theorem in neuroscience.
A demonstration that the Universal Weight Subspace lies at an attractor of an RG-like flow on weight space would upgrade analogy → theorem in ML.
An experimental probe of holographic structure in a strongly coupled quantum simulator (the IFQ near-term-experiment proposals) would upgrade AdS/CFT from a duality of equations to a duality with physical realizations.

Caveats

The framing "universal force that reduces data" is poetic, not technical. No physical force is involved on either side. What is shared is a structural principle: high-dimensional data has low intrinsic dimension; processes that exploit this (RG flow, holographic encoding, autoencoder training, synaptic pruning) are all variational compressions. Calling that a "force" is a rhetorical device; the load-bearing technical content is variational, not dynamical.
The "AdS/CFT helps quantum gravity" claim has a real-world ceiling. AdS/CFT is a duality for spacetimes with a negative cosmological constant (anti-de Sitter); our universe appears to have a small positive one, which makes it closer to de Sitter space. The dS analog of holography is far less well understood. The strongest claim warranted here is: the principle that low-dimensional information can encode higher-dimensional geometry is demonstrated in AdS and conjectured in dS.
The Universal Weight Subspace Hypothesis is one preprint, one team. It is impressive in scale (1,100+ models analyzed) but is a December 2025 preprint as of this writing. Independent replication is not yet available.
The Vargas-Barroso paper is mouse work. The connection to human infantile amnesia is hypothesized by the authors, not demonstrated. The molecular mechanism of pruning (microglia? interneuron-mediated remodeling?) is not yet identified.
Hassabis's claim is a conjecture from a Nobel lecture, not a published theorem. He is explicit about this: "I felt that it's sort of a tradition, I think, of Nobel Prize lectures that you're supposed to be a little bit provocative."
Information-bottleneck theory has been productively contested. Saxe et al. (2018) showed that the compression phase Tishby observed is sensitive to activation function and may not be the universal explanation of deep learning generalization. The bottleneck framing is valuable; it is not settled.

The honest summary is this: across physics, neuroscience, and machine learning, the most efficient descriptions of natural systems are much lower-dimensional than the systems themselves, and the processes that produce those descriptions (RG flow, holographic encoding, autoencoder training, synaptic pruning) belong to a recognizable structural family. Whether that family corresponds to a single mathematical object is the open question this essay argues should be taken seriously.
