1. The same low-rank shadow falls across the five sources. Maldacena's 1997 paper conjectured, with extensive subsequent support, that a (d+1)-dimensional gravitational theory is fully encoded in a d-dimensional conformal field theory: fewer dimensions, all the physics. Hinton and Salakhutdinov's 2006 Science paper showed that a deep autoencoder with a 784-1000-500-250-30 architecture could reduce 784-pixel MNIST images to 30 numbers in its central code layer, achieving an average squared reconstruction error of 3.00 versus 13.87 for standard PCA. The Vargas-Barroso et al. study (Nature Communications, 2026) shows that the mouse hippocampal CA3 begins life with a dense, near-random recurrent network and is sculpted, by synaptic pruning and weakening, into a sparse, structured one that maximizes memory capacity. Kaushik et al.'s "Universal Weight Subspace Hypothesis" (arXiv:2512.05117v2) finds empirically that hundreds of independently trained deep networks (500 Mistral-7B LoRAs, 500 Vision Transformers, 50 LLaMA-8B models) converge to a shared low-rank subspace in which a handful of principal directions (often k ≤ 16) capture the majority of the variance. And Hassabis's central claim in Lex Fridman Podcast #475, made in the episode highlight at 00:00:27, is the same statement viewed from the AI side: "Perhaps there is some kind of lower dimensional manifold that can be learned… That's maybe true of most of reality."
2. The connection between RG flow and deep learning is rigorous in places, heuristic elsewhere. Mehta and Schwab (arXiv:1410.3831, 2014) constructed an exact mapping between Kadanoff's variational renormalization group and deep architectures built from Restricted Boltzmann Machines — precisely the building blocks Hinton and Salakhutdinov used. Koch-Janusz and Ringel (Nature Physics 14, 578–582, 2018) trained a neural network to identify the "relevant" slow degrees of freedom for an RG step by maximizing a mutual-information objective, supplying a principled bridge between information bottlenecks and Wilsonian coarse-graining. Zamolodchikov's c-theorem (1986) — and its higher-dimensional cousins (the four-dimensional a-theorem proved by Komargodski–Schwimmer 2011) — proves that a "c-function" counting effective degrees of freedom decreases monotonically along RG flow from UV to IR. This is the cleanest physical theorem we have stating that the world systematically forgets. The bridge to autoencoders and to neural pruning is by analogy at this point, but the analogy is structural: in each case a fine-grained representation is compressed by integrating out, projecting out, or pruning away modes that do not survive a downstream criterion.
3. Holography reframes quantum gravity as a compression problem. The holographic principle of 't Hooft (1993) and Susskind (1995), generalizing Bekenstein's area-law entropy bound for black holes, asserts that the number of independent degrees of freedom inside a region scales not as the volume but as the area of its boundary in Planck units. Maldacena gave this principle a concrete realization: the bulk gravitational physics in AdS₅ × S⁵ is conjectured to be fully encoded in 𝒩 = 4 super-Yang–Mills on the four-dimensional boundary. Ryu and Takayanagi (PRL 2006) made the bridge sharper: the entanglement entropy of a boundary region equals one quarter of the area, in Planck units, of the minimal bulk surface anchored to it. Van Raamsdonk (2010) then argued that "the emergence of classically connected spacetimes is intimately related to the quantum entanglement of degrees of freedom": disentangle the boundary, and the bulk pulls apart. Swingle (arXiv:0905.1317, 2009; Phys. Rev. D 86, 065007, 2012) made this geometric in MERA tensor networks: "The discrete geometry that appears at the critical point is nothing but a discrete version of anti de Sitter space (AdS)," and the depth of a MERA, the number of coarse-grainings performed, plays the role of the AdS radial direction. The MERA is an RG circuit; the holographic radial direction is a renormalization scale. In the technical literature, this is what the emergent dimension of AdS/CFT means.
4. The empirical neuroscience now agrees with the engineering intuition. Vargas-Barroso et al. mapped >7,000 potential connections in CA3 across three developmental stages in mice and found that in P7–P8 animals "single synaptic events are sufficient to trigger postsynaptic spiking… whereas spatial summation of several inputs is required at later time points." Their computational simulation of a 100,000-neuron network showed that the mature, sparser, weaker, more-distributed configuration maximizes pattern-storage capacity. That is, biology built an over-parameterized initialization and then ran a long, selective pruning routine, a recipe that resembles what deep-learning engineers arrived at with lottery-ticket pruning and, more loosely, with dropout and LoRA. Stringer, Pachitariu, Steinmetz, Carandini & Harris (Nature 571, 361–365, 2019) had already shown that mouse V1 population activity is "high-dimensional, and correlations obeyed an unexpected power law: the nth principal component variance scaled as 1/n," which places the code at the edge of being as high-dimensional as possible while remaining smooth. They proved mathematically that any slower decay would destroy the smoothness of the code. Brains, like trained networks, appear to sit at the limit of what the manifold hypothesis allows.
(1) Maldacena (1997), "The Large N Limit of Superconformal Field Theories and Supergravity," Adv. Theor. Math. Phys. 2 (1998) 231 (arXiv:hep-th/9711200). Maldacena's abstract opens by stating that "the large N limit of certain conformal field theories in various dimensions include in their Hilbert space a sector describing supergravity on the product of Anti-deSitter spacetimes, spheres and other compact manifolds." The key conceptual move is the identification of an extra emergent dimension with energy scale: of the radial coordinate U on AdS, Maldacena writes "It seems natural to interpret motion in U as moving in energy scales, going to the IR for small U and to the UV for large U." The boundary CFT is non-gravitational and lives in fewer dimensions; the bulk is gravitational and lives in one more. Crucially, large N (the rank of the gauge group) is what makes the gravity description classical: a huge number of microscopic degrees of freedom on the boundary cohere into a small number of bulk geometric degrees of freedom. This is dimensionality reduction at the most fundamental level of physical theory.
(2) Hinton & Salakhutdinov (2006), "Reducing the Dimensionality of Data with Neural Networks," Science 313, 504–507. "High-dimensional data can be converted to low-dimensional codes," they write, "by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors." The deep autoencoder is, architecturally, a forced bottleneck: in their MNIST experiments, an encoder squeezes 784 pixels through 1000, 500, 250, and finally 30 units in the central code layer, and a symmetric decoder reconstructs the image; the resulting average squared reconstruction error on test digits was 3.00, versus 13.87 for the best linear PCA at the same code size. The 2006 contribution was that RBM-based pretraining made deep autoencoders trainable at all, and that once trainable, they outperform PCA decisively. This is the moment when "the manifold hypothesis" — that natural data live on a low-dimensional submanifold of their ambient space — became an engineering principle rather than a slogan.
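The linear baseline in that comparison can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it uses synthetic data rather than MNIST, the function name `pca_reconstruction_error` is hypothetical, and the error normalization (squared error summed over features, averaged over samples) is an assumption. Because the synthetic data are generated linearly, PCA is near-optimal here; the autoencoder's advantage reported in the paper arises when the data lie on a curved manifold.

```python
import numpy as np

def pca_reconstruction_error(X, k):
    """Average squared reconstruction error per sample using the top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    codes = Xc @ components.T          # (n_samples, k) low-dimensional codes
    X_hat = codes @ components + mean  # linear "decoder"
    return float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))

rng = np.random.default_rng(0)
# Synthetic stand-in for image data (an assumption, not MNIST):
# 784-dimensional points generated from 30 latent factors plus small noise.
latent = rng.normal(size=(500, 30))
mixing = rng.normal(size=(30, 784))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 784))

err_10 = pca_reconstruction_error(X, 10)  # code too small: large error
err_30 = pca_reconstruction_error(X, 30)  # code matches latent dimension: only noise remains
print(f"k=10: {err_10:.2f}   k=30: {err_30:.4f}")
```

The error drops sharply once the code size reaches the true latent dimension, which is the linear version of the bottleneck argument.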
(3) Vargas-Barroso, Watson, Navas-Olive, Schlögl & Jonas (2026), "Developmental emergence of sparse and structured synaptic connectivity in the hippocampal CA3 memory circuit," Nature Communications, doi:10.1038/s41467-026-71914-x. This is the primary source behind the PsyPost article. Using multicellular patch-clamp recording of up to eight CA3 pyramidal neurons simultaneously, the ISTA group tested >7,000 potential connections across postnatal day 7–8, 18–25, and 45–50 mice. They report "a developmental transformation from local, dense, and random connectivity to a distributed, sparse, and structured configuration." In their words, "sparse and structured connectivity may emerge via experience-dependent mechanisms." The mature network requires "spatial summation of several inputs" to fire, where the immature one fires from any single synapse. A 100,000-neuron simulation showed the mature configuration is what maximizes auto-associative memory capacity. Peter Jonas, in the ISTA press release: "It follows what we call a pruning model: it starts out full, and then it becomes streamlined and optimized." The key caveats: this is mouse hippocampus; the developmental window the authors connect (cautiously) to the offset of infantile amnesia in humans is at the timescale of weeks-to-months; and the molecular mechanisms of pruning are not yet identified. The popular framing — "brain begins full, prunes down to optimize learning" — is faithful to the paper.
(4) Kaushik, Chaudhari, Vaidya, Chellappa & Yuille, "The Universal Weight Subspace Hypothesis," arXiv:2512.05117v2 (Johns Hopkins, 2025). This paper's actual thesis: "deep neural networks trained across diverse tasks exhibit remarkably similar low-dimensional parametric subspaces." It is an empirical paper, not a theoretical one, with theory in service of explaining the findings. Across more than 1,100 models — 500 Mistral-7B LoRAs, 500 ViTs, 50 LLaMA-8B finetunes, 177 GPT-2s, plus Flan-T5 — a layer-wise spectral decomposition of the weight matrices reveals that "the majority of variance is captured by the top few spectral components." Striking specifics: 500 ViTs can be replaced by a single universal-subspace model with ~100× memory reduction and no significant loss in classification accuracy; for the Mistral-7B LoRA experiment the authors report that "the majority of the information is present in only 16 (or less) distinct subspace directions for all layers." The authors situate this beside the Neural Tangent Kernel limit (Jacot et al. 2018), mechanistic-interpretability universality (Olah et al. 2020), the lottery-ticket hypothesis (Frankle & Carbin 2019), and Mao et al.'s 2024 PNAS result that "the training process of many deep networks explores the same low-dimensional manifold." The paper itself does not invoke holography or quantum gravity — that connection is ours to draw — but its closing question is striking: "if neural networks systematically collapse into the same subspace … is this lack of diversity a fundamental bottleneck?"
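To make the "majority of variance in the top few spectral components" claim concrete, here is a minimal sketch on synthetic weights. It is a simplification and not the paper's pipeline: each hypothetical "model" is a single flattened weight vector, the shared rank-16 basis is planted by construction, and the helper name `explained_variance_ratio` is an assumption.

```python
import numpy as np

def explained_variance_ratio(weights, k):
    """Fraction of across-model variance captured by the top-k principal directions.

    weights: (n_models, n_params) array, one flattened weight matrix per row.
    """
    centered = weights - weights.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return float(variances[:k].sum() / variances.sum())

rng = np.random.default_rng(0)
n_models, n_params, true_rank = 200, 1024, 16

# Planted structure (an assumption for illustration): every model's weights
# are a combination of the same 16 directions, plus small model-specific noise.
basis = rng.normal(size=(true_rank, n_params))
coeffs = rng.normal(size=(n_models, true_rank))
weights = coeffs @ basis + 0.05 * rng.normal(size=(n_models, n_params))

ratio_4 = explained_variance_ratio(weights, 4)
ratio_16 = explained_variance_ratio(weights, 16)
print(f"top 4: {ratio_4:.3f}   top 16: {ratio_16:.4f}")
```

On real checkpoints the interesting question is whether such a sharp cutoff appears without being planted; the sketch only shows how the measurement itself works.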
(5) Hassabis on Lex Fridman Podcast #475 (July 23, 2025). The signature "lower-dimensional manifold" passage appears in the episode highlight at 00:00:27 (and is reprised at the start of the "Veo 3 and understanding reality" chapter at 14:26): "Somehow these systems are reverse engineering from just watching YouTube videos. So presumably what's happening is it's extracting some underlying structure around how these materials behave. So perhaps there is some kind of lower dimensional manifold that can be learned if we actually fully understood what's going on under the hood. That's maybe true of most of reality." Earlier in the same conversation Hassabis lays out the underlying conjecture, originally stated in his Nobel lecture "Accelerating Scientific Discovery with AI" delivered 8 December 2024 at the Aula Magna, Stockholm University: "Any pattern that can be generated or found in nature can be efficiently discovered and modeled by a classical learning algorithm." He links this directly to P vs NP — "if you think of physics as informational … the P equals NP question is a physics question" — and proposes a putative new complexity class of "learnable natural systems." Hassabis: "I think information is primary, information is the most sort of fundamental unit of the universe, more fundamental than energy and matter." The framing he offers — that evolution, weathering, and cosmological selection produce systems with learnable low-rank structure — is conjectural; he explicitly notes that abstract problems like factoring large numbers may resist this treatment.
Renormalization group as the prototype of irreversible data reduction. Wilson's RG and the c-theorem of Zamolodchikov (1986) make precise the intuition that integrating out short-distance physics is a one-way process. nLab's encyclopedic summary captures it: "An appealing physical interpretation of the c-function is as a kind of entropy of information about the critical system. Under renormalization, information is lost about the short distance behaviour of the correlation functions." The four-dimensional analog, the a-theorem, extends this to physically realistic dimensions. The RG is not metaphor; it is the working physical theory of which degrees of freedom actually matter at a given scale. Together, the c-theorem and the a-theorem ensure that the count of effective degrees of freedom cannot increase from UV to IR under coarse-graining in unitary two- and four-dimensional quantum field theories, which makes them physics's most rigorous "data-reduction theorem."
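In symbols, the monotonicity statement can be written as follows. This is a schematic summary under one common sign convention (μ is the energy scale, so flowing to the IR means decreasing μ); the notation C(g) for the c-function of the couplings g is an assumption of this sketch rather than notation taken from the sources above.

```latex
\mu \,\frac{d}{d\mu}\, C\bigl(g(\mu)\bigr) \;\ge\; 0,
\qquad
C\big|_{\text{fixed point}} = c
\quad\Longrightarrow\quad
c_{\mathrm{UV}} \;\ge\; c_{\mathrm{IR}} \quad (d = 2),
\qquad
a_{\mathrm{UV}} \;\ge\; a_{\mathrm{IR}} \quad (d = 4).
```

The two-dimensional statement comes with an explicit function that decreases along the whole flow; the four-dimensional result of Komargodski and Schwimmer compares the UV and IR endpoints.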
Holography as data reduction made geometric. 't Hooft (1993) and Susskind (1995) generalized Bekenstein's bound: in Bousso's compact restatement (hep-th/9911002), "Degrees of freedom that cannot be utilized should not be considered to exist." The number of independent quantum degrees of freedom in a volume V is bounded by the area of ∂V in Planck units. Maldacena turned the principle into a duality. Ryu and Takayanagi (2006) gave it an entanglement-theoretic engine: the entanglement entropy of a boundary region equals the area of the minimal bulk surface anchored to it, divided by 4G. Van Raamsdonk (2010) drew the conclusion: entanglement is what connects regions of spacetime. Swingle (2009/2012) made the picture computational via MERA. In his words, "the analog of log z in the lattice setup is simply the layer number or depth which counts how many coarse grainings have been performed"; and "further evidence for the connection between entanglement renormalization and holography comes from the holographic interpretation of the extra gravitational dimension in terms of energy scale in the gauge theory." The MERA is an RG circuit; the holographic radial direction is a renormalization scale.
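For reference, the two area laws invoked in this paragraph take the following standard forms. The symbols are the conventional ones and are introduced here for illustration: A is the horizon area, ℓ_P the Planck length, γ_A the minimal bulk surface anchored to the boundary of region A, and G_N Newton's constant in the bulk, with k_B = 1.

```latex
% Bekenstein--Hawking entropy, which saturates the holographic bound
S_{\mathrm{BH}} \;=\; \frac{A}{4\,\ell_P^{2}},
\qquad
\ell_P^{2} \;=\; \frac{G\hbar}{c^{3}}

% Ryu--Takayanagi formula for the entanglement entropy of boundary region A
S_A \;=\; \frac{\mathrm{Area}(\gamma_A)}{4\,G_N}
```

The shared factor of one quarter is what lets the Ryu–Takayanagi formula reduce to the black-hole entropy when the minimal surface wraps a horizon.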
The bridge to machine learning. Mehta and Schwab (arXiv:1410.3831) prove an exact correspondence between Kadanoff's variational RG transformation and the layer-by-layer structure of stacked RBMs — "deep learning algorithms may be employing a generalized RG-like scheme to learn relevant features from data." Koch-Janusz and Ringel (Nature Physics 14, 578–582, 2018) make the information-theoretic engine explicit: train a neural network to maximize the mutual information between coarse-grained variables and a distant environment, and you recover Wilson's relevant operators. Tishby and Zaslavsky's information-bottleneck story (2015) frames deep learning itself as an optimization that compresses input X while preserving information about Y — successive layers as successive renormalization steps. These are the rigorous backbones. They do not extend, today, to a proof that hippocampal pruning implements the same operation, or that the Universal Weight Subspace lies in a specific RG fixed-point neighborhood. But they make the conjecture more than poetry.
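The information-bottleneck objective mentioned above has a compact standard form, due to Tishby, Pereira, and Bialek. In this sketch T denotes the compressed representation of X, Y the target variable, I(·;·) mutual information, and β a trade-off parameter; T is assumed to depend on Y only through X.

```latex
\min_{p(t \mid x)} \;\; I(X;T) \;-\; \beta\, I(T;Y),
\qquad \beta > 0,
\qquad T \leftrightarrow X \leftrightarrow Y \ \text{(Markov chain)}
```

Small β favors aggressive compression of X; large β favors keeping whatever predicts Y. The analogy with RG is that "relevant" information plays the role of what survives coarse-graining.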
The bridge to neuroscience. Stringer, Pachitariu, Steinmetz, Carandini & Harris (Nature 571, 361–365, 2019) showed that mouse V1 population activity is "high-dimensional, and correlations obeyed an unexpected power law: the nth principal component variance scaled as 1/n," and proved mathematically "that if the variance spectrum was to decay more slowly then the population code could not be smooth, allowing small changes in input to dominate population activity." The visual cortex thus sits at the most-expressive-but-still-smooth point on the manifold-dimensionality spectrum. Vargas-Barroso et al. give the developmental mechanism for memory circuits: start with a dense initialization, prune to optimum. Together these say: biological intelligence and engineered intelligence are both organized as compressions of a high-dimensional sensory stream onto a structured low-rank manifold, and the dimension of that manifold is set by an information-theoretic trade-off.
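The 1/n claim is a statement about the slope of the PCA variance spectrum on log-log axes, which is simple to estimate. The sketch below is illustrative only: it uses synthetic Gaussian data with a planted 1/n spectrum and plain PCA, whereas Stringer et al. used a cross-validated PCA on neural recordings; the function name `fit_power_law_exponent` and the fitting range are assumptions.

```python
import numpy as np

def fit_power_law_exponent(variances, n_min=1, n_max=None):
    """Estimate alpha in variance_n ~ n**(-alpha) by a least-squares line in log-log space."""
    variances = np.asarray(variances, dtype=float)
    n_max = len(variances) if n_max is None else n_max
    ranks = np.arange(n_min, n_max + 1)
    # polyfit returns [slope, intercept] for degree 1; the exponent is minus the slope.
    slope, _ = np.polyfit(np.log(ranks), np.log(variances[n_min - 1:n_max]), 1)
    return float(-slope)

rng = np.random.default_rng(0)
n_samples, n_dims = 5000, 200

# Planted spectrum (an assumption for illustration): the n-th dimension has variance 1/n.
target = 1.0 / np.arange(1, n_dims + 1)
X = rng.normal(size=(n_samples, n_dims)) * np.sqrt(target)

# PCA variance spectrum from the singular values of the centered data.
Xc = X - X.mean(axis=0)
singular_values = np.linalg.svd(Xc, compute_uv=False)
spectrum = singular_values ** 2 / (n_samples - 1)

# Fit only the first 100 components, where finite-sample distortion is small.
alpha = fit_power_law_exponent(spectrum, n_min=1, n_max=100)
print(f"estimated exponent: {alpha:.2f}")
```

An exponent near 1 corresponds to the regime the paper describes; exponents well above 1 would indicate a lower-dimensional, smoother code.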
| Connection | Status |
|---|---|
| AdS/CFT bulk reconstruction from boundary CFT | Conjectured duality with extensive nontrivial checks; precise dictionary for many examples |
| Ryu–Takayanagi: entanglement entropy ↔ minimal surface area | Proved in the bulk-classical limit; quantum corrections worked out |
| MERA depth ↔ AdS radial direction ↔ RG scale | Proposed (Swingle 2009), demonstrated for critical 1D CFTs |
| Zamolodchikov c-theorem / a-theorem: monotonic decrease of degrees of freedom under RG | Theorem (2D and 4D, unitary QFTs) |
| Variational RG ↔ stacked RBMs (Mehta–Schwab) | Exact mathematical mapping |
| Information bottleneck describes deep-learning training trajectories | Influential framework; debated empirically (Saxe et al. 2018) |
| "Universal weight subspaces" across independently trained nets | Strong empirical evidence; theoretical explanation conjectural |
| Hippocampal pruning ≈ autoencoder bottleneck optimization | Structural analogy; no formal mapping yet |
| "Reality lives on a low-rank manifold" (Hassabis conjecture) | Conjecture; supported by AlphaFold / Veo evidence; not theorem |
| Compression principles in ML literally inform quantum gravity | A research program (It-from-Qubit), not a finished result |
Clearly flagged as conjecture.
Erik Verlinde (2011) and Ted Jacobson (1995) have both written derivations of Einstein's equations from thermodynamic / entropic premises, treating gravity not as a fundamental force but as the statistical pull of information toward higher-entropy configurations. Jacobson's argument is precise ("the Einstein equation is an equation of state") but, as Chirco, Haggard, Riello & Rovelli (2014) emphasize, the existence of underlying microscopic degrees of freedom is an interpretation, not a theorem. Verlinde's entropic-gravity program, including his 2017 emergent-gravity proposal, remains controversial; serious technical critiques exist (e.g., Roveto and Muñoz, arXiv:1201.2475).
But step back, and a pattern emerges that the present-day It-from-Qubit Simons Collaboration is taking seriously: spacetime, like a hidden representation in a deep network, may be the output of a compression, the structure that emerges when a high-dimensional state is encoded efficiently. If that is right, then a variational principle on the boundary (an action, an entropy functional, an information bottleneck) generates the bulk geometry. Hassabis's conjecture, that natural systems lie on low-rank learnable manifolds because they have survived some selection process, would in this framing be the statement that the universe is constructively compressible in exactly the way machine-learning models exploit. Whether the same variational structure that produces gravity from boundary entanglement also produces representations in a transformer from token sequences is currently a metaphor with three or four serious load-bearing equations under it, and many more that would have to be added before it became a theory.
The most provocative possibility, and the one worth flagging because serious researchers (Susskind, Maldacena, Van Raamsdonk, Carroll, ChunJun Cao, Vijay Balasubramanian) are exploring versions of it: intelligence and gravity may be two specializations of a more general principle of how high-dimensional information becomes low-dimensional structure. The Universal Weight Subspace Hypothesis reports the phenomenon in transformers; AdS/CFT exhibits it in quantum gravity; the CA3 pruning study documents it in the brain. There is, at present, no single equation connecting all three. There may yet be.
The honest summary is this: across physics, neuroscience, and machine learning, the most efficient descriptions of natural systems are much lower-dimensional than the systems themselves, and the processes that produce those descriptions (RG flow, holographic encoding, autoencoder training, synaptic pruning) share a recognizable structural family. Whether that family corresponds to a single mathematical object is the open question that this essay argues deserves to be taken seriously.