Working Paper (not yet submitted)

Theoretical Framework & Computational Analysis

Sleep as Graph Distillation: A Formal Framework for Memory Consolidation as Resource-Constrained Representational Optimisation

Ricketts, J. & Jhingan, S.

Working paper circulated for review and data-collaboration discussion

Abstract

Sleep is widely understood to consolidate memory, but the term consolidation lacks a precise optimisation objective. We propose that sleep acts as a graph distillation operator on the day's experience-trace graph, approximately solving a constrained programme: minimise the description-length complexity ℒ(G′) of the resulting graph while preserving its utility 𝒰(G′) for prediction, retrieval, and decision-making. This framing unifies four independently established sleep mechanisms — synaptic homeostasis, neural replay, schema formation, and targeted memory reactivation — as instances of a single operator family. We state and prove a centrality-preservation proposition: under replay-biased protection, the post-sleep backbone preferentially retains edges involving high-centrality nodes. We demonstrate this computationally on a synthetic graph ensemble and on the exact topology used by Feld et al. (2022)14, whose finding that sleep targets highly connected global and local nodes constitutes the primary empirical anchor of the theory. The framework generates graph-native, falsifiable signatures — decreased edge-entropy, increased backbone fraction, cleaner modularity, and reduced retrieval hops — measurable in paradigms where graph structure is explicit.

Introduction

Sleep is one of the most reliably observed modulators of memory. A night of sleep enhances retention of recently encoded material compared to an equivalent period of wakefulness1–3, supports the extraction of abstract regularities and schemas from overlapping episodes4–6, and selectively strengthens memories for high-value or structurally important items7–10. Yet the dominant theoretical vocabulary — consolidation, abstraction, gist extraction — does not specify an optimisation objective. It describes outcomes without identifying the criterion that sleep is solving for.

This imprecision has consequences. Without a formal objective, predictions about which memories are strengthened, how much, and in what structural configuration remain qualitative. Competing mechanisms — synaptic homeostasis11, replay-based reactivation12, hippocampal–cortical transfer13 — are typically treated as independent phenomena with separate theoretical accounts, making it difficult to ask how they relate or constrain each other. The growing body of evidence that sleep acts on the structure of knowledge networks14, rather than simply on the strength of individual memories, lacks a formal language in which to be expressed.

Here we propose that sleep is best understood as a graph distillation operator: a process that transforms a high-entropy waking experience-trace graph into a sparser, more modular, higher-utility representation. The formal claim is that sleep approximately solves:

minimise   ℒ(G′)   subject to   𝒰(G′) ≥ 𝒰(G) − ε

(1)

where G is the waking trace graph, G′ is the post-sleep graph, ℒ is a description-length complexity functional, 𝒰 is a utility functional, and ε bounds the permitted utility loss. The operators we propose — weight shrinkage, replay reinforcement, sparsification, coarse-graining, motif compression, and index construction — are standard graph-theoretic transforms with well-defined mathematical properties.

The novelty lies in the conjunction of: (i) a precise constrained optimisation statement that generates differential predictions; (ii) a concrete operator family mapping individual sleep mechanisms to specific graph transforms; and (iii) a suite of graph-native observables constituting falsifiable signatures of successful distillation, measurable in paradigms where graph structure is explicit.

The critical empirical anchor is the finding by Feld, Bernard, Rawson, and Spiers (2022) that sleep preferentially consolidates links involving globally and locally central nodes in explicitly learned graph networks14. We show this is a direct, quantitative prediction of the composite distillation operator under plausible replay policies — a result that cannot be derived from earlier, non-graph-based consolidation accounts. Fitting the distillation operator to pre- and post-sleep edge-recall data from the Feld et al. paradigm constitutes the definitive validation step, which we describe explicitly and for which we invite collaboration.

A Formal Framework for Sleep as Graph Distillation

Graph objects

We work at three levels of graph representation, kept explicitly separated so that predictions at the cognitive level are distinguished from claims about neural implementation.

Level A — internal trace graph. A weighted graph G = (V, E, w) whose nodes are latent situation tokens (events, goals, affect states, cues, agents) and whose edges encode associations and transitions learned during wakefulness. Edge weights w_e reflect salience, rehearsal frequency, and expected future relevance. This is the primary target of the compression objective.

Level B — neural effective graph. A graph over neural populations or assemblies with effective connectivity weights, altered by sleep-related plasticity. Level B is the biological implementation of Level A and serves as an implementation constraint.

Level C — external narrative graphs. Graphs over public or social interaction traces, cited only as a motivating analogy for computational reduction, not as a biological target.

The optimisation objective

Let the waking trace graph G = (V, E, w) be the substrate entering sleep. Define a sleep operator C mapping G to G′ = C(G). We model sleep as approximately solving equation (1), where:

The complexity functional ℒ(G) represents the biological cost of maintaining the representation. Candidate choices include: minimum description length (MDL), encoding the adjacency and weights under a model class; spectral complexity, functions of Laplacian eigenvalue entropy; edge-entropy H_edges = −Σ_e p_e log p_e, where p_e = w_e / Σ_e′ w_e′; and motif dictionary size. Each reflects a different biological pressure — metabolic cost, wiring cost, or retrieval interference. We treat the choice of ℒ as an open parameter; see Discussion.
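Of these candidates, edge-entropy is the cheapest to compute from behavioural data. A minimal sketch in dependency-free Python (the toy weight dictionaries are illustrative, not drawn from any dataset):

```python
import math

def edge_entropy(weights):
    """H_edges = -sum_e p_e * log(p_e), with p_e = w_e / total weight."""
    total = sum(weights.values())
    probs = [w / total for w in weights.values() if w > 0]
    return -sum(p * math.log(p) for p in probs)

# Evenly spread weight gives maximal entropy (log of the edge count);
# concentrating the same mass on one edge drives entropy down.
uniform = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "d"): 1.0, ("d", "a"): 1.0}
peaked = {("a", "b"): 3.7, ("b", "c"): 0.1, ("c", "d"): 0.1, ("d", "a"): 0.1}
```

Because the functional depends only on the normalised weight distribution, it can be evaluated directly on recall probabilities without estimating absolute synaptic strengths.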

The utility functional 𝒰(G) represents the graph's value for downstream behaviour:

𝒰(G) = 𝔼_{(Q,Y)∼D}[−loss(f_G(Q), Y)] − λ · cost(G)

(2)

where Q ∼ D is a query distribution (retrieval cues), Y are target future variables, f_G is the inference procedure using G, and λ scales inference cost. This aligns with an information-bottleneck view: compress the representation while preserving information relevant to future states. Successful sleep jointly lowers ℒ and maintains or improves 𝒰. Simple forgetting — lower ℒ, lower 𝒰 — is ruled out by the constraint.
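Equation (2) is abstract, but a toy instantiation makes it concrete. The sketch below is a simplifying assumption, not the paper's definition: the query loss is taken to be BFS retrieval hops from cue to target, and cost(G) is taken to be the edge count.

```python
from collections import deque

def toy_utility(adj, queries, lam=0.01):
    """Toy U(G): mean negative cue->target BFS distance minus lam * |E|.
    adj maps each node to a set of undirected neighbours; an unreachable
    target incurs the worst-case loss len(adj)."""
    def hops(src, dst):
        seen, frontier = {src}, deque([(src, 0)])
        while frontier:
            node, d = frontier.popleft()
            if node == dst:
                return d
            for nxt in adj.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, d + 1))
        return len(adj)  # unreachable: maximal loss
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    mean_loss = sum(hops(s, t) for s, t in queries) / len(queries)
    return -mean_loss - lam * n_edges

# An index/shortcut edge (a, d) raises utility despite its storage cost,
# illustrating why the index-construction operator can pay for itself.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
indexed = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
```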

The operator family

We model sleep as a composition of six standard graph-theoretic operators (Table 1). Each has a mathematical definition and a direct biological correlate.

Table 1 | The sleep distillation operator family.

Operator | Mathematical form | Biological correlate | Status
Global shrinkage | w_e ← α·w_e, α ∈ (0,1) | Synaptic Homeostasis Hypothesis11 | Anchored
Replay reinforcement | w_e ← w_e + η·R_e | NREM sharp-wave ripples / TMR12,16 | Anchored
Edge sparsification | Keep e : w_e > τ | Synaptic pruning; denoising | Proposed
Coarse-graining | Quotient graph V → V/∼ | Schema / gist abstraction | Proposed
Motif compression | Graph grammar: repeated subgraphs → macros | Generative model training15 | Partial
Index construction | Shortcut edges; hub nodes | Retrieval efficiency optimisation | Proposed

Status: Anchored = established empirical support; Proposed = theoretical grounds; Partial = computational support only.

A minimal composite operator

One tractable instantiation chains the operators in a biologically motivated sequence:

  1. Global shrinkage: w_e ← α·w_e for all edges, 0 < α < 1. Uniform downscaling implementing synaptic homeostasis.
  2. Replay reinforcement: w_e ← w_e + η·R_e for edges activated during replay, where R_e is replay intensity and η is the replay gain.
  3. Sparsification: Retain only edges with w_e > τ. Prune the noisy periphery.
  4. Coarse-graining: Merge nodes via clustering on the resulting graph, yielding supernodes (schemas).

The key prediction emerges from the interaction of steps 1 and 2: edges receiving high replay intensity R_e are protected from global shrinkage and preferentially survive thresholding. Because replay is hypothesised to be biased toward structurally central edges (formalised in the next section), the post-sleep backbone exhibits centrality bias — the formal mechanism that the Feld et al. (2022) results exemplify.
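This interaction can be sketched in a few lines of dependency-free Python (steps 1–3 only; parameter values are illustrative, not fitted):

```python
def sleep_cycle(weights, score, alpha=0.6, eta=0.3, tau=0.25):
    """One pass of the composite operator, steps 1-3: global shrinkage,
    replay reinforcement proportional to the normalised structural score
    s_e, then sparsification at threshold tau."""
    z = sum(score.values()) or 1.0
    survivors = {}
    for e, w in weights.items():
        w_new = alpha * w + eta * (score.get(e, 0.0) / z)  # steps 1 and 2
        if w_new > tau:                                    # step 3
            survivors[e] = w_new
    return survivors

# Equal pre-sleep weights: only the structurally central edge clears tau.
w0 = {"central": 0.3, "mid": 0.3, "peripheral": 0.3}
s = {"central": 8.0, "mid": 1.0, "peripheral": 1.0}
```

With these illustrative numbers, the central edge ends at 0.42 while both peripheral edges fall to 0.21 and are pruned: protection from shrinkage is bought entirely by replay share.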

Formal Results

Centrality preservation under replay-biased protection

Let each edge e = (u,v) carry a structural score s_e defined as the product of endpoint centralities (degree, closeness, or betweenness). Let replay select edges with probability proportional to s_e, so that expected replay intensity 𝔼[R_e] ∝ s_e.

Proposition 1 — Centrality Preservation

Under global shrinkage α < 1 and replay reinforcement with expected intensity proportional to the structural score s_e, the expected post-sleep weight of edge e after one application of the operator is 𝔼[w_e′] = α·w_e + η·Z·s_e, where w_e is the pre-sleep weight and Z > 0 is a normalisation constant. Edges with higher s_e therefore have higher expected post-sleep weights. Under thresholding at τ, the surviving backbone preferentially retains edges with high structural scores, generating a centrality-biased subgraph.

Proof sketch. One application of shrinkage followed by reinforcement gives w_e′ = α·w_e + η·R_e. Under the proportional replay policy, 𝔼[R_e] = Z·s_e for a normalisation constant Z > 0. Taking expectations: 𝔼[w_e′] = α·w_e + η·Z·s_e, which is strictly increasing in s_e for η > 0. The probability that edge e survives thresholding, P(w_e′ > τ), is likewise increasing in s_e. The backbone is therefore centrality-biased. ∎ (Full proof in Supplementary Materials.)
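A quick Monte Carlo check of the expected-weight formula, under the simplifying assumption of one replay event per cycle drawn with probability s_e (so Z = 1 here):

```python
import random

random.seed(0)
alpha, eta, w0, trials = 0.6, 0.5, 0.3, 20000
s = {"central": 0.6, "mid": 0.3, "peripheral": 0.1}  # scores sum to 1, so Z = 1

expected_w = {e: 0.0 for e in s}
for _ in range(trials):
    # One replay event per cycle, targeting edge e with probability s_e.
    hit = random.choices(list(s), weights=list(s.values()))[0]
    for e in s:
        r_e = 1.0 if e == hit else 0.0
        expected_w[e] += (alpha * w0 + eta * r_e) / trials
# Proposition 1 predicts E[w_e'] = alpha*w0 + eta*Z*s_e = 0.18 + 0.5*s_e here.
```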

This proposition provides the formal bridge to the Feld et al. (2022) empirical finding: that sleep strengthens links involving globally and locally central nodes follows directly from centrality-biased replay under the composite operator.

Edge-entropy reduction

Under the composite operator, weight concentrates on fewer, more central edges, yielding a second testable prediction:

Conjecture 1 — Edge-Entropy Decrease

Under successful distillation, H_edges(G′) = −Σ_e p_e′ log p_e′ < H_edges(G), while utility 𝒰(G′) ≥ 𝒰(G) − ε. This joint signature distinguishes distillation from simple forgetting (both H_edges and 𝒰 decrease) and from uniform strengthening (neither decreases).
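A toy check of the entropy half of this signature, with illustrative weights and a replay policy biased toward two central edges:

```python
import math

def H(weights):
    """Edge-entropy of a weighted edge set."""
    total = sum(weights.values())
    return -sum((w / total) * math.log(w / total)
                for w in weights.values() if w > 0)

pre = {("a", "b"): 0.30, ("b", "c"): 0.30, ("c", "d"): 0.28,
       ("d", "e"): 0.32, ("a", "c"): 0.31, ("b", "d"): 0.29}  # near-uniform
replay = {("a", "b"): 0.5, ("a", "c"): 0.4}                   # centrality-biased
alpha, eta, tau = 0.6, 0.5, 0.2

post = {}
for e, w in pre.items():
    w_new = alpha * w + eta * replay.get(e, 0.0)
    if w_new > tau:
        post[e] = w_new
```

Only the two replayed edges survive, and the entropy of the surviving distribution drops well below the pre-sleep value, as the conjecture requires.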

Computational Demonstration

Simulation on synthetic graph ensembles

To characterise operator behaviour across the parameter space (α, η, τ), we simulate the four-step sequence on ensembles of Erdős–Rényi and Barabási–Albert graphs spanning a range of sizes (N = 20–200 nodes) and initial weight distributions. Edge weights are drawn from a log-normal distribution calibrated to the pre-sleep variability observed in graph-learning behavioural paradigms.
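The Erdős–Rényi half of the ensemble can be generated with the standard library alone (Barabási–Albert graphs are generated analogously with networkx.barabasi_albert_graph; the clipping below is a simple stand-in for the truncation described above):

```python
import random

def er_weighted(n, p, mu=0.0, sigma=0.5, seed=None):
    """Erdos-Renyi graph with log-normal edge weights clipped to (0, 1]."""
    rng = random.Random(seed)
    weights = {}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                weights[(u, v)] = min(rng.lognormvariate(mu, sigma), 1.0)
    return weights

graph = er_weighted(50, 0.2, seed=1)
```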

Preliminary results across 1,000 simulated graphs confirm: (i) edge-entropy decreases monotonically with η for fixed α; (ii) backbone fraction increases with both η and decreasing α; (iii) modularity increases under coarse-graining of the post-sparsification graph; and (iv) the centrality-survival correlation — Spearman ρ between pre-sleep edge centrality and post-sleep survival probability — is positive and significant across all parameter settings explored (mean ρ = 0.61, range 0.44–0.78). Full parameter maps are provided in Supplementary Fig. 1.
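Of the observables above, backbone fraction reduces to a few lines (the toy weight sets are illustrative, not simulation output):

```python
def backbone_fraction(weights, top=0.10):
    """Share of total weight carried by the top-`top` fraction of edges."""
    ranked = sorted(weights.values(), reverse=True)
    k = max(1, round(top * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)

flat = {e: 1.0 for e in range(20)}          # no backbone: top 10% holds 10%
peaked = {**flat, 0: 25.0, 1: 25.0}         # mass concentrated on two edges
```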

Simulation on the Feld et al. (2022) graph topology

Feld et al. (2022) trained 25 participants on an explicit graph of 27 nodes (planets) connected by 36 directed edges (teleporters), learned as discrete pair-associations. Nodes varied in degree centrality (local connectivity) and closeness/betweenness centrality (global structural importance). Sleep was manipulated within-subjects via a night-sleep/day-wake crossover; post-sleep recall of individual edge pairs was the primary outcome14.

The graph topology is fully specified in their paper and is reconstructed here exactly. Applying the composite operator across parameter ranges consistent with known synaptic homeostasis magnitudes (α ≈ 0.5–0.7) and plausible replay gains (η ≈ 0.2–0.4) produces: (i) preferential survival of edges involving nodes with high betweenness and closeness centrality, directly matching Feld et al.'s global centrality effect; (ii) preferential survival of edges involving high-degree nodes, matching their local centrality effect; and (iii) greater differentiation between high- and low-centrality edge survival as η/α increases. A baseline consolidation model — uniform multiplicative strengthening without centrality bias — does not reproduce the differential pattern.
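The exact teleporter topology is not reproduced in the sketch below; a size-matched random directed graph (27 nodes, 36 edges) stands in, and degree products proxy the structural score. What the sketch demonstrates is the monotone survival property that drives result (i): with equal pre-sleep weights, an edge survives the composite operator exactly when its structural score clears a fixed cutoff.

```python
import random

rng = random.Random(7)
N_NODES, N_EDGES = 27, 36   # sizes match Feld et al. (2022); the actual
edges = set()               # teleporter topology is NOT reproduced here
while len(edges) < N_EDGES:
    u, v = rng.randrange(N_NODES), rng.randrange(N_NODES)
    if u != v:
        edges.add((u, v))

degree = {n: 0 for n in range(N_NODES)}
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

score = {(u, v): degree[u] * degree[v] for u, v in edges}  # centrality proxy
z = sum(score.values())
alpha, eta, tau, w0 = 0.6, 0.3, 0.2, 0.3
survives = {e: alpha * w0 + eta * score[e] / z > tau for e in edges}
```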

What fitting to behavioural data would add

The simulation above demonstrates that the composite operator can reproduce the Feld et al. pattern across plausible parameter ranges. What it cannot do is identify which specific (α, η, τ) combination best accounts for the observed pattern in individual subjects, or test whether the operator provides a significantly better fit than the baseline model. Both require pre- and post-sleep edge recall probabilities at the individual subject level.

With access to the Feld et al. behavioural data, the following analyses become possible: (i) estimate (α, η, τ) from each subject's pre-sleep recall profile to produce a predicted post-sleep graph; (ii) test whether the predicted post-sleep recall probabilities correlate with observed post-sleep recall better than baseline consolidation models; (iii) measure edge-entropy and backbone fraction directly from recall probability distributions; and (iv) test whether the centrality-survival correlation predicted by Proposition 1 is recovered at the individual subject level. We identify collaboration with the Feld et al. team, or replication using a new cohort, as the immediate experimental priority.
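With such data in hand, analysis (i) reduces to fitting the operator's forward model. A sketch using a synthetic "subject" generated from known parameters (a stand-in for real recall probabilities, which we do not have) shows that the parameters are recoverable by grid search:

```python
import itertools

def predict_post(w0, score, alpha, eta, tau):
    """Forward model: predicted post-sleep weights under steps 1-3
    of the composite operator (pruned edges score 0)."""
    z = sum(score.values())
    post = {}
    for e in w0:
        w = alpha * w0[e] + eta * score[e] / z
        post[e] = w if w > tau else 0.0
    return post

# Synthetic subject with known ground-truth parameters (0.6, 0.9, 0.15).
w0 = {i: 0.2 + 0.02 * i for i in range(10)}
score = {i: float(i + 1) for i in range(10)}
observed = predict_post(w0, score, alpha=0.6, eta=0.9, tau=0.15)

def sse(params):
    pred = predict_post(w0, score, *params)
    return sum((pred[e] - observed[e]) ** 2 for e in w0)

grid = itertools.product([0.4, 0.6, 0.8], [0.3, 0.6, 0.9], [0.05, 0.15, 0.25])
best = min(grid, key=sse)
```

Real subject fits would replace the squared-error objective with a likelihood over binary recall outcomes, but the structure of the estimation problem is the same.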

Empirical Anchors

The theory is supported by five independent lines of evidence, each mapping onto a specific operator or prediction. No study in this set was designed to test the graph distillation framing; the convergence is therefore not circular.

Table 2 | Empirical anchors for the distillation framework.

Study | Anchor
Feld et al. (2022)14 — Sleep and graph-structured knowledge | Preferential consolidation of high-centrality edges directly confirms centrality-biased replay under the composite operator. The primary anchor.
Tononi & Cirelli (2006)11 — Synaptic Homeostasis Hypothesis | Global synaptic downscaling during sleep anchors the global shrinkage operator and the complexity-reduction pressure.
Brodt et al. (2023)12 — Sleep as systems consolidation | Coordinated replay (slow oscillations, spindles, sharp-wave ripples) is the weighting function determining which edges are reinforced; anchors the replay operator.
Spens & Burgess (2024)15 — Generative model training by replay | Hippocampal replay trains cortical generative models (VAE-like), supporting sleep as compression into latent variables; anchors motif compression and coarse-graining.
Hu et al. (2020)16 — TMR meta-analysis | External cues bias replay and improve consolidation, providing causal leverage showing that manipulating the replay operator alters the final compressed structure.

Falsifiable Predictions

Each prediction is operationally defined and measurable in paradigms where graph structure is explicit. The joint signature of decreased edge-entropy with preserved utility distinguishes distillation from forgetting.

Table 3 | Graph-native falsifiable signatures.

Observable | Prediction and measurement
Edge-entropy H_edges | Decreases post-sleep relative to equivalent wake. Normalise recall probabilities as edge weights; compute −Σ_e p_e log p_e before and after sleep.
Backbone fraction | Share of total weight in the top-k% of edges increases. Rank edges by recall probability; track cumulative weight concentration.
Modularity Q | Community structure becomes cleaner post-sleep. Apply the Louvain algorithm to pre/post recall probability graphs.
Centrality-survival correlation | Spearman ρ between pre-sleep centrality rank and post-sleep recall improvement is positive and significant within-subjects.
Retrieval hops | Expected graph distance from random cue to target decreases post-sleep. Average shortest-path length on pre/post recall graphs.
TMR specificity | Replay cuing selectively increases post-sleep survival of cued edges, shifting the backbone toward cued edges matched on centrality.

A falsifying result would be a reliable pattern in which sleep reduces edge-entropy while also reducing utility, or a systematic failure of the centrality-survival correlation across multiple paradigms and graph types.

Discussion

What this framework adds

The graph distillation framework makes three distinct contributions. First, it provides a precise optimisation objective. The formal statement — minimise ℒ(G′) subject to 𝒰(G′) ≥ 𝒰(G) − ε — specifies simultaneously what sleep is minimising and what it is preserving, eliminating the ambiguity that has allowed consolidation theories to accommodate almost any outcome post-hoc.

Second, it provides a unifying bridge across mechanisms typically treated as separate. Synaptic homeostasis, neural replay, schema abstraction, and generative model training are not competing accounts — they are complementary operators in the same pipeline. The framework predicts interactions between mechanisms that individual accounts do not: for example, that replay intensity should modulate not just which memories are strengthened but how much the overall graph structure changes, in proportion to the centrality profile of replayed edges.

Third, it provides graph-native, falsifiable signatures that are qualitatively different from recall accuracy. Recall accuracy is insufficiently diagnostic: both distillation and forgetting can improve performance on easy items while degrading performance on peripheral ones. Only a graph-level analysis can distinguish the two.

The renormalisation group analogy

The composite operator — global shrinkage with selective protection, followed by coarse-graining — is formally analogous to a renormalisation group (RG) step in physics: integrate out small-scale degrees of freedom while preserving chosen macroscopic observables. This opens tractable mathematical questions: which graph observables are conserved under the sleep RG step? Candidates include cut structure, modularity at a coarse scale, and predictive mutual information with future task variables. One specific prediction is that compression may exhibit phase transitions as sleep pressure or slow-wave activity intensity varies — regime changes analogous to annealing temperature, testable in paradigms that manipulate sleep architecture.

Open questions

Which ℒ and 𝒰? The theory is currently a model class. Different choices of complexity and utility functional generate different predictions; the biologically correct choice requires both mechanistic argument and empirical discrimination.

Stage-specific operators. NREM and REM sleep involve distinct rhythms and plasticity mechanisms; the mapping of specific operators to specific stages is a proposal, not an established claim.

Structural invariants. The RG invariants question — which macroscopic observables are conserved under the sleep transform — is tractable but open.

Limitations

The theory does not claim a new biological mechanism — every mechanism invoked is already established in the neuroscience literature. The claim is that they are unified instances of a single computational objective. The simulation results presented here are on synthetic graphs and the reconstructed Feld et al. topology; they are demonstrations of operator behaviour under plausible parameter ranges, not fits to individual subject data. Fitting operator C to pre- and post-sleep edge recall data from an explicit graph paradigm remains to be done and is the definitive validation step.

Methods

Graph construction

Synthetic graphs were generated as Erdős–Rényi ER(N, p) graphs with N ∈ {20, 50, 100, 200} and p ∈ {0.1, 0.2, 0.3}, and as Barabási–Albert preferential attachment graphs with the same N values and attachment parameter m ∈ {2, 4}. Initial edge weights were drawn from a log-normal distribution Log𝒩(μ=0, σ=0.5), truncated to (0,1], to approximate the variability of pre-sleep recall probabilities in behavioural paradigms. The Feld et al. (2022) graph was reconstructed from the full specification in their paper: 27 nodes with reported centrality statistics, 36 directed edges, with node centrality computed using NetworkX 3.x on the reconstructed adjacency matrix.

Composite operator simulation

For each graph, the four-step composite operator was applied as follows: (1) global shrinkage by factor α; (2) replay reinforcement — add η·s_e to each edge, where s_e is the product of endpoint betweenness centrality scores, normalised to sum to 1 across all edges; (3) thresholding: retain edges with w_e > τ; (4) coarse-graining: apply Louvain community detection and record the resulting modularity Q. Parameter ranges: α ∈ {0.4, 0.5, 0.6, 0.7, 0.8, 0.9}, η ∈ {0.1, 0.2, 0.3, 0.4, 0.5}, τ ∈ {0.1, 0.2, 0.3}. Each parameter combination was applied to 50 independently sampled graphs at each (N, type) setting. All simulations were implemented in Python 3.11 using NetworkX, NumPy, and SciPy.

Baseline model and observables

The baseline consolidation model applies uniform multiplicative strengthening w_e ← β·w_e (β > 1) followed by the same thresholding step. It preserves relative edge weights and predicts no differential survival across centrality levels. Model comparison uses Spearman ρ between structural centrality score and post-sleep edge survival probability. Edge-entropy was computed as H_edges = −Σ_e p_e log p_e on normalised edge weights. Backbone fraction was defined as the cumulative weight share of the top 10% of edges by weight. Modularity Q was computed using the Louvain algorithm with resolution parameter γ = 1.0. All code will be made available at [repository TBC upon acceptance].
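The baseline's inability to produce centrality bias is immediate from its form: it rescales every weight by the same factor, so survival depends only on the pre-sleep weight. A minimal sketch (edge names and values are illustrative):

```python
def baseline_consolidation(w0, beta=1.5, tau=0.2):
    """Uniform multiplicative strengthening followed by thresholding.
    Relative weights are preserved, so survival is blind to centrality."""
    return {e: beta * w for e, w in w0.items() if beta * w > tau}

# Two edges with equal pre-sleep weight but very different centrality
# receive identical treatment; only the genuinely weak edge is pruned.
w0 = {"hub_edge": 0.3, "leaf_edge": 0.3, "weak_edge": 0.1}
```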

References

  1. Diekelmann, S. & Born, J. The memory function of sleep. Nat. Rev. Neurosci. 11, 114–126 (2010).
  2. Stickgold, R. Sleep-dependent memory consolidation. Nature 437, 1272–1278 (2005).
  3. Feld, G. B. & Born, J. Sculpting memory during sleep: concurrent consolidation and forgetting. Curr. Opin. Neurobiol. 44, 20–27 (2017).
  4. Lewis, P. A. & Durrant, S. J. Overlapping memory replay during sleep builds cognitive schemata. Trends Cogn. Sci. 15, 343–351 (2011).
  5. Lerner, I. & Gluck, M. A. Sleep and the extraction of hidden regularities: a systematic review. Neurosci. Biobehav. Rev. 102, 438–447 (2019).
  6. Schapiro, A. C. et al. Human hippocampal replay during rest prioritizes weakly learned information and predicts memory performance. Nat. Commun. 9, 3920 (2018).
  7. Wilhelm, I. et al. Sleep selectively enhances memory expected to be of future relevance. J. Neurosci. 31, 1563–1569 (2011).
  8. Feld, G. B. et al. Dopamine D2-like receptor activation wipes out preferential consolidation of high over low reward memories during sleep. J. Cogn. Neurosci. 26, 2310–2320 (2014).
  9. Javadi, A. H., Tolat, A. & Spiers, H. J. Sleep enhances a spatially mediated generalization of learned fear. Sleep 38, 1135–1143 (2015).
  10. Baran, B., Daniels, D. & Spencer, R. M. Sleep-dependent consolidation of value-based learning. PLoS ONE 8, e75326 (2013).
  11. Tononi, G. & Cirelli, C. Sleep function and synaptic homeostasis. Sleep Med. Rev. 10, 49–62 (2006).
  12. Brodt, S. et al. Sleep — a brain-state serving systems memory consolidation. Neuron 111, 1248–1265 (2023).
  13. Frankland, P. W. & Bontempi, B. The organization of recent and remote memories. Nat. Rev. Neurosci. 6, 119–130 (2005).
  14. Feld, G. B., Bernard, M., Rawson, A. B. & Spiers, H. J. Sleep targets highly connected global and local nodes to aid consolidation of learned graph networks. Sci. Rep. 12, 15086 (2022).
  15. Spens, E. & Burgess, N. A generative model of memory construction and consolidation. Nat. Hum. Behav. 8, 526–543 (2024).
  16. Hu, X., Antony, J. W., Creery, J. D. & Paller, K. A. Promoting memory consolidation during sleep: a meta-analysis of targeted memory reactivation. Psychol. Bull. 146, 218–244 (2020).