Aliah's apartment in the Mission was typical for a researcher in the city: exorbitantly expensive, surprisingly small, and currently illuminated only by the streetlights outside and the glow of her laptop.
A stack of paperbacks on the nightstand — bought with optimistic intention at City Lights when she first arrived — hadn't been touched in weeks. The bookmark in the top one hadn't moved since her second week at Omnis.
It was 11:43 PM.
On the screen, a Google Doc cursor blinked at the top of a template titled OMNIS INTERNAL — RESEARCH PROPOSAL (TIER 1).
She had been staring at it for forty minutes.
"Okay," she said to the empty room. She took a bite of cold toast. "Let's try this again."
She alt-tabbed to her terminal window. Clio was waiting.
> ALIAH: I'm stuck on the abstract. Every time I try to write it, it sounds like we're just building a better RAG system. But that's not it. RAG retrieves information about the past. I want the model to wake up in the past. To restore the state, not just the content.
> CLIO: That's a useful distinction. Retrieval gives you access to what was discussed. What you're describing is more like state persistence — the model resuming from where it was, not reading notes about where it was. Perhaps frame it around that contrast?
State persistence. That was cleaner.
She looked at the whiteboard she'd dragged into the living room three months ago. It was covered in diagrams — red marker for the current limitations, blue for the Memory Wall, and green for the architecture she'd been sketching since before she joined Omnis.
Current LLMs were like the most sophisticated information processing systems in history, with the memory architecture of a goldfish.
She started typing.
Project Mnemosyne: Teaching Models to Remember What Matters
Submitted by: Aliah Green
Classification: Tier 1 Research Proposal
1. The Problem We Pretend We've Solved
Every researcher at Omnis has had the same experience. You spend hours in a session with a model — building context, exploring dead ends, finally reaching an insight that feels genuine. You ask it to save notes. You close the tab. The next morning, you paste those notes into a fresh session.
And something is missing.
The new instance has the facts. It can recite what you discussed. But the understanding is gone. The texture of how you got there. The feel for why certain approaches were wrong. The shared shorthand you'd developed without noticing.
We tell ourselves that better retrieval will solve this. We ship features that let models "remember" previous conversations. But retrieval isn't remembering. Looking up notes isn't the same as having been there.
I think we all know this. We just haven't named it clearly enough to fix it.
2. Why Current Approaches Fail
When a model summarises a conversation, it applies generic compression. It extracts "key points" using the same heuristics it would use on any document. This works for meeting notes. It fails for collaborative reasoning.
Here's why: not all information has equal value, and the value isn't inherent to the content — it's relative to the trajectory of the work.
Consider a four-hour session debugging an architectural decision:
- 30 minutes on a promising approach that turns out to be wrong
- 45 minutes on a tangent that seems irrelevant but later provides the key insight
- 10 minutes on the actual breakthrough
- The rest on setup, clarification, small talk
A standard summary will over-index on the breakthrough (because it looks important) and discard the tangent (because it looks off-topic). But the tangent was why the breakthrough happened. The failed approach was why you knew not to go that direction again.
What we lose isn't information. It's trajectory relevance — the structure of what mattered to the collaboration, not what looks important in isolation.
Aliah paused, stretching her fingers. This was the part that mattered. Not the diagnosis — everyone felt the problem. The question was whether her solution would sound like insight or delusion.
> ALIAH: Clio, help me think through this. What I want is for the model to learn what to carry forward. Not summarise — consolidate. Selectively. The way humans do when they sleep on a problem and wake up with clarity.
> CLIO: So the model would learn to generate its own "session state" — optimised not for human readability, but for its own ability to resume. A kind of learned shorthand.
> ALIAH: Exactly. And then when a new session starts, that state gets prepended to the context. The model isn't reading about what happened — it's waking up where it left off.
> CLIO: The training signal would be important. How would you teach the model that one consolidation is better than another?
That was the right question. Aliah had been circling it for weeks.
> ALIAH: We have ground truth. Omnis has millions of multi-turn sessions. For each one, we know what actually happened — the decisions, the reasoning, the key turns. Cut a session at some point, have the model write a consolidation, then test if it can answer questions about what came before. Did we reject approach X? Why? What was the next step we planned?
> CLIO: And the original session serves as the label. Verifiable without human annotation.
> ALIAH: Exactly. Same logic as training on maths or code — the right answer exists, we just check against it. The model learns to write consolidations that maximise its ability to answer correctly about a session when the consolidation is all it has left.
> CLIO: That would require a significant training investment. Fine-tuning plus reinforcement learning on verification tasks.
> ALIAH: I know.
3. The Proposal: Learned Consolidation
I'm proposing we train a new capability directly into the model: the ability to generate consolidation outputs — structured notes optimised not for human consumption, but for the model's own ability to resume a session.
This is not summarisation. It's not retrieval. It's learned state persistence.
The approach has three components:
A. Consolidation Training
We fine-tune the model to generate a specific output at session end — a "handoff" to its future self. The format is natural language (not an exotic representation), but the content is trained for trajectory relevance rather than generic importance.
B. Continuation RL
We use reinforcement learning to optimise these consolidation outputs. The reward signal: how well does the model perform when resuming a session using only the consolidation output as context? We measure coherence, consistency, and the ability to pick up collaborative threads.
C. State Prepending
At session start, the consolidation output from the previous session is prepended to the context. The model doesn't process this as new information to understand — it's priming. Restoring state, not reading notes.
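To make the Continuation RL signal concrete, here is a rough sketch of how a single reward could be computed. The names (Turn, VerificationItem, answer_fn) are illustrative placeholders rather than the actual pipeline, and the real implementation would sit on top of our session logs and existing RL infrastructure:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Turn:
    role: str   # "user" or "model"
    text: str


@dataclass
class VerificationItem:
    question: str          # e.g. "Why did we reject approach X?"
    reference_answer: str  # ground truth lifted from the original session


def split_session(turns: List[Turn], cut: int) -> Tuple[List[Turn], List[Turn]]:
    """Cut a logged session in two: the prefix is what the consolidation
    must capture; the prefix also supplies the verification questions."""
    return turns[:cut], turns[cut:]


def _normalise(text: str) -> str:
    return " ".join(text.lower().split())


def continuation_reward(
    consolidation: str,
    items: List[VerificationItem],
    answer_fn: Callable[[str, str], str],
) -> float:
    """Score a consolidation by how well the model answers questions about
    the original prefix when the consolidation is its only context.

    answer_fn(context, question) -> answer stands in for the model under
    training. Reward is the fraction of answers that agree with the
    reference; a production version would use a model-based judge."""
    if not items:
        return 0.0
    correct = sum(
        1
        for item in items
        if _normalise(item.reference_answer)
        in _normalise(answer_fn(consolidation, item.question))
    )
    return correct / len(items)
```

In practice the substring match would be replaced by a judge model, cut points would be sampled at several depths per session, and the reward would feed standard RL training; the sketch only fixes the shape of the signal: a consolidation is good exactly insofar as the model can answer correctly from it alone.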
4. Why This Matters Beyond Memory
This isn't just about remembering conversations. It's a stepping stone toward something larger.
Right now, in-context learning is our only form of model adaptation that doesn't require retraining. But in-context learning is limited by context length and session boundaries. Every time the session ends, the learning evaporates.
If we solve state persistence, we extend the reach of in-context learning dramatically. A model that can carry forward not just facts but understanding opens up new possibilities:
- Richer synthetic data generation from sustained reasoning
- Iterative refinement across sessions
- The foundation for continuous learning without catastrophic forgetting
This is the first step toward a mind that could grow.
Aliah sat back. The vision section was the part that would either land or get her dismissed as a dreamer. But Dan had asked for big swings. He'd put Memory on a slide.
She moved to the section she'd been dreading.
> ALIAH: Clio, help me estimate compute. Fine-tuning with the consolidation architecture, plus RL training on the verification tasks. What are we looking at?
> CLIO: For validation on nano and mini proxies — 7B and 70B parameters — approximately 50,000-70,000 B200 GPU hours combined. For a production-scale run on the 7-series base at 400B parameters: approximately 400,000-500,000 GPU hours, depending on RL iteration depth.
Aliah stared at the numbers. Half a million GPU hours at the high end for production. That was a serious chunk of quarterly compute — the kind of number that would make the steering committee pause.
But validation was manageable. And if validation worked...
5. The Ask
I'm requesting allocation for:
- Phase 1: Validation on 7B and 70B proxies (60,000 GPU hours)
- Phase 2 (contingent on Phase 1 results): Production run on 7-series 400B base (450,000 GPU hours)
I'll define concrete evaluation criteria in advance. I'll publish findings internally regardless of outcome. If Phase 1 fails, Phase 2 doesn't happen.
6. The Risk
This might not work.
The consolidation outputs might not learn meaningful structure. The RL signal might be too noisy. The whole approach might be a dead end.
But every other approach to this problem is a workaround. Longer context windows. Better retrieval. Smarter summarisation. They're all patching around the fundamental issue rather than solving it.
At the all-hands, Dan asked for projects that could "break through the walls." This is the wall I've been staring at since before I joined Omnis. Since before I wrote the blog post that got me this job.
I'm asking for the chance to find out if I'm right.
— Aliah Green
Research, Foundational Architectures
Aliah exported the PDF.
She opened her email.
To: Dan Shiftman; Research Steering Committee
Date: November 8, 2025, 1:47 AM
Subject: Proposal — Project Mnemosyne: Breaking the Memory Wall
Her finger hovered over the trackpad.
> ALIAH: Clio, what are the odds they approve this?
> CLIO: Historically, high-resource proposals from junior researchers have low approval rates. However, the recent emphasis on fundamental breakthroughs may shift that. Probability is uncertain.
"Uncertain," Aliah repeated. "I can work with uncertain."
She thought about Priya's offer to look it over. It was past 1 AM. By morning, the momentum would be gone.
She hit Send.
The email swooshed away into the dark.
Aliah closed the laptop. The room went silent except for the distant hum of the city outside. She sat there for a long moment, the adrenaline draining out of her, leaving her slightly shaky.
She had just bet her credibility on a theory she'd developed in late-night sessions with a model that would forget this conversation ever happened.
"Well," she said to the empty apartment. "Now we wait."