Persistent memory for AI agents

Total Recall. Persistent agent memory that learns like you do.

Give your AI agents durable memory for real work.

You keep working the way you already do. But now your agent remembers the context, decisions, patterns, and past work even after a session ends.

See benchmarks
Claude Code — my-project
Claude Code v2.1.78 · Opus 4.6 · Claude Max · ~/my-project
❯ what did we decide about the auth token refresh?
Recall: 3 related sessions found. Building context...
Here's what happened across your last 3 sessions:
The 401 errors started on Feb 15. Tokens were expiring mid-request because the refresh check only ran at login, not on each API call. You added retry logic with exponential backoff as a quick fix, but it was masking the real problem.
On Feb 18 you found the root cause: token TTL was 15 min but some API calls took 3-4 min to complete. The decision: refresh tokens 5 min before expiry instead of waiting for failure. You updated auth.py with a background refresh timer and session.py to check freshness on each request. Stable since. All 47 tests pass. Zero 401 errors in 4 days.
Files: src/auth.py, src/middleware/session.py, tests/test_auth.py
1. Adapts to the way you work.

Install once, then keep talking to your agent the way you already do.

2. Fully local and token-efficient.

No LLM calls at search time. No giant memory dump sitting in the prompt all day.

3. Leading benchmark scores.

96.8% Recall@10 and 98.0% end-to-end on LongMemEval.

The Problem

Your AI agent forgets everything the moment a session ends.

You figure something out with an AI agent once, then spend time rediscovering it later. The proposal shape that worked. The bug pattern that always comes back. The customer constraint that changed the plan. The operating decision that explained everything.

The issue is not lack of information. The issue is that useful history stops being usable right when the next conversation starts.

Without Total Recall:
Q2 Content Strategy Planning: 2 hours of brainstorming... gone
Auth Token Debugging Session: finally found the fix... gone
Client Proposal Reasoning: why we picked that pricing... gone
Onboarding Flow Redesign: the whole decision tree... gone
You start over every time.

With Total Recall:
Q2 Content Strategy Planning: searchable forever
Auth Token Debugging Session: searchable forever
Client Proposal Reasoning: searchable forever
Onboarding Flow Redesign: searchable forever
❯ what did we decide about content strategy?
Found 4 sessions. Pivoted to implementation stories, kept pricing notes and customer constraints attached.

Benchmarks

Benchmarked where persistent memory either works or doesn't.

These are not decorative metrics. They measure whether the system can recover the right history and support correct final answers under long-horizon memory conditions.

96.8% LongMemEval Recall@10

Deterministic, reproducible retrieval from the local index. No LLM is used to search for the right session.

87.3% LoCoMo J-Score

Strong conversational-memory performance on a second public benchmark for long-dialogue reasoning.

$0 Retrieval Cost

Zero LLM calls at retrieval time. Memory stays local, lightweight, and out of the prompt until it matters.
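The zero-LLM claim comes down to retrieval being an ordinary local index lookup. As a rough illustration only (this is not Total Recall's actual implementation; the BM25-style lexical index and the sample sessions are assumptions), deterministic retrieval over local session history can look like this:

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class LocalIndex:
    """Toy BM25-style index over stored sessions: pure local
    computation, no model inference, no network calls."""

    def __init__(self, sessions):
        # sessions: {session_id: session text}
        self.docs = {sid: tokenize(t) for sid, t in sessions.items()}
        self.n = len(self.docs)
        self.avg_len = sum(len(d) for d in self.docs.values()) / self.n
        self.df = Counter()  # document frequency per term
        for toks in self.docs.values():
            self.df.update(set(toks))

    def search(self, query, k=10, k1=1.5, b=0.75):
        q = tokenize(query)
        scores = {}
        for sid, toks in self.docs.items():
            tf = Counter(toks)
            score = 0.0
            for term in q:
                if term not in tf:
                    continue
                idf = math.log(1 + (self.n - self.df[term] + 0.5) / (self.df[term] + 0.5))
                norm = tf[term] * (k1 + 1) / (
                    tf[term] + k1 * (1 - b + b * len(toks) / self.avg_len))
                score += idf * norm
            scores[sid] = score
        # Ties broken by id: the same query always yields the same ranking.
        return sorted(scores, key=lambda sid: (-scores[sid], sid))[:k]

index = LocalIndex({
    "feb-15": "401 errors token expiry retry backoff",
    "feb-18": "token ttl refresh before expiry auth.py background timer",
    "mar-02": "onboarding flow redesign decision tree",
})
print(index.search("auth token refresh", k=2))  # → ['feb-18', 'feb-15']
```

Because scoring is a closed-form function of the stored text, the same query over the same index always returns the same sessions, which is what makes retrieval reproducible and free.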

How Total Recall stacks up.

Public benchmark results plus the product differences that matter most in practice.

System | LME E2E | LME Recall@10 | LoCoMo | Local-first | LLM at retrieval
Total Recall (leading) | 98.0% (GPT-5.4) | 96.8% (deterministic, no LLM retrieval) | 87.3% (J-Score) | Yes | No
Mastra OM | 94.87% (gpt-5-mini) | Not published | Not published | Not the core story | LLM in memory formation
Hindsight | 91.40% (gemini-3-pro-preview) | Not published | Not published | Self-hostable | Mixed system
Zep | 71.20% (gpt-4o) | Not published | Not published | Mostly cloud | Varies
LongMemEval

Best fit for cross-session memory

96.8% Recall@10
98.0% End-to-end

LongMemEval is the clearest external benchmark for the Total Recall use case: recovering relevant history from many prior sessions and supporting correct answers across that history.

Retrieval measured separately from answering
Deterministic local retrieval, no LLM search layer
Stresses temporal reasoning, knowledge updates, and multi-session synthesis
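For readers unfamiliar with the metric: Recall@10 is the fraction of benchmark queries where at least one truly relevant session appears in the top ten retrieved results. A minimal sketch of how such a score is typically computed (a generic evaluation helper, not Total Recall's or LongMemEval's actual harness):

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of queries where at least one relevant item
    appears among the top-k retrieved results.

    retrieved: per-query ranked lists of session ids
    relevant:  per-query sets of gold session ids
    """
    hits = 0
    for ranked, gold in zip(retrieved, relevant):
        if any(item in gold for item in ranked[:k]):
            hits += 1
    return hits / len(retrieved)

# Two queries: the first recovers its gold session in the top k,
# the second does not, so recall is 1/2.
retrieved = [["s3", "s7", "s1"], ["s9", "s2"]]
relevant = [{"s7"}, {"s5"}]
print(recall_at_k(retrieved, relevant, k=10))  # → 0.5
```

A 96.8% Recall@10 therefore means the right session lands in the top ten for nearly every query, before any answering model is involved.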
LoCoMo

Useful second lens for long dialogue

87.2% Recall@10
87.3% J-Score

LoCoMo focuses on long conversations rather than many separate sessions. It is useful because it shows how the system behaves under long-dialogue retrieval and answer-faithfulness pressure.

Strong complement to LongMemEval, not a substitute for it
Helps show conversational memory quality under long histories
Published Total Recall result comes from an older snapshot than the LongMemEval result above

What It Understands Over Time

Memory that learns and grows with you.

The point is not to hoard random details. The point is to recover the narrative:
what changed, why it changed, what worked, what kept breaking, and which pattern matters right now.

Narrative memory

Reconnect the arc

Pull back the real sequence of events: what started the problem, what you tried first, what changed the direction, and why the final decision made sense.

Decision memory

Recover the reasoning

Bring back the tradeoffs, constraints, and judgment behind the decision, not just the final line item that ended up in a document or code diff.

Pattern memory

Connect the dots

Spot recurring bugs, repeated objections, familiar decision shapes, and the patterns that let the agent act with context instead of improvising from scratch.

Dashboard

See the work, the decisions, and the story behind it.

Search across sessions, projects, people, decisions, and turning points.
Browse the actual work history with its context intact,
not a pile of disconnected summaries.

The dashboard makes the memory layer inspectable, searchable, reusable, and grounded in the real narrative of the work.

Why It Fits

It fits the way you already work.

Total Recall is built so you do not have to contort your workflow around the memory system.
The memory system adapts to the way you and your agent already operate.

Prompt hygiene Progressive

Memory stays out of the live prompt until it is needed. No permanent context bloat.

Retrieval Local

Search happens locally, so the memory layer stays fast, private, and cheap to run.

Behavior Proactive

The agent can surface relevant past work on its own instead of waiting for the perfect command.

Footprint Light

Small footprint, low latency, and a setup that does not ask you to adopt a brand new ritual just to get memory.

Beyond Recall

It starts with memory, and grows into intelligence.

Session recall is the foundation. The broader direction is a system that compounds recurring patterns,
durable know-how, and practical judgment across real work without turning into a junk drawer.

Memory

Remember the pattern

Not just “we discussed this,” but “this is the bug that usually appears after that change.”

Smart forgetting

Keep what compounds

Not every detail should live forever. The system should preserve the patterns and judgment that keep paying off, while letting noise fade.

Intelligence

Know when it matters

Not just searchable memory, but proactive memory that surfaces the right pattern, context, or decision when the work makes it relevant.


About Alex

Built by Alex Greenshpun,
for people doing real work with AI.

I build systems for serious human-agent work: memory, token efficiency, workflow safety, and the infrastructure that makes agents more useful in practice, not just in demos.

Total Recall comes from using agents every day across product work, writing, planning, research, debugging, and operations, then refusing to accept that all of that context should vanish whenever the session ends.

Local-first Built for privacy, speed, and low operating cost.
Agent-native Designed for agents that can use memory proactively on their own.
Compounding Focused on patterns, judgment, and work that should become easier over time.

If you already use AI agents for real work, you should not have to start over every session.

Total Recall is built for that exact frustration: not the absence of information, but the absence of usable memory. Keep working the way you already do. Let the memory layer catch up.

View GitHub
Works with your existing workflow No special ritual required. Ask normal questions and let the agent use memory when it matters.
Local, proactive, lightweight Zero LLM calls at retrieval time, low latency, and a footprint small enough to stay practical.
Benchmarked, not hand-waved 96.8% Recall@10 and 98.0% end-to-end on LongMemEval with GPT-5.4, plus strong LoCoMo results.