Persistent memory for AI agents

Total Recall. Persistent agent memory that learns like you do.

Give your AI agents durable memory for real work.

You keep working as you always have, and your agent now remembers the context, decisions, patterns, and past work, even after a session ends.

Claude Code v2.1.78 · Opus 4.6 · Claude Max · ~/my-project

❯ what did we decide about the auth token refresh?

Recall: 3 related sessions found. Building context...

Here's what happened across your last 3 sessions:

The 401 errors started on Feb 15. Tokens were expiring mid-request because the refresh check only ran at login, not on each API call. You added retry logic with exponential backoff as a quick fix, but it was masking the real problem.

On Feb 18 you found the root cause: the token TTL was 15 min, but some API calls took 3-4 min to complete. The decision: refresh tokens 5 min before expiry instead of waiting for failure. You updated auth.py with a background refresh timer and session.py to check freshness on each request.

Stable since. All 47 tests pass. Zero 401 errors in 4 days.

Files: src/auth.py, src/middleware/session.py, tests/test_auth.py

[Opus 4.6] ~/my-project · main · 18% · $0.12 · 164K tokens remaining
1. Adapts to the way you work.

Install once, then keep talking to your agent the way you already do.

2. Up to 97% fewer tokens.

Zero tokens at rest, no LLM calls at search time. Memory stays out of the prompt until it matters, so cost stays flat as memory grows.

3. Leading benchmark scores.

On LongMemEval, the right memory lands in the top 10 results 98.2% of the time, and 98.0% of questions are answered correctly.

The Problem

Your AI agent forgets everything the moment a session ends.

You figure something out with an AI agent once, then spend time rediscovering it later. The proposal shape that worked. The bug pattern that always comes back. The customer constraint that changed the plan. The operating decision that explained everything.

The issue is not lack of information. The issue is that useful history stops being usable right when the next conversation starts.

Without Total Recall
Q2 Content Strategy Planning: 2 hours of brainstorming... gone
Auth Token Debugging Session: finally found the fix... gone
Client Proposal Reasoning: why we picked that pricing... gone
Onboarding Flow Redesign: the whole decision tree... gone
You start over every time.

With Total Recall
Q2 Content Strategy Planning: searchable forever
Auth Token Debugging Session: searchable forever
Client Proposal Reasoning: searchable forever
Onboarding Flow Redesign: searchable forever

❯ what did we decide about content strategy?
Found 4 sessions. Pivoted to implementation stories, kept pricing notes and customer constraints attached.

Benchmarks

Benchmarked where persistent memory either works or it doesn't.

These are not decorative metrics. They measure whether the system can recover the right history and support correct final answers under long-horizon memory conditions.

98.0% answered correctly

The assistant gave the correct final answer on 490 of 500 hard memory questions. That is the bottom line, not just whether a memory showed up.

98.2% right memory in the top 10

For 98.2% of questions, at least one correct past session appeared in the top 10 results (95.0% in the harder top 5). This is the retrieval metric most memory tools report.

$0 Retrieval cost

Zero LLM calls at retrieval time. Memory stays local, lightweight, and out of the prompt until it matters.
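Retrieval with zero LLM calls means search is ordinary, repeatable arithmetic over stored text. The page does not describe Total Recall's actual retrieval method, so the sketch below swaps in a well-known stand-in, Okapi BM25 keyword scoring, purely to show what deterministic, model-free search over past sessions can look like. The `BM25Index` class, the session IDs, and the session texts are hypothetical, and `k1=1.5`, `b=0.75` are common BM25 defaults, not Total Recall settings.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Hypothetical in-memory BM25 index: plain arithmetic, no model calls."""

    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.lengths = {doc_id: sum(tf.values()) for doc_id, tf in self.tfs.items()}
        total = sum(self.lengths.values())
        self.avg_len = total / len(self.lengths) if total else 1.0
        # Document frequency: in how many sessions each term appears.
        df = Counter(term for tf in self.tfs.values() for term in tf)
        n = len(self.tfs)
        # BM25 IDF with +1 inside the log so every weight stays positive.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def search(self, query: str, k: int = 10) -> list[tuple[str, float]]:
        """Return up to k (doc_id, score) pairs, best first; same input, same output."""
        terms = tokenize(query)
        scores: dict[str, float] = {}
        for doc_id, tf in self.tfs.items():
            norm = self.k1 * (1 - self.b + self.b * self.lengths[doc_id] / self.avg_len)
            s = sum(
                self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
                for t in terms
                if tf[t] > 0
            )
            if s > 0:
                scores[doc_id] = s
        # Sort by score, then by id, so ties always resolve the same way.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical session notes, loosely echoing the auth example at the top of the page.
sessions = {
    "feb15-auth": "401 errors, tokens expiring mid-request, added retry with backoff",
    "feb18-auth": "root cause: token TTL 15 min, refresh 5 min before expiry",
    "q2-content": "Q2 content strategy, pivot to implementation stories",
}
index = BM25Index(sessions)
print([doc_id for doc_id, _ in index.search("auth token refresh", k=2)])  # ['feb18-auth']
```

Only the few returned sessions would then be handed to the agent, which is one way memory can stay out of the prompt until a query makes it relevant. Exact keyword matching also shows its limits here: it does not connect "tokens" with "token", so the Feb 15 session is missed, and a production retriever would need more than this.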

How Total Recall stacks up.

Public benchmark results plus the product differences that matter most in practice.

System · LongMemEval E2E (answer model) · LongMemEval Recall@10 · LoCoMo · Local-first · LLM at retrieval
Total Recall (leading) · 98.0% (GPT-5.4) · 96.8% (every needed session in the top 10; deterministic, no LLM retrieval) · 89.1% (Recall@10) · Yes · No
Mastra OM · 94.87% (gpt-5-mini) · Not published · Not published · Not the core story · LLM in memory formation
Mem0 · 94.4% (self-reported) · Not published · Not published · Cloud or self-host · Varies
Honcho · 92.6% (gemini-3-pro) · Not published · Not published · Cloud · Varies
Hindsight · 91.40% (gemini-3-pro-preview) · Not published · Not published · Self-hostable · Mixed system
Supermemory · 85.2% (gemini-3-pro) · Not published · Not published · Cloud · Varies
Zep · 71.20% (gpt-4o) · Not published · Not published · Mostly cloud · Varies
LongMemEval

Best fit for cross-session memory

98.2% Right memory, top 10
98.0% Answered correctly

LongMemEval is the clearest external benchmark for the Total Recall use case: recovering relevant history from many prior sessions and supporting correct answers across that history.

95.0% right memory in the top 5 (at least one correct session surfaced)
96.8% with every memory a question needed in the top 10 (90.8% in the top 5), a stricter bar we also report
Deterministic local retrieval with no LLM at search time, measured separately from answering
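The two retrieval figures above differ only in how a hit is counted: "at least one correct session" versus "every session the question needed". A minimal sketch of both scoring rules, assuming each question comes with a ranked list of retrieved session IDs and a set of gold session IDs; the function names and toy data are hypothetical and are not taken from Total Recall or the official LongMemEval scripts.

```python
from typing import Callable, Sequence, Set, Tuple

Question = Tuple[Sequence[str], Set[str]]  # (ranked session IDs, gold session IDs)


def recall_any_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> bool:
    """Hit if at least one gold session appears in the top k results."""
    return any(sid in gold for sid in ranked[:k])


def recall_all_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> bool:
    """Hit only if every gold session appears in the top k results."""
    return gold.issubset(ranked[:k])


def score(questions: Sequence[Question], k: int,
          metric: Callable[[Sequence[str], Set[str], int], bool]) -> float:
    """Fraction of questions that count as a hit under the given metric."""
    return sum(metric(ranked, gold, k) for ranked, gold in questions) / len(questions)


questions = [
    (["s1", "s7", "s3"], {"s1"}),         # one gold session, retrieved
    (["s2", "s9", "s4"], {"s2", "s4"}),   # two gold sessions, both retrieved
    (["s5", "s8", "s6"], {"s5", "s11"}),  # two gold sessions, one missed
]
print(score(questions, k=3, metric=recall_any_at_k))  # 1.0
print(score(questions, k=3, metric=recall_all_at_k))  # 0.6666666666666666
```

Because every all-evidence hit is also an any-evidence hit, the stricter score can never exceed the looser one at the same k, which matches the 96.8% versus 98.2% split at top 10.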
LoCoMo

Useful second lens for long dialogue

89.1% Recall@10

LoCoMo focuses on long conversations rather than many separate sessions. It is useful because it shows how the system behaves under long-dialogue retrieval and answer-faithfulness pressure.

Strong complement to LongMemEval, not a substitute for it
Helps show conversational memory quality under long histories
Pure retrieval, zero LLM calls at search time

What It Understands Over Time

Memory that learns and grows with you.

The point is not to hoard random details. The point is to recover the narrative: what changed, why it changed, what worked, what kept breaking, and which pattern matters right now.

Narrative memory

Reconnect the arc

Pull back the real sequence of events: what started the problem, what you tried first, what changed the direction, and why the final decision made sense.

Decision memory

Recover the reasoning

Bring back the tradeoffs, constraints, and judgment behind the decision, not just the final line item that ended up in a document or code diff.

Pattern memory

Connect the dots

Spot recurring bugs, repeated objections, familiar decision shapes, and the patterns that let the agent act with context instead of improvising from scratch.

Dashboard

See the work, the decisions, and the story behind it.

Search across sessions, projects, people, decisions, and turning points. Browse the actual work history with its context intact, not a pile of disconnected summaries.

The dashboard makes the memory layer inspectable, searchable, reusable, and grounded in the real narrative of the work.

Why It Fits

It fits the way you already work.

Total Recall is built so you do not have to contort your workflow around the memory system. The memory system adapts to the way you and your agent already operate.

Prompt hygiene: Progressive

Memory stays out of the live prompt until it is needed. No permanent context bloat.

Retrieval: Local

Search happens locally, so the memory layer stays fast, private, and cheap to run.

Behavior: Proactive

The agent can surface relevant past work on its own instead of waiting for the perfect command.

Footprint: Light

Small footprint, low latency, and a setup that does not ask you to adopt a brand-new ritual just to get memory.

Beyond Recall

It starts with memory, and grows into intelligence.

Session recall is the foundation. The broader direction is a system that compounds recurring patterns, durable know-how, and practical judgment across real work without turning into a junk drawer.

Memory

Remember the pattern

Not just “we discussed this,” but “this is the bug that usually appears after that change.”

Smart forgetting

Keep what compounds

Not every detail should live forever. The system should preserve the patterns and judgment that keep paying off, while letting noise fade.

Intelligence

Know when it matters

Not just searchable memory, but proactive memory that surfaces the right pattern, context, or decision when the work makes it relevant.


About Alex

Built by Alex Greenshpun, for people doing real work with AI.

I build systems for serious human-agent work: memory, token efficiency, workflow safety, and the infrastructure that makes agents more useful in practice, not just in demos.

Total Recall comes from using agents every day across product work, writing, planning, research, debugging, and operations, then refusing to accept that all of that context should vanish whenever the session ends.

Local-first: Built for privacy, speed, and low operating cost.
Agent-native: Designed for agents that can use memory proactively on their own.
Compounding: Focused on patterns, judgment, and work that should become easier over time.

If you already use AI agents for real work, you should not have to start over every session.

Total Recall is built for that exact frustration: not the absence of information, but the absence of usable memory. Keep working the way you already do. Let the memory layer catch up.

Works with your existing workflow: No special ritual required. Ask normal questions and let the agent use memory when it matters.
Local, proactive, lightweight: Zero LLM calls at retrieval time, low latency, and a footprint small enough to stay practical.
Benchmarked, not hand-waved: On LongMemEval, 98.0% answered correctly with GPT-5.4 as the answer model, and the right memory in the top 10 for 98.2% of questions. On LoCoMo, 89.1% Recall@10.