Total Recall

Persistent memory for AI agents

Total Recall. Persistent agent memory that learns like you do.

Give your AI agents durable memory for real work.

Keep working the way you already do. Your agent now remembers the context, decisions, patterns, and past work even after a session ends.

Adapts to the way you work.

Install once, then keep talking to your agent the way you already do.

Up to 97% fewer tokens.

Zero tokens at rest, no LLM calls at search time. Memory stays out of the prompt until it matters, so cost stays flat as it grows.

Leading benchmark scores.

On LongMemEval, the right memory lands in the top 10 results 98.2% of the time, and 98.0% of questions are answered correctly.

The Problem

Your AI agent forgets everything the moment a session ends.

You figure something out with an AI agent once, then spend time rediscovering it later. The proposal shape that worked. The bug pattern that always comes back. The customer constraint that changed the plan. The operating decision that explained everything.

The issue is not lack of information. The issue is that useful history stops being usable right when the next conversation starts.

Benchmarks

Benchmarked where persistent memory either works or it doesn't.

These are not decorative metrics. They measure whether the system can recover the right history and support correct final answers under long-horizon memory conditions.

98.0% Answered correctly

The assistant gave the right final answer on 490 of 500 hard memory questions. This is the bottom line: whether the answer was correct, not just whether a memory showed up.

98.2% Right memory in the top 10

For 98.2% of questions, at least one correct past session showed up in the top 10 results. For 95.0%, one showed up in the harder top 5. This is the retrieval metric most memory tools report.

$0 Retrieval cost

Zero LLM calls at retrieval time. Memory stays local, lightweight, and out of the prompt until it matters.

How Total Recall stacks up.

Public benchmark results plus the product differences that matter most in practice.

System	LongMemEval end-to-end	LongMemEval Recall@10	LoCoMo	Local-first	LLM at retrieval
Total Recall (leading)	98.0% (GPT-5.4)	96.8% (every needed memory; deterministic, no LLM retrieval)	89.1% (Recall@10)	Yes	No
Mastra OM	94.87% (gpt-5-mini)	Not published	Not published	Not the core story	LLM in memory formation
Mem0	94.4% (self-reported)	Not published	Not published	Cloud or self-host	Varies
Honcho	92.6% (gemini-3-pro)	Not published	Not published	Cloud	Varies
Hindsight	91.40% (gemini-3-pro-preview)	Not published	Not published	Self-hostable	Mixed system
Supermemory	85.2% (gemini-3-pro)	Not published	Not published	Cloud	Varies
Zep	71.20% (gpt-4o)	Not published	Not published	Mostly cloud	Varies

LongMemEval

Best fit for cross-session memory

98.2% Right memory, top 10

98.0% Answered correctly

LongMemEval is the clearest external benchmark for the Total Recall use case: recovering relevant history from many prior sessions and supporting correct answers across that history.

• 95.0% right memory in the top 5 (at least one correct session surfaced)

• Every memory a question needed was found in the top 10 for 96.8% of questions and in the top 5 for 90.8%, a stricter bar we also report

• Deterministic local retrieval, no LLM at search time, measured separately from answering
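
The two retrieval bars above differ only in how a hit is counted: the headline number needs at least one correct session in the top k, while the stricter one needs every session the question depends on. The sketch below shows that difference on a few made-up questions. The function names and toy data are illustrative assumptions, not the benchmark's or Total Recall's actual evaluation code.

```python
from typing import Iterable, Sequence, Set


def recall_any_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> bool:
    """True if at least one relevant session appears in the top k results."""
    return any(session_id in relevant for session_id in ranked[:k])


def recall_all_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> bool:
    """True only if every relevant session appears in the top k results."""
    return relevant.issubset(ranked[:k])


def rate(hits: Iterable[bool]) -> float:
    """Share of questions that passed, as a percentage."""
    hits = list(hits)
    return 100.0 * sum(hits) / len(hits)


# Hypothetical toy data: a ranked result list per question, plus the set
# of sessions that actually contain what the question needs.
questions = [
    (["s3", "s7", "s1"], {"s3"}),        # the one needed session is found
    (["s2", "s9", "s4"], {"s2", "s8"}),  # only one of two needed sessions
    (["s5", "s6", "s0"], {"s1"}),        # the needed session is missed
]

any_rate = rate(recall_any_at_k(r, rel, k=3) for r, rel in questions)
all_rate = rate(recall_all_at_k(r, rel, k=3) for r, rel in questions)
print(f"at least one: {any_rate:.1f}%  every needed: {all_rate:.1f}%")
```

The second question passes the looser bar and fails the stricter one, which is why the "every memory" figure can never exceed the "at least one" figure on the same data. The same `rate` helper reproduces the end-to-end number: 490 correct answers out of 500 is 98.0%.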

LoCoMo

Useful second lens for long dialogue

89.1% Recall@10

LoCoMo focuses on long conversations rather than many separate sessions. It is useful because it shows how the system behaves under long-dialogue retrieval and answer-faithfulness pressure.

• Strong complement to LongMemEval, not a substitute for it

• Helps show conversational memory quality under long histories

• Pure retrieval, zero LLM calls at search time

What It Understands Over Time

Memory that learns and grows with you.

The point is not to hoard random details. The point is to recover the narrative: what changed, why it changed, what worked, what kept breaking, and which pattern matters right now.

Narrative memory

Reconnect the arc

Pull back the real sequence of events: what started the problem, what you tried first, what changed the direction, and why the final decision made sense.

Decision memory

Recover the reasoning

Bring back the tradeoffs, constraints, and judgment behind the decision, not just the final line item that ended up in a document or code diff.

Pattern memory

Connect the dots

Spot recurring bugs, repeated objections, familiar decision shapes, and the patterns that let the agent act with context instead of improvising from scratch.

Dashboard

See the work, the decisions, and the story behind it.

Search across sessions, projects, people, decisions, and turning points. Browse the actual work history with its context intact, not a pile of disconnected summaries.

The dashboard makes the memory layer inspectable, searchable, reusable, and grounded in the real narrative of the work.

Why It Fits

It fits the way you already work.

Total Recall is built so you do not have to contort your workflow around the memory system. The memory system adapts to the way you and your agent already operate.

Prompt hygiene: Progressive

Memory stays out of the live prompt until it is needed. No permanent context bloat.

Retrieval: Local

Search happens locally, so the memory layer stays fast, private, and cheap to run.
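
One way to picture local search combined with memory that stays out of the prompt: score stored notes against the query with a deterministic function, and add notes to the prompt only when a score clears a threshold. The sketch below uses plain token overlap as a stand-in. How Total Recall actually scores and indexes memories is not described here, so the scoring method, the function names, and the `min_score` value are all assumptions made for illustration.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> set:
    """Lowercase word set; a deliberately simple stand-in for real indexing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query: str, memory: str) -> float:
    """Jaccard overlap of query and memory tokens (deterministic, no LLM)."""
    q, m = tokenize(query), tokenize(memory)
    if not q or not m:
        return 0.0
    return len(q & m) / len(q | m)


def retrieve(query: str, memories: List[str], k: int = 10,
             min_score: float = 0.2) -> List[Tuple[float, str]]:
    """Return up to k memories whose score clears min_score, best first."""
    scored = [(score(query, m), m) for m in memories]
    relevant = [pair for pair in scored if pair[0] >= min_score]
    return sorted(relevant, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(query: str, memories: List[str]) -> str:
    """Add memory to the prompt only when something relevant was found."""
    hits = retrieve(query, memories)
    if not hits:
        return query  # nothing relevant: the prompt carries no memory at all
    context = "\n".join(f"- {text}" for _, text in hits)
    return f"Relevant past work:\n{context}\n\n{query}"


# Hypothetical memory store.
store = [
    "Login bug returns after cache layer changes",
    "Customer requires EU data residency",
]

print(build_prompt("What is the weather today?", store))
print(build_prompt("login bug after cache changes", store))
```

The first query matches nothing, so the prompt is just the question. The second pulls in only the matching note, and no model is called at any point during the search.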

Behavior: Proactive

The agent can surface relevant past work on its own instead of waiting for the perfect command.

Footprint: Light

Small footprint, low latency, and a setup that does not ask you to adopt a brand new ritual just to get memory.

Beyond Recall

It starts with memory, and grows into intelligence.

Session recall is the foundation. The broader direction is a system that compounds recurring patterns, durable know-how, and practical judgment across real work without turning into a junk drawer.

Memory

Remember the pattern

Not just “we discussed this,” but “this is the bug that usually appears after that change.”

Smart forgetting

Keep what compounds

Not every detail should live forever. The system should preserve the patterns and judgment that keep paying off, while letting noise fade.
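
As a rough illustration of keeping what compounds while letting noise fade, one common approach is to let a memory's strength decay with time since it was last used and grow with how often it has been useful. This is a generic decay-plus-reinforcement sketch, not a description of how Total Recall forgets. The `Memory` fields, the 30-day half-life, and the pruning floor are hypothetical choices.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Memory:
    text: str
    last_used_day: int  # day index of the most recent retrieval
    uses: int = 1       # how many times the memory has been useful


def strength(memory: Memory, today: int, half_life_days: float = 30.0) -> float:
    """Exponential decay since last use, boosted by how often it was used."""
    age = today - memory.last_used_day
    decay = 0.5 ** (age / half_life_days)
    return decay * (1.0 + math.log(memory.uses))


def prune(memories: List[Memory], today: int, floor: float = 0.25) -> List[Memory]:
    """Keep only memories whose strength is still at or above the floor."""
    return [m for m in memories if strength(m, today) >= floor]


# Hypothetical store: a pattern that keeps paying off and a one-off detail.
memories = [
    Memory("recurring login bug after cache changes", last_used_day=90, uses=8),
    Memory("one-off typo fix in a draft", last_used_day=0, uses=1),
]

kept = prune(memories, today=100)
print([m.text for m in kept])
```

With these numbers the frequently reused pattern stays well above the floor, while the detail that was used once and not touched for 100 days drops below it.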

Intelligence

Know when it matters

Not just searchable memory, but proactive memory that surfaces the right pattern, context, or decision when the work makes it relevant.

About Alex

Built by Alex Greenshpun, for people doing real work with AI.

I build systems for serious human-agent work: memory, token efficiency, workflow safety, and the infrastructure that makes agents more useful in practice, not just in demos.

Total Recall comes from using agents every day across product work, writing, planning, research, debugging, and operations, then refusing to accept that all of that context should vanish whenever the session ends.

Local-first: Built for privacy, speed, and low operating cost.

Agent-native: Designed for agents that can use memory proactively on their own.

Compounding: Focused on patterns, judgment, and work that should become easier over time.

Other Projects

Other tools in the same ecosystem.

They come from the same obsession: make agent workflows more trustworthy, more efficient, and more usable in real life.

Repo Forensics

Automated security auditing for code your agent is about to install. Catches bad actors, supply-chain issues, and prompt-level traps before anything lands on your machine.

github.com/alexgreensh/repo-forensics

Token Optimizer

Structural and runtime optimization for AI context. Audit where the waste comes from, tighten the stack, and improve output quality instead of just squeezing the window.

github.com/alexgreensh/token-optimizer

If you already use AI agents for real work, you should not have to start over every session.

Total Recall is built for that exact frustration: not the absence of information, but the absence of usable memory. Keep working the way you already do. Let the memory layer catch up.

Works with your existing workflow: No special ritual required. Ask normal questions and let the agent use memory when it matters.

Local, proactive, lightweight: Zero LLM calls at retrieval time, low latency, and a footprint small enough to stay practical.

Benchmarked, not hand-waved: On LongMemEval with GPT-5.4, 98.0% of questions answered correctly and the right memory in the top 10 results 98.2% of the time. Plus strong LoCoMo results.