Adapts to the way you work.
Install once, then keep talking to your agent the way you already do.
Give your AI agents durable memory for real work.
You keep working the way you already do, but now your agent remembers the context, decisions, patterns, and past work even after a session ends.
No LLM calls at search time. No giant memory dump sitting in the prompt all day.
96.8% Recall@10 and 98.0% end-to-end on LongMemEval.
The Problem
You figure something out with an AI agent once, then spend time rediscovering it later. The proposal shape that worked. The bug pattern that always comes back. The customer constraint that changed the plan. The operating decision that explained everything.
The issue is not lack of information. The issue is that useful history stops being usable right when the next conversation starts.
Benchmarks
These are not decorative metrics. They measure whether the system can recover the right history and support correct final answers under long-horizon memory conditions.
Deterministic, reproducible retrieval from the local index. No LLM is used to search for the right session.
Strong conversational-memory performance on a second public benchmark for long-dialogue reasoning.
Zero LLM calls at retrieval time. Memory stays local, lightweight, and out of the prompt until it matters.
Public benchmark results plus the product differences that matter most in practice.
| System | LME E2E | LME Recall@10 | LoCoMo | Local-first | LLM at retrieval |
|---|---|---|---|---|---|
| Total Recall | 98.0% (GPT-5.4) | 96.8% (deterministic, no LLM retrieval) | 87.3% J-Score | Yes | No |
| Mastra OM | 94.87% (gpt-5-mini) | Not published | Not published | Not the core story | LLM in memory formation |
| Hindsight | 91.40% (gemini-3-pro-preview) | Not published | Not published | Self-hostable | Mixed system |
| Zep | 71.20% (gpt-4o) | Not published | Not published | Mostly cloud | Varies |
LongMemEval is the clearest external benchmark for the Total Recall use case: recovering relevant history from many prior sessions and supporting correct answers across that history.
LoCoMo focuses on long conversations rather than many separate sessions. It is useful because it shows how the system behaves under long-dialogue retrieval and answer-faithfulness pressure.
What It Understands Over Time
The point is not to hoard random details. The point is to recover the narrative:
what changed, why it changed, what worked, what kept breaking, and which pattern matters right now.
Pull back the real sequence of events: what started the problem, what you tried first, what changed the direction, and why the final decision made sense.
Bring back the tradeoffs, constraints, and judgment behind the decision, not just the final line item that ended up in a document or code diff.
Spot recurring bugs, repeated objections, familiar decision shapes, and the patterns that let the agent act with context instead of improvising from scratch.
Dashboard
Search across sessions, projects, people, decisions, and turning points.
Browse the actual work history with its context intact, not a pile of disconnected summaries.
The dashboard makes the memory layer inspectable, searchable, reusable, and grounded in the real narrative of the work.
Why It Fits
Total Recall is built so you do not have to contort your workflow around the memory system.
The memory system adapts to the way you and your agent already operate.
Memory stays out of the live prompt until it is needed. No permanent context bloat.
Search happens locally, so the memory layer stays fast, private, and cheap to run.
The agent can surface relevant past work on its own instead of waiting for the perfect command.
Small footprint, low latency, and a setup that does not ask you to adopt a brand new ritual just to get memory.
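To make "local, deterministic, no LLM at retrieval time" concrete, here is a minimal sketch of that kind of lookup: ranking stored session notes by plain TF-IDF similarity. This is not Total Recall's implementation; the index shape, scoring, and session data are all hypothetical, and the point is only that the same query over the same index always returns the same sessions, with no model call anywhere.

```python
# Illustrative sketch only: deterministic, local, LLM-free retrieval over
# past session notes. Index shape, scoring, and data are hypothetical.
import math
from collections import Counter

SESSIONS = {  # hypothetical local index: session id -> stored note
    "s1": "fixed the auth bug caused by the stale token cache",
    "s2": "drafted the proposal shape the customer accepted",
    "s3": "decided to drop the cloud sync feature for v1",
}

def _tokens(text):
    return text.lower().split()

def search(query, k=2):
    """Rank sessions by TF-IDF similarity. Same input, same output:
    no model call, so retrieval is deterministic and fully local."""
    docs = {sid: Counter(_tokens(t)) for sid, t in SESSIONS.items()}
    n = len(docs)
    df = Counter()
    for tf in docs.values():
        df.update(tf.keys())
    idf = {w: math.log(1 + n / df[w]) for w in df}
    q = Counter(_tokens(query))

    def score(tf):
        dot = sum(q[w] * tf[w] * idf.get(w, 0) ** 2 for w in q)
        norm = math.sqrt(sum((tf[w] * idf[w]) ** 2 for w in tf)) or 1.0
        return dot / norm

    ranked = sorted(docs, key=lambda sid: score(docs[sid]), reverse=True)
    return ranked[:k]

print(search("auth token bug"))  # the stale-token-cache session ranks first
```

Because nothing in the loop calls a model, the retrieval cost is a few dictionary lookups, and the matched notes stay out of the prompt until the agent actually asks for them.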
Beyond Recall
Session recall is the foundation. The broader direction is a system that compounds recurring patterns,
durable know-how, and practical judgment across real work without turning into a junk drawer.
Not just “we discussed this,” but “this is the bug that usually appears after that change.”
Not every detail should live forever. The system should preserve the patterns and judgment that keep paying off, while letting noise fade.
Not just searchable memory, but proactive memory that surfaces the right pattern, context, or decision when the work makes it relevant.
About Alex
I build systems for serious human-agent work: memory, token efficiency, workflow safety, and the infrastructure that makes agents more useful in practice, not just in demos.
Total Recall comes from using agents every day across product work, writing, planning, research, debugging, and operations, then refusing to accept that all of that context should vanish whenever the session ends.
Other Projects
These projects come from the same obsession: make agent workflows more trustworthy, more efficient, and more usable in real life.
Automated security auditing for code your agent is about to install. Catches bad actors, supply-chain issues, and prompt-level traps before anything lands on your machine.
github.com/alexgreensh/repo-forensics

Structural and runtime optimization for AI context. Audit where the waste comes from, tighten the stack, and improve output quality instead of just squeezing the window.

github.com/alexgreensh/token-optimizer

Total Recall is built for that exact frustration: not the absence of information, but the absence of usable memory. Keep working the way you already do. Let the memory layer catch up.