Agent Memory
Agent memory is SochDB's flagship AI-native feature. You write episodes (raw turns, observations, tasks) as they happen, and later recall the relevant ones compiled into a token-budgeted context string ready to drop into an LLM prompt. Retrieval can run over several lanes (lexical, BM25, trigram, hybrid, or a fused three-lane mode that combines lexical, vector, and graph recall) and is bi-temporal: every episode carries a valid-from timestamp, so you can ask "what did the agent know as of last Tuesday?".
This guide covers the agent-memory surface in each SDK. The shape and the level of "batteries included" differ per language, so the per-SDK notes below are honest about what is a high-level client, what is a set of lower-level pipelines, and what is reached only through a running server.
- Python (0.5.9 SDK) ships the highest-level API: AgentMemory (a thin client over the server's ContextService + sochdb-memory backend) plus lower-level pipelines (ExtractionPipeline, Consolidator, HybridRetriever).
- Node.js (0.5.3 SDK) ships the lower-level memory module (ExtractionPipeline, Consolidator, HybridRetriever), with no single AgentMemory facade.
- Go (0.4.5 SDK) ships the same pipeline trio, but only when built with the sochdb_embedded build tag.
- Rust (sochdb 2.0.3) exposes memory as composable overlays: atomic_memory (crash-safe multi-part memory writes) plus temporal_graph, graph, and semantic_cache.
- MCP exposes memory to any LLM client through the memory_* tools.
Concepts
| Concept | What it is |
|---|---|
| Episode | A unit of memory — a conversation turn, observation, task, or workflow step, stored as text with metadata. |
| Lane | A retrieval path. SochDB has lexical (keyword), BM25, trigram, hybrid (vector + lexical), and a fused "three-lane" mode (lexical + vector + graph). |
| Token budget | A cap on the compiled context size. Lower-priority content is dropped or truncated to fit. |
| Bi-temporal recall | Episodes carry a valid-from time; as_of queries only see episodes valid at or before that point. |
| Entity / Fact | Higher-level structures extracted from episodes (people, projects, documents) and the consolidated facts about them. |
In Python, retrieval lanes are named constants on QueryLanes:
from sochdb import QueryLanes
QueryLanes.LEXICAL # "lexical" — fast keyword recall, no embedding needed
QueryLanes.THREE_LANE # "three_lane" — lexical + vector + graph, fused
QueryLanes.HYBRID # "hybrid" — vector + lexical fusion
QueryLanes.BM25 # "bm25" — BM25 ranking
QueryLanes.TRIGRAM # "trigram" — trigram (substring/fuzzy) matching
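How the fused lanes combine their individual rankings is not described in this guide. One common technique, shown here purely as an illustration and not as SochDB's actual algorithm, is reciprocal rank fusion (RRF): every lane contributes 1 / (k + rank) for each episode it returns. The function name rrf_fuse, the constant k=60, and the episode IDs are all hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse per-lane rankings with reciprocal rank fusion.

    `rankings` maps a lane name to episode IDs ordered best-first.
    Returns (episode_id, score) pairs sorted by descending fused score.
    """
    scores: dict[str, float] = defaultdict(float)
    for lane_results in rankings.values():
        for rank, episode_id in enumerate(lane_results, start=1):
            scores[episode_id] += 1.0 / (k + rank)
    # Sort by score (descending), then by ID so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-lane results for a single query.
lane_rankings = {
    "lexical": ["ep-1", "ep-2", "ep-3"],
    "vector": ["ep-2", "ep-1", "ep-4"],
    "graph": ["ep-2", "ep-5"],
}
fused_ranking = rrf_fuse(lane_rankings)
print([episode_id for episode_id, _ in fused_ranking])
```

An episode that ranks well in several lanes (ep-2 here) rises to the top even if no single lane puts it first everywhere, which is the intuition behind fusing lanes rather than picking one.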
Python: high-level AgentMemory
AgentMemory is the recommended entry point. It is a thin wrapper over the server's
ContextService and the sochdb-memory backend, so it talks to a running SochDB
gRPC server via SochDBClient.
AgentMemory is a gRPC client. Start a server with the memory backend first:
cargo run -p sochdb-grpc --release
# or, once installed:
sochdb-grpc-server --host 127.0.0.1 --port 50051
Then point the client at it (default localhost:50051).
Constructor
AgentMemory(
client, # a connected SochDBClient
namespace="default",
*,
session_id=None, # defaults to namespace
token_limit=4096, # default recall budget
output_format="markdown" # "markdown" | "json" | "toon"
)
There is also a convenience factory, create_agent_memory(client, namespace=..., token_limit=...), which returns the same object.
Writing episodes
write_episode(
text,
*,
t_valid_from=None, # unix ms; when the fact became true (defaults to now)
metadata=None, # dict of arbitrary metadata
namespace=None # override the instance namespace
) -> EpisodeWriteResult
EpisodeWriteResult carries episode_id, t_created, lexical_indexed,
ingestion_lag_us, enrichment_queued, and error. Writes are
lexically indexed immediately; vector/graph enrichment is queued and runs
asynchronously (so enrichment_queued is typically True right after a write and a
three-lane search becomes richer once enrichment completes).
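The split between immediate lexical indexing and deferred enrichment can be pictured with a toy in-memory store. This is a sketch of the idea only: ToyMemoryStore and its fields are hypothetical, nothing here calls the SochDB SDK, and the real server's queueing and indexing are certainly more involved.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyMemoryStore:
    """In-memory stand-in: synchronous lexical index, deferred enrichment."""
    episodes: dict = field(default_factory=dict)        # episode_id -> text
    lexical_index: dict = field(default_factory=dict)   # token -> set of episode_ids
    enrichment_queue: deque = field(default_factory=deque)
    enriched: set = field(default_factory=set)

    def write_episode(self, text: str) -> int:
        episode_id = len(self.episodes) + 1
        self.episodes[episode_id] = text
        # Lexical indexing happens inline, so keyword recall works right away.
        for token in text.lower().split():
            self.lexical_index.setdefault(token.strip(".,"), set()).add(episode_id)
        # Slower enrichment (embeddings, graph extraction) is only queued here.
        self.enrichment_queue.append(episode_id)
        return episode_id

    def run_enrichment(self) -> int:
        """Drain the queue; a real system would do this in a background worker."""
        done = 0
        while self.enrichment_queue:
            self.enriched.add(self.enrichment_queue.popleft())
            done += 1
        return done


store = ToyMemoryStore()
eid = store.write_episode("Caroline reported a login failure.")
found_lexically = eid in store.lexical_index["login"]
enriched_before = eid in store.enriched
drained = store.run_enrichment()
enriched_after = eid in store.enriched
print(found_lexically, enriched_before, enriched_after)
```

Right after the write, the episode is reachable by keyword but not yet enriched, which mirrors why a lexical search works immediately while a three-lane search improves later.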
Searching
search(
query,
*,
token_limit=None, # defaults to the instance token_limit
lanes="lexical", # a QueryLanes value
format=None, # override output format for this call
namespace=None,
as_of=None # unix ms; bi-temporal point-in-time recall
) -> ContextQueryResult
ContextQueryResult carries context (the compiled string), total_tokens,
section_results (a list of ContextSectionResult with name, tokens_used,
truncated, content), and error. Pass as_of to restrict recall: only episodes
whose t_valid_from is at or before as_of are visible.
Other helpers: get_episode(doc_id), compile_context(sections, ...),
estimate_tokens(content), and format_context(content, format=...).
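To make the relationship between token_limit and the per-section tokens_used / truncated fields concrete, here is a minimal packing sketch. It assumes sections are filled in priority order and that a token is a whitespace-separated word; both are simplifications, and PackedSection and pack_sections are hypothetical names rather than SDK classes.

```python
from dataclasses import dataclass


@dataclass
class PackedSection:
    name: str
    tokens_used: int
    truncated: bool
    content: str


def pack_sections(sections, token_limit):
    """Pack (name, priority, content) tuples into a token budget.

    Lower priority numbers are packed first. Tokens are approximated as
    whitespace-separated words, which a real tokenizer would not do.
    """
    packed = []
    remaining = token_limit
    for name, _priority, content in sorted(sections, key=lambda s: s[1]):
        if remaining <= 0:
            break  # budget exhausted: the remaining sections are dropped
        words = content.split()
        kept = words[:remaining]
        packed.append(
            PackedSection(name, len(kept), len(kept) < len(words), " ".join(kept))
        )
        remaining -= len(kept)
    return packed


sections = [
    ("history", 1, "Caroline reported a login failure after the May 7 deploy"),
    ("facts", 0, "Caroline is on the Pro plan"),
    ("notes", 2, "low priority chatter"),
]
packed = pack_sections(sections, token_limit=10)
for section in packed:
    print(section.name, section.tokens_used, section.truncated)
```

With a budget of 10, the highest-priority section fits whole, the next one is cut to the remaining 4 tokens and flagged as truncated, and the lowest-priority section is dropped entirely.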
End-to-end example
from sochdb import SochDBClient, create_agent_memory, QueryLanes
# Connect to a running server with the memory backend enabled.
with SochDBClient("localhost:50051") as client:
memory = create_agent_memory(
client,
namespace="support-agent",
token_limit=4096,
)
# 1. Write episodes as they happen.
w1 = memory.write_episode(
"Caroline reported a login failure after the May 7 deploy.",
metadata={"speaker": "Caroline", "type": "conversation"},
)
w2 = memory.write_episode(
"Engineering rolled back the May 7 auth change at 14:20 UTC.",
metadata={"speaker": "ops", "type": "task"},
)
print(f"wrote {w1.episode_id} and {w2.episode_id}")
print(f"lexical_indexed={w1.lexical_indexed}, "
f"enrichment_queued={w1.enrichment_queued}")
# 2. Fast lexical recall (no embedding required).
lexical = memory.search("login failure", lanes=QueryLanes.LEXICAL)
print(f"lexical recall used {lexical.total_tokens} tokens")
print(lexical.context)
# 3. Three-lane recall (lexical + vector + graph) under a tighter budget.
fused = memory.search(
"auth rollback timeline",
lanes=QueryLanes.THREE_LANE,
token_limit=2000,
)
print(f"three-lane recall used {fused.total_tokens} tokens")
for section in fused.section_results:
print(f" {section.name}: {section.tokens_used} tokens, "
f"truncated={section.truncated}")
Bi-temporal recall with as_of
Because each episode has a t_valid_from, you can reconstruct what the agent knew at
a past moment. Only episodes valid at or before as_of are returned.
import time
# Record a point in time, then add a later episode.
cutoff_ms = int(time.time() * 1000)
memory.write_episode(
"Caroline upgraded to the Enterprise plan.",
t_valid_from=int(time.time() * 1000) + 60_000, # valid 1 min from now
)
# Recall as of `cutoff_ms` — the Enterprise upgrade is NOT yet visible.
past = memory.search("Caroline plan", lanes=QueryLanes.LEXICAL, as_of=cutoff_ms)
print(past.context) # reflects only what was true at cutoff_ms
as_of is a unix timestamp in milliseconds.
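The visibility rule itself is small enough to state in plain Python. The sketch below assumes only what is described above (an episode is visible when its t_valid_from is at or before as_of); the Episode dataclass and visible_as_of function are hypothetical stand-ins and do not touch the SDK.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Episode:
    text: str
    t_valid_from: int  # unix milliseconds


def visible_as_of(episodes: List[Episode], as_of: Optional[int]) -> List[Episode]:
    """Return episodes valid at or before `as_of`; None means no time filter."""
    if as_of is None:
        return list(episodes)
    return [ep for ep in episodes if ep.t_valid_from <= as_of]


episodes = [
    Episode("Caroline is on the Pro plan.", t_valid_from=1_000),
    Episode("Caroline upgraded to the Enterprise plan.", t_valid_from=61_000),
]
past_view = visible_as_of(episodes, as_of=30_000)
current_view = visible_as_of(episodes, as_of=None)
print([ep.text for ep in past_view])
```

The comparison is inclusive, so an episode whose t_valid_from equals as_of exactly is still visible.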
Python: lower-level pipelines
The 0.5.9 SDK also ships the building blocks behind agent memory, importable from
sochdb. Use these when you want to control extraction and consolidation yourself
rather than going through AgentMemory.
| Class | Factory | Purpose |
|---|---|---|
| ExtractionPipeline | create_extraction_pipeline | Turn LLM output into typed Entity / Relation / Assertion records and commit them. |
| Consolidator | create_consolidator | Merge raw assertions from many sources into CanonicalFacts, handling contradictions. |
| HybridRetriever | create_retriever | BM25 + vector hybrid retrieval returning RetrievalResults. |
| NamespaceManager | create_namespace_manager | Manage per-tenant memory namespaces. |
Supporting dataclasses: Entity, Relation, Assertion, CanonicalFact,
RetrievalResult.
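The guide does not say how the Consolidator resolves contradictions. As one plausible policy, sketched under that assumption, the code below groups claims by (subject, predicate), sums confidence per value, keeps the best-supported value, and records the losing values. RawClaim, MergedFact, and consolidate_claims are hypothetical names chosen to avoid implying they are the SDK's Assertion or CanonicalFact types.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RawClaim:
    subject: str
    predicate: str
    value: str
    confidence: float
    source: str


@dataclass(frozen=True)
class MergedFact:
    subject: str
    predicate: str
    value: str
    support: float
    sources: Tuple[str, ...]
    contradicted_by: Tuple[str, ...]


def consolidate_claims(claims):
    """Keep the best-supported value per (subject, predicate)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for claim in claims:
        grouped[(claim.subject, claim.predicate)][claim.value].append(claim)

    facts = []
    for (subject, predicate), by_value in sorted(grouped.items()):
        support = {v: sum(c.confidence for c in cs) for v, cs in by_value.items()}
        # Iterating in sorted order makes ties resolve alphabetically.
        winner = max(sorted(support), key=lambda v: support[v])
        facts.append(MergedFact(
            subject, predicate, winner, support[winner],
            tuple(sorted({c.source for c in by_value[winner]})),
            tuple(sorted(v for v in support if v != winner)),
        ))
    return facts


claims = [
    RawClaim("Caroline", "plan", "Pro", 0.6, "chat"),
    RawClaim("Caroline", "plan", "Enterprise", 0.5, "billing"),
    RawClaim("Caroline", "plan", "Enterprise", 0.4, "email"),
]
facts = consolidate_claims(claims)
print(facts[0].value, facts[0].contradicted_by)
```

Two moderately confident sources outweigh one slightly more confident source here; a recency-based or source-trust policy would be an equally reasonable choice.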
Higher-level orchestration patterns shown in older material (a ContextQueryBuilder /
ContextComponent context builder, graph-overlay demos, policy hooks, and tool
routing) are examples in the sochdb-python-examples repo, not
importable SDK classes. The importable agent-memory surface in Python is
AgentMemory plus the pipeline classes above. See the
Token-Aware Context Query and
Policy & Safety Hooks guides for how those patterns map
across SDKs.
Node.js: the memory module
The Node.js SDK (@sochdb/sochdb) ships the same lower-level pipeline trio (added in
v0.4.2). There is no single AgentMemory facade; you compose the pieces yourself
against an EmbeddedDatabase.
import { EmbeddedDatabase, ExtractionPipeline, Consolidator, HybridRetriever, AllowedSet } from '@sochdb/sochdb';
// EmbeddedDatabase.open() is synchronous (no await).
const db = EmbeddedDatabase.open('./agent_memory.sdb');
// 1. Extract entities/relations/assertions from text via your own LLM extractor.
const pipeline = new ExtractionPipeline(db, 'support-agent');
const result = await pipeline.extractAndCommit(
'Caroline reported a login failure after the May 7 deploy.',
async (text) => {
// call your LLM; return { entities, relations, assertions }
return { entities: [], relations: [], assertions: [] };
},
);
// 2. Consolidate raw assertions into canonical facts.
const consolidator = new Consolidator(db, 'support-agent');
const merged = consolidator.consolidate();
// 3. Retrieve with a BM25 + vector hybrid.
// queryEmbedding is the query vector produced by your own embedding model (not shown here).
const retriever = HybridRetriever.fromDatabase(db, 'support-agent', 'episodes');
const hits = await retriever.retrieve(
'login failure', // query text (BM25 lane)
queryEmbedding, // query vector (vector lane)
AllowedSet.allowAll(), // pre-filter
10, // k
);
db.close();
RetrievalConfig defaults: k=10, alpha=0.5, enableRerank=false,
rerankK=100. AllowedSet factories — AllowedSet.fromIds(...),
.fromNamespace(...), .fromFilter(...), .allowAll() — pre-filter candidates
before retrieval.
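The exact meaning of alpha is not spelled out above. The following Python sketch assumes it weights the vector score against a max-normalized keyword score, and that the allowed set is applied before any scoring; both are assumptions for illustration, and hybrid_retrieve is a hypothetical function, not part of the Node or Python SDK.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(keyword_scores, embeddings, query_embedding,
                    allowed_ids=None, k=10, alpha=0.5):
    """Blend keyword and vector scores over pre-filtered candidates."""
    # Pre-filter: only allowed documents are ever scored.
    candidates = [d for d in embeddings if allowed_ids is None or d in allowed_ids]
    max_kw = max((keyword_scores.get(d, 0.0) for d in candidates), default=0.0)
    scored = []
    for doc_id in candidates:
        kw = keyword_scores.get(doc_id, 0.0) / max_kw if max_kw else 0.0
        vec = cosine(embeddings[doc_id], query_embedding)
        scored.append((doc_id, alpha * vec + (1 - alpha) * kw))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
keyword_scores = {"a": 2.0, "b": 4.0}
hits = hybrid_retrieve(keyword_scores, embeddings, [1.0, 0.0],
                       allowed_ids={"a", "b"}, k=10, alpha=0.5)
print(hits)
```

Document "c" never appears in the results because it is excluded before scoring, which is the practical difference between a pre-filter and filtering the top-k afterwards.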
For token-aware prompt assembly, pair the retriever with the Node
ContextQueryBuilder (see Token-Aware Context Query).
Go: the memory_* pipelines
The Go SDK ships the same trio (ExtractionPipeline, Consolidator,
HybridRetriever), but they depend on the embedded FFI engine.
sochdb_embedded build tag
The Go SDK is remote-first by default: a plain go build compiles only the
remote/gRPC path. The memory pipelines live in files gated behind
//go:build sochdb_embedded, so you must build with the tag (which also requires the
native libsochdb_storage, CGO, and pkg-config):
go build -tags sochdb_embedded ./...
The pure-Go memory types (Entity, Relation, Assertion, RawAssertion,
CanonicalFact, RetrievalResult, RetrievalResponse, and the ExtractionSchema
/ ConsolidationConfig / RetrievalConfig configs) always compile; only the
pipeline implementations are gated.
//go:build sochdb_embedded
package main
import (
"github.com/sochdb/sochdb-go"
"github.com/sochdb/sochdb-go/embedded"
)
func main() {
db, err := embedded.Open("./agent_memory.sdb")
if err != nil {
panic(err)
}
defer db.Close()
// Extraction: compile LLM output into typed records.
pipeline := sochdb.NewExtractionPipeline(db, "support-agent", nil)
_, _ = pipeline.ExtractAndCommit(
"Caroline reported a login failure after the May 7 deploy.",
func(text string) (map[string]interface{}, error) {
// call your LLM; return extracted entities/relations/assertions
return map[string]interface{}{}, nil
},
)
// Consolidation: merge raw assertions into canonical facts.
consolidator := sochdb.NewConsolidator(db, "support-agent", nil)
_, _ = consolidator.Consolidate()
// Retrieval: BM25 + semantic hybrid.
retriever := sochdb.NewHybridRetriever(db, "support-agent", nil)
_, _ = retriever.Retrieve("login failure", sochdb.NewAllAllowedSet())
}
Rust: memory as composable overlays
The Rust crate (sochdb 2.0.3) does not ship a single AgentMemory type. Instead it
gives you composable building blocks that are generic over any connection
implementing ConnectionTrait:
- atomic_memory: write multi-part memory (blobs, embeddings, graph nodes and edges) crash-safely as one unit, with intent recovery.
- temporal_graph: bi-temporal edges (add_edge_at, get_edges_at, neighbors_at, edge_history) for time-travel queries.
- graph: entity/relation graph traversal (bfs, dfs, shortest_path).
- semantic_cache: cache LLM responses keyed by embedding similarity.
use std::sync::Arc;
use sochdb::SochConnection;
use sochdb::atomic_memory::{AtomicMemoryWriter, MemoryWriteBuilder};
let conn = Arc::new(SochConnection::open("./agent_memory")?);
let writer = AtomicMemoryWriter::new(conn.clone());
// Assemble a memory write: blob + embedding + graph structure, committed atomically.
// embedding_vec is the episode's embedding from your own model (not shown here).
let memory = MemoryWriteBuilder::new("episode-001")
.put_blob("text", b"Caroline reported a login failure after the May 7 deploy.")
.put_embedding("embedding", embedding_vec)
.create_node("caroline", "user")
.create_edge("caroline", "reported", "episode-001");
writer.write_atomic(memory)?;
// On startup, recover any partially-applied memory writes.
let report = writer.recover()?;
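The general idea behind an intent-based atomic write can be sketched in a few lines of Python. This is a conceptual model only: it assumes recovery rolls pending writes forward, which the guide does not confirm, and ToyAtomicWriter with its crash_after parameter is a hypothetical teaching device rather than a port of AtomicMemoryWriter.

```python
class ToyAtomicWriter:
    """Intent-log sketch: record the plan, apply it, then clear the intent."""

    def __init__(self):
        self.store = {}    # applied key -> value
        self.intents = {}  # memory_id -> full list of (key, value) operations

    def write_atomic(self, memory_id, ops, crash_after=None):
        # 1. Record the full intent before touching the store.
        self.intents[memory_id] = list(ops)
        # 2. Apply each op; `crash_after` simulates a crash part-way through.
        for applied, (key, value) in enumerate(ops):
            if crash_after is not None and applied >= crash_after:
                raise RuntimeError("simulated crash")
            self.store[key] = value
        # 3. Only a fully applied write clears its intent.
        del self.intents[memory_id]

    def recover(self):
        """Roll forward every write whose intent is still pending."""
        recovered = []
        for memory_id in list(self.intents):
            for key, value in self.intents.pop(memory_id):
                self.store[key] = value  # re-applying is idempotent
            recovered.append(memory_id)
        return recovered


writer = ToyAtomicWriter()
ops = [
    ("episode-001/text", "Caroline reported a login failure."),
    ("episode-001/embedding", [0.1, 0.2]),
    ("edge/caroline/reported", "episode-001"),
]
try:
    writer.write_atomic("episode-001", ops, crash_after=1)
except RuntimeError:
    pass
applied_before_recovery = len(writer.store)
recovered_ids = writer.recover()
applied_after_recovery = len(writer.store)
print(applied_before_recovery, recovered_ids, applied_after_recovery)
```

Because each operation overwrites a key with the same value, replaying the whole intent after a crash is safe even for the parts that were already applied.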
For time-travel recall, build a TemporalGraphOverlay and query at a timestamp:
use sochdb::temporal_graph::TemporalGraphOverlay;
let graph = TemporalGraphOverlay::new(conn.clone(), "support-agent");
let edges_now = graph.get_edges_at(&node_id, current_ts)?;
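A point-in-time edge query boils down to an interval check. The Python sketch below assumes each edge has a half-open validity interval [valid_from, valid_until), with None meaning still valid; the overlay's real storage layout is not described in this guide, and TemporalEdge and edges_at are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TemporalEdge:
    src: str
    label: str
    dst: str
    valid_from: int                    # unix ms, inclusive
    valid_until: Optional[int] = None  # unix ms, exclusive; None = still valid


def edges_at(edges: List[TemporalEdge], node: str, ts: int) -> List[TemporalEdge]:
    """Return outgoing edges of `node` that are valid at timestamp `ts`."""
    return [
        e for e in edges
        if e.src == node
        and e.valid_from <= ts
        and (e.valid_until is None or ts < e.valid_until)
    ]


edges = [
    TemporalEdge("caroline", "on_plan", "pro", valid_from=100, valid_until=200),
    TemporalEdge("caroline", "on_plan", "enterprise", valid_from=200),
]
plan_then = [e.dst for e in edges_at(edges, "caroline", 150)]
plan_now = [e.dst for e in edges_at(edges, "caroline", 200)]
print(plan_then, plan_now)
```

Using a half-open interval means the old edge stops being visible at exactly the moment the new one starts, so a query never sees both plans at once.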
See the Graph Overlay guide for traversal details.
MCP: memory tools for LLM clients
The standalone MCP server (sochdb-mcp) exposes agent memory directly to LLM clients
(Claude Desktop, Cursor, Goose) over stdio. The memory tools are:
| Tool | What it does | Key args |
|---|---|---|
| memory_search_episodes | Semantic-similarity search over episodes | query, k (default 5), episode_type, entity_id |
| memory_get_episode_timeline | Event timeline for one episode | episode_id, max_events (default 50), role, include_metrics |
| memory_search_entities | Search entities (users, projects, documents, …) | query, k (default 10), kind |
| memory_get_entity_facts | Facts about an entity | entity_id, include_episodes (default true), max_episodes (default 5) |
| memory_build_context | One-shot token-budgeted context packing | goal, token_budget (default 4096), session_id, episode_id, entity_ids, include_schema |
The authoritative tool names are the underscore forms above
(memory_search_episodes, etc.), taken from the Rust source
(tools.rs::get_built_in_tools()). The dot-named catalog (memory.search_episodes)
in the bundled mcp.json and the MCP README is stale — do not rely on it.
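MCP clients invoke tools with a JSON-RPC 2.0 request whose method is tools/call. The sketch below builds such a request for memory_build_context using the underscore name and two of the arguments from the table; the argument value types (string goal, integer token_budget) and the helper name build_tool_call are assumptions, and most clients construct this message for you.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    1,
    "memory_build_context",
    {"goal": "Summarize Caroline's login issue", "token_budget": 2000},
)
# Serialized to a single line, as a stdio client would send it.
wire_message = json.dumps(request)
print(wire_message)
```

Omitting token_budget would leave the server to apply its default of 4096, per the table above.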
The MCP server has an optional semantic-search Cargo feature (and an
SOCHDB_SEMANTIC_SEARCH=1 runtime switch) that loads an embedding model. Even when
enabled, the vector search path is currently disabled in code "pending backend
support" — embeddings are generated but the memory tools fall back to keyword/text
scan. Treat MCP memory recall as lexical for now.
Run the server and point a client at it:
cargo build --release --package sochdb-mcp
./target/release/sochdb-mcp --db ./sochdb_data
Claude Desktop config (claude_desktop_config.json):
{
"mcpServers": {
"sochdb": {
"command": "/absolute/path/to/sochdb-mcp",
"args": ["--db", "/absolute/path/to/sochdb_data"],
"env": { "RUST_LOG": "info" }
}
}
}
See the Deployment guide for running the gRPC server that
backs the Python AgentMemory client, and the
Token-Aware Context Query guide for the budgeting model
shared by search / memory_build_context.
Choosing an approach
- Want the least code and a real recall pipeline? Use Python AgentMemory against a running gRPC server.
- Building inside a Node or Go app and willing to wire extraction yourself? Use the memory pipelines (Go needs -tags sochdb_embedded).
- Need crash-safe, in-process memory with full control over storage? Use the Rust atomic_memory + temporal_graph overlays.
- Exposing memory to a desktop LLM client? Use the MCP memory_* tools.