Agent Memory

Agent memory is SochDB's flagship AI-native feature. You write episodes (raw turns, observations, tasks) as they happen, and later recall the relevant ones compiled into a token-budgeted context string ready to drop into an LLM prompt. Retrieval can run over several lanes (lexical, BM25, trigram, hybrid, or a fused three-lane mode) and is bi-temporal: every episode carries a valid-from timestamp, so you can ask "what did the agent know as of last Tuesday?"

This guide covers the agent-memory surface in each SDK. The shape and the level of "batteries included" differ per language, so the per-SDK notes below are honest about what is a high-level client, what is a set of lower-level pipelines, and what is reached only through a running server.

Where agent memory lives in each SDK
  • Python (0.5.9 SDK) ships the highest-level API: AgentMemory (a thin client over the server's ContextService + sochdb-memory backend) plus lower-level pipelines (ExtractionPipeline, Consolidator, HybridRetriever).
  • Node.js (0.5.3 SDK) ships the lower-level memory module (ExtractionPipeline, Consolidator, HybridRetriever) — no single AgentMemory facade.
  • Go (0.4.5 SDK) ships the same pipeline trio, but only when built with the sochdb_embedded build tag.
  • Rust (sochdb 2.0.3) exposes memory as composable overlays: atomic_memory (crash-safe multi-part memory writes) plus temporal_graph, graph, and semantic_cache.
  • MCP exposes memory to any LLM client through the memory_* tools.

Concepts

| Concept | What it is |
|---|---|
| Episode | A unit of memory: a conversation turn, observation, task, or workflow step, stored as text with metadata. |
| Lane | A retrieval path. SochDB has lexical (keyword), BM25, trigram, vector, and a fused "three-lane" mode. |
| Token budget | A cap on the compiled context size. Lower-priority content is dropped or truncated to fit. |
| Bi-temporal recall | Episodes carry a valid-from time; as_of queries only see episodes valid at or before that point. |
| Entity / Fact | Higher-level structures extracted from episodes (people, projects, documents) and the consolidated facts about them. |
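
The token-budget rule in the table can be pictured with a small, SochDB-independent sketch. Everything here is an assumption made for illustration: the 4-characters-per-token estimate, the Section type, the "lower number = higher priority" convention, and the greedy packing order. It is not how the server compiles context, only one way that "drop or truncate to fit" can work.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Crude stand-in estimator: about 4 characters per token (an assumption)."""
    return (len(text) + 3) // 4


@dataclass
class Section:
    name: str
    priority: int  # lower number = more important (assumed convention)
    content: str


def pack(sections, token_limit):
    """Greedy packing: most important first, truncate the first section that
    does not fit, and drop everything after the budget is exhausted.

    Returns a list of (name, content, truncated) tuples.
    """
    remaining = token_limit
    packed = []
    for section in sorted(sections, key=lambda s: s.priority):
        if remaining <= 0:
            break  # budget exhausted: lower-priority sections are dropped
        need = estimate_tokens(section.content)
        if need <= remaining:
            packed.append((section.name, section.content, False))
            remaining -= need
        else:
            # Keep only as many characters as the remaining budget allows.
            packed.append((section.name, section.content[: remaining * 4], True))
            remaining = 0
    return packed


sections = [
    Section("history", priority=2, content="x" * 400),
    Section("facts", priority=1, content="y" * 40),
    Section("scratch", priority=3, content="z" * 40),
]
for name, content, truncated in pack(sections, token_limit=50):
    print(name, estimate_tokens(content), truncated)
```

With a 50-token budget the high-priority section fits whole, the next one is cut to the remaining budget, and the lowest-priority one is dropped.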

In Python, retrieval lanes are named constants on QueryLanes:

from sochdb import QueryLanes

QueryLanes.LEXICAL # "lexical" — fast keyword recall, no embedding needed
QueryLanes.THREE_LANE # "three_lane" — lexical + vector + graph, fused
QueryLanes.HYBRID # "hybrid" — vector + lexical fusion
QueryLanes.BM25 # "bm25" — BM25 ranking
QueryLanes.TRIGRAM # "trigram" — trigram (substring/fuzzy) matching

Python: high-level AgentMemory

AgentMemory is the recommended entry point. It is a thin wrapper over the server's ContextService and the sochdb-memory backend, so it talks to a running SochDB gRPC server via SochDBClient.

Requires a running server

AgentMemory is a gRPC client. Start a server with the memory backend first:

cargo run -p sochdb-grpc --release
# or, once installed:
sochdb-grpc-server --host 127.0.0.1 --port 50051

Then point the client at it (default localhost:50051).
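
Before constructing a client, it can help to confirm that something is listening on the server port. The helper below is a minimal sketch using only the Python standard library; server_reachable is a hypothetical name, not part of the SDK, and it only proves that a TCP connection succeeds, not that the memory backend is enabled.

```python
import socket


def server_reachable(host: str = "127.0.0.1", port: int = 50051,
                     timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    This is a plain socket probe, not a gRPC health check.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    if not server_reachable():
        print("No server on 127.0.0.1:50051; start sochdb-grpc-server first.")
```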

Constructor

AgentMemory(
    client,                    # a connected SochDBClient
    namespace="default",
    *,
    session_id=None,           # defaults to namespace
    token_limit=4096,          # default recall budget
    output_format="markdown"   # "markdown" | "json" | "toon"
)

There is also a convenience factory, create_agent_memory(client, namespace=..., token_limit=...), which returns the same object.

Writing episodes

write_episode(
    text,
    *,
    t_valid_from=None,   # unix ms; when the fact became true (defaults to now)
    metadata=None,       # dict of arbitrary metadata
    namespace=None       # override the instance namespace
) -> EpisodeWriteResult

EpisodeWriteResult carries episode_id, t_created, lexical_indexed, ingestion_lag_us, enrichment_queued, and error. Writes are lexically indexed immediately; vector/graph enrichment is queued and runs asynchronously. As a result, enrichment_queued is typically True right after a write, and a three-lane search returns richer results once enrichment completes.
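
One way to act on those fields is to fold them into a single status. The sketch below uses a stand-in dataclass so that it runs without a server; WriteResultStandIn and write_status are hypothetical names, and the field types (for example, error being None or a string) are assumptions rather than the SDK's documented types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WriteResultStandIn:
    """Hypothetical stand-in mirroring the fields listed above (types assumed)."""
    episode_id: str
    t_created: int
    lexical_indexed: bool
    ingestion_lag_us: int
    enrichment_queued: bool
    error: Optional[str] = None


def write_status(result) -> str:
    """Collapse a write result into one of four coarse states."""
    if result.error:
        return "failed"
    if not result.lexical_indexed:
        return "pending_index"
    if result.enrichment_queued:
        # Lexical search works now; vector/graph lanes improve later.
        return "searchable_enrichment_pending"
    return "searchable"


fresh = WriteResultStandIn("ep-1", 1_715_091_600_000, True, 850, True)
print(write_status(fresh))
```

The same function should work on a real result object as long as it exposes attributes with these names.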

Searching

search(
    query,
    *,
    token_limit=None,   # defaults to the instance token_limit
    lanes="lexical",    # a QueryLanes value
    format=None,        # override output format for this call
    namespace=None,
    as_of=None          # unix ms; bi-temporal point-in-time recall
) -> ContextQueryResult

ContextQueryResult carries context (the compiled string), total_tokens, section_results (a list of ContextSectionResult with name, tokens_used, truncated, content), and error. Pass as_of to restrict recall: only episodes whose t_valid_from is at or before as_of are visible.
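
A common follow-up is to check how much of the budget a recall consumed and which sections were cut. The following is a minimal sketch against stand-in dataclasses that mirror the field names above; SectionStandIn, QueryResultStandIn, and recall_report are hypothetical, and treating error as None-or-string is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SectionStandIn:
    name: str
    tokens_used: int
    truncated: bool
    content: str


@dataclass
class QueryResultStandIn:
    context: str
    total_tokens: int
    section_results: List[SectionStandIn] = field(default_factory=list)
    error: Optional[str] = None


def recall_report(result, token_limit: int) -> dict:
    """Summarise budget usage and list the sections that were truncated."""
    if result.error:
        raise RuntimeError(f"recall failed: {result.error}")
    return {
        "budget_used": result.total_tokens / token_limit,
        "truncated": [s.name for s in result.section_results if s.truncated],
    }


result = QueryResultStandIn(
    context="...",
    total_tokens=1000,
    section_results=[
        SectionStandIn("episodes", 800, True, "..."),
        SectionStandIn("facts", 200, False, "..."),
    ],
)
print(recall_report(result, token_limit=2000))
```

If the truncated list is non-empty on most calls, that is a hint to raise token_limit or narrow the query.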

Other helpers: get_episode(doc_id), compile_context(sections, ...), estimate_tokens(content), and format_context(content, format=...).

End-to-end example

from sochdb import SochDBClient, create_agent_memory, QueryLanes

# Connect to a running server with the memory backend enabled.
with SochDBClient("localhost:50051") as client:
    memory = create_agent_memory(
        client,
        namespace="support-agent",
        token_limit=4096,
    )

    # 1. Write episodes as they happen.
    w1 = memory.write_episode(
        "Caroline reported a login failure after the May 7 deploy.",
        metadata={"speaker": "Caroline", "type": "conversation"},
    )
    w2 = memory.write_episode(
        "Engineering rolled back the May 7 auth change at 14:20 UTC.",
        metadata={"speaker": "ops", "type": "task"},
    )
    print(f"wrote {w1.episode_id} and {w2.episode_id}")
    print(f"lexical_indexed={w1.lexical_indexed}, "
          f"enrichment_queued={w1.enrichment_queued}")

    # 2. Fast lexical recall (no embedding required).
    lexical = memory.search("login failure", lanes=QueryLanes.LEXICAL)
    print(f"lexical recall used {lexical.total_tokens} tokens")
    print(lexical.context)

    # 3. Three-lane recall (lexical + vector + graph) under a tighter budget.
    fused = memory.search(
        "auth rollback timeline",
        lanes=QueryLanes.THREE_LANE,
        token_limit=2000,
    )
    print(f"three-lane recall used {fused.total_tokens} tokens")
    for section in fused.section_results:
        print(f"  {section.name}: {section.tokens_used} tokens, "
              f"truncated={section.truncated}")

Bi-temporal recall with as_of

Because each episode has a t_valid_from, you can reconstruct what the agent knew at a past moment. Only episodes valid at or before as_of are returned.

import time

# Assumes `memory` is the AgentMemory instance from the end-to-end example
# (that is, this code runs while the client connection is still open).

# Record a point in time, then add a later episode.
cutoff_ms = int(time.time() * 1000)
memory.write_episode(
    "Caroline upgraded to the Enterprise plan.",
    t_valid_from=cutoff_ms + 60_000,  # valid 1 min after the cutoff
)

# Recall as of `cutoff_ms`: the Enterprise upgrade is NOT yet visible.
past = memory.search("Caroline plan", lanes=QueryLanes.LEXICAL, as_of=cutoff_ms)
print(past.context)  # reflects only what was true at cutoff_ms

as_of is a unix timestamp in milliseconds.
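
Because both t_valid_from and as_of are unix milliseconds, calendar times need converting first. The sketch below shows one way to do that with the standard library, plus a toy model of the visibility rule (an episode is visible when its t_valid_from is at or before as_of). The names to_unix_ms and visible_as_of are hypothetical helpers, and the list-of-dicts episode store is purely illustrative; the real filtering happens on the server.

```python
from datetime import datetime, timezone


def to_unix_ms(dt: datetime) -> int:
    """Convert a timezone-aware datetime to unix milliseconds."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time surprises")
    return int(dt.timestamp() * 1000)


def visible_as_of(episodes, as_of_ms: int):
    """Toy model of the rule: keep episodes with t_valid_from <= as_of."""
    return [e for e in episodes if e["t_valid_from"] <= as_of_ms]


episodes = [
    {"text": "Caroline is on the Pro plan.",
     "t_valid_from": to_unix_ms(datetime(2024, 5, 1, tzinfo=timezone.utc))},
    {"text": "Caroline upgraded to the Enterprise plan.",
     "t_valid_from": to_unix_ms(datetime(2024, 5, 10, tzinfo=timezone.utc))},
]

as_of = to_unix_ms(datetime(2024, 5, 7, 14, 20, tzinfo=timezone.utc))
for episode in visible_as_of(episodes, as_of):
    print(episode["text"])
```

With a real client, the same as_of value would be passed straight to search(..., as_of=as_of).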

Python: lower-level pipelines

The 0.5.9 SDK also ships the building blocks behind agent memory, importable from sochdb. Use these when you want to control extraction and consolidation yourself rather than going through AgentMemory.

| Class | Factory | Purpose |
|---|---|---|
| ExtractionPipeline | create_extraction_pipeline | Turn LLM output into typed Entity / Relation / Assertion records and commit them. |
| Consolidator | create_consolidator | Merge raw assertions from many sources into CanonicalFacts, handling contradictions. |
| HybridRetriever | create_retriever | BM25 + vector hybrid retrieval returning RetrievalResults. |
| NamespaceManager | create_namespace_manager | Manage per-tenant memory namespaces. |

Supporting dataclasses: Entity, Relation, Assertion, CanonicalFact, RetrievalResult.
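
To make "merging raw assertions into canonical facts" concrete, here is a deliberately simple, SDK-independent sketch. It is not the Consolidator's algorithm: the tuple shape (subject, predicate, value, confidence), the confidence-sum voting rule, and the consolidate function itself are all assumptions chosen to show the general idea of resolving contradictions.

```python
from collections import defaultdict


def consolidate(assertions):
    """Merge raw assertions into one fact per (subject, predicate).

    assertions: iterable of (subject, predicate, value, confidence) tuples.
    The value with the highest summed confidence wins; a fact is marked
    "contested" when sources disagreed on the value.
    """
    votes = defaultdict(lambda: defaultdict(float))
    for subject, predicate, value, confidence in assertions:
        votes[(subject, predicate)][value] += confidence

    facts = {}
    for key, by_value in votes.items():
        winner = max(by_value, key=by_value.get)
        facts[key] = {"value": winner, "contested": len(by_value) > 1}
    return facts


raw = [
    ("caroline", "plan", "pro", 0.6),
    ("caroline", "plan", "enterprise", 0.9),
    ("caroline", "plan", "enterprise", 0.5),
    ("caroline", "role", "admin", 0.8),
]
for key, fact in consolidate(raw).items():
    print(key, fact)
```

A production consolidator would typically also weigh recency and source reliability; this sketch ignores both.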

Example-only patterns

Higher-level orchestration patterns shown in older material (a ContextQueryBuilder / ContextComponent context builder, graph-overlay demos, policy hooks, and tool routing) are examples in the sochdb-python-examples repo, not importable SDK classes. The importable agent-memory surface in Python is AgentMemory plus the pipeline classes above. See the Token-Aware Context Query and Policy & Safety Hooks guides for how those patterns map across SDKs.

Node.js: the memory module

The Node.js SDK (@sochdb/sochdb) ships the same lower-level pipeline trio (added in v0.4.2). There is no single AgentMemory facade; you compose the pieces yourself against an EmbeddedDatabase.

import {
  EmbeddedDatabase,
  ExtractionPipeline,
  Consolidator,
  HybridRetriever,
  AllowedSet,
} from '@sochdb/sochdb';

// EmbeddedDatabase.open() is synchronous (no await).
const db = EmbeddedDatabase.open('./agent_memory.sdb');

// 1. Extract entities/relations/assertions from text via your own LLM extractor.
const pipeline = new ExtractionPipeline(db, 'support-agent');
const result = await pipeline.extractAndCommit(
  'Caroline reported a login failure after the May 7 deploy.',
  async (text) => {
    // call your LLM; return { entities, relations, assertions }
    return { entities: [], relations: [], assertions: [] };
  },
);

// 2. Consolidate raw assertions into canonical facts.
const consolidator = new Consolidator(db, 'support-agent');
const merged = consolidator.consolidate();

// 3. Retrieve with a BM25 + vector hybrid.
// queryEmbedding is the query vector from your embedding model (not shown here).
const retriever = HybridRetriever.fromDatabase(db, 'support-agent', 'episodes');
const hits = await retriever.retrieve(
  'login failure',        // query text (BM25 lane)
  queryEmbedding,         // query vector (vector lane)
  AllowedSet.allowAll(),  // pre-filter
  10,                     // k
);

db.close();

RetrievalConfig defaults: k=10, alpha=0.5, enableRerank=false, rerankK=100. AllowedSet factories — AllowedSet.fromIds(...), .fromNamespace(...), .fromFilter(...), .allowAll() — pre-filter candidates before retrieval.
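
For intuition about what an alpha of 0.5 means in a BM25 + vector hybrid, here is a language-neutral sketch, written in Python so it runs standalone. It assumes the common convention that alpha weights the vector lane and (1 - alpha) weights BM25, with min-max normalisation per lane; the SDK's actual fusion formula is not documented here and may differ. min_max and fuse are hypothetical names.

```python
def min_max(scores):
    """Rescale a {doc_id: score} map to the 0..1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(bm25, vector, alpha=0.5, k=10):
    """Weighted-sum fusion: alpha * vector + (1 - alpha) * bm25 (assumed convention).

    A document missing from one lane contributes 0 for that lane.
    Returns the top-k (doc_id, fused_score) pairs, best first.
    """
    b, v = min_max(bm25), min_max(vector)
    fused = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * b.get(doc_id, 0.0)
        for doc_id in set(b) | set(v)
    }
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


bm25_scores = {"a": 10.0, "b": 5.0, "c": 0.0}
vector_scores = {"b": 0.9, "c": 0.8, "d": 0.1}
for doc_id, score in fuse(bm25_scores, vector_scores, alpha=0.5):
    print(doc_id, round(score, 4))
```

Under this convention, alpha=1.0 ranks purely by vector similarity and alpha=0.0 purely by BM25.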

For token-aware prompt assembly, pair the retriever with the Node ContextQueryBuilder (see Token-Aware Context Query).

Go: the memory_* pipelines

The Go SDK ships the same trio (ExtractionPipeline, Consolidator, HybridRetriever), but they depend on the embedded FFI engine.

Requires the sochdb_embedded build tag

The Go SDK is remote-first by default — a plain go build compiles only the remote/gRPC path. The memory pipelines live in files gated behind //go:build sochdb_embedded, so you must build with the tag (which also requires the native libsochdb_storage, CGO, and pkg-config):

go build -tags sochdb_embedded ./...

The pure-Go memory types (Entity, Relation, Assertion, RawAssertion, CanonicalFact, RetrievalResult, RetrievalResponse, and the ExtractionSchema / ConsolidationConfig / RetrievalConfig configs) always compile; only the pipeline implementations are gated.

//go:build sochdb_embedded

package main

import (
	"github.com/sochdb/sochdb-go"
	"github.com/sochdb/sochdb-go/embedded"
)

func main() {
	db, err := embedded.Open("./agent_memory.sdb")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Extraction: compile LLM output into typed records.
	pipeline := sochdb.NewExtractionPipeline(db, "support-agent", nil)
	_, _ = pipeline.ExtractAndCommit(
		"Caroline reported a login failure after the May 7 deploy.",
		func(text string) (map[string]interface{}, error) {
			// call your LLM; return extracted entities/relations/assertions
			return map[string]interface{}{}, nil
		},
	)

	// Consolidation: merge raw assertions into canonical facts.
	consolidator := sochdb.NewConsolidator(db, "support-agent", nil)
	_, _ = consolidator.Consolidate()

	// Retrieval: BM25 + semantic hybrid.
	retriever := sochdb.NewHybridRetriever(db, "support-agent", nil)
	_, _ = retriever.Retrieve("login failure", sochdb.NewAllAllowedSet())
}

Rust: memory as composable overlays

The Rust crate (sochdb 2.0.3) does not ship a single AgentMemory type. Instead it gives you composable building blocks that are generic over any connection implementing ConnectionTrait:

  • atomic_memory — write multi-part memory (blobs, embeddings, graph nodes and edges) crash-safely as one unit, with intent recovery.
  • temporal_graph — bi-temporal edges (add_edge_at, get_edges_at, neighbors_at, edge_history) for time-travel queries.
  • graph — entity/relation graph traversal (bfs, dfs, shortest_path).
  • semantic_cache — cache LLM responses keyed by embedding similarity.

use std::sync::Arc;
use sochdb::SochConnection;
use sochdb::atomic_memory::{AtomicMemoryWriter, MemoryWriteBuilder};

let conn = Arc::new(SochConnection::open("./agent_memory")?);
let writer = AtomicMemoryWriter::new(conn.clone());

// Assemble a memory write: blob + embedding + graph structure, committed atomically.
// embedding_vec is the embedding produced by your model (not shown here).
let memory = MemoryWriteBuilder::new("episode-001")
    .put_blob("text", b"Caroline reported a login failure after the May 7 deploy.")
    .put_embedding("embedding", embedding_vec)
    .create_node("caroline", "user")
    .create_edge("caroline", "reported", "episode-001");

writer.write_atomic(memory)?;

// On startup, recover any partially-applied memory writes.
let report = writer.recover()?;
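
The idea behind intent recovery can be shown with a toy, in-memory model, written in Python so it runs standalone. This is a generic write-ahead-intent pattern and an assumption about the general approach, not a description of how atomic_memory is implemented: ToyStore, its dict-based storage, and the crash_after switch are all hypothetical.

```python
class ToyStore:
    """In-memory model of 'record the intent, apply the parts, mark done'."""

    def __init__(self):
        self.data = {}      # applied key/value parts
        self.intents = {}   # memory_id -> {"ops": [...], "done": bool}

    def write_atomic(self, memory_id, ops, crash_after=None):
        # 1. Record the full intent before touching any data.
        self.intents[memory_id] = {"ops": list(ops), "done": False}
        # 2. Apply each part; `crash_after` simulates a crash mid-write.
        for i, (key, value) in enumerate(ops):
            if crash_after is not None and i >= crash_after:
                raise RuntimeError("simulated crash")
            self.data[key] = value
        # 3. Mark the intent complete.
        self.intents[memory_id]["done"] = True

    def recover(self):
        """Re-apply every unfinished intent; returns the ids that were replayed."""
        replayed = []
        for memory_id, intent in self.intents.items():
            if not intent["done"]:
                for key, value in intent["ops"]:
                    self.data[key] = value  # re-applying a put is idempotent
                intent["done"] = True
                replayed.append(memory_id)
        return replayed


store = ToyStore()
ops = [
    ("episode-001/text", "Caroline reported a login failure."),
    ("episode-001/embedding", [0.1, 0.2]),
    ("edge/caroline/reported", "episode-001"),
]
try:
    store.write_atomic("episode-001", ops, crash_after=1)
except RuntimeError:
    pass  # the process "died" after applying one of three parts

print(len(store.data), store.recover(), len(store.data))
```

The point of the pattern is that a reader never has to reason about a half-written memory once recovery has run: either the intent was never recorded, or all of its parts end up applied.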

For time-travel recall, build a TemporalGraphOverlay and query at a timestamp:

use sochdb::temporal_graph::TemporalGraphOverlay;

let graph = TemporalGraphOverlay::new(conn.clone(), "support-agent");
let edges_now = graph.get_edges_at(&node_id, current_ts)?;

See the Graph Overlay guide for traversal details.

MCP: memory tools for LLM clients

The standalone MCP server (sochdb-mcp) exposes agent memory directly to LLM clients (Claude Desktop, Cursor, Goose) over stdio. The memory tools are:

| Tool | What it does | Key args |
|---|---|---|
| memory_search_episodes | Semantic-similarity search over episodes | query, k (default 5), episode_type, entity_id |
| memory_get_episode_timeline | Event timeline for one episode | episode_id, max_events (default 50), role, include_metrics |
| memory_search_entities | Search entities (users, projects, documents, …) | query, k (default 10), kind |
| memory_get_entity_facts | Facts about an entity | entity_id, include_episodes (default true), max_episodes (default 5) |
| memory_build_context | One-shot token-budgeted context packing | goal, token_budget (default 4096), session_id, episode_id, entity_ids, include_schema |

Tool names use underscores

The authoritative tool names are the underscore forms above (memory_search_episodes, etc.), taken from the Rust source (tools.rs::get_built_in_tools()). The dot-named catalog (memory.search_episodes) in the bundled mcp.json and the MCP README is stale — do not rely on it.
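
If you drive the server from your own client code rather than a desktop app, a tool invocation is a JSON-RPC tools/call request as defined by the MCP specification. The sketch below builds such a request and maps a stale dot-style name onto its underscore form. canonical_tool_name and tools_call_request are hypothetical helpers, and the argument values are made up for illustration; only the tool names and argument keys come from the table above.

```python
import json

MEMORY_TOOLS = {
    "memory_search_episodes",
    "memory_get_episode_timeline",
    "memory_search_entities",
    "memory_get_entity_facts",
    "memory_build_context",
}


def canonical_tool_name(name: str) -> str:
    """Map a dot-style name (memory.search_episodes) to the underscore form."""
    candidate = name.replace(".", "_")
    if candidate not in MEMORY_TOOLS:
        raise ValueError(f"unknown memory tool: {name!r}")
    return candidate


def tools_call_request(request_id: int, name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request body."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": canonical_tool_name(name), "arguments": arguments},
    }


request = tools_call_request(
    1,
    "memory.build_context",  # stale dot form, normalised below
    {"goal": "summarise the login incident", "token_budget": 2000},
)
# One JSON object per line is how MCP messages are framed over stdio.
print(json.dumps(request))
```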

Semantic search is keyword-backed today

The MCP server has an optional semantic-search Cargo feature (and a SOCHDB_SEMANTIC_SEARCH=1 runtime switch) that loads an embedding model. Even when it is enabled, the vector search path is currently disabled in code "pending backend support": embeddings are generated, but the memory tools fall back to a keyword/text scan. Treat MCP memory recall as lexical for now.

Run the server and point a client at it:

cargo build --release --package sochdb-mcp
./target/release/sochdb-mcp --db ./sochdb_data

Claude Desktop config (claude_desktop_config.json):

{
  "mcpServers": {
    "sochdb": {
      "command": "/absolute/path/to/sochdb-mcp",
      "args": ["--db", "/absolute/path/to/sochdb_data"],
      "env": { "RUST_LOG": "info" }
    }
  }
}

See the Deployment guide for running the gRPC server that backs the Python AgentMemory client, and the Token-Aware Context Query guide for the budgeting model shared by search / memory_build_context.

Choosing an approach

  • Want the least code and a real recall pipeline? Use Python AgentMemory against a running gRPC server.
  • Building inside a Node or Go app and willing to wire extraction yourself? Use the memory pipelines (Go needs -tags sochdb_embedded).
  • Need crash-safe, in-process memory with full control over storage? Use the Rust atomic_memory + temporal_graph overlays.
  • Exposing memory to a desktop LLM client? Use the MCP memory_* tools.