Migrating from 0.4.x to v2.0
SochDB v2.0 is a major step up from the 0.4.x line. The core engine grew a real network server (auth, change data capture, metrics, WebSocket, Postgres wire), the SQL executor learned joins and aggregates, and the HNSW defaults were re-tuned for modern high-dimensional embeddings. Several behavioral changes are breaking, so read the Breaking and behavioral changes section before you upgrade a production deployment.
This guide assumes you are coming from a 0.4.x install. If you skipped the 0.5.0 release, its changes are folded in here too.
Version map
The four components version independently: there is no single unified version line anymore. Pin each one explicitly.
| Component | 0.4.x era | v2.0 | Install |
|---|---|---|---|
| Core engine (Rust workspace, server, MCP) | 0.4.x | 2.0.3 | cargo add sochdb |
| Python SDK | 0.4.x | 0.5.9 | pip install sochdb |
| Node.js SDK | 0.4.x | 0.5.3 | npm install @sochdb/sochdb |
| Go SDK | 0.4.x | 0.4.5 | go get github.com/sochdb/sochdb-go |
The SDKs are not pinned to the core version. The Python SDK at 0.5.9, Node at 0.5.3, and Go at 0.4.5 all target the 2.0.3 engine. Do not assume the SDK version number matches the engine.
```shell
# Python
pip install --upgrade sochdb

# Node.js
npm install @sochdb/sochdb@latest

# Go
go get github.com/sochdb/sochdb-go@latest

# Rust
cargo add sochdb
```
License change: read this before upgrading
The license is now split by component. This is the single most important change for anyone embedding or redistributing SochDB.
| Component | License |
|---|---|
| Core engine (Rust workspace, the sochdb crate, the gRPC server, MCP) | AGPL-3.0-or-later (commercial licensing available) |
| Python SDK | Apache-2.0 |
| Node.js SDK | Apache-2.0 |
| Go SDK | Apache-2.0 |
- Using one of the language SDKs (Python, Node.js, Go) in your application is governed by Apache-2.0: permissive, with no copyleft obligations on your code.
- The core engine itself (the `sochdb` Rust crate and the `sochdb-grpc-server` binary) is AGPL-3.0-or-later. The AGPL's network clause applies: if you modify the engine and offer it as a network service, you must make your modified source available to users of that service.
- Running the unmodified server binary and talking to it over gRPC/WebSocket/PG wire from an Apache-2.0 SDK does not, by itself, place AGPL obligations on your application code. Linking or statically embedding the AGPL Rust crate into your own binary does.
- A commercial license for the core engine is available if AGPL terms do not fit your distribution model. See the engine `NOTICE` file for inquiry details.
This is not legal advice. If you redistribute or modify the core engine, consult your counsel.
What's new in the v2.0 server
In 0.4.x the only network surface was a thin gRPC server. v2.0 ships a "thick"
server (sochdb-grpc-server) with auth, change data capture, observability, a
WebSocket gateway, and a Postgres wire listener, all configurable via CLI flags.
See the Deployment guide for the full flag reference.
Authentication and RBAC (new)
Auth is opt-in via --auth. When disabled, every request resolves to an anonymous
principal with read/write/manage-collections capabilities (the 0.4.x behavior).
```shell
sochdb-grpc-server --auth --api-key "$SOCHDB_API_KEY"
```
- Credentials are presented as gRPC metadata: `authorization: Bearer <token>` (preferred) or `x-api-key: <key>` (fallback).
- RBAC roles are `Owner`, `Editor`, and `Viewer` (plus a `Custom` role). `Owner` has all capabilities; `Editor` gets read/write/manage-collections/manage-indexes; `Viewer` gets read and view-metrics.
- JWT is validation-only (HS256). The server verifies tokens but does not mint them: issue tokens from your own IdP or signing service and supply the shared secret via `SOCHDB_JWT_SECRET`.
- API keys are stored hashed with SHA-256, or HMAC-SHA256 when you set a pepper (`SOCHDB_API_KEY_PEPPER`). Argon2 is used only for user passwords, not API keys.
- TLS/mTLS via `--tls-cert`/`--tls-key` (and `--tls-ca` for client-cert verification).
Change Data Capture and subscriptions (new)
A WAL-derived CDC log feeds a gRPC SubscriptionService so clients can stream
inserts/updates/deletes/schema-changes.
```protobuf
rpc Subscribe(SubscribeRequest) returns (stream SubscribeEvent)
rpc WatchKey(WatchKeyRequest) returns (stream WatchKeyEvent)
```
- Subscribe by namespace, table list, and operation type; resume from a sequence number (`start_sequence > 0`) or from the latest.
- Events are after-image only in this release (the `before` value is not populated).

Note: `where_predicate` is accepted but not enforced.
`SubscribeRequest` has a `where_predicate` field (a SQL WHERE clause), but the
streaming handler does not yet apply it. Table filtering and operation-type
filtering are enforced; row-level predicate filtering is not. Filter client-side
for now.
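Until `where_predicate` is enforced, row filtering has to happen in the consumer. The sketch below assumes a simplified event shape (the `ChangeEvent` dataclass and its fields are hypothetical stand-ins for `SubscribeEvent`) and shows two things: applying the predicate to the after-image, and tracking the last sequence seen so a reconnect can resume with `start_sequence`. Whether `start_sequence` is inclusive is also an assumption here; confirm it against the server before relying on `next_start_sequence()`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class ChangeEvent:
    """HYPOTHETICAL, simplified stand-in for a SubscribeEvent."""
    sequence: int
    table: str
    op: str               # e.g. "insert", "update", "delete"
    after: Optional[Row]  # after-image only; no before-image in this release


class FilteringSubscriber:
    """Applies a row predicate client-side, since the server ignores where_predicate."""

    def __init__(self, predicate: Callable[[Row], bool]) -> None:
        self.predicate = predicate
        self.last_sequence = 0  # persist this durably to survive restarts

    def consume(self, events: Iterable[ChangeEvent]) -> Iterator[ChangeEvent]:
        for event in events:
            # Advance the cursor even for dropped rows so a resume does not replay them.
            self.last_sequence = event.sequence
            if event.after is None:
                # No after-image: the predicate cannot be evaluated, so pass the
                # event through and let the caller decide what to do with it.
                yield event
            elif self.predicate(event.after):
                yield event

    def next_start_sequence(self) -> int:
        # ASSUMES start_sequence is inclusive, hence the +1.
        return self.last_sequence + 1
```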
Prometheus metrics (new)
```shell
sochdb-grpc-server --metrics-port 9090   # 0 disables
```
Exposes GET /metrics (Prometheus text format) and GET /health on the metrics
port. Metric families include sochdb_grpc_requests_total,
sochdb_grpc_request_duration_seconds, sochdb_sql_queries_total,
sochdb_transactions_total, and sochdb_build_info.
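For a quick smoke test after enabling `--metrics-port`, you can scrape the endpoint and total a counter across its label sets. This is a minimal standard-library sketch: `parse_prometheus_text` handles simple `name{labels} value` lines and skips comments, but it is not a full exposition-format parser (label values containing `}` will confuse it, for example). The default URL and the label names in any sample data are assumptions.

```python
import re
import urllib.request
from typing import Dict, Tuple

# metric name, optional {label set}, value; any trailing timestamp is ignored
_SAMPLE_LINE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def fetch_metrics(url: str = "http://localhost:9090/metrics") -> str:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def parse_prometheus_text(text: str) -> Dict[Tuple[str, str], float]:
    """Map (metric name, raw label string) -> sample value."""
    samples: Dict[Tuple[str, str], float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments and blanks
            continue
        match = _SAMPLE_LINE.match(line)
        if match:
            name, labels, value = match.group(1), match.group(2) or "", match.group(3)
            samples[(name, labels)] = float(value)
    return samples


def metric_total(samples: Dict[Tuple[str, str], float], name: str) -> float:
    """Sum one metric across all of its label sets."""
    return sum(v for (n, _labels), v in samples.items() if n == name)
```

`metric_total(parse_prometheus_text(fetch_metrics()), "sochdb_grpc_requests_total")` then gives the overall request count across all label sets.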
WebSocket gateway (new)
```shell
sochdb-grpc-server --ws-port 8080   # 0 disables
```
A JSON message protocol at ws://<host>:8080/ supports sql, kv_get, kv_put,
kv_delete, subscribe, and ping request types.
In the default server binary the WebSocket gateway is not wired to the CDC log,
so subscribe over WebSocket does not stream change events. Use the gRPC
SubscriptionService for CDC.
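This page lists the request types but not the wire schema, so the sketch below only illustrates the general shape of a JSON request. The envelope keys (`type`, `id`) and payload keys such as `query` are hypothetical and must be checked against the gateway's actual protocol before use. It validates the request type against the six types listed above (remember that `subscribe` does not stream CDC events in the default binary); actually sending the string needs a WebSocket client library, which is not shown.

```python
import itertools
import json
from typing import Any

# The six request types the gateway accepts, per this guide.
REQUEST_TYPES = frozenset({"sql", "kv_get", "kv_put", "kv_delete", "subscribe", "ping"})

_request_ids = itertools.count(1)


def ws_request(req_type: str, **payload: Any) -> str:
    """Serialize one request. Envelope and payload key names are HYPOTHETICAL."""
    if req_type not in REQUEST_TYPES:
        raise ValueError(f"unsupported request type: {req_type!r}")
    message = {"type": req_type, "id": next(_request_ids), **payload}
    return json.dumps(message)
```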
Postgres wire protocol (new)
```shell
sochdb-grpc-server --pg-port 5433 --pg-data-dir /var/lib/sochdb/pg
psql -h 127.0.0.1 -p 5433 -d sochdb
```
The Postgres wire listener is simple-query protocol only, has no SSL/TLS
(cleartext), and uses trust auth (no password). Without --pg-data-dir it runs
an echo placeholder; you must pass --pg-data-dir to get real SQL
(SELECT/INSERT/UPDATE/DELETE/DDL including joins). Because it exposes a writable SQL
database with no auth, the server logs a loud warning if you bind it to a non-loopback
address. Keep it on 127.0.0.1 unless you fully accept the risk.
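If you template server launches from a deployment script, a cheap guard is to refuse any Postgres-wire bind address that is not loopback. The helper names `is_loopback_bind` and `require_loopback` are hypothetical; the sketch uses the standard `ipaddress` module and fails closed for hostnames other than `localhost` rather than resolving them.

```python
import ipaddress


def is_loopback_bind(host: str) -> bool:
    """True only for addresses that other machines cannot reach."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a DNS name): fail closed instead of resolving it.
        return False


def require_loopback(host: str) -> None:
    if not is_loopback_bind(host):
        raise SystemExit(
            f"refusing to expose the unauthenticated PG wire listener on {host!r}"
        )
```

Note that `0.0.0.0` is rejected: it is the unspecified address, which binds every interface.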
At-rest encryption (library API, not yet a server flag)
SochDB ships an EncryptionEngine (AES-256-GCM-SIV, nonce-misuse-resistant) that
encrypts data blocks, WAL entries, and checkpoint files with a 32-byte key. It is
unit-tested and usable as a library API, but there is no CLI flag to enable at-rest
encryption in sochdb-grpc-server; the server's main path does not construct an
EncryptionEngine yet. Treat this as available-API / planned-wiring, not a runtime
toggle.
SQL engine upgrade
The SQL executor (the volcano operator engine in the core) gained real relational operators in v2.0.
Joins
HashJoin, NestedLoopJoin, and MergeJoin are implemented, supporting INNER,
LEFT, RIGHT, FULL, and CROSS joins. Equi-joins (ON a = b) plan to a hash join;
non-equi ON conditions use a nested-loop join.
```sql
SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.total > 100;
```
The in-tree compatibility matrix still marks multi-table joins as "Partial/Planned". That matrix is stale relative to the executor and bridge code, which do implement the join types above. Trust the executor.
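To see why equi-joins and non-equi joins plan differently, here is a minimal in-memory sketch of both strategies for an INNER join (the engine also handles LEFT, RIGHT, FULL, and CROSS, which are not shown). It illustrates the textbook algorithms, not the executor's code: a hash join builds a dictionary on one side's join key and probes it once per row of the other side, which only works when the condition is equality; any other `ON` condition falls back to comparing every pair.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Pair = Tuple[Row, Row]


def hash_join(left: List[Row], right: List[Row], left_key: str, right_key: str) -> List[Pair]:
    """INNER equi-join: expected O(len(left) + len(right)) time."""
    buckets: Dict[Any, List[Row]] = defaultdict(list)
    for rrow in right:  # build phase: index the right side by its join key
        buckets[rrow[right_key]].append(rrow)
    out: List[Pair] = []
    for lrow in left:  # probe phase: one dictionary lookup per left row
        for rrow in buckets.get(lrow[left_key], []):
            out.append((lrow, rrow))
    return out


def nested_loop_join(left: List[Row], right: List[Row],
                     on: Callable[[Row, Row], bool]) -> List[Pair]:
    """INNER join on an arbitrary predicate: O(len(left) * len(right))."""
    return [(lrow, rrow) for lrow in left for rrow in right if on(lrow, rrow)]


users = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]
orders = [{"user_id": 1, "total": 250}, {"user_id": 1, "total": 40},
          {"user_id": 2, "total": 120}]

# SELECT u.name, o.total FROM users u JOIN orders o ON u.id = o.user_id WHERE o.total > 100;
result = [(u["name"], o["total"])
          for u, o in hash_join(users, orders, "id", "user_id")
          if o["total"] > 100]
```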
Aggregates and GROUP BY / HAVING
COUNT, COUNT(DISTINCT ...), SUM, AVG, MIN, and MAX are supported with
GROUP BY and HAVING.
```sql
SELECT category, COUNT(*) AS n, AVG(price) AS avg_price
FROM products
GROUP BY category
HAVING COUNT(*) > 5;
```
MEDIAN and STDDEV (sample, n−1) exist only in the dedicated sql/aggregate.rs
path, not in the volcano operator. Plain COUNT/SUM/AVG/MIN/MAX work everywhere.
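The GROUP BY / HAVING query above maps onto a bucket-then-filter loop, sketched here in Python to make the evaluation order explicit: rows are grouped first, aggregates are computed per group, and HAVING filters whole groups afterwards (unlike WHERE, which filters rows before grouping). The `stddev` field uses `statistics.stdev`, the sample (n−1) estimator, matching the STDDEV semantics described above. The row shape and the `category_stats` name are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Any, Dict, List


def category_stats(products: List[Dict[str, Any]], min_count: int) -> Dict[str, Dict[str, float]]:
    """Equivalent of: SELECT category, COUNT(*), AVG(price) ... GROUP BY category
    HAVING COUNT(*) > min_count, plus a sample standard deviation per group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in products:  # GROUP BY category
        groups[row["category"]].append(row["price"])

    result: Dict[str, Dict[str, float]] = {}
    for category, prices in groups.items():
        if len(prices) <= min_count:  # HAVING COUNT(*) > min_count
            continue
        result[category] = {
            "n": len(prices),
            "avg_price": mean(prices),
            # Sample standard deviation (divides by n - 1); needs at least 2 values.
            "stddev": stdev(prices) if len(prices) > 1 else float("nan"),
        }
    return result
```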
EXPLAIN
```sql
EXPLAIN SELECT * FROM users WHERE age > 30 ORDER BY name LIMIT 10;
```
EXPLAIN emits a textual plan tree under a QUERY PLAN column via the volcano path.
Still not supported
DISTINCT (planner stub), window functions, CTEs/WITH, subqueries in
WHERE/SELECT, INTERSECT/EXCEPT, graph-traversal operators, and real CAST
coercion (CAST currently passes the inner value through). See the
SQL guide for the full support matrix.
HNSW default changes
The core HnswConfig defaults were re-tuned for high-dimensional embeddings. If you
relied on the old defaults, your recall and build times will change.
| Parameter | 0.4.x default | v2.0 default | Effect |
|---|---|---|---|
| max_connections (M) | 16 | 32 | Higher recall (deep-1M recall@10 from 0.967 to 0.988); larger graph |
| max_connections_layer0 (M0) | 32 | 64 | Standard 2×M |
| ef_construction | 200 | 256 | Richer build for hard high-dim embeddings (e.g. Cohere) |
| ef_search | n/a | 500 | Default query-time breadth |
| metric | Cosine | Cosine | Unchanged |
| precision | F32 | F32 | Unchanged |
```text
// v2.0 HnswConfig::default()
M  (max_connections)        = 32
M0 (max_connections_layer0) = 64
ef_construction             = 256
ef_search                   = 500
metric                      = Cosine
precision                   = F32
```
A common misconception is a "500/1500" dimension-aware ef_search split. That does
not exist in the core: ef_search is a single fixed default of 500. The logic
that is dimension-aware is the brute-force flat-scan threshold: searches over
small indexes use an exact parallel SIMD scan when node_count is below a
dimension-keyed threshold (≤128D: 10,000 vectors; ≤384D: 4,000; otherwise 1,000).
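The exact-scan cutoff is easy to overlook when testing on small indexes, so here is a small sketch of the rule as described above, together with the 0.4.x and v2.0 defaults from the table for diffing against whatever you pinned. The function and dictionary names are hypothetical; the thresholds and values come from this page, and reading the cutoff as a strict "below" comparison follows the wording above.

```python
from typing import Dict, Optional, Tuple

# Defaults from the table above; ef_search had no 0.4.x default listed.
V04_DEFAULTS: Dict[str, Optional[object]] = {
    "max_connections": 16, "max_connections_layer0": 32,
    "ef_construction": 200, "ef_search": None, "metric": "Cosine", "precision": "F32",
}
V2_DEFAULTS: Dict[str, Optional[object]] = {
    "max_connections": 32, "max_connections_layer0": 64,
    "ef_construction": 256, "ef_search": 500, "metric": "Cosine", "precision": "F32",
}


def changed_defaults(old: Dict[str, Optional[object]],
                     new: Dict[str, Optional[object]]) -> Dict[str, Tuple[object, object]]:
    """Parameters whose default differs between two versions: name -> (old, new)."""
    return {k: (old.get(k), v) for k, v in new.items() if old.get(k) != v}


def flat_scan_threshold(dimension: int) -> int:
    """Dimension-keyed node-count cutoff for the exact SIMD scan."""
    if dimension <= 128:
        return 10_000
    if dimension <= 384:
        return 4_000
    return 1_000


def uses_exact_scan(node_count: int, dimension: int) -> bool:
    # Below the cutoff the search is brute force, so results are exact
    # and the HNSW parameters play no role.
    return node_count < flat_scan_threshold(dimension)
```

One practical consequence: a recall benchmark on, say, 3,000 384-dimensional vectors never touches the HNSW graph, so re-benchmark the new defaults on an index above the cutoff.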
Note: MultiShardHnswIndex is a Python-only wrapper.
MultiShardHnswIndex is not a core-engine type. It exists only in the Python
native package as a threaded scatter-gather wrapper around per-shard HnswIndex
instances. Do not expect it from the server/core.
Breaking and behavioral changes
These are the changes most likely to affect existing code. They derive from the 0.5.0 release and the changes accumulated for v2.0.
commit() now returns a real commit timestamp
Transaction.commit() used to return a hardcoded 0 (or None). It now surfaces a
real HLC (Hybrid Logical Clock) monotonic commit timestamp where the SDK exposes a
return value. Code that checked commit() == 0 to detect failure must switch to
catching exceptions/errors instead.
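If you have not met hybrid logical clocks before: an HLC timestamp combines wall-clock time with a logical counter, so it stays close to real time but never goes backwards, even if the system clock does. The sketch below is a generic, single-node illustration of the local "tick" rule (the receive/merge rule for remote timestamps is omitted). The 48/16-bit split between milliseconds and counter is an assumption for illustration; SochDB's actual encoding is not documented here, so treat commit timestamps as opaque, ordered integers.

```python
import time
from typing import Callable, Optional

LOGICAL_BITS = 16  # ASSUMPTION: low 16 bits hold the logical counter
LOGICAL_MAX = (1 << LOGICAL_BITS) - 1


class HybridLogicalClock:
    """Single-node HLC: monotonic u64 timestamps for local events such as commits."""

    def __init__(self, now_ms: Optional[Callable[[], int]] = None) -> None:
        self._now_ms = now_ms or (lambda: time.time_ns() // 1_000_000)
        self._wall = 0     # highest physical time seen, in ms
        self._logical = 0  # tie-breaker within the same millisecond

    def tick(self) -> int:
        physical = self._now_ms()
        if physical > self._wall:
            # Clock moved forward: adopt it and reset the counter.
            self._wall, self._logical = physical, 0
        elif self._logical < LOGICAL_MAX:
            # Same millisecond, or the clock went backwards: bump the counter.
            self._logical += 1
        else:
            # Counter exhausted: borrow the next millisecond.
            self._wall, self._logical = self._wall + 1, 0
        return (self._wall << LOGICAL_BITS) | self._logical
```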
The exact return shape differs per SDK โ note these carefully:
In Python, commit() returns the u64 commit timestamp.

```python
import sochdb
from sochdb.errors import TransactionConflictError

db = sochdb.Database.open("./data")

# BEFORE (0.4.x): the return value was meaningless
with db.transaction() as txn:
    txn.put(b"k", b"v")
    ts = txn.commit()
assert ts == 0  # used to be hardcoded

# AFTER (0.5.9): real HLC timestamp; detect failure via exceptions
try:
    with db.transaction() as txn:
        txn.put(b"k", b"v")
        ts = txn.commit()  # int: monotonic commit timestamp
    print("committed at", ts)
except TransactionConflictError:
    # SSI serialization failure (FFI code -2)
    ...
```
The Node commit() returns Promise<void> โ it does not return a
timestamp. Detect failure by catching the thrown TransactionError.
```typescript
// BEFORE / AFTER: commit() resolves to void in the embedded SDK
import { open } from '@sochdb/sochdb';

const db = open('./data'); // open() is SYNCHRONOUS (not a Promise)
const txn = db.transaction();
try {
  await txn.put(Buffer.from('k'), Buffer.from('v'));
  await txn.commit(); // Promise<void>
} catch (err) {
  // TransactionError on SSI conflict (error_code -2)
}
```
The CHANGELOG's "Unreleased" notes mention a Promise<bigint> return for Node
commit(). That was not what shipped: the embedded commit() resolves to
Promise<void>. Where you do need a numeric value, EmbeddedDatabase.checkpoint()
returns Promise<bigint> (the LSN) and IpcClient.beginTransaction() returns
Promise<bigint> (the txn id).
In Go, Transaction.Commit() returns error only; the FFI commit timestamp is
not surfaced on the SDK method. Detect failure from the returned error.
```go
// AFTER: Commit returns error only
txn := db.Begin()
if err := txn.Put([]byte("k"), []byte("v")); err != nil {
	return err
}
if err := txn.Commit(); err != nil {
	// "SSI conflict: transaction aborted ..." on error_code -2
	return err
}
```
The CHANGELOG draft proposed Commit() (uint64, error) for Go, but the shipped Go
SDK keeps Commit() error. The uint64 shows up on BeginTransaction() (txn id)
and on the embedded Checkpoint() (uint64, error), not on commit.
Prefix scans now enforce a minimum prefix length
To prevent accidental cross-tenant / full-database scans, prefix scans now require a
minimum prefix length (MIN_SCAN_PREFIX_LEN = 2 in storage). Short prefixes now
raise an error. A dedicated unchecked variant is available for power users who
genuinely need a full or short-prefix scan.
Python:

```python
# BEFORE (0.4.x): a short prefix silently scanned a lot
db.scan_prefix(b"a")  # used to work

# AFTER (0.5.9): minimum 2-byte prefix; ValueError on shorter
db.scan_prefix(b"users/")      # OK
db.scan_prefix_unchecked(b"")  # explicit opt-in for full/short scans
```
Node.js:

```typescript
// AFTER: scanPrefix validates length; use the unchecked variant to bypass
for await (const [key, value] of db.scanPrefix(Buffer.from('users/'))) {
  // ...
}
// db.scanPrefixUnchecked(Buffer.alloc(0)) for full/short scans
```
Go:

```go
// AFTER: ScanPrefix enforces validation; ScanPrefixUnchecked bypasses it
it := db.ScanPrefix([]byte("users/"))
defer it.Close()
for {
	k, v, ok := it.Next()
	if !ok {
		break
	}
	_ = k
	_ = v
}
```
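The guard itself is simple. Below is a toy in-memory key-value store (the `SortedKV` class is hypothetical and is not the SochDB API) that mirrors the behavior described above: `scan_prefix` rejects prefixes shorter than `MIN_SCAN_PREFIX_LEN`, and `scan_prefix_unchecked` skips the check. It also shows why prefix scans are cheap on ordered storage: a binary search finds the first candidate key, and the scan stops at the first key that no longer matches. This sketch validates eagerly when `scan_prefix` is called; whether each SDK raises at call time or on first iteration is not specified here.

```python
import bisect
from typing import Dict, Iterator, Tuple

MIN_SCAN_PREFIX_LEN = 2  # same value as the storage-layer constant


class SortedKV:
    """HYPOTHETICAL in-memory stand-in for an ordered KV store."""

    def __init__(self, data: Dict[bytes, bytes]) -> None:
        self._data = dict(data)
        self._keys = sorted(self._data)

    def scan_prefix(self, prefix: bytes) -> Iterator[Tuple[bytes, bytes]]:
        # Validate before creating the generator so the error surfaces immediately.
        if len(prefix) < MIN_SCAN_PREFIX_LEN:
            raise ValueError(
                f"prefix must be at least {MIN_SCAN_PREFIX_LEN} bytes; "
                "use scan_prefix_unchecked for full or short scans"
            )
        return self.scan_prefix_unchecked(prefix)

    def scan_prefix_unchecked(self, prefix: bytes) -> Iterator[Tuple[bytes, bytes]]:
        i = bisect.bisect_left(self._keys, prefix)  # first key >= prefix
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._data[self._keys[i]]
            i += 1
```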
Database configuration is now actually applied
In 0.4.x, Database.open(config=...) accepted a config object but silently
ignored it. As of v2.0 the config is plumbed through FFI and takes real effect on
durability and write amplification (WAL mode, sync mode, memtable size, block cache,
compaction triggers, compression, checksums, auto-checkpoint interval, and more).
If you were passing a config that was previously ignored, it will now change runtime behavior. In particular, durability/sync settings now matter. Re-validate any performance- or durability-sensitive deployment after upgrading.
```python
from sochdb import Database

# This config was accepted-but-ignored in 0.4.x; it is now applied.
db = Database.open("./data", config={
    "wal_enabled": True,
    "sync_mode": "normal",
    "memtable_size_bytes": 64 * 1024 * 1024,
    "compression_enabled": True,
})
```
Go SDK is now remote-first by default
The Go SDK changed its default build target to remote/gRPC. The embedded
in-process FFI engine (and the embedded SemanticCache and the entire embedded Memory
System) now sit behind the sochdb_embedded build tag.
```shell
# Default build: remote/gRPC + pure-Go only. The embedded engine is excluded.
go build ./...

# Embedded FFI engine: requires the build tag (plus native libsochdb_storage,
# CGO, and pkg-config).
go build -tags sochdb_embedded ./...
```
```go
// Remote (default build): connect to a running server
client, err := sochdb.GrpcConnect("localhost:50051")

// Embedded (requires -tags sochdb_embedded)
db, err := embedded.Open("./data")
```
"Remote-first" means the default compilation target is remote; it is not a
single Connect() that auto-selects transport. There is no unified Config that
chooses embedded vs remote; the entrypoints are distinct named functions
(GrpcConnect, Connect for IPC, embedded.Open).
Two Python packages, both named sochdb
There are now two importable packages both named sochdb, and they have mostly
disjoint APIs. Be explicit about which one a given class comes from.
| Package | Version | What it is | Key classes |
|---|---|---|---|
| Pure-Python ctypes SDK | 0.5.9 | The broad embedded + server SDK | Database, Namespace, Collection, Queue, AgentMemory, temporal graph, semantic cache, StudioClient |
| Native PyO3 engine | 2.0.3 | A focused native vector/relational engine | HnswIndex, BM25Index, RRFFusion, ThreeLaneHybridIndex, MultiShardHnswIndex, TableDatabase, build_index*, recommended_hnsw_params |
```python
# 0.5.9 SDK: general application usage (preferred for app code)
import sochdb
db = sochdb.Database.open("./data")

# 2.0.3 native engine: low-level index building
from sochdb import HnswIndex, recommended_hnsw_params
index = HnswIndex(dimension=768, m=32, ef_construction=200)
```
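Because both packages import as `sochdb`, it can be unclear at runtime which one an environment actually has. A duck-typing check helps: the sketch below keys off one representative class from each row of the table (`Database` for the 0.5.9 SDK, `HnswIndex` for the 2.0.3 native engine). The helper name and its return labels are hypothetical, and since the APIs are only mostly disjoint, a module exposing both names is reported as ambiguous rather than guessed.

```python
from types import SimpleNamespace


def which_sochdb(module: object) -> str:
    """Classify an imported `sochdb` module by the classes it exposes."""
    has_sdk = hasattr(module, "Database")      # 0.5.9 pure-Python ctypes SDK
    has_native = hasattr(module, "HnswIndex")  # 2.0.3 native PyO3 engine
    if has_sdk and has_native:
        return "ambiguous"
    if has_sdk:
        return "sdk"
    if has_native:
        return "native"
    return "unknown"


# Offline check with fake modules (no sochdb install needed).
FAKE_SDK = SimpleNamespace(Database=object, Namespace=object)
FAKE_NATIVE = SimpleNamespace(HnswIndex=object, BM25Index=object)
```

In a real environment, call it on the imported module itself: `import sochdb` followed by `which_sochdb(sochdb)`.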
There is no "adaptive RRF-k" in the core engine: the fusion k is fixed at 60.0.
The Python HybridSearchIndex does accept an adaptive_rrf_k=True parameter; that
adaptivity lives only in that Python class, not in the Rust core.
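For reference, Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every result list it appears in, so documents that appear near the top of several lists rise above documents that top only one. The sketch below is the textbook formulation with the core's fixed k = 60 and 1-based ranks; whether the core engine counts ranks from 0 or 1, and how it breaks ties, are assumptions here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

RRF_K = 60.0  # fixed in the core engine, per this guide


def rrf_fuse(rankings: Sequence[Sequence[str]], k: float = RRF_K) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # 1-based ranks (assumption)
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism (assumption).
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

With vector hits `["a", "b", "c"]` and BM25 hits `["b", "c", "d"]`, document `b` (second and first) outranks `a` (first in only one list).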
MCP tool names use underscores
If you wired up MCP tools against the old dot-named catalog, rename them. The actual
MCP tool names use underscores: sochdb_query, sochdb_get, sochdb_put,
sochdb_context_query, sochdb_grep, memory_search_episodes, and so on. The
dot-named entries in older mcp.json/README catalogs are stale.
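If you have many references to rename, a mechanical pass can do most of the work. The sketch below assumes the stale catalog names differ from the real ones only by using `.` where the real names use `_` (for example `sochdb.query` becoming `sochdb_query`); that mapping is an assumption, so any result the helper cannot match against the tool names listed above is returned for manual review rather than trusted. `KNOWN_TOOLS` is deliberately partial because this page lists only a subset.

```python
from typing import Iterable, List, Tuple

# Partial list: only the underscore names this guide mentions.
KNOWN_TOOLS = frozenset({
    "sochdb_query", "sochdb_get", "sochdb_put",
    "sochdb_context_query", "sochdb_grep", "memory_search_episodes",
})


def migrate_tool_names(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (renamed names, renamed names that need manual review)."""
    renamed: List[str] = []
    review: List[str] = []
    for name in names:
        new_name = name.replace(".", "_")  # ASSUMED dot-to-underscore mapping
        renamed.append(new_name)
        if new_name not in KNOWN_TOOLS:
            review.append(new_name)
    return renamed, review
```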
Upgrade checklist
- Pin each component to its v2.0 version (core 2.0.3, Python 0.5.9, Node 0.5.3, Go 0.4.5).
- Review license obligations: AGPL-3.0-or-later for the core engine, Apache-2.0 for the SDKs. Arrange a commercial license if AGPL terms do not fit.
- Replace any `commit() == 0` failure checks with exception/error handling.
- Audit prefix scans for prefixes shorter than the minimum; switch to the unchecked variant (`scan_prefix_unchecked`, `scanPrefixUnchecked`, `ScanPrefixUnchecked`) only where a full/short scan is intended.
- Re-validate any `Database.open(config=...)` call: configs are now applied for real.
- Go: decide remote vs embedded; add `-tags sochdb_embedded` where you need the in-process engine.
- If using HNSW defaults, re-benchmark recall/latency against the new M=32 / ef_construction=256 / ef_search=500 defaults.
- If running the server, decide which new surfaces to enable: `--auth`, `--metrics-port`, `--ws-port`, `--pg-port` (with `--pg-data-dir`), TLS.
- Rename any dot-named MCP tools to the underscore form.
See also
- Deploying to Production: full server CLI flag reference
- Working with SQL in SochDB: joins, aggregates, EXPLAIN
- HNSW Vector Search: defaults and tuning
- Python SDK · Node.js SDK · Go SDK