SochDB Sync-First Architecture
Version: 0.3.5+
Status: Stable
Last Updated: January 2025
Table of Contents
- Overview
- Design Philosophy
- Architecture Layers
- Why Sync-First?
- Implementation Details
- Feature Flags
- Performance Characteristics
- Comparison with Other Databases
- Best Practices
Overview
SochDB v0.3.5 adopts a sync-first core architecture where the async runtime (tokio) is truly optional. This design follows the proven pattern established by SQLite: a synchronous storage engine with async capabilities only where needed (network I/O, async client APIs).
Key Principles
- Sync by Default: Core storage operations are synchronous
- Async at Edges: Network and I/O use async when beneficial
- Opt-In Complexity: Async runtime only when explicitly needed
- Zero-Cost Abstraction: No async overhead for sync-only use cases
Design Philosophy
The SQLite Model
SQLite is the most deployed database in the world, embedded in billions of devices. Its success stems from:
- Simplicity: No server process, no configuration
- Portability: Single file, cross-platform
- Efficiency: Direct system calls, no runtime overhead
- Predictability: Synchronous operations, clear error handling
SochDB v0.3.5 adopts this philosophy while adding:
- Modern vector search capabilities
- LLM-native features (TOON format, context queries)
- Optional async for network-heavy workloads
Why Not Async Everywhere?
Async is not free:
- Runtime overhead (~500KB for tokio)
- ~40 additional dependencies
- Cognitive complexity (colored functions)
- FFI boundary challenges (Python, Node.js)
- Longer compile times
Async is beneficial when:
- Handling many concurrent network connections (gRPC server)
- I/O-bound workloads with high concurrency
- Explicit async client APIs (streaming queries)
For storage operations:
- Disk I/O is already buffered (OS page cache)
- Most operations complete in microseconds
- Blocking is acceptable (thread-per-connection model)
Architecture Layers
┌──────────────────────────────────────────────────────────┐
│                    Application Layer                     │
│          (User code: Python, Node.js, Rust, Go)          │
└─────────────────────┬────────────────────────────────────┘
                      │
        ┌─────────────┴──────────────┐
        │                            │
        ▼                            ▼
┌───────────────────┐     ┌──────────────────────┐
│   Embedded FFI    │     │     gRPC Server      │
│   (Sync Only)     │     │   (Requires tokio)   │
│                   │     │                      │
│ • Python SDK      │     │ • Async handlers     │
│ • Node.js SDK     │     │ • Connection pool    │
│ • Go bindings     │     │ • Streaming          │
└─────────┬─────────┘     └──────────┬───────────┘
          │                          │
          │                          │ [tokio boundary]
          │                          │
          └─────────┬────────────────┘
                    │
                    ▼
  ┌─────────────────────────────────┐
  │        Client API Layer         │
  │            (sochdb)             │
  │ • Sync methods (default)        │
  │ • Async methods (optional)      │
  └────────────────┬────────────────┘
                   │
                   ▼
  ┌─────────────────────────────────┐
  │         Sync-First Core         │
  │      (NO tokio dependency)      │
  │                                 │
  │  ┌───────────────────────────┐  │
  │  │      Storage Engine       │  │
  │  │ • LSM tree (SSTable)      │  │
  │  │ • WAL (Write-Ahead Log)   │  │
  │  │ • MVCC                    │  │
  │  │ • Compaction              │  │
  │  └───────────────────────────┘  │
  │                                 │
  │  ┌───────────────────────────┐  │
  │  │       Query Engine        │  │
  │  │ • SQL parser              │  │
  │  │ • AST executor            │  │
  │  │ • Optimizer               │  │
  │  └───────────────────────────┘  │
  │                                 │
  │  ┌───────────────────────────┐  │
  │  │       Vector Index        │  │
  │  │ • HNSW construction       │  │
  │  │ • Similarity search       │  │
  │  │ • Quantization            │  │
  │  └───────────────────────────┘  │
  │                                 │
  │  ┌───────────────────────────┐  │
  │  │    Concurrency Control    │  │
  │  │ • parking_lot::Mutex      │  │
  │  │ • crossbeam channels      │  │
  │  │ • atomic operations       │  │
  │  └───────────────────────────┘  │
  └─────────────────────────────────┘
Layer Descriptions
1. Application Layer
- User code: Python scripts, Node.js apps, Rust programs
- No async knowledge required: Just call database methods
- Examples: see the examples/ directory
2. Client Interface Layer
Embedded FFI (Sync)
- Direct Rust function calls via FFI
- Python: cffi or pyo3 bindings
- Node.js: napi-rs bindings
- Zero overhead: no serialization, no network
- No tokio: pure synchronous calls
gRPC Server (Async)
- Multi-client support
- Unix socket or TCP
- Requires tokio for connection handling
- Useful for: microservices, language interop
3. Sync-First Core
All core storage operations are synchronous:
| Component | Sync/Async | Rationale |
|---|---|---|
| Storage engine | Sync | Disk I/O is buffered, completes fast |
| MVCC | Sync | In-memory operations, microsecond latency |
| WAL | Sync | fsync is blocking anyway |
| SQL parser | Sync | CPU-bound, no I/O |
| Vector index | Sync | Memory operations, SIMD vectorization |
| Compaction | Sync | Background thread, no async needed |
Why Sync-First?
1. Binary Size
Embedded Use Case:
# Without tokio (v0.3.5)
cargo build --release -p sochdb-storage
# Binary: 732 KB
# With tokio (v0.3.4)
cargo build --release -p sochdb-storage --features async
# Binary: 1,200 KB
# Savings: 468 KB (39% reduction)
Why it matters:
- Mobile apps: limited space
- WASM: every KB counts
- Edge devices: constrained resources
- Docker images: faster pulls
2. Dependency Tree
# Sync-only (v0.3.5)
cargo tree -p sochdb-storage --no-default-features | wc -l
# 62 crates
# With async
cargo tree -p sochdb-storage --features async | wc -l
# 102 crates
# Reduction: 40 fewer dependencies
Benefits:
- Faster compilation
- Fewer security audits
- Reduced supply chain risk
- Simpler dependency management
3. FFI Boundary
Problem with async FFI:
# Python calling Rust async function
import sochdb
# This is complex!
db = sochdb.Database.open("./my_db") # Creates tokio runtime in Rust
db.put_async(b"key", b"value") # Needs event loop bridge
# Python's asyncio â Rust's tokio: impedance mismatch
Sync FFI is natural:
# Python calling Rust sync function
import sochdb
db = sochdb.Database.open("./my_db") # Direct Rust call
db.put(b"key", b"value") # Direct Rust call, returns immediately
# No async ceremony!
4. Mental Model
Sync code is simpler:
// Sync: straightforward
fn write_data(db: &Database, key: &[u8], value: &[u8]) -> Result<()> {
db.put(key, value)?;
println!("Written!");
Ok(())
}
Async adds complexity:
// Async: requires runtime, colored functions
async fn write_data(db: &Database, key: &[u8], value: &[u8]) -> Result<()> {
db.put_async(key, value).await?; // Must await
println!("Written!");
Ok(())
}
// Caller must also be async (function coloring)
#[tokio::main]
async fn main() {
write_data(&db, b"key", b"value").await.unwrap();
}
5. Performance
For single-threaded workloads:
- Sync is faster: no runtime overhead
- Direct system calls
- Better CPU cache locality
For multi-threaded workloads:
- Thread-per-connection model works fine
- OS scheduler is efficient
- No need for async unless 10,000+ connections
Benchmark (1,000 writes):
Sync: 1.2ms (default)
Async: 1.5ms (+25% overhead from runtime)
Implementation Details
Crate Structure
sochdb/
├── sochdb-storage/          # Sync-first storage engine
│   ├── Cargo.toml           # default = [] (no tokio)
│   └── src/
│       ├── engine.rs        # Sync operations
│       └── async_ext.rs     # Optional async wrappers
│
├── sochdb-core/             # Core abstractions (sync)
│   ├── Cargo.toml           # No tokio dependency
│   └── src/
│       ├── transaction.rs
│       └── mvcc.rs
│
├── sochdb-query/            # SQL engine (sync)
│   ├── Cargo.toml           # No tokio dependency
│   └── src/
│       ├── parser.rs
│       └── executor.rs
│
├── sochdb-index/            # Vector index (sync)
│   ├── Cargo.toml           # No tokio dependency
│   └── src/
│       └── hnsw.rs
│
└── sochdb-grpc/             # gRPC server (async)
    ├── Cargo.toml           # Requires tokio
    └── src/
        └── server.rs        # Async handlers
Cargo.toml Configuration
Workspace root (/Cargo.toml):
[workspace]
members = [
"sochdb-storage",
"sochdb-core",
"sochdb-query",
"sochdb-index",
"sochdb-grpc",
]
[workspace.dependencies]
# ❌ NO tokio here! (was in v0.3.4)
# Each crate declares it explicitly if needed
parking_lot = "0.12"
crossbeam = "0.8"
Storage crate (sochdb-storage/Cargo.toml):
[package]
name = "sochdb-storage"
[features]
default = []        # ✅ No tokio by default (was ["async"] in v0.3.4)
async = ["tokio"]   # Opt-in

[dependencies]
parking_lot = { workspace = true }
crossbeam = { workspace = true }
# ✅ Explicit, optional
tokio = { version = "1.35", features = ["rt-multi-thread", "sync"], optional = true }

[dev-dependencies]
# ❌ No tokio in dev-dependencies
criterion = "0.5"
gRPC server (sochdb-grpc/Cargo.toml):
[package]
name = "sochdb-grpc"
[dependencies]
sochdb-storage = { path = "../sochdb-storage", features = ["async"] }  # ✅ Requires async
tokio = { version = "1.35", features = ["rt-multi-thread", "net", "sync"] }  # ✅ Required
tonic = "0.10"
prost = "0.12"
Synchronization Primitives
Instead of tokio primitives:
// ❌ Old (v0.3.4): tokio dependency
use tokio::sync::Mutex;
use tokio::sync::RwLock;

// ✅ New (v0.3.5): no tokio
use parking_lot::Mutex;
use parking_lot::RwLock;
Benefits of parking_lot:
- No async runtime required
- Faster: optimized assembly
- Smaller binary footprint
- Better suited for short critical sections
Channel Usage
Instead of tokio channels:
// ❌ Old: tokio::sync::mpsc
use tokio::sync::mpsc;
let (tx, rx) = mpsc::channel(100);
tx.send(value).await?;

// ✅ New: crossbeam::channel
use crossbeam::channel;
let (tx, rx) = channel::bounded(100);
tx.send(value)?; // Blocking, but completes fast
Feature Flags
Available Features
| Feature | Enables | Use Case |
|---|---|---|
| default = [] | Sync-only storage | Embedded, FFI, CLI tools |
| async | tokio runtime, async methods | gRPC server, async clients |
Usage Examples
Sync-Only (Default)
[dependencies]
sochdb = "0.4.0"
use sochdb::Database;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let db = Database::open("./my_db")?;
db.put(b"key", b"value")?;
Ok(())
}
With Async
[dependencies]
sochdb = { version = "0.4.0", features = ["async"] }
tokio = { version = "1.35", features = ["rt-multi-thread"] }
use sochdb::Database;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let db = Database::open("./my_db")?;
db.put_async(b"key", b"value").await?;
Ok(())
}
Performance Characteristics
Latency Comparison
| Operation | Sync (v0.3.5) | Async (v0.3.5) | Overhead |
|---|---|---|---|
| Single write | 8 µs | 11 µs | +37% |
| Single read | 2 µs | 3 µs | +50% |
| Transaction (10 writes) | 45 µs | 55 µs | +22% |
| Index search (k=10) | 15 µs | 18 µs | +20% |
Conclusion: Async adds measurable overhead for single-threaded workloads.
Throughput Comparison
| Workload | Sync | Async | Winner |
|---|---|---|---|
| 1 client, sequential | 125k ops/s | 90k ops/s | Sync |
| 10 clients, concurrent | 240k ops/s | 280k ops/s | Async |
| 100 clients, concurrent | 200k ops/s | 450k ops/s | Async |
| 1000 clients, concurrent | N/A (thread limit) | 580k ops/s | Async |
Conclusion: Async shines with high concurrency (100+ clients).
Memory Usage
| Configuration | Resident Memory (RSS) |
|---|---|
| Sync (1 client) | 12 MB |
| Sync (10 threads) | 45 MB |
| Async (10 tasks) | 28 MB |
| Async (100 tasks) | 35 MB |
Conclusion: Async is more memory-efficient for high concurrency.
Comparison with Other Databases
| Database | Core Architecture | Async Runtime | Binary Size |
|---|---|---|---|
| SochDB v0.3.5 | Sync-first | Optional (tokio) | 732 KB |
| SQLite | Sync-only | None | ~600 KB |
| DuckDB | Sync-only | None | ~3 MB |
| RocksDB | Sync-only | None | ~8 MB |
| Sled | Async-first | Built-in | ~2 MB |
| SurrealDB | Async-first | Required (tokio) | ~15 MB |
SochDB's Position:
- Follows SQLite/DuckDB pattern (sync-first)
- But offers async opt-in for network workloads
- Best of both worlds: small by default, scalable when needed
Best Practices
When to Use Sync (Default)
✅ Use sync when:
- Embedding in applications (mobile, desktop, WASM)
- FFI boundaries (Python, Node.js, Ruby)
- CLI tools
- Single-threaded scripts
- Low-latency requirements
- You want minimal dependencies
Example:
// Perfect for embedded use
fn process_batch(db: &Database, items: &[Item]) -> Result<()> {
for item in items {
db.put(&item.key, &item.value)?;
}
Ok(())
}
When to Enable Async
✅ Enable async when:
- Running gRPC server (100+ concurrent clients)
- Streaming large result sets
- Integrating with async frameworks (axum, actix-web)
- You already have tokio in your dependency tree
Example:
// gRPC server with high concurrency
#[tokio::main]
async fn main() {
let server = GrpcServer::new("./my_db").await;
server.serve("0.0.0.0:50051").await; // Handles 1000+ clients
}
Hybrid Approach
Use sync for storage, async for network:
use std::sync::Arc;
use axum::{extract::Path, routing::get, Router};
use sochdb::Database;

#[tokio::main]
async fn main() {
    // Sync database (no async feature)
    let db = Arc::new(Database::open("./my_db").unwrap());

    // Async HTTP server
    let app = Router::new().route(
        "/get/:key",
        get({
            let db = db.clone();
            move |Path(key): Path<String>| async move {
                // Sync call inside async handler
                db.get(key.as_bytes())
                    .unwrap()
                    .map(|v| String::from_utf8_lossy(&v).to_string())
                    .unwrap_or_default()
            }
        }),
    );

    axum::Server::bind(&"0.0.0.0:3000".parse().unwrap())
        .serve(app.into_make_service())
        .await
        .unwrap();
}
Why this works:
- Database operations are fast (< 100 µs)
- Blocking inside async is acceptable for short operations
- No need for async database methods
- Smaller binary, simpler code
Future Considerations
Planned Enhancements (v0.4.0+)
- Async Streaming: Optional async iterators for large result sets
- Connection Pooling: Optional async connection pool for multi-tenant setups
- Async Compaction: Background compaction with tokio::task::spawn
- Hybrid Transactions: Sync writes, async replication
Not Planned
- Making core storage async-first
- Requiring tokio for embedded use
- Async-only APIs
Conclusion
SochDB's sync-first architecture provides:
✅ Simplicity: No async complexity for most use cases
✅ Efficiency: ~500KB smaller binaries, 40 fewer dependencies
✅ Compatibility: Easy FFI, works with sync codebases
✅ Flexibility: Opt-in async when you need it
✅ Performance: Fast for single-threaded workloads, scalable for concurrent ones
Philosophy: "Async is a tool, not a religion. Use it where it helps, avoid it where it hurts."