Version: 0.4.4

SochDB Sync-First Architecture

Name: SochDB
Author: SochDB

Version: 0.3.5+
Status: Stable
Last Updated: January 2025

Overview
Design Philosophy
Architecture Layers
Why Sync-First?
Implementation Details
Feature Flags
Performance Characteristics
Comparison with Other Databases
Best Practices

Overview

SochDB v0.3.5 adopts a sync-first core architecture where the async runtime (tokio) is truly optional. This design follows the proven pattern established by SQLite: a synchronous storage engine with async capabilities only where needed (network I/O, async client APIs).

Key Principles

Sync by Default: Core storage operations are synchronous
Async at Edges: Network and I/O use async when beneficial
Opt-In Complexity: Async runtime only when explicitly needed
Zero-Cost Abstraction: No async overhead for sync-only use cases

Design Philosophy

The SQLite Model

SQLite is the most deployed database in the world, embedded in billions of devices. Its success stems from:

Simplicity: No server process, no configuration
Portability: Single file, cross-platform
Efficiency: Direct system calls, no runtime overhead
Predictability: Synchronous operations, clear error handling

SochDB v0.3.5 adopts this philosophy while adding:

Modern vector search capabilities
LLM-native features (TOON format, context queries)
Optional async for network-heavy workloads

Why Not Async Everywhere?

Async is not free:

Runtime overhead (~500KB for tokio)
~40 additional dependencies
Cognitive complexity (colored functions)
FFI boundary challenges (Python, Node.js)
Longer compile times

Async is beneficial when:

Handling many concurrent network connections (gRPC server)
I/O-bound workloads with high concurrency
Explicit async client APIs (streaming queries)

For storage operations:

Disk I/O is already buffered (OS page cache)
Most operations complete in microseconds
Blocking is acceptable (thread-per-connection model)

Architecture Layers

┌─────────────────────────────────────────────────────────┐
│                  Application Layer                       │
│  (User code: Python, Node.js, Rust, Go)                 │
└─────────────────────┬───────────────────────────────────┘
                      │
        ┌─────────────┴─────────────┐
        │                           │
        ▼                           ▼
┌──────────────────┐    ┌──────────────────────┐
│   Embedded FFI   │    │    gRPC Server       │
│   (Sync Only)    │    │  (Requires tokio)    │
│                  │    │                      │
│  • Python SDK    │    │  • Async handlers    │
│  • Node.js SDK   │    │  • Connection pool   │
│  • Go bindings   │    │  • Streaming         │
└────────┬─────────┘    └──────────┬───────────┘
         │                         │
         │                         │ [tokio boundary]
         │                         │
         └────────┬────────────────┘
                  │
                  ▼
    ┌─────────────────────────────────┐
    │      Client API Layer           │
    │  (sochdb)                │
    │  • Sync methods (default)       │
    │  • Async methods (optional)     │
    └────────────┬────────────────────┘
                 │
                 ▼
    ┌─────────────────────────────────┐
    │     Sync-First Core             │
    │  (NO tokio dependency)          │
    │                                 │
    │  ┌─────────────────────────┐   │
    │  │   Storage Engine        │   │
    │  │  • LSM tree (SSTable)   │   │
    │  │  • WAL (Write-Ahead Log)│   │
    │  │  • MVCC                 │   │
    │  │  • Compaction           │   │
    │  └─────────────────────────┘   │
    │                                 │
    │  ┌─────────────────────────┐   │
    │  │   Query Engine          │   │
    │  │  • SQL parser           │   │
    │  │  • AST executor         │   │
    │  │  • Optimizer            │   │
    │  └─────────────────────────┘   │
    │                                 │
    │  ┌─────────────────────────┐   │
    │  │   Vector Index          │   │
    │  │  • HNSW construction    │   │
    │  │  • Similarity search    │   │
    │  │  • Quantization         │   │
    │  └─────────────────────────┘   │
    │                                 │
    │  ┌─────────────────────────┐   │
    │  │   Concurrency Control   │   │
    │  │  • parking_lot::Mutex   │   │
    │  │  • crossbeam channels   │   │
    │  │  • atomic operations    │   │
    │  └─────────────────────────┘   │
    └─────────────────────────────────┘

Layer Descriptions

1. Application Layer

User code: Python scripts, Node.js apps, Rust programs
No async knowledge required: Just call database methods
Examples: See examples/ directory

2. Client Interface Layer

Embedded FFI (Sync)

Direct Rust function calls via FFI
Python: cffi or pyo3 bindings
Node.js: napi-rs bindings
Zero overhead: no serialization, no network
No tokio: Pure synchronous calls

gRPC Server (Async)

Multi-client support
Unix socket or TCP
Requires tokio for connection handling
Useful for: microservices, language interop

3. Sync-First Core

All core storage operations are synchronous:

Component	Sync/Async	Rationale
Storage engine	Sync	Disk I/O is buffered, completes fast
MVCC	Sync	In-memory operations, microsecond latency
WAL	Sync	fsync is blocking anyway
SQL parser	Sync	CPU-bound, no I/O
Vector index	Sync	Memory operations, SIMD vectorization
Compaction	Sync	Background thread, no async needed

Why Sync-First?

1. Binary Size

Embedded Use Case:

# Without tokio (v0.3.5)
cargo build --release -p sochdb-storage
# Binary: 732 KB

# With tokio (v0.3.4)
cargo build --release -p sochdb-storage --features async
# Binary: 1,200 KB

# Savings: 468 KB (39% reduction)

Why it matters:

Mobile apps: limited space
WASM: every KB counts
Edge devices: constrained resources
Docker images: faster pulls

2. Dependency Tree

# Sync-only (v0.3.5)
cargo tree -p sochdb-storage --no-default-features | wc -l
# 62 crates

# With async
cargo tree -p sochdb-storage --features async | wc -l
# 102 crates

# Reduction: 40 fewer dependencies

Benefits:

Faster compilation
Fewer security audits
Reduced supply chain risk
Simpler dependency management

3. FFI Boundary

Problem with async FFI:

# Python calling Rust async function
import sochdb

# This is complex!
db = sochdb.Database.open("./my_db")  # Creates tokio runtime in Rust
db.put_async(b"key", b"value")  # Needs event loop bridge
# Python's asyncio ↔ Rust's tokio: impedance mismatch

Sync FFI is natural:

# Python calling Rust sync function
import sochdb

db = sochdb.Database.open("./my_db")  # Direct Rust call
db.put(b"key", b"value")  # Direct Rust call, returns immediately
# No async ceremony!

4. Mental Model

Sync code is simpler:

// Sync: straightforward
fn write_data(db: &Database, key: &[u8], value: &[u8]) -> Result<()> {
    db.put(key, value)?;
    println!("Written!");
    Ok(())
}

Async adds complexity:

// Async: requires runtime, colored functions
async fn write_data(db: &Database, key: &[u8], value: &[u8]) -> Result<()> {
    db.put_async(key, value).await?;  // Must await
    println!("Written!");
    Ok(())
}

// Caller must also be async (function coloring)
#[tokio::main]
async fn main() {
    write_data(&db, b"key", b"value").await.unwrap();
}

5. Performance

For single-threaded workloads:

Sync is faster: no runtime overhead
Direct system calls
Better CPU cache locality

For multi-threaded workloads:

Thread-per-connection model works fine
OS scheduler is efficient
No need for async unless 10,000+ connections

Benchmark (1,000 writes):

Sync:   1.2ms (default)
Async:  1.5ms (+25% overhead from runtime)

Implementation Details

Crate Structure

sochdb/
├── sochdb-storage/       # Sync-first storage engine
│   ├── Cargo.toml        # default = [] (no tokio)
│   └── src/
│       ├── engine.rs     # Sync operations
│       └── async_ext.rs  # Optional async wrappers
│
├── sochdb-core/          # Core abstractions (sync)
│   ├── Cargo.toml        # No tokio dependency
│   └── src/
│       ├── transaction.rs
│       └── mvcc.rs
│
├── sochdb-query/         # SQL engine (sync)
│   ├── Cargo.toml        # No tokio dependency
│   └── src/
│       ├── parser.rs
│       └── executor.rs
│
├── sochdb-index/         # Vector index (sync)
│   ├── Cargo.toml        # No tokio dependency
│   └── src/
│       └── hnsw.rs
│
└── sochdb-grpc/          # gRPC server (async)
    ├── Cargo.toml        # Requires tokio
    └── src/
        └── server.rs     # Async handlers

Cargo.toml Configuration

Workspace root (/Cargo.toml):

[workspace]
members = [
    "sochdb-storage",
    "sochdb-core",
    "sochdb-query",
    "sochdb-index",
    "sochdb-grpc",
]

[workspace.dependencies]
# ❌ NO tokio here! (was in v0.3.4)
# Each crate declares it explicitly if needed

parking_lot = "0.12"
crossbeam = "0.8"

Storage crate (sochdb-storage/Cargo.toml):

[package]
name = "sochdb-storage"

[features]
default = []  # ✅ No tokio by default (was ["async"] in v0.3.4)
async = ["tokio"]  # Opt-in

[dependencies]
parking_lot = { workspace = true }
crossbeam = { workspace = true }

# ✅ Explicit, optional
tokio = { version = "1.35", features = ["rt-multi-thread", "sync"], optional = true }

[dev-dependencies]
# ❌ No tokio in dev-dependencies
criterion = "0.5"

gRPC server (sochdb-grpc/Cargo.toml):

[package]
name = "sochdb-grpc"

[dependencies]
sochdb-storage = { path = "../sochdb-storage", features = ["async"] }  # ✅ Requires async
tokio = { version = "1.35", features = ["rt-multi-thread", "net", "sync"] }  # ✅ Required
tonic = "0.10"
prost = "0.12"

Synchronization Primitives

Instead of tokio primitives:

// ❌ Old (v0.3.4): tokio dependency
use tokio::sync::Mutex;
use tokio::sync::RwLock;

// ✅ New (v0.3.5): no tokio
use parking_lot::Mutex;
use parking_lot::RwLock;

Benefits of parking_lot:

No async runtime required
Faster: optimized assembly
Smaller binary footprint
Better suited for short critical sections

Channel Usage

Instead of tokio channels:

// ❌ Old: tokio::sync::mpsc
use tokio::sync::mpsc;

let (tx, rx) = mpsc::channel(100);
tx.send(value).await?;

// ✅ New: crossbeam::channel
use crossbeam::channel;

let (tx, rx) = channel::bounded(100);
tx.send(value)?;  // Blocking, but completes fast

Feature Flags

Available Features

Feature	Enables	Use Case
`default = []`	Sync-only storage	Embedded, FFI, CLI tools
`async`	tokio runtime, async methods	gRPC server, async clients

Usage Examples

Sync-Only (Default)

[dependencies]
sochdb = "0.4.0"

use sochdb::Database;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let db = Database::open("./my_db")?;
    db.put(b"key", b"value")?;
    Ok(())
}

With Async

[dependencies]
sochdb = { version = "0.4.0", features = ["async"] }
tokio = { version = "1.35", features = ["rt-multi-thread"] }

use sochdb::Database;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let db = Database::open("./my_db")?;
    db.put_async(b"key", b"value").await?;
    Ok(())
}

Performance Characteristics

Latency Comparison

Operation	Sync (v0.3.5)	Async (v0.3.5)	Overhead
Single write	8 μs	11 μs	+37%
Single read	2 μs	3 μs	+50%
Transaction (10 writes)	45 μs	55 μs	+22%
Index search (k=10)	15 μs	18 μs	+20%

Conclusion: Async adds measurable overhead for single-threaded workloads.

Throughput Comparison

Workload	Sync	Async	Winner
1 client, sequential	125k ops/s	90k ops/s	Sync
10 clients, concurrent	240k ops/s	280k ops/s	Async
100 clients, concurrent	200k ops/s	450k ops/s	Async
1000 clients, concurrent	N/A (thread limit)	580k ops/s	Async

Conclusion: Async shines with high concurrency (100+ clients).

Memory Usage

Configuration	Resident Memory (RSS)
Sync (1 client)	12 MB
Sync (10 threads)	45 MB
Async (10 tasks)	28 MB
Async (100 tasks)	35 MB

Conclusion: Async is more memory-efficient for high concurrency.

Comparison with Other Databases

Database	Core Architecture	Async Runtime	Binary Size
SochDB v0.3.5	Sync-first	Optional (tokio)	732 KB
SQLite	Sync-only	None	~600 KB
DuckDB	Sync-only	None	~3 MB
RocksDB	Sync-only	None	~8 MB
Sled	Async-first	Built-in	~2 MB
SurrealDB	Async-first	Required (tokio)	~15 MB

SochDB's Position:

Follows SQLite/DuckDB pattern (sync-first)
But offers async opt-in for network workloads
Best of both worlds: small by default, scalable when needed

Best Practices

When to Use Sync (Default)

✅ Use sync when:

Embedding in applications (mobile, desktop, WASM)
FFI boundaries (Python, Node.js, Ruby)
CLI tools
Single-threaded scripts
Low-latency requirements
You want minimal dependencies

Example:

// Perfect for embedded use
fn process_batch(db: &Database, items: &[Item]) -> Result<()> {
    for item in items {
        db.put(&item.key, &item.value)?;
    }
    Ok(())
}

When to Enable Async

✅ Enable async when:

Running gRPC server (100+ concurrent clients)
Streaming large result sets
Integrating with async frameworks (axum, actix-web)
You already have tokio in your dependency tree

Example:

// gRPC server with high concurrency
#[tokio::main]
async fn main() {
    let server = GrpcServer::new("./my_db").await;
    server.serve("0.0.0.0:50051").await;  // Handles 1000+ clients
}

Hybrid Approach

Use sync for storage, async for network:

use sochdb::Database;
use axum::{Router, routing::get};

#[tokio::main]
async fn main() {
    // Sync database (no async feature)
    let db = Database::open("./my_db").unwrap();
    let db = Arc::new(db);

    // Async HTTP server
    let app = Router::new()
        .route("/get/:key", get({
            let db = db.clone();
            move |key| async move {
                // Sync call inside async handler
                let value = db.get(&key).unwrap();
                value.map(|v| String::from_utf8_lossy(&v).to_string())
            }
        }));

    axum::Server::bind(&"0.0.0.0:3000".parse().unwrap())
        .serve(app.into_make_service())
        .await
        .unwrap();
}

Why this works:

Database operations are fast (< 100μs)
Blocking inside async is acceptable for short operations
No need for async database methods
Smaller binary, simpler code

Future Considerations

Planned Enhancements (v0.4.0+)

Async Streaming: Optional async iterators for large result sets
Connection Pooling: Optional async connection pool for multi-tenant setups
Async Compaction: Background compaction with tokio::task::spawn
Hybrid Transactions: Sync writes, async replication

Not Planned

Making core storage async-first
Requiring tokio for embedded use
Async-only APIs

Conclusion

SochDB's sync-first architecture provides:

✅ Simplicity: No async complexity for most use cases
✅ Efficiency: ~500KB smaller binaries, 40 fewer dependencies
✅ Compatibility: Easy FFI, works with sync codebases
✅ Flexibility: Opt-in async when you need it
✅ Performance: Fast for single-threaded, scalable for concurrent workloads

Philosophy: "Async is a tool, not a religion. Use it where it helps, avoid it where it hurts."

SochDB Sync-First Architecture

Table of Contents

Overview

Key Principles

Design Philosophy

The SQLite Model

Why Not Async Everywhere?

Architecture Layers

Layer Descriptions

1. Application Layer

2. Client Interface Layer

3. Sync-First Core

Why Sync-First?

1. Binary Size

2. Dependency Tree

3. FFI Boundary

4. Mental Model

5. Performance

Implementation Details

Crate Structure

Cargo.toml Configuration

Synchronization Primitives

Channel Usage

Feature Flags

Available Features

Usage Examples

Performance Characteristics

Latency Comparison

Throughput Comparison

Memory Usage

Comparison with Other Databases

Best Practices

When to Use Sync (Default)

When to Enable Async

Hybrid Approach

Future Considerations

Planned Enhancements (v0.4.0+)

Not Planned

Conclusion

References

Table of Contents​

Overview​

Key Principles​

Design Philosophy​

The SQLite Model​

Why Not Async Everywhere?​

Architecture Layers​

Layer Descriptions​

1. Application Layer​

2. Client Interface Layer​

3. Sync-First Core​

Why Sync-First?​

1. Binary Size​

2. Dependency Tree​

3. FFI Boundary​

4. Mental Model​

5. Performance​

Implementation Details​

Crate Structure​

Cargo.toml Configuration​

Synchronization Primitives​

Channel Usage​

Feature Flags​

Available Features​

Usage Examples​

Performance Characteristics​

Latency Comparison​

Throughput Comparison​

Memory Usage​

Comparison with Other Databases​

Best Practices​

When to Use Sync (Default)​

When to Enable Async​

Hybrid Approach​

Future Considerations​

Planned Enhancements (v0.4.0+)​

Not Planned​

Conclusion​

References​

Table of Contents

Overview

Key Principles

Design Philosophy

The SQLite Model

Why Not Async Everywhere?

Architecture Layers

Layer Descriptions

1. Application Layer

2. Client Interface Layer

3. Sync-First Core

Why Sync-First?

1. Binary Size

2. Dependency Tree

3. FFI Boundary

4. Mental Model

5. Performance

Implementation Details

Crate Structure

Cargo.toml Configuration

Synchronization Primitives

Channel Usage

Feature Flags

Available Features

Usage Examples

Performance Characteristics

Latency Comparison

Throughput Comparison

Memory Usage

Comparison with Other Databases

Best Practices

When to Use Sync (Default)

When to Enable Async

Hybrid Approach

Future Considerations

Planned Enhancements (v0.4.0+)

Not Planned

Conclusion

References