SochDB End-to-End Profiling Documentation
Overviewβ
This document describes the comprehensive end-to-end profiling infrastructure for SochDB's HNSW vector index, covering the entire pipeline from Python SDK through FFI to Rust core.
Architectureβ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Python SDK Layer β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β 10_e2e_profiling.py β β
β β - PrecisionTimer (nanosecond resolution) β β
β β - ProfiledVectorIndex wrapper β β
β β - Memory tracking via tracemalloc β β
β β - Per-operation timing: numpy, dtype, validation, FFI ptr creation β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β vector.py (FFI bindings) β β
β β - ctypes bindings to libsochdb_index.dylib β β
β β - insert_batch β hnsw_insert_batch β β
β β - Profiling control: enable_profiling(), dump_profiling() β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
β FFI Boundary
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Rust FFI Layer (ffi.rs) β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β hnsw_insert_batch() β β
β β - Profiled: ffi.slice_from_raw, ffi.id_conversion β β
β β - Calls: index.insert_batch_contiguous() β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β HNSW Core (hnsw.rs) β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β insert_batch_contiguous_bulk() β β
β β β β
β β Phase 1: Parallel Quantization (hnsw.phase1.quantize_parallel) β β
β β - rayon parallel iterator β β
β β - Creates HnswNode with QuantizedVector β β
β β β β
β β Phase 2: Map Insert (hnsw.phase2.map_insert) β β
β β - Insert nodes into DashMap β β
β β - Update entry point β β
β β β β
β β Phase 3: Connection Building (hnsw.phase3.*) β β
β β - connect_total: Overall connection time β β
β β - search_layer: Graph navigation β β
β β - neighbor_select: RNG heuristic β MAIN BOTTLENECK β β
β β - add_connections: Bidirectional edge creation β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Profiling Infrastructureβ
Python-Side Profilingβ
File: sochdb-python-sdk/examples/10_e2e_profiling.py
# Key components
class PrecisionTimer:
"""Nanosecond-precision timer using time.perf_counter_ns()"""
class TimingStats:
"""Statistical aggregation: count, total, mean, std, min, max, p50, p95, p99"""
class ProfiledVectorIndex:
"""Wrapper that instruments all VectorIndex operations"""
def insert_batch_profiled(vectors, ids):
# Times: numpy_ascontiguous, dtype_conversion, data_validation,
# ffi_ptr_creation, ffi_call_overhead, batch_total
Output: profiling_results.json
Rust-Side Profilingβ
File: sochdb-index/src/profiling.rs
// Enabling profiling
SOCHDB_PROFILING=1 // Environment variable
// Key components
pub struct Timer { /* start, stop, record */ }
pub struct OpStats { count, total_ns, min_ns, max_ns, item_count }
pub struct ProfileCollector { /* global singleton with thread-safe stats */ }
// FFI exports
pub extern "C" fn sochdb_profiling_enable();
pub extern "C" fn sochdb_profiling_disable();
pub extern "C" fn sochdb_profiling_dump();
Output: /tmp/sochdb_profile.json
Running Profilingβ
Basic Usageβ
cd sochdb-python-sdk
# Enable profiling and run
export SOCHDB_PROFILING=1
export SOCHDB_LIB_PATH=/path/to/target/release/libsochdb_index.dylib
python3 examples/10_e2e_profiling.py
# Analyze results
python3 examples/11_profiling_analysis.py
Customizable Parametersβ
Edit 10_e2e_profiling.py:
# Configuration
NUM_VECTORS = 1000 # Number of vectors to insert
DIMENSION = 768 # Vector dimension (e.g., OpenAI embeddings)
EF_CONSTRUCTION = 200 # HNSW ef parameter (quality vs speed)
MAX_CONNECTIONS = 16 # HNSW M parameter
Profiling Results (1000 vectors, 768d)β
Summary Metricsβ
| Metric | Value |
|---|---|
| Total Insert Time | 13,533 ms (13.5s) |
| Throughput | 74 vectors/sec |
| Latency | 13.5 ms/vector |
| Peak Memory | 9.45 MB |
| Search Time | 1.7 ms |
Time Breakdownβ
| Phase | Time (ms) | Percentage | Per-Vector |
|---|---|---|---|
| Neighbor Selection | 10,365 | 76.6% | 11.07 ms |
| Add Connections | 1,908 | 14.1% | 2.04 ms |
| Search Layer | 1,076 | 7.9% | 1.15 ms |
| Quantization | 0.99 | <0.01% | 1.05 Β΅s |
| Map Insert | 0.11 | <0.01% | 0.12 Β΅s |
| FFI Overhead | 0.003 | <0.01% | 0.003 Β΅s |
Bottleneck Analysisβ
CRITICAL BOTTLENECK: Neighbor Selection (RNG Heuristic) - 77%
The select_neighbors_heuristic function implements the Relative Neighborhood
Graph (RNG) pruning which is essential for HNSW quality but computationally
expensive.
For each of N vectors inserted:
- Search returns ~ef candidates (200)
- RNG checks each candidate against all previously selected neighbors
- Each check requires a full 768-dimensional distance calculation
- Complexity: O(N Γ ef Γ m Γ D) where D is dimension
With N=1000, ef=200, m=16, D=768:
- Approximately 1000 Γ 200 Γ 16 = 3.2M distance calculations
- Each 768d cosine distance = 3 SIMD operations (dot, norm_a, norm_b)
Optimization Recommendationsβ
1. Reduce ef_constructionβ
# Current: ef_construction = 200
# Recommended: ef_construction = 100-128
# Trade-off: ~5% recall reduction, 2x faster insert
index = VectorIndex(
dimension=768,
ef_construction=100, # Reduced from 200
max_connections=16,
)
2. Use Batch Distance Computationβ
The RNG heuristic currently does scalar distance calculations. Batching candidates and using SIMD would provide 4-8x speedup on distance computation.
3. Pre-compute Distance Cacheβ
Cache pairwise distances between candidates during search to avoid recomputation in neighbor selection.
4. Approximate RNG for Early Insertsβ
Use simpler heuristics (top-k by distance) when graph is sparse, switch to full RNG when graph is denser.
5. Parallel Neighbor Selectionβ
The RNG check for each candidate can be parallelized since it's independent.
Files Createdβ
| File | Purpose |
|---|---|
sochdb-python-sdk/examples/10_e2e_profiling.py | Main profiling script |
sochdb-python-sdk/examples/11_profiling_analysis.py | Analysis and visualization |
sochdb-index/src/profiling.rs | Rust profiling infrastructure |
profiling_results.json | Python-side profiling output |
/tmp/sochdb_profile.json | Rust-side profiling output |
JSON Output Formatβ
Python Profile (profiling_results.json)β
{
"timestamp": "2025-12-29T00:33:55",
"config": {
"num_vectors": 1000,
"dimension": 768,
"ef_construction": 200,
"max_connections": 16
},
"summary": {
"total_insert_time_ms": 13533.17,
"vectors_per_second": 73.89,
"peak_memory_mb": 9.45
},
"python_layer": {
"timings": {
"numpy_ascontiguous": { "total_ms": 0.008 },
"dtype_conversion": { "total_ms": 0.013 },
"ffi_ptr_creation": { "total_ms": 0.032 },
"batch_total": { "total_ms": 13533.05 }
}
}
}
Rust Profile (/tmp/sochdb_profile.json)β
{
"total_elapsed_ms": 13537.17,
"operations": {
"hnsw.phase3.neighbor_select": {
"count": 1,
"total_ms": 10365.00,
"mean_us": 10365003.96,
"item_count": 936,
"per_item_us": 11073.72
},
"hnsw.phase3.add_connections": { ... },
"hnsw.phase3.search_layer": { ... },
"hnsw.phase1.quantize_parallel": { ... },
"hnsw.phase2.map_insert": { ... }
}
}
Conclusionβ
The profiling reveals that 77% of insertion time is spent in the RNG neighbor selection heuristic, which is the core HNSW algorithm for maintaining graph quality. This is expected behavior for high-dimensional vectors.
Key insights:
- Python/FFI overhead is negligible (
<0.001%) - Quantization and map operations are very fast (
<0.01%) - The bottleneck is algorithmic, not I/O or memory
- Optimization should focus on the neighbor selection loop
For faster inserts at the cost of some recall:
- Reduce ef_construction from 200 to 100
- Use lower precision (F16) for distance calculations
- Implement batched SIMD distance in the RNG loop