Performance

Benchmarks run on macOS 14 (Apple M-series, ARM64) using synthetic drone-swarm datasets. All timings are measured inside the C engine via clock_gettime(CLOCK_MONOTONIC) (nanosecond resolution), not Python wall-clock time, so Python interpreter overhead is excluded from latency numbers.

Benchmark harness: benchmarks/run_benchmark.py — 100 trials per configuration, results written to benchmarks/results/.

python3 benchmarks/run_benchmark.py          # all sizes (1K → 1M)
python3 benchmarks/run_benchmark.py --skip-1m

TPI Range Query vs Full Scan

Query window: 30 seconds over a 1,000-second timeline (3 % of timeline). Theoretical speedup: ~33× (1 / 0.03).

Dataset	TPI median	Full scan median	TPI p95	Speedup	Theoretical
1 K	25 µs	621 µs	37 µs	24.8×	33.3×
10 K	245 µs	5.92 ms	274 µs	24.2×	33.3×
100 K	2.49 ms	60.5 ms	2.60 ms	24.3×	33.3×
1 M	24.8 ms	610 ms	26.0 ms	24.6×	33.3×

Key observations

Speedup is consistent at ~24–25× regardless of dataset size — TPI bucket pushdown eliminates ~97 % of I/O whether you have 1 K or 1 M records.
The small gap from the theoretical 33× is caused by bucket boundary effects (the 30 s window can span a partial extra bucket at each end).
Latency scales linearly with records read (3 % of N), not with total N.

FMI Point Lookup vs Full Scan

The FMI benchmark measures the complete hm_fmi_query() operation: binary search + sequential scan for matching seq-IDs + WAL merge.

The top-degree node in the synthetic dataset appears in ~27 % of all records (by design — synthetic data has no low-degree nodes). For this adversarial case the theoretical maximum speedup is 1 / 0.27 ≈ 3.7×.

Dataset	Node degree	Degree %	FMI query	Full scan	Speedup
1 K	299	29.9 %	1.6 ms	4.6 ms	2.9×
10 K	2,694	26.9 %	13.6 ms	37.9 ms	2.8×
100 K	26,766	26.8 %	142 ms	399 ms	2.8×
1 M	265,268	26.5 %	1.40 s	4.15 s	3.0×

Key observations

FMI speedup of ~3× for a node present in 27 % of records matches theory.
For realistic sparse nodes (e.g. a node present in 0.1 % of records), the theoretical speedup is 1,000× — the FMI is specifically designed for the sparse-node query pattern dominant in drone-swarm coalition analysis.
FMI speedup scales with the inverse of the node’s degree fraction; the dataset above is the worst case, not the expected case.

Scalability Projection (TPI)

Latency grows as O(window_fraction × N), not O(N). For a fixed 3 % window:

Records	Expected TPI median
10 K	~0.25 ms
100 K	~2.5 ms
1 M	~25 ms
10 M	~250 ms (est.)

Measured values confirm this linear relationship — adding one order of magnitude of records multiplies latency by exactly one order of magnitude.

Write-Path Overhead

Operation	Latency	Notes
`insert()` one record	~20 µs	Includes `fdatasync()` — crash safe
`delete()` (tombstone)	~20 µs	Includes `fdatasync()`
`compact()` (100 K records)	~83 ms	Full rebuild: TPI + FMI + WAL truncate

WAL writes are durable after each fdatasync(). Compaction rebuilds the entire TPI + FMI binary index from scratch and then truncates the WAL to a zero-entry header — leaving the system in a consistent state even if interrupted mid-way.

Benchmark Reproducibility

# Clone and build
git clone https://github.com/sree181/hypermesh-workbench
cd hypermesh-workbench
make -C hypermesh_core
pip install -e ".[dev]"

# Run benchmarks (writes CSV + summary.txt to benchmarks/results/)
python3 benchmarks/run_benchmark.py

# Inspect results
cat benchmarks/results/summary.txt

All random seeds are fixed (seed=42 for data generation, seed=2024 for query windows) so results are exactly reproducible across runs and machines.