Skip to content

Performance

Benchmarks run on macOS 14 (Apple M-series, ARM64) using synthetic drone-swarm datasets. All timings are measured inside the C engine via clock_gettime(CLOCK_MONOTONIC) (nanosecond resolution), not Python wall-clock time, so Python interpreter overhead is excluded from latency numbers.

Benchmark harness: benchmarks/run_benchmark.py — 100 trials per configuration, results written to benchmarks/results/.

python3 benchmarks/run_benchmark.py # all sizes (1K → 1M)
python3 benchmarks/run_benchmark.py --skip-1m

Query window: 30 seconds over a 1,000-second timeline (3 % of timeline). Theoretical speedup: ~33× (1 / 0.03).

DatasetTPI medianFull scan medianTPI p95SpeedupTheoretical
1 K25 µs621 µs37 µs24.8×33.3×
10 K245 µs5.92 ms274 µs24.2×33.3×
100 K2.49 ms60.5 ms2.60 ms24.3×33.3×
1 M24.8 ms610 ms26.0 ms24.6×33.3×

Key observations

  • Speedup is consistent at ~24–25× regardless of dataset size — TPI bucket pushdown eliminates ~97 % of I/O whether you have 1 K or 1 M records.
  • The small gap from the theoretical 33× is caused by bucket boundary effects (the 30 s window can span a partial extra bucket at each end).
  • Latency scales linearly with records read (3 % of N), not with total N.

The FMI benchmark measures the complete hm_fmi_query() operation: binary search + sequential scan for matching seq-IDs + WAL merge.

The top-degree node in the synthetic dataset appears in ~27 % of all records (by design — synthetic data has no low-degree nodes). For this adversarial case the theoretical maximum speedup is 1 / 0.27 ≈ 3.7×.

DatasetNode degreeDegree %FMI queryFull scanSpeedup
1 K29929.9 %1.6 ms4.6 ms2.9×
10 K2,69426.9 %13.6 ms37.9 ms2.8×
100 K26,76626.8 %142 ms399 ms2.8×
1 M265,26826.5 %1.40 s4.15 s3.0×

Key observations

  • FMI speedup of ~3× for a node present in 27 % of records matches theory.
  • For realistic sparse nodes (e.g. a node present in 0.1 % of records), the theoretical speedup is 1,000× — the FMI is specifically designed for the sparse-node query pattern dominant in drone-swarm coalition analysis.
  • FMI speedup scales with the inverse of the node’s degree fraction; the dataset above is the worst case, not the expected case.

Latency grows as O(window_fraction × N), not O(N). For a fixed 3 % window:

RecordsExpected TPI median
10 K~0.25 ms
100 K~2.5 ms
1 M~25 ms
10 M~250 ms (est.)

Measured values confirm this linear relationship — adding one order of magnitude of records multiplies latency by exactly one order of magnitude.


OperationLatencyNotes
insert() one record~20 µsIncludes fdatasync() — crash safe
delete() (tombstone)~20 µsIncludes fdatasync()
compact() (100 K records)~83 msFull rebuild: TPI + FMI + WAL truncate

WAL writes are durable after each fdatasync(). Compaction rebuilds the entire TPI + FMI binary index from scratch and then truncates the WAL to a zero-entry header — leaving the system in a consistent state even if interrupted mid-way.


Terminal window
# Clone and build
git clone https://github.com/sree181/hypermesh-workbench
cd hypermesh-workbench
make -C hypermesh_core
pip install -e ".[dev]"
# Run benchmarks (writes CSV + summary.txt to benchmarks/results/)
python3 benchmarks/run_benchmark.py
# Inspect results
cat benchmarks/results/summary.txt

All random seeds are fixed (seed=42 for data generation, seed=2024 for query windows) so results are exactly reproducible across runs and machines.