Performance
Benchmarks run on macOS 14 (Apple M-series, ARM64) using synthetic drone-swarm
datasets. All timings are measured inside the C engine via clock_gettime(CLOCK_MONOTONIC)
(nanosecond resolution), not Python wall-clock time, so Python interpreter
overhead is excluded from latency numbers.
Benchmark harness: benchmarks/run_benchmark.py — 100 trials per configuration,
results written to benchmarks/results/.
python3 benchmarks/run_benchmark.py # all sizes (1K → 1M)python3 benchmarks/run_benchmark.py --skip-1mTPI Range Query vs Full Scan
Section titled “TPI Range Query vs Full Scan”Query window: 30 seconds over a 1,000-second timeline (3 % of timeline). Theoretical speedup: ~33× (1 / 0.03).
| Dataset | TPI median | Full scan median | TPI p95 | Speedup | Theoretical |
|---|---|---|---|---|---|
| 1 K | 25 µs | 621 µs | 37 µs | 24.8× | 33.3× |
| 10 K | 245 µs | 5.92 ms | 274 µs | 24.2× | 33.3× |
| 100 K | 2.49 ms | 60.5 ms | 2.60 ms | 24.3× | 33.3× |
| 1 M | 24.8 ms | 610 ms | 26.0 ms | 24.6× | 33.3× |
Key observations
- Speedup is consistent at ~24–25× regardless of dataset size — TPI bucket pushdown eliminates ~97 % of I/O whether you have 1 K or 1 M records.
- The small gap from the theoretical 33× is caused by bucket boundary effects (the 30 s window can span a partial extra bucket at each end).
- Latency scales linearly with records read (3 % of N), not with total N.
FMI Point Lookup vs Full Scan
Section titled “FMI Point Lookup vs Full Scan”The FMI benchmark measures the complete hm_fmi_query() operation:
binary search + sequential scan for matching seq-IDs + WAL merge.
The top-degree node in the synthetic dataset appears in ~27 % of all records (by design — synthetic data has no low-degree nodes). For this adversarial case the theoretical maximum speedup is 1 / 0.27 ≈ 3.7×.
| Dataset | Node degree | Degree % | FMI query | Full scan | Speedup |
|---|---|---|---|---|---|
| 1 K | 299 | 29.9 % | 1.6 ms | 4.6 ms | 2.9× |
| 10 K | 2,694 | 26.9 % | 13.6 ms | 37.9 ms | 2.8× |
| 100 K | 26,766 | 26.8 % | 142 ms | 399 ms | 2.8× |
| 1 M | 265,268 | 26.5 % | 1.40 s | 4.15 s | 3.0× |
Key observations
- FMI speedup of ~3× for a node present in 27 % of records matches theory.
- For realistic sparse nodes (e.g. a node present in 0.1 % of records), the theoretical speedup is 1,000× — the FMI is specifically designed for the sparse-node query pattern dominant in drone-swarm coalition analysis.
- FMI speedup scales with the inverse of the node’s degree fraction; the dataset above is the worst case, not the expected case.
Scalability Projection (TPI)
Section titled “Scalability Projection (TPI)”Latency grows as O(window_fraction × N), not O(N). For a fixed 3 % window:
| Records | Expected TPI median |
|---|---|
| 10 K | ~0.25 ms |
| 100 K | ~2.5 ms |
| 1 M | ~25 ms |
| 10 M | ~250 ms (est.) |
Measured values confirm this linear relationship — adding one order of magnitude of records multiplies latency by exactly one order of magnitude.
Write-Path Overhead
Section titled “Write-Path Overhead”| Operation | Latency | Notes |
|---|---|---|
insert() one record | ~20 µs | Includes fdatasync() — crash safe |
delete() (tombstone) | ~20 µs | Includes fdatasync() |
compact() (100 K records) | ~83 ms | Full rebuild: TPI + FMI + WAL truncate |
WAL writes are durable after each fdatasync(). Compaction rebuilds the entire
TPI + FMI binary index from scratch and then truncates the WAL to a zero-entry
header — leaving the system in a consistent state even if interrupted mid-way.
Benchmark Reproducibility
Section titled “Benchmark Reproducibility”# Clone and buildgit clone https://github.com/sree181/hypermesh-workbenchcd hypermesh-workbenchmake -C hypermesh_corepip install -e ".[dev]"
# Run benchmarks (writes CSV + summary.txt to benchmarks/results/)python3 benchmarks/run_benchmark.py
# Inspect resultscat benchmarks/results/summary.txtAll random seeds are fixed (seed=42 for data generation, seed=2024 for
query windows) so results are exactly reproducible across runs and machines.