HyperMesh CISO Solution Brief
Endpoint Behavioural Anomaly Detection With Temporal Hypergraphs and Liquid State Machines
Audience: CISO, SOC leadership, security architecture, detection engineering
Use case: Zero-shot behavioural anomaly detection from Microsoft Defender for Endpoint telemetry
Core stack: HyperMeshDB + temporal hypergraph features + Liquid State Machine reservoir + One-Class SVM
Primary result: 90.2% precision on the NANXCV validation host while training only on clean baseline activity
Executive Summary
Modern endpoint attacks often do not announce themselves as one obvious bad event. They emerge as a gradual shift in how machines, processes, accounts, and network artefacts co-occur over time. Traditional SIEM and EDR rules are excellent for known signatures, but they struggle when each individual action looks legitimate in isolation.
HyperMesh addresses this gap by modelling endpoint telemetry as a temporal hypergraph. Each event becomes a first-class n-way behavioural fact linking the machine, process, account, and IP involved at that moment. A Liquid State Machine then gives the system a short fading memory of recent behaviour, and a One-Class SVM learns the boundary of normal activity from clean baseline data only.
The key operational claim is simple:
HyperMesh can learn what normal endpoint behaviour looks like from Defender telemetry and flag anomalous behaviour windows without training on attack labels.
On the NANXCV validation host, the zero-shot detector achieved:
| Metric | Result |
|---|---|
| Precision | 0.902 |
| Recall | 0.537 |
| F1 | 0.673 |
| ROC-AUC | 0.744 |
| PR-AUC | 0.831 |
| Accuracy | 0.720 |
This means that when the detector fires on this validation run, it is right about 9 times out of 10. That is the most important number for a SOC, because analyst attention is the scarce resource.
This is not positioned as an EDR replacement. It is an additive analyst-priority signal that surfaces behaviourally unusual windows for investigation.
The CISO Message
The Problem
Security teams already collect rich endpoint telemetry, but most pipelines still treat it as flat rows. A row may contain a machine, process, account, IP address, timestamp, and action type, yet the detection system often loses the fact that these identities acted together.
Attackers exploit this gap. Living-off-the-land behaviour, lateral movement, staging, command-and-control, and dwell time often appear as changes in patterns of co-occurrence, not as single events that cross a static threshold.
The HyperMesh Approach
HyperMesh turns endpoint telemetry into a temporal hypergraph:
```
machine + process + account + IP + behaviour category + timestamp
```

Each event is stored as a hyperedge, preserving the n-way relationship directly. Temporal analytics then summarise behaviour in five-minute windows, and the reservoir model tracks how those windows evolve.
Why It Matters
The model is trained on normal activity only. It does not need examples of every attack technique. This makes it suitable for a zero-shot detection setting where future attacker behaviour may not match yesterday’s signatures.
End-to-End Architecture
```
┌────────────────────┐
│ Microsoft Defender │
│ Endpoint telemetry │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ HyperMesh ingest   │
│ MDE rows → entities│
│ and hyperedges     │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ Temporal hypergraph│
│ machine + process  │
│ account + IP       │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ 5-minute features  │
│ 16-dim vector per  │
│ time window        │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ LSM reservoir      │
│ 500 recurrent      │
│ memory neurons     │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ One-Class SVM      │
│ learns normal only │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ Anomaly timeline   │
│ score + evidence   │
│ for SOC triage     │
└────────────────────┘
```

Repository artefacts:
| Layer | Implementation |
|---|---|
| MDE ingestion | scripts/ingest_mde_baseline.py, scripts/ingest_mde_fleet.py, hypermeshdb/ingest/strategies/mde_baseline.py, hypermeshdb/connectors/mde.py |
| Temporal features | scripts/temporal_common.py, scripts/temporal_analysis_nanxcv.py, scripts/temporal_analysis_fleet.py |
| Reservoir and readouts | scripts/snn_train_nanxcv.py |
| Results | data/snn_results_nanxcv.json, data/snn_results_ciso_fleet.json, data/temporal_nanxcv.json, data/temporal_ciso_fleet.json |
| Presentation assets | data/HyperMesh_CISO_LSM_Deck.pptx, client/src/pages/SnnDashboard.tsx, design/snn-hypergraph-*deck.html |
| Methodology | design/methodology_temporal_hypergraph_snn.tex |
Data Used
The study uses Microsoft Defender for Endpoint timeline exports provided by the LTIMindTree CISO team for February 2026.
| Host | Role | Baseline Window | Notes |
|---|---|---|---|
| NANXCV (nanxcv00f89340g) | Primary victim host | Feb 9-13, 2026 | Cleanest baseline; 0% simulation traffic; validation host |
| AZRPREPW (AZRCIPREPWCYMLT) | Windows VM | Feb 7-12, 2026 | Cymulate simulation traffic excluded |
| AZRPREPL (AZRCIPREPLCYMLT) | Linux VM | Feb 6-12, 2026 | Cymulate simulation traffic excluded |
The attack/exercise slice is NANXCV, Feb 14-20, 2026, stored as SYS_NANXCV.
Two study modes were run:
| Phase | Scope | Purpose |
|---|---|---|
| Phase 1 | Single host, NANXCV | Method validation with the cleanest baseline and clearest behavioural contrast |
| Phase 2 | Three-host fleet | Scale-realism test using a global entity map across machines |
Important caveat: the attack window is labelled by the CISO team’s operational calendar, not by per-event kill-chain ground truth. That is sufficient for prototype validation, but a production evaluation needs analyst-confirmed labels.
Hypergraph Ingestion
The MDE ingestion strategy converts raw endpoint rows into typed entities and hyperedges.
For each event, HyperMesh resolves:
| Entity Type | Source Field |
|---|---|
| Machine | Computer Name or Machine Id |
| Process | Initiating Process SHA1, with filename used for display |
| Account | Initiating Process Account Domain + Initiating Process Account Name |
| IP | Remote IP, when present |
| Formation | Mapped from Action Type |
Each row becomes one behavioural hyperedge:
```
e = (event_ts, members, formation, weight)
```

Where:
| Field | Meaning |
|---|---|
| event_ts | Unix timestamp for the endpoint event |
| members | Integer entity IDs for machine, process, account, and optional IP |
| formation | Compact behavioural category |
| weight | Event count or aggregated event volume |
The formation vocabulary is:
```
PROCESS_EXEC
NETWORK_CONN
FILE_OPS
REGISTRY_OPS
DNS_LOOKUP
HTTP_TRAFFIC
IPC_PIPE
MODULE_LOAD
LOGON
SCRIPT_EXEC
GENERIC
```

Why this matters: a conventional pairwise graph would split one 4-way behaviour into six separate edges. A hypergraph preserves the complete behavioural fact as one object.
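The entity resolution and hyperedge construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repository's `MDEBaselineStrategy`: the `EntityMap`, `Hyperedge`, and `row_to_hyperedge` names, the `event_ts` row key (assumed to already hold a Unix timestamp), the lower-casing of keys, and the small `ACTION_TO_FORMATION` table are all hypothetical. Unmapped action types fall back to `GENERIC`.

```python
from dataclasses import dataclass

# Hypothetical ActionType -> formation table (illustrative subset only).
ACTION_TO_FORMATION = {
    "ProcessCreated": "PROCESS_EXEC",
    "ConnectionSuccess": "NETWORK_CONN",
    "FileCreated": "FILE_OPS",
    "RegistryValueSet": "REGISTRY_OPS",
}


class EntityMap:
    """Interns typed keys such as 'ip:10.0.0.5' into stable integer node IDs."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}

    def resolve(self, kind: str, value: str) -> int:
        key = f"{kind}:{value.strip().lower()}"  # lower-casing is an assumption
        if key not in self._ids:
            self._ids[key] = len(self._ids)  # next unused ID
        return self._ids[key]

    def __len__(self) -> int:
        return len(self._ids)


@dataclass(frozen=True)
class Hyperedge:
    event_ts: int
    members: tuple[int, ...]
    formation: str
    weight: float = 1.0


def row_to_hyperedge(row: dict, entities: EntityMap) -> Hyperedge:
    """Turn one MDE timeline row into a single n-way hyperedge."""
    account = (
        f'{row["Initiating Process Account Domain"]}\\'
        f'{row["Initiating Process Account Name"]}'
    )
    members = [
        entities.resolve("machine", row["Computer Name"]),
        entities.resolve("process", row["Initiating Process SHA1"]),
        entities.resolve("account", account),
    ]
    if row.get("Remote IP"):  # the IP member is optional
        members.append(entities.resolve("ip", row["Remote IP"]))
    formation = ACTION_TO_FORMATION.get(row["Action Type"], "GENERIC")
    return Hyperedge(int(row["event_ts"]), tuple(members), formation)
```

Because every row passes through one shared `EntityMap`, the same account or IP seen on two hosts resolves to the same integer ID, which is the property a global, deduplicated entity map provides.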
Ingestion Magnitudes
| Measure | NANXCV Phase 1 | Fleet Phase 2 |
|---|---|---|
| Baseline hyperedges | 14,034 | - |
| Attack hyperedges | 13,341 | - |
| Total hyperedges | - | 39,016 |
| Approx. raw rows after Cymulate exclusion | - | 374,000 |
| Unique baseline entities | 1,775 | 2,993 |
| Novel entities in attack | 53 | 815 |
The fleet run uses a global, deduplicated entity map so shared IPs, accounts, and file hashes resolve to the same node across hosts.
Temporal Features
HyperMesh partitions the telemetry into contiguous five-minute windows. Each window becomes a 16-dimensional feature vector.
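A minimal sketch of the five-minute partitioning, assuming events carry Unix timestamps in seconds. The `bucket_events` helper is hypothetical. It keeps empty windows so that the sequence later fed to the reservoir stays contiguous in time; how the repository treats quiet periods is an assumption here.

```python
WINDOW_SECONDS = 300  # five-minute windows


def bucket_events(timestamps: list[int], window: int = WINDOW_SECONDS) -> list[list[int]]:
    """Group Unix timestamps into contiguous fixed-width windows.

    Empty windows are kept (as empty lists) so the sequence has no gaps in time.
    """
    if not timestamps:
        return []
    first = min(timestamps)
    start = first - first % window  # align the first window to a 5-minute boundary
    n_windows = (max(timestamps) - start) // window + 1
    buckets: list[list[int]] = [[] for _ in range(n_windows)]
    for ts in sorted(timestamps):
        buckets[(ts - start) // window].append(ts)
    return buckets
```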
Aggregate Features
| Feature | Description |
|---|---|
| hedge_count | Number of hyperedges in the window |
| entity_count | Number of distinct active entities |
| novelty_rate | Fraction of entities not seen in baseline |
| mean_members | Average hyperedge cardinality |
| formation_entropy | Behavioural diversity across formation types |
Formation Histogram
The remaining 11 channels are raw counts for each formation category.
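The 16-channel layout can be made concrete with a small function. It is a sketch, not the code in `scripts/temporal_common.py`: the `window_features` name and its input shape are hypothetical, entropy uses the natural logarithm (the source does not state the base), and `novelty_rate` is taken as the share of distinct active entities that are absent from the baseline set.

```python
import math
from collections import Counter

FORMATIONS = [
    "PROCESS_EXEC", "NETWORK_CONN", "FILE_OPS", "REGISTRY_OPS", "DNS_LOOKUP",
    "HTTP_TRAFFIC", "IPC_PIPE", "MODULE_LOAD", "LOGON", "SCRIPT_EXEC", "GENERIC",
]


def window_features(hyperedges, baseline_entities):
    """Build the 16-dim vector for one window.

    hyperedges: list of (members, formation) pairs that fall in the window.
    baseline_entities: set of entity IDs seen during the baseline period.
    """
    n = len(hyperedges)
    counts = Counter(formation for _, formation in hyperedges)
    active = {m for members, _ in hyperedges for m in members}
    novelty = len(active - baseline_entities) / len(active) if active else 0.0
    mean_members = sum(len(members) for members, _ in hyperedges) / n if n else 0.0
    # Shannon entropy of the formation mix (natural log assumed).
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values()) if n else 0.0
    aggregates = [n, len(active), novelty, mean_members, entropy]
    histogram = [counts.get(f, 0) for f in FORMATIONS]  # channels 6-16
    return aggregates + histogram
```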
| Measure | NANXCV Phase 1 | Fleet Phase 2 |
|---|---|---|
| Total windows | 952 | 1,614 |
| Baseline windows | 457 | 1,119 |
| Attack windows | 495 | 495 |
| Windows scored after washout | 922 | 1,584 |
An important interpretability signal came from formation_entropy: volumetric features generated many noisy changepoints, but entropy produced only a small number of statistically meaningful shifts. The sharpest was Feb 17, 09:15 UTC, where entropy collapsed to 0.65 from a running mean near 2.34, indicating concentration into a narrower behavioural mix dominated by network and process execution activity.
Baseline-Only Normalisation
The model is deliberately normalised using baseline windows only.
```
u_tj = (X_tj - mean_j_baseline) / std_j_baseline
```

This mimics production deployment. In a real environment, the model would learn from an assumed-normal historical period and then score future activity. No future attack distribution is used to scale the features.
Liquid State Machine Reservoir
The reservoir is a fixed recurrent neural system that gives each five-minute window a fading memory of recent activity.
```
x(t) = (1 - α) x(t-1) + α tanh(W x(t-1) + W_in u(t) + ε_t)
```

| Parameter | Value | Meaning |
|---|---|---|
| n_reservoir | 500 | Number of reservoir neurons |
| spectral_radius | 0.95 | Keeps the reservoir stable and fading |
| leak_rate | 0.30 | Controls memory decay |
| sparsity | 0.10 | Sparse recurrent connectivity |
| input_scaling | 0.30 | Input projection scale |
| washout | 30 | Initial windows discarded before scoring |
| seed | 7 | Reproducibility |
The recurrent weights are fixed. There is no deep backpropagation. Only the readout is trained. This is important for security telemetry because the number of labelled attack windows is usually small and overfitting risk is high.
Detection Models
Production Mode: One-Class SVM
The deployable detector is a One-Class SVM trained only on baseline reservoir states.
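```python
from sklearn.svm import OneClassSVM

detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(baseline_states)  # baseline windows only
anomaly_score = -detector.decision_function(states)  # higher = more anomalous
```

nu = 0.05 sets an approximate baseline false-alarm budget of 5%. In the NANXCV run, the observed baseline alarm rate was 29 / 427 = 6.8%, close to the intended operating point.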
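The baseline-only training discipline can be shown end to end with scikit-learn on synthetic data. Random Gaussian arrays stand in for the reservoir states, and the names `baseline_states`, `attack_states`, and `anomaly_score` are illustrative; the printed rates describe the toy data only, not the NANXCV run.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(7)
# Synthetic stand-ins for reservoir states: baseline around 0, "attack" shifted.
baseline_states = rng.normal(0.0, 1.0, size=(400, 10))
attack_states = rng.normal(1.5, 1.0, size=(100, 10))

# Fit on baseline windows only; no attack label is used anywhere.
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(baseline_states)


def anomaly_score(states: np.ndarray) -> np.ndarray:
    """Negated decision function: higher means further outside the normal boundary."""
    return -detector.decision_function(states)


# predict() returns -1 for windows outside the learned boundary (an alarm).
baseline_alarm_rate = float(np.mean(detector.predict(baseline_states) == -1))
attack_alarm_rate = float(np.mean(detector.predict(attack_states) == -1))
print(f"baseline alarm rate: {baseline_alarm_rate:.3f}")
print(f"attack alarm rate:   {attack_alarm_rate:.3f}")
```

The detector never sees `attack_states` during fitting; they are only scored afterwards, which is what makes the setting zero-shot.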
Diagnostic Mode: Logistic Regression
A logistic readout was also tested on the same reservoir states. It uses labels and is therefore not a production simulation. It is included only to assess whether separability exists in the reservoir representation.
The key message: operational claims should use the One-Class SVM results, not the supervised diagnostic readout.
Results
Phase 1: NANXCV Validation Host
One-Class SVM, trained only on clean baseline windows:
| Metric | Value |
|---|---|
| Precision | 0.902 |
| Recall | 0.537 |
| F1 | 0.673 |
| ROC-AUC | 0.744 |
| PR-AUC | 0.831 |
| Accuracy | 0.720 |
Confusion matrix:
|  | Predicted Normal | Predicted Anomalous |
|---|---|---|
| Actual Baseline | 398 | 29 |
| Actual Attack | 229 | 266 |
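The threshold metrics reported for this run follow directly from these four counts, with attack as the positive class. A short sketch (the `detection_metrics` helper is hypothetical) recomputes them. ROC-AUC and PR-AUC need the continuous anomaly scores and cannot be derived from the matrix alone.

```python
def detection_metrics(tn: int, fp: int, fn: int, tp: int) -> dict[str, float]:
    """Threshold metrics from a 2x2 confusion matrix (attack = positive class)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "baseline_alarm_rate": fp / (fp + tn),  # false alarms on baseline windows
    }


# NANXCV Phase 1 counts from the confusion matrix.
m = detection_metrics(tn=398, fp=29, fn=229, tp=266)
print({k: round(v, 3) for k, v in m.items()})
```

Rounded to three decimals this gives precision 0.902, recall 0.537, F1 0.673, accuracy 0.720, and a baseline alarm rate of 0.068, matching the figures in this section.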
Interpretation:
- The detector is highly precise: when it fires, it is usually meaningful.
- Recall is moderate: some attack windows, especially dwell-like behaviour, remain inside the normal boundary.
- Attack windows sit 13.6% farther from the baseline reservoir centroid than baseline windows, confirming a measurable state-space drift.
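The source does not spell out how the 13.6% figure is computed. One plausible reading, sketched here purely as an assumption, is the ratio of mean Euclidean distances to the baseline centroid (attack windows versus baseline windows) minus one. The `centroid_drift` helper is hypothetical.

```python
import numpy as np


def centroid_drift(baseline_states: np.ndarray, attack_states: np.ndarray) -> float:
    """Relative extra distance of attack states from the baseline centroid.

    A return value of 0.136 would mean attack windows are, on average,
    13.6% farther from the baseline centroid than baseline windows are.
    """
    centroid = baseline_states.mean(axis=0)  # centroid estimated from baseline only
    d_base = np.linalg.norm(baseline_states - centroid, axis=1).mean()
    d_attack = np.linalg.norm(attack_states - centroid, axis=1).mean()
    return float(d_attack / d_base - 1.0)
```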
Headline:
When HyperMesh fires on the NANXCV validation host, it is right 90% of the time, without training on attack labels.
Phase 2: Three-Host Fleet
| Metric | One-Class SVM | Logistic Diagnostic |
|---|---|---|
| Precision | 0.731 | 0.561 |
| Recall | 0.329 | 0.980 |
| F1 | 0.454 | 0.713 |
| ROC-AUC | 0.713 | 0.908 |
| PR-AUC | 0.613 | 0.809 |
Interpretation:
- The zero-shot fleet model remains precision-oriented but recall drops.
- Pooling heterogeneous hosts widens the normal envelope.
- The supervised diagnostic ROC-AUC of 0.908 shows that the signal exists in the reservoir states, but the zero-shot boundary needs per-host or per-segment baselines to recover recall.
This is a roadmap item, not a hidden flaw.
What Analysts Would See
Each anomaly window can be surfaced with:
| Evidence | Why It Helps |
|---|---|
| Timestamp | Places the alert on the incident timeline |
| Anomaly score | Prioritises the analyst queue |
| Dominant formations | Shows whether the window is network-heavy, process-heavy, etc. |
| Novel entity count | Highlights new accounts, processes, hashes, or IPs |
| Contributing hyperedges | Links back to raw Defender evidence |
| Reservoir drift | Quantifies departure from baseline behaviour |
This makes the model investigable rather than a black-box score.
Honest Limitations
These should be stated explicitly in any CISO presentation.
- Recall is not yet high enough to replace existing controls. The NANXCV result is precision-first and should be used as an analyst-priority signal layered on top of EDR.
- The attack label is calendar-based. The Feb 14-20 window was designated by the CISO team, but per-event kill-chain labels are not yet available.
- No MITRE technique attribution is claimed. The system flags behaviourally anomalous windows; analysts determine root cause.
- No commercial EDR or UEBA head-to-head benchmark has been run yet.
- Fleet modelling needs per-host or per-segment boundaries. Pooled baselines are too broad for best zero-shot recall.
- The logistic model is diagnostic only. It uses labels and should not be presented as deployable performance.
- Validation is single-tenant. Broader generalisation requires additional environments.
Roadmap
| Priority | Workstream | Outcome |
|---|---|---|
| 1 | Per-host and per-segment baselines | Higher recall without sacrificing precision |
| 2 | Adaptive operating point | Tune alert budget to SOC capacity |
| 3 | Richer temporal features | Capture dwell, inter-arrival timing, and formation sequences |
| 4 | Explainability UI | Per-alert “why” view in SnnDashboard |
| 5 | EDR / UEBA benchmark | Credible side-by-side comparison on the same telemetry |
| 6 | Shadow-mode pilot | Analyst-validated precision, recall, and time-to-detect |
Recommended Pilot Ask
To move from prototype to operational validation, ask the CISO team for:
- Ground truth for Feb 14-20. Analyst-confirmed true positives, timestamps, and known activity phases.
- A four-to-six-week shadow-mode pilot. HyperMesh runs alongside the existing stack; analysts validate alarms.
- A same-data benchmark against the current EDR or UEBA. Agree the scoreboard before the pilot begins.
- A longer clean baseline. Weeks or months of normal Defender telemetry across a representative host mix.
- A live telemetry path. Defender API, Sentinel, Event Hub, or batch export cadence.
- A target alert destination. SIEM, SOAR, ticketing, or dashboard.
The pilot success criteria should be agreed up front:
| Criterion | Example Target |
|---|---|
| Precision | Analyst-validated precision above agreed SOC threshold |
| Alert volume | Within analyst capacity per day |
| Time-to-detect | Earlier or complementary detection versus current stack |
| Explainability | Analyst can trace each alert back to contributing events |
| Deployment fit | Runs within the customer’s governance and data residency constraints |
Anticipated CISO Questions
| Question | Answer |
|---|---|
| How is this different from our EDR? | EDR detects known signatures and rule conditions. HyperMesh detects deviations from a host’s own behavioural baseline. It is additive, not a replacement. |
| Did the model train on the attack? | No. The production One-Class SVM trains only on baseline windows. The supervised logistic readout is diagnostic only. |
| Why should we care if recall is around 54%? | Because the signal is highly precise. It gives analysts a high-quality priority queue on top of existing controls. Recall improvement is the next engineering target. |
| What about false positives? | The operating point is tunable. In the NANXCV run, the baseline alarm rate was 6.8%, close to the intended 5% budget. |
| Does it scale to the fleet? | Yes, but pooled baselines reduce zero-shot recall. The next step is per-host or per-segment baselines. |
| Can analysts act on it? | Yes. Each alert can carry score, timestamp, dominant formations, novel entities, and contributing hyperedges. |
| What is needed for production confidence? | Ground truth, a shadow-mode pilot, and a head-to-head benchmark against the current stack. |
Reproducibility
```bash
# 1. Ingest baseline and attack hyperedge tables
.venv/bin/python scripts/ingest_mde_baseline.py
.venv/bin/python scripts/ingest_mde_fleet.py

# 2. Build five-minute temporal features
.venv/bin/python scripts/temporal_analysis_nanxcv.py
.venv/bin/python scripts/temporal_analysis_fleet.py

# 3. Train LSM reservoir and readouts
.venv/bin/python scripts/snn_train_nanxcv.py \
  --input data/temporal_nanxcv.json \
  --output data/snn_results_nanxcv.json

.venv/bin/python scripts/snn_train_nanxcv.py \
  --input data/temporal_ciso_fleet.json \
  --output data/snn_results_ciso_fleet.json
```

Default model settings:

```
n_reservoir=500
sparsity=0.10
spectral_radius=0.95
input_scaling=0.30
leak_rate=0.30
washout=30
seed=7
```

Glossary
| Term | Meaning |
|---|---|
| Hyperedge | A relationship linking more than two entities at once, such as machine + process + account + IP |
| Formation | Behaviour category such as NETWORK_CONN or PROCESS_EXEC |
| Temporal hypergraph | A hypergraph where each hyperedge has a timestamp |
| Liquid State Machine | A recurrent reservoir that gives the model fading memory of recent behaviour |
| Echo-state property | Stability condition ensuring the reservoir memory fades instead of exploding |
| One-Class SVM | A model that learns normal behaviour only and flags points outside that normal boundary |
| Zero-shot detection | Detecting attacks without training on examples of those attacks |
| Washout | Initial windows discarded while the reservoir state stabilises |
Source Artefacts
All quantitative claims are reproduced from repository artefacts:
- data/snn_results_nanxcv.json
- data/snn_results_ciso_fleet.json
- data/temporal_nanxcv.json
- data/temporal_ciso_fleet.json
- scripts/snn_train_nanxcv.py
- scripts/temporal_common.py
- scripts/ingest_mde_baseline.py
- design/methodology_temporal_hypergraph_snn.tex
HyperMesh — Endpoint Behavioural Anomaly Detection
End-to-End Solution Brief for the CISO
Temporal Hypergraph + Liquid State Machine (LSM/SNN) + One-Class SVM

Zero-shot detection of attacker behaviour in endpoint telemetry, learned from normal activity alone.
One-line thesis. We turn raw Microsoft Defender for Endpoint (MDE) logs into a hypergraph of co-occurring behaviour, give a fixed neural reservoir a fading memory of recent activity, and flag windows that don’t look like the machine’s own normal — without ever training on a single attack label. On the validation host this flags attacker windows at 90% precision while learning only from the clean week.
This document is the full technical and business narrative, in the order you can present it: Data → Ingestion → Hypergraph formation → Temporal features → Normalisation → LSM reservoir → One-Class SVM readout → Results → Limitations → Roadmap. Every figure here is reproduced from the live artefacts in this repository (sources cited inline).
0. Executive summary (the opening slide)
| Topic | Summary |
|---|---|
| Problem | SIEM/EDR rules catch known signatures. Sophisticated intrusions show up as a slow shift in patterns of co-occurring behaviour across days; no single threshold ever trips. |
| Approach | Model each endpoint event as a hyperedge (machine + process + account + IP, tagged with a behaviour “formation”). Summarise behaviour in 5-minute windows. A Liquid State Machine gives those windows temporal memory. A One-Class SVM learns the boundary of normal and flags anything outside it. |
| Key result (NANXCV host, zero-shot) | Precision 0.90, Recall 0.54, F1 0.67, ROC-AUC 0.74, PR-AUC 0.83 — trained on the clean week only, evaluated across 922 windows. |
| Why it matters | When the model fires, it is right 9 times out of 10 — the scarce resource in a SOC is analyst attention, and a 90%-precision zero-shot signal is directly actionable. |
| Honesty line | Recall is ~54% — this is an additive analyst-priority signal, not a replacement for EDR. We have not yet benchmarked against a commercial EDR/UEBA, and the “attack window” is defined by the CISO team’s operational calendar, not per-event ground truth. We say so on the slide. |
1. The problem & threat model
Traditional SIEM/EDR pipelines operate on flat event rows. Each MDE timeline row carries strings (machine, process, account, network artefact) but no native primitive that says “these four identities acted together at this instant.” That relational fact is exactly where modern intrusions live.
- Rule-based detection excels when a signature exists. It is blind to novel tradecraft and to “living off the land,” where every individual action looks legitimate.
- Lateral movement, C2 beaconing, staging, and dwell typically manifest as a gradual shift in the distribution of co-occurring behaviour over hours or days — never as one counter crossing one line.
Our structural bets:
- Hypergraph ingestion: represent each event as a hyperedge over its participating identities plus a behaviour formation label. This keeps the n-way simultaneity that flat-row pipelines lose as a single object and, unlike a binary graph, does not flatten a 4-way co-occurrence into six lossy pairwise edges.
- Temporal memory — a high-dimensional reservoir state summarises recent windows, so slow drifts and bursts become separable from normal in state space.
- Learn normal, not attack: the production detector (One-Class SVM) is trained only on baseline behaviour. It needs no attack labels, so it can flag threats it has never seen, provided they push behaviour outside the learned normal boundary. This is the “zero-shot” property.
2. The end-to-end pipeline (architecture)
```
┌──────────────┐   ┌───────────────┐   ┌──────────────────┐   ┌─────────────────┐
│ MDE Events   │   │ Ingestion     │   │ Temporal HG      │   │ 5-min windows   │
│ Log (CSV/    │──▶│ → HyperMeshDB │──▶│ hyperedges:      │──▶│ 16-dim feature  │
│ XLSX)        │   │ entity map +  │   │ (ts, members,    │   │ vector / window │
│ per host     │   │ formations    │   │ formation, wt)   │   │                 │
└──────────────┘   └───────────────┘   └──────────────────┘   └────────┬────────┘
    Stage 1             Stage 2              Stage 3          Stage 4  │
                                                                       ▼
┌──────────────────┐   ┌─────────────────────────┐   ┌──────────────────────────┐
│ Results & alarms │   │ Readout                 │   │ LSM Reservoir            │
│ per-window score │◀──│ • One-Class SVM (prod)  │◀──│ 500 LIF neurons, fixed   │
│ + ROC/PR + drift │   │ • LogReg (diagnostic)   │   │ recurrent weights,       │
│                  │   │   on PCA(50) states     │   │ fading memory ≈ 17 win   │
└──────────────────┘   └─────────────────────────┘   └──────────────────────────┘
      Stage 7                   Stage 6                       Stage 5
                                                  (z-score normalise on baseline-only first)
```

Code map (this repository):
| Stage | Artefact |
|---|---|
| Ingestion | scripts/ingest_mde_baseline.py, scripts/ingest_mde_fleet.py, hypermeshdb/ingest/strategies/mde_baseline.py, hypermeshdb/connectors/mde.py |
| HG + temporal features | scripts/temporal_common.py, scripts/temporal_analysis_nanxcv.py, scripts/temporal_analysis_fleet.py |
| LSM + readouts | scripts/snn_train_nanxcv.py |
| Results | data/snn_results_nanxcv.json, data/snn_results_ciso_fleet.json, data/temporal_*.json |
| Deck / UI | data/HyperMesh_CISO_LSM_Deck.pptx, client/src/pages/SnnDashboard.tsx, design/snn-hypergraph-*deck.html |
| Methodology paper | design/methodology_temporal_hypergraph_snn.tex |
3. Stage 1 — The data
Source. Microsoft Defender for Endpoint (MDE) timeline exports from the LTIMindTree CISO team (February 2026). Three real hosts:
| Host | Role | Baseline window | Notes |
|---|---|---|---|
| NANXCV (nanxcv00f89340g) | Primary victim host | Feb 9–13, 2026 | Cleanest baseline (0% simulation traffic); the validation host |
| AZRPREPW (AZRCIPREPWCYMLT) | Windows VM | Feb 7–12, 2026 | ~14% Cymulate simulation traffic excluded |
| AZRPREPL (AZRCIPREPLCYMLT) | Linux VM | Feb 6–12, 2026 | ~39% Cymulate simulation traffic excluded |
Attack slice. NANXCV, Feb 14–20, 2026 (Sample2.xlsx → SYS_NANXCV), the period the CISO team designated as the attack/exercise window.
Two-phase study design (identical method, different scope):
- Phase 1: single host (NANXCV). Two separate tables (NANXCV_BASELINE, SYS_NANXCV), entity IDs scoped per table. Cleanest behavioural contrast → the method-validation experiment.
- Phase 2: fleet (three hosts). All hosts merged into one table (CISO_FLEET_MDE) with a global, deduplicated entity map, so a shared IP / account / file-hash resolves to the same node across machines, enabling cross-host structural links. This is the scale-realism experiment.
Honesty note to state up front. “Attack” is labelled by the CISO team’s operational calendar, not per-event kill-chain ground truth. Cymulate simulation traffic was excluded from baselines by team attestation. So our zero-shot scoring is principled; the supervised readout (later) is diagnostic only.
4. Stage 2 — Ingestion into HyperMeshDB
scripts/ingest_mde_baseline.py drives MDEBaselineStrategy, which:
- Parses each MDE row and resolves its identities (machine, process by SHA1 hash, account, and optionally remote IP) into stable integer node IDs via a persisted entity map (data/CISO_FLEET_MDE_entity_map.json, 3,808 typed entities in the fleet run). Example entries: machine:nanxcv00f89340g, process:eb42621654… (SHA1), account:nt authority\system.
- Excludes Cymulate simulation traffic from baseline tables (so “normal” is genuinely normal).
- Aggregates rows into hyperedges on a configurable window (default 5 min; --window 0 = per-row), writing native HyperMeshDB hyperedge tables.
- Emits ingest stats: hyperedge count, entity count, formation vocabulary, entity-map path.
Ingestion magnitudes (from the run artefacts):
| Measure | Phase 1 (NANXCV) | Phase 2 (Fleet) |
|---|---|---|
| Baseline hyperedges | 14,034 | — |
| Attack hyperedges | 13,341 | — |
| Total hyperedges | — | 39,016 (from ~374k raw rows after Cymulate exclusion) |
| Unique baseline entities | 1,775 | 2,993 |
| Novel entities in attack | 53 (~3.0%) | 815 (vs. broader pooled baseline) |
(Phase-1 hyperedge totals are from the methodology briefing; the entity counts above and the window counts below are read directly from data/temporal_nanxcv.json and data/temporal_ciso_fleet.json.)
5. Stage 3 — Hypergraph formation
Each MDE event becomes exactly one hyperedge:
```
e = ( event_ts, members, formation, weight )
```

- event_ts: Unix epoch seconds.
- members: list of integer node IDs (machine, process, account, [IP]) acting together in that event.
- formation: one of 11 behaviour categories (the compact behavioural vocabulary): PROCESS_EXEC, NETWORK_CONN, FILE_OPS, REGISTRY_OPS, DNS_LOOKUP, HTTP_TRAFFIC, IPC_PIPE, MODULE_LOAD, LOGON, SCRIPT_EXEC, GENERIC.
- weight: 1.0 in the prototype (reserved for future confidence/volume weighting).
Why a hypergraph, not a graph? A single “process X, run by account Y, on machine Z, talking to IP W” event is one 4-way fact. A binary graph must shatter it into 6 pairwise edges, losing the simultaneity — the very thing that distinguishes a coordinated attacker action from coincidence. Oversized member sets (>64 endpoints in one edge) are split with the machine node as an anchor — an implementation guardrail, not a statistical claim.
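Both points can be checked in a few lines. `clique_expansion` counts the pairwise edges a binary graph needs for one n-way fact (n choose 2, so 6 for a 4-way event). `split_oversized` is one hypothetical way to implement the machine-anchored split: the source gives the 64-member limit but not the exact chunking rule, so the chunking below is an assumption.

```python
from itertools import combinations

MAX_MEMBERS = 64  # guardrail limit stated in the text


def clique_expansion(members):
    """Pairwise edges a binary graph would need to represent one n-way hyperedge."""
    return list(combinations(members, 2))


def split_oversized(machine_id, others, max_members=MAX_MEMBERS):
    """Split an oversized member set into chunks that each keep the machine as anchor.

    Hypothetical helper: every chunk holds the machine node plus at most
    max_members - 1 of the remaining members.
    """
    step = max_members - 1
    return [(machine_id, *others[i:i + step]) for i in range(0, len(others), step)]
```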
This is the HyperMesh differentiator: n-ary behaviour preserved as a first-class primitive, queryable in the Graph Explorer / Lens, and feedable directly into temporal analytics.
6. Stage 4 — Temporal aggregation & the 16-dim feature vector
The timeline is partitioned into contiguous 5-minute windows (temporal_common.py). For each window t we build a 16-dimensional feature vector u(t), in this exact order:
Aggregate statistics (5):
1. hedge_count: hyperedge arrivals in the window
2. entity_count: distinct active node IDs
3. novelty_rate: fraction of entities never seen in the baseline set (drift signal)
4. mean_members: average hyperedge cardinality (how “wide” behaviour is)
5. formation_entropy: Shannon entropy over the 11 formations (behavioural diversity)
Formation histogram (11):
6–16. raw counts form_PROCESS_EXEC … form_GENERIC
Window counts (from the live exports):
| Measure | Phase 1 (NANXCV) | Phase 2 (Fleet) |
|---|---|---|
| Total windows | 952 (457 baseline + 495 attack) | 1,614 (1,119 baseline + 495 attack) |
| Windows scored (post-washout) | 922 (427 baseline + 495 attack) | 1,584 |
Interpretability bonus — changepoints. A separate analysis runs binary-segmentation changepoint detection on formation_entropy. Volumetric features (hedge_count, entity_count) produced hundreds of noisy changepoints driven by business rhythm; entropy yielded only 4 statistically flagged breakpoints, aligned with attack-phase dynamics. The sharpest: Feb 17, 09:15 UTC, entropy collapsing to 0.65 from a running mean near 2.34 — behaviour concentrating into NETWORK_CONN + PROCESS_EXEC, consistent with C2/lateral-movement profiles (stated qualitatively; we do not assert a labelled MITRE mapping). All 16 channels still feed the reservoir; entropy is the human-readable pivot.
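Binary segmentation itself is simple to sketch. The version below assumes a squared-error (mean-shift) cost and a fixed penalty, neither of which the source specifies, and uses plain Python for readability; a production run would use cumulative sums or a dedicated library such as ruptures. The helper names are hypothetical.

```python
def sse(xs):
    """Sum of squared deviations from the segment mean."""
    if not xs:
        return 0.0
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs)


def binary_segmentation(series, penalty, min_size=2):
    """Mean-shift changepoints found by recursive binary segmentation.

    Each segment is split at the index that most reduces the within-segment
    sum of squares; recursion continues while that reduction exceeds penalty.
    Returns sorted indices where a new segment starts.
    """
    changepoints = []

    def recurse(lo, hi):
        segment = series[lo:hi]
        if len(segment) < 2 * min_size:
            return
        base = sse(segment)
        best_gain, best_k = 0.0, None
        for k in range(min_size, len(segment) - min_size + 1):
            gain = base - sse(segment[:k]) - sse(segment[k:])
            if gain > best_gain:
                best_gain, best_k = gain, k
        if best_k is None or best_gain <= penalty:
            return  # no split is worth its penalty
        split = lo + best_k
        changepoints.append(split)
        recurse(lo, split)
        recurse(split, hi)

    recurse(0, len(series))
    return sorted(changepoints)
```

On a toy entropy series that sits at 2.34, drops to 0.65 for ten windows, and recovers, the sketch returns the start and end of the dip as its two changepoints.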
7. Stage 5 — Normalisation that mimics deployment
Stack raw features into X_raw ∈ ℝ^(T×16). A z-score normaliser is fit on baseline windows only (y=0):
```
μ_j  = mean(X_raw[:, j] | baseline)
σ_j  = std(X_raw[:, j] | baseline)        (σ_j ← 1 if σ_j < 1e-8)
u_tj = (X_raw[t, j] − μ_j) / σ_j
```

This is deliberate: scaling statistics come from presumed-normal history, never from a future distribution that contains attacks. It is the first of several places where we hold the line on “the model only ever learns from normal.”
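The same normaliser in NumPy, as a sketch: `fit_baseline_scaler` and `transform` are hypothetical names, `y == 0` marks baseline windows as in the text, and the population standard deviation (NumPy's default `ddof=0`) is an assumption.

```python
import numpy as np


def fit_baseline_scaler(X_raw: np.ndarray, y: np.ndarray, eps: float = 1e-8):
    """Fit per-channel z-score statistics on baseline windows (y == 0) only."""
    baseline = X_raw[y == 0]
    mu = baseline.mean(axis=0)
    sigma = baseline.std(axis=0)
    sigma = np.where(sigma < eps, 1.0, sigma)  # constant channels: avoid divide-by-zero
    return mu, sigma


def transform(X_raw: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Apply the baseline statistics to every window, attack windows included."""
    return (X_raw - mu) / sigma
```

The statistics are fit once and then applied unchanged to all windows, so nothing about the attack distribution reaches the scaling step.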
8. Stage 6 — The Liquid State Machine reservoir
The LSM (implemented as an Echo State Network with leaky-integrator neurons, described as a “spiking/liquid” reservoir in neuromorphic terms) maintains a hidden state x(t) ∈ ℝ^N that evolves as:
```
x(t) = (1 − α)·x(t−1) + α·tanh( W·x(t−1) + W_in·u(t) + ε_t )
```

- W_in ∈ ℝ^(N×16): random input projection, scaled by input_scaling.
- W ∈ ℝ^(N×N): sparse random recurrent matrix, rescaled so its spectral radius ρ < 1, the standard practical condition for the echo-state property (a fading, stable memory). The script measures ρ after construction and asserts it is < 1.0.
- α (leak_rate): membrane leak; controls how fast memory fades.
- ε_t: tiny Gaussian noise (1e-4) for regularisation.
Hyperparameters (defaults, tuned for this data scale):
| Param | Value | Meaning |
|---|---|---|
| `n_reservoir` (N) | 500 | reservoir neurons |
| `spectral_radius` (ρ) | 0.95 | echo-state stability (< 1) |
| `leak_rate` (α) | 0.30 | membrane leak → memory time-constant 1/α ≈ 3 windows (~15 min); effective context is longer via recurrence (design notes cite ~15–30 windows) |
| `sparsity` | 0.10 | 10% non-zero recurrent connections |
| `input_scaling` | 0.30 | input weight scale |
| `washout` | 30 | warm-up windows discarded before scoring |
| `seed` | 7 | reproducibility |
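The update rule and defaults above can be sketched in NumPy as follows. This is an illustrative reconstruction, not the code in `scripts/snn_train_nanxcv.py`. Several details are assumptions: the uniform weight distributions, the Bernoulli sparsity mask, the zero initial state, and placing the noise inside the `tanh` (as in the equation above). The names `build_reservoir` and `run_reservoir` are hypothetical.

```python
import numpy as np

def build_reservoir(n_inputs=16, n_reservoir=500, spectral_radius=0.95,
                    sparsity=0.10, input_scaling=0.30, seed=7):
    """Fixed random weights; nothing here is trained."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(n_reservoir, n_reservoir))
    W *= (rng.random((n_reservoir, n_reservoir)) < sparsity)    # keep ~10% of connections
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to target rho
    rho = np.max(np.abs(np.linalg.eigvals(W)))
    assert rho < 1.0, "echo-state check failed"
    W_in = input_scaling * rng.uniform(-1.0, 1.0, size=(n_reservoir, n_inputs))
    return W, W_in

def run_reservoir(U, W, W_in, leak_rate=0.30, noise=1e-4, washout=30, seed=7):
    """Leaky update x(t) = (1-a) x(t-1) + a tanh(W x(t-1) + W_in u(t) + eps_t)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(W.shape[0])
    states = []
    for u in U:
        eps = noise * rng.standard_normal(x.shape)
        x = (1.0 - leak_rate) * x + leak_rate * np.tanh(W @ x + W_in @ u + eps)
        states.append(x)
    return np.asarray(states)[washout:]  # drop warm-up windows

# Usage: 16-channel normalised windows in, post-washout reservoir states out.
W, W_in = build_reservoir()
U = np.random.default_rng(0).standard_normal((200, 16))
states = run_reservoir(U, W, W_in)  # shape (170, 500)
```

In this sketch, `W` and `W_in` stay fixed after construction, and only a downstream readout would be fitted. That is what keeps the reservoir cheap to run and hard to overfit.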
Why an LSM — the three reasons that matter to a CISO:
- Fading memory. x(t) encodes not just the current window but the recent evolution of behaviour (leak time-constant ≈ 3 windows; effective context extends further via recurrence). An anomalous window perturbs the state for several subsequent windows, so slow drifts accumulate and become visible.
- Recurrent weights are FIXED, with no backprop. Only the lightweight readout is trained. With only ~10³ windows, this avoids the overfitting that plagues fully trained deep RNNs on small security datasets, and it is fast and deterministic.
- Expansion. Projecting 16-dim behaviour into a 500-dim reservoir space lets attack-driven deviations accumulate and makes them easier to separate with a simple boundary.
9. Stage 7 — Readout models
Reservoir states (post-washout) → PCA to 50 components, fit on baseline rows only → StandardScaler (also baseline-fit). Both readouts consume the same 50-dim, baseline-centred coordinates.
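A scikit-learn sketch of this projection step follows. It assumes `states` are post-washout reservoir states in time order and that `y == 0` marks baseline rows. The helper names are hypothetical. Both transforms are fit on baseline rows only and are then applied to every window.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def fit_readout_projection(states, y, n_components=50, seed=7):
    """Fit PCA and StandardScaler on baseline reservoir states only."""
    baseline = states[y == 0]
    pca = PCA(n_components=n_components, random_state=seed).fit(baseline)
    scaler = StandardScaler().fit(pca.transform(baseline))
    return pca, scaler

def project(states, pca, scaler):
    """Map any window's reservoir state into the shared baseline-centred coordinates."""
    return scaler.transform(pca.transform(states))

# Usage with synthetic states: 120 baseline rows, then 80 later rows, 500 neurons.
rng = np.random.default_rng(7)
states = np.tanh(rng.standard_normal((200, 500)))
y = np.r_[np.zeros(120, dtype=int), np.ones(80, dtype=int)]
pca, scaler = fit_readout_projection(states, y)
Z = project(states, pca, scaler)  # shape (200, 50)
```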
9a. Mode A — One-Class SVM (the production detector)
```python
from sklearn.svm import OneClassSVM

ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(baseline_states)  # trained ONLY on baseline states
anomaly_score = -ocsvm.decision_function(states)  # higher = more anomalous
```

- Trained exclusively on baseline behaviour; it never sees an attack label. This is the deployable, zero-shot detector.
- `nu = 0.05` sets the baseline false-alarm budget to ~5%. The observed rate is 29/427 = 6.8% of baseline windows (Phase 1), close to budget and expected by design rather than a calibration failure. (Fleet: 60/1,089 = 5.5%.)
- This is the model behind the headline numbers.
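The baseline false-alarm rate quoted above counts the baseline windows that land outside the boundary. This minimal sketch assumes that an alarm means `decision_function < 0` (a positive anomaly score). It uses synthetic 5-dimensional data in place of the 50-dimensional baseline-fit coordinates. `alarm_rate` is an illustrative helper, not a function from the repository.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def alarm_rate(detector, Z):
    """Fraction of windows whose anomaly score is positive (outside the boundary)."""
    anomaly_score = -detector.decision_function(Z)  # higher = more anomalous
    return float(np.mean(anomaly_score > 0))

# Usage on synthetic coordinates standing in for baseline-fit PCA outputs.
rng = np.random.default_rng(7)
Z_baseline = rng.standard_normal((427, 5))
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(Z_baseline)
print(alarm_rate(detector, Z_baseline))         # expected to be near nu on training rows
print(alarm_rate(detector, Z_baseline + 10.0))  # 1.0: far-away windows all fire
```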
9b. Mode B — Logistic Regression (diagnostic only)
A logistic readout on the same PCA(50) reservoir states, under a temporal 60/40 train/test split that does use attack labels. It is explicitly NOT a deployment simulation: it leaks future knowledge and is reported only as a comparison point. Note: in Phase 1 this diagnostic readout actually scored lower (ROC-AUC 0.55) than the zero-shot OC-SVM (0.74). We attribute this to the temporal split (training on the first 60% of attack windows, testing on the last 40%), not to feature quality; in Phase 2 the same readout reaches ROC-AUC 0.91, which indicates the separating signal is present. Never present Mode B as operational performance. (Caveat: the code contains no raw-16-dimension classifier, so we make no claim about raw-feature separability; both readouts consume reservoir states.)
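For completeness, here is a sketch of how a diagnostic temporal split like this can be run with scikit-learn. It assumes the rows are in time order. If either side of the 60/40 cut lacks one of the two classes, it returns NaN rather than a misleading score. `temporal_split_auc` is a hypothetical helper, and like Mode B itself it is a diagnostic, not a deployment simulation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def temporal_split_auc(Z, y, train_frac=0.60):
    """Diagnostic only: fit on the earliest windows, score ROC-AUC on the later ones."""
    cut = int(len(Z) * train_frac)
    y_train, y_test = y[:cut], y[cut:]
    if len(np.unique(y_train)) < 2 or len(np.unique(y_test)) < 2:
        return float("nan")  # need both classes to fit and to score
    clf = LogisticRegression(max_iter=1000).fit(Z[:cut], y_train)
    return float(roc_auc_score(y_test, clf.predict_proba(Z[cut:])[:, 1]))

# Usage on synthetic, time-ordered data where both classes appear throughout.
rng = np.random.default_rng(7)
y = rng.integers(0, 2, size=500)
Z = rng.standard_normal((500, 5)) + 2.0 * y[:, None]
print(temporal_split_auc(Z, y))
```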
10. Results
10a. Phase 1 — NANXCV (single host, the validation experiment)
One-Class SVM (zero-shot, production-realistic), 922 windows scored:
| Metric | Value |
|---|---|
| Precision | 0.902 |
| Recall | 0.537 |
| F1 | 0.673 |
| ROC-AUC | 0.744 |
| PR-AUC | 0.831 |
| Accuracy | 0.720 |
Confusion matrix: TP 266, FP 29, FN 229, TN 398 (baseline = 427, attack = 495). Reservoir drift: attack windows sit +13.6% further from the baseline centroid (L2) than baseline windows do, which is geometric evidence that the reservoir separates the phases.
The headline for the slide: “When it fires, it is right 90% of the time, having learned only from the clean week.”
Timeline narrative (from per-window scores): a Feb 16 burst dominated by NETWORK_CONN, peaking at a normalised score of 1.000; the Feb 17 09:15 UTC entropy collapse to 0.65; a quiet Feb 18 dwell with few alarms; and renewed activity on Feb 19–20.
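The threshold metrics in the table follow directly from the confusion matrix, and the short sketch below recomputes them. ROC-AUC and PR-AUC need the per-window scores and cannot be recovered this way. The `centroid_drift` helper is a hypothetical reading of the reservoir-drift statistic (the relative gap in mean L2 distance to the baseline centroid). It is not necessarily the repository's exact formula.

```python
import numpy as np

def confusion_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "baseline_alarm_rate": fp / (fp + tn),
    }

def centroid_drift(baseline_states, attack_states):
    """Relative gap in mean L2 distance to the baseline centroid (assumed definition)."""
    centroid = baseline_states.mean(axis=0)
    d_base = np.linalg.norm(baseline_states - centroid, axis=1).mean()
    d_attack = np.linalg.norm(attack_states - centroid, axis=1).mean()
    return d_attack / d_base - 1.0

m = confusion_metrics(tp=266, fp=29, fn=229, tn=398)
print({k: round(v, 3) for k, v in m.items()})
# {'precision': 0.902, 'recall': 0.537, 'f1': 0.673, 'accuracy': 0.72, 'baseline_alarm_rate': 0.068}
```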
10b. Phase 2 — Fleet (three hosts, same method, no re-tuning)
| Metric | One-Class SVM (zero-shot) | Logistic (diagnostic only) |
|---|---|---|
| Precision | 0.731 | 0.561 |
| Recall | 0.329 | 0.980 |
| F1 | 0.454 | 0.713 |
| ROC-AUC | 0.713 | 0.908 |
| PR-AUC | 0.613 | 0.809 |
1,584 windows scored. Top anomalous cluster again anchored on the Feb 16 afternoon NETWORK_CONN burst.
Be transparent about the fleet number. Zero-shot recall drops to 33% and the centroid separation is slightly negative (−1.1%). Pooling three heterogeneous hosts into one baseline widens the “normal” envelope, so more attack windows hide inside it, and the fleet run used no hyperparameter re-tuning. Phase 1 and Phase 2 are not directly comparable (“better/worse”) because each boundary is fit to a different baseline distribution. The supervised (diagnostic) AUC of 0.91 indicates the signal is there, and we expect per-segment tuning of the zero-shot boundary to recover much of it. This is a roadmap item, not a hidden flaw.
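The per-segment fix can be sketched as one One-Class SVM per host, each fit only on that host's baseline windows. This is a hypothetical roadmap illustration and was not evaluated in this brief. `fit_per_host` and `score_per_host` are illustrative names. The sketch assumes every scored host has baseline data, and it returns NaN for hosts it has not seen.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def fit_per_host(Z, host, y, nu=0.05):
    """One boundary per host, each fit on that host's baseline windows only."""
    return {
        h: OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(Z[(host == h) & (y == 0)])
        for h in np.unique(host)
    }

def score_per_host(models, Z, host):
    """Anomaly score of each window against its own host's boundary (NaN if unseen host)."""
    scores = np.full(len(Z), np.nan)
    for h, model in models.items():
        mask = host == h
        if mask.any():
            scores[mask] = -model.decision_function(Z[mask])
    return scores

# Usage: host "B"'s normal behaviour looks like an attack on host "A".
rng = np.random.default_rng(7)
Z = np.vstack([rng.normal(0, 1, (200, 5)),   # host A baseline
               rng.normal(5, 1, (200, 5)),   # host B baseline
               rng.normal(5, 1, (50, 5))])   # host A attack windows
host = np.array(["A"] * 200 + ["B"] * 200 + ["A"] * 50)
y = np.r_[np.zeros(400, dtype=int), np.ones(50, dtype=int)]
scores = score_per_host(fit_per_host(Z, host, y), Z, host)
print(np.mean(scores[400:] > 0))  # fraction of host-A attack windows flagged (expected near 1.0)
```

A single pooled boundary fit on both hosts' baselines would treat these host-A windows as normal, because they resemble host B. This toy example illustrates the envelope-widening effect described above.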
11. Limitations & explicit non-claims (put these on a slide)
- Recall ~54% (single host): many attack windows that mimic baseline statistics stay inside the boundary, especially during dwell. This is an analyst-priority signal layered on top of EDR, not a replacement.
- No MITRE technique IDs, no attribution, no automated forensic narrative. We flag behaviourally anomalous windows; a human investigates.
- “Attack phase” = operational calendar, not per-event ground truth. Metrics inherit that labelling assumption.
- No commercial-EDR / UEBA head-to-head benchmark yet.
- Fleet hyperparameters not jointly re-tuned; cross-phase comparison requires care.
- Supervised (logistic) results are diagnostic only — they use future labels and must not be read as deployment performance.
- Single-tenant validation on one CISO team’s February-2026 data; broader generalisation is unproven.
- Attack-window provenance (red-team vs. simulated vs. production) should be confirmed with the data owner before external use.
12. Roadmap & the ask
What we’d build next, in priority order:
- Lift recall — per-host / per-segment baseline boundaries, adaptive `nu`, richer features (sequence-of-formations, inter-arrival timing) so dwell-phase attacks surface.
- Fleet-scale zero-shot — per-host normalisation + hierarchical reservoirs to recover the signal that the supervised AUC (0.91) indicates is present.
- Head-to-head benchmark vs. a commercial EDR/UEBA on the same telemetry — the credibility milestone.
- Live SOC shadow-mode pilot — run alongside the existing stack, measure analyst-validated precision/recall and time-to-detect on real traffic.
- Explainability surface — per-alarm “why” (dominant formations, novel entities, contributing hyperedges) wired into the `SnnDashboard` UI.
The ask: access to a labelled (or red-team-tagged) production window + permission for a shadow-mode pilot, so we can report analyst-validated precision/recall and a fair EDR benchmark.
13. Joint next-stage agenda — what we need from the CISO & team
Frame for the meeting: “To take this from a validated prototype to an operational pilot, here is what we’d need from you.” Each question maps to closing a gap already named in §11 (Limitations). Position the limitations as a joint roadmap, not weaknesses.
The 3 asks that decide whether there is a Phase 2 (lead with these)
- Ground truth for Feb 14–20 — “Can we get analyst-confirmed true positives for that window?” → converts our calendar-labelled result into a defensible evaluation.
- Shadow-mode pilot — “Will you sponsor a 4–6 week shadow-mode run on live telemetry?” → the only way to prove operational precision/recall.
- EDR benchmark — “Can we run head-to-head against your current EDR/UEBA on the same data?” → the credibility milestone.
Everything below feeds these three.
13a. Data — tighten “normal” and get real labels
| Ask | What it unblocks |
|---|---|
| Provenance of the attack window — red-team, Cymulate simulation, or real production incident? | We cannot currently state this on the slide; it determines whether the 90% precision was measured against real or simulated adversary behaviour. |
| Per-event ground truth / kill chain — validated TPs, timestamps, MITRE techniques | Replaces operational-calendar labelling; lets us report real recall. |
| Longer & broader baseline — weeks–months, more hosts, server + workstation mix | Our “normal” is one week / one host; more baseline = tighter boundary, fewer false alarms, better recall. |
| Cymulate intent — detect simulated attacks, or exclude as noise? | We exclude it today; if they want it detected it becomes a free labelled positive set. |
| Full MDE schema access — DeviceProcess / DeviceNetwork / DeviceLogon fields, not just the timeline export | Richer features (sequence-of-formations, inter-arrival timing) that lift recall. |
| Data governance — PII / residency constraints, standing feed vs. one-off CSV exports | Decides whether deployment is legally possible and whether we get live data. |
13b. Model — define the target before tuning
| Ask | What it unblocks |
|---|---|
| Desired operating point — precision-first or recall-first? | Directly sets the OC-SVM threshold / nu; we can’t tune without it. |
| Alert budget — alerts/day/analyst acceptable? | Defines the false-alarm ceiling; maps our 6.8% baseline-window rate to their volume tolerance. |
| What do they miss today? — detections the current stack fails on | Anchors value to a real gap, not abstract “anomalies.” |
| Explainability requirement — need a per-alarm “why” (dominant formations, novel entities, contributing hyperedges)? | Decides whether we build the explainability surface before or after the pilot. |
| Per-host vs. fleet modelling — one model or per-asset baselines? | Our proposed fix for the fleet recall drop (33%): supervised AUC 0.91 indicates the signal is there, and per-host baselines are expected to recover it. |
| Benchmark definition — what does “better than our EDR” mean numerically? | Agree the scoreboard before we play, not after. |
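To make the alert-budget conversation concrete, the sketch below converts a per-window alarm rate into alerts per day. It also picks a score threshold from baseline scores to meet a target budget. It assumes non-overlapping 5-minute windows (288 per host per day) and that baseline-week scores are representative of live normal traffic. The helper names are hypothetical. A quantile threshold controls only the alarm rate on normal traffic; it says nothing about recall.

```python
import numpy as np

WINDOWS_PER_DAY = 24 * 60 // 5  # 288 non-overlapping 5-minute windows per host per day

def alerts_per_day(alarm_rate, n_hosts=1, windows_per_day=WINDOWS_PER_DAY):
    """Expected alarms per day on normal traffic at a given per-window alarm rate."""
    return alarm_rate * windows_per_day * n_hosts

def threshold_for_budget(baseline_scores, budget_per_day, n_hosts=1,
                         windows_per_day=WINDOWS_PER_DAY):
    """Score threshold whose baseline exceedance rate matches the daily alert budget."""
    target_rate = budget_per_day / (windows_per_day * n_hosts)
    target_rate = min(max(target_rate, 0.0), 1.0)
    return float(np.quantile(baseline_scores, 1.0 - target_rate))

# Phase 1 baseline rate (29/427) on one host:
print(round(alerts_per_day(29 / 427), 1))  # 19.6
```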
13c. Deployment — prove it in production, on their rails
| Ask | What it unblocks |
|---|---|
| Shadow-mode pilot — 4–6 weeks alongside production, analysts validate each alarm | The headline ask; yields analyst-validated precision/recall and time-to-detect. |
| Pilot exit criteria — agree now what result triggers full adoption | Prevents a “great demo, no decision” outcome. |
| Live telemetry feed — Defender API / Event Hub / Sentinel stream vs. batch exports | Determines architecture and whether 5-min-window cadence is feasible. |
| Alert destination — SIEM (Sentinel?), SOAR, ticketing; format/API? | Defines integration scope and effort. |
| Latency SLA — is 5-minute batch acceptable, or near-real-time needed? | Sets architecture and cost. |
| Where the model runs — their tenant/cloud, on-prem, or our environment; data egress permitted? | Often the hardest blocker; surface it early, not after the pilot is agreed. |
| Ownership — who triages and validates alarms during the pilot? | A pilot with no analyst owner produces no validated labels and fails silently. |
Appendix A — Reproducibility
```bash
# 1. Ingest baseline + attack hyperedge tables
.venv/bin/python scripts/ingest_mde_baseline.py   # NANXCV (Phase 1)
.venv/bin/python scripts/ingest_mde_fleet.py      # CISO_FLEET_MDE (Phase 2)

# 2. Build 5-min temporal features
.venv/bin/python scripts/temporal_analysis_nanxcv.py   # → data/temporal_nanxcv.json
.venv/bin/python scripts/temporal_analysis_fleet.py    # → data/temporal_ciso_fleet.json

# 3. Train LSM + readouts, export results
.venv/bin/python scripts/snn_train_nanxcv.py \
  --input data/temporal_nanxcv.json --output data/snn_results_nanxcv.json
.venv/bin/python scripts/snn_train_nanxcv.py \
  --input data/temporal_ciso_fleet.json --output data/snn_results_ciso_fleet.json
```

Defaults (in-code): `n_reservoir=500`, `sparsity=0.10`, `spectral_radius=0.95`, `input_scaling=0.30`, `leak_rate=0.30`, `washout=30`, `seed=7`. CLI overrides exist for all of them.
Appendix B — Glossary (for non-specialist stakeholders)
- Hyperedge — a single relationship linking more than two things at once (here: machine + process + account + IP acting together). A normal graph edge links only two.
- Formation — the behavioural category of an event (e.g. `NETWORK_CONN`, `PROCESS_EXEC`).
- Liquid State Machine (LSM) / reservoir — a fixed (untrained) recurrent neural network that gives the model a short fading memory of recent behaviour. Cheap, fast, hard to overfit.
- Echo-state property (spectral radius < 1) — the stability property under which the reservoir’s memory of past inputs fades smoothly rather than blowing up. Keeping the spectral radius below 1 is the standard practical way to obtain it.
- One-Class SVM — a model that learns the shape of “normal” from normal data only, then flags anything outside that shape. The engine of zero-shot detection.
- Zero-shot — detecting attacks the model was never trained on, by learning only what normal looks like.
- Washout — initial windows discarded while the reservoir “warms up.”
Appendix C — Anticipated CISO questions
| Question | Answer |
|---|---|
| “How is this different from our EDR?” | EDR fires on known signatures/rules. This fires on behavioural deviation from this machine’s own normal, so it can surface novel or living-off-the-land activity that rules miss. It’s additive, not a replacement. |
| “Why should I trust a 54% recall?” | You trust the precision (90%): on the validation host, when it fired it was right about 9 times in 10, so it’s a high-quality analyst-priority queue. Recall improves on the roadmap; it’s a net-new signal on top of full EDR coverage. |
| “Did you train on the attack?” | No. The production detector (One-Class SVM) trains only on the clean baseline week. The one model that uses attack labels (logistic) is clearly flagged diagnostic-only and never presented as operational. |
| “What about false alarms?” | ~6.8% of baseline windows fire (29/427) against a deliberate, tunable nu=0.05 (~5%) budget. At 5-min windows (288 per host per day) that is roughly 20 alarms per host per day on normal traffic before tuning; nu and the alert threshold are the dials for fitting this to your alert budget. |
| “Does it scale to my whole fleet?” | Single-host is strong; pooled-fleet zero-shot recall drops (33%) without per-host tuning, though supervised AUC (0.91) indicates the signal is there. Per-host baselines are the next build. |
| “Can my analysts act on an alarm?” | Each alarm carries timestamp, anomaly score, dominant formation, novel-entity count, and reservoir drift, which is enough to triage. Full per-alarm explainability is on the roadmap. |
All quantitative claims in this document are reproduced from repository artefacts: data/snn_results_nanxcv.json, data/snn_results_ciso_fleet.json, data/temporal_nanxcv.json, data/temporal_ciso_fleet.json, and the implementation in scripts/snn_train_nanxcv.py / scripts/temporal_common.py / scripts/ingest_mde_baseline.py. Phase-1 hyperedge totals and the entropy-changepoint narrative are from design/methodology_temporal_hypergraph_snn.tex.