HyperMesh CISO Solution Brief

Endpoint Behavioural Anomaly Detection With Temporal Hypergraphs and Liquid State Machines

Audience: CISO, SOC leadership, security architecture, detection engineering
Use case: Zero-shot behavioural anomaly detection from Microsoft Defender for Endpoint telemetry
Core stack: HyperMeshDB + temporal hypergraph features + Liquid State Machine reservoir + One-Class SVM
Primary result: 90.2% precision on the NANXCV validation host while training only on clean baseline activity

Executive Summary

Modern endpoint attacks often do not announce themselves as one obvious bad event. They emerge as a gradual shift in how machines, processes, accounts, and network artefacts co-occur over time. Traditional SIEM and EDR rules are excellent for known signatures, but they struggle when each individual action looks legitimate in isolation.

HyperMesh addresses this gap by modelling endpoint telemetry as a temporal hypergraph. Each event becomes a first-class n-way behavioural fact linking the machine, process, account, and IP involved at that moment. A Liquid State Machine then gives the system a short fading memory of recent behaviour, and a One-Class SVM learns the boundary of normal activity from clean baseline data only.

The key operational claim is simple:

HyperMesh can learn what normal endpoint behaviour looks like from Defender telemetry and flag anomalous behaviour windows without training on attack labels.

On the NANXCV validation host, the zero-shot detector achieved:

Metric	Result
Precision	0.902
Recall	0.537
F1	0.673
ROC-AUC	0.744
PR-AUC	0.831
Accuracy	0.720

This means that when the detector fires on this validation run, it is right about 9 times out of 10. That is the most important number for a SOC, because analyst attention is the scarce resource.

This is not positioned as an EDR replacement. It is an additive analyst-priority signal that surfaces behaviourally unusual windows for investigation.

The CISO Message

The Problem

Security teams already collect rich endpoint telemetry, but most pipelines still treat it as flat rows. A row may contain a machine, process, account, IP address, timestamp, and action type, yet the detection system often loses the fact that these identities acted together.

Attackers exploit this gap. Living-off-the-land behaviour, lateral movement, staging, command-and-control, and dwell time often appear as changes in patterns of co-occurrence, not as single events that cross a static threshold.

The HyperMesh Approach

HyperMesh turns endpoint telemetry into a temporal hypergraph:

machine + process + account + IP + behaviour category + timestamp

Each event is stored as a hyperedge, preserving the n-way relationship directly. Temporal analytics then summarize behaviour in five-minute windows, and the reservoir model tracks how those windows evolve.

Why It Matters

The model is trained on normal activity only. It does not need examples of every attack technique. This makes it suitable for a zero-shot detection setting where future attacker behaviour may not match yesterday’s signatures.

End-to-End Architecture

┌────────────────────┐
│ Microsoft Defender │
│ Endpoint telemetry │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ HyperMesh ingest   │
│ MDE rows → entities│
│ and hyperedges     │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ Temporal hypergraph│
│ machine + process  │
│ account + IP       │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ 5-minute features  │
│ 16-dim vector per  │
│ time window        │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ LSM reservoir      │
│ 500 recurrent      │
│ memory neurons     │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ One-Class SVM      │
│ learns normal only │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ Anomaly timeline   │
│ score + evidence   │
│ for SOC triage     │
└────────────────────┘

Repository artefacts:

Layer	Implementation
MDE ingestion	`scripts/ingest_mde_baseline.py`, `scripts/ingest_mde_fleet.py`, `hypermeshdb/ingest/strategies/mde_baseline.py`, `hypermeshdb/connectors/mde.py`
Temporal features	`scripts/temporal_common.py`, `scripts/temporal_analysis_nanxcv.py`, `scripts/temporal_analysis_fleet.py`
Reservoir and readouts	`scripts/snn_train_nanxcv.py`
Results	`data/snn_results_nanxcv.json`, `data/snn_results_ciso_fleet.json`, `data/temporal_nanxcv.json`, `data/temporal_ciso_fleet.json`
Presentation assets	`data/HyperMesh_CISO_LSM_Deck.pptx`, `client/src/pages/SnnDashboard.tsx`, `design/snn-hypergraph-*deck.html`
Methodology	`design/methodology_temporal_hypergraph_snn.tex`

Data Used

The study uses Microsoft Defender for Endpoint timeline exports provided by the LTIMindTree CISO team for February 2026.

Host	Role	Baseline Window	Notes
NANXCV (`nanxcv00f89340g`)	Primary victim host	Feb 9-13, 2026	Cleanest baseline; 0% simulation traffic; validation host
AZRPREPW (`AZRCIPREPWCYMLT`)	Windows VM	Feb 7-12, 2026	Cymulate simulation traffic excluded
AZRPREPL (`AZRCIPREPLCYMLT`)	Linux VM	Feb 6-12, 2026	Cymulate simulation traffic excluded

The attack/exercise slice is NANXCV, Feb 14-20, 2026, stored as SYS_NANXCV.

Two study modes were run:

Phase	Scope	Purpose
Phase 1	Single host, NANXCV	Method validation with the cleanest baseline and clearest behavioural contrast
Phase 2	Three-host fleet	Scale-realism test using a global entity map across machines

Important caveat: the attack window is labelled by the CISO team’s operational calendar, not by per-event kill-chain ground truth. That is sufficient for prototype validation, but a production evaluation needs analyst-confirmed labels.

Hypergraph Ingestion

The MDE ingestion strategy converts raw endpoint rows into typed entities and hyperedges.

For each event, HyperMesh resolves:

Entity Type	Source Field
Machine	`Computer Name` or `Machine Id`
Process	`Initiating Process SHA1`, with filename used for display
Account	`Initiating Process Account Domain` + `Initiating Process Account Name`
IP	`Remote IP`, when present
Formation	Mapped from `Action Type`

Each row (or, when ingestion aggregates on a time window, each group of rows) becomes one behavioural hyperedge:

e = (event_ts, members, formation, weight)

Where:

Field	Meaning
`event_ts`	Unix timestamp for the endpoint event
`members`	Integer entity IDs for machine, process, account, and optional IP
`formation`	Compact behavioural category
`weight`	Event count or aggregated event volume

The formation vocabulary is:

PROCESS_EXEC
NETWORK_CONN
FILE_OPS
REGISTRY_OPS
DNS_LOOKUP
HTTP_TRAFFIC
IPC_PIPE
MODULE_LOAD
LOGON
SCRIPT_EXEC
GENERIC

Why this matters: a conventional pairwise graph would split one 4-way behaviour into six pairwise edges, one for each pair of entities. A hypergraph preserves the complete behavioural fact as one object.
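To make the contrast concrete, here is a minimal Python sketch. It is not the HyperMeshDB schema: the `Hyperedge` dataclass and the `pairwise_edges` helper are hypothetical names, and the integer IDs and timestamp are made up for illustration. It stores one 4-way event as a single object and shows what a pairwise graph would have to store instead.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Hyperedge:
    """Illustrative stand-in for e = (event_ts, members, formation, weight)."""
    event_ts: int                # Unix timestamp of the endpoint event
    members: tuple[int, ...]     # entity IDs: machine, process, account, optional IP
    formation: str               # behaviour category, e.g. "NETWORK_CONN"
    weight: float = 1.0


def pairwise_edges(edge: Hyperedge) -> list[tuple[int, int]]:
    """What a binary graph would store: every unordered pair of members."""
    return list(combinations(sorted(edge.members), 2))


# Hypothetical IDs: machine=1, process=2, account=3, IP=4
edge = Hyperedge(event_ts=1771200000, members=(1, 2, 3, 4), formation="NETWORK_CONN")
pairs = pairwise_edges(edge)
print(len(pairs))  # 6 pairwise edges for one 4-way fact
```

Once the event is split this way, nothing in the six pairs records that they came from the same moment, which is the information the hyperedge keeps.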

Ingestion Magnitudes

Measure	NANXCV Phase 1	Fleet Phase 2
Baseline hyperedges	14,034	-
Attack hyperedges	13,341	-
Total hyperedges	-	39,016
Approx. raw rows after Cymulate exclusion	-	374,000
Unique baseline entities	1,775	2,993
Novel entities in attack	53	815

The fleet run uses a global, deduplicated entity map so shared IPs, accounts, and file hashes resolve to the same node across hosts.

Temporal Features

HyperMesh partitions the telemetry into contiguous five-minute windows. Each window becomes a 16-dimensional feature vector.

Aggregate Features

Feature	Description
`hedge_count`	Number of hyperedges in the window
`entity_count`	Number of distinct active entities
`novelty_rate`	Fraction of entities not seen in baseline
`mean_members`	Average hyperedge cardinality
`formation_entropy`	Behavioural diversity across formation types

Formation Histogram

The remaining 11 channels are raw counts for each formation category.
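As a rough illustration of how the 16 channels could be assembled (a sketch, not the code in `scripts/temporal_common.py`): `window_features` is a hypothetical helper, the `(members, formation)` tuple layout and the channel order are assumed, and the entropy uses the natural logarithm, which this brief does not specify.

```python
import math
from collections import Counter

FORMATIONS = ["PROCESS_EXEC", "NETWORK_CONN", "FILE_OPS", "REGISTRY_OPS", "DNS_LOOKUP",
              "HTTP_TRAFFIC", "IPC_PIPE", "MODULE_LOAD", "LOGON", "SCRIPT_EXEC", "GENERIC"]


def window_features(edges, baseline_entities):
    """edges: list of (members, formation) tuples that fall in one five-minute window."""
    counts = Counter(formation for _, formation in edges)
    entities = {m for members, _ in edges for m in members}
    n = len(edges)
    novelty = len(entities - baseline_entities) / len(entities) if entities else 0.0
    mean_members = sum(len(members) for members, _ in edges) / n if n else 0.0
    # Shannon entropy of the formation mix (natural log is an assumption)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values()) if n else 0.0
    aggregate = [float(n), float(len(entities)), novelty, mean_members, entropy]
    histogram = [float(counts.get(f, 0)) for f in FORMATIONS]
    return aggregate + histogram  # 5 aggregate + 11 histogram = 16 channels


# Toy window: entity 5 was never seen in the (made-up) baseline set
edges = [((1, 2, 3, 4), "NETWORK_CONN"), ((1, 2, 3), "PROCESS_EXEC"), ((1, 5, 3, 4), "NETWORK_CONN")]
vec = window_features(edges, baseline_entities={1, 2, 3, 4})
print(len(vec))  # 16
```

In this toy window one of five active entities is novel, so `novelty_rate` is 0.2, and the two formations in a 2:1 mix give an entropy of about 0.64 nats.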

Measure	NANXCV Phase 1	Fleet Phase 2
Total windows	952	1,614
Baseline windows	457	1,119
Attack windows	495	495
Windows scored after washout	922	1,584

An important interpretability signal came from formation_entropy: volumetric features generated many noisy changepoints, but entropy produced only a small number of statistically meaningful shifts. The sharpest was Feb 17, 09:15 UTC, where entropy collapsed to 0.65 from a running mean near 2.34, indicating concentration into a narrower behavioural mix dominated by network and process execution activity.

Baseline-Only Normalisation

The model is deliberately normalised using baseline windows only.

u_tj = (X_tj - mean_j_baseline) / std_j_baseline

This mimics production deployment. In a real environment, the model would learn from an assumed-normal historical period and then score future activity. No future attack distribution is used to scale the features.

Liquid State Machine Reservoir

The reservoir is a fixed recurrent neural system that gives each five-minute window a fading memory of recent activity.

x(t) = (1 - α) x(t-1) + α tanh(W x(t-1) + W_in u(t) + ε_t)

Parameter	Value	Meaning
`n_reservoir`	500	Number of reservoir neurons
`spectral_radius`	0.95	Keeps the reservoir stable and fading
`leak_rate`	0.30	Controls memory decay
`sparsity`	0.10	Sparse recurrent connectivity
`input_scaling`	0.30	Input projection scale
`washout`	30	Initial windows discarded before scoring
`seed`	7	Reproducibility

The recurrent weights are fixed, so no backpropagation through the reservoir is required; only the readout is trained. This matters for security telemetry because the number of windows, and especially of labelled attack windows, is usually small and overfitting risk is high.

Detection Models

Production Mode: One-Class SVM

The deployable detector is a One-Class SVM trained only on baseline reservoir states.

OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
anomaly_score = -decision_function(state)

nu = 0.05 sets an approximate baseline false-alarm budget of 5%. In the NANXCV run, the observed baseline alarm rate was 29 / 427 = 6.8%, close to the intended operating point.

Diagnostic Mode: Logistic Regression

A logistic readout was also tested on the same reservoir states. It uses labels and is therefore not a production simulation. It is included only to assess whether separability exists in the reservoir representation.

The key message: operational claims should use the One-Class SVM results, not the supervised diagnostic readout.

Results

Phase 1: NANXCV Validation Host

One-Class SVM, trained only on clean baseline windows:

Metric	Value
Precision	0.902
Recall	0.537
F1	0.673
ROC-AUC	0.744
PR-AUC	0.831
Accuracy	0.720

Confusion matrix:

	Predicted Normal	Predicted Anomalous
Actual Baseline	398	29
Actual Attack	229	266
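The reported metrics follow directly from the four confusion-matrix counts. The short check below recomputes them in plain Python; the variable names are illustrative and the counts are copied from the table above.

```python
# Confusion-matrix counts for the NANXCV One-Class SVM run
tn, fp = 398, 29    # actual baseline windows: predicted normal / predicted anomalous
fn, tp = 229, 266   # actual attack windows:   predicted normal / predicted anomalous

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)
baseline_alarm_rate = fp / (fp + tn)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} "
      f"accuracy={accuracy:.3f} baseline_alarm_rate={baseline_alarm_rate:.3f}")
# precision=0.902 recall=0.537 f1=0.673 accuracy=0.720 baseline_alarm_rate=0.068
```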

Interpretation:

The detector is highly precise: when it fires, it is usually meaningful.
Recall is moderate: some attack windows, especially dwell-like behaviour, remain inside the normal boundary.
Attack windows sit 13.6% farther from the baseline reservoir centroid than baseline windows, confirming a measurable state-space drift.

Headline:

When HyperMesh fires on the NANXCV validation host, it is right 90% of the time, without training on attack labels.

Phase 2: Three-Host Fleet

Metric	One-Class SVM	Logistic Diagnostic
Precision	0.731	0.561
Recall	0.329	0.980
F1	0.454	0.713
ROC-AUC	0.713	0.908
PR-AUC	0.613	0.809

Interpretation:

The zero-shot fleet model remains precision-oriented but recall drops.
Pooling heterogeneous hosts widens the normal envelope.
The supervised diagnostic AUC of 0.908 shows the signal exists, but the zero-shot boundary needs per-host or per-segment baselines to recover recall.

This is a roadmap item, not a hidden flaw.

What Analysts Would See

Each anomaly window can be surfaced with:

Evidence	Why It Helps
Timestamp	Places the alert on the incident timeline
Anomaly score	Prioritises the analyst queue
Dominant formations	Shows whether the window is network-heavy, process-heavy, etc.
Novel entity count	Highlights new accounts, processes, hashes, or IPs
Contributing hyperedges	Links back to raw Defender evidence
Reservoir drift	Quantifies departure from baseline behaviour

This makes the model investigable rather than a black-box score.

Honest Limitations

These should be stated explicitly in any CISO presentation.

Recall is not yet high enough to replace existing controls. The NANXCV result is precision-first and should be used as an analyst-priority signal layered on top of EDR.
The attack label is calendar-based. The Feb 14-20 window was designated by the CISO team, but per-event kill-chain labels are not yet available.
No MITRE technique attribution is claimed. The system flags behaviourally anomalous windows; analysts determine root cause.
No commercial EDR or UEBA head-to-head benchmark has been run yet.
Fleet modelling needs per-host or per-segment boundaries. Pooled baselines are too broad for best zero-shot recall.
The logistic model is diagnostic only. It uses labels and should not be presented as deployable performance.
Validation is single-tenant. Broader generalisation requires additional environments.

Roadmap

Priority	Workstream	Outcome
1	Per-host and per-segment baselines	Higher recall without sacrificing precision
2	Adaptive operating point	Tune alert budget to SOC capacity
3	Richer temporal features	Capture dwell, inter-arrival timing, and formation sequences
4	Explainability UI	Per-alert “why” view in `SnnDashboard`
5	EDR / UEBA benchmark	Credible side-by-side comparison on the same telemetry
6	Shadow-mode pilot	Analyst-validated precision, recall, and time-to-detect

Recommended Pilot Ask

To move from prototype to operational validation, ask the CISO team for:

Ground truth for Feb 14-20. Analyst-confirmed true positives, timestamps, and known activity phases.
A four-to-six-week shadow-mode pilot. HyperMesh runs alongside the existing stack; analysts validate alarms.
A same-data benchmark against the current EDR or UEBA. Agree the scoreboard before the pilot begins.
A longer clean baseline. Weeks or months of normal Defender telemetry across a representative host mix.
A live telemetry path. Defender API, Sentinel, Event Hub, or batch export cadence.
A target alert destination. SIEM, SOAR, ticketing, or dashboard.

The pilot success criteria should be agreed up front:

Criterion	Example Target
Precision	Analyst-validated precision above agreed SOC threshold
Alert volume	Within analyst capacity per day
Time-to-detect	Earlier or complementary detection versus current stack
Explainability	Analyst can trace each alert back to contributing events
Deployment fit	Runs within the customer’s governance and data residency constraints

Anticipated CISO Questions

Question	Answer
How is this different from our EDR?	EDR detects known signatures and rule conditions. HyperMesh detects deviations from a host’s own behavioural baseline. It is additive, not a replacement.
Did the model train on the attack?	No. The production One-Class SVM trains only on baseline windows. The supervised logistic readout is diagnostic only.
Why should we care if recall is around 54%?	Because the signal is highly precise. It gives analysts a high-quality priority queue on top of existing controls. Recall improvement is the next engineering target.
What about false positives?	The operating point is tunable. In the NANXCV run, the baseline alarm rate was 6.8%, close to the intended 5% budget.
Does it scale to the fleet?	Yes, but pooled baselines reduce zero-shot recall. The next step is per-host or per-segment baselines.
Can analysts act on it?	Yes. Each alert can carry score, timestamp, dominant formations, novel entities, and contributing hyperedges.
What is needed for production confidence?	Ground truth, a shadow-mode pilot, and a head-to-head benchmark against the current stack.

Reproducibility

# 1. Ingest baseline and attack hyperedge tables
.venv/bin/python scripts/ingest_mde_baseline.py
.venv/bin/python scripts/ingest_mde_fleet.py

# 2. Build five-minute temporal features
.venv/bin/python scripts/temporal_analysis_nanxcv.py
.venv/bin/python scripts/temporal_analysis_fleet.py

# 3. Train LSM reservoir and readouts
.venv/bin/python scripts/snn_train_nanxcv.py \
  --input data/temporal_nanxcv.json \
  --output data/snn_results_nanxcv.json

.venv/bin/python scripts/snn_train_nanxcv.py \
  --input data/temporal_ciso_fleet.json \
  --output data/snn_results_ciso_fleet.json

Default model settings:

n_reservoir=500
sparsity=0.10
spectral_radius=0.95
input_scaling=0.30
leak_rate=0.30
washout=30
seed=7

Glossary

Term	Meaning
Hyperedge	A relationship linking more than two entities at once, such as machine + process + account + IP
Formation	Behaviour category such as `NETWORK_CONN` or `PROCESS_EXEC`
Temporal hypergraph	A hypergraph where each hyperedge has a timestamp
Liquid State Machine	A recurrent reservoir that gives the model fading memory of recent behaviour
Echo-state property	Stability condition ensuring the reservoir memory fades instead of exploding
One-Class SVM	A model that learns normal behaviour only and flags points outside that normal boundary
Zero-shot detection	Detecting attacks without training on examples of those attacks
Washout	Initial windows discarded while the reservoir state stabilises

Source Artefacts

All quantitative claims are reproduced from repository artefacts:

data/snn_results_nanxcv.json
data/snn_results_ciso_fleet.json
data/temporal_nanxcv.json
data/temporal_ciso_fleet.json
scripts/snn_train_nanxcv.py
scripts/temporal_common.py
scripts/ingest_mde_baseline.py
design/methodology_temporal_hypergraph_snn.tex

HyperMesh — Endpoint Behavioural Anomaly Detection

End-to-End Solution Brief for the CISO

Temporal Hypergraph + Liquid State Machine (LSM/SNN) + One-Class SVM

Zero-shot detection of attacker behaviour in endpoint telemetry, learned from normal activity alone.

One-line thesis. We turn raw Microsoft Defender for Endpoint (MDE) logs into a hypergraph of co-occurring behaviour, give a fixed neural reservoir a fading memory of recent activity, and flag windows that don’t look like the machine’s own normal — without ever training on a single attack label. On the validation host this flags attacker windows at 90% precision while learning only from the clean week.

This document is the full technical and business narrative, in the order you can present it: Data → Ingestion → Hypergraph formation → Temporal features → Normalisation → LSM reservoir → One-Class SVM readout → Results → Limitations → Roadmap. Every figure here is reproduced from the live artefacts in this repository (sources cited inline).

0. Executive summary (the opening slide)


Problem	SIEM/EDR rules catch known signatures. Sophisticated intrusions show up as a slow shift in patterns of co-occurring behaviour across days — no single threshold ever trips.
Approach	Model each endpoint event as a hyperedge (machine + process + account + IP, tagged with a behaviour “formation”). Summarise behaviour in 5-minute windows. A Liquid State Machine gives those windows temporal memory. A One-Class SVM learns the boundary of normal and flags anything outside it.
Key result (NANXCV host, zero-shot)	Precision 0.90, Recall 0.54, F1 0.67, ROC-AUC 0.74, PR-AUC 0.83 — trained on the clean week only, evaluated across 922 windows.
Why it matters	When the model fires, it is right 9 times out of 10 — the scarce resource in a SOC is analyst attention, and a 90%-precision zero-shot signal is directly actionable.
Honesty line	Recall is ~54% — this is an additive analyst-priority signal, not a replacement for EDR. We have not yet benchmarked against a commercial EDR/UEBA, and the “attack window” is defined by the CISO team’s operational calendar, not per-event ground truth. We say so on the slide.

1. The problem & threat model

Traditional SIEM/EDR pipelines operate on flat event rows. Each MDE timeline row carries strings — machine, process, account, network artefact — but no native primitive that says “these four identities acted together at this instant.” That relational fact is exactly where modern intrusions live.

Rule-based detection excels when a signature exists. It is blind to novel tradecraft and to “living off the land,” where every individual action looks legitimate.
Lateral movement, C2 beaconing, staging, and dwell typically manifest as a gradual shift in the distribution of co-occurring behaviour over hours or days — never as one counter crossing one line.

Our structural bets:

Hypergraph ingestion — represent each event as a hyperedge over its participating identities plus a behaviour formation label. This preserves the n-way simultaneity a flat row destroys, and (unlike a normal graph) without flattening a 4-way co-occurrence into six lossy pairwise edges.
Temporal memory — a high-dimensional reservoir state summarises recent windows, so slow drifts and bursts become separable from normal in state space.
Learn normal, not attack: the production detector (One-Class SVM) is trained only on baseline behaviour. It needs no attack labels, so it is not tied to previously seen techniques and can flag threats it has never observed, provided they shift behaviour away from the baseline. This is the “zero-shot” property.

2. The end-to-end pipeline (architecture)

 ┌──────────────┐   ┌───────────────┐   ┌──────────────────┐   ┌─────────────────┐
 │  MDE Events  │   │  Ingestion     │   │  Temporal HG     │   │  5-min windows  │
 │  Log (CSV/   │──▶│  → HyperMeshDB │──▶│  hyperedges:     │──▶│  16-dim feature │
 │  XLSX)       │   │  entity map +  │   │  (ts, members,   │   │  vector / window│
 │  per host    │   │  formations    │   │  formation, wt)  │   │                 │
 └──────────────┘   └───────────────┘   └──────────────────┘   └────────┬────────┘
   Stage 1            Stage 2              Stage 3                Stage 4 │
                                                                         ▼
 ┌──────────────────┐   ┌─────────────────────────┐   ┌──────────────────────────┐
 │  Results & alarms │   │  Readout                │   │  LSM Reservoir           │
 │  per-window score │◀──│  • One-Class SVM (prod) │◀──│  500 leaky units, fixed  │
 │  + ROC/PR + drift │   │  • LogReg (diagnostic)  │   │  recurrent weights,      │
 │                   │   │  on PCA(50) states      │   │  fading memory ≈ 17 win  │
 └──────────────────┘   └─────────────────────────┘   └──────────────────────────┘
   Stage 7              Stage 6                         Stage 5
                                          (z-score normalise on baseline-only first)

Code map (this repository):

Stage	Artefact
Ingestion	`scripts/ingest_mde_baseline.py`, `scripts/ingest_mde_fleet.py`, `hypermeshdb/ingest/strategies/mde_baseline.py`, `hypermeshdb/connectors/mde.py`
HG + temporal features	`scripts/temporal_common.py`, `scripts/temporal_analysis_nanxcv.py`, `scripts/temporal_analysis_fleet.py`
LSM + readouts	`scripts/snn_train_nanxcv.py`
Results	`data/snn_results_nanxcv.json`, `data/snn_results_ciso_fleet.json`, `data/temporal_*.json`
Deck / UI	`data/HyperMesh_CISO_LSM_Deck.pptx`, `client/src/pages/SnnDashboard.tsx`, `design/snn-hypergraph-*deck.html`
Methodology paper	`design/methodology_temporal_hypergraph_snn.tex`

3. Stage 1 — The data

Source. Microsoft Defender for Endpoint (MDE) timeline exports from the LTIMindTree CISO team (February 2026). Three real hosts:

Host	Role	Baseline window	Notes
NANXCV (`nanxcv00f89340g`)	Primary victim host	Feb 9–13, 2026	Cleanest baseline (0% simulation traffic) — the validation host
AZRPREPW (`AZRCIPREPWCYMLT`)	Windows VM	Feb 7–12, 2026	~14% Cymulate simulation traffic excluded
AZRPREPL (`AZRCIPREPLCYMLT`)	Linux VM	Feb 6–12, 2026	~39% Cymulate simulation traffic excluded

Attack slice. NANXCV, Feb 14–20, 2026 (Sample2.xlsx → SYS_NANXCV), the period the CISO team designated as the attack/exercise window.

Two-phase study design (identical method, different scope):

Phase 1 — single host (NANXCV). Two separate tables (NANXCV_BASELINE, SYS_NANXCV), entity IDs scoped per table. Cleanest behavioural contrast → the method-validation experiment.
Phase 2 — fleet (three hosts). All hosts merged into one table (CISO_FLEET_MDE) with a global, deduplicated entity map, so a shared IP / account / file-hash resolves to the same node across machines — enabling cross-host structural links. This is the scale-realism experiment.

Honesty note to state up front. “Attack” is labelled by the CISO team’s operational calendar, not per-event kill-chain ground truth. Cymulate simulation traffic was excluded from baselines by team attestation. So our zero-shot scoring is principled; the supervised readout (later) is diagnostic only.

4. Stage 2 — Ingestion into HyperMeshDB

scripts/ingest_mde_baseline.py drives MDEBaselineStrategy, which:

Parses each MDE row and resolves its identities — machine, process (by SHA1 hash), account, and optionally remote IP — into stable integer node IDs via a persisted entity map (data/CISO_FLEET_MDE_entity_map.json, 3,808 typed entities in the fleet run). Example entries: machine:nanxcv00f89340g, process:eb42621654… (SHA1), account:nt authority\system.
Excludes Cymulate simulation traffic from baseline tables (so “normal” is genuinely normal).
Aggregates rows into hyperedges on a configurable window (default 5 min; --window 0 = per-row), writing native HyperMeshDB hyperedge tables.
Emits ingest stats: hyperedge count, entity count, formation vocabulary, entity-map path.
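The identity-resolution step can be pictured with a small sketch. `EntityMap` is a hypothetical class, not the `MDEBaselineStrategy` implementation; the `type:value` key format follows the example entries above, and lower-casing the value is an assumption made so that the same identity resolves to one node regardless of host or casing.

```python
class EntityMap:
    """Hypothetical typed-key to integer-ID map (illustration only)."""

    def __init__(self):
        self._ids = {}

    def resolve(self, entity_type: str, value: str) -> int:
        # Normalise the key so every host agrees on it (assumed behaviour)
        key = f"{entity_type}:{value.strip().lower()}"
        # Assign the next free integer the first time a key is seen
        return self._ids.setdefault(key, len(self._ids))

    def __len__(self):
        return len(self._ids)


emap = EntityMap()
acct_a = emap.resolve("account", "NT AUTHORITY\\SYSTEM")
machine = emap.resolve("machine", "nanxcv00f89340g")
acct_b = emap.resolve("account", "nt authority\\system")  # same account, seen again
print(acct_a == acct_b, len(emap))  # True 2
```

Using one shared map for all hosts is what lets a common account, IP, or file hash become a single node in the fleet run; per-table maps, as in Phase 1, keep the IDs scoped to each table.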

Ingestion magnitudes (from the run artefacts):

	Phase 1 (NANXCV)	Phase 2 (Fleet)
Baseline hyperedges	14,034	—
Attack hyperedges	13,341	—
Total hyperedges	—	39,016 (from ~374k raw rows after Cymulate exclusion)
Unique baseline entities	1,775	2,993
Novel entities in attack	53 (~3.0%)	815 (vs. broader pooled baseline)

(Phase-1 hyperedge totals are from the methodology briefing; entity/bucket counts below are read directly from data/temporal_nanxcv.json and data/temporal_ciso_fleet.json.)

5. Stage 3 — Hypergraph formation

Each MDE event (or, under windowed ingestion, each aggregated group of events) becomes exactly one hyperedge:

 e = ( event_ts, members, formation, weight )

event_ts — Unix epoch seconds.
members — list of integer node IDs (machine, process, account, [IP]) acting together in that event.
formation — one of 11 behaviour categories (the compact behavioural vocabulary): PROCESS_EXEC, NETWORK_CONN, FILE_OPS, REGISTRY_OPS, DNS_LOOKUP, HTTP_TRAFFIC, IPC_PIPE, MODULE_LOAD, LOGON, SCRIPT_EXEC, GENERIC.
weight — 1.0 in the prototype (reserved for future confidence/volume weighting).

Why a hypergraph, not a graph? A single “process X, run by account Y, on machine Z, talking to IP W” event is one 4-way fact. A binary graph must shatter it into 6 pairwise edges, losing the simultaneity — the very thing that distinguishes a coordinated attacker action from coincidence. Oversized member sets (>64 endpoints in one edge) are split with the machine node as an anchor — an implementation guardrail, not a statistical claim.

This is the HyperMesh differentiator: n-ary behaviour preserved as a first-class primitive, queryable in the Graph Explorer / Lens, and feedable directly into temporal analytics.

6. Stage 4 — Temporal aggregation & the 16-dim feature vector

The timeline is partitioned into contiguous 5-minute windows (temporal_common.py). For each window t we build a 16-dimensional feature vector u(t), in this exact order:

Aggregate statistics (5):

hedge_count — hyperedge arrivals in the window
entity_count — distinct active node IDs
novelty_rate — fraction of entities never seen in the baseline set (drift signal)
mean_members — average hyperedge cardinality (how “wide” behaviour is)
formation_entropy — Shannon entropy over the 11 formations (behavioural diversity)

Formation histogram (11), channels 6–16: raw counts form_PROCESS_EXEC … form_GENERIC

Window counts (from the live exports):

	Phase 1 (NANXCV)	Phase 2 (Fleet)
Total windows	952 (457 baseline + 495 attack)	1,614 (1,119 baseline + 495 attack)
Windows scored (post-washout)	922 (427 baseline + 495 attack)	1,584

Interpretability bonus — changepoints. A separate analysis runs binary-segmentation changepoint detection on formation_entropy. Volumetric features (hedge_count, entity_count) produced hundreds of noisy changepoints driven by business rhythm; entropy yielded only 4 statistically flagged breakpoints, aligned with attack-phase dynamics. The sharpest: Feb 17, 09:15 UTC, entropy collapsing to 0.65 from a running mean near 2.34 — behaviour concentrating into NETWORK_CONN + PROCESS_EXEC, consistent with C2/lateral-movement profiles (stated qualitatively; we do not assert a labelled MITRE mapping). All 16 channels still feed the reservoir; entropy is the human-readable pivot.
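For readers unfamiliar with binary segmentation, the sketch below shows its core step: finding the single split that most reduces within-segment squared error. The squared-error cost is an assumption (this brief does not name the cost function or library used), `best_split` is a hypothetical helper, and the entropy values are synthetic.

```python
def best_split(series):
    """Return (index, gain) of the single split that most reduces squared error."""
    def sse(xs):
        if not xs:
            return 0.0
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs)

    total = sse(series)
    best_idx, best_gain = None, 0.0
    for i in range(1, len(series)):
        gain = total - (sse(series[:i]) + sse(series[i:]))
        if gain > best_gain:
            best_idx, best_gain = i, gain
    return best_idx, best_gain


# Synthetic entropy trace: stable near 2.3, then a collapse (illustrative values)
entropy = [2.3, 2.4, 2.3, 2.35, 2.3, 0.7, 0.65, 0.7, 0.68]
idx, gain = best_split(entropy)
print(idx)  # 5: the first window of the low-entropy regime
```

Full binary segmentation applies this step recursively to each resulting segment and stops when the gain falls below a significance threshold, which is why a noisy volumetric series can produce far more breakpoints than a smoother entropy series.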

7. Stage 5 — Normalisation that mimics deployment

Stack raw features into X_raw ∈ ℝ^(T×16). A z-score normaliser is fit on baseline windows only (y=0):

 μ_j = mean(X_raw[:,j] | baseline)        σ_j = std(X_raw[:,j] | baseline)   (σ←1 if σ<1e-8)
 u_tj = (X_raw[t,j] − μ_j) / σ_j

This is deliberate: scaling statistics come from presumed-normal history, never from a future distribution that contains attacks. It is the first of several places where we hold the line on “the model only ever learns from normal.”
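A minimal NumPy sketch of this baseline-only normaliser, under stated assumptions: the function names are hypothetical, the data is synthetic, and the population standard deviation (NumPy's default) is used because the brief does not say which estimator the script applies.

```python
import numpy as np


def fit_baseline_scaler(X_raw, y):
    """Fit per-channel mean/std on baseline rows only (y == 0)."""
    baseline = X_raw[y == 0]
    mu = baseline.mean(axis=0)
    sigma = baseline.std(axis=0)
    sigma = np.where(sigma < 1e-8, 1.0, sigma)  # guard near-constant channels
    return mu, sigma


def transform(X_raw, mu, sigma):
    return (X_raw - mu) / sigma


rng = np.random.default_rng(0)
X_raw = np.vstack([rng.normal(10.0, 2.0, size=(100, 16)),   # synthetic baseline windows
                   rng.normal(14.0, 2.0, size=(50, 16))])   # synthetic shifted windows
X_raw[:, 3] = 4.0                                           # a constant channel
y = np.array([0] * 100 + [1] * 50)

mu, sigma = fit_baseline_scaler(X_raw, y)
U = transform(X_raw, mu, sigma)
print(U[y == 0].mean().round(6), U[y == 1][:, 0].mean() > 1.0)
```

Because the statistics come from the baseline rows alone, the shifted windows keep their offset after scaling instead of being centred away, which is the property the detector relies on.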

8. Stage 6 — The Liquid State Machine reservoir

The LSM (implemented here as a rate-based Echo State Network with leaky-integrator neurons, the non-spiking counterpart of a spiking liquid state machine) maintains a hidden state x(t) ∈ ℝ^N that evolves as:

 x(t) = (1 − α)·x(t−1) + α·tanh( W·x(t−1) + W_in·u(t) + ε_t )

W_in ∈ ℝ^(N×16) — random input projection, scaled by input_scaling.
W ∈ ℝ^(N×N): sparse random recurrent matrix, rescaled so its spectral radius ρ < 1 (the usual practical condition for the echo-state property, i.e. a fading, stable memory). The script measures ρ after construction and asserts it is < 1.0.
α (leak_rate) — membrane leak; controls how fast memory fades.
ε_t — tiny Gaussian noise (1e-4) for regularisation.

Hyperparameters (defaults, tuned for this data scale):

Param	Value	Meaning
`n_reservoir` (N)	500	reservoir neurons
`spectral_radius` (ρ)	0.95	echo-state stability (<1)
`leak_rate` (α)	0.30	membrane leak → memory time-constant 1/α ≈ 3 windows (~15 min); effective context is longer via recurrence (design notes cite ~15–30 windows)
`sparsity`	0.10	10% non-zero recurrent connections
`input_scaling`	0.30	input weight scale
`washout`	30	warm-up windows discarded before scoring
`seed`	7	reproducibility
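The update rule and hyperparameters above can be sketched in NumPy as follows. This is an illustration, not the code in `scripts/snn_train_nanxcv.py`: the function names are hypothetical, and the uniform weight distributions are an assumption, since the brief only says the matrices are random.

```python
import numpy as np


def build_reservoir(n_inputs, n_reservoir=500, spectral_radius=0.95,
                    sparsity=0.10, input_scaling=0.30, seed=7):
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-1.0, 1.0, size=(n_reservoir, n_inputs)) * input_scaling
    W = rng.uniform(-1.0, 1.0, size=(n_reservoir, n_reservoir))
    mask = rng.random((n_reservoir, n_reservoir)) < sparsity   # keep ~10% of connections
    W = W * mask
    rho = np.max(np.abs(np.linalg.eigvals(W)))
    W = W * (spectral_radius / rho)                            # rescale to the target radius
    return W_in, W, rng


def run_reservoir(U, W_in, W, rng, leak_rate=0.30, noise=1e-4, washout=30):
    x = np.zeros(W.shape[0])
    states = []
    for u in U:
        pre = W @ x + W_in @ u + noise * rng.standard_normal(x.shape)
        x = (1.0 - leak_rate) * x + leak_rate * np.tanh(pre)   # leaky tanh update
        states.append(x.copy())
    return np.asarray(states)[washout:]                        # drop warm-up windows


U = np.random.default_rng(0).normal(size=(100, 16))  # stand-in for normalised windows
W_in, W, rng = build_reservoir(n_inputs=16)
states = run_reservoir(U, W_in, W, rng)
print(states.shape)  # (70, 500)
```

Nothing in this block is trained: the weights are drawn once from the seed and then only rescaled, so the reservoir is reproducible and cheap to rebuild.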

Why an LSM — the three reasons that matter to a CISO:

Fading memory. x(t) encodes not just this window but the recent evolution of behaviour (leak time-constant ≈ 3 windows; effective context extends further via recurrence). A single anomalous window bends the whole state trajectory — slow drifts become visible.
Recurrent weights are FIXED — no backprop. Only the lightweight readout is trained. With only ~10³ windows, this avoids the overfitting that plagues fully-trained deep RNNs on small security datasets, and it’s fast and deterministic.
Expansion. Projecting 16-dim behaviour into 500-dim reservoir space lets attack-driven deviations accumulate and makes them easier to separate with a simple boundary.

9. Stage 7 — Readout models

Reservoir states (post-washout) → PCA to 50 components, fit on baseline rows only → StandardScaler (also baseline-fit). Both readouts consume the same 50-dim, baseline-centred coordinates.

9a. Mode A — One-Class SVM (the production detector)

 OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)   # trained ONLY on baseline states
 anomaly_score = −decision_function(state)            # higher = more anomalous

Trained exclusively on baseline behaviour — never sees an attack label. This is the deployable, zero-shot detector.
nu = 0.05 sets the baseline false-alarm budget to ~5%; the observed rate is 29/427 = 6.8% of baseline windows (Phase 1) — close to budget and expected by design, not a calibration failure. (Fleet: 60/1,089 = 5.5%.)
This is the model behind the headline numbers.
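A hedged sketch of the baseline-only readout chain (PCA, then StandardScaler, then One-Class SVM) using scikit-learn. The helper names are hypothetical and the data is synthetic: 60-dimensional stand-in states with a deliberately large shift, chosen only so the example runs quickly and separates cleanly. Real reservoir states are 500-dimensional and far less clear-cut.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def fit_zero_shot_detector(baseline_states, n_components=50, nu=0.05):
    """Fit PCA, scaler, and One-Class SVM on baseline states only."""
    pca = PCA(n_components=n_components).fit(baseline_states)
    scaler = StandardScaler().fit(pca.transform(baseline_states))
    svm = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
    svm.fit(scaler.transform(pca.transform(baseline_states)))
    return pca, scaler, svm


def anomaly_scores(states, pca, scaler, svm):
    z = scaler.transform(pca.transform(states))
    return -svm.decision_function(z)  # higher = more anomalous


# Synthetic stand-ins for reservoir states (not real telemetry)
rng = np.random.default_rng(0)
baseline_states = rng.normal(0.0, 1.0, size=(400, 60))
shifted_states = rng.normal(3.0, 1.0, size=(100, 60))

pca, scaler, svm = fit_zero_shot_detector(baseline_states)
base_scores = anomaly_scores(baseline_states, pca, scaler, svm)
shift_scores = anomaly_scores(shifted_states, pca, scaler, svm)
alarms = shift_scores > 0  # a score above 0 means the window lies outside the learned boundary
print(f"baseline alarm rate={np.mean(base_scores > 0):.3f}, shifted alarm rate={alarms.mean():.3f}")
```

No label is passed to any `fit` call; the shifted states are only ever scored. Changing `nu` moves the boundary and therefore the baseline alarm rate, which is the lever an adaptive operating point would use.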

9b. Mode B — Logistic Regression (diagnostic only)

A logistic readout on the same PCA(50) reservoir states, under a temporal 60/40 train/test split that does use attack labels. It is explicitly NOT a deployment simulation — it leaks future knowledge. It is reported only as a comparison point. Note: in Phase 1 this diagnostic readout actually scored lower (ROC-AUC 0.55) than the zero-shot OC-SVM (0.74) — we attribute this to the temporal split (training on the first 60% of attack windows, testing on the last 40%), not to feature quality; in Phase 2 the same readout reaches ROC-AUC 0.91, showing the separating signal is present. Never present Mode B as operational performance. (Caveat: the code contains no raw-16-dimension classifier, so we make no claim about raw-feature separability — both readouts consume reservoir states.)

10. Results

10a. Phase 1 — NANXCV (single host, the validation experiment)

One-Class SVM (zero-shot, production-realistic) — 922 windows scored:

Metric	Value
Precision	0.902
Recall	0.537
F1	0.673
ROC-AUC	0.744
PR-AUC	0.831
Accuracy	0.720

Confusion matrix: TP 266, FP 29, FN 229, TN 398 (baseline = 427, attack = 495). Reservoir drift: attack windows sit +13.6% further from the baseline centroid (L2) than baseline windows, a geometric indication that the reservoir separates the phases.
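
As a cross-check, the table values follow directly from the confusion matrix. The short calculation below uses only the four counts quoted above; the variable names are illustrative.

```python
tp, fp, fn, tn = 266, 29, 229, 398

precision = tp / (tp + fp)                   # share of alarms that were attack windows
recall = tp / (tp + fn)                      # share of attack windows that raised an alarm
f1 = 2 * tp / (2 * tp + fp + fn)
accuracy = (tp + tn) / (tp + fp + fn + tn)
baseline_alarm_rate = fp / (fp + tn)         # share of baseline windows that raised an alarm

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} "
      f"accuracy={accuracy:.3f} baseline_alarm_rate={baseline_alarm_rate:.3f}")
# → precision=0.902 recall=0.537 f1=0.673 accuracy=0.720 baseline_alarm_rate=0.068
```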

The headline for the slide: “When it fires, it is right 90% of the time, having learned only from the clean week.”

Timeline narrative (from per-window scores): a Feb 16 burst dominated by NETWORK_CONN peaks at normalised score 1.000; the Feb 17 09:15 UTC entropy collapse to 0.65; a quiet Feb 18 dwell with few alarms; renewed activity Feb 19–20.

10b. Phase 2 — Fleet (three hosts, same method, no re-tuning)

Metric	One-Class SVM (zero-shot)	Logistic (diagnostic only)
Precision	0.731	0.561
Recall	0.329	0.980
F1	0.454	0.713
ROC-AUC	0.713	0.908
PR-AUC	0.613	0.809

1,584 windows scored. Top anomalous cluster again anchored on the Feb 16 afternoon NETWORK_CONN burst.

Be transparent about the fleet number. Zero-shot recall drops to 33% and the centroid separation is slightly negative (−1.1%). Pooling three heterogeneous hosts into one baseline widens the “normal” envelope, so more attack windows hide inside it, and no hyperparameters were re-tuned for the fleet. Phase 1 and Phase 2 are not directly comparable (“better/worse”) because the boundary is fit to different baseline distributions. The supervised AUC of 0.91 indicates the signal is there; we expect per-segment boundaries to recover part of it, but that is untested. This is a roadmap item, not a hidden flaw.
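
One way to read the centroid-separation figures quoted for the two phases (+13.6% and −1.1%) is as the relative change in mean L2 distance to the baseline centroid. The sketch below assumes that definition; the function name and the toy data are hypothetical, and the exact formula used in the repository may differ.

```python
import numpy as np

def centroid_drift_pct(baseline_states, attack_states):
    """Relative change in mean L2 distance to the baseline centroid, in percent."""
    centroid = baseline_states.mean(axis=0)
    base_dist = np.linalg.norm(baseline_states - centroid, axis=1).mean()
    attack_dist = np.linalg.norm(attack_states - centroid, axis=1).mean()
    return 100.0 * (attack_dist - base_dist) / base_dist

# Toy example: baseline points sit 1 unit from their centroid, attack points 1.5 units.
toy_baseline = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
toy_attack = 1.5 * toy_baseline
drift = centroid_drift_pct(toy_baseline, toy_attack)
print(f"{drift:+.1f}%")  # → +50.0%
```

Under this reading, a value near zero or below means attack windows sit no further from the pooled centroid than baseline windows do, which is consistent with a widened "normal" envelope.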

11. Limitations & explicit non-claims (put these on a slide)

Recall ~54% (single host) — many attack windows that mimic baseline statistics stay inside the boundary, especially during dwell. This is an analyst-priority signal layered on top of EDR, not a replacement.
No MITRE technique IDs, no attribution, no automated forensic narrative. We flag behaviourally anomalous windows; a human investigates.
“Attack phase” = operational calendar, not per-event ground truth. Metrics inherit that labelling assumption.
No commercial-EDR / UEBA head-to-head benchmark yet.
Fleet hyperparameters not jointly re-tuned; cross-phase comparison requires care.
Supervised (logistic) results are diagnostic only — they use future labels and must not be read as deployment performance.
Single-tenant validation on one CISO team’s February-2026 data; broader generalisation is unproven.
Attack-window provenance (red-team vs. simulated vs. production) should be confirmed with the data owner before external use.

12. Roadmap & the ask

What we’d build next, in priority order:

Lift recall — per-host / per-segment baseline boundaries, adaptive nu, richer features (sequence-of-formations, inter-arrival timing) so dwell-phase attacks surface.
Fleet-scale zero-shot — per-host normalisation + hierarchical reservoirs to recover the signal that the supervised AUC (0.91) indicates is present.
Head-to-head benchmark vs. a commercial EDR/UEBA on the same telemetry — the credibility milestone.
Live SOC shadow-mode pilot — run alongside the existing stack, measure analyst-validated precision/recall and time-to-detect on real traffic.
Explainability surface — per-alarm “why” (dominant formations, novel entities, contributing hyperedges) wired into the SnnDashboard UI.
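
The first roadmap item, per-host baseline boundaries, could be sketched as follows. This is an illustration of the idea and not an implemented feature: the host names, helper functions, and synthetic data are all hypothetical, and 5 features are used for brevity.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def fit_per_host(baseline_by_host, nu=0.05):
    """Fit one baseline-only boundary per host instead of one pooled boundary."""
    return {host: OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(z)
            for host, z in baseline_by_host.items()}

def score_window(models, host, z_row):
    """Anomaly score of a single window against its own host's boundary."""
    return float(-models[host].decision_function(z_row.reshape(1, -1))[0])

rng = np.random.default_rng(0)
baseline_by_host = {                       # hypothetical hosts with different "normal" levels
    "host-a": rng.normal(loc=0.0, size=(300, 5)),
    "host-b": rng.normal(loc=4.0, size=(300, 5)),
}
models = fit_per_host(baseline_by_host)

typical_for_b = np.full(5, 4.0)
score_on_b = score_window(models, "host-b", typical_for_b)   # judged against host-b's own normal
score_on_a = score_window(models, "host-a", typical_for_b)   # same behaviour judged against host-a
print(score_on_b < 0.0 < score_on_a)
```

The same window is unremarkable for one host and anomalous for the other. A single pooled boundary has to cover both hosts' normal ranges and so cannot draw that distinction.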

The ask: access to a labelled (or red-team-tagged) production window + permission for a shadow-mode pilot, so we can report analyst-validated precision/recall and a fair EDR benchmark.

13. Joint next-stage agenda — what we need from the CISO & team

Frame for the meeting: “To take this from a validated prototype to an operational pilot, here is what we’d need from you.” Each question maps to closing a gap already named in §11 (Limitations). Position the limitations as a joint roadmap, not weaknesses.

The 3 asks that decide whether there is a next stage (lead with these)

Ground truth for Feb 14–20 — “Can we get analyst-confirmed true positives for that window?” → converts our calendar-labelled result into a defensible evaluation.
Shadow-mode pilot — “Will you sponsor a 4–6 week shadow-mode run on live telemetry?” → the only way to prove operational precision/recall.
EDR benchmark — “Can we run head-to-head against your current EDR/UEBA on the same data?” → the credibility milestone.

Everything below feeds these three.

13a. Data — tighten “normal” and get real labels

Ask	What it unblocks
Provenance of the attack window — red-team, Cymulate simulation, or real production incident?	We currently cannot say on the slide; determines whether 90% precision is against real or simulated adversary behaviour.
Per-event ground truth / kill chain — validated TPs, timestamps, MITRE techniques	Replaces operational-calendar labelling; lets us report real recall.
Longer & broader baseline — weeks–months, more hosts, server + workstation mix	Our “normal” is one week / one host; more baseline = tighter boundary, fewer false alarms, better recall.
Cymulate intent — detect simulated attacks, or exclude as noise?	We exclude it today; if they want it detected it becomes a free labelled positive set.
Full MDE schema access — DeviceProcess / DeviceNetwork / DeviceLogon fields, not just the timeline export	Richer features (sequence-of-formations, inter-arrival timing) that lift recall.
Data governance — PII / residency constraints, standing feed vs. one-off CSV exports	Decides whether deployment is legally possible and whether we get live data.

13b. Model — define the target before tuning

Ask	What it unblocks
Desired operating point — precision-first or recall-first?	Directly sets the OC-SVM threshold / `nu`; we can’t tune without it.
Alert budget — alerts/day/analyst acceptable?	Defines the false-alarm ceiling; maps our 6.8% baseline-window rate to their volume tolerance.
What do they miss today? — detections the current stack fails on	Anchors value to a real gap, not abstract “anomalies.”
Explainability requirement — need a per-alarm “why” (dominant formations, novel entities, contributing hyperedges)?	Decides whether we build the explainability surface before or after the pilot.
Per-host vs. fleet modelling — one model or per-asset baselines?	The proposed fix for the fleet recall drop (33%); supervised AUC 0.91 indicates the signal is there, and per-host baselines are the untested route to recovering it.
Benchmark definition — what does “better than our EDR” mean numerically?	Agree the scoreboard before we play, not after.
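
The alert-budget row reduces to simple arithmetic once a window length is fixed. The sketch below assumes every 5-minute window is scored, each alarmed window counts as one alert, and the Phase 1 baseline rate carries over to live data. The function names and the budget of 10 alerts per day are hypothetical.

```python
WINDOW_MINUTES = 5
windows_per_host_per_day = 24 * 60 // WINDOW_MINUTES      # 288

def expected_alarms_per_day(alarm_rate, hosts=1):
    """Alarmed windows per day if `alarm_rate` of all scored windows fire."""
    return alarm_rate * windows_per_host_per_day * hosts

def max_alarm_rate(alert_budget_per_day, hosts=1):
    """Largest per-window alarm rate that stays inside a daily alert budget."""
    return alert_budget_per_day / (windows_per_host_per_day * hosts)

phase1_rate = 29 / 427                                    # observed baseline-window rate, Phase 1
print(f"{expected_alarms_per_day(phase1_rate):.1f}")      # → 19.6
print(f"{max_alarm_rate(10):.3f}")                        # → 0.035 (hypothetical budget of 10/day)
```

Read the second way round, an agreed daily budget translates into a target window rate, which in turn constrains the threshold or `nu`.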

13c. Deployment — prove it in production, on their rails

Ask	What it unblocks
Shadow-mode pilot — 4–6 weeks alongside production, analysts validate each alarm	The headline ask; yields analyst-validated precision/recall and time-to-detect.
Pilot exit criteria — agree now what result triggers full adoption	Prevents a “great demo, no decision” outcome.
Live telemetry feed — Defender API / Event Hub / Sentinel stream vs. batch exports	Determines architecture and whether 5-min-window cadence is feasible.
Alert destination — SIEM (Sentinel?), SOAR, ticketing; format/API?	Defines integration scope and effort.
Latency SLA — is 5-minute batch acceptable, or near-real-time needed?	Sets architecture and cost.
Where the model runs — their tenant/cloud, on-prem, or our environment; data egress permitted?	Often the hardest blocker; surface it early, not after the pilot is agreed.
Ownership — who triages and validates alarms during the pilot?	A pilot with no analyst owner produces no validated labels and fails silently.

Appendix A — Reproducibility

# 1. Ingest baseline + attack hyperedge tables
.venv/bin/python scripts/ingest_mde_baseline.py            # NANXCV (Phase 1)
.venv/bin/python scripts/ingest_mde_fleet.py               # CISO_FLEET_MDE (Phase 2)

# 2. Build 5-min temporal features
.venv/bin/python scripts/temporal_analysis_nanxcv.py       # → data/temporal_nanxcv.json
.venv/bin/python scripts/temporal_analysis_fleet.py        # → data/temporal_ciso_fleet.json

# 3. Train LSM + readouts, export results
.venv/bin/python scripts/snn_train_nanxcv.py \
    --input data/temporal_nanxcv.json  --output data/snn_results_nanxcv.json
.venv/bin/python scripts/snn_train_nanxcv.py \
    --input data/temporal_ciso_fleet.json --output data/snn_results_ciso_fleet.json

Defaults (in-code): n_reservoir=500, sparsity=0.10, spectral_radius=0.95, input_scaling=0.30, leak_rate=0.30, washout=30, seed=7. CLI overrides exist for all of them.

Appendix B — Glossary (for non-specialist stakeholders)

Hyperedge — a single relationship linking more than two things at once (here: machine + process + account + IP acting together). A normal graph edge links only two.
Formation — the behavioural category of an event (e.g. NETWORK_CONN, PROCESS_EXEC).
Liquid State Machine (LSM) / reservoir — a fixed (untrained) recurrent neural network that gives the model a short fading memory of recent behaviour. Cheap, fast, hard to overfit.
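Echo-state property (spectral radius < 1) — the condition under which the reservoir’s memory of past inputs fades smoothly rather than blowing up. Scaling the recurrent weights to a spectral radius below 1 is the standard practical recipe for it, not a strict guarantee in every configuration.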
One-Class SVM — a model that learns the shape of “normal” from normal data only, then flags anything outside that shape. The engine of zero-shot detection.
Zero-shot — detecting attacks the model was never trained on, by learning only what normal looks like.
Washout — initial windows discarded while the reservoir “warms up.”

Appendix C — Anticipated CISO questions

Question	Answer
“How is this different from our EDR?”	EDR fires on known signatures/rules. This fires on behavioural deviation from this machine’s own normal — catching novel/living-off-the-land activity rules miss. It’s additive, not a replacement.
“Why should I trust a 54% recall?”	Trust the precision (90%): on this validation run roughly nine in ten alarms were true positives, so it is a high-quality analyst-priority queue. Recall is a roadmap item, and the detector is a net-new signal on top of full EDR coverage, not a substitute for it.
“Did you train on the attack?”	No. The production detector (One-Class SVM) trains only on the clean baseline week. The one model that uses attack labels (logistic) is clearly flagged diagnostic-only and never presented as operational.
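“What about false alarms?”	~6.8% of baseline windows fire (29/427) against a deliberate, tunable `nu=0.05` (~5%) budget. At 5-min windows (288 per host per day) that is roughly 20 baseline-window alarms per host per day if the rate holds, a volume to be ranked by anomaly score and set against the agreed alert budget.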
“Does it scale to my whole fleet?”	Single-host is strong; pooled-fleet zero-shot recall drops (33%) without per-host tuning, though supervised AUC (0.91) shows the signal is there. Per-host baselines are the next build.
“Can my analysts act on an alarm?”	Each alarm carries timestamp, anomaly score, dominant formation, novel-entity count, and reservoir drift — enough to triage. Full per-alarm explainability is on the roadmap.

All quantitative claims in this document are reproduced from repository artefacts: data/snn_results_nanxcv.json, data/snn_results_ciso_fleet.json, data/temporal_nanxcv.json, data/temporal_ciso_fleet.json, and the implementation in scripts/snn_train_nanxcv.py / scripts/temporal_common.py / scripts/ingest_mde_baseline.py. Phase-1 hyperedge totals and the entropy-changepoint narrative are from design/methodology_temporal_hypergraph_snn.tex.