HyperMesh CISO Solution Brief

Endpoint Behavioural Anomaly Detection With Temporal Hypergraphs and Liquid State Machines

Audience: CISO, SOC leadership, security architecture, detection engineering
Use case: Zero-shot behavioural anomaly detection from Microsoft Defender for Endpoint telemetry
Core stack: HyperMeshDB + temporal hypergraph features + Liquid State Machine reservoir + One-Class SVM
Primary result: 90.2% precision on the NANXCV validation host while training only on clean baseline activity


Modern endpoint attacks often do not announce themselves as one obvious bad event. They emerge as a gradual shift in how machines, processes, accounts, and network artefacts co-occur over time. Traditional SIEM and EDR rules are excellent for known signatures, but they struggle when each individual action looks legitimate in isolation.

HyperMesh addresses this gap by modelling endpoint telemetry as a temporal hypergraph. Each event becomes a first-class n-way behavioural fact linking the machine, process, account, and IP involved at that moment. A Liquid State Machine then gives the system a short fading memory of recent behaviour, and a One-Class SVM learns the boundary of normal activity from clean baseline data only.

The key operational claim is simple:

HyperMesh can learn what normal endpoint behaviour looks like from Defender telemetry and flag anomalous behaviour windows without training on attack labels.

On the NANXCV validation host, the zero-shot detector achieved:

| Metric | Result |
| --- | --- |
| Precision | 0.902 |
| Recall | 0.537 |
| F1 | 0.673 |
| ROC-AUC | 0.744 |
| PR-AUC | 0.831 |
| Accuracy | 0.720 |

This means that when the detector fires on this validation run, it is right about 9 times out of 10. That is the most important number for a SOC, because analyst attention is the scarce resource.

This is not positioned as an EDR replacement. It is an additive analyst-priority signal that surfaces behaviourally unusual windows for investigation.


Security teams already collect rich endpoint telemetry, but most pipelines still treat it as flat rows. A row may contain a machine, process, account, IP address, timestamp, and action type, yet the detection system often loses the fact that these identities acted together.

Attackers exploit this gap. Living-off-the-land behaviour, lateral movement, staging, command-and-control, and dwell time often appear as changes in patterns of co-occurrence, not as single events that cross a static threshold.

HyperMesh turns endpoint telemetry into a temporal hypergraph:

machine + process + account + IP + behaviour category + timestamp

Each event is stored as a hyperedge, preserving the n-way relationship directly. Temporal analytics then summarize behaviour in five-minute windows, and the reservoir model tracks how those windows evolve.

The model is trained on normal activity only. It does not need examples of every attack technique. This makes it suitable for a zero-shot detection setting where future attacker behaviour may not match yesterday’s signatures.


┌────────────────────┐
│ Microsoft Defender │
│ Endpoint telemetry │
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ HyperMesh ingest   │
│ MDE rows → entities│
│ and hyperedges     │
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ Temporal hypergraph│
│ machine + process  │
│ account + IP       │
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ 5-minute features  │
│ 16-dim vector per  │
│ time window        │
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ LSM reservoir      │
│ 500 recurrent      │
│ memory neurons     │
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ One-Class SVM      │
│ learns normal only │
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ Anomaly timeline   │
│ score + evidence   │
│ for SOC triage     │
└────────────────────┘

Repository artefacts:

| Layer | Implementation |
| --- | --- |
| MDE ingestion | scripts/ingest_mde_baseline.py, scripts/ingest_mde_fleet.py, hypermeshdb/ingest/strategies/mde_baseline.py, hypermeshdb/connectors/mde.py |
| Temporal features | scripts/temporal_common.py, scripts/temporal_analysis_nanxcv.py, scripts/temporal_analysis_fleet.py |
| Reservoir and readouts | scripts/snn_train_nanxcv.py |
| Results | data/snn_results_nanxcv.json, data/snn_results_ciso_fleet.json, data/temporal_nanxcv.json, data/temporal_ciso_fleet.json |
| Presentation assets | data/HyperMesh_CISO_LSM_Deck.pptx, client/src/pages/SnnDashboard.tsx, design/snn-hypergraph-*deck.html |
| Methodology | design/methodology_temporal_hypergraph_snn.tex |

The study uses Microsoft Defender for Endpoint timeline exports provided by the LTIMindTree CISO team for February 2026.

| Host | Role | Baseline Window | Notes |
| --- | --- | --- | --- |
| NANXCV (nanxcv00f89340g) | Primary victim host | Feb 9-13, 2026 | Cleanest baseline; 0% simulation traffic; validation host |
| AZRPREPW (AZRCIPREPWCYMLT) | Windows VM | Feb 7-12, 2026 | Cymulate simulation traffic excluded |
| AZRPREPL (AZRCIPREPLCYMLT) | Linux VM | Feb 6-12, 2026 | Cymulate simulation traffic excluded |

The attack/exercise slice is NANXCV, Feb 14-20, 2026, stored as SYS_NANXCV.

Two study modes were run:

| Phase | Scope | Purpose |
| --- | --- | --- |
| Phase 1 | Single host, NANXCV | Method validation with the cleanest baseline and clearest behavioural contrast |
| Phase 2 | Three-host fleet | Scale-realism test using a global entity map across machines |

Important caveat: the attack window is labelled by the CISO team’s operational calendar, not by per-event kill-chain ground truth. That is sufficient for prototype validation, but a production evaluation needs analyst-confirmed labels.


The MDE ingestion strategy converts raw endpoint rows into typed entities and hyperedges.

For each event, HyperMesh resolves:

| Entity Type | Source Field |
| --- | --- |
| Machine | Computer Name or Machine Id |
| Process | Initiating Process SHA1, with filename used for display |
| Account | Initiating Process Account Domain + Initiating Process Account Name |
| IP | Remote IP, when present |
| Formation | Mapped from Action Type |

Each row becomes one behavioural hyperedge:

e = (event_ts, members, formation, weight)

Where:

| Field | Meaning |
| --- | --- |
| event_ts | Unix timestamp for the endpoint event |
| members | Integer entity IDs for machine, process, account, and optional IP |
| formation | Compact behavioural category |
| weight | Event count or aggregated event volume |

The formation vocabulary is:

PROCESS_EXEC
NETWORK_CONN
FILE_OPS
REGISTRY_OPS
DNS_LOOKUP
HTTP_TRAFFIC
IPC_PIPE
MODULE_LOAD
LOGON
SCRIPT_EXEC
GENERIC

Why this matters: a normal graph would split one 4-way behaviour into multiple pairwise edges. A hypergraph preserves the complete behavioural fact as one object.
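The row-to-hyperedge step can be sketched as follows. This is a minimal illustration, not the repository's `MDEBaselineStrategy`: the `Hyperedge` and `EntityMap` classes, the row keys (`machine`, `process_sha1`, `account`, `remote_ip`, `action_type`), and the three-entry `ACTION_TO_FORMATION` mapping are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical subset of an Action Type -> formation mapping (assumption).
ACTION_TO_FORMATION = {
    "ProcessCreated": "PROCESS_EXEC",
    "ConnectionSuccess": "NETWORK_CONN",
    "FileCreated": "FILE_OPS",
}


@dataclass(frozen=True)
class Hyperedge:
    event_ts: int              # Unix timestamp of the endpoint event
    members: Tuple[int, ...]   # integer entity IDs that acted together
    formation: str             # compact behavioural category
    weight: float = 1.0


class EntityMap:
    """Assigns a stable integer ID to each typed entity key."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def resolve(self, kind: str, value: str) -> int:
        key = f"{kind}:{value.lower()}"
        # Reuse the existing ID, or allocate the next free integer.
        return self._ids.setdefault(key, len(self._ids))


def row_to_hyperedge(row: dict, entities: EntityMap) -> Hyperedge:
    members = [
        entities.resolve("machine", row["machine"]),
        entities.resolve("process", row["process_sha1"]),
        entities.resolve("account", row["account"]),
    ]
    if row.get("remote_ip"):  # the IP member is optional
        members.append(entities.resolve("ip", row["remote_ip"]))
    # Unmapped action types fall back to the GENERIC formation.
    formation = ACTION_TO_FORMATION.get(row["action_type"], "GENERIC")
    return Hyperedge(row["event_ts"], tuple(members), formation)


entities = EntityMap()
edge = row_to_hyperedge(
    {
        "event_ts": 1771000000,
        "machine": "host-a",
        "process_sha1": "abc123",
        "account": r"NT AUTHORITY\SYSTEM",
        "remote_ip": "10.0.0.5",
        "action_type": "ConnectionSuccess",
    },
    entities,
)
```

The four identities end up in one `members` tuple, so the 4-way fact stays a single object instead of being split into pairwise edges.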


| Measure | NANXCV Phase 1 | Fleet Phase 2 |
| --- | --- | --- |
| Baseline hyperedges | 14,034 | - |
| Attack hyperedges | 13,341 | - |
| Total hyperedges | - | 39,016 |
| Approx. raw rows after Cymulate exclusion | - | 374,000 |
| Unique baseline entities | 1,775 | 2,993 |
| Novel entities in attack | 53 | 815 |

The fleet run uses a global, deduplicated entity map so shared IPs, accounts, and file hashes resolve to the same node across hosts.


HyperMesh partitions the telemetry into contiguous five-minute windows. Each window becomes a 16-dimensional feature vector.

| Feature | Description |
| --- | --- |
| hedge_count | Number of hyperedges in the window |
| entity_count | Number of distinct active entities |
| novelty_rate | Fraction of entities not seen in baseline |
| mean_members | Average hyperedge cardinality |
| formation_entropy | Behavioural diversity across formation types |

The remaining 11 channels are raw counts for each formation category.
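A sketch of how one window's 16 features could be computed from its hyperedges. The function name `window_features`, the `(members, formation)` input shape, and the use of the natural logarithm for entropy are assumptions for illustration; the repository's `temporal_common.py` may differ in detail.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple

FORMATIONS = [
    "PROCESS_EXEC", "NETWORK_CONN", "FILE_OPS", "REGISTRY_OPS",
    "DNS_LOOKUP", "HTTP_TRAFFIC", "IPC_PIPE", "MODULE_LOAD",
    "LOGON", "SCRIPT_EXEC", "GENERIC",
]


def window_features(
    edges: Iterable[Tuple[Tuple[int, ...], str]],
    baseline_entities: Set[int],
) -> List[float]:
    """Build the 16-dim vector for one window of (members, formation) pairs."""
    edges = list(edges)
    n = len(edges)
    active = {m for members, _ in edges for m in members}
    counts = Counter(formation for _, formation in edges)

    # Fraction of active entities that never appeared in the baseline.
    novelty = len(active - baseline_entities) / len(active) if active else 0.0
    mean_members = sum(len(members) for members, _ in edges) / n if n else 0.0
    # Shannon entropy over the formation mix (natural log assumed).
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values()) if n else 0.0

    aggregate = [float(n), float(len(active)), novelty, mean_members, entropy]
    histogram = [float(counts.get(name, 0)) for name in FORMATIONS]
    return aggregate + histogram


features = window_features(
    [((0, 1, 2), "PROCESS_EXEC"), ((0, 1, 2, 3), "NETWORK_CONN")],
    baseline_entities={0, 1, 2},
)
```

In this toy window, entity 3 is new relative to the baseline, so one of four active entities is novel.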

| Measure | NANXCV Phase 1 | Fleet Phase 2 |
| --- | --- | --- |
| Total windows | 952 | 1,614 |
| Baseline windows | 457 | 1,119 |
| Attack windows | 495 | 495 |
| Windows scored after washout | 922 | 1,584 |

An important interpretability signal came from formation_entropy: volumetric features generated many noisy changepoints, but entropy produced only a small number of statistically meaningful shifts. The sharpest was Feb 17, 09:15 UTC, where entropy collapsed to 0.65 from a running mean near 2.34, indicating concentration into a narrower behavioural mix dominated by network and process execution activity.


The model is deliberately normalised using baseline windows only.

u_tj = (X_tj - mean_j_baseline) / std_j_baseline

This mimics production deployment. In a real environment, the model would learn from an assumed-normal historical period and then score future activity. No future attack distribution is used to scale the features.


The reservoir is a fixed recurrent neural system that gives each five-minute window a fading memory of recent activity.

x(t) = (1 - α) x(t-1) + α tanh(W x(t-1) + W_in u(t) + ε_t)

| Parameter | Value | Meaning |
| --- | --- | --- |
| n_reservoir | 500 | Number of reservoir neurons |
| spectral_radius | 0.95 | Keeps the reservoir stable and fading |
| leak_rate | 0.30 | Controls memory decay |
| sparsity | 0.10 | Sparse recurrent connectivity |
| input_scaling | 0.30 | Input projection scale |
| washout | 30 | Initial windows discarded before scoring |
| seed | 7 | Reproducibility |

The recurrent weights are fixed. There is no deep backpropagation. Only the readout is trained. This is important for security telemetry because the number of labelled attack windows is usually small and overfitting risk is high.


The deployable detector is a One-Class SVM trained only on baseline reservoir states.

OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
anomaly_score = -decision_function(state)

nu = 0.05 sets an approximate baseline false-alarm budget of 5%. In the NANXCV run, the observed baseline alarm rate was 29 / 427 = 6.8%, close to the intended operating point.

A logistic readout was also tested on the same reservoir states. It uses labels and is therefore not a production simulation. It is included only to assess whether separability exists in the reservoir representation.

The key message: operational claims should use the One-Class SVM results, not the supervised diagnostic readout.
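The zero-shot readout can be illustrated with scikit-learn on synthetic data. The arrays below are random stand-ins, not reservoir states from the study, and the 8-dimensional size is arbitrary; only the `OneClassSVM` settings come from the text above.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(7)

# Synthetic stand-ins (assumption): "baseline" states near the origin,
# plus ten new windows, five baseline-like and five shifted far away.
baseline_states = rng.normal(0.0, 1.0, size=(400, 8))
new_states = np.vstack([
    rng.normal(0.0, 1.0, size=(5, 8)),
    rng.normal(6.0, 1.0, size=(5, 8)),
])

# Fit on baseline only; no attack labels are involved.
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(baseline_states)

# Negate the decision function so that higher means more anomalous.
anomaly_score = -detector.decision_function(new_states)
is_alarm = detector.predict(new_states) == -1

# Share of baseline windows that the fitted boundary itself flags.
baseline_alarm_rate = float(np.mean(detector.predict(baseline_states) == -1))
```

Comparing `baseline_alarm_rate` with `nu` is the same check the text reports for NANXCV (29 of 427 baseline windows flagged).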


One-Class SVM, trained only on clean baseline windows:

| Metric | Value |
| --- | --- |
| Precision | 0.902 |
| Recall | 0.537 |
| F1 | 0.673 |
| ROC-AUC | 0.744 |
| PR-AUC | 0.831 |
| Accuracy | 0.720 |

Confusion matrix:

| | Predicted Normal | Predicted Anomalous |
| --- | --- | --- |
| Actual Baseline | 398 | 29 |
| Actual Attack | 229 | 266 |

Interpretation:

  • The detector is highly precise: when it fires, it is usually meaningful.
  • Recall is moderate: some attack windows, especially dwell-like behaviour, remain inside the normal boundary.
  • Attack windows sit 13.6% farther from the baseline reservoir centroid than baseline windows, confirming a measurable state-space drift.
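The threshold-based metrics follow directly from the confusion matrix, which makes them easy to re-derive when checking the numbers. The snippet below only does that arithmetic; ROC-AUC and PR-AUC need the per-window scores and are not recomputed here.

```python
# Counts taken from the NANXCV confusion matrix.
tp, fp, fn, tn = 266, 29, 229, 398

precision = tp / (tp + fp)                           # alarms that were attack windows
recall = tp / (tp + fn)                              # attack windows that raised an alarm
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fp + fn + tn)
baseline_alarm_rate = fp / (fp + tn)                 # baseline windows that raised an alarm

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"accuracy={accuracy:.3f} baseline_alarm_rate={baseline_alarm_rate:.3f}")
```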

Headline:

When HyperMesh fires on the NANXCV validation host, it is right 90% of the time, without training on attack labels.

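Fleet (Phase 2) results, same method across three hosts:

| Metric | One-Class SVM | Logistic Diagnostic |
| --- | --- | --- |
| Precision | 0.731 | 0.561 |
| Recall | 0.329 | 0.980 |
| F1 | 0.454 | 0.713 |
| ROC-AUC | 0.713 | 0.908 |
| PR-AUC | 0.613 | 0.809 |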

Interpretation:

  • The zero-shot fleet model remains precision-oriented but recall drops.
  • Pooling heterogeneous hosts widens the normal envelope.
  • The supervised diagnostic AUC of 0.908 shows the signal exists, but the zero-shot boundary needs per-host or per-segment baselines to recover recall.

This is a roadmap item, not a hidden flaw.


Each anomaly window can be surfaced with:

| Evidence | Why It Helps |
| --- | --- |
| Timestamp | Places the alert on the incident timeline |
| Anomaly score | Prioritises the analyst queue |
| Dominant formations | Shows whether the window is network-heavy, process-heavy, etc. |
| Novel entity count | Highlights new accounts, processes, hashes, or IPs |
| Contributing hyperedges | Links back to raw Defender evidence |
| Reservoir drift | Quantifies departure from baseline behaviour |

This makes the model investigable rather than a black-box score.


These should be stated explicitly in any CISO presentation.

  1. Recall is not yet high enough to replace existing controls. The NANXCV result is precision-first and should be used as an analyst-priority signal layered on top of EDR.
  2. The attack label is calendar-based. The Feb 14-20 window was designated by the CISO team, but per-event kill-chain labels are not yet available.
  3. No MITRE technique attribution is claimed. The system flags behaviourally anomalous windows; analysts determine root cause.
  4. No commercial EDR or UEBA head-to-head benchmark has been run yet.
  5. Fleet modelling needs per-host or per-segment boundaries. Pooled baselines are too broad for best zero-shot recall.
  6. The logistic model is diagnostic only. It uses labels and should not be presented as deployable performance.
  7. Validation is single-tenant. Broader generalisation requires additional environments.

| Priority | Workstream | Outcome |
| --- | --- | --- |
| 1 | Per-host and per-segment baselines | Higher recall without sacrificing precision |
| 2 | Adaptive operating point | Tune alert budget to SOC capacity |
| 3 | Richer temporal features | Capture dwell, inter-arrival timing, and formation sequences |
| 4 | Explainability UI | Per-alert “why” view in SnnDashboard |
| 5 | EDR / UEBA benchmark | Credible side-by-side comparison on the same telemetry |
| 6 | Shadow-mode pilot | Analyst-validated precision, recall, and time-to-detect |

To move from prototype to operational validation, ask the CISO team for:

  1. Ground truth for Feb 14-20. Analyst-confirmed true positives, timestamps, and known activity phases.
  2. A four-to-six-week shadow-mode pilot. HyperMesh runs alongside the existing stack; analysts validate alarms.
  3. A same-data benchmark against the current EDR or UEBA. Agree the scoreboard before the pilot begins.
  4. A longer clean baseline. Weeks or months of normal Defender telemetry across a representative host mix.
  5. A live telemetry path. Defender API, Sentinel, Event Hub, or batch export cadence.
  6. A target alert destination. SIEM, SOAR, ticketing, or dashboard.

The pilot success criteria should be agreed up front:

| Criterion | Example Target |
| --- | --- |
| Precision | Analyst-validated precision above agreed SOC threshold |
| Alert volume | Within analyst capacity per day |
| Time-to-detect | Earlier or complementary detection versus current stack |
| Explainability | Analyst can trace each alert back to contributing events |
| Deployment fit | Runs within the customer’s governance and data residency constraints |

| Question | Answer |
| --- | --- |
| How is this different from our EDR? | EDR detects known signatures and rule conditions. HyperMesh detects deviations from a host’s own behavioural baseline. It is additive, not a replacement. |
| Did the model train on the attack? | No. The production One-Class SVM trains only on baseline windows. The supervised logistic readout is diagnostic only. |
| Why should we care if recall is around 54%? | Because the signal is highly precise. It gives analysts a high-quality priority queue on top of existing controls. Recall improvement is the next engineering target. |
| What about false positives? | The operating point is tunable. In the NANXCV run, the baseline alarm rate was 6.8%, close to the intended 5% budget. |
| Does it scale to the fleet? | Yes, but pooled baselines reduce zero-shot recall. The next step is per-host or per-segment baselines. |
| Can analysts act on it? | Yes. Each alert can carry score, timestamp, dominant formations, novel entities, and contributing hyperedges. |
| What is needed for production confidence? | Ground truth, a shadow-mode pilot, and a head-to-head benchmark against the current stack. |

# 1. Ingest baseline and attack hyperedge tables
.venv/bin/python scripts/ingest_mde_baseline.py
.venv/bin/python scripts/ingest_mde_fleet.py
# 2. Build five-minute temporal features
.venv/bin/python scripts/temporal_analysis_nanxcv.py
.venv/bin/python scripts/temporal_analysis_fleet.py
# 3. Train LSM reservoir and readouts
.venv/bin/python scripts/snn_train_nanxcv.py \
    --input data/temporal_nanxcv.json \
    --output data/snn_results_nanxcv.json
.venv/bin/python scripts/snn_train_nanxcv.py \
    --input data/temporal_ciso_fleet.json \
    --output data/snn_results_ciso_fleet.json

Default model settings:

n_reservoir=500
sparsity=0.10
spectral_radius=0.95
input_scaling=0.30
leak_rate=0.30
washout=30
seed=7

| Term | Meaning |
| --- | --- |
| Hyperedge | A relationship linking more than two entities at once, such as machine + process + account + IP |
| Formation | Behaviour category such as NETWORK_CONN or PROCESS_EXEC |
| Temporal hypergraph | A hypergraph where each hyperedge has a timestamp |
| Liquid State Machine | A recurrent reservoir that gives the model fading memory of recent behaviour |
| Echo-state property | Stability condition ensuring the reservoir memory fades instead of exploding |
| One-Class SVM | A model that learns normal behaviour only and flags points outside that normal boundary |
| Zero-shot detection | Detecting attacks without training on examples of those attacks |
| Washout | Initial windows discarded while the reservoir state stabilises |

All quantitative claims are reproduced from repository artefacts:

  • data/snn_results_nanxcv.json
  • data/snn_results_ciso_fleet.json
  • data/temporal_nanxcv.json
  • data/temporal_ciso_fleet.json
  • scripts/snn_train_nanxcv.py
  • scripts/temporal_common.py
  • scripts/ingest_mde_baseline.py
  • design/methodology_temporal_hypergraph_snn.tex

HyperMesh — Endpoint Behavioural Anomaly Detection

Temporal Hypergraph + Liquid State Machine (LSM/SNN) + One-Class SVM

Zero-shot detection of attacker behaviour in endpoint telemetry, learned from normal activity alone.

One-line thesis. We turn raw Microsoft Defender for Endpoint (MDE) logs into a hypergraph of co-occurring behaviour, give a fixed neural reservoir a fading memory of recent activity, and flag windows that don’t look like the machine’s own normal — without ever training on a single attack label. On the validation host this flags attacker windows at 90% precision while learning only from the clean week.

This document is the full technical and business narrative, in the order you can present it: Data → Ingestion → Hypergraph formation → Temporal features → Normalisation → LSM reservoir → One-Class SVM readout → Results → Limitations → Roadmap. Every figure here is reproduced from the live artefacts in this repository (sources cited inline).


Problem: SIEM/EDR rules catch known signatures. Sophisticated intrusions show up as a slow shift in patterns of co-occurring behaviour across days; no single threshold ever trips.
Approach: Model each endpoint event as a hyperedge (machine + process + account + IP, tagged with a behaviour “formation”). Summarise behaviour in 5-minute windows. A Liquid State Machine gives those windows temporal memory. A One-Class SVM learns the boundary of normal and flags anything outside it.
Key result (NANXCV host, zero-shot): Precision 0.90, Recall 0.54, F1 0.67, ROC-AUC 0.74, PR-AUC 0.83; trained on the clean week only, evaluated across 922 windows.
Why it matters: When the model fires, it is right 9 times out of 10. The scarce resource in a SOC is analyst attention, and a 90%-precision zero-shot signal is directly actionable.
Honesty line: Recall is ~54%, so this is an additive analyst-priority signal, not a replacement for EDR. We have not yet benchmarked against a commercial EDR/UEBA, and the “attack window” is defined by the CISO team’s operational calendar, not per-event ground truth. We say so on the slide.

Traditional SIEM/EDR pipelines operate on flat event rows. Each MDE timeline row carries strings — machine, process, account, network artefact — but no native primitive that says “these four identities acted together at this instant.” That relational fact is exactly where modern intrusions live.

  • Rule-based detection excels when a signature exists. It is blind to novel tradecraft and to “living off the land,” where every individual action looks legitimate.
  • Lateral movement, C2 beaconing, staging, and dwell typically manifest as a gradual shift in the distribution of co-occurring behaviour over hours or days — never as one counter crossing one line.

Our structural bets:

  1. Hypergraph ingestion: represent each event as a hyperedge over its participating identities plus a behaviour formation label. This preserves the n-way simultaneity that a flat row destroys and, unlike an ordinary graph, does so without flattening a 4-way co-occurrence into six lossy pairwise edges.
  2. Temporal memory: a high-dimensional reservoir state summarises recent windows, so slow drifts and bursts become separable from normal in state space.
  3. Learn normal, not attack: the production detector (One-Class SVM) is trained only on baseline behaviour. It needs no attack labels, so detection does not depend on having seen a given threat before. This is the “zero-shot” property.

┌──────────────┐ ┌───────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ MDE Events │ │ Ingestion │ │ Temporal HG │ │ 5-min windows │
│ Log (CSV/ │──▶│ → HyperMeshDB │──▶│ hyperedges: │──▶│ 16-dim feature │
│ XLSX) │ │ entity map + │ │ (ts, members, │ │ vector / window│
│ per host │ │ formations │ │ formation, wt) │ │ │
└──────────────┘ └───────────────┘ └──────────────────┘ └────────┬────────┘
Stage 1 Stage 2 Stage 3 Stage 4 │
┌──────────────────┐ ┌─────────────────────────┐ ┌──────────────────────────┐
│ Results & alarms │ │ Readout │ │ LSM Reservoir │
│ per-window score │◀──│ • One-Class SVM (prod) │◀──│ 500 leaky units, fixed  │
│ + ROC/PR + drift │ │ • LogReg (diagnostic) │ │ recurrent weights, │
│ │ │ on PCA(50) states │ │ fading memory ≈ 17 win │
└──────────────────┘ └─────────────────────────┘ └──────────────────────────┘
Stage 7 Stage 6 Stage 5
(z-score normalise on baseline-only first)

Code map (this repository):

| Stage | Artefact |
| --- | --- |
| Ingestion | scripts/ingest_mde_baseline.py, scripts/ingest_mde_fleet.py, hypermeshdb/ingest/strategies/mde_baseline.py, hypermeshdb/connectors/mde.py |
| HG + temporal features | scripts/temporal_common.py, scripts/temporal_analysis_nanxcv.py, scripts/temporal_analysis_fleet.py |
| LSM + readouts | scripts/snn_train_nanxcv.py |
| Results | data/snn_results_nanxcv.json, data/snn_results_ciso_fleet.json, data/temporal_*.json |
| Deck / UI | data/HyperMesh_CISO_LSM_Deck.pptx, client/src/pages/SnnDashboard.tsx, design/snn-hypergraph-*deck.html |
| Methodology paper | design/methodology_temporal_hypergraph_snn.tex |

Source. Microsoft Defender for Endpoint (MDE) timeline exports from the LTIMindTree CISO team (February 2026). Three real hosts:

| Host | Role | Baseline window | Notes |
| --- | --- | --- | --- |
| NANXCV (nanxcv00f89340g) | Primary victim host | Feb 9–13, 2026 | Cleanest baseline (0% simulation traffic); the validation host |
| AZRPREPW (AZRCIPREPWCYMLT) | Windows VM | Feb 7–12, 2026 | ~14% Cymulate simulation traffic excluded |
| AZRPREPL (AZRCIPREPLCYMLT) | Linux VM | Feb 6–12, 2026 | ~39% Cymulate simulation traffic excluded |

Attack slice. NANXCV, Feb 14–20, 2026 (Sample2.xlsx → SYS_NANXCV), the period the CISO team designated as the attack/exercise window.

Two-phase study design (identical method, different scope):

  • Phase 1 — single host (NANXCV). Two separate tables (NANXCV_BASELINE, SYS_NANXCV), entity IDs scoped per table. Cleanest behavioural contrast → the method-validation experiment.
  • Phase 2 — fleet (three hosts). All hosts merged into one table (CISO_FLEET_MDE) with a global, deduplicated entity map, so a shared IP / account / file-hash resolves to the same node across machines — enabling cross-host structural links. This is the scale-realism experiment.

Honesty note to state up front. “Attack” is labelled by the CISO team’s operational calendar, not per-event kill-chain ground truth. Cymulate simulation traffic was excluded from baselines by team attestation. So our zero-shot scoring is principled; the supervised readout (later) is diagnostic only.


scripts/ingest_mde_baseline.py drives MDEBaselineStrategy, which:

  1. Parses each MDE row and resolves its identities — machine, process (by SHA1 hash), account, and optionally remote IP — into stable integer node IDs via a persisted entity map (data/CISO_FLEET_MDE_entity_map.json, 3,808 typed entities in the fleet run). Example entries: machine:nanxcv00f89340g, process:eb42621654… (SHA1), account:nt authority\system.
  2. Excludes Cymulate simulation traffic from baseline tables (so “normal” is genuinely normal).
  3. Aggregates rows into hyperedges on a configurable window (default 5 min; --window 0 = per-row), writing native HyperMeshDB hyperedge tables.
  4. Emits ingest stats: hyperedge count, entity count, formation vocabulary, entity-map path.

Ingestion magnitudes (from the run artefacts):

| Measure | Phase 1 (NANXCV) | Phase 2 (Fleet) |
| --- | --- | --- |
| Baseline hyperedges | 14,034 | - |
| Attack hyperedges | 13,341 | - |
| Total hyperedges | - | 39,016 (from ~374k raw rows after Cymulate exclusion) |
| Unique baseline entities | 1,775 | 2,993 |
| Novel entities in attack | 53 (~3.0%) | 815 (vs. broader pooled baseline) |

(Phase-1 hyperedge totals are from the methodology briefing; entity/bucket counts below are read directly from data/temporal_nanxcv.json and data/temporal_ciso_fleet.json.)


Each MDE event becomes exactly one hyperedge:

e = ( event_ts, members, formation, weight )
  • event_ts — Unix epoch seconds.
  • members — list of integer node IDs (machine, process, account, [IP]) acting together in that event.
  • formation — one of 11 behaviour categories (the compact behavioural vocabulary): PROCESS_EXEC, NETWORK_CONN, FILE_OPS, REGISTRY_OPS, DNS_LOOKUP, HTTP_TRAFFIC, IPC_PIPE, MODULE_LOAD, LOGON, SCRIPT_EXEC, GENERIC.
  • weight — 1.0 in the prototype (reserved for future confidence/volume weighting).

Why a hypergraph, not a graph? A single “process X, run by account Y, on machine Z, talking to IP W” event is one 4-way fact. A binary graph must shatter it into 6 pairwise edges, losing the simultaneity — the very thing that distinguishes a coordinated attacker action from coincidence. Oversized member sets (>64 endpoints in one edge) are split with the machine node as an anchor — an implementation guardrail, not a statistical claim.
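The information loss is easy to demonstrate. In the sketch below, one joint 4-way event and six unrelated pairwise events produce exactly the same set of pairwise edges, so a binary graph cannot tell them apart. The labels Z, X, Y, W follow the example in the paragraph above; `clique_expansion` is an illustrative helper, not a HyperMesh API.

```python
from itertools import combinations

members = ("machine:Z", "process:X", "account:Y", "ip:W")

# A binary graph represents the 4-way event as every pair of its members.
pairwise_edges = list(combinations(members, 2))


def clique_expansion(hyperedges):
    """Flatten hyperedges into the set of unordered pairs they imply."""
    return {frozenset(pair) for edge in hyperedges for pair in combinations(sorted(edge), 2)}


one_joint_event = [set(members)]
six_separate_events = [set(pair) for pair in pairwise_edges]

# Both histories collapse to the same pairwise graph.
same_graph = clique_expansion(one_joint_event) == clique_expansion(six_separate_events)
print(len(pairwise_edges), same_graph)
```

Keeping the hyperedge as one object is what lets later stages distinguish "these four acted together" from "these pairs each interacted at some point".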

This is the HyperMesh differentiator: n-ary behaviour preserved as a first-class primitive, queryable in the Graph Explorer / Lens, and feedable directly into temporal analytics.


6. Stage 4 — Temporal aggregation & the 16-dim feature vector

The timeline is partitioned into contiguous 5-minute windows (temporal_common.py). For each window t we build a 16-dimensional feature vector u(t), in this exact order:

Aggregate statistics (5):

  1. hedge_count — hyperedge arrivals in the window
  2. entity_count — distinct active node IDs
  3. novelty_rate — fraction of entities never seen in the baseline set (drift signal)
  4. mean_members — average hyperedge cardinality (how “wide” behaviour is)
  5. formation_entropy — Shannon entropy over the 11 formations (behavioural diversity)

Formation histogram (11): 6–16. raw counts form_PROCESS_EXEC … form_GENERIC

Window counts (from the live exports):

| Measure | Phase 1 (NANXCV) | Phase 2 (Fleet) |
| --- | --- | --- |
| Total windows | 952 (457 baseline + 495 attack) | 1,614 (1,119 baseline + 495 attack) |
| Windows scored (post-washout) | 922 (427 baseline + 495 attack) | 1,584 |

Interpretability bonus — changepoints. A separate analysis runs binary-segmentation changepoint detection on formation_entropy. Volumetric features (hedge_count, entity_count) produced hundreds of noisy changepoints driven by business rhythm; entropy yielded only 4 statistically flagged breakpoints, aligned with attack-phase dynamics. The sharpest: Feb 17, 09:15 UTC, entropy collapsing to 0.65 from a running mean near 2.34 — behaviour concentrating into NETWORK_CONN + PROCESS_EXEC, consistent with C2/lateral-movement profiles (stated qualitatively; we do not assert a labelled MITRE mapping). All 16 channels still feed the reservoir; entropy is the human-readable pivot.


7. Stage 5 — Normalisation that mimics deployment

Stack raw features into X_raw ∈ ℝ^(T×16). A z-score normaliser is fit on baseline windows only (y=0):

μ_j = mean(X_raw[:, j] | baseline)
σ_j = std(X_raw[:, j] | baseline)    (σ_j ← 1 if σ_j < 1e-8)
u_tj = (X_raw[t, j] − μ_j) / σ_j

This is deliberate: scaling statistics come from presumed-normal history, never from a future distribution that contains attacks. It is the first of several places where we hold the line on “the model only ever learns from normal.”
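The baseline-only normaliser above translates into a few lines of NumPy. The function names are illustrative, and the population standard deviation (NumPy's default, `ddof=0`) is an assumption, since the text does not say which estimator the script uses.

```python
import numpy as np


def fit_baseline_scaler(X_raw: np.ndarray, y: np.ndarray):
    """Estimate per-feature mean and std from baseline rows (y == 0) only."""
    baseline = X_raw[y == 0]
    mu = baseline.mean(axis=0)
    sigma = baseline.std(axis=0)
    # Guard constant features so the division below is well defined.
    sigma = np.where(sigma < 1e-8, 1.0, sigma)
    return mu, sigma


def apply_scaler(X_raw: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Z-score every window, attack windows included, with baseline statistics."""
    return (X_raw - mu) / sigma


# Toy data: two baseline windows, one attack window, second feature constant.
X_raw = np.array([[1.0, 5.0], [3.0, 5.0], [10.0, 5.0]])
y = np.array([0, 0, 1])
mu, sigma = fit_baseline_scaler(X_raw, y)
U = apply_scaler(X_raw, mu, sigma)
```

The attack row never influences `mu` or `sigma`, so its large value in the first feature shows up as a large z-score rather than inflating the scale.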


8. Stage 6 — The Liquid State Machine reservoir

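The reservoir is called an LSM throughout; concretely it is an Echo State Network with leaky-integrator tanh units, a rate-based relative of the spiking “liquid” models in the neuromorphic literature. It maintains a hidden state x(t) ∈ ℝ^N that evolves as: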

x(t) = (1 − α)·x(t−1) + α·tanh( W·x(t−1) + W_in·u(t) + ε_t )
  • W_in ∈ ℝ^(N×16): random input projection, scaled by input_scaling.
  • W ∈ ℝ^(N×N): sparse random recurrent matrix, rescaled so its spectral radius ρ < 1. This is the usual practical heuristic for the echo-state property (a fading, stable memory) rather than a formal guarantee. The script measures ρ after construction and asserts it’s < 1.0.
  • α (leak_rate): membrane leak; controls how fast memory fades.
  • ε_t: tiny Gaussian noise (1e-4) for regularisation.
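The update rule and the weight construction described above can be sketched in NumPy. This is not the code in `scripts/snn_train_nanxcv.py`: the uniform weight distributions, the zero initial state, and treating 1e-4 as the noise standard deviation are assumptions made for the example.

```python
import numpy as np


def build_reservoir(n_inputs=16, n_reservoir=500, spectral_radius=0.95,
                    sparsity=0.10, input_scaling=0.30, seed=7):
    """Create fixed random input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-1.0, 1.0, (n_reservoir, n_inputs)) * input_scaling
    W = rng.uniform(-1.0, 1.0, (n_reservoir, n_reservoir))
    # Keep roughly `sparsity` of the recurrent connections, zero the rest.
    W[rng.random((n_reservoir, n_reservoir)) >= sparsity] = 0.0
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    measured = np.max(np.abs(np.linalg.eigvals(W)))
    assert measured < 1.0, "spectral radius must stay below 1"
    return W_in, W


def run_reservoir(U, W_in, W, leak_rate=0.30, noise=1e-4, washout=30, seed=7):
    """Drive the reservoir with normalised windows U (T x n_inputs)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(W.shape[0])
    states = []
    for u in U:
        eps = rng.normal(0.0, noise, size=x.shape)
        # x(t) = (1 - a) x(t-1) + a tanh(W x(t-1) + W_in u(t) + eps_t)
        x = (1.0 - leak_rate) * x + leak_rate * np.tanh(W @ x + W_in @ u + eps)
        states.append(x.copy())
    # Drop the warm-up windows before handing states to the readout.
    return np.asarray(states)[washout:]
```

Calling `build_reservoir()` followed by `run_reservoir(U, W_in, W)` on a `(T, 16)` array returns a `(T - 30, 500)` array of states with the default settings.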

Hyperparameters (defaults, tuned for this data scale):

| Param | Value | Meaning |
| --- | --- | --- |
| n_reservoir (N) | 500 | reservoir neurons |
| spectral_radius (ρ) | 0.95 | echo-state stability (<1) |
| leak_rate (α) | 0.30 | membrane leak → memory time-constant 1/α ≈ 3.3 windows (~17 min); effective context is longer via recurrence (design notes cite ~15–30 windows) |
| sparsity | 0.10 | 10% non-zero recurrent connections |
| input_scaling | 0.30 | input weight scale |
| washout | 30 | warm-up windows discarded before scoring |
| seed | 7 | reproducibility |

Why an LSM — the three reasons that matter to a CISO:

  1. Fading memory. x(t) encodes not just this window but the recent evolution of behaviour (leak time-constant ≈ 3 windows; effective context extends further via recurrence). A single anomalous window bends the whole state trajectory — slow drifts become visible.
  2. Recurrent weights are FIXED — no backprop. Only the lightweight readout is trained. With only ~10³ windows, this avoids the overfitting that plagues fully-trained deep RNNs on small security datasets, and it’s fast and deterministic.
  3. Expansion. Projecting 16-dim behaviour into 500-dim reservoir space lets attack-driven deviations accumulate and makes them easier to separate from normal with a simple boundary.
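The fading-memory figure can be checked with a short calculation. It considers only the leak term, where the previous state is multiplied by (1 − α) at every step; the recurrent matrix W extends the effective context beyond this, which the calculation deliberately ignores. The helper name `windows_until` is illustrative.

```python
import math

leak_rate = 0.30
window_minutes = 5

# Leak time constant quoted in the text: 1 / alpha windows.
time_constant_windows = 1.0 / leak_rate
time_constant_minutes = time_constant_windows * window_minutes


def windows_until(fraction: float) -> int:
    """Windows needed for the leak-only factor (1 - alpha)^k to fall to `fraction`."""
    return math.ceil(math.log(fraction) / math.log(1.0 - leak_rate))


half_life = windows_until(0.5)     # leak-only influence halves
one_percent = windows_until(0.01)  # leak-only influence drops to 1%
print(round(time_constant_windows, 2), round(time_constant_minutes, 1), half_life, one_percent)
```

Under this leak-only view, a window's direct contribution halves within 2 windows and falls below 1% after 13 windows (about an hour at 5-minute resolution).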

Reservoir states (post-washout) → PCA to 50 components, fit on baseline rows only → StandardScaler (also baseline-fit). Both readouts consume the same 50-dim, baseline-centred coordinates.
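A sketch of that projection step with scikit-learn, assuming `states` is the post-washout reservoir matrix and `y` marks baseline rows with 0. The function name and the `random_state` value are illustrative choices, not taken from the repository.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def fit_readout_projection(states: np.ndarray, y: np.ndarray, n_components: int = 50):
    """Fit PCA and a scaler on baseline rows only; return a projection function."""
    baseline = states[y == 0]
    pca = PCA(n_components=n_components, random_state=7).fit(baseline)
    scaler = StandardScaler().fit(pca.transform(baseline))

    def project(S: np.ndarray) -> np.ndarray:
        # Every window, baseline or attack, passes through the same
        # baseline-fitted transforms.
        return scaler.transform(pca.transform(S))

    return project
```

Because both transforms are fitted on baseline rows, the projected baseline has zero mean and unit variance per component, and attack windows are expressed relative to that reference.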

9a. Mode A — One-Class SVM (the production detector)

from sklearn.svm import OneClassSVM
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(baseline_states)  # baseline_states: PCA(50) baseline rows ONLY
anomaly_score = -detector.decision_function(states)  # higher = more anomalous
  • Trained exclusively on baseline behaviour — never sees an attack label. This is the deployable, zero-shot detector.
  • nu = 0.05 sets the baseline false-alarm budget to ~5%; the observed rate is 29/427 = 6.8% of baseline windows (Phase 1) — close to budget and expected by design, not a calibration failure. (Fleet: 60/1,089 = 5.5%.)
  • This is the model behind the headline numbers.

9b. Mode B — Logistic Regression (diagnostic only)

A logistic readout on the same PCA(50) reservoir states, under a temporal 60/40 train/test split that does use attack labels. It is explicitly NOT a deployment simulation — it leaks future knowledge. It is reported only as a comparison point. Note: in Phase 1 this diagnostic readout actually scored lower (ROC-AUC 0.55) than the zero-shot OC-SVM (0.74) — we attribute this to the temporal split (training on the first 60% of attack windows, testing on the last 40%), not to feature quality; in Phase 2 the same readout reaches ROC-AUC 0.91, showing the separating signal is present. Never present Mode B as operational performance. (Caveat: the code contains no raw-16-dimension classifier, so we make no claim about raw-feature separability — both readouts consume reservoir states.)


10a. Phase 1 — NANXCV (single host, the validation experiment)

One-Class SVM (zero-shot, production-realistic) — 922 windows scored:

| Metric | Value |
| --- | --- |
| Precision | 0.902 |
| Recall | 0.537 |
| F1 | 0.673 |
| ROC-AUC | 0.744 |
| PR-AUC | 0.831 |
| Accuracy | 0.720 |

Confusion matrix: TP 266, FP 29, FN 229, TN 398 (baseline = 427, attack = 495). Reservoir drift: attack windows sit +13.6% further from the baseline centroid (L2) than baseline windows — geometric confirmation the reservoir separates the phases.
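One plausible way to compute such a drift figure is shown below: the relative difference between the mean L2 distance of attack windows and of baseline windows to the baseline centroid. The exact definition used in the repository is not stated here, so treat this formula and the name `centroid_drift` as assumptions.

```python
import numpy as np


def centroid_drift(states: np.ndarray, y: np.ndarray) -> float:
    """Relative increase in mean L2 distance to the baseline centroid (attack vs baseline)."""
    centroid = states[y == 0].mean(axis=0)
    dist = np.linalg.norm(states - centroid, axis=1)
    baseline_mean = dist[y == 0].mean()
    attack_mean = dist[y == 1].mean()
    return float((attack_mean - baseline_mean) / baseline_mean)


# Toy data: baseline points at distance 1 from the centroid, attack point at distance 2.
toy_states = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0]])
toy_labels = np.array([0, 0, 1])
drift = centroid_drift(toy_states, toy_labels)
```

Under this definition a value of 0.136 would correspond to the +13.6% reported for NANXCV, and a negative value would mean attack windows sit closer to the centroid than baseline windows do.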

The headline for the slide: “When it fires, it is right 90% of the time, having learned only from the clean week.”

Timeline narrative (from per-window scores): a Feb 16 burst dominated by NETWORK_CONN that peaks at normalised score 1.000; an entropy collapse to 0.65 at 09:15 UTC on Feb 17; a quiet Feb 18 dwell with few alarms; and renewed activity on Feb 19–20.
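
This brief does not define the entropy behind the "entropy collapse". One plausible reading, shown here purely as an assumption, is the Shannon entropy of the formation mix inside a window: it is high when several formations occur in balanced proportions and drops when one formation dominates. The function name and the sample windows are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def formation_entropy(formations: Iterable[str], normalise: bool = False) -> float:
    """Shannon entropy (bits) of the formation mix in one window.

    Assumption: this is one candidate definition, not necessarily the one
    used in the methodology document.
    """
    counts = Counter(formations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    probs = [c / total for c in counts.values()]
    entropy = sum(-p * math.log2(p) for p in probs)
    if normalise and len(counts) > 1:
        entropy /= math.log2(len(counts))  # scale to the range [0, 1]
    return entropy


# Hypothetical windows: a balanced mix versus one dominated by NETWORK_CONN.
mixed = ["NETWORK_CONN", "PROCESS_EXEC"] * 4
burst = ["NETWORK_CONN"] * 7 + ["PROCESS_EXEC"]

print(formation_entropy(mixed), formation_entropy(burst))
```

A window dominated by a single formation scores lower than a balanced one, which is the kind of drop a changepoint detector on this series would pick up.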

10b. Phase 2 — Fleet (three hosts, same method, no re-tuning)

| Metric | One-Class SVM (zero-shot) | Logistic (diagnostic only) |
| --- | --- | --- |
| Precision | 0.731 | 0.561 |
| Recall | 0.329 | 0.980 |
| F1 | 0.454 | 0.713 |
| ROC-AUC | 0.713 | 0.908 |
| PR-AUC | 0.613 | 0.809 |

1,584 windows scored. Top anomalous cluster again anchored on the Feb 16 afternoon NETWORK_CONN burst.

Be transparent about the fleet number. Zero-shot recall drops to 33% and the centroid separation turns slightly negative (−1.1%). Pooling three heterogeneous hosts into one baseline widens the “normal” envelope, so more attack windows hide inside it, and no hyperparameters were re-tuned for the fleet. Phase 1 and Phase 2 are not directly comparable (“better/worse”) because the boundary is fit to different baseline distributions. The supervised AUC of 0.91 shows the signal is there; our working expectation is that per-segment tuning of the zero-shot boundary will recover it, which is still to be demonstrated. This is a roadmap item, not a hidden flaw.
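
The "wider envelope" effect can be illustrated with a deliberately simple one-dimensional analogy. This is a toy z-score comparison, not the One-Class SVM itself, and every number in it is hypothetical: two hosts with different activity levels, and one value that is extreme for the quiet host but unremarkable once both baselines are pooled.

```python
import statistics
from typing import Sequence


def z_score(value: float, baseline: Sequence[float]) -> float:
    """Distance of a value from a baseline mean, in baseline standard deviations."""
    mean = statistics.mean(baseline)
    std = statistics.pstdev(baseline)
    return (value - mean) / std


# Hypothetical per-window activity levels for two hosts with different norms.
host_a = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]    # quiet workstation
host_b = [50, 52, 48, 51, 49, 50, 53, 47, 50, 50]  # busy server
pooled = host_a + host_b

suspicious_on_a = 20  # double host A's usual level

per_host_z = z_score(suspicious_on_a, host_a)
pooled_z = z_score(suspicious_on_a, pooled)
print(f"per-host z = {per_host_z:.1f}, pooled z = {pooled_z:.1f}")
```

Against host A's own baseline the value is roughly nine standard deviations out; against the pooled baseline it sits inside the bulk of the data. Per-host boundaries aim to avoid exactly this masking.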


11. Limitations & explicit non-claims (put these on a slide)

  1. Recall ~54% (single host): many attack windows that mimic baseline statistics stay inside the boundary, especially during dwell. This is an analyst-priority signal layered on top of EDR, not a replacement.
  2. No MITRE technique IDs, no attribution, no automated forensic narrative. We flag behaviourally anomalous windows; a human investigates.
  3. “Attack phase” = operational calendar, not per-event ground truth. Metrics inherit that labelling assumption.
  4. No commercial-EDR / UEBA head-to-head benchmark yet.
  5. Fleet hyperparameters not jointly re-tuned; cross-phase comparison requires care.
  6. Supervised (logistic) results are diagnostic only — they use future labels and must not be read as deployment performance.
  7. Single-tenant validation on one CISO team’s February-2026 data; broader generalisation is unproven.
  8. Attack-window provenance (red-team vs. simulated vs. production) should be confirmed with the data owner before external use.

What we’d build next, in priority order:

  1. Lift recall — per-host / per-segment baseline boundaries, adaptive nu, richer features (sequence-of-formations, inter-arrival timing) so dwell-phase attacks surface.
  2. Fleet-scale zero-shot — per-host normalisation + hierarchical reservoirs to recover the signal the supervised AUC (0.91) proves is present.
  3. Head-to-head benchmark vs. a commercial EDR/UEBA on the same telemetry — the credibility milestone.
  4. Live SOC shadow-mode pilot — run alongside the existing stack, measure analyst-validated precision/recall and time-to-detect on real traffic.
  5. Explainability surface — per-alarm “why” (dominant formations, novel entities, contributing hyperedges) wired into the SnnDashboard UI.

The ask: access to a labelled (or red-team-tagged) production window + permission for a shadow-mode pilot, so we can report analyst-validated precision/recall and a fair EDR benchmark.


13. Joint next-stage agenda — what we need from the CISO & team


Frame for the meeting: “To take this from a validated prototype to an operational pilot, here is what we’d need from you.” Each question maps to closing a gap already named in §11 (Limitations). Position the limitations as a joint roadmap, not weaknesses.

The 3 asks that decide whether there is a next stage (lead with these)

  1. Ground truth for Feb 14–20 — “Can we get analyst-confirmed true positives for that window?” → converts our calendar-labelled result into a defensible evaluation.
  2. Shadow-mode pilot — “Will you sponsor a 4–6 week shadow-mode run on live telemetry?” → the only way to prove operational precision/recall.
  3. EDR benchmark — “Can we run head-to-head against your current EDR/UEBA on the same data?” → the credibility milestone.

Everything below feeds these three.

13a. Data — tighten “normal” and get real labels

| Ask | What it unblocks |
| --- | --- |
| Provenance of the attack window: red-team, Cymulate simulation, or real production incident? | We currently cannot say on the slide; determines whether 90% precision is against real or simulated adversary behaviour. |
| Per-event ground truth / kill chain: validated TPs, timestamps, MITRE techniques | Replaces operational-calendar labelling; lets us report real recall. |
| Longer & broader baseline: weeks–months, more hosts, server + workstation mix | Our “normal” is one week / one host; more baseline = tighter boundary, fewer false alarms, better recall. |
| Cymulate intent: detect simulated attacks, or exclude as noise? | We exclude it today; if they want it detected it becomes a free labelled positive set. |
| Full MDE schema access: DeviceProcess / DeviceNetwork / DeviceLogon fields, not just the timeline export | Richer features (sequence-of-formations, inter-arrival timing) that lift recall. |
| Data governance: PII / residency constraints, standing feed vs. one-off CSV exports | Decides whether deployment is legally possible and whether we get live data. |

13b. Model — define the target before tuning

| Ask | What it unblocks |
| --- | --- |
| Desired operating point: precision-first or recall-first? | Directly sets the OC-SVM threshold / nu; we can’t tune without it. |
| Alert budget: alerts/day/analyst acceptable? | Defines the false-alarm ceiling; maps our 6.8% baseline-window rate to their volume tolerance. |
| What do they miss today? Detections the current stack fails on | Anchors value to a real gap, not abstract “anomalies.” |
| Explainability requirement: need a per-alarm “why” (dominant formations, novel entities, contributing hyperedges)? | Decides whether we build the explainability surface before or after the pilot. |
| Per-host vs. fleet modelling: one model or per-asset baselines? | The fix for the fleet recall drop (33%); supervised AUC 0.91 proves the signal is there, per-host baselines recover it. |
| Benchmark definition: what does “better than our EDR” mean numerically? | Agree the scoreboard before we play, not after. |
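
To make the alert-budget conversation concrete, the baseline-window rate can be translated into a daily false-alarm ceiling, and a stated budget can be translated back into a target rate. The sketch below assumes every 5-minute slot in a day is scored (288 windows per host), which is an upper bound; slots with no events may not be scored at all. The function names and the example budget of 5 alerts per day are hypothetical.

```python
WINDOW_MINUTES = 5
MAX_WINDOWS_PER_DAY = 24 * 60 // WINDOW_MINUTES  # 288 if every slot has activity


def false_alarms_per_day(rate: float,
                         windows_per_day: int = MAX_WINDOWS_PER_DAY,
                         hosts: int = 1) -> float:
    """Expected baseline false alarms per day at a given per-window rate."""
    return rate * windows_per_day * hosts


def max_rate_for_budget(alerts_per_day: float,
                        windows_per_day: int = MAX_WINDOWS_PER_DAY,
                        hosts: int = 1) -> float:
    """Largest per-window false-alarm rate that stays within a daily budget."""
    return alerts_per_day / (windows_per_day * hosts)


phase1_rate = 29 / 427                       # observed Phase 1 baseline rate
ceiling = false_alarms_per_day(phase1_rate)  # worst case for one host

example_budget = 5                           # hypothetical: 5 false alarms/day
required_rate = max_rate_for_budget(example_budget)

print(f"ceiling: {ceiling:.1f} per host per day")
print(f"rate needed for {example_budget}/day: {required_rate:.4f}")
```

This only counts false alarms on clean activity; true detections during an incident add to the analyst queue on top of it.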

13c. Deployment — prove it in production, on their rails

| Ask | What it unblocks |
| --- | --- |
| Shadow-mode pilot: 4–6 weeks alongside production, analysts validate each alarm | The headline ask; yields analyst-validated precision/recall and time-to-detect. |
| Pilot exit criteria: agree now what result triggers full adoption | Prevents a “great demo, no decision” outcome. |
| Live telemetry feed: Defender API / Event Hub / Sentinel stream vs. batch exports | Determines architecture and whether 5-min-window cadence is feasible. |
| Alert destination: SIEM (Sentinel?), SOAR, ticketing; format/API? | Defines integration scope and effort. |
| Latency SLA: is 5-minute batch acceptable, or near-real-time needed? | Sets architecture and cost. |
| Where the model runs: their tenant/cloud, on-prem, or our environment; data egress permitted? | Often the hardest blocker; surface it early, not after the pilot is agreed. |
| Ownership: who triages and validates alarms during the pilot? | A pilot with no analyst owner produces no validated labels and fails silently. |

# 1. Ingest baseline + attack hyperedge tables
.venv/bin/python scripts/ingest_mde_baseline.py # NANXCV (Phase 1)
.venv/bin/python scripts/ingest_mde_fleet.py # CISO_FLEET_MDE (Phase 2)
# 2. Build 5-min temporal features
.venv/bin/python scripts/temporal_analysis_nanxcv.py # → data/temporal_nanxcv.json
.venv/bin/python scripts/temporal_analysis_fleet.py # → data/temporal_ciso_fleet.json
# 3. Train LSM + readouts, export results
.venv/bin/python scripts/snn_train_nanxcv.py \
    --input data/temporal_nanxcv.json --output data/snn_results_nanxcv.json
.venv/bin/python scripts/snn_train_nanxcv.py \
    --input data/temporal_ciso_fleet.json --output data/snn_results_ciso_fleet.json

Defaults (in-code): n_reservoir=500, sparsity=0.10, spectral_radius=0.95, input_scaling=0.30, leak_rate=0.30, washout=30, seed=7. CLI overrides exist for all of them.
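
To show what these defaults control, here is a minimal sketch of a fixed random reservoir with a leaky state update. It is an assumption about the general shape of such a model: the parameter names suggest a leaky, rate-based update, but the actual implementation in scripts/snn_train_nanxcv.py is not reproduced here and may differ (for example, it may use spiking neurons). The function names, the interpretation of sparsity as the fraction of non-zero recurrent weights, and the random input features are all illustrative.

```python
import numpy as np


def make_reservoir(n_inputs, n_reservoir=500, sparsity=0.10,
                   spectral_radius=0.95, input_scaling=0.30, seed=7):
    """Build fixed (untrained) input and recurrent weight matrices."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-1.0, 1.0, (n_reservoir, n_reservoir))
    # Assumption: `sparsity` is the fraction of recurrent weights kept non-zero.
    mask = rng.random((n_reservoir, n_reservoir)) < sparsity
    w = w * mask
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    current_radius = np.max(np.abs(np.linalg.eigvals(w)))
    w = w * (spectral_radius / current_radius)
    w_in = rng.uniform(-input_scaling, input_scaling, (n_reservoir, n_inputs))
    return w_in, w


def run_reservoir(inputs, w_in, w, leak_rate=0.30, washout=30):
    """Drive the reservoir with a feature sequence; drop the warm-up states."""
    state = np.zeros(w.shape[0])
    states = []
    for u in inputs:
        candidate = np.tanh(w_in @ u + w @ state)
        # Leaky update: blend the previous state with the new activation.
        state = (1.0 - leak_rate) * state + leak_rate * candidate
        states.append(state.copy())
    return np.array(states[washout:])


w_in, w = make_reservoir(n_inputs=16)
demo_rng = np.random.default_rng(0)
features = demo_rng.normal(size=(100, 16))  # hypothetical stand-in for 100 windows
states = run_reservoir(features, w_in, w)
print(states.shape)  # (70, 500): 100 windows minus 30 washout
```

In this sketch a smaller leak_rate gives a longer memory, and washout simply discards the first states that still depend on the arbitrary all-zero starting point.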

Appendix B — Glossary (for non-specialist stakeholders)

  • Hyperedge — a single relationship linking more than two things at once (here: machine + process + account + IP acting together). A normal graph edge links only two.
  • Formation — the behavioural category of an event (e.g. NETWORK_CONN, PROCESS_EXEC).
  • Liquid State Machine (LSM) / reservoir — a fixed (untrained) recurrent neural network that gives the model a short fading memory of recent behaviour. Cheap, fast, hard to overfit.
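  • Echo-state property: the requirement that the reservoir’s memory of past inputs fades smoothly rather than blowing up. Keeping the spectral radius below 1 (0.95 here) is the standard rule of thumb for obtaining it, not a strict mathematical guarantee.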
  • One-Class SVM — a model that learns the shape of “normal” from normal data only, then flags anything outside that shape. The engine of zero-shot detection.
  • Zero-shot — detecting attacks the model was never trained on, by learning only what normal looks like.
  • Washout — initial windows discarded while the reservoir “warms up.”
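
For engineers who prefer code to definitions, the sketch below shows one way a hyperedge and its 5-minute window could be represented. The class, field names and sample values are hypothetical illustrations of the glossary terms, not the HyperMeshDB schema, and the bucketing helper assumes a window length that divides 60 evenly.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Hyperedge:
    """One event linking four entities at once (hypothetical field names)."""
    timestamp: datetime
    formation: str   # behavioural category, e.g. NETWORK_CONN
    machine: str
    process: str
    account: str
    ip: str


def window_start(ts: datetime, minutes: int = 5) -> datetime:
    """Round a timestamp down to the start of its window."""
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def bucket_by_window(edges: Iterable[Hyperedge],
                     minutes: int = 5) -> Dict[datetime, List[Hyperedge]]:
    """Group hyperedges into fixed-length time windows."""
    windows: Dict[datetime, List[Hyperedge]] = defaultdict(list)
    for edge in edges:
        windows[window_start(edge.timestamp, minutes)].append(edge)
    return dict(windows)


def utc(*args: int) -> datetime:
    return datetime(*args, tzinfo=timezone.utc)


# Hypothetical events; only the host name and formations appear in this brief.
edges = [
    Hyperedge(utc(2026, 2, 17, 9, 15, 10), "NETWORK_CONN", "NANXCV",
              "proc_a.exe", "account_1", "10.0.0.5"),
    Hyperedge(utc(2026, 2, 17, 9, 17, 42), "PROCESS_EXEC", "NANXCV",
              "proc_b.exe", "account_1", "10.0.0.5"),
    Hyperedge(utc(2026, 2, 17, 9, 21, 3), "NETWORK_CONN", "NANXCV",
              "proc_a.exe", "account_2", "10.0.0.9"),
]
windows = bucket_by_window(edges)
for start, members in sorted(windows.items()):
    print(start.isoformat(), len(members))
```

Each window's list of hyperedges is the raw material from which per-window features, such as formation counts or novel-entity counts, would be computed.
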
| Question | Answer |
| --- | --- |
| “How is this different from our EDR?” | EDR fires on known signatures/rules. This fires on behavioural deviation from this machine’s own normal, catching novel/living-off-the-land activity that rules miss. It’s additive, not a replacement. |
| “Why should I trust a 54% recall?” | You trust the precision (90%): when it fires it’s almost always real, so it’s a high-quality analyst-priority queue. Recall improves on the roadmap; it’s a net-new signal on top of full EDR coverage. |
| “Did you train on the attack?” | No. The production detector (One-Class SVM) trains only on the clean baseline week. The one model that uses attack labels (logistic) is clearly flagged diagnostic-only and never presented as operational. |
| “What about false alarms?” | ~6.8% of baseline windows fire (29/427) against a deliberate, tunable nu=0.05 (~5%) budget. At 5-min windows that’s a manageable, prioritisable volume. |
| “Does it scale to my whole fleet?” | Single-host is strong; pooled-fleet zero-shot recall drops (33%) without per-host tuning, though supervised AUC (0.91) shows the signal is there. Per-host baselines are the next build. |
| “Can my analysts act on an alarm?” | Each alarm carries timestamp, anomaly score, dominant formation, novel-entity count, and reservoir drift, which is enough to triage. Full per-alarm explainability is on the roadmap. |

All quantitative claims in this document are reproduced from repository artefacts: data/snn_results_nanxcv.json, data/snn_results_ciso_fleet.json, data/temporal_nanxcv.json, data/temporal_ciso_fleet.json, and the implementation in scripts/snn_train_nanxcv.py / scripts/temporal_common.py / scripts/ingest_mde_baseline.py. Phase-1 hyperedge totals and the entropy-changepoint narrative are from design/methodology_temporal_hypergraph_snn.tex.