Case Study: Congressional Cosponsorship (end-to-end)

This walkthrough takes a real, public hypergraph from raw files all the way to a trained hypergraph neural network, exercising every layer of HyperMesh in a single script. It doubles as a reference for how the pieces fit together.

The full, runnable script lives at examples/case_study_congress_bills.py.

pip install "hypermesh[analytics,interop,ml]" pandas
python examples/case_study_congress_bills.py            # downloads the dataset
python examples/case_study_congress_bills.py --limit 20000   # faster subset

The dataset

We use congress-bills from the Austin R. Benson data collection: a temporal higher-order network where

nodes are US Congresspersons (1,718 of them),
hyperedges are legislative bills — the set of a bill’s sponsor and co-sponsors (260,851 timestamped bills),
timestamps are the day each bill was introduced.

This is a textbook hypergraph: a bill naturally connects many legislators at once, which a plain graph can only approximate with cliques. It ships in the standard Benson format (-nverts.txt, -simplices.txt, -times.txt, -node-labels.txt), which the script downloads and parses.
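For readers unfamiliar with the Benson layout, here is a minimal sketch of how the flat files regroup into hyperedges. It assumes one integer per line in each file, with -nverts.txt giving the size of each simplex, -simplices.txt listing all member ids back to back, and -times.txt giving one timestamp per simplex. The helper names parse_benson and read_ints are ours for illustration and are not part of HyperMesh or the case-study script.

```python
from itertools import islice


def read_ints(path):
    """Read one integer per line, skipping blank lines (assumed file layout)."""
    with open(path) as fh:
        return [int(line) for line in fh if line.strip()]


def parse_benson(nverts, simplices, times):
    """Regroup the flat Benson arrays into (timestamp, members) hyperedges.

    nverts[i]  -> number of nodes in the i-th simplex
    simplices  -> all node ids, concatenated simplex after simplex
    times[i]   -> timestamp of the i-th simplex
    """
    if len(nverts) != len(times):
        raise ValueError("nverts and times need one entry per simplex")
    if sum(nverts) != len(simplices):
        raise ValueError("len(simplices) must equal sum(nverts)")

    flat = iter(simplices)
    edges = []
    for size, ts in zip(nverts, times):
        # Consume exactly `size` ids; a bill is a set, so drop repeats.
        members = sorted(set(islice(flat, size)))
        edges.append((ts, members))
    return edges


# Toy data: three bills with 2, 3, and 1 member(s).
edges = parse_benson(
    nverts=[2, 3, 1],
    simplices=[1, 2, 2, 3, 4, 1],
    times=[10, 20, 30],
)
print(edges)  # [(10, [1, 2]), (20, [2, 3, 4]), (30, [1])]
```

The two length checks catch the most common failure, a truncated download, before any hyperedge is built.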

1. Ingest — bulk-load timestamped hyperedges

Each bill becomes one hyperedge. We hand copy_from_df a frame with event_ts and a members list per row — the fast bulk path (no row-by-row inserts).

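import pandas as pd

# `simplices` (one member list per bill) and `times` (one timestamp per bill)
# come from the parsed Benson files; `db` is the open HyperMesh connection.
he_df = pd.DataFrame({
    "event_ts": times,
    "members":  [list(s) for s in simplices],   # variable-size sets
    "weight":   [1.0] * len(simplices),         # every bill counts equally
})
# Timestamps are day numbers, so a bucket width of 365 spans about one year.
db.execute("CREATE HYPEREDGE TABLE Cosponsorship () BUCKET_SECONDS 365")
db.copy_from_df(he_df, "Cosponsorship")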

We also compute each legislator’s tenure (first/last active day) on the way in — used later as model features.
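The tenure computation can be sketched in a few lines of plain Python. This is an illustration under the assumption that hyperedges are available as (timestamp, members) pairs; the function name tenure is hypothetical, and the real script may compute the same thing with pandas.

```python
def tenure(edges):
    """Map node_id -> (first_active, last_active) over (timestamp, members) pairs."""
    spans = {}
    for ts, members in edges:
        for node in members:
            # Start from (ts, ts) the first time we see a node, then widen.
            first, last = spans.get(node, (ts, ts))
            spans[node] = (min(first, ts), max(last, ts))
    return spans


edges = [(10, [1, 2]), (20, [2, 3, 4]), (30, [1])]
spans = tenure(edges)
print(spans)  # {1: (10, 30), 2: (10, 20), 3: (20, 20), 4: (20, 20)}
```

A single pass is enough because the running minimum and maximum do not depend on the order in which bills arrive.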

2. Query — temporal Cypher

The hyperedges are immediately queryable, including time windows and pagination:

db.execute(
    "MATCH HYPEREDGE (he:Cosponsorship) "
    "WHERE he.event_ts >= 2000 AND he.event_ts <= 4000 RETURN * LIMIT 1000"
)
db.execute("MATCH HYPEREDGE (he:Cosponsorship) RETURN * SKIP 5 LIMIT 3")
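To make the semantics of the two queries concrete, here is a plain-Python equivalent over an in-memory list. It assumes inclusive bounds (as the >= and <= in the query suggest) and the usual Cypher meaning of SKIP and LIMIT; the function window is our own illustration, not a HyperMesh call.

```python
from itertools import islice


def window(edges, start, end, skip=0, limit=None):
    """Keep hyperedges with start <= ts <= end, then apply SKIP and LIMIT."""
    hits = (e for e in edges if start <= e[0] <= end)
    stop = None if limit is None else skip + limit
    return list(islice(hits, skip, stop))


# Toy hyperedges: one per timestamp from 1990 to 2009.
edges = [(t, [t % 3, t % 5 + 3]) for t in range(1990, 2010)]
page = window(edges, 2000, 2004, skip=1, limit=3)
print([ts for ts, _ in page])  # [2001, 2002, 2003]
```

Note that in standard Cypher, pages are only stable across calls when the query also fixes an order; whether HyperMesh returns hyperedges in a deterministic order without one is not something this walkthrough states.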

3. Analytics — who matters, and how it’s structured

The analytics engine runs directly on the stored hypergraph — degree, PageRank influence, density, spectral gap, and spectral communities:

an = db.analytics("Cosponsorship")
an.density()
an.spectral_gap()
degree = an.node_degree()       # {node_id: #bills}
pr     = an.pagerank()          # influence ranking
an.zhou_clustering()            # spectral communities
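As a rough picture of what a hypergraph PageRank can mean, the sketch below uses one common random walk: from a node, pick one of its hyperedges uniformly, then pick a member of that hyperedge uniformly. This is an assumption for illustration; the walkthrough does not say which walk or damping factor an.pagerank() uses, and hypergraph_pagerank is a hypothetical name.

```python
def hypergraph_pagerank(edges, damping=0.85, iters=200, tol=1e-12):
    """Power iteration for the walk: node -> uniform incident hyperedge -> uniform member."""
    edges = [frozenset(e) for e in edges if e]
    incident = {}
    for e in edges:
        for v in e:
            incident.setdefault(v, []).append(e)
    nodes = sorted(incident)
    if not nodes:
        return {}

    rank = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        # Teleport mass is spread evenly; the rest follows the walk.
        nxt = {v: (1.0 - damping) / len(nodes) for v in nodes}
        for v in nodes:
            per_edge = damping * rank[v] / len(incident[v])
            for e in incident[v]:
                for u in e:
                    nxt[u] += per_edge / len(e)
        delta = sum(abs(nxt[v] - rank[v]) for v in nodes)
        rank = nxt
        if delta < tol:
            break
    return rank


# Legislator 1 co-sponsors all three bills, everyone else only one.
bills = [{1, 2, 3}, {1, 4}, {1, 5}]
pr = hypergraph_pagerank(bills)
top = max(pr, key=pr.get)
print(top, round(sum(pr.values()), 6))  # 1 1.0
```

Every node here belongs to at least one hyperedge, so the walk has no dead ends and the ranks keep summing to one.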

We then derive a reproducible supervised label for the modeling stage: legislators in the top quartile of cosponsorship degree are tagged "high" influence, everyone else "low". This label is written into a node table:

db.execute(
    "CREATE NODE TABLE Congressperson ("
    "  node_id INTEGER PRIMARY KEY, name TEXT, "
    "  first_q INTEGER, last_q INTEGER, tier TEXT)"
)
db.copy_from_df(nodes_df, "Congressperson")   # name + tenure + tier
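The top-quartile rule can be sketched as follows. The cut-off here is the nearest-rank 75th percentile and only degrees strictly above it count as "high"; how the actual script handles ties and which quantile method it uses are assumptions on our part, as are the names node_degree and tier_labels.

```python
import math
from collections import Counter


def node_degree(edges):
    """Count, for each node, the number of hyperedges (bills) it belongs to."""
    counts = Counter()
    for members in edges:
        counts.update(set(members))  # a legislator counts once per bill
    return dict(counts)


def tier_labels(degree, quantile=0.75):
    """Label nodes strictly above the nearest-rank `quantile` cut-off as "high"."""
    ranked = sorted(degree.values())
    cutoff = ranked[max(0, math.ceil(quantile * len(ranked)) - 1)]
    return {n: "high" if d > cutoff else "low" for n, d in degree.items()}


print(node_degree([[1, 2], [1, 3], [1]]))            # {1: 3, 2: 1, 3: 1}
tiers = tier_labels({n: n for n in range(1, 9)})     # eight nodes, degrees 1..8
print([n for n, t in tiers.items() if t == "high"])  # [7, 8]
```

Using a strict comparison keeps the "high" class at or below a quarter of the nodes even when many legislators share the cut-off degree.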

4. Interop — export for visualisation

HyperMesh doesn’t render graphs itself, but interop bridges the hypergraph to the standard ecosystem. Here we project to NetworkX and write GraphML for Gephi/Cytoscape:

hg = db.to_hypergraph("Cosponsorship")
g  = hm.interop.to_networkx(hg, kind="clique")
hm.interop.to_graphml(hg, "congress.graphml", kind="clique")
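A clique projection replaces each hyperedge with all pairwise edges among its members. The sketch below shows the idea in plain Python; weighting each pair by the number of shared bills is our assumption, since the walkthrough does not say how kind="clique" sets edge weights, and clique_expand is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations


def clique_expand(edges):
    """Return {(u, v): weight}, where weight counts the hyperedges u and v share."""
    weights = Counter()
    for members in edges:
        # Sorting gives every pair a canonical (small, large) orientation.
        for u, v in combinations(sorted(set(members)), 2):
            weights[(u, v)] += 1
    return dict(weights)


edges = [[1, 2, 3], [2, 3], [4]]
graph = clique_expand(edges)
print(graph)  # {(1, 2): 1, (1, 3): 1, (2, 3): 2}
```

Two things are lost in the projection: single-sponsor bills such as [4] vanish entirely, and a bill with k members produces k(k-1)/2 edges, so large bills dominate the projected graph.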

5. Modeling — train an owned HGNN in one call

Now the payoff. We predict each legislator’s influence tier from their tenure features and the co-sponsorship structure, using the modeling layer. The tenure features are not computed from degree, so they do not hand the model the degree-derived label directly; it has to recover the rest of the signal from the hypergraph.

fhg = hm.nn.featurize(
    db, "Cosponsorship",
    node_table="Congressperson",
    node_features=["first_q", "last_q"],
    label="tier",
)

# Framework-native tensors, if you'd rather bring your own model:
tensors = hm.nn.prepare(db, "Cosponsorship", framework="torch",
                        node_table="Congressperson",
                        node_features=["first_q", "last_q"], label="tier")

model = hm.nn.fit(fhg, epochs=150)
val_acc = model.evaluate(fhg.y, "val")["accuracy"]   # accuracy on the validation split
emb = model.embed()                                   # node embeddings for downstream use

hm.nn.fit builds the spectral propagation operator from the incidence matrix, standardises the features, splits the data into train and validation sets, and returns a FittedHGNN that exposes predict, predict_proba, embed, and evaluate.
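One widely used spectral propagation operator for hypergraphs is the normalised form Θ = Dv^(-1/2) H W De^(-1) Hᵀ Dv^(-1/2), where H is the node-by-hyperedge incidence matrix, W the hyperedge weights, and Dv and De the node and hyperedge degree matrices. The dense sketch below builds it for a tiny example. Whether hm.nn.fit uses exactly this operator is an assumption, and propagation_operator and matmul are illustrative names.

```python
import math


def propagation_operator(n_nodes, edges, weights=None):
    """Dense Theta = Dv^-1/2 H W De^-1 H^T Dv^-1/2 for nodes numbered 0..n_nodes-1."""
    edges = [sorted(set(e)) for e in edges]
    weights = weights or [1.0] * len(edges)

    # Weighted node degree: sum of w(e) over the hyperedges containing the node.
    dv = [0.0] * n_nodes
    for e, w in zip(edges, weights):
        for v in e:
            dv[v] += w

    theta = [[0.0] * n_nodes for _ in range(n_nodes)]
    for e, w in zip(edges, weights):
        for u in e:
            for v in e:
                # len(e) is the hyperedge degree, i.e. the De^-1 factor.
                theta[u][v] += w / (len(e) * math.sqrt(dv[u] * dv[v]))
    return theta, dv


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Path-like hypergraph: bills {0, 1} and {1, 2}.
theta, dv = propagation_operator(3, [[0, 1], [1, 2]])
x = [[math.sqrt(d)] for d in dv]   # features proportional to sqrt(degree)
smoothed = matmul(theta, x)        # one propagation step
print([round(row[0], 4) for row in smoothed])
```

A useful sanity check is that the vector of square-rooted node degrees is left unchanged by Θ, which is what the example feeds in; any other feature vector is pulled towards its neighbours' values.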

6. Temporal — reservoir computing (LSM)

Finally we treat each legislator as a time series of yearly activity and classify their influence tier with a reservoir / liquid-state model — no backprop through time:

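seq = hm.nn.temporal_features(db, "Cosponsorship", window_seconds=365)

# per_legislator_sequences (one activity series per legislator) and tiers
# (the matching labels) are assembled in the full script.
clf = hm.nn.ReservoirClassifier(n_reservoir=128, seed=0)
clf.fit(per_legislator_sequences, tiers)
clf.score(per_legislator_sequences, tiers)   # scored on the same data used for fitting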
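To show why no backpropagation through time is needed, here is a small echo-state sketch: the recurrent weights are random and fixed, and only a linear readout on the final state is trained, by ridge regression. It is a rate-based stand-in rather than a spiking liquid-state machine, and it makes no claim about how hm.nn.ReservoirClassifier is implemented; the class TinyReservoir and all of its parameters are hypothetical.

```python
import math
import random


class TinyReservoir:
    """Echo-state sketch: fixed random recurrence, trained linear readout."""

    def __init__(self, n_reservoir=16, seed=0, ridge=1e-3):
        rng = random.Random(seed)
        n = n_reservoir
        self.n, self.ridge = n, ridge
        self.w_in = [rng.uniform(-1, 1) for _ in range(n)]
        w = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
        # Rescale so the largest absolute row sum is 0.9. This bounds the
        # spectral radius below 1, so old inputs fade instead of echoing forever.
        scale = 0.9 / max(sum(abs(a) for a in row) for row in w)
        self.w = [[a * scale for a in row] for row in w]
        self.w_out = [0.0] * n

    def state(self, seq):
        """Drive the reservoir with a scalar sequence and return the final state."""
        x = [0.0] * self.n
        for u in seq:
            x = [math.tanh(self.w_in[i] * u
                           + sum(wij * xj for wij, xj in zip(self.w[i], x)))
                 for i in range(self.n)]
        return x

    def fit(self, sequences, targets):
        """Ridge readout: solve (S^T S + ridge I) w = S^T y by Gauss-Jordan elimination."""
        s = [self.state(seq) for seq in sequences]
        n = self.n
        a = [[sum(r[i] * r[j] for r in s) + (self.ridge if i == j else 0.0)
              for j in range(n)] + [sum(r[i] * y for r, y in zip(s, targets))]
             for i in range(n)]
        for col in range(n):
            piv = max(range(col, n), key=lambda r: abs(a[r][col]))
            a[col], a[piv] = a[piv], a[col]
            for r in range(n):
                if r != col:
                    f = a[r][col] / a[col][col]
                    a[r] = [p - f * q for p, q in zip(a[r], a[col])]
        self.w_out = [a[i][n] / a[i][i] for i in range(n)]
        return self

    def decision(self, seq):
        return sum(w * x for w, x in zip(self.w_out, self.state(seq)))

    def predict(self, seq):
        return 1 if self.decision(seq) >= 0 else -1


# Toy activity curves: rising activity is +1 ("high"), falling is -1 ("low").
sequences = [[0.1, 0.4, 0.7, 0.9], [0.2, 0.5, 0.8, 1.0],
             [0.5, 0.4, 0.3, 0.1], [0.6, 0.3, 0.2, 0.0]]
targets = [1, 1, -1, -1]
model = TinyReservoir(n_reservoir=16, seed=0).fit(sequences, targets)
print([model.predict(s) for s in sequences])
```

Training reduces to one linear solve because the reservoir itself is never updated. As with any classifier, predictions on the training sequences say little about generalisation, so a held-out split is the meaningful test.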

What this demonstrates

In one script, with one connect():

Layer	API	What it did
Ingest	`copy_from_df`	260k variable-size bills as hyperedges
Query	Cypher `MATCH HYPEREDGE`	temporal windows + pagination
Analytics	`db.analytics(...)`	influence, density, communities, spectral gap
Interop	`hm.interop.*`	NetworkX / GraphML export
Modeling	`hm.nn.fit`	owned spectral HGNN, trained on the DB
Temporal	`hm.nn.ReservoirClassifier`	activity-trajectory classification

The same APIs run unchanged on the full dataset (--limit 0) and against a remote HyperMesh server.