Case Study: Congressional Cosponsorship (end-to-end)

This walkthrough takes a real, public hypergraph from raw files all the way to a trained hypergraph neural network, exercising every layer of HyperMesh in a single script. It doubles as a reference for how the pieces fit together.

The full, runnable script lives at examples/case_study_congress_bills.py.

pip install "hypermesh[analytics,interop,ml]" pandas
python examples/case_study_congress_bills.py # downloads the dataset
python examples/case_study_congress_bills.py --limit 20000 # faster subset

We use congress-bills from the Austin R. Benson data collection: a temporal higher-order network where

  • nodes are US Congresspersons (1,718 of them),
  • hyperedges are legislative bills — the set of a bill’s sponsor and co-sponsors (260,851 timestamped bills),
  • timestamps are the day each bill was introduced.

This is a textbook hypergraph: a bill naturally connects many legislators at once, which a plain graph can only approximate with cliques. It ships in the standard Benson format (-nverts.txt, -simplices.txt, -times.txt, -node-labels.txt), which the script downloads and parses.
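As a rough illustration of the parsing step, the sketch below reads the three numeric files of that format: the nverts file gives the size of each hyperedge, the simplices file is one flat list of node ids, and the times file holds one timestamp per hyperedge. The function name read_benson and the toy file contents are made up for this example; the real script may parse the files differently.

```python
from pathlib import Path
import tempfile


def read_benson(prefix):
    """Parse <prefix>-nverts.txt, -simplices.txt and -times.txt.

    Returns (simplices, times): one node-id list and one timestamp per hyperedge.
    """
    prefix = str(prefix)
    nverts = [int(x) for x in Path(prefix + "-nverts.txt").read_text().split()]
    flat = [int(x) for x in Path(prefix + "-simplices.txt").read_text().split()]
    times = [int(x) for x in Path(prefix + "-times.txt").read_text().split()]
    if sum(nverts) != len(flat) or len(nverts) != len(times):
        raise ValueError("file lengths are inconsistent")

    # Cut the flat node list into consecutive chunks of the sizes in nverts
    simplices, pos = [], 0
    for n in nverts:
        simplices.append(flat[pos:pos + n])
        pos += n
    return simplices, times


# Tiny hand-made dataset (hypothetical): 3 bills over 4 legislators
with tempfile.TemporaryDirectory() as tmp:
    prefix = Path(tmp) / "toy"
    Path(f"{prefix}-nverts.txt").write_text("2\n3\n1\n")
    Path(f"{prefix}-simplices.txt").write_text("1\n2\n2\n3\n4\n1\n")
    Path(f"{prefix}-times.txt").write_text("10\n20\n30\n")
    simplices, times = read_benson(prefix)

print(simplices)  # [[1, 2], [2, 3, 4], [1]]
print(times)      # [10, 20, 30]
```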

1. Ingest — bulk-load timestamped hyperedges

Each bill becomes one hyperedge. We hand copy_from_df a frame with event_ts and a members list per row — the fast bulk path (no row-by-row inserts).

import pandas as pd

# One row per bill: its timestamp, its member list, and a unit weight
he_df = pd.DataFrame({
    "event_ts": times,
    "members": [list(s) for s in simplices],  # variable-size sets
    "weight": [1.0] * len(simplices),
})
db.execute("CREATE HYPEREDGE TABLE Cosponsorship () BUCKET_SECONDS 365")
db.copy_from_df(he_df, "Cosponsorship")

We also compute each legislator’s tenure (first/last active day) on the way in — used later as model features.
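One way to compute tenure in the same pass over the bills is sketched below. It is a minimal illustration that assumes simplices and times are parallel lists as in the ingest step; the helper name tenure and the toy data are hypothetical, not taken from the script.

```python
def tenure(simplices, times):
    """Return {node_id: (first_ts, last_ts)} over all bills a node appears on."""
    span = {}
    for members, ts in zip(simplices, times):
        for node in members:
            first, last = span.get(node, (ts, ts))
            span[node] = (min(first, ts), max(last, ts))
    return span


# Toy data (hypothetical): three bills with day-number timestamps
toy_simplices = [[1, 2], [2, 3, 4], [1]]
toy_times = [10, 20, 30]
toy_tenure = tenure(toy_simplices, toy_times)
print(toy_tenure[1])  # (10, 30)
print(toy_tenure[3])  # (20, 20)
```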

The hyperedges are immediately queryable, including time windows and pagination:

db.execute(
    "MATCH HYPEREDGE (he:Cosponsorship) "
    "WHERE he.event_ts >= 2000 AND he.event_ts <= 4000 RETURN * LIMIT 1000"
)
db.execute("MATCH HYPEREDGE (he:Cosponsorship) RETURN * SKIP 5 LIMIT 3")
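To make the semantics of those two queries concrete, here is what they select when expressed over an in-memory list of bills. This is only a sketch: it assumes the dialect follows the usual Cypher convention (inclusive comparison bounds, SKIP applied before LIMIT), and the helper names and toy rows are invented for illustration. Without an ordering clause, a real engine may return rows in any order.

```python
def in_window(hyperedges, start, end, limit=None):
    """Hyperedges with start <= event_ts <= end (both bounds inclusive)."""
    hits = [he for he in hyperedges if start <= he["event_ts"] <= end]
    return hits if limit is None else hits[:limit]


def page(rows, skip, limit):
    """SKIP then LIMIT: drop the first `skip` rows, keep the next `limit`."""
    return rows[skip:skip + limit]


# Toy rows (hypothetical): one bill every 500 days
bills = [{"event_ts": t, "members": [t % 7, t % 11]} for t in range(0, 6000, 500)]
recent = in_window(bills, 2000, 4000, limit=1000)
paged = page(bills, 5, 3)
print([he["event_ts"] for he in recent])  # [2000, 2500, 3000, 3500, 4000]
print([he["event_ts"] for he in paged])   # [2500, 3000, 3500]
```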

3. Analytics — who matters, and how it’s structured

The analytics engine runs directly on the stored hypergraph — degree, PageRank influence, density, spectral gap, and spectral communities:

an = db.analytics("Cosponsorship")
an.density()
an.spectral_gap()
degree = an.node_degree() # {node_id: #bills}
pr = an.pagerank() # influence ranking
an.zhou_clustering() # spectral communities
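For intuition about two of these measures, the sketch below computes node degree and a PageRank-style score directly from a list of hyperedges. It assumes the common hypergraph random walk (from a node, pick one of its hyperedges uniformly, then pick a member of that hyperedge uniformly, self included) with unit weights; the analytics engine's exact definitions may differ, and both function names are illustrative.

```python
from collections import Counter


def node_degree(hyperedges):
    """Number of hyperedges each node belongs to."""
    return Counter(v for members in hyperedges for v in set(members))


def hypergraph_pagerank(hyperedges, damping=0.85, iters=100):
    """Power iteration for the walk node -> incident hyperedge -> member."""
    nodes = sorted({v for members in hyperedges for v in members})
    deg = node_degree(hyperedges)
    rank = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        # Teleport term, shared equally by all nodes
        nxt = {v: (1.0 - damping) / len(nodes) for v in nodes}
        for members in hyperedges:
            members = set(members)
            # Mass flowing into this hyperedge, then split evenly over its members
            inflow = sum(rank[v] / deg[v] for v in members)
            share = damping * inflow / len(members)
            for v in members:
                nxt[v] += share
        rank = nxt
    return rank


# Toy hypergraph (hypothetical): three bills over four legislators
toy_edges = [[1, 2], [2, 3, 4], [1]]
toy_degree = node_degree(toy_edges)
toy_rank = hypergraph_pagerank(toy_edges)
print(toy_degree[2])               # 2
print(toy_rank[2] > toy_rank[3])   # True
```

Every node here sits in at least one hyperedge, so the walk has no dangling nodes and the scores keep summing to 1.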

We then derive a reproducible supervised label for the modeling stage: legislators in the top quartile of cosponsorship degree are tagged "high" influence, everyone else "low". This label is written into a node table:

db.execute(
    "CREATE NODE TABLE Congressperson ("
    "  node_id INTEGER PRIMARY KEY, name TEXT, "
    "  first_q INTEGER, last_q INTEGER, tier TEXT)"
)
db.copy_from_df(nodes_df, "Congressperson") # name + tenure + tier
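The tier column can be derived from the degree dictionary along the following lines. This is a sketch under assumptions: the cutoff uses a nearest-rank 75th percentile and treats ties at the cutoff as "low", which may not match the script's exact rule, and tier_labels is a hypothetical helper name.

```python
def tier_labels(degree):
    """Tag nodes whose degree lies above the 75th percentile as "high"."""
    ranked = sorted(degree.values())
    # Nearest-rank 75th percentile; only degrees strictly above it count as "high"
    cutoff = ranked[int(0.75 * (len(ranked) - 1))]
    return {v: ("high" if d > cutoff else "low") for v, d in degree.items()}


# Toy degrees (hypothetical): eight legislators with degrees 1..8
toy_degrees = {node: node for node in range(1, 9)}
toy_tiers = tier_labels(toy_degrees)
print(toy_tiers[8], toy_tiers[6])  # high low
```

Because the rule depends only on the stored degrees, rerunning it on the same data reproduces the same labels.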

HyperMesh doesn’t render graphs itself, but interop bridges the hypergraph to the standard ecosystem. Here we project to NetworkX and write GraphML for Gephi/Cytoscape:

hg = db.to_hypergraph("Cosponsorship")
g = hm.interop.to_networkx(hg, kind="clique")
hm.interop.to_graphml(hg, "congress.graphml", kind="clique")
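A clique projection replaces each hyperedge with all pairwise edges among its members. The sketch below shows that idea in plain Python, assuming kind="clique" means the standard clique expansion with edge weights counting shared hyperedges; whether HyperMesh weights edges this way is an assumption, and clique_expansion is an illustrative name.

```python
from collections import Counter
from itertools import combinations


def clique_expansion(hyperedges):
    """Map each unordered node pair to the number of hyperedges containing both."""
    weights = Counter()
    for members in hyperedges:
        # A hyperedge with k members contributes k * (k - 1) / 2 pairs
        for u, v in combinations(sorted(set(members)), 2):
            weights[(u, v)] += 1
    return weights


# Toy hypergraph (hypothetical)
toy_pairs = clique_expansion([[1, 2], [2, 3, 4], [1, 2, 3]])
print(toy_pairs[(1, 2)])  # 2
print(len(toy_pairs))     # 5
```

The projection is lossy: once expanded, a three-way bill is indistinguishable from three separate two-person bills, which is why the analytics and modeling stages work on the hypergraph itself.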

5. Modeling — train an owned HGNN in one call

Now the payoff. We predict each legislator’s influence tier from their tenure features and the cosponsorship structure, using the modeling layer. The tenure features are not computed from degree, so they do not leak the degree-derived label directly, and the model has to learn the rest from the hypergraph.

fhg = hm.nn.featurize(
    db, "Cosponsorship",
    node_table="Congressperson",
    node_features=["first_q", "last_q"],
    label="tier",
)
# Framework-native tensors, if you'd rather bring your own model:
tensors = hm.nn.prepare(
    db, "Cosponsorship", framework="torch",
    node_table="Congressperson",
    node_features=["first_q", "last_q"], label="tier",
)
model = hm.nn.fit(fhg, epochs=150)
model.evaluate(fhg.y, "val")["accuracy"]
model.embed() # node embeddings for downstream use

hm.nn.fit builds the spectral propagation operator from the incidence matrix, standardises the features, splits the nodes into train and validation sets, and returns a FittedHGNN that exposes predict, predict_proba, embed, and evaluate.
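To show what the first two of those steps can look like, the sketch below builds a dense propagation operator and standardises one feature column. It assumes the widely used HGNN form Dv^-1/2 H De^-1 H^T Dv^-1/2 with unit hyperedge weights and 0-indexed nodes; the operator hm.nn.fit actually constructs may differ, and both function names are hypothetical.

```python
import math


def propagation_operator(hyperedges, n_nodes):
    """Dense Dv^-1/2 H De^-1 H^T Dv^-1/2 for unit hyperedge weights."""
    dv = [0] * n_nodes  # node degree = number of incident hyperedges
    for members in hyperedges:
        for v in set(members):
            dv[v] += 1
    op = [[0.0] * n_nodes for _ in range(n_nodes)]
    for members in hyperedges:
        members = sorted(set(members))
        for u in members:
            for v in members:
                # 1/|e| from De^-1, 1/sqrt(dv_u * dv_v) from the two Dv^-1/2 factors
                op[u][v] += 1.0 / (len(members) * math.sqrt(dv[u] * dv[v]))
    return op


def standardise(column):
    """Shift to zero mean and scale to unit variance (constant columns map to 0)."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(var) or 1.0
    return [(x - mean) / std for x in column]


# Toy hypergraph (hypothetical): two hyperedges over four nodes
toy_op = propagation_operator([[0, 1], [1, 2, 3]], n_nodes=4)
toy_col = standardise([1.0, 2.0, 3.0])
print(toy_op[0][0])  # 0.5
```

One layer of propagation then amounts to multiplying this operator by the feature matrix, so each node's representation becomes a degree-normalised average over the nodes it shares bills with. The dense nested lists are for readability only; real data would use a sparse matrix.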

Finally we treat each legislator as a time series of yearly activity (timestamps are day numbers, so a window of 365 spans one year) and classify their influence tier with a reservoir / liquid-state model, which needs no backpropagation through time:

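seq = hm.nn.temporal_features(db, "Cosponsorship", window_seconds=365)
# per_legislator_sequences and tiers are built elsewhere in the full script
clf = hm.nn.ReservoirClassifier(n_reservoir=128, seed=0)
clf.fit(per_legislator_sequences, tiers)
clf.score(per_legislator_sequences, tiers)  # scored on the same data it was fit on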
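The two ideas behind this stage can be sketched in plain Python: binning each legislator's bills into fixed windows, and pushing the resulting sequence through a fixed random recurrent network whose weights are never trained. This illustrates the general reservoir-computing technique, not the internals of ReservoirClassifier; the function names, the leak rate, and the weight scaling are all assumptions.

```python
import math
import random
from collections import defaultdict


def activity_sequences(simplices, times, window):
    """Per node, count bills in consecutive windows of `window` time units."""
    start = min(times)
    n_bins = (max(times) - start) // window + 1
    seqs = defaultdict(lambda: [0] * n_bins)
    for members, ts in zip(simplices, times):
        for node in set(members):
            seqs[node][(ts - start) // window] += 1
    return dict(seqs)


def reservoir_state(sequence, n_reservoir=16, leak=0.3, seed=0):
    """Run a fixed random leaky-tanh reservoir over a sequence; return the final state."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-1, 1) for _ in range(n_reservoir)]
    # Dividing by n_reservoir bounds each row sum by 1, so the spectral radius is at most 1
    w = [[rng.uniform(-1, 1) / n_reservoir for _ in range(n_reservoir)]
         for _ in range(n_reservoir)]
    x = [0.0] * n_reservoir
    for u in sequence:
        pre = [w_in[i] * u + sum(w[i][j] * x[j] for j in range(n_reservoir))
               for i in range(n_reservoir)]
        x = [(1 - leak) * x[i] + leak * math.tanh(pre[i]) for i in range(n_reservoir)]
    return x


# Toy data (hypothetical): three bills, day-number timestamps, yearly windows
toy_seqs = activity_sequences([[1, 2], [2, 3, 4], [1]], [10, 20, 400], window=365)
toy_state = reservoir_state(toy_seqs[1])
print(toy_seqs[1])  # [1, 1]
print(toy_seqs[2])  # [2, 0]
```

Only a linear readout on top of these final states would need fitting, which is why no gradient has to flow back through the time steps.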

In one script, with one connect():

Layer      API                         What it did
Ingest     copy_from_df                260k variable-size bills as hyperedges
Query      Cypher MATCH HYPEREDGE      temporal windows + pagination
Analytics  db.analytics(...)           influence, density, communities, spectral gap
Interop    hm.interop.*                NetworkX / GraphML export
Modeling   hm.nn.fit                   owned spectral HGNN, trained on the DB
Temporal   hm.nn.ReservoirClassifier   activity-trajectory classification

The same APIs run unchanged on the full dataset (--limit 0) and against a remote HyperMesh server.