Ingestion

hypermesh.ingest is the universal converter that turns any source — tabular, semi-structured, graph DB, RDBMS, or unstructured — into a HyperCore bundle that loads through the unmodified engine loader.

pip install "hypermesh[engine,yaml]"

import hypermesh as hm

# CSV → bundle (mapping auto-inferred from the header)
bundle = hm.ingest.ingest_csv("events.csv", staging="./staging")

# DataFrame → bundle (mapping auto-inferred from the columns)
import pandas as pd
df = pd.read_parquet("events.parquet")
bundle = hm.ingest.ingest_dataframe(df, source_id="warehouse", staging="./staging")

# Generic URI (RDBMS / graph DB / object store)
bundle = hm.ingest.ingest_from_uri(
    "rdbms://postgresql://host/db", spec="mapping.yaml", staging="./staging"
)

Automatic inference

When you don’t supply a spec, ingestion infers a hypergraph from the source’s columns or keys rather than applying a fixed default. Inference targets a co-occurrence model that matches how the adapters build edges: it picks a member (entity) column and a distinct grouping key, so the result is a real hypergraph rather than one giant edge:

  • a basket-like id (order_id, session_id, transaction_id, …) becomes the hyperedge identity — entities sharing that value co-occur;
  • otherwise, if a timestamp column exists, rows are grouped into time buckets;
  • the remaining entity/id column becomes the member (node), and other columns become vertex properties.
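
The selection rules above can be sketched in plain Python. This is an illustration only, not hypermesh's actual inference code: the hint lists, the helper names (`infer_mapping`, `build_hyperedges`), and the "first remaining id-like column" rule are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical name hints; the real inference rules may differ.
BASKET_HINTS = ("order_id", "session_id", "transaction_id")
TIME_HINTS = ("timestamp", "ts", "time", "created_at")


def _looks_like_id(name):
    """True for column names such as 'id' or 'product_id'."""
    lowered = name.lower()
    return lowered == "id" or lowered.endswith("_id")


def infer_mapping(columns):
    """Pick a grouping key and a member column from column names alone."""
    cols = list(columns)
    group_by = next((c for c in cols if c.lower() in BASKET_HINTS), None)
    time_col = None
    if group_by is None:
        # No basket-like id: fall back to a timestamp column, if any.
        time_col = next((c for c in cols if c.lower() in TIME_HINTS), None)
    used = {group_by, time_col}
    # The first remaining id-like column is treated as the member (node).
    member = next((c for c in cols if c not in used and _looks_like_id(c)), None)
    # Everything else is carried along as vertex properties.
    properties = [c for c in cols if c not in used and c != member]
    return {
        "group_by": group_by,
        "time_column": time_col,
        "member": member,
        "properties": properties,
    }


def build_hyperedges(rows, group_by, member):
    """One hyperedge per grouping-key value, holding the co-occurring members."""
    edges = defaultdict(set)
    for row in rows:
        edges[row[group_by]].add(row[member])
    return {key: sorted(members) for key, members in edges.items()}


mapping = infer_mapping(["order_id", "product_id", "price"])
rows = [
    {"order_id": "o1", "product_id": "p1", "price": 3.0},
    {"order_id": "o1", "product_id": "p2", "price": 5.0},
    {"order_id": "o2", "product_id": "p1", "price": 3.0},
]
edges = build_hyperedges(rows, mapping["group_by"], mapping["member"])
```

Here `order_id` becomes the hyperedge identity and `product_id` the member, so the three rows yield two hyperedges instead of one edge spanning every row.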

Inspect and override the inferred mapping before committing to it:

import dataclasses
import hypermesh as hm
# 1. See what would be inferred — returns a reviewable MappingSpec
spec = hm.infer_spec("orders.csv")
print(dataclasses.asdict(spec)) # member = product_id, grouped by order_id, ...
# 2. One-shot: infer + ingest, with the mapping surfaced for transparency
result = hm.infer_hypergraph("orders.csv", staging="./staging")
print(result.describe()) # "members = 'product_id', hyperedges grouped by 'order_id'; ..."
db = hm.connect("./data", hyperedges_csv=f"{result.bundle}/hyperedges.csv")
# 3. Override: tweak the draft (or write your own) and pass it back
spec.tables["CoProximity"].bucket_seconds = 60
bundle = hm.ingest.ingest_csv("orders.csv", spec=spec, staging="./staging")
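
To make the effect of a `bucket_seconds` override concrete, here is a minimal sketch of fixed-width time bucketing. It assumes timestamps are epoch seconds and that a bucket is identified by its floored start time; `bucket_rows` is a hypothetical helper, and hypermesh's own bucketing may differ in detail.

```python
from collections import defaultdict


def bucket_rows(rows, bucket_seconds=60):
    """Group (epoch_seconds, member) pairs into fixed-width time buckets."""
    buckets = defaultdict(set)
    for ts, member in rows:
        # Floor the timestamp to the start of its bucket.
        bucket_start = int(ts // bucket_seconds) * bucket_seconds
        buckets[bucket_start].add(member)
    return {start: sorted(members) for start, members in sorted(buckets.items())}


events = [(0, "a"), (30, "b"), (59, "c"), (60, "a"), (125, "d")]
by_minute = bucket_rows(events, bucket_seconds=60)
by_two_minutes = bucket_rows(events, bucket_seconds=120)
```

A wider bucket merges more rows into each hyperedge: with these five events, 60-second buckets give three groups and 120-second buckets give two.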

infer_spec and infer_hypergraph each accept a CSV path, a pandas.DataFrame, or a URI with the csv:// or rdbms:// (SQLAlchemy) scheme. RDBMS sources use foreign-key-aware introspection.

The returned bundle directory contains the HyperCore CSVs plus additive sidecars (manifest, temporal index, id map, provenance, quarantine). Load it:

db = hm.connect("/var/lib/hypermesh/data", hyperedges_csv=f"{bundle}/hyperedges.csv")
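
Before connecting, it can help to confirm that the bundle directory holds what you expect. The sketch below uses only the standard library; `summarize_bundle` is a hypothetical helper, and the only file name taken from this page is `hyperedges.csv`. The `sidecar.txt` file in the demo is a placeholder, not a real sidecar name.

```python
import tempfile
from pathlib import Path


def summarize_bundle(bundle_dir):
    """Return the file names in a bundle directory and the hyperedges path."""
    root = Path(bundle_dir)
    hyperedges = root / "hyperedges.csv"
    if not hyperedges.is_file():
        raise FileNotFoundError(f"no hyperedges.csv in {root}")
    files = sorted(p.name for p in root.iterdir() if p.is_file())
    return {"files": files, "hyperedges_csv": str(hyperedges)}


# Demo against a throwaway directory standing in for a real bundle.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "hyperedges.csv").write_text("")
    (Path(tmp) / "sidecar.txt").write_text("")  # placeholder sidecar name
    info = summarize_bundle(tmp)
```

The returned `hyperedges_csv` path is the value you would pass as the `hyperedges_csv` argument when connecting.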

Provide a declarative YAML MappingSpec to control identity keys, property mapping, PII policies, and time bucketing. Without one, ingestion auto-infers a mapping from the data (see Automatic inference).

import hypermesh as hm
from hypermesh.ingest import MappingSpec

spec = MappingSpec.from_yaml("mapping.yaml")
# df is the DataFrame loaded earlier with pd.read_parquet
bundle = hm.ingest.ingest_dataframe(df, source_id="warehouse", spec=spec)

Use dry_run=True to print the plan without writing a bundle.

Pre-built batch strategies (security, clinical, patents, geo) are discoverable through the registry:

from hypermesh.ingest import strategies, get_strategy

for s in strategies():
    print(s["name"])

mde = get_strategy("mde_baseline")