Ingestion
hypermesh.ingest is the universal converter that turns any source — tabular,
semi-structured, graph DB, RDBMS, or unstructured — into a HyperCore bundle that
loads through the unmodified engine loader.
pip install "hypermesh[engine,yaml]"

Quick paths
import hypermesh as hm

# CSV → bundle (mapping auto-inferred from the header)
bundle = hm.ingest.ingest_csv("events.csv", staging="./staging")

# DataFrame → bundle (mapping auto-inferred from the columns)
import pandas as pd
df = pd.read_parquet("events.parquet")
bundle = hm.ingest.ingest_dataframe(df, source_id="warehouse", staging="./staging")

# Generic URI (RDBMS / graph DB / object store)
bundle = hm.ingest.ingest_from_uri(
    "rdbms://postgresql://host/db",
    spec="mapping.yaml",
    staging="./staging"
)

Automatic inference
When you don’t supply a spec, ingestion infers a hypergraph from the source’s columns/keys instead of using a fixed default. The inference targets a co-occurrence model that matches how the adapters build edges: it picks a member/entity column and a distinct grouping key, so the result is a real hypergraph (not one giant edge):
- a basket-like id (order_id, session_id, transaction_id, …) becomes the hyperedge identity, so entities sharing that value co-occur;
- otherwise, if a timestamp column exists, rows are grouped into time buckets;
- the remaining entity/id column becomes the member (node), and other columns become vertex properties.
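The selection order described in the list above can be sketched in plain Python. This is only an illustration of the idea, not hypermesh's implementation: the function name guess_mapping, the regular expressions, and the shape of the returned dict are all hypothetical, and the real inference rules may differ.

```python
import re

# Hypothetical name patterns; the library's actual rules are not documented here.
BASKET_ID = re.compile(r"^(order|session|transaction|basket)_id$")
TIMESTAMP = re.compile(r"(^ts$|timestamp|_at$|_time$)")
ENTITY_ID = re.compile(r"_id$")


def guess_mapping(columns):
    """Guess a member column, a grouping key, and leftover property columns."""
    basket = next((c for c in columns if BASKET_ID.search(c)), None)
    timestamp = next((c for c in columns if TIMESTAMP.search(c)), None)
    # The member is the first id-like column that is not the grouping key.
    member = next(
        (c for c in columns if ENTITY_ID.search(c) and c != basket), None
    )
    if member is None:
        raise ValueError("no entity/id column found")
    if basket is not None:
        # A basket-like id wins: rows sharing its value form one hyperedge.
        grouping = {"kind": "key", "column": basket}
    elif timestamp is not None:
        # Fall back to grouping rows into time buckets.
        grouping = {"kind": "time_bucket", "column": timestamp}
    else:
        raise ValueError("no grouping key or timestamp column found")
    used = {member, grouping["column"]}
    properties = [c for c in columns if c not in used]
    return {"member": member, "grouping": grouping, "properties": properties}


print(guess_mapping(["order_id", "product_id", "price"]))
print(guess_mapping(["device_id", "timestamp", "signal"]))
```

Requiring the member and the grouping key to be different columns is what keeps the result from collapsing into a single giant edge.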
Inspect and override the inferred mapping before committing to it:
import dataclasses
import hypermesh as hm

# 1. See what would be inferred; returns a reviewable MappingSpec
spec = hm.infer_spec("orders.csv")
print(dataclasses.asdict(spec))  # member = product_id, grouped by order_id, ...

# 2. One-shot: infer + ingest, with the mapping surfaced for transparency
result = hm.infer_hypergraph("orders.csv", staging="./staging")
print(result.describe())  # "members = 'product_id', hyperedges grouped by 'order_id'; ..."
db = hm.connect("./data", hyperedges_csv=f"{result.bundle}/hyperedges.csv")

# 3. Override: tweak the draft (or write your own) and pass it back
spec.tables["CoProximity"].bucket_seconds = 60
bundle = hm.ingest.ingest_csv("orders.csv", spec=spec, staging="./staging")

infer_spec / infer_hypergraph accept a CSV path, a pandas.DataFrame, or a csv:// / rdbms:// (SQLAlchemy) URI. RDBMS sources use foreign-key-aware introspection.
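To make the effect of a setting like bucket_seconds = 60 concrete, here is a minimal sketch of time-bucket grouping in plain Python. It assumes fixed-width buckets aligned to the Unix epoch, which may not be how hypermesh aligns its buckets; bucket_rows and the sample device names are hypothetical and are not part of the library.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_rows(rows, bucket_seconds=60):
    """Group (timestamp, member) rows into fixed-width time buckets.

    Returns a dict mapping each bucket's start (epoch seconds) to the set of
    members seen in that bucket; each entry plays the role of one hyperedge.
    """
    edges = defaultdict(set)
    for ts, member in rows:
        epoch = int(ts.timestamp())
        # Round down to the start of the bucket this row falls into.
        bucket_start = epoch - epoch % bucket_seconds
        edges[bucket_start].add(member)
    return dict(edges)


# Illustrative rows only: two events in the same minute, one in the next.
rows = [
    (datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc), "dev-a"),
    (datetime(2024, 1, 1, 12, 0, 40, tzinfo=timezone.utc), "dev-b"),
    (datetime(2024, 1, 1, 12, 1, 10, tzinfo=timezone.utc), "dev-c"),
]
for start, members in sorted(bucket_rows(rows).items()):
    label = datetime.fromtimestamp(start, tz=timezone.utc).isoformat()
    print(label, sorted(members))
```

With a 60-second width the first two rows share a bucket and the third gets its own. A wider bucket merges more rows into each hyperedge, and a narrower one splits them apart.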
The returned bundle directory contains the HyperCore CSVs plus additive
sidecars (manifest, temporal index, id map, provenance, quarantine). Load it:
db = hm.connect("/var/lib/hypermesh/data", hyperedges_csv=f"{bundle}/hyperedges.csv")

Mapping specs
Provide a declarative YAML MappingSpec to control identity keys, property mapping, PII policies, and time bucketing. Without one, ingestion auto-infers a mapping from the data (see Automatic inference).

import hypermesh as hm
from hypermesh.ingest import MappingSpec

# df is the DataFrame loaded in Quick paths
spec = MappingSpec.from_yaml("mapping.yaml")
bundle = hm.ingest.ingest_dataframe(df, source_id="warehouse", spec=spec)

Use dry_run=True to print the plan without writing a bundle.
Domain strategies
Pre-built batch strategies (security, clinical, patents, geo) are discoverable through the registry:
from hypermesh.ingest import strategies, get_strategy
for s in strategies():
    print(s["name"])
mde = get_strategy("mde_baseline")