Quickstart
This page walks you through three hands-on flows with parsimony-core 0.7.0 (Python >=3.11): define and call your own connector, compose two connectors into a collection, and build a tiny searchable Catalog. The first two flows run with only the base install; the catalog flow needs one optional extra, which is called out below.
If you have not installed the package yet, start with Installation.
No connectors ship in core
The core package is the framework plus the catalog — it contains zero data
connectors. Every real data source is published as its own
parsimony-<name> distribution (a provider plugin)
and discovered at runtime. The examples below define their own connectors
so they run with nothing but parsimony-core installed.
1. Define and call a connector
A connector is a small async function plus metadata. The function's parameters are the connector's call surface, and the function returns raw data — a pandas DataFrame, Series, scalar, or dict. The framework wraps that raw value into a Result / TabularResult and attaches framework-built Provenance; connectors never construct those carriers themselves.
The @connector decorator turns an async def into a frozen Connector. When you attach an OutputConfig, the framework applies the declared schema to the returned DataFrame — renaming, coercing dtypes, and tagging each column with a role.
import asyncio

import pandas as pd

from parsimony import Column, ColumnRole, OutputConfig, TabularResult, connector

PRICE_OUTPUT = OutputConfig(
    columns=[
        Column(name="date", role=ColumnRole.KEY, namespace="demo_prices", dtype="date"),
        Column(name="close", role=ColumnRole.DATA, dtype="numeric"),
    ]
)


@connector(output=PRICE_OUTPUT, tags=["demo"])
async def daily_close(symbol: str) -> pd.DataFrame:
    """Return a tiny synthetic price series for a ticker symbol."""
    # Replace this with a real HTTP call (see the HTTP transport guide).
    return pd.DataFrame(
        {
            "date": ["2024-01-02", "2024-01-03", "2024-01-04"],
            "close": ["185.6", "188.1", "187.2"],
        }
    )


async def main() -> None:
    result = await daily_close(symbol="ACME")
    assert isinstance(result, TabularResult)
    print(result.df)  # the schema-applied DataFrame
    print(result.df["close"].dtype)  # float64, coerced from strings by dtype="numeric"
    print(result.provenance.source)  # "daily_close" (defaults to the function name)
    print(result.provenance.params)  # {"symbol": "ACME"}
    print([c.name for c in result.data_columns])  # ["close"]


asyncio.run(main())
A few things this example demonstrates, all enforced by the framework:
- The connector is async and you call it with await. A non-coroutine function raises TypeError at decoration time.
- A description is mandatory. It defaults to the stripped docstring; pass description= to override. With neither, decoration raises ValueError.
- provenance.source is the connector name, which defaults to fn.__name__.
- provenance.params records only the call-time arguments.
- dtype="numeric" coerced the string column to float64. The date column declared dtype="date" and was normalized to midnight timestamps.
Connectors must return raw data
Returning a Result, a TabularResult, or a (data, properties) tuple
raises TypeError. The framework builds the output envelope; your job is
to return the data. A schema or coercion failure during wrapping surfaces as
a typed ParseError, not a bare ValueError.
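The coercions reported by the example in this section can be reproduced with plain pandas. The sketch below illustrates the effect only and is not taken from the framework: `coerce_prices` is a hypothetical helper, and whether parsimony uses these exact pandas calls internally is an assumption.

```python
import pandas as pd


def coerce_prices(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: mimic dtype="numeric" and dtype="date" coercion."""
    out = raw.copy()
    # Strings such as "185.6" become floats; unparseable values raise ValueError.
    out["close"] = pd.to_numeric(out["close"], errors="raise")
    # Parse the dates, then drop any time-of-day component (midnight timestamps).
    out["date"] = pd.to_datetime(out["date"]).dt.normalize()
    return out


raw = pd.DataFrame(
    {
        "date": ["2024-01-02", "2024-01-03", "2024-01-04"],
        "close": ["185.6", "188.1", "187.2"],
    }
)
clean = coerce_prices(raw)
print(clean.dtypes)
```

In this sketch a bad value raises pandas' own ValueError; as noted above, the framework is described as surfacing such failures as a typed ParseError instead.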
2. Compose connectors and hide secrets
Connectors live in an immutable Connectors collection. You merge collections with the + operator (there is no .merge method), look connectors up by name with [], and invoke them with the canonical idiom await collection[name](**kwargs).
bind(**kwargs) fixes parameter values and returns a new connector with those parameters removed from its call surface. This is how you inject a secret or a base URL without exposing it: declare the parameter in secrets=(...), then bind it. Bound secrets never appear in the connector's signature, its LLM-facing card, or its provenance.
import asyncio

import pandas as pd

from parsimony import Connectors, connector


@connector
async def search_titles(query: str) -> pd.DataFrame:
    """Search a demo index by keyword."""
    return pd.DataFrame({"code": ["A", "B"], "title": [f"{query} alpha", f"{query} beta"]})


@connector(secrets=("api_key",))
async def fetch_series(series_id: str, api_key: str) -> pd.DataFrame:
    """Fetch one observation for a series id (requires an API key)."""
    return pd.DataFrame({"date": ["2024-01-01"], "value": [1.0]})


async def main() -> None:
    # Merge two single-connector collections with the + operator.
    bundle = Connectors([search_titles]) + Connectors([fetch_series])
    print(bundle.names())  # ["fetch_series", "search_titles"] (sorted)
    print("fetch_series" in bundle)  # True
    print(len(bundle))  # 2

    # Bind the secret across the whole collection. bind is scoped per connector:
    # it only fixes parameters a connector actually has, so search_titles is untouched.
    wired = bundle.bind(api_key="sk-demo")
    print(list(wired["fetch_series"].exposed_signature.parameters))  # ["series_id"]

    # Invoke by name with await.
    titles = await wired["search_titles"](query="GDP")
    print(len(titles.df))  # 2
    series = await wired["fetch_series"](series_id="UNRATE")
    print(series.provenance.params)  # {"series_id": "UNRATE"}; api_key is stripped


asyncio.run(main())
Why binding hides secrets
Provenance records only the connector's exposed (unbound) call-time
parameters, and even a supplied secret-named argument is stripped. So a
bound api_key is invisible both to provenance and to the
describe() / to_llm() cards a connector renders for an agent prompt —
that is the mechanism, not a convention.
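To see how a bound parameter can vanish from both the call surface and the recorded parameters, here is a minimal standard-library sketch. It is not parsimony's implementation: `bind_sketch`, its `secrets` argument, and the `(value, recorded)` return shape are hypothetical, and `recorded` merely stands in for what a provenance record would hold.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable


def bind_sketch(
    fn: Callable[..., Awaitable[Any]], *, secrets: tuple[str, ...] = (), **bound: Any
) -> Callable[..., Awaitable[Any]]:
    """Hypothetical bind: fix keyword arguments and hide them from the signature."""
    full = inspect.signature(fn)
    unknown = set(bound) - set(full.parameters)
    if unknown:
        raise TypeError(f"cannot bind unknown parameters: {sorted(unknown)}")
    # The exposed signature keeps only the parameters that were not bound.
    exposed = full.replace(
        parameters=[p for name, p in full.parameters.items() if name not in bound]
    )

    async def wrapper(**kwargs: Any) -> Any:
        # Raises TypeError if a caller passes a hidden or unknown argument,
        # or omits a required one.
        exposed.bind(**kwargs)
        # Record call-time arguments only, dropping any secret-named ones.
        recorded = {k: v for k, v in kwargs.items() if k not in secrets}
        value = await fn(**kwargs, **bound)
        return value, recorded

    wrapper.__signature__ = exposed  # inspect.signature() honors this attribute
    return wrapper


async def fetch(series_id: str, api_key: str) -> str:
    return f"{series_id} fetched with a {len(api_key)}-character key"


wired = bind_sketch(fetch, secrets=("api_key",), api_key="sk-demo")
print(list(inspect.signature(wired).parameters))
value, recorded = asyncio.run(wired(series_id="UNRATE"))
print(value, recorded)
```

Because the bound value lives only in the wrapper's closure, anything that renders the exposed signature or the recorded arguments has no path to the secret.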
Connectors also offers get, names, filter, search, describe, and to_llm — see Calling, binding, and composing. Note that [] takes a connector name, never an integer index: bundle[0] raises KeyError.
3. Build and search a Catalog
A Catalog is a portable, in-memory index over normalized Entity records. An entity has a namespace (lowercase snake_case), a code (its identifier within that namespace), a title, and arbitrary metadata. The lifecycle is fixed: construct the catalog, load entities with set_entities, materialize indexes with await build(), then await search(...).
import asyncio

from parsimony.catalog import BM25Index, Catalog, CatalogMatch, Entity


async def main() -> None:
    # An explicit BM25 index over the "title" field; default_field makes plain-text
    # (broad) queries search that field.
    catalog = Catalog("demo", indexes={"title": BM25Index()}, default_field="title")
    catalog.set_entities(
        [
            Entity(namespace="series", code="UNRATE", title="Unemployment Rate"),
            Entity(namespace="series", code="GDPC1", title="Real Gross Domestic Product"),
        ]
    )
    await catalog.build()  # materialize the indexes; required before searching

    matches, diagnostic = await catalog.search("unemployment", limit=5)
    print(diagnostic.mode)  # "broad": a plain-text query against default_field
    for match in matches:
        assert isinstance(match, CatalogMatch)
        print(match.namespace, match.code, match.title, round(match.score, 3))


asyncio.run(main())
search returns a tuple (list[CatalogMatch], SearchDiagnostic). Each CatalogMatch carries the entity's namespace, code, title, metadata, and a final score. The SearchDiagnostic.mode tells you how the query was executed: "broad" for plain text, or "structured" when the query uses FIELD: value syntax.
Build before you search
Every mutation (set_entities, set_index, delete_many, …) marks the
catalog dirty. Calling search() or save() while dirty raises a plain
ValueError whose message tells you to await catalog.build() first.
Re-run build() after any change.
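The dirty-flag rule is easy to picture with a toy class. The sketch below is a hypothetical `TinyCatalog`, written synchronously for brevity (the real `build()` and `search()` are awaited), with a naive token index standing in for real indexes.

```python
class TinyCatalog:
    """Hypothetical dirty-flag lifecycle; not parsimony's Catalog."""

    def __init__(self) -> None:
        self._titles: dict[str, str] = {}
        self._index: dict[str, set[str]] = {}
        self._dirty = False

    def set_entities(self, titles: dict[str, str]) -> None:
        self._titles = dict(titles)
        self._dirty = True  # every mutation invalidates the built index

    def build(self) -> None:
        # Rebuild a token -> codes lookup from scratch, then clear the flag.
        index: dict[str, set[str]] = {}
        for code, title in self._titles.items():
            for token in title.lower().split():
                index.setdefault(token, set()).add(code)
        self._index = index
        self._dirty = False

    def search(self, term: str) -> list[str]:
        if self._dirty:
            raise ValueError("catalog is dirty; call build() first")
        return sorted(self._index.get(term.lower(), set()))


catalog = TinyCatalog()
catalog.set_entities({"UNRATE": "Unemployment Rate", "GDPC1": "Real Gross Domestic Product"})
try:
    catalog.search("rate")
except ValueError as exc:
    print("before build:", exc)
catalog.build()
print("after build:", catalog.search("rate"))
```

Refusing to search a dirty catalog trades a little convenience for the guarantee that results never come from a stale index.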
BM25 needs the standard extra
BM25Index builds and scores with rank-bm25, which ships in the
standard optional extra, not the base install. Run this flow after
pip install "parsimony-core[standard]". That extra also unlocks the
FAISS vector indexes, the default sentence-transformers embedder, and the
hf:// snapshot loader. See Installation.
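For a feel of what a BM25 index computes, here is a self-contained scoring sketch in pure Python. It implements the common Okapi BM25 formula with a non-negative IDF variant; the whitespace tokenizer, the `k1` and `b` defaults, and the `bm25_scores` name are assumptions for illustration, and rank-bm25 or the framework may differ in any of them.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            freq = term_freq[term]
            if freq == 0:
                continue  # terms absent from the document contribute nothing
            # Rarer terms get a higher weight; the +1 keeps the IDF non-negative.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term frequency saturates with k1 and is normalized by document length.
            norm = freq + k1 * (1.0 - b + b * len(toks) / avg_len)
            score += idf * freq * (k1 + 1.0) / norm
        scores.append(score)
    return scores


titles = ["Unemployment Rate", "Real Gross Domestic Product"]
print(bm25_scores("unemployment", titles))
```

Only the first title contains the query term, so it receives a positive score while the second scores zero.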
The default index policy
If you pass indexes=None (the default), the catalog uses the default index policy: at build() time it creates a BM25 index for code, title, and every metadata key observed across your entities. This is the quickest way to make a catalog searchable across all its fields:
import asyncio

from parsimony.catalog import Catalog, Entity


async def main() -> None:
    catalog = Catalog("demo")  # indexes=None -> default policy
    catalog.set_entities(
        [Entity(namespace="demo", code="a", title="alpha", metadata={"region": "eu"})]
    )
    await catalog.build()
    print(sorted(catalog.indexes))  # ["code", "region", "title"]


asyncio.run(main())
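The field set chosen by the default policy can be reasoned about without the library. In this sketch, `default_index_fields` is a hypothetical helper and entities are plain dicts rather than Entity objects; it only mirrors the rule stated above (code, title, plus every metadata key observed).

```python
def default_index_fields(entities: list[dict]) -> list[str]:
    """Hypothetical helper: fields the default policy is described as indexing."""
    fields = {"code", "title"}
    for entity in entities:
        # Collect every metadata key seen on any entity.
        fields.update(entity.get("metadata", {}))
    return sorted(fields)


entities = [
    {"namespace": "demo", "code": "a", "title": "alpha", "metadata": {"region": "eu"}},
    {"namespace": "demo", "code": "b", "title": "beta", "metadata": {"unit": "pct"}},
]
print(default_index_fields(entities))
```

A metadata key that appears on only one entity still produces a field here, which follows from "observed across your entities" above.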
For structured queries, snapshot persistence (save / load over file:// and hf://), and the index types in depth, see Building and searching, Indexes, and Snapshots and persistence.
Using a real provider plugin
The connectors above are synthetic. A real data source is an installed
parsimony-<name> distribution that registers itself through the
parsimony.providers entry-point group. Once installed, load it at runtime
through parsimony.discover:
from parsimony import discover
# discover.load("fred") loads a named provider (LookupError if not installed);
# discover.load_all() loads every installed provider, skipping failures.
providers = discover.load_all() # -> a Connectors collection
print(providers.names())
# Compose installed providers with your own connectors using +.
# bundle = providers + Connectors([daily_close])
discover.load_all() returns a Connectors collection you can compose with + exactly like the ones you built by hand. The Plugins and providers section covers installing, discovering, and authoring plugins.
Plugins are separate installs
The discover example above prints an empty list until you install a
provider, for example pip install parsimony-fred. Core never bundles a
connector, so there is nothing to discover out of the box.
See also
- Installation — base install and the optional-extras matrix
- Core concepts — the mental model behind connectors and catalogs
- The connector model — connectors, loaders, and enumerators in depth
- The Catalog — entities, indexes, search, and snapshots