Quickstart

This page walks you through three hands-on flows with parsimony-core 0.7.0 (Python >=3.11): define and call your own connector, compose two connectors into a collection, and build a tiny searchable Catalog. The first two flows run with only the base install; the catalog flow needs one optional extra, which is called out below.

If you have not installed the package yet, start with Installation:

pip install parsimony-core

No connectors ship in core

The core package is the framework plus the catalog — it contains zero data connectors. Every real data source is published as its own parsimony-<name> distribution (a provider plugin) and discovered at runtime. The examples below define their own connectors so they run with nothing but parsimony-core installed.

1. Define and call a connector

A connector is a small async function plus metadata. The function's parameters are the connector's call surface, and the function returns raw data — a pandas DataFrame, Series, scalar, or dict. The framework wraps that raw value into a Result / TabularResult and attaches framework-built Provenance; connectors never construct those carriers themselves.

The @connector decorator turns an async def into a frozen Connector. When you attach an OutputConfig, the framework applies the declared schema to the returned DataFrame — renaming, coercing dtypes, and tagging each column with a role.

import asyncio

import pandas as pd

from parsimony import Column, ColumnRole, OutputConfig, TabularResult, connector

PRICE_OUTPUT = OutputConfig(
    columns=[
        Column(name="date", role=ColumnRole.KEY, namespace="demo_prices", dtype="date"),
        Column(name="close", role=ColumnRole.DATA, dtype="numeric"),
    ]
)


@connector(output=PRICE_OUTPUT, tags=["demo"])
async def daily_close(symbol: str) -> pd.DataFrame:
    """Return a tiny synthetic price series for a ticker symbol."""
    # Replace this with a real HTTP call — see the HTTP transport guide.
    return pd.DataFrame(
        {
            "date": ["2024-01-02", "2024-01-03", "2024-01-04"],
            "close": ["185.6", "188.1", "187.2"],
        }
    )


async def main() -> None:
    result = await daily_close(symbol="ACME")

    assert isinstance(result, TabularResult)
    print(result.df)                       # the schema-applied DataFrame
    print(result.df["close"].dtype)        # float64 — coerced from strings by dtype="numeric"
    print(result.provenance.source)        # "daily_close" (defaults to the function name)
    print(result.provenance.params)        # {"symbol": "ACME"}
    print([c.name for c in result.data_columns])  # ["close"]


asyncio.run(main())

A few things this example demonstrates, all enforced by the framework:

  • The connector is async and you call it with await. A non-coroutine function raises TypeError at decoration time.
  • A description is mandatory. It defaults to the stripped docstring; pass description= to override. With neither, decoration raises ValueError.
  • provenance.source is the connector name, which defaults to fn.__name__. provenance.params records only the call-time arguments.
  • dtype="numeric" coerced the string column to float64. The date column declared dtype="date" and was normalized to midnight timestamps.
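
The two decoration-time checks in the list above (async only, description required) can be pictured with a toy decorator. This is a minimal standard-library sketch, not parsimony's implementation; `toy_connector` and the attributes it sets are hypothetical names chosen for illustration.

```python
import inspect


def toy_connector(fn=None, *, description=None):
    """Hypothetical stand-in for a decoration-time validator (not parsimony's code)."""

    def decorate(func):
        # Reject plain functions at decoration time rather than at call time.
        if not inspect.iscoroutinefunction(func):
            raise TypeError(f"{func.__name__} must be defined with 'async def'")
        # Fall back to the stripped docstring when no description is passed.
        text = description or (inspect.getdoc(func) or "").strip()
        if not text:
            raise ValueError(f"{func.__name__} needs a docstring or description=")
        func.description = text
        func.connector_name = func.__name__  # the name defaults to the function name
        return func

    return decorate if fn is None else decorate(fn)


@toy_connector
async def ok(symbol: str):
    """Return something for a symbol."""
    return symbol


def sync_fn(symbol: str):
    """A plain function, not a coroutine."""
    return symbol


async def undocumented(symbol: str):
    return symbol


errors = {}
for candidate in (sync_fn, undocumented):
    try:
        toy_connector(candidate)
    except (TypeError, ValueError) as exc:
        errors[candidate.__name__] = type(exc).__name__

print(ok.connector_name, "-", ok.description)
print(errors)  # {'sync_fn': 'TypeError', 'undocumented': 'ValueError'}
```

Failing at decoration time means a misdeclared connector is caught on import, before anything tries to call it.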

Connectors must return raw data

Returning a Result, a TabularResult, or a (data, properties) tuple raises TypeError. The framework builds the output envelope; your job is to return the data. A schema or coercion failure during wrapping surfaces as a typed ParseError, not a bare ValueError.
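
To make "coercion" concrete, here is roughly what the two declared dtypes correspond to in plain pandas. This is a sketch under the assumption that `dtype="numeric"` and `dtype="date"` behave comparably to `pd.to_numeric` and a normalized `pd.to_datetime`; the framework's actual conversion code is not shown on this page, and inside the framework a failure would surface as ParseError rather than the raw pandas exception caught here.

```python
import pandas as pd

raw = pd.DataFrame(
    {
        "date": ["2024-01-02", "2024-01-03"],
        "close": ["185.6", "188.1"],
    }
)

# Assumed plain-pandas equivalents of dtype="numeric" and dtype="date".
coerced = raw.assign(
    close=pd.to_numeric(raw["close"]),
    date=pd.to_datetime(raw["date"]).dt.normalize(),  # drop any time-of-day part
)
print(coerced.dtypes)

# A value that cannot be parsed fails loudly instead of passing through as text.
coercion_error = None
try:
    pd.to_numeric(["185.6", "abc"])
except ValueError as exc:
    coercion_error = exc
print(type(coercion_error).__name__)
```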

2. Compose connectors and hide secrets

Connectors live in an immutable Connectors collection. You merge collections with the + operator (there is no .merge method), look connectors up by name with [], and invoke them with the canonical idiom await collection[name](**kwargs).

bind(**kwargs) fixes parameter values and returns a new connector with those parameters removed from its call surface. This is how you inject a secret or a base URL without exposing it: declare the parameter in secrets=(...), then bind it. Bound secrets never appear in the connector's signature, its LLM-facing card, or its provenance.

import asyncio

import pandas as pd

from parsimony import Connectors, connector


@connector
async def search_titles(query: str) -> pd.DataFrame:
    """Search a demo index by keyword."""
    return pd.DataFrame({"code": ["A", "B"], "title": [f"{query} alpha", f"{query} beta"]})


@connector(secrets=("api_key",))
async def fetch_series(series_id: str, api_key: str) -> pd.DataFrame:
    """Fetch one observation for a series id (requires an API key)."""
    return pd.DataFrame({"date": ["2024-01-01"], "value": [1.0]})


async def main() -> None:
    # Merge two single-connector collections with the + operator.
    bundle = Connectors([search_titles]) + Connectors([fetch_series])
    print(bundle.names())          # ["fetch_series", "search_titles"] (sorted)
    print("fetch_series" in bundle)  # True
    print(len(bundle))             # 2

    # Bind the secret across the whole collection. bind is scoped per connector:
    # it only fixes parameters a connector actually has, so search_titles is untouched.
    wired = bundle.bind(api_key="sk-demo")
    print(list(wired["fetch_series"].exposed_signature.parameters))  # ["series_id"]

    # Invoke by name with await.
    titles = await wired["search_titles"](query="GDP")
    print(len(titles.df))  # 2

    series = await wired["fetch_series"](series_id="UNRATE")
    print(series.provenance.params)  # {"series_id": "UNRATE"} — api_key is stripped


asyncio.run(main())

Why binding hides secrets

Provenance records only the connector's exposed (unbound) call-time parameters, and even a supplied secret-named argument is stripped. So a bound api_key is invisible both to provenance and to the describe() / to_llm() cards a connector renders for an agent prompt — that is the mechanism, not a convention.
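
The mechanism described above can be sketched with the standard library alone. The helper below is hypothetical (`toy_bind` is not a parsimony function): it fixes some parameters, removes them from the visible signature, and records only the call-time arguments that are not named as secrets.

```python
import asyncio
import inspect


def toy_bind(fn, secrets=(), **bound):
    """Hypothetical sketch: fix parameters and hide them from the call surface."""
    sig = inspect.signature(fn)
    # Only bind parameters the function actually declares.
    fixed = {k: v for k, v in bound.items() if k in sig.parameters}
    exposed = [p for name, p in sig.parameters.items() if name not in fixed]

    async def wrapper(**kwargs):
        result = await fn(**fixed, **kwargs)
        # Record call-time arguments only, and drop anything named as a secret.
        recorded = {k: v for k, v in kwargs.items() if k not in secrets}
        return result, recorded

    # Advertise the reduced signature to anything that introspects the wrapper.
    wrapper.__signature__ = sig.replace(parameters=exposed)
    return wrapper


async def fetch(series_id: str, api_key: str) -> str:
    return f"{series_id} fetched with a key of length {len(api_key)}"


wired = toy_bind(fetch, secrets=("api_key",), api_key="sk-demo")
exposed_names = list(inspect.signature(wired).parameters)
value, recorded = asyncio.run(wired(series_id="UNRATE"))

print(exposed_names)  # ['series_id']
print(recorded)       # {'series_id': 'UNRATE'}
```

Because the bound value never enters the recorded arguments or the advertised signature, nothing downstream has to remember to redact it.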

Connectors also offers get, names, filter, search, describe, and to_llm — see Calling, binding, and composing. Note that [] takes a connector name, never an integer index: bundle[0] raises KeyError.

3. Build and search a Catalog

A Catalog is a portable, in-memory index over normalized Entity records. An entity has a namespace (lowercase snake_case), a code (its identifier within that namespace), a title, and arbitrary metadata. The lifecycle is fixed: construct the catalog, load entities with set_entities, materialize indexes with await build(), then await search(...).

import asyncio

from parsimony.catalog import BM25Index, Catalog, CatalogMatch, Entity


async def main() -> None:
    # An explicit BM25 index over the "title" field; default_field makes plain-text
    # (broad) queries search that field.
    catalog = Catalog("demo", indexes={"title": BM25Index()}, default_field="title")
    catalog.set_entities(
        [
            Entity(namespace="series", code="UNRATE", title="Unemployment Rate"),
            Entity(namespace="series", code="GDPC1", title="Real Gross Domestic Product"),
        ]
    )

    await catalog.build()  # materialize the indexes; required before searching

    matches, diagnostic = await catalog.search("unemployment", limit=5)
    print(diagnostic.mode)  # "broad" — a plain-text query against default_field
    for match in matches:
        assert isinstance(match, CatalogMatch)
        print(match.namespace, match.code, match.title, round(match.score, 3))


asyncio.run(main())

search returns a tuple (list[CatalogMatch], SearchDiagnostic). Each CatalogMatch carries the entity's namespace, code, title, metadata, and a final score. The SearchDiagnostic.mode tells you how the query was executed: "broad" for plain text, or "structured" when the query uses FIELD: value syntax.

Build before you search

Every mutation (set_entities, set_index, delete_many, …) marks the catalog dirty. Calling search() or save() while dirty raises a plain ValueError whose message tells you to await catalog.build() first. Re-run build() after any change.
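
The dirty-flag rule is a common pattern, and a toy version shows why the order of calls matters. The class below is a hypothetical illustration, not parsimony's Catalog; its method names and error message are invented for the sketch.

```python
class ToyIndex:
    """Hypothetical dirty-flag holder used only to illustrate the rule."""

    def __init__(self):
        self._items = []
        self._built = {}
        self._dirty = False

    def set_items(self, items):
        self._items = list(items)
        self._dirty = True  # any mutation invalidates what was built

    def build(self):
        self._built = {item.lower(): item for item in self._items}
        self._dirty = False

    def search(self, text):
        if self._dirty:
            raise ValueError("index is dirty; call build() first")
        return [item for key, item in self._built.items() if text.lower() in key]


index = ToyIndex()
index.set_items(["Unemployment Rate", "Real Gross Domestic Product"])

try:
    index.search("rate")
except ValueError as exc:
    dirty_message = str(exc)

index.build()
hits = index.search("rate")
print(dirty_message)
print(hits)  # ['Unemployment Rate']
```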

BM25 needs the standard extra

BM25Index builds and scores with rank-bm25, which ships in the standard optional extra, not the base install. Run this flow after pip install "parsimony-core[standard]". That extra also unlocks the FAISS vector indexes, the default sentence-transformers embedder, and the hf:// snapshot loader. See Installation.

The default index policy

If you pass indexes=None (the default), the catalog uses the default index policy: at build() time it creates a BM25 index for code, title, and every metadata key observed across your entities. This is the quickest way to make a catalog searchable across all its fields:

import asyncio

from parsimony.catalog import Catalog, Entity


async def main() -> None:
    catalog = Catalog("demo")  # indexes=None -> default policy
    catalog.set_entities(
        [Entity(namespace="demo", code="a", title="alpha", metadata={"region": "eu"})]
    )
    await catalog.build()
    print(sorted(catalog.indexes))  # ["code", "region", "title"]


asyncio.run(main())
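
The set of fields the default policy is described as covering (code, title, and every metadata key seen) can be computed by hand. The helper below is a hypothetical sketch over plain dicts, useful for predicting which index names to expect; it does not call into parsimony.

```python
def default_index_fields(entities):
    """Hypothetical helper: field names the default policy is described as indexing."""
    fields = {"code", "title"}
    for entity in entities:
        # Collect every metadata key observed across all entities.
        fields.update(entity.get("metadata", {}))
    return sorted(fields)


entities = [
    {"namespace": "demo", "code": "a", "title": "alpha", "metadata": {"region": "eu"}},
    {"namespace": "demo", "code": "b", "title": "beta", "metadata": {"unit": "pct"}},
]
fields = default_index_fields(entities)
print(fields)  # ['code', 'region', 'title', 'unit']
```

A metadata key that appears on only one entity still contributes a field, so sparse metadata widens the set of indexes.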

For structured queries, snapshot persistence (save / load over file:// and hf://), and the index types in depth, see Building and searching, Indexes, and Snapshots and persistence.

Using a real provider plugin

The connectors above are synthetic. A real data source is an installed parsimony-<name> distribution that registers itself through the parsimony.providers entry-point group. Once installed, load it at runtime through parsimony.discover:

from parsimony import discover

# discover.load("fred") loads a named provider (LookupError if not installed);
# discover.load_all() loads every installed provider, skipping failures.
providers = discover.load_all()  # -> a Connectors collection
print(providers.names())

# Compose installed providers with your own connectors using +.
# bundle = providers + Connectors([daily_close])

discover.load_all() returns a Connectors collection you can compose with + exactly like the ones you built by hand. The Plugins and providers section covers installing, discovering, and authoring plugins.

Plugins are separate installs

The discover example above prints an empty list until you install a provider, for example pip install parsimony-fred. Core never bundles a connector, so there is nothing to discover out of the box.

See also