
Parsimony

Parsimony is a connector framework for financial data that pairs typed fetch with hybrid-search catalogs. It gives you a small, agent-native data layer: connectors that fetch raw data through a typed, async call surface, and a portable in-memory catalog that indexes and searches over the entities those connectors discover.

The distribution is published to PyPI as parsimony-core (import name parsimony, version 0.7.0, Apache-2.0). It runs on Python >=3.11 (3.11, 3.12, 3.13).

The two pillars

Parsimony is built around two complementary ideas.

  • Connectors — a connector is a small async Python callable plus metadata. The @connector decorator (and the stricter @loader / @enumerator verbs) turn an async def into a frozen Connector. The function's parameters are the connector's call surface — there is no bundled params object. A connector returns raw data (a DataFrame, Series, scalar, or dict); the framework wraps it in a Result / TabularResult carrying framework-built Provenance. The immutable Connectors collection composes connectors and is invoked with await connectors[name](**kwargs).

  • Catalog — a Catalog is a portable, in-memory, searchable index over normalized Entity records. It supports pluggable per-field indexes (BM25, FAISS vectors, hybrid fusion, DisMax), structured and broad search, and snapshot persistence to local paths or Hugging Face datasets.

Connectors ship as separate plugins

No connectors ship inside the core package. Every connector is published as its own parsimony-<name> distribution and discovered at runtime through the parsimony.providers entry-point group. The core library is the framework plus the catalog. See Plugins and providers.

Two design choices show up throughout the code and are worth knowing up front: connectors expose flat, top-level parameters (the conformance suite forbids bundling them into a single params: SomeModel object), and connector errors are typed and agent-facing — default messages embed directives like "DO NOT retry" so an LLM driving the connector can act on them. Connectors can also render themselves for prompts via to_llm().

Install

pip install parsimony-core

The base install pulls only a small kernel (pydantic, pandas, pyarrow, httpx, platformdirs). The heavy catalog runtime (FAISS, sentence-transformers, Hugging Face Hub) is an optional extra that loads lazily — a plain import parsimony never imports torch or faiss.

pip install "parsimony-core[standard]"

See Installation for the full optional-extras matrix.

A 60-second taste

This runs with only parsimony-core installed. Define a @connector, attach an output schema, await it, and read the typed TabularResult.

import asyncio

import pandas as pd

from parsimony import Column, ColumnRole, OutputConfig, connector

OUTPUT = OutputConfig(
    columns=[
        Column(name="date", role=ColumnRole.KEY, namespace="demo"),
        Column(name="value", role=ColumnRole.DATA, dtype="numeric"),
    ]
)


@connector(output=OUTPUT, tags=["demo"])
async def demo_fetch(series_id: str) -> pd.DataFrame:
    """Fetch a tiny demo time series by series_id."""
    return pd.DataFrame({"date": ["2020-01-01", "2020-04-01"], "value": [1.0, 2.0]})


async def main() -> None:
    result = await demo_fetch(series_id="GDP")
    print(result.df)                       # the validated DataFrame
    print(result.provenance.source)        # 'demo_fetch'
    print(result.provenance.params)        # {'series_id': 'GDP'}


asyncio.run(main())

A few things this shows:

  • The connector is async; a plain def would raise TypeError at decoration time.
  • The docstring becomes the connector's required description. If the connector has neither a docstring nor an explicitly supplied description, decoration raises ValueError.
  • The function returns a raw DataFrame. The framework applies the OutputConfig schema and wraps the result in a TabularResult with Provenance. Returning a Result or a (data, properties) tuple instead would raise TypeError.
  • result.provenance is built by the framework — connectors never construct it. Its params record only the call-time arguments (with any declared secrets stripped).
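The rules in this list can be illustrated with a toy decorator written against the standard library only. This is a sketch of the behaviour described above, not Parsimony's implementation: toy_connector, ToyResult, ToyProvenance, and the secrets argument are all hypothetical names.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass(frozen=True)
class ToyProvenance:
    source: str
    params: dict[str, Any]


@dataclass(frozen=True)
class ToyResult:
    data: Any
    provenance: ToyProvenance


def toy_connector(secrets: tuple[str, ...] = ()) -> Callable:
    """Toy decorator mirroring the rules listed above (not Parsimony's code)."""

    def decorate(fn: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[ToyResult]]:
        # Both checks run at decoration time, before any call is made.
        if not inspect.iscoroutinefunction(fn):
            raise TypeError(f"{fn.__name__} must be defined with 'async def'")
        if not (fn.__doc__ or "").strip():
            raise ValueError(f"{fn.__name__} needs a docstring to use as its description")

        async def call(**kwargs: Any) -> ToyResult:
            data = await fn(**kwargs)
            if isinstance(data, (ToyResult, tuple)):
                raise TypeError("return raw data; the wrapper builds the result")
            # Record call-time arguments only, with declared secrets removed.
            params = {k: v for k, v in kwargs.items() if k not in secrets}
            return ToyResult(data, ToyProvenance(source=fn.__name__, params=params))

        return call

    return decorate


@toy_connector(secrets=("api_key",))
async def toy_fetch(series_id: str, api_key: str) -> dict[str, float]:
    """Return a tiny hard-coded observation for series_id."""
    return {"2020-01-01": 1.0}


result = asyncio.run(toy_fetch(series_id="GDP", api_key="s3cr3t"))
print(result.provenance)
```

The function body never touches provenance; the wrapper builds it from the function name and the keyword arguments, which is why the secret never reaches the recorded params.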

Composing connectors

Merge collections with the + operator, then invoke a member by name:

from parsimony import Connectors

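# another_connector is a placeholder for any second connector you have defined.
bundle = Connectors([demo_fetch]) + Connectors([another_connector])
# await is only valid inside an async function, such as main() above.
result = await bundle["demo_fetch"](series_id="GDP")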

There is no .merge method — + is the composition primitive. See Calling, binding, and composing.
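For intuition about why + suits an immutable collection, here is a small stand-in built on the standard library. ToyCollection is hypothetical and is not Parsimony's Connectors class. In particular, letting the right-hand operand win on a duplicate name is an assumption of this sketch, not documented behaviour.

```python
from collections.abc import Callable, Iterable
from types import MappingProxyType


class ToyCollection:
    """Immutable name -> callable mapping; a stand-in, not Parsimony's Connectors."""

    def __init__(self, members: Iterable[Callable]) -> None:
        # MappingProxyType gives a read-only view, so members cannot be reassigned.
        self._by_name = MappingProxyType({fn.__name__: fn for fn in members})

    def __add__(self, other: "ToyCollection") -> "ToyCollection":
        # Build a new collection; neither operand is modified.
        # Assumption: on a name clash the right-hand member replaces the left.
        return ToyCollection([*self._by_name.values(), *other._by_name.values()])

    def __getitem__(self, name: str) -> Callable:
        return self._by_name[name]

    def names(self) -> list[str]:
        return sorted(self._by_name)


def fetch_a() -> str:
    return "a"


def fetch_b() -> str:
    return "b"


left = ToyCollection([fetch_a])
bundle = left + ToyCollection([fetch_b])
print(bundle.names())  # ['fetch_a', 'fetch_b']
print(left.names())    # ['fetch_a']
```

Since composition returns a new object, a collection handed to one caller cannot be changed by another caller adding members later.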

A taste of the catalog

The catalog indexes Entity records so you can search them. A catalog must be built before it can be searched. This example uses a keyword-only BM25Index, which loads rank-bm25 lazily on first build.

Needs the standard extra

The BM25Index shown here resolves its backend on build(), so install the standard extra first:

pip install "parsimony-core[standard]"

import asyncio

from parsimony import BM25Index, Catalog, Entity


async def main() -> None:
    catalog = Catalog(name="demo", indexes={"title": BM25Index()})
    catalog.set_entities(
        [
            Entity(namespace="demo", code="gdp", title="Gross domestic product"),
            Entity(namespace="demo", code="cpi", title="Consumer price index"),
        ]
    )
    await catalog.build()                          # required before searching
    matches, diagnostic = await catalog.search("price", limit=5)
    for match in matches:
        print(match.code, match.title, match.score)


asyncio.run(main())

catalog.search(...) returns a list of CatalogMatch records plus a search diagnostic. Mutating a built catalog marks it dirty; search and save raise until you rebuild. See The Catalog for the full lifecycle.
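The build-then-search lifecycle and the dirty flag can be sketched without any index backend. ToyCatalog below is hypothetical: substring matching stands in for a real index, and RuntimeError is an assumed exception type, since the text above does not say which error Parsimony raises.

```python
class ToyCatalog:
    """Toy build/dirty lifecycle; substring matching stands in for a real index."""

    def __init__(self) -> None:
        self._titles: dict[str, str] = {}
        self._index: dict[str, str] | None = None  # None means "not built, or dirty"

    def set_entities(self, titles: dict[str, str]) -> None:
        self._titles = dict(titles)
        self._index = None  # any mutation invalidates the built index

    def build(self) -> None:
        self._index = {code: title.lower() for code, title in self._titles.items()}

    def search(self, query: str) -> list[str]:
        if self._index is None:
            # Assumed exception type for this sketch.
            raise RuntimeError("catalog is not built (or is dirty); call build() first")
        return [code for code, title in self._index.items() if query.lower() in title]


catalog = ToyCatalog()
catalog.set_entities({"gdp": "Gross domestic product", "cpi": "Consumer price index"})
catalog.build()
print(catalog.search("price"))  # ['cpi']

catalog.set_entities({"gdp": "Gross domestic product"})  # mutation marks it dirty
try:
    catalog.search("price")
except RuntimeError as exc:
    print(exc)
```

Refusing to search a dirty catalog trades convenience for correctness: results can never silently come from an index that no longer matches the entities.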

Using a real provider

Core ships no connectors, so the runnable examples above define their own. In practice you install a provider plugin and discover it at runtime:

pip install parsimony-fred

from parsimony import discover

bundle = discover.load_all()       # composes every installed parsimony-<name> plugin
print(bundle.names())

discover.load_all() is forgiving (it logs and skips a plugin that fails to import); discover.load("fred") is strict and raises if a name is missing. See Discovering installed providers. You can also list what is installed from the shell with parsimony list.
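The forgiving versus strict contrast can be shown with a plain dictionary of loader callables. This sketch does not use Parsimony's discover module: toy_load_all, toy_load, and the registry are hypothetical, and LookupError is an assumed exception type for the missing-name case.

```python
import logging
from collections.abc import Callable

logger = logging.getLogger("toy_discover")


def toy_load_all(loaders: dict[str, Callable[[], object]]) -> dict[str, object]:
    """Forgiving: log and skip any loader that raises."""
    loaded: dict[str, object] = {}
    for name, loader in loaders.items():
        try:
            loaded[name] = loader()
        except Exception:
            logger.warning("skipping provider %r: failed to load", name, exc_info=True)
    return loaded


def toy_load(loaders: dict[str, Callable[[], object]], name: str) -> object:
    """Strict: a missing name or a failing loader propagates to the caller."""
    if name not in loaders:
        # Assumed exception type for this sketch.
        raise LookupError(f"no provider named {name!r} is installed")
    return loaders[name]()


def broken_loader() -> object:
    raise ImportError("simulated import failure")


registry = {"good": lambda: "good-connectors", "bad": broken_loader}
print(toy_load_all(registry))  # {'good': 'good-connectors'}
```

The forgiving form fits start-up code that should survive one broken plugin. The strict form fits code that cannot proceed without a specific provider.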

Where to go next

  • Installation — the optional-extras matrix (standard, standard-onnx, litellm, s3, all) and what each pulls in.
  • Quickstart — hands-on flows: a custom connector, a composed collection, and a small in-memory catalog.
  • Core concepts — the mental model that ties connectors, results, entities, and the catalog together.
  • The connector model — connectors in depth: defining, the loader/enumerator verbs, calling and binding, results, errors, and HTTP transport.
  • The Catalog — entities, building and searching, indexes, ranking and fusion, embedders, snapshots, and data stores.
  • Plugins and providers — discovering, authoring, and conformance-testing your own parsimony-<name> distribution.
