Parsimony
Parsimony is a connector framework for financial data — typed fetch and hybrid-search catalogs. It gives you a small, agent-native data layer: connectors that fetch raw data through a typed, async call surface, and a portable in-memory catalog that indexes and searches over the entities those connectors discover.
The distribution is published to PyPI as parsimony-core (import name parsimony,
version 0.7.0, Apache-2.0). It runs on Python >=3.11 (3.11, 3.12, 3.13).
The two pillars
Parsimony is built around two complementary ideas.
- Connectors: a connector is a small async Python callable plus metadata. The @connector decorator (and the stricter @loader / @enumerator verbs) turn an async def into a frozen Connector. The function's parameters are the connector's call surface; there is no bundled params object. A connector returns raw data (a DataFrame, Series, scalar, or dict); the framework wraps it in a Result / TabularResult carrying framework-built Provenance. The immutable Connectors collection composes connectors and is invoked with await connectors[name](**kwargs).
- Catalog: a Catalog is a portable, in-memory, searchable index over normalized Entity records. It supports pluggable per-field indexes (BM25, FAISS vectors, hybrid fusion, DisMax), structured and broad search, and snapshot persistence to local paths or Hugging Face datasets.
Connectors ship as separate plugins
No connectors ship inside the core package. Every connector is published as its own
parsimony-<name> distribution and discovered at runtime through the
parsimony.providers entry-point group. The core library is the framework plus the
catalog. See Plugins and providers.
Two design choices show up throughout the code and are worth knowing up front: connectors
expose flat, top-level parameters (the conformance suite forbids bundling them into a
single params: SomeModel object), and connector errors are
typed and agent-facing — default messages embed directives like
"DO NOT retry" so an LLM driving the connector can act on them. Connectors can also render
themselves for prompts via to_llm().
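The idea of typed, agent-facing errors can be sketched without the library. Every class and function name below is hypothetical and stands in for whatever Parsimony actually defines; the point is only that the exception type drives control flow while the message carries a directive a model can read.

```python
class ToyConnectorError(Exception):
    """Base class for this sketch; not a Parsimony class."""

    directive = ""

    def __init__(self, detail: str) -> None:
        self.detail = detail
        # Append the directive so a model reading the message knows what to do next.
        super().__init__(f"{detail} {self.directive}".strip())


class ToyUnauthorizedError(ToyConnectorError):
    directive = "DO NOT retry with the same credentials."


class ToyRateLimitError(ToyConnectorError):
    directive = "Wait before retrying."


def next_action(error: ToyConnectorError) -> str:
    """Map an error type to a coarse action an agent loop might take."""
    if isinstance(error, ToyRateLimitError):
        return "backoff"
    return "stop"


err = ToyUnauthorizedError("API key rejected.")
print(str(err))          # API key rejected. DO NOT retry with the same credentials.
print(next_action(err))  # stop
```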
Install
The base install pulls only a small kernel (pydantic, pandas, pyarrow, httpx,
platformdirs). The heavy catalog runtime (FAISS, sentence-transformers, Hugging Face Hub)
is an optional extra that loads lazily — a plain import parsimony never imports torch or
faiss.
See Installation for the full optional-extras matrix.
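The lazy loading described in this section follows a general pattern that can be sketched in a few lines. LazyModule is a hypothetical illustration, not how Parsimony implements it, and json is used as a stand-in for a heavy optional dependency so the example runs anywhere.

```python
import importlib
from types import ModuleType


class LazyModule:
    """Defer importing a module until one of its attributes is first used."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: ModuleType | None = None

    @property
    def loaded(self) -> bool:
        return self._module is not None

    def __getattr__(self, attr: str):
        # Only reached for names not found on the wrapper itself,
        # i.e. attributes that belong to the wrapped module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


heavy = LazyModule("json")      # stand-in for a heavy optional dependency
print(heavy.loaded)             # False
print(heavy.dumps({"a": 1}))    # {"a": 1}
print(heavy.loaded)             # True
```

The cost of the import is paid on first use rather than at package import time, which is why a base install can stay small.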
A 60-second taste
This runs with only parsimony-core installed. Define a @connector, attach an output
schema, await it, and read the typed TabularResult.
```python
import asyncio

import pandas as pd

from parsimony import Column, ColumnRole, OutputConfig, connector

OUTPUT = OutputConfig(
    columns=[
        Column(name="date", role=ColumnRole.KEY, namespace="demo"),
        Column(name="value", role=ColumnRole.DATA, dtype="numeric"),
    ]
)


@connector(output=OUTPUT, tags=["demo"])
async def demo_fetch(series_id: str) -> pd.DataFrame:
    """Fetch a tiny demo time series by series_id."""
    return pd.DataFrame({"date": ["2020-01-01", "2020-04-01"], "value": [1.0, 2.0]})


async def main() -> None:
    result = await demo_fetch(series_id="GDP")
    print(result.df)  # the validated DataFrame
    print(result.provenance.source)  # 'demo_fetch'
    print(result.provenance.params)  # {'series_id': 'GDP'}


asyncio.run(main())
```
A few things this shows:
- The connector is async; a plain def would raise TypeError at decoration time.
- The docstring becomes the connector's required description; omit both and decoration raises ValueError.
- The function returns a raw DataFrame. The framework applies the OutputConfig schema and wraps the result in a TabularResult with Provenance. Returning a Result or a (data, properties) tuple instead would raise TypeError.
- result.provenance is built by the framework; connectors never construct it. Its params record only the call-time arguments (with any declared secrets stripped).
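The decoration-time checks and framework-built provenance listed above can be mimicked with a small standard-library sketch. ToyConnector, ToyResult, and toy_connector are hypothetical names invented for this illustration; the real @connector also handles output schemas, tags, and secrets, none of which is modelled here.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass(frozen=True)
class ToyResult:
    data: Any
    source: str
    params: dict[str, Any]


@dataclass(frozen=True)
class ToyConnector:
    name: str
    description: str
    fn: Callable[..., Awaitable[Any]]

    async def __call__(self, **kwargs: Any) -> ToyResult:
        # Bind against the wrapped signature so bad arguments fail before the call.
        bound = inspect.signature(self.fn).bind(**kwargs)
        data = await self.fn(**bound.arguments)
        # The wrapper, not the wrapped function, records where the data came from.
        return ToyResult(data=data, source=self.name, params=dict(bound.arguments))


def toy_connector(fn: Callable[..., Awaitable[Any]]) -> ToyConnector:
    if not inspect.iscoroutinefunction(fn):
        raise TypeError(f"{fn.__name__} must be declared with 'async def'")
    description = inspect.getdoc(fn)
    if not description:
        raise ValueError(f"{fn.__name__} needs a docstring to use as its description")
    return ToyConnector(name=fn.__name__, description=description, fn=fn)


@toy_connector
async def toy_fetch(series_id: str) -> list[float]:
    """Return a fixed series for any series_id."""
    return [1.0, 2.0]


result = asyncio.run(toy_fetch(series_id="GDP"))
print(result.source, result.params)  # toy_fetch {'series_id': 'GDP'}
```

Failing at decoration time means a malformed connector is caught when the module is imported, not when an agent first calls it.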
Composing connectors
Merge collections with the + operator, then invoke a member by name:
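```python
from parsimony import Connectors

# another_connector stands for any other Connector you have defined or loaded.
bundle = Connectors([demo_fetch]) + Connectors([another_connector])
# Run this inside an async function, since it uses await.
result = await bundle["demo_fetch"](series_id="GDP")
```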
There is no .merge method — + is the composition primitive. See
Calling, binding, and composing.
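As a rough model of composition with +, the sketch below builds an immutable-by-convention collection whose __add__ returns a new object and leaves both operands untouched. ToyCollection is hypothetical, and rejecting duplicate names is an assumption of this sketch; the source does not say how Connectors resolves name collisions.

```python
import asyncio
from typing import Any, Callable, Iterable


class ToyCollection:
    """Name -> callable mapping composed with + (illustrative only)."""

    def __init__(self, items: Iterable[Callable[..., Any]]) -> None:
        members: dict[str, Callable[..., Any]] = {}
        for item in items:
            if item.__name__ in members:
                # Assumption of this sketch: duplicate names are an error.
                raise ValueError(f"duplicate name: {item.__name__}")
            members[item.__name__] = item
        self._members = members  # treated as read-only after construction

    def __add__(self, other: "ToyCollection") -> "ToyCollection":
        # Build a new collection; neither operand is modified.
        return ToyCollection([*self._members.values(), *other._members.values()])

    def __getitem__(self, name: str) -> Callable[..., Any]:
        return self._members[name]

    def names(self) -> list[str]:
        return list(self._members)


async def alpha(x: int) -> int:
    return x + 1


async def beta(x: int) -> int:
    return x * 2


left, right = ToyCollection([alpha]), ToyCollection([beta])
bundle = left + right
print(bundle.names())                     # ['alpha', 'beta']
print(asyncio.run(bundle["beta"](x=21)))  # 42
print(left.names())                       # ['alpha']
```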
A taste of the catalog
The catalog indexes Entity records so you can search them. A
catalog must be built before it can be searched. This example uses a keyword-only
BM25Index, which loads rank-bm25 lazily on first build.
Needs the standard extra
The BM25Index shown here resolves its backend on build(), so install the
standard extra first (for example, pip install "parsimony-core[standard]").
```python
import asyncio

from parsimony import BM25Index, Catalog, Entity


async def main() -> None:
    catalog = Catalog(name="demo", indexes={"title": BM25Index()})
    catalog.set_entities(
        [
            Entity(namespace="demo", code="gdp", title="Gross domestic product"),
            Entity(namespace="demo", code="cpi", title="Consumer price index"),
        ]
    )
    await catalog.build()  # required before searching
    matches, diagnostic = await catalog.search("price", limit=5)
    for match in matches:
        print(match.code, match.title, match.score)


asyncio.run(main())
```
catalog.search(...) returns a list of CatalogMatch records plus a
search diagnostic. Mutating a built catalog marks it dirty; search and save raise until you
rebuild. See The Catalog for the full lifecycle.
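The build-then-search lifecycle, including the dirty state after a mutation, can be modelled with a toy keyword index. ToyCatalog is hypothetical and synchronous for brevity, and raising RuntimeError is an assumption of this sketch; the source does not name the exception the real Catalog raises.

```python
class ToyCatalog:
    """Minimal build/search lifecycle with a dirty flag (illustrative only)."""

    def __init__(self) -> None:
        self._titles: dict[str, str] = {}
        self._index: dict[str, set[str]] = {}
        self._built = False

    def set_entities(self, titles: dict[str, str]) -> None:
        self._titles = dict(titles)
        self._built = False  # any mutation invalidates the index

    def build(self) -> None:
        index: dict[str, set[str]] = {}
        for code, title in self._titles.items():
            for token in title.lower().split():
                index.setdefault(token, set()).add(code)
        self._index = index
        self._built = True

    def search(self, query: str) -> list[str]:
        if not self._built:
            raise RuntimeError("catalog is not built or is dirty; call build() first")
        hits: set[str] = set()
        for token in query.lower().split():
            hits |= self._index.get(token, set())
        return sorted(hits)


catalog = ToyCatalog()
catalog.set_entities({"gdp": "Gross domestic product", "cpi": "Consumer price index"})
catalog.build()
print(catalog.search("price"))  # ['cpi']

catalog.set_entities({"gdp": "Gross domestic product"})  # marks the catalog dirty
try:
    catalog.search("price")
except RuntimeError as exc:
    print(exc)  # the search is refused until build() runs again
```

Refusing to search a dirty index trades a little convenience for the guarantee that results always reflect the current entity set.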
Using a real provider
Core ships no connectors, so the runnable examples above define their own. In practice you install a provider plugin and discover it at runtime:
```python
from parsimony import discover

bundle = discover.load_all()  # composes every installed parsimony-<name> plugin
print(bundle.names())
```
discover.load_all() is forgiving (it logs and skips a plugin that fails to import);
discover.load("fred") is strict and raises if a name is missing. See
Discovering installed providers. You can also list what is
installed from the shell with parsimony list.
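The difference between forgiving and strict loading can be shown with a plain registry of loader callables. toy_load_all and toy_load are hypothetical stand-ins, not the discover API, and raising KeyError for a missing name is an assumption of this sketch.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("toy_discover")

Registry = dict[str, Callable[[], Any]]


def toy_load_all(registry: Registry) -> dict[str, Any]:
    """Forgiving: log and skip any loader that raises."""
    loaded: dict[str, Any] = {}
    for name, loader in registry.items():
        try:
            loaded[name] = loader()
        except Exception as exc:
            logger.warning("skipping plugin %r: %s", name, exc)
    return loaded


def toy_load(registry: Registry, name: str) -> Any:
    """Strict: a missing name or a failing loader propagates to the caller."""
    if name not in registry:
        raise KeyError(f"no plugin named {name!r}")
    return registry[name]()


def broken() -> Any:
    raise ImportError("missing dependency")


registry: Registry = {"good": lambda: "good-connectors", "broken": broken}
print(toy_load_all(registry))      # {'good': 'good-connectors'}
print(toy_load(registry, "good"))  # good-connectors
```

A forgiving bulk load suits agent startup, where one broken plugin should not hide the rest, while a strict single load surfaces configuration mistakes immediately.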
Where to go next
- Installation: the optional-extras matrix (standard, standard-onnx, litellm, s3, all) and what each pulls in.
- Quickstart: hands-on flows for a custom connector, a composed collection, and a small in-memory catalog.
- Core concepts: the mental model that ties connectors, results, entities, and the catalog together.
- The connector model: connectors in depth, covering defining, the loader/enumerator verbs, calling and binding, results, errors, and HTTP transport.
- The Catalog: entities, building and searching, indexes, ranking and fusion, embedders, snapshots, and data stores.
- Plugins and providers: discovering, authoring, and conformance-testing your own parsimony-<name> distribution.
See also
- Quickstart — the fastest path from install to a first result.
- Core concepts — how the pieces fit together.
- The connector model — the connector abstraction in full.
- Public API & import map — what to import from where.