Artifacts reference¶
Field-level reference for the artifact models — Dataset, Chart, Report, Script,
ScriptPreview — plus ArtifactRef, the SnapshotKind union, and the identity functions that
compute logical IDs and content hashes.
For the conceptual model behind these types (dual identity, snapshots, lineage, the .ockham
layout), read Artifacts, identity & lineage first. For the
read/write codecs (read_dataset, read_chart, deserialize_*, save_notebook, …) see the
I/O functions reference.
All user-facing models import from the top-level package:
ArtifactRef, SnapshotKind, LiveNameCollisionError, and every identity function live in the
parsimony_agents.identity module; VIRTUAL_LIVE_KINDS lives in parsimony_agents.virtual_path.
Shared curation envelope¶
Dataset, Chart, and Report all extend an internal base that carries the same identity and
curation fields. Every one of these models has:
| Field | Type | Default | Meaning |
|---|---|---|---|
schema_version |
int |
2 |
Version of the embedded-metadata schema. |
logical_id |
str |
"" |
Stable identity — "which artifact." Computed from the relevant identity function; empty until assigned. |
content_sha |
str |
"" |
Hash of the serialized snapshot bytes — "which version." Computed at persist time. |
title |
str |
"" |
Human title (curation). |
description |
str |
"" |
Longer prose description (curation). |
tags |
list[str] |
[] |
Free-form tags (curation). |
notes |
list[str] |
[] |
Bullet notes (curation). |
live_name |
str \| None |
None |
Workspace-tree slug. None hides the artifact from the live tree; an empty string is mapped to slug_from_title(title) at persist time by the registry. |
These are Pydantic models that act as curation envelopes — the lineage and metadata around a
payload, not the payload itself. The actual data (a DataFrameObject for a dataset, a
FigureObject for a chart, the markdown body for a report) is attached separately and is never
re-serialized blindly: codecs pull it out at persist time.
Dataset¶
class Dataset(_ArtifactBase):
type: Literal["dataset"] = "dataset"
notebook_refs: list[ArtifactRef] # producing notebooks
source_refs: list[ArtifactRef] # upstream data_objects / composing datasets
variable_name: str # kernel variable name (recipe field)
# _payload: DataFrameObject | None (PrivateAttr, in-process only)
A curated, published dataframe. On disk it lives at
.ockham/datasets/<logical_id>/<content_sha>.parquet with sibling curation.json and
log.jsonl.
Dataset fields¶
| Field | Type | Notes |
|---|---|---|
notebook_refs |
list[ArtifactRef] |
Refs to the notebook(s) that produced the dataset. Multi-notebook pipelines are allowed. |
source_refs |
list[ArtifactRef] |
Refs to upstream data_objects and/or composing datasets. |
variable_name |
str |
The kernel variable the agent extracted. A recipe field: it participates in dataset_logical_id and is required for refresh to re-extract from the kernel. The PATCH endpoint rejects edits to preserve replay semantics. |
Plus all the shared curation fields.
Dataset methods¶
with_payload(payload: DataFrameObject) -> Dataset— attaches the in-process dataframe payload and returns the same instance (for chaining). RaisesTypeErrorifpayloadis not aDataFrameObject. The payload is stored on a private attribute and is never serialized into the model — it is only read by the codec at.save()time.payload -> DataFrameObject | None— read-only property exposing the attached payload (Noneuntilwith_payloadis called).to_llm(mode="default") -> list[dict[str, Any]]— renders the artifact as compact XML text blocks for the agent: a<dataset title="…">wrapper with optional<description>/<notes>and a<sources>block ofsource_refs.to_frontend_dict() -> dict[str, Any]— flat JSON-serializable dict for a host UI:type,schema_version,logical_id,content_sha,title,description,notes,tags,live_name,notebook_refs(as dicts),source_refs(as dicts), andvariable_name.save(path: str | Path) -> None— writes the Parquet snapshot viawrite_dataset_bytes. RaisesValueErrorif no payload is attached, or ifpathdoes not end in.parquet. Parent directories are created.
import pandas as pd
from parsimony_agents import Dataset
from parsimony_agents.execution.outputs import DataFrameObject
from parsimony_agents.identity import ArtifactRef, dataset_logical_id
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
nb_ref = ArtifactRef(kind="notebook", logical_id="analysis", content_sha="a" * 64)
source_refs: list[ArtifactRef] = []
logical_id = dataset_logical_id(
notebook_refs=[nb_ref],
variable_name="results",
source_refs=source_refs,
)
dataset = (
Dataset(
logical_id=logical_id,
title="Q4 Results",
description="Quarterly analysis",
tags=["important"],
notebook_refs=[nb_ref],
source_refs=source_refs,
variable_name="results",
live_name="q4_results",
)
.with_payload(DataFrameObject.from_pandas(df, local_dir="/tmp/dfo"))
)
dataset.save("/tmp/q4.parquet")
content_shais computed from the serialized bytes at persist time — you do not need to set it when constructing the model.
Chart¶
class Chart(_ArtifactBase):
type: Literal["chart"] = "chart"
notebook_ref: ArtifactRef | None # singular — one rendering notebook
source_dataset_refs: list[ArtifactRef] # plural — multi-dataset charts
source_refs: list[ArtifactRef] # rare: drawn straight from data_objects
variable_name: str # kernel variable (recipe field)
# _payload: FigureObject | None (PrivateAttr, in-process only)
A curated, published chart. On disk: .ockham/charts/<logical_id>/<content_sha>.vl.json with
sibling curation/log files. The payload is a FigureObject, which accepts either an Altair chart
or a raw Vega-Lite spec dict.
Chart fields¶
| Field | Type | Notes |
|---|---|---|
notebook_ref |
ArtifactRef \| None |
Singular — a chart is rendered in exactly one notebook. Required at persist time; optional on the model so vanilla .vl.json round-trips without curation don't fail on construction. |
source_dataset_refs |
list[ArtifactRef] |
Refs to source datasets. Plural — multi-dataset comparison charts welcome. |
source_refs |
list[ArtifactRef] |
Refs to upstream data_objects for the uncommon case of a chart drawn straight from data, bypassing an intermediate dataset. |
variable_name |
str |
The kernel variable. A recipe field (same semantics as Dataset.variable_name): participates in chart_logical_id, required for refresh. |
Plus all the shared curation fields.
Chart methods¶
with_payload(payload: FigureObject) -> Chart— attaches the in-process figure and returnsself.payload -> FigureObject | None— read-only property.to_llm(mode="default") -> list[dict[str, Any]]— XML text blocks: a<chart title="…">wrapper with optional<description>/<notes>, a<source_datasets>block ofsource_dataset_refs, and a<sources>block ofsource_refs.to_frontend_dict() -> dict[str, Any]— flat UI dict includingnotebook_ref,source_dataset_refs,source_refs,variable_name, and the shared identity/curation fields.save(path: str | Path) -> None— writes the.vl.jsonsnapshot. RaisesValueErrorif no payload is attached.
import altair as alt
import pandas as pd
from parsimony_agents import Chart
from parsimony_agents.execution.outputs import FigureObject
from parsimony_agents.identity import ArtifactRef, chart_logical_id
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
fig = alt.Chart(df).mark_line().encode(x="x", y="y")
nb_ref = ArtifactRef(kind="notebook", logical_id="analysis", content_sha="a" * 64)
ds_refs = [ArtifactRef(kind="dataset", logical_id="ds-lid", content_sha="b" * 64)]
logical_id = chart_logical_id(
notebook_ref=nb_ref,
chart_variable_name="trend_chart",
source_dataset_refs=ds_refs,
source_refs=[],
)
chart = Chart(
logical_id=logical_id,
title="Trend Chart",
notebook_ref=nb_ref,
source_dataset_refs=ds_refs,
variable_name="trend_chart",
).with_payload(FigureObject(value=fig))
chart.save("/tmp/trend.vl.json")
Report¶
class Report(_ArtifactBase):
type: Literal["report"] = "report"
markdown: str = ""
subtitle: str = ""
formats: list[str] = ["html"] # default_factory
live_name_pins: dict[str, ArtifactRef] = {}
# embedded_refs is a derived property, not a stored field
A user-readable markdown deliverable. On disk: .ockham/reports/<logical_id>/<content_sha>.qmd —
a valid Quarto document with a YAML frontmatter block carrying formats and pins. Because the
frontmatter participates in content_sha, changing the format list or pin map forks a new
snapshot under the same logical_id.
Report fields¶
| Field | Type | Default | Notes |
|---|---|---|---|
markdown |
str |
"" |
The report body. Must be non-empty to serialize (see snapshot_bytes). |
subtitle |
str |
"" |
Optional secondary line under the title. Empty means no subtitle is rendered. |
formats |
list[str] |
["html"] |
Output formats requested at publish time (e.g. html, pdf, pptx, dashboard, revealjs). Order is preserved on serialization. |
live_name_pins |
dict[str, ArtifactRef] |
{} |
Frozen live_name → ArtifactRef map. The body's file://./<dir>/<live_name>.<ext> URIs resolve against this snapshot-local map, so old reports stay byte-stable even when artifacts are later renamed. It is the single source of truth for embedded refs. |
Plus all the shared curation fields. Reports have no separate payload attribute — the markdown body is the payload.
Report members¶
embedded_refs -> list[ArtifactRef]— a derived (computed) property, not stored. It is derived frommarkdownpluslive_name_pins, in body order and deduped. If eithermarkdownorlive_name_pinsis empty, it returns[].to_llm(mode="default") -> list[dict[str, Any]]— XML text blocks: a<report title="…">wrapper with optional<description>/<notes>and an<embedded>block ofembedded_refs.to_frontend_dict() -> dict[str, Any]— flat UI dict includingsubtitle,formats,embedded_refs(as dicts),live_name_pins(as a{live_name: ref_dict}map), and the shared fields.snapshot_bytes() -> bytes— the canonical on-disk shape: deterministic YAML frontmatter (title + subtitle + formats + pins) followed by the body, UTF-8 encoded. This is the single source of truth for what hits disk and what gets hashed intocontent_sha. RaisesValueErrorif the markdown is empty/whitespace-only.save(path: str | Path) -> None— writessnapshot_bytes()to disk. RaisesValueErrorunlesspathends in.qmd. Parent directories are created.
from parsimony_agents import Report
from parsimony_agents.identity import ArtifactRef, report_logical_id
trend_ref = ArtifactRef(kind="chart", logical_id="trend-lid", content_sha="a" * 64)
sales_ref = ArtifactRef(kind="dataset", logical_id="sales-lid", content_sha="b" * 64)
pins = {"trend": trend_ref, "sales": sales_ref}
logical_id = report_logical_id(
embedded_refs=[trend_ref, sales_ref],
title="Q4 2025 Earnings",
)
report = Report(
logical_id=logical_id,
title="Q4 2025 Earnings",
subtitle="Revenue beat by 8%",
markdown="The trend chart shows strong growth.\n",
formats=["html", "pdf"],
live_name_pins=pins,
)
snapshot = report.snapshot_bytes() # YAML frontmatter + body
report.save("/tmp/earnings.qmd")
Round-trip the snapshot with parse_snapshot / compose_snapshot — see the
I/O functions reference.
Script and ScriptPreview¶
A Script is a workspace notebook file — a path, its Python source, and any kernel output from
the last run. Its identity is the workspace path (not a hash of inputs like the curated
artifacts). Both import from the top level:
The default notebook path is notebooks/main.py.
Script¶
class Script(BaseModel):
type: Literal["script"] = "script"
path: str = "notebooks/main.py"
code: str = ""
output: KernelOutput # default: empty KernelOutput
data_objects: list[FetchLogEntry] # default: []
| Field | Type | Notes |
|---|---|---|
path |
str |
Workspace path, e.g. notebooks/<name>.py. Defaults to notebooks/main.py. |
code |
str |
Full Python source. |
output |
KernelOutput |
Cell results from a kernel run. Set after execution; empty otherwise. |
data_objects |
list[FetchLogEntry] |
Connector fetches recorded during the run. Set after a kernel run for UI previews. |
Execution is not implicit — output and data_objects are populated only after a kernel run
(e.g. via the return_notebook / edit_notebook agent tools with execute=True).
Methods:
to_preview() -> ScriptPreview— projects to a UI-orientedScriptPreview. It scansoutput.outputsfor the first exception and surfaces its first line aserror_message, copiesdata_objects, and includesoutputonly when there is actually something to show (outputs or a fetch log).to_frontend_dict() -> dict[str, Any]— JSON-serializable form, equivalent toto_preview().model_dump(mode="json").
ScriptPreview¶
class ScriptPreview(BaseModel):
type: Literal["script_preview"] = "script_preview"
path: str = "notebooks/main.py"
code: str
error_message: str | None = None
data_objects: list[FetchLogEntry] # default: []
output: KernelOutput | None = None
ui_message: str | None = None
# steps: computed property
| Field | Type | Notes |
|---|---|---|
path |
str |
Workspace path. |
code |
str |
Python source (required). |
error_message |
str \| None |
First line of the first exception in the run, if any. |
data_objects |
list[FetchLogEntry] |
Connector fetches. |
output |
KernelOutput \| None |
Present only when there is output worth showing. |
ui_message |
str \| None |
Optional non-technical detail after > in Created/… labels (used by return_notebook, not edit_notebook). |
Computed property:
steps -> list[ScriptStepPreview]— a@computed_fieldderived fromcodeby parsing the comment/code structure into ordered steps. It serializes alongside the other fields.
from parsimony_agents import Script
from parsimony_agents.execution.outputs import KernelOutput, PrimitiveObject
script = Script(
path="notebooks/analysis.py",
code="x = 42\nprint(x)",
output=KernelOutput(
outputs=[PrimitiveObject(value=42, type="int")],
fetch_log=[],
),
)
preview = script.to_preview()
print(preview.path) # notebooks/analysis.py
print(preview.error_message) # None (no exception in outputs)
print(preview.steps) # parsed ScriptStepPreview list
See the I/O functions reference for read_notebook, serialize_notebook,
save_notebook, and the notebook-state cache helpers.
ArtifactRef and SnapshotKind¶
SnapshotKind is the union of the five artifact kinds:
ArtifactRef is a frozen dataclass — immutable after construction — that pins exactly one
content_sha of one logical artifact:
| Field | Type | Meaning |
|---|---|---|
kind |
SnapshotKind |
Which kind of artifact this references. |
logical_id |
str |
"Which artifact" — the stable identity. |
content_sha |
str |
"Which snapshot" — the version. |
Construction validates eagerly: an unsupported kind, or an empty logical_id/content_sha,
raises ValueError.
ArtifactRef members¶
workspace_file_path -> str— the workspace-relative on-disk path for this snapshot. Versioned kinds resolve to.ockham/<kind>s/<logical_id>/<content_sha>.<ext>, where the extension is.py(notebook),.parquet(dataset/data_object),.vl.json(chart), or.qmd(report). Adata_objectresolves to its immutable object-pool path (seeobject_pool_path) — addressed only bycontent_sha.to_dict() -> dict[str, str]—{"kind", "logical_id", "content_sha"}, e.g. forlog.jsonlrows or JSON wire.from_dict(data) -> ArtifactRef(classmethod) — inverse ofto_dict.from_workspace_file_path(path) -> ArtifactRef | None(classmethod) — parses a canonical.ockham/...path back into a ref. ReturnsNonefor any path outside the canonical layout, giving callers a clean miss signal.to_xml_attrs() -> str— the inline attribute fragmentkind="…" logical_id="…" content_sha="…", for composing tags that carry extra attributes.to_self_closing_tag(tag="ref") -> str— a self-closing tag<{tag} kind="…" logical_id="…" content_sha="…"/>. The defaulttag="ref"matches the generic<ref/>lineage form; pass a more specific tag ("notebook_ref","data_object_ref") where it helps the agent.
from parsimony_agents.identity import ArtifactRef
ref = ArtifactRef(kind="dataset", logical_id="sales-lid", content_sha="b" * 64)
ref.workspace_file_path # .ockham/datasets/sales-lid/bbbb…bbbb.parquet
ref.to_dict() # {"kind": "dataset", "logical_id": "sales-lid", ...}
ref.to_self_closing_tag() # <ref kind="dataset" logical_id="sales-lid" .../>
ref.to_xml_attrs() # kind="dataset" logical_id="sales-lid" content_sha="…"
roundtrip = ArtifactRef.from_workspace_file_path(ref.workspace_file_path)
assert roundtrip == ref # frozen dataclasses compare by value
Identity functions¶
All identity helpers live in parsimony_agents.identity:
from parsimony_agents.identity import (
content_sha,
notebook_content_sha,
notebook_logical_id,
dataset_logical_id,
chart_logical_id,
report_logical_id,
data_object_logical_id,
object_pool_path,
slug_from_title,
)
The dual-identity split is the load-bearing idea: a logical_id answers "which artifact"
(stable across data refreshes, derived from a recipe) and a content_sha answers "which
version" (the hash of the serialized bytes). See
Artifacts, identity & lineage for the full model.
content_sha¶
SHA-256 of blob, returned as a lowercase hex string. This is the generic snapshot-hashing
primitive used for dataset/chart/report bytes.
notebook_content_sha¶
SHA-256 of a notebook's UTF-8 source bytes, with trailing whitespace stripped so the hash is
invariant under the serialize → deserialize round-trip (on-disk files gain a trailing newline;
parsed source has it stripped). This is the canonical content_sha for a notebook snapshot — it
is not the notebook's logical_id.
notebook_logical_id¶
Derives a notebook's logical_id from its working-copy path: notebooks/foo.py → "foo".
Notebooks are special — the live name IS the logical_id, not a hash of inputs. Renaming a
notebook produces a new logical_id and a fresh log (git-style); old snapshots stay reachable
under the old name. The path must start with notebooks/, be flat (no subdirectories), and end in
.py; otherwise ValueError.
dataset_logical_id¶
def dataset_logical_id(
*,
notebook_refs: list[ArtifactRef],
variable_name: str,
source_refs: list[ArtifactRef],
) -> str: ...
Hashes a dataset's recipe: producing notebooks, variable_name, and source refs.
notebook_refs and source_refs are sorted by logical_id so call-site ordering does not
perturb identity, and the notebook logical_id (not its content_sha) is hashed — so notebook
edits append a new snapshot under the unchanged dataset logical_id rather than forking a new
artifact. Raises ValueError if variable_name is empty.
chart_logical_id¶
def chart_logical_id(
*,
notebook_ref: ArtifactRef,
chart_variable_name: str,
source_dataset_refs: list[ArtifactRef],
source_refs: list[ArtifactRef],
) -> str: ...
Hashes a chart's recipe: rendering notebook, chart_variable_name, source datasets, and source
refs. source_dataset_refs and source_refs are sorted by logical_id. Raises ValueError if
notebook_ref.kind != "notebook" or if chart_variable_name is empty.
report_logical_id¶
Hashes a report's identity: its embedded refs (sorted by logical_id) plus the title. The
title participates so that two distinct reports referencing the same artifact set get distinct
logical_ids. Raises ValueError if title is empty.
data_object_logical_id¶
Hashes a data object's provenance, excluding fetched_at and properties. The result is
stable across data refreshes: the same source plus the same parameters yields the same
logical_id, regardless of when the fetch happened or what bytes came back.
object_pool_path¶
Workspace-relative path for an immutable object-pool Parquet entry:
.ockham/objects/<sha[:2]>/<sha[2:]>.parquet. Data objects are content-addressed only — there is
no logical_id segment in the path. Raises ValueError if content_sha is shorter than 3 hex
chars.
slug_from_title¶
ASCII-folds a title into a lowercase snake_case slug: Unicode is NFKD-normalized then ASCII-folded,
non-alphanumeric runs collapse to _, leading/trailing _ is stripped, and the result is capped
at max_len (default 40). Empty or all-non-ASCII input yields "untitled". Used to derive
live_name defaults at persist time.
from parsimony_agents.identity import slug_from_title
slug_from_title("Q4 2025 Earnings") # "q4_2025_earnings"
slug_from_title("US-GDP, 2020-2024") # "us_gdp_2020_2024"
slug_from_title("ñ café") # "n_cafe" (ASCII folding)
slug_from_title(" ") # "untitled"
slug_from_title("a" * 50) # "a" * 40 (capped)
LiveNameCollisionError and VIRTUAL_LIVE_KINDS¶
LiveNameCollisionError¶
from parsimony_agents.identity import LiveNameCollisionError
class LiveNameCollisionError(Exception):
def __init__(
self,
*,
live_name: str,
existing_logical_id: str,
kind: SnapshotKind = "notebook",
) -> None: ...
Raised when a live_name already belongs to another terminal's artifact in the same workspace —
i.e. a write or refresh would silently coalesce with an artifact this terminal has never
interacted with. Three keyword-only parameters are load-bearing and exposed as attributes:
| Attribute | Meaning |
|---|---|
live_name |
The slug both halves of the agent surface share — what the agent typed, and what it must use to read. |
existing_logical_id |
The colliding artifact's logical_id. Retrying with the same live_name after reading returns this value (continuation), not a fresh slug. |
kind |
Which artifact kind collided. The seen-set is keyed on (kind, live_name). Defaults to "notebook". |
The recovery loop is encoded in the exception message: read the existing artifact first (to bring
it into this terminal's seen-set), then re-issue the write — or pick a different live_name.
VIRTUAL_LIVE_KINDS¶
Maps each virtual live-tree directory to its canonical (kind, extension) pair:
VIRTUAL_LIVE_KINDS: Final[dict[str, tuple[str, str]]] = {
"notebooks": ("notebook", ".py"),
"data": ("dataset", ".parquet"),
"charts": ("chart", ".vl.json"),
"reports": ("report", ".qmd"),
}
This is the source of truth that connects the agent-facing virtual paths (notebooks/<name>.py,
data/<name>.parquet, charts/<name>.vl.json, reports/<name>.qmd) to the canonical .ockham
storage layout. Note the agent-facing directory for datasets is data/, while the canonical
storage directory is .ockham/datasets/. It is used by resolve_virtual_entry to map a live-tree
path back to its .ockham/<kind>s/<logical_id>/<content_sha> snapshot — see the
I/O functions reference.
See also¶
- Artifacts, identity & lineage — the conceptual model.
- I/O functions reference — codecs, snapshot serialization, virtual-path resolution, and closure traversal.
- Code execution — how
KernelOutputandFetchLogEntryare produced. - Saving and loading artifacts — task-oriented guide.