Code execution¶

When an agent answers a data question, it writes Python, runs it in a stateful namespace, and observes typed outputs. The execution engine also enforces per-cell timeouts and records which inputs produced each variable so published artifacts remain traceable.

Execution has three distinct modes:

A standalone Agent uses the in-process CodeExecutor unless you pass code_executor=. This is convenient for local scripts, but it is not an isolation boundary.
create_executor(cwd=...) selects a bubblewrap-confined subprocess on supported Linux hosts. The kernel has no network or credentials and reaches bound connectors through an RPC broker in the trusted supervisor. If bubblewrap is unavailable, the factory logs a warning and returns the in-process executor.
SandboxedCodeExecutor(cwd=..., confine=False) runs a plain child process. It separates processes but does not deny network access, clear the environment, or confine the filesystem. It is a development substrate, not a security boundary.

Every implementation satisfies BaseCodeExecutor. Inspect executor.capability_tier — "namespaces", "process", or "none" — rather than assuming code is confined.

To ask Agent to select the strongest available local boundary:

from parsimony_agents import Agent, create_executor

executor = create_executor(cwd="/tmp/my_workspace")
agent = Agent(model="claude-sonnet-4-6", code_executor=executor)

from pathlib import Path

from parsimony_agents.execution import CodeExecutor, OutputFactory

workspace = Path("/tmp/my_workspace")
executor = CodeExecutor(
    cwd=str(workspace),
    output_factory=OutputFactory(local_dir=workspace),
)

CodeExecutor.execute and eval are coroutines, so the host always drives them inside an event loop:

import asyncio


async def main() -> None:
    executor = CodeExecutor(
        cwd="/tmp/my_workspace",
        output_factory=OutputFactory(local_dir="/tmp/my_workspace"),
    )
    result = await executor.execute("df = pd.DataFrame({'a': [1, 2, 3]})\ndf")
    print(result.outputs)


if __name__ == "__main__":
    asyncio.run(main())

For the full executor surface, see the execution reference. For how outputs travel through the agent loop, see How it works and Events.

The kernel namespace (`pd`, `np`, `alt`, `datetime`, connectors, `load_dataset`)¶

A single CodeExecutor owns one kernel — a persistent Python namespace (locals dict) that survives across execute() calls. Variables you define in one cell are visible in the next, exactly like cells in a Jupyter notebook sharing one runtime.

await executor.execute("import math\nradius = 3")
result = await executor.execute("area = math.pi * radius**2\narea")
# `radius` and `math` are still bound; the kernel is stateful.

Every kernel boots with a fixed base namespace so agent code never has to import the common analysis stack:

Name	Bound to
`pd`	`pandas`
`np`	`numpy`
`alt`	`altair`
`datetime`, `timedelta`, `timezone`	the corresponding classes from the `datetime` module
`load_dataset`	the dataset-loading primitive (see below)
`read_pdf_text`, `read_excel`, `read_pptx_text`	document readers (see below)
each connector (e.g. `fred`)	a name-routed `RemoteConnector` stub under the subprocess executor; a memoized bound connector under the in-process executor

Connectors are bound in the supervisor and exposed to the kernel via set_connectors, which accepts a Connectors bundle or Mapping[str, Connectors] for named bindings:

from parsimony_fred import CONNECTORS as FRED

await executor.set_connectors(FRED.bind(api_key="..."))
# or, for an explicit binding name:
await executor.set_connectors({"fred": FRED.bind(api_key="...")})

result = await executor.execute('gdp = fred["fred_fetch"](series_id="GDPC1")')

The confined subprocess ships only connector names to the kernel. The kernel builds a RemoteConnector for each — a name-routed stub with no metadata, credentials, bound arguments, or function body. Calling a stub RPCs the ConnectorBroker in the supervisor, which holds the bound connectors and runs the real fetch. The in-process executor instead injects the real connectors wrapped in the same memoizing layer. Identical calls are served from a per-kernel cache, while post-fetch hooks still run on every call so the fetch log stays truthful. The cache is a kernel-lifetime concern — set_cwd() and clear_namespace() clear it. See Connectors for the connector model itself.

clear_namespace() resets the kernel to that base state (dropping any agent-defined names), re-applies connectors, and re-runs any registered setup snippets:

await executor.execute("def my_helper(x): return x * 2")
assert "my_helper" in executor.locals

await executor.clear_namespace()
assert "my_helper" not in executor.locals
assert "pd" in executor.locals  # base namespace restored

Typed kernel outputs (`DataFrameObject`, `FigureObject`, `PrimitiveObject`, `ExceptionObject`)¶

The executor never returns raw Python objects to the agent loop. Every value a cell displays — the trailing expression, a print(), an explicit display() — is converted into a typed kernel output by the OutputFactory. There are four built-in types:

Type	Wraps	Notes
`DataFrameObject`	a pandas DataFrame	parquet-backed via a `DataframeRef`, with head/tail previews and column dtypes
`FigureObject`	an Altair chart or Vega-Lite spec dict	serializes to a spec, computes a base64 PNG for the LLM
`PrimitiveObject`	a scalar (`str`, `int`, `float`, `bool`, `None`)	also the fallthrough for anything else, via `str(value)`
`ExceptionObject`	an exception	renders a redacted traceback; typed `parsimony.errors` surface their message without a traceback

OutputFactory.from_value performs the dispatch — custom handlers first, then a built-in isinstance chain (DataFrame → DataFrameObject, Altair → FigureObject, scalar → PrimitiveObject, Exception → ExceptionObject, else str(value) → PrimitiveObject):

from parsimony_agents.execution import (
    DataFrameObject,
    FigureObject,
    PrimitiveObject,
    ExceptionObject,
)

output = await executor.get("df")  # wraps the kernel variable `df`
if isinstance(output, DataFrameObject):
    llm_blocks = output.to_llm(mode="default")  # paginated CSV text blocks

Each typed output knows how to render itself for the model via to_llm(): DataFrames and long strings paginate (see TablePaginator / StringPaginator), and figures emit a data:image/png;base64 URI. These same objects are the payloads of published artifacts — a Dataset carries a DataFrameObject, a Chart carries a FigureObject. See Artifacts for the curation layer on top.

You can teach the factory new types. OutputFactory.register is a class method; handlers are checked before the built-in chain and receive (value, *, local_dir, backend):

import polars as pl

from parsimony_agents.execution import DataFrameObject, DataframeRef, OutputFactory


def handle_polars(val, *, local_dir, backend):
    return DataFrameObject(
        ref=DataframeRef.from_pandas(
            val.to_pandas(),
            ref="polars_result",
            local_dir=local_dir,
            backend=backend,
        )
    )


OutputFactory.register(pl.DataFrame, handle_polars)

`KernelOutput` and the fetch log¶

A single execute() or eval() call returns one KernelOutput, not a bare list. It bundles two things:

outputs: list[KernelOutputType] — the typed outputs produced by the cell.
fetch_log: list[FetchLogEntry] — a record of every connector fetch observed during the run.

result = await executor.execute('gdp = fred["fred_fetch"](series_id="GDPC1")\ngdp')

for output in result.outputs:
    ...  # DataFrameObject / FigureObject / PrimitiveObject / ExceptionObject

for entry in result.fetch_log:
    print(entry.provenance, entry.row_count, entry.column_names)
    print(entry.data_object_ref)  # ArtifactRef to the persisted parquet snapshot

Each FetchLogEntry carries the fetch provenance, row_count, column_names, head/tail samples, and an optional data_object_ref (ArtifactRef) pointing at the immutable parquet snapshot persisted into the content-addressed data-object pool. The fetch log is how the agent — and you — know what raw data a cell pulled, independent of what it assigned. Post-fetch hooks (the data-object persister and the fetch logger) run on every connector call, cached or not, so the log stays truthful even when the memoizing transport serves a result from cache.

KernelOutput.to_llm() renders the whole bundle into the list of {type, text|image_url} blocks the model consumes, paginating large tables and strings per the view configuration.

Defense-in-depth: safe builtins + AST sanitizer (blocked env/subprocess access)¶

When agent code runs out-of-process under bwrap (the "namespaces" capability tier on Linux), the kernel environment is already cleared, network is unavailable, and the filesystem is restricted to the workspace. The sanitizer and safe builtins are best-effort defense-in-depth for the in-process fallback only (non-Linux self-host, "none" capability tier). They are not the boundary.

Restricted __builtins__. The exec namespace is given a curated _SAFE_BUILTINS dict rather than the full builtin set. It keeps the data-analysis primitives (types, introspection, itertools, I/O, the exception classes) and re-binds __import__ so ordinary import statements still work, but it omits the obvious foot-guns.

An AST sanitizer. Before every compile(), assert_safe_code walks the parse tree and refuses code that tries to read secrets or spawn processes. It blocks, at compile time:

os.environ in any access shape — os.environ["X"], os.environ.get(...), os.environ.copy(), etc.
os.getenv(...).
subprocess.* — any attribute access under the subprocess module.
String literals that reference /proc/<pid>/environ (the literal-pattern case).

result = await executor.execute('key = os.getenv("FRED_API_KEY")')
assert isinstance(result.outputs[0], ExceptionObject)
# message: "os.getenv is blocked in agent code (secrets are not in scope)"

Under in-process execution, agent code shares the host process's real environment, so the sanitizer makes an attempt to read credentials fail loudly instead of silently exfiltrating them. Note that import subprocess itself is permitted (it is just a name) — what fails is every attribute access on it.

There is one escape hatch for local debugging: set the environment variable OCKHAM_DISABLE_SANITIZE to 1, true, or yes and assert_safe_code becomes a no-op. Leave it unset in any deployment that runs untrusted agent-authored code.

create_executor(prefer_boundary=True) is the selection API. Pass prefer_boundary=False to request the in-process executor explicitly. There is no environment-variable switch for execution mode.

Timeouts and the daemon-thread execution model¶

Every cell runs under a wall-clock timeout. The default is DEFAULT_CELL_TIMEOUT_S, read once at import from the EXECUTOR_CELL_TIMEOUT_S environment variable and defaulting to 300 seconds. You can override it per-call with timeout_seconds:

result = await executor.execute("while True: pass", timeout_seconds=1)

assert isinstance(result.outputs[0], ExceptionObject)
assert "timeout" in result.outputs[0].value.lower()

# The executor is not wedged — it remains usable afterwards.
followup = await executor.execute("x = 42")

Mechanically, the synchronous cell body runs in a dedicated daemon thread (_run_sync_in_thread_with_timeout) while the event loop waits. On timeout the executor injects SystemExit into that thread via PyThreadState_SetAsyncExc. This is best-effort: the exception is only delivered at the next Python bytecode boundary, so a cell blocked inside a C extension (a long NumPy/pandas call, a blocking native I/O call) cannot be interrupted. In that case the timed-out daemon thread is abandoned and reaped when the process exits; the event loop and the executor itself stay responsive and ready for the next cell. A process-global lock serializes overlapping executions because the executor changes the process working directory.

Producer attribution: `OriginLedger`, `RunScope`, `VariableOrigin`¶

The kernel records who produced each variable and what data it depended on, so that when the agent publishes an artifact, the lineage is already known. This is opt-in per call: pass producer_notebook_path to execute() and the executor opens a RunScope around the run.

result = await executor.execute(
    'df = pd.DataFrame({"a": [1, 2, 3]})',
    producer_notebook_path="notebooks/analysis.py",
)

origin = await executor.get_origin("df")
print(origin.notebook_path)  # "notebooks/analysis.py"
print(origin.load_refs, origin.fetch_refs)

The pieces:

OriginLedger — an in-memory map of variable name → VariableOrigin, one per kernel lifetime. Cleared on set_cwd() / clear_namespace().
RunScope — a mutable per-run accumulator opened by scope(). As the cell runs it collects the load/fetch refs the code touched. Scopes do not nest: opening one inside another raises RuntimeError.
VariableOrigin — the frozen result. At scope exit the executor diffs the kernel namespace before and after, then stamps every newly assigned name with a VariableOrigin recording the producing notebook plus the load_refs and fetch_refs it depended on. VariableOrigin is a @dataclass(frozen=True) — once stamped, lineage is immutable.

from dataclasses import dataclass

# Conceptual shape (frozen):
# @dataclass(frozen=True)
# class VariableOrigin:
#     notebook_path: str
#     load_refs: tuple[ArtifactRef, ...] = ()
#     fetch_refs: tuple[ArtifactRef, ...] = ()

Omit producer_notebook_path (and dry_run=True runs) and no scope opens — the run leaves no lineage, which is the right behaviour for scratch evaluation. For how these origins feed the dual-identity artifact model, see Artifacts.

`load_dataset` and cross-terminal gating¶

load_dataset is the injected primitive agent code uses to pull a previously published dataset back into the kernel. It takes a single positional live name (the dataset's live_name slug), resolves it from the workspace's .ockham/datasets/*/curation.json plus log.jsonl, records the load on the open RunScope (if any), and returns the materialized DataFrame:

result = await executor.execute(
    'inflation = load_dataset("inflation_monthly")',
    producer_notebook_path="notebooks/fed_analysis.py",
)

If the slug does not match a published dataset, or is ambiguous, the call raises LoadDatasetError (a KeyError subclass) with guidance.

load_dataset also enforces a cross-terminal gate. A workspace can host several terminals, each with its own set of live names it has actually seen. When the executor is given seen_live_names (a set[tuple[str, str]] of (kind, live_name) pairs) and ("dataset", slug) is not in it, the underlying resolve_dataset_slug raises LiveNameCollisionError — the dataset belongs to a sibling terminal, and the agent must read_artifact it before loading. (The load_dataset wrapper surfaces that collision to agent code as a LoadDatasetError carrying the collision message.)

result = await executor.execute(
    'inflation = load_dataset("inflation_monthly")',
    producer_notebook_path="notebooks/fed_analysis.py",
    seen_live_names={
        ("dataset", "inflation_monthly"),
        ("notebook", "fed_analysis.py"),
    },
)
# Drop ("dataset", "inflation_monthly") from the set and the load is rejected:
# the slug belongs to a sibling terminal.

This keeps one terminal from silently reaching into another terminal's namespace by slug collision. LiveNameCollisionError is importable from parsimony_agents.identity.

Document helpers (`read_pdf_text`, `read_excel`, `read_pptx_text`)¶

Three readers are injected into the kernel so agent code can pull text and tables out of office documents without importing third-party packages itself:

# Inside agent code (all three are pre-bound in the kernel):
text = read_pdf_text("reports/q4.pdf", max_pages=5)
df = read_excel("data/sales.xlsx", sheet_name=0)
slides = read_pptx_text("deck.pptx")  # list of {"index", "text"} dicts

Helper	Returns	Backed by
`read_pdf_text(path, *, max_pages=None)`	extracted text (`str`)	`pypdf`
`read_excel(path, sheet_name=0, **kwargs)`	a pandas `DataFrame` (xlsx, `openpyxl` engine)	`openpyxl`
`read_pptx_text(path)`	`list[dict]` of per-slide text extracts	`python-pptx`

These backends are part of the optional [documents] extra (also pulled in by [all]), and each helper imports its package lazily, at call time. That means parsimony-agents installs and the executor boots fine without the extra — calling a document helper without the backend installed raises an ImportError telling you to install the documents extra, rather than failing at import time. See Installation for the extras and SQL and document inputs for the broader document-ingestion workflow.

`DataframeRef` content addressing and `StorageBackend` healing¶

A DataFrameObject does not embed its rows — it carries a DataframeRef, an immutable, content-addressed handle to a parquet file:

from parsimony_agents.execution import DataframeRef

ref = DataframeRef.from_pandas(df, ref="q4_results", local_dir="/tmp/dfo")
# ref.content_hash  — MD5 of the canonicalized data (one hash, one file)
# ref.local_path    — parquet path under the session dir
# ref.remote_key    — optional key for cross-environment healing

The model is intentionally version-free: data is immutable, so one content hash maps to exactly one parquet file. materialize_sync() reads the frame back, trying the current session-dir layout first, then the stored absolute path, then a remote download — this is the healing path that lets a DataframeRef resolve even after it has moved between environments (a saved artifact reopened on a different host, say).

Remote persistence is optional and goes through the StorageBackend protocol, which any object with upload(key, local_path) and download(key, local_path) -> bool satisfies:

from parsimony_agents.execution import set_default_backend, set_default_local_root

set_default_backend(my_backend)      # used by materialize_sync() when none passed
set_default_local_root("/var/sessions/abc")  # default local root for path resolution

set_default_backend and set_default_local_root install process-level defaults so DataframeRef.materialize_sync() can heal without every call site threading a backend through. When no backend is configured, refs resolve purely from the local session directory. For the persisted, curated form of these frames — Dataset snapshots, the .ockham layout, and the data-object pool — see Artifacts and the I/O reference.

Code execution¶

The kernel namespace (pd, np, alt, datetime, connectors, load_dataset)¶

Typed kernel outputs (DataFrameObject, FigureObject, PrimitiveObject, ExceptionObject)¶

KernelOutput and the fetch log¶

Defense-in-depth: safe builtins + AST sanitizer (blocked env/subprocess access)¶

Timeouts and the daemon-thread execution model¶

Producer attribution: OriginLedger, RunScope, VariableOrigin¶

load_dataset and cross-terminal gating¶

Document helpers (read_pdf_text, read_excel, read_pptx_text)¶

DataframeRef content addressing and StorageBackend healing¶