Deterministic execution substrate

Deterministic execution for AI‑generated research.

BLISP lets stochastic agents propose computations while a typed execution layer grounds, canonicalizes, executes, hashes, and replays them deterministically. Operations that lack discovery evidence are rejected before execution. Every result is replayable by hash.

BLISP does not try to make LLMs deterministic; it makes the execution boundary deterministic.

Prompt → Agent Proposal → Grounding Gate → Canonical Execution → 8-Layer Provenance → Replayable Result

23.3% → 10.0%: valid-but-unwarranted executions, reduced by the grounding gate
100% → 0%: unwarranted executions on undiscoverable prompts
50/50: replay runs that produced bit-identical execution hashes
<14 ms: grounding overhead per request

The problem

AI agents can reason. They cannot be trusted to execute unchecked.

Large language models propose computational pipelines from natural-language prompts. The operations they select may be structurally valid but semantically unwarranted—the operation exists in the system, but the user's request does not justify it. Schema validation catches malformed output. It does not catch valid-but-wrong execution.

Example: valid-but-unwarranted execution

User request

“Build a momentum strategy on equity futures, ranked by Sharpe ratio.”

vs

Agent proposal

Family: MOM_REV (mean-reversion)
Metric: SRP (Sharpe)

Both are valid capabilities. Schema validation passes. The pipeline executes—and produces the opposite computational signal.

The output is correct in form and exactly wrong in substance. Constrained decoding restricts the model to the full set of valid names—all 36 family×metric pairs—but not to the per-prompt discovered subset. The grounding gate restricts to discovered names only.
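
The gap between the two restrictions can be illustrated with a small sketch. It assumes the 4 families and 9 metrics reported for the registry. Every name other than MOM_REV and SRP (including MOM for the momentum family) is a hypothetical placeholder, and the discovered subset is written by hand rather than produced by real discovery.

```python
from itertools import product

# Placeholder names: only MOM_REV and SRP appear in the source text.
FAMILIES = ["MOM", "MOM_REV", "FAM_C", "FAM_D"]             # 4 strategy families
METRICS = ["SRP"] + [f"METRIC_{i}" for i in range(2, 10)]   # 9 metrics

# Constrained decoding: any pair of valid names is allowed.
valid_pairs = set(product(FAMILIES, METRICS))

# Grounding: only names discovered from this prompt are allowed.
# Assumed discovery result for "momentum ... ranked by Sharpe ratio".
discovered_pairs = set(product({"MOM"}, {"SRP"}))

proposal = ("MOM_REV", "SRP")
passes_schema = proposal in valid_pairs          # structurally valid
passes_grounding = proposal in discovered_pairs  # not warranted by the prompt
```

Under these assumptions the proposal is one of 36 valid pairs, so a valid-names check accepts it, while the per-prompt subset contains a single pair and rejects it.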

System design

The missing boundary between proposal and execution.

BLISP interposes a mandatory admissibility boundary—the grounding gate—between stochastic reasoning and deterministic execution. Above the boundary, agents propose. Below it, everything is deterministic, typed, and content-addressed.

01 Registry

A live capability registry (244 operations, 4 strategy families, 9 metrics) that holds operations, families, signal blocks, and recipes. Each entry is hashed over semantic, algebraic, and implementation layers.

02 Discovery

Given natural-language terms, the system matches against the live registry using a four-tier cascade: exact, alias, tag, keyword. Unresolved terms cannot reach execution.
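
A minimal sketch of such a cascade follows, assuming each registry entry carries a name, aliases, tags, and a free-text description. The `Entry` type, the `discover` function, and the entry contents (apart from the dlog alias "log returns") are hypothetical and only illustrate the tier ordering.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    name: str
    aliases: frozenset = frozenset()
    tags: frozenset = frozenset()
    description: str = ""

# Placeholder registry content for illustration only.
REGISTRY = [
    Entry("dlog", frozenset({"log returns"}), frozenset({"returns"}),
          "first difference of log prices"),
    Entry("SRP", frozenset({"sharpe ratio", "sharpe"}), frozenset({"risk-adjusted"}),
          "sharpe ratio performance metric"),
]

def discover(term, registry=REGISTRY):
    """Return (tier, matching names) from the first tier that matches."""
    t = term.strip().lower()
    tiers = [
        ("exact", lambda e: e.name.lower() == t),
        ("alias", lambda e: t in e.aliases),
        ("tag", lambda e: t in e.tags),
        ("keyword", lambda e: t in e.description.lower().split()),
    ]
    for tier, matches in tiers:
        hits = [e.name for e in registry if matches(e)]
        if hits:
            return tier, hits
    return None, []  # unresolved: the term yields no evidence
```

In this sketch `discover("Log Returns")` resolves at the alias tier, and a term such as "volatility" resolves at no tier, so it contributes no evidence downstream.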

03 Grounding Gate

A deterministic function that checks whether every capability name in the agent's proposal has evidence in the discovery result. Names lacking evidence are rejected.
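
As a sketch, the gate can be written as a pure function over the proposal and the discovery result. The function name, the return shape, and the evidence mapping (capability name to the user terms that matched it) are assumptions made for illustration, as is the family name MOM.

```python
def grounding_gate(proposal_names, evidence):
    """Reject any capability name that has no discovery evidence.

    `evidence` maps capability name -> list of user terms that matched it.
    """
    admitted = [name for name in proposal_names if evidence.get(name)]
    rejected = [name for name in proposal_names if not evidence.get(name)]
    return {"ok": not rejected, "admitted": admitted, "rejected": rejected}

# Grounded prompt: both names are backed by user terms.
grounded = grounding_gate(
    ["MOM", "SRP"],
    {"MOM": ["momentum"], "SRP": ["sharpe ratio"]},
)

# Undiscoverable prompt: discovery found nothing, so nothing is admitted.
undiscoverable = grounding_gate(["MOM_REV", "SRP"], {})
```

Because the function has no state and no randomness, the same proposal and the same discovery result always give the same decision.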

04 Specification

Admitted proposals become typed specification records with family, metric, parameter ranges, and data source. Parameter ranges expand into a morphism grid via Cartesian product.
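
The grid expansion can be sketched with a Cartesian product over the parameter ranges. The record layout, the parameter names (`lookback`, `holding`), and the data source label are hypothetical.

```python
from itertools import product

# Hypothetical specification record; field and parameter names are assumptions.
spec = {
    "family": "MOM",
    "metric": "SRP",
    "data_source": "equity_futures",
    "param_ranges": {"lookback": [20, 60, 120], "holding": [5, 10]},
}

def morphism_grid(spec):
    """Expand parameter ranges into one parameter dict per grid point."""
    names = sorted(spec["param_ranges"])            # fixed order keeps the grid stable
    values = [spec["param_ranges"][n] for n in names]
    return [dict(zip(names, combo)) for combo in product(*values)]

grid = morphism_grid(spec)
```

Sorting the parameter names before taking the product means the grid order does not depend on how the ranges were written in the proposal.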

05 Canonicalization

Expressions are parsed, normalized, canonicalized, planned, and optimized through a six-stage typed compilation pipeline. Surface syntax differences collapse to canonical identity.
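
To illustrate how surface differences can collapse to one identity, the sketch below applies a single normalization, ordering the arguments of commutative operators, and then hashes the result. It is not the six-stage pipeline; the expression encoding and the operator names (`add`, `mul`, `sub`) are assumptions.

```python
import hashlib
import json

COMMUTATIVE = {"add", "mul"}  # hypothetical operator names

def canonicalize(expr):
    """Recursively sort the arguments of commutative operators."""
    if not isinstance(expr, (list, tuple)):
        return expr
    op, *args = expr
    args = [canonicalize(a) for a in args]
    if op in COMMUTATIVE:
        args = sorted(args, key=json.dumps)  # serialized form gives a total order
    return [op, *args]

def identity_hash(expr):
    """Hash the canonical form, not the surface form."""
    payload = json.dumps(canonicalize(expr), separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

a = identity_hash(("add", "x", ("mul", "y", 2)))
b = identity_hash(["add", ["mul", 2, "y"], "x"])  # same meaning, different surface
c = identity_hash(("sub", "x", "y"))
d = identity_hash(("sub", "y", "x"))              # order matters for sub
```

Here `a` and `b` agree because both expressions reduce to the same canonical form, while `c` and `d` stay distinct.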

06 Execution

Each admissible morphism executes through a typed deterministic execution engine. Deterministic: same input, same registry, same output. No randomness below the boundary.

07 Provenance

Every execution produces an 8-layer hash decomposing provenance into registry, request, morphisms, plans, artifacts, score, selection, and data. Fault localization without re-execution.
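
One way to picture the decomposition is a record with one sub-hash per layer and a top-level hash over the sub-hashes. The layer names come from the text above; the use of SHA-256, the JSON serialization, and the record shape are assumptions for this sketch.

```python
import hashlib
import json

LAYERS = ("registry", "request", "morphisms", "plans",
          "artifacts", "score", "selection", "data")

def _digest(obj):
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def provenance_record(layers):
    """Hash each layer separately, then hash the ordered sub-hashes."""
    missing = [name for name in LAYERS if name not in layers]
    if missing:
        raise ValueError(f"missing layers: {missing}")
    sub = {name: _digest(layers[name]) for name in LAYERS}
    return {"sub_hashes": sub, "execution_hash": _digest([sub[n] for n in LAYERS])}

# Placeholder layer contents.
base = {name: {"placeholder": name} for name in LAYERS}
changed = dict(base, data={"placeholder": "other dataset"})

rec_base = provenance_record(base)
rec_changed = provenance_record(changed)
differing = [n for n in LAYERS
             if rec_base["sub_hashes"][n] != rec_changed["sub_hashes"][n]]
```

With this layout a change in one layer alters that layer's sub-hash and the top-level hash, and leaves the other seven sub-hashes untouched.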

08 Replay

Identical grounded requests against identical data and registry produce bit-identical hashes. Compare two hashes to verify replay. Compare sub-hashes to localize divergence.
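
The comparison itself can be sketched as follows, assuming each run is summarized by a top-level hash plus per-layer sub-hashes. The record shape is an assumption, the digests are shortened placeholders, and only three of the eight layers are shown for brevity.

```python
def localize_divergence(run_a, run_b):
    """Return the layers whose sub-hashes differ; an empty list means replay verified."""
    if run_a["execution_hash"] == run_b["execution_hash"]:
        return []
    return [layer for layer, digest in run_a["sub_hashes"].items()
            if run_b["sub_hashes"].get(layer) != digest]

# Placeholder digests, not real hashes.
original = {"execution_hash": "aa11",
            "sub_hashes": {"registry": "r1", "request": "q1", "data": "d1"}}
replay = {"execution_hash": "aa11",
          "sub_hashes": {"registry": "r1", "request": "q1", "data": "d1"}}
drifted = {"execution_hash": "bb22",
           "sub_hashes": {"registry": "r1", "request": "q1", "data": "d2"}}

same = localize_divergence(original, replay)
diff = localize_divergence(original, drifted)
```

No computation is re-run in either call; the comparison works on stored hashes only.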

Design principle

Description/identity separation.

Each capability is hashed over three layers: semantic properties, algebraic type signature, and implementation details. A fourth layer—discovery metadata (aliases, tags, descriptions)—is explicitly excluded from the identity hash. Adding an alias like “log returns” → dlog changes what agents can discover; it does not change what dlog computes. The registry can improve discoverability without invalidating any prior execution hash.
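
A short sketch of this separation, assuming each registry entry is a mapping with the three identity layers plus a `discovery` block. The field contents, SHA-256, and the JSON serialization are illustrative assumptions.

```python
import copy
import hashlib
import json

IDENTITY_LAYERS = ("semantic", "algebraic", "implementation")

def capability_hash(entry):
    """Hash only the identity layers; discovery metadata is left out on purpose."""
    identity = {layer: entry[layer] for layer in IDENTITY_LAYERS}
    payload = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Hypothetical entry for dlog; every field value here is a placeholder.
dlog = {
    "semantic": {"kind": "transform"},
    "algebraic": "Series -> Series",
    "implementation": {"version": 1},
    "discovery": {"aliases": [], "tags": ["returns"]},
}

with_alias = copy.deepcopy(dlog)
with_alias["discovery"]["aliases"].append("log returns")  # discoverability change

reimplemented = copy.deepcopy(dlog)
reimplemented["implementation"]["version"] = 2            # identity change
```

Adding the alias leaves `capability_hash` unchanged, whereas the implementation change produces a new hash.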


Research Program
Five papers

The scientific backbone.

BLISP is built on a five-paper research program that formalizes the execution semantics, identity algebra, provenance structure, and behavioral geometry of AI-generated computation.

Paper 1

The Grounding Gate

A mandatory admissibility boundary between stochastic AI reasoning and deterministic execution. Proposals whose capability names lack evidence from the user's terms are rejected before execution.

F3 (valid-but-unwarranted) rate: 23.3% → 10.0% (p = 0.027)
Unwarranted executions on undiscoverable prompts: 100% → 0%

Paper 2 (forthcoming)

Canonical Execution Semantics

A typed specification space, canonicalization pipeline, and content-addressed hashing scheme that provides execution identity independent of surface syntax.

8-layer provenance decomposition
50/50 bit-identical replays

Paper 3

Execution Categories

Stochastic prompt variation defines an equivalence relation on the execution space. Prompts that produce the same canonical execution form a quotient class. Execution fibers bundle equivalent proposals.

Execution classes as categorical quotient objects

Paper 4

Provenance Algebra

Every execution produces a decomposable provenance record. Sub-hash comparison localizes divergence without re-execution. Drift detection isolates which semantic layer changed.

8 semantic layers · fault localization by sub-hash

Paper 5

Execution Fibers

Under stochastic prompt variation, many distinct proposals collapse into few execution identities. Synonym perturbations stay intra-fiber. Metric/family substitutions produce clean inter-fiber transitions.

2,200 proposals · synonym ≈ intra-fiber
metric/family swap = inter-fiber
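
The fiber idea can be pictured as grouping proposals by their canonical identity. In this sketch the synonym table, the family name MOM, and the proposal encoding are hypothetical; it is a toy illustration, not the experimental setup behind the 2,200 proposals.

```python
from collections import defaultdict

# Hypothetical surface term -> canonical name table.
SYNONYMS = {
    "momentum": "MOM", "trend following": "MOM",
    "mean reversion": "MOM_REV",
    "sharpe": "SRP", "sharpe ratio": "SRP",
}

def canonical_identity(proposal):
    family, metric = proposal
    return (SYNONYMS[family.lower()], SYNONYMS[metric.lower()])

def fibers(proposals):
    """Group proposals that share one canonical execution identity."""
    groups = defaultdict(list)
    for p in proposals:
        groups[canonical_identity(p)].append(p)
    return dict(groups)

result = fibers([
    ("momentum", "sharpe"),
    ("Trend Following", "Sharpe ratio"),   # synonym change: same fiber
    ("mean reversion", "sharpe"),          # family swap: different fiber
])
```

Three surface proposals collapse into two identities here: the synonym variant joins the first fiber, and the family substitution opens a second one.
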
Why it matters

Different audiences, one execution problem.

For AI research

Agents need execution substrates, not just tool APIs.

Tool-augmented LLMs select tools directly with no admission gate between selection and execution. A valid but wrong tool call produces a silent failure. The grounding gate makes tool admission evidence-based and deterministic.

For research

Computations must be replayable, comparable, and attributable.

Two researchers running the same grounded request against the same data get bit-identical results. When results differ, 8-layer sub-hash comparison localizes the divergence to a specific semantic layer without re-execution.

For finance

Systematic research needs deterministic provenance from prompt to portfolio.

Strategy families, scoring metrics, and parameter grids are content-addressed. Every research pipeline has a verifiable execution fingerprint. Six months later, the hash still validates.

For infrastructure

BLISP turns agent outputs into typed, admissible, content-addressed executions.

The execution layer is domain-independent. Finance is the first package. The architecture—discovery, grounding, canonicalization, provenance—applies to any domain where AI-generated pipelines must be validated before execution.

Infrastructure thesis

Why this can become infrastructure.

BLISP does not make the model truthful. It prevents unwarranted proposals from silently becoming executions. The model reasons stochastically. The execution layer operates deterministically. The boundary between them is the contribution.

Formal structure

The execution pipeline, formally.

$$E_R \xrightarrow{\;\Gamma\;} B_R/{\sim_R} \longrightarrow \kappa_R \longrightarrow \varepsilon_R \longrightarrow P_R$$

$E_R$: stochastic proposal space (all agent-generated proposals)
$\Gamma$: grounding gate (rejects proposals without discovery evidence)
$B_R/{\sim_R}$: execution identity (equivalence classes under canonicalization)
$\kappa_R$: canonical representative (one expression per equivalence class)
$\varepsilon_R$: deterministic execution (same canonical input, same output)
$P_R$: 8-layer provenance record (decomposable, content-addressed)

Stochastic prompt variation generates many elements of $E_R$. The grounding gate $\Gamma$ admits only proposals with discovery evidence. Canonicalization collapses admitted proposals into equivalence classes $B_R/{\sim_R}$, each with a unique canonical representative $\kappa_R$. Execution $\varepsilon_R$ is a function on canonical representatives and is deterministic by construction. The provenance record $P_R$ decomposes the full execution into 8 semantic layers for audit and fault localization.