BLISP lets stochastic agents propose computations while a typed execution layer grounds, canonicalizes, executes, hashes, and replays them deterministically. No unwarranted operation reaches execution. Every result is replayable by hash.
BLISP does not try to make LLMs deterministic; it makes the execution boundary deterministic.
Large language models propose computational pipelines from natural-language prompts. The operations they select may be structurally valid but semantically unwarranted—the operation exists in the system, but the user's request does not justify it. Schema validation catches malformed output. It does not catch valid-but-wrong execution.
Prompt: “Build a momentum strategy on equity futures, ranked by Sharpe ratio.”
Proposed family: MOM_REV (mean-reversion)
Proposed metric: SRP (Sharpe)
Both are valid capabilities. Schema validation passes. The pipeline executes—and produces the opposite computational signal.
The output is correct in form and exactly wrong in substance. Constrained decoding restricts the model to the full set of valid names (all 36 family×metric pairs), but not to the subset discovered for a given prompt. The grounding gate restricts proposals to that discovered subset only.
BLISP interposes a mandatory admissibility boundary—the grounding gate—between stochastic reasoning and deterministic execution. Above the boundary, agents propose. Below it, everything is deterministic, typed, and content-addressed.
A live capability registry of operations, families, signal blocks, and recipes (244 operations, 4 strategy families, 9 metrics). Each entry is hashed over semantic, algebraic, and implementation layers.
Given natural-language terms, the system matches against the live registry using a four-tier cascade: exact, alias, tag, keyword. Unresolved terms cannot reach execution.
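A minimal sketch of how such a four-tier cascade could be ordered. The `Entry` record, the registry contents, the `MOM_TREND` family name, and the matching rules inside each tier are assumptions made for illustration; only the tier order (exact, alias, tag, keyword) and the names `dlog`, `MOM_REV`, and `SRP` come from the text above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    name: str
    aliases: tuple = ()
    tags: tuple = ()
    description: str = ""

# Hypothetical registry contents, for illustration only.
REGISTRY = [
    Entry("dlog", ("log returns",), ("returns",), "first difference of log prices"),
    Entry("MOM_TREND", ("momentum",), ("trend",), "trend following family"),
    Entry("MOM_REV", ("mean reversion",), ("reversal",), "contrarian family"),
    Entry("SRP", ("sharpe", "sharpe ratio"), ("risk-adjusted",), "Sharpe ratio metric"),
]

def discover(term, registry=REGISTRY):
    """Return (name, tier) from the first tier that matches, or None if unresolved."""
    t = term.strip().lower()
    tiers = (
        ("exact", lambda e: t == e.name.lower()),
        ("alias", lambda e: t in (a.lower() for a in e.aliases)),
        ("tag", lambda e: t in (g.lower() for g in e.tags)),
        ("keyword", lambda e: t in e.description.lower().split()),
    )
    # Tiers are tried strictly in order, so a cheaper, more specific match always wins.
    for tier, matches in tiers:
        for entry in registry:
            if matches(entry):
                return entry.name, tier
    return None  # unresolved terms produce no evidence and cannot reach execution
```

With this toy registry, `discover("Sharpe ratio")` resolves through the alias tier, while a term such as `"volatility"` stays unresolved and returns `None`.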
A deterministic function that checks whether every capability name in the agent's proposal has evidence in the discovery result. Names lacking evidence are rejected.
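The gate can be pictured as a plain set-membership check. The sketch below replays the momentum example; the function name `grounding_gate`, the dictionary shape of a proposal, and the family name `MOM_TREND` are hypothetical, not BLISP's actual interface.

```python
def grounding_gate(proposal, discovered):
    """Admit a proposal only if every capability name it uses has discovery evidence.

    proposal:   mapping of slot -> capability name (assumed shape)
    discovered: set of names that discovery resolved from the user's own terms
    Returns (admitted, names_lacking_evidence).
    """
    unwarranted = sorted(set(proposal.values()) - set(discovered))
    return (not unwarranted, unwarranted)

# Evidence discovered from "momentum" and "Sharpe ratio" in the prompt.
discovered = {"MOM_TREND", "SRP"}

# Valid names, but the family is not supported by anything the user asked for.
wrong = {"family": "MOM_REV", "metric": "SRP"}
right = {"family": "MOM_TREND", "metric": "SRP"}

print(grounding_gate(wrong, discovered))
print(grounding_gate(right, discovered))
```

Because the check is a pure function of the proposal and the discovery result, the same inputs always yield the same admit or reject decision.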
Admitted proposals become typed specification records with family, metric, parameter ranges, and data source. Parameter ranges expand into a morphism grid via Cartesian product.
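As an illustration of the Cartesian-product expansion, the sketch below expands a small specification into its grid. The field names, the parameter names `lookback` and `holding`, and the family name `MOM_TREND` are invented for the example.

```python
from itertools import product

# Hypothetical specification record; field and parameter names are assumptions.
spec = {
    "family": "MOM_TREND",
    "metric": "SRP",
    "source": "equity_futures",
    "params": {"lookback": [20, 60, 120], "holding": [5, 10]},
}

def expand_grid(spec):
    """Expand parameter ranges into one parameter dict per grid point."""
    names = sorted(spec["params"])  # fixed order keeps the grid deterministic
    combos = product(*(spec["params"][n] for n in names))
    return [dict(zip(names, combo)) for combo in combos]

grid = expand_grid(spec)
print(len(grid))  # 3 lookbacks x 2 holding periods
```

Sorting the parameter names before taking the product is one way to make the grid order independent of how the specification was written.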
Expressions are parsed, normalized, canonicalized, planned, and optimized through a six-stage typed compilation pipeline. Surface syntax differences collapse to canonical identity.
Each admissible morphism executes through a typed, deterministic execution engine: same input, same registry, same output. No randomness below the boundary.
Every execution produces an 8-layer hash decomposing provenance into registry, request, morphisms, plans, artifacts, score, selection, and data. Fault localization without re-execution.
Identical grounded requests against identical data and registry produce bit-identical hashes. Compare two hashes to verify replay. Compare sub-hashes to localize divergence.
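A sketch of how sub-hash comparison could localize a divergence. The eight layer names come from the description above; the SHA-256 choice, the JSON serialization, and the way the root hash is derived from the sub-hashes are assumptions.

```python
import hashlib
import json

LAYERS = ("registry", "request", "morphisms", "plans",
          "artifacts", "score", "selection", "data")

def _digest(obj):
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

def provenance(record):
    """Hash each layer separately, then hash the ordered sub-hashes into a root."""
    layers = {name: _digest(record[name]) for name in LAYERS}
    root = _digest([layers[name] for name in LAYERS])
    return {"root": root, "layers": layers}

def diverging_layers(a, b):
    """Name the layers whose sub-hashes differ; nothing is re-executed."""
    return [name for name in LAYERS if a["layers"][name] != b["layers"][name]]

run_1 = {name: {"v": 1} for name in LAYERS}  # placeholder layer contents
run_2 = dict(run_1, data={"v": 2})           # same request, different data

p1, p2 = provenance(run_1), provenance(run_2)
print(p1["root"] == p2["root"], diverging_layers(p1, p2))
```

Comparing the two root hashes answers whether the runs match; comparing the sub-hashes answers where they stopped matching.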
Each capability is hashed over three layers: semantic properties, algebraic type signature,
and implementation details. A fourth layer—discovery metadata (aliases, tags,
descriptions)—is explicitly excluded from the identity hash. Adding an alias like
“log returns” → dlog changes what agents can discover;
it does not change what dlog computes. The registry can improve discoverability
without invalidating any prior execution hash.
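The separation between identity and discoverability can be sketched as a hash that simply never reads the discovery layer. The record layout, the layer contents, and the hashing scheme below are illustrative assumptions.

```python
import hashlib
import json

IDENTITY_LAYERS = ("semantic", "algebraic", "implementation")

def capability_hash(capability):
    """Hash only the three identity layers; discovery metadata is ignored."""
    identity = {layer: capability[layer] for layer in IDENTITY_LAYERS}
    blob = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

# Hypothetical record for dlog; the layer contents are placeholders.
dlog = {
    "semantic": {"meaning": "first difference of log values"},
    "algebraic": {"signature": "Series -> Series"},
    "implementation": {"version": "1"},
    "discovery": {"aliases": [], "tags": ["returns"]},
}
with_alias = {**dlog, "discovery": {"aliases": ["log returns"], "tags": ["returns"]}}
reimplemented = {**dlog, "implementation": {"version": "2"}}

print(capability_hash(dlog) == capability_hash(with_alias))     # alias added
print(capability_hash(dlog) == capability_hash(reimplemented))  # code changed
```

Adding the alias leaves the hash untouched, while a change to the implementation layer produces a new identity.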
BLISP is built on a five-paper research program that formalizes the execution semantics, identity algebra, provenance structure, and behavioral geometry of AI-generated computation.
A mandatory admissibility boundary between stochastic AI reasoning and deterministic execution. Proposals whose capability names lack evidence from the user's terms are rejected before execution.
A typed specification space, canonicalization pipeline, and content-addressed hashing scheme that provides execution identity independent of surface syntax.
Stochastic prompt variation defines an equivalence relation on the execution space. Prompts that produce the same canonical execution form a quotient class. Execution fibers bundle equivalent proposals.
Every execution produces a decomposable provenance record. Sub-hash comparison localizes divergence without re-execution. Drift detection isolates which semantic layer changed.
Under stochastic prompt variation, many distinct proposals collapse into few execution identities. Synonym perturbations stay intra-fiber. Metric/family substitutions produce clean inter-fiber transitions.
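The fiber picture can be sketched by grouping proposals under their canonical form. The alias table, the tuple shape of a proposal, and the family name `MOM_TREND` are hypothetical; a real system would canonicalize through the registry rather than a hard-coded dictionary.

```python
from collections import defaultdict

# Hypothetical synonym table standing in for registry-backed canonicalization.
CANONICAL_NAME = {
    "momentum": "MOM_TREND",
    "trend following": "MOM_TREND",
    "mean reversion": "MOM_REV",
    "sharpe": "SRP",
    "sharpe ratio": "SRP",
}

def canonical(proposal):
    """Map a (family term, metric term) proposal to its canonical execution."""
    family, metric = proposal
    return (CANONICAL_NAME[family.lower()], CANONICAL_NAME[metric.lower()])

def fibers(proposals):
    """Group proposals by canonical execution; each group is one fiber."""
    groups = defaultdict(list)
    for proposal in proposals:
        groups[canonical(proposal)].append(proposal)
    return dict(groups)

proposals = [
    ("momentum", "Sharpe ratio"),
    ("trend following", "sharpe"),       # synonym perturbation: same fiber
    ("mean reversion", "Sharpe ratio"),  # family substitution: different fiber
]
result = fibers(proposals)
print(len(proposals), "proposals ->", len(result), "execution identities")
```

Here three distinct proposals collapse into two execution identities: the synonym variant lands in the same fiber as the original, and the family substitution moves to a different one.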
Tool-augmented LLMs select tools directly with no admission gate between selection and execution. A valid but wrong tool call produces a silent failure. The grounding gate makes tool admission evidence-based and deterministic.
Two researchers running the same grounded request against the same data get bit-identical results. When results differ, 8-layer sub-hash comparison localizes the divergence to a specific semantic layer without re-execution.
Strategy families, scoring metrics, and parameter grids are content-addressed. Every research pipeline has a verifiable execution fingerprint. Six months later, the hash still validates.
The execution layer is domain-independent. Finance is the first package. The architecture—discovery, grounding, canonicalization, provenance—applies to any domain where AI-generated pipelines must be validated before execution.
Stochastic prompt variation generates many elements of E_R. The grounding gate Γ admits only proposals with discovery evidence. Canonicalization collapses admitted proposals into equivalence classes B_R/∼_R, each with a unique canonical representative κ_R. Execution ε_R is a function on canonical representatives, deterministic by construction. The provenance record P_R decomposes the full execution into 8 semantic layers for audit and fault localization.