
SpineFrame Specification

Authoritative technical specification for SpineFrame v0.1.0. Last updated: March 2026.

Overview

SpineFrame is a checkpointable, replayable, forkable DAG execution kernel for structured AI workflows. It is model-agnostic, framework-agnostic, and treats provenance as a first-class concern.

Architecture Diagram

                           ┌──────────────────────────┐
                           │       CLI / Web UI       │
                           │  (Click / FastAPI+React) │
                           └────────────┬─────────────┘
                                        │
                          ┌─────────────▼─────────────┐
                          │          Engine           │
                          │  run/resume/replay/fork   │
                          │  budget controls          │
                          │  dynamic replanning       │
                          │  graph expansion          │
                          └────┬───────┬─────────┬────┘
                               │       │         │
               ┌───────────────▼─┐  ┌──▼────┐  ┌─▼──────────────────┐
               │  Trust Layer    │  │Stages │  │  Assurance Layer   │
               │                 │  │       │  │                    │
               │ Artifact        │  │ 53    │  │ Tool Policy        │
               │  Isolation      │  │ types │  │ Domain Assertions  │
               │ Graph Hash      │  │       │  │ Signed Provenance  │
               │  Validation     │  │ 3     │  │ Identity & Actor   │
               │                 │  │ pipes │  │  Tracking          │
               └───────┬─────────┘  └──┬────┘  └─────────┬──────────┘
                       │               │                 │
                       └───────┬───────┘                 │
                               │                         │
               ┌───────────────▼─────────────────────────▼──┐
               │            Artifacts & Events              │
               │  Content-addressed JSON (SHA-256)          │
               │  Append-only JSONL event log               │
               │  Immutable checkpoints                     │
               │  Merkle hash provenance chain              │
               └───────────────┬────────────────────────────┘
                               │
               ┌───────────────▼────────────────────────────┐
               │           Provider Layer                   │
               │                                            │
               │  Model: Anthropic, OpenAI (lazy imports)   │
               │  Search: Tavily, Serper, SearXNG, fallback │
               │  Fetch:  HTTP (stdlib)                     │
               │  Extract: Basic HTML (stdlib)              │
               │  Tool:   MCP (stdio + SSE transport)       │
               └────────────────────────────────────────────┘

Data Flow (Research Pipeline Example)

User Query
    │
    ▼
clarify_query ──► plan_query ──► search_web ──► refine_search
                                  (expand)       │
                                  :plan          │ sufficient?
                                  :batch_000     │ no → inject search_r2
                                  :batch_001     │ yes → continue
                                  :collect       ▼
                               fetch_pages ──► extract_chunks ──► extract_claims
                                (expand,        (expand,          (expand,
                                concurrent)     concurrent)       concurrent)
                                    │               │                 │
                                    ▼               ▼                 ▼
                               verify_claims ──► map_claims ──► synthesize
                                                                    │
                                                                    ▼
                                                              Final Report
                                                            + Provenance Chain
                                                            + Signed Bundle

Assurance Flow

Stage Execution
    │
    ├──► Artifact Isolation Check (read/write)
    │       └── pass/warn/fail
    │
    ├──► Tool Policy Check (per tool call)
    │       └── allow/deny + event log
    │
    ├──► Domain Assertions (post-stage)
    │       └── pass/warn/fail/approval_required
    │
    ├──► Event Emission (actor + timestamp)
    │       └── append to events.jsonl
    │
    └──► Provenance Hash Update
            └── merkle chain extension

Core Concepts

Run

A single pipeline execution. Each run gets a unique ID (12-char hex) and a directory under runs_dir/ containing all state.

Run directory layout:

runs/{run_id}/
  run.json              # RunManifest (query, pipeline, status, timestamps)
  graph.json            # GraphDefinition (DAG of StageNodes)
  events.jsonl          # Append-only event log
  artifacts/            # Stage output files (JSON, JSONL, HTML, Markdown)
  stages/{stage_id}/    # Per-stage metadata (meta.json)
  checkpoints/          # Immutable snapshots
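As a concrete sketch of creating this layout (create_run_dir is a hypothetical helper for illustration, not the actual SpineFrame API):

```python
import json
import time
import uuid
from pathlib import Path

def create_run_dir(runs_dir: Path, query: str, pipeline: str) -> Path:
    """Create the on-disk layout for a new run (illustrative sketch)."""
    run_id = uuid.uuid4().hex[:12]               # 12-char hex run ID
    run_dir = runs_dir / run_id
    for sub in ("artifacts", "stages", "checkpoints"):
        (run_dir / sub).mkdir(parents=True, exist_ok=True)
    manifest = {
        "run_id": run_id,
        "query": query,
        "pipeline": pipeline,
        "created_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "status": "created",
    }
    (run_dir / "run.json").write_text(json.dumps(manifest, indent=2))
    (run_dir / "events.jsonl").touch()           # append-only event log starts empty
    return run_dir
```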

Run statuses: created | running | completed | failed | forked | budget_exceeded | cancelled

Stage

A unit of work in the DAG. Each stage has:

  • stage_id -- unique identifier within the graph
  • stage_type -- maps to a registered Stage class via @register_stage("type")
  • depends_on -- list of stage_ids that must complete first
  • input_artifacts / output_artifacts -- declared artifact names
  • phase -- logical grouping for UI display (e.g. "Planning", "Search")

Stage statuses: pending | running | success | failure

Artifact

A file produced by a stage, stored in artifacts/. Every artifact has metadata:

{
  "schema_version": "1",
  "generated_at": "2026-03-07T...",
  "upstream_stage": "plan_query",
  "sha256": "a1b2c3..."
}

Content-addressed via SHA-256. Resume compares input hashes to skip unchanged stages.
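The content-addressing scheme can be sketched as follows; write_artifact and the .meta.json sidecar name are illustrative assumptions, not the documented API:

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def write_artifact(artifacts_dir: Path, name: str, payload: dict, upstream_stage: str) -> dict:
    """Write a JSON artifact plus metadata following the fields shown above."""
    body = json.dumps(payload, sort_keys=True).encode()
    meta = {
        "schema_version": "1",
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "upstream_stage": upstream_stage,
        "sha256": hashlib.sha256(body).hexdigest(),   # content address of the artifact
    }
    (artifacts_dir / name).write_bytes(body)
    (artifacts_dir / (name + ".meta.json")).write_text(json.dumps(meta))
    return meta
```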

Event

An entry in the append-only events.jsonl log:

{
  "timestamp": "2026-03-07T...",
  "event_type": "stage_completed",
  "stage_id": "plan_query",
  "data": {"status": "success", "cost_estimate": 0.015}
}

Event types: run_started, run_completed, run_failed, stage_started, stage_completed, stage_failed, stage_skipped, budget_exceeded, loop_sufficient, loop_max_reached, run_forked, graph_expanded, artifact_access_denied, tool_policy_decision, domain_policy_evaluated, provenance_signed
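A minimal sketch of event emission (emit_event is a hypothetical helper; field names follow the example above):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def emit_event(events_path: Path, event_type: str, stage_id: str = "", **data) -> dict:
    """Append one event to events.jsonl, one JSON object per line."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": event_type,
        "stage_id": stage_id,
        "data": data,
    }
    with events_path.open("a") as fh:   # append-only: history is never rewritten
        fh.write(json.dumps(event) + "\n")
    return event
```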

Data Models

StageNode

stage_id: str                    # unique within graph
stage_type: str                  # maps to @register_stage class
depends_on: list[str]            # upstream stage_ids
input_artifacts: list[str]       # declared input artifact names
output_artifacts: list[str]      # declared output artifact names
expandable: bool                 # can split into plan/batch/collect sub-stages
parent_stage: str                # for sub-stages: parent's stage_id
batch_index: int                 # for batch sub-stages: 0-based index (-1 = not a batch)
concurrent_batches: bool         # batch sub-stages can run in parallel
max_concurrent: int              # cap concurrent batches (0 = use engine's max_workers)
input_artifact_patterns: list[str]   # glob patterns (e.g. ["page_*.html"])
output_artifact_patterns: list[str]  # glob patterns
max_rounds: int                  # >0 enables dynamic replanning
phase: str                       # UI grouping (e.g. "Planning", "Search")

GraphDefinition

A list of StageNodes forming a DAG. Serialized to graph.json.

Validation rules:

  • No duplicate stage_ids
  • All depends_on references must exist in the graph
  • Topological sort must succeed (no cycles)
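These three rules can be checked with a standard topological sort (Kahn's algorithm). The sketch below operates on plain dicts rather than the real StageNode model:

```python
def validate_graph(nodes: list[dict]) -> list[str]:
    """Validate the rules above; return stage_ids in topological order.

    Each node is a dict with 'stage_id' and 'depends_on' (illustrative shape).
    """
    ids = [n["stage_id"] for n in nodes]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate stage_ids")
    id_set = set(ids)
    for n in nodes:
        for dep in n["depends_on"]:
            if dep not in id_set:
                raise ValueError(f"unknown dependency: {dep}")
    # Kahn's algorithm: repeatedly emit nodes whose dependencies are all emitted
    order, remaining = [], {n["stage_id"]: set(n["depends_on"]) for n in nodes}
    while remaining:
        ready = [sid for sid, deps in remaining.items() if not deps]
        if not ready:
            raise ValueError("cycle detected")
        for sid in ready:
            order.append(sid)
            del remaining[sid]
        for deps in remaining.values():
            deps.difference_update(ready)
    return order
```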

RunManifest

run_id: str           # 12-char hex ID (truncated UUID)
query: str            # user's original query
pipeline: str         # pipeline name (e.g. "research")
created_at: str       # ISO 8601 timestamp
status: str           # run status
parent_run_id: str    # set on forked runs
fork_stage: str       # stage the fork branched from
instructions: str     # user instructions to guide LLM stages

Provenance Models

Source:       source_id, url, title, snippet, fetched_at, content_hash, search_score
Chunk:        chunk_id, source_id, text, start_offset, end_offset, content_hash
Claim:        claim_id, text, chunk_ids, confidence, stage_id, content_hash
EvidenceMap:  claim_id, supporting_chunks, source_ids, provenance_hash

Provenance hash chain:

  • chunk.content_hash = SHA-256(source_id + "|" + text)
  • claim.content_hash = SHA-256(text + sorted chunk_id:chunk_hash pairs)
  • evidence.provenance_hash = SHA-256(claim_hash + sorted chunk_hashes + sorted source_hashes)
  • root_hash = SHA-256(sorted provenance_hashes)

Validation: validate_provenance() recomputes all hashes and checks integrity. Exposed via spineframe verify CLI and /api/runs/{id}/provenance/verify endpoint.
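The chain can be reproduced with hashlib. The exact concatenation and encoding below are assumptions; only the structure follows the formulas above:

```python
import hashlib

def h(s: str) -> str:
    """SHA-256 of a UTF-8 string, hex-encoded."""
    return hashlib.sha256(s.encode()).hexdigest()

def chunk_hash(source_id: str, text: str) -> str:
    return h(source_id + "|" + text)

def claim_hash(text: str, chunks: dict[str, str]) -> str:
    """chunks maps chunk_id -> chunk content_hash; pairs sorted for determinism."""
    pairs = sorted(f"{cid}:{ch}" for cid, ch in chunks.items())
    return h(text + "".join(pairs))

def provenance_hash(claim_h: str, chunk_hashes: list[str], source_hashes: list[str]) -> str:
    return h(claim_h + "".join(sorted(chunk_hashes)) + "".join(sorted(source_hashes)))

def root_hash(provenance_hashes: list[str]) -> str:
    return h("".join(sorted(provenance_hashes)))
```

Because every level sorts its inputs, recomputing the chain over the same artifacts always yields the same root, while any tampered chunk changes every hash above it.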

BudgetConfig

max_cost_usd: float       # 0 = unlimited
max_stages: int            # 0 = unlimited
max_time_seconds: float    # 0 = unlimited
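A sketch of how such limits might be checked before each stage (budget_exceeded is illustrative; the engine's actual enforcement point is not specified here):

```python
import time
from dataclasses import dataclass

@dataclass
class BudgetConfig:
    max_cost_usd: float = 0.0       # 0 = unlimited
    max_stages: int = 0             # 0 = unlimited
    max_time_seconds: float = 0.0   # 0 = unlimited

def budget_exceeded(b: BudgetConfig, cost_so_far: float,
                    stages_done: int, started_at: float):
    """Return the name of the limit that tripped, or None."""
    if b.max_cost_usd and cost_so_far >= b.max_cost_usd:
        return "max_cost_usd"
    if b.max_stages and stages_done >= b.max_stages:
        return "max_stages"
    if b.max_time_seconds and time.monotonic() - started_at >= b.max_time_seconds:
        return "max_time_seconds"
    return None
```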

Engine Operations

run

Creates a new run directory, writes manifest and graph, executes stages in topological order. Expandable stages are expanded at execution time (plan_batches -> graph expansion -> batch execution -> collect).

resume

Loads an existing run. For each stage, computes SHA-256 of current input artifacts and compares to the hash recorded at last execution. Skips stages with unchanged inputs.
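The skip decision can be sketched as a fingerprint comparison (input_fingerprint and should_skip are hypothetical names):

```python
import hashlib
from pathlib import Path

def input_fingerprint(paths: list) -> str:
    """Combined SHA-256 over a stage's input artifacts, in a stable order."""
    digest = hashlib.sha256()
    for p in sorted(paths):
        digest.update(p.name.encode())
        digest.update(hashlib.sha256(p.read_bytes()).digest())
    return digest.hexdigest()

def should_skip(paths: list, recorded) -> bool:
    """Skip a stage iff its inputs hash to what was recorded at last execution."""
    return recorded is not None and input_fingerprint(paths) == recorded
```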

replay

Deletes all stage metadata and artifacts downstream of the specified stage, then re-executes from that point. Upstream artifacts are preserved.

fork

Copies the entire run directory to a new run ID. Clears downstream stage metadata and artifacts from from_stage. Optionally sets a new query and/or instructions. The forked run can then be resumed or replayed.

inspect

Returns the current state of a run: stages with status/cost/timing, checkpoints list, manifest metadata.

Expandable Stages

Stages with expandable=True split into sub-stages at execution time:

  1. The stage's plan_batches() method returns a BatchPlan
  2. The engine replaces the expandable node with: {id}:plan -> [{id}:{batch_id}, ...] -> {id}:collect
  3. Sub-stages have parent_stage set to the original stage_id
  4. Batch sub-stages can run concurrently if concurrent_batches=True
  5. Sub-stage IDs use : separator (e.g. fetch_pages:batch_000)
  6. Sub-stages inherit phase from their parent

After expansion, the graph is re-persisted. All engine mechanics (skip, resume, replay, fork) apply to individual sub-stages.
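Steps 1-6 can be sketched as a pure graph transformation (expand_node and the plain-dict node shape are illustrative, not the engine's real code):

```python
def expand_node(graph: list[dict], stage_id: str, batch_ids: list[str]) -> list[dict]:
    """Replace one expandable node with {id}:plan -> batches -> {id}:collect."""
    node = next(n for n in graph if n["stage_id"] == stage_id)
    phase = node.get("phase", "")                     # sub-stages inherit phase
    plan = {"stage_id": f"{stage_id}:plan", "depends_on": node["depends_on"],
            "parent_stage": stage_id, "phase": phase}
    batches = [{"stage_id": f"{stage_id}:{bid}", "depends_on": [plan["stage_id"]],
                "parent_stage": stage_id, "batch_index": i, "phase": phase}
               for i, bid in enumerate(batch_ids)]
    collect = {"stage_id": f"{stage_id}:collect",
               "depends_on": [b["stage_id"] for b in batches],
               "parent_stage": stage_id, "phase": phase}
    out = [n for n in graph if n["stage_id"] != stage_id]
    # rewire downstream consumers to depend on the collect sub-stage
    for n in out:
        n["depends_on"] = [collect["stage_id"] if d == stage_id else d
                           for d in n["depends_on"]]
    return out + [plan] + batches + [collect]
```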

Dynamic Replanning

Stages with max_rounds > 0 trigger adaptive replanning:

  1. After execution, the engine reads the stage's output artifact
  2. If sufficient: false in the output and round < max_rounds:
    • Injects new {base}_r{N}_search and {base}_r{N} stages into the DAG
    • Rewires downstream dependencies
    • Re-persists the graph
  3. Loop round IDs use _r{N} suffix (e.g. refine_search_r2, refine_search_r3)
  4. Injected stages inherit phase from the original stage
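The loop decision can be sketched as follows (next_round_stages is a hypothetical name; the DAG rewiring itself is omitted):

```python
def next_round_stages(base: str, round_n: int, output: dict, max_rounds: int) -> list[str]:
    """Return the stage IDs to inject for the next round, or [] to stop.

    Stops when the stage reports sufficient results or max_rounds is reached.
    """
    if output.get("sufficient", True) or round_n >= max_rounds:
        return []
    n = round_n + 1
    return [f"{base}_r{n}_search", f"{base}_r{n}"]
```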

Stage Isolation

Each stage declares its artifact access via input_artifacts, output_artifacts, and glob patterns. The engine enforces access control via _check_access() with fnmatch:

  • Permissive mode (default): warns on undeclared access
  • Strict mode: raises errors on undeclared access
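A sketch of the access check (check_access mirrors the described behavior of _check_access but is not the real implementation):

```python
from fnmatch import fnmatch

def check_access(name: str, declared: list, patterns: list, strict: bool = False) -> bool:
    """Allow an artifact access iff its name is declared or matches a glob pattern."""
    allowed = name in declared or any(fnmatch(name, p) for p in patterns)
    if not allowed:
        if strict:   # strict mode: undeclared access is an error
            raise PermissionError(f"undeclared artifact access: {name}")
        print(f"warning: undeclared artifact access: {name}")   # permissive mode
    return allowed
```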

Research Pipeline

The reference implementation is an 11-stage web research pipeline:

Phase         Stage            Type                   Expandable
Planning      clarify_query    clarify_query          No
Planning      plan_query       plan_query             No
Search        search_web       search_web             Yes
Search        refine_search    refine_search          No (max_rounds=3)
Search        search_web_r2    search_web_r2          Yes
Collection    fetch_pages      fetch_pages            Yes (concurrent)
Extraction    extract_chunks   extract_chunks         Yes (concurrent)
Extraction    extract_claims   extract_claims         Yes (concurrent, max_concurrent=4)
Verification  verify_claims    verify_claims          No
Verification  map_claims       map_claims_to_sources  No
Synthesis     synthesize       synthesize_report      No

Fast pipeline (research_fast): skips extract_claims, verify_claims, map_claims -- goes directly from extract_chunks to synthesize_direct. No provenance chain.

28 registered research stage types (11 main stages + plan/batch/collect variants for expandable stages + synthesize_direct). 53 total registered stage types across all pipelines.

OSINT Pipeline

A 10-stage domain reconnaissance pipeline with 13 stdlib-only source clients (zero API keys required):

Phase       Stage             Type                    Expandable
Recon       target_parse      osint_target_parse      No
Recon       dns_recon         osint_dns_recon         No
Recon       whois_rdap        osint_whois_rdap        No
Enrichment  subdomain_enum    osint_subdomain_enum    Yes
Enrichment  infrastructure    osint_infrastructure    Yes (concurrent)
Enrichment  web_archive       osint_web_archive       No
Enrichment  tech_fingerprint  osint_tech_fingerprint  No
Identity    identity_recon    osint_identity          No
Analysis    correlate         osint_correlate         No
Analysis    osint_report      osint_report            No

15 registered OSINT stage types (10 main + plan/batch/collect for 2 expandable stages - 1 shared collect). Source clients use only stdlib (urllib.request, subprocess, socket, json, ssl).

Compliance Pipeline

A 6-stage audit evidence pipeline:

Phase       Stage            Type                        Expandable
Collection  target_parse     compliance_target_parse     No
Collection  evidence_gather  compliance_evidence_gather  Yes
Analysis    normalize        compliance_normalize        No
Analysis    map_to_controls  compliance_map_controls     No
Review      gap_analysis     compliance_gap_analysis     No
Export      export_package   compliance_export_package   No

10 registered compliance stage types (6 main + plan/batch/collect for evidence_gather). Supports SOC2 and ISO 27001 control frameworks with keyword-based and LLM-enhanced mapping.

Provider Architecture

Providers are created via create_provider(type, config) with a registry and lazy imports.

Provider types:

  • model -- LLM provider (Anthropic, OpenAI). Must be configured explicitly.
  • search -- Search provider (Tavily, Serper, SearXNG, fallback). Must be configured explicitly.
  • fetch -- HTTP fetch. Defaults to "http" (stdlib).
  • extract -- HTML extraction. Defaults to "basic" (stdlib).
  • tool -- MCP tool provider (stdio + SSE transport). Optional.

Optional dependencies:

  • pip install spineframe[anthropic] -- Anthropic SDK
  • pip install spineframe[openai] -- OpenAI SDK
  • pip install spineframe[tavily] -- Tavily SDK
  • pip install spineframe[mcp] -- MCP tool support
  • pip install spineframe[pdf] -- PyMuPDF for PDF extraction
  • pip install spineframe[signing] -- Ed25519 provenance signing
  • pip install spineframe[web] -- FastAPI + Uvicorn for web UI
  • pip install spineframe[all] -- everything

CLI Commands

14 commands via Click:

Command       Description
run           Execute a full pipeline run
resume        Resume a run, skipping unchanged stages
replay        Re-execute from a specific stage downstream
fork          Clone a run, diverge from a stage with new query/instructions
show          Detailed run info (stages, costs, timing)
status        Visual tree view of run progress
inspect       Raw JSON dump of run state
ls            List all runs
export        Export as JSON, Markdown, or self-contained HTML
diff          Compare artifacts between two runs
verify        Validate provenance chain integrity + signatures
keygen        Generate Ed25519 signing keypair
suggest-fork  LLM-powered fork point suggestion
web           Launch the web UI

Web UI

FastAPI backend + React SPA frontend.

Backend routes:

  • GET /api/runs -- list runs
  • GET /api/runs/{id} -- inspect run (with live status from executor)
  • POST /api/runs -- create new run
  • POST /api/runs/{id}/resume -- resume run
  • POST /api/runs/{id}/replay -- replay from stage
  • POST /api/runs/{id}/fork -- fork run
  • DELETE /api/runs/{id} -- cancel run
  • POST /api/runs/{id}/approve-plan -- approve/modify research plan
  • GET /api/runs/{id}/plan -- get current research plan
  • GET /api/runs/{id}/diff/{other} -- compare runs
  • GET /api/pipelines -- list available pipelines
  • GET /api/runs/{id}/artifacts -- list artifacts
  • GET /api/runs/{id}/artifacts/{name} -- get artifact content
  • GET /api/runs/{id}/events -- get event log
  • POST /api/runs/{id}/suggest-fork -- LLM-powered fork point suggestion
  • POST /api/runs/{id}/approve-diff -- approve diff (actor-stamped)
  • GET /api/runs/{id}/provenance/verify -- verify provenance chain
  • POST /api/runs/{id}/provenance/sign -- sign provenance bundle (Ed25519)
  • POST /api/runs/{id}/policy-check -- evaluate domain policy assertions
  • GET /api/runs/{id}/provenance -- paginated provenance chain
  • GET /api/runs/{id}/artifact-diff/{other_id}/{name} -- unified diff for a single artifact
  • GET /api/runs/{id}/report -- HTML report
  • GET /api/runs/{id}/last-approved -- find most recent approved run for comparison
  • GET /api/health -- health check
  • WS /api/ws -- WebSocket for live progress events (global broadcast; clients filter by run_id in payload)

Frontend features:

  • Run management (create, monitor, resume, replay, fork, cancel)
  • Pipeline selector (research, OSINT, compliance)
  • Live WebSocket progress updates
  • Plan review and editing before execution continues
  • Stage pipeline visualization grouped by phase
  • Smart fork suggestions (prompt-first with LLM analysis)
  • Stage-level fork buttons
  • Embedded HTML report with provenance
  • Artifact browser with costs and timing
  • Run comparison (diff view with executive summary)
  • Provenance chain (paginated, citation back-navigation)

Serialization

All models use explicit to_dict() methods and from_dict() class methods. No pydantic, no ORM. Optional fields are omitted from serialization when at their default value (empty string, False, 0, empty list).
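A sketch of the omit-defaults convention on an illustrative model (not the real RunManifest):

```python
from dataclasses import dataclass, fields

@dataclass
class Manifest:
    """Illustrative model following the to_dict/from_dict convention."""
    run_id: str = ""
    query: str = ""
    status: str = ""
    parent_run_id: str = ""

    def to_dict(self) -> dict:
        # omit fields still at their default (empty string, False, 0, empty list)
        return {f.name: getattr(self, f.name)
                for f in fields(self)
                if getattr(self, f.name) != f.default}

    @classmethod
    def from_dict(cls, d: dict) -> "Manifest":
        # tolerate missing keys: absent fields fall back to defaults
        return cls(**{f.name: d[f.name] for f in fields(cls) if f.name in d})
```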

Graph and manifest are JSON files. Events are JSONL (one JSON object per line). Artifacts are JSON or JSONL depending on the stage.