Sextant

A Plan-and-Execute multi-agent orchestrator written as the smallest legible version that's still production-shaped. Type a question, the planner builds a DAG, specialists run in parallel, you get a synthesis with cost transparency.

~820 LoC kernel · 106 tests · 97% local coverage on src/core/*.ts · $0.05-0.12 per real-research run (Anthropic web_search_20250305 pricing as of 2026-05-09) · TypeScript · MIT

A sextant fixes position by combining sightings from multiple angles. The system is shaped the same way: the planner picks the angles, specialists take their sightings in parallel, the synthesis fixes the position. A judge that scores the fix and a memory that carries the chart forward land in Phases 4-5.

What this is

A reference implementation of the Plan-and-Execute pattern in four layers:

  1. Orchestrator - planner produces a DAG, kernel runs nodes in parallel under a concurrency cap, bounded replanner adapts when a precondition fails.
  2. Specialists - retrieval (BM25 + vector), web/API, synthesis. Pluggable adapters; no framework lock.
  3. Memory - working KV per run, vector store for embeddings, episodic store for full (goal, plan, trace, outcome) tuples.
  4. Observability - OpenTelemetry spans, Langfuse exporter env-gated, LLM-as-Judge harness with a frozen rubric.

Phase 1 (kernel) and Phase 2 (Anthropic provider + planner + working demo with real Anthropic web_search) are shipped on main. Phases 3-5 are gated.

What this isn't

  • Not a framework. The kernel is ~820 LoC. Read it, fork it, replace any layer.
  • Not a LangGraph replacement. LangGraph is battle-tested with parallel execution, persistence, and conditional edges. If you're already on it, stay there.
  • Not a hosted product. Run it where your code already runs.
  • Not Python. The provider abstraction is portable, but this repo is TS-first. Python port is on the table if there's pull for it.

Quick start

pnpm install
cp .env.example .env  # add your ANTHROPIC_API_KEY
pnpm demo:basic                                # default goal, real Anthropic web search
pnpm demo:basic "your question here"           # ask anything; planner builds the DAG live
SEXTANT_FIXTURE_SEARCH=1 pnpm demo:basic       # offline fixture mode for CI / no-network demos
pnpm demo:rag                                  # 4-node DAG with hybrid retrieval (Phase 3, gated)
pnpm eval                                      # full eval harness with LLM-as-Judge (Phase 4, gated)

Phase 2 (DAG kernel + planner + executor + working demo with real web search) is shipped. The default demo hits the Anthropic native web_search server tool, so any goal you pass becomes a real research run. Cost is roughly $0.05 for simple single-search queries, $0.12 for multi-angle comparisons. Phase 3+ (adaptive replanning + retrieval + observability + eval) is gated on inbound signal. See the build plan below.

Use it from your code

The kernel and adapters are exported as a library. ~20 lines gets you a working multi-agent research run:

import {
  run,
  createAnthropicProvider,
  createAnthropicWebSearch,
  createSynthSummarize,
  webSearchToolSchema,
  synthSummarizeToolSchema,
} from '@bambushu/sextant';

const provider = createAnthropicProvider();
const search = createAnthropicWebSearch();
const synth = createSynthSummarize({ provider });

const result = await run('How do agentic orchestration frameworks compare in 2026?', {
  provider,
  tools: { 'web.search': search, 'synth.summarize': synth },
  toolSchemas: [webSearchToolSchema, synthSummarizeToolSchema],
  onPlan: ({ plan }) => console.log(`Plan: ${plan.nodes.length} nodes`),
});

const finalNodeId = result.plan.nodes.at(-1)!.id;
console.log(result.outputs.get(finalNodeId));
console.log(`Cost: $${(result.costSpentUsd + result.plannerCostUsd).toFixed(4)}`);

Bring your own tools by adding entries to tools and toolSchemas. The planner reads the schema descriptions and picks them when a goal calls for it. The kernel handles concurrency, abort signals, retries, cost accounting, and (in Phase 3) bounded replanning.

For now, install from GitHub: pnpm add github:Bambushu/sextant. An npm publish lands at v0.3.

What does it look like?

Recorded run of pnpm demo:basic:
=== Sextant ===
Goal: How do recent agentic-AI orchestration frameworks compare in 2026?
Mode: live web search (Anthropic web_search_20250305, ~$0.01 per search call)
Cost cap: $0.20 (run aborts if exceeded)

[planning]
  claude-sonnet-4-6 -> 4-node DAG ($0.006339, 1 attempt)
    search1      (web.search)
    search2      (web.search)
    search3      (web.search)
    summarize    (synth.summarize) <- [search1, search2, search3]

[executing]
  ok  search2      11.6s   $0.0358
  ok  search1      11.7s   $0.0338
  ok  search3      13.1s   $0.0349
  ok  summarize    5.3s    $0.0059

[result]

# Agentic-AI Orchestration Frameworks in 2026: Key Comparison

Three frameworks dominate 2026 deployments: LangGraph, CrewAI, and AutoGen
(search2, search1). LangGraph uses graph-based state management for explicit
workflow control, excelling at cyclical and complex routing scenarios. CrewAI
emphasizes role-based agent orchestration with predefined collaboration
patterns... [full synthesis at examples/recorded-run.md]

[summary]
  Status:           succeeded
  Wall clock:       23.4s
  Planner cost:     $0.006339
  Specialist cost:  $0.110326
  Total cost:       $0.116665  (under cap)

Captured 2026-05-09 against Claude Sonnet 4.6 (planner) + Haiku 4.5 (specialists) with the native web_search_20250305 server tool. Real research, real synthesis, real $0.116665 cost for a 3-search comparison. Sonnet read "compare" in the goal and fanned out into three parallel web.search nodes; the kernel held summarize back until all three resolved, then handed their outputs through working memory. The synthesis cites upstream node ids inline ((search2, search1)), surfacing how downstream specialists name and read upstream results.

Full trace + cost breakdown at examples/recorded-run.md. Run with your own goal: pnpm demo:basic "your question here".

Architecture

flowchart TB
    user([User goal]) --> planner

    subgraph L1[Layer 1 - Orchestrator]
        planner[Planner Agent]
        kernel[DAG Kernel]
        replanner[Replanner Hook]
        planner --> kernel
        kernel --> replanner
        replanner -->|assumption broken| planner
    end

    subgraph L2[Layer 2 - Specialist Agents]
        retrieval[Retrieval Agent<br/>BM25 + vector + RRF]
        web[Web/API Agent]
        synthesis[Synthesis Agent<br/>conflict resolution]
    end

    kernel -->|dispatch ready node| retrieval
    kernel -->|dispatch ready node| web
    kernel -->|dispatch ready node| synthesis

    retrieval --> working
    web --> working
    synthesis --> working

    subgraph L3[Layer 3 - Memory]
        working[Working Memory<br/>run-scoped KV]
        vector[(Vector Store<br/>pgvector default)]
        episodic[(Episodic Store<br/>run history)]
        retrieval -.read.-> vector
        kernel -.write.-> episodic
    end

    working --> replanner

    subgraph L4[Layer 4 - Observability and Eval]
        otel[OpenTelemetry Spans]
        langfuse[Langfuse Exporter<br/>env-gated]
        judge[LLM-as-Judge]
        otel --> langfuse
        synthesis -.span.-> otel
        retrieval -.span.-> otel
        web -.span.-> otel
        planner -.span.-> otel
        replanner -.span.-> otel
    end

    synthesis --> output([Final answer + trace])
    output --> judge
    judge --> report[(Eval Report)]

    classDef agent fill:#2c3a4a,stroke:#5a8,color:#fff
    classDef store fill:#3a2c2c,stroke:#a85,color:#fff
    classDef obs fill:#2c3a3a,stroke:#5aa,color:#fff
    class planner,kernel,replanner,retrieval,web,synthesis,judge agent
    class vector,episodic,report store
    class otel,langfuse,working obs

GitHub renders the diagram above natively. The same source, plus a styled rendering pipeline for non-GitHub viewers, lives at docs/architecture.mmd.

Why this pattern

Three patterns dominate senior agentic briefs.

| Pattern | Strength | Failure mode |
| --- | --- | --- |
| ReAct (single loop) | Simple. Good for short tasks. | Public reports describe coherence loss past 5-7 steps. We have not benchmarked this ourselves. |
| Plan-and-Execute | Predictable cost. Parallelism is straightforward. | Brittle if the world changes mid-run. |
| LangGraph state-graph | Most flexible. Battle-tested. Parallel execution and checkpointing built in. | Framework-coupled. Orchestration logic is harder to read end-to-end. |

Sextant is Plan-and-Execute with bounded adaptive replanning. The planner produces a DAG. The kernel runs nodes in parallel up to a configured concurrency limit, in topological order. After each completed node, a replan hook checks downstream Zod preconditions against the new state. If a precondition fails, the planner is re-invoked with the state diff and the failed node id, subject to a per-run replan cap.
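The replan check can be sketched in a few lines (hypothetical names; the Zod preconditions are simplified to plain predicate functions so the sketch stays self-contained):

```typescript
// Post-node replan check: after a node completes, re-evaluate the
// preconditions of its downstream nodes against the updated run state.
type RunState = Map<string, unknown>;

interface PlanNode {
  id: string;
  deps: string[];
  // True when the node's inputs are satisfiable against current state.
  precondition: (state: RunState) => boolean;
}

interface ReplanRequest {
  failedNodeId: string;
  stateKeys: string[]; // state-diff summary handed back to the planner
}

const MAX_REPLANS = 3;

function checkDownstream(
  completedId: string,
  nodes: PlanNode[],
  state: RunState,
  replansUsed: number,
): ReplanRequest | null {
  // Only nodes downstream of the one that just finished are re-checked.
  const downstream = nodes.filter((n) => n.deps.includes(completedId));
  for (const node of downstream) {
    if (!node.precondition(state)) {
      if (replansUsed >= MAX_REPLANS) {
        throw new Error('ReplanExhausted');
      }
      return { failedNodeId: node.id, stateKeys: [...state.keys()] };
    }
  }
  return null; // all preconditions hold; keep executing the current plan
}
```

When the check returns a ReplanRequest, the planner is re-invoked with that payload; when it returns null, execution continues on the existing plan.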

We are not claiming Sextant beats LangGraph at every workload. We are claiming it sits at a legible middle for teams who would rather own ~820 LoC of orchestration code than depend on a framework.

Compared to other tools

| If you want | Use | What Sextant offers instead |
| --- | --- | --- |
| Web research from a chat UI | Claude.ai with web_search, Perplexity | A library you embed in your own app, with cost transparency, fan-out parallelism, and a typed kernel you can extend |
| Production multi-agent orchestration with checkpointing, persistence, conditional edges | LangGraph | A smaller, readable kernel for teams that would rather own ~820 LoC than depend on a framework |
| Role-based agent teams with structured handoffs | CrewAI | A DAG-first model where the planner picks the team for each goal, instead of you defining roles up front |
| Token-by-token reasoning with tool use | Anthropic computer use, OpenAI Agents SDK | A planner that commits to a graph up front, so you can budget cost and parallelize before execution |
| A polished framework with a community and a roadmap | LangGraph, CrewAI, Mastra | Sextant is a reference primitive, not a framework. Fork-and-modify is the intended consumption pattern |

This isn't a "best framework" claim. It's a positioning: Sextant is for teams who want the smallest readable Plan-and-Execute primitive that's still production-shaped, with provider portability and pluggable specialists.

DAG kernel contract

The pieces below are the contract the Phase 1 kernel implements. They exist to keep the architecture from being hand-waved.

| Concern | Spec |
| --- | --- |
| Node schema | { id, tool, inputs (Zod), outputs (Zod), preconditions (Zod predicate over RunState), maxRetries, timeoutMs } |
| Replan trigger | Post-node hook re-evaluates downstream preconditions against updated RunState. A failed precondition fires the planner with the state diff + failed node id. |
| Replan bound | maxReplans per run (default 3). maxReplansPerNode per node id. The run fails with ReplanExhausted after the cap. |
| Concurrency | The ready set runs in parallel up to concurrencyLimit (default 4). Slow nodes don't block siblings. |
| Backoff | Per-node retry with exponential backoff. Retries are separate from replans. |
| Cost guard | Per-run token budget enforced at the LLM layer. Aborts on exceed. Default ~$1.00 USD-equivalent. |
| Cancellation | AbortSignal plumbed through all nodes and tools. |
| Trace redaction | OTel spans redact tool inputs by default; verbose tracing is opt-in. |
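The cost-guard and cancellation concerns compose naturally: a minimal sketch, assuming a guard that every LLM call reports into and an AbortController shared with the kernel (names hypothetical, not the repo's actual API):

```typescript
// Per-run cost guard: every LLM call reports its cost; the guard
// aborts the whole run via the shared AbortController once the
// budget is exceeded, and the signal propagates to in-flight nodes.
class CostGuard {
  private spentUsd = 0;

  constructor(
    private readonly budgetUsd: number,
    private readonly controller: AbortController,
  ) {}

  record(costUsd: number): void {
    this.spentUsd += costUsd;
    if (this.spentUsd > this.budgetUsd) {
      this.controller.abort(new Error('CostBudgetExceeded'));
    }
  }

  get spent(): number {
    return this.spentUsd;
  }
}
```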

Limitations and known risks

  • Replan thrash if a node's precondition keeps failing. Caps and tests guard against this; pathological tools can still hit the cap.
  • This is currently TypeScript only. If your team's stack is LangChain Python, Sextant will be a poor fit; a Python port is open as a follow-up.
  • Phase 1 + Phase 2 are shipped and live-verified. Real research demo runs at ~$0.05 (single-search queries) to ~$0.12 (3-search comparisons) per goal. Numbers for Phase 3-5 are still targets until each phase ships.

Layered components

Orchestrator

The DAG kernel is pure functions plus an execute loop. Plan validation, topological sort, parallel ready-set computation (capped by concurrencyLimit), and run-state transitions are all separable. The replanner hook is opt-in: pass { replan: true, maxReplans: 3 } to execute() and any post-node Zod precondition that fails will trigger a bounded re-plan with the state diff and the failed node id.
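The ready-set step can be sketched as a pure function (hypothetical names; the real kernel's types differ):

```typescript
// A node is ready when every dependency is done and it is neither
// done nor already running; the set is capped by the free slots
// under the concurrency limit.
interface DagNode {
  id: string;
  deps: string[];
}

function readySet(
  nodes: DagNode[],
  done: Set<string>,
  running: Set<string>,
  concurrencyLimit = 4,
): string[] {
  const slots = concurrencyLimit - running.size;
  if (slots <= 0) return [];
  return nodes
    .filter(
      (n) =>
        !done.has(n.id) &&
        !running.has(n.id) &&
        n.deps.every((d) => done.has(d)),
    )
    .map((n) => n.id)
    .slice(0, slots);
}
```

Because the function is pure over (nodes, done, running), it can be re-run after every node completion to schedule the next batch.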

Specialist agents

Each agent is a Tool the kernel can dispatch. The default specialists:

  • Retrieval: pgvector for vector, an in-memory BM25 (lunr or similar) for keyword, Reciprocal Rank Fusion for the merge. Adapter interface so LanceDB, Qdrant, Pinecone, or OpenSearch slot in without touching the agent.
  • Web/API: provider-agnostic adapter. First-party fixtures so demos run without network.
  • Synthesis: aggregates upstream node outputs. Conflict resolution is a heuristic chain: source priority → recency → consensus → flagged-disagreement.
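Reciprocal Rank Fusion itself is small enough to show inline: each document scores the sum of 1/(k + rank) over the ranked lists it appears in, with k = 60 as the conventional constant. This is the standard RRF formula, not code from the repo:

```typescript
// Merge several ranked doc-id lists (e.g. BM25 and vector results)
// into one ranking via Reciprocal Rank Fusion.
function rrfMerge(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}
```

A document near the top of both lists beats one that is first in only one of them, which is exactly the behavior you want from a keyword + vector hybrid.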

Memory

Three tiers, each with a clear job:

  • Working memory: run-scoped key-value store. Drives node-input resolution.
  • Vector store: long-term embeddings. Read by retrieval, written by ingestion jobs.
  • Episodic store: full (goal, plan, trace, outcome) tuples. Used for offline analysis and few-shot prompting.
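A working-memory sketch, assuming outputs are keyed by node id and downstream inputs are resolved from declared dependencies (names hypothetical):

```typescript
// Run-scoped KV: upstream nodes write their outputs under their node
// id; the kernel resolves a downstream node's inputs by reading its
// dependencies back out.
class WorkingMemory {
  private readonly store = new Map<string, unknown>();

  write(nodeId: string, output: unknown): void {
    this.store.set(nodeId, output);
  }

  resolveInputs(deps: string[]): Record<string, unknown> {
    const inputs: Record<string, unknown> = {};
    for (const dep of deps) {
      if (!this.store.has(dep)) {
        throw new Error(`missing upstream output: ${dep}`);
      }
      inputs[dep] = this.store.get(dep);
    }
    return inputs;
  }
}
```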

Observability + eval

OpenTelemetry spans wrap each agent invocation. The Langfuse exporter is env-gated: no key set, no calls made. The eval harness takes a (goal, output, rubric) triple and runs an LLM-as-Judge with a frozen rubric covering faithfulness, completeness, conflict resolution, plan efficiency, latency, and cost. Frozen rubric means scores stay comparable across runs and across model versions.
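Env-gating can be as simple as never constructing the exporter without a key. A sketch (the env variable name here is an assumption, not the repo's actual config):

```typescript
// With no key set, the factory returns null and the tracing layer
// falls back to a no-op, so no network calls are ever made.
interface SpanExporter {
  export(span: { name: string }): void;
}

function createExporter(
  env: Record<string, string | undefined>,
): SpanExporter | null {
  // LANGFUSE_SECRET_KEY is an assumed variable name for illustration.
  if (!env.LANGFUSE_SECRET_KEY) return null; // gated off
  return {
    export: () => {
      /* would ship the span to Langfuse here */
    },
  };
}
```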

Defaults and swap paths

| Concern | Default | Swap |
| --- | --- | --- |
| LLM provider | Anthropic Claude (Sonnet 4.6 plan, Haiku 4.5 specialists) | Vercel AI SDK provider interface |
| Vector store | pgvector | LanceDB, Qdrant, Pinecone via adapter |
| BM25 | in-memory (lunr) | OpenSearch, Tantivy |
| Tracing | OpenTelemetry → Langfuse | Honeycomb, Phoenix, LangSmith |
| Eval | LLM-as-Judge with frozen rubric | RAGAS, custom |

Build plan

Sextant ships in five phases. Phases 1 and 2 are shipped on main. Phases 3-5 are gated on inbound signal.

| Phase | Lands | Estimate | Status |
| --- | --- | --- | --- |
| 1 | DAG kernel + core types (concurrency, replan budget, cost guard, tests) | 8-12 h | shipped |
| 2 | Anthropic provider + planner + executor + working demo with real web search | 12-20 h | shipped |
| 3 | Bounded adaptive replanning + retrieval specialist | 20-28 h | gated |
| 4 | Observability + LLM-as-Judge harness | 16-24 h | gated |
| 5 | Polish, docs, full public launch | 10-16 h | gated |

Phases 3-5 are gated. They start only after either Crucible (a prior public artifact) or the Sextant Phase 1+2 stub captures at least one senior-tier inbound lead within 4 weeks of launch. If neither does, the public-artifact-funnel hypothesis is falsified and Sextant ships at v0.2 (kernel + Plan-and-Execute) only.

Status

Phase 1 (DAG kernel + types + 56 tests) and Phase 2 (Anthropic provider + planner + executor + working demo with real web research) are shipped and merged on main. The demo accepts any goal as a CLI arg and runs live against the Anthropic native web_search server tool. Real measured cost: $0.05-0.12 per run depending on how many search nodes the planner picks. Phase 3 (adaptive replanning + retrieval) and Phase 4-5 (observability + LLM-as-Judge + public launch) are gated on inbound signal per the build plan above. Stars and issues welcome.

FAQ

Why TypeScript and not Python? Most agentic AI work is Python. TS gives end-to-end typing across the planner contract, kernel state, and tool schemas (Zod). It also matches the deployment context for many senior briefs (Next.js, edge functions, Tauri, browser extensions). Python port is on the table if there's pull for it.

Why no LangChain? LangChain has its place; it's not the right fit when the goal is to read the orchestration code in one sitting. Sextant is the inverse choice: minimum legible code, opinionated kernel, swap any layer.

How much does it cost to run? ~$0.05 for a single-search query, ~$0.12 for a multi-angle comparison (3 parallel web_searches + synthesis). $10 of Anthropic credit covers ~85-200 runs depending on complexity. Fixture mode is free.

Can I use a different LLM provider? The LlmProvider interface is the seam. OpenAI / Bedrock / Vercel AI SDK adapters land in Phase 3+. Today, only Anthropic is wired up; the kernel itself doesn't know about any provider.

Is the kernel really only ~820 LoC? 820 LoC measured for src/core/{types,plan,state,kernel}.ts (run wc -l src/core/*.ts to verify). Add the planner + executor + agents + Anthropic provider and the full footprint is ~2000 LoC. The kernel is what you read when you want to understand "how does this actually run a DAG safely under concurrency, replans, aborts, and cost budgets."

How do I add my own tool? Implement the Tool signature (call: ToolCall) => Promise<ToolResult>, write a Zod schema for inputs, and register it in tools + toolSchemas. The planner sees the schema description and picks it when relevant. See src/agents/web.ts and src/agents/synthesis.ts for the two existing patterns (pure function and LLM-calling).
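A minimal sketch of such a tool, with the ToolCall/ToolResult shapes inferred from this README rather than copied from the source:

```typescript
// Assumed shapes for illustration; the real types live in the repo.
interface ToolCall {
  nodeId: string;
  inputs: Record<string, unknown>;
}

interface ToolResult {
  output: unknown;
  costUsd: number;
}

// A pure-function tool (no LLM call): word-counts whatever text it gets.
const wordCount = async (call: ToolCall): Promise<ToolResult> => {
  const text = String(call.inputs['text'] ?? '');
  const words = text.trim() === '' ? 0 : text.trim().split(/\s+/).length;
  return { output: { words }, costUsd: 0 };
};
```

Register it under a name like 'text.wordCount' in tools, give it a schema entry in toolSchemas, and the planner can schedule it like any built-in.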

What happens if the planner generates an invalid plan? The planner has a retry budget (default 3 attempts). On a parse or schema-validation failure, the previous output and the validator's error message are appended to the conversation so the model can self-correct. After the budget, PlannerExhausted surfaces with the last raw output for debugging.

What happens if a tool fails or times out mid-run? Per-node retries with exponential backoff (capped at 5s). Aborts propagate via AbortSignal. After the retry budget, the kernel records state.failed for that node id; downstream nodes that depended on it stay unscheduled, and the run lands in state.status === 'failed' with state.terminalError populated. Phase 3 wires the replanner so a downstream Zod precondition failure can swap in a new plan instead.
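The retry half of that behavior, sketched with the delays parameterized so it stays testable (names hypothetical):

```typescript
// Retry a node's tool call with exponential backoff, capped at a max
// delay; an aborted signal short-circuits before the next attempt.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxRetries: number,
  signal?: AbortSignal,
  baseDelayMs = 250,
  maxDelayMs = 5_000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    if (signal?.aborted) throw new Error('aborted');
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // retry budget exhausted
      const delay = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```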

Acknowledgements

Sextant inherits design patterns from a small constellation of private tools (Sanhedrin, KeurSmid, ContentSmid) and one public artifact (Crucible at github.com/Bambushu/crucible). The pattern reuse is genuine; the code is green-field. Where a file's structure mirrors a prior tool, the file header notes the source.

License

MIT. See LICENSE.

Contact

Built by Maikel Slomp. For senior-tier agentic-AI work and partnerships, reach out via mad-it.agency.
