Make AI a software engineering discipline.
Tool-calling agents cost 2.3x more, fail ~20% of the time, and ignore your instructions as context grows. Every ReAct loop re-tokenizes the entire conversation, burning tokens and losing accuracy with each step.
OpenSymbolicAI fixes this by splitting the LLM's job in two:
┌─────────────────────────────────────┐
│ Traditional Agent (ReAct) │
│ │
│ User ─→ LLM ─→ Tool ─→ LLM ─→ │
│ Tool ─→ LLM ─→ Tool ─→ │
│ LLM ─→ ... (loop forever) │
│ │
│ ⚠ Data in prompt = injection risk │
│ ⚠ Context bloats every iteration │
│ ⚠ LLM makes unplanned tool calls │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ OpenSymbolicAI (Plan + Execute) │
│ │
│ User ─→ LLM ─→ Plan │
│ ↓ │
│ Runtime executes plan │
│ deterministically │
│ │
│ ✓ Data never enters LLM context │
│ ✓ Fewer tokens, fewer LLM calls │
│ ✓ Every side effect is explicit │
└─────────────────────────────────────┘
The LLM plans. The runtime executes. Data stays in application memory and never gets tokenized.
Three blueprints for different problem shapes:
| Blueprint | Pattern | Use when |
|---|---|---|
| PlanExecute | Plan once, execute deterministically | Fixed sequence of steps (calculators, converters, simple QA) |
| DesignExecute | Plan with loops and conditionals | Dynamic-length data (shopping carts, batch processing) |
| GoalSeeking | Plan → execute → evaluate → repeat | Iterative problems (optimization, multi-hop research, deep research) |
pip install opensymbolicai-coreDefine primitives (what your agent can do) and decompositions (examples of how to use them). The LLM learns from your examples to plan new queries:
from opensymbolicai import PlanExecute, primitive, decomposition
class Calculator(PlanExecute):
@primitive(read_only=True)
def add(self, a: float, b: float) -> float:
return a + b
@decomposition(
intent="What is 2 + 3?",
expanded_intent="Add the two numbers",
)
def _example(self) -> float:
return self.add(a=2, b=3)Every decomposition you add makes the agent better. This is the flywheel that prompt engineering doesn't have.
| Problem | How OpenSymbolicAI solves it |
|---|---|
| Prompt injection | Symbolic Firewall keeps data out of LLM context. Nothing to inject into |
| Unpredictable behavior | Execution is deterministic and fully traced. Even iterative agents (GoalSeeking) produce inspectable plans each step — no runaway tool-calling |
| High costs | Fewer LLM calls to plan, then pure code execution. No re-tokenizing on every step |
| Can't test or debug | Full execution traces, typed outputs (Pydantic), version-controlled behavior |
| Model lock-in | Model-agnostic. Swap providers without rewriting your agent |
| Repo | Description |
|---|---|
| core-py | Python runtime: primitives, blueprints (PlanExecute, DesignExecute, GoalSeeking), multi-provider LLM abstraction |
| examples-py | Example agents: RAG, multi-hop QA, deep research, unit converter, date calculator |
| cli-py | Interactive TUI for discovering and running agents |
| claude-skills | Claude Code skills for scaffolding agents, adding primitives/decompositions/evaluators, and debugging traces |
| Benchmark | Result | What it shows |
|---|---|---|
| TravelPlanner | 97.9% on 1,000 tasks — GPT-4 gets 0.6% | GoalSeeking two-stage. 100% hard constraint pass rate, 3.1× fewer tokens than LangChain. Blog post |
| MultiHopRAG | 82.9% — +7.9pp over previous best | GoalSeeking iterates over 609 documents, 2,556 queries. gpt-oss-120b (Fireworks) |
| FOLIO | 89.2% — outperforms GPT-4 CoT (78.1%) | PlanExecute + Z3 theorem prover. gpt-oss-120b (Fireworks) translates to first-order logic, Z3 proves it |
Same model (gpt-oss-120b), same tools, same evaluation — only the framework differs:
Pass Rate Tokens/Task Cost/Passing Task LLM Calls/Task
───────── ─────────── ───────────────── ──────────────
OpenSymbolicAI ████████████ 100% ██░░░░░░░ 13,936 █░░░░░░░ $0.013 ██░░░░░░░ 2.3
LangChain █████████░░░ 77.8% █████░░░░ 43,801 ████░░░░ $0.051 ████████░ 13.5
CrewAI ████████░░░░ 73.3% █████████ 81,331 ████████ $0.100 █████████ 39.6
7 models hit 100% pass rate — including Llama 3.3 70B at $0.006/task and 4.3s latency on Groq. The framework matters more than the model. See the full model landscape.
- Getting Started - Build your first agent in 5 minutes
- The OpenSymbolicAI Manifesto - The philosophy behind the architecture
- Behaviour Programming vs. Tool Calling - Why executable examples beat massive prompts
- LLM Attention Is Precious: Why ReAct Wastes It - A visual breakdown of token waste
- Secure by Design - How the Symbolic Firewall prevents prompt injection
- The Missing Flywheel in Agent Building - Why agents stay brittle and how to fix it
MIT