OpenSymbolicAI

Make AI a software engineering discipline.

Tool-calling agents cost 2.3x more, fail ~20% of the time, and ignore your instructions as context grows. Every ReAct loop re-tokenizes the entire conversation, burning tokens and losing accuracy with each step.

OpenSymbolicAI fixes this by splitting the LLM's job in two:

```
┌─────────────────────────────────────┐
│  Traditional Agent (ReAct)          │
│                                     │
│  User ─→ LLM ─→ Tool ─→ LLM ─→      │
│          Tool ─→ LLM ─→ Tool ─→     │
│          LLM ─→ ... (loop forever)  │
│                                     │
│  ⚠ Data in prompt = injection risk  │
│  ⚠ Context bloats every iteration   │
│  ⚠ LLM makes unplanned tool calls   │
└─────────────────────────────────────┘

┌─────────────────────────────────────┐
│  OpenSymbolicAI (Plan + Execute)    │
│                                     │
│  User ─→ LLM ─→ Plan                │
│                    ↓                │
│          Runtime executes plan      │
│          deterministically          │
│                                     │
│  ✓ Data never enters LLM context    │
│  ✓ Fewer tokens, fewer LLM calls    │
│  ✓ Every side effect is explicit    │
└─────────────────────────────────────┘
```

The LLM plans. The runtime executes. Data stays in application memory and never gets tokenized.
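This split can be sketched in plain Python. The sketch below is a conceptual illustration only, not the OpenSymbolicAI API: the primitive table, plan format, and `execute` helper are all assumptions made for this example.

```python
# Conceptual sketch of the plan/execute split -- NOT the real
# OpenSymbolicAI API. The LLM sees only the query and returns a plan
# (pure data); the runtime executes it against application memory,
# so the data itself is never tokenized.

PRIMITIVES = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def execute(plan, memory):
    """Run each planned step deterministically against app memory."""
    for step in plan:
        fn = PRIMITIVES[step["op"]]
        args = [memory[name] for name in step["args"]]
        memory[step["out"]] = fn(*args)
    return memory

# A plan the LLM might emit for "add x and y, then double the sum":
plan = [
    {"op": "add", "args": ["x", "y"], "out": "total"},
    {"op": "mul", "args": ["total", "two"], "out": "result"},
]
memory = execute(plan, {"x": 2, "y": 3, "two": 2})
# memory["result"] -> 10
```

The key point is that the plan is inert data: the runtime can validate, trace, and replay it, and nothing the data contains can steer the LLM.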

Three blueprints for different problem shapes:

| Blueprint | Pattern | Use when |
|---|---|---|
| PlanExecute | Plan once, execute deterministically | Fixed sequence of steps (calculators, converters, simple QA) |
| DesignExecute | Plan with loops and conditionals | Dynamic-length data (shopping carts, batch processing) |
| GoalSeeking | Plan → execute → evaluate → repeat | Iterative problems (optimization, multi-hop research, deep research) |
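What separates the blueprints is control flow. A minimal sketch in plain Python, assuming nothing about the library's actual API (the function names and signatures below are illustrative):

```python
def plan_execute(plan, run_step, state=None):
    """PlanExecute shape: a fixed plan, run once, step by step."""
    for step in plan:
        state = run_step(step, state)
    return state

def goal_seeking(make_plan, run_plan, good_enough, max_rounds=10):
    """GoalSeeking shape: plan -> execute -> evaluate -> repeat."""
    result = None
    for _ in range(max_rounds):
        plan = make_plan(result)     # an LLM call in the real framework
        result = run_plan(plan)      # deterministic execution
        if good_enough(result):      # evaluator decides whether to stop
            break
    return result
```

DesignExecute sits between the two: the plan itself may contain loops and conditionals, but it is still produced once and then executed deterministically.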
```shell
pip install opensymbolicai-core
```

How It Works

Define primitives (what your agent can do) and decompositions (examples of how to use them). The LLM learns from your examples to plan new queries:

```python
from opensymbolicai import PlanExecute, primitive, decomposition

class Calculator(PlanExecute):

    @primitive(read_only=True)
    def add(self, a: float, b: float) -> float:
        return a + b

    @decomposition(
        intent="What is 2 + 3?",
        expanded_intent="Add the two numbers",
    )
    def _example(self) -> float:
        return self.add(a=2, b=3)
```

Every decomposition you add makes the agent better. This is the flywheel that prompt engineering doesn't have.

Why This Matters

| Problem | How OpenSymbolicAI solves it |
|---|---|
| Prompt injection | The Symbolic Firewall keeps data out of the LLM context; there is nothing to inject into |
| Unpredictable behavior | Execution is deterministic and fully traced. Even iterative agents (GoalSeeking) produce inspectable plans at each step, with no runaway tool-calling |
| High costs | Fewer LLM calls to plan, then pure code execution. No re-tokenizing on every step |
| Can't test or debug | Full execution traces, typed outputs (Pydantic), version-controlled behavior |
| Model lock-in | Model-agnostic: swap providers without rewriting your agent |
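The "fully traced" claim above can be pictured with a stdlib-only sketch. The real framework's trace format is not shown in this document, so the shape below is an assumption; the point is only that every side effect is recorded as data:

```python
def traced_execute(plan, primitives, memory):
    """Execute a plan while recording every side effect as a trace entry."""
    trace = []
    for step in plan:
        result = primitives[step["op"]](*[memory[a] for a in step["args"]])
        memory[step["out"]] = result
        trace.append({"op": step["op"], "args": step["args"], "result": result})
    return memory, trace

mem, trace = traced_execute(
    [{"op": "add", "args": ["a", "b"], "out": "sum"}],
    {"add": lambda x, y: x + y},
    {"a": 1, "b": 2},
)
# mem["sum"] == 3, and trace records exactly one call with its result
```

Because the trace is plain data, it can be asserted on in unit tests and diffed across versions, which is what makes agent behavior version-controllable.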

Repositories

| Repo | Description |
|---|---|
| core-py | Python runtime: primitives, blueprints (PlanExecute, DesignExecute, GoalSeeking), multi-provider LLM abstraction |
| examples-py | Example agents: RAG, multi-hop QA, deep research, unit converter, date calculator |
| cli-py | Interactive TUI for discovering and running agents |
| claude-skills | Claude Code skills for scaffolding agents, adding primitives/decompositions/evaluators, and debugging traces |

Benchmarks

| Benchmark | Result | What it shows |
|---|---|---|
| TravelPlanner | 97.9% on 1,000 tasks (GPT-4: 0.6%) | GoalSeeking two-stage. 100% hard-constraint pass rate, 3.1× fewer tokens than LangChain. Blog post |
| MultiHopRAG | 82.9% (+7.9pp over previous best) | GoalSeeking iterates over 609 documents, 2,556 queries. gpt-oss-120b (Fireworks) |
| FOLIO | 89.2% (GPT-4 CoT: 78.1%) | PlanExecute + Z3 theorem prover. gpt-oss-120b (Fireworks) translates to first-order logic; Z3 proves it |

Framework Comparison (TravelPlanner)

Same model (gpt-oss-120b), same tools, same evaluation — only the framework differs:

```
                Pass Rate        Tokens/Task       Cost/Passing Task    LLM Calls/Task
                ─────────        ───────────       ─────────────────    ──────────────
OpenSymbolicAI  ████████████ 100%  ██░░░░░░░  13,936   █░░░░░░░  $0.013    ██░░░░░░░  2.3
LangChain       █████████░░░ 77.8% █████░░░░  43,801   ████░░░░  $0.051    ████████░  13.5
CrewAI          ████████░░░░ 73.3% █████████  81,331   ████████  $0.100    █████████  39.6
```

7 models hit 100% pass rate — including Llama 3.3 70B at $0.006/task and 4.3s latency on Groq. The framework matters more than the model. See the full model landscape.
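The 3.1× token figure cited in the benchmarks table follows directly from the tokens-per-task numbers in the chart above:

```python
# Tokens/task from the framework comparison chart.
opensymbolicai_tokens = 13_936
langchain_tokens = 43_801

ratio = langchain_tokens / opensymbolicai_tokens
# 43,801 / 13,936 is about 3.14 -- i.e. ~3.1x fewer tokens than LangChain
```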

License

MIT
