🔗 GitHub • 📘 Examples • 🔒 Security
🚀 Stable API: Version 0.3.x has a frozen public API. Breaking changes will bump to 0.4.0.
A lightweight drop-in wrapper for OpenAI and Anthropic SDKs that adds automatic parallel tool calling and structured Pydantic outputs without breaking changes.
Stop battling bloated frameworks. Toolflow supercharges the official SDKs you already use:
✅ Drop-in replacement - One line change, zero breaking changes
✅ Auto-parallel tools - Functions execute concurrently (2-4x faster)
✅ Structured outputs - Pass Pydantic models, get typed responses
✅ Advanced AI support - OpenAI reasoning + Anthropic thinking modes
✅ Lightweight - ~5MB vs ~50MB+ for other frameworks
✅ Unified interface - Same code across providers
```bash
pip install toolflow

# Provider-specific installs
pip install toolflow[openai]     # OpenAI only
pip install toolflow[anthropic]  # Anthropic only
```

```python
import toolflow
from openai import OpenAI
from pydantic import BaseModel
from typing import List
# Only change needed!
client = toolflow.from_openai(OpenAI())
# Define structured models
class CityWeather(BaseModel):
    city: str
    temperature: float
    condition: str
class WeatherRequest(BaseModel):
    cities: List[str]
    units: str
def get_weather(request: WeatherRequest) -> List[CityWeather]:
"""Get weather for multiple cities."""
return [CityWeather(city=city, temperature=72.0, condition="Sunny")
for city in request.cities]
# Automatic parallel tool execution + structured output
result = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Get weather for NYC and London in Celsius"}],
tools=[get_weather],
response_format=List[CityWeather]
)
print(result)  # List of CityWeather objects
```

Tools execute concurrently by default - 2-4x faster than sequential:

```python
import time
from pydantic import BaseModel
class ApiRequest(BaseModel):
    query: str
    timeout: int
def slow_api_call(request: ApiRequest) -> str:
    time.sleep(1)  # Simulated API call
    return f"Result for {request.query}"
def fast_calculation(x: int, y: int) -> int:
    return x * y
# These execute in parallel (total time ~1 second)
result = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Call API with 'data' and calculate 10 * 5"}],
tools=[slow_api_call, fast_calculation],
parallel_tool_execution=True # Default behavior
)
```

Get typed responses with Pydantic models:

```python
class Person(BaseModel):  # minimal Person model so the example is self-contained
    name: str
    age: int
    skills: List[str]

class TeamAnalysis(BaseModel):
    people: List[Person]
    average_age: float
    top_skills: List[str]
result = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Analyze team: John (30, Python), Sarah (25, Go)"}],
response_format=TeamAnalysis
)
print(type(result)) # <class 'TeamAnalysis'>
print(result.average_age)  # 27.5
```

Choose between simplified or full SDK responses:

```python
# Simplified (default) - Direct content
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response) # "Hello! How can I help you today?"
# Full SDK response
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello!"}],
full_response=True
)
print(response.choices[0].message.content)  # Original SDK behavior
```

Seamlessly integrate reasoning with tools and structured outputs:

```python
class AnalysisResult(BaseModel):
    solution: str
    reasoning_steps: List[str]
    confidence: float
result = client.chat.completions.create(
model="o4-mini",
reasoning_effort="medium", # OpenAI reasoning
messages=[{"role": "user", "content": "Analyze sales data and project 15% growth"}],
tools=[calculate, analyze_data],
response_format=AnalysisResult,
parallel_tool_execution=True
)
```

The same pattern works with Anthropic's extended thinking mode:

```python
from anthropic import Anthropic

anthropic_client = toolflow.from_anthropic(Anthropic())
result = anthropic_client.messages.create(
model="claude-3-5-sonnet-20241022",
thinking=True, # Extended thinking mode
messages=[{"role": "user", "content": "Research AI trends and provide recommendations"}],
tools=[search_web, analyze_trends],
response_format=ResearchFindings,
parallel_tool_execution=True
)
```

Mix sync and async tools with automatic optimization:

```python
import asyncio
from openai import AsyncOpenAI
client = toolflow.from_openai(AsyncOpenAI())
async def async_api_call(query: str) -> str:
    await asyncio.sleep(0.5)
    return f"Async result: {query}"
def sync_calculation(x: int, y: int) -> int:
    return x * y
async def main():
    result = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Call API and calculate 10*5"}],
        tools=[async_api_call, sync_calculation]  # Mixed sync/async
    )
    print(result)
asyncio.run(main())
```

Streaming works exactly like the official SDKs:

```python
# Simplified streaming
stream = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Write a story"}],
tools=[search_web],
stream=True
)
for chunk in stream:
    print(chunk, end="")  # Direct content
# Full response streaming
stream = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Write a story"}],
stream=True,
full_response=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

Toolflow can turn almost any modern Python type hint into an OpenAI tool schema, and then coerce the JSON payload into real Python objects when the tool executes.
| Category | Example Annotations |
|---|---|
| Primitives | str, int, float, bool, None |
| Std-lib scalars & formats | datetime, date, time, timedelta, UUID, Decimal, Path, IPv4Address |
| Sequences / sets | list[int], set[UUID], tuple[str, int], tuple[float, ...] |
| Mappings | dict[str, float], Mapping[str, Any] |
| TypedDict (PEP 589) | class User(TypedDict): id: int; name: str |
| Data classes | @dataclass class Point: x: float; y: float |
| Pydantic BaseModel | class Address(BaseModel): city: str; zip: str |
| Enums & Literals | Enum, IntEnum, StrEnum, Literal["A","B"] |
| Union / Optional | Union[int, str], Optional[MyModel] |
| Constrained & specialised types | conint(gt=0), EmailStr, AnyUrl, conlist(str, min_length=1) |
| Annotated + Field metadata | Annotated[int, Field(gt=0, description="positive")] |
| Recursion & generics | class Node(BaseModel): children: list["Node"] |
| NewType / type aliases | UserId = NewType("UserId", int) |
Everything above is runtime-validated via Pydantic v2's TypeAdapter, so you get both a correct schema and real typed objects when the tool runs.
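For example, a single tool can mix several of these annotations. The sketch below uses illustrative names (Ticket, Assignee, Priority, create_ticket are not from Toolflow itself); it shows the kind of signature that gets converted into a schema and coerced back into typed objects:

```python
from datetime import date
from enum import Enum
from typing import Annotated, List, Optional, TypedDict

from pydantic import BaseModel, Field


class Priority(str, Enum):
    LOW = "low"
    HIGH = "high"


class Assignee(TypedDict):
    id: int
    name: str


class Ticket(BaseModel):
    title: str
    due: Optional[date] = None


def create_ticket(
    ticket: Ticket,
    assignee: Assignee,
    tags: Annotated[List[str], Field(description="Free-form labels")],
    priority: Priority = Priority.LOW,
) -> str:
    """Create a ticket; every argument arrives as a real typed object."""
    return f"{ticket.title} -> {assignee['name']} ({priority.value}, {len(tags)} tags)"


# Passed like any other tool; the JSON arguments the model produces are
# validated and coerced into Ticket, Assignee, Priority, and List[str].
# client.chat.completions.create(..., tools=[create_ticket])
```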
The following annotations are not supported:

| Annotation | Why |
|---|---|
| Callable[...], functions, lambdas | Executable code isn't data |
| Open file handles (IO), socket.socket, live DB/HTTP sessions | External resources, no JSON form |
| Arbitrary classes without a Pydantic schema & no **kwargs constructor | Pydantic can't introspect them |
| Large binary / numeric containers (numpy.ndarray, pandas.DataFrame, etc.) | No built-in JSON schema |
Missing type hints raise MissingAnnotationError.
Need set-semantics? Use set[T] + an immutable/frozen model (or FrozenSet[T]). Otherwise prefer list[T].
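A quick sketch of both notes (the exception name comes from the text above; the tag_post function is purely illustrative):

```python
def tag_post(post_id, tags: set[str]) -> str:  # post_id has no type hint
    return f"{post_id}: {sorted(tags)}"

# Passing tag_post in tools=[...] raises MissingAnnotationError because
# post_id is unannotated; annotate it (e.g. post_id: int) and the set[str]
# parameter will be accepted and coerced from the JSON array the model sends.
```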
Migrating from the plain OpenAI SDK is a one-line change:

```python
# Before
from openai import OpenAI
client = OpenAI()
# After - one line change!
import toolflow
from openai import OpenAI
client = toolflow.from_openai(OpenAI())
```

Migrating from Instructor uses the same interface:

```python
# Before
import instructor
client = instructor.from_openai(OpenAI())
# After - same interface!
import toolflow
client = toolflow.from_openai(OpenAI())
```

Global execution settings can be configured once at startup:

```python
import toolflow
from concurrent.futures import ThreadPoolExecutor
# Thread pool configuration
toolflow.set_max_workers(8) # Default: 4
toolflow.set_executor(ThreadPoolExecutor(max_workers=16))
# Async streaming control
toolflow.set_async_yield_frequency(1)  # 0 = disabled, 1 = every chunk
```

Toolflow intelligently handles concurrency based on your environment (see the sketch after the diagram below):

```text
SYNC OPERATIONS
├── Default: Parallel execution in ThreadPoolExecutor
├── Only supports sync tools
├── No custom executor → Global ThreadPoolExecutor (4 workers)
├── Change with toolflow.set_max_workers(workers)
└── Custom executor with toolflow.set_executor(executor)

ASYNC OPERATIONS
├── Default: Parallel execution optimized for async
├── Async tools → Uses asyncio.gather() for true concurrency
├── Sync tools → Uses loop.run_in_executor() with the default thread pool (or custom executor if set)
└── Mixed tools → Combines both approaches automatically

STREAMING
├── Sync streaming → ThreadPoolExecutor for tool execution
└── Async streaming → Event loop yielding controlled by yield frequency
    ├── 0 (default) → Trust provider libraries
    └── N → Explicit asyncio.sleep(0) every N chunks
```
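The ASYNC OPERATIONS branch can be pictured roughly like this (a simplified sketch of the idea, not Toolflow's actual internals; run_tools_concurrently and executor are illustrative names):

```python
import asyncio
import inspect
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)  # stand-in for the global pool


async def run_tools_concurrently(calls):
    """calls: list of (func, kwargs) pairs built from the model's tool calls."""
    loop = asyncio.get_running_loop()
    tasks = []
    for func, kwargs in calls:
        if inspect.iscoroutinefunction(func):
            # Async tools run directly on the event loop.
            tasks.append(func(**kwargs))
        else:
            # Sync tools are pushed onto the thread pool so they don't block.
            tasks.append(loop.run_in_executor(executor, lambda f=func, kw=kwargs: f(**kw)))
    # gather() awaits everything at once, so total latency is roughly the slowest tool.
    return await asyncio.gather(*tasks)
```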
Configuration Examples:

```python
# High-performance custom executor
custom_executor = ThreadPoolExecutor(
    max_workers=16,
    thread_name_prefix="toolflow-custom-"
)
toolflow.set_executor(custom_executor)
# High-concurrency FastAPI deployment
toolflow.set_max_workers(12) # More threads for parallel tools
toolflow.set_async_yield_frequency(1) # Yield after every chunk
# Maximum performance setup
toolflow.set_max_workers(16) # Maximum parallel tool execution
toolflow.set_async_yield_frequency(0)  # Trust provider libraries (default)
```

When to adjust settings:
- High-concurrency deployments (100+ simultaneous streams): Set yield frequency to 1
- I/O-heavy tools: Increase max_workers to 8-16
- CPU-intensive tools: Keep max_workers at 4-6
- Standard deployments: Use defaults
All standard SDK parameters work unchanged, plus:

```python
client.chat.completions.create(
    # Standard parameters (model, messages, temperature, etc.)

    # Toolflow enhancements
    tools=[...],                     # List of functions
    response_format=BaseModel,       # Pydantic model
    parallel_tool_execution=True,    # Enable concurrency
    max_tool_call_rounds=10,         # Safety limit
    max_response_format_retries=2,   # Retry limit
    graceful_error_handling=True,    # Handle errors gracefully
    full_response=False,             # Response mode
)
```

| Metric | Toolflow | Other Frameworks | Native SDK |
|---|---|---|---|
| Speed | 2-4x faster | Variable | Baseline |
| Memory | +5MB | +50MB+ | Baseline |
| Learning Curve | Zero | Steep | N/A |
| Migration | One line | Complete rewrite | N/A |
- ✅ OpenAI: Chat Completions, reasoning mode (reasoning_effort)
- ✅ Anthropic: Messages API, thinking mode (thinking=True)
- ✅ Both: Tool calling, streaming, structured outputs (see the example below)
- ⏳ OpenAI Responses API - New stateful API with hosted tools
- 🔄 Other providers - Groq, Gemini, etc.
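For example, the same tool and response_format can be sent through either client. This is a sketch that reuses get_weather and CityWeather from the quick-start example above:

```python
from typing import List

import toolflow
from anthropic import Anthropic
from openai import OpenAI

openai_client = toolflow.from_openai(OpenAI())
anthropic_client = toolflow.from_anthropic(Anthropic())

# get_weather and CityWeather are defined in the quick-start example above.
openai_result = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Weather for Paris in Celsius"}],
    tools=[get_weather],
    response_format=List[CityWeather],
)

anthropic_result = anthropic_client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,  # required by the Anthropic SDK
    messages=[{"role": "user", "content": "Weather for Paris in Celsius"}],
    tools=[get_weather],
    response_format=List[CityWeather],
)
```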
Tools handle errors gracefully by default:

```python
def unreliable_tool(data: str) -> str:
if "error" in data:
raise ValueError("Something went wrong!")
return f"Success: {data}"
# Graceful handling (default)
result = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Process 'error data'"}],
tools=[unreliable_tool],
graceful_error_handling=True # LLM receives error messages
)
# Strict handling
result = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Process 'error data'"}],
tools=[unreliable_tool],
graceful_error_handling=False # Raises exceptions
)
```

```bash
# Install for development
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black src/ && isort src/
# Type checking
mypy src/
# Live tests (requires API keys)
export OPENAI_API_KEY='your-key'
export ANTHROPIC_API_KEY='your-key'
python run_live_tests.py
```

Toolflow executes all tool functions locally on your machine. See our Security Policy for important security information and best practices.
0.3.x Series (Current)
- ✅ Frozen Public API: No breaking changes
- ✅ Production Ready: Stable for production use
- 🔄 Feature Additions: New features in minor releases
0.4.0 and Beyond
- ⚠️ Breaking Changes: Will bump to 0.4.0
- 📋 Migration Guide: Clear upgrade path provided
Contributions welcome! Please fork, create a feature branch, add tests, and submit a pull request.
Created by Isuru Wijesiri
🔗 LinkedIn • GitHub
MIT License - see LICENSE file for details.