LatentEvals/opentools

opentools

The tool surface every agentic AI framework reimplements, but done once and done right.

bash / read / write / edit / glob / grep / http behind a single Tool trait, with JSON Schema generation, streaming output, cancellation, and four reference agent loops wiring it into OpenAI, Anthropic, Google Gemini, and OpenRouter. Drop it into your harness and focus on the loop, auth, retry, and sandboxing that's actually your product.

Why this exists

If you're building an agentic harness in Rust — your own Claude Code, your own codex, a custom copilot, an eval runner, a sandboxed notebook — you will reimplement the same seven tools that everyone else has implemented, and you will get the same details subtly wrong:

  • Bash. Naive implementations spawn a fresh tokio::process::Command per call and lose every cd, every export, every shell function the model sets up. A correct one opens a PTY, keeps one bash alive across calls, streams stdout line-by-line as it arrives, handles cancellation via Ctrl-C mid-command without killing the session, and survives set -m job-control shenanigans.
  • Tool definitions. Each LLM provider wants a different shape. OpenAI Chat Completions nests under function. OpenAI Responses API is flat. Anthropic calls it input_schema. Gemini Code Assist wraps everything in functionDeclarations and wants OpenAPI-3 schema types. Keeping four flavors of tool metadata in sync by hand is not fun.
  • Streaming, cancellation, typed errors. None of these are free. A ProgressSink trait, a CancellationToken threaded into every call, and a ToolError enum that separates protocol-level failures (cancelled, timeout, invalid args) from execution errors take more design work than most ad-hoc agent code ever gets.

opentools is the result of getting all of that right once so your harness doesn't have to.
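To make the persistent-session point from the Bash bullet concrete, here is a minimal std-only sketch of the idea, not the crate's implementation. It keeps one shell alive over plain pipes (no PTY, no streaming sink, no cancellation or timeout, stderr left uncaptured) and detects the end of each command by having the shell print a marker line plus `$?`. `ShellSession` and `SENTINEL` are hypothetical names invented for this example.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::process::{Child, ChildStdin, ChildStdout, Command, Stdio};

/// Marker the shell prints after every command (hypothetical value).
const SENTINEL: &str = "__SKETCH_DONE__";

/// One long-lived shell process whose state survives between commands.
pub struct ShellSession {
    child: Child,
    stdin: ChildStdin,
    stdout: BufReader<ChildStdout>,
}

impl ShellSession {
    /// Starts `shell` (for example "sh" or "bash") with piped stdin/stdout.
    pub fn spawn(shell: &str) -> io::Result<Self> {
        let mut child = Command::new(shell)
            .stdin(Stdio::piped())
            .stdout(Stdio::piped())
            .spawn()?;
        let stdin = child.stdin.take().expect("stdin was piped");
        let stdout = BufReader::new(child.stdout.take().expect("stdout was piped"));
        Ok(ShellSession { child, stdin, stdout })
    }

    /// Runs one command in the live shell and returns (exit code, stdout lines).
    pub fn run(&mut self, command: &str) -> io::Result<(i32, Vec<String>)> {
        // After the command, ask the shell to print the sentinel and `$?`.
        writeln!(self.stdin, "{}\nprintf '%s %s\\n' {} \"$?\"", command, SENTINEL)?;
        self.stdin.flush()?;

        let mut lines = Vec::new();
        loop {
            let mut line = String::new();
            if self.stdout.read_line(&mut line)? == 0 {
                return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "shell exited"));
            }
            let line = line.trim_end_matches('\n');
            if let Some(pos) = line.find(SENTINEL) {
                // Output without a trailing newline shares a line with the sentinel.
                if pos > 0 {
                    lines.push(line[..pos].to_string());
                }
                let code = line[pos + SENTINEL.len()..].trim().parse().unwrap_or(-1);
                return Ok((code, lines));
            }
            lines.push(line.to_string());
        }
    }
}

impl Drop for ShellSession {
    fn drop(&mut self) {
        // Best-effort cleanup of the child process.
        let _ = self.child.kill();
        let _ = self.child.wait();
    }
}
```

Because the process is reused, a `cd /` or `export GREETING=hello` sent through one `run` call is still in effect for the next one. A command that reads from stdin would swallow the sentinel line, which is one of the reasons a real implementation needs more care than this.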

What you get

Three primitives: a Tool trait, a ToolRegistry, and a ToolContext.

#[async_trait]
pub trait Tool: Send + Sync + 'static {
    type Input:  DeserializeOwned + JsonSchema + Send + Sync;
    type Output: Serialize + Send + Sync;

    fn name(&self) -> &'static str;
    fn description(&self) -> &'static str;

    async fn execute(&self, input: Self::Input, ctx: ToolContext)
        -> Result<Self::Output, ToolError>;
}

Input/output are typed, not serde_json::Value. You write strongly-typed structs, schemars derives the JSON Schema, and a blanket impl erases the associated types into an object-safe DynTool so a ToolRegistry can hold heterogeneous tools behind Arc<dyn DynTool>.

  • Tool — one trait, every tool.
  • ToolRegistry — register by value, dispatch by name with a JSON payload: reg.call("bash", json!({"command": "ls"}), ctx).await?. reg.list() returns {name, description, input_schema} entries ready to convert into any provider's tool-call format.
  • ToolContext — per-call cwd, CancellationToken, and Arc<dyn ProgressSink> for streaming. Nothing about providers, loops, auth, or retry — you bring those.
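If the type-erasure step is unfamiliar, the following std-only sketch shows the pattern under simplifying assumptions: it is synchronous, uses `FromStr`/`ToString` text payloads where the real crate uses serde and JSON, and has no `ToolContext`. The `call_raw` method, the `Registry` type, and the `Double` and `Shout` tools are hypothetical names for illustration, not the crate's API.

```rust
use std::collections::HashMap;
use std::str::FromStr;
use std::sync::Arc;

/// Typed trait (simplified: sync, text payloads instead of JSON).
pub trait Tool: Send + Sync + 'static {
    type Input: FromStr;
    type Output: ToString;
    fn name(&self) -> &'static str;
    fn execute(&self, input: Self::Input) -> Result<Self::Output, String>;
}

/// Object-safe counterpart: no associated types, so `dyn DynTool` is allowed.
pub trait DynTool: Send + Sync {
    fn name(&self) -> &'static str;
    fn call_raw(&self, raw: &str) -> Result<String, String>;
}

/// Blanket impl: every `Tool` is automatically a `DynTool`.
impl<T: Tool> DynTool for T {
    fn name(&self) -> &'static str {
        Tool::name(self)
    }
    fn call_raw(&self, raw: &str) -> Result<String, String> {
        // Decode the untyped payload into the tool's typed input.
        let input = raw
            .parse::<T::Input>()
            .map_err(|_| format!("invalid args for {}: {:?}", Tool::name(self), raw))?;
        // Run the typed tool, then encode its typed output again.
        self.execute(input).map(|out| out.to_string())
    }
}

/// Holds heterogeneous tools behind one trait object type.
#[derive(Default)]
pub struct Registry {
    tools: HashMap<&'static str, Arc<dyn DynTool>>,
}

impl Registry {
    pub fn register<T: Tool>(&mut self, tool: T) -> &mut Self {
        self.tools.insert(Tool::name(&tool), Arc::new(tool));
        self
    }
    pub fn call(&self, name: &str, raw: &str) -> Result<String, String> {
        match self.tools.get(name) {
            Some(tool) => tool.call_raw(raw),
            None => Err(format!("unknown tool: {}", name)),
        }
    }
}

/// Two toy tools with different input and output types.
pub struct Double;
impl Tool for Double {
    type Input = i64;
    type Output = i64;
    fn name(&self) -> &'static str { "double" }
    fn execute(&self, input: i64) -> Result<i64, String> { Ok(input * 2) }
}

pub struct Shout;
impl Tool for Shout {
    type Input = String;
    type Output = String;
    fn name(&self) -> &'static str { "shout" }
    fn execute(&self, input: String) -> Result<String, String> { Ok(input.to_uppercase()) }
}
```

With both tools registered, `reg.call("double", "21")` and `reg.call("shout", "hi")` go through the same map lookup even though their typed signatures differ, and a name that was never registered falls into the `unknown tool` branch.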

Integrating into your harness

The typical integration is ~50 lines of glue. Here's the skeleton.

1. Build a registry

use opentools::{ToolContext, ToolRegistry};
use opentools::tools::{Bash, Edit, Glob, Grep, Http, Read, Write};

let bash = Bash::spawn().await?;   // PTY spawned here
let mut reg = ToolRegistry::new();
reg.register(bash)
    .register(Read)
    .register(Write)
    .register(Edit)
    .register(Glob)
    .register(Grep)
    .register(Http::new());

// Add your own tools alongside the built-ins.
// reg.register(SqlQuery::new(pool));
// reg.register(SlackPost::new(webhook));

2. Convert the registry into your provider's tool format

Every provider has a different shape. Here are the four you need to know — these are lifted straight from the demos and are known to work.

OpenAI Chat Completions / OpenRouter / most OpenAI-compatible endpoints — nested under function:

let tools: Vec<Value> = reg.list().into_iter().map(|t| json!({
    "type": "function",
    "function": {
        "name":        t.name,
        "description": t.description,
        "parameters":  t.input_schema,
    }
})).collect();

OpenAI Responses API (codex endpoint / gpt-5 family) — flat:

let tools: Vec<Value> = reg.list().into_iter().map(|t| json!({
    "type":        "function",
    "name":        t.name,
    "description": t.description,
    "parameters":  t.input_schema,
    "strict":      false,
})).collect();

Anthropic Messages API — no type wrapper, uses input_schema:

let tools: Vec<Value> = reg.list().into_iter().map(|t| json!({
    "name":         t.name,
    "description":  t.description,
    "input_schema": t.input_schema,
})).collect();

Google Gemini (Vertex / Code Assist): wrap in functionDeclarations, and the schema needs light sanitization (strip $schema, title, format; flatten type: ["X","null"] → type: "X" + nullable: true). demo_gemini.rs has a 25-line sanitize_schema helper you can copy verbatim:

let declarations: Vec<Value> = reg.list().into_iter().map(|t| {
    let mut params = t.input_schema;
    sanitize_schema_for_gemini(&mut params);
    json!({
        "name":        t.name,
        "description": t.description,
        "parameters":  params,
    })
}).collect();
let tools = json!([{ "functionDeclarations": declarations }]);
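For readers who want to see what that sanitization amounts to, here is a rough std-only sketch. It is not the helper from demo_gemini.rs: `Json` is a tiny stand-in for `serde_json::Value` so the example has no dependencies, and `sanitize_schema_sketch`, `obj`, and `s` are hypothetical names. One design choice is an assumption on my part: the keys of a `properties` map are left alone, so a field that happens to be called `title` or `format` survives.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for `serde_json::Value` (numbers omitted for brevity).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Convenience constructors for the example.
pub fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Obj(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}
pub fn s(text: &str) -> Json {
    Json::Str(text.to_string())
}

/// Recursively strips `$schema`, `title`, `format` and rewrites
/// `type: ["X", "null"]` as `type: "X"` plus `nullable: true`.
pub fn sanitize_schema_sketch(node: &mut Json) {
    match node {
        Json::Obj(map) => {
            for key in &["$schema", "title", "format"] {
                map.remove(*key);
            }
            // Detect the two-element nullable form and pick the non-null type.
            let null = Json::Str("null".to_string());
            let flattened = match map.get("type") {
                Some(Json::Arr(items)) if items.len() == 2 && items.contains(&null) => {
                    items.iter().find(|item| **item != null).cloned()
                }
                _ => None,
            };
            if let Some(ty) = flattened {
                map.insert("type".to_string(), ty);
                map.insert("nullable".to_string(), Json::Bool(true));
            }
            for (key, value) in map.iter_mut() {
                // Property names are data, not schema keywords: sanitize each
                // property's schema but never strip entries of the map itself.
                if key.as_str() == "properties" {
                    if let Json::Obj(props) = &mut *value {
                        for schema in props.values_mut() {
                            sanitize_schema_sketch(schema);
                        }
                        continue;
                    }
                }
                sanitize_schema_sketch(value);
            }
        }
        Json::Arr(items) => {
            for item in items.iter_mut() {
                sanitize_schema_sketch(item);
            }
        }
        _ => {}
    }
}
```

The same traversal ports directly to `serde_json::Value` by matching on `Value::Object` and `Value::Array` instead of the stand-in variants.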

3. Dispatch tool calls from the model

When the model emits a tool call, you get a name and a JSON arguments string. Parse and dispatch through the registry:

let ctx = ToolContext::default();
// ... in your agent loop, after parsing the model's tool_call:

let args: Value = serde_json::from_str(&tool_call_arguments_string)?;
let result = reg.call(&tool_call_name, args, ctx.clone()).await;

let content = match result {
    Ok(v)  => serde_json::to_string(&v)?,
    Err(e) => json!({"error": e.to_string()}).to_string(),
};

// Feed `content` back to the model on the next turn as a tool_result /
// function_call_output / functionResponse (whatever your provider calls it).

Anything not in the registry errors with ToolError::Execution("unknown tool: ..."). Because every call is looked up by name in the registry, the model has no way to invoke anything outside the registered set.

4. Stream output to the user

Attach a ProgressSink to the context. bash (and any custom tool you write) will emit events as they happen:

use opentools::{ProgressSink, ToolEvent};
use std::sync::Arc;

struct TerminalSink;

impl ProgressSink for TerminalSink {
    fn send(&self, event: ToolEvent) {
        match event {
            ToolEvent::StdoutLine(line) => println!("│ {line}"),
            ToolEvent::StderrLine(line) => eprintln!("│ {line}"),
            ToolEvent::Progress { message, .. } => println!("… {message}"),
        }
    }
}

let ctx = ToolContext::default().with_progress(Arc::new(TerminalSink));

Now reg.call("bash", ...) streams bash output through TerminalSink as commands produce it, not all at once at the end. The sink is behind Arc<dyn ProgressSink>, so the same context can be cloned across tool calls in an agent loop. NoopProgress is provided for "I don't care about streaming" contexts.
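A sink does not have to print. For tests it is often handier to buffer events and assert on them afterwards. The sketch below assumes that pattern; to stay dependency-free it declares local stand-ins for `ProgressSink` and `ToolEvent` that mirror the shapes shown above (the real `Progress` variant has more fields than `message`), and `CollectingSink` and `fake_tool` are hypothetical names.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Local stand-in mirroring the event shapes used above.
#[derive(Debug, Clone, PartialEq)]
pub enum ToolEvent {
    StdoutLine(String),
    StderrLine(String),
    Progress { message: String },
}

/// Local stand-in for the crate's sink trait.
pub trait ProgressSink: Send + Sync {
    fn send(&self, event: ToolEvent);
}

/// Buffers every event so a test can inspect them after the call.
#[derive(Default)]
pub struct CollectingSink {
    events: Mutex<Vec<ToolEvent>>,
}

impl ProgressSink for CollectingSink {
    fn send(&self, event: ToolEvent) {
        // `send` takes `&self`, so interior mutability is required.
        self.events.lock().unwrap().push(event);
    }
}

impl CollectingSink {
    /// Returns only the stdout lines, in arrival order.
    pub fn stdout_lines(&self) -> Vec<String> {
        self.events
            .lock()
            .unwrap()
            .iter()
            .filter_map(|event| match event {
                ToolEvent::StdoutLine(line) => Some(line.clone()),
                _ => None,
            })
            .collect()
    }
}

/// Pretend tool: emits events from another thread through the shared sink.
pub fn fake_tool(sink: Arc<dyn ProgressSink>) {
    let worker = thread::spawn(move || {
        sink.send(ToolEvent::Progress { message: "starting".to_string() });
        sink.send(ToolEvent::StdoutLine("one".to_string()));
        sink.send(ToolEvent::StdoutLine("two".to_string()));
    });
    worker.join().unwrap();
}
```

The `Mutex` is what lets a `&self` method record events while the sink is shared through an `Arc` across threads, which is the same sharing the cloned context relies on.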

5. Cancel in-flight tool calls

Thread a CancellationToken through and cancel it when the user hits Ctrl-C (or a higher-level deadline fires, or whatever):

use tokio_util::sync::CancellationToken;

let cancel = CancellationToken::new();
let ctx = ToolContext::default().with_cancel(cancel.clone());

// Spawn a handler that cancels on Ctrl-C
tokio::spawn({
    let cancel = cancel.clone();
    async move {
        let _ = tokio::signal::ctrl_c().await;
        cancel.cancel();
    }
});

// A long-running bash command returns ToolError::Cancelled promptly
// once the token fires.
let result = reg.call("bash", json!({"command": "sleep 600"}), ctx).await;

The bash tool sends SIGINT to the current foreground process when the token fires, waits for the sentinel (up to 2 seconds), and returns Cancelled. The bash session itself stays alive for the next call (set -m enables job control so only the foreground command is killed).
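Custom tools only get prompt cancellation if they cooperate. A common approach is to check the token between units of work. The sketch below shows that shape with a std-only stand-in: `CancelFlag` wraps an `AtomicBool` in place of tokio_util's `CancellationToken` (which offers `is_cancelled()` for polling and a `cancelled()` future for use with `tokio::select!`). `SketchError` and `checksum_chunks` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Stand-in for the error a cancelled tool would return.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    Cancelled,
}

/// Std-only stand-in for `CancellationToken`: cheap to clone, shared state.
#[derive(Clone, Default)]
pub struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Sums the bytes of each chunk, checking for cancellation before every chunk
/// so a cancelled call stops at the next chunk boundary.
pub fn checksum_chunks(chunks: &[&[u8]], cancel: &CancelFlag) -> Result<u64, SketchError> {
    let mut total = 0u64;
    for chunk in chunks {
        if cancel.is_cancelled() {
            return Err(SketchError::Cancelled);
        }
        total += chunk.iter().map(|byte| *byte as u64).sum::<u64>();
    }
    Ok(total)
}
```

The granularity of the check sets the latency: a tool that polls once per chunk can only react as fast as its slowest chunk, so long blocking steps are better raced against the token's `cancelled()` future in async code.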

Writing a custom tool

Implementing Tool is three structs and a trait impl. schemars derives the JSON Schema from your Input type automatically, including field descriptions from /// doc comments.

use opentools::{Tool, ToolContext, ToolError};
use async_trait::async_trait;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

#[derive(Debug, Deserialize, JsonSchema)]
pub struct SqlQueryInput {
    /// The SELECT statement to run. Must be read-only.
    pub query: String,
    /// Maximum number of rows to return.
    #[serde(default)]
    pub limit: Option<u32>,
}

#[derive(Debug, Serialize, JsonSchema)]
pub struct SqlQueryOutput {
    pub rows:      Vec<serde_json::Value>,
    pub row_count: usize,
}

pub struct SqlQuery {
    pool: sqlx::PgPool,
}

#[async_trait]
impl Tool for SqlQuery {
    type Input  = SqlQueryInput;
    type Output = SqlQueryOutput;

    fn name(&self) -> &'static str { "sql_query" }
    fn description(&self) -> &'static str {
        "Run a read-only SELECT query against the production replica. \
         Use this when the user asks for data that's stored in Postgres. \
         Returns at most `limit` rows as JSON objects."
    }

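    async fn execute(&self, input: SqlQueryInput, _ctx: ToolContext)
        -> Result<SqlQueryOutput, ToolError>
    {
        // This example does not enforce read-only access itself; connect
        // with a database role that only has SELECT privileges.
        let rows = sqlx::query(&input.query)
            .fetch_all(&self.pool)
            .await
            .map_err(|e| ToolError::Execution(e.to_string()))?;
        // Honor `limit` as promised in the description (no cap when absent).
        let limit = input.limit.map_or(usize::MAX, |n| n as usize);
        // `row_to_json` is a helper you supply: PgRow -> serde_json::Value.
        let rows: Vec<serde_json::Value> =
            rows.into_iter().take(limit).map(row_to_json).collect();
        Ok(SqlQueryOutput { row_count: rows.len(), rows })
    }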
}

Now just reg.register(SqlQuery { pool }) and it shows up in reg.list() alongside the built-ins, with a derived schema the model can use.

Included tools

| Tool | Description |
| --- | --- |
| bash | Persistent PTY-backed bash session. cd, env vars, aliases, shell functions persist across calls. Streams stdout line-by-line, cancellable via CancellationToken, per-call timeout, session survives SIGINT'd commands via set -m. |
| read | Read a UTF-8 file with optional offset / limit. |
| write | Write a file, mkdir -p'ing parent directories as needed. |
| edit | Exact-string find/replace with a unique-match guard (opt-in replace_all). |
| glob | **/*.rs-style pattern walker that honors .gitignore. |
| grep | Regex search with path + line number + matching text. |
| http | HTTP request via reqwest. Any method (GET/POST/PUT/PATCH/DELETE/HEAD/OPTIONS), custom headers, optional body. Returns status, final URL after redirects, response headers, content-type, and body (capped at 1 MiB by default, truncated at a UTF-8 boundary). headers_only: true skips the body entirely for cheap probes (like curl -I). |
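Two details from the table are easy to get subtly wrong in a homegrown version: the unique-match guard in edit and the UTF-8-safe body cap in http. The std-only sketch below shows one plausible shape for each; it is an illustration, not the crate's code, and `replace_unique` and `truncate_utf8` are hypothetical names.

```rust
/// Replaces `old` with `new`, refusing ambiguous edits unless `replace_all` is set.
pub fn replace_unique(
    text: &str,
    old: &str,
    new: &str,
    replace_all: bool,
) -> Result<String, String> {
    if old.is_empty() {
        return Err("search string must not be empty".to_string());
    }
    // Count non-overlapping occurrences before touching the text.
    let count = text.matches(old).count();
    match (count, replace_all) {
        (0, _) => Err("no match found".to_string()),
        (1, _) | (_, true) => Ok(text.replace(old, new)),
        (n, false) => Err(format!(
            "{} matches found; pass replace_all to replace them all",
            n
        )),
    }
}

/// Returns the longest prefix of `text` that fits in `max_bytes`
/// without cutting a multi-byte character in half.
pub fn truncate_utf8(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}
```

The guard makes a model's edit fail loudly when its search string is not specific enough, instead of silently rewriting the wrong occurrence. Backing up to a char boundary keeps a capped body valid UTF-8, at the cost of returning up to three bytes fewer than the cap.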

Reference agent loops

examples/ has four full working agent loops, ~200–300 LOC each. They're templates you can copy-paste when building your own harness — each shows the complete integration for one provider.

| Demo | Provider | Subscription source | API key env var | Default model |
| --- | --- | --- | --- | --- |
| demo_openai | OpenAI Responses API | ~/.codex/auth.json | OPENAI_API_KEY | gpt-5.4 |
| demo_claude | Anthropic Messages API | macOS Keychain / ~/.claude/.credentials.json | ANTHROPIC_API_KEY | claude-opus-4-6 |
| demo_gemini | Google Code Assist | ~/.gemini/oauth_creds.json | GEMINI_API_KEY | gemini-3-flash-preview |
| demo_openrouter | OpenRouter Chat Completions | (API key only) | OPENROUTER_API_KEY | openai/gpt-5 |

Each demo supports:

# Auto-detect — use subscription creds if present, else env var API key
cargo run --example demo_claude

# Force an auth mode
cargo run --example demo_openai -- --auth sub
cargo run --example demo_gemini -- --auth api

# Override the default model
cargo run --example demo_claude -- --model claude-sonnet-4-5

# Remove tools from the registry so the model has to work harder
cargo run --example demo_openai -- --exclude-tool bash
cargo run --example demo_claude -- --exclude-tool bash --exclude-tool http

A typical run prints the registered tools, each tool call with args, the streamed bash output through a PrintingSink, and the final model summary:

opentools v0.1.0 registered: bash, edit, glob, grep, http, read, write
model: claude-opus-4-6 (via Keychain → api.anthropic.com (OAuth))
🚀 Task: Create a new directory at /tmp/demo-project-claude, initialize a git
         repository inside it, and write a README.md...

│ I'll accomplish this with parallel operations where possible.
🔧 bash {"command":"mkdir -p /tmp/demo-project-claude && cd ... && git init"}
   ┆ Initialized empty Git repository in /private/tmp/demo-project-claude/.git/
   ↳ {"exit_code":0, "output":"Initialized empty Git repository..."}
🔧 write {"path":"/tmp/demo-project-claude/README.md","content":"# Demo Project..."}
   ↳ {"bytes_written":185}
│ Created /tmp/demo-project-claude, initialized a git repository, and wrote a README.md.

✓ Done

Dispatch always goes through ToolRegistry::call(name, args, ctx); no demo ever invokes a provider-native tool. The only tools declared to the model are the ones in the registry, so there is nothing else for it to call, which is the whole point.

Testing

Three layers — see TESTING.md for the full coverage matrices.

# 1. Unit + integration tests (no network, ~3s, 17 tests)
cargo test

# 2. Preflight e2e tests for each demo — flag parsing, auth errors,
#    fake credentials, etc. (no network, no creds, ~5s)
bash tests/e2e/run.sh                        # 39 tests on macOS, 44 in Docker

# 3. Full e2e against real APIs with your real credentials
#    (costs provider quota, mounts creds into a Docker sandbox)
bash tests/e2e/docker_full.sh                # all providers
bash tests/e2e/docker_full.sh --only claude  # just one (save quota)

Integration tests exercise the PTY bash path end-to-end — cwd persistence, env persistence, streaming, timeout, cancellation, session survival after SIGINT — plus round-trip tests for the file tools.

Caveats

  • Not affiliated with OpenAI, Anthropic, or Google. The subscription auth paths reuse credentials from the respective first-party CLIs (codex, claude, gemini). If those projects change their credential storage formats or providers tighten server-side checks on allowed clients, the demos will break. For production harnesses, prefer API keys via --auth api.
  • No sandboxing. The bash tool runs commands on your real filesystem as your real user. Wrap it in whatever sandbox your harness needs (bubblewrap, seatbelt, Docker, firejail). The Tool trait makes no security claims — it's just a dispatch layer.
  • Gemini preview model gating. gemini-3.1-pro-preview-customtools returns 404 on most accounts; the default is gemini-3-flash-preview. See TESTING.md for the full model availability table.
  • No OAuth token refresh in the OpenAI and Claude demos — if your subscription token has expired, re-run the CLI's login command. The Gemini demo does auto-refresh because its tokens are shorter-lived.

License

Dual-licensed under MIT or Apache-2.0, at your option.