A precise profiling tool to benchmark Standard Tool Loading vs. Anthropic's Deferred Tool Search (Beta).
Building agents with 50+ tools introduces a critical trade-off: Context Bloat vs. Latency. This repo reverse-engineers the "Double Pass" behavior of server-side tool search to help engineers make data-driven architecture decisions.
Through empirical testing, this profiler confirms that Deferred Loading triggers a server-side loop:
- Pass 1: Model receives the user prompt + the `tool_search` tool only.
- Server Action: Anthropic executes a search against your deferred tool definitions.
- Pass 2: Model receives the original prompt + only the relevant tools found.
The Trade-off:
- Standard: Higher Cost (Input Tokens), Lower Latency.
- Deferred: Lower Cost (Tokens), Higher Latency (~1.5x - 2x).
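
To make the trade-off concrete, here is a minimal sketch of how one arm can be measured: wall-clock time around a single `messages.create` call plus the `usage` block the API returns. The `profile_standard` helper, the model string, and the argument names are illustrative, not this repo's actual code.

```python
# Minimal measurement sketch (illustrative; not the repo's actual profiler code).
import time

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def profile_standard(tools: list[dict], user_message: str) -> dict:
    """Send every tool definition up front and record latency + token usage."""
    start = time.perf_counter()
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": user_message}],
        tools=tools,  # full definitions are billed as input tokens on every call
    )
    return {
        "latency_s": time.perf_counter() - start,
        "input_tokens": response.usage.input_tokens,
        "output_tokens": response.usage.output_tokens,
    }
```

The deferred arm is timed the same way; only the request shape differs (shown further down). The steps below install the profiler and run both arms side by side.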
- Clone and Install

  ```bash
  git clone https://github.com/yourusername/llm-tool-performance-audit.git
  cd llm-tool-performance-audit
  make install
  ```

- Configure Environment

  Create a `.env` file in the root directory (the sketch after this list shows how the key is picked up):

  ```
  ANTHROPIC_API_KEY=sk-ant-api03-...
  ```

- Run the Audit

  ```bash
  make run
  ```

  Optional: simulate a heavier load with 100 tools:

  ```bash
  make run-heavy
  ```
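
The Anthropic SDK reads `ANTHROPIC_API_KEY` from the environment, so the profiler only needs the variable exported or loaded from `.env`. A minimal loading sketch, assuming `python-dotenv` is installed (the repo's actual bootstrap may differ):

```python
# Sketch: load the API key from .env before constructing the client.
# Assumes python-dotenv; the repo's actual bootstrap may differ.
from dotenv import load_dotenv
from anthropic import Anthropic

load_dotenv()          # copies ANTHROPIC_API_KEY from .env into the process env
client = Anthropic()   # the SDK picks the key up from ANTHROPIC_API_KEY
```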
This project leverages the undocumented behavior of the `advanced-tool-use` beta. Here is how we implement the Deferred Profiler:

We must explicitly opt in to the 2025-11-20 beta and use the `tool_search_tool_bm25` tool type.
```python
# src/profilers/deferred.py
from anthropic import Anthropic

client = Anthropic()  # picks up ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    # ⚠️ CRITICAL: the beta flag enables the server-side search loop
    betas=["advanced-tool-use-2025-11-20"],
    max_tokens=1024,
    messages=[{"role": "user", "content": user_message}],
    tools=[
        {
            "type": "tool_search_tool_bm25_20251119",  # the searcher
            "name": "tool_search",
        }
    ] + deferred_tools,  # tools marked with defer_loading=True
)
```
To get realistic metrics, we generate "heavy" tool definitions that mimic complex enterprise schemas.

```python
# src/utils.py
def generate_dummy_tools(count=50):
    """Generate `count` deferred tool definitions for the audit."""
    return [{
        "name": f"get_metric_{i}",
        "description": "Complex retrieval tool for specific user segments...",
        # A valid tool definition needs an input_schema; this minimal placeholder
        # stands in for the heavier enterprise-style schemas used in the audit.
        "input_schema": {
            "type": "object",
            "properties": {"segment_id": {"type": "string"}},
        },
        "defer_loading": True,  # <-- this triggers the search behavior
    } for i in range(count)]
```

The CLI uses `rich` to visualize the delta between architectures (a rendering sketch follows the sample output below):
```
                  Performance Comparison
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Metric       ┃ Standard ┃ Deferred (Search) ┃ Delta   ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ Latency (s)  │ 2.105s   │ 3.850s            │ +1.745s │
│ Input Tokens │ 18,450   │ 1,240             │ -17,210 │
└──────────────┴──────────┴───────────────────┴─────────┘
```
Analysis: Deferred loading reduced token consumption by 93%, but increased latency by 82% due to the sequential inference passes.
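
For reference, here is a minimal sketch of how a table like the one above can be rendered with `rich`; the `render_comparison` function and its input dicts are illustrative, not the repo's actual CLI code.

```python
# Illustrative rich rendering sketch; the repo's actual CLI may differ.
from rich.console import Console
from rich.table import Table

def render_comparison(standard: dict, deferred: dict) -> None:
    """Print a Standard vs. Deferred comparison table to the terminal."""
    table = Table(title="Performance Comparison")
    table.add_column("Metric")
    table.add_column("Standard")
    table.add_column("Deferred (Search)")
    table.add_column("Delta")

    table.add_row(
        "Latency (s)",
        f"{standard['latency_s']:.3f}s",
        f"{deferred['latency_s']:.3f}s",
        f"+{deferred['latency_s'] - standard['latency_s']:.3f}s",
    )
    table.add_row(
        "Input Tokens",
        f"{standard['input_tokens']:,}",
        f"{deferred['input_tokens']:,}",
        f"{deferred['input_tokens'] - standard['input_tokens']:+,}",
    )
    Console().print(table)

# Example with the numbers from the sample run above:
render_comparison(
    {"latency_s": 2.105, "input_tokens": 18_450},
    {"latency_s": 3.850, "input_tokens": 1_240},
)
```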
PRs are welcome! Specifically looking for:
- Profiling for the Regex search tool variant.
- Cost estimation calculator based on current token prices.
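
As a rough starting point for the cost calculator, here is a sketch that converts the profiler's token counts into dollars. The per-million-token prices are placeholders, not current Anthropic pricing, and would need to be kept up to date.

```python
# Rough cost-estimation sketch. PRICES are placeholders, not current pricing;
# substitute the published per-million-token rates for the model you profile.
PRICES = {
    "input_per_mtok": 3.00,    # placeholder USD per 1M input tokens
    "output_per_mtok": 15.00,  # placeholder USD per 1M output tokens
}

def estimate_cost(input_tokens: int, output_tokens: int, prices: dict = PRICES) -> float:
    """Estimate the USD cost of a single profiled request."""
    return (input_tokens * prices["input_per_mtok"]
            + output_tokens * prices["output_per_mtok"]) / 1_000_000

# Example: the Standard run from the table above (output token count illustrative).
print(f"${estimate_cost(18_450, 300):.4f}")
```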
MIT © Alex