Benchmarks on tool accuracy, token usage, and caching?

Hey folks, thanks for the repo. It's really helpful. I just have one question: are there any benchmarks that this tool retriever was run on? Can you share any metrics?

I was trying to run some evals on [my synthetic data](https://github.com/altic-dev/reflex-evals) for a meta tool retriever and executor I built, and got pretty bad results on the benchmark. So I'm looking to compare and see if this can make it work. Please point me to any metrics from benchmarks that you might have run.

Benchmarks on tool accuracy, token usage and caching ? #2

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Benchmarks on tool accuracy, token usage and caching ? #2

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions