Hey folks, thanks for the repo. It's really helpful. I just have one question: are there any benchmarks that this tool retriever has been run on? Can you share any metrics?
I was trying to run some evals on my synthetic data for a meta tool retriever and executor I built, and got pretty bad results on the benchmark. So I'm looking to compare and see if this retriever can make it work. Please point me to any metrics on benchmarks that you might have run.