Open
Description
Goal
Evaluate changes in the performance of Google search by running a web search eval on google/serper every month and monitoring either an accuracy metric or the number of tool calls needed.
Over time, as the web search tools improve, we should see these numbers change.
This allows us to calibrate the performance of web-search agents even as the underlying tool they are using is changing.
We can open source this eval and make public the findings.
It can be useful to people who use web search agents like jan-v1.
Approach
Take the existing work done on menloresearch/jan-models#197 and build an automatic eval pipeline that runs monthly
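A minimal sketch of the bookkeeping such a monthly run could use, assuming each eval question yields a correctness flag and a tool-call count. The record shape (`EpisodeResult`), the function names, and the JSONL history file are all hypothetical and are not taken from menloresearch/jan-models#197. The agent run itself and the monthly trigger are left out.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from statistics import mean


@dataclass
class EpisodeResult:
    """Outcome of one eval question (hypothetical record shape)."""
    correct: bool    # did the agent's final answer match the reference?
    tool_calls: int  # how many web search calls the agent made


def summarize(month: str, results: list[EpisodeResult]) -> dict:
    """Reduce one monthly run to the two metrics named in the Goal."""
    if not results:
        raise ValueError("no results to summarize")
    return {
        "month": month,
        "n": len(results),
        "accuracy": sum(r.correct for r in results) / len(results),
        "mean_tool_calls": mean(r.tool_calls for r in results),
    }


def append_snapshot(path: Path, snapshot: dict) -> None:
    """Append one month's summary as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(snapshot) + "\n")


def load_history(path: Path) -> list[dict]:
    """Read all stored monthly summaries, oldest first."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def deltas(history: list[dict]) -> list[dict]:
    """Month-over-month change in each metric."""
    out = []
    for prev, cur in zip(history, history[1:]):
        out.append({
            "month": cur["month"],
            "accuracy_delta": cur["accuracy"] - prev["accuracy"],
            "tool_calls_delta": cur["mean_tool_calls"] - prev["mean_tool_calls"],
        })
    return out
```

Keeping the history append-only means each month's numbers stay comparable and publishable as-is. The question set and the agent should stay fixed between runs so that any change can be attributed to the search tool.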