
feat: Evaluation of search tools #4

@Yip-Jia-Qi



Goal

Track changes in Google Search performance by running a web-search eval on Google/Serper every month and monitoring either an accuracy metric or the number of tool calls needed to reach an answer.

Over time, as the web search tools improve, we should see these numbers change.

This allows us to calibrate the performance of web-search agents even as the underlying tool they are using is changing.

We can open-source this eval and publish the findings.

This could be useful to people who use web-search agents like jan-v1.

Approach

Take the existing work done on menloresearch/jan-models#197 and build an automated eval pipeline that runs monthly.
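
To make the two candidate metrics concrete, here is a minimal sketch of how a monthly run could be summarized and compared with the previous month. It is not taken from menloresearch/jan-models#197; the `EvalRecord` schema and the `summarize` and `month_over_month` helpers are hypothetical names introduced only for illustration, and it assumes each eval question yields a graded answer plus a tool-call count.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalRecord:
    """One graded eval question (hypothetical schema, not from the existing work)."""
    month: str       # month of the run, formatted as "YYYY-MM"
    correct: bool    # whether the agent's final answer was graded correct
    tool_calls: int  # number of search tool calls the agent made


def summarize(records):
    """Group records by month and compute accuracy and mean tool calls."""
    by_month = {}
    for r in records:
        by_month.setdefault(r.month, []).append(r)
    return {
        month: {
            "accuracy": mean(1.0 if r.correct else 0.0 for r in rows),
            "mean_tool_calls": mean(r.tool_calls for r in rows),
        }
        for month, rows in sorted(by_month.items())  # "YYYY-MM" sorts chronologically
    }


def month_over_month(summary):
    """Return the change in each metric between consecutive months."""
    months = list(summary)
    return {
        f"{prev}->{cur}": {
            metric: summary[cur][metric] - summary[prev][metric]
            for metric in summary[cur]
        }
        for prev, cur in zip(months, months[1:])
    }
```

Keeping the question set fixed across months matters here: if the questions and the agent stay the same, a shift in accuracy or tool-call count can more plausibly be attributed to the search tool rather than to the eval itself.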

Metadata


Assignees

Labels

No labels

Projects

Status

Todo

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
