Feature Request: Search Model Human Alignment Evaluation
Background
Jia Qi brought up an amazing paper today: [MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures](https://proceedings.neurips.cc/paper_files/paper/2024/file/b1f34d7b4a03a3d80be8e72eb430dd81-Paper-Conference.pdf)
Gist of the paper: it compares the distribution of a set of human Google searches against the distributions of different benchmark datasets.
Proposed Feature
Spin-off idea: measure how closely a search model's query distribution matches these human Google searches, so as to quantify how well our model aligns with human search preferences.
We can then rank other models on this same dataset and make a qualitative evaluation of which model is the most "human-aligned" at search.
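Once each model has an alignment score, the leaderboard step is just a sort. A minimal sketch (the model names and scores below are placeholders for illustration, not real measurements):

```python
def rank_by_alignment(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Return (model, score) pairs sorted from most to least human-aligned."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical scores purely for illustration; real values would come from
# comparing each model's query distribution against the human search data.
leaderboard = rank_by_alignment({"model_a": 0.72, "model_b": 0.64, "model_c": 0.81})
```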
Implementation
- Adapt MixEval's methodology from LLM evaluation to search model alignment
- Use distributional similarity between human search patterns and search model outputs
- Create leaderboard ranking different search models on human alignment
- Build evaluation framework that's efficient and reproducible
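One cheap, reproducible way to realize the "distributional similarity" bullet is to compare token distributions of human queries vs. model-generated queries with Jensen-Shannon divergence. This is only a sketch under simplifying assumptions (unigram tokens, whitespace splitting; a real framework would likely use embeddings or topic clusters), and all function names are ours, not from MixEval:

```python
import math
from collections import Counter

def token_distribution(queries: list[str]) -> dict[str, float]:
    """Lowercased unigram frequency distribution over a list of search queries."""
    counts = Counter(tok for q in queries for tok in q.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence (base 2), bounded in [0, 1]."""
    vocab = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl(a: dict[str, float]) -> float:
        # KL(a || m); terms with a(t) == 0 contribute nothing.
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

def alignment_score(human_queries: list[str], model_queries: list[str]) -> float:
    """Higher = model's query distribution is closer to human searches (1 - JSD)."""
    return 1.0 - js_divergence(token_distribution(human_queries),
                               token_distribution(model_queries))
```

Identical query sets score 1.0 and fully disjoint vocabularies score 0.0, which gives the leaderboard a bounded, comparable metric across models.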
Some open questions
- What dataset to use, and where to collect the human Google search data? (Yuuki has been driving some collation of internal search data by sending his own search queries in #jan-model-internal / "Random Question for Model from User" on Discord; search for "Random question from user" on Discord to find the thread.)
- Which dataset should be used to curate the distributions of the different search models? (At a high level this is open to discussion with the team.)
Expected Outcome
Standardized benchmark for measuring how well search models align with human search behavior and preferences.
A picture of how the evaluation could look is shown in this excalidraw.
