Description
Is your feature request related to a problem?
We currently only support storing metric values at the per-query level, as this was the natural pattern for leveraging OpenSearch Dashboards visualizations. For Experiment-level metrics, we currently just average the individual Query-level metrics for rendering, both in OSD and in Dashboards Search Relevance.
There are a couple of limitations we've seen:
- There are metrics that need to be summed instead of averaged, like DCG, and we don't support that.
- We recalculate the experiment-level metrics both in the dashboards-search-relevance UI and in our OSD dashboards, which can lead to discrepancies.
- There may be additional Experiment-level metrics we want that don't make sense at the per-query level, and we have no home for them.
What solution would you like?
I would like to see the existing metrics that are stored at the per query level:
"queryText": "Ice Age",
"metrics": [
{
"metric": "jaccard",
"value": 0.54
},
{
"metric": "rbo50",
"value": 0.57
},
to now be accessible in the same basic data structure at the Experiment level.
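For illustration, an Experiment-level document could carry the same `metrics` array shape; the field names around it here are hypothetical, not a committed mapping:

```json
{
  "experimentId": "...",
  "metrics": [
    {
      "metric": "jaccard",
      "value": 0.50
    },
    {
      "metric": "rbo50",
      "value": 0.55
    }
  ]
}
```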
This would require us to calculate these metrics as part of completing an Experiment. Our Java code already has the right place for updating an experiment when all the per-query work is done, so we have a place to do this calculation. This is also an additive data structure, so we just need to add some smarts to create the index mappings if they don't exist.
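The roll-up step could be sketched roughly as below. This is a minimal illustration, not the plugin's actual code: the class and method names (`MetricAggregator`, `aggregate`) and the summed-metric set are assumptions, and the real implementation would live in the existing experiment-completion path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: roll per-query metric values up into Experiment-level
// metrics, summing where appropriate (e.g. DCG) and averaging otherwise.
public class MetricAggregator {

    // Illustrative: metrics that should be summed rather than averaged.
    private static final Set<String> SUMMED_METRICS = Set.of("dcg");

    // Each entry in queryMetrics maps metric name -> value for one query.
    // Returns one Experiment-level value per metric name.
    public static Map<String, Double> aggregate(List<Map<String, Double>> queryMetrics) {
        Map<String, List<Double>> grouped = new HashMap<>();
        for (Map<String, Double> perQuery : queryMetrics) {
            perQuery.forEach((name, value) ->
                grouped.computeIfAbsent(name, k -> new ArrayList<>()).add(value));
        }
        Map<String, Double> result = new HashMap<>();
        grouped.forEach((name, values) -> {
            double sum = values.stream().mapToDouble(Double::doubleValue).sum();
            result.put(name, SUMMED_METRICS.contains(name) ? sum : sum / values.size());
        });
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> perQuery = List.of(
            Map.of("jaccard", 0.54, "dcg", 1.2),
            Map.of("jaccard", 0.46, "dcg", 0.8));
        // jaccard is averaged (0.5); dcg is summed (2.0)
        System.out.println(aggregate(perQuery));
    }
}
```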
What alternatives have you considered?
Adding more richness to our understanding of each metric, so we would know whether it should be summed or averaged, or even whether it's a text value that should render in a special way. But this seems complex and error-prone.
Do you have any additional context?
This need became very visible when working with @frejonb on integrating RAGElo-generated metrics into SRW.