
[FEATURE] Store Experiment Level Metrics instead of Recalculating on Render #245

@epugh


Is your feature request related to a problem?

We currently store metric values only at the per-query level, because that was the natural fit for OpenSearch Dashboards visualizations. For Experiment-level metrics, we simply average the individual Query-level metrics at render time, both in OSD and in the Dashboards Search Relevance UI.

There are a few limitations we've seen:

  1. There are metrics that need to be summed instead of averaged, like DCG, and we don't support that.
  2. We recalculate the experiment level metrics both in the dashboards-search-relevance UI and in our OSD dashboards, which can lead to discrepancies.
  3. There may be additional Experiment-level metrics we want that don't make sense at the per-query level, and we have no home for them.

What solution would you like?

I would like to see the existing metrics that are stored at the per query level:

"queryText": "Ice Age",
"metrics": [
  {
    "metric": "jaccard",
    "value": 0.54
  },
  {
    "metric": "rbo50",
    "value": 0.57
  },

to now be accessible in the same basic data structure at the Experiment level.

This would require us to calculate these metrics as part of completing an Experiment. Our Java code already has a place where the experiment is updated once all the per-query work is done, so that is where the calculation can happen. This is also an additive data structure, so we just need to add some smarts to create the index mappings if they don't exist.
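To make the idea concrete, here is a minimal sketch of the roll-up that could run once all per-query work is finished. Everything in it is hypothetical: the class, the `Metric` record, the `RULES` map, and the `"dcg"` metric name are assumptions for illustration, not names from the plugin's code. It assumes each metric is either averaged or summed across queries, with averaging as the default, and it produces the same `metric`/`value` shape used at the per-query level.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the plugin's code base.
class ExperimentMetricsSketch {

    // How a per-query metric is rolled up to the Experiment level.
    enum Aggregation { MEAN, SUM }

    // One metric entry, mirroring {"metric": ..., "value": ...}.
    record Metric(String metric, double value) {}

    // Assumed rules: metrics not listed here fall back to MEAN.
    static final Map<String, Aggregation> RULES = Map.of("dcg", Aggregation.SUM);

    // Takes one list of metrics per query and returns one list for the Experiment.
    static List<Metric> aggregate(List<List<Metric>> perQueryMetrics) {
        Map<String, Double> sums = new LinkedHashMap<>();
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<Metric> queryMetrics : perQueryMetrics) {
            for (Metric m : queryMetrics) {
                sums.merge(m.metric(), m.value(), Double::sum);
                counts.merge(m.metric(), 1, Integer::sum);
            }
        }
        List<Metric> result = new ArrayList<>();
        for (Map.Entry<String, Double> entry : sums.entrySet()) {
            String name = entry.getKey();
            Aggregation rule = RULES.getOrDefault(name, Aggregation.MEAN);
            double value = rule == Aggregation.SUM
                    ? entry.getValue()
                    : entry.getValue() / counts.get(name);
            result.add(new Metric(name, value));
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative values only, not real experiment output.
        List<List<Metric>> perQuery = List.of(
                List.of(new Metric("jaccard", 0.5), new Metric("dcg", 2.0)),
                List.of(new Metric("jaccard", 1.0), new Metric("dcg", 3.0)));
        for (Metric m : aggregate(perQuery)) {
            System.out.println(m.metric() + " = " + m.value());
        }
    }
}
```

The result would be stored once on the experiment document, so both the dashboards-search-relevance UI and the OSD dashboards read the same values instead of each recalculating them.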

What alternatives have you considered?

We could make the rendering layer understand each metric in more depth: whether it should be summed or averaged, or whether it is a text value that needs special rendering. But this seems complex and error-prone.

Do you have any additional context?

This need became very visible when working with @frejonb on integrating RAGElo-generated metrics into SRW.
