
Support for generating identical indices during benchmarking #1002

@peteralfonsi


Is your feature request related to a problem? Please describe

I often want to run OSB against two clusters, one with a search-related change and one without, to see how search latency is affected. However, the results are often quite noisy, and it can be tricky to tell whether a difference is really due to my change or to how exactly the data was indexed. For example, the exact segment topology can very significantly affect sort query performance.

It would be really useful if OSB had an optional, easy way to ensure the index is exactly the same on every run.
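To make "exactly the same" checkable, one option is to compare the segment layout of the two clusters after indexing. The sketch below is only an illustration and is not part of OSB: it assumes each argument is the already-parsed JSON body of a `GET /<index>/_segments` call, and the helper names `segment_fingerprint` and `same_topology` are made up for this example.

```python
from typing import Any, Dict, List, Tuple


def segment_fingerprint(segments_response: Dict[str, Any], index: str) -> List[Tuple[str, int, int]]:
    """Summarise primary-shard segments as sorted (shard_id, num_docs, deleted_docs) tuples.

    `segments_response` is assumed to be the parsed JSON of GET /<index>/_segments.
    """
    shards = segments_response["indices"][index]["shards"]
    fingerprint = []
    for shard_id, copies in shards.items():
        for shard_copy in copies:
            # Replicas can have a different segment layout, so only look at primaries.
            if not shard_copy["routing"]["primary"]:
                continue
            for segment in shard_copy["segments"].values():
                fingerprint.append((shard_id, segment["num_docs"], segment["deleted_docs"]))
    return sorted(fingerprint)


def same_topology(response_a: Dict[str, Any], response_b: Dict[str, Any], index: str) -> bool:
    """True if both clusters hold the same number of segments with the same doc counts per shard."""
    return segment_fingerprint(response_a, index) == segment_fingerprint(response_b, index)
```

This only compares segment counts and per-segment document counts, so it is a coarse check. Two indices could pass it and still differ in document order within a segment.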

Describe the solution you'd like

The simplest solution might be to set a random seed via a workload flag such as --rng-seed. However, I don't know enough about the indexing flow to be sure this would actually produce identical indices every time.
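As a rough illustration of what a seed flag would and would not cover, here is a minimal sketch. The `--rng-seed` flag is the one proposed above and does not exist in OSB today; `parse_args` and `shuffled_doc_order` are hypothetical helpers, not OSB functions.

```python
import argparse
import random
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse a hypothetical --rng-seed flag (the flag proposed in this issue)."""
    parser = argparse.ArgumentParser(description="Seeded-randomness sketch")
    parser.add_argument("--rng-seed", type=int, default=None,
                        help="Seed for every random choice made while generating load")
    return parser.parse_args(argv)


def shuffled_doc_order(num_docs: int, seed: Optional[int]) -> List[int]:
    """Return a document order that is reproducible when a seed is given."""
    # A private Random instance avoids interference from other users of the
    # global random module.
    rng = random.Random(seed)
    order = list(range(num_docs))
    rng.shuffle(order)
    return order
```

With the same seed, `shuffled_doc_order(1000, 42)` returns the same order on every run. That only makes the client-side random choices reproducible. Flush and merge timing on the cluster, and the interleaving of concurrent bulk requests, are not controlled by a client seed, so identical segments would likely still not be guaranteed.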

Describe alternatives you've considered

If the OSB random seed isn't enough to ensure identical indices, another option would be hosting one "standard" snapshot for each dataset somewhere. Then, if the user specifies some flag, OSB would download this snapshot and restore it to the cluster instead of running the usual indexing operation.

This could be limited to a snapshot for the current OpenSearch version only, and perhaps to the most important workloads such as nyc_taxis/big5/http_logs, to avoid having to host too many of them.
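For reference, the snapshot route could boil down to two REST calls against the cluster: register a repository that contains the downloaded snapshot, then restore from it. The sketch below only builds those requests and does not send them. The repository name, snapshot name, and filesystem location are placeholders, and how OSB would wire this to a flag is left open.

```python
from typing import Any, Dict, Tuple

# (HTTP method, path, JSON body)
Request = Tuple[str, str, Dict[str, Any]]


def register_repo_request(repo: str, location: str) -> Request:
    """Build the call that registers a shared-filesystem snapshot repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return ("PUT", f"/_snapshot/{repo}", body)


def restore_request(repo: str, snapshot: str, indices: str) -> Request:
    """Build the call that restores the given indices from a snapshot."""
    body = {
        "indices": indices,
        # Restore only the index data, not cluster-wide settings or templates.
        "include_global_state": False,
    }
    path = f"/_snapshot/{repo}/{snapshot}/_restore?wait_for_completion=true"
    return ("POST", path, body)


# Placeholder names, for illustration only.
steps = [
    register_repo_request("osb-standard", "/mnt/snapshots/nyc_taxis"),
    restore_request("osb-standard", "nyc_taxis-standard", "nyc_taxis"),
]
```

Because a restore copies the snapshotted segment files as they are, both clusters would start from the same segments, which is the property the seed approach may not be able to guarantee.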

Additional context

No response

Labels: enhancement
