Allow users to specify column names for columnar-format vector datasets #987

@finnegancarroll

Is your feature request related to a problem? Please describe

When using OSB with a custom HDF5 vector dataset, users will find that the column names in their .hdf5 file must exactly match the "train"/"test"/etc. fields hard-coded by OSB.

https://github.com/opensearch-project/opensearch-benchmark/blob/main/osbenchmark/utils/dataset.py#L139-L160

Describe the solution you'd like

It would be convenient if these column names were configurable, perhaps through the workload or a separate flag.
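
To make the request concrete, here is a minimal sketch of how overrides could be merged over the current defaults. Everything in it is hypothetical: `resolve_column_names`, `DEFAULT_COLUMN_NAMES`, and the `dataset_column_names` workload parameter are illustrative names, not existing OSB APIs, and only the "train"/"test" fields mentioned above are modeled.

```python
from typing import Dict, Mapping, Optional

# Defaults mirror the "train"/"test" names mentioned in this issue.
# The other hard-coded fields are intentionally left out of this sketch.
DEFAULT_COLUMN_NAMES: Dict[str, str] = {"train": "train", "test": "test"}


def resolve_column_names(
    overrides: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Merge user-supplied column names over the defaults.

    `overrides` maps a logical field (e.g. "train") to the column name that
    actually exists in the user's file. Hypothetical helper, not an OSB API.
    """
    resolved = dict(DEFAULT_COLUMN_NAMES)
    for field, column in (overrides or {}).items():
        if field not in DEFAULT_COLUMN_NAMES:
            raise ValueError(f"unknown dataset field: {field!r}")
        if not column:
            raise ValueError(f"empty column name for field {field!r}")
        resolved[field] = column
    return resolved


# Example: a workload parameter (hypothetical name) overrides only "train".
workload_params = {"dataset_column_names": {"train": "embeddings"}}
columns = resolve_column_names(workload_params.get("dataset_column_names"))
print(columns)
```

Falling back to the defaults when no override is given would keep existing workloads working unchanged.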

Describe alternatives you've considered

The alternative is to recreate the .hdf5 file with a Python script. A helper could be provided for this, but it can be a hassle, as these .hdf5 datasets can be quite large and take some time to convert.
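
For reference, a rough sketch of what such a helper script might look like, assuming the `h5py` package is installed and that the columns are top-level datasets in the file. The function names and the example column names (`embeddings`, `queries`) are made up for illustration. It uses h5py's `Group.move`, which, to my understanding, relinks an object under a new name rather than copying its data, so it may avoid a full rewrite of the file. It is worth verifying that on a copy first, since the file is modified in place.

```python
from typing import Dict, Iterable, List, Tuple


def plan_renames(
    existing: Iterable[str], mapping: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Validate `mapping` (old name -> new name) against the names in the file.

    Returns the (old, new) pairs that actually need renaming.
    """
    names = set(existing)
    targets = set()
    plan = []
    for old, new in mapping.items():
        if old == new:
            continue  # already has the expected name
        if old not in names:
            raise KeyError(f"column {old!r} not found in file")
        if new in names or new in targets:
            raise ValueError(f"column {new!r} would collide with another column")
        targets.add(new)
        plan.append((old, new))
    return plan


def rename_hdf5_columns(path: str, mapping: Dict[str, str]) -> None:
    """Rename top-level datasets in an HDF5 file in place."""
    import h5py  # third-party; imported lazily so planning works without it

    with h5py.File(path, "r+") as f:
        # Validate everything before touching the file.
        for old, new in plan_renames(f.keys(), mapping):
            f.move(old, new)


# Dry run against a list of names, without opening any file.
print(plan_renames(["embeddings", "queries"], {"embeddings": "train", "queries": "test"}))
```

Calling `rename_hdf5_columns("my_vectors.hdf5", {"embeddings": "train", "queries": "test"})` would then apply the same plan to a real file (the path here is a placeholder).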

Additional context

No response

Labels: enhancement