Base enablement to allow a dataset configuration file to be passed in rather than standard args, and allow aliases for the folder structure #81

Open
@rgreenberg1

Description:
We need to add support for GuideLLM to take a dataset configuration profile as an argument (containing details such as input/output token counts) that can be used for benchmarking with GuideLLM. These dataset profiles should be pushed upstream to a folder under GuideLLM, where they can then be pulled in and used as reference datasets for GuideLLM.
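To make the idea concrete, here is a minimal sketch of what loading such a profile could look like. Everything in it is an assumption for discussion: the `DatasetProfile` class, its field names, the JSON format, and the example values are hypothetical and are not part of GuideLLM today.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetProfile:
    """Hypothetical profile schema; field names are illustrative only."""

    name: str
    use_case: str
    input_tokens: int
    output_tokens: int


def load_profile(text: str) -> DatasetProfile:
    """Parse a JSON document into a DatasetProfile and sanity-check it."""
    raw = json.loads(text)
    profile = DatasetProfile(
        name=raw["name"],
        use_case=raw["use_case"],
        input_tokens=int(raw["input_tokens"]),
        output_tokens=int(raw["output_tokens"]),
    )
    # Reject profiles that cannot describe a real request shape.
    if profile.input_tokens <= 0 or profile.output_tokens <= 0:
        raise ValueError("token counts must be positive")
    return profile


# Placeholder values, not a recommended or measured profile.
example = (
    '{"name": "rag-example", "use_case": "RAG", '
    '"input_tokens": 4096, "output_tokens": 512}'
)
profile = load_profile(example)
```

A profile file stored in the upstream folder could then be read from disk and passed through `load_profile` before a benchmark run; the actual schema and file format are still to be decided.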

User Story:
As an ML developer, I want to be able to provide standard dataset profiles targeted at different use cases (RAG, Summarization, Chat...) and use a profile as a benchmarking target, so that I can better understand what inference performance will look like for my model and use case when I move to production.

Additional Documentation:

https://docs.google.com/document/d/1Ql6RI3_LbhQxqFCIu_n2A2CewK6OVVFZFYgu4KEjjtw/edit?usp=sharing

Acceptance criteria:
Add a new argument format so that GuideLLM accepts a dataset profile as an argument.
To be valid as an argument, a dataset profile must use one of the following formats and file extensions:

  • Text - .txt
  • Audio - .wav
  • Image - .png or .jpg
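The extension check above could be sketched roughly as follows. The function name `classify_dataset_file` and the `ALLOWED_EXTENSIONS` mapping are hypothetical names for illustration, not existing GuideLLM code; matching extensions case-insensitively is also an assumption.

```python
from pathlib import Path

# Accepted extensions per modality, taken from the acceptance criteria.
ALLOWED_EXTENSIONS = {
    "text": {".txt"},
    "audio": {".wav"},
    "image": {".png", ".jpg"},
}


def classify_dataset_file(path: str) -> str:
    """Return the modality for a file, or raise if its extension is not accepted."""
    # Assumption: extensions are compared case-insensitively.
    suffix = Path(path).suffix.lower()
    for modality, extensions in ALLOWED_EXTENSIONS.items():
        if suffix in extensions:
            return modality
    raise ValueError(f"unsupported dataset file extension: {suffix!r}")
```

For example, `classify_dataset_file("prompts.txt")` returns `"text"`, while a `.jpeg` file is rejected because only `.png` and `.jpg` are listed for images.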

Labels

dataset (Dataset workstream)

Status

In progress