Description:
We need to create community-oriented dataset profiles and land them in a clear, organized folder structure within GuideLLM so that users can benchmark common community scenarios. These profiles should cover common use cases that a developer may want to load test locally or in a dev environment to measure the inference performance of their model for their desired use case. The data should be split up by use case and offer 1-5 community-oriented datasets so users can simulate the loads they expect to see in their applications. The profiles should have easily understandable names.
Solidify the Inference Scenario Profiles for inference model validation.
Goal:
Gather all of the relevant datasets that fit under the use case scenarios and run a tokenizer on them to get the underlying statistics. We will target synthetic data for actual use. Once this is scoped, we can create a follow-up task with the details of what we need to implement and carry over from the Acceptance Criteria.
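The statistics step above can be sketched roughly as follows. This is a minimal illustration, not GuideLLM code: `whitespace_tokenize` is a hypothetical stand-in for whichever model tokenizer is actually chosen, and `token_stats` and the sample prompts are made up for the example.

```python
import statistics
from typing import Callable, Iterable


def whitespace_tokenize(text: str) -> list[str]:
    # Hypothetical stand-in tokenizer (assumption): splits on whitespace.
    # A real run would swap in the tokenizer of the model under test.
    return text.split()


def token_stats(
    texts: Iterable[str],
    tokenize: Callable[[str], list[str]] = whitespace_tokenize,
) -> dict:
    """Summarize per-sample token counts for a collection of texts."""
    counts = sorted(len(tokenize(t)) for t in texts)
    if not counts:
        raise ValueError("no samples to summarize")
    return {
        "samples": len(counts),
        "min": counts[0],
        "max": counts[-1],
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "stdev": statistics.pstdev(counts),  # population standard deviation
    }


# Made-up prompts standing in for one dataset's prompt column.
prompts = [
    "Hello there",
    "Summarize this long article for me please",
    "Write a function",
]
stats = token_stats(prompts)
print(stats)
```

Running the same function separately over the prompt and completion columns of each candidate dataset would give the input and output token distributions that a synthetic profile could then be matched against.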
User Story
As a new developer getting started with GuideLLM and performance benchmarking
Additional Docs
Acceptance Criteria
Create standard dataset profiles per: https://docs.google.com/document/d/1Ql6RI3_LbhQxqFCIu_n2A2CewK6OVVFZFYgu4KEjjtw/edit?usp=sharing
Land these profiles in a 'dataset-profiles' folder under the GuideLLM upstream
folder formatting:
- README - describes the profiles below and the use cases they cover
- Chat
  - short_form_chat
  - mid_form_chat
  - long_form_chat
- Code Generation
- Summarization
- RAG
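As a rough sketch of the layout above, the folder tree could be scaffolded like this. The lowercase directory names for the use cases, the `README.md` filename, and the choice to represent each profile as a directory are all assumptions for illustration; only `dataset-profiles` and the three chat profile names come from this issue.

```python
import tempfile
from pathlib import Path

# Use case -> profile names. Only the chat profiles are named in the issue;
# the other use cases have no profiles listed yet, so they stay empty.
LAYOUT = {
    "chat": ["short_form_chat", "mid_form_chat", "long_form_chat"],
    "code_generation": [],  # directory name is an assumption
    "summarization": [],    # directory name is an assumption
    "rag": [],              # directory name is an assumption
}


def scaffold(root: Path) -> list[Path]:
    """Create the dataset-profiles tree under root and return the directories made."""
    base = root / "dataset-profiles"
    base.mkdir(parents=True, exist_ok=True)
    (base / "README.md").touch()  # README filename/extension is an assumption
    created = []
    for use_case, profiles in LAYOUT.items():
        use_case_dir = base / use_case
        use_case_dir.mkdir(exist_ok=True)
        created.append(use_case_dir)
        for profile in profiles:
            profile_dir = use_case_dir / profile
            profile_dir.mkdir(exist_ok=True)
            created.append(profile_dir)
    return created


# Try it in a throwaway directory so nothing is written to a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp))
    chat_profile_exists = (
        Path(tmp) / "dataset-profiles" / "chat" / "mid_form_chat"
    ).is_dir()
    readme_exists = (Path(tmp) / "dataset-profiles" / "README.md").is_file()
    for path in created:
        print(path.relative_to(tmp))
```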