Scope out and propose Community-oriented text dataset profiles for model validation

**Description:** 
We need to create community-oriented dataset profiles and land them in a clear, organized folder structure within GuideLLM so that users can benchmark common community scenarios. The profiles should cover common use cases that a developer may want to load test, locally or in a dev environment, to measure the inference performance of their model for their intended use case. The data should be split up by use case and offer 1-5 community-oriented datasets so users can simulate the loads they expect to see in their applications. Each profile should have an easily understandable name.

Solidify the Inference Scenario Profiles for inference model validation. 

**Goal:** 
Collect all of the relevant datasets that fit under the use case scenarios and run a tokenizer on them to get the underlying statistics. We will target synthetic data for actual use. Once this is scoped, we can create a follow-up task detailing what we need to implement and carry over from the Acceptance Criteria.
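
As a rough illustration of the "run a tokenizer to get the underlying statistics" step, the sketch below computes token-count statistics over a list of prompts. It is a minimal example under stated assumptions: `whitespace_tokenize` is a stand-in, not the tokenizer GuideLLM or any model uses, and the function names and the choice of statistics (min, max, mean, median, nearest-rank p95) are hypothetical. A real model tokenizer could be passed in through the `tokenize` parameter.

```python
import math
import statistics
from typing import Callable, Iterable


def whitespace_tokenize(text: str) -> list[str]:
    # Stand-in tokenizer (assumption): splits on whitespace only.
    # Swap in a real model tokenizer for meaningful token counts.
    return text.split()


def token_stats(
    texts: Iterable[str],
    tokenize: Callable[[str], list] = whitespace_tokenize,
) -> dict[str, float]:
    """Summarize token counts per text for a candidate dataset."""
    counts = sorted(len(tokenize(text)) for text in texts)
    if not counts:
        raise ValueError("no texts provided")
    # Nearest-rank 95th percentile over the sorted counts.
    p95_index = max(0, math.ceil(0.95 * len(counts)) - 1)
    return {
        "count": len(counts),
        "min": counts[0],
        "max": counts[-1],
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "p95": counts[p95_index],
    }
```

Running `token_stats` over each candidate dataset would give the prompt-length distribution that a synthetic profile could then be configured to mimic.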

**User Story**
As a new developer getting started with GuideLLM and performance benchmarking 

**Additional Docs**


**Acceptance Criteria**
Create standard dataset profiles per: https://docs.google.com/document/d/1Ql6RI3_LbhQxqFCIu_n2A2CewK6OVVFZFYgu4KEjjtw/edit?usp=sharing 
Land these profiles in a 'dataset-profiles' folder in the GuideLLM upstream.
Folder formatting:
- README - describes the profiles below and the use cases they cover
- Chat
  - short_form_chat
  - mid_form_chat
  - long_form_chat
- Code Generation
  - 
- Summarization
- RAG
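
The folder layout above could be scaffolded with a short script like the following. This is only a sketch: the lowercase directory names (`chat`, `code_generation`, `summarization`, `rag`), the `README.md` file name and its placeholder text, and the `scaffold_profiles` helper are all assumptions made for illustration. Only the chat sub-profiles are listed, since the other use cases have no profiles defined yet.

```python
from pathlib import Path

# Hypothetical directory names derived from the proposed layout.
# Use cases without defined profiles are left empty on purpose.
LAYOUT = {
    "chat": ["short_form_chat", "mid_form_chat", "long_form_chat"],
    "code_generation": [],
    "summarization": [],
    "rag": [],
}


def scaffold_profiles(root: Path) -> list[Path]:
    """Create the dataset-profiles tree under root and return the created dirs."""
    base = root / "dataset-profiles"
    base.mkdir(parents=True, exist_ok=True)

    # Placeholder README (assumed name and content).
    readme = base / "README.md"
    if not readme.exists():
        readme.write_text("# Dataset profiles\n", encoding="utf-8")

    created = []
    for use_case, profiles in LAYOUT.items():
        use_case_dir = base / use_case
        use_case_dir.mkdir(exist_ok=True)
        created.append(use_case_dir)
        for profile in profiles:
            profile_dir = use_case_dir / profile
            profile_dir.mkdir(exist_ok=True)
            created.append(profile_dir)
    return created
```

Calling `scaffold_profiles(Path("."))` from a checkout would create the tree in the current directory; the function is safe to re-run because existing directories and an existing README are left untouched.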


Scope out and propose Community-oriented text dataset profiles for model validation #83
