[Roadmap] llm-d-benchmark 0.6 Release Plan

1. Experiments
- [ ] (Tactical) Run-only experiments with parameter auto-discovery (resulting in full population of the performance benchmark report, #715)
- [ ] (Strategic) Ability to fully capture and replay a benchmark run that was just executed.

2. Integration
- [ ] (Strategic) Import “well-lit paths” directly from llm-d
- [ ] (Strategic) Consume "time-series" data from `llm-d-monitoring` (#568 #597)
- [ ] (Strategic) Eliminate or greatly reduce the need for cluster admin privileges. In particular, there should be no privilege requirement to push load through pre-deployed `llm-d` stacks (#351)

 
3. “Design of Experiments” (“e2e.sh”)
- [ ] (Tactical) Automatic generation of treatments (with parameter space pruning)
- [ ] (Tactical) “Fast Treatment”: ability to “update” the llm-d stack without a full standup/teardown cycle
- [ ] (Strategic) Declaratively specify the scenario directly in the experiment file (incorporating some of the observations and feature requests in #371)
- [ ] (Strategic) “Resumable” Experiments: ability to resume an experiment from a given treatment in case of failure (#716)
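
To make the treatment-generation and resume items more concrete, here is a rough Python sketch of one way they could fit together. This is not existing `llm-d-benchmark` code: the function names, the factor names (`tensor_parallelism`, `replicas`, `max_concurrency`) and the GPU-budget pruning rule are all hypothetical, chosen only for illustration. It enumerates the full factorial of the factors, prunes combinations that cannot fit on the cluster, and resumes from a given treatment index.

```python
from itertools import product


def generate_treatments(factors, is_valid=lambda treatment: True):
    """Enumerate the full factorial of `factors`, keeping only valid combinations.

    `factors` maps a (hypothetical) parameter name to its candidate values.
    `is_valid` is the pruning rule: it returns False for combinations to skip.
    """
    names = list(factors)
    treatments = []
    for values in product(*(factors[name] for name in names)):
        treatment = dict(zip(names, values))
        if is_valid(treatment):
            treatments.append(treatment)
    return treatments


def remaining_treatments(treatments, resume_from=0):
    """Return (index, treatment) pairs starting at `resume_from`.

    Treatment order is deterministic, so an index recorded before a failure
    is enough to pick up where the experiment stopped.
    """
    return list(enumerate(treatments))[resume_from:]


# Hypothetical factors and pruning rule (assumptions, not project defaults).
factors = {
    "tensor_parallelism": [1, 2, 4],
    "replicas": [1, 2, 4],
    "max_concurrency": [8, 64],
}
GPU_BUDGET = 4  # assumed number of GPUs available on the cluster


def fits_on_cluster(treatment):
    # Prune combinations that would need more GPUs than the assumed budget.
    return treatment["tensor_parallelism"] * treatment["replicas"] <= GPU_BUDGET


treatments = generate_treatments(factors, fits_on_cluster)

# Resume after a failure at treatment index 10 (hypothetical).
for index, treatment in remaining_treatments(treatments, resume_from=10):
    print(index, treatment)
```

With these assumed values, pruning cuts the 18 full-factorial combinations down to 12. A real implementation would presumably persist the last completed index alongside the experiment results rather than pass it by hand.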

4. Code
- [ ] (Tactical) Full conversion to Python (#601)
- [ ] (Strategic) `llm-d-benchmark` becomes a `pip` package.

5. Usability
- [ ] (Tactical) Run experiments directly from the target cluster: do not require a user workstation/notebook
- [ ] (Strategic) “Benchmark as a Service”: a long-running service on the cluster that receives experiment requests, executes them, and makes the results available.

6. Analysis
- [ ] (Strategic) Output pre-configured PNG graphs for each treatment
- [ ] (Tactical) Improvements to the standard notebook format and data (e.g., add `llm-d-monitoring` data to the reports)

7. Configuration Explorer
- [ ] (Tactical) Make the "Performance Explorer" publicly available, using the data already accessible on the public Google Drive.

8. CI/CD
- [ ] (Tactical) Nightly runs of ALL well-lit paths on at least one cluster
- [ ] (Strategic) Parallel jobs (only limitation should be GPU availability on the cluster)

[Roadmap] llm-d-benchmark 0.6 Release Plan #722
