- Experiments
- (Tactical) Run-only experiments with parameter auto-discovery, resulting in a fully populated performance benchmark report (Add metadata and description to workloads #715)
- (Strategic) Ability to fully capture and replay a particular benchmark run which was just executed.
- Integration
- (Strategic) Import “well-lit paths” directly from llm-d
- (Strategic) Consume "time-series" data from `llm-d-monitoring` (Extended benchmark report format #568, Collect and add time-series data in benchmark report #597)
- (Strategic) Eliminate or greatly reduce the need for cluster admin privileges. In particular, pushing load through pre-deployed `llm-d` stacks should require no special privileges (Run benchmark without cluster wide privileges #351)
- “Design of Experiments” (“e2e.sh”)
- (Tactical) Automatic generation of treatments (with parameter space pruning)
- (Tactical) “Fast Treatment”: ability to “update” llm-d stack without a full standup/teardown cycle
- (Strategic) Declaratively specify the scenario directly in the experiment file (incorporating part of the observations and feature requests from Thoughts on llm-d-benchmark usability #371)
- (Strategic) “Resumable” Experiments: ability to resume an experiment from a given treatment in case of failure (Stateful status of experiments #716)
- Code
- (Tactical) Full conversion to Python (Refactor the code to use a new fully declarative experiment specification format. #601)
- (Strategic) `llm-d-benchmark` becomes a `pip` package.
- Usability
- (Tactical) Run experiments directly from the target cluster: do not require user workstation/notebook
- (Strategic) “Benchmark as a Service”: a long-running service on the cluster that receives experiment requests, executes them, and makes the results available.
- Analysis
- (Strategic) Output pre-configured PNG graphs for each treatment
- (Tactical) Improvements to the standard notebook format and data (e.g., add `llm-d-monitoring` data to the reports)
- Configuration Explorer
- (Tactical) Make the "Performance Explorer" publicly available, using the data already accessible on the public Google Drive.
- CI/CD
- (Tactical) Nightly runs of ALL well-lit paths on at least one cluster
- (Strategic) Parallel jobs (only limitation should be GPU availability on the cluster)
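
To make the "Automatic generation of treatments (with parameter space pruning)" and "Resumable Experiments" items more concrete, here is a minimal Python sketch of one way they could work. Everything in it is an assumption for illustration: the factor names (`tensor_parallel`, `replicas`, `max_concurrency`), the `GPU_BUDGET` constraint, and the helper functions are hypothetical and do not reflect the current `llm-d-benchmark` code or its experiment file format.

```python
from itertools import product
from typing import Callable, Iterator

Treatment = dict[str, object]


def generate_treatments(
    factors: dict[str, list],
    is_valid: Callable[[Treatment], bool] = lambda t: True,
) -> Iterator[Treatment]:
    """Yield every combination of factor levels that passes the pruning predicate."""
    names = list(factors)
    for levels in product(*(factors[name] for name in names)):
        treatment = dict(zip(names, levels))
        if is_valid(treatment):
            yield treatment


def remaining_treatments(
    treatments: list[Treatment], resume_from: int = 0
) -> list[tuple[int, Treatment]]:
    """Return (index, treatment) pairs starting at the given treatment index."""
    return [(i, t) for i, t in enumerate(treatments) if i >= resume_from]


# Hypothetical factors and levels, chosen only for illustration.
factors = {
    "tensor_parallel": [1, 2, 4],
    "replicas": [1, 2, 4],
    "max_concurrency": [8, 32],
}

# Hypothetical pruning rule: skip treatments that need more GPUs than available.
GPU_BUDGET = 4


def fits_gpu_budget(t: Treatment) -> bool:
    return t["tensor_parallel"] * t["replicas"] <= GPU_BUDGET


treatments = list(generate_treatments(factors, fits_gpu_budget))

# After a failure at treatment 10, only the last treatments are re-run.
todo = remaining_treatments(treatments, resume_from=10)
```

Because the treatment order is deterministic for a given set of factors, a stored treatment index is enough to resume in this sketch. A real implementation would likely also persist per-treatment status, as discussed in #716.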