Is your feature request related to a problem? Please describe.
Say we fix the hardware setup, cache length, and chunk size. What is the best way to use the available GPU memory?
We could find this out with something like hyperparameter optimization (HPO) or plain random search.
Describe the solution you'd like
- Define an evaluation metric. It should catch out-of-memory (OOM) errors and report them. Otherwise, train on the longest batch (or on 2?) and report the running time, and maybe an estimate of the maximum GPU memory usage.
- Then, wrap this into a basic HPO system such as Optuna. Simpler: just do random search.
- Also, need to define a configuration space and sampler.
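
The three pieces above (metric, search loop, configuration space with sampler) could be sketched roughly as follows. This is only a minimal illustration, not a proposed implementation: `TrialResult`, `sample_config`, `evaluate`, `random_search`, and `best_result` are hypothetical names, and the knobs in the example space (`batch_size`, `activation_checkpointing`) are placeholders for whatever we actually decide to tune. The training step is passed in as `run_fn`, which is assumed to train on the longest batch and optionally return a peak-memory estimate in bytes. With PyTorch, one would presumably pass `oom_errors=(torch.cuda.OutOfMemoryError,)` and have `run_fn` return something like `torch.cuda.max_memory_allocated()`.

```python
import random
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple, Type

Config = Dict[str, Any]


@dataclass
class TrialResult:
    config: Config
    oom: bool                           # True if the run hit an out-of-memory error
    seconds: Optional[float] = None     # wall-clock running time, None on OOM
    peak_bytes: Optional[float] = None  # peak-memory estimate if run_fn returns one


def sample_config(space: Dict[str, Sequence[Any]], rng: random.Random) -> Config:
    """Sampler: draw each knob uniformly and independently from its choices."""
    return {name: rng.choice(list(choices)) for name, choices in space.items()}


def evaluate(
    config: Config,
    run_fn: Callable[[Config], Optional[float]],
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> TrialResult:
    """Metric: catch OOM and report it, otherwise report running time."""
    start = time.perf_counter()
    try:
        peak = run_fn(config)  # hypothetical: trains on the longest batch
    except oom_errors:
        return TrialResult(config, oom=True)
    return TrialResult(config, False, time.perf_counter() - start, peak)


def random_search(
    space: Dict[str, Sequence[Any]],
    run_fn: Callable[[Config], Optional[float]],
    num_trials: int,
    seed: int = 0,
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> List[TrialResult]:
    """Evaluate num_trials independently sampled configurations."""
    rng = random.Random(seed)
    return [
        evaluate(sample_config(space, rng), run_fn, oom_errors)
        for _ in range(num_trials)
    ]


def best_result(results: List[TrialResult]) -> Optional[TrialResult]:
    """Fastest configuration that did not run out of memory, if any."""
    ok = [r for r in results if not r.oom]
    return min(ok, key=lambda r: r.seconds) if ok else None


# Placeholder configuration space; the real knobs are still to be decided.
example_space = {
    "batch_size": [1, 2, 4, 8, 16],
    "activation_checkpointing": [True, False],
}
```

Swapping `random_search` for an Optuna study later should only require replacing the sampling loop, since `evaluate` already returns everything an objective function would need.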
This would be a great task for LLM prompt-driven development.