Hyperparameters (non-comprehensive):
- Data
- Model
- Optimization
  - Number of steps
  - Batch size
  - Learning rate schedule
  - Optimizer hyperparameters
  - Weight decay
  - Max grad norm
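The optimization hyperparameters above can be collected into a single config object, which makes sweeps easier to script. A minimal sketch; all field names and default values here are hypothetical placeholders, not recommendations:

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    # Optimization hyperparameters from the list above.
    # All defaults are hypothetical placeholders for a sweep, not tuned values.
    num_steps: int = 10_000
    batch_size: int = 256
    peak_lr: float = 3e-4
    warmup_steps: int = 500          # learning rate schedule (e.g. warmup + decay)
    adam_betas: tuple = (0.9, 0.95)  # optimizer hyperparameters
    adam_eps: float = 1e-8
    weight_decay: float = 0.1
    max_grad_norm: float = 1.0

cfg = TrainConfig(batch_size=512)  # override one knob for a sweep point
```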
Potential optimization targets:
- Held-out language modeling loss
- Downstream task performance (validation set)
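Held-out language modeling loss is the mean negative log-likelihood per token on data the model never trained on. A small sketch of the metric itself (assuming per-token log-probabilities are already available):

```python
import math

def held_out_lm_loss(token_log_probs):
    """Held-out LM loss: mean negative log-likelihood per token, in nats.
    Lower is better; exp(loss) is the per-token perplexity."""
    return -sum(token_log_probs) / len(token_log_probs)

# A model assigning probability 0.25 to every held-out token:
loss = held_out_lm_loss([math.log(0.25)] * 8)  # = ln 4 ≈ 1.386 nats/token
```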
Potential approaches:
- Guess suitable HP from literature + heuristics
- Search at large scale
- Search at small scale and extrapolate
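The "search small, extrapolate" approach typically fits a scaling curve to cheap runs and extrapolates it to the target scale. A rough sketch under a simplifying assumption: loss follows a pure power law in scale (real scaling-law fits usually add an irreducible-loss offset; all numbers below are hypothetical):

```python
import math

def fit_power_law(sizes, losses):
    """Fit loss ≈ a * size**(-b) by least squares in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    b = -slope
    a = math.exp(my + b * mx)
    return a, b

# Hypothetical small-scale runs: measured loss at three model sizes.
a, b = fit_power_law([1e6, 3e6, 1e7], [4.0, 3.2, 2.6])
predicted_loss = a * (1e8) ** (-b)  # extrapolate to a 100M-param model
```

The same fit applies whether "size" is parameters, data, or compute; the risk is that the chosen hyperparameters stop being optimal at the larger scale.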
Assumptions:
- Finite data; training may make multiple epochs (passes) over it.
- Context size is already fixed and small (256-512 bp), and all examples have exactly the same length.
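Because examples are fixed-length and data is finite, the step count, batch size, and epoch count are directly coupled. A small sketch of that arithmetic (example numbers are hypothetical):

```python
def steps_per_epoch(num_examples, batch_size):
    """With fixed-length examples, one epoch is ceil(N / B) steps."""
    return -(-num_examples // batch_size)  # ceil division

# Hypothetical: 1M fixed-length (e.g. 512 bp) examples, batch size 256.
print(steps_per_epoch(1_000_000, 256))  # → 3907
```

Choosing the number of steps therefore also fixes how many epochs the run makes, which matters once repeated data starts to hurt.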