
Hyperparameter optimization #80

@gonzalobenegas

Description


Hyperparameters (non-comprehensive):

  • Data
    • Mixture proportions
  • Model
    • Size
    • Shape
  • Optimization
    • Number of steps
    • Batch size
    • Learning rate schedule
    • Optimizer hyperparameters
    • Weight decay
    • Max grad norm
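The groups above could be captured in a single config object so that a search only has to enumerate over one structure. A minimal sketch, assuming hypothetical field names and illustrative default values (none of these are values used in this project):

```python
from dataclasses import dataclass, field


@dataclass
class DataConfig:
    # Mixture proportions over data sources (illustrative keys).
    mixture_proportions: dict = field(
        default_factory=lambda: {"source_a": 0.5, "source_b": 0.5}
    )


@dataclass
class ModelConfig:
    hidden_size: int = 512   # size
    num_layers: int = 8      # shape: depth
    num_heads: int = 8       # shape: width of attention


@dataclass
class OptimConfig:
    num_steps: int = 100_000
    batch_size: int = 256
    lr_schedule: str = "cosine"
    peak_lr: float = 1e-3
    adam_beta1: float = 0.9   # optimizer hyperparameters
    adam_beta2: float = 0.999
    weight_decay: float = 0.01
    max_grad_norm: float = 1.0


@dataclass
class TrainConfig:
    data: DataConfig = field(default_factory=DataConfig)
    model: ModelConfig = field(default_factory=ModelConfig)
    optim: OptimConfig = field(default_factory=OptimConfig)
```

A flat structure like this also makes it easy to log the full configuration alongside each run's held-out loss.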

Potential optimization targets:

  • Held-out language modeling loss
  • Downstream task performance (validation set)

Potential approaches:

  • Guess suitable HP from literature + heuristics
  • Search at large scale
  • Search at small scale and extrapolate
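For the small-scale-and-extrapolate approach, one common simplification is to fit a power law loss ≈ a · N^(−b) to the held-out losses of a few small models and extrapolate to the target scale. A sketch (ignoring the irreducible-loss offset term, which a real fit would include):

```python
import numpy as np


def fit_power_law(sizes, losses):
    """Fit loss ≈ a * size^(-b) by linear regression in log-log space.

    Simplification: omits the irreducible-loss constant, so the fit is
    only trustworthy over the range where loss is still falling fast.
    """
    slope, log_a = np.polyfit(np.log(sizes), np.log(losses), 1)
    return np.exp(log_a), -slope  # (a, b)


def extrapolate(a, b, size):
    """Predicted held-out loss at a larger scale."""
    return a * size ** (-b)
```

The same recipe applies to any scalar knob (data size, steps), not just parameter count, as long as the small-scale runs are otherwise well tuned.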

Assumptions:

  • Finite data, epoching.
  • Context size has already been fixed and is small (256-512 bp); all examples have exactly the same length.
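Under the finite-data assumption, number of steps and batch size jointly determine how many epochs the training run makes over the corpus, which is worth tracking since repeated epochs change how the other hyperparameters (e.g. weight decay) behave. A small helper, with illustrative numbers:

```python
def num_epochs(num_examples: int, batch_size: int, num_steps: int) -> float:
    """With finite data and fixed-length examples, training for num_steps
    optimizer steps at batch_size examples per step revisits the full
    dataset this many times."""
    return num_steps * batch_size / num_examples


# e.g. 1M fixed-length examples, batch 256, 100k steps -> 25.6 epochs
```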
