Nano MDM

Decoding Process Visualization
  • trainer with DDP

  • generation (perplexity / diversity tradeoff with reference model?)

  • Sweep learning rate

  • Sweep width and check that muP behaves as expected
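
The learning-rate and width sweeps listed above could be enumerated with a small grid helper. This is only a minimal sketch: `log_grid`, the learning-rate range, and the width values are hypothetical placeholders, not settings taken from this repository.

```python
import itertools
import math


def log_grid(low, high, num):
    """Return `num` values evenly spaced in log10 between low and high (inclusive)."""
    if num == 1:
        return [low]
    lo, hi = math.log10(low), math.log10(high)
    step = (hi - lo) / (num - 1)
    return [10 ** (lo + i * step) for i in range(num)]


# Hypothetical sweep ranges (assumptions for illustration only).
learning_rates = log_grid(1e-4, 1e-2, 5)
widths = [128, 256, 512, 1024]

# One config per (width, learning rate) pair; each would be handed to a training run.
sweep = [
    {"width": w, "lr": lr}
    for w, lr in itertools.product(widths, learning_rates)
]

for cfg in sweep:
    print(f"width={cfg['width']:5d}  lr={cfg['lr']:.2e}")
```

Plotting final loss against learning rate, one curve per width, is a common way to see whether the best learning rate stays put as width grows.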