Planning scale up #64

@yair-schiff

tl;dr: Train a large (7-8B) E2D2 model.
Goal: Beat LLaDA on downstream tasks, be fully open source, and have very fast inference.

Design decisions

  • E2D2, block parameterization with block size $S=32$
  • From scratch or fine-tune? TODO
  • Dataset: TODO
  • Hyperparams: TODO
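To make the block parameterization concrete, here is a minimal sketch of how positions map to blocks of size $S=32$. The mask rule (a token sees its own block plus all earlier blocks) is an assumption for illustration and is not confirmed as the E2D2 attention pattern by this issue. `block_ids` and `block_causal_mask` are hypothetical helper names.

```python
def block_ids(seq_len: int, block_size: int = 32) -> list[int]:
    """Return the block index of every token position."""
    return [pos // block_size for pos in range(seq_len)]


def block_causal_mask(seq_len: int, block_size: int = 32) -> list[list[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    Assumption (illustrative only): a token attends bidirectionally within
    its own block and causally to every earlier block.
    """
    ids = block_ids(seq_len, block_size)
    return [[ids[k] <= ids[q] for k in range(seq_len)] for q in range(seq_len)]


if __name__ == "__main__":
    mask = block_causal_mask(seq_len=64, block_size=32)
    # Position 0 sees the rest of block 0 but nothing in block 1.
    print(mask[0][31], mask[0][32])
```

A dense Python list is only for readability; a real implementation would express the same predicate in whatever mask format the attention kernel expects.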

Open questions

  • Where can we run this? Options include:
    • Use JAX and run on Marin (cc @eric-czech)
    • Empire AI, if it comes through (unlikely)
    • Lambda Labs
  • What is the total budget?
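For the budget question, a rough back-of-the-envelope sketch may help. It uses the common approximation of about 6 FLOPs per parameter per training token for a dense transformer, which ignores attention FLOPs. The token count, per-GPU peak throughput, and utilization below are placeholders, not decisions or measurements from this issue, and `training_gpu_hours` is a hypothetical helper.

```python
def training_gpu_hours(
    n_params: float,
    n_tokens: float,
    peak_flops_per_gpu: float,
    mfu: float,
) -> float:
    """Estimate GPU-hours for one training run.

    Uses total_flops ~= 6 * n_params * n_tokens (dense transformer
    approximation; attention FLOPs are ignored).
    """
    total_flops = 6.0 * n_params * n_tokens
    effective_flops_per_sec = peak_flops_per_gpu * mfu
    return total_flops / effective_flops_per_sec / 3600.0


if __name__ == "__main__":
    hours = training_gpu_hours(
        n_params=8e9,             # upper end of the 7-8B range
        n_tokens=1e12,            # placeholder: dataset size is still TODO
        peak_flops_per_gpu=1e15,  # placeholder round number, not a real spec
        mfu=0.4,                  # placeholder utilization
    )
    print(f"{hours:,.0f} GPU-hours")
```

Multiplying the result by a provider's price per GPU-hour gives a first cost estimate to compare across the options above.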

Major TODOs:

  • Profile thoroughly to confirm that the current implementation utilizes the hardware well (e.g., verify that flex attention and compile work as expected and provide a speedup) ((exp): Benchmark MFU as compute / nodes scale #7)
  • Determine dataset mix
  • Use remaining Lambda credits to do hyperparam sweep
  • Download and process data to permanent file storage location
  • Integrate an eval harness ((feat): Add evaluation harness #5)
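For the profiling item, a minimal sketch of how MFU could be tabulated as the GPU count grows. It assumes the same 6 FLOPs per parameter per token approximation for training, and every throughput and peak figure below is a made-up placeholder rather than a measurement. `model_flops_utilization` is a hypothetical helper, not part of any existing benchmark script.

```python
def model_flops_utilization(
    n_params: float,
    tokens_per_sec: float,
    n_gpus: int,
    peak_flops_per_gpu: float,
) -> float:
    """Fraction of aggregate peak FLOPs actually achieved during training.

    Achieved FLOPs/s is approximated as 6 * n_params * tokens_per_sec,
    which ignores attention FLOPs.
    """
    achieved = 6.0 * n_params * tokens_per_sec
    return achieved / (n_gpus * peak_flops_per_gpu)


if __name__ == "__main__":
    # Placeholder throughput per GPU count; replace with measured values.
    measured_tokens_per_sec = {8: 40_000.0, 16: 76_000.0, 32: 140_000.0}
    for n_gpus, tps in measured_tokens_per_sec.items():
        mfu = model_flops_utilization(
            n_params=8e9,
            tokens_per_sec=tps,
            n_gpus=n_gpus,
            peak_flops_per_gpu=1e15,  # placeholder, not a real hardware spec
        )
        print(f"{n_gpus:>3} GPUs: MFU = {mfu:.1%}")
```

A falling MFU as nodes are added would point to communication or input-pipeline overhead, which is what the scaling benchmark is meant to surface.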

Other TODOs
