tl;dr: Train a large (7-8B parameter) E2D2 model.
Goal: beat LLaDA on downstream tasks, be fully open-source, and deliver super fast inference!
Design decisions
- E2D2, block parameterization with block size $S=32$
- From scratch or fine-tune? TODO
- Dataset: TODO
- Hyperparams: TODO
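To make "block parameterization with block size $S=32$" concrete, here is a minimal sketch of the generic block-causal attention mask used by block-diffusion models. Each position attends bidirectionally within its own block of $S$ tokens and causally to all earlier blocks. How E2D2 splits this pattern between its encoder and decoder is not shown and is an assumption here, not taken from our code. The function name `block_causal_mask` is hypothetical.

```python
def block_causal_mask(seq_len: int, block_size: int = 32) -> list[list[bool]]:
    """Return mask where mask[q][k] is True if query position q may attend to key k.

    Hypothetical helper. Positions are grouped into blocks of `block_size`;
    a query sees every key in its own block (bidirectional) and every key in
    earlier blocks, but nothing in later blocks.
    """
    return [
        [k // block_size <= q // block_size for k in range(seq_len)]
        for q in range(seq_len)
    ]


# Tiny example with 2 blocks of 2 tokens each.
for row in block_causal_mask(4, block_size=2):
    print(row)
```

With $S=32$, a sequence of length $L$ splits into $L / 32$ blocks, and the mask is all-True within each $32 \times 32$ diagonal tile.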
Open questions
- Where can we run this? Options include:
  - Use JAX and run on Marin (cc @eric-czech)
  - Empire AI comes through (unlikely)
  - Lambda Labs
- What is the total budget?
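One way to start answering the budget question is a back-of-the-envelope compute estimate. The sketch below uses the common dense-transformer rule of thumb of training FLOPs $\approx 6ND$ ($N$ parameters, $D$ training tokens). It treats the model as a dense decoder; E2D2's encoder/decoder split and diffusion objective will change the constant, and fine-tuning vs. training from scratch changes $D$ drastically. The function name and all example numbers (token count, per-GPU peak throughput, MFU) are hypothetical placeholders, not measured values.

```python
def training_gpu_hours(n_params: float, n_tokens: float,
                       peak_flops_per_gpu: float, mfu: float) -> float:
    """Rough GPU-hours for training, via the approximation FLOPs ~= 6 * N * D.

    peak_flops_per_gpu: peak FLOP/s of one accelerator (assumed value).
    mfu: model FLOPs utilization, the fraction of peak actually achieved (assumed).
    """
    total_flops = 6.0 * n_params * n_tokens
    effective_flops_per_sec = peak_flops_per_gpu * mfu
    return total_flops / effective_flops_per_sec / 3600.0


# Hypothetical: 8B params, 1T tokens, 1 PFLOP/s peak per GPU, 50% MFU.
hours = training_gpu_hours(8e9, 1e12, 1e15, 0.5)
print(f"{hours:,.0f} GPU-hours")
```

Multiplying the result by the per-GPU-hour price of whichever option above we use gives a first dollar figure to compare against the total budget.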
Major TODOs:
Other TODOs:
- `modeling_qwen3.py` from `transformers`)
- JAX-compatible version of backbone and denoiser files ((feat): Create backbone modeling file #66)
- `generate` method? Currently it has a lot of assumptions and perhaps too much custom logic (Clean up our custom generate method (rm assumptions and custom logic) #67)