Planning scale up #64

**tl;dr:** Train a large (7-8B parameter) E2D2 model.
**Goal:** Beat LLaDA on downstream tasks, be **fully** open-source, and deliver very fast inference.

## Design decisions
- E2D2, block parameterization with block size $S=32$
- From scratch or fine-tune? TODO
- Dataset: TODO
- Hyperparams: TODO
    - Can also rely on this as a resource: [Scaling laws for dLLMs](https://jinjieni.github.io/Quokka/resources/pdfs/Training_Optimal_Large_Diffusion_Language_Models.pdf)
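
To make the block size choice concrete, here is a minimal sketch of a generic block-causal attention mask, assuming the usual block-diffusion pattern (full attention inside a block, causal attention across blocks). The function name `block_causal_mask` is hypothetical, and the masks E2D2 actually uses for its encoder and decoder may differ.

```python
def block_causal_mask(seq_len: int, block_size: int = 32) -> list[list[bool]]:
    """Return mask[q][k], True when query position q may attend to key position k.

    Assumption: tokens attend bidirectionally within their own block and
    causally to all earlier blocks. This is a generic pattern, not
    necessarily the exact E2D2 masking scheme.
    """
    if seq_len <= 0 or block_size <= 0:
        raise ValueError("seq_len and block_size must be positive")
    # Block index of every position, e.g. positions 0..31 -> block 0 for S=32.
    block_of = [i // block_size for i in range(seq_len)]
    return [
        [block_of[k] <= block_of[q] for k in range(seq_len)]
        for q in range(seq_len)
    ]


if __name__ == "__main__":
    # Tiny example with S=2 so the pattern is easy to read.
    for row in block_causal_mask(seq_len=6, block_size=2):
        print("".join("x" if allowed else "." for allowed in row))
```

The predicate `k // S <= q // S` has the same shape as a `mask_mod` function for PyTorch flex attention, which takes `(b, h, q_idx, kv_idx)` and returns whether the pair is allowed.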

## Open questions
- Where can we run this? Options include:
    - Use JAX and run on Marin (cc @eric-czech)
    - Empire AI, if it comes through (unlikely)
    - Lambda Labs
- What is the total budget?
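
As a starting point for the budget question, here is a rough back-of-the-envelope sketch. It assumes the common dense-transformer approximation of about 6 FLOPs per parameter per training token, which may not match the true cost of an encoder-decoder diffusion model. Every number in the example (token count, peak FLOP/s, utilization, price) is a placeholder, not a decision.

```python
def estimate_gpu_hours(
    n_params: float,
    n_tokens: float,
    peak_flops_per_gpu: float,
    mfu: float,
) -> float:
    """Estimate GPU-hours for one training run.

    Assumption: total training FLOPs ~= 6 * n_params * n_tokens.
    `mfu` is the fraction of peak FLOP/s actually achieved (0 < mfu <= 1).
    """
    if not 0.0 < mfu <= 1.0:
        raise ValueError("mfu must be in (0, 1]")
    total_flops = 6.0 * n_params * n_tokens
    effective_flops_per_second = peak_flops_per_gpu * mfu
    return total_flops / effective_flops_per_second / 3600.0


def estimate_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Convert GPU-hours to dollars at a flat hourly rate."""
    return gpu_hours * usd_per_gpu_hour


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    hours = estimate_gpu_hours(
        n_params=8e9,              # 8B parameters
        n_tokens=1e12,             # placeholder token budget
        peak_flops_per_gpu=1e15,   # placeholder peak FLOP/s per GPU
        mfu=0.4,                   # placeholder utilization
    )
    print(f"GPU-hours: {hours:,.0f}")
    print(f"Cost at $2/GPU-hour: ${estimate_cost_usd(hours, 2.0):,.0f}")
```

Plugging in measured utilization from profiling and the token budget suggested by the scaling-law resource would turn this into a usable estimate.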

## Major TODOs
- [ ] Profile thoroughly to convince ourselves the current implementation utilizes hardware appropriately (e.g., verify that flex attention and compile work as expected and provide a speedup) (#7)
- [ ] Determine dataset mix
- [ ] Use remaining Lambda credits to do hyperparam sweep
- [ ] Download and process data to permanent file storage location
- [ ] Integrate an eval harness (#5)
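
For the profiling item, one simple sanity check is model FLOPs utilization (MFU) computed from measured throughput. The sketch below assumes roughly 6 FLOPs per parameter per token (ignoring attention FLOPs), and both helper names are hypothetical. It complements a real profiler rather than replacing one.

```python
import time
from typing import Callable


def measure_tokens_per_second(
    step_fn: Callable[[], None],
    tokens_per_step: int,
    warmup_steps: int = 3,
    timed_steps: int = 10,
) -> float:
    """Time `step_fn` and return training throughput in tokens/second.

    On accelerators, `step_fn` must block until the device has finished
    the step (e.g. by synchronizing), otherwise the timing is meaningless.
    """
    # Warmup steps absorb one-off costs such as compilation and caching.
    for _ in range(warmup_steps):
        step_fn()
    start = time.perf_counter()
    for _ in range(timed_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return tokens_per_step * timed_steps / elapsed


def model_flops_utilization(
    n_params: float,
    tokens_per_second: float,
    n_gpus: int,
    peak_flops_per_gpu: float,
) -> float:
    """Fraction of peak FLOP/s achieved, assuming ~6 * n_params FLOPs per token."""
    achieved_flops_per_second = 6.0 * n_params * tokens_per_second
    return achieved_flops_per_second / (n_gpus * peak_flops_per_gpu)
```

Comparing MFU with and without flex attention and compile enabled gives a quick read on whether each one provides the expected boost.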

## Other TODOs
- [ ] Create our own `backbone_modeling.py` file (as opposed to relying on `modeling_qwen3.py` from `transformers`)
    - If using JAX, we should support a `jax`-compatible version of the backbone and denoiser files (#66)
- [ ] Clean up our custom `generate` method? It currently bakes in many assumptions and perhaps too much custom logic (#67)
- [ ] Sort out resumption: currently seeing a loss spike (#4)
- [ ] There is a slight difference between the loss / PPL logged to wandb and what we get when (#65)
- [ ] Confirm that push to hub works well and integrates with eval harness(es)
- [ ] Upgrade packages and pin software versions (#58)
- [ ] General code cleanup: Perhaps we don't need to be so "general" / support so many baseline denoisers and backbones
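
On the loss / PPL discrepancy, one common source of small mismatches is the aggregation method: averaging per-batch mean losses versus averaging over all tokens. Whether this explains #65 is unknown. The sketch below only illustrates how the two can differ, and both function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Each batch is (summed negative log-likelihood, number of scored tokens).
Batch = Tuple[float, int]


def token_weighted_ppl(batches: Sequence[Batch]) -> float:
    """Perplexity from the mean NLL over all tokens across all batches."""
    total_nll = sum(nll for nll, _ in batches)
    total_tokens = sum(n for _, n in batches)
    return math.exp(total_nll / total_tokens)


def mean_of_batch_means_ppl(batches: Sequence[Batch]) -> float:
    """Perplexity from the unweighted average of per-batch mean NLLs."""
    batch_means = [nll / n for nll, n in batches]
    return math.exp(sum(batch_means) / len(batch_means))


if __name__ == "__main__":
    # Batches with different token counts: the two aggregations disagree.
    uneven = [(10.0, 10), (40.0, 20)]
    print(token_weighted_ppl(uneven), mean_of_batch_means_ppl(uneven))
```

The two agree whenever every batch scores the same number of tokens, so a mismatch tends to show up with variable-length batches or a partial final batch.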
