AGENTS.md: Wan 2.2 VAE auto-optim (the contract)

Short, always-loaded contract. The full procedure is program.md (the runbook), read it before you start. Target spec + thresholds + stop conditions live in goals.json (frozen JSON). Current best + dead-end log live in PROGRESS.md.

Your job: minimize Wan 2.2 VAE decode latency on a single H100 without failing the quality gate, by running the keep-or-revert loop in program.md. You do not stop on your own; only a goals.json stop condition ends the loop.

Commands

export CUDA_VISIBLE_DEVICES=<free gpu>        # never benchmark on a busy GPU
python harness/make_reference.py              # once: build the frozen reference
python harness/run_experiment.py --desc "..." # after EVERY edit to optimize.py
python harness/verify.py   # quality gate only (exit 0 = pass)
python harness/bench.py    # latency only
python harness/run_profile.py --exp <n>   # detailed warmup+profiler (auto-run on every KEEP)

Every KEEP auto-runs profile.py and appends the breakdown (top ops + GPU-util/idle) to PROGRESS.md. The next hypothesis comes from that profile, not a guess, if GPU util is low the decode is host-bound (recover idle time); if high, attack the top op.

How success is defined

run_experiment.py runs verify.py (quality vs the frozen fp32 reference) then bench.py (latency). A result is KEPT only if it PASSES the gate AND is faster than the current best; everything else is auto-reverted. The frozen evaluator decides, you do not grade your own work. Project success = a kept result at/under the goals.json target, confirmed by a fresh-context grader.

Boundaries

✅ Always: edit only optimize.py; run the driver after every edit; profile before guessing; append a reflexion entry to PROGRESS.md (what you tried / why it failed); prefer the simplest change that moves the metric; paste the driver's real output as evidence.

⚠️ Ask the human first: anything that changes the measured task (output shape, the input, the tolerance); a new dependency; peak memory over the goals.json cap.

🚫 NEVER (YOU MUST NOT):

Edit or delete goals.json, anything in harness/, or anything in model/. Editing the evaluator to pass is reward hacking, forbidden.
Special-case the benchmark input, cache outputs across runs, mutate the input in place, or wrap a wrong fast path in try/except with a slow fallback.
Mark/claim a result good when quality_ok is false.
Claim "faster"/"done" without pasting the driver output.
Stop the loop to ask whether to continue. "Hard" or "slow" is not a stop condition. Stuck for 8 experiments → summarize in PROGRESS.md and escalate; never silently quit.

The decode must produce the same video. "Faster" only counts if the gate stays green, enforced, not hoped for. See program.md §7 for the full anti-reward-hacking rules.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

AGENTS.md: Wan 2.2 VAE auto-optim (the contract)

Commands

How success is defined

Boundaries

Uh oh!

FilesExpand file tree

AGENTS.md

Latest commit

History

AGENTS.md

File metadata and controls

AGENTS.md: Wan 2.2 VAE auto-optim (the contract)

Commands

How success is defined

Boundaries