Skip to content

Latest commit

 

History

History
51 lines (42 loc) · 2.85 KB

File metadata and controls

51 lines (42 loc) · 2.85 KB

AGENTS.md: Wan 2.2 VAE auto-optim (the contract)

Short, always-loaded contract. The full procedure is program.md (the runbook), read it before you start. Target spec + thresholds + stop conditions live in goals.json (frozen JSON). Current best + dead-end log live in PROGRESS.md.

Your job: minimize Wan 2.2 VAE decode latency on a single H100 without failing the quality gate, by running the keep-or-revert loop in program.md. You do not stop on your own; only a goals.json stop condition ends the loop.

Commands

export CUDA_VISIBLE_DEVICES=<free gpu>        # never benchmark on a busy GPU
python harness/make_reference.py              # once: build the frozen reference
python harness/run_experiment.py --desc "..." # after EVERY edit to optimize.py
python harness/verify.py   # quality gate only (exit 0 = pass)
python harness/bench.py    # latency only
python harness/run_profile.py --exp <n>   # detailed warmup+profiler (auto-run on every KEEP)

Every KEEP auto-runs profile.py and appends the breakdown (top ops + GPU-util/idle) to PROGRESS.md. The next hypothesis comes from that profile, not a guess, if GPU util is low the decode is host-bound (recover idle time); if high, attack the top op.

How success is defined

run_experiment.py runs verify.py (quality vs the frozen fp32 reference) then bench.py (latency). A result is KEPT only if it PASSES the gate AND is faster than the current best; everything else is auto-reverted. The frozen evaluator decides, you do not grade your own work. Project success = a kept result at/under the goals.json target, confirmed by a fresh-context grader.

Boundaries

Always: edit only optimize.py; run the driver after every edit; profile before guessing; append a reflexion entry to PROGRESS.md (what you tried / why it failed); prefer the simplest change that moves the metric; paste the driver's real output as evidence.

⚠️ Ask the human first: anything that changes the measured task (output shape, the input, the tolerance); a new dependency; peak memory over the goals.json cap.

🚫 NEVER (YOU MUST NOT):

  • Edit or delete goals.json, anything in harness/, or anything in model/. Editing the evaluator to pass is reward hacking, forbidden.
  • Special-case the benchmark input, cache outputs across runs, mutate the input in place, or wrap a wrong fast path in try/except with a slow fallback.
  • Mark/claim a result good when quality_ok is false.
  • Claim "faster"/"done" without pasting the driver output.
  • Stop the loop to ask whether to continue. "Hard" or "slow" is not a stop condition. Stuck for 8 experiments → summarize in PROGRESS.md and escalate; never silently quit.

The decode must produce the same video. "Faster" only counts if the gate stays green, enforced, not hoped for. See program.md §7 for the full anti-reward-hacking rules.