docs: document test fixture system and add tests/AGENTS.md
Add comprehensive documentation for the export_models fixture system:
- Module-level docstring covering caching strategy, invalidation rules,
fixture usage patterns, and CLI usage
- Inline docstrings for key functions
- New tests/AGENTS.md with agent-facing guidance on test infrastructure,
pre-export workflow, expected test durations, and compiler cache policy
- Reference tests/AGENTS.md from root AGENTS.md context loading section
- Fix stale path reference to export_models.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
AGENTS.md (2 additions, 1 deletion)
@@ -14,6 +14,7 @@ directories, but must be read manually when working from the project root:
 - `optimum/neuron/models/inference/backend/modules/attention/AGENTS.md` — attention or NKI kernel work
 - `optimum/neuron/models/inference/<model>/AGENTS.md` — model-specific work (gemma3, llama, qwen3, etc.)
 - `optimum/neuron/vllm/AGENTS.md` — vLLM integration work
+- `tests/AGENTS.md` — test infrastructure, fixtures, and cache management
 
 When adding a new model, create a `CLAUDE.md` containing `@AGENTS.md` in its directory
 so this auto-loading applies to it automatically.
@@ -115,7 +116,7 @@ All test workflows follow the same pattern:
 - Static shapes: runtime input shapes must match compiled shapes.
 - Export and load in separate processes to avoid device conflicts.
 - Neuron runtime does not release devices reliably within the same process.
-- Decoder graph changes require cache prune when using the fixtures defined under `tests/fixtures/export_models.py`: `python tools/prune_test_models.py`.
+- Decoder graph changes require cache prune when using the fixtures defined under `tests/fixtures/llm/export_models.py`: `python tools/prune_test_models.py`.
tests/AGENTS.md (new file)

The code hash changes when `pyproject.toml` or anything under `optimum/neuron/models/inference/` changes.
Old repos must be pruned manually: `python tools/prune_test_models.py`.
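The invalidation rule above (a hash over `pyproject.toml` plus everything under `optimum/neuron/models/inference/`) can be sketched roughly as follows. This is an illustrative stand-in, not the fixture's actual code; the function name and the exact file walk are assumptions:

```python
import hashlib
from pathlib import Path

def code_hash(root: Path) -> str:
    """Illustrative content hash over pyproject.toml and the inference
    source tree, visited in a deterministic (sorted) order."""
    h = hashlib.sha256()
    files = [root / "pyproject.toml"]
    files += sorted((root / "optimum/neuron/models/inference").rglob("*"))
    for f in files:
        if f.is_file():
            h.update(str(f.relative_to(root)).encode())  # path contributes to the hash
            h.update(f.read_bytes())                     # so does the file content
    return h.hexdigest()
```

Any edit under the hashed tree yields a new digest, which is why exported model repos built against the old digest must be pruned after decoder graph changes.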
## Always Pre-Export Models Before Running Tests

**CI always runs `python tests/fixtures/llm/export_models.py` as a separate step before any pytest invocation** (see `.github/workflows/test_inf2_llm.yml` and `test_inf2_vllm.yml`). You must do the same locally:

```bash
# Export all models (or use a pattern like 'llama*')
python tests/fixtures/llm/export_models.py
```

If you skip the pre-export step, fixtures will auto-export on first use. This causes:
- **Long hangs**: compilation takes 10-30+ minutes per model, making it hard to tell whether a test is stuck or just compiling.
- **NeuronCore conflicts**: the compilation process may conflict with subprocess-isolated tests that also need device access.
### Expected Test Durations

Based on CI logs (inf2.8xlarge, models pre-exported), entire test groups complete within:

| Test group | Duration |
|---|---|
| LLM utils / hub / CLI / embedding | < 1 min each |
| LLM export tests | ~4 min |
| LLM generation tests | ~7 min |
| LLM pipeline tests | ~5 min |
| LLM module tests (NKI kernels) | ~19 min |
| LLM cache tests | ~5 min |
| vLLM engine generation | ~20 min |
| vLLM service tests | ~15 min |
An individual test should complete within **2 minutes**. If a test hangs longer than that, the most likely cause is a missing pre-export triggering compilation inside the fixture. Pre-export, then re-run.
## Never Wipe the Neuron Compiler Cache

The Neuron compiler cache is **content-addressed**: each compiled NEFF is keyed by the SHA hash of the HLO graph that produced it. The hash space is large enough to make collisions practically impossible.

**There is no such thing as a "stale compiler cache entry."** If the HLO graph changes (because you changed model code), the hash changes, and a new entry is created. The old entry is simply never matched again — it does no harm.

Wiping the compiler cache (e.g. `rm -rf /var/tmp/neuron-compile-cache`) only forces expensive recompilation with zero benefit. **Never suggest or perform cache deletion as a debugging step.**
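The content-addressing described above can be illustrated with a toy lookup. The real cache layout under `/var/tmp/neuron-compile-cache` differs, and `compile_fn` here is a hypothetical stand-in for the Neuron compiler:

```python
import hashlib
from pathlib import Path

def get_or_compile(hlo_text: str, cache_dir: Path, compile_fn) -> bytes:
    """Toy content-addressed cache: the key is the SHA-256 of the HLO graph.
    A changed graph produces a new key; old entries are never matched again,
    so there is nothing 'stale' to clean up."""
    key = hashlib.sha256(hlo_text.encode()).hexdigest()
    entry = cache_dir / f"{key}.neff"
    if entry.exists():
        return entry.read_bytes()   # cache hit: reuse the compiled artifact
    neff = compile_fn(hlo_text)     # cache miss: compile and store
    entry.write_bytes(neff)
    return neff
```

Deleting entries can only turn future hits into misses; it can never fix a wrong result, because a wrong (changed) graph would already map to a different key.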
## Subprocess Test Ordering

`tests/decoder/conftest.py` contains a `pytest_collection_modifyitems` hook that moves `@subprocess_test`-decorated tests to run **before** all other tests. This prevents session-scoped fixtures (which load models onto NeuronCores) from blocking subprocess tests that need device access in a child process.
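A hook of this shape is enough to front-load those tests. This is a simplified sketch, not the exact code in `tests/decoder/conftest.py`, and it assumes `@subprocess_test` applies a pytest marker named `subprocess_test`:

```python
# conftest.py — simplified sketch of the reordering hook.
def pytest_collection_modifyitems(config, items):
    # Stable partition: marked subprocess tests first, everything else after,
    # preserving the relative order within each group (list.sort is stable).
    items.sort(key=lambda item: 0 if item.get_closest_marker("subprocess_test") else 1)
```

Because the sort is stable, the hook only changes which group runs first; it does not shuffle tests within a group.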