fix: smoke test quality improvements and dead parameter cleanup
- Replace dead `enable_replace_pii=False` with `replace_pii=None` in
all `from_params()` calls (the old key was silently ignored by Pydantic)
- Remove dead `enable_synthesis=True` (no such field exists in src/)
- Add `vllm` pytest marker for GPU memory isolation; refactor Makefile
`test-smoke-gpu` into marker-based groups (train-only / vllm / smollm2 / unsloth)
- Tighten `except GenerationError` to assert on expected message
- Fix SmolLM2 metadata: remove hardcoded out-of-range BOS token IDs
- Promote `base_smoke_config` and `_patch_attn_eager` to session scope
- Make `tiny_llama_config` depend on `stub_tokenizer` (eliminate redundant load)
- Add CPU smoke tests for evaluation (MultimodalReport) and PII replacement (NemoPII)
- Update smoke test docs (README.md, TESTING.md) to reflect new markers and scopes
Made-with: Cursor
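
The Pydantic hazard behind the first bullet, as a minimal standalone sketch: by default, Pydantic v2 ignores unknown keys passed through a params layer, so a dead key keeps "working" silently instead of raising. `Params` below is a stand-in, not the repo's actual `SafeSynthesizerParameters`.

```python
# Minimal sketch of why `enable_replace_pii=False` went unnoticed.
# `Params` is a stand-in model, not the repo's actual class.
from pydantic import BaseModel, ConfigDict

class Params(BaseModel):
    model_config = ConfigDict(extra="ignore")  # Pydantic v2's permissive default
    replace_pii: str | None = None             # the real field

p = Params(**{"enable_replace_pii": False})    # unknown key: dropped, no error
assert p.replace_pii is None                   # the caller's intent never arrived
```

Setting `extra="forbid"` instead would have turned the dead parameter into an immediate `ValidationError`.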
tests/TESTING.md

@@ -133,10 +136,20 @@ Mock Workdir via `mock_workdir(tmp_path)` in `cli/conftest.py`.

 ## GPU Isolation Gotcha

-Unsloth patches transformers at import time, which poisons Opacus/DP if they share a process. CUDA device-side asserts also cascade across xdist workers. Both e2e and smoke GPU tests require process isolation:
+Two GPU hazards require process isolation (single-process pytest invocations, `-n 0`):

-- `make test-smoke-gpu` runs three separate single-process (`-n 0`) pytest invocations over `tests/smoke/`, split by `-k` filters: (1) non-unsloth/non-smollm2, (2) smollm2, (3) unsloth.
-- `make test-e2e` splits into `test-e2e-default` + `test-e2e-dp`, each single-process over `tests/e2e/`.
+1. vLLM pre-allocates all GPU memory and never releases it within a process. Tests that call `.generate()` must run in separate processes or later tests OOM.
+2. Unsloth patches transformers at import time, poisoning Opacus/DP if they share a process.
+
+GPU smoke tests use markers to express isolation requirements:
+
+- `requires_gpu`: all GPU tests
+- `vllm`: tests using vLLM generation (each file gets its own process)
+- `smollm2`, `unsloth`: marker-isolated groups (auto-discovered)
+
+`make test-smoke-gpu` uses marker algebra for train-only tests (auto-discovered via `requires_gpu and not vllm and not smollm2 and not unsloth`), explicit file paths for vLLM tests (per-file isolation), and marker selection for SmolLM2/Unsloth. When adding a new vLLM test file, add `pytest.mark.vllm` and also add the file to the Makefile's explicit list.
+
+`make test-e2e` splits into `test-e2e-default` + `test-e2e-dp`, each single-process over `tests/e2e/`.

 See [`tests/smoke/README.md`](smoke/README.md) for additional smoke-specific gotchas.
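
For reference, a sketch of how the markers named above could be registered so the `-m` expressions run without `PytestUnknownMarkWarning`. The `pytest_configure` hook and `addinivalue_line` are standard pytest; the description strings are assumptions, and the repo may register its markers in `pyproject.toml` instead.

```python
# conftest.py sketch: register the isolation markers (descriptions assumed).
def pytest_configure(config):
    for line in (
        "requires_gpu: test needs a CUDA device",
        "vllm: test calls .generate() via vLLM; needs per-file process isolation",
        "smollm2: SmolLM2 Hub-download tests; own process",
        "unsloth: Unsloth backend tests; own process",
    ):
        config.addinivalue_line("markers", line)
```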
tests/smoke/README.md

-If you're adding a new training backend, generation backend, or model family,
-add a smoke test for it. Same if you're changing how the SDK orchestrates
-train/generate -- those paths are easy to break silently.
+If you're adding a new training backend, generation backend, evaluation
+component, or model family, add a smoke test for it. Same if you're changing
+how the SDK orchestrates train/generate/evaluate -- those paths are easy to
+break silently.

 Smoke tests don't check output quality. They just make sure the code runs
 end-to-end without throwing. Use the smallest model that exercises the path
@@ -21,31 +22,24 @@ a real tokenizer/model).

 ## GPU Test Process Isolation

-GPU smoke tests run in three separate single-process (`-n 0`) pytest invocations to avoid CUDA and import-time conflicts:
+GPU smoke tests use three marker-based isolation groups:

-1. Local tiny-model tests (everything except SmolLM2 and Unsloth)
-2. SmolLM2 Hub download test (downloads ~270MB from HuggingFace)
-3. Unsloth backend test (process-isolated from DP tests)
+1. Train-only (`requires_gpu` without `vllm`/`smollm2`/`unsloth`): these share a single process, auto-discovered via marker algebra.
+2. vLLM generation (`vllm` marker): each file gets its own process because vLLM pre-allocates all GPU memory and never releases it.
+3. SmolLM2 / Unsloth (`smollm2`, `unsloth` markers): each gets its own process, auto-discovered via markers.

-Why: Unsloth monkey-patches transformers at import time, poisoning Opacus/DP if they share a process. CUDA device-side asserts also cascade across xdist workers. The Makefile `test-smoke-gpu` target handles the split automatically via `-k` filters.
-
-Tests use pytestmark decorators:
+When adding a new GPU smoke test, add the appropriate markers to `pytestmark`:

 ```python
 pytestmark = [
     pytest.mark.requires_gpu,
+    pytest.mark.vllm,  # if the test calls .generate() (uses vLLM)
     pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA not available"),
     pytest.mark.skipif(sys.platform == "darwin", reason="Not applicable on macOS"),
 ]
 ```

-For SmolLM2 and Unsloth tests, add the marker to a test function:
-
-```python
-@pytest.mark.usefixtures("_register_smollm2")  # for SmolLM2 tests
-def test_full_pipeline_smollm2(...):
-    ...
-```
+If the new file uses vLLM, also add it to the explicit file list in the `test-smoke-gpu` Makefile target (vLLM files need per-file isolation).
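
The marker algebra above, expressed as an equivalent pytest invocation. This is an illustrative sketch via `pytest.main` (the real Makefile target shells out to pytest); only the `-m` expression is quoted from this diff, and `-n 0` assumes pytest-xdist is installed.

```python
# In-process equivalent of the Makefile's train-only selection (illustrative).
import pytest

pytest.main([
    "tests/smoke/",
    "-n", "0",  # single process: no xdist workers
    "-m", "requires_gpu and not vllm and not smollm2 and not unsloth",
])
```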
-The shared fixtures cover both CPU and GPU smoke tests. The most important ones:
+The shared fixtures cover both CPU and GPU smoke tests. Session-scoped fixtures are created once per pytest process; function-scoped fixtures are recreated per test.

-- `base_smoke_config` -- default `SafeSynthesizerParameters` pointing at the local tiny model
-- `train_with_sdk(config, data_df, save_path)` -- convenience wrapper around the SDK train flow
-- `assert_adapter_saved(workdir)` -- checks that adapter files landed on disk
+Session-scoped (immutable / read-only):
+
+- `base_smoke_config` -- default `SafeSynthesizerParameters` pointing at the local tiny model (Pydantic frozen model)
 - `_patch_attn_eager` -- the attention implementation workaround mentioned above
-- `tiny_model`, `stub_tokenizer`, `tiny_training_dataset` -- CPU test building blocks
-- `local_tinyllama_dir` -- saves the tiny model to a temp dir so GPU tests don't need internet
+- `stub_tokenizer`, `tiny_llama_config`, `local_tinyllama_dir` -- tokenizer and tiny model on disk
 - `iris_df`, `timeseries_df` -- small DataFrames for training input
+
+Function-scoped (fresh per test):
+
+- `tiny_model` -- randomly initialized `LlamaForCausalLM` (mutated by training)
+
+Helpers (plain functions, not fixtures):
+
+- `train_with_sdk(config, data_df, save_path)` -- convenience wrapper around the SDK train flow
+- `assert_adapter_saved(workdir)` -- checks that adapter files landed on disk

 See [CONTRIBUTING.md](../../CONTRIBUTING.md#testing) for the full list of test commands.
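
A sketch of what the scope split looks like in fixture code. Only the fixture names and scopes come from this diff; the config sizes and bodies are illustrative assumptions (the repo's `tiny_llama_config` also depends on `stub_tokenizer`, omitted here).

```python
import pytest
from transformers import LlamaConfig, LlamaForCausalLM

@pytest.fixture(scope="session")
def tiny_llama_config():
    # Built once per pytest process; read-only, so sharing is safe.
    # Sizes are illustrative, not the repo's actual values.
    return LlamaConfig(hidden_size=64, intermediate_size=128,
                       num_hidden_layers=2, num_attention_heads=4,
                       vocab_size=1024)

@pytest.fixture  # function scope (the default): rebuilt for every test
def tiny_model(tiny_llama_config):
    # Training mutates the weights, so each test gets a fresh random init.
    return LlamaForCausalLM(tiny_llama_config)
```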