 - `conftest.py`: shared fixtures per directory; root conftest has `load_test_dataset()` and `load_test_dataframe()` helpers
 - Use `tmp_path` fixture for file operations, never write to the repo tree
-- Mark CUDA-dependent tests with `@pytest.mark.e2e` or `@pytest.mark.gpu_integration`
+- Mark CUDA-dependent tests with `@pytest.mark.e2e`, `@pytest.mark.smoke`, or `@pytest.mark.requires_gpu`
 - Mock only external boundaries, not internal implementation details
 - Test isolation: no shared mutable state or execution-order dependencies between tests. If something must be run first before executing a test, include it in the test or a fixture.
 - Use `@pytest.mark.parametrize` for testing multiple input combinations rather than copy-pasting similar tests
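A minimal sketch of how the `tmp_path` and `parametrize` guidelines above might combine in a single test. The `write_rows` helper and the test name are hypothetical and not taken from the repository.

```python
import pytest


def write_rows(path, rows):
    """Hypothetical helper under test: write one row per line and return the path."""
    path.write_text("\n".join(rows), encoding="utf-8")
    return path


# One parametrized test stands in for three near-identical copies.
@pytest.mark.parametrize(
    ("rows", "expected_lines"),
    [([], 0), (["a"], 1), (["a", "b"], 2)],
)
def test_write_rows(tmp_path, rows, expected_lines):
    # tmp_path is a per-test temporary directory, so nothing lands in the repo tree.
    out = write_rows(tmp_path / "rows.txt", rows)
    assert len(out.read_text(encoding="utf-8").splitlines()) == expected_lines
```

Because each case gets its own `tmp_path`, the cases share no files and can run in any order.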
tests/smoke/README.md
Lines changed: 28 additions & 1 deletion
@@ -19,6 +19,34 @@ end-to-end without throwing. Use the smallest model that exercises the path
 (the local `tiny_llama` stub for most things, SmolLM2-135M when you need
 a real tokenizer/model).
 
+## GPU Test Process Isolation
+
+GPU smoke tests run in three separate single-process (`-n 0`) pytest invocations to avoid CUDA and import-time conflicts:
+
+1. Local tiny-model tests (everything except SmolLM2 and Unsloth)
+2. SmolLM2 Hub download test (downloads ~270MB from HuggingFace)
+3. Unsloth backend test (process-isolated from DP tests)
+
+Why: Unsloth monkey-patches transformers at import time, poisoning Opacus/DP if they share a process. CUDA device-side asserts also cascade across xdist workers. The Makefile `test-smoke-gpu` target handles the split automatically via `-k` filters.
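As a rough illustration of the three-way split described above, the sketch below builds one single-process pytest command per group and runs each in its own process. The `-k` expressions, the group names, and the `tests/smoke` default are assumptions for illustration; the real filters live in the Makefile `test-smoke-gpu` target and may differ.

```python
import shlex
import subprocess

# Hypothetical -k expressions: the actual filters are defined in the
# Makefile's test-smoke-gpu target and may not match these.
SMOKE_GROUPS = [
    ("local tiny-model tests", "not smollm2 and not unsloth"),
    ("SmolLM2 Hub download test", "smollm2"),
    ("Unsloth backend test", "unsloth"),
]


def build_commands(test_dir="tests/smoke"):
    """Return one single-process (-n 0) pytest command per group."""
    return [
        ["pytest", test_dir, "-n", "0", "-k", expr]
        for _, expr in SMOKE_GROUPS
    ]


def run_all(test_dir="tests/smoke"):
    """Run each group in its own process; return the names of groups that failed."""
    failed = []
    for (name, _), cmd in zip(SMOKE_GROUPS, build_commands(test_dir)):
        print("running:", shlex.join(cmd))
        # A fresh process per group keeps Unsloth's import-time patches
        # out of the process that runs the DP tests.
        if subprocess.run(cmd, check=False).returncode != 0:
            failed.append(name)
    return failed
```

Running the groups sequentially rather than in parallel also keeps a CUDA failure in one group from affecting the others.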
+
+Tests use pytestmark decorators:
+
+```python
+pytestmark = [
+    pytest.mark.requires_gpu,
+    pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA not available"),
+    pytest.mark.skipif(sys.platform == "darwin", reason="Not applicable on macOS"),
+]
+```
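The snippet above relies on `sys`, `pytest`, and `torch` already being imported in the test module. A self-contained version might look like the sketch below. The `cuda_available` helper is an illustrative addition, not part of the README; it assumes you want the module to stay importable on machines without PyTorch.

```python
import sys

import pytest


def cuda_available():
    """Return True only if PyTorch imports and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


# A module-level `pytestmark` list applies every marker in it to all tests in the file.
pytestmark = [
    pytest.mark.requires_gpu,
    pytest.mark.skipif(not cuda_available(), reason="CUDA not available"),
    pytest.mark.skipif(sys.platform == "darwin", reason="Not applicable on macOS"),
]
```

Note that `pytestmark` is a plain module-level variable rather than a decorator, so it needs no `@` and covers every test in the module.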
+
+For SmolLM2 and Unsloth tests, add the marker to a test function:
+
+```python
+@pytest.mark.usefixtures("_register_smollm2")  # for SmolLM2 tests
+def test_full_pipeline_smollm2(...):
+    ...
+```
+
 ## Things that will bite you
 
 - LoRA rank must be 8 (not 4). vLLM silently rejects rank 4. Use `lora_r=8`.
@@ -27,7 +55,6 @@ a real tokenizer/model).
 - Stub tokenizer vocab is 32000. If you change the tiny model config, keep `vocab_size=32000` or you'll get shape mismatches.
 - Always set `use_unsloth=False` unless you're specifically testing Unsloth. The `auto` default can pull it in and it monkey-patches transformers globally.
 - CPU tests need `optim="adamw_torch"`. The production default (`paged_adamw_32bit`) requires bitsandbytes CUDA kernels.
-- Unsloth tests run in a separate process. Unsloth patches transformers at import time, which breaks Opacus/DP if they share a process. The Makefile handles this automatically.
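The pitfalls listed above can be turned into a quick pre-flight check. This is only a sketch: `check_smoke_config` and `STUB_VOCAB_SIZE` are hypothetical names, and it assumes the smoke-test settings are available as a flat dict keyed by the parameter names the list quotes (`lora_r`, `vocab_size`, `use_unsloth`, `optim`).

```python
# Hypothetical: assumes a flat dict config using the parameter names
# quoted in the README list; the project's real config object may differ.
STUB_VOCAB_SIZE = 32000


def check_smoke_config(cfg, *, on_cpu, testing_unsloth=False):
    """Return a list of problems; an empty list means no known pitfall was hit."""
    problems = []
    # Follows the README wording literally: anything other than 8 is flagged.
    if cfg.get("lora_r") != 8:
        problems.append("lora_r should be 8")
    if cfg.get("vocab_size", STUB_VOCAB_SIZE) != STUB_VOCAB_SIZE:
        problems.append("vocab_size should stay 32000 to match the stub tokenizer")
    # A missing key counts as a problem because the `auto` default can pull Unsloth in.
    if not testing_unsloth and cfg.get("use_unsloth") is not False:
        problems.append("set use_unsloth=False explicitly")
    if on_cpu and cfg.get("optim") != "adamw_torch":
        problems.append('CPU tests need optim="adamw_torch"')
    return problems
```

For example, `check_smoke_config({"lora_r": 4}, on_cpu=True)` reports three problems: the rank, the unset `use_unsloth`, and the missing CPU optimizer.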