|
1 | | -# Fast Smoke Tests for Training and Generation Hot Paths |
| 1 | +# Smoke Tests |
2 | 2 |
|
3 | | -Smoke tests exercising training and generation hot paths with a tiny model. |
4 | | -Run CPU tests without GPU (`make test-smoke`), GPU tests via `make test-gpu-integration`. |
| 3 | +Quick tests that verify training and generation code paths don't crash. |
| 4 | +They use tiny or small models and run in seconds (CPU) or a few minutes (GPU). |
5 | 5 |
|
6 | | -## Shared Infrastructure (`conftest.py`) |
7 | | - |
8 | | -Key fixtures and helpers available to all smoke tests: |
9 | | - |
10 | | -- **`base_smoke_config`** -- `SafeSynthesizerParameters` with local tiny model defaults |
11 | | -- **`train_with_sdk(config, data_df, save_path)`** -- runs `process_data().train()`, returns the `SafeSynthesizer` instance |
12 | | -- **`assert_adapter_saved(workdir)`** -- asserts `adapter_config.json` + `*.safetensors` exist |
13 | | -- **`_patch_attn_eager`** -- monkeypatches `attn_implementation` to `"eager"` for tiny model compatibility |
14 | | -- **`tiny_model`** / **`stub_tokenizer`** / **`tiny_training_dataset`** -- CPU test primitives |
15 | | -- **`local_tinyllama_dir`** -- local model directory for GPU tests (no internet needed) |
16 | | -- **`iris_df`** / **`timeseries_df`** -- small stub datasets |
17 | | - |
18 | | -## Design Origin |
19 | | - |
20 | | -This test suite was organized into **self-contained work units (WUs)** that were delegated independently. WU1 and WU2 (infrastructure and fixtures) were done first. After that, WU3-WU11 were done in parallel, then consolidated in WU13. |
21 | | - |
22 | | -## Dependency Graph and Parallel Execution Strategy |
23 | | - |
24 | | -There are only two sequential dependencies: WU1 -> WU2 (foundation). After that, **all remaining WUs are fully independent** -- no test file reads output from another test file. Each GPU test does its own training internally. |
25 | | - |
26 | | -```mermaid |
27 | | -flowchart TD |
28 | | - subgraph phase1 ["Phase 1 -- Foundation (sequential, do first)"] |
29 | | - WU1["WU1: Infrastructure"] --> WU2["WU2: Shared Fixtures"] |
30 | | - end |
31 | | -
|
32 | | - subgraph phase2 ["Phase 2 -- All parallelizable (no inter-dependencies)"] |
33 | | - direction TB |
34 | | - batchA["Batch A: WU0 README"] |
35 | | - batchB["Batch B: WU3 CPU Training + WU4 CPU Generation"] |
36 | | - batchC["Batch C: WU5 GPU Training + WU10 GPU Adapter Persistence"] |
37 | | - batchD["Batch D: WU6 GPU Generation + WU8 GPU Structured Gen"] |
38 | | - batchE["Batch E: WU7 GPU Timeseries"] |
39 | | - batchF["Batch F: WU9 GPU Resume"] |
40 | | - batchG["Batch G: WU11 GPU Full Pipeline + WU12 Unsloth"] |
41 | | - end |
42 | | -
|
43 | | - subgraph phase3 ["Phase 3 -- Consolidation (fresh agent, after all Phase 2)"] |
44 | | - WU13["WU13: DRY pass -- deduplicate, extract helpers, consolidate"] |
45 | | - end |
46 | | -
|
47 | | - WU2 --> batchA |
48 | | - WU2 --> batchB |
49 | | - WU2 --> batchC |
50 | | - WU2 --> batchD |
51 | | - WU2 --> batchE |
52 | | - WU2 --> batchF |
53 | | - WU2 --> batchG |
54 | | -
|
55 | | - batchA --> WU13 |
56 | | - batchB --> WU13 |
57 | | - batchC --> WU13 |
58 | | - batchD --> WU13 |
59 | | - batchE --> WU13 |
60 | | - batchF --> WU13 |
61 | | - batchG --> WU13 |
| 6 | +```bash |
| 7 | +make test-smoke # CPU only, no GPU needed |
| 8 | +make test-gpu-integration # GPU tests (requires CUDA) |
62 | 9 | ``` |
63 | 10 |
|
| 11 | +## When should I add a smoke test? |
64 | 12 |
|
| 13 | +If you're adding a new training backend, generation backend, or model family, |
| 14 | +add a smoke test for it. Same if you're changing how the SDK orchestrates |
| 15 | +train/generate -- those paths are easy to break silently. |
65 | 16 |
|
66 | | -### Recommended Delegation Batches |
67 | | - |
68 | | -WU3-WU11 are grouped by **skill similarity** so each assignee has minimal context-switching: |
69 | | - |
70 | | - |
71 | | -| Batch | WUs | Why grouped | Skills needed | Size | |
72 | | -| ----------- | --------------- | ----------------------------------------------------------------- | ------------------------------------------------- | ------------------------- | |
73 | | -| **Phase 1** | WU0 + WU1 + WU2 | Sequential foundation; one person does all setup | pytest fixtures, basic infra | ~30 min | |
74 | | -| **B** | WU3 + WU4 | Both CPU-only, similar Trainer/generate patterns | HF Trainer, peft, Opacus, NSS assembler/processor | Medium (2 files, 7 tests) | |
75 | | -| **C** | WU5 + WU10 | Both train via SDK then inspect adapter output | SafeSynthesizer SDK, PEFT adapter loading | Medium (2 files, 5 tests) | |
76 | | -| **D** | WU6 + WU8 | Both exercise vLLM generation paths | VllmBackend, vLLM, structured outputs | Medium (2 files, 3 tests) | |
77 | | -| **E** | WU7 | Specialized timeseries knowledge | TimeseriesBackend, timeseries config | Small (1 file, 1 test) | |
78 | | -| **F** | WU9 | Specialized resume/Workdir knowledge | SafeSynthesizer resume, load_from_save_path | Small (1 file, 1 test) | |
79 | | -| **G** | WU11 + WU12 | Both need internet + HF Hub; WU12 needs process isolation from DP | SmolLM2, Unsloth, HF Hub, Makefile update | Medium (2 files, 3 tests) | |
80 | | - |
81 | | - |
82 | | -### Priority order (if fewer hands available) |
| 17 | +Smoke tests don't check output quality. They just make sure the code runs |
| 18 | +end-to-end without throwing. Use the smallest model that exercises the path |
| 19 | +(the local `tiny_llama` stub for most things, SmolLM2-135M when you need |
| 20 | +a real tokenizer/model). |
83 | 21 |
|
84 | | -### Phase 3: Consolidation (sequential, after all Phase 2 batches complete) |
| 22 | +## Things that will bite you |
85 | 23 |
|
| 24 | +- **LoRA rank must be 8** (not 4). vLLM silently rejects rank 4. Use `lora_r=8`. |
| 25 | +- **Iris only has 151 rows**, but holdout needs >=200. Set `holdout=0, max_holdout=0` to skip it. |
| 26 | +- **Attention implementation**: HuggingFaceBackend defaults to `flashinfer`, which HF doesn't recognize. The `_patch_attn_eager` fixture overrides it to `"sdpa"`. |
| 27 | +- **Stub tokenizer vocab is 32000**. If you change the tiny model config, keep `vocab_size=32000` or you'll get shape mismatches. |
| 28 | +- **Always set `use_unsloth=False`** unless you're specifically testing Unsloth. The `auto` default can pull it in and it monkey-patches transformers globally. |
| 29 | +- **CPU tests need `optim="adamw_torch"`**. The production default (`paged_adamw_32bit`) requires bitsandbytes CUDA kernels. |
| 30 | +- **Unsloth tests run in a separate process**. Unsloth patches transformers at import time, which breaks Opacus/DP if they share a process. The Makefile handles this automatically. |
86 | 31 |
|
87 | | -| Batch | WU | Purpose | Owner | |
88 | | -| ----- | ---- | ------------------------------------------------------------------------------- | -------------------------------------------------- | |
89 | | -| **H** | WU13 | Holistic DRY pass: deduplicate configs, extract helpers, consolidate decorators | **Fresh agent** (must NOT have worked on WU3-WU12) | |
| 32 | +## What's in `conftest.py`? |
90 | 33 |
|
| 34 | +The shared fixtures cover both CPU and GPU smoke tests. The most important ones: |
91 | 35 |
|
92 | | -### Priority order (if fewer hands available) |
| 36 | +- `base_smoke_config` -- default `SafeSynthesizerParameters` pointing at the local tiny model |
| 37 | +- `train_with_sdk(config, data_df, save_path)` -- convenience wrapper around the SDK train flow |
| 38 | +- `assert_adapter_saved(workdir)` -- checks that adapter files landed on disk |
| 39 | +- `_patch_attn_eager` -- the attention implementation workaround mentioned above |
| 40 | +- `tiny_model`, `stub_tokenizer`, `tiny_training_dataset` -- CPU test building blocks |
| 41 | +- `local_tinyllama_dir` -- saves the tiny model to a temp dir so GPU tests don't need internet |
| 42 | +- `iris_df`, `timeseries_df` -- small DataFrames for training input |
93 | 43 |
|
94 | | -If you cannot parallelize all batches, do them in this order (highest value first): |
95 | | - |
96 | | -1. **Phase 1** (A) -- must go first |
97 | | -2. **C** (GPU SDK training + adapter) -- highest signal for catching regressions |
98 | | -3. **D** (GPU generation + structured) -- tests the production generation path |
99 | | -4. **B** (CPU training + generation) -- catches dep breakage without GPU |
100 | | -5. **E** (timeseries) -- specialized but important path |
101 | | -6. **F** (resume) -- important production flow |
102 | | -7. **G** (SmolLM2 + Unsloth) -- lowest priority, needs internet; WU12 also needs Makefile update for process isolation |
103 | | -8. **H** (consolidation) -- must be last; requires fresh eyes |
104 | | - |
105 | | ---- |
106 | | - |
107 | | -## Critical Gotchas (every WU must know these) |
108 | | - |
109 | | -These were discovered by automated council review and affect ALL work units: |
110 | | - |
111 | | -1. **Copyright headers**: Every new `.py` file MUST start with: |
112 | | - ```python |
113 | | - # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
114 | | - # SPDX-License-Identifier: Apache-2.0 |
115 | | - ``` |
116 | | - Enforced by `tools/lint/copyright_fixer.py --check .` in CI. |
117 | | -2. `**lora_r=8` not 4**: vLLM only allows LoRA ranks in {1, 8, 16, 32, 64, ...}. Rank 4 is silently rejected. Use 8 everywhere in smoke tests. |
118 | | -3. `**holdout=0, max_holdout=0**`: The iris dataset has 151 rows, but `Holdout.train_test_split()` in `src/nemo_safe_synthesizer/holdout/holdout.py` requires >=200 rows. Setting holdout=0 bypasses this. |
119 | | -4. `**attn_implementation="eager"**`: The `HuggingFaceBackend` defaults to `flash_attention_2` which can fail with head_dim=32 (our tiny model: hidden_size=64 / 2 heads). Override to `"eager"` in smoke tests. |
120 | | -5. `**vocab_size=32000**`: The stub tokenizer at `tests/stub_tokenizer/` has 32000 tokens. The tiny model config must match this exactly. |
121 | | -6. `**use_unsloth=False**`: Always set explicitly. The `auto` default may resolve to `True` and pull in Unsloth, which invasively patches transformers. |
122 | | -7. `**optim="adamw_torch"**` for CPU tests: The production default `paged_adamw_32bit` requires bitsandbytes CUDA kernels. |
123 | | - |
124 | | ---- |
125 | | - |
126 | | -## Summary: File Inventory |
127 | | - |
128 | | - |
129 | | -| File | WU | Tests | Marker | |
130 | | -| ------------------------------------------------- | ---- | ------------------ | --------------------------- | |
131 | | -| `tests/smoke/README.md` | WU0 | -- (documentation) | -- | |
132 | | -| `tests/smoke/__init__.py` | WU1 | -- | -- | |
133 | | -| `tests/smoke/conftest.py` | WU2 | -- (fixtures only) | -- | |
134 | | -| `tests/smoke/test_training_cpu.py` | WU3 | 4 | `smoke` (auto) | |
135 | | -| `tests/smoke/test_generation_cpu.py` | WU4 | 3 | `smoke` (auto) | |
136 | | -| `tests/smoke/test_nss_training_gpu.py` | WU5 | 2 | `smoke` + `gpu_integration` | |
137 | | -| `tests/smoke/test_nss_generation_gpu.py` | WU6 | 2 | `smoke` + `gpu_integration` | |
138 | | -| `tests/smoke/test_nss_timeseries_gpu.py` | WU7 | 1 | `smoke` + `gpu_integration` | |
139 | | -| `tests/smoke/test_nss_structured_gen_gpu.py` | WU8 | 1 | `smoke` + `gpu_integration` | |
140 | | -| `tests/smoke/test_nss_resume_gpu.py` | WU9 | 1 | `smoke` + `gpu_integration` | |
141 | | -| `tests/smoke/test_nss_adapter_persistence_gpu.py` | WU10 | 3 | `smoke` + `gpu_integration` | |
142 | | -| `tests/smoke/test_full_pipeline_gpu.py` | WU11 | 2 | `smoke` + `gpu_integration` | |
143 | | -| `tests/smoke/test_nss_unsloth_gpu.py` | WU12 | 1 | `smoke` + `gpu_integration` | |
144 | | - |
145 | | - |
146 | | -**Modified files**: `tests/conftest.py`, `pytest.ini`, `Makefile` |
147 | | - |
148 | | -**Total**: 21 tests across 10 test files, plus 2 infra files (conftest, init) and 1 README. |
149 | | - |
150 | | -## Running |
151 | | - |
152 | | -```bash |
153 | | -# CPU smoke tests only (~10 seconds, no GPU required) |
154 | | -make test-smoke |
155 | | - |
156 | | -# GPU smoke + e2e tests (requires CUDA) |
157 | | -make test-gpu-integration |
158 | | -``` |
| 44 | +See [CONTRIBUTING.md](../../CONTRIBUTING.md#testing) for the full list of test commands. |
0 commit comments