Commit 7c5ce08

feature: smoke tests
Signed-off-by: aagonzales <aagonzales@nvidia.com>
1 parent e679d4b commit 7c5ce08

18 files changed

Lines changed: 950 additions & 2 deletions

Makefile

Lines changed: 7 additions & 2 deletions
@@ -168,9 +168,14 @@ test-ci-slow: ## Run slow tests in CI with coverage
 	pushd $(NSS_ROOT_PATH) && \
 	$(PYTEST_CMD) $(PYTEST_CI_OPTS) $(NSS_ROOT_PATH)/tests -m "slow"
 
+.PHONY: test-smoke
+test-smoke: ## Run CPU smoke tests (~few min, no GPU required)
+	$(PYTEST_CMD) -m "smoke and not gpu_integration"
+
 .PHONY: test-gpu-integration
-test-gpu-integration: ## Run GPU integration tests
-	pushd $(NSS_ROOT_PATH) && \
+test-gpu-integration: ## Run GPU integration tests (smoke GPU + e2e)
+	$(PYTEST_CMD) tests/smoke/ -m "gpu_integration" -k "not unsloth" && \
+	$(PYTEST_CMD) tests/smoke/ -m "gpu_integration" -k "unsloth" && \
 	$(PYTEST_CMD) $(NSS_ROOT_PATH)/tests/e2e/ -m "gpu_integration and not e2e" -k default && \
 	$(PYTEST_CMD) $(NSS_ROOT_PATH)/tests/e2e/ -m "gpu_integration and not e2e" -k dp

pyproject.toml

Lines changed: 2 additions & 0 deletions
@@ -36,6 +36,8 @@ dependencies = [
     "structlog>=25.4.0",
     "colorama>=0.4.6",
     "tqdm>=4.67.1",
+    "setuptools>=80.0.0",
+
 ]
 
 [dependency-groups]

pytest.ini

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ markers =
     unit: Unit tests - test single classes/functions with no infrastructure dependencies
     unit_test: Legacy marker for unit tests (deprecated, use 'unit' instead)
     noautouse: Marker to skip autouse fixtures for specific tests
+    smoke: Smoke tests - slow unit tests exercising training/generation hot paths with tiny model
 
 # Note: Unit tests (testing single classes/functions with no infrastructure dependencies)
 # do not need markers and are the default test type.

tests/conftest.py

Lines changed: 6 additions & 0 deletions
@@ -23,6 +23,7 @@ def pytest_collection_modifyitems(config, items):
         "e2e",
         "integration",
         "gpu_integration",
+        "smoke",
     }
 
     for item in items:
@@ -45,6 +46,11 @@ def pytest_collection_modifyitems(config, items):
                 item.add_marker(pytest.mark.integration)
                 marker_names.add("integration")
 
+        if "/smoke/" in path_str:
+            if "smoke" not in marker_names:
+                item.add_marker(pytest.mark.smoke)
+                marker_names.add("smoke")
+
         if not marker_names.intersection(category_markers):
             item.add_marker(pytest.mark.unit)
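
The path-based marking in this hunk, together with the `-m "smoke and not gpu_integration"` expression from the Makefile, can be sketched without pytest. This is a minimal illustration, not the project's code: `resolve_markers`, `selected_by_test_smoke`, and the path `tests/test_holdout.py` are hypothetical, and `CATEGORY_MARKERS` lists only the entries visible in the hunk (the real set may contain more).

```python
from typing import Iterable

# Assumption: only the category markers visible in the hunk are listed here;
# the real set in tests/conftest.py may contain additional entries.
CATEGORY_MARKERS = frozenset({"e2e", "integration", "gpu_integration", "smoke"})


def resolve_markers(path_str: str, explicit: Iterable[str] = ()) -> set[str]:
    """Return the marker names a test would carry after collection (hypothetical helper)."""
    names = set(explicit)
    # Anything under a smoke/ directory is marked "smoke" automatically.
    if "/smoke/" in path_str:
        names.add("smoke")
    # Tests with no category marker fall back to "unit".
    if not names & CATEGORY_MARKERS:
        names.add("unit")
    return names


def selected_by_test_smoke(names: set[str]) -> bool:
    """Mirror of the Makefile expression: -m "smoke and not gpu_integration"."""
    return "smoke" in names and "gpu_integration" not in names


cpu = resolve_markers("tests/smoke/test_training_cpu.py")
gpu = resolve_markers("tests/smoke/test_nss_training_gpu.py", ["gpu_integration"])
plain = resolve_markers("tests/test_holdout.py")  # hypothetical non-smoke test file

print(cpu, selected_by_test_smoke(cpu))
print(gpu, selected_by_test_smoke(gpu))
print(plain, selected_by_test_smoke(plain))
```

Under these assumptions, only the CPU smoke file is picked up by `make test-smoke`; the GPU smoke file carries both markers and is left to `make test-gpu-integration`.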
5056

tests/smoke/README.md

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
# Fast Smoke Tests for Training and Generation Hot Paths

Smoke tests exercising training and generation hot paths with a tiny model.
Run the CPU tests without a GPU via `make test-smoke`; run the GPU tests via `make test-gpu-integration`.

## Shared Infrastructure (`conftest.py`)

Key fixtures and helpers available to all smoke tests:

- **`base_smoke_config`** -- `SafeSynthesizerParameters` with local tiny model defaults
- **`train_with_sdk(config, data_df, save_path)`** -- runs `process_data().train()`, returns the `SafeSynthesizer` instance
- **`assert_adapter_saved(workdir)`** -- asserts `adapter_config.json` + `*.safetensors` exist
- **`_patch_attn_eager`** -- monkeypatches `attn_implementation` to `"eager"` for tiny model compatibility
- **`tiny_model`** / **`stub_tokenizer`** / **`tiny_training_dataset`** -- CPU test primitives
- **`local_tinyllama_dir`** -- local model directory for GPU tests (no internet needed)
- **`iris_df`** / **`timeseries_df`** -- small stub datasets

## Design Origin
19+
20+
This test suite was organized into **self-contained work units (WUs)** that were delegated independently. WU1 and WU2 (infrastructure and fixtures) were done first. After that, WU3-WU11 were done in parallel, then consolidated in WU13.
21+
22+
## Dependency Graph and Parallel Execution Strategy
23+
24+
There are only two sequential dependencies: WU1 -> WU2 (foundation). After that, **all remaining WUs are fully independent** -- no test file reads output from another test file. Each GPU test does its own training internally.
25+
26+
```mermaid
27+
flowchart TD
28+
subgraph phase1 ["Phase 1 -- Foundation (sequential, do first)"]
29+
WU1["WU1: Infrastructure"] --> WU2["WU2: Shared Fixtures"]
30+
end
31+
32+
subgraph phase2 ["Phase 2 -- All parallelizable (no inter-dependencies)"]
33+
direction TB
34+
batchA["Batch A: WU0 README"]
35+
batchB["Batch B: WU3 CPU Training + WU4 CPU Generation"]
36+
batchC["Batch C: WU5 GPU Training + WU10 GPU Adapter Persistence"]
37+
batchD["Batch D: WU6 GPU Generation + WU8 GPU Structured Gen"]
38+
batchE["Batch E: WU7 GPU Timeseries"]
39+
batchF["Batch F: WU9 GPU Resume"]
40+
batchG["Batch G: WU11 GPU Full Pipeline + WU12 Unsloth"]
41+
end
42+
43+
subgraph phase3 ["Phase 3 -- Consolidation (fresh agent, after all Phase 2)"]
44+
WU13["WU13: DRY pass -- deduplicate, extract helpers, consolidate"]
45+
end
46+
47+
WU2 --> batchA
48+
WU2 --> batchB
49+
WU2 --> batchC
50+
WU2 --> batchD
51+
WU2 --> batchE
52+
WU2 --> batchF
53+
WU2 --> batchG
54+
55+
batchA --> WU13
56+
batchB --> WU13
57+
batchC --> WU13
58+
batchD --> WU13
59+
batchE --> WU13
60+
batchF --> WU13
61+
batchG --> WU13
62+
```
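
The execution order implied by the flowchart above can be derived mechanically. The following is a small sketch using the standard-library `graphlib` module; the node names (`WU1`, `batchA`, ...) are copied from the diagram, while the `waves` variable and the idea of computing "waves" of parallel work are illustrative additions, not part of the test suite.

```python
from graphlib import TopologicalSorter

# Map each node to the set of nodes it depends on, as drawn in the flowchart.
phase2_batches = [f"batch{letter}" for letter in "ABCDEFG"]
deps: dict[str, set[str]] = {"WU2": {"WU1"}}
for batch in phase2_batches:
    deps[batch] = {"WU2"}          # every Phase 2 batch needs the shared fixtures
deps["WU13"] = set(phase2_batches)  # consolidation waits for all batches

sorter = TopologicalSorter(deps)
sorter.prepare()

# Each "wave" is a group of nodes that can run in parallel.
waves: list[list[str]] = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    waves.append(ready)
    sorter.done(*ready)

for index, wave in enumerate(waves, start=1):
    print(f"wave {index}: {', '.join(wave)}")
```

The sketch yields four waves: `WU1`, then `WU2`, then all seven Phase 2 batches together, then `WU13`.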

### Recommended Delegation Batches

WU3-WU12 are grouped by **skill similarity** so each assignee has minimal context-switching:

| Batch       | WUs             | Why grouped                                                       | Skills needed                                     | Size                      |
| ----------- | --------------- | ----------------------------------------------------------------- | ------------------------------------------------- | ------------------------- |
| **Phase 1** | WU0 + WU1 + WU2 | Sequential foundation; one person does all setup                  | pytest fixtures, basic infra                      | ~30 min                   |
| **B**       | WU3 + WU4       | Both CPU-only, similar Trainer/generate patterns                  | HF Trainer, peft, Opacus, NSS assembler/processor | Medium (2 files, 7 tests) |
| **C**       | WU5 + WU10      | Both train via SDK then inspect adapter output                    | SafeSynthesizer SDK, PEFT adapter loading         | Medium (2 files, 5 tests) |
| **D**       | WU6 + WU8       | Both exercise vLLM generation paths                               | VllmBackend, vLLM, structured outputs             | Medium (2 files, 3 tests) |
| **E**       | WU7             | Specialized timeseries knowledge                                  | TimeseriesBackend, timeseries config              | Small (1 file, 1 test)    |
| **F**       | WU9             | Specialized resume/Workdir knowledge                              | SafeSynthesizer resume, load_from_save_path       | Small (1 file, 1 test)    |
| **G**       | WU11 + WU12     | Both need internet + HF Hub; WU12 needs process isolation from DP | SmolLM2, Unsloth, HF Hub, Makefile update         | Medium (2 files, 3 tests) |

### Phase 3: Consolidation (sequential, after all Phase 2 batches complete)

| Batch | WU   | Purpose                                                                         | Owner                                              |
| ----- | ---- | ------------------------------------------------------------------------------- | -------------------------------------------------- |
| **H** | WU13 | Holistic DRY pass: deduplicate configs, extract helpers, consolidate decorators | **Fresh agent** (must NOT have worked on WU3-WU12) |

### Priority order (if fewer hands available)

If you cannot parallelize all batches, do them in this order (highest value first):

1. **Phase 1** (A) -- must go first
2. **C** (GPU SDK training + adapter) -- highest signal for catching regressions
3. **D** (GPU generation + structured) -- tests the production generation path
4. **B** (CPU training + generation) -- catches dep breakage without GPU
5. **E** (timeseries) -- specialized but important path
6. **F** (resume) -- important production flow
7. **G** (SmolLM2 + Unsloth) -- lowest priority, needs internet; WU12 also needs Makefile update for process isolation
8. **H** (consolidation) -- must be last; requires fresh eyes

---

## Critical Gotchas (every WU must know these)

These were discovered by automated council review and affect ALL work units:

1. **Copyright headers**: Every new `.py` file MUST start with:
   ```python
   # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
   # SPDX-License-Identifier: Apache-2.0
   ```
   Enforced by `tools/lint/copyright_fixer.py --check .` in CI.
2. **`lora_r=8` not 4**: vLLM only allows LoRA ranks in {1, 8, 16, 32, 64, ...}. Rank 4 is silently rejected. Use 8 everywhere in smoke tests.
3. **`holdout=0, max_holdout=0`**: The iris dataset has 151 rows, but `Holdout.train_test_split()` in `src/nemo_safe_synthesizer/holdout/holdout.py` requires >=200 rows. Setting holdout=0 bypasses this.
4. **`attn_implementation="eager"`**: The `HuggingFaceBackend` defaults to `flash_attention_2` which can fail with head_dim=32 (our tiny model: hidden_size=64 / 2 heads). Override to `"eager"` in smoke tests.
5. **`vocab_size=32000`**: The stub tokenizer at `tests/stub_tokenizer/` has 32000 tokens. The tiny model config must match this exactly.
6. **`use_unsloth=False`**: Always set explicitly. The `auto` default may resolve to `True` and pull in Unsloth, which invasively patches transformers.
7. **`optim="adamw_torch"`** for CPU tests: The production default `paged_adamw_32bit` requires bitsandbytes CUDA kernels.
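
The arithmetic behind gotchas 4 and 5 can be checked by hand. The sketch below is an illustration only: `TinyLlamaDims`, `head_dim`, and `approx_param_count` are hypothetical names, the dimensions are the ones used by `tiny_llama_config` in `tests/smoke/conftest.py`, and the parameter count assumes a standard Llama layout with untied input/output embeddings and no bias terms (the `LlamaConfig` defaults).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TinyLlamaDims:
    """Dimensions of the tiny smoke-test model (hypothetical container)."""
    vocab_size: int = 32000          # must match the stub tokenizer (gotcha 5)
    hidden_size: int = 64
    intermediate_size: int = 128
    num_hidden_layers: int = 2
    num_attention_heads: int = 2
    num_key_value_heads: int = 2


def head_dim(d: TinyLlamaDims) -> int:
    """Per-head width: hidden_size / num_attention_heads (gotcha 4)."""
    return d.hidden_size // d.num_attention_heads


def approx_param_count(d: TinyLlamaDims) -> int:
    """Parameter count assuming untied embeddings and no biases."""
    embed = d.vocab_size * d.hidden_size
    lm_head = d.vocab_size * d.hidden_size
    kv_width = head_dim(d) * d.num_key_value_heads
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_width.
    attention = 2 * d.hidden_size * d.hidden_size + 2 * d.hidden_size * kv_width
    # gate_proj, up_proj, down_proj.
    mlp = 3 * d.hidden_size * d.intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * d.hidden_size
    per_layer = attention + mlp + norms
    final_norm = d.hidden_size
    return embed + lm_head + d.num_hidden_layers * per_layer + final_norm


dims = TinyLlamaDims()
print(head_dim(dims))            # 32
print(approx_param_count(dims))  # 4178240
```

Almost all of the roughly 4.2M parameters sit in the two vocabulary-sized matrices, which is why the vocabulary size has to agree with the stub tokenizer while the transformer layers themselves stay tiny.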

---

## Summary: File Inventory

| File                                              | WU   | Tests              | Marker                      |
| ------------------------------------------------- | ---- | ------------------ | --------------------------- |
| `tests/smoke/README.md`                           | WU0  | -- (documentation) | --                          |
| `tests/smoke/__init__.py`                         | WU1  | --                 | --                          |
| `tests/smoke/conftest.py`                         | WU2  | -- (fixtures only) | --                          |
| `tests/smoke/test_training_cpu.py`                | WU3  | 4                  | `smoke` (auto)              |
| `tests/smoke/test_generation_cpu.py`              | WU4  | 3                  | `smoke` (auto)              |
| `tests/smoke/test_nss_training_gpu.py`            | WU5  | 2                  | `smoke` + `gpu_integration` |
| `tests/smoke/test_nss_generation_gpu.py`          | WU6  | 2                  | `smoke` + `gpu_integration` |
| `tests/smoke/test_nss_timeseries_gpu.py`          | WU7  | 1                  | `smoke` + `gpu_integration` |
| `tests/smoke/test_nss_structured_gen_gpu.py`      | WU8  | 1                  | `smoke` + `gpu_integration` |
| `tests/smoke/test_nss_resume_gpu.py`              | WU9  | 1                  | `smoke` + `gpu_integration` |
| `tests/smoke/test_nss_adapter_persistence_gpu.py` | WU10 | 3                  | `smoke` + `gpu_integration` |
| `tests/smoke/test_full_pipeline_gpu.py`           | WU11 | 2                  | `smoke` + `gpu_integration` |
| `tests/smoke/test_nss_unsloth_gpu.py`             | WU12 | 1                  | `smoke` + `gpu_integration` |

**Modified files**: `tests/conftest.py`, `pytest.ini`, `pyproject.toml`, `Makefile`

**Total**: 20 tests across 10 test files, plus 2 infra files (conftest, init) and 1 README.
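
As a quick cross-check of the inventory, the per-file counts in the table can be tallied directly. This is only a sketch over numbers copied from the table; the dictionary name `inventory` and the CPU/GPU split by filename suffix are illustrative choices.

```python
# Test counts per file, copied from the "Tests" column of the inventory table.
inventory = {
    "test_training_cpu.py": 4,
    "test_generation_cpu.py": 3,
    "test_nss_training_gpu.py": 2,
    "test_nss_generation_gpu.py": 2,
    "test_nss_timeseries_gpu.py": 1,
    "test_nss_structured_gen_gpu.py": 1,
    "test_nss_resume_gpu.py": 1,
    "test_nss_adapter_persistence_gpu.py": 3,
    "test_full_pipeline_gpu.py": 2,
    "test_nss_unsloth_gpu.py": 1,
}

total_files = len(inventory)
total_tests = sum(inventory.values())
cpu_tests = sum(n for name, n in inventory.items() if name.endswith("_cpu.py"))
gpu_tests = sum(n for name, n in inventory.items() if name.endswith("_gpu.py"))

print(f"{total_tests} tests across {total_files} files "
      f"({cpu_tests} CPU, {gpu_tests} GPU)")
```

The rows sum to 20 tests across 10 files: 7 CPU-only tests and 13 that also carry `gpu_integration`.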

## Running

```bash
# CPU smoke tests only (~10 seconds, no GPU required)
make test-smoke

# GPU smoke + e2e tests (requires CUDA)
make test-gpu-integration
```

tests/smoke/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

tests/smoke/conftest.py

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

from pathlib import Path

import pandas as pd
import pytest
from datasets import Dataset
from transformers import AutoTokenizer, LlamaConfig, LlamaForCausalLM


@pytest.fixture(scope="session")
def fixture_stub_tokenizer_path() -> str:
    """Session-scoped override of the function-scoped fixture in tests/conftest.py."""
    return str(Path(__file__).parent.parent / "stub_tokenizer")


@pytest.fixture(scope="session")
def tiny_llama_config(fixture_stub_tokenizer_path):
    """LlamaConfig with minimal dimensions for fast smoke testing."""
    tokenizer = AutoTokenizer.from_pretrained(fixture_stub_tokenizer_path)
    return LlamaConfig(
        vocab_size=tokenizer.vocab_size,  # 32000 -- must match stub tokenizer
        hidden_size=64,
        intermediate_size=128,
        num_hidden_layers=2,
        num_attention_heads=2,
        num_key_value_heads=2,
        max_position_embeddings=128,
    )


@pytest.fixture
def tiny_model(tiny_llama_config):
    """Randomly initialized LlamaForCausalLM. Tiny (~4M parameters, mostly embeddings), no download."""
    return LlamaForCausalLM(tiny_llama_config)


@pytest.fixture(scope="session")
def stub_tokenizer(fixture_stub_tokenizer_path):
    """Load the Llama stub tokenizer from tests/stub_tokenizer/."""
    return AutoTokenizer.from_pretrained(fixture_stub_tokenizer_path)


@pytest.fixture(scope="session")
def tiny_training_dataset(stub_tokenizer):
    """~8 tokenized training examples as a datasets.Dataset."""
    texts = [
        '{"col1":"a","col2":"1"}',
        '{"col1":"b","col2":"2"}',
        '{"col1":"c","col2":"3"}',
        '{"col1":"d","col2":"4"}',
        '{"col1":"e","col2":"5"}',
        '{"col1":"f","col2":"6"}',
        '{"col1":"g","col2":"7"}',
        '{"col1":"h","col2":"8"}',
    ]
    tokenized = stub_tokenizer(texts, padding="max_length", truncation=True, max_length=64, return_tensors="np")
    return Dataset.from_dict(
        {
            "input_ids": tokenized["input_ids"].tolist(),
            "attention_mask": tokenized["attention_mask"].tolist(),
            "labels": tokenized["input_ids"].tolist(),  # labels = input_ids for causal LM
        }
    )


@pytest.fixture(scope="session")
def tiny_training_dataset_with_position_ids(tiny_training_dataset):
    """Training dataset with position_ids column, required by DataCollatorForPrivateTokenClassification."""
    seq_len = len(tiny_training_dataset[0]["input_ids"])
    position_ids = [list(range(seq_len))] * len(tiny_training_dataset)
    return tiny_training_dataset.add_column("position_ids", position_ids)


@pytest.fixture(scope="session")
def local_tinyllama_dir(tmp_path_factory, tiny_llama_config, stub_tokenizer):
    """Save tiny model + tokenizer to a local dir named with 'tinyllama' for NSS compatibility."""
    local_dir = tmp_path_factory.mktemp("smoke-tinyllama-model")
    model = LlamaForCausalLM(tiny_llama_config)
    model.save_pretrained(local_dir)
    stub_tokenizer.save_pretrained(local_dir)
    return local_dir


@pytest.fixture(scope="session")
def iris_df():
    """Load iris.csv from stub_datasets."""
    return pd.read_csv(Path(__file__).parent.parent / "stub_datasets" / "iris.csv")


@pytest.fixture(scope="session")
def timeseries_df():
    """Minimal timeseries stub: 2 groups, 5 rows each, elapsed_seconds."""
    return pd.DataFrame(
        {
            "group_id": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
            "elapsed_seconds": [0, 60, 120, 180, 240, 0, 60, 120, 180, 240],
            "value": [10, 20, 30, 40, 50, 100, 110, 120, 130, 140],
        }
    )


@pytest.fixture(scope="session")
def smoke_save_path(tmp_path_factory):
    """Shared temp directory for Tier B (SmolLM2) train -> generate flow."""
    return tmp_path_factory.mktemp("smoke-tier-b")


@pytest.fixture
def base_smoke_config(local_tinyllama_dir):
    """Base SafeSynthesizerParameters shared by all GPU smoke tests with local tiny model.

    Individual tests override specific fields via SafeSynthesizerParameters.from_params(**overrides).
    """
    from nemo_safe_synthesizer.config.parameters import SafeSynthesizerParameters

    return SafeSynthesizerParameters.from_params(
        enable_synthesis=True,
        enable_replace_pii=False,
        pretrained_model=str(local_tinyllama_dir),
        use_unsloth=False,
        num_input_records_to_sample=10,
        num_records=5,
        lora_r=8,
        holdout=0,
        max_holdout=0,
    )


def assert_adapter_saved(workdir):
    """Verify adapter files exist after training.

    Reusable assertion helper for any test that trains via the SDK.
    """
    adapter_dir = workdir.train.adapter.path
    assert (adapter_dir / "adapter_config.json").exists(), "adapter_config.json missing"
    assert any(adapter_dir.glob("*.safetensors")), "No safetensors files found"


def train_with_sdk(config, data_df, save_path):
    """Run SafeSynthesizer.process_data().train() and return the instance."""
    from nemo_safe_synthesizer.sdk.library_builder import SafeSynthesizer

    nss = SafeSynthesizer(config=config, save_path=save_path)
    nss.with_data_source(data_df).process_data().train()
    return nss


@pytest.fixture
def _patch_attn_eager(monkeypatch):
    """Override attn_implementation to 'eager' for tiny model compatibility.

    The HuggingFaceBackend defaults to 'flash_attention_2' (per the smoke README),
    which can fail with head_dim=32 (our tiny model: hidden_size=64 / 2 heads).
    """
    from nemo_safe_synthesizer.training.huggingface_backend import HuggingFaceBackend

    original = HuggingFaceBackend._build_base_framework_params

    def patched(self, model_kwargs):
        model_kwargs.setdefault("attn_implementation", "eager")
        return original(self, model_kwargs)

    monkeypatch.setattr(HuggingFaceBackend, "_build_base_framework_params", patched)
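
The wrap-and-delegate pattern used by `_patch_attn_eager` can be tried in isolation. The sketch below is a stand-alone illustration under stated assumptions: `FakeBackend` is a hypothetical stand-in for `HuggingFaceBackend` (its real `_build_base_framework_params` behaviour is not shown in this commit), and the patch is applied and undone by hand rather than through pytest's `monkeypatch`.

```python
class FakeBackend:
    """Hypothetical stand-in for HuggingFaceBackend; not the real class."""

    def _build_base_framework_params(self, model_kwargs):
        # Assumed behaviour: start from a default and let caller kwargs win.
        params = {"attn_implementation": "flash_attention_2"}
        params.update(model_kwargs)
        return params


def patch_attn_eager(cls):
    """Wrap the method so 'eager' is used unless the caller chose otherwise."""
    original = cls._build_base_framework_params

    def patched(self, model_kwargs):
        # setdefault only fills in a missing key; an explicit value is kept.
        model_kwargs.setdefault("attn_implementation", "eager")
        return original(self, model_kwargs)

    cls._build_base_framework_params = patched
    return original  # returned so the caller can restore it afterwards


original = patch_attn_eager(FakeBackend)
backend = FakeBackend()

defaulted = backend._build_base_framework_params({})
explicit = backend._build_base_framework_params({"attn_implementation": "sdpa"})

# Undo the patch, as pytest's monkeypatch does automatically at teardown.
FakeBackend._build_base_framework_params = original
restored = backend._build_base_framework_params({})

print(defaulted["attn_implementation"])
print(explicit["attn_implementation"])
print(restored["attn_implementation"])
```

One design point worth noting: because the wrapper uses `setdefault`, it supplies `"eager"` only when the caller has not already put `attn_implementation` into `model_kwargs`; a test that passes its own value keeps it.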
