Commit 6fd711e
fix: smoke test quality improvements and dead parameter cleanup
- Replace dead `enable_replace_pii=False` with `replace_pii=None` in all `from_params()` calls (the old key was silently ignored by Pydantic)
- Remove dead `enable_synthesis=True` (no such field exists in src/)
- Add `vllm` pytest marker for GPU memory isolation; refactor Makefile `test-smoke-gpu` into marker-based groups (train-only / vllm / smollm2 / unsloth)
- Tighten `except GenerationError` to assert on the expected message
- Fix SmolLM2 metadata: remove hardcoded out-of-range BOS token IDs
- Promote `base_smoke_config` and `_patch_attn_eager` to session scope
- Make `tiny_llama_config` depend on `stub_tokenizer` (eliminates a redundant tokenizer load)
- Add CPU smoke tests for evaluation (MultimodalReport) and PII replacement (NemoPII)
- Update smoke test docs (README.md, TESTING.md) to reflect the new markers and scopes

Made-with: Cursor
1 parent 9137c3e

17 files changed

Lines changed: 250 additions & 177 deletions
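The first bullet describes a classic Pydantic trap: a model that ignores unknown keys accepts a misspelled or stale parameter name without complaint, so the caller's intent is silently dropped. A minimal sketch of the failure mode, assuming `from_params()` feeds kwargs into a Pydantic model that ignores extras (names below are illustrative, not this project's real API):

```python
# Sketch of the "dead parameter" bug class. Illustrative names only.
from pydantic import BaseModel, ConfigDict


class Params(BaseModel):
    model_config = ConfigDict(extra="ignore")  # unknown keys vanish silently

    replace_pii: dict | None = None  # the field the model actually declares


# A stale key is accepted without error and has no effect:
p = Params(enable_replace_pii=False)  # silently ignored, no exception
assert p.replace_pii is None  # value came from the default, not the caller

# With extra="forbid" the same call would raise a ValidationError,
# which is how this class of dead parameter gets caught early.
```

The fix in this commit is the second half of that sketch: pass the field Pydantic actually declares (`replace_pii=None`) instead of a key it will discard.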

Makefile

Lines changed: 20 additions & 5 deletions

```diff
@@ -182,13 +182,28 @@ test-ci-slow: ## Run slow tests in CI with coverage
 test-smoke: ## Run CPU smoke tests (~few min, no GPU required)
 	$(PYTEST_CMD) -m "smoke and not requires_gpu"
 
+SMOKE_DIR := tests/smoke
 .PHONY: test-smoke-gpu
 test-smoke-gpu: ## Run GPU smoke tests (requires CUDA)
-	# -n 0 disables xdist: CUDA device-side asserts poison the worker, cascading to all subsequent tests.
-	# Separate invocations: (1) local tiny-model tests, (2) SmolLM2 Hub test, (3) Unsloth (process-isolated from DP).
-	$(PYTEST_CMD) tests/smoke/ -n 0 -m "requires_gpu and not unsloth and not smollm2"
-	$(PYTEST_CMD) tests/smoke/ -n 0 -m "requires_gpu and smollm2"
-	$(PYTEST_CMD) tests/smoke/ -n 0 -m "requires_gpu and unsloth"
+	# -n 0 disables xdist. Groups are split for GPU memory isolation.
+	#
+	# When adding a new GPU smoke test file:
+	#   - Train-only (no vLLM): add pytest.mark.requires_gpu -> auto-discovered below
+	#   - Uses vLLM: also add pytest.mark.vllm -> add the file to the vLLM list below
+	#   - Uses Unsloth: also add pytest.mark.unsloth -> auto-discovered below
+	#   - Downloads from Hub: also add pytest.mark.smollm2 (or similar) -> auto-discovered below
+	#
+	# 1) Train-only tests share a process (no vLLM, safe to batch).
+	$(PYTEST_CMD) $(SMOKE_DIR)/ -n 0 -m "requires_gpu and not vllm and not smollm2 and not unsloth"
+	# 2) Each vLLM test file gets its own process -- vLLM pre-allocates all GPU
+	#    memory and never releases it within a process.
+	$(PYTEST_CMD) $(SMOKE_DIR)/test_nss_generation_gpu.py -n 0
+	$(PYTEST_CMD) $(SMOKE_DIR)/test_nss_resume_gpu.py -n 0
+	$(PYTEST_CMD) $(SMOKE_DIR)/test_nss_structured_gen_gpu.py -n 0
+	$(PYTEST_CMD) $(SMOKE_DIR)/test_nss_timeseries_gpu.py -n 0
+	# 3) SmolLM2 (Hub download + vLLM) and Unsloth (patches transformers) are marker-isolated.
+	$(PYTEST_CMD) $(SMOKE_DIR)/ -n 0 -m "requires_gpu and smollm2"
+	$(PYTEST_CMD) $(SMOKE_DIR)/ -n 0 -m "requires_gpu and unsloth"
 
 
 E2E_TEST_FILE := $(NSS_ROOT_PATH)/tests/e2e/test_safe_synthesizer.py
```
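Following the comment block in the target above, a new vLLM-backed smoke test file would look roughly like this. This is a sketch with a hypothetical file and test name; the `pytestmark` recipe itself comes from `tests/smoke/README.md` below:

```python
# Hypothetical tests/smoke/test_nss_newpath_gpu.py -- sketch of the marker
# recipe the Makefile comment describes. The test body is illustrative only.
import sys

import pytest
import torch

pytestmark = [
    pytest.mark.requires_gpu,
    pytest.mark.vllm,  # calls .generate() -> needs a process to itself
    pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA not available"),
    pytest.mark.skipif(sys.platform == "darwin", reason="Not applicable on macOS"),
]


def test_newpath_generates(base_smoke_config, iris_df):
    ...  # train on iris_df, then .generate() -- exercises the vLLM path
```

Because of the `vllm` marker, group 1's `not vllm` filter excludes the file automatically; it only actually runs once its path is added to the explicit per-file list in group 2.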

pytest.ini

Lines changed: 1 addition & 0 deletions

```diff
@@ -18,6 +18,7 @@ markers =
     smoke: Smoke tests - quick tests exercising training/generation hot paths with tiny models
     e2e: End-to-end tests - test the entire pipeline from data to generation to evaluation
     requires_gpu: Test needs CUDA hardware (orthogonal modifier, stacks on smoke/e2e)
+    vllm: Tests using vLLM generation backend (each file runs in its own process for GPU memory isolation)
     smollm2: SmolLM2 Hub download tests (used by Makefile for process isolation)
     unsloth: Unsloth backend tests (process-isolated from DP tests)
     noautouse: Marker to skip autouse fixtures for specific tests
```
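Since `--strict-markers` is enabled (see the TESTING.md excerpt below), registering `vllm` here is what allows tests to carry it at all: an unregistered marker fails at collection rather than silently selecting nothing. A small sketch with a hypothetical test:

```python
# With --strict-markers, any marker absent from pytest.ini's `markers`
# list is a collection error, not a silent no-op.
import pytest


@pytest.mark.vlm  # typo of "vllm" -> pytest refuses to collect this file
def test_never_runs():
    ...
```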

src/nemo_safe_synthesizer/llm/metadata.py

Lines changed: 0 additions & 2 deletions

```diff
@@ -767,8 +767,6 @@ def __init__(
                 add_bos_token_to_prompt=False,
                 add_eos_token_to_prompt=False,
                 tokenizer=tokenizer,
-                bos_token="<|im_start|>",
-                bos_token_id=151644,
                 name=model_name_or_path,
             ),
             model_name_or_path=model_name_or_path,
```
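This is the "out-of-range BOS token" fix from the commit message: `151644` is Qwen2's `<|im_start|>` ID, but SmolLM2's vocabulary holds roughly 49K tokens, so an embedding lookup with that ID dies in a CUDA device-side assert instead of a readable error. A sketch of the kind of bounds check that surfaces this on CPU (hypothetical helper, not project code):

```python
# Hypothetical guard: a hardcoded special-token ID must be a valid index
# into the embedding table, or GPU runs fail with an opaque device-side assert.
def check_token_id(token_id: int, vocab_size: int) -> None:
    if not 0 <= token_id < vocab_size:
        raise ValueError(f"token id {token_id} out of range for vocab size {vocab_size}")


try:
    check_token_id(151644, 49152)  # Qwen2 <|im_start|> vs. SmolLM2's ~49K vocab
except ValueError as err:
    print(err)  # caught on CPU, long before any CUDA kernel runs
```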

tests/TESTING.md

Lines changed: 17 additions & 4 deletions

```diff
@@ -73,6 +73,9 @@ Defined in `pytest.ini` (`--strict-markers` is enabled):
 | `smoke` | Quick smoke tests (training/generation hot paths, tiny models) |
 | `e2e` | End-to-end pipeline tests (requires CUDA) |
 | `requires_gpu` | Test needs CUDA hardware (modifier, stacks on `smoke`/`e2e`) |
+| `vllm` | Tests using vLLM generation backend (each file runs in its own process for GPU memory isolation) |
+| `smollm2` | SmolLM2 Hub download tests (Makefile uses for process isolation) |
+| `unsloth` | Unsloth backend tests (process-isolated from DP tests) |
 | `noautouse` | Skip autouse fixtures for specific tests |
 
 ## Auto-marking
@@ -117,7 +120,7 @@ Per-module fixtures:
 - Generation/eval/data_processing: shared tokenizer and JSONL fixtures
 - CLI: `mock_workdir(tmp_path)` for tmp_path-based Workdir
 - Config: `basic_parameter`, `training_hyperparams`, `simple_safe_synthesizer_parameters`
-- Smoke: session-scoped `tiny_llama_config`, `stub_tokenizer`, `local_tinyllama_dir`, `iris_df`; function-scoped `base_smoke_config`, `tiny_model`, `_patch_attn_eager`; helpers `train_with_sdk()`, `assert_adapter_saved()`
+- Smoke: session-scoped `tiny_llama_config`, `stub_tokenizer`, `local_tinyllama_dir`, `iris_df`, `base_smoke_config`, `_patch_attn_eager`; function-scoped `tiny_model`; helpers `train_with_sdk()`, `assert_adapter_saved()`
 
 ## Fixture Scoping
 
@@ -133,10 +136,20 @@ Mock Workdir via `mock_workdir(tmp_path)` in `cli/conftest.py`.
 
 ## GPU Isolation Gotcha
 
-Unsloth patches transformers at import time, which poisons Opacus/DP if they share a process. CUDA device-side asserts also cascade across xdist workers. Both e2e and smoke GPU tests require process isolation:
+Two GPU isolation hazards require per-file process isolation (`-n 0`):
 
-- `make test-smoke-gpu` runs three separate single-process (`-n 0`) pytest invocations over `tests/smoke/`, split by `-k` filters: (1) non-unsloth/non-smollm2, (2) smollm2, (3) unsloth.
-- `make test-e2e` splits into `test-e2e-default` + `test-e2e-dp`, each single-process over `tests/e2e/`.
+1. vLLM pre-allocates all GPU memory and never releases it within a process. Tests that call `.generate()` must run in separate processes or later tests OOM.
+2. Unsloth patches transformers at import time, poisoning Opacus/DP if they share a process.
+
+GPU smoke tests use markers to express isolation requirements:
+
+- `requires_gpu`: all GPU tests
+- `vllm`: tests using vLLM generation (each file gets its own process)
+- `smollm2`, `unsloth`: marker-isolated groups (auto-discovered)
+
+`make test-smoke-gpu` uses marker algebra for train-only tests (auto-discovering via `requires_gpu and not vllm and not smollm2 and not unsloth`), explicit file paths for vLLM tests (per-file isolation), and marker selection for SmolLM2/Unsloth. When adding a new vLLM test file, add `pytest.mark.vllm` and also add the file to the Makefile's explicit list.
+
+`make test-e2e` splits into `test-e2e-default` + `test-e2e-dp`, each single-process over `tests/e2e/`.
 
 See [`tests/smoke/README.md`](smoke/README.md) for additional smoke-specific gotchas.
```
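Spelled out, the `test-smoke-gpu` recipe is just a loop over isolation groups. A rough Python rendering for clarity (sketch only; the Makefile target above is authoritative, and `$(PYTEST_CMD)` may carry extra flags):

```python
# Rough Python equivalent of the Makefile's test-smoke-gpu grouping.
import subprocess

VLLM_FILES = [
    "tests/smoke/test_nss_generation_gpu.py",
    "tests/smoke/test_nss_resume_gpu.py",
    "tests/smoke/test_nss_structured_gen_gpu.py",
    "tests/smoke/test_nss_timeseries_gpu.py",
]


def run(*args: str) -> None:
    # -n 0 disables xdist so each invocation is a single process.
    subprocess.run(["pytest", "-n", "0", *args], check=True)


# 1) Train-only tests batch safely into one process.
run("tests/smoke/", "-m", "requires_gpu and not vllm and not smollm2 and not unsloth")
# 2) One fresh process per vLLM file: vLLM's GPU memory is never returned.
for path in VLLM_FILES:
    run(path)
# 3) Marker-isolated groups.
run("tests/smoke/", "-m", "requires_gpu and smollm2")
run("tests/smoke/", "-m", "requires_gpu and unsloth")
```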
tests/smoke/README.md

Lines changed: 26 additions & 24 deletions

````diff
@@ -1,6 +1,6 @@
 # Smoke Tests
 
-Quick tests that verify training and generation code paths don't crash.
+Quick tests that verify training, generation, evaluation, and PII replacement code paths don't crash.
 They use tiny or small models and run in seconds (CPU) or a few minutes (GPU).
 
 ```bash
@@ -10,9 +10,10 @@ make test-smoke-gpu # GPU tests (requires CUDA)
 
 ## When should I add a smoke test?
 
-If you're adding a new training backend, generation backend, or model family,
-add a smoke test for it. Same if you're changing how the SDK orchestrates
-train/generate -- those paths are easy to break silently.
+If you're adding a new training backend, generation backend, evaluation
+component, or model family, add a smoke test for it. Same if you're changing
+how the SDK orchestrates train/generate/evaluate -- those paths are easy to
+break silently.
 
 Smoke tests don't check output quality. They just make sure the code runs
 end-to-end without throwing. Use the smallest model that exercises the path
@@ -21,31 +22,24 @@ a real tokenizer/model).
 
 ## GPU Test Process Isolation
 
-GPU smoke tests run in three separate single-process (`-n 0`) pytest invocations to avoid CUDA and import-time conflicts:
+GPU smoke tests use three marker-based isolation groups:
 
-1. Local tiny-model tests (everything except SmolLM2 and Unsloth)
-2. SmolLM2 Hub download test (downloads ~270MB from HuggingFace)
-3. Unsloth backend test (process-isolated from DP tests)
+1. Train-only (`requires_gpu` without `vllm`/`smollm2`/`unsloth`): share a single process, auto-discovered via marker algebra.
+2. vLLM generation (`vllm` marker): each file gets its own process because vLLM pre-allocates all GPU memory and never releases it.
+3. SmolLM2 / Unsloth (`smollm2`, `unsloth` markers): each gets its own process, auto-discovered via markers.
 
-Why: Unsloth monkey-patches transformers at import time, poisoning Opacus/DP if they share a process. CUDA device-side asserts also cascade across xdist workers. The Makefile `test-smoke-gpu` target handles the split automatically via `-k` filters.
-
-Tests use pytestmark decorators:
+When adding a new GPU smoke test, add the appropriate markers to `pytestmark`:
 
 ```python
 pytestmark = [
     pytest.mark.requires_gpu,
+    pytest.mark.vllm,  # if the test calls .generate() (uses vLLM)
     pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA not available"),
     pytest.mark.skipif(sys.platform == "darwin", reason="Not applicable on macOS"),
 ]
 ```
 
-For SmolLM2 and Unsloth tests, add the marker to a test function:
-
-```python
-@pytest.mark.usefixtures("_register_smollm2")  # for SmolLM2 tests
-def test_full_pipeline_smollm2(...):
-    ...
-```
+If the new file uses vLLM, also add it to the explicit file list in the `test-smoke-gpu` Makefile target (vLLM files need per-file isolation).
 
 ## Things that will bite you
 
@@ -58,14 +52,22 @@ def test_full_pipeline_smollm2(...):
 
 ## What's in `conftest.py`?
 
-The shared fixtures cover both CPU and GPU smoke tests. The most important ones:
+The shared fixtures cover both CPU and GPU smoke tests. Session-scoped fixtures are created once per pytest process; function-scoped fixtures are recreated per test.
 
-- `base_smoke_config` -- default `SafeSynthesizerParameters` pointing at the local tiny model
-- `train_with_sdk(config, data_df, save_path)` -- convenience wrapper around the SDK train flow
-- `assert_adapter_saved(workdir)` -- checks that adapter files landed on disk
+Session-scoped (immutable / read-only):
+
+- `base_smoke_config` -- default `SafeSynthesizerParameters` pointing at the local tiny model (Pydantic frozen model)
 - `_patch_attn_eager` -- the attention implementation workaround mentioned above
-- `tiny_model`, `stub_tokenizer`, `tiny_training_dataset` -- CPU test building blocks
-- `local_tinyllama_dir` -- saves the tiny model to a temp dir so GPU tests don't need internet
+- `stub_tokenizer`, `tiny_llama_config`, `local_tinyllama_dir` -- tokenizer and tiny model on disk
 - `iris_df`, `timeseries_df` -- small DataFrames for training input
 
+Function-scoped (fresh per test):
+
+- `tiny_model` -- randomly initialized `LlamaForCausalLM` (mutated by training)
+
+Helpers (plain functions, not fixtures):
+
+- `train_with_sdk(config, data_df, save_path)` -- convenience wrapper around the SDK train flow
+- `assert_adapter_saved(workdir)` -- checks that adapter files landed on disk
+
 See [CONTRIBUTING.md](../../CONTRIBUTING.md#testing) for the full list of test commands.
````
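The session/function split in the rewritten fixture list follows the standard pytest rule: share what no test mutates, rebuild what tests mutate. A generic sketch of the two scopes (hypothetical fixtures, not this conftest's):

```python
# Generic illustration of the scoping rule (hypothetical fixtures).
import pytest


@pytest.fixture(scope="session")
def shared_readonly_config():
    # Built once per pytest process. Safe to share only because
    # nothing mutates it after creation.
    return {"pretrained_model": "/tmp/tiny-llama", "use_unsloth": False}


@pytest.fixture  # function scope is the default
def mutable_state(shared_readonly_config):
    # Rebuilt for every test: tests mutate this, and a shared copy
    # would leak one test's updates into the next.
    return {"weights": [0.0, 0.0], **shared_readonly_config}
```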

tests/smoke/conftest.py

Lines changed: 17 additions & 62 deletions

```diff
@@ -6,72 +6,26 @@
 import pandas as pd
 import pytest
 from datasets import Dataset
-from transformers import AutoConfig, AutoTokenizer, LlamaConfig, LlamaForCausalLM
+from transformers import AutoTokenizer, LlamaConfig, LlamaForCausalLM
 
 from nemo_safe_synthesizer.cli.artifact_structure import Workdir
 from nemo_safe_synthesizer.config.parameters import SafeSynthesizerParameters
-from nemo_safe_synthesizer.defaults import DEFAULT_INSTRUCTION
-from nemo_safe_synthesizer.llm.metadata import LLMPromptConfig, ModelMetadata
 from nemo_safe_synthesizer.sdk.library_builder import SafeSynthesizer
 
-
-class SmolLM2(ModelMetadata):
-    """Test-only metadata for HuggingFaceTB/SmolLM2-135M.
-
-    Moved out of production code because SmolLM2-135M is only used in smoke tests.
-    Uses the tokenizer's native BOS token (not the Qwen2 <|im_start|> override
-    that was previously hardcoded -- that token ID is out of range for SmolLM2's
-    49K vocab and causes CUDA device-side asserts).
-    """
-
-    def __init__(
-        self, model_name_or_path: str, tokenizer=None, rope_scaling_factor: float | None = None, **kwargs
-    ) -> None:
-        tokenizer = AutoTokenizer.from_pretrained(model_name_or_path) if tokenizer is None else tokenizer
-        config = AutoConfig.from_pretrained(model_name_or_path)
-
-        super().__init__(
-            autoconfig=config,
-            instruction=DEFAULT_INSTRUCTION,
-            prompt_config=LLMPromptConfig.from_tokenizer(
-                template="user\n {instruction} {schema} \n assistant\n{prefill}",
-                add_bos_token_to_prompt=False,
-                add_eos_token_to_prompt=False,
-                tokenizer=tokenizer,
-                name=model_name_or_path,
-            ),
-            model_name_or_path=model_name_or_path,
-            rope_scaling=None,
-            rope_parameters_location="autoconfig",
-            **kwargs,
-        )
-
-
-@pytest.fixture
-def _register_smollm2(monkeypatch):
-    """Patch ModelMetadata resolution so the SDK can find SmolLM2 (test-only class)."""
-    original = ModelMetadata.from_str_or_path.__func__
-
-    def patched(cls, model_name_or_path, **kwargs):
-        if "smollm2" in str(model_name_or_path).lower():
-            return SmolLM2(model_name_or_path=str(model_name_or_path), **kwargs)
-        return original(cls, model_name_or_path, **kwargs)
-
-    monkeypatch.setattr(ModelMetadata, "from_str_or_path", classmethod(patched))
+STUB_DATASETS_DIR = Path(__file__).parent.parent / "stub_datasets"
 
 
 @pytest.fixture(scope="session")
 def fixture_stub_tokenizer_path() -> str:
-    """Session-scoped override of the function-scoped fixture in tests/conftest.py."""
+    """Path to the Llama stub tokenizer in tests/stub_tokenizer/."""
     return str(Path(__file__).parent.parent / "stub_tokenizer")
 
 
 @pytest.fixture(scope="session")
-def tiny_llama_config(fixture_stub_tokenizer_path):
+def tiny_llama_config(stub_tokenizer):
     """LlamaConfig with minimal dimensions for fast smoke testing."""
-    tokenizer = AutoTokenizer.from_pretrained(fixture_stub_tokenizer_path)
     return LlamaConfig(
-        vocab_size=tokenizer.vocab_size,  # 32000 -- must match stub tokenizer
+        vocab_size=stub_tokenizer.vocab_size,  # 32000 -- must match stub tokenizer
         hidden_size=64,
         intermediate_size=128,
         num_hidden_layers=2,
@@ -137,9 +91,7 @@ def local_tinyllama_dir(tmp_path_factory, tiny_llama_config, stub_tokenizer):
 @pytest.fixture(scope="session")
 def iris_df():
     """Load iris.csv from stub_datasets."""
-    from tests.conftest import load_test_dataframe
-
-    return load_test_dataframe("iris.csv").copy()
+    return pd.read_csv(STUB_DATASETS_DIR / "iris.csv")
 
 
 @pytest.fixture(scope="session")
@@ -162,7 +114,7 @@ def timeseries_df():
             ],
             "value": [10, 20, 30, 40, 50, 100, 110, 120, 130, 140],
         }
-    ).copy()
+    )
 
 
 @pytest.fixture(scope="session")
@@ -171,15 +123,15 @@ def smoke_save_path(tmp_path_factory):
     return tmp_path_factory.mktemp("smoke-tier-b")
 
 
-@pytest.fixture
+@pytest.fixture(scope="session")
 def base_smoke_config(local_tinyllama_dir):
     """Base SafeSynthesizerParameters shared by all GPU smoke tests with local tiny model.
 
-    Individual tests override specific fields via SafeSynthesizerParameters.from_params(**overrides).
+    Session-scoped because the config is immutable (Pydantic frozen model).
+    Tests that need different settings create their own via SafeSynthesizerParameters.from_params().
     """
     return SafeSynthesizerParameters.from_params(
-        enable_synthesis=True,
-        enable_replace_pii=False,
+        replace_pii=None,
         pretrained_model=str(local_tinyllama_dir),
         use_unsloth=False,
         num_input_records_to_sample=10,
@@ -207,10 +159,11 @@ def train_with_sdk(config: SafeSynthesizerParameters, data_df: pd.DataFrame, sav
     return nss
 
 
-@pytest.fixture
-def _patch_attn_eager(monkeypatch):
+@pytest.fixture(scope="session")
+def _patch_attn_eager():
     """Override attn_implementation from 'flashinfer' (not a valid HF option) to 'sdpa'.
 
+    Session-scoped so class-scoped and function-scoped fixtures can depend on it.
     The HuggingFaceBackend defaults to 'flashinfer' which is not supported by
    HuggingFace's from_pretrained. PyTorch SDPA is universally compatible.
     """
@@ -222,4 +175,6 @@ def patched_build(self, model_kwargs):
         model_kwargs.setdefault("attn_implementation", "sdpa")
         return original_build(self, model_kwargs)
 
-    monkeypatch.setattr(HuggingFaceBackend, "_build_base_framework_params", patched_build)
+    HuggingFaceBackend._build_base_framework_params = patched_build
+    yield
+    HuggingFaceBackend._build_base_framework_params = original_build
```
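Two details in this refactor are worth unpacking. First, session scope for `base_smoke_config` is safe because Pydantic frozen models reject mutation outright; a minimal sketch (generic model, not `SafeSynthesizerParameters` itself):

```python
# Why a frozen Pydantic model can be shared at session scope: attribute
# assignment raises, so no test can mutate the shared instance.
from pydantic import BaseModel, ConfigDict, ValidationError


class FrozenConfig(BaseModel):
    model_config = ConfigDict(frozen=True)

    pretrained_model: str
    use_unsloth: bool = False


cfg = FrozenConfig(pretrained_model="/tmp/tiny-llama")
try:
    cfg.use_unsloth = True  # any mutation attempt...
except ValidationError:
    pass  # ...fails with a frozen_instance error

# Tests needing different values build a new object instead of mutating:
cfg2 = cfg.model_copy(update={"use_unsloth": True})
assert cfg.use_unsloth is False and cfg2.use_unsloth is True
```

Second, `_patch_attn_eager` drops `monkeypatch` because pytest's `monkeypatch` fixture is function-scoped and cannot be requested from a session-scoped fixture; the manual assign/`yield`/restore in the diff is the standard workaround.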

tests/smoke/test_evaluation_cpu.py

Lines changed: 39 additions & 0 deletions

```diff
@@ -0,0 +1,39 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+"""CPU evaluation smoke test -- MultimodalReport.from_dataframes().
+
+Exercises the evaluation pipeline (column distribution, correlation,
+deep structure, text similarity) on a small dataset with privacy
+metrics disabled. Catches dep breakage in the evaluation stack
+(scipy, plotly, sentence-transformers, etc.).
+"""
+
+import pytest
+
+pytest.importorskip(
+    "sentence_transformers", reason="sentence-transformers required (install with: uv sync --extra cpu)"
+)
+
+from nemo_safe_synthesizer.config.parameters import EvaluationParameters, SafeSynthesizerParameters
+from nemo_safe_synthesizer.evaluation.reports.multimodal.multimodal_report import MultimodalReport
+
+
+def test_multimodal_report_from_dataframes(iris_df):
+    """Build a MultimodalReport from iris_df on CPU with privacy metrics off."""
+    config = SafeSynthesizerParameters(
+        evaluation=EvaluationParameters(mia_enabled=False, aia_enabled=False),
+    )
+    reference = iris_df.copy()
+    output = iris_df.sample(frac=0.8, random_state=42).reset_index(drop=True)
+
+    report = MultimodalReport.from_dataframes(
+        reference=reference,
+        output=output,
+        config=config,
+    )
+    assert report is not None
+    assert len(report.components) > 0
+
+    score_dict = report.get_dict()
+    assert "Synthetic Quality Score" in score_dict
```