Skip to content

Commit 0280ed2

Browse files
committed
chore: feedback
Signed-off-by: aagonzales <aagonzales@nvidia.com>
1 parent 08909ca commit 0280ed2

12 files changed

Lines changed: 202 additions & 271 deletions

File tree

.github/workflows/ci-checks.yml

Lines changed: 24 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -165,6 +165,27 @@ jobs:
165165
token: ${{ secrets.CODECOV_TOKEN }}
166166
fail_ci_if_error: false
167167

168+
smoke-test:
169+
name: Smoke Tests
170+
needs: changes
171+
if: ${{ needs.changes.outputs.src == 'true' || needs.changes.outputs.test == 'true' || github.event_name == 'workflow_dispatch' }}
172+
runs-on: ubuntu-latest
173+
steps:
174+
- name: checkout
175+
uses: actions/checkout@v6
176+
with:
177+
fetch-depth: 0
178+
179+
- name: Setup Python environment
180+
uses: ./.github/actions/setup-python-env
181+
with:
182+
bootstrap-tools: "true"
183+
184+
- name: Run CPU smoke tests
185+
run: |
186+
make bootstrap-nss cpu
187+
make test-smoke
188+
168189
# ---------------------------------------------------------------------------
169190
# Single required status check for branch protection.
170191
# Aggregates results from all upstream jobs so that skipped jobs (due to
@@ -173,7 +194,7 @@ jobs:
173194
ci-status:
174195
name: CI Status
175196
if: always() && !cancelled()
176-
needs: [changes, format, lint, typecheck, unit-test]
197+
needs: [changes, format, lint, typecheck, unit-test, smoke-test]
177198
runs-on: ubuntu-latest
178199
steps:
179200
- name: Check job results
@@ -182,7 +203,8 @@ jobs:
182203
echo "format: ${{ needs.format.result }}"
183204
echo "lint: ${{ needs.lint.result }}"
184205
echo "typecheck: ${{ needs.typecheck.result }}"
185-
echo "unit-test: ${{ needs.unit-test.result }}"
206+
echo "unit-test: ${{ needs.unit-test.result }}"
207+
echo "smoke-test: ${{ needs.smoke-test.result }}"
186208
187209
if [[ "${{ contains(needs.*.result, 'failure') }}" == "true" ]]; then
188210
echo "::error::One or more CI jobs failed"

.github/workflows/gpu-tests.yml

Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -75,11 +75,16 @@ jobs:
7575
python-version: ${{ matrix.python-version }}
7676
bootstrap-tools: "true"
7777

78+
- name: Bootstrap CUDA environment
79+
run: make bootstrap-nss cu128
80+
81+
- name: Run GPU smoke tests
82+
timeout-minutes: 20
83+
run: make test-gpu-integration
84+
7885
- name: Run GPU E2E tests
7986
timeout-minutes: 45
80-
run: |
81-
make bootstrap-nss cu128
82-
make test-e2e
87+
run: make test-e2e
8388

8489
# ---------------------------------------------------------------------------
8590
# Single required status check for branch protection.

CONTRIBUTING.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -268,6 +268,9 @@ make test-slow
268268
# Run SDK-related tests (config, sdk, cli, api)
269269
make test-sdk-related
270270

271+
# Run CPU smoke tests (~10 seconds, no GPU required)
272+
make test-smoke
273+
271274
# Run GPU integration tests (requires CUDA)
272275
make test-gpu-integration
273276

@@ -281,6 +284,8 @@ make test-ci-container
281284
uv run pytest tests/cli/test_run.py
282285
```
283286

287+
Smoke tests exercise training and generation hot paths with a tiny model. See [`tests/smoke/README.md`](tests/smoke/README.md) for details on shared fixtures, gotchas, and when to add new smoke tests.
288+
284289
### Test Requirements
285290

286291
Before submitting a PR:

Makefile

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -159,9 +159,9 @@ test-sdk-related: ## Run SDK-related tests (config, sdk, cli, api)
159159
$(NSS_ROOT_PATH)/tests/api
160160

161161
.PHONY: test-ci
162-
test-ci: ## Run CI unit tests excluding slow and GPU tests
162+
test-ci: ## Run CI unit tests excluding slow, GPU, and smoke tests
163163
pushd $(NSS_ROOT_PATH) && \
164-
$(PYTEST_CMD) $(PYTEST_CI_OPTS) $(NSS_ROOT_PATH)/tests -m "not e2e and not gpu_integration and not slow"
164+
$(PYTEST_CMD) $(PYTEST_CI_OPTS) $(NSS_ROOT_PATH)/tests -m "not e2e and not gpu_integration and not slow and not smoke"
165165

166166
.PHONY: test-ci-slow
167167
test-ci-slow: ## Run slow tests in CI with coverage

pytest.ini

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,7 @@ markers =
2222
unit: Unit tests - test single classes/functions with no infrastructure dependencies
2323
unit_test: Legacy marker for unit tests (deprecated, use 'unit' instead)
2424
noautouse: Marker to skip autouse fixtures for specific tests
25-
smoke: Smoke tests - slow unit tests exercising training/generation hot paths with tiny model
25+
smoke: Smoke tests - quick tests exercising training/generation hot paths with tiny models
2626

2727
# Note: Unit tests (testing single classes/functions with no infrastructure dependencies)
2828
# do not need markers and are the default test type.

src/nemo_safe_synthesizer/llm/metadata.py

Lines changed: 1 addition & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -262,7 +262,7 @@ def save_metadata(self) -> None:
262262

263263
@classmethod
264264
def from_str_or_path(cls: type["ModelMetadata"], model_name_or_path: Path | str, **kwargs) -> ModelMetadata:
265-
classes = TinyLlama, Qwen, Llama32, SmolLM2, SmolLM3, Mistral, Nemotron, Granite
265+
classes = TinyLlama, Qwen, Llama32, SmolLM3, Mistral, Nemotron, Granite
266266
for class_ in classes:
267267
if str(class_.__name__).lower() in str(model_name_or_path).lower():
268268
return class_(model_name_or_path=str(model_name_or_path), **kwargs)
@@ -473,37 +473,6 @@ def __init__(
473473
)
474474

475475

476-
class SmolLM2(ModelMetadata):
477-
"""SmolLM2 models (e.g., HuggingFaceTB/SmolLM2-135M).
478-
Potentially used for testing."""
479-
480-
def __init__(
481-
self, model_name_or_path: str, tokenizer=None, rope_scaling_factor: float | None = None, **kwargs
482-
) -> None:
483-
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path) if tokenizer is None else tokenizer
484-
config = AutoConfig.from_pretrained(model_name_or_path)
485-
if rope_scaling_factor:
486-
logger.warning(
487-
f"Rope scaling factor {rope_scaling_factor} is not supported for Mistral due to longer default context lengths. Ignoring."
488-
)
489-
490-
super().__init__(
491-
autoconfig=config,
492-
instruction=DEFAULT_INSTRUCTION,
493-
prompt_config=LLMPromptConfig.from_tokenizer(
494-
template="user\n {instruction} {schema} \n assistant\n{prefill}",
495-
add_bos_token_to_prompt=False,
496-
add_eos_token_to_prompt=False,
497-
tokenizer=tokenizer,
498-
name=model_name_or_path,
499-
),
500-
model_name_or_path=model_name_or_path,
501-
rope_scaling=None,
502-
rope_parameters_location="autoconfig",
503-
**kwargs,
504-
)
505-
506-
507476
class SmolLM3(ModelMetadata):
508477
def __init__(
509478
self, model_name_or_path: str, tokenizer=None, rope_scaling_factor: float | None = None, **kwargs

tests/llm/test_metadata.py

Lines changed: 0 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -34,7 +34,6 @@
3434
Nemotron,
3535
Qwen,
3636
RopeScaling,
37-
SmolLM2,
3837
SmolLM3,
3938
TinyLlama,
4039
resolve_rope_scaling_factor,
@@ -87,7 +86,6 @@ class RopeScalingScenario:
8786
ModelDetectionScenario("tinyllama", "TinyLlama/TinyLlama-1.1B-Chat-v1.0", TinyLlama),
8887
ModelDetectionScenario("qwen", "Qwen/Qwen2-0.5B", Qwen),
8988
ModelDetectionScenario("llama32", "meta-llama/Llama32-1B", Llama32),
90-
ModelDetectionScenario("smollm2", "HuggingFaceTB/SmolLM2-135M", SmolLM2),
9189
ModelDetectionScenario("smollm3", "HuggingFaceTB/SmolLM3-3B", SmolLM3),
9290
ModelDetectionScenario("mistral", "mistralai/Mistral-7B-v0.1", Mistral),
9391
ModelDetectionScenario("nemotron", "nvidia/Nemotron-4-340B", Nemotron),
@@ -125,17 +123,6 @@ class RopeScalingScenario:
125123
expected_bos_token="<|im_start|>",
126124
expected_bos_token_id=151644,
127125
),
128-
ModelInitScenario(
129-
id="smollm2",
130-
model_class=SmolLM2,
131-
model_path="HuggingFaceTB/SmolLM2-135M",
132-
expected_template="user\n {instruction} {schema} \n assistant\n{prefill}",
133-
expected_add_bos=False,
134-
expected_add_eos=False,
135-
expected_bos_token="<|im_start|>",
136-
expected_bos_token_id=151644,
137-
custom_max_position_embeddings=8192,
138-
),
139126
ModelInitScenario(
140127
id="smollm3",
141128
model_class=SmolLM3,

tests/smoke/README.md

Lines changed: 32 additions & 146 deletions
Original file line numberDiff line numberDiff line change
@@ -1,158 +1,44 @@
1-
# Fast Smoke Tests for Training and Generation Hot Paths
1+
# Smoke Tests
22

3-
Smoke tests exercising training and generation hot paths with a tiny model.
4-
Run CPU tests without GPU (`make test-smoke`), GPU tests via `make test-gpu-integration`.
3+
Quick tests that verify training and generation code paths don't crash.
4+
They use tiny or small models and run in seconds (CPU) or a few minutes (GPU).
55

6-
## Shared Infrastructure (`conftest.py`)
7-
8-
Key fixtures and helpers available to all smoke tests:
9-
10-
- **`base_smoke_config`** -- `SafeSynthesizerParameters` with local tiny model defaults
11-
- **`train_with_sdk(config, data_df, save_path)`** -- runs `process_data().train()`, returns the `SafeSynthesizer` instance
12-
- **`assert_adapter_saved(workdir)`** -- asserts `adapter_config.json` + `*.safetensors` exist
13-
- **`_patch_attn_eager`** -- monkeypatches `attn_implementation` to `"eager"` for tiny model compatibility
14-
- **`tiny_model`** / **`stub_tokenizer`** / **`tiny_training_dataset`** -- CPU test primitives
15-
- **`local_tinyllama_dir`** -- local model directory for GPU tests (no internet needed)
16-
- **`iris_df`** / **`timeseries_df`** -- small stub datasets
17-
18-
## Design Origin
19-
20-
This test suite was organized into **self-contained work units (WUs)** that were delegated independently. WU1 and WU2 (infrastructure and fixtures) were done first. After that, WU3-WU11 were done in parallel, then consolidated in WU13.
21-
22-
## Dependency Graph and Parallel Execution Strategy
23-
24-
There are only two sequential dependencies: WU1 -> WU2 (foundation). After that, **all remaining WUs are fully independent** -- no test file reads output from another test file. Each GPU test does its own training internally.
25-
26-
```mermaid
27-
flowchart TD
28-
subgraph phase1 ["Phase 1 -- Foundation (sequential, do first)"]
29-
WU1["WU1: Infrastructure"] --> WU2["WU2: Shared Fixtures"]
30-
end
31-
32-
subgraph phase2 ["Phase 2 -- All parallelizable (no inter-dependencies)"]
33-
direction TB
34-
batchA["Batch A: WU0 README"]
35-
batchB["Batch B: WU3 CPU Training + WU4 CPU Generation"]
36-
batchC["Batch C: WU5 GPU Training + WU10 GPU Adapter Persistence"]
37-
batchD["Batch D: WU6 GPU Generation + WU8 GPU Structured Gen"]
38-
batchE["Batch E: WU7 GPU Timeseries"]
39-
batchF["Batch F: WU9 GPU Resume"]
40-
batchG["Batch G: WU11 GPU Full Pipeline + WU12 Unsloth"]
41-
end
42-
43-
subgraph phase3 ["Phase 3 -- Consolidation (fresh agent, after all Phase 2)"]
44-
WU13["WU13: DRY pass -- deduplicate, extract helpers, consolidate"]
45-
end
46-
47-
WU2 --> batchA
48-
WU2 --> batchB
49-
WU2 --> batchC
50-
WU2 --> batchD
51-
WU2 --> batchE
52-
WU2 --> batchF
53-
WU2 --> batchG
54-
55-
batchA --> WU13
56-
batchB --> WU13
57-
batchC --> WU13
58-
batchD --> WU13
59-
batchE --> WU13
60-
batchF --> WU13
61-
batchG --> WU13
6+
```bash
7+
make test-smoke # CPU only, no GPU needed
8+
make test-gpu-integration # GPU tests (requires CUDA)
629
```
6310

11+
## When should I add a smoke test?
6412

13+
If you're adding a new training backend, generation backend, or model family,
14+
add a smoke test for it. Same if you're changing how the SDK orchestrates
15+
train/generate -- those paths are easy to break silently.
6516

66-
### Recommended Delegation Batches
67-
68-
WU3-WU11 are grouped by **skill similarity** so each assignee has minimal context-switching:
69-
70-
71-
| Batch | WUs | Why grouped | Skills needed | Size |
72-
| ----------- | --------------- | ----------------------------------------------------------------- | ------------------------------------------------- | ------------------------- |
73-
| **Phase 1** | WU0 + WU1 + WU2 | Sequential foundation; one person does all setup | pytest fixtures, basic infra | ~30 min |
74-
| **B** | WU3 + WU4 | Both CPU-only, similar Trainer/generate patterns | HF Trainer, peft, Opacus, NSS assembler/processor | Medium (2 files, 7 tests) |
75-
| **C** | WU5 + WU10 | Both train via SDK then inspect adapter output | SafeSynthesizer SDK, PEFT adapter loading | Medium (2 files, 5 tests) |
76-
| **D** | WU6 + WU8 | Both exercise vLLM generation paths | VllmBackend, vLLM, structured outputs | Medium (2 files, 3 tests) |
77-
| **E** | WU7 | Specialized timeseries knowledge | TimeseriesBackend, timeseries config | Small (1 file, 1 test) |
78-
| **F** | WU9 | Specialized resume/Workdir knowledge | SafeSynthesizer resume, load_from_save_path | Small (1 file, 1 test) |
79-
| **G** | WU11 + WU12 | Both need internet + HF Hub; WU12 needs process isolation from DP | SmolLM2, Unsloth, HF Hub, Makefile update | Medium (2 files, 3 tests) |
80-
81-
82-
### Priority order (if fewer hands available)
17+
Smoke tests don't check output quality. They just make sure the code runs
18+
end-to-end without throwing. Use the smallest model that exercises the path
19+
(the local `tiny_llama` stub for most things, SmolLM2-135M when you need
20+
a real tokenizer/model).
8321

84-
### Phase 3: Consolidation (sequential, after all Phase 2 batches complete)
22+
## Things that will bite you
8523

24+
- **LoRA rank must be 8** (not 4). vLLM silently rejects rank 4. Use `lora_r=8`.
25+
- **Iris only has 151 rows**, but holdout needs >=200. Set `holdout=0, max_holdout=0` to skip it.
26+
- **Attention implementation**: HuggingFaceBackend defaults to `flashinfer`, which HF doesn't recognize. The `_patch_attn_eager` fixture overrides it to `"sdpa"`.
27+
- **Stub tokenizer vocab is 32000**. If you change the tiny model config, keep `vocab_size=32000` or you'll get shape mismatches.
28+
- **Always set `use_unsloth=False`** unless you're specifically testing Unsloth. The `auto` default can pull it in and it monkey-patches transformers globally.
29+
- **CPU tests need `optim="adamw_torch"`**. The production default (`paged_adamw_32bit`) requires bitsandbytes CUDA kernels.
30+
- **Unsloth tests run in a separate process**. Unsloth patches transformers at import time, which breaks Opacus/DP if they share a process. The Makefile handles this automatically.
8631

87-
| Batch | WU | Purpose | Owner |
88-
| ----- | ---- | ------------------------------------------------------------------------------- | -------------------------------------------------- |
89-
| **H** | WU13 | Holistic DRY pass: deduplicate configs, extract helpers, consolidate decorators | **Fresh agent** (must NOT have worked on WU3-WU12) |
32+
## What's in `conftest.py`?
9033

34+
The shared fixtures cover both CPU and GPU smoke tests. The most important ones:
9135

92-
### Priority order (if fewer hands available)
36+
- `base_smoke_config` -- default `SafeSynthesizerParameters` pointing at the local tiny model
37+
- `train_with_sdk(config, data_df, save_path)` -- convenience wrapper around the SDK train flow
38+
- `assert_adapter_saved(workdir)` -- checks that adapter files landed on disk
39+
- `_patch_attn_eager` -- the attention implementation workaround mentioned above
40+
- `tiny_model`, `stub_tokenizer`, `tiny_training_dataset` -- CPU test building blocks
41+
- `local_tinyllama_dir` -- saves the tiny model to a temp dir so GPU tests don't need internet
42+
- `iris_df`, `timeseries_df` -- small DataFrames for training input
9343

94-
If you cannot parallelize all batches, do them in this order (highest value first):
95-
96-
1. **Phase 1** (A) -- must go first
97-
2. **C** (GPU SDK training + adapter) -- highest signal for catching regressions
98-
3. **D** (GPU generation + structured) -- tests the production generation path
99-
4. **B** (CPU training + generation) -- catches dep breakage without GPU
100-
5. **E** (timeseries) -- specialized but important path
101-
6. **F** (resume) -- important production flow
102-
7. **G** (SmolLM2 + Unsloth) -- lowest priority, needs internet; WU12 also needs Makefile update for process isolation
103-
8. **H** (consolidation) -- must be last; requires fresh eyes
104-
105-
---
106-
107-
## Critical Gotchas (every WU must know these)
108-
109-
These were discovered by automated council review and affect ALL work units:
110-
111-
1. **Copyright headers**: Every new `.py` file MUST start with:
112-
```python
113-
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
114-
# SPDX-License-Identifier: Apache-2.0
115-
```
116-
Enforced by `tools/lint/copyright_fixer.py --check .` in CI.
117-
2. `**lora_r=8` not 4**: vLLM only allows LoRA ranks in {1, 8, 16, 32, 64, ...}. Rank 4 is silently rejected. Use 8 everywhere in smoke tests.
118-
3. `**holdout=0, max_holdout=0**`: The iris dataset has 151 rows, but `Holdout.train_test_split()` in `src/nemo_safe_synthesizer/holdout/holdout.py` requires >=200 rows. Setting holdout=0 bypasses this.
119-
4. `**attn_implementation="eager"**`: The `HuggingFaceBackend` defaults to `flash_attention_2` which can fail with head_dim=32 (our tiny model: hidden_size=64 / 2 heads). Override to `"eager"` in smoke tests.
120-
5. `**vocab_size=32000**`: The stub tokenizer at `tests/stub_tokenizer/` has 32000 tokens. The tiny model config must match this exactly.
121-
6. `**use_unsloth=False**`: Always set explicitly. The `auto` default may resolve to `True` and pull in Unsloth, which invasively patches transformers.
122-
7. `**optim="adamw_torch"**` for CPU tests: The production default `paged_adamw_32bit` requires bitsandbytes CUDA kernels.
123-
124-
---
125-
126-
## Summary: File Inventory
127-
128-
129-
| File | WU | Tests | Marker |
130-
| ------------------------------------------------- | ---- | ------------------ | --------------------------- |
131-
| `tests/smoke/README.md` | WU0 | -- (documentation) | -- |
132-
| `tests/smoke/__init__.py` | WU1 | -- | -- |
133-
| `tests/smoke/conftest.py` | WU2 | -- (fixtures only) | -- |
134-
| `tests/smoke/test_training_cpu.py` | WU3 | 4 | `smoke` (auto) |
135-
| `tests/smoke/test_generation_cpu.py` | WU4 | 3 | `smoke` (auto) |
136-
| `tests/smoke/test_nss_training_gpu.py` | WU5 | 2 | `smoke` + `gpu_integration` |
137-
| `tests/smoke/test_nss_generation_gpu.py` | WU6 | 2 | `smoke` + `gpu_integration` |
138-
| `tests/smoke/test_nss_timeseries_gpu.py` | WU7 | 1 | `smoke` + `gpu_integration` |
139-
| `tests/smoke/test_nss_structured_gen_gpu.py` | WU8 | 1 | `smoke` + `gpu_integration` |
140-
| `tests/smoke/test_nss_resume_gpu.py` | WU9 | 1 | `smoke` + `gpu_integration` |
141-
| `tests/smoke/test_nss_adapter_persistence_gpu.py` | WU10 | 3 | `smoke` + `gpu_integration` |
142-
| `tests/smoke/test_full_pipeline_gpu.py` | WU11 | 2 | `smoke` + `gpu_integration` |
143-
| `tests/smoke/test_nss_unsloth_gpu.py` | WU12 | 1 | `smoke` + `gpu_integration` |
144-
145-
146-
**Modified files**: `tests/conftest.py`, `pytest.ini`, `Makefile`
147-
148-
**Total**: 21 tests across 10 test files, plus 2 infra files (conftest, init) and 1 README.
149-
150-
## Running
151-
152-
```bash
153-
# CPU smoke tests only (~10 seconds, no GPU required)
154-
make test-smoke
155-
156-
# GPU smoke + e2e tests (requires CUDA)
157-
make test-gpu-integration
158-
```
44+
See [CONTRIBUTING.md](../../CONTRIBUTING.md#testing) for the full list of test commands.

0 commit comments

Comments
 (0)