test: add smoke tests for training and generation hot paths by binaryaaron · Pull Request #76 · NVIDIA-NeMo/Safe-Synthesizer

binaryaaron · 2026-02-19T17:36:22Z

Summary

Adds smoke tests exercising training, generation, evaluation, and PII replacement hot paths with tiny models. Catches dependency breakage and SDK orchestration regressions without checking output quality.

Changes beyond the test files themselves:

Pytest markers restructured: 4 categories (unit, slow, smoke, e2e) + orthogonal modifiers (requires_gpu, vllm, smollm2, unsloth). Removed 5 unused markers.
Makefile targets renamed for consistency: test-smoke (CPU), test-smoke-gpu (GPU with vLLM/SmolLM2/Unsloth process isolation).
CPU smoke tests added as a required CI job in ci-checks.yml.
Docs updated

Test plan

CPU smoke tests pass locally
GPU smoke tests pass on A100
CI workflows pass on push

kendrickb-nvidia

Should we add some guidance to CONTRIBUTING.md about when and why to use smoke tests versus the others?

When are we planning to run these (I don't see changes to CI yet)? Will we gate running other tests on smoke tests passing?

Copilot

Pull request overview

This PR adds comprehensive smoke tests to catch dependency breakage and SDK orchestration regressions in training and generation hot paths. The tests use tiny models and run quickly (CPU tests in seconds, GPU tests in ~20 minutes) without checking output quality.

Changes:

Adds 13 new smoke test files covering CPU training/generation, GPU SDK training/generation, adapter persistence, resume flows, structured generation, timeseries, and Unsloth integration
Moves SmolLM2 metadata from production code to test-only tests/smoke/conftest.py (fixes CUDA device-side asserts from out-of-range BOS token)
Restructures pytest markers: 4 categories (unit, slow, smoke, e2e) + 1 orthogonal modifier (requires_gpu)
Splits GPU CI workflow into parallel required smoke tests (~20 min) and informational e2e tests (~50 min) for faster feedback
Adds CPU smoke tests as a required CI job
Updates documentation across 11 files (README, CONTRIBUTING, AGENTS, test docs, Cursor rules, agent skills)

Reviewed changes

Copilot reviewed 31 out of 32 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
tests/smoke/conftest.py	Shared fixtures for smoke tests including SmolLM2 metadata class, tiny model fixtures, session-scoped tokenizer, and helper functions
tests/smoke/init.py	Package marker file with license header
tests/smoke/README.md	Documentation explaining when to add smoke tests, common gotchas, and fixture descriptions
tests/smoke/test_training_cpu.py	CPU training smoke tests (HF Trainer, LoRA, DP, Assembler)
tests/smoke/test_generation_cpu.py	CPU generation smoke tests (schema prompt, processor)
tests/smoke/test_nss_training_gpu.py	GPU SDK training tests (standard + DP)
tests/smoke/test_nss_generation_gpu.py	GPU generation tests (full chain + manual VllmBackend)
tests/smoke/test_nss_adapter_persistence_gpu.py	GPU adapter file checks + PEFT load verification
tests/smoke/test_nss_resume_gpu.py	GPU resume flow test (train -> save -> load -> generate)
tests/smoke/test_nss_structured_gen_gpu.py	GPU structured generation test (outlines + json_schema)
tests/smoke/test_nss_timeseries_gpu.py	GPU timeseries backend test
tests/smoke/test_nss_unsloth_gpu.py	GPU Unsloth training test (process-isolated)
tests/smoke/test_full_pipeline_gpu.py	GPU full pipeline test with SmolLM2-135M from Hub
src/nemo_safe_synthesizer/llm/metadata.py	Removes SmolLM2 class (moved to tests)
tests/conftest.py	Updates auto-marking logic for new marker structure
tests/llm/test_metadata.py	Removes SmolLM2 test scenarios
tests/e2e/test_safe_synthesizer.py	Updates markers from gpu_integration to requires_gpu
pytest.ini	Redefines markers: removes 5 old markers, adds 3 new ones
pyproject.toml	Adds setuptools>=80.0.0 dependency
uv.lock	Updates dependency lock with setuptools
Makefile	Renames test targets: test-smoke, test-smoke-gpu (replaces test-gpu-integration)
.github/workflows/gpu-tests.yml	Splits into gpu-smoke-test (required) and gpu-e2e-test (informational) jobs
.github/workflows/ci-checks.yml	Adds CPU smoke-test as required job
README.md	Updates test command documentation
CONTRIBUTING.md	Updates test command documentation and adds smoke test reference
AGENTS.md	Updates test command and marker documentation
tests/AGENTS.md	Updates test command reference
tests/TESTING.md	Comprehensive update of markers, commands, fixtures, and isolation patterns
.cursor/rules/repo-navigation.mdc	Updates marker list
.claude/commands/gpu-test.md	Updates GPU test command reference
.agent/skills/python-testing-patterns/SKILL.md	Updates marker documentation
.agent/skills/diagnose-failures/SKILL.md	Updates marker and command table

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 31 out of 32 changed files in this pull request and generated 7 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

codecov · 2026-03-17T22:00:05Z

Codecov Report

❌ Patch coverage is 45.17906% with 199 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
tests/smoke/conftest.py	48.78%	42 Missing ⚠️
tests/smoke/test_training_cpu.py	49.15%	30 Missing ⚠️
tests/smoke/test_nss_adapter_persistence_gpu.py	35.55%	29 Missing ⚠️
tests/smoke/test_nss_generation_gpu.py	28.20%	28 Missing ⚠️
tests/smoke/test_generation_cpu.py	25.92%	20 Missing ⚠️
tests/smoke/test_full_pipeline_gpu.py	39.13%	14 Missing ⚠️
tests/smoke/test_nss_resume_gpu.py	50.00%	11 Missing ⚠️
tests/smoke/test_nss_timeseries_gpu.py	50.00%	9 Missing ⚠️
tests/smoke/test_nss_structured_gen_gpu.py	56.25%	7 Missing ⚠️
tests/smoke/test_nss_training_gpu.py	66.66%	5 Missing ⚠️
... and 1 more

📢 Thoughts on this report? Let us know!

Copilot

Pull request overview

Copilot reviewed 32 out of 33 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

tests/llm/test_metadata.py:130

This removes SmolLM2 coverage from the unit-level metadata tests (no ModelDetectionScenario/ModelInitScenario for SmolLM2) even though SmolLM2 is still a production ModelMetadata subclass. Consider updating the SmolLM2 scenarios to use tokenizer-derived BOS/EOS (or otherwise avoid hardcoding an out-of-range bos_token_id) rather than dropping the unit test coverage entirely.

# Model detection scenarios - maps model paths to expected classes
MODEL_DETECTION_SCENARIOS = [
    ModelDetectionScenario("tinyllama", "TinyLlama/TinyLlama-1.1B-Chat-v1.0", TinyLlama),
    ModelDetectionScenario("qwen", "Qwen/Qwen2-0.5B", Qwen),
    ModelDetectionScenario("llama32", "meta-llama/Llama32-1B", Llama32),
    ModelDetectionScenario("smollm3", "HuggingFaceTB/SmolLM3-3B", SmolLM3),
    ModelDetectionScenario("mistral", "mistralai/Mistral-7B-Instruct-v0.3", Mistral),
    ModelDetectionScenario("nemotron", "nvidia/Nemotron-4-340B", Nemotron),
]

# Model initialization scenarios - tests each model's specific configuration
MODEL_INIT_SCENARIOS = [
    ModelInitScenario(
        id="tinyllama",
        model_class=TinyLlama,
        model_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        expected_template=PROMPT_TEMPLATE,
        expected_add_bos=True,
        expected_add_eos=True,
        expected_bos_token=None,  # Uses tokenizer default
        expected_bos_token_id=None,
    ),
    ModelInitScenario(
        id="qwen",
        model_class=Qwen,
        model_path="Qwen/Qwen2-0.5B",
        expected_template="user\n {instruction} {schema} \n assistant\n{prefill}",
        expected_add_bos=True,
        expected_add_eos=False,
        expected_bos_token=None,
        expected_bos_token_id=None,
    ),
    ModelInitScenario(
        id="llama32",
        model_class=Llama32,
        model_path="meta-llama/Llama32-1B",
        expected_template="user\n {instruction} {schema} \n assistant\n{prefill}",
        expected_add_bos=False,
        expected_add_eos=False,
        expected_bos_token="<|im_start|>",
        expected_bos_token_id=151644,
    ),
    ModelInitScenario(
        id="smollm3",
        model_class=SmolLM3,
        model_path="HuggingFaceTB/SmolLM3-3B",

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 48 out of 49 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

kendrickb-nvidia

Maybe this is a bug with github console. But I'm seeing diffs here that are from already merged PRs. Not just the smoke test changes.

Signed-off-by: aagonzales <aagonzales@nvidia.com> Signed-off-by: Aaron Gonzales <aagonzales@nvidia.com> Signed-off-by: aagonzales <aagonzales@nvidia.com>

- Replace dead `enable_replace_pii=False` with `replace_pii=None` in all `from_params()` calls (the old key was silently ignored by Pydantic) - Remove dead `enable_synthesis=True` (no such field exists in src/) - Add `vllm` pytest marker for GPU memory isolation; refactor Makefile `test-smoke-gpu` into marker-based groups (train-only / vllm / smollm2 / unsloth) - Tighten `except GenerationError` to assert on expected message - Fix SmolLM2 metadata: remove hardcoded out-of-range BOS token IDs - Promote `base_smoke_config` and `_patch_attn_eager` to session scope - Make `tiny_llama_config` depend on `stub_tokenizer` (eliminate redundant load) - Add CPU smoke tests for evaluation (MultimodalReport) and PII replacement (NemoPII) - Update smoke test docs (README.md, TESTING.md) to reflect new markers and scopes Made-with: Cursor Signed-off-by: aagonzales <aagonzales@nvidia.com>

- ci-checks.yml: fix echo alignment in status check job - TESTING.md: use `uv run --frozen pytest` instead of bare `pytest` - test_training_cpu.py: remove unused `fixture_stub_tokenizer_path` param - Makefile: use PYTEST_NO_XDIST_CMD instead of appending -n 0 to PYTEST_CMD; fix test-gpu-integration help text to match its actual recipe (e2e only) Made-with: Cursor Signed-off-by: aagonzales <aagonzales@nvidia.com>

Signed-off-by: aagonzales <aagonzales@nvidia.com>

Copilot

Pull request overview

Copilot reviewed 32 out of 36 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-20T06:05:57Z

+    decoded = stub_tokenizer.decode(output[0], skip_special_tokens=True)
+    assert len(decoded) > len(prompt)  # model produced something


test_generate_with_schema_prompt asserts len(decoded) > len(prompt) after decoding with skip_special_tokens=True. With a randomly initialized model and greedy decoding, it's possible for generation to immediately emit only special tokens (e.g., EOS) so the decoded text matches the prompt, making this test flaky. Prefer asserting on tensor lengths (e.g., output.shape[-1] > input_ids.shape[-1]) or simply that generate() returns without error and the output shape is valid.

Suggested change

decoded = stub_tokenizer.decode(output[0], skip_special_tokens=True)

assert len(decoded) > len(prompt) # model produced something

# Ensure generate() returns a tensor with valid shape (same batch, no shorter sequence)

assert output is not None

assert output.shape[0] == input_ids.shape[0]

assert output.shape[1] >= input_ids.shape[1]

Copilot · 2026-03-20T06:05:58Z

+        rope_scaling_factor=1,
+    )
+    mock_metadata = MagicMock()
+    mock_metadata.prompt_config.prompt_template = PROMPT_TEMPLATE


ModelMetadata.prompt_config uses the field name template (see LLMPromptConfig.template), but this test sets mock_metadata.prompt_config.prompt_template. Even though the current create_processor() path may not read it, keeping the mock aligned with the real metadata interface will prevent silent breakage if create_processor() starts using the prompt template in more modes.

Suggested change

mock_metadata.prompt_config.prompt_template = PROMPT_TEMPLATE

mock_metadata.prompt_config.template = PROMPT_TEMPLATE

mckornfield

marks are a bit copy pasta but probably more readable, mostly talking to myself in reviews, lg

mckornfield · 2026-03-20T15:40:31Z

+2. vLLM generation (`vllm` marker): each file gets its own process because vLLM pre-allocates all GPU memory and never releases it.
+3. SmolLM2 / Unsloth (`smollm2`, `unsloth` markers): each gets its own process, auto-discovered via markers.
+
+When adding a new GPU smoke test, add the appropriate markers to `pytestmark`:


are all of them appropriate? does it make sense to just add one pytest_add_gpu_markers() fn or nah?

## Summary the smoke tests pr #76 uses smollm2 for testing. during one of many rebases and changes to that branch, somewhere along the way we lost the following change. - `SmolLM2` metadata hardcoded `bos_token_id=151644` (copied from the `Llama32` class), but all SmolLM2 models have a 49152-token vocab where `<|im_start|>`; token id `151644` caused a CUDA `vectorized_gather_kernel` oob during tests. - Fix: look up the actual `<|im_start|>` token ID from the tokenizer via `convert_tokens_to_ids` at init time instead of hardcoding it. - Also corrects stale `generation/AGENTS.md` that said `ignore_eos=True` (now always `False` after #265). ## Test plan - [x] `pytest tests/llm/` -- 52 passed - [x] `pytest tests/smoke/test_full_pipeline_gpu.py::test_full_pipeline_smollm2` -- passes (was CUDA crash before) Signed-off-by: aagonzales <aagonzales@nvidia.com>

kendrickb-nvidia reviewed Feb 19, 2026

View reviewed changes

kendrickb-nvidia mentioned this pull request Feb 20, 2026

fix: replace faiss with rapids #66

Closed

6 tasks

binaryaaron force-pushed the binaryaaron/add-smoke-tests branch 2 times, most recently from 0280ed2 to 8453877 Compare February 20, 2026 22:16

binaryaaron marked this pull request as ready for review February 20, 2026 22:21

binaryaaron requested review from a team as code owners February 20, 2026 22:21

binaryaaron mentioned this pull request Feb 20, 2026

refactor: rename pytest markers to use requires_ prefix #82

Closed

binaryaaron changed the title ~~feat: add smoke tests~~ feat(test): add smoke tests for training and generation hot paths Feb 20, 2026

binaryaaron force-pushed the binaryaaron/add-smoke-tests branch from 8453877 to 482412e Compare February 24, 2026 15:01

binaryaaron self-assigned this Feb 24, 2026

binaryaaron changed the title ~~feat(test): add smoke tests for training and generation hot paths~~ test: add smoke tests for training and generation hot paths Feb 24, 2026

Copilot AI review requested due to automatic review settings February 24, 2026 17:47

Copilot started reviewing on behalf of binaryaaron February 24, 2026 17:48 View session

Copilot AI reviewed Feb 24, 2026

View reviewed changes

Comment thread Makefile Outdated

Comment thread tests/smoke/conftest.py

Comment thread .github/workflows/gpu-tests.yml

Comment thread pyproject.toml Outdated

Comment thread .github/workflows/ci-checks.yml Outdated

Comment thread pyproject.toml

binaryaaron force-pushed the binaryaaron/add-smoke-tests branch 2 times, most recently from b4ae90d to fb8129b Compare March 10, 2026 14:34

binaryaaron requested review from Copilot and kendrickb-nvidia March 10, 2026 20:52

Copilot started reviewing on behalf of binaryaaron March 10, 2026 20:53 View session

Copilot AI reviewed Mar 10, 2026

View reviewed changes

kendrickb-nvidia previously approved these changes Mar 13, 2026

View reviewed changes

Comment thread .github/workflows/ci-checks.yml

Comment thread .github/workflows/ci-checks.yml

Comment thread docs/user-guide/troubleshooting.md

Comment thread tests/smoke/conftest.py Outdated

binaryaaron added task area:dev-ex Affects build or dev experience labels Mar 17, 2026

binaryaaron dismissed kendrickb-nvidia’s stale review via 9137c3e March 17, 2026 20:13

binaryaaron force-pushed the binaryaaron/add-smoke-tests branch from fb8129b to 9137c3e Compare March 17, 2026 20:13

Copilot AI review requested due to automatic review settings March 18, 2026 22:17

Copilot AI reviewed Mar 18, 2026

View reviewed changes

Comment thread Makefile Outdated

Comment thread tests/smoke/test_training_cpu.py Outdated

Comment thread src/nemo_safe_synthesizer/llm/metadata.py

Comment thread .github/workflows/ci-checks.yml

Comment thread .github/workflows/gpu-tests.yml

Comment thread Makefile Outdated

Copilot AI review requested due to automatic review settings March 19, 2026 04:46

binaryaaron force-pushed the binaryaaron/add-smoke-tests branch 2 times, most recently from 4e53f6b to bb69311 Compare March 19, 2026 04:50

binaryaaron requested review from kendrickb-nvidia and mckornfield March 19, 2026 04:50

Copilot AI reviewed Mar 19, 2026

View reviewed changes

kendrickb-nvidia reviewed Mar 19, 2026

View reviewed changes

Comment thread .github/workflows/ci-checks.yml

kendrickb-nvidia mentioned this pull request Mar 19, 2026

chore: update ty and fix all type issues #141

Merged

7 tasks

binaryaaron added 8 commits March 19, 2026 12:54

feature: smoke tests

93a5b0c

Signed-off-by: aagonzales <aagonzales@nvidia.com> Signed-off-by: Aaron Gonzales <aagonzales@nvidia.com> Signed-off-by: aagonzales <aagonzales@nvidia.com>

chore: update pytest markers

725c153

Signed-off-by: aagonzales <aagonzales@nvidia.com> Signed-off-by: Aaron Gonzales <aagonzales@nvidia.com> Signed-off-by: aagonzales <aagonzales@nvidia.com>

chore: make gpu smoke tests run more frequently

5b50906

Signed-off-by: aagonzales <aagonzales@nvidia.com> Signed-off-by: Aaron Gonzales <aagonzales@nvidia.com> Signed-off-by: aagonzales <aagonzales@nvidia.com>

fix: adding matrix back

f7d1263

Signed-off-by: aagonzales <aagonzales@nvidia.com> Signed-off-by: Aaron Gonzales <aagonzales@nvidia.com> Signed-off-by: aagonzales <aagonzales@nvidia.com>

fix: bad arg

febb84c

Signed-off-by: aagonzales <aagonzales@nvidia.com>

docs: update a few stragglers

4e3a6a2

Signed-off-by: aagonzales <aagonzales@nvidia.com>

binaryaaron force-pushed the binaryaaron/add-smoke-tests branch from bb69311 to 4e3a6a2 Compare March 19, 2026 20:25

binaryaaron requested a review from Copilot March 20, 2026 06:00

Copilot started reviewing on behalf of binaryaaron March 20, 2026 06:01 View session

Copilot AI reviewed Mar 20, 2026

View reviewed changes

mckornfield approved these changes Mar 20, 2026

View reviewed changes

kendrickb-nvidia approved these changes Mar 20, 2026

View reviewed changes

binaryaaron merged commit c757976 into main Mar 23, 2026
17 checks passed

binaryaaron deleted the binaryaaron/add-smoke-tests branch March 23, 2026 21:32

binaryaaron mentioned this pull request Mar 24, 2026

fix(llm): resolve SmolLM2 bos_token_id out-of-range CUDA crash #286

Merged

2 tasks

		decoded = stub_tokenizer.decode(output[0], skip_special_tokens=True)
		assert len(decoded) > len(prompt) # model produced something

-    decoded = stub_tokenizer.decode(output[0], skip_special_tokens=True)
-    assert len(decoded) > len(prompt)  # model produced something
+    # Ensure generate() returns a tensor with valid shape (same batch, no shorter sequence)
+    assert output is not None
+    assert output.shape[0] == input_ids.shape[0]
+    assert output.shape[1] >= input_ids.shape[1]

	mock_metadata.prompt_config.prompt_template = PROMPT_TEMPLATE
	mock_metadata.prompt_config.template = PROMPT_TEMPLATE

Conversation

binaryaaron commented Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

kendrickb-nvidia left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Mar 17, 2026

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

kendrickb-nvidia left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

mckornfield left a comment

Choose a reason for hiding this comment

Uh oh!

mckornfield Mar 20, 2026

binaryaaron commented Feb 19, 2026 •

edited

Loading