Skip to content

test: add smoke tests for training and generation hot paths#76

Merged
binaryaaron merged 8 commits into
mainfrom
binaryaaron/add-smoke-tests
Mar 23, 2026
Merged

test: add smoke tests for training and generation hot paths#76
binaryaaron merged 8 commits into
mainfrom
binaryaaron/add-smoke-tests

Conversation

@binaryaaron

@binaryaaron binaryaaron commented Feb 19, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds smoke tests exercising training, generation, evaluation, and PII replacement hot paths with tiny models. Catches dependency breakage and SDK orchestration regressions without checking output quality.

Changes beyond the test files themselves:

  • Pytest markers restructured: 4 categories (unit, slow, smoke, e2e) + orthogonal modifiers (requires_gpu, vllm, smollm2, unsloth). Removed 5 unused markers.
  • Makefile targets renamed for consistency: test-smoke (CPU), test-smoke-gpu (GPU with vLLM/SmolLM2/Unsloth process isolation).
  • CPU smoke tests added as a required CI job in ci-checks.yml.
  • Docs updated

Test plan

  • CPU smoke tests pass locally
  • GPU smoke tests pass on A100
  • CI workflows pass on push

@kendrickb-nvidia kendrickb-nvidia left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add some guidance to CONTRIBUTING.md about when and why to use smoke tests versus the others?

When are we planning to run these (I don't see changes to CI yet)? Will we gate running other tests on smoke tests passing?

Comment thread src/nemo_safe_synthesizer/llm/metadata.py
Comment thread tests/smoke/conftest.py Outdated
Comment thread tests/smoke/conftest.py
Comment thread tests/smoke/conftest.py Outdated
Comment thread tests/smoke/README.md Outdated
Comment thread tests/smoke/test_full_pipeline_gpu.py Outdated
Comment thread tests/smoke/test_generation_cpu.py Outdated
Comment thread tests/smoke/test_nss_generation_gpu.py Outdated
Comment thread tests/smoke/test_nss_unsloth_gpu.py
Comment thread tests/smoke/README.md Outdated
@binaryaaron binaryaaron force-pushed the binaryaaron/add-smoke-tests branch 2 times, most recently from 0280ed2 to 8453877 Compare February 20, 2026 22:16
@binaryaaron binaryaaron marked this pull request as ready for review February 20, 2026 22:21
@binaryaaron binaryaaron requested review from a team as code owners February 20, 2026 22:21
@binaryaaron binaryaaron changed the title feat: add smoke tests feat(test): add smoke tests for training and generation hot paths Feb 20, 2026
@binaryaaron binaryaaron force-pushed the binaryaaron/add-smoke-tests branch from 8453877 to 482412e Compare February 24, 2026 15:01
@binaryaaron binaryaaron self-assigned this Feb 24, 2026
@binaryaaron binaryaaron changed the title feat(test): add smoke tests for training and generation hot paths test: add smoke tests for training and generation hot paths Feb 24, 2026
Copilot AI review requested due to automatic review settings February 24, 2026 17:47

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds comprehensive smoke tests to catch dependency breakage and SDK orchestration regressions in training and generation hot paths. The tests use tiny models and run quickly (CPU tests in seconds, GPU tests in ~20 minutes) without checking output quality.

Changes:

  • Adds 13 new smoke test files covering CPU training/generation, GPU SDK training/generation, adapter persistence, resume flows, structured generation, timeseries, and Unsloth integration
  • Moves SmolLM2 metadata from production code to test-only tests/smoke/conftest.py (fixes CUDA device-side asserts from out-of-range BOS token)
  • Restructures pytest markers: 4 categories (unit, slow, smoke, e2e) + 1 orthogonal modifier (requires_gpu)
  • Splits GPU CI workflow into parallel required smoke tests (~20 min) and informational e2e tests (~50 min) for faster feedback
  • Adds CPU smoke tests as a required CI job
  • Updates documentation across 11 files (README, CONTRIBUTING, AGENTS, test docs, Cursor rules, agent skills)

Reviewed changes

Copilot reviewed 31 out of 32 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/smoke/conftest.py Shared fixtures for smoke tests including SmolLM2 metadata class, tiny model fixtures, session-scoped tokenizer, and helper functions
tests/smoke/init.py Package marker file with license header
tests/smoke/README.md Documentation explaining when to add smoke tests, common gotchas, and fixture descriptions
tests/smoke/test_training_cpu.py CPU training smoke tests (HF Trainer, LoRA, DP, Assembler)
tests/smoke/test_generation_cpu.py CPU generation smoke tests (schema prompt, processor)
tests/smoke/test_nss_training_gpu.py GPU SDK training tests (standard + DP)
tests/smoke/test_nss_generation_gpu.py GPU generation tests (full chain + manual VllmBackend)
tests/smoke/test_nss_adapter_persistence_gpu.py GPU adapter file checks + PEFT load verification
tests/smoke/test_nss_resume_gpu.py GPU resume flow test (train -> save -> load -> generate)
tests/smoke/test_nss_structured_gen_gpu.py GPU structured generation test (outlines + json_schema)
tests/smoke/test_nss_timeseries_gpu.py GPU timeseries backend test
tests/smoke/test_nss_unsloth_gpu.py GPU Unsloth training test (process-isolated)
tests/smoke/test_full_pipeline_gpu.py GPU full pipeline test with SmolLM2-135M from Hub
src/nemo_safe_synthesizer/llm/metadata.py Removes SmolLM2 class (moved to tests)
tests/conftest.py Updates auto-marking logic for new marker structure
tests/llm/test_metadata.py Removes SmolLM2 test scenarios
tests/e2e/test_safe_synthesizer.py Updates markers from gpu_integration to requires_gpu
pytest.ini Redefines markers: removes 5 old markers, adds 3 new ones
pyproject.toml Adds setuptools>=80.0.0 dependency
uv.lock Updates dependency lock with setuptools
Makefile Renames test targets: test-smoke, test-smoke-gpu (replaces test-gpu-integration)
.github/workflows/gpu-tests.yml Splits into gpu-smoke-test (required) and gpu-e2e-test (informational) jobs
.github/workflows/ci-checks.yml Adds CPU smoke-test as required job
README.md Updates test command documentation
CONTRIBUTING.md Updates test command documentation and adds smoke test reference
AGENTS.md Updates test command and marker documentation
tests/AGENTS.md Updates test command reference
tests/TESTING.md Comprehensive update of markers, commands, fixtures, and isolation patterns
.cursor/rules/repo-navigation.mdc Updates marker list
.claude/commands/gpu-test.md Updates GPU test command reference
.agent/skills/python-testing-patterns/SKILL.md Updates marker documentation
.agent/skills/diagnose-failures/SKILL.md Updates marker and command table

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread Makefile Outdated
Comment thread tests/smoke/conftest.py
Comment thread .github/workflows/gpu-tests.yml
Comment thread pyproject.toml Outdated
Comment thread .github/workflows/ci-checks.yml Outdated
Comment thread pyproject.toml
@binaryaaron binaryaaron force-pushed the binaryaaron/add-smoke-tests branch 2 times, most recently from b4ae90d to fb8129b Compare March 10, 2026 14:34

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 31 out of 32 changed files in this pull request and generated 7 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/TESTING.md Outdated
Comment thread tests/TESTING.md Outdated
Comment thread Makefile Outdated
Comment thread tests/smoke/test_full_pipeline_gpu.py
Comment thread tests/smoke/test_nss_unsloth_gpu.py
Comment thread tests/smoke/test_nss_unsloth_gpu.py Outdated
Comment thread tests/smoke/README.md Outdated
Comment thread .github/workflows/ci-checks.yml
Comment thread .github/workflows/ci-checks.yml
Comment thread docs/user-guide/troubleshooting.md
Comment thread tests/smoke/conftest.py Outdated
@binaryaaron binaryaaron added task area:dev-ex Affects build or dev experience labels Mar 17, 2026
@binaryaaron binaryaaron force-pushed the binaryaaron/add-smoke-tests branch from fb8129b to 9137c3e Compare March 17, 2026 20:13
Copilot AI review requested due to automatic review settings March 18, 2026 22:17

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 32 out of 33 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

tests/llm/test_metadata.py:130

  • This removes SmolLM2 coverage from the unit-level metadata tests (no ModelDetectionScenario/ModelInitScenario for SmolLM2) even though SmolLM2 is still a production ModelMetadata subclass. Consider updating the SmolLM2 scenarios to use tokenizer-derived BOS/EOS (or otherwise avoid hardcoding an out-of-range bos_token_id) rather than dropping the unit test coverage entirely.
# Model detection scenarios - maps model paths to expected classes
MODEL_DETECTION_SCENARIOS = [
    ModelDetectionScenario("tinyllama", "TinyLlama/TinyLlama-1.1B-Chat-v1.0", TinyLlama),
    ModelDetectionScenario("qwen", "Qwen/Qwen2-0.5B", Qwen),
    ModelDetectionScenario("llama32", "meta-llama/Llama32-1B", Llama32),
    ModelDetectionScenario("smollm3", "HuggingFaceTB/SmolLM3-3B", SmolLM3),
    ModelDetectionScenario("mistral", "mistralai/Mistral-7B-Instruct-v0.3", Mistral),
    ModelDetectionScenario("nemotron", "nvidia/Nemotron-4-340B", Nemotron),
]

# Model initialization scenarios - tests each model's specific configuration
MODEL_INIT_SCENARIOS = [
    ModelInitScenario(
        id="tinyllama",
        model_class=TinyLlama,
        model_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        expected_template=PROMPT_TEMPLATE,
        expected_add_bos=True,
        expected_add_eos=True,
        expected_bos_token=None,  # Uses tokenizer default
        expected_bos_token_id=None,
    ),
    ModelInitScenario(
        id="qwen",
        model_class=Qwen,
        model_path="Qwen/Qwen2-0.5B",
        expected_template="user\n {instruction} {schema} \n assistant\n{prefill}",
        expected_add_bos=True,
        expected_add_eos=False,
        expected_bos_token=None,
        expected_bos_token_id=None,
    ),
    ModelInitScenario(
        id="llama32",
        model_class=Llama32,
        model_path="meta-llama/Llama32-1B",
        expected_template="user\n {instruction} {schema} \n assistant\n{prefill}",
        expected_add_bos=False,
        expected_add_eos=False,
        expected_bos_token="<|im_start|>",
        expected_bos_token_id=151644,
    ),
    ModelInitScenario(
        id="smollm3",
        model_class=SmolLM3,
        model_path="HuggingFaceTB/SmolLM3-3B",

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread Makefile Outdated
Comment thread tests/smoke/test_training_cpu.py Outdated
Comment thread src/nemo_safe_synthesizer/llm/metadata.py
Comment thread .github/workflows/ci-checks.yml
Comment thread .github/workflows/gpu-tests.yml
Comment thread Makefile Outdated
Copilot AI review requested due to automatic review settings March 19, 2026 04:46
@binaryaaron binaryaaron force-pushed the binaryaaron/add-smoke-tests branch 2 times, most recently from 4e53f6b to bb69311 Compare March 19, 2026 04:50

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 48 out of 49 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@kendrickb-nvidia kendrickb-nvidia left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is a bug with github console. But I'm seeing diffs here that are from already merged PRs. Not just the smoke test changes.

Comment thread .github/workflows/ci-checks.yml
Signed-off-by: aagonzales <aagonzales@nvidia.com>

Signed-off-by: Aaron Gonzales <aagonzales@nvidia.com>
Signed-off-by: aagonzales <aagonzales@nvidia.com>
Signed-off-by: aagonzales <aagonzales@nvidia.com>
Signed-off-by: Aaron Gonzales <aagonzales@nvidia.com>
Signed-off-by: aagonzales <aagonzales@nvidia.com>
Signed-off-by: aagonzales <aagonzales@nvidia.com>
Signed-off-by: Aaron Gonzales <aagonzales@nvidia.com>
Signed-off-by: aagonzales <aagonzales@nvidia.com>
Signed-off-by: aagonzales <aagonzales@nvidia.com>
Signed-off-by: Aaron Gonzales <aagonzales@nvidia.com>
Signed-off-by: aagonzales <aagonzales@nvidia.com>
- Replace dead `enable_replace_pii=False` with `replace_pii=None` in
  all `from_params()` calls (the old key was silently ignored by Pydantic)
- Remove dead `enable_synthesis=True` (no such field exists in src/)
- Add `vllm` pytest marker for GPU memory isolation; refactor Makefile
  `test-smoke-gpu` into marker-based groups (train-only / vllm / smollm2 / unsloth)
- Tighten `except GenerationError` to assert on expected message
- Fix SmolLM2 metadata: remove hardcoded out-of-range BOS token IDs
- Promote `base_smoke_config` and `_patch_attn_eager` to session scope
- Make `tiny_llama_config` depend on `stub_tokenizer` (eliminate redundant load)
- Add CPU smoke tests for evaluation (MultimodalReport) and PII replacement (NemoPII)
- Update smoke test docs (README.md, TESTING.md) to reflect new markers and scopes

Made-with: Cursor
Signed-off-by: aagonzales <aagonzales@nvidia.com>
- ci-checks.yml: fix echo alignment in status check job
- TESTING.md: use `uv run --frozen pytest` instead of bare `pytest`
- test_training_cpu.py: remove unused `fixture_stub_tokenizer_path` param
- Makefile: use PYTEST_NO_XDIST_CMD instead of appending -n 0 to PYTEST_CMD;
  fix test-gpu-integration help text to match its actual recipe (e2e only)

Made-with: Cursor
Signed-off-by: aagonzales <aagonzales@nvidia.com>
Signed-off-by: aagonzales <aagonzales@nvidia.com>
Signed-off-by: aagonzales <aagonzales@nvidia.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 32 out of 36 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +27 to +28
decoded = stub_tokenizer.decode(output[0], skip_special_tokens=True)
assert len(decoded) > len(prompt) # model produced something

Copilot AI Mar 20, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_generate_with_schema_prompt asserts len(decoded) > len(prompt) after decoding with skip_special_tokens=True. With a randomly initialized model and greedy decoding, it's possible for generation to immediately emit only special tokens (e.g., EOS) so the decoded text matches the prompt, making this test flaky. Prefer asserting on tensor lengths (e.g., output.shape[-1] > input_ids.shape[-1]) or simply that generate() returns without error and the output shape is valid.

Suggested change
decoded = stub_tokenizer.decode(output[0], skip_special_tokens=True)
assert len(decoded) > len(prompt) # model produced something
# Ensure generate() returns a tensor with valid shape (same batch, no shorter sequence)
assert output is not None
assert output.shape[0] == input_ids.shape[0]
assert output.shape[1] >= input_ids.shape[1]

Copilot uses AI. Check for mistakes.
rope_scaling_factor=1,
)
mock_metadata = MagicMock()
mock_metadata.prompt_config.prompt_template = PROMPT_TEMPLATE

Copilot AI Mar 20, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ModelMetadata.prompt_config uses the field name template (see LLMPromptConfig.template), but this test sets mock_metadata.prompt_config.prompt_template. Even though the current create_processor() path may not read it, keeping the mock aligned with the real metadata interface will prevent silent breakage if create_processor() starts using the prompt template in more modes.

Suggested change
mock_metadata.prompt_config.prompt_template = PROMPT_TEMPLATE
mock_metadata.prompt_config.template = PROMPT_TEMPLATE

Copilot uses AI. Check for mistakes.

@mckornfield mckornfield left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

marks are a bit copy pasta but probably more readable, mostly talking to myself in reviews, lg

Comment thread tests/smoke/README.md
2. vLLM generation (`vllm` marker): each file gets its own process because vLLM pre-allocates all GPU memory and never releases it.
3. SmolLM2 / Unsloth (`smollm2`, `unsloth` markers): each gets its own process, auto-discovered via markers.

When adding a new GPU smoke test, add the appropriate markers to `pytestmark`:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are all of them appropriate? does it make sense to just add one pytest_add_gpu_markers() fn or nah?

Comment thread tests/smoke/README.md
Comment thread tests/smoke/test_full_pipeline_gpu.py
Comment thread tests/TESTING.md
Comment thread Makefile
@binaryaaron binaryaaron merged commit c757976 into main Mar 23, 2026
17 checks passed
@binaryaaron binaryaaron deleted the binaryaaron/add-smoke-tests branch March 23, 2026 21:32
binaryaaron added a commit that referenced this pull request Mar 24, 2026
## Summary

the smoke tests pr #76 uses smollm2 for testing. during one of many
rebases and changes to that branch, somewhere along the way we lost the
following change.

- `SmolLM2` metadata hardcoded `bos_token_id=151644` (copied from the
`Llama32` class), but all SmolLM2 models have a 49152-token vocab where
`<|im_start|>`; token id `151644` caused a CUDA
`vectorized_gather_kernel` oob during tests.
- Fix: look up the actual `<|im_start|>` token ID from the tokenizer via
`convert_tokens_to_ids` at init time instead of hardcoding it.
- Also corrects stale `generation/AGENTS.md` that said `ignore_eos=True`
(now always `False` after #265).



## Test plan

- [x] `pytest tests/llm/` -- 52 passed
- [x] `pytest
tests/smoke/test_full_pipeline_gpu.py::test_full_pipeline_smollm2` --
passes (was CUDA crash before)

Signed-off-by: aagonzales <aagonzales@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dev-ex Affects build or dev experience

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants