Commit 38f86b2

binaryaaron, cursoragent, and kendrickb-nvidia authored
feat: add preflight validation workflow and centralize shared validators (#406)
`tooling/` is where the evaluation-report renderer will eventually consolidate and where agent-friendly JSON and plain-text modes will land. Keeping preflight rendering-free means render swaps never reach into check code.

## UX

`safe-synthesizer run --validate ...` prints a Rich report grouped by stage, with per-check status, severity-tagged issues, and a terminal summary. Errors set exit code 1. Warnings are advisory. See `docs/user-guide/running.md` and `docs/user-guide/troubleshooting.md`.

## Extending preflight

`docs/developer-guide/preflight-plugins.md` walks two end-to-end worked examples (`DuplicateRowsCheck`, `GroupSkewCheck`), covers the `requires` / `enabled()` dependency model and issue-code conventions, and includes a "Testing a check" section.

## Tests

- `tests/preflight/test_preflight.py`: per-check unit coverage.
- `tests/preflight/test_plugin_registration.py`: plugin surface, reserved namespaces, crash isolation, dependency gating.
- `tests/preflight/test_registry_validation.py`: `build_registry` and `_validate_registry` shape invariants.
- `tests/preflight/test_preflight_cli.py`: smoke coverage for the `--validate` path.
- `tests/cli/test_run.py`: integration with `safe-synthesizer run`.

## Follow-ups

- `safe-synthesizer doctor`: a data-independent environment health command that reuses the environment and config-only subset of the registry. Plan drafted, implementation deferred.
- Plain, JSON, and agent-friendly render modes via `tooling.modes.RenderMode`.

closes #332.

---------

Signed-off-by: Aaron Gonzales <aagonzales@nvidia.com>
Signed-off-by: Kendrick Boyd <kendrickb@nvidia.com>
Signed-off-by: aaron gonzales <aagonzales@nvidia.com>
Signed-off-by: aagonzales <aagonzales@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Kendrick Boyd <kendrickb@nvidia.com>
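The `requires` / `enabled()` dependency model, crash isolation, and the "errors set exit code 1, warnings are advisory" rule described above can be illustrated with a small, self-contained sketch. This is not the code in `preflight/`: `Check`, `Issue`, `CheckResult`, `run_checks`, `exit_code`, the issue codes, and the status strings are all hypothetical stand-ins. It also assumes that a crashed check counts as an error and that a check with only warnings still satisfies its dependents, neither of which the commit message confirms.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    WARNING = "warning"
    ERROR = "error"


@dataclass
class Issue:
    code: str
    severity: Severity
    message: str


@dataclass
class CheckResult:
    name: str
    status: str  # "passed", "failed", "skipped", or "crashed"
    issues: list[Issue] = field(default_factory=list)


class Check:
    """Hypothetical stand-in for a preflight check (not the real PreflightCheck ABC)."""

    name = "check"
    requires: tuple[str, ...] = ()  # names of checks that must pass first

    def enabled(self, context: dict) -> bool:
        return True

    def run(self, context: dict) -> list[Issue]:
        raise NotImplementedError


class ColumnsPresentCheck(Check):
    name = "columns_present"

    def run(self, context: dict) -> list[Issue]:
        first_row = context["rows"][0]  # raises IndexError on empty input
        missing = [c for c in context["required_columns"] if c not in first_row]
        if missing:
            return [Issue("data.missing_columns", Severity.ERROR, f"missing: {missing}")]
        return []


class DuplicateRowsCheck(Check):
    name = "duplicate_rows"
    requires = ("columns_present",)

    def run(self, context: dict) -> list[Issue]:
        rows = [tuple(sorted(r.items())) for r in context["rows"]]
        dupes = len(rows) - len(set(rows))
        if dupes:
            return [Issue("data.duplicate_rows", Severity.WARNING, f"{dupes} duplicate row(s)")]
        return []


def run_checks(checks: list[Check], context: dict) -> list[CheckResult]:
    """Run checks in the given order (assumed to already respect dependencies)."""
    results: dict[str, CheckResult] = {}
    for check in checks:
        # Dependency gating: skip unless every required check passed.
        deps_ok = all(
            dep in results and results[dep].status == "passed" for dep in check.requires
        )
        if not deps_ok or not check.enabled(context):
            results[check.name] = CheckResult(check.name, "skipped")
            continue
        try:
            issues = check.run(context)
        except Exception as exc:  # crash isolation: one broken check must not abort the run
            crash = Issue("internal.check_crashed", Severity.ERROR, repr(exc))
            results[check.name] = CheckResult(check.name, "crashed", [crash])
            continue
        failed = any(i.severity is Severity.ERROR for i in issues)
        results[check.name] = CheckResult(check.name, "failed" if failed else "passed", issues)
    return list(results.values())


def exit_code(results: list[CheckResult]) -> int:
    """Errors set exit code 1; warnings alone leave it at 0."""
    has_error = any(i.severity is Severity.ERROR for r in results for i in r.issues)
    return 1 if has_error else 0


context = {"rows": [{"id": 1, "x": "a"}, {"id": 1, "x": "a"}], "required_columns": ["id", "x"]}
results = run_checks([ColumnsPresentCheck(), DuplicateRowsCheck()], context)
code = exit_code(results)
```

In this sketch the duplicate-row warning is reported but leaves `code` at 0, while a missing column fails `columns_present`, causes `duplicate_rows` to be skipped, and yields exit code 1.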
1 parent bf4754e commit 38f86b2

64 files changed

Lines changed: 5853 additions & 297 deletions


.gitignore

Lines changed: 0 additions & 3 deletions
@@ -73,8 +73,5 @@ docs/**/_snippets/**/*.tgz
 # this is sometimes used by wandb as a local path.
 wandb/
 
-# output from unsloth defaults to the local path
-unsloth_compiled_cache/
-
 # symbolic link to the nmp repo folder for uv dependency resolution.
 .nmp_repo

AGENTS.md

Lines changed: 4 additions & 2 deletions
@@ -48,20 +48,22 @@ Source code lives in `src/nemo_safe_synthesizer/`:
 | `cli/` | Click CLI, main entry point |
 | `config/` | Pydantic parameter models, SafeSynthesizerParameters |
 | `configurator/` | Pydantic-to-Click mapping, Parameter types, validators |
-| `data_processing/` | Holdout, actions, assembler, records |
+| `data_processing/` | Holdout, actions, assembler, records, shared token budget (`budget.py`), shared column validators (`validation.py`) |
 | `evaluation/` | Evaluator, components (privacy, MI, AIA, PII replay), reports |
 | `generation/` | GeneratorBackend, VllmBackend, regex manager, batch gen |
 | `holdout/` | Train/test splitting |
 | `llm/` | Model loading, metadata, memory management |
 | `pii_replacer/` | NER-based PII detection and replacement |
 | `privacy/` | DP transformers (Opacus integration) |
 | `sdk/` | SafeSynthesizer builder, library_builder |
-| `training/` | TrainingBackend, HuggingFace backend |
+| `training/` | TrainingBackend, HuggingFace backend, timeseries_preprocessing (`timeseries_preprocessing.py`) |
 | `artifacts/` | Data quality checks, field analysis, metadata |
 | `observability.py` | CategoryLogger, TracedContext, structured logging |
 | `errors.py` | Error hierarchy: `SafeSynthesizerError` → `UserError` (`DataError`/`ParameterError` are also `ValueError`; `GenerationError` is also `RuntimeError`) and `InternalError` (also `RuntimeError`). See `diagnose-failures` skill |
 | `defaults.py` | Default settings, constants (`DEFAULT_ARTIFACTS_PATH`, `PSEUDO_GROUP_COLUMN`) |
 | `package_info.py` | Package version (uv-dynamic-versioning) |
+| `preflight/` | Pre-flight validation (runs against the training split produced by `Holdout`, not the full input). Package layout: `types` (dataclasses), `base` (`PreflightCheck` ABC hierarchy — `ConfigCheck`/`DataFrameCheck`/`MetadataCheck`/`AdvisoryCheck`), `registry` (`get_registry() -> PreflightRegistry`, plugin registration), `orchestrator` (`run_preflight`, `_run_registry` with dependency gating), `checks/` (15 granular core checks grouped by stage: `environment.py` for CONFIG, `dataframe.py` for DATAFRAME, `metadata.py` for METADATA, `advisory.py` for ADVISORY, plus `_helpers.py` shared helpers and public `preflight.helpers` for plugin authors). Rendering-free by design. |
+| `tooling/` | Internal rendering layer. Hosts `render_preflight_report` (Rich today; agentic/plain/JSON modes planned via `RenderMode`), `PreflightRenderContext`. Intended to absorb the evaluation report renderer and alternative output modes over time. |
 | `results.py` | Result compilation (`make_nss_results`, `make_nss_summary`) |
 | `utils.py` | Schema prompt creation, pattern matching helpers |
