Add anonymization, Studio, and Nemotron privacy-filter support #46

Merged
maziyarpanahi merged 54 commits into master from feature/obfuscation-pii on Apr 29, 2026
@maziyarpanahi maziyarpanahi commented Apr 27, 2026

Owner

Summary

  • Adds a Faker-backed anonymizer with locale-aware, deterministic, and format-preserving surrogate generation.
  • Introduces canonical PII label normalization plus shared BIOES decoding and span refinement utilities for privacy-filter backends.
  • Adds cross-platform privacy-filter routing with MLX support on Apple Silicon and PyTorch fallback elsewhere.
  • Extends the public privacy-filter family with Nemotron-PII checkpoint IDs: OpenMed/privacy-filter-nemotron, OpenMed/privacy-filter-nemotron-mlx, and OpenMed/privacy-filter-nemotron-mlx-8bit.
  • Adds family-aware Torch fallback so Nemotron MLX requests on non-MLX hosts substitute OpenMed/privacy-filter-nemotron instead of openai/privacy-filter.
  • Adds classifier-head bias support for Nemotron MLX artifacts while preserving the original bias-less OpenAI Privacy Filter default, including native Swift OpenMedKit artifact decoding/model construction.
  • Switches the OpenMed Scan Demo privacy-filter engine to OpenMed/privacy-filter-nemotron-mlx-8bit and updates the visible engine naming/docs.
  • Adds Privacy Filter Studio, an interactive FastAPI/static web demo for masking or deterministic Faker randomization with sample notes, highlighted entities, backend/model status, and explicit first-run download control.
  • Adds Portuguese API schema support, trust-remote-code loading for the Torch privacy-filter family, docs, demos, changelog/readme updates, lockfile refresh, and focused unit coverage.
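For readers unfamiliar with the BIOES scheme mentioned in the summary, here is a minimal sketch of what decoding such a tag sequence into spans can look like. It is not the shared utility added in this PR: the function name `decode_bioes`, the `(label, start, end)` span shape, the example labels, and the policy of silently dropping malformed or unterminated spans are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical span shape: (label, start_token, end_token_exclusive).
Span = Tuple[str, int, int]


def decode_bioes(tags: List[str]) -> List[Span]:
    """Turn per-token BIOES tags into token-index spans.

    Illustrative policy (an assumption, not the PR's behavior): a span is
    emitted only for S-X, or for B-X followed by zero or more I-X and a
    closing E-X. Anything inconsistent discards the open span.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None

    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        if prefix == "S" and name:
            # Single-token entity; also abandons any unfinished span.
            spans.append((name, i, i + 1))
            start, label = None, None
        elif prefix == "B" and name:
            # Open a new span, replacing any unfinished one.
            start, label = i, name
        elif prefix == "I" and name and name == label:
            # Continuation of the currently open span.
            continue
        elif prefix == "E" and name and name == label:
            # Close the open span; the end index is exclusive.
            spans.append((name, start, i + 1))
            start, label = None, None
        else:
            # "O", or a tag that does not match the open span.
            start, label = None, None

    return spans


tags = ["O", "B-NAME", "I-NAME", "E-NAME", "O", "S-DATE", "B-PHONE", "O"]
print(decode_bioes(tags))
```

In this sketch the trailing `B-PHONE` is dropped because it never reaches an `E-PHONE`. A real decoder might instead choose to recover such partial spans, which is presumably where a separate span-refinement step comes in.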

Review Notes

  • Reviewed the new Swift OpenMedKit classifier-bias decoding/model changes, Scan Demo Nemotron switch, Studio app/config, MLX classifier-head change, changelog entries, routing, MLX dispatch aliases, Torch loader changes, docs, example, and tests before committing.
  • Fixed Studio behavior before commit so cache-only mode actively sets Hugging Face offline flags and model-load/inference errors render as errors in the output pane.
  • Earlier docs fixes in this PR covered artifact count wording, tiktoken typo, baseline-vs-fine-tune wording, and fuzzy changelog validation count.
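The cache-only fix noted above can be pictured roughly as follows. This is a sketch under assumptions, not the Studio code: the helper name `set_cache_only` is hypothetical, and the PR text does not say which flags Studio sets. `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` are the environment variables documented by Hugging Face for offline use.

```python
import os

# Environment variables documented by Hugging Face for offline operation.
# Whether Studio sets exactly these two is an assumption of this sketch.
OFFLINE_FLAGS = ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE")


def set_cache_only(enabled: bool) -> None:
    """Hypothetical helper: toggle the Hugging Face offline flags."""
    for name in OFFLINE_FLAGS:
        if enabled:
            os.environ[name] = "1"
        else:
            # Remove the flag so network access is allowed again.
            os.environ.pop(name, None)


set_cache_only(True)
print({name: os.environ.get(name) for name in OFFLINE_FLAGS})
```

These flags are generally read when the Hugging Face libraries are imported, so a helper like this is most reliable when called before `huggingface_hub` or `transformers` is loaded in the process.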

Validation

  • git diff --check
  • swift test --filter OpenMedMLXTests/testPrivacyFilterConfigDecodesNemotronClassifierBias (1 passed)
  • swift test --filter OpenMedMLXTests/testPrivacyFilterConfigAcceptsUnembeddingBiasAlias (1 passed)
  • swift test --filter OpenMedMLXTests/testTinyPrivacyFilterNemotronModelForwardShape (1 skipped: MLX runtime resources unavailable from SwiftPM CLI test bundle)
  • swift test --filter OpenMedMLXTests/testTinyPrivacyFilterBaselineHasNoUnembeddingBiasSlot (1 skipped: MLX runtime resources unavailable from SwiftPM CLI test bundle)
  • .venv/bin/python -m compileall -q examples/privacy_filter_studio openmed/mlx/models/privacy_filter.py
  • FastAPI Studio smoke via TestClient: GET /api/examples -> 200, empty POST /api/run -> 200 / empty
  • .venv/bin/python -m pytest tests/unit/mlx/test_privacy_filter_mlx.py tests/unit/test_privacy_filter_routing.py (20 passed, 8 skipped)
  • Earlier focused privacy/anonymization suite: .venv/bin/python -m pytest tests/unit/core/test_anonymizer.py tests/unit/core/test_labels.py tests/unit/test_pii.py tests/unit/test_privacy_filter_routing.py tests/unit/test_pii_multilingual_regression.py tests/unit/mlx/test_privacy_filter_mlx.py tests/unit/service/test_api.py (471 passed, 1 skipped, 11 warnings)

@maziyarpanahi maziyarpanahi marked this pull request as ready for review April 27, 2026 21:21
@maziyarpanahi maziyarpanahi self-assigned this Apr 27, 2026
@maziyarpanahi maziyarpanahi changed the title from "[codex] Add anonymization and privacy-filter routing" to "[codex] Add anonymization and Nemotron privacy-filter routing" on Apr 28, 2026
@maziyarpanahi maziyarpanahi changed the title from "[codex] Add anonymization and Nemotron privacy-filter routing" to "Add anonymization and Nemotron privacy-filter routing" on Apr 28, 2026
@maziyarpanahi maziyarpanahi changed the title from "Add anonymization and Nemotron privacy-filter routing" to "Add anonymization, Studio, and Nemotron privacy-filter routing" on Apr 29, 2026
@maziyarpanahi maziyarpanahi changed the title from "Add anonymization, Studio, and Nemotron privacy-filter routing" to "Add anonymization, Studio, and Nemotron privacy-filter support" on Apr 29, 2026
@maziyarpanahi maziyarpanahi merged commit f85bcde into master Apr 29, 2026
14 checks passed