
Commit a7d4b51

Merge pull request #51 from maziyarpanahi/feature/multilingual-pii-model
Add multilingual Privacy Filter v1.4.0 release
2 parents c82b8e0 + 6776df4 commit a7d4b51

46 files changed

Lines changed: 3139 additions & 207 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -221,6 +221,7 @@ local_config.py
 
 # Personal release/announcement drafts
 /RELEASE_NOTES*.md
+!/RELEASE_NOTES_v1.4.0.md
 /ANNOUNCEMENT*.md
 local_settings.py
 dev_config.json

CHANGELOG.md

Lines changed: 34 additions & 0 deletions
@@ -7,6 +7,40 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [1.4.0] - 2026-05-04
+
+### Added
+
+- **OpenMed Multilingual Privacy Filter family**, registered across PyTorch and MLX:
+  - `OpenMed/privacy-filter-multilingual` — PyTorch / Transformers (CPU + CUDA).
+  - `OpenMed/privacy-filter-multilingual-mlx` — MLX full-precision (Apple Silicon).
+  - `OpenMed/privacy-filter-multilingual-mlx-8bit` — MLX 8-bit quantized (Apple Silicon and OpenMedKit demos).
+  These artifacts use the OpenAI Privacy Filter architecture and officially support 16 languages through the OpenMed multilingual PII corpus.
+- **Python MLX routing for multilingual Privacy Filter artifacts**:
+  - `_MLX_MODEL_MAP` entries for the full and 8-bit multilingual MLX repo IDs.
+  - `privacy-filter-multilingual` and `multilingual-privacy-filter` MLX family aliases, both resolving to the existing OpenAI Privacy Filter model class and BIOES decoder.
+  - Family-aware Torch fallback so multilingual MLX model names substitute `OpenMed/privacy-filter-multilingual` on non-MLX hosts instead of the OpenAI baseline.
+- **Multilingual Privacy Filter Studio** in `examples/privacy_filter_multilingual_studio/`, a web demo comparing the OpenAI baseline, OpenAI Nemotron Privacy Filter, and OpenMed Multilingual Privacy Filter with English, French, and Arabic examples.
+- **OpenMed Scan Demo multilingual mode** with `OpenMed/privacy-filter-multilingual-mlx-8bit`, a three-engine picker, EN/FR/AR sample buttons, and new French/Arabic scanned demo documents for screenshot-ready flows.
+- **Release notes** for v1.4.0 in `RELEASE_NOTES_v1.4.0.md`.
+
+### Changed
+
+- Privacy Filter docs and README now describe three Privacy Filter families and label the multilingual model as **OpenMed Multilingual Privacy Filter**.
+- OpenMedKit and demo version surfaces now point at `1.4.0`.
+- The scan demo clears previous annotation windows whenever the language/sample changes, avoiding stale entities from earlier model runs.
+- The multilingual web studio scan animation now performs a single top-to-bottom pass while redacting line by line, matching the stronger visual rhythm of the original Privacy Filter Studio.
+
+### Fixed
+
+- Improved Swift model-download handling so stale cached 401/404 responses cannot masquerade as `openmed-mlx.json` manifests after a public model becomes available.
+- Tightened stale-result invalidation in iOS and web demo flows so slower previous model runs cannot overwrite a newly selected language/sample.
+
+### Tests
+
+- Added Python unit coverage for multilingual MLX backend selection, family-aware Torch fallback, and MLX Privacy Filter family dispatch aliases.
+- Rebuilt the OpenMed Scan Demo after the multilingual 8-bit integration.
+
 ## [1.3.0] - 2026-04-27
 
 ### Added
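
The CHANGELOG notes that the multilingual aliases resolve to the existing BIOES decoder. For readers new to the scheme, here is a minimal, generic sketch of how per-token BIOES tags collapse into entity spans. It is not OpenMed's decoder: the function name `decode_bioes`, the example labels, and the choice to silently drop malformed sequences are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (label, first_token_index, last_token_index), both indices inclusive
Span = Tuple[str, int, int]


def decode_bioes(tags: List[str]) -> List[Span]:
    """Collapse per-token BIOES tags into labelled entity spans.

    B-X begins a multi-token entity, I-X continues it, E-X ends it,
    S-X marks a single-token entity, and O is outside any entity.
    Spans that never reach a matching E-X are discarded.
    """
    spans: List[Span] = []
    open_label: Optional[str] = None
    open_start = -1
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            spans.append((label, i, i))
            open_label = None
        elif prefix == "B":
            open_label, open_start = label, i
        elif prefix == "I" and open_label == label:
            continue  # still inside the current entity
        elif prefix == "E" and open_label == label:
            spans.append((label, open_start, i))
            open_label = None
        else:
            # "O", or an I-/E- tag that does not continue the open entity
            open_label = None
    return spans


print(decode_bioes(["O", "B-NAME", "I-NAME", "E-NAME", "O", "S-DATE", "I-EMAIL"]))
# [('NAME', 1, 3), ('DATE', 5, 5)]
```

A stricter decoder like this one trades recall for clean spans: a stray `I-EMAIL` with no opening `B-EMAIL` produces nothing rather than a guessed entity.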

README.md

Lines changed: 19 additions & 11 deletions
@@ -56,16 +56,16 @@ Apple Silicon acceleration in Python:
 uv pip install -e ".[mlx]"
 ```
 
-Swift apps on macOS and iOS use `OpenMedKit`. In `1.2.0`, that means:
+Swift apps on macOS and iOS use `OpenMedKit`. As of `1.4.0`, that means:
 
-- **MLX** on Apple Silicon macOS and real iPhone/iPad hardware for supported OpenMed PII, OpenAI Privacy Filter, and experimental GLiNER-family artifacts
+- **MLX** on Apple Silicon macOS and real iPhone/iPad hardware for supported OpenMed PII, OpenAI Privacy Filter, OpenAI Nemotron Privacy Filter, OpenMed Multilingual Privacy Filter, and experimental GLiNER-family artifacts
 - **CoreML** when you already have a bundled Apple model package or want the fallback Apple path
 
 Add the Swift package like this:
 
 ```swift
 dependencies: [
-    .package(url: "https://github.com/maziyarpanahi/openmed.git", from: "1.2.0"),
+    .package(url: "https://github.com/maziyarpanahi/openmed.git", from: "1.4.0"),
 ]
 ```
 
@@ -121,7 +121,7 @@ result = processor.process_texts([
 - **Advanced NER Processing**: Confidence filtering, entity grouping, and span alignment
 - **Multiple Output Formats**: Dict, JSON, HTML, CSV for any downstream system
 
-### Production Tools (v1.2.0)
+### Production Tools (v1.4.0)
 
 - **Batch Processing**: Multi-text and multi-file workflows with progress tracking
 - **Configuration Profiles**: `dev`/`prod`/`test`/`fast` presets with flexible overrides
@@ -176,8 +176,8 @@ uvicorn openmed.service.app:app --host 0.0.0.0 --port 8080
 ### Run with Docker
 
 ```bash
-docker build -t openmed:1.2.0 .
-docker run --rm -p 8080:8080 -e OPENMED_PROFILE=prod openmed:1.2.0
+docker build -t openmed:1.4.0 .
+docker run --rm -p 8080:8080 -e OPENMED_PROFILE=prod openmed:1.4.0
 ```
 
 ### Example request
@@ -262,15 +262,18 @@ deidentify(text, method="replace", lang="pt", locale="pt_BR",
 
 ### Privacy Filter Family (Public)
 
-OpenMed ships **two checkpoints** of the OpenAI Privacy Filter architecture — same model code (gpt-oss-style sparse-MoE transformer with local attention, sink tokens, RoPE+YaRN, tiktoken `o200k_base` tokenization), different training data:
+OpenMed ships **three Privacy Filter families** on the OpenAI Privacy Filter architecture — same model code (gpt-oss-style sparse-MoE transformer with local attention, sink tokens, RoPE+YaRN, tiktoken `o200k_base` tokenization), different training data:
 
-| Variant | Trained on | PyTorch (CPU + CUDA) | MLX full (Apple Silicon) | MLX 8-bit (Apple Silicon) |
-| ---------------------- | ---------------------------------------------------------------------------------- | ------------------------------------- | ------------------------------------------------- | ------------------------------------------------------ |
-| OpenAI Privacy Filter | OpenAI's PII training set | [`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter) | [`OpenMed/privacy-filter-mlx`](https://huggingface.co/OpenMed/privacy-filter-mlx) | [`OpenMed/privacy-filter-mlx-8bit`](https://huggingface.co/OpenMed/privacy-filter-mlx-8bit) |
-| Nemotron-PII fine-tune | [Nemotron PII dataset](https://huggingface.co/datasets/nvidia/Nemotron-PII-v1) | [`OpenMed/privacy-filter-nemotron`](https://huggingface.co/OpenMed/privacy-filter-nemotron) | [`OpenMed/privacy-filter-nemotron-mlx`](https://huggingface.co/OpenMed/privacy-filter-nemotron-mlx) | [`OpenMed/privacy-filter-nemotron-mlx-8bit`](https://huggingface.co/OpenMed/privacy-filter-nemotron-mlx-8bit) |
+| Variant | Trained on | PyTorch (CPU + CUDA) | [MLX full (OpenMedKit + Apple Silicon)](swift/OpenMedKit) | [MLX 8-bit (OpenMedKit + Apple Silicon)](swift/OpenMedKit) |
+| ------------------------------------ | ------------------------------------------------------------------------------ | ------------------------------------------------------------------------ | ------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ |
+| OpenAI Privacy Filter | OpenAI's PII training set | [`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter) | [`OpenMed/privacy-filter-mlx`](https://huggingface.co/OpenMed/privacy-filter-mlx) | [`OpenMed/privacy-filter-mlx-8bit`](https://huggingface.co/OpenMed/privacy-filter-mlx-8bit) |
+| Nemotron-PII fine-tune | [Nemotron PII dataset](https://huggingface.co/datasets/nvidia/Nemotron-PII-v1) | [`OpenMed/privacy-filter-nemotron`](https://huggingface.co/OpenMed/privacy-filter-nemotron) | [`OpenMed/privacy-filter-nemotron-mlx`](https://huggingface.co/OpenMed/privacy-filter-nemotron-mlx) | [`OpenMed/privacy-filter-nemotron-mlx-8bit`](https://huggingface.co/OpenMed/privacy-filter-nemotron-mlx-8bit) |
+| OpenMed Multilingual Privacy Filter | OpenMed multilingual PII corpus with official support for 16 languages | [`OpenMed/privacy-filter-multilingual`](https://huggingface.co/OpenMed/privacy-filter-multilingual) | [`OpenMed/privacy-filter-multilingual-mlx`](https://huggingface.co/OpenMed/privacy-filter-multilingual-mlx) | [`OpenMed/privacy-filter-multilingual-mlx-8bit`](https://huggingface.co/OpenMed/privacy-filter-multilingual-mlx-8bit) |
 
 All model IDs above route through the **same** `extract_pii()` / `deidentify()` API — only the `model_name=` argument changes.
 
+The MLX artifacts above use the OpenMed MLX artifact layout consumed by [OpenMedKit](swift/OpenMedKit) for native macOS, iOS, and iPadOS apps.
+
 #### Install
 
 The PyTorch path runs anywhere (Linux, macOS, Windows; CPU or CUDA):
@@ -321,6 +324,10 @@ extract_pii(text, model_name="OpenMed/privacy-filter-mlx-8bit")
 # Nemotron-PII fine-tune (full / 8-bit MLX artifacts)
 extract_pii(text, model_name="OpenMed/privacy-filter-nemotron-mlx")
 extract_pii(text, model_name="OpenMed/privacy-filter-nemotron-mlx-8bit")
+
+# OpenMed Multilingual Privacy Filter (full / 8-bit MLX artifacts)
+extract_pii(text, model_name="OpenMed/privacy-filter-multilingual-mlx")
+extract_pii(text, model_name="OpenMed/privacy-filter-multilingual-mlx-8bit")
 ```
 
 #### Cross-platform note
@@ -329,6 +336,7 @@ The MLX artifact names work everywhere — on a non-Apple-Silicon host (or anywh
 
 - `OpenMed/privacy-filter-mlx*` ⇒ falls back to `openai/privacy-filter`
 - `OpenMed/privacy-filter-nemotron-mlx*` ⇒ falls back to `OpenMed/privacy-filter-nemotron`
+- `OpenMed/privacy-filter-multilingual-mlx*` ⇒ falls back to `OpenMed/privacy-filter-multilingual`
 
 So your code can ship an MLX model name and run on any host without changes — Apple Silicon users get MLX speed, everyone else gets the same family's PyTorch checkpoint.
 
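
The README's fallback rules amount to a lookup from MLX name patterns to same-family PyTorch checkpoints. A minimal sketch, assuming the `*` in the README means a glob-style suffix match; `MLX_TO_TORCH` and `torch_fallback` are hypothetical names and do not reproduce OpenMed's `_MLX_MODEL_MAP` or its routing code.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical table transcribing the README's fallback rules:
# MLX artifact pattern -> PyTorch checkpoint from the same family.
MLX_TO_TORCH = {
    "OpenMed/privacy-filter-mlx*": "openai/privacy-filter",
    "OpenMed/privacy-filter-nemotron-mlx*": "OpenMed/privacy-filter-nemotron",
    "OpenMed/privacy-filter-multilingual-mlx*": "OpenMed/privacy-filter-multilingual",
}


def torch_fallback(model_name: str) -> Optional[str]:
    """Return the same-family PyTorch checkpoint for an MLX model name.

    Returns None when the name is not a known MLX Privacy Filter artifact,
    so the caller can load it unchanged.
    """
    for pattern, torch_name in MLX_TO_TORCH.items():
        # fnmatchcase: case-sensitive, and "/" is treated as an ordinary character
        if fnmatchcase(model_name, pattern):
            return torch_name
    return None


for name in (
    "OpenMed/privacy-filter-multilingual-mlx-8bit",
    "OpenMed/privacy-filter-nemotron-mlx",
    "OpenMed/privacy-filter-multilingual",
):
    print(name, "->", torch_fallback(name))
```

Because no pattern's literal prefix is a prefix of another's, at most one pattern can match a given name, so dictionary order does not affect the result.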
RELEASE_NOTES_v1.4.0.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+# OpenMed v1.4.0
+
+OpenMed v1.4.0 is the multilingual Privacy Filter release.
+
+This release brings the **OpenMed Multilingual Privacy Filter** into the main OpenMed ecosystem across Python, MLX, OpenMedKit, the iOS Scan Demo, and the web demo experience. The new family officially supports 16 languages and ships in PyTorch, MLX full-precision, and MLX 8-bit forms.
+
+The headline: developers can now use the same `extract_pii()` / `deidentify()` API for the OpenAI baseline, OpenAI Nemotron Privacy Filter, and OpenMed Multilingual Privacy Filter, while Apple demos can showcase all three model choices without changing application code.
+
+## Highlights
+
+- Added the OpenMed Multilingual Privacy Filter model family:
+  - `OpenMed/privacy-filter-multilingual`
+  - `OpenMed/privacy-filter-multilingual-mlx`
+  - `OpenMed/privacy-filter-multilingual-mlx-8bit`
+- Added Python MLX routing for the multilingual full and 8-bit artifacts.
+- Added family-aware fallback so multilingual MLX names resolve to the multilingual PyTorch checkpoint on non-MLX hosts.
+- Added MLX family aliases for multilingual Privacy Filter artifacts that reuse the existing OpenAI Privacy Filter runtime and BIOES decoder.
+- Updated the OpenMed Scan Demo with the 8-bit multilingual model, a clearer three-model picker, and EN/FR/AR sample buttons.
+- Added French and Arabic scanned demo documents for screenshot-ready multilingual flows.
+- Added a multilingual web studio that compares the OpenAI baseline, OpenAI Nemotron Privacy Filter, and OpenMed Multilingual Privacy Filter.
+- Updated README, anonymization docs, MLX docs, Swift docs, CHANGELOG, and version surfaces for `1.4.0`.
+
+## Privacy Filter Families
+
+OpenMed now documents and routes three Privacy Filter families:
+
+| Variant | PyTorch | MLX full | MLX 8-bit |
+| --- | --- | --- | --- |
+| OpenAI Privacy Filter | `openai/privacy-filter` | `OpenMed/privacy-filter-mlx` | `OpenMed/privacy-filter-mlx-8bit` |
+| OpenAI Nemotron Privacy Filter | `OpenMed/privacy-filter-nemotron` | `OpenMed/privacy-filter-nemotron-mlx` | `OpenMed/privacy-filter-nemotron-mlx-8bit` |
+| OpenMed Multilingual Privacy Filter | `OpenMed/privacy-filter-multilingual` | `OpenMed/privacy-filter-multilingual-mlx` | `OpenMed/privacy-filter-multilingual-mlx-8bit` |
+
+All three families use the OpenAI Privacy Filter architecture. The multilingual family uses OpenMed multilingual PII training data and officially supports 16 languages.
+
+## Python Usage
+
+The public API stays the same:
+
+```python
+from openmed import extract_pii, deidentify
+
+text = "Patient Marie Dubois, nee le 14/03/1982, email marie.dubois@example.fr."
+
+entities = extract_pii(
+    text,
+    model_name="OpenMed/privacy-filter-multilingual-mlx-8bit",
+)
+
+safe = deidentify(
+    text,
+    model_name="OpenMed/privacy-filter-multilingual-mlx-8bit",
+    method="replace",
+    consistent=True,
+    seed=42,
+)
+```
+
+On Apple Silicon with MLX available, the MLX artifact runs through `PrivacyFilterMLXPipeline`. On other hosts, OpenMed substitutes the matching PyTorch checkpoint and emits a one-time warning:
+
+- `OpenMed/privacy-filter-mlx*` -> `openai/privacy-filter`
+- `OpenMed/privacy-filter-nemotron-mlx*` -> `OpenMed/privacy-filter-nemotron`
+- `OpenMed/privacy-filter-multilingual-mlx*` -> `OpenMed/privacy-filter-multilingual`
+
+## Apple And Demo Updates
+
+The iOS Scan Demo now presents three privacy engines cleanly:
+
+- OpenMed PII
+- OpenAI Nemotron Privacy Filter
+- OpenMed Multilingual Privacy Filter
+
+The multilingual path uses `OpenMed/privacy-filter-multilingual-mlx-8bit` so the demo stays aligned with the 8-bit Apple artifact strategy. The sample controls now use compact `EN`, `FR`, and `AR` buttons, and switching language/sample clears previous annotations before the next run starts.
+
+The multilingual web studio now uses a single top-to-bottom scan pass and redacts line by line during that pass, matching the original Privacy Filter Studio demo feel without looping the scan effect.
+
+## Upgrade Notes
+
+- The package version is now `1.4.0`.
+- Swift demo marketing versions are now `1.4.0`.
+- `OpenMed/privacy-filter-multilingual-mlx` and `OpenMed/privacy-filter-multilingual-mlx-8bit` are first-class model names in the MLX routing table.
+- The multilingual MLX artifacts must include a valid `openmed-mlx.json`; stale cached HTTP error bodies are no longer treated as manifests by the scan demo downloader.
+
+## Validation
+
+This release adds targeted unit coverage for multilingual Privacy Filter routing, MLX family alias dispatch, and family-aware fallback behavior. The OpenMed Scan Demo was also rebuilt after the multilingual 8-bit integration.
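
The upgrade note on stale cached HTTP error bodies boils down to one rule: never accept a downloaded file as `openmed-mlx.json` unless the response was a success and the body parses as a manifest. The actual fix lives in the Swift downloader; the following Python sketch only illustrates the rule. `parse_manifest` and `ManifestError` are hypothetical names, and because the manifest's fields are not described here, the only schema check assumed is that it is a JSON object.

```python
import json
from typing import Any, Dict


class ManifestError(ValueError):
    """Raised when a downloaded file cannot be treated as a manifest."""


def parse_manifest(status: int, body: bytes) -> Dict[str, Any]:
    """Return the parsed manifest, or raise ManifestError.

    A cached 401/404 response may still carry a body (an HTML page or a JSON
    error object), so the status code is checked before the body is trusted.
    """
    if status != 200:
        raise ManifestError(f"HTTP {status} response is not a manifest")
    try:
        data = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ManifestError(f"manifest is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        # Assumed minimal schema: the manifest is a JSON object
        raise ManifestError("manifest must be a JSON object")
    return data
```

Checking the status first matters because an error body such as `{"error": "Entry not found"}` is itself a valid JSON object and would pass the shape check alone.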

docs/anonymization.md

Lines changed: 13 additions & 8 deletions
@@ -127,34 +127,39 @@ register_label_generator("FIRST_NAME", my_first_name)
 
 ## Privacy-filter family
 
-OpenMed ships two privacy-filter checkpoints, both **the same OpenAI
+OpenMed ships three privacy-filter families, all **the same OpenAI
 Privacy Filter architecture** (gpt-oss-style sparse-MoE transformer with
 local attention, sink tokens, RoPE+YaRN, tiktoken `o200k_base`), differing
 only in their training data:
 
-| Variant                   | Trained on                  | PyTorch artifact                     | MLX (full)                                  | MLX (8-bit)                                       |
-| ------------------------- | --------------------------- | ------------------------------------ | ------------------------------------------- | ------------------------------------------------- |
-| OpenAI Privacy Filter     | OpenAI's PII training set   | `openai/privacy-filter`              | `OpenMed/privacy-filter-mlx`                | `OpenMed/privacy-filter-mlx-8bit`                 |
-| Nemotron-PII fine-tune    | Nemotron PII dataset        | `OpenMed/privacy-filter-nemotron`    | `OpenMed/privacy-filter-nemotron-mlx`       | `OpenMed/privacy-filter-nemotron-mlx-8bit`        |
+| Variant                              | Trained on                                      | PyTorch artifact                         | MLX (full)                                      | MLX (8-bit)                                           |
+| ------------------------------------ | ----------------------------------------------- | ---------------------------------------- | ----------------------------------------------- | ----------------------------------------------------- |
+| OpenAI Privacy Filter                | OpenAI's PII training set                       | `openai/privacy-filter`                  | `OpenMed/privacy-filter-mlx`                    | `OpenMed/privacy-filter-mlx-8bit`                     |
+| OpenAI Nemotron Privacy Filter       | Nemotron PII dataset                            | `OpenMed/privacy-filter-nemotron`        | `OpenMed/privacy-filter-nemotron-mlx`           | `OpenMed/privacy-filter-nemotron-mlx-8bit`            |
+| OpenMed Multilingual Privacy Filter  | OpenMed multilingual PII corpus, 16 languages   | `OpenMed/privacy-filter-multilingual`    | `OpenMed/privacy-filter-multilingual-mlx`       | `OpenMed/privacy-filter-multilingual-mlx-8bit`        |
 
-Both run through the same `extract_pii()` / `deidentify()` API — only the
+All run through the same `extract_pii()` / `deidentify()` API — only the
 weights differ:
 
 ```python
 extract_pii(text, model_name="OpenMed/privacy-filter-mlx-8bit")
 extract_pii(text, model_name="OpenMed/privacy-filter-nemotron-mlx-8bit")
+extract_pii(text, model_name="OpenMed/privacy-filter-multilingual-mlx-8bit")
 
 deidentify(text, model_name="OpenMed/privacy-filter-nemotron",
            method="replace", consistent=True, seed=42)
+deidentify(text, model_name="OpenMed/privacy-filter-multilingual",
+           method="replace", consistent=True, seed=42)
 ```
 
 **Backend selection.** On Apple Silicon with MLX importable, the MLX
 artifact runs natively via `PrivacyFilterMLXPipeline`. Elsewhere, the
 call substitutes the corresponding PyTorch model via `transformers` and
 emits a one-time `UserWarning` explaining the swap. The fallback is
 **family-aware** — an MLX-only Nemotron request on Linux substitutes
-`OpenMed/privacy-filter-nemotron` (not the unrelated `openai/privacy-filter`),
-so the user gets the same training distribution they asked for.
+`OpenMed/privacy-filter-nemotron`, and an MLX-only multilingual request
+substitutes `OpenMed/privacy-filter-multilingual`, so the user gets the same
+training distribution they asked for.
 
 Either way the output entity dicts have the same shape so the rest of
 the pipeline behaves identically. Smart-merging (regex-based span
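
The backend-selection paragraph describes two behaviours: decide whether the MLX path is usable, and warn once when a PyTorch checkpoint is substituted. A sketch of both, assuming "Apple Silicon with MLX importable" means macOS on `arm64` with the `mlx` package findable; `mlx_available`, `choose_backend`, and the `_warned` set are hypothetical names, not OpenMed's implementation.

```python
import importlib.util
import platform
import sys
import warnings
from typing import Optional, Set, Tuple

# Hypothetical module-level record of model names already warned about.
_warned: Set[str] = set()


def mlx_available() -> bool:
    """Return True on Apple Silicon macOS when the `mlx` package is installed."""
    return (
        sys.platform == "darwin"
        and platform.machine() == "arm64"
        and importlib.util.find_spec("mlx") is not None
    )


def choose_backend(
    model_name: str, torch_name: str, mlx_ok: Optional[bool] = None
) -> Tuple[str, str]:
    """Return (backend, model) for an MLX artifact name.

    `torch_name` is the same-family PyTorch checkpoint to use when MLX is
    unavailable. Pass `mlx_ok` explicitly to override detection (e.g. in tests).
    """
    if mlx_ok is None:
        mlx_ok = mlx_available()
    if mlx_ok:
        return "mlx", model_name
    if model_name not in _warned:
        # Warn only the first time this model name is substituted.
        _warned.add(model_name)
        warnings.warn(
            f"MLX is not available; substituting PyTorch checkpoint "
            f"{torch_name!r} for {model_name!r}.",
            UserWarning,
            stacklevel=2,
        )
    return "torch", torch_name
```

An explicit per-model set keeps the once-only guarantee even when warning filters are set to `always` (for example via `python -W always`), which would otherwise repeat the message on every call.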

docs/examples.md

Lines changed: 2 additions & 2 deletions
@@ -24,9 +24,9 @@ Run them with VS Code, Jupyter, or Google Colab—each relies on the same `uv pi
 
 ## Apple Silicon & Swift recipes
 
-OpenMed `1.2.0` adds release-critical Apple entry points:
+OpenMed `1.4.0` includes release-critical Apple entry points:
 
-- [MLX Backend](./mlx-backend.md) for Python on Apple Silicon Macs, including Privacy Filter and experimental GLiNER-family artifacts
+- [MLX Backend](./mlx-backend.md) for Python on Apple Silicon Macs, including Privacy Filter, OpenMed Multilingual Privacy Filter, and experimental GLiNER-family artifacts
 - [OpenMedKit (Swift Package)](./swift-openmedkit.md) for macOS, iOS, and iPadOS apps
 
 Python MLX quick check:
