docs: add a de-identification cookbook notebook with end-to-end recipes #401

Merged
maziyarpanahi merged 2 commits into maziyarpanahi:master from pardeep-singh:pardeep/issue316-deidentification-cookbook
Jun 20, 2026
@pardeep-singh pardeep-singh commented Jun 19, 2026

Contributor

Summary

Implements #316 (OM-151): adds a task-oriented de-identification cookbook of short, copy-paste recipes, filling the gap between the existing complete-guide notebooks and common workflow snippets.

examples/notebooks/Deidentification_Cookbook.ipynb covers four recipes with synthetic PHI only and cleared outputs:

  1. De-identify a list or CSV of clinical strings with deidentify and stdlib csv.
  2. Batch-redact a directory of .txt files via BatchProcessor(operation="deidentify").process_directory(...), writing redacted file copies.
  3. Reversible replace + re-identify round-trip with deidentify(method="replace", keep_mapping=True, consistent=True) and reidentify(...).
  4. Per-language model selection with DEFAULT_PII_MODELS.
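Recipe 1 pairs a redaction call with stdlib csv. The following is a minimal sketch of that shape, not the notebook's code: `redact_csv_column` and `toy_redact` are hypothetical names, and the regex stand-in only marks where a wrapper around the library's deidentify call would be plugged in (its exact signature and return type are not shown in this PR).

```python
import csv
import io
import re
from typing import Callable


def redact_csv_column(src, dst, column: str, redact: Callable[[str], str]) -> int:
    """Copy CSV rows from src to dst, passing one column through redact()."""
    reader = csv.DictReader(src)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in CSV header")
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in reader:
        row[column] = redact(row[column])
        writer.writerow(row)
        count += 1
    return count


def toy_redact(text: str) -> str:
    # Stand-in only: masks phone-like digit groups. This is NOT the PII model;
    # in the notebook this callable would wrap the library's deidentify call.
    return re.sub(r"\b\d{3}-\d{4}\b", "[PHONE]", text)


# Synthetic rows only, mirroring the cookbook's synthetic-data rule.
src = io.StringIO("id,note\n1,Call 555-0100 after discharge\n2,No contact details\n")
dst = io.StringIO()
rows = redact_csv_column(src, dst, "note", toy_redact)
print(dst.getvalue())
```

Passing the redactor in as a callable keeps the CSV plumbing testable without downloading a model, which is the same constraint the structural tests in this PR work under.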

The live examples use the lightweight 44M English PII model for CPU-friendly runs. Other language defaults are displayed but not executed to avoid larger downloads.
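The round-trip idea behind recipe 3 can also be illustrated without any model download. The sketch below is a toy, not the library's implementation: `toy_replace` and `toy_reidentify` are hypothetical helpers, the email regex is deliberately simplistic, and it assumes that consistent=True means repeated mentions of one value share one surrogate, which the PR does not spell out.

```python
import re

# Simplistic email pattern, used only to have something to replace.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def toy_replace(text, mapping=None):
    """Swap each email for a stable surrogate; return (new_text, mapping)."""
    mapping = {} if mapping is None else mapping  # original -> surrogate

    def swap(match):
        original = match.group(0)
        if original not in mapping:
            # Same original always gets the same surrogate (the "consistent" part).
            mapping[original] = f"person{len(mapping) + 1}@example.org"
        return mapping[original]

    return EMAIL.sub(swap, text), mapping


def toy_reidentify(text, mapping):
    """Reverse toy_replace by substituting surrogates back to originals."""
    for original, surrogate in mapping.items():
        text = text.replace(surrogate, original)
    return text


note = (
    "Contact jane.doe@hospital.test or bob@clinic.test; "
    "jane.doe@hospital.test prefers email"
)
replaced, mapping = toy_replace(note)
restored = toy_reidentify(replaced, mapping)
print(replaced)
print(restored == note)
```

The point of keeping the mapping is that the replaced text alone is not enough to recover the original; whoever holds the mapping holds the re-identification capability, so it should be stored as carefully as the source data.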

Changes

  • examples/notebooks/Deidentification_Cookbook.ipynb — new cookbook with four task-oriented recipes.
  • examples/notebooks/README.md — links the new cookbook.
  • tests/unit/test_deidentification_cookbook.py — lightweight structural tests that validate JSON shape, four recipe sections, synthetic-data disclaimer, cleared outputs, key public API examples, and the batch recipe's redacted-file output path.
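A structural check of this kind needs only the stdlib, because a version 4 notebook is a JSON object with a `cells` list whose code cells carry `outputs` and `execution_count`. The following is a rough sketch of the idea rather than the test file in this PR; `check_notebook_structure` and the heading strings are hypothetical.

```python
import json


def check_notebook_structure(nb, required_headings):
    """Return a list of problems; an empty list means the notebook passes."""
    problems = []
    if nb.get("nbformat") != 4:
        problems.append("expected nbformat 4")
    cells = nb.get("cells")
    if not isinstance(cells, list) or not cells:
        return problems + ["notebook has no cells"]
    # A cell's source may be one string or a list of strings; join handles both.
    markdown = "\n".join(
        "".join(c.get("source", "")) for c in cells if c.get("cell_type") == "markdown"
    )
    for heading in required_headings:
        if heading not in markdown:
            problems.append(f"missing section: {heading}")
    for i, cell in enumerate(cells):
        if cell.get("cell_type") != "code":
            continue
        # Cleared outputs: empty outputs list and no execution count.
        if cell.get("outputs") or cell.get("execution_count") is not None:
            problems.append(f"cell {i} has uncleared outputs")
    return problems


# Tiny in-memory notebook so the sketch runs without any file on disk.
raw = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["## Recipe 1\n", "Synthetic data only."]},
        {"cell_type": "code", "metadata": {}, "source": "print('hi')",
         "execution_count": None, "outputs": []},
    ],
})
nb = json.loads(raw)
problems = check_notebook_structure(nb, ["Recipe 1", "Recipe 2"])
print(problems)
```

In a real test the dict would come from `json.load` on the notebook path, and the assertion would simply be that the returned list is empty.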

Acceptance criteria

  • Notebook exists with the four recipe sections using synthetic data only.
  • README links the new cookbook.
  • Lightweight check confirms the notebook is valid JSON/structure.
  • Test suite green: .venv/bin/python -m pytest tests/ -q -> 1622 passed, 1 skipped.

Notes

  • The notebook's model-calling cells are not executed in CI, matching the issue's out-of-scope list.
  • No new development dependency is required for the structural notebook test.

Out of scope

  • FHIR/multimodal cookbooks, CI execution of full notebooks against live models, and replacing the existing complete-guide notebooks.

Closes #316

@pardeep-singh

Contributor Author

@maziyarpanahi Please take a look at this PR for issue #316.

Add examples/notebooks/Deidentification_Cookbook.ipynb, a task-oriented
cookbook of four copy-paste de-identification recipes using synthetic
PHI only:

- de-identify a list/CSV of clinical strings
- batch-redact a directory via BatchProcessor
- reversible replace + reidentify round-trip
- per-language model selection via DEFAULT_PII_MODELS

Link it from examples/notebooks/README.md and add a lightweight
nbformat-based test asserting the notebook is valid, has the four
recipe sections, uses a synthetic-data disclaimer, and has cleared
outputs. Add nbformat to the dev extra for the test.

Closes maziyarpanahi#316
pardeep-singh force-pushed the pardeep/issue316-deidentification-cookbook branch from 763c650 to 4ed0adf June 19, 2026 16:01
maziyarpanahi self-requested a review June 19, 2026 18:14
maziyarpanahi added labels Jun 19, 2026: help wanted (Extra attention is needed); good first issue (Good for newcomers); roadmap-v2 (OpenMed V2 roadmap backlog); improvement (Hardening / refactor of existing code); P3 Strategic
@maziyarpanahi

Owner

Thanks @pardeep-singh for the contribution.

I pushed a follow-up commit (b4b5f02) that:

  • makes the batch directory recipe write redacted output files, not just print results
  • removes the unused import and trims the new cookbook/README wording
  • changes the structural notebook test to use stdlib JSON validation, so no new development dependency or lockfile update is needed
  • strengthens the test coverage for the expected public API snippets and redacted-output path
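The difference between printing results and writing redacted file copies can be sketched with pathlib alone. This is an illustration under stated assumptions, not the notebook's recipe: the notebook uses BatchProcessor(operation="deidentify").process_directory(...), whose arguments and return value are not shown here, so `redact_directory`, the `.redacted.txt` naming, and `toy_redact` are all hypothetical.

```python
import re
import tempfile
from pathlib import Path


def redact_directory(src_dir, out_dir, redact):
    """Write a redacted copy of every .txt file in src_dir into out_dir."""
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src_dir.glob("*.txt")):
        target = out_dir / f"{path.stem}.redacted.txt"
        text = path.read_text(encoding="utf-8")
        target.write_text(redact(text), encoding="utf-8")  # originals stay untouched
        written.append(target)
    return written


def toy_redact(text):
    # Stand-in only: masks 8-digit identifiers. NOT the PII model.
    return re.sub(r"\b\d{8}\b", "[ID]", text)


with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "notes"
    notes.mkdir()
    # Synthetic content only.
    (notes / "a.txt").write_text("MRN 12345678 seen today", encoding="utf-8")
    (notes / "skip.md").write_text("MRN 87654321", encoding="utf-8")
    outputs = redact_directory(notes, Path(tmp) / "redacted", toy_redact)
    names = [p.name for p in outputs]
    redacted_text = outputs[0].read_text(encoding="utf-8")

print(names)
print(redacted_text)
```

Writing to a separate output directory, rather than overwriting in place, keeps the run repeatable and makes the redacted-output path something a test can assert on.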

Verification:

This is ready to merge and will close #316.

@maziyarpanahi maziyarpanahi merged commit 4215298 into maziyarpanahi:master Jun 20, 2026
11 checks passed
Labels

good first issue (Good for newcomers); help wanted (Extra attention is needed); improvement (Hardening / refactor of existing code); P3 Strategic; roadmap-v2 (OpenMed V2 roadmap backlog)

Development

Successfully merging this pull request may close these issues.

Add a de-identification cookbook notebook with end-to-end recipes

2 participants