docs: add a de-identification cookbook notebook with end-to-end recipes #401
Merged
maziyarpanahi merged 2 commits on Jun 20, 2026
Contributor (PR author):

@maziyarpanahi Please take a look at this for issue #316.
Add examples/notebooks/Deidentification_Cookbook.ipynb, a task-oriented cookbook of four copy-paste de-identification recipes using synthetic PHI only:

- de-identify a list/CSV of clinical strings
- batch-redact a directory via BatchProcessor
- reversible replace + reidentify round-trip
- per-language model selection via DEFAULT_PII_MODELS

Link it from examples/notebooks/README.md and add a lightweight nbformat-based test asserting that the notebook is valid, has the four recipe sections, includes a synthetic-data disclaimer, and has cleared outputs. Add nbformat to the dev extra for the test.

Closes maziyarpanahi#316
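The structural checks described in this commit message can be approximated with the standard library alone, because an .ipynb file is plain JSON. The sketch below is an illustration, not the PR's actual test: it uses `json` instead of nbformat (so it does not validate against the full notebook schema), and the `RECIPE_HEADINGS` titles, the `"synthetic"` keyword check, and the helper names are hypothetical placeholders for whatever the real test asserts.

```python
import json

# Hypothetical section titles; the real test asserts the notebook's actual headings.
RECIPE_HEADINGS = ["Recipe 1", "Recipe 2", "Recipe 3", "Recipe 4"]


def load_notebook(path):
    """Parse an .ipynb file; on disk it is plain JSON."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def cell_text(cell):
    """Notebook cells store source either as one string or as a list of lines."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def check_cookbook(nb):
    """Return a list of problems in a parsed notebook dict; an empty list means it passed."""
    if nb.get("nbformat") != 4 or not isinstance(nb.get("cells"), list):
        return ["not an nbformat v4 notebook"]
    problems = []
    markdown = "\n".join(cell_text(c) for c in nb["cells"] if c.get("cell_type") == "markdown")
    for heading in RECIPE_HEADINGS:
        if heading not in markdown:
            problems.append(f"missing section: {heading}")
    if "synthetic" not in markdown.lower():
        problems.append("missing synthetic-data disclaimer")
    for i, cell in enumerate(nb["cells"]):
        # Cleared outputs: code cells carry no outputs and no execution count.
        if cell.get("cell_type") == "code" and (cell.get("outputs") or cell.get("execution_count")):
            problems.append(f"cell {i} has outputs")
    return problems


example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["All PHI in this notebook is synthetic.\n"]},
        *({"cell_type": "markdown", "metadata": {}, "source": f"## {h}"} for h in RECIPE_HEADINGS),
        {"cell_type": "code", "metadata": {}, "source": "print('ok')", "outputs": [], "execution_count": None},
    ],
}
print(check_cookbook(example))  # []
```

Returning a list of problems rather than asserting on the first failure makes a broken notebook report every missing section at once; a pytest test could then simply `assert check_cookbook(load_notebook(path)) == []`.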
The branch was updated from 763c650 to 4ed0adf.
Owner:

Thanks @pardeep-singh for the contribution. I pushed a follow-up commit (…).
Verification:
This is ready to merge and will close #316.
Summary
Implements #316 (OM-151): adds a task-oriented de-identification cookbook of short, copy-paste recipes, filling the gap between the existing complete-guide notebooks and common workflow snippets.
`examples/notebooks/Deidentification_Cookbook.ipynb` covers four recipes with synthetic PHI only and cleared outputs:

- De-identify a list/CSV of clinical strings with `deidentify` and the stdlib `csv` module.
- Batch-redact a directory of `.txt` files via `BatchProcessor(operation="deidentify").process_directory(...)`, writing redacted file copies.
- Reversible replace + reidentify round-trip with `deidentify(method="replace", keep_mapping=True, consistent=True)` and `reidentify(...)`.
- Per-language model selection via `DEFAULT_PII_MODELS`.

The live examples use the lightweight 44M English PII model for CPU-friendly runs. The defaults for other languages are displayed but not executed, to avoid larger downloads.
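The reversible round-trip recipe depends on two properties: the same original value always maps to the same surrogate (what `consistent=True` is described as providing), and the surrogate-to-original mapping is kept so the text can be restored. The library-free sketch below models only those two properties. It does not call the library: the regex `PATTERNS`, the `replace_consistently` / `reidentify_text` helpers, and the `[LABEL_n]` surrogate format are all hypothetical stand-ins, not the behaviour of `deidentify(...)` or `reidentify(...)`.

```python
import re

# Hypothetical toy patterns for synthetic PHI; the cookbook uses a PII model instead.
PATTERNS = {
    "NAME": re.compile(r"\b(?:Jane Doe|John Smith)\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def replace_consistently(text):
    """Replace each PHI span with a surrogate; repeated values reuse the same surrogate."""
    mapping = {}  # surrogate -> original value (kept for re-identification)
    reverse = {}  # original value -> surrogate (gives consistency)
    for label, pattern in PATTERNS.items():
        def substitute(match, label=label):
            original = match.group(0)
            if original not in reverse:
                surrogate = f"[{label}_{len(reverse) + 1}]"
                reverse[original] = surrogate
                mapping[surrogate] = original
            return reverse[original]

        text = pattern.sub(substitute, text)
    return text, mapping


def reidentify_text(text, mapping):
    """Restore original values by swapping each surrogate back."""
    for surrogate, original in mapping.items():
        text = text.replace(surrogate, original)
    return text


note = "Jane Doe called 555-010-0199. Jane Doe asked for a refill."
redacted, mapping = replace_consistently(note)
print(redacted)  # [NAME_1] called [PHONE_2]. [NAME_1] asked for a refill.
print(reidentify_text(redacted, mapping) == note)  # True
```

Because the mapping restores real PHI, in practice it would need to be stored as carefully as the original text; the bracketed surrogates here are chosen only so that no surrogate is a substring of another.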
Changes
- `examples/notebooks/Deidentification_Cookbook.ipynb`: new cookbook with four task-oriented recipes.
- `examples/notebooks/README.md`: links the new cookbook.
- `tests/unit/test_deidentification_cookbook.py`: lightweight structural tests that validate the JSON shape, the four recipe sections, the synthetic-data disclaimer, cleared outputs, key public API examples, and the batch recipe's redacted-file output path.

Acceptance criteria
`.venv/bin/python -m pytest tests/ -q` -> 1622 passed, 1 skipped.

Notes
Out of scope
Closes #316