fix: shift dates for the default model by matching the canonical label #559
Merged
maziyarpanahi merged 2 commits on Jun 22, 2026
_redact_entity and the keep_mapping occurrence counter decided whether a span was a date by comparing the raw entity.entity_type against the literal "DATE". entity_type holds the model's raw label, and the default English model (OpenMed-PII-SuperClinical-Small-44M-v1) emits a lowercase "date", so date spans never matched and silently fell through to [DATE] mask placeholders instead of being shifted. Compare the canonical label instead (entity.canonical_label, falling back to normalize_label of the raw label) via a small _is_date_entity helper, applied in both the shift branch and the occurrence counter. Adds a regression test covering a date entity whose raw label is not literally "DATE". Fixes maziyarpanahi#513.
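The mismatch described above can be reproduced in a few lines. This is a minimal sketch, not the project's code: the `Entity` dataclass, the simplified `normalize_label`, and the function names `is_date_raw` and `is_date_entity` are hypothetical stand-ins, and the real `normalize_label` in OpenMed may behave differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    """Hypothetical stand-in for the project's entity type."""
    text: str
    entity_type: str                       # raw label as emitted by the model
    canonical_label: Optional[str] = None  # normalized label, if already set


def normalize_label(label: str, lang: str = "en") -> str:
    """Simplified stand-in (assumption): upper-case and unify separators."""
    return label.strip().upper().replace("-", "_").replace(" ", "_")


def is_date_raw(entity: Entity) -> bool:
    # Old behaviour: compare the raw model label to a literal.
    return entity.entity_type == "DATE"


def is_date_entity(entity: Entity, lang: str = "en") -> bool:
    # New behaviour: prefer the canonical label, else normalize the raw label.
    label = entity.canonical_label or normalize_label(entity.entity_type, lang)
    return label == "DATE"


span = Entity(text="2024-01-15", entity_type="date")
print(is_date_raw(span))     # False: the lowercase raw label never matches
print(is_date_entity(span))  # True: the canonical comparison matches
```

Using one predicate at both call sites keeps the shift branch and the occurrence counter from disagreeing about what counts as a date.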
maziyarpanahi
approved these changes
Jun 22, 2026
maziyarpanahi
left a comment
Owner
Thank you @ardittirana. I reviewed this against #513 / OM-323 and added one maintainer follow-up commit: test: cover canonical date shifting labels.
What I changed:
- extended `_is_date_entity()` so `DATE_OF_BIRTH` is treated as a shiftable date label, not just `DATE`;
- added end-to-end mocked `deidentify()` regressions for the default English model's lowercase `date` label;
- added coverage for raw `date_of_birth` labels;
- added `keep_mapping=True` coverage to ensure shifted dates are not counted or suffixed like mask placeholders.
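The `keep_mapping=True` point can be illustrated with a rough sketch. Everything in it is assumed for illustration: the `redact` function, the `SHIFTABLE_DATE_LABELS` set, the `[LABEL_n]` suffix format, and the ISO date parsing are hypothetical and may not match how OpenMed builds placeholders or shifts dates.

```python
from collections import Counter
from datetime import date, timedelta

# Assumed set of canonical labels that are shifted rather than masked.
SHIFTABLE_DATE_LABELS = {"DATE", "DATE_OF_BIRTH"}


def redact(spans, shift_days):
    """Take (text, canonical_label) pairs; return replacements and mask counts."""
    counts = Counter()
    replacements = []
    for text, label in spans:
        if label in SHIFTABLE_DATE_LABELS:
            # Shifted dates bypass the occurrence counter entirely.
            shifted = date.fromisoformat(text) + timedelta(days=shift_days)
            replacements.append(shifted.isoformat())
            continue
        counts[label] += 1
        replacements.append(f"[{label}_{counts[label]}]")  # hypothetical suffix format
    return replacements, dict(counts)


replacements, counts = redact(
    [
        ("Jane Doe", "NAME"),
        ("1980-02-28", "DATE_OF_BIRTH"),
        ("2024-01-15", "DATE"),
        ("John Roe", "NAME"),
    ],
    shift_days=3,
)
print(replacements)  # ['[NAME_1]', '1980-03-02', '2024-01-18', '[NAME_2]']
print(counts)        # {'NAME': 2}
```

Only masked spans are numbered, so the date labels never appear in the counts.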
Verification on the current PR checkout:
- `PYTHONPATH=/private/tmp/openmed-pr-559 /Users/maziyar/Developer/openmed/.venv/bin/python -m pytest tests/unit/test_pii.py -q` -> 116 passed
- `/Users/maziyar/Developer/openmed/.venv/bin/ruff check openmed/core/pii.py tests/unit/test_pii.py CHANGELOG.md` -> passed
- `/Users/maziyar/Developer/openmed/.venv/bin/ruff format --check openmed/core/pii.py tests/unit/test_pii.py` -> passed
I also copied the labels from #513 onto the PR. The branch is mergeable with no conflicts; GitHub has not attached hosted checks to the new head commit yet, so I verified the touched behavior locally.
Description
`method="shift_dates"` silently masked dates for the default English model instead of shifting them.

In `_redact_entity` and the `keep_mapping` occurrence counter in `_build_deidentification_result`, a span was treated as a date by comparing the raw `entity.entity_type` to the literal `"DATE"`. But `entity_type` holds the model's raw label, and the default model `OpenMed-PII-SuperClinical-Small-44M-v1` emits a lowercase `date`. So date spans never matched and fell through to `[DATE]` mask placeholders, and `shift_dates` degraded to masking for the default model.

Fix: decide date-ness from the canonical label via a small `_is_date_entity` helper (`entity.canonical_label`, falling back to `normalize_label(entity.entity_type, lang)`), applied at both sites.

Change type
Bug fix (no API change, no new dependency).
Tests run
- `pytest tests/unit/test_pii.py`: 113 passed (adds `test_redact_shift_dates_uses_canonical_label`, which fails on the base branch and passes here).
- `ruff check` / `ruff format --check`: clean.

Docs / changelog
Added an entry under `[Unreleased] > Fixed` in `CHANGELOG.md`.

Linked issue
Fixes #513 (addresses #408).