Skip to content

fix: shift dates for the default model by matching the canonical label#559

Merged
maziyarpanahi merged 2 commits into
maziyarpanahi:masterfrom
ardittirana:fix-shift-dates-canonical-label
Jun 22, 2026
Merged

fix: shift dates for the default model by matching the canonical label#559
maziyarpanahi merged 2 commits into
maziyarpanahi:masterfrom
ardittirana:fix-shift-dates-canonical-label

Conversation

@ardittirana

Copy link
Copy Markdown
Contributor

Description

method="shift_dates" silently masked dates for the default English model instead of shifting them.

In _redact_entity and the keep_mapping occurrence counter in _build_deidentification_result, a span was treated as a date by comparing the raw entity.entity_type to the literal "DATE". But entity_type holds the model's raw label, and the default model OpenMed-PII-SuperClinical-Small-44M-v1 emits a lowercase date. So date spans never matched and fell through to [DATE] mask placeholders — shift_dates degraded to masking for the default model.

Fix: decide date-ness from the canonical label via a small _is_date_entity helper (entity.canonical_label, falling back to normalize_label(entity.entity_type, lang)), applied at both sites.

Change type

Bug fix (no API change, no new dependency).

Tests run

  • pytest tests/unit/test_pii.py — 113 passed (adds test_redact_shift_dates_uses_canonical_label, which fails on the base branch and passes here).
  • ruff check / ruff format --check — clean.

Docs / changelog

Added an entry under [Unreleased] > Fixed in CHANGELOG.md.

Linked issue

Fixes #513 (addresses #408).

ardittirana and others added 2 commits June 21, 2026 23:47
_redact_entity and the keep_mapping occurrence counter decided whether a
span was a date by comparing the raw entity.entity_type against the
literal "DATE". entity_type holds the model's raw label, and the default
English model (OpenMed-PII-SuperClinical-Small-44M-v1) emits a lowercase
"date", so date spans never matched and silently fell through to [DATE]
mask placeholders instead of being shifted.

Compare the canonical label instead (entity.canonical_label, falling back
to normalize_label of the raw label) via a small _is_date_entity helper,
applied in both the shift branch and the occurrence counter.

Adds a regression test covering a date entity whose raw label is not
literally "DATE". Fixes maziyarpanahi#513.
@maziyarpanahi maziyarpanahi added bug Something isn't working P1 High roadmap-v2 OpenMed V2 roadmap backlog labels Jun 22, 2026

@maziyarpanahi maziyarpanahi left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ardittirana. I reviewed this against #513 / OM-323 and added one maintainer follow-up commit: test: cover canonical date shifting labels.

What I changed:

  • extended _is_date_entity() so DATE_OF_BIRTH is treated as a shiftable date label, not just DATE;
  • added end-to-end mocked deidentify() regressions for the default English model's lowercase date label;
  • added coverage for raw date_of_birth labels;
  • added keep_mapping=True coverage to ensure shifted dates are not counted or suffixed like mask placeholders.

Verification on the current PR checkout:

  • PYTHONPATH=/private/tmp/openmed-pr-559 /Users/maziyar/Developer/openmed/.venv/bin/python -m pytest tests/unit/test_pii.py -q -> 116 passed
  • /Users/maziyar/Developer/openmed/.venv/bin/ruff check openmed/core/pii.py tests/unit/test_pii.py CHANGELOG.md -> passed
  • /Users/maziyar/Developer/openmed/.venv/bin/ruff format --check openmed/core/pii.py tests/unit/test_pii.py -> passed

I also copied the labels from #513 onto the PR. The branch is mergeable with no conflicts; GitHub has not attached hosted checks to the new head commit yet, so I verified the touched behavior locally.

@maziyarpanahi maziyarpanahi merged commit fb6669f into maziyarpanahi:master Jun 22, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working P1 High roadmap-v2 OpenMed V2 roadmap backlog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix shift_dates silently masking dates for the default English model

2 participants