CHANGELOG.md
+40 -3 (40 additions & 3 deletions)
@@ -9,13 +9,50 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added

-- Added a FHIR R4 transaction Bundle assembler (`openmed.clinical.exporters.fhir.to_bundle`) that wraps a document's exported resources into a single Bundle, assigns deterministic `urn:uuid` `fullUrl` values (seeded by `doc_id` + resource index), rewrites in-Bundle references (`subject`/`result`/`encounter`) to those URNs, and emits per-entry `request` blocks for transaction/batch bundles.
-- Added release changelog tooling that renders Keep a Changelog sections from Conventional Commits, computes the SemVer bump, and exposes the computed next version to the PyPI publish workflow.
-- Added `bootstrap_ci` and `compute_confidence_intervals` to `openmed.eval.metrics` and an opt-in `confidence_intervals` flag on the benchmark harness that attaches deterministic non-parametric bootstrap confidence intervals to the leakage, character recall, and span F1 metrics (off by default to keep fast runs cheap).
+- Added a policy-aware de-identification runtime with canonical `OpenMedSpan` schema contracts, a ten-stage `Pipeline`, detector arbitration/cascade routing, calibrated per-label/language/policy thresholds, deterministic safety sweep backstops, and six bundled policy profiles (`hipaa_safe_harbor`, `hipaa_expert_review_assist`, `gdpr_pseudonymization`, `research_limited_dataset`, `strict_no_leak`, `clinical_minimal_redaction`).
+- Added signed, reproducible de-identification audit reports with span provenance, residual-risk metadata, reproducibility hashes, and optional HMAC signatures.
+- Added re-identification risk reporting and adversarial re-identification benchmark support, including `openmed benchmark pii --attack reid`.
+- Added a leakage-first evaluation harness with `BenchmarkReport`, synthetic golden de-identification fixtures, public/reference dataset adapters, DUA-gated corpus stubs, SHIELD comparison-suite support, weak labeling utilities, cold-start latency, and deterministic bootstrap confidence intervals.
+- Added release-gate infrastructure for v1.6.0 model readiness: last-green baselines, calibration artifacts, G1a-G8 signed gate reports, quantization recall-delta checks, generated status/leaderboard pages, and a fail-closed release-gates workflow.
+- Added clinical and interoperability utilities: ConText temporality and uncertainty axes, OHDSI Athena/Usagi ingestion, a Presidio adapter, and a deterministic FHIR R4 transaction/batch Bundle assembler.
+- Added a cardiology zero-shot label-map domain (`CardiacFinding`, `ECGFinding`, `EjectionFraction`, `CardiacProcedure`, `CardiacDevice`, `Anatomy`) plus cardiology keyword routing metadata for future model registration. Public model suggestions continue to fall back to existing general medical models until a cardiology model is registered.
+- Added a canonical `models.jsonl` manifest, manifest refresh workflow, manifest-driven Hugging Face model card generation, and HF publishing support for converted MLX/CoreML artifacts.
+- Added a packaged `openmed` CLI surface with benchmark and calibration commands, plus a de-identification cookbook notebook and an offline clinical NER families example.
+- Added governance, compliance, security, device-tier, FAQ, API reference, release-channel, status, leaderboard, and notebook documentation.
+
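As a reading aid for the FHIR Bundle assembler entry, here is a minimal sketch of how deterministic `urn:uuid` `fullUrl` values and in-Bundle reference rewriting could work. It is not the code in `openmed.clinical.exporters.fhir`: the namespace string, the `doc_id:index` seed format, and the helper names `full_url` and `assemble_bundle` are assumptions made for illustration.

```python
import copy
import uuid
from typing import Any

# Hypothetical namespace; the real exporter may derive its seed differently.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "urn:example:openmed-bundle-sketch")


def full_url(doc_id: str, index: int) -> str:
    """Derive a stable urn:uuid from the document id and the resource index."""
    return f"urn:uuid:{uuid.uuid5(NAMESPACE, f'{doc_id}:{index}')}"


def _rewrite_references(node: Any, urls: dict[str, str]) -> None:
    """Replace 'Type/id' references with the matching urn:uuid, in place."""
    if isinstance(node, dict):
        ref = node.get("reference")
        if isinstance(ref, str) and ref in urls:
            node["reference"] = urls[ref]
        for value in node.values():
            _rewrite_references(value, urls)
    elif isinstance(node, list):
        for value in node:
            _rewrite_references(value, urls)


def assemble_bundle(doc_id: str, resources: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap resources in a transaction Bundle without mutating the inputs."""
    urls = {
        f"{res['resourceType']}/{res['id']}": full_url(doc_id, i)
        for i, res in enumerate(resources)
    }
    entries = []
    for i, res in enumerate(resources):
        res = copy.deepcopy(res)
        _rewrite_references(res, urls)
        entries.append(
            {
                "fullUrl": full_url(doc_id, i),
                "resource": res,
                # Transaction entries carry a per-entry request block.
                "request": {"method": "POST", "url": res["resourceType"]},
            }
        )
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}
```

Because the URNs are name-based (UUIDv5), exporting the same document twice yields byte-identical `fullUrl` values, which keeps Bundles diffable and reproducible.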
+### Changed
+
+- `deidentify()` now routes through the staged policy pipeline and accepts policy, calibration, threshold, and audit controls. When `audit=True`, it returns an audit report rather than the regular `DeidentificationResult`.
+- `deidentify(..., keep_mapping=True)` now emits unique placeholders for repeated entities of the same type, such as `[NAME]` and `[NAME_2]`, so re-identification round trips can distinguish them.
+- Label metadata now carries policy labels, HIPAA Safe Harbor mappings, risk levels, and ID-number subtype hints while keeping canonical labels stable.
+- Benchmark steady-state latency now excludes cold start while preserving `latency.cold_start_ms` in reports.
+- PyPI publishing now uses a single guarded tag/manual `publish.yml` workflow; the duplicate release workflow was removed.
+- Release metadata now derives changelog sections and expected SemVer bumps from Conventional Commits.
+- Python linting/formatting moved to Ruff and pre-commit, Swift formatting moved to checked-in `swift-format` scripts, and CI now enforces the updated repo policy, lint, tests, security, secret-scan, Swift-format, and release-gate jobs.
+- Packaging now includes the model manifest, release-gate baseline, policy/schema JSON, `LICENSE`, and `NOTICE`.
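To illustrate the `keep_mapping=True` placeholder change, the sketch below shows one way numbered placeholders such as `[NAME]` and `[NAME_2]` could be assigned and then reversed. The function names `assign_placeholders` and `reidentify` are hypothetical, and reusing a placeholder for an identical surface string is an assumption; the changelog only states that repeated entities of the same type become distinguishable.

```python
def assign_placeholders(
    entities: list[tuple[str, str]],
) -> tuple[list[str], dict[str, str]]:
    """Return one placeholder per (label, text) entity plus a reverse mapping."""
    counts: dict[str, int] = {}
    seen: dict[tuple[str, str], str] = {}
    mapping: dict[str, str] = {}
    placeholders: list[str] = []
    for label, text in entities:
        key = (label, text)
        if key not in seen:
            counts[label] = counts.get(label, 0) + 1
            n = counts[label]
            # The first value of a type keeps the bare label; later ones get a suffix.
            placeholder = f"[{label}]" if n == 1 else f"[{label}_{n}]"
            seen[key] = placeholder
            mapping[placeholder] = text
        placeholders.append(seen[key])
    return placeholders, mapping


def reidentify(text: str, mapping: dict[str, str]) -> str:
    """Substitute each placeholder back with its original value."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

With a single shared `[NAME]` placeholder, two different names would collapse into one mapping entry and the round trip could not restore both; the numeric suffix avoids that collision.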
 ### Fixed

 - REST/MCP request schemas now accept `ar`, `ja`, and `tr` for the `lang` field. These languages have published PII models and are listed in `SUPPORTED_LANGUAGES`, but the `lang` `Literal` in `openmed/service/schemas.py` was never updated, so the service rejected them with a 422 even though the Python API and the models worked. The four `lang` annotations now share a single `PIILanguage` alias kept in sync with `SUPPORTED_LANGUAGES` (guarded by a regression test).
+- Fixed case-insensitive `trust_remote_code` allowlist matching for first-party and environment-configured privacy-filter repositories.
+- Fixed Feb 29 date shifting when `keep_year=True` targets a non-leap year.
+- Fixed REST oversized-text handling with `OPENMED_SERVICE_MAX_TEXT_LENGTH` (default `1_000_000` characters).
+- Fixed `BatchProcessor.iter_process` so `batch_size` is honored while preserving output order.
+- Fixed duplicate benchmark fixture IDs, duplicate benchmark CLI registration, release-gate behavior when no candidate report is present, and repo-policy ignored-file handling.
+- Fixed user-controlled HTML formatter escaping and validation false positives for legitimate long non-ASCII/CJK clinical text.
+- Fixed reversible `remove` mappings and repeated entity-type re-identification round trips when `keep_mapping=True`.
+
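The Feb 29 entry describes an edge case that is easy to reproduce with the standard library: a shifted date can land on Feb 29 of a leap year, and forcing it back into a non-leap original year is invalid. The sketch below shows one plausible handling, clamping to Feb 28. The function name `shift_date_keep_year` and the clamping choice are assumptions; the changelog does not say how the library resolves the case.

```python
from datetime import date, timedelta


def shift_date_keep_year(original: date, days: int) -> date:
    """Shift by `days`, then restore the original year (sketch of keep_year=True)."""
    shifted = original + timedelta(days=days)
    try:
        return shifted.replace(year=original.year)
    except ValueError:
        # Only Feb 29 can fail here: the original year is not a leap year.
        # Assumed fallback: clamp to Feb 28 of the original year.
        return date(original.year, 2, 28)
```

For example, shifting 2023-12-20 by 71 days lands on 2024-02-29; restoring the year 2023 would raise `ValueError` without the fallback.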
+### Security
+
+- Added a protected `hf-publish` environment and `HF_WRITE_TOKEN` policy for model publishing.
+- Added dependency license policy, `pip-audit` security gate with time-boxed ignores, and gitleaks CI/pre-commit secret scanning with a canary fixture.
+
+### Notes
+
+- `shift_dates` remains available as a compatibility alias; prefer `method="shift_dates"` in new code.
+- REST clients sending more than `OPENMED_SERVICE_MAX_TEXT_LENGTH` characters now receive a 422 response unless the limit is raised.
+- Full SHIELD/DUA datasets require approved or user-supplied access paths; restricted corpus rows are not vendored.
+- Release-gate candidates for v1.6.0 need release metadata, calibration evidence for masking/replacement profiles, span fixtures for G8, and quantization evidence for quantized formats.
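For REST clients affected by the new text-length limit, a client-side pre-check can avoid a round trip that ends in a 422. The sketch below mirrors the documented environment variable and default; the helper names `max_text_length` and `fits_limit` are hypothetical, and measuring length with `len()` (Unicode code points) is an assumption about how the service counts characters.

```python
import os

DEFAULT_MAX_TEXT_LENGTH = 1_000_000  # default stated in the changelog


def max_text_length() -> int:
    """Read the limit from the environment, falling back to the default."""
    raw = os.environ.get("OPENMED_SERVICE_MAX_TEXT_LENGTH")
    return int(raw) if raw else DEFAULT_MAX_TEXT_LENGTH


def fits_limit(text: str) -> bool:
    """Return True if the text would pass the assumed length check."""
    return len(text) <= max_text_length()
```

A client could split or reject documents for which `fits_limit` returns `False` before sending them, keeping in mind that the server's own configured limit is what actually applies.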