6 changes: 3 additions & 3 deletions refiner/app/services/ecr/README.md
@@ -36,7 +36,7 @@ The pipeline creates an `AugmentationContext` for each document before any work
All of this is per the eICR Data Augmentation Header template (`urn:hl7ii:2.16.840.1.113883.10.20.15.2.1.3:2025-11-01`):

- **`templateId`** — signals the document conforms to the augmentation header template
- **`id`** — new UUID with `assigningAuthorityName="ecr-refinement"`
- **`id`** — new UUID with `assigningAuthorityName="ecr-refiner"`
- **`effectiveTime`** — timestamp of the augmentation operation (with timezone)
- **`setId`** — new UUID (replaces original, or inserted if absent)
- **`versionNumber`** — reset to 1
@@ -57,9 +57,9 @@ Refinement attaches an unanchored `<footnote>` to every section in the refined e

The "what was configured" and "what actually happened" columns usually agree, but they can diverge. The most common divergence is the no-match case: a jurisdiction configures a section for refinement, the matching step finds nothing in the section that matches the configured codes, and the refiner stubs the section rather than preserving an orphaned narrative. The footnote makes that decision visible — a reviewer sees "Action: refine, Outcome: Refined; no matches found" in the same row and doesn't have to wonder why a refine-configured section came out empty.

The footnote ID is built from the section's LOINC code and the augmentation timestamp (`ecr-refinement-{loinc}-{timestamp}`), so every footnote in a refinement run is structurally tied to the augmentation author's `<time>` value. A consumer can verify document integrity by checking that all footnote IDs in a document carry the same timestamp the augmentation header advertises.
The footnote ID is built from the section's LOINC code and the augmentation timestamp (`ecr-refiner-{loinc}-{timestamp}`), so every footnote in a refinement run is structurally tied to the augmentation author's `<time>` value. A consumer can verify document integrity by checking that all footnote IDs in a document carry the same timestamp the augmentation header advertises.
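
The integrity check described above could be sketched roughly as follows. This is an illustration rather than the refiner's own code: it uses the standard library's `xml.etree.ElementTree` instead of lxml, the helper name `footnote_ids_match_timestamp` is hypothetical, and it assumes the caller has already read the augmentation author's `<time>` value out of the header.

```python
import re
import xml.etree.ElementTree as ET

HL7_NAMESPACE = "urn:hl7-org:v3"
FOOTNOTE_PREFIX = "ecr-refiner-"


def footnote_ids_match_timestamp(root: ET.Element, augmentation_time: str) -> bool:
    """Hypothetical check: do all refiner footnote IDs carry the run's timestamp?"""
    leading_digits = re.match(r"\d+", augmentation_time)
    if leading_digits is None:
        return False
    digits = leading_digits.group(0)

    # ecr-refiner-{loinc}-{digits}, optionally followed by a -{n} suffix
    pattern = re.compile(rf"^{re.escape(FOOTNOTE_PREFIX)}.+-{digits}(-\d+)?$")

    refiner_ids = [
        footnote.get("ID", "")
        for footnote in root.iter(f"{{{HL7_NAMESPACE}}}footnote")
        if footnote.get("ID", "").startswith(FOOTNOTE_PREFIX)
    ]
    # an empty list means no refiner footnotes were found, which is not a pass
    return bool(refiner_ids) and all(pattern.match(i) for i in refiner_ids)
```

Footnotes whose IDs do not start with `ecr-refiner-` are ignored here, on the assumption that they came from the source document rather than from the refiner.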

The user-facing labels for the configuration source and the runtime outcome live in `section/constants.py` as small dicts keyed by enum values. Editing the copy is one file change with no code touches.
The user-facing labels for the configuration source and the runtime outcome live in `narrative/constants.py` as small dicts keyed by enum values. Editing the copy is one file change with no code touches.
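
As a rough picture of that arrangement, the sketch below keys a label dict by enum values and falls back to the raw enum text when no label is defined. The enum `SectionOutcome`, its members, and the dict name `OUTCOME_NOTES` are hypothetical stand-ins; only the "Refined; no matches found" wording is taken from the text above.

```python
from enum import Enum


class SectionOutcome(Enum):
    """Hypothetical stand-in for the refiner's outcome enum."""

    REFINED = "refined"
    REFINED_NO_MATCH = "refined_no_match"
    RETAINED = "retained"


# the copy lives in one dict, so rewording a label touches only this mapping
OUTCOME_NOTES: dict[SectionOutcome, str] = {
    SectionOutcome.REFINED: "Refined",
    SectionOutcome.REFINED_NO_MATCH: "Refined; no matches found",
}


def outcome_label(outcome: SectionOutcome) -> str:
    """Look up the display label, falling back to the raw enum text."""
    return OUTCOME_NOTES.get(outcome, str(outcome))
```

The `.get(..., str(...))` fallback means a newly added enum member still renders something readable before its label is written.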

## Supporting modules

61 changes: 61 additions & 0 deletions refiner/app/services/ecr/narrative/README.md
@@ -0,0 +1,61 @@
# Section narrative writers

This package owns every transformation the refiner makes to a CDA
`<section>`'s human-readable narrative `<text>` element. A section's `<text>`
is what a reviewer sees when they open a refined eICR in a CDA stylesheet; the
machine-readable `<entry>` elements are handled elsewhere (the matching
engines). These writers decide what story the `<text>` tells about what the
refiner did.

## Why this is its own module

Everything that touches a section's `<text>` lives here so the narrative
behavior — and the CDA R2 validity rules it has to respect — can be reasoned
about in one place. The matching engines (`entry_matching`, `generic_matching`)
and the orchestrator (`refine.py`) call into this package; they never build
narrative elements directly.

## Layout

- **`elements.py`**: the shared low-level primitives. `_make_element` /
  `_sub_element` emit namespace-qualified elements (every node written into
  `<text>` must carry the `urn:hl7-org:v3` namespace or it fails
  `NarrativeBlock.xsd`). `_ensure_text_element` places a `<text>` in the
  correct CDA R2 `xs:sequence` slot. `remove_all_comments` scrubs stale source
  comments. Every other module here builds on these.

- **`footnote.py`**: the per-section provenance footnote. Refinement attaches
  an unanchored `<footnote>` to every section (refined, retained, removed, or
  narrative-stripped) carrying a one-row table: what the jurisdiction
  configured vs. what the refiner actually did. The footnote's `xs:ID` encodes
  the augmentation run's timestamp so a consumer can structurally tie every
  footnote to the document's augmentation header.

- **`writers.py`**: the narrative-body writers that replace or stub a
  section's `<text>`:
  - `replace_narrative_with_removal_notice`: strip the narrative to a notice
    while keeping clinical entries for machine processing.
  - `restore_narrative`: put back a saved `<text>` deep copy (the generic
    matching path clears `<text>` during processing to avoid false matches,
    then restores it).
  - `create_minimal_section`: reduce a section to a `nullFlavor="NI"` stub
    with a status message (no match found, or configured for removal).
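
To make the namespace requirement concrete, here is a minimal sketch of building a qualified element with Clark notation. It uses the standard library's `xml.etree.ElementTree` rather than lxml, and `make_element` is an illustrative stand-in, not the package's `_make_element`.

```python
import xml.etree.ElementTree as ET

HL7_NAMESPACE = "urn:hl7-org:v3"


def make_element(local_name: str, **attribs: str) -> ET.Element:
    """Create a detached element in the HL7 v3 namespace."""
    # Clark notation: "{namespace-uri}local-name"
    element = ET.Element(f"{{{HL7_NAMESPACE}}}{local_name}")
    for key, value in attribs.items():
        element.set(key, value)
    return element


qualified = make_element("content", styleCode="Bold")
bare = ET.Element("content")  # no namespace: the mistake the helpers prevent

print(qualified.tag)  # {urn:hl7-org:v3}content
print(bare.tag)       # content
```

Both calls succeed, which is why routing every element through one factory is safer than relying on each call site to remember the namespace.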

## Invariants

- **Namespace everything.** All emitted elements go through
  `_make_element` / `_sub_element`. An element created without the
  `urn:hl7-org:v3` namespace raises no error when it is built, but it fails
  `NarrativeBlock.xsd` validation.
- **Respect the `xs:sequence`.** A `<text>` must sit after `<title>` (or
  after `<code>` when there is no `<title>`) in the CDA R2 section content
  model (`POCD_MT000040.Section`). Insertion always goes through the placement
  helpers rather than a bare `append`.
- **These functions mutate the section in place.** This is consistent with the
  rest of the `ecr` service; the pipeline owns parse/serialize.
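
The placement rule can be sketched as below. This is an approximation under stated assumptions: it uses the standard library's `xml.etree.ElementTree`, so the sibling insertion is done with `insert` at a computed index, and `ensure_text` is an illustrative name rather than the package's helper.

```python
import xml.etree.ElementTree as ET

HL7_NAMESPACE = "urn:hl7-org:v3"


def qualified(local_name: str) -> str:
    """Return the Clark-notation tag for an HL7 v3 element."""
    return f"{{{HL7_NAMESPACE}}}{local_name}"


def ensure_text(section: ET.Element) -> ET.Element:
    """Return the section's <text>, creating it in sequence order if absent."""
    existing = section.find(qualified("text"))
    if existing is not None:
        return existing

    text = ET.Element(qualified("text"))
    # prefer the slot right after <title>; fall back to right after <code>
    for anchor_name in ("title", "code"):
        anchor = section.find(qualified(anchor_name))
        if anchor is not None:
            section.insert(list(section).index(anchor) + 1, text)
            return text

    # last resort: neither anchor exists, so append
    section.append(text)
    return text
```

A bare `append` would put `<text>` after any `<entry>` children, which is the ordering the invariant is meant to rule out.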

## Planned: narrative reconstruction

A third narrative disposition — reconstruct the `<text>` from the entries that
survived refinement — will land here as a `reconstruction.py` peer of
`writers.py`, built on the same `elements.py` primitives. See
`docs/decisions/0010_2026-06-05_narrative-reconstruction.md` for the design
(typed-value renderer + per-`template_id` field maps + per-section joins).
19 changes: 19 additions & 0 deletions refiner/app/services/ecr/narrative/__init__.py
@@ -0,0 +1,19 @@
from .elements import remove_all_comments
from .footnote import append_section_provenance_footnote
from .reconstruction import reconstruct_narrative
from .writers import (
    create_minimal_section,
    replace_narrative_with_reconstruction,
    replace_narrative_with_removal_notice,
    restore_narrative,
)

__all__ = [
    "append_section_provenance_footnote",
    "create_minimal_section",
    "reconstruct_narrative",
    "remove_all_comments",
    "replace_narrative_with_reconstruction",
    "replace_narrative_with_removal_notice",
    "restore_narrative",
]
@@ -33,18 +33,8 @@
# NOTE:
# TABLE HEADERS
# =============================================================================
# column headers for the narrative tables the refiner writes. the clinical
# data table headers describe the columns for the refined clinical content
# table; the provenance table headers describe the columns for the
# per-section provenance footnote table

CLINICAL_DATA_TABLE_HEADERS: Final[list[str]] = [
"Display Text",
"Code",
"Code System",
"Is Trigger Code",
"Matching Condition Code",
]
# column headers for the per-section provenance footnote table the refiner
# writes into every section's narrative

PROVENANCE_TABLE_HEADERS: Final[list[str]] = [
"Section (LOINC)",
105 changes: 105 additions & 0 deletions refiner/app/services/ecr/narrative/elements.py
@@ -0,0 +1,105 @@
from lxml import etree
from lxml.etree import _Element

from app.services.format import remove_element

from ..model import (
    HL7_NAMESPACE,
    HL7_NS,
)

# NOTE:
# ELEMENT FACTORY HELPERS
# =============================================================================
# every element emitted into <text> must be qualified with the HL7 v3
# namespace for NarrativeBlock.xsd validation to pass


def _make_element(local_name: str, **attribs: str) -> _Element:
    """
    Create a namespace-qualified narrative element.

    Returns a detached element in the urn:hl7-org:v3 namespace. Use
    `_sub_element` instead when the new element should be appended
    to an existing parent.
    """

    element = etree.Element(f"{{{HL7_NAMESPACE}}}{local_name}")
    for key, value in attribs.items():
        element.set(key, value)
    return element


def _sub_element(parent: _Element, local_name: str, **attribs: str) -> _Element:
    """
    Create a namespace-qualified child element appended to `parent`.

    Thin wrapper around etree.SubElement that applies Clark notation
    for the HL7 v3 namespace, matching the pattern used in augment.py.
    """

    element = etree.SubElement(parent, f"{{{HL7_NAMESPACE}}}{local_name}")
    for key, value in attribs.items():
        element.set(key, value)
    return element


# NOTE:
# TEXT PLACEMENT HELPERS
# =============================================================================


def _ensure_text_element(section: _Element) -> _Element:
    """
    Return the section's <text> element, creating one if absent.

    If the section has no <text>, a new empty <text> is created and
    inserted after <title> per the CDA R2 xs:sequence for
    POCD_MT000040.Section: templateId -> id -> code -> title -> text ->
    confidentialityCode -> languageCode -> subject -> author ->
    informant -> entry -> component.

    If there is no <title> either, the <text> is inserted after
    <code>, which is the next-earliest required element in the
    sequence. Last resort: append to the section.
    """

    text_element = section.find("hl7:text", namespaces=HL7_NS)
    if text_element is not None:
        return text_element

    text_element = _make_element("text")

    title_element = section.find("hl7:title", namespaces=HL7_NS)
    if title_element is not None:
        title_element.addnext(text_element)
        return text_element

    code_element = section.find("hl7:code", namespaces=HL7_NS)
    if code_element is not None:
        code_element.addnext(text_element)
        return text_element

    section.append(text_element)
    return text_element


# NOTE:
# COMMENT CLEANUP
# =============================================================================


def remove_all_comments(section: _Element) -> None:
    """
    Remove all XML comments from a processed section.

    After refining a section, inline comments left over from the source
    document may no longer be accurate or relevant. This scrubs them
    so the refined output doesn't carry misleading annotations forward.
    """

    xpath_result = section.xpath(".//comment()")
    if isinstance(xpath_result, list):
        for comment in xpath_result:
            if isinstance(comment, etree._Element):
                remove_element(comment)
169 changes: 169 additions & 0 deletions refiner/app/services/ecr/narrative/footnote.py
@@ -0,0 +1,169 @@
import re

from lxml.etree import _Element

from ..model import SectionProvenanceRecord
from .constants import (
    PROVENANCE_LABEL,
    PROVENANCE_OUTCOME_NOTES,
    PROVENANCE_SOURCE_NOTES,
    PROVENANCE_TABLE_HEADERS,
)
from .elements import _ensure_text_element, _sub_element

# NOTE:
# PROVENANCE FOOTNOTE
# =============================================================================
# every section in the refined document carries a trailing <footnote>
# documenting how the refiner treated it. the footnote is unanchored:
# no <footnoteRef> points to it. This represents "annotation attached
# to the section as a whole" — valid per NarrativeBlock.xsd's
# StrucDoc.Text and StrucDoc.Footnote content models (both allow
# footnote as an optional child with no anchoring requirement) and
# sidesteps the need to walk arbitrary source narrative looking for
# anchor points, which would be fragile across eICR vendors
#
# the footnote's xs:ID ties it to the augmentation run's timestamp,
# giving the two structural consistency that a consumer can verify
# programmatically (e.g., "every refiner footnote ID should contain
# the timestamp present in the augmentation author's <time> value")
#
# the footnote's data row carries both the configured action ("what
# the jurisdiction asked for") and the runtime outcome ("what the
# refiner actually did"). the two columns let a reader see at a glance
# whether a refiner policy override fired — most rows show the outcome
# confirming the configuration, but the no-match policy override
# produces an outcome that diverges from the configured action


def _build_footnote_id(
    loinc_code: str,
    augmentation_timestamp: str,
    occurrence_index: int = 0,
) -> str:
    """
    Build a document-unique xs:ID for a refiner provenance footnote.

    The ID is of the form
    `ecr-refiner-{loinc}-{timestamp-digits}`, optionally with a
    `-{n}` suffix for the rare case where the same LOINC appears on
    multiple top-level sections in a single document. The timestamp
    digits are extracted from the augmentation author's <time> value
    (HL7 V3 `YYYYMMDDHHMMSS±ZZZZ` format) by keeping the leading
    run of digits. The timezone offset is dropped because `+` is not
    a legal xs:ID character and the offset digits are not wanted in
    the ID. If the value has no leading digits, the timestamp part of
    the ID is left empty.

    xs:ID cannot start with a digit or hyphen, so the `ecr-refiner-`
    prefix is load-bearing: it ensures the resulting string always
    satisfies the XML Name production.

    Args:
        loinc_code: The section's LOINC code (e.g., "46240-8").
        augmentation_timestamp: The augmentation author's time value,
            shared across all footnotes in this refinement run.
        occurrence_index: Zero-based disambiguator for the rare case
            where the same LOINC appears on multiple top-level
            sections. Zero (the normal case) produces no suffix;
            nonzero values append `-N`.

    Returns:
        A document-unique xs:ID-safe string.
    """

    match = re.match(r"^\d+", augmentation_timestamp)
    timestamp_digits = match.group(0) if match else ""
    base = f"ecr-refiner-{loinc_code}-{timestamp_digits}"
    return base if occurrence_index == 0 else f"{base}-{occurrence_index}"


def append_section_provenance_footnote(
    section: _Element,
    provenance: SectionProvenanceRecord,
    augmentation_timestamp: str,
    occurrence_index: int = 0,
) -> None:
    """
    Append an unanchored <footnote> carrying refiner provenance.

    Called by refine_eicr after processing every section (refine,
    retain, remove, narrative-removed) so that every section in the
    refined document carries a consistent provenance record.

    The footnote contains a bolded label paragraph followed by a
    single-row table summarizing the jurisdiction's configuration
    and the runtime outcome for this section. The table follows
    NarrativeBlock.xsd's StrucDoc.Table content model with proper
    <thead>/<th> header semantics and <tbody>/<tr>/<td> body rows.

    The provenance record passed in must have its `outcome` field
    finalized; refine_eicr does this via dataclasses.replace before
    calling this function. If the field still holds its default
    value at render time, that's a bug in refine_eicr's
    interpretation logic, not in this function.

    If the section has no <text> element (e.g., a retained section
    where the source document omitted it), one is created and inserted
    per `_ensure_text_element`'s CDA R2 xs:sequence rules.

    Args:
        section: The section element to annotate.
        provenance: The SectionProvenanceRecord built during plan
            creation and finalized by refine_eicr.
        augmentation_timestamp: The augmentation run's <time> value,
            shared across all footnotes in this refinement run.
        occurrence_index: Disambiguator for repeated-LOINC sections;
            zero for the normal case.
    """

    text_element = _ensure_text_element(section)

    footnote_id = _build_footnote_id(
        loinc_code=provenance.loinc_code,
        augmentation_timestamp=augmentation_timestamp,
        occurrence_index=occurrence_index,
    )
    footnote = _sub_element(text_element, "footnote", ID=footnote_id)

    # bolded label paragraph
    label_paragraph = _sub_element(footnote, "paragraph")
    label_content = _sub_element(label_paragraph, "content", styleCode="Bold")
    label_content.text = PROVENANCE_LABEL

    # provenance table
    table = _sub_element(footnote, "table", border="1")
    thead = _sub_element(table, "thead")
    header_row = _sub_element(thead, "tr")
    for header in PROVENANCE_TABLE_HEADERS:
        th = _sub_element(header_row, "th")
        th.text = header

    tbody = _sub_element(table, "tbody")
    row = _sub_element(tbody, "tr")
    _add_provenance_cell(row, provenance.loinc_code)
    _add_provenance_cell(row, provenance.display_name)
    _add_provenance_cell(row, "Yes" if provenance.include else "No")
    _add_provenance_cell(row, provenance.action)
    _add_provenance_cell(row, "Yes" if provenance.narrative == "retain" else "No")
    _add_provenance_cell(
        row,
        f"v{provenance.config_version}"
        if provenance.config_version is not None
        else "—",
    )
    _add_provenance_cell(
        row,
        PROVENANCE_SOURCE_NOTES.get(provenance.source, str(provenance.source)),
    )
    _add_provenance_cell(
        row,
        PROVENANCE_OUTCOME_NOTES.get(provenance.outcome, str(provenance.outcome)),
    )


def _add_provenance_cell(row: _Element, text: str) -> None:
    """
    Append a single <td> with text content to a provenance table row.
    """

    td = _sub_element(row, "td")
    td.text = text