Skip to content

Defer context.unload() until after QA (keep the terminology cache from being overwritten mid-build)#1318

Draft
jmandel wants to merge 1 commit into
HL7:masterfrom
jmandel:fix/defer-context-unload-until-after-qa
Draft

Defer context.unload() until after QA (keep the terminology cache from being overwritten mid-build)#1318
jmandel wants to merge 1 commit into
HL7:masterfrom
jmandel:fix/defer-context-unload-until-after-qa

Conversation

@jmandel

@jmandel jmandel commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Problem

context.unload() is called in the generate phase's "Reclaiming memory…" block (PublisherGenerator), but QA (ValidationPresenter) runs afterwards in Publisher.createIg() and still uses the terminology context. unload() flushes the tx cache to disk and then clears the in-memory caches; QA then repopulates fresh, partial caches, and TerminologyCache's write-coalescing save() writes those back — overwriting the complete cache files unload() had just written.

Net effect: most persisted terminology results are discarded at the end of every build, so warm/subsequent builds re-query the tx server for answers the previous build already computed. Measured on US mCODE: a cold build performs ~2,600 persistent stores but only ~230 survive on disk; the SNOMED bucket grows to ~1,029 entries, is flushed in full, then overwritten down to ~86.

Fix

Move pf.context.unload() from the generate-phase reclaim block to the end of createIg() (after QA — the last terminology consumer). The final flush then captures everything and the persisted cache stays complete. The only cost is holding the terminology context through QA, which QA needs anyway.

Impact (measured)

We profiled cold-vs-warm terminology traffic across a random sample of published HL7 IGs. Warm-cache effectiveness (the fraction of cold tx round-trips a warm build avoids) was poor precisely for terminology-heavy clinical IGs:

  • light terminology: registry-protocols 100%, smart-scheduling-links 97%, smart-app-launch 89%
  • heavy terminology: US-Core 50%, physical-activity 44%, CARIN-BB 36%, IPS 30%, PQ-CMC-FDA 28%, mCODE 21%, ICHOM-breast-cancer 18%

The low numbers are explained by this bug. Demonstration on mCODE (baseline vs this fix, cold then warm, identical inputs, requests captured via a logging proxy):

warm tx calls on-disk cache entries (after cold) QA result
baseline 2,169 234 8 errors / 117 warnings
with this fix 69 2,532 8 errors / 117 warnings

Warm terminology-server calls drop ~97%, the complete cache persists (~10× more entries on disk), and validation output is unchanged.

Notes

  • Minimal change — moves one line (plus a comment and a release note).
  • The underlying TerminologyCache.save() full-file overwrite is also a latent fragility (a partial re-save after unload() clobbers the complete file); that could be separately hardened in org.hl7.fhir.core, but this publisher change fixes the root cause.
  • Marked draft pending review and a regression test.

@grahamegrieve

Copy link
Copy Markdown
Contributor

QA (ValidationPresenter) runs afterwards in Publisher.createIg() and still uses the terminology context.

really? where? how to reproduce this?

@jmandel

jmandel commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

Thanks @grahamegrieve — here's a complete, runnable reproduction: published 2.2.8 vs the fix, on a fresh mCODE clone, no proxy. Metric = tx.fhir.org round-trips, counted with grep -c x-request-id output/qa-tx.html (every server response carries an x-request-id).

mCODE (fresh clone, no proxy) cold tx warm tx warm build time txcache after cold QA
published 2.2.8 1280 1075 6m 44s 5.0 MB / 254 entries 6 err / 117 warn
2.2.8 + fix 1210 63 4m 29s 14 MB / 2555 entries 6 err / 117 warn

Warm: 1075 → 63 calls (~94% fewer) and 6m44s → 4m29s (~33% faster — all of it in the terminology phases: generate −2m16s, validation −1m03s, narrative −25s). QA is byte-identical, so it's pure cache reuse, not changed output.

Reproduce (simple commands)

# --- bug: published publisher ---
curl -sL -o publisher.jar \
  https://github.com/HL7/fhir-ig-publisher/releases/download/2.2.8/publisher.jar
git clone --depth 1 https://github.com/HL7/fhir-mCODE-ig mcode      # SUSHI compiles clean
rm -rf mcode/input-cache/txcache
java -jar publisher.jar -ig mcode; grep -c x-request-id mcode/output/qa-tx.html   # 1280  cold
java -jar publisher.jar -ig mcode; grep -c x-request-id mcode/output/qa-tx.html   # 1075  warm

# --- fix: build it and repeat ---
git clone -b repro/defer-unload-on-2.2.8 \
  https://github.com/jmandel/fhir-ig-publisher fixpub
( cd fixpub && mvn -q -DskipTests clean package )
FIX=fixpub/org.hl7.fhir.publisher.cli/target/org.hl7.fhir.publisher.cli-2.2.8.jar
rm -rf mcode/input-cache/txcache
java -jar "$FIX" -ig mcode; grep -c x-request-id mcode/output/qa-tx.html          # 1210  cold
java -jar "$FIX" -ig mcode; grep -c x-request-id mcode/output/qa-tx.html          # 63    warm

What's happening (correction to my original description)

The late terminology consumer isn't QA / ValidationPresenter — it's genCombinedPackage() (PublisherGenerator.java:468), which runs in the same generate() method after the unload() at PublisherGenerator.java:379. unload() flushes the complete cache to disk and then clear()s it; genCombinedPackage's value-set $expands repopulate a partial cache whose coalesced save() overwrites the complete .cache files. Net result on disk: 254 of the 2555 entries the build computed survive (~90% discarded), so the next build reloads the partial cache and re-queries the rest. The fix just moves unload() to the end of createIg(), after both genCombinedPackage and QA — same final flush, but nothing runs after it to overwrite the cache.

One caveat on building against master

I couldn't run mCODE on a master build: current master (core 6.9.10-SNAPSHOT) hard-aborts publishing any R4 IG that emits ttl with "Turtle serialization for R4 is not supported in this build" (org.hl7.fhir.r5.elementmodel.TurtleParser.compose, before any of this cache code runs). So the numbers above come from the identical 2-line change applied on the 2.2.8 release base (which builds against released core 6.9.8, no auth, no Turtle issue). Flagging that R4-Turtle abort separately in case it isn't intended — it's unrelated to this PR.

@grahamegrieve

grahamegrieve commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

The actual problem is that genCombinedPackage() should not be called after unload, and it should be an error if the TerminologyCache is used after unload()

genCombinedPackage() (in generator.generate()) and QA both use the terminology
context, but context.unload() was called earlier, in the generate phase's
"reclaim memory" block. unload() flushes the complete tx cache to disk then
clears it; genCombinedPackage()'s value-set expansions then repopulate a partial
cache whose coalesced save() overwrites the complete .cache files, so warm builds
re-query everything already computed (mCODE: warm tx calls 1075 -> 63, build
6m44s -> 4m29s, QA unchanged).

Move unload() to the end of createIg(), after genCombinedPackage and QA.

Companion core guard (makes post-unload tx-cache use a hard error so this class
of ordering bug fails loudly): hapifhir/org.hl7.fhir.core#2473

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jmandel jmandel force-pushed the fix/defer-context-unload-until-after-qa branch from 33ddc8d to fb90a5d Compare June 5, 2026 00:49
@jmandel

jmandel commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Done — updated this branch and opened the core guard as a separate PR.

This branch (#1318): the rewrite is the same (move pf.context.unload() to the end of createIg(), after every terminology consumer). I corrected the code comments to attribute the late consumer to genCombinedPackage() rather than QA, and cross-referenced the core PR.

Core guard — hapifhir/org.hl7.fhir.core#2473 (draft): implements your "it should be an error if the TerminologyCache is used after unload()". unload() sets an unloaded flag; the four cache entry points (get/cacheExpansion, get/cacheValidation) throw if used afterward; unit test added (existing 42 TerminologyCache tests still pass, on master).

I made it an AssertionError rather than a RuntimeException on purpose: as a RuntimeException it was swallowed by the per-expansion catch (Exception) in genCombinedPackage — 102 logged errors but a "successful" build with a degraded expansions package. An AssertionError escapes that catch and fails the build, with a stack that pinpoints the culprit:

java.lang.AssertionError: Terminology cache used after unload()...
  at TerminologyCache.getExpansion(...)
  at BaseWorkerContext.expandVS(...)
  at PackageReGenerator.generateExpansionPackageInner(...)
  at PublisherGenerator.genCombinedPackage(...)
  at Publisher.createIg(...)

Verified end-to-end on mCODE: buggy ordering + guard → hard fail (above); this branch + guard → builds clean (guard never fires). Since no IG content can trigger it (it's purely a code-ordering invariant), failing hard is safe and turns any future stray post-unload() path into an immediate build failure.

Sequencing: #2473 is a draft because the publisher should adopt it together with this PR — on its own it would hard-fail every terminology-bearing R4 build until this ordering fix is in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

2 participants