test(server): Scenario based testing to mirror Tim's tests #1286
robertmitchellv wants to merge 25 commits into
🔒 Security Scan Results
| Severity | Total |
|---|---|
| 🟠 High | 10 |
| 🟡 Medium | 4 |
📦 refiner-app
✅ No vulnerabilities found
📦 refiner-lambda
| Severity | Count |
|---|---|
| 🟠 High | 1 |
| 🟡 Medium | 2 |
📦 refiner-ops
| Severity | Count |
|---|---|
| 🟠 High | 9 |
| 🟡 Medium | 2 |
View detailed results: Security tab
Last updated: 2026-06-05 20:45:32 UTC
Some of my initial thoughts & observations:
    assert result.trace.refinement_outcome == "refined"
    assert result.trace.configuration_resolved is True
    assert result.trace.configuration_version == CONFIGURATION_VERSION
    assert result.trace.canonical_url == COVID_CANONICAL_URL
    assert result.trace.skip_reason is None
    assert result.trace.error_detail is None

    # size reduction must be populated and positive: refining for COVID
    # against a multi-condition eICR drops the non-COVID content.
    assert result.trace.eicr_size_reduction_percentage is not None
    assert result.trace.eicr_size_reduction_percentage > 0
Is it worth making these checks? Are we able to make it this far if a lot of these values are wrong or missing?
you're not wrong; this was the first thing that i worked on to make sure i had everything up and running so i can test. i decided to leave it in as more or less a smoke test but i would also not be opposed to removing it
I was going back and forth on this. The thing that is a bit odd about trace is that it kind of acts as both a refining input but also a logging tool.
If refining can't go on without the values being correct I would say it's fine to remove.
i whittled this down so let me know if it works for you
I think I'm team script over commit when it comes to configurations for a few reasons.
If we do do the above, Jake's point of moving things into
Do really love the report generation though, think that'll be super helpful once we get it going
🔀 PULL REQUEST
💡 Summary
Adds a scenarios test suite at `tests/integration/scenarios/` that exercises the production refinement pipeline--the same path lambda runs--against committed eICR/RR fixtures. Each scenario authors its configuration through the live API (create → customize → activate → read the activation payload back from localstack), then asserts in two layers: validation (CDA R2 XSD + schematron) and snapshot comparison against committed expected files.

Configurations are built from declarative recipes rather than committed JSON: a `Scenario` names the condition plus the custom codes, section-processing overrides, and associated conditions to layer on, and a shared `build_scenario_configuration` fixture interprets the recipe against the running app. This gives a single authoring path that uses the exact serialization production uses, with no committed `active.json` to drift--updating a scenario means editing its recipe, not hand-editing or re-downloading JSON.

The suite contains 8 snapshot scenarios on the `all_sections_COVID_INFLUENZA` fixture, 8 explicit named-assertion tests, and 3 smoke tests pinning the harness itself (19 total). Every refined document is XSD- and schematron-validated on every test run. An auto-generated `REPORT.md` summarizes the suite for stakeholder review.

Because the suite lives under `tests/integration/`, it runs as part of the existing integration CI job automatically--no separate CI wiring needed, and these tests passed in CI on this PR.

All six issues from the original testing roll-up sheet are directly covered: entry matching across OID mismatch, custom codes in nested locations (`entryRelationship/value` and `substanceAdministration`), procedure retention via unrelated `entryRelationship` codes, vital sign panel pruning, code-set bloat, and schematron conformance of refined output. The previously informal "Tim runs through a spreadsheet manually" workflow is now a CI-runnable suite.

🔗 Related Issue
Fixes #1240
✅ Acceptance Criteria
`REPORT.md`) is generated deterministically from the committed snapshots.

🧪 How to test
1. `cd refiner/`.
2. `requirements.txt` and `dev-requirements.txt`.
3. `tests/integration/`.
4. `pytest tests/integration/scenarios/`.
5. `tests/integration/scenarios/REPORT.md` for the stakeholder-facing summary of what the suite pins.
6. `tests/integration/scenarios/snapshots/<fixture>/<scenario>/` to see committed expected output per scenario (`expected_trace.json` is the most legible starting point).
7. `tests/integration/scenarios/README.md` for the full developer workflow: authoring new scenarios, adding new fixtures, regenerating the report.

To verify the snapshot regeneration workflow:
The diff should be empty if the committed state is current; a non-empty diff indicates drift between committed snapshots and current refinement output, or between snapshots and the report.
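The empty-diff check described above can be pictured with a small snapshot helper. This is only a sketch under assumptions: `check_snapshot` and its `update` flag are hypothetical names for illustration and are not taken from the suite, which may compare XML as well as JSON and may regenerate snapshots differently.

```python
import difflib
import json
from pathlib import Path


def check_snapshot(actual: dict, snapshot_path: Path, update: bool = False) -> list[str]:
    """Compare `actual` to a committed snapshot file.

    Returns unified-diff lines; an empty list means no drift.
    Hypothetical helper for illustration, not the suite's own API.
    """
    # Render with sorted keys so the same data always yields the same text.
    rendered = json.dumps(actual, indent=2, sort_keys=True) + "\n"

    if update:
        # Regeneration mode: overwrite the snapshot with current output.
        snapshot_path.parent.mkdir(parents=True, exist_ok=True)
        snapshot_path.write_text(rendered, encoding="utf-8")
        return []

    # A missing snapshot is treated as empty, so it surfaces as drift.
    expected = snapshot_path.read_text(encoding="utf-8") if snapshot_path.exists() else ""
    if expected == rendered:
        return []

    return list(
        difflib.unified_diff(
            expected.splitlines(),
            rendered.splitlines(),
            fromfile="committed",
            tofile="current",
            lineterm="",
        )
    )
```

Serializing with sorted keys keeps the comparison stable across runs, so a non-empty result points at a real change in output rather than at key ordering.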
ℹ️ Additional Information
Design choices worth flagging for review:
- `active.json` to maintain.
- `20260101000000+0000`) so deterministic UUIDv5 identifier derivation produces stable snapshots run-to-run. Without this, snapshots would never match between runs.
- `SDDH`), not the RR's reportable-to jurisdiction. The fixture RR may be reportable elsewhere; that is intentionally not consulted, so arbitrary test data can be reused without standing up matching fake jurisdictions.
- `current.json` rather than assuming version 1, so tests that build more than one configuration for the same condition resolve each correctly.
- `test_explicit_assertions.py` complement the snapshot scenarios. They build configs through the same recipe path and pin specific named behaviors with precondition guards that fail diagnostically if the fixture or configuration drifts in a way that makes the test no longer exercise its named concern.

Deferred to follow-up PRs (small, well-scoped):
- `python tests/integration/scenarios/authoring/build_report.py` and fails on `git diff --exit-code REPORT.md` to prevent the report from going stale relative to snapshots. A `TODO` at the top of `build_report.py` flags this.
- `normalize_xml` (currently imported from `tests/unit/conftest.py`) to a shared `tests/_helpers/` module. The scenarios suite still reaches into the unit suite for this one helper; functional but architecturally awkward. (The integration validation helpers are now inherited/imported from the parent integration conftest, which is the suite's natural parent.)

Runtime: 19 tests; the suite now runs on the integration path (docker compose / localstack) rather than infra-free, since configs are authored through the live API. Each scenario does a full create → activate → fetch cycle; if the suite grows substantially, a session-scoped cache of built configurations keyed by scenario name is the natural mitigation.
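
A cache of built configurations keyed by scenario name could look roughly like the sketch below. Everything here is an assumption for illustration: the `Scenario` fields, the `ConfigurationCache` class, and the `build` callable are hypothetical stand-ins, not the suite's actual `Scenario` or `build_scenario_configuration`.

```python
import copy
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Scenario:
    """Hypothetical recipe shape; the real Scenario may carry other fields."""
    name: str
    condition: str
    custom_codes: tuple[str, ...] = ()


class ConfigurationCache:
    """Builds each scenario's configuration at most once per cache instance."""

    def __init__(self, build: Callable[[Scenario], dict]) -> None:
        # `build` stands in for the expensive create -> activate -> fetch cycle.
        self._build = build
        self._built: dict[str, dict] = {}

    def get(self, scenario: Scenario) -> dict:
        if scenario.name not in self._built:
            self._built[scenario.name] = self._build(scenario)
        # Return a deep copy so one test cannot mutate what another test sees.
        return copy.deepcopy(self._built[scenario.name])
```

In pytest, one instance could be exposed through a fixture declared with `scope="session"` so that it lives for the whole run. Keying by name alone assumes scenario names are unique; two recipes that share a name would silently share one configuration.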