test(server): Scenario based testing to mirror Tim's tests by robertmitchellv · Pull Request #1286 · CDCgov/dibbs-ecr-refiner

robertmitchellv · 2026-05-29T05:54:48Z

🔀 PULL REQUEST

💡 Summary

Adds a scenarios test suite at tests/integration/scenarios/ that exercises the production refinement pipeline--the same path lambda runs--against committed eICR/RR fixtures. Each scenario authors its configuration through the live API (create → customize → activate → read the activation payload back from localstack), then asserts in two layers: validation (CDA R2 XSD + schematron) and snapshot comparison against committed expected files.

Configurations are built from declarative recipes rather than committed JSON: a Scenario names the condition plus the custom codes, section-processing overrides, and associated conditions to layer on, and a shared build_scenario_configuration fixture interprets the recipe against the running app. This gives a single authoring path that uses the exact serialization production uses, with no committed active.json to drift--updating a scenario means editing its recipe, not hand-editing or re-downloading JSON.

The suite contains 8 snapshot scenarios on the all_sections_COVID_INFLUENZA fixture, 8 explicit named-assertion tests, and 3 smoke tests pinning the harness itself (19 total). Every refined document is XSD- and schematron-validated on every test run. An auto-generated REPORT.md summarizes the suite for stakeholder review.

Because the suite lives under tests/integration/, it runs as part of the existing integration CI job automatically--no separate CI wiring needed, and these tests passed in CI on this PR.

All six issues from the original testing roll-up sheet are directly covered: entry matching across OID mismatch, custom codes in nested locations (entryRelationship/value and substanceAdministration), procedure retention via unrelated entryRelationship codes, vital sign panel pruning, code-set bloat, and schematron conformance of refined output. The previously informal "Tim runs through a spreadsheet manually" workflow is now a CI-runnable suite.

🔗 Related Issue

Fixes #1240

[TASK] Set up mirror of Tim's testing in our CI/CD #1240

✅ Acceptance Criteria

A new test layer exercises the production refinement pipeline end-to-end against committed fixtures, with configurations authored through the live API so there is a single authoring path and no committed configuration JSON to drift.
Snapshots pin refinement behavior (entry matching, identifier derivation, section pruning) such that regressions surface as snapshot diffs rather than silent drift.
All six issues from the original roll-up sheet are directly covered, with five additionally backed by named explicit assertions that fail with a behavior-level message rather than an opaque snapshot diff.
Every refined document is validated against CDA R2 XSD and schematron on every test run; errors and fatal severity fail the test.
A stakeholder-facing report (REPORT.md) is generated deterministically from the committed snapshots.
Developer workflow for authoring new scenarios and fixtures is documented.
The suite runs as part of integration CI.

🧪 How to test

From the project root, cd refiner/.
Activate a virtual environment.
Install requirements.txt and dev-requirements.txt.
Ensure the integration stack is available (docker compose / localstack), same as the rest of tests/integration/.
Run pytest tests/integration/scenarios/.
You should see 19 tests pass.
Inspect tests/integration/scenarios/REPORT.md for the stakeholder-facing summary of what the suite pins.
Browse tests/integration/scenarios/snapshots/<fixture>/<scenario>/ to see committed expected output per scenario (expected_trace.json is the most legible starting point).
Read tests/integration/scenarios/README.md for the full developer workflow: authoring new scenarios, adding new fixtures, regenerating the report.

To verify the snapshot regeneration workflow:

pytest tests/integration/scenarios/test_all_sections_covid_influenza.py --update-snapshots
python tests/integration/scenarios/authoring/build_report.py
git diff

The diff should be empty if the committed state is current; a non-empty diff indicates drift between committed snapshots and current refinement output, or between snapshots and the report.

ℹ️ Additional Information

Design choices worth flagging for review:

Configurations are built through the live API from declarative recipes, then read back from localstack in the production activation-format JSON. This keeps the suite on the same serialization and code path production uses, with a single authoring path and no committed active.json to maintain.
Refinement timestamp is fixed (20260101000000+0000) so deterministic UUIDv5 identifier derivation produces stable snapshots run-to-run. Without this, snapshots would never match between runs.
Augmentation is seeded by the activating jurisdiction (the integration test user's, SDDH), not the RR's reportable-to jurisdiction. The fixture RR may be reportable elsewhere; that is intentionally not consulted, so arbitrary test data can be reused without standing up matching fake jurisdictions.
The activation payload is fetched by discovering the active version from current.json rather than assuming version 1, so tests that build more than one configuration for the same condition resolve each correctly.
Three snapshot artifacts per scenario (trace JSON + refined eICR + refined RR) so regressions surface at the right granularity: top-line outcome changes in the trace, structural changes in the XML.
Explicit assertions in test_explicit_assertions.py complement the snapshot scenarios. They build configs through the same recipe path and pin specific named behaviors with precondition guards that fail diagnostically if the fixture or configuration drifts in a way that makes the test no longer exercise its named concern.

Deferred to follow-up PRs (small, well-scoped):

CI step that runs python tests/integration/scenarios/authoring/build_report.py and fails on git diff --exit-code REPORT.md to prevent the report from going stale relative to snapshots. A TODO at the top of build_report.py flags this.
Lift normalize_xml (currently imported from tests/unit/conftest.py) to a shared tests/_helpers/ module. The scenarios suite still reaches into the unit suite for this one helper; functional but architecturally awkward. (The integration validation helpers are now inherited/imported from the parent integration conftest, which is the suite's natural parent.)

Runtime: 19 tests; the suite now runs on the integration path (docker compose / localstack) rather than infra-free, since configs are authored through the live API. Each scenario does a full create → activate → fetch cycle; if the suite grows substantially, a session-scoped cache of built configurations keyed by scenario name is the natural mitigation.

github-actions · 2026-05-29T05:57:25Z

🔒 Security Scan Results

⚠️ Found 14 vulnerabilities

Severity	Total
🟠 High	10
🟡 Medium	4

📦 refiner-app

✅ No vulnerabilities found

📦 refiner-lambda

Severity	Count
🟠 High	1
🟡 Medium	2

📦 refiner-ops

Severity	Count
🟠 High	9
🟡 Medium	2

View detailed results: Security tab
Last updated: 2026-06-05 20:45:32 UTC

jakewheeler · 2026-06-02T20:04:19Z

Some of my initial thoughts & observations:

Scripting the creation of configurations using app API requests would make it a lot more obvious as to how these configs were assembled
- Maybe I'm missing it but it is not obvious to me how I'd know how to recreate these configs in the UI at a glance
- The scripts that do the assembling would be much more maintainable and low effort when it comes to updates to config format (and would fix the issue I mentioned right above)
I don't know that there is much benefit in storing the "hard" config .json files in /configurations that we check in. We could build the configs via script and keep them in memory while the tests run (we could grab them out of localstack after activation or run config data through convert_config_to_storage_payload() and read it back in with ProcessedConfiguration.from_dict())
I'm wondering if, instead of checking in this file, we generate the content of REPORT.md during a merge to main and store it somewhere. We have a wiki that we haven't used yet, but this could be a good spot for it? This would be fairly simple to do in the CI ticket
Would it make sense to put all of this into a refinement package under tests/integration? I'm not seeing a meaningful difference between these and the types of testing we do within integration. This would also give us the benefit of very easily using the fixtures and other code already defined in integration/conftest.py

jakewheeler · 2026-06-02T20:24:38Z

+    assert result.trace.refinement_outcome == "refined"
+    assert result.trace.configuration_resolved is True
+    assert result.trace.configuration_version == CONFIGURATION_VERSION
+    assert result.trace.canonical_url == COVID_CANONICAL_URL
+    assert result.trace.skip_reason is None
+    assert result.trace.error_detail is None
+
+    # size reduction must be populated and positive: refining for COVID
+    # against a multi-condition eICR drops the non-COVID content.
+    assert result.trace.eicr_size_reduction_percentage is not None
+    assert result.trace.eicr_size_reduction_percentage > 0


Is it worth making these checks? Are we able to make it this far if a lot of these values are wrong or missing?

you're not wrong; this was the first thing that i worked on to make sure i had everything up and running so i can test. i decided to leave it in as more or less a smoke test but i would also not be opposed to removing it

I was going back and forth on this. The thing that is a bit odd about trace is that it kind of acts as both a refining input but also a logging tool.

If refining can't go on without the values being correct I would say it's fine to remove.

i whittled this down so let me know if it works for you

fzhao99 · 2026-06-03T19:53:45Z

I think I'm team script over commit when it comes to configurations for a few reasons.

The script / read-in-memory would (imo) be more maintainable in the long run rather than having to change the large static files, especially if we ever changed our configuration shape.
Think it's worthwhile for us to test the deserialization step in a "live" refinement run, since it's a seam I'd like to see us cover more than we are currently. This may save us an outage in the instance where the lambda has an issue reading a config and subsequently blowing up. Think it's worthwhile, even if its via the API, for us to ensure serialization into S3 is as expected we run the suite,
- This has the added benefit of making sure our serialization, when it changes, doesn't break the refining for these core scenarios
It's another smoke test that our API interface for the core configuration-building components are working as expected.

If we do do the above, Jake's point of moving things into integration would probably be practical in order to leverage the necessary localstack / integration/conftest.py information

Do really love the report generation though, think that'll be super helpful once we get it going

robertmitchellv added 3 commits May 28, 2026 22:30

new report for evaluating condition groupers

9a4e965

add the all conditions fixture and baseline configurations

fe53770

add the first big chunk of end to end scenario testing

c6a236d

robertmitchellv and others added 7 commits May 29, 2026 13:10

add two more configurations to test

e5bbb11

add validation and regenerate snapshots

7990aa6

add more configurations to test

265c69a

add report and additional scenarios

4ccfdb9

ruff fix

ae0580c

Merge branch 'main' into robert/1240-scenario-based-testing

081aded

make the rollup cases stronger via explicit checks

0ef8e25

robertmitchellv changed the title ~~WIP test(server): Scenario based testing to mirror Tim's tests~~ test(server): Scenario based testing to mirror Tim's tests Jun 1, 2026

robertmitchellv marked this pull request as ready for review June 1, 2026 02:40

robertmitchellv requested review from fzhao99, jakewheeler, lina-roth and rogeruiz as code owners June 1, 2026 02:40

robertmitchellv and others added 5 commits June 1, 2026 08:37

Merge branch 'main' into robert/1240-scenario-based-testing

1ae08f6

update configs and add csvs for reproducing them easier

7d5b95a

regenerate snapshots

0600110

Merge branch 'main' into robert/1240-scenario-based-testing

a75c992

Merge branch 'main' into robert/1240-scenario-based-testing

b4e623c

jakewheeler self-assigned this Jun 2, 2026

Merge branch 'main' into robert/1240-scenario-based-testing

5c4112e

jakewheeler reviewed Jun 2, 2026

View reviewed changes

Comment thread refiner/scripts/exports/export-groupers-report.py Outdated

jakewheeler reviewed Jun 2, 2026

View reviewed changes

robertmitchellv added 2 commits June 4, 2026 21:26

add new section processing helper fixture

acc4057

move scenarios to integration and resolve imports

3f8ff3f

robertmitchellv added 7 commits June 5, 2026 11:58

remove the static configuration json files

e3d1c18

update fixture organization and update tests and snapshots

3d5bffd

rerun report

926c922

update documentation

48e7d45

Merge branch 'main' into robert/1240-scenario-based-testing

7555c0b

remove; we don't need this for the repo

5698a77

whittle down the test harness function asserts

37263e2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(server): Scenario based testing to mirror Tim's tests#1286

test(server): Scenario based testing to mirror Tim's tests#1286
robertmitchellv wants to merge 25 commits into
mainfrom
robert/1240-scenario-based-testing

robertmitchellv commented May 29, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 29, 2026 •

edited

Loading

Uh oh!

jakewheeler commented Jun 2, 2026

Uh oh!

Uh oh!

jakewheeler Jun 2, 2026

Uh oh!

robertmitchellv Jun 2, 2026

Uh oh!

jakewheeler Jun 3, 2026

Uh oh!

robertmitchellv Jun 5, 2026

Uh oh!

fzhao99 commented Jun 3, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

robertmitchellv commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔀 PULL REQUEST

💡 Summary

🔗 Related Issue

✅ Acceptance Criteria

🧪 How to test

ℹ️ Additional Information

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔒 Security Scan Results

⚠️ Found 14 vulnerabilities

📦 refiner-app

📦 refiner-lambda

📦 refiner-ops

Uh oh!

jakewheeler commented Jun 2, 2026

Uh oh!

Uh oh!

jakewheeler Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

robertmitchellv Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

jakewheeler Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

robertmitchellv Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

fzhao99 commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

robertmitchellv commented May 29, 2026 •

edited

Loading

github-actions Bot commented May 29, 2026 •

edited

Loading

fzhao99 commented Jun 3, 2026 •

edited

Loading