Skip to content

test(server): Scenario based testing to mirror Tim's tests#1286

Open
robertmitchellv wants to merge 25 commits into
mainfrom
robert/1240-scenario-based-testing
Open

test(server): Scenario based testing to mirror Tim's tests#1286
robertmitchellv wants to merge 25 commits into
mainfrom
robert/1240-scenario-based-testing

Conversation

@robertmitchellv
Copy link
Copy Markdown
Collaborator

@robertmitchellv robertmitchellv commented May 29, 2026

🔀 PULL REQUEST

💡 Summary

Adds a scenarios test suite at tests/integration/scenarios/ that exercises the production refinement pipeline--the same path lambda runs--against committed eICR/RR fixtures. Each scenario authors its configuration through the live API (create → customize → activate → read the activation payload back from localstack), then asserts in two layers: validation (CDA R2 XSD + schematron) and snapshot comparison against committed expected files.

Configurations are built from declarative recipes rather than committed JSON: a Scenario names the condition plus the custom codes, section-processing overrides, and associated conditions to layer on, and a shared build_scenario_configuration fixture interprets the recipe against the running app. This gives a single authoring path that uses the exact serialization production uses, with no committed active.json to drift--updating a scenario means editing its recipe, not hand-editing or re-downloading JSON.

The suite contains 8 snapshot scenarios on the all_sections_COVID_INFLUENZA fixture, 8 explicit named-assertion tests, and 3 smoke tests pinning the harness itself (19 total). Every refined document is XSD- and schematron-validated on every test run. An auto-generated REPORT.md summarizes the suite for stakeholder review.

Because the suite lives under tests/integration/, it runs as part of the existing integration CI job automatically--no separate CI wiring needed, and these tests passed in CI on this PR.

All six issues from the original testing roll-up sheet are directly covered: entry matching across OID mismatch, custom codes in nested locations (entryRelationship/value and substanceAdministration), procedure retention via unrelated entryRelationship codes, vital sign panel pruning, code-set bloat, and schematron conformance of refined output. The previously informal "Tim runs through a spreadsheet manually" workflow is now a CI-runnable suite.

🔗 Related Issue

Fixes #1240

✅ Acceptance Criteria

  • A new test layer exercises the production refinement pipeline end-to-end against committed fixtures, with configurations authored through the live API so there is a single authoring path and no committed configuration JSON to drift.
  • Snapshots pin refinement behavior (entry matching, identifier derivation, section pruning) such that regressions surface as snapshot diffs rather than silent drift.
  • All six issues from the original roll-up sheet are directly covered, with five additionally backed by named explicit assertions that fail with a behavior-level message rather than an opaque snapshot diff.
  • Every refined document is validated against CDA R2 XSD and schematron on every test run; errors and fatal severity fail the test.
  • A stakeholder-facing report (REPORT.md) is generated deterministically from the committed snapshots.
  • Developer workflow for authoring new scenarios and fixtures is documented.
  • The suite runs as part of integration CI.

🧪 How to test

  1. From the project root, cd refiner/.
  2. Activate a virtual environment.
  3. Install requirements.txt and dev-requirements.txt.
  4. Ensure the integration stack is available (docker compose / localstack), same as the rest of tests/integration/.
  5. Run pytest tests/integration/scenarios/.
  6. You should see 19 tests pass.
  7. Inspect tests/integration/scenarios/REPORT.md for the stakeholder-facing summary of what the suite pins.
  8. Browse tests/integration/scenarios/snapshots/<fixture>/<scenario>/ to see committed expected output per scenario (expected_trace.json is the most legible starting point).
  9. Read tests/integration/scenarios/README.md for the full developer workflow: authoring new scenarios, adding new fixtures, regenerating the report.

To verify the snapshot regeneration workflow:

pytest tests/integration/scenarios/test_all_sections_covid_influenza.py --update-snapshots
python tests/integration/scenarios/authoring/build_report.py
git diff

The diff should be empty if the committed state is current; a non-empty diff indicates drift between committed snapshots and current refinement output, or between snapshots and the report.

ℹ️ Additional Information

Design choices worth flagging for review:

  • Configurations are built through the live API from declarative recipes, then read back from localstack in the production activation-format JSON. This keeps the suite on the same serialization and code path production uses, with a single authoring path and no committed active.json to maintain.
  • Refinement timestamp is fixed (20260101000000+0000) so deterministic UUIDv5 identifier derivation produces stable snapshots run-to-run. Without this, snapshots would never match between runs.
  • Augmentation is seeded by the activating jurisdiction (the integration test user's, SDDH), not the RR's reportable-to jurisdiction. The fixture RR may be reportable elsewhere; that is intentionally not consulted, so arbitrary test data can be reused without standing up matching fake jurisdictions.
  • The activation payload is fetched by discovering the active version from current.json rather than assuming version 1, so tests that build more than one configuration for the same condition resolve each correctly.
  • Three snapshot artifacts per scenario (trace JSON + refined eICR + refined RR) so regressions surface at the right granularity: top-line outcome changes in the trace, structural changes in the XML.
  • Explicit assertions in test_explicit_assertions.py complement the snapshot scenarios. They build configs through the same recipe path and pin specific named behaviors with precondition guards that fail diagnostically if the fixture or configuration drifts in a way that makes the test no longer exercise its named concern.

Deferred to follow-up PRs (small, well-scoped):

  • CI step that runs python tests/integration/scenarios/authoring/build_report.py and fails on git diff --exit-code REPORT.md to prevent the report from going stale relative to snapshots. A TODO at the top of build_report.py flags this.
  • Lift normalize_xml (currently imported from tests/unit/conftest.py) to a shared tests/_helpers/ module. The scenarios suite still reaches into the unit suite for this one helper; functional but architecturally awkward. (The integration validation helpers are now inherited/imported from the parent integration conftest, which is the suite's natural parent.)

Runtime: 19 tests; the suite now runs on the integration path (docker compose / localstack) rather than infra-free, since configs are authored through the live API. Each scenario does a full create → activate → fetch cycle; if the suite grows substantially, a session-scoped cache of built configurations keyed by scenario name is the natural mitigation.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

🔒 Security Scan Results

⚠️ Found 14 vulnerabilities

Severity Total
🟠 High 10
🟡 Medium 4

📦 refiner-app

No vulnerabilities found

📦 refiner-lambda

Severity Count
🟠 High 1
🟡 Medium 2

📦 refiner-ops

Severity Count
🟠 High 9
🟡 Medium 2

View detailed results: Security tab
Last updated: 2026-06-05 20:45:32 UTC

@robertmitchellv robertmitchellv changed the title WIP test(server): Scenario based testing to mirror Tim's tests test(server): Scenario based testing to mirror Tim's tests Jun 1, 2026
@robertmitchellv robertmitchellv marked this pull request as ready for review June 1, 2026 02:40
@jakewheeler jakewheeler self-assigned this Jun 2, 2026
@jakewheeler
Copy link
Copy Markdown
Collaborator

Some of my initial thoughts & observations:

  • Scripting the creation of configurations using app API requests would make it a lot more obvious as to how these configs were assembled
    • Maybe I'm missing it but it is not obvious to me how I'd know how to recreate these configs in the UI at a glance
    • The scripts that do the assembling would be much more maintainable and low effort when it comes to updates to config format (and would fix the issue I mentioned right above)
  • I don't know that there is much benefit in storing the "hard" config .json files in /configurations that we check in. We could build the configs via script and keep them in memory while the tests run (we could grab them out of localstack after activation or run config data through convert_config_to_storage_payload() and read it back in with ProcessedConfiguration.from_dict())
  • I'm wondering if, instead of checking in this file, we generate the content of REPORT.md during a merge to main and store it somewhere. We have a wiki that we haven't used yet, but this could be a good spot for it? This would be fairly simple to do in the CI ticket
  • Would it make sense to put all of this into a refinement package under tests/integration? I'm not seeing a meaningful difference between these and the types of testing we do within integration. This would also give us the benefit of very easily using the fixtures and other code already defined in integration/conftest.py

Comment thread refiner/scripts/exports/export-groupers-report.py Outdated
Comment on lines +110 to +120
assert result.trace.refinement_outcome == "refined"
assert result.trace.configuration_resolved is True
assert result.trace.configuration_version == CONFIGURATION_VERSION
assert result.trace.canonical_url == COVID_CANONICAL_URL
assert result.trace.skip_reason is None
assert result.trace.error_detail is None

# size reduction must be populated and positive: refining for COVID
# against a multi-condition eICR drops the non-COVID content.
assert result.trace.eicr_size_reduction_percentage is not None
assert result.trace.eicr_size_reduction_percentage > 0
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth making these checks? Are we able to make it this far if a lot of these values are wrong or missing?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're not wrong; this was the first thing that i worked on to make sure i had everything up and running so i can test. i decided to leave it in as more or less a smoke test but i would also not be opposed to removing it

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going back and forth on this. The thing that is a bit odd about trace is that it kind of acts as both a refining input but also a logging tool.

If refining can't go on without the values being correct I would say it's fine to remove.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i whittled this down so let me know if it works for you

@fzhao99
Copy link
Copy Markdown
Collaborator

fzhao99 commented Jun 3, 2026

I think I'm team script over commit when it comes to configurations for a few reasons.

  • The script / read-in-memory would (imo) be more maintainable in the long run rather than having to change the large static files, especially if we ever changed our configuration shape.
  • Think it's worthwhile for us to test the deserialization step in a "live" refinement run, since it's a seam I'd like to see us cover more than we are currently. This may save us an outage in the instance where the lambda has an issue reading a config and subsequently blowing up. Think it's worthwhile, even if its via the API, for us to ensure serialization into S3 is as expected we run the suite,
    • This has the added benefit of making sure our serialization, when it changes, doesn't break the refining for these core scenarios
  • It's another smoke test that our API interface for the core configuration-building components are working as expected.

If we do do the above, Jake's point of moving things into integration would probably be practical in order to leverage the necessary localstack / integration/conftest.py information

Do really love the report generation though, think that'll be super helpful once we get it going

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[TASK] Set up mirror of Tim's testing in our CI/CD

3 participants