Skip to content

Add API v2#385

Merged
micahwiesner67 merged 45 commits intomainfrom
dev-nam_api_v2_prep
Apr 8, 2026
Merged

Add API v2#385
micahwiesner67 merged 45 commits intomainfrom
dev-nam_api_v2_prep

Conversation

@micahwiesner67
Copy link
Copy Markdown
Collaborator

@micahwiesner67 micahwiesner67 commented Feb 23, 2026

@natemcintosh re-opening that PR here.

Notes:

  • Adding parameter for facility_active_proportion with a default of 0.94
  • Exposing api parameters in the makefile and passing them into azure/generate_config.py for simplicity (user can now simply run 'make config DATA_API=v2' and this will route the input data container as nssp-etl-api-v2 (default is still set to API v1 - nssp-etl)
  • Adding unit tests to test the read_data() with facility_active_proportion passed or defaulted for both API v1 and v2
  • The current stan-dev install from github stan-dev/cmdstanr fails out. It seems the latest dev updates don't play well with our dependencies. Pinning to the 0.9.0 commit hash (da99e2b) appears to work
  • test with a command like this make test-batch JOB=test_name DATA_API=v2
  • Added documentation in the read_data function to highlight how facility_active_proportion is used only in the API v2 DATA

Some additional changes for continuity:

  1. To run 'make run' for local testing it was necessary to update the test.json config in rt-epinow2-config with the low_case_count_thresholds field
    "low_case_count_thresholds": {
    "COVID-19": 10,
    "Influenza": 10,
    "RSV": 5
    },

@natemcintosh
Copy link
Copy Markdown
Collaborator

Yea, I think something lower than 1.0 would be good. I don't remember exactly what worked well in the runs when I did it. @kgostic do you remember if there was a particular threshold you liked?

@micahwiesner67
Copy link
Copy Markdown
Collaborator Author

To start, I resolved the merge conflicts. In the 'main' branch we have 0.94 as the proportion for active facilities threshold so that's what I set to here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Feb 24, 2026

Thank you for your contribution @micahwiesner67 🚀! Your pkgdown-site is ready for download 👉 here 👈!
(The artifact expires on 2026-04-15T15:13:37Z. You can re-generate it by re-running the workflow here.)

@micahwiesner67 micahwiesner67 changed the title Dev nam api v2 prep Add API v2 Feb 24, 2026
Copy link
Copy Markdown
Collaborator

@zsusswein zsusswein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not following around testing. Added some more detail in the relevant comment chain.

Comment thread R/config.R Outdated
Comment thread R/read_data.R Outdated
Comment thread R/read_data.R Outdated
Comment thread tests/testthat/data/README.md Outdated
micahwiesner67 and others added 2 commits April 7, 2026 11:51
Co-authored-by: Zachary Susswein <46581799+zsusswein@users.noreply.github.com>
Comment thread R/read_data.R
@micahwiesner67
Copy link
Copy Markdown
Collaborator Author

@zsusswein
So the facility_active_proportion field is not used in the read_data() function when we are pulling data from API v1. The read_data() function will look for the any_visits_this_day column in the data - if it exists it will determine that data is from API v2 and proceed as such (filtering out based on the facility_active_proportion threshold of 0.94 which we set).

In regard to the 4 tests you mentioned.

  1. Is still being tested in some test that references the sample_config_with_exclusion.json file ("Pipeline run produces expected outputs and returns success"
    • Maybe the confusion is in the naming of the CA_*.parquet files? CA_test.parquet is still a file that has API v1 data. Would it make more sense if it was explicitly named so (CA_apiv1_test.parquet) to match the naming schema of CA_apiv2_test.parquet?

@zsusswein
Copy link
Copy Markdown
Collaborator

@zsusswein So the facility_active_proportion field is not used in the read_data() function when we are pulling data from API v1. The read_data() function will look for the any_visits_this_day column in the data - if it exists it will determine that data is from API v2 and proceed as such (filtering out based on the facility_active_proportion threshold of 0.94 which we set).

That explanation makes sense. My request here is add tests for the expected behavior, especially for the key production use-cases.

  1. Is still being tested in some test ...

And the above is a little concerning because we have a change to the core API here. I believe you when you tell me that this runs in both cases and there's a test somewhere, but I think we need a test to cover explicitly this behavior in each of its permutations.

@micahwiesner67
Copy link
Copy Markdown
Collaborator Author

Documenting our conversation yesterday for thoroughness. Zach raised concerns over testing in these scenarios. The main distinction to make is that if the facility_active_proportion is not passed through the config that is totally fine as there is a default value (0.94) set.

  1. Do not pass facility_active_proportion in the config and run on API v1: this is the existing behavior on main --> This is the branch behavior prior to this PR, correct.
  2. Do pass facility_active_proportion in the config and run on API v1: this is my understanding of the "production" behavior on this branch --> This is correct that this will be "production" behavior. Essentially, facility active proportion is passed as default (0.94) but ignored as it's not used for API v1.
  3. Do not pass facility_active_proportion in the config and run on API v2 --> This is not correct, this will not error out as read_data has a default facility_active_proportion of 0.94 set.
  4. Do pass facility_active_proportion in the config and run on API v2: my understanding is that this is the API v2 backtest behavior

In our meeting yesterday we agreed it wasn't necessary to test the handling of the config as long as we had a respective case in read_data. As there is a default for facility_active_proportion this field is essentially always passed in. To cover these cases, I've added some tests in the tests/testthat/test-read_data.R script.

  1. incoming
  2. "Replace COVID-19/Omicron with COVID-19, US (API v1)"
  3. "API v2 with COIVD-19, US (default facility_active_proportion)"
  4. "facility_active_proportion affects counts (API v2)"

Comment thread R/read_data.R Outdated
Comment thread R/read_data.R Outdated
@micahwiesner67 micahwiesner67 merged commit 924733e into main Apr 8, 2026
23 of 27 checks passed
@micahwiesner67 micahwiesner67 deleted the dev-nam_api_v2_prep branch April 8, 2026 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants