Skip to content

Backfill config gen, and as-of date changes#286

Closed
natemcintosh wants to merge 0 commit intomainfrom
nam-config-gen-updates
Closed

Backfill config gen, and as-of date changes#286
natemcintosh wants to merge 0 commit intomainfrom
nam-config-gen-updates

Conversation

@natemcintosh
Copy link
Copy Markdown
Collaborator

@natemcintosh natemcintosh commented May 20, 2025

Nate's Summary

This aims to do issues:

It became clear that we needed to pass the job ids created during config generation to the job creation script. The options I could think of were:

  • Have config generation spit out the job ids, parse them with bash, and then loop over job.py for each job id.
  • Put a modified copy of job.py in the same script as the backfill config generation.

Option 2 felt much safer. Have tested this in Batch, and things run!

Example model output

You can see output from this in the nssp-rt-testing storage container, in the job ids starting with your-backfill-name....

Copilot's Summary

This pull request introduces a new feature for generating and running backfill configurations, along with various supporting changes. The most significant updates include adding a new script for backfill configuration, updating the Makefile to include a command for running this script, and refining documentation and parameter handling in related files.

New Feature: Backfill Configuration and Execution

  • New script for backfill configuration: Added azure/backfill_generate_and_run.py to generate and upload configuration files for the EpiNow2 pipeline, and optionally execute tasks in Azure Batch. This includes detailed parameter handling, validation, and Azure integration.
  • Makefile update: Added a backfill-config-and-run command to the Makefile to streamline running the new backfill script. This provides an easy way to configure and execute backfill tasks.
  • Documentation update: Added an entry in NEWS.md to announce the new commands for generating backfill configurations.

Improvements to Parameter Handling

  • Refinement of as_of_date parameter: Updated the as_of_date parameter description in R/config.R for better clarity and alignment with its usage.
  • generate_configs.py adjustment: Modified the as_of_date calculation to use the report date directly, ensuring consistency and simplifying logic. [1] [2]

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2025

Thank you for your contribution @natemcintosh 🚀! Your pkgdown-site is ready for download 👉 here 👈!
(The artifact expires on 2025-07-08T15:49:21Z. You can re-generate it by re-running the workflow here.)

@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

📢 Thoughts on this report? Let us know!

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread Makefile Outdated
Comment thread azure/generate_backfill_configs.py Outdated
Comment thread azure/generate_backfill_configs.py Outdated
Copy link
Copy Markdown
Collaborator

@zsusswein zsusswein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth putting the duplicated code here in a function and importing it? There seems to be quite a bit of logic here shared with existing stuff and it could be annoying to maintain.

@natemcintosh
Copy link
Copy Markdown
Collaborator Author

Is it worth putting the duplicated code here in a function and importing it? There seems to be quite a bit of logic here shared with existing stuff and it could be annoying to maintain.

I was waiting for someone to say this 😆. At what point do we make the azure/ folder its own little python package? Part of the problem with putting the common functionality is that it requires various azure packages. Right now, everything in the azure/ folder is a uv script, each of which creates its own environment. If we wanted to import functionality that uses azure packages, we'd probably have to create a small package.

Copy link
Copy Markdown
Collaborator

@kgostic kgostic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving so as not to hold you up, but I have some comments.

Most importantly, could you put your head together with @PatrickTCorbett to talk about how people are supposed to set up a run, and add some documentation?

Comment thread azure/backfill_generate_and_run.py Outdated
Comment thread azure/backfill_generate_and_run.py Outdated
Comment thread azure/backfill_generate_and_run.py Outdated
Comment thread azure/backfill_generate_and_run.py Outdated
Comment thread azure/backfill_generate_and_run.py Outdated
Comment thread azure/backfill_generate_and_run.py Outdated
Comment thread Makefile Outdated
@zsusswein
Copy link
Copy Markdown
Collaborator

zsusswein commented May 23, 2025

At what point do we make the azure/ folder its own little python package?

Definitely out of scope for this PR, but what about sticking it in the STF package you're using for pool setup? Probably way more trouble than it's worth because it's a different format than what they do...........

I do think this is maybe worth doing though. This feel worth hashing out in an issue?

Comment thread Makefile Outdated
@kgostic
Copy link
Copy Markdown
Collaborator

kgostic commented Jun 24, 2025

Summarizing f2f discussion. Nate had been waiting to merge until he figured out an issue with a flu run.

@natemcintosh
Copy link
Copy Markdown
Collaborator Author

I've figured out the issue with the Flu run. It was that the earliest delay parameter for Flu was 2024-11-26. Previously I had only looked at the right_truncation parameter, which has an earlier initial start date (hence my confusion). In this light, everything seems to be working as expected. (Except that our error messages disappear)

@PatrickTCorbett
Copy link
Copy Markdown
Collaborator

I've figured out the issue with the Flu run. It was that the earliest delay parameter for Flu was 2024-11-26. Previously I had only looked at the right_truncation parameter, which has an earlier initial start date (hence my confusion). In this light, everything seems to be working as expected. (Except that our error messages disappear)

This tracks with the issues I had with the RSV backfill run. Would setting as_of date like I did in the RSV run not fix this?

@natemcintosh
Copy link
Copy Markdown
Collaborator Author

As in setting it to the current date? I would, but in my testing of the API v2 data, we want everything (except the data) to match the original production runs, which means running with the parameters as-of date matching the report date.

@natemcintosh
Copy link
Copy Markdown
Collaborator Author

natemcintosh commented Jul 1, 2025

We should attempt to merge this PR and the matching one in the config generator at the same time. Both implement changes in the config generator API, and things will be broken if we do one without the others.

Note that #328 will also require the PR in the config generator to be implemented before it can be merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants