NDH-417 Add Jupyter Notebook to Perform Mapping from Halloween CSVs to API DB#241

Merged
spopelka-dsac merged 29 commits into main from sjp/halloween-csvs
Jan 30, 2026

Conversation

@spopelka-dsac
Contributor

module-name: Add Jupyter Notebook to Perform Mapping from Halloween CSVs to API DB

Jira Ticket #NDH-417

Problem

The Halloween CSV to API DB mapping in the npd_Puffin repo contained some issues that prevented us from being able to fully connect the API to the data, including:

  • It was very slow to load the data from the insert statements that were created
  • There were missing tables and relationships that caused the API to not be fully populated
  • Some of the inner joins masked data quality issues by omitting records that had invalid values (e.g. an inner join on NPI would exclude an NPI that does not exist in the NPI table, but doing so makes it harder to notice that such a value is present)
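The inner-join problem in the last bullet can be illustrated with a minimal sketch. The table and column names below are hypothetical and SQLite stands in for the real database; the point is only that an inner join drops the orphaned row silently, while a left join filtered on a missing match reports it.

```python
import sqlite3

# Hypothetical tables and columns, for illustration only; not the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE npi (npi INTEGER PRIMARY KEY);
    CREATE TABLE provider_address (npi INTEGER, city TEXT);
    INSERT INTO npi (npi) VALUES (1000000001), (1000000002);
    INSERT INTO provider_address (npi, city) VALUES
        (1000000001, 'Baltimore'),
        (1000000002, 'Denver'),
        (9999999999, 'Austin');
""")

# Inner join: the row whose NPI is absent from the npi table simply disappears.
inner_rows = conn.execute("""
    SELECT a.npi, a.city
    FROM provider_address AS a
    INNER JOIN npi AS n ON n.npi = a.npi
    ORDER BY a.npi
""").fetchall()

# Left join filtered on a missing match: the orphaned row is surfaced instead.
orphan_rows = conn.execute("""
    SELECT a.npi, a.city
    FROM provider_address AS a
    LEFT JOIN npi AS n ON n.npi = a.npi
    WHERE n.npi IS NULL
""").fetchall()

print(len(inner_rows))
print(orphan_rows)
```

Running a check like the second query before loading makes the invalid value visible, so it can be documented or corrected instead of being dropped.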

Solution

This PR introduces a Python notebook that, among other things:

  • Provides a faster way to load data from the Halloween CSVs to any database that has had the flyway migrations applied
  • Corrects for a number of issues noted in the issues spreadsheet
  • Documents data quality issues in the code
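As a rough illustration of why loading straight from the CSVs can beat replaying generated insert statements, here is a minimal sketch that reads rows and inserts them in batches inside a single transaction. It is not the notebook's actual code: the `load_csv` helper, the table, and the column names are hypothetical, and SQLite stands in for the target database. On another engine, a database-specific bulk path (for example, `COPY` in PostgreSQL) would be the analogous choice.

```python
import csv
import io
import sqlite3

# Stand-in for one CSV file; the column names are hypothetical.
csv_text = "npi,city\n1000000001,Baltimore\n1000000002,Denver\n1000000003,Austin\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE provider_address (npi INTEGER, city TEXT)")


def load_csv(conn, handle, batch_size=1000):
    """Insert CSV rows in batches inside one transaction; return the row count."""
    reader = csv.DictReader(handle)
    total = 0
    batch = []
    with conn:  # commits once on success, rolls back on error
        for row in reader:
            batch.append((int(row["npi"]), row["city"]))
            if len(batch) >= batch_size:
                conn.executemany("INSERT INTO provider_address VALUES (?, ?)", batch)
                total += len(batch)
                batch = []
        if batch:  # flush the final partial batch
            conn.executemany("INSERT INTO provider_address VALUES (?, ?)", batch)
            total += len(batch)
    return total


loaded = load_csv(conn, io.StringIO(csv_text), batch_size=2)
count = conn.execute("SELECT COUNT(*) FROM provider_address").fetchone()[0]
print(loaded, count)
```

Batching keeps memory bounded for large files, and committing once avoids paying transaction overhead per row.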

Result

I would not recommend deploying this in Dagster, as our load processes will change once we land the core data model. However, in the event that we need to load Halloween CSV data before we finalize the core data model, this Jupyter Notebook represents a more robust way to do so.

Test Plan

@github-actions
Contributor

github-actions bot commented Dec 3, 2025

Backend Django Test Results

59 tests (-39): 56 ✅ passed (-42), 0 💤 skipped (±0), 0 ❌ failed (±0), 3 🔥 errors (+3); duration 0s ⏱️ (-1s)
11 suites (-2)
11 files (-2)

For more details on these errors, see this check.

Results for commit 8012223. ± Comparison against base commit 784f8dc.

This pull request removes 43 and adds 4 tests. Note that renamed tests count towards both.
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_default
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_filter_by_address
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_filter_by_address_city
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_filter_by_address_postalcode
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_filter_by_address_state
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_filter_by_address_use
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_filter_by_name
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_in_default_order
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_in_descending_order
npdfhir.tests.test_location.LocationViewSetTestCase ‑ test_list_in_order_by_address
…
npdfhir.tests.test_organization.OrganizationViewSetTestCase ‑ test_parent_id
setUpClass (npdfhir.tests.test_location.LocationViewSetTestCase)
setUpClass (npdfhir.tests.test_practitioner.PractitionerViewSetTestCase)
setUpClass (npdfhir.tests.test_practitioner_role.PractitionerRoleViewSetTestCase)

♻️ This comment has been updated with latest results.

@spopelka-dsac spopelka-dsac marked this pull request as ready for review January 29, 2026 13:48
@spopelka-dsac
Contributor Author

@wbprice @rmillergv I recognize that this will be superseded (hopefully during this sprint), but I'd like to get this merged in so I don't have to switch branches whenever I need to load the larger dataset locally for testing performance improvement mods

@spopelka-dsac spopelka-dsac enabled auto-merge (squash) January 30, 2026 19:05
@spopelka-dsac spopelka-dsac merged commit 241174e into main Jan 30, 2026
11 checks passed
@rmillergv
Contributor

What a wonderful Rosetta Stone, thanks Sarah.

@spopelka-dsac spopelka-dsac deleted the sjp/halloween-csvs branch January 30, 2026 21:04
4 participants