[NDH-642] Apply performance improvements to Practitioner Endpoint #385

Merged
spopelka-dsac merged 18 commits into main from sjp/streamline-joins
Feb 9, 2026
Conversation

@spopelka-dsac
Contributor

module-name: [NDH-642] Apply performance improvements to Practitioner Endpoint

Jira Ticket #NDH-642

Problem

Adam and Demetrius noted timeouts and poor database performance during their load testing, and the Technical Working Group also observed timeouts. We had deferred optimizing search and indexes because of uncertainty about usage patterns and about the shape of the data coming out of the ETL, either of which could affect optimization decisions.

Solution

  • Add V18.1 migration to add a tsvector column and indexes to the individual_to_name table (speeds up filtering and ordering)
  • Add V18.2 migration to add a tsvector column and index to the nucc table (speeds up filtering)
  • Add V18.3 migration to add a tsvector column and index to the address_us table (speeds up filtering)
  • Update models.py to reflect the search_vector and index updates
  • Update PractitionerFilterSet to reflect new filter pattern based on name, address, and nucc tsvectors
  • Update PractitionerRoleFilterSet to reflect new filter pattern based on practitioner_name and practitioner_type updates in PractitionerFilterSet
  • Update tests accordingly
  • Remove unnecessary queryset annotations, which slow down API calls
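
For readers unfamiliar with the pattern, the three migrations above share one general shape: a tsvector column plus an index that supports full-text matching. The sketch below renders PostgreSQL statements of that shape from Python. It is not the content of the V18.x files. The helper name, the `first_name`/`last_name` columns, the `english` configuration, and the choice of a stored generated column with a GIN index are all assumptions made for illustration.

```python
def tsvector_migration_sql(table, columns, config="english"):
    """Render two PostgreSQL statements: a stored tsvector column and a GIN index.

    Hypothetical helper; column names and text-search config are assumptions.
    """
    # Concatenate the source columns, treating NULL values as empty strings.
    document = " || ' ' || ".join(f"coalesce({col}, '')" for col in columns)
    add_column = (
        f"ALTER TABLE {table} ADD COLUMN search_vector tsvector "
        f"GENERATED ALWAYS AS (to_tsvector('{config}', {document})) STORED;"
    )
    # A GIN index lets the @@ match operator avoid a sequential scan.
    create_index = (
        f"CREATE INDEX {table}_search_vector_idx "
        f"ON {table} USING GIN (search_vector);"
    )
    return [add_column, create_index]


# Assumed column names, for illustration only.
statements = tsvector_migration_sql("individual_to_name", ["first_name", "last_name"])
for statement in statements:
    print(statement)
```

A filter would then match against the column with something like `search_vector @@ websearch_to_tsquery('english', :query)`. The sketch covers only the filtering index, not the additional index used for ordering on individual_to_name.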

Result

Response times for /fhir/Practitioner go down from ~30-40 seconds to ~10-20 seconds, depending on the filter being applied

Test Plan

  1. All tests should pass
  2. Test API calls locally and compare response times to the same calls made against https://dev.cnpd.internal.cms.gov/
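
For step 2, a small timing helper can make the comparison repeatable. This is a minimal sketch using only the standard library. The function names, the repeat count, and the timeout are assumptions, and any query parameters you pass should be ones the endpoint actually supports.

```python
import statistics
import time
import urllib.request


def time_call(fn, repeats=5):
    """Return the median wall-clock time in seconds over `repeats` calls of fn()."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def fetch(url, timeout=120):
    """GET a URL and read the whole body so the timing includes the transfer."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

With the API running locally, `time_call(lambda: fetch("http://localhost:8000/fhir/Practitioner"))` gives a median to set against the same call made to the dev environment. The median is used rather than the mean so that a single cold-cache request does not dominate the result.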

Contributor

@rmillergv rmillergv left a comment

  • backend/npdfhir/filters/practitioner_filter_set.py
    • filter_practitioner_type():
      queryset.filter(providertotaxonomy__nucc_code__search_vector=query)
      providertotaxonomy is multi-valued (providers can have multiple taxonomy rows). If two taxonomy rows match the websearch query, the query can return the same Practitioner twice.
    • filter_address() (and filter_address_city/state/postalcode()):
      joins through individual__individualtoaddress__... which is also multi-valued (multiple addresses), so the same duplication risk applies.

  • backend/npdfhir/filters/practitioner_role_filter_set.py
    • filter_practitioner_type() used to end with .distinct() and now doesn’t: If that's intentional, great, but curious why allowing multiple rows?
      queryset.filter(provider_to_organization__individual__providertotaxonomy__nucc_code__search_vector=query)
      same multi-valued taxonomy join → duplicate PractitionerRole rows possible.

I'm approving this with the above comment, because duplicate rows may not be an issue: these queries are likely to return a single row when run. So it's a nicety if you want to look at it. If you meant to allow duplicates, that is fine too.
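
The duplication concern raised in this review can be reproduced in isolation. The sketch below uses an in-memory SQLite database with two hypothetical tables, not the project's schema or ORM, and a `LIKE` match standing in for the tsvector search. It shows how a plain join through a multi-valued relation repeats the parent row and how `DISTINCT` collapses it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables; provider 1 has two matching taxonomy rows.
    CREATE TABLE provider (id INTEGER PRIMARY KEY);
    CREATE TABLE provider_to_taxonomy (provider_id INTEGER, nucc_code TEXT);
    INSERT INTO provider VALUES (1), (2);
    INSERT INTO provider_to_taxonomy VALUES
        (1, 'cardiology'),
        (1, 'cardiology-interventional'),
        (2, 'dermatology');
    """
)

# Plain inner join: one output row per matching taxonomy row.
joined = conn.execute(
    """
    SELECT p.id FROM provider p
    JOIN provider_to_taxonomy t ON t.provider_id = p.id
    WHERE t.nucc_code LIKE 'cardiology%'
    ORDER BY p.id
    """
).fetchall()

# Same join with DISTINCT: one output row per provider.
deduplicated = conn.execute(
    """
    SELECT DISTINCT p.id FROM provider p
    JOIN provider_to_taxonomy t ON t.provider_id = p.id
    WHERE t.nucc_code LIKE 'cardiology%'
    ORDER BY p.id
    """
).fetchall()

print(joined)        # [(1,), (1,)]
print(deduplicated)  # [(1,)]
```

Whether the project's filters produce SQL of the first form depends on how the queryset is built, so the generated SQL is the thing to check.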

@spopelka-dsac
Contributor Author

spopelka-dsac commented Feb 9, 2026

Good observation, Ross! In both instances, the queryset is based on the Provider model, which contains only distinct records by definition (each row represents a single individual with a Type 1 NPI). The way our filtering framework works, the filters perform an inner join on those related tables to get the individual_ids of the Practitioners that satisfy the filter conditions, and the rest of the queries then run against that set of individual_ids. If you spin up the API locally and query a resource directly through its endpoint (for example localhost:8000/fhir/Practitioner, rather than through the API docs), you can click the "DJDT" (Django Debug Toolbar) icon on the right-hand side of the screen to see what the database is doing under the hood. Regardless, .distinct() was not actually doing anything in either line. @rmillergv
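
One way to picture the two-step behavior described in this reply is a query that first reduces the related-table matches to a set of ids and then selects parent rows by membership in that set. The sketch below uses an in-memory SQLite database with hypothetical `individual` and `individual_to_address` tables. It illustrates the idea only and is not the SQL the project's ORM emits; the toolbar's SQL panel is the place to confirm that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables; individual 1 has two addresses in the same city.
    CREATE TABLE individual (id INTEGER PRIMARY KEY);
    CREATE TABLE individual_to_address (individual_id INTEGER, city TEXT);
    INSERT INTO individual VALUES (1), (2);
    INSERT INTO individual_to_address VALUES
        (1, 'Baltimore'),
        (1, 'Baltimore'),
        (2, 'Denver');
    """
)

# Step 1 (subquery): collect the ids whose related rows match the filter.
# Step 2 (outer query): select each parent row whose id is in that set.
# Membership in a set cannot repeat a parent row, however many
# related rows matched.
rows = conn.execute(
    """
    SELECT i.id FROM individual i
    WHERE i.id IN (
        SELECT a.individual_id FROM individual_to_address a
        WHERE a.city = 'Baltimore'
    )
    ORDER BY i.id
    """
).fetchall()

print(rows)  # [(1,)]
```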

@spopelka-dsac spopelka-dsac enabled auto-merge (squash) February 9, 2026 21:06
@spopelka-dsac spopelka-dsac merged commit ac30e15 into main Feb 9, 2026
12 checks passed
@spopelka-dsac spopelka-dsac deleted the sjp/streamline-joins branch February 9, 2026 21:06