[NDH-642] Apply performance improvements to Practitioner Endpoint by spopelka-dsac · Pull Request #385 · CMS-Enterprise/npd

spopelka-dsac · 2026-02-05T20:56:55Z

module-name: [NDH-642] Apply performance improvements to Practitioner Endpoint

Jira Ticket #NDH-642

Problem

Adam and Demetrius noted timeouts and poor database performance in their load testing, and the Technical Working Group also noticed timeouts. We had deferred optimizing search and indexes because of uncertainty about usage patterns and about the shape of the data coming out of the ETL, both of which could affect optimization decisions.

Solution

Add V18.1 migration to add a tsvector and indexes to the individual_to_name table (speeds up filtering and ordering)
Add V18.2 migration to add a tsvector and index to the nucc table (speeds up filtering)
Add V18.3 migration to add a tsvector and index to the address_us table (speeds up filtering)
Update models.py to reflect the search_vector and index updates
Update PractitionerFilterSet to use the new filter pattern based on the name, address, and nucc tsvectors
Update PractitionerRoleFilterSet to use the new filter pattern based on the practitioner_name and practitioner_type updates in PractitionerFilterSet
Update tests accordingly
Remove unnecessary queryset annotations, which slow down API calls

Result

Response times for /fhir/Practitioner drop from ~30-40 seconds to ~10-20 seconds, depending on the filter being applied.

Test Plan

All tests should pass
Test API calls locally and compare response times to the same calls made against https://dev.cnpd.internal.cms.gov/
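The comparison in the test plan could be scripted roughly as follows. This is a minimal sketch rather than part of the PR: the helper names `fetch` and `time_calls` are hypothetical, and the `name=smith` query in the usage comment is an assumed example filter. Only the base URLs and the /fhir/Practitioner path come from this thread.

```python
import statistics
import time
import urllib.request
from typing import Callable, Iterable


def fetch(url: str, timeout: float = 60.0) -> bytes:
    """Issue a GET request and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def time_calls(
    urls: Iterable[str],
    fetcher: Callable[[str], object] = fetch,
    repeats: int = 3,
) -> dict:
    """Return the median wall-clock seconds per URL over `repeats` calls."""
    results = {}
    for url in urls:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            fetcher(url)
            samples.append(time.perf_counter() - start)
        results[url] = statistics.median(samples)
    return results


# Hypothetical usage (the query string is an assumption, not from the PR):
#     query = "/fhir/Practitioner?name=smith"
#     bases = ["http://localhost:8000", "https://dev.cnpd.internal.cms.gov"]
#     for url, seconds in time_calls([b + query for b in bases]).items():
#         print(f"{seconds:6.2f}s  {url}")
```

Using the median over a few repeats reduces the influence of a single cold-cache call; the `fetcher` argument is injectable so the timing logic can be exercised without network access.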

backend/npdfhir/filters/practitioner_role_filter_set.py

rmillergv

backend/npdfhir/filters/practitioner_filter_set.py
- filter_practitioner_type():
  queryset.filter(providertotaxonomy__nucc_code__search_vector=query)
  providertotaxonomy is multi-valued (a provider can have multiple taxonomy rows). If two taxonomy rows match the websearch query, the same Practitioner can be returned twice.
- filter_address() (and filter_address_city/state/postalcode()):
  joins through individual__individualtoaddress__..., which is also multi-valued (multiple addresses), so the same duplication risk applies.

backend/npdfhir/filters/practitioner_role_filter_set.py
- filter_practitioner_type() used to end with .distinct() and now doesn't. If that's intentional, great, but I'm curious why multiple rows are allowed.
  queryset.filter(provider_to_organization__individual__providertotaxonomy__nucc_code__search_vector=query)
  The same multi-valued taxonomy join means duplicate PractitionerRole rows are possible.

I'm approving this with the above comment, because duplicate rows may not be an issue: these queries are likely to return a single row when run. So, this is a nicety if you want to look at it. If you meant to allow duplicates, that is fine too.
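The duplication concern can be illustrated outside Django with plain SQL. The sketch below uses SQLite from the Python standard library; the table and column names are hypothetical stand-ins for the real schema, and `LIKE` stands in for the tsvector match. Whether the generated queries in this codebase behave like the first or the third form is not established here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE provider (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE provider_to_taxonomy (
        provider_id INTEGER REFERENCES provider(id),
        nucc_display TEXT
    );
    INSERT INTO provider VALUES (1, 'Example Practitioner');
    -- Two taxonomy rows for the same provider, both matching the search term.
    INSERT INTO provider_to_taxonomy VALUES (1, 'Family Medicine');
    INSERT INTO provider_to_taxonomy VALUES (1, 'Family Medicine, Geriatric Medicine');
    """
)

# Plain inner join: one output row per matching taxonomy row.
joined = conn.execute(
    """
    SELECT p.id FROM provider p
    JOIN provider_to_taxonomy t ON t.provider_id = p.id
    WHERE t.nucc_display LIKE '%Family%'
    """
).fetchall()

# Same join with DISTINCT: duplicates collapsed after the join.
deduplicated = conn.execute(
    """
    SELECT DISTINCT p.id FROM provider p
    JOIN provider_to_taxonomy t ON t.provider_id = p.id
    WHERE t.nucc_display LIKE '%Family%'
    """
).fetchall()

# Subquery on matching ids: each provider row is tested once, so no duplicates.
semi_join = conn.execute(
    """
    SELECT p.id FROM provider p
    WHERE p.id IN (
        SELECT t.provider_id FROM provider_to_taxonomy t
        WHERE t.nucc_display LIKE '%Family%'
    )
    """
).fetchall()

print(joined)        # [(1,), (1,)]
print(deduplicated)  # [(1,)]
print(semi_join)     # [(1,)]
```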

spopelka-dsac · 2026-02-09T12:31:02Z

Good observation, Ross! In both instances, the queryset is based on the Provider model, which contains only distinct records by definition (each row represents a single individual with a Type 1 NPI). The way our filtering framework works, the filters perform an inner join on those related tables to get the individual_ids of the Practitioners that satisfy the filter conditions, and the rest of the queries are then performed against the set of individual ids that was returned. If you spin up the API locally, you can click the "DJDT" icon on the right-hand side of the screen when you're querying a resource through its endpoint directly (i.e. not through the API docs, but rather localhost:8000/fhir/Practitioner, for example), which will show you what the database is doing under the hood. Regardless, .distinct() was not actually doing anything in either line. @rmillergv
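Alongside inspecting the SQL in DJDT, one way to check empirically whether a filtered queryset contains repeated rows is to count primary keys. This is a minimal sketch; `duplicate_ids` is a hypothetical helper, not something in the repository, and the usage comment assumes a queryset named `filtered`.

```python
from collections import Counter
from typing import Hashable, Iterable


def duplicate_ids(ids: Iterable[Hashable]) -> dict:
    """Return {id: occurrences} for every id that appears more than once."""
    return {key: n for key, n in Counter(ids).items() if n > 1}


# Hypothetical usage in a Django shell, where `filtered` is the queryset
# after a filter such as filter_practitioner_type() has been applied:
#     duplicate_ids(filtered.values_list("pk", flat=True))
# An empty dict means every row was returned exactly once.
```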

spopelka-dsac added 14 commits January 26, 2026 23:35

adding individual_to_name join

f3696de

deleting accidental file

e8dc254

update

b6c5f83

removing .all

7411495

WIP

2b93b75

modified search vector

8bb24cf

Merge branch 'main' into sjp/streamline-joins

c66fc0a

committing uncommitted changes

b6388e0

Merge branch 'main' into sjp/streamline-joins

734528b

env_template update

679e0b8

making good progress

91dc68a

minor updates

8f6bf75

Merge branch 'main' into sjp/streamline-joins

10ce578

Fixing failing test

87ae917

spopelka-dsac requested review from IsaacMilarky, rmillergv, sachin-panayil and wbprice as code owners February 5, 2026 20:56

spopelka-dsac added 2 commits February 5, 2026 15:58

Merge branch 'main' into sjp/streamline-joins

fba91be

removing unused imports to resolve failed linting check

980ee47

spopelka-dsac mentioned this pull request Feb 6, 2026

[NDH-830] Update PractitionerRole Endpoint #389

Merged

wbprice approved these changes Feb 6, 2026

Merge branch 'main' into sjp/streamline-joins

1039f30

rmillergv reviewed Feb 6, 2026

Merge branch 'main' into sjp/streamline-joins

7930b89

spopelka-dsac enabled auto-merge (squash) February 9, 2026 21:06

spopelka-dsac merged commit ac30e15 into main Feb 9, 2026
12 checks passed

spopelka-dsac deleted the sjp/streamline-joins branch February 9, 2026 21:06

spopelka-dsac merged 18 commits into main from sjp/streamline-joins
