[NDH-642] Apply performance improvements to Practitioner Endpoint #385
spopelka-dsac merged 18 commits into main
rmillergv left a comment
backend/npdfhir/filters/practitioner_filter_set.py
- filter_practitioner_type():
  queryset.filter(providertotaxonomy__nucc_code__search_vector=query)
  providertotaxonomy is multi-valued (a provider can have multiple taxonomy rows). If two taxonomy rows match the websearch query, the queryset can return the same Practitioner twice.
- filter_address() (and filter_address_city/state/postalcode()):
  joins through individual__individualtoaddress__..., which is also multi-valued (multiple addresses), so the same duplication risk applies.
backend/npdfhir/filters/practitioner_role_filter_set.py
- filter_practitioner_type():
  queryset.filter(provider_to_organization__individual__providertotaxonomy__nucc_code__search_vector=query)
  Same multi-valued taxonomy join, so duplicate PractitionerRole rows are possible.
- filter_practitioner_type() used to end with .distinct() and now doesn't. If that's intentional, great, but I'm curious why multiple rows are allowed.
I'm approving this with the above comment because duplicate rows may not be an issue: these queries are likely to return a single row when run. So it's a nicety if you want to look at it, and if you meant to allow duplicates, that is fine too.
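To make the duplication concern concrete, here is a minimal sketch in plain SQL through Python's built-in sqlite3 module. It is not the project's Django code: the tables `provider` and `provider_taxonomy`, the sample rows, and the helper names are all hypothetical, and `LIKE` stands in for the full-text search lookup. It shows that an inner join across a multi-valued relation returns one row per matching child row, and that either `DISTINCT` or filtering by `id IN (subquery)` collapses the result back to one row per provider.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical provider/taxonomy tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE provider (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE provider_taxonomy (
            provider_id INTEGER REFERENCES provider(id),
            description TEXT
        );
        INSERT INTO provider VALUES (1, 'Dr. A'), (2, 'Dr. B');
        -- Provider 1 has two taxonomy rows that both mention cardiology.
        INSERT INTO provider_taxonomy VALUES
            (1, 'Pediatric Cardiology'),
            (1, 'Adult Cardiology'),
            (2, 'Dermatology');
        """
    )
    return conn


def ids_via_join(conn, term, distinct=False):
    """Filter providers with an inner join on the multi-valued taxonomy table."""
    select = "SELECT DISTINCT p.id" if distinct else "SELECT p.id"
    rows = conn.execute(
        f"{select} FROM provider p "
        "JOIN provider_taxonomy t ON t.provider_id = p.id "
        "WHERE t.description LIKE ? ORDER BY p.id",
        (f"%{term}%",),
    ).fetchall()
    return [row[0] for row in rows]


def ids_via_subquery(conn, term):
    """Filter providers by membership in a set of matching provider ids."""
    rows = conn.execute(
        "SELECT p.id FROM provider p WHERE p.id IN ("
        " SELECT t.provider_id FROM provider_taxonomy t"
        " WHERE t.description LIKE ?"
        ") ORDER BY p.id",
        (f"%{term}%",),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(ids_via_join(conn, "Cardiology"))                 # [1, 1]
print(ids_via_join(conn, "Cardiology", distinct=True))  # [1]
print(ids_via_subquery(conn, "Cardiology"))             # [1]
```

Whether the actual endpoint behaves like the plain join or like the id-set variant depends on the SQL the filter framework generates, which is best confirmed by inspecting the emitted queries.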
Good observation, Ross! In both instances, the queryset is based on the Provider model, which contains only distinct records by definition (each row represents a single individual with a Type 1 NPI). The way our filtering framework works, the filters perform an inner join on those related tables to get the individual_ids associated with the Practitioners for which the filter conditions are fulfilled, and then the rest of the queries are performed based on the set of individual ids that were returned. If you spin up the API locally, you can click the "DJDT" icon on the right-hand side of the screen when you're querying a resource using its endpoint directly (i.e. not through the API docs, but rather |
module-name: [NDH-642] Apply performance improvements to Practitioner Endpoint
Jira Ticket #NDH-642
Problem
Adam and Demetrius had noted timeouts and poor database performance in their load testing, and the Technical Working Group also noticed timeouts. We had deferred optimizing search and indexes, due to uncertainties about use patterns and the shape/structure of what the data would look like coming out of the ETL, which might affect optimization considerations.
Solution
Result
Response times for /fhir/Practitioner go down from ~30-40 seconds to ~10-20 seconds, depending on the filter being applied
Test Plan