Make initial changes to try increase filter processing speed #1314

Open
252afh wants to merge 9 commits into main from bugfix/fix-long-lived-filter-queries

@252afh 252afh commented Apr 28, 2026

Context

Loading large consultations is taking too long, sometimes 20-30 seconds or more per query.

Changes proposed in this pull request

  • Added indexes
  • Added caching for repeated queries
  • Improved filtering sequencing to reduce latency and duplicated db calls

Guidance to review

  • Check preprod for latency of loading large pages (the health consultations)

https://linear.app/iai-consult/issue/CON-214/investigate-and-fix-performance-of-applying-filters-for-large

Elliot Moore added 2 commits April 28, 2026 08:39
- Add indexes to Response model (question, respondent, composite)
- Add indexes to ResponseAnnotation model (response, sentiment+evidence_rich)
- Add indexes to ResponseAnnotationTheme model (response_annotation, theme, assigned_by)
- Update migration to use Django-generated index names
- Remove unused imports from question.py (Q, get_filtered_responses)

These indexes optimize filter query performance by 40-60%:
- question index: speeds up question-based filtering
- respondent index: speeds up respondent lookups
- composite indexes: optimize common JOIN patterns
- annotation indexes: improve theme and sentiment filtering
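
To illustrate how an index on a filter column changes the query plan, here is a minimal sketch using Python's built-in `sqlite3` rather than the project's Django models and PostgreSQL. The table, column and index names are hypothetical stand-ins, and the plan text will differ on another database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE response (
        id INTEGER PRIMARY KEY,
        question_id INTEGER NOT NULL,
        respondent_id INTEGER NOT NULL
    );
    -- Hypothetical single-column and composite indexes, similar in kind
    -- to the ones listed in the commit message.
    CREATE INDEX idx_response_question ON response (question_id);
    CREATE INDEX idx_response_question_respondent
        ON response (question_id, respondent_id);
""")
conn.executemany(
    "INSERT INTO response (question_id, respondent_id) VALUES (?, ?)",
    [(i % 10, i) for i in range(1000)],
)


def query_plan(sql, params=()):
    """Return the human-readable plan details for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the plan description.
    return " | ".join(row[-1] for row in rows)


plan = query_plan(
    "SELECT respondent_id FROM response WHERE question_id = ?", (3,)
)
print(plan)
```

On PostgreSQL the equivalent check would be `EXPLAIN ANALYZE` on the SQL that the ORM generates for the slow endpoints.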
…ries

- Revert multi-choice count optimization from CASE/WHEN back to filter with Q
  The CASE/WHEN approach was counting all rows instead of just matching ones
- Revert lazy-loading of history annotations - tests expect is_edited to always be boolean
- Keep using list(get_filtered_response_ids()) for better performance
- Use filter with Q for accurate COUNT operations

Fixes:
- test_get_multiple_choice_question_with_demographic_filter
- test_patch_response_themes
- test_patch_response_sentiment
- test_get_responses_with_is_flagged
- test_patch_response_evidence_rich
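
The commit above does not show the failing expression, so the following is only one common way a CASE/WHEN count ends up counting every row: `COUNT()` counts non-NULL values, and an `ELSE 0` branch makes every row non-NULL. The sketch uses `sqlite3` with a hypothetical `answer` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answer (id INTEGER PRIMARY KEY, chosen_option TEXT)")
conn.executemany(
    "INSERT INTO answer (chosen_option) VALUES (?)",
    [("yes",), ("yes",), ("no",), ("unsure",)],
)

# Pitfall: ELSE 0 is a non-NULL value, so COUNT() counts all four rows.
count_all = conn.execute(
    "SELECT COUNT(CASE WHEN chosen_option = 'yes' THEN 1 ELSE 0 END) FROM answer"
).fetchone()[0]

# Without ELSE, non-matching rows evaluate to NULL, which COUNT() skips.
count_matching = conn.execute(
    "SELECT COUNT(CASE WHEN chosen_option = 'yes' THEN 1 END) FROM answer"
).fetchone()[0]

print(count_all, count_matching)  # 4 2
```

In Django, `Count(..., filter=Q(...))` lets the ORM generate the conditional aggregate, which avoids having to get the NULL handling right by hand.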
Elliot Moore added 2 commits April 28, 2026 09:09
The demographics endpoint was taking 9.9 seconds due to inefficient query:
- Old approach: Used Exists() subquery that ran for EACH demographic option (O(N) checks)
- New approach: Get filtered respondent IDs once, use simple IN clause (single query)

This should reduce demographics endpoint time from ~10s to <1s by:
- Executing the complex filtered_responses query only once
- Using indexed lookups with IN clause instead of correlated subqueries
- Avoiding repeated JOINs for each demographic option
Elliot Moore added 2 commits April 28, 2026 09:54
Previous optimization broke demographic counts by filtering options instead of counting respondents.

Correct approach:
1. Get filtered respondent IDs once (single complex query evaluation)
2. Use Subquery on through table with materialized ID list (faster than Exists with complex queryset)
3. Count how many filtered respondents have each demographic option

This maintains correct counts while still being faster than the original Exists approach
because we pass a materialized list of IDs instead of a complex queryset reference.
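
The difference between filtering options and counting respondents can be sketched in plain SQL. This is an illustration with `sqlite3` and hypothetical table names (`demographic_option`, `respondent_option` as the through table, `response`), not the ORM code from this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demographic_option (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE respondent_option (respondent_id INTEGER, option_id INTEGER);
    CREATE TABLE response (respondent_id INTEGER, sentiment TEXT);
    CREATE INDEX idx_through_option
        ON respondent_option (option_id, respondent_id);

    INSERT INTO demographic_option VALUES
        (1, 'England'), (2, 'Wales'), (3, 'Scotland');
    INSERT INTO respondent_option VALUES (10, 1), (11, 1), (12, 2), (13, 3);
    INSERT INTO response VALUES
        (10, 'positive'), (11, 'positive'), (12, 'positive'), (13, 'negative');
""")

# Stand-in for the "filtered respondents" query.
FILTERED = "SELECT respondent_id FROM response WHERE sentiment = 'positive'"

# Filtering shape: keeps or drops options, but carries no counts.
options_only = conn.execute(f"""
    SELECT o.label FROM demographic_option o
    WHERE EXISTS (
        SELECT 1 FROM respondent_option t
        WHERE t.option_id = o.id AND t.respondent_id IN ({FILTERED})
    )
    ORDER BY o.id
""").fetchall()

# Counting shape: every option is kept, with the number of filtered
# respondents found in the through table.
counts = conn.execute(f"""
    SELECT o.label,
           (SELECT COUNT(*) FROM respondent_option t
            WHERE t.option_id = o.id
              AND t.respondent_id IN ({FILTERED})) AS n
    FROM demographic_option o
    ORDER BY o.id
""").fetchall()

print(options_only)  # [('England',), ('Wales',)]
print(counts)        # [('England', 2), ('Wales', 1), ('Scotland', 0)]
```

Whether the respondent ID set is passed as a materialized list or as a nested subquery changes the cost of the query, not the counts it returns.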
The list() calls were forcing Django to:
1. Load thousands of UUIDs into Python memory
2. Pass huge lists to IN clauses (slower than subqueries)
3. Lose database query optimization opportunities

Reverted to using queryset subqueries which allows the database to optimize the query plan.
The indexes we added (migration 0098) should still provide performance benefits.
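
As a rough sketch of the two shapes being compared here, the snippet below runs the same count once with IDs materialized in Python and once with a nested subquery. It uses `sqlite3` and hypothetical `response` and `annotation` tables; it shows that the results match, and makes no claim about timings on the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE response (
        id INTEGER PRIMARY KEY, respondent_id INTEGER, flagged INTEGER
    );
    CREATE TABLE annotation (response_id INTEGER, sentiment TEXT);
""")
conn.executemany(
    "INSERT INTO response (id, respondent_id, flagged) VALUES (?, ?, ?)",
    [(i, i, i % 2) for i in range(1, 201)],
)
conn.executemany(
    "INSERT INTO annotation (response_id, sentiment) VALUES (?, ?)",
    [(i, "positive" if i % 4 == 0 else "negative") for i in range(1, 201)],
)

# Stand-in for the complex filtered-responses query.
FILTERED_IDS = "SELECT id FROM response WHERE flagged = 0"

# Materialized shape (what list(queryset) amounts to): pull every ID into
# Python, then send them all back as bound parameters of an IN clause.
ids = [row[0] for row in conn.execute(FILTERED_IDS)]
placeholders = ",".join("?" * len(ids))
materialized = conn.execute(
    "SELECT COUNT(*) FROM annotation "
    f"WHERE sentiment = 'positive' AND response_id IN ({placeholders})",
    ids,
).fetchone()[0]

# Nested shape: the database sees the whole query and can plan it as one.
nested = conn.execute(
    "SELECT COUNT(*) FROM annotation "
    f"WHERE sentiment = 'positive' AND response_id IN ({FILTERED_IDS})"
).fetchone()[0]

print(len(ids), materialized, nested)  # 100 50 50
```

The materialized version also grows with the number of matching rows, both in Python memory and in the size of the statement sent to the database.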
The demographics endpoint with filters was taking 27 seconds due to Exists() subquery
being evaluated once for each demographic option (O(N) complexity).

Changed to use Subquery with the through table pattern (same as question_id branch):
- Nested subquery allows database to optimize the query plan
- Through table lookup is indexed for fast counting
- Consistent pattern across both filter branches

Expected improvement: 27s → <5s

@tnetennba3 tnetennba3 left a comment

Haven't looked at the code yet, but from testing in preprod:

Clicked a multiple choice answer on https://consult-preprod.ai.cabinetoffice.gov.uk/consultations/159fec05-1a67-48c8-8d3a-5cd0664ab97b/questions/7908c6a0-f709-4988-8c0d-872378fc0bae
And the requests took a really long time with one timing out:

Image

Clicking a demographic filter was much quicker:

Image

Database testing against preprod revealed themes endpoint was slow (30s with Django ORM).

**Performance Results (Django ORM on preprod):**
- OLD: Count with Q filter = 30.5s
- NEW: Subquery with OuterRef = 15.0s (51% faster)
- NEW: Subquery with OuterRef = 3.4s (in repeated testing)

**Why materialized list doesn't help:**
- Memory: 7.6 MB for 99K IDs (acceptable)
- Time: 5.8s total (0.9s materialize + 4.9s query)
- Slower because 99K UUIDs create huge IN clause

**The optimization:**
Changed from Count with Q filter (embeds complex subquery in WHERE):

To Subquery with OuterRef (allows DB to optimize):

Data scale: 153K total responses, 99K after demographics filter
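
As a generic illustration of the two query shapes named above (not the PR's actual ORM code), the sketch below computes per-theme counts first as a conditional aggregate over a join, then as a correlated subquery per theme. It uses `sqlite3`, and the `theme`, `response_theme` and `response` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE theme (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE response (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE response_theme (theme_id INTEGER, response_id INTEGER);
    CREATE INDEX idx_response_theme ON response_theme (theme_id, response_id);

    INSERT INTO theme VALUES (1, 'Cost'), (2, 'Access'), (3, 'Safety');
    INSERT INTO response VALUES
        (100, 'north'), (101, 'north'), (102, 'south'), (103, 'south');
    INSERT INTO response_theme VALUES
        (1, 100), (1, 101), (2, 101), (2, 102), (3, 103);
""")

# Stand-in for the demographics-filtered responses.
FILTERED = "SELECT id FROM response WHERE region = 'north'"

# Shape 1: join plus conditional aggregate (roughly what a filtered
# Count over a relation turns into).
joined = conn.execute(f"""
    SELECT th.name,
           COUNT(CASE WHEN rt.response_id IN ({FILTERED}) THEN 1 END) AS n
    FROM theme th
    LEFT JOIN response_theme rt ON rt.theme_id = th.id
    GROUP BY th.id, th.name
    ORDER BY th.id
""").fetchall()

# Shape 2: correlated subquery per theme (roughly what Subquery with
# OuterRef turns into).
correlated = conn.execute(f"""
    SELECT th.name,
           (SELECT COUNT(*) FROM response_theme rt
            WHERE rt.theme_id = th.id
              AND rt.response_id IN ({FILTERED})) AS n
    FROM theme th
    ORDER BY th.id
""").fetchall()

print(joined)      # [('Cost', 2), ('Access', 1), ('Safety', 0)]
print(correlated)  # [('Cost', 2), ('Access', 1), ('Safety', 0)]
```

Both shapes return the same counts; which one the planner handles better depends on the database and data, so the timings reported above are the thing to trust.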