fix: use selectinload() rather than joinedload() on one-to-many relationships to avoid Cartesian-product blow-up#7865
fix: use selectinload() rather than joinedload() on one-to-many relationships to avoid Cartesian-product blow-up#7865cfm wants to merge 1 commit into
selectinload() rather than joinedload() on one-to-many relationships to avoid Cartesian-product blow-up#7865Conversation
…ionships to avoid Cartesian-product blow-up Since #7628, we've explicitly "prefer[red] expensive eager queries over *apparently* cheaper lazy queries". That's the right instinct, but since #7604 we've overoptimized for the number of *queries* rather than the number of *results*. Here we replace joinedload() for one-to-many relationships, guaranteeing O(1) queries but giving us O(n^2) results, with selectinload(), giving us us queries linear in the number of relationships (i.e., *tables*) involved and O(n) results. In the test suite, assert_query_count() now enforces a maximum rather than an exact number of queries.
There was a problem hiding this comment.
Pull request overview
This PR adjusts eager-loading strategy in the SecureDrop SQLAlchemy models and related APIv2 tests to avoid Cartesian-product row blow-ups from joinedload() on one-to-many relationships, switching to selectinload() while keeping eager loading behavior.
Changes:
- Replace
joinedload()withselectinload()for key one-to-many relationships (e.g.,Source.submissions,Source.replies, and variousseen_*collections). - Update APIv2 query-count test helper to enforce an upper bound (maximum) instead of an exact query count.
- Update affected query-count expectations in APIv2 journalist tests to reflect the new eager-loading behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
securedrop/models.py |
Switches eager-loading options from joinedload() to selectinload() on one-to-many relationships to reduce result-set multiplication. |
securedrop/tests/test_journalist_api2.py |
Changes assert_query_count() semantics to “max queries” and updates several expectations accordingly. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
How much lower, you ask? Claude generated a profiling wrapper that found a difference of more than an order of magnitude: We know from the test suite that this change maintains the correctness of the API, so what we're interested in now is seeing what the optimization actually looks like under Apache and mod_wsgi. |
|
This looks really promising, thanks. Oops on optimizing in the wrong direction!! Code looks good, I'll test it on a prod server on Monday. |
Fixes #7862. Since #7628, we've explicitly "prefer[red] expensive eager queries over apparently cheaper lazy queries". That's the right instinct, but since #7604 we've overoptimized for the number of queries rather than the number of results.
Here we replace$O(1)$ queries but giving us $O(n^2)$ results, with $O(n)$ results. In the test suite,
joinedload()for one-to-many relationships, guaranteeingselectinload(), giving us us queries linear in the number of relationships (i.e., tables) involved andassert_query_count()now enforces a maximum rather than an exact number of queries.Test plan
On
develop:On this branch (or, on a production installation, just overwrite
models.pyfrom this branch):