Only preload interactive algorithms for reader studies with budget #4741
amickan wants to merge 1 commit into
```python
reader_studies_out_of_budget = [
    rs.pk
    for rs in ReaderStudy.objects.exclude(max_credits__isnull=True).only(
        "pk", "max_credits"
    )
    if not rs.has_budget
]
```
This query has an N+1 problem: the expensive queries in `credits_consumed` run once for every reader study. It's an asynchronous task, but still. I'm currently looking into annotating `credits_consumed` in the DB to make this more efficient.
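To make the N+1 versus annotation trade-off concrete, here is a minimal sketch using an in-memory SQLite database instead of Django. The schema is an assumption for illustration: a `reader_study` table with `max_credits` and a hypothetical `credit_usage` table holding consumed credits per study. The budget rule (a study is out of budget once consumed credits reach `max_credits`) is also assumed. The real `credits_consumed` is derived from several related models and is considerably more complex.

```python
import sqlite3

# Hypothetical, simplified schema; the real credit accounting spans
# several related models.
SCHEMA = """
CREATE TABLE reader_study (id INTEGER PRIMARY KEY, max_credits INTEGER);
CREATE TABLE credit_usage (
    id INTEGER PRIMARY KEY,
    reader_study_id INTEGER REFERENCES reader_study(id),
    credits INTEGER NOT NULL
);
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO reader_study (id, max_credits) VALUES (?, ?)",
        [(1, 100), (2, 50), (3, None)],  # study 3 has no credit limit
    )
    conn.executemany(
        "INSERT INTO credit_usage (reader_study_id, credits) VALUES (?, ?)",
        [(1, 40), (1, 70), (2, 10), (3, 999)],
    )
    return conn


def out_of_budget_n_plus_1(conn):
    """One query for the studies, then one extra query per study (N+1)."""
    queries = 1
    studies = conn.execute(
        "SELECT id, max_credits FROM reader_study"
        " WHERE max_credits IS NOT NULL ORDER BY id"
    ).fetchall()
    result = []
    for pk, max_credits in studies:
        queries += 1
        (consumed,) = conn.execute(
            "SELECT COALESCE(SUM(credits), 0) FROM credit_usage"
            " WHERE reader_study_id = ?",
            (pk,),
        ).fetchone()
        # Assumed budget rule: out of budget once consumed >= max_credits.
        if consumed >= max_credits:
            result.append(pk)
    return result, queries


def out_of_budget_annotated(conn):
    """A single query that aggregates consumed credits in the database."""
    rows = conn.execute(
        """
        SELECT rs.id
        FROM reader_study AS rs
        LEFT JOIN credit_usage AS cu ON cu.reader_study_id = rs.id
        WHERE rs.max_credits IS NOT NULL
        GROUP BY rs.id, rs.max_credits
        HAVING COALESCE(SUM(cu.credits), 0) >= rs.max_credits
        ORDER BY rs.id
        """
    ).fetchall()
    return [pk for (pk,) in rows], 1
```

Both functions return the same ids, but the first issues one query per limited study while the second always issues one. With a single flat table the aggregate is easy; nested aggregates over several relations are what make the real annotation hard.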
I'm struggling to turn `credits_consumed` into an annotation. It involves nested aggregates and gets quite complex. Maybe prefetching the related objects is good enough here? In general, I would like to know how bad the performance actually is before I continue trying to optimize this.
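Prefetching avoids the per-study queries differently from an annotation: it loads all related rows in one extra query and does the joining and summing in Python. The sketch below mimics that two-query pattern with SQLite; the `reader_study` and `credit_usage` tables and the budget rule are hypothetical, and this illustrates the idea rather than the SQL Django's `prefetch_related` actually generates.

```python
import sqlite3
from collections import defaultdict


def make_db():
    # Hypothetical, simplified schema for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE reader_study (id INTEGER PRIMARY KEY, max_credits INTEGER);
        CREATE TABLE credit_usage (
            id INTEGER PRIMARY KEY,
            reader_study_id INTEGER REFERENCES reader_study(id),
            credits INTEGER NOT NULL
        );
        INSERT INTO reader_study (id, max_credits) VALUES (1, 100), (2, 50), (3, NULL);
        INSERT INTO credit_usage (reader_study_id, credits)
            VALUES (1, 40), (1, 70), (2, 10), (3, 999);
        """
    )
    return conn


def out_of_budget_prefetched(conn):
    """Two queries in total, regardless of the number of reader studies."""
    studies = conn.execute(
        "SELECT id, max_credits FROM reader_study"
        " WHERE max_credits IS NOT NULL ORDER BY id"
    ).fetchall()
    pks = [pk for pk, _ in studies]
    if not pks:
        return [], 1  # nothing to prefetch
    placeholders = ", ".join("?" for _ in pks)
    # The "prefetch": fetch every related row for all studies at once.
    usage = conn.execute(
        "SELECT reader_study_id, credits FROM credit_usage"
        f" WHERE reader_study_id IN ({placeholders})",
        pks,
    ).fetchall()
    consumed = defaultdict(int)
    for reader_study_id, credits in usage:
        consumed[reader_study_id] += credits
    # Assumed budget rule: out of budget once consumed >= max_credits.
    result = [pk for pk, max_credits in studies if consumed[pk] >= max_credits]
    return result, 2
```

The query count stays constant, at the cost of pulling every related row into memory and aggregating in Python rather than in the database.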
@jmsmkn, could you compare the timings of the following two queries?
```python
import time

from grand_challenge.reader_studies.models import ReaderStudy


def query_without_prefetch():
    return [
        rs.pk
        for rs in ReaderStudy.objects.exclude(max_credits__isnull=True).only(
            "pk", "max_credits"
        )
        if not rs.has_budget
    ]


def query_with_prefetch():
    return [
        rs.pk
        for rs in ReaderStudy.objects.exclude(max_credits__isnull=True)
        .prefetch_related(
            "session_utilizations__reader_studies",
            "endpoint_utilizations__reader_studies",
        )
        .only("pk", "max_credits")
        if not rs.has_budget
    ]


def benchmark(label, func):
    start = time.perf_counter()
    func()  # evaluate the query; only the elapsed time is reported
    end = time.perf_counter()
    return {
        "label": label,
        "time_seconds": end - start,
    }


def main():
    results = [
        benchmark("without_prefetch", query_without_prefetch),
        benchmark("with_prefetch", query_with_prefetch),
    ]
    for r in results:
        print(f"{r['label']}:")
        print(f" time: {r['time_seconds']:.6f}s")


if __name__ == "__main__":
    main()
```
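Each query in the benchmark above is timed with a single call, so one measurement can include noise such as cold caches. A possible refinement, sketched with the standard library's `timeit.repeat` (which accepts a callable as `stmt`), is to run each function several times and keep the minimum, as the `timeit` documentation recommends. `benchmark_repeated` and the stand-in workloads are hypothetical; in the Django shell you would pass `query_without_prefetch` and `query_with_prefetch` instead.

```python
import timeit


def benchmark_repeated(label, func, repeat=5):
    """Time `func` `repeat` times (one call per run) and keep the fastest run."""
    timings = timeit.repeat(func, number=1, repeat=repeat)
    return {"label": label, "min_seconds": min(timings), "runs": len(timings)}


if __name__ == "__main__":
    # Stand-in workloads so the sketch runs without Django.
    results = [
        benchmark_repeated("sum_range", lambda: sum(range(100_000))),
        benchmark_repeated("sorted_range", lambda: sorted(range(100_000))),
    ]
    for r in results:
        print(f"{r['label']}: {r['min_seconds']:.6f}s (best of {r['runs']})")
```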
```
without_prefetch:
 time: 0.294184s
with_prefetch:
 time: 0.203041s
```
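Put in relative terms, the reported timings work out as follows. This is simple arithmetic on a single run of each query, so it is only a rough indication.

```python
# Timings reported above, in seconds (one run each).
without_prefetch = 0.294184
with_prefetch = 0.203041

saved = without_prefetch - with_prefetch    # seconds saved by prefetching
relative = saved / without_prefetch         # fraction of the time saved
speedup = without_prefetch / with_prefetch  # how many times faster

print(f"saved {saved:.6f}s, {relative:.1%} less time, {speedup:.2f}x faster")
```

That is roughly 31% less time, or about 1.45 times faster, with both variants well under a second.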
Thanks. Those timings seem fine to me for an asynchronous task and not worth the added complexity of moving the consumed credits into a DB annotation.
Would still be worth including the prefetching if those objects are going to be used.
Very true, I had that as a local commit, but forgot to push!
```python
            Session.RUNNING,
        ],
    )
    .exclude(reader_study__pk__in=reader_studies_out_of_budget)
```
Is there a reason to use the negative here? Filtering is going to be more efficient than excluding.
No, this just seemed easier to read to me, and I didn't know that filtering is more efficient than excluding. I'll change it to `filter()`.
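Switching from `exclude()` to `filter()` means building the complementary list: the reader studies that still have budget. One subtlety follows from the code above: studies without `max_credits` are never in `reader_studies_out_of_budget`, so `exclude()` keeps their sessions, and an equivalent `filter()` must include those studies explicitly. The SQLite sketch below, with hypothetical `reader_study` and `session` tables, checks that `NOT IN` over the out-of-budget ids and `IN` over the remaining ids select the same sessions, and shows what goes missing if the unlimited study is left out.

```python
import sqlite3


def make_db():
    # Hypothetical, simplified schema for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE reader_study (id INTEGER PRIMARY KEY, max_credits INTEGER);
        CREATE TABLE session (
            id INTEGER PRIMARY KEY,
            reader_study_id INTEGER NOT NULL REFERENCES reader_study(id)
        );
        INSERT INTO reader_study (id, max_credits) VALUES (1, 100), (2, 50), (3, NULL);
        INSERT INTO session (id, reader_study_id) VALUES (10, 1), (11, 2), (12, 3), (13, 2);
        """
    )
    return conn


def session_ids(conn, operator, study_pks):
    # `operator` is only ever one of the two constants used below.
    placeholders = ", ".join("?" for _ in study_pks)
    rows = conn.execute(
        f"SELECT id FROM session WHERE reader_study_id {operator} ({placeholders})"
        " ORDER BY id",
        study_pks,
    ).fetchall()
    return [pk for (pk,) in rows]


conn = make_db()
out_of_budget = [1]  # assume study 1 has used up its credits
# Every other study has budget, including study 3, which has no max_credits.
with_budget = [
    pk
    for (pk,) in conn.execute("SELECT id FROM reader_study ORDER BY id")
    if pk not in out_of_budget
]

excluded = session_ids(conn, "NOT IN", out_of_budget)
filtered = session_ids(conn, "IN", with_budget)
# Pitfall: only considering studies that have a max_credits value.
naive = session_ids(conn, "IN", [2])
```

`excluded` and `filtered` both contain sessions 11, 12 and 13, while `naive` silently drops session 12. The efficiency argument comes from the review; the sketch only checks that the positive filter keeps the same rows.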
Only preload interactive algorithms of reader studies with budget.
Part of https://github.com/DIAGNijmegen/rse-roadmap/issues/469