Skip to content

Only preload interactive algorithms for reader studies with budget#4741

Open
amickan wants to merge 1 commit into
mainfrom
only_preload_algorithms_for_studies_with_budget
Open

Only preload interactive algorithms for reader studies with budget#4741
amickan wants to merge 1 commit into
mainfrom
only_preload_algorithms_for_studies_with_budget

Conversation

@amickan
Copy link
Copy Markdown
Contributor

@amickan amickan commented May 27, 2026

Only preload interactive algorithms of reader studies with budget.

Part of https://github.com/DIAGNijmegen/rse-roadmap/issues/469

@amickan amickan requested a review from jmsmkn as a code owner May 27, 2026 13:54
@amickan amickan marked this pull request as draft May 27, 2026 14:01
@amickan amickan self-assigned this May 27, 2026
Comment on lines +1313 to +1320
reader_studies_out_of_budget = [
rs.pk
for rs in ReaderStudy.objects.exclude(max_credits__isnull=True).only(
"pk", "max_credits"
)
if not rs.has_budget
]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query has an N+1 problem (due to expensive queries in credits_consumed). It's an asynchronous task, but still. Currently looking at annotating credits_consumed in the DB to make this more efficient.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Struggling with turning credits_consumed into an annotation. It involves nested aggregates, and gets quite complex. Maybe prefetching the related objects is good enough here? In general, I would like to know how bad the performance is before I continue trying to optimize this.

@jmsmkn could you compare timings for the following 2 queries:

import time
from grand_challenge.reader_studies.models import ReaderStudy


def query_without_prefetch():
    return [
        rs.pk
        for rs in ReaderStudy.objects.exclude(max_credits__isnull=True).only(
            "pk", "max_credits"
        )
        if not rs.has_budget
    ]


def query_with_prefetch():
    return [
        rs.pk
        for rs in ReaderStudy.objects.exclude(max_credits__isnull=True)
        .prefetch_related(
            "session_utilizations__reader_studies",
            "endpoint_utilizations__reader_studies",
        )
        .only("pk", "max_credits")
        if not rs.has_budget
    ]


def benchmark(label, func):
    start = time.perf_counter()
    result = func()
    end = time.perf_counter()

    return {
        "label": label,
        "time_seconds": end - start,
    }


def main():
    results = [
        benchmark("without_prefetch", query_without_prefetch),
        benchmark("with_prefetch", query_with_prefetch),
    ]

    for r in results:
        print(f"{r['label']}:")
        print(f"  time: {r['time_seconds']:.6f}s")


if __name__ == "__main__":
    main()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without_prefetch:
  time: 0.294184s
with_prefetch:
  time: 0.203041s

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Those timings seem fine to me for an asynchronous task and not worth the added complexity of moving the consumed credits into a DB annotation.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would still be worth including the prefetching if those objects are going to be used.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very true, I had that as a local commit, but forgot to push!

@amickan amickan marked this pull request as ready for review May 29, 2026 14:10
@amickan amickan removed their assignment May 29, 2026
Session.RUNNING,
],
)
.exclude(reader_study__pk__in=reader_studies_out_of_budget)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to use the negative here? Filtering is going to be more efficient than excluding.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. This seemed easier to read to me and I didn't know that filtering is more efficient than excluding. I'll change it to filter().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants