Conversation

@m-blaha (Member) commented Nov 4, 2025

In addition to pagination slicing, the CoprBuildsList class uses Python code to filter out Copr builds that are waiting for an SRPM or whose SRPM build failed. As a result, the API can in some cases return fewer items than the user requested.

Moving the filter into SQL resolves the problem. The SQL filter relies on the fact that the build_id field is NULL until the build is actually created by submitting it to Copr.
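For illustration only (the actual diff is not shown in this conversation), the SQL-level filter described above could be expressed roughly like this in SQLAlchemy. CoprBuildTargetModel and build_id come from the code excerpt discussed later in the review; session, first, and last are assumed to be the existing database session and pagination bounds:

# Hedged sketch, not packit's actual code: exclude rows whose Copr build was
# never submitted (build_id IS NULL) before slicing, so the database itself
# returns a full page of "real" builds.
query = (
    session.query(CoprBuildTargetModel)
    .filter(CoprBuildTargetModel.build_id.isnot(None))  # waiting-for-SRPM / failed-SRPM rows have no build_id
    .order_by(CoprBuildTargetModel.build_id.desc())
    .offset(first)                  # assuming first/last are the inclusive slice bounds of the API page
    .limit(last - first + 1)
)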

Resolves: #2505

@gemini-code-assist (Contributor)

Summary of Changes

Hello @m-blaha, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the data retrieval process for Copr build lists by migrating the filtering of incomplete builds from application-level Python code to the database query. This enhancement ensures that the API consistently delivers the expected number of results and improves overall performance by optimizing data processing at its source.

Highlights

  • SQL-based Filtering: The filtering logic for Copr builds that are waiting for an SRPM or whose SRPM build failed has been moved from Python code to the SQL query. This is achieved by filtering out records where the build_id is NULL.
  • Improved API Efficiency: By performing the filtering at the database level, the API becomes more efficient as it no longer needs to fetch and then discard unwanted records in Python.
  • Accurate Pagination: This change resolves an issue where the API could return fewer items than requested due to post-query filtering, ensuring that pagination slicing works as expected (a small illustration follows this list).
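To make the pagination point concrete, here is a tiny self-contained Python illustration with made-up numbers (not packit code): filtering after the database slice shortens the page, while filtering in SQL keeps the slice full-sized.

# Made-up page of 20 rows, 3 of which have no Copr build yet (build_id is None).
rows = [{"build_id": None if i in (0, 7, 14) else i} for i in range(20)]

# Old behaviour: slice first (done by the database), then filter in Python.
old_page = [r for r in rows if r["build_id"] is not None]
print(len(old_page))  # 17, although the client asked for 20 items

# New behaviour: rows with build_id IS NULL never reach the page at all,
# so the slice itself already contains the 20 requested items.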

@m-blaha force-pushed the filter-out-no-source branch from ab917f1 to fe6327e on November 4, 2025, 12:22
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request correctly moves the filtering of Copr builds from Python code into the SQL query. This is a good change that resolves an issue with pagination and improves efficiency. The implementation is clean and the reasoning is sound. I have one suggestion for a further performance improvement to address a potential N+1 query problem in the API endpoint, which would make this part of the code even more efficient.

Comment on lines 35 to 37
for build in CoprBuildTargetModel.get_merged_chroots(first, last):
    build_info = CoprBuildTargetModel.get_by_build_id(build.build_id, None)
    if build_info.status == BuildStatus.waiting_for_srpm:
        continue
    if (
        build_info.status == BuildStatus.failure
        and not build_info.build_start_time
        and not build_info.build_logs_url
    ):
        # SRPM build failed, it doesn't make sense to list this build
        continue
    project_info = build_info.get_project()
@gemini-code-assist bot (Contributor)

medium

This loop currently causes an N+1 query problem. For each build returned by get_merged_chroots, you're making at least two more database queries: get_by_build_id and then get_project (which itself can trigger multiple lazy-loads). This can lead to significant performance degradation, especially with a large number of builds.

To resolve this, I recommend modifying CoprBuildTargetModel.get_merged_chroots to fetch all the necessary information in a single query by using joins and returning all required fields. This would eliminate the need for extra queries inside the loop.

For example, you could extend the query in get_merged_chroots to join with CoprBuildGroupModel, PipelineModel, ProjectEventModel, and GitProjectModel and retrieve fields like project_name, build_submitted_time, web_url, commit_sha, and the project details. Since the query groups results by build, you would need to apply an aggregate function (such as min or max) to these additional fields; the values are identical for all chroots of a given build, so the choice of aggregate does not matter.

This would make the API endpoint much more performant.
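A very rough sketch of the kind of joined, aggregated query suggested here is below. Only the model class names are taken from the comment above; the foreign-key columns (copr_build_group_id, pipeline_id, project_event_id, project_id), the join paths, and the selected project columns are hypothetical placeholders that do not necessarily match packit's real schema:

from sqlalchemy import func

# Hypothetical sketch of the suggestion, not packit's actual query: pull the
# per-build fields into the same grouped query so the endpoint loop needs no
# extra get_by_build_id() / get_project() round trips per build.
query = (
    session.query(
        CoprBuildTargetModel.build_id,
        func.max(CoprBuildTargetModel.build_submitted_time).label("build_submitted_time"),
        func.max(CoprBuildTargetModel.web_url).label("web_url"),
        func.max(GitProjectModel.namespace).label("namespace"),    # hypothetical columns
        func.max(GitProjectModel.repo_name).label("repo_name"),
    )
    .join(CoprBuildGroupModel, CoprBuildTargetModel.copr_build_group_id == CoprBuildGroupModel.id)
    .join(PipelineModel, PipelineModel.copr_build_group_id == CoprBuildGroupModel.id)
    .join(ProjectEventModel, ProjectEventModel.id == PipelineModel.project_event_id)
    .join(GitProjectModel, GitProjectModel.id == ProjectEventModel.project_id)
    .filter(CoprBuildTargetModel.build_id.isnot(None))
    .group_by(CoprBuildTargetModel.build_id)
    .order_by(CoprBuildTargetModel.build_id.desc())
    .offset(first)
    .limit(last - first + 1)
)
# The aggregates (max) are only there to satisfy GROUP BY; the values are the
# same for every chroot of a given build, as noted in the comment above.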

@m-blaha (Member, Author)

This makes sense, but it is out of scope here. If we want to optimize this code, it should be tracked as a separate issue.

Development

Successfully merging this pull request may close these issues.

Respect number of items requested for API responses
