Filter out Copr builds without SRPM in SQL #2863
Summary of Changes (Gemini Code Assist)
This pull request refactors the data retrieval process for Copr build lists by migrating the filtering of incomplete builds from application-level Python code to the database query. This enhancement ensures that the API consistently delivers the expected number of results and improves overall performance by optimizing data processing at its source.
In addition to pagination slicing, the CoprBuildsList class uses Python code to filter out Copr builds that are waiting for an SRPM or whose SRPM build failed. Because this filter runs after the page has been sliced, the API can in some cases return fewer items than the user requested.
Moving the filter into SQL resolves the problem. The SQL filter relies on the fact that the build_id field is NULL until the build is actually created by submitting it to Copr.
Resolves: packit#2505
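To illustrate why the filter has to run before the slice, here is a minimal sketch using the standard-library sqlite3 module. The `builds` table, its columns, the sample rows, and both helper functions are hypothetical and are not taken from the packit-service code; only the `build_id IS NOT NULL` condition reflects the description above.

```python
import sqlite3


def make_db():
    # Hypothetical, simplified schema: one row per Copr build.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE builds (id INTEGER PRIMARY KEY, build_id TEXT, status TEXT)"
    )
    rows = [
        (1, "101", "success"),
        (2, None, "waiting_for_srpm"),  # not submitted to Copr yet
        (3, "102", "failure"),
        (4, None, "failure"),  # SRPM build failed, never submitted
        (5, "103", "success"),
        (6, "104", "pending"),
    ]
    conn.executemany("INSERT INTO builds VALUES (?, ?, ?)", rows)
    return conn


def page_filtered_in_python(conn, first, last):
    # Slice first, filter afterwards: the page can shrink.
    rows = conn.execute(
        "SELECT id, build_id FROM builds ORDER BY id LIMIT ? OFFSET ?",
        (last - first, first),
    ).fetchall()
    return [row for row in rows if row[1] is not None]


def page_filtered_in_sql(conn, first, last):
    # Filter inside the query: the slice only ever sees listable builds.
    return conn.execute(
        "SELECT id, build_id FROM builds WHERE build_id IS NOT NULL "
        "ORDER BY id LIMIT ? OFFSET ?",
        (last - first, first),
    ).fetchall()


conn = make_db()
print(len(page_filtered_in_python(conn, 0, 4)))
print(len(page_filtered_in_sql(conn, 0, 4)))
```

With this sample data, a request for four items yields two rows when the filter runs in Python and four rows when it runs in SQL.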
Force-pushed from ab917f1 to fe6327e.
Code Review
This pull request correctly moves the filtering of Copr builds from Python code into the SQL query. This is a good change that resolves an issue with pagination and improves efficiency. The implementation is clean and the reasoning is sound. I have one suggestion for a further performance improvement to address a potential N+1 query problem in the API endpoint, which would make this part of the code even more efficient.
    for build in CoprBuildTargetModel.get_merged_chroots(first, last):
        build_info = CoprBuildTargetModel.get_by_build_id(build.build_id, None)
        if build_info.status == BuildStatus.waiting_for_srpm:
            continue
        if (
            build_info.status == BuildStatus.failure
            and not build_info.build_start_time
            and not build_info.build_logs_url
        ):
            # SRPM build failed, it doesn't make sense to list this build
            continue
        project_info = build_info.get_project()
This loop currently causes an N+1 query problem. For each build returned by get_merged_chroots, you're making at least two more database queries: get_by_build_id and then get_project (which itself can trigger multiple lazy-loads). This can lead to significant performance degradation, especially with a large number of builds.
To resolve this, I recommend modifying CoprBuildTargetModel.get_merged_chroots to fetch all the necessary information in a single query by using joins and returning all required fields. This would eliminate the need for extra queries inside the loop.
For example, you could extend the query in get_merged_chroots to join with CoprBuildGroupModel, PipelineModel, ProjectEventModel, and GitProjectModel to retrieve fields like project_name, build_submitted_time, web_url, commit_sha, and project details. Because the query groups the chroots of each build, you would need to apply an aggregate function (like min or max) to these additional fields in the select list; this is safe because they are the same for all chroots of a given build.
This would make the API endpoint much more performant.
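The suggestion can be sketched as follows, again with the standard-library sqlite3 module rather than the project's actual models. The `projects` and `build_targets` tables, their columns, and both functions are hypothetical; the sketch only shows the general shape of replacing per-build lookups with one grouped join, under the assumption that the joined fields are identical for every chroot of a build.

```python
import sqlite3


def make_db():
    # Hypothetical, simplified schema: one row per (build, chroot) pair.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE build_targets (
            id INTEGER PRIMARY KEY,
            build_id TEXT,
            chroot TEXT,
            project_id INTEGER
        );
        INSERT INTO projects VALUES (1, 'packit'), (2, 'ogr');
        INSERT INTO build_targets VALUES
            (1, '101', 'fedora-rawhide-x86_64', 1),
            (2, '101', 'fedora-41-x86_64', 1),
            (3, '102', 'fedora-rawhide-x86_64', 2);
        """
    )
    return conn


def list_builds_n_plus_one(conn):
    # One query for the merged builds, then one more query per build.
    queries = 1
    builds = conn.execute(
        "SELECT build_id, MIN(project_id) FROM build_targets "
        "GROUP BY build_id ORDER BY build_id"
    ).fetchall()
    result = []
    for build_id, project_id in builds:
        queries += 1
        (name,) = conn.execute(
            "SELECT name FROM projects WHERE id = ?", (project_id,)
        ).fetchone()
        result.append((build_id, name))
    return result, queries


def list_builds_joined(conn):
    # Single query: join once and aggregate the per-build fields,
    # which are the same for every chroot of a build.
    rows = conn.execute(
        "SELECT bt.build_id, MIN(p.name) "
        "FROM build_targets AS bt "
        "JOIN projects AS p ON p.id = bt.project_id "
        "GROUP BY bt.build_id ORDER BY bt.build_id"
    ).fetchall()
    return rows, 1


conn = make_db()
print(list_builds_n_plus_one(conn))
print(list_builds_joined(conn))
```

Both functions return the same rows; the first issues one query plus one per build, while the second issues a single query regardless of the number of builds.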
This makes sense, but is out of scope here. If we wanted to optimize the code, it should be tracked as a separate issue.