perf(increment-approve): optimize OBS source report processing + development group filtering #495
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff (master vs. #495)

| Metric   | master  | #495    | +/- |
|----------|---------|---------|-----|
| Coverage | 100.00% | 100.00% |     |
| Files    | 97      | 97      |     |
| Lines    | 9690    | 9703    | +13 |
| Branches | 514     | 515     | +1  |
| Hits     | 9690    | 9703    | +13 |
3d0946b to faa2124
faa2124 to f0320c6
asmorodskyi left a comment
I would just like to raise a flag that multi-threading is quite an expensive thing which brings its own set of problems to the code. We should always think hard about other ways to achieve the same goal. From what I understand, the only reason we are switching to multi-threading is that @okurz finds it annoying to wait 10 minutes for a single run during the development phase, which I find quite questionable reasoning. Multi-threading, for example, makes debugging this code challenging.
My blocking does not mean a hard block, but I would like to raise a discussion and gather more opinions before we actually dive into this river ...
Multi-threading is not expensive. Here the code naturally maps out what we want to do: process all datapoints, regardless of order or any other dependency. By the way, the two commits mostly optimize through caching and a more efficient lookup. The multi-threading is only secondary and follows the concept of futures that is already used in other places in qem-bot.
Motivation: The script performed a sequential openQA API request (get_single_job) for every individual job to check whether it belonged to a development group. This caused hundreds of redundant HTTP requests.

Design Choices: Modified _filter_jobs to use the group_id already present in the job_stats response. This allows checking the development group status once per group instead of once per job ID, making use of openQA's job grouping. Removed the now-unused is_in_devel_group(job_id) helper.

Benefits: Reduced execution time by approximately 80 seconds (over 50% improvement). Significantly reduced load on the openQA API.

Related issue: https://progress.opensuse.org/issues/198728
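The per-group check described in this commit message can be illustrated with a minimal sketch. This is not the qem-bot code: `filter_jobs_by_group`, `fake_is_devel_group`, the shape of the job dictionaries, and the group numbers are all assumptions made for the example. It only shows the idea of deciding once per `group_id` and reusing that verdict for every job in the group.

```python
from collections.abc import Callable, Iterable
from typing import Any


def filter_jobs_by_group(
    jobs: Iterable[dict[str, Any]],
    is_devel_group: Callable[[int], bool],
) -> list[dict[str, Any]]:
    """Drop jobs in development groups, asking about each group only once."""
    verdicts: dict[int, bool] = {}  # group_id -> True if development group
    kept = []
    for job in jobs:
        group_id = job["group_id"]
        if group_id not in verdicts:
            # The (potentially slow) check runs once per group, not per job.
            verdicts[group_id] = is_devel_group(group_id)
        if not verdicts[group_id]:
            kept.append(job)
    return kept


# Stand-in for a remote lookup; it records how often it is called.
calls: list[int] = []


def fake_is_devel_group(group_id: int) -> bool:
    calls.append(group_id)
    return group_id == 7  # assumption: group 7 is a development group


# Hypothetical job entries carrying the group_id alongside the job id.
jobs = [
    {"job_id": 1, "group_id": 7},
    {"job_id": 2, "group_id": 7},
    {"job_id": 3, "group_id": 9},
    {"job_id": 4, "group_id": 9},
]
kept_ids = [job["job_id"] for job in filter_jobs_by_group(jobs, fake_is_devel_group)]
```

With four jobs spread over two groups, the stand-in lookup runs twice instead of four times; the saving grows with the number of jobs per group.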
Motivation: Processing multiple OBS actions for a request involved redundant repository lookups and sequential loading of package data from source reports, leading to high execution times for requests with many actions.

Design Choices:
1. Added caching for project repository lookups via get_repos_of_project.
2. Parallelized the loading of packages from source reports for all actions using ThreadPoolExecutor and thread-safe updates to the package lists.

Benefits: Reduced runtime for package diffing in large requests (e.g., 16 actions) by approximately 34 seconds.

Related issue: https://progress.opensuse.org/issues/198728
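The two design choices above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `repos_of_project`, `load_packages`, `collect_packages`, the action dictionaries, and the project names are hypothetical stand-ins. The sketch caches the repository lookup with `functools.lru_cache` and guards the shared package list with a `threading.Lock` while a `ThreadPoolExecutor` processes the actions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

lookup_calls: list[str] = []  # records every uncached repository lookup


@lru_cache(maxsize=None)
def repos_of_project(project: str) -> tuple[str, ...]:
    """Stand-in for a slow repository lookup; cached per project."""
    lookup_calls.append(project)
    return (f"{project}/standard",)


def load_packages(action: dict[str, str]) -> list[str]:
    """Stand-in for loading package names from one action's source report."""
    repo = repos_of_project(action["project"])[0]
    return [f"{action['package']}@{repo}"]


def collect_packages(actions: list[dict[str, str]]) -> list[str]:
    packages: list[str] = []
    lock = threading.Lock()

    # Warm the cache in the calling thread so each project is looked up
    # exactly once, independent of how the worker threads interleave.
    for action in actions:
        repos_of_project(action["project"])

    def worker(action: dict[str, str]) -> None:
        loaded = load_packages(action)
        with lock:  # serialize updates to the shared list
            packages.extend(loaded)

    with ThreadPoolExecutor(max_workers=4) as pool:
        # list() drains the iterator so worker exceptions surface here.
        list(pool.map(worker, actions))
    return sorted(packages)  # completion order is not deterministic


# Hypothetical actions: three packages across two projects.
actions = [
    {"project": "proj-a", "package": "bash"},
    {"project": "proj-a", "package": "curl"},
    {"project": "proj-b", "package": "zlib"},
]
result = collect_packages(actions)
```

Because the order in which threads finish is not fixed, the sketch sorts the result before returning it. Any caller that depends on ordering would need a similar step.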
f0320c6 to 77d13b1
Related issue: https://progress.opensuse.org/issues/198728