Build: add concurrency options for Git and Python builds #12853
Conversation
While investigating slow build times, I found that some projects with big Git repositories take a lot of time unpacking, checking out, and fetching submodules. This commit adds a few options to try to improve the performance of these Git operations. In addition, it adds some known environment variables to try to improve the performance of Python package builds, especially for C and C++ extensions. I noticed that only 1 vCPU is used at 100% all the time during these two types of operations, and it seems we can make them perform better. Related: readthedocs/readthedocs-ops#1588
stsewd
left a comment
Have you tested how much we are gaining with these options?
    "MAKEFLAGS": f"-j {cpus}",
    # Pillow and other libraries use this variable
    # https://pillow.readthedocs.io/en/stable/installation/building-from-source.html#build-options
    "MAX_CONCURRENCY": cpus,
Looks like this is the default
By default, it uses as many CPUs as are present.
Yeah, I used Pillow as an example, but while doing the research I found that other libraries use this variable as well. We can remove it if we want, though.
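For context, the environment injection discussed above could be sketched as a standalone helper like this. The variable names come from the diff; the function itself is hypothetical and not part of the Read the Docs codebase:

```python
import multiprocessing


def build_concurrency_env():
    """Environment variables hinting build tools to use every available CPU.

    A minimal sketch based on the diff in this PR; the helper name is
    an assumption, not the actual Read the Docs implementation.
    """
    cpus = multiprocessing.cpu_count()
    return {
        # GNU make (and setup.py builds that shell out to make) honor this.
        "MAKEFLAGS": f"-j {cpus}",
        # Pillow and some other libraries read this, although recent Pillow
        # versions already default to using all CPUs.
        "MAX_CONCURRENCY": str(cpus),
    }
```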
    # https://git-scm.com/docs/git-config#Documentation/git-config.txt-checkoutworkers
    self.run("git", "config", "--global", "checkout.workers", "-1")
    # https://git-scm.com/docs/git-config#Documentation/git-config.txt-checkoutthresholdForParallelism
    self.run("git", "config", "--global", "checkout.thresholdForParallelism", "100")
Looks like this is already the default?
The default is 100.
Ah, yeah, I was testing other numbers locally and forgot to remove it from here.
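Outside the builder, the same kind of Git settings can be applied with plain `git config --global` calls. A minimal sketch follows; the helper names and the submodule option are assumptions rather than code from this PR, and `checkout.thresholdForParallelism` is omitted since 100 is already Git's default:

```python
import subprocess

# Parallelism-related Git settings, based on the snippet above.
GIT_PARALLEL_SETTINGS = {
    # -1 lets Git pick one checkout worker per available core.
    "checkout.workers": "-1",
    # Fetch submodules in parallel; 0 means "a reasonable default".
    # (An assumption for illustration; the PR's exact option isn't shown here.)
    "submodule.fetchJobs": "0",
}


def git_config_commands(settings=GIT_PARALLEL_SETTINGS):
    """Build the `git config --global` invocation for each setting."""
    return [
        ["git", "config", "--global", key, value]
        for key, value in settings.items()
    ]


def apply_git_parallelism():
    """Run each configuration command, failing loudly on error."""
    for command in git_config_commands():
        subprocess.run(command, check=True)
```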
Co-authored-by: Santos Gallegos <stsewd@proton.me>
I tested the other day when I pushed the PR and noticed only very small improvements on a basic repository. Today I dug deeper and found a repository that takes more than 5 minutes to clone on our platform, and I used it to test this specifically. Here is what I found:

We get a reduction of 3 minutes here just by setting these configs 🤯

Note: this is the query I used to find this project: `BuildCommandResult.objects.filter(command__startswith="git", exit_code=0).annotate(length=F("end_time") - F("start_time")).filter(length__gt=timezone.timedelta(seconds=60 * 5))`
I'm not sure how I feel about adding extra env vars to the build process; that kind of breaks our rule of trying to run the same steps a user would run locally. We may want to publish these as general recommendations instead. One note about your test: you are running
Yeah, we can remove those vars; I'm more interested in the Git configs here. I copied the commands from the build detail pages of each of those projects, so I'm testing with the same commands we run in production.