worker: run repo maintenance during idle time (Bug 2037216) #1135
cgsheeh wants to merge 19 commits
Lando's workers typically run repo-cleaning commands at the start of each job. In hg workers, we run `hg strip` at the start of each job to remove stale commits created by previous jobs, even though those commits do not interfere with job completion. In Git, we take the opposite approach and simply ignore the temporary work branch: it is never cleaned up.

Add a new "repo maintenance" step that runs while the worker is throttled because no jobs remain in the queue. To avoid running excessive maintenance, the time of the last maintenance run for each repo is recorded in the worker, and maintenance is skipped if it was completed within the threshold.

For Mercurial, move the `hg strip` command into this maintenance task, which should save us about 8s for each push to try. For Git, add a cleanup of the stale working branches, so we no longer have thousands of temp branches in our worker repos.

After this change, every `HgSCM.clean_repo` call site passes `strip_non_public_commits=False`, while every `GitSCM.clean_repo` call site passes `True`. Remove the kwarg and make each behaviour the default.
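For readers skimming the description, here is a minimal sketch of the "skip if maintained within the threshold" idea. It is not the code in this PR: `MaintenanceTracker`, `run_if_due`, and the injectable `clock` are hypothetical names chosen for illustration, and the sketch assumes the worker stores one timestamp per repo in memory.

```python
import time
from typing import Callable, Dict, Optional


class MaintenanceTracker:
    """Hypothetical helper: remembers when each repo last finished maintenance."""

    def __init__(
        self,
        threshold_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold_seconds = threshold_seconds
        self.clock = clock
        self.last_run: Dict[str, float] = {}

    def is_due(self, repo_name: str) -> bool:
        """A repo is due if it was never maintained or the threshold has elapsed."""
        last: Optional[float] = self.last_run.get(repo_name)
        return last is None or self.clock() - last >= self.threshold_seconds

    def run_if_due(self, repo_name: str, maintenance: Callable[[], None]) -> bool:
        """Run `maintenance` only when due; return whether it actually ran."""
        if not self.is_due(repo_name):
            return False
        maintenance()
        # Record the completion time so the next idle period can skip this repo.
        self.last_run[repo_name] = self.clock()
        return True
```

In a worker loop, something like `tracker.run_if_due(repo.name, scm.run_maintenance)` would be called only while the worker is idle; `scm.run_maintenance` is likewise an assumed name.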
@@ -31,7 +31,7 @@ def test_integrated_hgrepo_clean_repo(hg_clone):
    repo = HgSCM(hg_clone.strpath)
nit: Could we rename this variable to `scm` while we're at it?
`repo = HgSCM()` is the convention in this file, oddly enough. I updated this instance, but we should fix the others in a follow-up. :)
zzzeid left a comment
Few additional comments but looks good otherwise.
# `git branch -D` refuses to delete the currently checked-out branch,
# so move off any `lando-*` branch first.
Interesting bit here. In the future it might make more sense to ensure we are back on the default branch after a job is finished.
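To make the `git branch -D` point concrete, here is a hedged sketch of a cleanup that first moves off a `lando-*` branch and then deletes every local branch with that prefix. It is not the implementation under review: `make_git_runner`, `delete_stale_branches`, and the `default_branch` parameter are hypothetical, and checking out the default branch is only one assumed way of "moving off" the work branch. The git subcommands used (`branch --show-current`, `for-each-ref`, `checkout`, `branch -D`) are standard.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence

# A callable that runs `git <args>` and returns stdout (hypothetical type alias).
GitRunner = Callable[[Sequence[str]], str]


def make_git_runner(repo_path: Path) -> GitRunner:
    """Build a runner that executes git inside `repo_path`."""

    def run(args: Sequence[str]) -> str:
        result = subprocess.run(
            ["git", *args],
            cwd=repo_path,
            check=True,
            capture_output=True,
            text=True,
        )
        return result.stdout

    return run


def delete_stale_branches(
    git: GitRunner, default_branch: str, prefix: str = "lando-"
) -> List[str]:
    """Delete all local branches starting with `prefix`; return their names."""
    # `git branch -D` refuses to delete the checked-out branch, so leave it first.
    current = git(["branch", "--show-current"]).strip()
    if current.startswith(prefix):
        git(["checkout", default_branch])

    # List local branches matching the prefix, one short name per line.
    refs = git(["for-each-ref", "--format=%(refname:short)", f"refs/heads/{prefix}*"])
    stale = [line for line in refs.splitlines() if line]
    if stale:
        git(["branch", "-D", *stale])
    return stale
```

Passing the runner in as an argument keeps the branch-selection logic testable without a real repository; in a worker it might be called as `delete_stale_branches(make_git_runner(Path(repo_path)), "main")`, where the branch name is an assumption.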
zzzeid left a comment
It occurred to me that there may be some unexpected behaviour (not mentioned here) in two cases: (1) on first deploy (i.e., the first maintenance run), and (2) when the landing worker is busy for a long time (which is typical) and things accumulate so that the maintenance run takes longer, or uses more resources, than expected. Worth testing those scenarios before deploying.
I'm going to throw this up on dev and test it out before merging/deploying. 👍 To handle (2), we should check that we haven't exceeded the throttle time between each repo maintenance call, so we only clean up a few repos at a time before looking for another landing job. We could also sort the repos by time since the last maintenance to decide which one to process, that way each repo is guaranteed to have maintenance run on it eventually, and we don't accidentally spend several minutes running maintenance tasks instead of processing landing jobs.
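A rough sketch of that proposal, under stated assumptions: the time budget is checked before each repo, and repos are visited least-recently-maintained first. `run_maintenance_within_budget`, `budget_seconds`, and the `last_run` mapping are hypothetical names, not part of this PR.

```python
import time
from typing import Callable, Dict, List, Mapping


def run_maintenance_within_budget(
    repos: Mapping[str, Callable[[], None]],
    last_run: Dict[str, float],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Maintain the stalest repos first until the time budget is used up.

    `repos` maps a repo name to its maintenance callable; `last_run` maps a
    repo name to the clock value at which maintenance last finished.
    """
    deadline = clock() + budget_seconds
    # Repos that were never maintained sort first, then oldest timestamp first.
    ordered = sorted(repos, key=lambda name: last_run.get(name, float("-inf")))

    processed: List[str] = []
    for name in ordered:
        # Check the budget between calls so the worker can return to the
        # job queue instead of working through every repo in one idle period.
        if clock() >= deadline:
            break
        repos[name]()
        last_run[name] = clock()
        processed.append(name)
    return processed
```

Note that a single slow maintenance call can still overrun the budget, since the check only happens between repos; the ordering is what ensures skipped repos are picked up first on the next idle period.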