worker: run repo maintenance during idle time (Bug 2037216) by cgsheeh · Pull Request #1135 · mozilla-conduit/lando

cgsheeh · 2026-05-05T19:47:39Z

Lando's workers typically run repo-cleaning commands at
the start of each job. In hg workers, we run hg strip
at the start of each job to remove stale commits left by
previous jobs, even though those commits do not interfere
with job completion. In Git, we take the opposite approach
and simply ignore the temporary work branch: it is never
cleaned up.

Add a new "repo maintenance" step that runs while the
worker is throttled because no jobs remain in the queue.
To avoid running maintenance excessively, the worker
records the time of the last maintenance run for each
repo and skips maintenance if a run has completed within
the threshold.

For Mercurial, move the hg strip command into this maintenance
task, which should save us about 8s for each push to try.
For Git, add a cleanup of the stale working branches,
so we no longer have thousands of temp branches in our
worker repos.

After this change, every HgSCM.clean_repo call site
passes strip_non_public_commits=False, while every
GitSCM.clean_repo call site passes True. Remove the
kwarg and make each behaviour the default.
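
For readers following along, the idle-time check described above could be sketched roughly as follows. This is not the code in this PR: `MaintenanceTracker`, `run_idle_maintenance`, and the injectable `clock` are hypothetical names used only for illustration, and the real worker may store and compare timestamps differently.

```python
from datetime import datetime, timedelta, timezone


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


class MaintenanceTracker:
    """Hypothetical helper (not Lando's actual class) that remembers
    when each repo last had maintenance run on it."""

    def __init__(self, interval: timedelta):
        self.interval = interval
        # repo name -> time the last maintenance run completed
        self.last_run: dict[str, datetime] = {}

    def is_due(self, repo: str, now: datetime) -> bool:
        last = self.last_run.get(repo)
        # A repo that has never been maintained is always due.
        return last is None or now - last >= self.interval

    def record(self, repo: str, now: datetime) -> None:
        self.last_run[repo] = now


def run_idle_maintenance(tracker, repos, maintain, clock=utc_now):
    """Call `maintain(repo)` for each repo that is due.

    Intended to be called only while the worker is idle.
    Returns the list of repos that were maintained.
    """
    maintained = []
    for repo in repos:
        if not tracker.is_due(repo, clock()):
            continue  # maintained recently enough, skip
        maintain(repo)
        tracker.record(repo, clock())
        maintained.append(repo)
    return maintained
```

Passing the clock in as a parameter is only a convenience for testing the threshold logic without sleeping.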


github-actions · 2026-05-05T19:47:49Z

View this pull request in Lando to land it once approved.

zzzeid

Few small comments (and note failing tests).

shtrom · 2026-05-06T03:06:11Z

@@ -31,7 +31,7 @@ def test_integrated_hgrepo_clean_repo(hg_clone):
    repo = HgSCM(hg_clone.strpath)


nit: Could we rename this variable to scm while we're at it?

repo = HgSCM() is the convention in this file, oddly enough. I updated this instance, but we should fix the others in a follow-up. :)

It pre-dates the SCM split (;

zzzeid

Few additional comments but looks good otherwise.

zzzeid · 2026-05-06T14:03:48Z

+        # `git branch -D` refuses to delete the currently checked-out branch,
+        # so move off any `lando-*` branch first.


Interesting bit here. In the future it might make more sense to ensure we are back on the default branch after a job is finished.
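
To illustrate the ordering constraint in that comment, here is a minimal sketch of a branch cleanup that always moves to the default branch before deleting. It is an assumption-laden illustration, not the PR's implementation: `plan_branch_cleanup` and `run_branch_cleanup` are hypothetical names, and only the `lando-` prefix comes from the diff comment above.

```python
import subprocess


def plan_branch_cleanup(branches, default_branch, prefix="lando-"):
    """Return the git commands (as argv lists) needed to delete stale
    work branches. Hypothetical helper for illustration only."""
    stale = [b for b in branches if b.startswith(prefix) and b != default_branch]
    # `git branch -D` refuses to delete the checked-out branch, so
    # always move to the default branch before deleting anything.
    commands = [["git", "checkout", default_branch]]
    if stale:
        commands.append(["git", "branch", "-D", *stale])
    return commands


def run_branch_cleanup(repo_path, default_branch):
    """List local branches in `repo_path` and run the planned commands."""
    listing = subprocess.run(
        ["git", "for-each-ref", "--format=%(refname:short)", "refs/heads/"],
        cwd=repo_path,
        check=True,
        capture_output=True,
        text=True,
    )
    for argv in plan_branch_cleanup(listing.stdout.split(), default_branch):
        subprocess.run(argv, cwd=repo_path, check=True)
```

Splitting the planning from the execution keeps the ordering rule easy to test without a real repository.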

zzzeid

It occurred to me that there may be some unexpected behaviour (not mentioned here) as to what would happen (1) on first deploy (i.e., the first maintenance run) but also (2) when the landing worker is busy for a long time (which is typical) and things accumulate in such a way that the maintenance run may take longer than expected (or possibly take more resources than expected). Worth testing those scenarios before deploying.

cgsheeh · 2026-05-08T17:37:01Z

> It occurred to me that there may be some unexpected behaviour (not mentioned here) as to what would happen (1) on first deploy (i.e., the first maintenance run) but also (2) when the landing worker is busy for a long time (which is typical) and things accumulate in such a way that the maintenance run may take longer than expected (or possibly take more resources than expected). Worth testing those scenarios before deploying.

I'm going to throw this up on dev and test it out before merging/deploying. 👍

To handle (2), we should check between each repo maintenance call that we haven't exceeded the throttle time, so we only clean up a few repos at a time before looking for another landing job. We could also sort the repos by time since their last maintenance to decide which one to process first. That way each repo is guaranteed to have maintenance run on it eventually, and we don't accidentally spend several minutes running maintenance tasks instead of processing landing jobs.
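
A rough sketch of that idea, for discussion only: process repos oldest-maintained first and stop once a time budget is used up. The function name `maintain_within_budget`, its parameters, and the injected `clock` are hypothetical, and the budget here stands in for whatever throttle time the worker actually uses.

```python
from datetime import datetime, timedelta, timezone

# Sentinel for repos that have never been maintained, so they sort first.
NEVER = datetime.min.replace(tzinfo=timezone.utc)


def maintain_within_budget(repos, last_run, maintain, budget, clock):
    """Maintain repos oldest-first until `budget` has elapsed.

    `last_run` maps repo name -> timezone-aware datetime of the last
    maintenance and is updated in place. Returns the repos maintained
    during this call.
    """
    started = clock()
    ordered = sorted(repos, key=lambda repo: last_run.get(repo, NEVER))
    done = []
    for repo in ordered:
        # Check the budget before each repo so a long run cannot
        # starve landing jobs for more than one maintenance task.
        if clock() - started >= budget:
            break
        maintain(repo)
        last_run[repo] = clock()
        done.append(repo)
    return done
```

Because skipped repos keep their old timestamps, they move to the front of the order on the next idle period, which is what gives the "eventually maintained" guarantee.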

cgsheeh requested a review from a team as a code owner May 5, 2026 19:47

zzzeid requested changes May 5, 2026

View reviewed changes

cgsheeh added 9 commits May 5, 2026 16:40

add treestatusdouble to landing worker fixtures

02c3eac

docstring

7d271d1

clarify docstring

9e9bafb

comment clarify

2ef9a61

less comment

d19a8ab

revert to old docstring state

3a62160

set maintenance interval via config on Worker class

6fa23f7

add db migration

da14d67

use -infinity instead of 0 as default last_maintenance

80563fe

cgsheeh requested review from shtrom and zzzeid May 6, 2026 03:10

shtrom approved these changes May 6, 2026

View reviewed changes

zzzeid approved these changes May 6, 2026

View reviewed changes

cgsheeh added 3 commits May 6, 2026 10:57

s/repo/scm/

169feeb

spacing

6b948be

use datetime.now instead of time.monotonic

6463654

zzzeid reviewed May 7, 2026

View reviewed changes

cgsheeh added 2 commits May 7, 2026 15:03

remove as_cwd

51c01e6

move to default branch unconditionally

290cd8a

cgsheeh added 4 commits May 8, 2026 13:38

update maintenance docstrings

adecbf2

max runtime for maintenance

3eb51e4

Merge remote-tracking branch 'origin/main' into idle-strip

6dbec73

ruff formatting

ffdf03f
