runner: Double check for runnerMinimumTimeExceeded #6703
We found instances where runnerMinimumTimeExceeded made decisions based on outdated data about whether a runner had recently undergone a root volume replacement task, because scale-down is taking more than ~8 minutes per execution. The solution here is to run runnerMinimumTimeExceeded twice, once with cached data and once with fresh data, so that we are doubly sure we should be spinning something down.
Signed-off-by: Eli Uriegas <[email protected]>
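A minimal sketch of the double-check idea, not the actual change in scale-down.ts. The `RunnerInfo` shape, the `FetchRunner` callback (standing in for a fresh lookup such as a `listRunners` call filtered by instance ID), and the `shouldScaleDown` helper are all hypothetical names chosen for illustration; how a missing launch time or a missing instance is treated is likewise an assumption.

```typescript
// Hypothetical runner shape; the real type lives in the runners lambda.
interface RunnerInfo {
  instanceId: string;
  launchTime?: Date;
}

// Hypothetical stand-in for a fresh lookup of a single instance.
type FetchRunner = (instanceId: string) => Promise<RunnerInfo | undefined>;

// True when the runner has been up longer than the minimum running time.
function minimumTimeExceeded(runner: RunnerInfo, minMinutes: number, now: Date): boolean {
  // Assumption: an unknown launch time means "do not remove".
  if (runner.launchTime === undefined) return false;
  const ageMs = now.getTime() - runner.launchTime.getTime();
  return ageMs > minMinutes * 60 * 1000;
}

// Check the cached record first, then re-check against freshly fetched data.
async function shouldScaleDown(
  cached: RunnerInfo,
  fetchFresh: FetchRunner,
  minMinutes: number,
  now: Date = new Date(),
): Promise<boolean> {
  // Cheap check on cached data; skip the extra lookup if it already says "keep".
  if (!minimumTimeExceeded(cached, minMinutes, now)) return false;
  const fresh = await fetchFresh(cached.instanceId);
  // Assumption: if the instance cannot be found again, do not act on it.
  if (fresh === undefined) return false;
  return minimumTimeExceeded(fresh, minMinutes, now);
}
```

Running the cached check first means the extra lookup is only paid for runners that are already candidates for removal. The window between the fresh lookup and the actual termination still exists, so this narrows the race rather than closing it.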
terraform-aws-github-runner/modules/runners/lambdas/runners/src/scale-runners/scale-down.ts
@@ -449,11 +449,17 @@ describe('scale-down', () => {
const mockedRemoveGithubRunnerOrg = mocked(removeGithubRunnerOrg);
Note
All of these changes to scale-down.test.ts
were generated by an AI Agent
Signed-off-by: Eli Uriegas <[email protected]>
Signed-off-by: Eli Uriegas <[email protected]>
This can't completely fix the issues we're seeing, but it could reduce how often they occur, so 🚀
terraform-aws-github-runner/modules/runners/lambdas/runners/src/scale-runners/scale-down.ts
…c/scale-runners/scale-down.ts Co-authored-by: Zain Rizvi <[email protected]>
Signed-off-by: Eli Uriegas <[email protected]>
Awesome!
If anyone is interested in the AI agent chat for how I fixed the tests:
Chat for fixing tests
Update mocks for listRunners calls
Exported on 6/4/2025 at 08:49:11 PDT from Cursor (0.50.7)
User
Can you update mocks in @scale-down.ts to account for extra @listRunners() calls defined in @isRunnerRemovable(). Don't do extra things, focus on the task.
Cursor
Okay, I will update the mocks in the test file for First, I need to locate the relevant test file. Okay, I've analyzed the test file and found the relevant sections. The test named Currently, it mocks This needs to change because I'll update the mock for Here's the planned edit for the test file
// ... existing code ...
it('do according each one', async () => {
const mockedListRunners = mocked(listRunners);
const mockedListGithubRunnersOrg = mocked(listGithubRunnersOrg);
// ... existing code ...
const mockedRemoveGithubRunnerOrg = mocked(removeGithubRunnerOrg);
const mockedTerminateRunner = mocked(terminateRunner);
//mockedListRunners.mockResolvedValueOnce(listRunnersRet);
mockedListRunners.mockImplementation(async (metrics, filters) => {
if (filters && filters.instanceId) {
// Call from isRunnerRemovable
const runner = listRunnersRet.find((r) => r.instanceId === filters.instanceId);
// We need to return a new object here to avoid issues with object mutation across tests or calls
return runner ? [{ ...runner }] : [];
} else {
// Initial call from scaleDown
return listRunnersRet;
}
});
mockedListGithubRunnersOrg.mockResolvedValue(ghRunners);
mockedGetRunnerTypes.mockResolvedValue(runnerTypes);
mockedRemoveGithubRunnerOrg.mockImplementation(
// ... existing code ...
await scaleDown();
expect(mockedListRunners).toBeCalledTimes(1 + 5); // 1 initial call + 5 for each terminated runner
expect(mockedListRunners).toHaveBeenCalledWith(metrics, { environment: environment });
// Add checks for the calls from isRunnerRemovable
const terminatedInstanceIds = [
'keep-lt-min-no-ghrunner-no-ghr-02',
'keep-lt-min-no-ghrunner-no-ghr-01',
'keep-min-runners-oldest-02',
'keep-min-runners-oldest-01',
'remove-ephemeral-02',
];
for (const instanceId of terminatedInstanceIds) {
expect(mockedListRunners).toHaveBeenCalledWith(metrics, { instanceId: instanceId });
}
expect(mockedListGithubRunnersOrg).toBeCalledTimes(18);
expect(mockedListGithubRunnersOrg).toBeCalledWith(theOrg, metrics);
// ... existing code ...
The edit for the 'org' test has been applied. The Now, I'll apply a similar change to the corresponding test in the I need to:
Let's look at the Terminated runners in the 'repo' test:
These instance IDs and their The current mock setup in the 'repo' test (around line 810) is: This will be updated similarly.
// ... existing code ...
it('do according each one', async () => {
const mockedListRunners = mocked(listRunners);
const mockedListGithubRunnersRepo = mocked(listGithubRunnersRepo);
// ... existing code ...
const mockedRemoveGithubRunnerRepo = mocked(removeGithubRunnerRepo);
const mockedTerminateRunner = mocked(terminateRunner);
//mockedListRunners.mockResolvedValueOnce(listRunnersRet);
mockedListRunners.mockImplementation(async (metrics, filters) => {
if (filters && filters.instanceId) {
// Call from isRunnerRemovable
const runner = listRunnersRet.find((r) => r.instanceId === filters.instanceId);
// We need to return a new object here to avoid issues with object mutation across tests or calls
return runner ? [{ ...runner }] : [];
} else {
// Initial call from scaleDown
return listRunnersRet;
}
});
mockedListGithubRunnersRepo.mockResolvedValue(ghRunners);
mockedGetRunnerTypes.mockResolvedValue(runnerTypes);
mockedRemoveGithubRunnerRepo.mockImplementation(
// ... existing code ...
await scaleDown();
expect(mockedListRunners).toBeCalledTimes(1 + 5); // 1 initial call + 5 for each terminated runner
expect(mockedListRunners).toHaveBeenCalledWith(metrics, { environment: environment });
// Add checks for the calls from isRunnerRemovable
const terminatedInstanceIds = [
'keep-lt-min-no-ghrunner-no-ghr-02',
'keep-lt-min-no-ghrunner-no-ghr-01',
'keep-min-runners-oldest-02',
'keep-min-runners-oldest-01',
'remove-ephemeral-02',
];
for (const instanceId of terminatedInstanceIds) {
expect(mockedListRunners).toHaveBeenCalledWith(metrics, { instanceId: instanceId });
}
expect(mockedListGithubRunnersRepo).toBeCalledTimes(18);
expect(mockedListGithubRunnersRepo).toBeCalledWith(repo, metrics);
// ... existing code ...
The changes have been applied to the I've updated the two main test cases (
For example, a simpler test that terminates a single runner which meets the time criteria would now expect Do you want me to scan for other such test cases and update them, or are these two main ones sufficient for now? To ensure comprehensive coverage, I will now examine other tests that might involve runner termination and require similar updates to the First, I need to read the contents of that test. It seems my previous navigation was off. The I'll try to find it by searching for the string It seems there isn't a test case explicitly named Let's look for tests that:
I'll re-examine the file contents, focusing on simpler test cases involving termination. Then we have: Are there other tests inside The Let's consider the I'll read the section for You are absolutely right! The tests for These tests need to be updated. For test cases where Let's analyze the relevant tests under
So, two specific test cases need
In both cases, the Let's prepare the edit for these.
// ... existing code ...
describe('isRunnerRemovable', () => {
describe('ghRunner === undefined', () => {
it('launchTime === undefined', async () => {
const response = await isRunnerRemovable(
// ... existing code ...
});
it('exceeded minimum time', async () => {
const ec2RunnerInfo = {
awsRegion: baseConfig.awsRegion,
instanceId: 'AGDGADUWG113',
launchTime: moment(new Date())
.utc()
.subtract(minimumRunningTimeInMinutes + 5, 'minutes')
.toDate(),
};
const mockedListRunners = mocked(listRunners).mockResolvedValueOnce([{ ...ec2RunnerInfo }]);
const response = await isRunnerRemovable(undefined, ec2RunnerInfo, metrics);
expect(response).toEqual(true);
expect(mockedListRunners).toHaveBeenCalledWith(metrics, { instanceId: ec2RunnerInfo.instanceId });
});
it('dont exceeded minimum time', async () => {
const response = await isRunnerRemovable(
// ... existing code ...
describe('ghRunner !== undefined', () => {
it('ghRunner.busy == true', async () => {
const response = await isRunnerRemovable(
// ... existing code ...
});
it('ghRunner.busy == false, launchTime === undefined', async () => {
const response = await isRunnerRemovable(
// ... existing code ...
});
it('ghRunner.busy == false, launchTime exceeds', async () => {
const ec2RunnerInfo = {
awsRegion: baseConfig.awsRegion,
instanceId: 'AGDGADUWG113',
launchTime: moment(new Date())
.utc()
.subtract(minimumRunningTimeInMinutes + 5, 'minutes')
.toDate(),
};
const mockedListRunners = mocked(listRunners).mockResolvedValueOnce([{ ...ec2RunnerInfo }]);
const response = await isRunnerRemovable(
{
busy: false,
} as GhRunner,
ec2RunnerInfo,
metrics,
);
expect(response).toEqual(true);
expect(mockedListRunners).toHaveBeenCalledWith(metrics, { instanceId: ec2RunnerInfo.instanceId });
});
it('ghRunner.busy == false, launchTime dont exceeds', async () => {
const response = await isRunnerRemovable(
// ... existing code ...
The tests for I believe this covers the necessary changes to the mocks in Is there anything else you'd like to address in this file or related tests?
User
Analyze test failures in @zsh , only pay attention to @isRunnerRemovable() failures, fix only those things.
Cursor
Okay, I've analyzed the test failures. The two failures related to
In both cases, the test expected I will modify these two tests to be Here's the plan:
// ... existing code ...
describe('ghRunner !== undefined', () => {
it('ghRunner.busy == true', async () => {
const response = await isRunnerRemovable(
{
busy: true,
// ... existing code ...
});
it('ghRunner.busy == false, launchTime === undefined', async () => {
const response = await isRunnerRemovable(
{
busy: false,
// ... existing code ...
The test cases The other test failures for However, as per your request, I've focused only on fixing the Do you want to proceed with investigating the
User
yes proceed with toBeCalledTimes discrepancies
Cursor
You are right, the The test failures indicate that I've re-analyzed the
The 5 runners listed in The fix is to update the expected call count for I will apply this change to both the 'org' and 'repo' test scenarios. For the 'org' test:
// ... existing code ...
await scaleDown();
expect(mockedListRunners).toBeCalledTimes(1 + 9); // 1 initial call + 9 from isRunnerRemovable
expect(mockedListRunners).toHaveBeenCalledWith(metrics, { environment: environment });
const terminatedInstanceIds = [
// ... existing code ...
It seems the model applied the change to both the 'org' and 'repo' tests with a single The expected call count for All identified test failures related to