
Save EC2 cloud when error code is ExpiredToken #1084


Open: wants to merge 3 commits into base: master



@salvomarino salvomarino commented May 2, 2025

This PR improves handling of the ExpiredToken error code on AwsServiceException in the provision method of EC2Cloud. The previous PR #1008 correctly detects the error code, but its reconnectToEc2() call doesn't resolve the issue: the AWS STS token remains expired. Our investigation showed that saving the EC2 Cloud configuration refreshes the AWS STS token and resolves the problem. Currently, we mitigate this either by saving the configuration manually or by running the Groovy script below via the Script Console.

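import hudson.plugins.ec2.*
import jenkins.model.Jenkins

// Maps each cloud name to the session name it is expected to have after the save
def newSessionNamesMap = [:]
def j = Jenkins.get()
def ec2Clouds = j.clouds.findAll { it instanceof EC2Cloud }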
if (ec2Clouds) {
    ec2Clouds.each { oldCloud ->
        println "[info] Handling EC2 Cloud: " + oldCloud.name
        String oldSessionName = oldCloud.roleSessionName
        def newSessionName
        String currentTimestamp = System.currentTimeMillis().toString()
        def lastHyphenIndex = oldSessionName.lastIndexOf('-')
        if (lastHyphenIndex != -1 && oldSessionName.substring(lastHyphenIndex + 1).matches("\\d+")) {
            newSessionName = oldSessionName.substring(0, lastHyphenIndex) + '-' + currentTimestamp
        } else {
            newSessionName = oldSessionName + '-' + currentTimestamp
        }
        newSessionNamesMap[oldCloud.name] = newSessionName
        println "[info] Current session name : " + oldSessionName + ", new session name : " + newSessionName
        def newCloud = new EC2Cloud(
            oldCloud.name,
            oldCloud.useInstanceProfileForCredentials,
            oldCloud.credentialsId,
            oldCloud.region,
            oldCloud.privateKey,
            oldCloud.sshKeysCredentialsId,
            oldCloud.instanceCapStr,
            oldCloud.templates,
            oldCloud.roleArn,
            newSessionName
        )
        j.clouds.replace(oldCloud, newCloud)
    }
    println "[info] Saving Jenkins instance configuration..."
    j.save()
    ec2Clouds = j.clouds.findAll { it instanceof EC2Cloud }
    ec2Clouds.each { cloud ->
        println "[info] Checking EC2 Cloud: '" + cloud.name + "'..."
        if (cloud.roleSessionName == newSessionNamesMap[cloud.name]) {
            println "[info] Session name has been updated as expected: " + cloud.roleSessionName
        } else {
            def errorMessage = "[error] Session name has not been updated as expected.\n" +
                "Current roleSessionName: " + cloud.roleSessionName + "\n" +
                "Expected roleSessionName: " + newSessionNamesMap[cloud.name] + "\n"
            println errorMessage
            throw new Exception(errorMessage)
        }
    }
} else {
    println "[info] No EC2 Cloud found."
}
return null

The script above re-saves every EC2 Cloud and changes each session name so that we can verify the change was actually persisted. Changing a field is not required, however: saving the cloud unchanged is enough to resolve the issue. This PR automates that step, removing the need for external intervention, and saves the EC2 Cloud configuration only if and when necessary.
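The control flow described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `ServiceError` is a hypothetical stand-in for the SDK's AwsServiceException, `callWithSaveOnExpiredToken` and the `saveCloud` callback are invented names, and the single retry after the save is an assumption made for the example (the description does not say whether the real change retries immediately or waits for the next provisioning cycle).

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for AwsServiceException: it only carries an error code. */
class ServiceError extends RuntimeException {
    private final String errorCode;

    ServiceError(String errorCode, String message) {
        super(message);
        this.errorCode = errorCode;
    }

    String errorCode() {
        return errorCode;
    }
}

/** Sketch: save the cloud configuration only when the token has expired. */
final class ExpiredTokenRetry {
    static final String EXPIRED_TOKEN = "ExpiredToken";

    private ExpiredTokenRetry() {}

    /**
     * Runs the provisioning call. If it fails with the ExpiredToken code,
     * runs the save callback once and retries once (the retry is an assumption).
     * Any other error is rethrown untouched, so no save happens for it.
     */
    static <T> T callWithSaveOnExpiredToken(Supplier<T> provision, Runnable saveCloud) {
        try {
            return provision.get();
        } catch (ServiceError e) {
            if (!EXPIRED_TOKEN.equals(e.errorCode())) {
                throw e;
            }
            saveCloud.run();
            return provision.get();
        }
    }
}
```

With the AWS SDK for Java v2, the error code would typically be read via `e.awsErrorDetails().errorCode()` on the caught AwsServiceException; the stand-in class above exists only so the sketch compiles without the SDK.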

Testing done

We've been experiencing this issue in our production and testing environments, but couldn't reliably reproduce it.

To validate the solution:

  1. We monitored instances where token expiration occurred and confirmed that saving the EC2 Cloud configuration resolved the issue.
  2. We implemented a scheduled job that saves the EC2 Cloud configuration periodically, and this mitigated the ExpiredToken issue.
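For illustration, the periodic-save mitigation from step 2 could look roughly like the sketch below. It is not the job we actually ran: `PeriodicCloudSave`, `schedule`, and the `saveCloud` callback are hypothetical names, and a plain `ScheduledExecutorService` is used only to keep the example self-contained.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Sketch: run a "save the cloud configuration" callback at a fixed period. */
final class PeriodicCloudSave {
    private PeriodicCloudSave() {}

    static ScheduledFuture<?> schedule(ScheduledExecutorService scheduler,
                                       Runnable saveCloud,
                                       long period,
                                       TimeUnit unit) {
        // scheduleAtFixedRate suppresses all later runs once a run throws,
        // so failures are caught and logged to keep the schedule alive.
        Runnable guarded = () -> {
            try {
                saveCloud.run();
            } catch (RuntimeException e) {
                System.err.println("Periodic cloud save failed: " + e.getMessage());
            }
        };
        return scheduler.scheduleAtFixedRate(guarded, period, period, unit);
    }
}
```

The period has to be shorter than the STS token lifetime for this to help, which is why it is a mitigation rather than a fix: it saves on a timer whether or not the token has expired.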

The fix handles the token refresh transparently to users, maintaining the same behaviour but eliminating the need for external configuration saves.

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

@salvomarino salvomarino marked this pull request as draft May 2, 2025 10:23
@salvomarino salvomarino marked this pull request as ready for review May 19, 2025 10:33
@salvomarino salvomarino changed the title Save EC2 cloud when error code is ExpiredToken Save EC2 cloud when error code is ExpiredToken May 22, 2025