Description
Describe the bug
I don't know the cause, but I've started seeing failures with this action where the credentials step crashes and then neither retries nor exits. It just hangs (5+ hours).
This has worked fairly consistently for a long time (about a year), but it now sometimes fails this way, and the hung jobs tie up all of our runners.
Regression Issue
- Select this option if this issue appears to be a regression.
Expected Behavior
The step should never hang. If the network connection fails, it should exit with an error instead.
Current Behavior
Run aws-actions/configure-aws-credentials@v4
  with:
    role-session-name: GithubActionsRoleSession
    role-to-assume: arn:aws:iam::425642425116:role/github
    aws-region: us-west-2
    role-duration-seconds: 21600
    output-credentials: true
    audience: sts.amazonaws.com
  env:
    HOME: /root
    ADK_GITHUB_TOKEN: ***
    REMOTE_ROOT: /mnt/ssd/bazeltest_github
    VPU_ADDR: bazelvpu
Error: getIDToken call failed: Error message: Failed to get ID Token.
Error Code : undefined
Error Message: read ECONNRESET
context canceled
Error: The operation was canceled.
This error output appeared after I cancelled the job.
Reproduction Steps
This will probably not be easy to reproduce. We run the job in a company Docker container, though I don't see why that would matter. With everything irrelevant removed, the job looks like this:
jobs:
  run_embedded_vpu_bundle:
    runs-on: adk-vpu2-jp5
    container:
      image: artifactory.bluerivertech.com/dev-adk-docker/autonomy/adk/ubuntu_2204_build:2025-02-26
      volumes:
        - ghrunner_ci_cache_adk-vpu2-jp5:/ci_cache
      options: --shm-size 32G
    steps:
      - name: Assume AWS-Github role using OIDC (prod)
        id: aws-role
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-session-name: GithubActionsRoleSession
          role-to-assume: arn:aws:iam::425642425116:role/github
          aws-region: us-west-2
          role-duration-seconds: 21600
          output-credentials: true
          retry-max-attempts: 50
Possible Solution
Error Message: read ECONNRESET
suggests that the network connection drops at an inopportune moment during the step. Perhaps there is a point where the action waits for a packet and does not fail if it doesn't arrive within a few seconds.
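As a possible stopgap while the root cause is unknown, a step-level timeout could cap how long the credentials step is allowed to hang, so the runner is freed instead of being held for hours. The sketch below uses the standard GitHub Actions workflow key timeout-minutes. The 5-minute value is an assumption chosen for illustration, and it has not been verified that the timeout interrupts this particular hang.

```yaml
steps:
  - name: Assume AWS-Github role using OIDC (prod)
    id: aws-role
    uses: aws-actions/configure-aws-credentials@v4
    # Assumed value: fail this step after 5 minutes instead of hanging for hours.
    timeout-minutes: 5
    with:
      role-session-name: GithubActionsRoleSession
      role-to-assume: arn:aws:iam::425642425116:role/github
      aws-region: us-west-2
      role-duration-seconds: 21600
      output-credentials: true
      retry-max-attempts: 50
```

This would only limit the damage from a hang. It would not explain why the step stops making progress after the connection reset.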
Additional Information/Context
No response