
Conversation

@luolanzone
Contributor

Multiple E2E jobs can be scheduled on the same Jenkins node.
If stale image files from previous jobs are not deleted, they
accumulate over time, consuming significant disk space. This
can eventually lead to 'no disk space left' errors when new
jobs start.

This change ensures that saved image files are cleaned up
when an E2E job exits, preventing disk space exhaustion.

        cleanup_multicluster_antrea "$kubeconfig"
    done
fi
rm -f "${WORKDIR}/antrea-ubuntu.tar" "${WORKDIR}/antrea-mcs.tar" "${WORKDIR}/nginx.tar" "${WORKDIR}/agnhost.tar"
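For illustration, a minimal sketch (not the actual CI script) of how an EXIT trap could guarantee this tar cleanup even when a job fails partway through, assuming a bash job script where WORKDIR is already set:

# Sketch only: register the tar cleanup as an EXIT trap so the files are
# removed whether the job succeeds or aborts. WORKDIR is assumed to be
# set by the job environment.
clean_tar_files() {
    rm -f "${WORKDIR}/antrea-ubuntu.tar" "${WORKDIR}/antrea-mcs.tar" \
          "${WORKDIR}/nginx.tar" "${WORKDIR}/agnhost.tar"
}
trap clean_tar_files EXIT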
Contributor
@XinShuYang Nov 25, 2025

@luolanzone Thanks for the enhancement. I'm concerned this deletion could lead to a race condition, as multiple concurrent jobs will be placing the antrea-ubuntu.tar file in the same path. I suggest reconfiguring the tar file path to WORKSPACE instead because Jenkins can resolve this environment variable to the current job's real workspace path.

Also, I investigated this issue and found that the go cache was consuming 11 GB of space on the testbed. After running go clean -cache and go clean -modcache, this space can be recovered. I feel this can provide more benefits in mitigating the disk space shortage issue.
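A minimal sketch of that suggestion, assuming the job shell sees the Jenkins-provided WORKSPACE variable (the image name and the temp-dir fallback below are illustrative, not the actual script):

# Sketch only: keep per-job artifacts under the Jenkins-resolved workspace so
# that concurrent jobs on the same node never write to the same tar path.
# WORKSPACE is set by Jenkins per build; fall back to a temp dir when unset.
WORKDIR="${WORKSPACE:-$(mktemp -d)}"
docker save antrea/antrea-ubuntu:latest -o "${WORKDIR}/antrea-ubuntu.tar"

# Optional follow-up from the comment above: reclaim the Go caches.
go clean -cache -modcache || true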

Contributor Author

@XinShuYang do we allow the same e2e job to be scheduled and run in parallel on the same Node? If not, there should be no concurrency issue with this deletion step.
What I observed is that different e2e jobs land on the same Node, but the paths of the *.tar files are different, e.g.:
/var/lib/jenkins/workspace/antrea-kind-ipv6-ds-conformance-for-pull-request/antrea-ubuntu.tar
/var/lib/jenkins/workspace/antrea-kind-ipv6-ds-e2e-for-pull-request/antrea-ubuntu.tar
...

Contributor Author

Are you only referring to the MC-related tar files? You are right: even if there is no such concurrency issue, we should place them under the WORKSPACE path. I will update the MC e2e script.

Contributor Author

Updated the path of the tar files in the MC script.

I am not so sure about go clean -cache and go clean -modcache, considering it would require each e2e job to download the cache again if we clean up the caches on every execution.

@antoninbas do you have any suggestion?

Contributor

@XinShuYang do we allow the same e2e job to be scheduled and run in parallel on the same Node? If not, there should be no concurrency issue with this deletion step. What I observed is that different e2e jobs land on the same Node, but the paths of the *.tar files are different, e.g.:
/var/lib/jenkins/workspace/antrea-kind-ipv6-ds-conformance-for-pull-request/antrea-ubuntu.tar
/var/lib/jenkins/workspace/antrea-kind-ipv6-ds-e2e-for-pull-request/antrea-ubuntu.tar
...

Yes, the scripts have supported this since #5734. Although we disabled it in the Jenkins configuration after the CI migration to save resource costs, we should still account for it in new code changes in case we enable it again in the future.

Contributor

@luolanzone I expect the Go build cache to be pretty small because we build Antrea inside docker and we only use Go "natively" to run e2e tests.
You could check the size with du -sh $(go env GOCACHE), and I read that Go deletes cache files if they haven't been used for at least 5 days.
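For reference, the same check extended to the module cache as well (assuming Go is on PATH on the testbed):

# Report the on-disk size of the Go build cache and module cache.
du -sh "$(go env GOCACHE)" "$(go env GOMODCACHE)"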

Contributor Author

Hi @XinShuYang, do we ever support running the same e2e job on the same Node? How do we avoid two jobs generating the image files and overwriting each other in the same directory? I think the capability added in #5734 is to enable multiple jobs (but of different kinds) on the same Node.

Contributor

When two builds of the same job run concurrently on a single node, Jenkins creates a separate workspace for each. Here is the explanation from the Jenkins documentation: Each concurrently executed build occurs in its own build workspace, isolated from any other builds. By default, Jenkins appends "@<num>" to the workspace directory name, e.g. "@2". @luolanzone

@luolanzone
Contributor Author

/test-multicluster-e2e

@antoninbas removed their assignment Nov 25, 2025
@antoninbas self-requested a review Nov 25, 2025 17:10
1. Multiple E2E jobs can be scheduled on the same Jenkins node.
If stale image files from previous jobs are not deleted, they
accumulate over time, consuming significant disk space. This
can eventually lead to 'no disk space left' errors when new
jobs start.

This change ensures that saved image files are cleaned up
when an E2E job exits, preventing disk space exhaustion.

2. Clean up Golang caches unconditionally

Signed-off-by: Lan Luo <[email protected]>

function check_and_upgrade_golang() {
    echo "====== Clean up Golang cache ======"
    go clean -cache -modcache -testcache || true
Contributor Author

Met the disk space issue again. I checked the Jenkins Node, and the caches grow to over 1 GB quite quickly, so I added a step here to clean them up unconditionally. @XinShuYang @antoninbas can you take another look? Thanks.

root@antrea-kind-testbed:/var/lib/jenkins# du -sh $(go env GOCACHE)
1.9G	/var/lib/jenkins//.cache/go-build
root@antrea-kind-testbed:/var/lib/jenkins# du -sh $(go env GOMODCACHE)
2.1G	/var/lib/jenkins/go/pkg/mod

Contributor
@XinShuYang Dec 29, 2025

Is it necessary to delete this cache on every run? Can we instead clean it only when the free storage space is below a certain threshold, similar to the existing check?

if [[ $free_space -lt $free_space_threshold ]]; then
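A minimal sketch of that threshold-based approach, with placeholder variable names and an illustrative 20 GB threshold (not the existing script's values):

# Sketch only: clean the Go caches when free space on the workspace
# filesystem drops below a threshold, instead of on every run.
free_space=$(df --output=avail -BG "${WORKSPACE:-.}" | tail -n 1 | tr -dc '0-9')
free_space_threshold=20   # GB, illustrative value

if [[ $free_space -lt $free_space_threshold ]]; then
    echo "Free space ${free_space}G is below ${free_space_threshold}G, cleaning Go caches"
    go clean -cache -modcache -testcache || true
fi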
