Clean up saved image files when E2E job exits #7590
Conversation
ci/jenkins/test-mc.sh (outdated)
        cleanup_multicluster_antrea $kubeconfig
      done
    fi
    rm -f ${WORKDIR}/antrea-ubuntu.tar "${WORKDIR}"/antrea-mcs.tar "${WORKDIR}"/nginx.tar "${WORKDIR}"/agnhost.tar
@luolanzone Thanks for the enhancement. I'm concerned this deletion could lead to a race condition, as multiple concurrent jobs will be placing the antrea-ubuntu.tar file in the same path. I suggest reconfiguring the tar file path to WORKSPACE instead because Jenkins can resolve this environment variable to the current job's real workspace path.
Also, I investigated this issue and found that the Go cache was consuming 11 GB of space on the testbed. After running go clean -cache and go clean -modcache, this space can be recovered. I think this would provide more benefit in mitigating the disk space shortage issue.
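A minimal sketch of that relocation, assuming the script currently derives the tar path from WORKDIR; the IMAGE_DIR variable and the image tag passed to docker save are illustrative, not names taken from the existing scripts:

```bash
# Sketch only: prefer the per-job Jenkins workspace when it is available so
# that concurrent jobs on the same node do not write to the same tar path;
# fall back to WORKDIR when running outside Jenkins.
# IMAGE_DIR and the image tag below are assumptions for illustration.
IMAGE_DIR="${WORKSPACE:-${WORKDIR}}"

docker save -o "${IMAGE_DIR}/antrea-ubuntu.tar" antrea/antrea-ubuntu:latest

# The exit-time cleanup then only removes this job's own copy.
rm -f "${IMAGE_DIR}/antrea-ubuntu.tar"
```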
@XinShuYang do we allow the same e2e job to be scheduled and run in parallel on the same Node? If not, there should be no concurrency issue for this deletion step.
What I observed is that different e2e jobs can land on the same Node, but the paths of the *.tar files are different, e.g.:
/var/lib/jenkins/workspace/antrea-kind-ipv6-ds-conformance-for-pull-request/antrea-ubuntu.tar
/var/lib/jenkins/workspace/antrea-kind-ipv6-ds-e2e-for-pull-request/antrea-ubuntu.tar
...
Are you only referring to the MC-related tar files? You are right: even if there is no such concurrency issue, we should place them under the WORKSPACE path. I will update the MC e2e script.
Updated the path of tar files in the mc script.
I am not so sure about go clean -cache and go clean -modcache, considering it would require each e2e job to download the cache again if we clean it up on every execution.
@antoninbas do you have any suggestions?
> @XinShuYang do we allow the same e2e job to be scheduled and run in parallel on the same Node? If not, there should be no concurrency issue for this deletion step. What I observed is that different e2e jobs can land on the same Node, but the paths of the *.tar files are different.

Yes, this has been supported in the scripts since #5734. Although we disabled it in the Jenkins configuration after the CI migration to save resource costs, we should still consider it in new code changes in case we enable it again in the future.
@luolanzone I expect the Go build cache to be pretty small because we build Antrea inside docker and we only use Go "natively" to run e2e tests.
You could check the size with du -sh $(go env GOCACHE), and I read that Go deletes cache files if they haven't been used for at least 5 days.
> Yes, this has been supported in the scripts since #5734. Although we disabled it in the Jenkins configuration after the CI migration to save resource costs, we should still consider it in new code changes in case we enable it again in the future.

Hi @XinShuYang, do we ever support running the same e2e job on the same Node? How do we avoid two jobs generating the image files and overwriting each other in the same directory? I think the capability added in #5734 is to enable multiple jobs (of different kinds) on the same Node.
> Hi @XinShuYang, do we ever support running the same e2e job on the same Node? How do we avoid two jobs generating the image files and overwriting each other in the same directory?

When two copies of the same job run on a single node concurrently, Jenkins creates a separate working directory for each. Here is the explanation from the Jenkins documentation: "Each concurrently executed build occurs in its own build workspace, isolated from any other builds. By default, Jenkins appends '@<num>' to the workspace directory name, e.g. '@2'." @luolanzone
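For illustration only, a small snippet showing that isolation; the job name is taken from the paths quoted earlier in this thread, and the "@2" suffix is the Jenkins default for the second concurrent build:

```bash
# Hypothetical: with concurrent builds enabled, Jenkins gives the second copy
# of the same job a suffixed workspace, so the saved tar paths never collide.
for ws in \
    /var/lib/jenkins/workspace/antrea-kind-ipv6-ds-e2e-for-pull-request \
    /var/lib/jenkins/workspace/antrea-kind-ipv6-ds-e2e-for-pull-request@2; do
    echo "saved image path: ${ws}/antrea-ubuntu.tar"
done
```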
Force-pushed from cf95c62 to ff43d52.
/test-multicluster-e2e
Force-pushed from ff43d52 to 0e42b88.
1. Multiple E2E jobs can be scheduled on the same Jenkins node. If stale image files from previous jobs are not deleted, they accumulate over time, consuming significant disk space. This can eventually lead to 'no disk space left' errors when new jobs start. This change ensures that saved image files are cleaned up when an E2E job exits, preventing disk space exhaustion.
2. Clean up Golang caches unconditionally.

Signed-off-by: Lan Luo <[email protected]>
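The cleanup-on-exit behavior described in the commit message could be wired up with a shell trap. A minimal sketch, assuming a hypothetical SAVED_IMAGES_DIR variable; the tar file names are the ones deleted in the diff above, but the structure of the real script may differ:

```bash
#!/usr/bin/env bash
# Sketch only: remove saved image tar files whenever the E2E job script exits,
# whether it succeeds, fails, or is aborted.
# SAVED_IMAGES_DIR is an assumption for illustration.
SAVED_IMAGES_DIR="${WORKSPACE:-$(pwd)}"

cleanup_saved_images() {
    echo "====== Cleaning up saved image files ======"
    rm -f "${SAVED_IMAGES_DIR}"/antrea-ubuntu.tar \
          "${SAVED_IMAGES_DIR}"/antrea-mcs.tar \
          "${SAVED_IMAGES_DIR}"/nginx.tar \
          "${SAVED_IMAGES_DIR}"/agnhost.tar
}

# Run the cleanup on every exit path of the job script.
trap cleanup_saved_images EXIT
```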
Force-pushed from 0e42b88 to f1d77a4.
    function check_and_upgrade_golang() {
        echo "====== Clean up Golang cache ======"
        go clean -cache -modcache -testcache || true
I hit the disk space issue again. I checked the Jenkins Node, and the cache grows to over 1 GB quite quickly, so I added a step here to clean up the cache unconditionally. @XinShuYang @antoninbas can you take another look? Thanks.
root@antrea-kind-testbed:/var/lib/jenkins# du -sh $(go env GOCACHE)
1.9G /var/lib/jenkins//.cache/go-build
root@antrea-kind-testbed:/var/lib/jenkins# du -sh $(go env GOMODCACHE)
2.1G /var/lib/jenkins/go/pkg/mod
Is it necessary to delete this cache on every run? Could we instead clean it only when the storage space is below a certain threshold, similar to the implementation at line 20 in 636f63d:
    if [[ $free_space -lt $free_space_threshold ]]; then
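A minimal sketch of that threshold-based approach; the function name, the 10 GB threshold, and checking the root filesystem are assumptions for illustration, not values from the existing scripts:

```bash
#!/usr/bin/env bash
# Sketch only: clean the Go caches only when free space drops below a
# threshold, instead of on every run.
# The function name, threshold, and filesystem checked are assumptions.
clean_golang_cache_if_needed() {
    local free_space_threshold=10  # in GB
    local free_space
    free_space=$(df -BG / | awk 'NR==2 {gsub("G", "", $4); print $4}')
    if [[ $free_space -lt $free_space_threshold ]]; then
        echo "====== Free space ${free_space}G below ${free_space_threshold}G, cleaning Golang cache ======"
        go clean -cache -modcache -testcache || true
    fi
}

clean_golang_cache_if_needed
```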