
Steps delayed due to hash mismatch with kubernetes-novolume mode #269

@AlexLukasT


Hey,

I upgraded to the recent v0.8.0 to use the new kubernetes-novolume mode.
While it's working fine, I noticed that each Step in a Job is delayed by ~30s because of a hash mismatch in the copied files.
Here are some example logs from the Initialize containers step:

##[debug]Evaluating condition for step: 'Initialize containers'
##[debug]Evaluating: success()
##[debug]Evaluating success:
##[debug]=> true
##[debug]Result: true
##[debug]Starting: Initialize containers
##[debug]Register post job cleanup for stopping/deleting containers.
Run '/home/runner/k8s-novolume/index.js'
##[debug]/home/runner/externals/node20/bin/node /home/runner/k8s-novolume/index.js
##[debug]Job pod created, waiting for it to come online <workflow-pod-name>
##[debug]Copying /home/runner/_work to pod <workflow-pod-name> at /__w
(node:98) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
##[debug]internalExecOutput response: {"metadata":{},"status":"Success"}
##[debug]The hash of the directory does not match the expected value; want='742f6770882c57760c85f1c1fd1d8f781d52b04751482098954181e9d1cd8e35' got='a7c551c3c067391a05bd1cd314c0144eb04bd3f5901f7cac48598e167b77acc7'
##[debug]internalExecOutput response: {"metadata":{},"status":"Success"}
##[debug]The hash of the directory does not match the expected value; want='742f6770882c57760c85f1c1fd1d8f781d52b04751482098954181e9d1cd8e35' got='a7c551c3c067391a05bd1cd314c0144eb04bd3f5901f7cac48598e167b77acc7'
##[debug]internalExecOutput response: {"metadata":{},"status":"Success"}
##[debug]The hash of the directory does not match the expected value; want='742f6770882c57760c85f1c1fd1d8f781d52b04751482098954181e9d1cd8e35' got='a7c551c3c067391a05bd1cd314c0144eb04bd3f5901f7cac48598e167b77acc7'
##[debug]internalExecOutput response: {"metadata":{},"status":"Success"}
##[debug]The hash of the directory does not match the expected value; want='742f6770882c57760c85f1c1fd1d8f781d52b04751482098954181e9d1cd8e35' got='a7c551c3c067391a05bd1cd314c0144eb04bd3f5901f7cac48598e167b77acc7'
##[debug]internalExecOutput response: {"metadata":{},"status":"Success"}
##[debug]The hash of the directory does not match the expected value; want='742f6770882c57760c85f1c1fd1d8f781d52b04751482098954181e9d1cd8e35' got='a7c551c3c067391a05bd1cd314c0144eb04bd3f5901f7cac48598e167b77acc7'
##[debug]internalExecOutput response: {"metadata":{},"status":"Success"}
##[debug]The hash of the directory does not match the expected value; want='742f6770882c57760c85f1c1fd1d8f781d52b04751482098954181e9d1cd8e35' got='a7c551c3c067391a05bd1cd314c0144eb04bd3f5901f7cac48598e167b77acc7'
##[debug]internalExecOutput response: {"metadata":{},"status":"Success"}
##[debug]The hash of the directory does not match the expected value; want='742f6770882c57760c85f1c1fd1d8f781d52b04751482098954181e9d1cd8e35' got='a7c551c3c067391a05bd1cd314c0144eb04bd3f5901f7cac48598e167b77acc7'
##[debug]internalExecOutput response: {"metadata":{},"status":"Success"}
##[debug]The hash of the directory does not match the expected value; want='742f6770882c57760c85f1c1fd1d8f781d52b04751482098954181e9d1cd8e35' got='a7c551c3c067391a05bd1cd314c0144eb04bd3f5901f7cac48598e167b77acc7'
##[debug]internalExecOutput response: {"metadata":{},"status":"Success"}
##[debug]The hash of the directory does not match the expected value; want='742f6770882c57760c85f1c1fd1d8f781d52b04751482098954181e9d1cd8e35' got='a7c551c3c067391a05bd1cd314c0144eb04bd3f5901f7cac48598e167b77acc7'
##[debug]internalExecOutput response: {"metadata":{},"status":"Success"}
##[debug]The hash of the directory does not match the expected value; want='742f6770882c57760c85f1c1fd1d8f781d52b04751482098954181e9d1cd8e35' got='a7c551c3c067391a05bd1cd314c0144eb04bd3f5901f7cac48598e167b77acc7'
##[debug]internalExecOutput response: {"metadata":{},"status":"Success"}
##[debug]The hash of the directory does not match the expected value; want='742f6770882c57760c85f1c1fd1d8f781d52b04751482098954181e9d1cd8e35' got='a7c551c3c067391a05bd1cd314c0144eb04bd3f5901f7cac48598e167b77acc7'
##[debug]internalExecOutput response: {"metadata":{},"status":"Success"}
##[debug]The hash of the directory does not match the expected value; want='742f6770882c57760c85f1c1fd1d8f781d52b04751482098954181e9d1cd8e35' got='a7c551c3c067391a05bd1cd314c0144eb04bd3f5901f7cac48598e167b77acc7'
##[debug]internalExecOutput response: {"metadata":{},"status":"Success"}
##[debug]The hash of the directory does not match the expected value; want='742f6770882c57760c85f1c1fd1d8f781d52b04751482098954181e9d1cd8e35' got='a7c551c3c067391a05bd1cd314c0144eb04bd3f5901f7cac48598e167b77acc7'
##[debug]internalExecOutput response: {"metadata":{},"status":"Success"}
##[debug]The hash of the directory does not match the expected value; want='742f6770882c57760c85f1c1fd1d8f781d52b04751482098954181e9d1cd8e35' got='a7c551c3c067391a05bd1cd314c0144eb04bd3f5901f7cac48598e167b77acc7'
##[debug]internalExecOutput response: {"metadata":{},"status":"Success"}
##[debug]The hash of the directory does not match the expected value; want='742f6770882c57760c85f1c1fd1d8f781d52b04751482098954181e9d1cd8e35' got='a7c551c3c067391a05bd1cd314c0144eb04bd3f5901f7cac48598e167b77acc7'
##[debug]Job pod is ready for traffic
##[debug]execPodStep response: {"metadata":{},"status":"Failure","message":"command terminated with non-zero exit code: command terminated with exit code 1","reason":"NonZeroExitCode","details":{"causes":[{"reason":"ExitCode","message":"1"}]}}
##[debug]{"message":"command terminated with non-zero exit code: command terminated with exit code 1","details":{"causes":[{"reason":"ExitCode","message":"1"}]}}
##[debug]Setting isAlpine to false
##[debug]Finishing: Initialize containers

I searched through the code and found that there is a retry loop of 15 attempts with a delay of 1s each that tries to get the correct hash.
To debug this, I manually ran the command used to list the files in each Pod and compared the output:

/bin/sh -c "find . -not -path '*/_runner_hook_responses*' -exec stat -c '%b %n' {} \\;"
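
For context, here is roughly how I understand the comparison to work, based on my reading of the code. This is only a sketch to illustrate the flow, not the actual hook implementation: the helper names and the kubectl call are mine; only the listing command and the 15 attempts / 1s delay come from the source.

import { createHash } from 'crypto';
import { execFileSync } from 'child_process';

// The find/stat listing command from above.
const LIST_CMD =
  "find . -not -path '*/_runner_hook_responses*' -exec stat -c '%b %n' {} \\;";

// Hash the listing of the local working directory on the runner Pod.
function hashLocal(dir: string): string {
  const out = execFileSync('/bin/sh', ['-c', LIST_CMD], { cwd: dir });
  return createHash('sha256').update(out).digest('hex');
}

// Hash the listing of the copied directory inside the job Pod.
// (The real hook talks to the Kubernetes API directly; kubectl is only for illustration.)
function hashInPod(pod: string, dir: string): string {
  const out = execFileSync('kubectl', [
    'exec', pod, '--', '/bin/sh', '-c', `cd ${dir} && ${LIST_CMD}`,
  ]);
  return createHash('sha256').update(out).digest('hex');
}

// Retry loop: compare the two hashes up to 15 times, sleeping 1s between attempts.
async function waitForMatch(pod: string, localDir: string, remoteDir: string): Promise<void> {
  const want = hashLocal(localDir);
  for (let attempt = 0; attempt < 15; attempt++) {
    const got = hashInPod(pod, remoteDir);
    if (got === want) {
      return;
    }
    console.debug(`The hash of the directory does not match the expected value; want='${want}' got='${got}'`);
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
}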

What I noticed is that each time, a file like ./_temp/b584baa0-b98c-11f0-aca5-611cac01a403.sh was present on the runner Pod with content like this:

#!/bin/sh -l
set -e
rm "$0" # remove script after running
<calling-some-bash-script>

So it looks to me like there is a temporary shell script that immediately deletes itself and seems to be included in the local hash but not in the exec hash.
I was able to reproduce this multiple times in a row, and it happens on every copy execution, so it doesn't seem to be a race condition.
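
To illustrate why the retries can never succeed in this case, here is a small, self-contained reproduction of the effect. It is only a sketch, not the hook's code: the listing command is trimmed down and transient.sh is a stand-in for the _temp/<uuid>.sh file. A listing hash taken while the self-deleting script still exists never matches one taken after the script has removed itself.

import { createHash } from 'crypto';
import { execFileSync } from 'child_process';
import { mkdtempSync, writeFileSync, chmodSync } from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';

const LIST_CMD = "find . -exec stat -c '%b %n' {} \\;";

function listingHash(dir: string): string {
  const out = execFileSync('/bin/sh', ['-c', LIST_CMD], { cwd: dir });
  return createHash('sha256').update(out).digest('hex');
}

// A directory containing a script that removes itself when run,
// mimicking the temporary script the runner drops into _work/_temp.
const dir = mkdtempSync(join(tmpdir(), 'hash-repro-'));
const script = join(dir, 'transient.sh');
writeFileSync(script, '#!/bin/sh -l\nset -e\nrm "$0" # remove script after running\n');
chmodSync(script, 0o755);

const before = listingHash(dir);   // snapshot taken while the script still exists
execFileSync('/bin/sh', [script]); // the script deletes itself
const after = listingHash(dir);    // the directory as it looks without the script

console.log(before === after
  ? 'hashes match'
  : `hashes differ: want='${before}' got='${after}'`);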

Is there a misconfiguration on my end that causes this, or is it a bug?
It's mostly an inconvenience and not a major issue, but I would appreciate some assistance here.

Thanks!
