Pre-requisites
What happened? What did you expect to happen?
When a ContainerSet-template's input artifact path coincides with one of the containerSet's volumeMounts.mountPath, the artifact init container fails with:
artifact <name> failed to load: rename /mainctrfs/<mount>.tmp /mainctrfs/<mount>: file exists
The executor stages the loaded artifact at artPath + ".tmp" and then os.Renames it onto the mount. The staging sibling lives on the init container's local filesystem, while the destination is a volume mount on a different filesystem — the kernel refuses the rename (seen as EEXIST on Linux/overlayfs in our reproducer; EXDEV, EBUSY, and ENOTEMPTY are also possible depending on the kernel and underlying filesystem). The same problem affects unpack for tar/zip artifacts (it renames <destPath>.tmpdir onto destPath).
The bug is independent of the artifact driver — raw: is enough to trigger it, as is git:, s3:, gcs:, http:, plugin:, etc.
For the plain Container template form, the validator at workflow/validate/validate.go:789-803 rejects this configuration upfront (already mounted in container.volumeMounts.<name>). That check was presumably added because of this same rename bug. It is not applied to ContainerSet templates: the validate path at lines 819-829 only calls tmpl.ContainerSet.Validate(), which checks intra-set mount collisions, not artifact-vs-mount overlap. So ContainerSet templates pass validation, attempt the doomed rename, and fail at runtime.
Expected: Either the artifact lands inside the volume mount (the existing source comment on the overlap branch says "extracting to volume mount"), or the validator rejects the config consistently for all template shapes.
Actual: ContainerSet templates with an artifact whose path equals a mount path are accepted, the pod is created, the artifact init container fails with the rename error, and the workflow never starts its main container.
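Related but distinct issues
#12174: single-file artifact into ephemeral volume mount. Closed. Different code path; fix was to MkdirAll a missing parent.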
Version(s)
v4.0.5, gitCommit=0ab1452144d8f4d57c50b37ce50dad218868e950. The same code shape exists on main at 6cc0d115b... and on release-4.0.
Paste a minimal workflow that reproduces the issue.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: rename-onto-mount-repro-
spec:
  entrypoint: main
  templates:
    - name: main
      inputs:
        artifacts:
          - name: data
            path: /opt/workspace   # <-- same as the volumeMount below
            raw:
              data: "hello world"
      volumes:
        - name: workspace
          emptyDir: {}
      containerSet:
        volumeMounts:
          - name: workspace
            mountPath: /opt/workspace   # <-- same as inputs.artifacts[0].path
        containers:
          - name: main
            image: alpine:3.20
            command: [sh, -c]
            args: ["ls -la /opt/workspace"]
Workflow phase ends as Error with message: init: Error (exit code 64): rename /mainctrfs/opt/workspace.tmp /mainctrfs/opt/workspace: file exists.
(Switching this to a Container template instead of ContainerSet will be rejected by the validator with "templates.main.inputs.artifacts[0].path '/opt/workspace' already mounted in container.volumeMounts.workspace", demonstrating that the validator already knows this case can't work — it just doesn't cover ContainerSet.)
Logs from the workflow controller
The controller accepts the workflow and creates the pod normally. The failure surfaces at the init-container level (below). The controller log entry for the failed node:
level=ERROR msg="Workflow has failed" message="init: Error (exit code 64): rename /mainctrfs/opt/workspace.tmp /mainctrfs/opt/workspace: file exists"
Logs from your workflow's wait container
The wait container never starts because the artifact init container exits non-zero. The init container's logs:
time=... level=INFO msg="Starting Workflow Executor" version=v4.0.5 gitCommit=0ab1452144d8f4d57c50b37ce50dad218868e950
time=... level=INFO msg="Start loading input artifacts..."
time=... level=INFO msg="Downloading artifact" name=data
time=... level=INFO msg="Specified artifact path overlaps with volume mount, extracting to volume mount" path=/opt/workspace mountPath=/opt/workspace
time=... level=INFO msg="Loading artifact"
time=... level=INFO msg="Load artifact" artifactName=data error=<nil> key=""
time=... level=INFO msg="Detecting if file is a tarball" path=/mainctrfs/opt/workspace.tmp
time=... level=ERROR msg="executor error" error="rename /mainctrfs/opt/workspace.tmp /mainctrfs/opt/workspace: file exists"
Error: rename /mainctrfs/opt/workspace.tmp /mainctrfs/opt/workspace: file exists
Root cause
workflow/executor/executor.go::loadArtifact (and the same shape in unpack) promotes the loaded artifact via os.Rename(temp, dest). When dest is a volume mount, temp is a sibling on the init container's local filesystem — different filesystem from the mount, so the kernel refuses. The check at workflow/validate/validate.go:789-803 papers over this for tmpl.Container but is not applied to tmpl.ContainerSet (or, in principle, any other template shape that produces a pod with volume mounts).
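The failing promotion step can be approximated outside a cluster. The sketch below is not the executor code: it stages a file at a hypothetical `workspace.tmp` path next to an existing `workspace` directory and calls os.Rename, with a plain temp directory standing in for the volume mount. One detail that may be worth checking against the diagnosis above: as far as I can tell, Go's os.Rename on Unix returns EEXIST on its own whenever the destination is an existing directory, so the "file exists" text can appear without any cross-filesystem move being involved.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// renameFileOntoDir stages a file next to an existing directory and tries to
// rename it onto that directory, mirroring the ".tmp" -> artifact path step.
// The paths are hypothetical and no volume mount is involved.
func renameFileOntoDir() error {
	root, err := os.MkdirTemp("", "rename-repro-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(root)

	dest := filepath.Join(root, "workspace") // stands in for the mount point
	temp := dest + ".tmp"                    // staged artifact
	if err := os.Mkdir(dest, 0o755); err != nil {
		return err
	}
	if err := os.WriteFile(temp, []byte("hello world"), 0o644); err != nil {
		return err
	}
	return os.Rename(temp, dest)
}

// isEEXIST reports whether err wraps the EEXIST errno.
func isEEXIST(err error) bool {
	return errors.Is(err, syscall.EEXIST)
}

func main() {
	err := renameFileOntoDir()
	fmt.Println("rename error:", err)
	fmt.Println("is EEXIST:", isEEXIST(err))
}
```

On Linux I would expect the rename to fail and `isEEXIST` to report true, which matches the error text in the init container logs. Behaviour on other platforms is not covered by this sketch.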
Proposed fix
Make the executor's rename actually work for the overlap case. Add a renameOrMerge helper that tries os.Rename first and, on the specific errnos that flag this cross-filesystem-onto-mount case (EXDEV, EBUSY, EEXIST, ENOTEMPTY), falls back to recursively copying source contents into the destination and removing the source. Use it at the three artifact-promotion call sites: the non-tar/non-zip branch in loadArtifact and both rename branches in unpack. The untar/unzip extraction into a sibling .tmpdir is unchanged; only the final move onto the destination switches to the helper.
PR attached as a follow-up. Tests cover the fast-path atomic rename, the merge-into-existing-directory fallback, symlink preservation, file mode preservation, and ENOENT propagation.
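For readers who want a concrete picture of the rename-then-merge idea, here is a minimal sketch. It is not the code from the PR: the function bodies, the `mergeDir` and `demo` names, and the demo paths are all assumptions made for illustration. It only handles a directory source, does not preserve symlinks, and reads each file fully into memory, all of which a real implementation would need to address.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"syscall"
)

// renameOrMerge is a hypothetical sketch, not the code from the PR.
// It tries an atomic rename first and, on the errnos named in the issue,
// copies the contents of the src directory into dst and then removes src.
func renameOrMerge(src, dst string) error {
	err := os.Rename(src, dst)
	if err == nil {
		return nil // fast path: the atomic rename worked
	}
	if !errors.Is(err, syscall.EXDEV) && !errors.Is(err, syscall.EBUSY) &&
		!errors.Is(err, syscall.EEXIST) && !errors.Is(err, syscall.ENOTEMPTY) {
		return err // anything else, such as ENOENT, propagates unchanged
	}
	if err := mergeDir(src, dst); err != nil {
		return err
	}
	return os.RemoveAll(src)
}

// mergeDir copies directories and regular files under src into dst,
// overwriting same-named files. Symlinks are not preserved and whole
// files are read into memory; both are simplifications of this sketch.
func mergeDir(src, dst string) error {
	return filepath.WalkDir(src, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(src, p)
		if err != nil {
			return err
		}
		target := filepath.Join(dst, rel)
		info, err := d.Info()
		if err != nil {
			return err
		}
		if d.IsDir() {
			return os.MkdirAll(target, info.Mode().Perm())
		}
		data, err := os.ReadFile(p)
		if err != nil {
			return err
		}
		if err := os.WriteFile(target, data, info.Mode().Perm()); err != nil {
			return err
		}
		return os.Chmod(target, info.Mode().Perm()) // WriteFile keeps an existing file's mode
	})
}

// demo merges a staged directory into a pre-populated one and lists the result.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "merge-demo-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	dst := filepath.Join(root, "workspace") // stands in for the mount point
	src := dst + ".tmpdir"                  // staged artifact contents
	files := map[string]string{filepath.Join(dst, "keep.txt"): "old", filepath.Join(src, "new.txt"): "new"}
	for path, body := range files {
		if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
			return nil, err
		}
		if err := os.WriteFile(path, []byte(body), 0o644); err != nil {
			return nil, err
		}
	}
	if err := renameOrMerge(src, dst); err != nil {
		return nil, err
	}
	entries, err := os.ReadDir(dst)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names, nil
}

func main() {
	names, err := demo()
	fmt.Println(names, err)
}
```

The demo needs no real mount because os.Rename onto an existing directory already fails on Unix, which sends the call down the fallback path. After the merge, the destination should hold both the pre-existing file and the staged one, and the staging directory should be gone.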
I did not touch the validator check for tmpl.Container in this PR because removing it would relax behavior and risk surprising existing users; that can be done in a follow-up once the executor fix lands. If maintainers prefer the alternative path (validator-only fix: extend the existing check to cover tmpl.ContainerSet too, rejecting these workflows upfront for all template shapes), happy to swap to that approach instead.
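If the validator-only route is preferred, the extended check could be sketched roughly as follows. The `Artifact` and `VolumeMount` types here are hypothetical stand-ins rather than the real Argo types, the function name is made up, and the error wording for the ContainerSet case is my guess modelled on the existing Container message. The comparison is exact string equality, matching the "path equals a mount path" case described in this issue.

```go
package main

import "fmt"

// Hypothetical minimal types for illustration; the real Argo types differ.
type Artifact struct{ Name, Path string }
type VolumeMount struct{ Name, MountPath string }

// checkArtifactMountOverlap returns an error when an input artifact path
// equals a volume mount path. The message format is an assumption modelled
// on the existing Container check quoted in the issue.
func checkArtifactMountOverlap(tmplName string, arts []Artifact, mounts []VolumeMount) error {
	for i, art := range arts {
		for _, m := range mounts {
			if art.Path == m.MountPath {
				return fmt.Errorf(
					"templates.%s.inputs.artifacts[%d].path '%s' already mounted in containerSet.volumeMounts.%s",
					tmplName, i, art.Path, m.Name)
			}
		}
	}
	return nil
}

func main() {
	arts := []Artifact{{Name: "data", Path: "/opt/workspace"}}
	mounts := []VolumeMount{{Name: "workspace", MountPath: "/opt/workspace"}}
	fmt.Println(checkArtifactMountOverlap("main", arts, mounts))
}
```

With the reproducer's values the check returns an error, so the workflow would be rejected at submission time instead of failing in the init container.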
Behavioural note
When the fallback runs, any pre-existing files in the destination directory are merged with the artifact contents (entries from the staging path overwrite same-named entries at the destination). For the volume-mount case the mount is typically a fresh emptyDir so this is invisible, but for persistentVolumeClaim-backed mounts that already contain files this is a softer "merge" semantic than the original atomic rename would have provided. The existing source comment on the overlap branch ("Extracting to volume mount") already reads as "extract into", so this aligns with the documented intent.
Disclosure
This issue (and the accompanying PR) was authored with assistance from Claude Code (Anthropic's coding agent). The reproducer was run on a local k3s cluster, the logs and validator behaviour quoted above were captured first-hand, and I have read and reviewed every line before submitting.