What happened?
File writes fail because the upload to S3 is completed prematurely, while the file is still being written.
...
50604:2025-09-23T16:23:08.047121Z DEBUG fuser::request: FUSE(91744) ino 0x0000000000000011 WRITE fh FileHandle(101), offset 40542208, size 4096, write flags 0x0
50606:2025-09-23T16:23:08.047210Z DEBUG fuser::request: FUSE(91748) ino 0x0000000000000011 WRITE fh FileHandle(101), offset 40546304, size 4096, write flags 0x0
50609:2025-09-23T16:23:08.047292Z DEBUG fuser::request: FUSE(91754) ino 0x0000000000000011 FLUSH fh FileHandle(101), lock owner LockOwner(4103602521915412644)
50610:2025-09-23T16:23:08.047296Z DEBUG fuser::request: FUSE(91756) ino 0x0000000000000011 WRITE fh FileHandle(101), offset 40550400, size 4096, write flags 0x0
50612:2025-09-23T16:23:08.047354Z DEBUG fuser::request: FUSE(91760) ino 0x0000000000000011 WRITE fh FileHandle(101), offset 40554496, size 4096, write flags 0x0
51713:2025-09-23T16:23:08.119019Z DEBUG flush{req=91754 ino=17 fh=101 pid=0 name="part-00000-5b5cbfde-a336-40c5-9bf1-a4298fb20b1b-c000.snappy.parquet"}: mountpoint_s3_fs::fs::handles: put succeeded etag="\"47d379521fe24ad28da6fadd434a912d-5\"" key="order/output/ondemand/_temporary/0/_temporary/attempt_202509231623041559459485160075385_0007_m_000000_51/part-00000-5b5cbfde-a336-40c5-9bf1-a4298fb20b1b-c000.snappy.parquet" size=40554496
51714:2025-09-23T16:23:08.119073Z WARN write{req=91760 ino=17 fh=101 offset=40554496 length=4096 pid=0 name="part-00000-5b5cbfde-a336-40c5-9bf1-a4298fb20b1b-c000.snappy.parquet"}: mountpoint_s3_fs::fuse: write failed with errno 5: upload already completed for key "order/output/ondemand/_temporary/0/_temporary/attempt_202509231623041559459485160075385_0007_m_000000_51/part-00000-5b5cbfde-a336-40c5-9bf1-a4298fb20b1b-c000.snappy.parquet"
55863:2025-09-23T16:23:08.367917Z DEBUG fuser::request: FUSE(102002) ino 0x0000000000000011 FLUSH fh FileHandle(101), lock owner LockOwner(13709436311748740162)
57344:2025-09-23T16:23:08.456849Z DEBUG fuser::request: FUSE(104866) ino 0x0000000000000011 FLUSH fh FileHandle(101), lock owner LockOwner(2560387006992014556)
58457:2025-09-23T16:23:08.521675Z DEBUG fuser::request: FUSE(107006) ino 0x0000000000000011 FLUSH fh FileHandle(101), lock owner LockOwner(15889247685641757733)
207094:2025-09-23T16:23:18.046015Z DEBUG fuser::request: FUSE(403344) ino 0x0000000000000011 FLUSH fh FileHandle(101), lock owner LockOwner(7237174021238096036)
...
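The FLUSH at req=91754 completes the upload ("put succeeded") while WRITE requests for the same file handle are still arriving, and every write after that point returns errno 5. One way this pattern can arise, offered here only as a hedged sketch and not a confirmed root cause, is a duplicated file descriptor being closed in another process: the kernel turns that close() into a FLUSH on the same FUSE file handle even though the original writer still holds the file open. The Python script below exercises exactly that sequence; the path /mnt/s3/flush-repro.bin is a placeholder for any file on the Mountpoint-backed volume.

import os
import sys

# Placeholder path on the Mountpoint-backed volume (adjust to your mount).
path = sys.argv[1] if len(sys.argv) > 1 else "/mnt/s3/flush-repro.bin"

fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
block = b"\0" * 4096

# Write a few blocks from the parent process.
for _ in range(16):
    os.write(fd, block)

# Fork: the child inherits fd and closes it. The kernel issues a FLUSH for
# the same file handle, attributed to the child's PID (or pid=0 if the FUSE
# server runs in a separate PID namespace).
child = os.fork()
if child == 0:
    os.close(fd)
    os._exit(0)
os.waitpid(child, 0)

# If that FLUSH completed the S3 upload, the next writes fail with EIO
# ("upload already completed"), matching the log lines above.
try:
    for _ in range(16):
        os.write(fd, block)
    print("writes after the child's close still succeed")
except OSError as e:
    print(f"write failed after the child closed its fd: {e}")
finally:
    os.close(fd)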
What you expected to happen?
There should be no change in behavior between V1 and V2: the file should be written in full, and the upload to S3 should complete only after all file descriptors have been closed.
How to reproduce it (as minimally and precisely as possible)?
Running this example is enough to reproduce the failure: https://github.com/awslabs/data-on-eks/blob/main/analytics/terraform/spark-k8s-operator/examples/karpenter/spark-app-graviton.yaml
Anything else we need to know?:
Internal tracking: D306200245
Workaround (not recommended): kubectl patch daemonset s3-csi-node -n kube-system -p '{"spec":{"template":{"spec":{"hostPID":true}}}}' && kubectl rollout restart daemonset s3-csi-node -n kube-system
This is a regression caused by the containerization of Mountpoint in V2 (V2 issue: #504). The pid=0 values in the write/flush spans above suggest that the containerized Mountpoint process cannot see the calling process's PID, which would be consistent with the hostPID workaround restoring correct behavior.
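For context on the workaround: hostPID: true places the patched pod's containers in the host PID namespace. A quick way to check which PID namespace a process is in (a hedged helper for verifying the workaround took effect, not part of the reported fix) is to compare the namespace link under /proc from inside the container and on the host:

import os

def pid_namespace(pid="self"):
    # Returns an identifier like 'pid:[4026531836]'; two processes share a PID
    # namespace exactly when this value matches. Reading another process's
    # entry requires sufficient privileges.
    return os.readlink(f"/proc/{pid}/ns/pid")

print(pid_namespace())  # compare this value inside the container and on the host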
Environment
- Kubernetes version (use kubectl version): TBC
- Driver version: v2.0.0