feat(services/swift): add SLO multipart upload for large objects #7212
Xuanwo merged 1 commit into apache:main
Xuanwo left a comment
Thank you for this great work!
pub fn slo_segment_path(&self, path: &str, upload_id: &str, part_number: usize) -> String {
    let abs = build_abs_path(&self.root, path);
    format!(
        ".segments/{}{}/{:08}",
Is the path defined by Swift, or did we choose it ourselves? Will this path be listed out by users?
The .segments/ prefix is a convention, not defined by Swift — SLO just needs segments to be reachable objects anywhere in the container. This convention is widely used by Swift clients (e.g. python-swiftclient uses the same pattern).
Segments won't appear in OpenDAL listings because they're outside the user's root prefix — swift_list uses build_abs_path(&self.root, path) as the prefix filter, and .segments/ sits at the container root. They would be visible if someone lists the container directly via the Swift API, but that's the same behavior as python-swiftclient.
Got it, thanks for the explanation. I would greatly appreciate it if you could add those explanations directly in the code as comments.
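The layout discussed in this thread can be illustrated with a small std-only sketch. It is not the PR's code: `segment_path` and `listed_under` are hypothetical helpers, the root/path join is a naive stand-in for `build_abs_path`, and the `{path}/{upload_id}/{part:08}` layout is taken from the commit message rather than from the truncated diff above.

```rust
/// Hypothetical stand-in for `SwiftCore::slo_segment_path`.
/// Assumption: the absolute path is the root (without its leading slash)
/// followed by the relative path.
fn segment_path(root: &str, path: &str, upload_id: &str, part_number: usize) -> String {
    let abs = format!("{}{}", root.trim_start_matches('/'), path);
    // Zero-padded part numbers keep lexicographic order equal to numeric order.
    format!(".segments/{}/{}/{:08}", abs, upload_id, part_number)
}

/// True if a listing filtered by `prefix` would return `object`.
/// This models a plain prefix filter, nothing more.
fn listed_under(prefix: &str, object: &str) -> bool {
    object.starts_with(prefix)
}
```

With a root of `/data/`, the segment name starts with `.segments/` and so fails the `data/` prefix filter, while the user's own object passes it. In this simplified model an empty prefix matches every object, so the result depends on the root being non-empty.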
/// each segment's path, etag, and size.
///
/// Reference: <https://docs.openstack.org/swift/latest/overview_large_objects.html>
pub async fn swift_put_slo_manifest(
I'm guessing we need to carry the user metadata here?
Good catch — the manifest PUT should carry user metadata and content headers (Content-Type, Content-Disposition, etc.) from OpWrite. Currently only segments get the raw bytes and the manifest gets none of the user's metadata. I'll fix this.
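A rough sketch of the header mapping described in this reply, assuming user metadata arrives as a string map and content headers as optional strings. `manifest_headers` is a hypothetical helper, not the function added in the PR; the only Swift-specific detail it relies on is the `X-Object-Meta-` prefix for user metadata.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: collect the headers a manifest PUT would carry.
/// Content headers are forwarded only when present; each user metadata
/// entry becomes an `X-Object-Meta-<key>` header.
fn manifest_headers(
    content_type: Option<&str>,
    content_disposition: Option<&str>,
    user_metadata: &BTreeMap<String, String>,
) -> Vec<(String, String)> {
    let mut headers = Vec::new();
    if let Some(v) = content_type {
        headers.push(("Content-Type".to_string(), v.to_string()));
    }
    if let Some(v) = content_disposition {
        headers.push(("Content-Disposition".to_string(), v.to_string()));
    }
    for (k, v) in user_metadata {
        headers.push((format!("X-Object-Meta-{}", k), v.clone()));
    }
    headers
}
```

The point of the sketch is that these headers belong on the manifest request, since the manifest is the object users later read; segments only need the raw bytes.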
core/services/swift/src/writer.rs
core,
op,
path,
part_sizes: Arc::new(Mutex::new(HashMap::new())),
Maybe we can add size: Option<usize> in oio::MultipartPart so that we can remove this lock.
Agree, that's cleaner. I'll add size: Option<u64> to oio::MultipartPart and remove the HashMap<usize, u64> + Mutex from the writer.
Done — added size: Option<u64> to oio::MultipartPart and removed the HashMap<usize, u64> + Mutex from the Swift writer.
Swift's SLO manifest format requires size_bytes for each segment — Swift validates that each segment's actual size matches the manifest. S3/GCS don't need size in their complete calls (only part number and etag), which is why MultipartPart didn't have it before.
Since this is a core struct, this required updating all 11 existing MultipartPart constructors across other services (S3, GCS, COS, OSS, OBS, B2, Upyun, Vercel Blob, object_store integration, and the test mock) to include size: None. Let me know if you'd prefer this split into a separate PR.
Also addressed your other feedback:
- Manifest PUT now forwards user metadata (X-Object-Meta-* headers) from OpWrite
- Rebased onto current main (includes the 4 merged Swift PRs)
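The size requirement described in this comment can be sketched as follows. This is a std-only illustration, not the PR's code: `Part` is a simplified, hypothetical mirror of `oio::MultipartPart` with the new `size` field, the JSON is formatted by hand instead of through the PR's serde `SloManifestEntry` struct, and the `/container/object` form of `path` is an assumption based on the Swift large-object documentation.

```rust
/// Simplified, hypothetical mirror of `oio::MultipartPart` with the `size` field.
#[derive(Debug, Clone)]
struct Part {
    part_number: usize,
    etag: String,
    size: Option<u64>,
}

/// Minimal JSON string escaping (quotes and backslashes only).
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Build an SLO manifest body from completed parts.
/// A part without a size is rejected, because every manifest entry
/// needs `size_bytes`.
fn build_manifest(segment_prefix: &str, parts: &[Part]) -> Result<String, String> {
    let mut entries = Vec::with_capacity(parts.len());
    for p in parts {
        let size = p
            .size
            .ok_or_else(|| format!("part {} has no size", p.part_number))?;
        entries.push(format!(
            "{{\"path\":\"{}/{:08}\",\"etag\":\"{}\",\"size_bytes\":{}}}",
            escape(segment_prefix),
            p.part_number,
            escape(&p.etag),
            size
        ));
    }
    Ok(format!("[{}]", entries.join(",")))
}
```

Carrying the size on each part lets the complete step assemble the manifest from the parts alone, which is what makes the shared `HashMap` and `Mutex` unnecessary.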
8c12ed6 to 83deaa7
Implement oio::MultipartWrite for SwiftWriter using Swift's Static
Large Object (SLO) mechanism, removing the 5GB single-upload ceiling.
SLO flow:
- initiate_part: generate a local UUID (no server call needed)
- write_part: PUT segment to .segments/{path}/{upload_id}/{part:08}
- complete_part: PUT JSON manifest to {path}?multipart-manifest=put
- abort_part: list and delete all segments under the upload_id prefix
Changes:
- Add swift_put_segment, swift_put_slo_manifest, swift_delete_slo,
slo_segment_path to SwiftCore
- Add SloManifestEntry serde struct for manifest JSON
- Replace OneShotWrite with MultipartWrite on SwiftWriter
- Track per-part sizes in Arc<Mutex<HashMap>> for manifest assembly
- Change Writer type from OneShotWriter to MultipartWriter
- Declare write_can_multi, write_multi_min_size (5MB),
write_multi_max_size (5GB) capabilities
- Add uuid dependency for upload ID generation
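The SLO flow in the commit message above can be modelled with an in-memory container. This is only a sketch under stated assumptions: `Container` and its methods are hypothetical, no HTTP is involved, the upload ID is passed in as a plain string instead of a generated UUID, and `manifest_entries` lists stored segments rather than using the parts returned by each write as the real writer would.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for a Swift container (object name -> bytes).
#[derive(Default)]
struct Container {
    objects: BTreeMap<String, Vec<u8>>,
}

impl Container {
    /// Prefix shared by all segments of one upload.
    fn segment_prefix(path: &str, upload_id: &str) -> String {
        format!(".segments/{}/{}/", path, upload_id)
    }

    /// write_part: store one segment and return its size in bytes.
    fn write_part(&mut self, path: &str, upload_id: &str, part: usize, data: &[u8]) -> u64 {
        let name = format!("{}{:08}", Self::segment_prefix(path, upload_id), part);
        self.objects.insert(name, data.to_vec());
        data.len() as u64
    }

    /// complete_part (simplified): segment names and sizes in part order.
    /// The BTreeMap iterates in key order, and zero padding makes that
    /// order match the part numbers.
    fn manifest_entries(&self, path: &str, upload_id: &str) -> Vec<(String, u64)> {
        let prefix = Self::segment_prefix(path, upload_id);
        self.objects
            .iter()
            .filter(|(name, _)| name.starts_with(&prefix))
            .map(|(name, data)| (name.clone(), data.len() as u64))
            .collect()
    }

    /// abort_part: delete every segment under the upload prefix and
    /// return how many were removed.
    fn abort(&mut self, path: &str, upload_id: &str) -> usize {
        let prefix = Self::segment_prefix(path, upload_id);
        let before = self.objects.len();
        self.objects.retain(|name, _| !name.starts_with(&prefix));
        before - self.objects.len()
    }
}
```

Because each upload has its own prefix, aborting one upload leaves the segments of any other upload to the same path untouched.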
83deaa7 to 9158524
Summary
- Replace oio::OneShotWriter with oio::MultipartWriter backed by Swift's Static Large Object (SLO) API
- initiate_part() generates a local UUID (no server call needed, unlike S3)
- write_part() uploads segments to .segments/{path}/{upload_id}/{part_number:08}
- complete_part() PUTs a JSON manifest to {path}?multipart-manifest=put
- abort_part() lists and deletes orphaned segments
- Small objects still go through write_once() as a single PUT; oio::MultipartWriter handles the decision automatically
- Concurrent part uploads via args.concurrent()
- Add uuid dependency for upload ID generation

Reference: https://docs.openstack.org/swift/latest/overview_large_objects.html
Test plan
test_writer_write, test_writer_write_with_overwrite, test_writer_write_with_concurrent, test_writer_sink, test_writer_abort and other writer tests now execute