feat(services/swift): add SLO multipart upload for large objects #7212

Merged
Xuanwo merged 1 commit into apache:main from benroeder:swift/slo-multipart on Feb 23, 2026

@benroeder
Contributor

Summary

  • Fixes feat: Swift service should support multipart upload for large objects #7211
  • Replace oio::OneShotWriter with oio::MultipartWriter backed by Swift's Static Large Object (SLO) API
  • initiate_part() generates a local UUID (no server call needed, unlike S3)
  • write_part() uploads segments to .segments/{path}/{upload_id}/{part_number:08}
  • complete_part() PUTs a JSON manifest to {path}?multipart-manifest=put
  • abort_part() lists and deletes orphaned segments
  • Small writes still go through write_once() as a single PUT — oio::MultipartWriter handles the decision automatically
  • Supports concurrent part uploads via args.concurrent()
  • Adds uuid dependency for upload ID generation
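
The segment naming in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `segment_path` and `segment_prefix` are hypothetical helpers, and the upload ID is passed in as a plain string instead of being generated with the uuid crate.

```rust
/// Hypothetical helper mirroring the layout described above:
/// `.segments/{path}/{upload_id}/{part_number:08}`.
fn segment_path(path: &str, upload_id: &str, part_number: usize) -> String {
    // Strip a leading slash so the result has no empty path component.
    let path = path.trim_start_matches('/');
    format!(".segments/{path}/{upload_id}/{part_number:08}")
}

/// Prefix shared by every segment of one upload. An abort could list
/// objects under this prefix to find the segments it has to delete.
fn segment_prefix(path: &str, upload_id: &str) -> String {
    let path = path.trim_start_matches('/');
    format!(".segments/{path}/{upload_id}/")
}
```

Zero-padding the part number to eight digits means that sorting segment names lexicographically also sorts them by part number.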

Reference: https://docs.openstack.org/swift/latest/overview_large_objects.html

Test plan

  • test_writer_write, test_writer_write_with_overwrite, test_writer_write_with_concurrent, test_writer_sink, test_writer_abort and other writer tests now execute
  • All 93 behavior tests pass against both local SAIO and a real Swift cluster

@benroeder benroeder requested a review from Xuanwo as a code owner February 23, 2026 00:36
@dosubot dosubot bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) Feb 23, 2026

dosubot bot commented Feb 23, 2026

Related Documentation

Checked 0 published document(s) in 1 knowledge base(s). No updates required.

@dosubot dosubot bot added the releases-note/feat label (The PR implements a new feature or has a title that begins with "feat") Feb 23, 2026
Member

@Xuanwo Xuanwo left a comment

Thank you for this great work!

pub fn slo_segment_path(&self, path: &str, upload_id: &str, part_number: usize) -> String {
    let abs = build_abs_path(&self.root, path);
    format!(
        ".segments/{}{}/{:08}",
Member

Is the path defined by Swift, or did we choose it ourselves? Will this path be listed out by users?

Contributor Author

The .segments/ prefix is a convention, not defined by Swift — SLO just needs segments to be reachable objects anywhere in the container. This convention is widely used by Swift clients (e.g. python-swiftclient uses the same pattern).

Segments won't appear in OpenDAL listings because they're outside the user's root prefix — swift_list uses build_abs_path(&self.root, path) as the prefix filter, and .segments/ sits at the container root. They would be visible if someone lists the container directly via the Swift API, but that's the same behavior as python-swiftclient.
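
A small sketch of the prefix-filter argument above, using an in-memory list of object names in place of a real Swift listing. `list_with_prefix` is a hypothetical stand-in and does not reproduce `swift_list`.

```rust
/// Keep only the object names that start with `prefix`, the way a
/// prefix-filtered container listing would.
fn list_with_prefix<'a>(names: &[&'a str], prefix: &str) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|name| name.starts_with(prefix))
        .collect()
}
```

With the names `data/a.txt`, `data/big.bin` and `.segments/data/big.bin/abc/00000001`, the prefix `data/` returns only the first two, while an empty prefix (listing the container directly) returns all three.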

Member

Got it, thanks for the explanation. I would greatly appreciate it if you could add those explanations directly in the code as comments.

/// each segment's path, etag, and size.
///
/// Reference: <https://docs.openstack.org/swift/latest/overview_large_objects.html>
pub async fn swift_put_slo_manifest(
Member

I'm guessing we need to carry the user metadata here?

Contributor Author

Good catch — the manifest PUT should carry user metadata and content headers (Content-Type, Content-Disposition, etc.) from OpWrite. Currently only segments get the raw bytes and the manifest gets none of the user's metadata. I'll fix this.

core,
op,
path,
part_sizes: Arc::new(Mutex::new(HashMap::new())),
Member

Maybe we can add size: Option<usize> in oio::MultipartPart so that we can remove this lock.

Contributor Author

Agree, that's cleaner. I'll add size: Option<u64> to oio::MultipartPart and remove the HashMap<usize, u64> + Mutex from the writer.

Contributor Author

Done — added size: Option<u64> to oio::MultipartPart and removed the HashMap<usize, u64> + Mutex from the Swift writer.

Swift's SLO manifest format requires size_bytes for each segment — Swift validates that each segment's actual size matches the manifest. S3/GCS don't need size in their complete calls (only part number and etag), which is why MultipartPart didn't have it before.
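
To illustrate why the size has to travel with each part, here is a rough sketch of assembling a manifest body from completed parts. Everything in it is an assumption made for illustration: `Part` is a local stand-in rather than `oio::MultipartPart`, `slo_manifest` and `json_escape` are hypothetical, the JSON is built by hand instead of with serde, and the `/container/object` form of `path` is assumed from the Swift documentation referenced in this PR.

```rust
/// Local stand-in for a completed part: its number, the ETag returned by
/// the segment PUT, and the size recorded when the part was written.
struct Part {
    part_number: usize,
    etag: String,
    size: Option<u64>,
}

/// Escape the characters that must not appear raw inside a JSON string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the manifest body: a JSON array with one object per segment,
/// in the order the parts are given.
/// Returns `None` if any part is missing its size.
fn slo_manifest(container: &str, segment_prefix: &str, parts: &[Part]) -> Option<String> {
    let mut entries = Vec::with_capacity(parts.len());
    for p in parts {
        // The manifest needs a size for every segment, so give up without one.
        let size = p.size?;
        let path = format!("/{container}/{segment_prefix}{:08}", p.part_number);
        entries.push(format!(
            "{{\"path\":\"{}\",\"etag\":\"{}\",\"size_bytes\":{}}}",
            json_escape(&path),
            json_escape(&p.etag),
            size
        ));
    }
    Some(format!("[{}]", entries.join(",")))
}
```

Returning `None` when a size is missing is just one possible design; a real writer would more likely surface an error.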

Since this is a core struct, this required updating all 11 existing MultipartPart constructors across other services (S3, GCS, COS, OSS, OBS, B2, Upyun, Vercel Blob, object_store integration, and the test mock) to include size: None. Let me know if you'd prefer this split into a separate PR.

Also addressed your other feedback:

  • Manifest PUT now forwards user metadata (X-Object-Meta-* headers) from OpWrite
  • Rebased onto current main (includes the 4 merged Swift PRs)
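
As a rough sketch of the metadata forwarding mentioned above, the manifest request could take its headers and target from helpers like these. Both `user_metadata_headers` and `manifest_target` are hypothetical names. Only the `X-Object-Meta-*` header prefix and the `?multipart-manifest=put` query string come from this thread.

```rust
use std::collections::BTreeMap;

/// Turn user metadata into Swift object-metadata headers.
/// A BTreeMap is used here only to keep the output order deterministic.
fn user_metadata_headers(meta: &BTreeMap<String, String>) -> Vec<(String, String)> {
    meta.iter()
        .map(|(key, value)| (format!("X-Object-Meta-{key}"), value.clone()))
        .collect()
}

/// Request target for the manifest PUT of the object at `path`.
fn manifest_target(path: &str) -> String {
    format!("{path}?multipart-manifest=put")
}
```

The point of the fix is that these headers go on the manifest PUT, because the manifest, not the segments, is the object users see at `{path}`.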

@benroeder benroeder force-pushed the swift/slo-multipart branch 2 times, most recently from 8c12ed6 to 83deaa7 Compare February 23, 2026 01:20
Implement oio::MultipartWrite for SwiftWriter using Swift's Static
Large Object (SLO) mechanism, removing the 5GB single-upload ceiling.

SLO flow:
- initiate_part: generate a local UUID (no server call needed)
- write_part: PUT segment to .segments/{path}/{upload_id}/{part:08}
- complete_part: PUT JSON manifest to {path}?multipart-manifest=put
- abort_part: list and delete all segments under the upload_id prefix

Changes:
- Add swift_put_segment, swift_put_slo_manifest, swift_delete_slo,
  slo_segment_path to SwiftCore
- Add SloManifestEntry serde struct for manifest JSON
- Replace OneShotWrite with MultipartWrite on SwiftWriter
- Track per-part sizes in Arc<Mutex<HashMap>> for manifest assembly
- Change Writer type from OneShotWriter to MultipartWriter
- Declare write_can_multi, write_multi_min_size (5MB),
  write_multi_max_size (5GB) capabilities
- Add uuid dependency for upload ID generation
Member

@Xuanwo Xuanwo left a comment

Thank you, really nice!

@dosubot dosubot bot added the lgtm label (This PR has been approved by a maintainer) Feb 23, 2026
@Xuanwo Xuanwo merged commit 9932bd4 into apache:main Feb 23, 2026
335 checks passed