WIP/Don't review: update split and splice docs #353
```diff
@@ -440,13 +440,21 @@ service ContentAddressableStorage {
     option (google.api.http) = { get: "/v2/{instance_name=**}/blobs/{root_digest.hash}/{root_digest.size_bytes}:getTree" };
   }

-  // Split a blob into chunks.
-  //
-  // This call splits a blob into chunks, stores the chunks in the CAS, and
-  // returns a list of the chunk digests. Using this list, a client can check
-  // which chunks are locally available and just fetch the missing ones. The
-  // desired blob can be assembled by concatenating the fetched chunks in the
-  // order of the digests in the list.
+  // Split retrieves information about how a blob is split into chunks.
+  //
+  // This call returns information about how a blob is split into chunks, and
+  // returns a list of the chunk digests. The server returns a known manifest
+  // if one exists for the requested blob digest. If no manifest exists, the
+  // server MAY compute a new one by performing chunking on-demand if the blob
+  // exists in the CAS, though this should be rare as manifests are typically
+  // created when blobs are written. The original blob digest does not need to
+  // be present in the CAS for this call to succeed if a manifest already exists.
```
Comment on lines +449 to +451
Collaborator
Leave the manifest creation as an implementation detail and avoid discussing it in the spec. That gives the server more freedom to pick the right tradeoffs (e.g. chunking eagerly vs. lazily). The server can keep both the chunked and the non-chunked version of the same blob to serve both kinds of requests.
```diff
+  //
+  // Using the returned list of chunk digests, a client can check which chunks
+  // are locally available and just fetch the missing ones. The desired blob can
+  // be assembled by concatenating the fetched chunks in the order of the
+  // digests in the list. ZSTD-compressed chunks can be concatenated without
+  // decompression.
```
Comment on lines +456 to +457
Collaborator
I don't think this is correct. There are a few ways compression can be applied here:

a. Compress the original large blob before chunking.
b. Do not compress chunks.
c. Compress each chunk individually.

From the previous discussions, I think we do not care about (a) and (b) and will mostly be doing (c). Theoretically, (c) should give us a better chunk "hit rate" when using a consistent chunking algorithm (e.g. FastCDC) across different versions of the same large blob, as the compression algorithm can shuffle contents around. Under (c), each chunk will be compressed individually. To get back the original large blob, one would need to decompress each chunk before concatenating them based on the order in the response.
Author
Yes, agree.

One nice property of ZSTD is you can concatenate first and then decompress. This is because with ZSTD:

`Decompress(Compress(A1) + Compress(A2)) == A1 + A2`

So as long as we do (c) for creation of the chunks, the user can get the full file compressed by concatenating the compressed chunks. I think this is probably also not important to the spec, and I should remove it.
Collaborator
The spec has DEFLATE and BROTLI as well, which may not share that property.
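The concatenation property under discussion is a container-level feature: a zstd stream is a sequence of independent frames, so independently compressed frames can be joined and decompressed as one stream. As the last comment notes, raw DEFLATE and Brotli streams do not generally share this. Python's standard library has no zstd module in most installed versions, but the gzip container does allow multiple members in one stream, which makes it a convenient stdlib stand-in for demonstrating the idea:

```python
import gzip

# Compress two chunks independently, as in per-chunk compression (option (c)).
c1 = gzip.compress(b"hello ")
c2 = gzip.compress(b"world")

# The gzip container permits multiple members in one stream, so concatenating
# independently compressed chunks decompresses to the concatenation of the
# originals -- the same property zstd provides at the frame level.
assert gzip.decompress(c1 + c2) == b"hello world"
```

This only illustrates the shape of the property; whether a given codec supports it depends on its container format, not on DEFLATE-vs-zstd entropy coding per se.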
```diff
   //
   // This rpc can be used to reduce the required data to download a large blob
   // from CAS if chunks from earlier downloads of a different version of this
```
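The client-side download flow the proto comment describes (take the ordered chunk digest list, fetch only the chunks not cached locally, concatenate in digest order) can be sketched as follows. `fetch_chunk` and the `local_chunks` dict are hypothetical stand-ins for ByteStream reads and a client-side chunk cache; only `hashlib` is a real API, and SHA-256 stands in for whichever digest function the instance uses:

```python
import hashlib

def assemble_blob(chunk_digests, local_chunks, fetch_chunk):
    """Reassemble a blob from an ordered chunk-digest list (sketch)."""
    parts = []
    for hex_hash, size in chunk_digests:
        data = local_chunks.get(hex_hash)
        if data is None:                      # fetch only the missing chunks
            data = fetch_chunk(hex_hash, size)
            local_chunks[hex_hash] = data
        # Verify every chunk against its digest before trusting it.
        assert hashlib.sha256(data).hexdigest() == hex_hash
        parts.append(data)
    return b"".join(parts)                    # digest order defines the blob

# Toy usage with an in-memory dict standing in for the remote CAS.
chunks = [b"large ", b"blob ", b"data"]
digests = [(hashlib.sha256(c).hexdigest(), len(c)) for c in chunks]
remote = {h: c for (h, _), c in zip(digests, chunks)}
assert assemble_blob(digests, {}, lambda h, s: remote[h]) == b"large blob data"
```

The savings come from the `local_chunks` hit path: on a re-download of a new blob version, only the chunks whose digests changed are fetched.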
```diff
@@ -459,6 +467,8 @@ service ContentAddressableStorage {
   // 1. The blob chunks are stored in CAS.
   // 2. Concatenating the blob chunks in the order of the digest list returned
   //    by the server results in the original blob.
+  //
+  // A client should NOT expect that the original blob is available.
```
Collaborator
Again, I don't think this is true. It's up to the implementation to decide whether to keep or discard the original blob after splitting. From the client's perspective, nothing should change: calling ByteStream.Read or FindMissing over the original blob after splitting SHOULD still yield the original blob. Keep in mind that we are assuming different types of clients are accessing the same remote cache. Some clients may support the Split/Splice APIs, some may NOT. And it's important to remind server implementations that both types MAY exist at the same time.
```diff
   //
   // Servers which implement this functionality MUST declare that they support
   // it by setting the
```
```diff
@@ -491,26 +501,35 @@ service ContentAddressableStorage {
   //
   // Errors:
   //
-  // * `NOT_FOUND`: The requested blob is not present in the CAS.
+  // * `NOT_FOUND`: No manifest exists for the blob and the blob is not present
+  //   in the CAS.
   // * `RESOURCE_EXHAUSTED`: There is insufficient disk quota to store the blob
   //   chunks.
   rpc SplitBlob(SplitBlobRequest) returns (SplitBlobResponse) {
     option (google.api.http) = { get: "/v2/{instance_name=**}/blobs/{blob_digest.hash}/{blob_digest.size_bytes}:splitBlob" };
   }
```
```diff
-  // Splice a blob from chunks.
+  // Splice tells the CAS how chunks compose a blob.
   //
   // This is the complementary operation to the
   // [ContentAddressableStorage.SplitBlob][build.bazel.remote.execution.v2.ContentAddressableStorage.SplitBlob]
   // function to handle the chunked upload of large blobs to save upload
   // traffic.
   //
+  // When uploading a large blob using chunked upload, clients MUST first upload
+  // all chunks to the CAS, then call this RPC to store a manifest that describes
+  // how those chunks compose the original blob. The chunks referenced in the
+  // manifest must be available in the CAS before calling this RPC. The original
+  // blob does not need to be available: the correctness of the manifest can be
+  // validated from manifest correctness and by verifying that the chunks match
+  // their specified digests.
```
Comment on lines +519 to +525
Collaborator
s/must/SHOULD/. We don't have a transactional guarantee between uploading chunks and calling the Splice API. So a remote cache with a very short TTL can evict some of the chunks by the time the client finishes uploading and calls SpliceBlob. In those cases, the client should expect an error response from the server.

Huh? Might need some rewording here.

Keep in mind that
```diff
   //
   // If a client needs to upload a large blob and is able to split a blob into
   // chunks in such a way that reusable chunks are obtained, e.g., by means of
   // content-defined chunking, it can first determine which parts of the blob
   // are already available in the remote CAS and upload the missing chunks, and
-  // then use this API to instruct the server to splice the original blob from
-  // the remotely available blob chunks.
+  // then use this API to store the manifest describing how the chunks compose
+  // the original blob.
   //
   // Servers which implement this functionality MUST declare that they support
   // it by setting the
```
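The upload flow described above (chunk, ask which chunks the server already has, upload only the missing ones, then store the manifest) can be sketched as follows. The function names, callbacks, and in-memory stores are hypothetical stand-ins for the real RPCs (`find_missing` plays the role of FindMissingBlobs, `splice` of SpliceBlob); fixed-size chunking stands in for a real content-defined scheme:

```python
import hashlib

CHUNK_SIZE = 4  # tiny for illustration; real chunk sizes are far larger

def upload_with_splice(blob, find_missing, upload, splice):
    """Hypothetical client-side chunked-upload flow (not the real API surface)."""
    # 1. Chunk the blob (fixed-size here; content-defined chunking such as
    #    FastCDC gives better reuse across blob versions).
    chunks = [blob[i:i + CHUNK_SIZE] for i in range(0, len(blob), CHUNK_SIZE)]
    digests = [hashlib.sha256(c).hexdigest() for c in chunks]
    # 2. Ask which chunks the server lacks (FindMissingBlobs analogue).
    missing = set(find_missing(digests))
    # 3. Upload only those chunks.
    for d, c in zip(digests, chunks):
        if d in missing:
            upload(d, c)
    # 4. Store the manifest: the ordered digest list composing the blob.
    splice(hashlib.sha256(blob).hexdigest(), digests)

# In-memory stand-ins for the remote CAS and manifest store.
cas, manifests = {}, {}
upload_with_splice(
    b"hello world!",
    find_missing=lambda ds: [d for d in ds if d not in cas],
    upload=cas.__setitem__,
    splice=manifests.__setitem__,
)
blob_digest = hashlib.sha256(b"hello world!").hexdigest()
assert b"".join(cas[d] for d in manifests[blob_digest]) == b"hello world!"
```

As the review above points out, steps 3 and 4 are not transactional: chunks uploaded in step 3 can be evicted before step 4 completes, so a real client has to be prepared for SpliceBlob to fail and retry the missing uploads.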
```diff
@@ -522,17 +541,22 @@ service ContentAddressableStorage {
   //
   // In order to ensure data consistency of the CAS, the server MUST only add
   // blobs to the CAS after verifying their digests. In particular, servers MUST NOT
-  // trust digests provided by the client. The server MAY accept a request as no-op
-  // if the client-specified blob is already in CAS; the lifetime of that blob SHOULD
-  // be extended as usual. If the client-specified blob is not already in the CAS,
-  // the server SHOULD verify that the digest of the newly created blob matches the
-  // digest specified by the client, and reject the request if they differ.
+  // trust digests provided by the client. The server MUST validate the manifest
+  // and verify that all referenced chunks exist in the CAS and match their
+  // specified digests. The server MAY optionally verify that concatenating the
+  // chunks results in a blob matching the original blob digest, particularly if
+  // the client is not trusted. The server MAY accept a request as no-op if a
+  // manifest for the client-specified blob already exists; the lifetime of that
+  // manifest and its chunks SHOULD be extended as usual.
```
Comment on lines +544 to +550
Collaborator
This seems to assume that the large blob exists only in the manifest + chunk form, not in its original form. In reality, servers may choose to store both. So this is an implementation detail our spec should avoid taking an opinion on.
We already established a few sentences earlier that "the server MUST NOT trust ... the client", so the "if the client is not trusted" part should be implied? This sentence feels unnecessary to me.
I know this is from the original text, but I wonder if this should be updated. In particular, let's say I were to send this splice request to the server. What the current text is advising: if blob_digest already exists, then the server MAY accept this result. It also implies that if the large blob exists in the manifest + chunks form, the manifest could be updated to the latest list of chunks. This seems wrong. On one hand, I do think it's nice to have a short-circuit way to skip having to verify the concatenated result. But it might be worth reminding server implementations that the server should never update the underlying manifest without verifying the concatenated result. This would help us disambiguate cases where clients want to update the chunking result to an improved chunking algorithm / configuration versus cases where malicious actors tried to replace an existing chunk manifest with a malicious one.
Collaborator
Actually, now that I think about this a little bit more: I think that the server should never let SpliceBlob update an existing large blob manifest. It's ok to create a new one. In that case, the server MUST verify the concat result.
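The policy this thread converges on can be sketched as a server-side handler: re-verify every referenced chunk against its digest, verify the concatenated result before creating a new manifest, and never replace an existing manifest in place. All names and the dict-based storage model here are illustrative, not the reference implementation; SHA-256 stands in for the instance's digest function:

```python
import hashlib

def handle_splice(blob_digest, chunk_digests, cas, manifests):
    """Hypothetical server-side SpliceBlob validation (sketch of the policy
    discussed in this thread)."""
    if blob_digest in manifests:
        # Never silently replace an existing manifest; a matching repeat is
        # a no-op (lifetimes of manifest and chunks would be extended here).
        if manifests[blob_digest] != list(chunk_digests):
            raise ValueError("refusing to replace an existing manifest")
        return
    parts = []
    for d in chunk_digests:
        chunk = cas.get(d)
        if chunk is None:
            raise KeyError(f"NOT_FOUND: chunk {d} missing from the CAS")
        if hashlib.sha256(chunk).hexdigest() != d:  # never trust client digests
            raise ValueError(f"chunk {d} does not match its digest")
        parts.append(chunk)
    # Verify the concatenated result before creating a new manifest.
    if hashlib.sha256(b"".join(parts)).hexdigest() != blob_digest:
        raise ValueError("spliced result does not match the blob digest")
    manifests[blob_digest] = list(chunk_digests)

# Toy usage against in-memory stores.
cas = {hashlib.sha256(c).hexdigest(): c for c in (b"foo", b"bar")}
manifests = {}
ds = [hashlib.sha256(c).hexdigest() for c in (b"foo", b"bar")]
handle_splice(hashlib.sha256(b"foobar").hexdigest(), ds, cas, manifests)
assert manifests[hashlib.sha256(b"foobar").hexdigest()] == ds
```

Whether the original blob is also materialized (rather than served by concatenating chunks on read) is left as an implementation detail, consistent with the comments above.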
```diff
   //
   // When blob splitting and splicing is used at the same time, the clients and
   // the server SHOULD agree out-of-band upon a chunking algorithm used by both
   // parties to benefit from each others chunk data and avoid unnecessary data
   // duplication.
   //
+  // After a successful Splice call, a subsequent Split call for the same blob digest
+  // SHOULD retrieve the manifest information that was stored.
+  //
   // Errors:
   //
   // * `NOT_FOUND`: At least one of the blob chunks is not present in the CAS.
```
Compared to the original doc, this new version misses the part where the chunks MUST/SHOULD be stored in CAS. It's something obvious to an experienced maintainer, but it should be documented to help newer implementations onboard the spec.
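The "agree out-of-band upon a chunking algorithm" requirement in the diff exists because chunk reuse only works if both sides cut chunks at identical content-defined boundaries: if the client and server use different algorithms or parameters, their chunk digests never match and every chunk looks "missing". A toy rolling-sum chunker (illustrative only; production systems use FastCDC or similar, and all parameter names here are made up) shows the idea:

```python
def cdc_chunks(data, mask_bits=6, window=16, min_size=8, max_size=64):
    """Toy content-defined chunker. A cut happens where a rolling sum over
    the last `window` bytes hits a boundary condition, so chunk boundaries
    depend on the content itself rather than on absolute offsets -- which is
    what lets two versions of a blob share most of their chunks."""
    chunks, start, rolling = [], 0, 0
    for i, b in enumerate(data):
        rolling += b
        if i - window >= start:          # keep the sum over a sliding window
            rolling -= data[i - window]
        size = i - start + 1
        at_boundary = rolling & ((1 << mask_bits) - 1) == 0
        if size >= max_size or (size >= min_size and at_boundary):
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])      # trailing partial chunk
    return chunks

data = bytes((i * 37 + 11) % 256 for i in range(500))
parts = cdc_chunks(data)
assert b"".join(parts) == data           # lossless re-assembly
assert all(len(p) <= 64 for p in parts)  # max chunk size respected
assert cdc_chunks(data) == parts         # deterministic: both sides agree
```

Because the output is fully determined by the input bytes and the parameters, "agreeing out-of-band" reduces to both parties running the same algorithm with the same configuration.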