
"Big blob support" proposals summary #326

Open
@sluongng

Description

Currently, we have two open proposals that both aim to improve support for big blobs in Remote APIs.

  1. Split and Splice RPCs
  2. Remote Execution Manifest Blobs

Here is a quick summary of the two proposals (please give both of them a read).

Split and Splice RPCs

A new set of RPCs is introduced as an extension of the existing CAS service.
The server advertises whether it supports the extension via new boolean fields in the CacheCapabilities message.

service ContentAddressableStorage {
  ...

  rpc SplitBlob(SplitBlobRequest) returns (SplitBlobResponse) {}

  rpc SpliceBlob(SpliceBlobRequest) returns (SpliceBlobResponse) {}
}

message CacheCapabilities {
  ...

  repeated ChunkingAlgorithm.Value supported_chunking_algorithms = 8;

  bool blob_split_support = 9;

  bool blob_splice_support = 10;
}

If the client has the digest of a big blob, it can call SplitBlob() to get back an ordered list of chunk digests from the server.

message SplitBlobRequest {
  ...

  Digest blob_digest = 2;

  ...
}

message SplitBlobResponse {
  repeated Digest chunk_digests = 1;

  ...
}
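For illustration, here is a minimal client-side sketch of that download path in Go. The splitClient wrapper, the ReadBlob helper, and the local Digest struct are all assumptions made for the sketch; the proposal itself only defines the RPCs and messages above.

package blobsplit

import (
	"bytes"
	"context"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Digest is a minimal local stand-in for build.bazel.remote.execution.v2.Digest.
type Digest struct {
	Hash      string
	SizeBytes int64
}

// splitClient is a hypothetical wrapper around the CAS service plus the
// proposed SplitBlob RPC; it is not part of the proposal itself.
type splitClient interface {
	// SplitBlob returns the ordered chunk digests of a large blob.
	SplitBlob(ctx context.Context, blob Digest) ([]Digest, error)
	// ReadBlob downloads a single blob or chunk from the CAS.
	ReadBlob(ctx context.Context, d Digest) ([]byte, error)
}

// fetchLargeBlob downloads a large blob chunk by chunk and verifies the
// reassembled bytes against the original digest.
func fetchLargeBlob(ctx context.Context, cas splitClient, blob Digest) ([]byte, error) {
	chunks, err := cas.SplitBlob(ctx, blob)
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	for _, chunk := range chunks {
		// Chunks that are already cached locally could be skipped here.
		data, err := cas.ReadBlob(ctx, chunk)
		if err != nil {
			return nil, err
		}
		buf.Write(data)
	}
	sum := sha256.Sum256(buf.Bytes())
	if hex.EncodeToString(sum[:]) != blob.Hash || int64(buf.Len()) != blob.SizeBytes {
		return nil, fmt.Errorf("reassembled blob does not match digest %s/%d", blob.Hash, blob.SizeBytes)
	}
	return buf.Bytes(), nil
}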

If the client has split a big blob into chunks by itself, it can call SpliceBlob() to tell the server to concatenate those chunks back into the big blob.

message SpliceBlobRequest {
  ...

  // Expected digest of the spliced blob.
  Digest blob_digest = 2;

  // The ordered list of digests of the chunks which need to be concatenated to
  // assemble the original blob.
  repeated Digest chunk_digests = 3;

  ...
}

message SpliceBlobResponse {
  // Computed digest of the spliced blob.
  Digest blob_digest = 1;

  ...
}
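The corresponding upload path, sketched under the same assumptions (same package and imports as the previous sketch, reusing its Digest type). The fixed-size chunker is only a placeholder; a real client would more likely use content-defined chunking to maximize chunk reuse across blob versions.

// spliceClient is a hypothetical wrapper around the existing CAS write path
// plus the proposed SpliceBlob RPC.
type spliceClient interface {
	// WriteBlob uploads one chunk and returns its digest.
	WriteBlob(ctx context.Context, data []byte) (Digest, error)
	// SpliceBlob asks the server to concatenate the chunks, in order,
	// into a blob with the expected digest.
	SpliceBlob(ctx context.Context, blob Digest, chunks []Digest) (Digest, error)
}

// chunkFixed splits data into fixed-size chunks. The proposal does not
// mandate any chunking algorithm; fixed-size chunks keep the sketch short.
func chunkFixed(data []byte, size int) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := size
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

// uploadLargeBlob uploads a big blob as chunks and asks the server to splice
// them back together into the original blob.
func uploadLargeBlob(ctx context.Context, cas spliceClient, data []byte) (Digest, error) {
	sum := sha256.Sum256(data)
	blob := Digest{Hash: hex.EncodeToString(sum[:]), SizeBytes: int64(len(data))}

	var chunkDigests []Digest
	for _, chunk := range chunkFixed(data, 4<<20) { // 4 MiB: an arbitrary example size
		// Chunks the server already has could be skipped (e.g. via FindMissingBlobs).
		d, err := cas.WriteBlob(ctx, chunk)
		if err != nil {
			return Digest{}, err
		}
		chunkDigests = append(chunkDigests, d)
	}
	return cas.SpliceBlob(ctx, blob, chunkDigests)
}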

The current proposal also includes a definition of supported chunking algorithms, advertised through CacheCapabilities, but based on recent conversations that part will likely be removed in the final version.

Remote Execution Manifest Blobs

A new SHA256Encoded Digest function is introduced.

When using SHA256Encoded, each blob is expected to come with a small fixed-size header that identifies whether it is a "normal" blob or a "manifest" blob.

A manifest blob contains a list of digests, each pointing to a chunk; concatenating those chunks in order reconstructs the large blob.
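To make the idea concrete, here is a rough sketch of how a reader might resolve such a blob (continuing the Go sketches above). The one-byte header, the marker values, and the decodeManifest callback are all made up for illustration; the actual header layout and manifest encoding are defined by the proposal.

// Hypothetical marker values; the proposal defines the real header layout.
const (
	headerSize   = 1
	kindNormal   = byte(0) // the payload is stored inline after the header
	kindManifest = byte(1) // a manifest of chunk digests follows the header
)

// chunkReader is a hypothetical helper for fetching individual chunks.
type chunkReader interface {
	ReadChunk(ctx context.Context, d Digest) ([]byte, error)
}

// resolveBlob returns the full payload of a blob fetched by its SHA256Encoded
// digest, following one level of manifest indirection if needed. The manifest
// decoder is passed in because its encoding is proposal-defined, not shown here.
func resolveBlob(ctx context.Context, cas chunkReader, decodeManifest func([]byte) []Digest, raw []byte) ([]byte, error) {
	if len(raw) < headerSize {
		return nil, fmt.Errorf("blob shorter than its fixed-size header")
	}
	kind, body := raw[0], raw[headerSize:]
	if kind == kindNormal {
		return body, nil // a "normal" blob: the payload is right here
	}
	// A "manifest" blob: the body lists the chunk digests of the large blob.
	var out bytes.Buffer
	for _, chunk := range decodeManifest(body) {
		data, err := cas.ReadChunk(ctx, chunk)
		if err != nil {
			return nil, err
		}
		out.Write(data)
	}
	return out.Bytes(), nil
}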

Key takeaways

  1. Overall:
    Both proposals are sound and sufficient to support large blobs in the Remote APIs. However, both deliberately avoid specifying how the blobs should be chunked. The only assumption made by both proposals is that implementations should be able to put the big blob back together by concatenating the small chunks in order.

    In practice, we noted that even if the build client/server/worker do not agree on a chunking algorithm, both proposals still work. However, agreeing on a chunking algorithm is expected to improve cache hit rates and reduce data transfer over the network.

  2. Adoption cost:
    Concerns about the cost of implementing these proposals were raised back at BazelCon. It is worth noting that both proposals require client and server implementations to write some extra code, so there will be some adoption cost either way.

    In particular, the SHA256Encoded proposal uses a new digest function, which means that both the client and the server need to implement it. Existing remote cache entries cannot be reused with this proposal, so there could be a temporary increase in cache storage requirements for the existing system (or a shorter cache entry TTL during the migration to the new digest function).

    Meanwhile, the Split/Splice RPCs are compatible with existing SHA256/BLAKE3 cache entries. It is also worth noting that the Remote APIs do not dictate how the Worker APIs should be designed. However, in setups where workers use the Remote APIs to communicate with the remote cache, older clients doing remote builds can still get partial benefits when the server and workers use the new RPCs to speed up remote actions.

  3. Performance vs Verification:
    During a recent conversation, we noted that the current SHA256Encoded proposal does not require implementations to hash the large blob to compute its digest. Instead, the large blob is only accessed through the manifest and the manifest's digest. Not having to hash the larger blobs could save some compute power and therefore speed up builds with many larger blobs. However, because the big blob's digest is not stored anywhere in the system, there is no way to verify the concatenation of the smaller chunks when writing the big blob to disk. This implies a certain level of trust in client/worker implementations to concatenate these chunks correctly.

    In contrast, the Split and Splice RPCs require implementations to hash the larger blob to obtain its digest. The hashing operation can be slow but provides a way to verify the concatenation if needed (see the sketch after this list). Because both the large blob digest and its chunk list can be used to address the blob, a build may require additional RPC calls to translate between the two. The performance cost of these additional network round trips versus downloading the big blob as-is can be non-trivial to predict.

    Note: the SHA256Encoded proposal could add an optional digest field to the manifest to support verifying the concatenation.
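For completeness, here is a small sketch of that verification step (same package and types as the earlier sketches, plus the io package): the implementation can hash the chunk stream while splicing, so checking the result against the expected big-blob digest costs one pass over the data and no extra buffering.

// spliceAndVerify writes the chunks, in order, to w while hashing the stream,
// then checks the result against the expected big-blob digest. This is the
// verification that Split/Splice enables and that a digest-less manifest
// cannot perform.
func spliceAndVerify(w io.Writer, chunks [][]byte, want Digest) error {
	h := sha256.New()
	out := io.MultiWriter(w, h) // one pass: write to storage and hash simultaneously
	var total int64
	for _, c := range chunks {
		n, err := out.Write(c)
		if err != nil {
			return err
		}
		total += int64(n)
	}
	if total != want.SizeBytes || hex.EncodeToString(h.Sum(nil)) != want.Hash {
		return fmt.Errorf("spliced blob does not match expected digest %s/%d", want.Hash, want.SizeBytes)
	}
	return nil
}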
