Add ONTAP S3 Store with existence cache #1630
base: main
Conversation
Implements a NetApp ONTAP S3-compatible backend with an existence cache for better performance when checking whether objects exist. This provides a specialized S3 implementation for NetApp ONTAP systems with optimized handling for Content Addressable Storage. Signed-off-by: Kadam (EXT), Prajwal v08wha <[email protected]>
Initially, this seems quite interesting. I do believe that we should defer review of this until after the next AWS SDK update. It was a long-standing issue that the AWS SDK didn't support hyper 1.x, but that's now been fixed in the latest versions. So I'd say let's first bring our S3 implementation to hyper 1.x, and then we have a better baseline to implement this ONTAP functionality. If we do it the other way around, it might complicate the migration, as we'd have to migrate two stores to the new hyper.

@prakad1x Regarding the implementation, did you investigate a templating/monomorphization approach for the existing S3 store? This implementation seems very similar to the existing one, so there might be a way to implement this functionality with a lot less code. Also, could you outline how this is different from wrapping the S3 store in an existencecache?
I understand your suggestion to wait for the AWS SDK update to hyper 1.x before proceeding with this PR. That makes sense from a maintenance perspective to avoid having to migrate two stores later.

Templating/Monomorphization
I initially looked at generalizing the existing S3 store with configuration options or traits to support both standard S3 and ONTAP S3. I had similar concerns about code duplication and discussed this with Marcus. He specifically recommended creating a separate implementation rather than trying to modify the existing S3 store to handle both use cases.

Regarding the Existence Cache
Regarding the Extent of Changes
It's worth noting that while the PR appears to introduce entirely new files, the implementation is actually built on top of the existing S3 store with targeted modifications. The commits were squashed, so you are seeing it as completely new code. I suggest looking at this development commit: 8000013.
I think updating the S3 store to work with both AWS S3 and ONTAP S3 would be better. Should I go ahead and do that? The changes are relatively small, and the code largely does the same thing. Refer to commit 8000013 to see what I have added on top of the base S3 store implementation to work with ONTAP S3.
Yeah I think that would be better. We'll need to change the way that credentials are passed into the sdk anyways with the migration. I put up a wip draft of the migration at #1633 in case it helps.
Your change to the existencecache behavior seems interesting to me for other stores as well. I wasn't aware that the index backend wasn't configurable which seems wrong. What are your thoughts on creating an index backend for the existencecache store? I.e. something along these lines:
{
  "existence_cache": {
    "backend": ...,
    "cache": ...  // Or index? Or some other name?
  }
}
This way we could keep the existing behavior with a memorystore and also support the IMO very valid usecase of a filesystem store as cache/index backend.
The polling mechanism also seems interesting to me. Previously (with only the memory cache) this probably wouldn't have been too useful, but with an arbitrary index backend this makes much more sense.
(Asterisk here that I might be overlooking something regarding the current existencecache functionality, but it seems unintuitive to me that we "require" the cache to live in memory)
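A minimal sketch of what that proposal could look like as a config type, assuming a simple enum of store kinds — all names here are illustrative, not NativeLink's actual config API:

```rust
// Hypothetical sketch of a configurable index backend for the
// existence_cache store. All names are illustrative.

#[derive(Debug, Clone, PartialEq)]
enum StoreRef {
    /// Current default: the existence index lives in memory.
    Memory,
    /// Proposed: persist the index in a filesystem store so it
    /// survives restarts.
    Filesystem { root: String },
}

#[derive(Debug)]
struct ExistenceCacheConfig {
    /// The store whose objects' existence we are caching.
    backend: StoreRef,
    /// Where the existence index itself lives ("cache" or "index").
    cache: StoreRef,
}

impl ExistenceCacheConfig {
    /// Defaulting `cache` to Memory keeps today's behavior.
    fn new(backend: StoreRef) -> Self {
        Self { backend, cache: StoreRef::Memory }
    }
}

fn main() {
    // Existing behavior: in-memory index over any backend.
    let cfg = ExistenceCacheConfig::new(StoreRef::Filesystem { root: "/data/cas".into() });
    assert_eq!(cfg.cache, StoreRef::Memory);

    // Proposed: a filesystem-backed index that survives restarts.
    let persistent = ExistenceCacheConfig {
        backend: StoreRef::Filesystem { root: "/data/cas".into() },
        cache: StoreRef::Filesystem { root: "/data/index".into() },
    };
    println!("{persistent:?}");
}
```

The point of the shape above is only that the index becomes a first-class store reference, so memory stays the default while a filesystem store becomes a drop-in alternative.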
Reviewable status: 0 of 2 LGTMs obtained, and 0 of 10 files reviewed, and pending CI: license/cla, pre-commit-checks
This has the side-effect of changing the guarantees that @prakad1x's existence cache can offer. It offers "does not exist" status (as of last poll), but does not offer a definite "exists", because something may have been deleted from s3 in the gap from the last poll. The 'exists' result might still result in a cache miss once s3 is actually queried. I think for this reason it may make sense for this to be kept separate, unless you can think of a way to unify the two? I like the idea of the data that @prakad1x fetches from s3 being materialized to disk. I agree it should not be required to live in memory.
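Those weakened guarantees can be sketched as a lookup against a polled snapshot — a hypothetical simplification of the cache, with all names invented for illustration:

```rust
use std::collections::HashSet;

/// What a polled snapshot can actually promise.
#[derive(Debug, PartialEq)]
enum CacheAnswer {
    /// Present at the last poll, but the object may have been deleted
    /// since, so S3 must still be consulted before trusting "exists".
    ProbablyExists,
    /// Absent as of the last poll.
    NotFoundAtLastPoll,
}

struct PolledExistenceCache {
    /// Digests listed during the most recent poll of the bucket.
    snapshot: HashSet<String>,
}

impl PolledExistenceCache {
    fn check(&self, digest: &str) -> CacheAnswer {
        if self.snapshot.contains(digest) {
            CacheAnswer::ProbablyExists
        } else {
            CacheAnswer::NotFoundAtLastPoll
        }
    }
}

fn main() {
    let cache = PolledExistenceCache {
        snapshot: HashSet::from(["abc123".to_string()]),
    };
    // "Exists" is only probable; a later S3 query can still miss.
    assert_eq!(cache.check("abc123"), CacheAnswer::ProbablyExists);
    assert_eq!(cache.check("deadbeef"), CacheAnswer::NotFoundAtLastPoll);
}
```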
I started some work on it. The problem is how we determine which backend bucket we are using, AWS or ONTAP S3.
For the changes to stores.rs, I've added:
All new fields are optional with sensible defaults to ensure backward compatibility with existing AWS S3 users.
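A sketch of how optional ONTAP-only fields could keep AWS configs unchanged while also answering the backend-detection question above — field names are hypothetical, not the actual stores.rs spec:

```rust
/// Illustrative sketch of ONTAP-specific fields added to the S3 store
/// config. All are Option so existing AWS S3 configs keep working
/// unchanged (they simply stay None).
#[derive(Debug, Default)]
struct S3StoreSpec {
    bucket: String,
    /// ONTAP vserver to address; None means plain AWS S3.
    vserver: Option<String>,
    /// CA bundle for the ONTAP endpoint's TLS certificate.
    ca_cert_file: Option<String>,
    /// Custom endpoint URL for the ONTAP S3 service.
    endpoint_url: Option<String>,
}

/// One possible answer to "which backend are we talking to": treat the
/// presence of an ONTAP-only field such as `vserver` as the signal.
fn is_ontap(spec: &S3StoreSpec) -> bool {
    spec.vserver.is_some()
}

fn main() {
    let aws = S3StoreSpec { bucket: "cas".into(), ..Default::default() };
    assert!(!is_ontap(&aws));

    let ontap = S3StoreSpec {
        bucket: "cas".into(),
        vserver: Some("svm1".into()),
        ..Default::default()
    };
    assert!(is_ontap(&ontap));
}
```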
Description
This PR implements support for NetApp ONTAP S3-compatible storage as a backend for NativeLink, along with an existence cache layer that optimizes performance when checking if objects exist.
The implementation includes:
A dedicated OntapS3Store that provides specialized handling for NetApp ONTAP S3 systems with proper configuration for TLS, credentials management, and vserver settings.
An OntapS3ExistenceCache layer that maintains an in-memory cache of object digests and periodically synchronizes with the backend to reduce latency for repeated existence checks.
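The cache-plus-periodic-sync idea above can be sketched as a digest set that is replaced wholesale on each poll — a minimal illustration, not the actual OntapS3ExistenceCache code; in the real store the listing would come from paginated bucket listing calls:

```rust
use std::collections::HashSet;

/// Replace the in-memory digest set with a freshly listed snapshot.
/// The listing is injected here so the logic is testable offline.
fn resync(cache: &mut HashSet<String>, listed: impl IntoIterator<Item = String>) {
    cache.clear();
    cache.extend(listed);
}

/// Fast-path existence check: consult the set before any network call.
fn cached_contains(cache: &HashSet<String>, digest: &str) -> bool {
    cache.contains(digest)
}

fn main() {
    let mut cache = HashSet::new();
    resync(&mut cache, vec!["d1".to_string(), "d2".to_string()]);
    assert!(cached_contains(&cache, "d1"));

    // Next poll: d2 was deleted from the bucket, d3 was added.
    resync(&mut cache, vec!["d1".to_string(), "d3".to_string()]);
    assert!(!cached_contains(&cache, "d2"));
    assert!(cached_contains(&cache, "d3"));
}
```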
This implementation provides significant performance improvements for build systems that repeatedly check for the same objects, reducing network calls and latency in environments using NetApp ONTAP storage.
Fixes # (issue)
How Has This Been Tested?
This implementation has been thoroughly tested through multiple approaches:
Unit tests covering both OntapS3Store and OntapS3ExistenceCache components
Published a new container image with tag v0.6.0-splm-2 to container registry
Deployed the tagged image to our staging environment connected to an actual NetApp ONTAP S3 instance
Tested parallel uploads using multipart functionality with large test artifacts
Validated correct behavior of cache synchronization over extended periods
Checklist
- bazel test //... passes locally
- git amend (see some docs)