
Add ONTAP S3 Store with existence cache #1630

Draft · wants to merge 1 commit into main

Conversation


@prakad1x commented Mar 16, 2025

Implements a NetApp ONTAP S3-compatible backend with an existence cache for better performance when checking whether objects exist. This provides a specialized S3 implementation for NetApp ONTAP systems with optimized handling for Content Addressable Storage.

Description

This PR implements support for NetApp ONTAP S3-compatible storage as a backend for NativeLink, along with an existence cache layer that optimizes performance when checking if objects exist.
The implementation includes:

  • A dedicated OntapS3Store that provides specialized handling for NetApp ONTAP S3 systems, with proper configuration for TLS, credentials management, and vserver settings.
  • An OntapS3ExistenceCache layer that maintains an in-memory cache of object digests and periodically synchronizes with the backend to reduce latency for repeated existence checks.

This implementation provides significant performance improvements for build systems that repeatedly check for the same objects, reducing network calls and latency in environments using NetApp ONTAP storage.

Fixes # (issue)

Type of change


  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

This implementation has been thoroughly tested through multiple approaches:

  • Unit tests covering both OntapS3Store and OntapS3ExistenceCache components
  • Published a new container image with tag v0.6.0-splm-2 to the container registry
  • Deployed the tagged image to our staging environment, connected to an actual NetApp ONTAP S3 instance
  • Tested parallel uploads using multipart functionality with large test artifacts
  • Validated correct behavior of cache synchronization over extended periods

Checklist

  • Updated documentation if needed
  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend (see some docs)



@aaronmondal (Member) commented

Initially, this seems quite interesting. I do believe that we should defer review of this until after the next AWS SDK update. It was a long-standing issue that the AWS SDK didn't support hyper 1.x, but that's now been fixed in the latest versions. So I'd say let's first bring our S3 implementation to hyper 1.x, and then we have a better baseline to implement this ONTAP functionality. If we do it the other way around, it might complicate the migration, as we'd have to migrate two stores to the new hyper.

@prakad1x Regarding the implementation, did you investigate a templating/monomorphization approach for the existing S3 store? This implementation seems very similar to the existing one, so there might be a way to implement this functionality with a lot less code. Also, could you outline how this differs from wrapping the S3 store in an ExistenceCache?

@prakad1x (Author) commented Mar 19, 2025

Initially, this seems quite interesting. I do believe that we should defer review of this to after the next aws sdk update. It was a long standing issue that the aws sdk didn't support hyper 1.x, but that's now been fixed in the latest versions. So I'd say let's first bring our S3 implementation to hyper 1.x and then we have a better baseline to implement this ontap functionality. If we do it the other way around it might complicate the migration as we'd have to migrate two stores to the new hyper.

@prakad1x Regarding the implementation, did you investigate a templating/monomorphization approach for the existing S3 store? This implementation seems very similar to the existing one, so there might be a way to implement this functionality with a lot less code. Also, could you outline how this is different from wrapping the S3 store in an existencecache?

I understand your suggestion to wait for the AWS SDK update to Hyper 1.x before proceeding with this PR. That makes sense from a maintenance perspective to avoid having to migrate two stores later.

Templating/Monomorphization

I initially looked at generalizing the existing S3 store with configuration options or traits to support both standard S3 and ONTAP S3. I had similar concerns about code duplication and discussed this with Marcus. He specifically recommended creating a separate implementation rather than trying to modify the existing S3 store to handle both use cases.
Slack discussion link

Regarding the Existence Cache

The OntapS3ExistenceCache is fundamentally different from wrapping S3Store in an ExistenceCacheStore because:

  1. File-backed persistence: it persists the cache to a file rather than keeping it only in memory.
  2. Background synchronization: it synchronizes with the backend in the background at configurable intervals.
  3. Complete index of objects: most importantly, it maintains a complete index of all objects in the bucket through periodic listing operations. This allows it to definitively answer "no" for objects that don't exist without making network calls to S3, whereas the standard ExistenceCacheStore must always check the underlying store for any object it hasn't seen before. This significantly reduces network calls for non-existent objects (see the sketch after this list).
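
A minimal sketch of that fast path (illustrative only; the type and method names here are hypothetical, not the PR's actual code):

use std::collections::HashSet;
use std::sync::RwLock;

// Illustrative sketch (hypothetical names): because the cache holds a
// complete index of the bucket, rebuilt by a periodic background listing,
// a cache miss means the object definitely does not exist, so no S3 call
// is needed.
pub struct CompleteIndexCache {
    // Full set of object digests as of the last synchronization.
    digests: RwLock<HashSet<String>>,
}

impl CompleteIndexCache {
    // Replace the index with a fresh bucket listing (run periodically).
    pub fn resync(&self, listed: impl IntoIterator<Item = String>) {
        *self.digests.write().unwrap() = listed.into_iter().collect();
    }

    // Pure in-memory lookup; a standard ExistenceCacheStore would instead
    // have to query the underlying store for any digest it hasn't seen.
    pub fn contains(&self, digest: &str) -> bool {
        self.digests.read().unwrap().contains(digest)
    }
}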

Regarding the Extent of Changes

It's worth noting that while the PR appears to introduce entirely new files, the implementation is actually built on top of the existing S3 store with targeted modifications. The commits were squashed, so you are seeing it as completely new code. I suggest looking at development commit 8000013, specifically ontap_s3_store.rs, from a different branch I used during development. It will give you a clearer idea of what I contributed on top of the S3 store as a base implementation to make it work with ONTAP S3.

@prakad1x (Author) commented Mar 19, 2025

I think updating the S3 store to work with both AWS S3 and ONTAP S3 would be better. Should I go ahead and do that? The changes are relatively small, and the code largely does the same thing. Refer to commit 8000013, specifically ontap_s3_store.rs, to see what I have added on top of the base S3 store implementation to make it work with ONTAP S3. @aaronmondal @MarcusSorealheis

@aaronmondal (Member) left a comment

Yeah, I think that would be better. We'll need to change the way that credentials are passed into the SDK anyway with the migration. I put up a WIP draft of the migration at #1633 in case it helps.

Your change to the ExistenceCache behavior seems interesting to me for other stores as well. I wasn't aware that the index backend wasn't configurable, which seems wrong. What are your thoughts on creating an index backend for the existence_cache store? I.e. something along these lines:

{
  "existence_cache": {
    "backend": ...,
    "cache": ...,  // Or index? Or some other name?
  }
}

This way we could keep the existing behavior with a MemoryStore and also support the (IMO very valid) use case of a filesystem store as the cache/index backend.
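
For instance (key names purely illustrative, not an existing schema), a filesystem-backed index could then be configured as:

{
  "existence_cache": {
    "backend": { "s3": "..." },  // hypothetical: any store could go here
    "cache": { "filesystem": { "content_path": "/var/cache/nativelink/existence_index" } }
  }
}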

The polling mechanism also seems interesting to me. Previously (with only the memory cache) this probably wouldn't have been too useful, but with an arbitrary index backend this makes much more sense.

(Asterisk here that I might be overlooking something regarding the current existencecache functionality, but it seems unintuitive to me that we "require" the cache to live in memory)


@peakschris commented Mar 20, 2025

The polling mechanism also seems interesting to me. Previously (with only the memory cache) this probably wouldn't have been too useful, but with an arbitrary index backend this makes much more sense.

This has the side effect of changing the guarantees that @prakad1x's existence cache can offer. It offers a "does not exist" status (as of the last poll), but does not offer a definite "exists", because something may have been deleted from S3 in the gap since the last poll. The "exists" result might still turn into a cache miss once S3 is actually queried. I think for this reason it may make sense to keep this separate, unless you can think of a way to unify the two?
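
Put another way, a hypothetical sketch of that asymmetry (not code from the PR):

// Illustrative only: what the polled complete index can and cannot promise.
enum CacheAnswer {
    // Absent from the index as of the last poll: report "does not exist"
    // without touching S3.
    DefinitelyAbsent,
    // Present in the index, but the object may have been deleted since the
    // last poll, so a positive answer still has to be confirmed against S3.
    PossiblyPresent,
}

fn classify(in_index: bool) -> CacheAnswer {
    if in_index {
        CacheAnswer::PossiblyPresent
    } else {
        CacheAnswer::DefinitelyAbsent
    }
}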

I like the idea of the data that @prakad1x fetches from S3 being materialized to disk. I agree it should not be required to live in memory.

@prakad1x (Author) commented


I started some work on the configurable index backend. The problem is determining which backend we are using, AWS S3 or ONTAP S3. I'm using the following approach:

// 1. Explicit s3_type parameter set to ONTAP, or
// 2. vserver_name is provided (which is only used by ONTAP)
if spec.s3_type == S3Type::ONTAP || spec.vserver_name.is_some() {
    // ONTAP S3 specific configuration
    // ...
} else {
    // Standard AWS S3 configuration (unchanged)
    // ...
}

For the changes to stores.rs, I've added a new S3Type enum to clearly distinguish between AWS and ONTAP S3 backends:

#[allow(non_camel_case_types)]
#[derive(Serialize, Deserialize, Debug, Clone, Copy, PartialEq)]
pub enum S3Type {
    #[serde(rename = "aws")]
    AWS,
    #[serde(rename = "ontap")]
    ONTAP,
}

impl Default for S3Type {
    fn default() -> Self {
        S3Type::AWS
    }
}

I've also extended the S3Spec struct with ONTAP-specific fields:

  • s3_type: enum to explicitly specify the S3 implementation (defaults to AWS)
  • endpoint: required URL for ONTAP S3, optional for AWS S3
  • vserver_name: ONTAP-specific field that replaces region
  • root_certificates: optional path to custom root certificates for ONTAP

All new fields are optional, with sensible defaults that preserve backward compatibility: existing AWS S3 users don't need to change their configuration, as both s3_type and vserver_name default to values that trigger the AWS path.
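
As a concrete example (values are illustrative, and bucket here is assumed to be the standard S3Spec field), an ONTAP configuration might look like:

{
  "s3_type": "ontap",                              // serde rename of S3Type::ONTAP
  "endpoint": "https://ontap-s3.example.com",      // required for ONTAP
  "vserver_name": "svm_build_cache",               // replaces region
  "bucket": "nativelink-cas",
  "root_certificates": "/etc/ssl/certs/ontap-ca.pem"
}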
Is this approach acceptable, or would you prefer a different method for distinguishing between the two S3 types?
