
Add ONTAP S3 Store with existence cache #1630

Draft · wants to merge 1 commit into main

Conversation


@prakad1x commented Mar 16, 2025

Implements a NetApp ONTAP S3-compatible backend with an existence cache for better performance when checking whether objects exist. This provides a specialized S3 implementation for NetApp ONTAP systems with optimized handling for Content Addressable Storage.

Description

This PR implements support for NetApp ONTAP S3-compatible storage as a backend for NativeLink, along with an existence cache layer that optimizes performance when checking if objects exist.
The implementation includes:

  • A dedicated OntapS3Store that provides specialized handling for NetApp ONTAP S3 systems, with proper configuration for TLS, credentials management, and vserver settings.
  • An OntapS3ExistenceCache layer that maintains an in-memory cache of object digests and periodically synchronizes with the backend to reduce latency for repeated existence checks.

This implementation provides significant performance improvements for build systems that repeatedly check for the same objects, reducing network calls and latency in environments using NetApp ONTAP storage.

Fixes # (issue)

Type of change


  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

This implementation has been thoroughly tested through multiple approaches:

  • Unit tests covering both OntapS3Store and OntapS3ExistenceCache components
  • Published a new container image with tag v0.6.0-splm-2 to the container registry
  • Deployed the tagged image to our staging environment, connected to an actual NetApp ONTAP S3 instance
  • Tested parallel uploads using multipart functionality with large test artifacts
  • Validated correct behavior of cache synchronization over extended periods

Checklist

  • Updated documentation if needed
  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend (see some docs)



@aaronmondal (Member) commented

Initially, this seems quite interesting. I do believe that we should defer review of this until after the next AWS SDK update. It was a long-standing issue that the AWS SDK didn't support hyper 1.x, but that's now been fixed in the latest versions. So I'd say let's first bring our S3 implementation to hyper 1.x, and then we have a better baseline to implement this ONTAP functionality. If we do it the other way around, it might complicate the migration, as we'd have to migrate two stores to the new hyper.

@prakad1x Regarding the implementation, did you investigate a templating/monomorphization approach for the existing S3 store? This implementation seems very similar to the existing one, so there might be a way to implement this functionality with a lot less code. Also, could you outline how this differs from wrapping the S3 store in an ExistenceCache?

@prakad1x (Author) commented Mar 19, 2025

Initially, this seems quite interesting. I do believe that we should defer review of this to after the next aws sdk update. It was a long standing issue that the aws sdk didn't support hyper 1.x, but that's now been fixed in the latest versions. So I'd say let's first bring our S3 implementation to hyper 1.x and then we have a better baseline to implement this ontap functionality. If we do it the other way around it might complicate the migration as we'd have to migrate two stores to the new hyper.

@prakad1x Regarding the implementation, did you investigate a templating/monomorphization approach for the existing S3 store? This implementation seems very similar to the existing one, so there might be a way to implement this functionality with a lot less code. Also, could you outline how this is different from wrapping the S3 store in an existencecache?

I understand your suggestion to wait for the AWS SDK update to Hyper 1.x before proceeding with this PR. That makes sense from a maintenance perspective to avoid having to migrate two stores later.

Templating/Monomorphization

I initially looked at generalizing the existing S3 store with configuration options or traits to support both standard S3 and ONTAP S3. I had similar concerns about code duplication and discussed this with Marcus. He specifically recommended creating a separate implementation rather than trying to modify the existing S3 store to handle both use cases.
Slack discussion link

Regarding the Existence Cache

The OntapS3ExistenceCache is fundamentally different from wrapping S3Store in an ExistenceCacheStore because:

  1. File-backed persistence: it persists the cache to a file rather than keeping it only in memory.
  2. Background synchronization: it synchronizes with the backend in the background at configurable intervals.
  3. Complete index of objects: most importantly, it maintains a complete index of all objects in the bucket through periodic listing operations. This allows it to definitively answer "no" for objects that don't exist without making network calls to S3, whereas the standard ExistenceCacheStore must always check the underlying store for any object it hasn't seen before. This significantly reduces network calls for non-existent objects (see the sketch after this list).
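
A minimal sketch of that fast path (illustrative only; the type and method names here are hypothetical, not the PR's actual code):

use std::collections::HashSet;
use std::sync::RwLock;

// Illustrative sketch (hypothetical names): because the cache holds a
// complete index of the bucket, rebuilt by a periodic background listing,
// a cache miss means the object definitely does not exist, so no S3 call
// is needed.
pub struct CompleteIndexCache {
    // Full set of object digests as of the last synchronization.
    digests: RwLock<HashSet<String>>,
}

impl CompleteIndexCache {
    // Replace the index with a fresh bucket listing (run periodically).
    pub fn resync(&self, listed: impl IntoIterator<Item = String>) {
        *self.digests.write().unwrap() = listed.into_iter().collect();
    }

    // Pure in-memory lookup; a standard ExistenceCacheStore would instead
    // have to query the underlying store for any digest it hasn't seen.
    pub fn contains(&self, digest: &str) -> bool {
        self.digests.read().unwrap().contains(digest)
    }
}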

Regarding the Extent of Changes

It's worth noting that while the PR appears to introduce entirely new files, the implementation is actually built on top of the existing S3 store with targeted modifications. The commits were squashed, so you are seeing it as completely new code. I suggest looking at development commit 8000013, specifically ontap_s3_store.rs, from a different branch I used during development. It will give you a clearer idea of what I contributed on top of the S3 store as a base implementation to make it work with ONTAP S3.

@prakad1x (Author) commented Mar 19, 2025

I think updating the S3 store to work with both AWS S3 and ONTAP S3 would be better. Should I go ahead and do that? The changes are relatively small, and the code largely does the same thing. Refer to commit 8000013, specifically ontap_s3_store.rs, to see what I have added on top of the base S3 store implementation to make it work with ONTAP S3. @aaronmondal @MarcusSorealheis

@aaronmondal (Member) left a comment

Yeah, I think that would be better. We'll need to change the way that credentials are passed into the SDK anyway with the migration. I put up a WIP draft of the migration at #1633 in case it helps.

Your change to the ExistenceCache behavior seems interesting to me for other stores as well. I wasn't aware that the index backend wasn't configurable, which seems wrong. What are your thoughts on creating an index backend for the existence_cache store? I.e. something along these lines:

{
  "existence_cache": {
    "backend": ...,
    "cache": ...,  // Or index? Or some other name?
  }
}

This way we could keep the existing behavior with a MemoryStore and also support the (IMO very valid) use case of a filesystem store as the cache/index backend.
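
For instance (key names purely illustrative, not an existing schema), a filesystem-backed index could then be configured as:

{
  "existence_cache": {
    "backend": { "s3": "..." },  // hypothetical: any store could go here
    "cache": { "filesystem": { "content_path": "/var/cache/nativelink/existence_index" } }
  }
}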

The polling mechanism also seems interesting to me. Previously (with only the memory cache) this probably wouldn't have been too useful, but with an arbitrary index backend this makes much more sense.

(Asterisk here that I might be overlooking something regarding the current existencecache functionality, but it seems unintuitive to me that we "require" the cache to live in memory)


@peakschris commented Mar 20, 2025

The polling mechanism also seems interesting to me. Previously (with only the memory cache) this probably wouldn't have been too useful, but with an arbitrary index backend this makes much more sense.

This has the side effect of changing the guarantees that @prakad1x's existence cache can offer. It offers a "does not exist" status (as of the last poll), but does not offer a definite "exists", because something may have been deleted from S3 in the gap since the last poll. The "exists" result might still turn into a cache miss once S3 is actually queried. I think for this reason it may make sense to keep this separate, unless you can think of a way to unify the two?
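
Put another way, a hypothetical sketch of that asymmetry (not code from the PR):

// Illustrative only: what the polled complete index can and cannot promise.
enum CacheAnswer {
    // Absent from the index as of the last poll: report "does not exist"
    // without touching S3.
    DefinitelyAbsent,
    // Present in the index, but the object may have been deleted since the
    // last poll, so a positive answer still has to be confirmed against S3.
    PossiblyPresent,
}

fn classify(in_index: bool) -> CacheAnswer {
    if in_index {
        CacheAnswer::PossiblyPresent
    } else {
        CacheAnswer::DefinitelyAbsent
    }
}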

I like the idea of the data that @prakad1x fetches from S3 being materialized to disk. I agree it should not be required to live in memory.

@prakad1x (Author) commented


I started some work on the configurable index backend. The problem is determining which backend we are using, AWS S3 or ONTAP S3. I'm using the following approach:

// 1. Explicit s3_type parameter set to ONTAP, or
// 2. vserver_name is provided (which is only used by ONTAP)
if spec.s3_type == S3Type::ONTAP || spec.vserver_name.is_some() {
    // ONTAP S3 specific configuration
    // ...
} else {
    // Standard AWS S3 configuration (unchanged)
    // ...
}

For the changes to stores.rs, I've added a new S3Type enum to clearly distinguish between AWS and ONTAP S3 backends:

#[allow(non_camel_case_types)]
#[derive(Serialize, Deserialize, Debug, Clone, Copy, PartialEq)]
pub enum S3Type {
    #[serde(rename = "aws")]
    AWS,
    #[serde(rename = "ontap")]
    ONTAP,
}

impl Default for S3Type {
    fn default() -> Self {
        S3Type::AWS
    }
}

I've also extended the S3Spec struct with ONTAP-specific fields:

  • s3_type: enum to explicitly specify the S3 implementation (defaults to AWS)
  • endpoint: required URL for ONTAP S3, optional for AWS S3
  • vserver_name: ONTAP-specific field that replaces region
  • root_certificates: optional path to custom root certificates for ONTAP

All new fields are optional, with sensible defaults that preserve backward compatibility: existing AWS S3 users don't need to change their configuration, as both s3_type and vserver_name default to values that trigger the AWS path.
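
As a concrete example (values are illustrative, and bucket here is assumed to be the standard S3Spec field), an ONTAP configuration might look like:

{
  "s3_type": "ontap",                              // serde rename of S3Type::ONTAP
  "endpoint": "https://ontap-s3.example.com",      // required for ONTAP
  "vserver_name": "svm_build_cache",               // replaces region
  "bucket": "nativelink-cas",
  "root_certificates": "/etc/ssl/certs/ontap-ca.pem"
}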
Is this approach acceptable, or would you prefer a different method for distinguishing between the two S3 types?
