Skip to content

refactor: reuse S3 client per Aurora tenant#194

Closed
bajtos wants to merge 3 commits intomainfrom
feature/fil-121-reuse-the-s3-client-instance
Closed

refactor: reuse S3 client per Aurora tenant#194
bajtos wants to merge 3 commits intomainfrom
feature/fil-121-reuse-the-s3-client-instance

Conversation

@bajtos
Copy link
Copy Markdown
Collaborator

@bajtos bajtos commented Apr 13, 2026

Handlers previously built a fresh S3Client on every request. Under provisioned concurrency this re-runs middleware setup and connection pool init on every warm invocation — measurable on get-activity and head-object.

Add getAuroraS3Client(stage, region, tenantId), backed by a module QuickLRU(500) mirroring the existing SSM credentials cache. The client uses an async credentials provider that delegates to getAuroraS3Credentials, so the cached instance stays valid across credential refreshes.

Library functions (listBuckets, listObjects, deleteBucket, deleteObject, headObject, getObjectRetention, presigners) now take an S3Client as first argument; handlers obtain it from the factory. HeadObject's fil-include-meta query injection and X-Fil-Cid response capture move from the shared client onto the per-request HeadObjectCommand, preventing middleware accumulation.

Close FIL-121

Handlers previously built a fresh S3Client on every request. Under
provisioned concurrency this re-runs middleware setup and connection
pool init on every warm invocation — measurable on get-activity and
head-object.

Add getAuroraS3Client(stage, region, tenantId), backed by a module
QuickLRU(500) mirroring the existing SSM credentials cache. The
client uses an async credentials provider that delegates to
getAuroraS3Credentials, so the cached instance stays valid across
credential refreshes.

Library functions (listBuckets, listObjects, deleteBucket,
deleteObject, headObject, getObjectRetention, presigners) now take
an S3Client as first argument; handlers obtain it from the factory.
HeadObject's fil-include-meta query injection and X-Fil-Cid response
capture move from the shared client onto the per-request
HeadObjectCommand, preventing middleware accumulation.

Assisted-by: Claude:claude-opus-4-6
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the backend Aurora S3 integration to reuse a single S3Client per (stage, region, tenantId) within a warm Lambda, reducing repeated AWS SDK client construction overhead (especially under provisioned concurrency).

Changes:

  • Added getAuroraS3Client(stage, region, tenantId) backed by a module-scope QuickLRU cache.
  • Updated Aurora S3 helper functions to accept an S3Client as the first argument; updated handlers accordingly.
  • Moved HeadObject-specific middleware onto the per-request HeadObjectCommand to prevent middleware accumulation, and added/updated unit tests + ADR notes.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
packages/backend/src/lib/aurora-s3-client.ts Introduces cached per-tenant S3Client factory; refactors helpers to accept S3Client; moves HeadObject middleware to command scope.
packages/backend/src/lib/aurora-s3-client.test.ts Adds unit tests for client caching, endpoint configuration, credentials resolution, and eviction behavior.
packages/backend/src/handlers/presign-upload.ts Switches presign upload flow to obtain and pass a cached S3Client.
packages/backend/src/handlers/presign-upload.test.ts Updates mocks/expectations for getAuroraS3Client + new presign function signature.
packages/backend/src/handlers/list-objects.ts Uses cached S3Client and passes it into listObjects.
packages/backend/src/handlers/list-objects.test.ts Updates mocks/expectations for the new S3Client-first helper signature.
packages/backend/src/handlers/head-object.ts Uses cached S3Client; passes it into headObject and getObjectRetention.
packages/backend/src/handlers/head-object.test.ts Updates mocks/expectations for the new S3Client-first helper signatures.
packages/backend/src/handlers/get-usage.test.ts Removes now-unneeded mocks for aurora-s3-client.
packages/backend/src/handlers/get-activity.ts Uses cached S3Client (when tenant setup is complete) and passes it into listBuckets.
packages/backend/src/handlers/get-activity.test.ts Updates mocks to use getAuroraS3Client instead of credential fetching.
packages/backend/src/handlers/download-object.ts Uses cached S3Client and passes it into getPresignedGetObjectUrl.
packages/backend/src/handlers/download-object.test.ts Updates mocks/expectations for getAuroraS3Client + new presign function signature.
packages/backend/src/handlers/delete-object.ts Uses cached S3Client and passes it into deleteObject.
packages/backend/src/handlers/delete-object.test.ts Updates mocks/expectations for the new S3Client-first helper signature.
packages/backend/src/handlers/delete-bucket.ts Uses cached S3Client; passes it into listObjects and deleteBucket.
packages/backend/src/handlers/delete-bucket.test.ts Updates mocks/expectations for the new S3Client-first helper signatures.
docs/architectural-decisions/2026-03-lambda-provisioned-concurrency.md Documents the new S3 client caching strategy and HeadObject middleware placement.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +22 to +25
// One S3Client per (stage, region, tenantId) — reused across Lambda warm
// invocations. Capped at 500 entries to match the SSM credential cache.
const s3ClientCache = new QuickLRU<string, S3Client>({ maxSize: 500 });
export const _resetS3ClientCacheForTesting = () => s3ClientCache.clear();
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LRU cache can evict S3Client instances, but evicted clients aren’t destroyed. Since AWS SDK v3 clients own an HTTP handler/agent with open sockets, eviction can leave idle connections around longer than necessary. Consider configuring the QuickLRU with an eviction callback that calls client.destroy() (and clearing should also destroy) to avoid resource leaks in long-lived Lambda instances.

Suggested change
// One S3Client per (stage, region, tenantId) — reused across Lambda warm
// invocations. Capped at 500 entries to match the SSM credential cache.
const s3ClientCache = new QuickLRU<string, S3Client>({ maxSize: 500 });
export const _resetS3ClientCacheForTesting = () => s3ClientCache.clear();
function destroyS3Client(client: S3Client): void {
client.destroy();
}
// One S3Client per (stage, region, tenantId) — reused across Lambda warm
// invocations. Capped at 500 entries to match the SSM credential cache.
const s3ClientCache = new QuickLRU<string, S3Client>({
maxSize: 500,
onEviction: (_key, client) => destroyS3Client(client),
});
export const _resetS3ClientCacheForTesting = () => {
for (const client of s3ClientCache.values()) {
destroyS3Client(client);
}
s3ClientCache.clear();
};

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 6171a41

Comment on lines +32 to +38
const client = new S3Client({
endpoint: getS3Endpoint(region, stage),
region: 'auto',
credentials: async () => {
const { accessKeyId, secretAccessKey } = await getAuroraS3Credentials(stage, tenantId);
return { accessKeyId, secretAccessKey };
},
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getAuroraS3Client uses an async credentials provider that calls getAuroraS3Credentials on every request. When multiple S3 operations run concurrently on the same client (e.g., head-object does Promise.all([headObject, getObjectRetention])), the provider can be invoked in parallel and trigger duplicate SSM fetches on a cold cache. Consider memoizing/deduping in-flight credential lookups per (stage, tenantId) (or using a memoized credentials provider) so the first warm invocation only hits SSM once.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude's response:

The AWS SDK v3 wraps any function passed as credentials with @smithy/core's memoizeIdentityProvider. That memoizer:

  1. Caches the resolved credentials for the lifetime of the client (there is no expiration returned, so it never re-resolves).
  2. Dedupes concurrent in-flight calls — the second caller awaits the same promise as the first.

So on a single S3Client instance, getAuroraS3Credentials is invoked once, not per request.

Claude also suggested implementing a minimal in-flight dedupe inside getAuroraS3Credentials.


I am not sure if this isn't overcomplication, but since the complexity is contained within a single file (aurora-s3-client.ts) + tests, I feel it's acceptable.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 6171a41

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do no think this will be an issue since Lambda runs once per request. That container gets reused but it wont be executing concurrently on one in-memory version of this code. If a second request comes in while one is ongoing, it will spin up another lambda to handle that with fresh memory.

Address review feedback on PR #194.

S3Client eviction: QuickLRU was dropping references to evicted
clients without destroying them, so the AWS SDK's HTTP handler
kept idle sockets open until GC reclaimed the instance. Add an
onEviction hook that calls client.destroy(), and make the test
helper destroy entries before clearing.

SSM credential dedupe: getAuroraS3Credentials is memoized per
(stage, tenantId) by an LRU of resolved values, but two callers
racing on a cold cache would each issue their own GetParameter
call. Add an in-flight Map<string, Promise> so concurrent lookups
share a single SSM round-trip. The factory's async credentials
provider already benefits indirectly via SDK v3's built-in
credentials memoizer, but this also protects any direct callers.

Assisted-by: Claude:claude-opus-4-6
s3ClientCache.clear();
};

export function getAuroraS3Client(stage: string, region: S3Region, tenantId: string): S3Client {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this one props interface instead of 3 args?

Copy link
Copy Markdown
Collaborator

@joemocode-business joemocode-business left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is going to be majorly overhauled very soon as a part of: https://linear.app/filecoin-foundation/issue/FIL-168/move-aurora-s3-calls-to-frontend

I posted the WIP here: #198

I was going to keep the get-activity and bucket operation endpoints. I think most of these other endpoints get deleted in favor of a presign endpoint.

@bajtos
Copy link
Copy Markdown
Collaborator Author

bajtos commented Apr 14, 2026

So this is going to be majorly overhauled very soon as a part of: https://linear.app/filecoin-foundation/issue/FIL-168/move-aurora-s3-calls-to-frontend

I posted the WIP here: #198

I was going to keep the get-activity and bucket operation endpoints. I think most of these other endpoints get deleted in favor of a presign endpoint.

Thanks for flagging this conflict! I will wait until #198 is landed and then recreate this PR from scratch.

@bajtos bajtos closed this Apr 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants