
Feat: S3 calls move to frontend. Refactor backend to one presign handler #198

Open
joemocode-business wants to merge 8 commits into main from FrontendS3Calls

Conversation

Collaborator

@joemocode-business joemocode-business commented Apr 13, 2026

https://linear.app/filecoin-foundation/issue/FIL-168/move-aurora-s3-calls-to-frontend

I decided to move to a single endpoint to eliminate the need for many distributed Lambdas that each pay the heavy SSM tax of fetching our stored credentials for operations that Aurora's gateway supports directly.

This way we can keep a single Lambda warm that performs all the signing operations. Perceived latency is much better after this change. It also fixes this one by happenstance: https://linear.app/filecoin-foundation/issue/FIL-99/duplicate-upload-incorrectly-creates-two-rows

This of course does not include bucket creation or accessKey creation, which need the backoffice API.

Manual testing:

  1. Created buckets with 1) governance, 2) object lock without default governance, and 3) default settings.
  2. Uploaded objects to each (including duplicates).
  3. Downloaded and deleted objects across these buckets.
  4. Verified that activity in the dashboard rendered correctly for the objects I uploaded.
  5. Observed no console errors, with one exception: we get a 400 when requesting governance info for an object that has none but was uploaded to a bucket that enables it. The UI ends up doing the right thing, and this is expected behavior. We do show governance info when it is set by default. Added more detail to a separate backlog issue to address some gaps in the feature: https://linear.app/filecoin-foundation/issue/FIL-127/objectlock-implementation

Collaborator Author

Reopening after testing more thoroughly.

Comment on lines +156 to +176
// The subscription guard middleware uses Read access level so that listing
// and viewing objects still works during a grace period. If the batch
// contains write ops (putObject, deleteObject), block during grace period.
if (ops.some((op) => WRITE_OPS.has(op.op))) {
const billingResult = await dynamo.send(
new GetItemCommand({
TableName: Resource.BillingTable.name,
Key: { pk: { S: `CUSTOMER#${userId}` }, sk: { S: 'SUBSCRIPTION' } },
}),
);
const status = billingResult.Item?.subscriptionStatus?.S;
if (status === SubscriptionStatus.GracePeriod || status === SubscriptionStatus.PastDue) {
return new ResponseBuilder()
.status(403)
.body<ErrorResponse>({
message:
'Your account is in a grace period. Read-only access is available. Please reactivate your subscription to make changes.',
code: ApiErrorCode.GRACE_PERIOD_WRITE_BLOCKED,
})
.build();
}
Collaborator Author


Open to other suggestions rather than replicating this logic. The core issue is that we only need READ access for most of these operations, but some of them are write operations. Technically, this endpoint does no writes, so we could vend presigned URLs using our Key/Secret credentials for that tenant and let Aurora handle any unauthorized use, but this breaks down if we add additional access control (like user roles). I figured this provided a better experience but I can see how maintaining this code can be iffy.

Member


I don't have a strong opinion, TBH 🤷🏻

@bajtos bajtos requested review from bajtos, Copilot and pyropy and removed request for bajtos and Copilot April 14, 2026 11:39

Copilot AI left a comment


Pull request overview

This PR consolidates Aurora S3 signing into a single backend presign endpoint and shifts most S3 operations (list/head/retention/delete/download/upload execution) to the frontend via presigned URLs, reducing Lambda/SSM overhead and improving perceived latency.

Changes:

  • Replace multiple per-operation S3 Lambdas with one POST /api/presign handler that batch-generates presigned URLs.
  • Update the website to call batchPresign() and execute presigned S3 requests directly, including new XML/header parsing utilities.
  • Extend bucket metadata to expose objectLockEnabled for conditional retention fetching.

Reviewed changes

Copilot reviewed 25 out of 26 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| sst.config.ts | Removes per-object S3 routes and adds the single /api/presign route with provisioned concurrency. |
| sst-env.d.ts | Updates generated resource typings to remove old S3 handlers and add Presign. |
| packages/website/src/pages/ObjectDetailPage.tsx | Switches object metadata/retention fetching to presigned S3 calls. |
| packages/website/src/pages/BucketDetailPage.tsx | Switches object listing to presigned S3 ListObjects + XML parsing. |
| packages/website/src/lib/use-presign.ts | Adds batchPresign() helper calling /presign. |
| packages/website/src/lib/use-object-actions.ts | Updates delete/download/copy-url actions to use presigned S3 operations. |
| packages/website/src/lib/use-file-upload.ts | Updates upload flow to use putObject presign and direct PUT to S3. |
| packages/website/src/lib/aurora-s3.ts | Adds browser-side XML/header/error parsing + presigned URL execution helper. |
| packages/shared/src/index.ts | Exports new presign API types/schemas. |
| packages/shared/src/api/presign.ts | Adds Zod schemas + types for batch presign request/response. |
| packages/shared/src/api/buckets.ts | Adds objectLockEnabled? to the Bucket shape. |
| packages/backend/src/lib/aurora-s3-client.ts | Refactors S3 client creation and adds presigned URL generators for multiple ops. |
| packages/backend/src/handlers/presign.ts | Introduces consolidated presign handler with batch ops + grace-period write blocking. |
| packages/backend/src/handlers/presign-upload.ts | Removed (superseded by presign). |
| packages/backend/src/handlers/presign-upload.test.ts | Removed (superseded by presign tests, not yet added). |
| packages/backend/src/handlers/list-objects.ts | Removed (now presigned + executed in browser). |
| packages/backend/src/handlers/list-objects.test.ts | Removed. |
| packages/backend/src/handlers/head-object.ts | Removed (now presigned + executed in browser). |
| packages/backend/src/handlers/head-object.test.ts | Removed. |
| packages/backend/src/handlers/download-object.ts | Removed (now presigned + executed in browser). |
| packages/backend/src/handlers/download-object.test.ts | Removed. |
| packages/backend/src/handlers/delete-object.ts | Removed (now presigned + executed in browser). |
| packages/backend/src/handlers/delete-object.test.ts | Removed. |
| packages/backend/src/handlers/get-bucket.ts | Adds objectLockEnabled mapping from Aurora response. |
| packages/backend/src/handlers/get-bucket.test.ts | Adds assertions for objectLockEnabled true/false. |
| docs/architectural-decisions/2026-04-presigned-url-s3-operations.md | Adds ADR documenting the new presigned URL architecture and tradeoffs. |


Comment on lines +159 to +167
if (ops.some((op) => WRITE_OPS.has(op.op))) {
const billingResult = await dynamo.send(
new GetItemCommand({
TableName: Resource.BillingTable.name,
Key: { pk: { S: `CUSTOMER#${userId}` }, sk: { S: 'SUBSCRIPTION' } },
}),
);
const status = billingResult.Item?.subscriptionStatus?.S;
if (status === SubscriptionStatus.GracePeriod || status === SubscriptionStatus.PastDue) {

Copilot AI Apr 14, 2026


The write-op grace-period check does a second GetItem right after subscriptionGuardMiddleware(AccessLevel.Read) runs. Because DynamoDB reads are eventually consistent by default, this check can see a stale subscriptionStatus (e.g., still trialing after the middleware lazily transitioned it to grace_period) and incorrectly allow write presigns. Consider using ConsistentRead: true for this billing record lookup (or otherwise reusing the middleware’s computed status) to avoid any access-control bypass due to stale reads.

Collaborator Author


Highly unlikely to be a problem, but I like the idea of using the middy computed status. I almost went that approach initially. Will do.
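For reference, Copilot's `ConsistentRead` suggestion amounts to one extra field on the GetItem parameters. A minimal sketch, with the key shape taken from the handler above and a placeholder table name standing in for `Resource.BillingTable.name`:

```typescript
// Sketch of the suggested fix: make the billing lookup strongly consistent.
// 'tableName' is a placeholder; the real handler uses Resource.BillingTable.name.
function buildBillingLookup(userId: string, tableName: string) {
  return {
    TableName: tableName,
    Key: {
      pk: { S: `CUSTOMER#${userId}` },
      sk: { S: 'SUBSCRIPTION' },
    },
    // A strongly consistent read avoids observing a stale subscriptionStatus
    // immediately after the middleware lazily transitions it.
    ConsistentRead: true,
  };
}
```

The alternative the author prefers here, reusing the middleware's already-computed status, avoids the second DynamoDB round trip entirely.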

Comment on lines +132 to +177
export async function baseHandler(
event: AuthenticatedEvent,
): Promise<APIGatewayProxyStructuredResultV2> {
let body: unknown;
try {
body = JSON.parse(event.body ?? '[]');
} catch {
return new ResponseBuilder()
.status(400)
.body<ErrorResponse>({ message: 'Invalid JSON body' })
.build();
}

const parsed = PresignRequestSchema.safeParse(body);
if (!parsed.success) {
return new ResponseBuilder()
.status(400)
.body<ErrorResponse>({ message: parsed.error.issues[0].message })
.build();
}

const ops = parsed.data;
const { orgId, userId } = getUserInfo(event);

// The subscription guard middleware uses Read access level so that listing
// and viewing objects still works during a grace period. If the batch
// contains write ops (putObject, deleteObject), block during grace period.
if (ops.some((op) => WRITE_OPS.has(op.op))) {
const billingResult = await dynamo.send(
new GetItemCommand({
TableName: Resource.BillingTable.name,
Key: { pk: { S: `CUSTOMER#${userId}` }, sk: { S: 'SUBSCRIPTION' } },
}),
);
const status = billingResult.Item?.subscriptionStatus?.S;
if (status === SubscriptionStatus.GracePeriod || status === SubscriptionStatus.PastDue) {
return new ResponseBuilder()
.status(403)
.body<ErrorResponse>({
message:
'Your account is in a grace period. Read-only access is available. Please reactivate your subscription to make changes.',
code: ApiErrorCode.GRACE_PERIOD_WRITE_BLOCKED,
})
.build();
}
}

Copilot AI Apr 14, 2026


This PR removes several handler tests but doesn’t add coverage for the new consolidated presign handler. Since this endpoint now gates all object read/write presigning and has non-trivial branching (schema validation, org setup checks, grace-period write blocking, per-op URL generation), please add unit tests for at least: invalid JSON / schema errors, read-only batch success, mixed read+write batch blocked during grace period/past due, and ensuring item order matches request order.

Collaborator Author


Good catch. Let's re-add tests.
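To make the requested coverage concrete, here is a stand-in (not the actual handler, which validates with the Zod `PresignRequestSchema`) that isolates the distinct validation outcomes worth asserting on:

```typescript
// Illustrative sketch of the validation branches the new tests should cover:
// unparseable JSON, a parseable body that fails the schema, and a valid batch.
type ValidationOutcome = 'invalid-json' | 'invalid-schema' | 'ok';

function classifyRequestBody(raw: string | undefined): ValidationOutcome {
  let body: unknown;
  try {
    body = JSON.parse(raw ?? '[]');
  } catch {
    return 'invalid-json';
  }
  // Stand-in for PresignRequestSchema.safeParse: an array of { op: string }.
  const isOpArray =
    Array.isArray(body) &&
    body.every((item) => typeof (item as { op?: unknown })?.op === 'string');
  return isOpArray ? 'ok' : 'invalid-schema';
}
```

Beyond these branches, the mixed read+write grace-period block and request/response item ordering would each get their own cases.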

Comment on lines +12 to +40
export function parseListObjectsResponse(xml: string): ListObjectsResponse {
const doc = new DOMParser().parseFromString(xml, 'application/xml');

const errorNode = doc.querySelector('parsererror');
if (errorNode) {
throw new Error('Failed to parse S3 ListObjects response');
}

const contents = doc.getElementsByTagName('Contents');
const objects: S3Object[] = [];

for (let i = 0; i < contents.length; i++) {
const el = contents[i];
const key = getText(el, 'Key');
if (!key) continue;

objects.push({
key,
sizeBytes: parseInt(getText(el, 'Size') ?? '0', 10),
lastModified: getText(el, 'LastModified') ?? new Date().toISOString(),
...(getText(el, 'ETag') && { etag: getText(el, 'ETag') }),
});
}

const nextToken = getText(doc.documentElement, 'NextContinuationToken');
const isTruncated = getText(doc.documentElement, 'IsTruncated') === 'true';

return { objects, nextToken, isTruncated };
}

Copilot AI Apr 14, 2026


New XML/header parsing + S3 error parsing is introduced here but there are no unit tests alongside the existing website lib tests (e.g., time.test.ts, hubspot.test.ts). Please add tests covering representative ListObjectsV2 XML (including empty buckets and pagination), HeadObject headers -> metadata mapping (including x-amz-meta-* and x-fil-cid), and error XML parsing so regressions don’t silently break object browsing.

Collaborator Author


Adding unit tests to this is another good call.
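As one example of what those tests could pin down, here is a sketch of the HeadObject header-to-metadata mapping. The header names (`x-fil-cid`, `x-amz-meta-*`) come from the PR; the function shape itself is illustrative, not the actual implementation in aurora-s3.ts:

```typescript
// Illustrative mapping from HeadObject response headers to object metadata.
// Header lookups are case-insensitive, matching how fetch normalizes headers.
function extractObjectMetadata(headers: Record<string, string>): {
  cid?: string;
  userMetadata: Record<string, string>;
} {
  const userMetadata: Record<string, string> = {};
  let cid: string | undefined;
  for (const [name, value] of Object.entries(headers)) {
    const key = name.toLowerCase();
    if (key === 'x-fil-cid') {
      cid = value; // Filecoin CID surfaced by Aurora
    } else if (key.startsWith('x-amz-meta-')) {
      userMetadata[key.slice('x-amz-meta-'.length)] = value;
    }
  }
  return { cid, userMetadata };
}
```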

Comment on lines 88 to 90
xhr.onerror = () => reject(new Error('Upload failed'));
xhr.open('PUT', presignedUrl);
xhr.setRequestHeader('Content-Type', contentType);

Copilot AI Apr 14, 2026


xhr.open('PUT', presignedUrl) hard-codes the method even though the presign response includes method. To make this more robust (and to fail fast if the backend ever changes/misconfigures the op), consider using the returned method (or at least asserting it’s PUT) rather than assuming.

Collaborator Author


Unlikely to be an issue. Let's go ahead and do this anyway.
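The "fail fast" variant could be as small as a guard before `xhr.open` (names here are illustrative; the presign response field is `method` per the Copilot comment):

```typescript
// Verify the presign response's method matches what the upload path expects,
// instead of hard-coding 'PUT'. Throws early on a backend misconfiguration.
function assertPresignMethod(expected: string, actual: string): string {
  if (actual.toUpperCase() !== expected.toUpperCase()) {
    throw new Error(`Presign returned method ${actual}, expected ${expected}`);
  }
  return actual.toUpperCase();
}
```

Usage would then be something like `xhr.open(assertPresignMethod('PUT', item.method), item.url)`.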

Comment on lines +53 to +57
const cachedBucket = queryClient.getQueryData<GetBucketResponse>(
queryKeys.bucket(bucketName),
);
const hasObjectLock = cachedBucket?.bucket.objectLockEnabled ?? false;


Copilot AI Apr 14, 2026


hasObjectLock is derived only from queryClient.getQueryData(queryKeys.bucket(bucketName)). If a user lands on this page via a direct link (or cache was GC’d), cachedBucket will be undefined and you’ll never request retention info even when the bucket has Object Lock enabled. Consider ensuring the bucket metadata is fetched here (or always including getObjectRetention and handling the expected error) so retention data is correct regardless of navigation path.

Collaborator Author


Will look into this again.
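One way to frame the fix Copilot describes: treat a cache miss as "unknown" rather than defaulting to `false`, so the page knows it must fetch bucket metadata before deciding whether to request retention. A hedged sketch (types and names are illustrative):

```typescript
// Three-state object-lock resolution: a missing cache entry must not be
// conflated with "object lock disabled".
type ObjectLockState = 'enabled' | 'disabled' | 'unknown';

function objectLockState(
  cachedBucket: { objectLockEnabled?: boolean } | undefined,
): ObjectLockState {
  // Direct link or GC'd cache: caller should fetch bucket metadata first.
  if (!cachedBucket) return 'unknown';
  return cachedBucket.objectLockEnabled ? 'enabled' : 'disabled';
}
```

With react-query, the 'unknown' branch could be eliminated by using a fetch-or-cache helper such as `ensureQueryData` instead of `getQueryData`.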

Comment on lines +70 to +76
if (hasObjectLock && items[1]) {
const retentionResult = await fetch(items[1].url, { method: items[1].method });
if (retentionResult.ok) {
const xml = await retentionResult.text();
retention = parseGetObjectRetentionResponse(xml) ?? undefined;
}
}

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The getObjectRetention call uses a raw fetch() and silently ignores all non-2xx responses. That will hide real failures (e.g., expired presign URL, access denied, network errors) and can make the UI show “no retention” when the request actually failed. Consider reusing executePresignedUrl() (and/or parsing the S3 error body) while explicitly allowing only the expected “no retention configured” error cases to be treated as absence of retention.

Member


Can this block use executePresignedUrl instead of fetch?

If not, then I think we need to handle the case where the error response XML body indicates an error, don't we?

Collaborator Author


I tested the error paths: we get a 400, and the UI renders appropriately with and without object-level lock. That said, I'll look into this suggestion.
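A sketch of the distinction being asked for: only the expected "no retention configured" failure should read as absence; everything else (expired URL, access denied) should surface. The error codes matched here are assumptions, so the exact code Aurora returns for a retention-less object should be confirmed before relying on this:

```typescript
// Distinguish "object has no retention" from real failures on a presigned
// GetObjectRetention call. Matched codes are assumed, not confirmed.
function interpretRetentionFailure(status: number, xmlBody: string): 'absent' | 'error' {
  const code = /<Code>([^<]+)<\/Code>/.exec(xmlBody)?.[1];
  if (status === 400 || status === 404) {
    if (code === 'NoSuchObjectLockConfiguration' || code === 'InvalidRequest') {
      return 'absent'; // expected when the object has no retention set
    }
  }
  return 'error'; // expired presign URL, AccessDenied, etc. — surface these
}
```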

Member

@bajtos bajtos left a comment


Great stuff!

I love the high-level direction, assuming we are okay with increased latency for some geographies.

See my comments below for things to improve.

Comment on lines +14 to +24
### Current Request Path (Proxied)

```
Browser -> CloudFront -> API Gateway -> Lambda -> Aurora S3 -> Lambda -> API Gateway -> CloudFront -> Browser
```

### Desired Request Path (Presigned)

```
Browser -> Lambda (presign, ~50ms) -> Browser -> Aurora S3 (direct)
```
Member


I am fine with this decision, but want to flag the impact of different network routing on response times. (Latency numbers based on https://latency.bluegoat.net/).

For example, for customers in New Zealand, we are increasing the response times by ~300ms.

Current request path - total ~302ms

  • ap-southeast-6 → us-east-2 (FilOne Console): ~212ms
  • us-east-2 → eu-west-1 (Aurora S3): ~90ms

New request path - total ~508ms

  • ap-southeast-6 -> us-east-2: ~212ms
  • ap-southeast-6 -> eu-west-1: ~296ms

Collaborator Author


Interesting numbers... will add these to the ADR.

In my experience close to us-west-2, it feels faster. 🤷

Member


Yeah, on second thought, I think my comment is missing the perspective. We are primarily targeting customers in Europe now, and they should see a small improvement in response times.

Current request path - total 181ms

  • eu-central-1 (Germany) → us-east-2: 102ms
  • us-east-2 → eu-west-2: 79ms

New request path - total 122ms

  • eu-central-1 → us-east-2: 102ms
  • eu-central-1 → eu-west-2: 20ms

I think improving the response times for European customers by ~60ms (30%) is meaningful.

| Operation | Handling |
| --- | --- |
| CreateBucket | Aurora Portal API mutation |
| DeleteBucket | Aurora Portal API; must verify bucket is empty server-side |

ListBuckets could switch from the Portal API to the S3 `ListBuckets` command (making it presignable), since the handler currently only uses `name`, `createdAt`, `region` (hardcoded), and `isPublic` (hardcoded false). However, the Portal API returns richer metadata that will matter as the UI matures. This can be revisited independently.
Member


💯

I was thinking about the same! 😄

Collaborator Author


LOL yeah, I thought you'd be aligned on this one. We should rethink it if it complicates other providers.


**Route:** `POST /api/presign`

**Middleware:** Auth (JWT cookie) + subscription guard. No CSRF — presigned URLs are themselves the authorization token. The handler inspects the batch to determine access level: if any operation is `putObject` or `deleteObject`, Write access is required; otherwise Read.
Member


I was confused about why we need to determine Write vs Read access. After inspecting the source code, I realised this is used to support write-lock for accounts past the trial period.

Can you please enhance the ADR and explain why Write/Read access is important.

Collaborator Author


Yes, good call. I left that out; will add it to the ADR.

Technically the API itself only needs Read access... but better to enforce the distinction where we can, in case of future changes to how we utilize these URLs and/or the API.
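For the ADR, the access-level rule described above boils down to a few lines. The `WRITE_OPS` membership (`putObject`, `deleteObject`) comes from the PR; the function shape is a sketch:

```typescript
// A batch requires Write access iff it contains any write operation;
// otherwise Read suffices (listing, head, download, retention lookups).
const WRITE_OPS = new Set(['putObject', 'deleteObject']);
type AccessLevel = 'Read' | 'Write';

function requiredAccessLevel(ops: { op: string }[]): AccessLevel {
  return ops.some((o) => WRITE_OPS.has(o.op)) ? 'Write' : 'Read';
}
```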


### Aurora CORS Header Exposure

The Aurora S3 endpoint must expose `x-fil-cid` and `x-amz-meta-*` headers via `Access-Control-Expose-Headers` for HeadObject to work from the browser. Without this, the Filecoin CID and custom metadata are invisible to JavaScript. File upload (PUT) already works, confirming CORS is partially configured. GET, HEAD, and DELETE methods and the specific exposed headers must be verified before deploying the frontend changes. The presign handler can ship independently; only the frontend switch depends on CORS.
Member


@joemocode-business did you verify that Aurora S3 has the correct CORS configuration? If yes, then please update this section accordingly.

Collaborator Author


It does indeed! I missed updating this. Thanks. Will update.
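For reference, an S3-style CORS rule covering what this section requires might look like the following. This is a sketch only: the origin is a placeholder, the exact `x-amz-meta-*` names depend on what the UI reads, and whether Aurora accepts this exact rule shape (or wildcards in `ExposeHeaders`, which standard S3 does not) needs to be checked against its configuration surface.

```json
[
  {
    "AllowedOrigins": ["https://console.example.com"],
    "AllowedMethods": ["GET", "HEAD", "PUT", "DELETE"],
    "AllowedHeaders": ["*"],
    "ExposeHeaders": ["ETag", "x-fil-cid", "x-amz-meta-owner"]
  }
]
```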

Member


Please add tests for this new file.

Collaborator Author


Great callout. My bad!


Comment on lines -293 to -294
// Inject fil-include-meta=1 query parameter so Aurora returns
// X-Fil-Cid and X-Fil-Offload-Status headers in the response.
Member


Can you please preserve this code comment and move it to the relevant place in the new version.

Collaborator Author


yes

Comment thread packages/website/src/lib/aurora-s3.ts Outdated

const errorNode = doc.querySelector('parsererror');
if (errorNode) {
throw new Error('Failed to parse S3 ListObjects response');
Member


Can we include information about the parsing error in the error message, or at least log it to console?

Collaborator Author


yes, good call.

Comment thread packages/website/src/lib/aurora-s3.ts Outdated
Comment on lines +90 to +91
const errorNode = doc.querySelector('parsererror');
if (errorNode) return null;
Member


Is it okay to silently swallow the parsing error?

IMO, we should at least log the error to console.

Collaborator Author


yeah let's log it

