Feat: S3 calls move to frontend. Refactor backend to one presign handler #198
joemocode-business wants to merge 8 commits into main from
Conversation
Reopening after testing more thoroughly.
```ts
// The subscription guard middleware uses Read access level so that listing
// and viewing objects still works during a grace period. If the batch
// contains write ops (putObject, deleteObject), block during grace period.
if (ops.some((op) => WRITE_OPS.has(op.op))) {
  const billingResult = await dynamo.send(
    new GetItemCommand({
      TableName: Resource.BillingTable.name,
      Key: { pk: { S: `CUSTOMER#${userId}` }, sk: { S: 'SUBSCRIPTION' } },
    }),
  );
  const status = billingResult.Item?.subscriptionStatus?.S;
  if (status === SubscriptionStatus.GracePeriod || status === SubscriptionStatus.PastDue) {
    return new ResponseBuilder()
      .status(403)
      .body<ErrorResponse>({
        message:
          'Your account is in a grace period. Read-only access is available. Please reactivate your subscription to make changes.',
        code: ApiErrorCode.GRACE_PERIOD_WRITE_BLOCKED,
      })
      .build();
  }
```
Open to other suggestions rather than replicating this logic. The core issue is that we only need READ access for most of these operations, but some of them are write operations. Technically, this endpoint does no writes, so we could vend presigned URLs using our Key/Secret credentials for that tenant and let Aurora handle any unauthorized use, but this breaks down if we add additional access control (like user roles). I figured this provided a better experience but I can see how maintaining this code can be iffy.
I don't have a strong opinion, TBH 🤷🏻
Pull request overview
This PR consolidates Aurora S3 signing into a single backend presign endpoint and shifts most S3 operations (list/head/retention/delete/download/upload execution) to the frontend via presigned URLs, reducing Lambda/SSM overhead and improving perceived latency.
Changes:
- Replace multiple per-operation S3 Lambdas with one `POST /api/presign` handler that batch-generates presigned URLs.
- Update the website to call `batchPresign()` and execute presigned S3 requests directly, including new XML/header parsing utilities.
- Extend bucket metadata to expose `objectLockEnabled` for conditional retention fetching.
Reviewed changes
Copilot reviewed 25 out of 26 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| sst.config.ts | Removes per-object S3 routes and adds the single /api/presign route with provisioned concurrency. |
| sst-env.d.ts | Updates generated resource typings to remove old S3 handlers and add Presign. |
| packages/website/src/pages/ObjectDetailPage.tsx | Switches object metadata/retention fetching to presigned S3 calls. |
| packages/website/src/pages/BucketDetailPage.tsx | Switches object listing to presigned S3 ListObjects + XML parsing. |
| packages/website/src/lib/use-presign.ts | Adds batchPresign() helper calling /presign. |
| packages/website/src/lib/use-object-actions.ts | Updates delete/download/copy-url actions to use presigned S3 operations. |
| packages/website/src/lib/use-file-upload.ts | Updates upload flow to use putObject presign and direct PUT to S3. |
| packages/website/src/lib/aurora-s3.ts | Adds browser-side XML/header/error parsing + presigned URL execution helper. |
| packages/shared/src/index.ts | Exports new presign API types/schemas. |
| packages/shared/src/api/presign.ts | Adds Zod schemas + types for batch presign request/response. |
| packages/shared/src/api/buckets.ts | Adds objectLockEnabled? to the Bucket shape. |
| packages/backend/src/lib/aurora-s3-client.ts | Refactors S3 client creation and adds presigned URL generators for multiple ops. |
| packages/backend/src/handlers/presign.ts | Introduces consolidated presign handler with batch ops + grace-period write blocking. |
| packages/backend/src/handlers/presign-upload.ts | Removed (superseded by presign). |
| packages/backend/src/handlers/presign-upload.test.ts | Removed (superseded by presign tests, not yet added). |
| packages/backend/src/handlers/list-objects.ts | Removed (now presigned + executed in browser). |
| packages/backend/src/handlers/list-objects.test.ts | Removed. |
| packages/backend/src/handlers/head-object.ts | Removed (now presigned + executed in browser). |
| packages/backend/src/handlers/head-object.test.ts | Removed. |
| packages/backend/src/handlers/download-object.ts | Removed (now presigned + executed in browser). |
| packages/backend/src/handlers/download-object.test.ts | Removed. |
| packages/backend/src/handlers/delete-object.ts | Removed (now presigned + executed in browser). |
| packages/backend/src/handlers/delete-object.test.ts | Removed. |
| packages/backend/src/handlers/get-bucket.ts | Adds objectLockEnabled mapping from Aurora response. |
| packages/backend/src/handlers/get-bucket.test.ts | Adds assertions for objectLockEnabled true/false. |
| docs/architectural-decisions/2026-04-presigned-url-s3-operations.md | Adds ADR documenting the new presigned URL architecture and tradeoffs. |
```ts
if (ops.some((op) => WRITE_OPS.has(op.op))) {
  const billingResult = await dynamo.send(
    new GetItemCommand({
      TableName: Resource.BillingTable.name,
      Key: { pk: { S: `CUSTOMER#${userId}` }, sk: { S: 'SUBSCRIPTION' } },
    }),
  );
  const status = billingResult.Item?.subscriptionStatus?.S;
  if (status === SubscriptionStatus.GracePeriod || status === SubscriptionStatus.PastDue) {
```
The write-op grace-period check does a second GetItem right after subscriptionGuardMiddleware(AccessLevel.Read) runs. Because DynamoDB reads are eventually consistent by default, this check can see a stale subscriptionStatus (e.g., still trialing after the middleware lazily transitioned it to grace_period) and incorrectly allow write presigns. Consider using ConsistentRead: true for this billing record lookup (or otherwise reusing the middleware’s computed status) to avoid any access-control bypass due to stale reads.
Highly unlikely to be a problem. But I like the idea of using the middy computed status. Almost went that approach initially. Will do.
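If the second lookup does stay, a strongly consistent read closes the stale-status window. A minimal sketch of the parameter shape — the table name and the `billingLookupParams` helper are illustrative, not the real handler code:

```typescript
// Hypothetical helper: build GetItem parameters for the billing lookup with
// ConsistentRead enabled, so the check cannot observe a stale subscriptionStatus.
interface GetItemParams {
  TableName: string;
  Key: Record<string, { S: string }>;
  ConsistentRead?: boolean;
}

function billingLookupParams(tableName: string, userId: string): GetItemParams {
  return {
    TableName: tableName,
    Key: { pk: { S: `CUSTOMER#${userId}` }, sk: { S: 'SUBSCRIPTION' } },
    // Strongly consistent reads cost double the read capacity but remove the
    // eventual-consistency window flagged in the review comment above.
    ConsistentRead: true,
  };
}
```

Reusing the middleware's computed status avoids the extra read entirely; this is only the fallback if the second `GetItem` is kept.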
```ts
export async function baseHandler(
  event: AuthenticatedEvent,
): Promise<APIGatewayProxyStructuredResultV2> {
  let body: unknown;
  try {
    body = JSON.parse(event.body ?? '[]');
  } catch {
    return new ResponseBuilder()
      .status(400)
      .body<ErrorResponse>({ message: 'Invalid JSON body' })
      .build();
  }

  const parsed = PresignRequestSchema.safeParse(body);
  if (!parsed.success) {
    return new ResponseBuilder()
      .status(400)
      .body<ErrorResponse>({ message: parsed.error.issues[0].message })
      .build();
  }

  const ops = parsed.data;
  const { orgId, userId } = getUserInfo(event);

  // The subscription guard middleware uses Read access level so that listing
  // and viewing objects still works during a grace period. If the batch
  // contains write ops (putObject, deleteObject), block during grace period.
  if (ops.some((op) => WRITE_OPS.has(op.op))) {
    const billingResult = await dynamo.send(
      new GetItemCommand({
        TableName: Resource.BillingTable.name,
        Key: { pk: { S: `CUSTOMER#${userId}` }, sk: { S: 'SUBSCRIPTION' } },
      }),
    );
    const status = billingResult.Item?.subscriptionStatus?.S;
    if (status === SubscriptionStatus.GracePeriod || status === SubscriptionStatus.PastDue) {
      return new ResponseBuilder()
        .status(403)
        .body<ErrorResponse>({
          message:
            'Your account is in a grace period. Read-only access is available. Please reactivate your subscription to make changes.',
          code: ApiErrorCode.GRACE_PERIOD_WRITE_BLOCKED,
        })
        .build();
    }
  }
```
This PR removes several handler tests but doesn’t add coverage for the new consolidated presign handler. Since this endpoint now gates all object read/write presigning and has non-trivial branching (schema validation, org setup checks, grace-period write blocking, per-op URL generation), please add unit tests for at least: invalid JSON / schema errors, read-only batch success, mixed read+write batch blocked during grace period/past due, and ensuring item order matches request order.
Good catch. Let's re-add tests.
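One easily testable unit is the write-op detection that drives grace-period blocking. A hedged sketch — the `PresignOp` shape and the exact `WRITE_OPS` contents are assumptions based on the snippet above:

```typescript
// Assumed shape of a batch entry; the real schema lives in
// packages/shared/src/api/presign.ts.
interface PresignOp {
  op: string;
}

const WRITE_OPS = new Set(['putObject', 'deleteObject']);

// True when any op in the batch needs Write access, i.e. the handler must
// consult the billing record before vending presigned URLs.
function requiresWriteAccess(ops: PresignOp[]): boolean {
  return ops.some((op) => WRITE_OPS.has(op.op));
}
```

Tests would then cover: read-only batches succeed, any batch containing a write op is blocked during grace period/past due, and response items preserve request order.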
```ts
export function parseListObjectsResponse(xml: string): ListObjectsResponse {
  const doc = new DOMParser().parseFromString(xml, 'application/xml');

  const errorNode = doc.querySelector('parsererror');
  if (errorNode) {
    throw new Error('Failed to parse S3 ListObjects response');
  }

  const contents = doc.getElementsByTagName('Contents');
  const objects: S3Object[] = [];

  for (let i = 0; i < contents.length; i++) {
    const el = contents[i];
    const key = getText(el, 'Key');
    if (!key) continue;

    objects.push({
      key,
      sizeBytes: parseInt(getText(el, 'Size') ?? '0', 10),
      lastModified: getText(el, 'LastModified') ?? new Date().toISOString(),
      ...(getText(el, 'ETag') && { etag: getText(el, 'ETag') }),
    });
  }

  const nextToken = getText(doc.documentElement, 'NextContinuationToken');
  const isTruncated = getText(doc.documentElement, 'IsTruncated') === 'true';

  return { objects, nextToken, isTruncated };
}
```
New XML/header parsing + S3 error parsing is introduced here but there are no unit tests alongside the existing website lib tests (e.g., time.test.ts, hubspot.test.ts). Please add tests covering representative ListObjectsV2 XML (including empty buckets and pagination), HeadObject headers -> metadata mapping (including x-amz-meta-* and x-fil-cid), and error XML parsing so regressions don’t silently break object browsing.
Adding unit tests to this is another good call.
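For the HeadObject case, the header-to-metadata mapping is pure and easy to pin down. A sketch of such a test target — the `ObjectMetadata` shape and the `metadataFromHeaders` name are illustrative, not the actual `aurora-s3.ts` API:

```typescript
// Illustrative output shape: the Filecoin CID plus user metadata extracted
// from x-amz-meta-* headers.
interface ObjectMetadata {
  cid?: string;
  userMetadata: Record<string, string>;
}

function metadataFromHeaders(headers: Record<string, string>): ObjectMetadata {
  const userMetadata: Record<string, string> = {};
  let cid: string | undefined;
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (lower === 'x-fil-cid') {
      cid = value;
    } else if (lower.startsWith('x-amz-meta-')) {
      // Strip the prefix so the UI sees the user-supplied metadata key.
      userMetadata[lower.slice('x-amz-meta-'.length)] = value;
    }
  }
  return { cid, userMetadata };
}
```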
```ts
xhr.onerror = () => reject(new Error('Upload failed'));
xhr.open('PUT', presignedUrl);
xhr.setRequestHeader('Content-Type', contentType);
```
xhr.open('PUT', presignedUrl) hard-codes the method even though the presign response includes method. To make this more robust (and to fail fast if the backend ever changes/misconfigures the op), consider using the returned method (or at least asserting it’s PUT) rather than assuming.
Unlikely to be an issue. Let's go ahead and do this anyways.
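A minimal sketch of the fail-fast check — the `PresignItem` shape mirrors what the presign response presumably returns:

```typescript
// Assumed item shape from the batch presign response.
interface PresignItem {
  url: string;
  method: string;
}

// Assert the backend vended the method the upload path expects, instead of
// silently assuming PUT.
function assertUploadMethod(item: PresignItem): string {
  if (item.method !== 'PUT') {
    throw new Error(`Expected PUT presign for upload, got ${item.method}`);
  }
  return item.method;
}

// Usage at the call site:
//   xhr.open(assertUploadMethod(item), item.url);
```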
```ts
const cachedBucket = queryClient.getQueryData<GetBucketResponse>(
  queryKeys.bucket(bucketName),
);
const hasObjectLock = cachedBucket?.bucket.objectLockEnabled ?? false;
```
hasObjectLock is derived only from queryClient.getQueryData(queryKeys.bucket(bucketName)). If a user lands on this page via a direct link (or cache was GC’d), cachedBucket will be undefined and you’ll never request retention info even when the bucket has Object Lock enabled. Consider ensuring the bucket metadata is fetched here (or always including getObjectRetention and handling the expected error) so retention data is correct regardless of navigation path.
Will look into this again.
```ts
if (hasObjectLock && items[1]) {
  const retentionResult = await fetch(items[1].url, { method: items[1].method });
  if (retentionResult.ok) {
    const xml = await retentionResult.text();
    retention = parseGetObjectRetentionResponse(xml) ?? undefined;
  }
}
```
The getObjectRetention call uses a raw fetch() and silently ignores all non-2xx responses. That will hide real failures (e.g., expired presign URL, access denied, network errors) and can make the UI show “no retention” when the request actually failed. Consider reusing executePresignedUrl() (and/or parsing the S3 error body) while explicitly allowing only the expected “no retention configured” error cases to be treated as absence of retention.
Can this block use executePresignedUrl instead of fetch?
If not, then I think we need to handle the case where the error response XML body indicates an error, don't we?
I tested the error paths and we get 400, and the UI renders appropriately with and without Object Lock. That said, I'll look into this suggestion.
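One way to avoid swallowing real failures is an explicit allow-list of "no retention configured" outcomes. A hedged sketch — the status/error-code pairs are assumptions based on the 400 observed in testing, not confirmed Aurora behavior:

```typescript
// Error codes treated as "object simply has no retention configured".
// These are assumptions to verify against Aurora's actual responses.
const NO_RETENTION_CODES = new Set([
  'NoSuchObjectLockConfiguration',
  'InvalidRequest', // the 400 observed in manual testing (assumption)
]);

function isExpectedNoRetention(status: number, errorCode?: string): boolean {
  if (status !== 400 && status !== 404) return false;
  // If the body yielded no parseable error code, still treat the expected
  // statuses as absence of retention rather than surfacing noise to the UI.
  return errorCode === undefined || NO_RETENTION_CODES.has(errorCode);
}
```

Anything outside this set (403, 5xx, an expired URL) would then flow through `executePresignedUrl`'s normal error path instead of rendering as "no retention".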
bajtos left a comment
Great stuff!
I love the high-level direction, assuming we are okay with increased latency for some geographies.
See my comments below for things to improve.
### Current Request Path (Proxied)

```
Browser -> CloudFront -> API Gateway -> Lambda -> Aurora S3 -> Lambda -> API Gateway -> CloudFront -> Browser
```

### Desired Request Path (Presigned)

```
Browser -> Lambda (presign, ~50ms) -> Browser -> Aurora S3 (direct)
```
I am fine with this decision, but want to flag the impact of different network routing on response times. (Latency numbers based on https://latency.bluegoat.net/).
For example, for customers in New Zealand, we are increasing the response times by ~300ms.
Current request path - total ~302ms
- ap-southeast-6 → us-east-2 (FilOne Console): ~212ms
- us-east-2 → eu-west-1 (Aurora S3): ~90ms
New request path - total ~508ms
- ap-southeast-6 -> us-east-2: ~212ms
- ap-southeast-6 -> eu-west-1: ~296ms
Interesting numbers.... Will add this to the ADR.
In my experience close to us-west-2, it feels faster. 🤷
Yeah, on second thought, I think my comment is missing the perspective. We are primarily targeting customers in Europe now, and they should see a small improvement in response times.
Current request path - total 181ms
- eu-central-1 (Germany) → us-east-2: 102ms
- us-east-2 → eu-west-2: 79ms
New request path - total 122ms
- eu-central-1 → us-east-2: 102ms
- eu-central-1 → eu-west-2: 20ms
I think improving the response times for European customers by ~60ms (30%) is meaningful.
| CreateBucket | Aurora Portal API mutation |
| DeleteBucket | Aurora Portal API; must verify bucket is empty server-side |

ListBuckets could switch from the Portal API to the S3 `ListBuckets` command (making it presignable), since the handler currently only uses `name`, `createdAt`, `region` (hardcoded), and `isPublic` (hardcoded false). However, the Portal API returns richer metadata that will matter as the UI matures. This can be revisited independently.
💯
I was thinking about the same! 😄
LOL yeah I thought you'd be aligned on this one. We should rethink this if it complicates other providers.
**Route:** `POST /api/presign`

**Middleware:** Auth (JWT cookie) + subscription guard. No CSRF — presigned URLs are themselves the authorization token. The handler inspects the batch to determine access level: if any operation is `putObject` or `deleteObject`, Write access is required; otherwise Read.
I was confused about why we need to determine Write vs Read access. After inspecting the source code, I realised this is used to support write-lock for accounts past the trial period.
Can you please enhance the ADR and explain why Write/Read access is important.
Yes, good call. I left that out. Will add this to the ADR.
TECHNICALLY the API is read access... but better to enforce it where we can, in case of future changes to how we utilize these URLs and/or the API.
### Aurora CORS Header Exposure

The Aurora S3 endpoint must expose `x-fil-cid` and `x-amz-meta-*` headers via `Access-Control-Expose-Headers` for HeadObject to work from the browser. Without this, the Filecoin CID and custom metadata are invisible to JavaScript. File upload (PUT) already works, confirming CORS is partially configured. GET, HEAD, and DELETE methods and the specific exposed headers must be verified before deploying the frontend changes. The presign handler can ship independently; only the frontend switch depends on CORS.
@joemocode-business did you verify that Aurora S3 has the correct CORS configuration? If yes, then please update this section accordingly.
It does indeed! I missed updating this. Thanks. Will update.
Please add tests for this new file.
Great callout. My bad!
```ts
// Inject fil-include-meta=1 query parameter so Aurora returns
// X-Fil-Cid and X-Fil-Offload-Status headers in the response.
```
Can you please preserve this code comment and move it to the relevant place in the new version.
```ts
const errorNode = doc.querySelector('parsererror');
if (errorNode) {
  throw new Error('Failed to parse S3 ListObjects response');
```
Can we include information about the parsing error in the error message, or at least log it to console?
yes, good call.
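A sketch of what including the detail could look like — `parsererror` text is browser-specific and verbose, so we truncate it (the helper name is illustrative):

```typescript
// Build (and log) a descriptive parse error from DOMParser's <parsererror>
// text content, truncated since browsers embed verbose markup there.
function listObjectsParseError(parserErrorText: string | null): Error {
  const detail = parserErrorText?.trim().slice(0, 200) || 'unknown parser error';
  console.error('S3 ListObjects XML parse failure:', detail);
  return new Error(`Failed to parse S3 ListObjects response: ${detail}`);
}

// Usage at the call site:
//   if (errorNode) throw listObjectsParseError(errorNode.textContent);
```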
```ts
const errorNode = doc.querySelector('parsererror');
if (errorNode) return null;
```
Is it okay to silently swallow the parsing error?
IMO, we should at least log the error to console.
yeah let's log it
https://linear.app/filecoin-foundation/issue/FIL-168/move-aurora-s3-calls-to-frontend
I decided to move to a single endpoint to eliminate the need for many distributed Lambdas that each pay the heavy SSM tax of fetching our stored credential, for operations that are directly supported by Aurora's gateway.
This way we can keep a single Lambda warm that performs all the signing operations. The perceived latency is much better after this change. Also fixes this one by happenstance: https://linear.app/filecoin-foundation/issue/FIL-99/duplicate-upload-incorrectly-creates-two-rows
This of course does not include the bucket create operations nor accessKey creation, which need the backoffice API.
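For context on the "SSM tax": the usual mitigation is to cache the fetched credential at module scope so only the cold start pays for it. An illustrative sketch with a stand-in loader — `makeCredentialCache` is not the actual backend code:

```typescript
type CredentialLoader = () => Promise<string>;

// Cache the in-flight/resolved credential promise at module scope; warm
// invocations of the single presign Lambda reuse it instead of re-hitting SSM.
function makeCredentialCache(load: CredentialLoader): () => Promise<string> {
  let cached: Promise<string> | undefined;
  return () => {
    // First invocation triggers the load; subsequent ones reuse the promise.
    cached ??= load();
    return cached;
  };
}
```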
Manual testing: