Skip to content

[REVIEW BUNDLE] S3 image storage + storage management (will be split into 4 PRs)#157

Closed
OhYee wants to merge 14 commits into
ZimengXiong:mainfrom
OhYee:review/s3-image-storage-bundle
Closed

[REVIEW BUNDLE] S3 image storage + storage management (will be split into 4 PRs)#157
OhYee wants to merge 14 commits into
ZimengXiong:mainfrom
OhYee:review/s3-image-storage-bundle

Conversation

@OhYee
Copy link
Copy Markdown
Contributor

@OhYee OhYee commented Apr 28, 2026

⚠️ Draft / Review-only:这个 PR 把后续计划拆出去的 4 个 PR 合在一起,方便先做整体确认。确认通过后,会按下方"拆分计划"重新作为 4 个独立 PR 提交,这个 bundle PR 不会直接合

为什么先开 bundle PR

原工作分支 copilot/auto-upload-images-to-s3 是一次 squash 提交,混杂了 S3 上传、后端直传重构、存储管理三块功能 + 多处安全/正确性修复,无法单独 review。已经按功能边界整理为 4 组:

序号 计划独立 PR 自身 commits 依赖
1 fix/sqlite-wal-busy-timeout 1
3 feat/s3-image-upload 4
4 refactor/backend-direct-upload 4 PR3
5 feat/storage-management 5 PR4

说明:原计划的 PR2 (Excalidraw 0.18 paste fix) 单独无法编译(依赖 PR3 的前端基础设施),且会被 PR4 的"后端直传"机制完全覆盖(save 时后端统一处理 paste/拖拽/导入的 base64),因此并入 PR3+PR4 的内部链路,不再单独成 PR。

每个 PR 内部按"1 个核心 feat/refactor + N 个独立 fix commit"组织,方便审阅人逐项核对。


PR1 — fix/sqlite-wal-busy-timeout(1 commit)

改动

  • PRAGMA journal_mode = WAL + PRAGMA busy_timeout = 5000
  • configureSqlite() 可 await,监听端口前先 await
  • 跳过非 SQLite provider;真错误改为 console.warn

必要性:协作场景下 SQLite "database is locked" 频出;原 fire-and-forget 实现让进程启动头 ~10ms 内的查询用不到 WAL 设置;原 catch {} 把权限/损坏类的真错误也吞了。

风险:极低,纯运行时配置。


PR3 — feat/s3-image-upload(4 commits = 1 feat + 3 fix)

核心功能(合并后单一 feat commit):

  • 新增 S3File 表 + backend/src/s3.ts(支持 AWS / MinIO / OSS / R2)
  • POST /files/upload-url 预签名 URL;GET /files/:fileId 私有桶 302 重定向
  • 前端 Editor.tsx addFiles monkey-patch + Excalidraw 0.18 onChange 检测
  • 配置项全部可选;S3 启用后移除 base64 静默降级
  • key 格式 {prefix}/{userId}/{drawingId}/{fileId}.{ext} + immutable Cache-Control

审查反馈修复(保留为独立 commit)

commit 修复 严重度
fix(s3): validate fileId before using as S3 key segment 路径遍历:files: { "../../etc/passwd": ... } 越权写入桶。两层防御(sanitize + processFilesForS3)。 Critical
fix(files): authorise GET /files/:fileId via drawing access, not by guessability 原代码假设 fileId 不可猜,但 fileId 是图片 SHA-1 ── 任何持有原图的人都能算出。改为通过 drawing 访问权限授权。 High
fix(s3): forward S3_FORCE_PATH_STYLE config to initS3() env var 解析了但没传给 S3 客户端,MinIO/OSS 用户配了等于白配。 Critical

必要性:原 base64 in SQLite 在多图协作场景下 DB 体积爆炸、同步慢;私有桶模式让对外暴露受限的部署也能用。三个 fix 不修属于安全/可用性硬伤。

兼容性:纯加法变更(新表、新可选 env、新路由),未配置 S3 时行为不变。


PR4 — refactor/backend-direct-upload(4 commits = 1 refactor + 3 fix,stack 在 PR3 上)

核心重构(合并后单一 refactor commit):

  • 新增 backend/src/fileProcessing.tsprocessFilesForS3 工具
  • s3.ts 增加 uploadBuffer / listS3Objects
  • POST /drawingsPUT /drawings/:id、两个 /import/* 路由保存前调用
  • 删除 POST /files/upload-url + 前端所有 S3 上传逻辑
  • 粘贴/拖拽/导入的 base64 由后端在保存时统一上传

审查反馈修复

commit 修复 严重度
fix(s3): validate fileId before using as S3 key segment processFilesForS3 内的纵深防御(与 PR3 sanitize 层对齐)。 Critical
fix(drawings): version pre-check, non-owner gate, DELETE-time S3 cleanup 三连修:①PUT 在上传前先做 version 预检(避免 409 后 S3 永久孤儿),②非 owner share-link editor 不可加新 fileId(防止用 50MB×N 塞满 owner 的桶),③DELETE drawing 时清理 S3 对象 + S3File 行(否则永久泄漏)。 High×3
fix(import): pre-resolve drawing id, bound parallel S3 uploads 导入冲突时原代码先用 prepared.id 上传、事务里发现冲突用 uuidv4() 写库 ── 上传 key 和 DB id 对不上的永久孤儿;现在事先 peek 决定 finalId。同时把"5000 个 drawing 串行上传"改成 8 并发(防反代超时)。 Critical

必要性:客户端持有预签名 URL 的实现复杂、安全面广;后端代理上传简化鉴权 + 统一 audit + 自然覆盖所有 base64 来源(包括 Excalidraw 0.18 内部粘贴流程)。导入冲突的孤儿对象 = 用户看到导入"成功"但下次打开图片 404 的数据丢失场景。

兼容性

  • ⚠️ POST /files/upload-url 移除是 API breaking。但后端 processFilesForS3 仍然接受 base64 输入,旧前端缓存的 save 路径仍可用 ── 仅粘贴时多一个 console error。
  • 数据格式向后兼容:已存 base64 不需迁移,下次 save 时懒升级。

PR5 — feat/storage-management(5 commits = 1 feat + 3 fix + 1 test,stack 在 PR4 上)

核心功能(合并后单一 feat commit):

  • POST /drawings/:id/trim(删 isDeleted 元素 + 孤儿 file + 对应 S3 对象)
  • GET /drawings/:id/files/diff(canvas / SQLite / S3 三方对比)
  • DELETE /drawings/:id/files/orphans(选择性删孤儿;同步剔除引用被删 fileId 的已删除元素,避免 ghost diff)
  • 前端 StorageManageModal + DrawingCard 右键菜单接入
  • 全部破坏性操作要求 confirmName 双确认

审查反馈修复 + 测试重写

commit 修复 严重度
fix(storage): invalidate cache, share-aware orphan deletion, version bump 三连修:①缓存失效(5s 内 /drawings 返回脏数据);②跨 drawing 共享 fileId 保护(同一图被多 drawing 引用,trim 不该删全局对象 ── 不修会丢生产数据);③orphan 路由也 bump version。 High×3
fix(storage): notify open editors of server-side mutations trim/orphan 后通过 socket 广播 drawing-server-update,前端自动 reload;否则协作者下次 save 把刚清理的元素又写回,整个 trim 白做。 Medium
test(storage): replace tautological tests with real route integration 原测试在内部重新实现 trim 逻辑然后 assert 自己 ── 同义反复,零价值。改用 supertest + 真 Express,覆盖 8 个真实场景。 Test debt
fix(storage): bump version on trim instead of resetting to 1 trim 把版本号写死 1 ── 当前 version > 1 的协作者要么得假冲突,要么被静默覆盖;改为 { increment: 1 }。被新测试锁住。 Critical

必要性:协作图反复粘贴大图,被删元素的 file 条目仍占空间,没工具就只能直连库;canvas/SQLite/S3 的 drift 是必然,diff 工具能让运维看见才能修。跨 drawing 保护漏掉直接丢生产数据

风险:写操作有破坏性 → 全部要求 confirmName 双确认。跨 drawing 保护用 LIKE 查 JSON 列,加引号匹配 fileId(≥16 字符随机串)几乎不可能假阳性。


验证

  • 后端 npm run build:✅
  • 后端 npx vitest run:✅ 172/172 通过
  • 前端 npm run build:✅

仅有 zod 版本不兼容引发的 2 处 tsc 警告(在 src/__tests__/drawing-history.test.ts 内),与本 PR 无关,upstream/main 上预先存在。


拆分后的提交计划

确认无误后,会按以下顺序提交独立 PR:

  1. PR1 fix/sqlite-wal-busy-timeout ── 可任何时候独立合
  2. PR3 feat/s3-image-upload ── 新功能基线
  3. PR4 refactor/backend-direct-upload ── 依赖 PR3 合并后再开
  4. PR5 feat/storage-management ── 依赖 PR4 合并后再开

OhYee and others added 14 commits April 28, 2026 16:10
Optional S3-backed image storage for Excalidraw drawings. When
S3_BUCKET is configured, image dataURLs are uploaded to S3 instead of
being base64-encoded into the drawing's files JSON in SQLite.

Why: a single uncompressed image is several MB; embedding many of
them in SQLite bloats the DB, slows collaborative sync, and makes
backups / exports unwieldy.

Module + routes (backend):
- backend/src/s3.ts: typed S3Config, presigned URL generation, supports
  AWS S3, MinIO, Cloudflare R2, Alibaba OSS via S3_ENDPOINT,
  S3_FORCE_PATH_STYLE, S3_PUBLIC_URL.
- POST /files/upload-url: presigned PUT URL for browser direct uploads.
- GET /files/config: surfaces { s3Enabled } so the frontend can decide
  whether to attempt direct uploads.
- GET /files/:fileId: 302 redirect to a presigned GET URL for private
  buckets (otherwise the dataURL stored in the drawing points at
  S3_PUBLIC_URL).
- Prisma migration adds S3File(id, userId, s3Key, mimeType, createdAt)
  so the redirect route can resolve fileId -> S3 key.

Frontend integration (Editor.tsx):
- Monkey-patch addFiles() to detect new image files, upload them in
  the background, and resolve S3 URLs into the persisted scene before
  save (saves with base64 are awaited so we never write base64 once
  S3 is configured).
- Excalidraw 0.18+ paste flow bypasses addFiles, so onChange also
  detects new base64 entries and kicks the same upload pipeline.
- compressExcalidrawFiles still runs first; S3 stores compressed bytes.

Configuration (all optional, S3 disabled when unset):
  S3_BUCKET, S3_REGION (default us-east-1), S3_ENDPOINT, S3_PUBLIC_URL,
  S3_FORCE_PATH_STYLE, S3_KEY_PREFIX (default "excalidash"),
  AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY.

Object key layout: {prefix}/{userId}/{drawingId}/{fileId}.{ext}, with
Cache-Control: public, max-age=31536000, immutable since fileIds are
content hashes.

When S3 is enabled there is no silent base64 fallback: upload failures
surface to the user instead of pretending the image was persisted.

Change-Id: Ie8fc2ad52036c3921f5eb932f2b35f1ce3d31576
Co-developed-by: Claude <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reject file ids that don't match the safe identifier shape (word chars
plus dash, 1-200 long) in two layers:

- security.ts sanitizeDrawingData drops malformed keys before persistence,
  so the drawing.files JSON cannot store entries that would later be
  uploaded under attacker-controlled paths.
- fileProcessing.processFilesForS3 also drops them defensively, since
  this utility is reachable from import paths that bypass the schema.

Without this, a request body of
  files: { "../../etc/passwd": { dataURL: "data:image/png;base64,..." } }
would write to S3 outside the per-user prefix and create a corresponding
S3File row.

Adds a regression test in fileProcessing.test.ts.

Change-Id: I4c27df54c3370977a20631a3cb6c08476d1546b9
Co-developed-by: Claude <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: OhYee <oyohyee@oyohyee.com>
…uessability

Excalidraw fileIds are SHA-1 hashes of the file bytes — anyone in
possession of the original image can compute the id, so the previous
"fileId is unguessable" comment was incorrect and any authenticated
user could fetch images uploaded by anyone else.

Now the route authorises by drawing access:
- the uploader (S3File.userId === req.user.id) can always read; OR
- the caller has access to a drawing that references this fileId, either
  one they own or one explicitly shared with them via DrawingPermission.

A single substring query on the JSON columns is enough because fileIds
are >= 16 random chars; the wrapping quotes prevent prefix collisions
(`abc1` won't match `abc12`).

Also adds a 401 fallback in case requireAuth is ever removed.

Change-Id: I4c9c0640e39d8d6a791009db289dc3a29e6658df
Co-developed-by: Claude <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: OhYee <oyohyee@oyohyee.com>
S3_FORCE_PATH_STYLE was parsed in config.ts and surfaced on the typed
S3Config but the property was never threaded through to the initS3()
call site. MinIO/Alibaba OSS users who set the env var saw no effect —
the S3 client always picked the default addressing style.

Change-Id: If16d6d8326612d4fb48501a54fea35c2b1e62d1b
Co-developed-by: Claude <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: OhYee <oyohyee@oyohyee.com>
Replaces the presigned-URL upload flow with backend-direct uploads. The
frontend now sends the entire drawing payload (base64 dataURLs intact)
to the standard create/update/import routes; the backend uploads any
new dataURLs to S3 and rewrites them to S3 URLs before persisting.

Why move it:
- Single auth boundary: avoid distributing presigned URLs that bypass
  app-level access checks.
- Coverage of all input paths: paste, drag-and-drop, import all flow
  through the same processFilesForS3 utility — no need for the
  Editor.tsx onChange detection that Excalidraw 0.18 routinely missed.
- Auditability: S3File rows are written by the backend in the same
  request that creates/updates the drawing.

Changes:
- Add backend/src/fileProcessing.ts with processFilesForS3(files,
  userId, drawingId, prisma): scans the files record for base64
  dataURLs, uploads them, upserts S3File rows, returns rewritten files
  (base64 -> public S3 URL or /api/files/:fileId redirect).
- Extend backend/src/s3.ts with uploadBuffer() and listS3Objects().
- Wire processFilesForS3 into POST /drawings, PUT /drawings/:id, and
  both /import/* routes.
- Remove POST /files/upload-url (no longer used by any client).
- Strip frontend S3 upload code: addFiles monkey-patch, the
  pendingS3UploadsRef, and the save-time await loop are all gone.

Change-Id: I4ff81d050e08d3eb7344712256785f6dc3465a61
Co-developed-by: Claude <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reject file ids that don't match the safe identifier shape (word chars
plus dash, 1-200 long) in two layers:

- security.ts sanitizeDrawingData drops malformed keys before persistence,
  so the drawing.files JSON cannot store entries that would later be
  uploaded under attacker-controlled paths.
- fileProcessing.processFilesForS3 also drops them defensively, since
  this utility is reachable from import paths that bypass the schema.

Without this, a request body of
  files: { "../../etc/passwd": { dataURL: "data:image/png;base64,..." } }
would write to S3 outside the per-user prefix and create a corresponding
S3File row.

Adds a regression test in fileProcessing.test.ts.

Change-Id: I4c27df54c3370977a20631a3cb6c08476d1546b9
Co-developed-by: Claude <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: OhYee <oyohyee@oyohyee.com>
PUT /drawings/:id (scene update path):

- Version conflict detection now happens BEFORE processFilesForS3, not
  after the updateMany. Previously, on a stale-version save the upload
  had already PUT every base64 file to S3 and we returned 409 — the
  objects became permanent orphans with no DB row. The new check uses
  a cheap findFirst before the upload; the existing version-filtered
  updateMany still catches the residual race window.
- Non-owner editors (link-share + edit, share-with-user) can no longer
  introduce new fileIds. Reading the existing files JSON gives us the
  set of allowed ids; any unknown id in the payload returns 403. Owners
  retain full control. This prevents a shared editor from filling the
  owner's S3 bucket via 50MB uploads.

DELETE /drawings/:id:

- Best-effort cleanup of S3 objects and S3File rows under the per-
  drawing prefix. Failures log and continue; the drawing row is
  already gone, so the cleanup is opportunistic. Without this, deleting
  a drawing leaked every uploaded image permanently.

Change-Id: Ia684b408c1eb0c474c04bf8c04eb339f9a497e47
Co-developed-by: Claude <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: OhYee <oyohyee@oyohyee.com>
Two import flows (excalidash backup + legacy SQLite) both pre-uploaded
files to S3 using the imported drawing id, then on a foreign-owner id
collision wrote the row under a fresh uuidv4() — leaving every uploaded
object as a permanent S3 orphan keyed under an id that no DB row knew
about.

Now both routes pre-resolve the final drawing id before uploading by
peeking at any pre-existing drawing of the same id; if it belongs to
another user, we generate the UUID up front and feed it to both the
upload and the in-transaction create. The transaction still re-checks
authoritatively, but always writes under the same finalId.

Also replaces the per-drawing serial await with bounded Promise.all
batches of 8. A 5000-drawing import previously serialised every S3 PUT
inside the request, easily hitting reverse-proxy timeouts; the bound
keeps memory and socket pressure flat without saturating the bucket.

Change-Id: I7ae77f642f097c295c71771d59ecea0d045661d3
Co-developed-by: Claude <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: OhYee <oyohyee@oyohyee.com>
Adds operator tooling for the inevitable drift between canvas refs,
the SQLite files JSON, and the S3 bucket: orphan files from
soft-deleted elements, S3 objects with no DB row, ghost file entries
left behind by partially-failed saves, etc.

Backend routes (require ownership + name confirmation):
- POST /drawings/:id/trim: drop isDeleted elements, prune unreferenced
  file entries, delete the corresponding S3 objects + S3File rows.
  Requires confirmName matching the drawing name (GitHub-style
  destructive-action guard).
- GET /drawings/:id/files/diff: three-way diff over canvas refs,
  SQLite files keys, and S3 objects under the per-drawing prefix.
  Returns one row per fileId with presence flags.
- DELETE /drawings/:id/files/orphans: selectively delete fileIds the
  caller knows are unused. Refuses fileIds still active in
  non-deleted elements; also strips deleted-element references to the
  same fileIds so the diff view stops reporting them as ghost canvas
  references.

Frontend:
- StorageManageModal renders the three-way diff with checkboxes;
  surfaces "trim history" and "delete selected orphans" actions, both
  requiring the user to type the drawing name.
- Hooked into DrawingCard right-click menu so the action is one click
  from the dashboard instead of an API call.
- API helpers added in frontend/src/api/index.ts.

Change-Id: Ic989d096938a841c4709a6b6a63c2c732e38cf81
Co-developed-by: Claude <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…bump

Three related defects in trim/orphans handlers:

- Stale /drawings cache: trim and orphan-delete mutate the underlying
  drawing but never called invalidateDrawingsCache(), so the listing
  endpoint returned pre-trim element/file payloads for up to 5s.
- Cross-drawing fileId deletion: Excalidraw fileIds are content hashes,
  so the same image can appear in multiple drawings. The previous code
  unconditionally removed the S3 object plus S3File row when trimming
  one drawing, breaking sibling drawings. Skip the S3-side delete when
  any other drawing of the same user still references the file (checked
  via a substring-on-JSON match scoped by quotes to avoid prefix
  collisions).
- Orphan-delete left version unchanged: now bumps version like trim, so
  concurrent editors reload instead of overwriting.

Change-Id: I1a83a5bab7d2354c9a1125142e94c2025f3e2f6d
Co-developed-by: Claude <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: OhYee <oyohyee@oyohyee.com>
Trim and orphan-delete mutate Drawing.elements / files directly from
the dashboard. A collaborator currently in the editor at the same
moment never received any signal, so their next save would echo back
the trimmed-away elements and undo the cleanup.

Now both routes emit drawing-server-update on the drawing's socket
room. Editor.tsx subscribes and reloads — coarse but correct, since
the cleanup intentionally drops in-progress soft-deleted state that
the editor would otherwise try to merge.

Change-Id: Id0ce5f4ea7eabdb5af3531e33556d1aaa73ae180
Co-developed-by: Claude <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: OhYee <oyohyee@oyohyee.com>
The previous storage.integration.ts re-implemented trim/diff/orphan
logic inline against Prisma and asserted on its own work — a tautology
that could not catch real bugs in the route handlers, and one assertion
("Wrong Name" !== "My Drawing") was just a JS string compare.

Now boots the actual Express app via the same supertest+CSRF pattern
used by drawings-shared.integration.ts and exercises:

- POST /drawings/:id/trim: happy path, version monotonicity (regression
  for the trim version-reset bug), confirmName mismatch (403),
  non-owner (404).
- DELETE /drawings/:id/files/orphans: happy path including soft-deleted
  element cleanup and version bump, blocked by active references (400),
  empty payload (400), confirmName mismatch (403).
- GET /drawings/:id/files/diff: canvas/sqlite presence columns.

S3-side branches stay uncovered (S3 disabled in tests) — that needs a
mock S3 client and is out of scope for this rewrite.

Change-Id: I6ffa867b9beab7337906bdbfa911824ad77e34ec
Co-developed-by: Claude <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: OhYee <oyohyee@oyohyee.com>
The trim handler wrote `version: 1` literally on completion. Any open
editor with version > 1 would either get a false VERSION_CONFLICT on
their next save, or — worse — silently have their newer version
overwritten by a later updateMany whose version filter happened to
match the reset value. Use { increment: 1 } to match orphan-delete and
the rest of the codebase, so concurrent editors hit a real conflict
and reload.

Caught by the storage integration tests, which now lock in the
post-trim version is the pre-trim version + 1.

Change-Id: I3c76e5838bc2eadd9960b550ff27a38c87b7173a
Co-developed-by: Claude <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: OhYee <oyohyee@oyohyee.com>
Improves SQLite concurrency and reliability under collaborative load.

- PRAGMA journal_mode = WAL: readers no longer block writers,
  dramatically reducing "database is locked" failures during concurrent
  saves.
- PRAGMA busy_timeout = 5000: writers wait up to 5s on contention
  instead of failing immediately.
- The PRAGMAs are now awaited inside httpServer.listen() before the
  server accepts requests; the previous fire-and-forget meant queries
  issued in the first ~10ms of process life could still hit the default
  rollback journal and timeout behaviour.
- Skip the PRAGMA on non-SQLite providers (file: URLs only) and warn on
  real failures rather than silently swallowing every error — a
  misconfigured database used to disappear without a trace.

Change-Id: I6ac5777aaa70ed07b85d3c062bdaf013afc169cf
Co-developed-by: Claude <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@OhYee OhYee closed this Apr 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant