Skip to content

fix: derive skill publish metadata from storage#2470

Merged
Patrick-Erichsen merged 1 commit into
mainfrom
jesse/fix-skill-publish-metadata
Jun 17, 2026
Merged

fix: derive skill publish metadata from storage#2470
Patrick-Erichsen merged 1 commit into
mainfrom
jesse/fix-skill-publish-metadata

Conversation

@jesse-merhi

Copy link
Copy Markdown
Member

Summary

  • Derive skill publish file size, content type, and sha256 from stored blobs inside publishVersionForUser.
  • Keep existing publish inputs working while enforcing file and total size limits against the stored bytes.
  • Add regression coverage for caller-supplied metadata that does not match storage.

Tests

bun run test -- convex/lib/skillPublish.test.ts --reporter=dot
bun run test -- convex/httpApiV1.handlers.test.ts --testNamePattern "publish json|publish multipart|Package publish|multipart package publish|multipart ClawPack|staged ClawPack" --reporter=dot
bun run test -- convex/httpApi.handlers.test.ts --testNamePattern "cliPublishHttp" --reporter=dot
bun run format:check
bun run lint
bunx tsc --noEmit --pretty false
bunx tsc -p packages/schema/tsconfig.json --noEmit --pretty false
bunx tsc -p packages/clawhub/tsconfig.json --noEmit --pretty false

Copilot AI review requested due to automatic review settings June 2, 2026 07:39
@jesse-merhi jesse-merhi requested review from a team and Patrick-Erichsen as code owners June 2, 2026 07:39
@vercel

vercel Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
clawhub Ready Ready Preview, Comment Jun 2, 2026 7:39am

@clawsweeper

clawsweeper Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs changes before merge. Reviewed June 17, 2026, 2:12 AM ET / 06:12 UTC.

Summary
The PR changes skill publishing to derive stored file size, content type, and sha256 from Convex storage blobs and adds regression tests for caller-supplied metadata mismatches.

Reproducibility: yes. source-reproducible: current main checks skill publish limits with caller-provided file.size, and the PR head downloads blobs before trusted stored-size rejection. I did not execute a live Convex publish in this read-only review.

Review metrics: 1 noteworthy metric.

  • Changed backend surface: 2 files changed, +123/-14. Both changed files sit in the skill publish validation path where storage-read ordering affects abuse resistance.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🌊 off-meta tidepool
Patch quality: 🧂 unranked krab
Result: blocked by patch quality or review findings.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Move trusted stored metadata checks ahead of blob downloads.
  • Run a real Convex validation path such as bunx convex dev --once plus the focused publish tests.

Risk before merge

  • [P1] The PR can still spend backend storage-read work before rejecting over-limit stored content, so the intended publish-path availability hardening is incomplete.
  • [P1] The PR body lists unit/static/type checks but no real Convex runtime validation path for this storage-dependent publish behavior.

Maintainer options:

  1. Read trusted storage metadata first (recommended)
    Add an internal Convex metadata lookup, reject per-file and running-total violations before ctx.storage.get, then fetch only accepted blobs for hashing and text processing.
  2. Accept read-before-reject cost
    Maintainers could explicitly decide that upstream upload limits are enough and accept the backend read cost before rejection.
Copy recommended automerge instruction
@clawsweeper automerge

Special instructions:
Refactor skill publish metadata derivation to query trusted Convex _storage metadata through an internal Convex function before any ctx.storage.get, reject non-text/per-file/total stored metadata early, then download only accepted blobs for hashing and text processing. Add regression coverage proving over-limit and non-text stored uploads reject before blob fetches, and run a real Convex validation path such as bunx convex dev --once plus the focused publish tests.

Next step before merge

  • [P2] A focused automated repair is safe because the blocker is a narrow storage-metadata validation-order change in the publish path.

Security
Cleared: No dependency, workflow, secret, or supply-chain regression was found; the concrete blocker is availability from storage-read ordering.

Review findings

  • [P1] Enforce limits before reading blobs — convex/lib/skillPublish.ts:519
Review details

Best possible solution:

Reject trusted _storage metadata violations before blob downloads, hash only accepted files, and validate the changed publish path against a real Convex runtime.

Do we have a high-confidence way to reproduce the issue?

Yes, source-reproducible: current main checks skill publish limits with caller-provided file.size, and the PR head downloads blobs before trusted stored-size rejection. I did not execute a live Convex publish in this read-only review.

Is this the best way to solve the issue?

No as written. Deriving metadata from storage is the right layer, but the maintainable fix should check trusted _storage metadata before downloading blobs and include real Convex runtime validation.

Full review comments:

  • [P1] Enforce limits before reading blobs — convex/lib/skillPublish.ts:519
    derivePublishFilesFromStorage calls loadPublishFileBlobs before checking trusted stored type, per-file size, or total size, and loadPublishFileBlobs performs ctx.storage.get for every submitted file. A caller that understates metadata can still spend storage-read resources before rejection, so query _storage metadata and fail over-limit input before downloading blobs for hashing.
    Confidence: 0.87

Overall correctness: patch is incorrect
Overall confidence: 0.87

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against 4860945f347e.

Label changes

Label justifications:

  • P2: This is a normal-priority backend publish hardening fix with limited blast radius to skill publishing.
  • merge-risk: 🚨 availability: The PR can download submitted storage blobs before rejecting over-limit stored bytes, which can spend backend resources despite green CI.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🌊 off-meta tidepool and patch quality is 🧂 unranked krab.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Not applicable: The external-contributor proof gate does not apply to this MEMBER-authored backend PR, but repository Convex runtime validation is still needed before merge.
Evidence reviewed

Acceptance criteria:

  • [P1] bun run test -- convex/lib/skillPublish.test.ts --reporter=dot.
  • [P1] bun run test -- convex/httpApiV1.handlers.test.ts --testNamePattern "publish json|publish multipart|Package publish|multipart package publish|multipart ClawPack|staged ClawPack" --reporter=dot.
  • [P1] bun run test -- convex/httpApi.handlers.test.ts --testNamePattern "cliPublishHttp" --reporter=dot.
  • [P1] bunx convex dev --once.
  • [P1] bun run ci:static.

What I checked:

  • Live PR state: GitHub reports this PR open, member-authored by jesse-merhi, mergeable, and changing 2 files with +123/-14; the latest ClawSweeper comment still requests changes on head 11d5fd0. (11d5fd011fcc)
  • Current main still uses caller metadata: Current main filters publish files, checks text-ness, and enforces per-file and total limits using caller-provided file.contentType and file.size before any storage-derived skill publish metadata exists. (convex/lib/skillPublish.ts:154, 4860945f347e)
  • PR head reads blobs before rejection: On the PR head, derivePublishFilesFromStorage first calls loadPublishFileBlobs, and that helper performs ctx.storage.get for every submitted file before text, per-file size, or total-size checks run. (convex/lib/skillPublish.ts:519, 11d5fd011fcc)
  • Regression coverage is mock-only: The added tests exercise derivePublishFilesFromStorage with mocked storage.get returning Blob instances; they do not validate the changed storage-dependent publish behavior in a real Convex runtime path. (convex/lib/skillPublish.test.ts:42, 11d5fd011fcc)
  • Repository Convex validation policy: AGENTS.md requires a real Convex validation path for behavior that depends on Convex runtime semantics such as storage; mocked ctx tests do not count for that behavior. (AGENTS.md:54, 4860945f347e)
  • Convex storage metadata guidance: The repo-committed Convex guidance says trusted file metadata should be read from the _storage system table via ctx.db.system.get rather than deprecated storage metadata APIs. (convex/_generated/ai/guidelines.md:341, 4860945f347e)

Likely related people:

  • Patrick-Erichsen: GitHub commit history shows multiple recent changes to convex/lib/skillPublish.ts and adjacent publish artifact paths, and local blame ties the current validation block to the recent release snapshot. (role: recent area contributor; confidence: high; commits: 1e7de4a2a263, 4b3c2bb63020, b62d8ca81345; files: convex/lib/skillPublish.ts, convex/lib/publishLimits.ts)
  • jesse-merhi: Separate from authoring this PR, current-main history includes package publish metadata hardening in adjacent storage/upload publish paths. (role: recent adjacent contributor; confidence: medium; commits: dcbc38999f1a; files: convex/httpApiV1/packagesV1.ts, convex/lib/publishLimits.ts, convex/uploads.ts)
  • vincentkoc: GitHub history shows the shared published skill/plugin file-size cap helper originated in this area. (role: introduced publish limit helper; confidence: high; commits: 5cbeb54c8e7b; files: convex/lib/publishLimits.ts, convex/lib/skillPublish.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Replaces caller-supplied skill publish metadata with values derived from stored blobs, ensuring size, contentType, and sha256 reflect the actual storage contents and that file/total size limits cannot be bypassed by lying inputs.

Changes:

  • Extract a new derivePublishFilesFromStorage helper that loads each blob, validates content-type/size/total-size, and recomputes sha256.
  • Reorder publishVersionForUser to apply the storage-derived metadata before downstream consumers (readme detection, quality checks, etc.).
  • Add unit tests that cover happy-path metadata derivation and oversized-stored-blob rejection when caller metadata understates the size.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
convex/lib/skillPublish.ts Adds storage-driven metadata derivation, sha256 hashing, and rewires publish flow to use it.
convex/lib/skillPublish.test.ts Adds regression tests for derivePublishFilesFromStorage, including oversized-blob rejection.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 2, 2026
@clawsweeper clawsweeper Bot added P2 Normal backlog priority with limited blast radius. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels Jun 10, 2026
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. merge-risk: 🚨 availability 🚨 Merging this PR could cause crashes, hangs, restart loops, stalls, or process outages. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. merge-risk: 🚨 security-boundary 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data. labels Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 availability 🚨 Merging this PR could cause crashes, hangs, restart loops, stalls, or process outages. P2 Normal backlog priority with limited blast radius. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants