Skip to content

Add blob storage with upload sessions and per-blob delete#11

Merged
d4rken merged 42 commits into
mainfrom
blob-support-plan
Apr 28, 2026
Merged

Add blob storage with upload sessions and per-blob delete#11
d4rken merged 42 commits into
mainfrom
blob-support-plan

Conversation

@d4rken

@d4rken d4rken commented Apr 24, 2026

Copy link
Copy Markdown
Member

Summary

Adds blob-backed module storage to pair with the Octi Android client's large-attachment support:

  • Upload session protocol — resumable POST/PATCH/finalize flow with a per-account quota tracker, idempotent finalize, and startup recovery for crashed sessions.
  • Atomic commitPUT /v1/module/{id} resolves blobIds, moves staged payloads into place, and writes module.json as the single commit point. Orphaned blobs from previous revisions are swept asynchronously.
  • Per-blob deleteDELETE /v1/module/{id}/blobs/{blobId} drops a single committed blob, advances the module ETag, and releases the account-quota bytes. Requires If-Match to prevent a stale client from destroying a blob that a newer revision still references.
  • WebSocket notifications — blob commits and deletes emit module_changed/updated so peer devices refresh.

Test plan

  • ./gradlew check passes (integration + unit suites)
  • BlobFlowTest covers happy path, retry idempotency, stale If-Match, malformed headers, wildcard rejection, cross-account isolation, multi-blob modules, WS notification, and low-quota reuse
  • BlobFlowIntegrationTest exercises the full session lifecycle end-to-end
  • StartupRecoveryService sweeps orphaned blob directories and rebuilds quota on boot
  • Manual smoke: installDist → upload + commit a blob → DELETE /blobs/{id} with If-Match → verify GET /v1/account/storage drops by the blob's size

d4rken added 30 commits April 11, 2026 17:11
Document the server-side blob model, resumable upload flow, quota handling, and old-client compatibility constraints before implementation begins.
Clarify device-scoped blob endpoints, raw module read semantics, and legacy compatibility so the implementation matches cross-device sync behavior.
Adds chunked blob upload endpoints (POST, PATCH, finalize) under v1 module, PUT commit with blob refs gated by If-Match and If-None-Match preconditions, per-account storage quota, and startup recovery for stale sessions.

Module reads open payload handles under the device lock so concurrent writes cannot corrupt in-flight responses. Orphan blob cleanup runs on AppScope after the lock releases. Commit fails explicitly if the staged session blob is missing on disk.
Replaces per-file @OptIn(ExperimentalPathApi::class) annotations with a project-wide -opt-in compiler argument.
Replaces five duplicated SHA-1/SHA-256 hex helpers across ModuleRepo, UploadSessionRepo, and three test files with a single internal ModuleId.toModuleDirName() for storage-layout, plus a test-only ByteArray.sha256Hex() for blob checksums.

Adds ModuleIdTest with a pinned-vector regression so a silent break in the storage-name scheme cannot slip through shared-helper tests.
…ites

Collapses the two per-blob session-map scans in commitModule into a single consumeCompletedBlob call that returns meta and file together, distinguishing session-not-found from staged-payload-missing.

Adds an in-memory shadow for module access times so rapid blob downloads and reads no longer thrash access.json — disk writes are bounded by a 30s debounce and a GC-tick flush.

Documents that legacy DELETE deliberately does not reject blob-backed modules (asymmetric with legacy POST) so clients without blob awareness retain an escape hatch.
Adds DELETE /v1/module/{moduleId}/blobs/{blobId}?device-id=... that
drops a single committed blob, releases account quota, advances the
module etag, emits a module_changed/updated WebSocket event, and
asynchronously cleans up the on-disk blob directory. Previously the
client had to use PUT commit with an updated blobRefs list to drop
one blob; DELETE now offers a lightweight quota-cleanup path.

Requires If-Match against the current module etag (matching PUT
commit semantics) to prevent a stale client from destroying a blob
that a newer revision still intentionally references. Wildcard
If-Match is rejected; a 412 response includes the current ETag
header so clients can retry without a separate GET. The new
revision's sourceDeviceId is set to the caller so WebSocket
self-suppression and event attribution stay correct.

Quota adjustment happens inside the target.sync lock — same scope
as persistMeta — so a concurrent tryReserve cannot see stale
usedBytes. Duplicate blobRef rows (post-recovery corruption) are
handled defensively: remove exactly one matching ref, log a
warning, let subsequent DELETEs drain the rest.
- parseStrongEtag was duplicated in ModuleRoute and BlobRoute; lift it to HttpExtensions as a shared top-level helper.

- commitModule's orphan blob cleanup launched deleteRecursively on the default dispatcher; switch to Dispatchers.IO to match deleteBlob and free the Default pool from blocking FS work.
Adds test coverage for previously untested subsystems:

- StartupRecoveryService.recover - all blob/session branches incl. orphan reclaim, malformed meta, expired/COMPLETE/promoted-from-ACTIVE sessions, offset normalisation, two-account isolation, and edge cases around size/hash mismatch (documented as current behaviour).

- UploadSessionRepo.reapExpiredSessions - idle, absolute, COMPLETE, and zero-byte expiry paths.

- ModuleLifecycleService.deleteForDevice / account-route cascade - committed blobs, active sessions, post-teardown recovery.

- ModuleRepo.openBlobHandle vs concurrent DELETE - validates POSIX inode-kept-alive-on-unlink.

- Finalize idempotency across server restart.

- AccountStorageTracker arithmetic edges + HTTP quota boundary flow.

Production changes are minimal and infra-only:

- UploadSessionRepo.reapExpiredSessions visibility: private to internal so tests can drive GC deterministically.

- AppComponent exposes test-only accessors for storageTracker, sessionRepo, moduleRepo, lifecycleService, accountRepo, deviceRepo, json.

- TestRunner: AppComponent on TestEnvironment, bounded startup wait that captures launch exceptions instead of hanging.

Surfaced bug (left as follow-up, not patched): deleteForDevice does not release quota for modules whose module.json is malformed/missing, even though startup recovery counts those bytes from payload.blob. The malformed-meta teardown test documents this divergence.
deleteForDevice used a private decode helper that returned null on malformed module.json and skipped the module entirely. The on-disk dir got wiped via clearUnlocked but the tracker still held the bytes counted at startup recovery (which falls back to payload.blob.fileSize for malformed/missing meta). Result: quota leaked to oblivion on every device-delete that touched a corrupt module.

Extract the accounting rule onto ModuleRepo.accountForModule(moduleDir) returning bytes plus the parsed v1 meta. Recovery and deleteForDevice now use the same calculator, so they agree by construction. Recovery also gets the parsed meta back without re-reading module.json.

deleteForDevice further refactored to aggregate-then-delete-then-adjust: a partial clearUnlocked failure can no longer release quota for files still on disk.

legacyDelete is unchanged - it routes through loadOrMigrateMeta which already synthesises v1 meta from payload.blob for malformed/missing cases.
runs-on: ubuntu-latest pinned to ubuntu-24.04 across all three workflows. container: ubuntu pinned to ubuntu:24.04. No more silent re-pointing when GitHub rolls the alias forward.

Test job container also got LANG/LC_ALL=C.UTF-8. The bare ubuntu image ships with POSIX locale so the JVM defaults to ASCII for sun.jnu.encoding, which broke Kotlin .class output for backticked test names containing em-dashes (e.g. 'concurrent writer-writer on blob-backed module — one commits, other retries'). compileTestKotlin failed with InvalidPathException. Local builds were fine because the host locale is UTF-8.
The blob download endpoint now honors the Range header (RFC 7233 §4.3): single-range requests return 206 Partial Content with Content-Range; multi-range falls back to 200 full body; out-of-bounds returns 416 with the bytes hint. ETag and If-None-Match are wired in for 304 responses, using an <algo>:<hex> strong ETag derived from the BlobHandle.

BlobHandle gains optional hashAlgorithm + hashHex fields so the route can build a strong ETag without rehashing on every request. The accompanying BlobRangeFlowTest exercises 9 named cases (bare GET, single range, prefix, suffix, OOB, multi-range, If-None-Match hit and miss, malformed header).

Closes the PLAN-BLOB-SUPPORT.md deferral so future per-file size-cap raises do not block on the route layout.
Two assumptions the implementation relies on were unpinned in the original review of file-sharing.

First, the runtime read path (GET on a module endpoint) survives a corrupted module.json as long as payload.blob is intact: ModuleRepo.loadOrMigrateMeta synthesises new meta from the blob and the client still gets the bytes. Pin this with an HTTP-driven test that corrupts module.json on disk between write and read, then asserts both the body and the regenerated meta.

Second, ModuleLifecycleService.commitModule serialises concurrent commits to the same module via the per-device target.sync mutex. Fire eight concurrent PUTs with If-None-Match: * and assert exactly one returns 200 and the other seven return 412 — without the mutex, two commits could land before either sees the other's module.json.
Both enum values were dead writes — no code path ever assigned them. The defensive reads in isExpired() and terminateSession() collapse cleanly: ABORTED is the only terminal that suppresses quota release, and time-based expiry is computed live from the timestamp fields rather than persisted via a state mutation.

Re-adding either case is a 5-minute change if a future status-polling endpoint wants to surface session-expired or finalize-in-flight as observable states.
BlobRoute.createSession only reserves storage quota when sizeBytes > 0, which means a misbehaving client could otherwise open as many zero-byte sessions as it liked. The active-session count cap in UploadSessionRepo.createSession is the real safety net for that case.

Add three regression tests: (1) the cap rejects a zero-byte ninth session per device with SessionLimitExceededException and leaves the storage tracker untouched; (2) aborting a saturated session frees a slot for a new one; (3) the cap is per-device, so saturating device A does not block device B in the same account.

No production change. Codex review (option A) preferred this over coupling the byte-quota to a synthetic per-session token, which would regress legitimate near-quota small-file shares.
After finalize the only legitimate next step is the client's module PUT — there is no slow-network rationale to keep the session for the full 1 h ACTIVE-state TTL. Cancel-after-finalize and client-crash orphans now reap within ~20 min (10 min idle + one GC tick) instead of ~70 min.

Hardcoded constant; not exposed via App.Config since the value is structural to the protocol, not deployment-tunable.
Legacy POST and PUT commit applied document deltas to the quota counter after disk writes with no pre-check, so an attacker could exceed accountQuotaBytes by spamming many small document-only modules. Add tryAdjustUsed() to AccountStorageTracker and pre-check the doc delta before writing; rollback on write failure or BadRequest path. Routes surface 507 Insufficient Storage.
Module GC and device GC previously deleted disk state without telling AccountStorageTracker, so usedBytes inflated over uptime and only restart reconciled. ModuleRepo now sums bytes via accountForModule() and adjusts the tracker when reaping stale modules. DeviceRepo routes stale-device reaps through ModuleLifecycleService.deleteForDevice (Lazy-injected to break the Dagger cycle) before removing the device. AccountRepo also clears the tracker entry on account deletion so the GC path doesn't leak the in-memory map.
consumeCompletedBlob took no lock and ignored expiry, leaving a window where the GC reaper could terminate a COMPLETE session and delete its payload between commitModule's lookup and its file move. Replace with consumeAndMoveCompletedBlob, which takes a destinationFor lambda and performs the move inside state.lock so terminateSessionLocked must wait. Re-validates state and expiry under the lock.
Closes the inode/dirent abuse vector where a single account could register unlimited devices, each carrying its own modules subtree. Cap is enforced inside DeviceRepo.createDevice under the existing mutex so concurrent creates can't both pass the check. AccountRoute returns 409 Conflict and restores any consumed share so retries don't burn it.
All three module-creation paths (legacy POST, PUT commit, blob session create) now check the per-device count cap under target.sync.withLock before creating any directory. Existing moduleIds skip the check so retries and overwrites are unaffected. BlobRoute.createSession folds the cap check, quota reservation, and session creation into a single locked block via the CreateSessionOutcome sealed type so concurrent requests can't both pass and end up over the cap.
Bounds blob-dir accumulation in a single module across repeated commits. Without this cap, an attacker exploiting the zero-byte session loophole could keep adding refs to the same module indefinitely. Checked in commitModule before any blob resolution; returns 400 BadRequest.
createSession created modules/{hash}/sessions/{sessionId}/ as a side effect, but terminateSession only deleted the session leaf. Session-only modules (created and aborted without commit) accumulated empty modules/{hash}/ dirents. cleanupEmptyParentsAfterSession() walks up after the session delete and removes the empty sessions/ and modules/{hash}/ dirs. Committed modules with their own state are preserved.
Pre-fix the per-device session cap counted only ACTIVE sessions, so a client could finalize 8 to COMPLETE then create 8 more, repeatedly accumulating COMPLETE sessions that hold reservedBytes until expiry. Now both ACTIVE and COMPLETE count. Add maxActiveUploadSessionsPerAccount = 32 as a ceiling above the per-device cap so multi-device exploitation can't sidestep the per-device limit. Per-account is checked first, then per-device.
Adds completeIdleTtlSeconds (default 600s / 10 min) to App.Config and embeds it in UploadSessionMeta so isExpired() can branch on state. Pre-fix a finalize-but-no-commit session would hold its quota for the full 1h ACTIVE idle TTL — common in real-world client crashes between finalize and PUT-commit. Default field on the meta keeps existing on-disk session metas backward-compatible during deserialization.
Split authenticateDevice into a side-effect-free validator and a separate touchAuthenticatedDevice that records lastSeen and IP-device association. verifyCaller now sequences validate -> per-account rate-limit gate -> touch, so over-limit requests don't churn device metadata. WsRoute follows the same pattern at connection establishment. New AccountRateLimiter (256 req / 60 s default, layered on top of the per-IP bucket so NAT-shared accounts don't get starved by neighbours).
Adds two operator-facing CLI flags --account-quota-mb and --max-blob-mb (other limits stay hardcoded; they're not deployment-policy knobs). parseSizeFlag throws on malformed/non-positive values so a typo'd config fails the boot rather than silently flipping to a default. /v1/account/storage gains the new caps in v2: maxDevicesPerAccount, maxModulesPerDevice, maxBlobRefsPerModule, maxActiveUploadSessionsPerAccount, completeIdleTtlSeconds, accountRateLimit, accountRateLimitWindowSeconds. storageApiVersion bumped to 2.
Reject new blob sessions and chunks with 507 + X-Octi-Reason: server_disk_low when usable space on the data path falls below --min-free-disk-mb (500 MB default). Account quotas alone don't bound the sum across accounts, so a server with many accounts can otherwise exhaust the host disk.

Adds OctiResponseHeaders shared between BlobRoute and ModuleRoute for stable client-side switching on 507 reasons (server_disk_low vs account_quota_exceeded).
parseSizeFlag now rejects duplicate flag occurrences (previously silently fell back to defaults via singleOrNull) and uses Math.multiplyExact to fail loudly on overflow.

appendToSession validates Upload-Offset against the stored session offset before consulting the disk-space probe, so a wrong-offset PATCH returns 409 with the expected offset rather than getting masked by 507 when disk is low.

Also trims verbose KDocs and inline comments per project style.
@d4rken d4rken added the enhancement New feature or request label Apr 26, 2026
d4rken added 12 commits April 26, 2026 13:40
Operators see one log line on launch indicating whether the disk-space gate is disabled, OK, or already below the configured floor — instead of having to wait for the first blob upload to discover a misconfigured --min-free-disk-mb.
Tier A fixes from code review of the blob-support-plan branch:

- UploadSessionRepo: re-read session state inside lock at append/finalize/consume sites; concurrent PATCH at the same offset no longer corrupts the part file.

- UploadSessionRepo: terminateSessionLocked is idempotent and skips releaseReservation when staged files are already gone (a successful commit moved them); avoids double-release with the GC reaper.

- BlobRoute: reject hashHex without hashAlgorithm at create and finalize; the persisted (null, hex) shape would otherwise be wiped by recovery on next restart.

- Server: remove global install(PartialContent) that collided with the manual Range handling in BlobRoute.downloadBlob (kept manual to preserve the open-fd-after-unlink invariant).

- BlobRoute: emit Last-Modified and validate If-Range against ETag and HTTP-date; malformed If-Range falls back to a full 200 per RFC 7233 §3.2.

- ModuleLifecycleService.commitModule: two-pass commit (peek then move with rollback) so a failed blob ref doesn't orphan already-moved blobs or leak reservations; reorder deleteForDevice/legacyDelete to abort sessions before clearing the disk.

- StartupRecoveryService: add document_size_normalized auto-heal for payload.blob/documentSizeBytes drift; per-module + per-session try/catch isolation; rename event strings to recovery.orphan_reclaimed and recovery.blobref_removed to match the plan contract.

- AccountRateLimiter / App: --disable-rate-limits now also disables the per-account limiter (acquire short-circuits when limit <= 0; CLI sets accountRateLimit = 0).

Adds 7 regression tests pinning the new behavior: concurrent PATCH same-offset rejection, hashHex/algorithm coupling, Last-Modified + If-Range (matching, stale, malformed), document_size_normalized recovery, and the rate-limit disable path.
Tier B batch 1 from code review:

- BlobRoute downloadBlob: handle If-None-Match: * (RFC 7232 §3.2 — matches when the resource exists, returns 304 instead of full body).

- BlobRoute deleteBlob + ModuleRepo.removeBlobRefUnlocked: accept If-Match: * as the standard delete-if-exists idiom instead of rejecting with 400.

- BlobRoute appendToSession: pass min(maxBlobPatchBytes, remainingBytes) as the streaming chunk cap so a chunked-encoded request can't exceed the per-session declared size.

- UploadSessionRepo.abortSessionsForAccount: drop the per-account creation lock from the accountLocks map after termination so it doesn't grow unbounded over the server's lifetime.

- SyncNotifier.EventPayload.Event.ModuleChanged.sourceDeviceId is now a required String (plan §"Notification Semantics During Mixed Operation"); self-suppression filter no longer falls back to event.deviceId.

- BlobRoute: document that GET /blob-sessions/{sessionId} is the registered route and HEAD is synthesized by Ktor's AutoHeadResponse — explicit head{} would be shadowed by the plugin's HEAD→GET method swap.

- BlobFlowTest: switch session-status calls to http.head(...) and update the wildcard-If-Match delete test to assert the new pass-through behavior.
Tier B batch 2 from code review:

- StartupRecoveryService: ACTIVE→COMPLETE promotion now skips when the blobId is already a live blobRef on the module — avoids double-counting (live in usedBytes plus promoted in reservedBytes) when a prior crash landed between module.json rename and session cleanup.

- StartupRecoveryService: strict-lower hashAlgorithm comparison everywhere ("sha256" only). Pre-fix code accepted "SHA-256" too; that path is dead in practice (createSession rejects it on the way in) and tolerating it would mask invariant violations from hand-edited session.json.

- BlobRoute: createSession / appendToSession / finalizeSession now call moduleRepo.touchAccess on success per plan §"Module Expiration Protection". touchAccess skips session-only module dirs (no module.json + no payload.blob) so a stray access.json doesn't block the post-abort empty-parent sweep.

- ModuleRepo GC: payload-only legacy modules are no longer permanently skipped — the session-only filter checks for payload.blob too, and readEffectiveAccessTime falls back to module.json mtime then payload.blob mtime so v0 dirs can age out.

- ModuleRepo GC: per-module try/catch in the stale-module forEach — one delete failure no longer strands the rest of that device's stale modules; totalReclaimed only counts after delete succeeds.

- ModuleRepo.recordAccess: atomic accessShadow.compute() instead of read-then-put — prevents concurrent recordAccess calls from overwriting a fresher value with an older one. (Skipped £#23 zero-byte session bypass: plan explicitly allows zero-byte sessions; per-session-count caps already bound the inode pressure.)
Tier B batch 3 from code review:

- BlobFlowTest: new test asserts that the per-route RequestBodyLimit (1 MB) overrides the global 128 KB body limit on PATCH. Empirical result: 204 NoContent. Codex's claim that the child plugin install does not override the parent is wrong on Ktor 3 — closing finding #12 as a false positive.

- DeviceRoute.deleteDevice: added a comment documenting why lifecycleService.deleteForDevice runs before deviceRepo.deleteDevice. The Codex finding #20 ("orphan data behind") doesn't materialise because DeviceRepo.deleteDevice does a recursive delete of the whole device path under target.sync — any in-flight write landing between the two calls is wiped clean. Closing #20 as no-op.

- ModuleLifecycleService.commitModule: KDoc note about the §"Concurrency And Locking" plan deviation (module lock acquired before session lock instead of the prescribed reverse). No deadlock today; revisit if the deferred per-module lifecycle flag is added.

- PLAN-BLOB-SUPPORT.md §"Implementation Deferrals": added the lock-order inversion as an explicit deferred item so the deviation is visible in the plan.
Adds a happy-path test where a second device on the same account reads back the module document and downloads the blob bytes uploaded by the first device. Closes a coverage gap — every prior blob test used either one device or two devices on different accounts.
Pre-seed a session dir with a partial session.json.tmp and no final session.json, then assert StartupRecoveryService deletes the dir without crashing on the dangling temp.
Side-by-side legacy v0.8.1 + new server containers bound to localhost for manual pre-cutover smoke. replay-prod.local.sh unzips a developer's own zdatapath-prod.zip into data-prod-replay/, and both data dirs are gitignored. Strictly local; CI uses synthetic fixtures (separate commit).
…replay

CrossVersionFlowTest pins octi/0.8.1 on legacy clients against the new server: register, write, read, mixed PUT-with-blob-refs reads, and same-device downgrade overwrites. Adds a 500-account SyntheticDataFixture that boots StartupRecoveryService at production-like scale without touching real prod data. Optional OCTI_PROD_FIXTURE env-gated test stays local-only for developers replaying their own export.
New regression-synthetic-replay GitHub Actions job runs after build-docker-image and run-tests: gradle generateSyntheticData seeds ~500 accounts, the freshly built server image boots against that tree, the v1 status endpoint is polled, and the job fails if logs contain recovery.*malformed. No prod data is referenced anywhere in CI.
@d4rken d4rken merged commit 69f3157 into main Apr 28, 2026
5 checks passed
@d4rken d4rken deleted the blob-support-plan branch April 28, 2026 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant