Concurrency hardening, commit rollback, and CI bumps#12
Merged
Conversation
- actions/checkout v4 -> v6.0.2, actions/setup-java v3 -> v5.2.0, actions/cache v3 -> v5.0.3 (commit-pinned) - JDK distribution adopt -> temurin (Adoptium rebrand) - drop jimschubert/query-tag-action; read github.ref_name directly; collapse dual release-tag steps with a conditional prerelease (-beta only) - KSP 2.3.5 -> 2.3.6, Dagger 2.59.1 -> 2.59.2 - fix stale JDK 17 / Gradle 8.13 line in build-commands.md
Concurrent updateDevice + deleteDevice could resurrect a deleted device because per-device sync was acquired without coordinating against the global devices map. Holding the global mutex through file I/O would have fixed correctness but stalled all auth on the server behind any slow per-device operation, since updateDevice runs on every authenticated request. New design holds the global mutex only for short critical sections: - updateDevice fetches under a brief global mutex, then re-checks under sync that the same Device instance is still in the map (reference-equality on the per-device Mutex), commits the map update under another brief mutex slice, and never holds the global lock during file I/O. - deleteDevice / deleteDevices register a PendingDeviceDeletion (CompletableDeferred indexed by DeviceKey + DeviceId) under the brief global mutex, then drop the path under per-device sync only. Tracker cleanup runs in a NonCancellable scope so it always clears. - createDevice awaits any matching pendingDeletion before recreating, preventing a resurrected path from being clobbered by an in-flight delete. Adds four DeviceRepoTest cases pinning resurrection prevention and the no-head-of-line-blocking property for unrelated devices.
Old rollbackMoves() only unlinked moved blob files; it did not release reservations or remove the upload sessions whose payloads had been consumed. A commit that failed after consumeAndMoveCompletedBlob (e.g. module.json rename throws) leaked reservedBytes and orphan sessions. commitModule now tracks each consumed blob in a ConsumedBlob struct (meta, live blob path, session dir) and exposes an explicit commitPointReached flag set right after the module.json.tmp -> module.json rename. Before that point, rollbackConsumedBlobs() does the full reverse: delete payload, removeConsumedSession, releaseReservation. After the commit point, even an access.json write failure is logged-and-swallowed - the module is committed and must not turn into a quota rollback. UploadSessionRepo.ConsumeResult.Ready now carries sessionDir so callers don't re-resolve it. New removeConsumedSession() deletes a session whose payload was already moved out, without touching quota (the caller owns the reserved -> used transition). Adds a BlobQuotaBoundaryFlowTest case that corrupts module.json.tmp (creates a directory with that name so rename fails), asserts 500 + reservedBytes back to 0 + session removed + a fresh session can be created.
emtee40
pushed a commit
to emtee40/octi-sync-server-kotlin
that referenced
this pull request
May 4, 2026
Tier B batch 3 from code review: - BlobFlowTest: new test asserts that the per-route RequestBodyLimit (1 MB) overrides the global 128 KB body limit on PATCH. Empirical result: 204 NoContent. Codex's claim that the child plugin install does not override the parent is wrong on Ktor 3 — closing finding d4rken-org#12 as a false positive. - DeviceRoute.deleteDevice: added a comment documenting why lifecycleService.deleteForDevice runs before deviceRepo.deleteDevice. The Codex finding d4rken-org#20 ("orphan data behind") doesn't materialise because DeviceRepo.deleteDevice does a recursive delete of the whole device path under target.sync — any in-flight write landing between the two calls is wiped clean. Closing d4rken-org#20 as no-op. - ModuleLifecycleService.commitModule: KDoc note about the §"Concurrency And Locking" plan deviation (module lock acquired before session lock instead of the prescribed reverse). No deadlock today; revisit if the deferred per-module lifecycle flag is added. - PLAN-BLOB-SUPPORT.md §"Implementation Deferrals": added the lock-order inversion as an explicit deferred item so the deviation is visible in the plan.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
updateDevice+deleteDevicecould resurrect a deleted device because per-device sync was acquired without coordinating against the global devices map. Holding the global mutex through file I/O would have stalled all auth on the server behind any slow per-device operation. New design holds the global mutex only for short critical sections (map ops, identity-pinned commit) and uses per-instance reference equality + aPendingDeviceDeletiontracker (CompletableDeferredindexed by bothDeviceKeyandDeviceId) to gatecreateDeviceagainst in-flight path deletions.rollbackMoves()only unlinked moved blob files; it didn't release reservations or remove the upload sessions whose payloads had been consumed. A commit that failed afterconsumeAndMoveCompletedBlob(e.g. module.json rename throws) leakedreservedBytesand orphan sessions.commitModulenow tracks each consumed blob explicitly and rolls back payload + session + reservation before the module.json rename. After the rename (the commit point), anaccess.jsonfailure is logged-and-swallowed.checkoutv6.0.2,setup-javav5.2.0,cachev5.0.3,action-gh-releasev3), JDK distributionadopt→temurin(Adoptium rebrand), dropjimschubert/query-tag-actionin favor ofgithub.ref_name, collapse dual release-tag steps with conditional prerelease (-betaonly — intentional), KSP 2.3.5 → 2.3.6, Dagger 2.59.1 → 2.59.2.Test plan
./gradlew checkpasses locally (5m 40s)DeviceRepoTest— 4 tests: 2 resurrection-prevention + 2 no-head-of-line-blockingBlobQuotaBoundaryFlowTest.failed PUT after consuming blob releases reservationcode-checks.yml(assemble, check, Docker build, cross-version regression replay)-betatags)