Skip to content

Scope replication-failure read-only to repo/project, not the whole server#1305

Draft
ikhoon wants to merge 9 commits into
line:mainfrom
ikhoon:readonly-per-repository
Draft

Scope replication-failure read-only to repo/project, not the whole server#1305
ikhoon wants to merge 9 commits into
line:mainfrom
ikhoon:readonly-per-repository

Conversation

@ikhoon
Copy link
Copy Markdown
Contributor

@ikhoon ikhoon commented May 14, 2026

Motivation:

Central Dogma is a multi-tenant store: a single replica hosts many projects and repositories belonging to unrelated tenants. When a follower fails to replay a single log (state divergence, lock acquisition timeout, or transient I/O error), the existing failure path puts the whole server into read-only mode. A problem in one repository then propagates to every other tenant on that replica: writes stop everywhere until an operator investigates and lifts the server-wide read-only state.

The goal of this change is to shrink that blast radius. A replay failure flips only the smallest enclosing scope (repository, project, or server) to read-only, and the executor keeps replicating unrelated logs for everyone else. The read-only state is persisted so it survives restarts — in dogma/dogma for repo/project scope, and in a properties file for server scope. The existing administrative API can lift it once the operator has investigated.

Modifications:

  • Add UpdateProjectStatusCommand and UpdateRepositoryStatusCommand, routed through the same replication pipeline as ordinary commands so scope changes converge across the cluster. Their effect lands in dogma/dogma/status/{project}/{repo}.json — the same storage as ordinary commands, intentionally: a replay failure of the status update itself is then treated as a dogma/dogma failure and escalates to server scope.
  • Carry commandType, projectName, and repoName on every LogMeta.
  • Add ReplicationLogContext attached to ReplicationException so ZooKeeperCommandExecutor.handleReplicationFailure() can pick the read-only scope:
    • null project name (root commands or dogma/dogma) → server
    • non-null project name with null repo (or the dogma metadata repo) → project
    • otherwise → repository
  • Add RepoStatusManager to manage the read-only status of projects and repositories; remove MetadataService.updateRepositoryStatus() and migrate callers.
  • On a replay error, skip past the failing log (so subsequent unrelated logs keep replaying) and issue the scope-appropriate Update(Project|Repository)StatusCommand. Fall back to a server-wide read-only state only if recovery itself fails.
  • Wire StandaloneCommandExecutor to consult RepoStatusManager on every push, so a per-repo / per-project read-only state rejects writes with ReadOnlyException before reaching the storage layer.
  • Rename DefaultCrud*ReplicatingCrud* and introduce StandaloneCrud* to bypass replication when the standalone executor applies state updates directly.
  • Move ServerStatusManager to the internal package.
  • Add ReplicationStatus to describe the status of projects and repositories, and migrate RepositoryStatus in RepositoryDto to ReplicationStatus.

Result:

  • A replication failure no longer puts the whole server into read-only mode; only the affected project or repository becomes read-only.

…rver

Motivation:

Central Dogma is a multi-tenant store: a single replica hosts many
projects and repositories belonging to unrelated tenants. When a
follower fails to replay a single log (state divergence, lock
acquisition timeout, or transient I/O error), the existing failure
path puts the whole server into read-only mode. A problem in one
repository then propagates to every other tenant on that replica:
writes stop everywhere until an operator investigates and lifts the
server-wide read-only state.

The goal of this change is to shrink that blast radius. A replay
failure flips only the smallest enclosing scope (repository, project,
or server) to read-only, and the executor keeps replicating unrelated
logs for everyone else. The read-only state is persisted so it
survives restarts — in `dogma/dogma` for repo/project scope, and in
a properties file for server scope. The existing administrative API
can lift it once the operator has investigated.

Modifications:

- Add `UpdateProjectStatusCommand` and `UpdateRepositoryStatusCommand`,
  routed through the same replication pipeline as ordinary commands so
  scope changes converge across the cluster. Their effect lands in
  `dogma/dogma/status/{project}/{repo}.json` — the same storage as
  ordinary commands, intentionally: a replay failure of the status
  update itself is then treated as a `dogma/dogma` failure and
  escalates to server scope.
- Carry `commandType`, `projectName`, and `repoName` on every
  `LogMeta`.
- Add `ReplicationLogContext` attached to `ReplicationException` so
  `ZooKeeperCommandExecutor.handleReplicationFailure()` can pick the
  read-only scope:
  - null project name (root commands or `dogma/dogma`) → server
  - non-null project name with null repo (or the dogma metadata
    repo) → project
  - otherwise → repository
- Add `RepoStatusManager` to manage the read-only status of projects
  and repositories; remove `MetadataService.updateRepositoryStatus()`
  and migrate callers.
- On a replay error, skip past the failing log (so subsequent
  unrelated logs keep replaying) and issue the scope-appropriate
  `Update(Project|Repository)StatusCommand`. Fall back to a
  server-wide read-only state only if recovery itself fails.
- Wire `StandaloneCommandExecutor` to consult `RepoStatusManager` on
  every push, so a per-repo / per-project read-only state rejects
  writes with `ReadOnlyException` before reaching the storage layer.
- Rename `DefaultCrud*` → `ReplicatingCrud*` and introduce
  `StandaloneCrud*` to bypass replication when the standalone executor
  applies state updates directly.
- Move `ServerStatusManager` to the internal package.
- Add `ReplicationStatus` to describe the status of projects and
  repositories, and migrate `RepositoryStatus` in `RepositoryDto` to
  `ReplicationStatus`.

Result:

- A replication failure no longer puts the whole server into read-only
  mode; only the affected project or repository becomes read-only.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds ReplicationStatus and RepoStatusManager; replaces git-scoped CRUD with replicating/standalone CRUD; adds update-project/repository status commands; wires repo-status checks through executors and HTTP API; refactors ZooKeeperCommandExecutor to carry replication log context and apply scoped read-only failure handling.

Changes

Repo status and DTOs

Layer / File(s) Summary
Replication types and DTO updates
common/.../ReplicationStatus.java, common/.../RepositoryStatus.java, common/.../internal/api/v1/RepositoryDto.java, server/.../internal/api/UpdateRepositoryStatusRequest.java
Add ReplicationStatus, deprecate RepositoryStatus, and change DTO/request types to use ReplicationStatus.

Repo status manager & models

Layer / File(s) Summary
RepoStatusManager and RepositoryState
server/src/main/java/com/linecorp/centraldogma/server/internal/management/*
Add RepoStatusManager that caches/persists per-repo replication status and RepositoryState model; refactor ServerStatusManager storage and add package-level annotations.

CRUD abstractions and implementations

Layer / File(s) Summary
CrudContext & operation helpers
server/.../storage/repository/crud/CrudContext.java, CrudOperation.java
Add constructors/helpers: default HEAD ctor, withRevision, getPath, and entryToValue helper for JSON mapping.
Replicating and standalone CRUD implementations
server/.../storage/repository/crud/ReplicatingCrudOperation.java, ReplicatingCrudRepository.java, StandaloneCrudOperation.java, StandaloneCrudRepository.java, package-info.java
Introduce replicating write-via-CommandExecutor/read-via-Standalone delegate pattern and new standalone CRUD repository/operation implementations; move interfaces to .crud package.
Test & extension wiring for CRUD
testing-internal/.../CrudRepositoryExtension.java, ProjectManagerExtension.java
Switch tests to use ReplicatingCrudRepository and expose ServerStatusManager to test harness.

Commands and executor wiring

Layer / File(s) Summary
Command types and factory methods
server/src/main/java/com/linecorp/centraldogma/server/command/*
Add UPDATE_PROJECT_STATUS and UPDATE_REPOSITORY_STATUS command types; add UpdateProjectStatusCommand and UpdateRepositoryStatusCommand classes; add Command.updateProjectStatus/updateRepositoryStatus(...) factories.
StandaloneCommandExecutor & CentralDogma wiring
server/src/main/java/com/linecorp/centraldogma/server/CentralDogma.java, StandaloneCommandExecutor.java
Pass RepoStatusManager into executors and startup paths; use repoStatusManager.isWritable(...) to guard writes and handle new status commands by delegating to repoStatusManager.
Metadata and RepositoryMetadata deprecation
server/.../metadata/MetadataService.java, RepositoryMetadata.java
Remove old updateRepositoryStatus API and deprecate repo metadata status accessor in favor of RepoStatusManager.

HTTP API and variable service

Layer / File(s) Summary
RepositoryServiceV1, DtoConverter
server/src/main/java/.../RepositoryServiceV1.java, DtoConverter.java
Wire RepositoryServiceV1 to accept RepoStatusManager and use ReplicationStatus for listing, create/patch/status/update flows; update fallback/migrate flows to use updateRepositoryStatus helper that executes status commands.
Variable/Templater service changes
server/.../variable/VariableServiceV1.java, Templater.java
Switch to ReplicatingCrudOperation for variable persistence and templater repo lookups.

Replication log & ZooKeeper replay

Layer / File(s) Summary
ZooKeeperCommandExecutor replay flow & context
server/src/main/java/.../replication/ZooKeeperCommandExecutor.java, LogMeta.java, ReplicationException.java, ReplicationLogContext.java, ScopedReadOnlyAppliedException.java
Refactor load/store to populate and return ReplicationLogContext (bytes + enriched LogMeta including commandType/projectName/repoName), compress/store bytes, change replay to use context, centralize handleReplicationFailure that emits scoped Update*Status commands, and adapt async/sync paths to call the handler and surface ScopedReadOnlyAppliedException for sync failures.
LogMeta and exception updates
server/.../replication/LogMeta.java, ReplicationException.java
Add optional commandType/projectName/repoName fields to LogMeta; make ReplicationException carry optional ReplicationLogContext and update constructors/accessors.

Tests and other small changes

Layer / File(s) Summary
Tests: scoped read-only, RepoStatusManager tests, ZooKeeperCommandExecutorTest updates
server/src/test/.../replication/*, RepoStatusManagerTest.java, RepositoryServiceV1Test.java, various test imports/expectations
Add extensive integration and unit tests for scoped read-only behaviors, update tests to assert ReplicationStatus, adjust test wiring to pass RepoStatusManager/ServerStatusManager and use ReplicatingCrudRepository in extensions.
Misc: .gitignore and build update
.gitignore, it/xds-k8s-node-ip-extractor/build.gradle
Add AI-agent ignores and a test dependency for armeria gRPC.

Sequence Diagram (high level)

sequenceDiagram
  participant Client
  participant RepositoryServiceV1
  participant RepoStatusManager
  participant CommandExecutor
  participant ZooKeeper
  Client->>RepositoryServiceV1: API call (create/patch/update status)
  RepositoryServiceV1->>RepoStatusManager: query/update ReplicationStatus
  RepositoryServiceV1->>CommandExecutor: execute status update command
  CommandExecutor->>ZooKeeper: store replication log (LogMeta + bytes)
  ZooKeeper->>CommandExecutor: deliver logs for replay (loadLog -> ReplicationLogContext)
  CommandExecutor->>RepoStatusManager: handleReplicationFailure -> emit scoped Update*Status
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

defect

Suggested reviewers

  • trustin
  • jrhee17

"🐰
I hopped through logs and nudged the code,
Marked repos read-only on the troubled road.
With statuses set and tests in tow,
I guard the writes where failures grow.
Hooray — a rabbit's tidy deploy!"

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/src/main/java/com/linecorp/centraldogma/server/CentralDogma.java (1)

644-678: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Move repoStatusManager.initialize() before executor.start() to load persisted repository status before accepting commands.

The RepoStatusManager constructor initializes an empty statusMap. The initialize() method only registers a listener but does not synchronously load existing persisted status files from disk.

In the replicating path (lines 651–676), the executor starts at line 653 with an unpopulated cache. Any write commands processed between lines 653–671 will invoke repoStatusManager.isWritable() (StandaloneCommandExecutor line 138) against an empty cache, potentially bypassing scoped read-only restrictions that were persisted before restart.

The listener pattern is reactive and only fires on future changes, not on registration. To ensure all persisted read-only states are loaded before the executor begins accepting commands, initialize the repo status manager before starting the executor.

The non-replicating path (lines 646–650) is safe since initialize() runs before the executor is returned; apply the same ordering to the replicating path (lines 651–676).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/main/java/com/linecorp/centraldogma/server/CentralDogma.java`
around lines 644 - 678, The repoStatusManager.initialize() must be called before
starting the executor so persisted repo statuses are loaded into the cache prior
to accepting commands; move the call to repoStatusManager.initialize() to run
before invoking executor.start() in the replicating branch (the block that
currently calls executor.start(), waits on startFuture, then calls
projectInitializer.initialize() and repoStatusManager.initialize()), ensuring
the order is repoStatusManager.initialize(), projectInitializer.initialize() (as
appropriate), then start the executor (executor.start()) and proceed with the
existing start/wait/stop logic and finally
setMirrorAccessControllerRepository(pm, executor).
🧹 Nitpick comments (1)
server/src/main/java/com/linecorp/centraldogma/server/CentralDogma.java (1)

645-678: 💤 Low value

Optional: deduplicate repoStatusManager.initialize() and setMirrorAccessControllerRepository(...) calls.

repoStatusManager.initialize() is invoked in all three branches (lines 647, 671, 675), and setMirrorAccessControllerRepository(pm, executor) runs in both the non-replicating return path (line 648) and after the try/catch (line 677). Consolidating into a single tail block reduces the chance of one path forgetting to run them when this method evolves.

♻️ Sketch
         final ServerStatus initialServerStatus = statusManager.serverStatus();
         executor.setWritable(initialServerStatus.writable());
-        if (!initialServerStatus.replicating()) {
-            projectInitializer.initializeInReadOnlyMode();
-            repoStatusManager.initialize();
-            setMirrorAccessControllerRepository(pm, executor);
-            return executor;
-        }
-        try {
-            final CompletableFuture<Void> startFuture = executor.start();
-            while (!startFuture.isDone()) {
-                if (numPendingStopRequests.get() > 0) {
-                    // Stop request has been issued.
-                    executor.stop().get();
-                    break;
-                }
-                try {
-                    startFuture.get(100, TimeUnit.MILLISECONDS);
-                } catch (TimeoutException unused) {
-                    // Taking long time ..
+        if (!initialServerStatus.replicating()) {
+            projectInitializer.initializeInReadOnlyMode();
+        } else {
+            try {
+                final CompletableFuture<Void> startFuture = executor.start();
+                while (!startFuture.isDone()) {
+                    if (numPendingStopRequests.get() > 0) {
+                        executor.stop().get();
+                        break;
+                    }
+                    try {
+                        startFuture.get(100, TimeUnit.MILLISECONDS);
+                    } catch (TimeoutException unused) {
+                        // Taking long time ..
+                    }
                 }
+                startFuture.get();
+                projectInitializer.initialize();
+            } catch (Exception e) {
+                logger.warn("Failed to start the command executor. Entering read-only.", e);
+                projectInitializer.initializeInReadOnlyMode();
             }
-            // Trigger the exception if any.
-            startFuture.get();
-            projectInitializer.initialize();
-            repoStatusManager.initialize();
-        } catch (Exception e) {
-            logger.warn("Failed to start the command executor. Entering read-only.", e);
-            projectInitializer.initializeInReadOnlyMode();
-            repoStatusManager.initialize();
         }
+        repoStatusManager.initialize();
         setMirrorAccessControllerRepository(pm, executor);
         return executor;

(Note: only adopt this if the ordering concern raised in the other comment is resolved — i.e. it is fine to defer repoStatusManager.initialize() until after executor.start() in the non-replicating path as well.)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/main/java/com/linecorp/centraldogma/server/CentralDogma.java`
around lines 645 - 678, Consolidate duplicated cleanup calls by removing the
early invocations of repoStatusManager.initialize() and
setMirrorAccessControllerRepository(pm, executor) inside the non-replicating
branch and inside the try/catch, and instead call repoStatusManager.initialize()
and setMirrorAccessControllerRepository(pm, executor) once at the end just
before returning executor; ensure the non-replicating path no longer returns
early (remove the premature return in the !initialServerStatus.replicating()
branch) so that the single tail block always runs, and keep the existing
executor.setWritable(initialServerStatus.writable()) and
projectInitializer.initialize()/initializeInReadOnlyMode() logic intact (adjust
projectInitializer.initializeInReadOnlyMode() call to remain in the
non-replicating path as currently done).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@it/xds-k8s-node-ip-extractor/build.gradle`:
- Line 6: Remove the unnecessary test dependency on armeria.grpc by deleting or
replacing the line that adds testImplementation libs.armeria.grpc; tests only
use Armeria HTTP types (e.g., AggregatedHttpResponse, HttpStatus,
RequestHeaders) and a grpc-status header check so the com.linecorp.armeria.grpc
artifact is not required—ensure tests still compile using existing
armeria.junit5 or other Armeria HTTP dependencies instead of adding
armeria.grpc.

In
`@server/src/main/java/com/linecorp/centraldogma/server/internal/replication/ReplicationLogContext.java`:
- Around line 65-81: equals() compares the byte[] field bytes by content
(Objects.deepEquals), but hashCode() uses Objects.hash(...) which uses the
byte[] identity; update hashCode in ReplicationLogContext to compute a
content-based hash for bytes (e.g., int bytesHash = Arrays.hashCode(bytes)) and
include that int in the returned hash (for example: return
Objects.hash(replayRevision, meta, log, bytesHash)); also add the necessary
import for java.util.Arrays. Ensure you modify the hashCode() method only and
keep equals() as-is.

In
`@server/src/test/java/com/linecorp/centraldogma/server/internal/replication/ZooKeeperScopedReadonlyIntegrationTest.java`:
- Around line 614-619: The test performs an unsafe cast in start(PluginContext):
ZooKeeperCommandExecutor.unwrap() is returning the delegate from Replica which
is an anonymous AbstractCommandExecutor, so casting to StandaloneCommandExecutor
can throw ClassCastException; to fix, either stop casting and store the
unwrapped executor in a compatible type (e.g., AbstractCommandExecutor or
CommandExecutor) and adjust uses in ZooKeeperScopedReadonlyIntegrationTest to
work with that API, or modify the Replica test setup so its delegate is an
actual StandaloneCommandExecutor implementation; update the start(PluginContext)
code (and/or Replica initialization) to consistently use the same executor type
rather than forcing StandaloneCommandExecutor.

In
`@testing-internal/src/main/java/com/linecorp/centraldogma/testing/internal/ProjectManagerExtension.java`:
- Line 89: The ServerStatusManager field is being re-created after the executor
is constructed, causing two different instances; move the initialization of the
serverStatusManager field (the ServerStatusManager constructor call) so it
occurs before newCommandExecutor(...) is invoked, and modify newCommandExecutor
to use the existing serverStatusManager field instead of constructing a new
ServerStatusManager instance; ensure the public serverStatusManager() getter and
any places that previously relied on a local/new instance now reference the
single shared serverStatusManager field.
- Around line 172-176: The method currently creates a local ServerStatusManager
(final ServerStatusManager serverStatusManager = new
ServerStatusManager(dataDir)) and passes it into the StandaloneCommandExecutor,
causing two distinct instances instead of reusing the field initialized earlier
in before(); replace the local instantiation with the existing field
serverStatusManager so StandaloneCommandExecutor, RepoStatusManager, and any
other consumers use the single shared ServerStatusManager instance already set
up in before().

---

Outside diff comments:
In `@server/src/main/java/com/linecorp/centraldogma/server/CentralDogma.java`:
- Around line 644-678: The repoStatusManager.initialize() must be called before
starting the executor so persisted repo statuses are loaded into the cache prior
to accepting commands; move the call to repoStatusManager.initialize() to run
before invoking executor.start() in the replicating branch (the block that
currently calls executor.start(), waits on startFuture, then calls
projectInitializer.initialize() and repoStatusManager.initialize()), ensuring
the order is repoStatusManager.initialize(), projectInitializer.initialize() (as
appropriate), then start the executor (executor.start()) and proceed with the
existing start/wait/stop logic and finally
setMirrorAccessControllerRepository(pm, executor).

---

Nitpick comments:
In `@server/src/main/java/com/linecorp/centraldogma/server/CentralDogma.java`:
- Around line 645-678: Consolidate duplicated cleanup calls by removing the
early invocations of repoStatusManager.initialize() and
setMirrorAccessControllerRepository(pm, executor) inside the non-replicating
branch and inside the try/catch, and instead call repoStatusManager.initialize()
and setMirrorAccessControllerRepository(pm, executor) once at the end just
before returning executor; ensure the non-replicating path no longer returns
early (remove the premature return in the !initialServerStatus.replicating()
branch) so that the single tail block always runs, and keep the existing
executor.setWritable(initialServerStatus.writable()) and
projectInitializer.initialize()/initializeInReadOnlyMode() logic intact (adjust
projectInitializer.initializeInReadOnlyMode() call to remain in the
non-replicating path as currently done).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b84f0604-e96f-4cc2-ad7c-e9becac91db8

📥 Commits

Reviewing files that changed from the base of the PR and between 458ca6c and dbbceab.

📒 Files selected for processing (52)
  • .gitignore
  • common/src/main/java/com/linecorp/centraldogma/common/ReplicationStatus.java
  • common/src/main/java/com/linecorp/centraldogma/common/RepositoryStatus.java
  • common/src/main/java/com/linecorp/centraldogma/common/jsonpatch/JsonPatchOperation.java
  • common/src/main/java/com/linecorp/centraldogma/internal/api/v1/RepositoryDto.java
  • it/xds-k8s-node-ip-extractor/build.gradle
  • server/src/main/java/com/linecorp/centraldogma/server/CentralDogma.java
  • server/src/main/java/com/linecorp/centraldogma/server/command/Command.java
  • server/src/main/java/com/linecorp/centraldogma/server/command/CommandExecutorStatusManager.java
  • server/src/main/java/com/linecorp/centraldogma/server/command/CommandType.java
  • server/src/main/java/com/linecorp/centraldogma/server/command/ProjectCommand.java
  • server/src/main/java/com/linecorp/centraldogma/server/command/StandaloneCommandExecutor.java
  • server/src/main/java/com/linecorp/centraldogma/server/command/UpdateProjectStatusCommand.java
  • server/src/main/java/com/linecorp/centraldogma/server/command/UpdateRepositoryStatusCommand.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/api/DtoConverter.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/api/RepositoryServiceV1.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/api/UpdateRepositoryStatusRequest.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/api/sysadmin/MirrorAccessControlRequest.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/api/sysadmin/ServerStatusService.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/api/variable/Templater.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/api/variable/VariableServiceV1.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/management/RepoStatusManager.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/management/RepositoryState.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/management/ServerStatusManager.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/management/package-info.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/mirror/DefaultMirrorAccessController.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/replication/LogMeta.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/replication/ReplicationException.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/replication/ReplicationLogContext.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/replication/ScopedReadOnlyAppliedException.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/replication/ZooKeeperCommandExecutor.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/storage/repository/crud/CrudContext.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/storage/repository/crud/CrudOperation.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/storage/repository/crud/CrudRepository.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/storage/repository/crud/ReplicatingCrudOperation.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/storage/repository/crud/ReplicatingCrudRepository.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/storage/repository/crud/StandaloneCrudOperation.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/storage/repository/crud/StandaloneCrudRepository.java
  • server/src/main/java/com/linecorp/centraldogma/server/internal/storage/repository/crud/package-info.java
  • server/src/main/java/com/linecorp/centraldogma/server/metadata/MetadataService.java
  • server/src/main/java/com/linecorp/centraldogma/server/metadata/RepositoryMetadata.java
  • server/src/test/java/com/linecorp/centraldogma/server/internal/api/RepositoryServiceV1Test.java
  • server/src/test/java/com/linecorp/centraldogma/server/internal/api/auth/RequiresRoleTest.java
  • server/src/test/java/com/linecorp/centraldogma/server/internal/management/RepoStatusManagerTest.java
  • server/src/test/java/com/linecorp/centraldogma/server/internal/replication/ZooKeeperCommandExecutorTest.java
  • server/src/test/java/com/linecorp/centraldogma/server/internal/replication/ZooKeeperScopedReadOnlyTest.java
  • server/src/test/java/com/linecorp/centraldogma/server/internal/replication/ZooKeeperScopedReadonlyIntegrationTest.java
  • server/src/test/java/com/linecorp/centraldogma/server/internal/storage/repository/git/ReplicatingCrudRepositoryTest.java
  • server/src/test/java/com/linecorp/centraldogma/server/management/ServerStatusManagerTest.java
  • server/src/test/java/com/linecorp/centraldogma/server/metadata/MetadataServiceTest.java
  • testing-internal/src/main/java/com/linecorp/centraldogma/testing/internal/CrudRepositoryExtension.java
  • testing-internal/src/main/java/com/linecorp/centraldogma/testing/internal/ProjectManagerExtension.java
💤 Files with no reviewable changes (2)
  • server/src/test/java/com/linecorp/centraldogma/server/metadata/MetadataServiceTest.java
  • server/src/main/java/com/linecorp/centraldogma/server/metadata/MetadataService.java

Comment thread it/xds-k8s-node-ip-extractor/build.gradle
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/src/main/java/com/linecorp/centraldogma/server/internal/replication/ZooKeeperCommandExecutor.java (1)

896-947: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace assert with a hard null-check or make loadLog non-nullable.

loadLog(...) is declared @Nullable (line 1206), so static analyzers will rightfully assume logContext here can be null, yet logContext.log() on line 903 would NPE if it ever were. The assert on line 904 is a no-op in production (assertions are off by default), so it provides no real protection. Either drop @Nullable from loadLog (its body either returns a non-null context or throws) or guard with an explicit requireNonNull / if (logContext == null) throw ... here. See the related comment on loadLog.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@server/src/main/java/com/linecorp/centraldogma/server/internal/replication/ZooKeeperCommandExecutor.java`
around lines 896 - 947, The assert on logContext in ZooKeeperCommandExecutor is
ineffective in production because loadLog(...) is annotated `@Nullable`; replace
the assert with an explicit null-check or make loadLog non-nullable: after
calling ReplicationLogContext logContext = loadLog(nextRevision) validate
logContext (e.g., Objects.requireNonNull(logContext, "loadLog returned null for
revision " + nextRevision) or if (logContext == null) throw new
ReplicationException("missing log for revision " + nextRevision, logContext));
ensure you reference the same ReplicationLogContext variable and throw a
ReplicationException (or similar) so subsequent code using logContext.log() /
l.command() is safe.
🧹 Nitpick comments (2)
server/src/main/java/com/linecorp/centraldogma/server/internal/replication/ZooKeeperCommandExecutor.java (2)

1228-1234: 💤 Low value

Redundant setBytes call before the decompression branch.

logContext.setBytes(bytes) on line 1228 stores the still-compressed bytes, and line 1234 unconditionally overwrites it with bytes again (decompressed when compressed == TRUE, identical otherwise). The first call is dead — either remove it or move the assignment to occur only once, after the conditional decompression.

♻️ Proposed fix
             assert logMeta.size() == offset;
-            logContext.setBytes(bytes);
 
             final Boolean compressed = logMeta.compressed();
             if (Boolean.TRUE.equals(compressed)) {
                 bytes = Zstd.decompress(bytes);
             }
             logContext.setBytes(bytes);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@server/src/main/java/com/linecorp/centraldogma/server/internal/replication/ZooKeeperCommandExecutor.java`
around lines 1228 - 1234, The call logContext.setBytes(bytes) before checking
compressed is redundant because bytes is overwritten after potential
decompression; in ZooKeeperCommandExecutor adjust the logic to set logContext
bytes only once after handling compression: remove the early
logContext.setBytes(bytes) (or move that assignment to after the if
(Boolean.TRUE.equals(logMeta.compressed())) { bytes = Zstd.decompress(bytes); })
so that logContext.setBytes(bytes) is invoked only once with the final byte
array; reference variables: logContext, logMeta, compressed, and
Zstd.decompress.

1206-1240: ⚡ Quick win

@Nullable is inconsistent with the implementation.

loadLog never returns null: it either returns the populated logContext (line 1237) or throws a ReplicationException (line 1239). The @Nullable annotation misleads callers (e.g. replayLogs at lines 902-904, which uses an assert rather than a real null check) and any static-analysis tooling. Drop the annotation, which also lets you remove the assert l != null at line 904.

♻️ Proposed fix
-    `@Nullable`
     `@VisibleForTesting`
     ReplicationLogContext loadLog(long revision) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@server/src/main/java/com/linecorp/centraldogma/server/internal/replication/ZooKeeperCommandExecutor.java`
around lines 1206 - 1240, The method loadLog currently never returns null (it
always returns a ReplicationLogContext or throws), so remove the misleading
`@Nullable` annotation on loadLog (and any related import), update callers like
replayLogs which currently do an assert l != null to rely on the non-null return
(remove that assert and any defensive null-branches), and adjust any
static-analysis or javadoc if present to reflect that loadLog, which returns
ReplicationLogContext and populates logContext, is non-null and will throw
ReplicationException on failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@server/src/main/java/com/linecorp/centraldogma/server/internal/replication/ZooKeeperCommandExecutor.java`:
- Around line 1180-1191: The current directExecution branch calls
delegate.execute(ExecutionContext.empty(), command).get() while holding the
per-path InterProcessMutex, which can block indefinitely; change this to use a
bounded wait (e.g., delegate.execute(...).get(timeout, unit)) and handle
TimeoutException by logging the timeout (include scope and command details),
calling stopLater(), and avoiding storing the ReplicationLog; ensure other
exceptions are still caught by the existing catch block so the existing
logger.error and stopLater() behavior remains for failures.
- Around line 1178-1179: The logger.error message in ZooKeeperCommandExecutor
currently concatenates two string literals without a separator and contains a
typo — change the message passed to logger.error (the string around the
logger.error(...) call) to include a separator (e.g., a space or comma/newline)
between sentences and correct "attemping" to "attempting", preserving the SLF4J
placeholders and the trailing Throwable argument (i.e., the message should read
something like "Failed to replicate a log; attempting to switch to read-only
mode. scope: {}, logContext: {}" and leave the exception as the final
parameter).
- Around line 1145-1153: In handleReplicationFailure in class
ZooKeeperCommandExecutor, update the logger.error call that currently logs "Fail
to replicate a log; entering read-only mode." to use the correct past-tense
wording "Failed to replicate a log; entering read-only mode." so it matches
other messages; locate the logger.error(...) invocation inside the
handleReplicationFailure(ReplicationException exception, boolean
directExecution) method and change the message string accordingly.

---

Outside diff comments:
In
`@server/src/main/java/com/linecorp/centraldogma/server/internal/replication/ZooKeeperCommandExecutor.java`:
- Around line 896-947: The assert on logContext in ZooKeeperCommandExecutor is
ineffective in production because loadLog(...) is annotated `@Nullable`; replace
the assert with an explicit null-check or make loadLog non-nullable: after
calling ReplicationLogContext logContext = loadLog(nextRevision) validate
logContext (e.g., Objects.requireNonNull(logContext, "loadLog returned null for
revision " + nextRevision) or if (logContext == null) throw new
ReplicationException("missing log for revision " + nextRevision, logContext));
ensure you reference the same ReplicationLogContext variable and throw a
ReplicationException (or similar) so subsequent code using logContext.log() /
l.command() is safe.

---

Nitpick comments:
In
`@server/src/main/java/com/linecorp/centraldogma/server/internal/replication/ZooKeeperCommandExecutor.java`:
- Around line 1228-1234: The call logContext.setBytes(bytes) before checking
compressed is redundant because bytes is overwritten after potential
decompression; in ZooKeeperCommandExecutor adjust the logic to set logContext
bytes only once after handling compression: remove the early
logContext.setBytes(bytes) (or move that assignment to after the if
(Boolean.TRUE.equals(logMeta.compressed())) { bytes = Zstd.decompress(bytes); })
so that logContext.setBytes(bytes) is invoked only once with the final byte
array; reference variables: logContext, logMeta, compressed, and
Zstd.decompress.
- Around line 1206-1240: The method loadLog currently never returns null (it
always returns a ReplicationLogContext or throws), so remove the misleading
`@Nullable` annotation on loadLog (and any related import), update callers like
replayLogs which currently do an assert l != null to rely on the non-null return
(remove that assert and any defensive null-branches), and adjust any
static-analysis or javadoc if present to reflect that loadLog, which returns
ReplicationLogContext and populates logContext, is non-null and will throw
ReplicationException on failure.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d16fe29b-dfa2-4b60-a1d6-7ab13e9e302c

📥 Commits

Reviewing files that changed from the base of the PR and between dbbceab and d49be35.

📒 Files selected for processing (2)
  • server/src/main/java/com/linecorp/centraldogma/server/internal/replication/ZooKeeperCommandExecutor.java
  • server/src/test/java/com/linecorp/centraldogma/server/internal/replication/ZooKeeperCommandExecutorTest.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/src/test/java/com/linecorp/centraldogma/server/internal/replication/ZooKeeperCommandExecutorTest.java

@ikhoon ikhoon added this to the 0.84.0 milestone May 20, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/src/main/java/com/linecorp/centraldogma/server/CentralDogma.java`:
- Around line 1436-1440: The leader-scoped PluginGroup instances
(pluginsForLeaderOnly and pluginsForZoneLeaderOnly) are being closed before the
command executor is stopped, which prevents their stop(...) callbacks from
reliably submitting work; move the calls that close pluginsForLeaderOnly and
pluginsForZoneLeaderOnly to after CentralDogma.this.doStop()/the command
executor shutdown so their release-leadership callbacks can submit stop tasks to
the executor and run to completion.

In `@server/src/main/java/com/linecorp/centraldogma/server/PluginGroup.java`:
- Around line 182-199: The close() method in PluginGroup currently calls
executor.shutdown() and loops on awaitTermination indefinitely, which can hang;
change it to perform a bounded graceful shutdown: call executor.shutdown(), wait
up to a configurable timeout (e.g., total wait of N seconds via repeated
executor.awaitTermination calls or a single awaitTermination), and if the
executor does not terminate within that timeout call executor.shutdownNow() to
cancel stuck tasks; preserve the InterruptedException handling (set interrupted
flag and re-interrupt thread in finally), and update the logger message to
reflect escalation (use executor.shutdownNow() result if helpful).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cfd0a5d9-8b00-432f-95eb-532578755885

📥 Commits

Reviewing files that changed from the base of the PR and between fb263b8 and 4c09857.

📒 Files selected for processing (2)
  • server/src/main/java/com/linecorp/centraldogma/server/CentralDogma.java
  • server/src/main/java/com/linecorp/centraldogma/server/PluginGroup.java

Comment thread server/src/main/java/com/linecorp/centraldogma/server/CentralDogma.java Outdated
Comment on lines +182 to +199
void close() {
executor.shutdown();
boolean interrupted = false;
try {
while (!executor.isTerminated()) {
try {
executor.awaitTermination(1, TimeUnit.SECONDS);
} catch (InterruptedException ignore) {
interrupted = true;
}
}
} catch (Exception cause) {
logger.warn("Failed to shutdown the {}:", executor, cause);
} finally {
if (interrupted) {
Thread.currentThread().interrupt();
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Bound close() so a stuck plugin task can't hang shutdown.

shutdown() plus an unbounded awaitTermination() loop means one blocked plugin task can stall server shutdown forever here. This needs a timeout and a shutdownNow() fallback, especially now that CentralDogma calls close() on every plugin group during stop.

⏱️ Suggested shutdown escalation
     void close() {
         executor.shutdown();
         boolean interrupted = false;
         try {
-            while (!executor.isTerminated()) {
-                try {
-                    executor.awaitTermination(1, TimeUnit.SECONDS);
-                } catch (InterruptedException ignore) {
-                    interrupted = true;
-                }
+            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
+                executor.shutdownNow();
+                if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
+                    logger.warn("Timed out shutting down {}", executor);
+                }
             }
+        } catch (InterruptedException cause) {
+            interrupted = true;
+            executor.shutdownNow();
         } catch (Exception cause) {
             logger.warn("Failed to shutdown the {}:", executor, cause);
         } finally {
             if (interrupted) {
                 Thread.currentThread().interrupt();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
void close() {
executor.shutdown();
boolean interrupted = false;
try {
while (!executor.isTerminated()) {
try {
executor.awaitTermination(1, TimeUnit.SECONDS);
} catch (InterruptedException ignore) {
interrupted = true;
}
}
} catch (Exception cause) {
logger.warn("Failed to shutdown the {}:", executor, cause);
} finally {
if (interrupted) {
Thread.currentThread().interrupt();
}
}
void close() {
executor.shutdown();
boolean interrupted = false;
try {
if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
executor.shutdownNow();
if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
logger.warn("Timed out shutting down {}", executor);
}
}
} catch (InterruptedException cause) {
interrupted = true;
executor.shutdownNow();
} catch (Exception cause) {
logger.warn("Failed to shutdown the {}:", executor, cause);
} finally {
if (interrupted) {
Thread.currentThread().interrupt();
}
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/main/java/com/linecorp/centraldogma/server/PluginGroup.java`
around lines 182 - 199, The close() method in PluginGroup currently calls
executor.shutdown() and loops on awaitTermination indefinitely, which can hang;
change it to perform a bounded graceful shutdown: call executor.shutdown(), wait
up to a configurable timeout (e.g., total wait of N seconds via repeated
executor.awaitTermination calls or a single awaitTermination), and if the
executor does not terminate within that timeout call executor.shutdownNow() to
cancel stuck tasks; preserve the InterruptedException handling (set interrupted
flag and re-interrupt thread in finally), and update the logger message to
reflect escalation (use executor.shutdownNow() result if helpful).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant