Skip to content

Conversation

@sofiaz11
Copy link
Collaborator

@sofiaz11 sofiaz11 commented Dec 17, 2025

Problem Statement

[controller] Skip truncating parent VT based on ConcurrentPushDetectionStrategy in the DeferredVersionSwapService.
We still try to truncate zombie parent VT during defer swap, this skip helps remove the unnecessary truncation.

Related PR: #2305

Solution

Code changes

  • Added new code behind a config. If so list the config names and their default values in the PR description.
  • Introduced new log lines.
  • Confirmed if logs need to be rate limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.
  • New integration tests added.
  • Modified or extended existing tests.
  • Verified backward compatibility (if applicable).

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

@sofiaz11 sofiaz11 marked this pull request as ready for review December 18, 2025 03:12
Copilot AI review requested due to automatic review settings December 18, 2025 03:12
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds conditional logic to skip truncating parent version topics (VT) during deferred version swap operations based on the ConcurrentPushDetectionStrategy. The change prevents unnecessary truncation of zombie parent VTs when the strategy is set to PARENT_VERSION_STATUS_ONLY, which doesn't require topic writes.

Key changes:

  • Modified DeferredVersionSwapService.updateStore() to check if topic write is needed before attempting truncation
  • Added test verifications to ensure truncation is skipped when appropriate strategy is configured
  • Updated test setup to configure ConcurrentPushDetectionStrategy in multiple test scenarios

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
services/venice-controller/src/main/java/com/linkedin/venice/controller/DeferredVersionSwapService.java Added logic to conditionally skip parent VT truncation based on ConcurrentPushDetectionStrategy.isTopicWriteNeeded()
services/venice-controller/src/test/java/com/linkedin/venice/controller/TestDeferredVersionSwapService.java Added test setup to configure PARENT_VERSION_STATUS_ONLY strategy and verify truncation is not called
services/venice-controller/src/test/java/com/linkedin/venice/controller/TestDeferredVersionSwapServiceWithSequentialRollout.java Added strategy configuration and verification for two test scenarios to ensure truncation behavior matches strategy

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// skip truncating if the topic was not created based on ConcurrentPushDetectionStrategy
if (strategy.isTopicWriteNeeded() && !veniceParentHelixAdmin.isTopicTruncated(kafkaTopicName)) {
LOGGER.info("Truncating parent VT for {}", kafkaTopicName);
veniceParentHelixAdmin.truncateKafkaTopic(Version.composeKafkaTopic(storeName, targetVersionNum));
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Kafka topic name is being recomputed using Version.composeKafkaTopic(storeName, targetVersionNum) even though it was already computed on line 1228 and stored in the kafkaTopicName variable. Consider using the kafkaTopicName variable instead to avoid redundant computation and improve code maintainability.

Suggested change
veniceParentHelixAdmin.truncateKafkaTopic(Version.composeKafkaTopic(storeName, targetVersionNum));
veniceParentHelixAdmin.truncateKafkaTopic(kafkaTopicName);

Copilot uses AI. Check for mistakes.
if (!veniceParentHelixAdmin.isTopicTruncated(kafkaTopicName)) {
ConcurrentPushDetectionStrategy strategy =
veniceControllerMultiClusterConfig.getControllerConfig(clusterName).getConcurrentPushDetectionStrategy();
// skip truncating if the topic was not created based on ConcurrentPushDetectionStrategy
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment could be more precise. It currently says "skip truncating if the topic was not created" but the logic checks if isTopicWriteNeeded() returns true. Consider revising to: "skip truncating if the topic was not created (when strategy doesn't require topic writes)" to better explain the relationship between the strategy and topic creation.

Suggested change
// skip truncating if the topic was not created based on ConcurrentPushDetectionStrategy
// Skip truncating if the topic was not created, i.e., when the configured strategy does not require topic writes.

Copilot uses AI. Check for mistakes.
Comment on lines +603 to +604
doReturn(ConcurrentPushDetectionStrategy.TOPIC_BASED_ONLY).when(clusterConfig).getConcurrentPushDetectionStrategy();
doReturn(false).when(admin).isTopicTruncated(anyString());
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test sets ConcurrentPushDetectionStrategy to TOPIC_BASED_ONLY (line 603) and mocks isTopicTruncated to return false (line 604), which according to the code logic in DeferredVersionSwapService.java line 1232 would trigger truncation (since isTopicWriteNeeded() returns true for TOPIC_BASED_ONLY and the topic is not truncated). However, line 616 expects that truncateKafkaTopic is never called. This test may not be properly validating the intended behavior. Consider either changing the strategy to PARENT_VERSION_STATUS_ONLY if the intent is to test that truncation doesn't happen, or adjusting the expectation to verify that truncation does happen with TOPIC_BASED_ONLY strategy.

Copilot uses AI. Check for mistakes.
// Verify error recording was called due to the failure
verify(store, atLeastOnce()).updateVersionStatus(2, VersionStatus.PARTIALLY_ONLINE);
verify(admin, never()).rollForwardToFutureVersion(clusterName, storeName, region3);
verify(admin, never()).truncateKafkaTopic(anyString(), anyInt());
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The verify call is checking for the wrong method signature. The actual code in DeferredVersionSwapService.java line 1234 calls truncateKafkaTopic(String topicName) with one parameter, but this test is verifying the two-parameter overload truncateKafkaTopic(anyString(), anyInt()). This means the test would not detect if the one-parameter version is called. Change this to verify(admin, never()).truncateKafkaTopic(anyString()); to match the actual method being called.

Suggested change
verify(admin, never()).truncateKafkaTopic(anyString(), anyInt());
verify(admin, never()).truncateKafkaTopic(anyString());

Copilot uses AI. Check for mistakes.
TestUtils.waitForNonDeterministicAssertion(5, TimeUnit.SECONDS, () -> {
// Verify that updateStore was called to mark parent version as ONLINE
verify(store, atLeastOnce()).updateVersionStatus(versionTwo, VersionStatus.ONLINE);
verify(admin, never()).truncateKafkaTopic(anyString(), anyInt());
Copy link

Copilot AI Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The verify call is checking for the wrong method signature. The actual code in DeferredVersionSwapService.java line 1234 calls truncateKafkaTopic(String topicName) with one parameter, but this test is verifying the two-parameter overload truncateKafkaTopic(anyString(), anyInt()). This means the test would not detect if the one-parameter version is called. Change this to verify(admin, never()).truncateKafkaTopic(anyString()); to match the actual method being called.

Suggested change
verify(admin, never()).truncateKafkaTopic(anyString(), anyInt());
verify(admin, never()).truncateKafkaTopic(anyString());

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant