
@arjun4084346 (Contributor) commented Nov 4, 2025

Problem Statement

This pull request extends the RT (real-time topic) versioning introduced for user stores in PR#1555 and PR#1657 to system stores.
To find the right RT topic name, we now need the Store object of a system store, which is why we have added a unified store resolver for system store access in various components.

System Store Resolution Improvements

  • Added a getStore method to DaVinciBackend to correctly resolve system stores and user stores, improving how system store attributes are accessed.

System Store Topic Handling

  • Modified logic in StoreIngestionTask to use the store resolver for system store topic name resolution, ensuring correct topic handling for meta stores.
  • Updated utility usage to fetch real-time topic names from Store objects rather than raw strings, improving correctness.
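A minimal sketch of what "fetch the RT topic name from the Store object" means, under simplified assumptions: StoreView is a hypothetical stand-in for Venice's Store, not its real API. The storeName_rt fallback for stores without RT versioning is the behavior the author describes later in this thread; the _v<N> suffix for versioned topics and the "version <= 0 means unversioned" rule are illustrative assumptions only.

```java
import java.util.Objects;

// Hypothetical sketch: StoreView stands in for Venice's Store; naming rules are assumptions.
final class RtTopicNaming {
  static final String RT_SUFFIX = "_rt";

  static final class StoreView {
    final String name;
    // Assumption: a value <= 0 means the store does not use RT versioning.
    final int largestUsedRtVersion;

    StoreView(String name, int largestUsedRtVersion) {
      this.name = Objects.requireNonNull(name, "name");
      this.largestUsedRtVersion = largestUsedRtVersion;
    }
  }

  // Derives the RT topic name from the store object rather than from a bare name string.
  static String realTimeTopicName(StoreView store) {
    if (store.largestUsedRtVersion <= 0) {
      return store.name + RT_SUFFIX; // fallback: "storeName_rt"
    }
    // Hypothetical versioned format, for illustration only.
    return store.name + RT_SUFFIX + "_v" + store.largestUsedRtVersion;
  }
}
```

Deriving the name from the object keeps the topic name and the RT version on the same metadata snapshot, instead of each caller re-composing the name from a raw string.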

Solution

Code changes

  • Added new code behind a config. If so, list the config names and their default values in the PR description.
  • Introduced new log lines.
    • Confirmed whether logs need to be rate-limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.
  • New integration tests added.
  • Modified or extended existing tests.
  • Verified backward compatibility (if applicable).

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

Copilot AI review requested due to automatic review settings November 4, 2025 23:35
@arjun4084346 arjun4084346 marked this pull request as draft November 4, 2025 23:35

Copilot AI left a comment

Pull Request Overview

This PR appears to be a work-in-progress change that modifies real-time topic handling logic and adds debugging code. The main changes include updating a log message to include the real-time topic name, modifying the composeRealTimeTopic method to delegate to its versioned counterpart, and temporarily altering the isRTVersioningApplicable logic.

  • Modified composeRealTimeTopic(String) to delegate to the versioned method with version 1
  • Changed isRTVersioningApplicable to always return true instead of checking system store types
  • Added debugging statements with System.out.println calls

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

  • VeniceHelixAdmin.java: Enhanced log message to include the real-time topic name for better debugging
  • Utils.java: Modified real-time topic composition and versioning applicability logic with commented-out code and debug statements
  • MetaStoreWriter.java: Added unused variable declaration for old topic name
  • PubSubTopicImpl.java: Added debugging code with empty print statement for specific topic pattern


@arjun4084346 arjun4084346 force-pushed the rtsystemstore branch 7 times, most recently from e3f7077 to 6498e12 November 13, 2025 11:04
@arjun4084346 arjun4084346 marked this pull request as ready for review November 13, 2025 11:04
Copilot AI review requested due to automatic review settings November 13, 2025 11:04

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 15 comments.



@arjun4084346 arjun4084346 changed the title [controller] extend rt versioning to system stores [controller] [server] extend rt versioning to system stores Nov 13, 2025
Copilot AI review requested due to automatic review settings November 13, 2025 21:56

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 15 comments.



return storeRepository;
}

public final Object getStore(String storeName) {
Contributor

Having this logic scattered here is not great IMO.
I think the main purpose is to extract the push status store's RT version, right? Then can we instead pass the ReadOnlyStoreRepository interface into the PushStatusStoreWriter constructor and do user store object -> system store info -> largest RT version when preparing the VW (VeniceWriter)? I think this will keep the logic hidden inside the corresponding object.
The same comment applies to MetaStoreWriter.
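The reviewer's suggestion can be sketched roughly as follows. StoreLookup, UserStore and PushStatusWriterSketch are hypothetical stand-ins, not Venice's real ReadOnlyStoreRepository or PushStatusStoreWriter: the writer receives a read-only lookup in its constructor and walks user store -> system store info -> largest RT version itself whenever it needs a topic. The versioned topic format is again an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the reviewer's suggestion; none of these types are Venice's real API.
final class WriterSketch {
  // Minimal read-only lookup, standing in for a store repository.
  interface StoreLookup {
    UserStore getStore(String storeName);
  }

  // Simplified user store: system store name -> largest used RT version.
  static final class UserStore {
    final Map<String, Integer> systemStoreRtVersions = new HashMap<>();
  }

  static final class PushStatusWriterSketch {
    private final StoreLookup repository;

    PushStatusWriterSketch(StoreLookup repository) {
      this.repository = repository;
    }

    // Walks user store -> system store info -> largest RT version when a writer is prepared.
    String topicFor(String userStoreName, String systemStoreName) {
      UserStore userStore = repository.getStore(userStoreName);
      Integer version = userStore == null ? null : userStore.systemStoreRtVersions.get(systemStoreName);
      if (version == null || version <= 0) {
        return systemStoreName + "_rt"; // unversioned fallback
      }
      return systemStoreName + "_rt_v" + version; // hypothetical versioned format
    }
  }
}
```

Callers then pass only store names, and the lookup logic lives in one place per writer rather than in the component that constructs it.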

Contributor

I read the code again. Even after our offline discussion, I think this method is still not great because (1) it is for internal use only, so we should not expose it as public and leave it in this class, and (2) its usage is limited.
Also, I see that the meta store writer and the push status store writer have duplicate logic for getting the RT topic from the Store-like object.
Can we move this method to the Utils class, and also extract that duplicate logic from both classes into a getRealtimeTopicNameFromStore util method?

Store store = storeResolver.apply(metaStoreName);
int largestUsedRTVersionNumber;
VeniceSystemStoreType type = VeniceSystemStoreType.getSystemStoreType(store.getName());
if (type != null && store.isSystemStore()) {
Contributor

If the store is not a system store, should we just return null, since this function is only for system stores?

Contributor Author

This if block is actually there to distinguish between user system stores and ZK-shared system stores.


Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 8 comments.



@linkedin linkedin deleted a comment from Copilot AI Dec 8, 2025
@linkedin linkedin deleted a comment from Copilot AI Dec 8, 2025

Copilot AI left a comment

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 6 comments.



@linkedin linkedin deleted a comment from Copilot AI Dec 9, 2025
@linkedin linkedin deleted a comment from Copilot AI Dec 9, 2025
@linkedin linkedin deleted a comment from Copilot AI Dec 9, 2025
@linkedin linkedin deleted a comment from Copilot AI Dec 9, 2025
@linkedin linkedin deleted a comment from Copilot AI Dec 9, 2025
@linkedin linkedin deleted a comment from Copilot AI Dec 9, 2025
if (storeIngestionTask.isHybridMode() && partitionConsumptionState.isEndOfPushReceived()
&& partitionConsumptionState.getLeaderFollowerState() == LeaderFollowerStateType.LEADER) {
ingestingTopic = pubSubTopicRepository.getTopic(Utils.composeRealTimeTopic(storeName));
ingestingTopic = pubSubTopicRepository.getTopic(Utils.getRealTimeTopicName(store));
Contributor

Does this util come with a fallback? I think it is important to have a fallback inside the util method so that it does not throw an exception in some random place.

Contributor Author

Yes, it does come with a fallback: for stores not using RT versioning, it returns "storeName_rt".

Copilot AI review requested due to automatic review settings January 6, 2026 19:23

Copilot AI left a comment

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 6 comments.




public static String getRealTimeTopicName(Store store) {
if (store instanceof SystemStore) {
return getRealTimeTopicName(store, ((SystemStore) store).getVeniceStore().getLargestUsedRTVersionNumber());

Copilot AI Jan 6, 2026

Potential NullPointerException after casting to SystemStore. If ((SystemStore) store).getVeniceStore() returns null, the chained getLargestUsedRTVersionNumber() call will throw a NullPointerException. Add a null check before accessing the venice store's RT version number.

Suggested change
return getRealTimeTopicName(store, ((SystemStore) store).getVeniceStore().getLargestUsedRTVersionNumber());
SystemStore systemStore = (SystemStore) store;
Store veniceStore = systemStore.getVeniceStore();
int rtVersionNumber =
veniceStore != null ? veniceStore.getLargestUsedRTVersionNumber() : DEFAULT_RT_VERSION_NUMBER;
return getRealTimeTopicName(store, rtVersionNumber);

int largestUsedRTVersionNumber;
VeniceSystemStoreType type = VeniceSystemStoreType.getSystemStoreType(store.getName());
if (type != null && store.isSystemStore()) {
largestUsedRTVersionNumber = ((SystemStore) store).getVeniceStore().getLargestUsedRTVersionNumber();

Copilot AI Jan 6, 2026

Potential NullPointerException after casting to SystemStore. If ((SystemStore) store).getVeniceStore() returns null, the subsequent call to getLargestUsedRTVersionNumber() will fail. Add a null check before accessing the venice store.

Suggested change
largestUsedRTVersionNumber = ((SystemStore) store).getVeniceStore().getLargestUsedRTVersionNumber();
Store veniceStore = ((SystemStore) store).getVeniceStore();
if (veniceStore != null) {
largestUsedRTVersionNumber = veniceStore.getLargestUsedRTVersionNumber();
} else {
largestUsedRTVersionNumber = store.getLargestUsedRTVersionNumber();
}

VeniceSystemStoreType type = VeniceSystemStoreType.getSystemStoreType(newStore.getName());
int newNumber;
if (type == null && newStore.isSystemStore()) {
// Top-level shared ZK store

Copilot AI Jan 6, 2026

The comment on line 1259 is misleading. It says "Top-level shared ZK store" but the condition checks for cases where the type is null AND the store is a system store. This description doesn't clearly explain what kind of stores fall into this category. Consider updating the comment to be more specific about which stores are affected by this condition.

Suggested change
// Top-level shared ZK store
// System store without a specific VeniceSystemStoreType (for example, a shared top-level ZK-backed system store)

Comment on lines +575 to +576
Store userStore = storeRepository.getStore(userStoreName);
Map<String, SystemStoreAttributes> systemStores = userStore.getSystemStores();

Copilot AI Jan 6, 2026

Potential NullPointerException when userStore is null. If the user store doesn't exist in the repository, calling userStore.getSystemStores() will throw a NullPointerException. Add a null check before accessing userStore methods.

}
return null;
} else {
return Utils.getRealTimeTopicName(storeRepository.getStore(storeName));

Copilot AI Jan 6, 2026

Potential NullPointerException when calling Utils.getRealTimeTopicName() with a null store. If storeRepository.getStore(storeName) returns null, this will fail. Add a null check before calling the utility method.

Comment on lines +1876 to +1879
int largestUsedRTVersionNumber =
store.isSystemStore() && VeniceSystemStoreType.getSystemStoreType(store.getName()) != null
? ((SystemStore) store).getVeniceStore().getLargestUsedRTVersionNumber()
: store.getLargestUsedRTVersionNumber();

Copilot AI Jan 6, 2026

Potential NullPointerException after casting to SystemStore. If ((SystemStore) store).getVeniceStore() returns null, the subsequent call to getLargestUsedRTVersionNumber() will fail. Add a null check before accessing the venice store.

Suggested change
int largestUsedRTVersionNumber =
store.isSystemStore() && VeniceSystemStoreType.getSystemStoreType(store.getName()) != null
? ((SystemStore) store).getVeniceStore().getLargestUsedRTVersionNumber()
: store.getLargestUsedRTVersionNumber();
int largestUsedRTVersionNumber = store.getLargestUsedRTVersionNumber();
if (store.isSystemStore() && VeniceSystemStoreType.getSystemStoreType(store.getName()) != null) {
Store veniceStore = ((SystemStore) store).getVeniceStore();
if (veniceStore != null) {
largestUsedRTVersionNumber = veniceStore.getLargestUsedRTVersionNumber();
}
}

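The null checks proposed in the comments above could be consolidated into one helper, sketched here with a hypothetical StoreModel in which parent plays the role of SystemStore.getVeniceStore(). Like the last suggestion, it starts from the store's own RT version and overrides it from the parent only when the parent is present.

```java
// Hypothetical sketch consolidating the suggested null checks; StoreModel is a simplified stand-in.
final class RtVersionSelection {
  static final class StoreModel {
    // True when the name maps to a per-user system store type.
    final boolean userSystemStore;
    // Owning user store (the role of SystemStore.getVeniceStore()); may be null.
    final StoreModel parent;
    final int largestUsedRtVersion;

    StoreModel(boolean userSystemStore, StoreModel parent, int largestUsedRtVersion) {
      this.userSystemStore = userSystemStore;
      this.parent = parent;
      this.largestUsedRtVersion = largestUsedRtVersion;
    }
  }

  // Starts from the store's own value and overrides it from the parent only when one is present.
  static int largestUsedRtVersion(StoreModel store) {
    int version = store.largestUsedRtVersion;
    if (store.userSystemStore && store.parent != null) {
      version = store.parent.largestUsedRtVersion;
    }
    return version;
  }
}
```

Whether to fall back silently or fail fast is a judgment call: the fallback avoids the NullPointerException, but it can also hide a missing parent store and yield an RT topic name different from the one the user store actually uses.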
Copilot AI review requested due to automatic review settings January 6, 2026 22:29

Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 10 comments.



Comment on lines +76 to +82
@BeforeClass
public void init() {
mockZkFactory = mockStatic(ZkClientFactory.class);
ZkClient mockZkClient = mock(ZkClient.class);
mockZkFactory.when(() -> ZkClientFactory.newZkClient(anyString())).thenReturn(mockZkClient);
doNothing().when(mockZkClient).subscribeStateChanges(any(ZkClientStatusStats.class));
}

Copilot AI Jan 6, 2026

Potential test resource leak. The mockZkFactory MockedStatic is created in BeforeClass but never closed in an AfterClass method. MockedStatic instances should be closed to avoid resource leaks and test interference. Add an AfterClass method that calls mockZkFactory.close().

commonConfig.getMetaStoreWriterCloseTimeoutInMS(),
commonConfig.getMetaStoreWriterCloseConcurrency());
commonConfig.getMetaStoreWriterCloseConcurrency(),
storeName -> Utils.getRealTimeTopicName(getStore(discoverCluster(storeName), storeName)));

Copilot AI Jan 6, 2026

Potential NullPointerException if getStore returns null or discoverCluster returns null. When the store or cluster is not found, calling Utils.getRealTimeTopicName with a null Store will cause a NullPointerException. Add null checks within the lambda to handle cases where the store or cluster cannot be found, returning null or a default value.

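One way to make the resolver lambda tolerate a missing cluster or store is sketched below. The discoverCluster function and the nested map of RT versions are hypothetical stand-ins for the admin's lookups, and falling back to the unversioned name (rather than returning null) is one of the two options the comment mentions.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a store-name -> RT-topic resolver that tolerates unknown clusters and stores.
final class SafeResolverSketch {
  static Function<String, String> build(
      Function<String, String> discoverCluster, // may return null for an unknown store
      Map<String, Map<String, Integer>> rtVersionsByCluster) { // cluster -> (store -> RT version)
    return storeName -> {
      String cluster = discoverCluster.apply(storeName);
      Map<String, Integer> stores = cluster == null ? null : rtVersionsByCluster.get(cluster);
      Integer version = stores == null ? null : stores.get(storeName);
      if (version == null || version <= 0) {
        return storeName + "_rt"; // unversioned fallback instead of a NullPointerException
      }
      return storeName + "_rt_v" + version; // hypothetical versioned format
    };
  }
}
```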
currentTimestamp);
getKafkaStoreIngestionService().attemptToPrintIngestionInfoFor(
storeName.getKey(),
metadataRepository.getStore(storeName.getKey()),

Copilot AI Jan 6, 2026

Potential NullPointerException if metadataRepository.getStore returns null. When the store is not found in the repository, calling attemptToPrintIngestionInfoFor with a null Store will cause a NullPointerException when accessing store.getName() on line 1491. Add a null check before calling attemptToPrintIngestionInfoFor or handle the null case within the method.

Comment on lines +577 to +580
for (Map.Entry<String, SystemStoreAttributes> systemStoreEntries: systemStores.entrySet()) {
if (storeName.startsWith(systemStoreEntries.getKey())) {
return Utils.getRealTimeTopicName(systemStoreEntries.getValue());
}

Copilot AI Jan 6, 2026

Fragile string matching using startsWith. The code iterates through all system store entries and uses storeName.startsWith(systemStoreEntries.getKey()) to find a match. This approach is fragile because it relies on prefix matching rather than exact matching. Consider using a direct lookup by comparing the full store name or using a more precise matching strategy to avoid false positives if one system store name is a prefix of another.

Suggested change
for (Map.Entry<String, SystemStoreAttributes> systemStoreEntries: systemStores.entrySet()) {
if (storeName.startsWith(systemStoreEntries.getKey())) {
return Utils.getRealTimeTopicName(systemStoreEntries.getValue());
}
SystemStoreAttributes systemStoreAttributes = systemStores.get(storeName);
if (systemStoreAttributes != null) {
return Utils.getRealTimeTopicName(systemStoreAttributes);

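Whether an exact systemStores.get(storeName) lookup is correct depends on how that map is keyed, which this thread does not show. If the keys are name prefixes (an assumption here), an exact lookup would never match, and a deterministic alternative to first-match startsWith is a longest-prefix match, sketched below with a hypothetical PrefixLookup helper.

```java
import java.util.Map;

// Hypothetical sketch: deterministic longest-prefix lookup over a map keyed by name prefixes.
final class PrefixLookup {
  static <V> V longestPrefixMatch(Map<String, V> byPrefix, String name) {
    String bestKey = null;
    for (String key : byPrefix.keySet()) {
      // Keep the longest key that is a prefix of the name, regardless of iteration order.
      if (name.startsWith(key) && (bestKey == null || key.length() > bestKey.length())) {
        bestKey = key;
      }
    }
    return bestKey == null ? null : byPrefix.get(bestKey);
  }
}
```

Ties cannot occur: two keys of equal length that are both prefixes of the same name must be the same string, so the result does not depend on map iteration order.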
private String getRealTimeTopicName(String storeName) {
VeniceSystemStoreType systemStoreType = VeniceSystemStoreType.getSystemStoreType(storeName);
if (systemStoreType != null) {
// it is a user system store
Contributor

Everything else looks good, but I just found that this assumption might not be correct: there is a batch job heartbeat (BATCH JOB HB) system store, which is a ZK-shared one. I believe we should exclude it.
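A rough sketch of the exclusion the reviewer asks for: only per-user system store types follow the owning user store's RT version, while ZK-shared types such as the batch job heartbeat store are treated like regular stores. The enum, its prefixes, and the zkShared flag are hypothetical and do not reflect VeniceSystemStoreType's actual values.

```java
// Hypothetical sketch: the enum, prefixes, and zkShared flags are illustrative only.
final class SystemStoreKinds {
  enum Kind {
    META("meta_", false),
    PUSH_STATUS("push_status_", false),
    BATCH_JOB_HEARTBEAT("batch_job_hb_", true); // ZK-shared, per the review comment

    final String prefix;
    final boolean zkShared;

    Kind(String prefix, boolean zkShared) {
      this.prefix = prefix;
      this.zkShared = zkShared;
    }
  }

  // Returns the matching kind, or null for a regular user store.
  static Kind kindOf(String storeName) {
    for (Kind kind : Kind.values()) {
      if (storeName.startsWith(kind.prefix)) {
        return kind;
      }
    }
    return null;
  }

  // True only for per-user system stores, whose RT version should follow the owning user store.
  static boolean followsUserStoreRtVersion(String storeName) {
    Kind kind = kindOf(storeName);
    return kind != null && !kind.zkShared;
  }
}
```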

Copilot AI review requested due to automatic review settings January 7, 2026 22:35

Copilot AI left a comment

Pull request overview

Copilot reviewed 30 out of 30 changed files in this pull request and generated 8 comments.



public void testDeleteMetaSystemStore() {
// Test when running in child fabric - should truncate RT topic
when(mockAdmin.isParent()).thenReturn(false);
when(mockStore.getName()).thenReturn(META_STORE_NAME);

Copilot AI Jan 7, 2026

The test now sets mockStore.getName() to return the system store name. This is necessary because the production code now calls Utils.getRealTimeTopicName(systemStore) which requires the store's name. However, the mock should also set up getLargestUsedRTVersionNumber() to ensure the method behaves correctly, especially when RT versioning is enabled.

pubSubTopicRepository,
5000L,
2,
storeName -> Utils.getRealTimeTopicNameForSystemStore(systemStore));

Copilot AI Jan 7, 2026

The test passes a lambda storeName -> Utils.getRealTimeTopicNameForSystemStore(systemStore) as the resolver, but this lambda ignores the storeName parameter and always returns the RT topic name for systemStore. This doesn't match the expected behavior where the resolver should use the provided store name to look up the correct store. The test should either use the storeName parameter or use a different approach to verify the behavior.

Suggested change
storeName -> Utils.getRealTimeTopicNameForSystemStore(systemStore));
storeName -> {
Assert.assertEquals(systemStoreName, storeName);
return Utils.getRealTimeTopicNameForSystemStore(systemStore);
});

pubSubTopicRepository,
5000L,
2,
storeName1 -> Utils.getRealTimeTopicNameForSystemStore(store));

Copilot AI Jan 7, 2026

The test passes a lambda that ignores the storeName parameter and always uses store. This doesn't verify that the resolver correctly uses the store name parameter. The lambda should look up the store based on the provided storeName parameter.

Suggested change
storeName1 -> Utils.getRealTimeTopicNameForSystemStore(store));
storeName1 -> Utils.getRealTimeTopicNameForSystemStore(storeName1));

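For the two test comments above, a resolver that looks up by the name it receives and records each request would let a test check both the returned topic and the name the component asked for. RecordingResolver is a hypothetical test helper, not part of the existing test code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical test helper: looks up by the name it is given and records every request.
final class RecordingResolver implements Function<String, String> {
  private final Map<String, String> topicsByStore;
  final List<String> requested = new ArrayList<>();

  RecordingResolver(Map<String, String> topicsByStore) {
    this.topicsByStore = topicsByStore;
  }

  @Override
  public String apply(String storeName) {
    requested.add(storeName);
    String topic = topicsByStore.get(storeName);
    if (topic == null) {
      // Fails loudly so a test notices the component asked for an unexpected store.
      throw new IllegalArgumentException("Unexpected store name: " + storeName);
    }
    return topic;
  }
}
```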
Comment on lines 649 to 654
public static String getRealTimeTopicName(Store store) {
if (store instanceof SystemStore) {
return getRealTimeTopicName(store, ((SystemStore) store).getVeniceStore().getLargestUsedRTVersionNumber());
}
return getRealTimeTopicName(store, DEFAULT_RT_VERSION_NUMBER);
}

Copilot AI Jan 7, 2026

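The method getRealTimeTopicName(Store store) on line 649 doesn't check whether the store parameter is null before accessing its methods. The store instanceof SystemStore check itself is safe (it evaluates to false for null), so a null store falls through to getRealTimeTopicName(store, DEFAULT_RT_VERSION_NUMBER), where a NullPointerException would surface once that overload dereferences the store.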

3 participants