[#31220] CDC: Add cleanup_stale_cdc_streams yb-admin command #32025
egladysh wants to merge 1 commit into
Code Review
This pull request introduces a new administrative command cleanup_stale_cdc_streams to yb-admin (along with the corresponding master RPC CleanupStaleCDCStreams) to identify and purge stale entries from the cdc_state table when their associated streams or tablets no longer exist. The changes include documentation, integration tests, and CLI/client support. The review feedback highlights a potential null pointer dereference of cdc_state_table_ in CatalogManager::CleanupStaleCDCStreams before calling GetTableRangeAsync, which could cause a master crash if the table is uninitialized; adding a defensive null check is recommended.
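A minimal sketch of the defensive null check recommended above. The types here (SketchStatus, SketchCdcStateTable, SketchCatalogManager) are hypothetical stand-ins, not the real CatalogManager or CDCStateTable classes, and the error text is illustrative only; the point is just that the handler returns an error before touching an uninitialized table pointer.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: the real Status and CDCStateTable types differ.
struct SketchStatus {
  bool ok;
  std::string message;
};

struct SketchCdcStateTable {
  std::vector<std::string> rows;
};

class SketchCatalogManager {
 public:
  void SetCdcStateTable(std::unique_ptr<SketchCdcStateTable> table) {
    cdc_state_table_ = std::move(table);
  }

  // Returns an error instead of dereferencing a null table pointer.
  SketchStatus CleanupStaleCDCStreams(std::vector<std::string>* stale_out) const {
    if (!cdc_state_table_) {
      // Illustrative message; the real handler chooses its own status code and text.
      return {false, "cdc_state table is not initialized"};
    }
    // Placeholder for the real scan (GetTableRangeAsync in the PR).
    *stale_out = cdc_state_table_->rows;
    return {true, ""};
  }

 private:
  std::unique_ptr<SketchCdcStateTable> cdc_state_table_;
};
```

With the table unset, the call returns a failed status and leaves the output untouched; once a table is set, it proceeds to the scan.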
ASSERT_STR_CONTAINS(resp.error().status().message(), "cdc_state table does not exist");
}

TEST_F(CDCServiceTest, TestCleanupStaleCDCStreamsDryRunAndDelete) {
Could we follow a test pattern similar to CDCSDKYsqlTest::TestValidationAndSyncOfCDCStateEntriesAfterUserTableRemoval? That test exercises the yb-admin command directly.
@asrinivasanyb To confirm, you're after the yb-admin CLI-as-subprocess pattern, not relocating this into the CDCSDKYsqlTest fixture, right?
}

std::vector<cdc::CDCStateTableKey> all_entry_keys;
for (const auto& entry_result : *all_entry_keys_result) {
Why do we need two passes over the cdc_state table?
There is only one pass over cdc_state: we materialize the CDCStateTableRange once. The later loops run over the in-memory all_entry_keys vector, first to collect candidate tablet IDs for a batched metadata lookup, and then to classify rows using that metadata.
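To make the shape of that flow concrete, here is a rough sketch under simplifying assumptions. SketchKey, SketchClassified, and ClassifyStale are hypothetical names, the scanned rows are passed in as an already materialized vector, and the batched metadata lookup is simulated with a set probe; none of this is the PR's actual code.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical simplified key; the real cdc::CDCStateTableKey has more structure.
struct SketchKey {
  std::string tablet_id;
  std::string stream_id;
};

struct SketchClassified {
  SketchKey key;
  bool stream_missing;
  bool tablet_missing;
};

// `scanned_rows` stands in for the single materialized scan of cdc_state.
std::vector<SketchClassified> ClassifyStale(
    const std::vector<SketchKey>& scanned_rows,
    const std::unordered_set<std::string>& live_streams,
    const std::unordered_set<std::string>& live_tablets) {
  // Loop 1 (in memory): collect distinct tablet IDs for one batched lookup.
  std::unordered_set<std::string> candidate_tablets;
  for (const auto& key : scanned_rows) {
    candidate_tablets.insert(key.tablet_id);
  }

  // Batched metadata lookup, simulated here by probing `live_tablets`.
  std::unordered_map<std::string, bool> tablet_exists;
  for (const auto& tablet_id : candidate_tablets) {
    tablet_exists[tablet_id] = live_tablets.count(tablet_id) > 0;
  }

  // Loop 2 (in memory): classify each row using the looked-up metadata.
  std::vector<SketchClassified> stale;
  for (const auto& key : scanned_rows) {
    const bool stream_missing = live_streams.count(key.stream_id) == 0;
    const bool tablet_missing = !tablet_exists[key.tablet_id];
    if (stream_missing || tablet_missing) {
      stale.push_back({key, stream_missing, tablet_missing});
    }
  }
  return stale;
}
```

Both loops read the same in-memory vector, so the table itself is only read once and both phases see the same set of rows.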
}
}

std::unordered_map<TabletId, std::vector<scoped_refptr<TableInfo>>> tablet_tables_map;
Isn't this required only for the colocated table case?
It is used for both colocated and non-colocated tablets. For non-colocated tablets this normally contains the single user table for the tablet. For colocated tablets it can contain multiple tables, so the same map lets us report all resolved table metadata.
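As an illustration of why one map shape covers both cases, here is a small sketch. SketchTableInfo is a hypothetical stand-in for scoped_refptr<TableInfo>, and TablesOnTablet is an invented helper, not a function from the PR.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

using TabletId = std::string;

// Hypothetical stand-in for scoped_refptr<TableInfo>.
struct SketchTableInfo {
  std::string table_id;
  std::string namespace_id;
};

using TabletTablesMap = std::unordered_map<TabletId, std::vector<SketchTableInfo>>;

// Returns the IDs of every table hosted on `tablet_id`,
// or an empty vector if the tablet was not resolved.
std::vector<std::string> TablesOnTablet(const TabletTablesMap& map,
                                        const TabletId& tablet_id) {
  std::vector<std::string> ids;
  auto it = map.find(tablet_id);
  if (it == map.end()) {
    return ids;
  }
  for (const auto& table : it->second) {
    ids.push_back(table.table_id);
  }
  return ids;
}
```

A non-colocated tablet yields a one-element vector and a colocated tablet yields several, so callers do not need a separate code path for either.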
Summary

Adds a CleanupStaleCDCStreams RPC to the MasterReplication master service and a corresponding cleanup_stale_cdc_streams yb-admin command.

The command scans the cdc_state table and identifies stale entries: rows whose CDC stream no longer exists, or whose tablet no longer exists. It supports:

- --dry_run flag to report stale entries without deleting them.

Example (dry run):

Documentation for the new command is added under the yb-admin Change Data Capture commands in docs/content/stable/admin/yb-admin.md.

Implementation notes:

- The cdc_state table is scanned once and materialized so both the tablet-collection and classification loops share one consistent snapshot.
- [...] SharedLock before the classification loop to avoid repeated lock acquisitions.
- [...] namespace_id have their namespace resolved via their first table_id.

Upgrade/Rollback safety

The proto change is purely additive: new messages (CleanupStaleCDCStreamsRequestPB, CleanupStaleCDCStreamsResponsePB) and a new RPC method (CleanupStaleCDCStreams) are added to MasterReplication. No existing message fields are modified.

On a mixed-version cluster, an older master that does not have this RPC will return UNIMPLEMENTED when cleanup_stale_cdc_streams is invoked; all other CDC and xCluster operations are unaffected.

Rolling back removes the yb-admin command and the RPC handler. No on-disk state, catalog schema, or gflag defaults are changed.

Test plan

- CDCServiceTest.TestCleanupStaleCDCStreamsWithoutCDCStateTable: verifies OBJECT_NOT_FOUND when the cdc_state table does not exist.
- CDCServiceTest.TestCleanupStaleCDCStreamsDryRunAndDelete: a dry run returns stale entries without deleting them; a live run deletes exactly those entries and leaves the valid entry untouched.
- CDCServiceTest.TestCleanupStaleCDCStreamsNamespaceFilter: with a namespace filter, only entries attributable to the selected namespace are deleted; cross-namespace and unattributable (both stream and tablet missing) entries are preserved.
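The dry-run and namespace-filter behavior in the test plan can be sketched roughly as follows. Everything here is hypothetical (SketchStaleEntry, SketchCleanupResult, SelectAndDelete), an empty string is used to mean "no filter" or "unattributable", and the assumption that an unfiltered run selects every stale entry is mine rather than something the description states.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct SketchStaleEntry {
  std::string tablet_id;
  std::string stream_id;
  // Empty when the entry cannot be attributed to any namespace
  // (both the stream and the tablet are missing).
  std::string namespace_id;
};

struct SketchCleanupResult {
  std::vector<SketchStaleEntry> reported;  // entries selected for cleanup
  std::size_t deleted_count = 0;           // stays 0 on a dry run
};

// `stale_rows` stands in for the stale rows found in cdc_state.
// An empty `namespace_filter` means "no filter" (assumed to select all rows).
SketchCleanupResult SelectAndDelete(std::vector<SketchStaleEntry>* stale_rows,
                                    const std::string& namespace_filter,
                                    bool dry_run) {
  SketchCleanupResult result;
  std::vector<SketchStaleEntry> kept;
  for (const auto& entry : *stale_rows) {
    // With a filter, only entries attributable to that namespace are selected;
    // unattributable and cross-namespace entries never match.
    const bool selected =
        namespace_filter.empty() ||
        (!entry.namespace_id.empty() && entry.namespace_id == namespace_filter);
    if (selected) {
      result.reported.push_back(entry);
      if (!dry_run) {
        ++result.deleted_count;
        continue;  // dropped from the table
      }
    }
    kept.push_back(entry);
  }
  *stale_rows = std::move(kept);
  return result;
}
```

A dry run reports the same entries a live run would delete but leaves the rows in place, which mirrors what the DryRunAndDelete and NamespaceFilter tests check.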