checker: get incremental data without list | pd=release-8.5-20260121-v8.5.5 tikv=release-8.5-20260121-v8.5.5 tidb=release-8.5-20260121-v8.5.5#4778
Code Review
This pull request adds an alternative file discovery mechanism to the S3 consumer of the multi-cluster consistency checker. When the EnableListByFileIndex configuration is enabled, the consumer reads index files to discover schema and data files instead of listing directories, so discovery no longer depends on the eventual consistency of directory listings.

The file discovery logic is refactored into a newFileDiscoverer interface with two implementations: directoryBasedNewFileDiscoverer (the existing approach) and indexBasedNewFileDiscoverer (the new index-based approach). New configuration options and validations enforce correct usage: index-based discovery requires date-separator=none and requires enable-table-across-nodes to be disabled.

The review comments suggest renaming inconsistent parameter names for clarity and fixing a stuttering function name in the new file discoverer factory.
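The structure described in the summary can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `Config` struct, its field names, the `Name` method, and the simplified `fileDiscoverer` interface are all hypothetical stand-ins. Only the option names (`enable-list-by-file-index`, `date-separator`, `enable-table-across-nodes`) and the two-implementation split come from the summary above.

```go
package main

import (
	"errors"
	"fmt"
)

// Config holds only the options named in the review summary.
// The struct and its field names are hypothetical.
type Config struct {
	EnableListByFileIndex  bool
	DateSeparator          string
	EnableTableAcrossNodes bool
}

// Validate rejects the combinations that the summary says are not
// allowed when index-based discovery is enabled.
func (c Config) Validate() error {
	if !c.EnableListByFileIndex {
		return nil
	}
	if c.DateSeparator != "none" {
		return fmt.Errorf("enable-list-by-file-index requires date-separator=none, got %q", c.DateSeparator)
	}
	if c.EnableTableAcrossNodes {
		return errors.New("enable-list-by-file-index requires enable-table-across-nodes to be disabled")
	}
	return nil
}

// fileDiscoverer is a simplified stand-in for the PR's newFileDiscoverer
// interface; the real interface has different methods.
type fileDiscoverer interface {
	Name() string
}

// directoryBasedDiscoverer stands in for the existing listing-based approach.
type directoryBasedDiscoverer struct{}

func (directoryBasedDiscoverer) Name() string { return "directory" }

// indexBasedDiscoverer stands in for the new index-file-based approach.
type indexBasedDiscoverer struct{}

func (indexBasedDiscoverer) Name() string { return "index" }

// newFileDiscoverer selects an implementation from the configuration flag.
func newFileDiscoverer(enableListByFileIndex bool) fileDiscoverer {
	if enableListByFileIndex {
		return indexBasedDiscoverer{}
	}
	return directoryBasedDiscoverer{}
}

func main() {
	cfg := Config{EnableListByFileIndex: true, DateSeparator: "none"}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("discoverer:", newFileDiscoverer(cfg.EnableListByFileIndex).Name())
}
```

Keeping the selection in one factory means the consumer only depends on the interface, which is presumably why the review asks for a single, consistently named flag at every call site.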
    func NewS3Consumer(
        s3Storage storage.ExternalStorage,
        tables map[string][]string,
        enableSchemaIndexByGetObject bool,
The parameter name enableSchemaIndexByGetObject is inconsistent with the configuration option enable-list-by-file-index and the parameter name used in the NewNewFileDiscoverer factory function. For better code clarity and maintainability, consider renaming it to enableListByFileIndex.
Suggested change:
    - enableSchemaIndexByGetObject bool,
    + enableListByFileIndex bool,
        ) (map[cloudstorage.DmlPathKey]fileIndexRange, error)
    }

    func NewNewFileDiscoverer(c *S3Consumer, enableListByFileIndex bool) newFileDiscoverer {
The function name NewNewFileDiscoverer stutters. To improve readability, consider renaming it to NewFileDiscoverer. This change should be propagated to its call site in consumer.go.
Suggested change:
    - func NewNewFileDiscoverer(c *S3Consumer, enableListByFileIndex bool) newFileDiscoverer {
    + func NewFileDiscoverer(c *S3Consumer, enableListByFileIndex bool) newFileDiscoverer {
        checkpointWatcher Watcher,
        s3Storage storage.ExternalStorage,
        tables map[string][]string,
        enableSchemaIndexByGetObject bool,
The parameter name enableSchemaIndexByGetObject is inconsistent with the global configuration option enable-list-by-file-index. To improve consistency and readability across the codebase, please consider renaming it to enableListByFileIndex.
Suggested change:
    - enableSchemaIndexByGetObject bool,
    + enableListByFileIndex bool,
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: flowbehappy, hongyunyan.
/retest
Merged commit 2ddebec into pingcap:release-8.5-20260213-v8.5.5
What problem does this PR solve?
Issue Number: close #4244
What is changed and how it works?
Read the file index to get the incremental files and table versions.
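As a rough illustration of getting incremental files from an index rather than from a listing, the sketch below derives the new data file names from the index content and the last index already processed. It assumes that the index file contains the name of the latest data file in the form `CDC<number>.<ext>` (for example `CDC000005.csv`) and that data files are numbered consecutively. The helper names `parseIndexFile` and `incrementalFiles` are hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseIndexFile extracts the numeric file index and the extension from
// index-file content such as "CDC000005.csv\n".
// The "CDC<number>.<ext>" layout is an assumption of this sketch.
func parseIndexFile(content string) (uint64, string, error) {
	name := strings.TrimSpace(content)
	if !strings.HasPrefix(name, "CDC") {
		return 0, "", fmt.Errorf("unexpected index content %q", name)
	}
	dot := strings.LastIndex(name, ".")
	if dot <= len("CDC") {
		return 0, "", fmt.Errorf("missing file number or extension in %q", name)
	}
	idx, err := strconv.ParseUint(name[len("CDC"):dot], 10, 64)
	if err != nil {
		return 0, "", fmt.Errorf("invalid file number in %q: %w", name, err)
	}
	return idx, name[dot:], nil
}

// incrementalFiles returns the data file names whose indexes fall in the
// half-open range (lastSeen, latest], where latest comes from the index file.
func incrementalFiles(lastSeen uint64, indexContent string) ([]string, error) {
	latest, ext, err := parseIndexFile(indexContent)
	if err != nil {
		return nil, err
	}
	var files []string
	for i := lastSeen + 1; i <= latest; i++ {
		files = append(files, fmt.Sprintf("CDC%06d%s", i, ext))
	}
	return files, nil
}

func main() {
	// The consumer has already processed files up to index 3.
	files, err := incrementalFiles(3, "CDC000005.csv\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(files)
}
```

With this approach a single object read replaces a directory listing, at the cost of relying on the index file being updated after each data file is written.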
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note