enhance: change default index of hybrid at high cardinality to stl_sort #47084
base: master
Signed-off-by: Buqian Zheng <[email protected]>
Summary of Changes (Gemini Code Assist)
This pull request refines the automatic index selection strategy within the …
Code Review
This pull request changes the index type that the hybrid scalar index selects by default for high-cardinality fields from INVERTED to STLSORT. The change affects non-integral types such as float and std::string; integral types already used STLSORT. The modifications in internal/core/src/index/HybridScalarIndex.cpp implement this logic, and the default auto-index parameters in pkg/util/paramtable/autoindex_param.go are updated so that float types use the new HYBRID index behavior. The changes appear correct and consistent with the goal of improving performance. I have one suggestion: refactor some duplicated code to improve maintainability.
  if (distinct_vals.size() >= bitmap_index_cardinality_limit_) {
-     if constexpr (std::is_integral_v<T>) {
-         internal_index_type_ = ScalarIndexType::STLSORT;
-     } else {
-         internal_index_type_ = ScalarIndexType::INVERTED;
-     }
+     internal_index_type_ = ScalarIndexType::STLSORT;
  } else {
      internal_index_type_ = ScalarIndexType::BITMAP;
  }
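The behavioral effect of this hunk can be sketched in isolation. The snippet below is a minimal illustration under stated assumptions, not the Milvus implementation: the enumerator names mirror those in the diff, while SelectBefore, SelectAfter, and the limit parameter are hypothetical stand-ins for the member function and bitmap_index_cardinality_limit_.

```cpp
#include <cstddef>
#include <type_traits>

// Enumerator names copied from the diff; the real enum is defined in Milvus.
enum class ScalarIndexType { BITMAP, STLSORT, INVERTED };

// Hypothetical free function modeling the selection BEFORE this change:
// high-cardinality integral types used STLSORT, everything else INVERTED.
template <typename T>
constexpr ScalarIndexType
SelectBefore(std::size_t cardinality, std::size_t limit) {
    if (cardinality >= limit) {
        if constexpr (std::is_integral_v<T>) {
            return ScalarIndexType::STLSORT;
        } else {
            return ScalarIndexType::INVERTED;
        }
    }
    return ScalarIndexType::BITMAP;
}

// Hypothetical free function modeling the selection AFTER this change:
// every high-cardinality type uses STLSORT, regardless of T.
template <typename T>
constexpr ScalarIndexType
SelectAfter(std::size_t cardinality, std::size_t limit) {
    return cardinality >= limit ? ScalarIndexType::STLSORT
                                : ScalarIndexType::BITMAP;
}
```

Under these assumptions, only the non-integral, high-cardinality case changes its result; integral types and low-cardinality fields select the same index type before and after.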
This logic for choosing the index type based on cardinality is now duplicated in four places within this file:
- HybridScalarIndex<T>::SelectIndexBuildType (here, lines 59-63)
- HybridScalarIndex<std::string>::SelectIndexBuildType (lines 80-84)
- HybridScalarIndex<T>::SelectBuildTypeForPrimitiveType (lines 105-109)
- HybridScalarIndex<std::string>::SelectBuildTypeForPrimitiveType (lines 130-134)
To improve maintainability and reduce code duplication, consider extracting this logic into a private helper function. For example:
 private:
    void
    ChooseIndexTypeByCardinality(size_t cardinality) {
        if (cardinality >= bitmap_index_cardinality_limit_) {
            internal_index_type_ = ScalarIndexType::STLSORT;
        } else {
            internal_index_type_ = ScalarIndexType::BITMAP;
        }
    }

You could then replace the duplicated blocks with a call to this new helper function, like ChooseIndexTypeByCardinality(distinct_vals.size());.
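For readers who want to see how the suggested helper would sit inside a class, here is a compilable sketch. HybridIndexSketch is a hypothetical stand-in for HybridScalarIndex<T> that models only the selection step; the std::set-based distinct count, the early exit, and the NONE enumerator are assumptions made for the illustration and are not taken from the Milvus source.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// NONE is an assumed placeholder for "not selected yet".
enum class ScalarIndexType { NONE, BITMAP, STLSORT };

// Hypothetical stand-in for HybridScalarIndex<T>.
template <typename T>
class HybridIndexSketch {
 public:
    explicit HybridIndexSketch(std::size_t limit)
        : bitmap_index_cardinality_limit_(limit) {
    }

    // Counts distinct values (stopping once the limit is reached, an
    // assumption of this sketch) and delegates the decision to the helper.
    ScalarIndexType
    SelectIndexBuildType(const std::vector<T>& values) {
        std::set<T> distinct_vals;
        for (const auto& v : values) {
            distinct_vals.insert(v);
            if (distinct_vals.size() >= bitmap_index_cardinality_limit_) {
                break;
            }
        }
        ChooseIndexTypeByCardinality(distinct_vals.size());
        return internal_index_type_;
    }

 private:
    // The helper proposed in the review comment, unchanged in logic.
    void
    ChooseIndexTypeByCardinality(std::size_t cardinality) {
        if (cardinality >= bitmap_index_cardinality_limit_) {
            internal_index_type_ = ScalarIndexType::STLSORT;
        } else {
            internal_index_type_ = ScalarIndexType::BITMAP;
        }
    }

    std::size_t bitmap_index_cardinality_limit_;
    ScalarIndexType internal_index_type_ = ScalarIndexType::NONE;
};
```

Each selection path would then reduce to one call to the helper, at the cost of tying all four specializations to the same rule.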
The logic is intentionally kept inline because:
- It's only 4 lines of simple if-else
- These are template specializations that may diverge in the future
- Extracting such trivial logic may reduce readability without significant maintainability benefit
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
@@            Coverage Diff             @@
## master #47084 +/- ##
==========================================
- Coverage 76.46% 76.44% -0.02%
==========================================
Files 2016 2016
Lines 325600 326297 +697
==========================================
+ Hits 248972 249442 +470
- Misses 68670 68862 +192
- Partials 7958 7993 +35
@zhengbuqian go-sdk check failed.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: zhengbuqian
issue: #47083