[Enhancement] support lookup data directly. #427
Signed-off-by: wuxueyang.wxy <[email protected]>
Pull Request Overview
This PR introduces a new lookup execution strategy that allows users to choose between cached and direct database query approaches for StarRocks lookups. The cached approach is suitable for small datasets, while the direct query approach better handles large-scale data.
- Adds a new configuration option `lookup.cache.enabled` to control lookup behavior
- Introduces a new `StarRocksDynamicLookupFunction` for direct JDBC queries
- Renames the existing cached lookup implementation to `StarRocksDynamicCachedLookupFunction`
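
The selection between the two lookup paths can be pictured as a small factory keyed on the value of `lookup.cache.enabled`. The sketch below is only an illustration built from plain-Java stand-ins: `Lookup`, `CachedLookup`, `DirectLookup`, and `LookupFactory.create` are hypothetical names, not the connector's classes, and rows are modeled as strings to keep the example self-contained.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a lookup function; not the connector's API.
interface Lookup {
    List<String> lookup(String key);
}

// Serves lookups from a preloaded in-memory snapshot (the "cached" strategy).
final class CachedLookup implements Lookup {
    private final Map<String, List<String>> snapshot;

    CachedLookup(Map<String, List<String>> snapshot) {
        this.snapshot = snapshot;
    }

    @Override
    public List<String> lookup(String key) {
        return snapshot.getOrDefault(key, Collections.emptyList());
    }
}

// Delegates every lookup to a query callback (the "direct" strategy).
final class DirectLookup implements Lookup {
    private final Function<String, List<String>> query;

    DirectLookup(Function<String, List<String>> query) {
        this.query = query;
    }

    @Override
    public List<String> lookup(String key) {
        return query.apply(key);
    }
}

final class LookupFactory {
    private LookupFactory() {}

    // cacheEnabled stands for the configured value of lookup.cache.enabled.
    static Lookup create(boolean cacheEnabled,
                         Map<String, List<String>> snapshot,
                         Function<String, List<String>> query) {
        return cacheEnabled ? new CachedLookup(snapshot) : new DirectLookup(query);
    }
}
```

Keeping both strategies behind one interface means the caller does not need to know which one was configured.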
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| StarRocksSourceOptions.java | Adds LOOKUP_ENABLE_CACHE config option and getter method |
| StarRocksDynamicTableSourceFactory.java | Registers the new cache enable option |
| StarRocksDynamicTableSource.java | Implements factory method to choose between cached/direct lookup functions |
| StarRocksDynamicLookupFunction.java | Complete rewrite to implement direct JDBC-based lookup functionality |
| StarRocksDynamicCachedLookupFunction.java | New file containing the original cached lookup implementation |
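
For orientation, a direct JDBC-based lookup generally amounts to one parameterized query per probe row. The following is a minimal sketch under stated assumptions, not the code in this PR: `DirectLookupSketch`, `buildQuery`, and `lookup` are hypothetical names, identifiers are quoted with backticks in the MySQL style, the table is a simple (unqualified) name, and `keyColumns` is assumed to be non-empty.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; names and structure are illustrative only.
final class DirectLookupSketch {
    private DirectLookupSketch() {}

    // Builds e.g. SELECT `c1`, `c2` FROM `t` WHERE `k1` = ? AND `k2` = ?
    static String buildQuery(String table, List<String> selectColumns, List<String> keyColumns) {
        String select = selectColumns.stream()
                .map(DirectLookupSketch::quote)
                .collect(Collectors.joining(", "));
        String where = keyColumns.stream()
                .map(c -> quote(c) + " = ?")
                .collect(Collectors.joining(" AND "));
        return "SELECT " + select + " FROM " + quote(table) + " WHERE " + where;
    }

    // Backtick-quotes an identifier, doubling any embedded backticks.
    private static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    // Binds the key values in order and collects every matching row.
    static List<Object[]> lookup(Connection conn, String sql, Object[] keys, int columnCount)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < keys.length; i++) {
                ps.setObject(i + 1, keys[i]); // JDBC parameters are 1-based
            }
            List<Object[]> rows = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    Object[] row = new Object[columnCount];
                    for (int c = 0; c < columnCount; c++) {
                        row[c] = rs.getObject(c + 1);
                    }
                    rows.add(row);
                }
            }
            return rows;
        }
    }
}
```

Binding keys through `PreparedStatement` placeholders avoids concatenating values into the SQL text, and returning an empty list when nothing matches keeps the result non-null.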
Comments suppressed due to low confidence (1)
    public Collection<RowData> lookup(RowData rowData) {
        reloadData();
        Row keyRow = genRow(rowData);
        return cacheMap.get(keyRow);
Copilot AI, Jul 28, 2025
The method returns null when no matching data is found, but the return type is `Collection<RowData>`. This could cause a `NullPointerException` in calling code. It should return `Collections.emptyList()` instead.
    - return cacheMap.get(keyRow);
    + return cacheMap.getOrDefault(keyRow, Collections.emptyList());
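
The difference between the two calls can be seen in isolation with a plain `HashMap`. This is a small sketch, assuming string keys and values in place of `Row` and `RowData`; `NullSafeLookupDemo` and its methods are hypothetical names used only for illustration.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; strings stand in for Row and RowData.
final class NullSafeLookupDemo {
    private final Map<String, List<String>> cacheMap = new HashMap<>();

    NullSafeLookupDemo() {
        cacheMap.put("k1", List.of("v1"));
    }

    // Mirrors the original code path: yields null for a missing key.
    Collection<String> lookupNullable(String key) {
        return cacheMap.get(key);
    }

    // Mirrors the suggested change: yields an empty list for a missing key.
    Collection<String> lookupSafe(String key) {
        return cacheMap.getOrDefault(key, Collections.emptyList());
    }
}
```

Note that `HashMap.getOrDefault` only falls back when the key is absent; a key explicitly mapped to `null` would still return `null`.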
    GenericRowData gRowData = (GenericRowData) rowData;
    Object[] keyObj = new Object[filterRichInfos.length];
    for (int i = 0; i < filterRichInfos.length; i ++) {
Copilot AI, Jul 28, 2025
The method assumes the input `RowData` is a `GenericRowData` without checking its type, which could cause a `ClassCastException`. It should add a type check or handle other `RowData` implementations.
    - GenericRowData gRowData = (GenericRowData) rowData;
    - Object[] keyObj = new Object[filterRichInfos.length];
    - for (int i = 0; i < filterRichInfos.length; i ++) {
    + if (!(rowData instanceof GenericRowData)) {
    +     LOG.error("Expected GenericRowData but received: {}", rowData.getClass().getName());
    +     throw new IllegalArgumentException("Invalid RowData type. Expected GenericRowData.");
    + }
    + GenericRowData gRowData = (GenericRowData) rowData;
    + Object[] keyObj = new Object[filterRichInfos.length];
    + for (int i = 0; i < filterRichInfos.length; i++) {
What type of PR is this:
Which issues does this PR fix:
Fixes #
Problem Summary(Required) :
Currently, we have a `StarRocksDynamicLookupFunction` that loads all of the data from StarRocks and caches it until it expires. This works well for small-scale data but poorly for large-scale data.

This PR renames `StarRocksDynamicLookupFunction` to `StarRocksDynamicCachedLookupFunction` and adds a new `StarRocksDynamicLookupFunction` that executes the query directly.

Checklist: