-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Pipe: support pattern pruning and redundancy removal in TreePattern parsing #16789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR enhances the TreePattern parsing logic to support pattern pruning and redundancy removal. The changes optimize pattern lists by removing redundant patterns (e.g., if "root." exists, "root.db." is removed) and pruning inclusion patterns that are fully covered by exclusion patterns. This improves configuration validation and prevents the creation of pipe patterns that would match nothing.
Key Changes
- Refactored pattern parsing to work with pattern lists before building union patterns
- Added
optimizePatterns()to remove redundant patterns within inclusion/exclusion lists - Added
pruneInclusionPatterns()to remove inclusion patterns fully covered by exclusion patterns - Added early validation to throw a descriptive error when all inclusion patterns are pruned
- Optimized
buildUnionPattern()to return the pattern directly for single-element lists
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } catch (final Exception e) { | ||
| // In case of path parsing errors or unsupported operations, assume no coverage | ||
| // to be safe and avoid aggressive pruning. | ||
| return false; | ||
| } |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The covers method silently catches all exceptions and returns false without logging. While the comment explains this is to "be safe and avoid aggressive pruning", this could hide legitimate errors such as IllegalPathException or NullPointerException. Consider logging the exception at DEBUG or TRACE level to aid in debugging, or at least being more specific about which exceptions are expected (e.g., catch only IllegalPathException).
| /** Checks if 'coverer' pattern fully covers 'coveree' pattern. */ | ||
| private static boolean covers(final TreePattern coverer, final TreePattern coveree) { | ||
| try { | ||
| final List<PartialPath> covererPaths = coverer.getBaseInclusionPaths(); | ||
| final List<PartialPath> covereePaths = coveree.getBaseInclusionPaths(); | ||
|
|
||
| if (covererPaths.isEmpty() || covereePaths.isEmpty()) { | ||
| return false; | ||
| } | ||
|
|
||
| // Logic: For 'coverer' to cover 'coveree', ALL paths in 'coveree' must be included | ||
| // by at least one path in 'coverer'. | ||
| for (final PartialPath sub : covereePaths) { | ||
| boolean isSubCovered = false; | ||
| for (final PartialPath sup : covererPaths) { | ||
| if (sup.include(sub)) { | ||
| isSubCovered = true; | ||
| break; | ||
| } | ||
| } | ||
| if (!isSubCovered) { | ||
| return false; | ||
| } | ||
| } | ||
| return true; | ||
| } catch (final Exception e) { | ||
| // In case of path parsing errors or unsupported operations, assume no coverage | ||
| // to be safe and avoid aggressive pruning. | ||
| return false; | ||
| } | ||
| } |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new methods optimizePatterns, pruneInclusionPatterns, and covers lack documentation. While optimizePatterns has a brief comment explaining its purpose, the other two methods are completely undocumented. Consider adding JavaDoc comments that explain:
- The purpose and algorithm of each method
- Parameter descriptions and constraints
- Return value specifications
- Edge cases and examples
For instance, covers should document what "covers" means in terms of path matching semantics.
| /** | ||
| * Prunes patterns from the inclusion list that are fully covered by ANY pattern in the exclusion | ||
| * list. | ||
| */ | ||
| private static List<TreePattern> pruneInclusionPatterns( | ||
| final List<TreePattern> inclusion, final List<TreePattern> exclusion) { | ||
| if (inclusion == null || inclusion.isEmpty()) { | ||
| return new ArrayList<>(); | ||
| } | ||
| if (exclusion == null || exclusion.isEmpty()) { | ||
| return inclusion; | ||
| } | ||
|
|
||
| // 4. If neither "source.path" nor "source.pattern" is specified, | ||
| // return the provided default pattern (which may be null). | ||
| return defaultPattern; | ||
| final List<TreePattern> prunedInclusion = new ArrayList<>(); | ||
| for (final TreePattern inc : inclusion) { | ||
| boolean isFullyExcluded = false; | ||
| for (final TreePattern exc : exclusion) { | ||
| if (covers(exc, inc)) { | ||
| isFullyExcluded = true; | ||
| break; | ||
| } | ||
| } | ||
| if (!isFullyExcluded) { | ||
| prunedInclusion.add(inc); | ||
| } | ||
| } | ||
| return prunedInclusion; | ||
| } |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation for pruneInclusionPatterns is incomplete. While the method has a one-line comment, it should have proper JavaDoc documentation explaining:
- What "fully covered" means in this context
- Examples of when patterns would be pruned
- Edge case behavior (e.g., empty lists)
- The relationship with the
covershelper method
| // 4. Prune inclusion patterns: if an inclusion pattern is fully covered by an exclusion | ||
| // pattern, remove it | ||
| inclusionPatterns = pruneInclusionPatterns(inclusionPatterns, exclusionPatterns); | ||
|
|
||
| // 5. Check if the resulting inclusion pattern is empty | ||
| if (inclusionPatterns.isEmpty()) { | ||
| throw new PipeException( | ||
| "Pipe: The inclusion pattern is empty after pruning by the exclusion pattern. " | ||
| + "This pipe pattern will match nothing."); | ||
| } |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The pruning logic at this stage may result in redundant checks with the existing checkAndLogPatternCoverage method called in WithExclusionIoTDBTreePattern and WithExclusionTreePattern constructors (lines 201-204, 207-208). Both perform coverage checks, but at different granularities:
- The new pruning checks individual patterns in lists before building unions
- The existing check validates final TreePattern objects after construction
While this provides defense-in-depth, it could lead to different error messages for the same underlying issue. Consider documenting this relationship or consolidating the checks to avoid confusion.
| * UnionTreePattern. | ||
| */ | ||
| private static TreePattern buildUnionPattern( | ||
| final boolean isTreeModelDataAllowedToBeCaptured, final List<TreePattern> patterns) { |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The buildUnionPattern method now has an early return for single-element lists, but it doesn't validate that the list is non-empty. If an empty list is passed, it will throw an IndexOutOfBoundsException when trying to access patterns.get(0). Consider adding a null/empty check before this optimization.
| final boolean isTreeModelDataAllowedToBeCaptured, final List<TreePattern> patterns) { | |
| final boolean isTreeModelDataAllowedToBeCaptured, final List<TreePattern> patterns) { | |
| if (patterns == null || patterns.isEmpty()) { | |
| throw new IllegalArgumentException("patterns list must not be null or empty"); | |
| } |
| private static List<TreePattern> optimizePatterns(final List<TreePattern> patterns) { | ||
| if (patterns == null || patterns.isEmpty()) { | ||
| return new ArrayList<>(); | ||
| } | ||
| if (patterns.size() == 1) { | ||
| return patterns; | ||
| } | ||
|
|
||
| // 3. If only "source.pattern" is specified, parse it using the helper method. | ||
| if (pattern != null) { | ||
| return buildUnionPattern( | ||
| isTreeModelDataAllowedToBeCaptured, | ||
| parsePatternsFromPatternParameter( | ||
| pattern, sourceParameters, isTreeModelDataAllowedToBeCaptured)); | ||
| final List<TreePattern> optimized = new ArrayList<>(); | ||
| // Determine coverage using base paths | ||
| for (int i = 0; i < patterns.size(); i++) { | ||
| final TreePattern current = patterns.get(i); | ||
| boolean isCoveredByOther = false; | ||
|
|
||
| for (int j = 0; j < patterns.size(); j++) { | ||
| if (i == j) { | ||
| continue; | ||
| } | ||
| final TreePattern other = patterns.get(j); | ||
|
|
||
| // If 'other' covers 'current', 'current' is redundant. | ||
| // Note: if they mutually cover each other (duplicates), we must ensure we keep one. | ||
| // We use index comparison to break ties for exact duplicates. | ||
| if (covers(other, current)) { | ||
| if (covers(current, other)) { | ||
| // Both cover each other (likely identical). Keep the one with smaller index. | ||
| if (j < i) { | ||
| isCoveredByOther = true; | ||
| break; | ||
| } | ||
| } else { | ||
| // Strict coverage | ||
| isCoveredByOther = true; | ||
| break; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| if (!isCoveredByOther) { | ||
| optimized.add(current); | ||
| } | ||
| } | ||
| return optimized; | ||
| } |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The optimizePatterns method has O(n²) complexity where each pattern is compared against all others, and each comparison involves calling covers() which iterates over paths. While this is acceptable for small pattern lists (typically a handful), consider documenting the expected maximum size of pattern lists in the method documentation to set appropriate expectations.
Caideyipi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May add some tests
As title.