Add Substrait Protobuf Visitor Framework #136
base: main
Changes from 1 commit
New file (hunk `@@ -0,0 +1,35 @@`):

```go
// SPDX-License-Identifier: Apache-2.0
package traverse

import (
	proto "github.com/substrait-io/substrait-protobuf/go/substraitpb"
)

// MultiVisitor combines multiple visitors into a single visitor.
// All visitors are called for each node they support.
type MultiVisitor struct {
	visitors []Visitor
}

// NewMultiVisitor creates a visitor that delegates to multiple visitors.
func NewMultiVisitor(visitors ...Visitor) *MultiVisitor {
	return &MultiVisitor{visitors: visitors}
}

// VisitRel calls VisitRel on all visitors that implement RelVisitor.
func (m *MultiVisitor) VisitRel(rel *proto.Rel) {
	for _, v := range m.visitors {
		if rv, ok := v.(RelVisitor); ok {
			rv.VisitRel(rel)
		}
	}
}

// VisitExpr calls VisitExpr on all visitors that implement ExprVisitor.
func (m *MultiVisitor) VisitExpr(expr *proto.Expression) {
	for _, v := range m.visitors {
		if ev, ok := v.(ExprVisitor); ok {
			ev.VisitExpr(expr)
		}
	}
}
```
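To see the fan-out behavior in isolation, here is a self-contained sketch that copies the `MultiVisitor` logic onto hypothetical stand-in types (`Rel`, `Expression`) instead of the generated `proto` types, so it runs without the Substrait module. `relCounter` and `exprCounter` are illustrative visitors, not part of this PR. Each implements only one of the two interfaces, so `MultiVisitor`'s type assertion skips it for the other kind of node.

```go
package main

import "fmt"

// Rel and Expression are hypothetical stand-ins for proto.Rel and
// proto.Expression so this sketch compiles without the Substrait module.
type Rel struct{ Name string }
type Expression struct{ Func string }

// Visitor, RelVisitor and ExprVisitor mirror the interfaces in this PR.
type Visitor interface{}

type RelVisitor interface{ VisitRel(rel *Rel) }

type ExprVisitor interface{ VisitExpr(expr *Expression) }

// MultiVisitor fans each callback out to every wrapped visitor that
// implements the matching interface.
type MultiVisitor struct{ visitors []Visitor }

func NewMultiVisitor(visitors ...Visitor) *MultiVisitor {
	return &MultiVisitor{visitors: visitors}
}

func (m *MultiVisitor) VisitRel(rel *Rel) {
	for _, v := range m.visitors {
		if rv, ok := v.(RelVisitor); ok {
			rv.VisitRel(rel)
		}
	}
}

func (m *MultiVisitor) VisitExpr(expr *Expression) {
	for _, v := range m.visitors {
		if ev, ok := v.(ExprVisitor); ok {
			ev.VisitExpr(expr)
		}
	}
}

// relCounter only implements RelVisitor (hypothetical example visitor).
type relCounter struct{ n int }

func (c *relCounter) VisitRel(*Rel) { c.n++ }

// exprCounter only implements ExprVisitor (hypothetical example visitor).
type exprCounter struct{ n int }

func (c *exprCounter) VisitExpr(*Expression) { c.n++ }

func main() {
	rels, exprs := &relCounter{}, &exprCounter{}
	mv := NewMultiVisitor(rels, exprs)

	// Simulate a walker calling back for two relations and three expressions.
	mv.VisitRel(&Rel{Name: "read"})
	mv.VisitRel(&Rel{Name: "filter"})
	for i := 0; i < 3; i++ {
		mv.VisitExpr(&Expression{Func: "add"})
	}
	fmt.Println(rels.n, exprs.n) // prints "2 3"
}
```

Because the dispatch is by optional interface, a single-purpose visitor never has to stub out methods it does not care about.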
New file (hunk `@@ -0,0 +1,117 @@`):

```go
// SPDX-License-Identifier: Apache-2.0
/*
Package traverse provides a visitor framework for analyzing Substrait plans
with multiple read-only operations in a single tree traversal.

# Key Features

  - Interface-based visitor pattern
  - Single tree walk for multiple analyses
  - Sequential execution
  - Read-only operations: no mutations
  - Zero allocations per node

# IMPORTANT: Read-Only Analysis Only

This interface is designed for read-only analysis and assumes the tree does not
change while it is being walked. Modifying the tree during a walk can disrupt
traversal, causing nodes to be missed or visited out of order, or causing panics.

# Basic Usage

	// Option 1: Simple usage with Visit (walks every relation in the plan)
	visitor := NewMyVisitor()
	traverse.Visit(plan, visitor)

	// Option 2: Explicit context for advanced usage
	ctx := traverse.NewPlanContext(plan)
	visitor := NewMyVisitor(ctx)
	for _, planRel := range plan.GetRelations() {
		if root := planRel.GetRoot(); root != nil {
			traverse.Walk(root.GetInput(), visitor)
		} else if rel := planRel.GetRel(); rel != nil {
			traverse.Walk(rel, visitor)
		}
	}

	// Option 3: With cycle detection for DAG enforcement
	visitor := NewMyVisitor()
	cycleDetector := traverse.NewCycleDetectingVisitor(visitor)
	traverse.Walk(rel, cycleDetector)

# Creating Custom Visitors

	// Visitor that only cares about relations
	type NodeCounter struct {
		count int
	}

	func (v *NodeCounter) VisitRel(rel *proto.Rel) {
		v.count++
	}

	// Visitor that only cares about expressions (e.g., finding all functions)
```
Review comment: Presumably this is for finding all functions that are actually used, as opposed to checking the extension list.

Reply: Yes. I can also push a plan context that gets the extension list; I didn't want this one PR to get too big.
Continuing the same file:

```go
	type FunctionCollector struct {
		ctx       *PlanContext
		functions []string
	}

	func (v *FunctionCollector) VisitExpr(expr *proto.Expression) {
		// Collect function names
	}

	// Visitor that visits both
	type FullAnalyzer struct{}

	func (v *FullAnalyzer) VisitRel(rel *proto.Rel) {
		// Analyze relations
	}

	func (v *FullAnalyzer) VisitExpr(expr *proto.Expression) {
		// Analyze expressions
	}
*/
package traverse

import (
	proto "github.com/substrait-io/substrait-protobuf/go/substraitpb"
)

// Visitor is a marker interface for all visitors.
// Visitors should implement one or more of the following methods:
//   - VisitRel(*proto.Rel): called for each relation node
//   - VisitExpr(*proto.Expression): called for each non-literal expression
//
// The traversal framework uses type assertions to determine which methods to call.
type Visitor interface{}

// RelVisitor is implemented by visitors that want to visit relation nodes.
type RelVisitor interface {
	VisitRel(rel *proto.Rel)
}

// ExprVisitor is implemented by visitors that want to visit expression nodes.
type ExprVisitor interface {
	VisitExpr(expr *proto.Expression)
}

// Visit provides a convenient way to traverse a Substrait plan.
// It walks all relations in the plan, calling the appropriate visitor methods
// based on what interfaces the visitor implements.
```
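Substrait's `PlanRel` message is a oneof: it holds either a `RelRoot` (exposed in generated Go as `GetRoot()`) or a bare `Rel` (exposed as `GetRel()`), so a plan-level loop that checks only `GetRoot()` will not reach relations stored the second way. A minimal sketch of a loop covering both cases, using hypothetical stand-in structs rather than the generated protobuf types; `visitPlan` is an illustrative name, not an API from this PR.

```go
package main

import "fmt"

// Hypothetical stand-ins for proto.Plan, proto.PlanRel, proto.RelRoot and
// proto.Rel. A PlanRel sets exactly one of Root or Rel, mirroring the oneof
// in Substrait's PlanRel message.
type Rel struct{ Name string }

type RelRoot struct {
	Input *Rel
	Names []string
}

type PlanRel struct {
	Root *RelRoot
	Rel  *Rel
}

type Plan struct{ Relations []*PlanRel }

// visitPlan calls fn on the top relation of every PlanRel, whether it is
// wrapped in a RelRoot or stored bare.
func visitPlan(plan *Plan, fn func(*Rel)) {
	if plan == nil {
		return
	}
	for _, pr := range plan.Relations {
		switch {
		case pr.Root != nil:
			fn(pr.Root.Input)
		case pr.Rel != nil:
			fn(pr.Rel)
		}
	}
}

func main() {
	plan := &Plan{Relations: []*PlanRel{
		{Rel: &Rel{Name: "subplan"}},
		{Root: &RelRoot{Input: &Rel{Name: "project"}, Names: []string{"a"}}},
	}}
	var seen []string
	visitPlan(plan, func(r *Rel) { seen = append(seen, r.Name) })
	fmt.Println(seen) // prints [subplan project]
}
```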
Review comment: It's probably worth noting that you have to visit the entire plan. If you wanted to stop once you found something you couldn't handle (such as an unsupported type), you couldn't bail out early.

Reply: Do you think we should have the ability to exit early? VisitRel/VisitExpr could return an "ok", and if any visit returns !ok we wouldn't traverse deeper. That wouldn't stop the other recursive branches, though.
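A minimal sketch of the early-exit idea floated in the reply, assuming a hypothetical `PruningRelVisitor` whose `VisitRel` returns `bool` and a stand-in `Rel` type; nothing here is part of the PR. As the reply anticipates, returning `false` prunes only the subtree below that node, and sibling branches are still walked.

```go
package main

import "fmt"

// Rel is a hypothetical stand-in for proto.Rel.
type Rel struct {
	Name   string
	Inputs []*Rel
}

// PruningRelVisitor is a hypothetical variant of RelVisitor: returning
// false from VisitRel means "do not descend below this node".
type PruningRelVisitor interface {
	VisitRel(rel *Rel) bool
}

// walk visits rel pre-order and skips the children of pruned nodes.
func walk(rel *Rel, v PruningRelVisitor) {
	if rel == nil || !v.VisitRel(rel) {
		return
	}
	for _, in := range rel.Inputs {
		walk(in, v) // sibling branches are still walked after a prune
	}
}

// unsupportedFinder records every node it sees and stops descending at
// nodes named "unsupported".
type unsupportedFinder struct {
	seen  []string
	found bool
}

func (f *unsupportedFinder) VisitRel(rel *Rel) bool {
	f.seen = append(f.seen, rel.Name)
	if rel.Name == "unsupported" {
		f.found = true
		return false
	}
	return true
}

func main() {
	tree := &Rel{Name: "join", Inputs: []*Rel{
		{Name: "unsupported", Inputs: []*Rel{{Name: "hidden"}}},
		{Name: "read"},
	}}
	f := &unsupportedFinder{}
	walk(tree, f)
	fmt.Println(f.found, f.seen) // prints true [join unsupported read]
}
```

Under the same assumption, a visitor that wants to stop everything could return `false` from every call once `found` is set; each remaining sibling is then still visited once, but nothing below it is.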
Continuing the same file:

```go
func Visit(plan *proto.Plan, visitor Visitor) {
	if visitor == nil || plan == nil {
		return
	}

	// Walk every relation in the plan. A PlanRel holds either a RelRoot
	// (a plan output) or a bare Rel, so handle both cases.
	for _, planRel := range plan.GetRelations() {
		if root := planRel.GetRoot(); root != nil {
			walk(root.GetInput(), visitor)
		} else if rel := planRel.GetRel(); rel != nil {
			walk(rel, visitor)
		}
	}
}

// Walk traverses a Substrait relation tree, applying the visitor to each node.
// The visitor's methods are called based on which interfaces it implements.
// This function guarantees zero allocations per node visited.
//
// If your visitor needs plan context (for extensions, etc.), construct it
// with NewPlanContext and pass it to your visitor's constructor.
func Walk(rel *proto.Rel, visitor Visitor) {
	if visitor == nil {
		return
	}
	walk(rel, visitor)
}
```
Review comment: Some modifications would be reasonably safe with this interface. Here are a few possibilities: https://github.com/voltrondata/spark-substrait-gateway/tree/main/src/transforms

Reply: Sounds good. I can update the description to be less intense and just warn.