35 changes: 35 additions & 0 deletions traverse/multi_visitor.go
@@ -0,0 +1,35 @@
// SPDX-License-Identifier: Apache-2.0
package traverse

import (
	proto "github.com/substrait-io/substrait-protobuf/go/substraitpb"
)

// MultiVisitor combines multiple visitors into a single visitor.
// All visitors are called for each node they support.
type MultiVisitor struct {
	visitors []Visitor
}

// NewMultiVisitor creates a visitor that delegates to multiple visitors.
func NewMultiVisitor(visitors ...Visitor) *MultiVisitor {
	return &MultiVisitor{visitors: visitors}
}

// VisitRel calls VisitRel on all visitors that implement RelVisitor.
func (m *MultiVisitor) VisitRel(rel *proto.Rel) {
	for _, v := range m.visitors {
		if rv, ok := v.(RelVisitor); ok {
			rv.VisitRel(rel)
		}
	}
}

// VisitExpr calls VisitExpr on all visitors that implement ExprVisitor.
func (m *MultiVisitor) VisitExpr(expr *proto.Expression) {
	for _, v := range m.visitors {
		if ev, ok := v.(ExprVisitor); ok {
			ev.VisitExpr(expr)
		}
	}
}
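
The delegation in multi_visitor.go can be exercised in isolation. The sketch below is a minimal, self-contained illustration rather than the PR's code: `Rel` and `Expression` are hypothetical stand-ins for `*proto.Rel` and `*proto.Expression`, and `relCounter`, `exprNames`, and `demoMultiVisitor` are names invented for the example. It shows that each wrapped visitor only receives the callbacks it implements.

```go
package main

import "fmt"

// Rel and Expression are hypothetical stand-ins for the Substrait protobuf types.
type Rel struct{ Name string }
type Expression struct{ Name string }

// These mirror the interfaces declared in traverse/visitor.go.
type Visitor interface{}
type RelVisitor interface{ VisitRel(rel *Rel) }
type ExprVisitor interface{ VisitExpr(expr *Expression) }

// MultiVisitor mirrors the type under review, using the stand-in node types.
type MultiVisitor struct{ visitors []Visitor }

func NewMultiVisitor(visitors ...Visitor) *MultiVisitor {
	return &MultiVisitor{visitors: visitors}
}

func (m *MultiVisitor) VisitRel(rel *Rel) {
	for _, v := range m.visitors {
		if rv, ok := v.(RelVisitor); ok {
			rv.VisitRel(rel)
		}
	}
}

func (m *MultiVisitor) VisitExpr(expr *Expression) {
	for _, v := range m.visitors {
		if ev, ok := v.(ExprVisitor); ok {
			ev.VisitExpr(expr)
		}
	}
}

// relCounter implements only RelVisitor.
type relCounter struct{ rels int }

func (c *relCounter) VisitRel(*Rel) { c.rels++ }

// exprNames implements only ExprVisitor.
type exprNames struct{ names []string }

func (e *exprNames) VisitExpr(expr *Expression) { e.names = append(e.names, expr.Name) }

// demoMultiVisitor feeds two relations and one expression through a MultiVisitor.
func demoMultiVisitor() (int, []string) {
	rc, en := &relCounter{}, &exprNames{}
	mv := NewMultiVisitor(rc, en)
	mv.VisitRel(&Rel{Name: "read"})
	mv.VisitRel(&Rel{Name: "filter"})
	mv.VisitExpr(&Expression{Name: "gt"})
	return rc.rels, en.names
}

func main() {
	rels, names := demoMultiVisitor()
	fmt.Println(rels, names)
}
```

Because `MultiVisitor` itself implements both interfaces, a walker will call it for every relation and every expression, and the per-visitor filtering happens inside the loops.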
117 changes: 117 additions & 0 deletions traverse/visitor.go
@@ -0,0 +1,117 @@
// SPDX-License-Identifier: Apache-2.0
/*
Package traverse provides a visitor framework for analyzing Substrait plans
with multiple read-only operations in a single tree traversal.

## Key Features:
- Interface-based visitor pattern
- Single tree walk for multiple analyses
- Sequential execution
- Read-only operations - NO MUTATIONS
- Zero allocations per node

## IMPORTANT: Read-Only Analysis Only
This interface is designed for read-only analysis and assumes the tree does not change
while it is being walked. Modifying the tree during traversal can cause nodes to be
missed or visited out of order, or can cause a panic.
Member

Some modifications would be reasonably safe with this interface. Here are a few possibilities: https://github.com/voltrondata/spark-substrait-gateway/tree/main/src/transforms

Author

Sounds good, I can update the description to be less intense and just warn.


## Basic Usage:

	// Option 1: Simple usage with Visit (walks every relation in the plan)
	visitor := NewMyVisitor()
	traverse.Visit(plan, visitor)

	// Option 2: Explicit context for advanced usage.
	// Walk takes a single *proto.Rel, so walk each root relation's input in turn.
	ctx := traverse.NewPlanContext(plan)
	visitor := NewMyVisitor(ctx)
	for _, planRel := range plan.Relations {
		if root := planRel.GetRoot(); root != nil {
			traverse.Walk(root.Input, visitor)
		}
	}

	// Option 3: With cycle detection for DAG enforcement
	visitor := NewMyVisitor()
	cycleDetector := traverse.NewCycleDetectingVisitor(visitor)
	traverse.Walk(rel, cycleDetector)

## Creating Custom Visitors:

	// Visitor that only cares about relations
	type NodeCounter struct {
		count int
	}

	func (v *NodeCounter) VisitRel(rel *proto.Rel) {
		v.count++
	}

	// Visitor that only cares about expressions (e.g., finding all functions)
Member

Presumably for finding all utilized functions versus checking the extension list.

Author

Yes. I can also push a plan context that will get the extension list; I didn't want this one PR to get too big.

	type FunctionCollector struct {
		ctx       *PlanContext
		functions []string
	}

	func (v *FunctionCollector) VisitExpr(expr *proto.Expression) {
		// Collect function names
	}

	// Visitor that visits both
	type FullAnalyzer struct{}

	func (v *FullAnalyzer) VisitRel(rel *proto.Rel) {
		// Analyze relations
	}

	func (v *FullAnalyzer) VisitExpr(expr *proto.Expression) {
		// Analyze expressions
	}
*/
package traverse

import (
	proto "github.com/substrait-io/substrait-protobuf/go/substraitpb"
)

// Visitor is a marker interface for all visitors.
// Visitors should implement one or more of the following methods:
// - VisitRel(*proto.Rel) - called for each relation node
// - VisitExpr(*proto.Expression) - called for each non-literal expression
//
// The traversal framework uses type assertions to determine which methods to call.
type Visitor interface{}

// RelVisitor is implemented by visitors that want to visit relation nodes.
type RelVisitor interface {
	VisitRel(rel *proto.Rel)
}

// ExprVisitor is implemented by visitors that want to visit expression nodes.
type ExprVisitor interface {
	VisitExpr(expr *proto.Expression)
}
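
Because `Visitor` is an empty marker interface, a visitor whose method name or signature is slightly off still compiles and is simply never called. A compile-time assertion is one common Go idiom for guarding against that. The sketch below is an illustration under assumptions, not part of the PR: `Rel` and `Expression` are hypothetical stand-ins for the protobuf types, and `supports` is a helper invented here to show what the type assertions report at run time.

```go
package main

import "fmt"

// Rel and Expression are hypothetical stand-ins for the Substrait protobuf types.
type Rel struct{}
type Expression struct{}

type RelVisitor interface{ VisitRel(rel *Rel) }
type ExprVisitor interface{ VisitExpr(expr *Expression) }

// NodeCounter only cares about relations.
type NodeCounter struct{ count int }

func (v *NodeCounter) VisitRel(rel *Rel) { v.count++ }

// Compile-time check: the build fails if *NodeCounter stops satisfying RelVisitor,
// for example after a typo in the method name or a changed parameter type.
var _ RelVisitor = (*NodeCounter)(nil)

// supports reports which visitor interfaces v satisfies, using the same kind of
// type assertions the traversal relies on.
func supports(v interface{}) (rel, expr bool) {
	_, rel = v.(RelVisitor)
	_, expr = v.(ExprVisitor)
	return rel, expr
}

func main() {
	rel, expr := supports(&NodeCounter{})
	fmt.Println("RelVisitor:", rel, "ExprVisitor:", expr)
}
```

The assertion costs nothing at run time, since the blank identifier discards the value and only the type check remains.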

// Visit provides a convenient way to traverse a Substrait plan.
// It walks all relations in the plan, calling the appropriate visitor methods
// based on what interfaces the visitor implements.
Member

Probably worth noting that you have to visit the entire plan: if you hit something you can't handle (such as an unsupported type), there is no way to bail out early.

Author

Do you think we should have an ability to exit early? VisitRel/VisitExpr could return an "ok", and if any visit returns !ok we don't traverse deeper. That wouldn't stop the other recursive branches, though.
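
The proposal above can be sketched as follows. This is a hypothetical variant, not the PR's API: `PruningRelVisitor`, `walkPruned`, `stopAt`, and the stand-in `Rel` type with an `Inputs` slice are all assumptions made for illustration. It shows the behavior described in the comment: returning false stops descent below the current node, while sibling branches are still visited.

```go
package main

import "fmt"

// Rel is a hypothetical stand-in for *proto.Rel: a named node with child relations.
type Rel struct {
	Name   string
	Inputs []*Rel
}

// PruningRelVisitor is an assumed variant of RelVisitor whose method returns
// false to ask the walker not to descend into the node's inputs.
type PruningRelVisitor interface {
	VisitRel(rel *Rel) bool
}

// walkPruned visits rel in pre-order and skips its subtree when the visitor
// returns false. Branches elsewhere in the tree are unaffected.
func walkPruned(rel *Rel, v PruningRelVisitor) {
	if rel == nil || !v.VisitRel(rel) {
		return
	}
	for _, in := range rel.Inputs {
		walkPruned(in, v)
	}
}

// stopAt records every visited node and prunes below the node named target.
type stopAt struct {
	target  string
	visited []string
}

func (s *stopAt) VisitRel(rel *Rel) bool {
	s.visited = append(s.visited, rel.Name)
	return rel.Name != s.target
}

// visitedNames walks a small tree and prunes below "filter".
func visitedNames() []string {
	tree := &Rel{Name: "join", Inputs: []*Rel{
		{Name: "filter", Inputs: []*Rel{{Name: "read_a"}}},
		{Name: "read_b"},
	}}
	v := &stopAt{target: "filter"}
	walkPruned(tree, v)
	return v.visited
}

func main() {
	fmt.Println(visitedNames()) // prints [join filter read_b]
}
```

Here `read_a` is skipped because it sits below the pruned node, but `read_b` is still visited. Stopping the whole traversal would need a different signal, such as an error or a flag checked in the loop.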

func Visit(plan *proto.Plan, visitor Visitor) {
	if visitor == nil || plan == nil {
		return
	}

Codecov / codecov/patch: check warning on line 96 in traverse/visitor.go. Added lines #L95 - L96 were not covered by tests.

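	// Walk all relations in the plan. A PlanRel holds either a RelRoot or a
	// bare Rel, so handle both cases.
	for _, rel := range plan.Relations {
		if root := rel.GetRoot(); root != nil {
			walk(root.Input, visitor)
		} else if r := rel.GetRel(); r != nil {
			walk(r, visitor)
		}
	}
}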

// Walk traverses a Substrait relation tree, applying the visitor to each node.
// The visitor's methods are called based on which interfaces it implements.
// This function guarantees zero allocations per node visited.
//
// If your visitor needs plan context (for extensions, etc.), construct it
// with NewPlanContext and pass it to your visitor's constructor.
func Walk(rel *proto.Rel, visitor Visitor) {
Member

Could you use this to start a visit at any point in the tree? An alternative name could be VisitRelation since it also visits.

Author

Fair, yeah I like that name more. Will update.

	if visitor == nil {
		return
	}
	walk(rel, visitor)
}
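
The zero-allocation guarantee in the `Walk` doc comment is something a reader can measure. The unexported `walk` is not shown in this diff, so the sketch below uses an assumed recursive pre-order walk over a hypothetical stand-in `Rel` type. It uses `testing.AllocsPerRun` from the standard library to report the average number of heap allocations per traversal. `counter` and `allocsPerWalk` are names invented for the example.

```go
package main

import (
	"fmt"
	"testing"
)

// Rel is a hypothetical stand-in for *proto.Rel: a node with child relations.
type Rel struct {
	Inputs []*Rel
}

type RelVisitor interface{ VisitRel(rel *Rel) }

// counter counts visited relations without allocating.
type counter struct{ n int }

func (c *counter) VisitRel(rel *Rel) { c.n++ }

// walk is an assumed pre-order traversal, not the PR's unexported walk.
// It dispatches through a type assertion, as the framework describes.
func walk(rel *Rel, visitor interface{}) {
	if rel == nil {
		return
	}
	if rv, ok := visitor.(RelVisitor); ok {
		rv.VisitRel(rel)
	}
	for _, in := range rel.Inputs {
		walk(in, visitor)
	}
}

// allocsPerWalk reports the average number of heap allocations made by one
// full traversal of root, as measured by testing.AllocsPerRun.
func allocsPerWalk(root *Rel, visitor interface{}) float64 {
	return testing.AllocsPerRun(100, func() {
		walk(root, visitor)
	})
}

func main() {
	root := &Rel{Inputs: []*Rel{{}, {Inputs: []*Rel{{}}}}}
	fmt.Println("allocations per walk:", allocsPerWalk(root, &counter{}))
}
```

In a real test suite the same call would sit inside a `TestXxx` function and fail when the result is non-zero, which keeps the guarantee from regressing silently.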