perf: dont eval empty recordbatches #5968
base: main
Greptile Summary
Adds a performance optimization that skips expression evaluation on empty record batches and micropartitions when no aggregation expressions are present. Also adds
Key Changes:
Issue Found:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Client
participant PyMicroPartition
participant MicroPartition
participant RecordBatch
participant ExprEval as Expression Evaluator
Client->>PyMicroPartition: eval_expression_list(exprs)
PyMicroPartition->>PyMicroPartition: Check if empty && no aggs
alt Empty with no aggregations
PyMicroPartition-->>Client: Return clone (skip eval)
else Has data or aggregations
PyMicroPartition->>MicroPartition: eval_expression_list(exprs)
MicroPartition->>MicroPartition: Check if empty && no aggs
alt Empty with no aggregations
MicroPartition-->>PyMicroPartition: Return clone (skip eval)
else Has data or aggregations
loop For each RecordBatch
MicroPartition->>RecordBatch: eval_expression_list(exprs)
RecordBatch->>RecordBatch: Check if empty && no aggs
alt Empty with no aggregations
RecordBatch-->>MicroPartition: Return clone (skip eval)
else Has data or aggregations
loop For each expression
RecordBatch->>ExprEval: eval_expression(expr)
ExprEval-->>RecordBatch: Series result
end
RecordBatch->>RecordBatch: process_eval_results()
Note over RecordBatch: Handles aggregation semantics:<br/>empty + agg → unit length result
RecordBatch-->>MicroPartition: Evaluated RecordBatch
end
end
MicroPartition-->>PyMicroPartition: Evaluated MicroPartition
end
PyMicroPartition-->>Client: Result
end
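The fast path in the diagram can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not the code from this PR: `Expr`, `Batch`, and their fields are hypothetical stand-ins for the real expression and record batch types, and the column type is fixed to `i64` for brevity. It shows the two behaviors the diagram names: an empty input with no aggregations is returned as a clone without evaluation, while an empty input with an aggregation still produces a unit-length result.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an expression (not the project's real type).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// Pass through an existing column, selected by index.
    Column(usize),
    /// Aggregation: sum of a column, selected by index.
    Sum(usize),
}

impl Expr {
    /// True if the expression is an aggregation.
    fn is_agg(&self) -> bool {
        matches!(self, Expr::Sum(_))
    }
}

/// Hypothetical stand-in for a record batch. Columns are held behind `Arc`,
/// so cloning the batch only copies pointers and bumps reference counts.
#[derive(Debug, Clone)]
pub struct Batch {
    pub columns: Vec<Arc<Vec<i64>>>,
}

impl Batch {
    /// Number of rows, taken from the first column (0 if there are no columns).
    pub fn len(&self) -> usize {
        self.columns.first().map_or(0, |c| c.len())
    }

    pub fn is_empty(&self) -> bool {
        self.len() == 0
    }

    pub fn eval_expression_list(&self, exprs: &[Expr]) -> Batch {
        // Fast path: no rows and no aggregations, so skip evaluation
        // and hand back a cheap clone of the input.
        if self.is_empty() && !exprs.iter().any(Expr::is_agg) {
            return self.clone();
        }

        // Slow path: evaluate each expression in turn.
        let columns: Vec<Arc<Vec<i64>>> = exprs
            .iter()
            .map(|expr| match expr {
                Expr::Column(i) => Arc::clone(&self.columns[*i]),
                // An aggregation yields exactly one row, even on empty input.
                Expr::Sum(i) => Arc::new(vec![self.columns[*i].iter().sum::<i64>()]),
            })
            .collect();
        Batch { columns }
    }
}
```

In this toy version, calling `eval_expression_list(&[Expr::Column(0)])` on an empty batch returns columns that share the same allocations as the input, whereas `eval_expression_list(&[Expr::Sum(0)])` on the same batch goes through the slow path and returns a single row. Note that the fast path here returns the input's columns unchanged, regardless of which columns the expressions select.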
5 files reviewed, 6 comments
…ersalmind303/no-eval-on-empty
5 files reviewed, 1 comment
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Changes Made
Checks whether the recordbatch/micropartition is empty and, if so, skips evaluation.
Also adds `Clone` to micropartition. All of its contents are `Clone` and are mostly `Arc`'d, so this is a very cheap clone.
Related Issues