Skip to content

Conversation

@universalmind303
Copy link
Member

Changes Made

checks if the recordbatch/micropartition is empty. if so, skips trying to evaluate.

Also adds Clone to micropartition. All of it's contents are Clone and are mostly Arc'd so this is a very cheap clone

Related Issues

@github-actions github-actions bot added the perf label Jan 7, 2026
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 7, 2026

Greptile Summary

Adds performance optimization to skip expression evaluation on empty record batches and micropartitions when no aggregation expressions are present. Also adds Clone derive to MicroPartition for efficient cloning via Arc-wrapped fields.

Key Changes:

  • Early returns self.clone() when batch is empty and expressions contain no aggregations
  • Correctly preserves aggregation semantics where empty tables with aggregations produce unit-length results (e.g., COUNT(*) on empty → 1 row with 0)
  • Applied across sync, async, and parallel evaluation methods

Issue Found:

  • Inconsistent aggregation detection in eval_expression_list_async (line 50) uses shallow matches! check instead of recursive has_agg, potentially missing nested aggregations like col("x") + COUNT(*)

Confidence Score: 4/5

  • Safe to merge after fixing the aggregation detection bug in async method
  • The optimization is sound and correctly guards against breaking aggregation semantics in most places, but the inconsistent aggregation check in the async method could cause incorrect behavior with nested aggregation expressions
  • Pay close attention to src/daft-micropartition/src/ops/eval_expressions.rs:50 - the aggregation detection needs to be consistent with other methods

Important Files Changed

Filename Overview
src/daft-micropartition/src/ops/eval_expressions.rs adds empty check optimization with aggregation guard, but async version has inconsistent aggregation detection
src/daft-recordbatch/src/lib.rs adds empty check optimization with correct aggregation detection across all methods

Sequence Diagram

sequenceDiagram
    participant Client
    participant PyMicroPartition
    participant MicroPartition
    participant RecordBatch
    participant ExprEval as Expression Evaluator
    
    Client->>PyMicroPartition: eval_expression_list(exprs)
    PyMicroPartition->>PyMicroPartition: Check if empty && no aggs
    alt Empty with no aggregations
        PyMicroPartition-->>Client: Return clone (skip eval)
    else Has data or aggregations
        PyMicroPartition->>MicroPartition: eval_expression_list(exprs)
        MicroPartition->>MicroPartition: Check if empty && no aggs
        alt Empty with no aggregations
            MicroPartition-->>PyMicroPartition: Return clone (skip eval)
        else Has data or aggregations
            loop For each RecordBatch
                MicroPartition->>RecordBatch: eval_expression_list(exprs)
                RecordBatch->>RecordBatch: Check if empty && no aggs
                alt Empty with no aggregations
                    RecordBatch-->>MicroPartition: Return clone (skip eval)
                else Has data or aggregations
                    loop For each expression
                        RecordBatch->>ExprEval: eval_expression(expr)
                        ExprEval-->>RecordBatch: Series result
                    end
                    RecordBatch->>RecordBatch: process_eval_results()
                    Note over RecordBatch: Handles aggregation semantics:<br/>empty + agg → unit length result
                    RecordBatch-->>MicroPartition: Evaluated RecordBatch
                end
            end
            MicroPartition-->>PyMicroPartition: Evaluated MicroPartition
        end
        PyMicroPartition-->>Client: Result
    end
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 6 comments

Edit Code Review Agent Settings | Greptile

@universalmind303
Copy link
Member Author

@greptileai

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants