Skip to content

Track individual expr's execution time in ProjectExec metrics (in EXPLAIN ANALYZE) #18456

@2010YOUY01

Description

@2010YOUY01

Is your feature request related to a problem or challenge?

Originally suggested by @simonvandel in #18217 (comment)

ProjectionExec currently shows the total elapsed time for all expressions. Would it make sense to show timings for each individual expression? Perhaps we could do it in VERBOSE mode

Given query (executed in datafusion-cli:

> set datafusion.explain.analyze_level = summary;
0 row(s) fetched.
Elapsed 0.001 seconds.

> explain analyze select pow(v1,2), abs(round(sqrt(v1)))
from generate_series(1000) as t1(v1);
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                                          |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | ProjectionExec: expr=[power(v1@0, 2) as pow(t1.v1,Int64(2)), abs(round(sqrt(CAST(v1@0 AS Float64)))) as abs(round(sqrt(t1.v1)))], metrics=[output_rows=1001, elapsed_compute=167.513µs, output_bytes=15.8 KB] |
|                   |   RepartitionExec: partitioning=RoundRobinBatch(14), input_partitions=1, metrics=[spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, fetch_time=48.126µs, repartition_time=1ns, send_time=6.055µs]           |
|                   |     ProjectionExec: expr=[value@0 as v1], metrics=[output_rows=1001, elapsed_compute=2.458µs, output_bytes=64.0 KB]                                                                                           |
|                   |       LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=0, end=1000, batch_size=8192], metrics=[output_rows=1001, elapsed_compute=35.208µs, output_bytes=64.0 KB]                        |
|                   |                                                                                                                                                                                                               |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.006 seconds.

In ProjectExec's metric, we can track each expressions evaluation time separately like pow(v1,2): 10ms, abs(round(sqrt(v1))): 30ms, this should be easy to implement.

Additionally, we can also implement fine-grained execution time tracking for nested expression like sqrt(v1): 5ms, round(sqrt(v1)): 10ms, abs(round(sqrt(v1))): 15ms.
I have briefly checked the implementation, it requires a) public API change that enables metrics tracking for Exprs b) Update implementation for expressions.
I think we don't have to implement that for all builtin Exprs, the math expressions as showed in the example are usually very fast, however there are some expressions that are slower to evaluate, and regularly cause performance issue (e.g. array_ functions, regex functions), it can be helpful to track execution time for them.

Describe the solution you'd like

  • Implement per-expression metrics tracking in ProjectionExec
  • Support fine-grained execution time metric for individual Exprs in a nested expression.

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions