Skip to content

Limit statements cannot finish quickly with LimitPushDown in fault tolerant execution #19450

Open
@hackeryang

Description

@hackeryang

Hello dear community, I executed a query with the fault tolerant execution mode in Trino 423:

trino:ta> set session retry_policy='TASK';
SET SESSION
trino:ta> select * from ta_event_2976 limit 100;

Query 20231019_112636_05204_pfmms, FAILED, 4 nodes
http://ta3:8080/ui/query.html?20231019_112636_05204_pfmms
Splits: 36,352 total, 34,533 done (NaN%)
CPU Time: 119.7s total,   943 rows/s, 2.36MB/s, 2% active
Per Node: 0.3 parallelism,   245 rows/s,  631KB/s
Parallelism: 1.0
Peak Memory: 155MB
1:55 [113K rows, 283MB] [983 rows/s, 2.47MB/s]

Query aborted by user

However, I found that this query didn't finish quickly with 100 rows, and try to scan over the whole table.

I roughly understand the cause of this phenomenon, it may be because the MPP architecture needs the coordinator(i.e. Stage 0) to decide the termination of splits scheduling, and the fault tolerant execution mode won't let the coordinator see 100 rows immediately just like the streaming pipeline execution before.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions