Query failure due to GC time increase #25543

Open
@BalaMahesh

Hello folks,
We are on Trino 466. For the last few weeks, one of our ETL queries has been failing with the error below. However, when we split the query into two parts (50% of the data in each), both parts finish.

io.trino.spi.TrinoException: Expected response code from http://10.205.32.4:8080/v1/task/20250409_025418_00147_45wcu.3.76.0/status to be 200, but was 408
Error 408 Timeout: Timed out (timeout delayed by 349 ms after scheduled time): AsyncCatchingFuture@155e778e[status=SUCCESS, result=[io.trino.execution.TaskStatus@44f967b8]]
    at io.trino.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:62)
    at io.trino.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:27)
    at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1137)
    at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:79)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1575)
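
To help locate the affected worker and task, the status URL in the error can be split into its parts. This is only a rough sketch: it assumes the task ID follows the layout `<query id>.<stage>.<partition>.<attempt>`, and the helper name `parse_task_url` is made up for illustration.

```python
import re

# Status URL copied verbatim from the error message.
STATUS_URL = "http://10.205.32.4:8080/v1/task/20250409_025418_00147_45wcu.3.76.0/status"


def parse_task_url(url):
    """Split a task status URL into the worker address and task ID parts.

    Assumption: the task ID is <query_id>.<stage>.<partition>.<attempt>.
    """
    match = re.match(r"https?://([^/]+)/v1/task/([^/]+)/status$", url)
    if match is None:
        raise ValueError(f"not a task status URL: {url}")
    worker, task_id = match.groups()
    # Split from the right so the query ID is kept whole.
    query_id, stage, partition, attempt = task_id.rsplit(".", 3)
    return {
        "worker": worker,
        "query_id": query_id,
        "stage": int(stage),
        "partition": int(partition),
        "attempt": int(attempt),
    }


info = parse_task_url(STATUS_URL)
print(info)
```

Under that assumption, the timed-out status request was for stage 3, partition 76 of query `20250409_025418_00147_45wcu` on the worker at `10.205.32.4:8080`, which is the node whose GC metrics are worth checking first.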

After reviewing the metrics, we observed that G1 Old Generation GC pauses became significantly longer while that specific query was running.

Below are our configs. Please check the details and help us identify the cause and a possible fix for this problem.

resourcesWorker:
  limits:
    cpu: "15"
    memory: 115Gi
  requests:
    cpu: "15"
    memory: 115Gi

jvmConfig: |-
  -server
  -Xmx105G
  -XX:+UseG1GC
  -XX:G1HeapRegionSize=32M
  -XX:+UseGCOverheadLimit
  -XX:+ExplicitGCInvokesConcurrent
  -XX:+HeapDumpOnOutOfMemoryError
  -XX:+ExitOnOutOfMemoryError
  -XX:+UnlockDiagnosticVMOptions
  -XX:G1NumCollectionsKeepPinned=10000000
  --add-opens=java.base/java.nio=ALL-UNNAMED
  -Djdk.attach.allowAttachSelf=true

configProperties: |-
  coordinator=false
  iterative-optimizer-timeout=10m
  http-server.http.port=8080
  query.max-memory=10000GB
  query.max-memory-per-node=85GB
  memory.heap-headroom-per-node=20GB
  discovery.uri=http://ocd-trino-engg:8080
  spill-enabled=true
  spiller-spill-path=/usr/lib/trino/plugin/trino-udfs
  spill-compression-codec=ZSTD
  spiller-max-used-space-threshold=0.95
  query-max-spill-per-node=10000GB
  max-spill-per-node=10000GB
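
As a quick sanity check of how the memory settings above relate to each other, here is a small arithmetic sketch. It assumes that `GB` in the Trino properties and `G` in `-Xmx` are both binary units (2^30 bytes), the same as `Gi` in the pod limits; the variable names are only for illustration.

```python
GIB = 1024 ** 3

container_limit = 115 * GIB      # resourcesWorker limits.memory: 115Gi
jvm_heap = 105 * GIB             # -Xmx105G
query_max_per_node = 85 * GIB    # query.max-memory-per-node=85GB (assumed binary)
heap_headroom = 20 * GIB         # memory.heap-headroom-per-node=20GB (assumed binary)

# Heap not claimed by either the per-node query limit or the headroom.
heap_slack = jvm_heap - (query_max_per_node + heap_headroom)

# Pod memory left for everything outside the Java heap
# (thread stacks, direct buffers, metaspace, GC structures).
off_heap_margin = container_limit - jvm_heap

print(f"heap slack:      {heap_slack / GIB:.0f} GiB")
print(f"off-heap margin: {off_heap_margin / GIB:.0f} GiB")
```

Under these assumptions, the per-node query limit plus the headroom accounts for the entire 105 GiB heap, and about 10 GiB of the pod limit remains for memory outside the heap.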
