
Compaction containers won't terminate due to exception #6507

@m09526

Description / Background

On a large-scale Sleeper compaction run, some containers are still running after all compaction tasks should have finished, and their logs contain the error shown in the trace below.

Steps to reproduce

  1. Run compaction tasks on a large table. This is where the problem has been seen.

Expected behaviour

Compaction containers should terminate when there are no more jobs to process.

Details

This appears to be caused by the main thread hanging on a compaction that never finishes. While the main thread is stuck, the message keep-alive thread keeps extending the visibility timeout of the job's SQS message so that it does not reappear on the queue. Eventually the total visibility timeout reaches the SQS maximum of 43200 seconds (12 hours), and the next extension is rejected with the exception in the trace below. The keep-alive thread then exits, but the main thread never finishes, so the compaction task never terminates.
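To make the timing concrete, here is a minimal sketch of when a periodic keep-alive would first be rejected. It assumes, as the error message suggests, that SQS adds the requested timeout (900 seconds, taken from the trace) to the time already elapsed since the message was received and rejects the call if the sum exceeds 43200 seconds; whether the boundary is inclusive is also an assumption. The class name, method names and the 300-second keep-alive period are hypothetical and are not Sleeper's code or configuration.

```java
public class VisibilityLimitExample {
    // SQS caps the total visibility timeout of a message at 12 hours.
    static final long MAX_TOTAL_VISIBILITY_SECONDS = 43_200;

    /**
     * Assumption: a ChangeMessageVisibility call made elapsedSeconds after the message was
     * received is accepted only if the elapsed time plus the requested timeout stays within the cap.
     */
    static boolean wouldBeAccepted(long elapsedSeconds, long requestedTimeoutSeconds) {
        return elapsedSeconds + requestedTimeoutSeconds <= MAX_TOTAL_VISIBILITY_SECONDS;
    }

    /** Seconds after receipt at which the first periodic keep-alive call would be rejected. */
    static long firstRejectedCallSeconds(long periodSeconds, long requestedTimeoutSeconds) {
        if (periodSeconds <= 0) {
            throw new IllegalArgumentException("periodSeconds must be positive");
        }
        long elapsed = periodSeconds; // first keep-alive call happens one period after receipt
        while (wouldBeAccepted(elapsed, requestedTimeoutSeconds)) {
            elapsed += periodSeconds;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long keepAlivePeriodSeconds = 300; // hypothetical period, not taken from the issue
        long requestedTimeoutSeconds = 900; // value shown in the trace
        long rejectedAt = firstRejectedCallSeconds(keepAlivePeriodSeconds, requestedTimeoutSeconds);
        // prints "First rejected keep-alive call at 42600 s after receipt"
        System.out.println("First rejected keep-alive call at " + rejectedAt + " s after receipt");
    }
}
```

Under this assumption, with a 900-second extension the keep-alive starts failing once more than 42300 seconds (11.75 hours) have passed since the message was received, whatever the keep-alive period, so any compaction that hangs for longer than that will hit this error.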

Trace:

[Thread-98] job.action.ChangeMessageVisibilityTimeoutAction INFO - Compaction job xxx: AmazonSQSException changing message visibility timeout to 900 for message with receipt handle xxxxxx (Exception message Value 900 for parameter VisibilityTimeout is invalid. Reason: Total VisibilityTimeout for the message is beyond the limit [43200 seconds]. (Service: Sqs, Status Code: 400, Request ID: xxxxxx) (SDK Attempt Count: 1), stacktrace software.amazon.awssdk.services.sqs.model.SqsException$BuilderImpl.build(SqsException.java:113)
  software.amazon.awssdk.services.sqs.model.SqsException$BuilderImpl.build(SqsException.java:61)
  software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.retryPolicyDisallowedRetryException(RetryableStageHelper.java:168)
  software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:73)
  software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
  software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
  software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:53)
  software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:35)
  software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:82)
  software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
  software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
  software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
  software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
  software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
  software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
  software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
  software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
  software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:210)
  software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
  software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
  software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
  software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
  software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
  software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
  software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
  software.amazon.awssdk.services.sqs.DefaultSqsClient.changeMessageVisibility(DefaultSqsClient.java:807)
  sleeper.common.job.action.ChangeMessageVisibilityTimeoutAction.call(ChangeMessageVisibilityTimeoutAction.java:59)
  sleeper.common.job.action.thread.PeriodicActionRunnable.run(PeriodicActionRunnable.java:56)
  java.base/java.lang.Thread.run(Thread.java:840))
  Exception in thread "Thread-98" java.lang.RuntimeException: ActionException calling PeriodicActionRunnable sleeper.common.job.action.ChangeMessageVisibilityTimeoutAction@1cfc0a5d
  at sleeper.common.job.action.thread.PeriodicActionRunnable.run(PeriodicActionRunnable.java:58)
  at java.base/java.lang.Thread.run(Thread.java:840)
  Caused by: sleeper.common.job.action.ActionException: Compaction job 6ad5d3e9-8b7f-4456-827a-b01676c24ba0: AmazonSQSException changing message visibility timeout
  at sleeper.common.job.action.ChangeMessageVisibilityTimeoutAction.call(ChangeMessageVisibilityTimeoutAction.java:79)
  at sleeper.common.job.action.thread.PeriodicActionRunnable.run(PeriodicActionRunnable.java:56)
  ... 1 more
  Caused by: software.amazon.awssdk.services.sqs.model.SqsException: Value 900 for parameter VisibilityTimeout is invalid. Reason: Total VisibilityTimeout for the message is beyond the limit [43200 seconds]. (Service: Sqs, Status Code: 400, Request ID: daefb520-ebb0-5bc5-b2c7-d4d33f8f11f4) (SDK Attempt Count: 1)
  at software.amazon.awssdk.services.sqs.model.SqsException$BuilderImpl.build(SqsException.java:113)
  at software.amazon.awssdk.services.sqs.model.SqsException$BuilderImpl.build(SqsException.java:61)
  at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.retryPolicyDisallowedRetryException(RetryableStageHelper.java:168)
  at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:73)
  at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
  at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
  at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:53)
  at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:35)
  at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:82)
  at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
  at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
  at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
  at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
  at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
  at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
  at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
  at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
  at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:210)
  at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
  at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
  at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
  at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
  at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
  at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
  at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
  at software.amazon.awssdk.services.sqs.DefaultSqsClient.changeMessageVisibility(DefaultSqsClient.java:807)
  at sleeper.common.job.action.ChangeMessageVisibilityTimeoutAction.call(ChangeMessageVisibilityTimeoutAction.java:59)
  ... 2 more
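The end of the trace shows the keep-alive thread (Thread-98) dying with an uncaught RuntimeException. In Java an uncaught exception terminates only the thread it was thrown on, and the JVM keeps running while any non-daemon thread, such as a stuck main thread, is still alive. The sketch below illustrates this with a hypothetical stand-in for the keep-alive thread; it is not Sleeper's code, and the class and method names are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

public class UncaughtExceptionExample {
    /**
     * Starts a stand-in for the keep-alive thread that fails immediately, waits for it to end,
     * and returns the exception it died with. The calling thread is unaffected.
     */
    static Throwable runFailingKeepAlive() {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread keepAlive = new Thread(() -> {
            throw new RuntimeException("simulated: total VisibilityTimeout beyond the limit");
        }, "keep-alive");
        // Record the failure instead of printing "Exception in thread ..." to stderr
        keepAlive.setUncaughtExceptionHandler((thread, error) -> failure.set(error));
        keepAlive.start();
        try {
            keepAlive.join(); // returns once the keep-alive thread has died
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return failure.get();
    }

    public static void main(String[] args) {
        Throwable error = runFailingKeepAlive();
        // Only the keep-alive thread has ended; this thread carries on running
        System.out.println("keep-alive thread died with: " + error.getMessage());
        System.out.println("calling thread still running: " + Thread.currentThread().isAlive());
    }
}
```

If the calling thread were instead blocked on a compaction that never returns, nothing in this pattern would stop it, which matches the behaviour described in this issue.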

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
