Description / Background
On a large-scale Sleeper compaction run, some containers remain after all compaction tasks should have finished, and their logs contain the error shown under Trace below.
Steps to reproduce
- This is seen in compaction tasks on a large table.
Expected behaviour
Compaction containers should terminate when no jobs are present.
Details
This appears to be caused by the main thread hanging on a compaction that never finishes. While the main thread is stuck, the message keep-alive thread keeps extending the visibility timeout to stop the message reappearing on the queue. Eventually the total visibility timeout reaches the SQS maximum of 43200 seconds (12 hours), the next extension is rejected, and the keep-alive thread fails with the exception in the trace. The main thread never terminates, so the compaction task never terminates.
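One possible mitigation, sketched here only as an illustration, is for the keep-alive thread to check the remaining visibility budget before each extension and signal the task to give up on the job instead of letting the SQS call fail. The class `VisibilityBudget` and its method names are hypothetical and are not part of Sleeper. The 900 second extension and the 43200 second limit are taken from the error message in the trace.

```java
import java.time.Duration;

/**
 * Hypothetical helper (not Sleeper code): decides whether a keep-alive
 * extension would stay within the SQS total visibility limit.
 */
final class VisibilityBudget {
    // Limit reported in the error message: 43200 seconds (12 hours).
    static final Duration SQS_MAX_TOTAL_VISIBILITY = Duration.ofSeconds(43200);

    private final Duration maxTotal;
    private final Duration extension;

    VisibilityBudget(Duration maxTotal, Duration extension) {
        this.maxTotal = maxTotal;
        this.extension = extension;
    }

    /**
     * Returns true if setting the visibility timeout to the extension value
     * now keeps the total within the limit, given the time elapsed since the
     * message was received.
     */
    boolean canExtend(Duration elapsedSinceReceive) {
        return elapsedSinceReceive.plus(extension).compareTo(maxTotal) <= 0;
    }

    /** Time left before no further extension of this size is possible. */
    Duration remaining(Duration elapsedSinceReceive) {
        Duration left = maxTotal.minus(extension).minus(elapsedSinceReceive);
        return left.isNegative() ? Duration.ZERO : left;
    }
}
```

With a check like this, a keep-alive loop could call `canExtend` before each `changeMessageVisibility` request and, when it returns false, interrupt or otherwise notify the main thread so that the task abandons the hung compaction and exits. How the main thread should be stopped is a separate design decision that this sketch does not address.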
Trace:
[Thread-98] job.action.ChangeMessageVisibilityTimeoutAction INFO - Compaction job xxx: AmazonSQSException changing message visibility timeout to 900 for message with receipt handle xxxxxx (Exception message Value 900 for parameter VisibilityTimeout is invalid. Reason: Total VisibilityTimeout for the message is beyond the limit [43200 seconds]. (Service: Sqs, Status Code: 400, Request ID: xxxxxx) (SDK Attempt Count: 1), stacktrace software.amazon.awssdk.services.sqs.model.SqsException$BuilderImpl.build(SqsException.java:113)
software.amazon.awssdk.services.sqs.model.SqsException$BuilderImpl.build(SqsException.java:61)
software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.retryPolicyDisallowedRetryException(RetryableStageHelper.java:168)
software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:73)
software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:53)
software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:35)
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:82)
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:210)
software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
software.amazon.awssdk.services.sqs.DefaultSqsClient.changeMessageVisibility(DefaultSqsClient.java:807)
sleeper.common.job.action.ChangeMessageVisibilityTimeoutAction.call(ChangeMessageVisibilityTimeoutAction.java:59)
sleeper.common.job.action.thread.PeriodicActionRunnable.run(PeriodicActionRunnable.java:56)
java.base/java.lang.Thread.run(Thread.java:840))
Exception in thread "Thread-98" java.lang.RuntimeException: ActionException calling PeriodicActionRunnable sleeper.common.job.action.ChangeMessageVisibilityTimeoutAction@1cfc0a5d
at sleeper.common.job.action.thread.PeriodicActionRunnable.run(PeriodicActionRunnable.java:58)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: sleeper.common.job.action.ActionException: Compaction job 6ad5d3e9-8b7f-4456-827a-b01676c24ba0: AmazonSQSException changing message visibility timeout
at sleeper.common.job.action.ChangeMessageVisibilityTimeoutAction.call(ChangeMessageVisibilityTimeoutAction.java:79)
at sleeper.common.job.action.thread.PeriodicActionRunnable.run(PeriodicActionRunnable.java:56)
... 1 more
Caused by: software.amazon.awssdk.services.sqs.model.SqsException: Value 900 for parameter VisibilityTimeout is invalid. Reason: Total VisibilityTimeout for the message is beyond the limit [43200 seconds]. (Service: Sqs, Status Code: 400, Request ID: daefb520-ebb0-5bc5-b2c7-d4d33f8f11f4) (SDK Attempt Count: 1)
at software.amazon.awssdk.services.sqs.model.SqsException$BuilderImpl.build(SqsException.java:113)
at software.amazon.awssdk.services.sqs.model.SqsException$BuilderImpl.build(SqsException.java:61)
at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.retryPolicyDisallowedRetryException(RetryableStageHelper.java:168)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:73)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:53)
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:35)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:82)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:43)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:210)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74)
at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53)
at software.amazon.awssdk.services.sqs.DefaultSqsClient.changeMessageVisibility(DefaultSqsClient.java:807)
at sleeper.common.job.action.ChangeMessageVisibilityTimeoutAction.call(ChangeMessageVisibilityTimeoutAction.java:59)
... 2 more