
Potential data corruption with recoverable_grouped_execution enabled #21201

@arhimondr

Description

Your Environment

  • Presto version used: 0.285-SNAPSHOT
  • Storage (HDFS/S3/GCS..): Local
  • Data source and connector used: Hive
  • Deployment (Cloud or On-prem): Local

Expected Behavior

When recovery happens, the output table is expected to contain the correct data.

Current Behavior

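After recovery, the output table sometimes contains incorrect data (in the failing run below, 13847 rows instead of the expected 15000).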
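The invariant here is that a query which recovers from a failure must write exactly the same rows as a run without failures. The failing assertion below compares 15000 with 13847, which looks like a row-count check. A count alone cannot catch a case where lost rows happen to be offset by duplicated ones. Below is a minimal Java sketch of a stricter check that compares row multisets. `RecoveryOutputCheck` and `sameContent` are hypothetical names that are not part of `TestHiveRecoverableExecution`, and rows are encoded as strings only for brevity.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the Presto test suite): compares the rows written
// by a baseline run with the rows written by a run that recovered from a failure.
public final class RecoveryOutputCheck {
    private RecoveryOutputCheck() {}

    // Counts how many times each row occurs, so both missing and duplicated rows show up.
    static <R> Map<R, Integer> histogram(List<R> rows) {
        Map<R, Integer> counts = new HashMap<>();
        for (R row : rows) {
            counts.merge(row, 1, Integer::sum);
        }
        return counts;
    }

    // True only if both runs produced exactly the same multiset of rows (order ignored).
    static <R> boolean sameContent(List<R> baseline, List<R> recovered) {
        return histogram(baseline).equals(histogram(recovered));
    }

    public static void main(String[] args) {
        // Rows encoded as "key|value" strings for brevity (an assumption for this sketch).
        List<String> baseline = Arrays.asList("1|a", "2|b", "3|c");
        List<String> lostRow = Arrays.asList("1|a", "2|b");
        List<String> lostAndDuplicated = Arrays.asList("1|a", "2|b", "2|b");

        System.out.println(sameContent(baseline, baseline));              // true
        System.out.println(sameContent(baseline, lostRow));               // false
        // Same row count as the baseline, so a count-only check would pass...
        System.out.println(lostAndDuplicated.size() == baseline.size());  // true
        // ...but the content still differs.
        System.out.println(sameContent(baseline, lostAndDuplicated));     // false
    }
}
```

In a real test the rows would come from querying the output table after the baseline run and after the recovered run.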

Possible Solution

Disable recoverable_grouped_execution to mitigate the issue.

Steps to Reproduce

  1. A failure is injected with some probability right before the commit: https://github.com/arhimondr/presto/commit/872012a1b99681bc869c0105b39f73ac2b3e3e00#diff-90f667b9e48d15ad2dce1189ca464e382be5b4bb0ee94a4e3ab5ce2ea0ebac31R296
  2. The test is modified to rely on the injected failure instead of making workers unresponsive: https://github.com/arhimondr/presto/commit/872012a1b99681bc869c0105b39f73ac2b3e3e00#diff-b0e66057cedfe54a588d3d28a8c274dbeeeaf805218c73a96aac251e0442b1cbR370
  3. When the failure occurs, the output table sometimes contains incorrect data:
2023-10-20T11:50:33.272-0500	ERROR	SplitRunner-9-200	com.facebook.presto.execution.executor.TaskExecutor	Error processing Split 20231020_165028_00012_mkv4v.1.0.0.0-32  (start = 2.6082186029375E8, wall = 178 ms, cpu = 2 ms, wait = 1 ms, calls = 2): REMOTE_TASK_ERROR: This is injected recoverable writer error
2023-10-20T11:50:34.339-0500	ERROR	main	com.facebook.presto.hive.TestHiveRecoverableExecution	Query with recovery took 5716ms

java.lang.AssertionError:
Expected :15000
Actual   :13847
	at org.testng.Assert.fail(Assert.java:110)
	at org.testng.Assert.failNotEquals(Assert.java:1413)
	at org.testng.Assert.assertEqualsImpl(Assert.java:149)
	at org.testng.Assert.assertEquals(Assert.java:131)
	at org.testng.Assert.assertEquals(Assert.java:655)
	at org.testng.Assert.assertEquals(Assert.java:665)
	at com.facebook.presto.hive.TestHiveRecoverableExecution.testRecoverableGroupedExecution(TestHiveRecoverableExecution.java:400)
	at com.facebook.presto.hive.TestHiveRecoverableExecution.testInsertBucketedTable(TestHiveRecoverableExecution.java:197)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:135)
	at org.testng.internal.invokers.TestInvoker.invokeMethod(TestInvoker.java:673)
	at org.testng.internal.invokers.TestInvoker.invokeTestMethod(TestInvoker.java:220)
	at org.testng.internal.invokers.MethodRunner.runInSequence(MethodRunner.java:50)
	at org.testng.internal.invokers.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:945)
	at org.testng.internal.invokers.TestInvoker.invokeTestMethods(TestInvoker.java:193)
	at org.testng.internal.invokers.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
	at org.testng.internal.invokers.TestMethodWorker.run(TestMethodWorker.java:128)
	at java.util.ArrayList.forEach(ArrayList.java:1257)
	at org.testng.TestRunner.privateRun(TestRunner.java:808)
	at org.testng.TestRunner.run(TestRunner.java:603)
	at org.testng.SuiteRunner.runTest(SuiteRunner.java:429)
	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:423)
	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:383)
	at org.testng.SuiteRunner.run(SuiteRunner.java:326)
	at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
	at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:95)
	at org.testng.TestNG.runSuitesSequentially(TestNG.java:1249)
	at org.testng.TestNG.runSuitesLocally(TestNG.java:1169)
	at org.testng.TestNG.runSuites(TestNG.java:1092)
	at org.testng.TestNG.run(TestNG.java:1060)
	at com.intellij.rt.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:66)
	at com.intellij.rt.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:105)
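For anyone reproducing this without the linked branch, the injection in step 1 amounts to throwing an error with a fixed probability just before the writer commits. Below is a minimal, self-contained Java sketch of that idea. `InjectedFailure` and `maybeFail` are hypothetical names, not the code in the linked commit. The seeded `java.util.Random` is only there to make runs repeatable. The plain `RuntimeException` stands in for whatever error type the actual patch throws, which presumably has to be one that Presto classifies as recoverable for recovery to kick in.

```java
import java.util.Random;

// Hypothetical sketch of a probabilistic failure injector; the real injection
// lives in the linked commit and may look quite different.
public final class InjectedFailure {
    private final double probability;
    private final Random random;

    public InjectedFailure(double probability, long seed) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]: " + probability);
        }
        this.probability = probability;
        // Seeded so that a given test run fails at the same points when repeated.
        this.random = new Random(seed);
    }

    // Intended to be called right before the writer commits.
    // nextDouble() is in [0, 1), so probability 0.0 never fails and 1.0 always fails.
    public void maybeFail() {
        if (random.nextDouble() < probability) {
            throw new RuntimeException("This is injected recoverable writer error");
        }
    }

    public static void main(String[] args) {
        InjectedFailure injector = new InjectedFailure(0.5, 42L);
        int failures = 0;
        for (int i = 0; i < 1000; i++) {
            try {
                injector.maybeFail();
            } catch (RuntimeException e) {
                failures++;
            }
        }
        System.out.println("injected failures: " + failures + " / 1000");
    }
}
```

Because the failure only fires some of the time, a single passing run says little. The test has to be repeated until a run actually goes through recovery, as in the log above.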

Context

Discovered while implementing recoverable_grouped_execution support in Prestissimo.
