Problem
How much memory does a spark-dependencies job need to handle an index of about 12 GB?
I am completely new to the Spark project, and I have tried several times to run a spark-dependencies job to create the DAG.
It always fails with the error below, even though I have raised the memory limit to about 28Gi.
21/02/03 08:18:39 ERROR TaskSetManager: Task 3 in stage 1.0 failed 1 times; aborting job
21/02/03 08:18:39 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 5, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 278220 ms
21/02/03 08:18:39 WARN TaskSetManager: Lost task 2.0 in stage 1.0 (TID 7, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 278220 ms
21/02/03 08:18:39 WARN TaskSetManager: Lost task 4.0 in stage 1.0 (TID 9, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 278220 ms
21/02/03 08:18:39 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 6, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 278220 ms
21/02/03 08:18:39 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_1_piece0 !
21/02/03 08:18:39 WARN NettyRpcEnv: Ignored message: HeartbeatResponse(true)
21/02/03 08:18:39 WARN SparkContext: Killing executors is not supported by current scheduler.
21/02/03 08:18:39 WARN Executor: Issue communicating with driver in heartbeater
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:785)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply$mcV$sp(Executor.scala:814)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1988)
at org.apache.spark.executor.Executor$$anon$2.run(Executor.scala:814)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Could not find HeartbeatReceiver.
at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:160)
at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135)
at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229)
at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523)
at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91)
... 13 more
21/02/03 08:18:49 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker for task 7,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
at sun.reflect.GeneratedSerializationConstructorAccessor28.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.io.ObjectStreamClass.newInstance(ObjectStreamClass.java:1102)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2110)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2032)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:158)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:185)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:153)
at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:41)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:90)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:105)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
21/02/03 08:18:49 ERROR TaskSchedulerImpl: Ignoring update with state FAILED for TID 7 because its task set is gone (this is likely the result of receiving duplicate task finished status updates) or its executor has been marked as failed.
Sometimes a copyOfRange error occurs instead:
21/02/03 07:55:00 ERROR TaskSchedulerImpl: Lost executor driver on localhost: Executor heartbeat timed out after 148649 ms
21/02/03 07:55:00 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 5)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3496)
at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:3283)
at java.io.ObjectInputStream.readString(ObjectInputStream.java:1962)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1607)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2032)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:158)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:185)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
21/02/03 07:55:00 WARN TaskSetManager: Lost task 3.0 in stage 1.0 (TID 8, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 148649 ms
21/02/03 07:55:00 ERROR TaskSetManager: Task 3 in stage 1.0 failed 1 times; aborting job
21/02/03 07:55:00 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 5, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 148649 ms
21/02/03 07:55:00 WARN TaskSetManager: Lost task 2.0 in stage 1.0 (TID 7, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 148649 ms
21/02/03 07:55:00 WARN TaskSetManager: Lost task 4.0 in stage 1.0 (TID 9, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 148649 ms
21/02/03 07:55:00 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 6, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 148649 ms
21/02/03 07:55:00 WARN NettyRpcEnv: Ignored message: true
21/02/03 07:55:00 WARN SparkContext: Killing executors is not supported by current scheduler.
21/02/03 07:55:00 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_1_piece0 !
Environment
spark job configuration
javaOpts: -Xms12g -Xmx20g
resources:
  limits:
    cpu: "7"
    memory: 28Gi
  requests:
    cpu: "4"
    memory: 20Gi
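One direction I have been experimenting with, in case it is relevant: passing standard Spark properties through javaOpts so they are picked up as spark.* system properties when the job builds its SparkConf. This is only a sketch of the idea, not something I am sure applies to this job, and the values are guesses on my part:

# Hypothetical variant of the javaOpts line above (values are illustrative only):
#   spark.default.parallelism        - more, smaller shuffle partitions so each task deserializes less data at once
#   spark.network.timeout            - tolerate long GC pauses before the driver declares the executor lost
#   spark.executor.heartbeatInterval - heartbeat interval (must stay well below spark.network.timeout)
javaOpts: -Xms12g -Xmx20g -Dspark.default.parallelism=100 -Dspark.network.timeout=600s -Dspark.executor.heartbeatInterval=60s

My thinking is that the OOM happens while a shuffle read deserializes one partition's worth of spans, so more partitions might keep each task within the heap, and a longer timeout would at least stop the heartbeat failures from aborting the whole job during long GC pauses.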
ES data size
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open jaeger--jaeger-span-2021-02-03 hhLqvs-5RT2xxxxxxxxx 5 1 193532073 0 11.4gb 6gb
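(For reference, the listing above is the output of the _cat/indices API; I checked it with a command like the one below, where <es-host> is a placeholder for my Elasticsearch endpoint.)

curl -s 'http://<es-host>:9200/_cat/indices/jaeger-*?v'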
Is there a way to solve this problem other than by increasing the memory limit, or is it just a usage problem on my side?
Any suggestions or tips would be greatly appreciated.