
spark-dependencies OutOfMemoryError #106

Open
@RJ0222

Description

Problem

How much memory does a spark-dependencies job need to process an index of about 12 GB?

I am totally new to Spark, and I have tried several times to run a spark-dependencies job to create the DAG.

It always fails with the errors below, even though I have raised the memory limit to about 28Gi.

21/02/03 08:18:39 ERROR TaskSetManager: Task 3 in stage 1.0 failed 1 times; aborting job
21/02/03 08:18:39 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 5, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 278220 ms
21/02/03 08:18:39 WARN TaskSetManager: Lost task 2.0 in stage 1.0 (TID 7, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 278220 ms
21/02/03 08:18:39 WARN TaskSetManager: Lost task 4.0 in stage 1.0 (TID 9, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 278220 ms
21/02/03 08:18:39 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 6, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 278220 ms
21/02/03 08:18:39 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_1_piece0 !
21/02/03 08:18:39 WARN NettyRpcEnv: Ignored message: HeartbeatResponse(true)
21/02/03 08:18:39 WARN SparkContext: Killing executors is not supported by current scheduler.
21/02/03 08:18:39 WARN Executor: Issue communicating with driver in heartbeater
org.apache.spark.SparkException: Exception thrown in awaitResult:
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
	at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
	at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:785)
	at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply$mcV$sp(Executor.scala:814)
	at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
	at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:814)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1988)
	at org.apache.spark.executor.Executor$$anon$2.run(Executor.scala:814)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Could not find HeartbeatReceiver.
	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:160)
	at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135)
	at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229)
	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523)
	at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91)
	... 13 more
21/02/03 08:18:49 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker for task 7,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at sun.reflect.GeneratedSerializationConstructorAccessor28.newInstance(Unknown Source)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at java.io.ObjectStreamClass.newInstance(ObjectStreamClass.java:1102)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2110)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
	at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2032)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1613)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:158)
	at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
	at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:185)
	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:153)
	at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:41)
	at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:90)
	at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:105)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
21/02/03 08:18:49 ERROR TaskSchedulerImpl: Ignoring update with state FAILED for TID 7 because its task set is gone (this is likely the result of receiving duplicate task finished status updates) or its executor has been marked as failed.

Sometimes the OutOfMemoryError surfaces in Arrays.copyOfRange instead:

21/02/03 07:55:00 ERROR TaskSchedulerImpl: Lost executor driver on localhost: Executor heartbeat timed out after 148649 ms
21/02/03 07:55:00 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 5)
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.Arrays.copyOfRange(Arrays.java:3664)
	at java.lang.String.<init>(String.java:207)
	at java.lang.StringBuilder.toString(StringBuilder.java:407)
	at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3496)
	at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:3283)
	at java.io.ObjectInputStream.readString(ObjectInputStream.java:1962)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1607)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
	at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2032)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1613)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2344)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2268)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2126)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1625)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:465)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:423)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:158)
	at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:188)
	at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:185)
	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:438)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
21/02/03 07:55:00 WARN TaskSetManager: Lost task 3.0 in stage 1.0 (TID 8, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 148649 ms
21/02/03 07:55:00 ERROR TaskSetManager: Task 3 in stage 1.0 failed 1 times; aborting job
21/02/03 07:55:00 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 5, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 148649 ms
21/02/03 07:55:00 WARN TaskSetManager: Lost task 2.0 in stage 1.0 (TID 7, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 148649 ms
21/02/03 07:55:00 WARN TaskSetManager: Lost task 4.0 in stage 1.0 (TID 9, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 148649 ms
21/02/03 07:55:00 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 6, localhost, executor driver): ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 148649 ms
21/02/03 07:55:00 WARN NettyRpcEnv: Ignored message: true
21/02/03 07:55:00 WARN SparkContext: Killing executors is not supported by current scheduler.
21/02/03 07:55:00 WARN BlockManagerMasterEndpoint: No more replicas available for broadcast_1_piece0 !
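
For context, the "Executor heartbeat timed out" messages above are governed by the standard Spark properties spark.executor.heartbeatInterval and spark.network.timeout. A minimal sketch of raising them through javaOpts, assuming the job picks up spark.* JVM system properties the way a default SparkConf does (the values are guesses, and this only papers over the long GC pauses rather than fixing them):

# Hypothetical variant of the javaOpts shown below under Environment;
# spark.network.timeout must stay well above spark.executor.heartbeatInterval.
javaOpts: >-
  -Xms12g -Xmx20g
  -Dspark.executor.heartbeatInterval=60s
  -Dspark.network.timeout=600s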

Environment

Spark job configuration

javaOpts: -Xms12g -Xmx20g
resources:
  limits:
    cpu: "7"
    memory: 28Gi
  requests:
    cpu: "4"
    memory: 20Gi
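
Both stack traces die in the shuffle read path (DeserializationStream → ExternalAppendOnlyMap.insertAll) using plain Java serialization, which suggests individual shuffle partitions no longer fit on the heap. Below is a sketch of tuning that might reduce per-task memory pressure, again assuming spark.* system properties are honoured; the property names are standard Spark, the values are guesses, and I don't know whether the job's span classes are Kryo-compatible:

# Hypothetical: more, smaller shuffle partitions, plus Kryo instead of
# Java serialization to cut deserialization garbage. Untested values.
javaOpts: >-
  -Xms12g -Xmx20g
  -Dspark.default.parallelism=100
  -Dspark.serializer=org.apache.spark.serializer.KryoSerializer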

ES data size

health status index                                  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   jaeger--jaeger-span-2021-02-03         hhLqvs-5RT2xxxxxxxxx   5   1  193532073            0     11.4gb            6gb

Is there a way to solve this problem other than raising the memory limit further, or is it just a usage problem on my side?

Any suggestions or tips would be greatly appreciated.
