Description

Running the migrator (ScyllaDB Migrator 1.1.0 on Spark 3.5.1) with a DynamoDB S3 export as the source completes the snapshot transfer, but DynamoDbSavepointsManager logs an ERROR for every partition ("Unexpected partition type: org.apache.spark.rdd.ParallelCollectionPartition"), so the scanned segments are never included in a savepoint. The job was submitted with spark-submit using the following arguments:
--class com.scylladb.migrator.Migrator
--master spark://spark-master:7077
--conf spark.driver.host=spark-master
--conf spark.scylla.config=/app/configs/dev_source_data.yaml
/jars/scylla-migrator-assembly.jar
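For context, the file passed via spark.scylla.config presumably looks roughly like the sketch below. This is an assumed reconstruction from the "Loaded config: MigratorConfig(...)" line further down in the log, not the actual dev_source_data.yaml; field names follow the migrator's documented YAML layout as I understand it, and redacted or ambiguous values are shown as placeholders.

# Approximate sketch of /app/configs/dev_source_data.yaml (assumption, reconstructed
# from the "Loaded config" log line; exact field names and values may differ)
source:
  type: dynamodb-s3-export
  bucket: dev-ntl
  manifestKey: dev-source-data-ddb-export/AWSDynamoDB/01587540-6c8960f7/manifest-summary.json
  tableDescription:
    attributeDefinitions:
      - name: reference_id
        type: S
    keySchema:
      - name: reference_id
        type: HASH
  region: ap-south-1

target:
  type: dynamodb
  endpoint:
    host: http://10.123.0.2
    port: 8000
  region: ap-south-1
  credentials:
    accessKey: <redacted>
    secretKey: <redacted>
  table: <target-table>        # placeholder; redacted in the log
  streamChanges: true

savepoints:
  path: /app/savepoints
  intervalSeconds: 300

The log that follows is from the driver: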
25/09/02 06:39:15 INFO SparkContext: Running Spark version 3.5.1
25/09/02 06:39:15 INFO SparkContext: OS info Linux, 5.11.0-1022-aws, amd64
25/09/02 06:39:15 INFO SparkContext: Java version 11.0.28
25/09/02 06:39:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
25/09/02 06:39:15 INFO ResourceUtils: ==============================================================
25/09/02 06:39:15 INFO ResourceUtils: No custom resources configured for spark.driver.
25/09/02 06:39:15 INFO ResourceUtils: ==============================================================
25/09/02 06:39:15 INFO SparkContext: Submitted application: scylla-migrator
25/09/02 06:39:15 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
25/09/02 06:39:15 INFO ResourceProfile: Limiting resource is cpu
25/09/02 06:39:15 INFO ResourceProfileManager: Added ResourceProfile id: 0
25/09/02 06:39:15 INFO SecurityManager: Changing view acls to: root
25/09/02 06:39:15 INFO SecurityManager: Changing modify acls to: root
25/09/02 06:39:15 INFO SecurityManager: Changing view acls groups to:
25/09/02 06:39:15 INFO SecurityManager: Changing modify acls groups to:
25/09/02 06:39:15 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: root; groups with view permissions: EMPTY; users with modify permissions: root; groups with modify permissions: EMPTY
25/09/02 06:39:16 INFO Utils: Successfully started service 'sparkDriver' on port 38725.
25/09/02 06:39:16 INFO SparkEnv: Registering MapOutputTracker
25/09/02 06:39:16 INFO SparkEnv: Registering BlockManagerMaster
25/09/02 06:39:16 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
25/09/02 06:39:16 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
25/09/02 06:39:16 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/09/02 06:39:16 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-f2e4f45f-1565-4dcf-9d47-3a74244ae731
25/09/02 06:39:16 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
25/09/02 06:39:16 INFO SparkEnv: Registering OutputCommitCoordinator
25/09/02 06:39:16 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
25/09/02 06:39:16 INFO Utils: Successfully started service 'SparkUI' on port 4040.
25/09/02 06:39:16 INFO SparkContext: Added JAR file:/jars/scylla-migrator-assembly.jar at spark://spark-master:38725/jars/scylla-migrator-assembly.jar with timestamp 1756795155682
25/09/02 06:39:16 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-master:7077...
25/09/02 06:39:16 INFO TransportClientFactory: Successfully created connection to spark-master/172.19.0.2:7077 after 22 ms (0 ms spent in bootstraps)
25/09/02 06:39:16 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20250902063916-0001
25/09/02 06:39:16 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20250902063916-0001/0 on worker-20250902063749-172.19.0.3-38139 (172.19.0.3:38139) with 2 core(s)
25/09/02 06:39:16 INFO StandaloneSchedulerBackend: Granted executor ID app-20250902063916-0001/0 on hostPort 172.19.0.3:38139 with 2 core(s), 1024.0 MiB RAM
25/09/02 06:39:16 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20250902063916-0001/1 on worker-20250902063749-172.19.0.4-34803 (172.19.0.4:34803) with 2 core(s)
25/09/02 06:39:16 INFO StandaloneSchedulerBackend: Granted executor ID app-20250902063916-0001/1 on hostPort 172.19.0.4:34803 with 2 core(s), 1024.0 MiB RAM
25/09/02 06:39:16 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35641.
25/09/02 06:39:16 INFO NettyBlockTransferService: Server created on spark-master:35641
25/09/02 06:39:16 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
25/09/02 06:39:16 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-master, 35641, None)
25/09/02 06:39:16 INFO BlockManagerMasterEndpoint: Registering block manager spark-master:35641 with 434.4 MiB RAM, BlockManagerId(driver, spark-master, 35641, None)
25/09/02 06:39:16 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20250902063916-0001/0 is now RUNNING
25/09/02 06:39:16 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20250902063916-0001/1 is now RUNNING
25/09/02 06:39:16 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-master, 35641, None)
25/09/02 06:39:16 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-master, 35641, None)
25/09/02 06:39:16 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
25/09/02 06:39:16 INFO migrator: ScyllaDB Migrator 1.1.0
25/09/02 06:39:17 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
25/09/02 06:39:17 INFO SharedState: Warehouse path is 'file:/spark/spark-warehouse'.
25/09/02 06:39:18 INFO migrator: Loaded config: MigratorConfig(DynamoDBS3Export(dev-ntl,dev-source-data-ddb-export/AWSDynamoDB/01587540-6c8960f7/manifest-summary.json,TableDescription(List(AttributeDefinition(reference_id,S)),List(KeySchema(reference_id,Hash))),None,Some(ap-south-1),None,None),DynamoDB(Some(DynamoDBEndpoint(http://10.123.0.2,8000)),Some(ap-south-1),Some(AWSCredentials(cas..., )),dev_source_data,None,None,true,None),None,Savepoints(300,/app/savepoints),None,None,None)
25/09/02 06:39:18 INFO StandaloneSchedulerBackend$StandaloneDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.19.0.3:59822) with ID 0, ResourceProfileId 0
25/09/02 06:39:18 INFO StandaloneSchedulerBackend$StandaloneDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.19.0.4:36322) with ID 1, ResourceProfileId 0
25/09/02 06:39:18 INFO BlockManagerMasterEndpoint: Registering block manager 172.19.0.3:41961 with 434.4 MiB RAM, BlockManagerId(0, 172.19.0.3, 41961, None)
25/09/02 06:39:18 INFO BlockManagerMasterEndpoint: Registering block manager 172.19.0.4:35467 with 434.4 MiB RAM, BlockManagerId(1, 172.19.0.4, 35467, None)
25/09/02 06:39:20 INFO DynamoDBS3Export: Found DynamoDB S3 export containing 100549 items
25/09/02 06:39:20 WARN alternator: 'streamChanges: true' is not supported when the source is a DynamoDB S3 export.
25/09/02 06:39:20 INFO alternator: We need to transfer: 4 partitions in total
25/09/02 06:39:20 INFO DynamoUtils: Checking for table existence at destination
25/09/02 06:39:20 INFO DynamoUtils: Table tardis_dev_source_data_reference_store_new exists at destination
25/09/02 06:39:20 INFO DynamoDbSavepointsManager: Installing SIGINT/TERM/USR2 handler. Send this to dump the current progress to a savepoint.
25/09/02 06:39:20 INFO DynamoDbSavepointsManager: Starting savepoint schedule; will write a savepoint every 300 seconds
25/09/02 06:39:20 INFO alternator: Starting write...
25/09/02 06:39:20 INFO DynamoUtils: Using AWS region: ap-south-1
25/09/02 06:39:20 INFO DynamoUtils: Using AWS endpoint: http://10.123.0.2:8000
25/09/02 06:39:20 INFO DynamoDB: Setting up Hadoop job to write table using a calculated throughput of 8000 WCU(s)
25/09/02 06:39:20 INFO deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
25/09/02 06:39:20 INFO HadoopMapRedCommitProtocol: Using output committer class org.apache.hadoop.mapred.FileOutputCommitter
25/09/02 06:39:20 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
25/09/02 06:39:20 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
25/09/02 06:39:20 WARN FileOutputCommitter: Output Path is null in setupJob()
25/09/02 06:39:20 INFO SparkContext: Starting job: runJob at SparkHadoopWriter.scala:83
25/09/02 06:39:20 INFO DAGScheduler: Got job 0 (runJob at SparkHadoopWriter.scala:83) with 4 output partitions
25/09/02 06:39:20 INFO DAGScheduler: Final stage: ResultStage 0 (runJob at SparkHadoopWriter.scala:83)
25/09/02 06:39:20 INFO DAGScheduler: Parents of final stage: List()
25/09/02 06:39:20 INFO DAGScheduler: Missing parents: List()
25/09/02 06:39:20 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at map at AlternatorMigrator.scala:36), which has no missing parents
25/09/02 06:39:20 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 104.1 KiB, free 434.3 MiB)
25/09/02 06:39:20 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 37.8 KiB, free 434.3 MiB)
25/09/02 06:39:20 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on spark-master:35641 (size: 37.8 KiB, free: 434.4 MiB)
25/09/02 06:39:20 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1585
25/09/02 06:39:20 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at map at AlternatorMigrator.scala:36) (first 15 tasks are for partitions Vector(0, 1, 2, 3))
25/09/02 06:39:20 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks resource profile 0
25/09/02 06:39:21 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.19.0.4:35467 (size: 37.8 KiB, free: 434.4 MiB)
25/09/02 06:39:21 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.19.0.3:41961 (size: 37.8 KiB, free: 434.4 MiB)
25/09/02 06:39:31 ERROR DynamoDbSavepointsManager: Unable to collect the segments scanned in partition 3. The next savepoint will not include them.
java.lang.Exception: Unexpected partition type: org.apache.spark.rdd.ParallelCollectionPartition.
at com.scylladb.migrator.alternator.DynamoDbSavepointsManager$.$anonfun$inputSplit$1(DynamoDbSavepointsManager.scala:91)
at scala.util.Try$.apply(Try.scala:210)
at com.scylladb.migrator.alternator.DynamoDbSavepointsManager$.inputSplit(DynamoDbSavepointsManager.scala:87)
at com.scylladb.migrator.alternator.DynamoDbSavepointsManager$.com$scylladb$migrator$alternator$DynamoDbSavepointsManager$$scanSegments(DynamoDbSavepointsManager.scala:78)
at com.scylladb.migrator.alternator.DynamoDbSavepointsManager$$anon$1.onTaskEnd(DynamoDbSavepointsManager.scala:55)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1356)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
25/09/02 06:39:31 ERROR DynamoDbSavepointsManager: Unable to collect the segments scanned in partition 2. The next savepoint will not include them.
java.lang.Exception: Unexpected partition type: org.apache.spark.rdd.ParallelCollectionPartition.
at com.scylladb.migrator.alternator.DynamoDbSavepointsManager$.$anonfun$inputSplit$1(DynamoDbSavepointsManager.scala:91)
at scala.util.Try$.apply(Try.scala:210)
at com.scylladb.migrator.alternator.DynamoDbSavepointsManager$.inputSplit(DynamoDbSavepointsManager.scala:87)
at com.scylladb.migrator.alternator.DynamoDbSavepointsManager$.com$scylladb$migrator$alternator$DynamoDbSavepointsManager$$scanSegments(DynamoDbSavepointsManager.scala:78)
at com.scylladb.migrator.alternator.DynamoDbSavepointsManager$$anon$1.onTaskEnd(DynamoDbSavepointsManager.scala:55)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1356)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
25/09/02 06:39:31 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
25/09/02 06:39:31 ERROR DynamoDbSavepointsManager: Unable to collect the segments scanned in partition 1. The next savepoint will not include them.
java.lang.Exception: Unexpected partition type: org.apache.spark.rdd.ParallelCollectionPartition.
at com.scylladb.migrator.alternator.DynamoDbSavepointsManager$.$anonfun$inputSplit$1(DynamoDbSavepointsManager.scala:91)
at scala.util.Try$.apply(Try.scala:210)
at com.scylladb.migrator.alternator.DynamoDbSavepointsManager$.inputSplit(DynamoDbSavepointsManager.scala:87)
at com.scylladb.migrator.alternator.DynamoDbSavepointsManager$.com$scylladb$migrator$alternator$DynamoDbSavepointsManager$$scanSegments(DynamoDbSavepointsManager.scala:78)
at com.scylladb.migrator.alternator.DynamoDbSavepointsManager$$anon$1.onTaskEnd(DynamoDbSavepointsManager.scala:55)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1356)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
25/09/02 06:39:31 ERROR DynamoDbSavepointsManager: Unable to collect the segments scanned in partition 0. The next savepoint will not include them.
java.lang.Exception: Unexpected partition type: org.apache.spark.rdd.ParallelCollectionPartition.
at com.scylladb.migrator.alternator.DynamoDbSavepointsManager$.$anonfun$inputSplit$1(DynamoDbSavepointsManager.scala:91)
at scala.util.Try$.apply(Try.scala:210)
at com.scylladb.migrator.alternator.DynamoDbSavepointsManager$.inputSplit(DynamoDbSavepointsManager.scala:87)
at com.scylladb.migrator.alternator.DynamoDbSavepointsManager$.com$scylladb$migrator$alternator$DynamoDbSavepointsManager$$scanSegments(DynamoDbSavepointsManager.scala:78)
at com.scylladb.migrator.alternator.DynamoDbSavepointsManager$$anon$1.onTaskEnd(DynamoDbSavepointsManager.scala:55)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1356)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
25/09/02 06:39:31 INFO DAGScheduler: ResultStage 0 (runJob at SparkHadoopWriter.scala:83) finished in 10.541 s
25/09/02 06:39:31 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
25/09/02 06:39:31 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
25/09/02 06:39:31 INFO DAGScheduler: Job 0 finished: runJob at SparkHadoopWriter.scala:83, took 10.572354 s
25/09/02 06:39:31 INFO SparkHadoopWriter: Start to commit write Job job_202509020639206049217849139054100_0003.
25/09/02 06:39:31 WARN FileOutputCommitter: Output Path is null in commitJob()
25/09/02 06:39:31 INFO SparkHadoopWriter: Write Job job_202509020639206049217849139054100_0003 committed. Elapsed time: 0 ms.
25/09/02 06:39:31 INFO alternator: Done transferring table snapshot
25/09/02 06:39:31 INFO SparkContext: SparkContext is stopping with exitCode 0.
25/09/02 06:39:31 INFO SparkUI: Stopped Spark web UI at http://localhost:4040
25/09/02 06:39:31 INFO StandaloneSchedulerBackend: Shutting down all executors
25/09/02 06:39:31 INFO StandaloneSchedulerBackend$StandaloneDriverEndpoint: Asking each executor to shut down
25/09/02 06:39:31 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
25/09/02 06:39:31 INFO MemoryStore: MemoryStore cleared
25/09/02 06:39:31 INFO BlockManager: BlockManager stopped
25/09/02 06:39:31 INFO BlockManagerMaster: BlockManagerMaster stopped
25/09/02 06:39:31 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
25/09/02 06:39:31 INFO SparkContext: Successfully stopped SparkContext
25/09/02 06:39:31 INFO ShutdownHookManager: Shutdown hook called
25/09/02 06:39:31 INFO ShutdownHookManager: Deleting directory /tmp/spark-3959d8ba-de80-4806-a27a-5df6e19f2407
25/09/02 06:39:31 INFO ShutdownHookManager: Deleting directory /tmp/spark-138b4ff6-bf70-49bc-a171-936b5e1f1751