Commit 5da74e1

Artem committed
Merge branch 'parquet-savepoint-rebased' of github.com:pizzaeueu/scylla-migrator into parquet-savepoint-rebased
2 parents cbb0cae + 2571dae commit 5da74e1

3 files changed: +12 -2 lines changed

migrator/src/main/scala/com/scylladb/migrator/alternator/StringSetAccumulator.scala

Lines changed: 10 additions & 0 deletions
@@ -3,6 +3,16 @@ package com.scylladb.migrator.alternator
 import org.apache.spark.util.AccumulatorV2
 import java.util.concurrent.atomic.AtomicReference

+/**
+ * Accumulator for tracking processed Parquet file paths during migration.
+ *
+ * This accumulator collects the set of Parquet file paths that have been processed
+ * as part of a migration job. It is useful for monitoring progress, avoiding duplicate
+ * processing, and debugging migration workflows. The accumulator is thread-safe and
+ * can be used in distributed Spark jobs.
+ *
+ * @param initialValue The initial set of processed file paths (usually empty).
+ */
 class StringSetAccumulator(initialValue: Set[String] = Set.empty)
     extends AccumulatorV2[String, Set[String]] {
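
The doc comment above describes a thread-safe set of processed file paths. The sketch below illustrates that idea under stated assumptions: `StringSetHolder` is a hypothetical stand-in written without the Spark dependency so it runs on its own, and it is not the project's `StringSetAccumulator`. Its method names mirror the operations `AccumulatorV2` asks subclasses to provide (`isZero`, `add`, `merge`, `reset`, `value`); backing the state with an `AtomicReference` is suggested by the import in the diff, but how the real class uses it is not shown here.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical, Spark-free stand-in for a string-set accumulator.
final class StringSetHolder(initialValue: Set[String] = Set.empty) {
  // The immutable Set is swapped atomically, so concurrent adds are not lost.
  private val ref = new AtomicReference[Set[String]](initialValue)

  def isZero: Boolean = ref.get.isEmpty

  // Adding the same path twice leaves the set unchanged.
  def add(path: String): Unit = {
    ref.updateAndGet(current => current + path)
    ()
  }

  // Union with another holder, as a driver would do with per-task copies.
  def merge(other: StringSetHolder): Unit = {
    val incoming = other.value
    ref.updateAndGet(current => current ++ incoming)
    ()
  }

  def reset(): Unit = ref.set(Set.empty[String])

  def value: Set[String] = ref.get
}
```

Because the state is a set, merging per-task copies is idempotent: a path reported by two tasks appears once in the result.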

migrator/src/main/scala/com/scylladb/migrator/readers/FileCompletionListener.scala

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ import scala.collection.concurrent.TrieMap
  * partitions and files. When all partitions belonging to a file have been successfully
  * completed, it marks the file as processed via the ParquetSavepointsManager.
  *
- * @param partitionToFile Mapping from Spark partition ID to source file paths
+ * @param partitionToFiles Mapping from Spark partition ID to source file paths
  * @param fileToPartitions Mapping from file path to the set of partition IDs reading from it
  * @param savepointsManager Manager to notify when files are completed
  */
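
The completion rule described in this comment (a file counts as processed only once every partition reading from it has succeeded) can be sketched as follows. This is a minimal illustration, not the project's `FileCompletionListener`: `FileCompletionTracker`, `partitionSucceeded`, and the `onFileCompleted` callback are hypothetical names, and the callback stands in for the savepoints manager. The two maps have the shapes given by the `@param` descriptions.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical tracker illustrating the "all partitions done => file done" rule.
final class FileCompletionTracker(
    partitionToFiles: Map[Int, Set[String]],
    fileToPartitions: Map[String, Set[Int]],
    onFileCompleted: String => Unit // stand-in for notifying a savepoints manager
) {
  private val completedPartitions = TrieMap.empty[Int, Boolean]
  private val completedFiles = TrieMap.empty[String, Boolean]

  def partitionSucceeded(partitionId: Int): Unit = {
    completedPartitions.put(partitionId, true)
    for (file <- partitionToFiles.getOrElse(partitionId, Set.empty[String])) {
      // A file with no known partitions is never reported.
      val allDone =
        fileToPartitions.get(file).exists(_.forall(completedPartitions.contains))
      // putIfAbsent returns None only for the first caller, so each file is reported once.
      if (allDone && completedFiles.putIfAbsent(file, true).isEmpty) {
        onFileCompleted(file)
      }
    }
  }
}
```

Keeping both directions of the mapping avoids scanning every partition on each completion event: the first map finds the files a partition touched, the second checks whether those files are now fully covered.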

migrator/src/main/scala/com/scylladb/migrator/readers/PartitionMetadataReader.scala

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ case class PartitionMetadata(
 /**
  * This reader uses Spark's internal partition information to build mappings
  * between partition IDs and file paths. This allows us to track when all
- * partitions of a file have been processed, enabling file-level savepointse.
+ * partitions of a file have been processed, enabling file-level savepoints.
  */
 object PartitionMetadataReader {
   private val logger = LogManager.getLogger("com.scylladb.migrator.readers.PartitionMetadataReader")
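
The comment mentions mappings in both directions between partition IDs and file paths. As a rough sketch, assuming the partition-to-files map has already been extracted (the real reader obtains it from Spark's internal partition information, which is not reproduced here), the reverse mapping could be derived like this. `PartitionMappings` and `invert` are hypothetical names.

```scala
object PartitionMappings {
  // Invert partition -> files into file -> partitions.
  def invert(partitionToFiles: Map[Int, Set[String]]): Map[String, Set[Int]] =
    partitionToFiles.toSeq
      .flatMap { case (partitionId, files) => files.toSeq.map(file => file -> partitionId) }
      .groupBy { case (file, _) => file }
      .map { case (file, pairs) => file -> pairs.map(_._2).toSet }
}
```

A file split across several partitions ends up with several IDs in its set, and a partition that reads several small files contributes its ID to each of them.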
