
Commit 7f7eaa4

Merge pull request #656 from datastax/SPARKC-142
SPARKC-142: New Defaults Based on Perf Tests
2 parents e804715 + 8258a51 commit 7f7eaa4

2 files changed: +9 −6 lines changed


doc/5_saving.md (+7 −4)
@@ -126,19 +126,22 @@ collection.saveAsCassandraTableEx(table2, SomeColumns("word", "count"))
 
 ## Tuning
-The following properties set in `SparkConf` can be used to fine-tune the saving process:
+The following properties set in `SparkConf` can be used to fine-tune the saving process.
+These values have been set to achieve stability, not performance; changing them may
+increase performance depending on your workload:
 
 - `spark.cassandra.output.batch.size.rows`: number of rows per single batch; default is 'auto' which means the connector
   will adjust the number of rows based on the amount of data in each row
-- `spark.cassandra.output.batch.size.bytes`: maximum total size of the batch in bytes; defaults to 16 kB.
+- `spark.cassandra.output.batch.size.bytes`: maximum total size of the batch in bytes; defaults to 1 kB.
 - `spark.cassandra.output.batch.grouping.key`: determines how insert statements are grouped into batches; available values are:
   - `none`: a batch may contain any statements
   - `replica_set`: a batch may contain only statements to be written to the same replica set
   - `partition` (default): a batch may contain only statements for rows sharing the same partition key value
 - `spark.cassandra.output.batch.buffer.size`: how many batches per single Spark task can be stored in memory before sending to Cassandra; default 1000
 - `spark.cassandra.output.concurrent.writes`: maximum number of batches executed in parallel by a single Spark task; defaults to 5
 - `spark.cassandra.output.consistency.level`: consistency level for writing; defaults to LOCAL_ONE.
-- `spark.cassandra.output.throughput_mb_per_sec`: maximum write throughput allowed per single core in MB/s;
-  throughput limiting needs `spark.cassandra.output.metrics` enabled
+- `spark.cassandra.output.throughput_mb_per_sec`: maximum write throughput allowed per single core in MB/s;
+  on long (8+ hour) runs, limit this to 70% of your max throughput as seen on a
+  smaller job, for stability
 
 [Next - Customizing the object mapping](6_advanced_mapper.md)
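As context for the new defaults, here is a minimal sketch of setting these properties through `SparkConf`; the connection host, keyspace/table names, and chosen values are illustrative only, not part of this commit:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

// Sketch only: tune write batching before creating the SparkContext.
// Host, keyspace/table, and values below are illustrative.
val conf = new SparkConf()
  .setAppName("write-tuning-example")
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .set("spark.cassandra.output.batch.size.bytes", "1024")      // new default: 1 kB
  .set("spark.cassandra.output.concurrent.writes", "5")        // new default: 5
  .set("spark.cassandra.output.batch.grouping.key", "partition")

val sc = new SparkContext(conf)

// Writes issued via saveToCassandra pick up the settings above.
sc.parallelize(Seq(("cat", 30), ("fox", 40)))
  .saveToCassandra("test", "words", SomeColumns("word", "count"))
```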

spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/writer/WriteConf.scala (+2 −2)
@@ -74,8 +74,8 @@ object WriteConf {
   )
 
   val DefaultConsistencyLevel = ConsistencyLevel.LOCAL_ONE
-  val DefaultBatchSizeInBytes = 16 * 1024
-  val DefaultParallelismLevel = 8
+  val DefaultBatchSizeInBytes = 1024
+  val DefaultParallelismLevel = 5
   val DefaultBatchGroupingBufferSize = 1000
   val DefaultBatchGroupingKey = BatchGroupingKey.Partition
   val DefaultThroughputMiBPS = Int.MaxValue
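The constants above feed the default `WriteConf`. A sketch of overriding them programmatically for a single save, assuming the `WriteConf` case class exposes `parallelismLevel` and `throughputMiBPS` fields mirroring these constants (check the signature in your connector version):

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.writer.WriteConf

// Assumption: WriteConf has parallelismLevel and throughputMiBPS fields
// matching the defaults above; verify against your connector version.
val writeConf = WriteConf(
  parallelismLevel = 8, // revert to the previous default of 8 parallel batches
  throughputMiBPS = 70  // throttle long-running jobs below the observed maximum
)

// sc is a SparkContext as in the earlier sketch.
val rdd = sc.parallelize(Seq(("dog", 50)))
rdd.saveToCassandra("test", "words", SomeColumns("word", "count"), writeConf)
```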
