@@ -126,19 +126,22 @@ collection.saveAsCassandraTableEx(table2, SomeColumns("word", "count"))
## Tuning
- The following properties set in ` SparkConf ` can be used to fine-tune the saving process:
+ The following properties set in ` SparkConf ` can be used to fine-tune the saving process.
+ The default values are chosen for stability rather than performance; changing them may
+ improve performance, depending on your workload:
- ` spark.cassandra.output.batch.size.rows ` : number of rows per single batch; default is 'auto' which means the connector
will adjust the number of rows based on the amount of data in each row
- - ` spark.cassandra.output.batch.size.bytes ` : maximum total size of the batch in bytes; defaults to 16 kB.
+ - ` spark.cassandra.output.batch.size.bytes ` : maximum total size of the batch in bytes; defaults to 1 kB.
- ` spark.cassandra.output.batch.grouping.key ` : determines how insert statements are grouped into batches; available values are:
- ` none ` : a batch may contain any statements
- ` replica_set ` : a batch may contain only statements to be written to the same replica set
- ` partition ` (default): a batch may contain only statements for rows sharing the same partition key value
- ` spark.cassandra.output.batch.buffer.size ` : how many batches per single Spark task can be stored in memory before sending to Cassandra; default 1000
- ` spark.cassandra.output.concurrent.writes ` : maximum number of batches executed in parallel by a single Spark task; defaults to 5
- ` spark.cassandra.output.consistency.level ` : consistency level for writing; defaults to LOCAL_ONE.
- - ` spark.cassandra.output.throughput_mb_per_sec ` : maximum write throughput allowed per single core in MB/s;
- throughput limiting needs ` spark.cassandra.output.metrics ` enabled
+ - ` spark.cassandra.output.throughput_mb_per_sec ` : maximum write throughput allowed per single core in MB/s;
+ on long runs (8+ hours), limit this to 70% of the maximum
+ throughput observed on a smaller job, for stability
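
As a rough illustration of the properties listed above, the following sketch sets a few of them on a `SparkConf`. It assumes Spark is on the classpath; the application name, the connection host, and the observed throughput of 10 MB/s are hypothetical placeholders, not values from this change. The property names and defaults are the ones described in the list.

```scala
import org.apache.spark.SparkConf

object WriteTuningSketch {
  // Hypothetical figure: peak per-core write throughput (MB/s) seen on a smaller job.
  val observedMaxMbPerSec: Int = 10

  // 70% of the observed maximum, as suggested above for long (8+ hour) runs.
  val throttledMbPerSec: Int = observedMaxMbPerSec * 70 / 100

  val conf: SparkConf = new SparkConf()
    .setAppName("write-tuning-sketch")                   // hypothetical app name
    .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
    // Cap each batch at 1 kB, the default stated in the list above.
    .set("spark.cassandra.output.batch.size.bytes", "1024")
    // Only group rows that share a partition key (the documented default).
    .set("spark.cassandra.output.batch.grouping.key", "partition")
    // At most 5 batches in flight per Spark task (the documented default).
    .set("spark.cassandra.output.concurrent.writes", "5")
    .set("spark.cassandra.output.consistency.level", "LOCAL_ONE")
    // Throttle per-core write throughput for a long-running job.
    .set("spark.cassandra.output.throughput_mb_per_sec", throttledMbPerSec.toString)
}
```

Spelling out the defaults is not required; it is done here only to show where each property goes. In practice one would likely override only the values being tuned for a given workload.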
[Next - Customizing the object mapping](6_advanced_mapper.md)