Commit ffa0383

docs: default dataTransferMode is streaming, not bulk (#2377)
* Change default dataTransferMode to bulk
* revert: go back to streaming mode and update docs instead
* docs: change default mode in webpage as well
1 parent 04d9dc4 commit ffa0383

2 files changed: +3 -3 lines changed

docs/Explore Algorithms/LightGBM/Overview.md

Lines changed: 2 additions & 2 deletions
@@ -164,11 +164,11 @@ SynapseML must pass data from Spark partitions to LightGBM native Datasets befor
 the actual LightGBM execution code for training and inference. SynapseML has two modes
 that control how this data is transferred: *streaming* and *bulk*.
 This mode doesn't affect training but can affect memory usage and overall fit/transform time.
+By default, SynapseML uses "streaming" mode.
 
 #### Bulk Execution mode
 The "Bulk" mode is older and requires accumulating all data in executor memory before creating Datasets. This mode can cause
 OOM errors for large data, especially since the data must be accumulated in its original uncompressed double-format size.
-For now, "bulk" mode is the default since "streaming" is new, but SynapseML will eventually make streaming the default.
 
 For bulk mode, native LightGBM Datasets can either be created per partition (useSingleDatasetMode=false), or
 per executor (useSingleDatasetMode=true). Generally, one Dataset per executor is more efficient since it reduces LightGBM network size and complexity during training or fitting. It also avoids using slow network protocols on partitions
@@ -259,4 +259,4 @@ To use it in scala, you can call setUseBarrierExecutionMode(true), for example:
     .setUseBarrierExecutionMode(true)
 ...
 <train classifier>
-Note: barrier execution mode can also cause complicated issues, so use it only if needed.
+Note: barrier execution mode can also cause complicated issues, so use it only if needed.

lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/LightGBMParams.scala

Lines changed: 1 addition & 1 deletion
@@ -91,7 +91,7 @@ trait LightGBMExecutionParams extends Wrappable {
 
   val dataTransferMode = new Param[String](this, "dataTransferMode",
     "Specify how SynapseML transfers data from Spark to LightGBM. " +
-    "Values can be streaming, bulk. Default is bulk, which is the legacy mode.")
+    "Values can be streaming, bulk. Default is streaming.")
   setDefault(dataTransferMode -> LightGBMConstants.StreamingDataTransferMode)
   def getDataTransferMode: String = $(dataTransferMode)
   def setDataTransferMode(value: String): this.type = set(dataTransferMode, value)
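
For readers who want to see what the corrected default means in practice, here is a minimal sketch of choosing the transfer mode from caller code. It assumes SynapseML is on the classpath and uses `LightGBMClassifier` as the estimator; the column names `label` and `features` and the object name `DataTransferModeExample` are hypothetical, and the example is illustrative rather than part of this commit. The strings `"streaming"` and `"bulk"` are the values named in the parameter description above.

```scala
import com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassifier

object DataTransferModeExample {
  // No mode is set here, so this estimator relies on the default
  // (streaming, per the setDefault call shown in the diff above).
  val defaultClassifier: LightGBMClassifier = new LightGBMClassifier()
    .setLabelCol("label")       // hypothetical column name
    .setFeaturesCol("features") // hypothetical column name

  // Explicitly opts back into the older bulk mode and asks for one
  // native Dataset per executor rather than one per partition.
  val bulkClassifier: LightGBMClassifier = new LightGBMClassifier()
    .setLabelCol("label")
    .setFeaturesCol("features")
    .setDataTransferMode("bulk")
    .setUseSingleDatasetMode(true)

  def main(args: Array[String]): Unit = {
    // Print the effective mode of each estimator without training anything.
    println(s"default mode:  ${defaultClassifier.getDataTransferMode}")
    println(s"explicit mode: ${bulkClassifier.getDataTransferMode}")
  }
}
```

Setting the mode explicitly, as in `bulkClassifier`, keeps a job's behavior stable across SynapseML versions whose documented default differs.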
