4 changes: 2 additions & 2 deletions docs/Explore Algorithms/LightGBM/Overview.md
@@ -164,11 +164,11 @@ SynapseML must pass data from Spark partitions to LightGBM native Datasets befor
the actual LightGBM execution code for training and inference. SynapseML has two modes
that control how this data is transferred: *streaming* and *bulk*.
This mode doesn't affect training but can affect memory usage and overall fit/transform time.
By default, SynapseML uses "streaming" mode.

#### Bulk Execution mode
The "Bulk" mode is older and requires accumulating all data in executor memory before creating Datasets. This mode can cause
OOM errors for large data, especially since the data must be accumulated in its original uncompressed double-format size.
"Bulk" mode was the default while "streaming" was new; SynapseML now uses streaming by default.
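
To make the OOM risk concrete: bulk mode keeps features as uncompressed doubles (8 bytes each), so the accumulated size grows with rows × features on each executor. The following is a rough, self-contained Scala sketch of that arithmetic. The object and method names and the workload figures are hypothetical, it assumes dense features, and it ignores labels, weights, and JVM object overhead, so treat the result as a lower bound.

```scala
object BulkMemoryEstimate {
  // Size of one uncompressed double on the JVM (8 bytes).
  val BytesPerDouble: Long = java.lang.Double.BYTES.toLong

  /** Lower-bound estimate of the bytes one executor accumulates in bulk mode,
    * assuming every row's features are held as dense doubles before the native
    * Dataset is built. Labels, weights and JVM overhead are not counted. */
  def bulkBytes(rowsOnExecutor: Long, numFeatures: Int): Long =
    rowsOnExecutor * numFeatures * BytesPerDouble

  def main(args: Array[String]): Unit = {
    // Hypothetical workload: 50 million rows with 200 dense features on one executor.
    val bytes = bulkBytes(50000000L, 200)
    val gib = bytes.toDouble / (1L << 30)
    println(f"~$gib%.1f GiB of raw feature doubles") // ~74.5 GiB of raw feature doubles
  }
}
```

Even before any overhead, a workload of this size already exceeds typical executor heaps, which is the failure mode described above.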

For bulk mode, native LightGBM Datasets can be created either per partition (useSingleDatasetMode=false) or
per executor (useSingleDatasetMode=true). Generally, one Dataset per executor is more efficient because it reduces the size and complexity of the LightGBM network during training. It also avoids using slow network protocols on partitions
@@ -259,4 +259,4 @@ To use it in scala, you can call setUseBarrierExecutionMode(true), for example:
.setUseBarrierExecutionMode(true)
...
<train classifier>
-Note: barrier execution mode can also cause complicated issues, so use it only if needed.
+Note: barrier execution mode can also cause complicated issues, so use it only if needed.
@@ -91,7 +91,7 @@ trait LightGBMExecutionParams extends Wrappable {

val dataTransferMode = new Param[String](this, "dataTransferMode",
"Specify how SynapseML transfers data from Spark to LightGBM. " +
-"Values can be streaming, bulk. Default is bulk, which is the legacy mode.")
+"Values can be streaming, bulk. Default is streaming.")
setDefault(dataTransferMode -> LightGBMConstants.StreamingDataTransferMode)
def getDataTransferMode: String = $(dataTransferMode)
def setDataTransferMode(value: String): this.type = set(dataTransferMode, value)
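
The Param change only alters which value `setDefault` installs. Callers who already call `setDataTransferMode(...)` explicitly keep their chosen mode, and `setDataTransferMode("bulk")` restores the legacy behavior. The following is a minimal, self-contained sketch of that resolution rule. It assumes the two string values "streaming" and "bulk" listed in the Param doc. The object, the `effectiveMode` helper, and the validation step are hypothetical illustrations, not SynapseML's API; the Param in the diff declares no validator.

```scala
object DataTransferModeSketch {
  // Assumed values, taken from the "streaming, bulk" list in the Param doc string.
  val Streaming = "streaming"
  val Bulk = "bulk"
  val Allowed: Set[String] = Set(Streaming, Bulk)

  /** Effective mode: an explicit user setting wins; otherwise fall back to the default,
    * which this change moves from bulk to streaming. The require() check is an
    * illustrative addition that rejects values outside the documented list. */
  def effectiveMode(userSetting: Option[String], default: String = Streaming): String = {
    val mode = userSetting.getOrElse(default)
    require(Allowed.contains(mode),
      s"Unsupported dataTransferMode '$mode'; expected one of ${Allowed.mkString(", ")}")
    mode
  }

  def main(args: Array[String]): Unit = {
    println(effectiveMode(None))                 // streaming (new default)
    println(effectiveMode(Some(Bulk)))           // bulk (explicit opt-in to the legacy mode)
    println(effectiveMode(None, default = Bulk)) // bulk (the pre-change default)
  }
}
```

Keeping the explicit setting ahead of the default means the change is only visible to users who never set the mode.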