docs: default dataTransferMode is streaming, not bulk (#2377)

operte · web-flow · commit ffa03839eecf · 2025-06-05T16:42:53.000-04:00
* Change default dataTransferMode to bulk

* revert: go back to streaming mode and update docs instead

* docs: change default mode in webpage as well
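
Callers who want a specific transfer mode can set it explicitly instead of relying on the default. A minimal sketch, assuming SynapseML is on the classpath and that `LightGBMClassifier` (not touched by this change) mixes in `LightGBMExecutionParams`. The accessor names come from the diff below, and the string value "bulk" is taken from the param description:

```scala
import com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassifier

// With no explicit setting, the param falls back to its default (streaming).
val defaultMode: String = new LightGBMClassifier().getDataTransferMode

// Opt back into the legacy bulk transfer, e.g. to reproduce earlier behavior.
// Assumption: "bulk" is the accepted literal, per the param description.
val bulkClassifier = new LightGBMClassifier().setDataTransferMode("bulk")
val bulkMode: String = bulkClassifier.getDataTransferMode
```

Setting the mode explicitly keeps a pipeline's behavior stable if the default changes again.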
diff --git a/docs/Explore Algorithms/LightGBM/Overview.md b/docs/Explore Algorithms/LightGBM/Overview.md
@@ -164,11 +164,11 @@ SynapseML must pass data from Spark partitions to LightGBM native Datasets befor
 the actual LightGBM execution code for training and inference. SynapseML has two modes
 that control how this data is transferred: *streaming* and *bulk*.
 This mode doesn't affect training but can affect memory usage and overall fit/transform time.
+By default, SynapseML uses "streaming" mode.
 
 #### Bulk Execution mode
 The "Bulk" mode is older and requires accumulating all data in executor memory before creating Datasets. This mode can cause
 OOM errors for large data, especially since the data must be accumulated in its original uncompressed double-format size.
-For now, "bulk" mode is the default since "streaming" is new, but SynapseML will eventually make streaming the default.
 
 For bulk mode, native LightGBM Datasets can either be created per partition (useSingleDatasetMode=false), or
 per executor (useSingleDatasetMode=true). Generally, one Dataset per executor is more efficient since it reduces LightGBM network size and complexity during training or fitting. It also avoids using slow network protocols on partitions
@@ -259,4 +259,4 @@ To use it in scala, you can call setUseBarrierExecutionMode(true), for example:
         .setUseBarrierExecutionMode(true)
     ...
     <train classifier>
-Note: barrier execution mode can also cause complicated issues, so use it only if needed.
+Note: barrier execution mode can also cause complicated issues, so use it only if needed.
diff --git a/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/LightGBMParams.scala b/lightgbm/src/main/scala/com/microsoft/azure/synapse/ml/lightgbm/params/LightGBMParams.scala
@@ -91,7 +91,7 @@ trait LightGBMExecutionParams extends Wrappable {
 
   val dataTransferMode = new Param[String](this, "dataTransferMode",
     "Specify how SynapseML transfers data from Spark to LightGBM.  " +
-      "Values can be streaming, bulk. Default is bulk, which is the legacy mode.")
+      "Values can be streaming, bulk. Default is streaming.")
   setDefault(dataTransferMode -> LightGBMConstants.StreamingDataTransferMode)
   def getDataTransferMode: String = $(dataTransferMode)
   def setDataTransferMode(value: String): this.type = set(dataTransferMode, value)