
Commit a2357eb

nicklanvkorukanti authored and committed
[Docs] Add auto-compact docs
(Cherry-pick of 42dea93 to branch-3.1) Adding docs for auto-compact
1 parent 98db14c commit a2357eb

File tree

1 file changed: +22 -0 lines changed


docs/source/optimizations-oss.md

@@ -72,6 +72,28 @@ For Scala, Java, and Python API syntax details, see the [_](delta-apidoc.md).\
Readers of Delta tables use snapshot isolation, which means that they are not interrupted when `OPTIMIZE` removes unnecessary files from the transaction log. `OPTIMIZE` makes no data-related changes to the table, so a read before and after an `OPTIMIZE` has the same results. Performing `OPTIMIZE` on a table that is a streaming source does not affect any current or future streams that treat this table as a source. `OPTIMIZE` returns the file statistics (min, max, total, and so on) for the files removed and the files added by the operation. Optimize stats also contain the number of batches and the number of partitions optimized.
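
For example, a minimal sketch of running `OPTIMIZE` through the Python API and inspecting the returned metrics (the table path is hypothetical, and the `delta-spark` package is assumed to be available):

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

# Delta-enabled SparkSession (assumes delta-spark is on the classpath).
spark = (
    SparkSession.builder.appName("optimize-metrics")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Hypothetical path to an existing Delta table.
delta_table = DeltaTable.forPath(spark, "/tmp/delta/events")

# executeCompaction() compacts small files and returns a DataFrame of
# operation metrics (files added and removed, min/max/total sizes, and so on).
delta_table.optimize().executeCompaction().show(truncate=False)
```
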
## Auto compaction
.. note:: This feature is available in <Delta> 3.1.0 and above.
Auto compaction combines small files within Delta table partitions to automatically reduce the small file problem. Auto compaction occurs after a write to a table has succeeded and runs synchronously on the cluster that performed the write. Auto compaction only compacts files that haven't been compacted previously.
You can control the output file size by setting the configuration `spark.databricks.delta.autoCompact.maxFileSize`.
Auto compaction is only triggered for partitions or tables that have at least a certain number of small files. You can optionally change the minimum number of files required to trigger auto compaction by setting `spark.databricks.delta.autoCompact.minNumFiles`.
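
As a minimal sketch, both thresholds can be tuned on the active SparkSession before the write runs (the values shown are illustrative, not defaults):

```python
# Assumes a Delta-enabled SparkSession named `spark`, as in the example above.

# Target output file size for auto compaction, in bytes (64 MB here, purely illustrative).
spark.conf.set("spark.databricks.delta.autoCompact.maxFileSize", str(64 * 1024 * 1024))

# Minimum number of small files in a table or partition before auto compaction triggers
# (illustrative value).
spark.conf.set("spark.databricks.delta.autoCompact.minNumFiles", "100")
```
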
Auto compaction can be enabled at the table or session level using the following settings:
- Table property: `delta.autoOptimize.autoCompact`
- SparkSession setting: `spark.databricks.delta.autoCompact.enabled`
These settings accept the following options:
| Options | Behavior |
| --- | --- |
| `true` | Enables auto compaction. The default target file size is 128 MB. |
| `false` | Turns off auto compaction. Can be set at the session level to override auto compaction for all Delta tables modified in the workload. |
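
For example, a minimal sketch of enabling auto compaction at each level (the table name `events` is hypothetical):

```python
# Assumes a Delta-enabled SparkSession named `spark`, as in the example above.

# Session level: applies to all Delta tables written by this session.
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

# Table level: set the table property on a single (hypothetical) table.
spark.sql("""
    ALTER TABLE events
    SET TBLPROPERTIES ('delta.autoOptimize.autoCompact' = 'true')
""")
```

The table property persists in the table's metadata, while the session setting only affects writes made from that session.
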
## Data skipping
.. note:: This feature is available in <Delta> 1.2.0 and above.
