This repository was archived by the owner on Jul 29, 2024. It is now read-only.

[QA] Some fixes part 1 #26

Open · wants to merge 8 commits into main

58 changes: 0 additions & 58 deletions src/pages/latest/delta-column-mapping.mdx

This file was deleted.

13 changes: 5 additions & 8 deletions src/pages/latest/delta-intro.mdx
@@ -14,26 +14,23 @@ metadata handling, and unifies [streaming](delta-streaming.md) and
[batch](delta-batch.md) data processing on top of existing data lakes, such as
S3, ADLS, GCS, and HDFS.

For a quick overview and benefits of Delta Lake, watch this YouTube video (3
minutes).

Specifically, Delta Lake offers:

- [ACID transactions](concurrency-control.md) on Spark: Serializable isolation
- [ACID transactions](/latest/concurrency-control) on Spark: Serializable isolation
levels ensure that readers never see inconsistent data.
- Scalable metadata handling: Leverages Spark distributed processing power to
handle all the metadata for petabyte-scale tables with billions of files at
ease.
- [Streaming](delta-streaming.md) and [batch](delta-batch.md) unification: A
- [Streaming](/latest/delta-streaming) and [batch](/latest/delta-batch) unification: A
table in Delta Lake is a batch table as well as a streaming source and sink.
Streaming data ingest, batch historic backfill, interactive queries all just
work out of the box.
- Schema enforcement: Automatically handles schema variations to prevent
insertion of bad records during ingestion.
- [Time travel](delta-batch.md#deltatimetravel): Data versioning enables
- [Time travel](/latest/delta-batch#deltatimetravel): Data versioning enables
rollbacks, full historical audit trails, and reproducible machine learning
experiments.
- [Upserts](delta-update.md#delta-merge) and
[deletes](delta-update.md#delta-delete): Supports merge, update and delete
- [Upserts](/latest/delta-update#upsert-into-a-table-using-merge) and
[deletes](/latest/delta-update#delete-from-a-table): Supports merge, update and delete
operations to enable complex use cases like change-data-capture,
slowly-changing-dimension (SCD) operations, streaming upserts, and so on.
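
As a sketch of the upsert capability in the last bullet above (assuming an active SparkSession with the delta-spark package configured; the table path, update source, and column names are hypothetical), a merge in the Python API could look like:

```python
from delta.tables import DeltaTable

# Hypothetical target Delta table and a DataFrame of incoming updates.
target = DeltaTable.forPath(spark, "/tmp/delta/customers")
updates = spark.read.format("json").load("/tmp/customer-updates")

# Upsert: update matching rows, insert the rest.
(target.alias("t")
    .merge(updates.alias("u"), "t.customerId = u.customerId")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```
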
2 changes: 1 addition & 1 deletion src/pages/latest/delta-streaming.mdx
@@ -273,7 +273,7 @@ The preceding example continuously updates a table that contains the aggregate n

For applications with more lenient latency requirements, you can save computing resources with one-time triggers. Use these to update summary aggregation tables on a given schedule, processing only new data that has arrived since the last update.
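
For instance, a minimal sketch of such a one-time run (assuming an active SparkSession with Delta configured; the source and target paths are hypothetical):

```python
# Process only the data that arrived since the last run, update the summary
# table once, and then stop.
(spark.readStream
    .format("delta")
    .load("/tmp/delta/events")                                   # hypothetical source
    .groupBy("customerId")
    .count()
    .writeStream
    .format("delta")
    .outputMode("complete")
    .option("checkpointLocation", "/tmp/delta/eventsByCustomer/_checkpoints/")
    .trigger(once=True)                                          # one-time trigger
    .start("/tmp/delta/eventsByCustomer"))
```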

## Idempotent table writes in `foreachBatch`
## Idempotent table writes in foreachBatch
Collaborator


Btw, <code>...</code> works and looks better in Header2s (##).

Anyways, it seems we both did delta-streaming. https://github.com/delta-io/delta-docs/pull/39/files#diff-87d0d2367f1ffef73543eec8952b76ba882ec93954c372f7eab0ab85d3832f6fR284

Want to drop these changes from this PR? And we will use mine instead?


<Info title="Note" level="info">
Available in Delta Lake 2.0.0 and above.
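
A minimal sketch of such an idempotent write, assuming Delta Lake 2.0.0+, an active SparkSession, and hypothetical paths and application id: the `txnAppId` and `txnVersion` write options let Delta skip a micro-batch that was already committed, so retries do not produce duplicates.

```python
app_id = "customer-aggregation-app"  # hypothetical, unique per (stream, target table)

def write_to_delta(batch_df, batch_id):
    # The (txnAppId, txnVersion) pair identifies this micro-batch; replaying
    # the same pair is ignored, which makes the write idempotent.
    (batch_df.write
        .format("delta")
        .option("txnVersion", batch_id)
        .option("txnAppId", app_id)
        .mode("append")
        .save("/tmp/delta/aggregates"))  # hypothetical target path

streaming_df = spark.readStream.format("delta").load("/tmp/delta/events")  # hypothetical source
streaming_df.writeStream.foreachBatch(write_to_delta).start()
```
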
43 changes: 0 additions & 43 deletions src/pages/latest/getting-started.mdx

This file was deleted.

27 changes: 8 additions & 19 deletions src/pages/latest/optimizations-oss.mdx
@@ -9,11 +9,10 @@ Delta Lake provides optimizations that accelerate data lake operations.

To improve query speed, Delta Lake supports the ability to optimize the layout of data in storage. There are various ways to optimize the layout.

<a id="compaction-binpacking"></a>
### Compaction (bin-packing)

<Info title="Note" level="info">
Note

This feature is available in Delta Lake 1.2.0 and above.
</Info>

@@ -58,11 +57,9 @@ deltaTable.optimize().where("date='2021-11-18'").executeCompaction()

</CodeTabs>

For Scala, Java, and Python API syntax details, see the [Delta Lake APIs](/latest/delta-apidoc.html).
For Scala, Java, and Python API syntax details, see the [Delta Lake APIs](/latest/delta-apidoc).

<Info title="Note" level="info">
Note

* Bin-packing optimization is _idempotent_, meaning that if it is run twice on the same dataset, the second run has no effect.

* Bin-packing aims to produce evenly-balanced data files with respect to their size on disk, but not necessarily number of tuples per file. However, the two measures are most often correlated.
@@ -77,20 +74,17 @@ Readers of Delta tables use snapshot isolation, which means that they are not in
## Data skipping

<Info title="Note" level="info">
Note

This feature is available in Delta Lake 1.2.0 and above.
</Info>

Data skipping information is collected automatically when you write data into a Delta Lake table. Delta Lake takes advantage of this information (minimum and maximum values for each column) at query time to provide faster queries. You do not need to configure data skipping; the feature is activated whenever applicable. However, its effectiveness depends on the layout of your data. For best results, apply [Z-Ordering](/latest/optimizations-oss.html#-z-ordering-multi-dimensional-clustering).
Data skipping information is collected automatically when you write data into a Delta Lake table. Delta Lake takes advantage of this information (minimum and maximum values for each column) at query time to provide faster queries. You do not need to configure data skipping; the feature is activated whenever applicable. However, its effectiveness depends on the layout of your data. For best results, apply [Z-Ordering](#zordering-multidimensional-clustering).

Collecting statistics on a column containing long values such as string or binary is an expensive operation. To avoid collecting statistics on such columns you can configure the [table property](/latest/delta-batch.html#-table-properties) `delta.dataSkippingNumIndexedCols`. This property indicates the position index of a column in the table’s schema. All columns with a position index less than the `delta.dataSkippingNumIndexedCols` property will have statistics collected. For the purposes of collecting statistics, each field within a nested column is considered as an individual column. To avoid collecting statistics on columns containing long values, either set the `delta.dataSkippingNumIndexedCols` property so that the long value columns are after this index in the table’s schema, or move columns containing long strings to an index position greater than the `delta.dataSkippingNumIndexedCols` property by using `[ALTER TABLE ALTER COLUMN](/latest/sql-ref-syntax-ddl-alter-table.html#alter-or-change-column)`.
Collecting statistics on a column containing long values such as string or binary is an expensive operation. To avoid collecting statistics on such columns you can configure the [table property](/latest/table-properties) `delta.dataSkippingNumIndexedCols`. This property indicates the position index of a column in the table’s schema. All columns with a position index less than the `delta.dataSkippingNumIndexedCols` property will have statistics collected. For the purposes of collecting statistics, each field within a nested column is considered as an individual column. To avoid collecting statistics on columns containing long values, either set the `delta.dataSkippingNumIndexedCols` property so that the long value columns are after this index in the table’s schema, or move columns containing long strings to an index position greater than the `delta.dataSkippingNumIndexedCols` property by using [`ALTER TABLE ALTER COLUMN`](https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-alter-table.html#alter-or-change-column).
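
As a sketch (the table and column names below are hypothetical), both steps can be issued through Spark SQL from Python:

```python
# Collect statistics only on the first 5 columns of the table's schema.
spark.sql("""
    ALTER TABLE events
    SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '5')
""")

# Move a long string column past that index so no statistics are collected for it.
spark.sql("ALTER TABLE events ALTER COLUMN long_payload AFTER category")
```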

<a id="zordering-multidimensional-clustering"></a>
## Z-Ordering (multi-dimensional clustering)

<Info title="Note" level="info">
Note

This feature is available in Delta Lake 2.0.0 and above.
</Info>

@@ -131,27 +125,24 @@ deltaTable.optimize().where("date='2021-11-18'").executeZOrderBy(eventType)

</CodeTabs>

For Scala, Java, and Python API syntax details, see the [Delta Lake APIs](/latest/delta-apidoc.html).
For Scala, Java, and Python API syntax details, see the [Delta Lake APIs](/latest/delta-apidoc).

If you expect a column to be commonly used in query predicates and if that column has high cardinality (that is, a large number of distinct values), then use `ZORDER BY`.

You can specify multiple columns for `ZORDER BY` as a comma-separated list. However, the effectiveness of the locality drops with each extra column. Z-Ordering on columns that do not have statistics collected on them would be ineffective and a waste of resources. This is because data skipping requires column-local stats such as min, max, and count. You can configure statistics collection on certain columns by reordering columns in the schema, or you can increase the number of columns to collect statistics on. See [Data skipping](https://docs.delta.io/latest/optimizations-oss.html#-data-skipping).
You can specify multiple columns for `ZORDER BY` as a comma-separated list. However, the effectiveness of the locality drops with each extra column. Z-Ordering on columns that do not have statistics collected on them would be ineffective and a waste of resources. This is because data skipping requires column-local stats such as min, max, and count. You can configure statistics collection on certain columns by reordering columns in the schema, or you can increase the number of columns to collect statistics on. See [Data skipping](#data-skipping).
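
As a sketch in the Python API (hypothetical table path and column names), Z-Ordering by a short list of columns looks like the single-column case above:

```python
from delta.tables import DeltaTable

# Assumes an active SparkSession with delta-spark 2.0.0+ configured.
deltaTable = DeltaTable.forPath(spark, "/tmp/delta/events")

# Each extra column dilutes the locality benefit, so keep the list short.
deltaTable.optimize().where("date='2021-11-18'").executeZOrderBy("eventType", "generation")
```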

<Info title="Note" level="info">
Note

* Z-Ordering is _not idempotent_. Every time the Z-Ordering is executed it will try to create a new clustering of data in all files (new and existing files that were part of previous Z-Ordering) in a partition.

* Z-Ordering aims to produce evenly-balanced data files with respect to the number of tuples, but not necessarily data size on disk. The two measures are most often correlated, but there can be situations when that is not the case, leading to skew in optimize task times.

* For example, if you `ZORDER BY` _date_ and your most recent records are all much wider (for example longer arrays or string values) than the ones in the past, it is expected that the `OPTIMIZE` job’s task durations will be skewed, as well as the resulting file sizes. This is, however, only a problem for the `OPTIMIZE` command itself; it should not have any negative impact on subsequent queries.
</Info>

<a id="multipart-checkpointing"></a>
## Multi-part checkpointing

<Info title="Note" level="info">
Note

This feature is available in Delta Lake 2.0.0 and above. This feature is in experimental support mode.
</Info>

@@ -160,7 +151,5 @@ Delta Lake table periodically and automatically compacts all the incremental upd
The Delta Lake protocol allows [splitting the checkpoint](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#checkpoints) into multiple Parquet files. This parallelizes and speeds up writing the checkpoint. In Delta Lake, by default each checkpoint is written as a single Parquet file. To use this feature, set the SQL configuration `spark.databricks.delta.checkpoint.partSize=<n>`, where `n` is the limit on the number of actions (such as `AddFile`) at which Delta Lake on Apache Spark will start parallelizing the checkpoint and attempt to write a maximum of this many actions per checkpoint file.
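
For example, a sketch of enabling multi-part checkpoints from Python (the threshold value is arbitrary and assumes an active SparkSession):

```python
# Start splitting checkpoints once they would contain more than 100,000 actions,
# writing at most that many actions per checkpoint Parquet file.
spark.conf.set("spark.databricks.delta.checkpoint.partSize", "100000")
```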

<Info title="Note" level="info">
Note

This feature requires no reader side configuration changes. The existing reader already supports reading a checkpoint with multiple files.
</Info>