docs/benchmarks/compaction.mdx
+6 -6 (6 additions & 6 deletions)
@@ -1,11 +1,11 @@
 ---
-title: Optimization Benchmarks
-sidebar_label: Optimization Benchmarks
+title: Compaction Benchmarks
+sidebar_label: Compaction Benchmarks
 ---
 
-# Optimization Benchmarks
+# Compaction Benchmarks
 
-The following benchmark evaluates performance, environment configuration, and operational considerations for optimizing Apache Iceberg tables using **Apache Spark** `rewrite_data_files` and **OLake Fusion** compaction on a CDC-like TPCH workload.
+The following benchmark evaluates performance, environment configuration, and operational considerations for compacting Apache Iceberg tables using **Apache Spark** `rewrite_data_files` and **OLake Fusion** compaction on a CDC-like TPCH workload.
 Follow the [Quickstart Setup Guide](/docs/install/olake-ui/) to ensure the OLAKE UI is running at [localhost:8000](http://localhost:8000).
-
-- You have at least one destination configured in Ingestion **or**
-- You are ready to add a catalog manually in the optimization section.
-
-:::info Optimization in the OLake UI
-
-Iceberg maintenance (**Optimization**) is available starting from v0.4.0. Upgrade OLake UI to access the **Maintenance** module.
-
-- **Existing users:** If you are already using OLake for Ingestion follow the [upgrade guide](/docs/install/olake-ui/#updating-olake-ui-version) to accesss Maintenance module.
-- **New users:** Follow the [quickstart guide](/docs/install/olake-ui/#quick-start) to get started.
+- **Existing users:** If you are already using OLake for Ingestion follow the [upgrade guide](/docs/install/olake-ui/?update-ui-version=fusion-update#updating-olake-ui-version) to access Maintenance module.
+- **New users:** Follow the [quickstart guide](/docs/install/olake-ui/?quick-start=fusion#prerequisites) to get started.
 
+:::warning
+If you want to compact OLake-Ingested tables, upgrade **OLake Go** (Ingestion) driver version to **v0.7.0 or higher** to avoid any conflicts.
 :::
 
-This guide walks through configuring your **first optimization** for a table.
+This guide walks through configuring your **first compaction** for a table.
 
 ## Step 1: Add a Catalog
 
@@ -52,36 +45,36 @@ Click **View Tables** to go to the **Tables** page for Step 2.
 
 1. Use the **Select Catalog** dropdown and select the catalog you just configured.
 Only after you select both a catalog and a database will the list of tables in that Iceberg database appear on the page.
 
-## Step 3: Configure Optimization For Your Table
+## Step 3: Configure Compaction for Your Table
 
-1. In the tables list, find the table you want to optimize.
+1. In the tables list, find the table you want to compact.
 
 <details>
-<summary><strong>Tip:</strong> Click <strong>View Metrics</strong> to open table metrics. **Health Score** and **target file size** (and related size signals in the metrics view) help decide whether optimization is required for that table.</summary>
+<summary><strong>Tip:</strong> Click <strong>View Metrics</strong> to open table metrics. **Health Score** and **target file size** (and related size signals in the metrics view) help decide whether compaction is required for that table.</summary>
-When configuring optimization for a table, each optimization type has a **Frequency** dropdown with common schedules, such as:
+When configuring compaction for a table, each compaction type has a **Frequency** dropdown with common schedules, such as:
 
 - Never
 - Every 30 min
@@ -90,11 +83,11 @@ When configuring optimization for a table, each optimization type has a **Freque
 - Every 12 hours
 - Every 24 hours
 
-You can configure these frequencies independently for Lite, Medium, and Full Optimization.
+You can configure these frequencies independently for Lite, Medium, and Full compaction.
 
 **Default schedules are applied automatically** for each table, so there is no need to open the configuration modal and set frequencies on every table unless a different cadence is required.
 
-**Defaults schedules for each type of optimization:**
+**Default schedules for each type of compaction:**
 
 **Lite** — every 1 hour
 
@@ -108,50 +101,50 @@ If you choose **Custom** in the Frequency dropdown, a **Cron Expression** field
 
 You can enter a standard cron expression here. For example:
 
-- `0 0 * * *` – run the optimization once every day at midnight.
+- `0 0 * * *` – run compaction once every day at midnight.
 
 ## Step 4: (Advanced) Target File Size
 
 Under the **Advanced Config** dropdown in the modal, you can configure **Target file size**.
-**Full** Optimization uses **target file size** directly: rewritten data files are aligned toward that size.
+**Full** compaction uses **target file size** directly: rewritten data files are aligned toward that size.
 
-**Lite** and **Medium** use it **indirectly**. Their merge and output bounds are derived from the same setting. How each type relates to this value is explained in the [Optimization overview](/docs/iceberg-maintenance/optimization/overview/).
+**Lite** and **Medium** use it **indirectly**. Their merge and output bounds are derived from the same setting. How each type relates to this value is explained in the [Types of Compaction](/docs/iceberg-maintenance/compaction/overview/).
 
 In general, a **larger** target tends toward **fewer, bigger** files; a **smaller** target tends toward **more, smaller** files.
 
 If unsure, start with the default (**512 MB**) and tune later based on query-engine behavior.
-Once enabled, OLake will start running optimization for that table according to the schedule you configured.
+Once enabled, OLake will start running compaction for that table according to the schedule you configured.
 
 ## Health Score and Last Run Status
 
-With a catalog and database selected, the **Tables** page shows one row per table. The sections below explain **Health Score** (overall table health) and **Last Run status** (per-type status for Lite, Medium, and Full optimization).
+With a catalog and database selected, the **Tables** page shows one row per table. The sections below explain **Health Score** (overall table health) and **Last Run status** (per-type status for Lite, Medium, and Full compaction).
 
 ### Health Score
 
@@ -165,7 +158,7 @@ With a catalog and database selected, the **Tables** page shows one row per tabl
 
 Together, these three parts are **weighted 40% / 40% / 20%**: **Small Files Score** and **Eq Delete Score** each contribute **40%** of the Health Score, and **Pos Delete Score** contributes **20%**.
 
-Higher scores generally mean the table is in better shape for reads; lower scores suggest running or tuning optimization more often.
+Higher scores generally mean the table is in better shape for reads; lower scores suggest running or tuning compaction more often.
 
 ![Table health score breakdown in the OLake Tables view](/img/docs/iceberg-maintenance/table-health.webp)
 
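The 40% / 40% / 20% weighting described in this hunk can be written as a small weighted sum. The following is a minimal sketch, not OLake's implementation: the function name `health_score`, the assumption that each component score lies on a 0-100 scale, and the sample values are all hypothetical. Only the three weights come from the documentation text.

```python
# Weights (in percent) come from the documentation text; the 0-100 scale
# for each component score is an assumption made for this sketch.
WEIGHTS = {"small_files": 40, "eq_delete": 40, "pos_delete": 20}


def health_score(small_files: float, eq_delete: float, pos_delete: float) -> float:
    """Combine three component scores (assumed 0-100) into one weighted score."""
    parts = {
        "small_files": small_files,
        "eq_delete": eq_delete,
        "pos_delete": pos_delete,
    }
    for name, value in parts.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score must be in [0, 100], got {value}")
    # Weighted sum, divided by 100 because the weights are percentages.
    return sum(WEIGHTS[name] * value for name, value in parts.items()) / 100


if __name__ == "__main__":
    # Hypothetical table: many small files, no outstanding delete files.
    print(health_score(small_files=50, eq_delete=100, pos_delete=100))
```

Under these assumptions, a table scoring 50 on small files and 100 on both delete components lands at 80, which illustrates that the small-files component alone can move the overall score by up to 40 points.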
@@ -174,35 +167,33 @@ Higher scores generally mean the table is in better shape for reads; lower score
 **Last Run status** always shows three badges—**L (Lite)**, **M (Medium)**, **F (Full)**. Each badge is that type’s **latest** outcome: **running**, **success**, **failed**, **cancelled**, **skipped**, or **never run**.
 
 - **Letters** — Typically you see three badges together:
-- **L** — Lite Optimization
-- **M** — Medium Optimization
-- **F** — Full Optimization
+- **L** — Lite compaction
+- **M** — Medium compaction
+- **F** — Full compaction
 
 - **Colours (quick read, per badge)**
 - **Green** — that type’s last run **succeeded**
 - **Red** — that type’s last run **failed** or was **cancelled**
 - **Yellow** — a run of **that type** is **running** right now
 - **Gray** with **ⓘ** — that type’s last run was **skipped**
-- **Gray** with **◌** — that optimization type has **never** run for this table
+- **Gray** with **◌** — that compaction type has **never** run for this table
 
-- **Not Optimized** — Shown **only** when **no** optimization has run yet for that table—**neither** Lite, **nor** Medium, **nor** Full.
+- **Not compacted** — Shown **only** when **no** compaction has run yet for that table—**neither** Lite, **nor** Medium, **nor** Full.
 
 :::info Hover
 When you **hover** over a table's Last Run Status, a small **card** opens which includes:
 
-- **Name** — Lite, Medium, or Full
+- **Name** — Lite, Medium, or Full
 - **Last run** — relative time, such as “2 hours ago”
 - **Status** — plain text such as **Success**, **Failed**, **Cancelled**, **Skipped**, or **Running**
 :::
 
-![Last Run status card with L, M, F badges on the Tables page](/img/docs/iceberg-maintenance/optimization/configure-optimization-last-run-status.webp)
+![Last Run status card with L, M, F badges on the Tables page](/img/docs/iceberg-maintenance/compaction/configure-optimization-last-run-status.webp)
 
 ## Next Steps
 
-After your first optimization runs, you can:
-
-- **View Logs & Runs** – see each optimization run and its detailed logs:
docs/getting-started/quickstart.mdx
+1 -1 (1 addition & 1 deletion)
@@ -1,5 +1,5 @@
 ---
-title: "Setup OLake"
+title: "Quickstart - OLake Go"
 description: Get started with OLake using Docker Compose. Follow simple steps to deploy, log in, and manage OLake jobs for seamless data replication workflows.
docs/iceberg-maintenance/catalogs.mdx
+5 -5 (5 additions & 5 deletions)
@@ -5,13 +5,13 @@ sidebar_label: Catalogs
 
 # What are Catalogs?
 
-A **catalog** is a entity from where OLake fetches Iceberg tables. It stores reference to the latest metadata file which stores info such as table names, schemas, and file locations. OLake must know which catalog (and database) a table belongs to before optimization can run on it.
+A **catalog** is a entity from where OLake fetches Iceberg tables. It stores reference to the latest metadata file which stores info such as table names, schemas, and file locations. OLake must know which catalog (and database) a table belongs to before maintenance can run on it.
 
 Catalogs are managed from the **Maintenance** section in the OLake UI. There are two types: **OLake Imported Catalogs** and **External Catalogs**.
 
 ## OLake Imported Catalogs
 
-**OLake Imported Catalogs** are catalog entries whose credentials come from an Iceberg **destination** already configured in OLake Ingestion—imported into optimization instead of re-entering connection details.
+**OLake Imported Catalogs** are catalog entries whose credentials come from an Iceberg **destination** already configured in OLake Ingestion—imported into Maintenance instead of re-entering connection details.
 
 When Iceberg ingestion is already set up, that destination's catalog can be imported this way.
 
@@ -42,7 +42,7 @@ When Iceberg ingestion is already set up, that destination's catalog can be impo
 
 Use an external catalog when:
 
-- Optimization is needed on Iceberg tables created outside OLake.
+- Maintenance is needed on Iceberg tables created outside OLake.
 - The same catalog is used by other systems and OLake should compact those tables without replicating data via Ingestion.
-Once a catalog is connected—whether imported or added as external—it appears in the **Select Catalog** dropdown on the **Tables** page. Selecting a catalog (and then a database) lists the tables available for optimization configuration.
+Once a catalog is connected—whether imported or added as external—it appears in the **Select Catalog** dropdown on the **Tables** page. Selecting a catalog (and then a database) lists the tables available for maintenance configuration.
 
-For a full walkthrough that includes adding a catalog and configuring the first optimization, see [Configure Your First Optimization](/docs/getting-started/configure-first-optimization).
+For a full walkthrough that includes adding a catalog and configuring the first table maintenance, see [Configure Your First Table Maintenance](/docs/getting-started/configure-first-compaction).
- Expand the **Advanced Config** panel in the modal.
+- Specify the **Target File Size** for the table (default **512 MB** if you leave it unchanged).
+
+For how **target file size** affects **Lite**, **Medium**, and **Full** compaction, see [Types of Compaction Supported in OLake](/docs/iceberg-maintenance/compaction/overview).
+
+> **Tip:** Choose a target size based on your query patterns and table size. Larger files can improve scan efficiency based on the query but may increase the cost of rewriting files.
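As a rough way to reason about the tip above, the number of data files after a full rewrite is approximately the table's data size divided by the target file size. The following back-of-the-envelope sketch assumes files are packed close to the target; real output also depends on partitioning and on how each compaction type derives its bounds. The helper `estimate_file_count`, the 100 GiB table size, and the reading of "512 MB" as 512 MiB are all assumptions made for illustration.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def estimate_file_count(total_bytes: int, target_file_size_bytes: int) -> int:
    """Ceiling of total size over target size: a rough estimate of file count."""
    if total_bytes < 0 or target_file_size_bytes <= 0:
        raise ValueError("total size must be non-negative and target must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-total_bytes // target_file_size_bytes)


if __name__ == "__main__":
    table_bytes = 100 * GIB  # hypothetical table size
    for target_mib in (128, 512, 1024):
        count = estimate_file_count(table_bytes, target_mib * MIB)
        print(f"target {target_mib} MiB -> about {count} files")
```

For the hypothetical 100 GiB table, moving the target from 128 MiB to 512 MiB cuts the estimate from 800 files to 200, which is the "fewer, bigger files" trade-off in concrete terms.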
- After saving, you will be redirected to the **Tables** page.
+- Locate the table and **toggle the Status switch** to activate scheduled compaction for that specific table.
+
+> **Important:** The **Status toggle must be switched on**. Even if a cron schedule is configured, compaction will not execute unless the table is enabled.
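The rule in the note above (a schedule alone is not enough; the table must also be enabled) can be sketched as a small gate. This is an illustration only, not OLake's scheduler: `cron_matches`, `should_run`, and the simplified 5-field matching (only `*`, single numbers, and `*/n` steps) are assumptions made for the example, and all five fields are simply required to match, which omits the special day-of-month/day-of-week handling of classic cron.

```python
from datetime import datetime


def _field_matches(field: str, value: int, start: int) -> bool:
    """Match one cron field; supports '*', '*/n', and a single integer."""
    if field == "*":
        return True
    if field.startswith("*/"):
        step = int(field[2:])
        if step <= 0:
            raise ValueError(f"step must be positive in field {field!r}")
        # Steps count from the start of the field's range (0 or 1).
        return (value - start) % step == 0
    return int(field) == value


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if a simplified 5-field cron expression matches `when`."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected: minute hour day-of-month month day-of-week")
    minute, hour, dom, month, dow = fields
    cron_dow = when.isoweekday() % 7  # cron counts Sunday as 0
    return (
        _field_matches(minute, when.minute, 0)
        and _field_matches(hour, when.hour, 0)
        and _field_matches(dom, when.day, 1)
        and _field_matches(month, when.month, 1)
        and _field_matches(dow, cron_dow, 0)
    )


def should_run(enabled: bool, expr: str, now: datetime) -> bool:
    """Hypothetical gate: run only if the table is enabled AND the schedule is due."""
    return enabled and cron_matches(expr, now)


if __name__ == "__main__":
    midnight = datetime(2024, 1, 1, 0, 0)
    print(should_run(True, "0 0 * * *", midnight))   # enabled and due
    print(should_run(False, "0 0 * * *", midnight))  # due, but table disabled
```

The point of the sketch is the `and` in `should_run`: a matching cron expression with the Status toggle off never triggers a run.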