Commit 1e7fd54

improvement: renamed optimization to either maintenance or compaction (#409)
* improvement: renamed optimization to either maintenance or compaction
* updated product labels
* fixed link redirect issue, added driver upgrade warning
1 parent 76d94d1 commit 1e7fd54

32 files changed

Lines changed: 320 additions & 266 deletions
Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,11 @@
11
---
2-
title: Optimization Benchmarks
3-
sidebar_label: Optimization Benchmarks
2+
title: Compaction Benchmarks
3+
sidebar_label: Compaction Benchmarks
44
---
55

6-
# Optimization Benchmarks
6+
# Compaction Benchmarks
77

8-
The following benchmark evaluates performance, environment configuration, and operational considerations for optimizing Apache Iceberg tables using **Apache Spark** `rewrite_data_files` and **OLake Fusion** compaction on a CDC-like TPCH workload.
8+
The following benchmark evaluates performance, environment configuration, and operational considerations for compacting Apache Iceberg tables using **Apache Spark** `rewrite_data_files` and **OLake Fusion** compaction on a CDC-like TPCH workload.
99

1010
### Benchmark Environment
1111

@@ -868,7 +868,7 @@ gcloud dataproc jobs submit pyspark gs://dz-benchmark/dataproc_compaction.py \
868868

869869
### OLake Fusion
870870

871-
The compaction was executed using Fusion's scheduled optimization flow across three compaction levels.
871+
The compaction was executed using Fusion's scheduled compaction flow across three compaction levels.
872872

873873
The compaction job was configured with two trigger tiers to balance frequent incremental cleanup with periodic deeper compaction:
874874

@@ -881,7 +881,7 @@ In this benchmark, the destination dataset size was below 100 GB, so running `Fu
881881

882882
The Fusion compaction setup is configured with the following parameters:
883883

884-
- `target-size`: 512 MB (Target file size after optimization)
884+
- `target-size`: 512 MB (Target file size after compaction)
885885

886886
### Dataset and Table Schemas
887887

Lines changed: 44 additions & 53 deletions
@@ -1,27 +1,20 @@
11
---
2-
title: Configure Your First optimization
3-
sidebar_label: Configure Your First Optimization
2+
title: Configure Your First Compaction
3+
sidebar_label: Configure Your First Compaction
44
---
55

6-
# Configure Your First Optimization
6+
# Configure Your First Compaction
77

88
## Prerequisites
99

10-
Follow the [Quickstart Setup Guide](/docs/install/olake-ui/) to ensure the OLAKE UI is running at [localhost:8000](http://localhost:8000).
11-
12-
- You have at least one destination configured in Ingestion **or**
13-
- You are ready to add a catalog manually in the optimization section.
14-
15-
:::info Optimization in the OLake UI
16-
17-
Iceberg maintenance (**Optimization**) is available starting from v0.4.0. Upgrade OLake UI to access the **Maintenance** module.
18-
19-
- **Existing users:** If you are already using OLake for Ingestion follow the [upgrade guide](/docs/install/olake-ui/#updating-olake-ui-version) to accesss Maintenance module.
20-
- **New users:** Follow the [quickstart guide](/docs/install/olake-ui/#quick-start) to get started.
10+
- **Existing users:** If you are already using OLake for Ingestion, follow the [upgrade guide](/docs/install/olake-ui/?update-ui-version=fusion-update#updating-olake-ui-version) to access the Maintenance module.
11+
- **New users:** Follow the [quickstart guide](/docs/install/olake-ui/?quick-start=fusion#prerequisites) to get started.
2112

13+
:::warning
14+
If you want to compact OLake-ingested tables, upgrade the **OLake Go** (Ingestion) driver to **v0.7.0 or higher** to avoid conflicts.
2215
:::
2316

24-
This guide walks through configuring your **first optimization** for a table.
17+
This guide walks through configuring your **first compaction** for a table.
2518

2619
## Step 1: Add a Catalog
2720

@@ -52,36 +45,36 @@ Click **View Tables** to go to the **Tables** page for Step 2.
5245

5346
1. Use the **Select Catalog** dropdown and select the catalog you just configured.
5447

55-
![Select Catalog](pathname:///img/docs/getting-started/configure-your-first-optimization/select-catalog.webp)
48+
![Select Catalog](pathname:///img/docs/getting-started/configure-your-first-compaction/select-catalog.webp)
5649

5750
2. After selecting the catalog, use the **Select Database** dropdown to choose a database (Iceberg DB) from that catalog.
5851

59-
![Select Database](pathname:///img/docs/getting-started/configure-your-first-optimization/select-database.webp)
52+
![Select Database](pathname:///img/docs/getting-started/configure-your-first-compaction/select-database.webp)
6053

6154
Only after you select both a catalog and a database will the list of tables in that Iceberg database appear on the page.
6255

63-
## Step 3: Configure Optimization For Your Table
56+
## Step 3: Configure Compaction for Your Table
6457

65-
1. In the tables list, find the table you want to optimize.
58+
1. In the tables list, find the table you want to compact.
6659

6760
<details>
68-
<summary><strong>Tip:</strong> Click <strong>View Metrics</strong> to open table metrics. **Health Score** and **target file size** (and related size signals in the metrics view) help decide whether optimization is required for that table.</summary>
61+
<summary><strong>Tip:</strong> Click <strong>View Metrics</strong> to open table metrics. **Health Score** and **target file size** (and related size signals in the metrics view) help decide whether compaction is required for that table.</summary>
6962

7063
![Table metrics view](pathname:///img/docs/iceberg-maintenance/metrics/view-table-metrics-button.webp)
7164

7265
</details>
7366

7467
2. Click on the **Configure** button.
7568

76-
![Configuration Button](pathname:///img/docs/iceberg-maintenance/optimization/configure-button.webp)
69+
![Configuration Button](pathname:///img/docs/iceberg-maintenance/compaction/configure-button.webp)
7770

78-
This opens a configuration modal (cron modal) where you can set schedules for **Lite**, **Medium**, and **Full** Optimization.
71+
This opens a configuration modal (cron modal) where you can set schedules for **Lite**, **Medium**, and **Full** compaction.
7972

80-
![Configuration Cron](pathname:///img/docs/iceberg-maintenance/optimization/configuration.webp)
73+
![Configuration Cron](pathname:///img/docs/iceberg-maintenance/compaction/configuration.webp)
8174

8275
### Frequency Presets
8376

84-
When configuring optimization for a table, each optimization type has a **Frequency** dropdown with common schedules, such as:
77+
When configuring compaction for a table, each compaction type has a **Frequency** dropdown with common schedules, such as:
8578

8679
- Never
8780
- Every 30 min
@@ -90,11 +83,11 @@ When configuring optimization for a table, each optimization type has a **Freque
9083
- Every 12 hours
9184
- Every 24 hours
9285

93-
You can configure these frequencies independently for Lite, Medium, and Full Optimization.
86+
You can configure these frequencies independently for Lite, Medium, and Full compaction.
9487

9588
**Default schedules are applied automatically** for each table, so there is no need to open the configuration modal and set frequencies on every table unless a different cadence is required.
9689

97-
**Defaults schedules for each type of optimization:**
90+
**Default schedules for each type of compaction:**
9891

9992
**Lite** — every 1 hour
10093

@@ -108,50 +101,50 @@ If you choose **Custom** in the Frequency dropdown, a **Cron Expression** field
108101

109102
You can enter a standard cron expression here. For example:
110103

111-
- `0 0 * * *` – run the optimization once every day at midnight.
104+
- `0 0 * * *` – run compaction once every day at midnight.
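
As a reading aid for the example above, here is a minimal Python sketch that splits a five-field cron expression into named fields. The helper name `split_cron` is hypothetical and is not part of OLake. It also assumes the Custom field accepts the common five-field form, which the docs describe only as a "standard cron expression".

```python
# Field order of a standard five-field cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression: str) -> dict[str, str]:
    """Split a five-field cron expression into named fields (hypothetical helper)."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))


# "0 0 * * *": minute 0, hour 0, every day of month, every month, every weekday.
daily_midnight = split_cron("0 0 * * *")
print(daily_midnight)
```

Reading the fields by name makes it easier to check a schedule before pasting it into the **Cron Expression** field.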
112105

113106
## Step 4: (Advanced) Target File Size
114107

115108
Under the **Advanced Config** dropdown in the modal, you can configure **Target file size**.
116109

117-
![Target file size](pathname:///img/docs/iceberg-maintenance/optimization/target-file-size.webp)
110+
![Target file size](pathname:///img/docs/iceberg-maintenance/compaction/target-file-size.webp)
118111

119-
**Full** Optimization uses **target file size** directly: rewritten data files are aligned toward that size.
112+
**Full** compaction uses **target file size** directly: rewritten data files are aligned toward that size.
120113

121-
**Lite** and **Medium** use it **indirectly**. Their merge and output bounds are derived from the same setting. How each type relates to this value is explained in the [Optimization overview](/docs/iceberg-maintenance/optimization/overview/).
114+
**Lite** and **Medium** use it **indirectly**. Their merge and output bounds are derived from the same setting. How each type relates to this value is explained in the [Types of Compaction](/docs/iceberg-maintenance/compaction/overview/).
122115

123116
In general, a **larger** target tends toward **fewer, bigger** files; a **smaller** target tends toward **more, smaller** files.
124117

125118
If unsure, start with the default (**512 MB**) and tune later based on query-engine behavior.
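
To make the "fewer, bigger" versus "more, smaller" trade-off concrete, the following Python sketch estimates how many data files a table would end up with for a given target size. It is a rough illustration only: the 100 GB table size is an assumed example, and the estimate ignores partitioning, compression, and how Lite and Medium derive their own bounds.

```python
import math

MB = 1024 * 1024
GB = 1024 * MB


def estimated_file_count(table_bytes: int, target_file_bytes: int) -> int:
    """Rough file count if every data file were rewritten to the target size."""
    return math.ceil(table_bytes / target_file_bytes)


# Assumed example: a 100 GB table at three candidate target sizes.
table_size = 100 * GB
for target_mb in (128, 512, 1024):
    count = estimated_file_count(table_size, target_mb * MB)
    print(f"target {target_mb} MB -> about {count} files")
```

In a real table the count will differ, since files cannot span partitions and the last file in each group is usually smaller than the target.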
126119

127-
## Step 5: Save the Optimization Configuration
120+
## Step 5: Save the Compaction Configuration
128121

129122
After configuring the cron:
130123

131124
1. In the modal, click on **Save**.
132125
2. The configuration for that table is saved.
133126

134-
![Save Configuration](pathname:///img/docs/iceberg-maintenance/optimization/save-configuration.webp)
127+
![Save Configuration](pathname:///img/docs/iceberg-maintenance/compaction/save-configuration.webp)
135128

136129
After saving, a **Configuration Successful** modal appears. It **closes automatically after 3 seconds**.
137130

138-
![Configuration Successful modal](pathname:///img/docs/iceberg-maintenance/optimization/configuration-successful.webp)
131+
![Configuration Successful modal](pathname:///img/docs/iceberg-maintenance/compaction/configuration-successful.webp)
139132

140133

141-
## Step 6: Enable the Optimization
134+
## Step 6: Enable Compaction
142135

143-
Saving the configuration does not start optimization automatically. You must enable it:
136+
Saving the configuration does not start compaction automatically. You must enable it:
144137

145138
1. On the **Tables** page, locate the **Status** column next to the **Configure** button for your table.
146-
2. Use the **toggle** in the **Status** column to enable the optimization configuration.
139+
2. Use the **toggle** in the **Status** column to enable the compaction configuration.
147140

148-
![Enable Configuration](pathname:///img/docs/iceberg-maintenance/optimization/enable-optimization.webp)
141+
![Enable Configuration](pathname:///img/docs/iceberg-maintenance/compaction/enable-compaction.webp)
149142

150-
Once enabled, OLake will start running optimization for that table according to the schedule you configured.
143+
Once enabled, OLake will start running compaction for that table according to the schedule you configured.
151144

152145
## Health Score and Last Run Status
153146

154-
With a catalog and database selected, the **Tables** page shows one row per table. The sections below explain **Health Score** (overall table health) and **Last Run status** (per-type status for Lite, Medium, and Full optimization).
147+
With a catalog and database selected, the **Tables** page shows one row per table. The sections below explain **Health Score** (overall table health) and **Last Run status** (per-type status for Lite, Medium, and Full compaction).
155148

156149
### Health Score
157150

@@ -165,7 +158,7 @@ With a catalog and database selected, the **Tables** page shows one row per tabl
165158

166159
Together, these three parts are **weighted 40% / 40% / 20%**: **Small Files Score** and **Eq Delete Score** each contribute **40%** of the Health Score, and **Pos Delete Score** contributes **20%**.
167160

168-
Higher scores generally mean the table is in better shape for reads; lower scores suggest running or tuning optimization more often.
161+
Higher scores generally mean the table is in better shape for reads; lower scores suggest running or tuning compaction more often.
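
The 40% / 40% / 20% weighting can be sketched as a weighted sum. This Python snippet is illustrative only: it assumes the three component scores share one scale (0 to 100 here), and the function and key names are hypothetical. How OLake computes each component score is not shown in this section.

```python
# Weights stated in the docs: Small Files 40%, Eq Delete 40%, Pos Delete 20%.
WEIGHTS = {"small_files": 0.4, "eq_delete": 0.4, "pos_delete": 0.2}


def health_score(small_files: float, eq_delete: float, pos_delete: float) -> float:
    """Combine the three component scores into one weighted Health Score."""
    return (
        WEIGHTS["small_files"] * small_files
        + WEIGHTS["eq_delete"] * eq_delete
        + WEIGHTS["pos_delete"] * pos_delete
    )


# Assumed component scores on a 0-100 scale.
print(health_score(small_files=50, eq_delete=100, pos_delete=0))
```

Because Small Files and Eq Delete each carry twice the weight of Pos Delete, a poor score in either of them lowers the overall Health Score twice as much as an equally poor Pos Delete score.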
169162

170163
![Health Score column on the Tables page](pathname:///img/docs/iceberg-maintenance/metrics/health-score.webp)
171164

@@ -174,35 +167,33 @@ Higher scores generally mean the table is in better shape for reads; lower score
174167
**Last Run status** always shows three badges—**L (Lite)**, **M (Medium)**, **F (Full)**. Each badge is that type’s **latest** outcome: **running**, **success**, **failed**, **cancelled**, **skipped**, or **never run**.
175168

176169
- **Letters** — Typically you see three badges together:
177-
- **L** — Lite Optimization
178-
- **M** — Medium Optimization
179-
- **F** — Full Optimization
170+
- **L** — Lite compaction
171+
- **M** — Medium compaction
172+
- **F** — Full compaction
180173

181174
- **Colours (quick read, per badge)**
182175
- **Green** — that type’s last run **succeeded**
183176
- **Red** — that type’s last run **failed** or was **cancelled**
184177
- **Yellow** — a run of **that type** is **running** right now
185178
- **Gray** with **** — that type’s last run was **skipped**
186-
- **Gray** with **** — that optimization type has **never** run for this table
179+
- **Gray** with **** — that compaction type has **never** run for this table
187180

188-
- **Not Optimized** — Shown **only** when **no** optimization has run yet for that table—**neither** Lite, **nor** Medium, **nor** Full.
181+
- **Not compacted** — Shown **only** when **no** compaction has run yet for that table—**neither** Lite, **nor** Medium, **nor** Full.
189182

190183
:::info Hover
191184
When you **hover** over a table's Last Run Status, a small **card** opens which includes:
192185

193-
- **Name** — Lite, Medium, or Full
186+
- **Name** — Lite, Medium, or Full
194187
- **Last run** — relative time, such as “2 hours ago”
195188
- **Status** — plain text such as **Success**, **Failed**, **Cancelled**, **Skipped**, or **Running**
196189
:::
197190

198-
![Last Run status](pathname:///img/docs/iceberg-maintenance/runs-and-Logs/last-run-status-hover.webp)
191+
![Last Run status](pathname:///img/docs/iceberg-maintenance/runs-and-logs/last-run-status-hover.webp)
199192

200193
## Next Steps
201194

202-
After your first optimization runs, you can:
203-
204-
- **View Logs & Runs** – see each optimization run and its detailed logs:
205-
[Logs & Runs](/docs/iceberg-maintenance/runs-and-logs)
206-
- **View Metrics** – understand how optimization affecme size as ts file counts, sizes, and health score for your table:
207-
[Metrics](/docs/iceberg-maintenance/metrics)
195+
After your first compaction runs, you can:
208196

197+
- **View [Logs & Runs](/docs/iceberg-maintenance/runs-and-logs)** – see each compaction run and its detailed logs.
198+
- **View [Metrics](/docs/iceberg-maintenance/metrics)** – understand how compaction affects file counts, sizes, and health score for your table.
199+

docs/getting-started/quickstart.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
---
2-
title: "Setup OLake"
2+
title: "Quickstart - OLake Go"
33
description: Get started with OLake using Docker Compose. Follow simple steps to deploy, log in, and manage OLake jobs for seamless data replication workflows.
44
sidebar_label: Getting Started
55
sidebar_position: 1

docs/iceberg-maintenance/catalogs.mdx

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -5,13 +5,13 @@ sidebar_label: Catalogs
55

66
# What are Catalogs?
77

8-
A **catalog** is a entity from where OLake fetches Iceberg tables. It stores reference to the latest metadata file which stores info such as table names, schemas, and file locations. OLake must know which catalog (and database) a table belongs to before optimization can run on it.
8+
A **catalog** is an entity from which OLake fetches Iceberg tables. It stores a reference to the latest metadata file, which holds information such as table names, schemas, and file locations. OLake must know which catalog (and database) a table belongs to before maintenance can run on it.
99

1010
Catalogs are managed from the **Maintenance** section in the OLake UI. There are two types: **OLake Imported Catalogs** and **External Catalogs**.
1111

1212
## OLake Imported Catalogs
1313

14-
**OLake Imported Catalogs** are catalog entries whose credentials come from an Iceberg **destination** already configured in OLake Ingestion—imported into optimization instead of re-entering connection details.
14+
**OLake Imported Catalogs** are catalog entries whose credentials come from an Iceberg **destination** already configured in OLake Ingestion—imported into Maintenance instead of re-entering connection details.
1515

1616
When Iceberg ingestion is already set up, that destination's catalog can be imported this way.
1717

@@ -42,7 +42,7 @@ When Iceberg ingestion is already set up, that destination's catalog can be impo
4242

4343
Use an external catalog when:
4444

45-
- Optimization is needed on Iceberg tables created outside OLake.
45+
- Maintenance is needed on Iceberg tables created outside OLake.
4646
- The same catalog is used by other systems and OLake should compact those tables without replicating data via Ingestion.
4747

4848
### How to add an External Catalog
@@ -57,6 +57,6 @@ Use an external catalog when:
5757

5858
![Catalog connected view](/img/docs/iceberg-maintenance/catalogs/connect-catalog.webp)
5959

60-
Once a catalog is connected—whether imported or added as external—it appears in the **Select Catalog** dropdown on the **Tables** page. Selecting a catalog (and then a database) lists the tables available for optimization configuration.
60+
Once a catalog is connected—whether imported or added as external—it appears in the **Select Catalog** dropdown on the **Tables** page. Selecting a catalog (and then a database) lists the tables available for maintenance configuration.
6161

62-
For a full walkthrough that includes adding a catalog and configuring the first optimization, see [Configure Your First Optimization](/docs/getting-started/configure-first-optimization).
62+
For a full walkthrough that includes adding a catalog and configuring the first table maintenance, see [Configure Your First Table Maintenance](/docs/getting-started/configure-first-compaction).
Lines changed: 58 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,58 @@
1+
---
2+
title: Configuration
3+
sidebar_position: 2
4+
---
5+
6+
# Configuring Table Compaction in OLake
7+
8+
Each table in OLake can have its **own compaction schedule and advanced settings**.
9+
Follow these steps to configure compaction for a specific table:
10+
11+
12+
### 1. Click the Configure Button
13+
14+
Click the **Configure** button next to the table you want to compact.
15+
This opens a modal where you can schedule **Lite, Medium, and Full compactions**.
16+
17+
![Configure button](pathname:///img/docs/iceberg-maintenance/compaction/configure-button.webp)
18+
19+
20+
### 2. Set the Compaction Schedule
21+
22+
- Select a schedule from the **predefined dropdown options** or choose **Custom** to specify your own cron expression.
23+
- Compaction will run automatically according to the schedule set for that table.
24+
25+
![Compaction Schedule](pathname:///img/docs/iceberg-maintenance/compaction/configuration.webp)
26+
27+
### 3. Advanced Config: Target File Size
28+
29+
- Expand the **Advanced Config** panel in the modal.
30+
- Specify the **Target File Size** for the table (default **512 MB** if you leave it unchanged).
31+
32+
For how **target file size** affects **Lite**, **Medium**, and **Full** compaction, see [Types of Compaction Supported in OLake](/docs/iceberg-maintenance/compaction/overview).
33+
34+
> **Tip:** Choose a target size based on your query patterns and table size. Larger files can improve scan efficiency for some queries but may increase the cost of rewriting files.
35+
36+
![Target File Size](pathname:///img/docs/iceberg-maintenance/compaction/target-file-size.webp)
37+
38+
### 4. Save the Configuration
39+
40+
- Click **Save**.
41+
42+
![Save Configuration](pathname:///img/docs/iceberg-maintenance/compaction/save-configuration.webp)
43+
44+
- A dialog box confirms that the configuration was successful.
45+
46+
![Configuration Successful](pathname:///img/docs/iceberg-maintenance/compaction/configuration-successful.webp)
47+
48+
49+
### 5. Enable the Table for Compaction
50+
51+
- After saving, you will be redirected to the **Tables** page.
52+
- Locate the table and **toggle the Status switch** to activate scheduled compaction for that specific table.
53+
54+
> **Important:** The **Status toggle must be switched on**. Even if a cron schedule is configured, compaction will not execute unless the table is enabled.
55+
56+
![Enable compaction](pathname:///img/docs/iceberg-maintenance/compaction/enable-compaction.webp)
57+
58+
