139 changes: 139 additions & 0 deletions blog/2026-04-28-olake-fusion-introduction-blog.mdx
@@ -0,0 +1,139 @@
---
slug: olake-fusion-introduction
title: "We Built an Easier Way to Maintain Iceberg Tables"
description: Why Iceberg table maintenance tends to become a full-time headache, what teams are doing today to cope with it, and what we built at OLake to actually solve it.
date: 2026-04-28
tags: [iceberg, olake, fusion, compaction, optimization, apache-iceberg, iceberg-tables, iceberg-maintenance, small-files, binpack-compaction, sort-compaction, manifest-rewrite, lakehouse, metadata-optimization]
authors: [siddharth]
image: /img/blog/2026/5/introducing-olake-fusion.webp
---

# We Built a Better Way to Maintain Apache Iceberg Tables

Apache Iceberg is the right choice for most modern lakehouses ([read more](https://olake.io/blog/apache-iceberg-features-benefits/)). It gives you ACID guarantees, schema evolution, time travel, and fast analytical queries, without locking you into any single vendor or engine. It has quietly become the default open table format for teams building serious data infrastructure.

But here's what nobody tells you when you're getting started: picking the right table format is only half the job. The other half is *keeping those tables healthy*. And that part? It's a lot harder than it looks.

This post is about that second half: why Iceberg table maintenance tends to become a full-time headache, what teams do today to cope with it, and what we built at OLake to solve it.

## The Problem That Sneaks Up on You

When you first set up Iceberg, everything feels fast and clean. Queries return in seconds. Pipelines run smoothly. The team is happy.

Then, slowly, things start to drift.

Queries that used to finish in seconds start taking minutes. Your dashboards feel a little sluggish. File counts are climbing. Nothing is *broken*, exactly — but something is clearly off.

What's happening is almost always the same thing: **small files**.

Every time a CDC pipeline writes data to Iceberg, it creates new files in object storage. That's how the format works. The trouble is that modern CDC pipelines write constantly: row-level changes stream in every few seconds, and each batch produces a tiny new file. What should have been 50 well-sized Parquet files turns into 50,000 tiny ones spread across your table.

![Small files problem diagram](/img/blog/2026/5/small-file-problem.webp)

This is the small files problem, and it triggers a cascade of issues.

**Query engines have to work much harder.** Engines like Spark, Trino, or Athena don't read a table as a single unit. They read individual files. With 50,000 small files, every query involves thousands of extra file listings, metadata reads, and I/O round trips. The total data size hasn't changed, but the work has grown by orders of magnitude.

![Query slowdown diagram](/img/blog/2026/5/query-slowdown.webp)
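
To make the scale concrete, here is a rough back-of-the-envelope sketch. The 25 GiB table size, the two average file sizes, and the 20 ms fixed overhead per file are illustrative assumptions, not measurements. Real overhead depends on the engine, the object store, and how much of the work runs in parallel.

```python
import math

GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024


def estimate_file_count(total_bytes: int, avg_file_bytes: int) -> int:
    """Number of files needed to hold total_bytes at a given average file size."""
    return math.ceil(total_bytes / avg_file_bytes)


def cumulative_overhead_seconds(file_count: int, per_file_overhead_ms: float) -> float:
    """Total fixed per-file cost (open, footer read, listing), before any parallelism."""
    return file_count * per_file_overhead_ms / 1000.0


TABLE_BYTES = 25 * GIB          # hypothetical table size
PER_FILE_OVERHEAD_MS = 20.0     # hypothetical fixed cost per file

well_sized_files = estimate_file_count(TABLE_BYTES, 512 * MIB)  # 50
tiny_files = estimate_file_count(TABLE_BYTES, 512 * KIB)        # 51200

print(well_sized_files, cumulative_overhead_seconds(well_sized_files, PER_FILE_OVERHEAD_MS))
print(tiny_files, cumulative_overhead_seconds(tiny_files, PER_FILE_OVERHEAD_MS))
```

The data volume is identical in both cases. Only the file count changes, and the fixed per-file cost scales with it.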

**Metadata becomes a bottleneck on its own.** Iceberg lists every data file in manifests. The more files you have, the heavier those lists get. Planning a query or committing a write then takes longer, because the engine has to scan a much larger inventory before it can do real work.

**Delete file accumulation makes this even worse.** In CDC-heavy pipelines, every sync doesn't just create new data files — it also creates delete files that track which rows were updated or deleted. These delete files are how Iceberg handles upserts without rewriting entire data files on every change. But delete files have a cost: every query has to apply them at read time to get the correct view of the data. As they pile up, the overhead of applying deletes during reads becomes significant. A table with thousands of delete files will be noticeably slower than the same table after they've been resolved.

![Delete file problem diagram](/img/blog/2026/5/delete-file-problem.webp)
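
The read-time cost can be pictured with a deliberately simplified model of position deletes. This is a sketch, not how any engine implements it: real Iceberg position delete files also record the data file path, and equality deletes are matched on column values instead of positions. The function name and the sample rows are made up for illustration.

```python
def read_with_position_deletes(data_rows, delete_files):
    """Return the live rows of one data file after applying position deletes.

    data_rows: rows of a single data file, in file order.
    delete_files: iterable of delete files, each an iterable of deleted row positions.
    """
    deleted_positions = set()
    # Every delete file has to be opened and read before the data file can be trusted.
    for positions in delete_files:
        deleted_positions.update(positions)
    # Only rows whose position was never deleted survive.
    return [row for pos, row in enumerate(data_rows) if pos not in deleted_positions]


rows = ["a", "b", "c", "d", "e"]
deletes = [[1], [3], [1, 4]]  # three small delete files from three syncs
live_rows = read_with_position_deletes(rows, deletes)
print(live_rows)  # ['a', 'c']
```

Compaction pays this cost once by writing a new data file that contains only the live rows, so later reads no longer need the delete files.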

**Object storage costs creep up silently.** Cloud storage doesn't just charge for how much data you store — it also charges per API request. More files means more reads, more listings, more API calls on every operation. You won't notice it until the bill shows up, and by then you've been overpaying for weeks.

![Storage Costs problem diagram](/img/blog/2026/5/storage-cost.webp)
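
A small sketch of how request charges scale with file count. The price per 1,000 requests below is a placeholder, not a quote from any provider, and the model assumes one request per file per scan, which understates real request counts.

```python
def daily_request_cost(files_per_scan: int, scans_per_day: int,
                       price_per_1000_requests: float) -> float:
    """Estimated daily request charge, assuming one request per file per scan."""
    requests = files_per_scan * scans_per_day
    return requests / 1000 * price_per_1000_requests


HYPOTHETICAL_PRICE = 0.0004  # placeholder price per 1,000 requests; check your provider
SCANS_PER_DAY = 1_000        # hypothetical query volume

fragmented = daily_request_cost(50_000, SCANS_PER_DAY, HYPOTHETICAL_PRICE)
compacted = daily_request_cost(50, SCANS_PER_DAY, HYPOTHETICAL_PRICE)
print(f"fragmented: {fragmented:.2f}/day, compacted: {compacted:.2f}/day")
```

Whatever the actual price, the ratio between the two follows the file count: a thousand times more files means roughly a thousand times more requests for the same scans.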

None of this happens suddenly. It builds up quietly, which is exactly why it catches teams off guard. By the time performance is obviously degraded, the tables are already in rough shape.

## What Teams Do Today (And Why It's a Grind)

The standard fix for small files and delete accumulation is **compaction** — periodically rewriting fragmented small files into larger, well-organized ones and resolving accumulated deletes into the data. Iceberg's ecosystem supports this, and Apache Spark has become the de facto tool for it via `rewrite_data_files`.

So most teams end up doing something like this:

They write a Python script that calls `rewrite_data_files(...)` with the right parameters. They figure out executor counts, memory settings, file size bounds, and parallelism through trial and error. They wire it up to Airflow or a cron job to run every 20 or 30 minutes. A few weeks later, their ingestion rate changes, and the parameters they tuned are no longer appropriate for the table's current state.
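
Such a script often looks something like the sketch below. The helper `build_rewrite_call`, the catalog name `my_catalog`, the table `db.events`, and the option values are assumptions for illustration. The `CALL ... system.rewrite_data_files` procedure and the `target-file-size-bytes` and `min-input-files` options come from Iceberg's Spark integration. The helper only builds the SQL string, so it runs without a Spark cluster.

```python
def build_rewrite_call(catalog: str, table: str, options: dict,
                       strategy: str = "binpack") -> str:
    """Build the SQL for Iceberg's rewrite_data_files Spark procedure.

    Names and option values are interpolated as-is, so pass trusted input only.
    """
    option_pairs = ", ".join(f"'{key}', '{value}'" for key, value in options.items())
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', strategy => '{strategy}', "
        f"options => map({option_pairs}))"
    )


sql = build_rewrite_call(
    "my_catalog",                                  # hypothetical catalog name
    "db.events",                                   # hypothetical table
    {
        "target-file-size-bytes": str(512 * 1024 * 1024),
        "min-input-files": "5",
    },
)
print(sql)
# In a real job this string is passed to a SparkSession configured with an
# Iceberg catalog: spark.sql(sql)
```

The tuning described above lives in that options map and in the Spark executor settings around it, which is why the values drift out of date when the ingestion rate changes.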

This works. Teams do make it work. But look at what they're actually doing:

**Writing custom Spark scripts to maintain Iceberg tables.** The compaction script itself becomes a thing that needs documentation, version control, incident response, and occasional debugging at 2 a.m. when a job fails and nobody knows why. That's before you account for the fact that most teams have more than one Iceberg table.

**Running one compaction setup for situations that need different treatment.** Some tables need frequent, aggressive compaction, while others only need lighter compaction on a slower schedule. `rewrite_data_files` doesn't differentiate on its own: it processes whatever files fall within your size bounds, regardless of whether that's the right level of intervention for the current table state.

**Figuring out what happened by digging through scattered logs.** Most setups keep some record of each run: Airflow history, Spark driver logs, or files on disk. The hard part is connecting that output back to the table itself: whether file layout improved, whether deletes were absorbed, and whether the job really helped. When a run fails, or reports success while queries stay slow, the **why** is often still unclear. Errors and exit codes alone rarely say what went wrong; you hunt through executor logs and piece the picture together by hand.

## Introducing OLake Fusion

![Introducing OLake Fusion](/img/blog/2026/5/introducing-olake-fusion.webp)

OLake Fusion is a dedicated Iceberg table maintenance service. It handles compaction for your Iceberg tables on a per-table cron schedule you configure — with tiered compaction levels, built-in metrics, and enough observability to actually understand what's happening to your tables.

No custom Spark scripts. No wondering if last night's compaction job did anything useful.

### Tiered Compaction: The Right Level of Work at the Right Time

The most important thing about OLake Fusion's approach is that it doesn't treat all compaction as the same operation. It offers three compaction tiers that you can schedule independently, each designed for a different kind of table maintenance need.

**Lite** — Designed for small, frequent cleanup tasks. It keeps tables from slowly sliding into bad shape without using much compute, so you can run it often.

**Medium** — Designed for regular cleanup when small files and deletes are starting to slow reads. It does more work than Lite, but avoids the cost of rewriting the whole table.

**Full** — Designed for deep cleanup tasks where the whole table needs to be laid out fresh. It uses the most compute, so it makes sense for occasional resets, not frequent runs.

One important detail: if multiple tiers are scheduled to run at the same time, Fusion automatically runs only the highest one. Medium overrides Lite. Full overrides both. You don't end up doing redundant work when schedules overlap.
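
The override rule can be expressed in a few lines. This is only an illustration of the behavior described above, not Fusion's internal code. The `Tier` enum and `resolve_due_tier` are hypothetical names.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Compaction tiers ordered by how much work they do."""
    LITE = 1
    MEDIUM = 2
    FULL = 3


def resolve_due_tier(due_tiers):
    """Given the tiers scheduled for the same moment, return the one to run.

    The highest tier wins; returns None when nothing is due.
    """
    return max(due_tiers, default=None)


print(resolve_due_tier([Tier.LITE, Tier.MEDIUM]).name)             # MEDIUM
print(resolve_due_tier([Tier.LITE, Tier.MEDIUM, Tier.FULL]).name)  # FULL
```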

For the exact details of what each tier does, see [Types of Compaction](https://olake.io/docs/iceberg-maintenance/compaction/overview/).

This tiered approach matters because it lets you optimize for cost and efficiency at the same time. Running Full compaction every few minutes on a CDC table is wasteful: you're rewriting data that doesn't need rewriting. Running only Lite is insufficient if delete files are building up and hurting read performance. The right answer is to run Lite frequently, Medium regularly, and Full occasionally. Fusion makes it easy to express exactly that.

### Cheaper Than Spark Compaction

On comparable infrastructure, Fusion costs about **50% less** than Apache Spark's `rewrite_data_files` for the same compaction workload, without giving up table layout quality. Run-by-run timings, query checks, methodology, and cost breakdown are covered in the [OLake Fusion vs Spark compaction benchmark](https://olake.io/blog/iceberg-compaction-spark-vs-fusion-benchmark/).

### Observability That Actually Tells You Something

Here's a problem that doesn't get talked about enough: with custom Spark compaction scripts, visibility is usually something you have to build and maintain yourself.

You can query Iceberg metadata tables before and after a Spark job to calculate file counts and delete counts. But in practice, teams still have to wire that into the job, store the results, connect them to run history, and make them easy to inspect when something feels slow. Fusion makes that visibility part of the product instead of another extra script.
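
For a sense of what that do-it-yourself wiring involves, here is a minimal sketch. It assumes rows shaped like Iceberg's `files` metadata table, with a `content` column (0 for data files, 1 and 2 for delete files) and `file_size_in_bytes`. The function names and sample rows are invented. In a real job the rows would come from a query against the metadata table, run once before and once after compaction.

```python
DATA_CONTENT = 0  # in Iceberg, content 0 marks data files; 1 and 2 mark delete files


def summarize_files(file_rows):
    """Collapse metadata rows into the counts a before/after comparison needs."""
    summary = {"data_files": 0, "delete_files": 0, "data_bytes": 0}
    for row in file_rows:
        if row["content"] == DATA_CONTENT:
            summary["data_files"] += 1
            summary["data_bytes"] += row["file_size_in_bytes"]
        else:
            summary["delete_files"] += 1
    return summary


def diff_summaries(before, after):
    """Per-metric change from one snapshot of the table to the next."""
    return {key: after[key] - before[key] for key in before}


# Invented sample rows standing in for metadata-table query results.
before_rows = [
    {"content": 0, "file_size_in_bytes": 1_000},
    {"content": 0, "file_size_in_bytes": 2_000},
    {"content": 0, "file_size_in_bytes": 3_000},
    {"content": 1, "file_size_in_bytes": 100},
    {"content": 2, "file_size_in_bytes": 100},
]
after_rows = [{"content": 0, "file_size_in_bytes": 5_500}]

change = diff_summaries(summarize_files(before_rows), summarize_files(after_rows))
print(change)  # {'data_files': -2, 'delete_files': -2, 'data_bytes': -500}
```

The arithmetic is the easy part. Storing these numbers per run and tying them to job history is the work that tends to get skipped.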

Fusion comes with observability built in, at two levels: individual runs and whole tables.

**Per-run logs and metrics.** Fusion keeps logs and metrics for each compaction run, so you can see what happened and dig in without starting from unrelated job noise. More in [Runs and logs](https://olake.io/docs/iceberg-maintenance/runs-and-logs).

![Runs page](/img/docs/iceberg-maintenance/runs-and-logs/runs-page.webp)

**Input vs output for each run.** After each compaction run, Fusion shows metrics for inputs and outputs: counts and sizes for data files and deletes, recorded before versus after each run. You read them straight from the UI instead of reconstructing totals only from unstructured logs.

![Runs Metrics](/img/blog/2026/5/run-metrics.webp)

**Table-level health metrics.** Fusion shows metrics for each table's current state, so you can judge whether it needs compaction.

![Table Metrics](/img/docs/iceberg-maintenance/metrics/table-metrics.webp)

The **Tables** page shows an overall **health score** for each table, so you get a first-pass view of whether compaction looks necessary before you dive into detailed metrics.

![Health score](/img/docs/iceberg-maintenance/metrics/health-score.webp)

This is the kind of visibility that makes the difference between proactively maintaining your tables and reactively debugging performance issues after users are already complaining.

To learn more about metrics, see the [metrics documentation](https://olake.io/docs/iceberg-maintenance/metrics).

## Where OLake Fusion Fits In

Fusion connects to your Iceberg catalog. For each table, you configure the compaction schedule — which tiers to enable, and how often each one should run. You can think of it like cron: you define the cadence, Fusion executes it.

A typical setup depends on your CDC ingestion frequency. For example, if ingestion runs every 2 minutes, you might schedule Lite every 30 minutes, Medium every 6 hours, and Full every 2 days. Fusion handles the execution, the logging, and the metrics. If a run fails, you see it immediately in the runs view without having to dig through Airflow task logs or SSH into a Spark driver node.
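
To see what that example cadence adds up to, the sketch below counts runs per tier over one 2-day cycle, applying the rule that only the highest due tier runs when schedules coincide. The intervals are the ones from the example above. The dictionary layout and function names are illustrative and are not Fusion's configuration format.

```python
# Intervals in minutes, taken from the example cadence above.
INTERVALS_MIN = {"lite": 30, "medium": 6 * 60, "full": 2 * 24 * 60}
PRIORITY = ["full", "medium", "lite"]  # highest tier first


def tier_at(minute: int):
    """Tier that runs at a given minute offset, or None if nothing is due."""
    for tier in PRIORITY:
        if minute % INTERVALS_MIN[tier] == 0:
            return tier
    return None


def count_runs(horizon_min: int) -> dict:
    """Count runs per tier from the first tick through horizon_min, inclusive."""
    counts = {tier: 0 for tier in INTERVALS_MIN}
    step = min(INTERVALS_MIN.values())
    for minute in range(step, horizon_min + 1, step):
        tier = tier_at(minute)
        if tier is not None:
            counts[tier] += 1
    return counts


print(count_runs(2 * 24 * 60))  # {'lite': 88, 'medium': 7, 'full': 1}
```

Over 96 half-hour ticks, most of the work is cheap Lite runs, with a handful of Medium runs and a single Full run.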

If you're already using OLake for CDC ingestion, Fusion integrates naturally — same catalog, same UI. But it also works as a standalone service if you're using a different ingestion tool.

For a walkthrough, see [Getting Started with Fusion](https://olake.io/docs/getting-started/configure-first-compaction).

## TL;DR

If you're running Iceberg with CDC pipelines, table maintenance isn't optional — it's the difference between a lakehouse that stays fast and one that gradually becomes unusable. The small files problem and delete file accumulation are real, they compound over time, and they're hard to notice until performance is already degraded.

Spark-based compaction works, but only if you build and run those jobs yourself. They are often slow and expensive, and it can be hard to tell if each run really helped.

OLake Fusion is built specifically for this. Tiered compaction that matches the level of work to what the table actually needs. 2x faster than Spark. About half the cost. And enough observability to actually understand what's happening to your tables — before your users start asking why queries are slow.
9 changes: 9 additions & 0 deletions blog/authors.yml
@@ -153,3 +153,12 @@ anshika:
  email: hello@olake.io
  socials:
    linkedin: anshika

siddharth:
  page: true
  name: Siddharth Chevella
  title: OLake Maintainer
  image_url: /img/authors/siddharth.webp
  email: siddharth@olake.io
  socials:
    linkedin: siddharth-ch05
Binary file added static/img/authors/siddharth.webp
Binary file added static/img/blog/2026/5/delete-file-problem.webp
Binary file added static/img/blog/2026/5/query-slowdown.webp
Binary file added static/img/blog/2026/5/run-metrics.webp
Binary file added static/img/blog/2026/5/small-file-problem.webp
Binary file added static/img/blog/2026/5/storage-cost.webp