OLake-Fusion is a lakehouse table management system for Apache Iceberg.
It helps teams run faster queries, lower storage cost, and operate Iceberg at scale with less effort.
Iceberg is powerful in production, but day-2 operations such as compaction and data expiration can be expensive and complex. OLake-Fusion adds an operational layer on top of Iceberg so your team can focus on data products instead of maintenance jobs.
With OLake-Fusion, you can:
- Keep query performance stable with continuous self-optimization.
- Reduce storage and compute waste from small-file and metadata overhead.
- Manage tables consistently across different catalogs and environments.
- Build lake-native data platforms that are decoupled from infrastructure and unify stream and batch processing.
Core components:
- Fusion (Management Service): Handles table lifecycle operations such as self-optimization and data expiration, and provides a unified catalog interface across engines.
- Spark Optimizer: Runs optimization tasks that improve file layout and maintain read efficiency.
Key features:
- Self-Optimizing Tables: Automatically compacts files and organizes data to keep read latency low.
- Multi-Catalog Support: Works with catalogs such as Glue, JDBC, and REST-based catalogs.
- Infrastructure Independent: Deploy on private cloud, public cloud, hybrid cloud, or multi-cloud.
- Lakehouse Ready: Designed for modern analytics workloads on open table formats.
Benchmark highlights:
- Up to 2x faster than vanilla Spark compaction in benchmark scenarios.
- Around 5% better query performance in tested workloads.
Read the full benchmark details: Compaction Benchmark
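For readers new to compaction, the idea behind merging small files can be sketched in a few lines of Python. This is a generic first-fit-decreasing bin-packing illustration, not OLake-Fusion's actual algorithm or API. The names `DataFile` and `plan_compaction`, the 128 MB target, and the `small_ratio` threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a table data file (path and size only)."""
    path: str
    size_bytes: int


def plan_compaction(
    files: List[DataFile],
    target_bytes: int,
    small_ratio: float = 0.75,
) -> List[List[DataFile]]:
    """Group small files into rewrite groups of at most target_bytes each.

    A file counts as "small" when it is below small_ratio * target_bytes.
    Groups are built with first-fit decreasing bin packing. Groups holding
    a single file are dropped, since rewriting one file alone merges nothing.
    """
    threshold = int(target_bytes * small_ratio)
    small = sorted(
        (f for f in files if f.size_bytes < threshold),
        key=lambda f: f.size_bytes,
        reverse=True,
    )

    groups: List[List[DataFile]] = []
    used: List[int] = []  # bytes already assigned to each group
    for f in small:
        for i, size in enumerate(used):
            if size + f.size_bytes <= target_bytes:
                groups[i].append(f)
                used[i] += f.size_bytes
                break
        else:
            # No existing group has room, so open a new one.
            groups.append([f])
            used.append(f.size_bytes)

    return [g for g in groups if len(g) > 1]


if __name__ == "__main__":
    MB = 1024 * 1024
    sample = [
        DataFile("a.parquet", 10 * MB),
        DataFile("b.parquet", 20 * MB),
        DataFile("c.parquet", 100 * MB),
        DataFile("d.parquet", 120 * MB),
        DataFile("e.parquet", 60 * MB),
        DataFile("f.parquet", 5 * MB),
    ]
    for group in plan_compaction(sample, target_bytes=128 * MB):
        print([f.path for f in group])
```

Fewer, larger files mean fewer file opens and less metadata to scan per query, which is why a plan like this tends to help read latency. A production optimizer would also need to account for partitions, delete files, and concurrent commits, which this sketch ignores.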
Start with the first end-to-end setup guide:
Helpful next reads:
Community and support:
- Join us on Slack
- Ask questions and report issues via GitHub Issues
- Follow docs and updates at olake.io/docs
Contributions of all sizes are welcome.
- Core project: CONTRIBUTING.md
- UI project: OLake UI Repository
- Docs and website: OLake Docs Repository
- Contributor rewards: Bounty Program
