OLake Go is a high-performance, open-source data ingestion engine for replicating databases, S3, and Kafka into
Apache Iceberg (or plain Parquet).
Built for scalable, real-time pipelines, OLake Go provides a simple web UI and CLI for ingesting data into Iceberg tables that are free of vendor lock-in and work with all query engines and warehouses.
Read the docs and benchmarks at olake.io/docs.
Join our active community on Slack.
Note
🎉 OLake Fusion is now live! — Automate your Apache Iceberg Table Maintenance. Check it out here → github.com/datazip-inc/olake-fusion 🎉
OLake Go supports replication from transactional databases such as PostgreSQL, MySQL, MongoDB, Oracle, DB2, and MSSQL, event-streaming systems like Apache Kafka, and object stores like S3 into open data lakehouse formats such as Apache Iceberg or plain Parquet, delivering blazing-fast performance with minimal infrastructure cost.
- 🧠 Smart sync: Full + CDC replication with automatic schema discovery & schema evolution
- ⚡ High throughput: 580K RPS (Postgres) & 338K RPS (MySQL)
- ➡️ Exactly-once delivery & Arrow writes: accuracy with speed
- 💾 Iceberg-native: Supports Glue, Hive, JDBC, REST catalogs
- 🖥️ Self-serve UI: Deploy via Docker Compose and sync in minutes
- 💸 Infra-light: No Spark, no Flink, no Kafka, no Debezium
| Source → Destination | Full Load | Relative Performance (Full Load) | Full Report |
|---|---|---|---|
| Postgres → Iceberg (as of 30th Jan 2026) | 5,80,113 RPS | 12.5× faster than Fivetran | Full Report |
| MySQL → Iceberg (as of 30th May 2026) | 1,39,773 RPS | 1.91× faster than Fivetran | Full Report |
| MongoDB → Iceberg (as of 5th Feb 2026) | 37,879 RPS | - | Full Report |
| Oracle → Iceberg (as of 30th Jan 2026) | 5,26,337 RPS | - | Full Report |
| Kafka → Iceberg (as of 27th Feb 2026) | 2,09,065 MPS (Bounded Incremental) | 1.23× slower than Flink | Full Report |
| MSSQL → Iceberg (as of 09th June 2026) | 3,45,866 MPS | 4.32× faster than Fivetran | Full Report |
| Source → Destination | CDC | Relative Performance (CDC) | Full Report |
|---|---|---|---|
| Postgres → Iceberg (as of 30th Jan 2026) | 55,555 RPS | 2× faster than Fivetran | Full Report |
| MySQL → Iceberg (as of 30th May 2026) | 59,951 RPS | 1.52× faster than Fivetran | Full Report |
| MongoDB → Iceberg (as of 5th Feb 2026) | 10,692 RPS | - | Full Report |
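The throughput figures above use Indian-style digit grouping (5,80,113 is 580,113). As a minimal sketch, the Go snippet below strips the separators and compares full-load and CDC throughput for one source. The helper names `parseGrouped` and `ratio` are illustrative and not part of OLake Go.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseGrouped converts a comma-grouped figure such as "5,80,113" into an
// integer by dropping the separators. (Hypothetical helper, not an OLake API.)
func parseGrouped(s string) (int, error) {
	return strconv.Atoi(strings.ReplaceAll(strings.TrimSpace(s), ",", ""))
}

// ratio reports how many times larger a is than b.
func ratio(a, b int) float64 {
	return float64(a) / float64(b)
}

func main() {
	// Postgres figures copied from the two benchmark tables.
	full, err := parseGrouped("5,80,113")
	if err != nil {
		panic(err)
	}
	cdc, err := parseGrouped("55,555")
	if err != nil {
		panic(err)
	}
	fmt.Println(full)                     // prints 580113
	fmt.Printf("%.2f\n", ratio(full, cdc)) // prints 10.44
}
```

For Postgres, the reported full-load rate is therefore roughly ten times the reported CDC rate.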
| Source | Full Load | CDC | Incremental | Notes | Documentation |
|---|---|---|---|---|---|
| PostgreSQL | ✅ | ✅ pgoutput | ✅ | wal2json deprecated | Postgres Docs |
| MySQL | ✅ | ✅ | ✅ | Binlog-based CDC | MySQL Docs |
| MongoDB | ✅ | ✅ | ✅ | Oplog-based CDC | MongoDB Docs |
| Oracle | ✅ | WIP | ✅ | JDBC based Full Load & Incremental | Oracle Docs |
| DB2 | ✅ | - | ✅ | JDBC based Full Load & Incremental | DB2 Docs |
| MSSQL | ✅ | ✅ | ✅ | Full Load, CDC & Incremental | MSSQL Docs |
| Source | Full Load | CDC | Incremental | Notes | Documentation |
|---|---|---|---|---|---|
| S3 | ✅ | - | ✅ | Ingests from Amazon S3 or S3-compatible (MinIO, LocalStack) | S3 Docs |
| Source | Bounded Incremental | Notes | Documentation |
|---|---|---|---|
| Kafka | ✅ | Latest offset bounded incremental sync | Kafka Docs |
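When scripting around these tables, it can help to encode the support matrix so that an unsupported combination is rejected before a job is configured. The sketch below mirrors the three source tables above. The `Mode` type, its constant values, and the `Supports` function are hypothetical names chosen for illustration, not identifiers from OLake Go.

```go
package main

import "fmt"

// Mode is an illustrative label for a sync mode (not an OLake Go type).
type Mode string

const (
	FullLoad           Mode = "full_load"
	CDC                Mode = "cdc"
	Incremental        Mode = "incremental"
	BoundedIncremental Mode = "bounded_incremental"
)

// supported mirrors the source tables above. Oracle CDC is listed as WIP,
// so it is treated as unsupported here.
var supported = map[string]map[Mode]bool{
	"PostgreSQL": {FullLoad: true, CDC: true, Incremental: true},
	"MySQL":      {FullLoad: true, CDC: true, Incremental: true},
	"MongoDB":    {FullLoad: true, CDC: true, Incremental: true},
	"Oracle":     {FullLoad: true, Incremental: true},
	"DB2":        {FullLoad: true, Incremental: true},
	"MSSQL":      {FullLoad: true, CDC: true, Incremental: true},
	"S3":         {FullLoad: true, Incremental: true},
	"Kafka":      {BoundedIncremental: true},
}

// Supports reports whether the tables list mode as available for source.
// Unknown sources and modes yield false.
func Supports(source string, mode Mode) bool {
	return supported[source][mode]
}

func main() {
	fmt.Println(Supports("PostgreSQL", CDC)) // prints true
	fmt.Println(Supports("Oracle", CDC))     // prints false
	fmt.Println(Supports("Kafka", FullLoad)) // prints false
}
```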
| Destination | Format | Supported Catalogs |
|---|---|---|
| Iceberg | ✅ | Glue, Hive, JDBC, REST (Nessie, Polaris, Unity, Lakekeeper, AWS S3 tables) |
| Parquet | ✅ | Filesystem |
- Apache Iceberg Docs
  - Catalogs
    - AWS Glue Catalog
    - REST Catalog
      - Generic
      - Lakekeeper
      - Nessie
      - S3 Tables
      - Unity
      - Apache Polaris
    - JDBC Catalog
    - Hive Catalog
  - Azure ADLS Gen2
  - Google Cloud Storage (GCS)
  - MinIO (local)
- Parquet Writer
  - AWS S3 Docs
  - Google Cloud Storage (GCS)
  - Local FileSystem Docs
OLake UI is a web-based interface for managing OLake Go jobs, sources, destinations and configurations. You can run the entire OLake Go stack (UI, Backend, and all dependencies) using Docker Compose. This is the recommended way to get started. Run the UI, connect your source DB, and start syncing in minutes.
curl -sSL https://raw.githubusercontent.com/datazip-inc/olake-ui/master/docker-compose.yml | docker compose -f - up -d

Access the UI:
- OLake UI: http://localhost:8000
- Log in with default credentials: admin/password
A detailed getting-started guide for OLake UI can be found here.
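The containers can take a little while to come up after the Compose command returns. As a rough sketch, the Go program below polls the UI address until it answers. It assumes only that something responds over HTTP at http://localhost:8000, and the `waitForHTTP` helper is hypothetical rather than part of OLake.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// waitForHTTP polls url until it returns any HTTP response or the attempts
// run out. It does not inspect the status code. (Hypothetical helper.)
func waitForHTTP(url string, attempts int, delay time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	client := &http.Client{Timeout: 2 * time.Second}
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := client.Get(url)
		if err == nil {
			resp.Body.Close()
			return nil
		}
		lastErr = err
		time.Sleep(delay)
	}
	return fmt.Errorf("%s not reachable after %d attempts: %w", url, attempts, lastErr)
}

func main() {
	// Address taken from the instructions above; retry values are arbitrary.
	if err := waitForHTTP("http://localhost:8000", 30, 2*time.Second); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("OLake UI is reachable at http://localhost:8000")
}
```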
With the UI running, you can create a data pipeline in a few steps:
- Configure Source: Navigate to the Source tab and click Create Source. Set up your source connection.
- Configure Destination: Navigate to the Destination tab and click Create Destination. Set up your destination.
- Create a Job: Navigate to the Jobs tab and click Create Job.
- Configure & Run: Give your job a name, set a schedule, select your source and destination, and click Next to finish.
- Select Streams: Choose which tables to sync and configure their sync mode (CDC, Full Refresh, or Incremental).
For a detailed walkthrough, refer to the Jobs documentation.
For advanced users and automation, OLake Go's core logic is exposed via a powerful CLI. The core framework handles state management, configuration validation, logging, and type detection. It interacts with drivers using five main commands:
- spec: Returns a renderable JSON Schema for a connector's configuration.
- check: Validates connection configurations for sources and destinations.
- discover: Returns all available streams (e.g., tables) and their schemas from a source.
- sync: Executes the data replication job, extracting from the source and writing to the destination.
- clear-destination: Clears data in the destination, only for the selected streams defined in streams.json.
Find out more about CLI here.
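A wrapper script may want to reject a mistyped command before handing it to a driver. The sketch below keeps the five command names and their descriptions from the list above in a lookup table. It does not invoke OLake Go, and `commandHelp`, `commandOrder`, and `validateCommand` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// commandHelp maps each CLI command named above to a short description.
var commandHelp = map[string]string{
	"spec":              "return a JSON Schema for a connector's configuration",
	"check":             "validate source and destination connection configs",
	"discover":          "list available streams and their schemas",
	"sync":              "run the replication job",
	"clear-destination": "clear destination data for the selected streams",
}

// commandOrder lists the commands in the order the README presents them.
var commandOrder = []string{"spec", "check", "discover", "sync", "clear-destination"}

// validateCommand returns an error when name is not one of the five commands.
func validateCommand(name string) error {
	if _, ok := commandHelp[name]; !ok {
		return fmt.Errorf("unknown command %q", name)
	}
	return nil
}

func main() {
	for _, name := range commandOrder {
		fmt.Printf("%-18s %s\n", name, commandHelp[name])
	}
	if err := validateCommand("synk"); err != nil {
		fmt.Println(err) // prints unknown command "synk"
	}
}
```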
You can also run OLake Go in the following ways:
- OLake Go UI (Recommended)
- Kubernetes using Helm
- Standalone Docker container
- Airflow on EC2
- Airflow on Kubernetes
- OLake Go + Apache Iceberg + REST Catalog + Presto
- OLake Go + Apache Iceberg + AWS Glue + Trino
- OLake Go + Apache Iceberg + AWS Glue + Athena
- OLake Go + Apache Iceberg + AWS Glue + Snowflake
- OLake Go + Apache Iceberg + REST Catalog + Spark
- ✅ Migrate from OLTP to Iceberg without Spark or Flink
- ✅ Enable BI over fresh CDC data using Athena, StarRocks, Trino, Presto, Dremio, Databricks, Snowflake and more!
- ✅ Build a near real-time data lakehouse on cost-efficient cloud object stores
- ✅ Move away from vendor-locked warehouses or tools to an open data lakehouse
- ✅ Single copy for both analytics & machine learning
- Oracle Full Load Support
- Oracle Incremental
- Filters for Full Load and Incremental
- Interoperability (Coming Soon)
- Iceberg V3 Support
We ❤️ contributions, big or small!
Check out our Bounty Program. A huge thanks to all our amazing contributors!
- To contribute to OLake Go, see CONTRIBUTING.md.
- To contribute to the UI, visit the OLake UI Repository.
- To contribute to OLake Helm, visit the OLake Helm Repository.
- To contribute to our website and documentation, visit the OLake Docs Repository.
