As software/data engineers, we've witnessed Apache Iceberg revolutionize analytical data lakes with ACID transactions, time travel, and schema evolution. Yet when we try to push Iceberg into real-time workloads—sub-second streaming queries, high-frequency CDC updates, and primary key semantics—we hit fundamental architectural walls. This blog explores how Fluss × Iceberg integration works and delivers a true real-time lakehouse.
+
As software and data engineers, we've witnessed Apache Iceberg revolutionize analytical data lakes with ACID transactions, time travel, and schema evolution. Yet when we try to push Iceberg into real-time workloads such as sub-second streaming queries, high-frequency CDC updates, and primary key semantics, we hit fundamental architectural walls. This blog explores how Fluss × Iceberg integration works and delivers a true real-time lakehouse.
Apache Fluss represents a new architectural approach: the **Streamhouse** for real-time lakehouses. Instead of stitching together separate streaming and batch systems, the Streamhouse unifies them under a single architecture. In this model, Apache Iceberg continues to serve exactly the role it was designed for: a highly efficient, scalable cold storage layer for analytics, while Fluss fills the missing piece: a hot streaming storage layer with sub-second latency, columnar storage, and built-in primary-key semantics.
After working on Fluss–Iceberg lakehouse integration and deploying this architecture at massive scale, including Alibaba's 3 PB production deployment processing 40 GB/s, we're ready to share the architectural lessons learned: why existing systems fall short, how Fluss and Iceberg naturally complement each other, and what this means for finally building true real-time lakehouses.
@@ -30,7 +30,7 @@ Four converging forces are driving the need for sub-second data infrastructure:
**4. Agentic AI Requires Real-Time Context:** AI agents need immediate access to the current system state to make decisions. Whether it's autonomous trading systems, intelligent routing agents, or customer service bots, agents can't operate effectively on stale data.
@@ -44,8 +44,7 @@ Four converging forces are driving the need for sub-second data infrastructure:
Yet critical use cases demand sub-second to second-level latency: search and recommendation systems with real-time personalization, advertisement attribution tracking, anomaly detection for fraud and security monitoring, operational intelligence for manufacturing/logistics/ride-sharing, and Gen AI model inference requiring up-to-the-second features. The industry needs a **hot real-time layer** sitting in front of the lakehouse.
@@ -78,7 +77,7 @@ Traditional architectures force you to maintain **separate systems** for these z
**Query flexibility:** Run streaming queries on hot data (Fluss), analytical queries on cold data (Iceberg), or union queries that transparently span both tiers.
Union Read delivers sub-second lakehouse freshness: union delta log on Fluss, Arrow-native exchange, and seamless integration with Flink, Spark *, Trino, and StarRocks.
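To make the idea of a union query concrete, here is a minimal conceptual sketch in Python. It is not the Fluss or Iceberg API: the `Change` record, the `union_read` function, and the sample rows are all hypothetical names invented for illustration. It assumes the cold tier exposes a snapshot of rows and the hot tier exposes an ordered changelog keyed by primary key, and it shows how applying the changelog on top of the snapshot yields a fresh, deduplicated view.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Change:
    """One hot-tier changelog entry (hypothetical shape, not a Fluss format)."""
    key: int
    row: Optional[dict]  # None marks a delete for this primary key


def union_read(
    cold_snapshot: Iterable[dict],
    hot_changes: Iterable[Change],
    key_field: str = "id",
) -> List[dict]:
    """Merge a cold snapshot with newer hot changes, keyed by primary key."""
    # Start from the cold (historical) view of the table.
    merged: Dict[int, dict] = {row[key_field]: row for row in cold_snapshot}
    # Apply hot changes in order; later entries win for the same key.
    for change in hot_changes:
        if change.row is None:
            merged.pop(change.key, None)
        else:
            merged[change.key] = change.row
    # Return rows in primary-key order for a deterministic result.
    return [merged[k] for k in sorted(merged)]


# Illustrative data: two rows already tiered to cold storage.
cold = [
    {"id": 1, "status": "shipped"},
    {"id": 2, "status": "pending"},
]
# Illustrative hot changes: an update, an insert, and a delete.
hot = [
    Change(2, {"id": 2, "status": "shipped"}),
    Change(3, {"id": 3, "status": "pending"}),
    Change(1, None),
]

result = union_read(cold, hot)
```

In a real deployment the query engine performs this merge; the sketch only illustrates why primary-key semantics on the hot tier matter for getting a consistent combined view.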
@@ -374,7 +373,7 @@ This gives you a working streaming lakehouse environment in minutes. Visit: [htt
## Conclusion: The Path Forward
-
Apache Fluss and Apache Iceberg represent a fundamental rethinking of real-time lakehouse architecture. Instead of forcing Iceberg to become a streaming platform (which architecturally it was never designed to be), Fluss embraces Iceberg for its strengths—cost-efficient analytical storage with ACID guarantees—while adding the missing hot streaming layer.
+
Apache Fluss and Apache Iceberg represent a fundamental rethinking of real-time lakehouse architecture. Instead of forcing Iceberg to become a streaming platform (which architecturally it was never designed to be), Fluss embraces Iceberg for its strengths, namely cost-efficient analytical storage with ACID guarantees, while adding the missing hot streaming layer.
The result is a Streamhouse that delivers:
@@ -384,7 +383,7 @@ The result is a Streamhouse that delivers:
- **Automatic lifecycle management** from hot to cold tiers
-
For software/data engineers building real-time analytics platforms, the question isn't whether to use Fluss or Iceberg—it's recognizing they solve complementary problems. Fluss handles what happens in the last hour (streaming, updates, real-time queries). Iceberg handles everything before that (historical analytics, ML training, compliance).
+
For software/data engineers building real-time analytics platforms, the question isn't whether to use Fluss or Iceberg; it's recognizing that they solve complementary problems. Fluss handles what happens in the last hour (streaming, updates, real-time queries). Iceberg handles everything before that (historical analytics, ML training, compliance).
### When to Adopt
@@ -395,7 +394,7 @@ For software/data engineers building real-time analytics platforms, the question
- Need for primary key semantics with indexed lookups
- Large Flink stateful jobs (10TB+ state) that could be externalized
- Desire to unify real-time and historical queries
-
- Tired of maintaining dual infrastructure—one for batch, another for real-time
+
- Tired of maintaining dual infrastructure: one for batch, another for real-time
### Next Steps
@@ -407,4 +406,4 @@ For software/data engineers building real-time analytics platforms, the question
The future of real-time analytics isn't Lambda architecture with separate streaming and batch systems. It's unified lakehouse storage where hot and cold are simply tiers of the same table, with data flowing automatically between them.
-
**Apache Fluss makes this vision real—it transforms your lakehouse into a streaming lakehouse.**
+
**Apache Fluss makes this vision real: it transforms your lakehouse into a streaming lakehouse.**