
Apache Kafka 4.0: KRaft, New Features, and Migration

lyx2000 edited this page Apr 23, 2025 · 1 revision

Overview

Apache Kafka 4.0, released on March 18, 2025, represents a significant milestone in the evolution of this popular distributed event streaming platform. This major release introduces architectural transformations, feature enhancements, and performance improvements while marking the end of legacy components. The most prominent change is the complete removal of ZooKeeper dependency, with KRaft (Kafka Raft) becoming the exclusive metadata management solution. This comprehensive guide explores the new features, changes, and best practices for Apache Kafka 4.0.

Major Architectural Changes

Farewell to ZooKeeper: Full Adoption of KRaft Mode

The most significant change in Kafka 4.0 is the complete removal of Apache ZooKeeper dependency[8][9]. After serving as Kafka's backbone for over a decade, ZooKeeper is being replaced by KRaft (Kafka Raft) mode as the sole metadata management protocol[1][7][8]. This architectural shift streamlines Kafka's deployment and management processes by eliminating the need to maintain a separate ZooKeeper ensemble[8][14].

KRaft mode, which was marked production-ready for new clusters in Kafka 3.3, offers several advantages:

  • Simplified cluster management and deployment

  • Enhanced scalability for larger clusters

  • Improved reliability and metadata management

  • Reduced operational overhead

The transition to KRaft represents a fundamental paradigm shift in how Kafka manages its distributed coordination and metadata storage[17]. By integrating these functions directly into Kafka, the system becomes more self-contained and easier to manage[8].

Migration Requirements for Existing Clusters

For users planning to upgrade to Kafka 4.0, migration from ZooKeeper-based clusters to KRaft mode is mandatory[16][15]. The migration path depends on your current Kafka version:

  • If running Kafka 3.3.x through 3.9.x in KRaft mode: Direct upgrade to 4.0 is possible

  • If running Kafka 3.3.x through 3.9.x in ZooKeeper mode: Migrate to KRaft first, then upgrade to 4.0

  • If running Kafka versions older than 3.3.x: Upgrade to version 3.9.x first, then migrate to KRaft, before upgrading to 4.0[16]

Kafka 3.9 serves as the "bridge release" to Kafka 4.0. It introduced dynamic KRaft quorums, which allow controller nodes to be added or removed without downtime[1].
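The three cases above can be expressed as a small decision function. This is a minimal sketch, not an official tool: the function names and step strings are hypothetical, and the handling of a pre-3.3 cluster already running KRaft (which skips the migration step) is an assumption, since the list above does not cover that case.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '3.7.1'."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def upgrade_path(version: str, mode: str) -> list[str]:
    """List the steps needed to reach Kafka 4.0.

    mode is 'kraft' or 'zookeeper'. An empty list means the cluster
    is already on 4.0 or later.
    """
    if mode not in ("kraft", "zookeeper"):
        raise ValueError(f"unknown metadata mode: {mode!r}")

    current = parse_version(version)
    if current >= (4, 0):
        return []

    steps = []
    if current < (3, 3):
        # Older releases go through the 3.9.x bridge release first.
        steps.append("upgrade to 3.9.x")
    if mode == "zookeeper":
        # 4.0 has no ZooKeeper mode, so migration must happen on 3.x.
        steps.append("migrate to KRaft")
    steps.append("upgrade to 4.0")
    return steps
```

For example, `upgrade_path("2.8.2", "zookeeper")` yields all three steps in order, while `upgrade_path("3.7.1", "kraft")` yields only the final upgrade.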

Core Feature Enhancements

New Consumer Group Protocol (KIP-848)

Kafka 4.0 delivers the general availability of KIP-848, which introduces a powerful new consumer group protocol designed to dramatically improve rebalance performance[8][18]. This enhancement addresses one of the long-standing pain points in Kafka deployments, particularly in large-scale systems. Benefits include:

  • Significantly reduced downtime during rebalances

  • Lower latency for consumer operations

  • Enhanced reliability and responsiveness of consumer groups

  • Improved scalability for large consumer group deployments[8]
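Consumers opt in to the new protocol through client configuration. The sketch below converts a classic consumer configuration into one that sets `group.protocol=consumer`. The helper names are hypothetical, and the list of classic-only keys is an assumption that should be checked against the Kafka 4.0 consumer configuration reference before use.

```python
# Settings that belong to the classic protocol on the client side.
# Assumption: under the new protocol these are managed by the broker,
# so they are dropped from the client configuration here.
CLASSIC_ONLY_KEYS = (
    "partition.assignment.strategy",
    "session.timeout.ms",
    "heartbeat.interval.ms",
)


def to_new_group_protocol(classic_config: dict[str, str]) -> dict[str, str]:
    """Return a copy of the config that opts in to the KIP-848 protocol."""
    config = {
        key: value
        for key, value in classic_config.items()
        if key not in CLASSIC_ONLY_KEYS
    }
    config["group.protocol"] = "consumer"
    return config


def render_properties(config: dict[str, str]) -> str:
    """Render the config as sorted key=value lines for a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


classic = {
    "bootstrap.servers": "localhost:9092",  # placeholder address
    "group.id": "orders",                   # placeholder group name
    "session.timeout.ms": "45000",
}
print(render_properties(to_new_group_protocol(classic)))
```

The original dictionary is left untouched, so the classic settings remain available if a rollback is needed.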

Queues for Kafka (KIP-932)

An exciting addition to Kafka 4.0 is the early access to Queues for Kafka (KIP-932), which enables Kafka to support traditional queue semantics directly[8][18]. This feature extends Kafka's versatility beyond its traditional pub/sub model:

  • Support for point-to-point messaging patterns

  • Compatibility with traditional queue-based applications

  • Broader range of use cases without requiring additional messaging systems[8]

Client Protocol Improvements

Several KIPs (Kafka Improvement Proposals) enhance client reliability and protocol functionality:

  • KIP-1102: Enables clients to rebootstrap based on a timeout or an error code, improving resilience by proactively triggering a metadata rebootstrap when updates don't occur within a timeout period[8]

  • KIP-896: Removes old client protocol API versions, requiring broker version 2.1 or higher before upgrading Java clients to 4.0[8][16]

  • KIP-1124: Provides a clear Kafka Client upgrade path for 4.x, outlining the upgrade process for Kafka Clients, Streams, and Connect[8]
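The rebootstrap behavior described for KIP-1102 can be pictured with a toy model. This is not Kafka client code: the class, its method names, and the 300-second trigger value used in the example are all illustrative assumptions. It only shows the decision rule of falling back to the bootstrap servers when metadata has gone stale or the server signals that a rebootstrap is needed.

```python
from dataclasses import dataclass


@dataclass
class RebootstrapTracker:
    """Toy model of a timeout-or-error rebootstrap trigger."""

    trigger_ms: int          # how long metadata may go without an update
    last_update_ms: int = 0  # timestamp of the last successful update

    def record_metadata_update(self, now_ms: int) -> None:
        """Remember when metadata was last refreshed successfully."""
        self.last_update_ms = now_ms

    def should_rebootstrap(self, now_ms: int, server_requested: bool = False) -> bool:
        """Rebootstrap on an explicit server signal or on stale metadata."""
        stale = (now_ms - self.last_update_ms) >= self.trigger_ms
        return server_requested or stale


tracker = RebootstrapTracker(trigger_ms=300_000)
tracker.record_metadata_update(now_ms=1_000)
print(tracker.should_rebootstrap(now_ms=200_000))  # metadata still fresh
print(tracker.should_rebootstrap(now_ms=301_000))  # timeout exceeded
```

Passing timestamps in explicitly, rather than reading a clock inside the class, keeps the rule easy to test.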

Technical Requirements and Dependencies

Java Version Updates

Kafka 4.0 raises the minimum Java version requirements across its components:

  • Kafka Clients and Kafka Streams now require Java 11

  • Kafka Brokers, Connect, and Tools now require Java 17[8]

These changes align Kafka with modern Java development practices and security requirements.
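A pre-upgrade check for these minimums might look like the following sketch. The component keys and function names are hypothetical; the version numbers come from the list above. The parser handles only plain numeric version strings such as `1.8.0_392` or `17.0.9`.

```python
# Minimum Java major version per Kafka 4.0 component, from the list above.
MIN_JAVA = {
    "clients": 11,
    "streams": 11,
    "broker": 17,
    "connect": 17,
    "tools": 17,
}


def java_major(version: str) -> int:
    """Extract the major version from strings like '1.8.0_392' or '17.0.9'."""
    parts = version.split(".")
    # Java 8 and earlier report themselves as 1.x.
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])
    return int(parts[0])


def unsupported_components(java_version: str, components: list[str]) -> list[str]:
    """Return the components that need a newer Java than the one installed."""
    major = java_major(java_version)
    return [name for name in components if MIN_JAVA[name] > major]


print(unsupported_components("11.0.21", ["clients", "streams", "broker"]))
```

With Java 11 installed, only the broker is reported, which matches the split between client-side and server-side requirements.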

Logging Framework Transition

The Log4j appender, deprecated in earlier versions, is completely removed in Kafka 4.0, completing the transition to Log4j2[1][14][15]. This change:

  • Moves Kafka off the end-of-life Log4j 1.x line, which no longer receives security fixes

  • Aligns with modern logging practices

  • Improves logging performance and capabilities[1]

Removal of Legacy Components

MirrorMaker 1, which was deprecated in Kafka 3.0, is officially removed in Kafka 4.0[1][15]. Users must migrate to MirrorMaker 2 or alternative mirroring tools before upgrading to version 4.0.

Kafka Streams Enhancements

Foreign Key Extraction (KIP-1104)

KIP-1104 enhances Kafka Streams by allowing foreign keys to be extracted directly from both record keys and values[8]. This removes the need to duplicate keys into values for foreign-key joins, providing:

  • Simplified joins

  • Reduced storage overhead

  • More intuitive developer experience[8]

Custom Processor Wrapping (KIP-1112)

The introduction of the ProcessorWrapper interface (KIP-1112) simplifies the application of cross-cutting logic in Kafka Streams[8]. This feature:

  • Enables seamless injection of custom logic around Processor API and DSL processors

  • Eliminates redundancy

  • Reduces maintenance overhead from manually integrating logic into each processor[8]

Error Handling and Metrics Improvements

Additional enhancements to Kafka Streams include:

  • KIP-1065: Adds a "RETRY" option to ProductionExceptionHandler, allowing users to break retry loops with customizable error handling[8]

  • KIP-1091: Improves Kafka Streams operator metrics, adding state metrics for each StreamThread and client instance for detailed visibility into application state[8]

Configuration and Performance Optimization

Key Configuration Parameters

| Parameter | Description | Recommendation |
|---|---|---|
| replica.lag.time.max.ms | Controls when replicas are considered out of sync | Balance between reliability and performance; tune based on network conditions[11] |
| num.network.threads | Number of threads for network requests | Increase for high-throughput scenarios[11] |
| num.io.threads | Number of threads for disk I/O | Adjust based on storage performance[11] |
| segment.ms | Time before rolling to a new log segment | Avoid setting too low to prevent "Too many open files" errors[6] |
| request.timeout.ms | How long clients wait for broker response | Default is 30 seconds; setting too low can cause unnecessary failures[6] |

Common Pitfalls to Avoid

  1. Setting request.timeout.ms too low: This can cause unnecessary client failures when brokers are under load[6]

  2. Misunderstanding producer retries: Configure retries appropriately to ensure message delivery reliability[6]

  3. Excessive partitioning: While more partitions enable parallelism, they increase replication latency and server overhead[11]

  4. Setting segment.ms too low: This creates many small segment files, potentially causing file handle exhaustion and performance degradation[6]

  5. Unmonitored broker metrics: Regularly monitor key metrics like network throughput, open file handles, and JVM stats[2]
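Pitfalls 1 and 4 lend themselves to an automated check. The sketch below flags suspicious values in a configuration dictionary. The function name and the one-hour floor for `segment.ms` are hypothetical choices made for illustration; only the 30-second `request.timeout.ms` default comes from the table above.

```python
DEFAULT_REQUEST_TIMEOUT_MS = 30_000   # default cited in the table above
MIN_SEGMENT_MS = 60 * 60 * 1000       # hypothetical floor: one hour


def lint_config(config: dict[str, int]) -> list[str]:
    """Return warnings for settings that match the pitfalls listed above."""
    warnings = []

    request_timeout = config.get("request.timeout.ms")
    if request_timeout is not None and request_timeout < DEFAULT_REQUEST_TIMEOUT_MS:
        warnings.append(
            f"request.timeout.ms={request_timeout} is below the 30 s default; "
            "clients may fail while brokers are under load"
        )

    segment_ms = config.get("segment.ms")
    if segment_ms is not None and segment_ms < MIN_SEGMENT_MS:
        warnings.append(
            f"segment.ms={segment_ms} rolls segments very often; "
            "watch for open file handle exhaustion"
        )

    return warnings


for warning in lint_config({"request.timeout.ms": 5_000, "segment.ms": 60_000}):
    print(warning)
```

Settings that are absent from the dictionary are skipped, so the check can run against partial overrides as well as full configurations.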

Topic Management Best Practices

Single Topic vs. Multiple Topics

When designing Kafka applications, a common question is whether to use a single topic for multiple entities or create separate topics. The best practice is typically to use a single topic with appropriate partitioning rather than creating many topics[12]. This approach:

  • Avoids hitting limits on the number of topics

  • Reduces file handle consumption

  • Improves overall cluster performance

  • Simplifies management and monitoring[12]
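With a single shared topic, per-entity ordering depends on keyed partitioning: every record for the same entity must land on the same partition. The toy partitioner below illustrates the idea. It uses CRC32 purely as a stand-in hash, which is an assumption made for a dependency-free example and is not the hash Kafka's own partitioner uses, so the partition numbers it produces will not match a real cluster.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition with a stable stand-in hash (CRC32)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def group_by_partition(keys: list[str], num_partitions: int) -> dict[int, list[str]]:
    """Show which keys share a partition, preserving arrival order."""
    assignment = defaultdict(list)
    for key in keys:
        assignment[partition_for(key, num_partitions)].append(key)
    return dict(assignment)


# Hypothetical entity keys: two customers sharing one topic.
events = ["customer-1", "customer-2", "customer-1", "customer-2", "customer-1"]
print(group_by_partition(events, num_partitions=6))
```

Because the mapping depends only on the key and the partition count, all events for one customer stay in order on one partition, while different customers can be processed in parallel.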

Partition Sizing Considerations

The number of partitions directly impacts performance and scalability:

  • More partitions: Greater parallelization and throughput

  • Too many partitions: Increased replication latency, longer rebalances, more open server files[2]

A balanced approach based on throughput requirements and consumer parallelism is recommended.
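One widely used rule of thumb, which is not taken from the sources cited here, sizes a topic by dividing the target throughput by the measured per-partition throughput on the producer side and on the consumer side, then taking the larger result. A minimal sketch, with a hypothetical function name and example figures:

```python
import math


def estimate_partitions(
    target_mb_per_s: float,
    producer_mb_per_s_per_partition: float,
    consumer_mb_per_s_per_partition: float,
) -> int:
    """Estimate a partition count from throughput measurements.

    The per-partition figures should come from benchmarks on your own
    hardware; the result is a starting point, not a guarantee.
    """
    if min(target_mb_per_s,
           producer_mb_per_s_per_partition,
           consumer_mb_per_s_per_partition) <= 0:
        raise ValueError("all throughput values must be positive")

    for_producers = math.ceil(target_mb_per_s / producer_mb_per_s_per_partition)
    for_consumers = math.ceil(target_mb_per_s / consumer_mb_per_s_per_partition)
    # The slower side determines how many partitions are needed.
    return max(for_producers, for_consumers)


print(estimate_partitions(100, 50, 8))
```

In this example consumers are the bottleneck at 8 MB/s per partition, so the estimate is driven by the consumer side. The count should still be weighed against the costs of too many partitions listed above.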

Framework and Ecosystem Integration

Spring for Apache Kafka Support

Spring for Apache Kafka is being updated to support Kafka 4.0:

  • Spring for Apache Kafka 4.0.0-M1 is available, compatible with Spring Framework 7.0.0-M3[10]

  • Spring for Apache Kafka 3.3.4 is compatible with Kafka Client 3.9.0[10]

  • Spring Boot users can override Kafka Client versions as needed[10]

Comparison with Redpanda

Redpanda, a Kafka-compatible alternative written in C++, offers a different implementation with similar capabilities:

| Feature | Kafka | Redpanda |
|---|---|---|
| Protocol Compatibility | Native | Compatible with Kafka protocol |
| Language | Java | C++ |
| Metadata Management | KRaft (formerly ZooKeeper) | Built-in Raft consensus |
| Messaging Capabilities | Pub/sub with partition-level ordering | Same as Kafka |
| Ecosystem | Large, mature ecosystem | Compatible with Kafka ecosystem |
| Stream Processing | Kafka Streams, ksqlDB | Limited native capabilities |

While Redpanda aims to be compatible with the Kafka protocol, there's no guarantee that all Kafka components will behave the same way when used with Redpanda[5].

Conclusion

Apache Kafka 4.0 represents a significant evolution of the platform, with the removal of ZooKeeper dependency being the most transformative change. The adoption of KRaft mode simplifies deployment while improving scalability and reliability. Other enhancements like the new consumer group protocol, queue support, and Streams API improvements further expand Kafka's capabilities.

Organizations planning to upgrade should carefully assess their current deployments, ensure compatibility with the new requirements, and consider a phased migration approach. With these changes, Kafka continues to evolve as a central component in modern data architectures, maintaining its position as the leading distributed event streaming platform.

As with any major release, testing in a non-production environment before upgrading production systems is strongly recommended, with particular attention to the ZooKeeper to KRaft migration process for existing clusters.


References:

  1. Introduction to Kafka 4.0

  2. Apache Kafka Best Practices to Optimize Your Deployment

  3. Top Four Features in Confluent Platform 5.4

  4. Conduktor Targets Kafka Developers

  5. Redpanda vs Kafka Comparison

  6. 5 Common Pitfalls When Using Apache Kafka

  7. Major Update: Apache Kafka 4.0 to Arrive with Breaking Changes

  8. Apache Kafka Blog

  9. Released from the Cage: Apache Kafka without its ZooKeeper

  10. Spring Kafka 4.0.0-M1, 3.3.4, and 3.2.8 Available Now

  11. Top 10 Kafka Configuration Tweaks for Performance

  12. Best Practice for Topic Management in Kafka

  13. Introducing Confluent Platform 4.0

  14. Red Hat Apache Kafka Documentation

  15. Taming Apache Kafka 4.0

  16. Kafka Upgrade Guide

  17. What to Expect from Apache Kafka in 2025

  18. Apache Kafka 4.0 Released

  19. Apache Kafka Documentation

  20. Apache Kafka 4.0 is Here

  21. Queue Support in Apache Kafka 4.0

  22. Industrial IoT Reference Architecture

  23. AIOKafka Issues

  24. Introducing Apache Kafka 4.0

  25. Queues on Kafka

  26. Confluent Kafka 4.0 Release Notes

  27. ZooKeeper with Kafka Guide

  28. Migration from ZooKeeper to KRaft

  29. Kafka-Go Issues

  30. ZooKeeper-less Kafka

  31. Apache Kafka 4.0.0 Release Notes

  32. Configuring Kafka in WSO2

  33. Kafka Connection Setup Guide

  34. Kafka Tutorial: Running Without ZooKeeper

  35. Kafka 4.0 New Features and Enhancements

  36. Modifying Kafka Partitions and Replicas

  37. Kafka vs Redpanda Performance Analysis

  38. Oracle Kafka Configuration Guide

  39. Kafka Streams Upgrade Guide

  40. Kafka 4.0.0 Release Notes

  41. Redpanda Migration Guide

  42. Kafka Producer Configuration Guide

  43. Kafka Connector Migration Guide

  44. Spring Kafka Release Announcement

  45. Managed Apache Kafka as a Service

  46. Apache Kafka in Industrial IoT and Manufacturing

  47. A 2-Minute Overview of Apache Kafka 4.0

  48. KIP-932: Queues for Kafka

  49. Industry 4.0: Siemens Brose and Confluent

  50. Planning Your Kafka 4.0 Upgrade

  51. Apache Kafka 3.9 Documentation

  52. Troubleshooting Kafka Conduktor Connection

  53. NetApp Kafka Best Practice Guidelines

  54. Conduktor Console Best Practices

  55. Apache Kafka 4.0 Release Overview

  56. Why Apache Kafka in Manufacturing and Industry 4.0

  57. Apache Kafka in Manufacturing: Industry 4.0 Use Cases
