MINOR: perf optimization for header serialization and type conversion by mjsax · Pull Request #21762 · apache/kafka

mjsax · 2026-03-15T07:45:44Z

This PR replaces the usage of ByteArrayOutputStreams with ByteBuffers.
Some manual benchmarks show a perf improvement for the value-ts-header
and session-header converters of 3x.
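For readers unfamiliar with the two approaches, here is a minimal sketch of the kind of change described above. It is not the PR's code: the class and method names are hypothetical, and the layout (an 8-byte timestamp followed by the raw value) is only an assumed example format. Both variants produce the same bytes; the buffer variant allocates the exact size once.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration, not taken from the PR.
public class BufferVsStream {

    // Stream-based variant: writes into a growable internal array,
    // then toByteArray() copies that array once more.
    static byte[] withStream(final long timestamp, final byte[] value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(timestamp);
            out.write(value);
            out.flush();
            return bytes.toByteArray();
        } catch (final IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Buffer-based variant: the final size is known up front,
    // so a single exactly-sized array is allocated and returned.
    static byte[] withBuffer(final long timestamp, final byte[] value) {
        return ByteBuffer.allocate(Long.BYTES + value.length)
            .putLong(timestamp)
            .put(value)
            .array();
    }

    public static void main(final String[] args) {
        final byte[] value = {1, 2, 3};
        System.out.println(Arrays.equals(withStream(10L, value), withBuffer(10L, value)));
    }
}
```

Any speedup from such a change depends on the workload and should be measured, as the description does.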

mjsax · 2026-03-15T07:51:43Z

streams/src/main/java/org/apache/kafka/streams/state/internals/RecordConverters.java

            record.timestampType(),
            record.serializedKeySize(),
-            record.serializedValueSize(),
+            recordValueWithTimestamp != null ? recordValueWithTimestamp.length : 0,


Stumbled over this by chance -- it's a long-standing minor bug... Fixing on the side.

mjsax · 2026-03-15T07:51:49Z

streams/src/main/java/org/apache/kafka/streams/state/internals/RecordConverters.java

            record.timestampType(),
            record.serializedKeySize(),
-            record.serializedValueSize(),
+            recordValueWithTimestampAndHeaders != null ? recordValueWithTimestampAndHeaders.length : 0,


mjsax · 2026-03-15T07:52:14Z

streams/src/main/java/org/apache/kafka/streams/state/internals/RecordConverters.java

            record.timestampType(),
            record.serializedKeySize(),
-            record.serializedValueSize(),
+            recordValueWithHeaders != null ? recordValueWithHeaders.length : 0,


And one more
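A small sketch of why the three fixes above matter: once a converter changes the value bytes, the size reported by the original record no longer matches. The converter below is a hypothetical stand-in that prepends an 8-byte timestamp; the null handling mirrors the `!= null ? ... .length : 0` expression from the diffs.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration, not taken from the PR.
public class ConvertedValueSize {

    // Stand-in converter: prepends an 8-byte timestamp to the raw value.
    static byte[] prependTimestamp(final long timestamp, final byte[] rawValue) {
        if (rawValue == null) {
            return null;
        }
        return ByteBuffer.allocate(Long.BYTES + rawValue.length)
            .putLong(timestamp)
            .put(rawValue)
            .array();
    }

    // Size of the converted value, using 0 for null as in the diffs above.
    static int convertedSize(final byte[] converted) {
        return converted != null ? converted.length : 0;
    }

    public static void main(final String[] args) {
        final byte[] raw = {42};
        final byte[] converted = prependTimestamp(10L, raw);
        // The original size (1) is stale; the converted value has 9 bytes.
        System.out.println(raw.length + " vs " + convertedSize(converted));
    }
}
```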

mjsax · 2026-03-15T08:01:07Z

streams/src/test/java/org/apache/kafka/streams/state/internals/RecordConvertersTest.java


    private final RecordConverter timestampedValueConverter = rawValueToTimestampedValue();
    private final RecordConverter headersValueConverter = rawValueToHeadersValue();
+    private final RecordConverter sessionValueConverter = rawValueToSessionHeadersValue();


Adding missing case for session-header-store

mjsax · 2026-03-15T08:01:26Z

streams/src/test/java/org/apache/kafka/streams/state/internals/RecordConvertersTest.java

-            {50, 2, 20, 104, 101, 97, 100, 101, 114, 45, 107, 101, 121, 24, 104, 101, 97, 100, 101,
-                114, 45, 118, 97, 108, 117, 101, 0, 0, 0, 0, 0, 0, 0, 10, 0};
+            {50, 2, 20, 'h', 'e', 'a', 'd', 'e', 'r', '-', 'k', 'e', 'y', 24, 'h', 'e', 'a', 'd', 'e',
+                'r', '-', 'v', 'a', 'l', 'u', 'e', 0, 0, 0, 0, 0, 0, 0, 10, value[0]};


Found a way to make this easier to read :)
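This works because Java accepts a `char` constant in a `byte` array initializer when its value fits into a `byte`. A minimal sketch with hypothetical names, using the `header-key` portion of the expected bytes:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of the readability trick from the test change.
public class ReadableBytes {

    // Numeric form: correct, but the reader has to decode ASCII by hand.
    static byte[] numeric() {
        return new byte[] {20, 104, 101, 97, 100, 101, 114, 45, 107, 101, 121};
    }

    // Char-literal form: each constant is implicitly narrowed to a byte.
    static byte[] readable() {
        return new byte[] {20, 'h', 'e', 'a', 'd', 'e', 'r', '-', 'k', 'e', 'y'};
    }

    public static void main(final String[] args) {
        System.out.println(Arrays.equals(numeric(), readable()));
        // The 10 bytes after the leading 20 are the ASCII bytes of "header-key".
        final byte[] key = "header-key".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.equals(key, Arrays.copyOfRange(readable(), 1, 11)));
    }
}
```

The leading 20 is consistent with a zigzag-encoded varint for the key length 10, although that reading is an inference from the test data rather than something stated in the thread.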

nileshkumar3 · 2026-03-15T15:10:44Z

streams/src/main/java/org/apache/kafka/streams/state/internals/HeadersSerializer.java

+        int estimatedBufferSize = 5;
+        for (final Header header : headersArray) {
+            // adding 5 bytes for varint encoding of header-key length
+            estimatedBufferSize += 5 + header.key().length();


We should use key.getBytes(StandardCharsets.UTF_8).length.

It would be better to reuse the byte array from key.getBytes(StandardCharsets.UTF_8)

Should it not be the same?

Oh. UTF-16 vs UTF-8...

Yes, the serialized format writes UTF-8 bytes, while String.length() counts UTF-16 code units.
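A short sketch of the mismatch discussed in this thread. The class and helper names are hypothetical; the sample keys use Unicode escapes so the result does not depend on source file encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of String.length() vs. UTF-8 byte length.
public class KeyLengthDemo {

    // Number of bytes the key occupies once encoded as UTF-8.
    static int utf8Length(final String key) {
        return key.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(final String[] args) {
        final String[] keys = {
            "header-key",     // ASCII only: both lengths agree
            "schl\u00fcssel", // one 2-byte character
            "\u952e",         // one 3-byte character
            "\uD83D\uDE00"    // surrogate pair: 2 code units, 4 UTF-8 bytes
        };
        for (final String key : keys) {
            System.out.println(key.length() + " code units, " + utf8Length(key) + " UTF-8 bytes");
        }
    }
}
```

For non-ASCII keys, a buffer sized from `String.length()` can be too small, which is why the reviewers suggest sizing from the encoded bytes and reusing that array for the write.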

chia7712 · 2026-03-15T19:12:23Z

streams/src/main/java/org/apache/kafka/streams/state/internals/Utils.java

+     */
+    static ByteBuffer prepareByteBufferWithSizePrefix(final int prefix, final int bufferSize) {
+        final ByteBuffer varLengthBuffer = ByteBuffer.allocate(5); // 5 bytes for max varint encoding
+        ByteUtils.writeVarint(prefix, varLengthBuffer);


Would you leverage ByteUtils.sizeOfVarint to avoid creating the temporary buffer?

Oh. I missed that we have sizeOfVarint -- I was actually considering adding such a helper -- this simplifies things quite a bit.
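To illustrate the suggestion without depending on Kafka, the sketch below re-implements a zigzag varint size calculation and writer locally. These are stand-ins written for this example, assumed to behave like `ByteUtils.sizeOfVarint` and `ByteUtils.writeVarint`; `bufferWithSizePrefix` is a hypothetical helper, not the method from the diff.

```java
import java.nio.ByteBuffer;

// Hypothetical, self-contained illustration; not the PR's code.
public class VarintSizing {

    // Bytes needed for an unsigned varint: 7 payload bits per byte.
    static int sizeOfUnsignedVarint(int value) {
        int bytes = 1;
        while ((value & 0xffffff80) != 0) {
            bytes++;
            value >>>= 7;
        }
        return bytes;
    }

    // Signed values are zigzag-encoded first, so small negatives stay small.
    static int sizeOfVarint(final int value) {
        return sizeOfUnsignedVarint((value << 1) ^ (value >> 31));
    }

    static void writeVarint(final int value, final ByteBuffer buffer) {
        int v = (value << 1) ^ (value >> 31);
        while ((v & 0xffffff80) != 0) {
            buffer.put((byte) ((v & 0x7f) | 0x80));
            v >>>= 7;
        }
        buffer.put((byte) v);
    }

    // Allocates exactly prefix-size + payload bytes; no temporary 5-byte buffer.
    static ByteBuffer bufferWithSizePrefix(final int prefix, final int payloadSize) {
        final ByteBuffer buffer = ByteBuffer.allocate(sizeOfVarint(prefix) + payloadSize);
        writeVarint(prefix, buffer);
        return buffer;
    }

    public static void main(final String[] args) {
        final ByteBuffer buffer = bufferWithSizePrefix(10, 3);
        System.out.println(buffer.capacity() + " bytes total, " + buffer.remaining() + " left for payload");
    }
}
```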

chia7712 · 2026-03-15T19:17:51Z

streams/src/main/java/org/apache/kafka/streams/state/internals/HeadersSerializer.java

+        int estimatedBufferSize = 5;
+        for (final Header header : headersArray) {
+            // adding 5 bytes for varint encoding of header-key length
+            estimatedBufferSize += 5 + header.key().length();


It would be better to reuse the byte array from key.getBytes(StandardCharsets.UTF_8)

chia7712 · 2026-03-15T19:23:51Z

streams/src/main/java/org/apache/kafka/streams/state/internals/HeadersSerializer.java


-            ByteUtils.writeVarint(headersArray.length, out);
+        // start with 5 bytes for varint encoding of header count
+        int estimatedBufferSize = 5;


Maybe we could use sizeOfVarint to get exact size?

    int exactBufferSize = ByteUtils.sizeOfVarint(headersArray.length);
    for (final Header header : headersArray) {
        final String headerKey = header.key();
        int headerKeySize = Utils.utf8Length(headerKey);
        exactBufferSize += ByteUtils.sizeOfVarint(headerKeySize) + headerKeySize;
        final byte[] value = header.value();
        if (value == null) {
            exactBufferSize += ByteUtils.sizeOfVarint(-1);
        } else {
            exactBufferSize += ByteUtils.sizeOfVarint(value.length) + value.length;
        }
    }

mjsax · 2026-03-15T20:57:18Z

Thanks @nileshkumar3 @chia7712 -- pushed an update. Needed to restructure some parts of the code...

mjsax · 2026-03-15T22:13:56Z

streams/src/main/java/org/apache/kafka/streams/state/internals/HeadersSerializer.java


+    final static class PreSerializedHeaders {
+        final int requiredBufferSizeForHeaders;
+        final byte[][][] serializedHeaders;


This 3-dimensional array is a very bad idea... totally kills performance...

chia7712 · 2026-03-15T22:39:37Z

streams/src/test/java/org/apache/kafka/streams/state/internals/HeadersSerializerTest.java

 public class HeadersSerializerTest {

+    @Test
+    public void test() {


Wow, 500 million iterations!

Well, it gives me only a benchmark runtime of 20 sec (with the current code)...

chia7712 · 2026-03-15T22:39:52Z

streams/src/main/java/org/apache/kafka/streams/state/internals/HeadersSerializer.java

+        PreSerializedHeaders(
+            final int requiredBufferSizeForHeaders,
+            final byte[][] rawHeaderKeys,
+            final byte[][] rawHeaderValued


rawHeaderValues?

mjsax added 2 commits March 14, 2026 23:50

MINOR: perf optimization for header serialization and type conversion

aa477af

This PR replaces the usage of ByteArrayOutputStreams with ByteBuffers. Some manual benchmarks show a perf improvement for the value-ts-header and session-header converters of 3x.

cleanup

4f1e183

mjsax added the streams and kip labels Mar 15, 2026

mjsax commented Mar 15, 2026

minor

0d2b7e5

nileshkumar3 reviewed Mar 15, 2026

chia7712 reviewed Mar 15, 2026

mjsax added 2 commits March 15, 2026 12:31

review comments

44fdee1

review comments

0d1952e

mjsax added 2 commits March 15, 2026 14:26

testing

fa2f7c5

avoid expensive array operations which totally kill perf

f2883bf

mjsax commented Mar 15, 2026

chia7712 reviewed Mar 15, 2026

checkstyle and cleanup

a4a5f52

chia7712 approved these changes Mar 15, 2026

nileshkumar3 approved these changes Mar 15, 2026


3 participants
