KAFKA-19648; Cluster metadata bootstrapping with kraft checkpoint#20707

Merged
jsancio merged 102 commits into apache:trunk from mannoopj:kip-1170-format
May 27, 2026

Conversation

Contributor

@mannoopj mannoopj commented Oct 15, 2025

Previously, bootstrap metadata was stored in a separate
bootstrap.checkpoint file, while the zero checkpoint contained only
KRaft control records. This change unifies them by having the Formatter
append bootstrap metadata records into the zero checkpoint alongside the
existing KRaft control records, integrating with KRaft's bootstrapping
checkpoint mechanisms like RaftClient.Listener#handleLoadBootstrap and
KIP-630 snapshot lifecycle management.

QuorumController's handleLoadBootstrap now extracts bootstrap records
from the zero checkpoint and stores them as BootstrapMetadata, which is
later committed by ActivationRecordsGenerator when the controller
activates on an empty metadata log.

The BootstrapDirectory class is removed and its functionality
consolidated into static methods on BootstrapMetadata: fromDirectory
reads from the legacy bootstrap.checkpoint (falling back to defaults),
and fromCheckpointFile reads from a specific checkpoint path.
StorageTool now only writes the bootstrap snapshot when the node has the
Controller role. KafkaClusterTestKit is updated to pass bootstrap records
other than feature-version and SCRAM records to the Formatter as
additional bootstrap records.

Reviewers: José Armando García Sancio <jsancio@apache.org>, Kevin Wu
<kevin.wu2412@gmail.com>

@github-actions github-actions Bot added triage PRs from the community kraft small Small PRs labels Oct 15, 2025
@mannoopj mannoopj changed the title from "KIP-1170: Formatter changes" to "KAFKA-19648: Formatter refactoring" Oct 15, 2025
Contributor

@kevin-wu24 kevin-wu24 left a comment

Thanks for the changes @mannoopj. Some high level comments:

Comment on lines 522 to 540
Contributor

@kevin-wu24 kevin-wu24 Oct 15, 2025

There is a clearer way to make these changes. This method is what currently writes the 0-0.checkpoint. We should pass the bootstrap metadata object here and append the metadata records using writer.append(bootstrapMetadata.records()) before calling writer.freeze().

Comment on lines +448 to +449
if (directoryTypes.get(writeLogDir).isDynamicMetadataDirectory()) {
writeDynamicQuorumSnapshot(writeLogDir,
writeDynamicQuorumSnapshot(clusterMetadataDirectory.getPath(),
Contributor

@kevin-wu24 kevin-wu24 Oct 15, 2025

We should rename writeDynamicQuorumSnapshot to writeZeroSnapshot, and write it when formatting metadata directories. The semantics that change here are that we should not write the KRaft control records (KRaft version and voter set) when !isDynamicMetadataDirectory().

Comment on lines +453 to +472
File createdBoostrapCheckpoint = new File(clusterMetadataDirectory.getPath() + "/" + BootstrapDirectory.BINARY_BOOTSTRAP_FILENAME);
File created000Checkpoint = new File(clusterMetadataDirectory.getPath() + "/" + BootstrapDirectory.BINARY_CHECKPOINT_FILENAME);
Files.write(
createdBoostrapCheckpoint.toPath(),
Files.readAllBytes(created000Checkpoint.toPath()),
StandardOpenOption.APPEND);
try {
created000Checkpoint.delete();
createdBoostrapCheckpoint.renameTo(created000Checkpoint);
} catch (Exception ex) {
throw new RuntimeException("Failed operation to combine metadata and kraft records: ", ex);
}
} else {
File createdBoostrapCheckpoint = new File(clusterMetadataDirectory.getPath() + "/" + BootstrapDirectory.BINARY_BOOTSTRAP_FILENAME);
File created000Checkpoint = new File(clusterMetadataDirectory.getPath() + "/" + BootstrapDirectory.BINARY_CHECKPOINT_FILENAME);
try {
createdBoostrapCheckpoint.renameTo(created000Checkpoint);
} catch (Exception ex) {
throw new RuntimeException("Failed to rename file: ", ex);
}
Contributor

This code is confusing. Instead of renaming and deleting files, we should remove the call that writes the bootstrap metadata to disk on Line 447, since we no longer write bootstrap.checkpoint, and follow the other comments for writing metadata records to 0-0.checkpoint.

We can check if an old bootstrap.checkpoint exists and delete it, since IIRC that was part of the KIP.

@github-actions github-actions Bot removed the triage PRs from the community label Oct 16, 2025
@github-actions github-actions Bot added the core Kafka Broker label Oct 20, 2025
Contributor

@kevin-wu24 kevin-wu24 left a comment

Thanks for the changes @mannoopj. Left some more comments

Comment on lines 513 to 514
try (RecordsSnapshotWriter<ApiMessageAndVersion> writer = builder.build(new MetadataRecordSerde(), Optional.of(bootstrapMetadata.records()))) {
writer.freeze();
Contributor

Suggested change
try (RecordsSnapshotWriter<ApiMessageAndVersion> writer = builder.build(new MetadataRecordSerde(), Optional.of(bootstrapMetadata.records()))) {
writer.freeze();
try (RecordsSnapshotWriter<ApiMessageAndVersion> writer = builder.build(new MetadataRecordSerde())) {
writer.append(bootstrapMetadata.records());
writer.freeze();

Contributor Author

@mannoopj mannoopj Oct 20, 2025

Wouldn't this write the bootstrap records after the control records? RecordsSnapshotWriter.build() is where we append the KRaft records, and in this scenario we would be calling it before writer.append. We want the bootstrap records first, correct?

Contributor

It doesn't matter what order we write them in because KRaft only reads the control records, and the metadata module will only read the "data" records. When we read the 0-0.checkpoint back into memory, we only deal with either its control records or its data records, not both in the same code.
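
The point about ordering can be illustrated with a small, self-contained sketch. It does not use the real Kafka classes: `Batch`, `controlRecords` and `dataRecords` are hypothetical stand-ins, assuming only that each reader filters the checkpoint down to the kind of record it cares about.

```java
import java.util.List;

final class ZeroCheckpointReaders {
    // Hypothetical stand-in for a checkpoint batch: either control or data.
    record Batch(boolean isControl, List<String> records) {}

    // What a KRaft-side reader would keep: control records only.
    static List<String> controlRecords(List<Batch> checkpoint) {
        return checkpoint.stream()
            .filter(Batch::isControl)
            .flatMap(b -> b.records().stream())
            .toList();
    }

    // What a metadata-side reader would keep: data records only.
    static List<String> dataRecords(List<Batch> checkpoint) {
        return checkpoint.stream()
            .filter(b -> !b.isControl())
            .flatMap(b -> b.records().stream())
            .toList();
    }

    public static void main(String[] args) {
        Batch control = new Batch(true, List.of("KRaftVersionRecord", "VotersRecord"));
        Batch data = new Batch(false, List.of("FeatureLevelRecord"));
        List<Batch> controlFirst = List.of(control, data);
        List<Batch> dataFirst = List.of(data, control);

        // Each reader sees the same records regardless of batch order.
        System.out.println(controlRecords(controlFirst).equals(controlRecords(dataFirst)));
        System.out.println(dataRecords(controlFirst).equals(dataRecords(dataFirst)));
    }
}
```

Under that assumption, swapping the two batches changes nothing for either reader, which is why appending the bootstrap records after `build()` is fine.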

.setKraftVersion(KRaftVersion.KRAFT_VERSION_1)
.setVoterSet(Optional.of(VoterSetTest.voterSet(VoterSetTest.voterMap(IntStream.of(1, 2, 3), true))))
.build(MetadataRecordSerde.INSTANCE)
.build(MetadataRecordSerde.INSTANCE, emptyOptional)
Contributor

@kevin-wu24 kevin-wu24 Oct 20, 2025

We should be able to remove this. This applies to other instances where we build the RecordsSnapshotWriter.

}
writeBoostrapSnapshot(writeLogDir,
bootstrapMetadata,
initialControllers.get(),
Contributor

We need to pass the optional for initialControllers here. We can only do a .get() if initialControllers.isPresent().

Contributor

We should only construct the VoterSet object if initialControllers.isPresent().
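
A minimal sketch of the pattern being asked for, using only `java.util.Optional`. The `VoterSet` record and the `voterSetFor` helper are hypothetical stand-ins, not the real Kafka types; the idea is simply to keep the value wrapped and map over it instead of calling `.get()` unconditionally.

```java
import java.util.List;
import java.util.Optional;

final class InitialControllersExample {
    // Hypothetical stand-in for the real VoterSet type.
    record VoterSet(List<Integer> voterIds) {}

    // Build a voter set only when initial controllers were supplied.
    static Optional<VoterSet> voterSetFor(Optional<List<Integer>> initialControllers) {
        return initialControllers.map(VoterSet::new);
    }

    public static void main(String[] args) {
        System.out.println(voterSetFor(Optional.of(List.of(1, 2, 3))).isPresent());
        System.out.println(voterSetFor(Optional.empty()).isPresent());
    }
}
```

With this shape, the empty case flows through without a `NoSuchElementException`, and the caller decides what an absent voter set means.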

@github-actions github-actions Bot added clients and removed small Small PRs labels Oct 24, 2025
Comment on lines +1030 to +1033
// For bootstrap snapshots, extract feature levels from all data records
if (batch.controlRecords().isEmpty()) {
bootstrapMetadata = BootstrapMetadata.fromRecords(messages, "bootstrap");
}
Contributor

The correct logic here is:
If the batch has records, read them into bootstrapMetadata (this means 0-0.checkpoint has bootstrap metadata records).
If the batch doesn't have records, try to read the bootstrapMetadata from bootstrap.checkpoint.

Contributor

@kevin-wu24 kevin-wu24 left a comment

Thanks for the changes @mannoopj. Review of the metadata layer implementation:

Comment on lines +63 to +66
if (level > 0) {
records.add(new ApiMessageAndVersion(new FeatureLevelRecord().
setName(featureName).
setFeatureLevel(level), (short) 0));
}
// Include all feature levels, including level 0 which may disable features
records.add(new ApiMessageAndVersion(new FeatureLevelRecord().
setName(featureName).
setFeatureLevel(level), (short) 0));
Contributor

This should not change. The default level of features is 0, and that is why we don't add a record for them when the level is 0.
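
To make the convention concrete, here is a small sketch under stated assumptions: `FeatureLevel`, `toRecords` and `levelOf` are hypothetical stand-ins for the real record type and its readers, and `example.feature` is an invented feature name. It shows that omitting level-0 records loses nothing as long as readers treat a missing record as level 0.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FeatureLevelRecords {
    // Hypothetical stand-in for FeatureLevelRecord.
    record FeatureLevel(String name, short level) {}

    // Emit a record only for levels above the implicit default of 0.
    static List<FeatureLevel> toRecords(Map<String, Short> levels) {
        List<FeatureLevel> records = new ArrayList<>();
        for (Map.Entry<String, Short> e : levels.entrySet()) {
            if (e.getValue() > 0) {
                records.add(new FeatureLevel(e.getKey(), e.getValue()));
            }
        }
        return records;
    }

    // A reader treats a missing record as level 0.
    static short levelOf(List<FeatureLevel> records, String name) {
        for (FeatureLevel r : records) {
            if (r.name().equals(name)) {
                return r.level();
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        Map<String, Short> levels = new LinkedHashMap<>();
        levels.put("group.version", (short) 0);
        levels.put("example.feature", (short) 2); // invented name, for illustration
        List<FeatureLevel> records = toRecords(levels);
        System.out.println(records.size());
        System.out.println(levelOf(records, "group.version"));
    }
}
```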

CLUSTER_METADATA_TOPIC_PARTITION.partition()),
BINARY_BOOTSTRAP_CHECKPOINT_FILENAME);
if (!Files.exists(binaryBootstrapPath)) {
return readFromConfiguration();
Contributor

If we are at L76, that means 0-0.checkpoint doesn't exist. This is where we should read from bootstrap.checkpoint. If that doesn't exist either, then we call readFromConfiguration().

Contributor

Actually, if you look at how we're reading stuff in, we probably don't even need to change this file. We don't need to call BootstrapDirectory#read for the 0-0.checkpoint because we are using handleLoadSnapshot, which already puts the checkpoint in memory for us.

(metaPropsEnsemble, bootstrapMetadata)
// val bootstrapDirectory = new BootstrapDirectory(config.metadataLogDir)
// val bootstrapMetadata = bootstrapDirectory.read()
(metaPropsEnsemble, null)
Contributor

@kevin-wu24 kevin-wu24 Oct 24, 2025

We should remove bootstrapMetadata here and in ControllerServer since it is just being passed down to QuorumController eventually. We initialize it in QuorumController.

Comment on lines +486 to +497

// Copy feature levels from TestKitNodes bootstrap metadata to ensure test annotations are respected
for (var record : nodes.bootstrapMetadata().records()) {
if (record.message() instanceof FeatureLevelRecord featureLevelRecord) {
String featureName = featureLevelRecord.name();
short level = featureLevelRecord.featureLevel();
// Don't override MetadataVersion as it's handled by setReleaseVersion()
if (!featureName.equals("metadata.version")) {
formatter.setFeatureLevel(featureName, level);
}
}
}
Contributor

Some background on how the KafkaClusterTestKit works:

Basically, tests that use this class are "integration tests" in the sense that we're trying to replicate a real cluster, just all within the same JVM. That means multiple brokerServers and controllerServers. This file shouldn't change outside of removing nodes.bootstrapMetadata() from the ControllerServer constructor.

@github-actions github-actions Bot added streams build Gradle build or GitHub Actions labels Nov 3, 2025
Contributor

@kevin-wu24 kevin-wu24 left a comment

Left a high level comment @mannoopj

Comment on lines +184 to +186
// val bootstrapDirectory = new BootstrapDirectory(config.metadataLogDir)
// val bootstrapMetadata = bootstrapDirectory.read()
(metaPropsEnsemble, null)
Contributor

@kevin-wu24 kevin-wu24 Nov 3, 2025

To make this change more compatible for the existing test framework, we should instead pass down a BootstrapCheckpointFactory/Builder or something like that. Then have two separate implementations:

One for tests that specifies a BootstrapMetadata object all in-memory based on the factory.
In the actual implementation, we can point that factory to the actual files on disk we would be reading.

Either way, in QuorumController#handleLoadSnapshot, that is when we actually "resolve" this bootstrap metadata stuff by calling a method on the factory/builder object.

EDIT: after looking at QuorumTestHarness and KafkaClusterTestKit, we shouldn't need to do this.

Contributor

@kevin-wu24 kevin-wu24 left a comment

Left some more comments regarding the metadata layer implementation @mannoopj

Comment on lines +143 to +181
// Write bootstrap records to the log so brokers can read them, but only if not handling a partial transaction
// Brokers can't read snapshots, only log entries
boolean shouldWriteBootstrapRecords = (transactionStartOffset == -1L);
if (shouldWriteBootstrapRecords) {
logMessageBuilder
.append("Writing bootstrap records to log for broker consumption. ")
.append("Appending ")
.append(bootstrapMetadata.records().size())
.append(" bootstrap record(s) ");

if (curMetadataVersion.isMetadataTransactionSupported()) {
records.add(new ApiMessageAndVersion(
new BeginTransactionRecord().setName("Bootstrap records"), (short) 0));
logMessageBuilder.append("in metadata transaction ");
}
logMessageBuilder
.append("at metadata.version ")
.append(curMetadataVersion)
.append(" from bootstrap source '")
.append(bootstrapMetadata.source())
.append("'. ");

// Add bootstrap records
records.addAll(bootstrapMetadata.records());

// If ELR is enabled, we need to set a cluster-level min.insync.replicas.
if (bootstrapMetadata.featureLevel(EligibleLeaderReplicasVersion.FEATURE_NAME) > 0) {
records.add(new ApiMessageAndVersion(new ConfigRecord().
setResourceType(BROKER.id()).
setResourceName("").
setName(TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG).
setValue(Integer.toString(defaultMinInSyncReplicas)), (short) 0));
}

if (curMetadataVersion.isMetadataTransactionSupported()) {
records.add(new ApiMessageAndVersion(new EndTransactionRecord(), (short) 0));
}
}

Contributor

Why are we changing this code?

We shouldn't change this code: when the log is non-empty, the bootstrap metadata records have already been written to it.

Comment on lines +1027 to +1031
if (batch.controlRecords().isEmpty()) {
System.out.println("DEBUG: Extracting bootstrap metadata from " + messages.size() + " records");
bootstrapMetadata = BootstrapMetadata.fromRecords(messages, "bootstrap");
System.out.println("DEBUG: Bootstrap metadata extracted: " + bootstrapMetadata);
}
Contributor

@kevin-wu24 kevin-wu24 Nov 3, 2025

This is the wrong if condition. We should check !batch.records.isEmpty.

If the 0-0.checkpoint does not have metadata records, AND bootstrapMetadata == null at this point, we should throw an IllegalStateException, because we cannot construct bootstrapMetadata.
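
The rule described here can be sketched as a small resolution function. This is only an illustration: `resolve` and its parameters are hypothetical, records are modelled as strings, and the `Optional` stands in for a `bootstrapMetadata` field that may be null.

```java
import java.util.List;
import java.util.Optional;

final class BootstrapResolution {
    /**
     * checkpointDataRecords: data records found in 0-0.checkpoint (may be empty).
     * alreadyLoaded: bootstrap records obtained earlier, for example from a
     * legacy bootstrap.checkpoint, if any.
     */
    static List<String> resolve(List<String> checkpointDataRecords,
                                Optional<List<String>> alreadyLoaded) {
        if (!checkpointDataRecords.isEmpty()) {
            // The zero checkpoint carries bootstrap metadata: use it.
            return checkpointDataRecords;
        }
        // Nothing in the checkpoint: fall back, or fail loudly.
        return alreadyLoaded.orElseThrow(() -> new IllegalStateException(
            "0-0.checkpoint has no metadata records and no bootstrap metadata was provided"));
    }

    public static void main(String[] args) {
        System.out.println(resolve(List.of("FeatureLevelRecord"), Optional.empty()));
        System.out.println(resolve(List.of(), Optional.of(List.of("legacy record"))));
        try {
            resolve(List.of(), Optional.empty());
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```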

Comment on lines +1032 to +1037
} else {
Map<String, Short> featureVersions = new HashMap<>();
MetadataVersion metadataVersion = MetadataVersion.latestProduction();
featureVersions.put(MetadataVersion.FEATURE_NAME, metadataVersion.featureLevel());
featureVersions.put(KRaftVersion.FEATURE_NAME, raftClient.kraftVersion().featureLevel());
bootstrapMetadata = BootstrapMetadata.fromVersions(metadataVersion, featureVersions, "generated default");
Contributor

@kevin-wu24 kevin-wu24 Nov 3, 2025

If we're not reading the 0-0.checkpoint, bootstrapMetadata is either:

  1. read from bootstrap.checkpoint and passed down here, so it is non-null.
  2. null, because it should have already been written to the log as part of the 0-0.checkpoint.

Comment on lines +1156 to +1159
if (bootstrapMetadata == null) {
throw new IllegalStateException("Bootstrap metadata not available during activation. " +
"This should not happen if a bootstrap snapshot was processed.");
}
Contributor

@kevin-wu24 kevin-wu24 Nov 3, 2025

We should allow bootstrapMetadata to be null here because we can be in the case where we bootstrapped using 0-0.checkpoint, but that file no longer exists because it was cleaned up by KRaft. However, bootstrapMetadata cannot be null when we call recordsForEmptyLog. It can be null when we call recordsForNonEmptyLog.
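
One way to express that asymmetry is to move the null check into the empty-log path only. The sketch below borrows the method names from the comment but is otherwise hypothetical: records are plain strings, and the non-empty-log path returns nothing purely for brevity.

```java
import java.util.List;
import java.util.Objects;

final class ActivationSketch {
    // Empty log: the bootstrap records are about to be written, so they must exist.
    static List<String> recordsForEmptyLog(List<String> bootstrapRecords) {
        Objects.requireNonNull(bootstrapRecords,
            "bootstrap metadata is required when the metadata log is empty");
        return List.copyOf(bootstrapRecords);
    }

    // Non-empty log: the bootstrap records are already in the log, so the
    // argument is never touched and may be null.
    static List<String> recordsForNonEmptyLog(List<String> bootstrapRecords) {
        return List.of();
    }

    static List<String> generate(boolean logIsEmpty, List<String> bootstrapRecords) {
        return logIsEmpty
            ? recordsForEmptyLog(bootstrapRecords)
            : recordsForNonEmptyLog(bootstrapRecords);
    }

    public static void main(String[] args) {
        System.out.println(generate(true, List.of("FeatureLevelRecord")));
        System.out.println(generate(false, null));
    }
}
```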

List<ApiMessageAndVersion> messages = batch.records();

if (bootstrapMetadata == null) {
if (reader.snapshotId().equals(Snapshots.BOOTSTRAP_SNAPSHOT_ID)) {
Contributor

@kevin-wu24 kevin-wu24 Nov 3, 2025

When we are reading in the 0-0.checkpoint, the ONLY thing this method should do if !messages.isEmpty() is use messages to construct a bootstrapMetadata object. I don't think it should even append an event...

0-0.checkpoint is special because its records are uncommitted, unlike all other checkpoints this method handles, and need to be written to the log when a leader is determined.

This changed because previously, 0-0.checkpoint did not contain any metadata records, just KRaft control records potentially.

@github-actions github-actions Bot added the tools label Mar 25, 2026
Contributor Author

mannoopj commented Mar 25, 2026

What are these other threads that read bootstrapMetadata?

I was working under the assumption that the raft io thread was calling listener.handleLoadBootstrap. I didn't realize appendRaftEvent adds this to the controller event thread.

Comment thread core/src/main/scala/kafka/server/KafkaRaftServer.scala Outdated
Comment thread core/src/test/scala/unit/kafka/server/KafkaRaftServerTest.scala
Comment thread metadata/src/main/java/org/apache/kafka/metadata/bootstrap/BootstrapMetadata.java Outdated
Comment on lines +127 to +137
case KRAFT_VERSION: {
KRaftVersionRecord message = new KRaftVersionRecord();
message.read(new ByteBufferAccessor(record.value()), (short) 0);
messages.add(new ApiMessageAndVersion(message, (short) 0));
break;
}
case KRAFT_VOTERS:
VotersRecord message = new VotersRecord();
message.read(new ByteBufferAccessor(record.value()), (short) 0);
messages.add(new ApiMessageAndVersion(message, (short) 0));
break;
Member

Did you file an issue for this? If so can you link it here?

Comment thread metadata/src/main/java/org/apache/kafka/controller/QuorumController.java Outdated
Comment thread metadata/src/main/java/org/apache/kafka/metadata/bootstrap/BootstrapMetadata.java Outdated
Comment thread metadata/src/main/java/org/apache/kafka/metadata/bootstrap/BootstrapMetadata.java Outdated
Member

@jsancio jsancio left a comment

Thanks for the feature @mannoopj . Just some minor comments.

@github-actions github-actions Bot added the build Gradle build or GitHub Actions label May 14, 2026
Member

jsancio commented May 19, 2026

@mannoopj there are test failures please take a look.

Contributor Author

mannoopj commented May 19, 2026

@mannoopj there are test failures please take a look.

The integration test failures can be traced as follows.

On trunk:

  1. Test sets @ClusterFeature(feature = Feature.GROUP_VERSION, version = 0)
  2. RaftClusterInvocationContext builds newFeatureLevels map with group.version=0
  3. BootstrapMetadata.fromVersions() skips level 0 — nodes.bootstrapMetadata() has no GROUP_VERSION record (implicitly 0)
  4. KafkaClusterTestKit.formatNode() loops through bootstrapMetadata.records() — no GROUP_VERSION record found, so formatter.setFeatureLevel("group.version") is never called
  5. Formatter's calculateEffectiveFeatureLevels() sees GROUP_VERSION is missing, fills in default (e.g. 1) — writes bootstrap.checkpoint with GROUP_VERSION=1
  6. bootstrap.checkpoint is never read — KafkaClusterTestKit doesn't use KafkaRaftServer.initializeLogDirs()
  7. ControllerServer gets nodes.bootstrapMetadata() directly — no GROUP_VERSION record
  8. ActivationRecordsGenerator writes bootstrapMetadata.records() to metadata log — no GROUP_VERSION record
  9. Broker sees no GROUP_VERSION in metadata log — level 0 — API disabled — test passes

On this branch:

  1. Test sets @ClusterFeature(feature = Feature.GROUP_VERSION, version = 0)
  2. RaftClusterInvocationContext builds newFeatureLevels map with group.version=0
  3. BootstrapMetadata.fromVersions() skips level 0 — nodes.bootstrapMetadata() has no GROUP_VERSION record (implicitly 0)
  4. KafkaClusterTestKit.formatNode() loops through bootstrapMetadata.records() — no GROUP_VERSION record found, so formatter.setFeatureLevel("group.version") is never called
  5. Formatter's calculateEffectiveFeatureLevels() sees GROUP_VERSION is missing, fills in default (e.g. 1) — writes the zero checkpoint with GROUP_VERSION=1
  6. ControllerServer gets nodes.bootstrapMetadata() directly — no GROUP_VERSION record (still correct at this point)
  7. handleLoadBootstrap reads the zero checkpoint — finds GROUP_VERSION=1 — overwrites bootstrapMetadata with GROUP_VERSION=1
  8. ActivationRecordsGenerator writes the overwritten bootstrapMetadata.records() to metadata log — GROUP_VERSION=1
  9. Broker sees GROUP_VERSION=1 in metadata log — level 1 — API enabled — test fails

This seems like an existing bug that never manifested. The problem now is that Formatter needs a way to know whether a feature was explicitly set to 0. Since not writing FeatureLevelRecords for level 0 is correct and intended behavior, I think we need to pass in some sort of explicit list of features so that calculateEffectiveFeatureLevels can set these to 0.

@jsancio @kevin-wu24

Member

jsancio commented May 19, 2026

This seems like an existing bug that never manifested. The problem now is that Formatter needs a way to know whether a feature was explicitly set to 0. Since not writing FeatureLevelRecords for level 0 is correct and intended behavior, I think we need to pass in some sort of explicit list of features so that calculateEffectiveFeatureLevels can set these to 0.

Yes, it looks like the issue is with how the test is constructed. Can we avoid using BootstrapMetadata for bootstrapping the cluster nodes? For example, can the test be configured using the Formatter directly?

Contributor

kevin-wu24 commented May 19, 2026

Hi @mannoopj,

To me, the issue seems to be in step 4. In the current code, KafkaClusterTestKit formatting logic is using the TestKitNodes.bootstrapMetadata as the source of truth for feature levels, which does not exactly mirror how the storage tool actually works. The storage tool takes the "defaults" from the supplied release-version, and then any manual feature overrides using --feature= take priority over that.

I think this test infra logic only differs when a builder of TestKitNodes explicitly sets a feature value to 0 but has a metadata version whose bootstrap version for said feature is not 0. If said feature's value was not set to 0, the feature record is present in bootstrapMetadata and therefore this test infra logic does behave just like the storage tool. Maybe the fix is to pass some additional state like disabledFeatures down to KafkaClusterTestKit. Then you are able to call Formatter.setFeatureLevel accordingly.
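
A rough sketch of how the extra `disabledFeatures` state could interact with the precedence rule described above (release-version defaults first, explicit overrides on top). Everything here is hypothetical: the `overrides` and `effectiveLevels` helpers are not the real Formatter API, and the default of 1 for `group.version` is only the "e.g. 1" used earlier in the thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class DisabledFeaturesSketch {
    // Overrides as the formatter would see them: levels recovered from the
    // bootstrap records, plus an explicit 0 for every disabled feature.
    static Map<String, Short> overrides(Map<String, Short> fromRecords,
                                        Set<String> disabledFeatures) {
        Map<String, Short> result = new HashMap<>(fromRecords);
        for (String name : disabledFeatures) {
            result.put(name, (short) 0);
        }
        return result;
    }

    // Release-version defaults first; explicit overrides take priority.
    static Map<String, Short> effectiveLevels(Map<String, Short> defaults,
                                              Map<String, Short> overrides) {
        Map<String, Short> result = new HashMap<>(defaults);
        result.putAll(overrides);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Short> defaults = Map.of("group.version", (short) 1); // assumed default
        Map<String, Short> fromRecords = Map.of(); // level-0 records are never written

        // Without the extra state, the default wins and the feature ends up enabled.
        System.out.println(effectiveLevels(defaults, overrides(fromRecords, Set.of())));
        // With it, the explicit 0 survives formatting.
        System.out.println(effectiveLevels(defaults, overrides(fromRecords, Set.of("group.version"))));
    }
}
```

The first call reproduces the failing case (the requested 0 is lost and the default is applied); the second shows the requested level being preserved.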

I think this difference is a consequence of the integration tests needing to generate BootstrapMetadata "earlier" than when the actual kafka would generate that object. In the integration tests, ControllerServer exists before Formatter (IMO it is out of scope to change that in this PR, but maybe that should not be the case...), which does not happen in the actual implementation.

@mannoopj
Contributor Author

I went with @kevin-wu24's suggestion here as it was the most straightforward solution. Restructuring the test infra to avoid using BootstrapMetadata seems to me to be out of scope for this PR.

Member

@jsancio jsancio left a comment

LGTM. Thanks for the feature @mannoopj

@jsancio jsancio changed the title from "KAFKA-19648: KIP-1170 Unify cluster metadata bootstrapping" to "KAFKA-19648; Cluster metadata bootstrapping with kraft checkpoint" May 27, 2026
@jsancio jsancio merged commit 78a66fa into apache:trunk May 27, 2026
24 checks passed
Files.createDirectories(Paths.get(writeLogDir));
BootstrapDirectory bootstrapDirectory = new BootstrapDirectory(writeLogDir);
bootstrapDirectory.writeBinaryFile(bootstrapMetadata);
if (directoryTypes.get(writeLogDir).isDynamicMetadataDirectory()) {
Member

Should we keep the isDynamicMetadataDirectory() condition here? I noticed that the E2E tests fail in combined mode with multiple log directories because the pure data folders end up containing the unexpected __cluster_metadata-0 directory.

throw new KafkaException(s"Found unexpected metadata location in data directory `$clusterMetadataTopic` " +

Contributor

@kevin-wu24 kevin-wu24 May 29, 2026

Maybe we should change this check to isMetadataDirectory. We should still write a 0-0.checkpoint for static quorums. Basically:

boolean isMetadataDirectory() {
    return this != LOG_DIRECTORY;
}
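
For context, a self-contained sketch of how the two predicates could sit side by side. The enum constant names other than `LOG_DIRECTORY` are assumptions, and `writesZeroCheckpoint` and `writesControlRecords` are hypothetical helpers that restate the rules discussed in this thread: every metadata directory gets a 0-0.checkpoint, but only dynamic-quorum directories get KRaft control records.

```java
final class DirectoryTypeSketch {
    // Constant names other than LOG_DIRECTORY are assumed for illustration.
    enum DirectoryType {
        LOG_DIRECTORY,
        STATIC_METADATA_DIRECTORY,
        DYNAMIC_METADATA_DIRECTORY;

        boolean isDynamicMetadataDirectory() {
            return this == DYNAMIC_METADATA_DIRECTORY;
        }

        boolean isMetadataDirectory() {
            return this != LOG_DIRECTORY;
        }
    }

    // Any metadata directory, static or dynamic, gets a zero checkpoint.
    static boolean writesZeroCheckpoint(DirectoryType type) {
        return type.isMetadataDirectory();
    }

    // KRaft control records are only written for dynamic quorums.
    static boolean writesControlRecords(DirectoryType type) {
        return type.isDynamicMetadataDirectory();
    }

    public static void main(String[] args) {
        for (DirectoryType type : DirectoryType.values()) {
            System.out.println(type + ": checkpoint=" + writesZeroCheckpoint(type)
                + ", controlRecords=" + writesControlRecords(type));
        }
    }
}
```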

Member

@kevin-wu24 Yes, your approach is much better and more precise. We will file a patch tomorrow if there is no objection.

Member

Thanks @chia7712 and @kevin-wu24 . Let's make sure we have a test that covers this case.

Contributor

Thanks for catching this and for the discussion. I opened a PR to address this issue: #22418

Labels

build Gradle build or GitHub Actions ci-approved core Kafka Broker kraft tools

6 participants