KAFKA-19221: Propagate IOException on LogSegment#close #19607
Conversation
Force-pushed from 94c3137 to f107be8.
A label of 'needs-attention' was automatically added to this PR in order to raise the attention of the committers.
CC: @chia7712 can you please take a look?
@gaurav-narula : Thanks for the PR. Left a couple of comments.
@@ -57,6 +58,70 @@ public LogManagerIntegrationTest(ClusterInstance cluster) {
        this.cluster = cluster;
    }

    @ClusterTest(types = {Type.KRAFT}, brokers = 3)
Could we just use 1 broker in the test?
    assertTrue(timeIndexFile.exists());
    assertTrue(timeIndexFile.setReadOnly());

    cluster.brokers().get(0).shutdown();
Could we verify that the cleanShutdown file is not written?
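For illustration, a check along these lines might work, assuming the marker is the standard .kafka_cleanshutdown file in the log directory (the logDir variable here is hypothetical to the test):

    // Assumption: the clean-shutdown marker file is named ".kafka_cleanshutdown"
    File cleanShutdownFile = new File(logDir, ".kafka_cleanshutdown");
    assertFalse(cleanShutdownFile.exists(), "broker should not write the clean shutdown marker after a failed close");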
Force-pushed from f107be8 to 2677f90.
@gaurav-narula thanks for this fix.
    cluster.brokers().get(0).shutdown();

    assertEquals(1, cluster.brokers().get(0).config().logDirs().size());
Could you please add a variable for cluster.brokers().get(0)?
var broker = cluster.brokers().get(0);
broker.shutdown();
assertEquals(1, broker.config().logDirs().size());
String logDir = broker.config().logDirs().get(0);
-   Utils.closeQuietly(lazyTimeIndex, "timeIndex", LOGGER);
-   Utils.closeQuietly(log, "log", LOGGER);
-   Utils.closeQuietly(txnIndex, "txnIndex", LOGGER);
+   Utils.closeAll(lazyOffsetIndex, lazyTimeIndex, log, txnIndex);
If LogSegment#close now throws an exception, then LogSegments#close might break without closing all segments, right?
kafka/storage/src/main/java/org/apache/kafka/storage/internals/log/LogSegments.java, line 108 (at 4eac6ad):
    s.close();
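For context, a simplified sketch of why a plain loop is fragile here (not the exact Kafka code):

    // If any segment's close() throws, the loop aborts and the
    // remaining segments are never closed.
    for (LogSegment s : values())
        s.close();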
Thanks for pointing that out! I've modified LogSegments#close to ensure we close all log segments even if one of them throws, and added a test in LogSegmentsTest.

In the process, I found that some tests assumed that resources like indexes and FileRecords may be closed multiple times. With LogSegment#close now propagating exceptions, we need to exit early if the resources have already been closed; otherwise we'd see failures due to the FileChannel being closed or the mmap being null. I've therefore updated AbstractIndex, TransactionIndex and FileRecords accordingly.
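As a rough sketch of that early-exit guard (field and structure are illustrative, not the exact Kafka implementation):

    private volatile boolean closed = false;

    @Override
    public void close() throws IOException {
        if (closed)
            return;  // a second close() becomes a no-op instead of failing
                     // on a closed FileChannel or a null mmap
        // ... right-size and release the underlying file / mmap ...
        closed = true;
    }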
Force-pushed from 0151603 to c290a11.
@gaurav-narula thanks for the updates. One small comment is left. PTAL.
    doThrow(new IOException("Failure")).when(seg2).close();

    try {
Could you please consider using assertThrows?
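i.e. roughly this (segments here stands for the object under test):

    IOException e = assertThrows(IOException.class, segments::close);
    assertEquals("Failure", e.getMessage());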
Addressed with ce22ba9
@gaurav-narula could you please fix the build error?
Log segment closure right-sizes the segment on disk along with its associated index files. This is especially important for TimeIndexes, where a failure to right-size may eventually cause log roll failures, leading to under-replication and log cleaner failures. This change uses `Utils.closeAll`, which propagates exceptions, resulting in an "unclean" shutdown. That in turn causes the broker to attempt to recover the log segment and the index on next startup, avoiding the failures described above.
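For reference, the essential difference from closeQuietly is that closeAll closes everything before rethrowing; a simplified sketch of that behaviour (not the exact Utils implementation):

    static void closeAll(Closeable... closeables) throws IOException {
        IOException exception = null;
        for (Closeable c : closeables) {
            try {
                if (c != null)
                    c.close();
            } catch (IOException e) {
                if (exception == null)
                    exception = e;               // remember the first failure
                else
                    exception.addSuppressed(e);  // attach any later ones
            }
        }
        if (exception != null)
            throw exception;  // propagation is what makes the shutdown "unclean"
    }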
Force-pushed from ce22ba9 to d0afe7c.
@chia7712 should be good now
@gaurav-narula : Thanks for the updated PR. LGTM
@@ -751,10 +751,7 @@ public Optional<FileRecords.TimestampAndOffset> findOffsetByTimestamp(long times
    public void close() throws IOException {
        if (maxTimestampAndOffsetSoFar != TimestampOffset.UNKNOWN)
            Utils.swallow(LOGGER, Level.WARN, "maybeAppend", () -> timeIndex().maybeAppend(maxTimestampSoFar(), shallowOffsetOfMaxTimestampSoFar(), true));
I think writing the max timestamp has a similar issue: the last entry is assumed to be the max timestamp after restarting, so the index needs to be rebuilt if the "true" max timestamp is not stored correctly. I opened KAFKA-19428 to track it.
Log segment closure results in right sizing the segment on disk along with the associated index files. This is especially important for TimeIndexes, where a failure to right size may eventually cause log roll failures leading to under-replication and log cleaner failures.

This change uses Utils.closeAll, which propagates exceptions, resulting in an "unclean" shutdown. That would then cause the broker to attempt to recover the log segment and the index on next startup, thereby avoiding the failures described above.

Reviewers: Omnia Ibrahim <[email protected]>, Jun Rao <[email protected]>, Chia-Ping Tsai <[email protected]>