Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hotfix] Change writerState flush from async to sync while rolling log to avoid conflicts with the writerState flush during logTablet close #729

Merged
merged 1 commit into from
Apr 14, 2025

Conversation

swuferhong
Copy link
Collaborator

Purpose

This pr is aims to change writerState flush from async to sync while rolling log to avoid conflicts with the writerState flush during logTablet close.

In the LogTablet#roll() method, the current implementation writes the writerState while holding the lock, but the file flush operation is performed outside the lock. However, this could potentially conflict with the writerState write operation in the LogTablet#close() method. Considering that the writerState flush operation itself is not very resource-intensive and the roll() method isn't called frequently, we consider change it to a sync flush approach.

Brief change log

Tests

API and Format

Documentation

…g to avoid conflicts with the writerState flush during logTablet close
@wuchong
Copy link
Member

wuchong commented Apr 12, 2025

It seems Kafka introduced the async on purpose to address a performance problem when flushing the writer state takes too long. What exception/problem when using async?

https://issues.apache.org/jira/browse/KAFKA-15046
apache/kafka@d71d063

@swuferhong
Copy link
Collaborator Author

It seems Kafka introduced the async on purpose to address a performance problem when flushing the writer state takes too long. What exception/problem when using async?

https://issues.apache.org/jira/browse/KAFKA-15046 apache/kafka@d71d063

When testing the Fluss cluster update, the Fluss tabletServer shutdown also needs to call writerStateManager.takeSnapshot(). If this conflicts with the takeSnapshot in the roll() method, it may happen that the shutdown process acquires the lock before the roll() has flushed the snapshot and proceeds to write the snapshot, which will cause errors. Therefore, a better approach than async flushing should be used here to address the issue of blocking writes.

@wuchong wuchong merged commit f3d924e into alibaba:main Apr 14, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants