Skip to content

Writer throw OutOfOrderException when cluster upgrading #1163

@swuferhong

Description

@swuferhong

Search before asking

  • I searched in the issues and found nothing similar.

Fluss version

0.6.0 (latest release)

Please describe the bug 🐞

The error:

2025-06-19 20:19:11
java.io.IOException: Could not perform checkpoint 8 for operator Calc[11] -> app_taobao_exposure_fi[12]: Writer (757/1024)#1.
	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1507)
	at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:147)
	at org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.triggerCheckpoint(SingleCheckpointBarrierHandler.java:287)
	at org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.access$100(SingleCheckpointBarrierHandler.java:64)
	at org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler$ControllerImpl.triggerGlobalCheckpoint(SingleCheckpointBarrierHandler.java:488)
	at org.apache.flink.streaming.runtime.io.checkpointing.AbstractAlignedBarrierHandlerState.triggerGlobalCheckpoint(AbstractAlignedBarrierHandlerState.java:74)
	at org.apache.flink.streaming.runtime.io.checkpointing.AbstractAlignedBarrierHandlerState.barrierReceived(AbstractAlignedBarrierHandlerState.java:66)
	at org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.lambda$processBarrier$2(SingleCheckpointBarrierHandler.java:234)
	at org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.markCheckpointAlignedAndTransformState(SingleCheckpointBarrierHandler.java:262)
	at org.apache.flink.streaming.runtime.io.checkpointing.SingleCheckpointBarrierHandler.processBarrier(SingleCheckpointBarrierHandler.java:231)
	at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.handleEvent(CheckpointedInputGate.java:183)
	at org.apache.flink.streaming.runtime.io.checkpointing.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:159)
	at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:178)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:68)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:616)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:1071)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:1020)
	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:959)
	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:938)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:751)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:567)
	at java.lang.Thread.run(Thread.java:879)
Caused by: java.io.IOException: One or more Fluss Writer send requests have encountered exception
	at com.alibaba.fluss.flink.sink.writer.FlinkSinkWriter.checkAsyncException(FlinkSinkWriter.java:250)
	at com.alibaba.fluss.flink.sink.writer.AppendSinkWriter.flush(AppendSinkWriter.java:67)
	at org.apache.flink.streaming.runtime.operators.sink.SinkWriterOperator.prepareSnapshotPreBarrier(SinkWriterOperator.java:198)
	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.prepareSnapshotPreBarrier(RegularOperatorChain.java:89)
	at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.checkpointState(SubtaskCheckpointCoordinatorImpl.java:340)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$14(StreamTask.java:1568)
	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:1556)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:1495)
	... 22 more
Caused by: java.util.concurrent.CompletionException: com.alibaba.fluss.exception.OutOfOrderSequenceException: Out of order batch sequence for writer 62979 at offset 2900 in table-bucket TableBucket{tableId=29, partitionId=656, bucket=48} : 14 (incoming batch seq.), -1 (current batch seq.)
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
	at com.alibaba.fluss.client.table.writer.AbstractTableWriter.lambda$send$0(AbstractTableWriter.java:71)
	at com.alibaba.fluss.client.write.WriteBatch.lambda$completeFutureAndFireCallbacks$0(WriteBatch.java:189)
	at java.util.ArrayList.forEach(ArrayList.java:1259)
	at com.alibaba.fluss.client.write.WriteBatch.completeFutureAndFireCallbacks(WriteBatch.java:186)
	at com.alibaba.fluss.client.write.WriteBatch.done(WriteBatch.java:237)
	at com.alibaba.fluss.client.write.WriteBatch.completeExceptionally(WriteBatch.java:181)
	at com.alibaba.fluss.client.write.Sender.failBatch(Sender.java:265)
	at com.alibaba.fluss.client.write.Sender.handleWriteBatchException(Sender.java:563)
	at com.alibaba.fluss.client.write.Sender.handleProduceLogResponse(Sender.java:453)
	at com.alibaba.fluss.client.write.Sender.lambda$sendProduceLogRequestAndHandleResponse$6(Sender.java:412)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
	at com.alibaba.fluss.rpc.netty.client.ServerConnection$ResponseCallback.onRequestResult(ServerConnection.java:240)
	at com.alibaba.fluss.rpc.netty.client.NettyClientHandler.channelRead(NettyClientHandler.java:96)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at com.alibaba.fluss.shaded.netty4.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at com.alibaba.fluss.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
	at com.alibaba.fluss.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at com.alibaba.fluss.shaded.netty4.io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
	at com.alibaba.fluss.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at com.alibaba.fluss.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at com.alibaba.fluss.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at com.alibaba.fluss.shaded.netty4.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	... 1 more
Caused by: com.alibaba.fluss.exception.OutOfOrderSequenceException: Out of order batch sequence for writer 62979 at offset 2900 in table-bucket TableBucket{tableId=29, partitionId=656, bucket=48} : 14 (incoming batch seq.), -1 (current batch seq.)

Solution

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions