[Bug]: PubsubIO on Flink Runner not acknowledging old messages

### What happened?

I'm using "org.apache.beam:beam-runners-flink-1.18:2.57.0". 
When I read from pubsub, I found it's not able to acknowledging messages that are generated before the job starts. As a result,  the messages are sent to Flink repeatedly, the number of unacked messages stay flat.
I also observed a similiar issue to this one https://github.com/apache/beam/issues/31510
The ack message count can be higher than the message produce rate.

It can be reproduced with the following code, it's simply reading from pubsub and print out a string.
args
``` 
 - "--runner=FlinkRunner"
 - "--attachedMode=false"
 - "--checkpointingInterval=10000"
 - "--unalignedCheckpointEnabled=true"
```

```
import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import java.util.concurrent.ThreadLocalRandom;

public class Test {
    public static void main(String[] args) {
        FlinkPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
                .withValidation().withoutStrictParsing().as(FlinkPipelineOptions.class);
        Pipeline pipeline = Pipeline.create(options);
        PCollection<PubsubMessage> pubsubMessages = pipeline.apply(
                        PubsubIO.readMessages().fromSubscription(
                                "xxx"))
                .apply("print", ParDo.of(new DoFn<PubsubMessage, PubsubMessage>() {
                    @ProcessElement
                    public void processElement(ProcessContext c) {
                        if (ThreadLocalRandom.current().nextDouble() < 0.01) {
                            System.out.println("##################");
                            c.output(c.element());
                        }
                    }
                }));
        pipeline.run();
    }
}
```

### Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

### Issue Components

- [ ] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [ ] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Infrastructure
- [ ] Component: Spark Runner
- [X] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: PubsubIO on Flink Runner not acknowledging old messages #32461

What happened?

Issue Priority

Issue Components

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

[Bug]: PubsubIO on Flink Runner not acknowledging old messages #32461

Description

What happened?

Issue Priority

Issue Components

Activity

liferoad commented on Sep 16, 2024

xzhang2sc commented on Sep 16, 2024

xzhang2sc commented on Sep 16, 2024

xzhang2sc commented on Sep 16, 2024

je-ik commented on Sep 17, 2024

xzhang2sc commented on Sep 17, 2024

xzhang2sc commented on Sep 18, 2024

je-ik commented on Sep 18, 2024

zendesk-kjaanson commented on Jan 27, 2025

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions