
[Bug]: FileBasedDeadLetterQueueReconsumer could result in duplicates #2236

Open
@damccorm

Description


Related Template(s)

Anything using `FileBasedDeadLetterQueueReconsumer`. As best I can tell, that is just the Spanner change streams and Datastream templates (see https://github.com/search?q=repo%3AGoogleCloudPlatform%2FDataflowTemplates%20FileBasedDeadLetterQueueReconsumer&type=code ).

Template Version

Latest. Seen in 2023-07-18-00_rc00, and also apparent from code inspection alone.

What happened?

The way `FileBasedDeadLetterQueueReconsumer` is written, it can produce duplicates whenever Dataflow experiences any kind of backlog or slowdown. Specifically, the following scenario can happen:

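  1. The generate sequence fires, and every step before the reshuffle runs.
  2. The generate sequence fires again before the file has been removed, and every step before the reshuffle runs again for the same file.
  3. The step after the reshuffle removes the file.
  4. The step after the reshuffle tries to remove the file a second time; this logs, but succeeds.

Because both firings pick up the same file before either removal happens, its contents are reconsumed twice, which is where the duplicates come from.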
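To make the ordering concrete, here is a minimal in-memory sketch that replays steps 1 to 4 with plain Java collections. It is not the template's code: the class `DlqReconsumeRace`, its methods, and the file/record names are all hypothetical stand-ins, with a map playing the role of the dead-letter directory. It also shows one possible mitigation (an assumption on my part, not anything adopted upstream): atomically "claim" a file before reading it, so a later firing skips files an earlier firing already picked up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory model of the race; not the template's actual implementation. */
class DlqReconsumeRace {

    /** Stand-in for the dead-letter directory: file name -> records in that file. */
    static Map<String, List<String>> newDirectory() {
        Map<String, List<String>> dir = new ConcurrentHashMap<>();
        dir.put("dlq-file-1.json", List.of("rec-a", "rec-b"));
        return dir;
    }

    /** Pre-reshuffle stage: list every file currently present and emit its records. */
    static List<String> listAndRead(Map<String, List<String>> dir, List<String> emitted) {
        List<String> listed = new ArrayList<>(dir.keySet());
        for (String file : listed) {
            emitted.addAll(dir.get(file));
        }
        return listed;
    }

    /** Post-reshuffle stage: remove the file; a file that is already gone is only logged. */
    static void delete(Map<String, List<String>> dir, String file) {
        if (dir.remove(file) == null) {
            System.out.println("file already removed: " + file);
        }
    }

    /** Replays steps 1-4 from the issue and returns every record emitted downstream. */
    static List<String> replayRace() {
        Map<String, List<String>> dir = newDirectory();
        List<String> emitted = new ArrayList<>();
        List<String> tick1 = listAndRead(dir, emitted); // step 1
        List<String> tick2 = listAndRead(dir, emitted); // step 2: file not deleted yet
        tick1.forEach(f -> delete(dir, f));             // step 3
        tick2.forEach(f -> delete(dir, f));             // step 4: logs, does not fail
        return emitted;
    }

    /** Hypothetical mitigation: only read files this firing managed to claim. */
    static List<String> listClaimAndRead(
            Map<String, List<String>> dir, Set<String> claimed, List<String> emitted) {
        List<String> listed = new ArrayList<>();
        for (String file : dir.keySet()) {
            if (claimed.add(file)) { // add() returns false if an earlier firing claimed it
                emitted.addAll(dir.get(file));
                listed.add(file);
            }
        }
        return listed;
    }

    /** Same interleaving as replayRace(), but with claim-before-read. */
    static List<String> replayWithClaims() {
        Map<String, List<String>> dir = newDirectory();
        Set<String> claimed = ConcurrentHashMap.newKeySet();
        List<String> emitted = new ArrayList<>();
        List<String> tick1 = listClaimAndRead(dir, claimed, emitted);
        List<String> tick2 = listClaimAndRead(dir, claimed, emitted); // skips the claimed file
        tick1.forEach(f -> delete(dir, f));
        tick2.forEach(f -> delete(dir, f));
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(replayRace());       // [rec-a, rec-b, rec-a, rec-b]
        System.out.println(replayWithClaims()); // [rec-a, rec-b]
    }
}
```

The in-memory claim set only works inside a single JVM. In an actual Dataflow job the claim would have to be durable and atomic across workers (for example, moving the file out of the watched directory before reading it), so treat this purely as an illustration of the ordering problem.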

Relevant log output
