
Worker picks up AwaitingRetry flow runs waiting to execute decorator-defined retries if retry_delay_seconds exceeds worker's polling window #15458

Closed
@kevingrismore

Description


Bug summary

This is harder to reproduce on cloud, but happens consistently on a local server.

Deploy this flow to a work pool (what the task functions do doesn't matter; this is just to match the included screenshot):

from prefect import flow

@flow(retries=1, retry_delay_seconds=30)
def my_pipeline():
    # extract, transform, and load are ordinary @task-decorated functions;
    # their bodies are irrelevant to reproducing the bug.
    data = extract()
    transformed_data = transform(data)
    load(transformed_data)
    raise Exception("Pipeline failed")

After the first failure, 30 seconds pass and the flow is retried in its original process, but the worker also picks up the flow run again and starts a new process/container, running what it believes to be the remaining retry.

The final outcome is a run count of 3, even though the flow should run at most twice.
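The overshoot can be pictured as a race between two actors acting on the same AwaitingRetry state. The toy simulation below (plain Python, not Prefect internals; the FlowRun class and its fields are invented for illustration) shows how one in-process retry plus one worker-submitted duplicate pushes the run count to 3 when retries=1 should cap it at 2:

```python
# Toy model of the race described above (NOT Prefect code):
# both the original flow-run process and the worker observe the run in
# AwaitingRetry, and each independently executes the retry.

class FlowRun:
    def __init__(self, retries):
        self.retries = retries
        self.run_count = 0
        self.state = "Scheduled"

    def execute(self):
        """One execution attempt that always fails (as in the example flow)."""
        self.run_count += 1
        if self.run_count <= self.retries:
            self.state = "AwaitingRetry"
        else:
            self.state = "Failed"

run = FlowRun(retries=1)

run.execute()                       # initial attempt fails -> AwaitingRetry

# After retry_delay_seconds, the original process retries in place...
if run.state == "AwaitingRetry":
    run.execute()                   # run_count == 2 -> Failed (expected end)

# ...but a worker whose polling window elapsed during the delay also saw
# AwaitingRetry and submits the run again in a fresh process/container:
run.execute()                       # run_count == 3: the reported behavior

print(run.run_count)  # 3, one more than the expected maximum of 2
```

A served flow avoids this because no worker is polling for scheduled runs, which matches the "Additional context" note below.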

[Screenshot: flow run UI showing a run count of 3]

Version info (prefect version output)

Version:             3.0.3
API version:         0.8.4
Python version:      3.11.5
Git commit:          d9c51bc3
Built:               Fri, Sep 20, 2024 8:52 AM
OS/Arch:             darwin/arm64
Profile:             local
Server type:         server
Pydantic version:    2.8.2

Additional context

I was not able to reproduce this with a served flow.

Metadata

Labels: bug (Something isn't working)