Hangfire Does Not Handle DLQ Properly in Case of Hard Crashes #2505

Open
@irfanFarooc

Using Hangfire 1.8.6 with SQL Server as storage.

In the case of normal failures, Hangfire retries N times and moves the faulty message to the DLQ while allowing other messages to process correctly.

However, in the case of a hard crash (e.g., if the server process crashes after picking up a message for processing), the faulty message remains stuck in the queue forever. Hangfire picks the message up again every N minutes, so the server crashes again every N minutes for as long as the message keeps failing.

Observation: It looks like Hangfire bumps the retry count only after it has handled the exception. When the process fails fast (crashes) instead, the count is never updated, so the message stays in the queue forever and causes repeated crashes. In that case the retry count parameter does not even appear in the Hangfire.JobParameter table.

As a suggestion, could the retry counter be incremented when a message is dequeued, rather than only after the exception has been handled?
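
To make the suggestion concrete, here is a minimal sketch of the idea as a plain in-memory model in Python. It is not Hangfire code and does not use Hangfire's API or storage schema; `Job`, `Queue`, `MAX_DEQUEUES`, `requeue_after_crash`, and `simulate_poison_job` are all hypothetical names invented for illustration. The only point is that the counter is bumped at fetch time, before the worker runs the job, so a job that kills the process still uses up its attempts.

```python
from collections import deque
from dataclasses import dataclass, field

MAX_DEQUEUES = 3  # hypothetical limit; not a Hangfire setting


@dataclass
class Job:
    job_id: str
    dequeue_count: int = 0


@dataclass
class Queue:
    pending: deque = field(default_factory=deque)
    dead_letter: list = field(default_factory=list)

    def enqueue(self, job):
        self.pending.append(job)

    def fetch(self):
        """Return the next job to run, or None if the queue is empty.

        The counter is incremented here, before the worker touches the job,
        so it survives even if the worker process dies mid-job.
        """
        while self.pending:
            job = self.pending.popleft()
            job.dequeue_count += 1
            if job.dequeue_count > MAX_DEQUEUES:
                # Too many fetches without a successful finish: park the job.
                self.dead_letter.append(job)
                continue
            return job
        return None

    def requeue_after_crash(self, job):
        """Model the job becoming visible again after its worker died."""
        self.pending.appendleft(job)


def simulate_poison_job():
    """Run one job that always crashes the worker next to one healthy job."""
    queue = Queue()
    queue.enqueue(Job("poison"))
    queue.enqueue(Job("healthy"))

    crashes = 0
    processed = []
    while (job := queue.fetch()) is not None:
        if job.job_id == "poison":
            # Hard crash: no exception handler runs, the job simply reappears.
            crashes += 1
            queue.requeue_after_crash(job)
            continue
        processed.append(job.job_id)

    return crashes, processed, [j.job_id for j in queue.dead_letter]


if __name__ == "__main__":
    print(simulate_poison_job())
```

In this model the crashing job is fetched at most `MAX_DEQUEUES` times and is then parked, after which the healthy job is processed. With a counter that is only updated in an exception handler, the same loop would never terminate, which matches the behaviour described above.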
