Retry if driver throws a JobQueueDriverError connectionError #77

Draft · wants to merge 4 commits into main

Conversation

adam-fowler (Member) commented Mar 5, 2025

  • Add withExponentialBackoff, which retries an operation with exponential backoff if it throws a JobQueueDriverError with code set to .connectionError (see the sketch after this list).
  • Wrap all driver operations in withExponentialBackoff.
  • Add driver-specific retry options to JobQueueOptions.
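
A minimal sketch of the shape such a helper could take, assuming a hypothetical signature (maxAttempts, baseDelay, maxDelay) and a stand-in JobQueueDriverError; the signature actually introduced by this PR may differ:

// Stand-in for the real JobQueueDriverError type; illustrative only.
struct JobQueueDriverError: Error {
    enum Code { case connectionError, other }
    let code: Code
}

/// Retries `operation`, sleeping with exponential backoff between attempts,
/// whenever it throws a JobQueueDriverError whose code is `.connectionError`.
/// Any other error is rethrown immediately.
func withExponentialBackoff<T>(
    maxAttempts: Int = .max,                    // default: keep retrying rather than exit
    baseDelay: Duration = .milliseconds(250),
    maxDelay: Duration = .seconds(30),
    operation: () async throws -> T
) async throws -> T {
    var attempt = 0
    while true {
        do {
            return try await operation()
        } catch let error as JobQueueDriverError where error.code == .connectionError {
            attempt += 1
            guard attempt < maxAttempts else { throw error }
            // delay = baseDelay * 2^(attempt - 1), capped at maxDelay
            let delay = baseDelay * (1 << min(attempt - 1, 20))
            try await Task.sleep(for: min(delay, maxDelay))
        }
    }
}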


codecov bot commented Mar 5, 2025

Codecov Report

Attention: Patch coverage is 94.11765% with 4 lines in your changes missing coverage. Please review.

Project coverage is 91.75%. Comparing base (b6f6cb2) to head (3383a0f).

Files with missing lines                 Patch %   Lines
Sources/Jobs/JobQueueDriverError.swift   72.72%    3 Missing ⚠️
Sources/Jobs/JobQueueHandler.swift       98.11%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #77      +/-   ##
==========================================
- Coverage   92.20%   91.75%   -0.45%     
==========================================
  Files          23       24       +1     
  Lines        1296     1347      +51     
==========================================
+ Hits         1195     1236      +41     
- Misses        101      111      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

adam-fowler linked an issue on Mar 5, 2025 that may be closed by this pull request
adam-fowler changed the title from "Retry if driver throws an error" to "Retry if driver throws a JobQueueDriverError connectionError" on Mar 5, 2025
    return try await operation()
} catch let error as JobQueueDriverError where error.code == .connectionError {
    logger.debug("\(message()) failed")
    if self.options.driverRetryStrategy.shouldRetry(attempt: attempt, error: error) {

Contributor

Should this call still be made, given that the default maxAttempt is set to the maximum int value? We can have at most two states here: either a job was popped off a queue and we lose connection to the driver, in which case we retry until connected, or the connection was lost while polling.

For the first case, I am wondering if we should have a background task running that finds jobs in the 'processing' state that do not exist in a queue? Or should we by default move jobs in that state back onto their specific queue?

adam-fowler (Member Author) commented Mar 5, 2025

So if we hit the retry limit, the error is propagated further up, the job queue handler exits, and we'll have to restart the queue process to continue processing jobs. The default is set to .max because the alternative is exiting the process.

If the default is set to a lower number and we exit the handler, then the cleanup at startup can fix up any jobs left in the processing state.
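
For illustration of that trade-off only (the real driverRetryStrategy type and its initializer in JobQueueOptions may look different), a finite limit could be expressed along these lines:

// Hypothetical retry strategy; illustrative stand-in, not the PR's actual API.
struct DriverRetryStrategy {
    var maxAttempts: Int = .max   // default: retry forever instead of exiting the process

    func shouldRetry(attempt: Int, error: any Error) -> Bool {
        attempt < self.maxAttempts
    }
}

// With a finite limit the connection error eventually propagates, the handler exits,
// and the startup cleanup has to move any jobs stuck in 'processing' back onto their queue.
let finiteRetry = DriverRetryStrategy(maxAttempts: 10)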

Contributor

> So if we hit the retry limit, the error is propagated further up, the job queue handler exits, and we'll have to restart the queue process to continue processing jobs. The default is set to .max because the alternative is exiting the process.
>
> If the default is set to a lower number and we exit the handler, then the cleanup at startup can fix up any jobs left in the processing state.

By default all the drivers are set up to do nothing on boot. I think this should be documented.

adam-fowler (Member Author)

There is a lot of documentation to add. We have made a lot of changes since the last release.

Contributor

> There is a lot of documentation to add. We have made a lot of changes since the last release.

Indeed! I will help with the documentation too.

Contributor

Also, I forgot to mention this earlier: how will this work with the Postgres driver? PostgresNIO seems to keep on retrying after a connection is lost. I am not that familiar with the Redis driver, but I suppose it'll be the same, since the connection pool logic seems very similar between the two?

adam-fowler (Member Author)

Yeah, PostgresNIO will retry connections ad infinitum, so in theory it isn't an issue when using the Postgres driver.

Redis is different in that it will eventually throw an error and has different errors for when an open connection was closed and when a connection couldn't be made.

Without this change the error would be propagated up and end the job queue handler and eventually the application.

We could move the retry to the drivers instead. I'm already asking the drivers to recognise connection errors.
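
A rough sketch of what that driver-side mapping could look like, with ExampleDriver and UnderlyingDriverError as purely illustrative stand-ins, reusing the stand-in JobQueueDriverError from the sketch in the PR description:

// Stand-in for whatever transport errors the concrete driver (Redis, Postgres, ...) throws.
enum UnderlyingDriverError: Error {
    case connectionClosed   // an open connection was dropped
    case cannotConnect      // a connection could not be established
}

struct ExampleDriver {
    // Stand-in for the driver's real pop/poll call against its backing store.
    var underlyingPop: () async throws -> String?

    func popJob() async throws -> String? {
        do {
            return try await underlyingPop()
        } catch is UnderlyingDriverError {
            // Surface both transport failures as a connection error so the retry
            // logic (in the handler today, or in the driver if it moves) can recognise them.
            throw JobQueueDriverError(code: .connectionError)
        }
    }
}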

adam-fowler (Member Author)

I'm going to put this on hold while I think about it. I might push this functionality down to the drivers where needed.

adam-fowler marked this pull request as draft on March 6, 2025 at 09:28
Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues: Job queue iteration error handling
2 participants