Commit 48c6e51

Merge pull request #415 from hangfire-postgres/features/414-resilient-startup
Implemented resilient startup
2 parents 9e32feb + 1f7dbb9 commit 48c6e51

7 files changed: +385 -17 lines changed

Hangfire.PostgreSql.sln

Lines changed: 3 additions & 0 deletions
@@ -6,6 +6,9 @@ MinimumVisualStudioVersion = 10.0.40219.1
 Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "tests", "tests", "{766BE831-F758-46BC-AFD3-BBEEFE0F686F}"
 EndProject
 Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Solution Items", "Solution Items", "{5CA38188-92EE-453C-A04E-A506DF15A787}"
+	ProjectSection(SolutionItems) = preProject
+		README.md = README.md
+	EndProjectSection
 EndProject
 Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "src", "src", "{0D30A51B-814F-474E-93B8-44E9C155255C}"
 EndProject

README.md

Lines changed: 99 additions & 0 deletions
@@ -82,6 +82,105 @@ app.UseHangfireServer(options);
 
 this provider would first process jobs in `a-long-running-queue`, then `general-queue` and lastly `very-fast-queue`.
 
+### Startup resilience and transient database outages
+
+Starting with version 1.20.13 (where `PostgreSqlStorageOptions` gained startup resilience options), the storage is more tolerant of *transient* PostgreSQL outages during application startup.
+
+#### Default behavior
+
+By default, when you use the new-style configuration:
+
+```csharp
+services.AddHangfire((provider, config) =>
+{
+    config.UsePostgreSqlStorage(opts =>
+        opts.UseNpgsqlConnection(Configuration.GetConnectionString("HangfireConnection")));
+});
+
+app.UseHangfireServer();
+app.UseHangfireDashboard();
+```
+
+`PostgreSqlStorageOptions` uses the following defaults for startup resilience:
+
+- `PrepareSchemaIfNecessary = true`
+- `StartupConnectionMaxRetries = 5`
+- `StartupConnectionBaseDelay = 1 second`
+- `StartupConnectionMaxDelay = 1 minute`
+- `AllowDegradedModeWithoutStorage = true`
+
+With these defaults:
+
+1. On application startup, when schema preparation is required, the storage will try to open a connection and install/upgrade the schema.
+2. If the database is temporarily unavailable, it will make up to 6 attempts (1 initial attempt + 5 retries) with exponential backoff between attempts, each delay capped at 1 minute.
+3. If all attempts fail **during startup**, the storage enters a *degraded* state instead of crashing the whole process. Your ASP.NET Core application can still start and serve other endpoints that do not depend on Hangfire.
+4. On the *first actual use* of the storage (e.g. dashboard, background job server), Hangfire will try to initialize again. If the database is available by then, initialization succeeds and everything works as usual. If it is still unavailable, the storage stays uninitialized and tries again on the next use; an `InvalidOperationException` with the original database exception as `InnerException` is thrown at that call site only when `AllowDegradedModeWithoutStorage` is `false`.
+
+This behavior is designed to make applications more robust in scenarios where the database may briefly lag behind the application during deployments or orchestrated restarts.
+
+#### Opting out of resilient startup (fail fast)
+
+If you prefer to fail the whole process immediately when the database is not reachable during startup, you can disable retries by setting `StartupConnectionMaxRetries` to `0`:
+
+```csharp
+var storageOptions = new PostgreSqlStorageOptions
+{
+    PrepareSchemaIfNecessary = true,
+    StartupConnectionMaxRetries = 0, // disables resilient startup
+    AllowDegradedModeWithoutStorage = false, // fail fast if DB is down at startup
+};
+
+services.AddHangfire((provider, config) =>
+{
+    config.UsePostgreSqlStorage(opts =>
+        opts.UseNpgsqlConnection(Configuration.GetConnectionString("HangfireConnection")),
+        storageOptions);
+});
+```
+
+With this configuration:
+
+- A single attempt is made to open a connection and prepare the schema.
+- If that attempt fails, the original exception propagates out of the storage constructor and application startup fails.
+
+#### Controlling degraded mode
+
+Degraded mode is controlled via `AllowDegradedModeWithoutStorage`:
+
+- `true` (default): if all attempts fail, at startup or during lazy initialization, the storage stays uninitialized and retries on subsequent uses until initialization succeeds.
+- `false`: if all startup attempts fail, the storage constructor will throw an `InvalidOperationException("Failed to initialize Hangfire PostgreSQL storage.", innerException)`.
+
+For example, to keep retries but still fail startup if the DB never becomes available:
+
+```csharp
+var storageOptions = new PostgreSqlStorageOptions
+{
+    PrepareSchemaIfNecessary = true,
+    StartupConnectionMaxRetries = 10, // more aggressive retry policy
+    StartupConnectionBaseDelay = TimeSpan.FromSeconds(2),
+    StartupConnectionMaxDelay = TimeSpan.FromMinutes(2),
+    AllowDegradedModeWithoutStorage = false, // do not start the app without storage
+};
+```
+
+#### Turning off schema preparation entirely
+
+If you manage the Hangfire schema yourself (for example via migrations or a dedicated deployment step) and do not want the storage to touch the database during startup or on first use, set `PrepareSchemaIfNecessary = false`:
+
+```csharp
+var storageOptions = new PostgreSqlStorageOptions
+{
+    PrepareSchemaIfNecessary = false, // no schema installation/upgrade
+};
+```
+
+In this case:
+
+- No schema initialization is performed by `PostgreSqlStorage`.
+- The first query that actually needs the database will fail if the schema is missing or mismatched, so you must ensure it is created/updated out of band.
+
+> Note: startup resilience settings (`StartupConnectionMaxRetries`, `AllowDegradedModeWithoutStorage`, etc.) only apply when `PrepareSchemaIfNecessary` is `true`.
+
 ### License
 
 Copyright © 2014-2024 Frank Hommers https://github.com/hangfire-postgres/Hangfire.PostgreSql.
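
Review note on the README change: to make the retry timing in step 2 of the default-behavior list concrete, the following is a minimal sketch that reproduces the backoff arithmetic of `ComputeBackoffDelay` from this commit as a standalone helper. `BackoffSchedule` and its method names are hypothetical and are not part of Hangfire.PostgreSql; only the formula and the capping rule are taken from the diff.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Hangfire.PostgreSql.
public static class BackoffSchedule
{
    // Same arithmetic as ComputeBackoffDelay in this commit:
    // baseDelay * 2^(attempt - 1), replaced by maxDelay when it exceeds the cap or overflows.
    public static TimeSpan Delay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt <= 0)
        {
            return baseDelay;
        }

        double millis = baseDelay.TotalMilliseconds * Math.Pow(2, attempt - 1);
        if (millis < 0 || millis > maxDelay.TotalMilliseconds || double.IsInfinity(millis) || double.IsNaN(millis))
        {
            millis = maxDelay.TotalMilliseconds;
        }

        return TimeSpan.FromMilliseconds(millis);
    }

    // The retry loop sleeps after every failed attempt except the last one,
    // so a policy with maxRetries retries produces exactly maxRetries delays.
    public static IReadOnlyList<TimeSpan> Delays(int maxRetries, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        var delays = new List<TimeSpan>();
        for (int failedAttempts = 1; failedAttempts <= maxRetries; failedAttempts++)
        {
            delays.Add(Delay(failedAttempts, baseDelay, maxDelay));
        }

        return delays;
    }

    // Sum of all sleeps when every attempt fails (connection timeouts not included).
    public static TimeSpan TotalSleep(int maxRetries, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        TimeSpan total = TimeSpan.Zero;
        foreach (TimeSpan delay in Delays(maxRetries, baseDelay, maxDelay))
        {
            total += delay;
        }

        return total;
    }
}
```

With the defaults (5 retries, 1 second base, 1 minute cap) the delays are 1, 2, 4, 8 and 16 seconds, about 31 seconds of sleeping before startup gives up. With the README's more aggressive example (10 retries, 2 second base, 2 minute cap) the delays are 2, 4, 8, 16, 32 and 64 seconds followed by four capped delays of 120 seconds, roughly 10 minutes in total. Since lazy initialization runs the same loop, the first `GetConnection()` or `GetMonitoringApi()` call after a failed startup can block for a similar duration.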

src/Hangfire.PostgreSql/PostgreSqlStorage.cs

Lines changed: 144 additions & 17 deletions
@@ -24,6 +24,7 @@
 using System.Data;
 using System.Data.Common;
 using System.Text;
+using System.Threading;
 using System.Transactions;
 using Hangfire.Annotations;
 using Hangfire.Logging;
@@ -39,7 +40,10 @@ namespace Hangfire.PostgreSql
   public class PostgreSqlStorage : JobStorage
   {
     private readonly IConnectionFactory _connectionFactory;
-
+    private readonly object _initializationLock = new();
+    private bool _initialized;
+    private Exception _lastInitializationException;
+
     private readonly Dictionary<string, bool> _features =
       new(StringComparer.OrdinalIgnoreCase)
       {
@@ -85,27 +89,18 @@ public PostgreSqlStorage(IConnectionFactory connectionFactory, PostgreSqlStorage
       _connectionFactory = connectionFactory ?? throw new ArgumentNullException(nameof(connectionFactory));
       Options = options ?? throw new ArgumentNullException(nameof(options));
 
-      if (options.PrepareSchemaIfNecessary)
-      {
-        NpgsqlConnection connection = CreateAndOpenConnection();
-        try
-        {
-          PostgreSqlObjectsInstaller.Install(connection, options.SchemaName);
-        }
-        finally
-        {
-          if (connectionFactory is not ExistingNpgsqlConnectionFactory)
-          {
-            connection.Dispose();
-          }
-        }
-      }
-
       InitializeQueueProviders();
       if (Options.UseSlidingInvisibilityTimeout)
       {
         HeartbeatProcess = new PostgreSqlHeartbeatProcess();
       }
+
+      // Perform eager initialization if schema preparation is requested. This can be made resilient
+      // via the options exposed on PostgreSqlStorageOptions.
+      if (Options.PrepareSchemaIfNecessary)
+      {
+        TryInitializeStorage(isStartup: true);
+      }
     }
 
     public PersistentJobQueueProviderCollection QueueProviders { get; internal set; }
@@ -116,11 +111,13 @@ public PostgreSqlStorage(IConnectionFactory connectionFactory, PostgreSqlStorage
 
     public override IMonitoringApi GetMonitoringApi()
     {
+      EnsureInitialized();
       return new PostgreSqlMonitoringApi(this, QueueProviders);
     }
 
     public override IStorageConnection GetConnection()
     {
+      EnsureInitialized();
       return new PostgreSqlConnection(this);
     }
 
@@ -199,6 +196,136 @@ internal NpgsqlConnection CreateAndOpenConnection()
       }
     }
 
+    /// <summary>
+    /// Ensures storage is initialized. When resilient startup and degraded mode are enabled,
+    /// this will attempt a lazy initialization on first use.
+    /// </summary>
+    private void EnsureInitialized()
+    {
+      if (_initialized || !Options.PrepareSchemaIfNecessary)
+      {
+        return;
+      }
+
+      lock (_initializationLock)
+      {
+        if (_initialized)
+        {
+          return;
+        }
+
+        TryInitializeStorage(isStartup: false);
+
+        if (!_initialized && !Options.AllowDegradedModeWithoutStorage)
+        {
+          // Initialization failed and degraded mode is not enabled - rethrow with the last error
+          // to give a clear signal to the caller.
+          throw new InvalidOperationException(
+            "Hangfire PostgreSQL storage is not initialized. See inner exception for details.",
+            _lastInitializationException);
+        }
+      }
+    }
+
+    private void TryInitializeStorage(bool isStartup)
+    {
+      // Fast-path: no resilient startup configured - keep the current behavior of a single attempt.
+      if (!Options.EnableResilientStartup)
+      {
+        PerformSingleInitializationAttempt();
+        _initialized = true;
+        _lastInitializationException = null;
+        return;
+      }
+
+      int attempts = 0;
+      int maxAttempts = 1 + Options.StartupConnectionMaxRetries; // initial + retries
+      Exception lastException = null;
+
+      while (attempts < maxAttempts)
+      {
+        try
+        {
+          PerformSingleInitializationAttempt();
+          _initialized = true;
+          _lastInitializationException = null;
+          return;
+        }
+        catch (Exception ex)
+        {
+          lastException = ex;
+          attempts++;
+
+          if (attempts >= maxAttempts)
+          {
+            break;
+          }
+
+          // Apply exponential backoff with capping.
+          TimeSpan delay = ComputeBackoffDelay(attempts, Options.StartupConnectionBaseDelay, Options.StartupConnectionMaxDelay);
+
+          try
+          {
+            Thread.Sleep(delay);
+          }
+          catch (ThreadInterruptedException)
+          {
+            // Preserve original exception and abort initialization.
+            break;
+          }
+        }
+      }
+
+      _initialized = false;
+      _lastInitializationException = lastException;
+
+      if (!Options.AllowDegradedModeWithoutStorage && isStartup)
+      {
+        // During startup without degraded mode, fail fast to avoid starting the app in
+        // a partially configured state.
+        throw new InvalidOperationException(
+          "Failed to initialize Hangfire PostgreSQL storage.",
+          lastException);
+      }
+
+      // When degraded mode is allowed, we swallow the exception here and leave storage
+      // uninitialized. Subsequent calls will attempt to initialize lazily via EnsureInitialized.
+    }
+
+    private void PerformSingleInitializationAttempt()
+    {
+      NpgsqlConnection connection = CreateAndOpenConnection();
+      try
+      {
+        PostgreSqlObjectsInstaller.Install(connection, Options.SchemaName);
+      }
+      finally
+      {
+        if (_connectionFactory is not ExistingNpgsqlConnectionFactory)
+        {
+          connection.Dispose();
+        }
+      }
+    }
+
+    private static TimeSpan ComputeBackoffDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
+    {
+      if (attempt <= 0)
+      {
+        return baseDelay;
+      }
+
+      double factor = Math.Pow(2, attempt - 1);
+      double millis = baseDelay.TotalMilliseconds * factor;
+
+      if (millis < 0 || millis > maxDelay.TotalMilliseconds || double.IsInfinity(millis) || double.IsNaN(millis))
+      {
+        millis = maxDelay.TotalMilliseconds;
+      }
+
+      return TimeSpan.FromMilliseconds(millis);
+    }
+
     internal void UseTransaction(DbConnection dedicatedConnection,
       [InstantHandle] Action<DbConnection, IDbTransaction> action,
       IsolationLevel? isolationLevel = null)
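
Review note on `EnsureInitialized`: the lazy-initialization path can be studied in isolation. The sketch below is a simplified model of the pattern in this hunk: double-checked locking around an initializer that may fail, with a flag deciding whether the failure is swallowed (degraded mode) or surfaced as an `InvalidOperationException`. `DegradedModeInitializer` is a hypothetical name, the `Action` delegate stands in for schema installation, the retry loop is left out, and the flag is declared `volatile` here, which the commit itself does not do.

```csharp
using System;

// Hypothetical model of the lazy-initialization pattern, not part of Hangfire.PostgreSql.
public sealed class DegradedModeInitializer
{
    private readonly object _gate = new object();
    private readonly Action _initialize;      // stand-in for connecting and installing the schema
    private readonly bool _allowDegradedMode;
    private volatile bool _initialized;       // volatile so the unlocked read sees the latest write

    public DegradedModeInitializer(Action initialize, bool allowDegradedMode)
    {
        _initialize = initialize ?? throw new ArgumentNullException(nameof(initialize));
        _allowDegradedMode = allowDegradedMode;
    }

    public bool IsInitialized => _initialized;

    public Exception LastError { get; private set; }

    public void EnsureInitialized()
    {
        // First check without the lock: the common case after a successful initialization.
        if (_initialized)
        {
            return;
        }

        lock (_gate)
        {
            // Second check: another thread may have finished while this one waited for the lock.
            if (_initialized)
            {
                return;
            }

            try
            {
                _initialize();
                LastError = null;
                _initialized = true;
            }
            catch (Exception ex)
            {
                LastError = ex;
                if (!_allowDegradedMode)
                {
                    throw new InvalidOperationException(
                        "Storage is not initialized. See inner exception for details.", ex);
                }

                // Degraded mode: swallow the failure so the next call tries again.
            }
        }
    }
}
```

In this model, as in the commit, a caller in degraded mode gets no exception from `EnsureInitialized()` itself when initialization fails; the failure only becomes visible through the state (`IsInitialized`, `LastError`) or through whatever the caller does next against the missing resource.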

src/Hangfire.PostgreSql/PostgreSqlStorageOptions.cs

Lines changed: 36 additions & 0 deletions
@@ -50,6 +50,10 @@ public PostgreSqlStorageOptions()
       EnableTransactionScopeEnlistment = true;
       DeleteExpiredBatchSize = 1000;
       UseSlidingInvisibilityTimeout = false;
+      StartupConnectionMaxRetries = 5;
+      StartupConnectionBaseDelay = TimeSpan.FromSeconds(1);
+      StartupConnectionMaxDelay = TimeSpan.FromMinutes(1);
+      AllowDegradedModeWithoutStorage = true;
     }
 
     public TimeSpan QueuePollInterval
@@ -134,6 +138,38 @@ public int DeleteExpiredBatchSize
     /// </summary>
     public bool UseSlidingInvisibilityTimeout { get; set; }
 
+    /// <summary>
+    /// Gets whether additional resilience during storage initialization is enabled. When <see cref="StartupConnectionMaxRetries"/>
+    /// is greater than zero and <see cref="PrepareSchemaIfNecessary"/> is true, Hangfire will retry opening a
+    /// connection and installing schema instead of failing immediately.
+    /// This property is computed from <see cref="StartupConnectionMaxRetries"/>.
+    /// </summary>
+    public bool EnableResilientStartup => StartupConnectionMaxRetries > 0;
+
+    /// <summary>
+    /// Maximum number of additional attempts (after the initial one) to obtain a connection and
+    /// prepare the schema during startup when <see cref="EnableResilientStartup"/> is true.
+    /// A value of 0 keeps the previous behavior (no retries).
+    /// </summary>
+    public int StartupConnectionMaxRetries { get; set; }
+
+    /// <summary>
+    /// Base delay used to compute exponential backoff between startup connection attempts when <see cref="EnableResilientStartup"/> is true.
+    /// </summary>
+    public TimeSpan StartupConnectionBaseDelay { get; set; }
+
+    /// <summary>
+    /// Maximum delay between startup connection attempts when <see cref="EnableResilientStartup"/> is true.
+    /// </summary>
+    public TimeSpan StartupConnectionMaxDelay { get; set; }
+
+    /// <summary>
+    /// When true and <see cref="EnableResilientStartup"/> is enabled, storage initialization will
+    /// not throw even if all startup connection attempts fail. Instead, the storage starts in a
+    /// degraded mode and will attempt to initialize lazily on first use.
+    /// </summary>
+    public bool AllowDegradedModeWithoutStorage { get; set; }
+
     private static void ThrowIfValueIsNotPositive(TimeSpan value, string fieldName)
     {
       string message = $"The {fieldName} property value should be positive. Given: {value}.";
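
Review note on how the new options interact: reading `TryInitializeStorage` together with these properties, the outcome of a startup where the database stays down depends on three settings. The sketch below encodes that reading as a small decision function. `StartupFailureOutcome` and `StartupOutcome` are hypothetical names introduced for illustration, and the mapping is an interpretation of the diff above, not an API of the library.

```csharp
using System;

// Hypothetical summary of the startup paths in this commit, not part of Hangfire.PostgreSql.
public enum StartupFailureOutcome
{
    NoInitializationAttempted,       // PrepareSchemaIfNecessary = false: the constructor never touches the database
    OriginalExceptionPropagates,     // retries disabled: the single attempt's exception leaves the constructor unwrapped
    InvalidOperationExceptionThrown, // retries exhausted and degraded mode is off
    DegradedMode                     // retries exhausted and degraded mode is on: constructor returns, storage uninitialized
}

public static class StartupOutcome
{
    // What the storage constructor does when every connection attempt fails.
    public static StartupFailureOutcome WhenDatabaseStaysDown(
        bool prepareSchemaIfNecessary,
        int startupConnectionMaxRetries,
        bool allowDegradedModeWithoutStorage)
    {
        if (!prepareSchemaIfNecessary)
        {
            return StartupFailureOutcome.NoInitializationAttempted;
        }

        // EnableResilientStartup is computed as StartupConnectionMaxRetries > 0.
        if (startupConnectionMaxRetries <= 0)
        {
            return StartupFailureOutcome.OriginalExceptionPropagates;
        }

        return allowDegradedModeWithoutStorage
            ? StartupFailureOutcome.DegradedMode
            : StartupFailureOutcome.InvalidOperationExceptionThrown;
    }
}
```

One consequence worth checking against the intended design: with `StartupConnectionMaxRetries = 0` the fast path runs regardless of `AllowDegradedModeWithoutStorage`, so degraded mode appears to require at least one retry to take effect.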
