
Fix foreman.target start and stop ordering#412

Merged
evgeni merged 2 commits into theforeman:master from pablomh:fix_foreman_target_ordering
Mar 18, 2026

Conversation

@pablomh (Contributor) commented Mar 17, 2026

Two related fixes for race conditions in foreman.target lifecycle management.

Add startup ordering dependencies to foreman container

foreman.service was the only application service without After=/Wants= dependencies on redis and postgresql; dynflow-sidekiq, candlepin, and all Pulp services already declare them. Without these, on a foreman.target stop+start, foreman can start before postgresql has come back up, exceed the sd_notify timeout, and be marked failed by systemd, leaving port 3000 unbound and HTTPD returning 503.
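The missing directives can be sketched as follows. This is an illustrative excerpt, not the actual generated unit; the unit names redis.service and postgresql.service are assumptions based on the description above:

```ini
# foreman.service (illustrative excerpt)
[Unit]
# Wants= pulls redis and postgresql in when foreman starts;
# After= delays foreman's start until both have finished starting,
# so foreman no longer races postgresql on a target restart.
Wants=redis.service postgresql.service
After=redis.service postgresql.service
PartOf=foreman.target

[Service]
Type=notify
```

Using Wants= rather than Requires= keeps the dependency soft: foreman is ordered after the backing services but is not torn down if one of them restarts independently.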

Ensure all foreman.target services stop before the target

Add After=foreman.target to every service and timer that is PartOf=foreman.target. Due to systemd's reversed stop ordering, each service now stops before the target, meaning systemctl stop foreman.target blocks until all constituent services have fully stopped rather than returning the moment the target meta-unit transitions to inactive. Without this, a rapid stop+start can call systemctl start while services are still shutting down, causing container conflicts and races on the subsequent start.
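A sketch of the second fix, shown on one representative constituent unit (any service or timer that is PartOf=foreman.target gets the same line). The key point is that systemd runs stop jobs in the reverse of the After= ordering, so the target's stop job only completes once its members have stopped:

```ini
# dynflow-sidekiq@.service (illustrative excerpt)
[Unit]
# PartOf= propagates stop/restart of foreman.target to this unit...
PartOf=foreman.target
# ...and After= (reversed for stop jobs) makes foreman.target wait
# for this unit to finish stopping before the target goes inactive.
After=foreman.target
```

On start the ordering is harmless: the target reaching its active state does not delay the service in practice, since a target has no start-up work of its own.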

Before (from journal of a failing run):

systemctl stop foreman.target
Stopped target Foreman services  ← target stops immediately
Stopping foreman.service...      ← service still stopping
systemctl start foreman.target   ← race condition!
foreman container died           ← old foreman still stopping
Starting foreman.service...      ← new start races with postgresql
foreman.service: start operation timed out  ← FAIL

After (from journal, two independent runs):

systemctl stop foreman.target
Stopping foreman.service...
Stopped foreman.service.          ← service stops first ✓
Stopped target Foreman services   ← target stops after (~2s gap) ✓
systemctl start foreman.target    ← no race, clean start ✓
Starting foreman.service...       ← ~31s later, after postgresql is ready ✓
Started foreman.service.          ← sd_notify received ✓  PASS
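For an existing install that does not yet carry the fix, the same ordering could in principle be applied locally as a drop-in override. The path and contents below are a hypothetical sketch, not part of this PR:

```ini
# /etc/systemd/system/foreman.service.d/ordering.conf (hypothetical drop-in)
[Unit]
Wants=redis.service postgresql.service
After=redis.service postgresql.service foreman.target
```

A `systemctl daemon-reload` is required after creating the drop-in for systemd to pick it up.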

pablomh added 2 commits March 17, 2026 16:22
foreman.service was the only application service without After=/Wants=
for redis and postgresql, unlike dynflow-sidekiq, candlepin, and all
pulp services which already declared them. Without these, on a
foreman.target stop+start, foreman could race against postgresql coming
back up, exceed the sdnotify timeout, and be marked failed by systemd
— leaving port 3000 unbound and HTTPD returning 503.
Add After=foreman.target to every service and timer that is
PartOf=foreman.target. The reversed stop ordering (each service stops
before foreman.target) means systemctl stop foreman.target now blocks
until all constituent services have fully stopped, rather than returning
the moment the target meta-unit transitions to inactive.

Without this, a rapid stop+start of foreman.target can call systemctl
start while services are still in the process of stopping, leading to
container conflicts and races on the subsequent start.
@pablomh (Contributor, Author) commented Mar 17, 2026

This issue was discovered while testing #409.

@ehelms (Member) left a comment:
Makes sense to me. @evgeni ?

@evgeni (Member) left a comment:
Took me a moment to grasp the After=foreman.target part, but I think it makes sense.

@evgeni evgeni merged commit 805217d into theforeman:master Mar 18, 2026
11 checks passed

3 participants