fix: Don't advertise IDLE until ChromeDriver is validated at startup #598

Open
aaronkvanmeerten wants to merge 2 commits into master from chromedriver-startup-check

Conversation

@aaronkvanmeerten
Member

Summary

  • Jibri was broadcasting IDLE+HEALTHY to XMPP MUCs immediately on startup before ChromeDriver was validated, allowing Jicofo to dispatch requests that would fail during ChromeDriver initialization
  • Adds an EnvironmentChecker that registers as UNHEALTHY at construction, then probes ChromeDriver in the background — Jibri only becomes HEALTHY (eligible for dispatch) after the probe succeeds
  • Retry defaults (6 attempts, 10s delay) accommodate post-reboot delays where Xvfb/Chrome may not be immediately available
  • Configurable via jibri.chrome.startup-check, startup-check-max-attempts, and startup-check-retry-delay
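
The flow in the summary above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `HealthRegistry`, the `probe` lambda, the `start()` method, and the `"environment"` component name are all hypothetical stand-ins, and the real probe would start and stop a ChromeDriver session rather than call a lambda.

```kotlin
import java.util.concurrent.ConcurrentHashMap
import kotlin.concurrent.thread

enum class Health { HEALTHY, UNHEALTHY }

// Hypothetical stand-in for whatever aggregates component health in Jibri.
class HealthRegistry {
    private val components = ConcurrentHashMap<String, Health>()

    fun update(name: String, health: Health) {
        components[name] = health
    }

    // Overall health is HEALTHY only if every registered component is HEALTHY.
    // With nothing registered (check disabled) this reports HEALTHY immediately.
    fun overall(): Health =
        if (components.values.all { it == Health.HEALTHY }) Health.HEALTHY else Health.UNHEALTHY
}

class EnvironmentChecker(
    private val registry: HealthRegistry,
    private val maxAttempts: Int = 6,
    private val retryDelayMs: Long = 10_000,
    // Assumption: returns true when ChromeDriver could be started and stopped.
    private val probe: () -> Boolean
) {
    init {
        // Register as UNHEALTHY at construction, before any presence is advertised.
        registry.update(NAME, Health.UNHEALTHY)
    }

    // Runs the probe on a background thread and returns it so callers can join.
    fun start(): Thread = thread(name = "environment-checker", isDaemon = true) {
        for (attempt in 1..maxAttempts) {
            val ok = try { probe() } catch (e: Exception) { false }
            if (ok) {
                registry.update(NAME, Health.HEALTHY)
                return@thread
            }
            // Wait before the next attempt; no wait after the final one.
            if (attempt < maxAttempts) Thread.sleep(retryDelayMs)
        }
        // All attempts failed: the component stays UNHEALTHY.
    }

    companion object {
        const val NAME = "environment"
    }
}
```

With a probe that fails once and then succeeds, `registry.overall()` reports `UNHEALTHY` right after construction and `HEALTHY` once `checker.start().join()` returns, which corresponds to the retry-then-succeed case in the test plan.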

Test plan

  • Unit tests for EnvironmentChecker (initial UNHEALTHY registration, successful transition, failure persistence, retry-then-succeed)
  • Full mvn verify passes (145 tests, 0 failures, ktlint clean)
  • Deploy to a test Jibri and verify:
    • Initial XMPP presence shows IDLE + UNHEALTHY
    • After the probe succeeds (within a few seconds), presence updates to IDLE + HEALTHY
    • HTTP /jibri/api/v1.0/health reflects the same transitions
    • Health check script does not trigger shutdown during normal startup window
  • Verify startup-check = false preserves old behavior (immediate IDLE + HEALTHY)

Jibri was broadcasting IDLE+HEALTHY to XMPP MUCs immediately on startup,
before ChromeDriver had been validated. If Jicofo dispatched a request
during this window, ChromeDriver initialization could fail, causing
dispatch errors. This was especially impactful during batch reboots.

Add an EnvironmentChecker that registers as UNHEALTHY on construction,
then probes ChromeDriver in the background. Jibri only transitions to
HEALTHY (and becomes eligible for dispatch) after the probe succeeds.

Configurable via jibri.chrome.startup-check (default: true),
startup-check-max-attempts (default: 6), and
startup-check-retry-delay (default: 10000ms).
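
To make the default retry budget concrete, here is a small sketch of the arithmetic. `StartupCheckConfig` and `maxRetryWait` are hypothetical names used only for illustration, and the calculation assumes no delay follows the final attempt and ignores the time each probe itself takes.

```kotlin
import java.time.Duration

// Hypothetical holder; the fields mirror the three settings listed above.
data class StartupCheckConfig(
    val enabled: Boolean = true,
    val maxAttempts: Int = 6,
    val retryDelay: Duration = Duration.ofMillis(10_000)
) {
    init {
        require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
        require(!retryDelay.isNegative) { "retryDelay must not be negative" }
    }

    // Total time spent waiting between attempts if every probe fails,
    // assuming there is no delay after the last attempt.
    fun maxRetryWait(): Duration =
        if (enabled) retryDelay.multipliedBy((maxAttempts - 1).toLong()) else Duration.ZERO
}
```

With the defaults, `StartupCheckConfig().maxRetryWait()` is 50 seconds (five waits of 10 s between six attempts). Under these assumptions, an external health check script would need to tolerate at least that long an UNHEALTHY window, plus probe time, before treating startup as failed.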
Member

@bgrozev left a comment

Suggest to nest the config under a "startup-check" section:
chrome {
  startup-check {
    enabled = true
    max-attempts = 6
    retry-delay = 10000
  }
}

Member Author

@aaronkvanmeerten left a comment

> Suggest to nest the config under a "startup-check" section:
> chrome {
>   startup-check {
>     enabled = true
>     max-attempts = 6
>     retry-delay = 10000
>   }
> }

Good idea, done.

2 participants