Skip to content

dnsdist: add opt-in fatal bind failures for console and webserver#17099

Open
Annih wants to merge 1 commit into
PowerDNS:masterfrom
Annih:issue17036_dnsdist_optin_fatal_bind_failures
Open

dnsdist: add opt-in fatal bind failures for console and webserver#17099
Annih wants to merge 1 commit into
PowerDNS:masterfrom
Annih:issue17036_dnsdist_optin_fatal_bind_failures

Conversation

@Annih
Copy link
Copy Markdown
Contributor

@Annih Annih commented Mar 31, 2026

Short description

Introduce opt-in fatal behavior when binding the webserver socket or the control socket fails, to make startup failures visible to service managers like systemd.
Expose the feature in both configuration styles:

  • Lua: setConsoleBindFatal(bool), setWebserverBindFatal(bool)
  • YAML: console.bind_fatal, webserver.bind_fatal

When enabled, dnsdist now exits with failure on bind exceptions for:

  • control socket listeners
  • webserver listeners

Wire the new settings through runtime configuration loading, Lua configuration items, and YAML parsing, and add console completion entries for both setters.
Update documentation with new config functions and behavior notes.

Add regression tests in test_BindFatal.py for Lua and YAML, validating:

  • default/not set: bind failures are non-fatal
  • explicit false: bind failures are non-fatal
  • explicit true: bind failures are fatal at startup

This should solve #17036

Checklist

I have:

  • read the CONTRIBUTING.md document
  • read and accepted the Developer Certificate of Origin document, including the AI Policy, and added a "Signed-off-by" to my commits
  • compiled this code
  • tested this code
  • included documentation (including possible behaviour changes)
  • documented the code
  • added or modified regression test(s)
  • added or modified unit test(s)

@Annih Annih force-pushed the issue17036_dnsdist_optin_fatal_bind_failures branch 3 times, most recently from 64d598b to 274ff6d Compare March 31, 2026 20:33
@Annih
Copy link
Copy Markdown
Contributor Author

Annih commented Mar 31, 2026

OK after fixing the python indentation issues (in 3 takes) I just noticed I forgot the mention closes #XXXX 🤦🏻 should I add it?

Also note that in the doc I added added in version 2.2.0, because @rgacogne mentionned it could be a good timeline, but:

  1. maybe it is not my role to document that?
  2. maybe you want it later or earlier?
    Let me know!

Finally, I also struggled on the case of the webserver setting, I settled on webserverBindFatal because I saw webserverConfig, but I also noticed webServerACL or something similar; let me know if you want this to be changed 🤷🏻‍♂️

@Annih
Copy link
Copy Markdown
Contributor Author

Annih commented Mar 31, 2026

Humpf I need to understand why my simple tests are working locally but not on the CI 🤔will see later this week :(

@rgacogne
Copy link
Copy Markdown
Member

rgacogne commented Apr 8, 2026

At a quick glance the code looks very nice, thanks! Let me know if you want me to have a look at the test failures!

@Annih
Copy link
Copy Markdown
Contributor Author

Annih commented Apr 22, 2026

@rgacogne I didn't find the time yet to get back to it to fix test, if you have time soon-ish please have a look, my setup to rebuild/repro is not perfect. (it currently works on my machine ... ^^')

@rgacogne
Copy link
Copy Markdown
Member

I can reproduce locally, the problem is that the regression tests check the exit status of dnsdist a bit too quickly, and TSAN slow things down a lot, especially on shitty GH action runners. Let me see if I can come up with something that does not slow down the tests too much.

@rgacogne
Copy link
Copy Markdown
Member

This fixes the issue for me:

diff --git regression-tests.dnsdist/test_BindFatal.py regression-tests.dnsdist/test_BindFatal.py
index 11282c896b..b5e0379708 100644
--- regression-tests.dnsdist/test_BindFatal.py
+++ regression-tests.dnsdist/test_BindFatal.py
@@ -1,5 +1,6 @@
 #!/usr/bin/env python
 import unittest
+import time

 from dnsdisttests import DNSDistTest

@@ -17,6 +18,11 @@ class _BindFatalMixin:
         cls._startupFailed = False
         try:
             super().setUpClass()
+            for _ in range(0, 20):
+                if cls._dnsdist.poll() is not None:
+                    cls._startupFailed = True
+                    break
+                time.sleep(0.1)
         except unittest.SkipTest:
             raise
         except Exception:

@rgacogne
Copy link
Copy Markdown
Member

Hi! Do you mind if a push the fix I mentioned earlier to your branch?

@Annih Annih force-pushed the issue17036_dnsdist_optin_fatal_bind_failures branch from 274ff6d to 138e88d Compare May 26, 2026 19:52
@Annih
Copy link
Copy Markdown
Contributor Author

Annih commented May 26, 2026

Hello @rgacogne , thanks for checking and your proposal to amend my commit; I was a bit busy on other topics, and our dnsdist usecase was a bit in pause.
I just applied your patch, do you need anything else to be changed?

For instance I refere to version 2.2.0, should I be the one to add that? should I remove it?

@rgacogne rgacogne force-pushed the issue17036_dnsdist_optin_fatal_bind_failures branch from 138e88d to c2f260d Compare May 28, 2026 07:24
@rgacogne
Copy link
Copy Markdown
Member

I rebased your PR on the current main branch to fix a conflict

@rgacogne
Copy link
Copy Markdown
Member

I just applied your patch, do you need anything else to be changed?

Looks good to me, thanks! Let's see if CI is happy with it :)

For instance I refere to version 2.2.0, should I be the one to add that? should I remove it?

No, what you did is exactly what was needed, no worries.

@coveralls
Copy link
Copy Markdown

coveralls commented May 28, 2026

Coverage Report for CI Build 26601352724

Coverage increased (+4.1%) to 71.072%

Details

  • Coverage increased (+4.1%) from the base build.
  • Patch coverage: 10 uncovered changes across 2 files (6 of 16 lines covered, 37.5%).
  • 40 coverage regressions across 8 files.

Uncovered Changes

File Changed Covered %
pdns/dnsdistdist/dnsdist-lua.cc 6 0 0.0%
pdns/dnsdistdist/dnsdist.cc 6 2 33.33%
Total (4 files) 16 6 37.5%

Coverage Regressions

40 previously-covered lines in 8 files lost coverage.

File Lines Losing Coverage Coverage
pdns/recursordist/test-syncres_cc1.cc 13 79.93%
pdns/recursordist/syncres.cc 8 81.23%
pdns/dnsdistdist/dnsdist-carbon.cc 7 59.56%
pdns/dnsdistdist/dnsdist-concurrent-connections.cc 4 90.65%
pdns/communicator.cc 3 63.46%
pdns/recursordist/test-syncres_cc2.cc 3 79.39%
pdns/backends/gsql/gsqlbackend.hh 1 97.77%
pdns/dnsdistdist/dnsdist-lua-configuration-items.cc 1 62.83%

Coverage Stats

Coverage Status
Relevant Lines: 170532
Covered Lines: 132763
Line Coverage: 77.85%
Relevant Branches: 81434
Covered Branches: 46314
Branch Coverage: 56.87%
Branches in Coverage %: Yes
Coverage Strength: 7174737.34 hits per line

💛 - Coveralls

rgacogne
rgacogne previously approved these changes May 28, 2026
@rgacogne rgacogne requested a review from pieterlexis May 28, 2026 14:13
Copy link
Copy Markdown
Contributor

@pieterlexis pieterlexis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, good stuff. 2 small documentation nits.

Comment thread pdns/dnsdistdist/dnsdist-settings-definitions.yml
Comment thread pdns/dnsdistdist/dnsdist-settings-definitions.yml
Introduce opt-in fatal behavior when binding the webserver socket or
the control socket fails, to make startup failures visible to service
managers like systemd.
Expose the feature in both configuration styles:
- Lua: setConsoleBindFatal(bool), setWebserverBindFatal(bool)
- YAML: console.bind_fatal, webserver.bind_fatal

When enabled, dnsdist now exits with failure on bind exceptions for:
- control socket listeners
- webserver listeners

Wire the new settings through runtime configuration loading, Lua
configuration items, and YAML parsing, and add console completion
entries for both setters.
Update documentation with new config functions and behavior notes.

Add regression tests in test_BindFatal.py for Lua and YAML, validating:
- default/not set: bind failures are non-fatal
- explicit false: bind failures are non-fatal
- explicit true: bind failures are fatal at startup

Signed-off-by: b.courtois <b.courtois@criteo.com>
@Annih Annih force-pushed the issue17036_dnsdist_optin_fatal_bind_failures branch from c2f260d to a37aaaf Compare May 28, 2026 20:48
@rgacogne rgacogne requested a review from pieterlexis May 29, 2026 06:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants