Kea DHCPv4 not running after update (race condition during service restart) #9743

@OscaAlb

Describe the bug

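After an OPNsense update (from 26.1_4 to 26.1.1), Kea DHCP fails to bind to its sockets on startup, apparently due to a race condition during the service restart. The old Kea process's sockets appear to still be in use when the new process attempts to start. (I initially suspected TIME_WAIT, but that is a TCP state and DHCP uses UDP, so a more likely explanation is that the old process had not fully exited.) Kea then keeps running in a broken state for hours, with the process alive but unable to serve any DHCP requests, until someone intervenes manually.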

Critically, Kea does not retry binding to sockets after the initial failure, and there is no alerting that the DHCP server is non-functional. The only indication in the logs is a WARN level message, and Kea continues running its housekeeping tasks (Lease File Cleanup) as if everything is normal.

Last known working version: 26.1_4 (Kea was working correctly before the upgrade)

Note: This is related to #9609 but has a distinct root cause. In #9609, dnsmasq was re-enabled during the update. In my case, ISC DHCP was already disabled and not running; the socket conflict was caused by the old Kea process's sockets not being released before the new process started (race condition during service restart).

To Reproduce

  1. Have Kea DHCP running and serving multiple VLANs/subnets
  2. Perform an OPNsense update that triggers a Kea service restart
  3. Kea shuts down and restarts within a few minutes
  4. New Kea instance fails to bind to port 67 with "Address already in use" on all interfaces
  5. Kea runs but serves zero DHCP requests
  6. Network connectivity is lost as client leases expire (~67 minutes with the default 4000-second lease time)
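
To put the timeline in step 6 into numbers, here is a small sketch of the client-side lease timers for a 4000-second lease. It assumes clients use the RFC 2131 default fractions for renewal (T1 = 50%) and rebinding (T2 = 87.5%); a server can override these, and I have not checked which values OPNsense configures. The function name `lease_timers` is made up for this example.

```python
from datetime import timedelta


def lease_timers(valid_lifetime_s: int) -> dict:
    """Return renewal (T1), rebinding (T2) and expiry offsets for a lease.

    Assumes the RFC 2131 default fractions: T1 = 0.5 and T2 = 0.875
    of the lease duration.
    """
    return {
        "t1_renew": timedelta(seconds=valid_lifetime_s * 0.5),
        "t2_rebind": timedelta(seconds=valid_lifetime_s * 0.875),
        "expiry": timedelta(seconds=valid_lifetime_s),
    }


if __name__ == "__main__":
    # 4000 seconds is the lease time mentioned in the reproduction steps.
    for name, offset in lease_timers(4000).items():
        print(f"{name}: {offset}")
```

A client that renewed just before the shutdown keeps its address for the full lease time; clients that were already further into their lease lose connectivity sooner.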

Expected behavior

  • Kea should wait for sockets to be fully released before attempting to bind, OR
  • Kea should retry binding to sockets after initial failure, OR
  • OPNsense should detect that Kea failed to bind and alert the administrator / attempt a restart, OR
  • At minimum, the failure should be logged at ERROR level, not just WARN
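
The retry behaviour in the second bullet could look roughly like the following. This is a generic Python sketch of "retry while the address is in use", not Kea's actual code; `bind_with_retry` and its parameters are hypothetical names chosen for illustration.

```python
import errno
import socket
import time


def bind_with_retry(address: str, port: int,
                    max_retries: int = 5, wait_s: float = 1.0) -> socket.socket:
    """Bind a UDP socket, retrying while the address is still in use.

    Raises the last OSError if the retries are exhausted, or immediately
    for any error other than EADDRINUSE.
    """
    attempt = 0
    while True:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.bind((address, port))
            return sock
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRINUSE or attempt >= max_retries:
                raise
            attempt += 1
            time.sleep(wait_s)  # give the previous owner time to release the port
```

With `max_retries=0` this fails on the first "Address already in use" error, which matches what the log below shows ("attempts: 0").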

Describe alternatives you considered

  • Reverted to ISC DHCP to restore network connectivity
  • Considered manually restarting Kea after updates, but this defeats the purpose of automatic updates

Screenshots

N/A

Relevant log files

See attached log file showing:

  • 00:27-00:33: Normal DHCP operation, devices receiving leases
  • 00:33:52: Kea shutdown command received (triggered by update)
  • 00:36:01: Kea restart fails with DHCPSRV_NO_SOCKETS_OPEN and "Address already in use" on all interfaces
  • 01:36-06:36: Only LFC housekeeping tasks running, zero DHCP traffic served
  • 07:30: Manual recovery attempt

Key error messages:

DHCPSRV_OPEN_SOCKET_FAIL failed to open socket: Failed to open socket on interface vlan08, reason: failed to bind fallback socket to address 10.12.225.1, port 67, reason: Address already in use - is another DHCP server running?
DHCP4_OPEN_SOCKETS_FAILED maximum number of open service sockets attempts: 0, has been exhausted without success
DHCPSRV_NO_SOCKETS_OPEN no interface configured to listen to DHCP traffic
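
Since these messages are currently the only sign of the failure, a post-update check could scan the Kea log for them. Below is a minimal sketch in Python; the function name `find_socket_failures` is hypothetical, and the sample lines are the messages quoted above without their timestamp and severity prefixes (the exact log line layout is not assumed).

```python
# Message IDs taken from the log excerpts in this report.
SOCKET_FAILURE_IDS = (
    "DHCPSRV_OPEN_SOCKET_FAIL",
    "DHCP4_OPEN_SOCKETS_FAILED",
    "DHCPSRV_NO_SOCKETS_OPEN",
)


def find_socket_failures(lines):
    """Return (message_id, line) pairs for lines reporting a socket failure."""
    hits = []
    for line in lines:
        for message_id in SOCKET_FAILURE_IDS:
            if message_id in line:
                hits.append((message_id, line.rstrip("\n")))
                break  # count each line once
    return hits


if __name__ == "__main__":
    sample = [
        "DHCPSRV_OPEN_SOCKET_FAIL failed to open socket: Failed to open socket "
        "on interface vlan08, reason: failed to bind fallback socket to address "
        "10.12.225.1, port 67, reason: Address already in use - is another DHCP "
        "server running?",
        "DHCP4_OPEN_SOCKETS_FAILED maximum number of open service sockets "
        "attempts: 0, has been exhausted without success",
        "DHCPSRV_NO_SOCKETS_OPEN no interface configured to listen to DHCP traffic",
    ]
    for message_id, _ in find_socket_failures(sample):
        print("socket failure:", message_id)
```

A real check would read the actual log file and then alert or restart the service; that part is left out because it depends on how OPNsense manages Kea.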

Additional context

  • ISC DHCP was disabled and not running at the time of the failure
  • The "Address already in use" error at 00:36:01 was caused by the old Kea process's sockets not being fully released, not by another DHCP server
  • Kea ran for nearly 7 hours in this broken state, performing hourly Lease File Cleanup but serving zero DHCP requests
  • This resulted in complete network connectivity loss as leases expired
  • Configuration: 8 subnets across multiple VLANs, multi-threading enabled with 8 threads
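
The "attempts: 0" in the DHCP4_OPEN_SOCKETS_FAILED message suggests that socket retries are disabled in the generated configuration. Kea's `interfaces-config` has `service-sockets-max-retries` and `service-sockets-retry-wait-time` (in milliseconds) parameters for this. The sketch below builds such a fragment in Python and prints it as JSON. The values 5 and 5000 are illustrative assumptions, only one interface is shown, and I have not verified whether OPNsense exposes these settings or overwrites manual changes to the generated file.

```python
import json

# Fragment only, not a complete Kea configuration.
# The retry values are illustrative, not recommendations.
dhcp4_fragment = {
    "Dhcp4": {
        "interfaces-config": {
            "interfaces": ["vlan08"],  # interface name taken from the log above
            "service-sockets-max-retries": 5,         # retry binding up to 5 times
            "service-sockets-retry-wait-time": 5000,  # wait 5000 ms between attempts
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(dhcp4_fragment, indent=2))
```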

Environment

OPNsense 26.1.1 (amd64, upgraded from 26.1_4)
Kea DHCP 3.0.2
Deciso DEC3860 (AMD EPYC 3201, 32GB DDR4 RAM, 4x GbE + 2x SFP+ 10Gbps)

kea_clean.log

Labels: support (Community support or awaiting triage)
