Problem Description
When using WireGuard-ESP32 in a 24/7 application that requires automatic reconnection (e.g., after handshake timeout), the current end() method causes crashes due to two race conditions:
1. Timer Race Condition
The old instance's wireguardif_tmr() callback keeps running after a new WireGuard instance is created. When the old timer fires, it dereferences the freed device struct, causing crashes like:
Guru Meditation Error: Core 1 panic'ed (LoadProhibited)
PC: 0x420bb1c5: wireguardif_send_keepalive at wireguardif.c:654
EXCVADDR: 0x000000b0 (accessing freed device struct)
Root Cause: The timer is scheduled via sys_timeout(WIREGUARDIF_TIMER_MSECS, wireguardif_tmr, device) and only stopped via sys_untimeout() in wireguardif_shutdown(). Creating a new instance doesn't stop the old timer.
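The hazard can be reproduced on a host machine without lwIP. The sketch below is only an illustration: MockTimeouts, MockDevice, mock_tmr and teardown_then_tick are hypothetical names, and the queue is a simplified stand-in for lwIP's timeout list, not its real implementation. It shows that cancelling the pending (handler, arg) pair before freeing the device leaves nothing behind that could fire against freed memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for lwIP's timeout list (not the real code).
using TimeoutHandler = void (*)(void *arg);

struct PendingTimeout {
    TimeoutHandler handler;
    void *arg;
};

class MockTimeouts {
public:
    // Analogous to sys_timeout(): remember handler + arg for a later tick.
    void schedule(TimeoutHandler handler, void *arg) {
        pending_.push_back({handler, arg});
    }

    // Analogous to sys_untimeout(): drop the first entry matching handler + arg.
    void cancel(TimeoutHandler handler, void *arg) {
        auto it = std::find_if(pending_.begin(), pending_.end(),
                               [=](const PendingTimeout &t) {
                                   return t.handler == handler && t.arg == arg;
                               });
        if (it != pending_.end()) {
            pending_.erase(it);
        }
    }

    // Fire every pending timeout once; returns how many handlers ran.
    int fire_all() {
        std::vector<PendingTimeout> due;
        due.swap(pending_);
        for (const PendingTimeout &t : due) {
            t.handler(t.arg);
        }
        return static_cast<int>(due.size());
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::vector<PendingTimeout> pending_;
};

// Hypothetical device state that the timer callback touches.
struct MockDevice {
    int keepalives_sent = 0;
};

// Stand-in for wireguardif_tmr(): dereferences the device it was given.
void mock_tmr(void *arg) {
    static_cast<MockDevice *>(arg)->keepalives_sent++;
}

// Safe order: cancel the timer first, then free the device.
// Returns the number of callbacks that still fire afterwards.
int teardown_then_tick(MockTimeouts &timers, MockDevice *device) {
    timers.cancel(mock_tmr, device);
    delete device;
    return timers.fire_all();
}
```

Swapping the first two statements of teardown_then_tick() is the bug described here: the entry stays queued and mock_tmr() later runs against a dangling pointer.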
2. TCP Race Condition
Calling netif_remove() immediately in end() crashes when the TCP/IP stack still has in-flight packets:
Guru Meditation Error: Core 0 panic'ed (LoadProhibited)
#0 peer_lookup_by_allowed_ip at wireguardif.c:71
(inlined by) wireguardif_output at wireguardif.c:202
#1 ip4_output_if_opt_src
#2 tcp_output_segment
EXCVADDR: 0x000000b0
Root Cause: The netif is removed while lwIP still routes packets through it.
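A simplified host-side model of why taking the interface down first helps, assuming that route selection skips interfaces that are not up. MockNetif, mock_route and mock_send are hypothetical names and do not reproduce lwIP's routing code. The model covers only new packets; packets already queued inside the stack are not modelled.

```cpp
#include <vector>

// Hypothetical stand-in for an lwIP netif; only the flags relevant here.
struct MockNetif {
    bool up = true;
    bool link_up = true;
    int packets_out = 0;
};

// Models route selection: pick the first interface that is up with link up.
// Returns nullptr when none qualifies, so the packet is dropped instead of
// being handed to an interface that is being torn down.
MockNetif *mock_route(const std::vector<MockNetif *> &netifs) {
    for (MockNetif *n : netifs) {
        if (n->up && n->link_up) {
            return n;
        }
    }
    return nullptr;
}

// Models sending one packet; returns true if some interface accepted it.
bool mock_send(const std::vector<MockNetif *> &netifs) {
    MockNetif *n = mock_route(netifs);
    if (n == nullptr) {
        return false;
    }
    n->packets_out++;
    return true;
}
```

Once both flags are cleared, mock_send() stops reaching the interface's output path, which is the state the interface should be in before it is removed.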
Use Case
Our application is an ESP32-S3 Access Point running 24/7 that:
- Connects to WireGuard VPN for remote management
- Automatically reconnects after handshake timeout (3 minutes)
- Must remain stable without rebooting
- Cannot tolerate crashes during reconnection
Proposed Solution
Add a new method shutdown_timer_only() that safely handles both race conditions:
void WireGuard::shutdown_timer_only() {
    if( !this->_is_initialized ) return;

    log_i(TAG "Safe WireGuard shutdown starting...");

    // Step 1: Disconnect peer FIRST (stops sending keepalives)
    wireguardif_disconnect(wg_netif, wireguard_peer_index);
    log_i(TAG "Peer disconnected");

    // Step 2: Bring interface down (stops routing NEW packets)
    netif_set_down(wg_netif);
    netif_set_link_down(wg_netif);
    log_i(TAG "Interface and link disabled");

    // Step 3: Wait for in-flight packets to finish (critical!)
    vTaskDelay(pdMS_TO_TICKS(100));
    log_i(TAG "Waited for in-flight packets");

    // Step 4: Shutdown (stops timer via sys_untimeout)
    wireguardif_shutdown(wg_netif);
    log_i(TAG "Timer stopped");

    // Step 5: Remove netif from lwIP (now safe - no more traffic)
    netif_remove(wg_netif);
    log_i(TAG "Network interface removed");

    // Step 6: Mark as not initialized
    this->_is_initialized = false;
    wg_netif = nullptr;
    wireguard_peer_index = WIREGUARDIF_INVALID_INDEX;
}
Header Declaration
In WireGuard-ESP32.h:
void shutdown_timer_only();
Usage Pattern
static WireGuard* wg = nullptr;

void reconnect() {
    if (wg != nullptr && wg->is_initialized()) {
        wg->shutdown_timer_only(); // Safe cleanup
        delete wg;
    }
    wg = new WireGuard();
    wg->begin(...);
}
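The same ordering can be checked on a host with a mock in place of the real class. The sketch below is an illustration under stated assumptions: replace_tunnel and MockTunnel are hypothetical names, the helper only requires a type with is_initialized() and shutdown_timer_only(), and calling begin() with the real configuration is left to the caller. It also differs from the pattern above in one respect: the old object is destroyed even if it was never initialized.

```cpp
#include <memory>

// Replace the current tunnel with a fresh one, shutting the old one down first.
// Tunnel is any type that provides is_initialized() and shutdown_timer_only().
template <typename Tunnel>
void replace_tunnel(std::unique_ptr<Tunnel> &current) {
    if (current && current->is_initialized()) {
        current->shutdown_timer_only();  // stop the timer and remove the netif
    }
    current.reset();              // destroy the old instance before allocating
    current.reset(new Tunnel());  // the caller still has to call begin()
}

// Hypothetical mock used only to observe the call order on a host machine.
struct MockTunnel {
    static int live;       // number of instances currently alive
    static int shutdowns;  // number of shutdown_timer_only() calls so far

    bool initialized = true;

    MockTunnel() { ++live; }
    ~MockTunnel() { --live; }

    bool is_initialized() const { return initialized; }

    void shutdown_timer_only() {
        initialized = false;
        ++shutdowns;
    }
};

int MockTunnel::live = 0;
int MockTunnel::shutdowns = 0;
```

Resetting the pointer before allocating the replacement keeps at most one instance alive at a time, which matters on a memory-constrained target.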
Testing
We have implemented this as a workaround using an automatic patch script and tested extensively:
✅ Stable operation through multiple reconnection cycles
✅ No crashes during handshake timeout → reconnection
✅ Timer properly stopped before new instance
✅ TCP/IP stack has time to complete in-flight operations
✅ Production-ready in 24/7 environment
Benefits
- Safe reconnection without ESP32 restart
- Prevents both timer and TCP race conditions
- Minimal changes to existing library
- Backwards compatible: end() remains unchanged
- Production-ready for long-running applications
Alternative Considered
We considered calling end() directly, but it doesn't solve the timer race condition since it doesn't call wireguardif_shutdown() before potential instance deletion.
Implementation Reference
Working implementation available at: https://github.com/OpenEPaperLink/OpenEPaperLink/tree/ESP32_AP-Flasher (with automatic patch system in patch_wireguard.py)