Feature Request: Add safe shutdown method to prevent race conditions on reconnection #51

@aiakos-k

Problem Description

When using WireGuard-ESP32 in a 24/7 application that requires automatic reconnection (e.g., after handshake timeout), the current end() method causes crashes due to two race conditions:

1. Timer Race Condition

The wireguardif_tmr() callback keeps running after a new WireGuard instance has been created. When the old timer fires, it accesses the freed device struct of the previous instance, causing crashes like:

Guru Meditation Error: Core 1 panic'ed (LoadProhibited)
PC: 0x420bb1c5: wireguardif_send_keepalive at wireguardif.c:654
EXCVADDR: 0x000000b0 (accessing freed device struct)

Root Cause: The timer is scheduled via sys_timeout(WIREGUARDIF_TIMER_MSECS, wireguardif_tmr, device) and only stopped via sys_untimeout() in wireguardif_shutdown(). Creating a new instance doesn't stop the old timer.
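
To illustrate why the order "cancel the timer, then free the device" matters, here is a minimal host-side sketch. It is a simplified model, not the lwIP implementation: TimerList, Device, device_tick and safe_teardown are hypothetical names invented for this example, and schedule()/cancel() only stand in for the roles that sys_timeout()/sys_untimeout() play above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of a one-shot timer list; NOT the lwIP API.
using TimerFn = void (*)(void* arg);

struct PendingTimer {
    TimerFn fn;
    void* arg;
};

class TimerList {
public:
    // Plays the role of sys_timeout(): remember a callback and its argument.
    void schedule(TimerFn fn, void* arg) { pending_.push_back({fn, arg}); }

    // Plays the role of sys_untimeout(): drop entries matching (fn, arg).
    void cancel(TimerFn fn, void* arg) {
        pending_.erase(
            std::remove_if(pending_.begin(), pending_.end(),
                           [=](const PendingTimer& t) {
                               return t.fn == fn && t.arg == arg;
                           }),
            pending_.end());
    }

    // Run everything that is currently pending; return how many callbacks ran.
    int fire_all() {
        std::vector<PendingTimer> due;
        due.swap(pending_);
        for (const PendingTimer& t : due) t.fn(t.arg);
        return static_cast<int>(due.size());
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::vector<PendingTimer> pending_;
};

// Stand-in for the device struct that the timer callback dereferences.
struct Device {
    int keepalives_sent = 0;
};

// Stand-in for wireguardif_tmr(): touches the device it was registered with.
void device_tick(void* arg) { static_cast<Device*>(arg)->keepalives_sent++; }

// Safe order: cancel the timer first, only then free the device.
int safe_teardown(TimerList& timers, Device* dev) {
    timers.cancel(device_tick, dev);
    delete dev;
    return timers.fire_all();  // no pending entry can dereference dev anymore
}
```

If the delete happened without the cancel, the next fire_all() would call device_tick() with a dangling pointer, which is the situation the crash log above appears to show.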

2. TCP Race Condition

Calling netif_remove() immediately in end() crashes when the TCP/IP stack still has in-flight packets:

Guru Meditation Error: Core 0 panic'ed (LoadProhibited)
#0 peer_lookup_by_allowed_ip at wireguardif.c:71
(inlined by) wireguardif_output at wireguardif.c:202
#1 ip4_output_if_opt_src
#2 tcp_output_segment
EXCVADDR: 0x000000b0

Root Cause: The netif is removed while lwIP still routes packets through it.
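
The idea of taking the interface down before removing it can be sketched with a toy model. This is only an illustration under simplifying assumptions: Iface and Router are hypothetical types, and the real lwIP routing and TCP code paths are more involved than this "skip interfaces that are down" check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a network interface; names are illustrative, not lwIP's.
struct Iface {
    bool up = true;
    int packets_out = 0;
};

class Router {
public:
    void add(Iface* nif) { ifaces_.push_back(nif); }

    // Forget an interface entirely (the model's counterpart to removing a netif).
    void remove(Iface* nif) {
        for (std::size_t i = 0; i < ifaces_.size(); ++i) {
            if (ifaces_[i] == nif) {
                ifaces_.erase(ifaces_.begin() + static_cast<std::ptrdiff_t>(i));
                return;
            }
        }
    }

    // Send through the first interface that is up; false when none qualifies.
    bool send() {
        for (Iface* nif : ifaces_) {
            if (nif->up) {
                nif->packets_out++;
                return true;
            }
        }
        return false;
    }

private:
    std::vector<Iface*> ifaces_;
};
```

In this model, setting up = false stops new packets from being routed through the interface while it is still registered, so a later remove() no longer races with senders. It does not model packets that were already in flight, which is what the delay in the proposed solution is meant to cover.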

Use Case

Our application is an ESP32-S3 Access Point running 24/7 that:

  • Connects to WireGuard VPN for remote management
  • Automatically reconnects after handshake timeout (3 minutes)
  • Must remain stable without rebooting
  • Cannot tolerate crashes during reconnection

Proposed Solution

Add a new method shutdown_timer_only() that safely handles both race conditions:

void WireGuard::shutdown_timer_only() {
    if( !this->_is_initialized ) return;
    
    log_i(TAG "Safe WireGuard shutdown starting...");
    
    // Step 1: Disconnect peer FIRST (stops sending keepalives)
    wireguardif_disconnect(wg_netif, wireguard_peer_index);
    log_i(TAG "Peer disconnected");
    
    // Step 2: Bring interface down (stops routing NEW packets)
    netif_set_down(wg_netif);
    netif_set_link_down(wg_netif);
    log_i(TAG "Interface and link disabled");
    
    // Step 3: Wait for in-flight packets to finish (critical!)
    vTaskDelay(pdMS_TO_TICKS(100));
    log_i(TAG "Waited for in-flight packets");
    
    // Step 4: Shutdown (stops timer via sys_untimeout)
    wireguardif_shutdown(wg_netif);
    log_i(TAG "Timer stopped");
    
    // Step 5: Remove netif from lwIP (now safe - no more traffic)
    netif_remove(wg_netif);
    log_i(TAG "Network interface removed");
    
    // Step 6: Mark as not initialized
    this->_is_initialized = false;
    wg_netif = nullptr;
    wireguard_peer_index = WIREGUARDIF_INVALID_INDEX;
}

Header Declaration

In WireGuard-ESP32.h:

void shutdown_timer_only();

Usage Pattern

static WireGuard* wg = nullptr;

void reconnect() {
    if (wg != nullptr) {
        if (wg->is_initialized()) {
            wg->shutdown_timer_only();  // Safe cleanup
        }
        delete wg;  // Free the old instance even if it never initialized
    }
    wg = new WireGuard();
    wg->begin(...);
}
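
As a possible variation on the usage pattern, ownership could be expressed with std::unique_ptr so the old instance is always destroyed before its replacement is created. The sketch below compiles off-target by using FakeWireGuard, a hypothetical stand-in that only records the order of calls; it is not the library class, and its begin() takes no arguments purely for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the WireGuard class; records calls in a log.
struct FakeWireGuard {
    explicit FakeWireGuard(std::vector<std::string>& log) : log_(log) {}
    ~FakeWireGuard() { log_.push_back("delete"); }

    bool begin() {
        initialized_ = true;
        log_.push_back("begin");
        return true;
    }
    bool is_initialized() const { return initialized_; }
    void shutdown_timer_only() {
        initialized_ = false;
        log_.push_back("shutdown");
    }

private:
    std::vector<std::string>& log_;
    bool initialized_ = false;
};

// Replace the current instance: shut down, destroy, then create the successor.
void reconnect(std::unique_ptr<FakeWireGuard>& wg,
               std::vector<std::string>& log) {
    if (wg) {
        if (wg->is_initialized()) {
            wg->shutdown_timer_only();
        }
        wg.reset();  // old instance is destroyed before the new one exists
    }
    wg = std::make_unique<FakeWireGuard>(log);
    wg->begin();
}
```

Calling reconnect() twice on an initially empty pointer records "begin", then "shutdown", "delete", "begin", so the shutdown always precedes destruction and the new instance is only created afterwards.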

Testing

We have implemented this as a workaround using an automatic patch script and tested it extensively:

✅ Stable operation through multiple reconnection cycles
✅ No crashes during handshake timeout → reconnection
✅ Timer properly stopped before new instance
✅ TCP/IP stack has time to complete in-flight operations
✅ Production-ready in 24/7 environment

Benefits

  • Safe reconnection without ESP32 restart
  • Prevents both timer and TCP race conditions
  • Minimal changes to existing library
  • Backwards compatible: end() remains unchanged
  • Production-ready for long-running applications

Alternative Considered

We considered calling end() directly, but it doesn't solve the timer race condition since it doesn't call wireguardif_shutdown() before potential instance deletion.

Implementation Reference
Working implementation available at: https://github.com/OpenEPaperLink/OpenEPaperLink/tree/ESP32_AP-Flasher (with automatic patch system in patch_wireguard.py)
