Feature Request: Add safe shutdown method to prevent race conditions on reconnection

## Problem Description

When using WireGuard-ESP32 in a 24/7 application that requires automatic reconnection (e.g., after handshake timeout), the current `end()` method causes crashes due to two race conditions:

### 1. Timer Race Condition
The `wireguardif_tmr()` callback of the old instance keeps running after a new WireGuard instance is created. When the old timer fires, it accesses freed memory, causing crashes like:

```
Guru Meditation Error: Core 1 panic'ed (LoadProhibited)
PC: 0x420bb1c5: wireguardif_send_keepalive at wireguardif.c:654
EXCVADDR: 0x000000b0 (accessing freed device struct)
```

**Root Cause:** The timer is scheduled via `sys_timeout(WIREGUARDIF_TIMER_MSECS, wireguardif_tmr, device)` and only stopped via `sys_untimeout()` in `wireguardif_shutdown()`. Creating a new instance doesn't stop the old timer.
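
The failure mode can be modeled in miniature without lwIP. The sketch below is not the library's code: `FakeTimerQueue`, `Device`, `device_tmr` and `dangling_entries_at_free` are hypothetical names, with the queue standing in for the `sys_timeout()` / `sys_untimeout()` list and assuming the callback re-arms itself every period. It only counts how many pending entries still point at the device at the moment it is freed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using TimerFn = void (*)(void *arg);

// Hypothetical stand-in for lwIP's timeout list. Each entry stores a callback
// and a raw argument pointer; the list knows nothing about the argument's lifetime.
class FakeTimerQueue {
    struct Entry {
        TimerFn fn;
        void *arg;
    };
    std::vector<Entry> entries_;

public:
    // Plays the role of sys_timeout(): remember fn(arg) for the next period.
    void timeout(TimerFn fn, void *arg) { entries_.push_back(Entry{fn, arg}); }

    // Plays the role of sys_untimeout(): drop entries matching fn and arg.
    void untimeout(TimerFn fn, void *arg) {
        entries_.erase(std::remove_if(entries_.begin(), entries_.end(),
                                      [fn, arg](const Entry &e) {
                                          return e.fn == fn && e.arg == arg;
                                      }),
                       entries_.end());
    }

    // Number of pending entries that still point at arg.
    std::size_t pending_for(const void *arg) const {
        std::size_t n = 0;
        for (const Entry &e : entries_)
            if (e.arg == arg) ++n;
        return n;
    }

    // One timer period elapses: run everything that was pending.
    void fire_all() {
        std::vector<Entry> due;
        due.swap(entries_);
        for (const Entry &e : due) e.fn(e.arg);
    }
};

// Hypothetical stand-in for the device struct handed to the timer callback.
struct Device {
    FakeTimerQueue *timers;
    int ticks;
};

// Periodic callback: re-arms itself first, then does its work.
void device_tmr(void *arg) {
    Device *dev = static_cast<Device *>(arg);
    dev->timers->timeout(device_tmr, dev);
    dev->ticks++;
}

// Returns how many timer entries still reference the device when it is freed.
std::size_t dangling_entries_at_free(bool cancel_first) {
    FakeTimerQueue timers;
    Device *old_dev = new Device{&timers, 0};
    timers.timeout(device_tmr, old_dev);
    timers.fire_all();  // one period: the callback runs and re-arms itself
    assert(old_dev->ticks == 1);

    if (cancel_first) timers.untimeout(device_tmr, old_dev);
    const std::size_t dangling = timers.pending_for(old_dev);
    delete old_dev;  // the queue is never fired again, so nothing is dereferenced
    return dangling;
}
```

In this model, `dangling_entries_at_free(false)` leaves one entry pointing at freed memory, which is the situation in the crash above, while `dangling_entries_at_free(true)` leaves none.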

### 2. TCP Race Condition
Calling `netif_remove()` immediately in `end()` crashes when the TCP/IP stack still has in-flight packets:

```
Guru Meditation Error: Core 0 panic'ed (LoadProhibited)
#0 peer_lookup_by_allowed_ip at wireguardif.c:71
(inlined by) wireguardif_output at wireguardif.c:202
#1 ip4_output_if_opt_src
#2 tcp_output_segment
EXCVADDR: 0x000000b0
```

**Root Cause:** The netif is removed while lwIP still routes packets through it.
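
A single-threaded model of why the interface should be marked down before it is removed is sketched here. `FakeNetif`, `FakeDevice`, `route`, `send_packet` and `late_packets_reaching_device` are hypothetical names, and the routing rule (only interfaces that are up with link up are candidates) is an assumption about how route lookup behaves, not a copy of lwIP's code. The model does not cover packets that are already past route lookup on another task.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; not lwIP's real structures.
struct FakeDevice {
    int packets_out;
};

struct FakeNetif {
    bool up;
    bool link_up;
    FakeDevice *state;  // per-interface private data, in the role of netif->state
};

// Assumed routing rule: only interfaces that are up with link up are candidates.
FakeNetif *route(const std::vector<FakeNetif *> &netifs) {
    for (FakeNetif *n : netifs)
        if (n->up && n->link_up) return n;
    return nullptr;
}

// Output path: the state dereference is where a fault would occur once the
// device behind the interface has been freed.
bool send_packet(const std::vector<FakeNetif *> &netifs) {
    FakeNetif *n = route(netifs);
    if (n == nullptr) return false;  // no usable route: the packet is dropped
    n->state->packets_out++;
    return true;
}

// Counts packets that still reach the device after the interface was marked down.
int late_packets_reaching_device() {
    FakeDevice dev{0};
    FakeNetif wg{true, true, &dev};
    std::vector<FakeNetif *> netifs{&wg};

    bool sent = send_packet(netifs);  // normal traffic reaches the device
    assert(sent && dev.packets_out == 1);
    (void)sent;

    wg.up = false;       // models netif_set_down()
    wg.link_up = false;  // models netif_set_link_down()

    const int before = dev.packets_out;
    send_packet(netifs);  // a late packet finds no route
    return dev.packets_out - before;
}
```

Once the flags are cleared, no new packet reaches the device in this model. Packets that were routed before the flags changed are outside what it shows.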

## Use Case

Our application is an ESP32-S3 Access Point running 24/7 that:
- Connects to WireGuard VPN for remote management
- Automatically reconnects after handshake timeout (3 minutes)
- Must remain stable without rebooting
- Cannot tolerate crashes during reconnection
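
For context, the reconnection trigger can be as small as a timestamp comparison. The helper below is a hypothetical sketch, not part of WireGuard-ESP32: `HandshakeWatchdog` and `watchdog_example` are made-up names, and how the application learns that a handshake succeeded is left open. Only the 3-minute timeout comes from our setup.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: remembers the last handshake time and reports when
// the configured timeout has elapsed since then.
class HandshakeWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    HandshakeWatchdog(Clock::duration timeout, Clock::time_point start)
        : timeout_(timeout), last_handshake_(start) {
        assert(timeout > Clock::duration::zero());
    }

    // Call whenever the application observes a successful handshake.
    void note_handshake(Clock::time_point now) { last_handshake_ = now; }

    // True once at least `timeout` has passed since the last handshake.
    bool needs_reconnect(Clock::time_point now) const {
        return now - last_handshake_ >= timeout_;
    }

private:
    Clock::duration timeout_;
    Clock::time_point last_handshake_;
};

// Example check with injected time points instead of the real clock.
bool watchdog_example() {
    using std::chrono::seconds;
    const HandshakeWatchdog::Clock::time_point t0{};
    HandshakeWatchdog wd(std::chrono::minutes(3), t0);

    bool ok = !wd.needs_reconnect(t0 + seconds(179));   // just under 3 minutes
    ok = ok && wd.needs_reconnect(t0 + seconds(180));   // timeout reached
    wd.note_handshake(t0 + seconds(170));               // a handshake resets the window
    ok = ok && !wd.needs_reconnect(t0 + seconds(180));
    return ok;
}
```

In the application loop, `needs_reconnect(HandshakeWatchdog::Clock::now())` would gate the reconnection, which is the point where the crashes described above occur.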

## Proposed Solution

Add a new method `shutdown_timer_only()` that safely handles both race conditions:

```cpp
void WireGuard::shutdown_timer_only() {
    if( !this->_is_initialized ) return;
    
    log_i(TAG "Safe WireGuard shutdown starting...");
    
    // Step 1: Disconnect peer FIRST (stops sending keepalives)
    wireguardif_disconnect(wg_netif, wireguard_peer_index);
    log_i(TAG "Peer disconnected");
    
    // Step 2: Bring interface down (stops routing NEW packets)
    netif_set_down(wg_netif);
    netif_set_link_down(wg_netif);
    log_i(TAG "Interface and link disabled");
    
    // Step 3: Wait for in-flight packets to finish (critical!)
    vTaskDelay(pdMS_TO_TICKS(100));
    log_i(TAG "Waited for in-flight packets");
    
    // Step 4: Shutdown (stops timer via sys_untimeout)
    wireguardif_shutdown(wg_netif);
    log_i(TAG "Timer stopped");
    
    // Step 5: Remove netif from lwIP (now safe - no more traffic)
    netif_remove(wg_netif);
    log_i(TAG "Network interface removed");
    
    // Step 6: Mark as not initialized
    this->_is_initialized = false;
    wg_netif = nullptr;
    wireguard_peer_index = WIREGUARDIF_INVALID_INDEX;
}
```

### Header Declaration

In `WireGuard-ESP32.h`:

```cpp
void shutdown_timer_only();
```

### Usage Pattern

```cpp
static WireGuard* wg = nullptr;

void reconnect() {
    if (wg != nullptr) {
        if (wg->is_initialized()) {
            wg->shutdown_timer_only();  // Safe cleanup
        }
        delete wg;  // Also free an instance that never finished initializing
    }
    wg = new WireGuard();
    wg->begin(...);
}
```

## Testing

We have implemented this as a workaround using an automatic patch script and tested it extensively:

- ✅ Stable operation through multiple reconnection cycles
- ✅ No crashes during handshake timeout → reconnection
- ✅ Timer properly stopped before new instance
- ✅ TCP/IP stack has time to complete in-flight operations
- ✅ Production-ready in 24/7 environment

## Benefits

- Safe reconnection without ESP32 restart
- Prevents both timer and TCP race conditions
- Minimal changes to existing library
- Backwards compatible: `end()` remains unchanged
- Production-ready for long-running applications

## Alternative Considered

We considered calling `end()` directly, but it doesn't solve the timer race condition since it doesn't call `wireguardif_shutdown()` before potential instance deletion.

## Implementation Reference

Working implementation available at: https://github.com/OpenEPaperLink/OpenEPaperLink/tree/ESP32_AP-Flasher (with automatic patch system in `patch_wireguard.py`)



