Answers checklist.
IDF version.
v6.0.1 (also reproduced on v5.5.4, release/v5.5, and master incl. esp32-wifi-lib 1747224af)
Espressif SoC revision.
ESP32-S3 (QFN56) (revision v0.2)
Operating System used.
macOS
How did you build your project?
VS Code IDE
If you are using Windows, please specify command line type.
None
Development Kit.
Custom board — battery-powered ESP32-S3 IoT telemetry device (single-core / CONFIG_FREERTOS_UNICORE=y, no PSRAM)
Power Supply used.
Battery
What is the expected behavior?
Each esp_wifi_* operation in a normal STA session — scan, connect, the socket / TLS send during data transfer, and esp_wifi_stop teardown — should either complete, or fail and return control to the calling task within a bounded time, so the application can retry, fall back to another transport, or go back to sleep.
More fundamentally: the FreeRTOS scheduler must keep dispatching. A context switch that becomes due should be serviced, timed waits (xEventGroupWaitBits, ulTaskNotifyTake, semaphore takes) must expire on schedule, and a runnable higher-priority task must get the CPU. No Wi-Fi operation should be able to leave the scheduler in a state where it stops dispatching entirely and the only recovery is a watchdog-driven reset.
What is the actual behavior?
Fleet impact: roughly 5% of our deployed ESP32-S3 fleet hits this intermittently and resets — recurring across many devices, not a one-off. For a battery-powered, frequently-offline telemetry product that means dropped data and field service, so resolving it matters a lot to us.
What happens: intermittently — after hours-to-days of normal operation, never on demand — a Wi-Fi STA session stops making progress, the calling (main) task is left blocked indefinitely, the FreeRTOS scheduler stops dispatching any task, and the device does nothing further until the watchdog resets it.
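This is NOT one specific call. Across 9 task-WDT coredumps from a single firmware build, main is wedged at four different points of the Wi-Fi session, yet the kernel state at the hang is identical every time: uxSchedulerSuspended=0, xPendedTicks=0, xYieldPending=1 (a context switch is owed but never serviced), CPU in esp_cpu_wait_for_intr (waiti). The four points:
- esp_wifi_connect() (waiting for CONNECTED / GOT_IP): 4 of 9
- esp_wifi_stop() during teardown, which never returns: 3 of 9 (cf. #3458)
- esp_wifi_scan_start() (waiting for SCAN_DONE): 1 of 9
- TLS / HTTP send, blocked in lwip: 1 of 9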
The scheduler genuinely stops dispatching. A dedicated priority-24 supervisor task we added to detect and break the stall is itself frozen in every wedged core — blocked in a 1 s-timeout ulTaskNotifyTake whose timeout never fires. So this is not priority starvation a higher-priority task can win against: once wedged, nothing at any priority runs, and only the task / hardware watchdog reset recovers the device.
Full per-phase backtraces and the kernel-state table are in the Debug Logs field. It is intermittent, has no deterministic trigger, and reproduces on every IDF version we have tried (v5.5.4 -> master).
Steps to reproduce.
This is an intermittent field failure — we do NOT have a minimal on-demand reproducer. It surfaces after hours-to-days of normal operation across the fleet, not on a fixed input. What we can give is the exact API sequence, the conditions, and 9 coredumps that all share the same kernel-state signature (Debug Logs field).
Context: battery-powered ESP32-S3 telemetry device, CONFIG_FREERTOS_UNICORE=y (single core), no PSRAM, deep-sleep duty cycle. Each wake runs ONE Wi-Fi session on the main task, then deep-sleeps. The session:
1. esp_netif_init / esp_event_loop_create; create default STA netif
2. esp_wifi_init(&cfg)
3. esp_wifi_set_ps(WIFI_PS_NONE)
4. esp_wifi_set_mode(WIFI_MODE_STA); esp_wifi_start()
5. esp_wifi_scan_start(&cfg, false) (non-blocking); wait on SCAN_DONE bits in 1 s slices
6. esp_wifi_set_config(...); esp_wifi_connect()
7. wait on CONNECTED / GOT_IP bits in 1 s slices, feeding the task WDT between slices
8. HTTPS POST via esp_http_client (mbedTLS), ~10 KB body
9. esp_wifi_disconnect(); esp_wifi_stop(); esp_wifi_deinit(); destroy netif
10. deep sleep
The hang lands at multiple points of this sequence — coredumps show main wedged at step 5 (scan), step 7 (connect / got-IP), step 8 (TLS / HTTP send, blocked in lwip), and step 9 (esp_wifi_stop never returns). In every case the calling task never regains control, the scheduler stops dispatching, and the 60 s task watchdog resets the device.
Reproduced with the identical signature on ESP-IDF v5.5.4, release/v5.5, v6.0.1, and master (esp32-wifi-lib blob 1747224af; master also includes the FreeRTOS task-switch fix 26dc038a). Updating IDF + blob to the latest does not change it.
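For reference, the sliced waits in steps 5 and 7 have roughly the shape sketched below. This is a minimal host-compilable illustration, not the firmware source: `sliced_wait_ops_t`, `sliced_wait`, and the `fake_*` helpers are hypothetical names introduced here. On the device, the `wait_slice` hook would presumably wrap `xEventGroupWaitBits(..., pdMS_TO_TICKS(1000))` and `feed_wdt` would wrap `esp_task_wdt_reset()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hooks so the loop can be exercised off-target.
 * On the device, wait_slice would wrap a 1 s xEventGroupWaitBits() call
 * and feed_wdt would wrap esp_task_wdt_reset(). */
typedef struct {
    bool (*wait_slice)(void *ctx, uint32_t slice_ms); /* true once the awaited bits are set */
    void (*feed_wdt)(void *ctx);
    void *ctx;
} sliced_wait_ops_t;

/* Wait for an event in fixed slices, feeding the watchdog between slices.
 * Returns true if the event arrived, false once total_ms has elapsed. */
static bool sliced_wait(const sliced_wait_ops_t *ops, uint32_t slice_ms, uint32_t total_ms)
{
    if (slice_ms == 0) {
        return false; /* a zero-length slice would never advance the loop */
    }
    for (uint32_t waited = 0; waited < total_ms; waited += slice_ms) {
        if (ops->wait_slice(ops->ctx, slice_ms)) {
            return true;
        }
        /* Only reached if the timed wait actually returned. */
        ops->feed_wdt(ops->ctx);
    }
    return false;
}

/* Host-side stand-ins: the event "arrives" on the Nth slice (0 = never). */
typedef struct {
    uint32_t arrive_on_slice;
    uint32_t slices;
    uint32_t feeds;
} fake_ctx_t;

static bool fake_wait_slice(void *ctx, uint32_t slice_ms)
{
    fake_ctx_t *f = (fake_ctx_t *)ctx;
    (void)slice_ms;
    f->slices++;
    return f->arrive_on_slice != 0 && f->slices >= f->arrive_on_slice;
}

static void fake_feed(void *ctx)
{
    ((fake_ctx_t *)ctx)->feeds++;
}
```

The point of the sketch is that the watchdog feed sits after the timed wait. If a single slice never returns, the loop can neither feed the watchdog nor give up, which is consistent with the task WDT firing even though each wait is nominally bounded to 1 s.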
Debug Logs.
Decoded coredumps, credential-scrubbed (raw `.elf` cores withheld — they embed the Wi-Fi SSID and server URLs; local paths replaced with `$IDF` / `<app>`, the SSID with `<ssid>`). Build: ESP-IDF master, esp32-wifi-lib `1747224af`; identical signature on v5.5.4 / release-v5.5 / v6.0.1.
**Same kernel state, four different Wi-Fi calls.** Across 9 task-WDT coredumps from one build, the calling (`main`) task is wedged at four different points of the Wi-Fi session, but the FreeRTOS kernel state at the hang is identical every time:
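| phase (where main is stuck)     | count | uxSchedulerSuspended | xPendedTicks | xYieldPending | wdt_feeds |
|---------------------------------|-------|----------------------|--------------|---------------|-----------|
| connect (wait CONNECTED/GOT_IP) | 4     | 0                    | 0            | 1             | 7         |
| esp_wifi_stop (teardown)        | 3     | 0                    | 0            | 1             | 14        |
| scan (wait SCAN_DONE)           | 1     | 0                    | 0            | 1             | 4         |
| TLS/HTTP send (lwip)            | 1     | 0                    | 0            | 1             | 17        |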
In every case: CPU in esp_cpu_wait_for_intr (waiti); every other task blocked on its own
queue/event/notify; a context switch is owed (xYieldPending=1) but never serviced, so the
task that should run next never does, and the 60 s task-WDT resets the chip. The esp_wifi_*
call main happens to be in differs; the underlying stall does not.
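The signature check applied to each core can be written down as a small predicate. The sketch below is illustrative only: `kernel_snapshot_t` and both functions are hypothetical names, and the field names simply mirror the FreeRTOS globals quoted in the table. It assumes the three values have already been read out of a decoded coredump.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values read out of one decoded coredump (hypothetical container). */
typedef struct {
    uint32_t uxSchedulerSuspended; /* 0: scheduler is not suspended */
    uint32_t xPendedTicks;         /* 0: no ticks waiting to be processed */
    uint32_t xYieldPending;        /* 1: a context switch has been requested */
} kernel_snapshot_t;

/* True when the snapshot shows the stall described above: scheduler not
 * suspended, no pended ticks, but a yield still owed. */
static bool matches_stall_signature(const kernel_snapshot_t *k)
{
    return k->uxSchedulerSuspended == 0 &&
           k->xPendedTicks == 0 &&
           k->xYieldPending == 1;
}

/* Count how many snapshots in a set share the signature. */
static size_t count_stall_signatures(const kernel_snapshot_t *snaps, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (matches_stall_signature(&snaps[i])) {
            hits++;
        }
    }
    return hits;
}
```

A snapshot with uxSchedulerSuspended != 0 would instead point at a task holding the scheduler suspended, which is a different failure from the one reported here.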
==== (A) CONNECT ==== [feeds=7, WDT=60s]
#0 uxTaskResetEventItemValue() tasks.c:5672
#1 xEventGroupWaitBits(xTicksToWait=100) event_groups.c:439
#2 wifi_check_status() <app>/components/wifi/wifi.c:932 // 1 s sliced wait for CONNECTED/GOT_IP, WDT fed between slices
#3 wifi_connect() wifi.c:225
#5 wifi_send_data() <app>/main/Src/wifi_transport_handler.c:112
#6 transmit() -> #7 app_integration() -> #8 app_main()
other tasks all blocked on their queue/event/notify -- including our prio-24 supervisor (wifi_sup),
itself frozen in a 1 s-timeout notify-wait that never fires: even the highest-priority task never runs.
==== (B) SCAN ==== [feeds=4]
#1 xEventGroupWaitBits(xTicksToWait=100) event_groups.c:439
#2 wifi_scan() <app>/components/wifi/wifi.c:1120 // 1 s sliced wait for SCAN_DONE
#3 wifi_check_ap(target_ssid=<ssid>) wifi.c:895
#4 wifi_connect() wifi.c:217 -> wifi_send_data() -> ...
==== (C) TLS / HTTP DATA SEND ==== [feeds=17]
#0 vPortEnterCritical() portmacro.h:554
#1 xQueueSemaphoreTake() queue.c:1727
#2 sys_arch_sem_wait(timeout=0) sys_arch.c:174 // 0 = block forever
#3 tcpip_send_msg_wait_sem(fn=lwip_netconn_do_write) tcpip.c:461
#4 netconn_apimsg() -> netconn_write_vectors_partly() -> netconn_write_partly()
#7 lwip_send(size=10240) sockets.c:1457
#9 tcp_write(timeout_ms=20000) transport_ssl.c:261
#10 esp_transport_write() -> #11 esp_http_client_write(len=10240)
// main blocked forever on the tcpip semaphore; the lwip tcpip thread never completes the send (Wi-Fi tx not progressing)
==== (D) TEARDOWN -- esp_wifi_stop() never returns ==== [feeds=14]
#0 esp_wifi_stop()
#1 wifi_guarded_stop() <app>/components/wifi/wifi.c:276
#2 wifi_stop_and_deinit() wifi.c:575
#3 app_integration() <app>/main/Src/app.c:244 -> app_main()
// esp_wifi_stop() blocks indefinitely (cf. #3458)
**Why this is not application logic:** in (A)/(B) our wait is 1 s-bounded and feeds the WDT between slices, yet it hangs the full 60 s — the tick / timeout / yield path beneath `esp_wifi_*` has stalled. In (C)/(D) `main` is blocked inside lwip / `esp_wifi_stop` on an internal infinite wait. `esp32-wifi-lib` is closed so we cannot trace below `esp_wifi_*`, but the identical `xYieldPending=1`-never-serviced state across all four phases — which even freezes our prio-24 supervisor — points at a single scheduler / Wi-Fi-driver stall, not application code.
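The arithmetic behind the "1 s cannot stretch to 60 s" argument can be made explicit. The helpers below are a hypothetical sketch (names introduced here), using only the figures from this report: `xTicksToWait=100`, `CONFIG_FREERTOS_HZ=100`, and a 60 s task WDT.

```c
#include <stdint.h>

/* Convert FreeRTOS ticks to milliseconds for a given tick rate in Hz. */
static uint32_t ticks_to_ms(uint32_t ticks, uint32_t tick_rate_hz)
{
    return (uint32_t)(((uint64_t)ticks * 1000u) / tick_rate_hz);
}

/* How many times longer than requested a single wait must have lasted
 * for the watchdog to expire after the most recent feed. */
static uint32_t overrun_factor(uint32_t wdt_timeout_ms, uint32_t requested_ms)
{
    return wdt_timeout_ms / requested_ms;
}
```

With these inputs, `ticks_to_ms(100, 100)` is 1000 ms, so for the 60 s watchdog to fire after the last feed, one wait must have stayed blocked for about 60 times its requested timeout.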
Diagnostic report archive.
No response
More Information.
Build / config
- ESP32-S3, CONFIG_FREERTOS_UNICORE=y (single core), no PSRAM (CONFIG_SPIRAM unset), internal SRAM only.
- CONFIG_FREERTOS_HZ=100. Task WDT on (CONFIG_ESP_TASK_WDT_PANIC=y), widened to 60 s around the Wi-Fi session.
- WIFI_PS_NONE. mbedTLS dynamic buffers. Coredump-to-flash (ELF).
Debugging we did
- Symbolized 9 task-WDT coredumps from one build (plus more across earlier builds) and read the FreeRTOS kernel globals out of each: same uxSchedulerSuspended=0 / xPendedTicks=0 / xYieldPending=1 every time, at four different esp_wifi_* call sites.
- Added a priority-24 supervisor task to detect the stall and force recovery — it froze too (blocked in a 1 s-timeout wait that never fires), proving no task at any priority runs once wedged.
- Widened the task WDT from 30 s to 60 s, and in one experiment to 15 minutes — it STILL reset, so this is a true hang, not a slow-but-finite operation.
- Switched the scan from blocking to non-blocking (esp_wifi_scan_start(..., false) + sliced event wait): no change.
- Set WIFI_PS_NONE: no change.
- Tested v5.5.4, release/v5.5, v6.0.1, and master with the latest esp32-wifi-lib blob (1747224af) and the FreeRTOS task-switch fix (26dc038a): identical signature on all; updating does not help.
What we ruled out, and why this is not application-level
- Not heap / list corruption: builds run with heap poisoning + FreeRTOS list-integrity checks; neither tripped, and the kernel ready / blocked lists in the cores are structurally consistent.
- Not application starvation or our own polling: in the scan / connect cases our wait is 1-second-bounded and feeds the WDT between slices, yet it hangs the full 60 s. A 1 s timeout cannot stretch to 60 s unless the tick / timeout / yield path beneath esp_wifi_* has stalled. And our highest-priority (prio-24) task is frozen too.
- Not a single-AP / single-site quirk: it happens across different sites and access points.
- Common thread in all cases: a context switch owed but never serviced (xYieldPending=1) while the CPU idles in waiti, which points to a scheduler / Wi-Fi-driver-level stall below the esp_wifi_* API. esp32-wifi-lib is closed, so we cannot trace further ourselves.
Possibly related: #3458 (esp_wifi_stop blocking forever — matches our teardown cases) and #14703.
Raw coredumps: withheld publicly (they embed the device Wi-Fi SSID / credentials and server URLs), but we can share them plus the exact sdkconfig privately with an Espressif engineer on request.