@MitchBradley

Handle Retryable Errors in readLoop

Apropos of #224 and #228:

I spent a day trying to reproduce the problem with two different Espressif DevkitC N8R8 S3s but was unable to make them fail. My custom board that fails also has an N8R8 S3 module with a CP210x serial chip. So I did a deep analysis of the failure details.

_connectAttempt calls resetStrategy (classic) which resets the chip to download mode. The chip generates the expected 3-line response ending with "Waiting for download". The first line of that response (observed with scope) starts 4 ms after the chip comes out of reset and lasts for about 2 ms. Then there is a 4 ms gap, and the next two lines appear. This timing is identical on all of the modules that I have tested, both working and failing.

With the failing systems, readLoop receives a 26-byte data buffer (the first line of that message), then a one-byte data buffer, and then it throws a "BufferOverflow" error. That BufferOverflow breaks out of readLoop, which then stops receiving data, and the loading process fails.

With working systems, readLoop continues to receive data, a few bytes at a time, and by the time peek() is called, the buffer contains the complete 3-line message.

I do not know why the BufferOverflow happens on some systems and not others, since the data transmitted from the ESP32-S3 has similar timing in all cases. I suppose there might be USB serial chip differences; perhaps they are different revisions of CP210x, but that is just a wild guess.

Regardless, according to my reading of https://wicg.github.io/serial/ which appears to be definitive, there are several errors including BufferOverflow that should not be treated as fatal, but instead retried.

This patch modifies readLoop() to retry all of the recoverable errors, instead of exiting. The other (fatal) errors cause readLoop() to exit as before.
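The retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this patch: `readLoopSketch`, `isRetryable`, `PortLike`, and `ByteReader` are hypothetical names, and the set of retryable error names is an assumption based on my reading of the Web Serial draft, so check it against the spec and against what your browser actually reports. The sketch also assumes that after a recoverable error the port exposes a fresh `readable` stream, so a new reader is obtained on each pass of the outer loop.

```typescript
// Minimal structural types so the sketch is self-contained. In a browser they
// would be satisfied by a Web Serial port and the reader from port.readable.
interface ByteReader {
  read(): Promise<{ value?: Uint8Array; done: boolean }>;
  releaseLock(): void;
}
interface PortLike {
  readable: { getReader(): ByteReader } | null;
}

// Assumption: error names treated as recoverable. Anything else is fatal.
const RETRYABLE_ERRORS = new Set<string>([
  "BufferOverrunError",
  "BreakError",
  "FramingError",
  "ParityError",
]);

function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const name = (err as { name?: string }).name;
  return typeof name === "string" && RETRYABLE_ERRORS.has(name);
}

async function readLoopSketch(
  port: PortLike,
  onData: (chunk: Uint8Array) => void,
): Promise<void> {
  // Outer loop: get a new reader after every recoverable error.
  while (port.readable) {
    const reader = port.readable.getReader();
    try {
      // Inner loop: deliver data until the stream ends or read() rejects.
      while (true) {
        const { value, done } = await reader.read();
        if (done) return;
        if (value && value.length > 0) onData(value);
      }
    } catch (err) {
      // Fatal errors leave the loop as before; recoverable ones fall
      // through so the outer loop can retry with a new reader.
      if (!isRetryable(err)) throw err;
    } finally {
      reader.releaseLock();
    }
  }
}
```

The point of the two-level loop is that a recoverable error ends only the current reader, not the whole loop, so bytes that arrive after the error (here, the remaining lines of the boot message) are still collected.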

Testing

I tested this change in the context of the installer for the FluidNC CNC application at https://github.com/breiler/fluid-installer . I used it on a large collection of FluidNC controller boards, both ESP32 and ESP32-S3. It works reliably, whereas the code based on esptool-js 0.5.6 (and also on #228) failed frequently on my test board, and we have also had a lot of user reports of install failures.

Checklist

Before submitting a Pull Request, please ensure the following:

  • 🚨 This PR does not introduce breaking changes. (API remains the same)
  • All CI checks (GH Actions) pass. (This patch does not introduce any errors that were not already present in #228)
  • Documentation is updated as needed. (No API changes)
  • Tests are updated or added as necessary.
  • Code is well-commented, especially in complex areas. (Error handling is explained)
  • Git history is clean — commits are squashed to the minimum necessary.
