Recoverable read errors #230
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Handle Retryable Errors in readLoop
Apropos of #224 and #228:
I spend a day trying to reproduce the problem with two different Espressif DevkitC N8R8 S3s but was unable to make them fail. My custom board that fails also has an N8R8 S3 module with a CP210x serial chip. So I did a deep analysis of the failure details.
_connectAttempt calls resetStrategy (classic) which resets the chip to download mode. The chip generates the expected 3-line response ending with "Waiting for download". The first line of that response (observed with scope) starts 4 ms after the chip comes out of reset and lasts for about 2 ms. Then there is a 4 ms gap, and the next two lines appear. This timing is identical on all of the modules that I have tested, both working and failing.
With the failing systems, readLoop receives a 26-byte data buffer - the first line of that message, then readLoop receives a one-byte data buffer, and then it throws a "BufferOverflow" error. That BufferOverflow breaks out of readLoop which then ceases to receive any more data, and the loading process fails.
With working systems, readLoop continues to receive data, a few bytes at a time, and by the time peek() is called, the buffer contains the complete 3-line message.
I do not know why the BufferOverflow happens on some systems and not others, since the data transmitted from the ESP32-S3 has similar timing in all cases. I suppose there might be USB serial chip differences; perhaps they are different revisions of CP210x, but that is just a wild guess.
Regardless, according to my reading of https://wicg.github.io/serial/ which appears to be definitive, there are several errors including BufferOverflow that should not be treated as fatal, but instead retried.
This patch modifies readLoop() to retry all of the recoverable errors, instead of exiting. The other (fatal) errors cause readLoop() to exit as before.
Testing
I tested this change in the context of the installer for the FluidNC CNC application at https://github.com/breiler/fluid-installer . I used it on a large collection of FluidNC controller boards, both ESP32 and ESP32-S3. It works reliably, whereas the code based on the esptool-js 0.5.6 (and also on #228) failed frequently on my test board, and we have also had a lot of user reports of install failures.
Checklist
Before submitting a Pull Request, please ensure the following: