
Serial bus No. 2: Update over Serial#182

Merged
JensOgorek merged 56 commits into main from serial_bus_ota
Mar 9, 2026
Conversation

@JensOgorek
Contributor

@JensOgorek JensOgorek commented Nov 28, 2025

Motivation

Enable firmware updates for devices connected via SerialBus. Since we have serial access to the bus coordinator, we can use it to relay firmware chunks to remote nodes without requiring direct USB connection to each device.

Implementation

Python tool (otb_update.py):

  • Sends firmware in base64-encoded chunks via bus.send() commands
  • Uses a sliding window (default 8 packets in flight) for flow control
  • Supports --expander flag to pause broadcasts during transfer
  • Simple top-to-bottom script flow with minimal helpers (wait_ack, transact)
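
A minimal sketch of the sliding-window idea from the list above, not the actual otb_update.py code. The `send` and `wait_ack` callables, the framing `__OTB_CHUNK_<seq>__<payload>`, and zero-based sequence numbers are assumptions made for illustration; the chunk size and window size are the values quoted in the review below.

```python
import base64
from collections import deque
from typing import Callable, Iterator, Tuple

CHUNK_SIZE = 174  # raw bytes per chunk (value quoted in the review)
WINDOW = 8        # default number of unacknowledged chunks in flight


def encode_chunks(firmware: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, str]]:
    """Yield (sequence number, base64 text) for each firmware chunk."""
    for seq, start in enumerate(range(0, len(firmware), chunk_size)):
        chunk = firmware[start:start + chunk_size]
        yield seq, base64.b64encode(chunk).decode('ascii')


def send_windowed(firmware: bytes,
                  send: Callable[[str], None],
                  wait_ack: Callable[[int], None],
                  window: int = WINDOW) -> int:
    """Send all chunks, keeping at most `window` of them unacknowledged."""
    in_flight = deque()
    count = 0
    for seq, payload in encode_chunks(firmware):
        if len(in_flight) >= window:
            # Window is full: block until the oldest chunk is confirmed.
            wait_ack(in_flight.popleft())
        send(f'__OTB_CHUNK_{seq}__{payload}')  # hypothetical framing
        in_flight.append(seq)
        count += 1
    # Drain the acknowledgements that are still outstanding.
    while in_flight:
        wait_ack(in_flight.popleft())
    return count
```

With a window of 8, the sender only stalls once eight chunks are unconfirmed, so the bus stays busy while earlier acknowledgements travel back.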

Lizard firmware (main/utils/otb.cpp):

  • Handles OTB protocol messages (__OTB_BEGIN__, __OTB_CHUNK_<seq>__, __OTB_COMMIT__, etc.)
  • Specific ack types (__OTB_ACK_BEGIN__, __OTB_ACK_CHUNK_<seq>__, __OTB_ACK_COMMIT__)
  • Decodes base64 chunks and writes to flash using ESP-IDF OTA APIs
  • Session timeout protection and error handling
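
To illustrate how a host tool might tell these ack types apart, here is a hedged sketch. The ack names come from the list above; the `parse_ack` helper and the assumption that each ack arrives as a complete line with no trailing payload are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches __OTB_ACK_BEGIN__, __OTB_ACK_COMMIT__ and __OTB_ACK_CHUNK_<seq>__.
ACK_PATTERN = re.compile(r'__OTB_ACK_(BEGIN|COMMIT|CHUNK_(\d+))__')


def parse_ack(message: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return (kind, seq) for an ack message, or None if it is not an ack.

    `seq` is only set for chunk acks; this helper is illustrative only.
    """
    match = ACK_PATTERN.fullmatch(message.strip())
    if match is None:
        return None
    if match.group(2) is not None:
        return 'CHUNK', int(match.group(2))
    return match.group(1), None
```

Matching the whole message means that a truncated marker, for example one without the trailing underscores, is rejected instead of being mistaken for a valid ack.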

Broadcast pausing (core.pause_broadcasts() / core.resume_broadcasts()):

  • Stops module broadcasts to prevent interference on serial bus while updating
  • Added static broadcast_paused flag in module.cpp
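
On the host side, pausing and resuming could be wrapped so that broadcasts are always resumed, even if the transfer fails. This is only a sketch: `send_line` is a hypothetical callable that writes one command line to the device, and only the two `core.*` command names are taken from this PR.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def broadcasts_paused(send_line: Callable[[str], None]) -> Iterator[None]:
    """Pause module broadcasts for the duration of the `with` block."""
    send_line('core.pause_broadcasts()')
    try:
        yield
    finally:
        # Runs on success and on error, so the bus is not left silent.
        send_line('core.resume_broadcasts()')
```

A transfer would then run inside `with broadcasts_paused(send_line): ...`.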

Expander buffer increase:

  • Increased expander proxy/property/call buffers from 256 to 512 bytes

Removed

  • Old HTTP-based OTA via core.ota(url, md5, sha256) method
  • OTA verify task from main.cpp startup

Progress

  • The implementation is complete.
  • Tested on hardware.
  • Documentation has been added.

@JensOgorek JensOgorek marked this pull request as ready for review December 2, 2025 15:39
falkoschindler and others added 6 commits December 3, 2025 14:33
… with OTA support

- Adopted refactored echo relay mechanism from serial_bus (echo_set_target, echo_relay_handler)
- Preserved OTA functionality (ota::bus_tick, ota::bus_handle_frame)
- Replaced BusFrame with IncomingMessage for consistency
- Removed duplicate includes
- Merged handle_frame() and handle_message() to support both OTA and command processing
Base automatically changed from serial_bus to main December 18, 2025 15:10
Resolved conflicts keeping main's updated serial bus implementation
while preserving OTA functionality:
- Kept ota.h include and ota_session member
- Added ota::bus_handle_frame call to handle_incoming_message
- Added ota::bus_tick call to step()
- Used main's callback-based echo system
- Used main's updated naming (make_coordinator, enqueue_outgoing_message, etc.)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@JensOgorek JensOgorek marked this pull request as draft January 20, 2026 13:24
@JensOgorek JensOgorek changed the title from "Serial bus #3: OTA over Serial (reverted)" to "Serial bus update 2: OTA over Serial" Jan 20, 2026
@JensOgorek JensOgorek changed the title from "Serial bus update 2: OTA over Serial" to "Serial bus update 2: Update over Serial" Jan 20, 2026
@JensOgorek JensOgorek changed the title from "Serial bus update 2: Update over Serial" to "Serial bus No. 2: Update over Serial" Jan 20, 2026
Collaborator

@falkoschindler falkoschindler left a comment


@JensOgorek Can you please have a look at the following issues found by prompting Opus 4.6 with "/review 182". Especially MAJOR-1 needs your judgement since you're more familiar with the code's intention. The remaining issues are probably easy to fix, some of them only need a comment to please the reader (either AI or future-us).


MAJOR

1. Timeout error response sent to wrong node

In serial_bus.cpp:60-67, when bus_tick detects a timeout, fail() writes the error response and then calls bus_reset_session() which zeroes session.sender. When step() then tries to dispatch the response, it sends to sender=0 instead of the actual peer:

// step()
if (this->otb_session.handle != 0) {
    otb::bus_tick(this->otb_session, millis());        // fail() resets sender to 0
    if (this->otb_session.response_length > 0) {
        this->enqueue_outgoing_message(
            this->otb_session.sender, ...);            // sender is now 0!

Fix: save the sender before bus_reset_session, or have bus_reset_session preserve sender/response when there's a pending response. Alternatively, dispatch the response inside bus_tick itself (or pass a send callback).

2. KeyboardInterrupt doesn't abort the firmware session

In otb_update.py:89-91, KeyboardInterrupt just prints and exits. The firmware session stays alive until the 10s timeout. By contrast, the OtbError handler does send __OTB_ABORT__. The interrupt handler should also attempt an abort:

except KeyboardInterrupt:
    print('\nInterrupted')
    transact('__OTB_ABORT__')
    sys.exit(1)

3. --reset-partition uses hardcoded flash addresses

In espresso.py:208-209, reset_partition() erases 0xf000 / 0x2000 which is the default otadata location. This will silently break if the partition table ever changes. A comment documenting the assumption (and which partition table it matches) would reduce the risk.
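
One way to reduce that risk, beyond a comment, would be to look the otadata region up in the partition table instead of hardcoding it. The following is a sketch under assumptions: `find_partition` is a hypothetical helper, the example table is invented for illustration and is not this project's partition table, and the parser only handles rows with explicit numeric offsets and sizes (no `K`/`M` suffixes or blank offsets).

```python
import csv
from typing import Tuple

# Invented example in the ESP-IDF partition CSV layout, NOT the real table.
EXAMPLE_TABLE = """\
# Name,   Type, SubType, Offset,  Size
nvs,      data, nvs,     0x9000,  0x6000
otadata,  data, ota,     0xf000,  0x2000
ota_0,    app,  ota_0,   0x20000, 0x180000
"""


def find_partition(csv_text: str, name: str) -> Tuple[int, int]:
    """Return (offset, size) of the named partition; raise KeyError if absent."""
    rows = [line for line in csv_text.splitlines()
            if line.strip() and not line.lstrip().startswith('#')]
    for row in csv.reader(rows, skipinitialspace=True):
        fields = [field.strip() for field in row]
        if len(fields) >= 5 and fields[0] == name:
            # int(..., 0) accepts both hex ("0xf000") and decimal literals.
            return int(fields[3], 0), int(fields[4], 0)
    raise KeyError(name)


offset, size = find_partition(EXAMPLE_TABLE, 'otadata')
print(hex(offset), hex(size))  # 0xf000 0x2000
```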

CLEANUP

4. Duplicate response dispatch pattern

The "check response_length, enqueue, clear" pattern appears identically in serial_bus.cpp:62-66 (for timeout) and serial_bus.cpp:227-232 (for immediate responses). Extracting a small helper like send_otb_response(to) would deduplicate this and, because the caller passes the recipient explicitly, also fix issue 1 above:

void SerialBus::send_otb_response(uint8_t to) {
    if (this->otb_session.response_length > 0) {
        this->enqueue_outgoing_message(to, this->otb_session.response,
                                       this->otb_session.response_length);
        this->otb_session.response_length = 0;
    }
}

5. CHUNK_SIZE / WINDOW not linked to firmware constants

otb_update.py:10-11 defines CHUNK_SIZE = 174 and WINDOW = 8 which must stay in sync with BUS_OTB_CHUNK_SIZE in otb.h:21. A comment noting the dependency would help.
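
The dependency matters because base64 expands the data: every 3 raw bytes become 4 characters, so a 174-byte chunk occupies 232 characters on the wire. A small sketch of that arithmetic follows; `fits_in_payload`, `header_length` and `max_payload` are hypothetical names, since the actual bus payload limit is not stated here.

```python
import base64

CHUNK_SIZE = 174  # raw bytes per chunk, as defined in otb_update.py


def encoded_length(raw_bytes: int) -> int:
    """Length of the padded base64 text for `raw_bytes` bytes of input."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_in_payload(chunk_size: int, header_length: int, max_payload: int) -> bool:
    """Check that marker plus encoded chunk fits a (hypothetical) payload limit."""
    return header_length + encoded_length(chunk_size) <= max_payload


print(encoded_length(CHUNK_SIZE))                   # 232
print(len(base64.b64encode(bytes(CHUNK_SIZE))))     # 232
```

Because 174 is a multiple of 3, full chunks encode without `=` padding; only the final, shorter chunk may carry padding.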

6. Protocol flow diagram has a missing __

In docs/tools.md line 164 of the diff, the flow diagram shows __OTB_ACK_COMMIT instead of __OTB_ACK_COMMIT__ (missing trailing underscores). Minor but inconsistent with the rest of the diagram.

@JensOgorek
Contributor Author

Updated the code with the suggestions from the review. I also ran some AI-based reviews myself and tested the code afterwards to make sure it still works.

@falkoschindler falkoschindler self-requested a review February 20, 2026 15:24
Collaborator

@falkoschindler falkoschindler left a comment


@JensOgorek I finally finished this review. Please have a look at my 13 most recent commits, which address some smaller and some larger findings, and check if you agree. It still compiles, but you should perform a final hardware test.

Besides some architectural improvements, I'm particularly glad that I could remove the custom implementation of a base64 decoder. I wonder why the AI didn't spot it.

Opus still finds issues when asked for a review, but nothing substantial as far as I can tell.

@JensOgorek
Contributor Author

@falkoschindler perfect. I will take a look and test it tomorrow 👍

@JensOgorek
Contributor Author

The handling of OTB response messages (ACK/ERROR) back to the coordinator had no explicit handlers in bus_handle_frame — they were silently falling through. This broke during refactoring when the echo was moved to serial_bus.cpp. Added dedicated handlers for ACK and ERROR responses. The code is also more understandable now.

@JensOgorek JensOgorek merged commit 9ddc9fa into main Mar 9, 2026
@JensOgorek JensOgorek deleted the serial_bus_ota branch March 9, 2026 16:17

Labels

enhancement New feature or request


2 participants