Skip to content

Fix reassembly of ASTM frames split across multiple TCP reads#78

Open
xispa wants to merge 3 commits into
abbott-architect-clientfrom
astm-frames-span-multi-tcp
Open

Fix reassembly of ASTM frames split across multiple TCP reads#78
xispa wants to merge 3 commits into
abbott-architect-clientfrom
astm-frames-span-multi-tcp

Conversation

@xispa

@xispa xispa commented Jun 4, 2026

Copy link
Copy Markdown
Member

Description of the issue/feature this PR addresses

Make the ASTM protocol robust against real serial-to-LAN gateways, which — unlike the loopback simulator used in tests — deliver the byte stream in arbitrary chunks and emit stray control bytes between sessions. Two related fixes:

  1. Reassemble frames split across TCP reads

TCP does not preserve message boundaries, so a single ASTM frame can be delivered across several reads (and, conversely, several frames/control bytes can arrive in one read). This is the norm with serial-to-LAN gateways (Moxa NPort, Lantronix UDS) that forward a slow serial stream in small TCP segments.

data_received() previously treated each read as one complete frame: a fragment such as b'\x021H|\\^&|||A' was checksum-validated on its own, failed, and was NAKed, while the trailing bytes ('RCHITECT^...') could not be dispatched -- the instrument retransmitted and the session never completed. Confirmed against an Abbott ARCHITECT ci4100 behind a Lantronix gateway.

data_received() now appends to a per-connection buffer and iter_tokens() yields only complete tokens: a single control byte (ENQ/ACK/NAK/EOT), or a whole frame up to STX...(ETX|ETB) + 2 checksum bytes, consuming an optional trailing CR/LF. Partial frames stay buffered until the rest arrives. Downstream handling is unchanged, so this is transparent to both server (listen) and client (connect) modes.

  1. Don't let unexpected control bytes tear down the connection

Raising inside an asyncio protocol's data_received() is fatal -- it drops the connection. Several handlers raised on input that instruments legitimately send, so a stray byte forced a full disconnect/reconnect instead of being shrugged off:

  • on_eot() raised InvalidState when an <EOT> arrived outside a transfer, but instruments emit stray <EOT> (and <EOT>/<ENQ> bursts) between sessions and while establishing -- observed live from the ARCHITECT, which churned the client connection (disconnect + 5s reconnect) repeatedly.
  • on_ack()/on_nak() raised NotAccepted on any <ACK>/<NAK>; as the receiver we should simply ignore these.

These now log and ignore instead of raising, and data_received() wraps handle_data() in try/except as a backstop so no single malformed token can drop the link. Removes the now-unused InvalidState import.

The simulator writes each frame in a single send over loopback, so tests never fragmented and never sent stray control bytes. Adds protocol tests: a frame fed one byte at a time, multiple tokens in a single read, a stray <EOT> (and <EOT>/<ENQ>/<EOT> burst) that must not drop the connection, and unexpected <ACK>/<NAK> that are ignored.

Current behavior before PR

The protocol assumed one TCP read = one complete ASTM frame. data_received() ran each chunk straight through handle_data(), checksum-validating it as if it were a whole frame, and raised on unexpected control bytes. Both assumptions break against real serial-to-LAN gateways:

  • A frame split across reads (e.g. b'\x021H|\\^&|||A' then b'RCHITECT^...') fails the checksum on its first fragment and is NAKed; the trailing bytes can't be dispatched. The instrument retransmits, the fragmentation repeats, and the session never completes — no message is received.
  • Several tokens arriving in a single read (e.g. ENQ immediately followed by a frame) were not all processed.
  • A stray <EOT> outside a transfer (on_eotInvalidState), or any <ACK>/<NAK> (on_ack/on_nakNotAccepted), raised inside data_received() and tore down the connection, forcing a reconnect.

Confirmed end-to-end against an Abbott ARCHITECT ci4100 behind a Lantronix gateway: connection established, <ENQ>/<ACK> fine, but every data frame was NAKed, and stray <EOT>/<ENQ> bursts repeatedly crashed the connection.

Desired behavior after PR is merged

The protocol reassembles frames regardless of how TCP splits or coalesces the byte stream, and tolerates stray traffic:

  • data_received() buffers, and iter_tokens() emits only complete tokens — a single control byte, or a whole frame up to STX…(ETX|ETB) + 2 checksum bytes (optional trailing CR/LF). Partial frames stay buffered; multiple tokens in one read are each dispatched.
  • Unexpected/stray control bytes are non-fatal: on_eot() outside a transfer and unexpected on_ack()/on_nak() log and ignore, and handle_data() is wrapped so no single token can drop the connection.

Result: fragmented frames are received and ACKed correctly and the session completes — verified against the Abbott ARCHITECT ci4100 via Lantronix (full H/P/O/R/C/L message parsed) — and stray <EOT>/<ENQ> bursts no longer cause reconnect churn. Transparent to both server (--listen) and client (--connect) modes

--
I confirm I have tested this PR thoroughly and coded it according to PEP8
and Plone's Python styleguide standards.

TCP does not preserve message boundaries, so a single ASTM frame can be
delivered across several reads (and, conversely, several frames/control
bytes can arrive in one read). This is the norm with serial-to-LAN
gateways (Moxa NPort, Lantronix UDS) that forward a slow serial stream in
small TCP segments.

data_received() previously treated each read as one complete frame: a
fragment such as b'\x021H|\\^&|||A' was checksum-validated on its own,
failed, and was NAKed, while the trailing bytes ('RCHITECT^...') could
not be dispatched -- the instrument retransmitted and the session never
completed. Confirmed against an Abbott ARCHITECT ci4100 behind a
Lantronix gateway.

data_received() now appends to a per-connection buffer and iter_tokens()
yields only complete tokens: a single control byte (ENQ/ACK/NAK/EOT), or
a whole frame up to STX...(ETX|ETB) + 2 checksum bytes, consuming an
optional trailing CR/LF. Partial frames stay buffered until the rest
arrives. Downstream handling is unchanged, so this is transparent to both
server (listen) and client (connect) modes.

The simulator writes each frame in a single send over loopback, so tests
never fragmented. Adds protocol tests that feed a frame one byte at a time
and multiple tokens in a single read.
@ramonski

ramonski commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Seems to be the same as this one: #77

xispa added 2 commits June 4, 2026 16:52
Raising inside an asyncio protocol's data_received() is fatal: it drops the
connection. Several handlers raised on input that instruments legitimately
send, so a stray byte forced a full disconnect/reconnect (5s) instead of
being shrugged off:

- on_eot() raised InvalidState when an <EOT> arrived outside a transfer,
  but instruments emit stray <EOT> (and <EOT>/<ENQ> bursts) between sessions
  and while establishing -- observed live from an Abbott ARCHITECT ci4100,
  which churned the client connection repeatedly.
- on_ack()/on_nak() raised NotAccepted on any <ACK>/<NAK>; we are the
  receiver and should simply ignore these.

Make stray/unexpected control bytes non-fatal: on_eot() outside a transfer
and unexpected on_ack()/on_nak() now log and ignore instead of raising. As
a backstop, data_received() wraps handle_data() in try/except so no single
malformed token can ever drop the link. Removes the now-unused InvalidState
import.

Adds protocol tests: a stray <EOT> (and an <EOT>/<ENQ>/<EOT> burst) does not
drop the connection, and unexpected <ACK>/<NAK> are ignored.
@xispa

xispa commented Jun 4, 2026

Copy link
Copy Markdown
Member Author

Seems to be the same as this one: #77

Yes, both do essentially the same, except that #77 lacks on the control-byte hardening added with 798bdeb

This PR comes with #24 and #61. I've been working on this ARCHITECT integration for a while and has been quite challenging to be honest. Also because there is a serial-to-LAN converter in the middle (Moxa does not work in this case, but Lantronix). I am now focused on make this long-standing integration fully work and will port the differences to 2.x afterwards, once all the fundamentals are in place. Will then be possible for me to fully test this with 2.x

@xispa

xispa commented Jun 4, 2026

Copy link
Copy Markdown
Member Author

Yes, both do essentially the same, except that #77 lacks on the control-byte hardening added with 798bdeb

Going to prepare a small PR against 2.x with this smally by the way.

@xispa

xispa commented Jun 4, 2026

Copy link
Copy Markdown
Member Author

Yes, both do essentially the same, except that #77 lacks on the control-byte hardening added with 798bdeb

Going to prepare a small PR against 2.x with this smally by the way.

See #79

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants