fix(python): skip stale protocol v1 responses #6860
Walkthrough
This pull request prevents protocol-v1 desynchronization by making
Sequence Diagram(s)
sequenceDiagram
participant Client
participant ProtocolV1
participant Device
Client->>ProtocolV1: start probe()
ProtocolV1->>Device: send Initialize
Device-->>ProtocolV1: Features / Failure
alt Received Features
ProtocolV1->>ProtocolV1: probe success
ProtocolV1->>ProtocolV1: sync_responses(_cancel=false)
ProtocolV1->>Device: (no Cancel sent) read/drain queued responses
Device-->>ProtocolV1: stale responses (drained)
else Received Failure
ProtocolV1->>Client: return Failure (abort)
end
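The flow in the diagram can be sketched with an in-memory stand-in for the device. This is a toy model, not the trezorlib code: `FakeDevice`, `drain_responses`, the tuple-shaped messages and the nonce-echoing Ping used to find the end of the queue are all assumptions made for illustration.

```python
from collections import deque
import secrets


class FakeDevice:
    """Hypothetical FIFO device; 'late' holds replies that arrive after Features."""

    def __init__(self, late=()):
        self.queue = deque()
        self.late = list(late)

    def write(self, kind, payload=None):
        if kind == "Initialize":
            self.queue.append(("Features", None))
            # Replies left over from an interrupted call show up afterwards.
            self.queue.extend(self.late)
            self.late.clear()
        elif kind == "Ping":
            self.queue.append(("Success", payload))

    def read(self):
        return self.queue.popleft()


def drain_responses(dev, max_reads=10):
    """Send a Ping with a fresh nonce (no Cancel) and discard replies until it echoes."""
    nonce = secrets.token_hex(8)
    dev.write("Ping", nonce)
    for _ in range(max_reads):
        if dev.read() == ("Success", nonce):
            return
    raise RuntimeError("device did not resynchronize")


def probe(dev):
    """Return True on Features (after draining), False on any other first reply."""
    dev.write("Initialize")
    kind, _ = dev.read()
    if kind != "Features":
        return False  # e.g. Failure: abort, as in the diagram
    drain_responses(dev)
    return True
```

After a successful `probe()` the fake queue is empty, so the next request starts from a clean state.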
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@python/src/trezorlib/protocol_v1.py`:
- Around line 362-363: The call to sync_responses(transport, _cancel=False)
inside probe() is not passing the probe's mapping argument and therefore falls
back to mapping.DEFAULT_MAPPING; update the call to forward the caller's mapping
(use the probe parameter named mapping) into sync_responses so it uses the same
protobuf map for encoding/decoding the Cancel/Ping pair; ensure you reference
the probe(mapping=...) parameter and pass mapping=mapping to sync_responses.
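The finding above comes down to a keyword argument silently falling back to its default. A minimal sketch follows, assuming a `sync_responses(transport, mapping=..., *, _cancel=...)` signature. Only the call shape is taken from the comment; the function bodies, the dictionary mappings and the list used as a transport are hypothetical.

```python
DEFAULT_MAPPING = {"name": "default"}  # stand-in for mapping.DEFAULT_MAPPING


def sync_responses(transport, mapping=DEFAULT_MAPPING, *, _cancel=True):
    # Record which mapping would be used to encode/decode the Cancel/Ping pair.
    transport.append((mapping["name"], _cancel))


def probe_before(transport, mapping=DEFAULT_MAPPING):
    # The caller's mapping is dropped, so the default is used.
    sync_responses(transport, _cancel=False)


def probe_after(transport, mapping=DEFAULT_MAPPING):
    # The caller's mapping is forwarded explicitly.
    sync_responses(transport, mapping=mapping, _cancel=False)
```

With a custom mapping, `probe_before` records `"default"` while `probe_after` records the caller's mapping.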
📒 Files selected for processing (2)
python/.changelog.d/6859.fixed
python/src/trezorlib/protocol_v1.py
dada916 to c7e6e10
🧹 Nitpick comments (1)
python/src/trezorlib/protocol_v1.py (1)
348-363: 🏗️ Heavy lift
Please add a hardware regression test for the canceled-ping desync path.
Given this fix targets a transport/session desync edge case, a HW CI device test covering “interrupt ping → next Initialize returns Features (not stale Failure(ActionCancelled))” would significantly reduce regression risk.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@python/src/trezorlib/protocol_v1.py` around lines 348 - 363, Add a hardware regression test that reproduces the canceled-ping desync path: from a real device transport simulate an "interrupt ping" by writing messages.Cancel (using the same ProtobufMapping encoding as probe) and consuming any in-flight responses, then immediately call the Initialize flow and assert that the Initialize response decodes to messages.Features (not a stale messages.Failure with code messages.FailureType.ActionCancelled). Target the same primitives used in probe/sync_responses (probe, sync_responses, write, read, mapping) so the test explicitly exercises the Cancel → read/encode/decode sequence and verifies the next Initialize returns Features, not Failure(ActionCancelled).
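Before wiring this up on real hardware, the scenario can be written as a host-only test against a fake device. Everything in this sketch is hypothetical: `FakeV1Device`, the string message names and both `initialize_*` helpers only model the "interrupt ping, then Initialize" sequence and are not the trezorlib API or the test added in this PR.

```python
from collections import deque


class FakeV1Device:
    """Hypothetical device: replies in FIFO order; a Ping stays pending until cancelled."""

    def __init__(self):
        self.responses = deque()
        self.pending_ping = False

    def write(self, name):
        if name == "Ping":
            self.pending_ping = True  # models a Ping still waiting on the device
        elif name == "Cancel" and self.pending_ping:
            self.pending_ping = False
            self.responses.append(("Failure", "ActionCancelled"))
        elif name == "Initialize":
            self.responses.append(("Features", None))

    def read(self):
        return self.responses.popleft()


def interrupted_ping():
    """Leave a Failure(ActionCancelled) unread, as an interrupted host would."""
    dev = FakeV1Device()
    dev.write("Ping")
    dev.write("Cancel")
    return dev


def initialize_naive(dev):
    dev.write("Initialize")
    return dev.read()  # returns whatever is first in the queue


def initialize_skipping_stale(dev):
    dev.write("Initialize")
    while True:
        resp = dev.read()
        if resp[0] == "Features":
            return resp  # earlier replies were stale and are dropped


def test_initialize_after_cancelled_ping():
    assert initialize_naive(interrupted_ping()) == ("Failure", "ActionCancelled")
    assert initialize_skipping_stale(interrupted_ping()) == ("Features", None)
```

A hardware version would replace the fake with a real transport and keep the same two assertions.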
✅ Files skipped from review due to trivial changes (1)
- python/.changelog.d/6859.fixed
Marking as draft until device test is added.
ad1f595 to 67c688d
This PR also resolves the de-synchronization scenario from #6651 (comment).
67c688d to 7807f2a
7807f2a to 58636d4
Without the fix, running
With the fix, the test passes with:
mmilata left a comment
Starting to look like sync_responses a bit. Should we limit the number of retries too?🤔
Indeed, but here it's used to make sure the next
In THP, the deadlock shouldn't happen since the host is also responsive to incoming messages during writes:
trezor-firmware/core/src/trezor/wire/thp/channel.py, lines 495 to 496 in d028d42
Sounds good, added in d028d42.
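The retry limit discussed in this thread can be illustrated with a small helper. This is a generic sketch, not the code from d028d42: `read_until`, its parameters and the default of 8 reads are assumptions.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def read_until(read: Callable[[], T], accept: Callable[[T], bool], max_reads: int = 8) -> T:
    """Call read() until accept(msg) is true, giving up after max_reads attempts."""
    for _ in range(max_reads):
        msg = read()
        if accept(msg):
            return msg
    # Bounding the loop turns a permanently desynchronized link into an error.
    raise TimeoutError(f"no matching response after {max_reads} reads")
```

For example, `read_until(transport_read, lambda m: m == "Features")` would skip stale replies but fail if none of the first eight reads match, where `transport_read` is whatever zero-argument read function the caller has.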
Squashing and rebasing over
d028d42 to 1edb54a
Fixes #6859.
Was implemented in 49c9ad0.
Note to QA:
See #6859 (comment) for reproduction.