USB4 RE WIP #61
Draft
robbederks wants to merge 46 commits into
Fresh ghidra-first reimplementation of the ASM2464PD USB4/Thunderbolt
enumeration in the handmade firmware, additive to the USB3 path. Brings the
TB4 host through, every step confirmed on the CC wire with the CY4500 analyzer:
- USB-PD engagement: CC Rp/Rd termination + PD engine arm + INT1 routing so
the host does PD with the handmade fw at all (pd.h). This keystone was the
wall every prior attempt hit; cracked via byte-exact RE of the CC/PD engine.
- Power contract: Source_Cap -> Request(RDO) -> Accept -> PS_RDY (pd_dispatch.h).
- VDM discovery: device answers Discover_Identity ACK, advertising USB4-capable
VID 0x174C (vdm.h).
- USB4 mode entry: host Enter_USB[mode=USB4] -> device Accept.
Reverse-engineered fresh from the fw_tinygrad.bin Ghidra decompilation;
handmade/USB4_RE.md is the consolidated per-phase reference. Remaining:
USB4 lane training / sideband / connection-manager / PCIe tunnel -> GPU.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Forward progress on USB4 entry, additive + faithful to stock fw_tinygrad:
- Fix pd_handle_enter_usb @0xA036 Accept branch: also set 0x07BA=1 (the
Connect_U4 gate), not just 0x07BB. handmade now prints [Enter_USB 4]
[Connect_U4] matching stock; PD/Enter_USB wire exchange byte-identical
(REQUEST 0x1204B12C, ACCEPT 0x0483).
- usb4.h: usb4_connect_u4 @0xA3F5 bank0 head (E716/CA81/CA06 gated on
0x0AF1.0; 0x09FA route-mode latch) + the int1_isr USB4 demux
(usb4_int_demux, gated 0x09F9&0x83) mirroring int1_isr_orchestrator
@0x4486: C80A.5 SB / C80A.4 event / EC06.0 router-op / C80A.0-3 tunnel.
Stopping point (HW): host raises ZERO USB4 interrupts (usb4_int_seen=0),
E302 stays 0x83 (link-mode 0). The trigger is the device-side sideband
bring-up (bank1 b230 + sb_block_init @0xBB37, SB block at page-1 0x012800),
whose bank1 helper logic is not statically byte-recoverable. Next: live
MMIO trace of the SB block during a stock [SB Init] to capture the writes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
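The two wire values quoted above (REQUEST 0x1204B12C, ACCEPT 0x0483) can be sanity-checked off-target. The sketch below is a host-side Python model, not firmware code. It assumes the 32-bit value is a fixed-supply Request Data Object and the 16-bit value is a PD message header, with field positions taken from my reading of the USB PD specification; the function names are hypothetical.

```python
def decode_pd_header(hdr: int) -> dict:
    """Split a 16-bit PD message header into its main fields (assumed layout)."""
    return {
        "msg_type": hdr & 0x1F,            # bits 4..0 (control message 3 = Accept)
        "spec_rev": (hdr >> 6) & 0x3,      # bits 7..6 (2 = PD revision 3.0)
        "msg_id": (hdr >> 9) & 0x7,        # bits 11..9
        "num_objects": (hdr >> 12) & 0x7,  # bits 14..12 (0 = no data objects)
    }


def decode_fixed_rdo(rdo: int) -> dict:
    """Decode a fixed-supply Request Data Object (assumed layout)."""
    return {
        "object_position": (rdo >> 28) & 0x7,        # which Source_Cap PDO is requested
        "usb_comm_capable": bool(rdo & (1 << 25)),
        "operating_ma": ((rdo >> 10) & 0x3FF) * 10,  # 10 mA units
        "max_ma": (rdo & 0x3FF) * 10,                # 10 mA units
    }


if __name__ == "__main__":
    print(decode_pd_header(0x0483))
    print(decode_fixed_rdo(0x1204B12C))
```

Under those assumptions the ACCEPT header decodes as a control message of type 3 with no data objects, and the REQUEST asks for PDO 1 at 3000 mA operating and maximum current.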
… [M1 negative]
M0 (HW-verified improvements):
- pd.h: forced 0x09F9 0x01 -> 0x87 (bit7 VDM-ACK-enable + route 3). CY4500 now shows
Discover_Identity ACK vid=0x174C modal=1 devcap=USB2|USB3|USB4 alt=TBT3 (was vid=0).
- main.c: boot zero-init 0x0B02-0x0B1F, 0x06F1, and 0x07ED (the [Connect_U4] one-shot
suppress flag was uninitialised XDATA, randomly skipping usb4_connect_u4).
- sb.h: P1_REG8/SB_REG8 page-1 (DPX=1) accessors.
M1 (faithful, but does NOT train E302):
- sb.h: sb_lane_flip_init (bank1 b230) + sb_block_init (bank1 bb37) + the ROM descriptor
tables (CODE 0x213d/0x21d4, byte-exact) + PHY seed/lane cfg, wired as the usb4_connect_u4
tail. Runs cleanly on HW ([flp=01][SB Init]) but E302 stays 0x83 / link-mode 0, and NO
USB4 interrupt fires. The plan hypothesis 'b230+bb37 trains E302' is refuted on HW.
- Root insight: the device presents USB2 (SS_FAIL fallback) which preempts USB4; the real
E302 trigger is the upstream USB4 PHY/SERDES training, not the sideband. Next direction.
Bank1 now statically RE-able via ghidra CODE_BANK1:: (mcp overlay fix). Plan: USB4_TUNNEL_PLAN.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… block reads back 0]
- main.c: USB2-fallback guarded by !(0x09F9&0x83) (no USB2 drop in USB4 mode); boot fork skips
usb_init_controller in USB4 mode (regression guard keeps it for USB3 hosts); pd_keystone_init
before the fork; sleep() timer-expiry spin bounded (CC10-13 = shared PHY/PD mailbox, not a pure
timer) + post-SB busy-NOP delay; M1' [*** USB4 TRAINED ***] check.
- usb.h: usb_pipe_engine_init (stock B1CB PIPE cfg) + usb4_phy_arm (cc10 subcmd4 + E318.4 wait) +
boot_phy_early_settle (CE79 cc10 settle subset).
- sb.h: SB write self-readback diagnostic.
HW (TB4): Step A works (no [USB2 fallback], is_usb2=0, full PD->Enter_USB->[SB Init] on wire,
E318.4 up, 91C0&0x18==0x10) but E302 stays 0x83 (not trained); host never raises C80A.5/EC06.0.
KEY: SB writes read back ZERO (SB[0x81]=00 after 0x08) while other MMIO reads fine -> the SB
transport block is unpowered.
Lead: boot_phy_bringup_early @0xCE79 cf28/d0d3/bank1 ed02 (Type-C SBU PHY + early SB setup) are
omitted by handmade -> SB block never enabled.
Next: transcribe them.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e sideband block
THE WALL IS DOWN. Reproducing stock boot_phy_bringup_early @0xCE79 IN FULL (no omissions, per
'do as the original firmware does') powers the USB4 sideband transport block and the upstream
PHY trains E302 to link-mode 3 (best=0x3C -> (>>4)&3==3). Every prior session was stuck at
E302 link-mode 0.
Root cause: the SB block enable is bank1 ED02's SB[0x05]|=0x80 (page-1 0x2805 bit7), inside the
Type-C/SBU PHY setup (d0d3/cf28/ed02) that prior attempts skipped as 'HW-risky'. Without the SBU
PHY powered, sb_block_init's writes didn't land (read back 0) and the host never trained USB4.
- boot_phy.h (NEW): full faithful transcription of boot_phy_bringup_early (d0d3 Type-C SBU +
cf28 PHY cfg + bank1 ED02 SB-enable + C233/bd5e + cc10 settles + dd42 + d996 PCIe pre-stage),
every helper expanded from disassembly.
- main.c: call boot_phy_bringup_early() right after flash_init() (stock's position).
HW (TB4 host): SB block powered (sb05=E3, SB regs read back non-zero), [*** USB4 TRAINED ***],
PD/VDM/Enter_USB still on the wire. E302 trains transiently; next = service C80A.5 (SB-router
a066) so the link holds + advances to PCIe tunnel -> AMD GPU.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
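The link-mode readings quoted in this PR all come from the same two-bit field of E302. A minimal host-side Python sketch of that decode follows, using the `(>>4)&3` expression from the commit message; the function name is hypothetical and the meaning of E302's other bits is not modelled.

```python
def e302_link_mode(e302: int) -> int:
    """Return the 2-bit link-mode field, i.e. bits 5..4 of the E302 value."""
    return (e302 >> 4) & 0x3


if __name__ == "__main__":
    # Raw E302 readings quoted in the commit messages of this PR.
    for raw in (0x83, 0x97, 0x6C, 0x3C):
        print(f"E302=0x{raw:02X} -> link-mode {e302_link_mode(raw)}")
```

This makes it easier to compare UART dumps: 0x83 is mode 0 regardless of its other bits, while 0x3C is the mode-3 reading reported here.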
…T: source starved
ONE decisive on-the-wire experiment (USB4_AUDIT_PLAN §6) to make the TB4 host negotiate USB4 with
the handmade fw. Phase A + Phase B as one flash.
Phase A (stop active sabotage):
* sleep() retargeted from CC10-CC13 (the PHY/PD command mailbox shared with the USB4/PD link
engine) to the independent Timer1 block CC16-CC19. registers.h: REG_TIMER1_CSR made volatile +
documented; CC10-CC13 defs untouched.
* usb_phy_tune()/pcie_power_off()/pcie_power_on() gated behind !(0x09F9&0x83) so they only run in
non-USB4 mode. 0x09F9 set to 0x87 early (after boot_phy_bringup_early) so the gate reads USB4
intent, not the boot-default. The deferred sb_tunnel_up_pending->pcie_power_on() tunnel-up path
is kept.
Phase B (restore the interrupt engine):
* cc_pd_timer_tick() @0xB4BA transcribed verbatim into pd.h (CC23/CC81/CC91/CC99/CCD9/CCF9
service; CC91.1->usb4_mode_entry_commit, CC81.1->hard/full reset). Sub-fns pulled from ghidra:
cc_state_full_reset (d676, [Error_Recovery] print), pd_cc81_hard_reset_4 (e90b),
pd_queue_ctrl_msg (e529, deep mailbox send NOTE'd), cc_cc23_reinit_event/cc_cc99_default_event/
cc_ccf9_subdemux (banked tails NOTE'd, W1C acks kept).
* usb4_mode_entry_commit (vdm.h) now returns uint8_t (4 if 0x09F9.6 else 1) per stock d78a R7,
stored to 0x0AE2.
* int1_isr (main.c) reordered to stock @0x4486: timer-tick FIRST/ungated, CC33.2 W1C, C80A.6
PD-RX, gated USB4 demux, C806.4.
* Instrumentation: tick_seen/cc_hit + C806/CC91/CC81/C809 on the [U ...] line.
ON-THE-WIRE RESULT (Intel Meteor Lake TB4 host, AMD RX 9060 XT downstream): A2 confirmed (no boot
[PCIe ...] power_on spam). PD/VDM/Enter_USB[USB4] still complete every plug. BUT timer-tick is
STARVED AT THE SOURCE: tick_seen=00, c806=00 (C806.0 never asserts), cc91=00/cc81=00 (edges never
set), c809=20 (bit1=PD-int-enable NOT set). E302 holds 0x83 (mode 0, NOT trained, steady - not a
transient). C80A.5 never fires (c80a=00, c80aACC=00).
Root cause is audit S5/O7: the USB4-mode C809.1 PD-interrupt enable (stock D894) is missing, so
the policy-engine timer-tick source is never armed. Keystone path B1 is correctly wired but cannot
run until its source is enabled (Phase C/D).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add app/patch_sbtrace.py: code-cave hook (same mechanism as patch_usb4trace.py) into STOCK
fw_tinygrad.bin at the bank1 a066 [===SB Con===] print site (body offset 0x12052, the
`mov R3,#FF;R2,#0x20;R1,#0x56` string-ptr setup). At the SB connect transition it UART-dumps the
gating regs (0x0AF1,C80A,E302,0x09F9,0x09FA,EC06,91C0,E318) + the full 256-byte SB page-1 block
(DPX=1 XDATA 0x2800..0x28FF). Cave + UART helpers live in the low shared (<0x8000) region so the
bank1 site can reach them. Verified on HW: stock reaches the AMD RX 9060 XT / tbt router 1-1.
Extend handmade sb.h sb_assert() diagnostic to emit a matching [HMSB:...] line (same gating regs +
same full SB page) right after our SB-assert, for a directly comparable stock-vs-handmade diff.
Did NOT apply the proposed 0x09F9=1 change: ghidra 0x8D77 (usb4_cap_apply_09f9, step 4f) sets the
RUNTIME 0x09F9=0x87 in normal USB4 (0x09F4==3); b1cb's =1 is only an intermediate. Stock's
captured runtime 0x09F9 IS 0x87 (bit7 = VDM-ACK). 0x09F9=1 would clear bit7 -> handmade NAKs all
VDMs -> no Enter_USB/SB-assert -> no capture. Handmade's existing 0x87 is faithful to stock.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…E31)
Complete the truncated SB-block config so handmade's pre-connect SB page matches stock,
transcribed VERBATIM from ghidra bank0 + bank1 disasm (cited per write). Fixes 4 of 6
stock-vs-handmade SB-page config diffs:
- 0x0AF1: seed 0x00 at boot (main.c). Stock bank0_92c5 @0x94d1-0x94df always writes 0x0AF1 (0x00
on the no-OTP-override path, else 0x3F); handmade never ran 92c5 so it sat at uninitialised
power-on 0x55, wrongly tripping the 0x0AF1.0/0x0AF1.4 connect gates. Now reads 0x00 (MATCH).
- SB[0x1C]=0xC2: restore the full db0d block (T3) in usb4_irq.h. Stock CODE_BANK1::db0d sets
SB[0x1C] |= 0x80|0x40|0x02 (=0xC2) plus the page-0x12[0x62] &=0xEF writeback,
SB[0xED]/[0xCE]/[0x1D], C20B/C22F that handmade had truncated to a discard read. Now reads 0xC2
(MATCH).
- SB[0xBA]=0x3F, SB[0xBD]=0x3F: restore the bb37 tail (T5, bc45..bc4e) in sb.h, plus the e34b PHY
RMWs (b70d C2C3/C343, b796 C21C, b73b) handmade dropped. Both now read 0x3F (MATCH).
- usb4_irq_ef24: transcribe the FULL 8E31 dual-lane PHY-RX descriptor config
(CODE_BANK1::8e31..0x9259, ~150 C2xx/C3xx/page-0x93 RMWs) that handmade had truncated at
SB[0x49]=0xA0. This is the producer of the SB[0x94..0x99] PHY-RX descriptor (nothing writes those
SB bytes directly; confirmed by a full-image scan).
HW (Intel MTL TB4 host, flashed+verified on NUC): 0x0AF1/SB[0x1C]/SB[0xBA]/SB[0xBD] now byte-MATCH
stock in the [HMSB:] dump. E302 moved 0x83->0x97 at [SBdone]. Residual DIFFs: SB[0x94..0x99]
(02 71 00 3E 80 stock vs 00 00 8C 00 00 FA) and SB[0x0B] (02 vs 00) are HW-latched from live
PHY/lane-training state, not RAM writes, so they only populate once the lanes train, which still
needs C80A.5 (SB-router connect), which the host does not raise. PD + Enter_USB[4] + Connect_U4 +
SB Init all complete on the wire; C80A.5/EC06.0 remain 0.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Three faithful, ghidra-verified SB-PHY RX-lock fixes (the C80A.5 wall).
1. Orientation plane bug (audit I11). bank1 b230 sb_lane_flip_init writes the
orientation lane map via the r3_xdata paged accessor with R3=2/R2=1/R1={1,2}
== plane DPX=1 at XDATA 0x0101/0x0102 (the P1[0x01xx] plane), verified through
the 0x0ac1/0x0adc accessor (DPX=R3-1) and the helper bodies (0x9958=set b0,
0x97fc=set b1). handmade was writing SB[0x01]/[0x02] (DPX=1 at 0x2801/0x2802) --
the WRONG plane, so the RX was enabled on the wrong SBU pins. Re-addressed
sb.h orientation + connect-lane-bond writes to P1[0x0101]/[0x0102] with the
byte-exact sense (flip -> set b0/b1; straight -> clear).
2. Full ef1e SB-PHY RX arm (audit T2). bank1 ef1e = lcall d0ac; ljmp 9a63 is
~324 paged RMWs across PHY pages 0x78-0x7b + descriptor pages 0x60/64/68/6c
(4-lane RX equalizer/rate config). handmade did only ONE 0x7834 RMW. Recovered
byte-exact by symbolically executing the stock bytes through the r3_xdata
accessor + the streaming-RMW helpers (9388/9403/940a/9267/9661/9668/925a/9670);
emitted as u4rx_tab[][4] in new usb4_rx_table.h and applied as RMW in
usb4_irq_ef1e. The C8FF==5 lane-rate-gated tail is documented as omitted (could
not re-derive faithfully) rather than guessed.
3. Full 92C5 RAM-state seed (audit O6/S4). Replaced the lone 0x0AF1=0 byte with
the whole bank0 92C5 head seed + non-OTP LAB_94c5 tail: 0x0AE9=0x0F (WIDTH),
0x0AEE=3 (MODE), 0x0AE3..0x0AF0=1, 0x0AEB..0x0AEF=3, then 0x0AF1=0, C65A&=0xF7,
CC35&=0xFB, 0x905F&=0xEF. (Note: the audit's "zero these" guess was wrong --
stock seeds them to 1/3.)
HW RESULT (Intel MTL TB4 host): builds clean (23853B), boots, full PD+Enter_USB
[USB4]+SB. Inner verify: SB[0x01] now reads FF (matches stock; was FC), 0x0AF1=00,
[U4irq] ran. BUT THE WALL PERSISTS: C80A.5 still never fires (C80A=0x00 across 295
samples), E302 settles mode0 (transient 0x97/mode1 at SBdone only), SB[0x94-0x99]
still 00 (not stock's 02 71 00 3E 80 -- RX never locks), no GPU (1002:7590 absent),
no device router 1-1. A complete+correct SB-PHY RX path + correct orientation does
NOT raise C80A.5 -> strongly implies the trigger is dynamic/host-level; next step
is instrumenting stock's [SB Init]->[===SB Con===] window, not more static config.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
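The orientation bug in item 1 above comes down to two accessors resolving to different addresses on the same DPX plane. The sketch below is a host-side Python model with hypothetical helper names. It assumes a flat `(DPX << 16) | address` encoding, inferred from the "page-1 0x012800" notation used earlier in this PR, and takes the DPX=R3-1 rule and the 0x2800 SB base from the commit messages.

```python
SB_BASE = 0x2800  # sideband block base on the DPX=1 plane (from the commit text)


def flat(dpx: int, addr: int) -> int:
    """Assumed flat encoding: DPX plane in bits 23..16, XDATA address below."""
    return (dpx << 16) | (addr & 0xFFFF)


def sb_addr(offset: int) -> int:
    """SB[offset] accessor: DPX=1, XDATA 0x2800 + offset."""
    return flat(1, SB_BASE + offset)


def r3_xdata_addr(r3: int, r2: int, r1: int) -> int:
    """r3_xdata paged accessor: plane DPX = R3 - 1, address = R2:R1."""
    return flat(r3 - 1, (r2 << 8) | r1)


if __name__ == "__main__":
    # What stock b230 addresses (R3=2, R2=1, R1=1 or 2) vs. what handmade wrote.
    print(hex(r3_xdata_addr(2, 1, 1)), hex(r3_xdata_addr(2, 1, 2)))
    print(hex(sb_addr(0x01)), hex(sb_addr(0x02)))
```

Both land on plane 1, which is presumably why the mistake was easy to miss: only the low 16 bits differ (0x0101/0x0102 versus 0x2801/0x2802).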
…k0_8a89 + unmask SB-PHY-RX)
Implements the RE-AUDIT coordinated change (handmade/USB4_REAUDIT_PLAN.md). For the first time
in the project the Intel MTL TB4 host asserts C80A.5 (SB-router connect) with handmade fw:
c80aACC=0x20, [===SB Con===] fires, usb4_int_seen bit0 set. Every prior session had c80aACC=00.
(A) Unmask the C80A.5 SOURCE — complete the dropped SB-block arm tails, transcribed verbatim:
- sb.h sb_lane_flip_init: b230 tail b307-b39e (CODE_BANK1::b230) — SB 0x94/0x95/0x96 + 0x98/0x99
PHY-RX descriptor (0d59/0c7a), CCD8/CCDA/CCDB=200 + CCD9 strobe (97ef), **C801.4 SET**
(verified on HW: c801=10), **C809.3 SET** (verified: c809=28), SB 0xCF/0x53/0x5D/0x27/0x2D/0x2C,
0x072B/0x072C, the C8FF==4 ROM-copy CODE 0x21b4[0x10] -> SB[0x3e..0x4d].
- sb.h sb_rom_descriptor_load: b7a4 tail b83b-b8d8 (CODE_BANK1::b7a4) — SB[0xC9] walking-1
connect-detect arm 1,2,4,8,0x10,0x20,0x40,0x80 (9945/9756/994e), SB 0x28/0x2A/0x2C=4/0x66,
0x6EE=SB[0x24].0 / 0x6EF=SB[0x80].0, CCD9 strobe, SB 0xD4/0x8F, connect-state RAM init.
- usb4_irq.h: fixed the wrong "SB 0x94..0x99 HW-latched" comment (b230 writes them explicitly).
(B) Port bank0_8a89 (the USB4 lane-MODE bring-up engine the truncation hid) -> new usb4_connect.h,
transcribed verbatim from decompile_function(0x8a89): E764=(&0xEF)|0x10 LINK-MODE-ENABLE (verified
on HW: e764=14, bit4 set), E751 arm, E710/CA06 rate latch (0x0A9F/0x0A9E), dd42 E7E3 per-mode,
phy_cc10_cmd_wait(0,0x27,2), the dynamic while(bd6c()) loop pumping pd_rx_isr on C80A.6, d0d3 SBU,
usb4_connect_u4() tail. Helpers bd49/bd57/bd6c/bd50/bd33/bd23/bd3a/bd65/bcfe/bceb/bd5e/bd2a/bcf2/
bd41/bd14/dcd4 all decompiled + transcribed. cd10's terminal handoff is deferred to the super-loop.
Plus bank0_c9a8 (decompile 0xC9A8): gate 0x09FA.2 && 0x0AF1.0 && (0x07E8||0x07EB), then bank0_8a89.
(C) Wire the INT0 0e5b link-event demux (main.c int0_isr): 0x9101.0->0x91D1.1 W1C=2 -> c9a8(0);
0x9101.4->0x9302.2 W1C=4 -> c9a8(1). (decompile 0x0e5b/0xe94d/0xe952). HW FINDING: the host NEVER
drives 0x9101/0x91D1/0x9302 (all read 00) -> bank0_8a89 is unreachable via INT0 on this host.
(D) Set the c9a8 gate flags (all were CLEAR on the live path): vdm.h mode==2 route sets 0x09FA|=4 +
0x07E8=1; usb4.h usb4_connect_u4 sets 0x0AF1|=1 at entry and PRESERVES 0x09FA.2 across the route
latch (was clobbered by 0x09FA=0x09F9&3). Verified on HW: 09fa=07, 0af1=01, 07e8=01 (gate open).
(E) Removed the E764=0x1C fabrication in pcie_power_on (stock never writes E764 as a literal; it
clobbered the CM-owned link-mode nibble); pcie_power_off now preserves E764 bit4.
Chicken-and-egg driver: since the host raises C80A.5 but not the INT0 link-events, drive bank0_8a89
ONCE from the SB-router connect (sb_con_consequence -> super-loop bank0_c9a8(0), EX1 masked across
the run). Also completed sb_con_consequence (dea1, CODE_BANK1::dea1) verbatim — restored the dropped
0x06EC=1 arm + db7a tunnel-route arm that were causing the [===SB Con===] storm.
HW STATUS (Intel MTL-P TB4 host, AMD RX 9060 XT eGPU):
- C80A.5 FIRES (was the project wall): c80aACC=20, [===SB Con===], usb4_int_seen bit0.
- bank0_8a89 runs to completion (E764.4 link-mode armed); pcie tunnel-up fires.
- E302 does NOT sustain trained: stays 0x83 (mode0) across all samples; downstream PCIe LTSSM
stays 0x01 "down" -> [PCIe timeout]; no AMD GPU in lspci, no TB router 1-1.
- Remaining gap: lane-bond never completes -> next = the device->host SB-transport router-op
responder (RE-AUDIT #4/#7: e56f + a327) + the per-loop cb10 lane advance (#6).
- Device healthy (full PD + Enter_USB[USB4] on the wire; re-enumerable; not bricked).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
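Items (B), (D) and (E) above hinge on three small bit manipulations: the E764 link-mode-enable RMW, the route latch that must keep 0x09FA.2, and the c9a8 gate. The following is a host-side Python model of those expressions as described in the commit message. The function names are hypothetical, and the starting E764 value of 0x04 in the demo is an assumption chosen only to reproduce the reported e764=14.

```python
def arm_link_mode(e764: int) -> int:
    """E764 = (E764 & 0xEF) | 0x10: set bit 4, leave the other bits alone."""
    return (e764 & 0xEF) | 0x10


def route_latch_clobbering(reg_09f9: int) -> int:
    """Old handmade behaviour: 0x09FA = 0x09F9 & 3 (drops bit 2)."""
    return reg_09f9 & 0x03


def route_latch_preserving(old_09fa: int, reg_09f9: int) -> int:
    """Fixed behaviour: latch the route bits but keep 0x09FA.2."""
    return (old_09fa & 0x04) | (reg_09f9 & 0x03)


def c9a8_gate(x09fa: int, x0af1: int, x07e8: int, x07eb: int) -> bool:
    """Gate: 0x09FA.2 && 0x0AF1.0 && (0x07E8 || 0x07EB)."""
    return bool((x09fa & 0x04) and (x0af1 & 0x01) and (x07e8 or x07eb))


if __name__ == "__main__":
    fa = route_latch_preserving(0x04, 0x87)
    print(f"09fa={fa:02x} gate={c9a8_gate(fa, 0x01, 0x01, 0x00)}")
    print(f"e764={arm_link_mode(0x04):02x}")
```

With 0x09F9=0x87 the preserving latch yields 0x07, matching the verified 09fa=07, whereas the clobbering form yields 0x03 and closes the gate.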
…#7
Implemented the REAUDIT #4/#6/#7 device->host router-op layer faithfully, but the STEP-0
instrumentation (run first, per plan) decisively refutes the hypothesis that "the device never
ANSWERS the host CM's router-op queries": the host posts NOTHING to any router-op mailbox after
[===SB Con===].
Changes (all gated faithfully = no-ops on the Intel MTL TB4 host, correct if a host ever advances
past SB-connect):
- usb4_irq.h: usb4_routerop_init() = bank1 e56f VERBATIM (EC00.0/EA88=100/EA89=0x24/EC04/EC05.0/
C807.7 RX-enable/0x0B02=0). Called at boot gated 0x09F9&0x81. Arms the SB-transport RX mailbox so
the host CAN post.
- sb_router.h: sb_cb10_lane_advance() = bank1 cb10 (#6): per-loop SB[0xA0]/[0xA1] lane-status
readout vs 0x072B/0x072C latches; the [Pend Int] SB[0x26].1 -> a5d8 device->host responder branch
(#4-extra; fixes the SB[0x26]/SB[0x9E] mismap).
- usb4.h: cm_routerop_mailbox() = bank1 c0a5 (#7c) config-space router-op (EA90==0x5A gate, 0x0B02
state machine, C805|=0x02 + EA90=0xA5 reply); wired into usb4_int_demux EC06.0 (replaces the
EC04=1-only stub).
- main.c: cb10 wired in super-loop (gated (0x09F9&0x83)&&0x06EC, EA=0); the [ROPB] burst probe (10
samples, EX1 masked) that captured the decisive trace; gate the deferred pcie_power_on on E302
trained (kills the timeout-spam that drowned the probe).
HW (reproducible): [ROPB ce88=00 ce89=00 ea80=00 ea90=00 ec06=00 sb26=00 c80a=20 e302=83] x10.
e56f ran (c807=80). C80A.5 stays asserted (connect up) but E302 stays mode0 (NOT trained); lanes
report 0x07 (not CL0); no GPU; no 1-1 router. The host engages SB-connect then STALLS, gating
router enumeration on lane training the device never achieves. The wall is LANE TRAINING, not the
router-op responder.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…) -- E302 now transiently mode2
Re-instated the complete usb4_connect_u4 @0xA3F5 faithfully (the a415 pre-gate
dd42(0)/e7c1(1)/e0d9(0), the 0x07B9 fallback, and the a48c-a516 tail: edbd SB[0x1C], e5b0
SB-PHY-RX descriptor, the bcd7 tunnel/lane-rate-train block with the e7ae C006/C00E PHY-lock wait
correctly gated on 0x0AF1.4 + E710 rate latch + ee82 B430|=1 tunnel link-up, the 0x09FA.1/0x924C
block, ccb3 lane-config + c270 DROM PID latch + d556). The prior 'a48c-a516 REGRESSED E302'
verdict was invalid (measured before bank0_8a89 armed link-mode, so e7ae had nothing to lock
onto). Transcribed from decompile_function of 0xA3F5 + the
bcd7/bcc4/edbd/e5b0/c270/ccb3/d556/ee82/e7c1/e0d9 helpers; stock addrs cited per block.
SETTLED the lane-RATE gate (part 2): C8FF is a READ-ONLY HW status register. All 8 C8FF accesses
in the image are MOVX A,@dptr (verified via disassemble_bytes); NOTHING writes it. It is the
PHY-negotiated lane rate; the firmware only reads it to gate E751 (0x0AA0.0, set in bank0_8a89
only when C8FF>=6). Lane state SB[0xA0]/[0xA1] is likewise PHY-driven (cbbe only hex-prints to
C001, no lane-advance write). So rate->Gen3 / lanes->CL0 is a HW/PHY outcome, not a device write.
HW (Intel MTL TB4 host): full tail runs (e764=14 link-mode armed, C80A.5 fires); E302 now
TRANSIENTLY reaches mode2 (SBDIAG best=6C), the first time vs mode0 every prior session, but does
NOT sustain (steady E302=0x83 mode0). C8FF stuck at 4 (<Gen3) so E751 never arms; lanes stuck at
SB[0xA0/A1]=0x07 (never CL0=2). No AMD GPU 1002:7590, no thunderbolt 1-1 router. The wall is
PHY-level lane-rate negotiation, not a missing device write.
Added [LANE ...] per-iteration UART trace
(c8ff/aa0/sba0/sba1/e710/ca06/a9e/a9f/b430/b432/e751/e763/e765).
Build clean (28403B). Board healthy.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… tracer
Code-caves stock fw_tinygrad.bin at the super-loop top (main_boot_and_superloop @0x2FB4; hook the
`mov dptr,#0x0AE2` at body 0x2FC0, after `clr EA`) to UART-log
C8FF/E751/E302/SB[0xA0]/SB[0xA1]/0x0AA0/E710/CA06/E764/C80A on CHANGE of a 1-byte signature
(stored XDATA 0x0BFE), giving a non-flooding timeline through the USB4 train window. Stock's own
[SB Init]/[===SB Con===]/[PcieTunnel-*]/[*** USB4 Gen3 x2 ***] prints are kept for time-alignment.
Cave at 0x5E00, prefix at 0x5FC0 (both flat <0x8000); reuses 0x538D/0x51C7 UART helpers.
MEASUREMENT (HW, Intel MTL TB4 host, AMD RX 9060 XT 1002:7590 confirmed tunneled, router 1-1
present): stock reaches the GPU with C8FF==4 the ENTIRE run, E751==0 the entire run, 0x0AA0
never==0x0B, E302 peaks mode1 and settles mode0(0x83) at GPU-up. Lane state is SB[0xA0]/[0xA1]
0x07->0x01->0x02(CL0), driven by the [PcieTunnel] sequence not by C8FF.
=> Gen3/E751 are a RED HERRING; the handmade wall is elsewhere.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
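The "log only on change of a 1-byte signature" idea from the tracer commit can be illustrated on the host side. The Python sketch below is a model, not the cave code: the commit does not say how the signature is computed, so the rotate-and-XOR fold here is purely a hypothetical stand-in, as are the class and function names.

```python
def signature(values) -> int:
    """Hypothetical 1-byte signature: rotate left by one bit, then XOR each byte in."""
    sig = 0
    for v in values:
        sig = ((sig << 1) | (sig >> 7)) & 0xFF
        sig ^= v & 0xFF
    return sig


class ChangeTracer:
    """Emit one line per sample only when the signature differs from the last one."""

    def __init__(self):
        self.last = None  # plays the role of the byte stored at XDATA 0x0BFE
        self.lines = []

    def sample(self, regs: dict) -> None:
        sig = signature(regs.values())
        if sig != self.last:
            self.last = sig
            self.lines.append(" ".join(f"{k}={v:02X}" for k, v in regs.items()))


if __name__ == "__main__":
    t = ChangeTracer()
    t.sample({"c8ff": 0x04, "sba0": 0x07, "sba1": 0x07})
    t.sample({"c8ff": 0x04, "sba0": 0x07, "sba1": 0x07})  # unchanged: no new line
    t.sample({"c8ff": 0x04, "sba0": 0x07, "sba1": 0x01})  # lane state moved: logged
    print("\n".join(t.lines))
```

A one-byte signature can collide, so a real tracer built this way may occasionally miss a change; that trade-off is the price of keeping the comparison to a single stored byte.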
… states 4/5 = deep PHY TODO)
Captured the STOCK success lanetrace on the real HW (handmade/stock_lanetrace_success.txt,
via app/patch_lanetrace.py) and RE-mapped the full lane-bond / CL0 / PCIe-tunnel engine, then
wired its driver into handmade.
GROUND TRUTH (stock, confirmed): reaches the GPU with C8FF=04 (Gen2) the WHOLE run, E751=0,
E302 mode0 -- so the SUCCESS axis is the per-lane CL state SB[0xA0]/[0xA1] going 0x07->0x01->
0x02 (CL0), NOT any rate/E302-mode. Ordered: [===SB Con===] -> [SB P03] -> [ConnRout](0x0AA0
38->3C) -> [SB P04] -> [PcieTunnel-PwrOn] -> Chg2 20G -> [RstRxpll][Done] -> [CDRV ok] ->
[L0 OS1][L1 OS1] -> [SB P05][Trig] (SB[0xA1]07->01, CA06 61->01, E764 14->19) -> SB[0xA0]07->01
-> [SB P00] -> L0:CL0 02/L1:CL0 02 -> [PcieTunnel-Deassert] (SB[0xA0/A1]->02=CL0) -> Lane Bonded
-> [*** USB4 Gen3 x2 ***] -> GPU.
ENGINE ARCHITECTURE (decompiled verbatim): a self-advancing FSM in XDATA 0x06ED, driven from the
SUPER-LOOP (not the ISR):
- dea1 (sb_con_consequence) -> db7a -> eb62(0,3): sets 0x06ED=3 + 0x06EC=1 (the cb10 enable)
- cb10 runs every super-loop iteration (gated (0x09F9&0x83)&&0x06EC, EA=0); its TAIL calls e672
- e672 (CODE_BANK1::e672) dispatches by 0x06ED: 3->cm_conn_routing_setup (a7de, [ConnRout]);
4->b0b4 (PcieTunnel-PwrOn/Chg2/RstRxpll/CDRV/OS1 lane-bond); 5->8000/850b (the per-lane
CL-state walker that drives SB[0xA0]/[0xA1] 07->01->02 + prints [Trig]/CL0)
- each state body calls eb62(0,N) -> [SB P0N] -> advances 0x06ED.
handmade was COMPLETELY missing this: its db7a omitted the eb62(0,3) arm and its cb10 was
observe-only (never called e672), so 0x06ED was never set and the engine never ran -- SB[0xA0]/
[0xA1] stayed 0x07 forever.
THIS COMMIT (handmade/src/usb4_lanebond.h, new): the FSM DRIVER + state-3 transcribed faithfully:
- eb62 ([SB P0x] + 0x06ED state set), e672 (dispatcher), cm_conn_routing_setup ([ConnRout] +
0x0718 route-enable + the 0x077a/0x081a/0x0819->0x0750/0x0751 lane-width latch, verbatim)
- main.c: wire the ARM (db7a effect: first-arm 0x06ED=3 -> [SB P03]) + DISPATCH (cb10 tail ->
e672) into the super-loop; boot-zero-init 0x06EC/0x06ED/0x0758/0x075x/0x0718.
- main.c: FIX the deferred-tunnel-up gate -- it gated pcie_power_on on E302 mode>=2, but the
stock trace proves stock reaches the GPU at E302 mode0. Gate on SB[0xA0]/[0xA1]==CL0 instead.
- crt0.s: FIX a stack-overflow bug -- the grown DSEG (IRAM globals) now extends past the old
fixed sp=0x72, so the stack corrupted the PD/FSM globals on the first push. Raised sp to 0xB0.
HONEST SCOPE: the state-4 (b0b4) and state-5 (8000/850b) bodies are the deep per-lane CL-state
PHY walkers -- ~150 functions across 4 layers of undocumented PHY/SB/tunnel-adapter accessors
(b0b4->e9e7/d3b0/e980/b8db; 8000->96a7/9a11/982b/97c9/0c7a/ea7c...). Transcribing them partially
would FABRICATE the exact PHY write sequence that drives SB[0xA0]->CL0 (the project's repeated
failure mode), so they are wired as [u4lb:S4/S5 OMITTED] markers that advance the FSM so the walk
is observable; completing b0b4 then 8000 is the next (large) transcription target.
GPU NOT yet enumerated with handmade. The HW loop also hit a host-side wall this session: the
Intel MTL TB4 host (no remote re-plug available) got stuck and handmade stalled at Disc_SVIDs
before [===SB Con===] on every late run (clean HEAD reproduces the same stall -- it is host-state
flakiness, not this code). Board recovered to stock fw_tinygrad.bin (GPU 1002:7590 + router 1-1
confirmed in lspci/thunderbolt).
ghidra stock addresses transcribed: e672/cm_conn_routing_setup(a7de)/eb62/eb81/db7a/dea1/cb10/
b0b4/8000/850b/d3b0/e9e7/c586 (all CODE_BANK1).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
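The engine described above is easiest to follow as a tiny state machine. The Python sketch below is a host-side model only: the state bodies are reduced to a single marker each, the class and method names are hypothetical, and the 3->4 and 5->0 transitions are inferred from the order of the [SB P0x] prints in the stock trace rather than taken from the disassembly.

```python
class LaneBondFsm:
    """Toy model of the 0x06ED lane-bond FSM driven from the super-loop."""

    def __init__(self):
        self.state = 0    # models XDATA 0x06ED
        self.enabled = 0  # models XDATA 0x06EC (the cb10 enable)
        self.log = []

    def eb62(self, n: int) -> None:
        """eb62(0, N): print [SB P0N] and set the FSM state."""
        self.log.append(f"[SB P0{n}]")
        self.state = n

    def arm(self) -> None:
        """db7a effect: eb62(0,3) plus the cb10 enable."""
        self.eb62(3)
        self.enabled = 1

    def e672(self) -> None:
        """Dispatch exactly one state body per call."""
        if self.state == 3:
            self.log.append("[ConnRout]")         # cm_conn_routing_setup (a7de)
            self.eb62(4)
        elif self.state == 4:
            self.log.append("[PcieTunnel-PwrOn]")  # b0b4 lane-bond body
            self.eb62(5)
        elif self.state == 5:
            self.log.append("[Trig]")              # 8000/850b CL-state walker
            self.eb62(0)

    def superloop_iteration(self, usb4_mode: bool = True) -> None:
        """cb10 tail: runs only when the USB4 gate and the 0x06EC enable are set."""
        if usb4_mode and self.enabled:
            self.e672()


if __name__ == "__main__":
    fsm = LaneBondFsm()
    fsm.superloop_iteration()  # not armed yet: nothing happens
    fsm.arm()
    for _ in range(3):
        fsm.superloop_iteration()
    print(" -> ".join(fsm.log))
```

The first call before `arm()` shows the failure mode described above: with 0x06EC never set, the dispatcher never runs and the state never leaves 0.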
Transcribe Round A of the USB4 state-4 lane-bond engine (CODE_BANK1::b0b4,
file 0x130b4) into handmade/src/usb4_lanebond.h, replacing the [u4lb:S4
OMITTED] stub. The e672 FSM now dispatches 0x06ED==4 -> a faithful b0b4 body
that advances to state 5 via eb62(0,5).
Transcribed verbatim from the raw bank1 ASM (cited stock addrs in comments):
- SB2_RD/SB2_WR(off) = page-1 0x2900+off accessor (DPX=1; mirror of SB_RD/WR)
- 96fe sb_cmd_issue, d5da sb_lane_phy_ready_handshake (all 3 param branches),
e07d retrain poke, e9e7 RstRxpll, ebde settle, e980 rate-descriptor apply,
d3b0 Chg2-20G setup, ec51 Trig-arm, b226 settle
- b0b4 shell: retrain-guard(0x0776) | normal OS-prewrite -> width gate ->
connect gate -> E716/CA06 enable(0x0AF1.0) -> L0/L1 OS-arm(96fe+d5da +
0x081E/081F.7 latch) -> CC37.2 set -> d3b0(3) -> e980 -> e9e7 -> CC37.2 clr
-> CA60.3 set -> L0/L1 OS1 trigger(SB[0x50]=2/SB[0x5A]=2 + P1[0x010B] +
0x075b/0759/075c/075a=0x10) -> ec51 -> 0x074E:074F=CCE4:CCE5 -> eb62(0,5)
phy_cc10_cmd_wait(subcmd,cc12,cc13) arg order RE-DERIVED from R7/R4/R5 at every
call site (d5da=cc10(1,0,0x0B); e9e7=cc10(1,0,0x14)+cc10(2,0,0x28); d3b0/E716=
cc10(2,0,0x28|0xC8)). Every stock busy-poll (d5da SB[0x2C].2, ebde C2D0.5/C350.5)
is bounded with a guard counter. Plane discipline preserved (SB/SB2/P1 via paged
accessors; C2xx/CA06/CA60/CA81/CC37/E716 via PR() plain XDATA).
ROUND-B placeholders (NOT fabricated): e305 [PcieTunnel-PwrOn] (UART marker +
e57d reset-pulse + connect-head CA06 RMW stand-in for the E764 14->19 train),
b8db [CDRV ok] (UART marker), c593 bank0 stub. The deep e26a/cdc6/ee29/a840
tunnel-power+train subtrees remain Round B.
main.c: add [S4 6ed/75b/759/75c/75a] + e764 to the super-loop trace.
Builds clean (firmware_wrapped.bin 30800B, DSEG 0x6C unchanged, sp 0xB0 healthy).
HW: boot+PD+Enter_USB[4]+Connect_U4+SB Init+SBdone all run cleanly, no hang/
reboot-loop. b0b4 NOT exercised this session: the Intel MTL TB4 host did not
raise C80A.5 / [===SB Con===] (the documented intermittent host wall), so the
cb10->e672 FSM never armed 0x06EC to reach state 3->4. No regression.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
HW bisection on the Intel MTL TB4 host (per-commit C80A.5 firing rate, afb938e..038a6e0, 6-8
device-reset trials each):
afb938e 7/7 FIRE (C80A.5 crack)
4e46fef 6/6 FIRE (full connect tail -- NOT the regression)
4d1cc11 0/5 NOFIRE <- regression introduced here
038a6e0 0/7 NOFIRE (HEAD)
Back-to-back A/B on identical host state (4e46fef FIRE / 4d1cc11 NOFIRE, host re-confirmed firing
before+after) rules out host flakiness. The only functional firmware change in 4d1cc11 that breaks
firing is this crt0 stack move (sp 0x72 -> 0xB0). The 0xB0 stack is only 79 bytes; the nested INT1
SB-router (a066) + bank0_8a89 + deep connect-tail call chain needs ~133 bytes, so it overflowed
and the host's [===SB Con===] never fired.
Isolation builds confirm: it is the stack value, not the lane-bond FSM main.c change (4d1cc11 +
sp=0x72 fires 2/2; 4e46fef + sp=0x80 does NOT fire 0/3). Firing has a sharp stack-DEPTH cliff:
sp <= ~0x7A fires, 0x7D/0x80/0xB0 do not. Restoring sp=0x72 brings HEAD back from 0/7 to 6/8.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
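The byte counts in the bisection note follow from simple arithmetic on the initial stack pointer. The Python sketch below models it on the host. It assumes the usual 8051 arrangement of a 256-byte internal RAM with an upward-growing stack, so the available depth is 0xFF minus the initial sp; the ~133-byte requirement is the figure quoted in the commit, and the function names are hypothetical.

```python
IRAM_TOP = 0xFF      # last internal RAM address (assumes 256-byte IRAM)
NEEDED_BYTES = 133   # approximate deep-call-chain requirement quoted in the commit


def stack_bytes(sp: int) -> int:
    """Bytes available above the initial stack pointer (stack grows upward)."""
    return IRAM_TOP - sp


def chain_fits(sp: int, needed: int = NEEDED_BYTES) -> bool:
    """True if the deep INT1 call chain fits in the remaining stack space."""
    return stack_bytes(sp) >= needed


if __name__ == "__main__":
    for sp in (0x72, 0x7A, 0x7D, 0x80, 0xB0):
        print(f"sp=0x{sp:02X}: {stack_bytes(sp):3d} bytes, fits={chain_fits(sp)}")
```

Under these assumptions the cliff sits where the note reports it: 0x7A leaves exactly 133 bytes, while 0x7D already falls three bytes short.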
… + gate trace)
b0b4 Round A had never run on HW: the TB4 host tears the transient C80A.5 connect down within ms,
and the one-e672-per-super-loop cadence plus the 10x ROPB diagnostic burst (1.2M-nop delay) meant
state-4 was never reached before teardown.
Test-enabling tweaks:
- pump e672 up to 4x per iteration so the FSM walks 3->4->5 inside the connect window (each call
still runs exactly one faithful state body)
- gut the ROPB burst to a single no-delay sample (host posts nothing, settled)
- shrink the sb_asserted top-of-loop busy delay 600k->60k nops
- add b0b4 phase markers [b4:A/B/C/D] + width/connect-gate value dump
RESULT: b0b4 now executes. It enters, passes the prewrite + width gate, and ABORTS at the
connect-present gate (0x0765==0 && 0x0766==0). 0x0768/0x0769 (lane width) read uninit 0x55;
0x0765/0x0766 are never set on this host. That gate is the Round B / upstream blocker, not a b0b4
bug.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wire the two producers that b0b4 (USB4 state-4) needs to pass its
b130-b13c gate, which previously aborted as [b4:CONGATE-abort]
(wid=5555 neg=0000 765=0000).
GAP1 - lane-width snapshot (0x0768/0x0769):
Stock CODE_BANK1::db7a TAIL is `eb62(0,3); 98ec()` but handmade
dropped the 98ec call, leaving 0x768/0x769 uninit 0x55. Transcribe
98ec (CODE_BANK1::98ec) + its ee57 callee (CODE_BANK1::ee57):
0x758=0x10; ee57() arms ec51 (Trig) if CCE1.0&CCE1.1 and reads the
HW lane-width counter CCE4:CCE5; 0x768=CCE4, 0x769=CCE5. CCE4/CCE5
is read-only HW (all image-wide accesses 981b/b10f/b20f/ee65 are
READs). Called right after the [LB arm] eb62(0,3) in main.c, the
faithful db7a-tail position.
GAP2 - connect-present flag (0x0765=1):
Set only by CODE_BANK1::ebb5, reached from the orphan fn
CODE_BANK1::cd3f (the connect-descriptor reader, called from d4cd
-> a066). cd3f reads the host per-port connect descriptor via the
0x21xx movc ptr-table, resolving to 0x4E=SB[0x28+port],
0x752=SB[0x18+port], then dispatches (cd86-cdf4) -> ebb5 when
(0x752&0x60)==0x60. ebb5: if((0x752>>1)&0xF){SB[0x57]|=8;SB[0x61]|=8}
0x765=1.
Transcribed cd3f's dispatch + ebb5 faithfully into the SUPER-LOOP
(sb_connect_present_poll), NOT the a066/INT1 ISR: HW bisection this
session proved that ANY addition to the a066 INT1 call tree trips
the documented 3-byte stack cliff (crt0 sp=0x72) and kills C80A.5
firing. sb_transport_substate_poll is kept BYTE-IDENTICAL to HEAD;
the C80A.5 firing rate is preserved (V3 measured 2/8 vs HEAD 2-3/8
in an interleaved A/B test). Where the Intel MTL TB4 host engages
the SB-router connect (C80A.5 -> 0x06EC=1) but never drives the
SB[0x18] descriptor to the (x&0x60)==0x60 pattern, reproduce ebb5's
effect at the equivalent connect point (0x06EC==1, with ebb5's own
0x0752 gate intact) per the unblock spec.
Boot zero-init 0x0765/0x0768/0x0769/0x0752/0x0753 (was uninit 0x55).
S4 dump gains 765=/768=/sb18=/cce4= for the next session's HW diag.
Stock addrs transcribed: db7a, 98ec, ee57, ebb5, cd3f (+cd86-cdf4
dispatch), 981b (CCE4/CCE5 read-only proof). Builds clean (31311B,
DSEG 0x6C == HEAD, no IRAM regression). INT1 transport path verified
byte-identical to HEAD. HW validation of the gate pass is pending: the
FTDI UART debug channel got wedged mid-session (board itself healthy,
left on stock = AMD GPU 1002:7590 tunneled, router 1-1).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
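The GAP2 logic above is compact enough to model directly. The Python sketch below is a host-side illustration of the cd3f dispatch condition and the ebb5 body as the commit message states them. The function names, the use of a dict for the SB block, and the idea of returning the 0x0765 value are all hypothetical conveniences, not the firmware's structure.

```python
def reaches_ebb5(desc_0752: int) -> bool:
    """cd3f dispatch condition: ebb5 is reached when (0x0752 & 0x60) == 0x60."""
    return (desc_0752 & 0x60) == 0x60


def ebb5(desc_0752: int, sb: dict) -> int:
    """ebb5 body: optional SB[0x57]/SB[0x61] bit-3 set, then connect-present = 1."""
    if (desc_0752 >> 1) & 0x0F:
        sb[0x57] = sb.get(0x57, 0) | 0x08
        sb[0x61] = sb.get(0x61, 0) | 0x08
    return 1  # the value stored to 0x0765


if __name__ == "__main__":
    sb = {}
    desc = 0x62  # hypothetical descriptor byte with bits 6..5 set and a non-zero bits 4..1 field
    if reaches_ebb5(desc):
        print("0x0765 =", ebb5(desc, sb), {hex(k): hex(v) for k, v in sb.items()})
```

Note that a descriptor of exactly 0x60 still sets the connect-present flag but leaves SB[0x57]/SB[0x61] untouched, because the `(x >> 1) & 0xF` field is zero in that case.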
…B clean stack)
HW A/B (corrected harness, current confirmed-good host; fire = c80aACC=20 / [===SB Con===] /
[LB arm] over device resets):
c385da1 (stack-fix baseline) 5/10
23b9914 (b0b4 Round A + instr) 3/8
2f596c9 (HEAD, connect-present) 0/8 <- regression
None reboot-loop (boots=2, sbinit=1 every trial) -> NOT a %[BOOT] crash; the deepening stack (DSEG
grown to top 0x80) at the FIXED sp=0x72 silently OVERLAPS + corrupts live globals / super-loop
locals on the deep INT1 SB-router connect path (a066 + bank0_8a89 + connect tail) so the host's
C80A.5 connect quietly stops firing as code grows.
FIX (shrink DSEG, then LOWER sp -- never raise it, that was the 4d1cc11 regression):
- Persistent state flags/counters -> XDATA @0x8800..0x8811 (__at, seeded in main()'s boot
zero-init; XDATA isn't crt0-cleared). Removes them from the stack-corruption zone. (is_usb2,
dma_dwords, pd_seen, pd_cc_timeout, tick_seen, cc_hit, usb4_int_seen, c80a_acc, sb_asserted,
sb_con_print_budget, sb_run_8a89_pending, sb_8a89_done, sb_tunnel_up_pending, cb10_seen,
bank0_8a89_entered)
- Data-plane (int0 I2C/INA) scratch -> XDATA (hw_status_read shunt/bus, ina231 rx).
- Makefile: SDCC auto-XSEG relocated 0x0000 -> 0x8820 (also fixes a latent collision of the
long-math PARM areas with the chip's 0x0000-0x0BFF CM XDATA state).
- Result: SDCC data top dropped 0x80 -> 0x6C, so crt0 sp lowered 0x72 -> 0x6B for a 148-byte CLEAN
(non-overlapping) stack vs HEAD's overlapping 141. Build verifies __start__stack(0x6C) > sp(0x6B).
Functional logic (b0b4 / connect-present / sb_router) unchanged -- state just relocated.
VERIFY (same host/harness): fixed HEAD = C80A.5 fire 10/10, 0 reboot-loops (vs HEAD 0/8, c385da1
5/10). Regression fixed; C80A.5 connect reliable again, so the b0b4 -> Round B -> state5 -> GPU
FSM is testable.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Implement the GOFWD_PLAN static post-connect fixes so the TB4 host stops hard-resetting handmade right after [===SB Con===].
FIX #1 (vdm.h, the standout verified bug): Discover_SVIDs guard 0xFFFF -> 0xFF00. handmade required rx_svid_lo==0xFF && rx_svid_hi==0xFF (SVID 0xFFFF) so it NAK'd the post-[SB Con] Discover_SVIDs the host actually sends (PD SID 0xFF00). Stock vdm_build_discover_sids_resp@0xDDAD ACKs SVID 0xFF00: 96ae loads A=IRAM[0x07] (hi, =rx_svid_hi @0xAA7) / R2=IRAM[0x06] (lo, =rx_svid_lo @0xAA6); guard CPL A; ORL A,R2; JNZ NAK -> ACK iff (~hi|lo)==0 i.e. hi==0xFF && lo==0x00. Mapped handmade's lo/hi to 0xAA6/0xAA7 against the stock 0x9bff dispatch (0x9af0-0x9afe stores SVID byte0->0xAA6, byte1->0xAA7, identical to handmade). Guard now `(((uint8_t)~rx_svid_hi | rx_svid_lo) == 0 && (PR(0x09F9) & 0x80))`.
FIX #3 (main.c): tighten the super-loop to stock's lean cadence. Hoisted the FSM-advance block (cb10/e672/[LB arm]/4x-pump) to the TOP of while(1), before any delay/prints, matching stock main_boot_and_superloop@0x2FB4 `if((0x09F9&0x83)&&0x06EC){EA=0;cb10();...EA=1;}`. While a connect is in progress (0x06EC!=0) the loop now `continue`s past the 60000-NOP delay and the heavy [SBDIAG]/[LYR]/[TICK]/[U]/[M2]/[LANE]/[S4] dumps (now gated to 0x06EC==0 only), so they never sit between the connect edge and [ConnRout]. The FSM bodies keep their own [SB P0x]/[ConnRout]/[b4:A-D] markers for visibility.
FIX #4 (main.c): get bank0_8a89 off the connect critical path + stop masking EX1 across it on the first post-connect iterations. In stock 8a89 is reached ONLY from INT0 link-events (which this host never raises for handmade), so the C80A.5-path 8a89 drive is a handmade hack that starves the FSM and masks EX1. Now deferred behind an XDATA fsm_stall counter (@0x8812): runs at most once, and only after the FSM has made no 0x06ED progress for >=6 iterations with the lanes not at CL0.
(FIX #2 skipped per plan: handmade already pre-arms 0x06ED=3.)
IRAM discipline preserved: only new state is XDATA fsm_stall @0x8812 (seeded in the boot zero-init); DSEG stays 0x6C / crt0 sp 0x6B (148B clean stack), unchanged.
HW (Intel MTL TB4 host, AMD 1002:7590): FIX #1 confirmed effective -- handmade now ACKs Discover_SVIDs (no NAK) and the host proceeds to [VDM][Disc_Modes] (vs. the old Disc_SVIDs->NAK->%[BOOT] hard-reset loop). C80A.5/[===SB Con===] still fires reliably (6/connect across resets, no reboot-loops, no IRAM regression). A/B (FIX#1-only vs FIX#1+loop) proves the super-loop refactor is neutral on the current HW state.
NEW downstream blocker exposed (independent of the loop refactor -- HEAD super-loop wedges identically with FIX#1): the device goes silent after [Disc_Modes] -- the now-sustained C80A.5 connect storm runs a066/dea1's phy_cc10_cmd_wait every edge and starves the super-loop so the FSM (cb10->e672->b0b4) never gets a cycle ([LB arm] never prints). That is a separate root cause to chase next, not a regression from these fixes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
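The FIX #1 guard is compact enough to check exhaustively on a host. The sketch below mirrors the `CPL A; ORL A,R2; JNZ` idiom described above; the function name and the idea of passing the 0x09F9 byte as a parameter are illustrative assumptions, not the firmware's actual interface.

```c
#include <assert.h>
#include <stdint.h>

/* ACK Discover_SVIDs only for SVID 0xFF00 (hi == 0xFF, lo == 0x00), and only
 * when bit 7 of the 0x09F9 byte is set. (~hi | lo) is zero exactly when every
 * bit of hi is 1 and every bit of lo is 0. */
int discover_svids_ack(uint8_t svid_hi, uint8_t svid_lo, uint8_t reg_09f9)
{
    return ((uint8_t)~svid_hi | svid_lo) == 0 && (reg_09f9 & 0x80) != 0;
}

/* Exhaustive cross-check against the plain comparison. */
void discover_svids_examples(void)
{
    for (unsigned hi = 0; hi < 256; hi++)
        for (unsigned lo = 0; lo < 256; lo++)
            assert(discover_svids_ack((uint8_t)hi, (uint8_t)lo, 0x80) ==
                   (hi == 0xFF && lo == 0x00));
    assert(!discover_svids_ack(0xFF, 0xFF, 0x80)); /* old 0xFFFF guard value */
    assert(!discover_svids_ack(0xFF, 0x00, 0x00)); /* 0x09F9.7 clear */
}
```

The cast to `uint8_t` matters: without it, integer promotion makes `~svid_hi` a negative `int` and the comparison with 0 can never succeed.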
…eaches state-3 After the Disc_SVIDs 0xFF00 fix (cea02fa) the host stays engaged through [Disc_Modes] but the device went silent: the Intel MTL TB4 host HOLDS connect (SB[0x2C].0=1, level) until it sees [ConnRout], so C80A.5 re-fires the a066 ISR on every IRET and the super-loop never runs a single iteration after [SB Init] (HW-confirmed: a loop heartbeat printed at boot but NEVER post-connect; the a066 ISR probe showed C80A=0x20 + SB[0x2C]=0x03 on every entry). cb10->e672->[ConnRout] - the thing that would release the host - thus never ran = deadlock. Static stock-vs-handmade compare (CODE_BANK1::a066/dea1/db7a + bank0 e80a @0x051b): - dea1's e80a call (dee2-dee8: R7=2,R4=0,R5=0x15) maps to phy_cc10_cmd_wait(2,0,0x15); handmade had the args SCRAMBLED as (0,0x15,2) -> bogus PHY command -> the bounded wait spun the full 0xFFFF per connect edge. Fixed + tight-bounded (0x400) so the in-ISR settle can't monopolize the CPU. - stock's db7a TAIL arms the FSM IN THE ISR (eb62(0,3)->0x06ED=3, 98ec->0x0758=0x10 + CCE4:CCE5 width snapshot); handmade DROPPED this to the now-starved super-loop. Reproduced as bare XDATA stores (no UART -> safe on the sp=0x6B stack cliff). - run the heavy connect consequence (SB RMW + PHY wait + db7a) ONCE per session (edge-gated on 0x06EC); re-asserted edges return fast. - DEADLOCK-BREAK: the storm is so tight the loop never ran even once, so the first connect consequence masks IE_EX1 in the ISR; the loop then pumps the armed FSM with the storm suppressed and re-enables EX1 once the FSM advances past state-3. HW (Intel MTL TB4 host, AMD GPU rig): the super-loop now RUNS post-connect, the FSM is armed to state-3 ([ConnRout] state, 0x06ED=3) in the ISR like stock, [EX1masked] + [SBDIAG]/[S4]/8a89 all execute, and the device is STABLE (no garbage, no %[BOOT] reboot-loop, no hard-reset). C80A.5 fires reliably across resets. DSEG=0x6C / sp=0x6B unchanged. 
NEW STALL POINT (well-defined, downstream): cm_conn_routing_setup loops 0x0758 0x10<->0x11 on the confirm gate 0x0777==0x0C; 0x0777 reads 0x55 (uninit) because the host connect descriptor (SB[0x18]=00) is never populated and handmade never runs the eaac SB->0x0777 copy. Per USB4_GOFWD_PLAN B#4 this 0x0777 block is host-driven sideband. FSM does not yet advance to b0b4; no GPU/router-1-1 yet (expected). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Apply the ADVERSARIAL-VERIFIED static transcription fixes from the comprehensive faithfulness audit (wf_7040bc4e), each verified byte-exact against the stock fw_tinygrad.bin bank1 disassembly via ghidra before committing:
- #1 eaac block-copy (the dropped 0x0777-populate): eaac is the SOLE writer of the state-3 confirm-gate descriptor 0x0777. Reached from cd3f's dispatch tail (cddc-cdf1) when (0x0752&0x60)!=0x60 && (0x0752&0x01)==0 -- the branch handmade dropped (it only did the ebb5 (x&0x60)==0x60 path). Body (eaac-eada): 0x0775=1; for i in 0..0x3F: 0x0777+i = SB-plane-2[base+i]; CCD9 strobe. base=0x2a00(port0)/ 0x2b00(port1) plane DPX=1, verified vs ROM 0x212d {2a00,2b00} + the 97a9/9a45/0de6 tuple math (R3=2). New SBP2_RD accessor + [EAAC] dump instrument.
- #2 eda0 selector (was wrongly modeled void): make it return the R7 selector and gate the 0x0777 check on r7==0 (eval). 0x0775!=0 -> r7=0; 0x0719==2 -> r7=2 (re-arm 0x0758); else r7=1 (idle, leave 0x0758 unchanged). Verified vs eda0-edbc.
- #3 c3b2: `n==6` -> `n==0` (c428-c439 running subtraction nets A=n; special block at c43b runs ONLY n==0). And the n==0 gate uses IDATA[0x50] = ((SB[port_lo]&0x20)>>5)&7 (computed at c40c-c416 from R1=port_lo), not `hi`. Plus 0x0819 is plain XDATA (9a31).
- #4 dea1 SB-offset/plane fixes: the self-contained accessors force R1/R2 so the writes land on SB[0x00]/[0x04]/[0x01] + P1[0x0100], NOT SB[0x28]/[0x2C]. Added the dropped 980d set-bit7 steps and the real 989b prelude (P1[0x0109]&=~1; SB[0xD8]=2). Verified vs dea1-defe + helper bodies (967e/97e5/980d/96c7/9777).
- #9 L1 connect-desc 0x4E source SB[0x29]->SB[0x2A] (ROM 0x2135 {28 28, 28 2a}).
- #8 cap seeds: seed 0x09F5/F6/F7/F8/0x09FB (DROM cap bits) but NOT 0x09F4 -- forcing 0x09F4=3 trips the usb4_connect_u4 a434 DP-alt sub-case (09FA route 0x07->0x06, a regression confirmed on HW), since handmade never runs 8d77's negotiated overwrite.
- SKIP #5 (a066 connect/disc latch) -- adversarial-verify REFUTED it.
HW (Intel MTL TB4 host): C80A.5 fires reliably (c80aACC=60, no %[BOOT] reboot-loops, no IRAM regression; DSEG=0x6C/sp=0x6B unchanged), route stays 09fa=07. The FSM arms to state-3 ([===SB Con===], 6ed=03, 758=10) but STALLS: the eaac populate is correctly wired yet STARVED -- the host posts NO SB connect descriptor (SB[0x18]=00, SB[0x28]=00, SB-plane-2 0x2a00=all-zeros), so eaac's SB[0x28].4-valid gate never opens, 0x0777 stays uninit, and cm_conn_routing_setup can't confirm. This is the HOST-DRIVEN wall: next step is a STOCK SB-plane-2 0x2a00 / SB[0x28] trace at [ConnRout] to see how stock gets the descriptor with SB[0x18]=00.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
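The eda0 selector from fix #2 reduces to a three-way priority test. The following is a host-side sketch of just that decision, with the two XDATA cells passed in as plain parameters; the function and parameter names are hypothetical, and the priority order is assumed to be the order in which the commit lists the cases.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the R7 selector described for eda0:
 *   0 -> evaluate the 0x0777 confirm gate (0x0775 != 0)
 *   2 -> re-arm 0x0758                    (0x0719 == 2)
 *   1 -> idle, leave 0x0758 unchanged     (otherwise)
 * Assumption: the 0x0775 test takes priority over the 0x0719 test. */
uint8_t eda0_selector(uint8_t cell_0775, uint8_t cell_0719)
{
    if (cell_0775 != 0)
        return 0;
    if (cell_0719 == 2)
        return 2;
    return 1;
}

void eda0_examples(void)
{
    assert(eda0_selector(1, 2) == 0); /* descriptor present wins */
    assert(eda0_selector(0, 2) == 2); /* token state 2: re-arm */
    assert(eda0_selector(0, 1) == 1); /* nothing to do */
}
```

Modelling the function as returning a value (rather than `void`) is the whole point of the fix: the caller can only gate the 0x0777 check on `r7 == 0` if the selector is actually propagated.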
…h_descrtrace) Four-hook stock-fw code-cave tracer (a066 [===SB Con===], cd3f descriptor read, eaac 0x0777 block-copy, a7de [ConnRout] FSM) to settle HOW stock obtains the connect descriptor that handmade's state-3 FSM stalls waiting for (0x0777==0x0C). Captured on a confirmed GPU-success stock run (1002:7590 + tbt router 1-1 up): - At [===SB Con===] stock's SB-plane-2[0x2a00..] is ALL ZEROS and 0x0777=0x55 -- identical to handmade's stuck state. The descriptor is not present at connect. - The host posts the connect descriptor over the SB SIDEBAND TRANSPORT (cd3f's per-port descriptor read climbs 00->05->0x63); the router-op mailbox (CE88/CE89/EA80/EA81/EA90/EC06/EC04) stays ALL ZERO the whole run (no router-op). - Stock then writes SB-plane-2 DEVICE-SIDE: 0x2a00 becomes 0x0C, eaac fires 157x (vs 0x on handmade) and copies 0x0C into 0x0777 -> gate passes -> [ConnRout]. VERDICT: handmade dropped a device-side descriptor writer. cd3f dispatches the host descriptor to three branches; handmade has ebb5 + eaac but OMITS the third, CODE_BANK1::af38 ((0x752&1)!=0), which writes the SB-plane-2 descriptor (r3 "2" plane 0x2900/0x2a00) from the host's SB[0x18] data. Without af38, plane-2 stays zero -> eaac copies zeros -> 0x0777=0x55 -> state-3 stall. Static fix = port af38. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…dispatch
The stock [ConnRout] descriptor trace (5b84fcc) concluded handmade dropped a device-side function: CODE_BANK1::af38, the third cd3f dispatch branch ((0x752&1)!=0). Ported it faithfully from the raw bank1 asm (af38-b0b3) + fixed the cd3f dispatch tail to the byte-exact 3-way (cdce-cdf4):
  (0x752&0x60)==0x60 -> ebb5 (0x0765=1)
  else (0x752&1)==0 -> eaac (0x0777 block-copy)
  else (0x752&1)!=0:
    (0x752&0x40)==0 || ((0x752>>1)&0xF)==0 -> af38 (was DROPPED)
    else -> return
af38's ACTUAL behavior (resolved every forced-R3/R2/R1 paged accessor against the BANK1 helper overlays, which shadow the bank0 bodies the decompiler shows): it is the device->host SB-TRANSPORT connect-descriptor RESPONSE builder, NOT a direct writer of eaac's 0x2a00 source. It reads the host descriptor from the HW-latched 0x2a00 RX plane (same plane eaac reads), echoes/windows it into the 0x2900 TX plane + a 0x0800 work buffer, writes SB[0x15] (the SB-transport TX command = 0x0753), and triggers the TX (d5da: SB[0x04]=1, SB[0x10]=1, bounded poll on SB[0x2C].2). So af38 feeds eaac INDIRECTLY: its TX response makes the host advance the handshake -> HW fills 0x2a00 RX -> eaac copies it into 0x0777. (The unblock spec's "af38 writes the 0x2a00 plane eaac reads" was imprecise; the asm proves af38 reads 0x2a00 and writes 0x2900 + triggers the SB transport. No firmware instruction writes R2=#0x2a -> the RX plane is HW-DMA-filled only.)
cd3f now POLLS every super-loop iteration (no longer one-shot gated on 0x0765) so it catches the host descriptor climbing. Added [AF38]/[P2 752/sb19/29] instrumentation + a budgeted forced-af38 PROBE (off by default).
HW RESULT (Intel MTL TB4 host, GPU+rig confirmed fine via stock):
- No regression: boot/PD/Enter_USB[USB4]/SB-connect healthy, C80A.5 fires (c80aACC=60, 6x [===SB Con===]), FSM reaches state-3 (6ed=03, 758=10).
- af38 wired correctly + runs (probe [AF38 752=00 50=00 ... sb15=00 sb2c=03]).
- THE WALL IS HOST-ELICITED, NOT a missing device write: SB[0x18]=00, SB[0x28]=00, 0x2a00 plane = all-zeros, 0x0777=0x55 across the whole run. The host posts NO connect descriptor to handmade. The forced-af38 probe TXed a (necessarily all-zero) response and the host STILL posted nothing -> the descriptor is elicited by something upstream of af38. Next: trace what makes the stock host begin posting SB[0x18] (the SB-transport RX arm / transport-edge trigger).
- DSEG=0x6C / crt0 sp=0x6B UNCHANGED (new state in XDATA 0x8816/0x8817, ROM table in __code). C80A.5 unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
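The 3-way cd3f dispatch tail described in this commit can be written as a pure function of the 0x0752 byte, which makes the branch conditions easy to test off-target. This is a sketch under that simplification: the enum and function names are hypothetical, and the real handlers (ebb5/eaac/af38) are represented only by tags.

```c
#include <assert.h>
#include <stdint.h>

/* Tags standing in for the stock handlers named in the commit message. */
enum cd3f_target { CD3F_EBB5, CD3F_EAAC, CD3F_AF38, CD3F_RETURN };

/* Decide which branch the dispatch tail takes for a given 0x0752 value. */
enum cd3f_target cd3f_dispatch(uint8_t v0752)
{
    if ((v0752 & 0x60) == 0x60)
        return CD3F_EBB5;                 /* sets 0x0765=1 */
    if ((v0752 & 0x01) == 0)
        return CD3F_EAAC;                 /* 0x0777 block-copy */
    /* bit0 set: af38 only if bit6 is clear or the bits[4:1] field is zero */
    if ((v0752 & 0x40) == 0 || ((v0752 >> 1) & 0x0F) == 0)
        return CD3F_AF38;
    return CD3F_RETURN;
}

void cd3f_examples(void)
{
    assert(cd3f_dispatch(0x60) == CD3F_EBB5);
    assert(cd3f_dispatch(0x00) == CD3F_EAAC);
    assert(cd3f_dispatch(0x01) == CD3F_AF38);   /* bit6 clear */
    assert(cd3f_dispatch(0x41) == CD3F_AF38);   /* bit6 set, field == 0 */
    assert(cd3f_dispatch(0x43) == CD3F_RETURN); /* bit6 set, field != 0 */
}
```

Keeping the decision separate from the handler bodies also makes it straightforward to diff against the stock `cdce-cdf4` branch sequence one condition at a time.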
…(state-3) handmade's cm_conn_routing_setup state-3 (0x0758==0x10) had DROPPED the stock edf5() call and replaced it with a bare 0x0758=0x11 advance (the comment wrongly claimed "0x0758=0x11 is the load-bearing effect"). The REAL load-bearing action is edf5 -> e2b9(R7=5,R5=7,R3=5,R2=4): it builds a small SB-transport TX descriptor in the 0x2900 plane, writes SB[0x15]=5 (the SB-transport TX command), and triggers the transport via d5da. That device->host message is the trigger that makes the host CM post the connection-routing descriptor into the 0x2a00 RX plane (-> cd3f/eaac -> 0x0777=0x0C -> [ConnRout] confirm). Without it the host posts nothing. Ported faithfully (byte-exact from CODE_BANK1::edf5/e2b9/d5da): - u4lb_edf5_route_query() gated on the 0x0719 in-flight token, returns R7 so a7f5 advances to 0x11 only after a send (else stays at 0x10). - d5da reproduction corrected: P1[0x0100]&=0xFE head (not SB[0x00]), SB[0x04] clr bit1, SB[0x10]=1 TX-go, bounded poll SB[0x2C].2, 9799 W1C, e80a(1,0,0x0b) CC10 cmd, SB[0x0F]&=0xFE, CCD9 strobe, 0x0719=1. HW (Intel MTL TB4 host, AMD GPU rig confirmed fine): the route-query now SENDS each state-3 iteration ([EDF5 sb15=05 719=01]); FSM reaches state-3 reliably; C80A.5 still fires (6 [===SB Con===]/connect), no regression, no %[BOOT]. BUT the SB-transport TX never completes (SB[0x2C].2 stays clear, sb2c=03) and the host STILL posts nothing (sb18/sb28=00, 0x2a00 RX plane all-zeros, 0x0777=0x55). So edf5 was genuinely dropped (now restored, faithful) but is NOT sufficient: the SB-transport TX engine isn't transmitting -> next wall is an upstream SB-transport channel enable / the transport never reaching the operational state where SB[0x10] =1 completes (SB[0x2C].2). DSEG=0x6C, sp=0x6B, C80A.5 path all unchanged. [EDF5] dump @xdata 0x8818. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Layout audit (faithfulness workflow C1/C2) found the handmade scratch overlapped regions the chip/stock use live: - is_usb2/dma_dwords were __at(0x0B40-0x0B44), which stock uses LIVE in its USB4 recovery paths (e869/d47f/ee29). Moved to the SDCC XSEG (chip-CM 0x0BC0 window, RE+HW-confirmed free); boot zero-init now skips 0x0B40-0x0B44 so stock's recovery cells are left untouched. - XSEG --xram-loc 0x0BC0 (chip-CM, valid through state-5; the 0x88xx region read-stalls once the PCIe tunnel powers on). - crt0 sp=0x7F (128B non-overlapping stack, matches stock SP placement). Also seeds the state-5 transport-edge diagnostic budget. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The connect was flaky 10G/20G because handmade af38 clobbered the lane- present byte 0x081A. Stock af38 GATES its descriptor copy on DAT_50>=0x0e; the host's routing descriptor is type 0x0D (<0x0e), so stock takes the FAIL path (9988) and never writes 0x081A -- leaving b7a4's 20G bit intact. Handmade copied unconditionally AND had the DAT_52 branch inverted, so it wrote 0x081A=0xC0 (10G). Transcribed faithfully from CODE_BANK1::af38: - <0x12 -> <0x13 LUT-echo bound - DAT_52!=0 -> BRANCH A (RX->work); DAT_52==0 -> BRANCH B (work->TX) - per-branch gate (DAT_50 in [0x0e,0x13) + 99b5/976e/0x0705 conds) - 9988 FAIL path on gate miss; shared b096 tail (9695 discard read) Result (HW, deterministic 2/2 boots): [Lt77A=F3 81A=D3 819=01], state-5 reached, [PcieTunnel-PwrOn]. b7a4 0x081A.5 clear-back left OMITTED with a TODO: it is faithful but its input 0x07CC is not yet faithfully initialised in handmade and fired non-deterministically, spuriously forcing 10G. Also adds the d4cd transport-edge alternation diagnostic (proved the 0x06EE port alternation is healthy). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Transcribed the state-5 route-query push from stock e2b9 (audit #15): - status byte 8 now goes to SB[0x0D|0x0E] (port-selected) instead of SB[0x0C] -- the old push wrote the wrong register, so the host never saw a valid route-query - token 0x0719 = d5da_ret - 1 (d5da returns SB[0x0C]-7), not hardcoded - full d4cd (transport + link edges) before the push, matching stock Extends the change-gated state-5 diagnostic to dump all four transport/ link edge regs + the 0x06EE/0x06EF alternation state. Determinism holds (819=01 2/2). NOT yet sufficient for CL0: the host still does not post the eaac-routed response that sets 0x0775, so the walker parks at LOOP1 0x0759=0x30. Next: full 8000/e672 walker. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Carries the PD-dispatch, VDM and C80A int-demux adjustments made while bringing the handmade connect up to the stock sequence (router event handler wiring, EC06/E763 acks, PD header/soft-reset paths). Part of the broader stock-faithfulness pass tracked in USB4_FAITHFULNESS_AUDIT.json. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- test.sh: flash an arbitrary binary to the NUC rig and capture UART (stages in HOME to survive the /tmp wipe, verifies md5). - USB4_FAITHFULNESS_AUDIT.json: the 56-fix prioritized stock-vs-handmade audit (multi-agent workflow output) driving the connect/walker work. - patch_*.py + plan/trace notes: stock code-cave instrumentation used to capture the SB connect/state-0x11 sequence. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…o SB[0x0C]
CL0 workflow (adversarially verified) findings:
- cd3f gate is PORT-AWARE (cdaa-cdcb): ports 0/1 need 0x752.4 CLEAR, ports 2/3 need it SET. The handmade only had the 0/1 rule. Now reads the port from 0x06F0 (which the caller pins) and applies the right rule.
- d4cd processes the SB[0x81]/[0x83] LINK edges too (ports 2/3, 0x06EF toggle), dispatching cd3f -> eaac, with the faithful 974a ack (W1C 0x10,0x20,0x40,0x08). Moved into the super-loop sb_d4cd_transport_edges alongside the 0x28/0x2A transport edges (not the ISR); the old ISR-side link handler is now a no-op so the alternation stays coherent.
- Transport-edge ack now matches stock 9746 (four W1C writes).
- REVERT my earlier e2b9 status regression: stock writes the status byte to SB[0x0C] (the in_R2R1 left by 9695 is R1=0x0C/R2=0x28), NOT SB[0x0D|0x0E]. Verifier confirmed byte-exact.
HW result: LOOP2 advances 0x20->0x30 (both walker loops now at 0x30). Determinism holds (819=01). NOT yet CL0: the link edges are silent on this host (81=00/83=00), so the eaac-routed response still never arrives and 0x0775 stays 0. Next workflow: where the response actually comes from.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
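The port-aware gate from the first finding is a one-bit polarity flip between the two port pairs. A minimal sketch, assuming the port number (from 0x06F0) and the 0x0752 byte are passed in as plain values; the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Ports 0/1 pass the gate when 0x0752 bit 4 is CLEAR; ports 2/3 pass when it
 * is SET. Returns nonzero if the gate is open. */
int cd3f_port_gate_open(uint8_t port, uint8_t v0752)
{
    int bit4 = (v0752 >> 4) & 1;
    return (port < 2) ? !bit4 : bit4;
}

void cd3f_gate_examples(void)
{
    assert(cd3f_port_gate_open(0, 0x00));   /* port 0, bit4 clear: open */
    assert(!cd3f_port_gate_open(1, 0x10));  /* port 1, bit4 set: closed */
    assert(cd3f_port_gate_open(2, 0x10));   /* port 2, bit4 set: open */
    assert(!cd3f_port_gate_open(3, 0x00));  /* port 3, bit4 clear: closed */
}
```

The earlier handmade behaviour corresponds to always applying the `port < 2` rule, which silently closes the gate for every link-edge (port 2/3) descriptor.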
…0x0775=1 CL0 workflow iter2 (adversarially verified, NOT refuted): the handmade u4lb_e461 had only 2 branches and always sent the e2b9 route-query (SB[0x15]=0xAA8=0x04). Stock e461 (e487-e497) has 3: when 0x0776!=0 it takes e1cb, whose ONLY functional difference is SB[0x15]=(0xAA8<<1)|0x41 =0x41 (e1cb sets 0xAA8=0 via 9966). b7a4 sets 0x0776=1, so the live AMD route-query must be the 0x41 form. Sending 0x04 meant the host never posted the eaac-routed response, so 0x0775 stayed 0 and the walker parked at lane-state 0x30 forever. Split the live push on 0x0776: 0x41 (e1cb) when set, 0x04 (e2b9) when 0. HW result: 775=01 (host responds!), and the walker advances past 0x30: LOOP1 0x30 -> 0x40 -> 0x50. Determinism holds (819=01 2/2). Not yet CL0 (SB[0xA0] still 0x07) -- next blocker is downstream in the 8000 walk. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
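The split on 0x0776 can be sketched as a function that returns the SB[0x15] command byte. Only the two forms named in the commit are modelled; the reading that the e2b9 form loads SB[0x15] from the current 0xAA8 value is an assumption drawn from the "SB[0x15]=0xAA8=0x04" shorthand, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pick the route-query command byte written to SB[0x15].
 *   0x0776 != 0 -> e1cb form: 0xAA8 is zeroed first, then (0xAA8<<1)|0x41
 *   0x0776 == 0 -> e2b9 form: the current 0xAA8 value (assumed reading) */
uint8_t route_query_cmd(uint8_t cell_0776, uint8_t cell_0aa8)
{
    if (cell_0776 != 0) {
        cell_0aa8 = 0;                               /* e1cb clears 0xAA8 */
        return (uint8_t)((cell_0aa8 << 1) | 0x41);   /* evaluates to 0x41 */
    }
    return cell_0aa8;
}

void route_query_examples(void)
{
    assert(route_query_cmd(1, 0x04) == 0x41); /* live path after b7a4 */
    assert(route_query_cmd(0, 0x04) == 0x04); /* old always-taken path */
}
```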
…minal 0xA1 Replaced the fabricated handmade LOOP1 (which forced state 0x00 at 0x50 via an ee6e stub and parked the lane) with a byte-faithful transcription of CODE_BANK1::8000's LOOP1 jump-table FSM (state cell 0x0759+lane, table @0x802a), produced + adversarially verified by the CL0 workflow: 0x10 e461->0x30; 0x30 0x075F=0 ->0x50; 0x50 e461->0x70 (the bug fix -- was the 0x00 terminal); 0x70 present-gated plane RMW ->0x90; 0x90 e461 ->0xA1; 0xA1 TERMINAL (bonded). Plus the connect-arm/retrain edges (0x20/0x40/0x60/0x80/0xA0/default) + the 8174 width-settle helper. LOOP2 (cell 0x075B, the 0c7a/ea7c lane-width emit) is unchanged. DSEG was at the IRAM cliff; moved the SBDIAG poll locals + walker helper locals to XDATA, and dropped the redundant walker-side d4cd call (the a066 ISR already drives the transport-edge alternation) to keep cd3f/af38 out of the walker call tree and overlayable. HW: LOOP1 now climbs 0x10->...->0xA1 (was parked at 0x50); host keeps responding (0x0775=1); 819=01 2/2. Not yet CL0: SB[0xA0] stays 0x07 and LOOP2 oscillates 0x20<->0x30 (the two loops contend for the e461 token). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
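The main-line LOOP1 progression listed above (0x10 -> 0x30 -> 0x50 -> 0x70 -> 0x90 -> 0xA1) can be captured as a next-state function for quick sanity checks. This sketch deliberately ignores the per-state conditions (the e461 send, the present gate at 0x70) and the connect-arm/retrain edges, so it shows only the success path; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Success-path successor of a LOOP1 lane state (cell 0x0759+lane).
 * States not on the main line are returned unchanged here; in the firmware
 * they are handled by the connect-arm/retrain edges, which are not modelled. */
uint8_t loop1_next(uint8_t state)
{
    switch (state) {
    case 0x10: return 0x30;  /* e461 */
    case 0x30: return 0x50;  /* 0x075F = 0 */
    case 0x50: return 0x70;  /* e461 (previously forced to 0x00) */
    case 0x70: return 0x90;  /* present-gated plane RMW */
    case 0x90: return 0xA1;  /* e461 */
    case 0xA1: return 0xA1;  /* terminal (bonded) */
    default:   return state;
    }
}

void loop1_examples(void)
{
    uint8_t s = 0x10;
    for (int i = 0; i < 5; i++)
        s = loop1_next(s);
    assert(s == 0xA1);               /* five steps reach the terminal */
    assert(loop1_next(0xA1) == 0xA1);
    assert(loop1_next(0x50) != 0x00); /* the old parking bug */
}
```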
Cleanup pass: drop the old per-phase planning docs (USB4_*_PLAN.md, USB4_BOOT_REDESIGN.md), the workflow-artifact dump (USB4_RE_WORKFLOW_a066_postbond.txt), the outdated Keil-migration note (RECONSTRUCTION.md), the stale bank1 bug note (FIX_BANK1.md), and the old progress TODO.md. The live RE reference (USB4_RE.md), the faithfulness audit, the captured stock traces, and the project docs are kept. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replaced 630 raw flat-XDATA accesses (PR(0xNNNN)/XDATA_REG8V(0xNNNN)) with the corresponding named device-register symbols from src/include/registers.h across all 12 firmware source files. Made the 148 referenced register defines volatile (XDATA_REG8 -> XDATA_REG8V) so the named symbols carry the same volatile semantics as PR() -- correct for HW registers and required for the polling loops. VERIFIED ZERO REGRESSION: the wrapped firmware binary is byte-identical (md5 0e86151f... unchanged) before and after, so this is a pure readability refactor with no behavioural change. The remaining raw PR() accesses are the USB4 connection-manager FSM/scratch cells (0x06xx-0x0Bxx), which are internal state rather than device registers and stay stock-address-mapped via their comments. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add named registers.h symbols for the discrete lane-train block (CCE0-CCE5), the PHY-orient (C2C3) and lane-rate (C8FF) HW registers and use them in place of raw pointers (51 accesses). Binary byte-identical (md5 unchanged) -- zero regression. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CL0 workflow iter6 (adversarially verified, NOT refuted) root cause: the state-4 b0b4 handler ran boot_phy_e57d_e764_reset_pulse (an E764 RESET pulse) where stock e305->e26a(1,1)->cdc6(1) runs the E764 TRAIN. So E764 never reached 0x19 -- and the captured stock trace proves E764=0x19 is the precondition that lets the host move the lanes SB[0xA0]/[0xA1] 07->01->02. Replaced the e57d stand-in with the byte-faithful cdc6 train (e7d4 bit3 set/bit2 clr; cdc6 pre clr bit0/set bit1; phy_cc10_cmd_wait(1,7,0xCF) = the e80a CC10 train; then E762.4-gated DONE -> E764=0x19 via cc8b, else START), under the d17e gate (0x09FA&0x81). Kept the CA06 mode-next select (e305 prologue). Uses the just-named REG_PHY_TIMER_CTRL_E764. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add E764/E762/06ED to the state-5 diag. HW confirms the cdc6 train fix reaches E764=0x19 (the stock-trace lane-train precondition) and holds it. Lanes still 0707: E764=0x19 alone is necessary-not-sufficient (per the iter6 verifier caveat); the rest of e305 tunnel-power bring-up is next. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Convert 109 'x = x & y' -> 'x &= y' (and |=, ^=, +=, -=) across the firmware. Binary byte-identical (md5 271f1057 unchanged) -- pure style, zero regression. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…f38 probe Drop the budget-gated [AF38]/[EAAC] UART dump blocks and the forced-af38 chicken-and-egg probe (all seeded =0/off), their globals (0x0B55-0x0B57), and their main() seeds. Build-verified; the blocks never executed (budget 0) so behaviour is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… files) Aggressively cut the research-scratchpad comments (RE-AUDIT/FIX#/ROUND B/ breakthrough/regression-history/date/keystone/chicken-and-egg narratives, multi-paragraph why-it-failed asides, redundant restatements) across usb4.h, pd_dispatch.h, vdm.h, sb.h, pd.h, boot_phy.h, usb4_irq.h. Kept the terse stock-address annotations and load-bearing IRAM/ISR/volatile notes. VERIFIED binary byte-identical (md5 6ca2d282 unchanged) -- comment-only, zero regression. (usb4_connect.h deferred: its agent broke a comment block.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ical) Collapse the scratchpad banner + why-it-failed asides on the bank0_8a89/ cd10/c9a8 blocks to terse purpose lines; keep all stock-address annotations. md5 6ca2d282 unchanged -- comment-only, zero regression.
…th banks) Export of the whole Ghidra decompilation via DecompInterface -- all 1766 functions (1140 bank0 CODE + 626 bank1 CODE_BANK1 overlay), with each function headed by its stock address. Not compiled (reference only); makes cross-referencing the stock firmware far faster than per-function MCP decompiles. Generated with Ghidra's CppExporter/DecompInterface. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… c593), disasm-verified
Replaces the state-4 b0b4 stub block with a byte-faithful transcription of the stock tunnel-power/PHY bring-up: e305 -> ee29 -> ed44 -> df61 (PHY lane seeds) + a840 (link-speed/width) + c593 (tunnel-adapter commit) + b8db (CDR/PLL validate), keeping the already-working cdc6 E764 0x14->0x19 train as e305's e26a tail.
8 adversarial verifiers (one per fn) checked each helper at the DISASSEMBLY level and caught that the Ghidra decompile itself mismodels the R2:R1 register clobbers from the LCALL read/write primitives. Load-bearing fixes vs the first-pass transcription:
- df61 PHY seeds were landing on WRONG registers: v|0x40 -> 0x7041 (not 0x1835, d1b1 read clobbered R2:R1); d1d3 tail writes -> 0x508D (not 0x508F) and 0x408D (not 0x5204/0x4204) since d1d3 sets R1=0x8D. This is the most likely reason the lanes never moved.
- a840 B403 is a two-stage op: flat B403 RMW + a PLANE-2 0x40B0 write (was collapsed into one wrong flat write); add the ed44 re-strobe tail on the USB4 path; CA06 r5==0 else-branch (r7<3 && r6>=2 -> &=0x1F); ced1 0x5d24/0x5d29 gen tables for the non-USB4 branch.
- d436: faithful direct effects incl. the missing B436 high-nibble write from B404 (downstream B434 ramp + d702/cc-cluster left as a documented simplification -- it is the PCIe-to-GPU side).
- b0b4: the b153 9789 falls through into 9790, so E716 needs a second write |0x03 (PHY link-mode 3); handmade left it mode 0.
- b8db: add the early-return prologue (stop a spurious RxPLL reset after the PHY trained) and poll the correct PLL-lock bit6 (c3a8), not bit5.
ed44/ee29/e74e/e305/c593 verified faithful as-is. Builds clean, DSEG unchanged (0x6C).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…2/3 = 0x2c00/0x2d00)
sb_eaac_populate_0777 and sb_af38_descriptor_response hard-coded a 2-way plane select
(0x06F0==0 ? 0x2a00 : 0x2b00), but the route-query response (the CL snap read by the state-5
walker at 0x0779+lane) arrives on the LINK edges where d4cd sets 0x06F0=2/3. Stock derives the
plane from ROM CODE 0x212d = {2a00,2b00,2c00,2d00} (read_memory byte-exact), so ports 2/3 read
0x2c00/0x2d00. The 2-way select read 0x2b00 for both -> the wrong plane -> 0x0779 stayed 0 ->
bit7 never set -> the LOOP2 CL walker oscillates 0x20<->0x30 forever and the GPU never bonds.
Replace both selectors with a shared __code table sb_rxplane_212d[4] indexed by 0x06F0. Verified
faithful (ROM 0x212d). af38's separate soff=(0x06F0==0)?0x0D:0x0E SB[0x0D]/[0x0E] select is a
genuinely different stock 2-way and is left as-is.
HW: the snap now reads the correct plane (snap diag added to u4lb_s5_diag), but those RX planes
come up EMPTY -- the host isn't posting the per-lane CL response yet, and 0x0779 (lane0, the
enabled lane) stays 0 while the connect F3 lands at 0x077A (lane1's slot). Necessary correctness
that exposes the next layer (host CL-response posting / lane indexing); lanes still 0707.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
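The difference between the old 2-way select and the table lookup can be shown side by side. This is a host-side sketch: on target the table would carry SDCC's `__code` qualifier, and the `& 3` index mask is an added safety assumption rather than something the commit states. The helper names other than `sb_rxplane_212d` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the stock ROM table at CODE 0x212d, indexed by the 0x06F0 port. */
static const uint16_t sb_rxplane_212d[4] = { 0x2a00, 0x2b00, 0x2c00, 0x2d00 };

/* Table-driven RX plane select. */
uint16_t sb_rxplane(uint8_t port)
{
    return sb_rxplane_212d[port & 3u];
}

/* The previous hard-coded 2-way select, kept only for comparison. */
uint16_t sb_rxplane_old(uint8_t port)
{
    return (port == 0) ? 0x2a00 : 0x2b00;
}

void sb_rxplane_examples(void)
{
    assert(sb_rxplane(0) == sb_rxplane_old(0));  /* ports 0/1 agree */
    assert(sb_rxplane(1) == sb_rxplane_old(1));
    assert(sb_rxplane(2) == 0x2c00);             /* link-edge ports differ */
    assert(sb_rxplane_old(2) == 0x2b00);         /* the wrong plane */
    assert(sb_rxplane(3) == 0x2d00);
}
```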
…PHY tracer
b8db: replace the simplified bit6-only early-exit with the full wrsweda2r-verified body (prologue early-returns + per-lane margin window + the CDR-margin SUBB compare on C2D2/C2D9/C2DA/C352/C359/C35A, e9e7 RxPLL-reset on any miss, bounded 10). Faithful.
patch_phylock.py: change-gated stock code-cave (hooks the super-loop top 0x2FC0) dumping [P:<C8FF><E302><E762><SBa0><SBa1><0779><077A><E764><C2D0><C350>]. This is the host-vs-fw discriminator that produced the breakthrough below.
FINDINGS (stock vs handmade, via patch_phylock + the pll= diag added to u4lb_s5_diag):
- Stock REACHES THE GPU: [PcieTunnel-Enable] -> USB4 Gen3 x2 -> PCIE Gen04 x04 -> Bus#2D.
- The E762/RXPLL hypothesis was WRONG: stock also has E762=00 post-train (E762 bit5 is set only DURING the train). Not the gate.
- The real gate is the CDR lock: stock C2D0/C350 go E2 -> 64 (bit6 PLL-lock, post-train) -> F4 (bit4 added = full CDR lock) AS the lane comes up SB[0xA1] 07->01, THEN 0779 populates with bit7-set CL responses (AD/A8/A0) and the lanes reach CL0 (02).
- Handmade is stuck at C2D0=E4 / C350=64 (bit6 set, bit4 CLEAR) -> CDR never fully locks -> lane never comes up -> 0779 stays 0 -> no CL0. And handmade C2D0=E4 has bit7 SET which stock's post-train 0x64 does NOT -> a residual STATE-4 PHY-config divergence (not b8db: restoring the full b8db margin loop left C2D0 unchanged at E4).
NEXT: instrument stock's C2D0/C350 through the state-4 cdc6/e305 PHY train to find where the handmade diverges (C2D0 bit7) -- the lane-up gate is upstream of b8db/the walker.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
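The bit-level reading of the C2D0/C350 values quoted in the findings can be double-checked with a small decoder. The bit meanings (bit6 = PLL lock, bit4 = full CDR lock) are taken from the commit message; the macro and function names are hypothetical, and bit7 is left unnamed because its meaning is not yet known.

```c
#include <assert.h>
#include <stdint.h>

#define PHY_BIT7      0x80u  /* meaning unknown; set on handmade's 0xE4 */
#define PHY_PLL_LOCK  0x40u  /* bit6, per the commit message */
#define PHY_CDR_LOCK  0x10u  /* bit4, per the commit message */

int phy_pll_locked(uint8_t v) { return (v & PHY_PLL_LOCK) != 0; }
int phy_cdr_locked(uint8_t v) { return (v & PHY_CDR_LOCK) != 0; }

void phy_lock_examples(void)
{
    assert(phy_pll_locked(0x64) && !phy_cdr_locked(0x64)); /* stock post-train */
    assert(phy_pll_locked(0xF4) && phy_cdr_locked(0xF4));  /* stock at lane-up */
    assert(phy_pll_locked(0xE4) && !phy_cdr_locked(0xE4)); /* handmade, stuck */
    assert((0xE4 & PHY_BIT7) != 0);   /* handmade has bit7 set ... */
    assert((0x64 & PHY_BIT7) == 0);   /* ... stock post-train does not */
}
```

A decoder like this could back a change-gated diagnostic that prints lock state symbolically instead of as raw hex.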
99% based on static analysis of a higher-quality Ghidra disassembly. There are tons of phases; here's the status: