Commit d7f5fe0

Implemented support for older devbin protocol
Support legacy-firmware devbin format and fix command routing for direct devices

- Decode devbin records in two formats via a per-record discriminator:
  * lengthPrefixed (current firmware): 3-byte envelope (magic 0xDB-0xDF + topic + envelopeSeq) followed by [recordLen:2][statusBus:1][addr:4][devTypeIdx:2][deviceSeq:1][sampleLen:1][sample]
  * legacyRaw (Cog v1.9.5 and similar): no envelope, fixed-size samples, no per-record length prefix
- Add hasValidRecordAt / resolveRecordPayloadFormat plus helpers getBinaryDeviceKey and getLegacyRawSampleLen to drive iteration
- Route typeinfo lookups via ?bus=<n>&type=<n> for legacy firmware that doesn't support ?deviceid=, and add the same fallback on the JSON publish path for simulators / older publishers
- Enforce sample bounds in RaftAttributeHandler via diagCtx.sampleEndIdx so a short legacyRaw sample can never overrun into the next record
- Add parseDeviceKeyForCommand() and use it in sendAction and setSampleRate so devman/cmdraw and devman/devconfig URLs use the firmware-side bus/address (fixes wrong bus=0_0&addr=2 for direct devices whose key looks like "0_0_2")
- Dashboard: probe datalogging capability and hide the three log panels on firmware that doesn't support it
1 parent fbe5876 commit d7f5fe0

7 files changed

Lines changed: 711 additions & 103 deletions

File tree

devdocs/decode-overrun-investigation.md

Lines changed: 15 additions & 15 deletions
@@ -8,19 +8,19 @@ The raftjs example dashboard is receiving an `AttributeHandler decode overrun` e
 
 **Error Message:**
 ```
-AttributeHandler decode overrun (msgBuffer):
-deviceKey=0_0
-deviceType=Cog Light Sensors
-debugMsgIndex=1
-attr.n=amb0
-attr.t=>H
-attrTypeSize=2
-curFieldBufIdx=45
-msgBuffer.length=46
-sampleStartIdx=37
-sampleEndIdx=46
-availableInSample=1
-availableInBuffer=1
+AttributeHandler decode overrun (msgBuffer):
+deviceKey=0_0
+deviceType=Cog Light Sensors
+debugMsgIndex=1
+attr.n=amb0
+attr.t=>H
+attrTypeSize=2
+curFieldBufIdx=45
+msgBuffer.length=46
+sampleStartIdx=37
+sampleEndIdx=46
+availableInSample=1
+availableInBuffer=1
 ```
 
 **Key Facts:**
@@ -96,8 +96,8 @@ The mysterious `01` byte at the end needs to be identified:
 - Is it an uninitialized buffer value?
 - Was it added in a recent firmware change?
 
-**Check:**
-- `git log -p components/DeviceLightSensors/DeviceLightSensors.cpp` around `formDeviceDataResponse()`
+**Check:**
+- `git log -p components/DeviceLightSensors/DeviceLightSensors.cpp` around `formDeviceDataResponse()`
 - Look for recent changes that add bytes (new sensor data, flags, etc.)
 - Compare the byte count calculation in `getDeviceTypeRecord()` with actual `formDeviceDataResponse()` logic
 

Lines changed: 291 additions & 0 deletions
@@ -0,0 +1,291 @@
# Legacy Firmware (Cog v1.9.5) Compatibility: Final Design

## Summary

RaftJS must support two devbin record body layouts and two `devman/typeinfo`
URL forms in order to interoperate with both current firmware (Axiom and
recent Cog builds) and Cog v1.9.5 in the field. The discriminator on the
binary side is **presence of the devbin envelope at message offset 2**; on
the JSON side it is a small **per-endpoint capability cache** that suppresses
repeated calls to endpoints that answer `failUnknownAPI`. No `SystemVersion`
string parsing is required for correctness.

The design below is grounded in concrete data captured by a temporary
diagnostic probe (`src/_DevbinCompatProbe.ts`) against both firmwares and in
the actual library history (commit `054125c`, where the `devman/typeinfo`
URL form changed).

## Confirmed Observations

The probe was run twice from `examples/dashboard` against the two firmwares.

### Old firmware (Cog v1.9.5)

```
devbin frames by envelope kind:
  no-envelope count=203 firstBytes=001180000000000002 ddac07...
json endpoints supported (rslt=ok seen):
  v, subscription, bledisconnect
json endpoints failing:
  pubtopics :: failUnknownAPI count=1
  devman/typeinfo :: failBusMissing count=2
  datetime :: failUnknownAPI count=1
  filelist/local :: failUnknownAPI count=1
  datalog :: failUnknownAPI count=6
  filelist/local/logs :: failUnknownAPI count=1
```

Decoded record header from `firstBytes`:

```
0011        recordLen  = 17
80          statusBus (online, bus 0)
00000000    address    = 0 (direct)
0002        devTypeIdx = 2
dd ac 07... timestamp(2) + fixed-size payload   <- no deviceSeq, no sampleLen
```

### New firmware

```
devbin frames by envelope kind:
  env=0xdb v=11 count=1701 firstBytes=0057810000076a000e65 4e70...
json endpoints supported (rslt=ok seen):
  v, subscription, devman/typeinfo, datetime, filelist/local,
  datalog, devman/devconfig, bledisconnect
json endpoints failing:
  filelist/local/logs :: nofolder count=1
```

Decoded record header from `firstBytes` (after the 3-byte envelope):

```
0057        recordLen  = 87
81          statusBus (online, bus 1)
0000076a    address    = slot 7, I2C 0x6a
000e        devTypeIdx = 14
65          deviceSeq  = 0x65   <- present
4e          sampleLen  = 78     <- length-prefixed
70 ...      sample data
```
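
The two decoded headers can be reproduced with a short sketch. It assumes the fields are big-endian as annotated in the captures, that bit 7 of `statusBus` is the online flag, and that the low nibble holds the bus number (the mask is a guess from the `80`/`81` examples only). `hexToBytes` and `parseRecordHeader` are hypothetical names, not functions from the library.

```typescript
// Hypothetical helpers; the field layout follows the annotated captures.
interface RecordHeader {
  recordLen: number;   // uint16 big-endian
  online: boolean;     // assumed: bit 7 of statusBus
  busNum: number;      // assumed: low nibble of statusBus
  address: number;     // uint32 big-endian
  devTypeIdx: number;  // uint16 big-endian
  bodyPos: number;     // offset of the first byte after the common header
}

// Convert a hex preview such as "0011 80" into bytes, ignoring whitespace.
function hexToBytes(hex: string): Uint8Array {
  const clean = hex.replace(/\s+/g, "");
  const out = new Uint8Array(clean.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(clean.substring(i * 2, i * 2 + 2), 16);
  }
  return out;
}

// Parse the header shared by both layouts; returns null if the buffer is too short.
function parseRecordHeader(buf: Uint8Array, pos: number): RecordHeader | null {
  const HEADER_LEN = 9; // recordLen(2) + statusBus(1) + address(4) + devTypeIdx(2)
  if (pos + HEADER_LEN > buf.length) return null;
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const statusBus = view.getUint8(pos + 2);
  return {
    recordLen: view.getUint16(pos, false),
    online: (statusBus & 0x80) !== 0,
    busNum: statusBus & 0x0f,
    address: view.getUint32(pos + 3, false),
    devTypeIdx: view.getUint16(pos + 7, false),
    bodyPos: pos + HEADER_LEN,
  };
}

// The firstBytes previews from the two probe runs.
const legacy = parseRecordHeader(hexToBytes("001180000000000002 ddac07"), 0);
const current = parseRecordHeader(hexToBytes("0057810000076a000e65 4e70"), 0);
console.log(legacy, current);
```

Only the bytes after `bodyPos` differ between the layouts: the legacy record continues with the timestamp, the current one with `deviceSeq` and `sampleLen`.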

### Conclusions from the data

1. **The magic byte does not act as a format version.** The only magic
   value observed is `0xDB`, on new firmware (the probe's `v=11` is just its
   low nibble, `0xB`). The `0xDB..0xDF` range is reserved space but is not
   currently used as a version counter.
2. **The real discriminator is envelope presence** at byte offset 2 of the
   message (after the 2-byte msgType prefix). Old firmware emits no envelope
   at all; new firmware emits the `0xDB` envelope.
3. **Both body layouts described earlier are confirmed**: legacy raw
   `[timestamp:2][fixed payload]` samples with no per-device sequence byte,
   versus current `[deviceSeq:1][sampleLen:1][sampleData]`.
4. **`devman/typeinfo` is not missing on old firmware**: it just expects a
   different query form. See "`devman/typeinfo` URL History" below.
5. The `0_0` device-key collision is real and unavoidable on old firmware:
   the captured records show multiple distinct `devTypeIdx` values
   (2, 3, 5 in the earlier log) all on bus 0 / address 0.

## `devman/typeinfo` URL History

Git history on `src/RaftDeviceManager.ts` shows the URL changed in commit
`054125c` ("Added support for 'role':'system' to denote system devices"):

```diff
- const cmd = "devman/typeinfo?bus=" + busName + "&type=" + deviceType;
+ const cmd = "devman/typeinfo?deviceid=" + deviceKey;
```

The pre-`054125c` form took:

- `bus`: numeric bus number as a string
- `type`: numeric `devTypeIdx` as a string

Both values are already present on the wire in every devbin record, so no
extra metadata is needed to construct the legacy request. The post-`054125c`
form uses `deviceid=<bus>_<addrHex>`, which old firmware rejects with
`failBusMissing`.

This means there is **no need for a bundled static typeinfo table**: old
firmware can answer typeinfo queries; we just have to ask in the old format.

## Final Design

### A. Binary discriminator: envelope presence

The single rule the parser uses to choose a record body layout:

```
const envByte = rxMsg[2];                       // first byte after the 2-byte msgType prefix
const hasEnvelope = (envByte & 0xF0) === 0xD0;  // top nibble 0xD marks the devbin envelope
let msgPos: number;
let bodyMode: "lengthPrefixed" | "legacyRaw";
if (hasEnvelope) {
    // current format
    msgPos = 2 + 3;                             // skip the 3-byte envelope
    bodyMode = "lengthPrefixed";
} else {
    // legacy format
    msgPos = 2;                                 // records start right after the prefix
    bodyMode = "legacyRaw";
}
```

Notes:

- Magic-byte values `0xDB..0xDF` are all accepted and mapped to the current
  body layout. A future format change would have to land both a new envelope
  value and matching parser code; until then the low nibble is ignored.
- A top-nibble-only check is safe because the legacy `recordLen` is a
  big-endian `uint16`, so the first byte of any legacy frame (after the
  msgType prefix) is the high byte of `recordLen`. Real-world legacy records
  have `recordLen < 0x1000` (the captured example shows `0x0011`), so that
  byte is at most `0x0F` (`0x00` in the capture) and cannot be confused with
  `0xD?`.

### B. JSON-API capability cache

Located in a small object owned by `RaftMsgHandler` (or `RaftSystemUtils`):

```
endpointCapability: Map<endpoint, "ok" | "unsupported">
```

Rules:

1. On every JSON response, key the cache by the request endpoint with the
   query string stripped (`pubtopics`, `datetime`, `filelist/local`,
   `filelist/local/logs`, `datalog`, `devman/typeinfo`, ...).
2. `rslt=ok` → record `ok`. `rslt=fail` with `error=failUnknownAPI` →
   record `unsupported`. Other failures are not capability signals (e.g.
   `nofolder`, `failBusMissing`).
3. Before sending an optional API call, consult the cache. If
   `unsupported`, skip the call entirely. If unknown, send once.
4. Once an endpoint is marked `unsupported`, demote its `failUnknownAPI`
   log line to debug for the duration of the connection.
5. Reset the cache on disconnect, since a different device may be the
   peer next time.

Targets to gate on connect: `pubtopics`, `datetime?UTC=...`,
`filelist/local`, `filelist/local/logs`, `datalog?action=status`. These are
the calls the probe showed firing unconditionally on old firmware.

Specifically address the **`datalog` count=6**: somewhere a retry loop is
running. Once the capability is cached as `unsupported`, that loop must
short-circuit instead of retrying.
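
A minimal sketch of rules 1, 2, 3 and 5, assuming the JSON response exposes `rslt` and `error` fields as the rules describe. The class name `EndpointCapabilityCache` and its methods are hypothetical; log demotion (rule 4) is left out.

```typescript
type Capability = "ok" | "unsupported";

// Assumed response shape, taken from the rslt/error wording of the rules.
interface JsonResponse {
  rslt: string;
  error?: string;
}

class EndpointCapabilityCache {
  private readonly caps = new Map<string, Capability>();

  // Rule 1: strip the query string so "datalog?action=status" keys as "datalog".
  static keyFor(request: string): string {
    const q = request.indexOf("?");
    return q < 0 ? request : request.substring(0, q);
  }

  // Rule 2: only rslt=ok and failUnknownAPI count as capability signals.
  record(request: string, resp: JsonResponse): void {
    const key = EndpointCapabilityCache.keyFor(request);
    if (resp.rslt === "ok") {
      this.caps.set(key, "ok");
    } else if (resp.error === "failUnknownAPI") {
      this.caps.set(key, "unsupported");
    }
  }

  // Rule 3: skip only endpoints known to be unsupported; unknown ones are sent.
  shouldSend(request: string): boolean {
    return this.caps.get(EndpointCapabilityCache.keyFor(request)) !== "unsupported";
  }

  // Rule 5: forget everything on disconnect.
  reset(): void {
    this.caps.clear();
  }
}

const cache = new EndpointCapabilityCache();
cache.record("datalog?action=status", { rslt: "fail", error: "failUnknownAPI" });
cache.record("devman/typeinfo?deviceid=0_0", { rslt: "fail", error: "failBusMissing" });
console.log(cache.shouldSend("datalog?action=status"));        // datalog is now gated
console.log(cache.shouldSend("devman/typeinfo?bus=0&type=2")); // failBusMissing is not a capability signal
```

A retry loop such as the `datalog` one would call `shouldSend` before each attempt and stop as soon as it returns `false`.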

### C. Dual-layout devbin parser

Refactor `DeviceManager.handleClientMsgBinary` to support two record body
modes, selected by `bodyMode` from (A):

- `lengthPrefixed`: `[deviceSeq:1][sampleLen:1][sampleData:sampleLen]...`
- `legacyRaw`: fixed-size `[timestamp:2][payload:fixedSize]...`

Decoder rules:

1. After reading `statusBus`/`address`/`devTypeIdx`, fetch
   `DeviceTypeInfo`. For `legacyRaw`, compute the sample stride as
   `2 (timestamp) + sum(struct sizes from resp.a)`, falling back to
   `resp.b` only if the schema cannot be sized.
2. Bound every sample to its own `[start, end]` range when calling the
   attribute decoder, so malformed data cannot walk past the record.
   Replace any throw-on-overrun with a throttled warning + skip. This
   protects against the `RangeError: Offset is outside the bounds of the
   DataView` symptom regardless of which layout is in use.
3. Specifically for Cog v1.9.5 light sensor: trust the schema-derived
   size, not `resp.b`, because that firmware double-reports the payload
   size in metadata. (Carried over from the sibling implementation; needs
   one verification capture in this codebase.)
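
Decoder rules 1 and 2 can be sketched as below. The sketch assumes attribute types are Python-struct-style codes such as `>H` (the form seen in the overrun log), that each schema entry carries the code in a `t` field, and that `resp.b` is a payload size excluding the timestamp. The size table covers only common codes, and all function names are hypothetical. Ranges are half-open `[start, end)`.

```typescript
// Assumed attribute shape: n = name, t = struct-style type code.
interface AttrDef {
  n: string;
  t: string;
}

// Sizes for a subset of struct-style codes; anything else is "cannot be sized".
const TYPE_CODE_SIZES = new Map<string, number>([
  ["b", 1], ["B", 1], ["?", 1], ["h", 2], ["H", 2],
  ["i", 4], ["I", 4], ["l", 4], ["L", 4], ["f", 4],
  ["q", 8], ["Q", 8], ["d", 8],
]);

// Byte size of one attribute, or null if the code is not recognised.
function attrSize(t: string): number | null {
  const code = t.replace(/^[<>=!@]/, ""); // drop the byte-order prefix
  const size = TYPE_CODE_SIZES.get(code);
  return size === undefined ? null : size;
}

// Rule 1: stride = 2-byte timestamp + schema-derived payload size,
// using the metadata size only when the schema cannot be sized.
function legacySampleStride(attrs: AttrDef[], metadataPayloadSize: number): number {
  let payload = 0;
  for (const attr of attrs) {
    const size = attrSize(attr.t);
    if (size === null) return 2 + metadataPayloadSize;
    payload += size;
  }
  return 2 + payload;
}

// Rule 2: produce per-sample ranges that never extend past the record end.
// A trailing partial sample is dropped rather than decoded.
function legacySampleRanges(bodyStart: number, recordEnd: number, stride: number): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  if (stride <= 2) return ranges; // nothing but a timestamp: treat as undecodable
  for (let start = bodyStart; start + stride <= recordEnd; start += stride) {
    ranges.push([start, start + stride]);
  }
  return ranges;
}

// Hypothetical schema of four uint16 attributes, giving a 10-byte stride.
const schema: AttrDef[] = [
  { n: "a0", t: ">H" }, { n: "a1", t: ">H" }, { n: "a2", t: ">H" }, { n: "a3", t: ">H" },
];
const stride = legacySampleStride(schema, 16);
console.log(stride, legacySampleRanges(9, 19, stride));
```

The attribute decoder would then receive each range as its hard limit, so a short sample produces a warning and a skip instead of a read into the next record.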

### D. Device key disambiguation for legacy direct devices

Old firmware publishes multiple direct-connected devices on bus 0 /
address 0. To keep them distinct without breaking the existing `bus_addr`
key scheme everywhere:

- For `legacyRaw` records where `busNum == 0 && devAddr == 0`, build the
  key as `0_0_<devTypeIdx>`.
- Continue to use `bus_addr` for all `lengthPrefixed` records and for any
  legacy record where bus or address is non-zero.
- When sending commands, always use the stored `DeviceState.busName` and
  `DeviceState.deviceAddress` rather than re-parsing the displayed key,
  so a command for compatibility key `0_0_2` is correctly sent to bus 0 /
  address 0.
- The rate-limit cache in `getDeviceTypeInfo` keys off `deviceKey`. With
  the disambiguated key the rate limiter naturally avoids the "one
  failure poisons three devices" symptom seen in the original log.
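
The key rules can be sketched as follows. The hex formatting of the address in ordinary `bus_addr` keys and the `bus=...&addr=...` query fragment are assumptions for illustration; `buildDeviceKey`, `registerDevice` and `commandRouteQuery` are hypothetical names.

```typescript
type BodyMode = "lengthPrefixed" | "legacyRaw";

// Mirrors the stored DeviceState fields named above.
interface DeviceRoute {
  busName: string;
  deviceAddress: string;
}

const routesByKey = new Map<string, DeviceRoute>();

// Legacy direct devices (bus 0, address 0) get the devTypeIdx appended;
// everything else keeps the plain bus_addr key (address shown as hex, assumed).
function buildDeviceKey(mode: BodyMode, busNum: number, devAddr: number, devTypeIdx: number): string {
  if (mode === "legacyRaw" && busNum === 0 && devAddr === 0) {
    return `0_0_${devTypeIdx}`;
  }
  return `${busNum}_${devAddr.toString(16)}`;
}

// Store the firmware-side bus/address when the record header is parsed.
function registerDevice(mode: BodyMode, busNum: number, devAddr: number, devTypeIdx: number): string {
  const key = buildDeviceKey(mode, busNum, devAddr, devTypeIdx);
  routesByKey.set(key, { busName: String(busNum), deviceAddress: devAddr.toString(16) });
  return key;
}

// Commands use the stored route, never a re-parse of the displayed key.
function commandRouteQuery(deviceKey: string): string | null {
  const route = routesByKey.get(deviceKey);
  return route ? `bus=${route.busName}&addr=${route.deviceAddress}` : null;
}

const keyA = registerDevice("legacyRaw", 0, 0, 2);
const keyB = registerDevice("legacyRaw", 0, 0, 3);
const keyC = registerDevice("lengthPrefixed", 1, 0x76a, 14);
console.log(keyA, keyB, keyC, commandRouteQuery(keyA));
```

Splitting `0_0_2` on underscores would yield a bus of `0_0` and an address of `2`, which is why the lookup goes through the stored route.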

### E. `devman/typeinfo` URL fallback

Two-form request strategy in `executeDeviceTypeInfoRequest`:

1. If `bodyMode == legacyRaw` for the originating record, send the legacy
   URL directly:
   ```
   devman/typeinfo?bus=<busNum>&type=<devTypeIdx>
   ```
2. Otherwise send the current URL:
   ```
   devman/typeinfo?deviceid=<bus>_<addrHex>
   ```
3. On `rslt=fail` with `error=failBusMissing` against the current URL,
   transparently retry once with the legacy URL form. Cache the chosen
   form per connection so we don't pay the failed first request more than
   once.

Because every callsite that needs typeinfo already knows `busNum` and
`devTypeIdx` from the record header, no extra state needs to be threaded
through. The existing `getDeviceTypeInfo(deviceKey)` signature can stay
if the chosen URL form is selected from a small per-key context map
populated when the record header is parsed.
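
The three steps reduce to a few pure decisions, sketched here under the assumption that the response carries `rslt` and `error` fields. `TypeInfoContext` stands in for an entry of the per-key context map; every name in the block is hypothetical, and the actual request sending is omitted.

```typescript
type BodyMode = "lengthPrefixed" | "legacyRaw";
type TypeInfoForm = "legacy" | "current";

// Hypothetical per-key context captured when the record header is parsed.
interface TypeInfoContext {
  bodyMode: BodyMode;
  busNum: number;
  devTypeIdx: number;
  deviceKey: string; // <bus>_<addrHex> for current-format devices
}

interface TypeInfoResponse {
  rslt: string;
  error?: string;
}

// Build the request URL for the selected form.
function buildTypeInfoUrl(form: TypeInfoForm, ctx: TypeInfoContext): string {
  return form === "legacy"
    ? `devman/typeinfo?bus=${ctx.busNum}&type=${ctx.devTypeIdx}`
    : `devman/typeinfo?deviceid=${ctx.deviceKey}`;
}

// Steps 1 and 2: legacy records go straight to the legacy URL;
// otherwise reuse the form cached for this connection, defaulting to current.
function initialForm(ctx: TypeInfoContext, cachedForm?: TypeInfoForm): TypeInfoForm {
  if (ctx.bodyMode === "legacyRaw") return "legacy";
  return cachedForm ?? "current";
}

// Step 3: only failBusMissing against the current URL triggers the single retry.
function shouldRetryWithLegacy(form: TypeInfoForm, resp: TypeInfoResponse): boolean {
  return form === "current" && resp.rslt === "fail" && resp.error === "failBusMissing";
}

const legacyCtx: TypeInfoContext = { bodyMode: "legacyRaw", busNum: 0, devTypeIdx: 2, deviceKey: "0_0_2" };
const currentCtx: TypeInfoContext = { bodyMode: "lengthPrefixed", busNum: 1, devTypeIdx: 14, deviceKey: "1_76a" };
console.log(buildTypeInfoUrl(initialForm(legacyCtx), legacyCtx));
console.log(buildTypeInfoUrl(initialForm(currentCtx), currentCtx));
```

After a successful retry the caller would store `"legacy"` as the cached form, so later current-format lookups on the same connection skip the failing first request.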

### F. Tests

Add to `src/RaftDeviceManager.test.ts`:

- current length-prefixed records decode correctly (envelope present)
- legacy raw records decode correctly (no envelope)
- legacy direct-device records with bus/addr `0_0` and distinct
  `devTypeIdx` stay distinct (key disambiguation)
- malformed sample data inside a record is bounded and skipped (no throw)
- capability cache marks `pubtopics` unsupported on first
  `failUnknownAPI` and skips the second call

Use the captured `firstBytes` previews (above) as the basis for the
binary fixtures.

## Out of Scope

- Changes to firmware. Cog v1.9.5 stays in the field as-is.
- Changes to the dashboard UI. Once the library decodes legacy frames,
  the existing panels populate without modification.
- Removing the temporary `_DevbinCompatProbe.ts` instrumentation. That
  stays in place until the implementation in this design is verified
  against both firmwares, then is removed by deleting the file and the
  `[COMPAT-PROBE]` tagged call sites.

## Files To Change

- `src/RaftDeviceManager.ts`
  - envelope-presence discriminator and dual-layout sample loop
  - `0_0_<devTypeIdx>` key disambiguation for legacy direct devices
  - `executeDeviceTypeInfoRequest` two-form URL strategy
- `src/RaftAttributeHandler.ts` / `src/RaftCustomAttrHandler.ts`
  - bounded sample decoding, replace overrun throws with skip+warn
- `src/RaftMsgHandler.ts` (or `src/RaftSystemUtils.ts`)
  - per-endpoint capability cache; suppress repeated `failUnknownAPI`
- `src/RaftSystemUtils.ts` / `src/RaftConnector.ts`
  - gate `pubtopics`, `datetime`, `filelist/local`, `filelist/local/logs`,
    `datalog?action=status` on the capability cache; fix the `datalog`
    retry loop so it honours `unsupported`
- `src/RaftDeviceManager.test.ts`
  - new fixtures and assertions per section F

## Cross-Reference

A working implementation of the dual-layout parser and `0_0_<devTypeIdx>`
disambiguation exists locally at
`C:\Users\rob\Documents\rdev\1\SortRaftJsIssues\raftjs-robotical-main\`
(see `devdocs/devbin-backwards-compatibility.md` in that repo). Use it as
a reference when porting; the design above supersedes its envelope
selection rule (presence-based, not version-nibble-based) and its
typeinfo fallback (use the legacy URL form, not a bundled static table).

examples/dashboard/package-lock.json

Lines changed: 6 additions & 6 deletions