| 1 | +# Legacy Firmware (Cog v1.9.5) Compatibility — Final Design |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +RaftJS must support two devbin record body layouts and two `devman/typeinfo` |
| 6 | +URL forms in order to interoperate with both current firmware (Axiom and |
| 7 | +recent Cog builds) and Cog v1.9.5 in the field. The discriminator on the |
| 8 | +binary side is **presence of the devbin envelope at message offset 2**; on |
| 9 | +the JSON side it is a small **per-endpoint capability cache** that suppresses |
| 10 | +repeated calls to endpoints that answered `failUnknownAPI`. No |
| 11 | +`SystemVersion` string parsing is required for correctness. |
| 12 | + |
| 13 | +The design below is grounded in concrete data captured by a temporary |
| 14 | +diagnostic probe (`src/_DevbinCompatProbe.ts`) against both firmwares and in |
| 15 | +the actual library history (commit `054125c`, where the `devman/typeinfo` |
| 16 | +URL form changed). |
| 17 | + |
| 18 | +## Confirmed Observations |
| 19 | + |
| 20 | +The probe was run twice from `examples/dashboard` against the two firmwares. |
| 21 | + |
| 22 | +### Old firmware (Cog v1.9.5) |
| 23 | + |
| 24 | +``` |
| 25 | +devbin frames by envelope kind: |
| 26 | + no-envelope count=203 firstBytes=001180000000000002 ddac07... |
| 27 | +json endpoints supported (rslt=ok seen): |
| 28 | + v, subscription, bledisconnect |
| 29 | +json endpoints failing: |
| 30 | + pubtopics :: failUnknownAPI count=1 |
| 31 | + devman/typeinfo :: failBusMissing count=2 |
| 32 | + datetime :: failUnknownAPI count=1 |
| 33 | + filelist/local :: failUnknownAPI count=1 |
| 34 | + datalog :: failUnknownAPI count=6 |
| 35 | + filelist/local/logs :: failUnknownAPI count=1 |
| 36 | +``` |
| 37 | + |
| 38 | +Decoded record header from `firstBytes`: |
| 39 | + |
| 40 | +``` |
| 41 | +0011 recordLen = 17 |
| 42 | +80 statusBus (online, bus 0) |
| 43 | +00000000 address = 0 (direct) |
| 44 | +0002 devTypeIdx = 2 |
| 45 | +dd ac 07... timestamp(2) + fixed-size payload <- no deviceSeq, no sampleLen |
| 46 | +``` |
| 47 | + |
| 48 | +### New firmware |
| 49 | + |
| 50 | +``` |
| 51 | +devbin frames by envelope kind: |
| 52 | + env=0xdb v=11 count=1701 firstBytes=0057810000076a000e65 4e70... |
| 53 | +json endpoints supported (rslt=ok seen): |
| 54 | + v, subscription, devman/typeinfo, datetime, filelist/local, |
| 55 | + datalog, devman/devconfig, bledisconnect |
| 56 | +json endpoints failing: |
| 57 | + filelist/local/logs :: nofolder count=1 |
| 58 | +``` |
| 59 | + |
| 60 | +Decoded record header from `firstBytes` (after the 3-byte envelope): |
| 61 | + |
| 62 | +``` |
| 63 | +0057 recordLen = 87 |
| 64 | +81 statusBus (online, bus 1) |
| 65 | +0000076a address = slot 7, I2C 0x6a |
| 66 | +000e devTypeIdx = 14 |
| 67 | +65 deviceSeq = 0x65 <- present |
| 68 | +4e sampleLen = 78 <- length-prefixed |
| 69 | +70 ... sample data |
| 70 | +``` |
| 71 | + |
| 72 | +### Conclusions from the data |
| 73 | + |
| 74 | +1. **The magic byte does not change between formats.** Both observed |
| 75 | + firmwares use `0xDB` (the probe's `v=11` is just the literal low nibble of |
| 76 | + `0xDB`). The `0xDB..0xDF` range is reserved space but is not currently |
| 77 | + used as a version counter. |
| 78 | +2. **The real discriminator is envelope presence** at byte offset 2 of the |
| 79 | + message (after the 2-byte msgType prefix). Old firmware emits no envelope |
| 80 | + at all; new firmware emits the `0xDB` envelope. |
| 81 | +3. **Both body layouts described earlier are confirmed**: legacy raw |
| 82 | + `[timestamp:2][fixed payload]` samples with no per-device sequence byte, |
| 83 | + versus current `[deviceSeq:1][sampleLen:1][sampleData]`. |
| 84 | +4. **`devman/typeinfo` is not missing on old firmware** — it just expects a |
| 85 | + different query form. See "typeinfo URL history" below. |
| 86 | +5. The `0_0` device-key collision is real and unavoidable on old firmware: |
| 87 | + the captured records show multiple distinct `devTypeIdx` values |
| 88 | + (2, 3, 5 in the earlier log) all on bus 0 / address 0. |
| 89 | + |
| 90 | +## `devman/typeinfo` URL History |
| 91 | + |
| 92 | +Git history on `src/RaftDeviceManager.ts` shows the URL changed in commit |
| 93 | +`054125c` ("Added support for 'role':'system' to denote system devices"): |
| 94 | + |
| 95 | +```diff |
| 96 | +- const cmd = "devman/typeinfo?bus=" + busName + "&type=" + deviceType; |
| 97 | ++ const cmd = "devman/typeinfo?deviceid=" + deviceKey; |
| 98 | +``` |
| 99 | + |
| 100 | +The pre-`054125c` form took: |
| 101 | + |
| 102 | +- `bus` — numeric bus number as a string |
| 103 | +- `type` — numeric `devTypeIdx` as a string |
| 104 | + |
| 105 | +Both values are already present on the wire in every devbin record, so no |
| 106 | +extra metadata is needed to construct the legacy request. The post-`054125c` |
| 107 | +form uses `deviceid=<bus>_<addrHex>`, which old firmware rejects with |
| 108 | +`failBusMissing`. |
| 109 | + |
| 110 | +This means there is **no need for a bundled static typeinfo table** — old |
| 111 | +firmware can answer typeinfo queries, we just have to ask in the old format. |
| 112 | + |
| 113 | +## Final Design |
| 114 | + |
| 115 | +### A. Binary discriminator: envelope presence |
| 116 | + |
| 117 | +The single rule the parser uses to choose a record body layout: |
| 118 | + |
| 119 | +``` |
| 120 | +let envByte = rxMsg[2]; // first byte after msgType prefix |
| 121 | +let hasEnvelope = (envByte & 0xF0) === 0xD0; |
| 122 | +if (hasEnvelope) { |
| 123 | + // current format |
| 124 | + msgPos = 2 + 3; // skip 3-byte envelope |
| 125 | + bodyMode = lengthPrefixed; |
| 126 | +} else { |
| 127 | + // legacy format |
| 128 | + msgPos = 2; |
| 129 | + bodyMode = legacyRaw; |
| 130 | +} |
| 131 | +``` |
| 132 | + |
| 133 | +Notes: |
| 134 | + |
| 135 | +- Magic-byte values `0xDB..0xDF` are all accepted and mapped to the current |
| 136 | + body layout (the top-nibble test as written also admits `0xD0..0xDA`). A |
| 137 | + future format change would have to land both a new envelope value and |
| 138 | + matching parser code; until then the low nibble is ignored. |
| 139 | +- A top-nibble-only check is safe because legacy `recordLen` is `uint16 |
| 140 | + big-endian`, so the byte at offset 2 of any legacy message is the high |
| 141 | + byte of `recordLen`. Real-world legacy records have `recordLen < 0x1000` |
| 142 | + (the captured example shows `0x0011`), so that byte's top nibble is `0x0`; |
| 143 | + a collision with `0xD?` would need `recordLen >= 0xD000`. |
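As a self-contained variant of the rule in this section, the following sketch adds the length checks a real parser would need before indexing into the message. The names `detectEnvelope`, `EnvelopeInfo` and the two constants are hypothetical; the 2-byte prefix and 3-byte envelope lengths come from the text above.

```typescript
type BodyMode = "lengthPrefixed" | "legacyRaw";

interface EnvelopeInfo {
  bodyMode: BodyMode;
  msgPos: number; // offset of the first record header within rxMsg
}

const MSG_TYPE_PREFIX_LEN = 2; // msgType prefix
const ENVELOPE_LEN = 3;        // devbin envelope, when present

// Choose the record body layout from the byte that follows the msgType prefix.
// Returns null when the message is too short to classify.
function detectEnvelope(rxMsg: Uint8Array): EnvelopeInfo | null {
  if (rxMsg.length <= MSG_TYPE_PREFIX_LEN) return null;
  const envByte = rxMsg[MSG_TYPE_PREFIX_LEN];
  if ((envByte & 0xf0) !== 0xd0) {
    // No envelope: records start right after the prefix (legacy firmware).
    return { bodyMode: "legacyRaw", msgPos: MSG_TYPE_PREFIX_LEN };
  }
  const msgPos = MSG_TYPE_PREFIX_LEN + ENVELOPE_LEN;
  if (rxMsg.length < msgPos) return null; // truncated envelope
  return { bodyMode: "lengthPrefixed", msgPos };
}
```

Returning `null` rather than throwing keeps a short or truncated frame from taking down the message handler; the caller can log and drop it.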
| 143 | + |
| 144 | +### B. JSON-API capability cache |
| 145 | + |
| 146 | +The cache lives in a small object owned by `RaftMsgHandler` (or `RaftSystemUtils`): |
| 147 | + |
| 148 | +``` |
| 149 | +endpointCapability: Map<endpoint, "ok" | "unsupported"> |
| 150 | +``` |
| 151 | + |
| 152 | +Rules: |
| 153 | + |
| 154 | +1. On every JSON response, key the cache by the request endpoint with the |
| 155 | + query string stripped (`pubtopics`, `datetime`, `filelist/local`, |
| 156 | + `filelist/local/logs`, `datalog`, `devman/typeinfo`, ...). |
| 157 | +2. `rslt=ok` → record `ok`. `rslt=fail` with `error=failUnknownAPI` → |
| 158 | + record `unsupported`. Other failures are not capability signals (e.g. |
| 159 | + `nofolder`, `failBusMissing`). |
| 160 | +3. Before sending an optional API call, consult the cache. If |
| 161 | + `unsupported`, skip the call entirely. If unknown, send once. |
| 162 | +4. Once an endpoint is marked `unsupported`, demote its `failUnknownAPI` |
| 163 | + log line to debug for the duration of the connection. |
| 164 | +5. Reset the cache on disconnect, since a different device may be the |
| 165 | + peer next time. |
| 166 | + |
| 167 | +Targets to gate on connect: `pubtopics`, `datetime?UTC=...`, |
| 168 | +`filelist/local`, `filelist/local/logs`, `datalog?action=status`. These are |
| 169 | +the calls the probe showed firing unconditionally on old firmware. |
| 170 | + |
| 171 | +The **`datalog` count=6** needs specific attention: it suggests a retry |
| 172 | +loop is running somewhere. Once the capability is cached as `unsupported`, |
| 173 | +that loop must short-circuit instead of retrying. |
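The cache rules in this section can be sketched as a small class. This is an illustration under assumptions, not the final API: `EndpointCapabilityCache`, `JsonResult` and the method names are hypothetical, and the response is assumed to expose `rslt` and `error` as plain string fields.

```typescript
type Capability = "ok" | "unsupported";

// Assumed minimal shape of a parsed JSON response.
interface JsonResult {
  rslt: string;
  error?: string;
}

class EndpointCapabilityCache {
  private caps = new Map<string, Capability>();

  // Rule 1: key by endpoint with the query string stripped.
  static keyFor(request: string): string {
    const q = request.indexOf("?");
    return q >= 0 ? request.substring(0, q) : request;
  }

  // Rule 2: only rslt=ok and failUnknownAPI are capability signals.
  record(request: string, resp: JsonResult): void {
    const key = EndpointCapabilityCache.keyFor(request);
    if (resp.rslt === "ok") {
      this.caps.set(key, "ok");
    } else if (resp.rslt === "fail" && resp.error === "failUnknownAPI") {
      this.caps.set(key, "unsupported");
    }
  }

  // Rule 3: skip endpoints known to be unsupported; unknown ones are sent.
  shouldSend(request: string): boolean {
    return this.caps.get(EndpointCapabilityCache.keyFor(request)) !== "unsupported";
  }

  // Rule 5: forget everything on disconnect.
  reset(): void {
    this.caps.clear();
  }
}
```

A retry loop such as the one behind the `datalog` count would check `shouldSend("datalog?action=status")` at the top of each iteration and exit once it returns `false`.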
| 174 | + |
| 175 | +### C. Dual-layout devbin parser |
| 176 | + |
| 177 | +Refactor `DeviceManager.handleClientMsgBinary` to support two record body |
| 178 | +modes, selected by `bodyMode` from (A): |
| 179 | + |
| 180 | +- `lengthPrefixed`: `[deviceSeq:1][sampleLen:1][sampleData:sampleLen]...` |
| 181 | +- `legacyRaw`: fixed-size `[timestamp:2][payload:fixedSize]...` |
| 182 | + |
| 183 | +Decoder rules: |
| 184 | + |
| 185 | +1. After reading `statusBus`/`address`/`devTypeIdx`, fetch |
| 186 | + `DeviceTypeInfo`. For `legacyRaw`, compute the sample stride as |
| 187 | + `2 (timestamp) + sum(struct sizes from resp.a)`, falling back to |
| 188 | + `resp.b` only if the schema cannot be sized. |
| 189 | +2. Bound every sample to its own `[start, end]` range when calling the |
| 190 | + attribute decoder, so malformed data cannot walk past the record. |
| 191 | + Replace any throw-on-overrun with a throttled warning + skip. This |
| 192 | + protects against the `RangeError: Offset is outside the bounds of the |
| 193 | + DataView` symptom regardless of which layout is in use. |
| 194 | +3. Specifically for Cog v1.9.5 light sensor: trust the schema-derived |
| 195 | + size, not `resp.b`, because that firmware double-reports the payload |
| 196 | + size in metadata. (Carried over from the sibling implementation; needs |
| 197 | + one verification capture in this codebase.) |
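Decoder rules 1 and 2 for the `legacyRaw` layout can be illustrated with a helper that slices a record body into per-sample ranges and never steps past the record end. This is a sketch only: `legacySampleRanges` and `SampleRange` are hypothetical names, `payloadSize` stands in for whatever the schema-derived size turns out to be, and the ranges here use an exclusive `end`.

```typescript
interface SampleRange {
  start: number; // inclusive
  end: number;   // exclusive
}

// Split a legacy record body into fixed-stride sample ranges.
// A trailing partial sample is dropped rather than decoded.
function legacySampleRanges(
  bodyStart: number,
  bodyEnd: number,
  payloadSize: number,
): SampleRange[] {
  const TIMESTAMP_LEN = 2; // [timestamp:2] precedes each fixed payload
  const ranges: SampleRange[] = [];
  if (payloadSize <= 0 || bodyEnd <= bodyStart) return ranges;
  const stride = TIMESTAMP_LEN + payloadSize;
  for (let pos = bodyStart; pos + stride <= bodyEnd; pos += stride) {
    ranges.push({ start: pos, end: pos + stride });
  }
  return ranges;
}
```

Each range would then be handed to the attribute decoder as its hard limit, so a wrong `payloadSize` produces skipped samples and a warning instead of a `RangeError`.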
| 198 | + |
| 199 | +### D. Device key disambiguation for legacy direct devices |
| 200 | + |
| 201 | +Old firmware publishes multiple direct-connected devices on bus 0 / |
| 202 | +address 0. To keep them distinct without breaking the existing `bus_addr` |
| 203 | +key scheme everywhere: |
| 204 | + |
| 205 | +- For `legacyRaw` records where `busNum == 0 && devAddr == 0`, build the |
| 206 | + key as `0_0_<devTypeIdx>`. |
| 207 | +- Continue to use `bus_addr` for all `lengthPrefixed` records and for any |
| 208 | + legacy record where bus or address is non-zero. |
| 209 | +- When sending commands, always use the stored `DeviceState.busName` and |
| 210 | + `DeviceState.deviceAddress` rather than re-parsing the displayed key, |
| 211 | + so a command for compatibility key `0_0_2` is correctly sent to bus 0 / |
| 212 | + address 0. |
| 213 | +- The rate-limit cache in `getDeviceTypeInfo` keys off `deviceKey`. With |
| 214 | + the disambiguated key the rate limiter naturally avoids the "one |
| 215 | + failure poisons three devices" symptom seen in the original log. |
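The key rule in this section is small enough to state as a pure function. The sketch below assumes the address is rendered as unpadded lowercase hex in the `bus_addr` key, which matches `0_0` but is otherwise a guess; `buildDeviceKey` is a hypothetical name.

```typescript
type BodyMode = "lengthPrefixed" | "legacyRaw";

// Build the device key, adding the devTypeIdx suffix only for legacy
// direct-connected devices that would otherwise all collide on "0_0".
function buildDeviceKey(
  bodyMode: BodyMode,
  busNum: number,
  devAddr: number,
  devTypeIdx: number,
): string {
  if (bodyMode === "legacyRaw" && busNum === 0 && devAddr === 0) {
    return `0_0_${devTypeIdx}`;
  }
  // Assumed formatting: decimal bus, unpadded lowercase hex address.
  return `${busNum}_${devAddr.toString(16)}`;
}
```

Because the suffix is one-way, command routing should read bus and address from the stored device state, as the third bullet above says, rather than splitting the key on `_`.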
| 216 | + |
| 217 | +### E. `devman/typeinfo` URL fallback |
| 218 | + |
| 219 | +Two-form request strategy in `executeDeviceTypeInfoRequest`: |
| 220 | + |
| 221 | +1. If `bodyMode == legacyRaw` for the originating record, send the legacy |
| 222 | + URL directly: |
| 223 | + ``` |
| 224 | + devman/typeinfo?bus=<busNum>&type=<devTypeIdx> |
| 225 | + ``` |
| 226 | +2. Otherwise send the current URL: |
| 227 | + ``` |
| 228 | + devman/typeinfo?deviceid=<bus>_<addrHex> |
| 229 | + ``` |
| 230 | +3. On `rslt=fail` with `error=failBusMissing` against the current URL, |
| 231 | + transparently retry once with the legacy URL form. Cache the chosen |
| 232 | + form per connection so we don't pay the failed first request more than |
| 233 | + once. |
| 234 | + |
| 235 | +Because every callsite that needs typeinfo already knows `busNum` and |
| 236 | +`devTypeIdx` from the record header, no extra state needs to be threaded |
| 237 | +through. The existing `getDeviceTypeInfo(deviceKey)` signature can stay |
| 238 | +if the chosen URL form is selected from a small per-key context map |
| 239 | +populated when the record header is parsed. |
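The two-form strategy can be split into three pure decisions: which URL form to try first, how to build each URL, and when a failure warrants the single legacy retry. The sketch below shows those pieces in isolation; all names (`TypeInfoContext`, `typeInfoUrl`, `initialForm`, `shouldRetryWithLegacy`) are hypothetical, and the unpadded hex address in `deviceid` is an assumption.

```typescript
type BodyMode = "lengthPrefixed" | "legacyRaw";
type TypeInfoUrlForm = "current" | "legacy";

// Per-key context captured when the record header is parsed.
interface TypeInfoContext {
  busNum: number;
  devAddr: number;
  devTypeIdx: number;
  bodyMode: BodyMode;
}

interface JsonResult {
  rslt: string;
  error?: string;
}

// Build the request for the chosen form.
function typeInfoUrl(form: TypeInfoUrlForm, ctx: TypeInfoContext): string {
  return form === "legacy"
    ? `devman/typeinfo?bus=${ctx.busNum}&type=${ctx.devTypeIdx}`
    : `devman/typeinfo?deviceid=${ctx.busNum}_${ctx.devAddr.toString(16)}`;
}

// Prefer the form cached for this connection; otherwise derive it from bodyMode.
function initialForm(ctx: TypeInfoContext, cached?: TypeInfoUrlForm): TypeInfoUrlForm {
  if (cached !== undefined) return cached;
  return ctx.bodyMode === "legacyRaw" ? "legacy" : "current";
}

// Only a failBusMissing answer to the current form triggers the legacy retry.
function shouldRetryWithLegacy(sent: TypeInfoUrlForm, resp: JsonResult): boolean {
  return sent === "current" && resp.rslt === "fail" && resp.error === "failBusMissing";
}
```

The async request path would call `initialForm`, send `typeInfoUrl(...)`, and on `shouldRetryWithLegacy` send the legacy URL once and store `"legacy"` as the cached form for the rest of the connection.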
| 240 | + |
| 241 | +### F. Tests |
| 242 | + |
| 243 | +Add to `src/RaftDeviceManager.test.ts`: |
| 244 | + |
| 245 | +- current length-prefixed records decode correctly (envelope present) |
| 246 | +- legacy raw records decode correctly (no envelope) |
| 247 | +- legacy direct-device records with bus/addr `0_0` and distinct |
| 248 | + `devTypeIdx` stay distinct (key disambiguation) |
| 249 | +- malformed sample data inside a record is bounded and skipped (no throw) |
| 250 | +- capability cache marks `pubtopics` unsupported on first |
| 251 | + `failUnknownAPI` and skips the second call |
| 252 | + |
| 253 | +Use the captured `firstBytes` previews (above) as the basis for the |
| 254 | +binary fixtures. |
| 255 | + |
| 256 | +## Out of Scope |
| 257 | + |
| 258 | +- Changes to firmware. Cog v1.9.5 stays in the field as-is. |
| 259 | +- Changes to the dashboard UI. Once the library decodes legacy frames, |
| 260 | + the existing panels populate without modification. |
| 261 | +- Removing the temporary `_DevbinCompatProbe.ts` instrumentation. That |
| 262 | + stays in place until the implementation in this design is verified |
| 263 | + against both firmwares, then is removed by deleting the file and the |
| 264 | + `[COMPAT-PROBE]` tagged call sites. |
| 265 | + |
| 266 | +## Files To Change |
| 267 | + |
| 268 | +- `src/RaftDeviceManager.ts` |
| 269 | + - envelope-presence discriminator and dual-layout sample loop |
| 270 | + - `0_0_<devTypeIdx>` key disambiguation for legacy direct devices |
| 271 | + - `executeDeviceTypeInfoRequest` two-form URL strategy |
| 272 | +- `src/RaftAttributeHandler.ts` / `src/RaftCustomAttrHandler.ts` |
| 273 | + - bounded sample decoding, replace overrun throws with skip+warn |
| 274 | +- `src/RaftMsgHandler.ts` (or `src/RaftSystemUtils.ts`) |
| 275 | + - per-endpoint capability cache; suppress repeated `failUnknownAPI` |
| 276 | +- `src/RaftSystemUtils.ts` / `src/RaftConnector.ts` |
| 277 | + - gate `pubtopics`, `datetime`, `filelist/local`, `filelist/local/logs`, |
| 278 | + `datalog?action=status` on the capability cache; fix the `datalog` |
| 279 | + retry loop so it honours `unsupported` |
| 280 | +- `src/RaftDeviceManager.test.ts` |
| 281 | + - new fixtures and assertions per section F |
| 282 | + |
| 283 | +## Cross-Reference |
| 284 | + |
| 285 | +A working implementation of the dual-layout parser and `0_0_<devTypeIdx>` |
| 286 | +disambiguation exists locally at |
| 287 | +`C:\Users\rob\Documents\rdev\1\SortRaftJsIssues\raftjs-robotical-main\` |
| 288 | +(see `devdocs/devbin-backwards-compatibility.md` in that repo). Use it as |
| 289 | +a reference when porting; the design above supersedes its envelope |
| 290 | +selection rule (presence-based, not version-nibble-based) and its |
| 291 | +typeinfo fallback (use the legacy URL form, not a bundled static table). |