AI firmware analysis: improvement proposals based on architecture review (Claude Opus 4.6) #72

@ensrationis

Note from @ensrationis: this is a preliminary analysis that also contains errors, but it is better to have it in front of us as a starting point for discussion.

Context

Analysis based on the Altruist → Connectivity → RoSeMAN → sensors.social architecture scheme and a full code review of the firmware repository. Each finding is mapped to a specific step in the architecture diagram and references exact files/lines.


🔴 Critical (Security)

1. Private key stored as plaintext in SPIFFS

  • config_manager/config_helpers.cpp:78-90: saveRobonomicsPrivateKey() writes the key as a string to /config.json
  • config_manager/config_defaults.cpp:31: char private_key[65] = "Not Set"
  • Problem: physical access to ESP32 = SPIFFS dump = Robonomics identity theft. On the scheme this is step 2 (signed extrinsic) — a stolen key allows an attacker to sign on behalf of the device.
  • Suggestion: use NVS encryption API (ESP-IDF supports flash encryption on ESP32-C6) or store the key in an eFuse-protected NVS partition.

2. OTA update without cryptographic verification

  • OTA_Update.cpp:60-100+ — only MD5 hash is checked
  • OTA_Update.cpp:40 — firmware downloaded over HTTP (not HTTPS)
  • Problem: MITM attack can substitute firmware. This undermines the entire chain of trust in the architecture.
  • Suggestion: implement ECDSA firmware signature verification before flashing. ESP32 Secure Boot v2 supports this natively.

🟡 Important (Reliability)

3. No data buffering when WiFi is lost

  • airrohr-firmware.ino:583-598 — if WiFi is down, data is simply skipped
  • SD card logging exists only on Insight; on Urban — measurements are lost
  • Problem: steps 2 and 3 on the scheme are interrupted. With unstable WiFi (remote installations) measurements are lost.
  • Suggestion: implement a ring buffer in RTC RAM (survives deep sleep, 8KB available) or NVS-backed queue. On WiFi restoration — send accumulated data.
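A minimal sketch of the ring buffer idea from item 3, in plain C++ so it can be checked off-device. The Measurement struct and the MeasurementRing class are hypothetical names, not taken from the firmware, and the record layout is an assumption. When the buffer is full, the oldest record is overwritten, so the most recent readings survive a long outage.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size record; the firmware's real payload layout may differ.
struct Measurement {
    uint32_t timestamp;  // Unix time in seconds
    float pm25;
    float pm10;
};

// Fixed-capacity ring buffer. When full, push() overwrites the oldest record.
template <std::size_t Capacity>
class MeasurementRing {
public:
    void push(const Measurement& m) {
        // When full, (head_ + count_) % Capacity equals head_, i.e. the oldest slot.
        items_[(head_ + count_) % Capacity] = m;
        if (count_ < Capacity) {
            ++count_;
        } else {
            head_ = (head_ + 1) % Capacity;  // oldest record was overwritten
        }
    }

    // Removes the oldest record into `out`; returns false if the buffer is empty.
    bool pop(Measurement& out) {
        if (count_ == 0) return false;
        out = items_[head_];
        head_ = (head_ + 1) % Capacity;
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }

private:
    std::array<Measurement, Capacity> items_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};
```

On the device, an instance would presumably be placed in RTC memory (for example with the ESP-IDF RTC_DATA_ATTR attribute) and drained with pop() once WiFi is back. With 12-byte records, a few hundred entries fit in the available RTC RAM, but the exact capacity has to be checked against the target chip.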

4. Connectivity server pool is hardcoded

  • robonomics_servers.h:15-19 — exactly 3 servers, all REGION_GLOBAL
  • apis/robonomics_http_api.cpp:117-183: chooseRobonomicsServer() only polls these
  • Problem: step 3 on the scheme — if all 3 servers are down, the device cannot send data. No DNS-based discovery, no fallback.
  • Suggestion: add a configurable custom connectivity endpoint (similar to the existing custom RPC node option), plus DNS SRV record discovery as a fallback.

5. Timestamp divided by 100 — precision loss

  • apis/helpers/message_formatter.cpp:69-71: timestampStr.substring(0, timestampStr.length() - 2)
  • Problem: Unix timestamps 1609459200 and 1609459299 both become 16094592. On the connectivity side (altruist.py:43-55) verification also uses time.time()[:-2], but if clocks are desynchronized by >100 sec — signature verification fails. This also creates a 100-second replay attack window.
  • Suggestion: use full timestamps. If truncation is needed for compatibility with connectivity — document the reason and add a nonce.
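The truncation in item 5 can be reproduced off-device with a small sketch. truncateTimestamp is a hypothetical helper that mirrors the substring(0, length - 2) call using std::string instead of the Arduino String class. It is not the firmware's actual function.

```cpp
#include <cstdint>
#include <string>

// Drops the last two decimal digits of a Unix timestamp,
// which is equivalent to integer division by 100.
std::string truncateTimestamp(uint32_t unixSeconds) {
    std::string s = std::to_string(unixSeconds);
    return s.size() > 2 ? s.substr(0, s.size() - 2) : std::string();
}
```

Both truncateTimestamp(1609459200) and truncateTimestamp(1609459299) yield "16094592", while truncateTimestamp(1609459300) yields "16094593". If the verifier compares against its own truncated clock without tolerance (an assumption about the connectivity side), two clocks only one second apart can land in different buckets near such a boundary.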

6. Bug: shrinkToFit() may break Urban discovery on Insight

  • airrohr-firmware.ino:681-682: sensors_data.shrinkToFit() is called only on Urban
  • Problem: on Insight (which aggregates Urban data) if the condition changes — dynamic addition of Urban data will break due to insufficient JSON buffer capacity.
  • Suggestion: allocate a fixed size for Urban data or use a separate JsonDocument.

7. No fallback RPC endpoint for datalog

  • apis/robonomics_datalog_api.cpp:18 — setup with a single robonomics_public_node
  • Problem: step 2 on the scheme goes to one RPC node. If it's down — datalog is not written. Connectivity does fallback KSM→DOT (datalog_feeder.py:118), but the firmware does not.
  • Suggestion: add a fallback RPC endpoint. If the primary is unavailable — try the secondary (similar to how connectivity handles this).
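One way the fallback from item 7 could be structured, as a sketch only. sendWithFallback and the trySend callback are hypothetical names. The callback stands in for whatever call actually submits the datalog record to a given RPC node.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Tries each endpoint in order and stops at the first success.
// Returns the index of the endpoint that accepted the record, or -1 if all failed.
int sendWithFallback(const std::vector<std::string>& endpoints,
                     const std::function<bool(const std::string&)>& trySend) {
    for (std::size_t i = 0; i < endpoints.size(); ++i) {
        if (trySend(endpoints[i])) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

The same helper would also cover item 4 if the endpoint list comes from configuration instead of a hardcoded header. A return value of -1 is the natural point to hand the record to a local buffer instead of dropping it.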

🟢 Improvements (Architecture)

8. No exponential backoff on network errors

  • apis/robonomics_http_api.cpp:80-115 — POSTRequest: single 20 sec timeout, no retry
  • apis/robonomics_datalog_api.cpp:23-43 — single sendRWSDatalogRecord call, no retry
  • Problem: on transient network issues — data is lost. Steps 2, 3, 10 on the scheme — all without retry.
  • Suggestion: implement retry with exponential backoff (1s, 2s, 4s, max 30s). Don't block the main loop — use FreeRTOS task notification.
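The delay schedule suggested in item 8 (1s, 2s, 4s, capped at 30s) can be computed by a small pure function. backoffDelayMs is a hypothetical name, and the default base and cap values are taken from the suggestion above, not from the firmware.

```cpp
#include <algorithm>
#include <cstdint>

// Delay before retry number `attempt` (0-based): base, 2*base, 4*base, ...
// The result never exceeds maxMs.
uint32_t backoffDelayMs(uint32_t attempt,
                        uint32_t baseMs = 1000,
                        uint32_t maxMs = 30000) {
    uint32_t delay = baseMs;
    // Stop doubling once the cap is reached, which also avoids overflow.
    for (uint32_t i = 0; i < attempt && delay < maxMs; ++i) {
        delay *= 2;
    }
    return std::min(delay, maxMs);
}
```

With the defaults, attempts 0 to 4 wait 1000, 2000, 4000, 8000 and 16000 ms, and every later attempt waits 30000 ms. To keep the main loop responsive, the caller would store the next retry time and compare it against the current tick count instead of sleeping for the returned delay.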

9. AP mode with predictable password

  • defines.h:285 — AP password: 123456789
  • Problem: anyone within WiFi range can connect to the Altruist in setup mode and change the configuration.
  • Suggestion: generate a unique password from chip ID (already used in SSID), display it on LED or via QR code.

10. Forced reboot every 28 days

  • defines.h:94: DURATION_BEFORE_FORCED_RESTART_MS
  • Problem: this is a workaround for memory leaks. Better to find and fix them.
  • Suggestion: monitor esp_get_free_heap_size() and restart only when memory is actually low. Log heap usage for diagnostics.
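A sketch of the heap-based restart policy from item 10. HeapWatchdog is a hypothetical class. It takes the free-heap reading as a parameter so the logic can be tested without hardware. On the device the reading would come from esp_get_free_heap_size(). Requiring several consecutive low readings is an assumption intended to avoid restarting on a short allocation spike.

```cpp
#include <cstdint>

// Signals a restart only after `requiredLowReadings` consecutive samples
// fall below `minFreeBytes`.
class HeapWatchdog {
public:
    HeapWatchdog(uint32_t minFreeBytes, uint8_t requiredLowReadings)
        : minFreeBytes_(minFreeBytes), required_(requiredLowReadings) {}

    // Feed one free-heap sample; returns true when a restart is warranted.
    bool shouldRestart(uint32_t freeHeapBytes) {
        if (freeHeapBytes < minFreeBytes_) {
            if (low_ < required_) ++low_;  // saturate to avoid wrap-around
        } else {
            low_ = 0;  // heap recovered, start counting again
        }
        return low_ >= required_;
    }

private:
    uint32_t minFreeBytes_;
    uint8_t required_;
    uint8_t low_ = 0;
};
```

Logging each sample alongside the decision would give the diagnostics mentioned above and help locate the leak that the 28-day restart currently hides.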

11. esp-robonomics-client without version pinning

  • platformio.ini:34 — pulls latest master
  • Problem: a breaking change in the library will break the build without warning.
  • Suggestion: pin to a specific commit or version tag.

12. WiFiClientSecure without certificate validation

  • defines.h:17: BEARSSL_SSL_BASIC
  • No CA bundle is loaded anywhere
  • Problem: TLS connections (if used) are vulnerable to MITM.
  • Suggestion: embed root CA (Let's Encrypt ISRG Root) into firmware or use certificate pinning.

Priority mapped to architecture scheme

| Scheme step | What to improve | Priority |
|---|---|---|
| 1 (RWS subscription) | Private key protection (#1) | 🔴 |
| 2 (Signed extrinsic) | Fallback RPC (#7), retry (#8) | 🟡 |
| 3 (Signed msg → Connectivity) | Buffer (#3), dynamic pool (#4) | 🟡 |
| 5 (Data blocks) | OTA security (#2) | 🔴 |
| 10 (Batch hash) | Timestamp precision (#5) | 🟡 |

Analysis performed by Claude Opus 4.6 based on full repository code review and architecture diagram.
