| Board | Probe Serial | Flash Command |
|---|---|---|
| lora-1 | `0483:374e:003E00463234510A33353533` | `cargo run --release` (from `firmware/lora-1`) |
| lora-2 | `0483:374e:0026003A3234510A33353533` | `cargo run --release` (from `firmware/lora-2`) |

When both boards are connected, probe-rs will prompt for selection. Use:

```sh
probe-rs run --chip STM32WL55JCIx --probe 0483:374e:003E00463234510A33353533 target/thumbv7em-none-eabihf/release/lora-1
```

| Board | DevEUI |
|---|---|
| lora-1 | 23ce1bfeff091fac |
| lora-2 | 24ce1bfeff091fac |
- RAK7268V2 at `192.168.0.254` (WiFi network; changed 2026-04-28, was `10.10.10.254` on local Ethernet)
- MQTT broker on port 1883
- AU915 sub-band 1
If the gateway IP changes (e.g. moving from a dedicated Ethernet segment to the main WiFi LAN), update all of the following — the firmware does not hardcode the gateway IP (LoRaWAN is radio, not IP), but the backend does:
| File | Location | What to change |
|---|---|---|
| `mqtt_to_influx.py` | line 23, `GATEWAY_MQTT_HOST` | New gateway IP |
| `NOTES.md` | Gateway section (above) | Update for reference |
| `banner.svg` | Bottom label row | Update the `10.x.x.x` label if you regenerate |

After changing `mqtt_to_influx.py`, restart the bridge:

```sh
docker compose restart mqtt-bridge
docker compose logs -f mqtt-bridge   # confirm "Connected to MQTT broker at <new-ip>:1883"
```

The Grafana dashboard and InfluxDB are unaffected: they talk to the bridge container, not directly to the gateway.
```
application/TOT/device/23ce1bfeff091fac/rx   # lora-1 uplinks
application/TOT/device/24ce1bfeff091fac/rx   # lora-2 uplinks
```

Both boards use the same algorithm (see src/main.rs main loop):
- Initial join: single attempt at startup before entering main loop
- Exponential backoff: starts at 16s (~8 ticks × 2s), doubles on each failure, caps at 60s (30 ticks)
- Backoff resets to 16s on successful join
- Gateway loss → rejoin: 3 consecutive missed ACKs set `is_joined = false` and reset the backoff
- lora-1: every 5th uplink is confirmed; the failure counter only increments on missed ACKs and only resets on a confirmed ACK
- lora-2: every uplink is confirmed (range probe: end-to-end ACK verification on every packet)
- `join_attempt` counter resets to 0 on every rejoin trigger so the OLED display stays meaningful
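
The backoff rule in the list above can be sketched as a small pure function. This is a host-side illustration, not the firmware's code: the names `next_backoff_ticks` and `backoff_sequence` are hypothetical, and the tick values (8 ticks to start, 30-tick cap, 2 s per tick) are taken from the bullet describing the backoff.

```rust
/// Backoff parameters from the notes, expressed in 2 s loop ticks.
const BACKOFF_START_TICKS: u32 = 8; // ~16 s
const BACKOFF_CAP_TICKS: u32 = 30; // 60 s

/// Hypothetical helper: backoff to use after a join attempt.
/// A successful join resets to the start value; a failure doubles up to the cap.
fn next_backoff_ticks(current: u32, join_succeeded: bool) -> u32 {
    if join_succeeded {
        BACKOFF_START_TICKS
    } else {
        current.saturating_mul(2).min(BACKOFF_CAP_TICKS)
    }
}

/// Waits (in ticks) that precede each of `failures` consecutive failed joins.
fn backoff_sequence(failures: usize) -> Vec<u32> {
    let mut ticks = BACKOFF_START_TICKS;
    let mut waits = Vec::with_capacity(failures);
    for _ in 0..failures {
        waits.push(ticks);
        ticks = next_backoff_ticks(ticks, false);
    }
    waits
}
```

Under these assumptions, four consecutive failures wait 8, 16, 30 and 30 ticks, i.e. roughly 16 s, 32 s, 60 s and 60 s.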
Why 60s cap (not 10 min): Field testing requires frequent power cycles. The old 10-min cap meant after 4–5 missed join attempts the board appeared stuck for up to 10 minutes. With a 60s cap, worst case is one missed attempt then back online within a minute.
Sub-band bias: Both boards call set_join_bias_and_noncompliant_retries(Subband::_1, 20) — the first 20 join attempts stay on sub-band 1 (matching the gateway config) before falling back to spec-compliant rotation across all sub-bands. set_join_bias (the old call) only biased the first attempt; any transient RF miss on that one attempt caused all retries to scatter across wrong sub-bands.
lora-2 was originally fitted with a BME688 (pressure/humidity/gas) sensor. It was replaced with an SHT41 (temperature/humidity only) to match lora-1. Both boards are now identical hardware. Firmware differs: lora-2 sends all uplinks confirmed and transmits a TX counter rather than sensor readings (it serves as a range probe, not a sensor node).
Why the BME688 caused problems:
The BME688 shares I2C2 with the SH1106 OLED. On every power-up, the BME688 holds SDA low mid-transaction (internal boot sequence), which corrupts the I2C bus state before the display is even touched. The SH1106's flush() sends a large (~1KB) I2C burst which NACKs after ~32 bytes when the bus is dirty. Result: first 2 display lines OK, rest gibberish.
Attempted fixes (all failed):
- `bus_recover()` (toggle SCL 9×) before display writes: BME688 re-grabbed SDA between lines
- Sensor AFTER display: same NACK cascade from previous iteration's bus state
- One I2C instance per function call vs per transaction: made things worse
- Custom raw I2C driver (8-byte max writes): worked in isolation (`display_test` binary) but failed in main loop once LoRaWAN and sensor traffic shared the bus
Root cause: sh1106::flush() sends the full 1KB framebuffer in one I2C transaction. This works fine when the bus is idle (lora-1, SHT41 only), but the BME688 periodically re-asserts SDA during its own internal state transitions, NACKing the flush mid-frame.
Fix: Replace BME688 with SHT41. SHT41 never holds SDA unexpectedly, bus stays clean, flush() works every time.
Lesson: When two I2C devices share a bus and one is pathologically poorly behaved (BME688 at power-up), the only reliable fix is hardware isolation (separate I2C bus) or removing the offending device.
Bosch full compensation requires reading calibration NVM coefficients from two register blocks (0x8A and 0xE1). This was attempted, but `press_raw` read 0x80000 (524288), the BME680 "data not ready" sentinel, so the compensation had no valid sample to work on.
Reverted to a linear scaling approach:
```rust
let press_pa = ((press_raw * 195) / 1000) as u32;
*pressure_int = (press_pa / 100) as i16;
```

The multiplier 195 was back-calculated from: Brisbane actual ~1021 hPa vs old formula (×295) giving ~1546 hPa → correction factor 1021/1546 ≈ 0.660 → 295 × 0.660 ≈ 195. Good enough for relative readings; will drift slightly with weather.
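
The back-calculation can be checked with plain integer arithmetic. The sketch below is a host-side check rather than firmware code; `scale_hpa` is a hypothetical helper that mirrors the linear scaling, and `PRESS_RAW_SENTINEL` is the 0x80000 "data not ready" value mentioned earlier. One thing it suggests verifying on hardware: feeding the sentinel through the old ×295 multiplier yields 1546 hPa, the same figure quoted for the old formula, so the raw value behind that reading may have been the sentinel rather than a live sample.

```rust
/// The BME680 pressure registers read 0x80000 when no sample is available.
const PRESS_RAW_SENTINEL: u32 = 0x80000;

/// Hypothetical helper mirroring the linear scaling in the notes:
/// raw value -> pascals (raw * multiplier / 1000) -> whole hPa.
fn scale_hpa(press_raw: u32, multiplier: u32) -> i16 {
    let press_pa = (press_raw * multiplier) / 1000;
    (press_pa / 100) as i16
}
```

With these definitions, `scale_hpa(PRESS_RAW_SENTINEL, 295)` evaluates to 1546 and `scale_hpa(PRESS_RAW_SENTINEL, 195)` to 1022.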
The sensor on lora-2 is a BME688 (chip ID 0x61), not a BME680. The BME688 uses a different gas scanning architecture — the run_gas_l / run_gas_h bits and parallel mode gas scanning are not compatible with the simple BME680 forced-mode gas register approach.
Attempted:
- Proper Bosch `calc_res_heat` formula with calibration coefficients (`par_gh1/2/3`, `res_heat_range`, `res_heat_val`): correctly computes `res_heat=0x67` for 320°C target at 28°C ambient
- Both `run_gas_l` (0x10) and `run_gas_h` (0x20) in CTRL_GAS_1
Result: gas_valid=0, heat_stab=0 always — measurement skipped by hardware. Properly supporting BME688 gas requires implementing the full BME688 scanning mode API. Gas resistance will read 0 until then.
lora-1 has an explicit Timer::after_secs(2).await at the end of each loop iteration. lora-2 originally relied on the BME680 blocking read (~2s) for loop timing — but if reads complete faster, the loop runs too fast. Fixed by adding the same Timer::after_secs(2).await to lora-2, making both boards consistent: sensor read every ~4s, uplink every ~60s (UPLINK_INTERVAL=15 × ~4s/tick).
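SF is the core LoRa modulation parameter. It sets how many bits each symbol carries and how long that symbol lasts: one symbol encodes SF bits and is spread over 2^SF chips, so each step up in SF roughly doubles the time-on-air: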
| SF | Time-on-air (12 byte payload) | Approx. throughput | Link budget vs SF7 |
|---|---|---|---|
| SF7 | ~50 ms | ~5.5 kbps | baseline |
| SF8 | ~100 ms | ~3.1 kbps | +2.5 dB |
| SF9 | ~185 ms | ~1.8 kbps | +5 dB |
| SF10 | ~370 ms | ~980 bps | +7.5 dB |
| SF11 | ~740 ms | ~440 bps | +10 dB |
| SF12 | ~1500 ms | ~250 bps | +12.5 dB |
Higher SF = better range, but longer airtime = more power, less capacity, and stricter duty cycle limits.
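
The airtime figures in the table are approximate. A minimal sketch of the time-on-air formula from the Semtech SX127x datasheet is given below for checking them. The function name `time_on_air_ms` and the parameter choices are assumptions not stated in these notes: 8-symbol preamble, explicit header, CRC on, coding rate 4/5, and low-data-rate optimisation enabled whenever the symbol time exceeds 16 ms.

```rust
/// LoRa time-on-air in milliseconds (Semtech SX127x datasheet formula).
/// Assumes: 8-symbol preamble, explicit header, CRC on, coding rate 4/5.
fn time_on_air_ms(sf: u32, bandwidth_hz: f64, phy_payload_len: u32) -> f64 {
    // One symbol spans 2^SF chips at one chip per Hz of bandwidth.
    let t_sym_ms = f64::from(1u32 << sf) / bandwidth_hz * 1000.0;
    // Low-data-rate optimisation is assumed on when a symbol exceeds 16 ms.
    let low_dr_opt = if t_sym_ms > 16.0 { 1.0 } else { 0.0 };
    let preamble_ms = (8.0 + 4.25) * t_sym_ms;

    // Payload symbols: 8 + max(ceil((8*PL - 4*SF + 28 + 16) / (4*(SF - 2*DE))) * 5, 0)
    let numerator = 8.0 * f64::from(phy_payload_len) - 4.0 * f64::from(sf) + 28.0 + 16.0;
    let denominator = 4.0 * (f64::from(sf) - 2.0 * low_dr_opt);
    let coded_symbols = (numerator / denominator).ceil() * 5.0;
    let payload_symbols = 8.0 + coded_symbols.max(0.0);

    preamble_ms + payload_symbols * t_sym_ms
}
```

The result depends on what is counted as payload. A 12-byte application payload plus the 13 bytes of LoRaWAN framing is a 25-byte PHY payload, for which `time_on_air_ms(12, 125_000.0, 25)` gives about 1483 ms, close to the SF12 row. Treating the 12 bytes as the whole PHY payload at SF7 gives about 41 ms.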
DR is an index that bundles SF + bandwidth. Uplink channels (sub-band 1, 125 kHz):
| DR | SF | BW | Max payload |
|---|---|---|---|
| DR0 | SF12 | 125 kHz | 59 bytes |
| DR1 | SF11 | 125 kHz | 59 bytes |
| DR2 | SF10 | 125 kHz | 59 bytes |
| DR3 | SF9 | 125 kHz | 123 bytes |
| DR4 | SF8 | 125 kHz | 250 bytes |
| DR5 | SF7 | 125 kHz | 250 bytes |
| DR6 | SF8 | 500 kHz | 250 bytes (join only) |
By default, devices join at DR0 (SF12) for maximum range.
ADR is the mechanism by which the network server automatically moves a device to a higher DR (lower SF) when signal quality permits. The goal is minimum airtime and power use.
How it works:
- Device sends uplinks with the `ADR=1` bit set in the frame header (FCtrl field)
- Network server accumulates SNR/RSSI history (typically 20 frames)
- When average SNR has enough margin above the minimum for the current DR, server sends a `LinkADRReq` MAC command in a downlink
- `LinkADRReq` specifies new DR and TX power
- Device acknowledges with `LinkADRAns` and switches
ADR margin: The server keeps a safety margin (typically 15 dB) so a brief fade doesn't immediately cause packet loss. ADR steps up aggressively, steps down conservatively.
Why our boards stay at DR0 (SF12):
- SNR of +4–+7 dB is strong, so ADR should step up
- Likely causes: ADR not enabled on the RAK gateway application, or the RAK built-in LoRa server has conservative ADR settings
- Check: RAK gateway UI → Application → Device → ADR enabled?
What to watch on the OLED: Once ADR kicks in, line 5 will change from DR0 SF12 → DR1 SF11 → ... → DR5 SF7 over successive uplinks. Each step is a LinkADRReq downlink from the gateway.
RSSI (Received Signal Strength Indicator): total received power in dBm. More negative = weaker. -120 dBm is near the noise floor, -20 dBm is very strong.
SNR (Signal-to-Noise Ratio): signal power relative to noise floor, in dB. LoRa can decode below 0 dB SNR, a consequence of its chirp spread-spectrum modulation:
| SF | Minimum decodable SNR |
|---|---|
| SF7 | -7.5 dB |
| SF8 | -10 dB |
| SF9 | -12.5 dB |
| SF10 | -15 dB |
| SF11 | -17.5 dB |
| SF12 | -20 dB |
Our boards at +4–+7 dB SNR have ~24–27 dB of margin at SF12 — plenty for ADR to push to SF7.
For field testing: walk away from the gateway and watch RSSI drop and SNR fall. The SNR margin tells you how far you can go before packet loss. If SNR reaches the minimum for the current SF, you will see lost packets before the network server can move the node to a higher SF (lower DR).
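
The margin arithmetic can be written down directly from the SNR table. This is a minimal sketch; `min_snr_db` and `snr_margin_db` are hypothetical helper names, and the limits are the ones listed in the table (−7.5 dB at SF7, stepping down 2.5 dB per SF).

```rust
/// Minimum decodable SNR in dB for SF7..=SF12, per the table in these notes.
/// Returns None for spreading factors outside that range.
fn min_snr_db(sf: u8) -> Option<f32> {
    match sf {
        7..=12 => Some(-7.5 - 2.5 * f32::from(sf - 7)),
        _ => None,
    }
}

/// Link margin: how far the measured SNR sits above the decode limit.
fn snr_margin_db(sf: u8, measured_snr_db: f32) -> Option<f32> {
    min_snr_db(sf).map(|limit| measured_snr_db - limit)
}
```

For example, `snr_margin_db(12, 4.0)` is `Some(24.0)` and `snr_margin_db(12, 7.0)` is `Some(27.0)`, matching the 24–27 dB quoted for the boards at SF12.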
A common point of confusion when reading the lora-2 OLED during a walk test:
DR0 is not a signal quality measurement. It is the modulation setting: SF12 + 125 kHz bandwidth. It tells you how the radio is transmitting, not how well the signal arrived. Think of it as the gear the radio is in. DR steps up (DR1, DR2...) only when the node receives a LinkADRReq downlink from the gateway.
SNR is signal quality: how far above the noise floor the received signal sits. This is the number to watch during a range test.
The two are not interchangeable. DR0 simply means ADR has not yet stepped up the modulation.
You will notice the OLED and Grafana report different SNR values for the same board. This is not a bug. They measure two different radio paths:

| | OLED (e.g. +4 dB) | Grafana (e.g. +11 dB) |
|---|---|---|
| Measured by | Node receiver | Gateway receiver |
| Which signal | Gateway → Node (downlink ACK) | Node → Gateway (uplink) |
| Frequency | 923–928 MHz (RX1) | 915–928 MHz |
| Bandwidth | 500 kHz (AU915 RX1) | 125 kHz |
The gateway hears the node better than the node hears the gateway because:
- The gateway has a superior antenna, low-noise amplifier, and is typically mounted high
- AU915 RX1 downlinks use 500 kHz bandwidth — wider bandwidth means more noise, lower SNR at the node
- The node's receiver is a modest embedded radio, not a base-station-grade front end
Both readings are valid. Use the OLED SNR for field testing (it tells you what the node is experiencing). Use Grafana SNR for post-analysis (it tells you what the gateway received).
You do not need to manually record RSSI, SNR, or DR during the test. InfluxDB timestamps every uplink to the second. All signal data is already there.
Your field notes only need location and time:
09:14 Left gateway (0 m)
09:22 End of driveway (~200 m)
09:31 Front paddock gate (~500 m)
09:45 Creek crossing (~900 m)
09:58 OLED shows Connecting... — coverage edge
10:06 Signal resumed walking back
Back at the desk, open Grafana and overlay your timestamps against:
- RSSI — shows signal degradation over the walk
- SNR — shows when you approached the decoding limit
- Frame count — gaps reveal exactly which uplinks were lost and at what time
lora-2 sends a confirmed uplink every ~10s, so the Grafana timeline has enough resolution to match waypoint times precisely. The frame count gap tells you the exact moment and location of first packet loss.
Watch SNR, not just RSSI. RSSI can read −100 dBm while SNR is still healthy. Conversely, SNR can collapse even when RSSI looks reasonable in a noisy RF environment.
At DR0 (SF12) the decoding limit is −20 dB SNR. In practice expect packet loss to begin around −15 dB SNR due to multipath and fading. When the OLED SNR approaches 0 dB and keeps falling, you are getting close to the edge.
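Coverage boundary = where Connecting... first appears consistently on the OLED. Mark the time, walk back, and let the board rejoin (16–60s backoff). The TX counter will resume from where it left off: it is the application counter carried in the payload, so the rejoin does not reset it (the OTAA join itself derives new session keys and restarts the LoRaWAN frame counters).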
Setup:
- Gateway: RAK7268V2 mounted on rooftop, altitude 85 m, GPS −27.425958°, 153.051761°
- Node: lora-2 (range probe), AU915 DR0/SF12, confirmed uplink every ~10 s
- Environment: suburban residential, hilly terrain
- Signal data source: InfluxDB (246 records captured, gateway SNR)
| Waypoint | Time | Distance | Altitude | RSSI | SNR (GW) | Notes |
|---|---|---|---|---|---|---|
| WP1 Home/GW | 09:40 | 0 m | 63 m | −17 dBm | +13.8 dB | Baseline — gateway on roof above |
| WP2 | 10:08 | 63 m | 77 m | −70 dBm | +9.5 dB | Still good |
| WP3 Church | 10:15 | 256 m | 89 m | −89 dBm | +4.2 dB | Highest elevation point — near GW roof level |
| WP4 | 10:22 | 380 m | 87 m | −102 dBm | −6.5 dB | SNR crossed 0 dB — marginal |
| WP5 | 10:26 | 466 m | 82 m | −103 dBm | −19.5 dB | At SF12 decode limit |
| WP6 | 10:31 | 593 m | 75 m | −103 dBm | −20.8 dB | Below SF12 limit — packet loss |
| WP7 | 10:37 | 716 m | 48 m | −103 dBm | −19.2 dB | Lowest altitude — worst reception |
| WP8 | 10:42 | 743 m | 64 m | −96 dBm | +0.5 dB | Climbed 16 m → signal recovered |
| WP9 | 10:50 | 921 m | 77 m | −103 dBm | −13.2 dB | Furthest point — still receiving |
Gaps > 30 s indicate missed uplinks (node retrying or out of coverage):
| Time (AEST) | Gap | RSSI before gap | SNR before gap | Interpretation |
|---|---|---|---|---|
| 09:50 → 09:55 | 307 s | −18 dBm | +14.2 dB | Gateway being moved to roof |
| 10:15 → 10:17 | 112 s | −90 dBm | +4.2 dB | Church — momentary obstruction |
| 10:27 → 10:30 | 158 s | −104 dBm | −11.5 dB | Approaching coverage edge |
| 10:30 → 10:36 | 354 s | −103 dBm | −20.8 dB | Below SF12 limit — heavy loss |
| 10:36 → 10:40 | 242 s | −103 dBm | −19.2 dB | Downhill, terrain masking |
| 10:50 → 10:54 | 254 s | −100 dBm | −3.0 dB | Furthest point, marginal |
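
A gap table like this one can be produced from the uplink timestamps alone. The sketch below shows one way to do it, assuming the timestamps have already been exported from InfluxDB as seconds in ascending order. The `Gap` type and `find_gaps` function are hypothetical names, not part of the existing backend.

```rust
/// A gap between two consecutive uplinks, in seconds since the start of the capture.
#[derive(Debug, PartialEq)]
struct Gap {
    last_seen_s: u64,
    resumed_s: u64,
}

impl Gap {
    /// Length of the gap in seconds.
    fn duration_s(&self) -> u64 {
        self.resumed_s.saturating_sub(self.last_seen_s)
    }
}

/// Return every gap longer than `threshold_s` between consecutive timestamps.
/// The input is expected to be sorted in ascending order.
fn find_gaps(timestamps_s: &[u64], threshold_s: u64) -> Vec<Gap> {
    timestamps_s
        .windows(2)
        .filter(|pair| pair[1].saturating_sub(pair[0]) > threshold_s)
        .map(|pair| Gap { last_seen_s: pair[0], resumed_s: pair[1] })
        .collect()
}
```

With illustrative timestamps `[0, 10, 20, 132, 142]` and a 30 s threshold, `find_gaps` returns a single gap from 20 s to 132 s, 112 s long.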
The dominant variable was not distance — it was altitude relative to the gateway.
WP3 (Church, 256 m away, 89 m altitude): SNR +4.2 dB — strong signal because the node was at roughly the same elevation as the gateway roof. Near line-of-sight path.
WP7 (716 m away, 48 m altitude): SNR −19.2 dB, 354 s gap — the node had dropped 37 m below the gateway. Terrain between them clipped the Fresnel zone, collapsing the link despite SF12's theoretical −20 dB SNR budget.
WP8 (743 m away, 64 m altitude): SNR +0.5 dB — climbing 16 m from WP7 recovered the link even though the node was now further from the gateway. Line-of-sight restored.
Fresnel zone: The radio wave between transmitter and receiver occupies an ellipsoidal volume (the Fresnel zone), not a laser-thin line. At 700 m range on 915 MHz, the first Fresnel zone radius is ~7.6 m at the midpoint (r = √(λ·d/4), with λ ≈ 0.33 m). When terrain or buildings intrude into this zone, signal degrades sharply, even if the geometric line-of-sight appears clear.
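
The first Fresnel zone radius follows from the standard formula r = √(λ·d1·d2 / (d1 + d2)), where d1 and d2 are the distances from the point of interest to each end of the link. A minimal sketch, with `first_fresnel_radius_m` as a hypothetical helper name:

```rust
const SPEED_OF_LIGHT_M_S: f64 = 299_792_458.0;

/// Radius (m) of the first Fresnel zone at a point `d1_m` from one end and
/// `d2_m` from the other end of a link operating at `freq_hz`.
fn first_fresnel_radius_m(d1_m: f64, d2_m: f64, freq_hz: f64) -> f64 {
    let wavelength_m = SPEED_OF_LIGHT_M_S / freq_hz;
    (wavelength_m * d1_m * d2_m / (d1_m + d2_m)).sqrt()
}
```

For the midpoint of a 700 m link at 915 MHz, `first_fresnel_radius_m(350.0, 350.0, 915.0e6)` evaluates to roughly 7.6 m. A common rule of thumb is to keep at least 60% of that radius clear of obstructions.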
Practical implication for agriculture: Gateway placement on the highest available point (silo, water tower, ridge line) is more important than central placement. An extra 10–20 m of gateway elevation can add kilometres of reliable coverage across hilly terrain.
- Reliable range (SNR > 0 dB): ~350–400 m in hilly suburban terrain
- Marginal range (packets received, gaps present): 400–950 m
- Maximum observed range: 921 m (suburban, hilly, SF12)
- Coverage edge: SNR −20 dB hit at ~466–593 m on downhill path; recovered at 743 m on uphill return
- Open rural equivalent: Expect 5–15 km with gateway elevated on a structure
The STM32WL55 has no FPU, no OS, and tight memory constraints: the exact environment where C has historically ruled and where bugs are hardest to find. Rust's ownership model, borrow checker, and bounds checks eliminate the classes of bug that are silent disasters in C:
- Buffer overflows — impossible in safe Rust
- Use-after-free / dangling pointers — caught at compile time
- Data races — the type system enforces exclusive or shared access
- Uninitialized memory — variables must be initialized before use
The result: if it compiles, a large class of memory safety bugs is already eliminated. On hardware where a crash means a silent stuck node in the field, that matters.
Embassy is an async runtime for bare-metal embedded Rust. It replaces the traditional RTOS (FreeRTOS, Zephyr) with Rust's native async/await model:
- No heap required: each task's async state machine is statically allocated
- No context switch overhead — cooperative, not preemptive
- Naturally fits this workload: "wait for radio TX, wait for sensor measurement, wait for timer" maps perfectly onto `async`/`await`
- Type-safe peripheral access: the HAL enforces that you can't use a peripheral from two places simultaneously (moves and borrows at the type level)
The firmware pattern:
```rust
// Wait for LoRa TX to complete: suspends the task, doesn't block the CPU
device.send(&payload, 1, false).await;
// Wait for sensor measurement: same
Timer::after_millis(10).await;
```

Without Embassy (or an RTOS), you'd poll these manually with state machines and flags.
Harder entry point than C:
- The borrow checker rejects patterns that are idiomatic in C (e.g. shared mutable state, self-referential structs)
- Embedded Rust's `no_std` environment has a smaller ecosystem than C: some drivers don't exist or are immature
- Async on embedded is still maturing: the `embassy-time` version pinning issue with `lora-phy` is a direct example
Worth it because:
- Bugs caught at compile time don't happen in the field
- The async model scales cleanly — adding more concurrent tasks doesn't require restructuring the whole firmware
- Rust's type system makes the hardware abstraction layer (embassy-stm32) genuinely safer — peripheral ownership is tracked, not assumed
lora-phy and lorawan-device published on crates.io depend on an older embassy-time API. Embassy moved to 0.4.0 and broke the interface. The fix was to fork both crates locally (firmware/lora-phy-patched, firmware/lorawan-device-patched) and update the embassy-time calls.
This is a known growing pain in the embedded Rust ecosystem — crate versions lag behind Embassy releases. The community is working on it (embassy now has more stable API guarantees), but for now: do not cargo update without testing.
`thumbv7em-none-eabihf`
- `thumbv7em`: ARM Cortex-M4 with Thumb2 instruction set
- `none`: no OS
- `eabihf`: hard-float ABI
Despite the hf suffix, the STM32WL55 has no FPU. The eabihf ABI is used anyway because the LoRa PHY crate expects it. Because hardware float instructions are unavailable, fractional values must be computed with integer math and fixed-point scaling (e.g. temperature × 100 stored as i16).
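
As an illustration of the fixed-point approach, the sketch below converts a raw SHT4x temperature word to centi-degrees using integers only. It uses the SHT4x datasheet conversion T = −45 + 175 · raw / 65535. The function names are hypothetical and this is not necessarily how the firmware's driver does the conversion.

```rust
/// Convert a raw 16-bit SHT4x temperature word to centi-degrees Celsius
/// (temperature x 100) using integer math only.
/// Datasheet formula: T = -45 + 175 * raw / 65535.
fn sht4x_raw_to_centi_celsius(raw: u16) -> i16 {
    // 17_500 * 65_535 fits comfortably in i32, so no overflow is possible.
    let scaled = 17_500 * i32::from(raw) / 65_535; // 0..=17_500
    (scaled - 4_500) as i16 // -4_500..=13_000 always fits in i16
}

/// Split a centi-degree value into (is_negative, whole degrees, hundredths),
/// e.g. for printing "25.34" on a display without any float formatting.
fn split_centi(value: i16) -> (bool, u16, u16) {
    let magnitude = value.unsigned_abs();
    (value < 0, magnitude / 100, magnitude % 100)
}
```

A raw word of 26214 maps to 2500, i.e. 25.00 °C, and the full sensor range of −45 °C to +130 °C stays inside an `i16`.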
When a gateway is powered off, LoRaWAN nodes have no passive notification mechanism. The radio is fire-and-forget on uplinks. Initially the firmware tried to detect gateway loss by counting consecutive uplink failures with uplink_fail_count, but the counter never reached the threshold and the OLED stayed on "Connected" indefinitely.
Why the original logic was broken:
The failure counter was reset inside the Ok(response) match arm. For unconfirmed uplinks, lorawan-device returns Ok(SendResponse::RxComplete) regardless of whether the gateway received the packet — there is no ACK to wait for. With the uplink pattern at the time (1 confirmed every 5), the 4 unconfirmed uplinks between confirmed ones always returned Ok and reset uplink_fail_count to 0. The counter could never accumulate to 3.
lorawan-device returns Ok(SendResponse::NoAck) — not Err(...) — when a confirmed uplink times out with no ACK in RX1 or RX2. Err is only returned for radio hardware faults.
The SendResponse enum:
```rust
pub enum SendResponse {
    DownlinkReceived(mac::FcntDown),
    SessionExpired,
    NoAck,      // confirmed uplink sent, no ACK received: gateway unreachable
    RxComplete, // unconfirmed uplink complete: gives no gateway feedback
}
```

The fix matches on NoAck explicitly:

```rust
match device.send(&payload, 1, use_confirmed).await {
    Ok(SendResponse::NoAck) => {
        uplink_fail_count += 1; // gateway is not responding
        // trigger rejoin after MAX_UPLINK_FAILS
    }
    Ok(response) => {
        // confirmed ACK received (DownlinkReceived or RxComplete on confirmed)
        if use_confirmed { uplink_fail_count = 0; }
    }
    Err(e) => { /* radio hardware fault */ }
}
```

lora-2 sends every uplink confirmed so gateway loss is detected within 3 uplink cycles (~30s).
lora-1 sends every 5th uplink confirmed. The failure counter only increments on NoAck from a confirmed send, and only resets when a confirmed send gets a real ACK. Unconfirmed RxComplete responses are neutral — they neither increment nor reset the counter. Detection time: up to 3 confirmed cycles × 5 × 30s = ~7.5 min worst case.
In LoRaWAN, "uplink sent without error" and "gateway received uplink" are not the same thing. Only a confirmed uplink with an ACK downlink proves end-to-end connectivity. Unconfirmed uplinks are inherently best-effort and provide no liveness information about the network.
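
The counter rule can be isolated from the radio code and exercised on a host. In the sketch below, `SendOutcome` is a simplified stand-in for the `lorawan-device` response type (payload fields omitted) and `update_fail_count` is a hypothetical helper. The threshold of 3 comes from the rejoin rule described in these notes.

```rust
/// Simplified stand-in for the `lorawan-device` send response.
#[derive(Clone, Copy)]
enum SendOutcome {
    DownlinkReceived,
    NoAck,
    RxComplete,
}

const MAX_UPLINK_FAILS: u8 = 3;

/// Update the failure counter after one uplink.
/// Returns (new_count, should_rejoin).
fn update_fail_count(count: u8, confirmed: bool, outcome: SendOutcome) -> (u8, bool) {
    let new_count = match (confirmed, outcome) {
        // Confirmed uplink timed out with no ACK: count a failure.
        (_, SendOutcome::NoAck) => count.saturating_add(1),
        // Confirmed uplink that was answered: the link is alive, reset.
        (true, _) => 0,
        // Unconfirmed uplink: no liveness information, leave the counter alone.
        (false, _) => count,
    };
    (new_count, new_count >= MAX_UPLINK_FAILS)
}
```

Run against the lora-1 pattern (one confirmed uplink in five) with the gateway off, the counter reaches 3 on the third confirmed uplink. The unconfirmed `RxComplete` results in between leave it unchanged, which is the behaviour the original logic lacked.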
If `probe-rs run` fails with `Device or resource busy (os error 16)`, a previous session is still attached:

```sh
pkill -f "probe-rs"
```

Then retry the flash command.