Fully Autonomous On-Board Q-Learning – 2-Servo Crawling Robot with Ultrasonic Reward & Swarm-Ready OTA
Zero off-board computation – Complete Q-Learning (35×35 table) runs on ESP32 at 100–200 Hz
Real-time reward directly from HC-SR04 ultrasonic sensor
Supports up to 8 robots simultaneously, each with a unique WiFi AP and OTA hostname
Full source code + schematics included
- 35 discrete posture states + 35 possible actions → 1225 Q-values stored in RAM
- Distance-based reward: r = 2.5 × Δdistance (encourages forward motion)
- ε-greedy with decay (0.8 → 0.1) → converges in ~10 episodes (~2–3 min)
- Smooth servo trajectories with configurable speed
- Startup health check (ultrasonic + servo reset)
- I2C 16×2 LCD real-time feedback
- EEPROM-persisted robot ID (1–8) → unique SSID: ESP32-AP-1…ESP32-AP-8
- Non-blocking OTA updates via FreeRTOS task
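
The learning loop described in the list above can be sketched in portable C++ as follows. This is a minimal illustration, not the project's actual firmware: the table size (35×35), the reward gain (2.5) and the ε range (0.8 → 0.1) come from the feature list, while the learning rate `kAlpha`, discount `kGamma`, the multiplicative decay factor `kEpsDecay` and all function names are assumptions made for the example.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <random>

constexpr std::size_t kStates  = 35;   // discrete posture states (from the feature list)
constexpr std::size_t kActions = 35;   // possible actions (from the feature list)
constexpr float kRewardGain = 2.5f;    // r = 2.5 × Δdistance
constexpr float kEpsStart   = 0.8f;    // initial exploration rate
constexpr float kEpsEnd     = 0.1f;    // exploration floor

// Assumed hyperparameters: the README does not state these values.
constexpr float kAlpha    = 0.1f;      // learning rate (hypothetical)
constexpr float kGamma    = 0.9f;      // discount factor (hypothetical)
constexpr float kEpsDecay = 0.8f;      // per-episode decay factor (hypothetical)

// 35 × 35 = 1225 floats, roughly 4.9 KB of RAM.
using QTable = std::array<std::array<float, kActions>, kStates>;

// Reward from the change in measured distance. deltaDistance is taken as
// positive for forward progress; how it is derived from the raw sensor
// reading depends on how the sensor is mounted.
float reward(float deltaDistance) {
    return kRewardGain * deltaDistance;
}

// Index of the highest-valued action in state s.
std::size_t greedyAction(const QTable& q, std::size_t s) {
    const auto& row = q[s];
    return static_cast<std::size_t>(
        std::max_element(row.begin(), row.end()) - row.begin());
}

// ε-greedy selection: explore with probability epsilon, otherwise exploit.
// On the ESP32 a hardware RNG would likely replace the <random> engine.
template <class Rng>
std::size_t selectAction(const QTable& q, std::size_t s, float epsilon, Rng& rng) {
    std::uniform_real_distribution<float> coin(0.0f, 1.0f);
    if (coin(rng) < epsilon) {
        std::uniform_int_distribution<std::size_t> pick(0, kActions - 1);
        return pick(rng);
    }
    return greedyAction(q, s);
}

// Standard tabular Q-learning update:
// Q(s,a) += α · (r + γ · max_a' Q(s',a') − Q(s,a))
void update(QTable& q, std::size_t s, std::size_t a, float r, std::size_t sNext) {
    const float best = q[sNext][greedyAction(q, sNext)];
    q[s][a] += kAlpha * (r + kGamma * best - q[s][a]);
}

// Decay ε once per episode, never dropping below the floor.
float decayEpsilon(float epsilon) {
    return std::max(kEpsEnd, epsilon * kEpsDecay);
}
```

In a firmware loop, each step would call `selectAction`, move the servos to the chosen posture, read the new distance, and then call `update` with `reward(deltaDistance)`; `decayEpsilon` would run once at the end of every episode, starting from `kEpsStart`.
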
2-servo crawling robot learning an optimal forward gait completely on-board, using only an ultrasonic sensor as the reward signal
- ESP32 Dev Module
- 2× SG90/MG90S servo motors
- HC-SR04 ultrasonic sensor
- 16×2 I2C LCD
- Custom 3D-printed linkage mechanism
