Commit e130dbf

Merge pull request #19 from NVIDIA-AI-IOT/feature/ui-upgrades
chore: release 0.3.0 - UI upgrade, robotics prompts, Fixes #14
2 parents 7e726bf + f24cda1 commit e130dbf

10 files changed

Lines changed: 612 additions & 52 deletions

CHANGELOG.md

Lines changed: 30 additions & 8 deletions
@@ -7,14 +7,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Planned for 0.4.0
+- **Multi-session support for cloud deployment**: Scope multi-user / multi-session architecture for cloud deployments (see current limitations in 0.3.0).
+
+---
+
+## [0.3.0] - 2026-03-02
+
+**UI upgrade and robotics-oriented prompts**
+
+### Added
+- **Video overlay controls (play / stop)**:
+  - Big green PLAY button centered on video; animates to top-left and fades when streaming starts
+  - Small red STOP button in top-left while streaming (higher opacity for visibility)
+  - Sidebar start/stop replaced by overlay flow for cleaner UX
+- **Fullscreen mode**: Toggle fullscreen on the video card with VLM output overlay; shrink and mirror buttons remain clickable (z-index fix)
+- **Robotics-oriented prompt preset**: "Robot Navigation (Simple)" system prompt—describe scene and output 5 navigation commands (`linear_x`, `angular_z`) with reasons, e.g. for bathroom-finding or similar tasks
+
 ### Fixed
-- **Model initialization race condition**: Fixed auto-selected models not being sent to server
-  - Previously, if the UI auto-selected a model on page load, it wouldn't be sent to the server
-  - This happened because `fetchModels()` ran before WebSocket connection completed
-  - Symptom: Camera opens but no VLM processing until manually selecting a model
-  - Fix: Send current model to server immediately after WebSocket connects
-  - Ensures server always uses the model shown in UI, even when auto-selected
-  - Result: VLM processing starts automatically without requiring manual model selection
+- **Model initialization race condition**: Auto-selected model is sent to server as soon as WebSocket connects so VLM processing starts without manually re-selecting the model
+- **MediaStreamError on stop**: Track end when user stops is handled as normal shutdown (logged at DEBUG only, no error/traceback)
+- **Fullscreen controls**: Shrink (minimize) and Mirror buttons stay above the VLM overlay and remain clickable in fullscreen
+- **Jetson Thor Docker** ([#14](https://github.com/NVIDIA-AI-IOT/live-vlm-webui/issues/14)): `start_container.sh` now uses `--runtime=nvidia` instead of `--gpus all` on Jetson (Thor and Orin) so containers start correctly
+
+### Changed
+- **WebRTC**: Wait for ICE gathering to complete before sending offer (reduces stuck "checking" connections)
+- **Troubleshooting**: New "WebRTC connection issues" section (ICE stuck, firewall, STUN, verification steps)
+- **Scripts**: `start_server.sh` suggests `kill -9` when port is in use
 
 ---
 
@@ -360,6 +379,9 @@ This is the initial public release of Live VLM WebUI - a real-time vision langua
 
 ---
 
-[Unreleased]: https://github.com/NVIDIA-AI-IOT/live-vlm-webui/compare/v0.1.1...HEAD
+[Unreleased]: https://github.com/NVIDIA-AI-IOT/live-vlm-webui/compare/v0.3.0...HEAD
+[0.3.0]: https://github.com/NVIDIA-AI-IOT/live-vlm-webui/compare/v0.2.1...v0.3.0
+[0.2.1]: https://github.com/NVIDIA-AI-IOT/live-vlm-webui/compare/v0.2.0...v0.2.1
+[0.2.0]: https://github.com/NVIDIA-AI-IOT/live-vlm-webui/compare/v0.1.1...v0.2.0
 [0.1.1]: https://github.com/NVIDIA-AI-IOT/live-vlm-webui/compare/v0.1.0...v0.1.1
 [0.1.0]: https://github.com/NVIDIA-AI-IOT/live-vlm-webui/releases/tag/v0.1.0
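
The "MediaStreamError on stop" entry in the 0.3.0 notes describes treating a track end as a normal shutdown that is logged at DEBUG only. The server code for that fix is not part of the hunks shown here, so the following is only a minimal sketch of the pattern. `MediaStreamError` is a local stand-in for an exception such as the one aiortc raises when a track ends, and `EndingTrack` and `consume` are hypothetical names used for illustration.

```python
import asyncio
import logging

logger = logging.getLogger("track_shutdown_sketch")


class MediaStreamError(Exception):
    """Local stand-in for a 'track ended' exception (assumption, not project code)."""


class EndingTrack:
    """Hypothetical track that yields a few frames, then ends like a stopped camera."""

    def __init__(self, frames: int) -> None:
        self._remaining = frames

    async def recv(self) -> str:
        if self._remaining <= 0:
            raise MediaStreamError("track ended")
        self._remaining -= 1
        return "frame"


async def consume(track) -> int:
    """Read frames until the track ends and return how many were received."""
    received = 0
    while True:
        try:
            await track.recv()
        except MediaStreamError:
            # Expected when the user stops streaming: log quietly, no traceback.
            logger.debug("Track ended after %d frames; normal shutdown", received)
            return received
        received += 1
```

Calling `asyncio.run(consume(EndingTrack(3)))` returns the number of frames read before the track ended, without emitting an error-level log entry.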

docs/troubleshooting.md

Lines changed: 99 additions & 0 deletions
@@ -395,6 +395,105 @@ For production use, get a proper SSL certificate from Let's Encrypt or a certifi
 
 ---
 
+## WebRTC Connection Issues
+
+### No VLM analysis results / GPU not increasing / Connection stuck
+
+**Symptoms:**
+- ✅ Server starts successfully
+- ✅ Web UI loads properly
+- ✅ Camera permission granted
+- ❌ No VLM analysis results appear
+- ❌ GPU utilization stays at 0%
+- ❌ Video preview may show but no processing happens
+
+**Root Cause:** WebRTC connection is not completing. The ICE (Interactive Connectivity Establishment) connection gets stuck in "checking" state and never reaches "connected".
+
+**How to verify this is the issue:**
+
+Check server logs for this pattern:
+```log
+ICE gathering state: complete
+Created answer with 1 transceivers
+ICE connection state: checking
+Connection state: connecting
+# ❌ Connection never progresses to "connected"
+```
+
+Check browser console (F12 → Console tab):
+```javascript
+ICE connection state: checking
+# ❌ Should show "connected" but doesn't
+```
+
+**Solution:** This issue has been fixed in recent versions. Update to the latest version:
+
+```bash
+# Update to latest version
+pip install --upgrade live-vlm-webui
+
+# Or if using git:
+cd live-vlm-webui
+git pull
+pip install -e .
+```
+
+**If updating doesn't help, check these:**
+
+1. **Firewall blocking WebRTC:**
+   ```bash
+   # Allow UDP for WebRTC
+   sudo ufw allow 8090/tcp
+   sudo ufw allow 49152:65535/udp # WebRTC ports
+   ```
+
+2. **STUN server unreachable:**
+   ```bash
+   # Test STUN server connectivity
+   curl -I stun.l.google.com:19302
+   ```
+
+3. **Corporate/Network restrictions:**
+   - Some corporate networks block WebRTC traffic
+   - Try from a different network or use mobile hotspot for testing
+   - Check if UDP traffic is blocked by your router/firewall
+
+4. **Browser compatibility:**
+   - ✅ Chrome/Edge (recommended - best WebRTC support)
+   - Firefox (good support)
+   - ⚠️ Safari (limited support)
+   - Use latest browser version
+
+5. **SSL certificate issues:**
+   - Make sure you accepted the self-signed certificate warning
+   - Clear browser cache and reload: Ctrl+Shift+R (Cmd+Shift+R on Mac)
+
+**Technical Details:**
+
+The fix ensures ICE candidates are properly gathered before exchanging WebRTC offers. Without this, the peers can't find network paths to connect, leaving the connection in "checking" state indefinitely.
+
+**Verify the fix worked:**
+
+After starting camera, you should see in server logs:
+```log
+✅ ICE gathering state: complete
+✅ Created answer with 1 transceivers
+✅ ICE connection state: checking
+✅ ICE connection state: connected # ← This line should appear!
+✅ Connection state: connected
+```
+
+And browser console should show:
+```javascript
+ICE connection state: connected // ← Must see this!
+```
+
+Once connected, you should immediately see:
+- VLM analysis results appearing in the UI
+- GPU utilization increasing (check with `nvidia-smi` or `jtop`)
+
+---
+
 ## VLM Backend Issues
 
 > 📖 **Reference:** For a complete list of available Vision-Language Models across different providers, see [List of VLMs](usage/list-of-vlms.md).
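
Step 2 of the added troubleshooting text probes the STUN server with `curl -I`, which speaks HTTP over TCP, whereas STUN is normally carried over UDP, so that check may not reflect real reachability. As a hedged alternative, the sketch below sends one STUN Binding Request (header layout per RFC 5389) over UDP and checks for a Binding Success Response. The function names are illustrative and not part of this project; the default host and port are the ones quoted in the troubleshooting text.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101


def build_binding_request(transaction_id: bytes) -> bytes:
    """Build a 20-byte STUN Binding Request header with no attributes."""
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be exactly 12 bytes")
    # Message type, message length (0 = no attributes), magic cookie.
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def is_binding_success(response: bytes, transaction_id: bytes) -> bool:
    """Check that a datagram is a Binding Success Response for our transaction."""
    if len(response) < 20:
        return False
    msg_type, _length, cookie = struct.unpack("!HHI", response[:8])
    return (
        msg_type == BINDING_SUCCESS
        and cookie == MAGIC_COOKIE
        and response[8:20] == transaction_id
    )


def stun_reachable(host: str = "stun.l.google.com", port: int = 19302,
                   timeout: float = 3.0) -> bool:
    """Return True if the STUN server answers a Binding Request over UDP."""
    transaction_id = os.urandom(12)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_binding_request(transaction_id), (host, port))
            data, _addr = sock.recvfrom(2048)
        except OSError:
            # Covers DNS failures, timeouts, and unreachable networks.
            return False
    return is_binding_success(data, transaction_id)
```

Running `print(stun_reachable())` on the affected machine gives a quick yes/no answer; a `False` result on a network where other traffic works suggests outbound UDP is being filtered.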

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "live-vlm-webui"
-version = "0.2.1"
+version = "0.3.0"
 description = "Real-time Vision Language Model interaction web interface"
 readme = "README.md"
 requires-python = ">=3.10"

scripts/pre_commit_check.sh

File mode changed: 100644 → 100755.

scripts/start_container.sh

Lines changed: 2 additions & 2 deletions
@@ -1029,7 +1029,7 @@ elif [ "$ARCH" = "aarch64" ]; then
 if echo "$GPU_NAME" | grep -qi "thor"; then
     PLATFORM="jetson-thor"
     PLATFORM_SUFFIX="-jetson-thor"
-    GPU_FLAG="--gpus all"
+    RUNTIME_FLAG="--runtime=nvidia"
     echo -e " Platform: ${GREEN}NVIDIA Jetson Thor${NC} (detected via GPU: ${GPU_NAME})"
 else
     PLATFORM="jetson-orin"
@@ -1046,7 +1046,7 @@ elif [ "$ARCH" = "aarch64" ]; then
 if [ "$L4T_VERSION" -ge 38 ]; then
     PLATFORM="jetson-thor"
     PLATFORM_SUFFIX="-jetson-thor"
-    GPU_FLAG="--gpus all"
+    RUNTIME_FLAG="--runtime=nvidia"
     echo -e " Platform: ${GREEN}NVIDIA Jetson Thor${NC} (L4T R${L4T_VERSION})"
 else
     PLATFORM="jetson-orin"
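
The two hunks above show the same decision reached by two routes: a GPU name containing "thor", or an L4T major version of 38 or higher, selects the Thor platform and the `--runtime=nvidia` flag. The sketch below restates that logic in Python for readability. It merges the two branches into one function, which is a simplification of the script, and the `--gpus all` fallback for non-Jetson platforms is an assumption drawn from the changelog wording rather than from the code shown. Both function names are hypothetical.

```python
from typing import List, Optional


def detect_jetson_platform(gpu_name: str, l4t_major: Optional[int] = None) -> str:
    """Classify an aarch64 Jetson as Thor or Orin from the GPU name or L4T version."""
    if "thor" in gpu_name.lower():
        return "jetson-thor"
    if l4t_major is not None and l4t_major >= 38:
        return "jetson-thor"
    return "jetson-orin"


def docker_gpu_args(platform: str) -> List[str]:
    """Pick the docker GPU arguments for a platform string."""
    if platform.startswith("jetson-"):
        # Per the 0.3.0 changelog, Jetson (Thor and Orin) uses the NVIDIA runtime.
        return ["--runtime=nvidia"]
    # Assumption: other platforms keep the --gpus flag.
    return ["--gpus", "all"]
```

For example, `docker_gpu_args(detect_jetson_platform("NVIDIA Thor"))` yields the runtime flag that this commit introduces for issue #14.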

scripts/start_server.sh

Lines changed: 3 additions & 3 deletions
@@ -184,14 +184,14 @@ if [ "$PORT_IN_USE" = true ]; then
     if [ -n "$PID" ]; then
         PROC_INFO=$(ps -p $PID -o comm= 2>/dev/null || echo "unknown")
         echo " Process using port 8090: PID $PID ($PROC_INFO)"
-        echo " kill $PID"
+        echo " kill -9 $PID"
     else
         echo " lsof -ti :8090 # Find the process"
-        echo " kill <PID> # Stop it"
+        echo " kill -9 <PID> # Force stop it"
     fi
 else
     echo " netstat -tulpn | grep :8090 # Find the process"
-    echo " kill <PID> # Stop it"
+    echo " kill -9 <PID> # Force stop it"
 fi
 echo ""
 echo "Option 3: Use a different port"
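
This hunk only changes the commands suggested once `PORT_IN_USE` is true; how the script sets that variable is outside the lines shown. For readers who want an equivalent check without `lsof` or `netstat`, here is a minimal sketch that tests whether something is accepting TCP connections on a port. The function name is hypothetical, and the port number 8090 comes from the script's messages.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```

`port_in_use(8090)` would then indicate whether another process is already listening on the server's port on the local machine.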

src/live_vlm_webui/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
 with real-time AI analysis and system monitoring.
 """
 
-__version__ = "0.2.1"
+__version__ = "0.3.0"
 __author__ = "NVIDIA Corporation"
 __license__ = "Apache-2.0"
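
The release bumps the version string in two places, `pyproject.toml` and `src/live_vlm_webui/__init__.py`, and both must stay in step. A small sketch of a consistency check follows; it uses regular expressions on the file text rather than a TOML parser to stay dependency-free, and the function names are hypothetical rather than part of the repository.

```python
import re


def pyproject_version(text: str) -> str:
    """Extract the value of a top-level `version = "..."` line."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', text, re.MULTILINE)
    if match is None:
        raise ValueError("no version field found in pyproject text")
    return match.group(1)


def package_version(text: str) -> str:
    """Extract the value of a `__version__ = "..."` assignment."""
    match = re.search(r'^__version__\s*=\s*"([^"]+)"', text, re.MULTILINE)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def versions_match(pyproject_text: str, init_text: str) -> bool:
    """Return True when both files declare the same version string."""
    return pyproject_version(pyproject_text) == package_version(init_text)
```

Reading both files and passing their contents to `versions_match` could serve as a pre-release check, for instance from a script such as `scripts/pre_commit_check.sh`.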
