Releases: NVIDIA-AI-IOT/live-vlm-webui

v0.4.0 - Cloud deployment, multi-session, debug payloads

08 Mar 15:54
ecc9fc6

Cloud deployment, debug payloads, and docs

Added

  • Multi-session support: Per-session VLM state and WebSocket routing so multiple tabs/users get isolated streams and config (cloud-friendly).
  • Cloud deployment config overrides: The environment variables LIVE_VLM_API_BASE, LIVE_VLM_DEFAULT_MODEL, and LIVE_VLM_PROCESS_EVERY override the default API base URL, default model, and frame processing interval, respectively. For example, LIVE_VLM_PROCESS_EVERY=150 gives a ~5 s interval to reduce API quota usage, and LIVE_VLM_API_BASE=https://integrate.api.nvidia.com/v1 with LIVE_VLM_DEFAULT_MODEL=google/gemma-3-4b-it targets the NVIDIA API Catalog. Documented in the Docker guide with -e / --env-file examples.
  • Debug payloads: In Settings (gear) → Debug: toggles for Show request payload and Show response payload. Collapsible request JSON (under prompt) and response JSON (under VLM result) for debugging how image and prompt are sent and what the API returns.
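
The environment overrides above can be sketched as follows. This is a minimal illustration, not the project's code: the three variable names come from these notes, while the fallback values, the function names, and the 30 fps assumption behind the "~5 s" figure are hypothetical.

```python
import os

# Hypothetical fallbacks for illustration; the project's real defaults may differ.
FALLBACK_API_BASE = "http://localhost:8000/v1"
FALLBACK_MODEL = ""
FALLBACK_PROCESS_EVERY = 30


def load_overrides(env=None):
    """Return (api_base, default_model, process_every) from the environment."""
    env = os.environ if env is None else env
    api_base = env.get("LIVE_VLM_API_BASE", FALLBACK_API_BASE)
    model = env.get("LIVE_VLM_DEFAULT_MODEL", FALLBACK_MODEL)
    try:
        process_every = int(env.get("LIVE_VLM_PROCESS_EVERY", ""))
        if process_every < 1:
            raise ValueError("interval must be a positive frame count")
    except ValueError:
        # Unset, non-numeric, or non-positive values fall back to the default.
        process_every = FALLBACK_PROCESS_EVERY
    return api_base, model, process_every


def interval_seconds(process_every, fps=30.0):
    """Approximate time between processed frames, assuming a fixed frame rate."""
    return process_every / fps
```

Under the assumed 30 fps, interval_seconds(150) evaluates to 5.0, which matches the "~5 s" figure quoted above; a camera running at a different frame rate would give a different interval.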

Fixed

  • Server startup: Resolved UnboundLocalError for os when using env overrides (removed redundant import os/import sys inside main()).
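
For readers unfamiliar with this class of bug, the sketch below reproduces it in isolation. The function names are hypothetical and the body is not taken from the project; it only shows why an import inside a function can break an earlier use of the same module name.

```python
import os


def broken_main():
    # The `import os` further down makes `os` a local name for the whole
    # function body, so this earlier lookup raises UnboundLocalError.
    value = os.environ.get("LIVE_VLM_API_BASE")
    import os  # redundant import that shadows the module-level one
    return value


def fixed_main():
    # With the inner import removed, `os` resolves to the module-level import.
    return os.environ.get("LIVE_VLM_API_BASE")


def raises_unbound_local(func):
    """Return True if calling func() raises UnboundLocalError."""
    try:
        func()
    except UnboundLocalError:
        return True
    return False
```

Python decides at compile time which names are local to a function, so the position of the inner import does not matter: any use of `os` before it in the same function fails.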

Changed

  • Server config: server_config now includes process_every so the UI shows the server default frame interval on connect.
  • README: Ollama on Jetson Thor — recommend upgrading to latest Ollama instead of pinning to 0.12.9; link to troubleshooting for workarounds if needed.

v0.3.0 - UI upgrade and robotics prompts

02 Mar 21:15
e130dbf

Added

  • Video overlay controls (play / stop):
    • Big green PLAY button centered on video; animates to top-left and fades when streaming starts
    • Small red STOP button in top-left while streaming (higher opacity for visibility)
    • Sidebar start/stop replaced by overlay flow for cleaner UX
  • Fullscreen mode: Toggle fullscreen on the video card with VLM output overlay; shrink and mirror buttons remain clickable (z-index fix)
  • Robotics-oriented prompt preset: "Robot Navigation (Simple)" system prompt that describes the scene and outputs 5 navigation commands (linear_x, angular_z) with reasons, e.g. for bathroom-finding or similar tasks

Fixed

  • Model initialization race condition: Auto-selected model is sent to server as soon as WebSocket connects so VLM processing starts without manually re-selecting the model
  • MediaStreamError on stop: Track end when user stops is handled as normal shutdown (logged at DEBUG only, no error/traceback)
  • Fullscreen controls: Shrink (minimize) and Mirror buttons stay above the VLM overlay and remain clickable in fullscreen
  • Jetson Thor Docker (#14): start_container.sh now uses --runtime=nvidia instead of --gpus all on Jetson (Thor and Orin) so containers start correctly
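
The Jetson Docker fix above amounts to choosing a different GPU flag per platform. A minimal sketch of that choice, written in Python rather than the shell used by start_container.sh: the function name is hypothetical, how a Jetson is detected is left to the caller, and the image name and port are copied from the v0.1.0 notes.

```python
# Image name and port as given in the v0.1.0 Docker example.
IMAGE = "ghcr.io/nvidia-ai-iot/live-vlm-webui:latest"


def docker_run_args(is_jetson, image=IMAGE, port=8090):
    """Build a `docker run` argument list with the platform's GPU flag."""
    # Jetson (Thor and Orin) uses the NVIDIA runtime; other hosts use --gpus all.
    gpu_flags = ["--runtime=nvidia"] if is_jetson else ["--gpus", "all"]
    return ["docker", "run", "-d", *gpu_flags, "-p", f"{port}:{port}", image]
```

The returned list can be passed to subprocess.run or joined for display; the real script may add further options not shown here.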

Changed

  • WebRTC: Wait for ICE gathering to complete before sending offer (reduces stuck "checking" connections)
  • Troubleshooting: New "WebRTC connection issues" section (ICE stuck, firewall, STUN, verification steps)
  • Scripts: start_server.sh suggests kill -9 when port is in use
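
Before killing anything, it can help to confirm that the port is actually occupied. The helper below is a hypothetical sketch using only the standard library and is not part of start_server.sh; port 8090 is the default shown in the v0.1.0 notes.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0
```

For example, port_in_use(8090) reports whether a previous server instance is still listening locally.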

v0.2.1 - Version String Fix and Test Infrastructure Improvements

14 Nov 00:31

🐛 Bug Fixes

  • Version string fix: live-vlm-webui --version now correctly displays 0.2.1 (was showing 0.1.1 in v0.2.0)
  • Test infrastructure: Fixed pytest-asyncio event loop conflicts - all tests now pass reliably

📦 Installation

pip install --upgrade live-vlm-webui==0.2.1

Verify: live-vlm-webui --version should show 0.2.1
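
As a complement to the CLI check, the installed distribution's metadata can be read from Python. This is a sketch with hypothetical helper names; it reports what pip installed, which is not necessarily the string that live-vlm-webui --version prints.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="live-vlm-webui"):
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_matches(expected, package="live-vlm-webui"):
    """Return True only if the package is installed at exactly `expected`."""
    return installed_version(package) == expected
```

After the upgrade above, version_matches("0.2.1") should be true; if it is true while the CLI prints something else, the mismatch is in the CLI's own version string rather than in the install.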

📚 Documentation

  • Consolidated release documentation
  • Added version verification steps to prevent future version mismatches

Screenshot: RTSP feature preview

Full changelog: CHANGELOG.md

v0.2.0 - RTSP IP Camera Support (Beta) + UI/UX Improvements

13 Nov 23:17

🎉 What's New

New Beta Feature: RTSP IP Camera Support

  • Stream video from RTSP IP cameras for continuous monitoring
  • Switch between webcam and RTSP camera in UI
  • Auto-reconnection on stream drops
  • Support for H.264, H.265, and MJPEG codecs
  • Complete setup guide: docs/usage/rtsp-ip-cameras.md
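
Auto-reconnection on stream drops is commonly built as a retry loop with exponential backoff. The sketch below illustrates that general pattern, not the project's implementation: open_stream, handle_frame, and every parameter are hypothetical, and a dropped stream is modeled as a ConnectionError.

```python
import time


def read_with_reconnect(open_stream, handle_frame, max_retries=5,
                        base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Read frames until the source ends cleanly, reconnecting on drops.

    open_stream() must return an iterable of frames and may raise
    ConnectionError, either on open or while iterating (hypothetical interface).
    Returns True on a clean end and False once max_retries is exceeded.
    """
    failures = 0
    while True:
        try:
            for frame in open_stream():
                failures = 0  # a delivered frame resets the backoff
                handle_frame(frame)
            return True
        except ConnectionError:
            failures += 1
            if failures > max_retries:
                return False
            # Delay doubles with each consecutive failure, capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** (failures - 1)))
```

Passing sleep as a parameter keeps the loop testable without real waiting.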

UI/UX Improvements

  • 🎨 OS Dark/Light Mode Preference: Automatically detects and honors system theme preference
  • 📝 Markdown Rendering: Render formatted markdown responses from VLMs (headers, lists, code blocks, tables)
  • 📋 Copy to Clipboard: One-click copy of generation results with visual feedback
  • 🐳 Docker Version Picker: Interactive version selection for Docker deployments

📦 Installation

New Installation:

pip install live-vlm-webui==0.2.0

Upgrading from Previous Version:

pip install --upgrade live-vlm-webui==0.2.0

Or simply:

pip install --upgrade live-vlm-webui

Docker Users:

# Use the versioned image
./scripts/start_container.sh --version 0.2.0

# Or use latest (will be 0.2.0 after release)
./scripts/start_container.sh

📚 Documentation

  • RTSP setup guide: docs/usage/rtsp-ip-cameras.md
  • Full changelog: See CHANGELOG.md

v0.1.1 - WSL2 Support and VLM Documentation

12 Nov 23:59

Bug Fixes and Documentation Improvements

🐛 Fixed

  • WSL2 GPU monitoring resilience: Robust error handling for intermittent NVML issues
    • Automatic retry logic with graduated thresholds
    • Auto-recovery when GPU access is restored
    • WSL2 now fully supported with complete GPU monitoring
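
One way to read "graduated thresholds" is that the monitor tolerates a few consecutive failures, then reports a degraded state, then gives up until a read succeeds again. The sketch below follows that interpretation only; the class, its thresholds, and the state names are hypothetical and do not describe the project's actual NVML handling.

```python
class GpuProbe:
    """Wrap a flaky GPU read with failure counting and automatic recovery."""

    def __init__(self, read_fn, warn_after=3, disable_after=10):
        self.read_fn = read_fn            # hypothetical callable, e.g. an NVML query
        self.warn_after = warn_after
        self.disable_after = disable_after
        self.failures = 0                 # consecutive failed reads
        self.last_good = None             # most recent successful reading

    @property
    def state(self):
        if self.failures >= self.disable_after:
            return "unavailable"
        if self.failures >= self.warn_after:
            return "degraded"
        return "ok"

    def poll(self):
        """Try one read; return the freshest usable value, or None if unavailable."""
        try:
            self.last_good = self.read_fn()
            self.failures = 0  # any success restores the "ok" state
        except Exception:
            self.failures += 1
        return self.last_good if self.state != "unavailable" else None
```

While the state is "ok" or "degraded", poll() keeps returning the last good reading, so short NVML hiccups do not blank the display.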

📚 Added

  • Comprehensive VLM documentation: Complete model catalog with 16 verified NVIDIA API models
    • Corrected vision capabilities for gemma3 and llava models
    • Guidance on text-only vs vision-capable models
    • Troubleshooting for common model selection issues
  • Windows WSL usage guide: Full setup instructions for WSL2 environments

🧪 Tested On

  • ✅ Windows WSL2 (Ubuntu 22.04) with NVIDIA RTX A3000 Laptop GPU

Installation

pip install --upgrade live-vlm-webui==0.1.1

See CHANGELOG.md for complete details.

v0.1.0 - First PyPI Release

10 Nov 20:53

This is the first public release of Live VLM WebUI - a real-time vision language model interface with WebRTC video streaming and live GPU monitoring.

✨ Key Features

Real-time Vision Analysis

  • WebRTC video streaming with live VLM analysis overlay
  • Support for multiple VLM backends: Ollama, vLLM, NVIDIA API Catalog, OpenAI API
  • Configurable prompts, frame intervals, and model settings

Live System Monitoring

  • GPU utilization and VRAM usage with sparkline charts
  • CPU and RAM monitoring
  • Inference latency tracking (last, average, total count)
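
The three latency figures listed above (last, average, total count) can be kept with a small running accumulator. This is an illustrative sketch with hypothetical names, not the project's code.

```python
class LatencyStats:
    """Track the last latency, the running average, and the number of samples."""

    def __init__(self):
        self.last = None
        self.count = 0
        self._total = 0.0

    def record(self, seconds):
        """Add one inference latency measurement, in seconds."""
        self.last = seconds
        self.count += 1
        self._total += seconds

    @property
    def average(self):
        # None until at least one sample has been recorded.
        return self._total / self.count if self.count else None
```

Keeping only a running total and a count avoids storing every sample, which suits a long-running stream.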

Multi-Platform Support

  • ✅ PC with NVIDIA GPUs
  • ✅ NVIDIA DGX Spark
  • ✅ NVIDIA Jetson (Orin, Thor)
  • ✅ Apple Silicon Macs
  • ✅ CPU-only fallback

Easy Deployment

  • PyPI package: pip install live-vlm-webui
  • Docker images for all platforms
  • Automatic SSL certificate generation

📦 Installation

pip install live-vlm-webui
live-vlm-webui

Then open your browser to https://localhost:8090

Docker:

docker run -d --gpus all -p 8090:8090 \
  ghcr.io/nvidia-ai-iot/live-vlm-webui:latest

🧪 Tested Platforms

  • ✅ x86_64 PC (Linux Ubuntu 22.04)
  • ✅ NVIDIA DGX Spark (ARM64 SBSA)
  • ✅ NVIDIA Jetson AGX Thor (JetPack 7.0)
  • ✅ NVIDIA Jetson AGX Orin (JetPack 6.2)
  • ✅ NVIDIA Jetson Orin Nano (JetPack 6.2)
  • ✅ macOS (Apple Silicon M-series)

⚠️ Known Issues

  • Jetson Thor + Ollama 0.12.10: GPU inference fails on JetPack 7.0

🙏 Acknowledgments

This is the initial release developed and tested across multiple NVIDIA platforms. Feedback and contributions are welcome!


Full Changelog: https://github.com/NVIDIA-AI-IOT/live-vlm-webui/blob/main/CHANGELOG.md