docs: INSTALL vLLM (8010, cache, VRAM), NGC/perms; preset 8010 by tokk-nv · Pull Request #32 · NVIDIA-AI-IOT/multi_modal_ai_studio

tokk-nv · 2026-03-14T04:32:20Z

Summary

Documentation and small config/UI updates: vLLM on port 8010 with cache volume and GPU memory notes, NGC/permission troubleshooting, and preset/UI aligned to 8010.

Changes

INSTALL.md

vLLM: Standardize on port 8010 (avoids Riva 8000–8002). Add optional torch.compile cache volume (-v ~/.cache/vllm:/root/.cache/vllm) and mkdir -p ~/.cache/vllm before first run so the directory is user-owned.
Memory: Clarify that vm.drop_caches=3 frees system (CPU) memory only, not GPU VRAM. Add a note to stop the first vLLM container and wait 30–60 s before starting another.
Troubleshooting: New subsection for ValueError: Free memory on device cuda:0 (...) is less than desired GPU memory utilization (fix: stop the other process using the GPU, then wait for VRAM to be released).
NGC Cosmos: Soften “org must be nvidia” to “try org nvidia if you see 403”; note that the download often works with the default org. The permission-denied bullet now points to “Fix Hugging Face cache permissions (root-owned).”
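
The memory and troubleshooting notes above amount to "wait until enough VRAM is free before starting vLLM again." A minimal sketch of that wait loop follows. It is not code from this PR: the function names are hypothetical, the 0.9 default is assumed to match vLLM's default GPU memory utilization, and the check assumes the error fires when free memory is below utilization × total memory, as the error text suggests.

```python
import time


def vram_ok(free_bytes, total_bytes, utilization=0.9):
    """Return True when free VRAM covers the requested fraction of total VRAM."""
    return free_bytes >= utilization * total_bytes


def wait_for_vram(probe, utilization=0.9, timeout_s=60.0, interval_s=5.0,
                  sleep=time.sleep):
    """Poll probe() until enough VRAM is free or the timeout elapses.

    probe is any callable returning (free_bytes, total_bytes).
    sleep is injectable so the loop can be exercised without real delays.
    """
    waited = 0.0
    while True:
        free, total = probe()
        if vram_ok(free, total, utilization):
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

On a machine with PyTorch installed, `torch.cuda.mem_get_info` returns a `(free, total)` pair in bytes and could serve as the probe, for example `wait_for_vram(torch.cuda.mem_get_info)`. The 60 s default timeout mirrors the upper end of the 30–60 s wait suggested in INSTALL.md.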

Preset & UI

presets/cosmos-reason.yaml: api_base set to http://localhost:8010/v1.
app.js: vLLM API preset updated from 8003 to 8010 (label and URL).

Backend (openai.py)

Model list: Docstring updated to state that we probe /api/tags first (Ollama can run on any port), then fall back to /v1/models.
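
The probe order described in that docstring could look roughly like the sketch below. It is an illustration, not the code in openai.py: `list_models` and `_get_json` are hypothetical names, and it assumes the usual response shapes (Ollama's /api/tags returns a "models" list with "name" fields, OpenAI-compatible /v1/models returns a "data" list with "id" fields).

```python
import json
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def _get_json(url, timeout=3.0):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def list_models(api_base, get_json=_get_json):
    """Try Ollama's /api/tags first, then fall back to <api_base>/models."""
    parts = urlsplit(api_base)
    # /api/tags lives at the server root, not under the /v1 prefix.
    root = urlunsplit((parts.scheme, parts.netloc, "", "", ""))
    try:
        tags = get_json(f"{root}/api/tags")
        return [m["name"] for m in tags.get("models", [])]
    except Exception:
        # Not an Ollama server (or unreachable); use the OpenAI-style route.
        pass
    models = get_json(f"{api_base.rstrip('/')}/models")
    return [m["id"] for m in models.get("data", [])]
```

With the preset from this PR, `list_models("http://localhost:8010/v1")` would first request http://localhost:8010/api/tags and, when that fails against a vLLM server, request http://localhost:8010/v1/models. Probing by route rather than by port is what lets Ollama be detected on any port.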

Testing

Verified vLLM Docker run with cache volume; second run reuses torch.compile cache (~24 s faster).
Confirmed NGC Cosmos download can succeed with default org when permissions are correct.
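
For reference, the Docker run verified above can be approximated by building the argument list in Python. This is a sketch under stated assumptions, not the exact command from INSTALL.md: the image and model are caller-supplied placeholders, the image's entrypoint is assumed to be vLLM's OpenAI-compatible server (which accepts `--model` and `--port`), and `--gpus all` may need to be replaced with `--runtime nvidia` on some Jetson setups.

```python
from pathlib import Path


def vllm_docker_args(image, model, port=8010, cache_dir=None):
    """Build a `docker run` argv with the port and cache volume from this PR."""
    cache = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "vllm"
    # Same effect as `mkdir -p ~/.cache/vllm`: creating the directory before
    # the first run keeps it user-owned instead of root-owned.
    cache.mkdir(parents=True, exist_ok=True)
    return [
        "docker", "run", "--rm",
        "--gpus", "all",                    # assumption: may differ on Jetson
        "-p", f"{port}:{port}",             # 8010 avoids Riva's 8000-8002
        "-v", f"{cache}:/root/.cache/vllm",  # persist the torch.compile cache
        image,
        "--model", model,
        "--port", str(port),
    ]
```

The list can be passed to `subprocess.run(args, check=True)`. Returning a list instead of a shell string avoids quoting issues and makes the port and volume settings easy to inspect.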

Screenshots

- README: refresh Key Features (UI/devices, USB mic/speaker/webcam), remove unimplemented features, fix clone URL
- setup_riva: app naming (Multi-modal AI Studio / Live RIVA WebUI), riva-speech container, Docker/GPU troubleshooting from stash
- pyproject: authors (YATO, Sahu, kbenkhaled), project URLs (NVIDIA-AI-IOT), name comment

Made-with: Cursor


adsahu-nv

Looks good! Thanks

tokk-nv added 2 commits March 13, 2026 17:11

INSTALL: vLLM 8010, cache volume, VRAM/perm notes; Ollama vision fix;…

9a1fdf2

… preset 8010

tokk-nv requested a review from adsahu-nv March 14, 2026 04:32

adsahu-nv approved these changes Mar 14, 2026


tokk-nv merged commit d08e3ba into NVIDIA-AI-IOT:main Mar 14, 2026
1 check passed

tokk-nv merged 2 commits into NVIDIA-AI-IOT:main from tokk-nv:docs/readme-setup-riva-pyproject
