docs: INSTALL vLLM (8010, cache, VRAM), NGC/perms; preset 8010 #32

Merged: tokk-nv merged 2 commits into NVIDIA-AI-IOT:main from tokk-nv:docs/readme-setup-riva-pyproject on Mar 14, 2026

tokk-nv (Member) commented Mar 14, 2026

Summary

Documentation and small config/UI updates: vLLM on port 8010 with cache volume and GPU memory notes, NGC/permission troubleshooting, and preset/UI aligned to 8010.

Changes

INSTALL.md

  • vLLM: Standardize on port 8010 (avoids Riva's ports 8000–8002). Add an optional torch.compile cache volume (-v ~/.cache/vllm:/root/.cache/vllm), and create the host directory with mkdir -p ~/.cache/vllm before the first launch so that it is user-owned rather than created by Docker as root.
  • Memory: Clarify that vm.drop_caches=3 frees only system (CPU) memory, not GPU VRAM. Add a note to stop the first vLLM container and wait 30–60 s for its VRAM to be released before starting it again.
  • Troubleshooting: New subsection for ValueError: Free memory on device cuda:0 (...) is less than desired GPU memory utilization. The remedy is to stop the other process using the GPU and wait for its VRAM to be freed.
  • NGC Cosmos: Soften “org must be nvidia” to “try org nvidia if you see 403”, and note that the download often works with the default org. The permission-denied bullet now points to “Fix Hugging Face cache permissions (root-owned).”
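The VRAM failure mode and the "wait for VRAM" advice can be sketched in Python. This is a minimal sketch that assumes vLLM's startup check compares free device memory against total memory × gpu_memory_utilization. has_enough_vram and wait_for_vram are hypothetical helpers and are not part of vLLM or this repo. The (free, total) reader is injected so the logic runs without a GPU.

```python
import time
from typing import Callable, Tuple

GiB = 1024 ** 3


def has_enough_vram(free_bytes: int, total_bytes: int,
                    gpu_memory_utilization: float) -> bool:
    """Assumed check: startup fails if free < total * gpu_memory_utilization."""
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return free_bytes >= total_bytes * gpu_memory_utilization


def wait_for_vram(read_mem: Callable[[], Tuple[int, int]],
                  gpu_memory_utilization: float,
                  timeout_s: float = 60.0,
                  poll_s: float = 5.0,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll read_mem() until enough VRAM is free; return False once timeout_s elapses.

    read_mem is a hypothetical hook returning (free_bytes, total_bytes);
    with PyTorch installed, torch.cuda.mem_get_info() returns such a pair.
    """
    waited = 0.0
    while True:
        free, total = read_mem()
        if has_enough_vram(free, total, gpu_memory_utilization):
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)  # give the stopped container time to release its VRAM
        waited += poll_s


# Usage with fake readings: 20 GiB free at first, 40 GiB after one poll.
readings = iter([(20 * GiB, 64 * GiB), (40 * GiB, 64 * GiB)])
print(wait_for_vram(lambda: next(readings), 0.5, sleep=lambda s: None))  # prints True
```

The 60 s default timeout mirrors the 30–60 s wait suggested above. Injecting sleep keeps the polling loop testable without real delays.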

Preset & UI

  • presets/cosmos-reason.yaml: api_base set to http://localhost:8010/v1.
  • app.js: vLLM API preset updated from 8003 to 8010 (label and URL).
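The reason for moving the preset to 8010 can be illustrated with a small Python check. It parses an api_base such as the preset's http://localhost:8010/v1 and rejects ports in Riva's 8000–8002 range. api_base_port and check_vllm_api_base are hypothetical names for illustration only. The actual preset loader and app.js code are not shown in this PR description.

```python
from urllib.parse import urlparse

# Ports the INSTALL notes attribute to Riva (8000-8002); the vLLM preset avoids them.
RIVA_PORTS = range(8000, 8003)


def api_base_port(api_base: str) -> int:
    """Return the port in an api_base URL, or the scheme's default port."""
    parsed = urlparse(api_base)
    if parsed.port is not None:
        return parsed.port
    return 443 if parsed.scheme == "https" else 80


def check_vllm_api_base(api_base: str) -> int:
    """Return the port if it does not collide with Riva; raise ValueError otherwise."""
    port = api_base_port(api_base)
    if port in RIVA_PORTS:
        raise ValueError(f"{api_base}: port {port} collides with Riva (8000-8002)")
    return port


print(check_vllm_api_base("http://localhost:8010/v1"))  # prints 8010
```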

Backend (openai.py)

  • Model list: Docstring updated to state that we probe /api/tags first (Ollama can run on any port), then fall back to /v1/models.
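The probing order described in the docstring can be sketched as follows. This is a hypothetical Python illustration, not the actual openai.py code. It assumes that Ollama's /api/tags response carries a "models" list with "name" fields and that an OpenAI-compatible /v1/models response carries a "data" list with "id" fields. It also strips a trailing /v1 from the base URL before probing, which is an assumption about how the base URL is configured.

```python
import json
from typing import Any, Callable, List, Optional
from urllib.request import urlopen


def http_get_json(url: str, timeout: float = 3.0) -> Optional[Any]:
    """GET url and decode JSON; return None on connection, HTTP, timeout, or parse errors."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):  # URLError and HTTPError are OSError subclasses
        return None


def list_models(base_url: str,
                get_json: Callable[[str], Optional[Any]] = http_get_json) -> List[str]:
    """Probe Ollama's /api/tags first, then fall back to OpenAI-style /v1/models."""
    root = base_url.rstrip("/")
    if root.endswith("/v1"):  # assumption: the configured base URL may include /v1
        root = root[: -len("/v1")]

    tags = get_json(f"{root}/api/tags")
    if isinstance(tags, dict) and isinstance(tags.get("models"), list):
        return [m["name"] for m in tags["models"] if isinstance(m, dict) and "name" in m]

    models = get_json(f"{root}/v1/models")
    if isinstance(models, dict) and isinstance(models.get("data"), list):
        return [m["id"] for m in models["data"] if isinstance(m, dict) and "id" in m]
    return []


# Usage with a fake fetcher: only /v1/models answers (model id is hypothetical).
fake = {"http://localhost:8010/v1/models": {"object": "list", "data": [{"id": "example-model"}]}}
print(list_models("http://localhost:8010/v1", get_json=fake.get))  # prints ['example-model']
```

Probing /api/tags first works regardless of which port Ollama listens on. An OpenAI-compatible server such as vLLM simply fails that probe, so the code falls through to /v1/models.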

Testing

  • Verified the vLLM Docker run with the cache volume; the second run reuses the torch.compile cache and is ~24 s faster.
  • Confirmed that the NGC Cosmos download can succeed with the default org when permissions are correct.

tokk-nv added 2 commits March 13, 2026 17:11
- README: refresh Key Features (UI/devices, USB mic/speaker/webcam), remove unimplemented features, fix clone URL
- setup_riva: app naming (Multi-modal AI Studio / Live RIVA WebUI), riva-speech container, Docker/GPU troubleshooting from stash
- pyproject: authors (YATO, Sahu, kbenkhaled), project URLs (NVIDIA-AI-IOT), name comment

Made-with: Cursor
tokk-nv requested a review from adsahu-nv March 14, 2026 04:32

adsahu-nv (Contributor) left a comment

Looks good! Thanks

tokk-nv merged commit d08e3ba into NVIDIA-AI-IOT:main Mar 14, 2026
1 check passed