RAG-powered vGPU sizing recommendations for AI Virtual Workstations
Powered by NVIDIA NeMo and Nemotron models
Official Documentation • Demo • Deployment • Changelog
AI vWS Sizing Advisor is a RAG-powered tool that helps you determine the optimal NVIDIA vGPU configuration for AI workloads on NVIDIA AI Virtual Workstation (AI vWS). Drawing on NVIDIA vGPU documentation and best practices, it provides recommendations tailored for performance and resource efficiency.
This tool leverages NVIDIA Nemotron models for intelligent sizing recommendations:
- Llama-3.3-Nemotron-Super-49B — Powers the RAG backend for intelligent conversational sizing guidance
- Nemotron-3 Nano 30B — Default model for workload sizing calculations
Enter your workload requirements and receive validated recommendations including:
- vGPU Profile — Recommended profile (e.g., L40S-24Q) based on your workload
- Resource Requirements — vCPUs, GPU memory, system RAM needed
- Performance Estimates — Expected latency, throughput, and time to first token
- Live Testing — Instantly deploy and validate your configuration locally using vLLM containers
The tool differentiates between RAG and inference workloads by accounting for embedding vectors and database overhead, and it suggests GPU passthrough when a workload exceeds the limits of standard vGPU profiles.
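For intuition, GPU memory for an inference workload is dominated by model weights, with KV cache and runtime overhead on top. A minimal back-of-envelope sketch follows; this is not the tool's exact estimator, and the parameter values are illustrative only:

# Rough weight-memory estimate: params (billions) x bytes per parameter
# FP16 = 2 bytes, FP8/INT8 = 1 byte, INT4 = 0.5 bytes
PARAMS_B=8            # illustrative 8B-parameter model
BYTES_PER_PARAM=2     # FP16
echo "Approx. weight memory: $((PARAMS_B * BYTES_PER_PARAM)) GB (add KV cache, activations, and framework overhead)"

A RAG workload additionally budgets memory for the embedding model and the vector index, which is why the tool treats the two workload types differently.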
The architecture follows a three-phase pipeline: document ingestion, RAG-based profile suggestion, and local deployment verification.
Configure your workload parameters, including model selection, GPU type, quantization, and token sizes.
Validate your configuration by deploying a vLLM container locally and comparing actual GPU memory usage against estimates.
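If you want to sanity-check the comparison manually, actual GPU memory usage can be read directly from the driver while the test container is running (standard nvidia-smi query fields):

# Read current GPU memory usage while the vLLM container is running
nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv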
Requirements: Ubuntu 22.04 LTS • NVIDIA GPU (L40S/L40/L4/A40, 24GB+ VRAM) • Driver 535+ • 32GB RAM • 50GB storage
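Before installing, you can confirm the host meets these requirements with standard queries (nvidia-smi and lsb_release):

# Verify driver version (535+ required) and available VRAM
nvidia-smi --query-gpu=driver_version,memory.total --format=csv,noheader
# Verify Ubuntu release
lsb_release -ds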
# 1. Install dependencies (skip if already installed)
sudo apt update && sudo apt install -y docker.io npm
sudo usermod -aG docker $USER && newgrp docker
# 2. Clone and navigate
git clone https://github.com/NVIDIA/GenerativeAIExamples.git
cd GenerativeAIExamples/community/ai-vws-sizing-advisor
# 3. Set API key (get yours at https://build.nvidia.com/settings/api-keys)
export NGC_API_KEY="nvapi-your-key-here"
echo "${NGC_API_KEY}" | docker login nvcr.io -u '$oauthtoken' --password-stdin
# 4. Start backend (first run takes 3-5 min)
./scripts/start_app.sh
# 5. Start frontend (in new terminal)
cd frontend && npm install && npm run dev

Note: A HuggingFace token is required for local deployment testing with gated models (e.g., Llama).
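Once both are running, you can confirm the backend services came up; status.sh is the repo's own script (see Service Management below), and docker ps is standard:

# Confirm all backend containers are up
./scripts/status.sh
docker ps --format "table {{.Names}}\t{{.Status}}"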
1. Select Workload Type: RAG or Inference
2. Enter Parameters:
   - Model name (default: Nemotron-3 Nano 30B FP8)
   - GPU type
   - Prompt size (input tokens)
   - Response size (output tokens)
   - Quantization (FP16, FP8, INT8, INT4)
   - For RAG: embedding model and vector dimensions
3. View Recommendations:
   - Recommended vGPU profiles
   - Resource requirements (vCPUs, RAM, GPU memory)
   - Performance estimates
4. Test Locally (optional):
   - Run local inference with a containerized vLLM server
   - View performance metrics
   - Compare actual results against the suggested profile configuration (see the sketch after this list)
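One way to follow the local test from a terminal; the container name below is a placeholder, so adjust it to whatever docker ps reports:

# Find the vLLM test container started by the tool
docker ps --format "{{.Names}}\t{{.Image}}"
# Follow its logs to watch load progress and reported metrics
docker logs -f <vllm-container-name>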
# Service Management
./scripts/status.sh # Check status of all services
./scripts/restart_app.sh # Restart RAG backend
./scripts/stop_app.sh # Stop all services (with Docker cleanup)
./scripts/stop_app.sh --volumes # Stop services and remove all data
./scripts/stop_app.sh --cleanup-images # Stop services and clean Docker cache
# Logs
docker logs -f rag-server # View RAG server logs
docker logs -f ingestor-server # View ingestion logs

The stop script automatically performs Docker cleanup operations:
- Removes stopped containers
- Prunes unused volumes
- Cleans up unused networks
- Optionally removes dangling images (--cleanup-images)
- Optionally removes all data volumes (--volumes)
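These steps roughly correspond to Docker's standard prune commands, should you ever need to run the cleanup by hand (this mapping is an assumption; check the script for the exact behavior):

# Manual equivalents of the stop script's cleanup
docker container prune -f     # remove stopped containers
docker volume prune -f        # prune unused volumes
docker network prune -f       # clean up unused networks
docker image prune -f         # remove dangling images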
The tool includes NVIDIA vGPU documentation by default. To add your own:
# Copy document to docs directory
cp your-document.pdf ./vgpu_docs/
# Trigger ingestion
curl -X POST -F "file=@./vgpu_docs/your-document.pdf" http://localhost:8082/v1/ingest

Supported formats: PDF, TXT, DOCX, HTML, PPTX
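To ingest several documents at once, the same endpoint can be called in a loop; a minimal sketch using only the endpoint shown above:

# Ingest every PDF in the docs directory through the same endpoint
for f in ./vgpu_docs/*.pdf; do
  curl -X POST -F "file=@${f}" http://localhost:8082/v1/ingest
done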
Licensed under the Apache License, Version 2.0.
Models governed by NVIDIA AI Foundation Models Community License and Llama 3.2 Community License.



