Assessment Recommendation Engine — NVIDIA NIM Stack

Enterprise-grade RAG system using NVIDIA NIM for LLM inference, NV-Embed-QA for embeddings, FAISS for vector search, BM25 for keyword retrieval, and NVIDIA Reranker for ranking.

Architecture

FastAPI Backend
  ↓
Conversation Orchestrator
  ↓
NVIDIA NIM (Llama 3.1 70B)
  ↓
Hybrid Retrieval (BM25 + Vector Search)
  ↓
NVIDIA Reranker
  ↓
Grounded Generation

Quick Start

Install dependencies:

pip install -r requirements.txt

Configure NVIDIA API:

1. Copy .env.example to .env.
2. Get your NVIDIA API key from the NVIDIA API Catalog.
3. Set NVIDIA_API_KEY in .env.

Build embeddings and indices:

python scripts/build_embeddings.py

This script:

Reads data/catalog.json
Generates embeddings using NVIDIA NV-Embed-QA
Builds FAISS vector index
Builds BM25 keyword index

Run the server:

uvicorn app.main:app --reload --port 8000

Test locally:

pytest -q

Evaluate retrieval quality:

python scripts/eval_retrieval.py

This reports:

Recall@5
Recall@10
MRR
Reranker improvements
Candidate rescue rate
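
As a reference for the first three metrics, here is a minimal sketch of how Recall@k and MRR are commonly computed. It is not taken from scripts/eval_retrieval.py; the function names and the input shape (a ranked list of IDs plus a set of relevant IDs per query) are assumptions made for illustration. Reranker improvements and candidate rescue rate are left out because this README does not define them.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    found = set(ranked_ids[:k]) & relevant_ids
    return len(found) / len(relevant_ids)


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """1 / position of the first relevant item, or 0.0 if none is retrieved."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)
```

Comparing these values with and without the reranker is one plausible way to express a reranker improvement, though the script may measure it differently.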

Hosting

The app serves:

API: POST /chat, GET /health
UI: / (static files from app/static/)

Option 1: Render (recommended)

1. Push this repo to GitHub.
2. In Render: New → Blueprint and select the repo.
3. Render will read render.yaml.
4. Add secret env var NVIDIA_API_KEY in the Render dashboard.

Option 2: Azure App Service (container)

See azure-app-service.md.

Option 3: Docker (anywhere)

docker build -t shl-nim-rag .
docker run -p 8000:8000 \
  -e NVIDIA_API_KEY=YOUR_KEY \
  -e NVIDIA_NIM_BASE_URL=https://integrate.api.nvidia.com/v1 \
  -e AUTO_BUILD_INDICES=true \
  shl-nim-rag

Open:

UI: http://localhost:8000/
Health: http://localhost:8000/health

API Endpoints

`GET /health`

Health check endpoint.

curl http://localhost:8000/health

Response:

{
  "status": "ok",
  "backend": "NVIDIA NIM"
}

`POST /chat`

Main conversation endpoint.

Request:

{
  "messages": [
    {
      "role": "user",
      "content": "I'm looking for a Python backend developer assessment"
    }
  ],
  "top_k": 5,
  "use_reranker": true
}

Response:

{
  "action": "respond",
  "reply": "Based on your requirement for a Python backend developer...",
  "retrieved_assessments": [
    {
      "rank": 1,
      "id": "assessment_1",
      "title": "Python Backend Developer Assessment",
      "hybrid_score": 0.95,
      "vector_score": 0.88,
      "bm25_score": 0.92,
      "rerank_score": 0.98,
      "final_rank": 1,
      "meta": {...}
    }
  ],
  "turn_count": 1,
  "provenance": {
    "model": "meta/llama-3.1-70b-instruct",
    "embedding_model": "nvidia/nv-embed-qa-e5-v5",
    "retrieval_method": "hybrid_bm25_vector",
    "reranked": true
  }
}
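
The request and response shapes above can be exercised from Python using only the standard library. The sketch below is illustrative: `build_chat_request` and `top_titles` are hypothetical helper names, not part of this repo, and the field names are taken from the examples shown above.

```python
import json
import urllib.request
from typing import List


def build_chat_request(base_url: str, user_message: str,
                       top_k: int = 5, use_reranker: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST /chat request with a single user message."""
    payload = {
        "messages": [{"role": "user", "content": user_message}],
        "top_k": top_k,
        "use_reranker": use_reranker,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def top_titles(response_body: dict) -> List[str]:
    """Return assessment titles ordered by the final_rank field of the response."""
    items = sorted(response_body.get("retrieved_assessments", []),
                   key=lambda item: item["final_rank"])
    return [item["title"] for item in items]
```

With the server running locally, `urllib.request.urlopen(build_chat_request("http://localhost:8000", "..."))` sends the request, and `json.load(...)` on the result gives a dict that `top_titles` can read.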

Key Components

NVIDIA NIM Client (`app/services/nim_client.py`)

Chat completions via Llama 3.1 70B
Embeddings via NV-Embed-QA
Reranking via NVIDIA Reranker

Hybrid Retrieval (`app/retrieval/hybrid.py`)

BM25 for keyword matching
Vector search via FAISS + NV-Embed
Weighted hybrid scoring (semantic + BM25 + metadata)
Query expansion for intent-rich prompts
Metadata boosting using catalog fields
Reranking for final ranking
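
To make the weighted hybrid scoring concrete, here is a minimal sketch of one common approach: min-max normalize each score source, then take a weighted sum. The weights (0.6, 0.3, 0.1), the normalization choice, and the function names are assumptions for illustration; app/retrieval/hybrid.py may combine scores differently.

```python
from typing import Dict, List, Optional


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them all to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (value - low) / (high - low) for doc_id, value in scores.items()}


def hybrid_scores(vector: Dict[str, float], bm25: Dict[str, float],
                  metadata_boost: Optional[Dict[str, float]] = None,
                  w_vector: float = 0.6, w_bm25: float = 0.3,
                  w_meta: float = 0.1) -> Dict[str, float]:
    """Weighted sum of normalized vector and BM25 scores plus a metadata boost.

    Assumes higher is better for every input, so L2 distances would need to be
    converted to similarities first. Weights are hypothetical.
    """
    vector_n, bm25_n = min_max(vector), min_max(bm25)
    boost = metadata_boost or {}
    doc_ids = set(vector_n) | set(bm25_n)
    return {
        doc_id: w_vector * vector_n.get(doc_id, 0.0)
        + w_bm25 * bm25_n.get(doc_id, 0.0)
        + w_meta * boost.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


def rank(scores: Dict[str, float]) -> List[str]:
    """Document IDs sorted by descending score, ties broken by ID."""
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

A document found by only one retriever gets 0.0 for the missing source, so items that both retrievers agree on tend to rise to the top.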

BM25 Retriever (`app/retrieval/bm25.py`)

Lightweight keyword-based search
Fast inference
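
For readers unfamiliar with BM25, the following is a small self-contained sketch of Okapi BM25 scoring. It is not the code in app/retrieval/bm25.py: the class name `TinyBM25`, the whitespace tokenizer, the parameters k1=1.5 and b=0.75, and the non-negative IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Naive lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


class TinyBM25:
    def __init__(self, documents: List[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(doc)) for doc in documents]
        self.lengths = [sum(counts.values()) for counts in self.docs]
        self.avg_len = sum(self.lengths) / len(self.lengths) if self.docs else 0.0
        # Number of documents containing each term.
        self.doc_freq: Counter = Counter()
        for counts in self.docs:
            self.doc_freq.update(counts.keys())

    def idf(self, term: str) -> float:
        """Non-negative IDF: ln((N - n + 0.5) / (n + 0.5) + 1)."""
        n = self.doc_freq.get(term, 0)
        return math.log((len(self.docs) - n + 0.5) / (n + 0.5) + 1.0)

    def score(self, query: str, index: int) -> float:
        counts, length = self.docs[index], self.lengths[index]
        total = 0.0
        for term in tokenize(query):
            tf = counts.get(term, 0)
            if tf == 0:
                continue
            # Term-frequency saturation with document-length normalization.
            norm = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
            total += self.idf(term) * tf * (self.k1 + 1) / norm
        return total

    def search(self, query: str, top_k: int = 5) -> List[Tuple[int, float]]:
        """Return (document index, score) pairs, best first."""
        scored = [(i, self.score(query, i)) for i in range(len(self.docs))]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

For example, `TinyBM25(["python backend developer", "java frontend engineer"]).search("python backend")` ranks the first document ahead of the second, which matches no query terms.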

FAISS Index (`app/retrieval/faiss_index.py`)

Vector similarity search
Efficient L2 distance computation
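
As a point of reference, an exhaustive L2 search can be written in a few lines of plain Python. This sketch only illustrates what a flat L2 index computes; it assumes a flat (brute-force) index, which this README does not confirm, and it is far slower than FAISS. Note that FAISS's flat L2 index reports squared Euclidean distances, which the sketch mirrors.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query: Sequence[float], vectors: List[Sequence[float]],
            top_k: int = 3) -> List[Tuple[int, float]]:
    """Return (vector index, squared distance) pairs, closest first."""
    distances = [(i, squared_l2(query, vector)) for i, vector in enumerate(vectors)]
    distances.sort(key=lambda pair: pair[1])
    return distances[:top_k]
```

Because smaller distances mean closer matches, these values need to be inverted or otherwise converted before they are mixed with BM25 scores, where larger is better.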

FastAPI Orchestration (`app/main.py`)

Stateless conversation handling
Clarification-first policy
Grounded generation with provenance

Configuration

Set these environment variables in .env:

NVIDIA_API_KEY=<your-nvidia-api-key>
NVIDIA_NIM_BASE_URL=https://integrate.api.nvidia.com/v1
FAISS_INDEX_PATH=data/faiss.index
EMBEDDINGS_PKL=data/embeddings.pkl
CATALOG_JSON=data/catalog.json
BM25_PKL=data/bm25_retriever.pkl

Workflow

1. User query reaches FastAPI
2. Clarification check determines whether more detail is needed
3. Hybrid retrieval combines BM25 and vector search
4. Reranking re-scores the top candidates
5. Grounding prompt includes the retrieved assessments
6. Llama 3.1 generates the response grounded in retrieved data
7. The API returns structured JSON with provenance
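
The grounding step in the workflow above can be sketched as a small prompt builder. The wording of the instructions and the function name `build_grounding_prompt` are assumptions; the actual prompt used by the app is not shown in this README. The `id` and `title` fields follow the /chat response example.

```python
from typing import Dict, List


def build_grounding_prompt(question: str, assessments: List[Dict[str, str]]) -> str:
    """Compose a prompt that restricts the model to the retrieved assessments."""
    lines = [
        "Answer using only the assessments listed below.",
        "If none of them fit, say so instead of guessing.",
        "",
        "Assessments:",
    ]
    for item in assessments:
        # Each entry carries its catalog ID so the reply can cite it.
        lines.append(f"[{item['id']}] {item['title']}")
    lines.extend(["", f"User request: {question}"])
    return "\n".join(lines)
```

Listing IDs alongside titles lets the response be checked against the retrieved set, which is one way to support the provenance field returned by the API.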

Why NVIDIA NIM?

Enterprise-grade GPU-accelerated inference
Open models such as Llama 3.1 and NV-Embed
Cost-effective pay-per-token pricing
High performance for production RAG pipelines
Useful for demonstrating modern AI infrastructure knowledge

Development Notes

FAISS and BM25 are stored locally for fast iteration
Reranker integration is a placeholder; update it to call the NVIDIA reranker API
Conversation state is reconstructed from message history
Turn count is limited to 8 to meet the assignment constraints
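
Since the server keeps no session state, the turn count can be derived from the messages sent with each request. The sketch below assumes that one turn equals one user message, which matches the /chat example (one user message, turn_count of 1) but is not spelled out here; `count_turns` and `turns_remaining` are hypothetical names.

```python
from typing import Dict, List

MAX_TURNS = 8  # limit stated in the development notes


def count_turns(messages: List[Dict[str, str]]) -> int:
    """Count user messages in the history (assumed definition of a turn)."""
    return sum(1 for message in messages if message.get("role") == "user")


def turns_remaining(messages: List[Dict[str, str]], max_turns: int = MAX_TURNS) -> int:
    """Number of user turns left before the limit is reached, never negative."""
    return max(0, max_turns - count_turns(messages))
```

A handler could refuse or wrap up the conversation once `turns_remaining` reaches zero.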

Vercel Deployment

This repo is Vercel-ready as a Python ASGI app. Vercel uses api/index.py as the entrypoint, and that module imports the FastAPI app from app/main.py.

Push the repository to GitHub.
Import the repo into Vercel.
Set the required runtime variables, especially NVIDIA_API_KEY.
Make sure the built assets in data/ are present so /health and /chat can load the retriever.
The UI is available at /ui, and / redirects there automatically.
