MkDocs RAG Demo

A demonstration project featuring a beautiful MkDocs documentation site with an embedded chat assistant powered by a custom RAG (Retrieval-Augmented Generation) pipeline using Google Gemini.

Overview

This project showcases:

📚 MkDocs Documentation Site - Beautiful, searchable documentation with Material theme
💬 Chat Assistant - Ask questions in natural language and get answers from the docs
🔍 RAG Pipeline - Custom retrieval system using Gemini embeddings and ChromaDB
📎 Source Citations - Every answer includes cited sections from the documentation

Architecture

User Question → Frontend (MkDocs)
                    ↓
           Backend API (FastAPI)
                    ↓
           Query Embedding (Gemini)
                    ↓
      Vector Search (ChromaDB or pgvector)
                    ↓
           Retrieve Top-K Chunks
                    ↓
      Build Prompt + Context
                    ↓
      Gemini Generate Answer
                    ↓
      Return Answer + Citations
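
The flow above can be sketched end to end in a few lines. This is a toy illustration under stated assumptions, not the project's code: a bag-of-words counter stands in for Gemini embeddings, a plain list stands in for ChromaDB, and the names `embed`, `top_k`, and `build_prompt`, as well as the sample chunks and file paths, are made up for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a Gemini embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: list[dict], k: int = 2) -> list[dict]:
    """Rank chunks by similarity to the question and keep the best k."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits: list[dict]) -> str:
    """Combine retrieved chunks (tagged with their source) and the question."""
    context = "\n".join(f"[{h['source']}] {h['text']}" for h in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up sample chunks; real chunks would come from frontend/docs.
chunks = [
    {"source": "runbooks/incidents.md", "text": "For a SEV-1 incident, page the on-call engineer."},
    {"source": "howtos/deploy.md", "text": "Deployments go through the staging environment first."},
    {"source": "policies/access.md", "text": "Production database access requires manager approval."},
]
question = "How do I handle a SEV-1 incident?"
hits = top_k(question, chunks, k=1)
prompt = build_prompt(question, hits)
```

In the real pipeline the prompt would be sent to Gemini, and the `source` values of the retrieved chunks would become the citations returned with the answer.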

Project Structure

mkdocs_rag/
├── frontend/              # MkDocs documentation site
│   ├── docs/             # Markdown documentation files
│   │   ├── index.md
│   │   ├── chat.md       # Chat interface page
│   │   ├── runbooks/     # Operational runbooks
│   │   ├── howtos/       # How-to guides
│   │   └── policies/     # Company policies
│   ├── mkdocs.yml        # MkDocs configuration
│   └── requirements.txt
│
├── backend/              # FastAPI RAG service
│   ├── rag/             # RAG pipeline components
│   │   ├── vector_store.py   # Vector storage (ChromaDB)
│   │   ├── ingestion.py      # Document chunking & embedding
│   │   ├── retriever.py      # Query & answer generation
│   │   └── models.py         # Data models
│   ├── scripts/
│   │   └── index_docs.py     # Index documentation
│   ├── tests/
│   ├── main.py              # FastAPI application
│   ├── config.py            # Configuration
│   ├── pyproject.toml       # uv dependencies
│   └── uv.lock              # Dependency lock file
│
├── notebooks/            # Interactive learning notebooks
│   ├── 01_local_rag_no_cloud.ipynb      # Local RAG basics
│   └── 02_simple_vertex_ai_rag.ipynb    # Vertex AI RAG
│
└── README.md

Quick Start

Prerequisites

Python 3.12+
uv package manager
Google Gemini API key (available from Google AI Studio)

Option 1: Automated Setup (Recommended)

# Clone the repository
git clone <repository-url>
cd mkdocs_rag

# Set your Gemini API key
export GOOGLE_API_KEY=your_key_here

# Run setup script (installs dependencies and indexes docs)
./setup.sh

# Start both backend and frontend services
./run.sh

Then open http://localhost:8000 in your browser!

Press Ctrl+C to stop both services.

Option 2: Manual Setup

1. Clone and Setup

git clone <repository-url>
cd mkdocs_rag

2. Backend Setup

cd backend

# Install dependencies using uv
uv sync

# Configure environment
cp .env.example .env
# Edit .env and add your GOOGLE_API_KEY

# Index the documentation
uv run python -m scripts.index_docs

# Start the API server
uv run uvicorn main:app --reload

The backend API will be available at http://localhost:8000

3. Frontend Setup

Open a new terminal:

cd frontend
pip install -r requirements.txt

# Start the MkDocs server on port 8001 (the backend already uses 8000)
mkdocs serve -a localhost:8001

The documentation site will be available at http://localhost:8001

Try It Out!

Open the documentation site in your browser
Navigate to the "Chat Assistant" page
Ask questions like:
- "How do I handle a SEV-1 incident?"
- "What is the deployment process?"
- "How do I request production database access?"

Learning Path: Interactive Notebooks 📓

We've created hands-on Jupyter notebooks that take you from RAG fundamentals to building RAG applications with Vertex AI's RAG Engine.

Follow these notebooks in order for the best learning experience:

1. Local RAG (No Cloud Required) 🏠

File: notebooks/01_local_rag_no_cloud.ipynb

Duration: 5-10 minutes
Prerequisites: None
Cost: $0 (no paid cloud services; runs on your own machine or in a free Colab runtime)
Best for: Understanding RAG fundamentals, privacy-conscious use cases

Learn the core concepts of RAG using local tools: HuggingFace embeddings, FAISS vector store, and local LLM inference. Perfect for understanding how each component works without needing any cloud accounts.

2. Simple Vertex AI RAG Engine 🚀

File: notebooks/02_simple_vertex_ai_rag.ipynb

Duration: 10-15 minutes
Prerequisites: Google Cloud account, billing enabled
Cost: ~$0.10-0.50 per run
Best for: Quick production deployment with minimal code

Try the managed approach: Vertex AI's RAG Engine handles the infrastructure for you, which can speed up RAG deployments.

Additional Learning Resources

To learn more about Google's RAG Engine capabilities and best practices, see:

📺 Video Tutorial: Building RAG Applications with Vertex AI

Running the Notebooks

Open them directly in Google Colab using the badges in each notebook!

Features

Documentation Site

Modern Material Design theme with light/dark mode
Full-text search
Responsive navigation
Syntax highlighting
Mobile-friendly

Chat Assistant

Natural language question answering
Multiple AI model options (Gemini, Groq Llama, Mixtral)
Semantic search across all documentation
Source citations with direct links
Context-aware responses
Clean, intuitive interface

RAG Pipeline

Document chunking by headers with overlap
Gemini text embeddings (models/embedding-001)
ChromaDB vector storage
Gemini 2.5 Flash for answer generation
Configurable retrieval parameters
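
As a rough illustration of "chunking by headers with overlap", the sketch below splits markdown at header lines and then slices long sections into overlapping windows. It is not the code in backend/rag/ingestion.py: the function name `chunk_by_headers` is hypothetical, and it assumes CHUNK_SIZE and CHUNK_OVERLAP are measured in characters, which the project may not do.

```python
import re


def chunk_by_headers(markdown: str, chunk_size: int = 500, overlap: int = 100) -> list[dict]:
    """Split markdown into header-scoped chunks of at most chunk_size characters.

    Consecutive chunks from the same section share `overlap` characters.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    # Split just before every line that starts with 1-6 '#' characters and a space.
    for section in re.split(r"(?m)^(?=#{1,6} )", markdown):
        section = section.strip()
        if not section:
            continue
        first_line = section.splitlines()[0]
        header = first_line.lstrip("#").strip() if first_line.startswith("#") else ""
        for start in range(0, len(section), step):
            chunks.append({"header": header, "text": section[start:start + chunk_size]})
            if start + chunk_size >= len(section):
                break  # the current window already reaches the end of the section
    return chunks


# A long section followed by a short one.
doc = "# Deploy\n" + "a" * 700 + "\n## Rollback\nshort"
chunks = chunk_by_headers(doc, chunk_size=500, overlap=100)
```

Keeping the header with each chunk gives the retriever a natural label to cite, and the overlap reduces the chance that a sentence cut at a window boundary is lost to both neighbours.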

Configuration

Backend Configuration

Edit backend/.env:

# API Keys
GOOGLE_API_KEY=your_key_here
GROQ_API_KEY=your_groq_key_here  # Optional, for Llama models

# Paths
DOCS_PATH=../frontend/docs
CHROMA_PERSIST_DIR=./chroma_db

# RAG Parameters
EMBEDDING_MODEL=models/embedding-001
GENERATION_MODEL=gemini-2.5-flash
GROQ_GENERATION_MODEL=llama-3.1-8b-instant
CHUNK_SIZE=500
CHUNK_OVERLAP=100
TOP_K_RESULTS=5
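
For orientation, here is one way these variables could be read and validated in Python. It is a sketch, not the contents of backend/config.py: the `Settings` class and `load_settings` function are hypothetical, the defaults simply mirror the sample values above, and the sketch reads the process environment rather than parsing the .env file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    docs_path: str = "../frontend/docs"
    chroma_persist_dir: str = "./chroma_db"
    embedding_model: str = "models/embedding-001"
    generation_model: str = "gemini-2.5-flash"
    chunk_size: int = 500
    chunk_overlap: int = 100
    top_k_results: int = 5


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, falling back to the sample values."""
    env = os.environ if env is None else env
    defaults = Settings(google_api_key="")
    settings = Settings(
        google_api_key=env.get("GOOGLE_API_KEY", ""),
        docs_path=env.get("DOCS_PATH", defaults.docs_path),
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", defaults.chroma_persist_dir),
        embedding_model=env.get("EMBEDDING_MODEL", defaults.embedding_model),
        generation_model=env.get("GENERATION_MODEL", defaults.generation_model),
        chunk_size=int(env.get("CHUNK_SIZE", defaults.chunk_size)),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", defaults.chunk_overlap)),
        top_k_results=int(env.get("TOP_K_RESULTS", defaults.top_k_results)),
    )
    # Fail early on values that would break indexing or generation later.
    if not settings.google_api_key:
        raise ValueError("GOOGLE_API_KEY is required")
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings


# Passing a dict instead of os.environ keeps the example independent of your shell.
settings = load_settings({"GOOGLE_API_KEY": "your_key_here", "TOP_K_RESULTS": "3"})
```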

Frontend Configuration

Edit frontend/mkdocs.yml:

extra:
  backend_api_url: http://localhost:8000  # Change for production

API Endpoints

GET / - API information
GET /health - Health check

POST /api/chat - Chat with documentation

{
  "question": "How do I deploy to production?",
  "model": "groq-llama3"
}

The model field is optional and defaults to "gemini".

GET /api/models - Get list of available models
POST /api/reindex - Rebuild vector index (see Reindexing section)
GET /docs - Interactive API documentation (Swagger UI)
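
A minimal Python client for POST /api/chat might look like the sketch below, using only the standard library. The helper names `build_chat_request` and `ask` are hypothetical, the base URL assumes the local backend from the Quick Start, and the shape of the response body is not documented here, so the sketch just returns the parsed JSON.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(question: str, model: Optional[str] = None,
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /api/chat request; `model` is left out when not given."""
    payload = {"question": question}
    if model is not None:
        payload["model"] = model
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, model: Optional[str] = None) -> dict:
    """Send the question to a running backend and return the parsed JSON response."""
    request = build_chat_request(question, model)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Building the request does not need a running server; calling ask() does.
request = build_chat_request("How do I deploy to production?", model="groq-llama3")
```

With the backend running, `ask("How do I deploy to production?")` would return the answer payload; check GET /docs for the exact response fields.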

Reindexing

The vector index needs to be rebuilt whenever documentation content changes. Reindexing processes all markdown files, creates embeddings, and updates the vector store.

When to Reindex

After adding, modifying, or deleting documentation files
When updating the documentation structure or content
If search results seem outdated or incomplete
After initial setup (first-time indexing)

How to Reindex

Option 1: Using the API endpoint (recommended)

curl -X POST http://localhost:8000/api/reindex

Option 2: Using the indexing script

cd backend
uv run python -m scripts.index_docs

What Happens During Reindexing

Clear existing index - Removes all existing vectors from the store
Scan documentation - Finds all .md files in the configured DOCS_PATH
Parse and chunk - Splits documents by headers with configurable overlap
Generate embeddings - Creates vector embeddings using Gemini's embedding model
Store vectors - Saves embeddings and metadata to ChromaDB
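
The five steps can be sketched as follows. This is an illustration only: a dict stands in for ChromaDB, `fake_embed` is a placeholder that derives numbers from a hash instead of calling Gemini, the header split has no overlap, and the function names and sample files are hypothetical rather than taken from scripts/index_docs.py.

```python
import hashlib
import re
import tempfile
from pathlib import Path


def fake_embed(text: str) -> list[float]:
    """Placeholder vector derived from a hash; a real pipeline would call Gemini here."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:8]]


def reindex(docs_path, store: dict) -> dict:
    root = Path(docs_path)
    store.clear()                                         # 1. clear existing index
    for md_file in sorted(root.rglob("*.md")):            # 2. scan documentation
        source = md_file.relative_to(root).as_posix()
        text = md_file.read_text(encoding="utf-8")
        sections = re.split(r"(?m)^(?=#{1,6} )", text)    # 3. parse and chunk by header
        sections = [s.strip() for s in sections if s.strip()]
        for i, section in enumerate(sections):
            store[f"{source}#{i}"] = {                    # 5. store vector and metadata
                "source": source,
                "text": section,
                "embedding": fake_embed(section),         # 4. generate embedding
            }
    return {"status": "success", "chunks_indexed": len(store)}


# Demo on a throwaway docs tree with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "howtos").mkdir()
    (Path(tmp) / "index.md").write_text("# Home\nWelcome.\n## Links\nSee howtos.\n", encoding="utf-8")
    (Path(tmp) / "howtos" / "deploy.md").write_text("# Deploy\nSteps.\n", encoding="utf-8")
    store = {"stale-entry": {}}
    result = reindex(tmp, store)
```

Because the first step clears the store, entries for deleted or renamed files disappear on the next run, which is why a full reindex is the simplest way to keep search results in sync with the docs.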

Reindexing Response

{
  "status": "success",
  "chunks_indexed": 42
}
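
A script that triggers reindexing can check this response before continuing. The sketch below uses only the standard library; `parse_reindex_response` and `trigger_reindex` are hypothetical helper names, and treating any status other than "success" as a failure is an assumption, since the error format is not documented here.

```python
import json
import urllib.request


def parse_reindex_response(body: bytes) -> int:
    """Return the number of indexed chunks, or raise if the status is not 'success'."""
    data = json.loads(body)
    if data.get("status") != "success":
        raise RuntimeError(f"Reindex failed: {data}")
    return int(data["chunks_indexed"])


def trigger_reindex(base_url: str = "http://localhost:8000", timeout: float = 600) -> int:
    """POST to /api/reindex on a running backend and return the chunk count."""
    request = urllib.request.Request(f"{base_url}/api/reindex", data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_reindex_response(response.read())


# Parsing the documented sample response needs no running server.
count = parse_reindex_response(b'{"status": "success", "chunks_indexed": 42}')
```

The generous timeout reflects that reindexing can take several minutes on larger doc sets.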

Notes

Reindexing can take several minutes depending on the number of documents
The API remains available during reindexing, but may return stale results until complete
For production deployments, consider scheduling periodic reindexing or triggering it via CI/CD when docs change
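
For the CI/CD case, one possible approach is to fingerprint the docs directory and reindex only when the fingerprint changes. The sketch below is an assumption about how such a check could work, not part of the project: `docs_fingerprint`, `needs_reindex`, `record_indexed`, and the state file are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def docs_fingerprint(docs_path) -> str:
    """Hash the relative path and content of every .md file under docs_path."""
    root = Path(docs_path)
    digest = hashlib.sha256()
    for md_file in sorted(root.rglob("*.md")):
        digest.update(md_file.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(md_file.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def needs_reindex(docs_path, state_file) -> bool:
    """True when the docs differ from the fingerprint saved after the last reindex."""
    state = Path(state_file)
    previous = state.read_text(encoding="utf-8").strip() if state.exists() else ""
    return docs_fingerprint(docs_path) != previous


def record_indexed(docs_path, state_file) -> None:
    """Save the current fingerprint; call this only after a successful reindex."""
    Path(state_file).write_text(docs_fingerprint(docs_path), encoding="utf-8")


# Demo on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "docs"
    docs.mkdir()
    (docs / "index.md").write_text("# Home\n", encoding="utf-8")
    state = Path(tmp) / "fingerprint.txt"
    first_run = needs_reindex(docs, state)
    record_indexed(docs, state)
    after_indexing = needs_reindex(docs, state)
    (docs / "index.md").write_text("# Home\nUpdated.\n", encoding="utf-8")
    after_edit = needs_reindex(docs, state)
```

Separating the check from the write means a failed reindex leaves the old fingerprint in place, so the next pipeline run tries again.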

Testing

Backend Tests

cd backend
uv run pytest tests/

Demo Questions

See backend/tests/demo_questions.md for a curated list of questions to demonstrate the system.

Deployment to GCP

Backend (Cloud Run)

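# Build and push the container from the project root
# (requires Docker and `gcloud auth configure-docker`)
docker build -f backend/Dockerfile -t gcr.io/PROJECT_ID/mkdocs-rag-backend .
docker push gcr.io/PROJECT_ID/mkdocs-rag-backend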

# Deploy to Cloud Run
gcloud run deploy mkdocs-rag-backend \
  --image gcr.io/PROJECT_ID/mkdocs-rag-backend \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars GOOGLE_API_KEY=your_key

Frontend (Firebase Hosting or Cloud Storage)

cd frontend

# Build static site
mkdocs build

# Deploy to Firebase
firebase deploy

# Or upload to Cloud Storage
gsutil -m rsync -r site/ gs://your-bucket/

Environment Variables for Production

Set these in Cloud Run:

GOOGLE_API_KEY - Your Gemini API key
DOCS_PATH - Path to docs in container
GENERATION_MODEL - Gemini model to use

Development

Adding New Documentation

Add markdown files to frontend/docs/
Update navigation in frontend/mkdocs.yml
Reindex the vector store to make new content searchable:
```
curl -X POST http://localhost:8000/api/reindex
```
See the Reindexing section for more details.

Customizing the RAG Pipeline

Chunking Strategy: Edit backend/rag/ingestion.py
Retrieval: Modify backend/rag/retriever.py
Vector Store: Swap implementation in backend/rag/vector_store.py

Future: Hybrid RAG + Web Grounding

The HybridRetriever class in backend/rag/retriever.py provides an extension point for adding web-grounded search using Gemini with Google Search. This allows fallback to external sources when internal docs lack information.
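
The fallback idea can be illustrated with a small sketch. It does not reflect the actual interface of HybridRetriever in backend/rag/retriever.py: the class name `HybridRetrieverSketch`, the score threshold, and both stub search functions are assumptions made for the example.

```python
from typing import Callable


class HybridRetrieverSketch:
    """Illustrative only: prefer internal docs, fall back to web search."""

    def __init__(self, internal_search: Callable[[str], list],
                 web_search: Callable[[str], list], min_score: float = 0.5):
        self.internal_search = internal_search
        self.web_search = web_search
        self.min_score = min_score  # hypothetical relevance cutoff

    def retrieve(self, question: str) -> dict:
        # Keep only internal hits that clear the relevance cutoff.
        hits = [h for h in self.internal_search(question) if h["score"] >= self.min_score]
        if hits:
            return {"origin": "docs", "results": hits}
        # Nothing relevant in the docs: fall back to the external source.
        return {"origin": "web", "results": self.web_search(question)}


def internal_search(question: str) -> list:
    """Stub: pretend the internal docs only cover deployments."""
    if "deploy" in question.lower():
        return [{"text": "Deployment steps", "score": 0.9}]
    return [{"text": "Unrelated chunk", "score": 0.1}]


def web_search(question: str) -> list:
    """Stub standing in for a web-grounded Gemini call."""
    return [{"text": f"web result for: {question}", "score": None}]


retriever = HybridRetrieverSketch(internal_search, web_search)
from_docs = retriever.retrieve("How do I deploy?")
from_web = retriever.retrieve("What is pgvector?")
```

Returning the origin alongside the results lets the chat interface show whether an answer is grounded in the internal documentation or in external sources.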

Troubleshooting

Backend won't start

Check GOOGLE_API_KEY is set in .env
Verify Python dependencies are installed (run uv sync in backend/)
Check port 8000 is available

Chat not working

Ensure backend is running
Check backend_api_url in mkdocs.yml
Check browser console for errors
Verify CORS settings in backend/main.py

No search results

Reindex the vector store: curl -X POST http://localhost:8000/api/reindex or run uv run python -m scripts.index_docs
Check logs for embedding errors
Verify DOCS_PATH points to correct location
Ensure documents exist in the configured path

Technology Stack

Frontend: MkDocs, Material for MkDocs, Vanilla JavaScript
Backend: FastAPI, Python 3.12+
LLM: Google Gemini (embeddings + generation)
Vector Store: ChromaDB (demo) / PostgreSQL + pgvector (production)
Deployment: GCP (Cloud Run, Firebase Hosting)

License

MIT License - feel free to use for your own projects!

Contributing

This is a demo project, but suggestions and improvements are welcome!

Acknowledgments

Built with MkDocs and Material for MkDocs
Powered by Google Gemini
Vector storage by ChromaDB
