A demonstration project featuring a beautiful MkDocs documentation site with an embedded chat assistant powered by a custom RAG (Retrieval-Augmented Generation) pipeline using Google Gemini.
This project showcases:
- 📚 MkDocs Documentation Site - Beautiful, searchable documentation with Material theme
- 💬 Chat Assistant - Ask questions in natural language and get answers from the docs
- 🔍 RAG Pipeline - Custom retrieval system using Gemini embeddings and ChromaDB
- 📎 Source Citations - Every answer includes cited sections from the documentation
User Question → Frontend (MkDocs)
↓
Backend API (FastAPI)
↓
Query Embedding (Gemini)
↓
Vector Search (ChromaDB)
↓
Retrieve Top-K Chunks
↓
Build Prompt + Context
↓
Gemini Generate Answer
↓
Return Answer + Citations
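The flow above can be sketched end to end in a few lines of Python. This is a minimal illustration, not the project's implementation: the bag-of-words `embed` function stands in for Gemini embeddings, the in-memory list stands in for ChromaDB, the sample chunks and file names are hypothetical, and the final generation call is left out.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[dict], top_k: int = 2) -> list[dict]:
    """Rank chunks by similarity to the question and keep the top_k best."""
    query_vector = embed(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(query_vector, embed(chunk["text"])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, retrieved: list[dict]) -> str:
    """Number each retrieved chunk so the answer can cite it as [1], [2], ..."""
    context = "\n\n".join(
        f"[{i + 1}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(retrieved)
    )
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical chunks; the real index is built from frontend/docs/.
CHUNKS = [
    {"source": "runbooks/incidents.md", "text": "declare a sev-1 incident and page the on-call engineer"},
    {"source": "howtos/deploy.md", "text": "the deployment process starts with a pull request"},
    {"source": "policies/access.md", "text": "request production database access through a ticket"},
]

top = retrieve("what is the deployment process", CHUNKS, top_k=1)
prompt = build_prompt("what is the deployment process", top)
```

The resulting `prompt` is what would be sent to the generation model, and the `source` values of the retrieved chunks are what would be returned as citations.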
mkdocs_rag/
├── frontend/ # MkDocs documentation site
│ ├── docs/ # Markdown documentation files
│ │ ├── index.md
│ │ ├── chat.md # Chat interface page
│ │ ├── runbooks/ # Operational runbooks
│ │ ├── howtos/ # How-to guides
│ │ └── policies/ # Company policies
│ ├── mkdocs.yml # MkDocs configuration
│ └── requirements.txt
│
├── backend/ # FastAPI RAG service
│ ├── rag/ # RAG pipeline components
│ │ ├── vector_store.py # Vector storage (ChromaDB)
│ │ ├── ingestion.py # Document chunking & embedding
│ │ ├── retriever.py # Query & answer generation
│ │ └── models.py # Data models
│ ├── scripts/
│ │ └── index_docs.py # Index documentation
│ ├── tests/
│ ├── main.py # FastAPI application
│ ├── config.py # Configuration
│ ├── pyproject.toml # uv dependencies
│ └── uv.lock # Dependency lock file
│
├── notebooks/ # Interactive learning notebooks
│ ├── 01_local_rag_no_cloud.ipynb # Local RAG basics
│ └── 02_simple_vertex_ai_rag.ipynb # Vertex AI RAG
│
└── README.md
- Python 3.12+
- uv package manager
- Google Gemini API key (available from Google AI Studio)
# Clone the repository
git clone <repository-url>
cd mkdocs_rag
# Set your Gemini API key
export GOOGLE_API_KEY=your_key_here
# Run setup script (installs dependencies and indexes docs)
./setup.sh
# Start both backend and frontend services
./run.sh
Then open http://localhost:8000 in your browser!
Press Ctrl+C to stop both services.
git clone <repository-url>
cd mkdocs_rag
cd backend
# Install dependencies using uv
uv sync
# Configure environment
cp .env.example .env
# Edit .env and add your GOOGLE_API_KEY
# Index the documentation
uv run python -m scripts.index_docs
# Start the API server
uv run uvicorn main:app --reload
The backend API will be available at http://localhost:8000.
Open a new terminal:
cd frontend
pip install -r requirements.txt
# Start the MkDocs server
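mkdocs serve
The documentation site will be available at http://localhost:8000. If the backend is already using port 8000, start MkDocs on another port instead, for example with mkdocs serve -a localhost:8001.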
- Open the documentation site in your browser
- Navigate to the "Chat Assistant" page
- Ask questions like:
- "How do I handle a SEV-1 incident?"
- "What is the deployment process?"
- "How do I request production database access?"
We've created hands-on Jupyter notebooks that take you from zero to building RAG Applications with Vertex AI using Google's RAG Engine.
Follow these notebooks in order for the best learning experience:
File: notebooks/01_local_rag_no_cloud.ipynb
Duration: 5-10 minutes
Prerequisites: None
Cost: $0 (no paid cloud services; runs entirely within the notebook environment, e.g. Colab)
Best for: Understanding RAG fundamentals, privacy-conscious use cases
Learn the core concepts of RAG using local tools: HuggingFace embeddings, FAISS vector store, and local LLM inference. Perfect for understanding how each component works without needing any cloud accounts.
File: notebooks/02_simple_vertex_ai_rag.ipynb
Duration: 10-15 minutes
Prerequisites: Google Cloud account, billing enabled
Cost: ~$0.10-0.50 per run
Best for: Quick production deployment with minimal code
Experience the managed RAG approach with Vertex AI's RAG Engine, which handles infrastructure automatically. See how Google's managed service can accelerate your RAG deployments.
To learn more about Google's RAG Engine capabilities and best practices, see:
📺 Video Tutorial: Building RAG Applications with Vertex AI
Open them directly in Google Colab using the badges in each notebook!
- Modern Material Design theme with light/dark mode
- Full-text search
- Responsive navigation
- Syntax highlighting
- Mobile-friendly
- Natural language question answering
- Multiple AI model options (Gemini, Groq Llama, Mixtral)
- Semantic search across all documentation
- Source citations with direct links
- Context-aware responses
- Clean, intuitive interface
- Document chunking by headers with overlap
- Gemini text embeddings (models/embedding-001)
- ChromaDB vector storage
- Gemini 2.5 Flash for answer generation
- Configurable retrieval parameters
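As a rough illustration of "chunking by headers with overlap", the sketch below splits a Markdown document at header lines and then slices each section into overlapping windows. It is not the code in backend/rag/ingestion.py: the function names are hypothetical, and it assumes CHUNK_SIZE and CHUNK_OVERLAP are measured in characters, which the configuration does not state.

```python
import re


def split_by_headers(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown into (header, body) sections at lines starting with '#'.

    Simplification: '#' lines inside fenced code blocks are also treated
    as headers, which a real implementation would need to avoid.
    """
    sections: list[tuple[str, str]] = []
    header, lines = "", []
    for line in markdown.splitlines():
        if re.match(r"^#{1,6}\s", line):
            if header or any(l.strip() for l in lines):
                sections.append((header, "\n".join(lines).strip()))
            header, lines = line.lstrip("#").strip(), []
        else:
            lines.append(line)
    if header or any(l.strip() for l in lines):
        sections.append((header, "\n".join(lines).strip()))
    return sections


def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Slice text into windows of `size` characters.

    Consecutive windows share `overlap` characters, so a sentence cut at a
    boundary still appears whole in one of the two neighbouring chunks.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def chunk_document(markdown: str, size: int = 500, overlap: int = 100) -> list[dict]:
    """Chunk each header section separately and keep the header as metadata."""
    return [
        {"header": header, "text": piece}
        for header, body in split_by_headers(markdown)
        for piece in chunk_text(body, size, overlap)
    ]
```

Keeping the header with each chunk is one way to support citations that point back to a specific section of the documentation.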
Edit backend/.env:
# API Keys
GOOGLE_API_KEY=your_key_here
GROQ_API_KEY=your_groq_key_here # Optional, for Llama models
# Paths
DOCS_PATH=../frontend/docs
CHROMA_PERSIST_DIR=./chroma_db
# RAG Parameters
EMBEDDING_MODEL=models/embedding-001
GENERATION_MODEL=gemini-2.5-flash
GROQ_GENERATION_MODEL=llama-3.1-8b-instant
CHUNK_SIZE=500
CHUNK_OVERLAP=100
TOP_K_RESULTS=5
Edit frontend/mkdocs.yml:
extra:
  backend_api_url: http://localhost:8000 # Change for production
- GET / - API information
- GET /health - Health check
- POST /api/chat - Chat with documentation
{
  "question": "How do I deploy to production?",
  "model": "groq-llama3"
}
The model field is optional and defaults to "gemini".
- GET /api/models - Get list of available models
- POST /api/reindex - Rebuild vector index (see Reindexing section)
- GET /docs - Interactive API documentation (Swagger UI)
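For calling the chat endpoint from a script, a small client might look like the sketch below. It uses only the standard library and the request fields listed above (question, plus the optional model). The helper names are hypothetical, and the response is returned as parsed JSON without assuming any particular field names, since the response schema is not documented here.

```python
import json
import urllib.request


def build_chat_request(question, model=None, base_url="http://localhost:8000"):
    """Build (but do not send) a POST /api/chat request."""
    payload = {"question": question}
    if model is not None:
        # When omitted, the backend falls back to its default model.
        payload["model"] = model
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, model=None, base_url="http://localhost:8000"):
    """Send the question to a running backend and return the parsed JSON body."""
    request = build_chat_request(question, model=model, base_url=base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `ask("How do I deploy to production?")` would return the decoded response; `GET /docs` shows the exact response shape.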
The vector index needs to be rebuilt whenever documentation content changes. Reindexing processes all markdown files, creates embeddings, and updates the vector store.
- After adding, modifying, or deleting documentation files
- When updating the documentation structure or content
- If search results seem outdated or incomplete
- After initial setup (first-time indexing)
Option 1: Using the API endpoint (recommended)
curl -X POST http://localhost:8000/api/reindex
Option 2: Using the indexing script
cd backend
uv run python -m scripts.index_docs
- Clear existing index - Removes all existing vectors from the store
- Scan documentation - Finds all .md files in the configured DOCS_PATH
- Parse and chunk - Splits documents by headers with configurable overlap
- Generate embeddings - Creates vector embeddings using Gemini's embedding model
- Store vectors - Saves embeddings and metadata to ChromaDB
{
"status": "success",
"chunks_indexed": 42
}
- Reindexing can take several minutes depending on the number of documents
- The API remains available during reindexing, but may return stale results until complete
- For production deployments, consider scheduling periodic reindexing or triggering it via CI/CD when docs change
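One way to trigger reindexing only when the docs actually change, for example from a CI job, is to fingerprint the Markdown files and compare against the last recorded value. The sketch below is an assumption about how such a check could work, not part of the project: the function names and the state file are hypothetical.

```python
import hashlib
from pathlib import Path


def docs_fingerprint(docs_path) -> str:
    """Hash the relative path and content of every .md file under docs_path."""
    digest = hashlib.sha256()
    for path in sorted(Path(docs_path).rglob("*.md")):
        digest.update(path.relative_to(docs_path).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def needs_reindex(docs_path, state_file) -> bool:
    """Return True if the docs differ from the fingerprint stored in state_file."""
    state = Path(state_file)
    previous = state.read_text().strip() if state.exists() else None
    return docs_fingerprint(docs_path) != previous


def save_fingerprint(docs_path, state_file) -> None:
    """Record the current fingerprint; call this after a successful reindex."""
    Path(state_file).write_text(docs_fingerprint(docs_path))
```

A CI step could call `needs_reindex`, issue the POST to /api/reindex when it returns True, and call `save_fingerprint` only once the reindex reports success, so a failed run is retried next time.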
cd backend
uv run pytest tests/
See backend/tests/demo_questions.md for a curated list of questions to demonstrate the system.
# Build and push container from project root
gcloud builds submit --tag gcr.io/PROJECT_ID/mkdocs-rag-backend -f backend/Dockerfile .
# Deploy to Cloud Run
gcloud run deploy mkdocs-rag-backend \
--image gcr.io/PROJECT_ID/mkdocs-rag-backend \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--set-env-vars GOOGLE_API_KEY=your_key
cd frontend
# Build static site
mkdocs build
# Deploy to Firebase
firebase deploy
# Or upload to Cloud Storage
gsutil -m rsync -r site/ gs://your-bucket/
Set these in Cloud Run:
- GOOGLE_API_KEY - Your Gemini API key
- DOCS_PATH - Path to docs in container
- GENERATION_MODEL - Gemini model to use
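To show how these variables might be consumed at startup, here is a minimal settings loader. It is a sketch rather than the contents of backend/config.py: the `Settings` class and `load_settings` function are hypothetical, and the defaults are simply the values from the sample .env earlier in this README.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    docs_path: str
    generation_model: str


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default) and fail fast
    if the API key is missing, since nothing works without it."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is not set")
    return Settings(
        google_api_key=api_key,
        # Defaults below mirror the sample .env; they are assumptions.
        docs_path=env.get("DOCS_PATH", "../frontend/docs"),
        generation_model=env.get("GENERATION_MODEL", "gemini-2.5-flash"),
    )
```

Failing at startup when the key is absent tends to be easier to diagnose in Cloud Run logs than an error on the first chat request.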
- Add Markdown files to frontend/docs/
- Update navigation in frontend/mkdocs.yml
- Reindex the vector store to make new content searchable:
curl -X POST http://localhost:8000/api/reindex
See the Reindexing section for more details.
- Chunking Strategy: Edit backend/rag/ingestion.py
- Retrieval: Modify backend/rag/retriever.py
- Vector Store: Swap implementation in backend/rag/vector_store.py
The HybridRetriever class in backend/rag/retriever.py provides an extension point for adding web-grounded search using Gemini with Google Search. This allows fallback to external sources when internal docs lack information.
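The fallback idea can be illustrated with a small wrapper that prefers internal results and only consults an external search when none of them is confident enough. This is not the project's HybridRetriever: the `FallbackRetriever` class, the `score` field, and the 0.5 threshold are all assumptions made for the sketch, and both search functions are supplied by the caller.

```python
from typing import Callable

SearchFn = Callable[[str], list[dict]]


class FallbackRetriever:
    """Hypothetical wrapper: use internal docs first, fall back to the web."""

    def __init__(self, search_docs: SearchFn, search_web: SearchFn, min_score: float = 0.5):
        self.search_docs = search_docs
        self.search_web = search_web
        self.min_score = min_score

    def retrieve(self, question: str) -> dict:
        # Keep only internal hits whose similarity score clears the threshold.
        confident = [
            hit for hit in self.search_docs(question)
            if hit["score"] >= self.min_score
        ]
        if confident:
            return {"origin": "docs", "results": confident}
        # Nothing relevant internally: defer to the external search function.
        return {"origin": "web", "results": self.search_web(question)}
```

Returning the origin alongside the results lets the frontend label web-grounded answers differently from answers cited from the documentation.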
- Check GOOGLE_API_KEY is set in .env
- Verify Python dependencies are installed
- Check port 8000 is available
- Ensure backend is running
- Check backend_api_url in mkdocs.yml
- Check browser console for errors
- Verify CORS settings in backend/main.py
- Reindex the vector store: curl -X POST http://localhost:8000/api/reindex or run uv run python -m scripts.index_docs
- Check logs for embedding errors
- Verify DOCS_PATH points to the correct location
- Ensure documents exist in the configured path
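The last two checks can be automated with a short diagnostic. The helper below is hypothetical and not shipped with the project; it only verifies that the configured path is a directory and counts the Markdown files beneath it.

```python
from pathlib import Path


def check_docs_path(docs_path) -> str:
    """Return a one-line diagnosis of the configured DOCS_PATH."""
    path = Path(docs_path)
    if not path.is_dir():
        return f"DOCS_PATH does not exist or is not a directory: {path}"
    count = sum(1 for _ in path.rglob("*.md"))
    if count == 0:
        return f"No .md files found under {path}"
    return f"OK: {count} Markdown file(s) under {path}"
```

Run from the backend directory, `print(check_docs_path("../frontend/docs"))` would report on the default location from the sample .env.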
- Frontend: MkDocs, Material for MkDocs, Vanilla JavaScript
- Backend: FastAPI, Python 3.12+
- LLM: Google Gemini (embeddings + generation)
- Vector Store: ChromaDB (demo) / PostgreSQL + pgvector (production)
- Deployment: GCP (Cloud Run, Firebase Hosting)
MIT License - feel free to use for your own projects!
This is a demo project, but suggestions and improvements are welcome!
- Built with MkDocs and Material for MkDocs
- Powered by Google Gemini
- Vector storage by ChromaDB