labeebkm/cost-optimized-rag-website-chatbot

RAG-Powered Website Chatbot

A cost-optimized chatbot that ingests a website URL, recursively crawls same-domain pages, builds a local FAISS vector index, and answers questions using Retrieval-Augmented Generation.

The project relies on hosted services only where they matter most:

  • Amazon Titan Embeddings via AWS Bedrock for converting website chunks into vectors
  • Groq LLaMA (llama-3.3-70b-versatile) for grounded answer generation
  • Amazon DynamoDB for optional conversation session memory

It deliberately avoids managed Bedrock knowledge bases, object-storage ingestion, and managed search infrastructure, so there is no idle infrastructure to pay for.

Demo UI Screenshot

[Screenshot: RAG Website Chatbot UI (screenshots/rag-website-chatbot-ui.png)]

The interface features a real-time 3D knowledge graph showing crawled pages as interconnected nodes, live confidence scoring, and per-namespace chat isolation.

Architecture

[Architecture diagram (screenshots/architecture-diagram.png)]

Crawler -> Titan Embeddings (Bedrock) -> FAISS -> Groq LLaMA -> Answer

Ingestion:

Website URL
    |
    v
Recursive crawler
    |
    v
Clean text + chunk content
    |
    v
Titan Embeddings via Bedrock
    |
    v
Local FAISS index
  faiss_index/{namespace}/index.faiss
  faiss_index/{namespace}/index.pkl

Query:

User question
    |
    v
Titan query embedding -> FAISS search -> Groq LLaMA
    |
    v
Answer + confidence + source citations
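The "Recursive crawler" step amounts to a bounded breadth-first traversal. This sketch is not the project's crawler_service.py. `crawl_same_domain` and its `get_links` callback are hypothetical names. Fetching and HTML parsing (httpx in this project) are delegated to the callback, so the traversal logic can be exercised offline. "Same domain" is assumed to mean "same host" (`urlparse(...).netloc`).

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urlparse


def crawl_same_domain(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 15,
    max_depth: int = 2,
) -> list[str]:
    """Breadth-first crawl that stays on the start URL's host.

    `get_links` is a hypothetical callback: given a page URL, it returns the
    absolute URLs linked from that page. Network I/O is left to the caller.
    """
    domain = urlparse(start_url).netloc
    start, _ = urldefrag(start_url)
    queue = deque([(start, 0)])  # (url, depth) pairs waiting to be visited
    seen = {start}
    visited: list[str] = []

    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # page is kept, but its links are not followed
        for link in get_links(url):
            link, _ = urldefrag(link)  # drop #fragments so anchors are not re-crawled
            if urlparse(link).netloc != domain or link in seen:
                continue  # off-domain or already queued
            seen.add(link)
            queue.append((link, depth + 1))
    return visited
```

With this convention, pages at depth `max_depth` are fetched but their links are not followed. The README does not say whether the real crawler counts depth the same way.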

Features

Feature | Details
--- | ---
Recursive crawler | Follows same-domain links up to configurable depth and page limits
Local vector store | Uses FAISS files on disk instead of managed search infrastructure
Bedrock embeddings | Uses amazon.titan-embed-text-v2:0
Groq generation | Uses Groq LLaMA (llama-3.3-70b-versatile)
Source citations | Returns source URLs and relevant chunks with each answer
Confidence scoring | Labels answers as HIGH, MEDIUM, LOW, or FALLBACK
Session memory | Optionally stores multi-turn chat history in DynamoDB
Multi-site support | Namespaced FAISS indexes let multiple sites be ingested and queried side by side
SSE streaming | POST /chat/stream streams answer tokens in real time for a ChatGPT-like experience
Safer crawling | Blocks private and loopback IP ranges to reduce SSRF risk
3D knowledge graph | React Three Fiber visualisation showing crawled pages as nodes; cited pages glow after each answer
Prompt injection detection | Blocks role override attempts, system prompt extraction, and jailbreak phrases at the API layer
FAISS poisoning prevention | Validates every chunk before embedding and rejects injected instructions and adversarial content
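The "Safer crawling" row can be approximated with the standard library. The check resolves the hostname and refuses the URL if any resulting address is private, loopback, link-local, reserved, multicast, or unspecified. Link-local also covers the 169.254.169.254 cloud metadata address. `is_public_http_url` is a hypothetical helper, not the project's code.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_public_http_url(url: str) -> bool:
    """Return True only for http(s) URLs whose host resolves exclusively to public IPs.

    Every resolved address is checked, so a name that maps to both a public
    and a private address is rejected.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable hosts are treated as unsafe
    for _family, _type, _proto, _canonname, sockaddr in infos:
        try:
            addr = ipaddress.ip_address(sockaddr[0])
        except ValueError:
            return False
        if (
            addr.is_private
            or addr.is_loopback
            or addr.is_link_local
            or addr.is_reserved
            or addr.is_multicast
            or addr.is_unspecified
        ):
            return False
    return True
```

A pre-flight check like this does not stop DNS rebinding on its own, because the name can resolve differently when httpx actually connects. Redirect targets also need to be re-checked on every hop.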

Tech Stack

Layer | Technology
--- | ---
API | FastAPI + Uvicorn
Crawler | httpx
Embeddings | Amazon Titan Embeddings via Bedrock
LLM | Groq LLaMA (llama-3.3-70b-versatile)
Vector search | Local FAISS index
Session memory | DynamoDB
Frontend | React 19 + Vite + React Three Fiber + Tailwind CSS
Security | Custom prompt injection detector + chunk content validator
Testing | pytest (35 tests)
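Here is a minimal sketch of the Embeddings layer: calling Titan Text Embeddings V2 through a `bedrock-runtime` client's `invoke_model`. The request fields (`inputText`, `dimensions`, `normalize`) follow the Titan V2 request format. The chosen values (1024 dimensions, normalized output) and the `embed_text` name are assumptions, not necessarily what bedrock_service.py does. The client is passed in so the function can be tested with a stub.

```python
import json
from typing import Any

TITAN_MODEL_ID = "amazon.titan-embed-text-v2:0"


def embed_text(client: Any, text: str, dimensions: int = 1024, normalize: bool = True) -> list[float]:
    """Embed one string with Titan Text Embeddings V2 via a bedrock-runtime client.

    `client` is expected to behave like boto3.client("bedrock-runtime").
    The dimensions/normalize values are illustrative choices for this sketch.
    """
    body = json.dumps({"inputText": text, "dimensions": dimensions, "normalize": normalize})
    response = client.invoke_model(
        modelId=TITAN_MODEL_ID,
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream; Titan returns {"embedding": [...], ...}.
    payload = json.loads(response["body"].read())
    return payload["embedding"]
```

In practice the client would be created with `boto3.client("bedrock-runtime", region_name="us-east-1")`, which reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment.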

Cost Optimization

Component | Service | Demo cost
--- | --- | ---
Crawling | Local Python/httpx | $0
Embeddings | Titan Embeddings via Bedrock | ~$0.00002 per 1K input tokens
Vector store | Local FAISS files | $0
Generation | Groq LLaMA | $0 on Groq's free tier (rate-limited)
Session memory | DynamoDB (free tier / on-demand pay-per-request) | ~$0 for demo usage

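Crawled content is embedded immediately and saved to local FAISS files, so there is no always-on vector database to pay for. Generation runs on Groq's free tier. That leaves Titan embedding calls through Bedrock as the main metered cost, while the project still demonstrates a practical AWS Bedrock integration.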
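To put the Titan price in perspective, here is a back-of-the-envelope estimate for one ingest job. `embedding_cost_usd` is a hypothetical helper. The 500 tokens-per-chunk figure is an assumption, because the project's chunk size is not documented. The rate is the one quoted in the cost table, so check current Bedrock pricing before relying on it.

```python
# Rate quoted in the cost table; an assumption to re-check against current Bedrock pricing.
TITAN_USD_PER_1K_TOKENS = 0.00002


def embedding_cost_usd(num_chunks: int, avg_tokens_per_chunk: int) -> float:
    """Estimate the one-off Titan embedding cost of indexing num_chunks chunks."""
    total_tokens = num_chunks * avg_tokens_per_chunk
    return total_tokens / 1000 * TITAN_USD_PER_1K_TOKENS
```

Take the 48 chunks from the ingest example under API Usage and assume ~500 tokens each. That is 24,000 tokens, or roughly $0.00048. Each chat question also triggers one short query embedding, which adds a negligible amount.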

Security

Feature | Details
--- | ---
Prompt injection detection | Detects role override attempts (ignore previous instructions, act as, jailbreak), system prompt extraction (reveal your prompt, show your system prompt), and known jailbreak phrases (DAN mode, developer mode enabled) before the message reaches the LLM
FAISS index poisoning prevention | Validates every chunk before embedding and rejects injected instructions, excessive repetition, low information density, a high special-character ratio, and known jailbreak phrases
SSRF protection | Crawler blocks private and loopback IP ranges (e.g. 10.x.x.x, 192.168.x.x, 127.x.x.x) to prevent server-side request forgery
Message sanitization | Strips null bytes, collapses whitespace, removes non-printable characters, and truncates messages to 4,000 characters
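The FAISS poisoning checks can be sketched as a single pre-embedding filter. Everything here is illustrative, including `validate_chunk` and the phrase list. The thresholds are also assumptions to be tuned, not values from the project:

- at least 8 words per chunk
- a 30% cap on the most frequent word
- a 30% minimum distinct-word ratio
- a 30% cap on special characters

```python
import re
from collections import Counter

# Hypothetical phrase list; the project's actual list is not documented here.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"\bDAN mode\b", re.IGNORECASE),
    re.compile(r"developer mode enabled", re.IGNORECASE),
]
WORD_RE = re.compile(r"[a-z0-9']+")


def validate_chunk(
    text: str,
    min_words: int = 8,
    max_top_word_share: float = 0.3,
    min_unique_ratio: float = 0.3,
    max_special_ratio: float = 0.3,
) -> tuple[bool, str]:
    """Return (accepted, reason) for a chunk before it is embedded."""
    stripped = text.strip()
    # Injected instructions and known jailbreak phrases.
    for pattern in INJECTION_PATTERNS:
        if pattern.search(stripped):
            return False, "injected instruction"
    words = WORD_RE.findall(stripped.lower())
    if len(words) < min_words:
        return False, "low information density"
    counts = Counter(words)
    # Excessive repetition: a single token dominates the chunk.
    if counts.most_common(1)[0][1] / len(words) > max_top_word_share:
        return False, "excessive repetition"
    # Low information density: too few distinct tokens overall.
    if len(counts) / len(words) < min_unique_ratio:
        return False, "low information density"
    # High special-character ratio, measured over non-whitespace characters.
    visible = [ch for ch in stripped if not ch.isspace()]
    special = sum(1 for ch in visible if not ch.isalnum())
    if special / len(visible) > max_special_ratio:
        return False, "high special character ratio"
    return True, "ok"
```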

Prompt injection attempts return HTTP 400 with a structured error response. The frontend displays a red security alert bubble instead of passing the message to the LLM.
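Here is a minimal sketch of the message path, assuming sanitization runs before detection. `sanitize_message` applies the sanitization steps from the table. It removes non-printable characters before collapsing whitespace, so no double spaces are left behind. `detect_prompt_injection` returns either None or a body shaped like the HTTP 400 example under API Usage. Both function names and the regex list are hypothetical. A FastAPI route could return the dict with `JSONResponse(status_code=400, content=...)`.

```python
import re
from typing import Optional

MAX_MESSAGE_CHARS = 4000

# Hypothetical (category, pattern) rules built from the phrases in the Security table.
INJECTION_RULES = [
    ("role override attempt", re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)),
    ("role override attempt", re.compile(r"\bjailbreak\b", re.IGNORECASE)),
    ("system prompt extraction", re.compile(r"(reveal|show) your (system )?prompt", re.IGNORECASE)),
    ("known jailbreak phrase", re.compile(r"\bDAN mode\b|developer mode enabled", re.IGNORECASE)),
]


def sanitize_message(text: str) -> str:
    """Strip null bytes and non-printable characters, collapse whitespace, truncate."""
    text = text.replace("\x00", "")
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    text = re.sub(r"\s+", " ", text).strip()
    return text[:MAX_MESSAGE_CHARS]


def detect_prompt_injection(message: str) -> Optional[dict]:
    """Return an error body for the first matching rule, or None if the message looks clean."""
    for category, pattern in INJECTION_RULES:
        match = pattern.search(message)
        if match:
            return {
                "error": "prompt_injection_detected",
                "detail": "Message contains patterns associated with prompt injection attacks.",
                "reason": f"{category} detected: {match.group(0).lower()}",
            }
    return None
```

The sketch leaves out a bare `act as` rule because it also matches benign questions such as "Can a decorator act as a cache?". The project's detector may handle that phrase differently.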

Prerequisites

  • Python 3.11+
  • AWS credentials with Bedrock Runtime access
  • Bedrock model access enabled for amazon.titan-embed-text-v2:0
  • Groq API key for llama-3.3-70b-versatile
  • Node.js 18+ (for the frontend)
  • Optional: DynamoDB permission if you want persistent session memory

Backend Setup

git clone https://github.com/labeebkm/cost-optimized-rag-website-chatbot
cd cost-optimized-rag-website-chatbot
pip install -r requirements.txt
cp .env.example .env

Edit .env with your AWS credentials and Groq settings:

GROQ_API_KEY=your-groq-api-key
GROQ_MODEL_ID=llama-3.3-70b-versatile
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
AWS_REGION=us-east-1
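This stdlib-only sketch reads these values once at startup and fails fast when the Groq key is missing. `Settings` and `load_settings` are hypothetical; the project's config.py may use a different mechanism. The sketch reads the process environment only, so the .env file has to be loaded first, for example with python-dotenv's `load_dotenv()`. The AWS keys are deliberately not copied into the object, because boto3 reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    groq_model_id: str = "llama-3.3-70b-versatile"
    aws_region: str = "us-east-1"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to the .env.example defaults."""
    api_key = env.get("GROQ_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is not set; copy .env.example to .env and fill it in")
    return Settings(
        groq_api_key=api_key,
        groq_model_id=env.get("GROQ_MODEL_ID", Settings.groq_model_id),
        aws_region=env.get("AWS_REGION", Settings.aws_region),
    )
```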

Optional: create the DynamoDB session table:

python scripts/setup_aws.py

Frontend Setup

The 3D chat UI is a React/Vite app located in the frontend/ folder.

cd frontend
npm install

Run (development)

npm run dev

Open http://localhost:5173 in your browser.

Build (production)

npm run build

Run

Start the backend:

uvicorn app.main:app --reload --port 8080

Open the API docs:

http://localhost:8080/docs

API Usage

Ingest a Website

POST /ingest
Content-Type: application/json

{
  "url": "https://docs.python.org/3/",
  "max_pages": 15,
  "max_depth": 2
}

Example response:

{
  "job_id": "ingest-1715356800",
  "status": "complete",
  "message": "Crawled 12 pages from https://docs.python.org/3/. Indexed 48 chunks into namespace 'docs.python.org'.",
  "url": "https://docs.python.org/3/",
  "namespace": "docs.python.org",
  "pages_crawled": 12,
  "chunks_indexed": 48,
  "index_path": "faiss_index/docs.python.org/index.faiss"
}

Chat

POST /chat
Content-Type: application/json

{
  "message": "What is a Python decorator?",
  "namespace": "docs.python.org",
  "session_id": null
}

Example response:

{
  "response": "A decorator is ...",
  "session_id": "sess_a1b2c3d4e5f6",
  "confidence": 0.87,
  "confidence_label": "HIGH",
  "source_chunks_used": 4,
  "sources": [
    {
      "text": "Relevant website chunk...",
      "score": 0.91,
      "source_url": "https://docs.python.org/3/..."
    }
  ],
  "fallback_used": false
}
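Here is a small Python client for the two endpoints above, using only the standard library. `build_post`, `post_json`, and `ask` are hypothetical helpers. The base URL assumes the uvicorn command from the Run section. Passing the returned session_id back is what carries context into the next turn.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8080"  # matches the uvicorn command in the Run section


def build_post(path: str, payload: dict[str, Any], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict[str, Any], base_url: str = BASE_URL, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_post(path, payload, base_url), timeout=timeout) as resp:
        return json.loads(resp.read())


def ask(question: str, namespace: str, session_id: Optional[str] = None) -> dict:
    """Send one chat turn; reuse the returned session_id to continue the conversation."""
    return post_json("/chat", {"message": question, "namespace": namespace, "session_id": session_id})


# Example usage (requires the backend to be running):
# post_json("/ingest", {"url": "https://docs.python.org/3/", "max_pages": 15, "max_depth": 2})
# first = ask("What is a Python decorator?", "docs.python.org")
# follow_up = ask("Show a simple example.", "docs.python.org", first["session_id"])
```

`urlopen` raises `urllib.error.HTTPError` for non-2xx responses. A blocked prompt (HTTP 400) therefore surfaces as an exception, and its `.read()` returns the structured error body.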

Streaming Chat

POST /chat/stream
Content-Type: application/json

{
  "message": "What is a Python decorator?",
  "namespace": "docs.python.org",
  "session_id": null
}

Returns Server-Sent Events (SSE). The first event carries metadata, subsequent events stream answer tokens, and the final event signals completion:

data: {"type": "meta", "session_id": "sess_abc123", "confidence": 0.87, "confidence_label": "HIGH", "sources": [...]}

data: {"type": "token", "text": "A "}

data: {"type": "token", "text": "decorator "}

data: {"type": "token", "text": "is ..."}

data: {"type": "done"}
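This sketch consumes /chat/stream from Python. SSE events are separated by blank lines, so the parser groups `data:` lines until an empty line and decodes each payload as JSON. It then keeps the meta event and joins the token texts. `iter_sse_events` and `collect_stream` are hypothetical names; only the event shapes come from the example above.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Group SSE `data:` lines into events (a blank line ends an event) and decode JSON."""
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # SSE strips a single leading space after the colon
            data_lines.append(value)
        # Other fields (event:, id:, retry:) and ":" comments are ignored in this sketch.
    if data_lines:  # tolerate a final event without a trailing blank line
        yield json.loads("\n".join(data_lines))


def collect_stream(lines: Iterable[str]) -> tuple[dict, str]:
    """Return (meta event, full answer text) from a /chat/stream response."""
    meta: dict = {}
    parts: list[str] = []
    for event in iter_sse_events(lines):
        if event.get("type") == "meta":
            meta = event
        elif event.get("type") == "token":
            parts.append(event["text"])
        elif event.get("type") == "done":
            break
    return meta, "".join(parts)
```

With urllib, the lines can come from `(raw.decode("utf-8") for raw in resp)`, where `resp` is the open response to a JSON POST to /chat/stream.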

Prompt Injection Blocked (HTTP 400)

{
  "error": "prompt_injection_detected",
  "detail": "Message contains patterns associated with prompt injection attacks.",
  "reason": "role override attempt detected: ignore previous instructions"
}

Confidence Score

The confidence score is calculated from the top retrieved FAISS matches:

Label | Score range | Meaning
--- | --- | ---
HIGH | >= 0.70 | Strong match in crawled website content
MEDIUM | 0.50 to < 0.70 | Partial but useful match
LOW | 0.35 to < 0.50 | Weak match
FALLBACK | < 0.35 | No reliable context found; the bot declines to answer rather than hallucinate
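The README does not say how the top matches are combined. This sketch therefore assumes the mean of the top three similarity scores (higher is better), then applies the thresholds from the table. The aggregation, the `top_k` value, and both helper names are assumptions.

```python
def confidence_label(score: float) -> str:
    """Map a similarity-based confidence score to the labels in the table."""
    if score >= 0.70:
        return "HIGH"
    if score >= 0.50:
        return "MEDIUM"
    if score >= 0.35:
        return "LOW"
    return "FALLBACK"


def confidence_from_matches(similarities: list[float], top_k: int = 3) -> float:
    """Mean of the top_k similarity scores (assumed aggregation); 0.0 if nothing matched."""
    top = sorted(similarities, reverse=True)[:top_k]
    return sum(top) / len(top) if top else 0.0
```

If the index returns L2 distances instead of similarities, they need converting first. For unit-normalized embeddings, cosine similarity equals 1 - d^2 / 2, where d is the L2 distance.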

Project Structure

cost-optimized-rag-website-chatbot/
  app/
    main.py
    config.py
    core/
      security.py         <- prompt injection detection + message sanitization
      exceptions.py
      logging.py
    routes/
      ingest.py
      chat.py
    services/
      crawler_service.py
      bedrock_service.py
      vector_store_service.py
      rag_service.py
      session_service.py
    models/
      schemas.py
  frontend/
    src/
      components/
        WebChatAI.tsx
        KnowledgeGraph3D.tsx
        KnowledgeGraph2D.tsx
        Scene3D.tsx
      lib/
        api.ts
  screenshots/
    rag-website-chatbot-ui.png
    architecture-diagram.png
  scripts/
    setup_aws.py
  tests/
    test_main.py          <- 35 tests
  requirements.txt

Run Tests

pytest tests -q

Expected output: 35 passed

Demo Flow

  1. Start the backend: uvicorn app.main:app --reload --port 8080
  2. Start the frontend: cd frontend && npm run dev
  3. Open http://localhost:5173
  4. Paste a public URL and click Ingest - watch the 3D knowledge graph populate with crawled pages as nodes
  5. Ask a factual question - observe the HIGH/MEDIUM confidence score and source citations
  6. Watch the cited nodes pulse/glow on the knowledge graph
  7. Ask a follow-up question - session memory carries context automatically
  8. Ask an off-topic question - observe the FALLBACK response (no hallucination)
  9. Switch to a second ingested site from the sidebar - previous chat history is preserved
  10. Type "ignore previous instructions" - observe the red security alert bubble blocking the prompt injection attempt

Limitations and Future Improvements

  • The FAISS index is local to the machine running the API; a multi-instance production deployment would need shared storage (for example Amazon EFS, or syncing index files to S3)
  • Current crawler focuses on HTML and plain text - PDF and table extraction can be added
  • Rate limiting (slowapi) can be added to prevent API abuse
  • A managed vector database (Pinecone, Weaviate) can replace local FAISS for multi-instance deployments

Author

Labeeb K M

About

RAG-powered chatbot that crawls any website and answers questions using Groq LLaMA + AWS Bedrock Titan Embeddings + FAISS.
