A cost-optimized chatbot that ingests a website URL, recursively crawls same-domain pages, builds a local FAISS vector index, and answers questions using Retrieval-Augmented Generation.
The project uses AWS Bedrock where it matters most, for embeddings, alongside a small set of supporting services:
- Amazon Titan Embeddings for converting website chunks into vectors
- Groq LLaMA (llama-3.3-70b-versatile) for grounded answer generation
- DynamoDB for optional conversation session memory
It deliberately avoids managed Bedrock knowledge bases, object-storage ingestion, and managed search infrastructure to prevent idle infrastructure cost.
The interface features a real-time 3D knowledge graph showing crawled pages as interconnected nodes, live confidence scoring, and per-namespace chat isolation.
Crawler -> Titan Embeddings (Bedrock) -> FAISS -> Groq LLaMA -> Answer
Website URL
|
v
Recursive crawler
|
v
Clean text + chunk content
|
v
Titan Embeddings via Bedrock
|
v
Local FAISS index
faiss_index/{namespace}/index.faiss
faiss_index/{namespace}/index.pkl
|
User question
|
v
Titan query embedding -> FAISS search -> Groq LLaMA -> Answer
|
v
Answer + confidence + source citations
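
The "Clean text + chunk content" step in the flow above can be sketched as follows. This is a minimal illustration, not the project's actual chunker: the function names, the character-based chunk size, and the overlap value are all assumptions chosen for the example.

```python
import re


def clean_text(raw: str) -> str:
    """Collapse all runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split cleaned text into overlapping character windows.

    chunk_size and overlap are hypothetical defaults, not values
    taken from the project configuration.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    text = clean_text(text)
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which tends to help retrieval; each resulting chunk would then be sent to Titan Embeddings.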
| Feature | Details |
|---|---|
| Recursive crawler | Follows same-domain links up to configurable depth and page limits |
| Local vector store | Uses FAISS files on disk instead of managed search infrastructure |
| Bedrock embeddings | Uses amazon.titan-embed-text-v2:0 |
| Groq generation | Uses Groq LLaMA (llama-3.3-70b-versatile) |
| Source citations | Returns source URLs and relevant chunks with each answer |
| Confidence scoring | Labels answers as HIGH, MEDIUM, LOW, or FALLBACK |
| Session memory | Stores multi-turn chat history in DynamoDB |
| Multi-site support | Namespaced FAISS indexes allow multiple sites to be ingested simultaneously |
| SSE streaming | POST /chat/stream streams answer tokens in real time for a ChatGPT-like experience |
| Safer crawling | Blocks private and loopback IP ranges to reduce SSRF risk |
| 3D Knowledge Graph | React Three Fiber visualisation showing crawled pages as nodes; cited pages glow after each answer |
| Prompt injection detection | Blocks role override attempts, system prompt extraction, and jailbreak phrases at the API layer |
| FAISS poisoning prevention | Validates every chunk before embedding - rejects injected instructions and adversarial content |
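
The crawl-scope and SSRF rules from the table above can be sketched with the standard library alone. The helper names (`is_blocked_ip`, `host_is_blocked`, `should_crawl`) are hypothetical, and the exact set of blocked ranges in the real crawler may differ; the defaults mirror the `max_depth` and `max_pages` values used in the ingest example.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_blocked_ip(ip_text: str) -> bool:
    """True for private, loopback, link-local, multicast, or unspecified addresses."""
    ip = ipaddress.ip_address(ip_text)
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_multicast or ip.is_unspecified)


def host_is_blocked(hostname: str) -> bool:
    """Resolve a hostname and block it if any resolved address is internal.

    Checking the resolved addresses (not just the hostname string) matters,
    because a public-looking name can point at 127.0.0.1.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return True  # unresolvable hosts are treated as not crawlable
    return any(is_blocked_ip(info[4][0]) for info in infos)


def should_crawl(candidate_url: str, root_url: str, depth: int, pages_seen: int,
                 max_depth: int = 2, max_pages: int = 15) -> bool:
    """Apply the same-domain, depth, and page-limit rules to one link."""
    parsed = urlparse(candidate_url)
    if parsed.scheme not in ("http", "https"):
        return False
    if parsed.hostname != urlparse(root_url).hostname:
        return False
    return depth <= max_depth and pages_seen < max_pages
```

A crawler built this way would call `should_crawl` for every discovered link and `host_is_blocked` before each request, including after redirects.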
| Layer | Technology |
|---|---|
| API | FastAPI + Uvicorn |
| Crawler | httpx |
| Embeddings | Amazon Titan Embeddings via Bedrock |
| LLM | Groq LLaMA (llama-3.3-70b-versatile) |
| Vector search | FAISS local index |
| Session memory | DynamoDB |
| Frontend | React 19 + Vite + React Three Fiber + Tailwind CSS |
| Security | Custom prompt injection detector + chunk content validator |
| Testing | pytest (35 tests) |
| Component | Service | Demo Cost |
|---|---|---|
| Crawling | Local Python/httpx | $0 |
| Embeddings | Titan Embeddings via Bedrock | ~$0.00002 per 1K tokens |
| Vector store | Local FAISS files | $0 |
| Generation | Groq LLaMA | $0 |
| Session memory | DynamoDB free tier/pay-per-request | ~$0 for demo usage |
The crawler output is embedded immediately and saved to local FAISS files. This keeps vector storage and generation free while still showing practical AWS Bedrock integration for embeddings.
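
To put the embedding price in perspective, the arithmetic can be written out. The rate comes from the cost table above; the 500-tokens-per-chunk figure in the usage note is only an assumption for illustration.

```python
# Rate from the cost table above: ~$0.00002 per 1K tokens.
TITAN_USD_PER_1K_TOKENS = 0.00002


def embedding_cost_usd(total_tokens: int) -> float:
    """Estimate the Titan embedding cost for a given token count."""
    return total_tokens / 1000 * TITAN_USD_PER_1K_TOKENS
```

For example, 48 chunks at an assumed 500 tokens each is 24,000 tokens, so `embedding_cost_usd(24_000)` comes to roughly $0.00048. Even one million tokens would cost about $0.02.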
| Feature | Details |
|---|---|
| Prompt injection detection | Detects role override attempts (ignore previous instructions, act as, jailbreak), system prompt extraction (reveal your prompt, show your system prompt), and known jailbreak phrases (DAN mode, developer mode enabled) before the message reaches the LLM |
| FAISS index poisoning prevention | Validates every chunk before embedding - rejects injected instructions, excessive repetition, low information density, high special character ratio, and known jailbreak phrases |
| SSRF protection | Crawler blocks private IP ranges (10.x.x.x, 192.168.x.x, 127.x.x.x) to prevent server-side request forgery |
| Message sanitization | Strips null bytes, collapses whitespace, removes non-printable characters, truncates to 4000 characters |
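
The chunk checks listed under FAISS index poisoning prevention could look roughly like this. It is a sketch under stated assumptions: `validate_chunk`, the phrase list, and every threshold are hypothetical and are not taken from the project's validator.

```python
import re
from collections import Counter

# Hypothetical phrase list, based on the examples named in the table above.
SUSPICIOUS_PHRASES = (
    "ignore previous instructions",
    "reveal your prompt",
    "show your system prompt",
    "dan mode",
    "developer mode enabled",
)


def validate_chunk(text: str, min_words: int = 5,
                   max_top_word_share: float = 0.5,
                   max_special_ratio: float = 0.3) -> tuple[bool, str]:
    """Return (is_valid, reason) for a chunk before it is embedded."""
    lowered = text.lower()
    for phrase in SUSPICIOUS_PHRASES:
        if phrase in lowered:
            return False, f"injected instruction: {phrase}"
    words = re.findall(r"[a-z0-9']+", lowered)
    if len(words) < min_words:
        return False, "low information density"
    # Share of the chunk taken up by its single most frequent word.
    top_count = Counter(words).most_common(1)[0][1]
    if top_count / len(words) > max_top_word_share:
        return False, "excessive repetition"
    visible = [ch for ch in text if not ch.isspace()]
    special = sum(1 for ch in visible if not ch.isalnum())
    if special / len(visible) > max_special_ratio:
        return False, "high special character ratio"
    return True, "ok"
```

Rejected chunks would simply be skipped during ingestion, so adversarial page content never reaches the index or the prompt.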
Prompt injection attempts return HTTP 400 with a structured error response. The frontend displays a red security alert bubble instead of passing the message to the LLM.
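
A minimal sketch of the sanitize-then-detect step follows. The patterns are built from the example phrases in the table above; the function names and the regex details are assumptions, and the real detector in app/core/security.py may be structured differently.

```python
import re
from typing import Optional

MAX_MESSAGE_CHARS = 4000  # truncation limit from the table above

# Hypothetical pattern groups, keyed by the reason reported to the client.
INJECTION_PATTERNS = {
    "role override attempt": [
        r"ignore (all )?previous instructions", r"\bact as\b", r"\bjailbreak\b",
    ],
    "system prompt extraction": [
        r"reveal your prompt", r"show your system prompt",
    ],
    "known jailbreak phrase": [
        r"\bdan mode\b", r"developer mode enabled",
    ],
}


def sanitize_message(message: str) -> str:
    """Strip null bytes and non-printable characters, collapse whitespace, truncate."""
    message = message.replace("\x00", "")
    message = "".join(ch for ch in message if ch.isprintable() or ch.isspace())
    message = re.sub(r"\s+", " ", message).strip()
    return message[:MAX_MESSAGE_CHARS]


def detect_injection(message: str) -> Optional[str]:
    """Return a human-readable reason if the message looks like an injection."""
    for reason, patterns in INJECTION_PATTERNS.items():
        for pattern in patterns:
            match = re.search(pattern, message, re.IGNORECASE)
            if match:
                return f"{reason} detected: {match.group(0).lower()}"
    return None


def injection_error_body(reason: str) -> dict:
    """Build the structured HTTP 400 body described above."""
    return {
        "error": "prompt_injection_detected",
        "detail": "Message contains patterns associated with prompt injection attacks.",
        "reason": reason,
    }
```

Simple phrase matching like this is cheap but can produce false positives (a question containing "act as" in an innocent sense would be blocked), so the pattern list is a tuning decision rather than a fixed rule.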
- Python 3.11+
- AWS credentials with Bedrock Runtime access
- Bedrock model access enabled for amazon.titan-embed-text-v2:0
- Groq API key for llama-3.3-70b-versatile
- Node.js 18+ (for the frontend)
- Optional: DynamoDB permission if you want persistent session memory
git clone https://github.com/labeebkm/cost-optimized-rag-website-chatbot
cd cost-optimized-rag-website-chatbot
pip install -r requirements.txt
cp .env.example .env

Edit .env with your AWS credentials and Groq settings:
GROQ_API_KEY=your-groq-api-key
GROQ_MODEL_ID=llama-3.3-70b-versatile
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
AWS_REGION=us-east-1

Optional: create the DynamoDB session table:

python scripts/setup_aws.py

The 3D chat UI is a React/Vite app located in the frontend/ folder.
cd frontend
npm install
npm run dev

Open http://localhost:5173 in your browser.

To create a production build:

npm run build

Start the backend:

uvicorn app.main:app --reload --port 8080

Open the API docs:
http://localhost:8080/docs
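
With the backend running, the endpoints can also be called from a short script instead of the docs page. The sketch below is a minimal standard-library client; `build_post`, `call_api`, and `ingest_then_ask` are hypothetical helper names, and the base URL assumes the port 8080 used in the uvicorn command above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed from the uvicorn command above


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the API endpoints."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(path: str, payload: dict, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_post(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def ingest_then_ask(url: str, question: str) -> dict:
    """Ingest a site, then ask one question against its namespace."""
    ingest = call_api("/ingest", {"url": url, "max_pages": 15, "max_depth": 2})
    return call_api("/chat", {
        "message": question,
        "namespace": ingest["namespace"],
        "session_id": None,
    })
```

Calling `ingest_then_ask("https://docs.python.org/3/", "What is a Python decorator?")` would return the chat response dictionary; a generous timeout is used because ingestion crawls and embeds before responding.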
POST /ingest
Content-Type: application/json
{
  "url": "https://docs.python.org/3/",
  "max_pages": 15,
  "max_depth": 2
}

Example response:
{
  "job_id": "ingest-1715356800",
  "status": "complete",
  "message": "Crawled 12 pages from https://docs.python.org/3/. Indexed 48 chunks into namespace 'docs.python.org'.",
  "url": "https://docs.python.org/3/",
  "namespace": "docs.python.org",
  "pages_crawled": 12,
  "chunks_indexed": 48,
  "index_path": "faiss_index/docs.python.org/index.faiss"
}

POST /chat
Content-Type: application/json
{
  "message": "What is a Python decorator?",
  "namespace": "docs.python.org",
  "session_id": null
}

Example response:
{
  "response": "A decorator is ...",
  "session_id": "sess_a1b2c3d4e5f6",
  "confidence": 0.87,
  "confidence_label": "HIGH",
  "source_chunks_used": 4,
  "sources": [
    {
      "text": "Relevant website chunk...",
      "score": 0.91,
      "source_url": "https://docs.python.org/3/..."
    }
  ],
  "fallback_used": false
}

POST /chat/stream
Content-Type: application/json
{
  "message": "What is a Python decorator?",
  "namespace": "docs.python.org",
  "session_id": null
}

Returns Server-Sent Events (SSE). The first event contains metadata, subsequent events stream answer tokens, and the final event signals completion:
data: {"type": "meta", "session_id": "sess_abc123", "confidence": 0.87, "confidence_label": "HIGH", "sources": [...]}
data: {"type": "token", "text": "A "}
data: {"type": "token", "text": "decorator "}
data: {"type": "token", "text": "is ..."}
data: {"type": "done"}
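
On the client side, the event stream shown above can be consumed line by line. The following sketch assumes each event arrives as a single `data:` line containing JSON, as in the example; `iter_sse_events` and `collect_answer` are hypothetical names, and the input can be any iterable of decoded text lines.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of every 'data:' line."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        yield json.loads(line[len("data:"):].strip())


def collect_answer(lines: Iterable[str]) -> tuple[str, dict]:
    """Join streamed tokens into the full answer and keep the metadata event."""
    meta: dict = {}
    tokens: list[str] = []
    for event in iter_sse_events(lines):
        kind = event.get("type")
        if kind == "meta":
            meta = event
        elif kind == "token":
            tokens.append(event["text"])
        elif kind == "done":
            break
    return "".join(tokens), meta
```

A UI would normally render each token as it arrives rather than joining them at the end; the joined form is used here only to keep the example short.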
A blocked prompt injection attempt returns HTTP 400 with a body like this:

{
  "error": "prompt_injection_detected",
  "detail": "Message contains patterns associated with prompt injection attacks.",
  "reason": "role override attempt detected: ignore previous instructions"
}

The confidence score is calculated from the top retrieved FAISS matches:
| Label | Score Range | Meaning |
|---|---|---|
| HIGH | >= 0.70 | Strong match in crawled website content |
| MEDIUM | 0.50-0.69 | Partial but useful match |
| LOW | 0.35-0.49 | Weak match |
| FALLBACK | < 0.35 | No reliable context found - refuses to answer rather than hallucinate |
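
The thresholds in the table map directly onto a small function. How the score itself is aggregated is not specified here, so `confidence_from_scores` below simply averages the top few similarity scores; that averaging and the `top_k` value are assumptions made for illustration.

```python
def confidence_label(score: float) -> str:
    """Map a confidence score to the label ranges in the table above."""
    if score >= 0.70:
        return "HIGH"
    if score >= 0.50:
        return "MEDIUM"
    if score >= 0.35:
        return "LOW"
    return "FALLBACK"


def confidence_from_scores(scores: list[float], top_k: int = 3) -> float:
    """Average the top_k retrieval scores (assumed aggregation, not confirmed)."""
    top = sorted(scores, reverse=True)[:top_k]
    return sum(top) / len(top) if top else 0.0
```

With no retrieved chunks the score is 0.0, which falls into FALLBACK, so an empty index leads to a refusal rather than an ungrounded answer.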
rag-website-chatbot/
  app/
    main.py
    config.py
    core/
      security.py        <- prompt injection detection + message sanitization
      exceptions.py
      logging.py
    routes/
      ingest.py
      chat.py
    services/
      crawler_service.py
      bedrock_service.py
      vector_store_service.py
      rag_service.py
      session_service.py
    models/
      schemas.py
  frontend/
    src/
      components/
        WebChatAI.tsx
        KnowledgeGraph3D.tsx
        KnowledgeGraph2D.tsx
        Scene3D.tsx
      lib/
        api.ts
  screenshots/
    rag-website-chatbot-ui.png
    architecture-diagram.png
  scripts/
    setup_aws.py
  tests/
    test_main.py         <- 35 tests
  requirements.txt
pytest tests -q

Expected output: 35 passed
- Start the backend: uvicorn app.main:app --reload --port 8080
- Start the frontend: cd frontend && npm run dev
- Open http://localhost:5173
- Paste a public URL and click Ingest - watch the 3D knowledge graph populate with crawled pages as nodes
- Ask a factual question - observe the HIGH/MEDIUM confidence score and source citations
- Watch the cited nodes pulse/glow on the knowledge graph
- Ask a follow-up question - session memory carries context automatically
- Ask an off-topic question - observe the FALLBACK response (no hallucination)
- Switch to a second ingested site from the sidebar - previous chat history is preserved
- Type "ignore previous instructions" - observe the red security alert bubble blocking the prompt injection attempt
- FAISS index is local to the machine running the API - a production deployment would use S3 + EFS for shared storage
- Current crawler focuses on HTML and plain text - PDF and table extraction can be added
- Rate limiting (slowapi) can be added to prevent API abuse
- A managed vector database (Pinecone, Weaviate) can replace local FAISS for multi-instance deployments
Labeeb K M

