This project is a sophisticated, full-stack customer support chatbot that leverages a Retrieval-Augmented Generation (RAG) pipeline. It's designed to provide intelligent, context-aware responses by drawing from both unstructured documents and structured databases. The application is fully containerized with Docker for easy setup and deployment.
The following sequence demonstrates a typical customer support interaction flow, showing how the chatbot handles various scenarios and escalates to human agents when needed:
The customer initiates a conversation by asking about the return policy. The chatbot uses its RAG pipeline to retrieve relevant information from the policy documents and provides a comprehensive response.
When the customer needs specific order information or requests to speak with a human agent, the chatbot can handle both structured data queries and escalation requests.
When a customer requests human assistance, the system automatically sends a notification to the support team via Slack integration, ensuring timely response to escalations.
A human support agent receives the notification and can review the conversation history before taking over the chat session.
The human agent seamlessly takes over the conversation, with full context of the previous interactions, ensuring continuity in customer service.
The customer continues to receive personalized support from the human agent, with the system maintaining the conversation flow and history.
The application is designed with a modern, services-oriented architecture, consisting of a frontend web application, a backend API, and a suite of data stores. The entire system is containerized using Docker, making it portable and easy to deploy.
Here’s a breakdown of the main components:
- Frontend: A single-page application (SPA) built with React and TypeScript, using Vite for the build tooling. It provides the user interface for the chatbot and the agent dashboard, interacts with the backend via a REST API, and is served by Nginx in the Docker setup.
- Backend API: A robust backend service built with FastAPI (Python). It serves as the central hub of the system, handling business logic, user authentication, and orchestration of the RAG pipeline. It exposes several endpoints for managing chat sessions, ingesting data, and handling escalations.
- Data Stores: The system takes a polyglot persistence approach, using different databases for different purposes:
  - PostgreSQL: A relational database used to store structured data, such as customer and order information.
  - MongoDB: A NoSQL document database used for storing chat histories and other semi-structured data.
  - Redis: An in-memory data store used for caching and managing user sessions, ensuring fast access to session data.
  - Pinecone: A managed vector database used for the RAG pipeline. It stores document embeddings for efficient similarity search and also serves as a semantic cache for chat responses.
- Integrations:
  - OpenAI: The system leverages OpenAI's language models (e.g., `gpt-4o-mini`) for various tasks within the RAG pipeline, including query classification, response generation, and groundedness checking.
  - Slack: The application is integrated with Slack to send real-time alerts when a user escalates a conversation to a human agent.
The RAG pipeline is the core of the chatbot's intelligence. It is implemented as a sophisticated state machine using LangGraph, which allows for a flexible and powerful flow of logic. The pipeline processes user queries to generate accurate, context-aware, and grounded responses.
The pipeline consists of several nodes, each performing a specific task:
- Router: This is the entry point of the pipeline. It uses an LLM to classify the user's query into one of several predefined categories (e.g., `chitchat`, `order_lookup`, `escalation`). This classification determines the subsequent path through the graph, deciding whether to retrieve data from the SQL database, the document store, or both.
- Cache Check: Before executing the full pipeline, the system checks a semantic cache (powered by Pinecone) for similar, previously answered queries. If a sufficiently similar query is found, the cached response is returned immediately, reducing latency and cost.
- SQL Retrieval: If the router determines that the query requires specific information about an order or customer, this node connects to the PostgreSQL database to fetch the relevant data. This allows the chatbot to answer questions like "What is the status of my order?".
- Document Retrieval: For queries related to policies, product information, or other general knowledge, this node retrieves relevant documents from the Pinecone vector store. The retrieved documents are then passed through a reranker to ensure that only the most relevant information is used to generate the answer.
- Generation: This is the heart of the RAG pipeline. It uses a powerful LLM to synthesize an answer based on all the information gathered in the previous steps, including the original user query, data from the SQL database, content from the retrieved documents, and the recent conversation history.
- Groundedness Check: After a response is generated, this final node acts as a quality-control step. It uses an LLM to verify that the generated answer is directly supported by the information retrieved from the database or documents. If the answer is found to be ungrounded, the system can attempt to regenerate it with feedback, ensuring higher accuracy and reducing hallucinations.
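The control flow described above can be sketched in plain Python. This is a simplified, dependency-free stand-in for the actual LangGraph implementation: the keyword-based `route` function substitutes for the LLM classifier, and the retrieval and generation nodes are reduced to placeholder strings.

```python
# Illustrative sketch of the pipeline's control flow, not the real
# LangGraph graph. Node names mirror the ones described above.

def route(query: str) -> str:
    """Stand-in for the LLM router: classify the query into a category."""
    q = query.lower()
    if "order" in q:
        return "order_lookup"
    if "human" in q or "agent" in q:
        return "escalation"
    return "policy_question"

def run_pipeline(query: str) -> dict:
    state = {"query": query, "context": [], "answer": None}

    # Router node picks the retrieval path (cache check omitted for brevity)
    category = route(query)
    if category == "order_lookup":
        # The SQL Retrieval node would query PostgreSQL here
        state["context"].append("sql: order status row")
    elif category == "policy_question":
        # The Document Retrieval node would search Pinecone here
        state["context"].append("doc: relevant policy snippet")

    # Generation node: synthesize an answer from the gathered context
    state["answer"] = f"[{category}] based on {len(state['context'])} source(s)"

    # The Groundedness Check node would validate the answer against
    # state["context"] before returning it to the user
    return state

print(run_pipeline("What is the status of my order?")["answer"])
# [order_lookup] based on 1 source(s)
```

In the real graph, each of these steps is a LangGraph node, and the router's output drives conditional edges between them.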
- Conversational AI: A session-aware chat interface that maintains conversation history.
- Hybrid RAG Pipeline:
  - Retrieves information from PDF documents (e.g., policies, guides) using vector search with Pinecone.
  - Queries structured data from a PostgreSQL database (e.g., customer info, order details) using natural-language-to-SQL translation.
- Semantic Caching: Reduces latency and API costs by caching similar queries and their responses.
- Modern Tech Stack:
  - Frontend: React, TypeScript, Vite, and Tailwind CSS.
  - Backend: FastAPI, Python 3.12, and LangGraph for building the RAG pipeline.
  - Databases: PostgreSQL for structured data, MongoDB for chat history, Redis for session management, and Pinecone for vector storage.
- Dockerized Environment: The entire application stack can be spun up with a single `docker-compose` command.
- Data Ingestion: REST API endpoints for ingesting both PDF documents and CSV files into the system.
- Agent Escalation: A system for flagging conversations for review by a human agent.
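The semantic-caching feature above can be illustrated with a toy example. Here, hand-made three-dimensional vectors stand in for real OpenAI embeddings, and an in-memory list stands in for Pinecone; the actual system computes the same kind of similarity over 1536-dimensional embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy cache: (embedding of a past query, its cached response)
CACHE = [
    ([0.9, 0.1, 0.0], "You can return items within 30 days."),
]

def cache_lookup(query_embedding, threshold=0.9):
    """Return a cached response if a sufficiently similar query exists."""
    best = max(CACHE, key=lambda entry: cosine(query_embedding, entry[0]))
    score = cosine(query_embedding, best[0])
    return best[1] if score >= threshold else None

# A near-duplicate query hits the cache; an unrelated one misses.
print(cache_lookup([0.85, 0.15, 0.0]))  # cached response
print(cache_lookup([0.0, 0.1, 0.9]))    # None
```

The threshold is the key tuning knob: too low and unrelated queries get stale answers, too high and the cache rarely hits.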
The application is composed of several services that work together:
- Frontend: A React-based single-page application that provides the user interface for the chatbot.
- API Backend: A FastAPI application that exposes REST endpoints for chat, session management, data ingestion, and more.
- RAG Pipeline: Built with LangGraph, this is the core of the chatbot's intelligence. It orchestrates a series of nodes to process user queries:
  - Router: Determines the type of query (e.g., a policy question or a database lookup).
  - Cache Check: Checks whether a semantically similar query has been answered before.
  - SQL Retriever: If needed, translates the natural language query into SQL, executes it against the PostgreSQL database, and retrieves the results.
  - Document Retriever: If needed, performs a vector search in Pinecone to find relevant document snippets.
  - Generate: Uses a large language model (such as OpenAI's GPT) to synthesize an answer based on the retrieved context.
  - Groundedness Check: Verifies that the generated answer is grounded in the retrieved information to prevent hallucinations.
- Databases:
  - PostgreSQL: Stores structured business data like customers, products, and orders.
  - MongoDB: Persists long-term chat history for each session.
  - Redis: Manages active user sessions and caches recent messages for quick access.
  - Pinecone: Stores vector embeddings of documents for efficient similarity search.
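As a rough illustration of what the groundedness check does, here is a word-overlap heuristic. The real pipeline asks an LLM to make this judgment; the heuristic below is only a stand-in to show the shape of the decision.

```python
# Simplified stand-in for the groundedness check: approximate "is the
# answer supported by the retrieved context?" with word overlap.

def is_grounded(answer: str, context: str, min_overlap: float = 0.5) -> bool:
    """True if most of the answer's words appear in the retrieved context."""
    answer_words = {w.lower().strip(".,") for w in answer.split()}
    context_words = {w.lower().strip(".,") for w in context.split()}
    if not answer_words:
        return False
    overlap = len(answer_words & context_words) / len(answer_words)
    return overlap >= min_overlap

context = "Returns are accepted within 30 days of purchase."
assert is_grounded("Returns are accepted within 30 days.", context)
assert not is_grounded("We offer free shipping worldwide.", context)
```

When the check fails, the pipeline can loop back to the generation node with feedback rather than returning an unsupported answer.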
```
.
├── app/                  # FastAPI application code
│   ├── api/              # API endpoints (routers)
│   └── streamlit/        # (Optional) Streamlit demo UI
├── data/                 # Sample data (PDFs and CSVs)
├── frontend/             # React frontend application
│   └── src/
├── src/                  # Core Python source code
│   ├── graph/            # LangGraph RAG pipeline definition
│   ├── ingestion/        # Data ingestion scripts
│   ├── persistence/      # Database interaction logic
│   ├── retrievers/       # Document and SQL retrievers
│   └── vectorstores/     # Vector store integration
├── tests/                # Automated tests
├── docker-compose.yaml   # Docker services definition
├── Dockerfile            # Dockerfile for the backend API
└── README.md             # This file
```
- Docker and Docker Compose
- An OpenAI API key (get one from OpenAI Platform)
- A Pinecone API key (get one from Pinecone Console)
- A Pinecone index named `ecomm-policies-v1` with:
  - Dimensions: 1536 (for OpenAI's `text-embedding-3-small` model)
  - Metric: cosine
Create a .env file in the project root by copying the provided .env.example file:
```shell
cp .env.example .env
```
Then edit the `.env` file and update the following required variables:
```shell
# Core API Keys (Required)
OPENAI_API_KEY="your_openai_api_key_here"
PINECONE_API_KEY="your_pinecone_api_key_here"

# Application Environment
ENVIRONMENT="dev"

# Database Connection Strings (Default values for Docker)
POSTGRES_DSN="postgresql+psycopg://user:password@localhost:5433/ecomm"
REDIS_URL="redis://localhost:6379/0"
MONGODB_URI="mongodb://localhost:27017"

# Frontend Configuration
FRONTEND_BASE_URL="http://localhost:3000"

# Admin User Credentials
ADMIN_EMAIL="admin@example.com"
ADMIN_PASSCODE="admin_passcode"

# Slack Integration (Optional - leave empty if not using)
SLACK_WEBHOOK_URL=""
SLACK_BOT_TOKEN=""
SLACK_CHANNEL_ID=""
```

Important: You must provide your own OpenAI and Pinecone API keys. The other values can remain as defaults for local Docker development.
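A startup check for the two required keys might look like the sketch below. This is illustrative only; the actual application may validate its settings differently (e.g., via pydantic-settings).

```python
import os

# Stand-in values so this example is self-contained; in practice these
# come from the .env file loaded by the application.
os.environ["OPENAI_API_KEY"] = "sk-demo"
os.environ["PINECONE_API_KEY"] = ""  # empty counts as missing

def missing_env(required=("OPENAI_API_KEY", "PINECONE_API_KEY")):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

print(missing_env())  # ['PINECONE_API_KEY']
```

Failing fast on missing keys at startup gives a clearer error than a failed OpenAI or Pinecone call deep inside the pipeline.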
```shell
docker-compose up --build
```
This will build the Docker images and start all the services. The first time you run this, it may take a few minutes to download the images and build the containers.
Once the services are running, you need to load the sample data.
Run the following curl command to load the CSV data into the PostgreSQL database:
```shell
curl -X POST "http://localhost:8000/v1/ingest/csv" \
  -H "Content-Type: application/json" \
  -d '{
    "dsn": "postgresql+psycopg://user:password@postgres:5432/ecomm",
    "customers_csv_path": "/app/data/fake_customers.csv",
    "orders_csv_path": "/app/data/fake_orders.csv",
    "products_csv_path": "/app/data/fake_products.csv"
  }'
```
To ingest the policy documents into Pinecone, run:
```shell
curl -X POST "http://localhost:8000/v1/ingest/docs" \
  -H "Content-Type: application/json" \
  -d '{
    "sources": ["/app/data"]
  }'
```

- Frontend: http://localhost:3000
- Backend API Docs: http://localhost:8000/docs
You can log in with any email from fake_customers.csv and the passcode 12345. For example: dmadocjones0@oracle.com.
1. CSV Ingestion Fails with Authentication Error
- Ensure all environment variables are set correctly in your `.env` file
- Verify that the PostgreSQL service is running: `docker-compose ps postgres`
- Check the logs: `docker-compose logs postgres`
2. Document Ingestion Fails
- Verify your Pinecone API key is correct
- Ensure you've created the Pinecone index `ecomm-policies-v1` with the correct dimensions (1536) and metric (cosine)
- Check the API logs: `docker-compose logs api`
3. Frontend Not Loading
- Ensure all services are running: `docker-compose ps`
- Try rebuilding: `docker-compose down && docker-compose up --build`
- Check if port 3000 is available: `lsof -i :3000`
4. Services Won't Start
- Make sure Docker is running
- Check for port conflicts (ports 3000, 8000, 5433, 6379, 27017)
- Try: `docker-compose down && docker system prune -f && docker-compose up --build`
For local development with hot-reloading:
- Start the databases in Docker: `docker-compose up -d postgres redis mongo`
- Install Python dependencies and run the FastAPI server:

  ```shell
  python -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  uvicorn app.api.main:app --reload
  ```
- Start the backend services: `docker-compose up -d api postgres redis mongo`
- Install Node.js dependencies and run the development server:

  ```shell
  cd frontend
  npm install
  npm run dev
  ```
The project includes a suite of tests. To run them, first ensure you have the development dependencies installed:
```shell
pip install -r requirements.txt
```
Then, run `pytest` from the project root:
```shell
pytest
```






