Traditional FAQ pages often force users to search through information manually, which is slow and frustrating. This project is a Retrieval-Augmented Generation (RAG) chatbot application built with FastAPI.
It answers user questions based on a set of documents stored in the ./docs folder.
In this example, I used Singapore Airlines’ FAQ. Instead of browsing a rigid FAQ page, users can ask natural-language questions and receive precise, context-grounded answers drawn directly from the FAQ documents.
The system combines:
- FastAPI for serving API requests
- An LLM for answer generation
- Vector database + embeddings for retrieval
- Docker for local development and deployment
- AWS Lambda + ECR for serverless production hosting
This makes the chatbot fast, cost-efficient, and easy to deploy.
- Loads your Singapore Airlines FAQ or any documents from ./docs
- Converts them into embeddings
- Stores them in a vector store
- Accepts questions via an API (/rag)
- Retrieves the most relevant chunks
- Uses the LLM to generate an accurate answer
- Runs locally with Docker or deploys to AWS Lambda
┌─────────────────────────┐
│        User / UI        │
└────────────┬────────────┘
             │ HTTP POST /rag
             ▼
┌─────────────────────────┐
│         FastAPI         │
│    (Mangum Adapter)     │
└────────────┬────────────┘
             │ Calls RAG pipeline
             ▼
┌─────────────────────────┐
│        Retriever        │
│ (Embeddings + VectorDB) │
└────────────┬────────────┘
             │ Top-k chunks
             ▼
┌─────────────────────────┐
│           LLM           │
└────────────┬────────────┘
             │ Final Answer
             ▼
┌─────────────────────────┐
│         FastAPI         │
└─────────────────────────┘
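The retrieval and prompt-assembly stages in the diagram can be illustrated with a small, dependency-free sketch. It is not the project's code: the word-count "embedding" is a toy stand-in for a real embedding model and vector database, and the FAQ chunks, function names, and prompt wording are all invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (the 'Top-k chunks' step)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the grounded prompt that would be passed to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Hypothetical FAQ chunks, invented for illustration only.
CHUNKS = [
    "Checked baggage allowance depends on your fare type and route.",
    "Online check-in opens some time before departure.",
    "Assistance dogs may travel in the cabin with approved documents.",
]

top = retrieve("What is the baggage allowance?", CHUNKS, k=1)
prompt = build_prompt("What is the baggage allowance?", top)
```

In the real pipeline the ranking would come from the vector store's similarity search, and the prompt would be sent to the LLM to produce the final answer.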
These are the local setup steps. Use these before running or building the Docker image (recommended for development).
- Clone the repo
git clone https://github.com/<your-username>/<your-repo>.git
cd <your-repo>
- Create & activate a Python virtual environment
macOS / Linux:
python3 -m venv venv
source venv/bin/activate
Windows (PowerShell):
python -m venv venv
venv\Scripts\Activate.ps1
Windows (cmd):
python -m venv venv
venv\Scripts\activate.bat
- Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
If you use requirements-dev.txt for development dependencies, install it too:
pip install -r requirements-dev.txt
- Copy / create .env
If the repo contains .env.example:
cp .env.example .env
Open .env and update these (example keys):
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=
LANGCHAIN_PROJECT=rag-demo
FASTAPI_URL=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_DEFAULT_REGION=us-east-1
Notes:
- Do not add quotes around values unless the value itself contains spaces and you want them to be literal.
- Keep .env out of version control; .gitignore should contain .env.
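To catch an incomplete .env before starting the app, a small check like the following can help. It is a minimal sketch, not part of the project: the parser is deliberately simple, the helper names are hypothetical, and the choice of which keys count as required is an assumption. Values are kept verbatim, quotes included, which is also how `docker run --env-file` treats them.

```python
# Keys treated as required here are an assumption; adjust to your setup.
REQUIRED_KEYS = ("LANGCHAIN_API_KEY", "FASTAPI_URL")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()  # quotes are kept literally
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


sample = (
    "# example .env\n"
    "LANGCHAIN_TRACING_V2=true\n"
    "LANGCHAIN_API_KEY=\n"
    "FASTAPI_URL=http://localhost:8000\n"
)
parsed = parse_env(sample)
missing = missing_keys(parsed)
```

With a real file, pass the contents of .env (for example via `pathlib.Path(".env").read_text()`) to `parse_env`.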
- Initialize the vectorstore / index documents (optional)
To build your own vector database (vectorstore) from the documents in the ./docs directory, run the indexing script:
python app/preprocessing/build_vector_db.py
This script:
- loads everything inside ./docs/
- splits into chunks
- embeds each chunk
- inserts the embeddings into the vectorstore (CHROMA_DB_DIR = "../../chroma_db")
Run this script once, and again whenever your documents change.
This process is called indexing. It has two steps:
- Convert your documents (e.g., Singapore Airlines FAQ PDFs, text files, etc.) into embeddings
- Store those embeddings in a vector database (e.g., Chroma, FAISS, SQLite-based vector store, etc.)
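The "splits into chunks" step of indexing can be sketched as a fixed-size character splitter with overlap. This is an illustrative assumption: the actual build_vector_db.py may use a different splitter, and the function name and default sizes below are made up for the example.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


text = "abcdefghij" * 3  # 30 characters
chunks = split_into_chunks(text, chunk_size=12, overlap=4)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval. Each chunk would then be embedded and written to the vector store.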
Build the Docker image and run the container:
docker build -t rag-fastapi .
docker run --env-file .env -p 8000:8000 rag-fastapi
Then send a test request:
curl -XPOST "http://localhost:8000/rag" \
-H "Content-Type: application/json" \
-d '{"question": "What is your baggage policy?"}'
Or open interactive docs:
http://localhost:8000/docs
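The same request can be sent from Python using only the standard library. This is a minimal sketch that mirrors the curl call above; the helper names are hypothetical, and `ask` assumes the container is running on localhost:8000 and returns a JSON body.

```python
import json
import urllib.request


def build_rag_request(base_url: str, question: str) -> urllib.request.Request:
    """Build a POST request to /rag with a JSON body, like the curl example."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/rag",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, question: str, timeout: float = 30.0) -> dict:
    """Send the question and decode the JSON response (requires a running server)."""
    request = build_rag_request(base_url, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; calling ask() does.
request = build_rag_request("http://localhost:8000", "What is your baggage policy?")
```

Once the server is up, `ask("http://localhost:8000", "What is your baggage policy?")` should return the decoded response body.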
Instead of manually creating Lambda and ECR resources, I use AWS CDK to define cloud infrastructure with Python. This ensures consistent, repeatable deployments.
- Create the infra directory under the project and init a Python CDK app
mkdir -p infra
cd infra
cdk init app --language python
- Create and activate a Python virtualenv (inside infra/)
python3 -m venv .venv
source .venv/bin/activate # macOS / Linux
- Install CDK runtime libs
pip install --upgrade pip
pip install aws-cdk-lib constructs
After cdk init, the infra directory contains a Python package that is also named infra, so the stack file sits at infra/infra/infra_stack.py relative to the project root. Keep infra_stack.py inside that package directory; relative to the CDK app root, its path is infra/infra_stack.py.
Edit app.py (the CDK entrypoint) to import and instantiate your stack.
Build your stack in infra_stack.py:
from aws_cdk import Duration, Stack
from aws_cdk import aws_lambda as _lambda
from constructs import Construct


class FastapiLambdaCdkStack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs):
        super().__init__(scope, id, **kwargs)

        # Lambda function built from a Docker image
        docker_lambda = _lambda.DockerImageFunction(
            self, "FastApiDockerLambda",
            code=_lambda.DockerImageCode.from_image_asset(
                "../",                  # build context: the project root
                file="Dockerfile.api",  # custom Dockerfile name
                exclude=[
                    "infra/cdk.out",         # CDK output
                    "infra/.venv",           # if CDK has its own venv
                    "venv",                  # app venv
                    "*.md",                  # README files
                    "*.bat",                 # scripts like source.bat
                    "requirements-dev.txt",  # dev-only dependencies
                ],
            ),
            timeout=Duration.seconds(120),
            memory_size=1024,
        )
Then, in the terminal, run:
cdk deploy
CDK builds the Docker image from Dockerfile.api, uploads it to ECR, and deploys a Lambda function that uses that image. Once the deployment succeeds, the AWS Lambda API is created, and its URL looks something like https://.execute-api.ap-southeast-1.amazonaws.com/prod/
You can then set this URL as FASTAPI_URL (production) in .env so that your frontend code calls the deployed API.
Example request:
POST /rag
{
  "question": "Does Singapore Airlines allow cabin pets?"
}
Example response:
{
  "answer": "Based on your documents: Singapore Airlines does not allow pets in the cabin, except for assistance dogs."
}
Monitoring & Tracing with LangSmith
This project includes built‑in LangSmith tracing to help track and debug your RAG pipeline.
LangSmith is automatically enabled when you set the following in your .env:
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your_langsmith_api_key
LANGCHAIN_PROJECT=rag_demo
Tracing is integrated inside app/api/rag_router.py, where the RAG chain is invoked:
result = rag_chain.invoke({"question": request.question})
Every call to rag_chain.invoke() automatically generates a LangSmith trace. Each trace shows:
- Full RAG pipeline breakdown (retriever → LLM → output)
- Input question & generated answer
- Token usage
- Latency per component
- Any errors or exceptions
- Metadata such as the model used, prompt templates, and retriever hits
Visit LangSmith project: https://smith.langchain.com/
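To make "latency per component" concrete, the sketch below times two stubbed pipeline stages by hand. It does not use or imitate the LangSmith API, which records this automatically; the `timed` helper and the stub stages are hypothetical and exist only to show what such a measurement captures.

```python
import time
from contextlib import contextmanager

# Maps a component name to the seconds it took on the last run.
timings: dict[str, float] = {}


@contextmanager
def timed(name: str):
    """Record the wall-clock duration of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


# Stub stages standing in for the real retriever and LLM calls.
with timed("retriever"):
    chunks = ["stub chunk"]

with timed("llm"):
    answer = f"stub answer built from {len(chunks)} chunk(s)"
```

In a trace, the same breakdown appears per run without any manual instrumentation.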
Suggested improvements for production:
- Add API key authentication. This prevents unauthorized access and avoids unexpected costs from misuse of your API.
- Use an external vector database (e.g., Chroma or Aurora on EC2) instead of storing the index locally. This ensures persistence, scalability, and faster retrieval. Lambda containers are ephemeral, so local storage is not reliable in production.
- Replace Streamlit with a production-ready front-end (Next.js). Streamlit is suitable for internal demos or prototypes, but not ideal for a production front-end due to limited UI flexibility, server-side rendering only, and scaling challenges. Next.js (or similar frameworks) allows better UI customization, supports streaming responses, and scales well with multiple users.
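For the API key authentication item in the list above, the core check could look like the following. It is a minimal sketch under stated assumptions: the x-api-key header name and the `is_authorized` helper are hypothetical, and wiring it into FastAPI (for example as a dependency on the /rag route) is left out.

```python
import hmac


def is_authorized(headers: dict[str, str], expected_key: str,
                  header_name: str = "x-api-key") -> bool:
    """Check the request's API key against the expected one in constant time."""
    if not expected_key:
        return False  # refuse all requests if no key is configured
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get(header_name.lower(), "")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))


ok = is_authorized({"X-API-Key": "secret"}, "secret")
rejected = is_authorized({"X-API-Key": "wrong"}, "secret")
```

The expected key would normally come from an environment variable or a secrets manager rather than from source code.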