A GenAI chatbot application that answers questions about recent financial news using a local vector database built from `data/stock_news.json`.
It uses LangChain + OpenAI for embeddings and response generation, and streams responses to the UI.
- Next.js, Tailwind CSS, React
- LangChain.js for LLM integration
- LocalStorage to mimic a backend database
- JWT for auth
- HNSWLib for a file-system-based vector DB
- Vitest for test coverage
For this MVP, data vectorization is done locally and persisted to the filesystem. Scripts are provided to sanitize data and build embeddings on demand.
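Conceptually, querying the local vector DB is a nearest-neighbor search over embedding vectors. The sketch below is a brute-force cosine-similarity search for illustration only; HNSWLib replaces the linear scan with an approximate graph index, and the document shape here is hypothetical:

```typescript
// Illustrative brute-force nearest-neighbor search over embedding vectors.
// HNSWLib replaces this linear scan with an approximate graph index, but the
// scoring idea is the same. The Doc shape and sample data are hypothetical.
type Doc = { id: string; text: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK(query: number[], docs: Doc[], k: number): Doc[] {
  return [...docs]
    .sort((x, y) => cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, k);
}

const docs: Doc[] = [
  { id: "a", text: "AAPL earnings beat estimates", embedding: [1, 0, 0] },
  { id: "b", text: "Oil prices fall on supply news", embedding: [0, 1, 0] },
  { id: "c", text: "Apple announces buyback", embedding: [0.9, 0.1, 0] },
];

console.log(topK([1, 0, 0], docs, 2).map((d) => d.id)); // → ["a", "c"]
```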
- Accept username and password for login.
- On successful login, issue an `access_token` (HttpOnly cookie) and a `chat_identifier`.
- Use the chat session identifier in the query string. If the identifier is missing, show the login screen; otherwise show the chat.
- Chat component:
- Text area for typing a question and a send button
- Show the conversation history above the input field.
- Show a loading state while waiting on the response from the backend.
- Stream response chunks from the API back to the frontend.
- API
- GET /healthcheck - check if api is running
- POST /login - authenticate
- POST /logout - clear access token
- POST /chat/{identifier} - post a question to backend and get response
- Data
- vector_db (HNSWLib) - data indexes
- stock_news.json - original data
- stock_news_sanitized.json - sanitized data
- Lib
- index_docs - generate embeddings
- auth - jwt and validate required auth
- llms - adapters for LLM integration; the client provides access to the LLM
- vector_store - adapters for integrating with HNSW; the client provides access to the vector DB implementation
- security - prompt injection validation
- Tests
- Vitest coverage for core lib utilities
- Use LocalStorage for conversation history persistence (mimics a database).
- Protected routes validate the JWT from the `access_token` cookie.
- Environment variables are loaded from `.env`.
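The login flow above can be sketched with a minimal HS256 JWT signer built on Node's `crypto`; the real project would typically use a library such as `jose` or `jsonwebtoken`, so treat the names and payload fields here as illustrative:

```typescript
import { createHmac } from "node:crypto";

// Minimal HS256 JWT signing sketch (illustration only; prefer `jose` or
// `jsonwebtoken` in a real app). The payload field names are hypothetical.
const base64url = (input: string | Buffer): string =>
  Buffer.from(input).toString("base64url");

function signJwt(payload: object, secret: string): string {
  const header = base64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = base64url(JSON.stringify(payload));
  const signature = createHmac("sha256", secret)
    .update(`${header}.${body}`)
    .digest("base64url");
  return `${header}.${body}.${signature}`;
}

// On login success, the token would be set as an HttpOnly cookie, e.g.:
// Set-Cookie: access_token=<jwt>; HttpOnly; Path=/; SameSite=Strict
const token = signJwt({ sub: "demo-user", chat_identifier: "abc123" }, "dev-secret");
console.log(token.split(".").length); // 3 segments: header.payload.signature
```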
This is a Next.js project bootstrapped with create-next-app.
Create `.env` using `.env.example` as a template.

Tip:
- `USER_NAME` and `PASSWORD` values are used for login.
- `JWT_SECRET` is required for signing tokens.
- `OPENAI_API_KEY` is required for embeddings + chat.
```bash
npm install
npm run dev
```

Open http://localhost:3000 with your browser to see the result.
The sanitizer removes unicode artifacts, paywall snippets, and promotional/CTA text before embedding.
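A sanitizer of this kind can be sketched as a few regex passes; the patterns below are hypothetical examples of each category, not the project's actual rules:

```typescript
// Illustrative sanitization passes (hypothetical patterns, not the project's
// actual rules): strip unicode artifacts, paywall snippets, and
// promotional/CTA text before embedding.
function sanitize(text: string): string {
  return text
    // Remove zero-width characters and replacement-character artifacts.
    .replace(/[\u200B-\u200D\uFEFF\uFFFD]/g, "")
    // Drop paywall-style boilerplate lines.
    .replace(/^.*(subscribe to continue reading|sign in to read).*$/gim, "")
    // Drop promotional calls to action.
    .replace(/^.*(click here|limited time offer).*$/gim, "")
    // Collapse the blank lines left behind and trim.
    .replace(/\n{2,}/g, "\n")
    .trim();
}

const raw =
  "AAPL rose 3%.\nSubscribe to continue reading.\n\nClick here for deals!\nAnalysts see upside.";
console.log(sanitize(raw)); // "AAPL rose 3%.\nAnalysts see upside."
```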
Sanitize content:

```bash
npm run sanitize:data
```

Generate embeddings (writes to `data/vector_db`):

```bash
npm run build:index
```

Run tests:

```bash
npm run test
```

Provide samples in `data/eval_samples.json`, then run:

```bash
npm run eval:rag
```

This writes a summary to `data/eval_results.json`.
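One way to picture the eval summary is as an aggregation of per-sample retrieval hits; the schemas of `eval_samples.json` and `eval_results.json` below are assumptions, not the project's actual format:

```typescript
// Hypothetical shapes for data/eval_samples.json and the summary written to
// data/eval_results.json; the real schema may differ.
type EvalSample = {
  question: string;
  expectedDocId: string;
  retrievedDocIds: string[];
};

function summarize(samples: EvalSample[]) {
  const hits = samples.filter((s) =>
    s.retrievedDocIds.includes(s.expectedDocId)
  ).length;
  return { total: samples.length, hits, hitRate: hits / samples.length };
}

const samples: EvalSample[] = [
  { question: "Why did AAPL rise?", expectedDocId: "a", retrievedDocIds: ["a", "c"] },
  { question: "What moved oil prices?", expectedDocId: "b", retrievedDocIds: ["c"] },
];
console.log(summarize(samples)); // { total: 2, hits: 1, hitRate: 0.5 }
```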
- `app/` - Next.js routes and API handlers
- `components/` - UI components
- `data/` - raw data, sanitized data, and vector DB
- `lib/`
  - `auth/` - JWT helpers + auth guard
  - `llm/` - LLM adapter + client
  - `security/` - prompt injection checks
  - `vector_store/` - vector DB adapter + client
- `scripts/` - data sanitization script
- `tests/` - Vitest tests
```mermaid
flowchart TB
    UI[UI: React / Next.js App Router] -->|Login| LoginAPI[POST /api/login]
    UI -->|Logout| LogoutAPI[POST /api/logout]
    UI -->|Ask Question SSE| ChatAPI[POST /api/chat/:identifier]
    LoginAPI --> JWT[JWT Signer]
    JWT --> Cookie[HttpOnly access_token cookie]
    ChatAPI --> Auth[Auth Guard JWT verify]
    ChatAPI --> Injection[Prompt Injection Check]
    ChatAPI --> Vector[Vector Store Client]
    Vector --> HNSW[HNSWLib Vector DB]
    HNSW --> Data[(data/vector_db)]
    ChatAPI --> LLM[LLM Client]
    LLM --> OpenAI[OpenAI Chat Model]
    ChatAPI -->|SSE tokens| UI
    Scripts[Sanitize + Index Scripts] --> DataFiles[(data/stock_news*.json)]
    Scripts --> HNSW
```
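The `SSE tokens` edge in the diagram corresponds to streaming LLM output back to the UI as server-sent events. A minimal sketch of framing token chunks as SSE messages (the `{ token }` payload shape is an assumption, not the project's schema):

```typescript
// Minimal SSE framing sketch: each LLM token chunk becomes a `data:` frame,
// and a final sentinel tells the client the stream is done. The payload shape
// ({ token }) is an assumption, not the project's actual schema.
function toSseFrame(token: string): string {
  return `data: ${JSON.stringify({ token })}\n\n`;
}

function* streamTokens(tokens: string[]): Generator<string> {
  for (const t of tokens) yield toSseFrame(t);
  yield "data: [DONE]\n\n";
}

const frames = [...streamTokens(["Hello", " world"])];
console.log(frames.length); // 3 frames: two tokens + [DONE]
```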