Veritas is a ChatGPT-style assistant powered by Groq LLaMA 4 with streaming responses, conversation history, and optional web-search augmentation.
- Real-time SSE streaming responses
- Conversation memory with CRUD endpoints
- Optional web search (DuckDuckGo) when the model requests `[SEARCH: ...]`
- Markdown + code highlighting in chat
- Message editing, regeneration, export, and responsive UI
- Persistent storage with Supabase when configured, with automatic in-memory fallback
```
index.html
backend/
  app.py
  config.py
  database.py
  web_search.py
  services/chat_service.py
frontend/
  css/styles.css
  js/app.js
```
- Python 3.12+ recommended
- A Groq API key
Notes:
- Python 3.14 can run the app in memory mode, but Supabase dependency chains may require C++ build tools on Windows.
- For easiest full install (including Supabase), use Python 3.12.
```powershell
python -m venv .venv
& ".venv/Scripts/Activate.ps1"
python -m pip install -r backend/requirements.txt
```

If Supabase-related wheels fail on Python 3.14, you can still run in memory mode with:

```powershell
python -m pip install flask flask-cors groq python-dotenv requests flask-limiter
```

Create `backend/.env` with at least:

```
GROQ_API_KEY=your_groq_api_key
```

Optional values:
```
# Server
HOST=127.0.0.1
PORT=5000
FLASK_DEBUG=True
LOG_LEVEL=INFO

# Security
API_AUTH_TOKEN=
CORS_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
MAX_MESSAGE_LENGTH=10000
MAX_QUERY_LENGTH=512
MAX_TITLE_LENGTH=120
MAX_CONTENT_LENGTH_BYTES=1048576
RATE_LIMIT_DEFAULT=200 per day;50 per hour
RATE_LIMIT_CHAT=20 per minute
RATE_LIMIT_SEARCH=30 per minute

# Optional Supabase
SUPABASE_URL=
SUPABASE_KEY=
```

Run the server:

```powershell
python backend/app.py
```

Then open the app in your browser (with the defaults above, `http://127.0.0.1:5000`).
- `POST /api/chat`
  - Body: `{ "message": "...", "conversation_id": "...", "stream": true|false }`
- `POST /api/search`
  - Body: `{ "query": "..." }`
- `POST /api/research`
  - Body: `{ "question": "...", "depth": "quick|deep", "stream": true|false }`
  - Returns: plan, retrieval rounds, synthesis report, sources, and stats
- `GET /api/conversations`
- `GET /api/conversations/<id>`
- `PUT /api/conversations/<id>`
- `DELETE /api/conversations/<id>`
- `DELETE /api/conversations`
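When `stream` is `true`, the chat endpoint emits Server-Sent Events. Assuming each event is a single `data:` line carrying a JSON chunk (a common layout, but verify against what `app.py` actually emits, including the chunk's field names, which are hypothetical here), a minimal client-side parser looks like this:

```python
import json

def parse_sse_events(raw: str):
    """Collect the JSON payloads of `data:` lines from a raw SSE body.

    Assumes one `data: {...}` line per event and a `[DONE]` sentinel at the
    end; both are assumptions to check against app.py.
    """
    events = []
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload and payload != "[DONE]":
                events.append(json.loads(payload))
    return events

# Hypothetical stream body with a `delta` field per chunk:
raw = 'data: {"delta": "Hel"}\n\ndata: {"delta": "lo"}\n\ndata: [DONE]\n'
chunks = parse_sse_events(raw)
text = "".join(e["delta"] for e in chunks)
```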
| Shortcut | Action |
|---|---|
| Enter | Send message |
| Shift + Enter | New line |
| Ctrl + N | New chat |
| Esc | Stop generation |
| Ctrl + Shift + C | Copy latest assistant reply |
The app currently supports a single-hop search flow:
- The model replies with `[SEARCH: query]`
- The backend fetches results
- The model writes the final answer
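The first step of that flow is detecting the marker in the model's reply. A sketch of that extraction, assuming the exact `[SEARCH: query]` format shown above (the backend's real parsing lives in `chat_service.py` and may differ):

```python
import re

# Marker format taken from the flow above: the model emits
# `[SEARCH: query]` when it wants a web lookup.
SEARCH_MARKER = re.compile(r"\[SEARCH:\s*(.+?)\]", re.IGNORECASE)

def extract_search_query(reply: str):
    """Return the search query if the reply contains a [SEARCH: ...] marker,
    else None."""
    match = SEARCH_MARKER.search(reply)
    return match.group(1).strip() if match else None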
To make it true deep research, implement a multi-step research loop.
- Add a planner phase.
  - Generate a research plan with sub-questions (not the final answer yet).
- Add iterative retrieval.
  - Execute multiple search rounds per sub-question.
  - Keep a `research_state` object: queries used, sources found, confidence, unresolved questions.
- Add source quality scoring.
  - Score by recency, domain trust, cross-source agreement, and specificity.
  - Prefer primary sources over summaries.
- Add evidence synthesis.
  - Build the answer from claims + citations, not from a single model pass.
  - Keep quote snippets and URL metadata in memory while generating.
- Add contradiction checks.
  - If high-impact claims disagree, run reconciliation queries before final output.
- Add explicit uncertainty.
  - Return confidence per section and list what is still unknown.
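The `research_state` object above could be a simple dataclass. The field names here are illustrative; the document only lists what the state should track:

```python
from dataclasses import dataclass, field

@dataclass
class ResearchState:
    """Mutable state carried across retrieval rounds."""
    queries_used: list = field(default_factory=list)
    sources_found: list = field(default_factory=list)
    unresolved_questions: list = field(default_factory=list)
    confidence: str = "low"

    def record_round(self, query, sources):
        """Log one retrieval round's query and its results."""
        self.queries_used.append(query)
        self.sources_found.extend(sources)
```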
- Create `backend/services/research_service.py`.
  - Method: `run_deep_research(question, max_rounds=4)`
  - Output structure:

    ```json
    {
      "summary": "...",
      "findings": [{"claim": "...", "sources": ["..."]}],
      "gaps": ["..."],
      "confidence": "high|medium|low",
      "sources": [{"title": "...", "url": "...", "snippet": "..."}]
    }
    ```

- Add a new endpoint in `backend/app.py`:
  - `POST /api/research`
  - Body: `{ "question": "...", "depth": "quick|deep" }`
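A skeleton of `run_deep_research` under these contracts. The planner and synthesis steps are placeholders for model calls, and `search` is injected so the retrieval backend (e.g. `web_search.py`) stays pluggable; the internals are a sketch, not the service's actual implementation:

```python
def run_deep_research(question, max_rounds=4, search=None):
    """Skeleton matching the documented output structure.

    `search` is a callable query -> list of {"title", "url", "snippet"}
    dicts; here it defaults to a no-op stub.
    """
    search = search or (lambda q: [])
    sub_questions = [question]  # a real planner phase would expand this
    sources, findings, gaps = [], [], []
    for round_no, sub_q in enumerate(sub_questions):
        if round_no >= max_rounds:
            break
        results = search(sub_q)
        if results:
            sources.extend(results)
            findings.append({"claim": sub_q,
                             "sources": [r["url"] for r in results]})
        else:
            gaps.append(sub_q)  # unresolved: feed into uncertainty audit
    return {
        "summary": "",  # a synthesis pass would fill this in
        "findings": findings,
        "gaps": gaps,
        "confidence": "low" if gaps else "medium",
        "sources": sources,
    }
```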
- Update `backend/services/chat_service.py`.
  - Route complex prompts to research mode based on intent.
  - Return progressive events for each research round during streaming.
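Intent routing can start as a cheap heuristic before graduating to a model-based classifier. The keyword list and length threshold below are purely illustrative:

```python
# Hypothetical hints that a prompt wants multi-round research rather
# than a single chat completion.
RESEARCH_HINTS = ("compare", "investigate", "research", "sources",
                  "evidence", "in depth", "deep dive")

def needs_research(message: str) -> bool:
    """Route long or research-flavored prompts to the research pipeline."""
    text = message.lower()
    return len(text) > 200 or any(hint in text for hint in RESEARCH_HINTS)
```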
- Improve `backend/web_search.py`.
  - Add deduplication, retries, domain filtering, and result caching.
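Three of those improvements can wrap the existing fetch function without touching it. This is a sketch assuming results are dicts with a `"url"` key; the real shape in `web_search.py` may differ, and the TTL and backoff values are arbitrary:

```python
import time

_CACHE = {}       # query -> (timestamp, results)
CACHE_TTL = 300   # seconds; arbitrary choice

def dedupe_results(results):
    """Drop results whose URL was already seen (first hit wins)."""
    seen, unique = set(), []
    for r in results:
        if r["url"] not in seen:
            seen.add(r["url"])
            unique.append(r)
    return unique

def search_with_retries(query, fetch, retries=3, delay=0.5):
    """Wrap a fetch callable (e.g. the DuckDuckGo call) with caching,
    retries with linear backoff, and deduplication."""
    cached = _CACHE.get(query)
    if cached and time.time() - cached[0] < CACHE_TTL:
        return cached[1]
    last_err = None
    for attempt in range(retries):
        try:
            results = dedupe_results(fetch(query))
            _CACHE[query] = (time.time(), results)
            return results
        except Exception as err:
            last_err = err
            time.sleep(delay * (attempt + 1))
    raise last_err
```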
- Update the frontend in `frontend/js/app.js`.
  - Show a research timeline UI: planning, retrieval rounds, synthesis, final report.
Use a strict output contract at each stage:
- Stage 1 (Plan): return only JSON plan and queries.
- Stage 2 (Evidence): return normalized evidence records.
- Stage 3 (Synthesis): return claims with cited sources.
- Stage 4 (Audit): return uncertainty and missing evidence.
This separation reduces hallucination and makes behavior testable.
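One lightweight way to enforce such a contract is to require an exact key set per stage and reject anything else. The key names below mirror the stages above but are otherwise assumptions:

```python
# Per-stage output contracts; key names are illustrative.
STAGE_CONTRACTS = {
    "plan": {"sub_questions", "queries"},
    "evidence": {"records"},
    "synthesis": {"claims"},
    "audit": {"uncertainty", "missing_evidence"},
}

def validate_stage_output(stage: str, payload: dict) -> bool:
    """True only if the payload has exactly the keys the stage's contract
    names; unknown stages and extra or missing keys fail."""
    expected = STAGE_CONTRACTS.get(stage)
    return expected is not None and set(payload) == expected
```

Rejecting extra keys (not just checking for missing ones) is what makes the stages testable: any drift in the model's output shape surfaces as a validation failure instead of silently flowing downstream.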
- Configure `API_AUTH_TOKEN` to enforce authenticated API calls.
- Set `CORS_ORIGINS` to your frontend origin(s) in production.
- Rate limits use in-memory storage by default; use Redis for production-scale limiter storage.
- Backend: Flask, Groq SDK, Flask-CORS, Flask-Limiter
- Frontend: Vanilla JS, CSS, Marked.js, Highlight.js
- Optional persistence: Supabase
MIT License
Developed by Dhanush Pillay