| title | Agentic AI Research Assistant |
|---|---|
| emoji | 🧠 |
| colorFrom | blue |
| colorTo | purple |
| sdk | docker |
| app_file | app.py |
| pinned | false |
An autonomous AI agent built to tackle one of the biggest problems with LLMs: hallucinations.
When doing research, you can't afford confidently stated but incorrect facts. This agent addresses that by searching the web, evaluating its own findings, and correcting its mistakes before giving you an answer.
Try it live on Hugging Face Spaces: Agentic AI Research Assistant
Instead of a simple "prompt -> response" pipeline, this agent uses a self-reflection loop (powered by LangGraph):
- Generate: It drafts an initial response using web search results.
- Critique: It fact-checks each claim it just made against the retrieved web evidence.
- Refine: If its confidence score is below 0.7, it goes back to search for more data and rewrites its answer.
graph TD
Start([User Query]) --> Agent
Agent{{Decide Action}}
Agent -->|Needs Info| Tools[Web Search / Summarize]
Tools --> Agent
Agent -->|Draft Response| Critic[Self-Reflection Node]
Critic -->|Confidence < 0.7| Agent
Critic -->|Confidence >= 0.7| Final([Final Response])
subgraph Reflection Loop
Critic -.->|Feedback + Retry| Agent
end
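The loop in the list and diagram above can be sketched in plain Python. This is a minimal illustration, not the project's LangGraph implementation: `run_reflection_loop`, `Critique`, the stand-in `fake_generate` and `fake_critique` callables, and the `max_retries` cap are all hypothetical names and assumptions. Only the 0.7 threshold comes from this README.

```python
from dataclasses import dataclass
from typing import Callable

CONFIDENCE_THRESHOLD = 0.7  # threshold stated in this README


@dataclass
class Critique:
    confidence: float  # 0.0 to 1.0, as judged by the critic
    feedback: str      # what to fix on the next attempt


def run_reflection_loop(
    query: str,
    generate: Callable[[str, str], str],
    critique: Callable[[str, str], Critique],
    max_retries: int = 3,  # assumed cap so the loop always terminates
) -> str:
    """Draft, critique, and redraft until the critic is confident enough."""
    feedback = ""
    draft = generate(query, feedback)
    for _ in range(max_retries):
        result = critique(query, draft)
        if result.confidence >= CONFIDENCE_THRESHOLD:
            return draft
        # Below the threshold: feed the critic's notes into the next draft.
        feedback = result.feedback
        draft = generate(query, feedback)
    return draft  # best effort once the retries are exhausted


# Stand-in callables so the sketch runs without an LLM or web search.
def fake_generate(query: str, feedback: str) -> str:
    return f"answer to {query!r}" + (" (revised)" if feedback else "")


def fake_critique(query: str, draft: str) -> Critique:
    if "revised" in draft:
        return Critique(confidence=0.9, feedback="")
    return Critique(confidence=0.4, feedback="cite a source")


final_answer = run_reflection_loop("solid-state batteries", fake_generate, fake_critique)
print(final_answer)
```

In the real agent the `generate` and `critique` steps would be graph nodes backed by the LLM and web search. A retry cap of some kind is worth having in any such loop, since a critic that never reaches the threshold would otherwise keep it running indefinitely.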
| Component | Technology |
|---|---|
| Agent framework | LangGraph |
| LLM | Groq (Llama 3.x) |
| Web search | Tavily / DuckDuckGo |
| Evaluation | Ragas |
| Tracing | LangSmith |
| Backend API | FastAPI |
| Frontend | Streamlit |
| Deployment | Docker, Hugging Face Spaces |
| CI/CD | GitHub Actions |
# Clone
git clone https://github.com/aniketpoojari/Agentic-AI-Research-Assistant.git
cd Agentic-AI-Research-Assistant
# Install
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Configure
cp .env.example .env
# Edit .env with your API keys (see Environment Variables below)
# Run
uvicorn main:app --reload # API at http://localhost:8000
streamlit run app.py # UI at http://localhost:8501

docker build -t research-assistant .
docker run -p 7860:7860 -p 8000:8000 \
  -e GROQ_API_KEY=your_key \
  -e TAVILY_API_KEY=your_key \
  research-assistant

| Variable | Required | Description |
|---|---|---|
| GROQ_API_KEY | Yes | Groq API key for LLM inference |
| TAVILY_API_KEY | Yes | Tavily API key for web search |
| MODEL_NAME | No | Model name (default: llama-3.1-8b-instant) |
| LANGCHAIN_API_KEY | No | LangSmith API key for tracing |
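As a rough illustration of how the variables in the table might be read at startup, the sketch below validates the two required keys and falls back to the documented default model. The `Settings` class and `load_settings` function are hypothetical and are not taken from the project's own config loader. Only the variable names and the default model name come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_KEYS = ("GROQ_API_KEY", "TAVILY_API_KEY")
DEFAULT_MODEL = "llama-3.1-8b-instant"  # default listed in the table


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    tavily_api_key: str
    model_name: str
    langchain_api_key: Optional[str]  # tracing is optional


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Validate required keys and apply defaults for optional ones."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        groq_api_key=env["GROQ_API_KEY"],
        tavily_api_key=env["TAVILY_API_KEY"],
        model_name=env.get("MODEL_NAME") or DEFAULT_MODEL,
        langchain_api_key=env.get("LANGCHAIN_API_KEY") or None,
    )


# Passing a plain dict keeps the example independent of the real environment.
example = load_settings({"GROQ_API_KEY": "your_key", "TAVILY_API_KEY": "your_key"})
print(example.model_name)
```

Failing fast on missing keys surfaces a misconfigured `.env` at startup rather than on the first research query.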
| Method | Endpoint | Description |
|---|---|---|
| POST | /research | Run a research query |
| POST | /research/stream | Stream research results (SSE) |
| GET | /health | Health check |
| GET | /reflection-stats | Self-reflection metrics |
| GET | /cache/stats | Cache hit rates |
| GET | /metrics | Performance metrics |
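The `POST /research` endpoint can also be called from Python. The sketch below uses only the standard library and mirrors the `query` and `max_results` fields of the curl example in this README. The helper names `build_research_request` and `run_research` are hypothetical, the default of 5 results is simply copied from that example, and the shape of the JSON response is not documented here, so it is returned as a plain dict.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local API address from the setup steps


def build_research_request(query: str, max_results: int = 5) -> urllib.request.Request:
    """Build the POST /research request without sending it."""
    payload = json.dumps({"query": query, "max_results": max_results}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/research",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_research(query: str, max_results: int = 5, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON body (response fields are an unknown here)."""
    request = build_research_request(query, max_results)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no running server; run_research() does.
request = build_research_request(
    "What are the latest breakthroughs in solid-state batteries?"
)
print(request.get_method(), request.full_url)
```

Separating request construction from sending makes the payload easy to inspect or unit-test without the API running.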
curl -X POST http://localhost:8000/research \
-H "Content-Type: application/json" \
-d '{"query": "What are the latest breakthroughs in solid-state batteries?", "max_results": 5}'

The project includes two evaluation systems:
- evaluation/ -- Ragas evaluation that scores the agent on faithfulness, relevancy, context precision, and recall. Runs automatically in CI via GitHub Actions.
- benchmarking/ -- Comparative benchmark that tests the agent head-to-head against a baseline LLM on the same queries.
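To give a feel for what a faithfulness score measures, the sketch below computes the fraction of an answer's claims that the retrieved context supports, which is the ratio Ragas defines for this metric. It only illustrates the arithmetic: in Ragas the per-claim verdicts come from an LLM judge, whereas here they are hypothetical hard-coded booleans, and `faithfulness_score` is not a Ragas function.

```python
from typing import Sequence


def faithfulness_score(claim_supported: Sequence[bool]) -> float:
    """Fraction of answer claims that the retrieved context supports."""
    if not claim_supported:
        raise ValueError("need at least one claim to score")
    return sum(claim_supported) / len(claim_supported)


# Hypothetical verdicts for four claims in one answer: three supported, one not.
verdicts = [True, True, False, True]
score = faithfulness_score(verdicts)
print(f"faithfulness = {score:.2f}")  # prints "faithfulness = 0.75"
```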
# Run evaluation
python evaluation/run_evaluation.py
# Run comparative benchmark
python -m benchmarking.benchmark --num 10

.
├── agent/ # LangGraph agent workflow
├── app.py # Streamlit frontend
├── benchmarking/ # Agent vs baseline comparison
├── config/ # YAML configuration
├── evaluation/ # Ragas evaluation + test queries
├── logger/ # Logging setup
├── main.py # FastAPI backend
├── models/ # Model definitions
├── prompt_library/ # System prompts
├── tools/ # LangChain tool wrappers
├── utils/ # Config loader, web search, cache
├── Dockerfile # Multi-stage Docker build
└── requirements.txt # Python dependencies
See CONTRIBUTING.md for guidelines.
MIT -- see LICENSE for details.