Agentic AI Research Assistant

title	Agentic AI Research Assistant
emoji	🧠
colorFrom	blue
colorTo	purple
sdk	docker
app_file	app.py
pinned	false

An autonomous AI agent built to tackle one of the biggest problems with LLMs: hallucinations.

Research can't tolerate incorrect facts stated with confidence. This agent addresses that by searching the web, evaluating its own findings, and correcting its mistakes before giving you an answer.

🚀 Try it live on Hugging Face Spaces: Agentic AI Research Assistant

How It Works

Instead of a simple "prompt -> response" pipeline, this agent uses a self-reflection loop (powered by LangGraph):

Generate: It drafts an initial response using web search results.
Critique: It checks each claim in the draft against the retrieved web evidence and assigns a confidence score.
Refine: If the confidence score is below 0.7, it searches for more data and rewrites its answer.
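
The three steps above can be sketched as a plain Python loop. This is a minimal illustration, not the project's LangGraph implementation: `generate`, `critique`, the `Critique` dataclass, and the `max_rounds` cap are hypothetical names introduced here. Only the 0.7 threshold comes from the description above.

```python
from dataclasses import dataclass
from typing import Callable, List

CONFIDENCE_THRESHOLD = 0.7  # from the README; everything else below is assumed


@dataclass
class Critique:
    confidence: float  # 0.0 to 1.0: how well the evidence supports the draft
    feedback: str      # what the critic wants fixed on the next pass


def reflect_loop(
    query: str,
    generate: Callable[[str, List[str]], str],
    critique: Callable[[str, str], Critique],
    max_rounds: int = 3,  # assumed cap so the loop always terminates
) -> str:
    """Draft, critique, and refine until the critic is confident enough."""
    feedback: List[str] = []
    draft = generate(query, feedback)
    for _ in range(max_rounds):
        result = critique(query, draft)
        if result.confidence >= CONFIDENCE_THRESHOLD:
            break  # confident enough: keep this draft
        feedback.append(result.feedback)
        draft = generate(query, feedback)  # retry with the critic's notes
    # If max_rounds is exhausted, the last rewrite is returned uncritiqued.
    return draft
```

Passing `generate` and `critique` in as callables keeps the loop independent of any particular LLM or search client, which also makes it easy to exercise with stubs.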

Architecture

graph TD
    Start([User Query]) --> Agent
    Agent{{Decide Action}}

    Agent -->|Needs Info| Tools[Web Search / Summarize]
    Tools --> Agent

    Agent -->|Draft Response| Critic[Self-Reflection Node]

    Critic -->|Confidence < 0.7| Agent
    Critic -->|Confidence >= 0.7| Final([Final Response])

    subgraph Reflection Loop
    Critic -.->|Feedback + Retry| Agent
    end
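
The edges in the diagram reduce to two routing decisions. The sketch below writes them as plain Python functions. The state keys (`needs_info`, `confidence`) and the returned node labels are hypothetical stand-ins, and the project's actual LangGraph state may look different.

```python
from typing import TypedDict


class AgentState(TypedDict):
    needs_info: bool   # assumed flag: the agent wants another tool call
    confidence: float  # assumed field: the critic's score for the latest draft


def route_from_agent(state: AgentState) -> str:
    """Agent node: call tools when more information is needed, else go to the critic."""
    return "tools" if state["needs_info"] else "critic"


def route_from_critic(state: AgentState, threshold: float = 0.7) -> str:
    """Critic node: loop back to the agent below the threshold, otherwise finish."""
    return "agent" if state["confidence"] < threshold else "final"
```

In LangGraph, functions of this shape are typically registered with `add_conditional_edges`, which maps the returned label to the next node.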

Tech Stack

Component	Technology
Agent framework	LangGraph
LLM	Groq (Llama 3.x)
Web search	Tavily / DuckDuckGo
Evaluation	Ragas
Tracing	LangSmith
Backend API	FastAPI
Frontend	Streamlit
Deployment	Docker, Hugging Face Spaces
CI/CD	GitHub Actions

Quick Start

# Clone
git clone https://github.com/aniketpoojari/Agentic-AI-Research-Assistant.git
cd Agentic-AI-Research-Assistant

# Install
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Configure
cp .env.example .env
# Edit .env with your API keys (see Environment Variables below)

# Run (each command in its own terminal)
uvicorn main:app --reload          # API at http://localhost:8000
streamlit run app.py               # UI at http://localhost:8501

Docker

docker build -t research-assistant .
docker run -p 7860:7860 -p 8000:8000 \
  -e GROQ_API_KEY=your_key \
  -e TAVILY_API_KEY=your_key \
  research-assistant

Environment Variables

Variable	Required	Description
`GROQ_API_KEY`	Yes	Groq API key for LLM inference
`TAVILY_API_KEY`	Yes	Tavily API key for web search
`MODEL_NAME`	No	Model name (default: `llama-3.1-8b-instant`)
`LANGCHAIN_API_KEY`	No	LangSmith API key for tracing
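
One way to read these variables at startup is sketched below. It is an assumption about how a loader could look, not the code in `utils/`: the function name `load_settings` and the returned dictionary keys are hypothetical. The variable names and the default model come from the table above.

```python
import os
from typing import Mapping, Optional

REQUIRED = ("GROQ_API_KEY", "TAVILY_API_KEY")


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the variables from the table, failing early on missing required keys."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return {
        "groq_api_key": env["GROQ_API_KEY"],
        "tavily_api_key": env["TAVILY_API_KEY"],
        # Optional values fall back to the documented default or to None.
        "model_name": env.get("MODEL_NAME", "llama-3.1-8b-instant"),
        "langchain_api_key": env.get("LANGCHAIN_API_KEY"),
    }
```

Accepting an explicit mapping makes the loader testable without touching the real process environment.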

API Endpoints

Method	Endpoint	Description
`POST`	`/research`	Run a research query
`POST`	`/research/stream`	Stream research results (SSE)
`GET`	`/health`	Health check
`GET`	`/reflection-stats`	Self-reflection metrics
`GET`	`/cache/stats`	Cache hit rates
`GET`	`/metrics`	Performance metrics
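
A client for `/research/stream` has to split the response into events. The sketch below assumes the endpoint follows standard Server-Sent Events framing (`data:` lines, events separated by a blank line). The payload shape is not documented here, so each event body is treated as opaque text, and the helper name `iter_sse_data` is hypothetical.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a text/event-stream body."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # One leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and comment lines are ignored here.
    if buffer:
        # Leniency of this sketch: flush an event the stream left unterminated.
        yield "\n".join(buffer)
```

The function only needs an iterable of text lines, so it can be fed from any HTTP client that exposes the response body line by line.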

Example Request

curl -X POST http://localhost:8000/research \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the latest breakthroughs in solid-state batteries?", "max_results": 5}'
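
The same request can be built from Python with only the standard library. This is a sketch: `build_research_request` is a hypothetical helper, the `max_results` default of 5 mirrors the curl example rather than a documented API default, and the response schema is not shown here.

```python
import json
import urllib.request


def build_research_request(
    query: str,
    max_results: int = 5,  # mirrors the curl example above
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build the same POST /research call as the curl example."""
    body = json.dumps({"query": query, "max_results": max_results}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/research",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` while the API from Quick Start is running, then decode the response body with `json.loads`.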

Evaluation

The project includes two evaluation systems:

evaluation/ -- Ragas evaluation that scores the agent on faithfulness, relevancy, context precision, and recall. Runs automatically in CI via GitHub Actions.
benchmarking/ -- Comparative benchmark that tests the agent head-to-head against a baseline LLM on the same queries.

# Run evaluation
python evaluation/run_evaluation.py

# Run comparative benchmark
python -m benchmarking.benchmark --num 10

Project Structure

.
├── agent/              # LangGraph agent workflow
├── app.py              # Streamlit frontend
├── benchmarking/       # Agent vs baseline comparison
├── config/             # YAML configuration
├── evaluation/         # Ragas evaluation + test queries
├── logger/             # Logging setup
├── main.py             # FastAPI backend
├── models/             # Model definitions
├── prompt_library/     # System prompts
├── tools/              # LangChain tool wrappers
├── utils/              # Config loader, web search, cache
├── Dockerfile          # Multi-stage Docker build
└── requirements.txt    # Python dependencies

Contributing

See CONTRIBUTING.md for guidelines.

License

MIT -- see LICENSE for details.
