Automated pipeline that fetches Indian news via RSS, scrapes full article content, and classifies articles using an LLM. Results are served through a FastAPI backend and displayed on a React dashboard.
```bash
# 1. Clone & configure
cp .env.example .env   # then fill in values (see below)

# 2. Run everything
docker compose up --build
```

That's it. On first boot the database schema is applied automatically.
| Service | URL |
|---|---|
| Dashboard | http://localhost:5173 |
| API | http://localhost:8000 |
| API Docs | http://localhost:8000/docs |
| Postgres | localhost:5433 |
- db — TimescaleDB; `database/schema.sql` is applied on first run via `docker-entrypoint-initdb.d`.
- api — FastAPI server on port 8000 (waits for a healthy db).
- worker — Runs an infinite loop: fetch RSS → scrape articles → classify with LLM. Intervals and batch sizes are configurable (see below).
- frontend — Vite dev server on port 5173, proxies `/api` to the API container.
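The worker's fetch → scrape → classify cycle can be sketched as follows. This is a minimal illustration only: the function names, the single shared interval, and the `max_cycles` escape hatch are assumptions for the sketch, not the project's actual code.

```python
import time

def run_worker(fetch, scrape, classify, interval=300, max_cycles=None):
    """Run the pipeline stages in a loop; max_cycles=None means run forever."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        fetch()      # pull new items from the RSS feeds
        scrape()     # download full article bodies
        classify()   # tag articles with the LLM
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval)

# Stubbed run so the sketch executes without the real pipeline:
calls = []
run_worker(lambda: calls.append("fetch"),
           lambda: calls.append("scrape"),
           lambda: calls.append("classify"),
           interval=0, max_cycles=2)
```

Injecting the stage functions keeps the loop trivially testable, which is why the stubbed run above works without any network or database access.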
Create a `.env` file in the project root. Required variables:

```env
# Postgres
POSTGRES_USER=newstrack
POSTGRES_PASSWORD=<secret>
POSTGRES_DB=newstrack
POSTGRES_HOST=localhost   # docker-compose overrides to "db"
POSTGRES_PORT=5433        # host port; containers use 5432 internally

# LLM — at least one is required for classification
GOOGLE_API_KEY=<key>      # Gemini (default provider)
OPENAI_API_KEY=<key>      # OpenAI (optional)
ANTHROPIC_API_KEY=<key>   # Anthropic (optional)
COHERE_API_KEY=<key>      # Cohere (optional)

# App
ENVIRONMENT=development
LOG_LEVEL=INFO
PORT=8000
```
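For illustration, here is a sketch of how the Postgres variables combine into a connection string; the helper name is hypothetical and the real app may assemble its DSN differently. Note the host/port swap inside Compose: containers connect to `db:5432`, while tools on your machine use `localhost:5433`.

```python
import os

def postgres_dsn(env=None):
    """Build a Postgres DSN from the .env variables (hypothetical helper)."""
    env = os.environ if env is None else env
    user = env.get("POSTGRES_USER", "newstrack")
    password = env.get("POSTGRES_PASSWORD", "")
    host = env.get("POSTGRES_HOST", "localhost")   # "db" inside docker-compose
    port = env.get("POSTGRES_PORT", "5433")        # 5432 inside containers
    name = env.get("POSTGRES_DB", "newstrack")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"

# As seen from inside a container on the compose network:
dsn = postgres_dsn({"POSTGRES_USER": "newstrack", "POSTGRES_PASSWORD": "s3cret",
                    "POSTGRES_HOST": "db", "POSTGRES_PORT": "5432",
                    "POSTGRES_DB": "newstrack"})
```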
Set these in `.env` or directly in `docker-compose.yml`:
| Variable | Default | Description |
|---|---|---|
| `FETCH_INTERVAL` | 300 | Seconds between RSS fetch cycles |
| `SCRAPE_INTERVAL` | 120 | Seconds between scrape cycles |
| `CLASSIFY_INTERVAL` | 120 | Seconds between classify cycles |
| `SCRAPE_BATCH_SIZE` | 50 | Articles per scrape batch |
| `CLASSIFY_BATCH_SIZE` | 20 | Articles per classify batch |
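A sketch of how the worker might resolve these tunables, preferring an environment override and falling back to the documented defaults. The `setting` helper is an assumption for illustration, not the project's actual code.

```python
import os

# Defaults from the table above.
DEFAULTS = {
    "FETCH_INTERVAL": 300,
    "SCRAPE_INTERVAL": 120,
    "CLASSIFY_INTERVAL": 120,
    "SCRAPE_BATCH_SIZE": 50,
    "CLASSIFY_BATCH_SIZE": 20,
}

def setting(name, env=None):
    """Return the env override as an int, or the documented default."""
    env = os.environ if env is None else env
    return int(env.get(name, DEFAULTS[name]))

fetch_interval = setting("FETCH_INTERVAL", {})                       # default wins
scrape_batch = setting("SCRAPE_BATCH_SIZE", {"SCRAPE_BATCH_SIZE": "100"})  # override wins
```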
If you want to run a single step instead of the continuous worker:
```bash
docker compose exec api python -m scripts.fetch_rss
docker compose exec api python -m scripts.scrape_content
docker compose exec api python -m scripts.classify_articles
```

All in `config/`, mounted into containers at `/app/config`.
| File | Purpose |
|---|---|
| `rss-sources.yaml` | RSS feed URLs and metadata |
| `tags.yaml` | Classification tags/categories |
| `filters.yaml` | Article filtering rules |
| `llm-config.yaml` | LLM provider and model settings |
| `prompts/` | Prompt templates for LLM tasks |
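As a purely illustrative sketch (the real schemas are defined by the files themselves), an entry in `rss-sources.yaml` might take a shape like this; all field names and values here are assumptions:

```yaml
# Hypothetical shape — check the file in config/ for the actual schema.
sources:
  - name: example-source
    url: https://example.com/feed.rss
    language: en
```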
| Method | Path | Description |
|---|---|---|
| GET | `/api/v1/events` | List news events |
| GET | `/api/v1/events/{id}` | Single event detail |
| GET | `/api/v1/events/timeline` | Events grouped by date |
| GET | `/api/v1/search` | Full-text search |
| GET | `/api/v1/analytics/overview` | Dashboard stats |
| GET | `/api/v1/analytics/trends` | Category/tag trends |
| GET | `/api/v1/config/categories` | Available categories |
| GET | `/health` | Health check |
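A minimal Python client sketch for the endpoints above. Only the base URL and paths come from this README; the `limit` query parameter and the JSON response shape are assumptions, and the injectable `fetch` argument exists so the sketch runs without the stack up.

```python
import json
from urllib.request import urlopen

API_BASE = "http://localhost:8000"

def list_events(limit=10, fetch=None):
    """GET /api/v1/events (the limit parameter is an assumption)."""
    url = f"{API_BASE}/api/v1/events?limit={limit}"
    fetch = fetch or (lambda u: urlopen(u).read())  # real HTTP by default
    return json.loads(fetch(url))

# Stubbed call so the sketch executes offline:
events = list_events(limit=2, fetch=lambda u: b'[{"id": 1}, {"id": 2}]')
```

With the stack running, `list_events()` with no stub issues a real request; interactive exploration is easier via the generated docs at `http://localhost:8000/docs`.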
```bash
# Backend
cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload

# Frontend
cd frontend
npm install
npm run dev
```

Requires a running Postgres/TimescaleDB instance — apply `database/schema.sql` manually.