Built as a student project demonstrating production-grade AI engineering patterns.
A fully autonomous AI social media monitoring and response agent that watches Reddit, LinkedIn, and Twitter in real time, classifies incoming signals using a large language model, retrieves relevant answers from a knowledge base, routes them through a multi-agent pipeline, and queues draft replies for human review.
- Monitors Reddit, LinkedIn, and Twitter simultaneously in real time
- Classifies every signal by intent: complaint, praise, question, crisis, or viral opportunity
- Retrieves accurate answers from a RAG knowledge base (12 TechDesk AI FAQs)
- Routes signals through a LangGraph multi-agent swarm: Orchestrator, Engagement, Crisis, and ContentCreator agents
- Drafts on-brand replies using Llama 3.3 70B via Groq API
- Runs every draft through a multi-layer safety gate: keyword filter + toxicity detection
- Queues drafts for human review in a real-time web dashboard
- Logs every LLM call and action to an append-only audit trail in PostgreSQL
- Collects RLHF preference data from human edits for future fine-tuning
- Tracks strategy performance using a contextual bandit algorithm
```
┌──────────────────────────────────────────────────────────┐
│ Layer 0: Perception                                      │
│ Reddit · LinkedIn · Twitter · Simulation → Kafka         │
├──────────────────────────────────────────────────────────┤
│ Layer 1: Understanding                                   │
│ Intent classification · Sentiment · Entity extraction    │
├──────────────────────────────────────────────────────────┤
│ Layer 2: Planning                                        │
│ LangGraph orchestration · Strategy selection · Routing   │
├──────────────────────────────────────────────────────────┤
│ Layer 3: Memory                                          │
│ PostgreSQL + pgvector · Redis · RAG knowledge base       │
├──────────────────────────────────────────────────────────┤
│ Layer 4: Action                                          │
│ Draft generation · Platform formatting · Scheduling      │
├──────────────────────────────────────────────────────────┤
│ Layer 5: Safety                                          │
│ Keyword filter · Perspective API · HITL review dashboard │
├──────────────────────────────────────────────────────────┤
│ Layer 6: Observability                                   │
│ Audit trail · RLHF collector · Strategy leaderboard      │
└──────────────────────────────────────────────────────────┘
```
```
                      ┌───────────────┐
 Signal ────────────▶ │ Orchestrator  │  classifies intent + routes
                      └───────┬───────┘
                              │
              ┌───────────────┼────────────────┐
              ▼               ▼                ▼
     ┌──────────────┐   ┌──────────┐  ┌──────────────────┐
     │  Engagement  │   │  Crisis  │  │  ContentCreator  │
     │    Agent     │   │  Agent   │  │      Agent       │
     │              │   │          │  │                  │
     │ Drafts reply │   │Escalates │  │ Creates proactive│
     │ using RAG    │   │ to HITL  │  │ content for      │
     │ + persona    │   │ urgently │  │ viral signals    │
     └──────┬───────┘   └────┬─────┘  └────────┬─────────┘
            │                │                 │
            └────────────────┼─────────────────┘
                             ▼
                     ┌───────────────┐
                     │  Safety Gate  │  keyword + toxicity check
                     └───────┬───────┘
                             │
                             ▼
                     ┌───────────────┐
                     │  HITL Queue   │  human review dashboard
                     └───────────────┘
```
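The safety gate layers a keyword blocklist under a toxicity-score threshold; a draft must clear every layer before reaching the HITL queue. A minimal sketch of that check (`BLOCKLIST`, `check_draft`, and the sample phrases are illustrative names, not the project's actual code; the real gate also calls the Perspective API):

```python
# Two-layer safety gate sketch: a keyword blocklist plus a toxicity
# threshold. Names and phrases are illustrative; the real gate in
# services/safety/gate.py also calls Google's Perspective API.
BLOCKLIST = {"guarantee", "refund now", "lawsuit"}
SAFETY_THRESHOLD = 0.7  # mirrors the SAFETY_THRESHOLD env var

def check_draft(text: str, toxicity_score: float) -> tuple[bool, str]:
    """Return (approved, reason). A draft must pass every layer."""
    lowered = text.lower()
    for phrase in BLOCKLIST:
        if phrase in lowered:
            return False, f"blocked keyword: {phrase!r}"
    if toxicity_score >= SAFETY_THRESHOLD:
        return False, f"toxicity {toxicity_score:.2f} >= {SAFETY_THRESHOLD}"
    return True, "ok"
```

Keeping the cheap keyword check first means most unsafe drafts are rejected without spending an API call on toxicity scoring.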
| Component | Technology |
|---|---|
| Language | Python 3.12 |
| LLM | Llama 3.3 70B via Groq API (free) |
| Agent orchestration | LangGraph |
| Embeddings | fastembed with BAAI/bge-small-en-v1.5 (local, no GPU) |
| Primary database | PostgreSQL 16 + pgvector extension |
| Vector search | pgvector cosine similarity |
| Event streaming | Apache Kafka (KRaft mode, no Zookeeper) |
| Working memory | Redis 7 |
| API framework | FastAPI + WebSocket |
| HITL dashboard | FastAPI + vanilla JS + WebSocket real-time updates |
| Safety | Keyword filter + Google Perspective API |
| Infrastructure | Docker Compose |
| Reddit connector | Public JSON API (no key needed) |
| LinkedIn connector | Google News RSS + feedparser (no key needed) |
```
AI-Social-Agent/
├── main.py                      # Main orchestrator (Phase 4)
├── README.md
├── requirements.txt
├── .env                         # API keys (never commit this)
│
├── services/
│   ├── perception/
│   │   ├── launcher.py          # Starts all platform connectors
│   │   ├── main.py              # Simulation mode
│   │   ├── reddit_stream.py     # Reddit public API connector
│   │   ├── linkedin_stream.py   # LinkedIn + Google News connector
│   │   ├── twitter_stream.py    # Twitter filtered stream
│   │   └── normalizer.py        # Normalizes all platforms to SocialSignal
│   │
│   ├── agents/
│   │   ├── graph.py             # LangGraph compiled agent graph
│   │   ├── state.py             # Shared AgentState TypedDict
│   │   ├── orchestrator.py      # Routes signals to specialist agents
│   │   ├── engagement.py        # Drafts replies using RAG + persona
│   │   ├── crisis.py            # Crisis escalation agent
│   │   └── content_creator.py   # Proactive content for viral signals
│   │
│   ├── safety/
│   │   └── gate.py              # Multi-layer content moderation
│   │
│   ├── hitl/
│   │   ├── dashboard.py         # FastAPI backend + WebSocket
│   │   └── dashboard.html       # Review UI (approve/edit/reject)
│   │
│   ├── rag/
│   │   └── pipeline.py          # Embed + retrieve knowledge chunks
│   │
│   └── rlhf/
│       ├── collector.py         # Saves human edit preference pairs
│       ├── strategy_tracker.py  # Contextual bandit strategy selector
│       └── dashboard_routes.py  # RLHF API endpoints
│
├── shared/
│   ├── models.py                # Pydantic data models
│   ├── config.py                # Settings loader (.env)
│   ├── kafka_client.py          # Kafka producer/consumer helpers
│   ├── audit.py                 # Append-only audit trail
│   └── db/
│       └── models.py            # SQLAlchemy ORM models
│
├── scripts/
│   ├── init_db.py               # Creates tables + seeds knowledge base
│   └── export_preferences.py    # Exports RLHF data to JSONL
│
└── infra/
    └── docker-compose.yml       # PostgreSQL + Redis + Kafka
```
| Table | Purpose |
|---|---|
| `signals` | Every incoming social signal with embedding |
| `actions` | Every agent action: draft, final content, scores |
| `knowledge_base` | RAG document chunks with vector embeddings |
| `audit_log` | Append-only log of every LLM call and publish event |
| `preference_pairs` | RLHF training data from human edits |
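The `audit_log` table is append-only: rows are inserted, never updated or deleted, so the history is tamper-evident by construction. A sketch of that pattern, using sqlite3 in place of PostgreSQL and illustrative column names (not the project's actual schema):

```python
import json
import sqlite3
from datetime import datetime, timezone

# Append-only audit trail sketch. sqlite3 stands in for PostgreSQL;
# column names are illustrative, not the project's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE audit_log (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        created_at TEXT NOT NULL
    )
""")

def audit(event_type: str, payload: dict) -> None:
    # Only INSERTs are ever issued -- no UPDATE or DELETE paths exist,
    # which is what makes the trail append-only.
    conn.execute(
        "INSERT INTO audit_log (event_type, payload, created_at) VALUES (?, ?, ?)",
        (event_type, json.dumps(payload), datetime.now(timezone.utc).isoformat()),
    )

audit("llm_call", {"model": "llama-3.3-70b", "tokens": 412})
audit("publish", {"signal_id": "r-1", "platform": "reddit"})
rows = conn.execute("SELECT event_type FROM audit_log ORDER BY id").fetchall()
```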
| Topic | Purpose |
|---|---|
| `social.signals.raw` | Raw normalized signals from all platforms |
| `social.signals.classified` | Signals after intent classification |
| `agent.actions.draft` | Draft replies before safety gate |
| `agent.actions.approved` | Approved replies ready to publish |
| `agent.actions.published` | Confirmed published actions |
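Messages on these topics are serialized signals that advance one topic per pipeline stage. A sketch of the produce-side helpers (the topic names match the table above; the JSON envelope and keying-by-`signal_id` are assumptions, not necessarily what `shared/kafka_client.py` emits):

```python
import json

# Topic names from the table above. The envelope format and the choice
# to key by signal_id are assumed conventions, not confirmed project code.
PIPELINE_TOPICS = [
    "social.signals.raw",
    "social.signals.classified",
    "agent.actions.draft",
    "agent.actions.approved",
    "agent.actions.published",
]

def encode(signal: dict) -> tuple[bytes, bytes]:
    """Key by signal_id so all events for one signal land in one partition."""
    return signal["signal_id"].encode(), json.dumps(signal).encode()

def next_topic(current: str) -> str:
    """Advance a signal one stage down the pipeline."""
    return PIPELINE_TOPICS[PIPELINE_TOPICS.index(current) + 1]

key, value = encode({"signal_id": "r-1", "text": "Does TechDesk AI support SSO?"})
```

Keying by `signal_id` preserves per-signal ordering across partitioned topics, which matters once multiple consumers scale out.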
- Docker Desktop or Docker Engine
- Python 3.12
- Groq API key (free at console.groq.com)
```
git clone <repo>
cd AI-Social-Agent
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
nano .env    # add your GROQ_API_KEY
docker compose -f infra/docker-compose.yml up -d
python scripts/init_db.py
```

Run each component in its own terminal tab:

```
# Tab 1: all platform connectors
python services/perception/launcher.py

# Tab 2: main agent
python main.py

# Tab 3: HITL dashboard
uvicorn services.hitl.dashboard:app --host 0.0.0.0 --port 8000
```

Open http://localhost:8000 in your browser.
```
GROQ_API_KEY=            # Required: get a free key at console.groq.com
TWITTER_BEARER_TOKEN=    # Optional: Twitter filtered stream
PERSPECTIVE_API_KEY=     # Optional: Google toxicity detection
REDIS_URL=redis://localhost:6379/0
DATABASE_URL=postgresql://agent:agentpass@localhost:5432/social_agent
AGENT_ENV=development
HITL_ENABLED=true
SAFETY_THRESHOLD=0.7
LOG_LEVEL=INFO
```

**Why Groq instead of OpenAI?** Groq provides free API access to Llama 3.3 70B with generous rate limits, which suits a student project well. The architecture supports swapping in any LLM provider by changing one file.
**Why LangGraph instead of LangChain agents?** LangGraph gives explicit control over the agent graph: nodes, edges, and routing are all code. This makes the system debuggable and predictable, unlike black-box agent frameworks.
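Because routing is explicit code, the orchestrator's conditional edge reduces to a plain function over the shared state. A sketch of what that function might look like (in LangGraph it would be passed to `add_conditional_edges`; `route_signal` and the node names are hypothetical, and the intent labels come from the feature list above):

```python
# Sketch of the orchestrator's conditional-edge routing. In the real
# graph this function would be registered via add_conditional_edges();
# route_signal and the node names are hypothetical.
def route_signal(state: dict) -> str:
    intent = state.get("intent")
    if intent == "crisis":
        return "crisis_agent"           # escalate to HITL urgently
    if intent == "viral_opportunity":
        return "content_creator_agent"  # proactive content
    return "engagement_agent"           # complaints, praise, questions
```

Since the router is an ordinary function, it can be unit-tested in isolation, which is exactly the debuggability argument above.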
**Why Kafka instead of just Redis queues?** Kafka provides durable, replayable event streaming. If the agent crashes, no signals are lost; the consumer group simply re-reads from its last committed offset. Redis queues are ephemeral.
**Why pgvector instead of a separate vector database?** pgvector keeps the entire data model in one system. For a project of this scale, the operational simplicity of one database outweighs the performance benefits of a dedicated vector store.
**Why fastembed instead of OpenAI embeddings?** fastembed runs entirely locally: no API call, no cost, no network latency. BAAI/bge-small-en-v1.5 produces 384-dimension embeddings and performs well for semantic similarity on short social media text.
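Retrieval ranks knowledge chunks by cosine similarity between the query embedding and each stored chunk embedding; pgvector does this in SQL with its cosine-distance operator, where similarity = 1 - distance. The equivalent computation in plain Python, with toy 3-d vectors standing in for the real 384-d embeddings:

```python
import math

# Cosine similarity, the metric behind pgvector's cosine-distance ranking.
# Toy 3-d vectors stand in for real 384-d bge-small-en-v1.5 embeddings.
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k chunks most similar to the query embedding."""
    ranked = sorted(chunks,
                    key=lambda cid: cosine_similarity(query, chunks[cid]),
                    reverse=True)
    return ranked[:k]

chunks = {"faq_pricing": [0.9, 0.1, 0.0],
          "faq_sso":     [0.1, 0.9, 0.1],
          "faq_api":     [0.0, 0.2, 0.9]}
best = top_k([0.2, 0.95, 0.05], chunks, k=1)
```

In production this ranking happens inside PostgreSQL over the `knowledge_base` table, so the retrieved chunks arrive in the same query that filters them.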
| Property | This project |
|---|---|
| Memory | Four-tier: working (Redis) + episodic (pgvector) + semantic (RAG) + procedural (strategy tracker) |
| Agent architecture | Specialized agents with LangGraph orchestration and tool use |
| Safety | Keyword filter + toxicity detection + HITL review + full audit trail |
| Human oversight | Built into the workflow as a first-class concept |
| Feedback loop | RLHF preference collection + contextual bandit strategy selection |
| Observability | Append-only PostgreSQL audit log, every LLM call recorded |
| Scalability | Stateless agents + Kafka = horizontal scaling ready |
| Reliability | Idempotent actions, Kafka consumer groups, error recovery |
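The procedural-memory tier above is the strategy tracker's contextual bandit: usually pick the reply strategy with the best observed reward, occasionally explore an alternative. An epsilon-greedy sketch of that idea (an illustrative reduction; the real `strategy_tracker.py` also conditions on signal context such as platform and intent, and may use a different algorithm):

```python
import random

# Epsilon-greedy bandit sketch for strategy selection. An illustrative
# reduction of strategy_tracker.py; the real contextual bandit also
# conditions on signal context (platform, intent).
class StrategyBandit:
    def __init__(self, strategies: list[str], epsilon: float = 0.1):
        self.epsilon = epsilon
        self.counts = {s: 0 for s in strategies}
        self.rewards = {s: 0.0 for s in strategies}

    def select(self) -> str:
        if random.random() < self.epsilon:       # explore a random strategy
            return random.choice(list(self.counts))
        def mean(s: str) -> float:               # exploit best mean reward
            return self.rewards[s] / self.counts[s] if self.counts[s] else 0.0
        return max(self.counts, key=mean)

    def update(self, strategy: str, reward: float) -> None:
        """Reward could map HITL outcomes: 1.0 approved, 0.5 edited, 0.0 rejected."""
        self.counts[strategy] += 1
        self.rewards[strategy] += reward

bandit = StrategyBandit(["empathetic", "concise", "detailed"])
bandit.update("concise", 1.0)
bandit.update("empathetic", 0.0)
```

Feeding HITL decisions in as rewards closes the loop: human review simultaneously gates output and trains the selector.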
Built with guidance from the AI Social Agents Industry Guide. Skills used:
- Python: async/await, Pydantic, SQLAlchemy, FastAPI
- LLM engineering: prompt engineering, tool use, ReAct pattern
- RAG: embedding, vector similarity search, context injection
- Multi-agent systems: LangGraph state machines, agent specialization
- Data engineering: Kafka, streaming, consumer groups
- Databases: PostgreSQL, pgvector, Redis
- Infrastructure: Docker, docker-compose
Student project by Ankit Negi, built as a demonstration of industry-grade AI agent engineering patterns.
"Build one layer at a time. Iterate on real data. The best agents are built by engineers who keep learning."