ankitnegi-dev/Techdesk-ai-social-agent


TechDesk AI — Industry-Grade AI Social Media Agent

Built as a student project demonstrating production-grade AI engineering patterns.

An autonomous AI social media monitoring and response agent. It watches Reddit, LinkedIn, and Twitter in real time, classifies incoming signals with a large language model, retrieves relevant answers from a knowledge base, routes them through a multi-agent pipeline, and queues draft replies for human review.


What it does

  • Monitors Reddit, LinkedIn, and Twitter simultaneously in real time
  • Classifies every signal by intent — complaint, praise, question, crisis, viral opportunity
  • Retrieves accurate answers from a RAG knowledge base (12 TechDesk AI FAQs)
  • Routes signals through a LangGraph multi-agent swarm — Orchestrator, Engagement, Crisis, and ContentCreator agents
  • Drafts on-brand replies using Llama 3.3 70B via the Groq API
  • Runs every draft through a multi-layer safety gate — keyword filter + toxicity detection
  • Queues drafts for human review in a real-time web dashboard
  • Logs every LLM call and action to an append-only audit trail in PostgreSQL
  • Collects RLHF preference data from human edits for future fine-tuning
  • Tracks strategy performance with a contextual bandit algorithm

Architecture — 7 layers

┌──────────────────────────────────────────────────────────────┐
│  Layer 0 — Perception                                        │
│  Reddit · LinkedIn · Twitter · Simulation → Kafka            │
├──────────────────────────────────────────────────────────────┤
│  Layer 1 — Understanding                                     │
│  Intent classification · Sentiment · Entity extraction       │
├──────────────────────────────────────────────────────────────┤
│  Layer 2 — Planning                                          │
│  LangGraph orchestration · Strategy selection · Routing      │
├──────────────────────────────────────────────────────────────┤
│  Layer 3 — Memory                                            │
│  PostgreSQL + pgvector · Redis · RAG knowledge base          │
├──────────────────────────────────────────────────────────────┤
│  Layer 4 — Action                                            │
│  Draft generation · Platform formatting · Scheduling         │
├──────────────────────────────────────────────────────────────┤
│  Layer 5 — Safety                                            │
│  Keyword filter · Perspective API · HITL review dashboard    │
├──────────────────────────────────────────────────────────────┤
│  Layer 6 — Observability                                     │
│  Audit trail · RLHF collector · Strategy leaderboard         │
└──────────────────────────────────────────────────────────────┘

Multi-agent system

                    ┌─────────────────┐
    Signal ───────► │  Orchestrator   │  classifies intent + routes
                    └────────┬────────┘
                             │
              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
    ┌──────────────┐  ┌──────────┐  ┌──────────────────┐
    │  Engagement  │  │  Crisis  │  │ ContentCreator   │
    │    Agent     │  │  Agent   │  │     Agent        │
    │              │  │          │  │                  │
    │ Drafts reply │  │Escalates │  │ Creates proactive│
    │ using RAG    │  │ to HITL  │  │ content for      │
    │ + persona    │  │ urgently │  │ viral signals    │
    └──────────────┘  └──────────┘  └──────────────────┘
              │              │              │
              └──────────────┼──────────────┘
                             ▼
                    ┌─────────────────┐
                    │  Safety Gate    │  keyword + toxicity check
                    └────────┬────────┘
                             ▼
                    ┌─────────────────┐
                    │  HITL Queue     │  human review dashboard
                    └─────────────────┘
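Stripped of the LangGraph wiring, the Orchestrator's job in this diagram reduces to a routing rule from classified intent to specialist agent. A framework-free sketch of that rule (the intent labels and node names are assumptions based on the diagram, not the project's actual code):

```python
# Maps a classified intent to the specialist agent that should handle it.
# In the real system this decision would live in the condition function of
# the LangGraph conditional edge out of the orchestrator node; it is shown
# here as plain Python for clarity.
ROUTES = {
    "complaint": "engagement",
    "question": "engagement",
    "praise": "engagement",
    "crisis": "crisis",
    "viral_opportunity": "content_creator",
}

def route_signal(intent: str) -> str:
    """Return the name of the agent node a signal should be routed to."""
    return ROUTES.get(intent, "engagement")  # default to the Engagement agent

print(route_signal("crisis"))             # crisis
print(route_signal("viral_opportunity"))  # content_creator
```

Keeping the routing table explicit like this is what makes the graph inspectable: every path a signal can take is enumerable in code.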

Tech stack

Component            Technology
Language             Python 3.12
LLM                  Llama 3.3 70B via Groq API (free)
Agent orchestration  LangGraph
Embeddings           fastembed — BAAI/bge-small-en-v1.5 (local, no GPU)
Primary database     PostgreSQL 16 + pgvector extension
Vector search        pgvector cosine similarity
Event streaming      Apache Kafka (KRaft mode, no ZooKeeper)
Working memory       Redis 7
API framework        FastAPI + WebSocket
HITL dashboard       FastAPI + vanilla JS + WebSocket real-time updates
Safety               Keyword filter + Google Perspective API
Infrastructure       Docker Compose
Reddit connector     Public JSON API — no key needed
LinkedIn connector   Google News RSS + feedparser — no key needed

Project structure

AI-Social-Agent/
├── main.py                          ← Main orchestrator — Phase 4
├── README.md
├── requirements.txt
├── .env                             ← API keys (never commit this)
│
├── services/
│   ├── perception/
│   │   ├── launcher.py              ← Starts all platform connectors
│   │   ├── main.py                  ← Simulation mode
│   │   ├── reddit_stream.py         ← Reddit public API connector
│   │   ├── linkedin_stream.py       ← LinkedIn + Google News connector
│   │   ├── twitter_stream.py        ← Twitter filtered stream
│   │   └── normalizer.py            ← Normalizes all platforms to SocialSignal
│   │
│   ├── agents/
│   │   ├── graph.py                 ← LangGraph compiled agent graph
│   │   ├── state.py                 ← Shared AgentState TypedDict
│   │   ├── orchestrator.py          ← Routes signals to specialist agents
│   │   ├── engagement.py            ← Drafts replies using RAG + persona
│   │   ├── crisis.py                ← Crisis escalation agent
│   │   └── content_creator.py       ← Proactive content for viral signals
│   │
│   ├── safety/
│   │   └── gate.py                  ← Multi-layer content moderation
│   │
│   ├── hitl/
│   │   ├── dashboard.py             ← FastAPI backend + WebSocket
│   │   └── dashboard.html           ← Review UI — approve/edit/reject
│   │
│   ├── rag/
│   │   └── pipeline.py              ← Embed + retrieve knowledge chunks
│   │
│   └── rlhf/
│       ├── collector.py             ← Saves human edit preference pairs
│       ├── strategy_tracker.py      ← Contextual bandit strategy selector
│       └── dashboard_routes.py      ← RLHF API endpoints
│
├── shared/
│   ├── models.py                    ← Pydantic data models
│   ├── config.py                    ← Settings loader (.env)
│   ├── kafka_client.py              ← Kafka producer/consumer helpers
│   ├── audit.py                     ← Append-only audit trail
│   └── db/
│       └── models.py                ← SQLAlchemy ORM models
│
├── scripts/
│   ├── init_db.py                   ← Creates tables + seeds knowledge base
│   └── export_preferences.py        ← Exports RLHF data to JSONL
│
└── infra/
    └── docker-compose.yml           ← PostgreSQL + Redis + Kafka

Database schema

Table             Purpose
signals           Every incoming social signal with embedding
actions           Every agent action — draft, final content, scores
knowledge_base    RAG document chunks with vector embeddings
audit_log         Append-only log of every LLM call and publish event
preference_pairs  RLHF training data from human edits
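Retrieval against knowledge_base uses pgvector's cosine-distance operator (<=>), which returns 1 minus cosine similarity. A pure-Python sketch of the same math, useful for sanity-checking query results:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, i.e. what pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Identical direction -> distance 0; orthogonal vectors -> distance 1.
assert abs(cosine_distance([1.0, 0.0], [2.0, 0.0])) < 1e-9
assert abs(cosine_distance([1.0, 0.0], [0.0, 1.0]) - 1.0) < 1e-9
```

A retrieval query would then rank chunks by this distance, roughly "ORDER BY embedding <=> :query_embedding LIMIT k" (column and parameter names here are illustrative, not the project's actual schema).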

Kafka topics

Topic                      Purpose
social.signals.raw         Raw normalized signals from all platforms
social.signals.classified  Signals after intent classification
agent.actions.draft        Draft replies before safety gate
agent.actions.approved     Approved replies ready to publish
agent.actions.published    Confirmed published actions

Setup and running

Prerequisites

  • Docker Desktop or Docker Engine
  • Python 3.12
  • Groq API key — free at console.groq.com

Step 1 — Clone and set up environment

git clone <repo>
cd AI-Social-Agent
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Step 2 — Configure API keys

cp .env.example .env
nano .env
# Add your GROQ_API_KEY

Step 3 — Start infrastructure

docker compose -f infra/docker-compose.yml up -d

Step 4 — Initialize database

python scripts/init_db.py

Step 5 — Run the system (3 terminal tabs)

Tab 1 — All platform connectors:

python services/perception/launcher.py

Tab 2 — Main agent:

python main.py

Tab 3 — HITL dashboard:

uvicorn services.hitl.dashboard:app --host 0.0.0.0 --port 8000

Open http://localhost:8000 in your browser.


Environment variables

GROQ_API_KEY=                    # Required — get free at console.groq.com
TWITTER_BEARER_TOKEN=            # Optional — Twitter filtered stream
PERSPECTIVE_API_KEY=             # Optional — Google toxicity detection
REDIS_URL=redis://localhost:6379/0
DATABASE_URL=postgresql://agent:agentpass@localhost:5432/social_agent
AGENT_ENV=development
HITL_ENABLED=true
SAFETY_THRESHOLD=0.7
LOG_LEVEL=INFO
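To illustrate how SAFETY_THRESHOLD could be applied, here is a sketch of a two-layer gate: fail fast on a keyword blocklist, then reject drafts whose toxicity score meets the threshold. The keywords, function name, and the idea of passing the toxicity score in as an argument are all assumptions for illustration (the real logic lives in services/safety/gate.py, and the score would come from the Perspective API):

```python
BLOCKLIST = {"lawsuit", "refund scam"}   # illustrative keywords only
SAFETY_THRESHOLD = 0.7                    # mirrors the env var above

def passes_safety_gate(draft: str, toxicity_score: float) -> bool:
    """Layer 1: keyword filter. Layer 2: toxicity threshold.

    toxicity_score is injected by the caller so this sketch stays
    offline-testable; in production it would be fetched per draft.
    """
    text = draft.lower()
    if any(keyword in text for keyword in BLOCKLIST):
        return False
    return toxicity_score < SAFETY_THRESHOLD

print(passes_safety_gate("Happy to help! DM us anytime.", 0.02))  # True
print(passes_safety_gate("You can expect a lawsuit.", 0.10))      # False
```

Drafts that fail either layer never reach the HITL queue as-is; they are blocked or flagged before a human sees them.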

Key engineering decisions

Why Groq instead of OpenAI? Groq provides free API access to Llama 3.3 70B with generous rate limits — perfect for a student project. The architecture supports swapping in any LLM provider by changing one file.
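The "one file" swap could be as small as a provider registry keyed off an environment variable. A sketch under assumed names (LLM_PROVIDER, the registry entries, and the model ids are illustrative, not the project's actual config):

```python
import os

# Hypothetical provider registry. Swapping LLMs means changing
# LLM_PROVIDER (or adding one entry here) -- nothing else.
PROVIDERS = {
    "groq": {
        "base_url": "https://api.groq.com/openai/v1",
        "model": "llama-3.3-70b-versatile",
    },
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "model": "gpt-4o-mini",
    },
}

def llm_config() -> dict:
    """Resolve the active provider; defaults to Groq."""
    provider = os.environ.get("LLM_PROVIDER", "groq")
    return PROVIDERS[provider]

print(llm_config()["model"])  # llama-3.3-70b-versatile when LLM_PROVIDER is unset
```

Because Groq exposes an OpenAI-compatible API, a registry like this lets the rest of the codebase stay provider-agnostic.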

Why LangGraph instead of LangChain agents? LangGraph gives explicit control over the agent graph — nodes, edges, and routing are all code. This makes the system debuggable and predictable, unlike black-box agent frameworks.

Why Kafka instead of just Redis queues? Kafka provides durable, replayable event streaming. If the agent crashes, no signals are lost β€” the consumer group simply re-reads from its last committed offset. Redis queues are ephemeral.

Why pgvector instead of a separate vector database? pgvector keeps the entire data model in one system. For a project of this scale, the operational simplicity of one database outweighs the performance benefits of a dedicated vector store.

Why fastembed instead of OpenAI embeddings? fastembed runs entirely locally — no API calls, no cost, no network latency. BAAI/bge-small-en-v1.5 produces 384-dimensional embeddings and performs well for semantic similarity on short social media text.


What makes this industry-level

Property            This project
Memory              Four-tier: working (Redis) + episodic (pgvector) + semantic (RAG) + procedural (strategy tracker)
Agent architecture  Specialized agents with LangGraph orchestration and tool use
Safety              Keyword filter + toxicity detection + HITL review + full audit trail
Human oversight     Built into the workflow as a first-class concept
Feedback loop       RLHF preference collection + contextual bandit strategy selection
Observability       Append-only PostgreSQL audit log, every LLM call recorded
Scalability         Stateless agents + Kafka = horizontal scaling ready
Reliability         Idempotent actions, Kafka consumer groups, error recovery
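The feedback loop's bandit can be illustrated with a simple epsilon-greedy sketch: mostly exploit the reply strategy with the best observed approval rate, occasionally explore. Note this sketch is non-contextual and the strategy names and reward definition are assumptions; the real tracker in services/rlhf/strategy_tracker.py conditions on signal context:

```python
import random

class EpsilonGreedyBandit:
    """Toy bandit where reward = 1.0 if a human approved the draft, else 0.0."""

    def __init__(self, strategies: list[str], epsilon: float = 0.1):
        self.epsilon = epsilon
        self.counts = {s: 0 for s in strategies}
        self.rewards = {s: 0.0 for s in strategies}

    def select(self) -> str:
        if random.random() < self.epsilon:            # explore
            return random.choice(list(self.counts))
        # exploit: pick the highest mean reward (unplayed arms count as 0)
        def mean(s: str) -> float:
            return self.rewards[s] / self.counts[s] if self.counts[s] else 0.0
        return max(self.counts, key=mean)

    def update(self, strategy: str, reward: float) -> None:
        self.counts[strategy] += 1
        self.rewards[strategy] += reward

# epsilon=0.0 makes selection deterministic for this demonstration.
bandit = EpsilonGreedyBandit(["empathetic", "technical", "playful"], epsilon=0.0)
bandit.update("technical", 1.0)   # a human approved a technical-style draft
print(bandit.select())            # technical
```

Each human approve/edit/reject in the HITL dashboard becomes a reward signal, so over time the agent drifts toward the styles humans actually ship.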

Learning roadmap

Built with guidance from the AI Social Agents Industry Guide. Skills used:

  • Python — async/await, Pydantic, SQLAlchemy, FastAPI
  • LLM engineering — prompt engineering, tool use, ReAct pattern
  • RAG — embedding, vector similarity search, context injection
  • Multi-agent systems — LangGraph state machines, agent specialization
  • Data engineering — Kafka, streaming, consumer groups
  • Databases — PostgreSQL, pgvector, Redis
  • Infrastructure — Docker, docker-compose

Author

Student project by Ankit Negi, built as a demonstration of industry-grade AI agent engineering patterns.


"Build one layer at a time. Iterate on real data. The best agents are built by engineers who keep learning."

About

Industry-grade AI social media agent built with LangGraph, Kafka, RAG, and HITL review
