β‘ Multi-Agent AI that turns any podcast, video or transcript into summaries, highlights & social content
π Try the App Here
π Frontend (Streamlit): https://mediamind-ai.onrender.com/
MediaMind is a production-grade Autonomous Media Intelligence Platform powered by a multi-agent AI pipeline.
Instead of a single LLM call, it routes every user request through a Supervisor β Specialist agent system β intelligently deciding whether to summarize, extract highlights, or generate social content.
It combines Groq's ultra-fast inference with Hybrid RAG (ChromaDB + BM25), MCP-style tool calling, and real-time YouTube transcript ingestion β all behind a clean, session-aware Streamlit chat UI.
| Feature | Description |
|---|---|
| β‘ Ultra-Fast Inference | Groq LPU running Llama 3.3 70B β sub-2s responses |
| π§ Multi-Agent Pipeline | Supervisor routes to Summarize / Highlight / Social agent |
| πΊ YouTube Ingestion | Paste any YouTube URL β transcript fetched, indexed, answered |
| π Hybrid RAG | ChromaDB vector search (60%) + BM25 keyword search (40%) merged |
| π§ MCP Tool Registry | Wikipedia, DuckDuckGo, YouTube Transcript, File Reader β per-agent access control |
| π¬ Multi-Session Chat | Full session history, auto-titles, session switching, export to markdown |
| π¬ Direct Q&A Mode | Ask any question β Q&A Agent answers concisely, no structured reports |
| π Deployed on Render | Persistent ChromaDB storage β data survives server restarts |
| Technology | Purpose |
|---|---|
| π Python | Core programming |
| β‘ Groq API | Fast LLM inference (Llama 3.3 70B) |
| π§ LangGraph | Agent orchestration (StateGraph) |
| π LangChain | LLM integration + tool binding |
| π¨ Streamlit | Frontend UI + multi-session chat |
| π¦ ChromaDB | Vector store (persistent) |
| π BM25 (rank_bm25) | Keyword search for hybrid RAG |
| π€ all-MiniLM-L6-v2 | Local embeddings β zero API cost |
| π DuckDuckGo DDGS | Live web search tool |
| π Wikipedia API | Factual enrichment tool |
| π Render | Cloud deployment |
MediaMind
β
βββ app.py # Streamlit UI β multi-session chat, source management
βββ agent.py # LangGraph multi-agent pipeline
βββ rag.py # Hybrid RAG (ChromaDB + BM25)
βββ mcp_tools.py # MCP tool registry (4 tools, per-agent access control)
βββ llm.py # Groq LLM client (3 temperature modes)
βββ prompts.py # All LLM prompts β clean separation of concerns
βββ config.py # Central config β models, RAG params, retry settings
βββ requirements.txt
βββ .env # API keys (NOT pushed to GitHub)
βββ README.md
User Query
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Supervisor Node β
β Reads query, decides routing (temp=0.0) β
ββββββββ¬βββββββββββ¬βββββββββββββββ¬βββββββββββββββββββββ
β β β β
βΌ βΌ βΌ βΌ
ββββββββββββ ββββββββββββ ββββββββββββ βββββββββββββββ
βSummarize β βHighlight β β Social β β Q&A Agent β
β Agent β β Agent β β Agent β β (NEW β¨) β
ββββββ¬ββββββ ββββββ¬ββββββ ββββββ¬ββββββ ββββββββ¬βββββββ
β β β β
βΌ βΌ βΌ βΌ
Wikipedia Wikipedia Web Search Wikipedia
Web Search Web Search (only) Web Search
β β β β
βΌ βΌ βΌ βΌ
Groq 0.3 Groq 0.0 Groq 0.75 Groq 0.3
(balanced) (precise) (creative) (balanced)
β β β β
ββββββββββββββ΄βββββββββββββ΄ββββββββββββββββ
β
βΌ
Final Response β Chat UI
| Query type | Example | Routes to |
|---|---|---|
| Wants a summary / overview | "summarize this video" | summarize_agent |
| Wants highlights / key moments | "what are the key points?" | highlight_agent |
| Wants social media content | "write a LinkedIn post" | social_agent |
| Asks a direct question | "what does X mean?" / "who is Y?" | qa_agent β¨ |
How the supervisor decides: If the query contains question words β what, why, how, who, when, explain, define β it always routes to
qa_agent. The Q&A Agent answers in 2β5 sentences, grounded in the transcript, with no structured reports or bullet points.
User Query
β
ββββββββββββββββββββββββββββββββ
βΌ βΌ
ChromaDB Vector Search BM25 Keyword Search
(semantic similarity) (exact term matching)
all-MiniLM-L6-v2 embeddings rank_bm25 BM25Okapi
Top-4 chunks (60% weight) Top-4 chunks (40% weight)
β β
ββββββββββββ¬ββββββββββββββββββββ
βΌ
Merge + Deduplicate
(vector results get priority)
β
βΌ
Top-4 chunks β context string β Agent
| Tool | Description | Agent Access |
|---|---|---|
youtube_transcript |
Fetches full transcript from YouTube URL | Research agent |
web_search |
Live DuckDuckGo search for news & trends | All agents |
wikipedia_search |
Factual background on people & topics | Summarize, Highlight |
read_file |
Reads local .txt / .srt / .md transcript | Research agent |
Each agent gets only the tools it needs β social agent gets web search only, summarize and highlight agents get Wikipedia + web search. This is deliberate architecture, not default behaviour.
git clone https://github.com/hari9618/mediamind
cd mediamindpython -m venv venv
source venv/bin/activate # Mac/Linux
venv\Scripts\activate # Windowspip install -r requirements.txtCreate a .env file:
GROQ_API_KEY=your_groq_api_key_hereGet your free Groq API key at console.groq.com
streamlit run app.py1οΈβ£ User sends a query (or pastes a YouTube URL)
β
2οΈβ£ YouTube URL detected? β Fetch transcript β Clear ChromaDB β Re-index
β
3οΈβ£ Hybrid RAG retrieval β ChromaDB semantic + BM25 keyword β Top-4 chunks
β
4οΈβ£ Supervisor reads query β Routes to Summarize / Highlight / Social / Q&A agent
β (question words detected? β qa_agent for direct concise answer)
β
5οΈβ£ Agent calls MCP tools (Wikipedia, DuckDuckGo) for real-world enrichment
β
6οΈβ£ Agent formats prompt: RAG context + tool results + user query β Groq LLM
β
7οΈβ£ Response rendered in chat β markdown or styled highlight cards
(Add your screenshot here)
<img width="951" height="446" alt="Screenshot 2026-05-09 170729" src="https://github.com/user-attachments/assets/978fbee0-d71f-4b39-9519-98e0de61ecab" />
β LangGraph StateGraph β building real state machines with typed state and conditional edges
β Hybrid RAG Engineering β combining vector + keyword search with weighted merging
β MCP Tool Architecture β per-agent access control, tool binding, ToolMessage conversations
β Multi-Session State Management β Streamlit session_state design for complex apps
β Production RAG Deployment β PersistentClient ChromaDB, real-time re-indexing
β LLM Temperature Strategy β precise / balanced / creative modes for different task types
β YouTube API Integration β youtube-transcript-api v1.x, URL parsing, live ingestion
β Intelligent Task Routing β keyword-based intent detection to separate Q&A from generation tasks
πΉ Speaker diarization β identify who said what in transcripts
πΉ Multi-turn Q&A β follow-up questions that remember previous answers in session
πΉ Multi-document RAG β index multiple videos/files simultaneously
πΉ Audio file support β direct .mp3/.wav upload with Whisper transcription
πΉ Scheduled indexing β auto-index new episodes from RSS feeds
πΉ Shareable sessions β export and share full conversation threads
Hari Krishna T
AI Engineer | Multi-Agent Systems Builder | Gen AI Developer
π GitHub: github.com/hari9618
π LinkedIn: linkedin.com/in/hari-krishna-thota-06b850275
If you like this project:
β Star the repository
π’ Share with others
π΄ Fork and build on top of it
AI Multi-Agent LangGraph LangChain Groq RAG ChromaDB BM25 Streamlit YouTube MCP Python Generative AI LLM Render
