🎙️ MediaMind: Autonomous Media Intelligence Platform

⚡ Multi-Agent AI that turns any podcast, video, or transcript into summaries, highlights, and social content


🧩 Tech Badges


🚀 Live Demo

👉 Try the App Here

🔗 Frontend (Streamlit): https://mediamind-ai.onrender.com/


📌 Project Overview

MediaMind is a production-grade Autonomous Media Intelligence Platform powered by a multi-agent AI pipeline.

Instead of making a single LLM call, it routes every user request through a Supervisor → Specialist agent system that decides whether to summarize, extract highlights, generate social content, or answer a direct question.

It combines Groq's fast inference with Hybrid RAG (ChromaDB + BM25), MCP-style tool calling, and real-time YouTube transcript ingestion, all behind a session-aware Streamlit chat UI.


✨ Key Features

| Feature | Description |
|---|---|
| ⚡ Ultra-Fast Inference | Groq LPU running Llama 3.3 70B, with sub-2s responses |
| 🧠 Multi-Agent Pipeline | Supervisor routes to the Summarize, Highlight, Social, or Q&A agent |
| 📺 YouTube Ingestion | Paste any YouTube URL: the transcript is fetched, indexed, and used to answer |
| 🔍 Hybrid RAG | ChromaDB vector search (60%) + BM25 keyword search (40%), merged |
| 🔧 MCP Tool Registry | Wikipedia, DuckDuckGo, YouTube Transcript, File Reader, with per-agent access control |
| 💬 Multi-Session Chat | Full session history, auto-titles, session switching, export to markdown |
| 💬 Direct Q&A Mode | Ask any question: the Q&A Agent answers concisely, with no structured reports |
| 🚀 Deployed on Render | Persistent ChromaDB storage, so data survives server restarts |

🛠️ Tech Stack

| Technology | Purpose |
|---|---|
| 🐍 Python | Core programming |
| ⚡ Groq API | Fast LLM inference (Llama 3.3 70B) |
| 🧠 LangGraph | Agent orchestration (StateGraph) |
| 🔗 LangChain | LLM integration + tool binding |
| 🎨 Streamlit | Frontend UI + multi-session chat |
| 📦 ChromaDB | Vector store (persistent) |
| 📊 BM25 (rank_bm25) | Keyword search for hybrid RAG |
| 🤗 all-MiniLM-L6-v2 | Local embeddings, zero API cost |
| 🌐 DuckDuckGo DDGS | Live web search tool |
| 📖 Wikipedia API | Factual enrichment tool |
| 🚀 Render | Cloud deployment |

🏗️ System Architecture

MediaMind
│
├── app.py              # Streamlit UI: multi-session chat, source management
├── agent.py            # LangGraph multi-agent pipeline
├── rag.py              # Hybrid RAG (ChromaDB + BM25)
├── mcp_tools.py        # MCP tool registry (4 tools, per-agent access control)
├── llm.py              # Groq LLM client (3 temperature modes)
├── prompts.py          # All LLM prompts (clean separation of concerns)
├── config.py           # Central config: models, RAG params, retry settings
├── requirements.txt
├── .env                # API keys (NOT pushed to GitHub)
└── README.md

🤖 Agent Pipeline

User Query
    │
    ▼
┌──────────────────────────────────────────────────────┐
│                   Supervisor Node                    │
│          Reads query, decides routing (temp=0.0)     │
└──────┬──────────┬──────────────┬────────────────┬────┘
       │          │              │                │
       ▼          ▼              ▼                ▼
┌──────────┐ ┌──────────┐ ┌──────────┐  ┌─────────────┐
│Summarize │ │Highlight │ │  Social  │  │   Q&A Agent │
│  Agent   │ │  Agent   │ │  Agent   │  │  (NEW ✨)   │
└────┬─────┘ └────┬─────┘ └────┬─────┘  └──────┬──────┘
     │            │            │               │
     ▼            ▼            ▼               ▼
 Wikipedia    Wikipedia    Web Search      Wikipedia
 Web Search   Web Search    (only)        Web Search
     │            │            │               │
     ▼            ▼            ▼               ▼
  Groq 0.3    Groq 0.0     Groq 0.75       Groq 0.3
  (balanced)  (precise)   (creative)      (balanced)
     │            │            │               │
     └────────────┴────────────┴───────────────┘
                              │
                              ▼
                    Final Response → Chat UI
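
The per-agent temperatures in the diagram above could be captured in a small lookup table. This is only a minimal sketch, not the project's actual `llm.py`: the names `TEMPERATURE_MODES`, `AGENT_MODE`, and `temperature_for` are hypothetical, and only the numeric values and mode labels are taken from the diagram.

```python
# Hypothetical sketch: the names are illustrative, the values come from the diagram.
TEMPERATURE_MODES = {
    "precise": 0.0,    # least random sampling
    "balanced": 0.3,
    "creative": 0.75,  # most varied output
}

# Which mode each node uses, as drawn in the pipeline diagram.
AGENT_MODE = {
    "supervisor": "precise",
    "summarize_agent": "balanced",
    "highlight_agent": "precise",
    "social_agent": "creative",
    "qa_agent": "balanced",
}


def temperature_for(agent: str) -> float:
    """Return the sampling temperature for an agent, failing loudly on unknown names."""
    try:
        return TEMPERATURE_MODES[AGENT_MODE[agent]]
    except KeyError:
        raise ValueError(f"unknown agent: {agent!r}") from None
```

Keeping the mode names separate from the agent names means a new agent only needs one extra entry in `AGENT_MODE`.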

🤖 Agent Routing Logic

| Query type | Example | Routes to |
|---|---|---|
| Wants a summary / overview | "summarize this video" | summarize_agent |
| Wants highlights / key moments | "what are the key points?" | highlight_agent |
| Wants social media content | "write a LinkedIn post" | social_agent |
| Asks a direct question | "what does X mean?" / "who is Y?" | qa_agent ✨ |

How the supervisor decides: If the query contains question words (what, why, how, who, when, explain, define), it always routes to qa_agent. The Q&A Agent answers in 2 to 5 sentences, grounded in the transcript, with no structured reports or bullet points.
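
A keyword-only version of this routing could look roughly like the sketch below. It is an illustration under stated assumptions, not the supervisor's real logic: the keyword lists, the `route` function, and the fallback agent are all hypothetical. It also assumes that explicit task keywords are checked before question words, so that "what are the key points?" still reaches highlight_agent as in the table.

```python
import re

# Hypothetical keyword lists; the real supervisor prompt and rules may differ.
TASK_KEYWORDS = {
    "summarize_agent": ("summarize", "summarise", "summary", "overview"),
    "highlight_agent": ("highlight", "key point", "key moment"),
    "social_agent": ("linkedin", "tweet", "post", "caption"),
}
QUESTION_WORDS = {"what", "why", "how", "who", "when", "explain", "define"}


def route(query: str) -> str:
    """Pick an agent name for a query using simple keyword matching."""
    text = query.lower()
    # Assumption: explicit task keywords win over question words.
    for agent, keywords in TASK_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return agent
    # Otherwise, any question word sends the query to the Q&A agent.
    words = set(re.findall(r"[a-z']+", text))
    if words & QUESTION_WORDS:
        return "qa_agent"
    return "summarize_agent"  # assumed fallback, not stated in the README
```

For example, `route("who is Y?")` returns `"qa_agent"`, while `route("write a LinkedIn post")` returns `"social_agent"`.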


🔍 Hybrid RAG Pipeline

User Query
    │
    ├──────────────────────────────┐
    ▼                              ▼
ChromaDB Vector Search         BM25 Keyword Search
(semantic similarity)          (exact term matching)
all-MiniLM-L6-v2 embeddings    rank_bm25 BM25Okapi
Top-4 chunks (60% weight)      Top-4 chunks (40% weight)
    │                              │
    └──────────┬───────────────────┘
               ▼
       Merge + Deduplicate
    (vector results get priority)
               │
               ▼
       Top-4 chunks → context string → Agent
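
The merge step above can be sketched in plain Python. How the 60/40 weights are actually applied in `rag.py` is not stated here, so this sketch assumes a simple rank-based score per list; `hybrid_merge` and its parameters are hypothetical names, and the inputs stand in for the chunk texts returned by ChromaDB and BM25.

```python
def hybrid_merge(vector_hits, bm25_hits, top_k=4, vector_weight=0.6, bm25_weight=0.4):
    """Merge two ranked lists of chunk texts into one deduplicated top_k list."""
    scores = {}
    first_seen = {}
    # Vector hits are processed first so that they win ties (vector priority).
    for weight, hits in ((vector_weight, vector_hits), (bm25_weight, bm25_hits)):
        for rank, chunk in enumerate(hits[:top_k]):
            # Assumed scoring: 1.0 for the top hit, decreasing linearly with rank.
            rank_score = (top_k - rank) / top_k
            # A chunk found by both retrievers accumulates both contributions,
            # which also deduplicates it.
            scores[chunk] = scores.get(chunk, 0.0) + weight * rank_score
            first_seen.setdefault(chunk, len(first_seen))
    ranked = sorted(scores, key=lambda c: (-scores[c], first_seen[c]))
    return ranked[:top_k]


vector_hits = ["a", "b", "c", "d"]
bm25_hits = ["c", "e", "a", "f"]
merged = hybrid_merge(vector_hits, bm25_hits)
context = "\n\n".join(merged)  # context string handed to the agent
```

Chunks returned by both retrievers rise to the top because their scores add up, which is one common reason for combining semantic and keyword search.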

🔧 MCP Tool Registry

| Tool | Description | Agent Access |
|---|---|---|
| youtube_transcript | Fetches full transcript from YouTube URL | Research agent |
| web_search | Live DuckDuckGo search for news & trends | All agents |
| wikipedia_search | Factual background on people & topics | Summarize, Highlight |
| read_file | Reads local .txt / .srt / .md transcript | Research agent |

Each agent gets only the tools it needs: the social agent gets web search only, while the summarize and highlight agents get Wikipedia + web search. This is deliberate architecture, not default behaviour.
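
One way to enforce this kind of per-agent access is an allow-list that is checked before every tool call. The sketch below is hypothetical and is not the contents of `mcp_tools.py`: the tool bodies are placeholders that return canned strings, and `TOOLS`, `AGENT_TOOLS`, and `call_tool` are illustrative names.

```python
from typing import Callable, Dict, FrozenSet


# Placeholder tools: the real ones would call DuckDuckGo and Wikipedia.
def web_search(query: str) -> str:
    return f"[web results for {query!r}]"


def wikipedia_search(query: str) -> str:
    return f"[wikipedia summary for {query!r}]"


TOOLS: Dict[str, Callable[[str], str]] = {
    "web_search": web_search,
    "wikipedia_search": wikipedia_search,
}

# Allow-lists follow the description above.
AGENT_TOOLS: Dict[str, FrozenSet[str]] = {
    "summarize_agent": frozenset({"wikipedia_search", "web_search"}),
    "highlight_agent": frozenset({"wikipedia_search", "web_search"}),
    "social_agent": frozenset({"web_search"}),
}


def call_tool(agent: str, tool: str, query: str) -> str:
    """Run a tool on behalf of an agent, refusing tools outside its allow-list."""
    if tool not in AGENT_TOOLS.get(agent, frozenset()):
        raise PermissionError(f"{agent} may not call {tool}")
    return TOOLS[tool](query)
```

With this layout, `call_tool("social_agent", "wikipedia_search", "Groq")` raises `PermissionError`, while the same call for `summarize_agent` goes through.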


⚙️ Installation Guide

1️⃣ Clone Repository

git clone https://github.com/hari9618/mediamind
cd mediamind

2️⃣ Create Virtual Environment

python -m venv venv
source venv/bin/activate   # Mac/Linux
venv\Scripts\activate      # Windows

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Setup Environment Variables

Create a .env file:

GROQ_API_KEY=your_groq_api_key_here

Get your free Groq API key at console.groq.com
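
A startup check along these lines can catch a missing or placeholder key early. This is a hedged sketch rather than the project's `config.py`: it assumes the variable has already been loaded into the process environment (for example from `.env` by a loader such as python-dotenv, which is an assumption about the dependencies), and `get_groq_api_key` is a hypothetical helper name.

```python
import os


def get_groq_api_key() -> str:
    """Return the Groq API key from the environment or raise a clear error."""
    key = os.environ.get("GROQ_API_KEY", "").strip()
    # Reject both a missing key and the placeholder value from the example above.
    if not key or key == "your_groq_api_key_here":
        raise RuntimeError("GROQ_API_KEY is not set; add it to .env or the environment")
    return key
```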

5️⃣ Run the App

streamlit run app.py

🧠 How It Works

1️⃣  User sends a query (or pastes a YouTube URL)
        │
2️⃣  YouTube URL detected? → Fetch transcript → Clear ChromaDB → Re-index
        │
3️⃣  Hybrid RAG retrieval → ChromaDB semantic + BM25 keyword → Top-4 chunks
        │
4️⃣  Supervisor reads query → Routes to Summarize / Highlight / Social / Q&A agent
        │        (question words detected? → qa_agent for direct concise answer)
        │
5️⃣  Agent calls MCP tools (Wikipedia, DuckDuckGo) for real-world enrichment
        │
6️⃣  Agent formats prompt: RAG context + tool results + user query → Groq LLM
        │
7️⃣  Response rendered in chat as markdown or styled highlight cards
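
Step 2 depends on spotting a YouTube URL inside free-form chat input. The sketch below shows one way to do that with the standard library only; it is not the app's actual parser. The function name `extract_video_id`, the set of URL shapes it accepts, and the 11-character ID pattern (a widely observed convention rather than a documented guarantee) are all assumptions.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters drawn from letters, digits, "_" and "-".
_VIDEO_ID = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(text: str) -> Optional[str]:
    """Return the video ID of the first YouTube URL found in text, or None."""
    for token in text.split():
        try:
            parsed = urlparse(token)
        except ValueError:
            continue  # malformed token, cannot be a usable URL
        host = (parsed.hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host == "youtu.be":
            candidate = parsed.path.lstrip("/").split("/")[0]
        elif host in ("youtube.com", "m.youtube.com"):
            if parsed.path == "/watch":
                candidate = parse_qs(parsed.query).get("v", [""])[0]
            elif parsed.path.startswith(("/shorts/", "/embed/")):
                candidate = parsed.path.split("/")[2]
            else:
                continue
        else:
            continue
        if _VIDEO_ID.match(candidate):
            return candidate
    return None
```

A non-`None` result would trigger the fetch, clear, and re-index path, while `None` lets the query go straight to retrieval.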

📷 Application Preview

<img width="951" height="446" alt="Screenshot 2026-05-09 170729" src="https://github.com/user-attachments/assets/978fbee0-d71f-4b39-9519-98e0de61ecab" />


📚 What I Learned

✔ LangGraph StateGraph: building real state machines with typed state and conditional edges
✔ Hybrid RAG Engineering: combining vector + keyword search with weighted merging
✔ MCP Tool Architecture: per-agent access control, tool binding, ToolMessage conversations
✔ Multi-Session State Management: Streamlit session_state design for complex apps
✔ Production RAG Deployment: PersistentClient ChromaDB, real-time re-indexing
✔ LLM Temperature Strategy: precise / balanced / creative modes for different task types
✔ YouTube API Integration: youtube-transcript-api v1.x, URL parsing, live ingestion
✔ Intelligent Task Routing: keyword-based intent detection to separate Q&A from generation tasks


🎯 Future Improvements

🔹 Speaker diarization: identify who said what in transcripts
🔹 Multi-turn Q&A: follow-up questions that remember previous answers in the session
🔹 Multi-document RAG: index multiple videos/files simultaneously
🔹 Audio file support: direct .mp3/.wav upload with Whisper transcription
🔹 Scheduled indexing: auto-index new episodes from RSS feeds
🔹 Shareable sessions: export and share full conversation threads


👨‍💻 Author

Hari Krishna T
AI Engineer | Multi-Agent Systems Builder | Gen AI Developer

🔗 GitHub: github.com/hari9618
🔗 LinkedIn: linkedin.com/in/hari-krishna-thota-06b850275


⭐ Support

If you like this project:

⭐ Star the repository
📢 Share with others
🍴 Fork and build on top of it


📢 Tags

AI Multi-Agent LangGraph LangChain Groq RAG ChromaDB BM25 Streamlit YouTube MCP Python Generative AI LLM Render
