A production-grade, multi-agent AI system for real-time video/audio stream intelligence.
- Streaming Transcript Pipeline: Handles live or uploaded media, segments audio into chunks, transcribes each chunk (OpenAI Whisper or mock mode), and orders events by sequence number.
- Multi-Agent Orchestrator: A parallel, asynchronous engine that fans out transcript chunks to 6 specialized AI agents:
  - Moment Detection: Finds highlights, goals, and breaking news.
  - Rolling Summarization: Maintains an evolving context-aware narrative.
  - Entity/Topic Extraction: Identifies players, teams, and key topics with sentiment analysis.
  - Consumer Formatting: Transforms raw AI insights into viewer-ready UX cards.
  - Q&A Assistant: Allows viewers to ask natural language questions about the stream using RAG (Retrieval Augmented Generation).
  - Safety Guardrails: Gathers all agent output and passes it through a quality-safety gate before publishing.
- WebSocket Gateway: Pushes structured intelligence to frontend clients in under 500 ms once agent processing completes.
- Dual Frontends:
  - Consumer Live View: A premium, real-time dashboard for viewers with scrolling transcripts and moment alerts.
  - Operator Console: A granular, low-level dashboard for monitoring confidence scores, agent latencies, and pipeline health with manual approval/rejection controls.
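
The fan-out described for the Multi-Agent Orchestrator can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the agent functions, the `guardrail` rule, and `process_chunk` are hypothetical names, and the LLM calls are replaced by stand-ins. It assumes each agent is an async callable that takes a transcript chunk and returns a dict.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical agent signature: transcript chunk in, result dict out.
Agent = Callable[[str], Awaitable[dict]]


async def moment_agent(chunk: str) -> dict:
    await asyncio.sleep(0)  # stand-in for an LLM call
    return {"agent": "moments", "highlight": "goal" in chunk.lower()}


async def summary_agent(chunk: str) -> dict:
    await asyncio.sleep(0)  # stand-in for an LLM call
    return {"agent": "summary", "text": chunk[:40]}


async def failing_agent(chunk: str) -> dict:
    raise RuntimeError("simulated upstream failure")


def guardrail(results: list[dict]) -> list[dict]:
    """Serial gate: drop any output flagged unsafe (placeholder rule)."""
    return [r for r in results if not r.get("unsafe", False)]


async def process_chunk(chunk: str, agents: list[Agent]) -> list[dict]:
    # Fan out: every agent runs concurrently on the same chunk.
    raw = await asyncio.gather(
        *(agent(chunk) for agent in agents), return_exceptions=True
    )
    # One failed agent should not sink the whole chunk; keep successes only.
    succeeded = [r for r in raw if not isinstance(r, BaseException)]
    # The guardrail runs once, after all agent outputs are collected.
    return guardrail(succeeded)


if __name__ == "__main__":
    agents = [moment_agent, summary_agent, failing_agent]
    print(asyncio.run(process_chunk("GOAL by the home team", agents)))
```

Passing `return_exceptions=True` is one possible design choice: it lets the remaining agents publish even when one raises. A real deployment might instead log or route the failure to a dead letter queue.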
LBIP is built with a modular microservices architecture focused on throughput and low latency:
┌────────────────────┐      ┌──────────────┐      ┌────────────────┐      ┌──────────────┐
│ Media Ingestion    │ ───▶ │ Kafka Topic  │ ───▶ │ AI Agent Swarm │ ───▶ │ Redis PubSub │
│ (FastAPI + FFmpeg) │      │ (raw.chunks) │      │ (Orchestrator) │      │ (ws.bridge)  │
└────────────────────┘      └──────────────┘      └────────────────┘      └──────────────┘
          │                                               │                      │
          ▼                                               ▼                      ▼
┌────────────────────┐                            ┌────────────────┐      ┌──────────────┐
│ S3 Asset Store     │                            │ PostgreSQL DB  │      │ Next.js App  │
│ (MinIO local)      │                            │ (Persistence)  │      │ (Viewer/Op)  │
└────────────────────┘                            └────────────────┘      └──────────────┘
- Backend: Python 3.12 (FastAPI), Async SQLAlchemy, Pydantic v2.
- AI/LLM: OpenAI (GPT-4o / Whisper) & Anthropic (Claude 3).
- Messaging: Apache Kafka (Event-driven async processing).
- Caching/Real-time: Redis (Pub/Sub + State/Context).
- Persistence: PostgreSQL (Structured session/event data).
- Storage: AWS S3 / MinIO (Media assets).
- Frontend: Next.js 14, React 18, TypeScript, TailwindCSS, Lucide Icons.
- Containerization: Docker & Docker Compose.
- Low Latency: Agents run in parallel using `asyncio.gather`; the guardrail agent is the only serial gate. Average pipeline latency is roughly 1.2-2 s, including LLM calls.
- Fault Tolerance: Kafka consumers implement manual offset commits and a dedicated Dead Letter Queue (DLQ) for failed transcript chunks.
- Idempotency: Every transcript chunk and agent output is keyed by a SHA-256 hash of its session, sequence, and content, preventing duplicates across system restarts or retries.
- Concurrency: The orchestrator uses a Redis-backed session context manager to isolate state for multiple simultaneous live streams.
- Cost-Awareness: Model routing by task (GPT-4o-mini for fast detection and safety filtering, GPT-4o for complex summarization).
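
As an illustration of the idempotency scheme above, here is a minimal sketch of deriving a SHA-256 key from session, sequence, and content. The function name `idempotency_key`, the length-prefixed encoding, and the in-memory `SeenKeys` class are assumptions made for this example; the project's real key layout and deduplication store are not specified here.

```python
import hashlib


def idempotency_key(session_id: str, sequence: int, content: str) -> str:
    """Derive a stable SHA-256 key from session, sequence, and content."""
    digest = hashlib.sha256()
    for part in (session_id, str(sequence), content):
        data = part.encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


class SeenKeys:
    """In-memory stand-in for a shared store (hypothetical; a real system
    would need something durable to survive restarts)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def first_time(self, key: str) -> bool:
        # Returns True only the first time a key is observed.
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


if __name__ == "__main__":
    seen = SeenKeys()
    key = idempotency_key("session-1", 42, "And it's a goal!")
    print(seen.first_time(key))  # first delivery is processed
    print(seen.first_time(key))  # a retry of the same chunk is skipped
```

Because the key depends only on the chunk's own fields, a retried or replayed chunk maps to the same key and can be skipped, while any change to session, sequence, or content yields a different key.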
- Environment Config:
    cp .env.example .env  # Add your OPENAI_API_KEY for real transcription
- Launch Infrastructure:
    docker-compose up -d --build
- Install & Run Frontend:
    cd frontend && npm install && npm run dev
- Seed Sample Data (Optional):
    # Run the seed script to see a populated dashboard
    python scripts/seed_data.py