anantham/live_conversational_threads
Live Conversational Threads

A multi-scale conversation analysis platform for Google Meet transcripts with AI-powered insights, cognitive bias detection, and advanced visualization.

Live Conversational Threads transforms conversation transcripts into interactive, multi-scale graph visualizations that reveal both temporal flow and thematic relationships. The application supports Google Meet transcripts with speaker diarization, allowing users to explore conversations at five discrete zoom levels, from individual sentences to narrative arcs, while simultaneously viewing both timeline and contextual network views.

Built with FastAPI (Python backend) and React + Vite (frontend), the platform leverages LLM-powered analysis to detect Simulacra levels, identify cognitive biases, extract implicit frames, and generate comprehensive speaker analytics.


Key Features

Core Capabilities

🎯 Google Meet Transcript Import

  • Parse PDF/TXT transcripts with speaker diarization
  • Automatic speaker detection and turn segmentation
  • Timestamp extraction and duration calculation
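As a rough sketch of what this parsing involves (the actual parser lives under lct_python_backend/, and the real transcript layout may differ), a minimal diarized-line parser could look like:

```python
import re
from dataclasses import dataclass

# Assumed line layout: "HH:MM:SS Speaker Name: utterance".
# Illustrative only -- the backend's parser handles PDFs and edge cases too.
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\s+([^:]+):\s*(.*)$")


@dataclass
class Utterance:
    speaker: str
    start_seconds: int  # offset from meeting start
    text: str


def parse_transcript(raw: str) -> list[Utterance]:
    """Extract speaker-attributed utterances with start offsets in seconds."""
    utterances = []
    for line in raw.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip headers, blank lines, and non-speech lines
        h, m, s, speaker, text = match.groups()
        start = int(h) * 3600 + int(m) * 60 + int(s)
        utterances.append(Utterance(speaker.strip(), start, text))
    return utterances
```

Duration of a turn can then be derived as the gap to the next utterance's start offset.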

📊 Dual-View Visualization

  • Timeline View (15%): Linear temporal progression of conversation
  • Contextual Network View (85%): Thematic clustering and idea relationships
  • Synchronized navigation and selection across views
  • Resizable split with user-customizable proportions

🔍 5-Level Zoom System

  • Level 1 (Sentence): Individual utterances and speaker turns
  • Level 2 (Turn): Aggregated speaker contributions
  • Level 3 (Topic): Semantic topic segments
  • Level 4 (Theme): Major thematic clusters
  • Level 5 (Arc): Narrative arcs and conversation structure
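The five levels form an ordered scale, which either side of the stack could model with a small enum. The level names follow the list above; the step helpers are illustrative, not the app's actual API:

```python
from enum import IntEnum


class ZoomLevel(IntEnum):
    """The five discrete zoom levels, finest first."""
    SENTENCE = 1  # individual utterances and speaker turns
    TURN = 2      # aggregated speaker contributions
    TOPIC = 3     # semantic topic segments
    THEME = 4     # major thematic clusters
    ARC = 5       # narrative arcs and conversation structure


def coarser(level: ZoomLevel) -> ZoomLevel:
    """Zoom out one step, clamping at ARC."""
    return ZoomLevel(min(level + 1, ZoomLevel.ARC))


def finer(level: ZoomLevel) -> ZoomLevel:
    """Zoom in one step, clamping at SENTENCE."""
    return ZoomLevel(max(level - 1, ZoomLevel.SENTENCE))
```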

🎭 Advanced AI Analysis

  • Simulacra Level Detection: Classify utterances by communication intent (Levels 1-4)
  • Cognitive Bias Detection: Identify 25+ types of biases and logical fallacies
  • Implicit Frame Analysis: Uncover hidden worldviews and normative assumptions
  • Speaker Analytics: Role detection, time distribution, topic dominance

⚙️ Customizable AI Prompts

  • Externalized prompts in JSON configuration
  • User-editable via Settings UI
  • A/B testing support for prompt variations
  • Version history and rollback capability
  • Performance metrics per prompt (cost, latency, accuracy)

📈 Cost Tracking & Instrumentation

  • Real-time LLM API cost tracking
  • Latency monitoring (p50, p95, p99)
  • Token usage analytics by feature
  • Cost per conversation dashboards
  • Automated alerts for threshold breaches
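For a sense of what p50/p95/p99 means here: the percentiles can be computed from raw latency samples with a simple nearest-rank estimator (the app's own instrumentation may use a different method or a streaming sketch):

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (p in [0, 100]) over latency samples in ms."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest rank: ceil(p/100 * n), used as a 1-based index into the sorted list.
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_summary(samples: list[float]) -> dict:
    """Summarize the three percentiles tracked above."""
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}
```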

✏️ Edit Mode & Training Data Export

  • Manual correction of AI-generated nodes/edges
  • All edits logged for future model training
  • Export formats: JSONL (fine-tuning), CSV (analysis), Markdown (review)
  • Feedback annotation for continuous improvement
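A minimal sketch of the JSONL export, assuming a hypothetical prompt/completion record shape (the real export schema is defined by the backend's export code):

```python
import json


def edits_to_jsonl(edits: list[dict]) -> str:
    """Serialize logged edits as JSONL, one training example per line.

    The record keys here are assumptions for illustration; fine-tuning
    pipelines typically expect one JSON object per line in this style.
    """
    lines = []
    for edit in edits:
        record = {
            "prompt": edit["original"],       # AI-generated node/edge text
            "completion": edit["corrected"],  # the human correction
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)
```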

Demo

Note: The video shows an earlier version of the application; the current version adds the dual-view architecture, zoom levels, and advanced analysis features.


Architecture Overview

High-Level Architecture

┌──────────────────────────────────────────────────────────────┐
│                    React Frontend (Vite)                     │
│  ┌────────────────┐  ┌─────────────────────────────────┐     │
│  │ Timeline View  │  │ Contextual Network View         │     │
│  │ (15% height)   │  │ (85% height)                    │     │
│  │                │  │                                 │     │
│  │ ●──●──●──●──●  │  │      ┌──┐      ┌──┐             │     │
│  │                │  │      │  │──────│  │             │     │
│  └────────────────┘  │      └──┘      └──┘             │     │
│                      │         ↘      ↗                │     │
│                      │          ┌──┐                   │     │
│                      │          │  │                   │     │
│                      │          └──┘                   │     │
│                      └─────────────────────────────────┘     │
└──────────────────────────────┬───────────────────────────────┘
                               │ REST API
┌──────────────────────────────┴───────────────────────────────┐
│                       FastAPI Backend                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐    │
│  │  Parsers     │  │  AI Services │  │  Instrumentation │    │
│  │ - Google Meet│  │ - Clustering │  │  - Cost Tracking │    │
│  │              │  │ - Bias Det.  │  │  - Metrics       │    │
│  └──────────────┘  └──────────────┘  └──────────────────┘    │
└──────────────────────────────┬───────────────────────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
    ┌────▼─────┐        ┌──────▼──────┐      ┌───────▼──────┐
    │PostgreSQL│        │ OpenAI API  │      │ GCS Storage  │
    │ Database │        │ Anthropic   │      │ (Transcripts)│
    └──────────┘        └─────────────┘      └──────────────┘

Data Flow

  1. Import: User uploads Google Meet transcript (PDF/TXT)
  2. Parsing: Backend extracts speakers, utterances, timestamps
  3. AI Analysis: LLM generates nodes, edges, clusters (via prompts.json)
  4. Storage: Conversation data saved to PostgreSQL, files to GCS
  5. Visualization: Frontend fetches graph data, renders dual-view
  6. Interaction: User explores zoom levels, selects nodes, views analytics
  7. Editing: User corrections logged to edits_log table
  8. Export: Training data exported in JSONL format for fine-tuning
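The flow above can be sketched as a linear pipeline with stubbed stages. This is purely illustrative; the real stages live in the parser, AI service, and persistence modules:

```python
# Stubbed end-to-end pipeline: parse -> analyze -> store.
# Stage bodies are placeholders standing in for the backend's real modules.
def run_pipeline(raw_transcript: str) -> dict:
    trace = []  # records stage order, mirroring the numbered steps above

    def stage(name, fn, value):
        trace.append(name)
        return fn(value)

    parsed = stage("parse", lambda t: t.splitlines(), raw_transcript)
    graph = stage(
        "analyze",
        lambda utts: {"nodes": len(utts), "edges": max(len(utts) - 1, 0)},
        parsed,
    )
    stage("store", lambda g: g, graph)  # persistence stub
    return {"trace": trace, "graph": graph}
```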

Project Structure

live_conversational_threads/
β”œβ”€β”€ lct_python_backend/          # FastAPI backend
β”‚   β”œβ”€β”€ backend.py               # App shell + router mounting
β”‚   β”œβ”€β”€ *_api.py                 # Router modules (import, stt, llm, graph, etc.)
β”‚   β”œβ”€β”€ services/                # Processing, LLM/STT, persistence helpers
β”‚   β”œβ”€β”€ alembic/                 # Database migrations
β”‚   β”œβ”€β”€ tests/                   # Unit + integration coverage
β”‚   └── prompts.json             # Prompt configuration
β”œβ”€β”€ lct_app/                     # React frontend (JSX)
β”‚   β”œβ”€β”€ src/pages/               # Route-level screens
β”‚   β”œβ”€β”€ src/components/          # Graph/audio/settings UI
β”‚   β”œβ”€β”€ src/services/            # API clients
β”‚   β”œβ”€β”€ package.json
β”‚   └── vite.config.js
β”œβ”€β”€ docs/                        # ADRs, plans, runbooks
β”œβ”€β”€ setup-once.command           # First-time setup
β”œβ”€β”€ start.command                # Daily startup
β”œβ”€β”€ AGENTS.md                    # Operating protocol
└── README.md

Prerequisites

  • Python 3.9+ (with venv or Conda)
  • Node.js 18+ and npm 9+
  • PostgreSQL 15+ (or Docker via docker compose up -d)
  • Optional API keys (depend on provider mode):
    • Local mode: none required
    • Online LLMs: GEMINI_KEY / GEMINI_API_KEY / GOOGLEAI_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY, OPENROUTER_API_KEY, PERPLEXITY_API_KEY
    • Cloud persistence (optional): GCS_BUCKET_NAME, GOOGLE_APPLICATION_CREDENTIALS

Local Setup (Recommended)

Use the streamlined scripts from repo root:

1. One-time setup

./setup-once.command

This installs dependencies, initializes a local PostgreSQL instance (data directory .postgres_data, port 5433), prepares lct_python_backend/.env, and runs migrations.

2. Daily startup

./start.command

This performs a clean start, runs migrations, then starts backend + frontend with prefixed terminal logs.

start.command now attempts shared Parakeet STT autostart by default (STT_AUTOSTART=1) and uses backend-owned STT routing. It also checks local LLM reachability at ${LOCAL_LLM_BASE_URL:-http://100.81.65.74:1234}/v1/models during startup.

To disable STT autostart for a run:

STT_AUTOSTART=0 ./start.command

By default this reuses the sibling Parakeet repo/container and shared Docker volume parakeet-models.

3. Full setup guide

See docs/LOCAL_SETUP.md for detailed setup behavior and troubleshooting.


Running the Application

1. Start local stack

./start.command

2. Verify services

3. Import a Google Meet Transcript

  1. Navigate to http://localhost:5173
  2. Click "Import Transcript" button
  3. Upload a Google Meet transcript (PDF or TXT format)
  4. Wait for AI-powered graph generation (~30-60 seconds)
  5. Explore the conversation using dual-view interface!

Environment Variables

Backend Core Variables

Variable            Description                   Example
DATABASE_URL        PostgreSQL connection string  postgresql://lct_user:lct_password@localhost:5433/lct_dev
DEFAULT_LLM_MODE    Local/online default mode     local
LOCAL_LLM_BASE_URL  Local LLM endpoint            http://100.81.65.74:1234
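A matching lct_python_backend/.env for local-first development might look like the following; the values mirror the documented defaults, so adjust them for your machine:

```shell
# Example .env for local-first development (values from the table above).
DATABASE_URL=postgresql://lct_user:lct_password@localhost:5433/lct_dev
DEFAULT_LLM_MODE=local
LOCAL_LLM_BASE_URL=http://100.81.65.74:1234
```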

Backend Optional Variables

Variable                                        Description                         Default
OPENAI_API_KEY                                  OpenAI key (online mode)            unset
ANTHROPIC_API_KEY                               Anthropic key (online mode)         unset
GEMINI_KEY / GEMINI_API_KEY / GOOGLEAI_API_KEY  Gemini key aliases (online mode)    unset
OPENROUTER_API_KEY                              OpenRouter key (online mode)        unset
PERPLEXITY_API_KEY                              Perplexity key (fact-checking)      unset
GCS_BUCKET_NAME                                 Cloud bucket for conversation JSON  unset
GCS_FOLDER                                      Cloud folder path                   unset
GOOGLE_APPLICATION_CREDENTIALS                  ADC/service account path            unset
LOG_LEVEL                                       Logging level                       INFO
TRACE_API_CALLS                                 Backend outbound call tracing       true
API_LOG_PREVIEW_CHARS                           Trace preview truncation length     280

Frontend Variables

Variable              Description                                   Default
VITE_BACKEND_API_URL  Backend API base URL (service clients)        http://localhost:8000
VITE_AUTH_TOKEN       Optional bearer token for protected backends  unset
VITE_API_TRACE        Frontend request/response console tracing     dev-mode on

Database Setup

1. Recommended path (scripted)

Use ./setup-once.command for first-time setup and ./start.command for daily runs.
These scripts initialize local Postgres (default localhost:5433) and run Alembic migrations automatically.

2. Manual migration path (advanced)

From lct_python_backend/:

alembic upgrade head

Default local connection in this repo is:

postgresql://lct_user:lct_password@localhost:5433/lct_dev

3. Database Schema

Schema is migration-driven (alembic upgrade head) and evolves over time.

Current core entities include:

  • conversations, utterances, nodes, relationships
  • transcript_events, app_settings
  • bookmarks, fact_checks, api_calls_log

For field-level details, use:

  • ORM models: lct_python_backend/models.py
  • Migrations: lct_python_backend/alembic/versions/

API Documentation

Once the backend server is running:

Key Endpoints

GET    /api/import/health
POST   /api/import/google-meet
POST   /api/import/from-text
GET    /conversations/{conversation_id}
POST   /save_json/
GET    /api/settings/stt
PUT    /api/settings/stt
GET    /api/settings/llm
PUT    /api/settings/llm
GET    /api/settings/llm/models
GET    /api/graph/health
POST   /api/graph/generate
WS     /ws/transcripts
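A thin client sketch for composing these endpoint URLs. The class name and default base URL are assumptions; real calls would go through an HTTP library and attach the optional bearer token described under Frontend Variables:

```python
from urllib.parse import urljoin


class LctClient:
    """URL composition for the endpoints listed above (illustrative only)."""

    def __init__(self, base_url: str = "http://localhost:8000"):
        # Normalize so urljoin treats base_url as a directory root.
        self.base_url = base_url.rstrip("/") + "/"

    def url(self, path: str) -> str:
        """Join an endpoint path (with or without leading slash) to the base."""
        return urljoin(self.base_url, path.lstrip("/"))
```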

Documentation

Core Documentation

Document                     Description
ROADMAP.md                   14-week implementation plan with instrumentation, metrics, storage, and testing strategies
TIER_1_DECISIONS.md          Foundational architectural decisions (Google Meet format, zoom levels, dual-view, prompts)
TIER_2_FEATURES.md           Detailed specifications for 6 major features (Node Detail Panel, Speaker Analytics, Prompts Config, etc.)
FEATURE_SIMULACRA_LEVELS.md  Simulacra level detection, cognitive bias analysis, implicit frames, rhetorical profiling
DATA_MODEL_V2.md             Complete database schema with all tables, indexes, and relationships
PRODUCT_VISION.md            High-level product strategy and user personas
FEATURE_ROADMAP.md           ROI analysis and feature prioritization

Architecture Decision Records (ADRs)

ADR      Title                                                               Status
ADR-001  Google Meet Transcript Support                                      Proposed
ADR-002  Hierarchical Coarse-Graining for Multi-Scale Visualization          Proposed
ADR-003  Observability, Metrics, and Storage Baseline                        Proposed
ADR-004  Dual-View Architecture (Timeline + Contextual Network)              Approved
ADR-005  Externalized Prompts Configuration System                           Approved
ADR-006  Testing Strategy & Quality Assurance                                Proposed
ADR-007  System Invariants & Data Integrity                                  Proposed
ADR-008  Local STT & Append-Only Transcript Events                           Approved
ADR-009  Local-First LLM Defaults                                            Proposed
ADR-010  Minimal Conversation Schema for Pause/Resume and Thread Legibility  Proposed
ADR-011  Minimal Live Conversation UI Redesign                               Draft
ADR-012  Real-Time Speaker Diarization Sidecar for Local Speech-to-Graph     Proposed

See docs/adr/INDEX.md for the complete ADR index.


Development Roadmap

Phase 1: Foundation & Infrastructure (Weeks 1-4)

  • ✅ Database schema migration (DATA_MODEL_V2)
  • ✅ Instrumentation & cost tracking
  • 🚧 Google Meet transcript parser
  • 🚧 Initial graph generation with prompt engineering

Phase 2: Core Features (Weeks 5-7)

  • 📅 Dual-view architecture (Timeline + Contextual)
  • 📅 5-level zoom system
  • 📅 Node detail panel with editing

Phase 3: Analysis Features (Weeks 8-10)

  • 📅 Speaker analytics view
  • 📅 Prompts configuration UI
  • 📅 Edit history & training data export

Phase 4: Advanced Features (Weeks 11-14)

  • 📅 Simulacra level detection
  • 📅 Cognitive bias detection (25 types)
  • 📅 Implicit frame analysis
  • 📅 Final integration & polish

Legend:

  • ✅ Completed
  • 🚧 In Progress
  • 📅 Planned

See docs/ROADMAP.md for detailed sprint-by-sprint breakdown.


Troubleshooting

Backend Issues

Database connection errors:

# Check PostgreSQL is running
pg_ctl status

# Test connection
psql -h localhost -p 5433 -U lct_user -d lct_dev

LLM API errors:

# Verify API keys are set
echo $OPENAI_API_KEY
echo $ANTHROPIC_API_KEY

# Check API key validity
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"

Import errors:

# Reinstall dependencies
pip install --force-reinstall -r requirements.txt

# Check Python version (must be 3.9+)
python --version

Frontend Issues

Port conflicts:

# Kill process on port 5173
lsof -ti:5173 | xargs kill -9

# Or use different port
npm run dev -- --port 3000

CORS errors:

  • Backend is configured to allow http://localhost:5173
  • If using different port, update CORS settings in backend.py
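A typical FastAPI CORS setup for this scenario looks like the following sketch; the actual configuration lives in backend.py and may list different origins or options:

```python
# Illustrative CORS configuration sketch (not the repo's exact code).
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    # Add your alternate dev port here, e.g. http://localhost:3000.
    allow_origins=["http://localhost:5173"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```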

Build errors:

# Clear cache and reinstall
rm -rf node_modules package-lock.json
npm install

Performance Issues

Slow graph generation:

  • Check api_calls_log table for high latency
  • Consider using GPT-3.5-turbo for cheaper/faster clustering
  • Reduce max_tokens in prompts.json

High LLM costs:

  • Check /api/cost-tracking/stats endpoint
  • Review prompts.json for token-heavy templates
  • Enable prompt caching (coming in Week 9)
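For a back-of-envelope sense of where costs come from: per-call cost is token count times a per-model rate. The rate table below is a placeholder, not real pricing; the app's authoritative numbers live in api_calls_log and the cost-tracking endpoint:

```python
# Hypothetical (input, output) USD rates per million tokens -- NOT real pricing.
RATES_PER_MTOK = {
    "example-small": (0.50, 1.50),
    "example-large": (3.00, 15.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one LLM call under the assumed rate table."""
    in_rate, out_rate = RATES_PER_MTOK[model]
    return round((input_tokens * in_rate + output_tokens * out_rate) / 1_000_000, 6)
```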

Contributing

We welcome contributions! Please follow these guidelines:

Pull Request Process

  1. Create a feature branch from main:

    git checkout -b feature/your-feature-name
  2. Follow commit message format (see .claude/CLAUDE.md):

    [TYPE]: Brief summary (50 chars max)
    
    MOTIVATION:
    - Why this change was needed
    
    APPROACH:
    - How the solution works
    
    CHANGES:
    - file1.py: Specific changes made
    
    IMPACT:
    - What functionality is added/changed
    
    TESTING:
    - How to verify the changes work
    
  3. Write tests:

    • Unit tests: pytest tests/unit/test_your_feature.py
    • Integration tests: pytest tests/integration/
    • Maintain 85%+ coverage
  4. Run linters:

    # Python
    black .
    flake8 .
    mypy .
    
    # Frontend
    npm run lint
  5. Create Pull Request to main:

    • Fill out PR template
    • Link related issues
    • Request review from maintainers

Development Guidelines

  • No direct commits to main – all changes via PR
  • Test coverage: 85%+ for new code
  • Documentation: Update relevant docs/ files
  • ADRs: Create ADR for significant architectural decisions
  • Prompts: Externalize new LLM prompts to prompts.json

Code Style

Python:

  • Black formatter (line length 100)
  • Type hints for all functions
  • Docstrings (Google style)

TypeScript:

  • Prettier formatter
  • ESLint rules enforced
  • Prefer functional components with hooks

License

This project is licensed under the GNU General Public License v3.0 (GPLv3).

You are free to use, modify, and distribute this software under the terms of the GPLv3, which ensures that derivative works remain open source.

Key Points:

  • ✅ Use freely for personal, academic, or open-source projects
  • ✅ Modify and distribute under GPLv3 terms
  • ❌ Cannot use in proprietary/closed-source software without commercial license

Commercial Use

If you would like to use this software in a closed-source or commercial product, or if you need a commercial license without the GPL's copyleft requirements, please contact:

Email: adityaadiga6@gmail.com
GitHub: https://github.com/aditya-adiga


Contact & Support

Maintainer: Aditya Adiga
Email: adityaadiga6@gmail.com
GitHub: @aditya-adiga

Issues: GitHub Issues
Discussions: GitHub Discussions


Acknowledgments

  • Zvi Mowshowitz – Simulacra Levels framework
  • LessWrong Community – Cognitive bias taxonomies
  • OpenAI & Anthropic – LLM APIs powering analysis
  • React Flow – Graph visualization library
  • FastAPI – Python web framework

Last Updated: 2026-02-13
Version: 2.1.0 (Local STT, local-first LLM, security hardening)

About

My fork of Aditya Adiga's LCT app
