
MCP Chat

A production-grade AI chatbot that discovers and uses external tools at runtime via the Model Context Protocol. Built for a final evaluation where the MCP server, its tools, and their schemas are revealed only at runtime.


Architecture

flowchart LR
    Browser["Browser"]
    Next["Next.js 15\nVercel or static host"]
    Backend["FastAPI\nmanaged container / local"]
    MCP["MCP Server\nMeridian or any URL"]
    DB["SQLite\n+ JSON RAG"]
    OR["OpenRouter\nLLM completions"]
    OAI["OpenAI\nembeddings only"]
    LS["LangSmith\ntracing"]

    Browser -->|"Clerk session"| Next
    Next -->|"Bearer JWT"| Backend
    Backend -->|"Streamable HTTP"| MCP
    Backend --> DB
    Backend --> OR
    Backend --> OAI
    Backend -.->|"env-driven traces"| LS

Swap-in point: set MCP_SERVER_URL on the backend host. Tool discovery, schema translation, LLM injection, and UI rendering follow automatically. No code changes.
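At connect time, each discovered MCP tool (name, description, JSON-Schema inputSchema) is translated into OpenAI's function-calling shape before being injected into LLM calls. A minimal sketch of that adapter step — the repo's mcp/adapters.py is the real implementation; the function name and fallback schema here are illustrative:

```python
def mcp_tool_to_openai(tool: dict) -> dict:
    """Translate one discovered MCP tool into the OpenAI tools format."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP's inputSchema is already JSON Schema, which is exactly
            # what OpenAI's `parameters` field expects.
            "parameters": tool.get(
                "inputSchema", {"type": "object", "properties": {}}
            ),
        },
    }
```

Because the translation is purely structural, any tool the evaluation server exposes flows through unchanged — which is what makes the env-var swap sufficient.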


Features

  • MCP host connects over Streamable HTTP at startup; tool list cached and injected into every LLM call
  • Token-level SSE streaming with live tool call and result cards in the UI
  • Clerk authentication; SQLite-persisted chats and messages per user
  • Optional RAG retrieval (embeddings as JSON; cosine in-process) with text-embedding-3-small
  • structlog + Cloud Logging JSON renderer; LangSmith traces via env vars
  • GitHub Actions CI (ruff, pytest, tsc, build)
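The optional RAG path stores embeddings as JSON arrays and ranks them with in-process cosine similarity. A minimal sketch of that retrieval step, assuming rows of (chunk_text, embedding_json) as they might come out of SQLite — function names and the row shape are illustrative:

```python
import json
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: list[float], rows: list[tuple[str, str]], k: int = 3):
    """Score every stored chunk against the query and return the best k.

    rows: (chunk_text, embedding-as-JSON-string) pairs, i.e. the vectors
    are decoded from JSON at query time rather than via a vector extension.
    """
    scored = [(cosine(query_vec, json.loads(emb)), text) for text, emb in rows]
    return sorted(scored, reverse=True)[:k]
```

A linear scan like this is plenty for evaluation-sized corpora and keeps the deploy free of pgvector or an external vector store.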

Repository layout

final-evaluation-mcp/
  backend/                        FastAPI backend (Python 3.12, uv)
    src/final_eval_mcp/
      config.py                   pydantic-settings + early dotenv
      logging.py                  structlog + Cloud Logging renderer
      tracing.py                  LangSmith env wiring
      main.py                     FastAPI factory + lifespan
      api/                        REST routes: chats, messages, tools, health
      auth/clerk.py               JWKS cache + JWT verification
      chat/                       LLM client, prompts, streaming orchestrator
      mcp/                        MCPHost (lifespan), adapters, tool call loop
      rag/                        Embeddings, ingest CLI, JSON-vector retrieval
      db/                         SQLAlchemy 2 async, models, repositories
    alembic/                      Async migrations (SQLite / portable JSON)
    tests/                        pytest suite with mocked LLM + MCP
  frontend/                       Next.js 15 App Router (TypeScript, Tailwind, shadcn)
    src/
      app/(auth)/                 Clerk sign-in / sign-up pages
      app/(app)/chat/             Chat shell + per-chat route
      app/(app)/tools/            MCP tool catalog with try-it form
      components/chat/            ChatShell, MessageList, Composer, ToolCallCard
      lib/                        Typed API client, SSE consumer, shared types
  docker/                         Multi-stage Dockerfiles (backend, frontend)
  docker-compose.yml              Local dev: backend + SQLite volume (MCP remote)
  scripts/
    dev.sh                        Start local stack
    migrate.sh                    Alembic from repo root (`upgrade head`, etc.)
    bootstrap_db.sh               mkdir .data + run Alembic
    seed_db.sh                    Run Alembic migrations (host / CI)
    tail_logs.sh                  Stream Cloud Logging (optional; needs gcloud + Cloud Run)
  .github/workflows/
    ci.yml                        Lint, type-check, test, build

Quickstart (local development)

Without Docker: see docs/RUN_LOCAL_NO_DOCKER.md (SQLite + uvicorn + pnpm dev, Meridian MCP remote).

Prerequisites

  • Python 3.12+, uv
  • Node 20+, pnpm
  • Docker with Buildx
  • A Clerk account (free tier)

1. Clone and configure

git clone https://github.com/YOUR_ORG/final-evaluation-mcp
cd final-evaluation-mcp
cp .env.example .env
# Fill in OPENROUTER_API_KEY, OPENAI_API_KEY, CLERK_* vars in .env
# Leave DATABASE_URL unset for repo .data/app.db (SQLite), or set sqlite+aiosqlite://...

2. Start infrastructure (Docker backend + SQLite)

./scripts/dev.sh
# docker compose up (backend with SQLite volume), waits for /health, then pnpm dev in frontend/

Or step by step:

docker compose up -d
./scripts/seed_db.sh          # Alembic migrations
cd frontend && pnpm install && pnpm dev

Services:

  • Backend API: http://localhost:8000 (FastAPI; health at /health)
  • Frontend: http://localhost:3000 (Next.js dev server default)

3. Verify

curl http://localhost:8000/health
# {"status":"ok","timestamp":"...","version":"0.1.0"}

Environment variables

| Variable | Required | Description |
| --- | --- | --- |
| OPENROUTER_API_KEY | yes | OpenRouter key for LLM completions |
| OPENROUTER_MODEL | no | Model slug (default: openai/gpt-4o-mini) |
| OPENAI_API_KEY | yes | OpenAI key for text-embedding-3-small |
| DATABASE_URL | no | Default: SQLite file <repo>/.data/app.db. Optional: sqlite+aiosqlite://... or postgresql+asyncpg://... |
| NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY | yes | Clerk publishable key (frontend) |
| CLERK_SECRET_KEY | yes | Clerk secret key (backend, not exposed to browser) |
| CLERK_JWKS_URL | yes | Clerk JWKS endpoint for JWT verification |
| CLERK_ISSUER | no | Clerk issuer URL (validates the iss claim) |
| MCP_SERVER_URL | yes | MCP base URL (Streamable HTTP path, e.g. .../mcp) |
| MCP_AUTH_TOKEN | no | Bearer token if the MCP server requires auth |
| LANGSMITH_TRACING | no | true to enable LangSmith traces |
| LANGSMITH_API_KEY | no | LangSmith API key |
| LANGSMITH_PROJECT | no | LangSmith project name |
| LOG_LEVEL | no | DEBUG, INFO, WARNING, or ERROR (default: INFO) |
| ENV | no | development or production |
| CORS_ORIGINS | no | Comma-separated allowed origins |
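The backend reads these through pydantic-settings (config.py). A stdlib-only sketch of the same wiring — the defaults shown and the exact DATABASE_URL form are illustrative, not copied from the repo:

```python
import os


class Settings:
    """Environment-driven configuration (stdlib stand-in for pydantic-settings)."""

    def __init__(self) -> None:
        # Required keys raise KeyError early if missing.
        self.mcp_server_url = os.environ["MCP_SERVER_URL"]
        # Optional keys fall back to documented defaults.
        self.openrouter_model = os.getenv("OPENROUTER_MODEL", "openai/gpt-4o-mini")
        self.database_url = os.getenv(
            "DATABASE_URL", "sqlite+aiosqlite:///.data/app.db"  # illustrative default
        )
        self.mcp_auth_token = os.getenv("MCP_AUTH_TOKEN")  # optional bearer token
        self.log_level = os.getenv("LOG_LEVEL", "INFO")
```

Failing fast on the required variables at startup beats discovering a missing key mid-request.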

Deployment

Host the frontend (e.g. Vercel) and backend (container or VM with a writable SQLite path). See docs/DEPLOY_VERCEL.md for layout and env vars.

To swap the MCP server at evaluation time, change MCP_SERVER_URL (and MCP_AUTH_TOKEN if required) on the backend host and restart the process so tools are re-discovered.


Design decisions

| Decision | Chosen | Alternatives considered | Reason |
| --- | --- | --- | --- |
| UI framework | Next.js 15 App Router | Gradio, Streamlit | Full design control; native Clerk integration; deploys to Vercel or any static/Node host |
| Auth | Clerk + JWKS verification | Auth0, custom JWT | Lowest integration surface; no session storage; JWKS cached locally |
| Database | SQLite (default) + optional Postgres | Cloud-only serverless DB | Zero-ops local file; portable JSON embeddings |
| Vector / RAG | In-process cosine on JSON vectors | pgvector, Pinecone | No separate vector extension; simple deploy |
| LLM provider | OpenRouter via AsyncOpenAI | Direct Anthropic/OpenAI | Model-agnostic; swap via OPENROUTER_MODEL; same SDK interface |
| MCP transport | Streamable HTTP (remote) | stdio, SSE | Remote MCP matches evaluation servers; swap-in via MCP_SERVER_URL |
| Orchestration | Hand-written async generator | LangGraph, LangChain agents | Single-agent tool-use loop is ~120 lines; explicit event protocol is easier to evaluate |
| Logging | structlog + stdlib bridge | stdlib, loguru | Structured JSON logs; stdlib bridge captures LangChain logs |
| Tracing | LangSmith via env vars | OpenTelemetry | Zero code changes; LANGCHAIN_TRACING_V2 auto-instruments all LLM calls |
| IaC | (removed from repo) | Terraform, Pulumi | Vercel + SQLite-backed API host; add your own IaC if needed |
| CI | GitHub Actions (lint + test + build) | Manual | No deploy workflow in-tree |
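The hand-written orchestrator is essentially a bounded loop: call the LLM, execute any requested tool calls over MCP, feed results back, and emit typed events for the UI. A compressed sketch using stand-in llm and mcp callables — the real loop streams token deltas rather than one final chunk, and the event names here are illustrative:

```python
import json


async def run_turn(llm, mcp, messages, tools, max_rounds=5):
    """Single-agent tool-use loop, expressed as an async generator of UI events.

    llm(messages, tools) -> assistant message dict; mcp(name, args) -> tool result.
    """
    for _ in range(max_rounds):
        reply = await llm(messages, tools)          # one completion round
        if not reply.get("tool_calls"):
            # Final answer: a real implementation yields per-token deltas here.
            yield {"type": "token", "text": reply["content"]}
            return
        messages.append(reply)                      # keep the tool-call turn in context
        for call in reply["tool_calls"]:
            yield {"type": "tool_call", "name": call["name"]}
            result = await mcp(call["name"], json.loads(call["arguments"]))
            yield {"type": "tool_result", "name": call["name"], "result": result}
            messages.append({"role": "tool", "name": call["name"],
                             "content": json.dumps(result)})
    yield {"type": "error", "text": "tool-round limit reached"}
```

Because every step is an explicit yielded event, the SSE layer and the UI cards can consume the same protocol, and tests can assert on the exact event sequence.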

Tech stack

| Layer | Technology | Version |
| --- | --- | --- |
| Frontend | Next.js | 15 |
| UI components | Tailwind CSS + shadcn/ui | 3.4 / latest |
| Auth (browser) | @clerk/nextjs | 6 |
| Backend | FastAPI + uvicorn | 0.115 / 0.30 |
| Python version | CPython | 3.12 |
| Package manager | uv | 0.5+ |
| Database | SQLite (+ aiosqlite) | 3.12+ |
| ORM | SQLAlchemy (async) | 2.0 |
| Migrations | Alembic | 1.13 |
| LLM provider | OpenRouter via openai SDK | 1.40 |
| Embeddings | OpenAI text-embedding-3-small | -- |
| MCP | mcp Python SDK | 1.3 |
| Auth (backend) | python-jose (JWKS verification) | 3.3 |
| Logging | structlog | 24.4 |
| Tracing | LangSmith | 0.1.100 |

Testing

# Backend
cd backend
uv run pytest tests/ -v

# Frontend
cd frontend
pnpm test

Backend test coverage:

  • test_auth.py — Clerk JWKS verification (malformed tokens, unknown kid)
  • test_mcp_host.py — adapter conversion, disconnected-host guards
  • test_orchestrator.py — streaming event protocol, sources emission
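The disconnected-host guard exercised by test_mcp_host.py can be shown in miniature; FakeHost and the exception name below are illustrative stand-ins, not the repo's actual classes:

```python
import asyncio


class DisconnectedError(RuntimeError):
    """Raised when a tool call arrives before the MCP session is established."""


class FakeHost:
    """Stand-in for MCPHost: call_tool must refuse when no session exists."""

    def __init__(self, connected: bool = False) -> None:
        self.connected = connected

    async def call_tool(self, name: str, args: dict) -> dict:
        if not self.connected:
            raise DisconnectedError(f"cannot call {name}: host not connected")
        return {"ok": True}


def test_disconnected_guard() -> None:
    host = FakeHost(connected=False)
    try:
        asyncio.run(host.call_tool("search", {}))
        raise AssertionError("expected DisconnectedError")
    except DisconnectedError:
        pass
```

Guard-style tests like this run without any network, which is why the suite can mock both the LLM and the MCP server.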

Roadmap

  • RAG ingest pipeline wired to a document source (PDF, URLs)
  • Multi-turn context trimming strategy (token-aware window)
  • Streaming abort (cancel in-flight SSE)
  • Rate limiting per user (Redis or Postgres counter)
  • LangSmith evaluation dataset for regression testing

Evaluation playbook (3 videos)

Video 1 — Problem scope and approach

Key points to cover:

  1. The chatbot challenge: unknown MCP server, tools revealed at runtime
  2. Architecture decision: Next.js + FastAPI + MCPHost (not a monolith, not Gradio)
  3. The single swap-in point: MCP_SERVER_URL — no code changes at eval time
  4. Tech choices: SQLite for zero-cost persistence, structlog, LangSmith for trace review
  5. Walk the repo tree: workspace layout, module boundaries, interface contracts

Demo for video:

  • Show mcp/host.py and mcp/adapters.py — tool discovery and OpenAI translation
  • Show the streaming event protocol in chat/orchestrator.py
  • Walk docs/DEPLOY_VERCEL.md (or your host’s env screen) for production-shaped config

Video 2 — Current progress and blockers

What to show:

  1. Local stack: docker compose up -d then pnpm dev in frontend/
  2. Sign up via Clerk, create a chat, send a message, watch tool call cards appear
  3. /tools page: live catalog of discovered tools, try-it form
  4. LangSmith dashboard: trace for a tool-use turn (prompt tokens, tool call, result)
  5. Optional: API host dashboards (metrics, logs)

Likely blockers to anticipate:

  • Clerk keys not yet filled in (placeholder in .env)
  • Evaluation MCP URL unknown until eval day (swap-in is one env var)
  • Session JWT missing email claim — see docs/CLERK_SESSION_CLAIMS.md

Next steps:

  • Fill Clerk keys once eval Clerk project is provided
  • Deploy frontend + backend per docs/DEPLOY_VERCEL.md (or your platform)
  • Set MCP_SERVER_URL on eval day; verify /tools shows the new tools

Video 3 — Deployment and demo

Script:

  1. Show env vars / deploy for frontend + backend + SQLite (no in-repo Terraform)
  2. Open frontend URL, sign in, start a chat
  3. Ask a question that triggers a tool call — show the tool call card live
  4. Open LangSmith project, click the trace, show token counts and latency
  5. Show API logs: structured JSON from structlog (request_id, latency_ms, …)
  6. MCP swap: update MCP_SERVER_URL on the backend host and restart
  7. Refresh /tools page — new tools appear; send a message that uses one

Commands to have ready:

# Health (replace host)
curl -s https://<backend-url>/health

# Optional: Cloud Run logs if you deploy there
./scripts/tail_logs.sh backend
