A production-grade AI chatbot that discovers and uses external tools at runtime via the Model Context Protocol. Built for a final evaluation where the MCP server, its tools, and their schemas are revealed only at runtime.
```mermaid
flowchart LR
    Browser["Browser"]
    Next["Next.js 15\nVercel or static host"]
    Backend["FastAPI\nmanaged container / local"]
    MCP["MCP Server\nMeridian or any URL"]
    DB["SQLite\n+ JSON RAG"]
    OR["OpenRouter\nLLM completions"]
    OAI["OpenAI\nembeddings only"]
    LS["LangSmith\ntracing"]
    Browser -->|"Clerk session"| Next
    Next -->|"Bearer JWT"| Backend
    Backend -->|"Streamable HTTP"| MCP
    Backend --> DB
    Backend --> OR
    Backend --> OAI
    Backend -.->|"env-driven traces"| LS
```
Swap-in point: set MCP_SERVER_URL on the backend host. Tool discovery, schema translation, LLM injection, and UI rendering follow automatically. No code changes.
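The heart of that pipeline is one translation step: an MCP tool's `name`, `description`, and `inputSchema` map directly onto the tool shape that OpenRouter's OpenAI-compatible API accepts. A minimal sketch of what an adapter like `mcp/adapters.py` plausibly does (the function name and fallback schema here are illustrative, not the repo's actual API):

```python
def mcp_tool_to_openai(tool: dict) -> dict:
    """Translate one MCP tool definition into the OpenAI function-calling shape.

    MCP's tools/list result carries `name`, `description`, and a JSON Schema
    `inputSchema`; the OpenAI `tools` parameter wants the same data nested
    under {"type": "function", "function": {...}}.
    """
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # fall back to an empty object schema if the server omits one
            "parameters": tool.get(
                "inputSchema", {"type": "object", "properties": {}}
            ),
        },
    }
```

Because the translation is purely structural, any tool the evaluation server advertises at startup flows into the LLM call unchanged.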
- MCP host connects over Streamable HTTP at startup; tool list cached and injected into every LLM call
- Token-level SSE streaming with live tool call and result cards in the UI
- Clerk authentication; SQLite-persisted chats and messages per user
- Optional RAG retrieval (embeddings as JSON; cosine in-process) with text-embedding-3-small
- structlog + Cloud Logging JSON renderer; LangSmith traces via env vars
- GitHub Actions CI (ruff, pytest, tsc, build)
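The "embeddings as JSON; cosine in-process" retrieval is small enough to picture in full. The sketch below is an illustrative stand-in for the code in `rag/` (the `top_k` helper and row layout are assumptions): vectors are stored as JSON strings alongside their chunks in SQLite and scored in Python, so no vector extension or external service is needed.

```python
import json
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is zero-length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: list[float], rows: list[tuple[str, str]], k: int = 3):
    """rows: (chunk_text, json_encoded_vector) pairs as stored in SQLite."""
    scored = [(cosine(query_vec, json.loads(vec)), text) for text, vec in rows]
    return sorted(scored, reverse=True)[:k]
```

At the corpus sizes a demo RAG index holds, a linear scan like this is typically faster than the network round-trip to a hosted vector store.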
```text
final-evaluation-mcp/
  backend/                  FastAPI backend (Python 3.12, uv)
    src/final_eval_mcp/
      config.py             pydantic-settings + early dotenv
      logging.py            structlog + Cloud Logging renderer
      tracing.py            LangSmith env wiring
      main.py               FastAPI factory + lifespan
      api/                  REST routes: chats, messages, tools, health
      auth/clerk.py         JWKS cache + JWT verification
      chat/                 LLM client, prompts, streaming orchestrator
      mcp/                  MCPHost (lifespan), adapters, tool call loop
      rag/                  Embeddings, ingest CLI, JSON-vector retrieval
      db/                   SQLAlchemy 2 async, models, repositories
    alembic/                Async migrations (SQLite / portable JSON)
    tests/                  pytest suite with mocked LLM + MCP
  frontend/                 Next.js 15 App Router (TypeScript, Tailwind, shadcn)
    src/
      app/(auth)/           Clerk sign-in / sign-up pages
      app/(app)/chat/       Chat shell + per-chat route
      app/(app)/tools/      MCP tool catalog with try-it form
      components/chat/      ChatShell, MessageList, Composer, ToolCallCard
      lib/                  Typed API client, SSE consumer, shared types
  docker/                   Multi-stage Dockerfiles (backend, frontend)
  docker-compose.yml        Local dev: backend + SQLite volume (MCP remote)
  scripts/
    dev.sh                  Start local stack
    migrate.sh              Alembic from repo root (`upgrade head`, etc.)
    bootstrap_db.sh         mkdir .data + run Alembic
    seed_db.sh              Run Alembic migrations (host / CI)
    tail_logs.sh            Stream Cloud Logging (optional; needs gcloud + Cloud Run)
  .github/workflows/
    ci.yml                  Lint, type-check, test, build
```
Without Docker: see docs/RUN_LOCAL_NO_DOCKER.md (SQLite + uvicorn + pnpm dev, Meridian MCP remote).
```shell
git clone https://github.com/YOUR_ORG/final-evaluation-mcp
cd final-evaluation-mcp
cp .env.example .env
# Fill in OPENROUTER_API_KEY, OPENAI_API_KEY, CLERK_* vars in .env
# Leave DATABASE_URL unset for the repo-local .data/app.db (SQLite),
# or set it to sqlite+aiosqlite://...

./scripts/dev.sh
# docker compose up (backend with SQLite volume), waits for /health,
# then runs pnpm dev in frontend/
```

Or step by step:

```shell
docker compose up -d
./scripts/seed_db.sh   # Alembic migrations
cd frontend && pnpm install && pnpm dev
```

Services:
- Frontend: http://localhost:3000
- Backend: http://localhost:8000 (docs: `/docs`)
- MCP: remote URL from `MCP_SERVER_URL` in `.env` (default: Meridian order MCP)
```shell
curl http://localhost:8000/health
# {"status":"ok","timestamp":"...","version":"0.1.0"}
```

| Variable | Required | Description |
|---|---|---|
| `OPENROUTER_API_KEY` | yes | OpenRouter key for LLM completions |
| `OPENROUTER_MODEL` | no | Model slug (default: openai/gpt-4o-mini) |
| `OPENAI_API_KEY` | yes | OpenAI key for text-embedding-3-small |
| `DATABASE_URL` | no | Default: SQLite file `<repo>/.data/app.db`. Optional: `sqlite+aiosqlite://...` or `postgresql+asyncpg://...` |
| `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` | yes | Clerk publishable key (frontend) |
| `CLERK_SECRET_KEY` | yes | Clerk secret key (backend, not exposed to browser) |
| `CLERK_JWKS_URL` | yes | Clerk JWKS endpoint for JWT verification |
| `CLERK_ISSUER` | no | Clerk issuer URL (validates the `iss` claim) |
| `MCP_SERVER_URL` | yes | MCP base URL (Streamable HTTP path, e.g. `.../mcp`) |
| `MCP_AUTH_TOKEN` | no | Bearer token if the MCP server requires auth |
| `LANGSMITH_TRACING` | no | `true` to enable LangSmith traces |
| `LANGSMITH_API_KEY` | no | LangSmith API key |
| `LANGSMITH_PROJECT` | no | LangSmith project name |
| `LOG_LEVEL` | no | DEBUG, INFO, WARNING, ERROR (default: INFO) |
| `ENV` | no | `development` or `production` |
| `CORS_ORIGINS` | no | Comma-separated allowed origins |
Host the frontend (e.g. Vercel) and backend (container or VM with a writable SQLite path). See docs/DEPLOY_VERCEL.md for layout and env vars.
To swap the MCP server at evaluation time, change MCP_SERVER_URL (and MCP_AUTH_TOKEN if required) on the backend host and restart the process so tools are re-discovered.
| Decision | Chosen | Alternatives considered | Reason |
|---|---|---|---|
| UI framework | Next.js 15 App Router | Gradio, Streamlit | Full design control; native Clerk integration; deploys to Vercel or any static/Node host |
| Auth | Clerk + JWKS verification | Auth0, custom JWT | Lowest integration surface; no session storage; JWKS cached locally |
| Database | SQLite (default) + optional Postgres | Cloud-only serverless DB | Zero-ops local file; portable JSON embeddings |
| Vector / RAG | In-process cosine on JSON vectors | pgvector, Pinecone | No separate vector extension; simple deploy |
| LLM provider | OpenRouter via AsyncOpenAI | Direct Anthropic/OpenAI | Model-agnostic; swap via OPENROUTER_MODEL; same SDK interface |
| MCP transport | Streamable HTTP (remote) | stdio, SSE | Remote MCP matches evaluation servers; swap-in via MCP_SERVER_URL |
| Orchestration | Hand-written async generator | LangGraph, LangChain agents | Single-agent tool-use loop is ~120 lines; explicit event protocol is easier to evaluate |
| Logging | structlog + stdlib bridge | stdlib, loguru | Structured JSON logs; stdlib bridge captures LangChain logs |
| Tracing | LangSmith via env vars | OpenTelemetry | Zero code changes; LANGCHAIN_TRACING_V2 auto-instruments all LLM calls |
| IaC | (removed from repo) | Terraform, Pulumi | Vercel + SQLite-backed API host; add your own IaC if needed |
| CI | GitHub Actions (lint + test + build) | Manual | No deploy workflow in-tree |
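The "hand-written async generator" decision is easy to picture. Below is a compressed sketch of a single-agent tool-use loop — the event names and the `run_turn` signature are illustrative, not the actual protocol in `chat/orchestrator.py`:

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def run_turn(
    llm: Callable[[list[dict]], Awaitable[dict]],
    call_tool: Callable[[str, dict], Awaitable[str]],
    messages: list[dict],
) -> AsyncIterator[dict]:
    """Single-agent loop: call the LLM, execute any requested tool, repeat.

    Yields typed events the SSE layer can forward verbatim to the UI
    (tool call cards, then the final answer).
    """
    while True:
        reply = await llm(messages)
        if reply.get("tool_call"):
            name = reply["tool_call"]["name"]
            args = reply["tool_call"]["arguments"]
            yield {"type": "tool_call", "name": name, "arguments": args}
            result = await call_tool(name, args)
            yield {"type": "tool_result", "name": name, "result": result}
            # feed the tool result back so the next LLM call can use it
            messages.append({"role": "tool", "name": name, "content": result})
        else:
            yield {"type": "token", "text": reply["content"]}
            return
```

Keeping the loop this explicit is what makes the event protocol easy to test with mocked `llm` and `call_tool` callables, as the repo's `test_orchestrator.py` suggests.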
| Layer | Technology | Version |
|---|---|---|
| Frontend | Next.js | 15 |
| UI components | Tailwind CSS + shadcn/ui | 3.4 / latest |
| Auth (browser) | @clerk/nextjs | 6 |
| Backend | FastAPI + uvicorn | 0.115 / 0.30 |
| Python version | CPython | 3.12 |
| Package manager | uv | 0.5+ |
| Database | SQLite (+ aiosqlite) | 3.12+ |
| ORM | SQLAlchemy async | 2.0 |
| Migrations | Alembic | 1.13 |
| LLM provider | OpenRouter via openai SDK | 1.40 |
| Embeddings | OpenAI text-embedding-3-small | -- |
| MCP | mcp Python SDK | 1.3 |
| Auth (backend) | python-jose JWKS verification | 3.3 |
| Logging | structlog | 24.4 |
| Tracing | LangSmith | 0.1.100 |
```shell
# Backend
cd backend
uv run pytest tests/ -v

# Frontend
cd frontend
pnpm test
```

Backend test coverage:
- `test_auth.py` — Clerk JWKS verification (malformed tokens, unknown kid)
- `test_mcp_host.py` — adapter conversion, disconnected-host guards
- `test_orchestrator.py` — streaming event protocol, sources emission
- RAG ingest pipeline wired to a document source (PDF, URLs)
- Multi-turn context trimming strategy (token-aware window)
- Streaming abort (cancel in-flight SSE)
- Rate limiting per user (Redis or Postgres counter)
- LangSmith evaluation dataset for regression testing
Key points to cover:
- The chatbot challenge: unknown MCP server, tools revealed at runtime
- Architecture decision: Next.js + FastAPI + MCPHost (not a monolith, not Gradio)
- The single swap-in point: `MCP_SERVER_URL` — no code changes at eval time
- Tech choices: SQLite for zero-cost persistence, structlog, LangSmith for trace review
- Walk the repo tree: workspace layout, module boundaries, interface contracts

Demo for video:
- Show `mcp/host.py` and `mcp/adapters.py` — tool discovery and OpenAI translation
- Show the streaming event protocol in `chat/orchestrator.py`
- Walk docs/DEPLOY_VERCEL.md (or your host's env screen) for production-shaped config
What to show:
- Local stack: `docker compose up -d`, then `pnpm dev` in `frontend/`
- Sign up via Clerk, create a chat, send a message, watch tool call cards appear
- `/tools` page: live catalog of discovered tools, try-it form
- LangSmith dashboard: trace for a tool-use turn (prompt tokens, tool call, result)
- Optional: API host dashboards (metrics, logs)
Likely blockers to anticipate:
- Clerk keys not yet filled in (placeholder in `.env`)
- Evaluation MCP URL unknown until eval day (swap-in is one env var)
- Session JWT missing `email` claim — see docs/CLERK_SESSION_CLAIMS.md

Next steps:
- Fill Clerk keys once the eval Clerk project is provided
- Deploy frontend + backend per docs/DEPLOY_VERCEL.md (or your platform)
- Set `MCP_SERVER_URL` on eval day; verify `/tools` shows the new tools
Script:
- Show env vars / deploy for frontend + backend + SQLite (no in-repo Terraform)
- Open the frontend URL, sign in, start a chat
- Ask a question that triggers a tool call — show the tool call card live
- Open the LangSmith project, click the trace, show token counts and latency
- Show API logs: structured JSON from structlog (`request_id`, `latency_ms`, …)
- MCP swap: update `MCP_SERVER_URL` on the backend host and restart
- Refresh the `/tools` page — new tools appear; send a message that uses one
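For the log-walkthrough step, it helps to know what shape those lines take. The backend's real `logging.py` uses structlog with a Cloud Logging renderer; this stdlib sketch only mimics the one-JSON-object-per-line output (the field names beyond `request_id`/`latency_ms` are assumptions):

```python
import json
import logging
import sys
import time


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line, structlog-style."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "timestamp": time.strftime(
                "%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)
            ),
        }
        # request-scoped fields ride along on the record via `extra=...`
        for key in ("request_id", "latency_ms"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


logger = logging.getLogger("final_eval_mcp")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("request complete", extra={"request_id": "abc123", "latency_ms": 42})
```

One-line JSON is what lets Cloud Logging (or any log aggregator) index `severity` and the custom fields without extra parsing config.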
Commands to have ready:

```shell
# Health (replace host)
curl -s https://<backend-url>/health

# Optional: Cloud Run logs if you deploy there
./scripts/tail_logs.sh backend
```