Follow this when a user asks you to install eval-mcp into their IDE.
uvx --from llm-evaluation-system eval-mcp install --yes
The subcommand detects which IDEs are present (Claude Code, Kiro, VS Code,
Cursor, Codex), registers eval in each, and warms the uvx cache so the
first IDE launch isn't 60s of "disconnected". Then tell the user to
restart whichever IDE(s) it touched.
If the user wants a single IDE only:
uvx --from llm-evaluation-system eval-mcp install --ide claude-code --yes
Valid --ide values: claude-code, kiro, vscode, cursor, codex.
Comma-separate for multiple.
If eval is already registered, the subcommand skips with a message.
Pass --force to overwrite.
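If you need to assemble the install command from a user's answer, a small guard against typos in the --ide list can help. The sketch below is a hypothetical helper (build_install_cmd and VALID_IDES are not part of eval-mcp); it only prints the command, and it assumes --force can be combined with --ide and --yes.

```shell
# Hypothetical helper: validate a comma-separated --ide list against the
# values listed above, then print (not run) the install command.
VALID_IDES="claude-code kiro vscode cursor codex"

build_install_cmd() {
    ides="$1"       # e.g. "claude-code,cursor"
    force="${2:-}"  # pass "force" to append --force (assumed combinable)

    [ -n "$ides" ] || { echo "no IDE given" >&2; return 1; }

    # Split on commas and check each name against the documented list.
    old_ifs="$IFS"
    IFS=","
    for ide in $ides; do
        case " $VALID_IDES " in
            *" $ide "*) ;;
            *) IFS="$old_ifs"; echo "unknown IDE: $ide" >&2; return 1 ;;
        esac
    done
    IFS="$old_ifs"

    cmd="uvx --from llm-evaluation-system eval-mcp install --ide $ides --yes"
    if [ "$force" = "force" ]; then
        cmd="$cmd --force"
    fi
    echo "$cmd"
}
```

For example, build_install_cmd claude-code,cursor prints the two-IDE install command, and build_install_cmd emacs fails with "unknown IDE: emacs".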
User needs uv. Check: command -v uv. If missing, tell them to run
curl -LsSf https://astral.sh/uv/install.sh | sh (or brew install uv
on macOS).
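The uv check above can be wrapped so that it reports the result and prints the install hint rather than running an installer unprompted. A minimal sketch; have_cmd and ensure_uv are illustrative names, not part of any tool.

```shell
# Return success if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether uv is available; print the install hint if it is not.
ensure_uv() {
    if have_cmd uv; then
        echo "uv found: $(command -v uv)"
    else
        # Only print the hint; running an installer should be the user's call.
        echo "uv not found. Install it with:"
        echo "  curl -LsSf https://astral.sh/uv/install.sh | sh"
        echo "  (or: brew install uv on macOS)"
        return 1
    fi
}
```

Call ensure_uv before the install step and stop if it returns non-zero.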
Want to share eval results, datasets, judges, and reports with teammates via a shared S3 bucket?
If yes, team already has a bucket:
uvx --from llm-evaluation-system eval-mcp init <bucket-name>
If yes, creating a new bucket (first on the team):
git clone https://github.com/awslabs/llm-evaluation-system.git /tmp/eval-mcp-infra
cd /tmp/eval-mcp-infra/infra/modules/eval-logs-bucket
terraform init
terraform apply -var="bucket_name=<bucket-name>"
Then uvx --from llm-evaluation-system eval-mcp init <bucket-name>. Tell the user: teammates just run eval-mcp init <bucket-name>; no Terraform needed.
Suggested bucket name if they don't have a preference: eval-mcp-$(aws sts get-caller-identity --query Account --output text). It is easy to remember and, because it embeds the account ID, very unlikely to collide with an existing bucket name.
If no: skip. They can enable it later with eval-mcp init <bucket>.
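To build the suggested bucket name without pasting a bad value into terraform apply, the account ID can be validated first. A sketch under the assumption that the AWS CLI is configured; suggest_bucket_name is a hypothetical helper.

```shell
# Hypothetical helper: build the suggested bucket name from an account ID.
suggest_bucket_name() {
    account_id="$1"
    # AWS account IDs are 12 digits; reject anything else early.
    case "$account_id" in
        ""|*[!0-9]*) echo "not a numeric account ID: $account_id" >&2; return 1 ;;
    esac
    [ "${#account_id}" -eq 12 ] || { echo "expected 12 digits" >&2; return 1; }
    echo "eval-mcp-$account_id"
}

# Usage (requires configured AWS credentials):
#   suggest_bucket_name "$(aws sts get-caller-identity --query Account --output text)"
```

If the aws call fails, the helper receives an empty string and returns an error instead of producing a name like "eval-mcp-".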
If eval-mcp install won't run for some reason (e.g. corporate network blocks PyPI
mid-install and the cache warm-up fails), here are the same operations done by hand:
- Claude Code:
  claude mcp add eval -s user -- uvx --from llm-evaluation-system eval-mcp
- VS Code:
  code --add-mcp '{"name":"eval","command":"uvx","args":["--from","llm-evaluation-system","eval-mcp"]}'
- Kiro: merge into ~/.kiro/settings/mcp.json under mcpServers.eval:
  {"mcpServers": {"eval": {"command": "uvx", "args": ["--from", "llm-evaluation-system", "eval-mcp"]}}}
- Cursor: same JSON shape, in ~/.cursor/mcp.json. Or use the one-click deeplink:
  cursor://anysphere.cursor-deeplink/mcp/install?name=eval&config=eyJjb21tYW5kIjoidXZ4IiwiYXJncyI6WyItLWZyb20iLCJsbG0tZXZhbHVhdGlvbi1zeXN0ZW0iLCJldmFsLW1jcCJdfQ==
- Codex: append to ~/.codex/config.toml:
  [mcp_servers.eval]
  command = "uvx"
  args = ["--from", "llm-evaluation-system", "eval-mcp"]
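The config value in the Cursor deeplink is base64-encoded. Decoding it before handing the link to a user is a quick way to confirm it carries the same command and args as the manual entries. A small sketch; decode_deeplink_config is an illustrative name, and base64 -d is the GNU coreutils spelling (older macOS versions use -D).

```shell
# The config value copied verbatim from the Cursor deeplink.
CONFIG_B64='eyJjb21tYW5kIjoidXZ4IiwiYXJncyI6WyItLWZyb20iLCJsbG0tZXZhbHVhdGlvbi1zeXN0ZW0iLCJldmFsLW1jcCJdfQ=='

# Decode a base64 string and print the result.
decode_deeplink_config() {
    printf '%s' "$1" | base64 -d
}

# Usage:
#   decode_deeplink_config "$CONFIG_B64"
```

The decoded text should be a JSON object with the keys command and args, matching the uvx invocation used for the other IDEs.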
For any manual JSON/TOML edit: back up the file first, merge into the existing content instead of overwriting it, and write the result atomically.
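The backup and atomic-write parts of that rule can be sketched as follows. safe_write is a hypothetical helper, not something eval-mcp ships. It stages the new content next to the target, refuses empty input (a common result of a failed merge), keeps a .bak copy, and then renames the staged file into place.

```shell
# Hypothetical helper: replace TARGET with the content read from stdin,
# keeping a backup and never leaving a half-written file behind.
safe_write() {
    target="$1"
    dir="$(dirname "$target")"

    # Stage in the same directory so the final mv is a rename on one
    # filesystem, which is atomic on POSIX systems.
    tmp="$(mktemp "$dir/.mcp-config.XXXXXX")" || return 1
    if ! cat >"$tmp"; then
        rm -f "$tmp"
        return 1
    fi

    # Refuse to install an empty file (e.g. the merge step failed).
    if [ ! -s "$tmp" ]; then
        rm -f "$tmp"
        echo "refusing to write empty content to $target" >&2
        return 1
    fi

    # Back up the existing file, if any, before replacing it.
    if [ -f "$target" ]; then
        cp -p "$target" "$target.bak" || { rm -f "$tmp"; return 1; }
    fi
    mv "$tmp" "$target"
}
```

For the merge step on JSON configs, one option (assuming jq is installed) is to run jq -s '.[0] * .[1]' on the existing file and a file holding the new entry, write the output to a scratch file, and pass that to safe_write only if jq succeeded. TOML needs a different merge tool.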
Do not use llm-evaluation-system@latest in the IDE config — it forces a PyPI
resolution on every MCP start (~20s cold start) and causes the MCP to show
"disconnected" on first connect after any new release. Plain llm-evaluation-system
is what's documented across the MCP ecosystem (reference servers, mcp-atlassian,
etc.). Upgrade separately via uv cache clean llm-evaluation-system.
The install subcommand prints per-IDE restart hints. Quick recap:
- Claude Code / Cursor / VS Code: reload/restart the window or app
- Kiro: picks up mcp.json automatically; no restart needed
- Codex: must restart the CLI
Per IDE: remove the eval entry from the relevant config (Claude Code:
claude mcp remove eval -s user; others: edit the JSON/TOML). Then restart.