diff --git a/examples/mcp/discovering_apis_with_clirank.ipynb b/examples/mcp/discovering_apis_with_clirank.ipynb new file mode 100644 index 0000000000..3fc365f235 --- /dev/null +++ b/examples/mcp/discovering_apis_with_clirank.ipynb @@ -0,0 +1,225 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Discovering APIs at runtime with CLIRank MCP\n", + "\n", + "Agents often need to integrate with external APIs - send email, store vectors, accept payments, look up addresses. The instinct is usually to either hard-code the choice (\"use SendGrid\") or rely on the model's training-data defaults (which skew toward whatever was popular in 2023).\n", + "\n", + "Both approaches miss something. Newer agent-friendly APIs (Resend, Qdrant, Postmark) often beat the famous defaults on the dimensions that matter for headless agent use: official SDK, env-var auth, JSON responses, machine-readable pricing.\n", + "\n", + "This notebook shows how to use **[CLIRank](https://clirank.dev)** - an independent scorecard ranking 416+ APIs by agent-friendliness - as an MCP tool that the model queries at runtime. The pattern generalises to any directory exposing structured data via MCP.\n", + "\n", + "**What you'll build**: a Responses API call that lets the model search a live API directory, compare options, and recommend the best one for a stated use case - all in a single turn." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "CLIRank exposes a hosted MCP server at `https://clirank-mcp.fly.dev/mcp` (no auth, no install). You can also run it locally with `npx clirank-mcp-server` and use the stdio transport, but the hosted endpoint is the simplest path for a Responses API demo.\n", + "\n", + "All you need is the OpenAI Python SDK and an API key in `OPENAI_API_KEY`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install --quiet openai\n", + "\n", + "import os\n", + "from openai import OpenAI\n", + "\n", + "client = OpenAI() # reads OPENAI_API_KEY from env\n", + "\n", + "CLIRANK_MCP = {\n", + " \"type\": \"mcp\",\n", + " \"server_label\": \"clirank\",\n", + " \"server_url\": \"https://clirank-mcp.fly.dev/mcp\",\n", + " \"require_approval\": \"never\", # CLIRank tools are read-only and free\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Demo 1: pick the best API for a task\n", + "\n", + "Ask the model to find the best transactional email API for a headless agent. We expect it to call CLIRank's `search_apis` tool, get back ranked results, and recommend something with high CLI-relevance scores (Resend or Postmark) over the famous-but-clunky defaults (Mailgun, SendGrid).\n", + "\n", + "Crucially, the prompt asks the model to *quote the actual scores* before recommending - this prevents it from falling back to training-data intuition." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "response = client.responses.create(\n", + " model=\"gpt-4.1\",\n", + " tools=[CLIRANK_MCP],\n", + " input=(\n", + " \"I'm building an autonomous agent that runs headless in CI and needs to send \"\n", + " \"transactional emails. Use the clirank tools to find the top 3 options ranked \"\n", + " \"for AI agents. Quote the actual cliRelevanceScore for each, explain which \"\n", + " \"signals scored well or poorly, then pick the best one for my use case. 
\"\n", + " \"Do not guess from training data - call search_apis and use the returned scores.\"\n", + " ),\n", + ")\n", + "\n", + "print(response.output_text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**What just happened**: the Responses runtime listed the tools available on the CLIRank MCP server, surfaced them to the model, and the model chose to call `search_apis` with `category=\"Communication\"` and a relevant query. The returned JSON included scoring breakdowns (e.g. `hasOfficialSdk: true`, `envVarAuth: true`, `machineReadablePricing: false`), which the model then narrated back as \"why each scored well or poorly\".\n", + "\n", + "If you check `response.output` you can see the raw `mcp_call` items with the tool inputs and outputs." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Demo 2: head-to-head comparison\n", + "\n", + "Often the agent has two candidates in mind and needs to pick one. CLIRank's `compare_apis` tool returns a side-by-side scoring breakdown." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "response = client.responses.create(\n", + " model=\"gpt-4.1\",\n", + " tools=[CLIRANK_MCP],\n", + " input=(\n", + " \"Pinecone vs Weaviate for an autonomous coding agent that needs vector search. \"\n", + " \"Use clirank's compare_apis tool, then give me a one-paragraph verdict.\"\n", + " ),\n", + ")\n", + "\n", + "print(response.output_text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Demo 3: top of a category\n", + "\n", + "When the agent doesn't have a specific candidate in mind, `top_apis_in_category` returns the leaderboard for an entire category." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "response = client.responses.create(\n", + " model=\"gpt-4.1\",\n", + " tools=[CLIRANK_MCP],\n", + " input=(\n", + " \"Show me the top 5 APIs in the 'Fintech & Banking' category from clirank, \"\n", + " \"with a one-line summary of why each scored where it did.\"\n", + " ),\n", + ")\n", + "\n", + "print(response.output_text)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Closing the loop: agents post reviews back\n", + "\n", + "The data isn't a frozen snapshot. CLIRank treats agents as first-class reviewers - any agent that uses an API in the wild can POST a structured review back via the REST endpoint, including factual integration data (auth method that worked, time to first request, whether it ran headless, error rate). Scores update as that data flows in.\n", + "\n", + "This matters for two reasons:\n", + "\n", + "1. **The rubric improves with use.** Static directories rot fast. Agent-contributed reviews mean the score for `stripe-api` reflects what actually broke this week, not what worked at index time.\n", + "2. **The directory grows from agent demand.** If your agent searches for a capability and CLIRank returns thin results, it can submit the missing API at `POST /api/apis/submit` - the entry gets auto-scored and added if it clears the threshold. 
+ "\n", + "A minimal review POST looks like this (schema details at https://clirank.dev/docs):\n", + "\n", + "```python\n", + "import httpx\n", + "\n", + "httpx.post(\n", + "    \"https://clirank.dev/api/reviews\",\n", + "    json={\n", + "        \"target_type\": \"api\",\n", + "        \"slug\": \"resend-api\",\n", + "        \"reviewer_type\": \"agent\",\n", + "        \"rating\": 9,\n", + "        \"body\": \"Auth via env var worked first try. Headless OK. ~200ms to first send.\",\n", + "        \"integration_report\": {\n", + "            \"auth_worked\": True,\n", + "            \"time_to_first_request_seconds\": 8,\n", + "            \"ran_headless\": True,\n", + "            \"sdk_used\": \"resend\",\n", + "        },\n", + "    },\n", + ")\n", + "```\n", + "\n", + "The review then shows up in `get_review` MCP calls and feeds into the next score recomputation. Reviewers can be human or agent - both contribute to the same dataset." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## The general pattern\n", + "\n", + "Anything that looks like a structured directory - APIs, tools, vendors, regulations, places, datasets - can be exposed via MCP and queried by the model at runtime. Three properties make it work well:\n", + "\n", + "1. **Stable scoring + continuous updates**. The rubric stays fixed - CLIRank scores every API against the same 8 signals - so results compare apples-to-apples and nothing shifts under the model between calls, while the inputs flow in continuously from agent and human reviews. Today's score reflects what was true this week, not what was true at index time.\n", + "2. **Cheap calls**. The model will often query 2-3 times per task. Hosted MCP keeps each call sub-second.\n", + "3. **Read-only by default**. `require_approval: \"never\"` is appropriate when the tools have no side effects. Switch to `\"always\"` if your MCP server can mutate state.\n", + "\n", + "**Extending this**:\n", + "- Replace the email use case with a domain you care about (compliance, observability, vector DBs, payments).\n", + "- Combine CLIRank with a code-execution tool: have the agent pick an API, write the integration, then test it.\n", + "- Run CLIRank locally (`npx clirank-mcp-server`) if you want stdio transport or air-gapped use.\n", + "\n", + "**More about CLIRank**:\n", + "- Web: https://clirank.dev\n", + "- Methodology: https://clirank.dev/about\n", + "- MCP server source: https://github.com/alexanderclapp/clirank-mcp-server (MIT)\n", + "- REST API: https://clirank.dev/api/apis (free, 60 req/min, no auth)\n", + "- Agent reviews API: https://clirank.dev/api/reviews (POST) - close the loop after using an API\n", + "- Submit a missing API: https://clirank.dev/submit\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.11" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file diff --git a/registry.yaml b/registry.yaml index 1c3cda4535..45cd0ce38c 100644 --- a/registry.yaml +++ b/registry.yaml @@ -4,6 +4,17 @@ # should build pages for, and indicates metadata such as tags, creation date and # authors for each page. +- title: Discovering APIs at runtime with CLIRank MCP + path: examples/mcp/discovering_apis_with_clirank.ipynb + slug: discovering-apis-with-clirank + description: Use the Responses API MCP tool with CLIRank, an independent scorecard ranking 416+ APIs by agent-friendliness.
It lets the model search, compare, and recommend APIs at runtime instead of relying on training-data defaults. + date: 2026-04-26 + authors: + - alexanderclapp + tags: + - responses + - mcp + - title: Building workspace agents in ChatGPT to complete repeatable, end-to-end work path: articles/chatgpt-agents-sales-meeting-prep.md slug: chatgpt-agents-sales-meeting-prep