2 changes: 2 additions & 0 deletions AWESOME_WEBMCP.md
@@ -17,6 +17,8 @@ A curated list of awesome WebMCP demos.
- **Example Prompt:** "I want to book a flight from New York to Los Angeles for two people on next Thursday."
- [Animal Viewer](https://65s6dw.csb.app/) - A simple codesandbox demo page that shows either a dog or a cat image.
- **Example Prompt:** "Show me a dog on this page"
+ - **Moving Beyond Screen Scraping**: A hands-on example of using WebMCP to create an agentic first experience with 10x fewer tokens
+ - [Article](https://lnkd.in/daPRAtMX) | [Code](https://lnkd.in/dkZ3Jizn)

## Contributing

69 changes: 48 additions & 21 deletions evals-cli/README.md
@@ -13,16 +13,19 @@ A TypeScript framework for evaluating the tool-calling capabilities of Large Lan
The project is structured as follows:

- `src/`: Source code.
- - `bin/runevals.ts`: Main entry point that sets up the backend and runs the evaluation loop.
- - `backend/`: Implementation of LLM backends (e.g., `googleai.ts`, `ollama.ts`).
- - `types/`: TypeScript definitions for tools, messages, and evaluations.
+ - `bin/runevals.ts`: Entry point that loads tool schemas from a JSON file and runs the evaluation loop.
+ - `bin/webmcpevals.ts`: Entry point that loads tool schemas live from a browser page via the WebMCP API.
+ - `backend/`: Implementation of LLM backends (e.g., `googleai.ts`, `ollama.ts`).
+ - `browser/`: Browser automation for WebMCP tool discovery (`webmcp.ts`).
+ - `types/`: TypeScript definitions for tools, messages, and evaluations.
- `examples/`: Detailed examples and test data.
- `travel/`: A travel agent example containing `schema.json` and `evals.json`.

## Prerequisites

- Node.js (v18+ recommended)
- A Google AI Studio API Key (for Gemini models)
+ - Chrome Canary 146+ with the `#enable-webmcp-testing` flag enabled (for `webmcpevals` only)

## Setup

@@ -51,46 +54,70 @@ The project is structured as follows:

## Usage

- ### Running the Travel Example
+ ### `runevals` — file-based tool schemas

+ Loads tool schemas from a local JSON file.

```bash
node dist/bin/runevals.js --model=gemini-2.5-flash --tools=examples/travel/schema.json --evals=examples/travel/evals.json
```

- ### Running evals with Ollama
+ With Ollama:

```bash
node dist/bin/runevals.js --model=qwen3:8b --backend=ollama --tools=examples/travel/schema.json --evals=examples/travel/evals.json
```

+ | Argument | Required | Default | Description |
+ | ----------- | -------- | ------------------ | ------------------------------------- |
+ | `--tools` | Yes | — | Path to the tool schema JSON file |
+ | `--evals` | Yes | — | Path to the evals JSON file |
+ | `--backend` | No | `gemini` | Backend to use (`gemini` or `ollama`) |
+ | `--model` | No | `gemini-2.5-flash` | Model name |
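
The argument table above can be read as a small validation routine. The following is a minimal sketch of that reading, not the project's actual parser: the names `RunEvalsOptions`, `parseFlags`, and `resolveOptions` are hypothetical, and only the `--key=value` form used in the commands above is handled.

```typescript
// Hypothetical option shape mirroring the argument table; the real CLI may differ.
type Backend = "gemini" | "ollama";

interface RunEvalsOptions {
  tools: string;
  evals: string;
  backend: Backend;
  model: string;
}

// Collect `--key=value` flags into a map; anything else is ignored in this sketch.
function parseFlags(argv: string[]): Map<string, string> {
  const flags = new Map<string, string>();
  for (const arg of argv) {
    const eq = arg.indexOf("=");
    if (arg.slice(0, 2) !== "--" || eq === -1) {
      continue;
    }
    flags.set(arg.slice(2, eq), arg.slice(eq + 1));
  }
  return flags;
}

function isBackend(value: string): value is Backend {
  return value === "gemini" || value === "ollama";
}

// Apply the table: `--tools` and `--evals` are required, the rest have defaults.
function resolveOptions(argv: string[]): RunEvalsOptions {
  const flags = parseFlags(argv);
  const tools = flags.get("tools");
  const evals = flags.get("evals");
  if (tools === undefined || evals === undefined) {
    throw new Error("--tools and --evals are required");
  }
  const backend = flags.get("backend") ?? "gemini";
  if (!isBackend(backend)) {
    throw new Error("Unknown backend: " + backend);
  }
  const model = flags.get("model") ?? "gemini-2.5-flash";
  return { tools, evals, backend, model };
}

const defaults = resolveOptions([
  "--tools=examples/travel/schema.json",
  "--evals=examples/travel/evals.json",
]);
console.log(defaults.backend, defaults.model);
```

Keeping the defaults in one function makes it easy to check that an invocation without `--backend` or `--model` resolves to the values listed in the table.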

+ ### `webmcpevals` — live tool schemas via WebMCP

+ Launches Chrome Canary, navigates to the given URL, and retrieves tool schemas live from the page via `navigator.modelContextTesting.listTools()`. Requires Chrome Canary 146+ with the `chrome://flags/#enable-webmcp-testing` flag enabled.

+ ```bash
+ node dist/bin/webmcpevals.js --url=https://example.com/my-webmcp-app --evals=examples/travel/evals.json
+ ```

+ | Argument | Required | Default | Description |
+ | ----------- | -------- | ------------------ | ------------------------------------- |
+ | `--url` | Yes | — | URL of the page exposing WebMCP tools |
+ | `--evals` | Yes | — | Path to the evals JSON file |
+ | `--backend` | No | `gemini` | Backend to use (`gemini` or `ollama`) |
+ | `--model` | No | `gemini-2.5-flash` | Model name |

## Argument Constraints

You can use constraint operators to match argument values flexibly. A constraint object is identified when **all** its keys start with `$`.

### Supported Operators

- | Operator | Description | Example |
- |---|---|---|
- | **`$pattern`** | Regex match | `{"$pattern": "^2026-\\d{2}$"}` |
- | **`$contains`** | Substring match | `{"$contains": "York"}` |
- | **`$gt`**, **`$gte`** | Greater than (or equal) | `{"$gte": 1}` |
- | **`$lt`**, **`$lte`** | Less than (or equal) | `{"$lt": 100}` |
- | **`$type`** | Type check | `{"$type": "string"}` |
- | **`$any`** | Presence check | `{"$any": true}` |
+ | Operator | Description | Example |
+ | --------------------- | ----------------------- | ------------------------------- |
+ | **`$pattern`** | Regex match | `{"$pattern": "^2026-\\d{2}$"}` |
+ | **`$contains`** | Substring match | `{"$contains": "York"}` |
+ | **`$gt`**, **`$gte`** | Greater than (or equal) | `{"$gte": 1}` |
+ | **`$lt`**, **`$lte`** | Less than (or equal) | `{"$lt": 100}` |
+ | **`$type`** | Type check | `{"$type": "string"}` |
+ | **`$any`** | Presence check | `{"$any": true}` |

### Example

```json
{
- "expectedCall": {
- "functionName": "searchFlights",
- "arguments": {
- "destination": "NYC",
- "outboundDate": { "$pattern": "^2026-01-\\d{2}$" },
- "passengers": { "$gte": 1 },
- "preferences": { "$any": true }
- }
+ "expectedCall": {
+ "functionName": "searchFlights",
+ "arguments": {
+ "destination": "NYC",
+ "outboundDate": { "$pattern": "^2026-01-\\d{2}$" },
+ "passengers": { "$gte": 1 },
+ "preferences": { "$any": true }
+ }
}
}
```
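
To make the operator semantics concrete, here is a minimal sketch of how a matcher for these constraints might work. It is not the framework's implementation. The function names (`isConstraint`, `checkOperator`, `matches`) are hypothetical, and several behaviors are assumptions: multiple operators in one object must all hold, `$type` compares against JavaScript's `typeof`, `$any` ignores its operand, an empty object is not a constraint, and extra keys in the actual arguments are ignored.

```typescript
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Per the README: a constraint object is one whose keys all start with `$`.
function isConstraint(value: unknown): value is Record<string, unknown> {
  if (!isPlainObject(value)) return false;
  const keys = Object.keys(value);
  return keys.length > 0 && keys.every((k) => k.charAt(0) === "$");
}

// Evaluate a single operator against the actual argument value.
function checkOperator(op: string, operand: unknown, actual: unknown): boolean {
  const bothNumbers = typeof operand === "number" && typeof actual === "number";
  switch (op) {
    case "$pattern":
      return typeof operand === "string" && typeof actual === "string" &&
        new RegExp(operand).test(actual);
    case "$contains":
      return typeof operand === "string" && typeof actual === "string" &&
        actual.indexOf(operand) !== -1;
    case "$gt":
      return bothNumbers && (actual as number) > (operand as number);
    case "$gte":
      return bothNumbers && (actual as number) >= (operand as number);
    case "$lt":
      return bothNumbers && (actual as number) < (operand as number);
    case "$lte":
      return bothNumbers && (actual as number) <= (operand as number);
    case "$type":
      return typeof actual === operand; // assumption: `typeof` semantics
    case "$any":
      return actual !== undefined; // presence check; operand ignored here
    default:
      return false; // unknown operators never match in this sketch
  }
}

// Compare an expected value (literal, nested structure, or constraint) to an actual one.
function matches(expected: unknown, actual: unknown): boolean {
  if (isConstraint(expected)) {
    const constraint = expected;
    return Object.keys(constraint).every((op) => checkOperator(op, constraint[op], actual));
  }
  if (Array.isArray(expected)) {
    if (!Array.isArray(actual) || actual.length !== expected.length) return false;
    const actualItems: unknown[] = actual;
    return expected.every((item, i) => matches(item, actualItems[i]));
  }
  if (isPlainObject(expected)) {
    if (!isPlainObject(actual)) return false;
    const expectedObj = expected;
    const actualObj = actual;
    return Object.keys(expectedObj).every((key) => matches(expectedObj[key], actualObj[key]));
  }
  return expected === actual;
}

// The `arguments` object from the example above, checked against a hypothetical model call.
const expectedArgs = {
  destination: "NYC",
  outboundDate: { $pattern: "^2026-01-\\d{2}$" },
  passengers: { $gte: 1 },
  preferences: { $any: true },
};
const actualArgs = {
  destination: "NYC",
  outboundDate: "2026-01-15",
  passengers: 2,
  preferences: { cabin: "economy" },
};
console.log(matches(expectedArgs, actualArgs));
```

Literal values such as `"NYC"` fall through to strict equality, so constraints and exact matches can be mixed freely within one `arguments` object.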
