A Comet ML Open Source Project
This Python toolbox contains three easy-to-use command-line utilities:
- ez-mcp-server - turns a file of Python functions into an MCP server
- ez-mcp-chatbot - interactively debug MCP servers, with traces logged to Opik
- ez-mcp-eval - evaluate LLM applications using Opik's evaluation framework
The ez-mcp-server provides a quick way to examine tools, signatures, descriptions, latency, and return values. Combined with the chatbot, you can create a fast workflow to iterate on your MCP tools.
The ez-mcp-chatbot provides a quick way to examine and debug LLM and MCP tool interactions, with observability available through Opik. Although the Opik Playground lets you test your prompts on datasets, do A/B testing, and more, this chatbot gives you command-line interaction and debugging tools combined with Opik observability.
pip install ez-mcp-toolbox --upgrade
ez-mcp-chatbot
That will start an ez-mcp-server (using the example tools below) and the ez-mcp-chatbot configured to use those tools.
ez-mcp-eval --prompt "Answer the question" --dataset "my-dataset" --metric "Hallucination"
This will evaluate your LLM application using Opik's evaluation framework with your dataset and chosen metrics.
You can also limit the evaluation to the first N items of the dataset:
ez-mcp-eval --prompt "Answer the question" --dataset "large-dataset" --metric "Hallucination" --num 100
You can customize the chatbot's behavior with a custom system prompt:
# Use a custom system prompt
ez-mcp-chatbot --system-prompt "You are a helpful coding assistant"
# Create a default configuration
ez-mcp-chatbot --init
Example dialog:
This interaction between the LLM and the MCP tools will be logged and available for examination and debugging in Opik:

The rest of this file describes these three commands.
ez-mcp-server is a command-line utility for turning a regular file of Python functions or classes into a full-fledged MCP server.
Take an existing Python file of functions, such as this file, my_tools.py:
# my_tools.py

def add_numbers(a: float, b: float) -> float:
    """
    Add two numbers together.

    Args:
        a: First number to add
        b: Second number to add

    Returns:
        The sum of a and b
    """
    return a + b


def greet_user(name: str) -> str:
    """
    Greet a user with a welcoming message.

    Args:
        name: The name of the person to greet

    Returns:
        A personalized greeting message
    """
    return f"Welcome to ez-mcp-server, {name}!"
Then run the server with your custom tools:
ez-mcp-server my_tools.py
You can also load tools from installed Python modules:
ez-mcp-server opik_optimizer.utils.core
The server will automatically:
- Load all functions from your file or module (no ez_mcp_toolbox imports required)
- Convert them to MCP tools
- Generate JSON schemas from your function signatures
- Use your docstrings as tool descriptions
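For instance, the add_numbers function above would be advertised with tool metadata roughly like the following (a hand-written illustration of the MCP tool format; the exact generated schema may differ):
{
  "name": "add_numbers",
  "description": "Add two numbers together.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "a": {"type": "number"},
      "b": {"type": "number"}
    },
    "required": ["a", "b"]
  }
}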
Note: if you just launch the server, it will wait for stdio input. This is designed to run from inside a system that will dynamically start the server (see below).
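For quick manual testing you can instead expose the server over SSE, using the flags documented below (a sketch; localhost and 8000 are the defaults):
# Serve the tools over SSE instead of stdio
ez-mcp-server my_tools.py --transport sse --host localhost --port 8000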
ez-mcp-server [-h] [--transport {stdio,sse}] [--host HOST] [--port PORT] [--include INCLUDE] [--exclude EXCLUDE] [tools_file]
Positional arguments:
- tools_file - Path to tools file or module name (e.g., 'my_tools.py' or 'opik_optimizer.utils.core') (default: tools.py)
Options:
- -h, --help - Show this help message and exit
- --transport {stdio,sse} - Transport method to use (default: stdio)
- --host HOST - Host for SSE transport (default: localhost)
- --port PORT - Port for SSE transport (default: 8000)
- --include INCLUDE - Python regex pattern to include only matching tool names
- --exclude EXCLUDE - Python regex pattern to exclude matching tool names
You can control which tools are loaded using the --include and --exclude flags with Python regex patterns:
# Include only tools with "add" or "multiply" in the name
ez-mcp-server my_tools.py --include "add|multiply"
# Exclude tools with "greet" or "time" in the name
ez-mcp-server my_tools.py --exclude "greet|time"
# Use both filters together
ez-mcp-server my_tools.py --include ".*number.*" --exclude ".*square.*"
# Use with default tools
ez-mcp-server --include "add" --exclude "greet"
Filtering logic (see the sketch after this list):
- The --include filter is applied first, keeping only tools whose names match the regex pattern
- The --exclude filter is then applied, removing any tools whose names match the regex pattern
- Both filters can be used together for fine-grained control
- Invalid regex patterns will cause the server to exit with an error message
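Conceptually, the two filters behave like this Python sketch (an illustration of the documented order of application, not the actual implementation; whether patterns are anchored may differ):
import re

def keep_tool(name, include=None, exclude=None):
    # --include is applied first: only matching names survive
    if include is not None and not re.search(include, name):
        return False
    # --exclude is applied second: matching names are removed
    if exclude is not None and re.search(exclude, name):
        return False
    return True

# keep_tool("add_numbers", include="add|multiply")  -> True
# keep_tool("greet_user", exclude="greet|time")     -> False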
ez-mcp-chatbot is a powerful AI chatbot that integrates with Model Context Protocol (MCP) servers and provides observability through Opik tracing. It can connect to various MCP servers to access specialized tools and capabilities, making it a versatile assistant for different tasks.
- MCP Integration: Connect to multiple Model Context Protocol servers for specialized tool access
- Opik Observability: Built-in tracing and observability with Opik integration
- Interactive Chat Interface: Rich console interface with command history and auto-completion
- Python Code Execution: Execute Python code directly in the chat environment
- Tool Management: Discover and use tools from connected MCP servers
- Configurable: JSON-based configuration for models and MCP servers
- Async Support: Full asynchronous operation for better performance
The server implements the full MCP specification:
- Tool Discovery: Dynamic tool listing and metadata
- Tool Execution: Asynchronous tool calling with proper error handling
- Protocol Compliance: Full compatibility with MCP clients
- Extensibility: Easy addition of new tools and capabilities
Create a default configuration file:
ez-mcp-chatbot --init
This creates an ez-config.json file with default settings.
Edit ez-config.json to specify your model and MCP servers. For example:
{
  "model": "openai/gpt-4o-mini",
  "model_kwargs": {
    "temperature": 0.2
  },
  "mcp_servers": [
    {
      "name": "ez-mcp-server",
      "description": "Ez MCP server from Python files",
      "command": "ez-mcp-server",
      "args": ["/path/to/my_tools.py"]
    }
  ]
}
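The mcp_servers list can hold multiple entries. For example, you could add a second entry that loads tools from an installed module rather than a file, reusing the module-loading support shown earlier (a sketch; the name and description are arbitrary):
{
  "name": "optimizer-tools",
  "description": "Tools loaded from an installed module",
  "command": "ez-mcp-server",
  "args": ["opik_optimizer.utils.core"]
}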
Supported model formats:
- openai/gpt-4o-mini
- anthropic/claude-3-sonnet
- google/gemini-pro
- And many more through LiteLLM
Inside the ez-mcp-chatbot, you can have a normal LLM conversation. In addition, you have access to the following meta-commands:
- /clear - Clear the conversation history
- /help - Show available commands
- /debug on or /debug off - Toggle debug output
- /show tools - List all available tools
- /show tools SERVER - List tools for a specific server
- /run SERVER.TOOL - Execute a tool
- ! python_code - Execute Python code (e.g., '! print(2+2)')
- quit or exit - Exit the chatbot
Execute Python code by prefixing it with !:
! print(self.messages)
! import math
! math.sqrt(16)
The chatbot automatically discovers and uses tools from connected MCP servers. Simply ask questions that require tool usage, and the chatbot will automatically call the appropriate tools.
The chatbot uses a system prompt to define its behavior and personality. You can customize this using the --system-prompt command-line option.
By default, the chatbot uses this system prompt:
You are a helpful AI system for answering questions that can be answered
with any of the available tools.
You can override the default system prompt to customize the chatbot's behavior:
# Make it a coding assistant
ez-mcp-chatbot --system-prompt "You are an expert Python developer who helps with coding tasks."
# Make it a data analyst
ez-mcp-chatbot --system-prompt "You are a data scientist who specializes in analyzing datasets and creating visualizations."
# Make it more conversational
ez-mcp-chatbot --system-prompt "You are a friendly AI assistant who loves to help users with their questions and tasks."
The system prompt affects how the chatbot:
- Interprets user requests
- Decides which tools to use
- Structures its responses
- Maintains conversation context
The chatbot includes built-in Opik observability integration:
The command-line flag --opik accepts:
- hosted (default): Use the hosted Opik service
- local: Use a local Opik instance
- disabled: Disable Opik tracing
Set environment variables for Opik:
# For hosted mode
export OPIK_API_KEY=your_opik_api_key
# For local mode
export OPIK_LOCAL_URL=http://localhost:8080
# Use hosted Opik (default)
ez-mcp-chatbot --opik hosted
# Use local Opik
ez-mcp-chatbot --opik local
# Disable Opik
ez-mcp-chatbot --opik disabled
# Use custom system prompt
ez-mcp-chatbot --system-prompt "You are a helpful coding assistant"
# Combine options
ez-mcp-chatbot --system-prompt "You are a data analysis expert" --opik local --debug
# Use custom tools file
ez-mcp-chatbot --tools-file "my_tools.py"
- --opik {local,hosted,disabled} - Opik tracing mode (default: hosted)
- --system-prompt TEXT - Custom system prompt for the chatbot (overrides the default)
- --debug - Enable debug output during processing
- --init - Create a default ez-config.json file and exit
- --tools-file TOOLS_FILE - Path to a Python file containing tool definitions. If provided, an MCP server configuration will be created using this file.
- config_path - Path to the configuration file (default: ez-config.json)
ez-mcp-eval is a command-line utility for evaluating LLM applications using Opik's evaluation framework. It provides a simple interface to run evaluations on datasets with various metrics, enabling you to measure and improve your LLM application's performance.
- Dataset Evaluation: Run evaluations on your datasets using Opik's evaluation framework
- Multiple Metrics: Support for various evaluation metrics (Hallucination, LevenshteinRatio, etc.)
- Opik Integration: Full integration with Opik for observability and tracking
- Flexible Configuration: Customizable prompts, models, and evaluation parameters
- Rich Output: Beautiful console output with progress tracking and results display
ez-mcp-eval --prompt "Answer the question" --dataset "my-dataset" --metric "Hallucination"
ez-mcp-eval [-h] --prompt PROMPT --dataset DATASET --metric METRIC
[--experiment-name EXPERIMENT_NAME] [--opik {local,hosted,disabled}]
[--debug] [--input INPUT] [--output OUTPUT] [--num NUM] [--list-metrics]
[--model MODEL] [--model-kwargs MODEL_KWARGS] [--metrics-file METRICS_FILE]
[--config CONFIG] [--tools-file TOOLS_FILE]
Required arguments:
- --prompt PROMPT - The prompt to use for evaluation (can be a prompt name in Opik or direct text)
- --dataset DATASET - Name of the dataset to evaluate on (must exist in Opik or opik_optimizer.datasets)
- --metric METRIC - Name of the metric(s) to use for evaluation (comma-separated for multiple)
Options:
- --experiment-name EXPERIMENT_NAME - Name for the evaluation experiment (default: ez-mcp-evaluation)
- --opik {local,hosted,disabled} - Opik tracing mode (default: hosted)
- --debug - Enable debug output during processing
- --input INPUT - Input field name in the dataset (default: input)
- --output OUTPUT - Output field mapping in the format reference=DATASET_FIELD (default: reference=answer)
- --num NUM - Number of items to evaluate from the dataset (takes the first N items; default: all items)
- --list-metrics - List all available metrics and exit
- --model MODEL - LLM model to use for evaluation (default: gpt-3.5-turbo)
- --model-kwargs MODEL_KWARGS - JSON string of additional keyword arguments for the LLM model
- --metrics-file METRICS_FILE - Path to a Python file containing metric definitions (alternative to using opik.evaluation.metrics)
- --config CONFIG - Path to the MCP server configuration file (default: ez-config.json)
- --tools-file TOOLS_FILE - Path to a Python file containing tool definitions. If provided, an MCP server configuration will be created using this file.
The ez-mcp-eval command supports loading datasets from two sources:
- Opik datasets: If the dataset exists in your Opik account, it will be loaded directly
- opik_optimizer.datasets: If the dataset is not found in Opik, the tool will automatically check for a function with the same name in opik_optimizer.datasets and create the dataset using that function
This allows you to use both pre-existing Opik datasets and dynamically generated datasets from the opik_optimizer package.
# Simple evaluation with Hallucination metric
ez-mcp-eval --prompt "Answer the question" --dataset "qa-dataset" --metric "Hallucination"
# Evaluate with multiple metrics
ez-mcp-eval --prompt "Summarize this text" --dataset "summarization-dataset" --metric "Hallucination,LevenshteinRatio"
# Use a custom experiment name
ez-mcp-eval --prompt "Translate to French" --dataset "translation-dataset" --metric "LevenshteinRatio" --experiment-name "french-translation-test"
# Use a different model with custom parameters
ez-mcp-eval --prompt "Answer the question" --dataset "qa-dataset" --metric "LevenshteinRatio" --model "gpt-4" --model-kwargs '{"temperature": 0.7, "max_tokens": 1000}'
# Use a dataset from opik_optimizer.datasets (automatically created if not in Opik)
ez-mcp-eval --prompt "Answer the question" --dataset "my_optimizer_dataset" --metric "Hallucination"
# Custom input and output field mappings
ez-mcp-eval --prompt "Answer the question" --dataset "qa-dataset" --metric "LevenshteinRatio" --input "question" --output "reference=answer"
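The field mapping in the last example assumes dataset items shaped roughly like this (an illustrative item; your field names may differ):
{
  "question": "What is the capital of France?",
  "answer": "Paris"
}
Here --input "question" selects the field fed to the prompt, and --output "reference=answer" maps the dataset's answer field to the metric's reference parameter.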
The ez-mcp-eval command includes automatic validation of input and output field mappings to prevent common configuration errors.
Input field validation:
- What it checks: The --input field must exist in the dataset items
- When it runs: Before starting the evaluation
- Error handling: If the field doesn't exist, the command stops with a clear error message showing the available fields
Output field validation:
- What it checks: The --output VALUE (dataset field) must exist in the dataset items, and the --output KEY (metric parameter) must be a valid parameter of the selected metric(s)' score method
- When it runs: Before starting the evaluation
- Error handling: If validation fails, the command stops with clear error messages
# Input field not found in dataset
❌ Input field 'question' not found in dataset items
Available fields: input, answer
# Output field not found in dataset
❌ Reference field 'response' not found in dataset items
Available fields: input, answer
# Invalid metric parameter
❌ Output reference 'reference' is not a valid parameter for metric 'LevenshteinRatio' score method
Available parameters: output, reference
This validation helps catch configuration errors early, saving time and preventing failed evaluations.
# Use custom metrics defined in a Python file
ez-mcp-eval --prompt "Answer the question" --dataset "qa-dataset" --metric "CustomMetric" --metrics-file "my_metrics.py"
# Use a custom tools file for MCP server configuration
ez-mcp-eval --prompt "Answer the question" --dataset "qa-dataset" --metric "LevenshteinRatio" --tools-file "my_tools.py"
# See all available metrics
ez-mcp-eval --list-metrics
# Enable debug output for troubleshooting
ez-mcp-eval --prompt "Answer the question" --dataset "qa-dataset" --metric "Hallucination" --debug
You can define custom metrics in a Python file and use them with the --metrics-file option. The metrics file should contain metric classes that follow the same interface as Opik's built-in metrics.
class CustomMetric:
    def __init__(self):
        self.name = "CustomMetric"

    def __call__(self, output, reference):
        # Your custom evaluation logic here
        # Return a score between 0 and 1
        return 0.8  # Example score
Then use it with:
ez-mcp-eval --prompt "Answer the question" --dataset "qa-dataset" --metric "CustomMetric" --metrics-file "my_metrics.py"
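If you want your custom metric to mirror Opik's built-in metric interface more closely, you can subclass BaseMetric and implement a score method that returns a ScoreResult (a sketch based on Opik's documented custom-metric pattern; the exact-match logic is only a placeholder):
from opik.evaluation.metrics import base_metric, score_result

class ExactAnswerMetric(base_metric.BaseMetric):
    def __init__(self, name: str = "ExactAnswerMetric"):
        self.name = name

    def score(self, output: str, reference: str, **ignored_kwargs) -> score_result.ScoreResult:
        # Placeholder logic: 1.0 on an exact match, 0.0 otherwise
        value = 1.0 if output.strip() == reference.strip() else 0.0
        return score_result.ScoreResult(value=value, name=self.name)
Note that the score method's parameter names (output and reference here) are what the --output mapping is validated against.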
The ez-mcp-eval tool integrates seamlessly with Opik for:
- Dataset Management: Load datasets from your Opik workspace
- Prompt Management: Use prompts stored in Opik or provide direct text
- Experiment Tracking: Track evaluation experiments with custom names
- Observability: Full tracing of LLM calls and evaluation processes
For Opik integration, set up your environment:
# For hosted Opik
export OPIK_API_KEY=your_opik_api_key
# For local Opik
export OPIK_LOCAL_URL=http://localhost:8080
The tool supports all metrics available in Opik's evaluation framework. Use --list-metrics to see the complete list, which includes:
- Hallucination: Detect hallucinated content in responses
- LevenshteinRatio: Measure text similarity using Levenshtein distance
- ExactMatch: Check for exact string matches
- F1Score: Calculate F1 score for classification tasks
- And many more...
The tool provides rich console output including:
- Progress tracking during evaluation
- Dataset information and statistics
- Evaluation results and metrics
- Error handling and debugging information
- Integration with Opik's experiment tracking
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Documentation: GitHub Repository
- Issues: GitHub Issues
- Built with Model Context Protocol (MCP)
- Powered by LiteLLM
- Observability by Opik
- Rich console interface by Rich
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes
- Run tests: pytest
- Format code: black . && isort .
- Commit your changes: git commit -m "Add feature"
- Push to the branch: git push origin feature-name
- Submit a pull request
- Python 3.8 or higher
- OpenAI, Anthropic, or other LLM provider API key (for chatbot functionality)
# Clone the repository
git clone https://github.com/comet-ml/ez-mcp-toolbox.git
cd ez-mcp-toolbox
# Install in development mode
pip install -e .
# Or install with development dependencies
pip install -e ".[dev]"
# Or install dependencies from the requirements file
pip install -r requirements.txt