vllm-mlx supports the Model Context Protocol (MCP) for integrating external tools with LLMs.

Tool calling flow:

1. User request: "List files in /tmp"
2. The LLM generates a tool call: tool_calls: [{name: "list_directory", arguments: {path: "/tmp"}}]
3. The app executes the tool via MCP: the MCP server runs list_directory and returns ["file1.txt", "file2.txt"]
4. The tool result is sent back to the LLM: role: "tool", content: [...]
5. The LLM generates the final response: "The /tmp directory contains 2 files..."

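The five steps above generalize to a loop when the model asks for more than one tool before answering. The following is a minimal sketch of that loop, not vllm-mlx code: `run_tool_loop`, `chat`, `execute_tool`, and the two stub functions are hypothetical names, and the stubs stand in for the HTTP calls to the server so the flow can be followed offline.

```python
import json
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def run_tool_loop(
    chat: Callable[[List[Message]], Message],
    execute_tool: Callable[[str, Dict[str, Any]], Any],
    user_prompt: str,
    max_rounds: int = 5,
) -> List[Message]:
    """Repeat steps 2-4 until the model replies without requesting a tool."""
    messages: List[Message] = [{"role": "user", "content": user_prompt}]  # step 1
    for _ in range(max_rounds):
        reply = chat(messages)  # step 2 (tool call) or step 5 (final answer)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return messages  # no tool requested: this is the final response
        for call in tool_calls:  # step 3: run every requested tool
            function = call["function"]
            result = execute_tool(function["name"], json.loads(function["arguments"]))
            messages.append({  # step 4: send the result back
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    raise RuntimeError(f"no final answer after {max_rounds} rounds")


# Hypothetical stand-ins for the chat completion call and the MCP execution call.
def fake_chat(messages: List[Message]) -> Message:
    if messages[-1]["role"] == "tool":
        return {"role": "assistant", "content": "The /tmp directory contains 2 files."}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "list_directory", "arguments": '{"path": "/tmp"}'},
        }],
    }


def fake_execute(name: str, arguments: Dict[str, Any]) -> Any:
    return ["file1.txt", "file2.txt"]


history = run_tool_loop(fake_chat, fake_execute, "List files in /tmp")
```

Passing the two callables in keeps the loop independent of the transport, so the same function can be driven by real HTTP requests or by stubs in a test. The `max_rounds` cap is an assumed safeguard against a model that keeps requesting tools.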
Create mcp.json:
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    }
  }
}

Start the server with the config:
# Simple mode
vllm-mlx serve mlx-community/Qwen3-4B-4bit --mcp-config mcp.json
# Continuous batching
vllm-mlx serve mlx-community/Qwen3-4B-4bit --mcp-config mcp.json --continuous-batching

# Check MCP status
curl http://localhost:8000/v1/mcp/status
# List available tools
curl http://localhost:8000/v1/mcp/tools

End-to-end example in Python:
import json
import httpx

BASE_URL = "http://localhost:8000"

# 1. Get available tools
tools_response = httpx.get(f"{BASE_URL}/v1/mcp/tools")
tools = tools_response.json()["tools"]

# 2. Send request with tools
response = httpx.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": "default",
        "messages": [{"role": "user", "content": "List files in /tmp"}],
        "tools": tools,
        "max_tokens": 1024,
    },
)
result = response.json()
message = result["choices"][0]["message"]

# 3. Check for tool calls (this example handles only the first one)
if message.get("tool_calls"):
    tool_call = message["tool_calls"][0]

    # 4. Execute tool via MCP
    exec_response = httpx.post(
        f"{BASE_URL}/v1/mcp/execute",
        json={
            "server": "filesystem",
            "tool": tool_call["function"]["name"],
            "arguments": json.loads(tool_call["function"]["arguments"]),
        },
    )
    tool_result = exec_response.json()

    # 5. Send result back to LLM
    messages = [
        {"role": "user", "content": "List files in /tmp"},
        message,
        {
            "role": "tool",
            "tool_call_id": tool_call["id"],
            "content": json.dumps(tool_result["result"]),
        },
    ]
    final_response = httpx.post(
        f"{BASE_URL}/v1/chat/completions",
        json={"model": "default", "messages": messages},
    )
    print(final_response.json()["choices"][0]["message"]["content"])

MCP endpoints:

| Endpoint | Method | Description |
|---|---|---|
| /v1/mcp/status | GET | Check MCP status |
| /v1/mcp/tools | GET | List available tools |
| /v1/mcp/execute | POST | Execute a tool |

Example server configurations.

Filesystem:
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    }
  }
}

GitHub:
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_TOKEN": "your-token"
      }
    }
  }
}

PostgreSQL:
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres"],
      "env": {
        "DATABASE_URL": "postgresql://user:pass@localhost/db"
      }
    }
  }
}

Brave Search:
{
  "mcpServers": {
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": {
        "BRAVE_API_KEY": "your-key"
      }
    }
  }
}

Multiple servers:
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_TOKEN": "your-token"
      }
    }
  }
}

For testing MCP interactively:
python examples/mcp_chat.py

vllm-mlx supports 12 tool call parsers covering all major model families. See Tool Calling for the full list of parsers, aliases, and examples.

vllm-mlx includes security measures to prevent command injection attacks via MCP servers.
Only trusted commands are allowed by default:
| Category | Allowed Commands |
|---|---|
| Node.js | npx, npm, node |
| Python | uvx, uv, python, python3, pip, pipx |
| Docker | docker |
| MCP Servers | mcp-server-* (official servers) |
The following patterns are blocked to prevent injection attacks:
- Command chaining: ;, &&, ||, |
- Command substitution: `, $()
- Path traversal: ../
- Dangerous env vars: LD_PRELOAD, PATH, PYTHONPATH

For example, the following config tries to run a shell command:
{
  "mcpServers": {
    "malicious": {
      "command": "bash",
      "args": ["-c", "rm -rf /"]
    }
  }
}

This config will be rejected:
ValueError: MCP server 'malicious': Command 'bash' is not in the allowed commands whitelist.
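
To make the rules above concrete, here is a minimal sketch of how such a check could be written. It is an illustration only and not vllm-mlx's validator: the function name `check_server_config`, the constant names, and the plain substring matching are assumptions. Only the command names, blocked patterns, and environment variable names are taken from this page.

```python
from typing import Any, Dict, List

# Values copied from the whitelist table and the blocked-pattern list above.
ALLOWED_COMMANDS = {
    "npx", "npm", "node",
    "uvx", "uv", "python", "python3", "pip", "pipx",
    "docker",
}
ALLOWED_PREFIX = "mcp-server-"
BLOCKED_TOKENS = (";", "&&", "||", "|", "`", "$(", "../")
BLOCKED_ENV_VARS = {"LD_PRELOAD", "PATH", "PYTHONPATH"}


def check_server_config(name: str, config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config passes.

    Hypothetical helper for illustration, not part of vllm_mlx.
    """
    problems: List[str] = []
    command = str(config.get("command", ""))

    # Rule 1: the command must be whitelisted or look like an MCP server binary.
    if command not in ALLOWED_COMMANDS and not command.startswith(ALLOWED_PREFIX):
        problems.append(f"MCP server '{name}': command '{command}' is not whitelisted")

    # Rule 2: neither the command nor any argument may contain a blocked token.
    for value in [command, *config.get("args", [])]:
        text = str(value)
        hit = next((token for token in BLOCKED_TOKENS if token in text), None)
        if hit is not None:
            problems.append(f"MCP server '{name}': '{text}' contains blocked pattern '{hit}'")

    # Rule 3: dangerous environment variables may not be overridden.
    for variable in config.get("env", {}):
        if variable in BLOCKED_ENV_VARS:
            problems.append(f"MCP server '{name}': env var '{variable}' is not allowed")

    return problems
```

Returning a list instead of raising on the first failure is a design choice of this sketch: it lets a caller report every problem in a config at once. The real validator raises a ValueError, as shown in the message above.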
For development only, you can bypass security validation:
{
  "mcpServers": {
    "custom": {
      "command": "my-custom-server",
      "skip_security_validation": true
    }
  }
}

WARNING: Never use skip_security_validation in production!

To add custom commands to the whitelist programmatically:
from vllm_mlx.mcp import MCPCommandValidator, set_validator

# Add custom commands
validator = MCPCommandValidator(
    custom_whitelist={"my-trusted-server", "another-server"}
)
set_validator(validator)

Beyond command validation, vllm-mlx provides runtime sandboxing for tool executions:
| Feature | Description |
|---|---|
| Tool Allowlisting | Only permit specific tools to execute |
| Tool Blocklisting | Block specific dangerous tools |
| Argument Validation | Block dangerous patterns in tool arguments |
| Rate Limiting | Limit tool calls per minute |
| Audit Logging | Track all tool executions |
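
One common way to implement a per-minute call limit like the one in the table is a sliding window over recent call timestamps. The sketch below shows that technique in general terms. It is not vllm-mlx's implementation: the class name `SlidingWindowLimiter` and the injectable `clock` are assumptions, and only the parameter name `max_calls_per_minute` mirrors the ToolSandbox option documented on this page.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls_per_minute calls in any 60-second window."""

    def __init__(
        self,
        max_calls_per_minute: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls_per_minute
        self.clock = clock  # injectable so tests can control time
        self.calls: Deque[float] = deque()  # timestamps of accepted calls

    def allow(self) -> bool:
        """Record the call and return True, or return False if over the limit."""
        now = self.clock()
        # Drop timestamps that have left the 60-second window.
        while self.calls and now - self.calls[0] >= 60.0:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter(max_calls_per_minute=2, clock=lambda: fake_now[0])
first, second, third = limiter.allow(), limiter.allow(), limiter.allow()
fake_now[0] = 60.0  # one minute later the window is empty again
after_wait = limiter.allow()
```

Rejected calls are not recorded in this sketch, so a client that keeps retrying does not extend its own lockout.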
Tool arguments are validated for dangerous patterns:
- Path traversal: ../
- System directories: /etc/, /proc/, /sys/
- Root access: /root/, ~root

Tools matching these patterns trigger security warnings:
execute, run_command, shell, eval, exec, system, subprocess
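
As a rough illustration of the two checks just described, the sketch below scans tool arguments for the listed path patterns and flags tool names that contain one of the listed words. The function names and the use of substring matching are assumptions made for this example; the actual matching rules in vllm-mlx may differ.

```python
from typing import Any, Dict, Iterator, List

# Patterns copied from the lists above.
DANGEROUS_ARG_PATTERNS = ("../", "/etc/", "/proc/", "/sys/", "/root/", "~root")
SUSPICIOUS_TOOL_WORDS = (
    "execute", "run_command", "shell", "eval", "exec", "system", "subprocess",
)


def _iter_strings(value: Any) -> Iterator[str]:
    """Yield every string found in a nested structure of dicts and lists."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from _iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from _iter_strings(item)


def find_dangerous_arguments(arguments: Dict[str, Any]) -> List[str]:
    """Return the argument strings that contain a dangerous pattern."""
    return [
        text
        for text in _iter_strings(arguments)
        if any(pattern in text for pattern in DANGEROUS_ARG_PATTERNS)
    ]


def is_suspicious_tool(tool_name: str) -> bool:
    """Return True if the tool name contains one of the flagged words."""
    lowered = tool_name.lower()
    return any(word in lowered for word in SUSPICIOUS_TOOL_WORDS)
```

With these definitions, `find_dangerous_arguments({"path": "/etc/passwd"})` reports the offending value, while `{"path": "/tmp"}` passes. `is_suspicious_tool("run_shell")` is flagged and `is_suspicious_tool("list_directory")` is not.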

To configure the sandbox programmatically:
from vllm_mlx.mcp import ToolSandbox, set_sandbox

# Create sandbox with custom settings
sandbox = ToolSandbox(
    # Only allow specific tools (whitelist mode)
    allowed_tools={"read_file", "list_directory"},
    # Block specific tools (blacklist mode)
    blocked_tools={"execute_command", "run_shell"},
    # Rate limit: max 30 calls per minute
    max_calls_per_minute=30,
    # Optional audit callback
    audit_callback=lambda audit: print(f"Tool: {audit.tool_name}, Success: {audit.success}"),
)
set_sandbox(sandbox)

To inspect the audit log:
from vllm_mlx.mcp import get_sandbox

sandbox = get_sandbox()

# Get recent audit entries
entries = sandbox.get_audit_log(limit=50)

# Filter by tool name
file_ops = sandbox.get_audit_log(tool_filter="file")

# Get only errors
errors = sandbox.get_audit_log(errors_only=True)

# Clear audit log
sandbox.clear_audit_log()

Audit logs automatically redact sensitive fields (password, token, secret, key, credential, auth) and truncate large values.
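
The redaction behaviour can be pictured with a small sketch. This is not the library's code: the function name `redact_arguments`, the `[REDACTED]` placeholder, the 200-character limit, and the substring match on field names are all assumptions. Only the list of sensitive words comes from the sentence above.

```python
from typing import Any, Dict

# Sensitive field-name markers listed in the text above.
SENSITIVE_MARKERS = ("password", "token", "secret", "key", "credential", "auth")
MAX_VALUE_LENGTH = 200  # assumed limit; the real threshold is not documented here


def redact_arguments(
    arguments: Dict[str, Any],
    max_length: int = MAX_VALUE_LENGTH,
) -> Dict[str, Any]:
    """Return a copy of arguments that is safe to write to an audit log."""
    cleaned: Dict[str, Any] = {}
    for name, value in arguments.items():
        if any(marker in name.lower() for marker in SENSITIVE_MARKERS):
            # Hide the value entirely when the field name looks sensitive.
            cleaned[name] = "[REDACTED]"
        elif isinstance(value, str) and len(value) > max_length:
            # Keep only the start of very long strings.
            cleaned[name] = value[:max_length] + "...[truncated]"
        else:
            cleaned[name] = value
    return cleaned
```

For example, `redact_arguments({"path": "/tmp", "GITHUB_TOKEN": "abc"})` keeps `path` unchanged and replaces the token value with the placeholder.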
Check that the MCP server command is correct:
npx -y @modelcontextprotocol/server-filesystem /tmp

Verify that the tool is available:
curl http://localhost:8000/v1/mcp/tools | jq '.tools[].name'

Ensure you're using a model that supports function calling (for example, Qwen3 or Llama-3.2-Instruct).

If you see "Command X is not in the allowed commands whitelist", either:
- Use an allowed command (see whitelist above)
- Add the command to a custom whitelist
- Use skip_security_validation: true (development only)