Experimental sandbox for the OpenDataHub community to explore agentic AI reasoning. Apache 2.0 licensed. Not production code -- APIs, examples, and tooling may break without notice.
A collection of examples, tools, and validation notebooks demonstrating how to build AI agents that use external tools via the Model Context Protocol (MCP). The repo intentionally does NOT provide a custom SDK or agent framework (see adr/minimal-sdk.md for the "Option 8" decision). Instead, it shows multiple approaches:
- Direct OpenAI SDK (Python & Go) -- calling client.responses.create() with MCP tool definitions (simplest examples)
- LangChain + LangGraph -- multi-agent workflows with state machines, structured outputs, conditional routing, and guardrails
- CrewAI -- multi-agent crews with sequential/hierarchical task execution
- FastMCP -- building custom MCP servers (e.g., NPS API server)
- MLflow + OpenTelemetry -- tracing and Agent-as-a-Judge evaluation
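The direct OpenAI SDK approach can be sketched roughly as follows. This is an illustration, not code from the repo: the helper names, the server label, and the server URL are hypothetical, and the model is simply one of those listed as tested. The tool dict is assumed to follow the Responses API shape for remote MCP servers (type, server_label, server_url, require_approval).

```python
from typing import Any


def build_mcp_tool(label: str, url: str) -> dict[str, Any]:
    """Describe a remote MCP server as a Responses API tool entry."""
    return {
        "type": "mcp",
        "server_label": label,
        "server_url": url,
        "require_approval": "never",  # assumption: auto-approve tool calls in a sandbox
    }


def build_request(model: str, prompt: str, tools: list[dict[str, Any]]) -> dict[str, Any]:
    """Assemble keyword arguments for client.responses.create()."""
    return {"model": model, "input": prompt, "tools": tools}


def ask(prompt: str) -> str:
    """Send one request; needs the openai package and a reachable endpoint."""
    from openai import OpenAI  # imported lazily so the helpers above stay dependency-free

    client = OpenAI()  # picks up OPENAI_API_KEY and OPENAI_BASE_URL from the environment
    tools = [build_mcp_tool("example-server", "http://localhost:9000/sse")]  # hypothetical server
    response = client.responses.create(**build_request("gpt-4o", prompt, tools))
    return response.output_text
```

Keeping the request assembly separate from the network call makes the tool definition easy to inspect or unit test without a running model server.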
- Languages: Python 3.13+ (primary), Go 1.24 (secondary)
- LLM Clients: openai Python SDK, openai-go Go SDK, langchain-openai (ChatOpenAI)
- Agent Frameworks: LangGraph (StateGraph, conditional edges, structured output), CrewAI (@CrewBase, agents/tasks YAML config), LangChain (tool binding, MultiServerMCPClient, PIIMiddleware guardrails)
- MCP Servers: FastMCP (custom server authoring), kubernetes-mcp-server, GitHub Copilot MCP, Slack MCP, ServiceNow MCP, Google Workspace MCP, Jira MCP (Atlassian), DeepWiki MCP, Context7 MCP
- Inference: vLLM (self-hosted, tool calling with --enable-auto-tool-choice), Ollama (local models), OpenAI API, Google Gemini API, IBM WatsonX
- API Layer: Llama Stack (OpenAI-compatible Responses API proxy), can also call OpenAI/vLLM directly
- Tracing/Eval: MLflow (@mlflow.trace, Agent-as-a-Judge), OpenTelemetry
- Benchmarking: BFCL (Berkeley Function Calling Leaderboard), bootstrap significance testing (numpy, scipy, tqdm)
- Web/Infra: Flask, Kubernetes Python client, Podman/Docker, Pydantic (structured outputs)
- Models tested: GPT-4o, GPT-4o-mini, Qwen3-0.6B/1.7B/8B, Llama 3.2, Llama Guard 3, GPT-OSS-20b/120b, Gemini 2.5 Pro
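The bootstrap significance testing mentioned under Benchmarking can be illustrated with a paired bootstrap over per-test-case outcomes. This sketch uses only the standard library and is not the repo's significance_test.py; the function name, the sample data, and the 95% percentile interval are assumptions made for illustration.

```python
import random


def paired_bootstrap_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for mean(a) - mean(b).

    a and b are equal-length lists of 0/1 outcomes for the same test cases.
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and the same length")
    rng = random.Random(seed)
    n = len(a)
    diffs = []
    for _ in range(n_resamples):
        # Resample case indices with replacement, keeping each pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(a[i] - b[i] for i in idx) / n)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Hypothetical pass/fail outcomes for two models on the same 10 cases.
model_a = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
model_b = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1]
low, high = paired_bootstrap_ci(model_a, model_b)
print(f"95% CI for accuracy difference: [{low:.2f}, {high:.2f}]")
```

If the interval excludes zero, the accuracy gap is unlikely to be an artifact of which test cases happened to be sampled. With only a handful of cases the interval will be wide.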
- adr/minimal-sdk.md -- Key architectural decision: why no custom SDK (Option 8)
- examples/langchain-langgraph/workflow.py -- Most complex example: LangGraph StateGraph with classification, routing, K8s + GitHub MCP calls
- examples/ai_assistant_for_troubleshooting_apps/crew.py -- CrewAI multi-agent crew with 3 MCP servers
- examples/agents_tracing-eval_mlflow/nps_agent/nps_mcp_server.py -- Custom FastMCP server (NPS API, 758 lines)
- examples/agents_tracing-eval_mlflow/log_monitor/log_monitor_agent/agent.py -- LangGraph workflow with MLflow tracing
- tools/mcp-tester/test-mcp-server.py -- MCP server diagnostic tool
- mcp-discovery-configmap/schema.json -- MCP ConfigMap schema (url, transport, description, logo)
- benchmarking/significance-testing/significance_test.py -- BFCL statistical comparison CLI
Direct vLLM + OpenAI SDK (simplest):
export OPENAI_BASE_URL=http://localhost:8000/v1
export OPENAI_API_KEY=EMPTY
python example.py # or: go run example.go

Via Llama Stack (most examples):
llama stack run run.yaml --image-type venv # starts on port 8321
export OPENAI_BASE_URL=http://localhost:8321/v1/openai/v1

Via OpenAI directly (github-mcp, gsuite-mcp defaults):
export OPENAI_API_KEY=sk-...
export INFERENCE_MODEL=openai/gpt-4o

Via Ollama (langchain-langgraph):
ollama pull llama-guard3:1b && ollama serve
export INFERENCE_MODEL=ollama/llama3.2:3b

Service-specific tokens: GITHUB_TOKEN, SLACK_MCP_TOKEN, KUBE_TOKEN, GOOGLE_OAUTH_CLIENT_ID/SECRET, NPS_API_KEY
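The backends above differ mainly in a few environment variables. As a rough sketch of how a script might collect them: the helper name, the dataclass, and the fallback values are hypothetical, and the openai SDK already reads OPENAI_API_KEY and OPENAI_BASE_URL on its own when the client is constructed without arguments.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class InferenceSettings:
    base_url: Optional[str]  # None means the SDK's default endpoint
    api_key: str
    model: str


def load_settings(env: Mapping[str, str] = os.environ) -> InferenceSettings:
    """Collect endpoint settings from the environment variables shown above."""
    return InferenceSettings(
        base_url=env.get("OPENAI_BASE_URL"),
        # Placeholder for local servers that do not check the key (assumption).
        api_key=env.get("OPENAI_API_KEY", "EMPTY"),
        # Hypothetical fallback; individual examples may use a different default.
        model=env.get("INFERENCE_MODEL", "openai/gpt-4o"),
    )


settings = load_settings({"OPENAI_BASE_URL": "http://localhost:8321/v1/openai/v1"})
print(settings.base_url, settings.model)
```

Passing the environment as a mapping keeps the helper testable without mutating os.environ.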
- Each example is self-contained with its own README, dependencies (pyproject.toml/go.mod), and Llama Stack config (run.yaml)
- Python examples: openai SDK with client.responses.create() and MCP tool dicts
- Go examples: openai-go with responses.ResponseNewParams and ToolMcpParam
- MCP transports: SSE (/sse endpoint) or streamable HTTP (/mcp endpoint)
- Structured outputs via Pydantic models + llm.with_structured_output() (LangChain pattern)
- LangGraph state: TypedDict with add_messages annotation for message accumulation
- CrewAI config: YAML-based agent/task definitions in config/ directory
- Package management: uv (Python), Go modules
- ADRs follow adr/template.md (Context -> Decision -> Status -> Consequences)
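The MCP transport convention listed above (an /sse path for SSE, an /mcp path for streamable HTTP) can be sketched as a small helper that infers the transport from the endpoint URL. The function names are hypothetical, and the returned strings ("sse", "streamable_http") are assumed to match the transport names expected by MultiServerMCPClient; verify them against the adapter version in use.

```python
from urllib.parse import urlparse


def infer_mcp_transport(url: str) -> str:
    """Guess the MCP transport from the URL path: /sse -> SSE, /mcp -> streamable HTTP."""
    path = urlparse(url).path.rstrip("/")
    if path.endswith("/sse"):
        return "sse"
    if path.endswith("/mcp"):
        return "streamable_http"
    raise ValueError(f"cannot infer MCP transport from path {path!r}")


def server_entry(url: str) -> dict[str, str]:
    """Build a url + transport mapping for one server."""
    return {"url": url, "transport": infer_mcp_transport(url)}


print(server_entry("http://localhost:8080/sse"))
print(server_entry("https://example.invalid/mcp/"))
```

Raising on an unrecognized path, rather than defaulting to one transport, surfaces a misconfigured URL before any connection attempt.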