MCP (Model Context Protocol) server that exposes the dftracer I/O tracing toolkit as tools for LLM agents.
Includes three services:
- dftracer_utils — 21 tools wrapping the
dftracer_*CLI binaries (info,stats,merge,split,index,reader,pgzip,tar,organize,replay, …) - dfanalyzer — 5 tools for trace analysis (
analyze,summarize_trace,detect_preset,query,list_presets) - dftracer_plot — 2 tools for generating charts from trace data (
plot,plot_all)
The server speaks the MCP stdio transport and works with any MCP-compatible agent (Goose, Claude Desktop, custom agents).
git clone <this-repo>
cd dftracer-agents
python -m venv venv
source venv/bin/activate
pip install -e .This installs two console scripts: dftracer-mcp-server and dftracer-install-skills.
The package bundles a set of agent skills (Claude Code / Goose SKILL.md files) that teach the AI agent how to use dftracer effectively. After pip install, copy them into your project so the agent harness can find them:
# Copy skills into ./.agents/skills/ (current directory)
dftracer-install-skills
# Copy into a specific project directory
dftracer-install-skills /path/to/my-project
# Replace any existing skills with the packaged versions
dftracer-install-skills --overwrite
# Print the path to the bundled skills without copying
dftracer-install-skills --listSkills are installed to <target>/.agents/skills/ and cover:
| Skill | Purpose |
|---|---|
dftracer-pipeline |
Full annotation + trace pipeline workflow |
dftracer-annotate-c / cpp / python |
Per-language annotation rules |
dftracer-trace-utils |
When and how to use MCP trace tools |
dftracer-install |
Installation and configuration |
dftracer-io-optimization |
I/O bottleneck analysis and optimization |
dftracer-lessons / dftracer-pitfalls |
Hard-won annotation lessons and common mistakes |
dftracer-cheatsheet |
Quick reference for dftracer macros and APIs |
| … | 18 skills total |
You can also locate or install skills programmatically:
from dftracer_agents import bundled_skills_dir, install_skills
# Path to the skills inside the installed package
path = bundled_skills_dir()
# Copy to a directory (skips existing by default)
install_skills("/path/to/project", overwrite=False)The full annotation → trace → diagnosis → optimization pipeline is documented with a Mermaid flowchart:
It covers every MCP tool call in order, which sub-service owns each one, the workspace directory layout, and how per-file annotation parallelism and L1 optimization iterations work.
Tests call the real MCP tool functions and compare output against direct subprocess calls — no mocking.
source venv/bin/activate
cd dftracer-agents
# Run everything
python -m pytest test/ -v
# Run only dfanalyzer service tests
python -m pytest test/test_dfanalyzer_service.py -v
# Run only dftracer_utils tool tests
python -m pytest test/test_dftracer_utils_mcp_tools.py -v
# Run a single test by name
python -m pytest test/test_dfanalyzer_service.py::test_summarize_trace_on_sample_data -vExample output:
test/test_dfanalyzer_service.py::test_hydra_args_minimal PASSED
test/test_dfanalyzer_service.py::test_summarize_trace_on_sample_data PASSED
summarize_trace output:
DFTracer Trace Summary: test/data/cm1_1_48_20240926
============================================================
Trace files : 48 total
Total events: 284,041
Processes : 48
Threads : 48
Duration : 145.364s
Bytes read : 2.7 MB
Bytes written: 688.3 MB
Top I/O operations:
write 112,353
__xstat 46,899
fclose 25,992
open 21,814
close 21,563
...
test/test_dftracer_utils_mcp_tools.py::test_pgzip_on_empty_dir_succeeds PASSED
test/test_dftracer_utils_mcp_tools.py::test_tool_and_subprocess_fail_consistently[info] PASSED
...
33 passed, 3 skipped
The 3 skipped tests are server (blocks forever), aggregator_mpi, and call_tree_mpi (require an MPI build).
Each parametrized test in test_dftracer_utils_mcp_tools.py runs both the direct binary and the MCP tool function, then asserts both produce the same outcome (both succeed or both fail). This catches drift between the MCP wrapper and the underlying binary.
test_tool_and_subprocess_fail_consistently[info]
[info] segfaults on this platform (io_uring probe failure)
direct: dftracer_info -d test/data/cm1_1_48_20240926 --query summary
rc=-11
mcp tool (info): raised=True
CalledProcessError(returncode=-11)
both failed — direct rc=-11, mcp rc=-11 ✓
A text REPL for manual tool exploration — useful for debugging tools without writing test code.
source venv/bin/activate
# All services: utils + analyzer + plot (28 tools, default)
python test/mcp_repl.py --service both
# dftracer_utils only (21 tools)
python test/mcp_repl.py --service utils
# dfanalyzer + plot (7 tools)
python test/mcp_repl.py --service analyzerExample session:
Starting MCP server (both)…
============================================================
dftracer-agents MCP REPL
============================================================
28 tools available. Type 'list' to see them.
Syntax: <tool_name> [<json-args>]
Example: info {"directory": "/path/to/traces"}
Commands: list, desc <tool>, quit
============================================================
mcp> list
aggregator Aggregate trace files …
analyze Analyze an I/O trace using dfanalyzer …
info Show a summary of trace files in a directory …
summarize_trace Summarize a dftracer I/O trace directory …
…
mcp> desc summarize_trace
Tool: summarize_trace
Parameters:
trace_path: string (required)
max_files: integer default=50
mcp> summarize_trace {"trace_path": "/path/to/traces"}
→ calling summarize_trace({'trace_path': '/path/to/traces'}) ...
DFTracer Trace Summary: /path/to/traces
============================================================
Trace files : 48 total
Total events: 284,041
Processes : 48
Duration : 145.364s
Bytes read : 2.7 MB
Bytes written: 688.3 MB
…
mcp> quit
test/mcp_agent.py connects the MCP server to a local LLM via the OpenAI-compatible API. You ask questions in plain English; the LLM decides which tools to call.
# Install and start ollama
ollama serve
ollama pull qwen2.5-coder:7bEdit test/openai_client_config.json:
{
"provider": "openai",
"base_url": "http://localhost:11434/v1",
"api_key": "ollama",
"model": "qwen2.5-coder:7b"
}If running inside Docker and ollama is on the host:
{
"base_url": "http://host.docker.internal:11434/v1",
"api_key": "ollama",
"model": "qwen2.5-coder:7b"
}source venv/bin/activate
# All services: utils + analyzer + plot (28 tools)
python test/mcp_agent.py --service both
# dfanalyzer + plot (7 tools)
python test/mcp_agent.py --service analyzer
# dftracer_utils only (21 tools)
python test/mcp_agent.py --service utils
# Custom LLM config file
python test/mcp_agent.py --config /path/to/config.jsonStarting MCP server (both)…
Connecting to LLM at http://localhost:11434/v1 (model=qwen2.5-coder:7b)
============================================================
dftracer-agents LLM Agent
============================================================
Model : qwen2.5-coder:7b
Tools : 28 MCP tools available
Type your question in plain English. Ctrl-C or 'quit' to exit.
============================================================
you> summarize the trace at /data/cm1_trace
[tool] summarize_trace({"trace_path":"/data/cm1_trace"})
[result] DFTracer Trace Summary: /data/cm1_trace …
agent> The trace contains 48 processes, ran for 145 seconds, and performed
112,353 write calls transferring 688 MB. The dominant operation was
write (40% of all calls), followed by stat and open/close pairs typical
of HPC checkpoint I/O.
you> which files were accessed most?
[tool] analyze({"trace_path":"/data/cm1_trace","view_types":["file_name"]})
…
Goose is an open-source AI agent that natively supports MCP servers as extensions.
curl -fsSL https://github.com/aaif-goose/goose/releases/download/stable/download_cli.sh \
| CONFIGURE=false GOOSE_BIN_DIR=/usr/local/bin bashNode.js ≥ 20 is required for goose tui. Install it if not present:
curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
apt-get install -y nodejsAdd to ~/.config/goose/config.yaml:
extensions:
dftracer:
type: stdio
cmd: dftracer-mcp-server
args: []
enabled: true
description: >
DFTracer I/O tracing toolkit. Use summarize_trace() for a quick overview
of any trace directory, analyze() for full dfanalyzer pipeline output,
or the dftracer_utils tools (info, stats, merge, split, …) for file-level
operations.If dftracer-mcp-server is not on your PATH (e.g. installed in a venv), point directly at the Python interpreter:
extensions:
dftracer:
type: stdio
cmd: /path/to/dftracer-agents/venv/bin/python
args:
- /path/to/dftracer-agents/dftracer_mcp_server.py
enabled: trueExpose only one service if you want a smaller tool set:
extensions:
dftracer:
type: stdio
cmd: dftracer-mcp-server
args: [--service, analyzer] # or: --service utils
enabled: trueThe annotation pipeline clones a target application, detects languages, auto-annotates source files with dftracer macros, verifies correctness with a build + smoke test, pauses for your review, then collects and analyzes traces.
Pipeline files live in dftracer-agents/recipes/:
| File | Purpose |
|---|---|
pipeline.yaml |
Main orchestration recipe (headless) |
annotate-c.yaml |
Per-file C annotation sub-recipe |
annotate-cpp.yaml |
Per-file C++ annotation sub-recipe |
annotate-python.yaml |
Per-file Python annotation sub-recipe |
_inc-top.inc |
Shared: Step 0 (lessons), General Rules, Step 1 (read) |
_inc-write.inc |
Shared: Step 5 (write back) |
_inc-report.inc |
Shared: Step 7 (report format) + General Pitfalls |
A lessons-learned file at
.agents/skills/dftracer-annotation-lessons/SKILL.md is read by every
annotation agent at startup and appended after each session.
Runs interactively: the agent asks you questions, pauses for confirmation before the trace run, and lets you request fixes between steps.
# Prerequisites: Node.js >=20 (for goose tui, if using that route)
# The recommended interactive method uses the pipeline skill + wrapper script:
./dftracer-agents/run-pipeline.shThis starts a goose session, injects the startup prompt automatically,
then connects your keyboard for the Q&A. The agent will ask:
What is the Git URL of the application you want to annotate?
> https://github.com/org/myapp
Which branch or tag? (default: main)
> main
Smoke test command? (leave blank to auto-detect)
> make test
Extra CMake build flags? (leave blank to skip)
>
After annotation and build verification it pauses:
┌─────────────────────────────────────────────────────────┐
│ ANNOTATION REPORT — myapp/20260616_120000 │
│ src/io.c DONE 12 annotated 2 skipped │
│ src/main.c DONE 4 annotated 0 skipped │
│ Coverage: 16 / 18 io=10 comm=4 mem=2 cpu=0 │
│ Build: PASSED Smoke test: PASSED │
└─────────────────────────────────────────────────────────┘
Proceed with dftracer trace run? [yes / no / fix <file>]
> yes
Alternatively, start a plain goose session and paste this prompt manually:
Load and follow the dftracer annotation pipeline skill from:
/workspaces/dftracer-agents/.agents/skills/dftracer-pipeline/SKILL.md
Read the full SKILL.md file now, then immediately start Step 1 by asking me:
'What is the Git URL of the application you want to annotate?'
Pass all inputs as --params. The pipeline runs end-to-end unattended,
annotating files in parallel via sub-recipes, then prints the full report
and trace analysis.
goose run --recipe dftracer-agents/recipes/pipeline.yaml \
--params app_url="https://github.com/org/myapp" \
--params ref="main" \
--params smoke_cmd="make test" \
--params extra_flags="-DENABLE_MPI=ON"smoke_cmd and extra_flags are optional (auto-detected if omitted).
You can also invoke a single-file annotation recipe without the pipeline:
goose run --recipe dftracer-agents/recipes/annotate-c.yaml \
--params run_id="myapp/20260616_120000" \
--params filepath="src/io.c"Pass build_errors to fix a specific compiler failure:
goose run --recipe dftracer-agents/recipes/annotate-c.yaml \
--params run_id="myapp/20260616_120000" \
--params filepath="src/io.c" \
--params build_errors="io.c:42: error: 'DFTRACER_C_FUNCTION_END' undeclared"goose sessionGoose is running! Enter your instructions, or try asking for help.
( O )> summarize the I/O trace at /data/cm1_1_48_20240926
─── Tool use: summarize_trace ────────────────────────────────────
{
"trace_path": "/data/cm1_1_48_20240926"
}
──────────────────────────────────────────────────────────────────
DFTracer Trace Summary: /data/cm1_1_48_20240926
============================================================
Trace files : 48 total
Total events: 284,041
Processes : 48
Threads : 48
Duration : 145.364s
Bytes read : 2.7 MB
Bytes written: 688.3 MB
Top I/O operations:
write 112,353
__xstat 46,899
fclose 25,992
open 21,814
close 21,563
fopen64 20,496
read 13,489
opendir 9,215
This trace shows a write-heavy HPC workload (CM1 atmospheric model) with
48 MPI ranks. The high write volume (688 MB) relative to reads (2.7 MB)
is typical of checkpoint I/O. The __xstat and opendir calls suggest
frequent directory polling, common in parallel file systems.
( O )> compress all the .pfw files in that directory
─── Tool use: pgzip ──────────────────────────────────────────────
{
"directory": "/data/cm1_1_48_20240926"
}
──────────────────────────────────────────────────────────────────
…
The annotation pipeline prompt and the analysis workflow below can be combined —
after the pipeline completes, ask Goose to continue with analyze and query
on the trace directory it just produced.
When working with a new trace, follow this sequence — the agent will do this automatically if you ask it to "analyse" a trace:
1. detect_preset(trace_path)
→ reads event categories from the trace
→ returns "posix" (HPC/scientific) or "dlio" (AI/ML deep learning)
2. summarize_trace(trace_path)
→ pure-Python overview: file count, duration, bytes read/written, top ops
→ always works, no native C++ required
3. query(trace_path, view_type="proc_name")
→ per-process I/O breakdown (who writes the most?)
4. query(trace_path, view_type="file_name",
filter_expr='cat == "POSIX" and dur > 1000')
→ hot-file analysis (which files take the longest per call?)
5. query(trace_path, view_type="time_range")
→ I/O rate over time (when does the burst happen?)
6. analyze(trace_path, analyzer_preset=<from step 1>)
→ full dfanalyzer pipeline (requires native C++ and dfanalyzer install)
AI/ML auto-detection (detect_preset): inspects the cat field of every event against the AI/ML category signatures from dftracer's ai_common.py. If any of COMPUTE, DATA, DATALOADER, COMM, DEVICE, CHECKPOINT, or PIPELINE categories are present — or AI/ML function names like forward, backward, epoch, fetch — the dlio preset is recommended; otherwise posix.
| Tool | Service | Description |
|---|---|---|
detect_preset |
analyzer | Auto-detect posix vs dlio preset from trace event categories |
summarize_trace |
analyzer | Pure-Python trace summary (always works, no native deps) |
query |
analyzer | Exploratory groupby views: file_name, proc_name, time_range, raw |
analyze |
analyzer | Full dfanalyzer pipeline (POSIX, DLIO, Darshan presets) |
list_presets |
analyzer | Show all dfanalyzer configuration options |
plot |
plot | Generate a chart from trace data (5 types, PNG/HTML/SVG) |
plot_all |
plot | Generate all 5 standard plots in one call |
info |
utils | Trace directory summary |
stats |
utils | Per-function I/O statistics |
merge |
utils | Merge multiple trace files |
split |
utils | Split a trace by process/time |
pgzip |
utils | Parallel gzip compression |
index |
utils | Build bloom-filter index for fast queries |
reader |
utils | Read and dump trace events |
organize |
utils | Organize traces by run |
replay |
utils | Replay trace I/O for benchmarking |
tar |
utils | Archive trace directories |
| … | utils | 21 dftracer_utils tools total |
The filter_expr parameter uses the same DSL as dftracer_stats --query. Available fields:
| Field | Type | Example |
|---|---|---|
cat |
string | cat == "POSIX" |
name |
string | name in ("read", "write") |
dur |
int (µs) | dur > 1000 |
ts |
int (µs) | — |
pid, tid |
int | pid == 3537780 |
Expressions can be combined with and / or:
'cat == "POSIX" and dur > 500'
'name in ("read", "write") and dur > 10000'
'cat == "COMPUTE"'
You can also run the server manually for debugging or to wire it into any MCP client:
# Start the server (it blocks, reading MCP requests from stdin)
dftracer-mcp-server --service both
# Or via Python
python dftracer_mcp_server.py --service analyzerThe server writes MCP JSON-RPC to stdout and reads from stdin. Any MCP client (Goose, Claude Desktop, a custom script) can connect by spawning it as a subprocess.