dftracer-agents

MCP (Model Context Protocol) server that exposes the dftracer I/O tracing toolkit as tools for LLM agents.

Includes three services:

dftracer_utils — 21 tools wrapping the dftracer_* CLI binaries (info, stats, merge, split, index, reader, pgzip, tar, organize, replay, …)
dfanalyzer — 5 tools for trace analysis (analyze, summarize_trace, detect_preset, query, list_presets)
dftracer_plot — 2 tools for generating charts from trace data (plot, plot_all)

The server speaks the MCP stdio transport and works with any MCP-compatible agent (Goose, Claude Desktop, custom agents).

Setup

git clone <this-repo>
cd dftracer-agents

python -m venv venv
source venv/bin/activate

pip install -e .

This installs two console scripts: dftracer-mcp-server and dftracer-install-skills.

Installing agent skills

The package bundles a set of agent skills (Claude Code / Goose SKILL.md files) that teach the AI agent how to use dftracer effectively. After pip install, copy them into your project so the agent harness can find them:

# Copy skills into ./.agents/skills/ (current directory)
dftracer-install-skills

# Copy into a specific project directory
dftracer-install-skills /path/to/my-project

# Replace any existing skills with the packaged versions
dftracer-install-skills --overwrite

# Print the path to the bundled skills without copying
dftracer-install-skills --list

Skills are installed to <target>/.agents/skills/ and cover:

Skill	Purpose
`dftracer-pipeline`	Full annotation + trace pipeline workflow
`dftracer-annotate-c` / `cpp` / `python`	Per-language annotation rules
`dftracer-trace-utils`	When and how to use MCP trace tools
`dftracer-install`	Installation and configuration
`dftracer-io-optimization`	I/O bottleneck analysis and optimization
`dftracer-lessons` / `dftracer-pitfalls`	Hard-won annotation lessons and common mistakes
`dftracer-cheatsheet`	Quick reference for dftracer macros and APIs
…	18 skills total

You can also locate or install skills programmatically:

from dftracer_agents import bundled_skills_dir, install_skills

# Path to the skills inside the installed package
path = bundled_skills_dir()

# Copy to a directory (skips existing by default)
install_skills("/path/to/project", overwrite=False)

Pipeline

The full annotation → trace → diagnosis → optimization pipeline is documented with a Mermaid flowchart:

docs/pipeline.md

It covers every MCP tool call in order, which sub-service owns each one, the workspace directory layout, and how per-file annotation parallelism and L1 optimization iterations work.

Running the tests

Tests call the real MCP tool functions and compare output against direct subprocess calls — no mocking.

source venv/bin/activate
cd dftracer-agents

# Run everything
python -m pytest test/ -v

# Run only dfanalyzer service tests
python -m pytest test/test_dfanalyzer_service.py -v

# Run only dftracer_utils tool tests
python -m pytest test/test_dftracer_utils_mcp_tools.py -v

# Run a single test by name
python -m pytest test/test_dfanalyzer_service.py::test_summarize_trace_on_sample_data -v

Example output:

test/test_dfanalyzer_service.py::test_hydra_args_minimal PASSED
test/test_dfanalyzer_service.py::test_summarize_trace_on_sample_data PASSED

  summarize_trace output:
  DFTracer Trace Summary: test/data/cm1_1_48_20240926
  ============================================================
    Trace files : 48 total
    Total events: 284,041
    Processes   : 48
    Threads     : 48
    Duration    : 145.364s
    Bytes read  : 2.7 MB
    Bytes written: 688.3 MB

  Top I/O operations:
    write                  112,353
    __xstat                 46,899
    fclose                  25,992
    open                    21,814
    close                   21,563
    ...

test/test_dftracer_utils_mcp_tools.py::test_pgzip_on_empty_dir_succeeds PASSED
test/test_dftracer_utils_mcp_tools.py::test_tool_and_subprocess_fail_consistently[info] PASSED
...
33 passed, 3 skipped

The 3 skipped tests are server (blocks forever), aggregator_mpi, and call_tree_mpi (require an MPI build).

What the tests verify

Each parametrized test in test_dftracer_utils_mcp_tools.py runs both the direct binary and the MCP tool function, then asserts both produce the same outcome (both succeed or both fail). This catches drift between the MCP wrapper and the underlying binary.

test_tool_and_subprocess_fail_consistently[info]
  [info] segfaults on this platform (io_uring probe failure)
  direct: dftracer_info -d test/data/cm1_1_48_20240926 --query summary
    rc=-11
  mcp tool (info): raised=True
    CalledProcessError(returncode=-11)
  both failed — direct rc=-11, mcp rc=-11  ✓

Interactive MCP REPL

A text REPL for manual tool exploration — useful for debugging tools without writing test code.

source venv/bin/activate

# All services: utils + analyzer + plot (28 tools, default)
python test/mcp_repl.py --service both

# dftracer_utils only (21 tools)
python test/mcp_repl.py --service utils

# dfanalyzer + plot (7 tools)
python test/mcp_repl.py --service analyzer

Example session:

Starting MCP server (both)…
============================================================
  dftracer-agents MCP REPL
============================================================
  28 tools available.  Type 'list' to see them.
  Syntax:  <tool_name> [<json-args>]
  Example: info {"directory": "/path/to/traces"}
  Commands: list, desc <tool>, quit
============================================================

mcp> list
  aggregator              Aggregate trace files …
  analyze                 Analyze an I/O trace using dfanalyzer …
  info                    Show a summary of trace files in a directory …
  summarize_trace         Summarize a dftracer I/O trace directory …
  …

mcp> desc summarize_trace
Tool: summarize_trace
Parameters:
  trace_path: string (required)
  max_files:  integer  default=50

mcp> summarize_trace {"trace_path": "/path/to/traces"}
  → calling summarize_trace({'trace_path': '/path/to/traces'}) ...

DFTracer Trace Summary: /path/to/traces
============================================================
  Trace files : 48 total
  Total events: 284,041
  Processes   : 48
  Duration    : 145.364s
  Bytes read  : 2.7 MB
  Bytes written: 688.3 MB
  …

mcp> quit

LLM agent (ollama)

test/mcp_agent.py connects the MCP server to a local LLM via the OpenAI-compatible API. You ask questions in plain English; the LLM decides which tools to call.

Prerequisites

# Install and start ollama
ollama serve
ollama pull qwen2.5-coder:7b

Configure the LLM endpoint

Edit test/openai_client_config.json:

{
  "provider": "openai",
  "base_url": "http://localhost:11434/v1",
  "api_key": "ollama",
  "model": "qwen2.5-coder:7b"
}

If running inside Docker and ollama is on the host:

{
  "base_url": "http://host.docker.internal:11434/v1",
  "api_key": "ollama",
  "model": "qwen2.5-coder:7b"
}

Run

source venv/bin/activate

# All services: utils + analyzer + plot (28 tools)
python test/mcp_agent.py --service both

# dfanalyzer + plot (7 tools)
python test/mcp_agent.py --service analyzer

# dftracer_utils only (21 tools)
python test/mcp_agent.py --service utils

# Custom LLM config file
python test/mcp_agent.py --config /path/to/config.json

Example session

Starting MCP server (both)…
Connecting to LLM at http://localhost:11434/v1 (model=qwen2.5-coder:7b)

============================================================
  dftracer-agents LLM Agent
============================================================
  Model : qwen2.5-coder:7b
  Tools : 28 MCP tools available
  Type your question in plain English.  Ctrl-C or 'quit' to exit.
============================================================

you> summarize the trace at /data/cm1_trace

  [tool] summarize_trace({"trace_path":"/data/cm1_trace"})
  [result] DFTracer Trace Summary: /data/cm1_trace …

agent> The trace contains 48 processes, ran for 145 seconds, and performed
       112,353 write calls transferring 688 MB. The dominant operation was
       write (40% of all calls), followed by stat and open/close pairs typical
       of HPC checkpoint I/O.

you> which files were accessed most?

  [tool] analyze({"trace_path":"/data/cm1_trace","view_types":["file_name"]})
  …

Goose integration

Goose is an open-source AI agent that natively supports MCP servers as extensions.

Install Goose

curl -fsSL https://github.com/aaif-goose/goose/releases/download/stable/download_cli.sh \
  | CONFIGURE=false GOOSE_BIN_DIR=/usr/local/bin bash

Node.js ≥ 20 is required for goose tui. Install it if not present:

curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
apt-get install -y nodejs

Configure the dftracer extension

Add to ~/.config/goose/config.yaml:

extensions:
  dftracer:
    type: stdio
    cmd: dftracer-mcp-server
    args: []
    enabled: true
    description: >
      DFTracer I/O tracing toolkit. Use summarize_trace() for a quick overview
      of any trace directory, analyze() for full dfanalyzer pipeline output,
      or the dftracer_utils tools (info, stats, merge, split, …) for file-level
      operations.

If dftracer-mcp-server is not on your PATH (e.g. installed in a venv), point directly at the Python interpreter:

extensions:
  dftracer:
    type: stdio
    cmd: /path/to/dftracer-agents/venv/bin/python
    args:
      - /path/to/dftracer-agents/dftracer_mcp_server.py
    enabled: true

Expose only one service if you want a smaller tool set:

extensions:
  dftracer:
    type: stdio
    cmd: dftracer-mcp-server
    args: [--service, analyzer]   # or: --service utils
    enabled: true

Using Goose to annotate an application and run the full pipeline

The annotation pipeline clones a target application, detects languages, auto-annotates source files with dftracer macros, verifies correctness with a build + smoke test, pauses for your review, then collects and analyzes traces.

Pipeline files live in dftracer-agents/recipes/:

File	Purpose
`pipeline.yaml`	Main orchestration recipe (headless)
`annotate-c.yaml`	Per-file C annotation sub-recipe
`annotate-cpp.yaml`	Per-file C++ annotation sub-recipe
`annotate-python.yaml`	Per-file Python annotation sub-recipe
`_inc-top.inc`	Shared: Step 0 (lessons), General Rules, Step 1 (read)
`_inc-write.inc`	Shared: Step 5 (write back)
`_inc-report.inc`	Shared: Step 7 (report format) + General Pitfalls

A lessons-learned file at .agents/skills/dftracer-annotation-lessons/SKILL.md is read by every annotation agent at startup and appended after each session.

Option A — Interactive session (recommended)

Runs interactively: the agent asks you questions, pauses for confirmation before the trace run, and lets you request fixes between steps.

# Prerequisites: Node.js >=20 (for goose tui, if using that route)
# The recommended interactive method uses the pipeline skill + wrapper script:

./dftracer-agents/run-pipeline.sh

This starts a goose session, injects the startup prompt automatically, then connects your keyboard for the Q&A. The agent will ask:

What is the Git URL of the application you want to annotate?
> https://github.com/org/myapp

Which branch or tag? (default: main)
> main

Smoke test command? (leave blank to auto-detect)
> make test

Extra CMake build flags? (leave blank to skip)
>

After annotation and build verification it pauses:

┌─────────────────────────────────────────────────────────┐
│  ANNOTATION REPORT — myapp/20260616_120000              │
│  src/io.c       DONE   12 annotated  2 skipped          │
│  src/main.c     DONE    4 annotated  0 skipped          │
│  Coverage: 16 / 18   io=10  comm=4  mem=2  cpu=0        │
│  Build: PASSED   Smoke test: PASSED                      │
└─────────────────────────────────────────────────────────┘

Proceed with dftracer trace run? [yes / no / fix <file>]
> yes

Alternatively, start a plain goose session and paste this prompt manually:

Load and follow the dftracer annotation pipeline skill from:
/workspaces/dftracer-agents/.agents/skills/dftracer-pipeline/SKILL.md

Read the full SKILL.md file now, then immediately start Step 1 by asking me:
  'What is the Git URL of the application you want to annotate?'

Option B — Headless `goose run` (CI / scripted use)

Pass all inputs as --params. The pipeline runs end-to-end unattended, annotating files in parallel via sub-recipes, then prints the full report and trace analysis.

goose run --recipe dftracer-agents/recipes/pipeline.yaml \
  --params app_url="https://github.com/org/myapp" \
  --params ref="main" \
  --params smoke_cmd="make test" \
  --params extra_flags="-DENABLE_MPI=ON"

smoke_cmd and extra_flags are optional (auto-detected if omitted).

Annotation recipes directly

You can also invoke a single-file annotation recipe without the pipeline:

goose run --recipe dftracer-agents/recipes/annotate-c.yaml \
  --params run_id="myapp/20260616_120000" \
  --params filepath="src/io.c"

Pass build_errors to fix a specific compiler failure:

goose run --recipe dftracer-agents/recipes/annotate-c.yaml \
  --params run_id="myapp/20260616_120000" \
  --params filepath="src/io.c" \
  --params build_errors="io.c:42: error: 'DFTRACER_C_FUNCTION_END' undeclared"

Example Goose session — trace analysis

goose session

Goose is running! Enter your instructions, or try asking for help.

( O )> summarize the I/O trace at /data/cm1_1_48_20240926

─── Tool use: summarize_trace ────────────────────────────────────
{
  "trace_path": "/data/cm1_1_48_20240926"
}
──────────────────────────────────────────────────────────────────

DFTracer Trace Summary: /data/cm1_1_48_20240926
============================================================
  Trace files : 48 total
  Total events: 284,041
  Processes   : 48
  Threads     : 48
  Duration    : 145.364s
  Bytes read  : 2.7 MB
  Bytes written: 688.3 MB

Top I/O operations:
  write                  112,353
  __xstat                 46,899
  fclose                  25,992
  open                    21,814
  close                   21,563
  fopen64                 20,496
  read                    13,489
  opendir                  9,215

This trace shows a write-heavy HPC workload (CM1 atmospheric model) with
48 MPI ranks. The high write volume (688 MB) relative to reads (2.7 MB)
is typical of checkpoint I/O. The __xstat and opendir calls suggest
frequent directory polling, common in parallel file systems.

( O )> compress all the .pfw files in that directory

─── Tool use: pgzip ──────────────────────────────────────────────
{
  "directory": "/data/cm1_1_48_20240926"
}
──────────────────────────────────────────────────────────────────
…

The annotation pipeline prompt and the analysis workflow below can be combined — after the pipeline completes, ask Goose to continue with analyze and query on the trace directory it just produced.

Recommended analysis workflow

When working with a new trace, follow this sequence — the agent will do this automatically if you ask it to "analyse" a trace:

1. detect_preset(trace_path)
   → reads event categories from the trace
   → returns "posix" (HPC/scientific) or "dlio" (AI/ML deep learning)

2. summarize_trace(trace_path)
   → pure-Python overview: file count, duration, bytes read/written, top ops
   → always works, no native C++ required

3. query(trace_path, view_type="proc_name")
   → per-process I/O breakdown (who writes the most?)

4. query(trace_path, view_type="file_name",
         filter_expr='cat == "POSIX" and dur > 1000')
   → hot-file analysis (which files take the longest per call?)

5. query(trace_path, view_type="time_range")
   → I/O rate over time (when does the burst happen?)

6. analyze(trace_path, analyzer_preset=<from step 1>)
   → full dfanalyzer pipeline (requires native C++ and dfanalyzer install)

AI/ML auto-detection (detect_preset): inspects the cat field of every event against the AI/ML category signatures from dftracer's ai_common.py. If any of COMPUTE, DATA, DATALOADER, COMM, DEVICE, CHECKPOINT, or PIPELINE categories are present — or AI/ML function names like forward, backward, epoch, fetch — the dlio preset is recommended; otherwise posix.

Available tools in Goose

Tool	Service	Description
`detect_preset`	analyzer	Auto-detect posix vs dlio preset from trace event categories
`summarize_trace`	analyzer	Pure-Python trace summary (always works, no native deps)
`query`	analyzer	Exploratory groupby views: file_name, proc_name, time_range, raw
`analyze`	analyzer	Full dfanalyzer pipeline (POSIX, DLIO, Darshan presets)
`list_presets`	analyzer	Show all dfanalyzer configuration options
`plot`	plot	Generate a chart from trace data (5 types, PNG/HTML/SVG)
`plot_all`	plot	Generate all 5 standard plots in one call
`info`	utils	Trace directory summary
`stats`	utils	Per-function I/O statistics
`merge`	utils	Merge multiple trace files
`split`	utils	Split a trace by process/time
`pgzip`	utils	Parallel gzip compression
`index`	utils	Build bloom-filter index for fast queries
`reader`	utils	Read and dump trace events
`organize`	utils	Organize traces by run
`replay`	utils	Replay trace I/O for benchmarking
`tar`	utils	Archive trace directories
…	utils	21 dftracer_utils tools total

`query` filter expression syntax

The filter_expr parameter uses the same DSL as dftracer_stats --query. Available fields:

Field	Type	Example
`cat`	string	`cat == "POSIX"`
`name`	string	`name in ("read", "write")`
`dur`	int (µs)	`dur > 1000`
`ts`	int (µs)	—
`pid`, `tid`	int	`pid == 3537780`

Expressions can be combined with and / or:

'cat == "POSIX" and dur > 500'
'name in ("read", "write") and dur > 10000'
'cat == "COMPUTE"'

MCP server directly

You can also run the server manually for debugging or to wire it into any MCP client:

# Start the server (it blocks, reading MCP requests from stdin)
dftracer-mcp-server --service both

# Or via Python
python dftracer_mcp_server.py --service analyzer

The server writes MCP JSON-RPC to stdout and reads from stdin. Any MCP client (Goose, Claude Desktop, a custom script) can connect by spawning it as a subprocess.

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
.agents/skills		.agents/skills
.claude		.claude
.devcontainer		.devcontainer
.goose		.goose
.opencode		.opencode
.vscode		.vscode
dftracer-agents		dftracer-agents
dftracer_agents.egg-info		dftracer_agents.egg-info
docs		docs
scripts		scripts
test		test
.env		.env
.gitignore		.gitignore
.goosehints		.goosehints
.gooseignore		.gooseignore
.mcp.json		.mcp.json
CLAUDE.md		CLAUDE.md
README.md		README.md
TODO.md		TODO.md
goose_config.yaml		goose_config.yaml
ior_dftracer_session_2026-06-19.pdf		ior_dftracer_session_2026-06-19.pdf
program.md		program.md
pyproject.toml		pyproject.toml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

dftracer-agents

Setup

Installing agent skills

Pipeline

Running the tests

What the tests verify

Interactive MCP REPL

LLM agent (ollama)

Prerequisites

Configure the LLM endpoint

Run

Example session

Goose integration

Install Goose

Configure the dftracer extension

Using Goose to annotate an application and run the full pipeline

Option A — Interactive session (recommended)

Option B — Headless `goose run` (CI / scripted use)

Annotation recipes directly

Example Goose session — trace analysis

Recommended analysis workflow

Available tools in Goose

`query` filter expression syntax

MCP server directly

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

dftracer-agents

Setup

Installing agent skills

Pipeline

Running the tests

What the tests verify

Interactive MCP REPL

LLM agent (ollama)

Prerequisites

Configure the LLM endpoint

Run

Example session

Goose integration

Install Goose

Configure the dftracer extension

Using Goose to annotate an application and run the full pipeline

Option A — Interactive session (recommended)

Option B — Headless goose run (CI / scripted use)

Annotation recipes directly

Example Goose session — trace analysis

Recommended analysis workflow

Available tools in Goose

query filter expression syntax

MCP server directly

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Option B — Headless `goose run` (CI / scripted use)

`query` filter expression syntax

Packages