DFDiagnoser turns DFAnalyzer's per-window analysis facts into longitudinal diagnosis findings — is this bottleneck persistent, getting worse, where, and what class of fix does it call for — that DFOptimizer can act on. It runs the same way offline (replay a saved fact bundle) and online (consume a live stream over Mofka or ZMQ), producing identical findings.
DFAnalyzer emits analyzer.fact-envelope.v1 facts — one per analysis window per
view (a temporal view like window/epoch/step/time_range, or a spatial view
like file_name/proc_name). Each fact carries a fact_type (e.g.
fetch_pressure), a two-level scope (layer:view aggregate, or
layer:view:entity detail), a continuous severity, and opportunity_tags.
DFDiagnoser is a pure fact consumer (scoring/fact-building live in the analyzer).
It tracks each (fact_type, scope) along its temporal axis and summarizes the
trajectory:
- persistence — longest run of consecutive windows the fact appears in
- prevalence — fraction of windows it appears in
- trend — improving / stable / worsening
- motif — the classified pattern (e.g.
persistent_pressure,metadata_bound) - recommendation_bundle + opportunity_tags — the class of fix (e.g.
input_pipeline_tuning)
Temporal views (window online; epoch/step/time_range offline) are the
longitudinal axis; spatial views (file/proc/host) are one-shot. Online, it can
also emit per-window control findings so the optimizer acts on fresh state.
pip install dfdiagnoser # core (offline)
pip install "dfdiagnoser[streaming]" # + online transports (pyzmq / mofka)From source:
git clone https://github.com/LLNL/dfdiagnoser.git && cd dfdiagnoser
uv sync && uv pip install -e .The CLI is a Hydra app selecting an input (file / mofka / zmq) and an
output (console / file).
A minimal time_range rule (offline rules are workload-specific; the shipped
dlio.yaml rules target the streaming epoch axis). Save as /tmp/tr.yaml:
schema_version: analysisfact-rules.v1
defaults: {rule_version: "1.0.0", emit_mode: aggregate, confidence: "0.80"}
rules:
- id: tr.reader_pressure.v1
priority: 100
source_view: time_range
fact_type: reader_pressure
required_metrics: [reader_posix_time_proc_max, app_time_proc_max]
derived_metrics:
reader_frac: "fillna0(reader_posix_time_proc_max) / max(fillna0(app_time_proc_max), 1e-9)"
when: "reader_frac >= 0.10"
severity_score: "clip01(reader_frac)"
opportunity_tags: [dataloader_prefetch, reader_parallelism]# 1. DFAnalyzer writes a fact bundle (facts.jsonl) with output=file. time_range is
# the offline longitudinal axis (epoch/window come from the streaming path below).
dfanalyzer analyzer/preset=dlio trace_path=tests/data/extracted/dftracer-dlio \
view_types=[time_range] facts.enabled=true \
facts.eval_rule_file=/tmp/tr.yaml output=file output.path=/tmp/bundle
# 2. DFDiagnoser replays it into findings
dfdiagnoser input=file input.path=/tmp/bundle output=console╭─ DFDiagnoser Diagnosis ─╮
│ Findings 1 │
│ Scopes 1 │
│ Severity critical: 1 │
╰─────────────────────────╯
Findings
└── view_type: time_range (1)
└── time_range (1)
└── [C] reader_pressure: unclassified (severity critical 1.00, conf 0.50)
├── prevalence 0.93, persistence 39, trend stable -> investigate
└── (fact) reader_pressure @ time_range
(Read it as: reader_pressure was critical across 39 consecutive time windows;
its opportunity_tags carry the fix class the optimizer acts on.)
Streaming windows the event flow, so each window carries epoch/window coordinates —
the per-epoch longitudinal axis. DFAnalyzer streams fact envelopes with output=zmq;
DFDiagnoser consumes them live and prints the summary on idle (or, with
input.output_address set, streams control findings onward to the optimizer):
# DFDiagnoser binds and waits for facts
dfdiagnoser input=zmq input.address="tcp://*:5556" output=console
# DFAnalyzer side (separate process) pushes window-view facts to it
dfanalyzer analyzer/preset=dlio input=zmq input.address="tcp://*:5555" \
view_types=[window] facts.enabled=true \
output=zmq output.address="tcp://127.0.0.1:5556"Findings
└── view_type: window (1)
└── window (1)
└── [C] fetch_pressure: persistent_pressure (severity critical 1.00, conf 0.80)
├── prevalence 1.00, persistence 18, trend stable -> input_pipeline_tuning
└── (fact) fetch_pressure @ window
dfdiagnoser input=mofka \
input.group_file="$MOFKA_GROUP_FILE" \
input.topic_name=analyzer_facts \
input.output_topic=diagnosis_findings \
output=consoleAll three paths yield the same DiagnosisResult.findings; only the transport differs.
- Input:
analyzer.fact-envelope.v1envelopes — a.jsonlfile / bundle dir (offline), or a Mofka/ZMQ stream (online). - Output: findings rendered to the console, written to JSON (
output=file), or streamed onward. Each finding's wire form carriesfinding_type,scope,motif,severity_score,prevalence,persistence,trend_direction,opportunity_tags,recommendation_bundle, andkey_metrics— the fields DFOptimizer gates actuation on.
- Python >= 3.9
- DFAnalyzer fact envelopes (run DFAnalyzer with
facts.enabled=true) - Online transports:
pyzmq(ZMQ) ormochi-mofka(Mofka), via the[streaming]extra
MIT — see LICENSE.