You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
CLI that diagnoses Ubuntu system problems. Collects evidence from journald,
dpkg/apt, snap, dmesg, AppArmor, hardware, disk, etc., correlates them
deterministically, and uses a local LLM to explain the top hypotheses
in plain English.
The full design is in docs/plan.md. Read it before making
architectural changes.
Collectors live in src/ubuntu_doctor/collectors// —
one folder per data source. Each is async, pure, has a degradation mode,
exposes COLLECTOR = MySourceCollector() from plugin.py.
Analyzers live in src/ubuntu_doctor/analyzers// —
rule-based correlators that consume a Snapshot and emit Hypothesis
objects with evidence pointers, suggested commands (as text), and risks.
LLM is reached over OpenAI-compatible HTTP at
http://localhost:8336/v1 (Ubuntu Inference Snap). Default model:
gemma:e4b (the local endpoint accepts either gemma:e4b or the
canonical id gemma4-e4b-q4-k-m — both route to the same loaded
model). Endpoint and model are config knobs (--base-url, --model).
Hard rules
Read-only. No collector and no analyzer ever writes to the system.
The LLM never executes commands. Fixes are rendered as copy-pasteable
text, never run.
No --apply in v1. Don't add it until v2; when added, it must
re-prompt per command with diff/effect.
Never require root for basic operation. Collectors that can't read
a source emit a DegradationReport carrying the exact sudo command
that would unlock it. The summary surfaces these — no silent omission.
One LLM call per ubuntu-doctor run by default.--deep allows
follow-up calls. Multi-agent fan-out per plugin is explicitly not the
design — local inference serialises requests, so N agents = N × latency
with no accuracy gain.
Rules first, LLM second. A deterministic correlator produces the
candidate set. The LLM ranks/explains; it does not invent hypotheses.
ubuntu-doctor --no-ai must remain useful on its own.
No generic man-page / /usr/share/doc RAG. Retrieval is
event-driven: changelogs, NEWS, AppArmor profile diffs, apport reports,
kernel taint, local incident memory. Scoped to the incident window.
# install in editable mode with dev deps
pip install -e '.[dev]'# run the CLI (currently --no-ai only)
ubuntu-doctor --no-ai
ubuntu-doctor --no-ai --json
ubuntu-doctor --no-ai --since 7d
# run tests
pytest -q
Status
v1 collector + analyzer set complete. Landed:
Collectors (10): dpkg_history, systemd_failed, dmesg,
journald, apt_log, snap_changes, apparmor_audit,
hardware, diskspace, cache_state. The last four are
fact-only (no events) — they populate snapshot.facts[<id>].
LLM client against the Ubuntu Inference Snap with lenient JSON
parsing, graceful degradation when the endpoint is unreachable,
and a hard safety filter (FORBIDDEN_FIX_PATTERNS) that strips
aa-complain/aa-disable/rm -rf //dd of=/dev/... etc. from
model-proposed fix_commands and surfaces the strip as a risk
Symptom-keyword ranker.py that
re-ranks deterministic hypotheses for ubuntu-doctor why <symptom>
RAG layer that retrieves changelog
entries between two versions, NEWS files, AppArmor profile bodies
from /etc/apparmor.d and /var/lib/snapd/apparmor/profiles, and
apport reports matched by ExecutablePath. Scoped per-hypothesis,
deduplicated across hypotheses, capped at ~2KB/snippet.
Feedback store — SQLite-backed local
incident memory at ~/.local/share/ubuntu-doctor/incidents.db.
Fingerprint is a set of canonical tokens (analyzer ids, event
kinds, subjects); similarity is Jaccard, not vector — no extra
dependency. Past incidents above 0.2 similarity are injected into
the LLM prompt as few-shot context. ~/.cache/ubuntu-doctor/last_run.json
cache carries the most recent diagnosis so ubuntu-doctor feedback knows
what to talk about.
CLI: ubuntu-doctor (passive), ubuntu-doctor why <symptom> (active), and
ubuntu-doctor feedback (interactive incident recorder). All accept
--no-ai, --json, --since, --model, --base-url,
--llm-timeout; diagnose modes additionally accept --no-rag and
--no-history for opt-out of the new subsystems.
Open items: --apply mode (apply fixes interactively, not in v1),
MCP collector server, community incident corpus, snap distribution.
See docs/plan.md.
Window semantics:--since filters historical events (package
upgrades, past service failures). It does NOT filter
current-state facts like "this unit is currently failed" — those are
emitted regardless of when they last exited. The systemd_failed
collector documents this explicitly; new state-fact collectors should
follow the same pattern.