Commit c68a1c9

feat(view-browser): add /log scan-log viewer with level + search filters (#107)
* feat(view-browser): add /log scan-log viewer with level + search filters

Adds a read-only `/log` route to the browser interface that surfaces the per-run `argus.log` file in the same shape as `/findings`: a filter bar (minimum level + substring search) above a scrollable monospace pane, all driven by URL query params so the page stays bookmarkable.

This closes a real triage gap. Today, answering "why did osv exclude 258 findings?" or "did clamav actually run?" requires dropping back to the terminal and grepping the log; the browser interface has no way to inspect log output at all. That is particularly costly for the personas this surface targets (owners, managers, execs), who don't keep a terminal open alongside it.

What's in this PR

- `argus/viewers/browser/log_view.py`: UI-free parser + filter helpers. Mirrors the pattern in `argus.core.findings_view` so routes, templates, and tests share one code path. Handles continuation lines (multi-line scanner stderr) by joining them onto the previous header entry.
- `/log` route on the browser app: `?scan=`, `?level=`, `?q=`. The level filter accepts DEBUG / INFO / WARNING (or the short form WARN) / ERROR / CRITICAL, case-insensitively; unrecognized values silently fall back to "no filter" rather than returning a 500 on a crafted URL.
- `/log/raw` companion route that streams the file as `text/plain` with a `Content-Disposition: attachment` header, so browsers download it for grep/diff/issue-paste workflows.
- `templates/log.html.j2`, plus a `Log` link added to every page's nav. CSS for `.log-pane` / `.log-level-*` accents (DEBUG muted, INFO accent-dim, WARN amber, ERROR red), kept restrained so a 30k-line log doesn't look like a Christmas tree.
- 27-test suite in `argus/tests/viewers/browser/test_log.py`: parser (header lines, continuations, line-number tracking, edge cases), filter (min-level matrix, search case-insensitivity, combined filters), routes (empty states, level filter, search filter, scan-param threading through nav, raw download).
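The level and search rules described above (and pinned down by the `TestFilterEntries` cases further down in this diff) can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in `log_view.py`: the rank table, the hypothetical `_canonical_level` helper, and the exact haystack composition are inferred from the PR description and the tests. The `LogEntry` fields mirror the keyword arguments the tests pass.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed ranks, matching Python's logging levels; WARN is an accepted alias.
_LEVEL_RANK = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}
_ALIASES = {"WARN": "WARNING"}


@dataclass(frozen=True)
class LogEntry:
    line_no: int
    time: str
    level: str
    logger: str
    msg: str


def _canonical_level(value: str | None) -> str | None:
    """Upper-case and resolve aliases; return None for unknown levels."""
    if not value:
        return None
    name = value.strip().upper()
    name = _ALIASES.get(name, name)
    return name if name in _LEVEL_RANK else None


def filter_entries(
    entries: list[LogEntry],
    min_level: str | None = None,
    query: str | None = None,
) -> list[LogEntry]:
    """Keep entries at or above min_level whose text contains query."""
    floor = _canonical_level(min_level)
    # An unrecognized level means "no level filter", never an error.
    threshold = _LEVEL_RANK[floor] if floor else 0
    needle = (query or "").strip().lower()
    kept = []
    for entry in entries:
        if _LEVEL_RANK.get(entry.level, 0) < threshold:
            continue
        # Search level, logger name, and message, case-insensitively.
        haystack = f"{entry.level} {entry.logger} {entry.msg}".lower()
        if needle and needle not in haystack:
            continue
        kept.append(entry)
    return kept
```

For example, `filter_entries(entries, min_level="warn", query="CONTAINER")` keeps only WARNING-and-above entries that mention "container", the same combination `test_combined_level_and_query` exercises. Falling back to threshold 0 on an unknown level is what lets a crafted `?level=bogus` URL render the full log instead of failing.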
Engine: rewords the "Native pull failed for X (Nms), retrying with --platform linux/amd64" log line. The retry virtually always succeeds for upstreams that publish amd64-only images (clamav being the canonical example), so the line read as more alarming than it deserved. New phrasing: `X: native pull unsuccessful (Nms) — auto-falling back to --platform linux/amd64 (common for upstreams without arm64 builds)`.

Roadmap (`docs/developer/SDK-ROADMAP.md`): adds Phase 3 (Scan log viewer) with the five tasks (SU through SY) checked off in this PR. Out-of-scope follow-ups (per-scanner filter chips, jump-to-error shortcuts, `<mark>` highlighting, live tail) are listed there too.

Out of scope for this MVP

- Live tail (deliberately deferred: it needs a websocket and a process watcher, which is out of proportion for a single-user localhost UI)
- Match highlighting via `<mark>` tags (filter narrowing + the browser's Cmd+F covers the current pain)
- Per-scanner filter chips
- Jump-to-first-error / jump-to-last-error keyboard shortcuts

Testing

- Full SDK suite: 1390 passed, 8 skipped, 7 deselected (+27 from the new test_log.py)
- Manual run: `argus scan supply-chain` → `argus view --interface=browser`, click `Log` in the nav; the level filter and search work end-to-end on the live argus.log file.

* fix(view-browser): parse argus.log as JSON-lines, not plain text

The /log viewer's parser assumed argus.log used the same human-readable text format the *console* handler emits (HH:MM:SS LEVEL logger msg). It doesn't: the *file* handler in ``argus.audit.logger.JsonLogFormatter`` writes structured JSON, one object per line. So the regex matched zero lines on every real log file and the viewer rendered an empty pane. Empirically, a real 11KB scan log (45 entries) parsed as 0 entries before this fix and 45 after.

Rewrites parse_log to read JSON-lines:

- One ``json.loads`` per line; malformed lines (partial flushes, e.g. when reading a log mid-scan) are silently skipped rather than causing a 500.
- Records whose ``level`` is not a Python logging level are dropped instead of being assigned an arbitrary rank that would warp filters.
- ``_extract_time`` pulls HH:MM:SS from the ISO timestamp; it tolerates microseconds and any TZ form (``+00:00``, ``-05:00``, ``Z``).
- The continuation-line concept that mattered for plain-text logs is gone: multi-line messages live inside the JSON ``message`` string, and the ``<pre>`` template renders embedded newlines as-is.

Tests: replaced the regex-format ``_SAMPLE_LOG`` with a JSON-lines fixture that mirrors what JsonLogFormatter actually writes, and added parser cases for the JSON-specific edge paths (malformed lines skipped, unknown levels dropped, missing module falls back to "argus", Z-suffix timestamp, empty lines ignored). 31 tests pass (up from 27).

LogEntry's display shape is unchanged, so the template needs no edits; only the format being parsed changed.

---------

Co-authored-by: eFAILution <eFAILution@users.noreply.github.com>
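The JSON-lines parsing rules in the fix commit above can be sketched roughly as below. This is a minimal sketch, not the actual `parse_log`: the field names (`timestamp`, `level`, `module`, `message`) come from the test fixture in this diff, while the regex-based time extraction and the `LogEntry` definition are assumptions. A regex is used here instead of `datetime.fromisoformat` because that function only accepts a trailing `Z` from Python 3.11 onward.

```python
from __future__ import annotations

import json
import re
from dataclasses import dataclass

_KNOWN_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}
# HH:MM:SS immediately after the ISO "T"; fractional seconds and any
# timezone suffix (+00:00, -05:00, Z) are simply not captured.
_TIME_RE = re.compile(r"T(\d{2}:\d{2}:\d{2})")


@dataclass(frozen=True)
class LogEntry:
    line_no: int
    time: str
    level: str
    logger: str
    msg: str


def _extract_time(timestamp: str) -> str:
    match = _TIME_RE.search(timestamp)
    return match.group(1) if match else ""


def parse_log(text: str) -> list[LogEntry]:
    entries: list[LogEntry] = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # blank lines carry no record
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # e.g. a partially flushed final line mid-scan
        if not isinstance(record, dict):
            continue
        level = str(record.get("level") or "").upper()
        if level == "WARN":
            level = "WARNING"
        if level not in _KNOWN_LEVELS:
            continue  # an arbitrary rank would warp the min-level filter
        entries.append(LogEntry(
            line_no=line_no,  # 1-based physical line in the file
            time=_extract_time(str(record.get("timestamp") or "")),
            level=level,
            logger=str(record.get("module") or "argus"),
            msg=str(record.get("message", "")),
        ))
    return entries
```

Because `line_no` counts physical lines, including skipped blank or malformed ones, it still points at the right line when a user cross-checks against the raw download.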
1 parent a3ad04a commit c68a1c9

8 files changed

Lines changed: 816 additions & 2 deletions

argus/core/engine.py

Lines changed: 7 additions & 2 deletions
@@ -538,9 +538,14 @@ def _pull_image(self, image: str) -> bool:
         elapsed = int((time.monotonic() - start) * 1000)
 
         if result.returncode != 0:
+            # Distinct from a hard "pull failed" — the retry below
+            # almost always succeeds for upstreams that publish amd64-
+            # only (clamav, etc.). Word it as a fallback so users
+            # reading the log don't misread the line as a scan failure.
             logger.info(
-                "Native pull failed for %s (%dms), retrying with "
-                "--platform linux/amd64. stderr: %s",
+                "%s: native pull unsuccessful (%dms) — auto-falling "
+                "back to --platform linux/amd64 (common for upstreams "
+                "without arm64 builds). stderr: %s",
                 image,
                 elapsed,
                 result.stderr.strip()[:200],
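For context on what this log line sits between, here is a minimal sketch of a native-then-amd64 pull fallback. It is an illustration under assumptions, not Argus's `_pull_image`: the hunk only shows `result.returncode` and `result.stderr`, so shelling out to `docker pull`, the injectable `run` parameter, and the name `pull_image` are all hypothetical. `docker pull --platform` is a real Docker CLI flag.

```python
import logging
import subprocess
import time
from typing import Callable, List

logger = logging.getLogger(__name__)

# Injected so the retry logic can be exercised without a container runtime.
Runner = Callable[[List[str]], subprocess.CompletedProcess]


def _docker(cmd: List[str]) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, capture_output=True, text=True)


def pull_image(image: str, run: Runner = _docker) -> bool:
    """Try a native-arch pull, then retry once pinned to linux/amd64."""
    start = time.monotonic()
    result = run(["docker", "pull", image])
    elapsed = int((time.monotonic() - start) * 1000)
    if result.returncode == 0:
        return True
    # INFO rather than WARNING: for amd64-only upstreams this is the
    # expected path. (Wording shortened; the hunk above has the real text.)
    logger.info(
        "%s: native pull unsuccessful (%dms), falling back to "
        "--platform linux/amd64. stderr: %s",
        image, elapsed, (result.stderr or "").strip()[:200],
    )
    retry = run(["docker", "pull", "--platform", "linux/amd64", image])
    if retry.returncode != 0:
        # Only now is it a genuine failure worth a louder level.
        logger.warning("%s: amd64 fallback pull also failed: %s",
                       image, (retry.stderr or "").strip()[:200])
        return False
    return True
```

The design point the rewording captures is that the first non-zero exit is not the failure signal; only the second one is.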
argus/tests/viewers/browser/test_log.py

Lines changed: 369 additions & 0 deletions
@@ -0,0 +1,369 @@
"""Tests for the /log viewer route + parsing/filter helpers.

Two layers of coverage:
- Pure-function tests for ``log_view.parse_log`` and
  ``log_view.filter_entries`` — no app, no fixtures.
- Route tests via FastAPI's TestClient covering the empty state, the
  level filter, the search filter, and the raw download endpoint.
"""

from __future__ import annotations

import json

import pytest

from argus.viewers.browser.log_view import (
    LogEntry,
    filter_entries,
    parse_log,
)

pytest.importorskip("fastapi")

from fastapi.testclient import TestClient  # noqa: E402

from argus.viewers.browser.app import create_app  # noqa: E402


# ───────────────────────────────────────────────
# Fixtures shared across route + parser tests
# ───────────────────────────────────────────────

# JSON-lines format — matches what JsonLogFormatter (in
# argus/audit/logger.py) actually writes to disk. The console
# handler emits human-readable HH:MM:SS lines but the file handler
# always emits structured JSON, so the parser only handles JSON.
_SAMPLE_LOG = "\n".join([
    json.dumps({
        "timestamp": "2026-05-04T07:13:58.531038+00:00",
        "level": "DEBUG", "module": "argus",
        "function": "_load_exclusions", "line": 42,
        "message": "Full exclusion set: ['node_modules', '.git']",
    }),
    json.dumps({
        "timestamp": "2026-05-04T07:13:58.612001+00:00",
        "level": "INFO", "module": "argus",
        "function": "_load_exclusions", "line": 51,
        "message": "Loaded 66 exclusion pattern(s) from .gitignore",
    }),
    json.dumps({
        "timestamp": "2026-05-04T07:13:59.001234+00:00",
        "level": "WARNING", "module": "argus",
        "function": "pull_image", "line": 542,
        "message": "Native pull failed for clamav/clamav:1.5",
    }),
    json.dumps({
        "timestamp": "2026-05-04T07:13:59.105678+00:00",
        "level": "ERROR", "module": "viewers.browser",
        "function": "_resolve_scan", "line": 99,
        "message": "Could not connect to docker.sock",
    }),
    json.dumps({
        "timestamp": "2026-05-04T07:13:59.205678+00:00",
        "level": "INFO", "module": "argus",
        "function": "_run_scanner", "line": 712,
        "message": "Scanner 'gitleaks' finished in 11722ms: 0 finding(s)",
    }),
]) + "\n"


def _sample_payload() -> dict:
    return {
        "severity_threshold": None,
        "results": [
            {
                "scanner": "bandit",
                "findings": [],
                "raw_report": None,
                "sarif_report": None,
                "metadata": {},
                "critical_count": 0,
                "high_count": 0,
                "medium_count": 0,
                "low_count": 0,
                "total_count": 0,
            },
        ],
    }


def _write_scan(tmp_path, log_contents: str | None = _SAMPLE_LOG) -> str:
    """Drop a results JSON + optional argus.log into ``tmp_path``."""
    (tmp_path / "argus-results.json").write_text(json.dumps(_sample_payload()))
    if log_contents is not None:
        (tmp_path / "argus.log").write_text(log_contents)
    return str(tmp_path)


# ───────────────────────────────────────────────
# Pure-function tests
# ───────────────────────────────────────────────


class TestParseLog:
    def test_parses_each_json_line(self):
        entries = parse_log(_SAMPLE_LOG)
        assert len(entries) == 5

    def test_canonicalizes_warn_to_warning(self):
        text = json.dumps({
            "timestamp": "2026-05-04T07:00:00+00:00",
            "level": "WARN", "module": "argus",
            "message": "short-warn form",
        }) + "\n"
        entries = parse_log(text)
        assert len(entries) == 1
        assert entries[0].level == "WARNING"

    def test_extracts_hhmmss_from_iso_timestamp(self):
        entries = parse_log(_SAMPLE_LOG)
        warning = next(e for e in entries if e.level == "WARNING")
        assert warning.time == "07:13:59"

    def test_extracts_time_with_z_suffix(self):
        text = json.dumps({
            "timestamp": "2026-05-04T09:30:15.123Z",
            "level": "INFO", "module": "argus",
            "message": "z-suffixed",
        }) + "\n"
        entries = parse_log(text)
        assert len(entries) == 1
        assert entries[0].time == "09:30:15"

    def test_skips_malformed_json_lines(self):
        # Real-world logs can have a partially-flushed final line if
        # the user reads while the scan is mid-write. Skip rather than
        # 500.
        text = (
            json.dumps({
                "timestamp": "2026-05-04T07:00:00+00:00",
                "level": "INFO", "module": "argus",
                "message": "first",
            }) + "\n"
            + "{not valid json\n"
            + json.dumps({
                "timestamp": "2026-05-04T07:00:01+00:00",
                "level": "INFO", "module": "argus",
                "message": "third",
            }) + "\n"
        )
        entries = parse_log(text)
        assert [e.msg for e in entries] == ["first", "third"]

    def test_skips_records_with_unknown_level(self):
        text = (
            json.dumps({
                "timestamp": "2026-05-04T07:00:00+00:00",
                "level": "INFO", "module": "argus", "message": "kept",
            }) + "\n"
            + json.dumps({
                "timestamp": "2026-05-04T07:00:01+00:00",
                "level": "TRACE", "module": "argus", "message": "dropped",
            }) + "\n"
        )
        entries = parse_log(text)
        assert [e.msg for e in entries] == ["kept"]

    def test_missing_module_falls_back_to_argus(self):
        text = json.dumps({
            "timestamp": "2026-05-04T07:00:00+00:00",
            "level": "INFO", "message": "no-module",
        }) + "\n"
        entries = parse_log(text)
        assert entries[0].logger == "argus"

    def test_empty_lines_ignored(self):
        text = (
            "\n\n"
            + json.dumps({
                "timestamp": "2026-05-04T07:00:00+00:00",
                "level": "INFO", "module": "argus", "message": "lonely",
            }) + "\n"
            + "\n\n"
        )
        entries = parse_log(text)
        assert len(entries) == 1
        assert entries[0].msg == "lonely"

    def test_line_no_points_at_source_line(self):
        entries = parse_log(_SAMPLE_LOG)
        # The WARNING is the 3rd entry in the sample (line 3 of the file).
        warning = next(e for e in entries if e.level == "WARNING")
        assert warning.line_no == 3

    def test_empty_text_returns_empty_list(self):
        assert parse_log("") == []


class TestFilterEntries:
    def _make(self, level: str, msg: str = "msg") -> LogEntry:
        return LogEntry(line_no=1, time="07:00:00", level=level, logger="argus", msg=msg)

    def test_min_level_excludes_below(self):
        entries = [self._make("DEBUG"), self._make("INFO"), self._make("WARNING"), self._make("ERROR")]
        result = filter_entries(entries, min_level="WARNING")
        assert {e.level for e in result} == {"WARNING", "ERROR"}

    def test_min_level_unknown_value_returns_all(self):
        entries = [self._make("DEBUG"), self._make("INFO")]
        result = filter_entries(entries, min_level="bogus")
        assert len(result) == 2

    def test_min_level_accepts_lowercase(self):
        entries = [self._make("DEBUG"), self._make("WARNING")]
        result = filter_entries(entries, min_level="warning")
        assert {e.level for e in result} == {"WARNING"}

    def test_min_level_accepts_warn_short_form(self):
        entries = [self._make("INFO"), self._make("WARNING"), self._make("ERROR")]
        result = filter_entries(entries, min_level="warn")
        assert {e.level for e in result} == {"WARNING", "ERROR"}

    def test_query_substring_matches_msg(self):
        entries = [self._make("INFO", "scanner finished"), self._make("INFO", "loading config")]
        result = filter_entries(entries, query="scanner")
        assert len(result) == 1
        assert "scanner" in result[0].msg

    def test_query_substring_is_case_insensitive(self):
        entries = [self._make("ERROR", "Permission Denied")]
        result = filter_entries(entries, query="permission")
        assert len(result) == 1

    def test_query_matches_logger_or_level(self):
        entries = [self._make("DEBUG", "irrelevant")]
        # Logger field included in the haystack — searching the logger
        # name finds the entry even when the message doesn't match.
        result = filter_entries(entries, query="argus")
        assert len(result) == 1
        # Level field included too.
        assert filter_entries(entries, query="debug") == result

    def test_combined_level_and_query(self):
        entries = [
            self._make("DEBUG", "container exited"),
            self._make("WARNING", "container pull failed"),
            self._make("INFO", "scanner started"),
        ]
        result = filter_entries(entries, min_level="WARNING", query="container")
        assert len(result) == 1
        assert result[0].level == "WARNING"


# ───────────────────────────────────────────────
# Route tests
# ───────────────────────────────────────────────


class TestLogRoute:
    def test_empty_state_when_log_missing(self, tmp_path):
        _write_scan(tmp_path, log_contents=None)
        client = TestClient(create_app(root=str(tmp_path)))
        resp = client.get("/log")
        assert resp.status_code == 200
        assert "No log available" in resp.text

    def test_empty_state_when_no_scan_loaded(self, tmp_path):
        # Empty root → no scan → no log; graceful empty state, not 500.
        client = TestClient(create_app(root=str(tmp_path)))
        resp = client.get("/log")
        assert resp.status_code == 200
        assert "No log available" in resp.text

    def test_renders_all_entries_with_no_filters(self, tmp_path):
        _write_scan(tmp_path)
        client = TestClient(create_app(root=str(tmp_path)))
        resp = client.get("/log")
        assert resp.status_code == 200
        assert "Showing <strong>5</strong> of 5 entries" in resp.text
        # Spot-check a few signatures from the sample log.
        assert "Native pull failed" in resp.text
        assert "Scanner 'gitleaks' finished" in resp.text

    def test_level_filter_drops_lower_severity(self, tmp_path):
        _write_scan(tmp_path)
        client = TestClient(create_app(root=str(tmp_path)))
        resp = client.get("/log?level=warning")
        assert resp.status_code == 200
        # 1 WARNING + 1 ERROR remain; 2 INFO + 1 DEBUG drop out.
        assert "Showing <strong>2</strong> of 5 entries" in resp.text
        assert "(filtered)" in resp.text
        assert "Could not connect" in resp.text  # ERROR survives
        assert "Loaded 66 exclusion" not in resp.text  # INFO drops

    def test_search_filter_narrows_to_matching_messages(self, tmp_path):
        _write_scan(tmp_path)
        client = TestClient(create_app(root=str(tmp_path)))
        resp = client.get("/log?q=clamav")
        assert resp.status_code == 200
        assert "Showing <strong>1</strong> of 5 entries" in resp.text
        assert "clamav" in resp.text

    def test_combined_level_and_query(self, tmp_path):
        _write_scan(tmp_path)
        client = TestClient(create_app(root=str(tmp_path)))
        resp = client.get("/log?level=error&q=docker")
        assert resp.status_code == 200
        assert "Showing <strong>1</strong> of 5 entries" in resp.text
        assert "docker.sock" in resp.text

    def test_unrecognized_level_is_silently_ignored(self, tmp_path):
        _write_scan(tmp_path)
        client = TestClient(create_app(root=str(tmp_path)))
        # Crafted URL: bogus level should fall back to no level filter,
        # not 500.
        resp = client.get("/log?level=bogus")
        assert resp.status_code == 200
        assert "Showing <strong>5</strong> of 5 entries" in resp.text

    def test_nav_link_present_on_all_pages(self, tmp_path):
        _write_scan(tmp_path)
        client = TestClient(create_app(root=str(tmp_path)))
        for path in ("/", "/findings", "/log"):
            resp = client.get(path)
            assert resp.status_code == 200, path
            assert 'href="/log' in resp.text, path

    def test_nav_link_carries_scan_param_when_present(self, tmp_path):
        # Scan param threading is what keeps the URL bookmarkable across
        # nav clicks; without it the picker / dashboard / findings /
        # log all snap back to the launch root.
        run = tmp_path / "run-a"
        run.mkdir()
        _write_scan(run)
        client = TestClient(create_app(root=str(tmp_path)))
        resp = client.get(f"/?scan={run}")
        assert resp.status_code == 200
        assert "/log?scan=" in resp.text


class TestLogRawRoute:
    def test_returns_raw_log_with_text_plain(self, tmp_path):
        _write_scan(tmp_path)
        client = TestClient(create_app(root=str(tmp_path)))
        resp = client.get("/log/raw")
        assert resp.status_code == 200
        assert resp.headers["content-type"].startswith("text/plain")
        # Body matches the file we wrote, byte-for-byte.
        assert resp.text == _SAMPLE_LOG

    def test_404_when_log_missing(self, tmp_path):
        _write_scan(tmp_path, log_contents=None)
        client = TestClient(create_app(root=str(tmp_path)))
        resp = client.get("/log/raw")
        assert resp.status_code == 404

    def test_404_when_no_scan_loaded(self, tmp_path):
        client = TestClient(create_app(root=str(tmp_path)))
        resp = client.get("/log/raw")
        assert resp.status_code == 404

    def test_content_disposition_marks_attachment(self, tmp_path):
        # FileResponse with filename= adds a Content-Disposition header
        # so browsers save the file rather than rendering inline.
        _write_scan(tmp_path)
        client = TestClient(create_app(root=str(tmp_path)))
        resp = client.get("/log/raw")
        cd = resp.headers.get("content-disposition", "")
        assert "argus.log" in cd
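The `test_nav_link_carries_scan_param_when_present` case above pins down the behavior that keeps nav links bookmarkable: every link must carry the current `?scan=` value. One way to build such links is sketched below. `nav_href` is a hypothetical helper, not part of this PR; the real change may thread the parameter directly in the Jinja templates instead.

```python
from __future__ import annotations

from urllib.parse import urlencode


def nav_href(path: str, scan: str | None, **params: str | None) -> str:
    """Build a nav link that carries ?scan= plus any other set params."""
    query = {"scan": scan, **params}
    # Drop unset or empty params so links stay short ("/log", not "/log?scan=").
    query = {key: value for key, value in query.items() if value}
    return f"{path}?{urlencode(query)}" if query else path
```

`urlencode` percent-encodes the scan path (so `/tmp/run-a` becomes `%2Ftmp%2Frun-a`), which keeps arbitrary run directories safe to embed in an `href`.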
