feat: add jsonl output format (#1159) #3415
Closes anchore#1159. Adds a JSON Lines (newline-delimited JSON) output formatter selectable via `-o jsonl`. Each line is a single match record, per @kzantow's clarification on the issue, which suits pipelines like `grype <input> -o jsonl | jq -r .vulnerability.id | xargs ./cve-search.py`.

The `ndjson` alias is also accepted, since it is the more common name in some communities.

Document-level metadata (`descriptor`, `source`, `distro`, `ignoredMatches`, `alertsByPackage`) is intentionally omitted: JSON Lines is a flat record stream by design. Consumers that need that metadata should continue to use `-o json`. The package documentation calls this out explicitly so future readers don't mistake it for an oversight.

Empty match sets produce zero bytes of output, which is the standard JSON Lines convention (an empty file is a valid empty stream).

Tests:
- presenter golden snapshots for both image and directory sources
- empty-document case asserts zero output
- line-count assertion ties the output line count to the document's match count
- JSON-validity assertion: every emitted line parses independently

Signed-off-by: Chris (ChrisJr404) <11917633+ChrisJr404@users.noreply.github.com>
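A minimal sketch of how `-o` parsing could accept both spellings and map them to one format. `Format`, `UnknownFormat`, `JSONFormat` and this `Parse` are hypothetical stand-ins named after identifiers mentioned in this PR, not grype's actual `internal/format` code; the case-insensitive match is an assumption based on the PR's `jsonl`/`JSONL`/`ndjson` parsing tests.

```go
package main

import (
	"fmt"
	"strings"
)

// Format is a hypothetical stand-in for grype's output format type.
type Format string

const (
	UnknownFormat   Format = "unknown"
	JSONFormat      Format = "json"
	JSONLinesFormat Format = "jsonl"
)

// Parse maps a user-supplied -o value to a Format. Matching is
// case-insensitive (assumption), and "ndjson" is treated as an
// alias for "jsonl" so both spellings select the same presenter.
func Parse(userInput string) Format {
	switch strings.ToLower(userInput) {
	case "json":
		return JSONFormat
	case "jsonl", "ndjson":
		return JSONLinesFormat
	default:
		return UnknownFormat
	}
}

func main() {
	for _, in := range []string{"jsonl", "JSONL", "ndjson", "json", "yaml"} {
		fmt.Printf("%s -> %s\n", in, Parse(in))
	}
}
```

Resolving the alias at parse time means everything downstream (presenter selection, help text) only ever sees a single `JSONLinesFormat` value.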
I have a concern that this change might be difficult to reconcile with a potential future state of Grype output. To reduce file size and represent more information, I suspect we could change the structure to use package references instead of including packages inline with each match. This isn't a blocking concern, but it is something we should consider when accepting this PR.
Fair concern. The JSONL emitter here just streams the same per-match record shape that the existing JSON output already inlines, so we are inheriting the inline-package structure rather than introducing it. If the JSON output later moves to package-reference dedup, the JSONL stream would naturally follow the same schema change at the same time, since both encoders pull from the same intermediate …

Two ways I see to handle forward compatibility:
Happy to add (2) if you want a clearer compat boundary right now. If you would rather hold the PR until the canonical schema discussion lands, I can convert this to draft and let it sit. Either way, let me know which fits.
Closes #1159. Adds a JSON Lines (newline-delimited JSON) output formatter selectable via `-o jsonl`. Each line is a single match record, per @kzantow's clarification on the issue ("each line would be each match record"). This shape is what makes the requested pipeline ergonomic: `grype <input> -o jsonl | jq -r .vulnerability.id | xargs ./cve-search.py`.
`ndjson` is also accepted as an alias, since some communities prefer that spelling.

### Why this shape
@kzantow asked on the issue:
Reporter @ocervell confirmed yes. So this PR ships exactly that: one `Match` per line, with no surrounding envelope.

Document-level metadata (`descriptor`, `source`, `distro`, `ignoredMatches`, `alertsByPackage`) is intentionally omitted: JSON Lines is a flat record stream by design. Consumers that need that metadata should continue to use `-o json`. The package doc comment calls this out explicitly so it isn't read as an oversight.

Empty match sets produce zero bytes of output, which is the standard JSON Lines convention (an empty file is a valid empty stream).
### What's added
| File | Change |
| --- | --- |
| `grype/presenter/jsonl/presenter.go` | `json.Encoder.Encode` per match (the encoder appends `\n` automatically, so output is naturally newline-delimited) |
| `grype/presenter/jsonl/presenter_test.go` | Presenter tests |
| `grype/presenter/jsonl/testdata/snapshot/*.golden` | Golden snapshots (… `-update`) |
| `internal/format/format.go` | `JSONLinesFormat` constant, `Parse("jsonl")` and `Parse("ndjson")` recognition, added to `AvailableFormats` |
| `internal/format/format_test.go` | `jsonl`/`JSONL`/`ndjson` parsing |
| `internal/format/presenter.go` | `JSONLinesFormat` → `jsonl.NewPresenter` |

### Streaming question (raised by reporter)
Streaming would require restructuring how the matcher pipeline produces results: currently grype waits for all matchers to finish before invoking the presenter. That is a much larger change and is out of scope for this PR. JSON Lines as a file format is independently useful (post-pipe consumption with `jq`, `xargs`, etc.) even without streaming, which matches @kzantow's comment that "We probably wouldn't stream each result individually, but only output a JSONL file at the end."

### Test plan
- `go test ./grype/presenter/...` passes (existing presenters + new jsonl)
- `go test ./internal/format/...` passes
- `go build ./...` is clean
- `grype --help` lists `jsonl` in the format options
- `grype dir:. -o jsonl` against a directory with no matches produces zero output, as designed
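The line-count and JSON-validity properties described for the tests can be expressed as a single check on raw output. A sketch only: `checkJSONLines` is a hypothetical helper, and it inspects bytes directly rather than going through grype's snapshot tooling.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// checkJSONLines verifies that the output has exactly wantLines
// newline-terminated lines and that every line is independently
// valid JSON. Empty output is accepted only when no lines are expected.
func checkJSONLines(out []byte, wantLines int) error {
	if len(out) == 0 {
		if wantLines != 0 {
			return fmt.Errorf("got empty output, want %d lines", wantLines)
		}
		return nil // zero matches: an empty stream is valid
	}
	if !bytes.HasSuffix(out, []byte("\n")) {
		return fmt.Errorf("output does not end with a newline")
	}
	lines := bytes.Split(bytes.TrimSuffix(out, []byte("\n")), []byte("\n"))
	if len(lines) != wantLines {
		return fmt.Errorf("got %d lines, want %d", len(lines), wantLines)
	}
	for i, line := range lines {
		if !json.Valid(line) {
			return fmt.Errorf("line %d is not valid JSON", i+1)
		}
	}
	return nil
}

func main() {
	good := []byte("{\"vulnerability\":{\"id\":\"CVE-0000-0001\"}}\n{\"vulnerability\":{\"id\":\"CVE-0000-0002\"}}\n")
	fmt.Println(checkJSONLines(good, 2))          // <nil>
	fmt.Println(checkJSONLines(nil, 0))           // <nil>
	fmt.Println(checkJSONLines([]byte("{\n"), 1)) // line 1 is not valid JSON
}
```

Passing the document's match count as `wantLines` is what ties the output to the input, so a presenter that drops or duplicates a match fails even if every line is valid JSON.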