feat: Use OSV-Scalibr scan function directly #1936

another-rex · 2025-06-10T06:46:03Z

Behaviour changes / notes:

OSV-Scalibr's Scan() function walks directories depth-first (DFS) rather than breadth-first (BFS), so the order in which files are read changes slightly in the snapshots.
Scan() also does not deduplicate the extractor output, so the SBOM with duplicates now returns double the results. I think that behaviour change is fine for now unless someone complains about it.
Unfortunately, we also log the file path twice in errors, because the extractor and the walker each prepend the path the error originated from. Removing input path references from extractor errors is a TODO in osv-scalibr. (chore: Remove input.Path from extractor errors osv-scalibr#821)

another-rex · 2025-06-12T05:48:08Z

Annoyingly, it seems that osv-scalibr's directory walk is not deterministic across systems (it uses whatever order the filesystem returns when iterating through a directory).

I'm not sure how to fix this for our tests, do we just avoid snapshotting any of the walking logs?

G-Rath · 2025-06-12T19:30:40Z

> I'm not sure how to fix this for our tests, do we just avoid snapshotting any of the walking logs?

I think those logs are an important part of our snapshots since they're output we definitely want to be showing to the user (compared to some of the other logs we're currently filtering) and they aid in debugging since they tell us clearly what files are being scanned and the number of packages extracted.

Is there any way to make the walking deterministic, even if it's controlled via a flag or env variable so that we can opt in?

G-Rath · 2025-06-12T19:46:14Z

pkg/osvscanner/stats.go

@@ -15,6 +15,10 @@ type FileOpenedPrinter struct {
 var _ stats.Collector = &FileOpenedPrinter{}

 func (c FileOpenedPrinter) AfterExtractorRun(_ string, extractorstats *stats.AfterExtractorStats) {


A possible workaround could be to buffer the logging and dump it immediately if we hit an error. We could also emit the same logs here at a different level, such as DEBUG, so that you could still see the actual unbuffered order.

e.g. something like:

// NOT ACTUAL GO CODE
scannedLogs := []string{}

err := scanner.Scan({
	StatsAfterExtractorRun(stats) {
		// Log at DEBUG right away so the real, unbuffered order stays visible.
		slog.Debug(fmt.Sprintf(
			"Scanned %s file and found %d %s",
			extractorstats.Path,
			pkgsFound,
			output.Form(pkgsFound, "package", "packages"),
		))

		// Buffer the same line so it can be emitted in sorted order later.
		scannedLogs = append(scannedLogs, fmt.Sprintf(
			"Scanned %s file and found %d %s",
			extractorstats.Path,
			pkgsFound,
			output.Form(pkgsFound, "package", "packages"),
		))
	}
})

// slices.Sort sorts in place and returns nothing, so sort first, then iterate.
slices.Sort(scannedLogs)
for _, line := range scannedLogs {
	slog.Info(line)
}

if err != nil {
	slog.Error(err.Error())
}

I ended up with something similar to the buffering approach, but applied to the logs instead, so we don't have to touch the code functions that much.
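
For readers who want a runnable version of the buffering idea discussed in this thread, here is a minimal sketch. It assumes the per-file results are already collected in a slice; `extractorResult`, `form`, and `bufferedScanLogs` are hypothetical names for this illustration and are not osv-scanner or osv-scalibr APIs.

```go
package main

import (
	"fmt"
	"sort"
)

// extractorResult is a hypothetical stand-in for the per-file stats that the
// scanner reports; it is not an osv-scalibr type.
type extractorResult struct {
	Path      string
	PkgsFound int
}

// form picks the singular or plural noun (a simplified, hypothetical helper).
func form(count int, singular, plural string) string {
	if count == 1 {
		return singular
	}
	return plural
}

// bufferedScanLogs formats one line per result, then sorts the lines so the
// output does not depend on the order in which files were visited.
func bufferedScanLogs(results []extractorResult) []string {
	lines := make([]string, 0, len(results))
	for _, r := range results {
		lines = append(lines, fmt.Sprintf(
			"Scanned %s file and found %d %s",
			r.Path, r.PkgsFound, form(r.PkgsFound, "package", "packages"),
		))
	}
	sort.Strings(lines)
	return lines
}

func main() {
	// Results arrive in whatever order the walk produced them.
	results := []extractorResult{
		{Path: "b/package-lock.json", PkgsFound: 3},
		{Path: "a/go.mod", PkgsFound: 1},
	}
	for _, line := range bufferedScanLogs(results) {
		fmt.Println(line)
	}
}
```

The trade-off is that sorted output no longer reflects the real visiting order, which is why the suggestion above pairs it with unbuffered DEBUG logging.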

G-Rath · 2025-06-12T19:47:21Z

pkg/osvscanner/scan.go

+			RunningSystem: true,
+		}
+
+		if actions.CompareOffline {


nit: the else branches here could be the initial values in the struct
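
A small sketch of what this nit suggests, using a hypothetical `capabilities` struct rather than the real type from `scan.go`: setting the default in the struct literal lets the `else` branch disappear while the result stays the same. All names below are assumptions made for the example.

```go
package main

import "fmt"

// network is a hypothetical enum standing in for a connectivity setting.
type network int

const (
	networkOnline network = iota
	networkOffline
)

// capabilities is a hypothetical struct; the real one in scan.go may differ.
type capabilities struct {
	Network       network
	RunningSystem bool
}

// withElse assigns the field in both branches of the conditional.
func withElse(compareOffline bool) capabilities {
	caps := capabilities{RunningSystem: true}
	if compareOffline {
		caps.Network = networkOffline
	} else {
		caps.Network = networkOnline
	}
	return caps
}

// withDefault puts the else value into the struct literal and only
// overrides it when the condition holds.
func withDefault(compareOffline bool) capabilities {
	caps := capabilities{
		Network:       networkOnline,
		RunningSystem: true,
	}
	if compareOffline {
		caps.Network = networkOffline
	}
	return caps
}

func main() {
	for _, offline := range []bool{false, true} {
		fmt.Println(offline, withElse(offline) == withDefault(offline))
	}
}
```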


another-rex · 2025-06-13T01:12:00Z

@G-Rath

> I think those logs are an important part of our snapshots since they're output we definitely want to be showing to the user (compared to some of the other logs we're currently filtering) and they aid in debugging since they tell us clearly what files are being scanned and the number of packages extracted.

Yeah I agree.

> Is there any way to make the walking deterministic, even if it's controlled via a flag or env variable so that we can opt in?

I am very hesitant to change the actual code logic for tests, because then we are not really testing for the actual code paths someone running osv-scanner will run into.

I ended up printing markers at the start and end of the directory walk (when testing is true), then sorting the lines between the markers so that the snapshot checks for their presence but not their specific order.

PTAL

codecov-commenter · 2025-06-13T01:15:55Z

Codecov Report

Attention: Patch coverage is 83.33333% with 24 lines in your changes missing coverage. Please review.

Project coverage is 65.66%. Comparing base (743d8ed) to head (be24113).
Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/osvscanner/scan.go | 85.29% | 7 Missing and 3 partials ⚠️ |
| .../requirementsenhancable/requirementsenhanceable.go | 61.90% | 8 Missing ⚠️ |
| cmd/osv-scanner/internal/testcmd/run.go | 83.33% | 4 Missing and 2 partials ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1936      +/-   ##
==========================================
+ Coverage   65.60%   65.66%   +0.06%     
==========================================
  Files         167      168       +1     
  Lines       16060    16117      +57     
==========================================
+ Hits        10536    10584      +48     
- Misses       4859     4866       +7     
- Partials      665      667       +2


G-Rath · 2025-06-13T01:15:58Z

> > Is there any way to make the walking deterministic, even if it's controlled via a flag or env variable so that we can opt in?

> I am very hesitant to change the actual code logic for tests, because then we are not really testing for the actual code paths someone running osv-scanner will run into.

FWIW I agree; I meant that we'd do that for the actual scanner logic, not just for tests.

another-rex added 26 commits April 16, 2025 16:14

12a8b9a Use scalibr scanning
f7c8322 Merge branch 'main' into scalibr-use-scan-here
342c656 Update snaps again
fd272f2 Merge branch 'update-snaps' into scalibr-use-scan-here
5dece64 Update logging to match old style
03d669f Merge branch 'main' into scalibr-use-scan-here
22c6776 Everything apart from scan.go
b138f2f Scan.go
9a2afcd Revert scan.go changes
7a0cca2 Fix test
c66de0b Fix more tests
f4d6410 Merge remote-tracking branch 'upstream/main' into revert-scan.go-changes
5498946 Fix step one
7acce8e Fix all the type issues in this file
205a7ff Uncomment vuln_result test
4b46e79 Revert unnecessary file for now
9ed30f9 Update scalibr again
af464d8 Merge remote-tracking branch 'upstream/main' into revert-scan.go-changes
7181fe1 Fix lints and tests
61729bb Bump up scalibr version
c48c9ae Suppress Created image content file log
aa5d6be Add requirementsenhancable
4162025 Fix remaining issues
49ec2c5 Update snapshots
662a36c Merge remote-tracking branch 'upstream/main' into scalibr-use-scan-here
735c0ed Fix lints

another-rex requested review from oliverchang, G-Rath, michaelkedar and cuixq June 10, 2025 06:46

Merge branch 'main' into scalibr-use-scan-here

9e49930

G-Rath reviewed Jun 12, 2025


renovate-bot and others added 6 commits June 13, 2025 11:05

cfb90b5 chore: v2.0.3 Changelog (google#1941)
b29baf2 fix: Use latest go version (google#1942)
73a3e03 allow unsorted file walks in snapshots
202c432 Put else branches of capabilities into default in the struct init
be24113 Fix lints
