Skip to content

feat: Use OSV-Scalibr scan function directly #1936

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 34 commits into
base: main
Choose a base branch
from

Conversation

another-rex
Copy link
Collaborator

Closes #1769

Behaviour changes / notes:

  • OSV-Scalibr's Scan() function does DFS rather than BFS, so the order of files being read in the snapshot changes slightly.
  • Their Scan() function also does not deduplicate the output of the extractor, so the SBOM with duplicates will return double the results. I think it's fine to have that behavior change for now unless someone complains about it.
  • Unfortunately, we also double log the path to the file in the error, because both the extractor and the walker both prepend the path the error originated from. This is a TODO in osv-scalibr to remove input path references in the extractor errors. (chore: Remove input.Path from extractor errors osv-scalibr#821)

@another-rex
Copy link
Collaborator Author

Annoyingly it seems like osv-scalibr's directory walk is not deterministic on different systems (as it uses the order the filesystem gives when iterating through a directory).

I'm not sure how to fix this for our tests, do we just avoid snapshotting any of the walking logs?

@G-Rath
Copy link
Collaborator

G-Rath commented Jun 12, 2025

I'm not sure how to fix this for our tests, do we just avoid snapshotting any of the walking logs?

I think those logs are an important part of our snapshots since they're output we definitely want to be showing to the user (compared to some of the other logs we're currently filtering) and they aid in debugging since they tell us clearly what files are being scanned and the number of packages extracted.

Is there anyway to make the walking deterministic, even if its controlled via a flag or env variable so that we can opt-in?

@@ -15,6 +15,10 @@ type FileOpenedPrinter struct {
var _ stats.Collector = &FileOpenedPrinter{}

func (c FileOpenedPrinter) AfterExtractorRun(_ string, extractorstats *stats.AfterExtractorStats) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a possible workaround could be to try buffering the logging, and try to dump it immediately if we error - we could also do the same logging here under a different level like DEBUG so that you could see the actual unbuffered order...

e.g. something like:

// NOT ACTUAL GO CODE
scannedLogs := []string{}

err := scanner.Scan({
	StatsAfterExtractorRun(stats) {
		slog.Debug(fmt.Sprintf(
			"Scanned %s file and found %d %s",
			extractorstats.Path,
			pkgsFound,
			output.Form(pkgsFound, "package", "packages"),
		))

		scannedLogs = append(scannedLogs, fmt.Sprintf(
			"Scanned %s file and found %d %s",
			extractorstats.Path,
			pkgsFound,
			output.Form(pkgsFound, "package", "packages"),
		))
	}
})

for line := range slices.Sort(scannedLogs) {
	slog.Info(line)
}

if err != nil {
	slog.Error(err)
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up with something similar to the buffering approach, just on the logs instead so we don't have to touch the code functions that much.

RunningSystem: true,
}

if actions.CompareOffline {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the else branches here could be the initial values in the struct

renovate-bot and others added 6 commits June 13, 2025 11:05
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
|
[github/codeql-action](https://redirect.github.com/github/codeql-action)
| action | minor | `v3.28.18` -> `v3.29.0` |

---

### Release Notes

<details>
<summary>github/codeql-action (github/codeql-action)</summary>

###
[`v3.29.0`](https://redirect.github.com/github/codeql-action/releases/tag/v3.29.0)

[Compare
Source](https://redirect.github.com/github/codeql-action/compare/v3.28.19...v3.29.0)

### CodeQL Action Changelog

See the [releases
page](https://redirect.github.com/github/codeql-action/releases) for the
relevant changes to the CodeQL CLI and language packs.

#### 3.29.0 - 11 Jun 2025

- Update default CodeQL bundle version to 2.22.0.
[#&#8203;2925](https://redirect.github.com/github/codeql-action/pull/2925)
- Bump minimum CodeQL bundle version to 2.16.6.
[#&#8203;2912](https://redirect.github.com/github/codeql-action/pull/2912)

See the full
[CHANGELOG.md](https://redirect.github.com/github/codeql-action/blob/v3.29.0/CHANGELOG.md)
for more information.

###
[`v3.28.19`](https://redirect.github.com/github/codeql-action/releases/tag/v3.28.19)

[Compare
Source](https://redirect.github.com/github/codeql-action/compare/v3.28.18...v3.28.19)

##### CodeQL Action Changelog

See the [releases
page](https://redirect.github.com/github/codeql-action/releases) for the
relevant changes to the CodeQL CLI and language packs.

##### 3.28.19 - 03 Jun 2025

- The CodeQL Action no longer includes its own copy of the extractor for
the `actions` language, which is currently in public preview.
The `actions` extractor has been included in the CodeQL CLI since
v2.20.6. If your workflow has enabled the `actions` language *and* you
have pinned
your `tools:` property to a specific version of the CodeQL CLI earlier
than v2.20.6, you will need to update to at least CodeQL v2.20.6 or
disable
    `actions` analysis.
- Update default CodeQL bundle version to 2.21.4.
[#&#8203;2910](https://redirect.github.com/github/codeql-action/pull/2910)

See the full
[CHANGELOG.md](https://redirect.github.com/github/codeql-action/blob/v3.28.19/CHANGELOG.md)
for more information.

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 6am on monday" in timezone
Australia/Sydney, Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/google/osv-scanner).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0MC40MC4zIiwidXBkYXRlZEluVmVyIjoiNDAuNDguNSIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiZGVwZW5kZW5jaWVzIl19-->
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| golang | final | patch | `1.24.3-alpine3.21` -> `1.24.4-alpine3.21` |
| golang | stage | patch | `1.24.3-alpine3.21` -> `1.24.4-alpine3.21` |

---

### Configuration

📅 **Schedule**: Branch creation - "before 6am on monday" in timezone
Australia/Sydney, Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these
updates again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/google/osv-scanner).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0MC40MC4zIiwidXBkYXRlZEluVmVyIjoiNDAuNDAuMyIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiZGVwZW5kZW5jaWVzIl19-->

Co-authored-by: Rex P <[email protected]>
@another-rex
Copy link
Collaborator Author

@G-Rath

I think those logs are an important part of our snapshots since they're output we definitely want to be showing to the user (compared to some of the other logs we're currently filtering) and they aid in debugging since they tell us clearly what files are being scanned and the number of packages extracted.

Yeah I agree.

Is there anyway to make the walking deterministic, even if its controlled via a flag or env variable so that we can opt-in?

I am very hesitant to change the actual code logic for tests, because then we are not really testing for the actual code paths someone running osv-scanner will run into.


I ended up with printing a few markers at the start and end of the directory walk (When testing is true), then sorting the lines in-between the markers to check for presence, but not the specific order.

PTAL

@codecov-commenter
Copy link

codecov-commenter commented Jun 13, 2025

Codecov Report

Attention: Patch coverage is 83.33333% with 24 lines in your changes missing coverage. Please review.

Project coverage is 65.66%. Comparing base (743d8ed) to head (be24113).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
pkg/osvscanner/scan.go 85.29% 7 Missing and 3 partials ⚠️
.../requirementsenhancable/requirementsenhanceable.go 61.90% 8 Missing ⚠️
cmd/osv-scanner/internal/testcmd/run.go 83.33% 4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1936      +/-   ##
==========================================
+ Coverage   65.60%   65.66%   +0.06%     
==========================================
  Files         167      168       +1     
  Lines       16060    16117      +57     
==========================================
+ Hits        10536    10584      +48     
- Misses       4859     4866       +7     
- Partials      665      667       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@G-Rath
Copy link
Collaborator

G-Rath commented Jun 13, 2025

Is there anyway to make the walking deterministic, even if its controlled via a flag or env variable so that we can opt-in?

I am very hesitant to change the actual code logic for tests, because then we are not really testing for the actual code paths someone running osv-scanner will run into.

fwiw I agree - I was meaning we do that for the actual scanner logic not just for tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate OSV-Scanner to use SCALIBR’s main Scan() function
4 participants