Skip to content

Simplify cargo purls#988

Merged
slimreaper35 merged 6 commits into
hermetoproject:mainfrom
slimreaper35:cargo-sbom
Jul 15, 2025
Merged

Simplify cargo purls#988
slimreaper35 merged 6 commits into
hermetoproject:mainfrom
slimreaper35:cargo-sbom

Conversation

@slimreaper35

@slimreaper35 slimreaper35 commented Jun 23, 2025

Copy link
Copy Markdown
Member

Maintainers will complete the following section

  • Commit messages are descriptive enough
  • Code coverage from testing does not decrease and new code is covered
  • Docs updated (if applicable)
  • Docs links in the code are still valid (if docs were updated)

Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:

Summary by CodeRabbit

  • Refactor

    • Improved handling of Cargo packages, distinguishing between local/workspace and external dependencies.
    • Enhanced support for virtual workspaces and local path-based dependencies.
    • Refined SBOM generation and package URL formatting for better accuracy and clarity.
  • Bug Fixes

    • Adjusted package URL (purl) generation to use updated query parameters and formats, improving metadata consistency.
  • Tests

    • Added unit tests for main package resolution and package URL generation for Cargo packages.

Comment thread hermeto/core/package_managers/cargo/main.py Outdated
Comment thread hermeto/core/package_managers/cargo/main.py Outdated
Comment thread hermeto/core/package_managers/cargo/main.py
Comment thread hermeto/core/package_managers/cargo/main.py Outdated
Comment thread hermeto/core/package_managers/cargo/main.py Outdated
Comment thread hermeto/core/package_managers/cargo/main.py Outdated
Comment thread hermeto/core/package_managers/cargo/main.py
Comment thread hermeto/core/package_managers/cargo/main.py Outdated
Comment thread tests/unit/package_managers/cargo/test_main.py Outdated
@slimreaper35

Copy link
Copy Markdown
Member Author

new revision with more commits and with suggested improvements

Comment thread hermeto/core/package_managers/cargo/main.py
Comment thread hermeto/core/package_managers/cargo/main.py Outdated
Comment thread hermeto/core/package_managers/cargo/main.py Outdated
Comment thread hermeto/core/package_managers/cargo/main.py Outdated
Comment thread hermeto/core/package_managers/cargo/main.py Outdated
Comment thread tests/unit/package_managers/cargo/test_main.py Outdated
Comment thread hermeto/core/package_managers/cargo/main.py
Comment thread hermeto/core/package_managers/cargo/main.py
Comment thread hermeto/core/package_managers/cargo/main.py Outdated
Comment thread hermeto/core/package_managers/cargo/main.py
@slimreaper35 slimreaper35 changed the title simplify cargo purls Simplify cargo purls Jun 26, 2025
Comment thread hermeto/core/package_managers/cargo/main.py
Comment thread tests/unit/package_managers/cargo/test_main.py
Comment thread hermeto/core/package_managers/cargo/main.py
Comment thread hermeto/core/package_managers/cargo/main.py Outdated
Comment thread hermeto/core/package_managers/cargo/main.py Outdated
Comment thread hermeto/core/package_managers/cargo/main.py
Comment thread hermeto/core/package_managers/cargo/main.py
Comment thread hermeto/core/package_managers/cargo/main.py Outdated
Comment thread hermeto/core/package_managers/cargo/main.py
@slimreaper35 slimreaper35 force-pushed the cargo-sbom branch 2 times, most recently from 7d89d2a to 20f7092 Compare July 8, 2025 12:10

@eskultety eskultety left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If CI is happy, so am I.

Comment thread hermeto/core/package_managers/cargo/main.py Outdated
Comment thread hermeto/core/package_managers/cargo/main.py
subpath=str(package_dir.path.relative_to(package_dir.root)),
).to_component()
)
elif pkg_name in local_packages.keys():

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking observation: having a is_local_package() defined somewhere would have slightly improved the flow. Returning it from _find_local_packaged() (properly renamed) could be done. Bending the rules a little and agreeing on the helper to return either an empty string or a path could result in something like this:

elif subpath := is_local_package(pcg_name):
   ...
   subpath=subpath,
   ...

@taylormadore taylormadore left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Improve parsing of Cargo.toml, Cargo.lock and .cargo/config.toml files
by separating their logic into a distinct function. This enhances
code clarity without unnecessary comments and removes duplication.

Change type of `version` field from `str` to `Optional[str]` to silence
mypy error. The class is used for "main package" component, which can be
a virtual workspace without defined package name and version in
Cargo.toml. The version is None in that case. In the future patches,
I will revert this line back to `str` type and move the logic for "main
package" component away from this class.

Signed-off-by: Michal Šoltis <msoltis@redhat.com>
When resolving a cargo project, we only parse the top level Cargo.toml.

If a project is a virtual workspace, all nested packages can have their
own version. But the actual workspace can have its version too. See
https://doc.rust-lang.org/cargo/reference/workspaces.html#the-package-table
for more details.

The current implementation is wrong, since the code assumes we parse a
nested Cargo.toml of any workspace package and expect workspace metadata
to be present there as well. It is basically an unreachable code.

Run generate test data nox session to update SBOM components based on
the changes in this commit.

Signed-off-by: Michal Šoltis <msoltis@redhat.com>
SBOM components and config template are not connected to each other ->
move fetching dependenceis logic (cargo vendor) to a separate function.
Also, move verifying presence of Cargo.lock from SBOM component
resolution that needs to be called before prefetching.

Update function headers to reflect the changes (e.g. unused attribute).

Signed-off-by: Michal Šoltis <msoltis@redhat.com>
The `cargo vendor` command flags were clarified to explain their
purpose, especially --no-delete which is critical for pip integration.
The docstring was updated to better describe the function's purpose and
the expected output.

Signed-off-by: Michal Šoltis <msoltis@redhat.com>
After splitting this function in previous patches, the new name better
reflects what it does.

Signed-off-by: Michal Šoltis <msoltis@redhat.com>
@coderabbitai

coderabbitai Bot commented Jul 15, 2025

Copy link
Copy Markdown

Walkthrough

The changes refactor and extend the Cargo package manager logic to better distinguish local, workspace, and external dependencies, introducing a new LocalCargoPackage dataclass. SBOM generation and package URL (purl) handling are improved, especially for git-based and local packages. Tests and integration data are updated to reflect the new purl formats and logic.

Changes

File(s) Change Summary
hermeto/core/package_managers/cargo/main.py Refactored Cargo package handling; introduced LocalCargoPackage; improved purl and SBOM generation; modularized dependency resolution.
tests/unit/package_managers/cargo/test_main.py Added unit tests for main package resolution and purl generation, including helper functions.
tests/integration/test_data/cargo_mixed_git_crate_dependency/bom.json Updated purl fields: removed source parameter, changed git purl to use vcs_url qualifier and @ separator.
tests/integration/test_data/legacy_entrypoint_e2e_test/bom.json Updated purl fields: removed source parameter, changed git purl to use vcs_url qualifier and @ separator.
tests/integration/test_data/pip_multiple_packages/bom.json Removed source parameter from all Cargo package purls.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant fetch_cargo_source
    participant _fetch_dependencies
    participant _use_vendored_sources
    participant _generate_sbom_components

    User->>fetch_cargo_source: Request
    fetch_cargo_source->>_fetch_dependencies: Run cargo vendor
    fetch_cargo_source->>_use_vendored_sources: Update Cargo config
    fetch_cargo_source->>_generate_sbom_components: Parse Cargo.lock and Cargo.toml
    _generate_sbom_components-->>fetch_cargo_source: List[Component]
    fetch_cargo_source-->>User: RequestOutput (SBOM, etc.)
Loading

Poem

In the garden of crates, the code did grow,
With local and git, now clearer to show.
SBOMs are sharper, purls are precise,
The bunny hops forward, the logic is nice!
🐇✨

"From toml to lock,
we leap without shock."


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 912d7d0 and 17d38b5.

📒 Files selected for processing (5)
  • hermeto/core/package_managers/cargo/main.py (5 hunks)
  • tests/integration/test_data/cargo_mixed_git_crate_dependency/bom.json (2 hunks)
  • tests/integration/test_data/legacy_entrypoint_e2e_test/bom.json (2 hunks)
  • tests/integration/test_data/pip_multiple_packages/bom.json (53 hunks)
  • tests/unit/package_managers/cargo/test_main.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • tests/integration/test_data/pip_multiple_packages/bom.json
  • tests/integration/test_data/legacy_entrypoint_e2e_test/bom.json
  • tests/integration/test_data/cargo_mixed_git_crate_dependency/bom.json
  • tests/unit/package_managers/cargo/test_main.py
  • hermeto/core/package_managers/cargo/main.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build container image and run integration tests on it
  • GitHub Check: Red Hat Konflux / on-pull-request
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
hermeto/core/package_managers/cargo/main.py (1)

305-305: Simplify dictionary membership check

Remove the unnecessary .keys() call for cleaner, more idiomatic Python code.

-        elif pkg_name in local_packages.keys():
+        elif pkg_name in local_packages:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3215e8d and 912d7d0.

📒 Files selected for processing (5)
  • hermeto/core/package_managers/cargo/main.py (5 hunks)
  • tests/integration/test_data/cargo_mixed_git_crate_dependency/bom.json (2 hunks)
  • tests/integration/test_data/legacy_entrypoint_e2e_test/bom.json (2 hunks)
  • tests/integration/test_data/pip_multiple_packages/bom.json (53 hunks)
  • tests/unit/package_managers/cargo/test_main.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/unit/package_managers/cargo/test_main.py (3)
hermeto/core/package_managers/cargo/main.py (4)
  • CargoPackage (43-69)
  • _resolve_main_package (150-166)
  • purl (52-65)
  • purl (82-98)
hermeto/core/rooted_path.py (1)
  • RootedPath (13-122)
tests/unit/conftest.py (1)
  • rooted_tmp_path (31-33)
🪛 Ruff (0.12.2)
hermeto/core/package_managers/cargo/main.py

305-305: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / on-pull-request
  • GitHub Check: Build container image and run integration tests on it
🔇 Additional comments (10)
hermeto/core/package_managers/cargo/main.py (6)

9-10: LGTM!

The new imports are appropriate for the added type annotations and URL parsing functionality.


44-44: Well-implemented purl generation for git sources!

The enhanced purl property correctly extracts commit IDs from git URLs and formats them as vcs_url qualifiers, aligning with the purl specification.

Also applies to: 51-65


72-103: Good separation of concerns with LocalCargoPackage!

The new dataclass effectively handles local and workspace dependencies, with appropriate handling of optional VCS URLs and subpaths.


123-142: Clear and well-documented dependency fetching!

The function properly extracts the cargo vendor logic with excellent documentation of the command flags, especially the critical --no-delete flag for pip integration.


150-167: Excellent support for virtual workspace packages!

The function now correctly handles both standard packages and virtual workspaces, with appropriate fallbacks for missing package information.


276-326: Well-structured SBOM component generation!

The function effectively differentiates between main, local, and external packages, with proper error handling for scenarios like missing .git directories.

tests/integration/test_data/pip_multiple_packages/bom.json (1)

12-12: Consistent purl simplification across all Cargo packages!

The removal of the redundant registry source parameter from all Cargo package purls aligns with the new simplified purl format while preserving the essential checksum information.

Also applies to: 24-24, 36-36, 48-48, 60-60, 72-72, 84-84, 96-96, 108-108, 168-168, 180-180, 192-192, 204-204, 216-216, 228-228, 240-240, 252-252, 264-264, 276-276, 288-288, 300-300, 312-312, 324-324, 336-336, 348-348, 360-360, 372-372, 384-384, 396-396, 408-408, 420-420, 432-432, 444-444, 456-456, 468-468, 480-480, 492-492, 504-504, 516-516, 528-528, 540-540, 552-552, 564-564, 576-576, 588-588, 600-600, 612-612, 624-624, 636-636, 648-648, 660-660, 672-672, 684-684

tests/integration/test_data/cargo_mixed_git_crate_dependency/bom.json (1)

12-12: Correct purl format updates!

The changes properly reflect the new purl generation logic:

  • Registry source removed for crates.io packages
  • Git sources now use the vcs_url qualifier with @ separator for commit references

Also applies to: 36-36

tests/integration/test_data/legacy_entrypoint_e2e_test/bom.json (1)

12-12: Consistent purl format updates!

The changes match the pattern in other BOM files, correctly implementing the new purl format.

Also applies to: 36-36

tests/unit/package_managers/cargo/test_main.py (1)

1-96: Comprehensive test coverage for the refactored Cargo functionality!

The tests effectively cover:

  • Standard package resolution with explicit name/version
  • Virtual workspace handling with and without workspace versions
  • Purl generation for registry, git, and simple packages

Good use of parameterized tests and clear test structure.

Improve the handling of cargo packages by separating local dependencies
from registry/git ones. This change introduces a new LocalCargoPackage
class to better represent local dependencies and workspace packages.
The SBOM generation logic is updated to properly distinguish between
different package types and include relevant metadata like VCS URL and
subpath.

The `vcs_url` qualifier was incorrectly set only for main packages
while dependencies from git sources were missing this qualifier.
This caused inconsistent purl generation.

We now derive `vcs_url` directly from git source URLs for dependencies
by parsing and reconstructing the repository URL with its ref. The
main package still uses the project's VCS URL when available.

VCS URL format is defined in PURL specification:
https://github.com/package-url/purl-spec/blob/main/PURL-SPECIFICATION.rst

Remove non-existent source field from PURL qualifiers.

Signed-off-by: Michal Šoltis <msoltis@redhat.com>
@slimreaper35 slimreaper35 added this pull request to the merge queue Jul 15, 2025
Merged via the queue into hermetoproject:main with commit 8bf288b Jul 15, 2025
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

5 participants