Contributor

@jcrossley3 jcrossley3 commented Dec 1, 2025

This fixes #2146 and therefore indirectly resolves the downstream issue, https://issues.redhat.com/browse/TC-3090

It relies on unpublished forks of two interdependent crates: csaf-rs and packageurl.rs.

Ultimately, it'd be swell if we could instead depend on something less dead and more supported like https://github.com/csaf-rs/csaf

Summary by Sourcery

Handle percent-encoded slashes in package URLs and align SBOM and advisory translation logic with updated packageurl behavior.

Bug Fixes:

  • Preserve and correctly round-trip percent-encoded slashes (%2F) in purl names when formatting and parsing package URLs.
  • Ensure SBOM lookup returns consistent results for equivalent huggingface and generic package URLs with encoded namespace separators.

Enhancements:

  • Update purl construction and advisory translation helpers to propagate errors from namespace and version setters instead of silently ignoring failures.

Build:

  • Bump packageurl to a newer unreleased revision and switch csaf to a branch with purl name encoding fixes, wiring these forks into Cargo dependencies.

Tests:

  • Extend SBOM endpoint and purl unit tests to cover purls containing percent-encoded slashes and their equivalence across ecosystems.
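
As a rough illustration of the round-trip described above, here is a minimal sketch with no external crates. The helpers `encode_name` and `decode_name` are hypothetical and are not part of packageurl.rs; the set of characters left unencoded is a conservative assumption, not the exact rule set of the purl specification.

```rust
/// Percent-encode every byte that is not an ASCII letter, digit, or one of
/// `-._~`, so a literal '/' in a name becomes "%2F".
/// (Hypothetical helper; the character set is an assumption.)
fn encode_name(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    for b in name.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Decode "%XX" escapes. Returns None for a truncated or non-hex escape,
/// or if the decoded bytes are not valid UTF-8.
fn decode_name(encoded: &str) -> Option<String> {
    let bytes = encoded.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // The two characters after '%' must both be hex digits.
            let hex = encoded.get(i + 1..i + 3)?;
            if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                return None;
            }
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}
```

With these helpers, a name such as `org/model` is written as `org%2Fmodel` and decodes back to the original string, which is the property the regression tests in this PR check against the real crate.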

Contributor

sourcery-ai bot commented Dec 1, 2025

Reviewer's Guide

Fixes handling of percent-encoded slashes in purl names by tightening PackageUrl construction/formatting and extending SBOM and Purl tests, while updating csaf and packageurl dependencies to patched git branches that include the encoding fix.

Sequence diagram for updated PackageUrl construction in OSV translate and split_name

sequenceDiagram
    participant Caller
    participant Ecosystem
    participant Translate as translate
    participant PackageUrl

    Caller->>Translate: translate(ecosystem, name)
    Translate->>Ecosystem: ecosystem.ty_for(name)
    Ecosystem-->>Translate: ty

    alt Maven-style name with group and artifact
        Translate->>Translate: split_name(name, "maven", ":")
        alt name contains separator
            Translate->>Translate: join all but last segment as namespace
            Translate->>Translate: take last segment as simple_name
            Translate->>PackageUrl: new("maven", simple_name)
            PackageUrl-->>Translate: Result<PackageUrl, Error>
            alt Ok(package_url)
                Translate->>PackageUrl: with_namespace(namespace)
                PackageUrl-->>Translate: Result<&mut PackageUrl, Error>
                alt Ok(mut_ref)
                    Translate->>PackageUrl: add_qualifier("repository_url", repo)
                    PackageUrl-->>Translate: Result<&mut PackageUrl, Error>
                    Translate-->>Caller: Some(PackageUrl)
                else Err
                    Translate-->>Caller: None (namespace error propagated)
                end
            else Err
                Translate-->>Caller: None (creation error)
            end
        else no separator
            Translate-->>Caller: None
        end
    else other ecosystems or simple names
        Translate->>PackageUrl: new(ty, name)
        PackageUrl-->>Translate: Result<PackageUrl, Error>
        alt Ok(package_url)
            Translate-->>Caller: Some(PackageUrl)
        else Err
            Translate-->>Caller: None
        end
    end

Class diagram for updated Purl formatting and PackageUrl construction

classDiagram
    class Purl {
        +ty: String
        +name: String
        +namespace: Option~String~
        +version: Option~String~
        +qualifiers: HashMap~String, String~
        +from_str(input: &str) Result~Purl, Error~
        +to_string() String
    }

    class PurlDisplayImpl {
        +fmt(f: &mut Formatter) Result~(), fmt::Error~
    }

    class PurlDebugImpl {
        +fmt(f: &mut Formatter) Result~(), fmt::Error~
    }

    class PackageUrl~'a~ {
        +new(ty: &str, name: &str) Result~PackageUrl~'a~, Error~
        +with_namespace(namespace: &str) Result~&mut PackageUrl~'a~, Error~
        +with_version(version: &str) Result~&mut PackageUrl~'a~, Error~
        +add_qualifier(key: &str, value: &str) Result~&mut PackageUrl~'a~, Error~
        +to_string() String
    }

    Purl ..> PurlDisplayImpl : implements Display
    Purl ..> PurlDebugImpl : implements Debug
    PurlDisplayImpl ..> PackageUrl~'a~ : constructs and configures

    %% Updated behavior:
    %% - Errors from PackageUrl::new, with_namespace, with_version, add_qualifier
    %%   are now propagated as fmt::Error in PurlDisplayImpl.fmt

File-Level Changes

Change Details Files
Ensure Purl formatting round-trips percent-encoded names and propagates PackageUrl construction errors.
  • Use the fmt alias instead of fully-qualified std::fmt types in Purl visitor and formatting implementations.
  • Wrap PackageUrl::new, with_namespace, with_version, and add_qualifier in error mapping so any construction/encoding failure becomes a fmt::Error.
  • Add a regression test verifying that a generic purl with an encoded slash in the name round-trips correctly through Purl::from_str and Display.
common/src/purl.rs
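
The error mapping described for `common/src/purl.rs` can be sketched as follows. `UrlBuilder` and `InvalidComponent` are hypothetical stand-ins for the packageurl types, and the `Purl` struct is trimmed to three fields; only the shape of the fallible setters (returning `Result<&mut Self, _>`) is taken from this PR.

```rust
use std::fmt;

/// Hypothetical error type standing in for the packageurl error.
#[derive(Debug)]
struct InvalidComponent;

/// Hypothetical stand-in for the packageurl builder.
struct UrlBuilder {
    text: String,
}

impl UrlBuilder {
    fn new(ty: &str, name: &str) -> Result<Self, InvalidComponent> {
        if ty.is_empty() || name.is_empty() {
            return Err(InvalidComponent);
        }
        Ok(Self { text: format!("pkg:{ty}/{name}") })
    }

    /// Fallible setter: returns `&mut Self` on success so calls can chain.
    fn with_version(&mut self, version: &str) -> Result<&mut Self, InvalidComponent> {
        if version.is_empty() {
            return Err(InvalidComponent);
        }
        self.text.push('@');
        self.text.push_str(version);
        Ok(self)
    }
}

/// Trimmed-down Purl for the sketch (the real struct has more fields).
struct Purl {
    ty: String,
    name: String,
    version: Option<String>,
}

impl fmt::Display for Purl {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `fmt::Error` carries no payload, so the underlying cause is dropped.
        let mut url = UrlBuilder::new(&self.ty, &self.name).map_err(|_| fmt::Error)?;
        if let Some(version) = &self.version {
            url.with_version(version).map_err(|_| fmt::Error)?;
        }
        f.write_str(&url.text)
    }
}
```

One consequence of this design is that `to_string()` panics when `Display::fmt` returns an error, so callers that may hold invalid components are safer going through `write!` and checking the result.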
Align SBOM endpoint expectations and coverage with encoded-slash purl semantics across generic and huggingface ecosystems.
  • Update existing SBOM response expectations to use a generic purl whose name contains an encoded slash (%2F) instead of a literal slash path segment.
  • Add SBOM endpoint tests that query by huggingface and generic purls and assert that both resolve to the same described_by id using the encoded generic purl.
  • Keep the JSON subset assertion while tightening formatting around the expected_result assertion.
modules/fundamental/src/sbom/endpoints/test.rs
Tighten PackageUrl construction when translating OSV advisories, propagating namespace-related failures under the new packageurl API.
  • Change maven translation to use the try-operator on with_namespace so any failure aborts PackageUrl construction and is surfaced via the Result.
  • Refactor split_name helper to construct a PackageUrl and then and_then into with_namespace(namespace).cloned(), matching the new API that returns Result<&mut PackageUrl>.
  • Preserve existing qualifier handling and overall translation behavior while adapting to the updated packageurl crate semantics.
modules/ingestor/src/service/advisory/osv/translate.rs
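
A minimal sketch of the `and_then` / `cloned()` pattern mentioned for `split_name`, assuming a builder whose `with_namespace` returns `Result<&mut PackageUrl, Error>`. The `PackageUrl` and `Error` types below are simplified stand-ins, not the real packageurl API, and splitting at the last separator with `rsplit_once` is an assumption about how the namespace is derived.

```rust
/// Simplified stand-in for packageurl's PackageUrl.
#[derive(Clone, Debug, PartialEq)]
struct PackageUrl {
    ty: String,
    namespace: Option<String>,
    name: String,
}

/// Simplified stand-in for the crate's error type.
#[derive(Debug, PartialEq)]
struct Error;

impl PackageUrl {
    fn new(ty: &str, name: &str) -> Result<Self, Error> {
        if name.is_empty() {
            return Err(Error);
        }
        Ok(Self { ty: ty.to_string(), namespace: None, name: name.to_string() })
    }

    /// Fallible setter returning a mutable reference, as in the updated API.
    fn with_namespace(&mut self, namespace: &str) -> Result<&mut Self, Error> {
        if namespace.is_empty() {
            return Err(Error);
        }
        self.namespace = Some(namespace.to_string());
        Ok(self)
    }
}

/// Split `name` at the last `separator`; the left part becomes the namespace.
/// Any failure (no separator, invalid name or namespace) yields None
/// instead of panicking.
fn split_name(name: &str, ty: &str, separator: &str) -> Option<PackageUrl> {
    let (namespace, simple_name) = name.rsplit_once(separator)?;
    PackageUrl::new(ty, simple_name)
        // `cloned()` turns Result<&mut PackageUrl, _> into an owned value.
        .and_then(|mut purl| purl.with_namespace(namespace).cloned())
        .ok()
}
```

Under these assumptions, `split_name("org.example:artifact", "maven", ":")` yields a purl with namespace `org.example` and name `artifact`, while a name without a separator or with an empty namespace yields `None`.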
Update dependencies to versions/branches that contain the purl name encoding fix and remove an unused workspace dependency.
  • Bump the packageurl crate version in the root manifest to 0.6.0-rc.1.
  • Point the csaf dependency at the trustification/csaf-rs repository purl-name-encoding-fix branch instead of a pinned revision.
  • Override packageurl via a git dependency on jcrossley3/packageurl.rs issue-28 branch to pick up the encoding fix.
  • Remove an explicit packageurl dependency from the fundamental module manifest, relying on the workspace/root configuration instead.
  • Regenerate Cargo.lock to capture the new dependency graph.
Cargo.toml
modules/fundamental/Cargo.toml
Cargo.lock

Assessment against linked issues

Issue Objective Addressed Explanation
#2146 Ensure PURLs whose name contains a percent-encoded slash (%2F) are parsed and formatted without incorrectly treating the encoded slash as a namespace separator or otherwise altering the PURL structure.
#2146 Ensure SBOM ingestion and API responses round-trip PURLs with %2F in their names unchanged (no spurious namespace added, and the encoded slash preserved in the returned PURL).
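
The first objective comes down to ordering: split on literal `/` first, then percent-decode each segment. A minimal sketch of that ordering follows; `split_path` and `decode_slash` are hypothetical helpers, and the decoder only handles the encoded slash to keep the example short.

```rust
/// Simplified decoder for this sketch: only the encoded slash is handled.
fn decode_slash(segment: &str) -> String {
    segment.replace("%2F", "/").replace("%2f", "/")
}

/// Split the path portion of a purl (what follows "pkg:<type>/", with any
/// version and qualifiers already removed) into (namespace, name).
fn split_path(path: &str) -> (Option<String>, String) {
    // Split on the last literal '/' first; "%2F" is not a separator here.
    match path.rsplit_once('/') {
        Some((namespace, name)) => (Some(decode_slash(namespace)), decode_slash(name)),
        None => (None, decode_slash(path)),
    }
}
```

Decoding before splitting would turn `org%2Fmodel` into `org/model` and then produce a spurious namespace `org`, which matches the symptom described in the linked issue.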

@jcrossley3 jcrossley3 requested review from ctron and dejanb December 1, 2025 18:01
Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and found some issues that need to be addressed.

  • In split_name, the new with_namespace(namespace).unwrap() reintroduces a potential panic path; consider propagating the error (e.g., via and_then or ? in a Result-returning helper) instead of unwrapping.
  • Cargo.toml now declares packageurl twice (once with a version and once as a git dependency); it would be clearer to consolidate this into a single dependency specification or use [patch] so it’s obvious which source is authoritative.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `split_name`, the new `with_namespace(namespace).unwrap()` reintroduces a potential panic path; consider propagating the error (e.g., via `and_then` or `?` in a `Result`-returning helper) instead of unwrapping.
- Cargo.toml now declares `packageurl` twice (once with a version and once as a git dependency); it would be clearer to consolidate this into a single dependency specification or use `[patch]` so it’s obvious which source is authoritative.

## Individual Comments

### Comment 1
<location> `common/src/purl.rs:241` </location>
<code_context>
         let mut purl = PackageUrl::new(&self.ty, &self.name).map_err(|_| fmt::Error)?;
         if let Some(ns) = &self.namespace {
-            purl.with_namespace(ns);
+            purl.with_namespace(ns).map_err(|_| std::fmt::Error {})?;
         }
         if let Some(version) = &self.version {
</code_context>

<issue_to_address>
**issue (bug_risk):** Use `fmt::Error` directly instead of constructing `std::fmt::Error {}`

`fmt::Error` is a unit struct, so the braces in `std::fmt::Error {}` are redundant: the expression compiles, but it is inconsistent with the `map_err(|_| fmt::Error)?` used on the line above. Either `use std::fmt::Error;` and write `map_err(|_| Error)?`, or follow the existing pattern and use `map_err(|_| fmt::Error)?`. Apply the same change to the `with_version` call as well.
</issue_to_address>

### Comment 2
<location> `common/src/purl.rs:244` </location>
<code_context>
         }
         if let Some(version) = &self.version {
-            purl.with_version(version);
+            purl.with_version(version).map_err(|_| std::fmt::Error {})?;
         }
         for (key, value) in &self.qualifiers {
</code_context>

<issue_to_address>
**issue (bug_risk):** Simplify construction of `fmt::Error` on `with_version` error mapping

As with `with_namespace`, the braces in `std::fmt::Error {}` are redundant for a unit struct. The expression compiles, but `map_err(|_| fmt::Error)?` (or an equivalent mapping) keeps the error conversion consistent with the rest of the function.
</issue_to_address>

### Comment 3
<location> `modules/ingestor/src/service/advisory/osv/translate.rs:63` </location>
<code_context>
             PackageUrl::new(ty, name)
                 .map(|mut purl| {
-                    purl.with_namespace(namespace);
+                    purl.with_namespace(namespace).unwrap();
                     purl
                 })
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Avoid `unwrap()` on `with_namespace` to prevent panics and keep error handling consistent

In `translate` you now propagate `with_namespace` errors with `?`, but here `unwrap()` reintroduces a panic on malformed namespaces. Please handle the error instead (e.g., propagate it, or map the failure to `None` for this helper) to stay consistent and avoid panics.
</issue_to_address>

### Comment 4
<location> `modules/fundamental/src/sbom/endpoints/test.rs:1727-1739` </location>
<code_context>
+
+    // Assert that both packages (generic & huggingface) return the same SBOM
+    // First huggingface...
+    let uri = format!(
+        "/api/v2/sbom/by-package?purl={}",
+        encode("pkg:huggingface/ibm-granite/[email protected]")
+    );
+    let req = TestRequest::get().uri(&uri).to_request();
+    let response: Value = app.call_and_read_body_json(req).await;
+    assert_eq!(
</code_context>

<issue_to_address>
**suggestion (testing):** Add a direct assertion that the huggingface request can be made with an already-encoded slash as well

This block covers the case where a literal `/` in the huggingface purl is translated to `%2F` in the generic purl. Another real-world edge case is when clients pre-encode the `/` in the huggingface purl itself (e.g. `pkg:huggingface/ibm-granite%[email protected]`).

Please add a third request that uses a huggingface purl with `%2F` in the name (still wrapped in `encode(...)` for the query parameter) and asserts it resolves to the same generic SBOM id. That will verify the endpoint behaves correctly whether or not the client pre-encodes the slash in the name segment.

```suggestion
    // Assert that both packages (generic & huggingface) return the same SBOM
    // First huggingface with a literal '/' in the name...
    let uri = format!(
        "/api/v2/sbom/by-package?purl={}",
        encode("pkg:huggingface/ibm-granite/[email protected]")
    );
    let req = TestRequest::get().uri(&uri).to_request();
    let response: Value = app.call_and_read_body_json(req).await;
    assert_eq!(
        response["items"][0]["described_by"][0]["id"],
        "pkg:generic/ibm-granite%[email protected]"
    );

    // Then huggingface with a pre-encoded '/' in the name...
    let uri = format!(
        "/api/v2/sbom/by-package?purl={}",
        encode("pkg:huggingface/ibm-granite%[email protected]")
    );
    let req = TestRequest::get().uri(&uri).to_request();
    let response: Value = app.call_and_read_body_json(req).await;
    assert_eq!(
        response["items"][0]["described_by"][0]["id"],
        "pkg:generic/ibm-granite%[email protected]"
    );

    // And then generic...
```
</issue_to_address>

codecov bot commented Dec 1, 2025

Codecov Report

❌ Patch coverage is 44.44444% with 5 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@545028c). Learn more about missing BASE report.

Files with missing lines Patch % Lines
common/src/purl.rs 40.00% 1 Missing and 2 partials ⚠️
...les/ingestor/src/service/advisory/osv/translate.rs 50.00% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2156   +/-   ##
=======================================
  Coverage        ?   68.20%           
=======================================
  Files           ?      376           
  Lines           ?    21190           
  Branches        ?    21190           
=======================================
  Hits            ?    14453           
  Misses          ?     5863           
  Partials        ?      874           

@jcrossley3 jcrossley3 force-pushed the 2146 branch 2 times, most recently from 66039c2 to 01b1d43 on December 1, 2025 21:11
@jcrossley3
Contributor Author

@sourcery-ai review

sourcery-ai[bot]

This comment was marked as resolved.

Contributor

ctron commented Dec 2, 2025

I'll release the 0.6.0-rc.1 in a few minutes (unless I forget). Does this mean we have to re-ingest all documents?

@jcrossley3
Contributor Author

Does this mean we have to re-ingest all documents?

I wouldn't think so. Only those with %2F in their purl names, and it's only through silently failing actions within the UI that anyone would even notice, at which point one could simply re-upload the SBOM in question.

Contributor

ctron commented Dec 3, 2025

Only those with %2F in their purl names,

Can we formulate a SQL query to find such SBOMs?

at which point one could simply re-upload the SBOM in question.

Assuming that the user still has the original document.

@jcrossley3
Contributor Author

Only those with %2F in their purl names,

Can we formulate a SQL query to find such SBOMs?

I'm not sure that's worth the effort. It would take me days to come up with that. :)

And I've only seen it in AIBOMs, and I honestly think the %2Fs in them are a mistake.

at which point one could simply re-upload the SBOM in question.

Assuming that the user still has the original document.

They can always download the original and re-ingest, right?

@jcrossley3 jcrossley3 force-pushed the 2146 branch 2 times, most recently from e9c234d to f520b0c on December 5, 2025 17:24
Cargo.toml Outdated
-csaf = { git = "https://github.com/trustification/csaf-rs", rev = "17620a225744b4a18845d4f7bf63354e01109b91" }
+csaf = { git = "https://github.com/trustification/csaf-rs", branch = "purl-name-encoding-fix" }
# required due to https://github.com/scm-rs/packageurl.rs/issues/28
packageurl = { git = "https://github.com/scm-rs/packageurl.rs", rev = "d24f20f3d3c0242d88687119a5353dbc681eac2e" }
Contributor

IIRC the version with the change is released. We should use it. If something is missing, let me know.

#cpe = { git = "https://github.com/ctron/cpe-rs", rev = "c3c05e637f6eff7dd4933c2f56d070ee2ddfb44b" }
# required due to https://github.com/voteblake/csaf-rs/pull/29
-csaf = { git = "https://github.com/trustification/csaf-rs", rev = "17620a225744b4a18845d4f7bf63354e01109b91" }
+csaf = { git = "https://github.com/trustification/csaf-rs", branch = "purl-name-encoding-fix" }
Contributor

If we keep patching it, we should move it over to the scm-rs organization.

Contributor Author

Agree, but I'd prefer to adopt https://github.com/csaf-rs/csaf instead.

Development

Successfully merging this pull request may close these issues.

PURL's with %2F in their name aren't ingested properly

2 participants