feat: [CMPA-604] Add distribution:url label to nodes on DepGraph when includeComponentMetadata #224

Open
calhar-snyk wants to merge 4 commits into master from feat/CMPA-604-distribution-url-label

Conversation

@calhar-snyk calhar-snyk commented Jun 26, 2026

Contributor
  • Ready for review
  • Follows CONTRIBUTING rules
  • Reviewed by Snyk internal team

What does this PR do?

Extends includeComponentMetadata to also emit a distribution:url label on each Maven node, recording the remote URL the artifact was originally resolved from. The URL is built from two sources in the local ~/.m2 repository:

  • _remote.repositories (written by Maven next to each installed artifact) — gives the repository id the artifact came from (e.g. central).
  • mvn dependency:list-repositories — maps that repository id to its base URL.

The artifact's repo-relative path is appended to the base URL to form the full distribution:url, which flows through to the CycloneDX ExternalReferences (distribution type).
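A minimal sketch of how the two sources could be combined into a URL. The function names, the map shapes, and the sample coordinates are hypothetical and only illustrate the description above; they are not the code in lib/parse/m2-remote-repositories.ts. The record shape `<filename>><repoId>=` is taken from the commit notes further down.

```typescript
// Hypothetical helpers for illustration; not the plugin's actual API.

/** Parse `_remote.repositories` content into a filename -> repository id map. */
function parseRemoteRepositories(content: string): Map<string, string> {
  const repoIdByFile = new Map<string, string>();
  for (const rawLine of content.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue; // header comments, blanks
    // Records look like `<filename>><repoId>=`.
    const match = /^(.+)>(.*)=$/.exec(line);
    // An empty repo id (e.g. a locally installed artifact) yields no entry.
    if (match && match[2] !== '') repoIdByFile.set(match[1], match[2]);
  }
  return repoIdByFile;
}

/** Join the repository base URL and the artifact's repo-relative path. */
function buildDistributionUrl(
  repoIdByFile: Map<string, string>,
  repoUrlById: Map<string, string>,
  fileName: string,
  relativePath: string,
): string | undefined {
  const repoId = repoIdByFile.get(fileName);
  if (!repoId) return undefined; // no record for this file: no label
  const baseUrl = repoUrlById.get(repoId);
  if (!baseUrl) return undefined; // unknown repo id: no label
  // Strip trailing slashes from the base and leading ones from the path,
  // so the join never produces `//`.
  return `${baseUrl.replace(/\/+$/, '')}/${relativePath.replace(/^\/+/, '')}`;
}
```

Returning `undefined` instead of throwing keeps the "no label, never an error" behaviour local to the helper, so callers only need to skip empty results.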

Key behaviours:

  • Resolved once per inspect: the node set and each node's .m2 artifact path are computed a single time and shared between the hash-label and distribution-url passes (lib/parse/m2-batch.ts).
  • The hash-label reads run concurrently with the dependency:list-repositories subprocess rather than serially.
  • Graceful degradation throughout — a missing file, an unknown repo id, or a failed Maven invocation simply means no label for that artifact, never an error.
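The batching and degradation behaviour listed above could look roughly like the sketch below. `buildLabelMapSketch`, its parameters, and the default batch size of 16 are assumptions made for illustration; the real helper lives in lib/parse/m2-batch.ts and may differ.

```typescript
// Hypothetical sketch; names and the batch size are illustrative assumptions.
async function buildLabelMapSketch<T>(
  nodeIds: string[],
  readLabel: (nodeId: string) => Promise<T | undefined>,
  batchSize = 16,
): Promise<Map<string, T>> {
  // A failed read degrades to "no label" rather than rejecting the whole batch.
  const safeRead = async (id: string): Promise<T | undefined> => {
    try {
      return await readLabel(id);
    } catch {
      return undefined;
    }
  };

  const labels = new Map<string, T>();
  // Process the nodes in fixed-size slices so that at most `batchSize`
  // reads are in flight at any one time.
  for (let i = 0; i < nodeIds.length; i += batchSize) {
    const batch = nodeIds.slice(i, i + batchSize);
    const results = await Promise.all(batch.map(safeRead));
    results.forEach((label, j) => {
      if (label !== undefined) labels.set(batch[j], label); // store non-empty only
    });
  }
  return labels;
}
```

Because the reader is passed in as a callback, the same loop can serve both the hash-label pass and the distribution-url pass.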

Where should the reviewer start?

lib/parse/m2-remote-repositories.ts (the new parsing/URL logic) and the orchestration in lib/index.ts (inspect, the includeComponentMetadata block). lib/parse/m2-batch.ts is the small shared scaffold both label passes now use.

How should this be manually tested?

Against a Maven project whose dependencies are already resolved into ~/.m2 (so the _remote.repositories files exist):

# SBOM path exercises includeComponentMetadata + --print-graph
./bin/snyk sbom --format=cyclonedx1.4+json --file=pom.xml /path/to/maven-project \
  | jq '.components[].externalReferences'

Confirm distribution-type references appear with sane URLs. Worth spot-checking:

  • Off-cwd / monorepo pom (--file=subdir/pom.xml from a different working dir) — labels should still resolve (the dependency:list-repositories call is now scoped with cwd/--file, mirroring the dependency-tree pipeline).
  • A mirror/settings.xml repo URL ending in / — the emitted URL must not contain a // (trailing-slash normalisation).
  • Aggregate project (--maven-aggregate-project) — repositories are unioned across modules.

Direct-plugin alternative:

DEBUG=snyk-mvn-plugin node -e "require('./dist').inspect('/path/to/project','pom.xml',{includeComponentMetadata:true}).then(r=>console.log(JSON.stringify(r,null,2)))"

Then grep for distribution:url.

Automated coverage: npm run test:functional (see tests/jest/functional/m2-remote-repositories.spec.ts and m2-hash-labels.spec.ts).

Any background context you want to provide?

Builds on the existing includeComponentMetadata hash-label work and reuses the same .m2 access path. Notable correctness fixes folded in during review: the feature is gated solely on includeComponentMetadata (no separate flag); dependency:list-repositories runs with --batch-mode and proper cwd/--file targeting; the _remote.repositories read is byte-bounded (64 KiB) like the sibling hash-labels module; and the repo-URL join is trailing-slash-normalised.
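For reviewers unfamiliar with the byte-bounded read mentioned above, here is a minimal sketch of the idea using Node's fs promises API. `readPrefix` and `MAX_BYTES` are hypothetical names, and the sketch issues a single read call, which is an assumption that suits small regular files; the module in this PR may be structured differently.

```typescript
import { promises as fs } from 'fs';

// Hypothetical constant mirroring the 64 KiB cap described in this PR.
const MAX_BYTES = 64 * 1024;

/**
 * Read at most `maxBytes` from the start of a file.
 * Returns undefined if the file is missing or unreadable.
 */
async function readPrefix(
  filePath: string,
  maxBytes: number = MAX_BYTES,
): Promise<string | undefined> {
  try {
    const handle = await fs.open(filePath, 'r');
    try {
      const buffer = Buffer.alloc(maxBytes);
      // One read from offset 0; anything beyond `maxBytes` is never buffered.
      const { bytesRead } = await handle.read(buffer, 0, maxBytes, 0);
      return buffer.subarray(0, bytesRead).toString('utf8');
    } finally {
      await handle.close(); // always release the file descriptor
    }
  } catch {
    return undefined; // graceful degradation: the caller emits no label
  }
}
```

Unlike `fs.readFile`, memory use here is capped by `maxBytes` regardless of how large the file on disk is.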

What are the relevant tickets?

Three correctness fixes for the distribution:url label feature:

- Gate the distribution:url path on includeComponentMetadata and drop
  the separate includeDistribution option. The feature is part of
  includeComponentMetadata, not its own flag; as written the path was
  gated on includeDistribution while repositoryPath was only resolved
  under includeComponentMetadata, so it could never run.

- Run dependency:list-repositories via subProcess.execute with the
  Maven working directory and --file targeting, mirroring the
  dependency-tree pipeline. Previously it ran with no cwd/--file, so
  off-cwd or monorepo-subdir poms resolved a different project's
  repositories (or none).

- Pass --batch-mode so CI/non-tty Maven output isn't decorated with
  colour codes or download-progress lines that break repo-line parsing.
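
As an illustration of the invocation described above, the sketch below builds the argument list and runs Maven from a given working directory. It uses Node's `child_process.execFile` as a stand-in for the plugin's own `subProcess.execute`, and assumes an `mvn` executable on the PATH; the function names are hypothetical.

```typescript
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

/** Build the Maven arguments; `--batch-mode` and `--file` are standard Maven CLI options. */
function listRepositoriesArgs(targetFile?: string): string[] {
  const args = ['dependency:list-repositories', '--batch-mode'];
  if (targetFile) args.push('--file', targetFile); // target a specific pom
  return args;
}

/** Run the goal from `cwd`; any failure degrades to undefined (no labels). */
async function listRepositoriesOutput(
  cwd: string,
  targetFile?: string,
): Promise<string | undefined> {
  try {
    // Assumption: `mvn` resolves on PATH (on Windows this is usually `mvn.cmd`).
    const { stdout } = await execFileAsync('mvn', listRepositoriesArgs(targetFile), { cwd });
    return stdout;
  } catch {
    return undefined;
  }
}
```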
…604]

Cleanups for the includeComponentMetadata label passes:

- Extract collectM2Nodes + buildLabelMap into lib/parse/m2-batch.ts, a
  single bounded-concurrency batch loop. The hash-label and
  distribution-url passes were near-verbatim copies of the node-id
  union + slice/Promise.all/store-non-empty loop; now there is one.

- Resolve the node set and each node's artifact path once and share it
  across both passes, instead of rebuilding the node Set and recomputing
  dependencyIdToArtifactPath in each. readM2HashLabels and
  readRemoteRepositoryLabel now take the pre-resolved artifact path.

- Run the hash-label reads concurrently with the
  dependency:list-repositories subprocess rather than strictly after it,
  so inspect latency is the max of the two rather than their sum.
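
A rough sketch of the "max rather than sum" point: both tasks are started before either is awaited. The function and parameter names are hypothetical and the two callbacks stand in for the hash-label reads and the Maven subprocess.

```typescript
// Hypothetical orchestration sketch; not the code in lib/index.ts.
async function runLabelInputsConcurrently<H, R>(
  readHashLabels: () => Promise<H>,
  listRepositories: () => Promise<R | undefined>,
): Promise<{ hashLabels: H; repositories: R | undefined }> {
  // Both promises are created up front, so the file reads and the subprocess
  // overlap; total latency is roughly that of the slower one.
  const [hashLabels, repositories] = await Promise.all([
    readHashLabels(),
    // A failed repository listing only disables distribution:url labels.
    listRepositories().catch(() => undefined),
  ]);
  return { hashLabels, repositories };
}
```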
…PA-604]

- Read _remote.repositories through a bounded fs.open prefix read
  (64 KiB) instead of fs.readFile, mirroring the m2-hash-labels cap. A
  real file is a handful of short <filename>><repoId>= records, well
  under 1 KiB; the bound stops a misconfigured mirror that wrote a large
  HTML error page at this path from being buffered wholesale.

- Strip a trailing slash from the repository URL before joining the
  artifact's relative path, so a settings.xml/mirror URL like
  '.../maven2/' no longer yields a '.../maven2//com/...' double slash.

Adds m2-remote-repositories.spec.ts covering URL construction, the
trailing-slash case, unknown repo id, and a missing file.
@calhar-snyk calhar-snyk requested a review from a team as a code owner June 26, 2026 15:16
snyk-io Bot commented Jun 26, 2026


Snyk checks have passed. No issues have been found so far.

| Scan Engine | Critical | High | Medium | Low | Total (0) |
| --- | --- | --- | --- | --- | --- |
| Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
| Licenses | 0 | 0 | 0 | 0 | 0 issues |
| Code Security | 0 | 0 | 0 | 0 | 0 issues |

@calhar-snyk calhar-snyk changed the title from "Feat/cmpa 604 distribution url label" to "feat: [CMPA-604] Add distribution:url label to nodes on DepGraph when includeComponentMetadata" Jun 26, 2026