Conversation
- create pydantic models for CycloneDX v1.6 and SPDX v2.3 - use uv for running without venv - remove outdated test file UV is not required, but recommended. Versions in the requirements.txt are known to work.
There was a problem hiding this comment.
Pull Request Overview
Adds CycloneDX (v1.6) support alongside existing SPDX (v2.3) by introducing Pydantic models and refactoring SBOM parsing and merging logic. Also updates usage instructions to optionally run via uv and removes an outdated test file.
- Introduces typed models for SPDX and CycloneDX and unified FDARecord export format
- Refactors merge/dedup logic (list-based FDARecord processing) and Excel export
- Removes prior merge test; README updated with supported formats and uv usage
Reviewed Changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| gen_sbom.py | Major refactor: adds data models, new merge/deduplicate logic, Excel export rewrite, CycloneDX support |
| README.md | Documents supported formats and uv-based execution |
| test_gen_sbom.py | Removed legacy merge test (no replacement added) |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
dfb4631 to
7c91e20
Compare
VCPKG is in the process of better supporting SBOMs. In the meantime, this is the next best solution. This allows mapping the packages to CPE strings. There is a manual map in vcpkg.yml. Not very nice, but better than nothing.
Mostly AI generated. Trivy and Syft/Grype are better tools. Use this only as a last resort. There is a reason there are entire projects and paid products exist solve this problem. It is not trivial. You can get around 80% with this tool, but the last 20% is very difficult.
| import sys | ||
| import time | ||
|
|
||
| import nvdlib |
There was a problem hiding this comment.
I don't see this package in the requirements.txt file (I'm aware this isn't technically necessary if this file is run as a script, but this breaks standard tooling on the IDE side of things if you are only using the requirements file to install into a virtual environment).
Side note: Rather than trying to keep both systems going - old-school requirements.txt and uv / pyproject dependency management style - I would just fully commit and switch the whole repo over to uv. I think, especially with internal tools, we should take advantage of opportunities like this to switch over to new tools whenever possible.
| @click.command() | ||
| @click.argument( | ||
| "input_directory_path", | ||
| type=click.Path(exists=True, file_okay=False, path_type=Path), | ||
| ) | ||
| @click.argument("output_file_path", type=click.Path(dir_okay=False, path_type=Path)) | ||
| @click.option("--verbose", is_flag=True, help="Enable verbose logging.") | ||
| @click.option("--author-name", type=str, default=None, help="Override the Author Name.") | ||
| @click.option("--vcpkg", is_flag=True, help="Combine VCPKG SBOMs.") | ||
| @click.option( | ||
| "--spdx-output-file", | ||
| type=click.Path(dir_okay=False, path_type=Path), | ||
| default=None, | ||
| help="Output combined SPDX SBOM file path (for VCPKG only).", | ||
| ) |
There was a problem hiding this comment.
I'm glad to see the removal of argparse!
Btw, just because I think you might be interested in exploring this in the future - a library I like even more than click is typer. It essentially re-uses the type-annotations as validation (kinda similar to pydantic), so it cuts way down on boilerplate.
| class FDARecord(BaseModel): | ||
| """FDA required fields.""" | ||
|
|
||
| author: str | ||
| timestamp: str | ||
| supplier: str = "Open-source software" | ||
| name: str | ||
| version: str | ||
| unique_identifier: str | ||
| relationship: Literal["Is contained by"] = "Is contained by" |
There was a problem hiding this comment.
For future maintainability, it would be nice to have a docstring here noting where these are coming from (e.g., Based on the ___ guidance, [link]).
| """Merge two SBOMs, keeping the newest version of each package.""" | ||
| records = {(r.name, r.supplier): r for r in sbom1} | ||
| for r in sbom2: | ||
| key = (r.name, r.supplier) | ||
| if key in records: | ||
| records[key] = newer(records[key], r) |
There was a problem hiding this comment.
Hmm... from a security perspective, I'm not sure about the soundness of this.
In general, if you had to pick one, wouldn't it make more sense to use the oldest version as the record instead of the newest? I.e., [insert_thing] is only as strong as its weakest link - if we are scanning a combined sbom for vulns, I would think that the worst offenders would come from being out-of-date, rather than newly introduced vulnerabilities.
That being said, I also don't think we should just be using one or the other - cyclonedx supports declaring version ranges instead of a singular version, and both I'm not sure about spdx, but I know you can use parent-child / linkage to declare the same dependency twice, but as belonging to different parts of the application.
UV is not required, but recommended. Versions in the requirements.txt are known to work.