feat: ES-163 Add folder hashing support #98

Merged
merged 5 commits into from
Apr 11, 2025
10 changes: 8 additions & 2 deletions .github/workflows/lint.yml
Original file line number Diff line number Diff line change
@@ -32,12 +32,17 @@ jobs:
 # List all changed Python files since the merge base.
 files=$(git diff --name-only "$merge_base" HEAD | grep '\.py$' || true)

+# Filter out files that match exclude patterns from pyproject.toml
+# this is a temporary workaround until we fix all the lint errors
+filtered_files=$(echo "$files" | grep -v -E 'tests/|test_.*\.py|src/protoc_gen_swagger/|src/scanoss/api/' || true)
+
 # Use the multi-line syntax for outputs.
 echo "files<<EOF" >> "$GITHUB_OUTPUT"
-echo "${files}" >> "$GITHUB_OUTPUT"
+echo "${filtered_files}" >> "$GITHUB_OUTPUT"
 echo "EOF" >> "$GITHUB_OUTPUT"

-echo "Changed files: ${files}"
+echo "Changed files before filtering: ${files}"
+echo "Changed files after filtering: ${filtered_files}"

 - name: Run Ruff on changed files
 run: |
@@ -50,3 +55,4 @@ jobs:
# Pass the list of changed files to Ruff.
echo "${{ steps.changed_files.outputs.files }}" | xargs ruff check
fi
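
The exclude filter added in this workflow step can be checked offline. What follows is a minimal Python sketch, assuming the same alternation that the step passes to `grep -v -E`; the function name and the sample paths are illustrative and are not taken from the repository.

```python
import re

# Same alternation the workflow passes to `grep -v -E` (unanchored search).
EXCLUDE_PATTERN = re.compile(r'tests/|test_.*\.py|src/protoc_gen_swagger/|src/scanoss/api/')


def filter_changed_files(files):
    """Return the paths that match none of the exclude patterns (like `grep -v`)."""
    return [path for path in files if path and not EXCLUDE_PATTERN.search(path)]


# Illustrative paths only.
changed = [
    'src/scanoss/cli.py',
    'src/scanoss/api/example_pb2.py',
    'tests/test_example.py',
    'src/protoc_gen_swagger/options/annotations_pb2.py',
]
print(filter_changed_files(changed))
```

Because the pattern is unanchored, `test_.*\.py` also excludes names that merely contain `test_`, such as `latest_report.py`; anchoring the pattern to a path segment would avoid that if it ever matters.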

10 changes: 9 additions & 1 deletion CHANGELOG.md
@@ -9,6 +9,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- Upcoming changes...

## [1.21.0] - 2025-03-27
### Added
- Add folder-scan subcommand
- Add folder-hash subcommand
- Add AbstractPresenter class for presenting output in a given format
- Add several reusable helper functions for constructing config objects from CLI args

## [1.20.6] - 2025-03-19
### Added
- Added HTTP/gRPC generic headers feature using --header flag
Expand Down Expand Up @@ -490,4 +497,5 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
[1.20.3]: https://github.com/scanoss/scanoss.py/compare/v1.20.2...v1.20.3
[1.20.4]: https://github.com/scanoss/scanoss.py/compare/v1.20.3...v1.20.4
[1.20.5]: https://github.com/scanoss/scanoss.py/compare/v1.20.4...v1.20.5
[1.20.6]: https://github.com/scanoss/scanoss.py/compare/v1.20.5...v1.20.6
[1.21.0]: https://github.com/scanoss/scanoss.py/compare/v1.20.6...v1.21.0
9 changes: 9 additions & 0 deletions CLIENT_HELP.md
@@ -420,4 +420,13 @@ scanoss-py insp undeclared -i scan-results.json --output undeclared-summary.jira
The following command can be used to inspect for copyleft licenses and save the results in Jira Markdown format.
```bash
scanoss-py insp copyleft -i scan-results.json --output copyleft-summary.jiramd --status copyleft-status.jiramd --format jira_md
```

### Folder-Scan a Project Folder

The `folder-scan` subcommand scans an entire directory, recursively processing its files to generate folder-level fingerprints. It computes CRC64 hashes and simhash values to detect directory-level similarities, which is especially useful for comparing large codebases or detecting duplicate folder structures.

**Usage:**
```shell
scanoss-py folder-scan /path/to/folder -o folder-scan-results.json
```
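
To illustrate how a simhash can expose folder-level similarity, here is a minimal sketch. It is not the SCANOSS implementation: the choice of file names as tokens, the BLAKE2b token hash, and the function names are all assumptions made for illustration.

```python
import hashlib


def token_hash(token: str) -> int:
    """Map a token to a stable 64-bit integer (BLAKE2b is an arbitrary choice here)."""
    digest = hashlib.blake2b(token.encode('utf-8'), digest_size=8).digest()
    return int.from_bytes(digest, 'big')


def simhash(tokens, bits: int = 64) -> int:
    """Combine token hashes into one fingerprint by a per-bit majority vote."""
    votes = [0] * bits
    for token in tokens:
        h = token_hash(token)
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if votes[i] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Count differing bits; a small distance suggests similar inputs."""
    return bin(a ^ b).count('1')


# Hypothetical folder listings that differ in a single file name.
folder_a = ['main.py', 'utils.py', 'README.md', 'setup.cfg']
folder_b = ['main.py', 'utils.py', 'README.md', 'pyproject.toml']
print(hamming_distance(simhash(folder_a), simhash(folder_b)))
```

Unlike a checksum, which changes completely when any input changes, a simhash tends to stay close (in Hamming distance) when most tokens are shared, which is what makes it suitable for similarity detection.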
73 changes: 73 additions & 0 deletions docs/source/index.rst
@@ -228,6 +228,79 @@ Convert file format to plain, SPDX-Lite, CycloneDX or csv.
   * - --format <format>, -f <format>
     - Indicates the result output format: {plain, cyclonedx, spdxlite, csv}. (optional - default plain)

--------------------------------
Folder Scanning: folder-scan, fs
--------------------------------

Performs a comprehensive scan of a directory using folder hashing to identify components and their matches.

.. code-block:: bash

   scanoss-py folder-scan <directory>

.. list-table::
   :widths: 20 30
   :header-rows: 1

   * - Argument
     - Description
   * - --output <file name>, -o <file name>
     - Output result file name (optional - default STDOUT)
   * - --format <format>, -f <format>
     - Output format: {json} (optional - default json)
   * - --timeout <seconds>, -M <seconds>
     - Timeout in seconds for API communication (optional - default 600)
   * - --best-match, -bm
     - Enable best match mode (optional - default: False)
   * - --threshold <1-100>
     - Threshold for result matching (optional - default: 100)
   * - --settings <file>, -st <file>
     - Settings file to use for scanning (optional - default scanoss.json)
   * - --skip-settings-file, -stf
     - Skip default settings file (scanoss.json) if it exists
   * - --key <token>, -k <token>
     - SCANOSS API Key token (optional - not required for default OSSKB URL)
   * - --proxy <url>
     - Proxy URL to use for connections
   * - --pac <file/url>
     - Proxy auto configuration. Specify a file, http url or "auto"
   * - --ca-cert <file>
     - Alternative certificate PEM file
   * - --api2url <url>
     - SCANOSS gRPC API 2.0 URL (optional - default: https://api.osskb.org)
   * - --grpc-proxy <url>
     - GRPC Proxy URL to use for connections

--------------------------------
Folder Hashing: folder-hash, fh
--------------------------------

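Generates hashes for the files in a given directory and its subdirectories. The CRC64 and simhash values used for folder-level fingerprints are checksums and similarity hashes, not cryptographic digests.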

.. code-block:: bash

   scanoss-py folder-hash <directory>

.. list-table::
   :widths: 20 30
   :header-rows: 1

   * - Argument
     - Description
   * - --output <file name>, -o <file name>
     - Output result file name (optional - default STDOUT)
   * - --format <format>, -f <format>
     - Output format: {json} (optional - default json)
   * - --settings <file>, -st <file>
     - Settings file to use for scanning (optional - default scanoss.json)
   * - --skip-settings-file, -stf
     - Skip default settings file (scanoss.json) if it exists

Both commands also support these general options:

* --debug, -d: Enable debug messages
* --trace, -t: Enable trace messages
* --quiet, -q: Enable quiet mode
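
When either command is driven from a script, the documented options can be assembled into an argument list first. The sketch below is an assumption-laden illustration: ``build_folder_command`` is a hypothetical helper, not part of scanoss-py, and it uses only the flags listed in the tables above.

```python
from typing import List, Optional


def build_folder_command(
    subcommand: str,
    directory: str,
    output: Optional[str] = None,
    threshold: Optional[int] = None,
    best_match: bool = False,
) -> List[str]:
    """Assemble an argument list for `scanoss-py folder-scan` or `folder-hash`."""
    if subcommand not in ('folder-scan', 'folder-hash'):
        raise ValueError(f'unsupported subcommand: {subcommand}')
    if subcommand == 'folder-hash' and (threshold is not None or best_match):
        raise ValueError('--threshold and --best-match are documented for folder-scan only')
    args = ['scanoss-py', subcommand, directory]
    if output:
        args += ['--output', output]
    if threshold is not None:
        # The table documents the accepted range as 1-100.
        if not 1 <= threshold <= 100:
            raise ValueError('threshold must be between 1 and 100')
        args += ['--threshold', str(threshold)]
    if best_match:
        args.append('--best-match')
    return args


print(build_folder_command('folder-scan', './src', output='results.json', threshold=80, best_match=True))
```

The resulting list can be handed to ``subprocess.run(args, check=True)`` on a machine where scanoss-py is installed.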

-----------------
Component:
-----------------
7 changes: 6 additions & 1 deletion pyproject.toml
@@ -8,7 +8,12 @@ select = ["E", "F", "I", "PL"]
 line-length = 120
 # Assume Python 3.7+
 target-version = "py37"
-exclude = ["tests/*", "test_*.py", "src/scanoss/cli.py"]
+exclude = [
+    "tests/*",
+    "test_*.py",
+    "src/protoc_gen_swagger/*",
+    "src/scanoss/api/*",
+]

 [tool.ruff.format]
 quote-style = "single"
3 changes: 2 additions & 1 deletion requirements.txt
@@ -11,4 +11,5 @@ google-api-core
importlib_resources
packageurl-python
pathspec
jsonschema
crc
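
The `crc` package added here presumably backs the CRC64 hashing mentioned in CLIENT_HELP.md. For readers who want to see what such a checksum computes, here is a dependency-free sketch of a bitwise CRC-64. The ECMA-182 parameters (polynomial, zero initial value, no final XOR) are an assumption; the PR does not state which CRC-64 variant is used.

```python
CRC64_ECMA_POLY = 0x42F0E1EBA9EA3693  # CRC-64/ECMA-182 generator polynomial (assumed variant)
MASK64 = 0xFFFFFFFFFFFFFFFF


def crc64(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-64, MSB first, initial value 0, no final XOR."""
    for byte in data:
        # Fold the next byte into the top 8 bits of the register.
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ CRC64_ECMA_POLY) & MASK64
            else:
                crc = (crc << 1) & MASK64
    return crc


print(f'{crc64(b"example file contents"):016x}')
```

Because there is no final XOR in this variant, the function can be fed data incrementally by passing the previous result as `crc`. A table-driven implementation would be considerably faster for real workloads.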
1 change: 1 addition & 0 deletions setup.cfg
@@ -38,6 +38,7 @@ install_requires =
packageurl-python
pathspec
jsonschema
crc


[options.extras_require]
21 changes: 9 additions & 12 deletions src/protoc_gen_swagger/options/annotations_pb2.py

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion src/protoc_gen_swagger/options/annotations_pb2_grpc.py
@@ -1,4 +1,4 @@
# Generated by the gRPC Python protocol compiler plugin. DO NOT EDIT!
"""Client and server classes corresponding to protobuf-defined services."""

import grpc
