Commit 3ce0d0a

Merge pull request #98 from scanoss/feature/mdaloia/add-folder-scan-command
feat: ES-163 Add folder hashing support
2 parents 8af0416 + 09df227 commit 3ce0d0a

43 files changed, +2621 -1085 lines changed

Diff for: .github/workflows/lint.yml

+8 -2
@@ -32,12 +32,17 @@ jobs:
 # List all changed Python files since the merge base.
 files=$(git diff --name-only "$merge_base" HEAD | grep '\.py$' || true)

+# Filter out files that match exclude patterns from pyproject.toml
+# this is a temporary workaround until we fix all the lint errors
+filtered_files=$(echo "$files" | grep -v -E 'tests/|test_.*\.py|src/protoc_gen_swagger/|src/scanoss/api/' || true)
+
 # Use the multi-line syntax for outputs.
 echo "files<<EOF" >> "$GITHUB_OUTPUT"
-echo "${files}" >> "$GITHUB_OUTPUT"
+echo "${filtered_files}" >> "$GITHUB_OUTPUT"
 echo "EOF" >> "$GITHUB_OUTPUT"

-echo "Changed files: ${files}"
+echo "Changed files before filtering: ${files}"
+echo "Changed files after filtering: ${filtered_files}"

 - name: Run Ruff on changed files
 run: |
@@ -50,3 +55,4 @@ jobs:
 # Pass the list of changed files to Ruff.
 echo "${{ steps.changed_files.outputs.files }}" | xargs ruff check
 fi
+
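
Review note: the pattern passed to `grep -v -E` in this workflow step is unanchored, so it drops any path that contains one of the alternatives anywhere in the string. The following is a minimal Python sketch of the same filter, assuming `re.search` behaves like `grep -E` for this particular pattern. The function name and the sample paths are hypothetical and are not taken from the repository.

```python
import re

# Same alternation as the grep -v -E pattern in the workflow step.
EXCLUDE = re.compile(r'tests/|test_.*\.py|src/protoc_gen_swagger/|src/scanoss/api/')


def filter_files(paths):
    """Keep only the paths that do not match the exclude pattern (like grep -v)."""
    return [p for p in paths if not EXCLUDE.search(p)]


if __name__ == '__main__':
    # Hypothetical paths, chosen only to exercise each alternative.
    sample = [
        'src/scanoss/cli.py',
        'tests/test_scan.py',
        'src/scanoss/api/common_pb2.py',
        'src/protoc_gen_swagger/options/annotations_pb2.py',
        'src/scanoss/latest_release.py',
    ]
    for path in filter_files(sample):
        print(path)
```

Because nothing anchors `test_.*\.py` to the start of a file name, a hypothetical `src/scanoss/latest_release.py` is also filtered out (it contains `test_release.py`). Whether that matters depends on the file names actually present in the repository.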

Diff for: CHANGELOG.md

+9 -1
@@ -9,6 +9,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - Upcoming changes...

+## [1.21.0] - 2025-03-27
+### Added
+- Add folder-scan subcommand
+- Add folder-hash subcommand
+- Add AbstractPresenter class for presenting output in a given format
+- Add several reusable helper functions for constructing config objects from CLI args
+
 ## [1.20.6] - 2025-03-19
 ### Added
 - Added HTTP/gRPC generic headers feature using --header flag
@@ -490,4 +497,5 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 [1.20.3]: https://github.com/scanoss/scanoss.py/compare/v1.20.2...v1.20.3
 [1.20.4]: https://github.com/scanoss/scanoss.py/compare/v1.20.3...v1.20.4
 [1.20.5]: https://github.com/scanoss/scanoss.py/compare/v1.20.4...v1.20.5
-[1.20.6]: https://github.com/scanoss/scanoss.py/compare/v1.20.5...v1.20.6
+[1.20.6]: https://github.com/scanoss/scanoss.py/compare/v1.20.5...v1.20.6
+[1.21.0]: https://github.com/scanoss/scanoss.py/compare/v1.20.6...v1.21.0
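
Review note: the changelog entry describes `AbstractPresenter` as a class for presenting output in a given format. The class itself does not appear in this excerpt, so the following is only a hypothetical sketch of that general pattern. Every class, method, and parameter name in it is an assumption, not the commit's API.

```python
import json
import sys
from abc import ABC, abstractmethod


class PresenterSketch(ABC):
    """Hypothetical base class: subclasses format results, the base class writes them."""

    def __init__(self, output_format='json', output_file=None):
        self.output_format = output_format
        self.output_file = output_file

    @abstractmethod
    def format_json(self):
        """Return the results as a JSON string."""

    def present(self):
        """Format the results and write them to the output file, or to STDOUT."""
        if self.output_format != 'json':
            raise ValueError(f'Unsupported format: {self.output_format}')
        content = self.format_json()
        if self.output_file:
            with open(self.output_file, 'w', encoding='utf-8') as handle:
                handle.write(content)
        else:
            sys.stdout.write(content + '\n')
        return content


class DictPresenter(PresenterSketch):
    """Hypothetical concrete presenter that serialises a plain dict."""

    def __init__(self, results, **kwargs):
        super().__init__(**kwargs)
        self.results = results

    def format_json(self):
        return json.dumps(self.results, indent=2, sort_keys=True)
```

In this sketch, a command would build its results, wrap them in a concrete presenter, and call `present()`, leaving the choice between file and STDOUT to the base class.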

Diff for: CLIENT_HELP.md

+9
@@ -420,4 +420,13 @@ scanoss-py insp undeclared -i scan-results.json --output undeclared-summary.jira
 The following command can be used to inspect for undeclared components and save the results in Jira Markdown format.
 ```bash
 scanoss-py insp copyleft -i scan-results.json --output copyleft-summary.jiramd --status copyleft-status.jiramd --format jira_md
+```
+
+### Folder-Scan a Project Folder
+
+The new `folder-scan` subcommand performs a comprehensive scan on an entire directory by recursively processing files to generate folder-level fingerprints. It computes CRC64 hashes and simhash values to detect directory-level similarities, which is especially useful for comparing large code bases or detecting duplicate folder structures.
+
+**Usage:**
+```shell
+scanoss-py folder-scan /path/to/folder -o folder-scan-results.json
 ```
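
Review note: the help text added here mentions simhash values for detecting directory-level similarity. The following rough sketch illustrates the general simhash idea only and is not SCANOSS's implementation. The function names, the 64-bit width, and the use of BLAKE2b as the per-token hash are all assumptions made for this sketch.

```python
import hashlib


def simhash(tokens, bits=64):
    """Return a simhash fingerprint (an int) for an iterable of string tokens."""
    weights = [0] * bits
    for token in tokens:
        # Hash each token to a fixed-width integer (BLAKE2b is an arbitrary choice).
        digest = hashlib.blake2b(token.encode('utf-8'), digest_size=bits // 8).digest()
        value = int.from_bytes(digest, 'big')
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            weights[i] += 1 if (value >> i) & 1 else -1
    # A bit is set in the fingerprint when the positive votes win.
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count('1')
```

Two token lists with mostly shared entries (for example, file names in two similar folders) tend to produce fingerprints with a small Hamming distance, and identical lists always produce a distance of zero. The result does not depend on token order, because the per-bit votes are simply summed.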

Diff for: docs/source/index.rst

+73
@@ -228,6 +228,79 @@ Convert file format to plain, SPDX-Lite, CycloneDX or csv.
 * - --format <format>, -f <format>
 - Indicates the result output format: {plain, cyclonedx, spdxlite, csv}. (optional - default plain)

+--------------------------------
+Folder Scanning: folder-scan, fs
+--------------------------------
+
+Performs a comprehensive scan of a directory using folder hashing to identify components and their matches.
+
+.. code-block:: bash
+
+scanoss-py folder-scan <directory>
+
+.. list-table::
+:widths: 20 30
+:header-rows: 1
+
+* - Argument
+- Description
+* - --output <file name>, -o <file name>
+- Output result file name (optional - default STDOUT)
+* - --format <format>, -f <format>
+- Output format: {json} (optional - default json)
+* - --timeout <seconds>, -M <seconds>
+- Timeout in seconds for API communication (optional - default 600)
+* - --best-match, -bm
+- Enable best match mode (optional - default: False)
+* - --threshold <1-100>
+- Threshold for result matching (optional - default: 100)
+* - --settings <file>, -st <file>
+- Settings file to use for scanning (optional - default scanoss.json)
+* - --skip-settings-file, -stf
+- Skip default settings file (scanoss.json) if it exists
+* - --key <token>, -k <token>
+- SCANOSS API Key token (optional - not required for default OSSKB URL)
+* - --proxy <url>
+- Proxy URL to use for connections
+* - --pac <file/url>
+- Proxy auto configuration. Specify a file, http url or "auto"
+* - --ca-cert <file>
+- Alternative certificate PEM file
+* - --api2url <url>
+- SCANOSS gRPC API 2.0 URL (optional - default: https://api.osskb.org)
+* - --grpc-proxy <url>
+- GRPC Proxy URL to use for connections
+
+--------------------------------
+Folder Hashing: folder-hash, fh
+--------------------------------
+
+Generates cryptographic hashes for files in a given directory and its subdirectories.
+
+.. code-block:: bash
+
+scanoss-py folder-hash <directory>
+
+.. list-table::
+:widths: 20 30
+:header-rows: 1
+
+* - Argument
+- Description
+* - --output <file name>, -o <file name>
+- Output result file name (optional - default STDOUT)
+* - --format <format>, -f <format>
+- Output format: {json} (optional - default json)
+* - --settings <file>, -st <file>
+- Settings file to use for scanning (optional - default scanoss.json)
+* - --skip-settings-file, -stf
+- Skip default settings file (scanoss.json) if it exists
+
+Both commands also support these general options:
+* --debug, -d: Enable debug messages
+* --trace, -t: Enable trace messages
+* --quiet, -q: Enable quiet mode
+
 -----------------
 Component:
 -----------------

Diff for: pyproject.toml

+6 -1
@@ -8,7 +8,12 @@ select = ["E", "F", "I", "PL"]
 line-length = 120
 # Assume Python 3.7+
 target-version = "py37"
-exclude = ["tests/*", "test_*.py", "src/scanoss/cli.py"]
+exclude = [
+"tests/*",
+"test_*.py",
+"src/protoc_gen_swagger/*",
+"src/scanoss/api/*",
+]

 [tool.ruff.format]
 quote-style = "single"

Diff for: requirements.txt

+2 -1
@@ -11,4 +11,5 @@ google-api-core
 importlib_resources
 packageurl-python
 pathspec
-jsonschema
+jsonschema
+crc

Diff for: setup.cfg

+1
@@ -38,6 +38,7 @@ install_requires =
 packageurl-python
 pathspec
 jsonschema
+crc


 [options.extras_require]

Diff for: src/protoc_gen_swagger/options/annotations_pb2.py

+9 -12
Some generated files are not rendered by default.

+1 -1
@@ -1,4 +1,4 @@
 # Generated by the gRPC Python protocol compiler plugin. DO NOT EDIT!
 """Client and server classes corresponding to protobuf-defined services."""
-
 import grpc
+
