feat: ES-163 Add folder hashing support #98

matiasdaloia · 2025-02-10T12:48:19Z

Summary by CodeRabbit

New Features
- Introduced a new CLI subcommand for comprehensive folder scanning that computes directory-level fingerprints.
- Enhanced folder hashing via an improved remote procedure call interface.
- Added a new AbstractPresenter class for output presentation in various formats.
- Added several reusable helper functions for constructing configuration objects from command-line interface (CLI) arguments.
- Updated the CHANGELOG.md to reflect new versioning and features.
- Implemented a new folder hashing module with directory tree structure and hash computation capabilities.
- Added a new command for generating non-cryptographic (CRC64 and simhash-based) hashes for files in specified directories.
- Added a new command for scanning folders and generating hashes.
Refactor
- Redesigned output presentation for consistent and flexible formatting of scan results.
- Updated configuration management for improved maintainability.
Documentation
- Updated help guides and version history for clear usage instructions and accurate version tracking.
- Added new version section and updated references in CHANGELOG.md.
- Added a new license header to relevant files.

coderabbitai · 2025-02-10T12:48:27Z

Walkthrough

The pull request introduces version 1.21.0, expanding the scanoss-py CLI with the folder-scan and folder-hash subcommands. It implements an AbstractPresenter class for output formatting, new reusable helper functions for configuration, and enhances gRPC service functionality with new methods. Additionally, it updates documentation, version references, and includes new constants while preserving existing functionalities.

Changes

File(s)	Summary
`CHANGELOG.md`, `CLIENT_HELP.md`	Added new version section for `1.21.0`, documented the `folder-scan` and `folder-hash` subcommands, and introduced the `AbstractPresenter` class. Updated version references.
`pyproject.toml`, `requirements.txt`	Revised exclusion paths and added `jsonschema` and `crc` packages.
`src/scanoss/__init__.py`	Updated version from `1.20.6` to `1.21.0`.
`src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py`	Added `HFHRequest` and `HFHResponse` messages, and introduced the `FolderHashScan` method.
`src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py`	Enhanced gRPC functionality with the new `FolderHashScan` method.
`src/scanoss/cli.py`	Integrated the `folder-scan` and `folder-hash` commands with improved argument parsing and error handling.
`src/scanoss/constants.py`	Added several new default configuration constants.
`src/scanoss/file_filters.py`	Updated file skipping methods and added new file extensions to the skip list.
`src/scanoss/results.py`	Modified the `Results` class to inherit from `AbstractPresenter` and adjusted output formatting methods.
`src/scanoss/scanners/__init__.py`, `src/scanoss/scanners/scanner_config.py`	Added an MIT license header and introduced the `ScannerConfig` data class.
`src/scanoss/scanners/scanner_hfh.py`	Implemented the `ScannerHFH` class for folder hashing, including various helper methods.
`src/scanoss/scanners/folder_hasher.py`	Introduced the `FolderHasher` class for directory hashing, with configuration management and result presentation.
`src/scanoss/scanossbase.py`	Renamed parameter in `print_to_file_or_stdout` from `msg` to `content`.
`src/scanoss/scanossgrpc.py`	Added `folder_hash_scan` method and a custom exception for gRPC errors. Introduced a `GrpcConfig` data class.
`src/scanoss/utils/abstract_presenter.py`	Created the `AbstractPresenter` class to standardize output formatting.
`src/scanoss/utils/crc64.py`	Implemented the CRC64 algorithm with associated helper methods.
`src/scanoss/utils/simhash.py`	Introduced simhash computation functionalities, including feature extraction and utility functions.
`docs/source/index.rst`	Updated documentation to include `folder-scan` and `folder-hash` commands with detailed argument descriptions.

Sequence Diagram(s)

sequenceDiagram
    participant U as User
    participant CLI as CLI Parser
    participant HFH as ScannerHFH
    participant GRPC as ScanossGrpc
    participant Service as Scanning Service
    participant Presenter as AbstractPresenter

    U->>CLI: Execute folder-scan command
    CLI->>HFH: Call folder_hashing_scan(folder_path, settings)
    HFH->>GRPC: Invoke folder_hash_scan() via gRPC
    GRPC->>Service: Perform RPC folder hash scan
    Service-->>GRPC: Return scan results
    GRPC-->>HFH: Deliver scan results
    HFH->>Presenter: Format results (JSON/plain)
    Presenter-->>U: Output formatted results

Suggested reviewers

eeisegn

Poem

I'm a rabbit, hopping with glee,
Coding through changes, wild and free.
I scan folders with a swift, keen glance,
With hashes and bytes in a merry dance.
In a world of code where features compile,
I nibble each bug with a joyful smile!
Happy coding, one hop at a time!


coderabbitai

Actionable comments posted: 7

🔭 Outside diff range comments (1)

src/scanoss/results.py (1)
200-206: Potential IndexError with 'purl' or 'licenses'
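When 'purl' or 'licenses' is an empty list, indexing it with [0] raises an IndexError. The 'purl' expression is already guarded: an empty list is falsy, so the else branch is taken. Confirm that the 'licenses' lookup is guarded the same way. If you hoist the lookup into a variable, assign it before the dict literal, because an assignment cannot appear inside one:
- 'purl': (item.get('purl')[0] if item.get('purl') else 'N/A'),
+ 'purl': (purls[0] if purls else 'N/A'),  # purls = item.get('purl') assigned above the dict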
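A small helper can keep every first-element lookup guarded in one place, so 'purl', 'licenses' and similar fields all follow the same rule. This is a minimal sketch. `first_or_default` is a hypothetical name and is not part of scanoss-py, and the sample `item` is illustrative data.

```python
from typing import Any, Optional, Sequence


def first_or_default(items: Optional[Sequence[Any]], default: Any = 'N/A') -> Any:
    """Return items[0], or default when items is None or empty."""
    # None and an empty list are both falsy, so one truthiness check covers both cases.
    return items[0] if items else default


# Hypothetical usage while building one output row.
item = {'purl': [], 'licenses': [{'name': 'MIT'}]}
row = {
    'purl': first_or_default(item.get('purl')),
    'license': first_or_default(item.get('licenses'), {}).get('name', 'N/A'),
}
```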

🧹 Nitpick comments (13)

src/scanoss/utils/abstract_presenter.py (1)

5-5: Consider making output formats extensible.
Currently, AVAILABLE_OUTPUT_FORMATS is a hardcoded list. If more formats are expected in the future, you might consider a more dynamic or configurable approach.
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (2)
7-7: Imports need sorting/formatting.
Your lint pipeline flagged an un-sorted import block. Consider placing all imports at the top of the file in alphabetical order or adding an ignore if this file is auto-generated.

🧰 Tools

🪛 GitHub Actions: Lint

[warning] 7-7: Import block is un-sorted or un-formatted. Please organize imports.

30-30: Address excessive line lengths or add ignores if auto-generated.
These lines exceed the 120-character limit cited by the pipeline. For manually maintained code, line-wrapping is preferable. If this file is auto-generated, consider adding a # noqa: E501 or equivalent to bypass lint.

Example approach to splitting line 30:
-DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n.scanoss/api/scanning/v2/scanoss-scanning.proto...<1925 chars>...')
+descriptor_data = b'\n.scanoss/api/scanning/v2/scanoss-scanning.proto...' \
+                 b'...<split any other long strings as needed>...'
+DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(descriptor_data)
Also applies to: 37-37, 39-39, 41-41
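If a long bytes literal is wrapped by hand, parenthesized implicit concatenation is an alternative to trailing backslashes. Adjacent bytes literals are joined at compile time, so the value does not change. The bytes below are illustrative placeholders, not the real serialized descriptor. For a generated file, a lint ignore is usually still preferable, because regeneration overwrites manual wrapping.

```python
# Illustrative placeholder bytes, not the real serialized descriptor.
ONE_LINE = b'\n.example/long.proto\x12\x0bexample.pkg'

# Adjacent bytes literals inside parentheses are joined at compile time,
# so the wrapped form has exactly the same value and needs no backslashes.
WRAPPED = (
    b'\n.example/long.proto'
    b'\x12\x0bexample.pkg'
)
```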

🧰 Tools

🪛 Ruff (0.8.2)

30-30: Line too long (1925 > 120)

(E501)

🪛 GitHub Actions: Lint

[error] 30-30: Line too long (1925 > 120).
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (3)
3-3: Organize import statements.
Per pipeline warnings, consider re-sorting your import statements. If gRPC tooling auto-generates this code, you can add an ignore directive or reorder them if permitted by the generator.

🧰 Tools

🪛 GitHub Actions: Lint

[warning] 3-3: Import block is un-sorted or un-formatted. Please organize imports.

22-22: Remove extraneous “f” prefix.
Your pipeline flags an f-string without placeholders. The fragment that interpolates {GRPC_VERSION} needs its f prefix. Only the fragment with no placeholders should lose it:
-        + f' but the generated code ...'
+        + ' but the generated code ...'
🧰 Tools

🪛 Ruff (0.8.2)

22-22: f-string without any placeholders

Remove extraneous f prefix

(F541)

🪛 GitHub Actions: Lint

[error] 22-22: f-string without any placeholders.

7-7: Line exceeds 120 characters.
If this file is manually maintained, shorten the line. If auto-generated, consider adding ignore directives or adjusting generation parameters to split long lines.

🧰 Tools

🪛 Ruff (0.8.2)

7-7: Line too long (122 > 120)

(E501)

🪛 GitHub Actions: Lint

[error] 7-7: Line too long (122 > 120).
src/scanoss/utils/simhash.py (2)
42-50: Consider renaming the sum method to avoid confusion with Python's built-in sum()
As a method, sum does not actually shadow the built-in, because it is only reachable as an attribute (e.g. feature.sum()). The shared name can still mislead readers, and a more descriptive name makes it clear that the method returns the feature's 64-bit hash.
-    def sum(self) -> int:
+    def hash_sum(self) -> int:
189-190: Minor discrepancy in error message
The error message refers to "k", but the parameter is named "w". Consider aligning the naming for clarity.
-        raise ValueError('simhash.shingle(): k must be a positive integer')
+        raise ValueError('simhash.shingle(): w must be a positive integer')
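For context, w-shingling turns a token list into overlapping windows of w consecutive tokens. Those windows then become simhash features. This is a minimal sketch under assumptions. The signature, the space separator and the clamping of w to the input length are hypothetical choices and are not taken from simhash.py. It also shows the error message naming the real parameter.

```python
from typing import List


def shingle(w: int, tokens: List[bytes]) -> List[bytes]:
    """Return the w-shingles of tokens: each run of w consecutive tokens joined by a space."""
    if w < 1:
        # The message names the parameter that actually appears in the signature.
        raise ValueError('shingle(): w must be a positive integer')
    if not tokens:
        return []
    if w == 1:
        return list(tokens)
    w = min(w, len(tokens))  # a window wider than the input collapses to one shingle
    return [b' '.join(tokens[i:i + w]) for i in range(len(tokens) - w + 1)]
```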
src/scanoss/scanners/scanner_hfh.py (2)

189-191: Consider more specific exception handling
Catching a broad Exception might mask other potential issues. Consider handling only expected exceptions or re-raising unhandled ones to maintain clarity.

231-260: Potential performance concern for large directories
Concatenating large sets of file names into a single string could degrade performance in extremely large directories.
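One common mitigation is to collect the parts and join them once, instead of appending with += in a loop. In CPython, += on a str is sometimes optimized in place, but that is an implementation detail. This is a minimal sketch that assumes file names arrive as str and are encoded as UTF-8. `names_blob` is a hypothetical helper, not a function in scanner_hfh.py.

```python
from typing import Iterable


def names_blob(names: Iterable[str]) -> bytes:
    """Concatenate file names into one bytes blob, allocating the result once."""
    # join sizes the output from all parts first, then copies each part exactly once,
    # instead of re-copying the growing prefix on every append.
    return b''.join(name.encode('utf-8') for name in names)
```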
src/scanoss/scanossgrpc.py (1)
494-497: Use descriptive constants instead of magic values.
Lines 495-498 compare status_code to numeric constants (2 and 3). To comply with linting warnings (PLR2004, etc.), consider a small enum or named constants for clarity, e.g. STATUS_CODE_WARNING = 2, STATUS_CODE_FAILURE = 3.
- if status_code == 2:
+ if status_code == STATUS_CODE_WARNING:
    ...
- elif status_code == 3:
+ elif status_code == STATUS_CODE_FAILURE:
    ...
🧰 Tools

🪛 Ruff (0.8.2)

495-495: Magic value used in comparison, consider replacing 2 with a constant variable

(PLR2004)

🪛 GitHub Actions: Lint

[warning] 495-495: Magic value used in comparison, consider replacing 2 with a constant variable.
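An IntEnum is one way to name these values without changing the comparisons, because its members compare equal to plain ints. This is a sketch. The member names are hypothetical: the two suggestions in this review name them differently, so verify the meaning of 2 and 3 against the service's status definitions. If the generated protobuf module already exposes an equivalent enum, prefer its names.

```python
from enum import IntEnum


class StatusCode(IntEnum):
    # Hypothetical names for the two values checked in _check_status_response.
    SUCCEEDED_WITH_WARNINGS = 2
    FAILED_WITH_WARNINGS = 3


def describe_status(status_code: int) -> str:
    """Map a raw status code to a label; IntEnum members compare equal to ints."""
    if status_code == StatusCode.SUCCEEDED_WITH_WARNINGS:
        return 'warning'
    if status_code == StatusCode.FAILED_WITH_WARNINGS:
        return 'failure'
    return 'other'
```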
CLIENT_HELP.md (1)

406-415: Clarify and Enhance the Folder-Scan Section Documentation.
The newly added "Folder-Scan a Project Folder" section is well organized and provides a clear usage example. To further enhance clarity, consider adding a brief note on any prerequisites—such as required dependencies (e.g., the modules used for CRC64 and simhash computation) or links to the corresponding implementation (possibly in src/scanoss/cli.py or src/scanoss/scanners/scanner_hfh.py). This will help users understand if they need any additional setup to use the folder-scan subcommand.

CHANGELOG.md (1)

12-17: Improve Changelog Entry Consistency.
The new version entry for 1.21.0 is clear; however, to maintain consistency with previous changelog entries, consider using the past-tense formatting (e.g., "Added folder-scan subcommand" instead of "Add folder-scan subcommand"). This minor change will help improve overall consistency and readability of the changelog.

🧰 Tools

🪛 LanguageTool

[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-02-10 ### Added - Add folder-scan subcommand - Add AbstractPr...

(REPEATED_VERBS)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 24fa45b and f3a2acb.

📒 Files selected for processing (19)

CHANGELOG.md (2 hunks)
CLIENT_HELP.md (1 hunks)
pyproject.toml (1 hunks)
requirements.txt (1 hunks)
src/scanoss/__init__.py (1 hunks)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (2 hunks)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (3 hunks)
src/scanoss/cli.py (46 hunks)
src/scanoss/constants.py (1 hunks)
src/scanoss/file_filters.py (3 hunks)
src/scanoss/results.py (5 hunks)
src/scanoss/scanners/__init__.py (1 hunks)
src/scanoss/scanners/scanner_config.py (1 hunks)
src/scanoss/scanners/scanner_hfh.py (1 hunks)
src/scanoss/scanossbase.py (1 hunks)
src/scanoss/scanossgrpc.py (14 hunks)
src/scanoss/utils/abstract_presenter.py (1 hunks)
src/scanoss/utils/crc64.py (1 hunks)
src/scanoss/utils/simhash.py (1 hunks)

✅ Files skipped from review due to trivial changes (4)

requirements.txt
src/scanoss/scanners/__init__.py
src/scanoss/__init__.py
src/scanoss/constants.py

🧰 Additional context used

🪛 LanguageTool

CHANGELOG.md

[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-02-10 ### Added - Add folder-scan subcommand - Add AbstractPr...

(REPEATED_VERBS)

🪛 Ruff (0.8.2)

src/scanoss/file_filters.py

455-455: Too many return statements (7 > 6)

(PLR0911)

src/scanoss/scanossgrpc.py

90-90: Too many statements (51 > 50)

(PLR0915)

495-495: Magic value used in comparison, consider replacing 2 with a constant variable

(PLR2004)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py

30-30: Line too long (1925 > 120)

(E501)

37-37: Line too long (391 > 120)

(E501)

39-39: Line too long (130 > 120)

(E501)

41-41: Line too long (144 > 120)

(E501)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py

4-4: warnings imported but unused

Remove unused import: warnings

(F401)

7-7: Line too long (122 > 120)

(E501)

22-22: f-string without any placeholders

Remove extraneous f prefix

(F541)

98-98: Too many arguments in function definition (10 > 5)

(PLR0913)

125-125: Too many arguments in function definition (10 > 5)

(PLR0913)

🪛 GitHub Actions: Lint

src/scanoss/file_filters.py

[error] 243-243: Too many arguments in function definition (11 > 5).

[warning] 314-314: for loop variable dirpath overwritten by assignment target.

[warning] 455-455: Too many return statements (7 > 6).

[warning] 493-493: Too many return statements (7 > 6).

src/scanoss/scanossgrpc.py

[warning] 90-90: Too many statements (51 > 50).

[warning] 495-495: Magic value used in comparison, consider replacing 2 with a constant variable.

[warning] 498-498: Magic value used in comparison, consider replacing 3 with a constant variable.

src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py

[warning] 7-7: Import block is un-sorted or un-formatted. Please organize imports.

[error] 25-25: Module level import not at top of file.

[warning] 25-25: scanoss.api.common.v2.scanoss_common_pb2 imported but unused.

[error] 26-26: Module level import not at top of file.

[warning] 26-26: google.api.annotations_pb2 imported but unused.

[error] 27-27: Module level import not at top of file.

[warning] 27-27: protoc_gen_swagger.options.annotations_pb2 imported but unused.

[error] 30-30: Line too long (1925 > 120).

[error] 37-37: Line too long (391 > 120).

[error] 39-39: Line too long (130 > 120).

[error] 41-41: Line too long (144 > 120).

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py

[warning] 3-3: Import block is un-sorted or un-formatted. Please organize imports.

[warning] 4-4: warnings imported but unused.

[error] 7-7: Line too long (122 > 120).

[error] 22-22: f-string without any placeholders.

[error] 98-98: Too many arguments in function definition (10 > 5).

[error] 125-125: Too many arguments in function definition (10 > 5).

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: build

🔇 Additional comments (39)

src/scanoss/utils/abstract_presenter.py (1)

50-55: LGTM!
This _present_output method cleanly delegates to print_to_file_or_stdout, keeping presentation logic modular.

src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (1)

25-25: Verify usage of imported module.
The pipeline warns that scanoss.api.common.v2.scanoss_common_pb2 is imported but unused. If it’s truly unused, remove it; otherwise, confirm it is referenced properly in the code.

🧰 Tools

🪛 Ruff (0.8.2)

25-25: Module level import not at top of file

(E402)

25-25: scanoss.api.common.v2.scanoss_common_pb2 imported but unused

Remove unused import: scanoss.api.common.v2.scanoss_common_pb2

(F401)

🪛 GitHub Actions: Lint

[error] 25-25: Module level import not at top of file.

[warning] 25-25: scanoss.api.common.v2.scanoss_common_pb2 imported but unused.

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (1)

98-98: Too many parameters in function definitions.
Your lint pipeline flags these methods for having more than 5 parameters. Since this is standard for gRPC stubs, you can safely ignore or suppress this check.

Also applies to: 125-125

🧰 Tools

🪛 Ruff (0.8.2)

98-98: Too many arguments in function definition (10 > 5)

(PLR0913)

🪛 GitHub Actions: Lint

[error] 98-98: Too many arguments in function definition (10 > 5).

src/scanoss/utils/simhash.py (1)

1-199: Overall Implementation Feedback
The simhash implementation appears correct and efficient. The usage of FNV64, the vectorization approach, and the final simhash/hamming distance logic follows standard practices.
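For readers unfamiliar with the technique, the pipeline the review describes works in four steps:
- Hash each feature to 64 bits.
- Let every feature vote +1 or -1 per bit position.
- Keep the bits with a positive total.
- Compare fingerprints by Hamming distance.

This is a minimal sketch, not the scanoss-py simhash.py. It assumes FNV-1a (the review says only "FNV64") and unweighted features.

```python
from typing import Iterable, List

FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte in, then multiply by the FNV prime."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def vectorize(features: Iterable[bytes]) -> List[int]:
    """Per-bit vote: +1 where a feature hash has the bit set, -1 where it is clear."""
    v = [0] * 64
    for feature in features:
        h = fnv1a_64(feature)
        for i in range(64):
            v[i] += 1 if (h >> i) & 1 else -1
    return v


def fingerprint(v: List[int]) -> int:
    """Set bit i of the simhash when the total vote for bit i is positive."""
    result = 0
    for i, weight in enumerate(v):
        if weight > 0:
            result |= 1 << i
    return result


def simhash(features: Iterable[bytes]) -> int:
    return fingerprint(vectorize(features))


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count('1')
```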

src/scanoss/results.py (1)

1-219: Code structure changes are logical
Inheritance from AbstractPresenter appears to streamline result presentation logic without major disruptions.

src/scanoss/scanners/scanner_hfh.py (1)

1-296: Initial Implementation Looks Good
The folder hashing logic is cohesive and effectively leverages simhash and CRC64.

src/scanoss/scanossgrpc.py (22)

25-25: Import statements look fine.
No issues noted with the added imports; they align well with the new functionality.

Also applies to: 28-29

37-39: Additional import stubs for gRPC scanning and provenances.
These imports are necessary for the newly introduced scanning features.

41-48: New references to version and proto definitions.
This set of imports is consistent with the code's usage of MessageToDict, ParseDict, and StatusResponse.

55-62: Components, Cryptography, and Dependencies stubs.
These stubs follow standard gRPC usage patterns.

64-68: HFHRequest, SemgrepResponse, and VulnerabilityResponse usage.
These lines introduce proto-based request and response classes for scanning and vulnerability checks.

70-70: Blank line insertion.
Minor formatting change; no concerns.

77-83: New ScanossGrpcError exception class.
Nicely encapsulates gRPC-specific errors, improving clarity in error handling.

156-156: Insecure channel scanning stub creation.
No immediate concerns. Keep ensuring appropriate environment checks for production usage.

168-168: Secure channel scanning stub creation.
Similarly, this line is consistent with the established pattern for secure connections.

249-249: Error message for empty dependency input.
Message clarity is fine.

256-256: Error message in get_dependencies_json.
Consistent with the function's usage.

283-283: Error message in get_crypto_json.
Nothing to change.

312-312: Error message in get_vulnerabilities_json.
Matches existing error style.

342-342: Error message in get_semgrep_json.
No issues.

372-372: Error message in search_components_json.
Ok as-is.

402-402: Error message in get_component_versions_json.
No further changes needed.

425-434: Docstring for folder_hash_scan.
Describes the functionality well.

435-441: RPC call for folder hash scan.
This approach reuses _call_rpc to standardize gRPC calls. Good pattern.

442-480: _call_rpc utility method.
Centralizing gRPC logic helps with reuse and reliability. The error raising via ScanossGrpcError is also a good pattern.
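The pattern can be sketched as a single wrapper that invokes a stub method and converts any failure into one project-specific exception. Only the name ScanossGrpcError comes from this PR; the `call_rpc` signature and body are hypothetical. Real gRPC unary stubs accept a `timeout` keyword. In production code, catching grpc.RpcError instead of Exception would keep programming errors from being relabelled as RPC failures.

```python
from typing import Any, Callable


class ScanossGrpcError(Exception):
    """Raised when a gRPC call fails (name from the PR; body is a sketch)."""


def call_rpc(rpc: Callable[..., Any], request: Any, name: str, timeout: float = 30.0) -> Any:
    """Invoke rpc(request, timeout=...) and wrap any failure in ScanossGrpcError."""
    try:
        return rpc(request, timeout=timeout)
    except Exception as e:  # narrow to grpc.RpcError in real code
        # Chain the original exception so the underlying status stays visible.
        raise ScanossGrpcError(f'{name} failed: {e}') from e
```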

544-546: Exception logging in get_provenance_json.
Properly logs the gRPC call failure to stderr, consistent with other methods.

561-574: GrpcConfig dataclass.
This data class encapsulates gRPC client configuration neatly.

576-589: create_grpc_config_from_args helper function.
This function is straightforward, ensuring default values are pulled from environment or fallback constants where appropriate.
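The fallback order described above is: CLI flag first, then environment, then constant. It can be sketched with getattr and `or`. This is illustrative only. The field set, the default values and the environment variable name EXAMPLE_GRPC_URL are hypothetical and do not match the real GrpcConfig.

```python
import os
from argparse import Namespace
from dataclasses import dataclass
from typing import Optional

# Hypothetical fallback values and environment variable name, for illustration only.
DEFAULT_GRPC_URL = 'https://grpc.example.invalid'
DEFAULT_TIMEOUT = 180
GRPC_URL_ENV = 'EXAMPLE_GRPC_URL'


@dataclass(frozen=True)
class GrpcConfig:
    url: str = DEFAULT_GRPC_URL
    api_key: Optional[str] = None
    timeout: int = DEFAULT_TIMEOUT
    proxy: Optional[str] = None


def create_grpc_config_from_args(args: Namespace) -> GrpcConfig:
    """Build a GrpcConfig: CLI flag first, then environment, then constant."""
    # getattr(..., None) tolerates subcommands that never defined a flag;
    # `or` falls back when the value is None or otherwise falsy (e.g. '').
    url = getattr(args, 'grpc_url', None) or os.environ.get(GRPC_URL_ENV) or DEFAULT_GRPC_URL
    timeout = getattr(args, 'timeout', None) or DEFAULT_TIMEOUT
    return GrpcConfig(
        url=url,
        api_key=getattr(args, 'api_key', None),
        timeout=timeout,
        proxy=getattr(args, 'grpc_proxy', None),
    )
```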

src/scanoss/scanners/scanner_config.py (2)

1-55: New ScannerConfig class introduction.
All lines in this range define the license header, imports, and the ScannerConfig dataclass. The structure is clear, with sensible defaults and typed optional fields.

57-74: create_scanner_config_from_args: well-organized function.
Constructs a ScannerConfig instance reliably. Good usage of getattr for CLI argument fallback.

src/scanoss/scanossbase.py (1)

83-83: Renamed parameter for clarity and consistent usage.
Changing msg to content in print_to_file_or_stdout is a style improvement, clarifying the function’s intent. Implementation remains straightforward.

Also applies to: 89-89, 91-91

src/scanoss/utils/crc64.py (2)

29-42: LGTM! Efficient implementation of CRC64 ECMA standard.

The implementation correctly follows the ECMA standard and matches Go's hash/crc64 package. The lookup table is efficiently shared across instances by using a class-level variable.

43-69: LGTM! Robust implementation of core CRC64 calculation methods.

The implementation correctly handles:
- Lookup table generation
- Both bytes and UTF-8 strings in the update method
- Final XOR operation in the digest method
src/scanoss/file_filters.py (2)

208-214: LGTM! Additional file extensions for skipping.

The new file extensions are valid additions to the skip list.

455-455: Verify the impact of renaming _should_skip_dir to should_skip_dir.

The method has been renamed from private (_should_skip_dir) to public (should_skip_dir). Ensure that this change in visibility is intentional and verify its usage across the codebase.

Run the following script to verify the method usage:
✅ Verification successful

🏁 Script executed:
#!/bin/bash
# Description: Check for any references to the old method name and verify the usage of the new method name.

# Test: Search for old method name. Expect: No occurrences.
echo "Checking for old method name..."
rg "_should_skip_dir"

# Test: Search for new method name. Expect: Only intended usage.
echo "Checking for new method name..."
rg "should_skip_dir"
Length of output: 393

Renaming Impact Verified
The search for _should_skip_dir returned no occurrences, and the new name should_skip_dir is used as expected in src/scanoss/file_filters.py. The rename was therefore propagated consistently, with no stale references left behind.
🧰 Tools

🪛 Ruff (0.8.2)

455-455: Too many return statements (7 > 6)

(PLR0911)

🪛 GitHub Actions: Lint

[warning] 455-455: Too many return statements (7 > 6).
src/scanoss/cli.py (3)

477-491: LGTM! Well-structured folder-scan subcommand.

The implementation follows the established pattern and includes appropriate arguments consistent with other subcommands.

1451-1485: LGTM! Robust implementation of folder hashing scan.

The implementation includes:
- Proper error handling with descriptive messages
- Comprehensive configuration setup
- Clear documentation

1487-1497: LGTM! Well-structured settings helper function.

The implementation follows good practices:
- Single responsibility principle
- Proper error handling
- Clear return value

pyproject.toml (1)

11-16: LGTM! Improved exclude configuration.

The changes improve readability by using multi-line format and add appropriate paths to the exclude list.


coderabbitai

Actionable comments posted: 3

🔭 Outside diff range comments (1)

src/scanoss/file_filters.py (1)
455-491: Inconsistent method visibility.

The should_skip_dir method is made public while _should_skip_file remains private. This creates an inconsistent API where related methods have different visibility levels.

Make both methods public for consistency:
-def _should_skip_file(self, file_rel_path: str) -> bool:
+def should_skip_file(self, file_rel_path: str) -> bool:
Also applies to: 493-530

🧰 Tools

🪛 Ruff (0.8.2)

455-455: Too many return statements (7 > 6)

(PLR0911)

🪛 GitHub Actions: Lint

[error] 455-455: Too many return statements (7 > 6).

🧹 Nitpick comments (15)

src/scanoss/utils/abstract_presenter.py (1)
36-39: Use a more specific exception type for invalid format.

Raising a generic Exception for format validation can obscure handling in downstream code. Consider using ValueError or a custom exception to better reflect the nature of the error.
- raise Exception(
+ raise ValueError(
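Raising ValueError also pairs well with the earlier nitpick about making output formats extensible. A presenter can dispatch by method name, so a subclass adds a format by extending the tuple and defining one method. This is a sketch with hypothetical class and method names; it is not the AbstractPresenter API from this PR.

```python
import json
from abc import ABC, abstractmethod


class Presenter(ABC):
    """Hypothetical presenter base: one _format_<name> method per output format."""

    AVAILABLE_OUTPUT_FORMATS = ('json', 'plain')

    def present(self, output_format: str = 'plain') -> str:
        if output_format not in self.AVAILABLE_OUTPUT_FORMATS:
            # ValueError signals a bad argument, so callers can catch it specifically.
            raise ValueError(
                f'Invalid output format {output_format!r}; '
                f'expected one of: {", ".join(self.AVAILABLE_OUTPUT_FORMATS)}'
            )
        # Dispatch by name: a new format needs only a tuple entry and a _format_<name> method.
        return getattr(self, f'_format_{output_format}')()

    @abstractmethod
    def _format_json(self) -> str: ...

    @abstractmethod
    def _format_plain(self) -> str: ...


class DictPresenter(Presenter):
    def __init__(self, data: dict):
        self.data = data

    def _format_json(self) -> str:
        return json.dumps(self.data)

    def _format_plain(self) -> str:
        return '\n'.join(f'{key}: {value}' for key, value in self.data.items())
```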
src/scanoss/utils/simhash.py (3)
47-49: Rename the sum() method for clarity.

sum() may not intuitively convey that this method returns the 64-bit hash value. Consider a more descriptive name like get_hash_value() or hash64() to improve readability.

66-82: Consider performance impacts for large feature sets.

The bit-by-bit operations in vectorize may become a bottleneck for very large lists. Optimizing this loop or parallelizing it (depending on the environment) might be beneficial if performance is critical.

112-122: Use modern popcount for Hamming distance.

Python 3.10+ provides an integer bit_count() method, which can be more performant and concise than manually clearing the lowest set bit in a loop.
 def compare(a: int, b: int) -> int:
     v = a ^ b
-    c = 0
-    while v:
-        v &= v - 1
-        c += 1
-    return c
+    return v.bit_count()
src/scanoss/results.py (1)
172-180: Use specialized exceptions for filter validation.

Raise a more specific exception, such as ValueError, instead of a generic Exception. This allows callers to handle invalid filter states more gracefully.
- raise Exception(
+ raise ValueError(
src/scanoss/scanners/scanner_hfh.py (3)

172-172: Consider chunked file reading to optimize memory usage.

Reading entire files into memory with full_file_path.read_bytes() can be problematic for very large files. Consider reading files in chunks or imposing a file size limit to avoid potential out-of-memory issues during folder hashing.
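Chunked reading works whenever the per-file hash supports incremental updates. The crc64.py reviewed here exposes an update method, so it could be fed the same way. The sketch below uses zlib.crc32, which takes a running value, as a stdlib stand-in. MAX_FILE_SIZE is a hypothetical cap. Features that need the whole file, such as a content simhash, would need a streaming feature extractor instead.

```python
import zlib
from pathlib import Path
from typing import Optional

CHUNK_SIZE = 64 * 1024              # bytes read per iteration
MAX_FILE_SIZE = 64 * 1024 * 1024    # hypothetical cap; larger files are skipped


def crc32_file(path: Path, chunk_size: int = CHUNK_SIZE,
               max_size: int = MAX_FILE_SIZE) -> Optional[int]:
    """Hash a file without loading it whole; return None if it exceeds max_size."""
    if path.stat().st_size > max_size:
        return None
    crc = 0
    with path.open('rb') as fh:
        # iter() with a sentinel stops when read() returns b'' at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            crc = zlib.crc32(chunk, crc)  # continue from the running value
    return crc
```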

194-215: Potential parallelization for improved performance.

For very large directories, consider parallelizing the recursion in _hash_calc_from_node() and _build_root_node() by processing child nodes concurrently, reducing the time to build the directory tree and compute hashes.

253-254: Clarify rationale for overwriting the most significant byte.

Overwriting the MS byte of names_simhash with the computed head value might mask high-order bits. Document the logic behind this choice or rename _head_calc to make its purpose clearer to future maintainers.
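For maintainers, "overwriting the most significant byte" of a 64-bit value means keeping the low 56 bits and placing an 8-bit value in bits 56 to 63. This sketch shows only that bit operation. Whether scanner_hfh.py computes it exactly this way has not been checked, and `overwrite_msb` is a hypothetical name.

```python
MASK56 = (1 << 56) - 1  # low 56 bits set


def overwrite_msb(value: int, head: int) -> int:
    """Replace the most significant byte of a 64-bit value with head (0-255)."""
    if not 0 <= head <= 0xFF:
        raise ValueError('head must fit in one byte')
    # Keep bits 0-55 of value, then place head in bits 56-63.
    return (value & MASK56) | (head << 56)
```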
src/scanoss/scanossgrpc.py (2)
425-441: Consider adding additional error handling or retry logic for folder_hash_scan.

As a potentially high-load operation, folder hashing might fail mid-scan. Evaluate adding robust retries or partial scanning logic to handle transient failures rather than returning immediately.
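One generic shape for such retries is exponential backoff with jitter around the call. This is a sketch, and `retry` is a hypothetical helper. In real code, retry_on should be narrowed to transient failures (for example, gRPC UNAVAILABLE) rather than every Exception. Retrying is also only safe if the RPC is idempotent. The injectable sleep keeps the sketch testable.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar('T')


def retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
          retry_on: Tuple[Type[BaseException], ...] = (Exception,),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on retry_on exceptions with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays of base_delay * 1, 2, 4, ..., each plus up to 10% random jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError('unreachable')
```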

495-498: Replace magic values with named constants.

Comparing status_code directly to 2 or 3 in _check_status_response can be unclear. Consider using named constants or an enum to clarify the meaning of each status code.
- if status_code == 2:
+ if status_code == STATUS_SUCCEEDED_WITH_WARNINGS:
...
- elif status_code == 3:
+ elif status_code == STATUS_FAILED_WITH_WARNINGS:
🧰 Tools

🪛 Ruff (0.8.2)

495-495: Magic value used in comparison, consider replacing 2 with a constant variable

(PLR2004)

498-498: Magic value used in comparison, consider replacing 3 with a constant variable

(PLR2004)

🪛 GitHub Actions: Lint

[warning] 495-495: Magic value used in comparison, consider replacing 2 with a constant variable.

[warning] 498-498: Magic value used in comparison, consider replacing 3 with a constant variable.
src/scanoss/scanners/scanner_config.py (1)
57-73: Add validation for timeout and thread values.

Consider adding validation to ensure timeout values are positive and thread count is within reasonable bounds.
 def create_scanner_config_from_args(args) -> ScannerConfig:
+    post_size = getattr(args, 'post_size', DEFAULT_POST_SIZE)
+    timeout = max(1, getattr(args, 'timeout', DEFAULT_TIMEOUT))
+    sc_timeout = max(1, getattr(args, 'sc_timeout', DEFAULT_SC_TIMEOUT))
+    nb_threads = max(1, min(32, getattr(args, 'nb_threads', DEFAULT_NB_THREADS)))
     return ScannerConfig(
         debug=args.debug,
         trace=args.trace,
         quiet=args.quiet,
         api_key=getattr(args, 'api_key', None),
         url=getattr(args, 'url', None),
         grpc_url=getattr(args, 'grpc_url', None),
-        post_size=getattr(args, 'post_size', DEFAULT_POST_SIZE),
-        timeout=getattr(args, 'timeout', DEFAULT_TIMEOUT),
-        sc_timeout=getattr(args, 'sc_timeout', DEFAULT_SC_TIMEOUT),
-        nb_threads=getattr(args, 'nb_threads', DEFAULT_NB_THREADS),
+        post_size=post_size,
+        timeout=timeout,
+        sc_timeout=sc_timeout,
+        nb_threads=nb_threads,
         proxy=getattr(args, 'proxy', None),
         grpc_proxy=getattr(args, 'grpc_proxy', None),
         ca_cert=getattr(args, 'ca_cert', None),
         pac=getattr(args, 'pac', None),
     )
src/scanoss/utils/crc64.py (1)
74-78: Add missing docstring parameters.

The docstring for the checksum method is missing parameter and return type descriptions.
     def checksum(cls, data: bytes) -> int:
-        """Calculate CRC64 checksum for the given data."""
+        """Calculate CRC64 checksum for the given data.
+
+        Args:
+            data (bytes): The data to calculate the checksum for.
+
+        Returns:
+            int: The calculated CRC64 checksum.
+        """
         crc = cls()
         crc.update(data)
         return crc.digest()
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (1)
2-3: Strengthen the warning against manual edits.

For generated protobuf files, consider using a stronger warning message to prevent accidental modifications.
 # Generated by the protocol buffer compiler.  DO NOT EDIT!
-# NO CHECKED-IN PROTOBUF GENCODE
+# WARNING: This is an auto-generated file. Any manual changes will be overwritten!
+# To modify this file, update the proto definitions and regenerate using protoc.
CLIENT_HELP.md (1)

406-415: New Folder-Scan Subcommand Documentation
The new section describing the folder-scan subcommand is clear and informative. It concisely explains the purpose (generating directory-level fingerprints using CRC64 and simhash) and provides a usage example that matches the new functionality. For consistency with other sections, consider briefly describing the possible outputs and error handling, if that information is documented elsewhere.

CHANGELOG.md (1)

12-17: New Version Section [1.21.0] Entry
The changelog entry for version [1.21.0] is well-formatted and clearly lists the new features: the folder-scan subcommand, the AbstractPresenter class, and the helper functions for config construction. Ensure that the CLI documentation (and related modules) fully reflects these additions. Also note the repeated-verb warning from LanguageTool: the "### Added" heading is followed by items that each start with "Add", so the items could drop the leading verb (for example, "folder-scan subcommand").

🧰 Tools

🪛 LanguageTool

[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-02-10 ### Added - Add folder-scan subcommand - Add AbstractPr...

(REPEATED_VERBS)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 24fa45b and f3a2acb.

📒 Files selected for processing (19)

CHANGELOG.md (2 hunks)
CLIENT_HELP.md (1 hunks)
pyproject.toml (1 hunks)
requirements.txt (1 hunks)
src/scanoss/__init__.py (1 hunks)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (2 hunks)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (3 hunks)
src/scanoss/cli.py (46 hunks)
src/scanoss/constants.py (1 hunks)
src/scanoss/file_filters.py (3 hunks)
src/scanoss/results.py (5 hunks)
src/scanoss/scanners/__init__.py (1 hunks)
src/scanoss/scanners/scanner_config.py (1 hunks)
src/scanoss/scanners/scanner_hfh.py (1 hunks)
src/scanoss/scanossbase.py (1 hunks)
src/scanoss/scanossgrpc.py (14 hunks)
src/scanoss/utils/abstract_presenter.py (1 hunks)
src/scanoss/utils/crc64.py (1 hunks)
src/scanoss/utils/simhash.py (1 hunks)

✅ Files skipped from review due to trivial changes (3)

src/scanoss/__init__.py
src/scanoss/scanners/__init__.py
requirements.txt

🧰 Additional context used

🪛 Ruff (0.8.2)

src/scanoss/file_filters.py

455-455: Too many return statements (7 > 6)

(PLR0911)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py

30-30: Line too long (1925 > 120)

(E501)

37-37: Line too long (391 > 120)

(E501)

39-39: Line too long (130 > 120)

(E501)

41-41: Line too long (144 > 120)

(E501)

src/scanoss/scanossgrpc.py

90-90: Too many statements (51 > 50)

(PLR0915)

495-495: Magic value used in comparison, consider replacing 2 with a constant variable

(PLR2004)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py

4-4: warnings imported but unused

Remove unused import: warnings

(F401)

7-7: Line too long (122 > 120)

(E501)

22-22: f-string without any placeholders

Remove extraneous f prefix

(F541)

98-98: Too many arguments in function definition (10 > 5)

(PLR0913)

125-125: Too many arguments in function definition (10 > 5)

(PLR0913)

🪛 GitHub Actions: Lint

src/scanoss/file_filters.py

[error] 243-243: Too many arguments in function definition (11 > 5).

[warning] 314-314: for loop variable dirpath overwritten by assignment target.

[error] 455-455: Too many return statements (7 > 6).

[error] 493-493: Too many return statements (7 > 6).

src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py

[warning] 7-7: Import block is un-sorted or un-formatted. Please organize imports.

[error] 25-25: Module level import not at top of file.

[warning] 25-25: Import block is un-sorted or un-formatted. Please organize imports.

[error] 25-25: scanoss.api.common.v2.scanoss_common_pb2 imported but unused.

[error] 26-26: Module level import not at top of file.

[warning] 26-26: google.api.annotations_pb2 imported but unused.

[error] 27-27: Module level import not at top of file.

[warning] 27-27: protoc_gen_swagger.options.annotations_pb2 imported but unused.

[error] 30-30: Line too long (1925 > 120).

[error] 37-37: Line too long (391 > 120).

[error] 39-39: Line too long (130 > 120).

[error] 41-41: Line too long (144 > 120).

src/scanoss/scanossgrpc.py

[error] 90-90: Too many statements (51 > 50).

[warning] 495-495: Magic value used in comparison, consider replacing 2 with a constant variable.

[warning] 498-498: Magic value used in comparison, consider replacing 3 with a constant variable.

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py

[warning] 3-3: Import block is un-sorted or un-formatted. Please organize imports.

[error] 4-4: warnings imported but unused.

[error] 7-7: Line too long (122 > 120).

[error] 22-22: f-string without any placeholders.

[error] 98-98: Too many arguments in function definition (10 > 5).

[error] 125-125: Too many arguments in function definition (10 > 5).

🪛 LanguageTool

CHANGELOG.md

[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-02-10 ### Added - Add folder-scan subcommand - Add AbstractPr...

(REPEATED_VERBS)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: build

🔇 Additional comments (17)

src/scanoss/utils/abstract_presenter.py (1)

45-46: Clarify fallback behavior for unrecognized formats.

Falling back to 'plain' format might mask user errors or unintended mismatches. You may wish to explicitly require a valid format or provide a clearer fallback warning.

src/scanoss/results.py (2)

83-84: Confirm constructor ordering for multiple inheritance.

When a class has multiple base classes and both bases use cooperative inheritance, a single super().__init__() call is typically preferred. Check whether the direct calls to AbstractPresenter.__init__ and ScanossBase.__init__, in this order, are required, or whether one super().__init__() call would suffice.
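Cooperative inheritance looks roughly like the sketch below. Every `__init__` consumes its own keyword arguments and forwards the rest with `super().__init__(**kwargs)`. A single call in the subclass then initializes every base in MRO order. The classes are simplified, hypothetical stand-ins, not the actual `AbstractPresenter` or `ScanossBase` signatures.

```python
class PresenterSketch:
    """Hypothetical stand-in for AbstractPresenter."""

    def __init__(self, *, output_file=None, output_format='plain', **kwargs):
        super().__init__(**kwargs)  # forward remaining kwargs along the MRO
        self.output_file = output_file
        self.output_format = output_format


class BaseSketch:
    """Hypothetical stand-in for ScanossBase."""

    def __init__(self, *, debug=False, trace=False, quiet=False, **kwargs):
        super().__init__(**kwargs)  # reaches object.__init__, which rejects leftover kwargs
        self.debug = debug
        self.trace = trace
        self.quiet = quiet


class ResultsSketch(PresenterSketch, BaseSketch):
    def __init__(self, **kwargs):
        # One call initializes PresenterSketch, then BaseSketch (see ResultsSketch.__mro__).
        super().__init__(**kwargs)
```

Explicit calls to each base, as in the current code, are also valid. They just need to be kept in sync if either base class later starts calling `super().__init__()` itself.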

211-211: Return consistent types even for empty data.

_format_plain_output calls print_stderr() when self.data is empty, so in that case the method may not return a string. Align the return type with the docstring, either by returning a plain message string or by raising an exception.
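The sketch below shows one way to keep the return type consistent, using a hypothetical free function rather than the real method. It reports the empty case on stderr but still returns a `str` (an empty one here), so callers can always write the result. The `key: value` line layout is an assumption for illustration only.

```python
import sys


def format_plain_output(data: list) -> str:
    """Always return a str, even when there is nothing to present."""
    if not data:
        # Report the condition on stderr, but still honour the str return type.
        print('No results to present.', file=sys.stderr)
        return ''
    lines = [', '.join(f'{key}: {value}' for key, value in item.items()) for item in data]
    return '\n'.join(lines) + '\n'
```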

src/scanoss/scanners/scanner_hfh.py (1)

180-187: Validate the intended duplication of file references.

Each file appears to be appended to both the root node and its child node(s). If this design isn't intentional, consider only adding the file to the relevant child node. Otherwise, confirm that multiple references won’t affect hashing or cause duplicated results in downstream processes.
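If each file should belong only to its own directory, a sketch like the following stores it once, in its deepest node, and derives recursive listings on demand. `DirNode`, `build_tree`, and `all_files` are hypothetical names, not the structures in scanner_hfh.py. Paths are assumed to be POSIX-style and located under `root`.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class DirNode:
    path: str
    children: dict = field(default_factory=dict)  # name -> DirNode
    files: list = field(default_factory=list)     # files directly in this directory

    def all_files(self) -> list:
        """Files in this directory and every subdirectory, each listed exactly once."""
        result = list(self.files)
        for child in self.children.values():
            result.extend(child.all_files())
        return result


def build_tree(root: str, file_paths: list) -> DirNode:
    root_node = DirNode(path=root)
    for file_path in file_paths:
        parts = PurePosixPath(file_path).relative_to(root).parts
        node, current = root_node, PurePosixPath(root)
        # Walk (and create) intermediate directory nodes, but not the root again.
        for part in parts[:-1]:
            current = current / part
            node = node.children.setdefault(part, DirNode(path=str(current)))
        node.files.append(file_path)  # attach the file to its deepest directory only
    return root_node
```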

src/scanoss/scanossgrpc.py (2)

77-83: Good addition of a dedicated gRPC error class.

Defining a custom exception class like ScanossGrpcError provides clearer error contexts and helps unify exception handling. Great approach!

442-480: Centralized gRPC call handling is well-structured.

Using _call_rpc to unify request building, metadata insertion, and response parsing streamlines the code. Keep an eye on large request_input payloads; you may need advanced flow control or streaming RPC for extremely large directory scans.

src/scanoss/constants.py (1)

1-13: Constants are properly centralized.

Centralizing configuration values (e.g., DEFAULT_TIMEOUT, DEFAULT_RETRY) makes the code more maintainable. Consider referencing them consistently throughout the code to avoid scattering hard-coded values in multiple places.

src/scanoss/scanners/scanner_config.py (1)

38-54: LGTM! Well-structured configuration class.

The dataclass approach with proper type hints and default values is a clean way to manage scanner configuration.

src/scanoss/scanossbase.py (1)

83-91: LGTM! Parameter rename improves clarity.

The change from msg to content makes the parameter name more descriptive and consistent with its usage.

src/scanoss/utils/crc64.py (1)

29-73: LGTM! Well-implemented CRC64 ECMA algorithm.

The implementation is clean, efficient, and matches Go's hash/crc64 package. Good use of class methods and type hints.

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (2)

9-26: LGTM! Well-implemented version check.

The version check implementation is robust and provides clear error messages for version mismatches.

🧰 Tools

🪛 Ruff (0.8.2)

22-22: f-string without any placeholders

Remove extraneous f prefix

(F541)

🪛 GitHub Actions: Lint

[error] 22-22: f-string without any placeholders.

45-49: LGTM! Well-structured gRPC method implementation.

The FolderHashScan method is properly implemented and follows the same pattern as the existing Echo method, maintaining consistency in the codebase.

Also applies to: 64-69

src/scanoss/file_filters.py (1)

208-214: LGTM! New file extensions added.

The new file extensions are properly added to the DEFAULT_SKIPPED_EXT set.

src/scanoss/cli.py (3)

477-491: LGTM! Well-structured folder-scan command.

The new folder-scan subcommand is properly implemented with clear help messages and appropriate options.

494-507: LGTM! Consistent settings options.

Settings options are properly added to relevant commands with clear help messages.

1451-1485: LGTM! Robust error handling in folder-scan implementation.

The folder_hashing_scan function includes proper error handling and validation:

- Validates required arguments
- Checks directory existence
- Handles gRPC errors

pyproject.toml (1)

11-16: LGTM! Improved exclude list formatting and coverage.

The exclude list is now more readable with multi-line formatting and properly excludes generated code paths.

coderabbitai

Actionable comments posted: 3

🔭 Outside diff range comments (1)

src/scanoss/results.py (1)

191-209: 🛠️ Refactor suggestion

Consider adding error handling for JSON serialization.

The _format_json_output method should handle potential serialization errors: json.dumps raises TypeError for values it cannot encode and ValueError for circular references.

 def _format_json_output(self) -> str:
     formatted_data = []
     for item in self.data:
         formatted_data.append({...})
-    return json.dumps({'results': formatted_data, 'total': len(formatted_data)}, indent=2)
+    try:
+        return json.dumps({'results': formatted_data, 'total': len(formatted_data)}, indent=2)
+    except (TypeError, ValueError) as e:
+        self.print_stderr(f'ERROR: Failed to serialize JSON: {e}')
+        raise
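`json.dumps` raises `TypeError` for values it cannot serialize (such as a `set`) and `ValueError` for circular references, so catching those two is usually enough. The minimal sketch below uses a hypothetical `to_json` helper that re-raises with context.

```python
import json


def to_json(results: list) -> str:
    """Serialize results, turning encoder failures into one descriptive error."""
    try:
        return json.dumps({'results': results, 'total': len(results)}, indent=2)
    except (TypeError, ValueError) as e:
        # TypeError: unsupported type (e.g. set); ValueError: circular reference.
        raise ValueError(f'Failed to serialize results to JSON: {e}') from e
```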

🧹 Nitpick comments (14)

.github/workflows/lint.yml (2)

35-38: Consider extracting exclude patterns from pyproject.toml.

The current implementation hardcodes exclude patterns in the workflow, which duplicates configuration from pyproject.toml and makes maintenance harder. Since this is marked as a temporary workaround, we should:

- Track this technical debt in an issue
- Consider parsing exclude patterns directly from pyproject.toml

Would you like me to:

- Open an issue to track this technical debt?
- Generate a Python script to parse exclude patterns from pyproject.toml?

🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 35-35: trailing spaces

(trailing-spaces)

35-35: Fix minor formatting issues.

There are two formatting issues to address:

- Line 35: Remove trailing spaces
- Line 58: Remove extra blank line
-          # Filter out files that match exclude patterns from pyproject.toml 
+          # Filter out files that match exclude patterns from pyproject.toml
           # this is a temporary workaround until we fix all the lint errors
           filtered_files=$(echo "$files" | grep -v -E 'tests/|test_.*\.py|src/protoc_gen_swagger/|src/scanoss/api/' || true)

           # Use the multi-line syntax for outputs.
           echo "files<<EOF" >> "$GITHUB_OUTPUT"
           echo "${filtered_files}" >> "$GITHUB_OUTPUT"
           echo "EOF" >> "$GITHUB_OUTPUT"

           echo "Changed files before filtering: ${files}"
           echo "Changed files after filtering: ${filtered_files}"

           - name: Run Ruff on changed files
-

+
Also applies to: 58-58

🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 35-35: trailing spaces

(trailing-spaces)

src/scanoss/scanners/scanner_config.py (1)

57-73: Add type hints and input validation.

Consider these improvements:

- Add type hints for the args parameter.
- Add validation for numeric fields to ensure they are positive.

-def create_scanner_config_from_args(args) -> ScannerConfig:
+from argparse import Namespace
+
+def create_scanner_config_from_args(args: Namespace) -> ScannerConfig:
+    post_size = getattr(args, 'post_size', DEFAULT_POST_SIZE)
+    timeout = getattr(args, 'timeout', DEFAULT_TIMEOUT)
+    sc_timeout = getattr(args, 'sc_timeout', DEFAULT_SC_TIMEOUT)
+    nb_threads = getattr(args, 'nb_threads', DEFAULT_NB_THREADS)
+
+    if post_size <= 0:
+        raise ValueError("post_size must be positive")
+    if timeout <= 0:
+        raise ValueError("timeout must be positive")
+    if sc_timeout <= 0:
+        raise ValueError("sc_timeout must be positive")
+    if nb_threads <= 0:
+        raise ValueError("nb_threads must be positive")
+
     return ScannerConfig(
         debug=args.debug,
         trace=args.trace,
         quiet=args.quiet,
         api_key=getattr(args, 'key', None),
         url=getattr(args, 'api_url', None),
         grpc_url=getattr(args, 'grpc_url', None),
-        post_size=getattr(args, 'post_size', DEFAULT_POST_SIZE),
-        timeout=getattr(args, 'timeout', DEFAULT_TIMEOUT),
-        sc_timeout=getattr(args, 'sc_timeout', DEFAULT_SC_TIMEOUT),
-        nb_threads=getattr(args, 'nb_threads', DEFAULT_NB_THREADS),
+        post_size=post_size,
+        timeout=timeout,
+        sc_timeout=sc_timeout,
+        nb_threads=nb_threads,
         proxy=getattr(args, 'proxy', None),
         grpc_proxy=getattr(args, 'grpc_proxy', None),
         ca_cert=getattr(args, 'ca_cert', None),
         pac=getattr(args, 'pac', None),
     )

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (1)

19-25: Simplify version validation error message.

The error message construction can be simplified by removing unnecessary f-strings and concatenations.

     raise RuntimeError(
-        f'The grpc package installed is at version {GRPC_VERSION},'
-        + f' but the generated code in scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py depends on'
-        + f' grpcio>={GRPC_GENERATED_VERSION}.'
-        + f' Please upgrade your grpc module to grpcio>={GRPC_GENERATED_VERSION}'
-        + f' or downgrade your generated code using grpcio-tools<={GRPC_VERSION}.'
+        f'The grpc package installed is at version {GRPC_VERSION}, but the generated code in '
+        f'scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py depends on grpcio>={GRPC_GENERATED_VERSION}. '
+        f'Please upgrade your grpc module to grpcio>={GRPC_GENERATED_VERSION} or downgrade your '
+        f'generated code using grpcio-tools<={GRPC_VERSION}.'
     )

🧰 Tools

🪛 Ruff (0.8.2)

21-21: f-string without any placeholders

Remove extraneous f prefix

(F541)

src/scanoss/utils/simhash.py (2)

42-54: Consider adding value validation in SimhashFeature.

The class could benefit from input validation to ensure hash_value and weight are valid integers.

 def __init__(self, hash_value: int, weight: int = 1):
+    if not isinstance(hash_value, int):
+        raise TypeError("hash_value must be an integer")
+    if not isinstance(weight, int) or weight < 1:
+        raise ValueError("weight must be a positive integer")
     self.hash_value = hash_value
     self.weight = weight

66-81: Consider optimizing vectorize function.

The current implementation loops over all 64 bit positions in pure Python for every feature. The total work remains O(n*64), but vectorizing the inner loop with numpy could reduce interpreter overhead for large feature sets.

+import numpy as np
+
 def vectorize(features: list) -> list:
-    v = [0] * 64
-    for feature in features:
-        h = feature.sum()
-        w = feature.get_weight()
-        for i in range(64):
-            if ((h >> i) & 1) == 1:
-                v[i] += w
-            else:
-                v[i] -= w
-    return v
+    v = np.zeros(64, dtype=np.int64)
+    for feature in features:
+        h = feature.sum()
+        w = feature.get_weight()
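+        # Use uint64 so hashes with the top bit set do not overflow int64
+        bits = (np.uint64(h) >> np.arange(64, dtype=np.uint64)) & np.uint64(1)
+        v += w * (2 * bits.astype(np.int64) - 1)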
+    return v.tolist()

src/scanoss/results.py (1)

53-87: Consider using composition over multiple inheritance.

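The class inherits from both AbstractPresenter and ScanossBase and calls each base initializer explicitly. This couples presentation and logging concerns, and it can become fragile if either base later relies on cooperative super().__init__() calls. Consider using composition instead.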

-class Results(AbstractPresenter, ScanossBase):
+class Results:
     def __init__(self, debug: bool = False, ...):
-        AbstractPresenter.__init__(self, output_file=output_file, output_format=output_format)
-        ScanossBase.__init__(self, debug, trace, quiet)
+        self.presenter = AbstractPresenter(output_file=output_file, output_format=output_format)
+        self.base = ScanossBase(debug, trace, quiet)

src/scanoss/file_filters.py (1)

207-213: Consider sorting the file extensions alphabetically.

The newly added extensions break the alphabetical ordering of the list. Consider reordering for better maintainability.
-    '.whml',
-    '.pom',
-    '.smtml',
-    '.min.js',
-    '.mf',
-    '.base64',
-    '.s',
+    '.base64',
+    '.mf',
+    '.min.js',
+    '.pom',
+    '.s',
+    '.smtml',
+    '.whml',

src/scanoss/scanossgrpc.py (2)

78-93: Consider adding error codes to ScanossGrpcError.

The custom exception could be enhanced with error codes for better error handling by clients.

+from typing import Optional
+
 class ScanossGrpcError(Exception):
+    def __init__(self, message: str, code: Optional[int] = None):
+        super().__init__(message)
+        self.code = code

571-599: Consider adding validation to GrpcConfig.

The GrpcConfig dataclass could benefit from runtime validation of its fields.

+from dataclasses import dataclass, field
+from typing import Optional
+
 @dataclass
 class GrpcConfig:
+    def __post_init__(self):
+        if not isinstance(self.url, str):
+            raise TypeError("url must be a string")
+        if self.timeout is not None and (not isinstance(self.timeout, int) or self.timeout < 0):
+            raise ValueError("timeout must be a non-negative integer")
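Here is a self-contained sketch of `__post_init__` validation. The fields `url` and `timeout` are assumptions, because the real GrpcConfig fields are elided above. `bool` is rejected explicitly since it is a subclass of `int`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GrpcConfigSketch:
    """Hypothetical stand-in for GrpcConfig; the real fields are not shown in the review."""

    url: str
    timeout: Optional[int] = None

    def __post_init__(self) -> None:
        # Runs automatically after the generated __init__ assigns the fields.
        if not isinstance(self.url, str):
            raise TypeError('url must be a string')
        if not self.url:
            raise ValueError('url must not be empty')
        if self.timeout is not None:
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(self.timeout, bool) or not isinstance(self.timeout, int):
                raise TypeError('timeout must be an integer')
            if self.timeout < 0:
                raise ValueError('timeout must be non-negative')
```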

src/scanoss/cli.py (2)

477-491: Consider enhancing input validation for the folder-scan command.

While the command setup follows the established pattern, consider adding validation for:

- Maximum directory depth to prevent excessive recursion
- Directory size limits to prevent memory issues
- File type restrictions for better performance

 p_folder_scan = subparsers.add_parser(
     'folder-scan',
     description=f'Scan the given directory using folder hashing: {__version__}',
     help='Scan the given directory using folder hashing',
 )
 p_folder_scan.add_argument('scan_dir', metavar='FILE/DIR', type=str, nargs='?', help='The root directory to scan')
+p_folder_scan.add_argument('--max-depth', type=int, default=10, help='Maximum directory depth to scan')
+p_folder_scan.add_argument('--max-size', type=str, default='1GB', help='Maximum total directory size to scan')
+p_folder_scan.add_argument('--file-types', type=str, help='Comma-separated list of file types to scan')
 p_folder_scan.add_argument('--output', '-o', type=str, help='Output result file name (optional - default stdout).')
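If a `--max-depth` option like the one proposed above were added, the walk could prune subdirectories in place, which `os.walk` honours in its default top-down mode. `walk_limited` is a hypothetical helper, and depth 0 is the root itself.

```python
import os
from typing import Iterator


def walk_limited(root: str, max_depth: int) -> Iterator:
    """Yield (dirpath, filenames) for directories at most max_depth levels below root."""
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_depth:
            # Clearing dirnames in place stops os.walk (top-down) from descending further.
            dirnames[:] = []
        yield dirpath, filenames
```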

1487-1496: Add input validation for settings configuration.

Consider validating the settings configuration before applying it:

- Check for required fields
- Validate that setting values are within acceptable ranges
- Verify that setting combinations are valid

 def get_scanoss_settings_from_args(args):
     scanoss_settings = None
     if not args.skip_settings_file:
         scanoss_settings = ScanossSettings(debug=args.debug, trace=args.trace, quiet=args.quiet)
         try:
+            # Validate settings file exists and is readable
+            if args.settings and not os.path.isfile(args.settings):
+                raise ScanossSettingsError(f"Settings file not found: {args.settings}")
+            if args.settings and not os.access(args.settings, os.R_OK):
+                raise ScanossSettingsError(f"Settings file not readable: {args.settings}")
+
             scanoss_settings.load_json_file(args.settings, args.scan_dir).set_file_type('new').set_scan_type('identify')
+
+            # Validate loaded settings (assumes ScanossSettings provides a validate() method)
+            if not scanoss_settings.validate():
+                raise ScanossSettingsError("Invalid settings configuration")
+
         except ScanossSettingsError as e:
             print_stderr(f'Error: {e}')
             sys.exit(1)
         return scanoss_settings

CLIENT_HELP.md (1)

408-415: Enhance folder-scan documentation with more examples and best practices.

The current documentation provides basic usage but would benefit from:

- More detailed examples showing different output formats
- Best practices for scanning large directories
- Performance considerations and recommendations

 ### Folder-Scan a Project Folder

 The new `folder-scan` subcommand performs a comprehensive scan on an entire directory by recursively processing files to generate folder-level fingerprints. It computes CRC64 hashes and simhash values to detect directory-level similarities, which is especially useful for comparing large code bases or detecting duplicate folder structures.

 **Usage:**
 ```shell
 scanoss-py folder-scan /path/to/folder -o folder-scan-results.json
+
+# Scan with JSON output format
+scanoss-py folder-scan /path/to/folder -o results.json --format json
+
+# Best practices for large directories (--max-depth and --max-size are only proposed in the cli.py comment, not implemented)
+scanoss-py folder-scan /path/to/folder --max-depth 5 --max-size 1GB -o results.json
+
+# Performance optimization example (--file-types is likewise only a proposed flag)
+scanoss-py folder-scan /path/to/folder --file-types=".py,.js,.java" -o results.json
+```

+Performance Considerations:
+- Limit directory depth for better performance
+- Use file type filtering for focused scans
+- Monitor memory usage for large directories


CHANGELOG.md (1)

12-16: Enhance changelog entries with more details.

While the changelog follows the Keep a Changelog format, consider adding:
1. More detailed descriptions of the new features
2. Breaking changes section if applicable
3. Migration instructions if needed

 ## [1.21.0] - 2025-02-10
 ### Added
-Add folder-scan subcommand
-Add AbstractPresenter class for presenting output in a given format
-Add several reusable helper functions for constructing config objects from CLI args
+- Add folder-scan subcommand for comprehensive directory scanning
+  - Support for CRC64 hash computation
+  - Support for simhash-based similarity detection
+  - Configurable scan depth and file type filtering
+
+- Add AbstractPresenter class for flexible output formatting
+  - Support for plain text and JSON output
+  - Extensible design for future formats
+
+- Add reusable configuration helpers
+  - Simplified CLI argument handling
+  - Improved error handling and validation
+
+### Breaking Changes
+- None
+
+### Migration
+- No migration needed for existing functionality

🧰 Tools

🪛 LanguageTool

[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-02-10 ### Added - Add folder-scan subcommand - Add AbstractPr...

(REPEATED_VERBS)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f3a2acb and 86833bc.

📒 Files selected for processing (21)

.github/workflows/lint.yml (2 hunks)
CHANGELOG.md (2 hunks)
CLIENT_HELP.md (1 hunks)
pyproject.toml (1 hunks)
requirements.txt (1 hunks)
src/scanoss/__init__.py (1 hunks)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (2 hunks)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (3 hunks)
src/scanoss/cli.py (46 hunks)
src/scanoss/constants.py (1 hunks)
src/scanoss/file_filters.py (5 hunks)
src/scanoss/results.py (5 hunks)
src/scanoss/scanners/__init__.py (1 hunks)
src/scanoss/scanners/scanner_config.py (1 hunks)
src/scanoss/scanners/scanner_hfh.py (1 hunks)
src/scanoss/scanossbase.py (1 hunks)
src/scanoss/scanossgrpc.py (14 hunks)
src/scanoss/utils/abstract_presenter.py (1 hunks)
src/scanoss/utils/crc64.py (1 hunks)
src/scanoss/utils/simhash.py (1 hunks)
version.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (9)

src/scanoss/__init__.py
requirements.txt
src/scanoss/scanossbase.py
pyproject.toml
src/scanoss/scanners/__init__.py
src/scanoss/constants.py
src/scanoss/utils/abstract_presenter.py
src/scanoss/utils/crc64.py
src/scanoss/scanners/scanner_hfh.py

🧰 Additional context used

🪛 YAMLlint (1.35.1)

.github/workflows/lint.yml

[error] 35-35: trailing spaces

(trailing-spaces)

[warning] 58-58: too many blank lines

(1 > 0) (empty-lines)

🪛 Ruff (0.8.2)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py

30-30: Line too long (1925 > 120)

(E501)

37-37: Line too long (391 > 120)

(E501)

39-39: Line too long (130 > 120)

(E501)

41-41: Line too long (144 > 120)

(E501)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py

6-6: Line too long (122 > 120)

(E501)

21-21: f-string without any placeholders

Remove extraneous f prefix

(F541)

97-97: Too many arguments in function definition (10 > 5)

(PLR0913)

124-124: Too many arguments in function definition (10 > 5)

(PLR0913)

🪛 LanguageTool

CHANGELOG.md

[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-02-10 ### Added - Add folder-scan subcommand - Add AbstractPr...

(REPEATED_VERBS)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: build

🔇 Additional comments (9)

.github/workflows/lint.yml (2)

41-41: LGTM!

The filtered files are correctly written to the GitHub Actions output variable.

44-45: LGTM! Great addition of debug logging.

The added logging improves workflow transparency by showing which files were excluded during filtering.

version.py (1)

26-27: LGTM! Clean refactoring.

The changes improve code organization by grouping imports together and simplify error handling logic by removing the unnecessary else clause.

Also applies to: 51-51

src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (1)

1-55: LGTM! Generated code.

This is auto-generated protobuf code and should not be manually edited. The long lines flagged by static analysis are expected in generated code.

🧰 Tools

🪛 Ruff (0.8.2)

25-25: Module level import not at top of file

(E402)

25-25: scanoss.api.common.v2.scanoss_common_pb2 imported but unused

Remove unused import: scanoss.api.common.v2.scanoss_common_pb2

(F401)

26-26: Module level import not at top of file

(E402)

26-26: google.api.annotations_pb2 imported but unused

Remove unused import: google.api.annotations_pb2

(F401)

27-27: Module level import not at top of file

(E402)

27-27: protoc_gen_swagger.options.annotations_pb2 imported but unused

Remove unused import: protoc_gen_swagger.options.annotations_pb2

(F401)

30-30: Line too long (1925 > 120)

(E501)

37-37: Line too long (391 > 120)

(E501)

39-39: Line too long (130 > 120)

(E501)

41-41: Line too long (144 > 120)

(E501)

src/scanoss/utils/simhash.py (1)

33-39: LGTM! Well-implemented FNV-1 hash function.

The implementation of FNV-1 hash algorithm is correct and follows the standard specification. The use of bitwise operations and masking ensures proper 64-bit integer handling.
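Here is a compact sketch of the pieces discussed in this review. It covers 64-bit FNV-1 (multiply by the prime, then XOR the byte; FNV-1a does the reverse), the weighted bit vector from `vectorize`, and a fingerprint plus Hamming distance for comparison. The function names are hypothetical. Features are modelled as `(hash, weight)` tuples rather than `SimhashFeature` objects. Treating a bit as set only when its weighted sum is positive is an assumed convention.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1_64(data: bytes) -> int:
    """64-bit FNV-1: multiply first, then XOR each byte."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h = (h * FNV64_PRIME) & MASK64
        h ^= byte
    return h


def simhash(weighted_features: list) -> int:
    """Build the 64-slot weighted vector, then keep bits whose sum is positive."""
    v = [0] * 64
    for h, w in weighted_features:
        for i in range(64):
            v[i] += w if (h >> i) & 1 else -w
    fingerprint = 0
    for i in range(64):
        if v[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; small distances indicate similar inputs."""
    return bin(a ^ b).count('1')
```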

src/scanoss/results.py (1)

210-224: LGTM! Clear and consistent plain text formatting.

The plain text output formatting is well-structured and handles edge cases appropriately.

src/scanoss/file_filters.py (1)

441-478: LGTM! Well-documented method with comprehensive checks.

The should_skip_dir method is well-documented and includes thorough validation logic for directory skipping.

src/scanoss/scanossgrpc.py (1)

435-490: LGTM! Well-implemented gRPC call abstraction.

The _call_rpc helper method effectively reduces code duplication and provides consistent error handling across all gRPC calls.

src/scanoss/cli.py (1)

28-47: LGTM! Well-organized imports.

The imports are properly organized and follow Python best practices. The use of dataclasses for configuration objects and centralized constants improves maintainability.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (13)

src/scanoss/api/provenance/v2/scanoss_provenance_pb2.py (1)
23-23: Fix boolean comparison style.

Replace the equality comparison to False with a not expression for better Python style.
-if _descriptor._USE_C_DESCRIPTORS == False:
+if not _descriptor._USE_C_DESCRIPTORS:
🧰 Tools

🪛 Ruff (0.8.2)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)
src/scanoss/api/components/v2/scanoss_components_pb2.py (1)
23-23: Avoid equality comparison to False.

Replace == False with not (or with is False if a strict bool check is intended) for more idiomatic Python.
-if _descriptor._USE_C_DESCRIPTORS == False:
+if not _descriptor._USE_C_DESCRIPTORS:
🧰 Tools

🪛 Ruff (0.8.2)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (1)
23-23: Style: Improve boolean comparison.

Replace == False with is False or simply not for better Python idioms.
-if _descriptor._USE_C_DESCRIPTORS == False:
+if not _descriptor._USE_C_DESCRIPTORS:
🧰 Tools

🪛 Ruff (0.8.2)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py (2)

19-19: Avoid manual formatting changes to generated code.
Line 19 is 4142 characters long (limit 120), which triggers E501. Since this file is auto-generated, it is generally better to exclude it from lint checks than to modify it manually.

🧰 Tools

🪛 Ruff (0.8.2)

19-19: Line too long (4142 > 120)

(E501)

25-26: Long line in auto-generated code.
Line 26 is 400 characters long (limit 120), which triggers E501. Similarly, consider ignoring lint errors for protoc-generated files instead of editing them manually.

🧰 Tools

🪛 Ruff (0.8.2)

26-26: Line too long (400 > 120)

(E501)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (2)

19-19: Long line in generated code.
Line 19 is 1925 characters long (limit 120), which triggers E501. Because the file is generated, the usual practice is to exclude it from linting or ignore the rule.

🧰 Tools

🪛 Ruff (0.8.2)

19-19: Line too long (1925 > 120)

(E501)

25-26: Another long line in generated code.
Line 26 is 379 characters long (limit 120), which triggers E501. Consider excluding these proto-generated files from lint checks.

🧰 Tools

🪛 Ruff (0.8.2)

26-26: Line too long (379 > 120)

(E501)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (6)

6-6: Line too long (122 > 120).
Since this file is generated by the gRPC Python plugin, consider ignoring or suppressing lint errors here.

🧰 Tools

🪛 Ruff (0.8.2)

6-6: Line too long (122 > 120)

(E501)

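33-42: Unimplemented Echo method.
The method currently raises NotImplementedError. This is the placeholder that the gRPC generator emits in servicer base classes, and servers normally override it in a subclass. If that is the intent, consider documenting it (ideally as a comment in the .proto source, which the generator carries into the stub). Otherwise, consider removing the RPC if it is not needed.

44-49: FolderHashScan method not implemented.
This follows the same pattern as Echo: either implement it in a servicer subclass or clarify its status with a docstring.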
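For context, generated servicer base classes conventionally ship every RPC as a placeholder. The real grpc_tools output also calls context.set_code(grpc.StatusCode.UNIMPLEMENTED) before raising. A server then implements only the RPCs it serves by subclassing. The following self-contained sketch shows that pattern. The class names are hypothetical, and plain dicts stand in for protobuf messages so the example runs without grpcio.

```python
class ScanningServicerBase:
    """Hypothetical stand-in for a generated servicer base class."""

    def Echo(self, request, context):
        # Real generated code also sets StatusCode.UNIMPLEMENTED on the context here.
        raise NotImplementedError("Method not implemented!")

    def FolderHashScan(self, request, context):
        raise NotImplementedError("Method not implemented!")


class EchoServicer(ScanningServicerBase):
    """Hypothetical server: implements Echo, leaves FolderHashScan as a placeholder."""

    def Echo(self, request, context):
        # Return the incoming message unchanged.
        return {"message": request["message"]}


if __name__ == "__main__":
    servicer = EchoServicer()
    print(servicer.Echo({"message": "ping"}, context=None))
    try:
        servicer.FolderHashScan({}, context=None)
    except NotImplementedError as exc:
        print("FolderHashScan:", exc)
```

Under this pattern, the placeholders in the generated module are expected. Any explanation belongs in the .proto file or in the subclass, since hand edits to the generated file are overwritten on regeneration.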

70-74: Experimental API docstring.
Clarify whether this API is production-ready or intended for restricted use.

77-92: Method defines too many parameters (10 > 5).
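By default, Ruff's PLR0913 rule allows at most 5 arguments. Grouping arguments into a config object would reduce the count, but this signature is emitted by the gRPC generator. In practice, suppressing the rule for generated files is simpler than changing them.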

🧰 Tools

🪛 Ruff (0.8.2)

77-77: Too many arguments in function definition (10 > 5)

(PLR0913)

94-109: Excessive parameters in FolderHashScan.
Same concern as above; consider consolidating arguments for readability and maintainability.

🧰 Tools

🪛 Ruff (0.8.2)

94-94: Too many arguments in function definition (10 > 5)

(PLR0913)
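Because these 10-argument signatures come from the gRPC generator, the grouping suggested above can only live in hand-written calling code. The following sketch works under that assumption. CallOptions and call_with are hypothetical names, and fake_folder_hash_scan only mimics the shape of the generated static method so the example runs without grpcio.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional, Sequence, Tuple


@dataclass(frozen=True)
class CallOptions:
    """Hypothetical bundle of the per-call keyword arguments of the generated helpers."""
    options: Sequence[Tuple[str, Any]] = ()
    channel_credentials: Optional[Any] = None
    call_credentials: Optional[Any] = None
    insecure: bool = False
    compression: Optional[Any] = None
    wait_for_ready: Optional[bool] = None
    timeout: Optional[float] = None
    metadata: Optional[Sequence[Tuple[str, str]]] = None


def fake_folder_hash_scan(request, target, options=(), channel_credentials=None,
                          call_credentials=None, insecure=False, compression=None,
                          wait_for_ready=None, timeout=None, metadata=None):
    """Stand-in with the same 10-parameter shape as the generated static method."""
    return {"request": request, "target": target, "insecure": insecure, "timeout": timeout}


def call_with(rpc, request, target, opts: Optional[CallOptions] = None):
    """Forward bundled options as keyword arguments (shallow, so credentials are not copied)."""
    opts = opts if opts is not None else CallOptions()
    kwargs = {f.name: getattr(opts, f.name) for f in fields(opts)}
    return rpc(request, target, **kwargs)


if __name__ == "__main__":
    print(call_with(fake_folder_hash_scan, {"root": "."}, "localhost:50051",
                    CallOptions(insecure=True, timeout=30.0)))
```

The wrapper keeps its own argument count under the PLR0913 limit, while the generated module stays untouched.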

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 86833bc and ee6769b.

📒 Files selected for processing (21)

src/protoc_gen_swagger/options/annotations_pb2.py (1 hunks)
src/protoc_gen_swagger/options/annotations_pb2_grpc.py (1 hunks)
src/protoc_gen_swagger/options/openapiv2_pb2.py (1 hunks)
src/protoc_gen_swagger/options/openapiv2_pb2_grpc.py (1 hunks)
src/scanoss/api/common/v2/scanoss_common_pb2.py (1 hunks)
src/scanoss/api/common/v2/scanoss_common_pb2_grpc.py (1 hunks)
src/scanoss/api/components/v2/scanoss_components_pb2.py (1 hunks)
src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (3 hunks)
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py (1 hunks)
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (3 hunks)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (1 hunks)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (3 hunks)
src/scanoss/api/provenance/v2/scanoss_provenance_pb2.py (2 hunks)
src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (1 hunks)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (1 hunks)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (2 hunks)
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py (1 hunks)
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (2 hunks)
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py (1 hunks)
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (3 hunks)
src/scanoss/utils/crc64.py (1 hunks)

✅ Files skipped from review due to trivial changes (4)

src/protoc_gen_swagger/options/openapiv2_pb2_grpc.py
src/protoc_gen_swagger/options/annotations_pb2_grpc.py
src/scanoss/api/common/v2/scanoss_common_pb2_grpc.py
src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py

🚧 Files skipped from review as they are similar to previous changes (1)

src/scanoss/utils/crc64.py

🧰 Additional context used

🪛 Ruff (0.8.2)

src/scanoss/api/provenance/v2/scanoss_provenance_pb2.py

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (389 > 120)

(E501)

27-27: Undefined name _PROVENANCE

(F821)

28-28: Undefined name _PROVENANCE

(F821)

28-28: Line too long (122 > 120)

(E501)

29-29: Undefined name _PROVENANCE

(F821)

30-30: Undefined name _PROVENANCE

(F821)

30-30: Line too long (142 > 120)

(E501)

31-31: Undefined name _PROVENANCERESPONSE

(F821)

32-32: Undefined name _PROVENANCERESPONSE

(F821)

33-33: Undefined name _PROVENANCERESPONSE_DECLAREDLOCATION

(F821)

34-34: Undefined name _PROVENANCERESPONSE_DECLAREDLOCATION

(F821)

35-35: Undefined name _PROVENANCERESPONSE_CURATEDLOCATION

(F821)

36-36: Undefined name _PROVENANCERESPONSE_CURATEDLOCATION

(F821)

37-37: Undefined name _PROVENANCERESPONSE_PURLS

(F821)

38-38: Undefined name _PROVENANCERESPONSE_PURLS

(F821)

39-39: Undefined name _PROVENANCE

(F821)

40-40: Undefined name _PROVENANCE

(F821)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py

19-19: Line too long (2471 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (398 > 120)

(E501)

27-27: Undefined name _DEPENDENCIES

(F821)

28-28: Undefined name _DEPENDENCIES

(F821)

28-28: Line too long (126 > 120)

(E501)

29-29: Undefined name _DEPENDENCIES

(F821)

30-30: Undefined name _DEPENDENCIES

(F821)

30-30: Line too long (139 > 120)

(E501)

31-31: Undefined name _DEPENDENCYREQUEST

(F821)

32-32: Undefined name _DEPENDENCYREQUEST

(F821)

33-33: Undefined name _DEPENDENCYREQUEST_PURLS

(F821)

34-34: Undefined name _DEPENDENCYREQUEST_PURLS

(F821)

35-35: Undefined name _DEPENDENCYREQUEST_FILES

(F821)

36-36: Undefined name _DEPENDENCYREQUEST_FILES

(F821)

37-37: Undefined name _DEPENDENCYRESPONSE

(F821)

38-38: Undefined name _DEPENDENCYRESPONSE

(F821)

39-39: Undefined name _DEPENDENCYRESPONSE_LICENSES

(F821)

40-40: Undefined name _DEPENDENCYRESPONSE_LICENSES

(F821)

41-41: Undefined name _DEPENDENCYRESPONSE_DEPENDENCIES

(F821)

42-42: Undefined name _DEPENDENCYRESPONSE_DEPENDENCIES

(F821)

43-43: Undefined name _DEPENDENCYRESPONSE_FILES

(F821)

44-44: Undefined name _DEPENDENCYRESPONSE_FILES

(F821)

45-45: Undefined name _DEPENDENCIES

(F821)

46-46: Undefined name _DEPENDENCIES

(F821)

src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py

19-19: Line too long (2566 > 120)

(E501)

22-22: Line too long (124 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (419 > 120)

(E501)

27-27: Undefined name _VULNERABILITIES

(F821)

28-28: Undefined name _VULNERABILITIES

(F821)

28-28: Line too long (129 > 120)

(E501)

29-29: Undefined name _VULNERABILITIES

(F821)

30-30: Undefined name _VULNERABILITIES

(F821)

30-30: Line too long (132 > 120)

(E501)

31-31: Undefined name _VULNERABILITIES

(F821)

32-32: Undefined name _VULNERABILITIES

(F821)

32-32: Line too long (152 > 120)

(E501)

33-33: Undefined name _VULNERABILITYREQUEST

(F821)

34-34: Undefined name _VULNERABILITYREQUEST

(F821)

35-35: Undefined name _VULNERABILITYREQUEST_PURLS

(F821)

36-36: Undefined name _VULNERABILITYREQUEST_PURLS

(F821)

37-37: Undefined name _CPERESPONSE

(F821)

38-38: Undefined name _CPERESPONSE

(F821)

39-39: Undefined name _CPERESPONSE_PURLS

(F821)

40-40: Undefined name _CPERESPONSE_PURLS

(F821)

41-41: Undefined name _VULNERABILITYRESPONSE

(F821)

42-42: Undefined name _VULNERABILITYRESPONSE

(F821)

43-43: Undefined name _VULNERABILITYRESPONSE_VULNERABILITIES

(F821)

44-44: Undefined name _VULNERABILITYRESPONSE_VULNERABILITIES

(F821)

45-45: Undefined name _VULNERABILITYRESPONSE_PURLS

(F821)

46-46: Undefined name _VULNERABILITYRESPONSE_PURLS

(F821)

47-47: Undefined name _VULNERABILITIES

(F821)

48-48: Undefined name _VULNERABILITIES

(F821)

src/protoc_gen_swagger/options/annotations_pb2.py

18-18: Line too long (1009 > 120)

(E501)

22-22: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

23-23: Undefined name openapiv2_swagger

(F821)

24-24: Undefined name openapiv2_operation

(F821)

25-25: Undefined name openapiv2_schema

(F821)

26-26: Undefined name openapiv2_tag

(F821)

27-27: Undefined name openapiv2_field

(F821)

src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py

19-19: Line too long (4142 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (400 > 120)

(E501)

27-27: Undefined name _CRYPTOGRAPHY

(F821)

28-28: Undefined name _CRYPTOGRAPHY

(F821)

28-28: Line too long (126 > 120)

(E501)

29-29: Undefined name _CRYPTOGRAPHY

(F821)

30-30: Undefined name _CRYPTOGRAPHY

(F821)

30-30: Line too long (138 > 120)

(E501)

31-31: Undefined name _CRYPTOGRAPHY

(F821)

32-32: Undefined name _CRYPTOGRAPHY

(F821)

32-32: Line too long (149 > 120)

(E501)

33-33: Undefined name _CRYPTOGRAPHY

(F821)

34-34: Undefined name _CRYPTOGRAPHY

(F821)

34-34: Line too long (145 > 120)

(E501)

35-35: Undefined name _CRYPTOGRAPHY

(F821)

36-36: Undefined name _CRYPTOGRAPHY

(F821)

36-36: Line too long (139 > 120)

(E501)

37-37: Undefined name _CRYPTOGRAPHY

(F821)

38-38: Undefined name _CRYPTOGRAPHY

(F821)

38-38: Line too long (141 > 120)

(E501)

39-39: Undefined name _ALGORITHM

(F821)

40-40: Undefined name _ALGORITHM

(F821)

41-41: Undefined name _ALGORITHMRESPONSE

(F821)

42-42: Undefined name _ALGORITHMRESPONSE

(F821)

43-43: Undefined name _ALGORITHMRESPONSE_PURLS

(F821)

44-44: Undefined name _ALGORITHMRESPONSE_PURLS

(F821)

45-45: Undefined name _ALGORITHMSINRANGERESPONSE

(F821)

46-46: Undefined name _ALGORITHMSINRANGERESPONSE

(F821)

47-47: Undefined name _ALGORITHMSINRANGERESPONSE_PURL

(F821)

48-48: Undefined name _ALGORITHMSINRANGERESPONSE_PURL

(F821)

49-49: Undefined name _VERSIONSINRANGERESPONSE

(F821)

50-50: Undefined name _VERSIONSINRANGERESPONSE

(F821)

51-51: Undefined name _VERSIONSINRANGERESPONSE_PURL

(F821)

52-52: Undefined name _VERSIONSINRANGERESPONSE_PURL

(F821)

53-53: Undefined name _HINT

(F821)

54-54: Undefined name _HINT

(F821)

55-55: Undefined name _HINTSRESPONSE

(F821)

56-56: Undefined name _HINTSRESPONSE

(F821)

57-57: Undefined name _HINTSRESPONSE_PURLS

(F821)

58-58: Undefined name _HINTSRESPONSE_PURLS

(F821)

59-59: Undefined name _HINTSINRANGERESPONSE

(F821)

60-60: Undefined name _HINTSINRANGERESPONSE

(F821)

61-61: Undefined name _HINTSINRANGERESPONSE_PURL

(F821)

62-62: Undefined name _HINTSINRANGERESPONSE_PURL

(F821)

63-63: Undefined name _CRYPTOGRAPHY

(F821)

64-64: Undefined name _CRYPTOGRAPHY

(F821)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py

19-19: Line too long (1925 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (379 > 120)

(E501)

27-27: Undefined name _SCANNING

(F821)

28-28: Undefined name _SCANNING

(F821)

29-29: Undefined name _SCANNING

(F821)

30-30: Undefined name _SCANNING

(F821)

30-30: Line too long (132 > 120)

(E501)

31-31: Undefined name _HFHREQUEST

(F821)

32-32: Undefined name _HFHREQUEST

(F821)

33-33: Undefined name _HFHREQUEST_CHILDREN

(F821)

34-34: Undefined name _HFHREQUEST_CHILDREN

(F821)

35-35: Undefined name _HFHRESPONSE

(F821)

36-36: Undefined name _HFHRESPONSE

(F821)

37-37: Undefined name _HFHRESPONSE_COMPONENT

(F821)

38-38: Undefined name _HFHRESPONSE_COMPONENT

(F821)

39-39: Undefined name _HFHRESPONSE_RESULT

(F821)

40-40: Undefined name _HFHRESPONSE_RESULT

(F821)

41-41: Undefined name _SCANNING

(F821)

42-42: Undefined name _SCANNING

(F821)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py

6-6: Line too long (122 > 120)

(E501)

77-77: Too many arguments in function definition (10 > 5)

(PLR0913)

94-94: Too many arguments in function definition (10 > 5)

(PLR0913)

src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py

6-6: Line too long (138 > 120)

(E501)

79-79: Line too long (123 > 120)

(E501)

145-145: Too many arguments in function definition (10 > 5)

(PLR0913)

162-162: Too many arguments in function definition (10 > 5)

(PLR0913)

179-179: Too many arguments in function definition (10 > 5)

(PLR0913)

189-189: Line too long (127 > 120)

(E501)

196-196: Too many arguments in function definition (10 > 5)

(PLR0913)

206-206: Line too long (125 > 120)

(E501)

213-213: Too many arguments in function definition (10 > 5)

(PLR0913)

223-223: Line too long (122 > 120)

(E501)

230-230: Too many arguments in function definition (10 > 5)

(PLR0913)

240-240: Line too long (125 > 120)

(E501)

src/scanoss/api/components/v2/scanoss_components_pb2.py

19-19: Line too long (3679 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (389 > 120)

(E501)

27-27: Undefined name _COMPONENTS

(F821)

28-28: Undefined name _COMPONENTS

(F821)

28-28: Line too long (122 > 120)

(E501)

29-29: Undefined name _COMPONENTS

(F821)

30-30: Undefined name _COMPONENTS

(F821)

30-30: Line too long (136 > 120)

(E501)

31-31: Undefined name _COMPONENTS

(F821)

32-32: Undefined name _COMPONENTS

(F821)

32-32: Line too long (139 > 120)

(E501)

33-33: Undefined name _COMPONENTS

(F821)

34-34: Undefined name _COMPONENTS

(F821)

34-34: Line too long (144 > 120)

(E501)

35-35: Undefined name _COMPSEARCHREQUEST

(F821)

36-36: Undefined name _COMPSEARCHREQUEST

(F821)

37-37: Undefined name _COMPSTATISTIC

(F821)

38-38: Undefined name _COMPSTATISTIC

(F821)

39-39: Undefined name _COMPSTATISTIC_LANGUAGE

(F821)

40-40: Undefined name _COMPSTATISTIC_LANGUAGE

(F821)

41-41: Undefined name _COMPSTATISTICRESPONSE

(F821)

42-42: Undefined name _COMPSTATISTICRESPONSE

(F821)

43-43: Undefined name _COMPSTATISTICRESPONSE_PURLS

(F821)

44-44: Undefined name _COMPSTATISTICRESPONSE_PURLS

(F821)

45-45: Undefined name _COMPSEARCHRESPONSE

(F821)

46-46: Undefined name _COMPSEARCHRESPONSE

(F821)

47-47: Undefined name _COMPSEARCHRESPONSE_COMPONENT

(F821)

48-48: Undefined name _COMPSEARCHRESPONSE_COMPONENT

(F821)

49-49: Undefined name _COMPVERSIONREQUEST

(F821)

50-50: Undefined name _COMPVERSIONREQUEST

(F821)

51-51: Undefined name _COMPVERSIONRESPONSE

(F821)

52-52: Undefined name _COMPVERSIONRESPONSE

(F821)

53-53: Undefined name _COMPVERSIONRESPONSE_LICENSE

(F821)

54-54: Undefined name _COMPVERSIONRESPONSE_LICENSE

(F821)

55-55: Undefined name _COMPVERSIONRESPONSE_VERSION

(F821)

56-56: Undefined name _COMPVERSIONRESPONSE_VERSION

(F821)

57-57: Undefined name _COMPVERSIONRESPONSE_COMPONENT

(F821)

58-58: Undefined name _COMPVERSIONRESPONSE_COMPONENT

(F821)

59-59: Undefined name _COMPONENTS

(F821)

60-60: Undefined name _COMPONENTS

(F821)

src/protoc_gen_swagger/options/openapiv2_pb2.py

18-18: Line too long (9607 > 120)

(E501)

22-22: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Undefined name _SWAGGER_RESPONSESENTRY

(F821)

27-27: Undefined name _SWAGGER_RESPONSESENTRY

(F821)

28-28: Undefined name _SWAGGER_EXTENSIONSENTRY

(F821)

29-29: Undefined name _SWAGGER_EXTENSIONSENTRY

(F821)

30-30: Undefined name _OPERATION_RESPONSESENTRY

(F821)

31-31: Undefined name _OPERATION_RESPONSESENTRY

(F821)

32-32: Undefined name _OPERATION_EXTENSIONSENTRY

(F821)

33-33: Undefined name _OPERATION_EXTENSIONSENTRY

(F821)

34-34: Undefined name _RESPONSE_HEADERSENTRY

(F821)

35-35: Undefined name _RESPONSE_HEADERSENTRY

(F821)

36-36: Undefined name _RESPONSE_EXAMPLESENTRY

(F821)

37-37: Undefined name _RESPONSE_EXAMPLESENTRY

(F821)

38-38: Undefined name _RESPONSE_EXTENSIONSENTRY

(F821)

39-39: Undefined name _RESPONSE_EXTENSIONSENTRY

(F821)

40-40: Undefined name _INFO_EXTENSIONSENTRY

(F821)

41-41: Undefined name _INFO_EXTENSIONSENTRY

(F821)

42-42: Undefined name _SCHEMA

(F821)

43-43: Undefined name _SCHEMA

(F821)

44-44: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY

(F821)

45-45: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY

(F821)

46-46: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY

(F821)

47-47: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY

(F821)

48-48: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY

(F821)

49-49: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY

(F821)

50-50: Undefined name _SCOPES_SCOPEENTRY

(F821)

51-51: Undefined name _SCOPES_SCOPEENTRY

(F821)

52-52: Undefined name _SWAGGER

(F821)

53-53: Undefined name _SWAGGER

(F821)

54-54: Undefined name _SWAGGER_RESPONSESENTRY

(F821)

55-55: Undefined name _SWAGGER_RESPONSESENTRY

(F821)

56-56: Undefined name _SWAGGER_EXTENSIONSENTRY

(F821)

57-57: Undefined name _SWAGGER_EXTENSIONSENTRY

(F821)

58-58: Undefined name _SWAGGER_SWAGGERSCHEME

(F821)

59-59: Undefined name _SWAGGER_SWAGGERSCHEME

(F821)

60-60: Undefined name _OPERATION

(F821)

61-61: Undefined name _OPERATION

(F821)

62-62: Undefined name _OPERATION_RESPONSESENTRY

(F821)

63-63: Undefined name _OPERATION_RESPONSESENTRY

(F821)

64-64: Undefined name _OPERATION_EXTENSIONSENTRY

(F821)

65-65: Undefined name _OPERATION_EXTENSIONSENTRY

(F821)

66-66: Undefined name _HEADER

(F821)

67-67: Undefined name _HEADER

(F821)

68-68: Undefined name _RESPONSE

(F821)

69-69: Undefined name _RESPONSE

(F821)

70-70: Undefined name _RESPONSE_HEADERSENTRY

(F821)

71-71: Undefined name _RESPONSE_HEADERSENTRY

(F821)

72-72: Undefined name _RESPONSE_EXAMPLESENTRY

(F821)

73-73: Undefined name _RESPONSE_EXAMPLESENTRY

(F821)

74-74: Undefined name _RESPONSE_EXTENSIONSENTRY

(F821)

75-75: Undefined name _RESPONSE_EXTENSIONSENTRY

(F821)

76-76: Undefined name _INFO

(F821)

77-77: Undefined name _INFO

(F821)

78-78: Undefined name _INFO_EXTENSIONSENTRY

(F821)

79-79: Undefined name _INFO_EXTENSIONSENTRY

(F821)

80-80: Undefined name _CONTACT

(F821)

81-81: Undefined name _CONTACT

(F821)

82-82: Undefined name _LICENSE

(F821)

83-83: Undefined name _LICENSE

(F821)

84-84: Undefined name _EXTERNALDOCUMENTATION

(F821)

85-85: Undefined name _EXTERNALDOCUMENTATION

(F821)

86-86: Undefined name _SCHEMA

(F821)

87-87: Undefined name _SCHEMA

(F821)

88-88: Undefined name _JSONSCHEMA

(F821)

89-89: Undefined name _JSONSCHEMA

(F821)

90-90: Undefined name _JSONSCHEMA_JSONSCHEMASIMPLETYPES

(F821)

91-91: Undefined name _JSONSCHEMA_JSONSCHEMASIMPLETYPES

(F821)

92-92: Undefined name _TAG

(F821)

93-93: Undefined name _TAG

(F821)

94-94: Undefined name _SECURITYDEFINITIONS

(F821)

95-95: Undefined name _SECURITYDEFINITIONS

(F821)

96-96: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY

(F821)

97-97: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY

(F821)

98-98: Undefined name _SECURITYSCHEME

(F821)

99-99: Undefined name _SECURITYSCHEME

(F821)

100-100: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY

(F821)

101-101: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY

(F821)

102-102: Undefined name _SECURITYSCHEME_TYPE

(F821)

103-103: Undefined name _SECURITYSCHEME_TYPE

(F821)

104-104: Undefined name _SECURITYSCHEME_IN

(F821)

105-105: Undefined name _SECURITYSCHEME_IN

(F821)

106-106: Undefined name _SECURITYSCHEME_FLOW

(F821)

107-107: Undefined name _SECURITYSCHEME_FLOW

(F821)

108-108: Undefined name _SECURITYREQUIREMENT

(F821)

109-109: Undefined name _SECURITYREQUIREMENT

(F821)

110-110: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTVALUE

(F821)

111-111: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTVALUE

(F821)

112-112: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY

(F821)

113-113: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY

(F821)

114-114: Undefined name _SCOPES

(F821)

115-115: Undefined name _SCOPES

(F821)

116-116: Undefined name _SCOPES_SCOPEENTRY

(F821)

117-117: Undefined name _SCOPES_SCOPEENTRY

(F821)

src/scanoss/api/common/v2/scanoss_common_pb2.py

16-16: Line too long (845 > 120)

(E501)

20-20: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

24-24: Undefined name _STATUSCODE

(F821)

25-25: Undefined name _STATUSCODE

(F821)

26-26: Undefined name _STATUSRESPONSE

(F821)

27-27: Undefined name _STATUSRESPONSE

(F821)

28-28: Undefined name _ECHOREQUEST

(F821)

29-29: Undefined name _ECHOREQUEST

(F821)

30-30: Undefined name _ECHORESPONSE

(F821)

31-31: Undefined name _ECHORESPONSE

(F821)

32-32: Undefined name _PURLREQUEST

(F821)

33-33: Undefined name _PURLREQUEST

(F821)

34-34: Undefined name _PURLREQUEST_PURLS

(F821)

35-35: Undefined name _PURLREQUEST_PURLS

(F821)

src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py

6-6: Line too long (130 > 120)

(E501)

111-111: Too many arguments in function definition (10 > 5)

(PLR0913)

128-128: Too many arguments in function definition (10 > 5)

(PLR0913)

145-145: Too many arguments in function definition (10 > 5)

(PLR0913)

155-155: Line too long (123 > 120)

(E501)

162-162: Too many arguments in function definition (10 > 5)

(PLR0913)

172-172: Line too long (125 > 120)

(E501)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py

6-6: Line too long (138 > 120)

(E501)

77-77: Too many arguments in function definition (10 > 5)

(PLR0913)

94-94: Too many arguments in function definition (10 > 5)

(PLR0913)

104-104: Line too long (122 > 120)

(E501)

src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py

19-19: Line too long (1718 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (375 > 120)

(E501)

27-27: Undefined name _SEMGREP

(F821)

28-28: Undefined name _SEMGREP

(F821)

29-29: Undefined name _SEMGREP

(F821)

30-30: Undefined name _SEMGREP

(F821)

30-30: Line too long (123 > 120)

(E501)

31-31: Undefined name _SEMGREPRESPONSE

(F821)

32-32: Undefined name _SEMGREPRESPONSE

(F821)

33-33: Undefined name _SEMGREPRESPONSE_ISSUE

(F821)

34-34: Undefined name _SEMGREPRESPONSE_ISSUE

(F821)

35-35: Undefined name _SEMGREPRESPONSE_FILE

(F821)

36-36: Undefined name _SEMGREPRESPONSE_FILE

(F821)

37-37: Undefined name _SEMGREPRESPONSE_PURLS

(F821)

38-38: Undefined name _SEMGREPRESPONSE_PURLS

(F821)

39-39: Undefined name _SEMGREP

(F821)

40-40: Undefined name _SEMGREP

(F821)

src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py

77-77: Too many arguments in function definition (10 > 5)

(PLR0913)

94-94: Too many arguments in function definition (10 > 5)

(PLR0913)

src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py

6-6: Line too long (150 > 120)

(E501)

94-94: Too many arguments in function definition (10 > 5)

(PLR0913)

111-111: Too many arguments in function definition (10 > 5)

(PLR0913)

128-128: Too many arguments in function definition (10 > 5)

(PLR0913)

138-138: Line too long (131 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: build

🔇 Additional comments (26)

src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (1)

1-143: LGTM! Note: This is a generated file.

The changes are purely cosmetic (indentation and formatting) and do not affect the functionality. Since this is a generated file (as indicated by the comment at line 1), any manual changes will be overwritten the next time the gRPC code is generated.

🧰 Tools

🪛 Ruff (0.8.2)

6-6: Line too long (150 > 120)

(E501)

94-94: Too many arguments in function definition (10 > 5)

(PLR0913)

111-111: Too many arguments in function definition (10 > 5)

(PLR0913)

128-128: Too many arguments in function definition (10 > 5)

(PLR0913)

138-138: Line too long (131 > 120)

(E501)

src/scanoss/api/provenance/v2/scanoss_provenance_pb2.py (1)

1-4: Note: This is an auto-generated file.

This file is generated by the protocol buffer compiler and should not be manually edited. Any changes should be made to the source .proto file instead.

src/scanoss/api/components/v2/scanoss_components_pb2.py (2)

1-3: LGTM! This is an auto-generated file.

This file is automatically generated by the protocol buffer compiler and should not be manually edited.

19-60: Static analysis warnings can be safely ignored.

The static analysis tool reports several warnings, mainly:

Line length violations: expected for the serialized protobuf descriptors

Undefined names: these symbols are injected into the module namespace at import time by the protobuf runtime, so the linter cannot see them

These warnings can be safely ignored, as they are typical for generated protobuf code.

🧰 Tools

🪛 Ruff (0.8.2)

19-19: Line too long (3679 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (389 > 120)

(E501)

27-27: Undefined name _COMPONENTS

(F821)

28-28: Undefined name _COMPONENTS

(F821)

28-28: Line too long (122 > 120)

(E501)

29-29: Undefined name _COMPONENTS

(F821)

30-30: Undefined name _COMPONENTS

(F821)

30-30: Line too long (136 > 120)

(E501)

31-31: Undefined name _COMPONENTS

(F821)

32-32: Undefined name _COMPONENTS

(F821)

32-32: Line too long (139 > 120)

(E501)

33-33: Undefined name _COMPONENTS

(F821)

34-34: Undefined name _COMPONENTS

(F821)

34-34: Line too long (144 > 120)

(E501)

35-35: Undefined name _COMPSEARCHREQUEST

(F821)

36-36: Undefined name _COMPSEARCHREQUEST

(F821)

37-37: Undefined name _COMPSTATISTIC

(F821)

38-38: Undefined name _COMPSTATISTIC

(F821)

39-39: Undefined name _COMPSTATISTIC_LANGUAGE

(F821)

40-40: Undefined name _COMPSTATISTIC_LANGUAGE

(F821)

41-41: Undefined name _COMPSTATISTICRESPONSE

(F821)

42-42: Undefined name _COMPSTATISTICRESPONSE

(F821)

43-43: Undefined name _COMPSTATISTICRESPONSE_PURLS

(F821)

44-44: Undefined name _COMPSTATISTICRESPONSE_PURLS

(F821)

45-45: Undefined name _COMPSEARCHRESPONSE

(F821)

46-46: Undefined name _COMPSEARCHRESPONSE

(F821)

47-47: Undefined name _COMPSEARCHRESPONSE_COMPONENT

(F821)

48-48: Undefined name _COMPSEARCHRESPONSE_COMPONENT

(F821)

49-49: Undefined name _COMPVERSIONREQUEST

(F821)

50-50: Undefined name _COMPVERSIONREQUEST

(F821)

51-51: Undefined name _COMPVERSIONRESPONSE

(F821)

52-52: Undefined name _COMPVERSIONRESPONSE

(F821)

53-53: Undefined name _COMPVERSIONRESPONSE_LICENSE

(F821)

54-54: Undefined name _COMPVERSIONRESPONSE_LICENSE

(F821)

55-55: Undefined name _COMPVERSIONRESPONSE_VERSION

(F821)

56-56: Undefined name _COMPVERSIONRESPONSE_VERSION

(F821)

57-57: Undefined name _COMPVERSIONRESPONSE_COMPONENT

(F821)

58-58: Undefined name _COMPVERSIONRESPONSE_COMPONENT

(F821)

59-59: Undefined name _COMPONENTS

(F821)

60-60: Undefined name _COMPONENTS

(F821)

src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (1)

9-177: LGTM! Verify experimental API status.

The gRPC service definition is well-structured with all necessary components (stub, servicer, and experimental API). Since this is a generated file, no changes are needed.

Please verify whether the experimental API status is intentional and documented in the project's documentation. This affects the following methods:

Echo

SearchComponents

GetComponentVersions

GetComponentStatistics

🧰 Tools

🪛 Ruff (0.8.2)

111-111: Too many arguments in function definition (10 > 5)

(PLR0913)

128-128: Too many arguments in function definition (10 > 5)

(PLR0913)

145-145: Too many arguments in function definition (10 > 5)

(PLR0913)

155-155: Line too long (123 > 120)

(E501)

162-162: Too many arguments in function definition (10 > 5)

(PLR0913)

172-172: Line too long (125 > 120)

(E501)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (1)

25-30: LGTM! API paths are correctly defined.

The serialized options correctly define the API paths for the Dependencies service:

/api/v2/dependencies/echo

/api/v2/dependencies/dependencies

🧰 Tools

🪛 Ruff (0.8.2)

26-26: Line too long (398 > 120)

(E501)

27-27: Undefined name _DEPENDENCIES

(F821)

28-28: Undefined name _DEPENDENCIES

(F821)

28-28: Line too long (126 > 120)

(E501)

29-29: Undefined name _DEPENDENCIES

(F821)

30-30: Undefined name _DEPENDENCIES

(F821)

30-30: Line too long (139 > 120)

(E501)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (2)

70-74: Note: This is an experimental API.

Users should be aware that this API is marked as experimental and may be subject to change.

76-91: LGTM! Service methods are correctly defined.

The gRPC service methods are properly defined with all necessary parameters for:

Echo: Standard health check endpoint

GetDependencies: Main service endpoint for dependency analysis

Also applies to: 93-108

🧰 Tools

🪛 Ruff (0.8.2)

77-77: Too many arguments in function definition (10 > 5)

(PLR0913)

src/protoc_gen_swagger/options/openapiv2_pb2.py (3)

18-18: Ignore line-length warning in auto-generated code.

This line is exceedingly long (over 9,600 characters) as flagged by static analysis, but it's typical of auto-generated protobuf logic. Generally, we avoid manually splitting such lines to preserve parity with the generator's output.

🧰 Tools

🪛 Ruff (0.8.2)

18-18: Line too long (9607 > 120)

(E501)

23-51: Ignore undefined-name warnings in auto-generated code (part 1).

Static analysis flags references like _SWAGGER_RESPONSESENTRY as undefined. These are dynamically created by _builder.BuildMessageAndEnumDescriptors and are standard in generated protobuf files, so you can safely disregard the warnings.

🧰 Tools

🪛 Ruff (0.8.2)

26-26: Undefined name _SWAGGER_RESPONSESENTRY

(F821)

27-27: Undefined name _SWAGGER_RESPONSESENTRY

(F821)

28-28: Undefined name _SWAGGER_EXTENSIONSENTRY

(F821)

29-29: Undefined name _SWAGGER_EXTENSIONSENTRY

(F821)

30-30: Undefined name _OPERATION_RESPONSESENTRY

(F821)

31-31: Undefined name _OPERATION_RESPONSESENTRY

(F821)

32-32: Undefined name _OPERATION_EXTENSIONSENTRY

(F821)

33-33: Undefined name _OPERATION_EXTENSIONSENTRY

(F821)

34-34: Undefined name _RESPONSE_HEADERSENTRY

(F821)

35-35: Undefined name _RESPONSE_HEADERSENTRY

(F821)

36-36: Undefined name _RESPONSE_EXAMPLESENTRY

(F821)

37-37: Undefined name _RESPONSE_EXAMPLESENTRY

(F821)

38-38: Undefined name _RESPONSE_EXTENSIONSENTRY

(F821)

39-39: Undefined name _RESPONSE_EXTENSIONSENTRY

(F821)

40-40: Undefined name _INFO_EXTENSIONSENTRY

(F821)

41-41: Undefined name _INFO_EXTENSIONSENTRY

(F821)

42-42: Undefined name _SCHEMA

(F821)

43-43: Undefined name _SCHEMA

(F821)

44-44: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY

(F821)

45-45: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY

(F821)

46-46: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY

(F821)

47-47: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY

(F821)

48-48: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY

(F821)

49-49: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY

(F821)

50-50: Undefined name _SCOPES_SCOPEENTRY

(F821)

51-51: Undefined name _SCOPES_SCOPEENTRY

(F821)

52-117: Ignore undefined-name warnings in auto-generated code (part 2).

Similar to the previous block, any references flagged as undefined (e.g., _SWAGGER, _OPERATION, _INFO, etc.) are created at runtime and are not actual errors. This is normal for protobuf-generated files.

🧰 Tools

🪛 Ruff (0.8.2)

52-52: Undefined name _SWAGGER

(F821)

53-53: Undefined name _SWAGGER

(F821)

54-54: Undefined name _SWAGGER_RESPONSESENTRY

(F821)

55-55: Undefined name _SWAGGER_RESPONSESENTRY

(F821)

56-56: Undefined name _SWAGGER_EXTENSIONSENTRY

(F821)

57-57: Undefined name _SWAGGER_EXTENSIONSENTRY

(F821)

58-58: Undefined name _SWAGGER_SWAGGERSCHEME

(F821)

59-59: Undefined name _SWAGGER_SWAGGERSCHEME

(F821)

60-60: Undefined name _OPERATION

(F821)

61-61: Undefined name _OPERATION

(F821)

62-62: Undefined name _OPERATION_RESPONSESENTRY

(F821)

63-63: Undefined name _OPERATION_RESPONSESENTRY

(F821)

64-64: Undefined name _OPERATION_EXTENSIONSENTRY

(F821)

65-65: Undefined name _OPERATION_EXTENSIONSENTRY

(F821)

66-66: Undefined name _HEADER

(F821)

67-67: Undefined name _HEADER

(F821)

68-68: Undefined name _RESPONSE

(F821)

69-69: Undefined name _RESPONSE

(F821)

70-70: Undefined name _RESPONSE_HEADERSENTRY

(F821)

71-71: Undefined name _RESPONSE_HEADERSENTRY

(F821)

72-72: Undefined name _RESPONSE_EXAMPLESENTRY

(F821)

73-73: Undefined name _RESPONSE_EXAMPLESENTRY

(F821)

74-74: Undefined name _RESPONSE_EXTENSIONSENTRY

(F821)

75-75: Undefined name _RESPONSE_EXTENSIONSENTRY

(F821)

76-76: Undefined name _INFO

(F821)

77-77: Undefined name _INFO

(F821)

78-78: Undefined name _INFO_EXTENSIONSENTRY

(F821)

79-79: Undefined name _INFO_EXTENSIONSENTRY

(F821)

80-80: Undefined name _CONTACT

(F821)

81-81: Undefined name _CONTACT

(F821)

82-82: Undefined name _LICENSE

(F821)

83-83: Undefined name _LICENSE

(F821)

84-84: Undefined name _EXTERNALDOCUMENTATION

(F821)

85-85: Undefined name _EXTERNALDOCUMENTATION

(F821)

86-86: Undefined name _SCHEMA

(F821)

87-87: Undefined name _SCHEMA

(F821)

88-88: Undefined name _JSONSCHEMA

(F821)

89-89: Undefined name _JSONSCHEMA

(F821)

90-90: Undefined name _JSONSCHEMA_JSONSCHEMASIMPLETYPES

(F821)

91-91: Undefined name _JSONSCHEMA_JSONSCHEMASIMPLETYPES

(F821)

92-92: Undefined name _TAG

(F821)

93-93: Undefined name _TAG

(F821)

94-94: Undefined name _SECURITYDEFINITIONS

(F821)

95-95: Undefined name _SECURITYDEFINITIONS

(F821)

96-96: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY

(F821)

97-97: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY

(F821)

98-98: Undefined name _SECURITYSCHEME

(F821)

99-99: Undefined name _SECURITYSCHEME

(F821)

100-100: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY

(F821)

101-101: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY

(F821)

102-102: Undefined name _SECURITYSCHEME_TYPE

(F821)

103-103: Undefined name _SECURITYSCHEME_TYPE

(F821)

104-104: Undefined name _SECURITYSCHEME_IN

(F821)

105-105: Undefined name _SECURITYSCHEME_IN

(F821)

106-106: Undefined name _SECURITYSCHEME_FLOW

(F821)

107-107: Undefined name _SECURITYSCHEME_FLOW

(F821)

108-108: Undefined name _SECURITYREQUIREMENT

(F821)

109-109: Undefined name _SECURITYREQUIREMENT

(F821)

110-110: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTVALUE

(F821)

111-111: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTVALUE

(F821)

112-112: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY

(F821)

113-113: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY

(F821)

114-114: Undefined name _SCOPES

(F821)

115-115: Undefined name _SCOPES

(F821)

116-116: Undefined name _SCOPES_SCOPEENTRY

(F821)

117-117: Undefined name _SCOPES_SCOPEENTRY

(F821)

src/protoc_gen_swagger/options/annotations_pb2.py (3)

18-18: Ignore line-length warning in auto-generated code.

This line is very long due to the serialized data for the descriptor. It is standard behavior in protobuf-generated files, and manually wrapping or modifying this line could break the generated code’s structure.

🧰 Tools

🪛 Ruff (0.8.2)

18-18: Line too long (1009 > 120)

(E501)

23-27: Ignore undefined-name warnings in auto-generated code (extensions).

The static analysis flags openapiv2_swagger, openapiv2_operation, etc., as undefined. These are extensions declared within the generated protobuf logic. They are correctly assembled at runtime and are not issues.

🧰 Tools

🪛 Ruff (0.8.2)

23-23: Undefined name openapiv2_swagger

(F821)

24-24: Undefined name openapiv2_operation

(F821)

25-25: Undefined name openapiv2_schema

(F821)

26-26: Undefined name openapiv2_tag

(F821)

27-27: Undefined name openapiv2_field

(F821)

29-30: Ignore protobuf descriptor assignment warnings.

Setting _options and _serialized_options to None or a byte string here is part of standard protobuf code generation logic. The static analysis warnings about undefined references do not apply to these generated lines.

src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py (2)

27-38: Static analysis false positive for _CRYPTOGRAPHY.
The _CRYPTOGRAPHY reference may appear undefined to linters, but protoc-generated code creates these module-level names at runtime. Verify that the file was generated correctly, or disable this lint rule for auto-generated code.
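One way to act on this is to list which files carry a generator header and feed that list to Ruff's exclude or per-file-ignores settings. A minimal sketch; the marker strings are assumptions based on the usual protoc and gRPC plugin headers, so check one of your own `*_pb2.py` / `*_pb2_grpc.py` files before relying on them.

```python
import tempfile
from pathlib import Path

# Header text assumed to be written by the code generators.
GENERATED_MARKERS = (
    'Generated by the protocol buffer compiler',
    'Generated by the gRPC Python protocol compiler plugin',
)


def is_generated(path: Path, max_lines: int = 10) -> bool:
    """Return True if one of the first `max_lines` lines carries a generator marker."""
    with path.open(encoding='utf-8', errors='replace') as fh:
        for _, line in zip(range(max_lines), fh):
            if any(marker in line for marker in GENERATED_MARKERS):
                return True
    return False


def find_generated(root: Path) -> list:
    """List protobuf/gRPC modules under `root` that look machine-generated."""
    return sorted(p for p in root.rglob('*_pb2*.py') if is_generated(p))


# Small demo on a throwaway tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'demo_pb2.py').write_text(
        '# -*- coding: utf-8 -*-\n'
        '# Generated by the protocol buffer compiler.  DO NOT EDIT!\n'
    )
    (root / 'hand_pb2.py').write_text('# written by hand\n')
    found = [p.name for p in find_generated(root)]

print(found)
```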

🧰 Tools

🪛 Ruff (0.8.2)

27-27: Undefined name _CRYPTOGRAPHY

(F821)

28-28: Undefined name _CRYPTOGRAPHY

(F821)

28-28: Line too long (126 > 120)

(E501)

29-29: Undefined name _CRYPTOGRAPHY

(F821)

30-30: Undefined name _CRYPTOGRAPHY

(F821)

30-30: Line too long (138 > 120)

(E501)

31-31: Undefined name _CRYPTOGRAPHY

(F821)

32-32: Undefined name _CRYPTOGRAPHY

(F821)

32-32: Line too long (149 > 120)

(E501)

33-33: Undefined name _CRYPTOGRAPHY

(F821)

34-34: Undefined name _CRYPTOGRAPHY

(F821)

34-34: Line too long (145 > 120)

(E501)

35-35: Undefined name _CRYPTOGRAPHY

(F821)

36-36: Undefined name _CRYPTOGRAPHY

(F821)

36-36: Line too long (139 > 120)

(E501)

37-37: Undefined name _CRYPTOGRAPHY

(F821)

38-38: Undefined name _CRYPTOGRAPHY

(F821)

38-38: Line too long (141 > 120)

(E501)

39-64: Check auto-generated references.
References to _ALGORITHM, _ALGORITHMRESPONSE, _ALGORITHMSINRANGERESPONSE, _VERSIONSINRANGERESPONSE, _HINT, _HINTSRESPONSE, and _HINTSINRANGERESPONSE might trigger “undefined name” warnings. These are typical in protoc code. Confirm they’re generated or ignore false positives.

🧰 Tools

🪛 Ruff (0.8.2)

39-39: Undefined name _ALGORITHM

(F821)

40-40: Undefined name _ALGORITHM

(F821)

41-41: Undefined name _ALGORITHMRESPONSE

(F821)

42-42: Undefined name _ALGORITHMRESPONSE

(F821)

43-43: Undefined name _ALGORITHMRESPONSE_PURLS

(F821)

44-44: Undefined name _ALGORITHMRESPONSE_PURLS

(F821)

45-45: Undefined name _ALGORITHMSINRANGERESPONSE

(F821)

46-46: Undefined name _ALGORITHMSINRANGERESPONSE

(F821)

47-47: Undefined name _ALGORITHMSINRANGERESPONSE_PURL

(F821)

48-48: Undefined name _ALGORITHMSINRANGERESPONSE_PURL

(F821)

49-49: Undefined name _VERSIONSINRANGERESPONSE

(F821)

50-50: Undefined name _VERSIONSINRANGERESPONSE

(F821)

51-51: Undefined name _VERSIONSINRANGERESPONSE_PURL

(F821)

52-52: Undefined name _VERSIONSINRANGERESPONSE_PURL

(F821)

53-53: Undefined name _HINT

(F821)

54-54: Undefined name _HINT

(F821)

55-55: Undefined name _HINTSRESPONSE

(F821)

56-56: Undefined name _HINTSRESPONSE

(F821)

57-57: Undefined name _HINTSRESPONSE_PURLS

(F821)

58-58: Undefined name _HINTSRESPONSE_PURLS

(F821)

59-59: Undefined name _HINTSINRANGERESPONSE

(F821)

60-60: Undefined name _HINTSINRANGERESPONSE

(F821)

61-61: Undefined name _HINTSINRANGERESPONSE_PURL

(F821)

62-62: Undefined name _HINTSINRANGERESPONSE_PURL

(F821)

63-63: Undefined name _CRYPTOGRAPHY

(F821)

64-64: Undefined name _CRYPTOGRAPHY

(F821)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (2)

27-38: Potential false positive for _SCANNING.
References to _SCANNING can appear undefined in lint checks for generated code. Verify you have correct protoc definitions or suppress these warnings.

🧰 Tools

🪛 Ruff (0.8.2)

27-27: Undefined name _SCANNING

(F821)

28-28: Undefined name _SCANNING

(F821)

29-29: Undefined name _SCANNING

(F821)

30-30: Undefined name _SCANNING

(F821)

30-30: Line too long (132 > 120)

(E501)

31-31: Undefined name _HFHREQUEST

(F821)

32-32: Undefined name _HFHREQUEST

(F821)

33-33: Undefined name _HFHREQUEST_CHILDREN

(F821)

34-34: Undefined name _HFHREQUEST_CHILDREN

(F821)

35-35: Undefined name _HFHRESPONSE

(F821)

36-36: Undefined name _HFHRESPONSE

(F821)

37-37: Undefined name _HFHRESPONSE_COMPONENT

(F821)

38-38: Undefined name _HFHRESPONSE_COMPONENT

(F821)

39-42: Check references to _HFHRESPONSE_RESULT and _SCANNING.
Static analysis may flag these as undefined. In protoc-generated files, that is likely a false positive.

🧰 Tools

🪛 Ruff (0.8.2)

39-39: Undefined name _HFHRESPONSE_RESULT

(F821)

40-40: Undefined name _HFHRESPONSE_RESULT

(F821)

41-41: Undefined name _SCANNING

(F821)

42-42: Undefined name _SCANNING

(F821)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (3)

21-29: Stub definitions look good.
The stub methods Echo and FolderHashScan are properly defined and wired for unary calls.

54-63: RPC handlers are correctly set up.
The method handlers for Echo and FolderHashScan appear valid.

65-66: Generic handler configuration.
No issues noted. This looks standard for gRPC route registration.
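"Wired for unary calls" means each stub attribute is a callable obtained from `channel.unary_unary(method_path, request_serializer=..., response_deserializer=...)`. The sketch below reproduces that shape with a fake in-process channel so it runs without a server. The service path and the string codec are assumptions; real generated code passes the protobuf message classes' `SerializeToString` and `FromString`.

```python
from typing import Callable, List


def _encode(text: str) -> bytes:
    return text.encode('utf-8')  # stands in for Message.SerializeToString


def _decode(data: bytes) -> str:
    return data.decode('utf-8')  # stands in for Message.FromString


class RecordingChannel:
    """Minimal stand-in for grpc.Channel: records bound paths, echoes calls locally."""

    def __init__(self) -> None:
        self.paths: List[str] = []

    def unary_unary(self, method: str, request_serializer=None, response_deserializer=None) -> Callable:
        self.paths.append(method)

        def call(request, timeout=None, metadata=None):
            # Serialize, "send", and deserialize the same bytes: a local echo.
            return response_deserializer(request_serializer(request))

        return call


class ScanningStubSketch:
    """Hypothetical stub mirroring the shape of generated *_pb2_grpc.py stubs."""

    SERVICE = '/scanoss.api.scanning.v2.Scanning'  # assumed fully qualified service name

    def __init__(self, channel) -> None:
        self.Echo = channel.unary_unary(
            f'{self.SERVICE}/Echo', request_serializer=_encode, response_deserializer=_decode
        )
        self.FolderHashScan = channel.unary_unary(
            f'{self.SERVICE}/FolderHashScan', request_serializer=_encode, response_deserializer=_decode
        )


channel = RecordingChannel()
stub = ScanningStubSketch(channel)
reply = stub.Echo('ping')
```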

src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py (1)

1-50: LGTM! Auto-generated protobuf file.

This file is auto-generated by the protobuf compiler. The changes follow the standard protobuf output format and should not be modified manually.

Please ensure that these changes were generated using the correct version of the protobuf compiler and not modified manually.

🧰 Tools

🪛 Ruff (0.8.2)

14-14: Module level import not at top of file

(E402)

14-14: scanoss.api.common.v2.scanoss_common_pb2 imported but unused

Remove unused import: scanoss.api.common.v2.scanoss_common_pb2

(F401)

15-15: Module level import not at top of file

(E402)

15-15: google.api.annotations_pb2 imported but unused

Remove unused import: google.api.annotations_pb2

(F401)

16-16: Module level import not at top of file

(E402)

16-16: protoc_gen_swagger.options.annotations_pb2 imported but unused

Remove unused import: protoc_gen_swagger.options.annotations_pb2

(F401)

19-19: Line too long (2566 > 120)

(E501)

22-22: Line too long (124 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (419 > 120)

(E501)

27-27: Undefined name _VULNERABILITIES

(F821)

28-28: Undefined name _VULNERABILITIES

(F821)

28-28: Line too long (129 > 120)

(E501)

29-29: Undefined name _VULNERABILITIES

(F821)

30-30: Undefined name _VULNERABILITIES

(F821)

30-30: Line too long (132 > 120)

(E501)

31-31: Undefined name _VULNERABILITIES

(F821)

32-32: Undefined name _VULNERABILITIES

(F821)

32-32: Line too long (152 > 120)

(E501)

33-33: Undefined name _VULNERABILITYREQUEST

(F821)

34-34: Undefined name _VULNERABILITYREQUEST

(F821)

35-35: Undefined name _VULNERABILITYREQUEST_PURLS

(F821)

36-36: Undefined name _VULNERABILITYREQUEST_PURLS

(F821)

37-37: Undefined name _CPERESPONSE

(F821)

38-38: Undefined name _CPERESPONSE

(F821)

39-39: Undefined name _CPERESPONSE_PURLS

(F821)

40-40: Undefined name _CPERESPONSE_PURLS

(F821)

41-41: Undefined name _VULNERABILITYRESPONSE

(F821)

42-42: Undefined name _VULNERABILITYRESPONSE

(F821)

43-43: Undefined name _VULNERABILITYRESPONSE_VULNERABILITIES

(F821)

44-44: Undefined name _VULNERABILITYRESPONSE_VULNERABILITIES

(F821)

45-45: Undefined name _VULNERABILITYRESPONSE_PURLS

(F821)

46-46: Undefined name _VULNERABILITYRESPONSE_PURLS

(F821)

47-47: Undefined name _VULNERABILITIES

(F821)

48-48: Undefined name _VULNERABILITIES

(F821)
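Ruff's E712 fix (`== False` to `not ...`) is only a pure style change when the operand is always a real `bool`. Assuming `_descriptor._USE_C_DESCRIPTORS` is a plain bool, the rewrite would be harmless there, but for other falsy values the two spellings disagree, which is one more reason to suppress the rule for generated files rather than edit them. A quick check:

```python
values = [False, 0, None, '', []]

# (value, value == False, not value) for each candidate
comparison = [(repr(v), v == False, not v) for v in values]  # noqa: E712

for row in comparison:
    print(row)
```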

src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (1)

30-49: LGTM! New cryptography RPC methods added.

The changes add several new RPC methods to enhance the cryptography service:

GetAlgorithmsInRange: Get cryptographic algorithms for version ranges

GetVersionsInRange: Get versions with/without cryptographic algorithms

GetHintsInRange: Get protocol/library/sdk/framework hints for version ranges

GetEncryptionHints: Get encryption-related hints

src/scanoss/api/common/v2/scanoss_common_pb2.py (1)

1-37: LGTM! Auto-generated protobuf file.

This file is auto-generated by the protobuf compiler. The changes follow the standard protobuf output format and should not be modified manually.

Please ensure that these changes were generated using the correct version of the protobuf compiler and not modified manually.

🧰 Tools

🪛 Ruff (0.8.2)

16-16: Line too long (845 > 120)

(E501)

20-20: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

24-24: Undefined name _STATUSCODE

(F821)

25-25: Undefined name _STATUSCODE

(F821)

26-26: Undefined name _STATUSRESPONSE

(F821)

27-27: Undefined name _STATUSRESPONSE

(F821)

28-28: Undefined name _ECHOREQUEST

(F821)

29-29: Undefined name _ECHOREQUEST

(F821)

30-30: Undefined name _ECHORESPONSE

(F821)

31-31: Undefined name _ECHORESPONSE

(F821)

32-32: Undefined name _PURLREQUEST

(F821)

33-33: Undefined name _PURLREQUEST

(F821)

34-34: Undefined name _PURLREQUEST_PURLS

(F821)

35-35: Undefined name _PURLREQUEST_PURLS

(F821)

src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py (1)

1-42: LGTM! Auto-generated protobuf file.

This file is auto-generated by the protobuf compiler. The changes follow the standard protobuf output format and should not be modified manually.

Please ensure that these changes were generated using the correct version of the protobuf compiler and not modified manually.

🧰 Tools

🪛 Ruff (0.8.2)

14-14: Module level import not at top of file

(E402)

14-14: scanoss.api.common.v2.scanoss_common_pb2 imported but unused

Remove unused import: scanoss.api.common.v2.scanoss_common_pb2

(F401)

15-15: Module level import not at top of file

(E402)

15-15: google.api.annotations_pb2 imported but unused

Remove unused import: google.api.annotations_pb2

(F401)

16-16: Module level import not at top of file

(E402)

16-16: protoc_gen_swagger.options.annotations_pb2 imported but unused

Remove unused import: protoc_gen_swagger.options.annotations_pb2

(F401)

19-19: Line too long (1718 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (375 > 120)

(E501)

27-27: Undefined name _SEMGREP

(F821)

28-28: Undefined name _SEMGREP

(F821)

29-29: Undefined name _SEMGREP

(F821)

30-30: Undefined name _SEMGREP

(F821)

30-30: Line too long (123 > 120)

(E501)

31-31: Undefined name _SEMGREPRESPONSE

(F821)

32-32: Undefined name _SEMGREPRESPONSE

(F821)

33-33: Undefined name _SEMGREPRESPONSE_ISSUE

(F821)

34-34: Undefined name _SEMGREPRESPONSE_ISSUE

(F821)

35-35: Undefined name _SEMGREPRESPONSE_FILE

(F821)

36-36: Undefined name _SEMGREPRESPONSE_FILE

(F821)

37-37: Undefined name _SEMGREPRESPONSE_PURLS

(F821)

38-38: Undefined name _SEMGREPRESPONSE_PURLS

(F821)

39-39: Undefined name _SEMGREP

(F821)

40-40: Undefined name _SEMGREP

(F821)

src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (1)

1-109: LGTM! Note: This is an auto-generated file.

The changes appear to be auto-generated formatting adjustments. As indicated by the comment at the top of the file, this is a generated gRPC file and should not be manually edited.

Please ensure that these changes were generated using the gRPC Python protocol compiler plugin and not modified manually. The script below searches for the source .proto file and notes the command for regenerating it.
❌ Verification inconclusive

🏁 Script executed:
#!/bin/bash
# Description: Find the proto file and verify it's being used to generate this gRPC file.

# Find the proto file
fd -e proto | grep -i "semgrep"

# Note: After finding the proto file, you can regenerate this file using:
# python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. path/to/your.proto
Length of output: 31

Auto-generated file – please verify generation process manually.

The gRPC file appears to be correctly auto-generated, with only formatting changes. However, the automated search for the corresponding .proto file (filtering fd output for "semgrep") did not return any results. Please verify manually that the file was generated from the correct proto file (likely one declaring the package scanoss.api.semgrep.v2) and has not been edited by hand.

Ensure that the proto file exists in the repository (it might be under a different name or directory).

Verify that regenerating the file with the gRPC Python protocol compiler plugin reproduces the current changes.
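Because the proto may live under a different file name, searching by its `package` declaration is more robust than filtering file names. A minimal sketch; the directory layout and file names in the demo are invented. If nothing turns up locally, the protos may be maintained outside this repository.

```python
import re
import tempfile
from pathlib import Path

# Matches a proto `package a.b.c;` declaration at the start of a line.
PACKAGE_RE = re.compile(r'^\s*package\s+([\w.]+)\s*;', re.MULTILINE)


def find_protos_for_package(root: Path, package: str) -> list:
    """Return .proto files under `root` whose package declaration equals `package`."""
    matches = []
    for proto in root.rglob('*.proto'):
        text = proto.read_text(encoding='utf-8', errors='replace')
        found = PACKAGE_RE.search(text)
        if found and found.group(1) == package:
            matches.append(proto)
    return sorted(matches)


# Demo: the file name does not mention "semgrep", but its package does.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'protos').mkdir()
    (root / 'protos' / 'analysis.proto').write_text(
        'syntax = "proto3";\npackage scanoss.api.semgrep.v2;\n'
    )
    (root / 'other.proto').write_text('syntax = "proto3";\npackage scanoss.api.common.v2;\n')
    hits = [p.name for p in find_protos_for_package(root, 'scanoss.api.semgrep.v2')]

print(hits)
```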
🧰 Tools

🪛 Ruff (0.8.2)

77-77: Too many arguments in function definition (10 > 5)

(PLR0913)

94-94: Too many arguments in function definition (10 > 5)

(PLR0913)

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

src/scanoss/cli.py (4)

484-490: Define format choices as a constant.

The format choices should be defined as a constant at the module level for better maintainability and reusability.

+FOLDER_SCAN_FORMATS = ['plain', 'json']
+
 p_folder_scan.add_argument(
     '--format',
     '-f',
     type=str,
-    choices=['plain', 'json'],
+    choices=FOLDER_SCAN_FORMATS,
     help='Result output format (optional - default: plain)',
 )
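The suggested help text documents `plain` as the default, but the snippet does not show a `default=` argument. A minimal sketch that keeps the choices, the default, and the help text in one place, assuming the first listed format should be the default; the parser here is a hypothetical stand-alone one, not the real CLI.

```python
import argparse

FOLDER_SCAN_FORMATS = ['plain', 'json']


def build_folder_scan_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-alone parser showing the constant in use."""
    parser = argparse.ArgumentParser(prog='folder-scan')
    parser.add_argument(
        '--format',
        '-f',
        type=str,
        choices=FOLDER_SCAN_FORMATS,
        default=FOLDER_SCAN_FORMATS[0],
        # argparse fills in %(default)s, so the help text cannot drift from the code.
        help='Result output format (optional - default: %(default)s)',
    )
    return parser


parser = build_folder_scan_parser()
default_format = parser.parse_args([]).format
json_format = parser.parse_args(['-f', 'json']).format
```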

1462-1500: Enhance docstring and add logging.

The function could benefit from:

Type hints in the signature and docstring
Debug output at each stage, gated on --debug, for easier troubleshooting

-def folder_hashing_scan(parser, args):
+def folder_hashing_scan(parser: argparse.ArgumentParser, args: argparse.Namespace) -> None:
     """Run the "folder-scan" sub-command
 
     Args:
-        parser (ArgumentParser): command line parser object
-        args (Namespace): Parsed arguments
+        parser (argparse.ArgumentParser): command line parser object
+        args (argparse.Namespace): Parsed arguments
+
+    Raises:
+        SystemExit: If the scan directory is missing or invalid, or a gRPC error occurs
     """
     try:
         if not args.scan_dir:
             print_stderr('ERROR: Please specify a directory to scan')
             parser.parse_args([args.subparser, '-h'])
             sys.exit(1)
 
+        if args.debug:
+            print_stderr(f'Starting folder scan for directory: {args.scan_dir}')
+
         if not os.path.exists(args.scan_dir) or not os.path.isdir(args.scan_dir):
             print_stderr(f'ERROR: The specified directory {args.scan_dir} does not exist')
             sys.exit(1)
 
+        if args.debug:
+            print_stderr('Creating scanner configuration...')
         scanner_config = create_scanner_config_from_args(args)
         scanoss_settings = get_scanoss_settings_from_args(args)
         grpc_config = create_grpc_config_from_args(args)
 
+        if args.debug:
+            print_stderr('Initializing gRPC client...')
         client = ScanossGrpc(**asdict(grpc_config))
 
         scanner = ScannerHFH(
             scan_dir=args.scan_dir,
             config=scanner_config,
             client=client,
             scanoss_settings=scanoss_settings,
         )
 
         scanner.best_match = args.best_match
         scanner.threshold = args.threshold
 
+        if args.debug:
+            print_stderr('Starting scan...')
         scanner.scan()
+        if args.debug:
+            print_stderr('Presenting results...')
         scanner.present(output_file=args.output, output_format=args.format)
     except ScanossGrpcError as e:
+        if args.debug:
+            print_stderr(f'Scan failed with error: {e}')
         print_stderr(f'ERROR: {e}')
         sys.exit(1)
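Repeating `if args.debug: print_stderr(...)` at every step could instead be handled by a logger whose level is set once from `--debug`. A sketch using the standard `logging` module; the logger name is hypothetical, and the CLI may prefer to keep its existing `print_stderr` helpers for consistency.

```python
import logging
import sys


def configure_logger(debug: bool) -> logging.Logger:
    """Return a stderr logger whose verbosity follows the --debug flag."""
    logger = logging.getLogger('scanoss.folder_scan')  # hypothetical logger name
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG if debug else logging.WARNING)
    logger.propagate = False  # keep messages out of the root logger
    return logger


log = configure_logger(debug=True)
# Lazy %-style arguments: the message is only formatted if DEBUG is enabled.
log.debug('Starting folder scan for directory: %s', '/tmp/example')
```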

1502-1512: Add type hints and improve error handling.

The function would benefit from type hints and better error handling for the settings file path.

-def get_scanoss_settings_from_args(args):
+def get_scanoss_settings_from_args(args: argparse.Namespace) -> Optional[ScanossSettings]:
+    """Create ScanossSettings from command line arguments.
+
+    Args:
+        args (argparse.Namespace): Parsed command line arguments
+
+    Returns:
+        Optional[ScanossSettings]: The settings object or None if settings are skipped
+
+    Raises:
+        ScanossSettingsError: If there is an error loading the settings file
+    """
     scanoss_settings = None
     if not args.skip_settings_file:
+        if args.settings and not os.path.exists(args.settings):
+            print_stderr(f'Error: Settings file does not exist: {args.settings}')
+            sys.exit(1)
+
         scanoss_settings = ScanossSettings(debug=args.debug, trace=args.trace, quiet=args.quiet)
         try:
             scanoss_settings.load_json_file(args.settings, args.scan_dir).set_file_type('new').set_scan_type('identify')
         except ScanossSettingsError as e:
             print_stderr(f'Error: {e}')
             sys.exit(1)
-        return scanoss_settings
+    return scanoss_settings

625-631: Define exit codes as constants.

Exit codes should be defined as constants for better maintainability and clarity.

+# Exit codes
+EXIT_SUCCESS = 0
+EXIT_FAILURE = 1
+EXIT_INVALID_ARGS = 2
+
 if args.version:
     ver(parser, args)
-    sys.exit(0)
+    sys.exit(EXIT_SUCCESS)
 if not args.subparser:
     parser.print_help()  # No sub command subcommand, print general help
-    sys.exit(1)
+    sys.exit(EXIT_INVALID_ARGS)
 elif (args.subparser in ('utils', 'ut', 'component', 'comp', 'inspect', 'insp', 'ins')) and not args.subparsercmd:
     parser.parse_args([args.subparser, '--help'])  # Force utils helps to be displayed
-    sys.exit(1)
+    sys.exit(EXIT_INVALID_ARGS)
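If these constants are adopted, an `IntEnum` groups them and still works with `sys.exit`, since its members are ints. Note that the suggestion changes the status for a missing subcommand from 1 to 2. That matches the status argparse uses for its own usage errors, but it is a visible behavior change for scripts that check for 1. A sketch with illustrative names and a trimmed-down parser:

```python
import argparse
import sys
from enum import IntEnum


class ExitCode(IntEnum):
    """Process exit statuses (names and values are illustrative)."""
    SUCCESS = 0
    FAILURE = 1
    INVALID_ARGS = 2  # same status argparse uses for its own usage errors


def main(argv) -> int:
    parser = argparse.ArgumentParser(prog='scanoss-py')
    parser.add_argument('--version', action='store_true')
    subparsers = parser.add_subparsers(dest='subparser')
    subparsers.add_parser('folder-scan')
    args = parser.parse_args(argv)
    if args.version:
        print('scanoss-py <version>')
        return ExitCode.SUCCESS
    if not args.subparser:
        parser.print_help(sys.stderr)  # no subcommand given
        return ExitCode.INVALID_ARGS
    return ExitCode.SUCCESS


# Entry point would be: sys.exit(main(sys.argv[1:]))
```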

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee6769b and e2a64c9.

📒 Files selected for processing (2)

src/scanoss/cli.py (46 hunks)
src/scanoss/scanners/scanner_hfh.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/scanoss/scanners/scanner_hfh.py

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: build

🔇 Additional comments (2)

src/scanoss/cli.py (2)

28-47: LGTM! Well-organized imports and constants.

The new imports and constants are properly organized and follow Python best practices. The use of explicit imports and constants improves maintainability.

505-518: LGTM! Well-integrated shared options.

The folder-scan command is properly integrated with the shared options, maintaining consistency with other commands.

Also applies to: 533-534, 561-562, 587-588

src/scanoss/cli.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

src/scanoss/file_filters.py (2)

242-271: Improve constructor documentation for better maintainability.

The constructor has been updated to use **kwargs for flexibility, but the docstring could be enhanced to better document the expected keyword arguments.

Apply this diff to improve the docstring:

     def __init__(self, debug: bool = False, trace: bool = False, quiet: bool = False, **kwargs):
         """
         Initialize scan filters based on default settings. Optionally append custom settings.
 
         Args:
             debug (bool): Enable debug output
             trace (bool): Enable trace output
             quiet (bool): Suppress output
-            **kwargs: Additional arguments including:
-                scanoss_settings (ScanossSettings): Custom settings to override defaults
-                all_extensions (bool): Include all file extensions
-                all_folders (bool): Include all folders
-                hidden_files_folders (bool): Include hidden files and folders
-                operation_type (str): Operation type ('scanning' or 'fingerprinting')
-                skip_size (int): Size to skip
-                skip_extensions (list): Extensions to skip
-                skip_folders (list): Folders to skip
+            **kwargs: Additional arguments
+
+            Keyword Args:
+                scanoss_settings (ScanossSettings): Custom settings to override defaults
+                all_extensions (bool, optional): Include all file extensions. Defaults to False.
+                all_folders (bool, optional): Include all folders. Defaults to False.
+                hidden_files_folders (bool, optional): Include hidden files and folders. Defaults to False.
+                operation_type (str, optional): Operation type ('scanning' or 'fingerprinting'). Defaults to 'scanning'.
+                skip_size (int, optional): Size to skip. Defaults to 0.
+                skip_extensions (list, optional): Extensions to skip. Defaults to [].
+                skip_folders (list, optional): Folders to skip. Defaults to [].
         """

518-552: Enhance error handling in _should_skip_file_for_hfh method.

The new method for folder hashing has a broad exception handler that might mask specific issues.

Apply this diff to improve error handling:

     def _should_skip_file_for_hfh(self, file_path: Path) -> bool:
         """
         Check if a file should be skipped during folder hashing scan.
 
         Args:
             file_path (Path): The path to the file to check.
 
         Returns:
             bool: True if the file should be skipped, False otherwise.
         """
         try:
             if (
                 any(part.startswith('.') for part in file_path.parts)  # Hidden files/folders
                 or file_path.is_symlink()  # Symlinks
                 or file_path.stat().st_size == 0  # Empty files
             ):
                 self.print_debug(f'Skipping file: {file_path} (hidden/symlink/empty)')
                 return True
 
             # Files ending with null
             if file_path.suffix.lower() == '.txt':
                 try:
                     with open(file_path, 'rb') as f:
                         if f.read().endswith(b'\x00'):
                             self.print_debug(f'Skipping file: {file_path} (text file ending with null)')
                             return True
-                except (OSError, IOError):
+                except (OSError, IOError) as e:
+                    self.print_debug(f'Skipping file: {file_path} (cannot read file content: {str(e)})')
                     return True
 
             return False
 
-        except Exception as e:
-            self.print_debug(f'Error checking file {file_path}: {str(e)}')
+        except OSError as e:
+            self.print_debug(f'Error accessing file {file_path}: {str(e)}')
+            return True
+        except Exception as e:
+            self.print_debug(f'Unexpected error checking file {file_path}: {str(e)}')
             return True
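Two details in this method may be worth tightening. `f.read().endswith(b'\x00')` loads the whole file just to inspect its last byte; seeking to the end reads one byte instead. Also, `IOError` has been an alias of `OSError` since Python 3.3, so catching both is redundant. Separately, if `file_path` is absolute, checking every part for a leading dot also skips files whose ancestor directories are hidden (for example a checkout under `~/.cache`); testing only the parts below the scan root avoids that. A sketch with hypothetical helper names:

```python
import os
import tempfile
from pathlib import Path


def ends_with_null_byte(path: Path) -> bool:
    """Return True if the file's last byte is NUL, reading only that byte."""
    with open(path, 'rb') as fh:
        size = fh.seek(0, os.SEEK_END)  # seek() returns the new absolute position
        if size == 0:
            return False  # empty file: no last byte to inspect
        fh.seek(-1, os.SEEK_END)
        return fh.read(1) == b'\x00'


def is_hidden_below(path: Path, root: Path) -> bool:
    """True if any component of `path` below `root` starts with '.'.

    Raises ValueError if `path` is not inside `root`.
    """
    return any(part.startswith('.') for part in path.relative_to(root).parts)


with tempfile.TemporaryDirectory() as tmp:
    # A scan root that itself sits under a hidden directory.
    root = Path(tmp) / '.cache' / 'project'
    root.mkdir(parents=True)
    (root / 'nul.txt').write_bytes(b'abc\x00')
    (root / 'plain.txt').write_bytes(b'abc')
    (root / '.env').write_text('KEY=value')
    results = {
        'nul': ends_with_null_byte(root / 'nul.txt'),
        'plain': ends_with_null_byte(root / 'plain.txt'),
        'env_hidden': is_hidden_below(root / '.env', root),
        'plain_hidden': is_hidden_below(root / 'plain.txt', root),
    }
```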

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e2a64c9 and b8ffc25.

📒 Files selected for processing (2)

src/scanoss/file_filters.py (6 hunks)
src/scanoss/scanners/scanner_hfh.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/scanoss/scanners/scanner_hfh.py

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: build

🔇 Additional comments (2)

src/scanoss/file_filters.py (2)

207-213: LGTM! New file extensions added to skip list.

The additions include common file extensions that should be skipped during scanning:

.whml, .pom, .smtml (markup/configuration files)

.min.js (minified JavaScript)

.mf (manifest files), .base64 (encoded data), .s (assembly source)
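How `.min.js` is matched matters: `pathlib.Path('app.min.js').suffix` is `'.js'`, so a filter that compares only the last suffix would never hit a compound entry. Whether scanoss-py's filter does this is not shown here. A sketch of suffix matching that handles compound extensions, using an illustrative subset of the list; case-insensitive matching is a design choice of this sketch.

```python
from pathlib import Path

# Illustrative subset of the skip list discussed above.
SKIPPED_EXTENSIONS = ('.whml', '.pom', '.smtml', '.min.js', '.mf', '.base64', '.s')


def has_skipped_extension(path: str) -> bool:
    """Match on the end of the file name so multi-dot entries like '.min.js' work."""
    name = Path(path).name.lower()
    return name.endswith(SKIPPED_EXTENSIONS)  # str.endswith accepts a tuple


last_suffix = Path('dist/app.min.js').suffix  # '.js', not '.min.js'
checks = {
    'dist/app.min.js': has_skipped_extension('dist/app.min.js'),
    'src/app.js': has_skipped_extension('src/app.js'),
    'arch/start.S': has_skipped_extension('arch/start.S'),
    'pom.xml': has_skipped_extension('pom.xml'),
}
```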

441-477: LGTM! Method visibility change is appropriate.

The _should_skip_dir method has been renamed to should_skip_dir, making it public. This change is appropriate because:

The method provides essential functionality that may be needed by external callers

The implementation appears stable

One caveat: the rename is not strictly backward compatible, since any caller still using _should_skip_dir will raise AttributeError unless a deprecated alias is kept.
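If callers of the old private name might still exist, a deprecated alias can keep them working while pointing them at the public one. A sketch on a hypothetical stand-in class; the placeholder skip rule is not the real filter logic.

```python
import warnings


class FileFiltersSketch:
    """Hypothetical stand-in for the real filter class; only the rename matters here."""

    def should_skip_dir(self, dir_name: str) -> bool:
        # Placeholder rule for the demo: skip hidden directories.
        return dir_name.startswith('.')

    def _should_skip_dir(self, dir_name: str) -> bool:
        """Deprecated alias kept so existing callers keep working."""
        warnings.warn(
            '_should_skip_dir is deprecated; use should_skip_dir',
            DeprecationWarning,
            stacklevel=2,
        )
        return self.should_skip_dir(dir_name)


filters = FileFiltersSketch()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    old_result = filters._should_skip_dir('.git')
```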

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

src/scanoss/scanners/scanner_hfh.py (2)

191-192: Enhance error handling in _build_root_node.

The error handling could be improved by:

Logging the full stack trace in debug mode.
Providing more context about the error.

-            except Exception as e:
-                self.print_debug(f'Skipping file {full_file_path}: {str(e)}')
+            except Exception as e:
+                self.print_debug(f'Error processing file {full_file_path}: {str(e)}')
+                if self.base.debug:
+                    import traceback
+                    self.print_debug(f'Stack trace:\n{traceback.format_exc()}')
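The suggestion imports `traceback` inside the `except` block; a module-level import is more conventional. If the code moves to the `logging` module, `logger.debug(..., exc_info=True)` attaches the traceback automatically. A small sketch of the message-building part, with a hypothetical helper name:

```python
import traceback


def describe_error(path: str, exc: BaseException, debug: bool) -> str:
    """Build the debug message; append the full traceback only in debug mode."""
    message = f'Error processing file {path}: {exc}'
    if debug:
        trace = ''.join(traceback.format_exception(type(exc), exc, exc.__traceback__))
        message += f'\nStack trace:\n{trace}'
    return message


try:
    raise ValueError('unreadable content')
except ValueError as err:
    verbose = describe_error('src/example.c', err, debug=True)
    terse = describe_error('src/example.c', err, debug=False)
```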

308-316: Consider a more readable plain text format.

The plain output format currently returns the same JSON format as _format_json_output. Consider implementing a more human-readable format for plain text output.

     def _format_plain_output(self) -> str:
         """
         Format the scan output data into a plain text string
         """
-        return (
-            json.dumps(self.scanner.scan_results, indent=2)
-            if isinstance(self.scanner.scan_results, dict)
-            else str(self.scanner.scan_results)
-        )
+        if not isinstance(self.scanner.scan_results, dict):
+            return str(self.scanner.scan_results)
+
+        output = []
+        for result in self.scanner.scan_results.get('results', []):
+            output.append(
+                f"Path: {result.get('path_id', 'N/A')}\n"
+                f"Name Hash: {result.get('sim_hash_names', 'N/A')}\n"
+                f"Content Hash: {result.get('sim_hash_content', 'N/A')}\n"
+                f"Children: {len(result.get('children', []))}\n"
+            )
+        return '\n'.join(output)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8ffc25 and 293aed4.

📒 Files selected for processing (5)

src/scanoss/cli.py (46 hunks)
src/scanoss/file_filters.py (6 hunks)
src/scanoss/results.py (6 hunks)
src/scanoss/scanners/scanner_hfh.py (1 hunks)
src/scanoss/utils/abstract_presenter.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

src/scanoss/file_filters.py
src/scanoss/utils/abstract_presenter.py

🔇 Additional comments (8)

src/scanoss/results.py (3)

53-113: LGTM! Well-structured presenter class.

The ResultsPresenter class follows the Single Responsibility Principle by focusing on presentation logic, with clear separation of concerns between JSON and plain text formatting.

86-94: Return type inconsistency in plain output format.

The method's docstring says it returns a plain text string, but when there is no data it returns a status message as if it were formatted output, rather than printing that message to stderr.
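One way to keep the return value consistent is to send the status message to stderr and return an empty string, so callers only ever receive formatted results. A sketch; the data shape (file path mapped to a list of matches), the function name, and the message text are assumptions, not the actual ResultsPresenter code.

```python
import io
import sys
from typing import Any, Dict, Optional, TextIO


def format_plain(data: Dict[str, Any], err: Optional[TextIO] = None) -> str:
    """Hypothetical plain formatter: status goes to stderr, the return value is data only."""
    if not data:
        print('No results to present', file=err if err is not None else sys.stderr)
        return ''
    return '\n'.join(f'{path}: {len(matches)} match(es)' for path, matches in sorted(data.items()))


stderr_capture = io.StringIO()
empty_output = format_plain({}, err=stderr_capture)
full_output = format_plain({'b.c': [1], 'a.c': [1, 2]})
```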

116-264: LGTM! Improved error handling and separation of concerns.

The changes to the Results class:

Improve separation of concerns by delegating presentation to ResultsPresenter.

Enhance error handling by using specific exceptions.

src/scanoss/scanners/scanner_hfh.py (3)

40-50: LGTM! Clean and focused data structure.

The DirectoryNode class is well-designed with clear attributes and a single responsibility.

52-61: LGTM! Clean and focused data structure.

The DirectoryFile class is well-designed with clear attributes and a single responsibility.

63-287: LGTM! Well-structured scanner implementation.

The ScannerHFH class demonstrates:

Clear separation of concerns between scanning, hashing, and presentation.

Efficient use of helper classes for directory tree representation.

Well-documented methods with clear responsibilities.

src/scanoss/cli.py (2)

1465-1503: Enhance error handling in the folder_hashing_scan function.

While the basic error handling is in place, consider adding:

More specific error types for better error reporting

Cleanup of resources in case of failures

Logging of scan progress and errors

1505-1514: LGTM! Well-structured settings loader.

The function demonstrates:

Clear error handling for settings file loading.

Proper configuration of settings object.

Consistent use of debug flags.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

src/scanoss/cli.py (1)

1475-1513: 🛠️ Refactor suggestion

Enhance error handling in the folder_hashing_scan function.

While the basic error handling is in place, consider adding:

More specific error types for better error reporting
Cleanup of resources in case of failures
Logging of scan progress and errors

Apply this diff to improve error handling:

 def folder_hashing_scan(parser, args):
     try:
         if not args.scan_dir:
             print_stderr('ERROR: Please specify a directory to scan')
             parser.parse_args([args.subparser, '-h'])
             sys.exit(1)

         if not os.path.exists(args.scan_dir) or not os.path.isdir(args.scan_dir):
             print_stderr(f'ERROR: The specified directory {args.scan_dir} does not exist')
             sys.exit(1)

+        if not os.access(args.scan_dir, os.R_OK):
+            print_stderr(f'ERROR: No read permission for directory {args.scan_dir}')
+            sys.exit(1)
+
         scanner_config = create_scanner_config_from_args(args)
         scanoss_settings = get_scanoss_settings_from_args(args)
         grpc_config = create_grpc_config_from_args(args)

         client = ScanossGrpc(**asdict(grpc_config))
         scanner = ScannerHFH(
             scan_dir=args.scan_dir,
             config=scanner_config,
             client=client,
             scanoss_settings=scanoss_settings,
         )

         scanner.best_match = args.best_match
         scanner.threshold = args.threshold

+        try:
             scanner.scan()
             scanner.present(output_file=args.output, output_format=args.format)
+        finally:
+            client.close()  # Ensure client is properly closed
+
     except ScanossGrpcError as e:
         print_stderr(f'ERROR: {e}')
         sys.exit(1)
+    except Exception as e:
+        print_stderr(f'ERROR: Unexpected error during scan: {e}')
+        sys.exit(1)

🧹 Nitpick comments (1)

src/scanoss/cli.py (1)

1515-1524: Add type hints and explicit return for None case.

The function could benefit from type hints and an explicit return for the case when skip_settings_file is True.

Apply this diff to improve type safety and readability:

-def get_scanoss_settings_from_args(args):
+def get_scanoss_settings_from_args(args) -> Optional[ScanossSettings]:
     scanoss_settings = None
     if not args.skip_settings_file:
         scanoss_settings = ScanossSettings(debug=args.debug, trace=args.trace, quiet=args.quiet)
         try:
             scanoss_settings.load_json_file(args.settings, args.scan_dir).set_file_type('new').set_scan_type('identify')
         except ScanossSettingsError as e:
             print_stderr(f'Error: {e}')
             sys.exit(1)
         return scanoss_settings
+    return None

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 293aed4 and 793827e.

📒 Files selected for processing (4)

requirements.txt (1 hunks)
setup.cfg (1 hunks)
src/scanoss/cli.py (46 hunks)
src/scanoss/scanners/scanner_hfh.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

requirements.txt
src/scanoss/scanners/scanner_hfh.py

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: build

🔇 Additional comments (1)

src/scanoss/cli.py (1)

478-515: LGTM! Well-structured argument setup for the folder-scan subcommand.

The argument setup is comprehensive and includes:

Required directory argument

Optional output file and format

Configurable timeout

Best match mode flag

Threshold validation with a valid range of 1-100

setup.cfg

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (6)

src/scanoss/scanners/folder_hasher.py (3)

100-120: Consider limiting memory usage when reading large files.

The call to full_file_path.read_bytes() at line 150 loads the entire file into memory. For very large files, this could cause memory pressure or even out-of-memory errors.

You might consider reading the file in chunks or hashing it incrementally to reduce memory usage:
- file_bytes = full_file_path.read_bytes()
- key = CRC64.get_hash_buff(file_bytes)
+ with open(full_file_path, 'rb') as f:
+     hasher = CRC64()
+     while True:
+         chunk = f.read(8192)
+         if not chunk:
+             break
+         hasher.update(chunk)
+ key = hasher.digest()
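For reference, here is a minimal sketch of the incremental pattern. It uses a self-contained, table-driven CRC-64/ECMA-182 so it runs without extra packages. Assumptions: the CRC variant, the 8192-byte chunk size, and the names `crc64_update` and `crc64_file` are illustrative only. The project's `CRC64` class may use a different variant, and whether it actually offers the `update()`/`digest()` methods used in the diff above has not been verified here.

```python
from pathlib import Path

# CRC-64/ECMA-182 parameters (an assumption; the project's CRC64 variant may differ).
_POLY = 0x42F0E1EBA9EA3693
_MASK = 0xFFFFFFFFFFFFFFFF


def _make_table() -> list:
    """Precompute the 256-entry lookup table for a non-reflected CRC-64."""
    table = []
    for byte in range(256):
        crc = byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ _POLY) & _MASK
            else:
                crc = (crc << 1) & _MASK
        table.append(crc)
    return table


_TABLE = _make_table()


def crc64_update(crc: int, data: bytes) -> int:
    """Fold another chunk of bytes into a running CRC-64 value."""
    for b in data:
        crc = ((crc << 8) & _MASK) ^ _TABLE[((crc >> 56) ^ b) & 0xFF]
    return crc


def crc64_file(path: Path, chunk_size: int = 8192) -> int:
    """Hash a file chunk by chunk so memory use stays bounded by chunk_size."""
    crc = 0  # ECMA-182 starts from 0 and applies no final XOR
    with open(path, 'rb') as f:
        while chunk := f.read(chunk_size):
            crc = crc64_update(crc, chunk)
    return crc
```

Because a CRC is a running state, feeding the file in chunks yields the same value as hashing the whole buffer at once. Only the peak memory changes.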
137-174: Possible performance impact from sorting and repeated directory traversal.

You're sorting all filtered files at line 141 and then collecting them into a tree data structure in a nested loop. For directories with many files, this may become expensive.

Consider aggregating child nodes in one pass or using an iterative approach to reduce repeated traversals and dictionary lookups at lines 158–163.
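One way to act on this is to group files by their immediate parent directory in a single pass. You can then create or look up each directory node once per directory, instead of once per file. The sketch below assumes the hasher works from relative POSIX-style paths. The helper name `group_by_parent` is hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def group_by_parent(rel_paths: Iterable[str]) -> Dict[str, List[str]]:
    """Bucket relative file paths by their immediate parent directory in one pass."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rel in rel_paths:
        # Files at the top level land under '.', matching PurePosixPath semantics.
        groups[str(PurePosixPath(rel).parent)].append(rel)
    for files in groups.values():
        files.sort()  # sort each small bucket for deterministic output
    return dict(groups)
```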

212-226: Optimize repeated file-key processing.

The code creates a new bytes object at line 224 by calling bytes(file.key) in each iteration. If many files share the same interned key reference, this repeated call could be avoided.

You might store the byte key once when constructing DirectoryFile and reuse it directly if needed.

src/scanoss/scanners/scanner_hfh.py (1)

41-47: Consider exposing threshold & best_match in constructor.

Threshold (line 92) and best_match (line 91) are set on the instance after initialization. If these are critical config values, you could accept them in __init__ and set them once, reducing the risk of forgetting to configure them.
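Here is a sketch of the constructor-based alternative. `ScannerHFHSketch` is a hypothetical stand-in, not the real `ScannerHFH`. The 1-100 range check mirrors the CLI validation described elsewhere in this review.

```python
from typing import Any, Optional


class ScannerHFHSketch:
    """Hypothetical stand-in for ScannerHFH that takes its tuning knobs up front."""

    def __init__(
        self,
        scan_dir: str,
        client: Any,
        *,
        best_match: bool = False,
        threshold: Optional[int] = None,
    ) -> None:
        # Validate once, at construction, instead of trusting later attribute writes.
        if threshold is not None and not 1 <= threshold <= 100:
            raise ValueError(f'threshold must be between 1 and 100, got {threshold}')
        self.scan_dir = scan_dir
        self.client = client
        self.best_match = best_match
        self.threshold = threshold
```

The keyword-only marker (`*`) makes call sites state which values they set, for example `best_match=args.best_match, threshold=args.threshold`.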

src/scanoss/cli.py (2)

1501-1539: Enhance error handling and resource cleanup.

Inside folder_hashing_scan, you create a ScanossGrpc client at line 1522 but never explicitly close it. If that client object needs explicit cleanup, consider wrapping the scan in a try/finally code block that closes the client.
+    try:
         scanner = ScannerHFH(...)
         scanner.scan()
         scanner.present(...)
+    finally:
+        client.close()  # or equivalent resource cleanup
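If the client does expose a `close()` method (an assumption; the comment itself hedges with "or equivalent"), `contextlib.closing` expresses the same try/finally more compactly. `FakeClient` and `run_scan` are hypothetical names used only to make the sketch runnable.

```python
import contextlib


class FakeClient:
    """Hypothetical client exposing close(); the real ScanossGrpc API may differ."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def run_scan(client: FakeClient, fail: bool = False) -> None:
    # closing() calls client.close() on exit, whether or not the body raises.
    with contextlib.closing(client):
        if fail:
            raise RuntimeError('scan failed')
```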
1541-1572: Clarify exception handling for folder hashing.

At lines 1569–1570, you catch Exception and print an error, then exit. If you expect only certain I/O errors, consider narrowing the except block to handle more specific exceptions (e.g., OSError) for clearer debugging.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 0c1399b and 2dc9e4a.

📒 Files selected for processing (3)

src/scanoss/cli.py (46 hunks)
src/scanoss/scanners/folder_hasher.py (1 hunks)
src/scanoss/scanners/scanner_hfh.py (1 hunks)

🔇 Additional comments (5)

src/scanoss/scanners/folder_hasher.py (1)

263-296: Docstring clarity: _format_plain_output actually returns JSON.

This is consistent with prior patterns, but the docstring in _format_plain_output claims it is "plain text," while JSON is returned if self.folder_hasher.tree is a dict.

If returning JSON is indeed intended, consider updating the docstring to better reflect the actual output type. Otherwise, convert the data to truly plain text if that is the goal.

src/scanoss/scanners/scanner_hfh.py (2)

49-93: Validate client before gRPC calls.

At line 119, you call self.client.folder_hash_scan(...). If client is None or not configured properly, the call might fail unexpectedly.

Consider adding a quick check (e.g., if not self.client:) to provide a more actionable error message or fallback behavior.

139-161: Ensure docstring accuracy for _format_plain_output.

The _format_plain_output docstring states it returns a "plain text string," but it actually dumps JSON if self.scanner.scan_results is a dict (lines 157–160).

Update the docstring to state that JSON is returned when scan_results is a dict, and its plain string form otherwise.

src/scanoss/cli.py (2)

482-541: Review usage instructions for new sub-commands.

Lines 482–541 add two new subcommands, folder-scan and folder-hash. Ensure the help and usage docs stay consistent across the CLI documentation (e.g., CLIENT_HELP.md) so that end users fully understand these commands.

Consider reviewing doc references, readme files, or help outputs in other modules to confirm consistency.

507-517: Threshold range might exclude 0.

You’ve restricted --threshold to 1–100. If there's a scenario where a threshold of 0 is valid (meaning no content similarity required), consider allowing 0 in that range. If 1–100 is truly intended, this is good.
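Here is a sketch of a range-checked argparse `type=` callable. It assumes the option is parsed as an integer. `threshold_type` and the parser shown are illustrative, not the CLI's actual code. Making the lower bound a parameter keeps the 0-versus-1 decision in one place.

```python
import argparse


def threshold_type(value: str, lower: int = 1, upper: int = 100) -> int:
    """argparse type= callable that accepts integers in [lower, upper]."""
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f'{value!r} is not an integer') from None
    if not lower <= number <= upper:
        raise argparse.ArgumentTypeError(f'threshold must be between {lower} and {upper}, got {number}')
    return number


parser = argparse.ArgumentParser(prog='folder-scan-sketch')
# Drop lower=0 (falling back to the default lower=1) if 1-100 is the intended range.
parser.add_argument('--threshold', type=lambda v: threshold_type(v, lower=0))
```

argparse reports an `ArgumentTypeError` message verbatim, so users see the range in the error output.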

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

src/scanoss/cli.py (1)

1501-1539: 🛠️ Refactor suggestion

Enhance error handling and resource cleanup.

Consider adding more robust error handling and ensuring proper resource cleanup.

 def folder_hashing_scan(parser, args):
     """Run the "folder-scan" sub-command

     Args:
         parser (ArgumentParser): command line parser object
         args (Namespace): Parsed arguments
     """
     try:
         if not args.scan_dir:
             print_stderr('ERROR: Please specify a directory to scan')
             parser.parse_args([args.subparser, '-h'])
             sys.exit(1)

         if not os.path.exists(args.scan_dir) or not os.path.isdir(args.scan_dir):
             print_stderr(f'ERROR: The specified directory {args.scan_dir} does not exist')
             sys.exit(1)

+        if not os.access(args.scan_dir, os.R_OK):
+            print_stderr(f'ERROR: No read permission for directory {args.scan_dir}')
+            sys.exit(1)
+
         scanner_config = create_scanner_config_from_args(args)
         scanoss_settings = get_scanoss_settings_from_args(args)
         grpc_config = create_grpc_config_from_args(args)

         client = ScanossGrpc(**asdict(grpc_config))
+        try:
             scanner = ScannerHFH(
                 scan_dir=args.scan_dir,
                 config=scanner_config,
                 client=client,
                 scanoss_settings=scanoss_settings,
             )

             scanner.best_match = args.best_match
             scanner.threshold = args.threshold

             scanner.scan()
             scanner.present(output_file=args.output, output_format=args.format)
+        finally:
+            client.close()  # Ensure client is properly closed
+
     except ScanossGrpcError as e:
         print_stderr(f'ERROR: {e}')
         sys.exit(1)
+    except Exception as e:
+        print_stderr(f'ERROR: Unexpected error during scan: {e}')
+        sys.exit(1)

🧹 Nitpick comments (12)

src/scanoss/scanners/folder_hasher.py (6)

17-27: Add type hints to improve code maintainability.

Consider adding type hints to improve code maintainability and IDE support.

 class DirectoryNode:
     """
     Represents a node in the directory tree for folder hashing.
     """

     def __init__(self, path: str):
-        self.path = path
-        self.is_dir = True
-        self.children: Dict[str, DirectoryNode] = {}
-        self.files: List[DirectoryFile] = []
+        self.path: str = path
+        self.is_dir: bool = True
+        self.children: Dict[str, "DirectoryNode"] = {}
+        self.files: List["DirectoryFile"] = []

29-38: Add type hints and property validation.

Consider adding type hints and property validation to improve code maintainability and data integrity.

 class DirectoryFile:
     """
     Represents a file in the directory tree for folder hashing.
     """

     def __init__(self, path: str, key: bytes, key_str: str):
+        if not isinstance(key, bytes):
+            raise TypeError("key must be bytes")
+        if not isinstance(key_str, str):
+            raise TypeError("key_str must be str")
+
-        self.path = path
-        self.key = key
-        self.key_str = key_str
+        self.path: str = path
+        self.key: bytes = key
+        self.key_str: str = key_str

51-60: Add argument validation.

Consider adding argument validation to ensure the provided arguments are valid.

 def create_folder_hasher_config_from_args(args) -> FolderHasherConfig:
+    if not hasattr(args, 'debug'):
+        raise ValueError("args must have 'debug' attribute")
+    if not hasattr(args, 'trace'):
+        raise ValueError("args must have 'trace' attribute")
+    if not hasattr(args, 'quiet'):
+        raise ValueError("args must have 'quiet' attribute")
+
     return FolderHasherConfig(
         debug=args.debug,
         trace=args.trace,
         quiet=args.quiet,
         output_file=getattr(args, 'output', None),
         output_format=getattr(args, 'format', 'json'),
         settings_file=getattr(args, 'settings', None),
         skip_settings_file=getattr(args, 'skip_settings_file', False),
     )

242-261: Optimize head calculation.

The _head_calc method can be optimized using list comprehension and bitwise operations.

 def _head_calc(self, sim_hash: int) -> int:
     """
     Compute the head value from a simhash integer.

     The function extracts each byte from the simhash, multiplies it by 2,
     sums these values, then shifts the result right by 4 bits and returns the lowest 8 bits.

     Args:
         sim_hash (int): The input simhash value.

     Returns:
         int: The computed head value as an 8-bit integer.
     """
-    total = 0
-    for i in range(8):
-        # Extract each byte and multiply by 2
-        b = (sim_hash >> (i * 8)) & 0xFF
-        total += b * 2
-    # Shift right by 4 bits and extract the lowest 8 bits
-    return (total >> 4) & 0xFF
+    # Extract bytes, multiply by 2, sum, shift right by 4 bits, and get lowest 8 bits
+    return (sum(((sim_hash >> (i * 8)) & 0xFF) * 2 for i in range(8)) >> 4) & 0xFF
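The following is a quick equivalence check between the loop and the one-expression form. Both function names are hypothetical copies of `_head_calc` lifted out of their class.

```python
def head_calc_loop(sim_hash: int) -> int:
    # Original form: sum each of the low 8 bytes times 2, then shift and mask.
    total = 0
    for i in range(8):
        total += ((sim_hash >> (i * 8)) & 0xFF) * 2
    return (total >> 4) & 0xFF


def head_calc_expr(sim_hash: int) -> int:
    # Suggested one-expression form.
    return (sum(((sim_hash >> (i * 8)) & 0xFF) * 2 for i in range(8)) >> 4) & 0xFF


# Spot-check both forms over a spread of 64-bit inputs.
samples = [0, 1, 0xFF, 0xFFFFFFFFFFFFFFFF, 0x0123456789ABCDEF]
samples += [(i * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF for i in range(1000)]
assert all(head_calc_loop(s) == head_calc_expr(s) for s in samples)
```

The sum is at most 8 × 255 × 2 = 4080, and 4080 >> 4 = 255. The final `& 0xFF` therefore never discards bits, because only the low 8 bytes are ever read.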

100-119: Add progress tracking for better user experience.

Consider adding progress tracking to provide better feedback during directory hashing.

 def hash_directory(self, path: str) -> dict:
     """
     Generate the folder hashing request structure from a directory path.

     This method builds a directory tree (DirectoryNode) and computes the associated
     hash data for the folder.

     Args:
         path (str): The root directory path.

     Returns:
         dict: The folder hash request structure.
     """

+    self.print_debug(f'Building directory tree for {path}...')
     root_node = self._build_root_node(path)
+    self.print_debug('Computing hash data...')
     tree = self._hash_calc_from_node(root_node)

     self.tree = tree

     return tree

278-295: Add error handling for output formatting.

Consider adding error handling to gracefully handle JSON serialization errors.

 def _format_json_output(self) -> str:
     """
     Format the scan output data into a JSON object

     Returns:
         str: The formatted JSON string
     """
-    return json.dumps(self.folder_hasher.tree, indent=2)
+    try:
+        return json.dumps(self.folder_hasher.tree, indent=2)
+    except (TypeError, ValueError) as e:
+        self.print_debug(f'Error formatting JSON output: {e}')
+        return json.dumps({'error': str(e)}, indent=2)

 def _format_plain_output(self) -> str:
     """
     Format the scan output data into a plain text string
     """
-    return (
-        json.dumps(self.folder_hasher.tree, indent=2)
-        if isinstance(self.folder_hasher.tree, dict)
-        else str(self.folder_hasher.tree)
-    )
+    try:
+        if isinstance(self.folder_hasher.tree, dict):
+            return json.dumps(self.folder_hasher.tree, indent=2)
+        return str(self.folder_hasher.tree)
+    except Exception as e:
+        self.print_debug(f'Error formatting plain output: {e}')
+        return str(e)

src/scanoss/file_filters.py (3)

207-212: Document the purpose of new skipped extensions.

Consider adding comments to explain why these extensions are skipped.

+    # Skip web help markup language files
     '.whml',
+    # Skip Maven project object model files
     '.pom',
+    # Skip simplified markup template language files
     '.smtml',
+    # Skip minified JavaScript files
     '.min.js',
+    # Skip manifest files
     '.mf',
+    # Skip base64 encoded files
     '.base64',

242-242: Document the kwargs parameter in the class docstring.

Add kwargs documentation to the class docstring for better API understanding.

 class FileFilters(ScanossBase):
     """
     Filter for determining which files to process during scanning, fingerprinting, etc.
     Handles both inclusion and exclusion rules based on file paths, extensions, and sizes.
+
+    Args:
+        debug (bool): Enable debug output
+        trace (bool): Enable trace output
+        quiet (bool): Suppress output
+        **kwargs: Additional configuration options
+            scanoss_settings (ScanossSettings): Custom settings to override defaults
+            all_extensions (bool): Include all file extensions
+            all_folders (bool): Include all folders
+            hidden_files_folders (bool): Include hidden files and folders
+            operation_type (str): Operation type ('scanning' or 'fingerprinting')
+            skip_size (int): Size to skip
+            skip_extensions (list): Extensions to skip
+            skip_folders (list): Folders to skip
     """

518-552: Improve error handling and file reading.

Consider using more specific error handling and optimizing file reading.

 def _should_skip_file_for_hfh(self, file_path: Path) -> bool:
     """
     Check if a file should be skipped during folder hashing scan.

     Args:
         file_path (Path): The path to the file to check.

     Returns:
         bool: True if the file should be skipped, False otherwise.
     """
     try:
         if (
             any(part.startswith('.') for part in file_path.parts)  # Hidden files/folders
             or file_path.is_symlink()  # Symlinks
             or file_path.stat().st_size == 0  # Empty files
         ):
             self.print_debug(f'Skipping file: {file_path} (hidden/symlink/empty)')
             return True

         # Files ending with null
         if file_path.suffix.lower() == '.txt':
             try:
-                with open(file_path, 'rb') as f:
-                    if f.read().endswith(b'\x00'):
+                # Read only the last byte to check for null
+                with open(file_path, 'rb') as f:
+                    f.seek(-1, 2)  # Seek to last byte
+                    if f.read(1) == b'\x00':
                         self.print_debug(f'Skipping file: {file_path} (text file ending with null)')
                         return True
-            except (OSError, IOError):
+            except (OSError, IOError) as e:
+                self.print_debug(f'Error reading file {file_path}: {e}')
                 self.print_debug(f'Skipping file: {file_path} (cannot read file content)')
                 return True

         return False

-    except Exception as e:
+    except (OSError, ValueError) as e:
         self.print_debug(f'Error checking file {file_path}: {str(e)}')
         return True
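The suggested last-byte read can be factored into a small helper. This sketch, with the hypothetical name `ends_with_null`, also guards the empty-file case explicitly, because seeking to -1 from the end of an empty file raises OSError. In the method above, empty files are already skipped earlier. The guard only matters if the helper is reused elsewhere.

```python
import os
from pathlib import Path


def ends_with_null(path: Path) -> bool:
    """Return True if the file's final byte is NUL, reading only that byte."""
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # seek(-1, SEEK_END) would raise OSError on an empty file
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b'\x00'
```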

src/scanoss/cli.py (1)

1574-1583: Add input validation for settings file.

Consider adding validation for the settings file path.

 def get_scanoss_settings_from_args(args):
     scanoss_settings = None
     if not args.skip_settings_file:
+        # Only validate the path when the settings file will actually be loaded
+        if args.settings and not os.path.exists(args.settings):
+            print_stderr(f'ERROR: Settings file {args.settings} does not exist')
+            sys.exit(1)
+
         scanoss_settings = ScanossSettings(debug=args.debug, trace=args.trace, quiet=args.quiet)
         try:
             scanoss_settings.load_json_file(args.settings, args.scan_dir).set_file_type('new').set_scan_type('identify')
         except ScanossSettingsError as e:
             print_stderr(f'Error: {e}')
             sys.exit(1)
         return scanoss_settings

docs/source/index.rst (2)

231-273: Add example usage and output format description.

Consider adding example usage and output format description to improve documentation.

 --------------------------------
 Folder Scanning: folder-scan, fs
 --------------------------------

 Performs a comprehensive scan of a directory using folder hashing to identify components and their matches.

 .. code-block:: bash

    scanoss-py folder-scan <directory>

+Example usage:
+
+.. code-block:: bash
+
+   # Basic usage
+   scanoss-py folder-scan /path/to/directory
+
+   # With best match and custom threshold
+   scanoss-py folder-scan /path/to/directory -bm --threshold 80
+
+   # Save results to a file
+   scanoss-py folder-scan /path/to/directory -o results.json
+
+Output format:
+
+.. code-block:: json
+
+   {
+     "matches": [
+       {
+         "file": "example.js",
+         "component": "example-lib",
+         "version": "1.0.0",
+         "license": "MIT",
+         "score": 100
+       }
+     ]
+   }
+
 .. list-table:: 
    :widths: 20 30
    :header-rows: 1

274-303: Add example usage and output format description.

Consider adding example usage and output format description to improve documentation.

 --------------------------------
 Folder Hashing: folder-hash, fh
 --------------------------------

 Generates cryptographic hashes for files in a given directory and its subdirectories.

 .. code-block:: bash

    scanoss-py folder-hash <directory>

+Example usage:
+
+.. code-block:: bash
+
+   # Basic usage
+   scanoss-py folder-hash /path/to/directory
+
+   # Save results to a file
+   scanoss-py folder-hash /path/to/directory -o hashes.json
+
+Output format:
+
+.. code-block:: json
+
+   {
+     "path_id": "/path/to/directory",
+     "sim_hash_names": "a1b2c3d4",
+     "sim_hash_content": "e5f6a7b8",
+     "children": [
+       {
+         "path_id": "/path/to/directory/subdirectory",
+         "sim_hash_names": "c9d0e1f2",
+         "sim_hash_content": "a3b4c5d6",
+         "children": []
+       }
+     ]
+   }
+
 .. list-table:: 
    :widths: 20 30
    :header-rows: 1

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 2dc9e4a and b558ed2.

📒 Files selected for processing (9)

CHANGELOG.md (2 hunks)
docs/source/index.rst (1 hunks)
setup.cfg (1 hunks)
src/scanoss/cli.py (46 hunks)
src/scanoss/file_filters.py (6 hunks)
src/scanoss/results.py (6 hunks)
src/scanoss/scanners/folder_hasher.py (1 hunks)
src/scanoss/scanners/scanner_hfh.py (1 hunks)
src/scanoss/utils/abstract_presenter.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (4)

setup.cfg
src/scanoss/results.py
src/scanoss/utils/abstract_presenter.py
src/scanoss/scanners/scanner_hfh.py

🧰 Additional context used

🪛 LanguageTool

CHANGELOG.md

[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-02-10 ### Added - Add folder-scan subcommand - Add folder-has...

(REPEATED_VERBS)

🔇 Additional comments (3)

src/scanoss/scanners/folder_hasher.py (1)

40-49: LGTM!

The dataclass is well-designed with proper type hints, default values, and appropriate use of Optional and Literal types.

CHANGELOG.md (2)

12-17: LGTM! Clear and well-structured changelog entry.

The new version section follows the established format and clearly documents the new features added in version 1.21.0, which align well with the PR objectives (ES-163 Add folder hashing support).

466-468: LGTM! Version comparison links are properly updated.

The version comparison links are correctly added and follow the established pattern, maintaining the chronological order of releases.

src/scanoss/cli.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (10)

src/scanoss/file_filters.py (4)

61-66: Include .github in skipped directories

Good addition of common directories to skip during scans. These are typical directories containing documentation, examples, and tests that are not necessary for the scanning process. Consider also including .github which is another common directory pattern that typically contains workflow files and issue templates.

311-316: Consider consistent handling of symlinks

The folder scan only warns about symbolic link folders and does not skip them, whereas _should_skip_file_for_hfh skips symlinks outright. Consider handling symlinks consistently across all methods.
-            if dir_path.is_symlink():  # TODO should we skip symlink folders?
-                self.print_msg(f'WARNING: Found symbolic link folder: {dir_path}')
+            if dir_path.is_symlink():
+                self.print_msg(f'Skipping symbolic link folder: {dir_path}')
+                dirnames.clear()
+                continue
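With `os.walk`, directory symlinks are not followed by default (`followlinks=False`), but they still appear in `dirnames`. Pruning that list in place makes the skip explicit and keeps later per-directory logic from ever seeing them. This is a minimal sketch with the hypothetical name `walk_skipping_symlinked_dirs`. The real scanner's walk loop and logging are not reproduced here.

```python
import os
from typing import List


def walk_skipping_symlinked_dirs(root: str) -> List[str]:
    """Collect file paths under root, never descending into symlinked directories."""
    files: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] mutates the list os.walk uses to decide what to visit.
        dirnames[:] = [d for d in dirnames if not os.path.islink(os.path.join(dirpath, d))]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # skip file symlinks too, for consistency
                files.append(path)
    return sorted(files)
```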
529-564: Consider more specific error handling in _should_skip_file_for_hfh

The new method for folder hashing filtering is well-structured but uses a very broad exception handler. Consider catching more specific exceptions to avoid masking unexpected issues.

Additionally, the special handling for .txt files ending with null bytes is interesting. Consider adding a comment explaining why this check is necessary for folder hashing (is it related to potential binary files with incorrect extensions?).
-        except Exception as e:
+        except (OSError, ValueError) as e:  # PermissionError already subclasses OSError
549-558: Add more specific debug output when a file cannot be read

When checking for null bytes in text files, the code already uses a with statement for proper file handling, which is good. However, there are several conditions that might cause exceptions when opening the file. Consider adding debug output in the except block to provide more specific information about why the file couldn't be read.
            except (OSError, IOError):
-                self.print_debug(f'Skipping file: {file_path} (cannot read file content)')
+                self.print_debug(f'Skipping file: {file_path} (cannot read file content - file may be locked or inaccessible)')

src/scanoss/scanners/folder_hasher.py (6)

17-28: Consider storing only direct files per node.
Currently, the code structure suggests that files may get stored in multiple directory nodes along the path.

If your use case requires a more traditional tree, consider appending each file to only the deepest node representing its immediate directory.

40-49: Consider broadening output format support in the future.
The Literal['json'] default is fine now, but you might eventually want to accommodate other formats (e.g., YAML, CSV).

100-120: Gracefully handle empty sets of files.
If no files remain after filtering (e.g., an empty directory), consider returning an explicit indicator (such as an empty structure) so the user knows no files were processed.

121-175: Potential double inclusion of files in the tree.
Because each ancestor directory node also appends the file, the same file can appear at multiple levels. Double-check that this matches your intended design.

176-197: Children array is purely a list.
Returning a list of children can make it harder to distinguish unique directory names. If needed, consider preserving them in a dictionary keyed by directory name/path.

264-297: Plain output is effectively JSON.
_format_plain_output returns a JSON dump if the tree is a dictionary. If a simpler plain-text format is desired, consider an alternative method.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between b558ed2 and 6a8507d.

📒 Files selected for processing (2)

src/scanoss/file_filters.py (7 hunks)
src/scanoss/scanners/folder_hasher.py (1 hunks)

🔇 Additional comments (10)

src/scanoss/file_filters.py (4)

215-224: LGTM: Comprehensive extension exclusion list

The added file extensions represent a good set of file types to exclude. Many of these are either minified files (.min.js), build artifacts (.mf), or diff/patch files that shouldn't be included in the scanning process.

253-253: Improved constructor flexibility with kwargs pattern

The refactoring to use **kwargs with kwargs.get() pattern is a good improvement for flexibility and future-proofing the API. This allows for backward compatibility while supporting additional parameters in the future without breaking changes.

Also applies to: 261-282

452-452: API change: Method visibility change from private to public

Changing _should_skip_dir to should_skip_dir (removing the underscore) makes this method part of the public API. Ensure this is an intentional change as it may impact users who have extended or used this class, and update any relevant documentation accordingly.

539-564: Add unit test for _should_skip_file_for_hfh

The new _should_skip_file_for_hfh method contains complex logic, especially the handling of text files with null bytes. Consider adding unit tests to verify this functionality works as expected across different file types and edge cases.

src/scanoss/scanners/folder_hasher.py (6)

1-8: Check dependency on progress.bar.
The import for progress.bar is not part of the standard library.

Please ensure that it's properly included and pinned in your environment's requirements or dependencies to avoid runtime errors.

29-39: DirectoryFile encapsulation looks clean.
It straightforwardly holds file path and CRC64 hash info.

51-61: Config creation from args is well-structured.
It aligns neatly with your argument parsing approach, keeping concerns separated.

63-99: Validate existence of the scan directory upfront.
Before hashing, confirm that scan_dir points to a valid directory to provide clearer error messages if something is amiss.

198-242: Duplicate file hashes at ancestor nodes.
_hash_calc relies on a per-node set of processed_hashes, so the same file might be rehashed at higher nodes. Verify this won't create inflated or repeated results.

243-263: Double-check bit-shift logic in _head_calc.
You shift right by 4 bits and then constrain the result to 8 bits. Confirm that the truncated data still meets your desired collision strategy.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

src/scanoss/scanners/folder_hasher.py (2)

25-31: Consider adding __str__ or __repr__ methods for easier debugging.
This class may benefit from a textual representation to aid in logging, troubleshooting, and testing.
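A minimal sketch of a compact `__repr__`, assuming the class is the directory-tree node. For illustration, files are simplified to plain strings. The repr summarises counts rather than dumping the whole subtree, which keeps log lines and assertion messages short.

```python
from typing import Dict, List


class DirectoryNode:
    """Simplified copy of the tree node, with a compact repr for logs and test failures."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.children: Dict[str, 'DirectoryNode'] = {}
        self.files: List[str] = []

    def __repr__(self) -> str:
        # Summarise counts rather than printing every descendant.
        return (
            f'{type(self).__name__}(path={self.path!r}, '
            f'children={len(self.children)}, files={len(self.files)})'
        )
```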

295-311: Unify or differentiate JSON and plain text formats more clearly.
The _format_plain_output method currently returns JSON if the data is a dictionary, which may be surprising. Consider offering a genuinely plain text format for better clarity.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 6a8507d and 99c1a62.

📒 Files selected for processing (1)

src/scanoss/scanners/folder_hasher.py (1 hunks)

🔇 Additional comments (2)

src/scanoss/scanners/folder_hasher.py (2)

231-244: Verify whether skipping hash calculations for smaller directories is intended.
By returning None when fewer than 8 files are present or when filenames total fewer than 32 characters, entire subdirectories might be ignored in the final hash. Confirm that this aligns with your use case.
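Expressing the rule as a pure predicate makes it easy to test in isolation. This sketch assumes that "filenames total" means the summed length of the file names in the directory. The real code may count unique names or full paths differently. `has_enough_signal` and the constant names are hypothetical; only the values 8 and 32 come from the comment above.

```python
from typing import Sequence

# Thresholds quoted in the review comment; how characters are counted is an assumption.
MIN_FILES = 8
MIN_NAME_CHARS = 32


def has_enough_signal(file_names: Sequence[str]) -> bool:
    """Return False for directories with too few files or too little file-name text."""
    if len(file_names) < MIN_FILES:
        return False
    return sum(len(name) for name in file_names) >= MIN_NAME_CHARS
```

A few table-driven unit tests around a predicate like this would answer the question directly: which directory shapes are dropped, and which are kept.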

251-252: Rewriting the most significant byte of the name simhash is a neat technique.
Ensure that this does not introduce collisions or side effects when analyzing downstream consumers of the simhash value.
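Here is a sketch of what rewriting the most significant byte looks like. It assumes a 64-bit simhash and an 8-bit head value, such as the output of _head_calc. `replace_top_byte` is a hypothetical name. The sketch makes the trade-off concrete: only the low 56 bits of the name simhash survive. Two simhashes that differ only in their top byte, and receive the same head value, become identical.

```python
MASK_64 = (1 << 64) - 1


def replace_top_byte(sim_hash: int, head: int) -> int:
    """Overwrite bits 56-63 of a 64-bit simhash with an 8-bit head value."""
    if not 0 <= head <= 0xFF:
        raise ValueError('head must fit in one byte')
    return ((sim_hash & 0x00FFFFFFFFFFFFFF) | (head << 56)) & MASK_64
```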

coderabbitai · 2025-03-05T09:14:09Z

src/scanoss/scanners/folder_hasher.py

+                file_bytes = full_file_path.read_bytes()
+                key = CRC64.get_hash_buff(file_bytes)


🛠️ Refactor suggestion

Use a streaming or chunk-based approach for hashing large files.
Reading the entire file into memory might significantly impact performance, especially with very large files.

coderabbitai · 2025-03-05T09:14:09Z

src/scanoss/scanners/folder_hasher.py

+                for part in Path(rel_path).parent.parts:
+                    child_path = str(Path(current_node.path) / part)
+                    if child_path not in current_node.children:
+                        current_node.children[child_path] = DirectoryNode(child_path)
+                    current_node = current_node.children[child_path]
+                    current_node.files.append(file_item)
+
+                root_node.files.append(file_item)


⚠️ Potential issue

Avoid storing file references in both parent and child nodes to prevent duplication.
Appending the same DirectoryFile object to each parent directory may lead to bloated data structures, confusing file structures, and redundant file entries. Consider only maintaining the file reference in the leaf node corresponding to its directory.
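Here is a sketch of the leaf-only layout. It assumes the hasher needs a node's full set of descendant files only when it computes that node's hash. Each file is stored once, in its immediate directory, and ancestors collect their subtree on demand. `Node`, `build_tree`, and `iter_files` are illustrative names, not the project's API. Whether an ancestor's hash should cover descendant files remains a design decision of the folder-hashing algorithm; this sketch only changes where the references live.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterator, List


class Node:
    """Hypothetical tree node that stores each file only in its immediate directory."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.children: Dict[str, 'Node'] = {}
        self.files: List[str] = []

    def iter_files(self) -> Iterator[str]:
        """Yield this directory's files plus every descendant's, computed on demand."""
        yield from self.files
        for child in self.children.values():
            yield from child.iter_files()


def build_tree(rel_paths: List[str]) -> Node:
    root = Node('')
    for rel in rel_paths:
        node = root
        for part in PurePosixPath(rel).parent.parts:
            child_path = f'{node.path}/{part}' if node.path else part
            if child_path not in node.children:
                node.children[child_path] = Node(child_path)
            node = node.children[child_path]
        node.files.append(rel)  # appended once, at the deepest directory only
    return root
```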

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

src/scanoss/file_filters.py (2)
529-563: Narrow the exception handling in _should_skip_file_for_hfh

The broad try-except block that catches all exceptions could mask unexpected errors, making debugging difficult. Consider catching specific exceptions such as OSError (which also covers IOError, an alias of OSError in Python 3) rather than the generic Exception class.

Additionally, the null-byte check only applies to .txt files. Should this be extended to other text file types?
- try:
-     if (
-         any(part.startswith('.') for part in file_path.parts)  # Hidden files/folders
-         or file_path.is_symlink()  # Symlinks
-         or file_path.stat().st_size == 0  # Empty files
-     ):
-         self.print_debug(f'Skipping file: {file_path} (hidden/symlink/empty)')
-         return True
+ # Check for hidden files/folders
+ if any(part.startswith('.') for part in file_path.parts):
+     self.print_debug(f'Skipping file: {file_path} (hidden file/folder)')
+     return True
+ 
+ try:
+     # Check for symlinks and empty files
+     if file_path.is_symlink():
+         self.print_debug(f'Skipping file: {file_path} (symlink)')
+         return True
+     
+     if file_path.stat().st_size == 0:
+         self.print_debug(f'Skipping file: {file_path} (empty file)')
+         return True
And for the null-byte check:
- # Files ending with null
- if file_path.suffix.lower() == '.txt':
+ # Files ending with null bytes - check common text file types
+ if file_path.suffix.lower() in ['.txt', '.md', '.csv', '.log', '.json', '.xml', '.yaml', '.yml']:

549-557: Consider enhancing the text file null-byte check

The check for null bytes only targets .txt files. Consider expanding this to other common text file formats that might have similar issues, such as .md, .csv, .log, etc.
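
A sketch of the narrowed version, written as a standalone function rather than the FileFilters method. The suffix set, the root parameter, and reading "ending with null" as "the last byte is 0x00" are assumptions for illustration.

```python
import os
from pathlib import Path

# Assumed set of text-like suffixes; the PR currently applies the check to '.txt' only
TEXT_SUFFIXES = {'.txt', '.md', '.csv', '.log', '.json', '.xml', '.yaml', '.yml'}


def should_skip_file_for_hfh(file_path: Path, root: Path) -> bool:
    """Return True if file_path (which must be inside root) should be skipped."""
    # Hidden files/folders are detected from the relative path alone, without any I/O
    if any(part.startswith('.') for part in file_path.relative_to(root).parts):
        return True
    try:
        if file_path.is_symlink() or file_path.stat().st_size == 0:
            return True
        if file_path.suffix.lower() in TEXT_SUFFIXES:
            with file_path.open('rb') as f:
                f.seek(-1, os.SEEK_END)  # last byte; safe because empty files returned above
                if f.read(1) == b'\x00':
                    return True
    except OSError:
        # Narrow catch: PermissionError, FileNotFoundError, etc. are OSError subclasses
        # (IOError is an alias of OSError in Python 3)
        return True
    return False
```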

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99c1a62 and 19b9c1e.

📒 Files selected for processing (1)

src/scanoss/file_filters.py (7 hunks)

🔇 Additional comments (5)

src/scanoss/file_filters.py (5)

215-224: LGTM: File extension filtering expanded appropriately

The additional file extensions added to DEFAULT_SKIPPED_EXT appropriately handle more non-code files that should be skipped during scanning operations.

253-281: Well-structured kwargs implementation and documentation

The refactoring to use kwargs with default values and detailed documentation improves the flexibility of the FileFilters class while maintaining backward compatibility. This is a good approach for extending functionality.

311-316: LGTM: Improved path handling with pathlib

The switch to using Path objects from pathlib is a good modernization that makes path operations more robust and readable.

452-452:
❓ Verification inconclusive

Method made public - verify all usages are updated

The _should_skip_dir method has been renamed to should_skip_dir (removing the underscore), making it part of the public API. Ensure that all places in the codebase that call this method have been updated accordingly.

🏁 Script executed:
#!/bin/bash
# Check for any remaining calls to the old method name

echo "Searching for code that might still use the old _should_skip_dir method name..."
rg "_should_skip_dir" --type py
Length of output: 200
Manual Verification Required: Confirm All References to _should_skip_dir Are Updated

The automated search did not show any remaining usages of _should_skip_dir. However, since the command output was minimal, please manually verify that all calls to _should_skip_dir have been updated to the new should_skip_dir method.

Confirm that the old method is not defined or used anywhere in the codebase.

Double-check call sites in related modules and tests.

61-66:
❓ Verification inconclusive

Consider clarifying the commented directories in DEFAULT_SKIPPED_DIRS

These commented-out directories (example, examples, docs, tests, doc, test) suggest they might be skipped in certain scenarios. If there's a feature flag or configuration to enable these, it should be documented. Otherwise, consider removing the commented entries to avoid confusion.

🏁 Script executed:
#!/bin/bash
# Check if these commented directories are used conditionally elsewhere in the codebase

echo "Searching for code that might conditionally use these commented directories..."
rg -A 5 -B 5 "(DEFAULT_SKIPPED_DIRS.*example|example.*DEFAULT_SKIPPED_DIRS)" --type py
Length of output: 255
Action: Clarify or Remove Commented-Out Directories in DEFAULT_SKIPPED_DIRS

The search did not reveal any conditional usage of these directories elsewhere in the codebase. It remains unclear if these entries are placeholders for a future feature flag or simply vestigial. If they are meant to be activated via configuration or a flag, please add documentation explaining their usage. Otherwise, consider removing these commented-out entries from src/scanoss/file_filters.py (lines 61–66) to avoid potential confusion.

coderabbitai

Actionable comments posted: 1

🔭 Outside diff range comments (2)

src/scanoss/file_filters.py (1)
553-563: ⚠️ Potential issue

Remove duplicate code blocks.

These lines appear to be an exact duplicate of lines 542-552. They should be removed to avoid confusion and maintenance issues.
-                            self.print_debug(f'Skipping file: {file_path} (text file ending with null)')
-                            return True
-                except (OSError, IOError):
-                    self.print_debug(f'Skipping file: {file_path} (cannot read file content)')
-                    return True
-
-            return False
-
-        except Exception as e:
-            self.print_debug(f'Error checking file {file_path}: {str(e)}')
-            return True
src/scanoss/scanners/folder_hasher.py (1)
297-315: 🛠️ Refactor suggestion

Remove duplicated output formatting methods.
This block repeats _format_json_output and _format_plain_output already defined in lines 278–295. Having duplicate methods is error-prone and can lead to maintenance overhead.
-    def _format_json_output(self) -> str:
-        ...
-    def _format_plain_output(self) -> str:
-        ...
+    # Remove these duplicate methods, or unify them with the original definitions at lines 278-295.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19b9c1e and 7a880ee.

📒 Files selected for processing (9)

src/scanoss/file_filters.py (5 hunks)
src/scanoss/file_filters.py (1 hunks)
src/scanoss/scanners/folder_hasher.py (1 hunks)
src/scanoss/file_filters.py (2 hunks)
src/scanoss/scanners/folder_hasher.py (2 hunks)
src/scanoss/scanners/folder_hasher.py (3 hunks)
src/scanoss/file_filters.py (1 hunks)
src/scanoss/file_filters.py (1 hunks)
src/scanoss/scanners/folder_hasher.py (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (7)

src/scanoss/file_filters.py
src/scanoss/file_filters.py
src/scanoss/file_filters.py
src/scanoss/scanners/folder_hasher.py
src/scanoss/scanners/folder_hasher.py
src/scanoss/scanners/folder_hasher.py
src/scanoss/file_filters.py

🔇 Additional comments (4)

src/scanoss/file_filters.py (1)

518-552: Well-implemented file skip logic for folder hashing.

The new _should_skip_file_for_hfh method provides good encapsulation of filter logic specific to folder hashing operations. The implementation properly handles various conditions like hidden files, symlinks, empty files, and text files with null bytes. The error handling is robust with appropriate debug logging.

Consider clarifying what "hfh" stands for in the method name or docstring - I assume it's "folder hashing" but this might not be immediately clear to all developers.

src/scanoss/scanners/folder_hasher.py (3)

100-120: Consider adding unit tests for edge cases.
It would be beneficial to include unit tests that cover scenarios like empty directories, large directories, and permission-restricted files to ensure the hashing logic operates correctly under all conditions.

Would you like a script to scan for existing test files and confirm coverage for these edge cases?

150-151: Use a streaming or chunk-based approach for hashing large files.
Reading the entire file into memory at once can significantly impact performance and memory usage for very large files.

163-165: Avoid storing file references in both parent and child nodes.
Appending the same DirectoryFile object to the current node and root_node may cause duplication, leading to confusing directory structures.

src/scanoss/scanners/folder_hasher.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/scanoss/scanners/folder_hasher.py (1)

77-79: Parameter shadowing in class constructor

scan_dir is passed to the constructor, but hash_directory() takes its own path parameter, so the directory that actually gets hashed can differ from the one given at construction time. Consider removing scan_dir or using it consistently everywhere to reduce confusion.
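
One way to remove the ambiguity, sketched with a hypothetical stand-in class (the real FolderHasher holds more state): make the path argument optional and fall back to the constructor's scan_dir. hash_directory() could call a helper like resolve_target() first.

```python
from pathlib import Path
from typing import Optional, Union


class FolderHasherSketch:
    """Hypothetical, trimmed-down stand-in for FolderHasher."""

    def __init__(self, scan_dir: Union[str, Path]):
        self.scan_dir = Path(scan_dir)

    def resolve_target(self, path: Optional[Union[str, Path]] = None) -> Path:
        # Default to the directory given at construction time,
        # so callers no longer pass the same value twice
        return Path(path) if path is not None else self.scan_dir
```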

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a880ee and 79bbf0a.

📒 Files selected for processing (2)

src/scanoss/file_filters.py (6 hunks)
src/scanoss/scanners/folder_hasher.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/scanoss/file_filters.py

🧰 Additional context used

🧬 Code Definitions (1)

src/scanoss/scanners/folder_hasher.py (4)

src/scanoss/file_filters.py (2)

FileFilters (247-527)

get_filtered_files_from_folder (283-324)

src/scanoss/utils/abstract_presenter.py (4)

AbstractPresenter (8-68)

present (29-48)

_format_json_output (57-61)

_format_plain_output (64-68)

src/scanoss/results.py (3)

present (261-263)

_format_json_output (63-84)

_format_plain_output (86-99)

src/scanoss/scanners/scanner_hfh.py (3)

present (128-130)

_format_json_output (143-150)

_format_plain_output (152-160)

🔇 Additional comments (4)

src/scanoss/scanners/folder_hasher.py (4)

50-50: Potential type mismatch for output_format in FolderHasherConfig

Currently, output_format is annotated as Literal['json'], even though the presenter can also handle 'plain'. Python does not enforce Literal annotations at runtime, so passing 'plain' mainly risks type-checker errors; confirm this, and consider widening the annotation to Literal['json', 'plain'].
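
A sketch of the widened annotation plus an explicit runtime check, using a hypothetical FolderHasherConfigSketch that shows only the output_format field (typing.Literal and get_args need Python 3.8+):

```python
from dataclasses import dataclass
from typing import Literal, get_args

OutputFormat = Literal['json', 'plain']


@dataclass
class FolderHasherConfigSketch:
    # Hypothetical subset of FolderHasherConfig; only output_format is shown
    output_format: OutputFormat = 'json'

    def __post_init__(self) -> None:
        # Annotations are not enforced at runtime, so validate explicitly
        allowed = get_args(OutputFormat)
        if self.output_format not in allowed:
            raise ValueError(f'output_format must be one of {allowed}, got {self.output_format!r}')
```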

158-159: Use a streaming approach to avoid high memory usage

Reading the entire file content into memory might impact performance significantly for large files. Adopting a chunk-based or streaming approach can mitigate memory pressure and improve scalability.

166-173: Avoid storing the same file reference in multiple nodes

Appending the same DirectoryFile object to both the child node and the root node can lead to redundancy and confusion in the directory tree. Consider storing file references only in the most relevant directory node to prevent duplication.

288-316: Overall presentation logic appears solid

The presenter subclasses AbstractPresenter correctly, and the JSON/plain output generation for folder hashes is well-structured. No immediate concerns here.

src/scanoss/scanners/folder_hasher.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (5)

src/scanoss/scanners/folder_hasher.py (5)

175-178: Use more specific exception types for better error handling.

Catching all exceptions with a generic Exception handler might mask unexpected errors that should be handled differently.

Consider specifying the exceptions you expect to catch:

-            except Exception as e:
-                self.base.print_debug(f'Skipping file {full_file_path}: {str(e)}')
+            except OSError as e:  # also covers IOError (an alias) and PermissionError (a subclass)
+                self.base.print_debug(f'Skipping file {full_file_path}: {str(e)}')
+            except Exception as e:
+                self.base.print_debug(f'Unexpected error processing file {full_file_path}: {str(e)}')
+                # Consider whether to re-raise certain exceptions

234-249: Add a descriptive comment explaining minimum threshold requirements.

The code checks if there are enough files and if the concatenated names are long enough, but it would be helpful to explain why these thresholds exist.

        if len(selected_names) < MINIMUM_FILE_COUNT:
+            # Insufficient files to generate a reliable hash signature
            return {
                'name_hash': None,
                'content_hash': None,
            }

        selected_names.sort()
        concatenated_names = ''.join(selected_names)

        if len(concatenated_names.encode('utf-8')) < MINIMUM_CONCATENATED_NAME_LENGTH:
+            # Concatenated names too short to generate a meaningful signature
            return {
                'name_hash': None,
                'content_hash': None,
            }
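
The gating can also be expressed as a small pure function, which makes the thresholds easy to unit-test. The values 8 and 32 come from the earlier review note on lines 231-244; the function name is hypothetical.

```python
from typing import List, Optional

# Values from the earlier review note (8 files, 32 bytes); treat them as assumptions
MINIMUM_FILE_COUNT = 8
MINIMUM_CONCATENATED_NAME_LENGTH = 32


def names_for_hashing(names: List[str]) -> Optional[str]:
    """Return the sorted, concatenated names, or None when there is too little input."""
    if len(names) < MINIMUM_FILE_COUNT:
        # Too few files for a meaningful similarity signature
        return None
    concatenated = ''.join(sorted(names))
    # Measured in UTF-8 bytes, matching the encode('utf-8') check in the snippet
    if len(concatenated.encode('utf-8')) < MINIMUM_CONCATENATED_NAME_LENGTH:
        return None
    return concatenated
```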

254-255: Document why the most significant byte of the simhash is replaced.

The replacement of the most significant byte with a head value is an important detail that deserves explanation.

        # Calculate head and overwrite MS byte
        head = self._head_calc(names_simhash)
+        # Replace the most significant byte with the head value; the lower 56 bits
+        # still carry the original simhash bits
        names_simhash = (names_simhash & 0x00FFFFFFFFFFFFFF) | (head << 56)
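
The masking step can be checked in isolation. This sketch assumes _head_calc returns a value in 0..255, which the snippet does not confirm. It also makes the collision concern from the earlier review concrete: two hashes that differ only in their top byte become identical once the same head is written over it.

```python
MASK_64 = (1 << 64) - 1
LOW_56_BITS = 0x00FFFFFFFFFFFFFF


def overwrite_ms_byte(simhash: int, head: int) -> int:
    """Replace bits 56..63 of a 64-bit value with `head`, keeping the lower 56 bits."""
    if not 0 <= head <= 0xFF:
        raise ValueError('head must fit in one byte')
    return ((simhash & LOW_56_BITS) | (head << 56)) & MASK_64
```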

307-315: _format_plain_output returns JSON regardless of the name.

The plain output formatter serializes the tree with json.dumps() whenever it is a dict and falls back to str() otherwise, so the "plain" output is effectively JSON and the method name is misleading.

Consider either:

- Renaming the method to better reflect its functionality
- Implementing a truly plain text output format that's distinct from JSON

    def _format_plain_output(self) -> str:
        """
-        Format the scan output data into a plain text string
+        Format the scan output data into a plain text string (currently returns JSON)
        """
        return (
            json.dumps(self.folder_hasher.tree, indent=2)
            if isinstance(self.folder_hasher.tree, dict)
            else str(self.folder_hasher.tree)
        )
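
If a genuinely plain format is preferred, one option is an indented, one-line-per-directory rendering. The keys name_hash and content_hash appear in the snippets above; path and children are assumptions about the tree's shape.

```python
from typing import Any, Dict, List


def format_tree_plain(node: Dict[str, Any], depth: int = 0) -> str:
    """Render a nested folder-hash dict as indented 'path  names=...  content=...' lines."""
    # 'path' and 'children' are assumed keys; 'name_hash'/'content_hash' come from the snippet
    indent = '  ' * depth
    lines: List[str] = [
        f"{indent}{node.get('path', '?')}  "
        f"names={node.get('name_hash')}  content={node.get('content_hash')}"
    ]
    for child in node.get('children', []):
        lines.append(format_tree_plain(child, depth + 1))
    return '\n'.join(lines)
```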

150-151: Add a mechanism to disable progress bar in automated environments.

The progress bar is helpful for interactive use but may not be suitable in CI/CD or headless environments.

-        bar = Bar('Hashing files...', max=len(filtered_files))
+        bar = Bar('Hashing files...', max=len(filtered_files)) if not self.base.quiet else None
         for file_path in filtered_files:
             try:
                 # ... (existing code)
-            bar.next()
+            if bar:
+                bar.next()
             
-        bar.finish()
+        if bar:
+            bar.finish()
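
An alternative to the repeated `if bar:` checks is a no-op progress object. This sketch also skips the bar when stderr is not a TTY, a design choice added here rather than part of the suggestion; the quiet flag and the factory parameter are assumptions. Passing `progress.bar.Bar` as the factory would match the constructor call shown in the diff.

```python
import sys
from typing import Callable, Optional, Protocol, TextIO


class ProgressLike(Protocol):
    def next(self) -> None: ...
    def finish(self) -> None: ...


class NullProgress:
    """No-op stand-in so the hashing loop can call next()/finish() unconditionally."""

    def next(self) -> None:
        pass

    def finish(self) -> None:
        pass


def make_progress(
    total: int,
    quiet: bool,
    factory: Callable[..., ProgressLike],
    stream: Optional[TextIO] = None,
) -> ProgressLike:
    # Show a bar only for interactive, non-quiet runs; CI logs are usually not a TTY
    stream = stream if stream is not None else sys.stderr
    if quiet or not stream.isatty():
        return NullProgress()
    return factory('Hashing files...', max=total)
```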

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 79bbf0a and eab81a4.

📒 Files selected for processing (1)

src/scanoss/scanners/folder_hasher.py (1 hunks)

🧰 Additional context used

🧬 Code Definitions (1)

src/scanoss/scanners/folder_hasher.py (4)

src/scanoss/file_filters.py (2)

FileFilters (247-527)

get_filtered_files_from_folder (283-324)

src/scanoss/utils/abstract_presenter.py (4)

AbstractPresenter (8-68)

present (29-48)

_format_json_output (57-61)

_format_plain_output (64-68)

src/scanoss/results.py (3)

present (261-263)

_format_json_output (63-84)

_format_plain_output (86-99)

src/scanoss/scanners/scanner_hfh.py (3)

present (128-130)

_format_json_output (143-150)

_format_plain_output (152-160)

🔇 Additional comments (5)

src/scanoss/scanners/folder_hasher.py (5)

143-145: File name filtering is now implemented correctly.

The filter condition correctly checks for names exceeding the maximum length, which aligns with the constant name MAXIMUM_FILE_NAME_LENGTH. This resolves the inconsistency mentioned in a past review comment.

158-159: Use a streaming or chunk-based approach for hashing large files.

Reading the entire file into memory might cause performance issues with very large files, consuming excessive memory.

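Consider using a buffered reading approach. The suggestion below assumes the CRC64 class exposes an incremental update()/digest() interface, which the current code does not show (it calls CRC64.get_hash_buff() on the whole buffer); the walrus operator also requires Python 3.8+: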
-                file_bytes = full_file_path.read_bytes()
-                key = CRC64.get_hash_buff(file_bytes)
+                # Process file in chunks to avoid loading large files into memory
+                crc = CRC64()
+                with open(full_file_path, 'rb') as f:
+                    while chunk := f.read(65536):  # 64KB chunks
+                        crc.update(chunk)
+                key = crc.digest()
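
For reference, a self-contained incremental CRC-64 sketch. It assumes the ECMA-182 polynomial in reflected form, which is what Go's hash/crc64 uses for crc64.ECMA; whether the PR's CRC64 class matches that variant is not confirmed here. The pure-Python byte loop is slow, so the point is the incremental interface, not throughput.

```python
from typing import BinaryIO, List

# Assumption: reflected ECMA-182 polynomial, as in Go's crc64.ECMA table
_POLY_REFLECTED = 0xC96C5795D7870F42
_MASK = (1 << 64) - 1


def _make_table() -> List[int]:
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc64_update(crc: int, data: bytes) -> int:
    """Continue a CRC-64 over `data`; start with crc=0 and feed chunks in order."""
    crc = ~crc & _MASK
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return ~crc & _MASK


def crc64_stream(f: BinaryIO, chunk_size: int = 65536) -> int:
    # Memory use is bounded by chunk_size regardless of file size
    crc = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return crc
        crc = crc64_update(crc, chunk)
```

With a file opened in binary mode, `crc64_stream(f)` replaces the `read_bytes()` + one-shot hash pair.
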
165-173: Avoid storing file references in both parent and child nodes to prevent duplication.

Appending the same DirectoryFile object to each parent directory may lead to bloated data structures and redundant file entries.

Consider only maintaining the file reference in its immediate parent directory, or use a more memory-efficient approach such as indexing files by ID.

176-176: Debug message is now using the correct method.

You're correctly using self.base.print_debug here, which resolves a previously identified issue.

250-255:
❓ Verification inconclusive

Ensure hash calculation is efficient with a suitable simhash algorithm.

The code uses both simhash for names and fingerprint for content. Verify that this approach produces a reliable folder signature, i.e. that unrelated folders rarely end up with near-identical hashes.

🏁 Script executed:
#!/bin/bash
# Find other uses of simhash functions in the codebase to ensure consistency
rg -A 3 "simhash\(" --type py
rg -A 3 "fingerprint\(" --type py

# Check if we should be using the same approach for both name_hash and content_hash
rg -A 5 "WordFeatureSet|vectorize_bytes" --type py
Length of output: 4261
Attention: Validate Folder Hashing Algorithm Efficiency and Robustness

The use of simhash on a WordFeatureSet for names alongside fingerprint(vectorize_bytes(...)) for file content is consistent with similar implementations in the codebase. In particular, the algorithm for computing a 64‑bit hash and then adjusting the most significant byte with a head value (via _head_calc) appears intentional.

Please verify that:

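The dual approach (names and content) separates unrelated folders well under expected inputs. Simhash is locality-sensitive by design, so near-duplicate folders are supposed to produce close hashes; collision resistance in the cryptographic sense is not the goal.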

The additional head calculation and bit masking do not adversely impact the hash’s uniqueness or performance.

The current simhash and fingerprint implementations are sufficiently efficient for the application's workload.
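
To make the locality-sensitivity point concrete, here is a generic Charikar-style simhash sketch. It is not a port of the Go library the PR is based on: the tokenizer and the blake2b feature hash are assumptions (Go implementations often use FNV-64).

```python
import hashlib
import re
from collections import Counter
from typing import Iterable, List


def _feature_hash(token: str) -> int:
    # 64-bit token hash; blake2b with an 8-byte digest is an assumption
    return int.from_bytes(hashlib.blake2b(token.encode('utf-8'), digest_size=8).digest(), 'big')


def simhash64(tokens: Iterable[str]) -> int:
    """Per-bit weighted vote over token hashes; bit i is set if its vote is positive."""
    votes = [0] * 64
    for token, weight in Counter(tokens).items():
        h = _feature_hash(token)
        for bit in range(64):
            votes[bit] += weight if (h >> bit) & 1 else -weight
    return sum(1 << bit for bit in range(64) if votes[bit] > 0)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count('1')


def word_tokens(text: str) -> List[str]:
    # Rough analogue of a word feature set: lowercase alphanumeric runs
    return re.findall(r'[a-z0-9]+', text.lower())
```

Hashes of similar name lists can then be compared with hamming(): a small distance indicates near-duplicate folders, which is the intended behavior rather than a collision defect.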

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

src/scanoss/scanners/folder_hasher.py (1)

220-257: Validate performance of simhash for large file sets.
Although simhash is generally efficient, note that sorting names and computing feature sets for many files might become expensive. For massive directories, you might want to verify the time/space impact.

If necessary, consider calculating partial hashes or employing a parallel approach for extremely large sets.

src/scanoss/file_filters.py (2)

74-80: Confirm skipping new folders aligns with user expectations.
Adding doc, docs, test, and tests to the skip sets can be beneficial, but some users might still want them hashed in certain scenarios. Consider making this easily configurable or documented.

Also applies to: 82-95

439-439: FileFilters constructor now using **kwargs.
While this is flexible, it can make type-checking or IDE autocompletion less robust. Consider adopting a typed dataclass or explicit constructor parameters if you anticipate expansions or user misconfigurations.

Also applies to: 447-457, 460-468

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 779a921 and 819e324.

📒 Files selected for processing (2)

src/scanoss/file_filters.py (12 hunks)
src/scanoss/scanners/folder_hasher.py (1 hunks)

🧰 Additional context used

🧬 Code Definitions (1)

src/scanoss/scanners/folder_hasher.py (3)

src/scanoss/utils/abstract_presenter.py (4)

AbstractPresenter (8-68)

present (29-48)

_format_json_output (57-61)

_format_plain_output (64-68)

src/scanoss/results.py (3)

present (261-263)

_format_json_output (63-84)

_format_plain_output (86-99)

src/scanoss/scanners/scanner_hfh.py (3)

present (128-130)

_format_json_output (143-150)

_format_plain_output (152-160)

🔇 Additional comments (8)

src/scanoss/scanners/folder_hasher.py (5)

16-19: Clarify and confirm usage of naming constants.
The naming of MINIMUM_FILE_COUNT, MINIMUM_CONCATENATED_NAME_LENGTH, and MINIMUM_FILE_NAME_LENGTH suggests skipping items below a certain threshold, but please confirm their intended logic in the calling code to avoid confusion.

Would you like me to verify references to these constants in other files or code paths to ensure consistent usage?

126-144: Check usage of MINIMUM_FILE_NAME_LENGTH for filtering files.
Here, you only include files whose names have a length less than MINIMUM_FILE_NAME_LENGTH. This appears contradictory to the constant’s name, as “minimum” usually denotes the lower bound for acceptance, yet you’re including only very short filenames.

Can you confirm if this logic aligns with the intended spec, or if it should be > instead of <?

158-159: Consider using a streaming or chunk-based approach to read file contents.
Reading large files in their entirety with read_bytes() can lead to high memory usage. Revisiting the use of a chunk-based approach may improve performance for big files.

171-173: Avoid storing the same file reference at multiple node levels.
Appending file_item to both the specific child node and the root_node may cause data duplication or unintended complexity. Consider storing file references only where strictly needed.

258-291: Overall presentation logic seems sound.
The abstract presenter pattern is clean, and your JSON/plain text outputs are handled effectively. No immediate issues found here.

src/scanoss/file_filters.py (3)

49-59: Ensure consistency between DEFAULT_SKIPPED_FILES and DEFAULT_SKIPPED_FILES_HFH.
These newly introduced defaults for folder hashing scans look reasonable. However, confirm that skipping certain build wrapper scripts is indeed desired, especially if the user might rely on them.

245-254: Extensive default skip extensions for folder hashing.
This extended list of skipped extensions appears thorough. Double-check that none of these file types are crucial for certain scanning/hashing use cases.

Would you like me to run a usage scan in the repository to confirm no critical references to these file types exist?

Also applies to: 277-430

660-660: Renaming _should_skip_dir to should_skip_dir can affect external references.
This could break backward compatibility if any imports or references rely on the old name.

Would you like a shell script to grep for _should_skip_dir references outside this file to ensure nothing breaks?
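
If outside callers might still use the old name, a deprecated alias keeps them working while pointing them at the new API. This is a sketch with a hypothetical stand-in class and a placeholder skip rule. Because the old name was underscore-prefixed (private by convention), the alias is optional.

```python
import warnings


class FileFiltersSketch:
    """Hypothetical stand-in showing a backward-compatible rename."""

    def should_skip_dir(self, dir_rel_path: str) -> bool:
        # Placeholder rule for illustration only
        return dir_rel_path.startswith('.')

    def _should_skip_dir(self, dir_rel_path: str) -> bool:
        """Deprecated alias kept so existing callers keep working."""
        warnings.warn(
            '_should_skip_dir is deprecated; use should_skip_dir instead',
            DeprecationWarning,
            stacklevel=2,
        )
        return self.should_skip_dir(dir_rel_path)
```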

- feat: ES-163 add build children logic to folder hashing scan
- feat: ES-163 mimic go implementation of hashing folder content
- feat: ES-163 create our own crc64 implementation
- feat: ES-163 create our own simhash implementation based on go library
- feat: ES-163 better error handling
- feat: ES-163 update changelog, client help and version. add headers
- feat: ES-163 add AbstractPresenter to handle results output in a centralized way
- feat: ES-163 update changelog
- feat: ES-163 fix pr comments, update lint.yml workflow
- feat: ES-163 add best match and threshold arguments to hfh scan command
- downgrade python/protobuf versions
- feat: apply same filters from go minr
- feat: create standalone presenters that implement abstractpresenter
- feat: add progress tracking while scanning, clean folder-scan command arguments
- feat: use progress instead of rich
- feat: add folder hash sub-command, fix hfh spinner
- feat: update docs
- feat: update default files/directories to skip while scanning
- feat: add extra checks in file filters for folder hashing scan
- feat: comment new skipped dirs for now
- feat: align with go client again
- feat: filter hidden file/folders
- feat: add specific filters for hfh
- feat: add missing command args
- feat: ES-163 add build children logic to folder hashing scan
- feat: ES-163 mimic go implementation of hashing folder content
- feat: ES-163 create our own crc64 implementation
- feat: ES-163 create our own simhash implementation based on go library
- feat: ES-163 better error handling
- feat: ES-163 update changelog, client help and version. add headers
- feat: ES-163 add AbstractPresenter to handle results output in a centralized way
- feat: ES-163 update changelog
- feat: ES-163 fix pr comments, update lint.yml workflow
- feat: update changelog

coderabbitai

Actionable comments posted: 4

🔭 Outside diff range comments (1)

src/scanoss/scanossgrpc.py (1)
531-532: ⚠️ Potential issue

Remove unused variables.

SUCCEDED_WITH_WARNINGS_STATUS_CODE and FAILED_STATUS_CODE are never referenced and trigger lint errors. Remove them to address the warnings.
-    SUCCEDED_WITH_WARNINGS_STATUS_CODE = 2
-    FAILED_STATUS_CODE = 3
🧰 Tools

🪛 Ruff (0.8.2)

531-531: Local variable SUCCEDED_WITH_WARNINGS_STATUS_CODE is assigned to but never used

Remove assignment to unused variable SUCCEDED_WITH_WARNINGS_STATUS_CODE

(F841)

532-532: Local variable FAILED_STATUS_CODE is assigned to but never used

Remove assignment to unused variable FAILED_STATUS_CODE

(F841)

🪛 GitHub Actions: Lint

[error] 531-531: F841 Local variable SUCCEDED_WITH_WARNINGS_STATUS_CODE is assigned to but never used. Remove assignment to unused variable SUCCEDED_WITH_WARNINGS_STATUS_CODE.

[error] 532-532: F841 Local variable FAILED_STATUS_CODE is assigned to but never used. Remove assignment to unused variable FAILED_STATUS_CODE.

♻️ Duplicate comments (3)

src/scanoss/cli.py (2)

1589-1627: 🛠️ Refactor suggestion

Enhance error handling in folder_hashing_scan.

While basic error handling is present, consider adding more specific error types and resource cleanup.

 def folder_hashing_scan(parser, args):
     try:
         if not args.scan_dir:
             print_stderr('ERROR: Please specify a directory to scan')
             parser.parse_args([args.subparser, '-h'])
             sys.exit(1)

         if not os.path.exists(args.scan_dir) or not os.path.isdir(args.scan_dir):
             print_stderr(f'ERROR: The specified directory {args.scan_dir} does not exist')
             sys.exit(1)

+        if not os.access(args.scan_dir, os.R_OK):
+            print_stderr(f'ERROR: No read permission for directory {args.scan_dir}')
+            sys.exit(1)
+
         scanner_config = create_scanner_config_from_args(args)
         scanoss_settings = get_scanoss_settings_from_args(args)
         grpc_config = create_grpc_config_from_args(args)

         client = ScanossGrpc(**asdict(grpc_config))
+        
         scanner = ScannerHFH(
             scan_dir=args.scan_dir,
             config=scanner_config,
             client=client,
             scanoss_settings=scanoss_settings,
         )

         scanner.best_match = args.best_match
         scanner.threshold = args.threshold

+        try:
             scanner.scan()
             scanner.present(output_file=args.output, output_format=args.format)
+        finally:
+            client.close()  # Ensure client is properly closed
+
     except ScanossGrpcError as e:
         print_stderr(f'ERROR: {e}')
         sys.exit(1)
+    except Exception as e:
+        print_stderr(f'ERROR: Unexpected error during scan: {e}')
+        sys.exit(1)

1629-1659: 🛠️ Refactor suggestion

Enhance error handling in folder_hash.

Improve error handling by adding more specific exception types and providing better progress feedback.

 def folder_hash(parser, args):
     """Run the "folder-hash" sub-command

     Args:
         parser (ArgumentParser): command line parser object
         args (Namespace): Parsed arguments
     """
     try:
         if not args.scan_dir:
             print_stderr('ERROR: Please specify a directory to scan')
             parser.parse_args([args.subparser, '-h'])
             sys.exit(1)

         if not os.path.exists(args.scan_dir) or not os.path.isdir(args.scan_dir):
             print_stderr(f'ERROR: The specified directory {args.scan_dir} does not exist')
             sys.exit(1)

+        if not os.access(args.scan_dir, os.R_OK):
+            print_stderr(f'ERROR: No read permission for directory {args.scan_dir}')
+            sys.exit(1)
+
         folder_hasher_config = create_folder_hasher_config_from_args(args)
         scanoss_settings = get_scanoss_settings_from_args(args)

+        if not args.quiet:
+            print_stderr(f'Hashing directory: {args.scan_dir}')
+
         folder_hasher = FolderHasher(
             scan_dir=args.scan_dir,
             config=folder_hasher_config,
             scanoss_settings=scanoss_settings,
         )

         folder_hasher.hash_directory(args.scan_dir)
         folder_hasher.present(output_file=args.output, output_format=args.format)
-    except Exception as e:
+    except OSError as e:  # IOError is an alias of OSError in Python 3
+        print_stderr(f'ERROR: File system error: {e}')
+        sys.exit(1)
+    except ValueError as e:
+        print_stderr(f'ERROR: Invalid input: {e}')
+        sys.exit(1)
+    except Exception as e:
         print_stderr(f'ERROR: {e}')
         sys.exit(1)
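
Both sub-commands repeat the same directory checks, so a shared helper is one option. This is a sketch with a hypothetical validate_scan_dir(). It checks execute permission as well as read permission, because on POSIX systems descending into a directory requires search (execute) permission.

```python
import os
from typing import Optional


def validate_scan_dir(scan_dir: Optional[str]) -> Optional[str]:
    """Return an error message for an unusable scan directory, or None if it is usable."""
    if not scan_dir:
        return 'Please specify a directory to scan'
    if not os.path.isdir(scan_dir):
        # isdir() is False both for missing paths and for regular files
        return f'The specified directory {scan_dir} does not exist or is not a directory'
    # Listing needs read permission; descending into it needs execute (search) permission
    if not os.access(scan_dir, os.R_OK | os.X_OK):
        return f'No read permission for directory {scan_dir}'
    return None
```

Each command would then call print_stderr(f'ERROR: {message}') and exit when the helper returns a message. The missing-argument case would still print help first.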

src/scanoss/scanners/folder_hasher.py (1)

16-18: ⚠️ Potential issue

Check naming vs. usage.

MINIMUM_FILE_NAME_LENGTH = 32 suggests filenames shorter than 32 bytes might be invalid, yet line 143 filters files with len(f.name.encode('utf-8')) < MINIMUM_FILE_NAME_LENGTH. Verify whether you intended to exclude short filenames or exclude long ones. The logic is reversed from the variable naming.
- if f.is_file() and len(f.name.encode('utf-8')) < MINIMUM_FILE_NAME_LENGTH
+ if f.is_file() and len(f.name.encode('utf-8')) >= MINIMUM_FILE_NAME_LENGTH
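
Whichever direction is intended, the two conditions are exact complements, so this choice decides which files are hashed at all. Because the comparison counts UTF-8 bytes, non-ASCII names reach the limit sooner than their character count suggests. The following check shows the effect (function names are hypothetical):

```python
MINIMUM_FILE_NAME_LENGTH = 32  # value quoted in the review comment above


def passes_current_filter(name: str) -> bool:
    # Mirrors the condition under review: keep names shorter than the limit, in UTF-8 bytes
    return len(name.encode('utf-8')) < MINIMUM_FILE_NAME_LENGTH


def passes_suggested_filter(name: str) -> bool:
    # Mirrors the suggested change: keep names at least as long as the limit
    return len(name.encode('utf-8')) >= MINIMUM_FILE_NAME_LENGTH


# 24 characters, but 'é' (U+00E9) takes 2 bytes in UTF-8, so 44 bytes in total
accented = 'é' * 20 + '.txt'
```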

🧹 Nitpick comments (22)

.github/workflows/lint.yml (2)
35-37: Remove trailing whitespace and verify filter comments.
There is a trailing space at the end of line 35, which triggers the YAMLlint error. Please remove it to comply with YAML formatting standards. The filtering logic and its accompanying comments look clear and appropriate as a temporary workaround.
-          # Filter out files that match exclude patterns from pyproject.toml␣
+          # Filter out files that match exclude patterns from pyproject.toml
🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 35-35: trailing spaces

(trailing-spaces)

58-58: Remove extra blank line.
Static analysis reports an excess blank line on line 58. Please remove it to adhere to YAMLlint guidelines.
-
🧰 Tools

🪛 YAMLlint (1.35.1)

[warning] 58-58: too many blank lines

(1 > 0) (empty-lines)
src/scanoss/api/provenance/v2/scanoss_provenance_pb2.py (1)
23-23: Consider using not instead of equality comparison with False

While this is auto-generated code that shouldn't be manually modified, for future reference:
-if _descriptor._USE_C_DESCRIPTORS == False:
+if not _descriptor._USE_C_DESCRIPTORS:
🧰 Tools

🪛 Ruff (0.8.2)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)
src/scanoss/api/common/v2/scanoss_common_pb2.py (1)
20-20: Consider using not instead of equality comparison with False

While this is auto-generated code that shouldn't be manually modified, for future reference:
-if _descriptor._USE_C_DESCRIPTORS == False:
+if not _descriptor._USE_C_DESCRIPTORS:
🧰 Tools

🪛 Ruff (0.8.2)

20-20: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (1)

77-86: Consider using fewer parameters or a configuration object

These methods have 10 parameters, which exceeds the recommended limit (5). While this is auto-generated code that shouldn't be modified directly, this pattern might be worth discussing with the gRPC team if you're contributing to the generator.

Also applies to: 94-103

🧰 Tools

🪛 Ruff (0.8.2)

77-77: Too many arguments in function definition (10 > 5)

(PLR0913)
src/protoc_gen_swagger/options/annotations_pb2.py (1)
22-22: Consider using not instead of equality comparison with False

While this is auto-generated code that shouldn't be manually modified, for future reference:
-if _descriptor._USE_C_DESCRIPTORS == False:
+if not _descriptor._USE_C_DESCRIPTORS:
🧰 Tools

🪛 Ruff (0.8.2)

22-22: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (1)
23-24: Improve comparison with False

The comparison _descriptor._USE_C_DESCRIPTORS == False should be replaced with not _descriptor._USE_C_DESCRIPTORS for better Python style.
-if _descriptor._USE_C_DESCRIPTORS == False:
+if not _descriptor._USE_C_DESCRIPTORS:
🧰 Tools

🪛 Ruff (0.8.2)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)
src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (1)

155-155: Long lines in gRPC method declarations

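Some lines exceed the 120-character limit configured for this project. Consider breaking them into multiple lines, bearing in mind that hand edits to this generated file are overwritten on regeneration.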

Also applies to: 172-172

🧰 Tools

🪛 Ruff (0.8.2)

155-155: Line too long (123 > 120)

(E501)
src/scanoss/cli.py (1)
1285-1285: Fix strip call with duplicate characters.

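The lint pipeline flags PLE1310 here. str.strip() treats its argument as a set of characters rather than a prefix, so strip('file://') removes any leading or trailing f, i, l, e, : or / characters and can eat part of the path itself. Replace the scheme string instead: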
-            pac_local = pac.strip('file://')
+            pac_local = pac.replace('file://', '')
🧰 Tools

🪛 GitHub Actions: Lint

[error] 1285-1285: PLE1310 String strip call contains duplicate characters.
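The sketch below contrasts the original strip() call, the suggested replace() fix, and a prefix-only helper. The helper name strip_file_scheme and the sample path are hypothetical. Note that replace() removes every occurrence of 'file://', not just a leading one; that is harmless for normal paths but not strictly prefix-only. On Python 3.9+, str.removeprefix() is an equivalent option if the project's minimum Python version allows it.

```python
FILE_SCHEME = 'file://'


def strip_file_scheme(pac: str) -> str:
    """Remove a leading 'file://' only; leave the rest of the path untouched."""
    if pac.startswith(FILE_SCHEME):
        return pac[len(FILE_SCHEME):]
    return pac


pac = 'file:///etc/proxy/profile'

# strip() removes any of f, i, l, e, :, / from BOTH ends, so it also eats
# the 'e' of 'etc' and the trailing 'file' of 'profile'.
print(pac.strip(FILE_SCHEME))        # tc/proxy/pro
# replace() removes every occurrence of the scheme string.
print(pac.replace(FILE_SCHEME, ''))  # /etc/proxy/profile
# The helper only touches a leading scheme.
print(strip_file_scheme(pac))        # /etc/proxy/profile
```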
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (1)
6-6: Split the long import line
Static analysis flagged this line for exceeding 120 characters (E501). Consider splitting it for better readability and compliance with style guidelines.
-from scanoss.api.scanning.v2 import scanoss_scanning_pb2 as scanoss_dot_api_dot_scanning_dot_v2_dot_scanoss__scanning__pb2
+from scanoss.api.scanning.v2 import (
+    scanoss_scanning_pb2 as scanoss_dot_api_dot_scanning_dot_v2_dot_scanoss__scanning__pb2
+)
🧰 Tools

🪛 Ruff (0.8.2)

6-6: Line too long (122 > 120)

(E501)
src/scanoss/utils/crc64.py (1)
56-65: Consider supporting str type in the function signature
The method signature says data: bytes, but there's a check converting str to bytes. For clarity, either remove the check or include Union[str, bytes] in the signature (which also requires from typing import Union).
-def update(self, data: bytes) -> None:
+def update(self, data: Union[str, bytes]) -> None:
     if isinstance(data, str):
         data = data.encode('utf-8')
     ...
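A minimal, self-contained sketch of the widened signature follows. It uses zlib.crc32 purely as a stand-in for the project's CRC64, whose internals are not shown here. The RunningChecksum class is hypothetical.

```python
import zlib
from typing import Union


class RunningChecksum:
    """Hypothetical incremental checksum that accepts either str or bytes."""

    def __init__(self) -> None:
        self._value = 0

    def update(self, data: Union[str, bytes]) -> None:
        # Normalise text to UTF-8 bytes so callers may pass either type.
        if isinstance(data, str):
            data = data.encode('utf-8')
        # zlib.crc32 accepts a running value, so chunks can be fed in one by one.
        self._value = zlib.crc32(data, self._value)

    def digest(self) -> int:
        return self._value


checksum = RunningChecksum()
checksum.update('hello ')
checksum.update(b'world')
print(checksum.digest() == zlib.crc32(b'hello world'))  # True
```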
src/scanoss/scanners/scanner_hfh.py (1)
94-126: Improve error handling for scan().

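The try/finally block ensures the spinner always stops. Note that try/finally on its own does not suppress exceptions: after the finally block runs, the exception continues to propagate, unless the finally block itself executes a return (which would discard it). Adding an explicit except clause that logs and re-raises makes scan failures visible on stderr while still letting callers handle them.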
 def scan(self) -> Optional[Dict]:
     ...
     try:
         response = self.client.folder_hash_scan(hfh_request)
         self.scan_results = response
+    except Exception as e:
+        self.base.print_stderr(f"Scan failure: {str(e)}")
+        raise
     finally:
         stop_spinner = True
         spinner_thread.join()
         spinner.finish()
     ...
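The self-contained run below reproduces the same spinner-plus-finally shape, so you can check how exceptions propagate. The spinner is a hypothetical threading.Event-driven loop, not the project's progress bar. failing_scan is a hypothetical stand-in for client.folder_hash_scan() failing.

```python
import threading


def run_with_spinner(task):
    """Run task() while a background spinner thread runs; always stop the spinner."""
    stop_spinner = threading.Event()

    def spin() -> None:
        # Event.wait() returns True once the event is set, ending the loop.
        while not stop_spinner.wait(0.01):
            pass  # a real spinner would redraw its frame here

    spinner_thread = threading.Thread(target=spin, daemon=True)
    spinner_thread.start()
    try:
        return task()
    except Exception as e:
        print(f'Scan failure: {e}')  # log the failure...
        raise  # ...then re-raise so the caller still sees it
    finally:
        # Runs on success and on failure, before any exception propagates.
        stop_spinner.set()
        spinner_thread.join()


def failing_scan():
    """Hypothetical stand-in for client.folder_hash_scan() failing."""
    raise ConnectionError('server unavailable')


try:
    run_with_spinner(failing_scan)
except ConnectionError as err:
    print(f'caller saw: {err}')
```

If you delete the except clause, the caller still sees the ConnectionError; only the "Scan failure" log line disappears.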
src/scanoss/api/components/v2/scanoss_components_pb2.py (1)

23-23: Consider updating Protobuf generator configuration

While this is an auto-generated file, the equality comparison _descriptor._USE_C_DESCRIPTORS == False could be written as not _descriptor._USE_C_DESCRIPTORS according to Python style guidelines.

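Because protoc writes this line, hand edits are lost on every regeneration. Excluding the generated *_pb2.py files from linting is likely the more durable fix; the lint workflow already filters files against the exclude patterns in pyproject.toml.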

🧰 Tools

🪛 Ruff (0.8.2)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

src/scanoss/file_filters.py (5)

49-59: Avoid code duplication.

DEFAULT_SKIPPED_FILES_HFH heavily overlaps with DEFAULT_SKIPPED_FILES. Consider unifying these sets under a single constant and applying conditional logic (e.g., by a flag) instead of introducing a near-duplicate.

82-101: Consider merging directory skip sets.

Both DEFAULT_SKIPPED_DIRS_HFH and DEFAULT_SKIPPED_DIR_EXT_HFH mimic existing sets. Unify them if possible (similar to how you might unify file skip sets) to maintain DRY principles and reduce future maintenance overhead.

276-430: Watch out for repetition in DEFAULT_SKIPPED_EXT_HFH.

These lines replicate much of DEFAULT_SKIPPED_EXT. If there's only a slight difference in behavior, consider merging them to avoid inconsistent updates between scanning logic and folder hashing logic in the future.

439-457: Constructor argument expansion.

Allowing **kwargs is flexible; however, it can obscure which parameters are expected. Consider maintaining a simple typed signature or adding type hints and docstrings for each supported kwarg (especially for scanoss_settings vs. others) to maintain clarity in the API.
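Here is a minimal sketch of the explicit alternative, which uses keyword-only parameters, defaults and type hints instead of **kwargs. scanoss_settings and is_folder_hashing_scan come from the comments above. debug and skip_size are hypothetical, and the real constructor may take a different set of parameters.

```python
from typing import Any, Optional


class FileFilters:
    """Sketch of a FileFilters constructor with an explicit, typed signature."""

    def __init__(
        self,
        *,
        debug: bool = False,
        scanoss_settings: Optional[Any] = None,
        is_folder_hashing_scan: bool = False,
        skip_size: int = 0,
    ) -> None:
        """
        Args:
            debug: Emit debug messages (hypothetical).
            scanoss_settings: Parsed settings object, if any (type left open here).
            is_folder_hashing_scan: Apply the folder-hashing skip rules.
            skip_size: Hypothetical minimum file size; 0 disables the check.
        """
        self.debug = debug
        self.scanoss_settings = scanoss_settings
        self.is_folder_hashing_scan = is_folder_hashing_scan
        self.skip_size = skip_size


filters = FileFilters(is_folder_hashing_scan=True)
print(filters.is_folder_hashing_scan)  # True
```

A side benefit of this design is that a misspelled keyword such as is_folder_hasing_scan=True raises TypeError immediately. A **kwargs signature could silently ignore it.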

706-726: Consolidate skip file checks.

The _should_skip_file method duplicates logic between scanning and folder hashing when is_folder_hashing_scan is set. Consider factoring the overlap into a shared helper that merges or scopes these sets, so the two paths don't diverge over time.
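One way to act on the file_filters.py comments above is to keep a single base set per category and express the folder-hashing variant as a delta, so that both modes share one lookup path. The set contents and the names HFH_KEPT_FILES, HFH_EXTRA_SKIPPED_FILES, skipped_files and should_skip_file are illustrative placeholders, not the module's real values.

```python
from typing import FrozenSet

# Illustrative contents only; the real lists live in src/scanoss/file_filters.py.
DEFAULT_SKIPPED_FILES: FrozenSet[str] = frozenset({'gradlew', 'gradlew.bat', 'mvnw', 'mvnw.cmd'})
# Hypothetical delta: entries skipped in normal scans but kept for folder hashing.
HFH_KEPT_FILES: FrozenSet[str] = frozenset({'mvnw', 'mvnw.cmd'})
# Hypothetical delta: extra entries skipped only for folder hashing.
HFH_EXTRA_SKIPPED_FILES: FrozenSet[str] = frozenset({'.ds_store'})


def skipped_files(is_folder_hashing_scan: bool) -> FrozenSet[str]:
    """Return the effective skip set for the requested scan mode."""
    if not is_folder_hashing_scan:
        return DEFAULT_SKIPPED_FILES
    return (DEFAULT_SKIPPED_FILES - HFH_KEPT_FILES) | HFH_EXTRA_SKIPPED_FILES


def should_skip_file(file_name: str, is_folder_hashing_scan: bool = False) -> bool:
    """Single shared check used by both scanning and folder hashing."""
    return file_name.lower() in skipped_files(is_folder_hashing_scan)


print(should_skip_file('mvnw'))                                     # True
print(should_skip_file('mvnw', is_folder_hashing_scan=True))        # False
print(should_skip_file('.DS_Store', is_folder_hashing_scan=True))   # True
```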

src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (1)

57-98: Unimplemented methods.

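All server methods raise NotImplementedError. This is the standard generated Servicer base class: each method sets grpc.StatusCode.UNIMPLEMENTED and raises NotImplementedError until a server subclass overrides it. If this package only uses the client stubs, no change is needed here. Otherwise, override only the RPCs the server supports, and the remaining RPCs will keep reporting UNIMPLEMENTED to clients.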

🧰 Tools

🪛 Ruff (0.8.2)

79-79: Line too long (123 > 120)

(E501)
src/scanoss/utils/simhash.py (2)
100-109: Consider implementing a check for the vector length.

The fingerprint function assumes the input vector has 64 elements. Adding a validation check would make the function more robust.
 def fingerprint(v: list) -> int:
     """
     Given a 64-element vector, return a 64-bit fingerprint.
     For each bit i, if v[i] >= 0, set bit i to 1; otherwise leave it 0.
     """
+    if len(v) != 64:
+        raise ValueError('simhash.fingerprint(): input vector must have 64 elements')
     f = 0
     for i in range(64):
         if v[i] >= 0:
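The diff above is cut off mid-loop. Below is a complete, self-contained sketch of the validated function. It assumes the loop body simply sets bit i as the docstring describes; the actual body in simhash.py is not shown here.

```python
from typing import List


def fingerprint(v: List[int]) -> int:
    """
    Given a 64-element vector, return a 64-bit fingerprint.
    For each bit i, if v[i] >= 0, set bit i to 1; otherwise leave it 0.
    """
    if len(v) != 64:
        raise ValueError('simhash.fingerprint(): input vector must have 64 elements')
    f = 0
    for i in range(64):
        if v[i] >= 0:
            f |= 1 << i  # assumption: bit i maps to 1 << i
    return f


print(fingerprint([0] * 64) == 2**64 - 1)  # True: every entry is >= 0
print(fingerprint([-1] * 64))              # 0
```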
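56-63: Consider quoting the return type annotations.

The feature creation functions already annotate SimhashFeature as their return type. Quoting it as a string makes it a forward reference. That is required only if SimhashFeature is defined later in the module than these functions (an unquoted name would then raise NameError at import time); otherwise the change is purely stylistic.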
-def new_feature(f: bytes) -> SimhashFeature:
+def new_feature(f: bytes) -> 'SimhashFeature':
     """Return a new feature for the given byte slice with weight 1."""
     return SimhashFeature(fnv1_64(f), 1)


-def new_feature_with_weight(f: bytes, weight: int) -> SimhashFeature:
+def new_feature_with_weight(f: bytes, weight: int) -> 'SimhashFeature':
     """Return a new feature for the given byte slice with the given weight."""
     return SimhashFeature(fnv1_64(f), weight)
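Because annotations are evaluated at definition time by default, quoting only matters when the name is not yet defined at the point where the def statement runs. The short sketch below shows that a quoted annotation lets a function be defined above the class it returns. The FNV-1 64-bit hash and the minimal dataclass (with a hypothetical hash_value field) are stand-ins, not the module's code.

```python
from dataclasses import dataclass

FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def fnv1_64(data: bytes) -> int:
    """FNV-1 (multiply, then XOR) 64-bit hash."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
        h ^= byte
    return h


# Defined BEFORE SimhashFeature: an unquoted annotation here would raise NameError
# at import time, while the quoted one is stored as a string and not evaluated.
def new_feature(f: bytes) -> 'SimhashFeature':
    """Return a new feature for the given byte slice with weight 1."""
    return SimhashFeature(fnv1_64(f), 1)


@dataclass
class SimhashFeature:
    hash_value: int
    weight: int


feature = new_feature(b'hello')
print(feature.weight)                         # 1
print(new_feature.__annotations__['return'])  # SimhashFeature (a plain string)
```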
src/protoc_gen_swagger/options/openapiv2_pb2.py (1)
22-22: Minor style issue in equality comparison.

While functional, using not _descriptor._USE_C_DESCRIPTORS is the preferred Python style over _descriptor._USE_C_DESCRIPTORS == False.
-if _descriptor._USE_C_DESCRIPTORS == False:
+if not _descriptor._USE_C_DESCRIPTORS:
🧰 Tools

🪛 Ruff (0.8.2)

22-22: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 819e324 and 4b2e026.

📒 Files selected for processing (42)

.github/workflows/lint.yml (2 hunks)
CHANGELOG.md (2 hunks)
CLIENT_HELP.md (1 hunks)
docs/source/index.rst (1 hunks)
pyproject.toml (1 hunks)
requirements.txt (1 hunks)
setup.cfg (1 hunks)
src/protoc_gen_swagger/options/annotations_pb2.py (1 hunks)
src/protoc_gen_swagger/options/annotations_pb2_grpc.py (1 hunks)
src/protoc_gen_swagger/options/openapiv2_pb2.py (1 hunks)
src/protoc_gen_swagger/options/openapiv2_pb2_grpc.py (1 hunks)
src/scanoss/__init__.py (1 hunks)
src/scanoss/api/common/v2/scanoss_common_pb2.py (1 hunks)
src/scanoss/api/common/v2/scanoss_common_pb2_grpc.py (1 hunks)
src/scanoss/api/components/v2/scanoss_components_pb2.py (1 hunks)
src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (3 hunks)
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py (1 hunks)
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (3 hunks)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (1 hunks)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (3 hunks)
src/scanoss/api/provenance/v2/scanoss_provenance_pb2.py (2 hunks)
src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (1 hunks)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (1 hunks)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (2 hunks)
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py (1 hunks)
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (2 hunks)
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py (1 hunks)
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (3 hunks)
src/scanoss/cli.py (19 hunks)
src/scanoss/constants.py (1 hunks)
src/scanoss/file_filters.py (12 hunks)
src/scanoss/results.py (6 hunks)
src/scanoss/scanners/__init__.py (1 hunks)
src/scanoss/scanners/folder_hasher.py (1 hunks)
src/scanoss/scanners/scanner_config.py (1 hunks)
src/scanoss/scanners/scanner_hfh.py (1 hunks)
src/scanoss/scanossbase.py (1 hunks)
src/scanoss/scanossgrpc.py (8 hunks)
src/scanoss/utils/abstract_presenter.py (1 hunks)
src/scanoss/utils/crc64.py (1 hunks)
src/scanoss/utils/simhash.py (1 hunks)
version.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (14)

setup.cfg
requirements.txt
src/protoc_gen_swagger/options/annotations_pb2_grpc.py
src/scanoss/api/common/v2/scanoss_common_pb2_grpc.py
src/scanoss/scanners/__init__.py
src/protoc_gen_swagger/options/openapiv2_pb2_grpc.py
src/scanoss/scanossbase.py
src/scanoss/__init__.py
src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py
pyproject.toml
CLIENT_HELP.md
src/scanoss/scanners/scanner_config.py
version.py
src/scanoss/constants.py

🧰 Additional context used

🧬 Code Definitions (11)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (6)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (2)

Echo (47-52)

Echo (111-125)

src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (2)

Echo (57-62)

Echo (145-159)

src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (2)

Echo (42-47)

Echo (94-108)

src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/scanners/scanner_hfh.py (4)

src/scanoss/scanners/folder_hasher.py (5)

FolderHasher (67-260)

hash_directory (105-124)

present (258-260)

_format_json_output (273-280)

_format_plain_output (282-290)

src/scanoss/scanners/scanner_config.py (1)

ScannerConfig (39-54)

src/scanoss/scanossgrpc.py (2)

ScanossGrpc (97-615)

folder_hash_scan (468-483)

src/scanoss/utils/abstract_presenter.py (4)

AbstractPresenter (8-68)

present (29-48)

_format_json_output (57-61)

_format_plain_output (64-68)

src/scanoss/utils/abstract_presenter.py (4)

src/scanoss/scanossbase.py (2)

ScanossBase (28-101)

print_to_file_or_stdout (83-91)

src/scanoss/results.py (3)

present (261-263)

_format_json_output (63-84)

_format_plain_output (86-99)

src/scanoss/scanners/folder_hasher.py (3)

present (258-260)

_format_json_output (273-280)

_format_plain_output (282-290)

src/scanoss/scanners/scanner_hfh.py (3)

present (128-130)

_format_json_output (143-150)

_format_plain_output (152-160)

src/scanoss/file_filters.py (2)

src/scanoss/scanossbase.py (3)

ScanossBase (28-101)

print_msg (51-56)

print_debug (58-63)

src/scanoss/spdxlite.py (1)

print_debug (61-66)

src/scanoss/scanossgrpc.py (3)

src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (1)

ProvenanceStub (9-29)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (3)

ScanningStub (9-29)

FolderHashScan (44-49)

FolderHashScan (94-108)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (1)

DependenciesStub (9-29)

src/scanoss/scanners/folder_hasher.py (5)

src/scanoss/file_filters.py (2)

FileFilters (433-738)

get_filtered_files_from_files (514-564)

src/scanoss/utils/abstract_presenter.py (4)

AbstractPresenter (8-68)

present (29-48)

_format_json_output (57-61)

_format_plain_output (64-68)

src/scanoss/utils/crc64.py (2)

CRC64 (29-96)

get_hash_buff (82-96)

src/scanoss/utils/simhash.py (3)

simhash (125-130)

WordFeatureSet (163-169)

fingerprint (100-109)

src/scanoss/scanners/scanner_hfh.py (3)

present (128-130)

_format_json_output (143-150)

_format_plain_output (152-160)

src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (6)

src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (2)

Echo (47-52)

Echo (111-125)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (2)

Echo (42-47)

Echo (94-108)

src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/cli.py (4)

src/scanoss/scanners/folder_hasher.py (2)

FolderHasher (67-260)

hash_directory (105-124)

src/scanoss/scanossgrpc.py (1)

ScanossGrpc (97-615)

src/scanoss/scanners/scanner_config.py (1)

create_scanner_config_from_args (57-73)

src/scanoss/scanners/scanner_hfh.py (2)

ScannerHFH (41-130)

scan (94-126)

src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (7)

src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (2)

Echo (57-62)

Echo (145-159)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (2)

Echo (42-47)

Echo (94-108)

src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/components.py (1)

Components (37-357)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (6)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (2)

Echo (47-52)

Echo (111-125)

src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (2)

Echo (57-62)

Echo (145-159)

src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (2)

Echo (42-47)

Echo (94-108)

src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (6)

src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (2)

Echo (47-52)

Echo (111-125)

src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (2)

Echo (57-62)

Echo (145-159)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (2)

Echo (42-47)

Echo (94-108)

src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

🪛 Ruff (0.8.2)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py

6-6: Line too long (122 > 120)

(E501)

77-77: Too many arguments in function definition (10 > 5)

(PLR0913)

94-94: Too many arguments in function definition (10 > 5)

(PLR0913)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py

19-19: Line too long (1925 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (379 > 120)

(E501)

27-27: Undefined name _SCANNING

(F821)

28-28: Undefined name _SCANNING

(F821)

29-29: Undefined name _SCANNING

(F821)

30-30: Undefined name _SCANNING

(F821)

30-30: Line too long (132 > 120)

(E501)

31-31: Undefined name _HFHREQUEST

(F821)

32-32: Undefined name _HFHREQUEST

(F821)

33-33: Undefined name _HFHREQUEST_CHILDREN

(F821)

34-34: Undefined name _HFHREQUEST_CHILDREN

(F821)

35-35: Undefined name _HFHRESPONSE

(F821)

36-36: Undefined name _HFHRESPONSE

(F821)

37-37: Undefined name _HFHRESPONSE_COMPONENT

(F821)

38-38: Undefined name _HFHRESPONSE_COMPONENT

(F821)

39-39: Undefined name _HFHRESPONSE_RESULT

(F821)

40-40: Undefined name _HFHRESPONSE_RESULT

(F821)

41-41: Undefined name _SCANNING

(F821)

42-42: Undefined name _SCANNING

(F821)

src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py

19-19: Line too long (4142 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (400 > 120)

(E501)

27-27: Undefined name _CRYPTOGRAPHY

(F821)

28-28: Undefined name _CRYPTOGRAPHY

(F821)

28-28: Line too long (126 > 120)

(E501)

29-29: Undefined name _CRYPTOGRAPHY

(F821)

30-30: Undefined name _CRYPTOGRAPHY

(F821)

30-30: Line too long (138 > 120)

(E501)

31-31: Undefined name _CRYPTOGRAPHY

(F821)

32-32: Undefined name _CRYPTOGRAPHY

(F821)

32-32: Line too long (149 > 120)

(E501)

33-33: Undefined name _CRYPTOGRAPHY

(F821)

34-34: Undefined name _CRYPTOGRAPHY

(F821)

34-34: Line too long (145 > 120)

(E501)

35-35: Undefined name _CRYPTOGRAPHY

(F821)

36-36: Undefined name _CRYPTOGRAPHY

(F821)

36-36: Line too long (139 > 120)

(E501)

37-37: Undefined name _CRYPTOGRAPHY

(F821)

38-38: Undefined name _CRYPTOGRAPHY

(F821)

38-38: Line too long (141 > 120)

(E501)

39-39: Undefined name _ALGORITHM

(F821)

40-40: Undefined name _ALGORITHM

(F821)

41-41: Undefined name _ALGORITHMRESPONSE

(F821)

42-42: Undefined name _ALGORITHMRESPONSE

(F821)

43-43: Undefined name _ALGORITHMRESPONSE_PURLS

(F821)

44-44: Undefined name _ALGORITHMRESPONSE_PURLS

(F821)

45-45: Undefined name _ALGORITHMSINRANGERESPONSE

(F821)

46-46: Undefined name _ALGORITHMSINRANGERESPONSE

(F821)

47-47: Undefined name _ALGORITHMSINRANGERESPONSE_PURL

(F821)

48-48: Undefined name _ALGORITHMSINRANGERESPONSE_PURL

(F821)

49-49: Undefined name _VERSIONSINRANGERESPONSE

(F821)

50-50: Undefined name _VERSIONSINRANGERESPONSE

(F821)

51-51: Undefined name _VERSIONSINRANGERESPONSE_PURL

(F821)

52-52: Undefined name _VERSIONSINRANGERESPONSE_PURL

(F821)

53-53: Undefined name _HINT

(F821)

54-54: Undefined name _HINT

(F821)

55-55: Undefined name _HINTSRESPONSE

(F821)

56-56: Undefined name _HINTSRESPONSE

(F821)

57-57: Undefined name _HINTSRESPONSE_PURLS

(F821)

58-58: Undefined name _HINTSRESPONSE_PURLS

(F821)

59-59: Undefined name _HINTSINRANGERESPONSE

(F821)

60-60: Undefined name _HINTSINRANGERESPONSE

(F821)

61-61: Undefined name _HINTSINRANGERESPONSE_PURL

(F821)

62-62: Undefined name _HINTSINRANGERESPONSE_PURL

(F821)

63-63: Undefined name _CRYPTOGRAPHY

(F821)

64-64: Undefined name _CRYPTOGRAPHY

(F821)

src/scanoss/scanossgrpc.py

62-62: .api.dependencies.v2.scanoss_dependencies_pb2.DependencyResponse imported but unused

Remove unused import: .api.dependencies.v2.scanoss_dependencies_pb2.DependencyResponse

(F401)

src/scanoss/api/components/v2/scanoss_components_pb2.py

19-19: Line too long (3679 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (389 > 120)

(E501)

27-27: Undefined name _COMPONENTS

(F821)

28-28: Undefined name _COMPONENTS

(F821)

28-28: Line too long (122 > 120)

(E501)

29-29: Undefined name _COMPONENTS

(F821)

30-30: Undefined name _COMPONENTS

(F821)

30-30: Line too long (136 > 120)

(E501)

31-31: Undefined name _COMPONENTS

(F821)

32-32: Undefined name _COMPONENTS

(F821)

32-32: Line too long (139 > 120)

(E501)

33-33: Undefined name _COMPONENTS

(F821)

34-34: Undefined name _COMPONENTS

(F821)

34-34: Line too long (144 > 120)

(E501)

35-35: Undefined name _COMPSEARCHREQUEST

(F821)

36-36: Undefined name _COMPSEARCHREQUEST

(F821)

37-37: Undefined name _COMPSTATISTIC

(F821)

38-38: Undefined name _COMPSTATISTIC

(F821)

39-39: Undefined name _COMPSTATISTIC_LANGUAGE

(F821)

40-40: Undefined name _COMPSTATISTIC_LANGUAGE

(F821)

41-41: Undefined name _COMPSTATISTICRESPONSE

(F821)

42-42: Undefined name _COMPSTATISTICRESPONSE

(F821)

43-43: Undefined name _COMPSTATISTICRESPONSE_PURLS

(F821)

44-44: Undefined name _COMPSTATISTICRESPONSE_PURLS

(F821)

45-45: Undefined name _COMPSEARCHRESPONSE

(F821)

46-46: Undefined name _COMPSEARCHRESPONSE

(F821)

47-47: Undefined name _COMPSEARCHRESPONSE_COMPONENT

(F821)

48-48: Undefined name _COMPSEARCHRESPONSE_COMPONENT

(F821)

49-49: Undefined name _COMPVERSIONREQUEST

(F821)

50-50: Undefined name _COMPVERSIONREQUEST

(F821)

51-51: Undefined name _COMPVERSIONRESPONSE

(F821)

52-52: Undefined name _COMPVERSIONRESPONSE

(F821)

53-53: Undefined name _COMPVERSIONRESPONSE_LICENSE

(F821)

54-54: Undefined name _COMPVERSIONRESPONSE_LICENSE

(F821)

55-55: Undefined name _COMPVERSIONRESPONSE_VERSION

(F821)

56-56: Undefined name _COMPVERSIONRESPONSE_VERSION

(F821)

57-57: Undefined name _COMPVERSIONRESPONSE_COMPONENT

(F821)

58-58: Undefined name _COMPVERSIONRESPONSE_COMPONENT

(F821)

59-59: Undefined name _COMPONENTS

(F821)

60-60: Undefined name _COMPONENTS

(F821)

src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py

6-6: Line too long (138 > 120)

(E501)

79-79: Line too long (123 > 120)

(E501)

145-145: Too many arguments in function definition (10 > 5)

(PLR0913)

162-162: Too many arguments in function definition (10 > 5)

(PLR0913)

179-179: Too many arguments in function definition (10 > 5)

(PLR0913)

189-189: Line too long (127 > 120)

(E501)

196-196: Too many arguments in function definition (10 > 5)

(PLR0913)

206-206: Line too long (125 > 120)

(E501)

213-213: Too many arguments in function definition (10 > 5)

(PLR0913)

223-223: Line too long (122 > 120)

(E501)

230-230: Too many arguments in function definition (10 > 5)

(PLR0913)

240-240: Line too long (125 > 120)

(E501)

src/protoc_gen_swagger/options/openapiv2_pb2.py

18-18: Line too long (9607 > 120)

(E501)

22-22: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Undefined name _SWAGGER_RESPONSESENTRY

(F821)

27-27: Undefined name _SWAGGER_RESPONSESENTRY

(F821)

28-28: Undefined name _SWAGGER_EXTENSIONSENTRY

(F821)

29-29: Undefined name _SWAGGER_EXTENSIONSENTRY

(F821)

30-30: Undefined name _OPERATION_RESPONSESENTRY

(F821)

31-31: Undefined name _OPERATION_RESPONSESENTRY

(F821)

32-32: Undefined name _OPERATION_EXTENSIONSENTRY

(F821)

33-33: Undefined name _OPERATION_EXTENSIONSENTRY

(F821)

34-34: Undefined name _RESPONSE_HEADERSENTRY

(F821)

35-35: Undefined name _RESPONSE_HEADERSENTRY

(F821)

36-36: Undefined name _RESPONSE_EXAMPLESENTRY

(F821)

37-37: Undefined name _RESPONSE_EXAMPLESENTRY

(F821)

38-38: Undefined name _RESPONSE_EXTENSIONSENTRY

(F821)

39-39: Undefined name _RESPONSE_EXTENSIONSENTRY

(F821)

40-40: Undefined name _INFO_EXTENSIONSENTRY

(F821)

41-41: Undefined name _INFO_EXTENSIONSENTRY

(F821)

42-42: Undefined name _SCHEMA

(F821)

43-43: Undefined name _SCHEMA

(F821)

44-44: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY

(F821)

45-45: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY

(F821)

46-46: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY

(F821)

47-47: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY

(F821)

48-48: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY

(F821)

49-49: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY

(F821)

50-50: Undefined name _SCOPES_SCOPEENTRY

(F821)

51-51: Undefined name _SCOPES_SCOPEENTRY

(F821)

52-52: Undefined name _SWAGGER

(F821)

53-53: Undefined name _SWAGGER

(F821)

54-54: Undefined name _SWAGGER_RESPONSESENTRY

(F821)

55-55: Undefined name _SWAGGER_RESPONSESENTRY

(F821)

56-56: Undefined name _SWAGGER_EXTENSIONSENTRY

(F821)

57-57: Undefined name _SWAGGER_EXTENSIONSENTRY

(F821)

58-58: Undefined name _SWAGGER_SWAGGERSCHEME

(F821)

59-59: Undefined name _SWAGGER_SWAGGERSCHEME

(F821)

60-60: Undefined name _OPERATION

(F821)

61-61: Undefined name _OPERATION

(F821)

62-62: Undefined name _OPERATION_RESPONSESENTRY

(F821)

63-63: Undefined name _OPERATION_RESPONSESENTRY

(F821)

64-64: Undefined name _OPERATION_EXTENSIONSENTRY

(F821)

65-65: Undefined name _OPERATION_EXTENSIONSENTRY

(F821)

66-66: Undefined name _HEADER

(F821)

67-67: Undefined name _HEADER

(F821)

68-68: Undefined name _RESPONSE

(F821)

69-69: Undefined name _RESPONSE

(F821)

70-70: Undefined name _RESPONSE_HEADERSENTRY

(F821)

71-71: Undefined name _RESPONSE_HEADERSENTRY

(F821)

72-72: Undefined name _RESPONSE_EXAMPLESENTRY

(F821)

73-73: Undefined name _RESPONSE_EXAMPLESENTRY

(F821)

74-74: Undefined name _RESPONSE_EXTENSIONSENTRY

(F821)

75-75: Undefined name _RESPONSE_EXTENSIONSENTRY

(F821)

76-76: Undefined name _INFO

(F821)

77-77: Undefined name _INFO

(F821)

78-78: Undefined name _INFO_EXTENSIONSENTRY

(F821)

79-79: Undefined name _INFO_EXTENSIONSENTRY

(F821)

80-80: Undefined name _CONTACT

(F821)

81-81: Undefined name _CONTACT

(F821)

82-82: Undefined name _LICENSE

(F821)

83-83: Undefined name _LICENSE

(F821)

84-84: Undefined name _EXTERNALDOCUMENTATION

(F821)

85-85: Undefined name _EXTERNALDOCUMENTATION

(F821)

86-86: Undefined name _SCHEMA

(F821)

87-87: Undefined name _SCHEMA

(F821)

88-88: Undefined name _JSONSCHEMA

(F821)

89-89: Undefined name _JSONSCHEMA

(F821)

90-90: Undefined name _JSONSCHEMA_JSONSCHEMASIMPLETYPES

(F821)

91-91: Undefined name _JSONSCHEMA_JSONSCHEMASIMPLETYPES

(F821)

92-92: Undefined name _TAG

(F821)

93-93: Undefined name _TAG

(F821)

94-94: Undefined name _SECURITYDEFINITIONS

(F821)

95-95: Undefined name _SECURITYDEFINITIONS

(F821)

96-96: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY

(F821)

97-97: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY

(F821)

98-98: Undefined name _SECURITYSCHEME

(F821)

99-99: Undefined name _SECURITYSCHEME

(F821)

100-100: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY

(F821)

101-101: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY

(F821)

102-102: Undefined name _SECURITYSCHEME_TYPE

(F821)

103-103: Undefined name _SECURITYSCHEME_TYPE

(F821)

104-104: Undefined name _SECURITYSCHEME_IN

(F821)

105-105: Undefined name _SECURITYSCHEME_IN

(F821)

106-106: Undefined name _SECURITYSCHEME_FLOW

(F821)

107-107: Undefined name _SECURITYSCHEME_FLOW

(F821)

108-108: Undefined name _SECURITYREQUIREMENT

(F821)

109-109: Undefined name _SECURITYREQUIREMENT

(F821)

110-110: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTVALUE

(F821)

111-111: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTVALUE

(F821)

112-112: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY

(F821)

113-113: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY

(F821)

114-114: Undefined name _SCOPES

(F821)

115-115: Undefined name _SCOPES

(F821)

116-116: Undefined name _SCOPES_SCOPEENTRY

(F821)

117-117: Undefined name _SCOPES_SCOPEENTRY

(F821)

src/protoc_gen_swagger/options/annotations_pb2.py

18-18: Line too long (1009 > 120)

(E501)

22-22: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

23-23: Undefined name openapiv2_swagger

(F821)

24-24: Undefined name openapiv2_operation

(F821)

25-25: Undefined name openapiv2_schema

(F821)

26-26: Undefined name openapiv2_tag

(F821)

27-27: Undefined name openapiv2_field

(F821)

src/scanoss/api/common/v2/scanoss_common_pb2.py

16-16: Line too long (845 > 120)

(E501)

20-20: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

24-24: Undefined name _STATUSCODE

(F821)

25-25: Undefined name _STATUSCODE

(F821)

26-26: Undefined name _STATUSRESPONSE

(F821)

27-27: Undefined name _STATUSRESPONSE

(F821)

28-28: Undefined name _ECHOREQUEST

(F821)

29-29: Undefined name _ECHOREQUEST

(F821)

30-30: Undefined name _ECHORESPONSE

(F821)

31-31: Undefined name _ECHORESPONSE

(F821)

32-32: Undefined name _PURLREQUEST

(F821)

33-33: Undefined name _PURLREQUEST

(F821)

34-34: Undefined name _PURLREQUEST_PURLS

(F821)

35-35: Undefined name _PURLREQUEST_PURLS

(F821)

src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py

6-6: Line too long (130 > 120)

(E501)

111-111: Too many arguments in function definition (10 > 5)

(PLR0913)

128-128: Too many arguments in function definition (10 > 5)

(PLR0913)

145-145: Too many arguments in function definition (10 > 5)

(PLR0913)

155-155: Line too long (123 > 120)

(E501)

162-162: Too many arguments in function definition (10 > 5)

(PLR0913)

172-172: Line too long (125 > 120)

(E501)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py

19-19: Line too long (2471 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (398 > 120)

(E501)

27-27: Undefined name _DEPENDENCIES

(F821)

28-28: Undefined name _DEPENDENCIES

(F821)

28-28: Line too long (126 > 120)

(E501)

29-29: Undefined name _DEPENDENCIES

(F821)

30-30: Undefined name _DEPENDENCIES

(F821)

30-30: Line too long (139 > 120)

(E501)

31-31: Undefined name _DEPENDENCYREQUEST

(F821)

32-32: Undefined name _DEPENDENCYREQUEST

(F821)

33-33: Undefined name _DEPENDENCYREQUEST_PURLS

(F821)

34-34: Undefined name _DEPENDENCYREQUEST_PURLS

(F821)

35-35: Undefined name _DEPENDENCYREQUEST_FILES

(F821)

36-36: Undefined name _DEPENDENCYREQUEST_FILES

(F821)

37-37: Undefined name _DEPENDENCYRESPONSE

(F821)

38-38: Undefined name _DEPENDENCYRESPONSE

(F821)

39-39: Undefined name _DEPENDENCYRESPONSE_LICENSES

(F821)

40-40: Undefined name _DEPENDENCYRESPONSE_LICENSES

(F821)

41-41: Undefined name _DEPENDENCYRESPONSE_DEPENDENCIES

(F821)

42-42: Undefined name _DEPENDENCYRESPONSE_DEPENDENCIES

(F821)

43-43: Undefined name _DEPENDENCYRESPONSE_FILES

(F821)

44-44: Undefined name _DEPENDENCYRESPONSE_FILES

(F821)

45-45: Undefined name _DEPENDENCIES

(F821)

46-46: Undefined name _DEPENDENCIES

(F821)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py

6-6: Line too long (138 > 120)

(E501)

77-77: Too many arguments in function definition (10 > 5)

(PLR0913)

94-94: Too many arguments in function definition (10 > 5)

(PLR0913)

104-104: Line too long (122 > 120)

(E501)

src/scanoss/api/provenance/v2/scanoss_provenance_pb2.py

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (389 > 120)

(E501)

27-27: Undefined name _PROVENANCE

(F821)

28-28: Undefined name _PROVENANCE

(F821)

28-28: Line too long (122 > 120)

(E501)

29-29: Undefined name _PROVENANCE

(F821)

30-30: Undefined name _PROVENANCE

(F821)

30-30: Line too long (142 > 120)

(E501)

31-31: Undefined name _PROVENANCERESPONSE

(F821)

32-32: Undefined name _PROVENANCERESPONSE

(F821)

33-33: Undefined name _PROVENANCERESPONSE_DECLAREDLOCATION

(F821)

34-34: Undefined name _PROVENANCERESPONSE_DECLAREDLOCATION

(F821)

35-35: Undefined name _PROVENANCERESPONSE_CURATEDLOCATION

(F821)

36-36: Undefined name _PROVENANCERESPONSE_CURATEDLOCATION

(F821)

37-37: Undefined name _PROVENANCERESPONSE_PURLS

(F821)

38-38: Undefined name _PROVENANCERESPONSE_PURLS

(F821)

39-39: Undefined name _PROVENANCE

(F821)

40-40: Undefined name _PROVENANCE

(F821)

src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py

19-19: Line too long (1718 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (375 > 120)

(E501)

27-27: Undefined name _SEMGREP

(F821)

28-28: Undefined name _SEMGREP

(F821)

29-29: Undefined name _SEMGREP

(F821)

30-30: Undefined name _SEMGREP

(F821)

30-30: Line too long (123 > 120)

(E501)

31-31: Undefined name _SEMGREPRESPONSE

(F821)

32-32: Undefined name _SEMGREPRESPONSE

(F821)

33-33: Undefined name _SEMGREPRESPONSE_ISSUE

(F821)

34-34: Undefined name _SEMGREPRESPONSE_ISSUE

(F821)

35-35: Undefined name _SEMGREPRESPONSE_FILE

(F821)

36-36: Undefined name _SEMGREPRESPONSE_FILE

(F821)

37-37: Undefined name _SEMGREPRESPONSE_PURLS

(F821)

38-38: Undefined name _SEMGREPRESPONSE_PURLS

(F821)

39-39: Undefined name _SEMGREP

(F821)

40-40: Undefined name _SEMGREP

(F821)

src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py

77-77: Too many arguments in function definition (10 > 5)

(PLR0913)

94-94: Too many arguments in function definition (10 > 5)

(PLR0913)

src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py

19-19: Line too long (2566 > 120)

(E501)

22-22: Line too long (124 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (419 > 120)

(E501)

27-27: Undefined name _VULNERABILITIES

(F821)

28-28: Undefined name _VULNERABILITIES

(F821)

28-28: Line too long (129 > 120)

(E501)

29-29: Undefined name _VULNERABILITIES

(F821)

30-30: Undefined name _VULNERABILITIES

(F821)

30-30: Line too long (132 > 120)

(E501)

31-31: Undefined name _VULNERABILITIES

(F821)

32-32: Undefined name _VULNERABILITIES

(F821)

32-32: Line too long (152 > 120)

(E501)

33-33: Undefined name _VULNERABILITYREQUEST

(F821)

34-34: Undefined name _VULNERABILITYREQUEST

(F821)

35-35: Undefined name _VULNERABILITYREQUEST_PURLS

(F821)

36-36: Undefined name _VULNERABILITYREQUEST_PURLS

(F821)

37-37: Undefined name _CPERESPONSE

(F821)

38-38: Undefined name _CPERESPONSE

(F821)

39-39: Undefined name _CPERESPONSE_PURLS

(F821)

40-40: Undefined name _CPERESPONSE_PURLS

(F821)

41-41: Undefined name _VULNERABILITYRESPONSE

(F821)

42-42: Undefined name _VULNERABILITYRESPONSE

(F821)

43-43: Undefined name _VULNERABILITYRESPONSE_VULNERABILITIES

(F821)

44-44: Undefined name _VULNERABILITYRESPONSE_VULNERABILITIES

(F821)

45-45: Undefined name _VULNERABILITYRESPONSE_PURLS

(F821)

46-46: Undefined name _VULNERABILITYRESPONSE_PURLS

(F821)

47-47: Undefined name _VULNERABILITIES

(F821)

48-48: Undefined name _VULNERABILITIES

(F821)

src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py

6-6: Line too long (150 > 120)

(E501)

94-94: Too many arguments in function definition (10 > 5)

(PLR0913)

111-111: Too many arguments in function definition (10 > 5)

(PLR0913)

128-128: Too many arguments in function definition (10 > 5)

(PLR0913)

138-138: Line too long (131 > 120)

(E501)

🪛 LanguageTool

CHANGELOG.md

[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-03-27 ### Added - Add folder-scan subcommand - Add folder-has...

(REPEATED_VERBS)

🪛 GitHub Actions: Lint

src/scanoss/scanossgrpc.py

[error] 62-62: F401 [*] .api.dependencies.v2.scanoss_dependencies_pb2.DependencyResponse imported but unused. Remove unused import: .api.dependencies.v2.scanoss_dependencies_pb2.DependencyResponse.

[error] 531-531: F841 Local variable SUCCEDED_WITH_WARNINGS_STATUS_CODE is assigned to but never used. Remove assignment to unused variable SUCCEDED_WITH_WARNINGS_STATUS_CODE.

[error] 532-532: F841 Local variable FAILED_STATUS_CODE is assigned to but never used. Remove assignment to unused variable FAILED_STATUS_CODE.

src/scanoss/cli.py

[error] 69-69: F811 Redefinition of unused DEFAULT_POST_SIZE from line 47. Remove definition: DEFAULT_POST_SIZE.

[error] 70-70: F811 Redefinition of unused DEFAULT_TIMEOUT from line 49. Remove definition: DEFAULT_TIMEOUT.

[error] 72-72: F811 Redefinition of unused DEFAULT_RETRY from line 48. Remove definition: DEFAULT_RETRY.

[error] 1285-1285: PLE1310 String strip call contains duplicate characters.
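PLE1310 fires because `str.strip()` treats its argument as a set of characters, not a literal substring, so repeating a character changes nothing. The expression at cli.py:1285 is not quoted here, so the path and strip arguments below are hypothetical. The sketch shows the pitfall, and `str.removeprefix()` (Python 3.9+) as the explicit alternative if a literal prefix was intended.

```python
# Hypothetical illustration of Ruff PLE1310; the real expression at cli.py:1285 is not shown.
path = "./src/"

# strip() takes a *set* of characters: "./." is the same set as "./",
# so the duplicate "." is redundant and Ruff flags it.
redundant = path.strip("./.")
deduped = path.strip("./")
assert redundant == deduped

# If the intent was to drop only the literal "./" prefix, removeprefix() states that directly.
prefix_removed = path.removeprefix("./")
print(redundant, prefix_removed)
```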

🪛 YAMLlint (1.35.1)

.github/workflows/lint.yml

[error] 35-35: trailing spaces

(trailing-spaces)

[warning] 58-58: too many blank lines

(1 > 0) (empty-lines)

🔇 Additional comments (75)

.github/workflows/lint.yml (2)

41-41: Confirm multi-line output formatting.
The multi-line syntax using the GITHUB_OUTPUT variable is correctly applied in this section. Just ensure that this format remains consistent with other parts of the workflow if similar patterns are used.

44-45: Logging clarity for filtered files.
The added log messages for before and after file filtering provide useful feedback during the workflow execution. This transparency will help diagnose issues if the filtering does not behave as expected.

src/scanoss/api/provenance/v2/scanoss_provenance_pb2.py (1)

5-5: Auto-generated code changes look good!

These changes are part of auto-generated Protocol Buffer code, replacing the use of an intermediate _globals dictionary with direct calls to globals(). This is a standard pattern in newer versions of the protobuf compiler and doesn't affect functionality.

Also applies to: 21-40

src/scanoss/api/common/v2/scanoss_common_pb2.py (1)

14-16: Auto-generated code changes look good!

These changes in the Protocol Buffer generated code match the pattern seen in other files, using direct calls to globals() and adjusting descriptor assignments. This is expected behavior from the protobuf compiler and doesn't affect functionality.

Also applies to: 21-35

🧰 Tools

🪛 Ruff (0.8.2)

16-16: Line too long (845 > 120)

(E501)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (1)

6-6: Reformatting changes look good!

These changes improve code indentation and formatting consistency across the gRPC service definitions. The consolidation of import statements and consistent indentation of method parameters enhance readability without changing functionality.

Also applies to: 21-29, 54-63, 70-108

🧰 Tools

🪛 Ruff (0.8.2)

6-6: Line too long (138 > 120)

(E501)

src/protoc_gen_swagger/options/annotations_pb2.py (1)

18-18: Auto-generated code changes look good!

These changes follow the same pattern as other Protocol Buffer files in this PR, with consistent indentation of extension registrations and options assignments. This ensures uniform code style across the auto-generated files.

Also applies to: 23-30

🧰 Tools

🪛 Ruff (0.8.2)

18-18: Line too long (1009 > 120)

(E501)

docs/source/index.rst (1)

231-302: Well-structured documentation for new commands

The documentation for the new folder-scan and folder-hash commands follows the established pattern of the rest of the document. The descriptions and argument lists are clear and comprehensive.

src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (7)

6-6: Simplified import statement

Good cleanup of the import statement by removing unnecessary parentheses.

🧰 Tools

🪛 Ruff (0.8.2)

6-6: Line too long (130 > 120)

(E501)

20-39: Improved method indentation and formatting

The changes improve the readability of the stub methods by using consistent indentation.

47-52: Improved comment formatting

The Echo method documentation now uses a cleaner multi-line comment format.

54-73: Consistent comment formatting across methods

All method comments now follow the same multi-line format, improving consistency.

78-97: Improved RPC handler indentation

The indentation of the RPC method handlers is now more consistent, making the code more readable.

99-100: Consistent indentation for generic handler

The improved indentation for the generic handler aligns with the rest of the code.

111-176: Consistent parameter indentation in static methods

All static methods now have consistent indentation for parameters and return statements, improving code readability.

🧰 Tools

🪛 Ruff (0.8.2)

111-111: Too many arguments in function definition (10 > 5)

(PLR0913)

128-128: Too many arguments in function definition (10 > 5)

(PLR0913)

145-145: Too many arguments in function definition (10 > 5)

(PLR0913)

155-155: Line too long (123 > 120)

(E501)

162-162: Too many arguments in function definition (10 > 5)

(PLR0913)

172-172: Line too long (125 > 120)

(E501)

CHANGELOG.md (2)

12-17: Clear version documentation in changelog

The new version entry follows the established pattern and clearly lists the new features.

A minor grammatical note: the bullet points repeat the verb "Add" even though the section header already reads "Added".

🧰 Tools

🪛 LanguageTool

[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-03-27 ### Added - Add folder-scan subcommand - Add folder-has...

(REPEATED_VERBS)

500-501: Updated version comparison links

Version comparison links are properly updated to include the new version.

src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py (1)

19-48: This generated code looks good.

The restructuring of how descriptor options and serialized values are assigned maintains the same functionality while making the assignments more straightforward.

src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (1)

5-142: Formatting improvements look good.

The changes improve code readability through consistent indentation and spacing without altering functionality.

src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (1)

19-42: The new message types and method for folder hashing look good.

The addition of HFHRequest and HFHResponse message types along with the FolderHashScan method properly supports the folder hashing functionality being implemented.

🧰 Tools

🪛 Ruff (0.8.2)

19-19: Line too long (1925 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (379 > 120)

(E501)

27-27: Undefined name _SCANNING

(F821)

28-28: Undefined name _SCANNING

(F821)

29-29: Undefined name _SCANNING

(F821)

30-30: Undefined name _SCANNING

(F821)

30-30: Line too long (132 > 120)

(E501)

31-31: Undefined name _HFHREQUEST

(F821)

32-32: Undefined name _HFHREQUEST

(F821)

33-33: Undefined name _HFHREQUEST_CHILDREN

(F821)

34-34: Undefined name _HFHREQUEST_CHILDREN

(F821)

35-35: Undefined name _HFHRESPONSE

(F821)

36-36: Undefined name _HFHRESPONSE

(F821)

37-37: Undefined name _HFHRESPONSE_COMPONENT

(F821)

38-38: Undefined name _HFHRESPONSE_COMPONENT

(F821)

39-39: Undefined name _HFHRESPONSE_RESULT

(F821)

40-40: Undefined name _HFHRESPONSE_RESULT

(F821)

41-41: Undefined name _SCANNING

(F821)

42-42: Undefined name _SCANNING

(F821)

src/scanoss/cli.py (5)

34-42: Import organization looks good.

The necessary imports for folder hashing functionality are properly added.

494-533: New folder-scan command implementation looks good.

The command structure and argument definitions are well-organized and include helpful descriptions. The threshold validation with choices=range(1, 101) ensures valid input.
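A minimal sketch of that validation pattern follows. The flag name `--threshold` and the parser setup are hypothetical; the real folder-scan option may be named and documented differently. `type=int` is needed so the string from the command line is compared against the integers in the range, and a `metavar` keeps `--help` from listing all 100 choices.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction; only choices=range(1, 101) comes from the review.
    parser = argparse.ArgumentParser(prog="folder-scan")
    parser.add_argument(
        "--threshold",           # hypothetical flag name
        type=int,                # convert before the membership test against range()
        choices=range(1, 101),   # accepts 1..100 inclusive
        metavar="1-100",         # readable help text instead of {1,2,...,100}
        help="Similarity threshold (illustrative)",
    )
    return parser


args = build_parser().parse_args(["--threshold", "80"])
print(args.threshold)
```

Out-of-range values such as `101` make argparse print an error and raise `SystemExit`, so no extra validation code is needed after parsing.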

535-552: New folder-hash command implementation looks good.

The command is properly structured with appropriate arguments and descriptions.

554-568: Good implementation of settings options for both new commands.

Adding settings support for the new commands maintains consistency with existing commands.

1662-1672: Good implementation of settings helper function.

The get_scanoss_settings_from_args function effectively extracts and handles the settings loading logic for reuse.

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (18)

10-12: Docstring looks good
Clear documentation, no issues.

21-24: No concerns
The unary_unary channel method for the Echo RPC is properly set up.

25-29: No concerns
The unary_unary channel method for the FolderHashScan RPC is properly set up.

33-35: Docstring looks good
Readable docstring with clear comments on the servicer methods.

38-42: Stub method is fine
Raising NotImplementedError for the Echo method is standard for a stub definition.

44-49: Stub method is fine
Raising NotImplementedError for the FolderHashScan method is also standard for a stub definition.

54-58: No concerns
The 'Echo' method handler looks correct and consistent.

59-63: No concerns
The 'FolderHashScan' method handler is also defined correctly.

65-66: No concerns
The generic handler registration is properly structured.

70-70: No concerns
The comment noting the experimental status is self-explanatory.

72-74: Docstring looks good
Clear class-level docstring for the Scanning class.

77-77: Ignore 'too many arguments'
These arguments match the auto-generated gRPC stubs. Adjusting them is generally not recommended.

🧰 Tools

🪛 Ruff (0.8.2)

77-77: Too many arguments in function definition (10 > 5)

(PLR0913)

79-86: Method signature appears correct
All parameters align with gRPC usage; no issues noted.

87-87: No concerns
Return statement for the gRPC call is standard.

90-91: No concerns
Standard usage of gRPC call options.

93-94: Ignore 'too many arguments'
Same reasoning as for the Echo method: this is an auto-generated stub signature.

🧰 Tools

🪛 Ruff (0.8.2)

94-94: Too many arguments in function definition (10 > 5)

(PLR0913)

95-103: No concerns
The FolderHashScan method signature aligns with gRPC patterns.

104-108: No concerns
Return statement is consistent with unary_unary usage.

src/scanoss/utils/crc64.py (1)

1-54: Logic and structure look solid
The CRC64 implementation (table generation, initialization, and usage) is correct and follows the ECMA polynomial.
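For reference, the sketch below is a table-driven CRC-64 with the ECMA-182 polynomial (0x42F0E1EBA9EA3693), zero initial value, no bit reflection and no final XOR, plus a slow bitwise version to cross-check the table. Whether crc64.py uses exactly these parameters (rather than the reflected CRC-64/XZ variant with an all-ones init) is an assumption; the shift direction and table differ for reflected variants.

```python
# CRC-64/ECMA-182 parameters (assumed, not confirmed from crc64.py).
POLY = 0x42F0E1EBA9EA3693
MASK64 = 0xFFFFFFFFFFFFFFFF
TOP_BIT = 1 << 63


def _shift_byte(crc: int) -> int:
    """Advance the register by 8 bits, MSB-first, reducing by POLY on overflow."""
    for _ in range(8):
        crc = ((crc << 1) ^ POLY) & MASK64 if crc & TOP_BIT else (crc << 1) & MASK64
    return crc


# Precompute the contribution of every possible leading byte.
TABLE = [_shift_byte(i << 56) for i in range(256)]


def crc64_ecma(data: bytes, crc: int = 0) -> int:
    """Table-driven CRC: one lookup per byte; pass crc to continue a running value."""
    for byte in data:
        crc = ((crc << 8) & MASK64) ^ TABLE[((crc >> 56) ^ byte) & 0xFF]
    return crc


def crc64_ecma_bitwise(data: bytes) -> int:
    """Slow reference implementation used to cross-check the table."""
    crc = 0
    for byte in data:
        crc = _shift_byte(crc ^ (byte << 56))
    return crc
```

Because this variant has no final XOR, the running value can be fed back in, so a file can be checksummed in chunks.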

src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py (1)

19-64: Auto-generated protobuf code
Lines are flagged for exceeding line length (E501) and referencing undefined symbols, but these are typical in auto-generated code. Generally, no changes are recommended here.

🧰 Tools

🪛 Ruff (0.8.2)

19-19: Line too long (4142 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (400 > 120)

(E501)

27-27: Undefined name _CRYPTOGRAPHY

(F821)

28-28: Undefined name _CRYPTOGRAPHY

(F821)

28-28: Line too long (126 > 120)

(E501)

29-29: Undefined name _CRYPTOGRAPHY

(F821)

30-30: Undefined name _CRYPTOGRAPHY

(F821)

30-30: Line too long (138 > 120)

(E501)

31-31: Undefined name _CRYPTOGRAPHY

(F821)

32-32: Undefined name _CRYPTOGRAPHY

(F821)

32-32: Line too long (149 > 120)

(E501)

33-33: Undefined name _CRYPTOGRAPHY

(F821)

34-34: Undefined name _CRYPTOGRAPHY

(F821)

34-34: Line too long (145 > 120)

(E501)

35-35: Undefined name _CRYPTOGRAPHY

(F821)

36-36: Undefined name _CRYPTOGRAPHY

(F821)

36-36: Line too long (139 > 120)

(E501)

37-37: Undefined name _CRYPTOGRAPHY

(F821)

38-38: Undefined name _CRYPTOGRAPHY

(F821)

38-38: Line too long (141 > 120)

(E501)

39-39: Undefined name _ALGORITHM

(F821)

40-40: Undefined name _ALGORITHM

(F821)

41-41: Undefined name _ALGORITHMRESPONSE

(F821)

42-42: Undefined name _ALGORITHMRESPONSE

(F821)

43-43: Undefined name _ALGORITHMRESPONSE_PURLS

(F821)

44-44: Undefined name _ALGORITHMRESPONSE_PURLS

(F821)

45-45: Undefined name _ALGORITHMSINRANGERESPONSE

(F821)

46-46: Undefined name _ALGORITHMSINRANGERESPONSE

(F821)

47-47: Undefined name _ALGORITHMSINRANGERESPONSE_PURL

(F821)

48-48: Undefined name _ALGORITHMSINRANGERESPONSE_PURL

(F821)

49-49: Undefined name _VERSIONSINRANGERESPONSE

(F821)

50-50: Undefined name _VERSIONSINRANGERESPONSE

(F821)

51-51: Undefined name _VERSIONSINRANGERESPONSE_PURL

(F821)

52-52: Undefined name _VERSIONSINRANGERESPONSE_PURL

(F821)

53-53: Undefined name _HINT

(F821)

54-54: Undefined name _HINT

(F821)

55-55: Undefined name _HINTSRESPONSE

(F821)

56-56: Undefined name _HINTSRESPONSE

(F821)

57-57: Undefined name _HINTSRESPONSE_PURLS

(F821)

58-58: Undefined name _HINTSRESPONSE_PURLS

(F821)

59-59: Undefined name _HINTSINRANGERESPONSE

(F821)

60-60: Undefined name _HINTSINRANGERESPONSE

(F821)

61-61: Undefined name _HINTSINRANGERESPONSE_PURL

(F821)

62-62: Undefined name _HINTSINRANGERESPONSE_PURL

(F821)

63-63: Undefined name _CRYPTOGRAPHY

(F821)

64-64: Undefined name _CRYPTOGRAPHY

(F821)

src/scanoss/scanners/scanner_hfh.py (2)

49-55: Validate client existence.

If client is not provided (i.e., None), the subsequent gRPC call in scan() will fail. Consider adding a check to exit early or initialize a default ScanossGrpc client.
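A minimal sketch of the fail-fast option follows. `ScannerHFHSketch` and its parameters are hypothetical stand-ins for the scanner in scanner_hfh.py, whose actual constructor signature is not shown here. The other option, building a default ScanossGrpc client inside the constructor, is not sketched because its constructor arguments are not quoted in this review.

```python
from typing import Optional


class ScannerHFHSketch:
    """Hypothetical stand-in for the folder-hashing scanner; names are illustrative."""

    def __init__(self, scan_dir: str, client: Optional[object] = None):
        if client is None:
            # Fail at construction time instead of with an AttributeError inside scan().
            raise ValueError("A gRPC client is required for folder hashing scans")
        self.scan_dir = scan_dir
        self.client = client
```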

152-160: Docstring mismatch for _format_plain_output().

While the method name and docstring refer to a "plain text" format, it actually returns JSON if scan_results is a dictionary. Update the docstring or convert the output to actual plain text for consistency.
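If the plain-text route is preferred, one possible approach is to render nested results as indented `key: value` lines. The function name, the empty-result message and the layout below are assumptions for illustration, not the project's output format.

```python
from typing import Any, List


def format_plain_output(scan_results: Any) -> str:
    """Render results as indented 'key: value' lines instead of JSON (illustrative)."""
    if not scan_results:
        return "No results to present"  # hypothetical message
    if not isinstance(scan_results, dict):
        return str(scan_results)
    lines: List[str] = []
    _render(scan_results, lines, 0)
    return "\n".join(lines)


def _render(value: Any, lines: List[str], depth: int) -> None:
    indent = "  " * depth
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)):
                lines.append(f"{indent}{key}:")
                _render(item, lines, depth + 1)
            else:
                lines.append(f"{indent}{key}: {item}")
    elif isinstance(value, list):
        for item in value:
            if isinstance(item, (dict, list)):
                lines.append(f"{indent}-")
                _render(item, lines, depth + 1)
            else:
                lines.append(f"{indent}- {item}")
```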

src/scanoss/utils/abstract_presenter.py (1)

1-69: Looks good overall.

The abstract methods _format_json_output() and _format_plain_output() are well-defined, and the fallback behavior for invalid or unspecified format is coherent with the docstring. No immediate issues found.
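For readers unfamiliar with the pattern, here is a minimal sketch of an abstract presenter with the two abstract hooks named above. The class names, the string return values and the plain-text fallback for unknown formats are assumptions; the real AbstractPresenter may also handle output files and define its fallback differently.

```python
import json
from abc import ABC, abstractmethod


class PresenterSketch(ABC):
    """Illustrative presenter; the real AbstractPresenter may differ in names and I/O."""

    def __init__(self, output_format=None):
        self.output_format = output_format

    def present(self) -> str:
        # Assumed fallback: unspecified or unknown formats use plain text.
        if self.output_format == "json":
            return self._format_json_output()
        return self._format_plain_output()

    @abstractmethod
    def _format_json_output(self) -> str: ...

    @abstractmethod
    def _format_plain_output(self) -> str: ...


class DictPresenter(PresenterSketch):
    """Concrete example that presents a flat dictionary."""

    def __init__(self, data, output_format=None):
        super().__init__(output_format)
        self.data = data

    def _format_json_output(self) -> str:
        return json.dumps(self.data, indent=2)

    def _format_plain_output(self) -> str:
        return "\n".join(f"{key}: {value}" for key, value in self.data.items())
```

Subclasses only implement the two formatting hooks, while format selection lives in one place in the base class.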

src/scanoss/scanossgrpc.py (2)

468-484: Confirm request data for folder-hash scanning.

Ensure that the request dictionary aligns with HFH requirements. Consider validating that 'root' and other required fields exist before calling _call_rpc.
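A minimal pre-flight check might look like the sketch below. Only `root` is named in the review, so `REQUIRED_HFH_FIELDS` is a hypothetical list, and the authoritative field set is whatever HFHRequest defines in the scanning proto. The check would run before the request is handed to `_call_rpc`.

```python
from typing import Any, Dict

# Hypothetical required fields; extend to match the actual HFHRequest message.
REQUIRED_HFH_FIELDS = ("root",)


def validate_hfh_request(request: Dict[str, Any]) -> None:
    """Raise early with a clear message instead of failing inside the gRPC call."""
    if not isinstance(request, dict):
        raise TypeError("folder hash request must be a dict")
    missing = [field for field in REQUIRED_HFH_FIELDS if not request.get(field)]
    if missing:
        raise ValueError(f"folder hash request is missing required field(s): {', '.join(missing)}")
```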

485-523: Robust gRPC call structure.

The _call_rpc() approach for parsing requests, appending metadata, and managing errors neatly centralizes gRPC logic. No further concerns.

src/scanoss/results.py (6)

53-114: Well-structured implementation of the new ResultsPresenter class

The creation of a dedicated presenter class that inherits from AbstractPresenter is a good design choice that separates presentation logic from data processing. This follows the Single Responsibility Principle and makes the codebase more maintainable.

The implementation handles edge cases well, particularly in the _format_json_output method with appropriate exception handling, and in _format_plain_output_item with fallbacks for missing data.

93-94: Fixed docstring and return value mismatch

The method now correctly returns a message string when there are no results to present, addressing the previous review comment.

146-156: Clean integration of the presenter in the Results class

The initialization of the ResultsPresenter with the necessary parameters is well-structured. The Results class now delegates presentation responsibilities to the specialized presenter class, creating a cleaner separation of concerns.

171-171: Improved error handling method

Using self.base.print_stderr instead of self.print_stderr is consistent with the refactoring pattern in this class.

246-249: Enhanced error specificity

Replacing a generic Exception with ValueError provides more specific error information, which is a good practice for error handling. This makes debugging easier and provides clearer feedback to users.

262-263: Clean delegation to presenter

The present method now simply delegates to the presenter's present method, which is a clean implementation of the delegation pattern.

src/scanoss/api/components/v2/scanoss_components_pb2.py (1)

19-60: Auto-generated Protobuf file updates

The updates to the DESCRIPTOR and serialized options are consistent with the changes needed to support the new folder scanning and hashing functionality mentioned in the PR objectives.

Note that static analysis tools flag several undefined names (like _COMPONENTS), but these are expected in auto-generated Protobuf files as they are defined at runtime by the Protobuf machinery.

🧰 Tools

🪛 Ruff (0.8.2)

19-19: Line too long (3679 > 120)

(E501)

23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks

Replace with not _descriptor._USE_C_DESCRIPTORS

(E712)

26-26: Line too long (389 > 120)

(E501)

27-27: Undefined name _COMPONENTS

(F821)

28-28: Undefined name _COMPONENTS

(F821)

28-28: Line too long (122 > 120)

(E501)

29-29: Undefined name _COMPONENTS

(F821)

30-30: Undefined name _COMPONENTS

(F821)

30-30: Line too long (136 > 120)

(E501)

31-31: Undefined name _COMPONENTS

(F821)

32-32: Undefined name _COMPONENTS

(F821)

32-32: Line too long (139 > 120)

(E501)

33-33: Undefined name _COMPONENTS

(F821)

34-34: Undefined name _COMPONENTS

(F821)

34-34: Line too long (144 > 120)

(E501)

35-35: Undefined name _COMPSEARCHREQUEST

(F821)

36-36: Undefined name _COMPSEARCHREQUEST

(F821)

37-37: Undefined name _COMPSTATISTIC

(F821)

38-38: Undefined name _COMPSTATISTIC

(F821)

39-39: Undefined name _COMPSTATISTIC_LANGUAGE

(F821)

40-40: Undefined name _COMPSTATISTIC_LANGUAGE

(F821)

41-41: Undefined name _COMPSTATISTICRESPONSE

(F821)

42-42: Undefined name _COMPSTATISTICRESPONSE

(F821)

43-43: Undefined name _COMPSTATISTICRESPONSE_PURLS

(F821)

44-44: Undefined name _COMPSTATISTICRESPONSE_PURLS

(F821)

45-45: Undefined name _COMPSEARCHRESPONSE

(F821)

46-46: Undefined name _COMPSEARCHRESPONSE

(F821)

47-47: Undefined name _COMPSEARCHRESPONSE_COMPONENT

(F821)

48-48: Undefined name _COMPSEARCHRESPONSE_COMPONENT

(F821)

49-49: Undefined name _COMPVERSIONREQUEST

(F821)

50-50: Undefined name _COMPVERSIONREQUEST

(F821)

51-51: Undefined name _COMPVERSIONRESPONSE

(F821)

52-52: Undefined name _COMPVERSIONRESPONSE

(F821)

53-53: Undefined name _COMPVERSIONRESPONSE_LICENSE

(F821)

54-54: Undefined name _COMPVERSIONRESPONSE_LICENSE

(F821)

55-55: Undefined name _COMPVERSIONRESPONSE_VERSION

(F821)

56-56: Undefined name _COMPVERSIONRESPONSE_VERSION

(F821)

57-57: Undefined name _COMPVERSIONRESPONSE_COMPONENT

(F821)

58-58: Undefined name _COMPVERSIONRESPONSE_COMPONENT

(F821)

59-59: Undefined name _COMPONENTS

(F821)

60-60: Undefined name _COMPONENTS

(F821)

src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (1)

21-29: Improved code formatting and readability

The changes to indentation and parameter alignment improve readability without changing functionality, consistent with the formatting updates in the other gRPC files in this PR.

Also applies to: 37-39, 45-46, 54-63, 65-66, 77-91, 94-108

src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py (1)

19-40: Auto-generated Protobuf file updates

The updates to the DESCRIPTOR and serialized options/indices are consistent with the protobuf definitions needed to support the new functionality mentioned in the PR objectives.

Similar to other Protobuf files, the static analysis tools flag undefined variables that are expected to be defined at runtime by the Protobuf machinery.

src/scanoss/file_filters.py (3)

245-249: Validate newly added extensions.

Adding .whml, .pom, .smtml, .min.js, .mf, and .base64 to the skipped extensions is reasonable, but confirm that skipping them aligns with the overall scanning and hashing objectives. Consider checking each extension with product owners or security leads so that critical files are not skipped by accident.
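One detail worth checking is the multi-part `.min.js` entry: `os.path.splitext()` and `Path.suffix` only return the last component (`.js`), so a suffix-based lookup would never match it. The sketch below uses `str.endswith()` instead; the set contains only the extensions named in this comment, and the function name is hypothetical.

```python
# Only the extensions named in this review; the real list in file_filters.py is longer.
SKIPPED_EXTENSIONS = (".whml", ".pom", ".smtml", ".min.js", ".mf", ".base64")


def has_skipped_extension(filename: str) -> bool:
    """Case-insensitive suffix match that also handles multi-part extensions."""
    return filename.lower().endswith(SKIPPED_EXTENSIONS)
```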

460-468: Align with docstring defaults.

All newly assigned fields from **kwargs match the docstrings, which keeps them consistent. However, check whether defaulting is_folder_hashing_scan to False is appropriate in every context, or whether a None default would make detection safer.

527-556: Additional checks for hidden files and size-based skips look solid.

The newly introduced logic for skipping hidden files/folders and empty files is consistent with the docstring. This helps to ensure you do not hash irrelevant or privileged data.
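One subtlety with hidden-path checks is evaluating them relative to the scan root. Otherwise, scanning a project that lives under a hidden ancestor (for example `~/.cache/project`) would skip every file. The sketch below assumes the same two rules described above (hidden path components and zero-byte files); the function name is hypothetical.

```python
from pathlib import Path


def should_skip(root: Path, path: Path) -> bool:
    """Skip hidden entries below the scan root and empty files (illustrative rules)."""
    rel = path.relative_to(root)
    # Only components below the root count, so a hidden ancestor of the root is ignored.
    if any(part.startswith(".") for part in rel.parts):
        return True
    # Empty files carry no content to fingerprint.
    return path.is_file() and path.stat().st_size == 0
```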

src/scanoss/scanners/folder_hasher.py (3)

158-159: Use a chunk-based approach for hashing large files.
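Reading a whole file with a single `read()` keeps it entirely in memory. A chunked loop feeds a fixed-size buffer to an incremental hasher instead. The sketch uses `hashlib.md5` only as a stand-in, since the hash used at lines 158-159 is not quoted; the same loop works with any hasher that accepts a running state, such as a CRC64 that takes the previous value. On Python 3.11+, `hashlib.file_digest()` implements the same idea.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # 64 KiB; a common buffer size, not a project setting


def hash_stream(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a binary stream incrementally so memory use stays bounded."""
    digest = hashlib.md5()  # stand-in algorithm
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def hash_file(path: str) -> str:
    with open(path, "rb") as fh:
        return hash_stream(fh)


# Usage: an in-memory stream stands in for a large file.
print(hash_stream(io.BytesIO(b"abc" * 100_000), chunk_size=7))
```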

166-173: Avoid duplicating file references in both parent and root nodes.
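The review does not describe the duplicated structure in detail, so the following is a hypothetical directory tree, not the folder_hasher.py data model. It stores each file exactly once, on its immediate parent, and derives the full set of files below any node by traversal when needed.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, Iterator, List


@dataclass
class DirNode:
    """Hypothetical tree node: files holds only entries directly in this directory."""

    name: str
    files: List[str] = field(default_factory=list)
    children: Dict[str, "DirNode"] = field(default_factory=dict)

    def add_file(self, rel_path: str) -> None:
        parts = PurePosixPath(rel_path).parts
        node = self
        for part in parts[:-1]:
            node = node.children.setdefault(part, DirNode(part))
        # Attach the file once, to its immediate parent, not to every ancestor.
        node.files.append(parts[-1])

    def iter_files(self, prefix: str = "") -> Iterator[str]:
        """Yield every file in this subtree, derived on demand rather than stored twice."""
        for name in self.files:
            yield f"{prefix}{name}"
        for child in self.children.values():
            yield from child.iter_files(f"{prefix}{child.name}/")
```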

245-249: Overall approach is sound.

Your simhash for names/content and fallback conditions for insufficient file sets are well-structured. No concerns with the rest of the hashing pipeline.

src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (4)

6-6: Trivial line length warning.

The import line is reported by static analysis as exceeding the recommended length. Typically, these gRPC auto-generated lines are not a priority to fix unless your codebase enforces strict line length rules.

🧰 Tools

🪛 Ruff (0.8.2)

6-6: Line too long (138 > 120)

(E501)

30-49: Introducing new RPC stubs.

The stubs for GetAlgorithmsInRange, GetVersionsInRange, GetHintsInRange, and GetEncryptionHints conform to the existing naming conventions and usage of request/response types. Implementation detail remains consistent with gRPC best practices.

102-131: RPC method registration is correct.

The method handlers reference the correct request/response objects. Properly updates the server with new endpoints. No issues found.

145-244: Inline function definitions with many optional parameters.

The gRPC experimental stubs define many optional parameters (e.g., channel_credentials, call_credentials). This is standard for advanced usage, yet triggers static analysis warnings. Usually safe to ignore for auto-generated code.

🧰 Tools

🪛 Ruff (0.8.2)

145-145: Too many arguments in function definition (10 > 5)

(PLR0913)

162-162: Too many arguments in function definition (10 > 5)

(PLR0913)

179-179: Too many arguments in function definition (10 > 5)

(PLR0913)

189-189: Line too long (127 > 120)

(E501)

196-196: Too many arguments in function definition (10 > 5)

(PLR0913)

206-206: Line too long (125 > 120)

(E501)

213-213: Too many arguments in function definition (10 > 5)

(PLR0913)

223-223: Line too long (122 > 120)

(E501)

230-230: Too many arguments in function definition (10 > 5)

(PLR0913)

240-240: Line too long (125 > 120)

(E501)

src/scanoss/utils/simhash.py (6)

4-4: Copyright year looks correct.

The header reads Copyright (c) 2025, SCANOSS. This PR was opened in February 2025 and the new CHANGELOG entry is dated 2025-03-27, so 2025 is the current year and it should not be changed to 2024.

33-39: LGTM: Well-implemented FNV-1 hashing algorithm.

The implementation of the 64-bit FNV-1 hash function follows the standard algorithm with correct usage of the offset basis, prime multiplier, and bit masking.
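For comparison, here is a standalone 64-bit FNV-1 with the standard constants. It is a reference sketch, not the simhash.py source. FNV-1 multiplies before XOR-ing each byte; FNV-1a reverses the order, and the two produce different values.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1_64(data: bytes) -> int:
    """64-bit FNV-1: multiply by the prime, then XOR in the byte."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h = (h * FNV64_PRIME) & MASK64  # keep the product within 64 bits
        h ^= byte
    return h
```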

42-53: LGTM: Well-structured SimhashFeature class.

The class provides a clean encapsulation of hash values with weights, which is essential for the simhash algorithm. The methods are well-documented and appropriately named.

66-81: LGTM: Vectorize implementation correctly handles feature weights.

The vectorize function properly processes feature weights, adding them for set bits and subtracting for unset bits, which is essential for creating an accurate representation of the feature space.
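The voting step can be sketched as follows, using plain `(hash, weight)` tuples in place of the SimhashFeature class. Treating a zero vote as an unset bit is an assumption, since implementations differ on ties.

```python
from typing import List, Tuple


def vectorize(features: List[Tuple[int, int]]) -> List[int]:
    """Accumulate per-bit votes: +weight where the feature hash has a 1, -weight where it has a 0."""
    v = [0] * 64
    for h, weight in features:
        for bit in range(64):
            if (h >> bit) & 1:
                v[bit] += weight
            else:
                v[bit] -= weight
    return v


def fingerprint(v: List[int]) -> int:
    """Set bit i when its accumulated vote is strictly positive (tie handling assumed)."""
    f = 0
    for bit, score in enumerate(v):
        if score > 0:
            f |= 1 << bit
    return f
```

Heavier features pull each bit toward their own value, which is why similar inputs end up with fingerprints that differ in only a few bits.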

112-122: LGTM: Efficient Hamming distance calculation.

This compare function computes the Hamming distance by XOR-ing the two fingerprints and counting the set bits with Kernighan's trick (v &= v - 1), which loops once per set bit rather than once per bit position.
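A standalone version of that technique, for reference (not the simhash.py source). On Python 3.10+, `(a ^ b).bit_count()` gives the same result in a single built-in call.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count differing bits between two fingerprints using Kernighan's trick."""
    v = a ^ b
    count = 0
    while v:
        v &= v - 1  # clears the lowest set bit
        count += 1
    return count
```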

183-198: Add input validation to shingle function.

The shingle function should validate that input list b is not empty and contains only bytes objects.
 def shingle(w: int, b: list) -> list:
     if w < 1:
         raise ValueError('simhash.shingle(): k must be a positive integer')
+    if not b:
+        raise ValueError('simhash.shingle(): input list cannot be empty')
+    if not all(isinstance(x, bytes) for x in b):
+        raise TypeError('simhash.shingle(): all elements must be bytes objects')
     if w == 1:
         return b
     w = min(w, len(b))
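
The suggested diff stops at `w = min(w, len(b))`, so the remainder below is an assumption: it joins every run of `w` consecutive byte strings. The sketch also names the window parameter `w` in the error message; the existing message refers to `k`. Whether an empty list should raise or return `[]` depends on the callers, such as directories with no eligible files, so confirm that before adopting the stricter check.

```python
from typing import List


def shingle(w: int, b: List[bytes]) -> List[bytes]:
    """Join every run of w consecutive byte strings (window semantics assumed)."""
    if w < 1:
        raise ValueError("shingle(): w must be a positive integer")
    if not b:
        raise ValueError("shingle(): input list cannot be empty")
    if not all(isinstance(x, bytes) for x in b):
        raise TypeError("shingle(): all elements must be bytes objects")
    if w == 1:
        return list(b)
    w = min(w, len(b))  # a window longer than the input collapses to one shingle
    return [b"".join(b[i:i + w]) for i in range(len(b) - w + 1)]
```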
src/protoc_gen_swagger/options/openapiv2_pb2.py (1)

18-117: LGTM: Auto-generated protobuf code restoration.

This correctly restores the serialized options and serialized start/end positions for the protocol buffer definitions, which is important for maintaining compatibility with the protobuf compiler.

The static analysis warnings for undefined names (F821) are false positives, as these variables are dynamically defined during the Protocol Buffer build process. Since this is an auto-generated file (as noted in the header comment), these warnings can be safely ignored.

🧰 Tools

🪛 Ruff (0.8.2)

- 18-18: Line too long (9607 > 120) (E501)
- 22-22: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks. Replace with not _descriptor._USE_C_DESCRIPTORS (E712)
- 26, 27: Undefined name _SWAGGER_RESPONSESENTRY (F821)
- 28, 29: Undefined name _SWAGGER_EXTENSIONSENTRY (F821)
- 30, 31: Undefined name _OPERATION_RESPONSESENTRY (F821)
- 32, 33: Undefined name _OPERATION_EXTENSIONSENTRY (F821)
- 34, 35: Undefined name _RESPONSE_HEADERSENTRY (F821)
- 36, 37: Undefined name _RESPONSE_EXAMPLESENTRY (F821)
- 38, 39: Undefined name _RESPONSE_EXTENSIONSENTRY (F821)
- 40, 41: Undefined name _INFO_EXTENSIONSENTRY (F821)
- 42, 43: Undefined name _SCHEMA (F821)
- 44, 45: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY (F821)
- 46, 47: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY (F821)
- 48, 49: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY (F821)
- 50, 51: Undefined name _SCOPES_SCOPEENTRY (F821)
- 52, 53: Undefined name _SWAGGER (F821)
- 54, 55: Undefined name _SWAGGER_RESPONSESENTRY (F821)
- 56, 57: Undefined name _SWAGGER_EXTENSIONSENTRY (F821)
- 58, 59: Undefined name _SWAGGER_SWAGGERSCHEME (F821)
- 60, 61: Undefined name _OPERATION (F821)
- 62, 63: Undefined name _OPERATION_RESPONSESENTRY (F821)
- 64, 65: Undefined name _OPERATION_EXTENSIONSENTRY (F821)
- 66, 67: Undefined name _HEADER (F821)
- 68, 69: Undefined name _RESPONSE (F821)
- 70, 71: Undefined name _RESPONSE_HEADERSENTRY (F821)
- 72, 73: Undefined name _RESPONSE_EXAMPLESENTRY (F821)
- 74, 75: Undefined name _RESPONSE_EXTENSIONSENTRY (F821)
- 76, 77: Undefined name _INFO (F821)
- 78, 79: Undefined name _INFO_EXTENSIONSENTRY (F821)
- 80, 81: Undefined name _CONTACT (F821)
- 82, 83: Undefined name _LICENSE (F821)
- 84, 85: Undefined name _EXTERNALDOCUMENTATION (F821)
- 86, 87: Undefined name _SCHEMA (F821)
- 88, 89: Undefined name _JSONSCHEMA (F821)
- 90, 91: Undefined name _JSONSCHEMA_JSONSCHEMASIMPLETYPES (F821)
- 92, 93: Undefined name _TAG (F821)
- 94, 95: Undefined name _SECURITYDEFINITIONS (F821)
- 96, 97: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY (F821)
- 98, 99: Undefined name _SECURITYSCHEME (F821)
- 100, 101: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY (F821)
- 102, 103: Undefined name _SECURITYSCHEME_TYPE (F821)
- 104, 105: Undefined name _SECURITYSCHEME_IN (F821)
- 106, 107: Undefined name _SECURITYSCHEME_FLOW (F821)
- 108, 109: Undefined name _SECURITYREQUIREMENT (F821)
- 110, 111: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTVALUE (F821)
- 112, 113: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY (F821)
- 114, 115: Undefined name _SCOPES (F821)
- 116, 117: Undefined name _SCOPES_SCOPEENTRY (F821)

src/scanoss/cli.py

coderabbitai · 2025-03-27T11:29:00Z

src/scanoss/utils/crc64.py

+    @classmethod
+    def get_hash_buff(cls, buff: bytes) -> List[bytes]:
+        """
+        Get the hash value of the given buffer, and converts it to 8 bytes in big-endian order.
+
+        Args:
+            buff (bytes): The buffer to get the hash value of.
+
+        Returns:
+            bytes: The hash value of the given buffer, and converts it to 8 bytes in big-endian order.
+        """
+        crc = cls()
+        crc.update(buff)
+        hash_val = crc.digest()
+
+        return list(struct.pack('>Q', hash_val))


🛠️ Refactor suggestion

Align docstring with return type
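The return annotation (List[bytes]) and the docstring (bytes) both disagree with the actual return value. list(struct.pack('>Q', hash_val)) produces a list of eight integers, one per byte. Decide whether callers need a bytes object or a list. For consistency with the docstring, consider returning actual bytes. Note that folder_hasher.py consumes this value (key = CRC64.get_hash_buff(file_bytes)), so confirm that its callers work with bytes before switching:

-    def get_hash_buff(cls, buff: bytes) -> List[bytes]:
+    def get_hash_buff(cls, buff: bytes) -> bytes:
         ...
-        return list(struct.pack('>Q', hash_val))
+        return struct.pack('>Q', hash_val)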


Suggested change

    @classmethod
    def get_hash_buff(cls, buff: bytes) -> bytes:
        """
        Get the hash value of the given buffer, converted to 8 bytes in big-endian order.

        Args:
            buff (bytes): The buffer to get the hash value of.

        Returns:
            bytes: The hash value of the given buffer as 8 bytes in big-endian order.
        """
        crc = cls()
        crc.update(buff)
        hash_val = crc.digest()

        return struct.pack('>Q', hash_val)
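The mismatch is easy to see in isolation. struct.pack returns bytes, and wrapping the result in list() yields one int per byte.

```python
import struct

hash_val = 0x0102030405060708
packed = struct.pack('>Q', hash_val)  # 8 bytes, most significant first
as_list = list(packed)                # iterating bytes yields ints

# packed  == b'\x01\x02\x03\x04\x05\x06\x07\x08'
# as_list == [1, 2, 3, 4, 5, 6, 7, 8]
restored = struct.unpack('>Q', packed)[0]  # round-trips to the original value
```

If a list is genuinely needed, List[int] is the accurate annotation, and bytes(as_list) converts it back.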

src/scanoss/scanossgrpc.py

coderabbitai · 2025-03-27T11:29:00Z

src/scanoss/file_filters.py

+            dir_path = Path(dirpath)
+            rel_path = dir_path.relative_to(root_path)
+            if dir_path.is_symlink():  # TODO should we skip symlink folders?
+                self.print_msg(f'WARNING: Found symbolic link folder: {dir_path}')

-            if self._should_skip_dir(str(rel_path)):  # Current directory should be skipped
+            if self.should_skip_dir(str(rel_path)):  # Current directory should be skipped
                dirnames.clear()
                continue
            for filename in filenames:
-                file_path = dirpath / filename
+                file_path = dir_path / filename


🛠️ Refactor suggestion

Consider skipping symlink directories.

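Currently, a warning is logged when a symlinked folder is encountered, but the folder is still processed unless the directory-skip logic excludes it. Consider skipping symlinked directories altogether to avoid cyclical references or unexpectedly large expansions. If this loop is driven by os.walk (the dirnames.clear() pruning suggests it is), note that os.walk only descends into symlinked subdirectories when followlinks=True. With the default followlinks=False, they are listed in dirnames but never yielded as dirpath, so this warning would only fire for a symlinked scan root.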
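Below is a minimal sketch of a skip option, assuming an os.walk-based traversal like the one in the diff. Assigning to dirnames[:] rather than rebinding the name is what tells os.walk not to descend. The dirnames.clear() call in the diff relies on the same mechanism for skipped directories. The names walk_files and skip_symlink_dirs are hypothetical.

```python
import os
from pathlib import Path
from typing import List


def walk_files(root: str, skip_symlink_dirs: bool = True) -> List[str]:
    """List files under root as relative paths, optionally skipping symlinked directories."""
    root_path = Path(root)
    found = []
    # followlinks=True is deliberate here: it shows that in-place pruning of
    # dirnames is what keeps os.walk out of symlinked directories.
    for dirpath, dirnames, filenames in os.walk(root_path, followlinks=True):
        dir_path = Path(dirpath)
        if skip_symlink_dirs:
            dirnames[:] = [d for d in dirnames if not (dir_path / d).is_symlink()]
        for filename in filenames:
            found.append(str((dir_path / filename).relative_to(root_path)))
    return sorted(found)
```

The os.walk documentation warns that followlinks=True can cause infinite recursion if a link points to one of its own parent directories, because walk() does not track the directories it has already visited. Pruning symlinked entries avoids that case.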

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

tests/test_file_filters.py (2)
195-211: Consider adding tests for edge cases.

The current test covers the basic case where files are relative to the base directory. Consider adding tests for:

- When base_dir is not provided (defaults to None)
- When files are not relative to the base directory
- When the base directory itself contains characters that need special handling

204-204: Parameter name inconsistency.

The test refers to this argument as base_dir and passes it positionally, while the implementation names the parameter scan_root. This doesn't affect behavior, but passing the argument by keyword makes the mapping explicit and keeps the naming consistent.
-        filtered_files = self.file_filters.get_filtered_files_from_files(files, self.test_dir)
+        filtered_files = self.file_filters.get_filtered_files_from_files(files, scan_root=self.test_dir)
src/scanoss/cli.py (1)
1657-1667: Improve the get_scanoss_settings_from_args function to handle all scenarios.

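Because return scanoss_settings is indented inside the if block, the function has no explicit return when args.skip_settings_file is True and falls through to an implicit None. That result happens to match the intended value, but it obscures what gets returned. Moving the return out of the if block makes both paths explicit. The suggestion also adds a docstring: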
 def get_scanoss_settings_from_args(args):
+    """Extract SCANOSS settings from command-line arguments
+
+    Args:
+        args: Parsed command-line arguments
+
+    Returns:
+        ScanossSettings or None: The loaded settings object or None if settings should be skipped
+    """
     scanoss_settings = None
     if not args.skip_settings_file:
         scanoss_settings = ScanossSettings(debug=args.debug, trace=args.trace, quiet=args.quiet)
         try:
             scanoss_settings.load_json_file(args.settings, args.scan_dir).set_file_type('new').set_scan_type('identify')
         except ScanossSettingsError as e:
             print_stderr(f'Error: {e}')
             sys.exit(1)
-        return scanoss_settings
+    return scanoss_settings

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b2e026 and 4edc219.

📒 Files selected for processing (3)

src/scanoss/cli.py (20 hunks)
src/scanoss/scanossgrpc.py (7 hunks)
tests/test_file_filters.py (1 hunks)

🧰 Additional context used

🧬 Code Definitions (3)

tests/test_file_filters.py (1)

src/scanoss/file_filters.py (1)

get_filtered_files_from_files (514-564)

src/scanoss/cli.py (4)

src/scanoss/scanners/folder_hasher.py (3)

FolderHasher (67-260)

create_folder_hasher_config_from_args (55-64)

hash_directory (105-124)

src/scanoss/scanossgrpc.py (1)

ScanossGrpc (95-609)

src/scanoss/scanners/scanner_hfh.py (2)

ScannerHFH (41-130)

scan (94-126)

src/scanoss/scanoss_settings.py (2)

ScanossSettings (72-303)

load_json_file (99-134)

src/scanoss/scanossgrpc.py (2)

src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (1)

ProvenanceStub (9-29)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (3)

ScanningStub (9-29)

FolderHashScan (44-49)

FolderHashScan (94-108)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: build

🔇 Additional comments (13)

tests/test_file_filters.py (2)

200-204: LGTM: Updated test with new base_dir parameter.

The test now passes self.test_dir as the new base-directory argument of get_filtered_files_from_files. This aligns with the implementation in src/scanoss/file_filters.py, where the method accepts that argument as the scan_root parameter.

207-209: LGTM: Simplified expected results.

The expected results have been simplified to use direct string representations of file paths instead of using os.path.relpath. This change makes the test more readable and directly represents the expected output format of the get_filtered_files_from_files method with the new parameter.
src/scanoss/cli.py (5)

489-528: Well-structured argument parsing for the new folder-scan command.

The implementation of the folder-scan command options is clean and comprehensive, with appropriate defaults and descriptions.

I particularly like the validation for the threshold parameter using choices=range(1, 101) with a descriptive metavar='1-100' that clearly communicates the valid range to users.
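Here is a standalone sketch of the same pattern. The program name and help text are placeholders, not the real CLI's. The type=int argument is required because argparse converts the value before checking it against the range. Without metavar, argparse would render the choices as {1,2,...,100} in the usage line.

```python
import argparse

parser = argparse.ArgumentParser(prog='folder-scan-sketch')
parser.add_argument(
    '--threshold',
    type=int,               # convert the string before the choices check
    choices=range(1, 101),  # accepts 1..100 inclusive
    metavar='1-100',        # keeps usage/help from listing all 100 values
    help='Match threshold as a percentage',
)
```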

530-548: Good implementation of the folder-hash command with clear options.

The folder-hash command is well-structured with appropriate arguments.

549-562: Consistent settings options across commands.

Adding settings options to both new commands and existing ones provides a consistent user experience and reuses existing functionality.

1584-1622: Enhance error handling in the folder_hashing_scan function.

While the basic error handling is in place, consider adding:

- More specific error types for better error reporting
- Cleanup of resources in case of failures
- Logging of scan progress and errors
 def folder_hashing_scan(parser, args):
     try:
         if not args.scan_dir:
             print_stderr('ERROR: Please specify a directory to scan')
             parser.parse_args([args.subparser, '-h'])
             sys.exit(1)

         if not os.path.exists(args.scan_dir) or not os.path.isdir(args.scan_dir):
             print_stderr(f'ERROR: The specified directory {args.scan_dir} does not exist')
             sys.exit(1)

+        if not os.access(args.scan_dir, os.R_OK):
+            print_stderr(f'ERROR: No read permission for directory {args.scan_dir}')
+            sys.exit(1)
+
         scanner_config = create_scanner_config_from_args(args)
         scanoss_settings = get_scanoss_settings_from_args(args)
         grpc_config = create_grpc_config_from_args(args)

         client = ScanossGrpc(**asdict(grpc_config))

+        try:
             scanner = ScannerHFH(
                 scan_dir=args.scan_dir,
                 config=scanner_config,
                 client=client,
                 scanoss_settings=scanoss_settings,
             )

             scanner.best_match = args.best_match
             scanner.threshold = args.threshold

             scanner.scan()
             scanner.present(output_file=args.output, output_format=args.format)
+        finally:
+            client.close()  # Ensure client is properly closed
+
     except ScanossGrpcError as e:
         print_stderr(f'ERROR: {e}')
         sys.exit(1)
+    except Exception as e:
+        print_stderr(f'ERROR: Unexpected error during scan: {e}')
+        sys.exit(1)
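The suggested client.close() call assumes that ScanossGrpc exposes a close() method, which the excerpt does not show. Under that assumption, contextlib.closing expresses the same try/finally more compactly. StubClient and run_scan below are hypothetical.

```python
from contextlib import closing


class StubClient:
    """Hypothetical stand-in for a client that owns a connection."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def run_scan(client: StubClient, fail: bool) -> None:
    with closing(client):  # calls client.close() on normal exit or on exception
        if fail:
            raise RuntimeError('scan failed')


client = StubClient()
try:
    run_scan(client, fail=True)
except RuntimeError:
    pass
```

The trailing except Exception clause does not swallow the earlier sys.exit(1) calls. sys.exit raises SystemExit, which derives from BaseException rather than Exception.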
1624-1654: Enhance error handling and add progress feedback.

Consider adding more specific error handling and progress feedback.
 def folder_hash(parser, args):
     """Run the "folder-hash" sub-command

     Args:
         parser (ArgumentParser): command line parser object
         args (Namespace): Parsed arguments
     """
     try:
         if not args.scan_dir:
             print_stderr('ERROR: Please specify a directory to scan')
             parser.parse_args([args.subparser, '-h'])
             sys.exit(1)

         if not os.path.exists(args.scan_dir) or not os.path.isdir(args.scan_dir):
             print_stderr(f'ERROR: The specified directory {args.scan_dir} does not exist')
             sys.exit(1)

+        if not os.access(args.scan_dir, os.R_OK):
+            print_stderr(f'ERROR: No read permission for directory {args.scan_dir}')
+            sys.exit(1)
+
         folder_hasher_config = create_folder_hasher_config_from_args(args)
         scanoss_settings = get_scanoss_settings_from_args(args)

+        if not args.quiet:
+            print_stderr(f'Hashing directory: {args.scan_dir}')
+
         folder_hasher = FolderHasher(
             scan_dir=args.scan_dir,
             config=folder_hasher_config,
             scanoss_settings=scanoss_settings,
         )

         folder_hasher.hash_directory(args.scan_dir)
         folder_hasher.present(output_file=args.output, output_format=args.format)
-    except Exception as e:
+    except OSError as e:  # IOError is an alias of OSError in Python 3
+        print_stderr(f'ERROR: File system error: {e}')
+        sys.exit(1)
+    except ValueError as e:
+        print_stderr(f'ERROR: Invalid input: {e}')
+        sys.exit(1)
+    except Exception as e:
         print_stderr(f'ERROR: {e}')
         sys.exit(1)
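In Python 3, IOError is an alias of OSError. A single except OSError clause therefore catches both names, along with subclasses such as FileNotFoundError and PermissionError. The small sketch below (describe_error is hypothetical) shows why the clauses must run from specific to generic.

```python
def describe_error(exc: Exception) -> str:
    """Map an exception to a message, most specific handler first."""
    try:
        raise exc
    except OSError as e:  # also catches FileNotFoundError, PermissionError (and IOError)
        return f'File system error: {e}'
    except ValueError as e:
        return f'Invalid input: {e}'
    except Exception as e:  # generic fallback must come last
        return str(e)
```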
src/scanoss/scanossgrpc.py (6)

78-93: Good implementation of custom exception and status code enum.

Creating a dedicated exception class and using an enum for status codes improves code readability and maintainability.

173-185: Clean integration of the scanning stub.

The scanning stub is properly initialized for both secure and insecure connections.

465-480: Well-implemented folder hash scan function.

The folder_hash_scan method effectively leverages the newly added _call_rpc helper method to reduce code duplication.

482-520: Excellent abstraction with the _call_rpc helper method.

The _call_rpc method is a well-designed abstraction that:

- Handles common gRPC call patterns
- Provides good error handling with specific exceptions
- Simplifies request preparation and response parsing
- Includes appropriate debug logging

This significantly reduces code duplication across gRPC methods.
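Below is a minimal sketch of the same idea. A plain callable and JSON stand in for the gRPC stub and protobuf messages so that it runs without generated code. The names call_rpc and RpcError are hypothetical, and the real helper's signature is not shown in the excerpt.

```python
import json
import sys
from typing import Any, Callable, Dict


class RpcError(Exception):
    """Hypothetical error type raised for any failed call."""


def call_rpc(rpc: Callable[[str], str], request: Dict[str, Any], debug: bool = False) -> Dict[str, Any]:
    """Serialize the request, invoke the call, and parse the response in one place."""
    payload = json.dumps(request)  # request preparation
    if debug:
        print(f'Sending request: {payload}', file=sys.stderr)
    try:
        raw = rpc(payload)
    except Exception as e:  # transport failures become one error type
        raise RpcError(f'RPC call failed: {e}') from e
    try:
        return json.loads(raw)  # response parsing
    except json.JSONDecodeError as e:
        raise RpcError(f'Invalid response: {e}') from e
```

Each new RPC method then only supplies its stub call and request, which is where the reduction in duplication comes from.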

532-543: Improved readability with enum-based status code checks.

Replacing hardcoded status code checks with the ScanossGrpcStatusCode enum improves code readability and maintainability.

617-646: Well-structured configuration with GrpcConfig dataclass.

The GrpcConfig dataclass and create_grpc_config_from_args helper function provide a clean way to encapsulate and create gRPC configuration from command-line arguments.
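cli.py consumes this config as ScanossGrpc(**asdict(grpc_config)). The sketch below shows that pattern with hypothetical field names, since the real GrpcConfig fields are not shown. asdict turns the dataclass into a dict, and that dict's keys must match the constructor's keyword parameters.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ClientConfig:
    """Hypothetical config; field names must match the constructor's keywords."""
    url: str
    api_key: Optional[str] = None
    timeout: int = 180


class Client:
    def __init__(self, url: str, api_key: Optional[str] = None, timeout: int = 180) -> None:
        self.url = url
        self.api_key = api_key
        self.timeout = timeout


config = ClientConfig(url='https://api.example.com', timeout=30)
client = Client(**asdict(config))  # unpack the fields as keyword arguments
```

The coupling is strict. If a field is added to the dataclass without a matching constructor parameter, the call site raises TypeError, which surfaces drift early.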

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

src/scanoss/scanners/folder_hasher.py (1)
282-290: Redundant implementation in _format_plain_output.

The _format_plain_output method is almost identical to _format_json_output and includes a check for whether the tree is a dictionary, but the tree should always be a dictionary as returned by _hash_calc_from_node.

Consider simplifying this method:
-    def _format_plain_output(self) -> str:
-        """
-        Format the scan output data into a plain text string
-        """
-        return (
-            json.dumps(self.folder_hasher.tree, indent=2)
-            if isinstance(self.folder_hasher.tree, dict)
-            else str(self.folder_hasher.tree)
-        )
+    def _format_plain_output(self) -> str:
+        """
+        Format the scan output data into a plain text string
+        """
+        return json.dumps(self.folder_hasher.tree, indent=2)
Alternatively, if the plain-text and JSON formats are meant to differ, implement an actual difference between them.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4edc219 and ce782b3.

📒 Files selected for processing (1)

src/scanoss/scanners/folder_hasher.py (1 hunks)

🧰 Additional context used

🧬 Code Definitions (1)

src/scanoss/scanners/folder_hasher.py (4)

src/scanoss/scanossbase.py (2)

ScanossBase (28-101)

print_debug (58-63)

src/scanoss/utils/crc64.py (2)

CRC64 (29-96)

get_hash_buff (82-96)

src/scanoss/utils/simhash.py (4)

simhash (125-130)

WordFeatureSet (163-169)

fingerprint (100-109)

vectorize_bytes (84-97)

src/scanoss/scanners/scanner_hfh.py (3)

present (128-130)

_format_json_output (143-150)

_format_plain_output (152-160)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: build

🔇 Additional comments (11)

src/scanoss/scanners/folder_hasher.py (11)

16-18: Clear and well-defined constants.

The constants are explicitly defined with descriptive names, making the code more maintainable and the threshold values easy to adjust if needed.

21-31: Good use of class structure for directory representation.

The DirectoryNode class effectively represents the hierarchical nature of directories with a clean separation between child nodes and files.

33-42: Well-structured DirectoryFile class.

This class efficiently stores file metadata including path and hash information in both raw bytes and string format.

44-53: Good use of dataclass for configuration.

The FolderHasherConfig dataclass provides a clean way to handle configuration with sensible defaults.

55-64: Defensive coding in configuration creation.

The use of getattr with default values is a good practice for handling potentially missing attributes from the args object.
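Here is a sketch of the pattern with hypothetical field names. The args.format attribute mirrors the option used by folder_hash in cli.py. One caveat applies: the default only kicks in when the attribute is missing entirely. An attribute that exists but is None does not fall back to the default.

```python
from argparse import Namespace
from dataclasses import dataclass


@dataclass
class HasherConfig:
    """Hypothetical stand-in for FolderHasherConfig."""
    debug: bool = False
    quiet: bool = False
    output_format: str = 'json'


def config_from_args(args: Namespace) -> HasherConfig:
    # getattr's default is used only when the attribute is absent,
    # e.g. when a subcommand does not define that option.
    return HasherConfig(
        debug=getattr(args, 'debug', False),
        quiet=getattr(args, 'quiet', False),
        output_format=getattr(args, 'format', 'json'),
    )
```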

143-144: Verify file name length filtering logic.

The condition len(f.name.encode('utf-8')) <= MINIMUM_FILE_NAME_LENGTH keeps only files whose UTF-8-encoded names are at most MINIMUM_FILE_NAME_LENGTH bytes long, so it filters out longer names. This is counter-intuitive, because the constant's name suggests a minimum rather than a maximum length.

Please confirm if this logic is intentional. If MINIMUM_FILE_NAME_LENGTH is intended to be a maximum, consider renaming it to MAXIMUM_FILE_NAME_LENGTH for clarity.
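The check also counts bytes rather than characters, so non-ASCII names reach the limit sooner. The following is a quick illustration. The limit values are arbitrary because the constant's actual value is not shown, and both helper names are hypothetical.

```python
def utf8_name_length(name: str) -> int:
    """Length of a file name in UTF-8 bytes, as measured by the reviewed condition."""
    return len(name.encode('utf-8'))


def passes_name_filter(name: str, limit: int) -> bool:
    # Mirrors the reviewed condition: keep names whose byte length is <= limit.
    return utf8_name_length(name) <= limit


name = 'résumé.txt'  # 10 characters; each 'é' is 2 bytes in UTF-8
```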

158-159: Use a streaming or chunk-based approach for hashing large files.

Reading the entire file into memory with read_bytes() might cause performance issues or memory errors when processing very large files.

Consider implementing a chunked approach for file reading:
-                file_bytes = full_file_path.read_bytes()
-                key = CRC64.get_hash_buff(file_bytes)
+                crc = CRC64()
+                with open(full_file_path, 'rb') as f:
+                    while chunk := f.read(8192):  # 8KB chunks
+                        crc.update(chunk)
+                key = list(struct.pack('>Q', crc.digest()))
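The suggested change is only equivalent if CRC64.update() is incremental. In other words, several update() calls over consecutive chunks must give the same digest as one call over the whole buffer. That is typical of CRC implementations, but the excerpt does not confirm it. The sketch below shows the same pattern with zlib.crc32 as a stand-in, because CRC64 is not in the standard library. The assignment expression requires Python 3.8+, so it is also worth checking the package's minimum supported version.

```python
import io
import zlib
from typing import BinaryIO


def crc32_chunked(stream: BinaryIO, chunk_size: int = 8192) -> int:
    """CRC-32 over a stream, read in fixed-size chunks."""
    crc = 0
    while chunk := stream.read(chunk_size):  # assignment expression: Python 3.8+
        crc = zlib.crc32(chunk, crc)  # continue from the running value
    return crc


digest = crc32_chunked(io.BytesIO(b'hello world'))
```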
165-173: Avoid storing file references in both parent and child nodes.

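Appending the same DirectoryFile object to every parent directory and to the root node stores one extra reference per ancestor. Python lists hold references, so the object itself is not copied, but the number of stored references grows with directory depth. This is only worth changing if _hash_calc_from_node does not need each node to list every file in its subtree.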

Consider maintaining the file reference only in the leaf node corresponding to its actual directory:
-                    current_node.files.append(file_item)
+                # Only append file to the final directory node
+                current_node.files.append(file_item)
-                root_node.files.append(file_item)
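Suppose a directory's hash does need every file beneath it. One alternative to storing a reference at each ancestor is to keep files only on their own node and derive the subtree list on demand. The sketch below uses a hypothetical Node class. The trade-off is that the walk is recomputed for every node whose hash is calculated, which costs time instead of memory.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Node:
    """Hypothetical directory node that stores only its direct files."""
    files: List[str] = field(default_factory=list)
    children: Dict[str, 'Node'] = field(default_factory=dict)

    def iter_subtree_files(self) -> Iterator[str]:
        """Yield this node's files, then every descendant's, depth-first."""
        yield from self.files
        for child in self.children.values():
            yield from child.iter_subtree_files()


lib = Node(files=['src/lib/c.py'])
src = Node(files=['src/b.py'], children={'lib': lib})
root = Node(files=['a.txt'], children={'src': src})
```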
205-256: Well-implemented hash calculation with proper edge case handling.

The method correctly checks for minimum requirements before calculating hashes and effectively deduplicates files. The simhash implementation provides a robust way to generate fingerprints for directories.

258-261: Good use of delegation pattern for presenting results.

The present method effectively delegates the presentation logic to the presenter class, promoting clean separation of concerns.

235-249: Thorough validation of minimum requirements.

The code correctly checks if both the minimum file count and minimum concatenated name length requirements are met before proceeding with hash calculations. This prevents generating unreliable hashes for directories with too few files.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (1)
23-23: Minor style issue: boolean comparison.

The condition uses == False instead of the more idiomatic not. Because this file is generated by protoc, a hand edit may be overwritten the next time the file is regenerated.
-if _descriptor._USE_C_DESCRIPTORS == False:
+if not _descriptor._USE_C_DESCRIPTORS:
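For a real bool such as _USE_C_DESCRIPTORS, == False and not give the same result, so the change is purely stylistic. The two only diverge for other falsy values:

```python
values = [False, 0, None, '', []]

eq_false = [v == False for v in values]  # noqa: E712 (comparison kept on purpose)
is_falsy = [not v for v in values]

# eq_false == [True, True, False, False, False]
# is_falsy == [True, True, True, True, True]
```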
🧰 Tools

🪛 Ruff (0.8.2)

- 23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks. Replace with not _descriptor._USE_C_DESCRIPTORS (E712)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce782b3 and f347747.

📒 Files selected for processing (1)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (1 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py

- 19-19: Line too long (2012 > 120) (E501)
- 23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks. Replace with not _descriptor._USE_C_DESCRIPTORS (E712)
- 26-26: Line too long (379 > 120) (E501)
- 27, 28, 29, 30: Undefined name _SCANNING (F821)
- 30-30: Line too long (132 > 120) (E501)
- 31, 32: Undefined name _HFHREQUEST (F821)
- 33, 34: Undefined name _HFHREQUEST_CHILDREN (F821)
- 35, 36: Undefined name _HFHRESPONSE (F821)
- 37, 38: Undefined name _HFHRESPONSE_COMPONENT (F821)
- 39, 40: Undefined name _HFHRESPONSE_RESULT (F821)
- 41, 42: Undefined name _SCANNING (F821)

🔇 Additional comments (2)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (2)

19-42: New folder hashing functionality successfully implemented.

The code correctly adds new Protocol Buffer message types (HFHRequest and HFHResponse) and the FolderHashScan method as intended by the PR objectives. These changes enable support for folder hashing in the SCANOSS API.


19-42: Static analysis warnings can be safely ignored.

The static analysis tool flagged several issues (line length and undefined names like _SCANNING, _HFHREQUEST, etc.), but these are normal for Protocol Buffer generated files. These symbols are defined at runtime by the Protocol Buffer compiler's code generation process, and the long lines are expected in serialized data.


coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (1)
23-24: Consider using Pythonic comparison for boolean values.

The comparison == False could be replaced with the more Pythonic is False or simply not.
- if _descriptor._USE_C_DESCRIPTORS == False:
+ if not _descriptor._USE_C_DESCRIPTORS:
🧰 Tools

🪛 Ruff (0.8.2)

- 23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks. Replace with not _descriptor._USE_C_DESCRIPTORS (E712)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f347747 and 09df227.

📒 Files selected for processing (3)

src/scanoss/api/common/v2/scanoss_common_pb2.py (1 hunks)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (1 hunks)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (3 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (6)

src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (2)

Echo (47-52)

Echo (111-125)

src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (2)

Echo (57-62)

Echo (145-159)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (2)

Echo (37-42)

Echo (77-91)

src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (2)

Echo (42-47)

Echo (94-108)

🪛 Ruff (0.8.2)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py

- 6-6: Line too long (138 > 120) (E501)
- 94-94: Too many arguments in function definition (10 > 5) (PLR0913)
- 111-111: Too many arguments in function definition (10 > 5) (PLR0913)
- 121-121: Line too long (122 > 120) (E501)
- 128-128: Too many arguments in function definition (10 > 5) (PLR0913)
- 138-138: Line too long (132 > 120) (E501)

src/scanoss/api/common/v2/scanoss_common_pb2.py

- 16-16: Line too long (933 > 120) (E501)
- 20-20: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks. Replace with not _descriptor._USE_C_DESCRIPTORS (E712)
- 24, 25: Undefined name _STATUSCODE (F821)
- 26, 27: Undefined name _STATUSRESPONSE (F821)
- 28, 29: Undefined name _ECHOREQUEST (F821)
- 30, 31: Undefined name _ECHORESPONSE (F821)
- 32, 33: Undefined name _PURLREQUEST (F821)
- 34, 35: Undefined name _PURLREQUEST_PURLS (F821)
- 36, 37: Undefined name _PURL (F821)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py

- 19-19: Line too long (3278 > 120) (E501)
- 23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks. Replace with not _descriptor._USE_C_DESCRIPTORS (E712)
- 26-26: Line too long (398 > 120) (E501)
- 27, 28, 29, 30, 31, 32: Undefined name _DEPENDENCIES (F821)
- 28-28: Line too long (126 > 120) (E501)
- 30-30: Line too long (139 > 120) (E501)
- 32-32: Line too long (150 > 120) (E501)
- 33, 34: Undefined name _DEPENDENCYREQUEST (F821)
- 35, 36: Undefined name _DEPENDENCYREQUEST_PURLS (F821)
- 37, 38: Undefined name _DEPENDENCYREQUEST_FILES (F821)
- 39, 40: Undefined name _DEPENDENCYRESPONSE (F821)
- 41, 42: Undefined name _DEPENDENCYRESPONSE_LICENSES (F821)
- 43, 44: Undefined name _DEPENDENCYRESPONSE_DEPENDENCIES (F821)
- 45, 46: Undefined name _DEPENDENCYRESPONSE_FILES (F821)
- 47, 48: Undefined name _TRANSITIVEDEPENDENCYREQUEST (F821)
- 49, 50: Undefined name _TRANSITIVEDEPENDENCYRESPONSE (F821)
- 51, 52: Undefined name _TRANSITIVEDEPENDENCYRESPONSE_DEPENDENCIES (F821)
- 53, 54: Undefined name _DEPENDENCIES (F821)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: build

🔇 Additional comments (11)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (4)

30-34: Addition of `GetTransitiveDependencies` to the `DependenciesStub` class looks good.

The new method follows the same pattern as the existing methods, correctly setting up the gRPC channel with appropriate serializers and deserializers.
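For readers less familiar with generated gRPC code, the stub wiring reviewed here can be sketched without grpc installed. A stub binds one callable attribute per RPC by calling `channel.unary_unary` with a method path, a request serializer, and a response deserializer. In this sketch `LoopbackChannel` is hypothetical (it skips the network and round-trips the request locally), JSON stands in for protobuf serialization, and the method path is assumed from the `scanoss.api.dependencies.v2` package and `Dependencies` service names rather than copied from the generated file.

```python
import json
from typing import Any, Callable, Dict, Tuple


class LoopbackChannel:
    """Hypothetical channel: records registrations and round-trips requests locally."""

    def __init__(self) -> None:
        self.methods: Dict[str, Tuple[Callable[[Any], bytes], Callable[[bytes], Any]]] = {}

    def unary_unary(
        self,
        method: str,
        request_serializer: Callable[[Any], bytes],
        response_deserializer: Callable[[bytes], Any],
    ) -> Callable[[Any], Any]:
        self.methods[method] = (request_serializer, response_deserializer)

        def invoke(request: Any) -> Any:
            # A real channel would send these bytes to a server; here they are
            # deserialized directly, so the request comes back as the "response".
            return response_deserializer(request_serializer(request))

        return invoke


class DependenciesStubSketch:
    """Same shape as a generated stub: one callable attribute per RPC, set in __init__."""

    def __init__(self, channel: LoopbackChannel) -> None:
        self.GetTransitiveDependencies = channel.unary_unary(
            "/scanoss.api.dependencies.v2.Dependencies/GetTransitiveDependencies",
            request_serializer=lambda req: json.dumps(req).encode("utf-8"),
            response_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
        )


channel = LoopbackChannel()
stub = DependenciesStubSketch(channel)
reply = stub.GetTransitiveDependencies({"purls": ["pkg:npm/left-pad"]})
```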

56-61: The `GetTransitiveDependencies` servicer method follows the established pattern.

Like the other methods in this generated file, it is a placeholder that reports an unimplemented-method error until a concrete servicer overrides it.
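The placeholder pattern can be sketched the same way. Generated servicer base classes set an unimplemented status on the call context and raise `NotImplementedError`; a concrete service subclasses the base and overrides the method. Here `FakeContext` is a hypothetical stand-in for `grpc.ServicerContext`, plain dicts replace the protobuf request and response messages, and the echo logic is illustrative only.

```python
from typing import Any, Dict, Optional


class FakeContext:
    """Hypothetical stand-in for grpc.ServicerContext that records the status."""

    def __init__(self) -> None:
        self.code: Optional[str] = None
        self.details: Optional[str] = None

    def set_code(self, code: str) -> None:
        self.code = code

    def set_details(self, details: str) -> None:
        self.details = details


class DependenciesServicerSketch:
    """Same shape as the generated base class: every RPC is a placeholder."""

    def GetTransitiveDependencies(self, request: Dict[str, Any], context: FakeContext) -> Dict[str, Any]:
        context.set_code("UNIMPLEMENTED")
        context.set_details("Method not implemented!")
        raise NotImplementedError("Method not implemented!")


class EchoDependenciesServicer(DependenciesServicerSketch):
    """A concrete service overrides the placeholder (echo logic is illustrative only)."""

    def GetTransitiveDependencies(self, request: Dict[str, Any], context: FakeContext) -> Dict[str, Any]:
        return {"purls": list(request.get("purls", []))}


ctx = FakeContext()
try:
    DependenciesServicerSketch().GetTransitiveDependencies({"purls": []}, ctx)
except NotImplementedError:
    pass  # the base class always raises; ctx now holds the unimplemented status
reply = EchoDependenciesServicer().GetTransitiveDependencies({"purls": ["pkg:npm/left-pad"]}, FakeContext())
```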

76-80: Handler registration for `GetTransitiveDependencies` is properly configured.

The method handler is registered with the matching request deserializer and response serializer.

127-142: The experimental `Dependencies` class is correctly extended.

The new `GetTransitiveDependencies` static method follows the same pattern as the other methods in the class, with consistent parameter handling and return structure.

🧰 Tools

🪛 Ruff (0.8.2)

- 128-128: Too many arguments in function definition (10 > 5) (PLR0913)
- 138-138: Line too long (132 > 120) (E501)

src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (3)

19-19: The serialized descriptor is properly updated for the new method.

The serialized file descriptor is correctly modified to include the new GetTransitiveDependencies method.

🧰 Tools

🪛 Ruff (0.8.2)

- 19-19: Line too long (3278 > 120) (E501)

31-32: The `GetTransitiveDependencies` API endpoint is configured correctly.

The serialized options define the API endpoint as `/api/v2/dependencies/transitive` for the new `GetTransitiveDependencies` method.

🧰 Tools

🪛 Ruff (0.8.2)

- 31-31: Undefined name `_DEPENDENCIES` (F821)
- 32-32: Undefined name `_DEPENDENCIES` (F821)
- 32-32: Line too long (150 > 120) (E501)

47-52: Serialized start and end positions are properly defined for the new transitive-dependency messages.

The serialized start and end positions for `TransitiveDependencyRequest` and `TransitiveDependencyResponse` match the updated protocol buffer definition.

🧰 Tools

🪛 Ruff (0.8.2)

- 47-47: Undefined name `_TRANSITIVEDEPENDENCYREQUEST` (F821)
- 48-48: Undefined name `_TRANSITIVEDEPENDENCYREQUEST` (F821)
- 49-49: Undefined name `_TRANSITIVEDEPENDENCYRESPONSE` (F821)
- 50-50: Undefined name `_TRANSITIVEDEPENDENCYRESPONSE` (F821)
- 51-51: Undefined name `_TRANSITIVEDEPENDENCYRESPONSE_DEPENDENCIES` (F821)
- 52-52: Undefined name `_TRANSITIVEDEPENDENCYRESPONSE_DEPENDENCIES` (F821)

src/scanoss/api/common/v2/scanoss_common_pb2.py (4)

1-3: Important: this is an auto-generated file

This file is generated by the protocol buffer compiler and, as the comment on line 2 states, should not be edited manually. Any change belongs in the `.proto` source, followed by regeneration.

14-16: LGTM: Formatting changes to the serialized protobuf data

The changes to the DESCRIPTOR variable are formatting-related, consolidating the serialized protocol buffer definition into a more compact representation without line breaks. This is a common style for generated protobuf files and doesn't affect functionality.

🧰 Tools

🪛 Ruff (0.8.2)

- 16-16: Line too long (933 > 120) (E501)

20-20: Note on comparison style in generated code

The comparison `_descriptor._USE_C_DESCRIPTORS == False` is flagged by the linter, but because this is generated code it should not be modified by hand.

🧰 Tools

🪛 Ruff (0.8.2)

- 20-20: Avoid equality comparisons to `False`; use `if not _descriptor._USE_C_DESCRIPTORS:` for false checks. Suggested fix: replace with `not _descriptor._USE_C_DESCRIPTORS` (E712)
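Leaving the generated comparison alone is safe in practice, but the suggested rewrite is not equivalent in general: `x == False` and `not x` agree only when `x` is an actual `bool`. `_USE_C_DESCRIPTORS` is a plain boolean flag in the protobuf runtime, so both spellings behave the same there. The short check below shows where the two forms diverge for non-boolean values.

```python
def eq_false(value: object) -> bool:
    """The style Ruff flags as E712: an equality test against False."""
    return value == False  # noqa: E712


def is_falsy(value: object) -> bool:
    """The suggested replacement: a truthiness test."""
    return not value


# For genuine booleans the two forms always agree.
booleans_agree = all(eq_false(flag) == is_falsy(flag) for flag in (True, False))

# For other values they can disagree: None, "", [] and {} are falsy but not equal to False.
samples = [0, 0.0, None, "", [], {}]
diverging = [value for value in samples if eq_false(value) != is_falsy(value)]
```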

21-37: Updated serialized offsets are correct

The changes to the serialized start and end values are appropriate as they reflect the updated positions in the serialized protobuf data structure. This ensures correct parsing of the protocol buffer messages.

🧰 Tools

🪛 Ruff (0.8.2)

- 24-24: Undefined name `_STATUSCODE` (F821)
- 25-25: Undefined name `_STATUSCODE` (F821)
- 26-26: Undefined name `_STATUSRESPONSE` (F821)
- 27-27: Undefined name `_STATUSRESPONSE` (F821)
- 28-28: Undefined name `_ECHOREQUEST` (F821)
- 29-29: Undefined name `_ECHOREQUEST` (F821)
- 30-30: Undefined name `_ECHORESPONSE` (F821)
- 31-31: Undefined name `_ECHORESPONSE` (F821)
- 32-32: Undefined name `_PURLREQUEST` (F821)
- 33-33: Undefined name `_PURLREQUEST` (F821)
- 34-34: Undefined name `_PURLREQUEST_PURLS` (F821)
- 35-35: Undefined name `_PURLREQUEST_PURLS` (F821)
- 36-36: Undefined name `_PURL` (F821)
- 37-37: Undefined name `_PURL` (F821)
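`_serialized_start` and `_serialized_end` are byte offsets of each message or enum definition within the serialized file descriptor, so any change to bytes earlier in the descriptor shifts every later offset even when the later definitions are unchanged. The toy calculation below illustrates that effect with placeholder byte strings; it ignores the field tags and length prefixes of a real `FileDescriptorProto`, and `_NEWMESSAGE` is a hypothetical addition.

```python
from typing import Dict, List, Tuple


def compute_offsets(segments: List[Tuple[str, bytes]], header: bytes = b"") -> Dict[str, Tuple[int, int]]:
    """Return (start, end) byte offsets for each named segment laid out after a header."""
    offsets: Dict[str, Tuple[int, int]] = {}
    position = len(header)
    for name, payload in segments:
        offsets[name] = (position, position + len(payload))
        position += len(payload)
    return offsets


# Toy "descriptor": the byte strings are placeholders, not real protobuf encodings.
before = compute_offsets([("_STATUSCODE", b"x" * 10), ("_PURL", b"y" * 6)], header=b"h" * 4)

# Inserting a hypothetical _NEWMESSAGE leaves _STATUSCODE alone but moves _PURL.
after = compute_offsets(
    [("_STATUSCODE", b"x" * 10), ("_NEWMESSAGE", b"z" * 8), ("_PURL", b"y" * 6)],
    header=b"h" * 4,
)
```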

matiasdaloia self-assigned this Feb 10, 2025

matiasdaloia requested a review from eeisegn February 10, 2025 12:48

matiasdaloia added the enhancement New feature or request label Feb 10, 2025

coderabbitai bot reviewed Feb 10, 2025

src/scanoss/scanners/scanner_hfh.py (outdated)

src/scanoss/utils/crc64.py (outdated)

src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (outdated)

matiasdaloia force-pushed the feature/mdaloia/add-folder-scan-command branch from a187dd5 to 86833bc February 11, 2025 09:21

coderabbitai bot reviewed Feb 11, 2025

src/scanoss/utils/simhash.py

src/scanoss/utils/simhash.py

src/scanoss/cli.py (outdated)

coderabbitai bot reviewed Feb 11, 2025

src/scanoss/cli.py (outdated)

coderabbitai bot reviewed Feb 11, 2025

matiasdaloia force-pushed the feature/mdaloia/add-folder-scan-command branch from 1cd1c26 to 293aed4 February 12, 2025 06:57

coderabbitai bot reviewed Feb 12, 2025

setup.cfg (outdated)

coderabbitai bot reviewed Feb 13, 2025

matiasdaloia force-pushed the feature/mdaloia/add-folder-scan-command branch from 0583fdf to b558ed2 February 13, 2025 09:38

coderabbitai bot reviewed Feb 13, 2025

src/scanoss/cli.py

coderabbitai bot reviewed Feb 25, 2025

coderabbitai bot reviewed Mar 5, 2025

coderabbitai bot reviewed Mar 11, 2025

src/scanoss/scanners/folder_hasher.py

coderabbitai bot reviewed Mar 24, 2025

src/scanoss/scanners/folder_hasher.py (outdated)

coderabbitai bot reviewed Mar 24, 2025

matiasdaloia force-pushed the feature/mdaloia/add-folder-scan-command branch 2 times, most recently from 779a921 to 5d65364 March 25, 2025 12:03

coderabbitai bot reviewed Mar 27, 2025

matiasdaloia force-pushed the feature/mdaloia/add-folder-scan-command branch from 819e324 to 5472319 March 27, 2025 11:17

matiasdaloia force-pushed the feature/mdaloia/add-folder-scan-command branch from dee5976 to 4b2e026 March 27, 2025 11:18

coderabbitai bot reviewed Mar 27, 2025

feat: fix lint issues and tests

4edc219

coderabbitai bot reviewed Mar 27, 2025

feat: fix MINIMUM_FILE_NAME_LENGTH

ce782b3

coderabbitai bot reviewed Mar 27, 2025

update hfh proto api defs

f347747

coderabbitai bot reviewed Apr 9, 2025

add transitive dependency grpc api

09df227

coderabbitai bot reviewed Apr 9, 2025

matiasdaloia merged commit 3ce0d0a into main Apr 11, 2025
4 checks passed

matiasdaloia deleted the feature/mdaloia/add-folder-scan-command branch April 11, 2025 10:24

coderabbitai bot mentioned this pull request Jun 19, 2025

[SP-2587] Add directory simhash, modify concatenated names to remove extensions #119

Open

		file_bytes = full_file_path.read_bytes()
		key = CRC64.get_hash_buff(file_bytes)


matiasdaloia commented Feb 10, 2025 (edited by coderabbitai bot)


coderabbitai bot commented Feb 10, 2025 (edited)
