feat: ES-163 Add folder hashing support #98
WalkthroughThe pull request introduces version Changes
Sequence Diagram(s)
sequenceDiagram
participant U as User
participant CLI as CLI Parser
participant HFH as ScannerHFH
participant GRPC as ScanossGrpc
participant Service as Scanning Service
participant Presenter as AbstractPresenter
U->>CLI: Execute folder-scan command
CLI->>HFH: Call folder_hashing_scan(folder_path, settings)
HFH->>GRPC: Invoke folder_hash_scan() via gRPC
GRPC->>Service: Perform RPC folder hash scan
Service-->>GRPC: Return scan results
GRPC-->>HFH: Deliver scan results
HFH->>Presenter: Format results (JSON/plain)
Presenter-->>U: Output formatted results
Actionable comments posted: 7
🔭 Outside diff range comments (1)
src/scanoss/results.py (1)
200-206: Potential IndexError with 'purl' or 'licenses'
When 'purl' or 'licenses' is an empty list, accessing [0] can raise an IndexError. Consider checking if the list has elements before indexing.
- 'purl': (item.get('purl')[0] if item.get('purl') else 'N/A'),
+ purls = item.get('purl')
+ 'purl': purls[0] if purls and len(purls) > 0 else 'N/A',
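A minimal sketch of the guard discussed above, assuming each result item is a plain dict whose 'purl' and 'licenses' values may be a list, an empty list, or missing. The helper name first_or_default is hypothetical and not part of results.py.

```python
from typing import Any, Optional, Sequence


def first_or_default(values: Optional[Sequence[Any]], default: Any = 'N/A') -> Any:
    """Return the first element of values, or default when values is None or empty."""
    # A truthiness check covers both None and an empty sequence.
    return values[0] if values else default


# Hypothetical item shape, for illustration only.
item = {'purl': [], 'licenses': None}
row = {
    'purl': first_or_default(item.get('purl')),
    'license': first_or_default(item.get('licenses')),
}
```

Because an empty list is falsy in Python, the truthiness check alone is enough; an additional len(...) > 0 test is redundant.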
🧹 Nitpick comments (13)
src/scanoss/utils/abstract_presenter.py (1)
5-5: Consider making output formats extensible.
Currently, AVAILABLE_OUTPUT_FORMATS is a hardcoded list. If more formats are expected in the future, you might consider a more dynamic or configurable approach.
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (2)
7-7: Imports need sorting/formatting.
Your lint pipeline flagged an un-sorted import block. Consider placing all imports at the top of the file in alphabetical order or adding an ignore if this file is auto-generated.
🧰 Tools
🪛 GitHub Actions: Lint
[warning] 7-7: Import block is un-sorted or un-formatted. Please organize imports.
30-30: Address excessive line lengths or add ignores if auto-generated.
These lines exceed the 120-character limit cited by the pipeline. For manually maintained code, line-wrapping is preferable. If this file is auto-generated, consider adding a # noqa: E501 or equivalent to bypass lint.
Example approach to splitting line 30:
-DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n.scanoss/api/scanning/v2/scanoss-scanning.proto...<1925 chars>...')
+descriptor_data = b'\n.scanoss/api/scanning/v2/scanoss-scanning.proto...' \
+    b'...<split any other long strings as needed>...'
+DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(descriptor_data)
Also applies to: 37-37, 39-39, 41-41
🧰 Tools
🪛 Ruff (0.8.2)
30-30: Line too long (1925 > 120) (E501)
🪛 GitHub Actions: Lint
[error] 30-30: Line too long (1925 > 120).
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (3)
3-3: Organize import statements.
Per pipeline warnings, consider re-sorting your import statements. If gRPC tooling auto-generates this code, you can add an ignore directive or reorder them if permitted by the generator.
🧰 Tools
🪛 GitHub Actions: Lint
[warning] 3-3: Import block is un-sorted or un-formatted. Please organize imports.
22-22: Remove extraneous “f” prefix.
Your pipeline flags this as an f-string without placeholders. It can be replaced with a normal string.
- raise RuntimeError(
-     f'The grpc package installed is at version {GRPC_VERSION},'
-     + f' but the generated code ...'
+ raise RuntimeError(
+     'The grpc package installed is at version ' + GRPC_VERSION
+     + ' but the generated code ...'
🧰 Tools
🪛 Ruff (0.8.2)
22-22: f-string without any placeholders. Remove extraneous f prefix (F541)
🪛 GitHub Actions: Lint
[error] 22-22: f-string without any placeholders.
7-7: Line exceeds 120 characters.
If this file is manually maintained, shorten the line. If auto-generated, consider adding ignore directives or adjusting generation parameters to split long lines.
🧰 Tools
🪛 Ruff (0.8.2)
7-7: Line too long (122 > 120) (E501)
🪛 GitHub Actions: Lint
[error] 7-7: Line too long (122 > 120).
src/scanoss/utils/simhash.py (2)
42-50: Consider renaming the sum method to avoid shadowing Python's built-in function
The method name sum in the SimhashFeature class shadows the name of Python's built-in sum, which might cause confusion if used in dynamic or reflection-based contexts.
- def sum(self) -> int:
+ def hash_sum(self) -> int:
189-190: Minor discrepancy in docstring
The docstring references "k" but the code variable is "w". Consider aligning the naming for clarity.
- raise ValueError('simhash.shingle(): k must be a positive integer')
+ raise ValueError('simhash.shingle(): w must be a positive integer')
src/scanoss/scanners/scanner_hfh.py (2)
189-191: Consider more specific exception handling
Catching a broad Exception might mask other potential issues. Consider handling only expected exceptions or re-raising unhandled ones to maintain clarity.
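As an illustration of the narrower handling suggested above, the sketch below catches only the file-system errors one would expect while walking a folder and lets everything else propagate. The function name read_bytes_or_none and the choice of exception types are assumptions; the actual code at lines 189-191 of scanner_hfh.py is not shown in this review.

```python
import sys
from pathlib import Path
from typing import Optional


def read_bytes_or_none(path: Path) -> Optional[bytes]:
    """Read a file, tolerating only the I/O failures expected during a folder walk."""
    try:
        return path.read_bytes()
    except (FileNotFoundError, PermissionError, IsADirectoryError) as e:
        # Expected and recoverable: report it and let the caller skip this entry.
        print(f'Skipping {path}: {e}', file=sys.stderr)
        return None
    # Any other exception (for example MemoryError or a programming error)
    # is deliberately not caught, so it surfaces to the caller.
```

A caller can then treat None as "skip this file" while genuine bugs still produce a traceback.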
231-260: Potential performance concern for large directories
Concatenating large sets of file names into a single string could degrade performance in extremely large directories.
src/scanoss/scanossgrpc.py (1)
494-497: Use descriptive constants instead of magic values.
Lines 495-498 compare status_code to numeric constants (2 and 3). To comply with linting warnings (PLR2004, etc.), consider a small enum or named constants for clarity, e.g. STATUS_CODE_WARNING = 2, STATUS_CODE_FAILURE = 3.
- if status_code == 2:
+ if status_code == STATUS_CODE_WARNING:
  ...
- elif status_code == 3:
+ elif status_code == STATUS_CODE_FAILURE:
  ...
🧰 Tools
🪛 Ruff (0.8.2)
495-495: Magic value used in comparison, consider replacing 2 with a constant variable (PLR2004)
🪛 GitHub Actions: Lint
[warning] 495-495: Magic value used in comparison, consider replacing 2 with a constant variable.
CLIENT_HELP.md (1)
406-415: Clarify and Enhance the Folder-Scan Section Documentation.
The newly added "Folder-Scan a Project Folder" section is well organized and provides a clear usage example. To further enhance clarity, consider adding a brief note on any prerequisites, such as required dependencies (e.g., the modules used for CRC64 and simhash computation) or links to the corresponding implementation (possibly in src/scanoss/cli.py or src/scanoss/scanners/scanner_hfh.py). This will help users understand if they need any additional setup to use the folder-scan subcommand.
CHANGELOG.md (1)
12-17: Improve Changelog Entry Consistency.
The new version entry for 1.21.0 is clear; however, to maintain consistency with previous changelog entries, consider using the past-tense formatting (e.g., "Added folder-scan subcommand" instead of "Add folder-scan subcommand"). This minor change will help improve overall consistency and readability of the changelog.
🧰 Tools
🪛 LanguageTool
[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-02-10 ### Added - Add folder-scan subcommand - Add AbstractPr... (REPEATED_VERBS)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
- CHANGELOG.md (2 hunks)
- CLIENT_HELP.md (1 hunks)
- pyproject.toml (1 hunks)
- requirements.txt (1 hunks)
- src/scanoss/__init__.py (1 hunks)
- src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (2 hunks)
- src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (3 hunks)
- src/scanoss/cli.py (46 hunks)
- src/scanoss/constants.py (1 hunks)
- src/scanoss/file_filters.py (3 hunks)
- src/scanoss/results.py (5 hunks)
- src/scanoss/scanners/__init__.py (1 hunks)
- src/scanoss/scanners/scanner_config.py (1 hunks)
- src/scanoss/scanners/scanner_hfh.py (1 hunks)
- src/scanoss/scanossbase.py (1 hunks)
- src/scanoss/scanossgrpc.py (14 hunks)
- src/scanoss/utils/abstract_presenter.py (1 hunks)
- src/scanoss/utils/crc64.py (1 hunks)
- src/scanoss/utils/simhash.py (1 hunks)
✅ Files skipped from review due to trivial changes (4)
- requirements.txt
- src/scanoss/scanners/__init__.py
- src/scanoss/__init__.py
- src/scanoss/constants.py
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md
[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-02-10 ### Added - Add folder-scan subcommand - Add AbstractPr... (REPEATED_VERBS)
🪛 Ruff (0.8.2)
src/scanoss/file_filters.py
455-455: Too many return statements (7 > 6) (PLR0911)
src/scanoss/scanossgrpc.py
90-90: Too many statements (51 > 50) (PLR0915)
495-495: Magic value used in comparison, consider replacing 2 with a constant variable (PLR2004)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py
30-30: Line too long (1925 > 120) (E501)
37-37: Line too long (391 > 120) (E501)
39-39: Line too long (130 > 120) (E501)
41-41: Line too long (144 > 120) (E501)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py
4-4: warnings imported but unused. Remove unused import: warnings (F401)
7-7: Line too long (122 > 120) (E501)
22-22: f-string without any placeholders. Remove extraneous f prefix (F541)
98-98: Too many arguments in function definition (10 > 5) (PLR0913)
125-125: Too many arguments in function definition (10 > 5) (PLR0913)
🪛 GitHub Actions: Lint
src/scanoss/file_filters.py
[error] 243-243: Too many arguments in function definition (11 > 5).
[warning] 314-314: for loop variable dirpath overwritten by assignment target.
[warning] 455-455: Too many return statements (7 > 6).
[warning] 493-493: Too many return statements (7 > 6).
src/scanoss/scanossgrpc.py
[warning] 90-90: Too many statements (51 > 50).
[warning] 495-495: Magic value used in comparison, consider replacing 2 with a constant variable.
[warning] 498-498: Magic value used in comparison, consider replacing 3 with a constant variable.
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py
[warning] 7-7: Import block is un-sorted or un-formatted. Please organize imports.
[error] 25-25: Module level import not at top of file.
[warning] 25-25: scanoss.api.common.v2.scanoss_common_pb2 imported but unused.
[error] 26-26: Module level import not at top of file.
[warning] 26-26: google.api.annotations_pb2 imported but unused.
[error] 27-27: Module level import not at top of file.
[warning] 27-27: protoc_gen_swagger.options.annotations_pb2 imported but unused.
[error] 30-30: Line too long (1925 > 120).
[error] 37-37: Line too long (391 > 120).
[error] 39-39: Line too long (130 > 120).
[error] 41-41: Line too long (144 > 120).
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py
[warning] 3-3: Import block is un-sorted or un-formatted. Please organize imports.
[warning] 4-4: warnings imported but unused.
[error] 7-7: Line too long (122 > 120).
[error] 22-22: f-string without any placeholders.
[error] 98-98: Too many arguments in function definition (10 > 5).
[error] 125-125: Too many arguments in function definition (10 > 5).
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
🔇 Additional comments (39)
src/scanoss/utils/abstract_presenter.py (1)
50-55: LGTM!
This _present_output method cleanly delegates to print_to_file_or_stdout, keeping presentation logic modular.
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (1)
25-25: Verify usage of imported module.
The pipeline warns that scanoss.api.common.v2.scanoss_common_pb2 is imported but unused. If it’s truly unused, remove it; otherwise, confirm it is referenced properly in the code.
🧰 Tools
🪛 Ruff (0.8.2)
25-25: Module level import not at top of file (E402)
25-25: scanoss.api.common.v2.scanoss_common_pb2 imported but unused. Remove unused import: scanoss.api.common.v2.scanoss_common_pb2 (F401)
🪛 GitHub Actions: Lint
[error] 25-25: Module level import not at top of file.
[warning] 25-25: scanoss.api.common.v2.scanoss_common_pb2 imported but unused.
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (1)
98-98: Too many parameters in function definitions.
Your lint pipeline flags these methods for having more than 5 parameters. Since this is standard for gRPC stubs, you can safely ignore or suppress this check.
Also applies to: 125-125
🧰 Tools
🪛 Ruff (0.8.2)
98-98: Too many arguments in function definition (10 > 5) (PLR0913)
🪛 GitHub Actions: Lint
[error] 98-98: Too many arguments in function definition (10 > 5).
src/scanoss/utils/simhash.py (1)
1-199: Overall Implementation Feedback
The simhash implementation appears correct and efficient. The usage of FNV64, the vectorization approach, and the final simhash/hamming distance logic follow standard practices.
src/scanoss/results.py (1)
1-219: Code structure changes are logical
Inheritance from AbstractPresenter appears to streamline result presentation logic without major disruptions.
src/scanoss/scanners/scanner_hfh.py (1)
1-296: Initial Implementation Looks Good
The folder hashing logic is cohesive and effectively leverages simhash and CRC64.
src/scanoss/scanossgrpc.py (22)
25-25: Import statements look fine.
No issues noted with the added imports; they align well with the new functionality.
Also applies to: 28-29
37-39: Additional import stubs for gRPC scanning and provenances.
These imports are necessary for the newly introduced scanning features.
41-48: New references to version and proto definitions.
This set of imports is consistent with the code's usage of MessageToDict, ParseDict, and StatusResponse.
55-62: Components, Cryptography, and Dependencies stubs.
These stubs follow standard gRPC usage patterns.
64-68: HFHRequest, SemgrepResponse, and VulnerabilityResponse usage.
These lines introduce proto-based request and response classes for scanning and vulnerability checks.
70-70: Blank line insertion.
Minor formatting change; no concerns.
77-83: New ScanossGrpcError exception class.
Nicely encapsulates gRPC-specific errors, improving clarity in error handling.
156-156: Insecure channel scanning stub creation.
No immediate concerns. Keep ensuring appropriate environment checks for production usage.
168-168: Secure channel scanning stub creation.
Similarly, this line is consistent with the established pattern for secure connections.
249-249: Error message for empty dependency input.
Message clarity is fine.
256-256: Error message in get_dependencies_json.
Consistent with the function's usage.
283-283: Error message in get_crypto_json.
Nothing to change.
312-312: Error message in get_vulnerabilities_json.
Matches existing error style.
342-342: Error message in get_semgrep_json.
No issues.
372-372: Error message in search_components_json.
Ok as-is.
402-402: Error message in get_component_versions_json.
No further changes needed.
425-434: Docstring for folder_hash_scan.
Describes the functionality well.
435-441: RPC call for folder hash scan.
This approach reuses _call_rpc to standardize gRPC calls. Good pattern.
442-480: _call_rpc utility method.
Centralizing gRPC logic helps with reuse and reliability. The error raising via ScanossGrpcError is also a good pattern.
544-546: Exception logging in get_provenance_json.
Properly logs the gRPC call failure to stderr, consistent with other methods.
561-574: GrpcConfig dataclass.
This data class encapsulates gRPC client configuration neatly.
576-589: create_grpc_config_from_args helper function.
This function is straightforward, ensuring default values are pulled from environment or fallback constants where appropriate.
src/scanoss/scanners/scanner_config.py (2)
1-55: New ScannerConfig class introduction.
All lines in this range define the license header, imports, and the ScannerConfig dataclass. The structure is clear, with sensible defaults and typed optional fields.
57-74: create_scanner_config_from_args: well-organized function.
Constructs a ScannerConfig instance reliably. Good usage of getattr for CLI argument fallback.
src/scanoss/scanossbase.py (1)
83-83: Renamed parameter for clarity and consistent usage.
Changing msg to content in print_to_file_or_stdout is a style improvement, clarifying the function’s intent. Implementation remains straightforward.
Also applies to: 89-89, 91-91
src/scanoss/utils/crc64.py (2)
29-42: LGTM! Efficient implementation of CRC64 ECMA standard.
The implementation correctly follows the ECMA standard and matches Go's hash/crc64 package. The lookup table is efficiently shared across instances by using a class-level variable.
43-69: LGTM! Robust implementation of core CRC64 calculation methods.
The implementation correctly handles:
- Lookup table generation
- Both bytes and UTF-8 strings in the update method
- Final XOR operation in the digest method
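For readers unfamiliar with the three steps listed above, a minimal table-driven sketch follows. It is not the code in crc64.py: the function and constant names are hypothetical, and it assumes the reflected ECMA polynomial with an all-ones initial value and an all-ones final XOR, which is the parameter set used by Go's hash/crc64 with its ECMA table.

```python
from typing import List, Union

ECMA_POLY_REFLECTED = 0xC96C5795D7870F42  # same constant as Go's crc64.ECMA
MASK64 = 0xFFFFFFFFFFFFFFFF


def _make_table(poly: int) -> List[int]:
    """Build the 256-entry lookup table for a reflected 64-bit CRC."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            # Shift right; XOR in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


# Built once at import time and shared by every call.
_TABLE = _make_table(ECMA_POLY_REFLECTED)


def crc64_ecma(data: Union[bytes, str]) -> int:
    """Return the CRC64 (ECMA) of data; str input is encoded as UTF-8 first."""
    if isinstance(data, str):
        data = data.encode('utf-8')
    crc = MASK64  # initial value: all ones
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ MASK64  # final XOR
```

With this parameter set, the conventional check input b'123456789' yields 0x995DC9BBDF1939FA, which is a convenient value for comparing any implementation against Go's output.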
src/scanoss/file_filters.py (2)
208-214: LGTM! Additional file extensions for skipping.
The new file extensions are valid additions to the skip list.
455-455: Verify the impact of renaming _should_skip_dir to should_skip_dir.
The method has been renamed from private (_should_skip_dir) to public (should_skip_dir). Ensure that this change in visibility is intentional and verify its usage across the codebase.
Run the following script to verify the method usage:
✅ Verification successful
🏁 Script executed:
#!/bin/bash
# Description: Check for any references to the old method name and verify the usage of the new method name.
# Test: Search for old method name. Expect: No occurrences.
echo "Checking for old method name..."
rg "_should_skip_dir"
# Test: Search for new method name. Expect: Only intended usage.
echo "Checking for new method name..."
rg "should_skip_dir"
Length of output: 393
Renaming Impact Verified
The search for _should_skip_dir returned no occurrences, and the new method name should_skip_dir is used as expected in src/scanoss/file_filters.py. This indicates that the renaming from private to public was intentional and correctly propagated.
🧰 Tools
🪛 Ruff (0.8.2)
455-455: Too many return statements (7 > 6) (PLR0911)
🪛 GitHub Actions: Lint
[warning] 455-455: Too many return statements (7 > 6).
src/scanoss/cli.py (3)
477-491
: LGTM! Well-structured folder-scan subcommand.The implementation follows the established pattern and includes appropriate arguments consistent with other subcommands.
1451-1485
: LGTM! Robust implementation of folder hashing scan.The implementation includes:
- Proper error handling with descriptive messages
- Comprehensive configuration setup
- Clear documentation
1487-1497
: LGTM! Well-structured settings helper function.The implementation follows good practices:
- Single responsibility principle
- Proper error handling
- Clear return value
pyproject.toml (1)
11-16
: LGTM! Improved exclude configuration.The changes improve readability by using multi-line format and add appropriate paths to the exclude list.
Actionable comments posted: 3
🔭 Outside diff range comments (1)
src/scanoss/file_filters.py (1)
455-491: Inconsistent method visibility.
The should_skip_dir method is made public while _should_skip_file remains private. This creates an inconsistent API where related methods have different visibility levels.
Make both methods public for consistency:
-def _should_skip_file(self, file_rel_path: str) -> bool:
+def should_skip_file(self, file_rel_path: str) -> bool:
Also applies to: 493-530
🧰 Tools
🪛 Ruff (0.8.2)
455-455: Too many return statements (7 > 6) (PLR0911)
🪛 GitHub Actions: Lint
[error] 455-455: Too many return statements (7 > 6).
🧹 Nitpick comments (15)
src/scanoss/utils/abstract_presenter.py (1)
36-39: Use a more specific exception type for invalid format.
Raising a generic Exception for format validation can obscure handling in downstream code. Consider using ValueError or a custom exception to better reflect the nature of the error.
- raise Exception(
+ raise ValueError(
src/scanoss/utils/simhash.py (3)
47-49: Rename the sum() method for clarity.
sum() may not intuitively convey that this method returns the 64-bit hash value. Consider a more descriptive name like get_hash_value() or hash64() to improve readability.
66-82: Consider performance impacts for large feature sets.
The bit-by-bit operations in vectorize may become a bottleneck for very large lists. Optimizing this loop or parallelizing it (depending on the environment) might be beneficial if performance is critical.
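To make the cost discussed above concrete, here is a generic sketch of the bit-by-bit simhash vectorization step. It is an assumption about the general technique, not a copy of simhash.py: features are modelled as hypothetical (hash, weight) pairs, and the tie-breaking rule in fingerprint (a zero component counts as a set bit) is also assumed.

```python
from typing import Iterable, List, Tuple


def vectorize(features: Iterable[Tuple[int, int]]) -> List[int]:
    """Accumulate a 64-component vector: +weight for a set bit, -weight otherwise."""
    v = [0] * 64
    for feature_hash, weight in features:
        # 64 inner iterations per feature: this is the loop that dominates
        # run time when the feature list is very large.
        for i in range(64):
            if (feature_hash >> i) & 1:
                v[i] += weight
            else:
                v[i] -= weight
    return v


def fingerprint(v: List[int]) -> int:
    """Collapse the vector into a 64-bit value: bit i is set when v[i] >= 0."""
    f = 0
    for i in range(64):
        if v[i] >= 0:
            f |= 1 << i
    return f
```

The work is proportional to 64 times the number of features, so any optimization has to target the inner loop, for example by processing features in batches.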
112-122: Use modern popcount for Hamming distance.
Python 3.10+ provides an integer bit_count() method, which can be more performant and concise than manually clearing the lowest set bit in a loop.
 def compare(a: int, b: int) -> int:
     v = a ^ b
-    c = 0
-    while v:
-        v &= v - 1
-        c += 1
-    return c
+    return v.bit_count()
src/scanoss/results.py (1)
172-180: Use specialized exceptions for filter validation.
Raise a more specific exception, such as ValueError, instead of a generic Exception. This allows callers to handle invalid filter states more gracefully.
- raise Exception(
+ raise ValueError(
src/scanoss/scanners/scanner_hfh.py (3)
172-172: Consider chunked file reading to optimize memory usage.
Reading entire files into memory with full_file_path.read_bytes() can be problematic for very large files. Consider reading files in chunks or imposing a file size limit to avoid potential out-of-memory issues during folder hashing.
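A minimal sketch of the chunked approach suggested above. It assumes the hasher exposes an incremental update() method; hashlib.sha256 is used here purely as a stand-in, since the real scanner hashes with CRC64, and both the function name and the chunk size are illustrative.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # 1 MiB; an arbitrary illustrative value


def digest_file_in_chunks(path: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a file without loading it fully into memory."""
    hasher = hashlib.sha256()  # stand-in for any hasher with an incremental update()
    with path.open('rb') as f:
        # read() returns b'' at end of file, which ends the loop.
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Peak memory is then bounded by chunk_size rather than by the size of the largest file in the folder.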
194-215: Potential parallelization for improved performance.
For very large directories, consider parallelizing the recursion in _hash_calc_from_node() and _build_root_node() by processing child nodes concurrently, reducing the time to build the directory tree and compute hashes.
253-254: Clarify rationale for overwriting the most significant byte.
Overwriting the MS byte of names_simhash with the computed head value might mask high-order bits. Document the logic behind this choice or rename _head_calc to make its purpose clearer to future maintainers.
src/scanoss/scanossgrpc.py (2)
425-441: Consider adding additional error handling or retry logic for folder_hash_scan.
As a potentially high-load operation, folder hashing might fail mid-scan. Evaluate adding robust retries or partial scanning logic to handle transient failures rather than returning immediately.
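One way the retry idea above could look is a small wrapper with exponential backoff. This is a generic sketch rather than anything from scanossgrpc.py: call_with_retries, its parameters, and the default exception types are all hypothetical, and a real integration would need to decide which gRPC failures count as transient.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar('T')


def call_with_retries(
    func: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exception types with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error unchanged
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise RuntimeError('retries must be >= 0')  # only reached for negative retries
```

The sleep parameter is injectable so the backoff schedule can be verified without actually waiting.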
495-498: Replace magic values with named constants.
Comparing status_code directly to 2 or 3 in _check_status_response can be unclear. Consider using named constants or an enum to clarify the meaning of each status code.
- if status_code == 2:
+ if status_code == STATUS_SUCCEEDED_WITH_WARNINGS:
  ...
- elif status_code == 3:
+ elif status_code == STATUS_FAILED_WITH_WARNINGS:
🧰 Tools
🪛 Ruff (0.8.2)
495-495: Magic value used in comparison, consider replacing 2 with a constant variable (PLR2004)
498-498: Magic value used in comparison, consider replacing 3 with a constant variable (PLR2004)
🪛 GitHub Actions: Lint
[warning] 495-495: Magic value used in comparison, consider replacing 2 with a constant variable.
[warning] 498-498: Magic value used in comparison, consider replacing 3 with a constant variable.
src/scanoss/scanners/scanner_config.py (1)
57-73: Add validation for timeout and thread values.
Consider adding validation to ensure timeout values are positive and thread count is within reasonable bounds.
 def create_scanner_config_from_args(args) -> ScannerConfig:
+    post_size = getattr(args, 'post_size', DEFAULT_POST_SIZE)
+    timeout = max(1, getattr(args, 'timeout', DEFAULT_TIMEOUT))
+    sc_timeout = max(1, getattr(args, 'sc_timeout', DEFAULT_SC_TIMEOUT))
+    nb_threads = max(1, min(32, getattr(args, 'nb_threads', DEFAULT_NB_THREADS)))
     return ScannerConfig(
         debug=args.debug,
         trace=args.trace,
         quiet=args.quiet,
         api_key=getattr(args, 'api_key', None),
         url=getattr(args, 'url', None),
         grpc_url=getattr(args, 'grpc_url', None),
-        post_size=getattr(args, 'post_size', DEFAULT_POST_SIZE),
-        timeout=getattr(args, 'timeout', DEFAULT_TIMEOUT),
-        sc_timeout=getattr(args, 'sc_timeout', DEFAULT_SC_TIMEOUT),
-        nb_threads=getattr(args, 'nb_threads', DEFAULT_NB_THREADS),
+        post_size=post_size,
+        timeout=timeout,
+        sc_timeout=sc_timeout,
+        nb_threads=nb_threads,
         proxy=getattr(args, 'proxy', None),
         grpc_proxy=getattr(args, 'grpc_proxy', None),
         ca_cert=getattr(args, 'ca_cert', None),
         pac=getattr(args, 'pac', None),
     )
src/scanoss/utils/crc64.py (1)
74-78: Add missing docstring parameters.
The docstring for the checksum method is missing parameter and return type descriptions.
 def checksum(cls, data: bytes) -> int:
-    """Calculate CRC64 checksum for the given data."""
+    """Calculate CRC64 checksum for the given data.
+
+    Args:
+        data (bytes): The data to calculate the checksum for.
+
+    Returns:
+        int: The calculated CRC64 checksum.
+    """
     crc = cls()
     crc.update(data)
     return crc.digest()
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (1)
2-3: Strengthen the warning against manual edits.
For generated protobuf files, consider using a stronger warning message to prevent accidental modifications.
 # Generated by the protocol buffer compiler. DO NOT EDIT!
-# NO CHECKED-IN PROTOBUF GENCODE
+# WARNING: This is an auto-generated file. Any manual changes will be overwritten!
+# To modify this file, update the proto definitions and regenerate using protoc.
CLIENT_HELP.md (1)
406-415: New Folder-Scan Subcommand Documentation
The new section describing the folder-scan subcommand is clear and informative. It concisely explains the purpose (generating directory-level fingerprints using CRC64 and simhash) and provides a usage example that matches the new functionality. Consider adding brief information on possible outputs or error handling if that information is available elsewhere for consistency.
CHANGELOG.md (1)
12-17: New Version Section [1.21.0] Entry
The changelog entry for version [1.21.0] is well-formatted and clearly lists the new features: the folder-scan subcommand, the AbstractPresenter class, and the helper functions for config construction. Ensure that the documentation in the CLI (and related modules) fully reflects these additions. Also, double-check for any repeated phrasing or verb redundancies; the static analysis hinted at a possible repetition in this area.
🧰 Tools
🪛 LanguageTool
[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-02-10 ### Added - Add folder-scan subcommand - Add AbstractPr... (REPEATED_VERBS)
📜 Review details
✅ Files skipped from review due to trivial changes (3)
- src/scanoss/__init__.py
- src/scanoss/scanners/__init__.py
- requirements.txt
🧰 Additional context used
🪛 Ruff (0.8.2)
src/scanoss/file_filters.py
455-455: Too many return statements (7 > 6) (PLR0911)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py
30-30: Line too long (1925 > 120) (E501)
37-37: Line too long (391 > 120) (E501)
39-39: Line too long (130 > 120) (E501)
41-41: Line too long (144 > 120) (E501)
src/scanoss/scanossgrpc.py
90-90: Too many statements (51 > 50) (PLR0915)
495-495: Magic value used in comparison, consider replacing 2 with a constant variable (PLR2004)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py
4-4: warnings imported but unused. Remove unused import: warnings (F401)
7-7: Line too long (122 > 120) (E501)
22-22: f-string without any placeholders. Remove extraneous f prefix (F541)
98-98: Too many arguments in function definition (10 > 5) (PLR0913)
125-125: Too many arguments in function definition (10 > 5) (PLR0913)
🪛 GitHub Actions: Lint
src/scanoss/file_filters.py
[error] 243-243: Too many arguments in function definition (11 > 5).
[warning] 314-314: for loop variable dirpath overwritten by assignment target.
[error] 455-455: Too many return statements (7 > 6).
[error] 493-493: Too many return statements (7 > 6).
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py
[warning] 7-7: Import block is un-sorted or un-formatted. Please organize imports.
[error] 25-25: Module level import not at top of file.
[warning] 25-25: Import block is un-sorted or un-formatted. Please organize imports.
[error] 25-25: scanoss.api.common.v2.scanoss_common_pb2 imported but unused.
[error] 26-26: Module level import not at top of file.
[warning] 26-26: google.api.annotations_pb2 imported but unused.
[error] 27-27: Module level import not at top of file.
[warning] 27-27: protoc_gen_swagger.options.annotations_pb2 imported but unused.
[error] 30-30: Line too long (1925 > 120).
[error] 37-37: Line too long (391 > 120).
[error] 39-39: Line too long (130 > 120).
[error] 41-41: Line too long (144 > 120).
src/scanoss/scanossgrpc.py
[error] 90-90: Too many statements (51 > 50).
[warning] 495-495: Magic value used in comparison, consider replacing 2 with a constant variable.
[warning] 498-498: Magic value used in comparison, consider replacing 3 with a constant variable.
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py
[warning] 3-3: Import block is un-sorted or un-formatted. Please organize imports.
[error] 4-4: warnings imported but unused.
[error] 7-7: Line too long (122 > 120).
[error] 22-22: f-string without any placeholders.
[error] 98-98: Too many arguments in function definition (10 > 5).
[error] 125-125: Too many arguments in function definition (10 > 5).
🪛 LanguageTool
CHANGELOG.md
[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-02-10 ### Added - Add folder-scan subcommand - Add AbstractPr... (REPEATED_VERBS)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
🔇 Additional comments (17)
src/scanoss/utils/abstract_presenter.py (1)
45-46: Clarify fallback behavior for unrecognized formats.
Falling back to 'plain' format might mask user errors or unintended mismatches. You may wish to explicitly require a valid format or provide a clearer fallback warning.
src/scanoss/results.py (2)
83-84: Confirm constructor ordering for multiple inheritance.
When using multiple base classes, a single call to super().__init__() is typically recommended if both base classes use cooperative inheritance. Double-check whether direct calls to AbstractPresenter.__init__ and ScanossBase.__init__ in this order are required or whether a single super().__init__() call suffices.
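The cooperative pattern referred to above can be sketched as follows. The class names and constructor parameters are hypothetical stand-ins, not the real AbstractPresenter, ScanossBase, or Results signatures; the sketch only works when every base class forwards unknown keyword arguments to super().__init__().

```python
class LoggingBase:
    """Hypothetical stand-in for a base class that owns logging flags."""

    def __init__(self, debug: bool = False, **kwargs):
        super().__init__(**kwargs)  # pass remaining kwargs along the MRO
        self.debug = debug


class PresenterBase:
    """Hypothetical stand-in for a base class that owns output settings."""

    def __init__(self, output_format: str = 'plain', **kwargs):
        super().__init__(**kwargs)  # pass remaining kwargs along the MRO
        self.output_format = output_format


class Results(PresenterBase, LoggingBase):
    def __init__(self, debug: bool = False, output_format: str = 'plain'):
        # One call initializes both bases, in MRO order:
        # PresenterBase first, then LoggingBase, then object.
        super().__init__(debug=debug, output_format=output_format)


r = Results(debug=True, output_format='json')
```

If either real base class does not call super().__init__() itself, or takes positional arguments the other does not understand, the explicit per-class calls currently in the code remain the safer choice.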
211-211: Return consistent types even for empty data.
_format_plain_output calls print_stderr() when self.data is empty, which might not return a string. Align return types with the docstring and consider returning a plain message string or raising an exception.
src/scanoss/scanners/scanner_hfh.py (1)
180-187: Validate the intended duplication of file references.
Each file appears to be appended to both the root node and its child node(s). If this design isn't intentional, consider only adding the file to the relevant child node. Otherwise, confirm that multiple references won’t affect hashing or cause duplicated results in downstream processes.
src/scanoss/scanossgrpc.py (2)
`77-83`: **Good addition of a dedicated gRPC error class.** Defining a custom exception class like `ScanossGrpcError` provides clearer error contexts and helps unify exception handling. Great approach!
`442-480`: **Centralized gRPC call handling is well-structured.** Using `_call_rpc` to unify request building, metadata insertion, and response parsing streamlines the code. Keep an eye on large `request_input` payloads; you may need advanced flow control or streaming RPC for extremely large directory scans.
src/scanoss/constants.py (1)
`1-13`: **Constants are properly centralized.** Centralizing configuration values (e.g., `DEFAULT_TIMEOUT`, `DEFAULT_RETRY`) makes the code more maintainable. Consider referencing them consistently throughout the code to avoid scattering hard-coded values in multiple places.
src/scanoss/scanners/scanner_config.py (1)
`38-54`: **LGTM! Well-structured configuration class.** The dataclass approach with proper type hints and default values is a clean way to manage scanner configuration.
src/scanoss/scanossbase.py (1)
`83-91`: **LGTM! Parameter rename improves clarity.** The change from `msg` to `content` makes the parameter name more descriptive and consistent with its usage.
src/scanoss/utils/crc64.py (1)
`29-73`: **LGTM! Well-implemented CRC64 ECMA algorithm.** The implementation is clean, efficient, and matches Go's hash/crc64 package. Good use of class methods and type hints.
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (2)
`9-26`: **LGTM! Well-implemented version check.** The version check implementation is robust and provides clear error messages for version mismatches.
🧰 Tools
🪛 Ruff (0.8.2)
22-22: f-string without any placeholders
Remove extraneous
f
prefix(F541)
🪛 GitHub Actions: Lint
[error] 22-22: f-string without any placeholders.
`45-49`: **LGTM! Well-structured gRPC method implementation.** The `FolderHashScan` method is properly implemented and follows the same pattern as the existing `Echo` method, maintaining consistency in the codebase.
Also applies to: 64-69
src/scanoss/file_filters.py (1)
`208-214`: **LGTM! New file extensions added.** The new file extensions are properly added to the `DEFAULT_SKIPPED_EXT` set.
src/scanoss/cli.py (3)
set.src/scanoss/cli.py (3)
`477-491`: **LGTM! Well-structured folder-scan command.** The new `folder-scan` subcommand is properly implemented with clear help messages and appropriate options.
`494-507`: **LGTM! Consistent settings options.** Settings options are properly added to relevant commands with clear help messages.
`1451-1485`: **LGTM! Robust error handling in folder-scan implementation.** The `folder_hashing_scan` function includes proper error handling and validation:
- Validates required arguments
- Checks directory existence
- Handles gRPC errors
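The first two checks in the list above can be sketched as a small helper. This is an illustration only: `FolderScanInputError` and `check_scan_dir` are hypothetical names that do not appear in the PR, and the real `folder_hashing_scan` may report errors differently (for example by printing to stderr and exiting).

```python
import os
from typing import Optional


class FolderScanInputError(Exception):
    """Hypothetical error type for this sketch; not part of the PR."""


def check_scan_dir(scan_dir: Optional[str]) -> str:
    """Validate the folder argument before any network call is attempted."""
    # 1. Required argument present?
    if not scan_dir:
        raise FolderScanInputError("Please specify a folder to scan")
    # 2. Does it exist, and is it a directory rather than a file?
    if not os.path.isdir(scan_dir):
        raise FolderScanInputError(f"The specified path is not a directory: {scan_dir}")
    return os.path.abspath(scan_dir)
```

Running the cheap local checks first means a bad path fails fast, before a gRPC channel is opened; the gRPC error handling then only needs to cover failures that originate from the service.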
pyproject.toml (1)
`11-16`: **LGTM! Improved exclude list formatting and coverage.** The exclude list is now more readable with multi-line formatting and properly excludes generated code paths.
a187dd5 to 86833bc
Actionable comments posted: 3
🔭 Outside diff range comments (1)
src/scanoss/results.py (1)
`191-209`: 🛠️ Refactor suggestion
**Consider adding error handling for JSON serialization.** The `_format_json_output` method should handle potential JSON serialization errors.
 def _format_json_output(self) -> str:
     formatted_data = []
     for item in self.data:
         formatted_data.append({...})
-    return json.dumps({'results': formatted_data, 'total': len(formatted_data)}, indent=2)
+    try:
+        return json.dumps({'results': formatted_data, 'total': len(formatted_data)}, indent=2)
+    except Exception as e:
+        self.print_stderr(f'ERROR: Failed to serialize JSON: {e}')
+        raise
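A narrower variant of the same idea is sketched below as a standalone function. It assumes the goal is only to log and re-raise serialization failures: `json.dumps` raises `TypeError` for objects it cannot encode and `ValueError` for circular references, so catching those two avoids swallowing unrelated exceptions. The function name `format_json_output` and the use of `print(..., file=sys.stderr)` are stand-ins for the method and the `print_stderr` helper in the PR.

```python
import json
import sys


def format_json_output(items: list) -> str:
    """Serialize results, reporting serialization failures before re-raising."""
    payload = {"results": items, "total": len(items)}
    try:
        return json.dumps(payload, indent=2)
    except (TypeError, ValueError) as exc:
        # TypeError: an object json cannot encode (e.g. a set);
        # ValueError: e.g. a circular reference.
        print(f"ERROR: Failed to serialize JSON: {exc}", file=sys.stderr)
        raise


print(format_json_output([{"file": "a.py", "status": "pending"}]))
```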
🧹 Nitpick comments (14)
.github/workflows/lint.yml (2)
35-38
: Consider extracting exclude patterns from pyproject.toml.The current implementation hardcodes exclude patterns in the workflow, which duplicates configuration from
pyproject.toml
and makes maintenance harder. Since this is marked as a temporary workaround, we should:
- Track this technical debt in an issue
- Consider parsing exclude patterns directly from
pyproject.toml
Would you like me to:
- Open an issue to track this technical debt?
- Generate a Python script to parse exclude patterns from
pyproject.toml
?🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 35-35: trailing spaces
(trailing-spaces)
35-35
: Fix minor formatting issues.There are two formatting issues to address:
- Line 35: Remove trailing spaces
- Line 58: Remove extra blank line
- # Filter out files that match exclude patterns from pyproject.toml + # Filter out files that match exclude patterns from pyproject.toml # this is a temporary workaround until we fix all the lint errors filtered_files=$(echo "$files" | grep -v -E 'tests/|test_.*\.py|src/protoc_gen_swagger/|src/scanoss/api/' || true) # Use the multi-line syntax for outputs. echo "files<<EOF" >> "$GITHUB_OUTPUT" echo "${filtered_files}" >> "$GITHUB_OUTPUT" echo "EOF" >> "$GITHUB_OUTPUT" echo "Changed files before filtering: ${files}" echo "Changed files after filtering: ${filtered_files}" - name: Run Ruff on changed files - +Also applies to: 58-58
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 35-35: trailing spaces
(trailing-spaces)
src/scanoss/scanners/scanner_config.py (1)
57-73
: Add type hints and input validation.Consider these improvements:
- Add type hints for the
args
parameter.- Add validation for numeric fields to ensure they are positive.
-def create_scanner_config_from_args(args) -> ScannerConfig: +from argparse import Namespace + +def create_scanner_config_from_args(args: Namespace) -> ScannerConfig: + post_size = getattr(args, 'post_size', DEFAULT_POST_SIZE) + timeout = getattr(args, 'timeout', DEFAULT_TIMEOUT) + sc_timeout = getattr(args, 'sc_timeout', DEFAULT_SC_TIMEOUT) + nb_threads = getattr(args, 'nb_threads', DEFAULT_NB_THREADS) + + if post_size <= 0: + raise ValueError("post_size must be positive") + if timeout <= 0: + raise ValueError("timeout must be positive") + if sc_timeout <= 0: + raise ValueError("sc_timeout must be positive") + if nb_threads <= 0: + raise ValueError("nb_threads must be positive") + return ScannerConfig( debug=args.debug, trace=args.trace, quiet=args.quiet, api_key=getattr(args, 'key', None), url=getattr(args, 'api_url', None), grpc_url=getattr(args, 'grpc_url', None), - post_size=getattr(args, 'post_size', DEFAULT_POST_SIZE), - timeout=getattr(args, 'timeout', DEFAULT_TIMEOUT), - sc_timeout=getattr(args, 'sc_timeout', DEFAULT_SC_TIMEOUT), - nb_threads=getattr(args, 'nb_threads', DEFAULT_NB_THREADS), + post_size=post_size, + timeout=timeout, + sc_timeout=sc_timeout, + nb_threads=nb_threads, proxy=getattr(args, 'proxy', None), grpc_proxy=getattr(args, 'grpc_proxy', None), ca_cert=getattr(args, 'ca_cert', None), pac=getattr(args, 'pac', None), )src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (1)
19-25
: Simplify version validation error message.The error message construction can be simplified by removing unnecessary f-strings and concatenations.
raise RuntimeError( - f'The grpc package installed is at version {GRPC_VERSION},' - + f' but the generated code in scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py depends on' - + f' grpcio>={GRPC_GENERATED_VERSION}.' - + f' Please upgrade your grpc module to grpcio>={GRPC_GENERATED_VERSION}' - + f' or downgrade your generated code using grpcio-tools<={GRPC_VERSION}.' + f'The grpc package installed is at version {GRPC_VERSION}, but the generated code in ' + f'scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py depends on grpcio>={GRPC_GENERATED_VERSION}. ' + f'Please upgrade your grpc module to grpcio>={GRPC_GENERATED_VERSION} or downgrade your ' + f'generated code using grpcio-tools<={GRPC_VERSION}.' )🧰 Tools
🪛 Ruff (0.8.2)
21-21: f-string without any placeholders
Remove extraneous
f
prefix(F541)
src/scanoss/utils/simhash.py (2)
42-54
: Consider adding value validation in SimhashFeature.The class could benefit from input validation to ensure hash_value and weight are valid integers.
def __init__(self, hash_value: int, weight: int = 1): + if not isinstance(hash_value, int): + raise TypeError("hash_value must be an integer") + if not isinstance(weight, int) or weight < 1: + raise ValueError("weight must be a positive integer") self.hash_value = hash_value self.weight = weight
`66-81`: **Consider optimizing vectorize function.** The current implementation has O(n*64) complexity. For large feature sets, this could be optimized using numpy for better performance.
+import numpy as np
+
 def vectorize(features: list) -> list:
-    v = [0] * 64
-    for feature in features:
-        h = feature.sum()
-        w = feature.get_weight()
-        for i in range(64):
-            if ((h >> i) & 1) == 1:
-                v[i] += w
-            else:
-                v[i] -= w
-    return v
+    v = np.zeros(64, dtype=np.int64)
+    for feature in features:
+        h = feature.sum()
+        w = feature.get_weight()
+        v += w * (2 * ((h >> np.arange(64)) & 1) - 1)
+    return v.tolist()
src/scanoss/results.py (1)
53-87
: Consider using composition over multiple inheritance.The class inherits from both
AbstractPresenter
andScanossBase
, which could lead to the diamond problem. Consider using composition instead.-class Results(AbstractPresenter, ScanossBase): +class Results: def __init__(self, debug: bool = False, ...): - AbstractPresenter.__init__(self, output_file=output_file, output_format=output_format) - ScanossBase.__init__(self, debug, trace, quiet) + self.presenter = AbstractPresenter(output_file=output_file, output_format=output_format) + self.base = ScanossBase(debug, trace, quiet)src/scanoss/file_filters.py (1)
207-213
: Consider sorting the file extensions alphabetically.The newly added extensions break the alphabetical ordering of the list. Consider reordering for better maintainability.
- '.whml', - '.pom', - '.smtml', - '.min.js', - '.mf', - '.base64', - '.s', + '.base64', + '.mf', + '.min.js', + '.pom', + '.s', + '.smtml', + '.whml',src/scanoss/scanossgrpc.py (2)
78-93
: Consider adding error codes to ScanossGrpcError.The custom exception could be enhanced with error codes for better error handling by clients.
class ScanossGrpcError(Exception): + def __init__(self, message: str, code: int = None): + super().__init__(message) + self.code = code
571-599
: Consider adding validation to GrpcConfig.The
GrpcConfig
dataclass could benefit from runtime validation of its fields.+from dataclasses import dataclass, field +from typing import Optional + @dataclass class GrpcConfig: + def __post_init__(self): + if not isinstance(self.url, str): + raise TypeError("url must be a string") + if self.timeout is not None and (not isinstance(self.timeout, int) or self.timeout < 0): + raise ValueError("timeout must be a non-negative integer")src/scanoss/cli.py (2)
477-491
: Consider enhancing input validation for the folder-scan command.While the command setup follows the established pattern, consider adding validation for:
- Maximum directory depth to prevent excessive recursion
- Directory size limits to prevent memory issues
- File type restrictions for better performance
p_folder_scan = subparsers.add_parser( 'folder-scan', description=f'Scan the given directory using folder hashing: {__version__}', help='Scan the given directory using folder hashing', ) p_folder_scan.add_argument('scan_dir', metavar='FILE/DIR', type=str, nargs='?', help='The root directory to scan') +p_folder_scan.add_argument('--max-depth', type=int, default=10, help='Maximum directory depth to scan') +p_folder_scan.add_argument('--max-size', type=str, default='1GB', help='Maximum total directory size to scan') +p_folder_scan.add_argument('--file-types', type=str, help='Comma-separated list of file types to scan') p_folder_scan.add_argument('--output', '-o', type=str, help='Output result file name (optional - default stdout).')
1487-1496
: Add input validation for settings configuration.Consider validating the settings configuration before applying it:
- Check for required fields
- Validate setting values are within acceptable ranges
- Verify setting combinations are valid
def get_scanoss_settings_from_args(args): scanoss_settings = None if not args.skip_settings_file: scanoss_settings = ScanossSettings(debug=args.debug, trace=args.trace, quiet=args.quiet) try: + # Validate settings file exists and is readable + if args.settings and not os.path.isfile(args.settings): + raise ScanossSettingsError(f"Settings file not found: {args.settings}") + if args.settings and not os.access(args.settings, os.R_OK): + raise ScanossSettingsError(f"Settings file not readable: {args.settings}") + scanoss_settings.load_json_file(args.settings, args.scan_dir).set_file_type('new').set_scan_type('identify') + + # Validate loaded settings + if not scanoss_settings.validate(): + raise ScanossSettingsError("Invalid settings configuration") + except ScanossSettingsError as e: print_stderr(f'Error: {e}') sys.exit(1) return scanoss_settingsCLIENT_HELP.md (1)
408-415
: Enhance folder-scan documentation with more examples and best practices.The current documentation provides basic usage but would benefit from:
- More detailed examples showing different output formats
- Best practices for scanning large directories
- Performance considerations and recommendations
### Folder-Scan a Project Folder The new `folder-scan` subcommand performs a comprehensive scan on an entire directory by recursively processing files to generate folder-level fingerprints. It computes CRC64 hashes and simhash values to detect directory-level similarities, which is especially useful for comparing large code bases or detecting duplicate folder structures. **Usage:** ```shell scanoss-py folder-scan /path/to/folder -o folder-scan-results.json + +# Scan with JSON output format +scanoss-py folder-scan /path/to/folder -o results.json --format json + +# Best practices for large directories +scanoss-py folder-scan /path/to/folder --max-depth 5 --max-size 1GB -o results.json + +# Performance optimization example +scanoss-py folder-scan /path/to/folder --file-types=".py,.js,.java" -o results.json
+Performance Considerations:
+- Limit directory depth for better performance
+- Use file type filtering for focused scans
+- Monitor memory usage for large directories
CHANGELOG.md (1)
`12-16`: **Enhance changelog entries with more details.** While the changelog follows the Keep a Changelog format, consider adding:
1. More detailed descriptions of the new features
2. Breaking changes section if applicable
3. Migration instructions if needed
```diff
 ## [1.21.0] - 2025-02-10
 ### Added
-- Add folder-scan subcommand
-- Add AbstractPresenter class for presenting output in a given format
-- Add several reusable helper functions for constructing config objects from CLI args
+- Add folder-scan subcommand for comprehensive directory scanning
+  - Support for CRC64 hash computation
+  - Support for simhash-based similarity detection
+  - Configurable scan depth and file type filtering
+
+- Add AbstractPresenter class for flexible output formatting
+  - Support for plain text and JSON output
+  - Extensible design for future formats
+
+- Add reusable configuration helpers
+  - Simplified CLI argument handling
+  - Improved error handling and validation
+
+### Breaking Changes
+- None
+
+### Migration
+- No migration needed for existing functionality
```
🧰 Tools
🪛 LanguageTool
[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-02-10 ### Added - Add folder-scan subcommand - Add AbstractPr...(REPEATED_VERBS)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
.github/workflows/lint.yml
(2 hunks)CHANGELOG.md
(2 hunks)CLIENT_HELP.md
(1 hunks)pyproject.toml
(1 hunks)requirements.txt
(1 hunks)src/scanoss/__init__.py
(1 hunks)src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py
(2 hunks)src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py
(3 hunks)src/scanoss/cli.py
(46 hunks)src/scanoss/constants.py
(1 hunks)src/scanoss/file_filters.py
(5 hunks)src/scanoss/results.py
(5 hunks)src/scanoss/scanners/__init__.py
(1 hunks)src/scanoss/scanners/scanner_config.py
(1 hunks)src/scanoss/scanners/scanner_hfh.py
(1 hunks)src/scanoss/scanossbase.py
(1 hunks)src/scanoss/scanossgrpc.py
(14 hunks)src/scanoss/utils/abstract_presenter.py
(1 hunks)src/scanoss/utils/crc64.py
(1 hunks)src/scanoss/utils/simhash.py
(1 hunks)version.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- src/scanoss/init.py
- requirements.txt
- src/scanoss/scanossbase.py
- pyproject.toml
- src/scanoss/scanners/init.py
- src/scanoss/constants.py
- src/scanoss/utils/abstract_presenter.py
- src/scanoss/utils/crc64.py
- src/scanoss/scanners/scanner_hfh.py
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/lint.yml
[error] 35-35: trailing spaces
(trailing-spaces)
[warning] 58-58: too many blank lines
(1 > 0) (empty-lines)
🪛 Ruff (0.8.2)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py
30-30: Line too long (1925 > 120)
(E501)
37-37: Line too long (391 > 120)
(E501)
39-39: Line too long (130 > 120)
(E501)
41-41: Line too long (144 > 120)
(E501)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py
6-6: Line too long (122 > 120)
(E501)
21-21: f-string without any placeholders
Remove extraneous f
prefix
(F541)
97-97: Too many arguments in function definition (10 > 5)
(PLR0913)
124-124: Too many arguments in function definition (10 > 5)
(PLR0913)
🪛 LanguageTool
CHANGELOG.md
[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-02-10 ### Added - Add folder-scan subcommand - Add AbstractPr...
(REPEATED_VERBS)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
🔇 Additional comments (9)
.github/workflows/lint.yml (2)
41-41
: LGTM!The filtered files are correctly written to the GitHub Actions output variable.
44-45
: LGTM! Great addition of debug logging.The added logging improves workflow transparency by showing which files were excluded during filtering.
version.py (1)
26-27
: LGTM! Clean refactoring.The changes improve code organization by grouping imports together and simplify error handling logic by removing the unnecessary
else
clause.Also applies to: 51-51
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (1)
1-55
: LGTM! Generated code.This is auto-generated protobuf code and should not be manually edited. The long lines flagged by static analysis are expected in generated code.
🧰 Tools
🪛 Ruff (0.8.2)
25-25: Module level import not at top of file
(E402)
25-25:
scanoss.api.common.v2.scanoss_common_pb2
imported but unusedRemove unused import:
scanoss.api.common.v2.scanoss_common_pb2
(F401)
26-26: Module level import not at top of file
(E402)
26-26:
google.api.annotations_pb2
imported but unusedRemove unused import:
google.api.annotations_pb2
(F401)
27-27: Module level import not at top of file
(E402)
27-27:
protoc_gen_swagger.options.annotations_pb2
imported but unusedRemove unused import:
protoc_gen_swagger.options.annotations_pb2
(F401)
30-30: Line too long (1925 > 120)
(E501)
37-37: Line too long (391 > 120)
(E501)
39-39: Line too long (130 > 120)
(E501)
41-41: Line too long (144 > 120)
(E501)
src/scanoss/utils/simhash.py (1)
`33-39`: **LGTM! Well-implemented FNV-1 hash function.** The implementation of the FNV-1 hash algorithm is correct and follows the standard specification. The use of bitwise operations and masking ensures proper 64-bit integer handling.
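As a point of reference for the comment above, here is a minimal sketch of 64-bit FNV-1 together with the per-bit weighted accumulation that a simhash `vectorize` step performs on such hashes. It is written independently of the PR: the function names and the `(hash_value, weight)` tuple representation are assumptions, not the API of `simhash.py`.

```python
from typing import Iterable, List, Tuple

FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1_64(data: bytes) -> int:
    """64-bit FNV-1: multiply by the prime, then XOR in the next byte."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h = (h * FNV64_PRIME) & MASK64  # mask keeps the product within 64 bits
        h ^= byte  # FNV-1a would XOR before multiplying instead
    return h


def vectorize(features: Iterable[Tuple[int, int]]) -> List[int]:
    """Add the weight to position i when bit i of the hash is set, else subtract it."""
    v = [0] * 64
    for hash_value, weight in features:
        for i in range(64):
            if (hash_value >> i) & 1:
                v[i] += weight
            else:
                v[i] -= weight
    return v


print(hex(fnv1_64(b"a")))
print(vectorize([(fnv1_64(b"a"), 1)])[:4])
```

Because Python integers have arbitrary precision, hash values at or above 2^63 need no special handling in this pure-Python form. A fixed-width rewrite (for instance with signed 64-bit arrays) would have to account for them, which is worth checking before adopting a vectorized variant.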
src/scanoss/results.py (1)
210-224
: LGTM! Clear and consistent plain text formatting.The plain text output formatting is well-structured and handles edge cases appropriately.
src/scanoss/file_filters.py (1)
441-478
: LGTM! Well-documented method with comprehensive checks.The
should_skip_dir
method is well-documented and includes thorough validation logic for directory skipping.src/scanoss/scanossgrpc.py (1)
435-490
: LGTM! Well-implemented gRPC call abstraction.The
_call_rpc
helper method effectively reduces code duplication and provides consistent error handling across all gRPC calls.src/scanoss/cli.py (1)
28-47
: LGTM! Well-organized imports.The imports are properly organized and follow Python best practices. The use of dataclasses for configuration objects and centralized constants improves maintainability.
Actionable comments posted: 0
🧹 Nitpick comments (13)
src/scanoss/api/provenance/v2/scanoss_provenance_pb2.py (1)
`23-23`: **Fix boolean comparison style.** Replace the equality comparison to `False` with a `not` expression for better Python style.
-if _descriptor._USE_C_DESCRIPTORS == False:
+if not _descriptor._USE_C_DESCRIPTORS:
🧰 Tools
🪛 Ruff (0.8.2)
23-23: Avoid equality comparisons to
False
; useif not _descriptor._USE_C_DESCRIPTORS:
for false checksReplace with
not _descriptor._USE_C_DESCRIPTORS
(E712)
src/scanoss/api/components/v2/scanoss_components_pb2.py (1)
23-23
: Consider usingis
operator for boolean comparison.Replace
== False
withis False
or simply usenot
for better Python idioms.-if _descriptor._USE_C_DESCRIPTORS == False: +if not _descriptor._USE_C_DESCRIPTORS:🧰 Tools
🪛 Ruff (0.8.2)
23-23: Avoid equality comparisons to
False
; useif not _descriptor._USE_C_DESCRIPTORS:
for false checksReplace with
not _descriptor._USE_C_DESCRIPTORS
(E712)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (1)
23-23
: Style: Improve boolean comparison.Replace
== False
withis False
or simplynot
for better Python idioms.-if _descriptor._USE_C_DESCRIPTORS == False: +if not _descriptor._USE_C_DESCRIPTORS:🧰 Tools
🪛 Ruff (0.8.2)
23-23: Avoid equality comparisons to
False
; useif not _descriptor._USE_C_DESCRIPTORS:
for false checksReplace with
not _descriptor._USE_C_DESCRIPTORS
(E712)
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py (2)
19-19
: Avoid manual formatting changes to generated code.
Line (4142 > 120) is triggering E501. Since this is auto-generated, it's generally recommended to exclude such files from lint checks rather than modify them manually.🧰 Tools
🪛 Ruff (0.8.2)
19-19: Line too long (4142 > 120)
(E501)
25-26
: Long line in auto-generated code.
Line (400 > 120) triggers E501. Similarly, consider ignoring lint errors for protoc-generated files instead of manually editing them.🧰 Tools
🪛 Ruff (0.8.2)
26-26: Line too long (400 > 120)
(E501)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (2)
19-19
: Long line in generated code.
Line (1925 > 120) triggers E501. Because this is generated, best practice is to exclude or ignore it in linting.🧰 Tools
🪛 Ruff (0.8.2)
19-19: Line too long (1925 > 120)
(E501)
25-26
: Another long line in generated code.
Line (379 > 120) triggers E501. Consider ignoring or excluding these proto-generated files from lint checks.🧰 Tools
🪛 Ruff (0.8.2)
26-26: Line too long (379 > 120)
(E501)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (6)
6-6
: Line too long (122 > 120).
Since this is partially auto-generated, consider ignoring or suppressing lint errors here.🧰 Tools
🪛 Ruff (0.8.2)
6-6: Line too long (122 > 120)
(E501)
`33-42`: **Unimplemented `Echo` method.** Currently raises `NotImplementedError`. If this is intentional, consider adding a docstring explaining why or removing it if not needed.
`44-49`: **`FolderHashScan` method not implemented.** Same pattern as `Echo`: either implement it fully or clarify with a docstring.
70-74
: Experimental API docstring.
Provide clarity if this is production-ready or restricted usage.
77-92
: Method defines too many parameters (10 > 5).
Python style recommends fewer parameters. Try grouping into a config object or reduce arguments if possible.🧰 Tools
🪛 Ruff (0.8.2)
77-77: Too many arguments in function definition (10 > 5)
(PLR0913)
`94-109`: **Excessive parameters in `FolderHashScan`.** Same concern as above; consider consolidating arguments for readability and maintainability.
🧰 Tools
🪛 Ruff (0.8.2)
94-94: Too many arguments in function definition (10 > 5)
(PLR0913)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
src/protoc_gen_swagger/options/annotations_pb2.py
(1 hunks)src/protoc_gen_swagger/options/annotations_pb2_grpc.py
(1 hunks)src/protoc_gen_swagger/options/openapiv2_pb2.py
(1 hunks)src/protoc_gen_swagger/options/openapiv2_pb2_grpc.py
(1 hunks)src/scanoss/api/common/v2/scanoss_common_pb2.py
(1 hunks)src/scanoss/api/common/v2/scanoss_common_pb2_grpc.py
(1 hunks)src/scanoss/api/components/v2/scanoss_components_pb2.py
(1 hunks)src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py
(3 hunks)src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py
(1 hunks)src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py
(3 hunks)src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py
(1 hunks)src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py
(3 hunks)src/scanoss/api/provenance/v2/scanoss_provenance_pb2.py
(2 hunks)src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py
(1 hunks)src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py
(1 hunks)src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py
(2 hunks)src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py
(1 hunks)src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py
(2 hunks)src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py
(1 hunks)src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py
(3 hunks)src/scanoss/utils/crc64.py
(1 hunks)
✅ Files skipped from review due to trivial changes (4)
- src/protoc_gen_swagger/options/openapiv2_pb2_grpc.py
- src/protoc_gen_swagger/options/annotations_pb2_grpc.py
- src/scanoss/api/common/v2/scanoss_common_pb2_grpc.py
- src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/scanoss/utils/crc64.py
🧰 Additional context used
🪛 Ruff (0.8.2)
src/scanoss/api/provenance/v2/scanoss_provenance_pb2.py
23-23: Avoid equality comparisons to False
; use if not _descriptor._USE_C_DESCRIPTORS:
for false checks
Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (389 > 120)
(E501)
27-27: Undefined name _PROVENANCE
(F821)
28-28: Undefined name _PROVENANCE
(F821)
28-28: Line too long (122 > 120)
(E501)
29-29: Undefined name _PROVENANCE
(F821)
30-30: Undefined name _PROVENANCE
(F821)
30-30: Line too long (142 > 120)
(E501)
31-31: Undefined name _PROVENANCERESPONSE
(F821)
32-32: Undefined name _PROVENANCERESPONSE
(F821)
33-33: Undefined name _PROVENANCERESPONSE_DECLAREDLOCATION
(F821)
34-34: Undefined name _PROVENANCERESPONSE_DECLAREDLOCATION
(F821)
35-35: Undefined name _PROVENANCERESPONSE_CURATEDLOCATION
(F821)
36-36: Undefined name _PROVENANCERESPONSE_CURATEDLOCATION
(F821)
37-37: Undefined name _PROVENANCERESPONSE_PURLS
(F821)
38-38: Undefined name _PROVENANCERESPONSE_PURLS
(F821)
39-39: Undefined name _PROVENANCE
(F821)
40-40: Undefined name _PROVENANCE
(F821)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py
19-19: Line too long (2471 > 120)
(E501)
23-23: Avoid equality comparisons to False
; use if not _descriptor._USE_C_DESCRIPTORS:
for false checks
Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (398 > 120)
(E501)
27-27: Undefined name _DEPENDENCIES
(F821)
28-28: Undefined name _DEPENDENCIES
(F821)
28-28: Line too long (126 > 120)
(E501)
29-29: Undefined name _DEPENDENCIES
(F821)
30-30: Undefined name _DEPENDENCIES
(F821)
30-30: Line too long (139 > 120)
(E501)
31-31: Undefined name _DEPENDENCYREQUEST
(F821)
32-32: Undefined name _DEPENDENCYREQUEST
(F821)
33-33: Undefined name _DEPENDENCYREQUEST_PURLS
(F821)
34-34: Undefined name _DEPENDENCYREQUEST_PURLS
(F821)
35-35: Undefined name _DEPENDENCYREQUEST_FILES
(F821)
36-36: Undefined name _DEPENDENCYREQUEST_FILES
(F821)
37-37: Undefined name _DEPENDENCYRESPONSE
(F821)
38-38: Undefined name _DEPENDENCYRESPONSE
(F821)
39-39: Undefined name _DEPENDENCYRESPONSE_LICENSES
(F821)
40-40: Undefined name _DEPENDENCYRESPONSE_LICENSES
(F821)
41-41: Undefined name _DEPENDENCYRESPONSE_DEPENDENCIES
(F821)
42-42: Undefined name _DEPENDENCYRESPONSE_DEPENDENCIES
(F821)
43-43: Undefined name _DEPENDENCYRESPONSE_FILES
(F821)
44-44: Undefined name _DEPENDENCYRESPONSE_FILES
(F821)
45-45: Undefined name _DEPENDENCIES
(F821)
46-46: Undefined name _DEPENDENCIES
(F821)
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py
19-19: Line too long (2566 > 120)
(E501)
22-22: Line too long (124 > 120)
(E501)
23-23: Avoid equality comparisons to False
; use if not _descriptor._USE_C_DESCRIPTORS:
for false checks
Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (419 > 120)
(E501)
27-27: Undefined name _VULNERABILITIES
(F821)
28-28: Undefined name _VULNERABILITIES
(F821)
28-28: Line too long (129 > 120)
(E501)
29-29: Undefined name _VULNERABILITIES
(F821)
30-30: Undefined name _VULNERABILITIES
(F821)
30-30: Line too long (132 > 120)
(E501)
31-31: Undefined name _VULNERABILITIES
(F821)
32-32: Undefined name _VULNERABILITIES
(F821)
32-32: Line too long (152 > 120)
(E501)
33-33: Undefined name _VULNERABILITYREQUEST
(F821)
34-34: Undefined name _VULNERABILITYREQUEST
(F821)
35-35: Undefined name _VULNERABILITYREQUEST_PURLS
(F821)
36-36: Undefined name _VULNERABILITYREQUEST_PURLS
(F821)
37-37: Undefined name _CPERESPONSE
(F821)
38-38: Undefined name _CPERESPONSE
(F821)
39-39: Undefined name _CPERESPONSE_PURLS
(F821)
40-40: Undefined name _CPERESPONSE_PURLS
(F821)
41-41: Undefined name _VULNERABILITYRESPONSE
(F821)
42-42: Undefined name _VULNERABILITYRESPONSE
(F821)
43-43: Undefined name _VULNERABILITYRESPONSE_VULNERABILITIES
(F821)
44-44: Undefined name _VULNERABILITYRESPONSE_VULNERABILITIES
(F821)
45-45: Undefined name _VULNERABILITYRESPONSE_PURLS
(F821)
46-46: Undefined name _VULNERABILITYRESPONSE_PURLS
(F821)
47-47: Undefined name _VULNERABILITIES
(F821)
48-48: Undefined name _VULNERABILITIES
(F821)
src/protoc_gen_swagger/options/annotations_pb2.py
18-18: Line too long (1009 > 120)
(E501)
22-22: Avoid equality comparisons to False
; use if not _descriptor._USE_C_DESCRIPTORS:
for false checks
Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
23-23: Undefined name openapiv2_swagger
(F821)
24-24: Undefined name openapiv2_operation
(F821)
25-25: Undefined name openapiv2_schema
(F821)
26-26: Undefined name openapiv2_tag
(F821)
27-27: Undefined name openapiv2_field
(F821)
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py
19-19: Line too long (4142 > 120)
(E501)
23-23: Avoid equality comparisons to False
; use if not _descriptor._USE_C_DESCRIPTORS:
for false checks
Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (400 > 120)
(E501)
27-27: Undefined name _CRYPTOGRAPHY
(F821)
28-28: Undefined name _CRYPTOGRAPHY
(F821)
28-28: Line too long (126 > 120)
(E501)
29-29: Undefined name _CRYPTOGRAPHY
(F821)
30-30: Undefined name _CRYPTOGRAPHY
(F821)
30-30: Line too long (138 > 120)
(E501)
31-31: Undefined name _CRYPTOGRAPHY
(F821)
32-32: Undefined name _CRYPTOGRAPHY
(F821)
32-32: Line too long (149 > 120)
(E501)
33-33: Undefined name _CRYPTOGRAPHY
(F821)
34-34: Undefined name _CRYPTOGRAPHY
(F821)
34-34: Line too long (145 > 120)
(E501)
35-35: Undefined name _CRYPTOGRAPHY
(F821)
36-36: Undefined name _CRYPTOGRAPHY
(F821)
36-36: Line too long (139 > 120)
(E501)
37-37: Undefined name _CRYPTOGRAPHY
(F821)
38-38: Undefined name _CRYPTOGRAPHY
(F821)
38-38: Line too long (141 > 120)
(E501)
39-39: Undefined name _ALGORITHM
(F821)
40-40: Undefined name _ALGORITHM
(F821)
41-41: Undefined name _ALGORITHMRESPONSE
(F821)
42-42: Undefined name _ALGORITHMRESPONSE
(F821)
43-43: Undefined name _ALGORITHMRESPONSE_PURLS
(F821)
44-44: Undefined name _ALGORITHMRESPONSE_PURLS
(F821)
45-45: Undefined name _ALGORITHMSINRANGERESPONSE
(F821)
46-46: Undefined name _ALGORITHMSINRANGERESPONSE
(F821)
47-47: Undefined name _ALGORITHMSINRANGERESPONSE_PURL
(F821)
48-48: Undefined name _ALGORITHMSINRANGERESPONSE_PURL
(F821)
49-49: Undefined name _VERSIONSINRANGERESPONSE
(F821)
50-50: Undefined name _VERSIONSINRANGERESPONSE
(F821)
51-51: Undefined name _VERSIONSINRANGERESPONSE_PURL
(F821)
52-52: Undefined name _VERSIONSINRANGERESPONSE_PURL
(F821)
53-53: Undefined name _HINT
(F821)
54-54: Undefined name _HINT
(F821)
55-55: Undefined name _HINTSRESPONSE
(F821)
56-56: Undefined name _HINTSRESPONSE
(F821)
57-57: Undefined name _HINTSRESPONSE_PURLS
(F821)
58-58: Undefined name _HINTSRESPONSE_PURLS
(F821)
59-59: Undefined name _HINTSINRANGERESPONSE
(F821)
60-60: Undefined name _HINTSINRANGERESPONSE
(F821)
61-61: Undefined name _HINTSINRANGERESPONSE_PURL
(F821)
62-62: Undefined name _HINTSINRANGERESPONSE_PURL
(F821)
63-63: Undefined name _CRYPTOGRAPHY
(F821)
64-64: Undefined name _CRYPTOGRAPHY
(F821)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py
19-19: Line too long (1925 > 120)
(E501)
23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks. Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (379 > 120)
(E501)
27-27: Undefined name _SCANNING
(F821)
28-28: Undefined name _SCANNING
(F821)
29-29: Undefined name _SCANNING
(F821)
30-30: Undefined name _SCANNING
(F821)
30-30: Line too long (132 > 120)
(E501)
31-31: Undefined name _HFHREQUEST
(F821)
32-32: Undefined name _HFHREQUEST
(F821)
33-33: Undefined name _HFHREQUEST_CHILDREN
(F821)
34-34: Undefined name _HFHREQUEST_CHILDREN
(F821)
35-35: Undefined name _HFHRESPONSE
(F821)
36-36: Undefined name _HFHRESPONSE
(F821)
37-37: Undefined name _HFHRESPONSE_COMPONENT
(F821)
38-38: Undefined name _HFHRESPONSE_COMPONENT
(F821)
39-39: Undefined name _HFHRESPONSE_RESULT
(F821)
40-40: Undefined name _HFHRESPONSE_RESULT
(F821)
41-41: Undefined name _SCANNING
(F821)
42-42: Undefined name _SCANNING
(F821)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py
6-6: Line too long (122 > 120)
(E501)
77-77: Too many arguments in function definition (10 > 5)
(PLR0913)
94-94: Too many arguments in function definition (10 > 5)
(PLR0913)
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py
6-6: Line too long (138 > 120)
(E501)
79-79: Line too long (123 > 120)
(E501)
145-145: Too many arguments in function definition (10 > 5)
(PLR0913)
162-162: Too many arguments in function definition (10 > 5)
(PLR0913)
179-179: Too many arguments in function definition (10 > 5)
(PLR0913)
189-189: Line too long (127 > 120)
(E501)
196-196: Too many arguments in function definition (10 > 5)
(PLR0913)
206-206: Line too long (125 > 120)
(E501)
213-213: Too many arguments in function definition (10 > 5)
(PLR0913)
223-223: Line too long (122 > 120)
(E501)
230-230: Too many arguments in function definition (10 > 5)
(PLR0913)
240-240: Line too long (125 > 120)
(E501)
src/scanoss/api/components/v2/scanoss_components_pb2.py
19-19: Line too long (3679 > 120)
(E501)
23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks. Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (389 > 120)
(E501)
27-27: Undefined name _COMPONENTS
(F821)
28-28: Undefined name _COMPONENTS
(F821)
28-28: Line too long (122 > 120)
(E501)
29-29: Undefined name _COMPONENTS
(F821)
30-30: Undefined name _COMPONENTS
(F821)
30-30: Line too long (136 > 120)
(E501)
31-31: Undefined name _COMPONENTS
(F821)
32-32: Undefined name _COMPONENTS
(F821)
32-32: Line too long (139 > 120)
(E501)
33-33: Undefined name _COMPONENTS
(F821)
34-34: Undefined name _COMPONENTS
(F821)
34-34: Line too long (144 > 120)
(E501)
35-35: Undefined name _COMPSEARCHREQUEST
(F821)
36-36: Undefined name _COMPSEARCHREQUEST
(F821)
37-37: Undefined name _COMPSTATISTIC
(F821)
38-38: Undefined name _COMPSTATISTIC
(F821)
39-39: Undefined name _COMPSTATISTIC_LANGUAGE
(F821)
40-40: Undefined name _COMPSTATISTIC_LANGUAGE
(F821)
41-41: Undefined name _COMPSTATISTICRESPONSE
(F821)
42-42: Undefined name _COMPSTATISTICRESPONSE
(F821)
43-43: Undefined name _COMPSTATISTICRESPONSE_PURLS
(F821)
44-44: Undefined name _COMPSTATISTICRESPONSE_PURLS
(F821)
45-45: Undefined name _COMPSEARCHRESPONSE
(F821)
46-46: Undefined name _COMPSEARCHRESPONSE
(F821)
47-47: Undefined name _COMPSEARCHRESPONSE_COMPONENT
(F821)
48-48: Undefined name _COMPSEARCHRESPONSE_COMPONENT
(F821)
49-49: Undefined name _COMPVERSIONREQUEST
(F821)
50-50: Undefined name _COMPVERSIONREQUEST
(F821)
51-51: Undefined name _COMPVERSIONRESPONSE
(F821)
52-52: Undefined name _COMPVERSIONRESPONSE
(F821)
53-53: Undefined name _COMPVERSIONRESPONSE_LICENSE
(F821)
54-54: Undefined name _COMPVERSIONRESPONSE_LICENSE
(F821)
55-55: Undefined name _COMPVERSIONRESPONSE_VERSION
(F821)
56-56: Undefined name _COMPVERSIONRESPONSE_VERSION
(F821)
57-57: Undefined name _COMPVERSIONRESPONSE_COMPONENT
(F821)
58-58: Undefined name _COMPVERSIONRESPONSE_COMPONENT
(F821)
59-59: Undefined name _COMPONENTS
(F821)
60-60: Undefined name _COMPONENTS
(F821)
src/protoc_gen_swagger/options/openapiv2_pb2.py
18-18: Line too long (9607 > 120)
(E501)
22-22: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks. Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Undefined name _SWAGGER_RESPONSESENTRY
(F821)
27-27: Undefined name _SWAGGER_RESPONSESENTRY
(F821)
28-28: Undefined name _SWAGGER_EXTENSIONSENTRY
(F821)
29-29: Undefined name _SWAGGER_EXTENSIONSENTRY
(F821)
30-30: Undefined name _OPERATION_RESPONSESENTRY
(F821)
31-31: Undefined name _OPERATION_RESPONSESENTRY
(F821)
32-32: Undefined name _OPERATION_EXTENSIONSENTRY
(F821)
33-33: Undefined name _OPERATION_EXTENSIONSENTRY
(F821)
34-34: Undefined name _RESPONSE_HEADERSENTRY
(F821)
35-35: Undefined name _RESPONSE_HEADERSENTRY
(F821)
36-36: Undefined name _RESPONSE_EXAMPLESENTRY
(F821)
37-37: Undefined name _RESPONSE_EXAMPLESENTRY
(F821)
38-38: Undefined name _RESPONSE_EXTENSIONSENTRY
(F821)
39-39: Undefined name _RESPONSE_EXTENSIONSENTRY
(F821)
40-40: Undefined name _INFO_EXTENSIONSENTRY
(F821)
41-41: Undefined name _INFO_EXTENSIONSENTRY
(F821)
42-42: Undefined name _SCHEMA
(F821)
43-43: Undefined name _SCHEMA
(F821)
44-44: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY
(F821)
45-45: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY
(F821)
46-46: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY
(F821)
47-47: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY
(F821)
48-48: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY
(F821)
49-49: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY
(F821)
50-50: Undefined name _SCOPES_SCOPEENTRY
(F821)
51-51: Undefined name _SCOPES_SCOPEENTRY
(F821)
52-52: Undefined name _SWAGGER
(F821)
53-53: Undefined name _SWAGGER
(F821)
54-54: Undefined name _SWAGGER_RESPONSESENTRY
(F821)
55-55: Undefined name _SWAGGER_RESPONSESENTRY
(F821)
56-56: Undefined name _SWAGGER_EXTENSIONSENTRY
(F821)
57-57: Undefined name _SWAGGER_EXTENSIONSENTRY
(F821)
58-58: Undefined name _SWAGGER_SWAGGERSCHEME
(F821)
59-59: Undefined name _SWAGGER_SWAGGERSCHEME
(F821)
60-60: Undefined name _OPERATION
(F821)
61-61: Undefined name _OPERATION
(F821)
62-62: Undefined name _OPERATION_RESPONSESENTRY
(F821)
63-63: Undefined name _OPERATION_RESPONSESENTRY
(F821)
64-64: Undefined name _OPERATION_EXTENSIONSENTRY
(F821)
65-65: Undefined name _OPERATION_EXTENSIONSENTRY
(F821)
66-66: Undefined name _HEADER
(F821)
67-67: Undefined name _HEADER
(F821)
68-68: Undefined name _RESPONSE
(F821)
69-69: Undefined name _RESPONSE
(F821)
70-70: Undefined name _RESPONSE_HEADERSENTRY
(F821)
71-71: Undefined name _RESPONSE_HEADERSENTRY
(F821)
72-72: Undefined name _RESPONSE_EXAMPLESENTRY
(F821)
73-73: Undefined name _RESPONSE_EXAMPLESENTRY
(F821)
74-74: Undefined name _RESPONSE_EXTENSIONSENTRY
(F821)
75-75: Undefined name _RESPONSE_EXTENSIONSENTRY
(F821)
76-76: Undefined name _INFO
(F821)
77-77: Undefined name _INFO
(F821)
78-78: Undefined name _INFO_EXTENSIONSENTRY
(F821)
79-79: Undefined name _INFO_EXTENSIONSENTRY
(F821)
80-80: Undefined name _CONTACT
(F821)
81-81: Undefined name _CONTACT
(F821)
82-82: Undefined name _LICENSE
(F821)
83-83: Undefined name _LICENSE
(F821)
84-84: Undefined name _EXTERNALDOCUMENTATION
(F821)
85-85: Undefined name _EXTERNALDOCUMENTATION
(F821)
86-86: Undefined name _SCHEMA
(F821)
87-87: Undefined name _SCHEMA
(F821)
88-88: Undefined name _JSONSCHEMA
(F821)
89-89: Undefined name _JSONSCHEMA
(F821)
90-90: Undefined name _JSONSCHEMA_JSONSCHEMASIMPLETYPES
(F821)
91-91: Undefined name _JSONSCHEMA_JSONSCHEMASIMPLETYPES
(F821)
92-92: Undefined name _TAG
(F821)
93-93: Undefined name _TAG
(F821)
94-94: Undefined name _SECURITYDEFINITIONS
(F821)
95-95: Undefined name _SECURITYDEFINITIONS
(F821)
96-96: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY
(F821)
97-97: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY
(F821)
98-98: Undefined name _SECURITYSCHEME
(F821)
99-99: Undefined name _SECURITYSCHEME
(F821)
100-100: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY
(F821)
101-101: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY
(F821)
102-102: Undefined name _SECURITYSCHEME_TYPE
(F821)
103-103: Undefined name _SECURITYSCHEME_TYPE
(F821)
104-104: Undefined name _SECURITYSCHEME_IN
(F821)
105-105: Undefined name _SECURITYSCHEME_IN
(F821)
106-106: Undefined name _SECURITYSCHEME_FLOW
(F821)
107-107: Undefined name _SECURITYSCHEME_FLOW
(F821)
108-108: Undefined name _SECURITYREQUIREMENT
(F821)
109-109: Undefined name _SECURITYREQUIREMENT
(F821)
110-110: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTVALUE
(F821)
111-111: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTVALUE
(F821)
112-112: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY
(F821)
113-113: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY
(F821)
114-114: Undefined name _SCOPES
(F821)
115-115: Undefined name _SCOPES
(F821)
116-116: Undefined name _SCOPES_SCOPEENTRY
(F821)
117-117: Undefined name _SCOPES_SCOPEENTRY
(F821)
src/scanoss/api/common/v2/scanoss_common_pb2.py
16-16: Line too long (845 > 120)
(E501)
20-20: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks. Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
24-24: Undefined name _STATUSCODE
(F821)
25-25: Undefined name _STATUSCODE
(F821)
26-26: Undefined name _STATUSRESPONSE
(F821)
27-27: Undefined name _STATUSRESPONSE
(F821)
28-28: Undefined name _ECHOREQUEST
(F821)
29-29: Undefined name _ECHOREQUEST
(F821)
30-30: Undefined name _ECHORESPONSE
(F821)
31-31: Undefined name _ECHORESPONSE
(F821)
32-32: Undefined name _PURLREQUEST
(F821)
33-33: Undefined name _PURLREQUEST
(F821)
34-34: Undefined name _PURLREQUEST_PURLS
(F821)
35-35: Undefined name _PURLREQUEST_PURLS
(F821)
src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py
6-6: Line too long (130 > 120)
(E501)
111-111: Too many arguments in function definition (10 > 5)
(PLR0913)
128-128: Too many arguments in function definition (10 > 5)
(PLR0913)
145-145: Too many arguments in function definition (10 > 5)
(PLR0913)
155-155: Line too long (123 > 120)
(E501)
162-162: Too many arguments in function definition (10 > 5)
(PLR0913)
172-172: Line too long (125 > 120)
(E501)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py
6-6: Line too long (138 > 120)
(E501)
77-77: Too many arguments in function definition (10 > 5)
(PLR0913)
94-94: Too many arguments in function definition (10 > 5)
(PLR0913)
104-104: Line too long (122 > 120)
(E501)
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py
19-19: Line too long (1718 > 120)
(E501)
23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks. Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (375 > 120)
(E501)
27-27: Undefined name _SEMGREP
(F821)
28-28: Undefined name _SEMGREP
(F821)
29-29: Undefined name _SEMGREP
(F821)
30-30: Undefined name _SEMGREP
(F821)
30-30: Line too long (123 > 120)
(E501)
31-31: Undefined name _SEMGREPRESPONSE
(F821)
32-32: Undefined name _SEMGREPRESPONSE
(F821)
33-33: Undefined name _SEMGREPRESPONSE_ISSUE
(F821)
34-34: Undefined name _SEMGREPRESPONSE_ISSUE
(F821)
35-35: Undefined name _SEMGREPRESPONSE_FILE
(F821)
36-36: Undefined name _SEMGREPRESPONSE_FILE
(F821)
37-37: Undefined name _SEMGREPRESPONSE_PURLS
(F821)
38-38: Undefined name _SEMGREPRESPONSE_PURLS
(F821)
39-39: Undefined name _SEMGREP
(F821)
40-40: Undefined name _SEMGREP
(F821)
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py
77-77: Too many arguments in function definition (10 > 5)
(PLR0913)
94-94: Too many arguments in function definition (10 > 5)
(PLR0913)
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py
6-6: Line too long (150 > 120)
(E501)
94-94: Too many arguments in function definition (10 > 5)
(PLR0913)
111-111: Too many arguments in function definition (10 > 5)
(PLR0913)
128-128: Too many arguments in function definition (10 > 5)
(PLR0913)
138-138: Line too long (131 > 120)
(E501)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
🔇 Additional comments (26)
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (1)
1-143: LGTM! Note: This is a generated file.
The changes are purely cosmetic (indentation and formatting) and do not affect the functionality. Since this is a generated file (as indicated by the comment at line 1), any manual changes will be overwritten the next time the gRPC code is generated.
🧰 Tools
🪛 Ruff (0.8.2)
6-6: Line too long (150 > 120)
(E501)
94-94: Too many arguments in function definition (10 > 5)
(PLR0913)
111-111: Too many arguments in function definition (10 > 5)
(PLR0913)
128-128: Too many arguments in function definition (10 > 5)
(PLR0913)
138-138: Line too long (131 > 120)
(E501)
src/scanoss/api/provenance/v2/scanoss_provenance_pb2.py (1)
1-4: Note: This is an auto-generated file.
This file is generated by the protocol buffer compiler and should not be manually edited. Any changes should be made to the source .proto file instead.
src/scanoss/api/components/v2/scanoss_components_pb2.py (2)
1-3: LGTM! This is an auto-generated file.
This file is automatically generated by the protocol buffer compiler and should not be manually edited.
19-60: Static analysis warnings can be safely ignored.
The static analysis tool reports several warnings about:
- Line length violations: these are expected for serialized protobuf data
- Undefined names: these symbols are created dynamically at runtime, when the generated module builds its descriptors, so the linter never sees an assignment to them
These warnings can be safely ignored as they are typical for generated protobuf code.
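To illustrate why a linter reports these names as undefined, here is a minimal sketch that does not use protobuf at all. The helper build_descriptors and the dictionary-based "descriptors" are hypothetical stand-ins for what the generated module's builder call does; the only point is that names are injected into a namespace at runtime and then referenced without any visible assignment.

```python
def build_descriptors(names, namespace):
    """Hypothetical stand-in for the protobuf builder: inject one object per name."""
    for name in names:
        # Mirrors the generated naming style, e.g. 'components' -> '_COMPONENTS'.
        namespace['_' + name.upper()] = {'name': name, 'options': 'initial'}


# Plays the role of the generated module's globals().
module_globals = {}
build_descriptors(['components', 'compsearchrequest'], module_globals)

# The rest of a generated module refers to the injected names directly.
# A linter reading this source finds no assignment to _COMPONENTS and reports F821,
# yet the code runs because the name exists in the namespace by this point.
module_source = "_COMPONENTS['options'] = None\nresult = _COMPONENTS['name']"
exec(module_source, module_globals)

print(module_globals['result'])
```

The same reasoning applies to every F821 entry listed for the *_pb2.py files: the name is defined, just not in a way static analysis can see.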
🧰 Tools
🪛 Ruff (0.8.2)
19-19: Line too long (3679 > 120)
(E501)
23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks. Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (389 > 120)
(E501)
27-27: Undefined name _COMPONENTS
(F821)
28-28: Undefined name _COMPONENTS
(F821)
28-28: Line too long (122 > 120)
(E501)
29-29: Undefined name _COMPONENTS
(F821)
30-30: Undefined name _COMPONENTS
(F821)
30-30: Line too long (136 > 120)
(E501)
31-31: Undefined name _COMPONENTS
(F821)
32-32: Undefined name _COMPONENTS
(F821)
32-32: Line too long (139 > 120)
(E501)
33-33: Undefined name _COMPONENTS
(F821)
34-34: Undefined name _COMPONENTS
(F821)
34-34: Line too long (144 > 120)
(E501)
35-35: Undefined name _COMPSEARCHREQUEST
(F821)
36-36: Undefined name _COMPSEARCHREQUEST
(F821)
37-37: Undefined name _COMPSTATISTIC
(F821)
38-38: Undefined name _COMPSTATISTIC
(F821)
39-39: Undefined name _COMPSTATISTIC_LANGUAGE
(F821)
40-40: Undefined name _COMPSTATISTIC_LANGUAGE
(F821)
41-41: Undefined name _COMPSTATISTICRESPONSE
(F821)
42-42: Undefined name _COMPSTATISTICRESPONSE
(F821)
43-43: Undefined name _COMPSTATISTICRESPONSE_PURLS
(F821)
44-44: Undefined name _COMPSTATISTICRESPONSE_PURLS
(F821)
45-45: Undefined name _COMPSEARCHRESPONSE
(F821)
46-46: Undefined name _COMPSEARCHRESPONSE
(F821)
47-47: Undefined name _COMPSEARCHRESPONSE_COMPONENT
(F821)
48-48: Undefined name _COMPSEARCHRESPONSE_COMPONENT
(F821)
49-49: Undefined name _COMPVERSIONREQUEST
(F821)
50-50: Undefined name _COMPVERSIONREQUEST
(F821)
51-51: Undefined name _COMPVERSIONRESPONSE
(F821)
52-52: Undefined name _COMPVERSIONRESPONSE
(F821)
53-53: Undefined name _COMPVERSIONRESPONSE_LICENSE
(F821)
54-54: Undefined name _COMPVERSIONRESPONSE_LICENSE
(F821)
55-55: Undefined name _COMPVERSIONRESPONSE_VERSION
(F821)
56-56: Undefined name _COMPVERSIONRESPONSE_VERSION
(F821)
57-57: Undefined name _COMPVERSIONRESPONSE_COMPONENT
(F821)
58-58: Undefined name _COMPVERSIONRESPONSE_COMPONENT
(F821)
59-59: Undefined name _COMPONENTS
(F821)
60-60: Undefined name _COMPONENTS
(F821)
src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (1)
9-177: LGTM! Verify experimental API status.
The gRPC service definition is well-structured with all necessary components (stub, servicer, and experimental API). Since this is a generated file, no changes are needed.
Please verify if the experimental API status is intentional and documented in the project's documentation. This affects the following methods:
- Echo
- SearchComponents
- GetComponentVersions
- GetComponentStatistics
🧰 Tools
🪛 Ruff (0.8.2)
111-111: Too many arguments in function definition (10 > 5)
(PLR0913)
128-128: Too many arguments in function definition (10 > 5)
(PLR0913)
145-145: Too many arguments in function definition (10 > 5)
(PLR0913)
155-155: Line too long (123 > 120)
(E501)
162-162: Too many arguments in function definition (10 > 5)
(PLR0913)
172-172: Line too long (125 > 120)
(E501)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (1)
25-30: LGTM! API paths are correctly defined.
The serialized options correctly define the API paths for the Dependencies service:
- /api/v2/dependencies/echo
- /api/v2/dependencies/dependencies
🧰 Tools
🪛 Ruff (0.8.2)
26-26: Line too long (398 > 120)
(E501)
27-27: Undefined name _DEPENDENCIES
(F821)
28-28: Undefined name _DEPENDENCIES
(F821)
28-28: Line too long (126 > 120)
(E501)
29-29: Undefined name _DEPENDENCIES
(F821)
30-30: Undefined name _DEPENDENCIES
(F821)
30-30: Line too long (139 > 120)
(E501)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (2)
70-74: Note: This is an experimental API.
Users should be aware that this API is marked as experimental and may be subject to changes.
76-91: LGTM! Service methods are correctly defined.
The gRPC service methods are properly defined with all necessary parameters for:
- Echo: Standard health check endpoint
- GetDependencies: Main service endpoint for dependency analysis
Also applies to: 93-108
🧰 Tools
🪛 Ruff (0.8.2)
77-77: Too many arguments in function definition (10 > 5)
(PLR0913)
src/protoc_gen_swagger/options/openapiv2_pb2.py (3)
18-18: Ignore line-length warning in auto-generated code.
This line is exceedingly long (over 9,600 characters) as flagged by static analysis, but it's typical of auto-generated protobuf logic. Generally, we avoid manually splitting such lines to preserve parity with the generator's output.
🧰 Tools
🪛 Ruff (0.8.2)
18-18: Line too long (9607 > 120)
(E501)
23-51: Ignore undefined-name warnings in auto-generated code (part 1).
Static analysis flags references like _SWAGGER_RESPONSESENTRY as undefined. These are dynamically created by _builder.BuildMessageAndEnumDescriptors and are standard in generated protobuf files, so you can safely disregard the warnings.
🧰 Tools
🪛 Ruff (0.8.2)
26-26: Undefined name _SWAGGER_RESPONSESENTRY
(F821)
27-27: Undefined name _SWAGGER_RESPONSESENTRY
(F821)
28-28: Undefined name _SWAGGER_EXTENSIONSENTRY
(F821)
29-29: Undefined name _SWAGGER_EXTENSIONSENTRY
(F821)
30-30: Undefined name _OPERATION_RESPONSESENTRY
(F821)
31-31: Undefined name _OPERATION_RESPONSESENTRY
(F821)
32-32: Undefined name _OPERATION_EXTENSIONSENTRY
(F821)
33-33: Undefined name _OPERATION_EXTENSIONSENTRY
(F821)
34-34: Undefined name _RESPONSE_HEADERSENTRY
(F821)
35-35: Undefined name _RESPONSE_HEADERSENTRY
(F821)
36-36: Undefined name _RESPONSE_EXAMPLESENTRY
(F821)
37-37: Undefined name _RESPONSE_EXAMPLESENTRY
(F821)
38-38: Undefined name _RESPONSE_EXTENSIONSENTRY
(F821)
39-39: Undefined name _RESPONSE_EXTENSIONSENTRY
(F821)
40-40: Undefined name _INFO_EXTENSIONSENTRY
(F821)
41-41: Undefined name _INFO_EXTENSIONSENTRY
(F821)
42-42: Undefined name _SCHEMA
(F821)
43-43: Undefined name _SCHEMA
(F821)
44-44: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY
(F821)
45-45: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY
(F821)
46-46: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY
(F821)
47-47: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY
(F821)
48-48: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY
(F821)
49-49: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY
(F821)
50-50: Undefined name _SCOPES_SCOPEENTRY
(F821)
51-51: Undefined name _SCOPES_SCOPEENTRY
(F821)
52-117: Ignore undefined-name warnings in auto-generated code (part 2).
Similar to the previous block, any references flagged as undefined (e.g., _SWAGGER, _OPERATION, _INFO, etc.) are created at runtime and are not actual errors. This is normal for protobuf-generated files.
🧰 Tools
🪛 Ruff (0.8.2)
52-52: Undefined name _SWAGGER
(F821)
53-53: Undefined name _SWAGGER
(F821)
54-54: Undefined name _SWAGGER_RESPONSESENTRY
(F821)
55-55: Undefined name _SWAGGER_RESPONSESENTRY
(F821)
56-56: Undefined name _SWAGGER_EXTENSIONSENTRY
(F821)
57-57: Undefined name _SWAGGER_EXTENSIONSENTRY
(F821)
58-58: Undefined name _SWAGGER_SWAGGERSCHEME
(F821)
59-59: Undefined name _SWAGGER_SWAGGERSCHEME
(F821)
60-60: Undefined name _OPERATION
(F821)
61-61: Undefined name _OPERATION
(F821)
62-62: Undefined name _OPERATION_RESPONSESENTRY
(F821)
63-63: Undefined name _OPERATION_RESPONSESENTRY
(F821)
64-64: Undefined name _OPERATION_EXTENSIONSENTRY
(F821)
65-65: Undefined name _OPERATION_EXTENSIONSENTRY
(F821)
66-66: Undefined name _HEADER
(F821)
67-67: Undefined name _HEADER
(F821)
68-68: Undefined name _RESPONSE
(F821)
69-69: Undefined name _RESPONSE
(F821)
70-70: Undefined name _RESPONSE_HEADERSENTRY
(F821)
71-71: Undefined name _RESPONSE_HEADERSENTRY
(F821)
72-72: Undefined name _RESPONSE_EXAMPLESENTRY
(F821)
73-73: Undefined name _RESPONSE_EXAMPLESENTRY
(F821)
74-74: Undefined name _RESPONSE_EXTENSIONSENTRY
(F821)
75-75: Undefined name _RESPONSE_EXTENSIONSENTRY
(F821)
76-76: Undefined name _INFO
(F821)
77-77: Undefined name _INFO
(F821)
78-78: Undefined name _INFO_EXTENSIONSENTRY
(F821)
79-79: Undefined name _INFO_EXTENSIONSENTRY
(F821)
80-80: Undefined name _CONTACT
(F821)
81-81: Undefined name _CONTACT
(F821)
82-82: Undefined name _LICENSE
(F821)
83-83: Undefined name _LICENSE
(F821)
84-84: Undefined name _EXTERNALDOCUMENTATION
(F821)
85-85: Undefined name _EXTERNALDOCUMENTATION
(F821)
86-86: Undefined name _SCHEMA
(F821)
87-87: Undefined name _SCHEMA
(F821)
88-88: Undefined name _JSONSCHEMA
(F821)
89-89: Undefined name _JSONSCHEMA
(F821)
90-90: Undefined name _JSONSCHEMA_JSONSCHEMASIMPLETYPES
(F821)
91-91: Undefined name _JSONSCHEMA_JSONSCHEMASIMPLETYPES
(F821)
92-92: Undefined name _TAG
(F821)
93-93: Undefined name _TAG
(F821)
94-94: Undefined name _SECURITYDEFINITIONS
(F821)
95-95: Undefined name _SECURITYDEFINITIONS
(F821)
96-96: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY
(F821)
97-97: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY
(F821)
98-98: Undefined name _SECURITYSCHEME
(F821)
99-99: Undefined name _SECURITYSCHEME
(F821)
100-100: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY
(F821)
101-101: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY
(F821)
102-102: Undefined name _SECURITYSCHEME_TYPE
(F821)
103-103: Undefined name _SECURITYSCHEME_TYPE
(F821)
104-104: Undefined name _SECURITYSCHEME_IN
(F821)
105-105: Undefined name _SECURITYSCHEME_IN
(F821)
106-106: Undefined name _SECURITYSCHEME_FLOW
(F821)
107-107: Undefined name _SECURITYSCHEME_FLOW
(F821)
108-108: Undefined name _SECURITYREQUIREMENT
(F821)
109-109: Undefined name _SECURITYREQUIREMENT
(F821)
110-110: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTVALUE
(F821)
111-111: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTVALUE
(F821)
112-112: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY
(F821)
113-113: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY
(F821)
114-114: Undefined name _SCOPES
(F821)
115-115: Undefined name _SCOPES
(F821)
116-116: Undefined name _SCOPES_SCOPEENTRY
(F821)
117-117: Undefined name _SCOPES_SCOPEENTRY
(F821)
src/protoc_gen_swagger/options/annotations_pb2.py (3)
18-18: Ignore line-length warning in auto-generated code.
This line is very long due to the serialized data for the descriptor. It is standard behavior in protobuf-generated files, and manually wrapping or modifying this line could break the generated code’s structure.
🧰 Tools
🪛 Ruff (0.8.2)
18-18: Line too long (1009 > 120)
(E501)
23-27: Ignore undefined-name warnings in auto-generated code (extensions).
The static analysis flags openapiv2_swagger, openapiv2_operation, etc., as undefined. These are extensions declared within the generated protobuf logic. They are correctly assembled at runtime and are not issues.
🧰 Tools
🪛 Ruff (0.8.2)
23-23: Undefined name openapiv2_swagger
(F821)
24-24: Undefined name openapiv2_operation
(F821)
25-25: Undefined name openapiv2_schema
(F821)
26-26: Undefined name openapiv2_tag
(F821)
27-27: Undefined name openapiv2_field
(F821)
29-30: Ignore protobuf descriptor assignment warnings.
Setting _options and _serialized_options to None or a byte string here is part of standard protobuf code generation logic. The static analysis warnings about undefined references do not apply to these generated lines.
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py (2)
27-38: Static analysis false positive for _CRYPTOGRAPHY.
The _CRYPTOGRAPHY reference may appear undefined to certain linters, but protoc-generated code often instantiates these variables. Verify it’s generated correctly or silence this lint rule for auto-generated code.
🧰 Tools
🪛 Ruff (0.8.2)
27-27: Undefined name _CRYPTOGRAPHY
(F821)
28-28: Undefined name _CRYPTOGRAPHY
(F821)
28-28: Line too long (126 > 120)
(E501)
29-29: Undefined name _CRYPTOGRAPHY
(F821)
30-30: Undefined name _CRYPTOGRAPHY
(F821)
30-30: Line too long (138 > 120)
(E501)
31-31: Undefined name _CRYPTOGRAPHY
(F821)
32-32: Undefined name _CRYPTOGRAPHY
(F821)
32-32: Line too long (149 > 120)
(E501)
33-33: Undefined name _CRYPTOGRAPHY
(F821)
34-34: Undefined name _CRYPTOGRAPHY
(F821)
34-34: Line too long (145 > 120)
(E501)
35-35: Undefined name _CRYPTOGRAPHY
(F821)
36-36: Undefined name _CRYPTOGRAPHY
(F821)
36-36: Line too long (139 > 120)
(E501)
37-37: Undefined name _CRYPTOGRAPHY
(F821)
38-38: Undefined name _CRYPTOGRAPHY
(F821)
38-38: Line too long (141 > 120)
(E501)
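One way to act on the suggestion to silence these rules for auto-generated code is to keep generated modules out of the lint run altogether. The sketch below is only an illustration: it assumes the generated files can be recognised by the *_pb2.py and *_pb2_grpc.py suffixes seen throughout this review, and the helper names are made up for the example. Ruff's own configuration also offers exclude and per-file-ignores settings, which would usually be the simpler route.

```python
from fnmatch import fnmatch

# Assumed naming convention for protoc output, based on the file names in this review.
GENERATED_PATTERNS = ('*_pb2.py', '*_pb2_grpc.py')


def is_generated(path: str) -> bool:
    """Return True when the path looks like protoc-generated Python."""
    return any(fnmatch(path, pattern) for pattern in GENERATED_PATTERNS)


def split_paths(paths):
    """Separate generated modules from hand-written ones, preserving order."""
    generated = [p for p in paths if is_generated(p)]
    handwritten = [p for p in paths if not is_generated(p)]
    return generated, handwritten


paths = [
    'src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py',
    'src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py',
    'src/scanoss/results.py',
]
generated, handwritten = split_paths(paths)
print('lint these:', handwritten)
print('skip these:', generated)
```

Only the hand-written list would then be passed to the linter, so F821 and E501 findings in generated files no longer drown out real issues.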
39-64: Check auto-generated references.
References to _ALGORITHM, _ALGORITHMRESPONSE, _ALGORITHMSINRANGERESPONSE, _VERSIONSINRANGERESPONSE, _HINT, _HINTSRESPONSE, and _HINTSINRANGERESPONSE might trigger “undefined name” warnings. These are typical in protoc code. Confirm they’re generated or ignore false positives.
🧰 Tools
🪛 Ruff (0.8.2)
39-39: Undefined name _ALGORITHM
(F821)
40-40: Undefined name _ALGORITHM
(F821)
41-41: Undefined name _ALGORITHMRESPONSE
(F821)
42-42: Undefined name _ALGORITHMRESPONSE
(F821)
43-43: Undefined name _ALGORITHMRESPONSE_PURLS
(F821)
44-44: Undefined name _ALGORITHMRESPONSE_PURLS
(F821)
45-45: Undefined name _ALGORITHMSINRANGERESPONSE
(F821)
46-46: Undefined name _ALGORITHMSINRANGERESPONSE
(F821)
47-47: Undefined name _ALGORITHMSINRANGERESPONSE_PURL
(F821)
48-48: Undefined name _ALGORITHMSINRANGERESPONSE_PURL
(F821)
49-49: Undefined name _VERSIONSINRANGERESPONSE
(F821)
50-50: Undefined name _VERSIONSINRANGERESPONSE
(F821)
51-51: Undefined name _VERSIONSINRANGERESPONSE_PURL
(F821)
52-52: Undefined name _VERSIONSINRANGERESPONSE_PURL
(F821)
53-53: Undefined name _HINT
(F821)
54-54: Undefined name _HINT
(F821)
55-55: Undefined name _HINTSRESPONSE
(F821)
56-56: Undefined name _HINTSRESPONSE
(F821)
57-57: Undefined name _HINTSRESPONSE_PURLS
(F821)
58-58: Undefined name _HINTSRESPONSE_PURLS
(F821)
59-59: Undefined name _HINTSINRANGERESPONSE
(F821)
60-60: Undefined name _HINTSINRANGERESPONSE
(F821)
61-61: Undefined name _HINTSINRANGERESPONSE_PURL
(F821)
62-62: Undefined name _HINTSINRANGERESPONSE_PURL
(F821)
63-63: Undefined name _CRYPTOGRAPHY
(F821)
64-64: Undefined name _CRYPTOGRAPHY
(F821)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (2)
27-38: Potential false positive for _SCANNING.
References to _SCANNING can appear undefined in lint checks for generated code. Verify you have correct protoc definitions or suppress these warnings.
🧰 Tools
🪛 Ruff (0.8.2)
27-27: Undefined name _SCANNING
(F821)
28-28: Undefined name _SCANNING
(F821)
29-29: Undefined name _SCANNING
(F821)
30-30: Undefined name _SCANNING
(F821)
30-30: Line too long (132 > 120)
(E501)
31-31: Undefined name _HFHREQUEST
(F821)
32-32: Undefined name _HFHREQUEST
(F821)
33-33: Undefined name _HFHREQUEST_CHILDREN
(F821)
34-34: Undefined name _HFHREQUEST_CHILDREN
(F821)
35-35: Undefined name _HFHRESPONSE
(F821)
36-36: Undefined name _HFHRESPONSE
(F821)
37-37: Undefined name _HFHRESPONSE_COMPONENT
(F821)
38-38: Undefined name _HFHRESPONSE_COMPONENT
(F821)
39-42: Check references to _HFHREQUEST & _HFHRESPONSE.
Static analysis may flag these as undefined. In protoc-generated files, that’s likely a false positive.
🧰 Tools
🪛 Ruff (0.8.2)
39-39: Undefined name _HFHRESPONSE_RESULT
(F821)
40-40: Undefined name _HFHRESPONSE_RESULT
(F821)
41-41: Undefined name _SCANNING
(F821)
42-42: Undefined name _SCANNING
(F821)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (3)
21-29: Stub definitions look good.
The stub methods Echo and FolderHashScan are properly defined and wired for unary calls.
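For readers unfamiliar with how generated stubs are wired, the following is a rough sketch of the pattern, not the generated code itself. Generated Python stubs bind each RPC by calling channel.unary_unary with a method path; here FakeChannel, ScanningStubSketch, the '/demo.Scanning/...' paths and the dictionary request and response shapes are all hypothetical placeholders, so the example runs without grpc or the real protobuf messages.

```python
from typing import Callable, Dict


class FakeChannel:
    """Hypothetical stand-in for a gRPC channel: maps method paths to local handlers."""

    def __init__(self, handlers: Dict[str, Callable[[dict], dict]]):
        self._handlers = handlers

    def unary_unary(self, method, request_serializer=None, response_deserializer=None):
        handler = self._handlers[method]

        def call(request, timeout=None, metadata=None):
            # One request in, one response out: the unary-unary shape.
            return handler(request)

        return call


class ScanningStubSketch:
    """Mirrors the shape of a generated stub: one attribute per RPC method."""

    def __init__(self, channel):
        # Placeholder method paths; the real ones come from the .proto package and service.
        self.Echo = channel.unary_unary('/demo.Scanning/Echo')
        self.FolderHashScan = channel.unary_unary('/demo.Scanning/FolderHashScan')


handlers = {
    '/demo.Scanning/Echo': lambda req: {'message': req['message']},
    '/demo.Scanning/FolderHashScan': lambda req: {'root': req['path_id'], 'results': []},
}
stub = ScanningStubSketch(FakeChannel(handlers))
echo_reply = stub.Echo({'message': 'ping'})
scan_reply = stub.FolderHashScan({'path_id': 'src'})
print(echo_reply, scan_reply)
```

With a real channel the call sites look the same; only the channel object and the message types differ.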
54-63: RPC handlers are correctly set up.
The method handlers for Echo and FolderHashScan appear valid.
65-66: Generic handler configuration.
No issues noted. This looks standard for gRPC route registration.
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py (1)
1-50: LGTM! Auto-generated protobuf file.
This is an auto-generated file by the protobuf compiler. The changes follow the standard protobuf output format and should not be modified manually.
Please ensure that these changes were generated using the correct version of the protobuf compiler and not modified manually.
🧰 Tools
🪛 Ruff (0.8.2)
14-14: Module level import not at top of file
(E402)
14-14: scanoss.api.common.v2.scanoss_common_pb2 imported but unused. Remove unused import: scanoss.api.common.v2.scanoss_common_pb2
(F401)
15-15: Module level import not at top of file
(E402)
15-15: google.api.annotations_pb2 imported but unused. Remove unused import: google.api.annotations_pb2
(F401)
16-16: Module level import not at top of file
(E402)
16-16: protoc_gen_swagger.options.annotations_pb2 imported but unused. Remove unused import: protoc_gen_swagger.options.annotations_pb2
(F401)
19-19: Line too long (2566 > 120)
(E501)
22-22: Line too long (124 > 120)
(E501)
23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks. Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (419 > 120)
(E501)
27-27: Undefined name _VULNERABILITIES
(F821)
28-28: Undefined name _VULNERABILITIES
(F821)
28-28: Line too long (129 > 120)
(E501)
29-29: Undefined name _VULNERABILITIES
(F821)
30-30: Undefined name _VULNERABILITIES
(F821)
30-30: Line too long (132 > 120)
(E501)
31-31: Undefined name _VULNERABILITIES
(F821)
32-32: Undefined name _VULNERABILITIES
(F821)
32-32: Line too long (152 > 120)
(E501)
33-33: Undefined name _VULNERABILITYREQUEST
(F821)
34-34: Undefined name _VULNERABILITYREQUEST
(F821)
35-35: Undefined name _VULNERABILITYREQUEST_PURLS
(F821)
36-36: Undefined name _VULNERABILITYREQUEST_PURLS
(F821)
37-37: Undefined name _CPERESPONSE
(F821)
38-38: Undefined name _CPERESPONSE
(F821)
39-39: Undefined name _CPERESPONSE_PURLS
(F821)
40-40: Undefined name _CPERESPONSE_PURLS
(F821)
41-41: Undefined name _VULNERABILITYRESPONSE
(F821)
42-42: Undefined name _VULNERABILITYRESPONSE
(F821)
43-43: Undefined name _VULNERABILITYRESPONSE_VULNERABILITIES
(F821)
44-44: Undefined name _VULNERABILITYRESPONSE_VULNERABILITIES
(F821)
45-45: Undefined name _VULNERABILITYRESPONSE_PURLS
(F821)
46-46: Undefined name _VULNERABILITYRESPONSE_PURLS
(F821)
47-47: Undefined name _VULNERABILITIES
(F821)
48-48: Undefined name _VULNERABILITIES
(F821)
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (1)
30-49: LGTM! New cryptography RPC methods added.
The changes add several new RPC methods to enhance the cryptography service:
- GetAlgorithmsInRange: Get cryptographic algorithms for version ranges
- GetVersionsInRange: Get versions with/without cryptographic algorithms
- GetHintsInRange: Get protocol/library/sdk/framework hints for version ranges
- GetEncryptionHints: Get encryption-related hints
src/scanoss/api/common/v2/scanoss_common_pb2.py (1)
1-37: LGTM! Auto-generated protobuf file.
This is an auto-generated file by the protobuf compiler. The changes follow the standard protobuf output format and should not be modified manually.
Please ensure that these changes were generated using the correct version of the protobuf compiler and not modified manually.
🧰 Tools
🪛 Ruff (0.8.2)
16-16: Line too long (845 > 120)
(E501)
20-20: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks. Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
24-24: Undefined name
_STATUSCODE
(F821)
25-25: Undefined name
_STATUSCODE
(F821)
26-26: Undefined name
_STATUSRESPONSE
(F821)
27-27: Undefined name
_STATUSRESPONSE
(F821)
28-28: Undefined name
_ECHOREQUEST
(F821)
29-29: Undefined name
_ECHOREQUEST
(F821)
30-30: Undefined name
_ECHORESPONSE
(F821)
31-31: Undefined name
_ECHORESPONSE
(F821)
32-32: Undefined name
_PURLREQUEST
(F821)
33-33: Undefined name
_PURLREQUEST
(F821)
34-34: Undefined name
_PURLREQUEST_PURLS
(F821)
35-35: Undefined name
_PURLREQUEST_PURLS
(F821)
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py (1)
1-42: LGTM! Auto-generated protobuf file.
This is an auto-generated file by the protobuf compiler. The changes follow the standard protobuf output format and should not be modified manually.
Please ensure that these changes were generated using the correct version of the protobuf compiler and not modified manually.
🧰 Tools
🪛 Ruff (0.8.2)
14-14: Module level import not at top of file
(E402)
14-14:
scanoss.api.common.v2.scanoss_common_pb2
imported but unused. Remove unused import:
scanoss.api.common.v2.scanoss_common_pb2
(F401)
15-15: Module level import not at top of file
(E402)
15-15:
google.api.annotations_pb2
imported but unused. Remove unused import:
google.api.annotations_pb2
(F401)
16-16: Module level import not at top of file
(E402)
16-16:
protoc_gen_swagger.options.annotations_pb2
imported but unused. Remove unused import:
protoc_gen_swagger.options.annotations_pb2
(F401)
19-19: Line too long (1718 > 120)
(E501)
23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks. Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (375 > 120)
(E501)
27-27: Undefined name
_SEMGREP
(F821)
28-28: Undefined name
_SEMGREP
(F821)
29-29: Undefined name
_SEMGREP
(F821)
30-30: Undefined name
_SEMGREP
(F821)
30-30: Line too long (123 > 120)
(E501)
31-31: Undefined name
_SEMGREPRESPONSE
(F821)
32-32: Undefined name
_SEMGREPRESPONSE
(F821)
33-33: Undefined name
_SEMGREPRESPONSE_ISSUE
(F821)
34-34: Undefined name
_SEMGREPRESPONSE_ISSUE
(F821)
35-35: Undefined name
_SEMGREPRESPONSE_FILE
(F821)
36-36: Undefined name
_SEMGREPRESPONSE_FILE
(F821)
37-37: Undefined name
_SEMGREPRESPONSE_PURLS
(F821)
38-38: Undefined name
_SEMGREPRESPONSE_PURLS
(F821)
39-39: Undefined name
_SEMGREP
(F821)
40-40: Undefined name
_SEMGREP
(F821)
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (1)
1-109: LGTM! Note: This is an auto-generated file.
The changes appear to be auto-generated formatting adjustments. As indicated by the comment at the top of the file, this is a generated gRPC file and should not be manually edited.
Please ensure that these changes were generated using the gRPC Python protocol compiler plugin and not manually modified. You can regenerate the file using:
❌ Verification inconclusive
🏁 Script executed:
#!/bin/bash
# Description: Find the proto file and verify it's being used to generate this gRPC file.
# Find the proto file
fd -e proto | grep -i "semgrep"
# Note: After finding the proto file, you can regenerate this file using:
# python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. path/to/your.proto
Length of output: 31
Auto-generated file – please verify generation process manually.
The gRPC file appears to be correctly auto-generated with only formatting changes. However, our initial automated search for the corresponding .proto file using "semgrep" did not return any results. Since this script output was unexpected, please perform a manual verification to confirm that the file was indeed generated using the correct proto file (likely containing the package scanoss.api.semgrep.v2) and that it hasn’t been manually edited.
- Ensure that the proto file exists in the repository (it might be under a different name or directory).
- Verify that regenerating the file with the gRPC Python protocol compiler plugin reproduces the current changes.
🧰 Tools
🪛 Ruff (0.8.2)
77-77: Too many arguments in function definition (10 > 5)
(PLR0913)
94-94: Too many arguments in function definition (10 > 5)
(PLR0913)
Actionable comments posted: 1
🧹 Nitpick comments (4)
src/scanoss/cli.py (4)
484-490: Define format choices as a constant.
The format choices should be defined as a constant at the module level for better maintainability and reusability.
+FOLDER_SCAN_FORMATS = ['plain', 'json']
+
 p_folder_scan.add_argument(
     '--format',
     '-f',
     type=str,
-    choices=['plain', 'json'],
+    choices=FOLDER_SCAN_FORMATS,
     help='Result output format (optional - default: plain)',
 )
1462-1500: Enhance docstring and add logging.
The function could benefit from:
- Type hints in the docstring
- Logging for better debugging and monitoring
-def folder_hashing_scan(parser, args): +def folder_hashing_scan(parser: argparse.ArgumentParser, args: argparse.Namespace) -> None: """Run the "folder-scan" sub-command Args: - parser (ArgumentParser): command line parser object - args (Namespace): Parsed arguments + parser (argparse.ArgumentParser): command line parser object + args (argparse.Namespace): Parsed arguments + + Raises: + ScanossGrpcError: If there is an error during the gRPC communication """ try: if not args.scan_dir: print_stderr('ERROR: Please specify a directory to scan') parser.parse_args([args.subparser, '-h']) sys.exit(1) + if args.debug: + print_stderr(f'Starting folder scan for directory: {args.scan_dir}') + if not os.path.exists(args.scan_dir) or not os.path.isdir(args.scan_dir): print_stderr(f'ERROR: The specified directory {args.scan_dir} does not exist') sys.exit(1) + if args.debug: + print_stderr('Creating scanner configuration...') scanner_config = create_scanner_config_from_args(args) scanoss_settings = get_scanoss_settings_from_args(args) grpc_config = create_grpc_config_from_args(args) + if args.debug: + print_stderr('Initializing gRPC client...') client = ScanossGrpc(**asdict(grpc_config)) scanner = ScannerHFH( scan_dir=args.scan_dir, config=scanner_config, client=client, scanoss_settings=scanoss_settings, ) scanner.best_match = args.best_match scanner.threshold = args.threshold + if args.debug: + print_stderr('Starting scan...') scanner.scan() + if args.debug: + print_stderr('Presenting results...') scanner.present(output_file=args.output, output_format=args.format) except ScanossGrpcError as e: + if args.debug: + print_stderr(f'Scan failed with error: {e}') print_stderr(f'ERROR: {e}') sys.exit(1)
1502-1512: Add type hints and improve error handling.
The function would benefit from type hints and better error handling for the settings file path.
-def get_scanoss_settings_from_args(args): +def get_scanoss_settings_from_args(args: argparse.Namespace) -> Optional[ScanossSettings]: + """Create ScanossSettings from command line arguments. + + Args: + args (argparse.Namespace): Parsed command line arguments + + Returns: + Optional[ScanossSettings]: The settings object or None if settings are skipped + + Raises: + ScanossSettingsError: If there is an error loading the settings file + """ scanoss_settings = None if not args.skip_settings_file: + if args.settings and not os.path.exists(args.settings): + print_stderr(f'Error: Settings file does not exist: {args.settings}') + sys.exit(1) + scanoss_settings = ScanossSettings(debug=args.debug, trace=args.trace, quiet=args.quiet) try: scanoss_settings.load_json_file(args.settings, args.scan_dir).set_file_type('new').set_scan_type('identify') except ScanossSettingsError as e: print_stderr(f'Error: {e}') sys.exit(1) - return scanoss_settings + return scanoss_settings
625-631: Define exit codes as constants.
Exit codes should be defined as constants for better maintainability and clarity.
+# Exit codes
+EXIT_SUCCESS = 0
+EXIT_FAILURE = 1
+EXIT_INVALID_ARGS = 2
+
 if args.version:
     ver(parser, args)
-    sys.exit(0)
+    sys.exit(EXIT_SUCCESS)
 if not args.subparser:
     parser.print_help()  # No sub command subcommand, print general help
-    sys.exit(1)
+    sys.exit(EXIT_INVALID_ARGS)
 elif (args.subparser in ('utils', 'ut', 'component', 'comp', 'inspect', 'insp', 'ins')) and not args.subparsercmd:
     parser.parse_args([args.subparser, '--help'])  # Force utils helps to be displayed
-    sys.exit(1)
+    sys.exit(EXIT_INVALID_ARGS)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/scanoss/cli.py (46 hunks)
- src/scanoss/scanners/scanner_hfh.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/scanoss/scanners/scanner_hfh.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
🔇 Additional comments (2)
src/scanoss/cli.py (2)
28-47: LGTM! Well-organized imports and constants.
The new imports and constants are properly organized and follow Python best practices. The use of explicit imports and constants improves maintainability.
505-518: LGTM! Well-integrated shared options.
The folder-scan command is properly integrated with the shared options, maintaining consistency with other commands.
Also applies to: 533-534, 561-562, 587-588
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/scanoss/file_filters.py (2)
242-271: Improve constructor documentation for better maintainability.
The constructor has been updated to use **kwargs for flexibility, but the docstring could be enhanced to better document the expected keyword arguments.
Apply this diff to improve the docstring:
def __init__(self, debug: bool = False, trace: bool = False, quiet: bool = False, **kwargs): """ Initialize scan filters based on default settings. Optionally append custom settings. Args: debug (bool): Enable debug output trace (bool): Enable trace output quiet (bool): Suppress output - **kwargs: Additional arguments including: - scanoss_settings (ScanossSettings): Custom settings to override defaults - all_extensions (bool): Include all file extensions - all_folders (bool): Include all folders - hidden_files_folders (bool): Include hidden files and folders - operation_type (str): Operation type ('scanning' or 'fingerprinting') - skip_size (int): Size to skip - skip_extensions (list): Extensions to skip - skip_folders (list): Folders to skip + **kwargs: Additional arguments + + Keyword Args: + scanoss_settings (ScanossSettings): Custom settings to override defaults + all_extensions (bool, optional): Include all file extensions. Defaults to False. + all_folders (bool, optional): Include all folders. Defaults to False. + hidden_files_folders (bool, optional): Include hidden files and folders. Defaults to False. + operation_type (str, optional): Operation type ('scanning' or 'fingerprinting'). Defaults to 'scanning'. + skip_size (int, optional): Size to skip. Defaults to 0. + skip_extensions (list, optional): Extensions to skip. Defaults to []. + skip_folders (list, optional): Folders to skip. Defaults to []. """
518-552: Enhance error handling in _should_skip_file_for_hfh method.
The new method for folder hashing has a broad exception handler that might mask specific issues.
Apply this diff to improve error handling:
def _should_skip_file_for_hfh(self, file_path: Path) -> bool: """ Check if a file should be skipped during folder hashing scan. Args: file_path (Path): The path to the file to check. Returns: bool: True if the file should be skipped, False otherwise. """ try: if ( any(part.startswith('.') for part in file_path.parts) # Hidden files/folders or file_path.is_symlink() # Symlinks or file_path.stat().st_size == 0 # Empty files ): self.print_debug(f'Skipping file: {file_path} (hidden/symlink/empty)') return True # Files ending with null if file_path.suffix.lower() == '.txt': try: with open(file_path, 'rb') as f: if f.read().endswith(b'\x00'): self.print_debug(f'Skipping file: {file_path} (text file ending with null)') return True - except (OSError, IOError): + except (OSError, IOError) as e: + self.print_debug(f'Skipping file: {file_path} (cannot read file content: {str(e)})') return True return False - except Exception as e: - self.print_debug(f'Error checking file {file_path}: {str(e)}') + except OSError as e: + self.print_debug(f'Error accessing file {file_path}: {str(e)}') + return True + except Exception as e: + self.print_debug(f'Unexpected error checking file {file_path}: {str(e)}') return True
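For readers who want to try the skip rules outside the class, here is a minimal standalone sketch. It is not the project's implementation: the function name `should_skip_for_hashing` and the extra `root` parameter are assumptions made for illustration (the hidden-path check is applied relative to `root` so that a hidden parent directory does not affect the result), and the null-terminated `.txt` check is left out for brevity. It only shows the narrowing of the handler from `Exception` to `OSError`.

```python
import tempfile
from pathlib import Path


def should_skip_for_hashing(root: Path, file_path: Path) -> bool:
    """Return True for hidden, symlinked, empty, or unreadable files.

    Hypothetical helper: 'root' is the scan directory, 'file_path' a file below it.
    """
    try:
        # Hidden file or folder anywhere between the scan root and the file
        if any(part.startswith('.') for part in file_path.relative_to(root).parts):
            return True
        if file_path.is_symlink():
            return True
        # Empty files carry no content to hash
        return file_path.stat().st_size == 0
    except OSError:
        # Narrow handler: filesystem problems only (missing file, permissions)
        return True


# Small demonstration on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / 'a.txt').write_text('data')
    (demo_root / 'empty.txt').write_text('')
    for name in ('a.txt', 'empty.txt', 'missing.txt'):
        print(name, should_skip_for_hashing(demo_root, demo_root / name))
```

Catching only `OSError` means a programming error (for example a wrong argument type) still surfaces as a traceback instead of being silently treated as "skip this file".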
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/scanoss/file_filters.py (6 hunks)
- src/scanoss/scanners/scanner_hfh.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/scanoss/scanners/scanner_hfh.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
🔇 Additional comments (2)
src/scanoss/file_filters.py (2)
207-213: LGTM! New file extensions added to skip list.
The additions include common file extensions that should be skipped during scanning:
- .whml, .pom, .smtml (markup/configuration files)
- .min.js (minified JavaScript)
- .mf, .base64, .s (binary/encoded files)
441-477: LGTM! Method visibility change is appropriate.
The _should_skip_dir method has been renamed to should_skip_dir, making it public. This change is appropriate as:
- The method provides essential functionality that may be needed by external callers
- The implementation is stable and well-tested
- The change maintains backward compatibility
1cd1c26
to
293aed4
Compare
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/scanoss/scanners/scanner_hfh.py (2)
191-192: Enhance error handling in _build_root_node.
The error handling could be improved by:
- Logging the full stack trace in debug mode.
- Providing more context about the error.

-    except Exception as e:
-        self.print_debug(f'Skipping file {full_file_path}: {str(e)}')
+    except Exception as e:
+        self.print_debug(f'Error processing file {full_file_path}: {str(e)}')
+        if self.base.debug:
+            import traceback
+            self.print_debug(f'Stack trace:\n{traceback.format_exc()}')
308-316: Consider a more readable plain text format.
The plain output format currently returns the same JSON format as _format_json_output. Consider implementing a more human-readable format for plain text output.
 def _format_plain_output(self) -> str:
     """
     Format the scan output data into a plain text string
     """
-    return (
-        json.dumps(self.scanner.scan_results, indent=2)
-        if isinstance(self.scanner.scan_results, dict)
-        else str(self.scanner.scan_results)
-    )
+    if not isinstance(self.scanner.scan_results, dict):
+        return str(self.scanner.scan_results)
+
+    output = []
+    for result in self.scanner.scan_results.get('results', []):
+        output.append(
+            f"Path: {result.get('path_id', 'N/A')}\n"
+            f"Name Hash: {result.get('sim_hash_names', 'N/A')}\n"
+            f"Content Hash: {result.get('sim_hash_content', 'N/A')}\n"
+            f"Children: {len(result.get('children', []))}\n"
+        )
+    return '\n'.join(output)
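To make the suggested layout concrete, the following sketch applies the same formatting to a plain dictionary. The function name `format_plain` and the sample data are hypothetical; the field names (`path_id`, `sim_hash_names`, `sim_hash_content`, `children`) are taken from the suggestion above and are not confirmed against the actual service response.

```python
from typing import Any


def format_plain(scan_results: Any) -> str:
    """Render scan results as text blocks; non-dict input falls back to str()."""
    if not isinstance(scan_results, dict):
        return str(scan_results)
    blocks = []
    for result in scan_results.get('results', []):
        blocks.append(
            f"Path: {result.get('path_id', 'N/A')}\n"
            f"Name Hash: {result.get('sim_hash_names', 'N/A')}\n"
            f"Content Hash: {result.get('sim_hash_content', 'N/A')}\n"
            f"Children: {len(result.get('children', []))}"
        )
    # One blank line between results
    return '\n\n'.join(blocks)


# Hypothetical sample data, for illustration only
sample = {'results': [{'path_id': 'src', 'sim_hash_names': 'ab', 'children': [{}, {}]}]}
print(format_plain(sample))
```

Missing fields are rendered as `N/A` rather than raising, which keeps the plain output usable when a response is only partially populated.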
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- src/scanoss/cli.py (46 hunks)
- src/scanoss/file_filters.py (6 hunks)
- src/scanoss/results.py (6 hunks)
- src/scanoss/scanners/scanner_hfh.py (1 hunks)
- src/scanoss/utils/abstract_presenter.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/scanoss/file_filters.py
- src/scanoss/utils/abstract_presenter.py
🔇 Additional comments (8)
src/scanoss/results.py (3)
53-113: LGTM! Well-structured presenter class.
The ResultsPresenter class follows the Single Responsibility Principle by focusing on presentation logic, with clear separation of concerns between JSON and plain text formatting.
86-94: Return type inconsistency in plain output format.
The method's docstring indicates it returns a plain text string, but if there's no data, it returns a message without printing to stderr.
116-264: LGTM! Improved error handling and separation of concerns.
The changes to the Results class:
- Improve separation of concerns by delegating presentation to ResultsPresenter.
- Enhance error handling by using specific exceptions.
src/scanoss/scanners/scanner_hfh.py (3)
40-50: LGTM! Clean and focused data structure.
The DirectoryNode class is well-designed with clear attributes and a single responsibility.
52-61: LGTM! Clean and focused data structure.
The DirectoryFile class is well-designed with clear attributes and a single responsibility.
63-287: LGTM! Well-structured scanner implementation.
The ScannerHFH class demonstrates:
- Clear separation of concerns between scanning, hashing, and presentation.
- Efficient use of helper classes for directory tree representation.
- Well-documented methods with clear responsibilities.
src/scanoss/cli.py (2)
1465-1503: Enhance error handling in the folder_hashing_scan function.
While the basic error handling is in place, consider adding:
- More specific error types for better error reporting
- Cleanup of resources in case of failures
- Logging of scan progress and errors
1505-1514: LGTM! Well-structured settings loader.
The function demonstrates:
- Clear error handling for settings file loading.
- Proper configuration of settings object.
- Consistent use of debug flags.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/scanoss/cli.py (1)
1475-1513: 🛠️ Refactor suggestion
Enhance error handling in the folder_hashing_scan function.
While the basic error handling is in place, consider adding:
- More specific error types for better error reporting
- Cleanup of resources in case of failures
- Logging of scan progress and errors
Apply this diff to improve error handling:
def folder_hashing_scan(parser, args): try: if not args.scan_dir: print_stderr('ERROR: Please specify a directory to scan') parser.parse_args([args.subparser, '-h']) sys.exit(1) if not os.path.exists(args.scan_dir) or not os.path.isdir(args.scan_dir): print_stderr(f'ERROR: The specified directory {args.scan_dir} does not exist') sys.exit(1) + if not os.access(args.scan_dir, os.R_OK): + print_stderr(f'ERROR: No read permission for directory {args.scan_dir}') + sys.exit(1) + scanner_config = create_scanner_config_from_args(args) scanoss_settings = get_scanoss_settings_from_args(args) grpc_config = create_grpc_config_from_args(args) client = ScanossGrpc(**asdict(grpc_config)) scanner = ScannerHFH( scan_dir=args.scan_dir, config=scanner_config, client=client, scanoss_settings=scanoss_settings, ) scanner.best_match = args.best_match scanner.threshold = args.threshold + try: scanner.scan() scanner.present(output_file=args.output, output_format=args.format) + finally: + client.close() # Ensure client is properly closed + except ScanossGrpcError as e: print_stderr(f'ERROR: {e}') sys.exit(1) + except Exception as e: + print_stderr(f'ERROR: Unexpected error during scan: {e}') + sys.exit(1)
🧹 Nitpick comments (1)
src/scanoss/cli.py (1)
1515-1524: Add type hints and explicit return for None case.
The function could benefit from type hints and an explicit return for the case when skip_settings_file is True.
Apply this diff to improve type safety and readability:
-def get_scanoss_settings_from_args(args): +def get_scanoss_settings_from_args(args) -> Optional[ScanossSettings]: scanoss_settings = None if not args.skip_settings_file: scanoss_settings = ScanossSettings(debug=args.debug, trace=args.trace, quiet=args.quiet) try: scanoss_settings.load_json_file(args.settings, args.scan_dir).set_file_type('new').set_scan_type('identify') except ScanossSettingsError as e: print_stderr(f'Error: {e}') sys.exit(1) return scanoss_settings + return None
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- requirements.txt (1 hunks)
- setup.cfg (1 hunks)
- src/scanoss/cli.py (46 hunks)
- src/scanoss/scanners/scanner_hfh.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- requirements.txt
- src/scanoss/scanners/scanner_hfh.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
🔇 Additional comments (1)
src/scanoss/cli.py (1)
478-515: LGTM! Well-structured argument setup for the folder-scan subcommand.
The argument setup is comprehensive and includes:
- Required directory argument
- Optional output file and format
- Configurable timeout
- Best match mode flag
- Threshold validation with a valid range of 1-100
Actionable comments posted: 0
🧹 Nitpick comments (6)
src/scanoss/scanners/folder_hasher.py (3)
100-120: Consider limiting memory usage when reading large files.
The call to full_file_path.read_bytes() at line 150 loads entire file contents into memory. For very large files, this could cause memory pressure or even out-of-memory errors.
You might consider reading the file in chunks or hashing it incrementally to reduce memory usage:
-    file_bytes = full_file_path.read_bytes()
-    key = CRC64.get_hash_buff(file_bytes)
+    with open(full_file_path, 'rb') as f:
+        hasher = CRC64()
+        while True:
+            chunk = f.read(8192)
+            if not chunk:
+                break
+            hasher.update(chunk)
+    key = hasher.digest()
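The chunked-read pattern can be checked in isolation with the standard library. The sketch below uses `zlib.crc32` as a stand-in checksum, since it accepts a running value; it is not the project's `CRC64` class, whose incremental interface is not shown in this review. The helper name `crc32_stream` is hypothetical.

```python
import io
import zlib
from typing import BinaryIO


def crc32_stream(stream: BinaryIO, chunk_size: int = 8192) -> int:
    """Compute a CRC32 over a binary stream without loading it all into memory."""
    checksum = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # zlib.crc32 takes the previous checksum as its running value
        checksum = zlib.crc32(chunk, checksum)
    return checksum


# The chunked result matches the one-shot checksum of the same bytes
data = b'scanoss' * 5000
print(crc32_stream(io.BytesIO(data), chunk_size=1024) == zlib.crc32(data))
```

Peak memory is bounded by `chunk_size` regardless of file size, which is the property the comment above is asking for.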
137-174: Possible performance impact from sorting and repeated directory traversal.
You're sorting all filtered files at line 141 and then collecting them into a tree data structure in a nested loop. For directories with many files, this may become expensive.
Consider aggregating child nodes in one pass or using an iterative approach to reduce repeated traversals and dictionary lookups at lines 158–163.
212-226: Optimize repeated file-key processing.
The code creates a new bytes object at line 224 by calling bytes(file.key) in each iteration. If many files share the same interned key reference, this repeated call could be avoided.
You might store the byte key once when constructing DirectoryFile and reuse it directly if needed.
src/scanoss/scanners/scanner_hfh.py (1)
41-47: Consider exposing threshold & best_match in constructor.
Threshold (line 92) and best_match (line 91) are set on the instance after initialization. If these are critical config values, you could accept them in __init__ and set them once, reducing the risk of forgetting to configure them.
src/scanoss/cli.py (2)
1501-1539: Enhance error handling and resource cleanup.
Inside folder_hashing_scan, you create a ScanossGrpc client at line 1522 but never explicitly close it. If that client object needs explicit cleanup, consider wrapping the scan in a try/finally code block that closes the client.
+    try:
         scanner = ScannerHFH(...)
         scanner.scan()
         scanner.present(...)
+    finally:
+        client.close()  # or equivalent resource cleanup
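As an illustration of the cleanup pattern, `contextlib.closing` gives the same guarantee as try/finally for any object with a `close()` method. `DummyClient` and `run_scan` below are hypothetical stand-ins; whether `ScanossGrpc` actually exposes `close()` is an assumption this review does not confirm.

```python
from contextlib import closing


class DummyClient:
    """Hypothetical stand-in for a client that holds a connection."""

    def __init__(self) -> None:
        self.closed = False

    def scan(self, fail: bool = False) -> dict:
        if fail:
            raise RuntimeError('scan failed')
        return {'results': []}

    def close(self) -> None:
        self.closed = True


def run_scan(client: DummyClient, fail: bool = False) -> dict:
    # closing() calls client.close() on exit, whether scan() returns or raises
    with closing(client):
        return client.scan(fail=fail)


client = DummyClient()
print(run_scan(client), client.closed)
```

The `with` form keeps the cleanup next to the acquisition, so a later edit that adds an early return cannot skip it.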
1541-1572: Clarify exception handling for folder hashing.
At lines 1569–1570, you catch Exception and print an error, then exit. If you expect only certain I/O errors, consider narrowing the except block to handle more specific exceptions (e.g., OSError) for clearer debugging.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/scanoss/cli.py (46 hunks)
- src/scanoss/scanners/folder_hasher.py (1 hunks)
- src/scanoss/scanners/scanner_hfh.py (1 hunks)
🔇 Additional comments (5)
src/scanoss/scanners/folder_hasher.py (1)
263-296: Docstring clarity: _format_plain_output actually returns JSON.
This is consistent with prior patterns, but the docstring in _format_plain_output claims it is "plain text," while JSON is returned if self.folder_hasher.tree is a dict.
If returning JSON is indeed intended, consider updating the docstring to better reflect the actual output type. Otherwise, convert the data to truly plain text if that is the goal.
src/scanoss/scanners/scanner_hfh.py (2)
49-93: Validate client before gRPC calls.
At line 119, you call self.client.folder_hash_scan(...). If client is None or not configured properly, the call might fail unexpectedly.
Consider adding a quick check (e.g., if not self.client:) to provide a more actionable error message or fallback behavior.
139-161: Ensure docstring accuracy for _format_plain_output.
The _format_plain_output docstring states it returns a "plain text string," but it actually dumps JSON if self.scanner.scan_results is a dict (lines 157–160).
Update the docstring to state that JSON is returned whenever scan_results is a dict, and str(scan_results) otherwise.
src/scanoss/cli.py (2)
482-541: Review usage instructions for new sub-commands.
Lines 482–541 add two new sub-commands, folder-scan and folder-hash. Ensure the help and usage docs are kept consistent across CLI documentation (e.g., in CLIENT_HELP.md) so that end-users fully understand these commands.
Consider reviewing doc references, readme files, or help outputs in other modules to confirm consistency.
507-517: Threshold range might exclude 0.
You’ve restricted --threshold to 1–100. If there's a scenario where a threshold of 0 is valid (meaning no content similarity required), consider allowing 0 in that range. If 1–100 is truly intended, this is good.
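For reference, a range restriction like this is commonly expressed in argparse through a custom `type` callable. The sketch below is one way to do it and is not taken from `cli.py`; the helper name `threshold_type`, the `prog` name, and the default of 100 are assumptions for illustration. Allowing 0 would be a one-character change to the lower bound.

```python
import argparse


def threshold_type(value: str) -> int:
    """Parse a threshold and require it to lie in the inclusive range 1-100."""
    number = int(value)  # a ValueError here is reported by argparse as an invalid value
    if not 1 <= number <= 100:
        raise argparse.ArgumentTypeError(f'threshold must be between 1 and 100, got {number}')
    return number


parser = argparse.ArgumentParser(prog='folder-scan-demo')
# The default below is hypothetical, not the project's documented default
parser.add_argument('--threshold', type=threshold_type, default=100)

print(parser.parse_args(['--threshold', '50']).threshold)
```

With this approach an out-of-range value produces a normal argparse usage error instead of reaching the scanner.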
0583fdf
to
b558ed2
Compare
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/scanoss/cli.py (1)
1501-1539: 🛠️ Refactor suggestion
Enhance error handling and resource cleanup.
Consider adding more robust error handling and ensuring proper resource cleanup.
def folder_hashing_scan(parser, args): """Run the "folder-scan" sub-command Args: parser (ArgumentParser): command line parser object args (Namespace): Parsed arguments """ try: if not args.scan_dir: print_stderr('ERROR: Please specify a directory to scan') parser.parse_args([args.subparser, '-h']) sys.exit(1) if not os.path.exists(args.scan_dir) or not os.path.isdir(args.scan_dir): print_stderr(f'ERROR: The specified directory {args.scan_dir} does not exist') sys.exit(1) + if not os.access(args.scan_dir, os.R_OK): + print_stderr(f'ERROR: No read permission for directory {args.scan_dir}') + sys.exit(1) + scanner_config = create_scanner_config_from_args(args) scanoss_settings = get_scanoss_settings_from_args(args) grpc_config = create_grpc_config_from_args(args) client = ScanossGrpc(**asdict(grpc_config)) + try: scanner = ScannerHFH( scan_dir=args.scan_dir, config=scanner_config, client=client, scanoss_settings=scanoss_settings, ) scanner.best_match = args.best_match scanner.threshold = args.threshold scanner.scan() scanner.present(output_file=args.output, output_format=args.format) + finally: + client.close() # Ensure client is properly closed + except ScanossGrpcError as e: print_stderr(f'ERROR: {e}') sys.exit(1) + except Exception as e: + print_stderr(f'ERROR: Unexpected error during scan: {e}') + sys.exit(1)
🧹 Nitpick comments (12)
src/scanoss/scanners/folder_hasher.py (6)
17-27: Add type hints to improve code maintainability.
Consider adding type hints to improve code maintainability and IDE support.
 class DirectoryNode:
     """
     Represents a node in the directory tree for folder hashing.
     """

     def __init__(self, path: str):
-        self.path = path
-        self.is_dir = True
-        self.children: Dict[str, DirectoryNode] = {}
-        self.files: List[DirectoryFile] = []
+        self.path: str = path
+        self.is_dir: bool = True
+        self.children: Dict[str, "DirectoryNode"] = {}
+        self.files: List["DirectoryFile"] = []
29-38: Add type hints and property validation.
Consider adding type hints and property validation to improve code maintainability and data integrity.
 class DirectoryFile:
     """
     Represents a file in the directory tree for folder hashing.
     """

     def __init__(self, path: str, key: bytes, key_str: str):
+        if not isinstance(key, bytes):
+            raise ValueError("key must be bytes")
+        if not isinstance(key_str, str):
+            raise ValueError("key_str must be str")
+
-        self.path = path
-        self.key = key
-        self.key_str = key_str
+        self.path: str = path
+        self.key: bytes = key
+        self.key_str: str = key_str
51-60: Add argument validation.
Consider adding argument validation to ensure the provided arguments are valid.
def create_folder_hasher_config_from_args(args) -> FolderHasherConfig: + if not hasattr(args, 'debug'): + raise ValueError("args must have 'debug' attribute") + if not hasattr(args, 'trace'): + raise ValueError("args must have 'trace' attribute") + if not hasattr(args, 'quiet'): + raise ValueError("args must have 'quiet' attribute") + return FolderHasherConfig( debug=args.debug, trace=args.trace, quiet=args.quiet, output_file=getattr(args, 'output', None), output_format=getattr(args, 'format', 'json'), settings_file=getattr(args, 'settings', None), skip_settings_file=getattr(args, 'skip_settings_file', False), )
242-261: Optimize head calculation.
The _head_calc method can be optimized using list comprehension and bitwise operations.
 def _head_calc(self, sim_hash: int) -> int:
     """
     Compute the head value from a simhash integer.

     The function extracts each byte from the simhash, multiplies it by 2, sums these values, then shifts the result right by 4 bits and returns the lowest 8 bits.

     Args:
         sim_hash (int): The input simhash value.

     Returns:
         int: The computed head value as an 8-bit integer.
     """
-    total = 0
-    for i in range(8):
-        # Extract each byte and multiply by 2
-        b = (sim_hash >> (i * 8)) & 0xFF
-        total += b * 2
-    # Shift right by 4 bits and extract the lowest 8 bits
-    return (total >> 4) & 0xFF
+    # Extract bytes, multiply by 2, sum, shift right by 4 bits, and get lowest 8 bits
+    return (sum(((sim_hash >> (i * 8)) & 0xFF) * 2 for i in range(8)) >> 4) & 0xFF
100-119: Add progress tracking for better user experience.
Consider adding progress tracking to provide better feedback during directory hashing.
 def hash_directory(self, path: str) -> dict:
     """
     Generate the folder hashing request structure from a directory path.

     This method builds a directory tree (DirectoryNode) and computes the associated hash data for the folder.

     Args:
         path (str): The root directory path.

     Returns:
         dict: The folder hash request structure.
     """
+    self.print_debug(f'Building directory tree for {path}...')
     root_node = self._build_root_node(path)
+    self.print_debug('Computing hash data...')
     tree = self._hash_calc_from_node(root_node)
     self.tree = tree
     return tree
278-295: Add error handling for output formatting.

Consider adding error handling to gracefully handle JSON serialization errors.

 def _format_json_output(self) -> str:
     """
     Format the scan output data into a JSON object

     Returns:
         str: The formatted JSON string
     """
-    return json.dumps(self.folder_hasher.tree, indent=2)
+    try:
+        return json.dumps(self.folder_hasher.tree, indent=2)
+    except (TypeError, ValueError) as e:
+        self.print_debug(f'Error formatting JSON output: {e}')
+        return json.dumps({'error': str(e)}, indent=2)

 def _format_plain_output(self) -> str:
     """
     Format the scan output data into a plain text string
     """
-    return (
-        json.dumps(self.folder_hasher.tree, indent=2)
-        if isinstance(self.folder_hasher.tree, dict)
-        else str(self.folder_hasher.tree)
-    )
+    try:
+        if isinstance(self.folder_hasher.tree, dict):
+            return json.dumps(self.folder_hasher.tree, indent=2)
+        return str(self.folder_hasher.tree)
+    except Exception as e:
+        self.print_debug(f'Error formatting plain output: {e}')
+        return str(e)

src/scanoss/file_filters.py (3)
207-212: Document the purpose of new skipped extensions.

Consider adding comments to explain why these extensions are skipped.

+    # Skip web help markup language files
     '.whml',
+    # Skip Maven project object model files
     '.pom',
+    # Skip simplified markup template language files
     '.smtml',
+    # Skip minified JavaScript files
     '.min.js',
+    # Skip manifest files
     '.mf',
+    # Skip base64 encoded files
     '.base64',
242-242: Document the kwargs parameter in the class docstring.

Add kwargs documentation to the class docstring for better API understanding.

 class FileFilters(ScanossBase):
     """
     Filter for determining which files to process during scanning, fingerprinting, etc.
     Handles both inclusion and exclusion rules based on file paths, extensions, and sizes.
+
+    Args:
+        debug (bool): Enable debug output
+        trace (bool): Enable trace output
+        quiet (bool): Suppress output
+        **kwargs: Additional configuration options
+            scanoss_settings (ScanossSettings): Custom settings to override defaults
+            all_extensions (bool): Include all file extensions
+            all_folders (bool): Include all folders
+            hidden_files_folders (bool): Include hidden files and folders
+            operation_type (str): Operation type ('scanning' or 'fingerprinting')
+            skip_size (int): Size to skip
+            skip_extensions (list): Extensions to skip
+            skip_folders (list): Folders to skip
     """
518-552: Improve error handling and file reading.

Consider using more specific error handling and optimizing file reading.

 def _should_skip_file_for_hfh(self, file_path: Path) -> bool:
     """
     Check if a file should be skipped during folder hashing scan.

     Args:
         file_path (Path): The path to the file to check.

     Returns:
         bool: True if the file should be skipped, False otherwise.
     """
     try:
         if (
             any(part.startswith('.') for part in file_path.parts)  # Hidden files/folders
             or file_path.is_symlink()  # Symlinks
             or file_path.stat().st_size == 0  # Empty files
         ):
             self.print_debug(f'Skipping file: {file_path} (hidden/symlink/empty)')
             return True

         # Files ending with null
         if file_path.suffix.lower() == '.txt':
             try:
-                with open(file_path, 'rb') as f:
-                    if f.read().endswith(b'\x00'):
+                # Read only the last byte to check for null
+                with open(file_path, 'rb') as f:
+                    f.seek(-1, 2)  # Seek to last byte
+                    if f.read(1) == b'\x00':
                         self.print_debug(f'Skipping file: {file_path} (text file ending with null)')
                         return True
-            except (OSError, IOError):
+            except (OSError, IOError) as e:
+                self.print_debug(f'Error reading file {file_path}: {e}')
                 self.print_debug(f'Skipping file: {file_path} (cannot read file content)')
                 return True

         return False

-    except Exception as e:
+    except (OSError, ValueError) as e:
         self.print_debug(f'Error checking file {file_path}: {str(e)}')
         return True

src/scanoss/cli.py (1)
1574-1583: Add input validation for settings file.

Consider adding validation for the settings file path.

 def get_scanoss_settings_from_args(args):
+    if args.settings and not os.path.exists(args.settings):
+        print_stderr(f'ERROR: Settings file {args.settings} does not exist')
+        sys.exit(1)
+
     scanoss_settings = None
     if not args.skip_settings_file:
         scanoss_settings = ScanossSettings(debug=args.debug, trace=args.trace, quiet=args.quiet)
         try:
             scanoss_settings.load_json_file(args.settings, args.scan_dir).set_file_type('new').set_scan_type('identify')
         except ScanossSettingsError as e:
             print_stderr(f'Error: {e}')
             sys.exit(1)
     return scanoss_settings

docs/source/index.rst (2)
231-273: Add example usage and output format description.

Consider adding example usage and output format description to improve documentation.

 --------------------------------
 Folder Scanning: folder-scan, fs
 --------------------------------

 Performs a comprehensive scan of a directory using folder hashing to identify components and their matches.

 .. code-block:: bash

    scanoss-py folder-scan <directory>

+Example usage:
+
+.. code-block:: bash
+
+   # Basic usage
+   scanoss-py folder-scan /path/to/directory
+
+   # With best match and custom threshold
+   scanoss-py folder-scan /path/to/directory -bm --threshold 80
+
+   # Save results to a file
+   scanoss-py folder-scan /path/to/directory -o results.json
+
+Output format:
+
+.. code-block:: json
+
+   {
+     "matches": [
+       {
+         "file": "example.js",
+         "component": "example-lib",
+         "version": "1.0.0",
+         "license": "MIT",
+         "score": 100
+       }
+     ]
+   }
+
 .. list-table::
    :widths: 20 30
    :header-rows: 1
274-303: Add example usage and output format description.

Consider adding example usage and output format description to improve documentation.

 --------------------------------
 Folder Hashing: folder-hash, fh
 --------------------------------

 Generates cryptographic hashes for files in a given directory and its subdirectories.

 .. code-block:: bash

    scanoss-py folder-hash <directory>

+Example usage:
+
+.. code-block:: bash
+
+   # Basic usage
+   scanoss-py folder-hash /path/to/directory
+
+   # Save results to a file
+   scanoss-py folder-hash /path/to/directory -o hashes.json
+
+Output format:
+
+.. code-block:: json
+
+   {
+     "path_id": "/path/to/directory",
+     "sim_hash_names": "a1b2c3d4",
+     "sim_hash_content": "e5f6g7h8",
+     "children": [
+       {
+         "path_id": "/path/to/directory/subdirectory",
+         "sim_hash_names": "i9j0k1l2",
+         "sim_hash_content": "m3n4o5p6",
+         "children": []
+       }
+     ]
+   }
+
 .. list-table::
    :widths: 20 30
    :header-rows: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- CHANGELOG.md (2 hunks)
- docs/source/index.rst (1 hunks)
- setup.cfg (1 hunks)
- src/scanoss/cli.py (46 hunks)
- src/scanoss/file_filters.py (6 hunks)
- src/scanoss/results.py (6 hunks)
- src/scanoss/scanners/folder_hasher.py (1 hunks)
- src/scanoss/scanners/scanner_hfh.py (1 hunks)
- src/scanoss/utils/abstract_presenter.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- setup.cfg
- src/scanoss/results.py
- src/scanoss/utils/abstract_presenter.py
- src/scanoss/scanners/scanner_hfh.py
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md
[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-02-10 ### Added - Add folder-scan subcommand - Add folder-has...
(REPEATED_VERBS)
🔇 Additional comments (3)
src/scanoss/scanners/folder_hasher.py (1)
40-49: LGTM!

The dataclass is well-designed with proper type hints, default values, and appropriate use of Optional and Literal types.

CHANGELOG.md (2)

12-17: LGTM! Clear and well-structured changelog entry.

The new version section follows the established format and clearly documents the new features added in version 1.21.0, which align well with the PR objectives (ES-163 Add folder hashing support).
466-468: LGTM! Version comparison links are properly updated.

The version comparison links are correctly added and follow the established pattern, maintaining the chronological order of releases.
Actionable comments posted: 0
🧹 Nitpick comments (10)
src/scanoss/file_filters.py (4)
61-66: Include .github in skipped directories

Good addition of common directories to skip during scans. These are typical directories containing documentation, examples, and tests that are not necessary for the scanning process. Consider also including .github, which is another common directory pattern that typically contains workflow files and issue templates.
311-316: Consider consistent handling of symlinks

While you're warning about symbolic link folders, you're not automatically skipping them in the folder scan. However, in the _should_skip_file_for_hfh method, symlinks are skipped. Consider adopting a consistent approach to handling symlinks across all methods.

-            if dir_path.is_symlink():  # TODO should we skip symlink folders?
-                self.print_msg(f'WARNING: Found symbolic link folder: {dir_path}')
+            if dir_path.is_symlink():
+                self.print_msg(f'Skipping symbolic link folder: {dir_path}')
+                dirnames.clear()
+                continue
529-564: Consider more specific error handling in _should_skip_file_for_hfh

The new method for folder hashing filtering is well-structured but uses a very broad exception handler. Consider catching more specific exceptions to avoid masking unexpected issues.

Additionally, the special handling for .txt files ending with null bytes is interesting. Consider adding a comment explaining why this check is necessary for folder hashing (is it related to potential binary files with incorrect extensions?).

-        except Exception as e:
+        except (OSError, PermissionError, ValueError) as e:
549-558: Consider using a context manager for file operations

When checking for null bytes in text files, the code already uses a with statement for proper file handling, which is good. However, there are several conditions that might cause exceptions when opening the file. Consider adding debug output in the except block to provide more specific information about why the file couldn't be read.

             except (OSError, IOError):
-                self.print_debug(f'Skipping file: {file_path} (cannot read file content)')
+                self.print_debug(f'Skipping file: {file_path} (cannot read file content - file may be locked or inaccessible)')

src/scanoss/scanners/folder_hasher.py (6)
17-28: Consider storing only direct files per node.
Currently, the code structure suggests that files may get stored in multiple directory nodes along the path. If your use case requires a more traditional tree, consider appending each file to only the deepest node representing its immediate directory.

40-49: Consider broadening output format support in the future.
The Literal['json'] default is fine now, but you might eventually want to accommodate other formats (e.g., YAML, CSV).

100-120: Gracefully handle empty sets of files.
If no files are filtered (e.g., empty directory), consider returning an explicit indicator (like an empty structure) so the user knows no files were processed.

121-175: Potential double inclusion of files in the tree.
Because each ancestor directory node also appends the file, a file might appear in multiple levels. Double-check that this is aligned with your intended design.

176-197: Children array is purely a list.
Returning a list of children can make it harder to distinguish unique directory names. If needed, consider preserving them in a dictionary keyed by directory name/path.

264-297: Plain output is effectively JSON.
_format_plain_output returns a JSON dump if the tree is a dictionary. If a simpler plain-text format is desired, consider an alternative method.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/scanoss/file_filters.py (7 hunks)
- src/scanoss/scanners/folder_hasher.py (1 hunks)
🔇 Additional comments (10)
src/scanoss/file_filters.py (4)
215-224: LGTM: Comprehensive extension exclusion list

The added file extensions represent a good set of file types to exclude. Many of these are either minified files (.min.js), build artifacts (.mf), or diff/patch files that shouldn't be included in the scanning process.

253-253: Improved constructor flexibility with kwargs pattern

The refactoring to use **kwargs with the kwargs.get() pattern is a good improvement for flexibility and future-proofing the API. This allows for backward compatibility while supporting additional parameters in the future without breaking changes.

Also applies to: 261-282
452-452: API change: Method visibility change from private to public

Changing _should_skip_dir to should_skip_dir (removing the underscore) makes this method part of the public API. Ensure this is an intentional change as it may impact users who have extended or used this class, and update any relevant documentation accordingly.

539-564: Add unit test for _should_skip_file_for_hfh

The new _should_skip_file_for_hfh method contains complex logic, especially the handling of text files with null bytes. Consider adding unit tests to verify this functionality works as expected across different file types and edge cases.

src/scanoss/scanners/folder_hasher.py (6)
1-8: Check dependency on progress.bar.
The import for progress.bar is not part of the standard library. Please ensure that it's properly included and pinned in your environment's requirements or dependencies to avoid runtime errors.

29-39: DirectoryFile encapsulation looks clean.
It straightforwardly holds file path and CRC64 hash info.

51-61: Config creation from args is well-structured.
It aligns neatly with your argument parsing approach, keeping concerns separated.

63-99: Validate existence of the scan directory upfront.
Before hashing, confirm that scan_dir points to a valid directory to provide clearer error messages if something is amiss.
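A minimal sketch of such an upfront check, using only pathlib. The function name require_directory and the error messages are made up for illustration; where the check would be called from in the hasher is left open.

```python
from pathlib import Path


def require_directory(scan_dir: str) -> Path:
    """Return scan_dir as a Path, or raise a descriptive error if it is unusable."""
    path = Path(scan_dir)
    if not path.exists():
        raise FileNotFoundError(f'Scan directory does not exist: {scan_dir}')
    if not path.is_dir():
        raise NotADirectoryError(f'Scan path is not a directory: {scan_dir}')
    return path
```

Raising distinct exception types lets a CLI layer print a targeted message for each case instead of a generic failure.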
198-242: Duplicate file hashes at ancestor nodes.
_hash_calc relies on a per-node set of processed_hashes, so the same file might be rehashed at higher nodes. Verify this won't create inflated or repeated results.

243-263: Double-check bit-shift logic in _head_calc.
You shift right by 4 bits and then constrain the result to 8 bits. Confirm that the truncated data still meets your desired collision strategy.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/scanoss/scanners/folder_hasher.py (2)
25-31: Consider adding __str__ or __repr__ methods for easier debugging.
This class may benefit from a textual representation to aid in logging, troubleshooting, and testing.

295-311: Unify or differentiate JSON and plain text formats more clearly.
The _format_plain_output method currently returns JSON if the data is a dictionary, which may be surprising. Consider offering a genuinely plain text format for better clarity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/scanoss/scanners/folder_hasher.py (1 hunks)
🔇 Additional comments (2)
src/scanoss/scanners/folder_hasher.py (2)
231-244: Verify whether skipping hash calculations for smaller directories is intended.
By returning None when fewer than 8 files are present or when filenames total fewer than 32 characters, entire subdirectories might be ignored in the final hash. Confirm that this aligns with your use case.

251-252: Rewriting the most significant byte of the name simhash is a neat technique.
Ensure that this does not introduce collisions or side effects when analyzing downstream consumers of the simhash value.
file_bytes = full_file_path.read_bytes()
key = CRC64.get_hash_buff(file_bytes)
🛠️ Refactor suggestion
Use a streaming or chunk-based approach for hashing large files.
Reading the entire file into memory might significantly impact performance, especially with very large files.
for part in Path(rel_path).parent.parts:
    child_path = str(Path(current_node.path) / part)
    if child_path not in current_node.children:
        current_node.children[child_path] = DirectoryNode(child_path)
    current_node = current_node.children[child_path]
    current_node.files.append(file_item)

root_node.files.append(file_item)
Avoid storing file references in both parent and child nodes to prevent duplication.
Appending the same DirectoryFile object to each parent directory may lead to bloated data structures, confusing file structures, and redundant file entries. Consider only maintaining the file reference in the leaf node corresponding to its directory.
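To make the alternative concrete, here is a small sketch of a tree where each file is recorded only in the node for its immediate directory, with a recursive helper for callers that need everything below a node. The Node class and both function names are hypothetical stand-ins, not the project's DirectoryNode. Note that if the folder hash of a directory is meant to cover all descendant files, the hashing step would need to use the recursive helper rather than node.files.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Iterator, List


@dataclass
class Node:
    """Hypothetical directory node: children keyed by path, files stored once."""
    path: str
    children: Dict[str, 'Node'] = field(default_factory=dict)
    files: List[str] = field(default_factory=list)


def build_tree(root: str, rel_paths: List[str]) -> Node:
    """Build a tree in which each file lives only in its immediate directory."""
    root_node = Node(root)
    for rel_path in rel_paths:
        current = root_node
        for part in Path(rel_path).parent.parts:
            child_path = str(Path(current.path) / part)
            if child_path not in current.children:
                current.children[child_path] = Node(child_path)
            current = current.children[child_path]
        # Append once, after walking down to the deepest directory
        current.files.append(rel_path)
    return root_node


def iter_files(node: Node) -> Iterator[str]:
    """Yield every file at or below node without storing duplicates."""
    yield from node.files
    for child in node.children.values():
        yield from iter_files(child)
```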
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/scanoss/file_filters.py (2)
529-563: Narrow the exception handling in _should_skip_file_for_hfh

The broad try-except block that catches all exceptions could mask unexpected errors, making debugging difficult. Consider catching more specific exceptions like OSError, IOError, etc., rather than the generic Exception class.

Additionally, the null-byte check only applies to .txt files. Should this be extended to other text file types?

-        try:
-            if (
-                any(part.startswith('.') for part in file_path.parts)  # Hidden files/folders
-                or file_path.is_symlink()  # Symlinks
-                or file_path.stat().st_size == 0  # Empty files
-            ):
-                self.print_debug(f'Skipping file: {file_path} (hidden/symlink/empty)')
-                return True
+        # Check for hidden files/folders
+        if any(part.startswith('.') for part in file_path.parts):
+            self.print_debug(f'Skipping file: {file_path} (hidden file/folder)')
+            return True
+
+        try:
+            # Check for symlinks and empty files
+            if file_path.is_symlink():
+                self.print_debug(f'Skipping file: {file_path} (symlink)')
+                return True
+
+            if file_path.stat().st_size == 0:
+                self.print_debug(f'Skipping file: {file_path} (empty file)')
+                return True

And for the null-byte check:

-            # Files ending with null
-            if file_path.suffix.lower() == '.txt':
+            # Files ending with null bytes - check common text file types
+            if file_path.suffix.lower() in ['.txt', '.md', '.csv', '.log', '.json', '.xml', '.yaml', '.yml']:
549-557: Consider enhancing the text file null-byte check

The check for null bytes only targets .txt files. Consider expanding this to other common text file formats that might have similar issues, such as .md, .csv, .log, etc.
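Whichever extensions end up being checked, the test itself only needs the final byte of the file. The sketch below is a standalone helper (the name ends_with_null is made up for illustration) that seeks to the end instead of reading the whole file, and guards the empty-file case so the negative seek cannot fail.

```python
import os
from pathlib import Path


def ends_with_null(file_path: Path) -> bool:
    """Return True if the last byte of the file is a null byte."""
    with open(file_path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            # Empty file: there is no last byte to inspect
            return False
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b'\x00'
```

This keeps memory use constant regardless of file size, which matters more as the list of checked extensions grows.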
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/scanoss/file_filters.py (7 hunks)
🔇 Additional comments (5)
src/scanoss/file_filters.py (5)
215-224: LGTM: File extension filtering expanded appropriately

The additional file extensions added to DEFAULT_SKIPPED_EXT appropriately handle more non-code files that should be skipped during scanning operations.

253-281: Well-structured kwargs implementation and documentation

The refactoring to use kwargs with default values and detailed documentation improves the flexibility of the FileFilters class while maintaining backward compatibility. This is a good approach for extending functionality.

311-316: LGTM: Improved path handling with pathlib

The switch to using Path objects from pathlib is a good modernization that makes path operations more robust and readable.
452-452: ❓ Verification inconclusive

Method made public - verify all usages are updated

The _should_skip_dir method has been renamed to should_skip_dir (removing the underscore), making it part of the public API. Ensure that all places in the codebase that call this method have been updated accordingly.

🏁 Script executed:

#!/bin/bash
# Check for any remaining calls to the old method name
echo "Searching for code that might still use the old _should_skip_dir method name..."
rg "_should_skip_dir" --type py

Length of output: 200

Manual Verification Required: Confirm All References to _should_skip_dir Are Updated

The automated search did not show any remaining usages of _should_skip_dir. However, since the command output was minimal, please manually verify that all calls to _should_skip_dir have been updated to the new should_skip_dir method.
- Confirm that the old method is not defined or used anywhere in the codebase.
- Double-check call sites in related modules and tests.
61-66: ❓ Verification inconclusive

Consider clarifying the commented directories in DEFAULT_SKIPPED_DIRS

These commented-out directories (example, examples, docs, tests, doc, test) suggest they might be skipped in certain scenarios. If there's a feature flag or configuration to enable these, it should be documented. Otherwise, consider removing the commented entries to avoid confusion.

🏁 Script executed:

#!/bin/bash
# Check if these commented directories are used conditionally elsewhere in the codebase
echo "Searching for code that might conditionally use these commented directories..."
rg -A 5 -B 5 "(DEFAULT_SKIPPED_DIRS.*example|example.*DEFAULT_SKIPPED_DIRS)" --type py

Length of output: 255

Action: Clarify or Remove Commented-Out Directories in DEFAULT_SKIPPED_DIRS

The search did not reveal any conditional usage of these directories elsewhere in the codebase. It remains unclear if these entries are placeholders for a future feature flag or simply vestigial. If they are meant to be activated via configuration or a flag, please add documentation explaining their usage. Otherwise, consider removing these commented-out entries from src/scanoss/file_filters.py (lines 61–66) to avoid potential confusion.
Actionable comments posted: 1
🔭 Outside diff range comments (2)
src/scanoss/file_filters.py (1)
553-563: ⚠️ Potential issue

Remove duplicate code blocks.

These lines appear to be an exact duplicate of lines 542-552. They should be removed to avoid confusion and maintenance issues.

-                    self.print_debug(f'Skipping file: {file_path} (text file ending with null)')
-                    return True
-            except (OSError, IOError):
-                self.print_debug(f'Skipping file: {file_path} (cannot read file content)')
-                return True
-
-        return False
-
-    except Exception as e:
-        self.print_debug(f'Error checking file {file_path}: {str(e)}')
-        return True

src/scanoss/scanners/folder_hasher.py (1)

297-315: 🛠️ Refactor suggestion

Remove duplicated output formatting methods.

This block repeats _format_json_output and _format_plain_output, already defined in lines 278–295. Having duplicate methods is error-prone and can lead to maintenance overhead.

-    def _format_json_output(self) -> str:
-        ...
-    def _format_plain_output(self) -> str:
-        ...
+    # Remove these duplicate methods, or unify them with the original definitions at lines 278-295.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- src/scanoss/file_filters.py (5 hunks)
- src/scanoss/file_filters.py (1 hunks)
- src/scanoss/scanners/folder_hasher.py (1 hunks)
- src/scanoss/file_filters.py (2 hunks)
- src/scanoss/scanners/folder_hasher.py (2 hunks)
- src/scanoss/scanners/folder_hasher.py (3 hunks)
- src/scanoss/file_filters.py (1 hunks)
- src/scanoss/file_filters.py (1 hunks)
- src/scanoss/scanners/folder_hasher.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- src/scanoss/file_filters.py
- src/scanoss/file_filters.py
- src/scanoss/file_filters.py
- src/scanoss/scanners/folder_hasher.py
- src/scanoss/scanners/folder_hasher.py
- src/scanoss/scanners/folder_hasher.py
- src/scanoss/file_filters.py
🔇 Additional comments (4)
src/scanoss/file_filters.py (1)
518-552: Well-implemented file skip logic for folder hashing.

The new _should_skip_file_for_hfh method provides good encapsulation of filter logic specific to folder hashing operations. The implementation properly handles various conditions like hidden files, symlinks, empty files, and text files with null bytes. The error handling is robust with appropriate debug logging.

Consider clarifying what "hfh" stands for in the method name or docstring - I assume it's "folder hashing" but this might not be immediately clear to all developers.
src/scanoss/scanners/folder_hasher.py (3)
100-120: Consider adding unit tests for edge cases.
It would be beneficial to include unit tests that cover scenarios like empty directories, large directories, and permission-restricted files to ensure the hashing logic operates correctly under all conditions.

Would you like a script to scan for existing test files and confirm coverage for these edge cases?

150-151: Use a streaming or chunk-based approach for hashing large files.
Reading the entire file into memory at once can significantly impact performance and memory usage for very large files.

163-165: Avoid storing file references in both parent and child nodes.
Appending the same DirectoryFile object to the current node and root_node may cause duplication, leading to confusing directory structures.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/scanoss/scanners/folder_hasher.py (1)
77-79: Parameter overshadow in class constructor
scan_dir is provided to the constructor but is overshadowed by the separate path parameter in hash_directory(). Consider removing scan_dir or using it consistently everywhere to reduce confusion.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/scanoss/file_filters.py (6 hunks)
- src/scanoss/scanners/folder_hasher.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/scanoss/file_filters.py
🧰 Additional context used
🧬 Code Definitions (1)
src/scanoss/scanners/folder_hasher.py (4)
src/scanoss/file_filters.py (2)
- FileFilters (247-527)
- get_filtered_files_from_folder (283-324)
src/scanoss/utils/abstract_presenter.py (4)
- AbstractPresenter (8-68)
- present (29-48)
- _format_json_output (57-61)
- _format_plain_output (64-68)
src/scanoss/results.py (3)
- present (261-263)
- _format_json_output (63-84)
- _format_plain_output (86-99)
src/scanoss/scanners/scanner_hfh.py (3)
- present (128-130)
- _format_json_output (143-150)
- _format_plain_output (152-160)
🔇 Additional comments (4)
src/scanoss/scanners/folder_hasher.py (4)
50-50: Potential type mismatch for output_format in FolderHasherConfig

Currently, output_format is annotated as Literal['json'] even though the presenter can also handle 'plain'. Verify and confirm that passing 'plain' will not lead to unexpected exceptions or type-checking issues.
158-159: Use a streaming approach to avoid high memory usage

Reading the entire file content into memory might impact performance significantly for large files. Adopting a chunk-based or streaming approach can mitigate memory pressure and improve scalability.

166-173: Avoid storing the same file reference in multiple nodes

Appending the same DirectoryFile object to both the child node and the root node can lead to redundancy and confusion in the directory tree. Consider storing file references only in the most relevant directory node to prevent duplication.

288-316: Overall presentation logic appears solid

The presenter subclasses AbstractPresenter correctly, and the JSON/plain output generation for folder hashes is well-structured. No immediate concerns here.
Actionable comments posted: 0
🧹 Nitpick comments (5)
src/scanoss/scanners/folder_hasher.py (5)
175-178: Use more specific exception types for better error handling.

Catching all exceptions with a generic Exception handler might mask unexpected errors that should be handled differently.

Consider specifying the exceptions you expect to catch:

-            except Exception as e:
-                self.base.print_debug(f'Skipping file {full_file_path}: {str(e)}')
+            except (IOError, PermissionError, OSError) as e:
+                self.base.print_debug(f'Skipping file {full_file_path}: {str(e)}')
+            except Exception as e:
+                self.base.print_debug(f'Unexpected error processing file {full_file_path}: {str(e)}')
+                # Consider whether to re-raise certain exceptions
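As a side note on the exception tuple: in Python 3, IOError is an alias of OSError and PermissionError is a subclass of it, so catching OSError alone covers all three. The sketch below shows a small classifier (the name classify_error is made up for illustration) that relies on this.

```python
def classify_error(exc: BaseException) -> str:
    """Label an exception as an expected filesystem error or something else."""
    # OSError covers IOError (an alias) and PermissionError (a subclass)
    if isinstance(exc, OSError):
        return 'filesystem'
    return 'unexpected'


# Facts about the built-in hierarchy that make the shorter tuple sufficient
IOERROR_IS_ALIAS = IOError is OSError
PERMISSION_IS_SUBCLASS = issubclass(PermissionError, OSError)
```

Listing all three names is harmless, but a single OSError clause is arguably easier to read.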
234-249: Add a descriptive comment explaining minimum threshold requirements.

The code checks if there are enough files and if the concatenated names are long enough, but it would be helpful to explain why these thresholds exist.

 if len(selected_names) < MINIMUM_FILE_COUNT:
+    # Insufficient files to generate a reliable hash signature
     return {
         'name_hash': None,
         'content_hash': None,
     }

 selected_names.sort()
 concatenated_names = ''.join(selected_names)

 if len(concatenated_names.encode('utf-8')) < MINIMUM_CONCATENATED_NAME_LENGTH:
+    # Concatenated names too short to generate a meaningful signature
     return {
         'name_hash': None,
         'content_hash': None,
     }
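The two guards can be isolated into a small helper to make the thresholds easy to test. In this sketch the constant values 8 and 32 are taken from an earlier comment in this review thread, not read from the source file, and the function name names_for_hashing is hypothetical.

```python
from typing import List, Optional

# Values as quoted in an earlier review comment; treat them as assumptions
MINIMUM_FILE_COUNT = 8
MINIMUM_CONCATENATED_NAME_LENGTH = 32


def names_for_hashing(names: List[str]) -> Optional[str]:
    """Return the sorted, concatenated names, or None if below either threshold."""
    if len(names) < MINIMUM_FILE_COUNT:
        # Too few files for a meaningful signature
        return None
    concatenated = ''.join(sorted(names))
    if len(concatenated.encode('utf-8')) < MINIMUM_CONCATENATED_NAME_LENGTH:
        # Names too short in total, measured in UTF-8 bytes
        return None
    return concatenated
```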
254-255: Document why the most significant byte of the simhash is replaced.

The replacement of the most significant byte with a head value is an important detail that deserves explanation.

 # Calculate head and overwrite MS byte
 head = self._head_calc(names_simhash)
+# Replace most significant byte with head value to create a unique
+# identifier while maintaining the simhash properties in the remaining bytes
 names_simhash = (names_simhash & 0x00FFFFFFFFFFFFFF) | (head << 56)
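The mask-and-shift expression can be read as two steps: clear the top byte of the 64-bit value, then place the 8-bit head there. A standalone sketch follows; the function name overwrite_ms_byte and the constant name are made up for illustration, and the extra & 0xFF on head is a defensive addition not present in the quoted code.

```python
MASK_LOW_56_BITS = 0x00FFFFFFFFFFFFFF


def overwrite_ms_byte(simhash64: int, head: int) -> int:
    """Replace the most significant byte of a 64-bit value with head."""
    # Keep the lower 56 bits, then OR the 8-bit head into bits 56..63
    return (simhash64 & MASK_LOW_56_BITS) | ((head & 0xFF) << 56)
```

The lower 56 bits survive unchanged, so any comparison that depends on the top byte of the original simhash no longer sees it after this step.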
307-315: _format_plain_output returns JSON regardless of the name.

The plain output formatter returns JSON content wrapped in a conditional that always evaluates to JSON content, making the method name potentially misleading.

Consider either:
- Renaming the method to better reflect its functionality
- Implementing a truly plain text output format that's distinct from JSON

 def _format_plain_output(self) -> str:
     """
-    Format the scan output data into a plain text string
+    Format the scan output data into a plain text string (currently returns JSON)
     """
     return (
         json.dumps(self.folder_hasher.tree, indent=2)
         if isinstance(self.folder_hasher.tree, dict)
         else str(self.folder_hasher.tree)
     )
150-151: Add a mechanism to disable progress bar in automated environments.

The progress bar is helpful for interactive use but may not be suitable in CI/CD or headless environments.

-        bar = Bar('Hashing files...', max=len(filtered_files))
+        bar = Bar('Hashing files...', max=len(filtered_files)) if not self.base.quiet else None
         for file_path in filtered_files:
             try:
                 # ... (existing code)
-            bar.next()
+            if bar:
+                bar.next()

-        bar.finish()
+        if bar:
+            bar.finish()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/scanoss/scanners/folder_hasher.py (1 hunks)
🔇 Additional comments (5)
src/scanoss/scanners/folder_hasher.py (5)
143-145: File name filtering is now implemented correctly.

The filter condition correctly checks for names exceeding the maximum length, which aligns with the constant name MAXIMUM_FILE_NAME_LENGTH. This resolves the inconsistency mentioned in a past review comment.
158-159: Use a streaming or chunk-based approach for hashing large files.
Reading the entire file into memory might cause performance issues with very large files, consuming excessive memory.
Consider using a buffered reading approach:
- file_bytes = full_file_path.read_bytes()
- key = CRC64.get_hash_buff(file_bytes)
+ # Process file in chunks to avoid loading large files into memory
+ crc = CRC64()
+ with open(full_file_path, 'rb') as f:
+     while chunk := f.read(65536):  # 64KB chunks
+         crc.update(chunk)
+ key = crc.digest()
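The chunked pattern suggested above can be sketched with the standard library. This is only an illustration under a stated assumption: it substitutes `zlib.crc32` for the project's `CRC64` class, whose incremental API is not shown in this thread, and `crc32_of_file` is a hypothetical helper name.

```python
import zlib
from pathlib import Path

CHUNK_SIZE = 65536  # 64 KiB, the chunk size used in the suggestion above


def crc32_of_file(path: Path, chunk_size: int = CHUNK_SIZE) -> int:
    """Compute a CRC-32 checksum incrementally, one chunk at a time."""
    checksum = 0
    with open(path, 'rb') as handle:
        while chunk := handle.read(chunk_size):
            # zlib.crc32 takes the running checksum as its second argument
            checksum = zlib.crc32(chunk, checksum)
    return checksum
```

Memory use stays bounded by `chunk_size` regardless of file size, and the result matches hashing the whole buffer at once.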
165-173: Avoid storing file references in both parent and child nodes to prevent duplication.
Appending the same `DirectoryFile` object to each parent directory may lead to bloated data structures and redundant file entries. Consider only maintaining the file reference in its immediate parent directory, or use a more memory-efficient approach such as indexing files by ID.
176-176: Debug message is now using the correct method.
You're correctly using `self.base.print_debug` here, which resolves a previously identified issue.
250-255: ❓ Verification inconclusive
Ensure hash calculation is efficient with a suitable simhash algorithm.
The code uses both `simhash` for names and `fingerprint` for content. Verify that this approach produces a reliable and collision-resistant folder signature.
🏁 Script executed:
#!/bin/bash
# Find other uses of simhash functions in the codebase to ensure consistency
rg -A 3 "simhash\(" --type py
rg -A 3 "fingerprint\(" --type py
# Check if we should be using the same approach for both name_hash and content_hash
rg -A 5 "WordFeatureSet|vectorize_bytes" --type py
Length of output: 4261
Attention: Validate Folder Hashing Algorithm Efficiency and Robustness
The use of `simhash` on a `WordFeatureSet` for names alongside `fingerprint(vectorize_bytes(...))` for file content is consistent with similar implementations in the codebase. In particular, the algorithm for computing a 64‑bit hash and then adjusting the most significant byte with a head value (via `_head_calc`) appears intentional.
Please verify that:
- The dual approach (names and content) yields a collision-resistant folder signature under expected inputs.
- The additional head calculation and bit masking do not adversely impact the hash’s uniqueness or performance.
- The current simhash and fingerprint implementations are sufficiently efficient for the application's workload.
779a921
to
5d65364
Compare
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/scanoss/scanners/folder_hasher.py (1)
220-257: Validate performance of simhash for large file sets.
Although simhash is generally efficient, note that sorting names and computing feature sets for many files might become expensive. For massive directories, you might want to verify the time/space impact.
If necessary, consider calculating partial hashes or employing a parallel approach for extremely large sets.
src/scanoss/file_filters.py (2)
74-80: Confirm skipping new folders aligns with user expectations.
Adding `doc`, `docs`, `test`, and `tests` to the skip sets can be beneficial, but some users might still want them hashed in certain scenarios. Consider making this easily configurable or documented.
Also applies to: 82-95
439-439: `FileFilters` constructor now using `**kwargs`.
While this is flexible, it can make type-checking or IDE autocompletion less robust. Consider adopting a typed dataclass or explicit constructor parameters if you anticipate expansions or user misconfigurations.
Also applies to: 447-457, 460-468
📜 Review details
📒 Files selected for processing (2)
src/scanoss/file_filters.py (12 hunks)
src/scanoss/scanners/folder_hasher.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
src/scanoss/scanners/folder_hasher.py (3)
- src/scanoss/utils/abstract_presenter.py (4): AbstractPresenter (8-68), present (29-48), _format_json_output (57-61), _format_plain_output (64-68)
- src/scanoss/results.py (3): present (261-263), _format_json_output (63-84), _format_plain_output (86-99)
- src/scanoss/scanners/scanner_hfh.py (3): present (128-130), _format_json_output (143-150), _format_plain_output (152-160)
🔇 Additional comments (8)
src/scanoss/scanners/folder_hasher.py (5)
16-19: Clarify and confirm usage of naming constants.
The naming of `MINIMUM_FILE_COUNT`, `MINIMUM_CONCATENATED_NAME_LENGTH`, and `MINIMUM_FILE_NAME_LENGTH` suggests skipping items below a certain threshold, but please confirm their intended logic in the calling code to avoid confusion.
Would you like me to verify references to these constants in other files or code paths to ensure consistent usage?
126-144: Check usage of `MINIMUM_FILE_NAME_LENGTH` for filtering files.
Here, you only include files whose names have a length less than `MINIMUM_FILE_NAME_LENGTH`. This appears contradictory to the constant’s name, as “minimum” usually denotes the lower bound for acceptance, yet you’re including only very short filenames.
Can you confirm if this logic aligns with the intended spec, or if it should be `>` instead of `<`?
158-159: Consider using a streaming or chunk-based approach to read file contents.
Reading large files in their entirety with `read_bytes()` can lead to high memory usage. Revisiting the use of a chunk-based approach may improve performance for big files.
171-173: Avoid storing the same file reference at multiple node levels.
Appending `file_item` to both the specific child node and the `root_node` may cause data duplication or unintended complexity. Consider storing file references only where strictly needed.
258-291: Overall presentation logic seems sound.
The abstract presenter pattern is clean, and your JSON/plain text outputs are handled effectively. No immediate issues found here.
src/scanoss/file_filters.py (3)
49-59: Ensure consistency between `DEFAULT_SKIPPED_FILES` and `DEFAULT_SKIPPED_FILES_HFH`.
These newly introduced defaults for folder hashing scans look reasonable. However, confirm that skipping certain build wrapper scripts is indeed desired, especially if the user might rely on them.
245-254: Extensive default skip extensions for folder hashing.
This extended list of skipped extensions appears thorough. Double-check that none of these file types are crucial for certain scanning/hashing use cases.
Would you like me to run a usage scan in the repository to confirm no critical references to these file types exist?
Also applies to: 277-430
660-660: Renaming `_should_skip_dir` to `should_skip_dir` can affect external references.
This could break backward compatibility if any imports or references rely on the old name.
Would you like a shell script to grep for `_should_skip_dir` references outside this file to ensure nothing breaks?
819e324
to
5472319
Compare
feat: ES-163 add build children logic to folder hashing scan
feat: ES-163 mimic go implementation of hashing folder content
feat: ES-163 create our own crc64 implementation
feat: ES-163 create our own simhash implementation based on go library
feat: ES-163 better error handling
feat: ES-163 update changelog, client help and version. add headers
feat: ES-163 add AbstractPresenter to handle results output in a centralized way
feat: ES-163 update changelog
feat: ES-163 fix pr comments, update lint.yml workflow
feat: ES-163 add best match and treshold arguments to hfh scan command
downgrade python/protobuf versions
feat: apply same filters from go minr
feat: create standalone presenters that implement abstractpresenter
feat: add progress tracking while scanning, clean folder-scan command arguments
feat: use progress instead of rich
feat: add folder hash sub-command, fix hfh spinner
feat: update docs
feat: update default files/directories to skip while scanning
feat: add extra checks in file filters for folder hashing scan
feat: comment new skipped dirs for now
feat: align with go client again
feat: filter hidden file/folders
feat: add specific filters for hfh
feat: add missing command args
feat: ES-163 add build children logic to folder hashing scan
feat: ES-163 mimic go implementation of hashing folder content
feat: ES-163 create our own crc64 implementation
feat: ES-163 create our own simhash implementation based on go library
feat: ES-163 better error handling
feat: ES-163 update changelog, client help and version. add headers
feat: ES-163 add AbstractPresenter to handle results output in a centralized way
feat: ES-163 update changelog
feat: ES-163 fix pr comments, update lint.yml workflow
feat: update changelog
dee5976
to
4b2e026
Compare
Actionable comments posted: 4
🔭 Outside diff range comments (1)
src/scanoss/scanossgrpc.py (1)
531-532: ⚠️ Potential issue
Remove unused variables.
`SUCCEDED_WITH_WARNINGS_STATUS_CODE` and `FAILED_STATUS_CODE` are never referenced and trigger lint errors. Remove them to address the warnings.
- SUCCEDED_WITH_WARNINGS_STATUS_CODE = 2
- FAILED_STATUS_CODE = 3
🧰 Tools
🪛 Ruff (0.8.2)
531-531: Local variable `SUCCEDED_WITH_WARNINGS_STATUS_CODE` is assigned to but never used. Remove assignment to unused variable `SUCCEDED_WITH_WARNINGS_STATUS_CODE` (F841)
532-532: Local variable `FAILED_STATUS_CODE` is assigned to but never used. Remove assignment to unused variable `FAILED_STATUS_CODE` (F841)
🪛 GitHub Actions: Lint
[error] 531-531: F841 Local variable `SUCCEDED_WITH_WARNINGS_STATUS_CODE` is assigned to but never used. Remove assignment to unused variable `SUCCEDED_WITH_WARNINGS_STATUS_CODE`.
[error] 532-532: F841 Local variable `FAILED_STATUS_CODE` is assigned to but never used. Remove assignment to unused variable `FAILED_STATUS_CODE`.
♻️ Duplicate comments (3)
src/scanoss/cli.py (2)
1589-1627: 🛠️ Refactor suggestion
Enhance error handling in folder_hashing_scan.
While basic error handling is present, consider adding more specific error types and resource cleanup.
 def folder_hashing_scan(parser, args):
     try:
         if not args.scan_dir:
             print_stderr('ERROR: Please specify a directory to scan')
             parser.parse_args([args.subparser, '-h'])
             sys.exit(1)
         if not os.path.exists(args.scan_dir) or not os.path.isdir(args.scan_dir):
             print_stderr(f'ERROR: The specified directory {args.scan_dir} does not exist')
             sys.exit(1)
+        if not os.access(args.scan_dir, os.R_OK):
+            print_stderr(f'ERROR: No read permission for directory {args.scan_dir}')
+            sys.exit(1)
+
         scanner_config = create_scanner_config_from_args(args)
         scanoss_settings = get_scanoss_settings_from_args(args)
         grpc_config = create_grpc_config_from_args(args)
         client = ScanossGrpc(**asdict(grpc_config))
+
         scanner = ScannerHFH(
             scan_dir=args.scan_dir,
             config=scanner_config,
             client=client,
             scanoss_settings=scanoss_settings,
         )
         scanner.best_match = args.best_match
         scanner.threshold = args.threshold
+        try:
             scanner.scan()
             scanner.present(output_file=args.output, output_format=args.format)
+        finally:
+            client.close()  # Ensure client is properly closed
+
     except ScanossGrpcError as e:
         print_stderr(f'ERROR: {e}')
         sys.exit(1)
+    except Exception as e:
+        print_stderr(f'ERROR: Unexpected error during scan: {e}')
+        sys.exit(1)
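The directory checks in the suggestion above are repeated in both sub-commands and could be factored into one helper. A minimal sketch, assuming a hypothetical `check_scan_dir` function that returns an error message instead of exiting; the extra `os.X_OK` bit is an assumption added here, since listing a directory's entries on POSIX also needs search permission.

```python
import os
from typing import Optional


def check_scan_dir(scan_dir: Optional[str]) -> Optional[str]:
    """Return an error message if scan_dir is unusable, otherwise None."""
    if not scan_dir:
        return 'Please specify a directory to scan'
    # os.path.isdir is False for missing paths as well as for plain files
    if not os.path.isdir(scan_dir):
        return f'The specified directory {scan_dir} does not exist'
    if not os.access(scan_dir, os.R_OK | os.X_OK):
        return f'No read permission for directory {scan_dir}'
    return None
```

A caller would print the message to stderr and exit with a non-zero status when the result is not `None`. Note that `os.access` is only an early hint; the directory's permissions can still change before the scan reads it.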
1629-1659: 🛠️ Refactor suggestion
Enhance error handling in folder_hash.
Improve error handling by adding more specific exception types and providing better progress feedback.
 def folder_hash(parser, args):
     """Run the "folder-hash" sub-command
     Args:
         parser (ArgumentParser): command line parser object
         args (Namespace): Parsed arguments
     """
     try:
         if not args.scan_dir:
             print_stderr('ERROR: Please specify a directory to scan')
             parser.parse_args([args.subparser, '-h'])
             sys.exit(1)
         if not os.path.exists(args.scan_dir) or not os.path.isdir(args.scan_dir):
             print_stderr(f'ERROR: The specified directory {args.scan_dir} does not exist')
             sys.exit(1)
+        if not os.access(args.scan_dir, os.R_OK):
+            print_stderr(f'ERROR: No read permission for directory {args.scan_dir}')
+            sys.exit(1)
+
         folder_hasher_config = create_folder_hasher_config_from_args(args)
         scanoss_settings = get_scanoss_settings_from_args(args)
+        if not args.quiet:
+            print_stderr(f'Hashing directory: {args.scan_dir}')
+
         folder_hasher = FolderHasher(
             scan_dir=args.scan_dir,
             config=folder_hasher_config,
             scanoss_settings=scanoss_settings,
         )
         folder_hasher.hash_directory(args.scan_dir)
         folder_hasher.present(output_file=args.output, output_format=args.format)
-    except Exception as e:
+    except (OSError, IOError) as e:
+        print_stderr(f'ERROR: File system error: {e}')
+        sys.exit(1)
+    except ValueError as e:
+        print_stderr(f'ERROR: Invalid input: {e}')
+        sys.exit(1)
+    except Exception as e:
         print_stderr(f'ERROR: {e}')
         sys.exit(1)
src/scanoss/scanners/folder_hasher.py (1)
16-18: ⚠️ Potential issue
Check naming vs. usage.
`MINIMUM_FILE_NAME_LENGTH = 32` suggests filenames shorter than 32 bytes might be invalid, yet line 143 filters files with `len(f.name.encode('utf-8')) < MINIMUM_FILE_NAME_LENGTH`. Verify whether you intended to exclude short filenames or exclude long ones. The logic is reversed from the variable naming.
- if f.is_file() and len(f.name.encode('utf-8')) < MINIMUM_FILE_NAME_LENGTH
+ if f.is_file() and len(f.name.encode('utf-8')) >= MINIMUM_FILE_NAME_LENGTH
🧹 Nitpick comments (22)
.github/workflows/lint.yml (2)
35-37: Remove trailing whitespace and verify filter comments.
There is a trailing space at the end of line 35, which triggers the YAMLlint error. Please remove it to comply with YAML formatting standards. The filtering logic and its accompanying comments look clear and appropriate as a temporary workaround.
- # Filter out files that match exclude patterns from pyproject.toml␣
+ # Filter out files that match exclude patterns from pyproject.toml
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 35-35: trailing spaces (trailing-spaces)
58-58: Remove extra blank line.
Static analysis reports an excess blank line on line 58. Please remove it to adhere to YAMLlint guidelines.
-
🧰 Tools
🪛 YAMLlint (1.35.1)
[warning] 58-58: too many blank lines (1 > 0) (empty-lines)
src/scanoss/api/provenance/v2/scanoss_provenance_pb2.py (1)
23-23: Consider using `not` instead of equality comparison with `False`
While this is auto-generated code that shouldn't be manually modified, for future reference:
-if _descriptor._USE_C_DESCRIPTORS == False:
+if not _descriptor._USE_C_DESCRIPTORS:
🧰 Tools
🪛 Ruff (0.8.2)
23-23: Avoid equality comparisons to `False`; use `if not _descriptor._USE_C_DESCRIPTORS:` for false checks. Replace with `not _descriptor._USE_C_DESCRIPTORS` (E712)
src/scanoss/api/common/v2/scanoss_common_pb2.py (1)
20-20: Consider using `not` instead of equality comparison with `False`
While this is auto-generated code that shouldn't be manually modified, for future reference:
-if _descriptor._USE_C_DESCRIPTORS == False:
+if not _descriptor._USE_C_DESCRIPTORS:
🧰 Tools
🪛 Ruff (0.8.2)
20-20: Avoid equality comparisons to `False`; use `if not _descriptor._USE_C_DESCRIPTORS:` for false checks. Replace with `not _descriptor._USE_C_DESCRIPTORS` (E712)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (1)
77-86: Consider using fewer parameters or a configuration object
These methods have 10 parameters, which exceeds the recommended limit (5). While this is auto-generated code that shouldn't be modified directly, this pattern might be worth discussing with the gRPC team if you're contributing to the generator.
Also applies to: 94-103
🧰 Tools
🪛 Ruff (0.8.2)
77-77: Too many arguments in function definition (10 > 5) (PLR0913)
src/protoc_gen_swagger/options/annotations_pb2.py (1)
22-22: Consider using `not` instead of equality comparison with `False`
While this is auto-generated code that shouldn't be manually modified, for future reference:
-if _descriptor._USE_C_DESCRIPTORS == False:
+if not _descriptor._USE_C_DESCRIPTORS:
🧰 Tools
🪛 Ruff (0.8.2)
22-22: Avoid equality comparisons to `False`; use `if not _descriptor._USE_C_DESCRIPTORS:` for false checks. Replace with `not _descriptor._USE_C_DESCRIPTORS` (E712)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (1)
23-24: Improve comparison with `False`
The comparison `_descriptor._USE_C_DESCRIPTORS == False` should be replaced with `not _descriptor._USE_C_DESCRIPTORS` for better Python style.
-if _descriptor._USE_C_DESCRIPTORS == False:
+if not _descriptor._USE_C_DESCRIPTORS:
🧰 Tools
🪛 Ruff (0.8.2)
23-23: Avoid equality comparisons to `False`; use `if not _descriptor._USE_C_DESCRIPTORS:` for false checks. Replace with `not _descriptor._USE_C_DESCRIPTORS` (E712)
src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (1)
155-155: Long lines in gRPC method declarations
Some lines exceed the recommended 120 character limit. Consider breaking these into multiple lines.
Also applies to: 172-172
🧰 Tools
🪛 Ruff (0.8.2)
155-155: Line too long (123 > 120) (E501)
src/scanoss/cli.py (1)
1285-1285: Fix strip call with duplicate characters.
The pipeline reports a warning about duplicate characters in the strip call.
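The lint rule fires because `str.strip` treats its argument as a set of characters to remove from both ends, not as a prefix. A minimal sketch of the difference follows; `strip_file_scheme` is a hypothetical helper name and the sample path is illustrative only.

```python
def strip_file_scheme(pac: str) -> str:
    """Remove one leading 'file://' and leave the rest of the string intact."""
    prefix = 'file://'
    return pac[len(prefix):] if pac.startswith(prefix) else pac


sample = 'file:///etc/proxy.pac'

# strip() removes any of the characters f, i, l, e, :, / from both ends,
# so it also eats the leading '/e' of the path.
mangled = sample.strip('file://')        # 'tc/proxy.pac'

# Prefix-aware removal keeps the path intact.
cleaned = strip_file_scheme(sample)      # '/etc/proxy.pac'
```

On Python 3.9 and later, `str.removeprefix('file://')` does the same thing as the helper. `str.replace('file://', '')` also works for this input, but it removes every occurrence in the string rather than only a leading one.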
- pac_local = pac.strip('file://')
+ pac_local = pac.replace('file://', '')
🧰 Tools
🪛 GitHub Actions: Lint
[error] 1285-1285: PLE1310 String `strip` call contains duplicate characters.
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (1)
6-6: Split the long import line
Static analysis flagged this line for exceeding 120 characters (E501). Consider splitting it for better readability and compliance with style guidelines.
-from scanoss.api.scanning.v2 import scanoss_scanning_pb2 as scanoss_dot_api_dot_scanning_dot_v2_dot_scanoss__scanning__pb2
+from scanoss.api.scanning.v2 import (
+    scanoss_scanning_pb2 as scanoss_dot_api_dot_scanning_dot_v2_dot_scanoss__scanning__pb2
+)
🧰 Tools
🪛 Ruff (0.8.2)
6-6: Line too long (122 > 120) (E501)
src/scanoss/utils/crc64.py (1)
56-65: Consider supporting str type in the function signature
The method signature says `data: bytes`, but there's a check converting `str` to bytes. For clarity, either remove the check or include `Union[str, bytes]` in the signature.
-def update(self, data: bytes) -> None:
+def update(self, data: Union[str, bytes]) -> None:
     if isinstance(data, str):
         data = data.encode('utf-8')
     ...
src/scanoss/scanners/scanner_hfh.py (1)
94-126: Improve error handling for `scan()`.
The `try/finally` block ensures the spinner always stops. Exceptions from the scanning process still propagate once `finally` completes, but nothing is logged first, so a critical error may surface without context. Consider logging and re-raising, or explicitly returning an error indicator, so that callers can handle failures.
 def scan(self) -> Optional[Dict]:
     ...
     try:
         response = self.client.folder_hash_scan(hfh_request)
         self.scan_results = response
+    except Exception as e:
+        self.base.print_stderr(f"Scan failure: {str(e)}")
+        raise
     finally:
         stop_spinner = True
         spinner_thread.join()
         spinner.finish()
     ...
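The cleanup-then-propagate behaviour discussed above can be shown in isolation. This is a hedged sketch rather than the `ScannerHFH` code: `run_with_spinner` and its parameters are hypothetical, and a `threading.Event` replaces the `stop_spinner` flag from the snippet.

```python
import threading


def run_with_spinner(task, on_tick=lambda: None, interval: float = 0.01):
    """Run task() while a background thread ticks; always stop the thread."""
    stop = threading.Event()

    def spin():
        while not stop.is_set():
            on_tick()
            stop.wait(interval)

    thread = threading.Thread(target=spin, daemon=True)
    thread.start()
    try:
        return task()
    finally:
        # Runs on success and on failure; any exception raised by task()
        # continues to propagate after this block completes.
        stop.set()
        thread.join()
```

An explicit `except` clause is therefore only needed when the failure should be logged or wrapped before it reaches the caller.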
23-23
: Consider updating Protobuf generator configurationWhile this is an auto-generated file, the equality comparison
_descriptor._USE_C_DESCRIPTORS == False
could be written asnot _descriptor._USE_C_DESCRIPTORS
according to Python style guidelines.If possible, consider updating the configuration of your Protobuf generator in a future update.
🧰 Tools
🪛 Ruff (0.8.2)
23-23: Avoid equality comparisons to
False
; useif not _descriptor._USE_C_DESCRIPTORS:
for false checksReplace with
not _descriptor._USE_C_DESCRIPTORS
(E712)
src/scanoss/file_filters.py (5)
49-59: Avoid code duplication.
`DEFAULT_SKIPPED_FILES_HFH` heavily overlaps with `DEFAULT_SKIPPED_FILES`. Consider unifying these sets under a single constant and applying conditional logic (e.g., by a flag) instead of introducing a near-duplicate.
82-101: Consider merging directory skip sets.
Both `DEFAULT_SKIPPED_DIRS_HFH` and `DEFAULT_SKIPPED_DIR_EXT_HFH` mimic existing sets. Unify them if possible (similar to how you might unify file skip sets) to maintain DRY principles and reduce future maintenance overhead.
276-430: Watch out for repetition in `DEFAULT_SKIPPED_EXT_HFH`.
These lines replicate much of `DEFAULT_SKIPPED_EXT`. If there's only a slight difference in behavior, consider merging them to avoid inconsistent updates between scanning logic and folder hashing logic in the future.
439-457: Constructor argument expansion.
Allowing `**kwargs` is flexible; however, it can obscure which parameters are expected. Consider maintaining a simple typed signature or adding type hints and docstrings for each supported kwarg (especially for `scanoss_settings` vs. others) to maintain clarity in the API.
706-726: Consolidate skip file checks.
The `_should_skip_file` method duplicates logic between scanning vs. folder hashing if `is_folder_hashing_scan` is set. Consider factoring out the overlap into a shared helper that merges or scopes these sets to avoid diverging patterns over time.
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (1)
57-98
: Unimplemented methods.All server methods raise
NotImplementedError
. If these methods aren't planned for immediate implementation, consider providing partial functionality or returning an explicit “not supported” response. This helps clarify the endpoint's behavior for users.🧰 Tools
🪛 Ruff (0.8.2)
79-79: Line too long (123 > 120)
(E501)
src/scanoss/utils/simhash.py (2)
100-109: Consider implementing a check for the vector length.
The fingerprint function assumes the input vector has 64 elements. Adding a validation check would make the function more robust.
 def fingerprint(v: list) -> int:
     """
     Given a 64-element vector, return a 64-bit fingerprint.
     For each bit i, if v[i] >= 0, set bit i to 1; otherwise leave it 0.
     """
+    if len(v) != 64:
+        raise ValueError('simhash.fingerprint(): input vector must have 64 elements')
     f = 0
     for i in range(64):
         if v[i] >= 0:
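The snippet above is cut off after the `if` test. A complete, self-contained sketch of the function as its docstring describes it might look like the following; the loop body (`f |= 1 << i`) is inferred from that docstring and is not copied from the project's `simhash.py`.

```python
from typing import List


def fingerprint(v: List[int]) -> int:
    """Collapse a 64-element weight vector into a 64-bit integer.

    Bit i of the result is set when v[i] >= 0 and left clear otherwise.
    """
    if len(v) != 64:
        raise ValueError('input vector must have 64 elements')
    f = 0
    for i in range(64):
        if v[i] >= 0:
            f |= 1 << i
    return f
```

With the length check in place, a short or long vector fails loudly instead of raising an `IndexError` or silently ignoring extra elements.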
56-63: Consider adding type hints for return values.
The return type hints for the feature creation functions are correct but could be made more specific by using the actual class name.
-def new_feature(f: bytes) -> SimhashFeature:
+def new_feature(f: bytes) -> 'SimhashFeature':
     """Return a new feature for the given byte slice with weight 1."""
     return SimhashFeature(fnv1_64(f), 1)
-def new_feature_with_weight(f: bytes, weight: int) -> SimhashFeature:
+def new_feature_with_weight(f: bytes, weight: int) -> 'SimhashFeature':
     """Return a new feature for the given byte slice with the given weight."""
     return SimhashFeature(fnv1_64(f), weight)
src/protoc_gen_swagger/options/openapiv2_pb2.py (1)
22-22
: Minor style issue in equality comparison.While functional, using
not _descriptor._USE_C_DESCRIPTORS
is the preferred Python style over_descriptor._USE_C_DESCRIPTORS == False
.-if _descriptor._USE_C_DESCRIPTORS == False: +if not _descriptor._USE_C_DESCRIPTORS:🧰 Tools
🪛 Ruff (0.8.2)
22-22: Avoid equality comparisons to
False
; useif not _descriptor._USE_C_DESCRIPTORS:
for false checksReplace with
not _descriptor._USE_C_DESCRIPTORS
(E712)
📜 Review details
📒 Files selected for processing (42)
.github/workflows/lint.yml (2 hunks)
CHANGELOG.md (2 hunks)
CLIENT_HELP.md (1 hunks)
docs/source/index.rst (1 hunks)
pyproject.toml (1 hunks)
requirements.txt (1 hunks)
setup.cfg (1 hunks)
src/protoc_gen_swagger/options/annotations_pb2.py (1 hunks)
src/protoc_gen_swagger/options/annotations_pb2_grpc.py (1 hunks)
src/protoc_gen_swagger/options/openapiv2_pb2.py (1 hunks)
src/protoc_gen_swagger/options/openapiv2_pb2_grpc.py (1 hunks)
src/scanoss/__init__.py (1 hunks)
src/scanoss/api/common/v2/scanoss_common_pb2.py (1 hunks)
src/scanoss/api/common/v2/scanoss_common_pb2_grpc.py (1 hunks)
src/scanoss/api/components/v2/scanoss_components_pb2.py (1 hunks)
src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (3 hunks)
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py (1 hunks)
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (3 hunks)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (1 hunks)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (3 hunks)
src/scanoss/api/provenance/v2/scanoss_provenance_pb2.py (2 hunks)
src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (1 hunks)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (1 hunks)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (2 hunks)
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py (1 hunks)
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (2 hunks)
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py (1 hunks)
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (3 hunks)
src/scanoss/cli.py (19 hunks)
src/scanoss/constants.py (1 hunks)
src/scanoss/file_filters.py (12 hunks)
src/scanoss/results.py (6 hunks)
src/scanoss/scanners/__init__.py (1 hunks)
src/scanoss/scanners/folder_hasher.py (1 hunks)
src/scanoss/scanners/scanner_config.py (1 hunks)
src/scanoss/scanners/scanner_hfh.py (1 hunks)
src/scanoss/scanossbase.py (1 hunks)
src/scanoss/scanossgrpc.py (8 hunks)
src/scanoss/utils/abstract_presenter.py (1 hunks)
src/scanoss/utils/crc64.py (1 hunks)
src/scanoss/utils/simhash.py (1 hunks)
version.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (14)
- setup.cfg
- requirements.txt
- src/protoc_gen_swagger/options/annotations_pb2_grpc.py
- src/scanoss/api/common/v2/scanoss_common_pb2_grpc.py
- src/scanoss/scanners/__init__.py
- src/protoc_gen_swagger/options/openapiv2_pb2_grpc.py
- src/scanoss/scanossbase.py
- src/scanoss/__init__.py
- src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py
- pyproject.toml
- CLIENT_HELP.md
- src/scanoss/scanners/scanner_config.py
- version.py
- src/scanoss/constants.py
🧰 Additional context used
🧬 Code Definitions (11)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (6)
- src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
- src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (2): Echo (47-52), Echo (111-125)
- src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (2): Echo (57-62), Echo (145-159)
- src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (2): Echo (42-47), Echo (94-108)
- src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
- src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
src/scanoss/scanners/scanner_hfh.py (4)
- src/scanoss/scanners/folder_hasher.py (5): FolderHasher (67-260), hash_directory (105-124), present (258-260), _format_json_output (273-280), _format_plain_output (282-290)
- src/scanoss/scanners/scanner_config.py (1): ScannerConfig (39-54)
- src/scanoss/scanossgrpc.py (2): ScanossGrpc (97-615), folder_hash_scan (468-483)
- src/scanoss/utils/abstract_presenter.py (4): AbstractPresenter (8-68), present (29-48), _format_json_output (57-61), _format_plain_output (64-68)
src/scanoss/utils/abstract_presenter.py (4)
- src/scanoss/scanossbase.py (2): ScanossBase (28-101), print_to_file_or_stdout (83-91)
- src/scanoss/results.py (3): present (261-263), _format_json_output (63-84), _format_plain_output (86-99)
- src/scanoss/scanners/folder_hasher.py (3): present (258-260), _format_json_output (273-280), _format_plain_output (282-290)
- src/scanoss/scanners/scanner_hfh.py (3): present (128-130), _format_json_output (143-150), _format_plain_output (152-160)
src/scanoss/file_filters.py (2)
- src/scanoss/scanossbase.py (3): ScanossBase (28-101), print_msg (51-56), print_debug (58-63)
- src/scanoss/spdxlite.py (1): print_debug (61-66)
src/scanoss/scanossgrpc.py (3)
- src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (1): ProvenanceStub (9-29)
- src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (3): ScanningStub (9-29), FolderHashScan (44-49), FolderHashScan (94-108)
- src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (1): DependenciesStub (9-29)
src/scanoss/scanners/folder_hasher.py (5)
- src/scanoss/file_filters.py (2): FileFilters (433-738), get_filtered_files_from_files (514-564)
- src/scanoss/utils/abstract_presenter.py (4): AbstractPresenter (8-68), present (29-48), _format_json_output (57-61), _format_plain_output (64-68)
- src/scanoss/utils/crc64.py (2): CRC64 (29-96), get_hash_buff (82-96)
- src/scanoss/utils/simhash.py (3): simhash (125-130), WordFeatureSet (163-169), fingerprint (100-109)
- src/scanoss/scanners/scanner_hfh.py (3): present (128-130), _format_json_output (143-150), _format_plain_output (152-160)
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (6)
- src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (2): Echo (47-52), Echo (111-125)
- src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
- src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
- src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (2): Echo (42-47), Echo (94-108)
- src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
- src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
src/scanoss/cli.py (4)
- src/scanoss/scanners/folder_hasher.py (2): FolderHasher (67-260), hash_directory (105-124)
- src/scanoss/scanossgrpc.py (1): ScanossGrpc (97-615)
- src/scanoss/scanners/scanner_config.py (1): create_scanner_config_from_args (57-73)
- src/scanoss/scanners/scanner_hfh.py (2): ScannerHFH (41-130), scan (94-126)
src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (7)
- src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (2): Echo (57-62), Echo (145-159)
- src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
- src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
- src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (2): Echo (42-47), Echo (94-108)
- src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
- src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
- src/scanoss/components.py (1): Components (37-357)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (6)
- src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
- src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (2): Echo (47-52), Echo (111-125)
- src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (2): Echo (57-62), Echo (145-159)
- src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (2): Echo (42-47), Echo (94-108)
- src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
- src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (6)
- src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (2): Echo (47-52), Echo (111-125)
- src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (2): Echo (57-62), Echo (145-159)
- src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
- src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
- src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (2): Echo (42-47), Echo (94-108)
- src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
🪛 Ruff (0.8.2)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py
6-6: Line too long (122 > 120)
(E501)
77-77: Too many arguments in function definition (10 > 5)
(PLR0913)
94-94: Too many arguments in function definition (10 > 5)
(PLR0913)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py
19-19: Line too long (1925 > 120)
(E501)
23-23: Avoid equality comparisons to `False`; use `if not _descriptor._USE_C_DESCRIPTORS:` for false checks. Replace with `not _descriptor._USE_C_DESCRIPTORS` (E712)
26-26: Line too long (379 > 120)
(E501)
27-27: Undefined name _SCANNING
(F821)
28-28: Undefined name _SCANNING
(F821)
29-29: Undefined name _SCANNING
(F821)
30-30: Undefined name _SCANNING
(F821)
30-30: Line too long (132 > 120)
(E501)
31-31: Undefined name _HFHREQUEST
(F821)
32-32: Undefined name _HFHREQUEST
(F821)
33-33: Undefined name _HFHREQUEST_CHILDREN
(F821)
34-34: Undefined name _HFHREQUEST_CHILDREN
(F821)
35-35: Undefined name _HFHRESPONSE
(F821)
36-36: Undefined name _HFHRESPONSE
(F821)
37-37: Undefined name _HFHRESPONSE_COMPONENT
(F821)
38-38: Undefined name _HFHRESPONSE_COMPONENT
(F821)
39-39: Undefined name _HFHRESPONSE_RESULT
(F821)
40-40: Undefined name _HFHRESPONSE_RESULT
(F821)
41-41: Undefined name _SCANNING
(F821)
42-42: Undefined name _SCANNING
(F821)
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py
19-19: Line too long (4142 > 120)
(E501)
23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks
Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (400 > 120)
(E501)
27-27: Undefined name _CRYPTOGRAPHY
(F821)
28-28: Undefined name _CRYPTOGRAPHY
(F821)
28-28: Line too long (126 > 120)
(E501)
29-29: Undefined name _CRYPTOGRAPHY
(F821)
30-30: Undefined name _CRYPTOGRAPHY
(F821)
30-30: Line too long (138 > 120)
(E501)
31-31: Undefined name _CRYPTOGRAPHY
(F821)
32-32: Undefined name _CRYPTOGRAPHY
(F821)
32-32: Line too long (149 > 120)
(E501)
33-33: Undefined name _CRYPTOGRAPHY
(F821)
34-34: Undefined name _CRYPTOGRAPHY
(F821)
34-34: Line too long (145 > 120)
(E501)
35-35: Undefined name _CRYPTOGRAPHY
(F821)
36-36: Undefined name _CRYPTOGRAPHY
(F821)
36-36: Line too long (139 > 120)
(E501)
37-37: Undefined name _CRYPTOGRAPHY
(F821)
38-38: Undefined name _CRYPTOGRAPHY
(F821)
38-38: Line too long (141 > 120)
(E501)
39-39: Undefined name _ALGORITHM
(F821)
40-40: Undefined name _ALGORITHM
(F821)
41-41: Undefined name _ALGORITHMRESPONSE
(F821)
42-42: Undefined name _ALGORITHMRESPONSE
(F821)
43-43: Undefined name _ALGORITHMRESPONSE_PURLS
(F821)
44-44: Undefined name _ALGORITHMRESPONSE_PURLS
(F821)
45-45: Undefined name _ALGORITHMSINRANGERESPONSE
(F821)
46-46: Undefined name _ALGORITHMSINRANGERESPONSE
(F821)
47-47: Undefined name _ALGORITHMSINRANGERESPONSE_PURL
(F821)
48-48: Undefined name _ALGORITHMSINRANGERESPONSE_PURL
(F821)
49-49: Undefined name _VERSIONSINRANGERESPONSE
(F821)
50-50: Undefined name _VERSIONSINRANGERESPONSE
(F821)
51-51: Undefined name _VERSIONSINRANGERESPONSE_PURL
(F821)
52-52: Undefined name _VERSIONSINRANGERESPONSE_PURL
(F821)
53-53: Undefined name _HINT
(F821)
54-54: Undefined name _HINT
(F821)
55-55: Undefined name _HINTSRESPONSE
(F821)
56-56: Undefined name _HINTSRESPONSE
(F821)
57-57: Undefined name _HINTSRESPONSE_PURLS
(F821)
58-58: Undefined name _HINTSRESPONSE_PURLS
(F821)
59-59: Undefined name _HINTSINRANGERESPONSE
(F821)
60-60: Undefined name _HINTSINRANGERESPONSE
(F821)
61-61: Undefined name _HINTSINRANGERESPONSE_PURL
(F821)
62-62: Undefined name _HINTSINRANGERESPONSE_PURL
(F821)
63-63: Undefined name _CRYPTOGRAPHY
(F821)
64-64: Undefined name _CRYPTOGRAPHY
(F821)
src/scanoss/scanossgrpc.py
62-62: .api.dependencies.v2.scanoss_dependencies_pb2.DependencyResponse imported but unused
Remove unused import: .api.dependencies.v2.scanoss_dependencies_pb2.DependencyResponse
(F401)
src/scanoss/api/components/v2/scanoss_components_pb2.py
19-19: Line too long (3679 > 120)
(E501)
23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks
Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (389 > 120)
(E501)
27-27: Undefined name _COMPONENTS
(F821)
28-28: Undefined name _COMPONENTS
(F821)
28-28: Line too long (122 > 120)
(E501)
29-29: Undefined name _COMPONENTS
(F821)
30-30: Undefined name _COMPONENTS
(F821)
30-30: Line too long (136 > 120)
(E501)
31-31: Undefined name _COMPONENTS
(F821)
32-32: Undefined name _COMPONENTS
(F821)
32-32: Line too long (139 > 120)
(E501)
33-33: Undefined name _COMPONENTS
(F821)
34-34: Undefined name _COMPONENTS
(F821)
34-34: Line too long (144 > 120)
(E501)
35-35: Undefined name _COMPSEARCHREQUEST
(F821)
36-36: Undefined name _COMPSEARCHREQUEST
(F821)
37-37: Undefined name _COMPSTATISTIC
(F821)
38-38: Undefined name _COMPSTATISTIC
(F821)
39-39: Undefined name _COMPSTATISTIC_LANGUAGE
(F821)
40-40: Undefined name _COMPSTATISTIC_LANGUAGE
(F821)
41-41: Undefined name _COMPSTATISTICRESPONSE
(F821)
42-42: Undefined name _COMPSTATISTICRESPONSE
(F821)
43-43: Undefined name _COMPSTATISTICRESPONSE_PURLS
(F821)
44-44: Undefined name _COMPSTATISTICRESPONSE_PURLS
(F821)
45-45: Undefined name _COMPSEARCHRESPONSE
(F821)
46-46: Undefined name _COMPSEARCHRESPONSE
(F821)
47-47: Undefined name _COMPSEARCHRESPONSE_COMPONENT
(F821)
48-48: Undefined name _COMPSEARCHRESPONSE_COMPONENT
(F821)
49-49: Undefined name _COMPVERSIONREQUEST
(F821)
50-50: Undefined name _COMPVERSIONREQUEST
(F821)
51-51: Undefined name _COMPVERSIONRESPONSE
(F821)
52-52: Undefined name _COMPVERSIONRESPONSE
(F821)
53-53: Undefined name _COMPVERSIONRESPONSE_LICENSE
(F821)
54-54: Undefined name _COMPVERSIONRESPONSE_LICENSE
(F821)
55-55: Undefined name _COMPVERSIONRESPONSE_VERSION
(F821)
56-56: Undefined name _COMPVERSIONRESPONSE_VERSION
(F821)
57-57: Undefined name _COMPVERSIONRESPONSE_COMPONENT
(F821)
58-58: Undefined name _COMPVERSIONRESPONSE_COMPONENT
(F821)
59-59: Undefined name _COMPONENTS
(F821)
60-60: Undefined name _COMPONENTS
(F821)
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py
6-6: Line too long (138 > 120)
(E501)
79-79: Line too long (123 > 120)
(E501)
145-145: Too many arguments in function definition (10 > 5)
(PLR0913)
162-162: Too many arguments in function definition (10 > 5)
(PLR0913)
179-179: Too many arguments in function definition (10 > 5)
(PLR0913)
189-189: Line too long (127 > 120)
(E501)
196-196: Too many arguments in function definition (10 > 5)
(PLR0913)
206-206: Line too long (125 > 120)
(E501)
213-213: Too many arguments in function definition (10 > 5)
(PLR0913)
223-223: Line too long (122 > 120)
(E501)
230-230: Too many arguments in function definition (10 > 5)
(PLR0913)
240-240: Line too long (125 > 120)
(E501)
src/protoc_gen_swagger/options/openapiv2_pb2.py
18-18: Line too long (9607 > 120)
(E501)
22-22: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks
Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Undefined name _SWAGGER_RESPONSESENTRY
(F821)
27-27: Undefined name _SWAGGER_RESPONSESENTRY
(F821)
28-28: Undefined name _SWAGGER_EXTENSIONSENTRY
(F821)
29-29: Undefined name _SWAGGER_EXTENSIONSENTRY
(F821)
30-30: Undefined name _OPERATION_RESPONSESENTRY
(F821)
31-31: Undefined name _OPERATION_RESPONSESENTRY
(F821)
32-32: Undefined name _OPERATION_EXTENSIONSENTRY
(F821)
33-33: Undefined name _OPERATION_EXTENSIONSENTRY
(F821)
34-34: Undefined name _RESPONSE_HEADERSENTRY
(F821)
35-35: Undefined name _RESPONSE_HEADERSENTRY
(F821)
36-36: Undefined name _RESPONSE_EXAMPLESENTRY
(F821)
37-37: Undefined name _RESPONSE_EXAMPLESENTRY
(F821)
38-38: Undefined name _RESPONSE_EXTENSIONSENTRY
(F821)
39-39: Undefined name _RESPONSE_EXTENSIONSENTRY
(F821)
40-40: Undefined name _INFO_EXTENSIONSENTRY
(F821)
41-41: Undefined name _INFO_EXTENSIONSENTRY
(F821)
42-42: Undefined name _SCHEMA
(F821)
43-43: Undefined name _SCHEMA
(F821)
44-44: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY
(F821)
45-45: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY
(F821)
46-46: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY
(F821)
47-47: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY
(F821)
48-48: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY
(F821)
49-49: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY
(F821)
50-50: Undefined name _SCOPES_SCOPEENTRY
(F821)
51-51: Undefined name _SCOPES_SCOPEENTRY
(F821)
52-52: Undefined name _SWAGGER
(F821)
53-53: Undefined name _SWAGGER
(F821)
54-54: Undefined name _SWAGGER_RESPONSESENTRY
(F821)
55-55: Undefined name _SWAGGER_RESPONSESENTRY
(F821)
56-56: Undefined name _SWAGGER_EXTENSIONSENTRY
(F821)
57-57: Undefined name _SWAGGER_EXTENSIONSENTRY
(F821)
58-58: Undefined name _SWAGGER_SWAGGERSCHEME
(F821)
59-59: Undefined name _SWAGGER_SWAGGERSCHEME
(F821)
60-60: Undefined name _OPERATION
(F821)
61-61: Undefined name _OPERATION
(F821)
62-62: Undefined name _OPERATION_RESPONSESENTRY
(F821)
63-63: Undefined name _OPERATION_RESPONSESENTRY
(F821)
64-64: Undefined name _OPERATION_EXTENSIONSENTRY
(F821)
65-65: Undefined name _OPERATION_EXTENSIONSENTRY
(F821)
66-66: Undefined name _HEADER
(F821)
67-67: Undefined name _HEADER
(F821)
68-68: Undefined name _RESPONSE
(F821)
69-69: Undefined name _RESPONSE
(F821)
70-70: Undefined name _RESPONSE_HEADERSENTRY
(F821)
71-71: Undefined name _RESPONSE_HEADERSENTRY
(F821)
72-72: Undefined name _RESPONSE_EXAMPLESENTRY
(F821)
73-73: Undefined name _RESPONSE_EXAMPLESENTRY
(F821)
74-74: Undefined name _RESPONSE_EXTENSIONSENTRY
(F821)
75-75: Undefined name _RESPONSE_EXTENSIONSENTRY
(F821)
76-76: Undefined name _INFO
(F821)
77-77: Undefined name _INFO
(F821)
78-78: Undefined name _INFO_EXTENSIONSENTRY
(F821)
79-79: Undefined name _INFO_EXTENSIONSENTRY
(F821)
80-80: Undefined name _CONTACT
(F821)
81-81: Undefined name _CONTACT
(F821)
82-82: Undefined name _LICENSE
(F821)
83-83: Undefined name _LICENSE
(F821)
84-84: Undefined name _EXTERNALDOCUMENTATION
(F821)
85-85: Undefined name _EXTERNALDOCUMENTATION
(F821)
86-86: Undefined name _SCHEMA
(F821)
87-87: Undefined name _SCHEMA
(F821)
88-88: Undefined name _JSONSCHEMA
(F821)
89-89: Undefined name _JSONSCHEMA
(F821)
90-90: Undefined name _JSONSCHEMA_JSONSCHEMASIMPLETYPES
(F821)
91-91: Undefined name _JSONSCHEMA_JSONSCHEMASIMPLETYPES
(F821)
92-92: Undefined name _TAG
(F821)
93-93: Undefined name _TAG
(F821)
94-94: Undefined name _SECURITYDEFINITIONS
(F821)
95-95: Undefined name _SECURITYDEFINITIONS
(F821)
96-96: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY
(F821)
97-97: Undefined name _SECURITYDEFINITIONS_SECURITYENTRY
(F821)
98-98: Undefined name _SECURITYSCHEME
(F821)
99-99: Undefined name _SECURITYSCHEME
(F821)
100-100: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY
(F821)
101-101: Undefined name _SECURITYSCHEME_EXTENSIONSENTRY
(F821)
102-102: Undefined name _SECURITYSCHEME_TYPE
(F821)
103-103: Undefined name _SECURITYSCHEME_TYPE
(F821)
104-104: Undefined name _SECURITYSCHEME_IN
(F821)
105-105: Undefined name _SECURITYSCHEME_IN
(F821)
106-106: Undefined name _SECURITYSCHEME_FLOW
(F821)
107-107: Undefined name _SECURITYSCHEME_FLOW
(F821)
108-108: Undefined name _SECURITYREQUIREMENT
(F821)
109-109: Undefined name _SECURITYREQUIREMENT
(F821)
110-110: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTVALUE
(F821)
111-111: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTVALUE
(F821)
112-112: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY
(F821)
113-113: Undefined name _SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY
(F821)
114-114: Undefined name _SCOPES
(F821)
115-115: Undefined name _SCOPES
(F821)
116-116: Undefined name _SCOPES_SCOPEENTRY
(F821)
117-117: Undefined name _SCOPES_SCOPEENTRY
(F821)
src/protoc_gen_swagger/options/annotations_pb2.py
18-18: Line too long (1009 > 120)
(E501)
22-22: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks
Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
23-23: Undefined name openapiv2_swagger
(F821)
24-24: Undefined name openapiv2_operation
(F821)
25-25: Undefined name openapiv2_schema
(F821)
26-26: Undefined name openapiv2_tag
(F821)
27-27: Undefined name openapiv2_field
(F821)
src/scanoss/api/common/v2/scanoss_common_pb2.py
16-16: Line too long (845 > 120)
(E501)
20-20: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks
Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
24-24: Undefined name _STATUSCODE
(F821)
25-25: Undefined name _STATUSCODE
(F821)
26-26: Undefined name _STATUSRESPONSE
(F821)
27-27: Undefined name _STATUSRESPONSE
(F821)
28-28: Undefined name _ECHOREQUEST
(F821)
29-29: Undefined name _ECHOREQUEST
(F821)
30-30: Undefined name _ECHORESPONSE
(F821)
31-31: Undefined name _ECHORESPONSE
(F821)
32-32: Undefined name _PURLREQUEST
(F821)
33-33: Undefined name _PURLREQUEST
(F821)
34-34: Undefined name _PURLREQUEST_PURLS
(F821)
35-35: Undefined name _PURLREQUEST_PURLS
(F821)
src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py
6-6: Line too long (130 > 120)
(E501)
111-111: Too many arguments in function definition (10 > 5)
(PLR0913)
128-128: Too many arguments in function definition (10 > 5)
(PLR0913)
145-145: Too many arguments in function definition (10 > 5)
(PLR0913)
155-155: Line too long (123 > 120)
(E501)
162-162: Too many arguments in function definition (10 > 5)
(PLR0913)
172-172: Line too long (125 > 120)
(E501)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py
19-19: Line too long (2471 > 120)
(E501)
23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks
Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (398 > 120)
(E501)
27-27: Undefined name _DEPENDENCIES
(F821)
28-28: Undefined name _DEPENDENCIES
(F821)
28-28: Line too long (126 > 120)
(E501)
29-29: Undefined name _DEPENDENCIES
(F821)
30-30: Undefined name _DEPENDENCIES
(F821)
30-30: Line too long (139 > 120)
(E501)
31-31: Undefined name _DEPENDENCYREQUEST
(F821)
32-32: Undefined name _DEPENDENCYREQUEST
(F821)
33-33: Undefined name _DEPENDENCYREQUEST_PURLS
(F821)
34-34: Undefined name _DEPENDENCYREQUEST_PURLS
(F821)
35-35: Undefined name _DEPENDENCYREQUEST_FILES
(F821)
36-36: Undefined name _DEPENDENCYREQUEST_FILES
(F821)
37-37: Undefined name _DEPENDENCYRESPONSE
(F821)
38-38: Undefined name _DEPENDENCYRESPONSE
(F821)
39-39: Undefined name _DEPENDENCYRESPONSE_LICENSES
(F821)
40-40: Undefined name _DEPENDENCYRESPONSE_LICENSES
(F821)
41-41: Undefined name _DEPENDENCYRESPONSE_DEPENDENCIES
(F821)
42-42: Undefined name _DEPENDENCYRESPONSE_DEPENDENCIES
(F821)
43-43: Undefined name _DEPENDENCYRESPONSE_FILES
(F821)
44-44: Undefined name _DEPENDENCYRESPONSE_FILES
(F821)
45-45: Undefined name _DEPENDENCIES
(F821)
46-46: Undefined name _DEPENDENCIES
(F821)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py
6-6: Line too long (138 > 120)
(E501)
77-77: Too many arguments in function definition (10 > 5)
(PLR0913)
94-94: Too many arguments in function definition (10 > 5)
(PLR0913)
104-104: Line too long (122 > 120)
(E501)
src/scanoss/api/provenance/v2/scanoss_provenance_pb2.py
23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks
Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (389 > 120)
(E501)
27-27: Undefined name _PROVENANCE
(F821)
28-28: Undefined name _PROVENANCE
(F821)
28-28: Line too long (122 > 120)
(E501)
29-29: Undefined name _PROVENANCE
(F821)
30-30: Undefined name _PROVENANCE
(F821)
30-30: Line too long (142 > 120)
(E501)
31-31: Undefined name _PROVENANCERESPONSE
(F821)
32-32: Undefined name _PROVENANCERESPONSE
(F821)
33-33: Undefined name _PROVENANCERESPONSE_DECLAREDLOCATION
(F821)
34-34: Undefined name _PROVENANCERESPONSE_DECLAREDLOCATION
(F821)
35-35: Undefined name _PROVENANCERESPONSE_CURATEDLOCATION
(F821)
36-36: Undefined name _PROVENANCERESPONSE_CURATEDLOCATION
(F821)
37-37: Undefined name _PROVENANCERESPONSE_PURLS
(F821)
38-38: Undefined name _PROVENANCERESPONSE_PURLS
(F821)
39-39: Undefined name _PROVENANCE
(F821)
40-40: Undefined name _PROVENANCE
(F821)
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py
19-19: Line too long (1718 > 120)
(E501)
23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks
Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (375 > 120)
(E501)
27-27: Undefined name _SEMGREP
(F821)
28-28: Undefined name _SEMGREP
(F821)
29-29: Undefined name _SEMGREP
(F821)
30-30: Undefined name _SEMGREP
(F821)
30-30: Line too long (123 > 120)
(E501)
31-31: Undefined name _SEMGREPRESPONSE
(F821)
32-32: Undefined name _SEMGREPRESPONSE
(F821)
33-33: Undefined name _SEMGREPRESPONSE_ISSUE
(F821)
34-34: Undefined name _SEMGREPRESPONSE_ISSUE
(F821)
35-35: Undefined name _SEMGREPRESPONSE_FILE
(F821)
36-36: Undefined name _SEMGREPRESPONSE_FILE
(F821)
37-37: Undefined name _SEMGREPRESPONSE_PURLS
(F821)
38-38: Undefined name _SEMGREPRESPONSE_PURLS
(F821)
39-39: Undefined name _SEMGREP
(F821)
40-40: Undefined name _SEMGREP
(F821)
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py
77-77: Too many arguments in function definition (10 > 5)
(PLR0913)
94-94: Too many arguments in function definition (10 > 5)
(PLR0913)
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py
19-19: Line too long (2566 > 120)
(E501)
22-22: Line too long (124 > 120)
(E501)
23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks
Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (419 > 120)
(E501)
27-27: Undefined name _VULNERABILITIES
(F821)
28-28: Undefined name _VULNERABILITIES
(F821)
28-28: Line too long (129 > 120)
(E501)
29-29: Undefined name _VULNERABILITIES
(F821)
30-30: Undefined name _VULNERABILITIES
(F821)
30-30: Line too long (132 > 120)
(E501)
31-31: Undefined name _VULNERABILITIES
(F821)
32-32: Undefined name _VULNERABILITIES
(F821)
32-32: Line too long (152 > 120)
(E501)
33-33: Undefined name _VULNERABILITYREQUEST
(F821)
34-34: Undefined name _VULNERABILITYREQUEST
(F821)
35-35: Undefined name _VULNERABILITYREQUEST_PURLS
(F821)
36-36: Undefined name _VULNERABILITYREQUEST_PURLS
(F821)
37-37: Undefined name _CPERESPONSE
(F821)
38-38: Undefined name _CPERESPONSE
(F821)
39-39: Undefined name _CPERESPONSE_PURLS
(F821)
40-40: Undefined name _CPERESPONSE_PURLS
(F821)
41-41: Undefined name _VULNERABILITYRESPONSE
(F821)
42-42: Undefined name _VULNERABILITYRESPONSE
(F821)
43-43: Undefined name _VULNERABILITYRESPONSE_VULNERABILITIES
(F821)
44-44: Undefined name _VULNERABILITYRESPONSE_VULNERABILITIES
(F821)
45-45: Undefined name _VULNERABILITYRESPONSE_PURLS
(F821)
46-46: Undefined name _VULNERABILITYRESPONSE_PURLS
(F821)
47-47: Undefined name _VULNERABILITIES
(F821)
48-48: Undefined name _VULNERABILITIES
(F821)
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py
6-6: Line too long (150 > 120)
(E501)
94-94: Too many arguments in function definition (10 > 5)
(PLR0913)
111-111: Too many arguments in function definition (10 > 5)
(PLR0913)
128-128: Too many arguments in function definition (10 > 5)
(PLR0913)
138-138: Line too long (131 > 120)
(E501)
🪛 LanguageTool
CHANGELOG.md
[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-03-27 ### Added - Add folder-scan subcommand - Add folder-has...
(REPEATED_VERBS)
🪛 GitHub Actions: Lint
src/scanoss/scanossgrpc.py
[error] 62-62: F401 [*] .api.dependencies.v2.scanoss_dependencies_pb2.DependencyResponse imported but unused. Remove unused import: .api.dependencies.v2.scanoss_dependencies_pb2.DependencyResponse.
[error] 531-531: F841 Local variable SUCCEDED_WITH_WARNINGS_STATUS_CODE is assigned to but never used. Remove assignment to unused variable SUCCEDED_WITH_WARNINGS_STATUS_CODE.
[error] 532-532: F841 Local variable FAILED_STATUS_CODE is assigned to but never used. Remove assignment to unused variable FAILED_STATUS_CODE.
src/scanoss/cli.py
[error] 69-69: F811 Redefinition of unused DEFAULT_POST_SIZE from line 47. Remove definition: DEFAULT_POST_SIZE.
[error] 70-70: F811 Redefinition of unused DEFAULT_TIMEOUT from line 49. Remove definition: DEFAULT_TIMEOUT.
[error] 72-72: F811 Redefinition of unused DEFAULT_RETRY from line 48. Remove definition: DEFAULT_RETRY.
[error] 1285-1285: PLE1310 String strip call contains duplicate characters.
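The PLE1310 finding is easy to misread, because str.strip treats its argument as a set of characters rather than a literal prefix or suffix. A minimal sketch of what the rule flags, using made-up strings rather than the actual expression at line 1285 of cli.py (which is not shown in this review):

```python
# str.strip(chars) removes any leading/trailing characters that appear in `chars`.
# Listing a character twice in that set changes nothing, which is what PLE1310 flags.
quoted = '"\'value\'"'  # hypothetical input, i.e. the text "'value'"

with_duplicates = quoted.strip('"\'"')    # the double quote is listed twice
without_duplicates = quoted.strip('"\'')  # same character set, no repetition

# Both calls strip the same set of characters, so the results are identical.
assert with_duplicates == without_duplicates

# If the intent was to drop a literal suffix, Python 3.9+ has an explicit helper.
path = "src/scanoss/cli.py"
stem = path.removesuffix(".py")
```

If the duplicate came from trying to strip a multi-character token, removeprefix or removesuffix is probably the intended call; otherwise deleting the repeated character should be enough to clear the lint error.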
🪛 YAMLlint (1.35.1)
.github/workflows/lint.yml
[error] 35-35: trailing spaces
(trailing-spaces)
[warning] 58-58: too many blank lines
(1 > 0) (empty-lines)
🔇 Additional comments (75)
.github/workflows/lint.yml (2)
41-41: Confirm multi-line output formatting.
The multi-line syntax using the GITHUB_OUTPUT variable is correctly applied in this section. Just ensure that this format remains consistent with other parts of the workflow if similar patterns are used.
44-45: Logging clarity for filtered files.
The added log messages for before and after file filtering provide useful feedback during the workflow execution. This transparency will help diagnose issues if the filtering does not behave as expected.
src/scanoss/api/provenance/v2/scanoss_provenance_pb2.py (1)
5-5: Auto-generated code changes look good!
These changes are part of auto-generated Protocol Buffer code, replacing the use of an intermediate _globals dictionary with direct calls to globals(). This is a standard pattern in newer versions of the protobuf compiler and doesn't affect functionality.
Also applies to: 21-40
src/scanoss/api/common/v2/scanoss_common_pb2.py (1)
14-16: Auto-generated code changes look good!
These changes in the Protocol Buffer generated code match the pattern seen in other files, using direct calls to globals() and adjusting descriptor assignments. This is expected behavior from the protobuf compiler and doesn't affect functionality.
Also applies to: 21-35
🧰 Tools
🪛 Ruff (0.8.2)
16-16: Line too long (845 > 120)
(E501)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (1)
6-6: Reformatting changes look good!
These changes improve code indentation and formatting consistency across the gRPC service definitions. The consolidation of import statements and consistent indentation of method parameters enhance readability without changing functionality.
Also applies to: 21-29, 54-63, 70-108
🧰 Tools
🪛 Ruff (0.8.2)
6-6: Line too long (138 > 120)
(E501)
src/protoc_gen_swagger/options/annotations_pb2.py (1)
18-18: Auto-generated code changes look good!
These changes follow the same pattern as other Protocol Buffer files in this PR, with consistent indentation of extension registrations and options assignments. This ensures uniform code style across the auto-generated files.
Also applies to: 23-30
🧰 Tools
🪛 Ruff (0.8.2)
18-18: Line too long (1009 > 120)
(E501)
docs/source/index.rst (1)
231-302: Well-structured documentation for new commands
The documentation for the new folder-scan and folder-hash commands follows the established pattern of the rest of the document. The descriptions and argument lists are clear and comprehensive.
src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (7)
6-6: Simplified import statement
Good cleanup of the import statement by removing unnecessary parentheses.
🧰 Tools
🪛 Ruff (0.8.2)
6-6: Line too long (130 > 120)
(E501)
20-39: Improved method indentation and formatting
The changes improve the readability of the stub methods by using consistent indentation.
47-52: Improved comment formatting
The Echo method documentation now uses a cleaner multi-line comment format.
54-73: Consistent comment formatting across methods
All method comments now follow the same multi-line format, improving consistency.
78-97: Improved RPC handler indentation
The indentation of the RPC method handlers is now more consistent, making the code more readable.
99-100: Consistent indentation for generic handler
The improved indentation for the generic handler aligns with the rest of the code.
111-176: Consistent parameter indentation in static methods
All static methods now have consistent indentation for parameters and return statements, improving code readability.
🧰 Tools
🪛 Ruff (0.8.2)
111-111: Too many arguments in function definition (10 > 5)
(PLR0913)
128-128: Too many arguments in function definition (10 > 5)
(PLR0913)
145-145: Too many arguments in function definition (10 > 5)
(PLR0913)
155-155: Line too long (123 > 120)
(E501)
162-162: Too many arguments in function definition (10 > 5)
(PLR0913)
172-172: Line too long (125 > 120)
(E501)
CHANGELOG.md (2)
12-17: Clear version documentation in changelog
The new version entry follows the established pattern and clearly lists the new features.
Just a minor grammatical note: The bullet points have redundant "Add" words - the section header already says "Added".
🧰 Tools
🪛 LanguageTool
[grammar] ~13-~13: You’ve repeated a verb. Did you mean to only write one of them?
Context: ...hanges... ## [1.21.0] - 2025-03-27 ### Added - Add folder-scan subcommand - Add folder-has...
(REPEATED_VERBS)
500-501: Updated version comparison links
Version comparison links are properly updated to include the new version.
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2.py (1)
19-48: This generated code looks good.
The restructuring of how descriptor options and serialized values are assigned maintains the same functionality while making the assignments more straightforward.
🧰 Tools
🪛 Ruff (0.8.2)
19-19: Line too long (2566 > 120)
(E501)
22-22: Line too long (124 > 120)
(E501)
23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks
Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (419 > 120)
(E501)
27-27: Undefined name _VULNERABILITIES
(F821)
28-28: Undefined name _VULNERABILITIES
(F821)
28-28: Line too long (129 > 120)
(E501)
29-29: Undefined name _VULNERABILITIES
(F821)
30-30: Undefined name _VULNERABILITIES
(F821)
30-30: Line too long (132 > 120)
(E501)
31-31: Undefined name _VULNERABILITIES
(F821)
32-32: Undefined name _VULNERABILITIES
(F821)
32-32: Line too long (152 > 120)
(E501)
33-33: Undefined name _VULNERABILITYREQUEST
(F821)
34-34: Undefined name _VULNERABILITYREQUEST
(F821)
35-35: Undefined name _VULNERABILITYREQUEST_PURLS
(F821)
36-36: Undefined name _VULNERABILITYREQUEST_PURLS
(F821)
37-37: Undefined name _CPERESPONSE
(F821)
38-38: Undefined name _CPERESPONSE
(F821)
39-39: Undefined name _CPERESPONSE_PURLS
(F821)
40-40: Undefined name _CPERESPONSE_PURLS
(F821)
41-41: Undefined name _VULNERABILITYRESPONSE
(F821)
42-42: Undefined name _VULNERABILITYRESPONSE
(F821)
43-43: Undefined name _VULNERABILITYRESPONSE_VULNERABILITIES
(F821)
44-44: Undefined name _VULNERABILITYRESPONSE_VULNERABILITIES
(F821)
45-45: Undefined name _VULNERABILITYRESPONSE_PURLS
(F821)
46-46: Undefined name _VULNERABILITYRESPONSE_PURLS
(F821)
47-47: Undefined name _VULNERABILITIES
(F821)
48-48: Undefined name _VULNERABILITIES
(F821)
src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (1)
5-142: Formatting improvements look good.
The changes improve code readability through consistent indentation and spacing without altering functionality.
🧰 Tools
🪛 Ruff (0.8.2)
6-6: Line too long (150 > 120)
(E501)
94-94: Too many arguments in function definition (10 > 5)
(PLR0913)
111-111: Too many arguments in function definition (10 > 5)
(PLR0913)
128-128: Too many arguments in function definition (10 > 5)
(PLR0913)
138-138: Line too long (131 > 120)
(E501)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (1)
19-42: The new message types and method for folder hashing look good.
The addition of HFHRequest and HFHResponse message types along with the FolderHashScan method properly supports the folder hashing functionality being implemented.
🧰 Tools
🪛 Ruff (0.8.2)
19-19: Line too long (1925 > 120)
(E501)
23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks
Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (379 > 120)
(E501)
27-27: Undefined name _SCANNING
(F821)
28-28: Undefined name _SCANNING
(F821)
29-29: Undefined name _SCANNING
(F821)
30-30: Undefined name _SCANNING
(F821)
30-30: Line too long (132 > 120)
(E501)
31-31: Undefined name _HFHREQUEST
(F821)
32-32: Undefined name _HFHREQUEST
(F821)
33-33: Undefined name _HFHREQUEST_CHILDREN
(F821)
34-34: Undefined name _HFHREQUEST_CHILDREN
(F821)
35-35: Undefined name _HFHRESPONSE
(F821)
36-36: Undefined name _HFHRESPONSE
(F821)
37-37: Undefined name _HFHRESPONSE_COMPONENT
(F821)
38-38: Undefined name _HFHRESPONSE_COMPONENT
(F821)
39-39: Undefined name _HFHRESPONSE_RESULT
(F821)
40-40: Undefined name _HFHRESPONSE_RESULT
(F821)
41-41: Undefined name _SCANNING
(F821)
42-42: Undefined name _SCANNING
(F821)
src/scanoss/cli.py (5)
34-42: Import organization looks good.
The necessary imports for folder hashing functionality are properly added.
494-533: New folder-scan command implementation looks good.
The command structure and argument definitions are well-organized and include helpful descriptions. The threshold validation with choices=range(1, 101) ensures valid input.
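To illustrate the threshold validation mentioned above, here is a minimal argparse sketch. Only choices=range(1, 101) comes from the review; the parser name, the --threshold flag spelling, the metavar, and the help text are assumptions made for the example and may not match cli.py.

```python
import argparse

# Hypothetical parser; the flag name and help text are illustrative, not taken from cli.py.
parser = argparse.ArgumentParser(prog="folder-scan-sketch")
parser.add_argument(
    "--threshold",
    type=int,               # convert first, then argparse checks membership in `choices`
    choices=range(1, 101),  # accepts 1..100 inclusive
    metavar="1-100",        # keeps --help from listing all 100 values
    help="Minimum match score to report",
)

args = parser.parse_args(["--threshold", "80"])
print(args.threshold)


def is_accepted(value: str) -> bool:
    """Return True if argparse accepts `value` for --threshold."""
    try:
        parser.parse_args(["--threshold", value])
        return True
    except SystemExit:  # argparse reports an invalid choice via parser.error(), which exits
        return False
```

One detail worth noting: without type=int the raw string would be compared against the integers in the range and every value would be rejected, so the type conversion and the choices check need to go together.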
535-552: New folder-hash command implementation looks good.
The command is properly structured with appropriate arguments and descriptions.
554-568: Good implementation of settings options for both new commands.
Adding settings support for the new commands maintains consistency with existing commands.
1662-1672: Good implementation of settings helper function.
The get_scanoss_settings_from_args function effectively extracts and handles the settings loading logic for reuse.
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (18)
10-12: Docstring looks good
Clear documentation, no issues.
21-24: No concerns
The unary_unary channel method for the Echo RPC is properly set up.
25-29: No concerns
The unary_unary channel method for the FolderHashScan RPC is properly set up.
33-35: Docstring looks good
Readable docstring with clear comments on the servicer methods.
38-42: Stub method is fine
Raising NotImplementedError for the Echo method is standard for a stub definition.
44-49: Stub method is fine
Raising NotImplementedError for the FolderHashScan method is also standard for a stub definition.
54-58: No concerns
The 'Echo' method handler looks correct and consistent.
59-63: No concerns
The 'FolderHashScan' method handler is also defined correctly.
65-66: No concerns
The generic handler registration is properly structured.
70-70: No concerns
The comment noting the experimental status is self-explanatory.
72-74: Docstring looks good
Clear class-level docstring for the Scanning class.
77-77: Ignore 'too many arguments'
These arguments match the auto-generated gRPC stubs. Adjusting them is generally not recommended.
🧰 Tools
🪛 Ruff (0.8.2)
77-77: Too many arguments in function definition (10 > 5)
(PLR0913)
79-86: Method signature appears correct
All parameters align with gRPC usage; no issues noted.
87-87: No concerns
Return statement for the gRPC call is standard.
90-91: No concerns
Standard usage of gRPC call options.
93-94: Ignore 'too many arguments'
Same reasoning as the Echo method: auto-generated stub signature.
🧰 Tools
🪛 Ruff (0.8.2)
94-94: Too many arguments in function definition (10 > 5)
(PLR0913)
95-103: No concerns
The FolderHashScan method signature aligns with gRPC patterns.
104-108: No concerns
Return statement is consistent with unary_unary usage.
src/scanoss/utils/crc64.py (1)
1-54: Logic and structure look solid
The CRC64 implementation (table generation, initialization, and usage) is correct and follows the ECMA polynomial.
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2.py (1)
19-64: Auto-generated protobuf code
Lines are flagged for exceeding line length (E501) and referencing undefined symbols, but these are typical in auto-generated code. Generally, no changes are recommended here.
🧰 Tools
🪛 Ruff (0.8.2)
19-19: Line too long (4142 > 120)
(E501)
23-23: Avoid equality comparisons to False; use if not _descriptor._USE_C_DESCRIPTORS: for false checks
Replace with not _descriptor._USE_C_DESCRIPTORS
(E712)
26-26: Line too long (400 > 120)
(E501)
27-27: Undefined name _CRYPTOGRAPHY
(F821)
28-28: Undefined name _CRYPTOGRAPHY
(F821)
28-28: Line too long (126 > 120)
(E501)
29-29: Undefined name _CRYPTOGRAPHY
(F821)
30-30: Undefined name _CRYPTOGRAPHY
(F821)
30-30: Line too long (138 > 120)
(E501)
31-31: Undefined name _CRYPTOGRAPHY
(F821)
32-32: Undefined name _CRYPTOGRAPHY
(F821)
32-32: Line too long (149 > 120)
(E501)
33-33: Undefined name _CRYPTOGRAPHY
(F821)
34-34: Undefined name _CRYPTOGRAPHY
(F821)
34-34: Line too long (145 > 120)
(E501)
35-35: Undefined name _CRYPTOGRAPHY
(F821)
36-36: Undefined name _CRYPTOGRAPHY
(F821)
36-36: Line too long (139 > 120)
(E501)
37-37: Undefined name _CRYPTOGRAPHY
(F821)
38-38: Undefined name _CRYPTOGRAPHY
(F821)
38-38: Line too long (141 > 120)
(E501)
39-39: Undefined name _ALGORITHM
(F821)
40-40: Undefined name _ALGORITHM
(F821)
41-41: Undefined name _ALGORITHMRESPONSE
(F821)
42-42: Undefined name _ALGORITHMRESPONSE
(F821)
43-43: Undefined name _ALGORITHMRESPONSE_PURLS
(F821)
44-44: Undefined name _ALGORITHMRESPONSE_PURLS
(F821)
45-45: Undefined name _ALGORITHMSINRANGERESPONSE
(F821)
46-46: Undefined name _ALGORITHMSINRANGERESPONSE
(F821)
47-47: Undefined name _ALGORITHMSINRANGERESPONSE_PURL
(F821)
48-48: Undefined name _ALGORITHMSINRANGERESPONSE_PURL
(F821)
49-49: Undefined name _VERSIONSINRANGERESPONSE
(F821)
50-50: Undefined name _VERSIONSINRANGERESPONSE
(F821)
51-51: Undefined name _VERSIONSINRANGERESPONSE_PURL
(F821)
52-52: Undefined name _VERSIONSINRANGERESPONSE_PURL
(F821)
53-53: Undefined name _HINT
(F821)
54-54: Undefined name _HINT
(F821)
55-55: Undefined name _HINTSRESPONSE
(F821)
56-56: Undefined name _HINTSRESPONSE
(F821)
57-57: Undefined name _HINTSRESPONSE_PURLS
(F821)
58-58: Undefined name _HINTSRESPONSE_PURLS
(F821)
59-59: Undefined name _HINTSINRANGERESPONSE
(F821)
60-60: Undefined name _HINTSINRANGERESPONSE
(F821)
61-61: Undefined name _HINTSINRANGERESPONSE_PURL
(F821)
62-62: Undefined name _HINTSINRANGERESPONSE_PURL
(F821)
63-63: Undefined name _CRYPTOGRAPHY
(F821)
64-64: Undefined name _CRYPTOGRAPHY
(F821)
src/scanoss/scanners/scanner_hfh.py (2)
49-55: Validate `client` existence.
If `client` is not provided (i.e., `None`), the subsequent gRPC call in `scan()` will fail. Consider adding a check to exit early or initialize a default `ScanossGrpc` client.
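The early check suggested above could look roughly like the following. This is a minimal sketch, not the PR's `ScannerHFH`: the class `ScannerSketch`, the `FolderHashClient` protocol, and `FakeClient` are all hypothetical names introduced for illustration, and the request shape is assumed.

```python
from typing import Optional, Protocol


class FolderHashClient(Protocol):
    """Hypothetical minimal interface the scanner needs from its gRPC client."""

    def folder_hash_scan(self, request: dict) -> dict: ...


class ScannerSketch:
    """Illustrative stand-in for a scanner; not the real ScannerHFH."""

    def __init__(self, scan_dir: str, client: Optional[FolderHashClient] = None):
        if client is None:
            # Fail at construction time instead of deep inside scan()
            raise ValueError('A gRPC client is required to run a folder hash scan')
        self.scan_dir = scan_dir
        self.client = client

    def scan(self) -> dict:
        # Assumed request shape; the real request carries the hashed folder tree
        return self.client.folder_hash_scan({'root': self.scan_dir})


class FakeClient:
    """Test double that echoes the requested root back."""

    def folder_hash_scan(self, request: dict) -> dict:
        return {'results': [], 'root': request['root']}
```

Raising in the constructor keeps the failure close to the misconfiguration; constructing a default client instead is the other option the comment mentions.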
152-160: Docstring mismatch for `_format_plain_output()`.
While the method name and docstring refer to a "plain text" format, it actually returns JSON if `scan_results` is a dictionary. Update the docstring or convert the output to actual plain text for consistency.
src/scanoss/utils/abstract_presenter.py (1)
1-69: Looks good overall.
The abstract methods `_format_json_output()` and `_format_plain_output()` are well-defined, and the fallback behavior for an invalid or unspecified format is coherent with the docstring. No immediate issues found.
src/scanoss/scanossgrpc.py (2)
468-484: Confirm request data for folder-hash scanning.
Ensure that the `request` dictionary aligns with HFH requirements. Consider validating that `'root'` and other required fields exist before calling `_call_rpc`.
485-523: Robust gRPC call structure.
The `_call_rpc()` approach for parsing requests, appending metadata, and managing errors neatly centralizes gRPC logic. No further concerns.
src/scanoss/results.py (6)
53-114: Well-structured implementation of the new ResultsPresenter class
The creation of a dedicated presenter class that inherits from AbstractPresenter is a good design choice that separates presentation logic from data processing. This follows the Single Responsibility Principle and makes the codebase more maintainable.
The implementation handles edge cases well, particularly in the `_format_json_output` method with appropriate exception handling, and in `_format_plain_output_item` with fallbacks for missing data.
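The presenter split described above can be sketched as follows. This is an illustrative outline under assumptions, not the PR's `AbstractPresenter` or `ResultsPresenter`: the class names, the constructor arguments, the output strings, and the "anything but JSON falls back to plain text" rule are all hypothetical.

```python
import json
from abc import ABC, abstractmethod
from typing import Optional


class PresenterSketch(ABC):
    """Hypothetical base class: picks a formatter, subclasses supply the formats."""

    def __init__(self, output_format: Optional[str] = None):
        self.output_format = output_format

    def present(self) -> str:
        if self.output_format == 'json':
            return self._format_json_output()
        # Assumed fallback: any other (or missing) format is rendered as plain text
        return self._format_plain_output()

    @abstractmethod
    def _format_json_output(self) -> str: ...

    @abstractmethod
    def _format_plain_output(self) -> str: ...


class ResultsPresenterSketch(PresenterSketch):
    def __init__(self, results: list, output_format: Optional[str] = None):
        super().__init__(output_format)
        self.results = results

    def _format_json_output(self) -> str:
        try:
            return json.dumps({'results': self.results, 'total': len(self.results)}, indent=2)
        except (TypeError, ValueError) as e:
            return f'ERROR: could not serialise results: {e}'

    def _format_plain_output(self) -> str:
        if not self.results:
            return 'No results to present'
        lines = []
        for item in self.results:
            purls = item.get('purl') or []  # guard against a missing or empty list
            lines.append(f"File: {item.get('file', 'N/A')} | PURL: {purls[0] if purls else 'N/A'}")
        return '\n'.join(lines)
```

The data-holding class then only needs to build a presenter and call `present()`, which is the delegation the review praises.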
93-94: Fixed docstring and return value mismatch
The method now correctly returns a message string when there are no results to present, addressing the previous review comment.
146-156: Clean integration of the presenter in the Results class
The initialization of the ResultsPresenter with the necessary parameters is well-structured. The Results class now delegates presentation responsibilities to the specialized presenter class, creating a cleaner separation of concerns.
171-171: Improved error handling method
Using `self.base.print_stderr` instead of `self.print_stderr` is consistent with the refactoring pattern in this class.
246-249: Enhanced error specificity
Replacing a generic Exception with ValueError provides more specific error information, which is a good practice for error handling. This makes debugging easier and provides clearer feedback to users.
262-263: Clean delegation to presenter
The present method now simply delegates to the presenter's present method, which is a clean implementation of the delegation pattern.
src/scanoss/api/components/v2/scanoss_components_pb2.py (1)
19-60: Auto-generated Protobuf file updates
The updates to the DESCRIPTOR and serialized options are consistent with the changes needed to support the new folder scanning and hashing functionality mentioned in the PR objectives.
Note that static analysis tools flag several undefined names (like `_COMPONENTS`), but these are expected in auto-generated Protobuf files as they are defined at runtime by the Protobuf machinery.
🧰 Tools
🪛 Ruff (0.8.2)
19-19: Line too long (3679 > 120)
(E501)
23-23: Avoid equality comparisons to `False`; use `if not _descriptor._USE_C_DESCRIPTORS:` for false checks
Replace with `not _descriptor._USE_C_DESCRIPTORS`
(E712)
26-26: Line too long (389 > 120)
(E501)
27-27: Undefined name
_COMPONENTS
(F821)
28-28: Undefined name
_COMPONENTS
(F821)
28-28: Line too long (122 > 120)
(E501)
29-29: Undefined name
_COMPONENTS
(F821)
30-30: Undefined name
_COMPONENTS
(F821)
30-30: Line too long (136 > 120)
(E501)
31-31: Undefined name
_COMPONENTS
(F821)
32-32: Undefined name
_COMPONENTS
(F821)
32-32: Line too long (139 > 120)
(E501)
33-33: Undefined name
_COMPONENTS
(F821)
34-34: Undefined name
_COMPONENTS
(F821)
34-34: Line too long (144 > 120)
(E501)
35-35: Undefined name
_COMPSEARCHREQUEST
(F821)
36-36: Undefined name
_COMPSEARCHREQUEST
(F821)
37-37: Undefined name
_COMPSTATISTIC
(F821)
38-38: Undefined name
_COMPSTATISTIC
(F821)
39-39: Undefined name
_COMPSTATISTIC_LANGUAGE
(F821)
40-40: Undefined name
_COMPSTATISTIC_LANGUAGE
(F821)
41-41: Undefined name
_COMPSTATISTICRESPONSE
(F821)
42-42: Undefined name
_COMPSTATISTICRESPONSE
(F821)
43-43: Undefined name
_COMPSTATISTICRESPONSE_PURLS
(F821)
44-44: Undefined name
_COMPSTATISTICRESPONSE_PURLS
(F821)
45-45: Undefined name
_COMPSEARCHRESPONSE
(F821)
46-46: Undefined name
_COMPSEARCHRESPONSE
(F821)
47-47: Undefined name
_COMPSEARCHRESPONSE_COMPONENT
(F821)
48-48: Undefined name
_COMPSEARCHRESPONSE_COMPONENT
(F821)
49-49: Undefined name
_COMPVERSIONREQUEST
(F821)
50-50: Undefined name
_COMPVERSIONREQUEST
(F821)
51-51: Undefined name
_COMPVERSIONRESPONSE
(F821)
52-52: Undefined name
_COMPVERSIONRESPONSE
(F821)
53-53: Undefined name
_COMPVERSIONRESPONSE_LICENSE
(F821)
54-54: Undefined name
_COMPVERSIONRESPONSE_LICENSE
(F821)
55-55: Undefined name
_COMPVERSIONRESPONSE_VERSION
(F821)
56-56: Undefined name
_COMPVERSIONRESPONSE_VERSION
(F821)
57-57: Undefined name
_COMPVERSIONRESPONSE_COMPONENT
(F821)
58-58: Undefined name
_COMPVERSIONRESPONSE_COMPONENT
(F821)
59-59: Undefined name
_COMPONENTS
(F821)
60-60: Undefined name
_COMPONENTS
(F821)
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (1)
21-29: Improved code formatting and readability
The changes to indentation and parameter alignment enhance code readability without changing functionality. This is consistent with formatting improvements in other gRPC files observed in the code snippets from relevant_code_snippets.
Also applies to: 37-39, 45-46, 54-63, 65-66, 77-91, 94-108
src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2.py (1)
19-40: Auto-generated Protobuf file updates
The updates to the DESCRIPTOR and serialized options/indices are consistent with the protobuf definitions needed to support the new functionality mentioned in the PR objectives.
Similar to other Protobuf files, the static analysis tools flag undefined variables that are expected to be defined at runtime by the Protobuf machinery.
🧰 Tools
🪛 Ruff (0.8.2)
19-19: Line too long (1718 > 120)
(E501)
23-23: Avoid equality comparisons to `False`; use `if not _descriptor._USE_C_DESCRIPTORS:` for false checks
Replace with `not _descriptor._USE_C_DESCRIPTORS`
(E712)
26-26: Line too long (375 > 120)
(E501)
27-27: Undefined name
_SEMGREP
(F821)
28-28: Undefined name
_SEMGREP
(F821)
29-29: Undefined name
_SEMGREP
(F821)
30-30: Undefined name
_SEMGREP
(F821)
30-30: Line too long (123 > 120)
(E501)
31-31: Undefined name
_SEMGREPRESPONSE
(F821)
32-32: Undefined name
_SEMGREPRESPONSE
(F821)
33-33: Undefined name
_SEMGREPRESPONSE_ISSUE
(F821)
34-34: Undefined name
_SEMGREPRESPONSE_ISSUE
(F821)
35-35: Undefined name
_SEMGREPRESPONSE_FILE
(F821)
36-36: Undefined name
_SEMGREPRESPONSE_FILE
(F821)
37-37: Undefined name
_SEMGREPRESPONSE_PURLS
(F821)
38-38: Undefined name
_SEMGREPRESPONSE_PURLS
(F821)
39-39: Undefined name
_SEMGREP
(F821)
40-40: Undefined name
_SEMGREP
(F821)
src/scanoss/file_filters.py (3)
245-249: Validate newly added extensions.
Adding `.whml`, `.pom`, `.smtml`, `.min.js`, `.mf`, and `.base64` is valuable, but ensure that skipping these aligns with the overall scanning/hashing objectives. Consider confirming each extension with product owners or security leads to avoid skipping critical files by accident.
460-468: Align with docstring defaults.
All newly assigned fields from `**kwargs` match the docstrings, which is good consistency. However, verify whether passing `is_folder_hashing_scan` as `False` by default in all contexts is appropriate, or if you prefer to default to `None` for safer detection.
527-556: Additional checks for hidden files and size-based skips look solid.
The newly introduced logic for skipping hidden files/folders and empty files is consistent with the docstring. This helps to ensure you do not hash irrelevant or privileged data.
src/scanoss/scanners/folder_hasher.py (3)
158-159: Use a chunk-based approach for hashing large files.
166-173: Avoid duplicating file references in both parent and root nodes.
245-249: Overall approach is sound.
Your simhash for names/content and fallback conditions for insufficient file sets are well-structured. No concerns with the rest of the hashing pipeline.
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (4)
src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (4)
6-6: Trivial line length warning.
The import line is reported by static analysis as exceeding the recommended length. Typically, these gRPC auto-generated lines are not a priority to fix unless your codebase enforces strict line length rules.
🧰 Tools
🪛 Ruff (0.8.2)
6-6: Line too long (138 > 120)
(E501)
30-49: Introducing new RPC stubs.
The stubs for `GetAlgorithmsInRange`, `GetVersionsInRange`, `GetHintsInRange`, and `GetEncryptionHints` conform to the existing naming conventions and usage of request/response types. Implementation detail remains consistent with gRPC best practices.
102-131: RPC method registration is correct.
The method handlers reference the correct request/response objects. Properly updates the server with new endpoints. No issues found.
145-244: Inline function definitions with many optional parameters.
The gRPC experimental stubs define many optional parameters (e.g., `channel_credentials`, `call_credentials`). This is standard for advanced usage, yet triggers static analysis warnings. Usually safe to ignore for auto-generated code.
🧰 Tools
🪛 Ruff (0.8.2)
145-145: Too many arguments in function definition (10 > 5)
(PLR0913)
162-162: Too many arguments in function definition (10 > 5)
(PLR0913)
179-179: Too many arguments in function definition (10 > 5)
(PLR0913)
189-189: Line too long (127 > 120)
(E501)
196-196: Too many arguments in function definition (10 > 5)
(PLR0913)
206-206: Line too long (125 > 120)
(E501)
213-213: Too many arguments in function definition (10 > 5)
(PLR0913)
223-223: Line too long (122 > 120)
(E501)
230-230: Too many arguments in function definition (10 > 5)
(PLR0913)
240-240: Line too long (125 > 120)
(E501)
src/scanoss/utils/simhash.py (6)
4-4: Update copyright year.
The copyright year is set to 2025, which is incorrect as it's beyond the current year.
- Copyright (c) 2025, SCANOSS
+ Copyright (c) 2024, SCANOSS
33-39: LGTM: Well-implemented FNV-1 hashing algorithm.
The implementation of the 64-bit FNV-1 hash function follows the standard algorithm with correct usage of the offset basis, prime multiplier, and bit masking.
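For reference, the standard 64-bit FNV-1 algorithm mentioned above can be sketched as below. This is a generic illustration of the published algorithm, not a copy of the PR's `simhash.py`; the function name `fnv1_64` is an assumption.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV64_PRIME = 0x100000001B3              # standard 64-bit FNV prime
MASK64 = 0xFFFFFFFFFFFFFFFF              # keeps Python's big ints within 64 bits


def fnv1_64(data: bytes) -> int:
    """64-bit FNV-1: multiply by the prime first, then XOR in each byte."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h = (h * FNV64_PRIME) & MASK64
        h ^= byte
    return h
```

The masking step matters in Python because integers are unbounded; without it the multiplication would not wrap the way it does in a fixed-width language.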
42-53: LGTM: Well-structured SimhashFeature class.
The class provides a clean encapsulation of hash values with weights, which is essential for the simhash algorithm. The methods are well-documented and appropriately named.
66-81: LGTM: Vectorize implementation correctly handles feature weights.
The vectorize function properly processes feature weights, adding them for set bits and subtracting for unset bits, which is essential for creating an accurate representation of the feature space.
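The add-for-set-bits, subtract-for-unset-bits rule described above can be sketched as follows. This is a minimal illustration that represents each feature as a hypothetical `(hash, weight)` tuple rather than the PR's `SimhashFeature` class; the `>= 0` threshold in `fingerprint` is an assumption about how ties are resolved.

```python
from typing import List, Tuple


def vectorize(features: List[Tuple[int, int]]) -> List[int]:
    """Build a 64-dimensional vector from (64-bit hash, weight) pairs."""
    v = [0] * 64
    for h, weight in features:
        for i in range(64):
            if (h >> i) & 1:
                v[i] += weight  # bit i is set: push dimension i up
            else:
                v[i] -= weight  # bit i is clear: push dimension i down
    return v


def fingerprint(v: List[int]) -> int:
    """Collapse the vector to 64 bits: bit i is set when dimension i is non-negative."""
    f = 0
    for i in range(64):
        if v[i] >= 0:
            f |= 1 << i
    return f
```

With a single feature of weight 1 the fingerprint reproduces the feature's hash; with several features, each bit follows the weighted majority.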
112-122: LGTM: Optimal Hamming distance calculation.
This compare function efficiently calculates the Hamming distance using the bit manipulation trick (`v &= v - 1`) to count set bits, which is more efficient than naive approaches.
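The bit trick named above works because `v & (v - 1)` clears the lowest set bit, so the loop runs once per differing bit rather than once per bit position. A minimal sketch, with `hamming_distance` as an assumed name rather than the PR's `compare` function:

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two 64-bit fingerprints differ."""
    v = (a ^ b) & 0xFFFFFFFFFFFFFFFF  # differing bits, limited to 64 bits
    count = 0
    while v:
        v &= v - 1  # clear the lowest set bit
        count += 1
    return count
```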
183-198: Add input validation to shingle function.
The shingle function should validate that input list `b` is not empty and contains only bytes objects.
  def shingle(w: int, b: list) -> list:
      if w < 1:
          raise ValueError('simhash.shingle(): k must be a positive integer')
+     if not b:
+         raise ValueError('simhash.shingle(): input list cannot be empty')
+     if not all(isinstance(x, bytes) for x in b):
+         raise TypeError('simhash.shingle(): all elements must be bytes objects')
      if w == 1:
          return b
      w = min(w, len(b))
src/protoc_gen_swagger/options/openapiv2_pb2.py (1)
18-117: LGTM: Auto-generated protobuf code restoration.
This correctly restores the serialized options and serialized start/end positions for the protocol buffer definitions, which is important for maintaining compatibility with the protobuf compiler.
The static analysis warnings for undefined names (F821) are false positives, as these variables are dynamically defined during the Protocol Buffer build process. Since this is an auto-generated file (as noted in the header comment), these warnings can be safely ignored.
🧰 Tools
🪛 Ruff (0.8.2)
18-18: Line too long (9607 > 120)
(E501)
22-22: Avoid equality comparisons to `False`; use `if not _descriptor._USE_C_DESCRIPTORS:` for false checks
Replace with `not _descriptor._USE_C_DESCRIPTORS`
(E712)
26-26: Undefined name
_SWAGGER_RESPONSESENTRY
(F821)
27-27: Undefined name
_SWAGGER_RESPONSESENTRY
(F821)
28-28: Undefined name
_SWAGGER_EXTENSIONSENTRY
(F821)
29-29: Undefined name
_SWAGGER_EXTENSIONSENTRY
(F821)
30-30: Undefined name
_OPERATION_RESPONSESENTRY
(F821)
31-31: Undefined name
_OPERATION_RESPONSESENTRY
(F821)
32-32: Undefined name
_OPERATION_EXTENSIONSENTRY
(F821)
33-33: Undefined name
_OPERATION_EXTENSIONSENTRY
(F821)
34-34: Undefined name
_RESPONSE_HEADERSENTRY
(F821)
35-35: Undefined name
_RESPONSE_HEADERSENTRY
(F821)
36-36: Undefined name
_RESPONSE_EXAMPLESENTRY
(F821)
37-37: Undefined name
_RESPONSE_EXAMPLESENTRY
(F821)
38-38: Undefined name
_RESPONSE_EXTENSIONSENTRY
(F821)
39-39: Undefined name
_RESPONSE_EXTENSIONSENTRY
(F821)
40-40: Undefined name
_INFO_EXTENSIONSENTRY
(F821)
41-41: Undefined name
_INFO_EXTENSIONSENTRY
(F821)
42-42: Undefined name
_SCHEMA
(F821)
43-43: Undefined name
_SCHEMA
(F821)
44-44: Undefined name
_SECURITYDEFINITIONS_SECURITYENTRY
(F821)
45-45: Undefined name
_SECURITYDEFINITIONS_SECURITYENTRY
(F821)
46-46: Undefined name
_SECURITYSCHEME_EXTENSIONSENTRY
(F821)
47-47: Undefined name
_SECURITYSCHEME_EXTENSIONSENTRY
(F821)
48-48: Undefined name
_SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY
(F821)
49-49: Undefined name
_SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY
(F821)
50-50: Undefined name
_SCOPES_SCOPEENTRY
(F821)
51-51: Undefined name
_SCOPES_SCOPEENTRY
(F821)
52-52: Undefined name
_SWAGGER
(F821)
53-53: Undefined name
_SWAGGER
(F821)
54-54: Undefined name
_SWAGGER_RESPONSESENTRY
(F821)
55-55: Undefined name
_SWAGGER_RESPONSESENTRY
(F821)
56-56: Undefined name
_SWAGGER_EXTENSIONSENTRY
(F821)
57-57: Undefined name
_SWAGGER_EXTENSIONSENTRY
(F821)
58-58: Undefined name
_SWAGGER_SWAGGERSCHEME
(F821)
59-59: Undefined name
_SWAGGER_SWAGGERSCHEME
(F821)
60-60: Undefined name
_OPERATION
(F821)
61-61: Undefined name
_OPERATION
(F821)
62-62: Undefined name
_OPERATION_RESPONSESENTRY
(F821)
63-63: Undefined name
_OPERATION_RESPONSESENTRY
(F821)
64-64: Undefined name
_OPERATION_EXTENSIONSENTRY
(F821)
65-65: Undefined name
_OPERATION_EXTENSIONSENTRY
(F821)
66-66: Undefined name
_HEADER
(F821)
67-67: Undefined name
_HEADER
(F821)
68-68: Undefined name
_RESPONSE
(F821)
69-69: Undefined name
_RESPONSE
(F821)
70-70: Undefined name
_RESPONSE_HEADERSENTRY
(F821)
71-71: Undefined name
_RESPONSE_HEADERSENTRY
(F821)
72-72: Undefined name
_RESPONSE_EXAMPLESENTRY
(F821)
73-73: Undefined name
_RESPONSE_EXAMPLESENTRY
(F821)
74-74: Undefined name
_RESPONSE_EXTENSIONSENTRY
(F821)
75-75: Undefined name
_RESPONSE_EXTENSIONSENTRY
(F821)
76-76: Undefined name
_INFO
(F821)
77-77: Undefined name
_INFO
(F821)
78-78: Undefined name
_INFO_EXTENSIONSENTRY
(F821)
79-79: Undefined name
_INFO_EXTENSIONSENTRY
(F821)
80-80: Undefined name
_CONTACT
(F821)
81-81: Undefined name
_CONTACT
(F821)
82-82: Undefined name
_LICENSE
(F821)
83-83: Undefined name
_LICENSE
(F821)
84-84: Undefined name
_EXTERNALDOCUMENTATION
(F821)
85-85: Undefined name
_EXTERNALDOCUMENTATION
(F821)
86-86: Undefined name
_SCHEMA
(F821)
87-87: Undefined name
_SCHEMA
(F821)
88-88: Undefined name
_JSONSCHEMA
(F821)
89-89: Undefined name
_JSONSCHEMA
(F821)
90-90: Undefined name
_JSONSCHEMA_JSONSCHEMASIMPLETYPES
(F821)
91-91: Undefined name
_JSONSCHEMA_JSONSCHEMASIMPLETYPES
(F821)
92-92: Undefined name
_TAG
(F821)
93-93: Undefined name
_TAG
(F821)
94-94: Undefined name
_SECURITYDEFINITIONS
(F821)
95-95: Undefined name
_SECURITYDEFINITIONS
(F821)
96-96: Undefined name
_SECURITYDEFINITIONS_SECURITYENTRY
(F821)
97-97: Undefined name
_SECURITYDEFINITIONS_SECURITYENTRY
(F821)
98-98: Undefined name
_SECURITYSCHEME
(F821)
99-99: Undefined name
_SECURITYSCHEME
(F821)
100-100: Undefined name
_SECURITYSCHEME_EXTENSIONSENTRY
(F821)
101-101: Undefined name
_SECURITYSCHEME_EXTENSIONSENTRY
(F821)
102-102: Undefined name
_SECURITYSCHEME_TYPE
(F821)
103-103: Undefined name
_SECURITYSCHEME_TYPE
(F821)
104-104: Undefined name
_SECURITYSCHEME_IN
(F821)
105-105: Undefined name
_SECURITYSCHEME_IN
(F821)
106-106: Undefined name
_SECURITYSCHEME_FLOW
(F821)
107-107: Undefined name
_SECURITYSCHEME_FLOW
(F821)
108-108: Undefined name
_SECURITYREQUIREMENT
(F821)
109-109: Undefined name
_SECURITYREQUIREMENT
(F821)
110-110: Undefined name
_SECURITYREQUIREMENT_SECURITYREQUIREMENTVALUE
(F821)
111-111: Undefined name
_SECURITYREQUIREMENT_SECURITYREQUIREMENTVALUE
(F821)
112-112: Undefined name
_SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY
(F821)
113-113: Undefined name
_SECURITYREQUIREMENT_SECURITYREQUIREMENTENTRY
(F821)
114-114: Undefined name
_SCOPES
(F821)
115-115: Undefined name
_SCOPES
(F821)
116-116: Undefined name
_SCOPES_SCOPEENTRY
(F821)
117-117: Undefined name
_SCOPES_SCOPEENTRY
(F821)
@classmethod
def get_hash_buff(cls, buff: bytes) -> List[bytes]:
    """
    Get the hash value of the given buffer, and converts it to 8 bytes in big-endian order.

    Args:
        buff (bytes): The buffer to get the hash value of.

    Returns:
        bytes: The hash value of the given buffer, and converts it to 8 bytes in big-endian order.
    """
    crc = cls()
    crc.update(buff)
    hash_val = crc.digest()

    return list(struct.pack('>Q', hash_val))
🛠️ Refactor suggestion
Align docstring with return type
The docstring indicates returning `bytes`, but the method returns a `list` of integers. Decide whether you want a byte array or a list. For consistency, consider returning actual bytes:
- def get_hash_buff(cls, buff: bytes) -> List[bytes]:
...
- return list(struct.pack('>Q', hash_val))
+ def get_hash_buff(cls, buff: bytes) -> bytes:
...
+ return struct.pack('>Q', hash_val)
📝 Committable suggestion
@classmethod
def get_hash_buff(cls, buff: bytes) -> bytes:
    """
    Get the hash value of the given buffer, and converts it to 8 bytes in big-endian order.

    Args:
        buff (bytes): The buffer to get the hash value of.

    Returns:
        bytes: The hash value of the given buffer, and converts it to 8 bytes in big-endian order.
    """
    crc = cls()
    crc.update(buff)
    hash_val = crc.digest()

    return struct.pack('>Q', hash_val)
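The difference between the two return values can be seen with a few lines of standard-library code. The value `0x0102030405060708` is an arbitrary example chosen for illustration, not a real CRC64 digest.

```python
import struct

hash_val = 0x0102030405060708  # arbitrary 64-bit value for illustration

# '>Q' packs an unsigned 64-bit integer as 8 bytes, most significant byte first
packed = struct.pack('>Q', hash_val)

# Wrapping the bytes in list() yields a list of ints (0-255), not a list of bytes
as_list = list(packed)
```

So the original annotation `List[bytes]` matches neither the docstring (`bytes`) nor the actual value (a list of `int`); either returning `packed` directly or annotating the result as `List[int]` would make the three agree.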
  dir_path = Path(dirpath)
  rel_path = dir_path.relative_to(root_path)
  if dir_path.is_symlink():  # TODO should we skip symlink folders?
      self.print_msg(f'WARNING: Found symbolic link folder: {dir_path}')

- if self._should_skip_dir(str(rel_path)):  # Current directory should be skipped
+ if self.should_skip_dir(str(rel_path)):  # Current directory should be skipped
      dirnames.clear()
      continue
  for filename in filenames:
-     file_path = dirpath / filename
+     file_path = dir_path / filename
🛠️ Refactor suggestion
Consider skipping symlink directories.
Currently, you log a warning on encountering symlink folders but still descend them, unless the directory skip logic triggers. Evaluate whether you want to skip symlink directories altogether to avoid potential cyclical references or unexpected expansions.
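One way to skip symlinked directories explicitly is to prune them from `dirnames` during the walk. The sketch below is an assumption-laden illustration, not the PR's filter code: `walk_without_symlink_dirs` is a hypothetical helper. Note that `os.walk` does not follow symlinked directories by default, but it still lists them in `dirnames`; pruning makes the intent explicit and also holds if `followlinks=True` is ever passed.

```python
import os
from pathlib import Path
from typing import List


def walk_without_symlink_dirs(root: str) -> List[str]:
    """Return sorted relative file paths under root, never entering symlinked directories."""
    root_path = Path(root)
    found = []
    for dirpath, dirnames, filenames in os.walk(root_path):
        dir_path = Path(dirpath)
        # Prune in place: os.walk only descends into names left in dirnames
        dirnames[:] = [d for d in dirnames if not (dir_path / d).is_symlink()]
        for filename in filenames:
            found.append((dir_path / filename).relative_to(root_path).as_posix())
    return sorted(found)
```

Pruning also avoids the cyclical-reference case the comment raises, since a link that points back up the tree is never entered.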
Actionable comments posted: 0
🧹 Nitpick comments (3)
tests/test_file_filters.py (2)
195-211: Consider adding tests for edge cases.
The current test covers the basic case where files are relative to the base directory. Consider adding tests for:
- When `base_dir` is not provided (defaults to None)
- When files are not relative to the base directory
- When the base directory itself contains characters that need special handling
204-204: Parameter name inconsistency.
Note that the parameter is named `base_dir` in the test but `scan_root` in the implementation. While this doesn't affect functionality, consistent naming would improve code readability.
- filtered_files = self.file_filters.get_filtered_files_from_files(files, self.test_dir)
+ filtered_files = self.file_filters.get_filtered_files_from_files(files, scan_root=self.test_dir)
src/scanoss/cli.py (1)
1657-1667: Improve the get_scanoss_settings_from_args function to handle all scenarios.
The function doesn't explicitly handle the case when `args.skip_settings_file` is True, which could lead to confusion about what gets returned.
  def get_scanoss_settings_from_args(args):
+     """Extract SCANOSS settings from command-line arguments
+
+     Args:
+         args: Parsed command-line arguments
+
+     Returns:
+         ScanossSettings or None: The loaded settings object or None if settings should be skipped
+     """
      scanoss_settings = None
      if not args.skip_settings_file:
          scanoss_settings = ScanossSettings(debug=args.debug, trace=args.trace, quiet=args.quiet)
          try:
              scanoss_settings.load_json_file(args.settings, args.scan_dir).set_file_type('new').set_scan_type('identify')
          except ScanossSettingsError as e:
              print_stderr(f'Error: {e}')
              sys.exit(1)
-     return scanoss_settings
+     return scanoss_settings
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/scanoss/cli.py (20 hunks)
src/scanoss/scanossgrpc.py (7 hunks)
tests/test_file_filters.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (3)
tests/test_file_filters.py (1)
src/scanoss/file_filters.py (1)
get_filtered_files_from_files (514-564)
src/scanoss/cli.py (4)
src/scanoss/scanners/folder_hasher.py (3)
FolderHasher (67-260)
create_folder_hasher_config_from_args (55-64)
hash_directory (105-124)
src/scanoss/scanossgrpc.py (1)
ScanossGrpc (95-609)
src/scanoss/scanners/scanner_hfh.py (2)
ScannerHFH (41-130)
scan (94-126)
src/scanoss/scanoss_settings.py (2)
ScanossSettings (72-303)
load_json_file (99-134)
src/scanoss/scanossgrpc.py (2)
src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (1)
ProvenanceStub (9-29)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (3)
ScanningStub (9-29)
FolderHashScan (44-49)
FolderHashScan (94-108)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
🔇 Additional comments (13)
tests/test_file_filters.py (2)
200-204: LGTM: Updated test with new base_dir parameter.
The test has been correctly modified to use the new parameter `base_dir` in the `get_filtered_files_from_files` method. This change aligns with the implementation in `src/scanoss/file_filters.py`, where the method now accepts a `scan_root` parameter.
207-209: LGTM: Simplified expected results.
The expected results have been simplified to use direct string representations of file paths instead of using `os.path.relpath`. This change makes the test more readable and directly represents the expected output format of the `get_filtered_files_from_files` method with the new parameter.
src/scanoss/cli.py (5)
489-528: Well-structured argument parsing for the new folder-scan command.
The implementation of the `folder-scan` command options is clean and comprehensive, with appropriate defaults and descriptions.
I particularly like the validation for the threshold parameter using `choices=range(1, 101)` with a descriptive `metavar='1-100'` that clearly communicates the valid range to users.
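The `choices`/`metavar` combination praised above can be sketched with plain `argparse`. The flag spelling `--threshold`, the default of 100, and the help text are assumptions for illustration; only `choices=range(1, 101)` and `metavar='1-100'` come from the review.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog='folder-scan-sketch')
    parser.add_argument(
        '--threshold',
        type=int,                 # convert before the value is checked against choices
        choices=range(1, 101),    # accepts 1..100 inclusive
        metavar='1-100',          # shown in help instead of listing all 100 values
        default=100,              # hypothetical default
        help='Minimum match score to report (hypothetical help text)',
    )
    return parser
```

Without `metavar`, the usage line would enumerate every value in the range, which is why the pairing reads better for wide numeric ranges.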
530-548: Good implementation of the folder-hash command with clear options.
The folder-hash command is well-structured with appropriate arguments.
549-562: Consistent settings options across commands.
Adding settings options to both new commands and existing ones provides a consistent user experience and reuses existing functionality.
1584-1622: Enhance error handling in the folder_hashing_scan function.
While the basic error handling is in place, consider adding:
- More specific error types for better error reporting
- Cleanup of resources in case of failures
- Logging of scan progress and errors
  def folder_hashing_scan(parser, args):
      try:
          if not args.scan_dir:
              print_stderr('ERROR: Please specify a directory to scan')
              parser.parse_args([args.subparser, '-h'])
              sys.exit(1)
          if not os.path.exists(args.scan_dir) or not os.path.isdir(args.scan_dir):
              print_stderr(f'ERROR: The specified directory {args.scan_dir} does not exist')
              sys.exit(1)
+         if not os.access(args.scan_dir, os.R_OK):
+             print_stderr(f'ERROR: No read permission for directory {args.scan_dir}')
+             sys.exit(1)
+
          scanner_config = create_scanner_config_from_args(args)
          scanoss_settings = get_scanoss_settings_from_args(args)
          grpc_config = create_grpc_config_from_args(args)
          client = ScanossGrpc(**asdict(grpc_config))
+         try:
          scanner = ScannerHFH(
              scan_dir=args.scan_dir,
              config=scanner_config,
              client=client,
              scanoss_settings=scanoss_settings,
          )
          scanner.best_match = args.best_match
          scanner.threshold = args.threshold
          scanner.scan()
          scanner.present(output_file=args.output, output_format=args.format)
+         finally:
+             client.close()  # Ensure client is properly closed
+
      except ScanossGrpcError as e:
          print_stderr(f'ERROR: {e}')
          sys.exit(1)
+     except Exception as e:
+         print_stderr(f'ERROR: Unexpected error during scan: {e}')
+         sys.exit(1)
1624-1654: Enhance error handling and add progress feedback.
Consider adding more specific error handling and progress feedback.
  def folder_hash(parser, args):
      """Run the "folder-hash" sub-command
      Args:
          parser (ArgumentParser): command line parser object
          args (Namespace): Parsed arguments
      """
      try:
          if not args.scan_dir:
              print_stderr('ERROR: Please specify a directory to scan')
              parser.parse_args([args.subparser, '-h'])
              sys.exit(1)
          if not os.path.exists(args.scan_dir) or not os.path.isdir(args.scan_dir):
              print_stderr(f'ERROR: The specified directory {args.scan_dir} does not exist')
              sys.exit(1)
+         if not os.access(args.scan_dir, os.R_OK):
+             print_stderr(f'ERROR: No read permission for directory {args.scan_dir}')
+             sys.exit(1)
+
          folder_hasher_config = create_folder_hasher_config_from_args(args)
          scanoss_settings = get_scanoss_settings_from_args(args)
+         if not args.quiet:
+             print_stderr(f'Hashing directory: {args.scan_dir}')
+
          folder_hasher = FolderHasher(
              scan_dir=args.scan_dir,
              config=folder_hasher_config,
              scanoss_settings=scanoss_settings,
          )
          folder_hasher.hash_directory(args.scan_dir)
          folder_hasher.present(output_file=args.output, output_format=args.format)
-     except Exception as e:
+     except (OSError, IOError) as e:
+         print_stderr(f'ERROR: File system error: {e}')
+         sys.exit(1)
+     except ValueError as e:
+         print_stderr(f'ERROR: Invalid input: {e}')
+         sys.exit(1)
+     except Exception as e:
          print_stderr(f'ERROR: {e}')
          sys.exit(1)
src/scanoss/scanossgrpc.py (6)
78-93: Good implementation of custom exception and status code enum.
Creating a dedicated exception class and using an enum for status codes improves code readability and maintainability.
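The pattern praised above, a dedicated exception plus an enum for status codes, might be sketched like this. Everything here is hypothetical: the names `StatusCodeSketch` and `GrpcErrorSketch`, the member names, and the numeric values are assumptions for illustration and are not taken from `ScanossGrpcStatusCode`.

```python
from enum import IntEnum


class StatusCodeSketch(IntEnum):
    """Hypothetical status codes; the real enum's members and values may differ."""

    SUCCESS = 1
    SUCCEEDED_WITH_WARNINGS = 2
    FAILED = 4


class GrpcErrorSketch(Exception):
    """Hypothetical dedicated exception for failed RPC calls."""


def check_status(code: int, message: str = '') -> bool:
    """Return True for usable responses, raise for failures."""
    if code == StatusCodeSketch.SUCCESS:
        return True
    if code == StatusCodeSketch.SUCCEEDED_WITH_WARNINGS:
        print(f'WARNING: {message}')
        return True
    raise GrpcErrorSketch(f'Request failed with status {code}: {message}')
```

Compared with `if code > 1:` style checks, the named members document what each value means at the call site.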
173-185: Clean integration of the scanning stub.
The scanning stub is properly initialized for both secure and insecure connections.
465-480: Well-implemented folder hash scan function.
The `folder_hash_scan` method effectively leverages the newly added `_call_rpc` helper method to reduce code duplication.
482-520: Excellent abstraction with the _call_rpc helper method.
The `_call_rpc` method is a well-designed abstraction that:
- Handles common gRPC call patterns
- Provides good error handling with specific exceptions
- Simplifies request preparation and response parsing
- Includes appropriate debug logging
This significantly reduces code duplication across gRPC methods.
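The shape of such a helper can be sketched without any gRPC dependency. This is not the PR's `_call_rpc`: the function `call_rpc`, the `RpcCallError` type, the `x-request-id` metadata key, and the dict-in/dict-out convention are all assumptions; the real helper works with protobuf messages and gRPC stubs.

```python
import uuid
from typing import Any, Callable, Dict, List, Optional, Tuple

Metadata = List[Tuple[str, str]]


class RpcCallError(Exception):
    """Hypothetical error type raised when an RPC fails."""


def call_rpc(
    rpc_method: Callable[..., Dict[str, Any]],
    request: Dict[str, Any],
    description: str,
    base_metadata: Optional[Metadata] = None,
) -> Dict[str, Any]:
    """Run one RPC with shared metadata handling and error wrapping."""
    metadata = list(base_metadata or [])
    # Assumed convention: tag every call with a unique request id
    metadata.append(('x-request-id', str(uuid.uuid4())))
    try:
        response = rpc_method(request, metadata=metadata)
    except Exception as e:
        # Wrap transport-level failures in one well-known exception type
        raise RpcCallError(f'{description} failed: {e}') from e
    if not response:
        raise RpcCallError(f'{description} returned no data')
    return response
```

Each public method then reduces to one line that names the stub method and passes the request through, which is where the reduction in duplication comes from.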
532-543: Improved readability with enum-based status code checks.
Replacing hardcoded status code checks with the `ScanossGrpcStatusCode` enum improves code readability and maintainability.
617-646: Well-structured configuration with GrpcConfig dataclass.
The `GrpcConfig` dataclass and `create_grpc_config_from_args` helper function provide a clean way to encapsulate and create gRPC configuration from command-line arguments.
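A dataclass-plus-factory arrangement of this kind might look like the sketch below. The field names, defaults, and the placeholder URL are hypothetical and do not reflect the actual `GrpcConfig`; the point is the `getattr(args, name, default)` pattern and the `asdict` hand-off.

```python
from argparse import Namespace
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class GrpcConfigSketch:
    """Hypothetical configuration fields; the real dataclass may differ."""

    url: str = 'https://example.invalid'
    api_key: Optional[str] = None
    debug: bool = False
    timeout: int = 600


def config_from_args(args: Namespace) -> GrpcConfigSketch:
    """Build a config, tolerating attributes that the sub-command did not define."""
    defaults = GrpcConfigSketch()
    return GrpcConfigSketch(
        url=getattr(args, 'url', defaults.url),
        api_key=getattr(args, 'api_key', defaults.api_key),
        debug=getattr(args, 'debug', defaults.debug),
        timeout=getattr(args, 'timeout', defaults.timeout),
    )
```

The resulting object can be expanded into a client constructor with `**asdict(config)`, which matches the `ScanossGrpc(**asdict(grpc_config))` call shown in the review's diff.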
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/scanoss/scanners/folder_hasher.py (1)
282-290: Redundant implementation in _format_plain_output.
The `_format_plain_output` method is almost identical to `_format_json_output` and includes a check for whether the tree is a dictionary, but the tree should always be a dictionary as returned by `_hash_calc_from_node`.
Consider simplifying this method:
- def _format_plain_output(self) -> str:
-     """
-     Format the scan output data into a plain text string
-     """
-     return (
-         json.dumps(self.folder_hasher.tree, indent=2)
-         if isinstance(self.folder_hasher.tree, dict)
-         else str(self.folder_hasher.tree)
-     )
+ def _format_plain_output(self) -> str:
+     """
+     Format the scan output data into a plain text string
+     """
+     return json.dumps(self.folder_hasher.tree, indent=2)
Alternatively, if it's intended to have different formatting for different output types, implement actual differences between JSON and plain text formats.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/scanoss/scanners/folder_hasher.py (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
src/scanoss/scanners/folder_hasher.py (4)
src/scanoss/scanossbase.py (2)
ScanossBase (28-101)
print_debug (58-63)
src/scanoss/utils/crc64.py (2)
CRC64 (29-96)
get_hash_buff (82-96)
src/scanoss/utils/simhash.py (4)
simhash (125-130)
WordFeatureSet (163-169)
fingerprint (100-109)
vectorize_bytes (84-97)
src/scanoss/scanners/scanner_hfh.py (3)
present (128-130)
_format_json_output (143-150)
_format_plain_output (152-160)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: build
🔇 Additional comments (11)
src/scanoss/scanners/folder_hasher.py (11)
16-18: Clear and well-defined constants.
The constants are explicitly defined with descriptive names, making the code more maintainable and the threshold values easy to adjust if needed.
21-31: Good use of class structure for directory representation.
The `DirectoryNode` class effectively represents the hierarchical nature of directories with a clean separation between child nodes and files.
33-42: Well-structured DirectoryFile class.
This class efficiently stores file metadata including path and hash information in both raw bytes and string format.
44-53: Good use of dataclass for configuration.
The `FolderHasherConfig` dataclass provides a clean way to handle configuration with sensible defaults.
55-64: Defensive coding in configuration creation.
The use of `getattr` with default values is a good practice for handling potentially missing attributes from the args object.
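The dataclass-plus-`getattr` pattern mentioned in the two comments above can be sketched roughly as follows. `HasherConfig`, its fields, and `config_from_args` are hypothetical stand-ins for illustration only; the actual `FolderHasherConfig` fields in the PR are not shown in this review and may differ.

```python
from argparse import Namespace
from dataclasses import dataclass


@dataclass
class HasherConfig:
    """Hypothetical stand-in for FolderHasherConfig; field names are assumed."""

    debug: bool = False
    quiet: bool = False
    output_format: str = "json"


def config_from_args(args: Namespace) -> HasherConfig:
    """Build a config, falling back to defaults for attributes args lacks."""
    return HasherConfig(
        debug=getattr(args, "debug", False),
        quiet=getattr(args, "quiet", False),
        output_format=getattr(args, "format", "json"),
    )
```

Because every lookup has a fallback, the helper works with a partially populated `Namespace`, for example one produced by a subcommand that never registered a `--format` option.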
143-144: Verify file name length filtering logic.
The condition `len(f.name.encode('utf-8')) <= MINIMUM_FILE_NAME_LENGTH` filters out files with names longer than `MINIMUM_FILE_NAME_LENGTH`, which seems counter-intuitive given the variable name suggests a minimum rather than maximum length.
Please confirm if this logic is intentional. If `MINIMUM_FILE_NAME_LENGTH` is intended to be a maximum, consider renaming it to `MAXIMUM_FILE_NAME_LENGTH` for clarity.
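To make the behaviour of that condition concrete, here is a small sketch of the same `<=` comparison in isolation. The limit of 32 is an assumed value chosen for the example, not the constant from the PR, and `keep_by_name_length` is a hypothetical helper.

```python
NAME_LENGTH_LIMIT = 32  # assumed value; the PR's constant may differ


def keep_by_name_length(names, limit=NAME_LENGTH_LIMIT):
    """Keep names whose UTF-8 encoded length is at most `limit` bytes."""
    return [name for name in names if len(name.encode("utf-8")) <= limit]
```

With this comparison the constant acts as an upper bound, which is the point of the naming question above. Note also that the length is measured in encoded bytes, so a name made of multi-byte characters can exceed the limit with fewer characters than an ASCII name.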
158-159: Use a streaming or chunk-based approach for hashing large files.
Reading the entire file into memory with `read_bytes()` might cause performance issues or memory errors when processing very large files.
Consider implementing a chunked approach for file reading:

    - file_bytes = full_file_path.read_bytes()
    - key = CRC64.get_hash_buff(file_bytes)
    + crc = CRC64()
    + with open(full_file_path, 'rb') as f:
    +     while chunk := f.read(8192):  # 8KB chunks
    +         crc.update(chunk)
    + key = list(struct.pack('>Q', crc.digest()))
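For reference, the chunked pattern suggested above looks like this as a self-contained sketch. It deliberately swaps in the standard library's `zlib.crc32` (a 32-bit checksum) because the incremental API of the project's `CRC64` class is not shown in this review; `crc32_of_file` is a hypothetical helper, not code from the PR.

```python
import zlib
from pathlib import Path


def crc32_of_file(path: Path, chunk_size: int = 8192) -> int:
    """Compute a CRC-32 of a file without loading it fully into memory."""
    checksum = 0
    with open(path, "rb") as handle:
        # Feed the running checksum back in so each chunk extends the result.
        while chunk := handle.read(chunk_size):
            checksum = zlib.crc32(chunk, checksum)
    return checksum
```

The result matches a single `zlib.crc32` call over the whole content regardless of chunk size, so switching to streaming would not change stored hashes, provided the hash implementation supports incremental updates in the same way.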
165-173: Avoid storing file references in both parent and child nodes.
Appending the same `DirectoryFile` object to each parent directory and the root node may lead to memory inefficiency and duplicate data structures.
Consider maintaining the file reference only in the leaf node corresponding to its actual directory:

    - current_node.files.append(file_item)
    + # Only append file to the final directory node
    + current_node.files.append(file_item)
    - root_node.files.append(file_item)
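One way to read this suggestion is sketched below: each file is stored only in the node for its own directory, and the files under a directory are gathered by traversal when needed. `Node` is a hypothetical simplification of `DirectoryNode` that stores plain path strings, and it assumes POSIX-style relative paths.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, Iterator, List


@dataclass
class Node:
    """Hypothetical simplification of DirectoryNode for illustration."""

    children: Dict[str, "Node"] = field(default_factory=dict)
    files: List[str] = field(default_factory=list)

    def add_file(self, rel_path: str) -> None:
        """Store the file only in the node of its immediate directory."""
        node = self
        for part in PurePosixPath(rel_path).parent.parts:
            node = node.children.setdefault(part, Node())
        node.files.append(rel_path)

    def iter_files(self) -> Iterator[str]:
        """Yield this directory's files, then those of all subdirectories."""
        yield from self.files
        for child in self.children.values():
            yield from child.iter_files()
```

The trade-off is that per-directory aggregation now costs a traversal each time it is needed. If the hashing step reads every ancestor's full file list, the original duplication may be a deliberate choice, so this is worth confirming with the author before changing it.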
205-256: Well-implemented hash calculation with proper edge case handling.
The method correctly checks for minimum requirements before calculating hashes and effectively deduplicates files. The simhash implementation provides a robust way to generate fingerprints for directories.
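For readers unfamiliar with simhash, the general technique can be sketched as below. This is a generic 64-bit simhash over string features using `hashlib.blake2b`, written for illustration only; it is not the project's `simhash` module, whose feature extraction and hash function are not shown in this review.

```python
import hashlib
from typing import Iterable


def simhash64(features: Iterable[str]) -> int:
    """Generic 64-bit simhash: each feature votes +1 or -1 on every bit."""
    weights = [0] * 64
    for feature in features:
        digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
        value = int.from_bytes(digest, "big")
        for bit in range(64):
            weights[bit] += 1 if (value >> bit) & 1 else -1
    fingerprint = 0
    for bit, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Since each bit is decided by a majority vote, the fingerprint does not depend on the order of the features, and feature sets that overlap heavily tend to produce fingerprints with a small Hamming distance. That property is what makes the approach usable for comparing directories by their file names or contents.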
258-261: Good use of delegation pattern for presenting results.
The `present` method effectively delegates the presentation logic to the presenter class, promoting clean separation of concerns.
235-249: Thorough validation of minimum requirements.
The code correctly checks if both the minimum file count and minimum concatenated name length requirements are met before proceeding with hash calculations. This prevents generating unreliable hashes for directories with too few files.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (1)
23-23: Minor style issue: boolean comparison.
The condition uses `== False` instead of the more Pythonic `is False` or simply `not`.

    -if _descriptor._USE_C_DESCRIPTORS == False:
    +if not _descriptor._USE_C_DESCRIPTORS:

🧰 Tools
🪛 Ruff (0.8.2)
23-23: Avoid equality comparisons to `False`; use `if not _descriptor._USE_C_DESCRIPTORS:` for false checks. Replace with `not _descriptor._USE_C_DESCRIPTORS` (E712)
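One caveat to the E712 rewrite is that `== False` and `not` agree for real booleans but can differ for other values. The two helper functions below are hypothetical and exist only to show that difference side by side.

```python
def eq_false(value) -> bool:
    """The flagged spelling: true only for values equal to False (False, 0)."""
    return value == False  # noqa: E712 - kept to show the original behaviour


def negated(value) -> bool:
    """The suggested spelling: true for any falsy value."""
    return not value
```

For `True`, `False`, `0` and `1` both functions return the same answer, while for `None` or an empty string only `negated` returns `True`. If `_USE_C_DESCRIPTORS` is always a real boolean, as the name suggests, the rewrite is behaviour-preserving, though the file is generated output and would be overwritten on the next regeneration anyway.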
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py
19-19: Line too long (2012 > 120) (E501)
23-23: Avoid equality comparisons to `False`; use `if not _descriptor._USE_C_DESCRIPTORS:` for false checks. Replace with `not _descriptor._USE_C_DESCRIPTORS` (E712)
26-26: Line too long (379 > 120) (E501)
27-27: Undefined name `_SCANNING` (F821)
28-28: Undefined name `_SCANNING` (F821)
29-29: Undefined name `_SCANNING` (F821)
30-30: Undefined name `_SCANNING` (F821)
30-30: Line too long (132 > 120) (E501)
31-31: Undefined name `_HFHREQUEST` (F821)
32-32: Undefined name `_HFHREQUEST` (F821)
33-33: Undefined name `_HFHREQUEST_CHILDREN` (F821)
34-34: Undefined name `_HFHREQUEST_CHILDREN` (F821)
35-35: Undefined name `_HFHRESPONSE` (F821)
36-36: Undefined name `_HFHRESPONSE` (F821)
37-37: Undefined name `_HFHRESPONSE_COMPONENT` (F821)
38-38: Undefined name `_HFHRESPONSE_COMPONENT` (F821)
39-39: Undefined name `_HFHRESPONSE_RESULT` (F821)
40-40: Undefined name `_HFHRESPONSE_RESULT` (F821)
41-41: Undefined name `_SCANNING` (F821)
42-42: Undefined name `_SCANNING` (F821)
🔇 Additional comments (2)
src/scanoss/api/scanning/v2/scanoss_scanning_pb2.py (2)
19-42: New folder hashing functionality successfully implemented.
The code correctly adds new Protocol Buffer message types (`HFHRequest` and `HFHResponse`) and the `FolderHashScan` method as intended by the PR objectives. These changes enable support for folder hashing in the SCANOSS API.
19-42: Static analysis warnings can be safely ignored.
The static analysis tool flagged several issues (line length and undefined names like `_SCANNING`, `_HFHREQUEST`, etc.), but these are normal for Protocol Buffer generated files. The flagged names are injected into the module namespace when the generated module is imported, so a static linter cannot see their definitions, and the long lines are expected in serialized descriptor data.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (1)
23-24: Consider using Pythonic comparison for boolean values.
The comparison `== False` could be replaced with the more Pythonic `is False` or simply `not`.

    - if _descriptor._USE_C_DESCRIPTORS == False:
    + if not _descriptor._USE_C_DESCRIPTORS:

🧰 Tools
🪛 Ruff (0.8.2)
23-23: Avoid equality comparisons to `False`; use `if not _descriptor._USE_C_DESCRIPTORS:` for false checks. Replace with `not _descriptor._USE_C_DESCRIPTORS` (E712)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/scanoss/api/common/v2/scanoss_common_pb2.py (1 hunks)
- src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (1 hunks)
- src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (6)
- src/scanoss/api/components/v2/scanoss_components_pb2_grpc.py (2): Echo (47-52), Echo (111-125)
- src/scanoss/api/provenance/v2/scanoss_provenance_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
- src/scanoss/api/cryptography/v2/scanoss_cryptography_pb2_grpc.py (2): Echo (57-62), Echo (145-159)
- src/scanoss/api/scanning/v2/scanoss_scanning_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
- src/scanoss/api/semgrep/v2/scanoss_semgrep_pb2_grpc.py (2): Echo (37-42), Echo (77-91)
- src/scanoss/api/vulnerabilities/v2/scanoss_vulnerabilities_pb2_grpc.py (2): Echo (42-47), Echo (94-108)
🪛 Ruff (0.8.2)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py
6-6: Line too long (138 > 120) (E501)
94-94: Too many arguments in function definition (10 > 5) (PLR0913)
111-111: Too many arguments in function definition (10 > 5) (PLR0913)
121-121: Line too long (122 > 120) (E501)
128-128: Too many arguments in function definition (10 > 5) (PLR0913)
138-138: Line too long (132 > 120) (E501)
src/scanoss/api/common/v2/scanoss_common_pb2.py
16-16: Line too long (933 > 120) (E501)
20-20: Avoid equality comparisons to `False`; use `if not _descriptor._USE_C_DESCRIPTORS:` for false checks. Replace with `not _descriptor._USE_C_DESCRIPTORS` (E712)
24-24: Undefined name `_STATUSCODE` (F821)
25-25: Undefined name `_STATUSCODE` (F821)
26-26: Undefined name `_STATUSRESPONSE` (F821)
27-27: Undefined name `_STATUSRESPONSE` (F821)
28-28: Undefined name `_ECHOREQUEST` (F821)
29-29: Undefined name `_ECHOREQUEST` (F821)
30-30: Undefined name `_ECHORESPONSE` (F821)
31-31: Undefined name `_ECHORESPONSE` (F821)
32-32: Undefined name `_PURLREQUEST` (F821)
33-33: Undefined name `_PURLREQUEST` (F821)
34-34: Undefined name `_PURLREQUEST_PURLS` (F821)
35-35: Undefined name `_PURLREQUEST_PURLS` (F821)
36-36: Undefined name `_PURL` (F821)
37-37: Undefined name `_PURL` (F821)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py
19-19: Line too long (3278 > 120) (E501)
23-23: Avoid equality comparisons to `False`; use `if not _descriptor._USE_C_DESCRIPTORS:` for false checks. Replace with `not _descriptor._USE_C_DESCRIPTORS` (E712)
26-26: Line too long (398 > 120) (E501)
27-27: Undefined name `_DEPENDENCIES` (F821)
28-28: Undefined name `_DEPENDENCIES` (F821)
28-28: Line too long (126 > 120) (E501)
29-29: Undefined name `_DEPENDENCIES` (F821)
30-30: Undefined name `_DEPENDENCIES` (F821)
30-30: Line too long (139 > 120) (E501)
31-31: Undefined name `_DEPENDENCIES` (F821)
32-32: Undefined name `_DEPENDENCIES` (F821)
32-32: Line too long (150 > 120) (E501)
33-33: Undefined name `_DEPENDENCYREQUEST` (F821)
34-34: Undefined name `_DEPENDENCYREQUEST` (F821)
35-35: Undefined name `_DEPENDENCYREQUEST_PURLS` (F821)
36-36: Undefined name `_DEPENDENCYREQUEST_PURLS` (F821)
37-37: Undefined name `_DEPENDENCYREQUEST_FILES` (F821)
38-38: Undefined name `_DEPENDENCYREQUEST_FILES` (F821)
39-39: Undefined name `_DEPENDENCYRESPONSE` (F821)
40-40: Undefined name `_DEPENDENCYRESPONSE` (F821)
41-41: Undefined name `_DEPENDENCYRESPONSE_LICENSES` (F821)
42-42: Undefined name `_DEPENDENCYRESPONSE_LICENSES` (F821)
43-43: Undefined name `_DEPENDENCYRESPONSE_DEPENDENCIES` (F821)
44-44: Undefined name `_DEPENDENCYRESPONSE_DEPENDENCIES` (F821)
45-45: Undefined name `_DEPENDENCYRESPONSE_FILES` (F821)
46-46: Undefined name `_DEPENDENCYRESPONSE_FILES` (F821)
47-47: Undefined name `_TRANSITIVEDEPENDENCYREQUEST` (F821)
48-48: Undefined name `_TRANSITIVEDEPENDENCYREQUEST` (F821)
49-49: Undefined name `_TRANSITIVEDEPENDENCYRESPONSE` (F821)
50-50: Undefined name `_TRANSITIVEDEPENDENCYRESPONSE` (F821)
51-51: Undefined name `_TRANSITIVEDEPENDENCYRESPONSE_DEPENDENCIES` (F821)
52-52: Undefined name `_TRANSITIVEDEPENDENCYRESPONSE_DEPENDENCIES` (F821)
53-53: Undefined name `_DEPENDENCIES` (F821)
54-54: Undefined name `_DEPENDENCIES` (F821)
- GitHub Check: build
🔇 Additional comments (11)
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2_grpc.py (4)
30-34: Addition of GetTransitiveDependencies to DependenciesStub class looks good.
The new method follows the same pattern as the existing methods, correctly setting up the gRPC channel with appropriate serializers and deserializers.
56-61: Implementation of GetTransitiveDependencies servicer method follows established patterns.
The method is correctly set up with the Not Implemented error as a placeholder, consistent with other methods in this generated file.
76-80: ServiceHandler registration for GetTransitiveDependencies is properly configured.
The method handler is correctly registered with the appropriate request deserializer and response serializer.
127-142: The experimental Dependencies class implementation is correctly extended.
The new GetTransitiveDependencies static method follows the same pattern as other methods in the class, with consistent parameter handling and return structure.
src/scanoss/api/dependencies/v2/scanoss_dependencies_pb2.py (3)
19-19: The serialized descriptor is properly updated for the new method.
The serialized file descriptor is correctly modified to include the new GetTransitiveDependencies method.
31-32: Proper configuration of GetTransitiveDependencies API endpoint.
The serialized options correctly define the API endpoint as `/api/v2/dependencies/transitive` for the new GetTransitiveDependencies method.
47-52: Serialized start and end positions are properly defined for TransitiveDependency classes.
The serialized start and end positions for the new TransitiveDependencyRequest and TransitiveDependencyResponse classes are correctly defined to match the updated protocol buffer definition.
src/scanoss/api/common/v2/scanoss_common_pb2.py (4)
1-3: Important: This is an auto-generated file.
This file is generated by the protocol buffer compiler and should not be manually edited, as indicated by the comment on line 2. Any changes should be made to the .proto file and then regenerated.
14-16: LGTM: Formatting changes to the serialized protobuf data.
The changes to the DESCRIPTOR variable are formatting-related, consolidating the serialized protocol buffer definition into a more compact representation without line breaks. This is a common style for generated protobuf files and doesn't affect functionality.
20-20: Note on comparison style in generated code.
The comparison `_descriptor._USE_C_DESCRIPTORS == False` is flagged by the linter, but since this is generated code, it should not be manually modified.
21-37: Updated serialized offsets are correct.
The changes to the serialized start and end values are appropriate as they reflect the updated positions in the serialized protobuf data structure. This ensures correct parsing of the protocol buffer messages.
Summary by CodeRabbit
New Features
- `AbstractPresenter` class for output presentation in various formats.
- `CHANGELOG.md` to reflect new versioning and features.
Refactor
Documentation
- `CHANGELOG.md`.