[After 2.7] Fix SonarCloud issues (#381)
Conversation
📝 Walkthrough

Consolidates dataframe schema definitions via a new DataframeSchemaMixin and JobHostSummarySchema, simplifies set-to-list conversions when sorting, tightens exception types across storage/collector modules, renames unused parameters, adjusts merge/aggregation lambdas to bind loop variables, and adds/updates extensive tests for validation and merge behavior.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ❌ 3 failed (2 warnings, 1 inconclusive) | ✅ 2 passed
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 320813a.
…45 method name clash

- Fix python:S3403 (BLOCKER BUG) in package_crc.py: `super()` was compared with `is False`, which always evaluated to False since `super()` returns a proxy object, not a boolean. Changed to call `super().is_shipping_configured()` and check `not ret`.
- Fix python:S1845 (BLOCKER CODE_SMELL) in base/package.py: renamed classmethod `max_data_size` to `get_max_data_size` to avoid a naming clash/confusion with the class attribute `MAX_DATA_SIZE`.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
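The `super()` comparison bug described above can be reproduced in a minimal sketch (class and method names mirror the commit message; the real `PackageCRC` logic is more involved):

```python
class Base:
    def is_shipping_configured(self):
        return False

class Package(Base):
    def is_shipping_configured_buggy(self):
        # Bug (python:S3403): super() returns a proxy object, never False,
        # so this branch can never trigger and the parent result is ignored.
        if super() is False:
            return False
        return True

    def is_shipping_configured_fixed(self):
        # Fix: actually call the parent method and test its return value.
        ret = super().is_shipping_configured()
        if not ret:
            return False
        return True

pkg = Package()
buggy = pkg.is_shipping_configured_buggy()   # True, despite the parent saying False
fixed = pkg.is_shipping_configured_fixed()   # False, parent result respected
```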
…, S5713, S6735

- Fix python:S1186 (empty methods) in base_traditional.py and dataframe_engine/base.py: added explanatory comments and changed `pass` to meaningful return values in stub static methods (cast_types, data_columns, operations, unique_index_columns, build_dataframe).
- Fix python:S5727 (identity checks always True) in test files: replaced `assert workbook is not None` (openpyxl never returns None) with meaningful assertions (`assert len(workbook.sheetnames) > 0`). Fixed `assert json_string is not None` with `assert len(json_string) > 0`.
- Fix python:S1192 (duplicate string literals) in job_host_summary.py and dataframe_jobhost_summary_usage.py: extracted 'datetime64[ns]' into a DATETIME64_NS constant.
- Fix python:S1515 (lambda closure over loop variable) in base_traditional.py, dataframe_engine/base.py, and renewal_guidance.py: added a default parameter (`c=col` / `s=serial`) to bind the loop variable at lambda definition time.
- Fix python:S5713 (redundant Exception subclass in except tuple) in events_modules_anonymized_rollup.py and test_complex_CCSPv2_with_canonical_facts.py: removed json.JSONDecodeError since it is a subclass of ValueError.
- Fix python:S6735 (missing validate parameter on pd.merge) in base_traditional.py and dataframe_engine/base.py: added validate='many_to_many'.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
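The S1515 closure fix can be demonstrated in isolation; a minimal sketch of late binding versus a default-argument bind:

```python
# Without a default argument, every lambda closes over the same loop
# variable and sees its final value once the loop has finished.
late = [lambda: col for col in ['a', 'b', 'c']]
assert [f() for f in late] == ['c', 'c', 'c']

# Binding the loop variable as a default parameter (c=col) captures the
# value at definition time, which is the fix applied for python:S1515.
bound = [lambda c=col: c for col in ['a', 'b', 'c']]
assert [f() for f in bound] == ['a', 'b', 'c']
```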
…81, S1481

- Fix python:S112 (generic Exception class) in total_workers_vcpu.py, util.py, storage/crc.py, storage/segment.py, storage/directory.py, storage/s3.py: replaced bare Exception raises with more specific RuntimeError, TypeError, or ValueError as appropriate.
- Fix python:S125 (commented-out code) in base_dataframe.py, job_host_summary.py, storage/util.py, test_factory.py, report/base.py, report_ccsp_v2.py, and dataframe_jobhost_summary_usage.py: removed dead commented code blocks.
- Fix python:S1172 (unused function parameters) in extractors.py (local -> _local), utils.py (last_gather/save_last_gather now use **_kwargs), helpers.py (key, last_gather -> _key, _last_gather), base_anonymized_rollup.py (dataframe -> _dataframe), report/base.py (removed unused mode parameter).
- Fix python:S1871 (duplicate if-elif branches) in base_anonymized_rollup.py: merged identical list and dict branches into a single isinstance(value, (list, dict)) check.
- Fix python:S5781 (duplicate set literal entry) in test_renewal_guidance.py: removed duplicate None from set literal.
- Fix python:S1481 (unused loop variable) in renewal_guidance.py: replaced i with _.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
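A hedged sketch of the S112 idea, using a hypothetical `parse_vcpu` helper (not the real collector code): specific exception types let callers catch precisely instead of blanketing with `except Exception`.

```python
def parse_vcpu(value):
    # Illustrative helper: raise the most specific built-in exception
    # for each failure mode instead of a bare Exception (python:S112).
    if not isinstance(value, str):
        raise TypeError('vcpu value must be a string')
    if not value.isdigit():
        raise ValueError(f'invalid vcpu value: {value}')
    return int(value)

assert parse_vcpu('8') == 8

# Callers can now catch the narrow type they actually expect.
try:
    parse_vcpu('eight')
    caught = None
except ValueError:
    caught = 'ValueError'
assert caught == 'ValueError'
```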
…8521, S7519, S7494

- Fix python:S7508 (redundant list() before sorted()) in anonymized_rollups.py, credentials_anonymized_rollup.py, events_modules_anonymized_rollup.py, test_jobs_anonymized_rollups.py, test_multiple_files.py, report/base.py: removed unnecessary list() wrapping since sorted() accepts any iterable.
- Fix python:S6353 (use \w instead of [A-Za-z0-9_]) in events_modules_anonymized_rollup.py: simplified regex patterns to use the \w shorthand.
- Fix python:S7504 (unnecessary list() on iterable) in renewal_guidance.py: removed redundant list() in list comprehension iteration.
- Fix python:S7500 (replace comprehension with collection constructor) in conftest.py and report_saver_s3.py: simplified unnecessary comprehension wrappers.
- Fix python:S8521 (unnecessary .keys() call) in test_gathering.py and test_slicing.py: removed .keys() calls since dict membership testing is implicit.
- Fix python:S7519 (use dict.fromkeys) in management_utility.py: replaced a dict comprehension with a dict.fromkeys() call.
- Fix python:S7494 (use set comprehension) in main_jobevent_service.py: replaced a set() constructor wrapping a generator with a set comprehension literal.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
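A few of the simplifications above, shown on illustrative data:

```python
# S7508: sorted() accepts any iterable, so list() wrapping is redundant.
names = {'b', 'a', 'c'}
assert sorted(names) == sorted(list(names)) == ['a', 'b', 'c']

# S7519: dict.fromkeys() replaces a comprehension mapping keys to one value.
commands = ['gather', 'build', 'ship']
assert dict.fromkeys(commands) == {k: None for k in commands}

# S8521: dict membership testing is implicit; no .keys() needed.
files = {'./config.json': b'{}'}
assert './config.json' in files
```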
Force-pushed from 320813a to b599a99 (Compare)
… pattern

In Python 3, `\w` matches Unicode word characters (accented letters, CJK, etc.) by default. The original explicit character class `[A-Za-z0-9_]` correctly restricts matching to ASCII Ansible collection name characters only.

Addresses Cursor Bugbot review comment on PR ansible#381.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
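The difference is easy to demonstrate (the collection-name context is from the commit; the test strings are illustrative):

```python
import re

# In Python 3, \w matches Unicode word characters by default, so it
# accepts accented letters that are invalid in Ansible collection names.
unicode_match = re.fullmatch(r'\w+', 'café')
assert unicode_match is not None

# The explicit ASCII class rejects them, as intended.
ascii_match = re.fullmatch(r'[A-Za-z0-9_]+', 'café')
assert ascii_match is None
assert re.fullmatch(r'[A-Za-z0-9_]+', 'community_general') is not None
```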
- Introduced a new constant DATETIME64_NS in metric_utils.py for consistent datetime representation.
- Removed commented-out code in Base class methods in base.py and base_traditional.py to enhance clarity and maintainability.
- Updated imports in dataframe_jobhost_summary_usage.py and job_host_summary.py to utilize the new constant.

This commit improves code readability and standardizes datetime handling across the project.
- Introduced new tests to validate that DictOutput raises a TypeError when provided with non-dict input.
- Added a test for CollectionOutput to ensure it raises a TypeError when given non-list input.
- Updated imports in test_collectors_util.py to include the new output classes.

This commit enhances the robustness of the utility by ensuring proper error handling for invalid input types.
- Extract shared sanity check logic into run_report_sanity_check() in conftest.py, eliminating the identical test_command body duplicated across test_complex_CCSP_sanity_check.py and test_complex_CCSPv2_sanity_check.py
- Replace hardcoded /tmp path with the pytest tmp_path fixture in test_collection_output_raises_type_error_for_non_list to avoid a world-writable directory security warning

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The 4 static schema methods (unique_index_columns, data_columns, cast_types, operations) were character-identical between DataframeJobhostSummaryUsage and DataframeJobHostSummary, causing SonarCloud to report 75% duplication on new code in dataframe_jobhost_summary_usage.py. Move the method return values to shared constants in metric_utils.py (JOB_HOST_SUMMARY_INDEX_COLUMNS, JOB_HOST_SUMMARY_DATA_COLUMNS, JOB_HOST_SUMMARY_CAST_TYPES, JOB_HOST_SUMMARY_OPERATIONS) and replace the multi-line method bodies with single-line constant references in both classes. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Duplication (5.4% → target ≤3%):
- Introduce DataframeSchemaMixin in dataframe_schema.py with the 4 stub methods (cast_types/data_columns/operations/unique_index_columns)
- Introduce JobHostSummarySchema(DataframeSchemaMixin) with the shared job-host-summary schema overrides
- Base and BaseTraditional inherit from DataframeSchemaMixin, removing the duplicated stub bodies from both files
- DataframeJobhostSummaryUsage and DataframeJobHostSummary inherit from JobHostSummarySchema, removing the duplicated one-liner bodies from both files

Coverage (78.1% → target ≥90%):
- test_dataframe_stubs.py: cover DataframeSchemaMixin and JobHostSummarySchema directly in dataframe_schema.py
- test_summarize_merge_ops.py: cover combine_set/combine_json/combine_json_values lambda branches and the validate='many_to_many' path in both Base.merge() and BaseTraditional.merge()
- test_coverage_gaps.py: cover BaseAnonymizedRollup.base(), ManagementUtility.get_commands(), ReportSaverS3.report_exist(), StorageCRC._put() RuntimeError, EventModulesAnonymizedRollup host_ids branch, PackageCRC.is_shipping_configured() super() call

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
/cursor review
Actionable comments posted: 3
🧹 Nitpick comments (5)
metrics_utility/test/library/test_summarize_merge_ops.py (2)
57-91: Merge tests pass but don't exercise many-to-many semantics. Both tests use disjoint single-row indexes, which is effectively a one-to-one outer join — they confirm `merge()` doesn't crash and returns 2 rows, but they don't actually validate the many-to-many path. Fine to keep as a smoke test for the SonarCloud fix; just noting that duplicate keys on both sides would more directly cover the validate parameter behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@metrics_utility/test/library/test_summarize_merge_ops.py` around lines 57 - 91, Update the two tests (test_base_traditional_merge_many_to_many and test_base_engine_merge_many_to_many) so they use duplicate keys on both sides to actually exercise many-to-many semantics: build df1 and df2 with repeated 'id' values (e.g., id duplicated) so the merge triggers the many-to-many path in Base.merge()/ _ConcreteTraditional.merge and assert the expected expanded row count (e.g., 4 for two duplicates on each side) instead of using disjoint single-row indexes.
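Assuming standard pandas semantics, duplicate keys on both sides show what `validate` actually checks (the frames here are illustrative, not the project's dataframes):

```python
import pandas as pd

left = pd.DataFrame({'id': [1, 1], 'x': ['a', 'b']})
right = pd.DataFrame({'id': [1, 1], 'y': ['c', 'd']})

# validate='many_to_many' permits duplicate keys on both sides;
# the outer join expands the 2x2 duplicate ids into 4 rows.
merged = pd.merge(left, right, on='id', how='outer', validate='many_to_many')
assert len(merged) == 4

# validate='one_to_one' raises MergeError on the same input, surfacing
# unexpected duplicates instead of silently multiplying rows.
try:
    pd.merge(left, right, on='id', how='outer', validate='one_to_one')
    raised = False
except pd.errors.MergeError:
    raised = True
assert raised
```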
30-33: Weak assertion for `combine_json_values`. The test only verifies key `'k'` is present, not that values from both inputs are actually combined. Given `combine_json_values` builds a set per key, a stronger assertion exercises the real behavior:

♻️ Proposed stronger assertion
```diff
-def test_summarize_combine_json_values():
-    df = pd.DataFrame({'col_x': [{'k': 'v1'}], 'col_y': [{'k': 'v2'}]})
-    result = _bt().summarize_merged_dataframes(df, ['col'], operations={'col': 'combine_json_values'})
-    assert 'k' in result['col'].iloc[0]
+def test_summarize_combine_json_values():
+    df = pd.DataFrame({'col_x': [{'k': 'v1'}], 'col_y': [{'k': 'v2'}]})
+    result = _bt().summarize_merged_dataframes(df, ['col'], operations={'col': 'combine_json_values'})
+    assert result['col'].iloc[0] == {'k': {'v1', 'v2'}}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@metrics_utility/test/library/test_summarize_merge_ops.py` around lines 30 - 33: The test test_summarize_combine_json_values has a weak assertion that only checks the presence of key 'k' but not that values from both input dicts were combined; update the assertion on the result of `_bt().summarize_merged_dataframes` (operation 'combine_json_values') to assert that the value for 'k' contains both 'v1' and 'v2' (e.g., that the set or list at `result['col'].iloc[0]['k']` equals or includes both values) so the test verifies the actual combining behavior.

metrics_utility/library/dataframes/base_traditional.py (1)
14-14: Mixin defaults silently return empty schemas. `DataframeSchemaMixin` returns `{}`/`[]` for all schema methods, so any subclass of `BaseTraditional` that forgets to override (e.g.) `unique_index_columns()` will silently merge/cast on empty keys and produce wrong rollups instead of failing fast. The previous local placeholders presumably had the same property, so this isn't a regression — but worth either documenting the contract on the mixin or having the defaults raise `NotImplementedError` (or using `abc.abstractmethod`) to surface mistakes early.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@metrics_utility/library/dataframes/base_traditional.py` at line 14: DataframeSchemaMixin currently returns empty dicts/lists which lets subclasses like BaseTraditional silently use empty schemas; change the mixin so its schema accessor methods (e.g., unique_index_columns, index_columns, schema_fields — whatever schema methods exist on DataframeSchemaMixin) either raise NotImplementedError by default or, better, mark them as abstract using abc.abstractmethod so concrete subclasses must implement them; update DataframeSchemaMixin accordingly and ensure BaseTraditional (and its subclasses) implement the required methods to avoid silent empty-schema behavior.

metrics_utility/automation_controller_billing/report_saver/report_saver_s3.py (1)
19-19: Avoid materializing the entire iterable just to check non-emptiness. If `list_files` returns an iterator/generator over potentially many S3 objects, `list(...)` walks all of them. Using `any()` short-circuits on the first item.

♻️ Proposed refactor
```diff
-        return len(list(self.s3_handler.list_files(self.report_spreadsheet_destination_path))) > 0
+        return any(True for _ in self.s3_handler.list_files(self.report_spreadsheet_destination_path))
```
Verify each finding against the current code and only fix it if needed. In `@metrics_utility/automation_controller_billing/report_saver/report_saver_s3.py` at line 19: The current check materializes the entire iterator by doing len(list(self.s3_handler.list_files(self.report_spreadsheet_destination_path))) > 0; replace this with a short-circuiting check using any(True for _ in self.s3_handler.list_files(self.report_spreadsheet_destination_path)) so the generator/iterator returned by list_files is not fully consumed (update the return expression in the method that contains this line).

metrics_utility/test/library/test_storage_s3.py (1)
92-94: LGTM — covers the new `ValueError` contract. Asserting both the exception type and `match='bucket not set'` pins down the public failure mode of `StorageS3()` without a `bucket`.

One small consideration: the message in s3.py is `'StorageS3: bucket not set'` while other tests in the suite (e.g. test_storage_directory.py) likely match the prefixed form. Worth verifying the `match` regex is the intended substring across the storage test family for consistency.
Verify each finding against the current code and only fix it if needed. In `@metrics_utility/test/library/test_storage_s3.py` around lines 92 - 94, The test test_missing_bucket_raises_value_error asserts the error message is 'bucket not set' but the implementation in s3.py raises "StorageS3: bucket not set", causing an inconsistency with other storage tests; update the test_missing_bucket_raises_value_error to match the actual prefixed message (e.g., use match='StorageS3: bucket not set' or a regex that accepts the prefix) or alternatively change the raise message in StorageS3 (in s3.py) to the unprefixed form—ensure consistency across StorageS3, the test function test_missing_bucket_raises_value_error, and other storage tests like test_storage_directory.py.
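The short-circuit behavior behind the `report_exist` suggestion above can be seen with a stand-in generator (the file names and the `list_files` shape are illustrative, not the real S3 handler API):

```python
# Record how far the generator is consumed, standing in for a paginated
# S3 listing where every extra item fetched is wasted I/O.
consumed = []

def list_files():
    for key in ['reports/2024-01.xlsx', 'reports/2024-02.xlsx', 'reports/2024-03.xlsx']:
        consumed.append(key)
        yield key

# len(list(...)) walks every item just to test non-emptiness...
assert len(list(list_files())) > 0
assert len(consumed) == 3

# ...while any() stops at the first one.
consumed.clear()
assert any(list_files())
assert len(consumed) == 1
```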
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@metrics_utility/library/dataframes/base_traditional.py`:
- Line 79: The merge call using pd.merge(rollup, new_group,
on=self.unique_index_columns(), how='outer', validate='many_to_many') should
enforce uniqueness of the merge keys; change validate from 'many_to_many' to
'one_to_one' so that duplicates in rollup or new_group (which should have been
prevented by dedup()/regroup()) raise an error and prevent silent data
corruption; locate the pd.merge invocation that references rollup, new_group and
self.unique_index_columns() and update its validate argument accordingly,
keeping the outer merge semantics if that behavior is required.
In `@metrics_utility/library/storage/segment.py`:
- Around line 108-110: The conditional in StorageSegment that reads "if filename
or fileobj or dict is None:" conflates missing filename/fileobj with a missing
dict and produces a misleading message; change the check to explicitly test for
(filename is None and fileobj is None) and separately for (dict is None) so you
can raise distinct ValueError messages—one saying "StorageSegment: filename and
fileobj both missing; provide filename or fileobj" and the other saying
"StorageSegment: dict missing; use dict="—so locate the check in the
StorageSegment constructor or the function that validates inputs (references:
filename, fileobj, dict) and split the conditions accordingly to raise the
appropriate message for each missing parameter case.
In `@metrics_utility/test/library/test_storage_segment.py`:
- Around line 154-157: The test name and behavior are mismatched: update the
test in test_storage_segment.py so it actually verifies the "no dict" branch of
StorageSegment.put instead of the filename branch—remove the filename='foo.json'
argument from the call to storage.put(artifact_name='test', ...) so the call
exercises the missing dict path and keeps the pytest.raises(ValueError,
match='use dict=') assertion; alternatively, if you want to lock both branches,
add a separate test named test_put_with_filename_raises_value_error that passes
filename='foo.json' and asserts the same ValueError for the unsupported filename
branch.
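The fail-fast alternative suggested for `DataframeSchemaMixin` in the nitpick above can be sketched with `abc` (class and column names here are illustrative, not the project's real schema):

```python
from abc import ABC, abstractmethod

class DataframeSchemaMixin(ABC):
    # Abstract instead of returning {} / []: a subclass that forgets the
    # override fails at construction time rather than merging on empty keys.
    @abstractmethod
    def unique_index_columns(self):
        ...

class GoodDataframe(DataframeSchemaMixin):
    def unique_index_columns(self):
        return ['install_uuid', 'host_name']  # illustrative column names

class ForgotOverride(DataframeSchemaMixin):
    pass

ok = GoodDataframe().unique_index_columns()
assert ok == ['install_uuid', 'host_name']

# Instantiating the incomplete subclass raises TypeError immediately.
try:
    ForgotOverride()
    raised = False
except TypeError:
    raised = True
assert raised
```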
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 2cb04212-5a18-4677-8cb1-aed3c90f9fd9
📒 Files selected for processing (52)
- metrics_utility/anonymized_rollups/anonymized_rollups.py
- metrics_utility/anonymized_rollups/base_anonymized_rollup.py
- metrics_utility/anonymized_rollups/credentials_anonymized_rollup.py
- metrics_utility/anonymized_rollups/events_modules_anonymized_rollup.py
- metrics_utility/automation_controller_billing/dataframe_engine/base.py
- metrics_utility/automation_controller_billing/dataframe_engine/dataframe_jobhost_summary_usage.py
- metrics_utility/automation_controller_billing/dedup/renewal_guidance.py
- metrics_utility/automation_controller_billing/package/package_crc.py
- metrics_utility/automation_controller_billing/report/base.py
- metrics_utility/automation_controller_billing/report/report_ccsp_v2.py
- metrics_utility/automation_controller_billing/report_saver/report_saver_s3.py
- metrics_utility/base/package.py
- metrics_utility/dataframe_schema.py
- metrics_utility/library/collectors/controller/main_jobevent_service.py
- metrics_utility/library/collectors/others/total_workers_vcpu.py
- metrics_utility/library/collectors/util.py
- metrics_utility/library/dataframes/base_dataframe.py
- metrics_utility/library/dataframes/base_traditional.py
- metrics_utility/library/dataframes/job_host_summary.py
- metrics_utility/library/extractors.py
- metrics_utility/library/storage/crc.py
- metrics_utility/library/storage/directory.py
- metrics_utility/library/storage/s3.py
- metrics_utility/library/storage/segment.py
- metrics_utility/library/storage/util.py
- metrics_utility/library/utils.py
- metrics_utility/management_utility.py
- metrics_utility/metric_utils.py
- metrics_utility/test/base/functional/helpers.py
- metrics_utility/test/base/functional/test_gathering.py
- metrics_utility/test/base/functional/test_slicing.py
- metrics_utility/test/base/test_package.py
- metrics_utility/test/ccspv_reports/dedup/test_complex_CCSPv2_with_canonical_facts/test_complex_CCSPv2_with_canonical_facts.py
- metrics_utility/test/ccspv_reports/test_complex_CCSP_sanity_check.py
- metrics_utility/test/ccspv_reports/test_complex_CCSPv2_sanity_check.py
- metrics_utility/test/ccspv_reports/test_empty_data_for_CCSP_and_CCSPv2.py
- metrics_utility/test/ccspv_reports/test_invalid_data_for_CCSP_and_CCSPv2.py
- metrics_utility/test/conftest.py
- metrics_utility/test/dedup/test_factory.py
- metrics_utility/test/dedup/test_renewal_guidance.py
- metrics_utility/test/library/test_collectors_prometheus_client.py
- metrics_utility/test/library/test_collectors_util.py
- metrics_utility/test/library/test_coverage_gaps.py
- metrics_utility/test/library/test_dataframe_stubs.py
- metrics_utility/test/library/test_storage_crc.py
- metrics_utility/test/library/test_storage_directory.py
- metrics_utility/test/library/test_storage_s3.py
- metrics_utility/test/library/test_storage_segment.py
- metrics_utility/test/library/test_summarize_merge_ops.py
- metrics_utility/test/test_anonymized_rollups/test_jobs_anonymized_rollups.py
- metrics_utility/test/test_anonymized_rollups/test_multiple_files.py
- metrics_utility/test/test_helpers.py
💤 Files with no reviewable changes (3)
- metrics_utility/library/storage/util.py
- metrics_utility/test/ccspv_reports/test_invalid_data_for_CCSP_and_CCSPv2.py
- metrics_utility/test/dedup/test_factory.py
…o_one' and enhance error handling in StorageSegment

- Changed the merge validation parameter from 'many_to_many' to 'one_to_one' in the merge methods of the Base and BaseTraditional classes to ensure correct data merging behavior.
- Improved error handling in StorageSegment by refining the conditions for raising ValueErrors when invalid arguments are provided, ensuring that the 'dict' argument is required.
- Updated related tests to reflect the changes in error messages and validation logic.
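A hedged sketch of the refined StorageSegment validation described in this commit (the function name, signature, and exact messages are illustrative, reconstructed from the review comments, not the real implementation):

```python
def validate_put_args(filename=None, fileobj=None, dict=None):
    # StorageSegment only supports dict-based puts: reject the
    # unsupported filename/fileobj arguments with a distinct message...
    if filename is not None or fileobj is not None:
        raise ValueError('StorageSegment: filename/fileobj not supported; use dict=')
    # ...and separately require the dict argument itself.
    if dict is None:
        raise ValueError('StorageSegment: dict missing; use dict=')
    return dict

validated = validate_put_args(dict={'events': 1})
assert validated == {'events': 1}

try:
    validate_put_args()
    raised = False
except ValueError as e:
    raised = 'use dict=' in str(e)
assert raised
```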
Actionable comments posted: 3
🧹 Nitpick comments (6)
metrics_utility/test/base/functional/helpers.py (1)
64-67: Inconsistent with the S7508 cleanup applied elsewhere. The same `in files.keys()` idiom was simplified throughout test_gathering.py and test_slicing.py in this PR but left here. For consistency:

♻️ Proposed cleanup
```diff
 def assert_common_files(files):
-    assert './config.json' in files.keys()
-    assert './manifest.json' in files.keys()
-    assert './data_collection_status.csv' in files.keys()
+    assert './config.json' in files
+    assert './manifest.json' in files
+    assert './data_collection_status.csv' in files
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@metrics_utility/test/base/functional/helpers.py` around lines 64 - 67: In assert_common_files, remove the unnecessary .keys() calls to match the S7508 cleanup elsewhere: update the three assertions in the assert_common_files function to check membership directly in the mapping (e.g., assert './config.json' in files) instead of using in files.keys(); this keeps consistency with test_gathering.py and test_slicing.py.

metrics_utility/library/utils.py (1)
30-37: No positional callers found; however, consider preserving explicit parameter names for the public API. The search confirms all repository callers use keyword arguments (`db=`, `key=`, `value=`), so the `**_kwargs` signature is safe internally. Both functions are exported in library/__init__.py as public API. While the current change introduces no breakage within the repo, external consumers relying on the explicit signature could be affected. For a public API, preserving the parameter contract improves discoverability and reduces surprise:

♻️ Suggested alternative preserving the signature
```diff
-def last_gather(**_kwargs):
+def last_gather(_db=None, _key=None):
     log('library.utils last_gather')
     return None

-def save_last_gather(**_kwargs):
+def save_last_gather(_db=None, _key=None, _value=None):
     log('library.utils save_last_gather')
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@metrics_utility/library/utils.py` around lines 30 - 37: The public functions last_gather and save_last_gather were changed to accept only **_kwargs which can surprise external callers; restore explicit parameter names in their signatures (e.g., def last_gather(db=None, key=None, value=None, **_kwargs) and def save_last_gather(db=None, key=None, value=None, **_kwargs)) so the public API documents expected parameters while still allowing forward-compatible keyword captures; update the function docstrings/comments if present and keep the internal logging behavior intact (these functions are exported via library.__init__ as public API).

metrics_utility/automation_controller_billing/report_saver/report_saver_s3.py (1)
18-19: Prefer `any(...)` over `len(list(...)) > 0` to avoid materializing all keys. `s3_handler.list_files` is a generator that paginates over S3 (`paginator.paginate(...)`); wrapping it in `list(...)` forces enumeration of every object under the prefix just to check existence, which is wasted I/O and memory when many files exist. Use `any(...)` to short-circuit on the first hit.

Proposed fix
```diff
-    def report_exist(self):
-        return len(list(self.s3_handler.list_files(self.report_spreadsheet_destination_path))) > 0
+    def report_exist(self):
+        return any(self.s3_handler.list_files(self.report_spreadsheet_destination_path))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@metrics_utility/automation_controller_billing/report_saver/report_saver_s3.py` around lines 18 - 19: The report_exist method currently materializes all keys by doing len(list(self.s3_handler.list_files(self.report_spreadsheet_destination_path))) > 0; change it to use a short-circuiting check with any(...) over the generator returned by self.s3_handler.list_files(self.report_spreadsheet_destination_path) so it returns True on the first found object and avoids building a full list.

metrics_utility/test/base/test_package.py (1)
11-23: Optional: prefer a real `Package` instance for `has_free_space` tests. Calling `Package.has_free_space(pkg, …)` with a `MagicMock(spec=Package)` works today, but it bypasses the actual `get_max_data_size` classmethod and only exercises the inequality. Constructing a real `Package(collector=Mock())` (or a thin subclass overriding `MAX_DATA_SIZE`) would also catch regressions in `get_max_data_size` itself.

Suggested approach
```diff
-def test_has_free_space_within_limit():
-    pkg = MagicMock(spec=Package)
-    pkg.total_data_size = 100
-    pkg.get_max_data_size.return_value = 200
-    assert Package.has_free_space(pkg, 99) is True
-    assert Package.has_free_space(pkg, 100) is True  # exactly at limit
+def test_has_free_space_within_limit(monkeypatch):
+    pkg = Package(collector=MagicMock())
+    pkg.total_data_size = 100
+    monkeypatch.setattr(Package, 'MAX_DATA_SIZE', 200)
+    assert pkg.has_free_space(99) is True
+    assert pkg.has_free_space(100) is True  # exactly at limit
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@metrics_utility/test/base/test_package.py` around lines 11 - 23: Tests use MagicMock(spec=Package) which bypasses Package.get_max_data_size behavior; replace the MagicMock-based pkg with a real Package instance (e.g., Package(collector=Mock()) or a small test subclass that sets MAX_DATA_SIZE) so that Package.has_free_space is exercised against the actual get_max_data_size logic in tests test_has_free_space_within_limit and test_has_free_space_exceeds_limit; ensure you set pkg.total_data_size appropriately and call Package.has_free_space(pkg, ...) as before.

metrics_utility/anonymized_rollups/anonymized_rollups.py (1)
278-278: Optional: drop the redundant ternary. `sorted(empty_set)` already returns `[]`, so the `if ansible_versions_set else []` guard is unnecessary.

♻️ Proposed simplification
```diff
-    return sorted(ansible_versions_set) if ansible_versions_set else []
+    return sorted(ansible_versions_set)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@metrics_utility/anonymized_rollups/anonymized_rollups.py` at line 278: The return expression currently uses a redundant ternary: "return sorted(ansible_versions_set) if ansible_versions_set else []"; simplify it by returning sorted(ansible_versions_set) directly since sorted(empty_set) already yields [], i.e., replace that return with "return sorted(ansible_versions_set)" referencing the ansible_versions_set variable and the existing sorted call.

metrics_utility/library/collectors/util.py (1)
22-23: `TypeError` is the right exception, but consider `isinstance` for subclass tolerance. `type(x) is not dict` / `type(x) is not list` rejects valid subclasses (e.g., `OrderedDict`, `collections.UserList`). If you want to keep the exact-type semantics, fine — otherwise prefer `isinstance` for forward-compatibility.

♻️ Optional refactor

```diff
-    if type(data) is not dict:
+    if not isinstance(data, dict):
         raise TypeError('data must be a dict, or None')

-    if type(filenames) is not list:
+    if not isinstance(filenames, list):
         raise TypeError('filenames must be a list, or None')
```

Also applies to: 40-41
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@metrics_utility/library/collectors/util.py` around lines 22 - 23, Replace strict type checks that use "type(x) is not dict" and "type(x) is not list" with isinstance-based checks so subclasses are accepted; specifically update the conditional guarding the variable "data" (currently using type(data) is not dict) to use "isinstance(data, dict)" and likewise update the check around the list variable (the check at lines ~40-41) to use isinstance(..., list) or, if more appropriate, collections.abc.Sequence/UserList depending on intended semantics, preserving the existing TypeError and error message text.
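The subclass-tolerance point in miniature:

```python
from collections import OrderedDict

data = OrderedDict(a=1)

# Exact-type check rejects dict subclasses...
assert type(data) is not dict

# ...while isinstance accepts them, which is usually what callers expect.
assert isinstance(data, dict)
```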
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@metrics_utility/metric_utils.py`:
- Around line 11-53: JobHostSummarySchema methods unique_index_columns(),
data_columns(), cast_types(), and operations() currently return direct
references to module-level mutable lists/dicts (JOB_HOST_SUMMARY_INDEX_COLUMNS,
JOB_HOST_SUMMARY_DATA_COLUMNS, JOB_HOST_SUMMARY_CAST_TYPES,
JOB_HOST_SUMMARY_OPERATIONS); change those methods to return immutable views
(e.g., convert lists to tuples and dicts to types.MappingProxyType) or return
shallow copies (list(...) and dict(...)) so callers cannot mutate the shared
module-level objects and corrupt global schema state.
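A sketch of the suggested immutable-view approach; only the constant and method names come from the comment above, and the values are invented for illustration:

```python
import types

# Hypothetical values for the module-level schema constants named above.
JOB_HOST_SUMMARY_INDEX_COLUMNS = ['job_id', 'host_name']
JOB_HOST_SUMMARY_CAST_TYPES = {'job_id': 'int64'}

class JobHostSummarySchema:
    @staticmethod
    def unique_index_columns():
        # Tuple: an immutable view over the shared list constant.
        return tuple(JOB_HOST_SUMMARY_INDEX_COLUMNS)

    @staticmethod
    def cast_types():
        # mappingproxy: a read-only view over the shared dict constant.
        return types.MappingProxyType(JOB_HOST_SUMMARY_CAST_TYPES)

view = JobHostSummarySchema.cast_types()
try:
    view['job_id'] = 'float64'
    mutated = True
except TypeError:
    mutated = False  # write attempts fail, the constant stays intact
```

Shallow copies (`list(...)`, `dict(...)`) achieve the same isolation at the cost of an allocation per call; the read-only views make accidental mutation fail loudly instead.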
In `@metrics_utility/test/library/test_summarize_merge_ops.py`:
- Around line 31-34: The test test_summarize_combine_json_values uses a weak
assertion that only checks for key presence; update it to assert the exact
expected merged structure from summarize_merged_dataframes when using the
combine_json_values operation (e.g., verify result['col'].iloc[0] equals the
expected combined JSON object/collection), so the test asserts both keys and
values produced by combine_json_values and will fail if values are dropped or
combined incorrectly.
- Around line 19-34: Add a test that exercises multiple columns in one call to
ensure the lambda closure bug in summarize_merged_dataframes is prevented:
create a DataFrame with multiple pairs like 'a_x'/'a_y', 'b_x'/'b_y',
'c_x'/'c_y', call _bt().summarize_merged_dataframes(df, ['a','b','c'],
operations=ops) where ops =
{'a':'combine_json','b':'combine_set','c':'combine_json_values'}, and assert
each resulting column (accessed via result['a'], result['b'], result['c']) has
the expected merged values; name the test
test_summarize_lambda_branches_bind_per_column so it fails if the loop variable
`col` in summarize_merged_dataframes (and any lambda like lambda row: ...) is
not properly bound per-iteration.
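The closure bug such a test would guard against can be reproduced in isolation (generic handler names here, not the project's code):

```python
def make_handlers_buggy(columns):
    handlers = {}
    for col in columns:
        # BUG (python:S1515): `col` is resolved when the lambda is *called*,
        # so every handler sees the loop's final value.
        handlers[col] = lambda row: f'{col}:{row}'
    return handlers

def make_handlers_fixed(columns):
    handlers = {}
    for col in columns:
        # Fix: a default argument (the `c=col` pattern from this PR) binds
        # the current value at definition time.
        handlers[col] = lambda row, c=col: f'{c}:{row}'
    return handlers

buggy_a = make_handlers_buggy(['a', 'b'])['a'](1)   # 'b:1' - wrong column
fixed_a = make_handlers_fixed(['a', 'b'])['a'](1)   # 'a:1'
```

With a single column the buggy and fixed versions behave identically, which is exactly why the suggested test exercises several columns in one call.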
---
Nitpick comments:
In `@metrics_utility/anonymized_rollups/anonymized_rollups.py`:
- Line 278: The return expression currently uses a redundant ternary: "return
sorted(ansible_versions_set) if ansible_versions_set else []"; simplify it by
returning sorted(ansible_versions_set) directly since sorted(empty_set) already
yields [], i.e., replace that return with "return sorted(ansible_versions_set)"
referencing the ansible_versions_set variable and the existing sorted call.
In `@metrics_utility/automation_controller_billing/report_saver/report_saver_s3.py`:
- Around line 18-19: The report_exist method currently materializes all keys by
doing
len(list(self.s3_handler.list_files(self.report_spreadsheet_destination_path)))
> 0; change it to use a short-circuiting check with any(...) over the generator
returned by self.s3_handler.list_files(self.report_spreadsheet_destination_path)
so it returns True on the first found object and avoids building a full list.
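The difference can be demonstrated with a stand-in generator; the `list_files` shape is assumed from the comment, and the real S3 handler is not involved:

```python
pulled = {'count': 0}

def fake_list_files(prefix):
    # Stand-in for s3_handler.list_files (name assumed from the comment):
    # a generator over S3 keys that counts how many keys get fetched.
    for i in range(1000):
        pulled['count'] += 1
        yield f'{prefix}/report-{i}.xlsx'

# Eager: materializes every key before checking emptiness.
exists_eager = len(list(fake_list_files('reports'))) > 0
eager_pulled = pulled['count']          # all 1000 keys consumed

pulled['count'] = 0
# Lazy: any() short-circuits on the first truthy key.
exists_lazy = any(fake_list_files('reports'))
lazy_pulled = pulled['count']           # stops after 1 key
```

Both forms return the same boolean; `any(...)` just stops at the first object instead of paginating the whole prefix.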
In `@metrics_utility/library/collectors/util.py`:
- Around line 22-23: Replace strict type checks that use "type(x) is not dict"
and "type(x) is not list" with isinstance-based checks so subclasses are
accepted; specifically update the conditional guarding the variable "data"
(currently using type(data) is not dict) to use "isinstance(data, dict)" and
likewise update the check around the list variable (the check at lines ~40-41)
to use isinstance(..., list) or, if more appropriate,
collections.abc.Sequence/UserList depending on intended semantics, preserving
the existing TypeError and error message text.
In `@metrics_utility/library/utils.py`:
- Around line 30-37: The public functions last_gather and save_last_gather were
changed to accept only **_kwargs which can surprise external callers; restore
explicit parameter names in their signatures (e.g., def last_gather(db=None,
key=None, value=None, **_kwargs) and def save_last_gather(db=None, key=None,
value=None, **_kwargs)) so the public API documents expected parameters while
still allowing forward-compatible keyword captures; update the function
docstrings/comments if present and keep the internal logging behavior intact
(these functions are exported via library.__init__ as public API).
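A hedged sketch of the suggested signature; the parameter names come from the comment itself, and the body is illustrative only:

```python
import logging

def last_gather(db=None, key=None, value=None, **_kwargs):
    # Explicit names document the public API; **_kwargs still absorbs
    # unknown keywords for forward compatibility. Body is illustrative.
    logging.debug('last_gather db=%r key=%r', db, key)
    return {'db': db, 'key': key, 'value': value, 'extra': _kwargs}

res = last_gather(db='sqlite', key='since', unexpected=True)
```

Callers using the documented keywords keep working, and an extra keyword like `unexpected=True` is captured rather than raising `TypeError`.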
In `@metrics_utility/test/base/functional/helpers.py`:
- Around line 64-67: In assert_common_files, remove the unnecessary .keys()
calls to match the S7508 cleanup elsewhere: update the three assertions in the
assert_common_files function to check membership directly in the mapping (e.g.,
assert './config.json' in files) instead of using in files.keys(); this keeps
consistency with test_gathering.py and test_slicing.py.
In `@metrics_utility/test/base/test_package.py`:
- Around line 11-23: Tests use MagicMock(spec=Package) which bypasses
Package.get_max_data_size behavior; replace the MagicMock-based pkg with a real
Package instance (e.g., Package(collector=Mock()) or a small test subclass that
sets MAX_DATA_SIZE) so that Package.has_free_space is exercised against the
actual get_max_data_size logic in tests test_has_free_space_within_limit and
test_has_free_space_exceeds_limit; ensure you set pkg.total_data_size
appropriately and call Package.has_free_space(pkg, ...) as before.
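To illustrate why `MagicMock(spec=Package)` sidesteps the real logic, here is a reduced, hypothetical `Package` containing only the members the comment mentions:

```python
from unittest.mock import MagicMock

class Package:
    # Hypothetical stand-in for the real class, reduced for illustration.
    MAX_DATA_SIZE = 100

    @classmethod
    def get_max_data_size(cls):
        return cls.MAX_DATA_SIZE

    def has_free_space(self, size):
        return self.total_data_size + size <= self.get_max_data_size()

class SmallPackage(Package):
    # A real subclass exercises get_max_data_size for real.
    MAX_DATA_SIZE = 10

pkg = SmallPackage()
pkg.total_data_size = 8
within = pkg.has_free_space(2)    # 8 + 2 <= 10
exceeds = pkg.has_free_space(3)   # 8 + 3 > 10

# With a spec'd mock, the method call returns another MagicMock, which is
# always truthy - no size check ever runs.
mock_pkg = MagicMock(spec=Package)
always_truthy = bool(mock_pkg.has_free_space(10**9))
```

The mock-based assertion would pass regardless of the size passed in, which is the gap the comment asks the tests to close.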
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 9212e7f3-e340-4b40-aecb-a59972eb54f1
📒 Files selected for processing (52)
- `metrics_utility/anonymized_rollups/anonymized_rollups.py`
- `metrics_utility/anonymized_rollups/base_anonymized_rollup.py`
- `metrics_utility/anonymized_rollups/credentials_anonymized_rollup.py`
- `metrics_utility/anonymized_rollups/events_modules_anonymized_rollup.py`
- `metrics_utility/automation_controller_billing/dataframe_engine/base.py`
- `metrics_utility/automation_controller_billing/dataframe_engine/dataframe_jobhost_summary_usage.py`
- `metrics_utility/automation_controller_billing/dedup/renewal_guidance.py`
- `metrics_utility/automation_controller_billing/package/package_crc.py`
- `metrics_utility/automation_controller_billing/report/base.py`
- `metrics_utility/automation_controller_billing/report/report_ccsp_v2.py`
- `metrics_utility/automation_controller_billing/report_saver/report_saver_s3.py`
- `metrics_utility/base/package.py`
- `metrics_utility/dataframe_schema.py`
- `metrics_utility/library/collectors/controller/main_jobevent_service.py`
- `metrics_utility/library/collectors/others/total_workers_vcpu.py`
- `metrics_utility/library/collectors/util.py`
- `metrics_utility/library/dataframes/base_dataframe.py`
- `metrics_utility/library/dataframes/base_traditional.py`
- `metrics_utility/library/dataframes/job_host_summary.py`
- `metrics_utility/library/extractors.py`
- `metrics_utility/library/storage/crc.py`
- `metrics_utility/library/storage/directory.py`
- `metrics_utility/library/storage/s3.py`
- `metrics_utility/library/storage/segment.py`
- `metrics_utility/library/storage/util.py`
- `metrics_utility/library/utils.py`
- `metrics_utility/management_utility.py`
- `metrics_utility/metric_utils.py`
- `metrics_utility/test/base/functional/helpers.py`
- `metrics_utility/test/base/functional/test_gathering.py`
- `metrics_utility/test/base/functional/test_slicing.py`
- `metrics_utility/test/base/test_package.py`
- `metrics_utility/test/ccspv_reports/dedup/test_complex_CCSPv2_with_canonical_facts/test_complex_CCSPv2_with_canonical_facts.py`
- `metrics_utility/test/ccspv_reports/test_complex_CCSP_sanity_check.py`
- `metrics_utility/test/ccspv_reports/test_complex_CCSPv2_sanity_check.py`
- `metrics_utility/test/ccspv_reports/test_empty_data_for_CCSP_and_CCSPv2.py`
- `metrics_utility/test/ccspv_reports/test_invalid_data_for_CCSP_and_CCSPv2.py`
- `metrics_utility/test/conftest.py`
- `metrics_utility/test/dedup/test_factory.py`
- `metrics_utility/test/dedup/test_renewal_guidance.py`
- `metrics_utility/test/library/test_collectors_prometheus_client.py`
- `metrics_utility/test/library/test_collectors_util.py`
- `metrics_utility/test/library/test_coverage_gaps.py`
- `metrics_utility/test/library/test_dataframe_stubs.py`
- `metrics_utility/test/library/test_storage_crc.py`
- `metrics_utility/test/library/test_storage_directory.py`
- `metrics_utility/test/library/test_storage_s3.py`
- `metrics_utility/test/library/test_storage_segment.py`
- `metrics_utility/test/library/test_summarize_merge_ops.py`
- `metrics_utility/test/test_anonymized_rollups/test_jobs_anonymized_rollups.py`
- `metrics_utility/test/test_anonymized_rollups/test_multiple_files.py`
- `metrics_utility/test/test_helpers.py`
💤 Files with no reviewable changes (3)
- metrics_utility/library/storage/util.py
- metrics_utility/test/ccspv_reports/test_invalid_data_for_CCSP_and_CCSPv2.py
- metrics_utility/test/dedup/test_factory.py
- Return list/dict copies from JobHostSummarySchema methods to prevent
callers from accidentally mutating the shared module-level constants
- Strengthen test_summarize_combine_json_values to assert the full
{'k': {'v1', 'v2'}} structure rather than just key presence
- Add test_summarize_lambda_branches_bind_per_column: exercises all
three lambda operations (combine_json, combine_set, combine_json_values)
across multiple columns in a single call — the only scenario that would
catch a regression of the c=col closure-capture fix (S1515)
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Summary

Fix SonarCloud issues in the `ansible/metrics-utility` project:
- `super()` comparison (python:S3403), method name clash (python:S1845)
- `validate` param in `pd.merge` (S6735)
- `list()` before `sorted()` (S7508), regex character class simplification (S6353)

Test plan
🤖 Generated with Claude Code
Note
Medium Risk
Touches core rollup merge logic (`pd.merge` validation) and packaging/shipping configuration checks, which could surface previously-silent data issues or alter runtime failure modes. Changes are mostly mechanical/defensive with expanded test coverage, but affect critical aggregation and upload paths.

Overview
Primarily refactors and hardens metrics/billing utilities to satisfy SonarCloud findings, including safer pandas merge semantics and schema reuse across dataframe implementations.
Key behavior changes include: enforcing `pd.merge(..., validate='one_to_one')` for rollup merges, fixing lambda capture-over-loop-variable in merge summarization, and centralizing job-host-summary dataframe schema via the new `DataframeSchemaMixin`/`JobHostSummarySchema` (plus shared constants in `metric_utils`).

Also cleans up/standardizes error handling and APIs: replaces generic `Exception` with `TypeError`/`ValueError`/`RuntimeError` in collectors/storage, fixes `PackageCRC.is_shipping_configured()` to call `super().is_shipping_configured()`, renames `Package.max_data_size()` to `Package.get_max_data_size()`, simplifies set/list sorting unions, and tweaks report/test helpers and coverage (new targeted unit tests and consolidated report sanity checks).

Reviewed by Cursor Bugbot for commit 18dffa5. Bugbot is set up for automated code reviews on this repo. Configure here.
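The stricter merge semantics mentioned above can be seen in a small pandas sketch (toy frames, not the project's rollup data):

```python
import pandas as pd

left = pd.DataFrame({'host': ['h1', 'h2'], 'runs': [3, 5]})
right_dup = pd.DataFrame({'host': ['h1', 'h1'], 'org': ['a', 'b']})

try:
    # validate='one_to_one' makes pandas check merge-key uniqueness on
    # both sides and raise instead of silently multiplying rows.
    pd.merge(left, right_dup, on='host', validate='one_to_one')
    raised = False
except pd.errors.MergeError:
    raised = True  # duplicate 'h1' keys violate one-to-one

# Unique keys on both sides pass the check unchanged.
right_ok = pd.DataFrame({'host': ['h1', 'h2'], 'org': ['a', 'b']})
merged = pd.merge(left, right_ok, on='host', validate='one_to_one')
```

Without `validate`, the duplicate-key merge would quietly produce three rows, which is exactly the kind of previously-silent data issue the risk note flags.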