CLI: use library collectors by himdel · Pull Request #339 · ansible/metrics-utility

himdel · 2026-03-03T07:06:53Z

use metrics_utility.library.collectors from the cli - via cli_* wrappers in the original collector location

cli collectors get since, until, output params, passed to library, and used to specify collector output

exposed all exisiting collectors in the cli
limit_slicer becomes until_slicer, using until over now()
full_sync removed, unused
register description removed, unused

collector outputs - library collectors now get an output= param,
defaulting to either DataframeOutput, where output.sql(db, query) returns a pandas dataframe,
or to DictOutput, where output.dict({...}) returns a dict.

for the CLI calls, CollectorOutput gets passed instead, initialized with a temporary path for files,
and outputs a filelist from output.sql, and a dict from output.dict.

this allows preserving the csv streaming for the cli, while keeping the
collector output a dataframe for anonymized

Summary by CodeRabbit

New Features
- CLI-style collectors with since/until slicing and pluggable output handlers; enhanced total-workers metric with optional metering.
Refactor
- Consolidated collector wiring and standardized output/gather interfaces.
Bug Fixes
- Improved environment boolean parsing and more robust token/CA handling and error messages.
Behavior change
- Collection results now include collection_timestamp (renamed from timestamp).
Tests
- Expanded and updated suites covering collectors, outputs, metering, and utilities.

Note

Medium Risk
Medium risk due to broad refactors across the collector execution path (CLI wrappers, output plumbing, slicing, and last-gather persistence) and changes to the vCPU/Prometheus auth flow, which could affect what data gets collected and how tarballs are partitioned.

Overview
Billing collection is refactored to make the CLI use metrics_utility.library.collectors via new cli_* wrappers that accept (since, until, output), with a new CollectionOutput adapter to keep CSV streaming for tarball generation while allowing library collectors to default to pandas (DataframeOutput) or dict (DictOutput) outputs.

The legacy SQL/copy logic in automation_controller_billing/collectors.py is removed and replaced by thin wrappers around library collectors, slicing is adjusted (limit_slicing → until_slicing storing at until-1s), and “full sync” support plus unused @register metadata (descriptions/shipping groups) are dropped.

Environment parsing is centralized with bool_from_env, Collector/Collection gathering APIs are simplified (no max_data_size argument), and total_workers_vcpu is reworked to use the library implementation with explicit Kubernetes token/CA handling and updated timestamp formatting; associated validation and tests are updated/added accordingly.

^{Written by Cursor Bugbot for commit 2315891. This will update automatically on new commits. Configure here.}

use metrics_utility.library.collectors, via cli_* wrappers cli collectors get since, until, output params, passed to library, and used to specify collector output exposed all exisiting collectors in the cli limit_slicer becomes until_slicer, using until over now() full_sync removed, unused register description removed, unused Issue: AAP-66077

library collectors now get an output= param, defaulting to either `DataframeOutput`, where `output.sql(db, query)` returns a pandas dataframe, or to `DictOutput`, where `output.dict({...})` returns a dict. for the CLI calls, `CollectorOutput` gets passed instead, initialized with a temporary path for files, and outputs a filelist from output.sql, and a dict from output.dict. this allows preserving the csv streaming for the cli, while keeping the collector output a dataframe for anonymized

…olerance)

…the since/until interval

sonarqubecloud · 2026-03-09T12:55:28Z

Quality Gate passed

Issues
7 New issues
0 Accepted issues

Measures
0 Security Hotspots
93.1% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (5)

metrics_utility/test/library/test_collectors_execution_environments.py (2)

21-22: Docstring refers to old function name.

The docstring still mentions "copy_table" but the test now patches _copy_table_pandas. Consider updating for consistency.

📝 Suggested fix

 `@patch`('metrics_utility.library.collectors.util._copy_table_pandas')
 def test_execution_environments_calls_copy_table(mock_copy_pandas):
-    """Test that execution_environments calls copy_table with correct parameters."""
+    """Test that execution_environments calls _copy_table_pandas with correct parameters."""

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@metrics_utility/test/library/test_collectors_execution_environments.py`
around lines 21 - 22, Update the test docstring to reflect the current patched
function name: replace the outdated reference to "copy_table" with
"_copy_table_pandas" in the docstring of
test_execution_environments_calls_copy_table so the description matches the
mocked symbol used in the test.

46-46: Inconsistent mock return value type.

This mock returns an empty list [] while other tests return pd.DataFrame(). For consistency and to catch type-related issues, consider returning an empty DataFrame.

📝 Suggested fix

-    mock_copy_pandas.return_value = []
+    mock_copy_pandas.return_value = pd.DataFrame()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@metrics_utility/test/library/test_collectors_execution_environments.py` at
line 46, The test sets mock_copy_pandas.return_value to an empty list which is
inconsistent with other tests that return pandas DataFrame objects; change the
mock to return an empty DataFrame (pd.DataFrame()) instead, ensuring the test
file imports pandas as pd if not already and update any assertions that assume a
DataFrame type to remain valid; target the mock named mock_copy_pandas in the
test_collectors_execution_environments.py test.

metrics_utility/test/library/test_collectors_main_indirectmanagednodeaudit.py (1)

26-27: Docstring refers to old function name.

Similar to other test files, the docstring mentions "copy_table" but the test patches _copy_table_pandas.

📝 Suggested fix

 `@patch`('metrics_utility.library.collectors.util._copy_table_pandas')
 def test_main_indirectmanagednodeaudit_calls_copy_table(mock_copy_pandas):
-    """Test that main_indirectmanagednodeaudit calls copy_table."""
+    """Test that main_indirectmanagednodeaudit calls _copy_table_pandas."""

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@metrics_utility/test/library/test_collectors_main_indirectmanagednodeaudit.py`
around lines 26 - 27, The test's docstring is out-of-date: update the docstring
in test_main_indirectmanagednodeaudit_calls_copy_table to reflect the actual
patched function name (_copy_table_pandas) instead of "copy_table"; locate the
test by its name and the mock parameter mock_copy_pandas and change the
docstring text to accurately describe that the test verifies
main_indirectmanagednodeaudit calls _copy_table_pandas.

metrics_utility/base/collector.py (1)

193-196: Redundant conditional that is always false.

last_gather is set to horizon on line 193, making the condition last_gather < horizon on line 194 always false. This appears to be leftover code from when last_gather was loaded from persistent storage.

♻️ Suggested simplification

-        last_gather = horizon
-        if last_gather < horizon:
-            last_gather = horizon
-            logger.warning(f'Last analytics run was more than {get_max_gather_period_days()} days prior to {until}, using {horizon} instead.')
+        last_gather = horizon

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@metrics_utility/base/collector.py` around lines 193 - 196, The code assigns
last_gather = horizon and then checks if last_gather < horizon which can never
be true; remove the redundant conditional and its logger.warning call (or, if
intended, restore the original persistent-load logic instead of the direct
assignment). Update the block around last_gather and horizon (variables
last_gather, horizon and the logger.warning/get_max_gather_period_days() call)
so it either only assigns last_gather = horizon or implements the persisted-load
path correctly.

metrics_utility/library/collectors/others/total_workers_vcpu.py (1)

114-123: Consider using a custom exception class instead of generic Exception.

The _get method raises generic Exception for both HTTP errors and API errors. This makes it harder for callers to distinguish error types and implement specific handling.

Proposed improvement

+class PrometheusError(Exception):
+    """Base exception for Prometheus client errors."""
+    pass
+
+class PrometheusHTTPError(PrometheusError):
+    """HTTP-level error from Prometheus."""
+    pass
+
+class PrometheusAPIError(PrometheusError):
+    """API-level error from Prometheus."""
+    pass
+
     def _get(self, url, params):
         response = self.session.get(url, params=params, timeout=self.timeout)
         if response.status_code != 200:
-            raise Exception(f'HTTP error {response.status_code}: {response.text}')
+            raise PrometheusHTTPError(f'HTTP error {response.status_code}: {response.text}')

         data = response.json()
         if data.get('status') != 'success':
-            raise Exception(f'Prometheus API error: {data.get("error", "Unknown error")}')
+            raise PrometheusAPIError(f'Prometheus API error: {data.get("error", "Unknown error")}')

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@metrics_utility/library/collectors/others/total_workers_vcpu.py` around lines
114 - 123, The _get method currently raises generic Exception for HTTP and
Prometheus API errors; add two specific exception classes (e.g.,
HttpRequestError and PrometheusApiError) in this module or a shared metrics
exceptions module and raise them instead of Exception inside _get (replace the
HTTP branch that checks response.status_code and the API-status branch that
checks data.get('status') with raises of HttpRequestError and PrometheusApiError
respectively, including status code and response.text or the API error message
in the exception message) so callers can catch and handle these distinct error
types.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@metrics_utility/base/utils.py`:
- Around line 29-40: The bool_from_env helper currently coerces any non-accepted
string to False; change bool_from_env to only accept '1' or 'true'
(case-insensitive) as True and '0' or 'false' as False, return default only when
the variable is unset, and when the variable is set to any other value raise a
ValueError that includes the env var name and the offending value so
typos/trailing whitespace are surfaced (locate and update the bool_from_env
function).

In `@metrics_utility/test/gather/test_gather_ranges.py`:
- Line 67: The warning message currently hardcodes "days" and produces "1 days"
when get_max_gather_period_days() returns 1; update the message construction in
the collector (where the warning at metrics_utility/base/collector.py is
built—refer to get_max_gather_period_days() and the function/method that logs
the warning) to pluralize correctly (use "day" when the value === 1, otherwise
"days"), and then update the test assertion in
metrics_utility/test/gather/test_gather_ranges.py (the assertion expecting the
string with "1 days") to expect "1 day" when the max period is 1.

In `@metrics_utility/test/library/test_collectors_total_workers_vcpu_helpers.py`:
- Around line 211-230: The test currently encodes and asserts double-counting
for multi-series Prometheus responses; update
test_get_cpu_timeline_multiple_series to reflect correct behavior of
get_cpu_timeline by expecting merged/normalized results instead of flat
concatenation — specifically, ensure the test asserts unique timestamps (e.g.,
len(result) == 2) and that values are combined/normalized per timestamp (or
otherwise assert timestamp uniqueness), so the helper is validated for merging
multi-series responses rather than allowing duplicate timeline entries.

---

Nitpick comments:
In `@metrics_utility/base/collector.py`:
- Around line 193-196: The code assigns last_gather = horizon and then checks if
last_gather < horizon which can never be true; remove the redundant conditional
and its logger.warning call (or, if intended, restore the original
persistent-load logic instead of the direct assignment). Update the block around
last_gather and horizon (variables last_gather, horizon and the
logger.warning/get_max_gather_period_days() call) so it either only assigns
last_gather = horizon or implements the persisted-load path correctly.

In `@metrics_utility/library/collectors/others/total_workers_vcpu.py`:
- Around line 114-123: The _get method currently raises generic Exception for
HTTP and Prometheus API errors; add two specific exception classes (e.g.,
HttpRequestError and PrometheusApiError) in this module or a shared metrics
exceptions module and raise them instead of Exception inside _get (replace the
HTTP branch that checks response.status_code and the API-status branch that
checks data.get('status') with raises of HttpRequestError and PrometheusApiError
respectively, including status code and response.text or the API error message
in the exception message) so callers can catch and handle these distinct error
types.

In `@metrics_utility/test/library/test_collectors_execution_environments.py`:
- Around line 21-22: Update the test docstring to reflect the current patched
function name: replace the outdated reference to "copy_table" with
"_copy_table_pandas" in the docstring of
test_execution_environments_calls_copy_table so the description matches the
mocked symbol used in the test.
- Line 46: The test sets mock_copy_pandas.return_value to an empty list which is
inconsistent with other tests that return pandas DataFrame objects; change the
mock to return an empty DataFrame (pd.DataFrame()) instead, ensuring the test
file imports pandas as pd if not already and update any assertions that assume a
DataFrame type to remain valid; target the mock named mock_copy_pandas in the
test_collectors_execution_environments.py test.

In
`@metrics_utility/test/library/test_collectors_main_indirectmanagednodeaudit.py`:
- Around line 26-27: The test's docstring is out-of-date: update the docstring
in test_main_indirectmanagednodeaudit_calls_copy_table to reflect the actual
patched function name (_copy_table_pandas) instead of "copy_table"; locate the
test by its name and the mock parameter mock_copy_pandas and change the
docstring text to accurately describe that the test verifies
main_indirectmanagednodeaudit calls _copy_table_pandas.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 84603860-0928-4299-b9a3-ed028c3219f6

📥 Commits

Reviewing files that changed from the base of the PR and between 2315891 and adb85fd.

📒 Files selected for processing (59)

metrics_utility/automation_controller_billing/collector.py
metrics_utility/automation_controller_billing/collectors.py
metrics_utility/automation_controller_billing/helpers.py
metrics_utility/automation_controller_billing/kubernetes_client.py
metrics_utility/automation_controller_billing/prometheus_client.py
metrics_utility/base/collection.py
metrics_utility/base/collection_data_status.py
metrics_utility/base/collection_manifest.py
metrics_utility/base/collector.py
metrics_utility/base/decorators.py
metrics_utility/base/package.py
metrics_utility/base/utils.py
metrics_utility/library/collectors/controller/config.py
metrics_utility/library/collectors/controller/controller_version_service.py
metrics_utility/library/collectors/controller/credentials_service.py
metrics_utility/library/collectors/controller/execution_environments.py
metrics_utility/library/collectors/controller/job_host_summary.py
metrics_utility/library/collectors/controller/job_host_summary_service.py
metrics_utility/library/collectors/controller/main_host.py
metrics_utility/library/collectors/controller/main_indirectmanagednodeaudit.py
metrics_utility/library/collectors/controller/main_jobevent.py
metrics_utility/library/collectors/controller/main_jobevent_service.py
metrics_utility/library/collectors/controller/table_metadata.py
metrics_utility/library/collectors/controller/unified_jobs.py
metrics_utility/library/collectors/others/prometheus_client.py
metrics_utility/library/collectors/others/total_workers_vcpu.py
metrics_utility/library/collectors/util.py
metrics_utility/library/csv_file_splitter.py
metrics_utility/logger.py
metrics_utility/management/validation.py
metrics_utility/test/base/functional/collector_module.py
metrics_utility/test/base/functional/collector_module2.py
metrics_utility/test/base/functional/collector_module3.py
metrics_utility/test/base/functional/collector_module4_slicing.py
metrics_utility/test/base/functional/helpers.py
metrics_utility/test/base/functional/test_slicing.py
metrics_utility/test/base/test_utils.py
metrics_utility/test/gather/test_gather_ranges.py
metrics_utility/test/gather/test_jobhostsummary_gather.py
metrics_utility/test/gather/test_total_workers_vcpu.py
metrics_utility/test/gather/test_total_workers_vcpu_token_handling.py
metrics_utility/test/library/test_collectors_execution_environments.py
metrics_utility/test/library/test_collectors_job_host_summary.py
metrics_utility/test/library/test_collectors_job_host_summary_service.py
metrics_utility/test/library/test_collectors_main_host.py
metrics_utility/test/library/test_collectors_main_indirectmanagednodeaudit.py
metrics_utility/test/library/test_collectors_main_jobevent.py
metrics_utility/test/library/test_collectors_main_jobevent_service.py
metrics_utility/test/library/test_collectors_prometheus_client.py
metrics_utility/test/library/test_collectors_total_workers_vcpu.py
metrics_utility/test/library/test_collectors_total_workers_vcpu_helpers.py
metrics_utility/test/library/test_collectors_unified_jobs.py
metrics_utility/test/library/test_collectors_util_helpers.py
metrics_utility/test/library/test_collectors_util_output.py
metrics_utility/test/test_automation_controller_billing_helpers.py
metrics_utility/test/test_collectors.py
metrics_utility/test/test_kubernetes_client.py
metrics_utility/test/test_prometheus_client.py
workers/0-gather-controller-crc.py

💤 Files with no reviewable changes (7)

metrics_utility/automation_controller_billing/prometheus_client.py
metrics_utility/automation_controller_billing/kubernetes_client.py
metrics_utility/library/collectors/others/prometheus_client.py
metrics_utility/test/test_kubernetes_client.py
metrics_utility/base/decorators.py
metrics_utility/test/test_prometheus_client.py
workers/0-gather-controller-crc.py

🚧 Files skipped from review as they are similar to previous changes (16)

metrics_utility/base/collection_manifest.py
metrics_utility/test/library/test_collectors_unified_jobs.py
metrics_utility/test/base/test_utils.py
metrics_utility/test/library/test_collectors_util_helpers.py
metrics_utility/test/library/test_collectors_prometheus_client.py
metrics_utility/library/collectors/controller/main_host.py
metrics_utility/test/gather/test_total_workers_vcpu.py
metrics_utility/test/gather/test_total_workers_vcpu_token_handling.py
metrics_utility/test/library/test_collectors_job_host_summary.py
metrics_utility/library/collectors/controller/unified_jobs.py
metrics_utility/test/library/test_collectors_job_host_summary_service.py
metrics_utility/automation_controller_billing/collector.py
metrics_utility/test/library/test_collectors_util_output.py
metrics_utility/library/collectors/controller/main_indirectmanagednodeaudit.py
metrics_utility/library/collectors/controller/config.py
metrics_utility/test/library/test_collectors_total_workers_vcpu.py

cshiels-ie · 2026-03-09T13:23:06Z

LGTM So much deleted code !