Fix: Standardise Collector Error Propagation

## Background

Several collectors silently swallow errors during scrapes: they log a warning, skip the affected resource, and return `nil` from `Collect()`. From the exporter's perspective the scrape succeeded, so `cloudcost_exporter_collector_error` never increments and SLO dashboards show the collector as healthy. Hundreds of VMs or volumes can go unpriced with no alertable signal.

The `collectormetrics` wrapper in `pkg/gatherer` already increments the error counter whenever `Collect()` returns a non-nil error, so the infrastructure for observable failures exists. The gap is that collectors need to surface errors rather than absorb them.

> **Note:** This issue concerns operational metrics, i.e. metrics that describe the health of the exporter itself (e.g. `cloudcost_exporter_collector_scrape_errors_total`), not the cost-rate metrics that collectors emit.

---

## Option Definitions (from #869)

**Option A - Return Error, Provider Logs and Skips**
At init, `New()` returns an error; the provider logs it and skips the collector. At scrape, `Collect()` returns an error and the provider logs it. Other collectors are unaffected in both phases. Failures are observable only via logs.

**Option B - Return Error, Log, Skip, and Increment a Metric** (**Chosen option**)
Same as A, but the provider also increments an error counter labelled by collector name. If the EC2 collector fails to initialize, the AWS provider logs the error, skips EC2, and increments `cloudcost_exporter_collector_init_errors_total{collector="ec2"}`. An alert can fire on that counter without any log monitoring.

**Option C - Never Fail, Defer All Errors**
`New()` always succeeds. `Collect()` re-emits stale cached values or serves background-refreshed data instead of returning an error. A broken collector silently appears healthy.

**Option D - Fail Fast, Fail the Provider or Scrape**
Any collector failure fails the entire provider at init, or fails the entire scrape at scrape time.

---

## Problem

Several collectors currently implement Option C: they swallow errors in `Collect()` and return `nil`, so the `cloudcost_exporter_collector_error` counter never increments. SLO dashboards show these collectors as healthy even when they silently fail to price resources.

The `collectormetrics.Collect()` wrapper in `pkg/gatherer` already increments the error counter when `Collect()` returns a non-nil error. The gap is entirely on the collector side: collectors need to return errors rather than swallow them.
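On the collector side, one way to stop swallowing errors without dropping the resources that did price is to accumulate per-resource failures and return them joined with `errors.Join` (Go 1.20+). `Resource`, `priceLookup`, and `collect` are hypothetical names; this is a sketch of the pattern, not the existing collector code.

```go
package main

import (
	"errors"
	"fmt"
)

// Resource is a hypothetical priced resource (e.g. a VM or volume).
type Resource struct {
	ID   string
	Type string
}

// priceLookup is a hypothetical pricing function; it fails for unknown types.
func priceLookup(r Resource) (float64, error) {
	prices := map[string]float64{"m5.large": 0.096}
	p, ok := prices[r.Type]
	if !ok {
		return 0, fmt.Errorf("no price for %s (%s)", r.ID, r.Type)
	}
	return p, nil
}

// collect emits a cost value for every resource it can price. Instead of
// logging and returning nil, it accumulates per-resource failures and
// returns them joined, so the wrapper can increment the error counter.
func collect(resources []Resource, emit func(id string, price float64)) error {
	var errs []error
	for _, r := range resources {
		p, err := priceLookup(r)
		if err != nil {
			errs = append(errs, err)
			continue
		}
		emit(r.ID, p)
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	emitted := 0
	err := collect([]Resource{
		{ID: "i-1", Type: "m5.large"},
		{ID: "i-2", Type: "unknown"},
	}, func(string, float64) { emitted++ })
	fmt.Println(emitted, err) // prints "1 no price for i-2 (unknown)"
}
```

Treating any unpriced resource as a collector error is a design choice: it keeps partial results flowing while still making the failure alertable through the existing counter.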

## Possibly Related Issues
- https://github.com/grafana/cloudcost-exporter/issues/105
- https://github.com/grafana/deployment_tools/issues/529664
