
fix(aws): skip failing RDS and VPC collectors instead of aborting provider #863

Draft
stephan-rayner wants to merge 1 commit into main from stephan-rayner/fix/aws-collector-init

Conversation

@stephan-rayner
Contributor

@stephan-rayner stephan-rayner commented Mar 27, 2026

Summary

  • RDS and VPC collector init failures previously returned nil, err, killing the entire AWS provider and dropping all metrics from every other service
  • Aligns RDS and VPC error handling with S3 and EC2: log the error at ERROR level and continue so remaining collectors stay intact

Fixes #716

Test plan

  • make test passes
  • Verify that a pricing config error for RDS/VPC no longer prevents S3/EC2 metrics from being exported

@stephan-rayner stephan-rayner requested a review from a team March 27, 2026 04:50
@stephan-rayner stephan-rayner self-assigned this Mar 27, 2026
@stephan-rayner stephan-rayner added bug Something isn't working area/monitoring labels Mar 27, 2026
fix(aws): skip failing RDS and VPC collectors instead of aborting provider

  createAWSConfig failures during RDS and VPC collector init returned
  nil, err, killing the entire AWS provider. S3 and EC2 already log
  and continue on init failure. Align RDS and VPC with the same pattern
  so a pricing config error for one collector does not drop metrics from
  all others.

  Also makes createAWSConfig a package-level var to allow injection in
  tests. Adds unit tests asserting that a createAWSConfig failure for
  RDS or VPC skips that collector while leaving the remaining collectors
  intact.

  Fixes #716
@stephan-rayner stephan-rayner force-pushed the stephan-rayner/fix/aws-collector-init branch from 5e41c46 to 36aeb94 on March 27, 2026 04:56
@leonorfmartins
Contributor

hm, I'm more unsure about this one. I agree that we should not block other collectors from progressing if one of the collectors fails to be initialised. But just logging an error and continuing will not be enough, because then our error rate SLOs will never fire and we won't know a collector broke in the first place.
What if, on top of the change you are proposing, we return an error from each collector's initialisation? That way we will still know when a collector fails to be initialised, but healthy collectors will keep doing their thing.

@stephan-rayner
Contributor Author

> hm, I'm more unsure about this one. I agree that we should not block other collectors from progressing if one of the collectors fails to be initialised. But just logging an error and continuing will not be enough, because then our error rate SLOs will never fire and we won't know a collector broke in the first place. What if, on top of the change you are proposing, we return an error from each collector's initialisation? That way we will still know when a collector fails to be initialised, but healthy collectors will keep doing their thing.

I like that @leonorfmartins! It was kind of a coin flip for this change. A little more than half of the resources we look at do it the way in the PR, and a little less than half do it the way you are suggesting.

Upon further reflection, and after having slept on it, I agree with you!

Development

Successfully merging this pull request may close these issues.

AWS: Handle collector errors gracefully