Summary
NewPricingStore discards the error from PopulatePricingMap at startup and returns the store regardless. The natgateway and MSK collectors start with empty pricing data and emit no metrics, with no scrape error raised.
Symptoms
Log output at startup:
time=... level=ERROR msg="error populating pricing map" error="..."
Observable effects:
- Natgateway and MSK collectors emit no metrics after startup when the pricing API is unavailable
- No scrape error metric is raised - collectors appear healthy
- The empty state persists until the next 24-hour background refresh succeeds
Root Cause
// pkg/aws/pricingstore/pricing_store.go
func NewPricingStore(...) *PricingStore {
	store.current.Store(&priceSnapshot{byRegion: make(map[string]map[string]float64)})
	err := store.PopulatePricingMap(ctx)
	if err != nil {
		store.logger.Error("error populating pricing map", "error", err)
	}
	return store // always returns, even on failure
}
NewPricingStore always returns the store. natgateway.New and msk.New receive it without any indication that population failed. Additionally, the background refresh goroutine in natgateway.New silently drops errors (MSK already logs them correctly):
// natgateway.New - refresh goroutine drops error
pricingStore.PopulatePricingMap(ctx) // return value ignored
Proposed Fix
- Change NewPricingStore to return (*PricingStore, error), propagating the error from PopulatePricingMap.
- Update natgateway.New and msk.New to return (*Collector, error) and wrap the pricing store error.
- Fix the background refresh goroutine in natgateway.New to log errors (matches MSK).
- Update aws.go to handle errors from natgateway.New and msk.New with log-and-continue, matching the existing pattern for S3 and EC2.
Scope: AWS only. pkg/aws/pricingstore is used exclusively by the natgateway and MSK collectors. GCP and Azure are out of scope.
Affected Files
| File | Change |
| --- | --- |
| pkg/aws/pricingstore/pricing_store.go | NewPricingStore returns (*PricingStore, error) |
| pkg/aws/natgateway/natgateway.go | New returns (*Collector, error); log refresh errors |
| pkg/aws/msk/msk.go | New returns (*Collector, error) |
| pkg/aws/aws.go | Handle errors from natgateway.New and msk.New |