Skip to content

Azure: Add blob collector stub#893

Open
suntala wants to merge 6 commits intomainfrom
suntala/azure-object-storage
Open

Azure: Add blob collector stub#893
suntala wants to merge 6 commits intomainfrom
suntala/azure-object-storage

Conversation

@suntala
Copy link
Copy Markdown
Contributor

@suntala suntala commented Apr 10, 2026

For #54

Adds an Azure blob cost collector behind -azure.services blob, including metrics wiring that follows the same gatherer pattern as AKS (cost vec collected via Collect rather than separate registration). Azure startup now skips collectors that fail to construct—logging the error and continuing—matching AWS behavior so one bad service does not stop the whole provider.

How to check it locally:

First run the exporter:

go run ./cmd/exporter/exporter.go \
  -provider azure \
  -azure.subscription-id "$(az account show --query id -o tsv)" \
  -azure.services blob \
  -log.level=info \
  -log.type=text

Then scrape the metrics:

curl -s localhost:8080/metrics | grep -E 'azure_blob|collector='

@suntala suntala force-pushed the suntala/azure-object-storage branch from 66153e2 to 90e8621 Compare April 10, 2026 09:43
@suntala suntala force-pushed the suntala/azure-object-storage branch from ce363d2 to 60d583d Compare April 10, 2026 10:24
@suntala suntala force-pushed the suntala/azure-object-storage branch from 6d9ba2a to bbd2b42 Compare April 10, 2026 11:50
@suntala suntala force-pushed the suntala/azure-object-storage branch from bbd2b42 to 94ec06b Compare April 10, 2026 12:28
// Azure's top-level Collector gathers them via Collect → GaugeVec.Collect (same pattern as pkg/azure/aks).
type metrics struct {
StorageGauge *prometheus.GaugeVec
// Planned future work: operation request rate (parity with S3/GCS cloudcost_*_operation_by_location_usd_per_krequest).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might already have that with existing metrics, see https://ops.grafana-ops.net/goto/ffio5ap805csgf?orgId=stacks-27821

but it depends what kind of operation rate we are talking about :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know. I'll keep that in mind when digging into the next metric. Thanks!

@suntala suntala changed the title Azure: Add stub blob collector Azure: Add blob collector stub Apr 10, 2026
@suntala suntala marked this pull request as ready for review April 10, 2026 16:10
@suntala suntala requested a review from a team April 10, 2026 16:11
for _, svc := range config.Services {
switch strings.ToUpper(svc) {
case "AKS":
svc = strings.TrimSpace(svc)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to TrimSpace() here again? I think the new expandAzureServices() already trims it before the services are passed to azure.New()

SubscriptionId: cfg.Providers.Azure.SubscriptionId,
Services: cfg.Providers.Azure.Services,
ScrapeInterval: cfg.Collector.ScrapeInterval,
Services: expandAzureServices(cfg.Providers.Azure.Services),
Copy link
Copy Markdown
Contributor

@rooneyshuman rooneyshuman Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just use the inline pattern used by the other providers? eg, strings.Split(cfg.Providers.Azure.Services.String(), ","),

Copy link
Copy Markdown
Contributor

@stephan-rayner stephan-rayner Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context:
There are two ways in which this could be argued to be better:

If we get whitespace in the list split would do this "thing, blob" => "thing", " blob" while this option would do "thing, blob" => "thing", "blob". But since latter in this flow we trim white space this isn't a huge save.

If for some reason cfg.Providers.Azure.Services == "" then this will be mapped to [""] in the string split option and nil in this option. In the split option this will lead to "unknown service" being logged eventually... In the case of nil being returned it would be a no-op (could be made to send a metric or log or something more communicative)

Suggestion:
AWS and GCP already use the string split approach and we don't have problems there, so lets do that, it also means we can pull the test out.

Future:
That said, I think this would be better served as a helper utility we would incorporate across the codebase to avoid the extra confusing log line (because "" isn't an unknown service it is no service). I think the utility-ification of this could be done in a future PR.

Comment on lines +7 to +18
## Planned behavior (see issue #54):

### Metric shape (AWS S3 and GCP GCS in this repo)

- **Storage rate:** Same naming pattern as S3 and GCS: `storage_by_location_usd_per_gibyte_hour` under the service subsystem (here `azure_blob` → `cloudcost_azure_blob_storage_by_location_usd_per_gibyte_hour`). Implemented today as `cloudcost_aws_s3_*` in `pkg/aws/s3/s3.go` and `cloudcost_gcp_gcs_*` in `pkg/google/metrics/metrics.go`.
- **Operations rate (optional):** S3 and GCS also define `operation_by_location_usd_per_krequest` (`pkg/aws/s3/s3.go`, `pkg/google/metrics/metrics.go`). **Not registered in the exporter yet** (see Planned future work under [Cost metrics](#cost-metrics)). A Blob Collector may expose an analogous gauge when the chosen Azure dataset supports it.
- **Labels:** S3 uses `region` and `class` for storage (and `tier` for operations); GCS uses `location` and `storage_class` (and operation class labels for ops). Azure label sets would follow whatever dimensions the Cost Management query returns; the above collectors show the cross-cloud “by location + storage class/tier” pattern only.

### Azure Cost Management

- **Query API:** Azure documentation mentions `POST https://management.azure.com/{scope}/providers/Microsoft.CostManagement/query` and states that `{scope}` includes **`/subscriptions/{subscriptionId}/`** for subscription scope. Sample responses in that snapshot include cost columns such as **`PreTaxCost`** alongside dimensions like **`ResourceLocation`** in example filters.
- **Permissions:** Azure documentation states that you must have **Cost Management permissions at the appropriate scope** to use Cost Management APIs (the captured article links to Microsoft’s assign-access documentation for details). No specific Azure role name is asserted here.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove future work plans from this doc. The docs is meant to describe what functionality currently exists. We could add this info to the issue though!


## Status

The Blob Collector is registered when you pass `blob` in `--azure.services` (matching is case-insensitive). The collector **does not query Azure Cost Management yet**. The storage cost `GaugeVec` appears in `Describe` but is **not** `MustRegister`'d on the root registry (same pattern as `azure_aks`); **`Collect` forwards** `StorageGauge.Collect(ch)` to the parent Azure gatherer. **There are no samples** until **`Collect` calls `Set`** on label values (not implemented yet).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The Blob Collector is registered when you pass `blob` in `--azure.services` (matching is case-insensitive). The collector **does not query Azure Cost Management yet**. The storage cost `GaugeVec` appears in `Describe` but is **not** `MustRegister`'d on the root registry (same pattern as `azure_aks`); **`Collect` forwards** `StorageGauge.Collect(ch)` to the parent Azure gatherer. **There are no samples** until **`Collect` calls `Set`** on label values (not implemented yet).
Pass blob to -azure.services to enable this collector (matching is case-insensitive).

Let's use active voice here. We can iterate on this doc, let's not add what the collector doesn't do

|-------------|------|--------|---------|
| `cloudcost_azure_blob_storage_by_location_usd_per_gibyte_hour` | Gauge | `region`, `class` | None; exporter does not populate this metric yet (no billing API calls in `Collect`) |

**Planned future work:** `cloudcost_azure_blob_operation_by_location_usd_per_krequest` (Gauge; `region`, `class`, `tier`) — parity with S3/GCS operation metrics; commented out in `pkg/azure/blob/blob.go` until operation pricing is implemented. How we group cost rows into labels is TBD when we add the API integration.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment here on including future work

SubscriptionId: cfg.Providers.Azure.SubscriptionId,
Services: cfg.Providers.Azure.Services,
ScrapeInterval: cfg.Collector.ScrapeInterval,
Services: expandAzureServices(cfg.Providers.Azure.Services),
Copy link
Copy Markdown
Contributor

@stephan-rayner stephan-rayner Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context:
There are two ways in which this could be argued to be better:

If we get whitespace in the list split would do this "thing, blob" => "thing", " blob" while this option would do "thing, blob" => "thing", "blob". But since latter in this flow we trim white space this isn't a huge save.

If for some reason cfg.Providers.Azure.Services == "" then this will be mapped to [""] in the string split option and nil in this option. In the split option this will lead to "unknown service" being logged eventually... In the case of nil being returned it would be a no-op (could be made to send a metric or log or something more communicative)

Suggestion:
AWS and GCP already use the string split approach and we don't have problems there, so lets do that, it also means we can pull the test out.

Future:
That said, I think this would be better served as a helper utility we would incorporate across the codebase to avoid the extra confusing log line (because "" isn't an unknown service it is no service). I think the utility-ification of this could be done in a future PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants