```go
// Azure's top-level Collector gathers them via Collect → GaugeVec.Collect (same pattern as pkg/azure/aks).
type metrics struct {
	StorageGauge *prometheus.GaugeVec
	// Planned future work: operation request rate (parity with S3/GCS cloudcost_*_operation_by_location_usd_per_krequest).
```
We might already have that with existing metrics, see https://ops.grafana-ops.net/goto/ffio5ap805csgf?orgId=stacks-27821
But it depends on what kind of operation rate we are talking about :)
Good to know. I'll keep that in mind when digging into the next metric. Thanks!
```go
for _, svc := range config.Services {
	switch strings.ToUpper(svc) {
	case "AKS":
		svc = strings.TrimSpace(svc)
```
Do we need to call TrimSpace() here again? I think the new expandAzureServices() already trims the entries before the services are passed to azure.New().
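For illustration, a minimal sketch of what a helper like `expandAzureServices` might look like if it trims once up front (the name comes from the PR, but this body is an assumption, not the PR's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// expandAzureServices is a hypothetical sketch of the helper discussed above:
// it splits the comma-separated flag value and trims each entry once, so
// azure.New() receives clean names and no second TrimSpace is needed later.
func expandAzureServices(raw string) []string {
	if strings.TrimSpace(raw) == "" {
		return nil // no services configured: a no-op, not [""]
	}
	parts := strings.Split(raw, ",")
	out := make([]string, 0, len(parts))
	for _, p := range parts {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	fmt.Println(expandAzureServices(" aks, blob "))
}
```

If the helper behaves like this, the extra `TrimSpace` in the switch is indeed redundant.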
```go
SubscriptionId: cfg.Providers.Azure.SubscriptionId,
Services:       cfg.Providers.Azure.Services,
ScrapeInterval: cfg.Collector.ScrapeInterval,
Services:       expandAzureServices(cfg.Providers.Azure.Services),
```
Why not just use the inline pattern used by the other providers? E.g. `strings.Split(cfg.Providers.Azure.Services.String(), ",")`
Context:
There are two ways in which this could be argued to be better:
1. Whitespace: the split option turns `"thing, blob"` into `"thing", " blob"`, while this option yields `"thing", "blob"`. But since we trim whitespace later in this flow, that isn't a huge save.
2. Empty input: if `cfg.Providers.Azure.Services == ""`, the split option maps it to `[""]`, while this option returns `nil`. With the split option this eventually leads to "unknown service" being logged; with `nil` it would be a no-op (and could be made to emit a metric or log something more communicative).
Suggestion:
AWS and GCP already use the string-split approach and we don't have problems there, so let's do that; it also means we can pull the test out.
Future:
That said, I think this would be better served as a helper utility we incorporate across the codebase to avoid the extra confusing log line (because `""` isn't an unknown service, it is no service). The utility-ification of this could be done in a future PR.
> ## Planned behavior (see issue #54):
>
> ### Metric shape (AWS S3 and GCP GCS in this repo)
>
> - **Storage rate:** Same naming pattern as S3 and GCS: `storage_by_location_usd_per_gibyte_hour` under the service subsystem (here `azure_blob` → `cloudcost_azure_blob_storage_by_location_usd_per_gibyte_hour`). Implemented today as `cloudcost_aws_s3_*` in `pkg/aws/s3/s3.go` and `cloudcost_gcp_gcs_*` in `pkg/google/metrics/metrics.go`.
> - **Operations rate (optional):** S3 and GCS also define `operation_by_location_usd_per_krequest` (`pkg/aws/s3/s3.go`, `pkg/google/metrics/metrics.go`). **Not registered in the exporter yet** (see Planned future work under [Cost metrics](#cost-metrics)). A Blob Collector may expose an analogous gauge when the chosen Azure dataset supports it.
> - **Labels:** S3 uses `region` and `class` for storage (and `tier` for operations); GCS uses `location` and `storage_class` (and operation class labels for ops). Azure label sets would follow whatever dimensions the Cost Management query returns; the above collectors show the cross-cloud “by location + storage class/tier” pattern only.
>
> ### Azure Cost Management
>
> - **Query API:** Azure documentation mentions `POST https://management.azure.com/{scope}/providers/Microsoft.CostManagement/query` and states that `{scope}` includes **`/subscriptions/{subscriptionId}/`** for subscription scope. Sample responses in that snapshot include cost columns such as **`PreTaxCost`** alongside dimensions like **`ResourceLocation`** in example filters.
> - **Permissions:** Azure documentation states that you must have **Cost Management permissions at the appropriate scope** to use Cost Management APIs (the captured article links to Microsoft’s assign-access documentation for details). No specific Azure role name is asserted here.
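A small sketch of building the subscription-scope query URL quoted above; the function name and shape are illustrative (the request itself would be a POST with a JSON query body and an Authorization header, neither shown here):

```go
package main

import "fmt"

// costQueryURL builds the subscription-scope Cost Management query endpoint
// from the documented shape. Hypothetical helper, not the PR's code.
func costQueryURL(subscriptionID string) string {
	scope := "/subscriptions/" + subscriptionID
	return "https://management.azure.com" + scope + "/providers/Microsoft.CostManagement/query"
}

func main() {
	fmt.Println(costQueryURL("00000000-0000-0000-0000-000000000000"))
}
```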
I'd remove future work plans from this doc. The doc is meant to describe what functionality currently exists. We could add this info to the issue though!
> ## Status
>
> The Blob Collector is registered when you pass `blob` in `--azure.services` (matching is case-insensitive). The collector **does not query Azure Cost Management yet**. The storage cost `GaugeVec` appears in `Describe` but is **not** `MustRegister`'d on the root registry (same pattern as `azure_aks`); **`Collect` forwards** `StorageGauge.Collect(ch)` to the parent Azure gatherer. **There are no samples** until **`Collect` calls `Set`** on label values (not implemented yet).
Suggested change:

```diff
- The Blob Collector is registered when you pass `blob` in `--azure.services` (matching is case-insensitive). The collector **does not query Azure Cost Management yet**. The storage cost `GaugeVec` appears in `Describe` but is **not** `MustRegister`'d on the root registry (same pattern as `azure_aks`); **`Collect` forwards** `StorageGauge.Collect(ch)` to the parent Azure gatherer. **There are no samples** until **`Collect` calls `Set`** on label values (not implemented yet).
+ Pass blob to -azure.services to enable this collector (matching is case-insensitive).
```
Let's use active voice here. We can iterate on this doc; let's not add what the collector doesn't do.
> |-------------|------|--------|---------|
> | `cloudcost_azure_blob_storage_by_location_usd_per_gibyte_hour` | Gauge | `region`, `class` | None; exporter does not populate this metric yet (no billing API calls in `Collect`) |
>
> **Planned future work:** `cloudcost_azure_blob_operation_by_location_usd_per_krequest` (Gauge; `region`, `class`, `tier`), for parity with S3/GCS operation metrics; commented out in `pkg/azure/blob/blob.go` until operation pricing is implemented. How we group cost rows into labels is TBD when we add the API integration.
Same comment here about including future work.
For #54

Adds an Azure blob cost collector behind `-azure.services blob`, including metrics wiring that follows the same gatherer pattern as AKS (cost vec collected via `Collect` rather than separate registration). Azure startup now skips collectors that fail to construct, logging the error and continuing, matching AWS behavior so one bad service does not stop the whole provider.

How to check it locally:
First run the exporter:
Then scrape the metrics:
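The actual commands were not captured above; a minimal local check might look like this (the binary name, provider flag, and listen port are assumptions, not taken from the PR):

```shell
# Run the exporter with only the Azure blob collector enabled
# (binary name, -provider flag, and port are assumed).
./cloudcost-exporter -provider azure -azure.services blob

# In another terminal, scrape the metrics endpoint and look for the blob gauge.
curl -s http://localhost:8080/metrics | grep cloudcost_azure_blob
```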