Commit 53a2437

azuremonitorreceiver: discover and collect custom metric namespaces (#49035)
## Description

The `MetricDefinitions` API only returns custom metric namespaces (e.g. `azure.vm.linux.guestmetrics` published by AMA/MetricsExtension) when the `metricnamespace` query parameter is explicitly set. Without it, only standard platform metrics are returned, so any metrics configured under `receiver::metrics` that belong to a custom namespace were silently dropped: the receiver never discovered or collected them.

### Root cause

`loadMetricsDefinitions` called `MetricDefinitionsClient.NewListPager(resourceID, nil)` (nil = no namespace filter). Azure Monitor returns definitions only for the resource's default namespace (e.g. `Microsoft.Compute/virtualMachines`). Custom namespaces like `azure.vm.linux.guestmetrics` require an explicit `metricnamespace` query parameter.

Additionally, the `Metrics` (values) API also requires the `metricnamespace` parameter when querying a non-default namespace, so `newResourceMetricsValuesRequestOptions` needed the same fix.

The batch scraper's `QueryResources` call had the same issue: it was passing `resourceType` as the namespace parameter instead of the actual metric namespace from the composite key.

### Changes

- **`scraper.go`**: Add `namespace` field to `metricsCompositeKey` so metrics from different namespaces are batched in separate `Metrics` API calls (Azure Monitor accepts only one namespace per request). Refactor `loadMetricsDefinitions` into a `collectMetricDefinitions` helper that accepts optional `MetricDefinitionsClientListOptions`. After the default (no-filter) call, make one additional call per namespace configured in `receiver.metrics` that was not already discovered, using the explicit `metricnamespace` parameter. Pass `compositeKey.namespace` to `newResourceMetricsValuesRequestOptions`.
- **`scraper_batch.go`**: Apply the same two-pass fix. Extract the pager loop into a `collectMetricDefinitionsByType` helper with `discoveredNamespaces` tracking. After the default call, make one additional namespace-filtered call per custom namespace in `cfg.Metrics` not already returned. Pass `compositeKey.namespace` instead of `resourceType` to `QueryResources` so the correct namespace is used in the batch values API call.
- **`scraper_test.go`**: Add `TestAzureScraperScrapeCustomNamespaceMetrics` covering the end-to-end flow: a VM with no standard metrics but a custom namespace metric (`azure.vm.linux.guestmetrics/disk/free_percent`) is fully discovered and collected.
- **`scraper_batch_test.go`**: Update mock data to use the actual namespace values from metric definitions (instead of the resource type string). Add `TestAzureScraperBatchScrapeCustomNamespaceMetrics` covering the same end-to-end flow for the batch scraper path.
- **`mocks_test.go`**: Fix the mock to return an empty success page (instead of a zero `PagerResponder`) when no matching key exists. The Azure SDK fake transport uses the URL path as the tracker key (ignoring query params), so without this fix the empty `PagerResponder` from the default call was reused for subsequent namespace-filtered calls, preventing the mock from being invoked again.
- **`testdata/expected_metrics/metrics_custom_namespace.yaml`**: Golden file for the new tests.
- **`.chloggen/40989-azuremonitor-custom-namespace-discovery.yaml`**: Changelog entry for the bug fix.
- **`README.md`**: Add a note explaining custom namespace API behavior, an example config for `azure.vm.linux.guestmetrics`, and a cardinality note in the API calls section.

### Testing

```
go test ./receiver/azuremonitorreceiver/... -v -run TestAzureScraperScrape
```

All existing tests continue to pass. The new tests `TestAzureScraperScrapeCustomNamespaceMetrics` and `TestAzureScraperBatchScrapeCustomNamespaceMetrics` cover the custom namespace discovery path for both scraper implementations.

### Related issues

This fix enables collection of metrics published by the Azure Monitor Agent (AMA) / MetricsExtension to custom namespaces, which is necessary for guest OS metrics (disk, memory, CPU from within the VM) that are not available through standard Azure platform metrics.

Fixes: #40989
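The two-pass discovery described under "Changes" can be sketched in isolation as follows. This is a minimal illustration, not the receiver's code: `definition`, `listFunc`, `discover`, and `fakeList` are hypothetical names, the Azure SDK pager is replaced by a plain function, and the `Percentage CPU` metric is only an illustrative platform metric. The configured namespaces are passed as a slice here for deterministic output, whereas the receiver ranges over the `cfg.Metrics` map.

```go
package main

import (
	"fmt"
	"strings"
)

// definition is a hypothetical, simplified stand-in for a metric definition.
type definition struct {
	Namespace string
	Name      string
}

// listFunc is a hypothetical stand-in for the MetricDefinitions list call.
// An empty namespace means "no metricnamespace filter".
type listFunc func(namespace string) []definition

// discover runs the default (no-filter) call first, then one filtered call per
// configured namespace that the default call did not already return.
func discover(list listFunc, configured []string) []definition {
	seen := map[string]struct{}{}
	defs := list("")
	for _, d := range defs {
		seen[strings.ToLower(d.Namespace)] = struct{}{}
	}
	for _, ns := range configured {
		if _, ok := seen[strings.ToLower(ns)]; ok {
			continue // already covered by the default call
		}
		defs = append(defs, list(ns)...)
	}
	return defs
}

// fakeList mimics the behavior described in the commit message: the unfiltered
// call returns only the default namespace; a custom namespace must be requested.
func fakeList(namespace string) []definition {
	switch strings.ToLower(namespace) {
	case "", "microsoft.compute/virtualmachines":
		return []definition{{Namespace: "Microsoft.Compute/virtualMachines", Name: "Percentage CPU"}}
	case "azure.vm.linux.guestmetrics":
		return []definition{{Namespace: "azure.vm.linux.guestmetrics", Name: "disk/free_percent"}}
	}
	return nil
}

func main() {
	configured := []string{"Microsoft.Compute/virtualMachines", "azure.vm.linux.guestmetrics"}
	for _, d := range discover(fakeList, configured) {
		fmt.Printf("%s :: %s\n", d.Namespace, d.Name)
	}
}
```

The lowercase comparison mirrors the `strings.ToLower` lookups in the diff, so a namespace configured with different letter case does not trigger a redundant filtered call.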
1 parent 3f592ab commit 53a2437

8 files changed

Lines changed: 393 additions & 23 deletions

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+change_type: bug_fix
+component: receiver/azure_monitor
+note: Fix discovery and collection of custom metric namespace definitions (e.g. `azure.vm.linux.guestmetrics` published by Azure Monitor Agent / MetricsExtension)
+issues: [40989]
+subtext: >
+  The MetricDefinitions API only returns custom namespace metrics when the
+  `metricnamespace` query parameter is explicitly set. Previously, metrics
+  configured under `receiver::metrics` for a custom namespace were silently
+  dropped because the API call used no filter and only returned the
+  resource's default namespace. The receiver now makes an additional
+  namespace-filtered call for each custom namespace in the `metrics` config
+  that was not returned by the default call.
+change_logs: [user]

receiver/azuremonitorreceiver/README.md

Lines changed: 19 additions & 0 deletions
@@ -71,6 +71,11 @@ It accepts a nested map where
 In this case, the scraper will fetch **all supported aggregations** for that metric, which is also the case if no
 `metrics` configuration is provided.
 > - **Case Insensitive**: The letter case of the Namespaces, Metric names, and Aggregations does not affect the functionality.
+> - **Custom metric namespaces**: Metrics published by the [Azure Monitor Agent (AMA)](https://learn.microsoft.com/en-us/azure/azure-monitor/agents/azure-monitor-agent-overview)
+or MetricsExtension (e.g. guest OS metrics under `azure.vm.linux.guestmetrics`) belong to a custom namespace that
+the MetricDefinitions API only returns when queried with an explicit `metricnamespace` filter. To collect these
+metrics, add the custom namespace as a top-level key under `metrics`; the receiver will issue the necessary
+namespace-filtered discovery call automatically.

 > [!WARNING]
 > If you started providing a `metrics` configuration for a namespace, you have to specify all the metrics and their
@@ -93,6 +98,18 @@ receivers:
         ActiveConnections: [] # metric ActiveConnections with all known aggregations (same effect than [*])
 ```

+Scraping guest OS metrics from a custom namespace (e.g. published by Azure Monitor Agent):
+
+```yaml
+receivers:
+  azure_monitor:
+    services:
+      - Microsoft.Compute/virtualMachines
+    metrics:
+      "azure.vm.linux.guestmetrics":
+        "filesystem % free space": [Average] # metric names depend on agent config; check MetricDefinitions API for your namespace
+```
+
 ### Use Batch API (experimental)

 There's two API to collect metrics in Azure Monitor:
@@ -248,6 +265,8 @@ conditions: always
 cardinality:
   - if use_batch_api is false, once per res id and *page of metrics def
   - if use_batch_api is true, once per res type and *page of metrics def
+  - one additional call per custom namespace configured under `metrics` that
+    was not returned by the default call (e.g. azure.vm.linux.guestmetrics)
 ```

 ### [Metrics - List](https://learn.microsoft.com/en-us/rest/api/monitor/metrics/list?view=rest-monitor-2023-10-01&tabs=HTTP)

receiver/azuremonitorreceiver/mocks_test.go

Lines changed: 20 additions & 3 deletions
@@ -59,11 +59,28 @@ func newMockResourcesListPager(resourcesPages []armresources.ClientListResponse)

 // newMockMetricsDefinitionListPager is a helper function to create a list pager for metrics definitions.
 // Don't use it, it's designed to be called in the newMockClientOptionsResolver ctor only.
+//
+// The map key is either a plain resourceURI (for the default no-filter call) or
+// "resourceURI::namespace" (for namespace-filtered calls). When a namespace-filtered
+// call is made and no namespace-specific key exists, an empty response is returned
+// rather than falling back to the plain-URI key, which prevents duplicate definitions.
 func newMockMetricsDefinitionListPager(metricDefinitionsPagesByResourceURI map[string][]armmonitor.MetricDefinitionsClientListResponse) func(resourceURI string, options *armmonitor.MetricDefinitionsClientListOptions) (resp azfake.PagerResponder[armmonitor.MetricDefinitionsClientListResponse]) {
-	return func(resourceURI string, _ *armmonitor.MetricDefinitionsClientListOptions) (resp azfake.PagerResponder[armmonitor.MetricDefinitionsClientListResponse]) {
+	return func(resourceURI string, options *armmonitor.MetricDefinitionsClientListOptions) (resp azfake.PagerResponder[armmonitor.MetricDefinitionsClientListResponse]) {
 		resourceURI = fmt.Sprintf("/%s", resourceURI) // Hack the fake API as it's not taking starting slash from called request
-		for _, page := range metricDefinitionsPagesByResourceURI[resourceURI] {
-			resp.AddPage(http.StatusOK, page, nil)
+		key := resourceURI
+		if options != nil && options.Metricnamespace != nil {
+			key = resourceURI + "::" + *options.Metricnamespace
+		}
+		pages, found := metricDefinitionsPagesByResourceURI[key]
+		if found {
+			for _, page := range pages {
+				resp.AddPage(http.StatusOK, page, nil)
+			}
+		} else {
+			// Return an empty success page so the fake transport removes this tracker entry,
+			// allowing subsequent calls with different query params (e.g. metricnamespace) to
+			// create a fresh tracker entry and invoke the mock handler again.
+			resp.AddPage(http.StatusOK, armmonitor.MetricDefinitionsClientListResponse{}, nil)
 		}
 		return resp
 	}
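The key scheme used by the mock above can be illustrated without the Azure SDK fake types. The sketch below is an assumption-laden simplification: `lookupPages` is a hypothetical helper, each page is modeled as a slice of metric names instead of a `MetricDefinitionsClientListResponse`, and the resource URI `/vm1` is made up. It shows only the lookup rule: plain URI for the unfiltered call, `resourceURI::namespace` for a filtered call, and a single empty page when no key matches.

```go
package main

import "fmt"

// lookupPages is a hypothetical helper mirroring the mock's key selection.
// A nil namespace models the default call with no metricnamespace filter.
func lookupPages(pagesByKey map[string][][]string, resourceURI string, namespace *string) [][]string {
	key := resourceURI
	if namespace != nil {
		key = resourceURI + "::" + *namespace
	}
	if pages, found := pagesByKey[key]; found {
		return pages
	}
	// No fallback to the plain-URI key: return one empty page instead,
	// so a filtered call never repeats the default definitions.
	return [][]string{{}}
}

func main() {
	custom := "azure.vm.linux.guestmetrics"
	unknown := "some.other.namespace"
	pages := map[string][][]string{
		"/vm1":                  {{"Percentage CPU"}},
		"/vm1::" + custom:       {{"disk/free_percent"}},
	}
	fmt.Println(lookupPages(pages, "/vm1", nil))
	fmt.Println(lookupPages(pages, "/vm1", &custom))
	fmt.Println(lookupPages(pages, "/vm1", &unknown))
}
```

Returning an explicit empty page rather than nothing corresponds to the `else` branch in the diff, which the commit message says is needed so the fake transport invokes the handler again for later filtered calls.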

receiver/azuremonitorreceiver/scraper.go

Lines changed: 55 additions & 9 deletions
@@ -78,6 +78,7 @@ type azureResource struct {
 }

 type metricsCompositeKey struct {
+	namespace    string
 	dimensions   string // comma separated sorted dimensions
 	aggregations string // comma separated sorted aggregations
 	timeGrain    string
@@ -483,8 +484,46 @@ func (s *azureScraper) loadMetricsDefinitions(ctx context.Context, subscriptionI
 		return
 	}

-	pager := clientMetricsDefinitions.NewListPager(resourceID, nil)
+	// discoveredNamespaces tracks namespaces returned by the default (no-filter) call
+	// so we can skip redundant namespace-filtered calls for those namespaces.
+	discoveredNamespaces := map[string]struct{}{}

+	s.collectMetricDefinitions(ctx, subscriptionID, resourceID, clientMetricsDefinitions, nil, discoveredNamespaces)
+
+	// The Azure Monitor MetricDefinitions API only returns custom metric namespace
+	// definitions (e.g. "azure.vm.linux.guestmetrics" published by AMA/MetricsExtension)
+	// when the metricnamespace query parameter is set explicitly. Make additional calls
+	// for each namespace configured in the metrics filter that was not already returned
+	// by the default call above.
+	for configNamespace := range s.cfg.Metrics {
+		if _, found := discoveredNamespaces[strings.ToLower(configNamespace)]; found {
+			continue
+		}
+		opts := &armmonitor.MetricDefinitionsClientListOptions{
+			Metricnamespace: to.Ptr(configNamespace),
+		}
+		s.collectMetricDefinitions(ctx, subscriptionID, resourceID, clientMetricsDefinitions, opts, nil)
+	}
+
+	s.resources[subscriptionID][resourceID].metricsDefinitionsUpdated = time.Now()
+	s.settings.Logger.Info("Loaded the list of Azure Metrics Definitions",
+		zap.Int("metrics_definitions_count", len(s.resources[subscriptionID][resourceID].metricsByCompositeKey)),
+		zap.String("resource_id", resourceID),
+		zap.String("subscription_id", subscriptionID))
+}
+
+// collectMetricDefinitions pages through a MetricDefinitions pager and registers each
+// metric definition into the resource's metricsByCompositeKey map.
+// discoveredNamespaces, when non-nil, is populated with the lowercased namespaces seen in
+// the response so callers can skip redundant follow-up calls.
+func (s *azureScraper) collectMetricDefinitions(
+	ctx context.Context,
+	subscriptionID, resourceID string,
+	client *armmonitor.MetricDefinitionsClient,
+	opts *armmonitor.MetricDefinitionsClientListOptions,
+	discoveredNamespaces map[string]struct{},
+) {
+	pager := client.NewListPager(resourceID, opts)
 	page := 0
 	for pager.More() {
 		nextResult, err := pager.NextPage(ctx)
@@ -505,27 +544,28 @@ func (s *azureScraper) loadMetricsDefinitions(ctx context.Context, subscriptionI

 		for _, v := range nextResult.Value {
 			metricName := *v.Name.Value
-			metricAggregations := getMetricAggregations(*v.Namespace, metricName, s.cfg.Metrics, convertAggregationsToStr(v.SupportedAggregationTypes))
+			metricNamespace := *v.Namespace
+
+			if discoveredNamespaces != nil {
+				discoveredNamespaces[strings.ToLower(metricNamespace)] = struct{}{}
+			}
+
+			metricAggregations := getMetricAggregations(metricNamespace, metricName, s.cfg.Metrics, convertAggregationsToStr(v.SupportedAggregationTypes))
 			if len(metricAggregations) == 0 {
 				continue
 			}

 			timeGrain := *v.MetricAvailabilities[0].TimeGrain
 			dimensions := filterDimensions(v.Dimensions, s.cfg.Dimensions, *s.resources[subscriptionID][resourceID].resourceType, metricName)
 			compositeKey := metricsCompositeKey{
+				namespace:    metricNamespace,
 				timeGrain:    timeGrain,
 				dimensions:   serializeDimensions(dimensions),
 				aggregations: strings.Join(metricAggregations, ","),
 			}
 			s.loadMetricsDefinition(subscriptionID, resourceID, metricName, compositeKey)
 		}
 	}
-
-	s.resources[subscriptionID][resourceID].metricsDefinitionsUpdated = time.Now()
-	s.settings.Logger.Info("Loaded the list of Azure Metrics Definitions",
-		zap.Int("metrics_definitions_count", len(s.resources[subscriptionID][resourceID].metricsByCompositeKey)),
-		zap.String("resource_id", resourceID),
-		zap.String("subscription_id", subscriptionID))
 }

 func (s *azureScraper) loadMetricsDefinition(subscriptionID, resourceID, metricName string, compositeKey metricsCompositeKey) {
@@ -577,6 +617,7 @@ func (s *azureScraper) loadMetricsValues(ctx context.Context, subscriptionID, re
 				compositeKey.dimensions,
 				compositeKey.timeGrain,
 				compositeKey.aggregations,
+				compositeKey.namespace,
 				start,
 				end,
 				s.cfg.MaximumNumberOfRecordsPerResource,
@@ -638,18 +679,23 @@ func newResourceMetricsValuesRequestOptions(
 	dimensionsStr string,
 	timeGrain string,
 	aggregationsStr string,
+	namespace string,
 	start int,
 	end int,
 	top int32,
 ) armmonitor.MetricsClientListOptions {
-	return armmonitor.MetricsClientListOptions{
+	opts := armmonitor.MetricsClientListOptions{
 		Metricnames: to.Ptr(strings.Join(metrics[start:end], ",")),
 		Interval:    to.Ptr(timeGrain),
 		Timespan:    to.Ptr(timeGrain),
 		Aggregation: to.Ptr(aggregationsStr),
 		Top:         to.Ptr(top),
 		Filter:      buildDimensionsFilter(dimensionsStr),
 	}
+	if namespace != "" {
+		opts.Metricnamespace = to.Ptr(namespace)
+	}
+	return opts
 }

 func (s *azureScraper) processTimeseriesData(
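Why adding `namespace` to `metricsCompositeKey` separates the values requests can be seen in a small standalone sketch. The types and the `groupByKey` function below are hypothetical simplifications (dimensions are omitted, and the metric names and `PT1M` time grain are illustrative values), assuming only what the commit message states: metrics sharing a composite key are requested together, and Azure Monitor accepts one namespace per request.

```go
package main

import "fmt"

// compositeKey is a simplified, hypothetical version of metricsCompositeKey.
// Because it is a comparable struct, it can be used directly as a map key.
type compositeKey struct {
	namespace    string
	aggregations string
	timeGrain    string
}

// metricDef is a hypothetical flattened metric definition.
type metricDef struct {
	namespace, name, aggregations, timeGrain string
}

// groupByKey buckets metric names by composite key. Each bucket would map to
// one values request, so a bucket never mixes two namespaces.
func groupByKey(defs []metricDef) map[compositeKey][]string {
	groups := map[compositeKey][]string{}
	for _, d := range defs {
		k := compositeKey{namespace: d.namespace, aggregations: d.aggregations, timeGrain: d.timeGrain}
		groups[k] = append(groups[k], d.name)
	}
	return groups
}

func main() {
	groups := groupByKey([]metricDef{
		{"Microsoft.Compute/virtualMachines", "Percentage CPU", "Average", "PT1M"},
		{"Microsoft.Compute/virtualMachines", "Available Memory Bytes", "Average", "PT1M"},
		{"azure.vm.linux.guestmetrics", "disk/free_percent", "Average", "PT1M"},
	})
	fmt.Println(len(groups)) // prints 2: one bucket per namespace
}
```

Without the `namespace` field, all three metrics in this example would share one key and end up in a single request spanning two namespaces.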

receiver/azuremonitorreceiver/scraper_batch.go

Lines changed: 47 additions & 8 deletions
@@ -442,7 +442,44 @@ func (s *azureBatchScraper) loadResourceMetricsDefinitionsByType(ctx context.Con
 		return
 	}

-	pager := clientMetricsDefinitions.NewListPager(resourceIDs[0], nil)
+	discoveredNamespaces := map[string]struct{}{}
+
+	s.collectMetricDefinitionsByType(ctx, subscriptionID, resourceType, resourceIDs[0], clientMetricsDefinitions, nil, discoveredNamespaces)
+
+	// The Azure Monitor MetricDefinitions API only returns custom metric namespace
+	// definitions (e.g. "azure.vm.linux.guestmetrics" published by AMA/MetricsExtension)
+	// when the metricnamespace query parameter is set explicitly. Make additional calls
+	// for each namespace configured in the metrics filter that was not already returned
+	// by the default call above.
+	for configNamespace := range s.cfg.Metrics {
+		if _, found := discoveredNamespaces[strings.ToLower(configNamespace)]; found {
+			continue
+		}
+		opts := &armmonitor.MetricDefinitionsClientListOptions{
+			Metricnamespace: to.Ptr(configNamespace),
+		}
+		s.collectMetricDefinitionsByType(ctx, subscriptionID, resourceType, resourceIDs[0], clientMetricsDefinitions, opts, nil)
+	}
+
+	s.resourceTypes[subscriptionID][resourceType].metricsDefinitionsUpdated = time.Now()
+	s.settings.Logger.Info("Loaded the list of Azure Metrics Definitions",
+		zap.Int("metrics_definitions_count", len(s.resourceTypes[subscriptionID][resourceType].metricsByCompositeKey)),
+		zap.String("resource_type", resourceType),
+		zap.String("subscription_id", subscriptionID))
+}
+
+// collectMetricDefinitionsByType pages through a MetricDefinitions pager and registers each
+// metric definition into the resourceType's metricsByCompositeKey map.
+// discoveredNamespaces, when non-nil, is populated with the lowercased namespaces seen in this call.
+// TODO: Partially duplicate of collectMetricDefinitions in scraper.go
+func (s *azureBatchScraper) collectMetricDefinitionsByType(
+	ctx context.Context,
+	subscriptionID, resourceType, resourceID string,
+	clientMetricsDefinitions *armmonitor.MetricDefinitionsClient,
+	opts *armmonitor.MetricDefinitionsClientListOptions,
+	discoveredNamespaces map[string]struct{},
+) {
+	pager := clientMetricsDefinitions.NewListPager(resourceID, opts)
 	page := 0
 	for pager.More() {
 		nextResult, err := pager.NextPage(ctx)
@@ -462,26 +499,28 @@ func (s *azureBatchScraper) loadResourceMetricsDefinitionsByType(ctx context.Con

 		for _, v := range nextResult.Value {
 			metricName := *v.Name.Value
-			metricAggregations := getMetricAggregations(*v.Namespace, metricName, s.cfg.Metrics, convertAggregationsToStr(v.SupportedAggregationTypes))
+			metricNamespace := *v.Namespace
+
+			if discoveredNamespaces != nil {
+				discoveredNamespaces[strings.ToLower(metricNamespace)] = struct{}{}
+			}
+
+			metricAggregations := getMetricAggregations(metricNamespace, metricName, s.cfg.Metrics, convertAggregationsToStr(v.SupportedAggregationTypes))
 			if len(metricAggregations) == 0 {
 				continue
 			}

 			timeGrain := *v.MetricAvailabilities[0].TimeGrain
 			dimensions := filterDimensions(v.Dimensions, s.cfg.Dimensions, resourceType, metricName)
 			compositeKey := metricsCompositeKey{
+				namespace:    metricNamespace,
 				timeGrain:    timeGrain,
 				dimensions:   serializeDimensions(dimensions),
 				aggregations: strings.Join(metricAggregations, ","),
 			}
 			s.loadMetricsDefinitionByType(subscriptionID, resourceType, metricName, compositeKey)
 		}
 	}
-	s.resourceTypes[subscriptionID][resourceType].metricsDefinitionsUpdated = time.Now()
-	s.settings.Logger.Info("Loaded the list of Azure Metrics Definitions",
-		zap.Int("metrics_definitions_count", len(s.resourceTypes[subscriptionID][resourceType].metricsByCompositeKey)),
-		zap.String("resource_type", resourceType),
-		zap.String("subscription_id", subscriptionID))
 }

 // TODO: duplicate
@@ -580,7 +619,7 @@ func (s *azureBatchScraper) loadBatchMetricsValues(ctx context.Context, subscrip
 				response, err := clientMetrics.QueryResources(
 					ctx,
 					subscriptionID,
-					resourceType,
+					compositeKey.namespace,
 					metricsByGrain.metrics[start:end],
 					azmetrics.ResourceIDList{ResourceIDs: resType.resourceIDs[startResources:endResources]},
 					&opts,
