[extension/azure_encoding] Add support for Administrative, Alert, Autoscale, Policy, Security, ServiceHealth, and ResourceHealth log categories. by zmoog · Pull Request #45699 · open-telemetry/opentelemetry-collector-contrib

zmoog · 2026-01-28T17:03:02Z

Description

This PR brings over the remaining Activity log categories from translator/azurelogs, aligning this component with the changes made in #44871.

New log categories:

Administrative
Alert
Autoscale
Policy
Security
ServiceHealth
ResourceHealth

Testing

We added test cases for each category, leveraging the same log events found in translator/azurelogs for consistency.

Documentation

Added a section covering Identity (including its major subfields) along with dedicated sections for each new log category.

zmoog · 2026-01-28T17:42:32Z

@Fiery-Fenix, I’d love to get your eyes on this as well, whenever you have a chance.


Fiery-Fenix · 2026-01-29T15:17:18Z

Also, I think it's better to place all test data (input JSON files and expected YAML files) in a single folder named after the code file that actually parses them, as is done for the other category_*.go files.
Since all the new Categories in this PR are parsed in category_activitylogs.go, all test data files should be placed in the testdata/activitylogs folder.
It simplifies finding the respective test data during debugging/issue investigation.

zmoog · 2026-01-30T10:06:40Z

It simplifies finding the respective test data during debugging/issue investigation.

I agree! It definitely simplifies finding the relevant test data during debugging. Sorry for missing the obvious—my bad!

zmoog · 2026-01-30T10:54:05Z

@Fiery-Fenix, I believe we need to reconcile the data model for azure.identity between activity and storage logs.

Analysis of 36 JSON test files revealed:

9 files (25%) contain an identity field
27 files (75%) have no identity field at all

Files WITH Identity

| Family | File | Identity Structure |
| --- | --- | --- |
| Activity Logs | `administrative/administrative-log.json` | `authorization` (object) + `claims` (map) |
| Activity Logs | `policy/policy-log.json` | `authorization` (object) + `claims` (map) |
| Activity Logs | `alert/alert-log.json` | `claims` only (single `spn` claim) |
| Activity Logs | `autoscale/autoscale-log.json` | `claims` only (single `spn` claim) |
| Activity Logs | `servicehealth/servicehealth-log.json` | `claims` only (single `emailaddress` claim) |
| Activity Logs | `recommendation/recommendation-log.json` | `claims` only (single `emailaddress` claim) |
| Storage | `storage/storage-read-log.json` | `type`, `tokenHash`, `authorization` (array), `requester` |
| General (test) | `general/maximum.json` | `claim` (singular, not `claims`) with `oid` |

Files WITHOUT Identity

The following categories have no identity field in their test data:

All App Service logs (8 files)
All Front Door logs (3 files)
All Azure CDN logs (1 file)
All Data Factory logs (3 files)
All Messaging logs (5 files)
All AGW logs (3 files)
Function App logs (1 file)
General minimum logs (2 files)
Security log (1 file) - security info is in properties instead
Resource Health log (1 file)

Distinct Identity Patterns

Pattern 1: Activity Log Identity (Azure AD / Entra ID)

Used by: Administrative, Policy, Alert, Autoscale, ServiceHealth, Recommendation

This is the standard Azure Activity Log identity structure based on Azure AD/Entra ID tokens.

{
  "authorization": {
    "scope": "/subscriptions/.../providers/...",
    "action": "microsoft.insights/diagnosticSettings/write",
    "evidence": {
      "role": "Owner",
      "roleAssignmentScope": "/subscriptions/...",
      "roleAssignmentId": "...",
      "roleDefinitionId": "...",
      "principalId": "...",
      "principalType": "User"
    }
  },
  "claims": {
    "iss": "https://sts.windows.net/.../",
    "aud": "https://management.core.windows.net/",
    "exp": "1744717084",
    "nbf": "1744711621",
    "iat": "1744711621",
    "http://schemas.microsoft.com/identity/claims/scope": "user_impersonation",
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress": "user@example.com",
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier": "...",
    "appid": "...",
    "idtyp": "user"
  }
}

Key characteristics:

authorization is an object with scope, action, and optional evidence
claims is a map of JWT claim keys to values
Timestamps (exp, nbf, iat) are Unix timestamps (seconds since epoch), encoded as strings in the sample above
Some claims use long XML namespace prefixes
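To make the Pattern 1 shape concrete, here is a minimal Go sketch that decodes an Activity Log identity with the standard library and turns a string-encoded Unix claim such as `exp` into a `time.Time`. The names (`activityIdentity`, `claimTime`, `decodeActivityIdentity`) are hypothetical, not the extension's actual code, and typing `claims` as `map[string]string` assumes every claim value is a string, as it is in the samples above.

```go
package main

import (
	"encoding/json"
	"strconv"
	"time"
)

// activityAuthorization mirrors the "authorization" object seen in the
// Administrative and Policy samples: scope, action, and optional evidence.
type activityAuthorization struct {
	Scope    string            `json:"scope"`
	Action   string            `json:"action"`
	Evidence map[string]string `json:"evidence,omitempty"`
}

// activityIdentity is a hypothetical Go shape for Pattern 1.
// Authorization is a pointer so the claims-only variant decodes to nil.
type activityIdentity struct {
	Authorization *activityAuthorization `json:"authorization,omitempty"`
	Claims        map[string]string      `json:"claims,omitempty"`
}

func decodeActivityIdentity(data []byte) (activityIdentity, error) {
	var id activityIdentity
	err := json.Unmarshal(data, &id)
	return id, err
}

// claimTime parses a Unix-seconds claim stored as a string (exp, nbf, iat).
// It reports false when the claim is missing or not an integer.
func claimTime(claims map[string]string, key string) (time.Time, bool) {
	raw, ok := claims[key]
	if !ok {
		return time.Time{}, false
	}
	secs, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return time.Time{}, false
	}
	return time.Unix(secs, 0).UTC(), true
}
```

With this shape, the claims-only variant (Alert, Autoscale, ServiceHealth, Recommendation) decodes with `Authorization == nil` rather than an empty struct, which makes "no authorization block" easy to tell apart from an empty one.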

Variants:

Full structure: Has both authorization and claims (Administrative, Policy)
Claims-only: Has only claims with minimal entries (Alert, Autoscale, ServiceHealth, Recommendation)

Pattern 2: Storage Identity (SAS/Delegation)

Used by: Storage logs (StorageRead, StorageWrite, StorageDelete)

This structure is specific to Azure Storage and represents SAS token or delegation-based access.

{
  "type": "DelegationSAS",
  "tokenHash": "system-delegation(...),SasSignature(...)",
  "authorization": [
    {
      "action": "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
      "roleAssignmentId": "...",
      "roleDefinitionId": "...",
      "principals": [
        {
          "id": "...",
          "type": "User"
        }
      ],
      "denyAssignmentId": "",
      "type": "RBAC",
      "result": "Denied",
      "reason": "NoPolicy"
    }
  ],
  "requester": {
    "objectId": "...",
    "tenantId": "..."
  }
}

Key characteristics:

authorization is an array (not an object like in Activity Logs!)
Contains type field indicating authentication type (e.g., "DelegationSAS")
Has tokenHash for audit purposes
requester contains the identity of the requester
Each authorization entry has principals array and result/reason for access decisions
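A hypothetical Go shape for the Storage identity makes the array-valued `authorization` and the per-entry `result`/`reason` explicit. These are sketch types mirroring the sample above, not the extension's real structs, and `deniedActions` is an illustrative helper that lists the actions whose check came back `Denied`.

```go
package main

import "encoding/json"

// Hypothetical types mirroring the Storage identity sample.
type storagePrincipal struct {
	ID   string `json:"id"`
	Type string `json:"type"`
}

type storageAuthorization struct {
	Action           string             `json:"action"`
	RoleAssignmentID string             `json:"roleAssignmentId"`
	RoleDefinitionID string             `json:"roleDefinitionId"`
	Principals       []storagePrincipal `json:"principals"`
	DenyAssignmentID string             `json:"denyAssignmentId"`
	Type             string             `json:"type"` // e.g. "RBAC"
	Result           string             `json:"result"`
	Reason           string             `json:"reason"`
}

type storageRequester struct {
	ObjectID string `json:"objectId"`
	TenantID string `json:"tenantId"`
}

type storageIdentity struct {
	Type          string                 `json:"type"` // e.g. "DelegationSAS"
	TokenHash     string                 `json:"tokenHash"`
	Authorization []storageAuthorization `json:"authorization"` // an array, unlike Activity Logs
	Requester     *storageRequester      `json:"requester,omitempty"`
}

func decodeStorageIdentity(data []byte) (storageIdentity, error) {
	var id storageIdentity
	err := json.Unmarshal(data, &id)
	return id, err
}

// deniedActions lists the actions whose authorization check returned "Denied".
func deniedActions(id storageIdentity) []string {
	var out []string
	for _, a := range id.Authorization {
		if a.Result == "Denied" {
			out = append(out, a.Action)
		}
	}
	return out
}
```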

Pattern 3: Generic/Unknown (KeyVault-like)

Used by: general/maximum.json (test data, possibly KeyVault)

This appears to be a simplified or different identity structure.

{
  "claim": {
    "oid": "607964b6-41a5-4e24-a5db-db7aab3b9b34"
  }
}

Key characteristics:

Uses claim (singular) instead of claims (plural)
Contains only oid (object identifier)
Much simpler structure than Activity Log identity
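One practical consequence of `claim` vs `claims`: Go's `encoding/json` ignores unknown keys by default, so a struct that only declares `claims` would silently drop a `claim` object. A minimal sketch that tolerates both spellings, assuming string-valued claims; `looseIdentity` and `identityClaims` are hypothetical names.

```go
package main

import "encoding/json"

// looseIdentity accepts both the "claims" (Activity Log) and the
// "claim" (general/maximum.json) spellings.
type looseIdentity struct {
	Claims map[string]string `json:"claims"`
	Claim  map[string]string `json:"claim"`
}

// identityClaims returns the claim map regardless of which key was used.
// If both keys are present, entries under "claims" win on collisions.
func identityClaims(data []byte) (map[string]string, error) {
	var id looseIdentity
	if err := json.Unmarshal(data, &id); err != nil {
		return nil, err
	}
	out := make(map[string]string, len(id.Claims)+len(id.Claim))
	for k, v := range id.Claim {
		out[k] = v
	}
	for k, v := range id.Claims {
		out[k] = v
	}
	return out, nil
}
```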

zmoog · 2026-01-30T11:11:03Z

The current PR is based on the pkg/translator/azurelogs implementation, which attempts to:

Parse identity as Activity Log structure (with authorization object and claims map)
Extract specific fields into flat OpenTelemetry attributes

So, for the Policy category, the translator and current PR create azure.identity.authorization.* fields like:

  - key: azure.identity.authorization.action
    value:
      stringValue: microsoft.insights/diagnosticSettings/write
  - key: azure.identity.authorization.evidence.role
    value:
      stringValue: Owner
  - key: azure.identity.authorization.evidence.role.assignment.scope
    value:
      stringValue: /subscriptions/11111111-1111-1111-1111-111111111111
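The YAML suggests that nested authorization keys are flattened with one dot per camelCase word boundary (`roleAssignmentScope` becomes `role.assignment.scope`). The sketch below reproduces that rule as inferred from this single example; it is not taken from the translator's source, and a plain `map[string]string` stands in for `pcommon.Map` so it runs with the standard library only.

```go
package main

import (
	"strings"
	"unicode"
)

// camelToDots inserts a dot before each upper-case letter and lower-cases it,
// e.g. "roleAssignmentScope" -> "role.assignment.scope" (inferred rule).
func camelToDots(s string) string {
	var b strings.Builder
	for i, r := range s {
		if unicode.IsUpper(r) {
			if i > 0 {
				b.WriteByte('.')
			}
			b.WriteRune(unicode.ToLower(r))
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

// flattenAuthorization walks a decoded authorization object and writes one
// string attribute per leaf under prefix; non-string leaves are skipped.
func flattenAuthorization(prefix string, obj map[string]any, attrs map[string]string) {
	for k, v := range obj {
		key := prefix + "." + camelToDots(k)
		switch val := v.(type) {
		case string:
			attrs[key] = val
		case map[string]any:
			flattenAuthorization(key, val, attrs)
		}
	}
}
```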

Instead, the storage categories map identity as follows:

  - key: azure.identity
    value:
      kvlistValue:
        values:
          - key: authorization
            value:
              arrayValue:
                values:
                  - kvlistValue:
                      values:
                        - key: principals
                          value:
                            arrayValue:
                              values:
                                - kvlistValue:
                                    values:
                                      - key: type
                                        value:
                                          stringValue: User
                                      - key: id
                                        value:
                                          stringValue: 7890abcd-ef12-3456-7890-abcd12345678

The structural difference is that azure.identity.authorization is an array in storage.
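Because the two families disagree on the JSON type of `authorization`, one way to see which shape a record carries before committing to a struct is to capture the field as `json.RawMessage` and inspect its first byte. This is a hypothetical helper for illustration, not part of the PR.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
)

// authorizationShape reports whether identity.authorization is a JSON object
// (Activity Logs), an array (Storage logs), or absent, without decoding it fully.
func authorizationShape(identity []byte) (string, error) {
	var probe struct {
		Authorization json.RawMessage `json:"authorization"`
	}
	if err := json.Unmarshal(identity, &probe); err != nil {
		return "", err
	}
	trimmed := bytes.TrimSpace(probe.Authorization)
	if len(trimmed) == 0 {
		return "absent", nil // key not present
	}
	switch trimmed[0] {
	case '{':
		return "object", nil
	case '[':
		return "array", nil
	case 'n':
		return "absent", nil // JSON null
	}
	return "", errors.New("unexpected JSON type for authorization")
}
```

A probe like this is only needed when the category is not known up front; with category-specific types, each record can decode straight into the right shape.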

zmoog · 2026-01-30T11:39:38Z

@Fiery-Fenix @constanca-m, how should we move forward?

Should we continue deviating from the extension by:

Moving away from the flat fields in azure.identity.*?
Mapping azure.identity.authorization as an array to support all existing sub-fields from activity and storage logs?

In that case, how should we handle the azure.identity.claims elements? Should we:

Map each claim to an OpenTelemetry attribute (e.g., mapping the claim http://schemas.[..]/identity/claims/emailaddress as user.email field), like the translator?
Map them as-is, like the extension?

Fiery-Fenix · 2026-02-01T13:23:29Z

@Fiery-Fenix @constanca-m, how should we move forward?

@zmoog How about using a similar trick to the one we used for general Category parsing? It should bring the required flexibility to Azure Identity parsing.

Remove the Identity field from the azureLogRecordBase structure. It seems to be present only in a small subset of Categories, so it is not needed in all of them.
Create a new azureIdentityRecord interface:

type azureIdentityRecord interface {
  PutIdentityAttributes(attrs pcommon.Map)
}

Create a base structure with the Azure Identity fields that are common (at least, those we currently know to be common):

type azureIdentityBase struct {
  ........ list of common fields ....
}
func (i *azureIdentityBase) PutIdentityAttributes(attrs pcommon.Map) {
  // put common Identity fields into Log Attributes here
}

Create category-specific structures that handle the discrepancies between the common Identity schema and each category-specific Identity schema, such as the authorization object/array difference you mentioned:

type azureIdentityStorage struct {
  azureIdentityBase
  .... set of category-specific fields ...
}
func (i *azureIdentityStorage) PutIdentityAttributes(attrs pcommon.Map) {
  // put common Identity fields into Log Attributes
  i.azureIdentityBase.PutIdentityAttributes(attrs)

  // put category-specific Identity fields into Log Attributes here
}

In the Category structures that need it, add an Identity field of the category-specific type:

type azureStorageBlobLog struct {
  azureLogRecordBase

  .... existing fields ...

  Identity  azureIdentityStorage `json:"identity"`
}

Add a call that processes the Identity attributes to the Category's existing PutCommonAttributes function:

func (r *azureStorageBlobLog) PutCommonAttributes(attrs pcommon.Map, body pcommon.Value) {
  // existing code goes here

  // add Identity attributes
  r.Identity.PutIdentityAttributes(attrs)
}

I think this approach lets us adapt Azure Identity parsing to any JSON structure we may need in the future. It also avoids an additional JSON parsing pass for the identity field, because the whole category-specific JSON record is parsed at once, which is good for performance as well.
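Below is a runnable sketch of the proposed pattern under a few assumptions: `attrMap` stands in for `pcommon.Map`, `Claims` is only a placeholder for whatever fields turn out to be shared, the `azure.identity.*` keys are illustrative, and `azureStorageBlobLog` is trimmed to its identity field. It shows how embedding hands both the JSON fields and the attribute logic of the base type to the category-specific type, and how a single `json.Unmarshal` call fills the whole record, identity included.

```go
package main

import "encoding/json"

// attrMap stands in for pcommon.Map so the sketch needs only the standard library.
type attrMap map[string]string

// azureIdentityRecord follows the interface proposed above.
type azureIdentityRecord interface {
	PutIdentityAttributes(attrs attrMap)
}

// azureIdentityBase holds the shared fields; Claims is a placeholder.
type azureIdentityBase struct {
	Claims map[string]string `json:"claims,omitempty"`
}

func (i *azureIdentityBase) PutIdentityAttributes(attrs attrMap) {
	for k, v := range i.Claims {
		attrs["azure.identity.claims."+k] = v // illustrative key layout
	}
}

// azureIdentityStorage embeds the base, so the base's JSON fields are promoted.
type azureIdentityStorage struct {
	azureIdentityBase
	Type      string `json:"type,omitempty"`
	TokenHash string `json:"tokenHash,omitempty"`
}

func (i *azureIdentityStorage) PutIdentityAttributes(attrs attrMap) {
	i.azureIdentityBase.PutIdentityAttributes(attrs) // common fields first
	if i.Type != "" {
		attrs["azure.identity.type"] = i.Type
	}
	if i.TokenHash != "" {
		attrs["azure.identity.tokenHash"] = i.TokenHash
	}
}

// Compile-time checks that both types satisfy the interface.
var (
	_ azureIdentityRecord = (*azureIdentityBase)(nil)
	_ azureIdentityRecord = (*azureIdentityStorage)(nil)
)

// azureStorageBlobLog is trimmed to the identity field for this sketch.
type azureStorageBlobLog struct {
	Identity azureIdentityStorage `json:"identity"`
}

func (r *azureStorageBlobLog) PutCommonAttributes(attrs attrMap) {
	// existing category attributes would be set here
	r.Identity.PutIdentityAttributes(attrs)
}

// parseStorageBlobLog decodes the whole record, identity included, in one pass.
func parseStorageBlobLog(data []byte) (*azureStorageBlobLog, error) {
	var r azureStorageBlobLog
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, err
	}
	return &r, nil
}
```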

zmoog · 2026-02-02T12:32:59Z

How about using a similar trick to the one we used for general Category parsing? It should bring the required flexibility to Azure Identity parsing.

I think this approach lets us adapt Azure Identity parsing to any JSON structure we may need in the future.

It's a sound approach.

But before moving to the implementation, I would love to sort out the general problem.

We learned that activity and storage logs both have an identity field, but with diverging structures:

Activity Logs: identity is an object with authorization (object) and claims (map)
Storage Logs: identity is an object with authorization (array), type, tokenHash, and requester

Structure is clearly different, but what about the semantics? Are these semantically the same, similar, or fundamentally different?

These seem to be semantically different concepts that happen to share the field name identity:

Activity Log identity = Caller Identity Assertion

Answers: "Who is the authenticated entity that performed this action?"
Contains: Identity claims from OAuth/OIDC token
Purpose: Audit trail of WHO

Storage Log identity = Authorization Decision Audit

Answers: "How was this request authenticated and what authorization decisions were made?"
Contains: Authentication method, authorization check results, denial reasons
Purpose: Audit trail of HOW and WHY (allowed or denied)

The Storage identity is closer to an access decision log than a pure identity claim. It records authorization outcomes, which seems quite different from Activity Log's identity assertions.

Based on differences in semantics, should we separate the concepts into different attribute families?

For example (just an example, I'm not suggesting to make these changes):

Activity Logs → azure.identity.*
  azure.identity.issuer
  azure.identity.audience
  azure.identity.authorization.scope
  azure.identity.authorization.action
  user.email
  user.name

Storage Logs → azure.authorization.*
  azure.authorization.type              (DelegationSAS, OAuth, AccountKey)
  azure.authorization.token_hash
  azure.authorization.decisions         (array of check results)
  azure.requester.object_id
  azure.requester.tenant_id

As a user, I would find it confusing to receive log events where the same azure.identity field has two different structures and semantics.

zmoog · 2026-02-02T12:43:38Z

Or, perhaps simply renaming azure.identity.authorization to azure.identity.authorizations in the storage logs would resolve the field conflict, effectively turning azure.identity into a general container for identity-related info.

I can live with that, but schema inconsistency across categories still gives me some pause. Could having the same field with different structures and semantics cause issues for downstream backends?

constanca-m · 2026-02-02T13:23:45Z

I propose these semantic conventions:

Semantic conventions for users: https://opentelemetry.io/docs/specs/semconv/registry/attributes/user/#user-attributes

| Azure Claim Field | OpenTelemetry Attribute |
| --- | --- |
| http://.../nameidentifier | user.id |
| http://.../emailaddress | user.email |
| http://.../scope | user.roles |

I couldn't find conventions that fit the other fields, so I think we could use the ones you mentioned.
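A sketch of the proposed claim mapping in Go, assuming string-valued claims and a `map[string]string` in place of `pcommon.Map`. It applies the `nameidentifier` → `user.id` and `emailaddress` → `user.email` rows of the table and leaves out `scope` → `user.roles`, whose fit is debatable because the value is often `user_impersonation`, an OAuth2 permission. The fallback `azure.identity.claims.<last path segment>` key for unmapped claims is hypothetical.

```go
package main

import "strings"

// claimToSemconv maps full Azure claim URIs to OpenTelemetry user.* attributes.
var claimToSemconv = map[string]string{
	"http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier": "user.id",
	"http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress":   "user.email",
}

// mapClaims copies known claims to semconv attributes and keeps every other
// claim under a hypothetical azure.identity.claims.<short-name> key.
func mapClaims(claims map[string]string, attrs map[string]string) {
	for k, v := range claims {
		if target, ok := claimToSemconv[k]; ok {
			attrs[target] = v
			continue
		}
		short := k
		if i := strings.LastIndex(k, "/"); i >= 0 {
			short = k[i+1:] // drop the long namespace prefix
		}
		attrs["azure.identity.claims."+short] = v
	}
}
```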

For the storage logs, since authorization is an array, I agree it makes sense to use the ones you mentioned.

It seems azure.identity is missing from some categories, and there are significant structural and semantic differences between them.

zmoog · 2026-02-13T12:03:31Z

@Fiery-Fenix, I think your proposal (moving azure.identity mappings to categories) makes sense.

I don't think we have a good sampling of how the categories implement azure.identity, at least not yet. So, we can deal with the current status in categories and maybe re-evaluate this decision as we add more log categories over time.

So I'm making these changes and addressing other suggestions from the review comments:

Switched user.name to user.id for the nameidentifier claim (per @constanca-m's semconv suggestion)
Removed rawMapIdentity and generic identity handling from azureLogRecordGeneric - unknown categories no longer parse identity
Aligned azureIdentityStorage structure with activity logs by embedding azureIdentityBase (following your proposed pattern)

Remove trailing blank line in category_identity.go and fix struct field alignment in category_storage.go Co-authored-by: Cursor <cursoragent@cursor.com>

zmoog · 2026-02-18T13:33:21Z

@constanca-m, on scope → user.roles: scope seems to refer to OAuth2 permissions, which doesn't quite fit user.roles semantically. In the sample logs, I often see user_impersonation as its value.

How strongly do you feel about pushing this through?

zmoog · 2026-02-18T14:59:55Z

@Fiery-Fenix, could you take another look when you have a moment?

Fiery-Fenix

Looks great!

constanca-m

Sorry I am being such a slow reviewer

Just dropping the empty field denyAssignmentId
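Dropping empty values such as an empty `denyAssignmentId` amounts to a guard before each put. A minimal sketch with hypothetical names (`authzEntry`, `putStrIfNotEmpty`, `entryAttributes`), using a plain map in place of the pdata key/value list and keeping the source camelCase keys, as the storage mapping already does.

```go
package main

// authzEntry is a hypothetical, trimmed storage authorization entry.
type authzEntry struct {
	Action           string
	RoleAssignmentID string
	DenyAssignmentID string
	Result           string
}

// putStrIfNotEmpty sets key only when value is populated.
func putStrIfNotEmpty(m map[string]string, key, value string) {
	if value != "" {
		m[key] = value
	}
}

// entryAttributes builds the key/value list for one entry, dropping empty fields.
func entryAttributes(e authzEntry) map[string]string {
	m := make(map[string]string, 4)
	putStrIfNotEmpty(m, "action", e.Action)
	putStrIfNotEmpty(m, "roleAssignmentId", e.RoleAssignmentID)
	putStrIfNotEmpty(m, "denyAssignmentId", e.DenyAssignmentID)
	putStrIfNotEmpty(m, "result", e.Result)
	return m
}
```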

constanca-m

I am a bit worried about using camel case for attributes, but if this behavior was already present, we can handle it in a different PR

zmoog · 2026-02-25T14:40:14Z

I am a bit worried about using camel case for attributes, but if this behavior was already present, we can handle it in a different PR

I see, but it was already part of the storage mapping.

zmoog · 2026-02-25T14:51:13Z

@atoulme, I think we're ready for your final review. Could you take a look when you have a moment? 🙇

otelbot · 2026-02-27T17:40:52Z

Thank you for your contribution @zmoog! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey. If you are getting started contributing, you can also join the CNCF Slack channel #opentelemetry-new-contributors to ask for guidance and get help.


zmoog requested review from a team and axw as code owners January 28, 2026 17:03

github-actions bot assigned mx-psi Jan 28, 2026

github-actions bot added the extension/encoding/azureencoding label Jan 28, 2026

github-actions bot requested a review from constanca-m January 28, 2026 17:03

zmoog marked this pull request as draft January 28, 2026 17:09

zmoog changed the title ~~[extension/azure_encoding] Port Activity logs from translator/azurelogs~~ [extension/azure_encoding] Add support for Alert, Autoscale, Policy, Recommendation, Security, ServiceHealth, and ResourceHealth log categories. Jan 28, 2026

zmoog marked this pull request as ready for review January 28, 2026 17:42

zmoog added 12 commits January 29, 2026 13:13

Align Autoscale test document

3e2a66c

Align policy test document

56f1976

Use the tranlator/azurelogs test document

Align recommendation test document

beb7c14

Use the translator/azurelogs test document

Align resourcehealth test document

4e6ea86

Use the translator/azurelogs test document

Align security test document

65f9389

Use the translator/azurelogs test document

Align servicehealth test document

8477f4a

Use the translator/azurelogs test document

Update identity fields in the README

ef14888

Cleanup

c24b60d

Add changelog entry

65831ff

Fix changelog entry

f6d9f6d

Remove unused const

4028542

zmoog force-pushed the zmoog/azureencodingextension/port-activitylogs-from-translator branch from 95b92f4 to 4028542 Compare January 29, 2026 12:17

Fiery-Fenix reviewed Jan 29, 2026


Only set the attribute if the field is populated

3c89c01

Move azure.identity mappings to categories

9599283


Fix lint issues in azureencodingextension

2f01736


zmoog requested a review from Fiery-Fenix February 18, 2026 14:58

Fiery-Fenix approved these changes Feb 22, 2026


constanca-m reviewed Feb 23, 2026


zmoog added 2 commits February 24, 2026 15:56

Refine equal-fold comparison

d8cd39a

Restore original golden file

53233d3


zmoog requested a review from constanca-m February 24, 2026 17:06

constanca-m approved these changes Feb 24, 2026


atoulme approved these changes Feb 27, 2026


atoulme merged commit 7154bf1 into open-telemetry:main Feb 27, 2026
197 checks passed


5 participants
