[extension/azure_encoding] Add support for Administrative, Alert, Autoscale, Policy, Security, ServiceHealth, and ResourceHealth log categories.#45699

Merged
atoulme merged 17 commits into open-telemetry:main from zmoog:zmoog/azureencodingextension/port-activitylogs-from-translator
Feb 27, 2026
Conversation

@zmoog
Contributor

@zmoog zmoog commented Jan 28, 2026

Description

This PR brings over the remaining Activity log categories from translator/azurelogs, aligning this component with the changes made in #44871.

New log categories:

  • Administrative
  • Alert
  • Autoscale
  • Policy
  • Security
  • ServiceHealth
  • ResourceHealth

Testing

We added test cases for each category, leveraging the same log events found in translator/azurelogs for consistency.

Documentation

Added a section covering Identity (including its major subfields) along with dedicated sections for each new log category.

@zmoog zmoog requested review from a team and axw as code owners January 28, 2026 17:03
@github-actions github-actions bot requested a review from constanca-m January 28, 2026 17:03
@zmoog zmoog marked this pull request as draft January 28, 2026 17:09
@zmoog zmoog changed the title from "[extension/azure_encoding] Port Activity logs from translator/azurelogs" to "[extension/azure_encoding] Add support for Alert, Autoscale, Policy, Recommendation, Security, ServiceHealth, and ResourceHealth log categories." Jan 28, 2026
@zmoog zmoog changed the title from "[extension/azure_encoding] Add support for Alert, Autoscale, Policy, Recommendation, Security, ServiceHealth, and ResourceHealth log categories." to "[extension/azure_encoding] Add support for Administrative, Alert, Autoscale, Policy, Security, ServiceHealth, and ResourceHealth log categories." Jan 28, 2026
@zmoog zmoog marked this pull request as ready for review January 28, 2026 17:42
@zmoog
Contributor Author

zmoog commented Jan 28, 2026

@Fiery-Fenix, I’d love to get your eyes on this as well, whenever you have a chance.

zmoog added 12 commits January 29, 2026 13:13
# Conflicts:
#	extension/encoding/azureencodingextension/internal/unmarshaler/logs/README.md
#	extension/encoding/azureencodingextension/internal/unmarshaler/logs/category.go
#	extension/encoding/azureencodingextension/internal/unmarshaler/logs/helpers.go
Use the tranlator/azurelogs test document
Use the translator/azurelogs test document
Use the translator/azurelogs test document
Use the translator/azurelogs test document
Use the translator/azurelogs test document
@zmoog zmoog force-pushed the zmoog/azureencodingextension/port-activitylogs-from-translator branch from 95b92f4 to 4028542 Compare January 29, 2026 12:17
Comment thread extension/encoding/azureencodingextension/internal/unmarshaler/logs/category.go Outdated
Comment thread extension/encoding/azureencodingextension/internal/unmarshaler/logs/category.go Outdated
Comment thread extension/encoding/azureencodingextension/internal/unmarshaler/logs/category.go Outdated
@Fiery-Fenix
Contributor

I also think it's better to place all test data (input JSON files and expected YAML files) in a single folder that matches the name of the code file that actually parses them, as is done for the other category_*.go files.
Since all the new Categories in this PR are parsed in category_activitylogs.go, all test data files should be placed in the testdata/activitylogs folder.
That makes it simpler to find the relevant test data during debugging or issue investigation.

@zmoog
Contributor Author

zmoog commented Jan 30, 2026

That makes it simpler to find the relevant test data during debugging or issue investigation.

I agree! It definitely simplifies finding the relevant test data during debugging. Sorry for missing the obvious—my bad!

@zmoog
Contributor Author

zmoog commented Jan 30, 2026

@Fiery-Fenix, I believe we need to reconcile the data model for azure.identity between activity and storage logs.

Analysis of 36 JSON test files revealed:

  • 9 files (25%) contain an identity field
  • 27 files (75%) have no identity field at all

Files WITH Identity

| Family | File | Identity Structure |
| --- | --- | --- |
| Activity Logs | administrative/administrative-log.json | authorization (object) + claims (map) |
| Activity Logs | policy/policy-log.json | authorization (object) + claims (map) |
| Activity Logs | alert/alert-log.json | claims only (single spn claim) |
| Activity Logs | autoscale/autoscale-log.json | claims only (single spn claim) |
| Activity Logs | servicehealth/servicehealth-log.json | claims only (single emailaddress claim) |
| Activity Logs | recommendation/recommendation-log.json | claims only (single emailaddress claim) |
| Storage | storage/storage-read-log.json | type, tokenHash, authorization (array), requester |
| General (test) | general/maximum.json | claim (singular, not claims) with oid |

Files WITHOUT Identity

The following categories have no identity field in their test data:

  • All App Service logs (8 files)
  • All Front Door logs (3 files)
  • All Azure CDN logs (1 file)
  • All Data Factory logs (3 files)
  • All Messaging logs (5 files)
  • All AGW logs (3 files)
  • Function App logs (1 file)
  • General minimum logs (2 files)
  • Security log (1 file) - security info is in properties instead
  • Resource Health log (1 file)

Distinct Identity Patterns

Pattern 1: Activity Log Identity (Azure AD / Entra ID)

Used by: Administrative, Policy, Alert, Autoscale, ServiceHealth, Recommendation

This is the standard Azure Activity Log identity structure based on Azure AD/Entra ID tokens.

{
  "authorization": {
    "scope": "/subscriptions/.../providers/...",
    "action": "microsoft.insights/diagnosticSettings/write",
    "evidence": {
      "role": "Owner",
      "roleAssignmentScope": "/subscriptions/...",
      "roleAssignmentId": "...",
      "roleDefinitionId": "...",
      "principalId": "...",
      "principalType": "User"
    }
  },
  "claims": {
    "iss": "https://sts.windows.net/.../",
    "aud": "https://management.core.windows.net/",
    "exp": "1744717084",
    "nbf": "1744711621",
    "iat": "1744711621",
    "http://schemas.microsoft.com/identity/claims/scope": "user_impersonation",
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress": "user@example.com",
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier": "...",
    "appid": "...",
    "idtyp": "user"
  }
}

Key characteristics:

  • authorization is an object with scope, action, and optional evidence
  • claims is a map of JWT claim keys to values
  • Timestamps (exp, nbf, iat) are Unix timestamps (seconds since epoch)
  • Some claims use long XML namespace prefixes

Variants:

  • Full structure: Has both authorization and claims (Administrative, Policy)
  • Claims-only: Has only claims with minimal entries (Alert, Autoscale, ServiceHealth, Recommendation)
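To make the two variants concrete, here is a minimal Go sketch of how the Activity Log identity shown above could be decoded. The type `activityIdentity` and the helpers `parseActivityIdentity` and `claimTime` are hypothetical names for illustration, not the extension's actual code, and the sketch assumes every claim value is a JSON string, as in the sample.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"time"
)

// activityIdentity is a hypothetical type mirroring the Activity Log
// identity JSON above; it is not the extension's actual struct.
type activityIdentity struct {
	// Authorization is a pointer so the claims-only variant leaves it nil.
	Authorization *struct {
		Scope    string `json:"scope"`
		Action   string `json:"action"`
		Evidence *struct {
			Role          string `json:"role"`
			PrincipalType string `json:"principalType"`
		} `json:"evidence"`
	} `json:"authorization"`
	// Claims assumes all claim values are strings, as in the sample.
	Claims map[string]string `json:"claims"`
}

// parseActivityIdentity decodes one identity object.
func parseActivityIdentity(raw []byte) (activityIdentity, error) {
	var id activityIdentity
	err := json.Unmarshal(raw, &id)
	return id, err
}

// claimTime converts a Unix-seconds claim such as "exp" into a time.Time.
func claimTime(claims map[string]string, key string) (time.Time, error) {
	secs, err := strconv.ParseInt(claims[key], 10, 64)
	if err != nil {
		return time.Time{}, fmt.Errorf("claim %q: %w", key, err)
	}
	return time.Unix(secs, 0).UTC(), nil
}

func main() {
	full := []byte(`{"authorization":{"scope":"/subscriptions/...","action":"microsoft.insights/diagnosticSettings/write","evidence":{"role":"Owner","principalType":"User"}},"claims":{"exp":"1744717084","idtyp":"user"}}`)
	claimsOnly := []byte(`{"claims":{"http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress":"user@example.com"}}`)

	for _, raw := range [][]byte{full, claimsOnly} {
		id, err := parseActivityIdentity(raw)
		if err != nil {
			panic(err)
		}
		fmt.Println("has authorization:", id.Authorization != nil, "claims:", len(id.Claims))
		if exp, err := claimTime(id.Claims, "exp"); err == nil {
			fmt.Println("token expires at:", exp)
		}
	}
}
```

Using a pointer for `authorization` lets one type cover both the full and the claims-only variant without a second struct.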

Pattern 2: Storage Identity (SAS/Delegation)

Used by: Storage logs (StorageRead, StorageWrite, StorageDelete)

This structure is specific to Azure Storage and represents SAS token or delegation-based access.

{
  "type": "DelegationSAS",
  "tokenHash": "system-delegation(...),SasSignature(...)",
  "authorization": [
    {
      "action": "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
      "roleAssignmentId": "...",
      "roleDefinitionId": "...",
      "principals": [
        {
          "id": "...",
          "type": "User"
        }
      ],
      "denyAssignmentId": "",
      "type": "RBAC",
      "result": "Denied",
      "reason": "NoPolicy"
    }
  ],
  "requester": {
    "objectId": "...",
    "tenantId": "..."
  }
}

Key characteristics:

  • authorization is an array (not an object like in Activity Logs!)
  • Contains type field indicating authentication type (e.g., "DelegationSAS")
  • Has tokenHash for audit purposes
  • requester contains the identity of the requester
  • Each authorization entry has principals array and result/reason for access decisions
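For comparison, a minimal Go sketch of the Storage identity shape described above. The type names and the helper `deniedActions` are hypothetical; only the JSON field names come from the sample.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// storagePrincipal and the types below are hypothetical and only
// mirror the JSON field names from the Storage sample.
type storagePrincipal struct {
	ID   string `json:"id"`
	Type string `json:"type"`
}

type storageAuthorization struct {
	Action     string             `json:"action"`
	Principals []storagePrincipal `json:"principals"`
	Type       string             `json:"type"`
	Result     string             `json:"result"`
	Reason     string             `json:"reason"`
}

type storageIdentity struct {
	Type      string `json:"type"`
	TokenHash string `json:"tokenHash"`
	// Authorization is an array here, unlike the Activity Log object.
	Authorization []storageAuthorization `json:"authorization"`
	Requester     struct {
		ObjectID string `json:"objectId"`
		TenantID string `json:"tenantId"`
	} `json:"requester"`
}

// parseStorageIdentity decodes one Storage identity object.
func parseStorageIdentity(raw []byte) (storageIdentity, error) {
	var id storageIdentity
	err := json.Unmarshal(raw, &id)
	return id, err
}

// deniedActions returns the actions whose authorization result is "Denied".
func deniedActions(id storageIdentity) []string {
	var out []string
	for _, a := range id.Authorization {
		if a.Result == "Denied" {
			out = append(out, a.Action)
		}
	}
	return out
}

func main() {
	raw := []byte(`{"type":"DelegationSAS","authorization":[{"action":"Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read","principals":[{"id":"...","type":"User"}],"type":"RBAC","result":"Denied","reason":"NoPolicy"}],"requester":{"objectId":"...","tenantId":"..."}}`)
	id, err := parseStorageIdentity(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("auth type:", id.Type)
	fmt.Println("denied:", deniedActions(id))
}
```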

Pattern 3: Generic/Unknown (KeyVault-like)

Used by: general/maximum.json (test data, possibly KeyVault)

This appears to be a simplified or different identity structure.

{
  "claim": {
    "oid": "607964b6-41a5-4e24-a5db-db7aab3b9b34"
  }
}

Key characteristics:

  • Uses claim (singular) instead of claims (plural)
  • Contains only oid (object identifier)
  • Much simpler structure than Activity Log identity

@zmoog
Contributor Author

zmoog commented Jan 30, 2026

The current PR is based on the pkg/translator/azurelogs implementation, which attempts to:

  1. Parse identity as Activity Log structure (with authorization object and claims map)
  2. Extract specific fields into flat OpenTelemetry attributes

So, for the Policy category, the translator and current PR create azure.identity.authorization.* fields like:

  - key: azure.identity.authorization.action
    value:
      stringValue: microsoft.insights/diagnosticSettings/write
  - key: azure.identity.authorization.evidence.role
    value:
      stringValue: Owner
  - key: azure.identity.authorization.evidence.role.assignment.scope
    value:
      stringValue: /subscriptions/11111111-1111-1111-1111-111111111111

Instead, the storage categories map identity as follows:

  - key: azure.identity
    value:
      kvlistValue:
        values:
          - key: authorization
            value:
              arrayValue:
                values:
                  - kvlistValue:
                      values:
                        - key: principals
                          value:
                            arrayValue:
                              values:
                                - kvlistValue:
                                    values:
                                      - key: type
                                        value:
                                          stringValue: User
                                      - key: id
                                        value:
                                          stringValue: 7890abcd-ef12-3456-7890-abcd12345678

The structural difference is that azure.identity.authorization is an array in storage.

@zmoog
Contributor Author

zmoog commented Jan 30, 2026

@Fiery-Fenix @constanca-m, how should we move forward?

Should we continue deviating from the extension by:

  • Moving away from the flat fields in azure.identity.*?
  • Mapping azure.identity.authorization as an array to support all existing sub-fields from activity and storage logs?

In that case, how should we handle the azure.identity.claims elements? Should we:

  • Map each claim to an OpenTelemetry attribute (e.g., mapping the claim http://schemas.[..]/identity/claims/emailaddress as user.email field), like the translator?
  • Map them as-is, like the extension?

@Fiery-Fenix
Contributor

@Fiery-Fenix @constanca-m, how should we move forward?

@zmoog How about using a trick similar to the one we used for general Category parsing? It should bring the required flexibility to Azure Identity parsing.

  • Remove the Identity field from the azureLogRecordBase structure. It seems to be present in only a small subset of Categories, so it is not needed in all of them.
  • Create a new azureIdentityRecord interface:
type azureIdentityRecord interface {
  PutIdentityAttributes(attrs pcommon.Map)
}
  • Create a base structure with the Azure Identity fields that are common, or at least those we know to be common at the moment:
type azureIdentityBase struct {
  // ... list of common fields ...
}
func (i *azureIdentityBase) PutIdentityAttributes(attrs pcommon.Map) {
  // put common Identity fields into Log Attributes here
}
  • Create Category-specific structures that handle the discrepancies between the common Identity schema and the Category-specific Identity schema, such as the authorization object/array difference you mentioned:
type azureIdentityStorage struct {
  azureIdentityBase
  // ... set of category-specific fields ...
}
func (i *azureIdentityStorage) PutIdentityAttributes(attrs pcommon.Map) {
  // put common Identity fields into Log Attributes
  i.azureIdentityBase.PutIdentityAttributes(attrs)

  // put category-specific Identity fields into Log Attributes here
}
  • In the Category structures that need it, add an Identity field with the specific type:
type azureStorageBlobLog struct {
  azureLogRecordBase

  // ... existing fields ...

  Identity azureIdentityStorage `json:"identity"`
}
  • Add the Identity attributes processing call to the existing PutCommonAttributes Category function:
func (r *azureStorageBlobLog) PutCommonAttributes(attrs pcommon.Map, body pcommon.Value) {
  // existing code goes here

  // add Identity attributes
  r.Identity.PutIdentityAttributes(attrs)
}

I think this approach lets us adapt Azure Identity parsing to any JSON structure that may be required in the future. It also avoids additional JSON parsing of the identity field, since the whole category-specific JSON record is parsed in one pass, which is good for performance as well.
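A self-contained sketch of the embedding pattern proposed above, runnable with the standard library only. It substitutes a plain `attrMap` for `pcommon.Map`, and the field choices and attribute keys (`azure.identity.claims.*`, `azure.identity.type`, `azure.identity.token_hash`) are assumptions for illustration, not the names used in the merged code.

```go
package main

import "fmt"

// attrMap is a stand-in for pcommon.Map so the sketch has no external dependencies.
type attrMap map[string]any

// identityRecord plays the role of the proposed azureIdentityRecord interface.
type identityRecord interface {
	PutIdentityAttributes(attrs attrMap)
}

// identityBase holds fields assumed to be common; treating claims as
// common is an assumption made only for this example.
type identityBase struct {
	Claims map[string]string
}

// PutIdentityAttributes writes the common fields.
func (i *identityBase) PutIdentityAttributes(attrs attrMap) {
	for name, value := range i.Claims {
		attrs["azure.identity.claims."+name] = value
	}
}

// identityStorage embeds the base and adds category-specific fields.
type identityStorage struct {
	identityBase
	Type      string
	TokenHash string
}

// PutIdentityAttributes writes the common fields first, then the
// storage-specific ones, shadowing the embedded method.
func (i *identityStorage) PutIdentityAttributes(attrs attrMap) {
	i.identityBase.PutIdentityAttributes(attrs)
	attrs["azure.identity.type"] = i.Type
	attrs["azure.identity.token_hash"] = i.TokenHash
}

// Compile-time checks that both types satisfy the interface.
var (
	_ identityRecord = (*identityBase)(nil)
	_ identityRecord = (*identityStorage)(nil)
)

func main() {
	var rec identityRecord = &identityStorage{
		identityBase: identityBase{Claims: map[string]string{"oid": "..."}},
		Type:         "DelegationSAS",
		TokenHash:    "system-delegation(...),SasSignature(...)",
	}
	attrs := attrMap{}
	rec.PutIdentityAttributes(attrs)
	fmt.Println(len(attrs), "attributes written")
}
```

Because the category type defines its own `PutIdentityAttributes`, the embedded method is shadowed and must be called explicitly, which is what keeps the common fields in the output.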

@zmoog
Contributor Author

zmoog commented Feb 2, 2026

How about using a trick similar to the one we used for general Category parsing? It should bring the required flexibility to Azure Identity parsing.

I think this approach lets us adapt Azure Identity parsing to any JSON structure that may be required in the future.

It's a sound approach.

But before moving to the implementation, I would love to sort out the general problem.


We learned that activity and storage logs both have an identity field, but with diverging structures:

  • Activity Logs: identity is an object with authorization (object) and claims (map)
  • Storage Logs: identity is an object with authorization (array), type, tokenHash, and requester

Structure is clearly different, but what about the semantics? Are these semantically the same, similar, or fundamentally different?

These seem to be semantically different concepts that happen to share the field name identity:

Activity Log identity = Caller Identity Assertion

  • Answers: "Who is the authenticated entity that performed this action?"
  • Contains: Identity claims from OAuth/OIDC token
  • Purpose: Audit trail of WHO

Storage Log identity = Authorization Decision Audit

  • Answers: "How was this request authenticated and what authorization decisions were made?"
  • Contains: Authentication method, authorization check results, denial reasons
  • Purpose: Audit trail of HOW and WHY (allowed or denied)

The Storage identity is closer to an access decision log than a pure identity claim.
It records authorization outcomes, which seems quite different from Activity Log's
identity assertions.

Based on differences in semantics, should we separate the concepts into different attribute families?

For example (just an example, I'm not suggesting to make these changes):

Activity Logs → azure.identity.*
  azure.identity.issuer
  azure.identity.audience
  azure.identity.authorization.scope
  azure.identity.authorization.action
  user.email
  user.name

Storage Logs → azure.authorization.*
  azure.authorization.type              (DelegationSAS, OAuth, AccountKey)
  azure.authorization.token_hash
  azure.authorization.decisions         (array of check results)
  azure.requester.object_id
  azure.requester.tenant_id

As a user, I would find it confusing to get log events that share the same azure.identity field but carry two different structures and semantics.

@zmoog
Contributor Author

zmoog commented Feb 2, 2026

Or, perhaps simply renaming azure.identity.authorization to azure.identity.authorizations in the storage logs would resolve the field conflict, effectively turning azure.identity into a general container for identity-related info.

I can live with that, but schema inconsistency across categories still gives me some pause. Could having the same field with different structures and semantics cause issues for downstream backends?

@constanca-m
Contributor

I propose these semantic conventions:

| Azure Claim Field | OpenTelemetry Attribute |
| --- | --- |
| http://.../nameidentifier | user.id |
| http://.../emailaddress | user.email |
| http://.../scope | user.roles |

I could not find conventions that fit the other fields, so I think we could use the ones you mentioned.

For the storage logs, since authorization is an array, I agree it makes sense to use the ones you mentioned.
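The claim mapping in the table above could be sketched in Go as follows. The full claim URIs are taken from the Activity Log sample earlier in this thread; the `mapClaims` helper is hypothetical, and the scope claim is deliberately left unmapped because its target attribute was still under discussion.

```go
package main

import "fmt"

// Claim URIs as they appear in the Activity Log sample in this thread.
const (
	claimNameIdentifier = "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier"
	claimEmailAddress   = "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress"
)

// claimToAttribute follows the proposed table; scope is intentionally absent.
var claimToAttribute = map[string]string{
	claimNameIdentifier: "user.id",
	claimEmailAddress:   "user.email",
}

// mapClaims splits claims into those with a known attribute name and the
// rest, which a caller could keep under an Azure-specific prefix.
func mapClaims(claims map[string]string) (mapped, rest map[string]string) {
	mapped = make(map[string]string)
	rest = make(map[string]string)
	for name, value := range claims {
		if attr, ok := claimToAttribute[name]; ok {
			mapped[attr] = value
		} else {
			rest[name] = value
		}
	}
	return mapped, rest
}

func main() {
	mapped, rest := mapClaims(map[string]string{
		claimEmailAddress:   "user@example.com",
		claimNameIdentifier: "...",
		"idtyp":             "user",
	})
	fmt.Println("mapped:", mapped)
	fmt.Println("unmapped:", rest)
}
```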

It seems azure.identity is missing from some categories, and there are
significant structural and semantic differences between them.
@zmoog
Contributor Author

zmoog commented Feb 13, 2026

@Fiery-Fenix, I think your proposal (moving azure.identity mappings to categories) makes sense.

I don't think we have a good sampling of how the categories implement azure.identity, at least not yet. So, we can deal with the current status in categories and maybe re-evaluate this decision as we add more log categories over time.

So I'm making these changes and addressing other suggestions from the review comments:

  • Switched user.name to user.id for the nameidentifier claim (per @constanca-m's semconv suggestion)
  • Removed rawMapIdentity and generic identity handling from azureLogRecordGeneric - unknown categories no longer parse identity
  • Aligned azureIdentityStorage structure with activity logs by embedding azureIdentityBase (following your proposed pattern)

Remove trailing blank line in category_identity.go and fix struct field alignment in category_storage.go

Co-authored-by: Cursor <cursoragent@cursor.com>
@zmoog
Contributor Author

zmoog commented Feb 18, 2026

@constanca-m, on scope → user.roles: scope seems to refer to OAuth2 permissions, which doesn't quite fit user.roles semantically. In the sample logs I often see user_impersonation as the value.

How strongly do you feel about pushing this through?

@zmoog zmoog requested a review from Fiery-Fenix February 18, 2026 14:58
@zmoog
Contributor Author

zmoog commented Feb 18, 2026

@Fiery-Fenix, could you take another look when you have a moment?

Contributor

@Fiery-Fenix Fiery-Fenix left a comment

Looks great!

Contributor

@constanca-m constanca-m left a comment

Sorry I am being such a slow reviewer

Just dropping the empty field denyAssignmentId
@zmoog zmoog requested a review from constanca-m February 24, 2026 17:06
Contributor

@constanca-m constanca-m left a comment

I am a bit worried about using camel case for attributes, but if this behavior was already present, we can handle it in a different PR.

@zmoog
Contributor Author

zmoog commented Feb 25, 2026

I am a bit worried about using camel case for attributes, but if this behavior was already present, we can handle it in a different PR.

I see, but it was already part of the storage mapping.

@zmoog
Contributor Author

zmoog commented Feb 25, 2026

@atoulme, I think we're ready for your final review. Could you take a look when you have a moment? 🙇

@atoulme atoulme merged commit 7154bf1 into open-telemetry:main Feb 27, 2026
197 checks passed
@otelbot
Contributor

otelbot bot commented Feb 27, 2026

Thank you for your contribution @zmoog! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey. If you are getting started contributing, you can also join the CNCF Slack channel #opentelemetry-new-contributors to ask for guidance and get help.

antonio-mazzini pushed a commit to antonio-mazzini/opentelemetry-collector-contrib that referenced this pull request Mar 5, 2026
…oscale, Policy, Security, ServiceHealth, and ResourceHealth log categories. (open-telemetry#45699)
avleentwilio pushed a commit to avleentwilio/opentelemetry-collector-contrib that referenced this pull request Apr 1, 2026
5 participants