Bug: Resource is never reconciled if Azure API has a temporary issue #4902

@HarleyB123

Describe the bug

We had a resource created via Azure Service Operator (FederatedIdentityCredential). This resource was created in March.

On a recent reconciliation we got an error of MethodNotAllowed, see below:

Status:
  ...
  Conditions:
    Last Transition Time:  2025-08-22T19:32:05Z
    Message:               The request format was unexpected : Support for federated identity credentials not enabled: PUT https://management.azure.com/subscriptions/redacted....
--------------------------------------------------------------------------------
RESPONSE 405: 405 Method Not Allowed
ERROR CODE: MethodNotAllowed
--------------------------------------------------------------------------------
{
  "error": {
    "code": "MethodNotAllowed",
    "message": "The request format was unexpected : Support for federated identity credentials not enabled"
  }
}
--------------------------------------------------------------------------------

    Observed Generation:  1
    Reason:               MethodNotAllowed
    Severity:             Error
    Status:               False
    Type:                 Ready

According to the Azure documentation here, this error occurs when you try to create this resource in an unsupported region (the documentation lists only two). This FederatedIdentityCredential has existed since March and is in the eastus region, which leads us to suspect that the Azure API had a temporary blip or outage.

If Azure Service Operator reconciled this resource again afterwards, this wouldn't be an issue. However, MethodNotAllowed is classified as a fatal error, which gives the condition a Severity of Error. The logic here then returns early and stops tracking the resource in those cases. This matches our Azure Service Operator logs, where we see the error and then no further reconciliation attempts against this resource.
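
To make the behaviour described above concrete, here is a minimal Go sketch of the pattern. It is not ASO's actual code: the `Severity` type, the `fatalCodes` table, and the `Classify` and `ShouldRequeue` functions are hypothetical names made up for illustration. `MethodNotAllowed` and `PasswordTooLong` are the codes mentioned in this issue; `TooManyRequests` is only an assumed example of a code that would be treated as transient.

```go
package main

import "fmt"

// Severity is a hypothetical stand-in for the severity recorded on the Ready condition.
type Severity string

const (
	SeverityError   Severity = "Error"   // treated as fatal: reconciliation stops
	SeverityWarning Severity = "Warning" // treated as transient: reconciliation retries
)

// fatalCodes is an illustrative table of error codes classified as fatal.
// Only the two codes mentioned in this issue are listed.
var fatalCodes = map[string]bool{
	"MethodNotAllowed": true,
	"PasswordTooLong":  true,
}

// Classify maps an Azure error code to a severity.
func Classify(code string) Severity {
	if fatalCodes[code] {
		return SeverityError
	}
	return SeverityWarning
}

// ShouldRequeue reports whether another reconcile would be scheduled.
// With a fast return on fatal errors, a fatal code means no further attempts.
func ShouldRequeue(code string) bool {
	return Classify(code) != SeverityError
}

func main() {
	// "TooManyRequests" is an assumed example of a non-fatal code.
	for _, code := range []string{"MethodNotAllowed", "TooManyRequests"} {
		fmt.Printf("%s: severity=%s requeue=%t\n", code, Classify(code), ShouldRequeue(code))
	}
}
```

In this sketch a single spurious `MethodNotAllowed` response is enough to make `ShouldRequeue` return false, which is the situation we ended up in.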

For us, this means that a temporary API mishap leaves this resource broken in the cluster until we either restart the controllers or delete/re-apply the resource.

Azure Service Operator Version: 2.12.0

Expected behavior

Ideally, Azure wouldn't have API issues that could put our resource in this state, and I understand the reasoning behind not retrying when the error suggests it will never be recoverable. However, if the API can return such an error spuriously, maybe it's worth reconciling regardless, with a backoff?
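
As a rough illustration of what "reconciling regardless with a backoff" could look like, here is a small Go sketch of a capped exponential delay. The `RetryDelay` function and the one-minute base and 24-hour cap are assumptions chosen for the example, not existing ASO settings. In a controller-runtime based reconciler, a delay like this would typically be returned as `ctrl.Result{RequeueAfter: delay}`.

```go
package main

import (
	"fmt"
	"time"
)

// RetryDelay returns a capped exponential delay for a 0-based attempt number:
// base, 2*base, 4*base, ... never exceeding limit.
// Hypothetical helper for illustration only.
func RetryDelay(attempt int, base, limit time.Duration) time.Duration {
	delay := base
	for i := 0; i < attempt; i++ {
		delay *= 2
		if delay >= limit {
			// Stop doubling once the cap is reached, which also avoids overflow.
			return limit
		}
	}
	if delay > limit {
		return limit
	}
	return delay
}

func main() {
	// Assumed values: retry "fatal" errors slowly, starting at one minute
	// and backing off to at most once a day.
	base, limit := time.Minute, 24*time.Hour
	for attempt := 0; attempt < 5; attempt++ {
		fmt.Printf("attempt %d: retry after %s\n", attempt, RetryDelay(attempt, base, limit))
	}
}
```

With a long cap, a genuinely unrecoverable error costs only an occasional extra API call, while a resource hit by a transient API fault would recover without a controller restart.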

To Reproduce

Tough to reproduce, but to see the same behaviour you could trigger another fatal error such as PasswordTooLong and verify that no further reconciliations occur against the resource.
