
[RFE] Report failures to Azure when there's an unrecoverable error #58

Open
@anhvoms


Current situation

Currently, azure-init does not report any failures to Azure. If it cannot finish provisioning, it exits with an error code. From the user's perspective, provisioning eventually fails with an OS provisioning timeout, because the Azure platform never receives a provisioning-complete signal.

In many cases the user cannot access the VM when provisioning fails, and so may have a very hard time figuring out why it failed.

Ideal future situation

Have azure-init report failures to Azure, which will then fail provisioning with a useful error message indicating why provisioning failed.

Implementation options

These two options are not mutually exclusive; they complement each other.

  1. Use wireserver to report errors to the platform, as cloud-init does. azure-init will need to construct a health report similar to the one that reports provisioning complete, but with a status of NotReady, a substatus of ProvisioningFailed, and a meaningful description that will eventually show up as the error message returned to the user.
    I would strongly encourage azure-init to follow the error messages used by cloud-init, because we have post-processing, monitoring, and alerting mechanisms built around the errors returned by cloud-init.
    A sample error returned by cloud-init:

result=error|reason=http error querying IMDS|agent=Cloud-Init/23.3.3-0ubuntu0~20.04.1|http_code=410|duration=300.2051315307617|'exception=UrlError(''410 Client Error: Gone for url: http://169.254.169.254/metadata/instance?api-version=2021-08-01&extended=true'')'|url=http://169.254.169.254/metadata/instance?api-version=2021-08-01&extended=true|vm_id=e76f68ac-04a8-4069-be7c-7f04b01f520f|timestamp=2024-03-12T09:39:16.373226|documentation_url=https://aka.ms/linuxprovisioningerror

  2. Failure reporting via wireserver only works if azure-init can establish communication with wireserver and successfully post the error. When that is not possible, the other option is to write a KVP containing the error, which the Azure platform will then process.
    See the cloud-init implementation for reference.
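
To make the two options concrete, here is a minimal Python sketch (Python because the cloud-init reference is written in it; azure-init would need an equivalent). It is not the cloud-init implementation, and several details are assumptions: the quoting in the sample error above looks like CSV-style encoding with `|` as the delimiter and `'` as the quote character, so `encode_report` reproduces that; the 512-byte key and 2048-byte value record sizes are assumed from the Linux Hyper-V KVP pool file layout; and the names `encode_report`, `pack_kvp_record`, `append_kvp`, and the `PROVISIONING_REPORT` key in the usage note are illustrative.

```python
import csv
import io
import struct

# Record layout assumed from the Linux Hyper-V KVP pool files:
# a fixed 512-byte key followed by a fixed 2048-byte value, NUL-padded.
KVP_KEY_SIZE = 512
KVP_VALUE_SIZE = 2048


def encode_report(fields):
    """Join (key, value) pairs as key=value fields separated by '|'.

    A field that contains '|' or a single quote is wrapped in single
    quotes, with embedded quotes doubled, which matches the quoting
    visible in the sample error.
    """
    buf = io.StringIO()
    writer = csv.writer(
        buf, delimiter="|", quotechar="'", quoting=csv.QUOTE_MINIMAL
    )
    writer.writerow([f"{key}={value}" for key, value in fields])
    return buf.getvalue().rstrip("\r\n")


def pack_kvp_record(key, value):
    """Pack one key/value pair into a single fixed-size KVP record."""
    key_bytes = key.encode("utf-8")
    if len(key_bytes) >= KVP_KEY_SIZE:
        raise ValueError("key does not fit in one KVP record")
    # Simplification: truncate an over-long value (this may cut a
    # multi-byte character) and keep at least one trailing NUL byte.
    value_bytes = value.encode("utf-8")[: KVP_VALUE_SIZE - 1]
    # The 's' format pads short inputs with NUL bytes up to the given size.
    return struct.pack(
        f"{KVP_KEY_SIZE}s{KVP_VALUE_SIZE}s", key_bytes, value_bytes
    )


def append_kvp(pool_path, key, value):
    """Append one record to a KVP pool file (path supplied by the caller)."""
    with open(pool_path, "ab") as pool:
        pool.write(pack_kvp_record(key, value))
```

For example, `append_kvp(pool_path, "PROVISIONING_REPORT", encode_report([("result", "error"), ("reason", "http error querying IMDS")]))` would append one record carrying the same description string that option 1 would place in the health report. The pool path is left as a parameter; on Linux guests it is assumed to be `/var/lib/hyperv/.kvp_pool_1`. A real implementation would likely also need file locking and a way to split values longer than one record.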

Metadata


Assignees

Labels

  • Epic: A very large feature request or roadmap item.
  • error-handling-logging
  • feature: New feature or request

Type

No type

Projects

  • Status

    Backlog

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
