Skip to content

Topology-aware routing causes envoy proxy crashloops #6117

Open
@dghubble

Description

@dghubble

Description:

On Azure, some regions don't have availability zones, so AKS annotates nodes with topology.kubernetes.io/zone: 0 (see docs).

In envoy-gateway v1.4.0, topolgy-aware routing starts to use these pod annotations to populate the envoy bootstrap config. But envoy considers the value of zero to be invalid.

ENVOY_SERVICE_ZONE:        (v1:metadata.annotations['topology.kubernetes.io/zone'])                                                                                                                                                                                                                                   

As a result, envoy proxy pods crashloop on upgrade to envoy-gateway v1.4.0.

envoy  ': Unable to parse JSON as proto (INVALID_ARGUMENT: invalid JSON in envoy.config.bootstrap.v3.Bootstrap @ static_resources.clusters[1].load_assignment.endpoints[0].locality.zone:  string, near   1:1020   (offset 1019): unexpected  character: '0'; expected  '"'): {"cluster_manager":{"local_cluster_name":"loc │
│ envoy [2025-05-19 17:34:15.677][1][info][main] [source/server/server.cc:1072] exiting                                                                                                                                                                                                                                       │
│ envoy Unable to parse JSON as proto (INVALID_ARGUMENT: invalid JSON in envoy.config.bootstrap.v3.Bootstrap @ static_resources.clusters[1].load_assignment.endpoints[0].locality.zone:  string, near   1:1020   (offset 1019): unexpected  character: '0'; expected  '"'): {"cluster_manager":{"local_cluster_name":"local_cluster...

Rolling back to v1.4.0-rc.2 (which I was testing previously) resolves the issue, because the release candidate does not have the change that added this behavior it seems.

What issue is being seen? Describe what should be happening instead of
the bug, for example: The expected value isn't returned, etc.

Repro steps:

Annotate the envoy proxy pods with topology.kubernetes.io/zone: 0.

Include sample requests, environment, etc. All data and inputs
required to reproduce the bug.

Note:

I believe stringifying the value to "0" (in quotes) could be a solution potentially. But I'm not that familiar with this new feature being added.

Environment:

Include the environment like gateway version, envoy version and so on.

Azure AKS

Logs:

Include the access logs and the Envoy logs.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions