Description
Description:
On Azure, some regions don't have availability zones, so AKS annotates nodes with topology.kubernetes.io/zone: 0
(see docs).
In envoy-gateway v1.4.0, topolgy-aware routing starts to use these pod annotations to populate the envoy bootstrap config. But envoy considers the value of zero to be invalid.
ENVOY_SERVICE_ZONE: (v1:metadata.annotations['topology.kubernetes.io/zone'])
As a result, envoy proxy pods crashloop on upgrade to envoy-gateway v1.4.0.
envoy ': Unable to parse JSON as proto (INVALID_ARGUMENT: invalid JSON in envoy.config.bootstrap.v3.Bootstrap @ static_resources.clusters[1].load_assignment.endpoints[0].locality.zone: string, near 1:1020 (offset 1019): unexpected character: '0'; expected '"'): {"cluster_manager":{"local_cluster_name":"loc │
│ envoy [2025-05-19 17:34:15.677][1][info][main] [source/server/server.cc:1072] exiting │
│ envoy Unable to parse JSON as proto (INVALID_ARGUMENT: invalid JSON in envoy.config.bootstrap.v3.Bootstrap @ static_resources.clusters[1].load_assignment.endpoints[0].locality.zone: string, near 1:1020 (offset 1019): unexpected character: '0'; expected '"'): {"cluster_manager":{"local_cluster_name":"local_cluster...
Rolling back to v1.4.0-rc.2 (which I was testing previously) resolves the issue, because the release candidate does not have the change that added this behavior it seems.
What issue is being seen? Describe what should be happening instead of
the bug, for example: The expected value isn't returned, etc.
Repro steps:
Annotate the envoy proxy pods with topology.kubernetes.io/zone: 0
.
Include sample requests, environment, etc. All data and inputs
required to reproduce the bug.
Note:
I believe stringifying the value to "0" (in quotes) could be a solution potentially. But I'm not that familiar with this new feature being added.
Environment:
Include the environment like gateway version, envoy version and so on.
Azure AKS
Logs:
Include the access logs and the Envoy logs.