Skip to content

incrementalXDS silently ignores requests from the second request for the same typeURL #448

Open
@jparklab

Description

@jparklab

Envoy sends multiple DeltaDiscoveryRequests for the same typeURL via the same stream connection to keep receiving updates for the type. However, the current incrementalXDS implementation silently ignores from the second request.

Here's what we saw from output of our integration test.(we added callbacks to print messages on delta stream open/close/request/response)

{"level":"debug","timestamp":"2021-06-11T18:40:31.806Z","msg":"Delta stream connected(peer: 172.23.160.81:39808, streamID: 1, typeURL: )","caller":"service/server.go:409"}
{"level":"debug","timestamp":"2021-06-11T18:40:31.807Z","msg":"Delta stream request(streamID: 1, node: test-cluster/test-id, typeUrl: type.googleapis.com/envoy.config.cluster.v3.Cluster, has initial versions: false, subscribe: 0, unsubscribe: 0)","caller":"service/server.go:430"}
{"level":"debug","timestamp":"2021-06-11T18:40:31.847Z","msg":"Delta stream response(streamID: 1, node: test-cluster/test-id, typeUrl: type.googleapis.com/envoy.config.cluster.v3.Cluster, resouces: 2000, removed resources: 0)","caller":"service/server.go:436"}
{"level":"debug","timestamp":"2021-06-11T18:40:34.593Z","msg":"Delta stream request(streamID: 1, node: test-cluster/test-id, typeUrl: type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment, has initial versions: false, subscribe: 2000, unsubscribe: 0)","caller":"service/server.go:430"}
{"level":"debug","timestamp":"2021-06-11T18:40:34.623Z","msg":"Delta stream request(streamID: 1, node: test-cluster/test-id, typeUrl: type.googleapis.com/envoy.config.cluster.v3.Cluster, has initial versions: false, subscribe: 0, unsubscribe: 0)","caller":"service/server.go:430"}
{"level":"debug","timestamp":"2021-06-11T18:40:34.648Z","msg":"Delta stream response(streamID: 1, node: test-cluster/test-id, typeUrl: type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment, resouces: 2000, removed resources: 0)","caller":"service/server.go:436"}
{"level":"debug","timestamp":"2021-06-11T18:40:34.898Z","msg":"Delta stream request(streamID: 1, node: test-cluster/test-id, typeUrl: type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment, has initial versions: false, subscribe: 0, unsubscribe: 0)","caller":"service/server.go:430"}
{"level":"debug","timestamp":"2021-06-11T18:40:34.991Z","msg":"Delta stream request(streamID: 1, node: test-cluster/test-id, typeUrl: type.googleapis.com/envoy.config.listener.v3.Listener, has initial versions: false, subscribe: 0, unsubscribe: 0)","caller":"service/server.go:430"}
{"level":"debug","timestamp":"2021-06-11T18:40:35.018Z","msg":"Delta stream response(streamID: 1, node: test-cluster/test-id, typeUrl: type.googleapis.com/envoy.config.listener.v3.Listener, resouces: 2, removed resources: 0)","caller":"service/server.go:436"}
{"level":"debug","timestamp":"2021-06-11T18:40:35.247Z","msg":"Delta stream request(streamID: 1, node: test-cluster/test-id, typeUrl: type.googleapis.com/envoy.config.route.v3.RouteConfiguration, has initial versions: false, subscribe: 1, unsubscribe: 0)","caller":"service/server.go:430"}
{"level":"debug","timestamp":"2021-06-11T18:40:35.272Z","msg":"Delta stream request(streamID: 1, node: test-cluster/test-id, typeUrl: type.googleapis.com/envoy.config.listener.v3.Listener, has initial versions: false, subscribe: 0, unsubscribe: 0)","caller":"service/server.go:430"}
{"level":"debug","timestamp":"2021-06-11T18:40:35.274Z","msg":"Delta stream response(streamID: 1, node: test-cluster/test-id, typeUrl: type.googleapis.com/envoy.config.route.v3.RouteConfiguration, resouces: 1, removed resources: 0)","caller":"service/server.go:436"}
{"level":"debug","timestamp":"2021-06-11T18:40:35.709Z","msg":"Delta stream request(streamID: 1, node: test-cluster/test-id, typeUrl: type.googleapis.com/envoy.config.route.v3.RouteConfiguration, has initial versions: false, subscribe: 0, unsubscribe: 0)","caller":"service/server.go:430"}
{"level":"debug","timestamp":"2021-06-11T18:40:37.591Z","msg":"Updating snapshot to version 1623437330","caller":"service/server.go:552"}

on 2021-06-11T18:40:37.591Z, the snapshot is updated to a new version that adds a new routes to RouteConfiguration along with a new Cluster/ClusterAssignment, but there is no response sent to the envoy until the test times out after 10 minutes.

It seems like that the logic at https://github.com/envoyproxy/go-control-plane/blob/main/pkg/server/delta/v3/server.go#L160 does not handle subsequent requests for the same typeURL well.

I managed it to pass our integration tests by patching the code that handles the request(https://github.com/envoyproxy/go-control-plane/blob/main/pkg/server/delta/v3/server.go#L160-L199) like below.

			if ok {
				// cancel existing watch
				watch.Cancel()
				if len(req.GetResourceNamesSubscribe()) != 0 || len(req.GetResourceNamesUnsubscribe()) != 0 {
					s.subscribe(req.GetResourceNamesSubscribe(), watch.state.ResourceVersions)
					s.unsubscribe(req.GetResourceNamesUnsubscribe(), watch.state.ResourceVersions)
				}
			} else {
				// Initialize the state of the stream.
				// Since there was no previous state, we know we're handling the first request of this type
				// so we set the initial resource versions if we have any, and also signal if this stream is in wildcard mode.
				watch.state = stream.NewStreamState(len(req.GetResourceNamesSubscribe()) == 0, req.GetInitialResourceVersions())
				if len(req.GetResourceNamesSubscribe()) != 0 {
					s.subscribe(req.GetResourceNamesSubscribe(), watch.state.ResourceVersions)
					s.unsubscribe(req.GetResourceNamesUnsubscribe(), watch.state.ResourceVersions)
				}				
			}
			
			// Code below is the same as line 173 to 199(https://github.com/envoyproxy/go-control-plane/blob/main/pkg/server/delta/v3/server.go#L173)
			watch.responses, watch.cancel = s.cache.CreateDeltaWatch(req, watch.state)

			...

(FYI, the tests still fail sometimes because the stream is closed unexpected by the controller, and envoy does not seem to be able to recover it. I'll post an update if I find why the stream is closed unexpected)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions