Description
The `LinearCache` clears the watches registered under the names of changed resources in its `notifyAll` call (L147):

`go-control-plane/pkg/cache/v3/linear.go`, lines 140 to 148 at commit 996a28b
However, clearing only the watches under the changed name is not enough: every watch that points to the same `chan Response` must be cleared, because the sotw v3 server creates that `chan Response` with a buffer of only 1 (L369):

`go-control-plane/pkg/server/sotw/v3/server.go`, lines 362 to 371 at commit 7e211bd
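Consider the following sequence:

- The sotw server receives a `DiscoveryRequest` with 2 resource names and calls `cache.CreateWatch` on the `LinearCache`.
- The `LinearCache` registers the `chan Response` provided by the sotw server under 2 watch entries, one per requested resource.
- The `LinearCache`'s `UpdateResource` is called with the first resource name. The response occupies the channel's single buffer slot until the sotw server reads it, but only the watch under that name is removed.
- The `LinearCache`'s `UpdateResource` is called with the second resource name. Its watch still points to the same `chan Response`, whose buffer is still full, so the send blocks while the `LinearCache` lock is held.
- The sotw server receives another `DiscoveryRequest`, but the `LinearCache` is still locked. The goroutine that would drain the channel is now waiting for that lock, so the two are deadlocked.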
The `LinearCache` could simply maintain an additional `chan Response` -> resource names map so that `notifyAll` can quickly clear every watch on a notified channel. But I think the root cause is that the sotw server uses a single goroutine to handle both directions of the bidirectional gRPC stream.
If it handled them in separate goroutines, there would be no such deadlock, and there might not even be a need to create a new `chan Response` on each `DiscoveryRequest` or to unregister watches on every `notifyAll` call in the `LinearCache`.