Skip to content

Sotw v3 server causing deadlock in LinearCache #530

Open
@rueian

Description

@rueian

The LinearCache will clear the watches under the name of changed resources in notifyAll calls (L147):

func (cache *LinearCache) notifyAll(modified map[string]struct{}) {
// de-duplicate watches that need to be responded
notifyList := make(map[chan Response][]string)
for name := range modified {
for watch := range cache.watches[name] {
notifyList[watch] = append(notifyList[watch], name)
}
delete(cache.watches, name)
}

However, just cleaning the the watches under the name is not enough. It needs to clean all the watches to the same chan Response. Because the Sotw v3 server creates the chan Response with only 1 buffer (L369):

// cancel existing watches to (re-)request a newer version
switch {
case req.TypeUrl == resource.EndpointType:
if values.endpointNonce == "" || values.endpointNonce == nonce {
if values.endpointCancel != nil {
values.endpointCancel()
}
values.endpoints = make(chan cache.Response, 1)
values.endpointCancel = s.cache.CreateWatch(req, streamState, values.endpoints)
}

Consider the following sequence:

  1. The sotw server receives a DiscoveryRequest with 2 resource names, and calls the cache.CreateWatch to the LinearCache.
  2. The LinearCache registers the chan Response provided by the sotw server with 2 watch entries corresponding to the requested resources.
  3. The LinearCache's UpdateResource is called with the first resource name.
  4. The LinearCache's UpdateResource is called with the second resource name and the chan Response is blocked.
  5. The sotw server receives another DiscoveryRequest but the LinearCache is still locked therefore they are deadlocked.

The LinearCache could just maintain another chan Response -> resource name map for fast cleaning in notifyAll call. But I think the root cause is that the sotw server uses a single goroutine to handle both streams of bi-di grpc.

If it handled them in separated goroutines, there would be no such deadlock, and there might be even no need to recreate a new chan Response on each DiscoveryRequest and unregister watches on each notifyAll call in LinearCache.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions