Deletion race condition when fetching secret from the Kubernetes secrets provider cache #6340

@cmacknz

Description

The Kubernetes secrets provider caches secrets and updates them on a configurable schedule to avoid placing unnecessary load on the Kubernetes API.

The cache is updated in a goroutine:

if !p.config.DisableCache {
	go p.updateSecrets(ctx, comm)
}

The cache is copied, the latest value for each non-expired secret is fetched, and then the copy is merged into the active map.

// to not hold the lock for long, we copy the current state of the cache map
copyMap := make(map[string]secretsData)
p.secretsCacheMx.RLock()
for name, data := range p.secretsCache {
	copyMap[name] = *data
}
p.secretsCacheMx.RUnlock()
// The only way to update an entry in the cache is through the last access time (to delete the key)
// or if the value gets updated.
for name, data := range copyMap {
	diff := time.Since(data.lastAccess)
	if diff < p.config.TTLDelete {
		value, ok := p.fetchSecretWithTimeout(name)
		if ok {
			newData := &secretsData{
				value:      value,
				lastAccess: data.lastAccess,
			}
			cacheTmp[name] = newData
			if value != data.value {
				updatedCache = true
			}
		}
	} else {
		updatedCache = true
	}
}
// While the cache was updated, it is possible that some secret was added through another go routine.
// We need to merge the updated map with the current cache map to catch the new entries and avoid
// loss of data.
var updated bool
p.secretsCacheMx.Lock()
p.secretsCache, updated = p.mergeWithCurrent(cacheTmp)
p.secretsCacheMx.Unlock()

The logic to get a value from the cache follows below.

func (p *contextProviderK8sSecrets) getFromCache(key string) (string, bool) {
	p.secretsCacheMx.RLock()
	_, ok := p.secretsCache[key]
	p.secretsCacheMx.RUnlock()
	// if value is still not present in cache, it is possible we haven't tried to fetch it yet
	if !ok {
		value, ok := p.addToCache(key)
		// if it was not possible to fetch the secret, return
		if !ok {
			return value, ok
		}
	}
	p.secretsCacheMx.Lock()
	data, ok := p.secretsCache[key]
	data.lastAccess = time.Now()
	pass := data.value
	p.secretsCacheMx.Unlock()
	return pass, ok
}

  1. A read lock on the cache is taken to check whether the key exists in the cache.
  2. If it does not, the secret is fetched and added to the cache via addToCache(), which takes a write lock to store the new entry.
  3. A write lock on the cache is taken, the entry is read back from the cache, its last access time is set, and its value is returned.

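The race condition is that after steps 1 and 2 have completed, the updateCache() method, which can delete secrets from the cache, may run and delete the secret just before step 3. Step 3, shown again below, reads the entry straight from the cache and uses it without checking whether the secret still exists. Because the cache stores pointers to secretsData, a missing key yields a nil entry, so setting data.lastAccess on it would panic with a nil pointer dereference.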

p.secretsCacheMx.Lock()
data, ok := p.secretsCache[key]
data.lastAccess = time.Now()
pass := data.value
p.secretsCacheMx.Unlock()
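
One possible way to close the window is sketched below. It is only an illustration under stated assumptions, not the fix adopted in the project: the provider type and its fetch field are hypothetical stand-ins for contextProviderK8sSecrets and the Kubernetes API call, and the lookup key is made up. The idea is to dereference a cache entry only inside the same critical section that confirmed it exists, and on a miss to return the value that was just fetched instead of reading it back from the map.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// secretsData mirrors the shape of the cache entries in the snippets above.
type secretsData struct {
	value      string
	lastAccess time.Time
}

// provider is a hypothetical, simplified stand-in for contextProviderK8sSecrets.
type provider struct {
	secretsCacheMx sync.RWMutex
	secretsCache   map[string]*secretsData
	// fetch is a hypothetical stand-in for the call to the Kubernetes API.
	fetch func(key string) (string, bool)
}

// getFromCache never touches an entry it has not just looked up under the
// same lock, so a concurrent deletion cannot leave it holding a nil pointer.
func (p *provider) getFromCache(key string) (string, bool) {
	// Hit path: existence check, lastAccess update and read share one critical section.
	p.secretsCacheMx.Lock()
	if data, ok := p.secretsCache[key]; ok {
		data.lastAccess = time.Now()
		value := data.value
		p.secretsCacheMx.Unlock()
		return value, true
	}
	p.secretsCacheMx.Unlock()

	// Miss path: fetch without holding the lock, as the original code does.
	value, ok := p.fetch(key)
	if !ok {
		return "", false
	}

	// Store the entry and return the fetched value directly, rather than
	// re-reading the map and assuming the entry is still there.
	p.secretsCacheMx.Lock()
	p.secretsCache[key] = &secretsData{value: value, lastAccess: time.Now()}
	p.secretsCacheMx.Unlock()
	return value, true
}

func main() {
	p := &provider{
		secretsCache: make(map[string]*secretsData),
		fetch: func(key string) (string, bool) {
			return "value-of-" + key, true
		},
	}
	v, ok := p.getFromCache("example-key")
	fmt.Println(v, ok)
}
```

With this shape, an entry deleted by the update goroutine between two lookups is simply treated as a cache miss and fetched again. The trade-off is that two concurrent misses for the same key may both call the API, which the sketch accepts for simplicity.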
