Issue Summary
I suspect a potential deadlock related to the TagValueIterator() function when interacting with AddSeriesList().
Possible Deadlock Scenario
The issue appears to arise due to conflicting RLock() and Lock() calls on f.mu within LogFile. Specifically:
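- TagValueIterator() acquires an RLock() on f.mu.
- It then calls tk.TagValueIterator(), which attempts to acquire another RLock() on tk.f.mu (which is the same as f.mu).
- Meanwhile, AddSeriesList() is called and attempts to acquire a write Lock() on f.mu while the first RLock() is still held.
- This can lead to a deadlock: once a Lock() is pending, Go’s sync.RWMutex blocks any new RLock() call until the writer has acquired and released the lock. The second RLock() therefore waits for the writer, while the writer waits for the first RLock() to be released. The sync package documentation notes that this behavior prohibits recursive read-locking.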
Relevant Code
TagValueIterator() (log_file.go)
func (f *LogFile) TagValueIterator(name, key []byte) TagValueIterator {
	f.mu.RLock() // First RLock
	defer f.mu.RUnlock()

	mm, ok := f.mms[string(name)]
	if !ok {
		return nil
	}

	tk, ok := mm.tagSet[string(key)]
	if !ok {
		return nil
	}
	return tk.TagValueIterator() // Calls tk.TagValueIterator(), which also acquires RLock
}
tk.TagValueIterator() (log_file.go)
func (tk *logTagKey) TagValueIterator() TagValueIterator {
	tk.f.mu.RLock() // Second RLock (on the same f.mu)
	a := make([]logTagValue, 0, len(tk.tagValues))
	for _, v := range tk.tagValues {
		a = append(a, v)
	}
	tk.f.mu.RUnlock()
	return newLogTagValueIterator(a)
}
AddSeriesList() (log_file.go)
func (f *LogFile) AddSeriesList(...) {
	// ...
	f.mu.Lock() // Write lock on f.mu
	defer f.mu.Unlock()
	// ...
}
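One possible way to avoid the nested RLock() is to move the copy loop into a helper that assumes the caller already holds f.mu, so that the outer method takes the read lock only once. The sketch below uses simplified, hypothetical stand-in types (logFile, logTagKey with string values, tagValuesLocked, TagValues) rather than the actual tsi1 code, and is only meant to illustrate the pattern.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified, hypothetical stand-ins for the tsi1 types.
type logFile struct {
	mu   sync.RWMutex
	keys map[string]*logTagKey
}

type logTagKey struct {
	f         *logFile
	tagValues map[string]struct{}
}

// tagValuesLocked copies the tag values without locking.
// The caller must already hold tk.f.mu.
func (tk *logTagKey) tagValuesLocked() []string {
	a := make([]string, 0, len(tk.tagValues))
	for v := range tk.tagValues {
		a = append(a, v)
	}
	return a
}

// TagValues takes the read lock itself, for callers that do not hold it.
func (tk *logTagKey) TagValues() []string {
	tk.f.mu.RLock()
	defer tk.f.mu.RUnlock()
	return tk.tagValuesLocked()
}

// TagValues holds f.mu once and calls only the non-locking helper,
// so no second RLock is attempted while the first is held.
func (f *logFile) TagValues(key string) []string {
	f.mu.RLock()
	defer f.mu.RUnlock()

	tk, ok := f.keys[key]
	if !ok {
		return nil
	}
	return tk.tagValuesLocked()
}

func main() {
	f := &logFile{keys: map[string]*logTagKey{}}
	f.keys["host"] = &logTagKey{f: f, tagValues: map[string]struct{}{"a": {}, "b": {}}}
	fmt.Println(len(f.TagValues("host"))) // prints 2
}
```

With this split, a writer that queues up between the lookup and the copy only has to wait for a single read lock to be released.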
pprof Output When Deadlock Occurred
goroutine 106814401 [semacquire, 6 minutes]:
sync.runtime_SemacquireMutex(0xc00015020c?, 0x78?, 0x3?)
	/usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).Lock(0xc023a48620?)
	/usr/local/go/src/sync/rwmutex.go:152 +0x71
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*LogFile).AddSeriesList(0xc0095a71d0, 0xc000150200, {0xc00863f800?, 0x13, 0x0?}, {0xc00863fb00?, 0x13, 0xc00e37daf8?})
	influxdb-2.6.0/tsdb/index/tsi1/log_file.go:545 +0x4a5
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*Partition).createSeriesListIfNotExists(0xc037ff10e0, {0xc00863f800, 0x13, 0x20}, {0xc00863fb00, 0x13, 0x20})
	influxdb-2.6.0/tsdb/index/tsi1/partition.go:725 +0x165

goroutine 106814631 [semacquire, 6 minutes]:
sync.runtime_SemacquireMutex(0x3318308?, 0x38?, 0xc?)
	/usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).RLock(...)
	/usr/local/go/src/sync/rwmutex.go:71
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*logTagKey).TagValueIterator(0xc02a1a6fb8)
	influxdb-2.6.0/tsdb/index/tsi1/log_file.go:1385 +0x51
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*LogFile).TagValueIterator(0xc0095a71d0?, {0xc04537e640?, 0xa?, 0x158ed72?}, {0xc03be04a20, 0x9, 0x28?})
	influxdb-2.6.0/tsdb/index/tsi1/log_file.go:432 +0x185
Currently, the only way to recover from this issue is to restart InfluxDB, which is problematic.