Possible Deadlock in TagValueIterator() and AddSeriesList() #26164

Open
@line301

Issue Summary
I suspect a deadlock between LogFile.TagValueIterator() and LogFile.AddSeriesList() in tsdb/index/tsi1/log_file.go (observed on InfluxDB 2.6.0, per the stack traces below).

Possible Deadlock Scenario
The problem appears to be a recursive RLock() on f.mu (the LogFile mutex) interleaved with a concurrent Lock() on the same mutex. Specifically:

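  1. Goroutine A calls TagValueIterator(), which acquires an RLock() on f.mu and holds it until the function returns (deferred RUnlock()).
  2. Goroutine B calls AddSeriesList(), which requests a write Lock() on f.mu. Because A still holds a read lock, B blocks.
  3. Goroutine A then calls tk.TagValueIterator(), which calls RLock() again on tk.f.mu (the same mutex as f.mu). Because B is already waiting for the write lock, this second RLock() also blocks.
  4. B is waiting for A to release its first read lock, and A is waiting for B to acquire and release the write lock, so neither can make progress. The sync.RWMutex documentation states that once a goroutine calls Lock(), new RLock() calls block until the writer has acquired and released the lock, which means recursive read locking is prohibited.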

Relevant Code
TagValueIterator() (log_file.go)

func (f *LogFile) TagValueIterator(name, key []byte) TagValueIterator {
    f.mu.RLock() // First RLock
    defer f.mu.RUnlock()
    
    mm, ok := f.mms[string(name)]
    if !ok {
        return nil
    }

    tk, ok := mm.tagSet[string(key)]
    if !ok {
        return nil
    }
    return tk.TagValueIterator() // Calls tk.TagValueIterator(), which also acquires RLock
}

tk.TagValueIterator() (log_file.go)

func (tk *logTagKey) TagValueIterator() TagValueIterator {
    tk.f.mu.RLock() // Second RLock (on the same f.mu)
    a := make([]logTagValue, 0, len(tk.tagValues))
    for _, v := range tk.tagValues {
        a = append(a, v)
    }
    tk.f.mu.RUnlock()

    return newLogTagValueIterator(a)
}

AddSeriesList() (log_file.go)

func (f *LogFile) AddSeriesList(...) {
    // ...

    f.mu.Lock() // Write lock on f.mu
    defer f.mu.Unlock()

    // ...
}

pprof Output When Deadlock Occurred

goroutine 106814401 [semacquire, 6 minutes]:
sync.runtime_SemacquireMutex(0xc00015020c?, 0x78?, 0x3?)
	/usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).Lock(0xc023a48620?)
	/usr/local/go/src/sync/rwmutex.go:152 +0x71
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*LogFile).AddSeriesList(0xc0095a71d0, 0xc000150200, {0xc00863f800?, 0x13, 0x0?}, {0xc00863fb00?, 0x13, 0xc00e37daf8?})
	influxdb-2.6.0/tsdb/index/tsi1/log_file.go:545 +0x4a5
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*Partition).createSeriesListIfNotExists(0xc037ff10e0, {0xc00863f800, 0x13, 0x20}, {0xc00863fb00, 0x13, 0x20})
	influxdb-2.6.0/tsdb/index/tsi1/partition.go:725 +0x165

goroutine 106814631 [semacquire, 6 minutes]:
sync.runtime_SemacquireMutex(0x3318308?, 0x38?, 0xc?)
	/usr/local/go/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).RLock(...)
	/usr/local/go/src/sync/rwmutex.go:71
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*logTagKey).TagValueIterator(0xc02a1a6fb8)
	influxdb-2.6.0/tsdb/index/tsi1/log_file.go:1385 +0x51
github.com/influxdata/influxdb/v2/tsdb/index/tsi1.(*LogFile).TagValueIterator(0xc0095a71d0?, {0xc04537e640?, 0xa?, 0x158ed72?}, {0xc03be04a20, 0x9, 0x28?})
	influxdb-2.6.0/tsdb/index/tsi1/log_file.go:432 +0x185

Currently, the only way to recover from this deadlock is to restart InfluxDB, which is problematic.
