
Deadlock when following logs for pipelineruns and taskruns #2484

Open
@wwohler-hitachi

Description

Versions and Operating System

  • Kubernetes version:

    Output of kubectl version:

    Client Version: v1.28.10+rke2r1
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.28.10+rke2r1

  • Tekton Pipeline version:

    Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

    v0.48.0

  • Operating System:

    5.14.0-284.30.1.el9_2.x86_64

Expected Behavior

Regardless of the run's state, logs are streamed to completion and then the process exits.

Actual Behavior

Very rarely, the process streaming logs hangs forever.

Steps to Reproduce the Problem

I'm unable to provide consistent steps to reproduce the problem, as it appears to be a very tight race. It should be reproducible simply by using the Tekton CLI to follow the logs of a pipelinerun, however.
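For example (the pipelinerun name and namespace here are placeholders):

    tkn pipelinerun logs --follow my-pipelinerun -n my-namespace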

Additional Info

I typically encounter this issue in a Go program that embeds tektoncd/cli as a library. Currently, I've been able to reproduce the issue with tektoncd/cli version 0.39.1.

I took a goroutine dump of a process in this state and discovered a deadlock in pkg/pods/pod.go.
The first goroutine, which I'll call "A" below, is stuck here, attempting to send an event on a channel:

goroutine 1248 [chan send, 33 minutes]:
runtime.gopark(0x1d384c0?, 0xc001184d54?, 0xb8?, 0x4c?, 0x409c5e?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0009c4c90 sp=0xc0009c4c70 pc=0x43d22e
runtime.chansend(0xc0007b1200, 0xc0009c4d98, 0x1, 0x0?)
	/usr/local/go/src/runtime/chan.go:259 +0x3a5 fp=0xc0009c4d00 sp=0xc0009c4c90 pc=0x408ca5
runtime.chansend1(0xc001184d50?, 0x3?)
	/usr/local/go/src/runtime/chan.go:145 +0x17 fp=0xc0009c4d30 sp=0xc0009c4d00 pc=0x4088f7
github.com/tektoncd/cli/pkg/pods.(*Pod).watcher.func2({0x409632?, 0xc0007b1260?}, {0x208a620, 0xc0007ca000})
	/go/pkg/mod/github.com/tektoncd/[email protected]/pkg/pods/pod.go:129 +0xc8 fp=0xc0009c4d88 sp=0xc0009c4d30 pc=0x1b20828
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
	/go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:246
k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnUpdate(0x0?, {0x208a620?, 0xc000538d80?}, {0x208a620?, 0xc0007ca000?})
	<autogenerated>:1 +0x4b fp=0xc0009c4db8 sp=0xc0009c4d88 pc=0xc7fe6b
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:972 +0xea fp=0xc0009c4e58 sp=0xc0009c4db8 pc=0xc7ac4a
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x33 fp=0xc0009c4e78 sp=0xc0009c4e58 pc=0x7b6613
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0009c4f38?, {0x23d4f60, 0xc001180120}, 0x1, 0xc0002b8a20)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0xaf fp=0xc0009c4ef8 sp=0xc0009c4e78 pc=0x7b64cf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0007a4600?, 0x3b9aca00, 0x0, 0x0?, 0xc0009c4f88?)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x7f fp=0xc0009c4f48 sp=0xc0009c4ef8 pc=0x7b63bf
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:161
k8s.io/client-go/tools/cache.(*processorListener).run(0xc0007d9440)
	/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:968 +0x69 fp=0xc0009c4f98 sp=0xc0009c4f48 pc=0xc7ab29
k8s.io/client-go/tools/cache.(*processorListener).run-fm()
	<autogenerated>:1 +0x25 fp=0xc0009c4fb0 sp=0xc0009c4f98 pc=0xc80325
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:72 +0x4f fp=0xc0009c4fe0 sp=0xc0009c4fb0 pc=0x7b872f
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0009c4fe8 sp=0xc0009c4fe0 pc=0x46cd21
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 1412
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:70 +0x73

The second goroutine, which I'll call "B" below, is stuck here, attempting to acquire the mutex held by the first goroutine:

goroutine 188 [sync.Mutex.Lock, 33 minutes]:
runtime.gopark(0xc000549080?, 0x1d7cc00?, 0x80?, 0x90?, 0x409c5e?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000ba6cc0 sp=0xc000ba6ca0 pc=0x43d22e
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:404
runtime.semacquire1(0xc001184d54, 0x0?, 0x3, 0x1, 0x5b?)
	/usr/local/go/src/runtime/sema.go:160 +0x218 fp=0xc000ba6d28 sp=0xc000ba6cc0 pc=0x44dcf8
sync.runtime_SemacquireMutex(0xc0004b7da0?, 0x3?, 0xc0007d4480?)
	/usr/local/go/src/runtime/sema.go:77 +0x25 fp=0xc000ba6d60 sp=0xc000ba6d28 pc=0x4690e5
sync.(*Mutex).lockSlow(0xc001184d50)
	/usr/local/go/src/sync/mutex.go:171 +0x15d fp=0xc000ba6db0 sp=0xc000ba6d60 pc=0x477d9d
sync.(*Mutex).Lock(...)
	/usr/local/go/src/sync/mutex.go:90
github.com/tektoncd/cli/pkg/pods.(*Pod).Wait.func1()
	/go/pkg/mod/github.com/tektoncd/[email protected]/pkg/pods/pod.go:83 +0x46 fp=0xc000ba6de8 sp=0xc000ba6db0 pc=0x1b201e6
github.com/tektoncd/cli/pkg/pods.(*Pod).Wait(0xc0009c658e?)
	/go/pkg/mod/github.com/tektoncd/[email protected]/pkg/pods/pod.go:100 +0x19c fp=0xc000ba6ea0 sp=0xc000ba6de8 pc=0x1b2013c
github.com/tektoncd/cli/pkg/log.(*Reader).readPodLogs.func2()
	/go/pkg/mod/github.com/tektoncd/[email protected]/pkg/log/task_reader.go:196 +0x25d fp=0xc000ba6fe0 sp=0xc000ba6ea0 pc=0x1b24d3d
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000ba6fe8 sp=0xc000ba6fe0 pc=0x46cd21
created by github.com/tektoncd/cli/pkg/log.(*Reader).readPodLogs in goroutine 1386
	/go/pkg/mod/github.com/tektoncd/[email protected]/pkg/log/task_reader.go:184 +0x185
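
For reference, one way to capture a dump in this format from a program that embeds tektoncd/cli is the runtime/pprof goroutine profile; this is just a sketch of that approach, not necessarily how the dump above was produced:

    package main

    import (
        "os"
        "runtime/pprof"
    )

    // dumpGoroutines writes every goroutine's stack to stderr. With debug=2 the
    // output matches the format above, including the "[chan send, N minutes]"
    // state headers that make blocked goroutines easy to spot.
    func dumpGoroutines() {
        pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
    }

    func main() {
        dumpGoroutines()
    }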

In pod.go's Wait function, a channel eventC is created along with a mutex mu. Both are passed to watcher, which receives interesting pod events from Kubernetes. When an event arrives, goroutine A acquires mu and then sends the event on eventC. Goroutine B receives from eventC; once it receives an event that confirms the pod status, it stops receiving from eventC, acquires mu, and closes the channel.

If the watcher receives two events in quick succession and the first yields a non-nil pod (a non-nil first return value from checkPodStatus), a deadlock can occur: goroutine B stops listening on eventC and blocks trying to acquire mu so it can close the channel, while goroutine A blocks sending the second event on eventC while still holding mu. Neither goroutine can make progress.
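
To illustrate, here is a minimal, self-contained sketch of that interleaving. It is not the actual pod.go code: the event values are made up, and a sleep is used to force the ordering that the real race only hits occasionally.

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    func main() {
        eventC := make(chan string) // unbuffered: a send blocks until someone receives
        var mu sync.Mutex

        // Goroutine "A": stands in for the informer event handler registered by
        // watcher. For each event it takes mu and then sends on eventC. The
        // second send blocks forever once the receiver has stopped listening,
        // and mu is never released.
        go func() {
            for _, ev := range []string{"update-1", "update-2"} {
                mu.Lock()
                eventC <- ev // blocks on "update-2": nobody is receiving anymore
                mu.Unlock()
            }
        }()

        // Goroutine "B": stands in for the Wait loop. The first event is treated
        // as confirming the pod status, so B stops receiving and then tries to
        // take mu in order to close eventC.
        done := make(chan struct{})
        go func() {
            fmt.Println("received:", <-eventC)
            time.Sleep(100 * time.Millisecond) // let A grab mu and block on its second send
            mu.Lock()                          // blocks forever: A holds mu while stuck sending
            close(eventC)
            mu.Unlock()
            close(done)
        }()

        select {
        case <-done:
            fmt.Println("no deadlock")
        case <-time.After(2 * time.Second):
            fmt.Println("deadlock: A is blocked sending on eventC while holding mu; B is blocked waiting for mu")
        }
    }

Running this prints the deadlock message after the two-second timeout; in the real code there is no timeout, so the log-following process hangs forever.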

Labels: kind/bug
