
TCP WorkQueueManager race condition #33219

@mrsaldana


Summary

During testing, we observed that a TCP connection can be closed while the TCP work queue still holds pending I/O for that connection. When a worker thread later picks up the now-stale work item, attemptIO throws a NullPointerException because the connection link's SocketIOChannel has already been nulled out.

Observed Behavior

  • Socket is closed:

SocketChannel closing, local: localhost/127.0.0.1:56988 remote: localhost/127.0.0.1:8030

  • A worker then processes a queued item for that connection and throws:

FFDC1015I: An FFDC Incident has been created: "java.lang.NullPointerException: Cannot invoke "com.ibm.ws.tcpchannel.internal.SocketIOChannel.getSocket()" because the return value of "com.ibm.ws.tcpchannel.internal.TCPConnLink.getSocketIOChannel()" is null com.ibm.ws.tcpchannel.internal.WorkQueueManager workerRun(req)" at ffdc_25.10.22_01.19.19.0.log

Root Cause

WorkQueueManager.attemptIO(...) reads the TCPConnLink from the request and, assuming the link is still valid, immediately calls conn.getSocketIOChannel() and dereferences the result. There is a race condition in which the connection close/destroy has already nulled out the channel by the time the worker gets to the request. In that case getSocketIOChannel() returns null, and the subsequent getSocket() call on it throws the NPE shown in the FFDC message above.

We should add a defensive null check on the SocketIOChannel in the attemptIO method before it is used.

Doing this does not change behavior for valid connections; only the error path changes from throwing the NPE to a graceful close.
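
A minimal sketch of the kind of guard proposed here, not the actual patch. The nested SocketIOChannel and TCPConnLink classes are simplified stand-ins written for this example, and the boolean return value, the staleItems counter, and the close() method are assumptions made for illustration; the real attemptIO signature and the way it triggers the graceful close are not shown in this issue.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AttemptIOGuardSketch {

    // Stand-in for the real SocketIOChannel (hypothetical, simplified).
    static class SocketIOChannel {
        Object getSocket() {
            return new Object();
        }
    }

    // Stand-in for the real TCPConnLink (hypothetical, simplified).
    static class TCPConnLink {
        private volatile SocketIOChannel ioChannel = new SocketIOChannel();

        SocketIOChannel getSocketIOChannel() {
            return ioChannel;
        }

        // Models close/destroy nulling out the channel on another thread.
        void close() {
            ioChannel = null;
        }
    }

    // Counts stale work items that were skipped (illustration only).
    static final AtomicInteger staleItems = new AtomicInteger();

    // Returns true if I/O was attempted, false if the item was stale.
    static boolean attemptIO(TCPConnLink conn) {
        // Read the channel once into a local so a concurrent close cannot
        // null it between the check and the use.
        SocketIOChannel channel = (conn == null) ? null : conn.getSocketIOChannel();
        if (channel == null) {
            // Connection already closed: skip the I/O and let the caller
            // take its normal close/cleanup path instead of throwing an NPE.
            staleItems.incrementAndGet();
            return false;
        }
        channel.getSocket(); // safe: channel is a non-null local reference
        return true;
    }

    public static void main(String[] args) {
        TCPConnLink live = new TCPConnLink();
        System.out.println("live connection attempted: " + attemptIO(live));

        TCPConnLink closed = new TCPConnLink();
        closed.close();
        System.out.println("closed connection attempted: " + attemptIO(closed));
        System.out.println("stale items skipped: " + staleItems.get());
    }
}
```

Reading the channel into a local variable matters as much as the null check itself: checking conn.getSocketIOChannel() and then calling it a second time would leave the same race window open.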
