AI DISCLOSURE
I used Claude Code to generate this repro. It's very hard to verify that the sockname is really None in the running system, but patching this code in my production deployment did make the error go away.
The Issue
AsyncBolt.__init__ (and several sibling sites in the async connect path) call self.socket.getsockname()[1] with no None guard. For the async driver, BoltSocket.getsockname() is self._writer.transport.get_extra_info("sockname"), which asyncio returns as None once the transport's socket is gone — i.e. when a load balancer drops a freshly-opened connection (we see this constantly on Neo4j Aura, most often on the connection opened for a routing-table refresh).
Because the result is a bare TypeError rather than ServiceUnavailable/SessionExpired, the driver's transaction-retry never catches it, so a transient connection drop becomes a hard, non-retryable crash with a misleading message.
This is the same defect previously reported in #730 (2022) and #1057 (2024); both were closed because the condition could not be reproduced (#1057: "I was not able to reproduce this condition locally", and "feel free to reopen … while providing the additional information requested"). This report provides a deterministic, minimal reproduction — the missing piece — plus a verification that guarding the None fixes it.
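Why the exception type matters for retries can be sketched in isolation. This is a simplified model, not the driver's actual retry code: RetryableError, open_connection, and run_with_retry are hypothetical names standing in for the driver's retryable errors (such as ServiceUnavailable), its connection setup, and its transaction-retry loop.

```python
class RetryableError(Exception):
    """Hypothetical stand-in for the driver's retryable errors."""


def open_connection(sockname, guarded):
    """Mimic reading the local port from a sockname that may be None."""
    if guarded and sockname is None:
        raise RetryableError("sockname unavailable")
    return sockname[1]  # raises TypeError when sockname is None


def run_with_retry(socknames, guarded):
    """Try each sockname in turn, retrying only on RetryableError."""
    for sockname in socknames:
        try:
            return open_connection(sockname, guarded)
        except RetryableError:
            continue  # transient failure: move on to the next connection
    raise RetryableError("all attempts failed")


# First connection is "dropped" (sockname is None), second is healthy.
attempts = [None, ("127.0.0.1", 54321)]

try:
    run_with_retry(attempts, guarded=False)
    unguarded_outcome = "ok"
except TypeError:
    unguarded_outcome = "TypeError escaped the retry loop"

guarded_port = run_with_retry(attempts, guarded=True)
print(unguarded_outcome)
print(guarded_port)
```

Without the guard, the TypeError bypasses the except clause and the healthy second connection is never tried; with it, the loop moves on and returns the port.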
Versions
- Driver: reproduced on neo4j==5.28.4; the unguarded line is identical in the latest 6.2.0 (_bolt.py:159).
- Server: Neo4j Aura in production; the repro below works against any local Neo4j.
- Python 3.12.
Production traceback (matches #730 / #1057)
  File ".../neo4j/_async/io/_pool.py", line 792, in fetch_routing_info
    cx = await self._acquire(address, auth, deadline, None)
  File ".../neo4j/_async/io/_pool.py", line 711, in opener
    return await AsyncBolt.open(...)
  File ".../neo4j/_async/io/_bolt.py", line 413, in open
    connection = bolt_cls(...)
  File ".../neo4j/_async/io/_bolt.py", line 156, in __init__
    self.local_port = self.socket.getsockname()[1]
                      ~~~~~~~~~~~~~~~~~~~~~~~~~^^^
TypeError: 'NoneType' object is not subscriptable
Deterministic reproduction
The real-world trigger (an LB dropping a brand-new connection) is hard to race, which is presumably why the prior issues stalled. But the condition asyncio actually exposes is simple: get_extra_info("sockname") returns None once the transport's socket is gone. The script marks the first freshly-opened socket as "dropped" right after its handshake, exactly the production sequence, so the crash lands on the same line as #730/#1057. It then applies a small guard and shows the same drop become retryable and self-heal. Needs only pip install neo4j:
docker run --rm -d -p 7687:7687 -e NEO4J_AUTH=neo4j/password neo4j:5
NEO4J_URI=bolt://localhost:7687 NEO4J_USER=neo4j NEO4J_PASSWORD=password python repro.py
# repro.py
import asyncio
import os
import traceback

import neo4j._async.io._bolt as bolt_mod
import neo4j._async.io._bolt_socket as io_sock
import neo4j._async_compat.network._bolt_socket as base_sock
from neo4j import AsyncGraphDatabase
from neo4j.exceptions import ServiceUnavailable

URI = os.environ.get("NEO4J_URI", "bolt://localhost:7687")
USER = os.environ.get("NEO4J_USER", "neo4j")
PASSWORD = os.environ.get("NEO4J_PASSWORD", "password")

# Make the sockets we choose report sockname == None -- the value asyncio yields
# for a transport whose socket is already gone.
_orig_getsockname = base_sock.AsyncBoltSocketBase.getsockname
_orig_connect = io_sock.AsyncBoltSocket.connect.__func__  # unwrap classmethod
_dead_sockets: set[int] = set()
_arm = {"on": False}


def _getsockname(self):
    return None if id(self) in _dead_sockets else _orig_getsockname(self)


async def _connect(cls, *args, **kwargs):
    sock, *rest = await _orig_connect(cls, *args, **kwargs)
    if _arm["on"]:
        _dead_sockets.add(id(sock))  # LB drops THIS freshly-opened connection
        _arm["on"] = False  # one-shot: only the first connection
    return (sock, *rest)


base_sock.AsyncBoltSocketBase.getsockname = _getsockname
io_sock.AsyncBoltSocket.connect = classmethod(_connect)


async def _run_query() -> int:
    driver = AsyncGraphDatabase.driver(URI, auth=(USER, PASSWORD))
    try:
        records, _, _ = await driver.execute_query("RETURN 1 AS n")
        return records[0]["n"]
    finally:
        await driver.close()


async def main() -> int:
    print(f"neo4j driver {__import__('neo4j').__version__} | target {URI}\n")

    print("SCENARIO 1 - stock driver, first connection's socket dropped:")
    _dead_sockets.clear()
    _arm["on"] = True
    try:
        await _run_query()
        print(" UNEXPECTED: query succeeded\n")
        s1 = False
    except TypeError as exc:
        line = next((l.strip() for l in traceback.format_exc().splitlines()
                     if "getsockname()[1]" in l), "?")
        print(f" REPRODUCED -> TypeError: {exc}\n at: {line}\n"
              " (a bare TypeError -- the driver's retry never catches it)\n")
        s1 = True

    # The fix: reclassify the None-sockname subscript crash as retryable.
    _orig_init = bolt_mod.AsyncBolt.__init__

    def _guarded_init(self, *args, **kwargs):
        try:
            _orig_init(self, *args, **kwargs)
        except TypeError as exc:
            if "subscriptable" not in str(exc):
                raise
            raise ServiceUnavailable(
                "socket dropped before use (sockname unavailable); retrying"
            ) from exc

    bolt_mod.AsyncBolt.__init__ = _guarded_init

    print("SCENARIO 2 - same drop, with the None-guard applied:")
    _dead_sockets.clear()
    _arm["on"] = True
    try:
        result = await _run_query()
        print(f" HEALED -> driver retried on a fresh connection, returned {result}\n")
        s2 = result == 1
    finally:
        bolt_mod.AsyncBolt.__init__ = _orig_init

    print("RESULT:", "PASS" if (s1 and s2) else "FAIL")
    return 0 if (s1 and s2) else 1


if __name__ == "__main__":
    raise SystemExit(asyncio.run(main()))
Output (driver 5.28.4, local Neo4j):
SCENARIO 1 - stock driver, first connection's socket dropped:
REPRODUCED -> TypeError: 'NoneType' object is not subscriptable
at: self.local_port = self.socket.getsockname()[1]
(a bare TypeError -- the driver's retry never catches it)
SCENARIO 2 - same drop, with the None-guard applied:
Transaction failed and will be retried in 0.83s (socket dropped before use (sockname unavailable); retrying)
HEALED -> driver retried on a fresh connection, returned 1
RESULT: PASS
All unguarded getsockname()[1] sites (any can be hit, depending on when the socket dies)
In 5.28.4 (same pattern in 6.2.0):
_async/io/_bolt.py:156 — AsyncBolt.__init__ (the production crash site)
_async/io/_bolt.py:381, 399, 407 — AsyncBolt.open close / auth-failure debug logging
_async/io/_bolt_socket.py:225 — _handshake (a one-liner getsockname = lambda self: None lands here instead)
_async/io/_bolt_socket.py:343, 361 — connect
_async/io/_common.py:41 — AsyncOutbox
Why it matters
This is a transient condition (connection dropped mid/just-after open — normal during Aura leader elections / routing refresh; #1057 also noted it "manifested itself under high load, and when leader elections were happening"). The driver already retries transient connection failures, but a bare TypeError is not in the retryable set, so it escapes and kills the operation instead of re-opening.
Note: get_extra_info("sockname") returns None simply because the transport's socket is already None/closed — there isn't necessarily an OSError at the getsockname call itself (which may be why the async-sockname OSError-surfacing branch from #1057 didn't pan out).
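The "no exception, just None" behaviour follows from the get_extra_info API contract, which can be checked without any network at all. The sketch below builds a bare asyncio.BaseTransport with an empty extra dict as a stand-in (an assumption for illustration) for a transport whose socket details are no longer available. It shows only the API contract; it does not reproduce the socket loss itself or show which transport classes report None after a drop.

```python
import asyncio

# A transport with no extra info at all, standing in for one whose
# socket details are gone (assumption for illustration only).
transport = asyncio.BaseTransport(extra={})

# Missing keys yield the default (None) instead of raising.
sockname = transport.get_extra_info("sockname")

# An explicit default can be supplied as the second argument.
fallback = transport.get_extra_info("sockname", ("unknown", -1))

print(sockname)
print(fallback)
```

Because the lookup never raises, the caller is the only place where a missing sockname can be detected.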
Suggested fix
Guard the None and raise a retryable driver error instead of subscripting, e.g. in AsyncBolt.__init__:
sockname = self.socket.getsockname()
if sockname is None:
    raise ServiceUnavailable(
        "Connection's socket was closed before it could be used "
        "(sockname unavailable); will retry on a fresh connection"
    )
self.local_port = sockname[1]
and treat the debug-logging sites defensively (local_port = sockname[1] if sockname else -1). As Scenario 2 shows, this lets the existing retry transparently re-open — the behavior every reporter of this defect actually wants.
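For the debug-logging sites, the defensive expression could be factored into a small helper. This is a minimal sketch: local_port_for_logging is a hypothetical name, not an existing driver function, and -1 is assumed as the "unknown port" sentinel, as in the expression above.

```python
def local_port_for_logging(sockname):
    """Return the local port from a sockname tuple, or -1 if unavailable."""
    # Assumption: -1 marks "unknown port"; sockname is None (or empty)
    # when the transport can no longer report its socket details.
    return sockname[1] if sockname else -1


ipv4_port = local_port_for_logging(("127.0.0.1", 7687))
ipv6_port = local_port_for_logging(("::1", 50000, 0, 0))  # IPv6 socknames are 4-tuples
missing_port = local_port_for_logging(None)
print(ipv4_port, ipv6_port, missing_port)
```

A sentinel only makes sense where the port is used for log output; in AsyncBolt.__init__ itself, raising a retryable error as shown above is what lets the retry take over.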