8 changes: 7 additions & 1 deletion python/pyhive/hive.py
@@ -160,7 +160,8 @@ def __init__(
check_hostname=None,
ssl_cert=None,
thrift_transport=None,
ssl_context=None
ssl_context=None,
connection_timeout=None,
):
"""Connect to HiveServer2

@@ -175,6 +176,7 @@ def __init__(
Incompatible with host, port, auth, kerberos_service_name, and password.
:param ssl_context: A custom SSL context to use for HTTPS connections. If provided,
this overrides check_hostname and ssl_cert parameters.
:param connection_timeout: Millisecond timeout for Thrift connections.
The way to support LDAP and GSSAPI is originated from cloudera/Impyla:
https://github.com/cloudera/impyla/blob/255b07ed973d47a3395214ed92d35ec0615ebf62
/impala/_thrift_api.py#L152-L160
@@ -193,6 +195,8 @@ def __init__(
),
ssl_context=ssl_context,
)
if connection_timeout:
thrift_transport.setTimeout(connection_timeout)

if auth in ("BASIC", "NOSASL", "NONE", None):
# Always needs the Authorization header
@@ -236,6 +240,8 @@ def __init__(
if auth is None:
auth = 'NONE'
socket = thrift.transport.TSocket.TSocket(host, port)
if connection_timeout:
socket.setTimeout(connection_timeout)
@fbertsch fbertsch Feb 5, 2026

FYI this sets both connection and socket timeout. There's a nice description of the difference here.

Suggested change
socket.setTimeout(connection_timeout)
socket.setConnectTimeout(connection_timeout)
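
To illustrate the point about one value covering both phases, here is a minimal sketch using plain standard-library sockets rather than Thrift. It assumes Thrift's Python TSocket applies its single timeout to the underlying socket in a comparable way, as this thread suggests; `recv_times_out` is a hypothetical helper name.

```python
import socket


def recv_times_out(timeout_s):
    """Return True if a read on a silent peer times out after connect."""
    # Local listener that never accepts or sends; the kernel backlog
    # still lets the client's connect() complete.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(timeout_s)  # one value, set before connecting
    try:
        client.connect(server.getsockname())  # governed by timeout_s
        try:
            client.recv(1)  # also governed by the same timeout_s
        except socket.timeout:
            return True
        return False
    finally:
        client.close()
        server.close()
```

With a single setting like this, a short value that fails fast on connect would also cut off long-running reads.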

Author

That's a good point. My goal with this was originally to manage the socket timeout. However, I think I will leave it this way to avoid getting too granular.

Member

In Java, a socket can have different connectTimeout, readTimeout, and writeTimeout values. Does Python have an equivalent? Generally, we want a small connectTimeout and long readTimeout and writeTimeout, so that a client fails fast when connecting to a dead server and fails over to a healthy one if there are multiple server instances.

Author

@ryanbordo ryanbordo Feb 6, 2026

From my short research, it looks like the short answer is no, but it is possible with an explicit create_connection. At least, Python Thrift supports only one timeout and does not do this.
https://github.com/apache/thrift/blob/c99d09a231648d72e05a89d80281b38c9d0d1b9a/lib/py/src/transport/TSocket.py#L145-L147
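
A minimal sketch of the explicit create_connection approach mentioned above, using only the Python standard library. The helper name `open_with_timeouts` and its parameters are hypothetical; this is not how Thrift or pyhive opens sockets.

```python
import socket


def open_with_timeouts(host, port, connect_timeout_s, io_timeout_s):
    """Connect with one timeout, then switch to another for reads/writes."""
    # The timeout passed here bounds the connection attempt.
    sock = socket.create_connection((host, port), timeout=connect_timeout_s)
    # After connecting, replace it with the (usually longer) I/O timeout.
    sock.settimeout(io_timeout_s)
    return sock
```

Python sockets carry a single timeout value at a time, so separate read and write timeouts would still need extra handling around each call.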

Member

We can call socket.setTimeout after the socket is connected, right? If so, it's easy to implement connectTimeout and socketTimeout.

Member

@pan3793 pan3793 Feb 6, 2026

Okay, I read the code: self._transport.open() happens in Connection.__init__, so we can call setTimeout(connect_time) before it and setTimeout(socket_time) after it. The same applies to both the host/port and thrift_transport cases.
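
The ordering described in this comment could be sketched roughly as follows. It assumes only that the transport exposes `setTimeout(ms)` and `open()`; `open_with_split_timeouts` and `RecordingTransport` are hypothetical names used for illustration and are not part of pyhive or Thrift. Whether a `setTimeout` call made after `open()` takes effect depends on the transport implementation.

```python
class RecordingTransport:
    """Hypothetical stand-in for a Thrift transport; records calls in order."""

    def __init__(self):
        self.calls = []

    def setTimeout(self, ms):
        self.calls.append(("setTimeout", ms))

    def open(self):
        self.calls.append(("open",))


def open_with_split_timeouts(transport, connect_ms, socket_ms):
    """Use a short timeout while connecting, a longer one afterwards."""
    transport.setTimeout(connect_ms)  # bounds the connection attempt
    transport.open()
    transport.setTimeout(socket_ms)   # bounds subsequent reads and writes
    return transport
```

For example, `open_with_split_timeouts(RecordingTransport(), 2000, 600000)` records the three calls in that order.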

Author

Got it. Thanks for the feedback.

Author

Pushed up a solution for this. I don't love it, but it should work. We can't guarantee a public API for setting a socket timeout, so I believe it's either this or reverting to setting a single timeout once, for both connection and socket. Let me know what you think.

Author

I'm leaning towards reverting to a single connection timeout that covers both the connection and the socket, applied only to host/port combos, trading granularity for simplicity. A user who provides their own Thrift transport object can fine-tune it themselves.

Author

@ryanbordo ryanbordo Feb 21, 2026

Reverting to the previous approach for these reasons. Plus, the pyhive connection wrapper provides polling and the ability to run async, which lessens the need for a separate socket timeout. We can always build on this given user feedback, and a user can still use a custom Thrift transport to fine-tune this.

sql, runAsync=async_)

def poll(self, get_progress_update=True):

if auth == 'NOSASL':
# NOSASL corresponds to hive.server2.authentication=NOSASL in hive-site.xml
self._transport = thrift.transport.TTransport.TBufferedTransport(socket)
11 changes: 11 additions & 0 deletions python/pyhive/tests/test_hive.py
@@ -259,6 +259,17 @@ def test_basic_ssl_context(self):
cursor.execute('SELECT 1 FROM one_row')
self.assertEqual(cursor.fetchall(), [(1,)])

def test_connection_timeout(self):
"""Test that a connection timeout is set without error."""
with contextlib.closing(hive.connect(
host=_HOST,
port=10000,
connection_timeout=10 * 1000
)) as connection:
with contextlib.closing(connection.cursor()) as cursor:
# Use the same query pattern as other tests
cursor.execute('SELECT 1 FROM one_row')
self.assertEqual(cursor.fetchall(), [(1,)])

def _restart_hs2():
subprocess.check_call(['sudo', 'service', 'hive-server2', 'restart'])