Bug Description
The OneDriveReader component makes synchronous HTTP calls using requests.get() without specifying a timeout parameter. Per the requests documentation, a request made without an explicit timeout never times out, so the call can block indefinitely if the server accepts the TCP connection but never sends a response.
In production RAG pipelines, if the Microsoft Graph API is affected by a network blackhole, a temporary outage, or severe rate limiting, these calls can hang indefinitely. Each hung call keeps its worker thread blocked, and because thread pools are bounded, enough concurrent hung requests exhaust all available workers, leading to a complete Denial of Service (DoS) of the host application or agent container.
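The starvation effect can be illustrated without any network access. The following is a minimal sketch using only the standard library; the names `demo`, `hung_request`, and `quick_task` are hypothetical, and a `threading.Event` stands in for a socket read that never returns.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def demo(workers=2):
    """Fill a bounded pool with blocked tasks and check whether new work is starved."""
    hang = threading.Event()  # stands in for a server that never answers

    def hung_request():
        hang.wait()  # blocks like a socket read with no timeout
        return "done"

    def quick_task():
        return "ok"

    pool = ThreadPoolExecutor(max_workers=workers)
    # Occupy every worker with a call that does not return.
    for _ in range(workers):
        pool.submit(hung_request)
    # This task is cheap, but no worker is free to run it.
    queued = pool.submit(quick_task)
    try:
        queued.result(timeout=0.5)
        starved = False
    except FutureTimeout:
        starved = True
    # Release the blocked workers so the demo can exit cleanly.
    hang.set()
    pool.shutdown(wait=True)
    return starved, queued.result()
```

In the real failure mode there is no equivalent of `hang.set()`: without a timeout, nothing releases the blocked workers.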
Affected methods in llama_index/readers/microsoft_onedrive/base.py:
_download_file_by_url
_get_items_in_drive_with_maxretries
_get_permissions_info
Suggested Fix:
Enforce a reasonable timeout tuple (e.g., timeout=(3.0, 30.0)) on all requests.get() calls within this module.
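One way this could look is a small wrapper that every call site uses instead of calling requests.get() directly. This is only a sketch: `DEFAULT_TIMEOUT` and `get_with_timeout` are hypothetical names that do not exist in the module today, and the values are the ones suggested above.

```python
import requests

# Hypothetical module-level default: (connect timeout, read timeout) in seconds.
DEFAULT_TIMEOUT = (3.0, 30.0)


def get_with_timeout(url, *, timeout=DEFAULT_TIMEOUT, **kwargs):
    """Call requests.get with an explicit timeout so it cannot block forever."""
    # Extra keyword arguments (headers, params, stream, ...) pass through unchanged.
    return requests.get(url, timeout=timeout, **kwargs)
```

Note that the read timeout bounds the wait between bytes received from the server, not the total download time, so large file downloads that keep making progress should not be cut off by the 30-second value.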
Version
main
Steps to Reproduce
This is an underlying network logic flaw. To conceptually reproduce:
- Initialize OneDriveReader and execute load_data().
- Simulate a network hang (e.g., use iptables to drop packets from graph.microsoft.com after the initial connection, or mock requests.get to block indefinitely).
- Observe that execution hangs forever without raising any Timeout exception.
- If running in a threaded environment, observe that the thread pool becomes completely starved and unresponsive.
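The hang can also be approximated locally without iptables. The sketch below assumes a throwaway TCP server on 127.0.0.1 that accepts connections and never replies, which mimics a peer going silent after the handshake. `start_silent_server` and `probe` are hypothetical helpers, not part of OneDriveReader, and a Session with `trust_env = False` is used only so that proxy environment variables do not interfere with the local request.

```python
import socket
import threading

import requests


def start_silent_server():
    """Listen on a free local port, accept connections, and never reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen()
    accepted = []  # keep references so accepted sockets stay open

    def accept_forever():
        while True:
            try:
                conn, _ = listener.accept()
            except OSError:
                return  # listener was closed
            accepted.append(conn)

    threading.Thread(target=accept_forever, daemon=True).start()
    return listener, listener.getsockname()[1]


def probe(port, timeout):
    """Send a GET to the silent server and report how the call ended."""
    with requests.Session() as session:
        session.trust_env = False  # ignore proxy settings for this local test
        try:
            session.get(f"http://127.0.0.1:{port}/", timeout=timeout)
        except requests.exceptions.ReadTimeout:
            return "read timeout"
    return "responded"
```

Calling `probe(port, timeout=(1.0, 0.5))` against this server should end in a read timeout after about half a second, while passing `timeout=None` reproduces the indefinite hang described above.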
Relevant Logs/Tracebacks
# No explicit traceback is generated because the process hangs indefinitely.
# The application thread remains permanently stuck in the socket read phase, awaiting a response that never arrives.