
[Bug]: Infinite block and thread pool starvation in OneDriveReader due to missing timeout in requests.get() #22140

Description

@QiuYucheng2003

Bug Description

The OneDriveReader component makes synchronous HTTP calls using requests.get() without specifying a timeout parameter. According to the requests documentation, requests do not time out unless a timeout is set explicitly, so the call can block indefinitely if the server accepts the connection but never sends a response.

In production RAG pipelines, if the Microsoft Graph API is affected by a network blackhole, a temporary outage, or a connection that stalls under severe rate-limiting, these operations can hang indefinitely. Each blocked call holds its worker thread for as long as it waits (the GIL is released during socket I/O, so it is not the limiting factor). In a bounded thread pool, a handful of concurrent hung requests can therefore occupy every worker, leaving the host application or agent container unable to process new work, which is effectively a Denial of Service (DoS).
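The starvation effect can be illustrated without any network access. The sketch below is only an illustration, not code from the reader: `demonstrate_starvation`, `hung_request`, and `quick_task` are hypothetical names, and a `threading.Event` stands in for a server that never responds.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def demonstrate_starvation(pool_size: int = 2, wait_seconds: float = 0.2) -> bool:
    """Return True if a quick task was starved while every worker was blocked."""
    release = threading.Event()  # stands in for a server that never responds

    def hung_request() -> str:
        release.wait()  # blocks the worker, like requests.get() without a timeout
        return "late"

    def quick_task() -> str:
        return "ok"

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Fill every worker with a blocked call.
        for _ in range(pool_size):
            pool.submit(hung_request)
        # This task is queued behind them and cannot start.
        quick = pool.submit(quick_task)
        try:
            quick.result(timeout=wait_seconds)
            starved = False
        except FutureTimeout:
            starved = True
        finally:
            release.set()  # unblock the workers so the pool can shut down
        # Once a worker is free again, the queued task finally runs.
        assert quick.result(timeout=5) == "ok"
    return starved


if __name__ == "__main__":
    print("quick task starved:", demonstrate_starvation())
```

In a real deployment nothing plays the role of `release.set()`, which is why the pool never recovers unless the blocked calls carry a timeout.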

Affected methods in llama_index/readers/microsoft_onedrive/base.py:

_download_file_by_url

_get_items_in_drive_with_maxretries

_get_permissions_info

Suggested Fix:
Pass an explicit (connect, read) timeout tuple, in seconds (e.g., timeout=(3.0, 30.0)), to every requests.get() call within this module.
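A minimal sketch of the suggested fix follows. It is not the reader's actual code: `DEFAULT_TIMEOUT` and `get_with_timeout` are hypothetical names introduced for illustration, and the values are the ones proposed above rather than tuned defaults.

```python
import requests

# Hypothetical module-level default: (connect timeout, read timeout) in seconds.
DEFAULT_TIMEOUT = (3.0, 30.0)


def get_with_timeout(url, headers=None, timeout=DEFAULT_TIMEOUT):
    """Call requests.get() with an explicit timeout.

    Raises requests.exceptions.Timeout (ConnectTimeout or ReadTimeout)
    instead of blocking indefinitely when the server stops responding.
    """
    return requests.get(url, headers=headers, timeout=timeout)
```

Routing the three affected methods through one helper like this would keep the timeout in a single place, and callers that already retry could catch `requests.exceptions.Timeout` alongside their existing error handling.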

Version

main

Steps to Reproduce

This is an underlying network logic flaw. To conceptually reproduce:

  1. Initialize OneDriveReader and execute load_data().

  2. Simulate a network hang (e.g., using iptables to drop packets from graph.microsoft.com after the initial connection, or mock requests.get to sleep indefinitely).

  3. Observe that the application execution hangs forever without raising any Timeout exceptions.

  4. If running in a threaded environment, observe that the thread pool becomes completely starved and unresponsive.
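As an alternative to iptables, the hang can be approximated locally. The sketch below is an assumption-laden stand-in for the Graph API, not a test from the repository: `start_silent_server` and `probe` are hypothetical helpers, and the server is a plain local socket that accepts connections and never replies.

```python
import socket
import threading

import requests


def start_silent_server():
    """Listen on a free local port, accept connections, and never reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    held = []  # keep accepted sockets open so the client never sees EOF

    def accept_loop():
        while True:
            try:
                conn, _ = server.accept()
            except OSError:
                return  # listening socket was closed
            held.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return server, held


def probe(timeout):
    """Issue one GET against the silent server and report what happened."""
    server, held = start_silent_server()
    port = server.getsockname()[1]
    session = requests.Session()
    session.trust_env = False  # ignore proxy environment variables for localhost
    try:
        session.get(f"http://127.0.0.1:{port}/", timeout=timeout)
        return "response"
    except requests.exceptions.ReadTimeout:
        return "ReadTimeout"
    finally:
        session.close()
        server.close()
        for conn in held:
            conn.close()


if __name__ == "__main__":
    # With a timeout the call fails fast; probe(None) would block indefinitely.
    print(probe((1.0, 0.5)))
```

Calling `probe(None)` reproduces the behaviour described in step 3: the call never returns and no exception is raised.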

Relevant Logs/Tracebacks

# No explicit traceback is generated because the process hangs indefinitely.
# The application thread remains permanently stuck in the socket read phase, awaiting a response that never arrives.

Labels: bug, triage
