Ergonomic way of dealing with LLM rate limits #142

Open
@tinco

Description

Is your feature request related to a problem? Please describe.
Transformers that make LLM calls can overload their endpoints, resulting in errors like this one:

2024-07-10T17:47:10.993661Z  WARN ingestion_pipeline.run:transformers.metadata_qa_code:prompt: async_openai::client: Rate limited: Rate limit reached for gpt-3.5-turbo in organization org-Gna8CW74JAUnoOFeI6Ivvn03 on tokens per min (TPM): Limit 80000, Used 79866, Requested 258. Please try again in 93ms. Visit https://platform.openai.com/account/rate-limits to learn more.

Describe the solution you'd like
The LLM client needs to maintain a connection pool and apply adequate backpressure so that the pipeline does not overload the LLM endpoint.
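
As a rough sketch of what that could look like (this is not the actual swiftide API; `ThrottledClient`, `call_llm`, and `is_rate_limited` are hypothetical placeholders), a semaphore bounds the number of in-flight requests, and a rate-limited response triggers exponential backoff instead of failing the whole pipeline:

```rust
use std::{sync::Arc, time::Duration};

use anyhow::Result;
use tokio::sync::Semaphore;

/// Wraps an LLM call with a bounded number of in-flight requests and
/// exponential backoff when the endpoint reports it is rate limited.
struct ThrottledClient {
    permits: Arc<Semaphore>,
}

impl ThrottledClient {
    fn new(max_in_flight: usize) -> Self {
        Self {
            permits: Arc::new(Semaphore::new(max_in_flight)),
        }
    }

    async fn prompt(&self, prompt: &str) -> Result<String> {
        // Backpressure: a pipeline task waits here instead of piling
        // more requests onto the endpoint.
        let _permit = self.permits.acquire().await?;

        let mut delay = Duration::from_millis(100);
        for _ in 0..5 {
            match call_llm(prompt).await {
                Ok(answer) => return Ok(answer),
                // Hypothetical check; a real implementation would match on
                // the client's 429 / rate-limit error variant.
                Err(e) if is_rate_limited(&e) => {
                    tokio::time::sleep(delay).await;
                    delay *= 2;
                }
                Err(e) => return Err(e),
            }
        }
        anyhow::bail!("still rate limited after retries")
    }
}

// Placeholders standing in for the real OpenAI client call and error check.
async fn call_llm(_prompt: &str) -> Result<String> {
    unimplemented!()
}

fn is_rate_limited(_e: &anyhow::Error) -> bool {
    unimplemented!()
}
```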

Describe alternatives you've considered
Reducing concurrency or adding fixed sleeps would be suboptimal, since neither adapts to changes in rate limits or hardware.
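
For illustration, rather than a fixed sleep, the retry delay could be derived from the hint OpenAI already returns ("Please try again in 93ms"). A rough, hypothetical parse of that message:

```rust
use std::time::Duration;

/// Extracts the "Please try again in 93ms" hint from a rate-limit message,
/// falling back to a default delay when the hint is absent or not in ms.
fn retry_delay(message: &str, fallback: Duration) -> Duration {
    message
        .split("try again in ")
        .nth(1)
        .and_then(|rest| rest.split("ms").next())
        .and_then(|ms| ms.trim().parse::<u64>().ok())
        .map(Duration::from_millis)
        .unwrap_or(fallback)
}
```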

Additional context
After about 45 minutes of hammering OpenAI, the server closes the connection:

2024-07-10T17:47:14.520091Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(19999), flags: (0x4: END_HEADERS) }
2024-07-10T17:47:14.520282Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(19999), flags: (0x1: END_STREAM) }
2024-07-10T17:47:14.520366Z DEBUG ingestion_pipeline.run:transformers.metadata_qa_code:prompt: hyper_util::client::legacy::pool: reuse idle connection for ("https", api.openai.com)
2024-07-10T17:47:14.521003Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(20001), flags: (0x4: END_HEADERS) }
2024-07-10T17:47:14.521151Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(20001), flags: (0x1: END_STREAM) }
2024-07-10T17:47:14.532274Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_read: received frame=GoAway { error_code: NO_ERROR, last_stream_id: StreamId(19999) }
2024-07-10T17:47:14.532655Z DEBUG ingestion_pipeline.run:transformers.metadata_qa_code:prompt: hyper_util::client::legacy::pool: reuse idle connection for ("https", api.openai.com)
2024-07-10T17:47:14.532940Z ERROR ingestion_pipeline.run:transformers.metadata_qa_code:prompt: swiftide::integrations::openai::simple_prompt: error=http error: error sending request for url (https://api.openai.com/v1/chat/completions)
2024-07-10T17:47:14.534130Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_write: send frame=Reset { stream_id: StreamId(19999), error_code: CANCEL }
thread 'main' panicked at src/main.rs:37:10:
Could not load documentation: http error: error sending request for url (https://api.openai.com/v1/chat/completions)

Metadata


Labels

    enhancement (New feature or request)
