Is your feature request related to a problem? Please describe.
Transformers that make LLM calls can overload their endpoints, resulting in errors like this:
2024-07-10T17:47:10.993661Z WARN ingestion_pipeline.run:transformers.metadata_qa_code:prompt: async_openai::client: Rate limited: Rate limit reached for gpt-3.5-turbo in organization org-Gna8CW74JAUnoOFeI6Ivvn03 on tokens per min (TPM): Limit 80000, Used 79866, Requested 258. Please try again in 93ms. Visit https://platform.openai.com/account/rate-limits to learn more.
Describe the solution you'd like
The LLM client needs to maintain a connection pool and apply adequate backpressure so that the pipeline does not overload the LLM endpoint.
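A minimal sketch of what bounded concurrency could look like, using a tokio `Semaphore` to cap in-flight requests; `prompt_llm` is a hypothetical stand-in for the real client call (e.g. via async_openai), and the limit of 8 is an arbitrary example value:

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Semaphore;

// Hypothetical stand-in for a real LLM call; sleeps to simulate latency.
async fn prompt_llm(prompt: String) -> String {
    tokio::time::sleep(Duration::from_millis(100)).await;
    format!("answer to: {prompt}")
}

#[tokio::main]
async fn main() {
    // At most 8 requests in flight at once; tune to the endpoint's limits.
    let permits = Arc::new(Semaphore::new(8));
    let prompts: Vec<String> = (0..100).map(|i| format!("question {i}")).collect();

    let handles: Vec<_> = prompts
        .into_iter()
        .map(|p| {
            let permits = Arc::clone(&permits);
            tokio::spawn(async move {
                // Waiting for a permit is the backpressure: tasks stall
                // here instead of flooding the endpoint.
                let _permit = permits.acquire_owned().await.expect("semaphore closed");
                prompt_llm(p).await
            })
        })
        .collect();

    for h in handles {
        let _ = h.await;
    }
}
```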
Describe alternatives you've considered
Reducing concurrency or adding sleeps would be suboptimal: a fixed setting does not adapt to changes in rate limits or hardware. An adaptive sketch is shown below.
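To illustrate the difference, an adaptive approach backs off in response to the endpoint's signals rather than on a fixed schedule. A minimal sketch, assuming a hypothetical `LlmError` type that flags rate-limited (HTTP 429) responses and a placeholder `call_llm`:

```rust
use std::time::Duration;
use tokio::time::sleep;

// Hypothetical error type; `rate_limited` would be set when the provider
// responds with HTTP 429 / "Rate limited".
#[derive(Debug)]
struct LlmError {
    rate_limited: bool,
}

// Placeholder for the real request.
async fn call_llm(prompt: &str) -> Result<String, LlmError> {
    Ok(format!("answer to: {prompt}"))
}

async fn call_with_backoff(prompt: &str) -> Result<String, LlmError> {
    let mut delay = Duration::from_millis(100);
    for _ in 0..5 {
        match call_llm(prompt).await {
            Err(e) if e.rate_limited => {
                // Back off exponentially, capped at 10s: the delay adapts to
                // how often the endpoint pushes back, unlike a fixed sleep.
                sleep(delay).await;
                delay = delay.saturating_mul(2).min(Duration::from_secs(10));
            }
            other => return other,
        }
    }
    // Final attempt after exhausting the retries.
    call_llm(prompt).await
}
```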
Additional context
After about 45 minutes of hammering OpenAI, the server closes the connection with an HTTP/2 GOAWAY and the pipeline panics:
2024-07-10T17:47:14.520091Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(19999), flags: (0x4: END_HEADERS) }
2024-07-10T17:47:14.520282Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(19999), flags: (0x1: END_STREAM) }
2024-07-10T17:47:14.520366Z DEBUG ingestion_pipeline.run:transformers.metadata_qa_code:prompt: hyper_util::client::legacy::pool: reuse idle connection for ("https", api.openai.com)
2024-07-10T17:47:14.521003Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(20001), flags: (0x4: END_HEADERS) }
2024-07-10T17:47:14.521151Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(20001), flags: (0x1: END_STREAM) }
2024-07-10T17:47:14.532274Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_read: received frame=GoAway { error_code: NO_ERROR, last_stream_id: StreamId(19999) }
2024-07-10T17:47:14.532655Z DEBUG ingestion_pipeline.run:transformers.metadata_qa_code:prompt: hyper_util::client::legacy::pool: reuse idle connection for ("https", api.openai.com)
2024-07-10T17:47:14.532940Z ERROR ingestion_pipeline.run:transformers.metadata_qa_code:prompt: swiftide::integrations::openai::simple_prompt: error=http error: error sending request for url (https://api.openai.com/v1/chat/completions)
2024-07-10T17:47:14.534130Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_write: send frame=Reset { stream_id: StreamId(19999), error_code: CANCEL }
thread 'main' panicked at src/main.rs:37:10:
Could not load documentation: http error: error sending request for url (https://api.openai.com/v1/chat/completions)