Ergonomic way of dealing with LLM rate limits #142

Open
@tinco

Description

Is your feature request related to a problem? Please describe.
Transformers that make LLM calls can overload their endpoints, resulting in errors like this one:

2024-07-10T17:47:10.993661Z  WARN ingestion_pipeline.run:transformers.metadata_qa_code:prompt: async_openai::client: Rate limited: Rate limit reached for gpt-3.5-turbo in organization org-Gna8CW74JAUnoOFeI6Ivvn03 on tokens per min (TPM): Limit 80000, Used 79866, Requested 258. Please try again in 93ms. Visit https://platform.openai.com/account/rate-limits to learn more.

Describe the solution you'd like
The LLM client needs to maintain a connection pool and apply adequate backpressure so that the pipeline does not overload the LLM endpoint.
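
As a rough sketch of what that could look like (this is not the actual swiftide API; `ThrottledClient`, `call_llm`, and `is_rate_limited` are hypothetical placeholders), a semaphore bounds the number of in-flight requests, and a rate-limited response triggers exponential backoff instead of failing the whole pipeline:

```rust
use std::{sync::Arc, time::Duration};

use anyhow::Result;
use tokio::sync::Semaphore;

/// Wraps an LLM call with a bounded number of in-flight requests and
/// exponential backoff when the endpoint reports it is rate limited.
struct ThrottledClient {
    permits: Arc<Semaphore>,
}

impl ThrottledClient {
    fn new(max_in_flight: usize) -> Self {
        Self {
            permits: Arc::new(Semaphore::new(max_in_flight)),
        }
    }

    async fn prompt(&self, prompt: &str) -> Result<String> {
        // Backpressure: a pipeline task waits here instead of piling
        // more requests onto the endpoint.
        let _permit = self.permits.acquire().await?;

        let mut delay = Duration::from_millis(100);
        for _ in 0..5 {
            match call_llm(prompt).await {
                Ok(answer) => return Ok(answer),
                // Hypothetical check; a real implementation would match on
                // the client's 429 / rate-limit error variant.
                Err(e) if is_rate_limited(&e) => {
                    tokio::time::sleep(delay).await;
                    delay *= 2;
                }
                Err(e) => return Err(e),
            }
        }
        anyhow::bail!("still rate limited after retries")
    }
}

// Placeholders standing in for the real OpenAI client call and error check.
async fn call_llm(_prompt: &str) -> Result<String> {
    unimplemented!()
}

fn is_rate_limited(_e: &anyhow::Error) -> bool {
    unimplemented!()
}
```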

Describe alternatives you've considered
Reducing concurrency or adding fixed sleeps would be suboptimal, since neither adapts to changes in rate limits or hardware.
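
For illustration, rather than a fixed sleep, the retry delay could be derived from the hint OpenAI already returns ("Please try again in 93ms"). A rough, hypothetical parse of that message:

```rust
use std::time::Duration;

/// Extracts the "Please try again in 93ms" hint from a rate-limit message,
/// falling back to a default delay when the hint is absent or not in ms.
fn retry_delay(message: &str, fallback: Duration) -> Duration {
    message
        .split("try again in ")
        .nth(1)
        .and_then(|rest| rest.split("ms").next())
        .and_then(|ms| ms.trim().parse::<u64>().ok())
        .map(Duration::from_millis)
        .unwrap_or(fallback)
}
```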

Additional context
After about 45 minutes of hammering OpenAI, the server closes the connection:

2024-07-10T17:47:14.520091Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(19999), flags: (0x4: END_HEADERS) }
2024-07-10T17:47:14.520282Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(19999), flags: (0x1: END_STREAM) }
2024-07-10T17:47:14.520366Z DEBUG ingestion_pipeline.run:transformers.metadata_qa_code:prompt: hyper_util::client::legacy::pool: reuse idle connection for ("https", api.openai.com)
2024-07-10T17:47:14.521003Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_write: send frame=Headers { stream_id: StreamId(20001), flags: (0x4: END_HEADERS) }
2024-07-10T17:47:14.521151Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_write: send frame=Data { stream_id: StreamId(20001), flags: (0x1: END_STREAM) }
2024-07-10T17:47:14.532274Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_read: received frame=GoAway { error_code: NO_ERROR, last_stream_id: StreamId(19999) }
2024-07-10T17:47:14.532655Z DEBUG ingestion_pipeline.run:transformers.metadata_qa_code:prompt: hyper_util::client::legacy::pool: reuse idle connection for ("https", api.openai.com)
2024-07-10T17:47:14.532940Z ERROR ingestion_pipeline.run:transformers.metadata_qa_code:prompt: swiftide::integrations::openai::simple_prompt: error=http error: error sending request for url (https://api.openai.com/v1/chat/completions)
2024-07-10T17:47:14.534130Z DEBUG ingestion_pipeline.run{total_nodes=283}:transformers.metadata_qa_text:prompt:Connection{peer=Client}: h2::codec::framed_write: send frame=Reset { stream_id: StreamId(19999), error_code: CANCEL }
thread 'main' panicked at src/main.rs:37:10:
Could not load documentation: http error: error sending request for url (https://api.openai.com/v1/chat/completions)

Metadata


Labels

    enhancement (New feature or request)
