Commit 79ed281

Update README.md
1 parent 5118829 commit 79ed281

1 file changed

Lines changed: 55 additions & 41 deletions

README.md

@@ -79,44 +79,76 @@ Monitor usage and manage spend via the model dashboard on [Lightning AI](https:/
 ✅ Bring your own model (connect your API keys, coming soon...)
 ✅ Chat logs (coming soon...)
 
-# Advanced features
+<br/>
 
-## Concurrency with async
+# Advanced features
 
-LitAI supports asynchronous execution, allowing you to handle multiple requests concurrently without blocking. This is especially useful in high-throughput applications like chatbots, APIs, or agent loops.
+### Auto fallbacks and retries
 
-To enable async behavior, set `enable_async=True` when initializing the `LLM` class. Then use `await llm.chat(...)` inside an `async` function.
+Model APIs can be flaky or suffer outages. LitAI automatically retries failed requests. After repeated failures, it can automatically fall back to other models in case the provider is down.
 
 ```python
-import asyncio
 from litai import LLM
 
+llm = LLM(
+    model="openai/gpt-4",
+    fallback_models=["google/gemini-2.5-flash", "anthropic/claude-3-5-sonnet-20240620"],
+    max_retries=4,
+)
 
-async def main():
-    llm = LLM(model="openai/gpt-4", teamspace="lightning-ai/litai", enable_async=True)
-    print(await llm.chat("who are you?"))
+print(llm.chat("How do I fine-tune an LLM?"))
+```
 
+Details:
+- Fallback models are tried in the order provided.
+- Each model gets up to `max_retries` attempts independently.
+- The first successful response is returned immediately.
+- If all models fail after their retry limits, LitAI raises an error.
 
-if __name__ == "__main__":
-    asyncio.run(main())
-```
 
-## Streaming
+<details>
+<summary>Streaming</summary>
 
-Stream the model response as it's being generated.
+Real-time chat applications benefit from showing words as they are generated, which makes responses feel faster to the user. Streaming
+is the mechanism that allows you to do this.
 
 ```python
 from litai import LLM
 
 llm = LLM(model="openai/gpt-4")
 for chunk in llm.chat("hello", stream=True):
     print(chunk, end="", flush=True)
+```
+</details>
+
+<details>
+<summary>Concurrency with async</summary>
+
+Python programs that process multiple requests at once often rely on `async` to do this. LitAI works with async libraries without making blocking calls. This is especially useful in high-throughput applications like chatbots, APIs, or agent loops.
+
+To enable async behavior, set `enable_async=True` when initializing the `LLM` class. Then use `await llm.chat(...)` inside an `async` function.
+
+```python
+import asyncio
+from litai import LLM
+
+async def main():
+    llm = LLM(model="openai/gpt-4", teamspace="lightning-ai/litai", enable_async=True)
+    print(await llm.chat("who are you?"))
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
 ```
 
-## Conversations
+</details>
+
 
-Keep chat history across multiple turns so the model remembers context.
-This is useful for assistants, summarizers, or research tools that need multi-turn chat history.
+<details>
+<summary>Multi-turn conversations</summary>
+
+Models only know the message that was sent to them. To let a model respond with memory of all the messages sent so far, track related
+messages under the same conversation. This is useful for assistants, summarizers, or research tools that need multi-turn chat history.
 
 Each conversation is identified by a unique name. LitAI stores conversation history separately for each name.
 
@@ -145,13 +177,15 @@ llm.chat("What's a RAG pipeline?", conversation="research")
 
 print(llm.list_conversations())
 ```
+</details>
 
-## Switch models
+<details>
+<summary>Switch models on each call</summary>
 
-Use the best model for each task.
-LitAI lets us dynamically switch models at request time.
+In certain applications you may want to call ChatGPT in one message and Anthropic in another so you can use the best model for each task.
+LitAI lets you dynamically switch models at request time.
 
-We set a default model when initializing `LLM` and override it with the `model` parameter only when needed.
+Set a default model when initializing `LLM` and override it with the `model` parameter only when needed.
 
 ```python
 from litai import LLM
@@ -170,25 +204,5 @@ print(llm.chat("Who created you?", model="google/gemini-2.5-flash"))
 print(llm.chat("Who created you?"))
 # >> I am a large language model, trained by OpenAI.
 ```
+</details>
 
-## Fallbacks and retries
-
-Ensure reliable responses even if a model is unavailable.\
-LitAI automatically retries requests and switches to fallback models in order.
-
-- Fallback models are tried in the order provided.
-- Each model gets up to `max_retries` attempts independently.
-- The first successful response is returned immediately.
-- If all models fail after their retry limits, LitAI raises an error.
-
-```python
-from litai import LLM
-
-llm = LLM(
-    model="openai/gpt-4",
-    fallback_models=["google/gemini-2.5-flash", "anthropic/claude-3-5-sonnet-20240620"],
-    max_retries=4,
-)
-
-print(llm.chat("How do I fine-tune an LLM?"))
-```
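
Note on the fallback rules in this README change: the four "Details" bullets (fallbacks tried in order, up to `max_retries` attempts per model, first success returned, error if everything fails) can be illustrated with a small simulation. This is only a sketch in plain Python and does not reflect how LitAI implements the behavior internally; `call_with_fallbacks` and `fake_send` are hypothetical names invented for this illustration, and the "provider" is a stub that never touches the network.

```python
from typing import Callable, List, Sequence


def call_with_fallbacks(
    prompt: str,
    models: Sequence[str],
    send: Callable[[str, str], str],
    max_retries: int = 4,
) -> str:
    """Try each model in order, giving each up to `max_retries` attempts."""
    errors: List[str] = []
    for model in models:  # main model first, then fallbacks in the order given
        for attempt in range(1, max_retries + 1):
            try:
                # The first successful response is returned immediately.
                return send(model, prompt)
            except Exception as exc:  # a real client would catch narrower errors
                errors.append(f"{model} attempt {attempt}: {exc}")
    # Every model exhausted its retry limit.
    raise RuntimeError("all models failed:\n" + "\n".join(errors))


attempted: List[str] = []  # records which model each attempt went to


def fake_send(model: str, prompt: str) -> str:
    """Hypothetical stub: pretend the first provider is down."""
    attempted.append(model)
    if model == "openai/gpt-4":
        raise ConnectionError("provider unavailable")
    return f"[{model}] reply to: {prompt}"


result = call_with_fallbacks(
    "hi",
    ["openai/gpt-4", "google/gemini-2.5-flash"],
    fake_send,
    max_retries=4,
)
print(result)  # [google/gemini-2.5-flash] reply to: hi
print(attempted.count("openai/gpt-4"))  # 4
```

In this sketch the failing model consumes all four of its attempts before the first fallback is tried once and succeeds, which matches the ordering described in the bullets.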
