
Commit 58764f6: finish documentation

Parent: 031b592

7 files changed

Lines changed: 153 additions & 157 deletions

.gitignore

Lines changed: 1 addition & 1 deletion
````diff
@@ -218,6 +218,6 @@ run.sh
 .claude
 bruno
 docs/.astro
-docs/node_modules
+docs/node_modules
 docs/.cache
 docs/package-lock.json
````

docs/src/content/docs/configuration/dependencies/prometheus.mdx

Lines changed: 10 additions & 6 deletions
````diff
@@ -26,12 +26,16 @@ This endpoint returns metrics in Prometheus text-based exposition format, which
 ## List of metrics

 All default metrics of *prometheus-fastapi-instrumentator* are available; see its [README](https://github.com/trallnag/prometheus-fastapi-instrumentator) for more information.
-These metrics are prefixed with `opengatellm_inference_`.
-
-In addition, the following metrics are available:
-| Metric | Description |
-| --- | --- |
-| *work in progress, check our [roadmap](https://github.com/etalab-ia/OpenGateLLM/milestone/4/) for more information.* | ... |
+These metrics are prefixed with the `ogl_` namespace.
+
+In addition, OpenGateLLM exposes the following inference metrics:
+| Metric | Type | Description (labels) |
+| --- | --- | --- |
+| `ogl_inference_requests_total` | Counter | Total number of LLM requests (`endpoint`, `model`, `status_code`). |
+| `ogl_inference_requests_duration_seconds` | Histogram | Duration of LLM requests, in seconds (`endpoint`, `model`, `status_code`). |
+| `ogl_inference_ttft_milliseconds` | Histogram | Time to first token of streaming responses, in milliseconds (`endpoint`, `model`, `status_code`). |
+| `ogl_inference_output_tokens_per_second` | Histogram | Output generation speed, in tokens per second (`endpoint`, `model`). |
+| `ogl_inference_tokens_total` | Counter | Total number of consumed tokens, where `type` is `prompt` or `completion` (`endpoint`, `model`, `type`). |

 ## Grafana dashboard

````
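To check the new inference counters from a script, the sketch below sums `ogl_inference_requests_total` per label value from Prometheus text exposition output. It is a minimal, stdlib-only illustration and not part of OpenGateLLM: the `SAMPLE` payload (endpoints, models, status codes, values) and the `parse_counter_by_label` helper are hypothetical, and the regex ignores escaped quotes in label values, timestamps and exemplars, so a real scraper should rely on a full exposition-format parser such as the one shipped with `prometheus_client`.

```python
import re
from collections import defaultdict

# Hypothetical sample of what the metrics endpoint could return for the inference counter.
SAMPLE = '''\
# HELP ogl_inference_requests_total Total number of LLM requests
# TYPE ogl_inference_requests_total counter
ogl_inference_requests_total{endpoint="/v1/chat/completions",model="my-language-model",status_code="200"} 42.0
ogl_inference_requests_total{endpoint="/v1/chat/completions",model="my-language-model",status_code="429"} 3.0
ogl_inference_requests_total{endpoint="/v1/embeddings",model="my-embeddings-model",status_code="200"} 7.0
'''

# metric_name{label="value",...} number  (simplified: no escaped quotes, no timestamps)
SAMPLE_LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_counter_by_label(text: str, metric: str, label: str) -> dict[str, float]:
    """Sum the samples of `metric`, grouped by the value of `label`."""
    totals: dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None or match["name"] != metric:
            continue
        labels = dict(LABEL.findall(match["labels"]))
        totals[labels.get(label, "")] += float(match["value"])
    return dict(totals)


if __name__ == "__main__":
    print(parse_counter_by_label(SAMPLE, "ogl_inference_requests_total", "model"))
```

In a running deployment, the same aggregation is usually done in Prometheus itself, for example with a `sum by (model)` query over the counter's rate.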

docs/src/content/docs/deployment/production.mdx

Lines changed: 10 additions & 1 deletion
````diff
@@ -18,11 +18,20 @@ This guide provides practical defaults and hardening recommendations for running
 session_secret_key: ${SESSION_SECRET_KEY}
 ```

+- Add the `GUNICORN_CMD_ARGS` environment variable to your deployment configuration to configure the Gunicorn server.
+We recommend the following configuration:
+```bash
+GUNICORN_CMD_ARGS="--workers {{ workers }} --worker-connections 1000 --timeout 240 --keep-alive 75 --graceful-timeout 75"
+```
+
+Set the number of workers based on the expected load, the number of CPU cores, and the maximum number of PostgreSQL connections.
+See the [Gunicorn documentation](https://docs.gunicorn.org/en/latest/design.html#how-many-workers) for more information.
+
 ## Security and access control

 - Use the master key only for bootstrap operations: **creating the first admin role and user**.
 - Do not use the master identity for day-to-day model administration. When you create a router, the model is shown with an `owned_by` attribute set to the organization of the user who created it.
-- Set a strong `auth_master_key` (at least 32 characters, high entropy..
+- Set a strong `auth_master_key` (at least 32 characters, high entropy).

 <Aside type="caution" title="Master key rotation impact">
 The master key is used to encrypt user API keys. If you change it, you need to regenerate all user API keys.
````
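The production guide above leaves `{{ workers }}` to be filled in; the Gunicorn documentation it links suggests `(2 x $num_cores) + 1` workers as a starting point. The sketch below combines that rule of thumb with a cap derived from PostgreSQL's `max_connections`. The helpers, the per-worker pool size and the number of reserved connections are hypothetical parameters to adjust to your own database settings, not values taken from OpenGateLLM.

```python
import os


def recommended_workers(
    cpu_cores: int,
    pg_max_connections: int,
    pool_size_per_worker: int,
    reserved_connections: int = 10,
) -> int:
    """Return a worker count bounded by CPU cores and by PostgreSQL connections.

    Hypothetical helper: assumes each Gunicorn worker may open up to
    `pool_size_per_worker` database connections and that
    `reserved_connections` are kept free for other clients.
    """
    by_cpu = 2 * cpu_cores + 1  # Gunicorn's suggested starting point
    available = max(pg_max_connections - reserved_connections, 0)
    by_db = max(available // pool_size_per_worker, 1)  # always keep at least one worker
    return min(by_cpu, by_db)


def gunicorn_cmd_args(workers: int) -> str:
    """Build the GUNICORN_CMD_ARGS value recommended in the production guide."""
    return (
        f"--workers {workers} --worker-connections 1000 --timeout 240 "
        "--keep-alive 75 --graceful-timeout 75"
    )


if __name__ == "__main__":
    workers = recommended_workers(os.cpu_count() or 1, pg_max_connections=100, pool_size_per_worker=10)
    print(f'GUNICORN_CMD_ARGS="{gunicorn_cmd_args(workers)}"')
```

With 4 CPU cores, `max_connections = 100` and pools of 10 connections per worker, both bounds give 9 workers; with larger pools, the database bound takes over.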

docs/src/content/docs/features/usage/budget.mdx

Lines changed: 46 additions & 92 deletions
````diff
@@ -3,15 +3,14 @@ title: User budget
 sidebar:
   label: "[lucide:piggy-bank] User budget"
 ---
-import { Aside, Tabs, TabItem } from '@astrojs/starlight/components';
-
-
-OpenGateLLM allows you to define the costs for each model in the configuration file then attach a budget to each user to limit ...
+import { Aside, LinkButton, Tabs, TabItem } from '@astrojs/starlight/components';

+OpenGateLLM lets you define the costs of each model router. For more information about model routers, see the [models setup documentation](/getting-started/models/).
+You can then attach a budget to each user to cap the cost of the requests they make.
 The compute cost is calculated from the number of tokens used and the costs defined for the model router, using the following formula:

 ```python
-cost = round((prompt_tokens / 1000000 * client.costs.prompt_tokens) + (completion_tokens / 1000000 * client.costs.completion_tokens), ndigits=6)
+cost = round((prompt_tokens / 1000000 * router.costs.prompt_tokens) + (completion_tokens / 1000000 * router.costs.completion_tokens), ndigits=6)
 ```

 The compute cost is returned in the response, in the `usage.cost` field. After the request is processed, the user's budget is updated by the [hooks decorator](https://github.com/etalab-ia/OpenGateLLM/blob/main/api/utils/hooks_decorator.py) attached to each endpoint. The request cost is stored in the *usage* table; see the [usage monitoring documentation](/features/usage/inference_monitoring/) for more information.
````
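The budget formula in the hunk above can be checked with plain Python. `compute_cost` is a hypothetical helper that mirrors `round((prompt_tokens / 1000000 * router.costs.prompt_tokens) + (completion_tokens / 1000000 * router.costs.completion_tokens), ndigits=6)`, with the two router costs passed in as arguments and expressed per million tokens; it is a sketch, not OpenGateLLM's implementation.

```python
def compute_cost(
    prompt_tokens: int,
    completion_tokens: int,
    prompt_cost_per_million: float,
    completion_cost_per_million: float,
) -> float:
    """Cost of one request, with both prices expressed per million tokens."""
    prompt_part = prompt_tokens / 1_000_000 * prompt_cost_per_million
    completion_part = completion_tokens / 1_000_000 * completion_cost_per_million
    # Rounded to 6 decimal places, like the documented formula.
    return round(prompt_part + completion_part, ndigits=6)


if __name__ == "__main__":
    # Example prices from the configuration file:
    # model_cost_prompt_tokens: 0.1 and model_cost_completion_tokens: 0.3
    print(compute_cost(10, 20, 0.1, 0.3))
```

With those example prices, a request with 10 prompt tokens and 20 completion tokens costs 0.000007, and one million prompt tokens alone cost 0.1. With both costs left at their default of 0, every request costs 0.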
````diff
@@ -21,47 +20,30 @@ The compute cost returned in the response, in the `usage.cost` field. After the
 There are three ways to configure model pricing used for budget computation: Playground UI, API, or configuration file.

 <Tabs>
-<TabItem value="playground" label="Playground UI" icon="laptop">
+<TabItem value="playground" label="Playground UI" icon="lucide:laptop">

 To define pricing in the Playground, go to the *Provider* page and create or edit a provider with:
 - **`Prompt token cost`**: Cost per million input tokens.
 - **`Completion token cost`**: Cost per million output tokens.
+</TabItem>

-<Aside type="note" title="Cost calculation">
-If one of these values is missing, the corresponding part of the request cost cannot be computed accurately.
-</Aside>
+<TabItem value="api" label="API" icon="lucide:code">

-</TabItem>
-<TabItem value="config" label="API" icon="document">
-* Create a new provider:
-```diff lang="bash"
-curl -X POST "http://localhost:8000/v1/admin/provider?router_id=1" \
--H "Content-Type: application/json" \
--H "Authorization: Bearer changeme" \
--d '{
-"type": "vllm",
-"url": ${MODEL_API_URL},
-"key": ${MODEL_API_KEY},
-"model_name": "meta-llama/Llama-3.1-8B-Instruct",
-+ "model_cost_prompt_tokens": 0.1,
-+ "model_cost_completion_tokens": 0.3
-}'
-```
+See the `POST` and `PUT` `/v1/admin/routers` endpoints in the API reference to define a router's `model_cost_prompt_tokens` and `model_cost_completion_tokens`.
+Prompt and completion token costs are expressed per million tokens.

-* Update an existing provider:
-```bash
-curl -X PUT "http://localhost:8000/v1/admin/provider/{provider_id}" \
--H "Content-Type: application/json" \
--H "Authorization: Bearer changeme" \
--d '{"model_cost_prompt_tokens": 0.1, "model_cost_completion_tokens": 0.3}'
-```
+<LinkButton href="/reference/" icon="external">API Reference</LinkButton>
 </TabItem>
-<TabItem value="config" label="Configuration file" icon="document">
+
+<TabItem value="config" label="Configuration file" icon="lucide:file-text">
+<Aside type="caution">
+We do not recommend using the configuration file to set up your model parameters; use the Playground UI or the API endpoints instead.
+</Aside>
+
 To define model pricing in the configuration file, set the following fields for each provider:
 - **`model_cost_prompt_tokens`**: Cost per million prompt/input tokens.
 - **`model_cost_completion_tokens`**: Cost per million completion/output tokens.

-For more information, see [configuration file documentation](/configuration/configuration_file/).

 **Example:**

````
````diff
@@ -78,69 +60,41 @@ There are three ways to configure model pricing used for budget computation: Pla
 model_cost_prompt_tokens: 0.1
 model_cost_completion_tokens: 0.3
 ```
+
+<LinkButton href="/configuration/configuration_file/" icon="external">Configuration file documentation</LinkButton>
 </TabItem>
 </Tabs>

-For each model provider, you can define the costs of each model in the `config.yml` file for the prompt and completion tokens (per million tokens).
-
-The following parameters are used for cost computation:
-- `model_cost_prompt_tokens`
-- `model_cost_completion_tokens`
-
-For more information, see [Configuration file documentation](/configuration/configuration_file/) documentation.
-
-**Example:**
-
-```yaml
-models:
-[...]
-- name: my-language-model
-type: text-generation
-providers:
-- type: openai
-url: https://api.openai.com
-key: ${OPENAI_API_KEY}
-model_name: gpt-4o-mini
-model_cost_prompt_tokens: 0.1
-model_cost_completion_tokens: 0.3
-```
+<Aside type="note" title="Cost calculation">
+By default, these values are set to 0, which means that requests to the model are free of charge.
+</Aside>

 ## Assign budget to a user

 Each user's budget is set with the create user or update user endpoint, in the `budget` field. You need the `admin` permission to create or update a user.

-<Tabs>
-<TabItem label="Create user" default>
-```diff lang="bash"
-curl -X POST http://localhost:8000/v1/admin/users \
--H "Authorization: Bearer ${API_KEY}" \
--H "Content-Type: application/json" \
--d '{
-"email": "john.doe@example.com",
-"role": 1,
-+ "budget": 100
-}'
-```
-<Aside type="note" title="Budget undefined">
-If budget is not defined when user is create, the user has no limit on the number of requests.
-</Aside>
-
-</TabItem>
-
-<TabItem label="Update user">
-```diff lang="bash"
-curl -X PATCH http://localhost:8000/v1/admin/users/1 \
--H "Authorization: Bearer ${API_KEY}" \
--H "Content-Type: application/json" \
--d '{
-+ "budget": 100
-}'
-```
-<Aside type="note" title="Budget undefined">
-If budget is not defined when user is updated, the user budget is set to None and user has no limit on the number of requests.
-</Aside>
-
-</TabItem>
-</Tabs>
-
-
+See the `POST` and `PATCH` `/v1/admin/users` endpoints in the [API reference](/reference/) for more information.
+
+## Budget monitoring
+
+Users can see the cost of each request in the API response, in the `usage.cost` field.
+The *Usage* page in the Playground also shows the history of the user's requests and the cost of each one.
+
+```diff lang="json"
+{
+"id": "chatcmpl-123",
+"object": "chat.completion",
+"created": 1677652288,
+"model": "my-language-model",
+"choices": [
+...
+],
+"usage": {
+"prompt_tokens": 10,
+"completion_tokens": 20,
+"total_tokens": 30,
++ "cost": 0.000015,
+"carbon": {"kWh": 0.0001456, "kgCO2eq": 0.0000672 }
+}
+}
+```
````
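Because every response carries its own `usage.cost`, a client can keep a running total of what a session has spent. The sketch below is hypothetical: the trimmed `RESPONSES` bodies and the `total_cost` helper are illustrations, and a missing or `null` cost is treated as 0. It only sums costs client-side; the authoritative budget remains the one updated server-side by the hooks decorator.

```python
import json

# Hypothetical response bodies, trimmed to the fields used here.
RESPONSES = [
    '{"id": "chatcmpl-123", "usage": {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30, "cost": 0.000015}}',
    '{"id": "chatcmpl-124", "usage": {"prompt_tokens": 40, "completion_tokens": 60, "total_tokens": 100, "cost": 0.00004}}',
]


def total_cost(bodies: list[str]) -> float:
    """Sum the `usage.cost` field of chat completion response bodies."""
    total = 0.0
    for body in bodies:
        usage = json.loads(body).get("usage") or {}
        total += usage.get("cost") or 0.0  # a missing or null cost counts as free
    return round(total, 6)  # same precision as the per-request cost


if __name__ == "__main__":
    print(f"spent in this session: {total_cost(RESPONSES)}")
```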

docs/src/content/docs/features/usage/environmental_footprint.mdx

Lines changed: 11 additions & 47 deletions
````diff
@@ -3,7 +3,7 @@ title: Environmental footprint
 sidebar:
   label: "[lucide:leaf] Environmental footprint"
 ---
-import { Aside, Tabs, TabItem } from '@astrojs/starlight/components';
+import { Aside, LinkButton, Tabs, TabItem } from '@astrojs/starlight/components';

 OpenGateLLM tracks the environmental impact of AI model usage through the [EcoLogits](https://ecologits.ai) library, which provides a comprehensive view of the environmental footprint of generative AI models at inference.

````
````diff
@@ -14,7 +14,7 @@ The environmental footprint is deduced from the number of parameters of the mode
 There are three ways to configure environmental impact tracking: Playground UI, API, or configuration file.

 <Tabs>
-<TabItem value="playground" label="Playground UI" icon="laptop">
+<TabItem label="Playground UI" icon="lucide:laptop">

 To define model parameters in the Playground, go to the *Provider* page and, when you create or edit a model provider, complete the form fields, including:
 - **`Total params of the model`**: Total number of parameters of the model, in billions, used for environmental footprint computation.
````
````diff
@@ -29,33 +29,13 @@ There are three way to configure environmental impact tracking, by Playground, A

 </TabItem>

-<TabItem value="config" label="API" icon="document">
-* Create a new provider:
-```diff lang="bash"
-curl -X POST "http://localhost:8000/v1/admin/provider?router_id=1" \
--H "Content-Type: application/json" \
--H "Authorization: Bearer changeme" \
--d '{
-"type": "vllm",
-"url": ${MODEL_API_URL},
-"key": ${MODEL_API_KEY},
-"model_name": "meta-llama/Llama-3.1-8B-Instruct",
-+ "model_total_params": 8,
-+ "model_active_params": 8,
-+ "model_hosting_zone": "FRA"
-}'
-```
+<TabItem label="API" icon="lucide:code">
+See the `POST` and `PUT` `/v1/admin/providers` endpoints in the API reference to define a provider's `model_total_params`, `model_active_params` and `model_hosting_zone`.

-* Update an existing provider:
-```bash
-curl -X PUT "http://localhost:8000/v1/admin/provider/{provider_id}" \
--H "Content-Type: application/json" \
--H "Authorization: Bearer changeme" \
--d '{"model_total_params": 35, "model_active_params": 35, "model_hosting_zone": "WOR"}'
-```
+<LinkButton href="/reference/" icon="external">API Reference</LinkButton>
 </TabItem>

-<TabItem value="config" label="Configuration file" icon="document">
+<TabItem value="config" label="Configuration file" icon="lucide:file-text">
 <Aside type="caution">
 We do not recommend using the configuration file to set up your model parameters; use the Playground UI or the API endpoints instead.
 </Aside>
````
````diff
@@ -65,8 +45,6 @@ There are three way to configure environmental impact tracking, by Playground, A
 - **`model_active_params`**: Number of active parameters of the model, in billions, used for environmental footprint computation.
 - **`model_hosting_zone`**: Hosting zone of the model, as an ISO 3166-1 alpha-3 country code (e.g., `FRA` for France, `USA` for United States) or `WOR` for World.

-For more information, see [configuration file documentation](/configuration/configuration_file/).
-
 **Example:**

 ```yaml
````
````diff
@@ -84,6 +62,8 @@ There are three way to configure environmental impact tracking, by Playground, A
 model_hosting_zone: WOR
 ```

+<LinkButton href="/configuration/configuration_file/" icon="external">Configuration file documentation</LinkButton>
+
 <Aside type="note" title="Environmental footprint computation">
 Environmental footprint computation requires at least one of `model_total_params` or `model_active_params` to be defined. If neither is provided, the environmental impact is not computed for that model provider (it is displayed as 0 kWh and 0 kgCO2eq).

````
````diff
@@ -101,37 +81,21 @@ For each call to a generative AI model, the API returns environmental impact met

 **Example response:**

-```json
+```diff lang="json"
 {
 "id": "chatcmpl-123",
 "object": "chat.completion",
 "created": 1677652288,
 "model": "my-language-model",
 "choices": [
-{
-"index": 0,
-"message": {
-"role": "assistant",
-"content": "Hello! How can I help you today?"
-},
-"finish_reason": "stop"
-}
+...
 ],
 "usage": {
 "prompt_tokens": 10,
 "completion_tokens": 20,
 "total_tokens": 30,
 "cost": 0.000015,
-"carbon": {
-"kWh": {
-"min": 0.0001234,
-"max": 0.0001456
-},
-"kgCO2eq": {
-"min": 0.0000567,
-"max": 0.0000672
-}
-}
++ "carbon": {"kWh": 0.0001456, "kgCO2eq": 0.0000672 }
 }
 }
 ```
````
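Since this commit flattens `usage.carbon` into single `kWh` and `kgCO2eq` values, a client can total the footprint of a session by summing them across responses. The sketch below is an illustration, not part of OpenGateLLM: the `Footprint` class is hypothetical, it assumes the flat shape shown in the new example response (not the former `min`/`max` objects), converts to watt-hours and grams for readability, and counts a missing `carbon` block as zero, like a provider without parameters configured.

```python
from dataclasses import dataclass


@dataclass
class Footprint:
    wh: float = 0.0       # energy, in watt-hours
    g_co2eq: float = 0.0  # emissions, in grams of CO2 equivalent

    def add_response(self, response: dict) -> None:
        """Accumulate the `usage.carbon` block of one chat completion response.

        Assumes the flat shape {"kWh": float, "kgCO2eq": float}; a missing
        block or key counts as 0.
        """
        carbon = (response.get("usage") or {}).get("carbon") or {}
        self.wh += carbon.get("kWh", 0.0) * 1_000        # kWh -> Wh
        self.g_co2eq += carbon.get("kgCO2eq", 0.0) * 1_000  # kg -> g


if __name__ == "__main__":
    session = Footprint()
    session.add_response({"usage": {"cost": 0.000015, "carbon": {"kWh": 0.0001456, "kgCO2eq": 0.0000672}}})
    print(session)
```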
Lines changed: 10 additions & 0 deletions
````diff
@@ -0,0 +1,10 @@
+---
+title: Users management
+sidebar:
+  label: "[lucide:users] Users management"
+---
+import { Aside } from '@astrojs/starlight/components';
+
+<Aside type="caution">
+🚧 This page is under construction. 🚧
+</Aside>
````
