docs/src/content/docs/configuration/dependencies/prometheus.mdx
10 additions & 6 deletions
@@ -26,12 +26,16 @@ This endpoint returns metrics in Prometheus text-based exposition format, which
 ## List of metrics

 All default metrics of *prometheus-fastapi-instrumentator* are available; see their [README](https://github.com/trallnag/prometheus-fastapi-instrumentator) for more information.
-These metrics are prefixed with `opengatellm_inference_`.
-
-In addition, the following metrics are available:
-| Metric | Description |
-| --- | --- |
-|*work in progress, check our [roadmap](https://github.com/etalab-ia/OpenGateLLM/milestone/4/) for more information.*| ... |
+These metrics are prefixed by the namespace `ogl_`.
+
+In addition, OpenGateLLM exposes the following metrics for inference:
+| Metric | Type | Description |
+| --- | --- | --- |
+|`ogl_inference_requests_total`| Counter | Total number of LLM requests (`endpoint`, `model`, `status_code`). |
+|`ogl_inference_requests_duration_seconds`| Histogram | Duration of LLM requests in seconds (`endpoint`, `model`, `status_code`). |
+|`ogl_inference_ttft_milliseconds`| Histogram | Time to first token for streaming responses in milliseconds (`endpoint`, `model`, `status_code`). |
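
As a rough illustration of how the counter in the table above might be consumed, the sketch below parses Prometheus text exposition output and sums `ogl_inference_requests_total` per model. It assumes that `endpoint`, `model` and `status_code` are exposed as labels; the sample scrape, its label values and the helper name `requests_per_model` are hypothetical and only serve the example.

```python
import re
from collections import defaultdict

# Hypothetical scrape output: label values and counts are made up for illustration.
SAMPLE = """\
# TYPE ogl_inference_requests_total counter
ogl_inference_requests_total{endpoint="/v1/chat/completions",model="my-model",status_code="200"} 40.0
ogl_inference_requests_total{endpoint="/v1/chat/completions",model="my-model",status_code="500"} 2.0
ogl_inference_requests_total{endpoint="/v1/embeddings",model="my-embedder",status_code="200"} 7.0
"""

# Matches a labelled sample line without a trailing timestamp: name{labels} value
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def requests_per_model(text, metric="ogl_inference_requests_total"):
    """Sum the given counter over all label combinations, grouped by `model`."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        totals[labels.get("model", "unknown")] += float(match.group("value"))
    return dict(totals)


if __name__ == "__main__":
    print(requests_per_model(SAMPLE))
```

In practice a Prometheus server would scrape the endpoint and such aggregation would be done in PromQL; the parser is only meant to show how the metric name and labels fit together.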
Configure the number of workers based on the expected load, the number of CPU cores and max PostgreSQL connections.
+See the [Gunicorn documentation](https://docs.gunicorn.org/en/latest/design.html#how-many-workers) for more information.
+
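
The sizing advice above can be sketched as a small calculation. This is only a starting point under stated assumptions: it uses the `(2 x cores) + 1` rule of thumb from the Gunicorn design documentation and assumes that each worker holds its own PostgreSQL connection pool. The function name and the `pool_size_per_worker` and `reserved_connections` parameters are hypothetical, not OpenGateLLM settings.

```python
def suggested_workers(cpu_cores, max_pg_connections, pool_size_per_worker, reserved_connections=0):
    """Return a starting worker count bounded by CPU cores and PostgreSQL connections."""
    if cpu_cores < 1 or pool_size_per_worker < 1:
        raise ValueError("cpu_cores and pool_size_per_worker must be at least 1")

    # Gunicorn rule of thumb: (2 x number of cores) + 1
    by_cpu = 2 * cpu_cores + 1

    # Assumption: every worker opens its own pool of `pool_size_per_worker` connections,
    # and `reserved_connections` are kept free for other clients (migrations, admin tools).
    usable_connections = max(0, max_pg_connections - reserved_connections)
    by_database = usable_connections // pool_size_per_worker

    # Never go below one worker, even if the database limit is very tight.
    return max(1, min(by_cpu, by_database))


if __name__ == "__main__":
    print(suggested_workers(cpu_cores=4, max_pg_connections=100, pool_size_per_worker=10))
```

The result should still be validated against the expected load, since the right value depends on how long requests spend waiting on upstream model providers.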
 ## Security and access control

 - Use the master key only for bootstrap operations: **creating the first admin role and user**.
 - Do not use the master identity for day-to-day model administration. When you create a router, the model is shown with an `owned_by` attribute set to the organization of the user who created it.
-- Set a strong `auth_master_key` (at least 32 characters, high entropy..
+- Set a strong `auth_master_key`: at least 32 characters, high entropy.
OpenGateLLM allows you to define the costs for each model router. For more information about model routers, see [setup your models documentation](/getting-started/models/).
+It then attaches a budget to each user to cap the usage incurred by that user's requests.
 The compute cost is calculated from the number of tokens used and the costs defined for the model, using the following formula:
The compute cost is returned in the response, in the `usage.cost` field. After the request is processed, the user's budget is updated by the [hooks decorator](https://github.com/etalab-ia/OpenGateLLM/blob/main/api/utils/hooks_decorator.py) attached to each endpoint. The request cost is stored in the *usage* table; see the [usage monitoring documentation](/features/usage/inference_monitoring/) for more information.
@@ -21,47 +20,30 @@ The compute cost returned in the response, in the `usage.cost` field. After the
 There are three ways to configure model pricing used for budget computation: Playground UI, API, or configuration file.
For each model provider, you can define the costs of each model in the `config.yml` file for the prompt and completion tokens (per million tokens).
-
-The following parameters are used for cost computation:
-- `model_cost_prompt_tokens`
-- `model_cost_completion_tokens`
-
-For more information, see [Configuration file documentation](/configuration/configuration_file/) documentation.
-
-**Example:**
-
-```yaml
-models:
-  [...]
-  - name: my-language-model
-    type: text-generation
-    providers:
-      - type: openai
-        url: https://api.openai.com
-        key: ${OPENAI_API_KEY}
-        model_name: gpt-4o-mini
-        model_cost_prompt_tokens: 0.1
-        model_cost_completion_tokens: 0.3
-```
+<Aside type="note" title="Cost calculation">
+By default, these values are set to 0, which means that model requests are free of charge.
+</Aside>
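
To make the pricing behaviour concrete, here is a minimal sketch of a per-request cost computation. The exact formula is not shown in this excerpt, so the sketch assumes the cost is linear in the token counts, with separate prompt and completion prices expressed per million tokens. The function and parameter names are hypothetical and are not OpenGateLLM code.

```python
def compute_cost(prompt_tokens, completion_tokens,
                 cost_prompt_per_million=0.0, cost_completion_per_million=0.0):
    """Estimate the cost of one request from per-million-token prices.

    Assumption: cost is linear in tokens. Both prices default to 0,
    mirroring the documented default of free requests.
    """
    prompt_cost = (prompt_tokens / 1_000_000) * cost_prompt_per_million
    completion_cost = (completion_tokens / 1_000_000) * cost_completion_per_million
    return prompt_cost + completion_cost


if __name__ == "__main__":
    # With the default prices, every request costs nothing.
    print(compute_cost(prompt_tokens=1_200, completion_tokens=350))
    # With illustrative prices of 0.1 and 0.3 per million tokens.
    print(compute_cost(1_200, 350, cost_prompt_per_million=0.1, cost_completion_per_million=0.3))
```

Under this assumption, a user's budget would only decrease for models whose prices have been set to a non-zero value.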

 ## Assign budget to a user

 Each user has a budget, set through the create user or update user endpoint. The budget is defined in the `budget` field. You need the `admin` permission to create or update a user.

-<Tabs>
-<TabItem label="Create user" default>
-```diff lang="bash"
-curl -X POST http://localhost:8000/v1/admin/users \
-  -H "Authorization: Bearer ${API_KEY}" \
-  -H "Content-Type: application/json" \
-  -d '{
-    "email": "john.doe@example.com",
-    "role": 1,
-+   "budget": 100
-  }'
-```
-<Aside type="note" title="Budget undefined">
-If budget is not defined when user is create, the user has no limit on the number of requests.
OpenGateLLM tracks the environmental impact of AI model usage through the [EcoLogits](https://ecologits.ai) library, which provides a comprehensive view of the environmental footprint of generative AI models at inference.
@@ -14,7 +14,7 @@ The environmental footprint is deduced from the number of parameters of the mode
 There are three ways to configure environmental impact tracking: Playground, API, or configuration file.
To define model parameters in the Playground, go to the *Provider* page. Complete the form fields when you create or edit a model provider including:
 - **`Total params of the model`**: Total number of parameters of the model in billions of parameters for environmental footprint computation.
@@ -29,33 +29,13 @@ There are three way to configure environmental impact tracking, by Playground, A

 </TabItem>

-<TabItem value="config" label="API" icon="document">
-* Create a new provider:
-```diff lang="bash"
-curl -X POST "http://localhost:8000/v1/admin/provider?router_id=1" \
-  -H "Content-Type: application/json" \
-  -H "Authorization: Bearer changeme" \
-  -d '{
-    "type": "vllm",
-    "url": ${MODEL_API_URL},
-    "key": ${MODEL_API_KEY},
-    "model_name": "meta-llama/Llama-3.1-8B-Instruct",
-+   "model_total_params": 8,
-+   "model_active_params": 8,
-+   "model_hosting_zone": "FRA"
-  }'
-```
+<TabItem label="API" icon="lucide:code">
+See the `POST` and `PUT` `/v1/admin/providers` endpoints in the API reference to define the `model_total_params`, `model_active_params` and `model_hosting_zone` of a provider.

-* Update an existing provider:
-```bash
-curl -X PUT "http://localhost:8000/v1/admin/provider/{provider_id}" \
We do not recommend using the configuration file to set up your model parameters; prefer the Playground UI or the API endpoints.
 </Aside>
@@ -65,8 +45,6 @@ There are three way to configure environmental impact tracking, by Playground, A
 - **`model_active_params`**: Active number of parameters of the model in billions of parameters for environmental footprint computation.
 - **`model_hosting_zone`**: Hosting zone of the model in ISO 3166-1 alpha-3 code format (e.g., `WOR` for World, `FRA` for France, `USA` for United States).

-For more information, see [configuration file documentation](/configuration/configuration_file/).
-
 **Example:**

 ```yaml
@@ -84,6 +62,8 @@ There are three way to configure environmental impact tracking, by Playground, A
Environmental footprint computation requires at least one of `model_total_params` or `model_active_params` to be defined. If neither is provided, the environmental impact is not computed for that model provider (it is displayed as 0 kWh and 0 kgCO2eq).
@@ -101,37 +81,21 @@ For each call to a generative AI model, the API returns environmental impact met