finish documentation

leoguillaume · leoguillaume · commit 58764f6a23f9 · 2026-03-04T09:10:32.000+01:00
diff --git a/.gitignore b/.gitignore
@@ -218,6 +218,6 @@ run.sh
 .claude
 bruno
 docs/.astro
-docs/node_modules 
+docs/node_modules
 docs/.cache
 docs/package-lock.json
diff --git a/docs/src/content/docs/configuration/dependencies/prometheus.mdx b/docs/src/content/docs/configuration/dependencies/prometheus.mdx
@@ -26,12 +26,16 @@ This endpoint returns metrics in Prometheus text-based exposition format, which
 ## List of metrics
 
 All default metrics of *prometheus-fastapi-instrumentator* are available, see their [README](https://github.com/trallnag/prometheus-fastapi-instrumentator) for more information.
-These metrics are prefixed with `opengatellm_inference_`.
-
-In addition, the following metrics are available:
-| Metric | Description |
-| --- | --- |
-| *work in progress, check our [roadmap](https://github.com/etalab-ia/OpenGateLLM/milestone/4/) for more information.* | ... |
+These metrics are prefixed with the `ogl_` namespace.
+
+In addition, OpenGateLLM exposes the following inference metrics:
+
+| Metric | Type | Labels | Description |
+| --- | --- | --- | --- |
+| `ogl_inference_requests_total` | Counter | `endpoint`, `model`, `status_code` | Total number of LLM requests. |
+| `ogl_inference_requests_duration_seconds` | Histogram | `endpoint`, `model`, `status_code` | Duration of LLM requests, in seconds. |
+| `ogl_inference_ttft_milliseconds` | Histogram | `endpoint`, `model`, `status_code` | Time to first token of streaming responses, in milliseconds. |
+| `ogl_inference_output_tokens_per_second` | Histogram | `endpoint`, `model` | Output generation speed, in tokens per second. |
+| `ogl_inference_tokens_total` | Counter | `endpoint`, `model`, `type` | Total number of consumed tokens; `type` is `prompt` or `completion`. |
 
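+The sketch below shows how the labels can be used to aggregate a counter. It is a minimal, standard-library-only example. It assumes a simplified exposition format with no escaped quotes and no timestamps. The `SAMPLE` text, its label values, and the `tokens_by_type` helper are hypothetical and are not part of OpenGateLLM.
+
+```python
+import re
+from collections import defaultdict
+
+# Hypothetical scrape output; real label values depend on your deployment.
+SAMPLE = """
+# HELP ogl_inference_tokens_total Total number of consumed tokens
+# TYPE ogl_inference_tokens_total counter
+ogl_inference_tokens_total{endpoint="/v1/chat/completions",model="my-language-model",type="prompt"} 120.0
+ogl_inference_tokens_total{endpoint="/v1/chat/completions",model="my-language-model",type="completion"} 340.0
+ogl_inference_tokens_total{endpoint="/v1/embeddings",model="my-embeddings-model",type="prompt"} 50.0
+"""
+
+# Simplified sample line: metric_name{label="value",...} value
+LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
+LABEL = re.compile(r'(\w+)="([^"]*)"')
+
+
+def tokens_by_type(text: str) -> dict[str, float]:
+    """Sum ogl_inference_tokens_total across endpoints and models, grouped by `type`."""
+    totals: dict[str, float] = defaultdict(float)
+    for line in text.splitlines():
+        match = LINE.match(line)
+        # Skip comments, blank lines, and other metrics.
+        if not match or match["name"] != "ogl_inference_tokens_total":
+            continue
+        labels = dict(LABEL.findall(match["labels"]))
+        totals[labels["type"]] += float(match["value"])
+    return dict(totals)
+
+
+print(tokens_by_type(SAMPLE))  # {'prompt': 170.0, 'completion': 340.0}
+```
+
+In production, you would typically let Prometheus do this aggregation with a PromQL query such as `sum by (type) (ogl_inference_tokens_total)` rather than parsing the text yourself.
+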
 ## Grafana dashboard
 
diff --git a/docs/src/content/docs/deployment/production.mdx b/docs/src/content/docs/deployment/production.mdx
@@ -18,11 +18,20 @@ This guide provides practical defaults and hardening recommendations for running
     session_secret_key: ${SESSION_SECRET_KEY}
   ```
 
+- Add the `GUNICORN_CMD_ARGS` environment variable to your deployment configuration to configure the Gunicorn server.
+  We recommend the following configuration:
+  ```bash
+  GUNICORN_CMD_ARGS="--workers {{ workers }} --worker-connections 1000 --timeout 240 --keep-alive 75 --graceful-timeout 75"
+  ```
+
+  Set the number of workers based on the expected load, the number of CPU cores, and the maximum number of PostgreSQL connections.
+  See the [Gunicorn documentation](https://docs.gunicorn.org/en/latest/design.html#how-many-workers) for more information.
+
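+  As a starting point, the sketch below combines the `(2 x $num_cores) + 1` rule of thumb from the Gunicorn documentation with a PostgreSQL connection cap. The `connections_per_worker` and `reserved_connections` parameters are assumptions. The real per-worker value depends on your connection pool settings. If several API replicas share the same database, divide `pg_max_connections` between them.
+
+  ```python
+  def recommended_workers(
+      cpu_cores: int,
+      pg_max_connections: int,
+      connections_per_worker: int,
+      reserved_connections: int = 10,
+  ) -> int:
+      """Return a Gunicorn worker count bounded by CPU cores and PostgreSQL connections."""
+      # Gunicorn rule of thumb: (2 x number of cores) + 1.
+      by_cpu = 2 * cpu_cores + 1
+      # Keep some connections free for other clients (migrations, admin tools, ...).
+      by_db = (pg_max_connections - reserved_connections) // connections_per_worker
+      # Never go below one worker.
+      return max(1, min(by_cpu, by_db))
+
+
+  # 4 cores, PostgreSQL max_connections=100, up to 20 connections per worker.
+  print(recommended_workers(cpu_cores=4, pg_max_connections=100, connections_per_worker=20))  # 4
+  ```
+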
 ## Security and access control
 
 - Use the master key only for bootstrap operations: **creating the first admin role and user**.
 - Do not use the master identity for day-to-day model administration. When you create a router, the model is shown with an `owned_by` attribute set to the organization of the user who created it.
-- Set a strong `auth_master_key` (at least 32 characters, high entropy..
+- Set a strong, high-entropy `auth_master_key` of at least 32 characters.
 
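+  One way to generate such a key, assuming Python 3.6 or later is available, uses the standard `secrets` module:
+
+  ```python
+  import secrets
+
+  # token_urlsafe(32) draws 32 random bytes and base64url-encodes them,
+  # producing a 43-character string, above the 32-character minimum.
+  master_key = secrets.token_urlsafe(32)
+  print(master_key)
+  ```
+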
   <Aside type="caution" title="Master key rotation impact">
     The master key is used to encrypt user API keys. If you change it, you need to regenerate all user API keys.
diff --git a/docs/src/content/docs/features/usage/budget.mdx b/docs/src/content/docs/features/usage/budget.mdx
@@ -3,15 +3,14 @@ title: User budget
 sidebar:
   label: "[lucide:piggy-bank] User budget"
 ---
-import { Aside, Tabs, TabItem } from '@astrojs/starlight/components';
-
-
-OpenGateLLM allows you to define the costs for each model in the configuration file then attach a budget to each user to limit ...
+import { Aside, LinkButton, Tabs, TabItem } from '@astrojs/starlight/components';
 
+OpenGateLLM lets you define costs for each model router. For more information about model routers, see the [model setup documentation](/getting-started/models/).
+You can then attach a budget to each user to cap how much they can spend on requests.
-The compute cost is calculated based on the number of tokens used and the budget defined for the model based on the following formula:
+The compute cost of a request is calculated from the number of tokens used and the costs defined for the model router, with the following formula:
 
 ```python
-cost = round((prompt_tokens / 1000000 * client.costs.prompt_tokens) + (completion_tokens / 1000000 * client.costs.completion_tokens), ndigits=6)
+cost = round((prompt_tokens / 1000000 * router.costs.prompt_tokens) + (completion_tokens / 1000000 * router.costs.completion_tokens), ndigits=6)
 ```
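+
+The formula can be checked with a small standalone function. This sketch mirrors the formula above, with `router.costs.prompt_tokens` and `router.costs.completion_tokens` passed as plain arguments. The `request_cost` name and its parameters are illustrative and are not part of the OpenGateLLM API.
+
+```python
+def request_cost(
+    prompt_tokens: int,
+    completion_tokens: int,
+    prompt_cost_per_million: float,
+    completion_cost_per_million: float,
+) -> float:
+    """Compute a request cost from token counts and per-million-token prices."""
+    prompt_part = prompt_tokens / 1_000_000 * prompt_cost_per_million
+    completion_part = completion_tokens / 1_000_000 * completion_cost_per_million
+    return round(prompt_part + completion_part, ndigits=6)
+
+
+# 10 prompt tokens and 20 completion tokens at 0.1 and 0.3 per million tokens.
+print(request_cost(10, 20, 0.1, 0.3))  # 7e-06
+```
+
+Because the result is rounded to 6 decimal places, very small requests on cheap models can round down to `0.0`.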
 
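-The compute cost returned in the response, in the `usage.cost` field. After the request is processed, the budget amount of the user is updated by the [hooks decorator](https://github.com/etalab-ia/OpenGateLLM/blob/main/api/utils/hooks_decorator.py) attached to each endpoint. The request cost is stored in the *usage* table, see [usage monitoring documentation](/features/usage/inference_monitoring/) for more information. 
+The compute cost is returned in the response, in the `usage.cost` field. After the request is processed, the user's budget is updated by the [hooks decorator](https://github.com/etalab-ia/OpenGateLLM/blob/main/api/utils/hooks_decorator.py) attached to each endpoint. The request cost is stored in the *usage* table; see the [usage monitoring documentation](/features/usage/inference_monitoring/) for more information.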
@@ -21,47 +20,30 @@ The compute cost returned in the response, in the `usage.cost` field. After the
 There are three ways to configure model pricing used for budget computation: Playground UI, API, or configuration file.
 
 <Tabs>
-  <TabItem value="playground" label="Playground UI" icon="laptop">
+  <TabItem value="playground" label="Playground UI" icon="lucide:laptop">
   
   To define pricing in the Playground, go to the *Provider* page and create or edit a provider with:
   - **`Prompt token cost`**: Cost per million input tokens.
   - **`Completion token cost`**: Cost per million output tokens.
+  </TabItem>
 
-  <Aside type="note" title="Cost calculation">
-  If one of these values is missing, the corresponding part of the request cost cannot be computed accurately.
-  </Aside>
+  <TabItem label="API" icon="lucide:code">
 
-  </TabItem>
-  <TabItem value="config" label="API" icon="document">
-  * Create a new provider:
-  ```diff lang="bash"
-  curl -X POST "http://localhost:8000/v1/admin/provider?router_id=1" \
-   -H "Content-Type: application/json" \
-   -H "Authorization: Bearer changeme" \
-   -d '{
-        "type": "vllm",
-        "url": ${MODEL_API_URL},
-        "key": ${MODEL_API_KEY},
-        "model_name": "meta-llama/Llama-3.1-8B-Instruct",
-  +      "model_cost_prompt_tokens": 0.1,
-  +      "model_cost_completion_tokens": 0.3
-      }'
-  ```
+  To define `model_cost_prompt_tokens` and `model_cost_completion_tokens` for a router, use the POST and PUT `/v1/admin/routers` endpoints described in the API reference.
+  Prompt and completion token costs are expressed per million tokens.
   
-  * Update an existing provider:
-  ```bash
-  curl -X PUT "http://localhost:8000/v1/admin/provider/{provider_id}" \
-  -H "Content-Type: application/json" \
-  -H "Authorization: Bearer changeme" \
-  -d '{"model_cost_prompt_tokens": 0.1, "model_cost_completion_tokens": 0.3}'
-  ```
+  <LinkButton href="/reference/" icon="external">API Reference</LinkButton>
   </TabItem>
-  <TabItem value="config" label="Configuration file" icon="document">
+
+  <TabItem value="config" label="Configuration file" icon="lucide:file-text">
+  <Aside type="caution">
+  We do not recommend using the configuration file to set up your model parameters; use the Playground UI or the API endpoints instead.
+  </Aside>
+
   To define model pricing in the configuration file, set the following fields for each provider:
   - **`model_cost_prompt_tokens`**: Cost per million prompt/input tokens.
   - **`model_cost_completion_tokens`**: Cost per million completion/output tokens.
 
-  For more information, see [configuration file documentation](/configuration/configuration_file/).
 
   **Example:**
 
@@ -78,69 +60,41 @@ There are three ways to configure model pricing used for budget computation: Pla
           model_cost_prompt_tokens: 0.1
           model_cost_completion_tokens: 0.3
   ```
+
+  <LinkButton href="/configuration/configuration_file/" icon="external">Configuration file documentation</LinkButton>
   </TabItem>
 </Tabs>
 
-For each model provider, you can define the costs of each model in the `config.yml` file for the prompt and completion tokens (per million tokens). 
-
-The following parameters are used for cost computation:
-- `model_cost_prompt_tokens`
-- `model_cost_completion_tokens`
-
-For more information, see [Configuration file documentation](/configuration/configuration_file/) documentation.
-
-**Example:**
-
-```yaml
-models:
-  [...]
-  - name: my-language-model
-    type: text-generation
-    providers:
-      - type: openai
-        url: https://api.openai.com
-        key: ${OPENAI_API_KEY}
-        model_name: gpt-4o-mini
-        model_cost_prompt_tokens: 0.1
-        model_cost_completion_tokens: 0.3
-```
+<Aside type="note" title="Cost calculation">
+By default, these values are set to 0, which means that requests to the model are free of charge.
+</Aside>
 
 ## Assign budget to a user
 
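-Each user has a budget defined by create user endpoint or update user endpoint. The budget is defined in the `budget` field. You need has `admin` permission to create or update a user.
+Each user's budget is set in the `budget` field of the create user or update user endpoint. You need the `admin` permission to create or update a user.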
 
-<Tabs>
-  <TabItem label="Create user" default>
-  ```diff lang="bash"
-    curl -X POST http://localhost:8000/v1/admin/users \
-    -H "Authorization: Bearer ${API_KEY}" \
-    -H "Content-Type: application/json" \
-    -d '{
-        "email": "john.doe@example.com",
-        "role": 1,
-  +     "budget": 100
-    }'
-    ```
-    <Aside type="note" title="Budget undefined">
-    If budget is not defined when user is create, the user has no limit on the number of requests.
-    </Aside>
-
-  </TabItem>
-
-  <TabItem label="Update user">
-  ```diff lang="bash"
-    curl -X PATCH http://localhost:8000/v1/admin/users/1 \
-    -H "Authorization: Bearer ${API_KEY}" \
-    -H "Content-Type: application/json" \
-    -d '{
-  +     "budget": 100
-    }'
-    ```
-    <Aside type="note" title="Budget undefined">
-    If budget is not defined when user is updated, the user budget is set to None and user has no limit on the number of requests.
-    </Aside>
-
-    </TabItem>
-</Tabs>
-
-
+See the POST and PATCH `/v1/admin/users` endpoints in the [API reference](/reference/) for more information.
+
+## Budget monitoring
+
+Users can see the cost of each request in the API response, in the `usage.cost` field.
+The *Usage* page in the Playground also shows users the history of their requests and the cost of each one.
+
+```diff lang="json"
+{
+  "id": "chatcmpl-123",
+  "object": "chat.completion",
+  "created": 1677652288,
+  "model": "my-language-model",
+  "choices": [
+    ...
+  ],
+  "usage": {
+    "prompt_tokens": 10,
+    "completion_tokens": 20,
+    "total_tokens": 30,
++    "cost": 0.000015,
+    "carbon": {"kWh": 0.0001456, "kgCO2eq": 0.0000672}
+  }
+}
+```
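+
+On the client side, you can sum the `usage.cost` field to track spending across several calls. The sketch below assumes you keep the raw JSON response bodies. The sample bodies and the `total_cost` helper are hypothetical.
+
+```python
+import json
+
+# Hypothetical response bodies, trimmed to the `usage` field.
+responses = [
+    '{"usage": {"prompt_tokens": 10, "completion_tokens": 20, "cost": 0.000015}}',
+    '{"usage": {"prompt_tokens": 200, "completion_tokens": 50, "cost": 0.000035}}',
+]
+
+
+def total_cost(bodies: list[str]) -> float:
+    """Sum the `usage.cost` field over JSON response bodies."""
+    total = 0.0
+    for body in bodies:
+        usage = json.loads(body).get("usage", {})
+        # Treat a missing `cost` field as 0.
+        total += usage.get("cost", 0.0)
+    # Round like the per-request cost, to 6 decimal places.
+    return round(total, 6)
+
+
+print(total_cost(responses))  # 5e-05
+```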
diff --git a/docs/src/content/docs/features/usage/environmental_footprint.mdx b/docs/src/content/docs/features/usage/environmental_footprint.mdx
@@ -3,7 +3,7 @@ title: Environmental footprint
 sidebar:
   label: "[lucide:leaf] Environmental footprint"
 ---
-import { Aside, Tabs, TabItem } from '@astrojs/starlight/components';
+import { Aside, LinkButton, Tabs, TabItem } from '@astrojs/starlight/components';
 
 OpenGateLLM tracks the environmental impact of AI model usage through the [EcoLogits](https://ecologits.ai) library, which provides a comprehensive view of the environmental footprint of generative AI models at inference.
 
@@ -14,7 +14,7 @@ The environmental footprint is deduced from the number of parameters of the mode
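-There are three way to configure environmental impact tracking, by Playground, API or configuration file.
+There are three ways to configure environmental impact tracking: Playground UI, API, or configuration file.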
 
 <Tabs>
-  <TabItem value="playground" label="Playground UI" icon="laptop">
+  <TabItem label="Playground UI" icon="lucide:laptop">
   
   To define model parameters in the Playground, go to the *Provider* page. Complete the form fields when you create or edit a model provider including:
   - **`Total params of the model`**: Total number of parameters of the model in billions of parameters for environmental footprint computation.
@@ -29,33 +29,13 @@ There are three way to configure environmental impact tracking, by Playground, A
 
   </TabItem>
 
-  <TabItem value="config" label="API" icon="document">
-  * Create a new provider:
-  ```diff lang="bash"
-  curl -X POST "http://localhost:8000/v1/admin/provider?router_id=1" \
-   -H "Content-Type: application/json" \
-   -H "Authorization: Bearer changeme" \
-   -d '{
-        "type": "vllm",
-        "url": ${MODEL_API_URL},
-        "key": ${MODEL_API_KEY},
-        "model_name": "meta-llama/Llama-3.1-8B-Instruct",
-  +      "model_total_params": 8,
-  +      "model_active_params": 8,
-  +      "model_hosting_zone": "FRA"
-      }'
-  ```
+  <TabItem label="API" icon="lucide:code">
+  To define `model_total_params`, `model_active_params`, and `model_hosting_zone` for a provider, use the POST and PUT `/v1/admin/providers` endpoints described in the API reference.
   
-  * Update an existing provider:
-  ```bash
-  curl -X PUT "http://localhost:8000/v1/admin/provider/{provider_id}" \
-  -H "Content-Type: application/json" \
-  -H "Authorization: Bearer changeme" \
-  -d '{"model_total_params": 35, "model_active_params": 35, "model_hosting_zone": "WOR"}'
-  ```
+  <LinkButton href="/reference/" icon="external">API Reference</LinkButton>
   </TabItem>
 
-  <TabItem value="config" label="Configuration file" icon="document">
+  <TabItem value="config" label="Configuration file" icon="lucide:file-text">
   <Aside type="caution">
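-  We not recommend to use the configuration file to setup your models parameters, prefer to use the Playground UI or API endpoints.
+  We do not recommend using the configuration file to set up your model parameters; use the Playground UI or the API endpoints instead.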
   </Aside>
@@ -65,8 +45,6 @@ There are three way to configure environmental impact tracking, by Playground, A
   - **`model_active_params`**: Active number of parameters of the model in billions of parameters for environmental footprint computation.
   - **`model_hosting_zone`**: Hosting zone of the model in ISO 3166-1 alpha-3 code format (e.g., `WOR` for World, `FRA` for France, `USA` for United States).
 
-  For more information, see [configuration file documentation](/configuration/configuration_file/).
-
   **Example:**
 
   ```yaml
@@ -84,6 +62,8 @@ There are three way to configure environmental impact tracking, by Playground, A
           model_hosting_zone: WOR
   ```
 
+  <LinkButton href="/configuration/configuration_file/" icon="external">Configuration file documentation</LinkButton>
+
   <Aside type="note" title="Environmental footprint computation">
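-  Environmental footprint computation requires at least one of `model_total_params` or `model_active_params` to be defined. If not provided, the environmental impact will not be computed for that model provider (display as 0 kWh and 0 kgCO2eq).
+  Environmental footprint computation requires at least one of `model_total_params` or `model_active_params` to be defined. If neither is provided, the environmental impact is not computed for that model provider (displayed as 0 kWh and 0 kgCO2eq).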
 
@@ -101,37 +81,21 @@ For each call to a generative AI model, the API returns environmental impact met
 
 **Example response:**
 
-```json
+```diff lang="json"
 {
   "id": "chatcmpl-123",
   "object": "chat.completion",
   "created": 1677652288,
   "model": "my-language-model",
   "choices": [
-    {
-      "index": 0,
-      "message": {
-        "role": "assistant",
-        "content": "Hello! How can I help you today?"
-      },
-      "finish_reason": "stop"
-    }
+    ...
   ],
   "usage": {
     "prompt_tokens": 10,
     "completion_tokens": 20,
     "total_tokens": 30,
     "cost": 0.000015,
-    "carbon": {
-      "kWh": {
-        "min": 0.0001234,
-        "max": 0.0001456
-      },
-      "kgCO2eq": {
-        "min": 0.0000567,
-        "max": 0.0000672
-      }
-    }
++    "carbon": {"kWh": 0.0001456, "kgCO2eq": 0.0000672}
   }
 }
 ```
diff --git a/docs/src/content/docs/features/users_management/index.mdx b/docs/src/content/docs/features/users_management/index.mdx
@@ -0,0 +1,10 @@
+---
+title: Users management
+sidebar:
+  label: "[lucide:users] Users management"
+---
+import { Aside } from '@astrojs/starlight/components';
+
+<Aside type="caution">
+🚧 This page is under construction. 🚧
+</Aside>
diff --git a/docs/src/content/docs/getting-started/models.mdx b/docs/src/content/docs/getting-started/models.mdx