Commit aa96a93

doc(adr): elasticsearch scaling (#668)
1 parent 4f8de1b commit aa96a93

3 files changed

Lines changed: 448 additions & 1 deletion

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -218,4 +218,5 @@ compose.yml
218218
playground/.web
219219
playground/.states
220220
playground/.gitignore
221-
playground/requirements.txt
221+
playground/requirements.txt
222+
run.sh

adr/2026-01-30-es-scaling.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
1+
# ADR - 2026-01-30 - Elasticsearch Scaling
2+
3+
**Status:** In Progress
4+
**Date:** 2026-01-30
5+
**Authors:** Development Team
6+
**Decision Outcome:** Merge Elasticsearch indices into a single index
7+
8+
---
9+
10+
## Context
11+
12+
[WIP]
13+
14+
## Migration script
15+
16+
The [`adr/scripts/2026-01-30-es-migration.py`](adr/scripts/2026-01-30-es-migration.py) script migrates data from the source Elasticsearch indices to the destination Elasticsearch index.
17+
18+
1. Clone the repository on the target server
19+
20+
```bash
21+
git clone https://github.com/etalab-ia/OpenGateLLM.git && cd OpenGateLLM
22+
```
23+
24+
2. Install the dependencies
25+
26+
> [!NOTE]
27+
> We recommend creating and activating a virtual environment before installing the dependencies.
28+
29+
```bash
30+
pip install ".[api]"
31+
```
32+
33+
3. Create the bash script to run the migration
34+
35+
```bash
36+
touch run.sh && chmod +x run.sh
37+
```
38+
39+
The script should look like this:
40+
41+
```bash
42+
#!/bin/bash
43+
export PYTHONPATH=.
44+
export POSTGRES_URL=
45+
46+
export SOURCE_ES_URL=
47+
export SOURCE_ES_USERNAME=
48+
export SOURCE_ES_PASSWORD=
49+
50+
export DESTINATION_ES_URL=
51+
export DESTINATION_ES_USERNAME=
52+
export DESTINATION_ES_PASSWORD=
53+
export DESTINATION_ES_INDEX_NAME=
54+
export DESTINATION_ES_VECTOR_SIZE=
55+
export DESTINATION_ES_NUMBER_OF_SHARDS=
56+
export DESTINATION_ES_NUMBER_OF_REPLICAS=
57+
export DESTINATION_ES_INDEX_LANGUAGE=
58+
59+
python adr/scripts/2026-01-30-es-migration.py > migration.log 2>&1
60+
```
61+
62+
The environment variables are:
63+
64+
| Variable | Description | Example |
65+
|----------|-------------|-------------|
66+
| `POSTGRES_URL` | The URL of the PostgreSQL database. The URL must be in the format `postgresql+asyncpg://<username>:<password>@<host>:<port>/<database>`. | `postgresql+asyncpg://postgres:changeme@localhost:5432/postgres` |
67+
| `SOURCE_ES_URL` | The URL of the source Elasticsearch cluster. The URL must be in the format `http://<host>:<port>`. You can use the same Elasticsearch cluster for the source and destination. | `http://localhost:9200` |
68+
| `SOURCE_ES_USERNAME` | The username of the source Elasticsearch cluster. | `elasticsearch` |
69+
| `SOURCE_ES_PASSWORD` | The password of the source Elasticsearch cluster. | `changeme` |
70+
| `DESTINATION_ES_URL` | The URL of the destination Elasticsearch cluster. The URL must be in the format `http://<host>:<port>`. | `http://localhost:9200` |
71+
| `DESTINATION_ES_USERNAME` | The username of the destination Elasticsearch cluster. | `elasticsearch` |
72+
| `DESTINATION_ES_PASSWORD` | The password of the destination Elasticsearch cluster. | `changeme` |
73+
| `DESTINATION_ES_INDEX_NAME` | The name of the destination Elasticsearch index. The default is `opengatellm`, which matches the default index name in the configuration file. | `opengatellm` |
74+
| `DESTINATION_ES_VECTOR_SIZE` | The dimension of the embedding vectors produced by the embeddings model set up in your configuration file (e.g. `1024`). | `1024` |
75+
| `DESTINATION_ES_NUMBER_OF_SHARDS` | The number of shards of the destination Elasticsearch index. Check the Elasticsearch documentation for the maximum number of shards per node. | `1` |
76+
| `DESTINATION_ES_NUMBER_OF_REPLICAS` | The number of replicas of the destination Elasticsearch index. Check the Elasticsearch documentation for guidance on choosing the number of replicas. | `1` |
77+
| `DESTINATION_ES_INDEX_LANGUAGE` | The language of the destination Elasticsearch index. The supported languages are: `french`, `english`, `german`, `italian`, `portuguese`, `spanish`, `swedish`. | `french` |
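
Before launching the migration, it can help to check that every variable in the table above is set and well-formed. The following is a minimal sketch and not part of the repository: `validate_env` is a hypothetical helper, and the rules it applies (URL prefixes, integer minimums, accepting `https://`) are assumptions derived from the descriptions above, not checks the migration script is known to perform.

```python
# Variable names and supported languages come from the table above.
# Every validation rule below is an illustrative assumption.
REQUIRED_VARS = [
    "POSTGRES_URL",
    "SOURCE_ES_URL", "SOURCE_ES_USERNAME", "SOURCE_ES_PASSWORD",
    "DESTINATION_ES_URL", "DESTINATION_ES_USERNAME", "DESTINATION_ES_PASSWORD",
    "DESTINATION_ES_INDEX_NAME", "DESTINATION_ES_VECTOR_SIZE",
    "DESTINATION_ES_NUMBER_OF_SHARDS", "DESTINATION_ES_NUMBER_OF_REPLICAS",
    "DESTINATION_ES_INDEX_LANGUAGE",
]
SUPPORTED_LANGUAGES = {
    "french", "english", "german", "italian", "portuguese", "spanish", "swedish",
}
# Variable name -> smallest accepted integer value (assumed bounds).
INTEGER_VARS = {
    "DESTINATION_ES_VECTOR_SIZE": 1,
    "DESTINATION_ES_NUMBER_OF_SHARDS": 1,
    "DESTINATION_ES_NUMBER_OF_REPLICAS": 0,
}


def _is_int_at_least(value, minimum):
    """Return True if value parses as an integer that is >= minimum."""
    try:
        return int(value) >= minimum
    except ValueError:
        return False


def validate_env(env):
    """Return a list of problems found in env; an empty list means no problem was detected."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    postgres_url = env.get("POSTGRES_URL", "")
    if postgres_url and not postgres_url.startswith("postgresql+asyncpg://"):
        problems.append("POSTGRES_URL must start with postgresql+asyncpg://")

    # The table documents http://; https:// is accepted here as an assumption.
    for name in ("SOURCE_ES_URL", "DESTINATION_ES_URL"):
        url = env.get(name, "")
        if url and not url.startswith(("http://", "https://")):
            problems.append(f"{name} must start with http:// or https://")

    for name, minimum in INTEGER_VARS.items():
        value = env.get(name, "")
        if value and not _is_int_at_least(value, minimum):
            problems.append(f"{name} must be an integer >= {minimum}")

    language = env.get("DESTINATION_ES_INDEX_LANGUAGE", "")
    if language and language not in SUPPORTED_LANGUAGES:
        problems.append("DESTINATION_ES_INDEX_LANGUAGE is not a supported language")

    return problems
```

Calling `validate_env(dict(os.environ))` after sourcing the exports, and aborting when the returned list is non-empty, would catch an empty or mistyped variable before any data is read.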
78+
79+
4. Run the script
80+
81+
```bash
82+
./run.sh
83+
```
84+
85+
The script writes the progress of the migration to the `migration.log` file. If the script fails, you can rerun it; it resumes from the point where it failed.
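
Beyond reading the log, one rough way to sanity-check a finished run is to compare document counts. The sketch below is an illustration and not the repository's method: `counts_match` is a hypothetical helper, the source index names are placeholders, and it assumes every source document ends up as exactly one document in the destination index, which may not hold if the script filters or reshapes documents. It accepts any client exposing `count(index=...)` that returns a mapping with a `count` key, as the official `elasticsearch` Python client does.

```python
def counts_match(source_client, source_indices, destination_client, destination_index):
    """Compare the total document count of the source indices with the destination index.

    Returns a tuple (source_total, destination_total, totals_are_equal).
    """
    source_total = sum(
        source_client.count(index=name)["count"] for name in source_indices
    )
    destination_total = destination_client.count(index=destination_index)["count"]
    return source_total, destination_total, source_total == destination_total


# Hypothetical usage with the official 8.x client (requires `pip install elasticsearch`);
# the index names below are placeholders, not names taken from the repository:
# from elasticsearch import Elasticsearch
# es = Elasticsearch("http://localhost:9200", basic_auth=("elasticsearch", "changeme"))
# print(counts_match(es, ["<source-index-1>", "<source-index-2>"], es, "opengatellm"))
```

A mismatch does not by itself prove data loss, but it is a cheap signal that the log deserves a closer look.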
86+
87+
5. Check the migration
88+
89+
```bash
90+
cat migration.log
