| 1 | +# ADR - 2026-01-30 - Elasticsearch Scaling |
| 2 | + |
| 3 | +**Status:** In Progress |
| 4 | +**Date:** 2026-01-30 |
| 5 | +**Authors:** Development Team |
| 6 | +**Decision Outcome:** Merge Elasticsearch indices into a single index |
| 7 | + |
| 8 | +--- |
| 9 | + |
| 10 | +## Context |
| 11 | + |
| 12 | +[WIP] |
| 13 | + |
| 14 | +## Migration script |
| 15 | + |
| 16 | +The [`adr/scripts/2026-01-30-es-migration.py`](adr/scripts/2026-01-30-es-migration.py) script migrates the data from the source Elasticsearch indices into the single destination Elasticsearch index. |
| 17 | + |
| 18 | +1. Clone the repository on the target server |
| 19 | + |
| 20 | +```bash |
| 21 | +git clone https://github.com/etalab-ia/OpenGateLLM.git && cd OpenGateLLM |
| 22 | +``` |
| 23 | + |
| 24 | +2. Install the dependencies |
| 25 | + |
| 26 | +> [!NOTE] |
| 27 | +> We recommend creating and activating a virtual environment before installing the dependencies. |
| 28 | + |
| 29 | +```bash |
| 30 | +pip install ".[api]" |
| 31 | +``` |
| 32 | + |
| 33 | +3. Create the bash script to run the migration |
| 34 | + |
| 35 | +```bash |
| 36 | +touch run.sh && chmod +x run.sh |
| 37 | +``` |
| 38 | + |
| 39 | +The script should look like this: |
| 40 | + |
| 41 | +```bash |
| 42 | +#!/bin/bash |
| 43 | +export PYTHONPATH=. |
| 44 | +export POSTGRES_URL= |
| 45 | + |
| 46 | +export SOURCE_ES_URL= |
| 47 | +export SOURCE_ES_USERNAME= |
| 48 | +export SOURCE_ES_PASSWORD= |
| 49 | + |
| 50 | +export DESTINATION_ES_URL= |
| 51 | +export DESTINATION_ES_USERNAME= |
| 52 | +export DESTINATION_ES_PASSWORD= |
| 53 | +export DESTINATION_ES_INDEX_NAME= |
| 54 | +export DESTINATION_ES_VECTOR_SIZE= |
| 55 | +export DESTINATION_ES_NUMBER_OF_SHARDS= |
| 56 | +export DESTINATION_ES_NUMBER_OF_REPLICAS= |
| 57 | +export DESTINATION_ES_INDEX_LANGUAGE= |
| 58 | + |
| 59 | +python adr/scripts/2026-01-30-es-migration.py > migration.log 2>&1 |
| 60 | +``` |
| 61 | + |
| 62 | +The environment variables are: |
| 63 | + |
| 64 | +| Variable | Description | Example | |
| 65 | +|----------|-------------|-------------| |
| 66 | +| `POSTGRES_URL` | The URL of the PostgreSQL database. The URL must be in the format `postgresql+asyncpg://<username>:<password>@<host>:<port>/<database>`. | `postgresql+asyncpg://postgres:changeme@localhost:5432/postgres` | |
| 67 | +| `SOURCE_ES_URL` | The URL of the source Elasticsearch cluster. The URL must be in the format `http://<host>:<port>`. You can use the same Elasticsearch cluster for the source and the destination. | `http://localhost:9200` | |
| 68 | +| `SOURCE_ES_USERNAME` | The username of the source Elasticsearch cluster. | `elasticsearch` | |
| 69 | +| `SOURCE_ES_PASSWORD` | The password of the source Elasticsearch cluster. | `changeme` | |
| 70 | +| `DESTINATION_ES_URL` | The URL of the destination Elasticsearch cluster. The URL must be in the format `http://<host>:<port>`. | `http://localhost:9200` | |
| 71 | +| `DESTINATION_ES_USERNAME` | The username of the destination Elasticsearch cluster. | `elasticsearch` | |
| 72 | +| `DESTINATION_ES_PASSWORD` | The password of the destination Elasticsearch cluster. | `changeme` | |
| 73 | +| `DESTINATION_ES_INDEX_NAME` | The name of the destination Elasticsearch index. The default index name in the configuration file is `opengatellm`. | `opengatellm` | |
| 74 | +| `DESTINATION_ES_VECTOR_SIZE` | The vector size, i.e. the dimension of the embeddings produced by the embedding model set up in your configuration file (e.g. `1024`). | `1024` | |
| 75 | +| `DESTINATION_ES_NUMBER_OF_SHARDS` | The number of shards of the destination Elasticsearch index. Check the Elasticsearch documentation for the maximum number of shards per node. | `1` | |
| 76 | +| `DESTINATION_ES_NUMBER_OF_REPLICAS` | The number of replicas of the destination Elasticsearch index. Check the Elasticsearch documentation for the maximum number of replicas per node. | `1` | |
| 77 | +| `DESTINATION_ES_INDEX_LANGUAGE` | The language of the destination Elasticsearch index. The supported languages are: `french`, `english`, `german`, `italian`, `portuguese`, `spanish`, `swedish`. | `french` | |
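
Since a missing or malformed variable would only surface once the migration is already running, it can help to check the values first. The following is a minimal sketch, not part of the migration script: the function name `check_environment` is hypothetical, the rules are derived from the table above, and accepting `https` as well as `http` is an assumption.

```python
from typing import Mapping
from urllib.parse import urlsplit

# Languages listed in the table above.
SUPPORTED_LANGUAGES = {"french", "english", "german", "italian", "portuguese", "spanish", "swedish"}

# Every variable exported by run.sh.
REQUIRED = (
    "POSTGRES_URL",
    "SOURCE_ES_URL", "SOURCE_ES_USERNAME", "SOURCE_ES_PASSWORD",
    "DESTINATION_ES_URL", "DESTINATION_ES_USERNAME", "DESTINATION_ES_PASSWORD",
    "DESTINATION_ES_INDEX_NAME", "DESTINATION_ES_VECTOR_SIZE",
    "DESTINATION_ES_NUMBER_OF_SHARDS", "DESTINATION_ES_NUMBER_OF_REPLICAS",
    "DESTINATION_ES_INDEX_LANGUAGE",
)

# Integer variables and the smallest value that makes sense for each.
INTEGER_MINIMUMS = {
    "DESTINATION_ES_VECTOR_SIZE": 1,
    "DESTINATION_ES_NUMBER_OF_SHARDS": 1,
    "DESTINATION_ES_NUMBER_OF_REPLICAS": 0,
}


def check_environment(env: Mapping[str, str]) -> list[str]:
    """Hypothetical helper: return a list of problems, empty if the values look sane."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]

    postgres_url = env.get("POSTGRES_URL", "")
    if postgres_url and not postgres_url.startswith("postgresql+asyncpg://"):
        problems.append("POSTGRES_URL must start with postgresql+asyncpg://")

    for name in ("SOURCE_ES_URL", "DESTINATION_ES_URL"):
        value = env.get(name, "")
        if value:
            parts = urlsplit(value)
            # Assumption: https is accepted in addition to the documented http.
            if parts.scheme not in ("http", "https") or not parts.hostname:
                problems.append(f"{name} must look like http://<host>:<port>")

    for name, minimum in INTEGER_MINIMUMS.items():
        value = env.get(name, "")
        if value and not (value.isdecimal() and int(value) >= minimum):
            problems.append(f"{name} must be an integer >= {minimum}")

    language = env.get("DESTINATION_ES_INDEX_LANGUAGE", "")
    if language and language not in SUPPORTED_LANGUAGES:
        problems.append(f"DESTINATION_ES_INDEX_LANGUAGE '{language}' is not supported")

    return problems
```

Calling `check_environment(os.environ)` after sourcing the exports and stopping when the returned list is non-empty would catch most configuration mistakes before any data is touched.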
| 78 | + |
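
To make the role of the four index-related variables more concrete, here is a sketch of how they could map onto an Elasticsearch index-creation body. This is an illustration under assumptions, not the body the migration script actually sends: the function name `build_index_body` and the field names `content` and `embedding` are hypothetical. The `number_of_shards` and `number_of_replicas` settings, the `dense_vector` field type with its `dims` parameter, and the built-in language analyzers are standard Elasticsearch features.

```python
import json


def build_index_body(vector_size: int, shards: int, replicas: int, language: str) -> dict:
    """Hypothetical helper: assemble an index-creation body from the DESTINATION_ES_* values."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,      # DESTINATION_ES_NUMBER_OF_SHARDS
                "number_of_replicas": replicas,  # DESTINATION_ES_NUMBER_OF_REPLICAS
            }
        },
        "mappings": {
            "properties": {
                # Field names are assumptions; the real mapping may differ.
                "content": {"type": "text", "analyzer": language},           # DESTINATION_ES_INDEX_LANGUAGE
                "embedding": {"type": "dense_vector", "dims": vector_size},  # DESTINATION_ES_VECTOR_SIZE
            }
        },
    }


body = build_index_body(vector_size=1024, shards=1, replicas=1, language="french")
print(json.dumps(body, indent=2))
```

The vector size has to match the embedding model exactly, because Elasticsearch rejects vectors whose length differs from the `dims` declared in the mapping.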
| 79 | +4. Run the script |
| 80 | + |
| 81 | +```bash |
| 82 | +./run.sh |
| 83 | +``` |
| 84 | + |
| 85 | +The script writes the progress of the migration to the `migration.log` file. If the script fails, you can rerun it; it will resume from the point where it failed. |
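
Because `run.sh` redirects both stdout and stderr into `migration.log`, an uncaught Python exception ends up in that file as a standard traceback. The sketch below looks for that marker to decide whether a rerun is needed. The helper names are hypothetical, and the check only detects uncaught exceptions; errors that the script logs in its own format would need a different pattern.

```python
from pathlib import Path

# Header line that CPython prints at the start of every traceback.
TRACEBACK_HEADER = "Traceback (most recent call last):"


def traceback_lines(log_text: str) -> list[int]:
    """Return the 1-based line numbers at which a Python traceback starts."""
    return [
        number
        for number, line in enumerate(log_text.splitlines(), start=1)
        if TRACEBACK_HEADER in line
    ]


def needs_rerun(log_path: str = "migration.log") -> bool:
    """Hypothetical helper: True if the log contains at least one traceback."""
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    return bool(traceback_lines(text))
```

Note that the `>` redirection in `run.sh` overwrites `migration.log` on every run, so the log should be inspected (or copied aside) before rerunning.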
| 86 | + |
| 87 | +5. Check the migration |
| 88 | + |
| 89 | +```bash |
| 90 | +cat migration.log |
| 91 | +``` |