chatbot-rag-app: recover from timeouts on first use of ELSER
Fixes #307

Signed-off-by: Adrian Cole <[email protected]>
codefromthecrypt committed Feb 20, 2025
1 parent ce9eeb2 commit 48d829f
Showing 10 changed files with 266 additions and 127 deletions.
20 changes: 18 additions & 2 deletions .github/workflows/docker-chatbot-rag-app.yml
@@ -12,7 +12,9 @@ on:
    branches:
      - main
    paths:
-      # Verify changes to the Dockerfile on PRs
+      # Verify changes to the Dockerfile on PRs, tainted when we update ES.
+      - docker/docker-compose-elastic.yml
+      - example-apps/chatbot-rag-app/docker-compose.yml
      - example-apps/chatbot-rag-app/Dockerfile
      - .github/workflows/docker-chatbot-rag-app.yml
      - '!**/*.md'
@@ -42,13 +44,27 @@ jobs:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
+      # This builds the image and pushes its digest if a multi-architecture
+      # image will be made later (event_name == 'push'). If PR, the image is
+      # loaded into docker for testing.
      - uses: docker/build-push-action@v6
        id: build
        with:
          context: example-apps/chatbot-rag-app
-          outputs: type=image,name=${{ env.IMAGE }},push-by-digest=true,name-canonical=true,push=${{ github.event_name == 'push' && 'true' || 'false' }}
+          outputs: type=${{ github.event_name == 'pull_request' && 'docker' || 'image' }},name=${{ env.IMAGE }},push-by-digest=true,name-canonical=true,push=${{ github.event_name == 'push' && 'true' || 'false' }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
+      - name: start elasticsearch
+        if: github.event_name == 'pull_request'
+        run: docker compose -f docker/docker-compose-elastic.yml up --quiet-pull -d --wait --wait-timeout 120 elasticsearch
+      - name: test image
+        if: github.event_name == 'pull_request'
+        working-directory: example-apps/chatbot-rag-app
+        run: |  # This tests ELSER is working, which doesn't require an LLM.
+          cp env.example .env
+          # same as `docker compose run --rm -T create-index`, except pull never
+          docker run --rm --name create-index --env-file .env --pull never \
+            --add-host "localhost:host-gateway" ${{ env.IMAGE }} flask create-index
      - name: export digest
        if: github.event_name == 'push'
        run: |
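
The rewritten `outputs:` line uses GitHub Actions' `cond && x || y` expression idiom as a ternary: pull requests produce a `docker` output (loaded into the local daemon for the test step), pushes produce an `image` output pushed by digest. A sketch of that selection logic in Python — the default image name here is a hypothetical placeholder, not the workflow's `env.IMAGE`:

```python
def build_outputs(event_name: str, image: str = "ghcr.io/example/chatbot-rag-app") -> str:
    """Mimic the workflow's outputs expression for a given trigger event."""
    # PR builds load the image into the local docker daemon for testing;
    # push builds produce a registry image, pushed by digest.
    out_type = "docker" if event_name == "pull_request" else "image"
    push = "true" if event_name == "push" else "false"
    return (f"type={out_type},name={image},"
            f"push-by-digest=true,name-canonical=true,push={push}")
```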
2 changes: 1 addition & 1 deletion docker/README.md
@@ -12,7 +12,7 @@ wget https://raw.githubusercontent.com/elastic/elasticsearch-labs/refs/heads/mai
Use docker compose to run Elastic stack in the background:

```bash
-docker compose -f docker-compose-elastic.yml up --force-recreate -d
+docker compose -f docker-compose-elastic.yml up --force-recreate --wait -d
```

Then, you can view Kibana at http://localhost:5601/app/home#/
22 changes: 15 additions & 7 deletions docker/docker-compose-elastic.yml
@@ -2,7 +2,7 @@ name: elastic-stack

 services:
   elasticsearch:
-    image: docker.elastic.co/elasticsearch/elasticsearch:8.17.0
+    image: docker.elastic.co/elasticsearch/elasticsearch:8.17.2
     container_name: elasticsearch
     ports:
       - 9200:9200
@@ -16,21 +16,29 @@ services:
       - xpack.security.http.ssl.enabled=false
       - xpack.security.transport.ssl.enabled=false
       - xpack.license.self_generated.type=trial
-      - ES_JAVA_OPTS=-Xmx8g
+      # Note that ELSER is recommended to have 2GB, but it is JNI (PyTorch).
+      # So, ELSER's memory is in addition to the heap and other overhead.
+      - ES_JAVA_OPTS=-Xms2g -Xmx2g
     ulimits:
       memlock:
         soft: -1
         hard: -1
     healthcheck:
-      test: ["CMD-SHELL", "curl -s http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=500ms"]
-      retries: 300
+      test: # readiness probe taken from kbn-health-gateway-server script
+        [
+          "CMD-SHELL",
+          "curl -s http://localhost:9200 | grep -q 'missing authentication credentials'",
+        ]
+      start_period: 10s
+      interval: 1s
+      timeout: 10s
+      retries: 120

   elasticsearch_settings:
     depends_on:
       elasticsearch:
         condition: service_healthy
-    image: docker.elastic.co/elasticsearch/elasticsearch:8.17.0
+    image: docker.elastic.co/elasticsearch/elasticsearch:8.17.2
     container_name: elasticsearch_settings
     restart: 'no'
     command: >
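
The new healthcheck above deliberately probes the root endpoint without credentials: a secured, running Elasticsearch answers 401 with a body mentioning `missing authentication credentials`, which distinguishes "up but secured" from "not accepting connections yet". A minimal sketch of the same probe in Python, under the assumption that these function names are illustrative only:

```python
import time
import urllib.error
import urllib.request


def looks_ready(body: str) -> bool:
    """Mirror the compose healthcheck: an unauthenticated request to a
    secured, running Elasticsearch returns a security error mentioning
    'missing authentication credentials'."""
    return "missing authentication credentials" in body


def wait_for_elasticsearch(url="http://localhost:9200", retries=120, interval=1.0):
    """Poll like the container healthcheck: 120 retries, 1s apart."""
    for _ in range(retries):
        try:
            with urllib.request.urlopen(url) as resp:
                body = resp.read().decode()
        except urllib.error.HTTPError as e:
            body = e.read().decode()  # a 401 still carries the JSON error body
        except OSError:
            body = ""  # not accepting connections yet
        if looks_ready(body):
            return True
        time.sleep(interval)
    return False
```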
@@ -42,7 +50,7 @@ services:
       '

   kibana:
-    image: docker.elastic.co/kibana/kibana:8.17.0
+    image: docker.elastic.co/kibana/kibana:8.17.2
     container_name: kibana
     depends_on:
       elasticsearch_settings:
@@ -66,7 +74,7 @@
       interval: 1s

   apm-server:
-    image: docker.elastic.co/apm/apm-server:8.17.0
+    image: docker.elastic.co/apm/apm-server:8.17.2
     container_name: apm-server
     depends_on:
       elasticsearch:
35 changes: 18 additions & 17 deletions example-apps/chatbot-rag-app/README.md
@@ -45,34 +45,36 @@ and configure its templated connection settings:

## Running the App

This application contains two services:
* create-index: Installs ELSER and ingests data into elasticsearch
* api-frontend: Hosts the chatbot-rag-app application on http://localhost:4000

There are two ways to run the app: via Docker or locally. Docker is advised for
ease while locally is advised if you are making changes to the application.

### Run with docker

-Docker compose is the easiest way, as you get one-step to:
-* ingest data into elasticsearch
-* run the app, which listens on http://localhost:4000
+Docker compose is the easiest way to get started, as you don't need to have a
+working Python environment.

**Double-check you have a `.env` file with all your variables set first!**

```bash
docker compose up --pull always --force-recreate
```

-*Note*: First time creating the index can fail on timeout. Wait a few minutes
-and retry.
+*Note*: The first run may take several minutes to become available.

Clean up when finished, like this:

```bash
docker compose down
```

-### Run locally
+### Run with Python

-If you want to run this example with Python and Node.js, you need to do a few
-things listed in the [Dockerfile](Dockerfile). The below uses the same
+If you want to run this example with Python, you need to do a few things listed
+in the [Dockerfile](Dockerfile) to build it first. The below uses the same
production mode as used in Docker to avoid problems in debug mode.

**Double-check you have a `.env` file with all your variables set first!**
@@ -89,7 +91,7 @@ nvm use --lts
(cd frontend; yarn install; REACT_APP_API_HOST=/api yarn build)
```

-#### Configure your python environment
+#### Configure your Python environment

Before we can run the app, we need a working Python environment with the
correct packages installed:
@@ -102,17 +104,16 @@ pip install "python-dotenv[cli]"
pip install -r requirements.txt
```

-#### Run the ingest command
+#### Create your Elasticsearch index

-First, ingest the data into elasticsearch:
```bash
-FLASK_APP=api/app.py dotenv run -- flask create-index
+dotenv run -- flask create-index
```

-*Note*: First time creating the index can fail on timeout. Wait a few minutes
-and retry.
+*Note*: This may take several minutes to complete
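
The old advice ("wait a few minutes and retry") is dropped because the commit makes the app recover from ELSER's slow first load itself (Fixes #307). A hedged sketch of one way such recovery can work — retrying a timed-out operation with exponential backoff; the function and demo names here are illustrative, not the app's actual code:

```python
import time


def with_retries(op, attempts=5, base_delay=2.0):
    """Retry a zero-arg callable with exponential backoff, e.g. while the
    ELSER model is still downloading/starting on first use."""
    for attempt in range(attempts):
        try:
            return op()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the timeout
            time.sleep(base_delay * (2 ** attempt))


# demo: an operation that times out twice before succeeding, standing in
# for `flask create-index` while ELSER warms up
calls = {"n": 0}


def flaky_create_index():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("ELSER model still loading")
    return "ok"
```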

-#### Run the app
+#### Run the application

Now, run the app, which listens on http://localhost:4000
```bash
@@ -185,10 +186,10 @@ passages. Modify this script to index your own data.

See [Langchain documentation][loader-docs] for more ways to load documents.

-### Building from source with docker
+### Running from source with Docker

-To build the app from source instead of using published images, pass the `--build`
-flag to Docker Compose.
+To build the app from source instead of using published images, pass the
+`--build` flag to Docker Compose instead of `--pull always`

```bash
docker compose up --build --force-recreate
8 changes: 7 additions & 1 deletion example-apps/chatbot-rag-app/api/chat.py
@@ -6,6 +6,7 @@
     get_elasticsearch_chat_message_history,
 )
 from flask import current_app, render_template, stream_with_context
+from functools import cache
 from langchain_elasticsearch import (
     ElasticsearchStore,
     SparseVectorStrategy,
@@ -27,11 +28,16 @@
     strategy=SparseVectorStrategy(model_id=ELSER_MODEL),
 )

-llm = get_llm()
+
+@cache
+def get_lazy_llm():
+    return get_llm()


 @stream_with_context
 def ask_question(question, session_id):
+    llm = get_lazy_llm()
+
     yield f"data: {SESSION_ID_TAG} {session_id}\n\n"
     current_app.logger.debug("Chat session ID: %s", session_id)

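The chat.py change moves LLM construction out of import time: `functools.cache` makes `get_lazy_llm()` build the client once, on the first request, instead of when the module loads. A minimal self-contained sketch of the pattern — `get_llm` here is a stand-in for the app's real factory, with a counter added to make the caching observable:

```python
from functools import cache

CONSTRUCTIONS = 0


def get_llm():
    """Stand-in for the app's LLM factory; in practice this may be slow."""
    global CONSTRUCTIONS
    CONSTRUCTIONS += 1
    return object()


@cache
def get_lazy_llm():
    # First call pays the construction cost; later calls reuse the instance.
    return get_llm()


# Import time: nothing constructed yet (CONSTRUCTIONS == 0).
llm_a = get_lazy_llm()  # first request constructs the LLM
llm_b = get_lazy_llm()  # subsequent requests reuse the same instance
```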
