diff --git a/docs/dynamic-plugins/installing-plugins.md b/docs/dynamic-plugins/installing-plugins.md
index bbaea0d48b..19208061a0 100644
--- a/docs/dynamic-plugins/installing-plugins.md
+++ b/docs/dynamic-plugins/installing-plugins.md
@@ -287,23 +287,40 @@ When using the Operator ....
 
 The directory where dynamic plugins are located is mounted as a volume to the `install-dynamic-plugins` init container and the `backstage-backend` container. The `install-dynamic-plugins` init container is responsible for downloading and extracting the plugins into this directory. Depending on the deployment method, the directory is mounted as an ephemeral or persistent volume. In the latter case, the volume can be shared between several Pods, and the plugins installation script is also responsible for downloading and extracting the plugins only once, avoiding conflicts.
 
-**Important Note:** If `install-dynamic-plugins` init container was killed with SIGKILL signal, which may happen due to the following reasons:
+**Important Note:** When the `dynamic-plugins-root` directory is backed by a persistent volume, the `install-dynamic-plugins` init container uses a lock file (`/dynamic-plugins-root/install-dynamic-plugins.lock`) to prevent concurrent plugin installations across Pods that share the same volume. The lock is acquired before installation begins and released when it completes (or fails).
+
+If the `install-dynamic-plugins` init container is killed with a SIGKILL signal, the lock file cannot be cleaned up. This may happen due to the following reasons:
 
 - pod eviction (to free up node resources)
-- pod deletion (if not terminated with SIGTERM within graceful period)
+- pod deletion (if not terminated with SIGTERM within the graceful period)
 - node shutdown
 - container runtime issues
 - exceeding resource limits (OOM for example)
 
-Then the script will not be able to remove the lock file, so the next time the pod starts, it will be be stuck waiting for the lock to release. You will see the following message in the logs for the init `install-dynamic-plugins` container:
+When this occurs, the next pod to start will wait up to **10 minutes** (by default) for the stale lock to be released, logging the following message:
 
 ```console
 oc logs -n -f backstage-- -c install-dynamic-plugins
-======= Waiting for lock release (file: /dynamic-plugins-root/install-dynamic-plugins.lock)...
+======= Waiting for lock to be released: /dynamic-plugins-root/install-dynamic-plugins.lock
+```
+
+After the timeout expires, the init container exits with an error:
+
+```
+Timed out after 600000ms waiting for lock file /dynamic-plugins-root/install-dynamic-plugins.lock.
+Another install may be stuck — remove the file manually to proceed.
 ```
 
-In such a case, you can delete the lock file manually from any of the Pods:
+The exit handler automatically removes the stale lock file during shutdown. The pod restarts, and the next init container run starts with no lock file present, so it proceeds normally. The total recovery time equals the configured lock timeout (10 minutes by default). No manual intervention is required.
+
+To skip the timeout wait and recover immediately, delete the lock file manually:
 
 ```console
-oc exec -n deploy/backstage- -c install-dynamic-plugins -- rm -f /dynamic-plugins-root/dynamic-plugins.lock
+oc exec -n deploy/backstage- -c install-dynamic-plugins -- rm -f /dynamic-plugins-root/install-dynamic-plugins.lock
 ```
+
+The lock timeout can be configured via the `DYNAMIC_PLUGINS_LOCK_TIMEOUT_MS` environment variable on the `install-dynamic-plugins` init container (value in milliseconds, default: `600000` which is 10 minutes).
+
+> **Note:** In RHDH 1.10.x and earlier, the install script used a Python implementation with no lock timeout. A stale lock file would cause the init container to wait indefinitely, and the only way to recover was to manually delete the lock file. The timeout, configurable environment variable, and automatic lock cleanup on exit were introduced with the TypeScript rewrite of the install script.
+
+Note: This lock file behavior only applies when using a persistent volume for the `dynamic-plugins-root` directory. With the default ephemeral volume, each pod gets its own volume, so no lock contention can occur.
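
For readers of this change, the lock-with-timeout behavior described in the added text can be illustrated with a minimal TypeScript sketch. This is not the actual install script: the helper names (`parseLockTimeout`, `tryAcquireLock`, `releaseLock`, `acquireLockWithTimeout`), the fallback to the default on invalid values, and the one-second polling interval are all assumptions made for illustration. Only the environment variable name, the 600000 ms default, and the wording of the timeout message come from the documentation above.

```typescript
import * as fs from "node:fs";

// Default taken from the documentation above: 600000 ms (10 minutes).
const DEFAULT_LOCK_TIMEOUT_MS = 600_000;

// Hypothetical helper: read the timeout from an environment-like object.
// Falling back to the default for unset or invalid values is an assumption.
function parseLockTimeout(env: Record<string, string | undefined>): number {
  const raw = env["DYNAMIC_PLUGINS_LOCK_TIMEOUT_MS"];
  const parsed = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_LOCK_TIMEOUT_MS;
}

// Hypothetical helper: create the lock file atomically. The "wx" flag makes
// the open fail with EEXIST when the file already exists.
function tryAcquireLock(lockPath: string): boolean {
  try {
    fs.closeSync(fs.openSync(lockPath, "wx"));
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false;
    throw err;
  }
}

// Hypothetical helper: remove the lock file; a missing file is not an error.
function releaseLock(lockPath: string): void {
  fs.rmSync(lockPath, { force: true });
}

// Poll until the lock is acquired or the timeout elapses (assumed 1 s polling).
async function acquireLockWithTimeout(
  lockPath: string,
  timeoutMs: number,
  pollMs = 1000,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!tryAcquireLock(lockPath)) {
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs}ms waiting for lock file ${lockPath}.`);
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}
```

In this sketch a caller would run `acquireLockWithTimeout(lockPath, parseLockTimeout(process.env))` before installing and call `releaseLock(lockPath)` in a `finally` block or exit handler. A process killed with SIGKILL never reaches that cleanup, which is how a stale lock file can be left behind.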