Merged
183 changes: 183 additions & 0 deletions docs/docs/Deployment/deployment-cold-start.mdx
@@ -0,0 +1,183 @@
---
title: Cold-start optimization
slug: /deployment-cold-start
---

Langflow and lfx can be deployed in cold-start-sensitive environments (scale-from-zero container platforms, serverless frameworks, ephemeral CI runners). This page documents the container- and deployment-level knobs that reduce first-request latency when a container starts fresh.

If your deployment always runs a warm Langflow server (single long-lived container, persistent VM), the knobs here are optional. They become valuable when every new request risks paying full container cold-start cost.

## Which image for which use

Langflow publishes two container images. Choose the one that matches your workload.

| Image | Base | Contents | Use when |
|---|---|---|---|
| `langflowai/lfx` (built from `src/lfx/docker/Dockerfile`) | Debian slim (glibc) | lfx package + venv only | You execute flows programmatically — no UI, no visual editor. Embedded runtime, scale-from-zero, serverless. |
| `langflowai/langflow` (built from `docker/build_and_push.Dockerfile`) | Debian slim (glibc) | Langflow backend + React frontend + Node.js + Playwright | You host the Langflow UI for flow authoring + execution. |

A common mistake: running `langflowai/langflow` for pure flow execution. The langflow image bundles a compiled React frontend, Node.js runtime, and Playwright binaries that are unused when you only call `lfx run`, and those extras add unnecessary cold-start cost. For execution-only workloads, use `langflowai/lfx`.

## Cold-start knobs

The lfx reference Dockerfile (`src/lfx/docker/Dockerfile`) already applies the following optimizations. If you build your own image, reuse these patterns.

### Compile bytecode at build time

Set `UV_COMPILE_BYTECODE=1` during `uv sync` so `.pyc` files land in the image layer. Without build-time compilation, Python compiles each module to bytecode on first import in every fresh container, which can cost hundreds of milliseconds.

```dockerfile
ENV UV_COMPILE_BYTECODE=1
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev --no-install-project --package lfx
```
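One way to confirm that bytecode actually landed in an image is to look for source files with no cached `.pyc` for the running interpreter. The following is a minimal sketch using only the standard library. The function name `missing_bytecode` and the idea of pointing it at the venv's `site-packages` directory are illustrative assumptions, not part of the lfx image.

```python
import compileall
import importlib.util
import tempfile
from pathlib import Path


def missing_bytecode(root: str) -> list:
    """Return .py files under root that have no cached .pyc for this interpreter."""
    missing = []
    for source in Path(root).rglob("*.py"):
        # cache_from_source maps foo.py to its __pycache__/foo.<tag>.pyc path.
        cached = importlib.util.cache_from_source(str(source))
        if not Path(cached).exists():
            missing.append(str(source))
    return sorted(missing)


if __name__ == "__main__":
    # Demonstration on a throwaway directory (a stand-in for site-packages).
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "example.py").write_text("VALUE = 1\n")
        print(missing_bytecode(tmp))  # one entry: nothing compiled yet
        compileall.compile_dir(tmp, quiet=1)
        print(missing_bytecode(tmp))  # empty list after compilation
```

Inside a container, the same function could be called with the path of the venv's `site-packages` directory; an empty result suggests the build-time compilation step took effect.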

### Layer order: deps first, source last

Structure the Dockerfile so the dependency-install layer is cache-stable across source-only changes. The key flag is `--no-install-project` on the first `uv sync`:

```dockerfile
# Copy only the files that affect dependency resolution (keeps cache stable).
COPY pyproject.toml uv.lock ./
COPY src/lfx/pyproject.toml /app/src/lfx/pyproject.toml

# Install ONLY dependencies — not the lfx package itself. This layer caches.
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev --no-install-project --package lfx

# Copy source AFTER deps install (source changes do NOT bust the deps layer).
COPY src/lfx/src /app/src/lfx/src

# Install lfx itself (fast — deps are already present).
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev --no-editable --package lfx
```

When only your source code changes, Docker reuses the cached deps layer and only the second `uv sync` runs.

### Python version

Run on Python 3.13. Python 3.13 introduces module-import-time restrictions (notably, creating `asyncio.Lock()` at import raises `RuntimeError`) that Langflow has been updated to handle with lazy-property patterns. Older Python versions still work but lose the runtime improvements landed in Langflow 1.9+.

## Scale-from-zero patterns

For platforms that tear down containers between requests and spin up fresh ones on demand (scale-from-zero), every request pays full cold-start cost: base image pull, venv mount, Python import, `initialize_services`, component-index build, and first-flow execution.

Reduce first-request latency by:

- **Using the lfx image, not the langflow image**, for execution-only workloads.
- **Pre-baking your flow dependencies** into a derived image (see below).
- **Caching the container image in the platform's registry** so the pull cost is amortized across requests.
- **Keeping at least one warm replica** where your platform supports it. One always-on container avoids scale-from-zero on the happy path.
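To judge how much of the cold-start cost comes from interpreter startup and imports alone, one rough approach is to time an import in a brand-new Python process. The sketch below is an assumption-laden measurement helper, not a Langflow tool: `time_fresh_import` is a hypothetical name, and `json` is only a placeholder module.

```python
import subprocess
import sys
import time


def time_fresh_import(module: str) -> float:
    """Wall-clock seconds for a new interpreter to start and import `module`."""
    start = time.perf_counter()
    # A separate process guarantees nothing is already cached in sys.modules.
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Substitute the package whose import cost you want to observe.
    print(f"{time_fresh_import('json'):.3f} s")
```

For a per-module breakdown rather than a single number, CPython's `-X importtime` option prints cumulative import times to stderr.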

## Pre-baking flow dependencies

Your flows may install runtime provider packages on first execution (openai, anthropic, langchain-*, vector-store clients). Each install adds seconds to the first cold-start in a fresh container. Pre-bake them into a derived image.

### Derived image recipe

```dockerfile
FROM langflowai/lfx:latest AS base

USER root
RUN --mount=type=cache,target=/root/.cache/uv \
    uv pip install --python /app/.venv/bin/python \
        openai \
        anthropic \
        langchain-openai \
        chromadb

USER lfx
```

Pick the dependencies that match the flows you actually run. The list above is a common starting set (LLM providers plus a vector store); extend it for embeddings, document loaders, or other providers your flows call.

For parameterized builds, use `--build-arg`:

```dockerfile
FROM langflowai/lfx:latest AS base
ARG EXTRA_DEPS="openai anthropic"
USER root
RUN --mount=type=cache,target=/root/.cache/uv \
    uv pip install --python /app/.venv/bin/python $EXTRA_DEPS
USER lfx
```

Then build with `docker build --build-arg EXTRA_DEPS="openai chromadb pinecone-client" -t my-lfx ...`.
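After building a derived image, it can be useful to check that the pre-baked packages are actually importable before deploying. The following sketch uses only the standard library; `missing_modules` is a hypothetical helper name. Note that import names can differ from distribution names (for example, `langchain-openai` is imported as `langchain_openai`), so the list passed in must use import names.

```python
import importlib.util


def missing_modules(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Example list only; use the import names of the packages you pre-baked.
    wanted = ["openai", "anthropic", "langchain_openai", "chromadb"]
    print(missing_modules(wanted))
```

Running this with the image's interpreter (`/app/.venv/bin/python` in the recipes above) and getting an empty list indicates that no install should be needed at first execution for those packages.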

### glibc base (manylinux wheels)

The lfx image uses Debian slim (glibc) as its base. This accepts `manylinux` wheels from PyPI, which covers essentially every popular Python package including ML / scientific stacks (`onnxruntime`, `torch`, `numpy`, `pandas`, `pillow`). Most pre-baking recipes install cleanly without a compiler.

If a package only ships wheels that require a newer glibc than the runtime provides (for example, `manylinux_2_28` wheels when the runtime only supports `manylinux_2_17`), either upgrade the base image (`python:3.13-slim-bookworm` ships glibc 2.36, which accepts `manylinux_2_28` wheels) or add a builder stage with `build-essential` and discard it in the final image.
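The compatibility rule behind `manylinux_X_Y` tags (PEP 600) is that a wheel needs glibc X.Y or newer at runtime. The sketch below encodes that comparison and reads the runtime's glibc version with `platform.libc_ver()`. The helper names are hypothetical, and the check ignores CPU architecture and the legacy `manylinux1` / `manylinux2010` / `manylinux2014` aliases.

```python
import platform
import re
from typing import Optional, Tuple


def parse_manylinux(tag: str) -> Tuple[int, int]:
    """Extract (major, minor) from a PEP 600 tag such as 'manylinux_2_28'."""
    match = re.fullmatch(r"manylinux_(\d+)_(\d+)", tag)
    if match is None:
        raise ValueError(f"not a PEP 600 manylinux tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


def glibc_accepts(runtime: Tuple[int, int], wheel_tag: str) -> bool:
    """True if a runtime with this glibc version can load wheels built for wheel_tag."""
    return runtime >= parse_manylinux(wheel_tag)


def runtime_glibc() -> Optional[Tuple[int, int]]:
    """glibc version of the running interpreter, or None on musl, macOS, or Windows."""
    lib, version = platform.libc_ver()
    parts = version.split(".")
    if lib != "glibc" or len(parts) < 2:
        return None
    return int(parts[0]), int(parts[1])


if __name__ == "__main__":
    current = runtime_glibc()
    if current is None:
        print("not a glibc runtime")
    else:
        print(current, glibc_accepts(current, "manylinux_2_28"))
```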

> The lfx image was previously Alpine-based. It was switched to Debian slim because some transitive dependencies (for example `onnxruntime` via `markitdown → magika`) do not publish `musllinux` wheels at all, and `onnxruntime` has no sdist, so the Alpine image could not resolve its own dependencies. If you have a constrained dep tree that fits musl, you can still build a smaller Alpine variant by swapping the `FROM` lines back to `ghcr.io/astral-sh/uv:python3.13-alpine` and `python:3.13-alpine`.

### Cross-platform requirements generation

A `requirements.txt` generated on macOS or Windows can fail to install on Linux: the resolver on those hosts pins platform-specific wheels whose hashes do not match the manylinux wheels available on Linux.

Use `uv pip compile` to generate a Linux-targeted requirements file from any host:

```bash
uv pip compile requirements.in \
    --python-platform linux \
    --python-version 3.13 \
    -o requirements-linux.txt
```

> `uv export` does not support `--python-platform` or `--python-version`; use `uv pip compile` for cross-platform pinning. Verify the current flag names with `uv pip compile --help` on your installed uv version.

Then install from that file in your derived image.

## LANGFLOW_GUNICORN_PRELOAD migration notes

`LANGFLOW_GUNICORN_PRELOAD=true` asks gunicorn to load the Langflow application once in the master process before forking workers, so every worker inherits the application object instead of loading it independently. When safe, this reduces per-worker startup cost.

Starting with this release, `LANGFLOW_GUNICORN_PRELOAD` defaults to `true`. On multi-worker deployments, this reduces per-worker startup cost by avoiding redundant imports and initialization.

### What the audit covered

With `preload_app=true`, anything constructed in the master process before the fork is inherited by every worker. Memory is shared copy-on-write, but sockets and file descriptors are shared outright rather than copied, so an inherited HTTP connection pool, database engine, or background task can break once workers start using it.

The fork-safety audit covered seven classes of pre-fork state:

- **SQLAlchemy engine.** Safe by construction — `DatabaseService.__init__` runs inside `initialize_services()` inside the FastAPI lifespan, which executes per-worker post-fork. The engine is never created in the master.
- **asyncio locks.** Safe. `ComponentCache._lock` uses a lazy `@property` pattern; each worker creates its own `asyncio.Lock()` on first access, bound to its own event loop.
- **Component index cache.** Safe. `ComponentCache.all_types_dict` is populated inside the FastAPI lifespan (post-fork) by the wave-2 `asyncio.gather` block.
- **Redis connection pool.** Safe (conditional). `RedisCache` is only constructed when `LANGFLOW_CACHE_TYPE=redis`, and in that case it is built inside `initialize_services()` (post-fork).
- **asyncio background tasks at import time.** Safe. No `asyncio.create_task` calls at module scope; all tasks are created inside the lifespan body or inside `service.start()` methods.
- **File descriptors at import time.** Safe. No module-level `open()` calls in `langflow/main.py` or adjacent imports; gunicorn's own `Logger.reopen_files` handles its log FDs.
- **Telemetry HTTP client.** Hazard — and fixed. `TelemetryService.__init__` constructs `httpx.AsyncClient` at service construction, which happens inside `get_lifespan()` and therefore runs pre-fork when `preload_app=true`. A gunicorn `post_fork` hook now resets `TelemetryService.client` to `None`; the service's `start()` method reconstructs the client inside the worker's event loop on first use.
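The telemetry fix described in the last item can be sketched as a gunicorn `post_fork` server hook that clears the inherited client, paired with a `start()` method that rebuilds it inside the worker. This is an illustration under stated assumptions, not Langflow's implementation: `TelemetrySketch`, the module-level `telemetry` instance, and `FakeAsyncClient` (a stand-in for `httpx.AsyncClient` so the example runs without httpx) are all hypothetical.

```python
class FakeAsyncClient:
    """Stand-in for httpx.AsyncClient; holds no real sockets."""


class TelemetrySketch:
    def __init__(self) -> None:
        # Built eagerly, so with preload this runs in the master process.
        self.client = FakeAsyncClient()

    def start(self) -> None:
        # Rebuild the client in the worker if the post_fork hook cleared it.
        if self.client is None:
            self.client = FakeAsyncClient()


# Hypothetical module-level service instance created before the fork.
telemetry = TelemetrySketch()


def post_fork(server, worker):
    """Gunicorn server hook: called in each worker just after it is forked."""
    # Drop the client inherited from the master so no connection is shared.
    telemetry.client = None


if __name__ == "__main__":
    inherited = telemetry.client
    post_fork(server=None, worker=None)  # simulate the hook firing in a worker
    telemetry.start()
    print(telemetry.client is not inherited)
```

In a real gunicorn configuration file, a function named `post_fork` with this `(server, worker)` signature is picked up automatically as the hook.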

### Opting out

If your deployment depends on the previous behavior, where each worker loads the application independently, opt out with:

```bash
export LANGFLOW_GUNICORN_PRELOAD=false
```

Any value other than `true` (for example `false`, `0`, or an empty string) disables preload. The environment variable name is unchanged from the previous release, so existing scripts continue to work.
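The opt-out rule can be expressed as a small parser: preload stays on only when the variable is unset or set to `true`. The sketch below is not Langflow's parsing code. The function name `preload_enabled` is hypothetical, and treating the comparison as case-insensitive and whitespace-tolerant is an assumption.

```python
import os
from typing import Mapping, Optional

ENV_NAME = "LANGFLOW_GUNICORN_PRELOAD"


def preload_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when preload should be on for the given environment mapping."""
    env = os.environ if environ is None else environ
    raw = env.get(ENV_NAME)
    if raw is None:
        # Unset: fall back to the documented default of true.
        return True
    # Assumption: comparison ignores case and surrounding whitespace.
    return raw.strip().lower() == "true"


if __name__ == "__main__":
    for value in ("true", "false", "0", ""):
        print(repr(value), preload_enabled({ENV_NAME: value}))
```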

### When to stay opted out

Preload offers the smallest win (and the most risk) in these cases:

- Single-worker deployments (`WORKERS=1`). There is nothing to share; preload adds nothing.
- Deployments using custom services or integrations that construct their own connection pools during module import. Audit those services for fork-safety before flipping preload on.
- Environments where `preload_app=true` is known to conflict with the gunicorn worker class in use.

For multi-worker deployments using the stock services, the default is recommended.

### Measured improvements

The cold-start improvements milestone reduced `lfx run` and `langflow run` cold-start latency on Linux CI runners. Headline numbers from the post-fix authoritative run on Python 3.13:

- `lfx run` bare boot dropped from 17.82 s to 10.55 s (`lfx_bare` scenario).
- `lfx run <flow>` uncompiled dropped from 18.92 s to 16.01 s (`lfx_with_flow` scenario).
- `lfx run <flow>` with pre-baked bytecode dropped from 9.52 s to 8.43 s (`lfx_with_flow_prebaked` scenario).
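The absolute and relative reductions behind these figures follow from simple arithmetic. The sketch below uses the millisecond values reported in the release notes table for the same three scenarios; the function name `cold_start_delta` is hypothetical.

```python
def cold_start_delta(before_ms: int, after_ms: int):
    """Return (change in ms, change in percent rounded to one decimal place)."""
    change = after_ms - before_ms
    return change, round(change / before_ms * 100, 1)


# (before, after) in milliseconds, as listed in the release notes table.
scenarios = {
    "lfx_bare": (17820, 10550),
    "lfx_with_flow": (18920, 16013),
    "lfx_with_flow_prebaked": (9520, 8425),
}

for name, (before, after) in scenarios.items():
    change_ms, change_pct = cold_start_delta(before, after)
    print(f"{name}: {change_ms} ms ({change_pct}%)")
```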

The gains come from in-process changes (deferred imports on the Graph hot path, atomic and version-stamped component index, parallelized lifespan tasks, event-driven MCP startup, persisted component-index cache) combined with the build-time and image-layer tunings documented above (`UV_COMPILE_BYTECODE=1`, multi-stage layer separation).

For the full before / after table, per-scenario deltas, and per-phase checkpoint breakdown, see the cold-start performance improvements entry in the [Langflow release notes](/release-notes).
6 changes: 5 additions & 1 deletion docs/docs/Deployment/deployment-docker.mdx
@@ -275,4 +275,8 @@ This approach keeps the persistent volumes separate from the Langflow container,
If you need to upgrade to a custom image based on a Langflow release, such as to add `uv` in `1.8.0`, first build a derived image from the official image, and then follow the same steps above.
Set the custom image in your compose file or `docker run`, and then pull and restart.

For a minimal Dockerfile that adds `uv` to the 1.8.0 image, see the [release notes](/release-notes) (“Docker image no longer includes uv or uvx”).

## Cold-start optimization

For guidance on reducing cold-start time in containerized deployments (scale-from-zero, serverless, pre-baked images), see [Cold-start optimization](/deployment-cold-start).
3 changes: 2 additions & 1 deletion docs/docs/Deployment/deployment-prod-best-practices.mdx
@@ -124,4 +124,5 @@ Follow industry best practices and use secure Langflow configurations, such as t

* [Deploy the Langflow production environment on Kubernetes](/deployment-kubernetes-prod)
* [Langflow Helm Charts repository](https://github.com/langflow-ai/langflow-helm-charts)
* [Langflow environment variables](/environment-variables)
* [Cold-start optimization](/deployment-cold-start)
15 changes: 15 additions & 0 deletions docs/docs/Support/release-notes.mdx
@@ -130,6 +130,21 @@ For all changes, see the [Changelog](https://github.com/langflow-ai/langflow/rel
The **Policies** component uses [ToolGuard](https://github.com/AgentToolkit/toolguard) to generate guard code from natural-language business policies and apply it to agent tools.
For more information, see [Policies (Beta)](../Components/policies.mdx).

- Cold-start performance improvements for `lfx run` and `langflow run`

Cold-start latency is reduced on the `lfx run` execution path and on `langflow run` restart scenarios. The gains come from deferred heavy imports on the Graph hot path, atomic and version-stamped component index, parallelized lifespan tasks, event-driven MCP startup, and a persisted component-index cache that skips the full package walk when the installed lfx version matches. The reference `langflowai/lfx` image also sets `UV_COMPILE_BYTECODE=1` at build time and uses a multi-stage layer layout so `.pyc` files are baked into the image layer.

Cold-start numbers (Linux CI, Python 3.13, `ubuntu-latest`):

| Scenario | Before (ms) | After (ms) | Delta |
|----------|-------------|------------|-------|
| `lfx run` bare boot (`lfx_bare`) | 17820 | 10550 | -7270ms (-40.8%) |
| `lfx run <flow>` uncompiled (`lfx_with_flow`) | 18920 | 16013 | -2907ms (-15.4%) |
| `lfx run <flow>` prebaked (`lfx_with_flow_prebaked`) | 9520 | 8425 | -1095ms (-11.5%) |
| `langflow run --backend-only` no-change restart | new scenario | 11325 | see [Cold-start optimization](../Deployment/deployment-cold-start.mdx) |

For deployment-level tuning (build-time bytecode compilation, layer ordering, pre-baking flow dependencies, `LANGFLOW_GUNICORN_PRELOAD`), see [Cold-start optimization](../Deployment/deployment-cold-start.mdx).

## 1.8.x

Highlights of this release include the following changes.
5 changes: 5 additions & 0 deletions docs/sidebars.js
@@ -222,6 +222,11 @@ module.exports = {
id: "Deployment/deployment-caddyfile",
label: "Deploy Langflow on a remote server"
},
{
type: "doc",
id: "Deployment/deployment-cold-start",
label: "Cold-start optimization"
},
{
type: "category",
label: "Kubernetes",