feat: cold-start optimization (measurement + lazy imports + fork init) #12783

Open
ogabrielluiz wants to merge 61 commits into release-1.10.0 from cold-start/01-measurement-foundation

Conversation

@ogabrielluiz
Contributor

@ogabrielluiz ogabrielluiz commented Apr 20, 2026

Summary

End-to-end cold-start optimization for Langflow. Consolidates work originally split across seven stacked PRs (#12783, #12784, #12785, #12786, #12788, #12789, #12798) into a single landing surface for QA and review.

Nothing has reached release-1.10.0 yet. The dependent PRs were auto-marked as merged by GitHub when their head branches became reachable from this branch during the consolidation merges; their original review threads are preserved on each closed PR.

What's in this PR

1. Measurement harness (originally #12783)

Reproducible cold-start benchmarking for lfx run and langflow run scenarios. Driver, mock LLM, measurement Dockerfile (python:3.13-slim + uv + hyperfine), threshold-based regression gate in CI. Zero-cost stdlib checkpoint instrumentation in lfx._bench, gated on LFX_BENCHMARK_CHECKPOINTS. Make targets: bench-local, bench-docker, bench-snapshot, bench-verify-synthetic. New benchmarks dependency group.
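For reviewers who have not opened lfx._bench, a minimal sketch of what env-gated, stdlib-only checkpoint instrumentation can look like. This is not the module's actual code: the Checkpoints class, its mark/dump methods, and the JSON output shape are assumptions for illustration; only the LFX_BENCHMARK_CHECKPOINTS variable and the landmark names come from this PR.

```python
import json
import os
import sys
import time


class Checkpoints:
    """Hypothetical recorder: one boolean test per call when disabled."""

    def __init__(self, enabled: bool) -> None:
        self.enabled = enabled
        self._marks = []  # list of (name, perf_counter timestamp)

    def mark(self, name: str) -> None:
        # When the gate is off, this returns immediately and allocates nothing.
        if self.enabled:
            self._marks.append((name, time.perf_counter()))

    def dump(self, stream=sys.stderr) -> None:
        # Emit elapsed milliseconds relative to the first checkpoint as one JSON line.
        if not self._marks:
            return
        start = self._marks[0][1]
        report = {name: round((t - start) * 1000, 3) for name, t in self._marks}
        stream.write(json.dumps(report) + "\n")


# Assumption: any non-empty value of the variable enables recording.
recorder = Checkpoints(enabled=bool(os.environ.get("LFX_BENCHMARK_CHECKPOINTS")))
recorder.mark("after-imports")
recorder.mark("before-run-flow")
recorder.mark("after-run-flow")
recorder.dump()
```

With the variable unset, the three mark calls and the dump are no-ops, which is the property the "zero-cost" wording refers to.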

2. Component index correctness + build caching (originally #12784)

Lazy asyncio.Lock on ComponentCache so the import-time singleton does not crash on Python 3.13+. Bounded concurrency on module scans so asyncio's default thread pool is not exhausted under high component counts. Atomic disk-cache write via tempfile + os.replace, version-stamped with the lfx package version so lfx-only deployments do not invalidate on every restart. Read-time stale-cache peek runs outside the lock to avoid widening the lock-hold window during multi-MB disk reads.
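The atomic, version-stamped write can be sketched roughly as follows. The function names, the JSON payload layout, and the file name are hypothetical; the only parts taken from the description are the tempfile + os.replace pattern and the idea of stamping the cache with a package version.

```python
from __future__ import annotations

import contextlib
import json
import os
import tempfile
from pathlib import Path


def write_index_atomically(path: Path, index: dict, version: str) -> None:
    """Write the cache so readers never observe a half-written file."""
    payload = {"version": version, "index": index}
    # The temp file lives in the target directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic swap of the old file for the new one
    except BaseException:
        # Clean up the partial temp file before re-raising.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


def read_index_if_current(path: Path, version: str) -> dict | None:
    """Return the cached index only when its version stamp matches."""
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if not isinstance(payload, dict) or payload.get("version") != version:
        return None
    return payload.get("index")
```

A reader that finds a missing, corrupt, or differently stamped file gets None and falls back to a full scan in this sketch.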

3. Deferred imports off the graph hot path (originally #12785)

Field-typing layer split into static metadata (lfx/field_typing/names.py) and class-object resolution (constants.py) via PEP 562 __getattr__. Stub fallbacks for langchain transitive failures (Windows c10.dll and similar) with one-time warnings and fork-aware cache cleanup. DEFAULT_IMPORT_STRING preamble removed from validate.create_class; user components must include their own imports, with an actionable NameError hint mapping each previously-auto-injected name to the right from X import Y line. PEP 563 lazy annotations preserved through prepare_global_scope, so TYPE_CHECKING-only langchain symbols resolve correctly at tool-mode introspection time.
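As a rough illustration of the PEP 562 mechanism, the sketch below builds a throwaway module whose attributes are resolved on first access. The module name lazy_constants and the mapping are hypothetical, and stdlib classes stand in for the langchain symbols so the snippet runs anywhere. In a real module file the hook is simply a top-level def __getattr__(name).

```python
import importlib
import types

# Hypothetical static metadata: exported name -> (module to import, attribute).
_LAZY_ATTRS = {
    "Fraction": ("fractions", "Fraction"),
    "OrderedDict": ("collections", "OrderedDict"),
}

lazy_constants = types.ModuleType("lazy_constants")


def _module_getattr(name: str):
    """PEP 562 hook: called only when normal module attribute lookup fails."""
    try:
        module_name, attr = _LAZY_ATTRS[name]
    except KeyError:
        raise AttributeError(f"module 'lazy_constants' has no attribute {name!r}") from None
    value = getattr(importlib.import_module(module_name), attr)
    # Cache on the module so later lookups bypass the hook entirely.
    lazy_constants.__dict__[name] = value
    return value


# Installing the hook in the module namespace is equivalent to defining
# __getattr__ at the top level of a module file.
lazy_constants.__getattr__ = _module_getattr

half = lazy_constants.Fraction(1, 2)  # the class is resolved here, on first use
```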

4. Lazy validate.py exec_globals (originally #12786)

_LazyImportProxy stand-ins for langchain modules in user component code, deferred to first real use. Eager imports preserved for non-langchain modules (Pydantic strict-validators, lfx constants). Star imports always resolve eagerly. Windows _MissingModulePlaceholder for unavailable C-extensions (e.g. jq).
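A simplified sketch of the proxy idea, not the actual _LazyImportProxy: LazyModuleProxy and build_scope are hypothetical names, and stdlib modules stand in for the langchain prefixes. It shows the split between deferred and eager resolution when building globals for user code.

```python
import importlib


class LazyModuleProxy:
    """Stand-in that imports the real module on first attribute access."""

    def __init__(self, module_name: str) -> None:
        self._module_name = module_name
        self._module = None

    @property
    def is_loaded(self) -> bool:
        return self._module is not None

    def _resolve(self):
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return self._module

    def __getattr__(self, name: str):
        # Only reached for names not found on the proxy itself.
        if name.startswith("_"):
            # Guard against recursion if the proxy is copied before __init__ runs.
            raise AttributeError(name)
        return getattr(self._resolve(), name)


def build_scope(module_names, lazy_prefixes):
    """Defer modules matching lazy_prefixes (a tuple); import the rest eagerly."""
    scope = {}
    for name in module_names:
        if name.startswith(lazy_prefixes):
            scope[name] = LazyModuleProxy(name)
        else:
            scope[name] = importlib.import_module(name)
    return scope


scope = build_scope(["json", "math"], lazy_prefixes=("json",))
exec("encoded = json.dumps([1, 2])", scope)  # the proxy resolves json here
```

Code that never touches a deferred name never pays for its import, which is the effect described above for components that do not reference a given langchain symbol.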

5. Service init restructuring + fork-safe container preload (originally #12788)

PreloadStep enum and is_step_complete gates so Gunicorn workers skip work the master already completed (profile pictures, bundles, types cache, starter projects, agentic globals/MCP). Master preload runs in preload._run_master_preload, with two parallel waves via asyncio.gather:

  • Wave 1: copy_profile_pictures || load_bundles_with_error_handling (different filesystem subtrees, no shared state)
  • Wave 2 (agentic_experience only): initialize_agentic_global_variables || auto_configure_agentic_mcp_server (independent per the prerequisite DAG)

Types cache stays sequential after wave 1 because it scans settings.components_path, which the bundle step extends. Master disposes the DB engine and external cache socket before fork to prevent worker connection-pool inheritance. gc.freeze() after preload moves preloaded objects into the permanent generation so the cyclic GC does not unshare COW pages in workers.
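The wave structure can be sketched as below. The step bodies are placeholders that only record their names, the position of wave 2 relative to the types cache is an assumption, and run_master_preload here is an illustrative stand-in for preload._run_master_preload, not its real body.

```python
import asyncio
import gc


async def _step(name: str, log: list) -> None:
    """Placeholder for a real preload step; yields once, then records its name."""
    await asyncio.sleep(0)
    log.append(name)


async def run_master_preload(log: list, *, agentic_experience: bool) -> None:
    # Wave 1: the two steps touch different filesystem subtrees, so they can overlap.
    await asyncio.gather(
        _step("copy_profile_pictures", log),
        _step("load_bundles_with_error_handling", log),
    )
    # Sequential: the types cache scans paths that the bundle step may have extended.
    await _step("types_cache", log)
    if agentic_experience:
        # Wave 2: independent of each other per the prerequisite DAG.
        await asyncio.gather(
            _step("initialize_agentic_global_variables", log),
            _step("auto_configure_agentic_mcp_server", log),
        )
    # Move everything allocated so far into the permanent generation so later
    # collections in forked workers do not write to those shared pages.
    gc.freeze()


order = []
asyncio.run(run_master_preload(order, agentic_experience=True))
```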

6. Release notes (originally #12789, doc page dropped)

The standalone Deployment/deployment-cold-start.mdx page was dropped because the cold-start work is internal-track for now (master/worker COW, gc.freeze, asyncio.gather are architecture, not operator-facing config). Release notes still carry the user-visible bullet and the before/after numbers table.

7. TCP-probe readiness fix (originally #12798)

langflow_run_http_ready benchmark scenario uses a TCP probe instead of an HTTP probe, removing false negatives when the HTTP server is bound but not yet serving.

How to verify

  • make bench-local runs against the dev venv (fast iteration)
  • make bench-docker builds the measurement image and runs all scenarios end-to-end
  • make bench-verify-synthetic injects a synthetic regression and proves the CI gate trips
  • Apply the run-benchmark-snapshot label to capture authoritative numbers against release-1.10.0
  • Apply the run-benchmarks label to verify against thresholds

For prefork validation, run with LANGFLOW_GUNICORN_PRELOAD=true and watch logs for [preload] lines; workers should log Skipping ...: inherited from master for completed preload steps.
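For context on what the "inherited from master" log line implies, here is a minimal sketch of a step gate. The enum members, the module-level set, and run_unless_inherited are hypothetical; only the PreloadStep and is_step_complete names and the log wording come from this PR. The premise is that state recorded in the master before fork is visible in each worker's copy of memory.

```python
from enum import Enum


class PreloadStep(Enum):
    # Hypothetical member names derived from the work listed in the description.
    PROFILE_PICTURES = "profile_pictures"
    BUNDLES = "bundles"
    TYPES_CACHE = "types_cache"
    STARTER_PROJECTS = "starter_projects"


# Filled in by the master; forked workers inherit a copy of this set.
_completed = set()


def mark_step_complete(step: PreloadStep) -> None:
    _completed.add(step)


def is_step_complete(step: PreloadStep) -> bool:
    return step in _completed


def run_unless_inherited(step: PreloadStep, action, log) -> bool:
    """Run action once; later callers (e.g. workers) skip it and log why."""
    if is_step_complete(step):
        log(f"Skipping {step.value}: inherited from master")
        return False
    action()
    mark_step_complete(step)
    return True
```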

Adds a reproducible cold-start benchmarking harness covering lfx run bare boot,
lfx run <flow> first-execution, and langflow run end-to-end restart scenarios.

- src/backend/tests/benchmarks/: driver, scenarios, fixtures, mock LLM,
  measurement Dockerfile, thresholds.json sentinel baseline, conftest.
- src/lfx/src/lfx/_bench.py: stdlib-only checkpoint instrumentation gated on
  LFX_BENCHMARK_CHECKPOINTS.
- src/lfx/src/lfx/cli/run.py: integrates checkpoint hooks at after-imports,
  before-run-flow, after-run-flow landmarks; dump() on completion.
- .github/workflows/cold-start-benchmark.yml: label-gated CI regression gate
  with per-scenario matrix jobs. run-benchmarks (verify) and
  run-benchmark-snapshot (capture) label triggers.
- Makefile: bench-local / bench-docker / bench-snapshot / bench-verify-synthetic.
- pyproject.toml + uv.lock: benchmarks dep group.
@coderabbitai
Contributor

coderabbitai Bot commented Apr 20, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0ad8b08c-b2ae-4c31-a689-83193305a836

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@github-actions github-actions Bot added the enhancement New feature or request label Apr 20, 2026
ogabrielluiz and others added 2 commits April 20, 2026 11:40
Lazy asyncio.Lock on ComponentCache (safe under module-import-time
constraints), bounded concurrency on dynamic component loading,
non-blocking async read of the persisted index, cache-hit short-circuit
that skips the full package walk when the installed lfx version matches,
atomic + version-stamped writes, stale-index warning.

- src/lfx/src/lfx/interface/components.py: all correctness + caching changes
- src/lfx/tests/unit/test_component_index.py: parity + correctness tests

Observable flow behavior is unchanged: every change is accompanied by a
deep parity snapshot test that loads a flow via the component cache and
asserts byte-identical final output + vertex execution order.
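A sketch of the two concurrency changes named in this commit, under stated assumptions: ComponentCacheSketch and load_all are hypothetical names, and the limit of 8 is arbitrary. The lock is created on first use instead of at import time, and a semaphore caps how many blocking loads are handed to the thread pool at once.

```python
import asyncio


class ComponentCacheSketch:
    """Singleton-style cache whose lock is created on first use, not at import."""

    def __init__(self) -> None:
        self._lock = None
        self.index = None

    @property
    def lock(self) -> asyncio.Lock:
        # Deferring construction keeps module import free of asyncio objects.
        if self._lock is None:
            self._lock = asyncio.Lock()
        return self._lock


async def load_all(names, loader, limit: int = 8):
    """Run a blocking loader for each name with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(name):
        async with semaphore:
            # The blocking call runs in the default executor thread pool.
            return await asyncio.to_thread(loader, name)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(bounded(name) for name in names))


cache = ComponentCacheSketch()  # safe at import time: no lock exists yet
loaded = asyncio.run(load_all(["alpha", "beta", "gamma"], str.upper, limit=2))
```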
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Apr 20, 2026
ogabrielluiz and others added 4 commits April 20, 2026 11:43
Renames test classes to descriptive names and strips internal roadmap
IDs (IMP-XX, IDX-XX, D-XX, plan 0N-NN) from comments and docstrings.
No behavioral changes.

Class renames:
- TestIMP02NoPandas    -> TestLfxImportsWithoutPandas
- TestIMP07FieldTyping -> TestFieldTypingDefersLangchainCore
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Apr 20, 2026
Rewrites prepare_global_scope to build a LazyImportProxy-backed globals
mapping instead of eagerly calling importlib.import_module() for every
langchain_* name in the prepended DEFAULT_IMPORT_STRING. Components that
do not reference a given langchain symbol no longer trigger its import,
cutting transformers/torch off the component-instantiation path.

Narrowed scope: only langchain / langchain_core / langchain_classic /
langchain_text_splitters / langchain_community prefixes are deferred.
Everything else (stdlib, pydantic, lfx) resolves eagerly so class-body
validators get concrete values.
ogabrielluiz and others added 3 commits April 20, 2026 11:46
Renames test files / directory / functions to descriptive names and
strips internal roadmap IDs (SVC-XX, CNT-XX, IDX-06, D-XX, phase_04).

- Directory: phase_04_service_init_parity/ -> service_init_parity/
- Files:
  - test_svc01_starter_hash_cache.py -> test_starter_project_hash_gate.py
  - test_svc02_dependency_review.py -> test_lifespan_dependency_review.py
  - test_svc02_gather_structure.py -> test_lifespan_gather_structure.py
  - test_svc03_mcp_event_readiness.py -> test_mcp_event_readiness.py
  - test_svc04_restart_integration.py -> test_langflow_restart_parity.py
- Functions: test_svc04_* -> descriptive names

No behavioral changes.
ogabrielluiz and others added 8 commits April 20, 2026 13:53
Adds docs/docs/Deployment/deployment-cold-start.mdx documenting
UV_COMPILE_BYTECODE, multi-stage layer separation, pre-warmed venv
patterns, pre-bake-deps recipes, and LANGFLOW_GUNICORN_PRELOAD guidance.
Cross-links from the Docker deployment guide and the production best
practices guide. Registers the new page in the sidebar.

Adds a release-notes.mdx bullet summarizing the cold-start performance
improvements across lfx run and langflow run, with headline scenario
numbers.
Removes internal roadmap IDs (IDX-01, MEAS-03, Phase 2, Phase 5) from
the cold-start deployment guide. No behavioral changes.
Swap the `Application startup complete.` stdout marker for a TCP connect
probe against 127.0.0.1:7860 so the scenario no longer races against
langflow's structlog processor pipeline. The same approach is used by
_langflow_no_change_restart_supervisor.py, whose header note already
called out this class of failure.

The scenario's thresholds.json entry is still the `mean_ms: 0` sentinel,
so continue-on-error stays set for this matrix cell until a
run-benchmark-snapshot captures a real baseline. Workflow comments and
the generated thresholds.json `_note` are updated to reflect the fix.
Addresses two low-severity review notes on #12798.

Pre-flight check before launching the child: if 127.0.0.1:7860 already
accepts connections (dev server, leftover benchmark boot), fail fast
with exit code 3 rather than race against the stale listener and emit
a bogus near-zero LANGFLOW_READY_MS.

Post-ready child-liveness check: after a successful TCP connect, verify
the child is still running. If it already exited, the connect landed on
someone else's listener and we refuse to record a measurement.

Also drops the redundant exit_code accumulator — failure paths now
return directly inside the try/finally; the finally block still runs.
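The pre-flight and post-ready checks described above could look roughly like this. The function names and the child_is_alive callable are hypothetical; the exit code 3 and the 127.0.0.1:7860 target are the values stated in this commit message.

```python
import socket
import time


def port_accepts_connections(host: str, port: int, timeout: float = 0.25) -> bool:
    """TCP connect probe: True as soon as something is listening."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight_exit_code(host: str, port: int) -> int:
    """Fail fast (exit code 3) if a stale listener already owns the port."""
    return 3 if port_accepts_connections(host, port) else 0


def wait_until_ready(host, port, child_is_alive, deadline_s=60.0, interval_s=0.05):
    """Return elapsed milliseconds until the port accepts, or None on failure."""
    start = time.perf_counter()
    while time.perf_counter() - start < deadline_s:
        if port_accepts_connections(host, port):
            elapsed_ms = (time.perf_counter() - start) * 1000
            # Post-ready liveness check: if the child already exited, the
            # connect landed on someone else's listener; record nothing.
            return elapsed_ms if child_is_alive() else None
        time.sleep(interval_s)
    return None
```

In a driver, child_is_alive would typically be something like `lambda: proc.poll() is None` for a subprocess.Popen handle, with preflight_exit_code("127.0.0.1", 7860) checked before the child is launched.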
…sions

Captured a real baseline for langflow_run_http_ready via snapshot-mode
run 24784108652 on 12798/merge@93c8aa10: 22183.74ms mean, 246ms stddev
over 5 runs. Updated thresholds.json, including a refreshed snapshot
for the other scenarios on the same run.

With the sentinel gone, dropped langflow_run_http_ready from every
continue-on-error expression and from the regression-comment-skip
condition, so the gate now enforces a real regression ceiling on that
scenario.

langflow_run_no_change_restart is retained at its 2026-04-20 baseline
(11324.7ms) rather than the 0.85ms the current run produced — the low
number is a known self_measuring dispatch bug, not a real measurement.
Workflow comments updated to reflect this.
When a thresholds.json entry has runs=0 AND mean_ms<=0, treat it as a
placeholder for a scenario that has never been snapshotted: record the
current measurement for visibility but do not trip the gate. The next
run-benchmark-snapshot anchors the real baseline.

Previously, any baseline mean_ms<=0 tripped the gate unconditionally.
That meant a new scenario landing as a sentinel (mean_ms=0, runs=0,
the convention for "tracked but not yet anchored") would fail the
workflow on its very first run, forcing the contributor to either:
  - snapshot on the PR branch (discouraged — authoritative baselines
    should come from main), or
  - add continue-on-error: true in the workflow matrix as a hack,
    then remember to remove it in a followup PR after the baseline
    lands.

Distinguishing runs=0 (unanchored) from runs>0 with mean_ms=0
(intentionally-zeroed Path-B sentinel) preserves the existing
sentinel-trip semantic for the latter case.
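The decision table in this commit message can be sketched as a small function. evaluate_gate, its return labels, and the max_regression_ratio tolerance are hypothetical; the runs and mean_ms fields and the unanchored versus sentinel distinction follow the text above.

```python
def evaluate_gate(baseline: dict, current_mean_ms: float, max_regression_ratio: float = 1.25) -> str:
    """Classify one scenario as 'unanchored', 'pass', or 'fail'."""
    runs = baseline.get("runs", 0)
    mean_ms = baseline.get("mean_ms", 0)
    if mean_ms <= 0:
        if runs == 0:
            # Never snapshotted: record the measurement but do not trip the gate.
            return "unanchored"
        # runs > 0 with a zeroed mean is an intentional sentinel and still trips.
        return "fail"
    # Assumed rule: fail when the current mean exceeds the baseline by the tolerance.
    return "fail" if current_mean_ms > mean_ms * max_regression_ratio else "pass"


status = evaluate_gate({"runs": 0, "mean_ms": 0}, current_mean_ms=22183.74)
```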
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 6, 2026
@ogabrielluiz ogabrielluiz marked this pull request as ready for review May 6, 2026 17:33
@ogabrielluiz ogabrielluiz added the run-benchmark-snapshot Triggers the cold-start-benchmark workflow in snapshot mode (captures authoritative baseline) label May 6, 2026
The unreleased deployment-cold-start page only lives in docs/docs/, not in
the versioned snapshots. Slug-based links like /deployment-cold-start resolve
to the latest released version (1.9.0) and break the docusaurus build.
Relative .mdx links resolve within the same docs version.
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 6, 2026
Cold-start work is internal-track for now. The implementation details
(master/worker COW, gc.freeze, gunicorn preload, asyncio.gather waves)
read as architecture rather than user-facing configuration. The
user-facing win (Langflow starts faster) happens automatically; if and
when prefork mode becomes a recommended path, the env-var reference
can carry a one-line note.

- Remove docs/docs/Deployment/deployment-cold-start.mdx
- Remove sidebar entry
- Remove the Cold-start optimization section from deployment-docker.mdx
- Remove the Cold-start optimization bullet from deployment-prod-best-practices.mdx
- Trim the deployment-tuning paragraph and table footnote in release-notes.mdx
  so the cold-start row no longer references the dropped page
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 6, 2026
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 6, 2026
@codecov

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 70.91837% with 171 lines in your changes missing coverage. Please review.
✅ Project coverage is 53.31%. Comparing base (78f82ca) to head (034eccb).
⚠️ Report is 5 commits behind head on release-1.10.0.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ase/langflow/initial_setup/starter_project_hash.py | 0.00% | 60 Missing ⚠️ |
| src/lfx/src/lfx/custom/validate.py | 70.94% | 47 Missing and 5 partials ⚠️ |
| src/lfx/src/lfx/_bench.py | 60.00% | 15 Missing and 1 partial ⚠️ |
| src/lfx/src/lfx/interface/components.py | 85.56% | 10 Missing and 4 partials ⚠️ |
| src/lfx/src/lfx/field_typing/constants.py | 93.10% | 3 Missing and 3 partials ⚠️ |
| ...c/lfx/src/lfx/custom/custom_component/component.py | 60.00% | 4 Missing ⚠️ |
| src/lfx/src/lfx/utils/type_hints.py | 77.77% | 4 Missing ⚠️ |
| src/lfx/src/lfx/base/data/base_file.py | 25.00% | 3 Missing ⚠️ |
| src/backend/base/langflow/server.py | 71.42% | 2 Missing ⚠️ |
| src/lfx/src/lfx/cli/run.py | 77.77% | 1 Missing and 1 partial ⚠️ |

... and 6 more

❌ Your project check has failed because the head coverage (51.03%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@                Coverage Diff                 @@
##           release-1.10.0   #12783      +/-   ##
==================================================
- Coverage           54.36%   53.31%   -1.05%     
==================================================
  Files                2091     2095       +4     
  Lines              191692   192072     +380     
  Branches            27455    27506      +51     
==================================================
- Hits               104204   102406    -1798     
- Misses              86333    88500    +2167     
- Partials             1155     1166      +11     

| Flag | Coverage Δ |
|---|---|
| backend | 50.26% <22.22%> (-7.37%) ⬇️ |
| frontend | 54.54% <ø> (+0.07%) ⬆️ |
| lfx | 51.03% <78.69%> (+0.43%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...rc/backend/base/langflow/field_typing/constants.py | 0.00% <ø> (ø) |
| src/backend/base/langflow/initial_setup/setup.py | 41.67% <ø> (-12.23%) ⬇️ |
| src/backend/base/langflow/preload.py | 88.23% <100.00%> (-10.32%) ⬇️ |
| src/lfx/src/lfx/base/prompts/utils.py | 53.57% <100.00%> (+3.57%) ⬆️ |
| src/lfx/src/lfx/base/tools/component_tool.py | 48.90% <100.00%> (+1.38%) ⬆️ |
| src/lfx/src/lfx/field_typing/names.py | 100.00% <100.00%> (ø) |
| src/lfx/src/lfx/graph/graph/utils.py | 72.74% <100.00%> (ø) |
| src/lfx/src/lfx/graph/state/model.py | 92.30% <100.00%> (+0.12%) ⬆️ |
| src/lfx/src/lfx/graph/vertex/param_handler.py | 83.09% <100.00%> (ø) |
| src/lfx/src/lfx/graph/vertex/vertex_types.py | 43.58% <100.00%> (ø) |

... and 26 more

... and 182 files with indirect coverage changes


@github-actions
Contributor

github-actions Bot commented May 6, 2026

Build successful! ✅
Deploying docs draft.
Deploy successful! View draft

@github-actions
Contributor

github-actions Bot commented May 6, 2026

Frontend Unit Test Coverage Report

Coverage Summary

| Lines | Statements | Branches | Functions |
|---|---|---|---|
| Coverage: 37% | 37.46% (44822/119652) | 67.62% (6197/9164) | 37.13% (1029/2771) |

Unit Test Results

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 4253 | 0 💤 | 0 ❌ | 0 🔥 | 8m 29s ⏱️ |

@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 8, 2026
The test asserted that initialize_auto_login_default_superuser() is called
exactly once in main.py. That call was intentionally removed — superuser
initialization is handled inside initialize_services() via setup_superuser(),
which includes file-lock protection for multi-worker environments. The
standalone call in main.py was always redundant and its removal didn't break
AUTO_LOGIN behavior.
@jordanrfrazier jordanrfrazier force-pushed the cold-start/01-measurement-foundation branch from 742aa3f to 629c8ab Compare May 8, 2026 18:25
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels May 8, 2026

Labels

enhancement New feature or request run-benchmark-snapshot Triggers the cold-start-benchmark workflow in snapshot mode (captures authoritative baseline)


4 participants