feat: cold-start optimization (measurement + lazy imports + fork init)#12783
ogabrielluiz wants to merge 61 commits into release-1.10.0
Adds a reproducible cold-start benchmarking harness covering lfx run bare boot, lfx run <flow> first-execution, and langflow run end-to-end restart scenarios.
- src/backend/tests/benchmarks/: driver, scenarios, fixtures, mock LLM, measurement Dockerfile, thresholds.json sentinel baseline, conftest.
- src/lfx/src/lfx/_bench.py: stdlib-only checkpoint instrumentation gated on LFX_BENCHMARK_CHECKPOINTS.
- src/lfx/src/lfx/cli/run.py: integrates checkpoint hooks at after-imports, before-run-flow, after-run-flow landmarks; dump() on completion.
- .github/workflows/cold-start-benchmark.yml: label-gated CI regression gate with per-scenario matrix jobs, triggered by the run-benchmarks (verify) and run-benchmark-snapshot (capture) labels.
- Makefile: bench-local / bench-docker / bench-snapshot / bench-verify-synthetic.
- pyproject.toml + uv.lock: benchmarks dep group.
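A minimal sketch of what env-gated, stdlib-only checkpoint instrumentation could look like. Only the `LFX_BENCHMARK_CHECKPOINTS` variable and the landmark names come from the commit message; the `CheckpointRecorder` class, its methods, and the JSON shape are hypothetical and are not taken from `lfx._bench`.

```python
import json
import os
import sys
import time


class CheckpointRecorder:
    """Hypothetical recorder: every call is a cheap no-op unless enabled."""

    def __init__(self, enabled: bool) -> None:
        self.enabled = enabled
        self._start = time.perf_counter()
        self._marks = {}  # landmark name -> elapsed milliseconds

    def checkpoint(self, name: str) -> None:
        """Record milliseconds elapsed since the recorder was created."""
        if self.enabled:
            self._marks[name] = (time.perf_counter() - self._start) * 1000.0

    def dump(self, stream=None) -> None:
        """Write all recorded checkpoints as a single JSON line."""
        if not self.enabled:
            return
        out = stream if stream is not None else sys.stderr
        out.write(json.dumps({"checkpoints_ms": self._marks}) + "\n")


# Gate on the environment variable named in the commit message.
recorder = CheckpointRecorder(bool(os.environ.get("LFX_BENCHMARK_CHECKPOINTS")))

recorder.checkpoint("after-imports")
recorder.checkpoint("before-run-flow")
recorder.checkpoint("after-run-flow")
recorder.dump()
```

When the variable is unset, each call costs one attribute check, which is why this kind of hook can stay in the production CLI path.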
Lazy asyncio.Lock on ComponentCache (safe under module-import-time constraints), bounded concurrency on dynamic component loading, non-blocking async read of the persisted index, cache-hit short-circuit that skips the full package walk when the installed lfx version matches, atomic + version-stamped writes, stale-index warning.
- src/lfx/src/lfx/interface/components.py: all correctness + caching changes
- src/lfx/tests/unit/test_component_index.py: parity + correctness tests
Observable flow behavior is unchanged: every change is accompanied by a deep parity snapshot test that loads a flow via the component cache and asserts byte-identical final output + vertex execution order.
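The atomic, version-stamped write and the version-matched cache hit can be sketched as follows. This is an illustration under assumptions, not the code in `components.py`: the function names, the JSON layout, and the `version` key are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def write_index_atomic(path: Path, components: dict, version: str) -> None:
    """Write to a temp file in the target directory, then rename into place."""
    payload = {"version": version, "components": components}
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace is atomic when source and target share a filesystem,
        # so readers see either the old index or the new one, never a mix.
        os.replace(tmp_name, path)
    except BaseException:
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def read_index_if_current(path: Path, version: str) -> Optional[dict]:
    """Return cached components only if the stamp matches; otherwise None."""
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None  # missing or corrupt index: caller falls back to a full scan
    if not isinstance(payload, dict) or payload.get("version") != version:
        return None  # stale index written by a different package version
    return payload.get("components")
```

A caller would try `read_index_if_current` first and only walk the package and call `write_index_atomic` when it returns `None`.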
Renames test classes to descriptive names and strips internal roadmap IDs (IMP-XX, IDX-XX, D-XX, plan 0N-NN) from comments and docstrings. No behavioral changes.
Class renames:
- TestIMP02NoPandas -> TestLfxImportsWithoutPandas
- TestIMP07FieldTyping -> TestFieldTypingDefersLangchainCore
Rewrites prepare_global_scope to build a LazyImportProxy-backed globals mapping instead of eagerly calling importlib.import_module() for every langchain_* name in the prepended DEFAULT_IMPORT_STRING. Components that do not reference a given langchain symbol no longer trigger its import, cutting transformers/torch off the component-instantiation path. Narrowed scope: only langchain / langchain_core / langchain_classic / langchain_text_splitters / langchain_community prefixes are deferred. Everything else (stdlib, pydantic, lfx) resolves eagerly so class-body validators get concrete values.
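A rough sketch of the deferral idea: names under a deferred prefix get a proxy that imports on first attribute access, everything else is imported eagerly. `LazyModuleProxy` and `build_globals` are hypothetical names for illustration and do not reproduce the real `LazyImportProxy` or `prepare_global_scope`.

```python
import importlib
from types import ModuleType
from typing import Any, Optional

# Prefixes the commit message lists as deferred.
DEFERRED_PREFIXES = (
    "langchain",
    "langchain_core",
    "langchain_classic",
    "langchain_text_splitters",
    "langchain_community",
)


class LazyModuleProxy:
    """Stand-in that imports the real module on first attribute access."""

    def __init__(self, module_name: str) -> None:
        self._module_name = module_name
        self._module: Optional[ModuleType] = None

    def _resolve(self) -> ModuleType:
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return self._module

    def __getattr__(self, attr: str) -> Any:
        # __getattr__ only runs for names not found normally; guard the
        # proxy's own fields so a half-built proxy cannot recurse.
        if attr in ("_module_name", "_module"):
            raise AttributeError(attr)
        return getattr(self._resolve(), attr)


def build_globals(module_names, deferred_prefixes=DEFERRED_PREFIXES) -> dict:
    """Map each name to a lazy proxy (deferred) or the real module (eager)."""
    scope = {}
    for name in module_names:
        if name.startswith(tuple(deferred_prefixes)):
            scope[name] = LazyModuleProxy(name)
        else:
            scope[name] = importlib.import_module(name)
    return scope
```

A component that never touches a deferred name never pays for its import, which is how heavy transitive dependencies stay off the instantiation path.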
Renames test files / directory / functions to descriptive names and strips internal roadmap IDs (SVC-XX, CNT-XX, IDX-06, D-XX, phase_04).
- Directory: phase_04_service_init_parity/ -> service_init_parity/
- Files:
  - test_svc01_starter_hash_cache.py -> test_starter_project_hash_gate.py
  - test_svc02_dependency_review.py -> test_lifespan_dependency_review.py
  - test_svc02_gather_structure.py -> test_lifespan_gather_structure.py
  - test_svc03_mcp_event_readiness.py -> test_mcp_event_readiness.py
  - test_svc04_restart_integration.py -> test_langflow_restart_parity.py
- Functions: test_svc04_* -> descriptive names
No behavioral changes.
Adds docs/docs/Deployment/deployment-cold-start.mdx documenting UV_COMPILE_BYTECODE, multi-stage layer separation, pre-warmed venv patterns, pre-bake-deps recipes, and LANGFLOW_GUNICORN_PRELOAD guidance. Cross-links from the Docker deployment guide and the production best practices guide. Registers the new page in the sidebar. Adds a release-notes.mdx bullet summarizing the cold-start performance improvements across lfx run and langflow run, with headline scenario numbers.
Removes internal roadmap IDs (IDX-01, MEAS-03, Phase 2, Phase 5) from the cold-start deployment guide. No behavioral changes.
Swap the `Application startup complete.` stdout marker for a TCP connect probe against 127.0.0.1:7860 so the scenario no longer races against langflow's structlog processor pipeline. The same approach is used by _langflow_no_change_restart_supervisor.py, whose header note already called out this class of failure. The scenario's thresholds.json entry is still the `mean_ms: 0` sentinel, so continue-on-error stays set for this matrix cell until a run-benchmark-snapshot captures a real baseline. Workflow comments and the generated thresholds.json `_note` are updated to reflect the fix.
Addresses two low-severity review notes on #12798. Pre-flight check before launching the child: if 127.0.0.1:7860 already accepts connections (dev server, leftover benchmark boot), fail fast with exit code 3 rather than race against the stale listener and emit a bogus near-zero LANGFLOW_READY_MS. Post-ready child-liveness check: after a successful TCP connect, verify the child is still running. If it already exited, the connect landed on someone else's listener and we refuse to record a measurement. Also drops the redundant exit_code accumulator — failure paths now return directly inside the try/finally; the finally block still runs.
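The probe flow described in the last two commits (pre-flight check, TCP connect loop, post-ready liveness check) might look roughly like this. Exit code 3 for a stale listener comes from the commit message; the function names, the other exit codes, and the polling interval are assumptions made for the sketch.

```python
import socket
import subprocess
import time


def port_accepts(host: str, port: int, timeout: float = 0.25) -> bool:
    """Return True if a TCP connect to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def measure_ready_ms(cmd, host: str, port: int, deadline_s: float = 120.0):
    """Return (exit_code, ready_ms); ready_ms is None on every failure path."""
    if port_accepts(host, port):
        return 3, None  # stale listener already bound: fail fast
    start = time.perf_counter()
    child = subprocess.Popen(cmd)
    try:
        while time.perf_counter() - start < deadline_s:
            if child.poll() is not None:
                return 1, None  # child exited before becoming ready (assumed code)
            if port_accepts(host, port):
                ready_ms = (time.perf_counter() - start) * 1000.0
                if child.poll() is not None:
                    # The connect landed on someone else's listener.
                    return 1, None
                return 0, ready_ms
            time.sleep(0.05)
        return 2, None  # deadline reached (assumed code)
    finally:
        # Runs on every return path above, mirroring the try/finally cleanup.
        if child.poll() is None:
            child.terminate()
            try:
                child.wait(timeout=10)
            except subprocess.TimeoutExpired:
                child.kill()
                child.wait()
```

Returning directly from inside the `try` keeps each failure path explicit while the `finally` block still guarantees the child is reaped.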
…sions Captured a real baseline for langflow_run_http_ready via snapshot-mode run 24784108652 on 12798/merge@93c8aa10: 22183.74ms mean, 246ms stddev over 5 runs. Updated thresholds.json, including a refreshed snapshot for the other scenarios on the same run. With the sentinel gone, dropped langflow_run_http_ready from every continue-on-error expression and from the regression-comment-skip condition, so the gate now enforces a real regression ceiling on that scenario. langflow_run_no_change_restart is retained at its 2026-04-20 baseline (11324.7ms) rather than the 0.85ms the current run produced — the low number is a known self_measuring dispatch bug, not a real measurement. Workflow comments updated to reflect this.
When a thresholds.json entry has runs=0 AND mean_ms<=0, treat it as a
placeholder for a scenario that has never been snapshotted: record the
current measurement for visibility but do not trip the gate. The next
run-benchmark-snapshot anchors the real baseline.
Previously, any baseline mean_ms<=0 tripped the gate unconditionally.
That meant a new scenario landing as a sentinel (mean_ms=0, runs=0,
the convention for "tracked but not yet anchored") would fail the
workflow on its very first run, forcing the contributor to either:
- snapshot on the PR branch (discouraged — authoritative baselines
should come from main), or
- add continue-on-error: true in the workflow matrix as a hack,
then remember to remove it in a followup PR after the baseline
lands.
Distinguishing runs=0 (unanchored) from runs>0 with mean_ms=0
(intentionally-zeroed Path-B sentinel) preserves the existing
sentinel-trip semantic for the latter case.
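The three-way distinction above can be expressed as a small decision function. This is a sketch only: the `Baseline` type, the `evaluate` name, the returned labels, and the 10% tolerance are assumptions, since the commit message does not say how the regression ceiling is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    mean_ms: float
    runs: int


def evaluate(baseline: Baseline, current_ms: float, tolerance: float = 0.10) -> str:
    """Classify a measurement against its thresholds.json entry."""
    if baseline.mean_ms <= 0:
        if baseline.runs == 0:
            # Never snapshotted: record for visibility, do not trip the gate.
            return "unanchored"
        # runs > 0 with a zeroed mean is an intentional sentinel: still trips.
        return "fail"
    ceiling = baseline.mean_ms * (1.0 + tolerance)  # assumed ceiling formula
    return "pass" if current_ms <= ceiling else "fail"
```

For example, `evaluate(Baseline(mean_ms=0, runs=0), 22183.74)` yields `"unanchored"`, while the same measurement against `Baseline(mean_ms=0, runs=5)` yields `"fail"`.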
The unreleased deployment-cold-start page only lives in docs/docs/, not in the versioned snapshots. Slug-based links like /deployment-cold-start resolve to the latest released version (1.9.0) and break the docusaurus build. Relative .mdx links resolve within the same docs version.
Cold-start work is internal-track for now. The implementation details (master/worker COW, gc.freeze, gunicorn preload, asyncio.gather waves) read as architecture rather than user-facing configuration. The user-facing win (Langflow starts faster) happens automatically; if and when prefork mode becomes a recommended path, the env-var reference can carry a one-line note.
- Remove docs/docs/Deployment/deployment-cold-start.mdx
- Remove sidebar entry
- Remove the Cold-start optimization section from deployment-docker.mdx
- Remove the Cold-start optimization bullet from deployment-prod-best-practices.mdx
- Trim the deployment-tuning paragraph and table footnote in release-notes.mdx so the cold-start row no longer references the dropped page
Codecov Report
❌ Your project check has failed because the head coverage (51.03%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## release-1.10.0 #12783 +/- ##
==================================================
- Coverage 54.36% 53.31% -1.05%
==================================================
Files 2091 2095 +4
Lines 191692 192072 +380
Branches 27455 27506 +51
==================================================
- Hits 104204 102406 -1798
- Misses 86333 88500 +2167
- Partials 1155 1166 +11
Build successful! ✅
The test asserted that initialize_auto_login_default_superuser() is called exactly once in main.py. That call was intentionally removed — superuser initialization is handled inside initialize_services() via setup_superuser(), which includes file-lock protection for multi-worker environments. The standalone call in main.py was always redundant and its removal didn't break AUTO_LOGIN behavior.
742aa3f to 629c8ab
This reverts commit 629c8ab.
This reverts commit c6adefd.
Summary
End-to-end cold-start optimization for Langflow. Consolidates work originally split across seven stacked PRs (#12783, #12784, #12785, #12786, #12788, #12789, #12798) into a single landing surface for QA and review.
Nothing has reached `release-1.10.0` yet. The dependent PRs were auto-marked as merged by GitHub when their head branches became reachable from this branch during the consolidation merges; their original review threads are preserved on each closed PR.
What's in this PR
1. Measurement harness (originally #12783)
Reproducible cold-start benchmarking for `lfx run` and `langflow run` scenarios. Driver, mock LLM, measurement Dockerfile (python:3.13-slim + uv + hyperfine), threshold-based regression gate in CI. Zero-cost stdlib checkpoint instrumentation in `lfx._bench`, gated on `LFX_BENCHMARK_CHECKPOINTS`. Make targets: `bench-local`, `bench-docker`, `bench-snapshot`, `bench-verify-synthetic`. New `benchmarks` dependency group.
2. Component index correctness + build caching (originally #12784)
Lazy `asyncio.Lock` on `ComponentCache` so the import-time singleton does not crash on Python 3.13+. Bounded concurrency on module scans so asyncio's default thread pool is not exhausted under high component counts. Atomic disk-cache write via tempfile + `os.replace`, version-stamped with the lfx package version so lfx-only deployments do not invalidate on every restart. Read-time stale-cache peek runs outside the lock to avoid widening the lock-hold window during multi-MB disk reads.
3. Deferred imports off the graph hot path (originally #12785)
Field-typing layer split into static metadata (`lfx/field_typing/names.py`) and class-object resolution (`constants.py`) via PEP 562 `__getattr__`. Stub fallbacks for langchain transitive failures (Windows `c10.dll` and similar) with one-time warnings and fork-aware cache cleanup. `DEFAULT_IMPORT_STRING` preamble removed from `validate.create_class`; user components must include their own imports, with an actionable `NameError` hint mapping each previously-auto-injected name to the right `from X import Y` line. PEP 563 lazy annotations preserved through `prepare_global_scope`, so `TYPE_CHECKING`-only langchain symbols resolve correctly at tool-mode introspection time.
4. Lazy `validate.py` exec_globals (originally #12786)
`_LazyImportProxy` stand-ins for langchain modules in user component code, deferred to first real use. Eager imports preserved for non-langchain modules (Pydantic strict-validators, lfx constants). Star imports always resolve eagerly. Windows `_MissingModulePlaceholder` for unavailable C-extensions (e.g. `jq`).
5. Service init restructuring + fork-safe container preload (originally #12788)
`PreloadStep` enum and `is_step_complete` gates so Gunicorn workers skip work the master already completed (profile pictures, bundles, types cache, starter projects, agentic globals/MCP). Master preload runs in `preload._run_master_preload`, with two parallel waves via `asyncio.gather`:
- `copy_profile_pictures` || `load_bundles_with_error_handling` (different filesystem subtrees, no shared state)
- `initialize_agentic_global_variables` || `auto_configure_agentic_mcp_server` (independent per the prerequisite DAG)
Types cache stays sequential after wave 1 because it scans `settings.components_path`, which the bundle step extends. Master disposes the DB engine and external cache socket before fork to prevent worker connection-pool inheritance. `gc.freeze()` after preload moves preloaded objects into the permanent generation so the cyclic GC does not unshare COW pages in workers.
6. Release notes (originally #12789, doc page dropped)
The standalone `Deployment/deployment-cold-start.mdx` page was dropped because the cold-start work is internal-track for now (master/worker COW, `gc.freeze`, `asyncio.gather` are architecture, not operator-facing config). Release notes still carry the user-visible bullet and the before/after numbers table.
7. TCP-probe readiness fix (originally #12798)
`langflow_run_http_ready` benchmark scenario uses a TCP probe instead of an HTTP probe, removing false negatives when the HTTP server is bound but not yet serving.
How to verify
- `make bench-local` runs against the dev venv (fast iteration)
- `make bench-docker` builds the measurement image and runs all scenarios end-to-end
- `make bench-verify-synthetic` injects a synthetic regression and proves the CI gate trips
- `run-benchmark-snapshot` label to capture authoritative numbers against `release-1.10.0`
- `run-benchmarks` label to verify against thresholds
For prefork validation, run with `LANGFLOW_GUNICORN_PRELOAD=true` and watch logs for `[preload]` lines; workers should log `Skipping ...: inherited from master` for completed preload steps.
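To make the expected log pattern concrete, here is a simplified sketch of the preload gating from section 5: one parallel wave, one sequential step, `gc.freeze()`, then a worker that skips completed steps. The enum members, function bodies, and `[preload] ... done` message are hypothetical; only `PreloadStep`, `is_step_complete`, `asyncio.gather`, `gc.freeze()`, and the `Skipping ...: inherited from master` wording come from the PR description. A real deployment forks workers after preload; here the worker runs in the same process for illustration.

```python
import asyncio
import gc
from enum import Enum


class PreloadStep(Enum):  # hypothetical subset of the real steps
    PROFILE_PICTURES = "profile pictures"
    BUNDLES = "bundles"
    TYPES_CACHE = "types cache"


_completed = set()  # inherited by forked workers as ordinary module state


def is_step_complete(step: PreloadStep) -> bool:
    return step in _completed


async def run_step(step: PreloadStep, log: list) -> None:
    """Run one step unless the master already completed it."""
    if is_step_complete(step):
        log.append(f"Skipping {step.value}: inherited from master")
        return
    await asyncio.sleep(0)  # placeholder for the real work
    _completed.add(step)
    log.append(f"[preload] {step.value} done")


async def run_master_preload(log: list) -> None:
    # Wave 1: independent steps run concurrently.
    await asyncio.gather(
        run_step(PreloadStep.PROFILE_PICTURES, log),
        run_step(PreloadStep.BUNDLES, log),
    )
    # The types cache depends on the bundle step, so it runs afterwards.
    await run_step(PreloadStep.TYPES_CACHE, log)
    # Move surviving objects to the permanent generation before forking so
    # later collections in workers do not touch (and unshare) those pages.
    gc.collect()
    gc.freeze()


async def run_worker_init(log: list) -> None:
    for step in PreloadStep:
        await run_step(step, log)
```

Running `run_master_preload` and then `run_worker_init` produces three `[preload]` lines followed by three `Skipping ...` lines, which is the shape to look for in the real logs.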