fix(infra): create pre commit script and port vertex as lazy import #5453

Merged
edwin-onyx merged 22 commits into main from edwin/dan-2558
Sep 22, 2025

Conversation

edwin-onyx commented Sep 18, 2025

Description

Created a pre-commit script to prevent brittleness of inline lazy imports. It makes sure that packages which are lazy loaded are not imported directly anywhere else.
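As a rough illustration of the kind of check described above, the sketch below walks a file's top-level statements with `ast` and flags direct imports of protected modules. `PROTECTED_MODULES`, `_is_protected`, and `find_direct_imports` are hypothetical names chosen for this example; the actual pre-commit script in this PR may be structured differently.

```python
import ast

# Hypothetical protected set; the real script's configuration may differ.
PROTECTED_MODULES = {"vertexai", "google.auth"}


def _is_protected(module_name: str) -> bool:
    """True if module_name is a protected module or one of its submodules."""
    return any(
        module_name == protected or module_name.startswith(protected + ".")
        for protected in PROTECTED_MODULES
    )


def find_direct_imports(source: str) -> list:
    """Return (line number, module) for top-level imports of protected modules.

    Only statements directly in the module body are inspected, so imports
    inside functions (the lazy pattern) are not flagged. At most one
    violation is reported per import statement.
    """
    violations = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Covers both "from google.auth import x" and "from google import auth".
            candidates = [node.module] + [
                f"{node.module}.{alias.name}" for alias in node.names
            ]
        else:
            continue
        hit = next((name for name in candidates if _is_protected(name)), None)
        if hit is not None:
            violations.append((node.lineno, hit))
    return violations
```

Because the match is made on the full dotted path, `from some_package import vertexai` is not flagged, while `from google import auth` is. This simplified version does not look inside top-level `try` or `if` blocks.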

Couldn't find any solid lazy-loading Python libraries. The top results that come up are something like https://pypi.org/project/lazy-imports/, which has about 10 stars. I think just doing it in native Python and having a script to enforce it is probably best?

ported over the vertex ai import to this structure and tested using script in https://github.com/onyx-dot-app/onyx/pull/5456/files

Per Celery worker:
without Vertex AI lazy loading - Peak RSS Memory: 796.9 MB
with Vertex AI lazy loading - Peak RSS Memory: 653.0 MB (18% decrease)
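For reference, the 18% figure follows from (796.9 - 653.0) / 796.9. The sketch below shows that arithmetic plus one simple, in-process way to read peak RSS on Unix using the standard library. It is only an approximation for illustration; the numbers above came from the Celery worker measurement script linked in this description, and the helper names here are hypothetical.

```python
import importlib
import resource
import sys


def peak_rss_mb() -> float:
    """Peak resident set size of the current process in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def percent_decrease(before: float, after: float) -> float:
    """Relative reduction from before to after, as a percentage."""
    return (before - after) / before * 100


def import_cost_mb(module_name: str) -> float:
    """Growth in peak RSS observed while importing module_name in this process."""
    before = peak_rss_mb()
    importlib.import_module(module_name)
    return peak_rss_mb() - before


print(f"{percent_decrease(796.9, 653.0):.1f}% decrease")
```

Peak RSS never goes down within a process, so `import_cost_mb` only shows growth beyond the previous peak, and a module that was already imported reports roughly zero.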

Moving forward, I will start porting other big packages and connector-specific packages that likely won't be used by the core application flow onto the new lazy loading abstraction to further reduce memory usage.

https://linear.app/danswer/issue/DAN-2558/investigate-lazy-loading

How Has This Been Tested?

created unit tests for both lazy loading abstraction and pre commit script

Backporting (check the box to trigger backport action)

Note: You have to check that the action passes, otherwise resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

Summary by cubic

Introduces lazy imports to reduce baseline memory and only load heavy Google/VertexAI libs when needed, addressing DAN-2558 (Reduce Memory usage in Onyx). Adds tools to measure Celery worker memory and per-package import cost.

  • Refactors

    • Added lazy_imports.py with LazyImport/LazyClass and wrappers for googleapiclient, google.auth, and vertexai.
    • Switched Gmail and Google Drive connectors plus google_utils (auth/resources) to lazy imports; routed HttpError/RefreshError via wrappers; replaced Resource typing with light placeholders; tightened types with Union and TYPE_CHECKING.
    • Vertex AI embeddings now lazily load service account creds and language_models; behavior unchanged.
  • New Features

    • measure_celery_memory.py: launch a Celery worker and sample RSS/USS/PSS with a summary.
    • measure_package_size.py: import packages from requirements and report per-import memory deltas.

vercel bot commented Sep 18, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| internal-search | Ready | Preview | Comment | Sep 20, 2025 4:03am |

Moving this script to a separate feature branch for better organization.
The script will be available in the feature/celery-memory-measurement branch.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@edwin-onyx edwin-onyx changed the title from "Edwin/dan 2558" to "fix(infra): create lazy module helper and pre commit script and port vertex as lazy import" Sep 19, 2025
greptile-apps[bot]

This comment was marked as outdated.

@edwin-onyx edwin-onyx requested a review from Weves September 19, 2025 03:37
@cubic-dev-ai cubic-dev-ai bot left a comment

7 issues found across 12 files

Prompt for AI agents (all 7 issues)

Understand the root cause of the following 7 issues and fix them.


<file name="backend/onyx/lazy_handling/lazy_module.py">

<violation number="1" location="backend/onyx/lazy_handling/lazy_module.py:29">
Possible race: getattr may run after a prior import failure by another thread, leading to AttributeError from None instead of ImportError; re-check _import_failed (or ensure self._module is not None) after the import section before getattr.</violation>
</file>
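
One way to avoid the race described in this violation is to do the failure check, the import, and the cache write under a single lock, so no thread can reach `getattr` with a `None` module. The sketch below is a minimal illustration under that assumption. The class and attribute names are hypothetical and are not taken from `lazy_module.py`.

```python
import importlib
import threading
from types import ModuleType
from typing import Any, Optional


class LazyModule:
    """Defers importing a module until an attribute is first accessed."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[ModuleType] = None
        self._error: Optional[ImportError] = None
        self._lock = threading.Lock()

    def _load(self) -> ModuleType:
        # Check, import, and cache all happen while holding the lock, so a
        # failure recorded by one thread is seen by every later caller.
        with self._lock:
            if self._error is not None:
                raise self._error
            if self._module is None:
                try:
                    self._module = importlib.import_module(self._name)
                except ImportError as exc:
                    self._error = exc
                    raise
            return self._module

    def __getattr__(self, attr: str) -> Any:
        # Only called for names not found on the instance, so the private
        # fields set in __init__ never recurse through this method.
        return getattr(self._load(), attr)
```

With this shape, a failed import surfaces as `ImportError` on every access rather than as an `AttributeError` on `None`.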

<file name="backend/scripts/check_lazy_imports.py">

<violation number="1" location="backend/scripts/check_lazy_imports.py:110">
Dotted submodule imports like `from google import auth` (for protected `google.auth`) are not detected, allowing disallowed direct imports to slip through.</violation>
</file>

<file name="backend/tests/unit/scripts/test_check_lazy_imports.py">

<violation number="1" location="backend/tests/unit/scripts/test_check_lazy_imports.py:134">
Test enforces over-broad violation: flagging 'from some_package import vertexai' will cause false positives; restrict to imports from the protected module/package only.</violation>
</file>

<file name=".pre-commit-config.yaml">

<violation number="1" location=".pre-commit-config.yaml:44">
Use python3 to ensure the hook runs with Python 3, avoiding failures on systems where "python" is Python 2 or not available.</violation>
</file>

<file name="backend/tests/unit/onyx/lazy_handling/test_lazy_module.py">

<violation number="1" location="backend/tests/unit/onyx/lazy_handling/test_lazy_module.py:68">
Ineffective assertion: comparing the same object to itself is always true, so this test does not verify caching behavior.</violation>
</file>
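
A caching test is more meaningful if it counts how many times the underlying import actually runs instead of comparing an object with itself. The sketch below shows one way to do that with `unittest.mock`. The `LazyModule` class here is a minimal stand-in written for this example, not the wrapper from the PR.

```python
import importlib
from unittest import mock


class LazyModule:
    """Minimal stand-in for a lazy wrapper (hypothetical, not thread-safe)."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


def test_module_is_imported_only_once():
    lazy_json = LazyModule("json")
    # Wrap the real import_module so calls are counted but still executed.
    with mock.patch.object(
        importlib, "import_module", wraps=importlib.import_module
    ) as spy:
        first = lazy_json.dumps
        second = lazy_json.loads
    # Two attribute accesses, one underlying import: fails if caching breaks.
    assert spy.call_count == 1
    real_json = importlib.import_module("json")
    assert first is real_json.dumps
    assert second is real_json.loads
```

If the cache were removed from `__getattr__`, `spy.call_count` would be 2 and the test would fail, which is the property the original assertion could not check.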

<file name="backend/onyx/natural_language_processing/search_nlp_models.py">

<violation number="1" location="backend/onyx/natural_language_processing/search_nlp_models.py:276">
Accessing TextEmbeddingModel from top-level lazy_vertexai likely breaks: class lives in vertexai.language_models, but the lazy wrapper only imports 'vertexai' top-level, causing AttributeError at runtime.</violation>

<violation number="2" location="backend/onyx/natural_language_processing/search_nlp_models.py:279">
Accessing TextEmbeddingInput from top-level lazy_vertexai likely fails; it lives in vertexai.language_models, but only the top-level package is lazily imported.</violation>
</file>
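
If a wrapper only imports the top-level package, one way to address the two violations above is to resolve each class from the submodule that defines it, keyed by the full dotted path. The sketch below illustrates the idea with a standard library submodule as a stand-in, since `vertexai` may not be installed. `lazy_submodule` and `decode` are hypothetical names for this example.

```python
import importlib
from functools import lru_cache
from types import ModuleType


@lru_cache(maxsize=None)
def lazy_submodule(dotted_name: str) -> ModuleType:
    """Import a module by its full dotted path on first use, then reuse it."""
    return importlib.import_module(dotted_name)


def decode(text: str):
    # Stand-in for resolving TextEmbeddingModel from vertexai.language_models:
    # the class is looked up on the submodule, not on the top-level package.
    decoder_module = lazy_submodule("json.decoder")
    return decoder_module.JSONDecoder().decode(text)
```

Under this approach the embedding code would request `"vertexai.language_models"` explicitly, so the lookup does not depend on whether the top-level package happens to expose the submodule as an attribute.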


@edwin-onyx edwin-onyx changed the title from "fix(infra): create lazy module helper and pre commit script and port vertex as lazy import" to "fix(infra): create pre commit script and port vertex as lazy import" Sep 20, 2025
@edwin-onyx edwin-onyx disabled auto-merge September 20, 2025 04:35
@edwin-onyx edwin-onyx disabled auto-merge September 20, 2025 21:15
@edwin-onyx edwin-onyx merged commit 3cde4ef into main Sep 22, 2025
100 of 109 checks passed
@edwin-onyx edwin-onyx deleted the edwin/dan-2558 branch September 22, 2025 03:43
razvanMiu pushed a commit to eea/danswer that referenced this pull request Oct 16, 2025
2 participants