@tomvothecoder
Collaborator

@tomvothecoder tomvothecoder commented Oct 29, 2025

Description

This PR significantly improves the import time of xcdat by deferring the import of the xgcm package until it is actually needed within the .vertical() accessor method.

Problem

The xgcm.transform module uses Numba's @guvectorize, which eagerly compiles functions at import time. Since xcdat/__init__.py indirectly imports xgcm, this causes an additional 3–4 seconds of import latency, even if vertical regridding is never used.

More context here: #805 (comment)

Solution

This PR resolves the issue by deferring the xgcm import:

  • The from xgcm import Grid statement now runs when .vertical() is called instead of at module import time.
  • xgcm and its Numba-based modules are therefore loaded and compiled only if vertical regridding is actually used.
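The deferral pattern can be sketched as follows. This is a minimal illustration, not the actual xcdat code: the `Regridder` class is hypothetical, and the standard-library module `colorsys` stands in for xgcm so the sketch runs without extra dependencies.

```python
import sys


class Regridder:
    """Hypothetical accessor; `colorsys` stands in for the heavy xgcm package."""

    def horizontal(self) -> str:
        # Code paths that do not need the heavy dependency never import it.
        return "horizontal regridding (no heavy import)"

    def vertical(self) -> tuple[float, float, float]:
        # Deferred import: executed on the first call, not when this module
        # is imported. Later calls reuse the cached entry in sys.modules.
        from colorsys import rgb_to_hsv  # stand-in for `from xgcm import Grid`

        return rgb_to_hsv(1.0, 0.0, 0.0)


if __name__ == "__main__":
    regridder = Regridder()
    print("stand-in loaded before call:", "colorsys" in sys.modules)
    regridder.vertical()
    print("stand-in loaded after call:", "colorsys" in sys.modules)
```

The cost of the import is paid once, on the first `.vertical()` call; users who never call it never pay it.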

Results

Import-time benchmarking shows a ~3 second improvement in xcdat import latency:

| Branch | Import Behavior | Avg. Import Time |
| --- | --- | --- |
| main | Eagerly imports xgcm | 4.264 sec |
| feature/805-speed-up-import | Defers xgcm import | 1.193 sec |
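As a quick check of the "~3 second" figure, the two averages reported above can be compared directly. The variable names are only illustrative.

```python
# Average import times reported for each branch (seconds).
main_avg = 4.264
feature_avg = 1.193

saved = main_avg - feature_avg  # absolute saving in seconds
reduction = saved / main_avg    # fraction of the original import time removed

print(f"saved: {saved:.3f} s")        # saved: 3.071 s
print(f"reduction: {reduction:.1%}")  # reduction: 72.0%
```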

Reproducible Benchmark Script

The following script measures import-only time on both branches so the improvement can be reproduced.

#!/usr/bin/env python3
"""
Measure *true import-only* times for xcdat and its dependencies.

It compares the main branch against the feature/805-speed-up-import branch.
"""

from __future__ import annotations

import subprocess
import sys
import time
from pathlib import Path
from statistics import mean
from typing import Literal


N_RUNS = 3  # 3 runs is usually enough for this comparison


def time_subprocess(cmd: list[str]) -> float:
    """Return wall-clock runtime (seconds) for a subprocess call."""
    start = time.perf_counter()
    # check=True: a failed command (e.g. an ImportError) should raise, not be timed.
    subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True
    )
    return time.perf_counter() - start


def measure_baseline(runs: int = N_RUNS) -> float:
    """Measure average Python interpreter startup time."""
    times = [time_subprocess([sys.executable, "-c", "pass"]) for _ in range(runs)]
    return mean(times)


def measure_import_only(
    module: str,
    baseline: float,
    branch: Literal["main", "feature/805-speed-up-import"],
    runs: int = N_RUNS,
) -> tuple[str, float]:
    """Measure average import time for one module, subtracting interpreter startup."""
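    # Switch to the requested branch once, before timing. This only changes the
    # measurement if xcdat is installed in editable mode from this repository.
    subprocess.run(
        ["git", "checkout", branch],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=True,  # fail loudly instead of silently timing the wrong branch
    )

    times = []
    for _ in range(runs):
        t = time_subprocess([sys.executable, "-c", f"import {module}"])
        # Clamp at zero in case timing noise makes a run faster than the baseline
        times.append(max(0.0, t - baseline))

    return module, mean(times)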


def main() -> int:
    print("\n📦 Measuring *true import-only* times for xcdat dependencies...\n")
    print(f"  - Each import is measured over {N_RUNS} runs (baseline-subtracted).\n")

    # Measure baseline
    baseline = measure_baseline()
    print(f"  - Baseline Python startup time: {baseline:.3f} seconds\n")

    # Measure xcdat on the main branch
    print("\n1. Measuring xcdat (main branch - imports all dependencies)...")
    mod, avg = measure_import_only("xcdat", baseline, "main")
    print(f"  Module: {mod:<20} | Average Import Time: {avg:>10.3f} seconds")

    # Measure xcdat on the feature/805-speed-up-import branch
    print(
        "\n2. Measuring xcdat (feature/805-speed-up-import branch - defers xgcm import)..."
    )
    mod, avg = measure_import_only("xcdat", baseline, "feature/805-speed-up-import")
    print(f"  Module: {mod:<20} | Average Import Time: {avg:>10.3f} seconds")

    # Optional detailed log
    log_file = Path("xcdat_importtime.log")
    print(f"\n📝 Generating detailed import-time log: {log_file}")
    with open(log_file, "w") as f:
        subprocess.run(
            [sys.executable, "-X", "importtime", "-c", "import xcdat"],
            stderr=f,
            stdout=subprocess.DEVNULL,
        )
    print(f"  ✅ Log successfully written to: {log_file.resolve()}\n")
    return 0


if __name__ == "__main__":
    sys.exit(main())
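The detailed log written by the script can be summarized with a small helper. The sketch below assumes the standard `-X importtime` line layout (`import time: <self us> | <cumulative us> | <module>`); the function name `slowest_imports` is hypothetical and is not part of the PR.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches data lines such as "import time:       110 |        110 |   zipimport".
# The header line ("self [us] | cumulative | imported package") does not match.
_LINE = re.compile(r"^import time:\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\S+)")


def slowest_imports(log_text: str, top: int = 5) -> list[tuple[str, float]]:
    """Return the `top` modules by cumulative import time, in milliseconds."""
    rows = []
    for line in log_text.splitlines():
        match = _LINE.match(line)
        if match:
            cumulative_us = int(match.group(2))
            rows.append((match.group(3), cumulative_us / 1000))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


if __name__ == "__main__":
    log = Path("xcdat_importtime.log")  # file name used by the benchmark script
    if log.exists():
        for name, ms in slowest_imports(log.read_text()):
            print(f"{name:<40} {ms:>10.1f} ms")
```

Sorting by the cumulative column surfaces the packages whose whole import subtree is expensive, which is how an indirect import such as xgcm would show up.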

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@github-actions github-actions bot added the type: enhancement New enhancement request label Oct 29, 2025
@tomvothecoder tomvothecoder changed the title from "Defer import of xgcm to .vertical() method" to "Improve import speed by importing xgcm in .vertical()" Oct 29, 2025
@codecov
codecov bot commented Oct 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (7ae6b72) to head (b959ceb).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #810   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           16        16           
  Lines         1784      1784           
=========================================
  Hits          1784      1784           

@tomvothecoder tomvothecoder marked this pull request as ready for review October 29, 2025 20:17
@tomvothecoder tomvothecoder requested a review from Copilot October 29, 2025 20:27
Copilot AI left a comment

Pull Request Overview

This PR implements lazy importing of xgcm.Grid to improve module import performance by avoiding unnecessary JIT compilation overhead during initial imports.

Key changes:

  • Moved xgcm.Grid import from module-level to inside the vertical method (lazy import)
  • Updated all test mocks to patch xgcm.Grid instead of xcdat.regridder.xgcm.Grid
  • Added documentation explaining the rationale for lazy importing

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| xcdat/regridder/xgcm.py | Removed module-level Grid import and added lazy import with explanatory comment in the vertical method |
| tests/test_regrid.py | Updated 8 mock.patch decorators to patch xgcm.Grid instead of the old module-level import path |
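The change of patch target follows from where the name is looked up. With a module-level `from xgcm import Grid`, the name is bound once in the importing module, so tests patch it there; with a deferred import, the name is read from the source package each time the function runs, so tests patch the source. A minimal sketch, using `json.dumps` as a stand-in for `xgcm.Grid` (the functions `serialize` and `run_demo` are hypothetical):

```python
from unittest import mock


def serialize(data: dict) -> str:
    # Deferred import: `dumps` is looked up on the `json` module at call time,
    # so a patch applied to `json.dumps` is what this function sees.
    from json import dumps  # stand-in for `from xgcm import Grid`

    return dumps(data)


def run_demo() -> tuple[str, str]:
    # Patch the source module's attribute, analogous to patching "xgcm.Grid".
    with mock.patch("json.dumps", return_value="mocked") as fake_dumps:
        patched = serialize({"a": 1})
        fake_dumps.assert_called_once_with({"a": 1})
    # Outside the context manager the real function is restored.
    unpatched = serialize({"a": 1})
    return patched, unpatched


if __name__ == "__main__":
    print(run_demo())
```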

Collaborator Author

@tomvothecoder tomvothecoder left a comment

Hi @jasonb5 and @pochedls, I've opened this PR to address #805. Can you review it to make sure it looks good? It's a simple one-line change.

This PR defers the import of xgcm to only when .vertical() is called to improve the speed of import xcdat. Please refer to the PR description for more information.

Thanks!

@tomvothecoder tomvothecoder changed the title from "Improve import speed by importing xgcm in .vertical()" to "Defer xgcm import to speed up xcdat startup time by ~3 seconds" Oct 29, 2025
@tomvothecoder tomvothecoder moved this from Todo to In Progress in xCDAT Development Oct 29, 2025
@tomvothecoder tomvothecoder moved this to In Review in xCDAT Development Oct 29, 2025
@tomvothecoder tomvothecoder self-assigned this Oct 29, 2025
Collaborator

@jasonb5 jasonb5 left a comment

LGTM

@tomvothecoder tomvothecoder merged commit 08a340a into main Nov 5, 2025
9 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in xCDAT Development Nov 5, 2025
@tomvothecoder tomvothecoder deleted the feature/805-speed-up-import branch November 5, 2025 20:01

Labels

type: enhancement New enhancement request

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Enhancement]: Speed Up Import of xcdat

4 participants