Migrate CI and Build Infrastructure to UV #22
Migrate to uv pip interface for CI builds with static dependency override support. This establishes a simpler, reproducible CI workflow.

Changes:
- Add GitHub Copilot instructions (`.github/copilot-instructions.md`) with comprehensive repository context, build commands, and development guidelines
- Add static CI overrides file (`requirements/ci/overrides.txt`) for Lightning commit pinning, replacing the dynamic generation approach
- Add locked CI requirements (`requirements/ci/requirements.txt`) generated via `uv pip compile` for reproducible builds
- Add lock script (`requirements/utils/lock_ci_requirements.sh`), a wrapper around `uv pip compile`
- Add CI composite action (`.github/actions/install-ci-dependencies`) for unified uv-based installation
- Add process management wrapper (`scripts/manage_standalone_processes.sh`) to prevent duplicate test execution during coverage collection

The override file is used with `uv pip install --override` to pin Lightning to a specific git commit during CI builds.
Simplify the CI dependency management by:
- Filtering torch directly in `requirements.txt` during lock generation when nightly is configured, eliminating the need for `requirements_no_torch.txt`
- Using `requirements.txt` directly in all installation paths
- Replacing the `USE_CI_COMMIT_PIN` env var with `UV_OVERRIDE` pointing to `requirements/ci/overrides.txt` for Lightning commit pinning
- Adding `torch-nightly.txt` as the centralized config for the nightly torch version
- Moving optional dependencies to `pyproject.toml` (`cli`, `extra`, `examples`, `ipynb`)
- Adding dependency-groups for `dev` and `test` in `pyproject.toml`
- Updating the minimum Python version to 3.10 (dropping 3.9)
- Adding a `requirements-oldest.txt` lock file for Python 3.10 oldest deps

The torch nightly workflow is now:
1. Edit `torch-nightly.txt` with the version (or leave it empty to disable)
2. Run `lock_ci_requirements.sh` to regenerate locks with torch filtered
3. CI pre-installs torch nightly from the PyTorch index, then installs the rest

This eliminates the complexity of runtime filtering and multiple requirements files while maintaining flexibility for nightly testing.
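The three-step nightly workflow can be sketched as pure command construction. This is a hypothetical illustration, not the contents of `lock_ci_requirements.sh`: the function names, the `pyproject.toml --all-extras` compile input, the comment-skipping file format, and the example version string are assumptions. The uv flags are the ones listed later in this PR, and the index URL is the one from its installation example (the `cu128` tag is just one CUDA variant).

```python
from pathlib import Path
from typing import List, Optional

# Index URL from the PR's installation example; the CUDA tag is an example choice.
NIGHTLY_INDEX = "https://download.pytorch.org/whl/nightly/cu128"


def read_nightly_version(path: Path) -> Optional[str]:
    """Step 1: an empty or missing torch-nightly.txt disables nightly mode."""
    if not path.exists():
        return None
    # Assumed format: first non-blank, non-'#' line is the version.
    for line in path.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    return None


def lock_command(nightly: Optional[str], output: str = "requirements/ci/requirements.txt") -> List[str]:
    """Step 2: regenerate the lock, omitting torch when a nightly is configured."""
    cmd = ["uv", "pip", "compile", "pyproject.toml", "--all-extras", "-o", output]
    if nightly:
        cmd += [
            "--prerelease=if-necessary-or-explicit",
            "--index-strategy", "unsafe-best-match",  # lock generation only
            "--extra-index-url", NIGHTLY_INDEX,
            "--no-emit-package", "torch",  # keep torch out of the lockfile
        ]
    return cmd


def ci_install_commands(nightly: Optional[str]) -> List[List[str]]:
    """Step 3: pre-install the torch nightly, then install the locked requirements."""
    cmds: List[List[str]] = []
    if nightly:
        cmds.append([
            "uv", "pip", "install", "--prerelease=if-necessary-or-explicit",
            f"torch=={nightly}", "--index-url", NIGHTLY_INDEX,
        ])
    cmds.append(["uv", "pip", "install", "-r", "requirements/ci/requirements.txt"])
    return cmds
```

Because the lockfile never names torch in nightly mode, the permissive `unsafe-best-match` strategy is only needed when the lock is generated, and the install path can use the nightly index for torch alone.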
…just uv cache strategy
…nsorboard version
Pull request overview
This PR successfully migrates the FTS build and CI infrastructure from pip to uv, providing faster and more reproducible builds. The migration is comprehensive, touching CI workflows, build scripts, dependency management, and supporting Python code.
Key Changes
- UV adoption: Replaces pip with uv throughout CI and build scripts for 10-100x faster dependency resolution
- Locked requirements: Introduces `requirements/ci/requirements.txt` and `requirements-oldest.txt` for reproducible builds across platforms
- Static overrides: Replaces dynamic Lightning override generation with a static `requirements/ci/overrides.txt` file
- Minimum Python version bump: Changes the minimum supported version from Python 3.9 to 3.10
Reviewed changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| `.github/actions/install-ci-dependencies/action.yml` | New composite action that centralizes UV-based dependency installation for CI |
| `.github/workflows/ci_test-full.yml` | Updated to use new composite action, removed manual pip caching logic |
| `.github/workflows/code-checks.yml` | Migrated type checking workflow to use UV and new composite action |
| `.azure-pipelines/gpu-tests.yml` | Updated Azure GPU tests to use UV and static override file |
| `requirements/ci/overrides.txt` | New static override file replacing dynamic generation |
| `requirements/ci/requirements.txt` | New locked requirements for highest resolution (latest tests) |
| `requirements/ci/requirements-oldest.txt` | New locked requirements for lowest resolution (oldest compatibility tests) |
| `requirements/ci/torch-nightly.txt` | Configuration file for optional PyTorch nightly builds |
| `requirements/utils/lock_ci_requirements.sh` | Script to generate locked requirements with UV |
| `scripts/build_fts_env.sh` | Completely rewritten to use UV, supports venv directory configuration |
| `scripts/gen_fts_coverage.sh` | Updated with logging improvements and UV support |
| `scripts/infra_utils.sh` | Added substantial utility functions for venv management and from-source installs |
| `scripts/manage_standalone_processes.sh` | New process management script with conflict detection |
| `src/finetuning_scheduler/dynamic_versioning/utils.py` | Simplified Lightning requirement handling (commit pinning now at install time) |
| `src/finetuning_scheduler/fts_supporters.py` | Bug fix for `lr_lambdas` synchronization edge case |
| `src/finetuning_scheduler/strategy_adapters/base.py` | Commented out `lr_lambdas` handling (related to bug fix) |
| `setup.py` | Streamlined with dependency management moved to `pyproject.toml` |
| `pyproject.toml` | Major update: moved to Python 3.10+, added optional dependencies, added dependency groups, pyright configuration |
| `tests/test_dynamic_versioning_utils.py` | Updated tests to reflect simplified commit pinning approach |
| `tests/helpers/expected_warns.py` | Added new expected warnings for PT 2.10 nightly compatibility |
| `.github/copilot-instructions.md` | New comprehensive repository documentation for AI assistants |
| `dockers/base-cuda/Dockerfile` | Updated to use UV for package installation |
Comments suppressed due to low confidence (2)
- `src/finetuning_scheduler/dynamic_versioning/utils.py:20`: Import of `tomllib` is not used (`import tomllib`).
- `src/finetuning_scheduler/dynamic_versioning/utils.py:23`: Import of `tomllib` is not used (`import tomli as tomllib  # type: ignore[import-not-found,no-redef]`).
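Both suppressed comments concern the usual version-conditional TOML import, which is needed because `tomllib` only joined the standard library in Python 3.11 while the new minimum is 3.10. A minimal sketch of the pattern with the imported name actually used; the helper `optional_dependency_groups` is hypothetical and is not the code in `utils.py`. Branching on `sys.version_info` (instead of `try`/`except ImportError`) is a form that static checkers such as pyright can evaluate per target version.

```python
import sys

if sys.version_info >= (3, 11):
    import tomllib  # standard library since Python 3.11
else:  # Python 3.10 needs the third-party `tomli` backport
    import tomli as tomllib


def optional_dependency_groups(pyproject_text: str) -> dict[str, list[str]]:
    """Return the [project.optional-dependencies] table from pyproject.toml text."""
    data = tomllib.loads(pyproject_text)
    return data.get("project", {}).get("optional-dependencies", {})
```

If nothing in the module calls `tomllib`, removing both imports (and the `tomli` dependency for 3.10) is the simpler fix.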
…on preview action, bump some older minimums
…a is active, update installation instructions to include pytorch nightly manual scenarios and current uv torch integration limitations
Overview
This PR enhances the FTS build system by migrating from pip to uv for dependency management and standardizing CI workflows with reusable composite actions. The changes also include fixes for CI issues discovered during testing.
Key Changes
1. UV Migration
Core Changes:
- Replaced `pip install` with `uv pip install` across all documentation and workflows
- Removed the `USE_CI_COMMIT_PIN` environment variable in favor of the `UV_OVERRIDE` mechanism
- Lightning commit pinning now uses `requirements/ci/overrides.txt` and the `UV_OVERRIDE` environment variable
- Workflows use `astral-sh/setup-uv@v7` for consistent uv installation

Files Updated:
- `README.md` - Updated source installation examples
- `tests/README.md` - Updated testing instructions and coverage collection examples
- `Makefile` - Changed pip → uv pip for test/docs targets
- `.github/CONTRIBUTING.md` - Updated development setup instructions with nightly torch option
- `docs/source/index.rst` - Updated installation examples
- `docs/source/install/dynamic_versioning.rst` - Updated CI commit pinning docs
- `.github/workflows/ci_schema.yml` - Added uv setup; uses `uv pip install`
- `src/finetuning_scheduler/dynamic_versioning/utils.py` - Updated error messages

2. CI Workflow Modernization
Reusable Composite Action:
- `.github/actions/install-ci-dependencies/action.yml` - Reusable action for dependency installation
- Supports `latest` and `oldest` dependency matrices
- Uses `requirements/ci/overrides.txt` for Lightning commit pinning
- Used by `ci_test-full.yml` and `code-checks.yml`

Workflow Updates:
- `.github/workflows/ci_test-full.yml` - Simplified using composite action
- `.github/workflows/code-checks.yml` - Added parallel checking (ruff, pyright, pre-commit)
- `.azure-pipelines/gpu-tests.yml` - Updated for Azure GPU testing
- `.github/workflows/documentation-links.yml` - NEW: ReadTheDocs PR preview action that automatically adds documentation preview links to PRs that touch docs, README, or source files (docstrings)

3. Requirements Management
Consolidation to `pyproject.toml`:
All optional dependencies are now defined in `pyproject.toml` with extras: `cli`, `extra`, `examples`, `ipynb`, `all`.

New Files:
- `requirements/ci/requirements.txt` - Locked CI dependencies (latest; excludes torch in nightly mode)
- `requirements/ci/requirements-oldest.txt` - Locked CI dependencies (oldest matrix; always stable torch)
- `requirements/ci/overrides.txt` - Lightning commit pin specification
- `requirements/ci/torch-nightly.txt` - PyTorch nightly version configuration
- `requirements/ci/torch_override.txt` - Generated file with nightly torch version and installation instructions
- `requirements/utils/lock_ci_requirements.sh` - Script to regenerate locked requirements

PyTorch Nightly Handling (Two-Step Approach):
The lock script now handles torch nightly with a secure two-step installation approach:
When `torch-nightly.txt` is configured:

Lockfile Generation:
- Uses `--prerelease=if-necessary-or-explicit` for prereleases
- Uses `--index-strategy unsafe-best-match` for lockfile generation only (security-isolated to maintainer machines)
- Excludes torch from the lockfile via `--no-emit-package torch`
- Generates `torch_override.txt` with the version and installation instructions

User Installation (Secure Two-Step):
1. `uv pip install --prerelease=if-necessary-or-explicit torch==<version> --index-url https://download.pytorch.org/whl/nightly/cu128`
2. `UV_OVERRIDE=requirements/ci/overrides.txt uv pip install -e ".[all]"`

CI Installation:
- Without `torch-nightly.txt`: `--torch-backend=cpu` or `--torch-backend=auto`
- `torch_override.txt` if present

Removed Files:
- `requirements/lightning_pin.txt` - Superseded by `requirements/ci/overrides.txt`
- `requirements.txt` (repo root) - No longer needed
- `requirements/base.txt` - Dependencies now in `pyproject.toml` and `dynamic_versioning/utils.py`
- `requirements/examples.txt` - Now in `pyproject.toml` `[project.optional-dependencies]`
- `requirements/extra.txt` - Now in `pyproject.toml` `[project.optional-dependencies]`
- `requirements/cli.txt` - Now in `pyproject.toml` `[project.optional-dependencies]`
- `requirements/ipynb.txt` - Now in `pyproject.toml` `[project.optional-dependencies]`
- `requirements/devel.txt` - Now in `pyproject.toml` `[dependency-groups]`
- `requirements/test.txt` - Now in `pyproject.toml` `[dependency-groups]`
- `.actions/assistant.py` - No longer needed (was used for requirements processing)

4. Build Script Enhancements
`scripts/build_fts_env.sh`:
- `--oldest` flag for building with oldest supported dependencies (Python 3.10)
- `--venv-dir` flag for explicit venv base directory

`scripts/gen_fts_coverage.sh`:
- `--oldest` flag to pass through to build script
- `--no-special` flag to skip standalone/experimental tests

New Scripts:
- `scripts/manage_standalone_processes.sh` - Wrapper for isolated nohup execution with logging
- `scripts/manage_standalone_regex.cfg` - Regex patterns for output filename generation

5. Docker Updates
`dockers/base-cuda/Dockerfile`:
- … (`/usr/local/bin` instead of user-specific)
- `UV_HTTP_TIMEOUT=120` for large package downloads (NVIDIA packages)
- Moved `chmod -R 777 /tmp/venvs/fts_dev` from the venv creation step to the final RUN step (after all Python verification commands). This prevents `__pycache__` directories created by Python execution from having root ownership, which was causing "Permission denied" errors in Azure Pipelines with userns-remapped rootless Docker.

`dockers/fts-az-base/Dockerfile`:

6. Bug Fixes
tensorboardX Compatibility:
- Updated `pyproject.toml` to require `tensorboardX>=2.6.1` for protobuf 4.x compatibility

torch.utils.collect_env Compatibility with UV:
- Updated `src/fts_examples/cli_experiment_utils.py` to handle cases where `collect_env` returns empty pip packages when using uv (since uv doesn't install pip by default)
- Uses `split("==", 1)` to handle edge cases in package version strings
- Added `pip>=21.0.0` as an explicit dependency for torch `collect_env` backward compatibility

Dynamic Versioning:
- Streamlined `setup.py` dependency handling
- Updated error messages in `utils.py` to reference uv pip

Code Cleanup:
- `src/finetuning_scheduler/strategy_adapters/base.py`
- `tests/test_dynamic_versioning_utils.py`

Sphinx/RTD Configuration:
- Updated `docs/source/conf.py` to use a hardcoded mock package list instead of reading from the deleted requirements files
- Updated `.readthedocs.yml` to use the `.[examples]` extra instead of separate requirements files

`setup_tools.py`:
- Changed the `_load_requirements()` default `file_name` from `"base.txt"` to `"requirements.txt"`

7. Documentation
New:
- `.github/copilot-instructions.md` - Comprehensive copilot instructions

Updated:
- `README.md` - Two-step nightly installation instructions
- `tests/README.md` - Two-step nightly installation, coverage examples
- `.github/CONTRIBUTING.md` - Added PyTorch nightly installation section
- `docs/source/install/dynamic_versioning.rst` - Two-step nightly installation

Migration Notes
For Users
- Replace `pip install` commands with `uv pip install`
- The `USE_CI_COMMIT_PIN` environment variable is no longer used
- Use `scripts/build_fts_env.sh` for development environment setup

For CI/CD
- `UV_OVERRIDE` environment variable
- `requirements/ci/overrides.txt`
- `requirements/utils/lock_ci_requirements.sh`

Validated Installation Scenarios
Files Changed Summary
49 files changed, 3851 insertions(+), 645 deletions(-)
New Files (11)
- `.github/actions/install-ci-dependencies/action.yml`
- `.github/copilot-instructions.md`
- `.github/workflows/documentation-links.yml`
- `requirements/ci/overrides.txt`
- `requirements/ci/requirements-oldest.txt`
- `requirements/ci/requirements.txt`
- `requirements/ci/torch-nightly.txt`
- `requirements/ci/torch_override.txt`
- `requirements/utils/lock_ci_requirements.sh`
- `scripts/manage_standalone_processes.sh`
- `scripts/manage_standalone_regex.cfg`

Removed Files (10)
- `.actions/assistant.py`
- `requirements/lightning_pin.txt`
- `requirements.txt` (repo root)
- `requirements/base.txt`
- `requirements/examples.txt`
- `requirements/extra.txt`
- `requirements/cli.txt`
- `requirements/ipynb.txt`
- `requirements/devel.txt`
- `requirements/test.txt`

Modified Files (28)
- `.azure-pipelines/gpu-tests.yml`
- `.github/CONTRIBUTING.md`
- `.github/workflows/ci_schema.yml`
- `.github/workflows/ci_test-full.yml`
- `.github/workflows/code-checks.yml`
- `.readthedocs.yml`
- `Makefile`
- `MANIFEST.in`
- `README.md`
- `dockers/base-cuda/Dockerfile`
- `dockers/release/Dockerfile`
- `dockers/fts-az-base/Dockerfile`
- `docs/source/conf.py`
- `docs/source/index.rst`
- `docs/source/install/dynamic_versioning.rst`
- `pyproject.toml`
- `scripts/build_fts_env.sh`
- `scripts/gen_fts_coverage.sh`
- `setup.py`
- `src/finetuning_scheduler/dynamic_versioning/utils.py`
- `src/finetuning_scheduler/setup_tools.py`
- `src/finetuning_scheduler/strategy_adapters/base.py`
- `src/fts_examples/cli_experiment_utils.py`
- `src/fts_examples/stable/fts_superglue.py`
- `src/fts_examples/stable/model_parallel/torchtitan_llama.py`
- `tests/README.md`
- `tests/helpers/expected_warns.py`
- `tests/test_dynamic_versioning_utils.py`