Include VertexAI cluster environment for Fabric #19911
Open
miguelalba96 wants to merge 11 commits into Lightning-AI:master from miguelalba96:master
+64
−0
This cluster environment makes Fabric aware of the cluster specification defined for a custom training job in Vertex AI. More information about the environment variables: https://cloud.google.com/vertex-ai/docs/training/distributed-training
added .py extension
rename VertexAIEnvironment for consistency with other environments
Added VertexAIEnvironment
Added VertexAIEnvironment to connectors
include VertexAIEnvironment in list of environments in lightning.pytorch
for more information, see https://pre-commit.ci
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://lightning.ai/docs/pytorch/latest/generated/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Discord. Thank you for your contributions.
Labels
docs
Documentation related
fabric
lightning.fabric.Fabric
has conflicts
pl
Generic label for PyTorch Lightning package
won't fix
This will not be worked on
VertexAI cluster environment for Fabric
It adds a subclass that reads the CLUSTER_SPEC of a Vertex AI custom training job and populates the environment variables that DDP needs.
Dependencies: os, json
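For reviewers who have not worked with Vertex AI, the idea can be sketched roughly as follows. This is not the code in this PR: the function names `parse_cluster_spec` and `export_ddp_env` are hypothetical, and the JSON layout (a `cluster` mapping of worker pool names to `host:port` lists, plus a `task` entry with `type` and `index`) is an assumption based on the distributed-training documentation linked above. It also assumes the first replica of `workerpool0` acts as the main node.

```python
import json
import os


def parse_cluster_spec(raw):
    """Derive DDP rendezvous settings from a CLUSTER_SPEC-style JSON string.

    Assumed layout (hypothetical, modeled on the Vertex AI docs):
    {"cluster": {"workerpool0": ["host:port"], ...},
     "task": {"type": "workerpool1", "index": 0}}
    """
    spec = json.loads(raw)
    cluster = spec["cluster"]
    task = spec["task"]

    # Assumption: the first replica of workerpool0 is the main (rank 0) node.
    main_address, main_port = cluster["workerpool0"][0].rsplit(":", 1)

    # Node rank = number of replicas in earlier pools + index inside this pool.
    # Pool names sort lexicographically (workerpool0, workerpool1, ...).
    earlier = sum(len(cluster[p]) for p in sorted(cluster) if p < task["type"])

    return {
        "main_address": main_address,
        "main_port": int(main_port),
        "num_nodes": sum(len(hosts) for hosts in cluster.values()),
        "node_rank": earlier + int(task["index"]),
    }


def export_ddp_env(info, environ=None):
    """Write the derived values into the variables torch.distributed reads."""
    environ = os.environ if environ is None else environ
    environ["MASTER_ADDR"] = info["main_address"]
    environ["MASTER_PORT"] = str(info["main_port"])
    environ["NODE_RANK"] = str(info["node_rank"])
    return environ
```

Inside a job, this would be called as `export_ddp_env(parse_cluster_spec(os.environ["CLUSTER_SPEC"]))` before the process group is initialized. Passing a plain dict as `environ` keeps the sketch testable without touching the real environment.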
Before submitting
I added documentation in the docstring; I am not entirely sure where else to add docs.
Did you write any new necessary tests? (not for typos and docs)
[x] Did you verify new and existing tests pass locally with your changes?
I tested the code by running it in Vertex AI Pipelines as a custom training job. To assess the efficacy of the change, you can check the documentation.
I tested the custom job using
from Google's official documentation:
https://cloud.google.com/vertex-ai/docs/pipelines/customjob-component#create_custom_training_job_from_component_function
https://cloud.google.com/vertex-ai/docs/pipelines/request-gcp-machine-resources
Did you list all the breaking changes introduced by this pull request?
There are no substantial changes to the source code beyond adding the environment to some imports.
yes
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:
Reviewer checklist
📚 Documentation preview 📚: https://pytorch-lightning--19911.org.readthedocs.build/en/19911/