Add Kubeflow Trainer v2.2 release blog post #194
google-oss-prow[bot] merged 14 commits into kubeflow:master
andreyvelich
left a comment
Thank you for this @xikronz !
@xikronz Please sign your commit too for DCO.
@andreyvelich: GitHub didn't allow me to assign the following users: robert-bell, Krishna-kg732, vsoch, XploY04, yashpal2104. Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Signed-off-by: xikron <cc2864@cornell.edu>
Signed-off-by: xikron <cc2864@cornell.edu>
force-pushed from 58c2a2f to a99e675
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Signed-off-by: Carrie Chen <139071206+xikronz@users.noreply.github.com>
/ok-to-test
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Signed-off-by: Carrie Chen <139071206+xikronz@users.noreply.github.com>
andreyvelich
left a comment
@xikronz Can you also sign your commits for DCO?
Ref: https://www.kubeflow.org/docs/about/contributing/#sign-off-your-commits
I am not a Kubeflow member, so I can't be assigned issues. Can someone make me a member?
Sure, can you create a PR similar to this one: kubeflow/internal-acls#895
Signed-off-by: xikron <cc2864@cornell.edu>
Signed-off-by: xikron <cc2864@cornell.edu>
force-pushed from e28c5a0 to 94c1974
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Signed-off-by: Carrie Chen <139071206+xikronz@users.noreply.github.com>
Signed-off-by: xikron <cc2864@cornell.edu>
Signed-off-by: xikron <cc2864@cornell.edu>
Signed-off-by: xikron <cc2864@cornell.edu>
Signed-off-by: xikron <cc2864@cornell.edu>
Thanks for this work @xikronz!
/lgtm
/assign @astefanutti @VassilisVassiliadis @kramaranya @Fiona-Waters @robert-bell @tenzen-y @vsoch
@andreyvelich: GitHub didn't allow me to assign the following users: robert-bell, VassilisVassiliadis. Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
enabling them to treat GPUs across multiple machines as a single unified memory domain. For
large-scale training, this means significantly faster node-to-node communication compared to
standard network-based primitives and brings forth a new era of configurations that simply
weren't practical before on Kubernetes.
Let's also say this:
- weren't practical before on Kubernetes.
+ weren't practical before on Kubernetes. We are working closely with Kubernetes community to introduce first class support for Dynamic Resource Allocation (DRA) in TrainJobs.
cc @Ronkahn21
process, Trainer will choose appropriate resources automatically based on the TrainJob configuration.
This gives teams the power to plan experiments with confidence and trust that jobs use just the right
amount of compute.
Can we also say something about Workload Aware Scheduling please?
kubeflow/trainer#3219
Co-authored-by: Vanessa Sochat <814322+vsoch@users.noreply.github.com> Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Signed-off-by: Carrie Chen <139071206+xikronz@users.noreply.github.com>
@vsoch: changing LGTM is restricted to collaborators.
Signed-off-by: xikron <cc2864@cornell.edu>
Signed-off-by: xikron <cc2864@cornell.edu>
andreyvelich
left a comment
Awesome work @xikronz!
I am going to announce the Trainer v2.2 release tomorrow.
/lgtm
/approve
/hold in case others want to give more comments.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: andreyvelich
/hold cancel
This collaborative blog post covers the Kubeflow Trainer v2.2 release. For the full release tracking issue, see #3116.