This repository was archived by the owner on Nov 16, 2023. It is now read-only.

Commit 94a1680

Support FrameworkBarrier for GangExecution and Add Distributed TensorFlow Training Example (#2)
1. Support FrameworkBarrier for GangExecution
2. Add Distributed TensorFlow Training Example
1 parent 3fcb72b commit 94a1680

38 files changed, +1569 -336 lines changed

README.md

Lines changed: 4 additions & 7 deletions
@@ -58,11 +58,8 @@ A Framework represents an application with a set of Tasks:

 ## Quick Start
 1. [Build](build/frameworkcontroller)
-2. [Run Example](example/run/frameworkcontroller)
-3. [Config Usage](pkg/apis/frameworkcontroller/v1/config.go)
-4. [Config Example](example/config)
-5. [Framework Usage](pkg/apis/frameworkcontroller/v1/types.go)
-6. [Framework Example](example/framework)
+2. [Run Example](example/run/frameworkcontroller.md)
+3. [Framework Example](example/framework)

 ## Doc
 1. [User Manual](doc/user-manual.md)
@@ -76,13 +73,13 @@ A specialized wrapper can be built on top of FrameworkController to optimize for
 * [NNI Controller Wrapper](https://github.com/Microsoft/nni)(Developing): A wrapper client optimized for AutoML applications

 ## Official Image
-[FrameworkController DockerHub](https://hub.docker.com/u/frameworkcontroller)
+* [DockerHub](https://hub.docker.com/u/frameworkcontroller)

 ## Related Project
 * [YARN FrameworkLauncher](https://github.com/Microsoft/pai/blob/master/subprojects/frameworklauncher/yarn): Similar offering natively supports [Apache YARN](http://hadoop.apache.org)

 ## Contributing
-This project welcomes contributions and suggestions. Most contributions require you to agree to a
+This project welcomes contributions and suggestions. Most contributions require you to agree to a
 Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
 the rights to use your contribution. For details, visit https://cla.microsoft.com.

bin/frameworkbarrier/start.sh

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+#!/bin/bash
+
+# MIT License
+#
+# Copyright (c) Microsoft Corporation. All rights reserved.
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE
+
+set -o errexit
+set -o nounset
+set -o pipefail
+
+BASH_DIR=$(cd $(dirname ${BASH_SOURCE}) && pwd)
+
+cd ${BASH_DIR}
+
+./frameworkbarrier
+
+MNT_DIR=/mnt/frameworkbarrier
+
+mkdir -p ${MNT_DIR}
+
+cp -r ./framework.json ${MNT_DIR}
+cp -r ./injector.sh ${MNT_DIR}
+
+echo Succeeded to copy current Framework helper files into ${MNT_DIR}:
+cd ${MNT_DIR} && ls -lR .
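
Note on the script above: start.sh runs the barrier binary and then copies framework.json and injector.sh into /mnt/frameworkbarrier. The diff does not show how those files are consumed, so the following is only a minimal sketch of what a consumer might do. It assumes that the directory is shared with another container and that injector.sh is a shell fragment meant to be sourced. The function name `load_barrier_env` is hypothetical and not part of this commit.

```shell
#!/bin/bash
# Hypothetical consumer-side helper; NOT part of this commit.
# Assumption: injector.sh is a shell fragment intended to be sourced.
load_barrier_env() {
  local mnt_dir
  local f
  mnt_dir="${1:-/mnt/frameworkbarrier}"

  # Refuse to continue unless both helper files copied by start.sh exist.
  for f in framework.json injector.sh; do
    if [ ! -f "${mnt_dir}/${f}" ]; then
      echo "Missing ${mnt_dir}/${f}; has the barrier finished?" >&2
      return 1
    fi
  done

  # Source the fragment so that anything it exports reaches the caller.
  . "${mnt_dir}/injector.sh"
}
```

A caller would run `load_barrier_env` (or `load_barrier_env /some/other/dir`) before starting its own workload and stop if it returns non-zero.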

build/frameworkbarrier/Dockerfile

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+# MIT License
+#
+# Copyright (c) Microsoft Corporation. All rights reserved.
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE
+
+FROM golang:alpine as builder
+
+ENV PROJECT_DIR=${GOPATH}/src/github.com/microsoft/frameworkcontroller
+ENV INSTALL_DIR=/opt/frameworkcontroller/frameworkbarrier
+
+RUN apk update && apk add --no-cache bash && \
+  mkdir -p ${PROJECT_DIR} ${INSTALL_DIR}
+COPY . ${PROJECT_DIR}
+RUN ${PROJECT_DIR}/build/frameworkbarrier/go-build.sh && \
+  mv ${PROJECT_DIR}/dist/frameworkbarrier/* ${INSTALL_DIR}
+
+
+FROM alpine:latest
+
+ENV INSTALL_DIR=/opt/frameworkcontroller/frameworkbarrier
+
+RUN apk update && apk add --no-cache bash
+COPY --from=builder ${INSTALL_DIR} ${INSTALL_DIR}
+WORKDIR ${INSTALL_DIR}
+
+ENTRYPOINT ["./start.sh"]

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+#!/bin/bash
+
+# MIT License
+#
+# Copyright (c) Microsoft Corporation. All rights reserved.
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE
+
+set -o errexit
+set -o nounset
+set -o pipefail
+
+BASH_DIR=$(cd $(dirname ${BASH_SOURCE}) && pwd)
+PROJECT_DIR=${BASH_DIR}/../..
+IMAGE_NAME=frameworkbarrier
+
+cd ${PROJECT_DIR}
+
+docker build -t ${IMAGE_NAME} -f ${BASH_DIR}/Dockerfile .
+
+echo Succeeded to build docker image ${IMAGE_NAME}

build/frameworkbarrier/go-build.sh

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+#!/bin/bash
+
+# MIT License
+#
+# Copyright (c) Microsoft Corporation. All rights reserved.
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE
+
+set -o errexit
+set -o nounset
+set -o pipefail
+
+BASH_DIR=$(cd $(dirname ${BASH_SOURCE}) && pwd)
+# Ensure ${PROJECT_DIR} is ${GOPATH}/src/github.com/microsoft/frameworkcontroller
+PROJECT_DIR=${BASH_DIR}/../..
+DIST_DIR=${PROJECT_DIR}/dist/frameworkbarrier
+
+cd ${PROJECT_DIR}
+
+rm -rf ${DIST_DIR}
+mkdir -p ${DIST_DIR}
+
+go build -o ${DIST_DIR}/frameworkbarrier cmd/frameworkbarrier/*
+chmod a+x ${DIST_DIR}/frameworkbarrier
+cp -r bin/frameworkbarrier/* ${DIST_DIR}
+
+echo Succeeded to build binary distribution into ${DIST_DIR}:
+cd ${DIST_DIR} && ls -lR .

build/frameworkcontroller/Dockerfile

Lines changed: 14 additions & 5 deletions
@@ -20,15 +20,24 @@
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 # SOFTWARE

-FROM golang:alpine
+FROM golang:alpine as builder

 ENV PROJECT_DIR=${GOPATH}/src/github.com/microsoft/frameworkcontroller
+ENV INSTALL_DIR=/opt/frameworkcontroller/frameworkcontroller

-RUN apk update && apk add bash && mkdir -p ${PROJECT_DIR}
+RUN apk update && apk add --no-cache bash && \
+  mkdir -p ${PROJECT_DIR} ${INSTALL_DIR}
 COPY . ${PROJECT_DIR}
-WORKDIR ${PROJECT_DIR}
+RUN ${PROJECT_DIR}/build/frameworkcontroller/go-build.sh && \
+  mv ${PROJECT_DIR}/dist/frameworkcontroller/* ${INSTALL_DIR}

-RUN ./build/frameworkcontroller/go-build.sh
-WORKDIR ./dist/frameworkcontroller
+
+FROM alpine:latest
+
+ENV INSTALL_DIR=/opt/frameworkcontroller/frameworkcontroller
+
+RUN apk update && apk add --no-cache bash
+COPY --from=builder ${INSTALL_DIR} ${INSTALL_DIR}
+WORKDIR ${INSTALL_DIR}

 ENTRYPOINT ["./start.sh"]

build/frameworkcontroller/go-build.sh

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ mkdir -p ${DIST_DIR}
 go build -o ${DIST_DIR}/frameworkcontroller cmd/frameworkcontroller/*
 chmod a+x ${DIST_DIR}/frameworkcontroller
 cp -r bin/frameworkcontroller/* ${DIST_DIR}
-cp -r example/config/default/* ${DIST_DIR}
+cp -r example/config/default/frameworkcontroller.yaml ${DIST_DIR}

 echo Succeeded to build binary distribution into ${DIST_DIR}:
 cd ${DIST_DIR} && ls -lR .

cmd/frameworkbarrier/main.go

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+// MIT License
+//
+// Copyright (c) Microsoft Corporation. All rights reserved.
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to deal
+// in the Software without restriction, including without limitation the rights
+// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+// copies of the Software, and to permit persons to whom the Software is
+// furnished to do so, subject to the following conditions:
+//
+// The above copyright notice and this permission notice shall be included in all
+// copies or substantial portions of the Software.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+// SOFTWARE
+
+package main
+
+import (
+	"github.com/microsoft/frameworkcontroller/pkg/common"
+	"github.com/microsoft/frameworkcontroller/pkg/barrier"
+)
+
+func init() {
+	common.InitAll()
+}
+
+func main() {
+	barrier.NewFrameworkBarrier().Run()
+}
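
Note on the entry point above: main.go only calls `barrier.NewFrameworkBarrier().Run()`. The barrier logic itself lives in pkg/barrier, which this excerpt does not show. As a rough illustration of the general barrier idea (retry a readiness check until it passes or the attempts run out), and not of the actual implementation, here is a shell sketch. The helper `wait_until` and its parameters are hypothetical.

```shell
#!/bin/bash
# Generic poll-until-ready helper; illustrative only, NOT the pkg/barrier logic.
# Usage: wait_until MAX_ATTEMPTS INTERVAL_SEC COMMAND [ARGS...]
wait_until() {
  local max_attempts
  local interval_sec
  local attempt
  max_attempts="$1"
  interval_sec="$2"
  shift 2
  attempt=1

  while [ "${attempt}" -le "${max_attempts}" ]; do
    # The readiness check is whatever command the caller passed in.
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the final failed one.
    if [ "${attempt}" -lt "${max_attempts}" ]; then
      sleep "${interval_sec}"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example, `wait_until 30 2 test -f /mnt/frameworkbarrier/injector.sh` would check for the file up to 30 times, two seconds apart, and return non-zero if it never appears.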

cmd/frameworkcontroller/main.go

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ func main() {
 	stopCh := make(chan struct{})
 	defer close(stopCh)

-	go controller.NewQueueFrameworkController().Run(stopCh)
+	go controller.NewFrameworkController().Run(stopCh)

 	sigTerm := make(chan os.Signal, 1)
 	signal.Notify(sigTerm, syscall.SIGTERM)

doc/known-issue-and-upcoming-feature.md

Lines changed: 0 additions & 1 deletion
@@ -13,7 +13,6 @@
 Tracked in [Dashboard errors if pod's owner reference is not supported](https://github.com/kubernetes/dashboard/issues/3251)

 ## <a name="UpcomingFeature">Upcoming Feature</a>
-- [ ] Add Distributed TensorFlow Training Example
 - [ ] Support Framework Spec Update
 - [ ] Support Framework Spec Validation and Defaulting
 - [ ] Support Framework Status Subresource
