Merged
66 changes: 21 additions & 45 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -7,13 +7,13 @@

### Description

### Tests run

**NOTE: By default, docker builds are disabled. In order to build your container, please update dlc_developer_config.toml and specify the framework to build in "build_frameworks"**
- [ ] I have run builds/tests on commit <INSERT COMMIT ID> for my changes.
### Tests Run
By default, docker image builds and tests are disabled. Two ways to run builds and tests:
1. Using dlc_developer_config.toml
2. Using this PR description (currently only supported for PyTorch, TensorFlow, vllm, and base images)

<details>
<summary>Confused on how to run tests? Try using the helper utility...</summary>
<summary>How to use the helper utility for updating dlc_developer_config.toml</summary>

Assuming your remote is called `origin` (you can find out more with `git remote -v`)...

Expand All @@ -28,50 +28,34 @@ Assuming your remote is called `origin` (you can find out more with `git remote
- Restore TOML file when ready to merge

`python src/prepare_dlc_dev_environment.py -rcp origin`
</details>

**NOTE: If you are creating a PR for a new framework version, please ensure success of the standard, rc, and efa sagemaker remote tests by updating the dlc_developer_config.toml file:**
<details>
<summary>Expand</summary>

**NOTE: If you are creating a PR for a new framework version, please ensure success of the local, standard, rc, and efa sagemaker tests by updating the dlc_developer_config.toml file:**
- [ ] `sagemaker_remote_tests = true`
- [ ] `sagemaker_efa_tests = true`
- [ ] `sagemaker_rc_tests = true`

**Additionally, please run the sagemaker local tests in at least one revision:**
- [ ] `sagemaker_local_tests = true`

</details>

### Formatting
- [ ] I have run `black -l 100` on my code (formatting tool: https://black.readthedocs.io/en/stable/getting_started.html)

### DLC image/dockerfile

#### Builds to Execute
<details>
<summary>Expand</summary>

Fill out the template and click the checkbox of the builds you'd like to execute

*Note: Replace <X.Y> with the major.minor framework version (e.g. 2.2) you would like to start.*

- [ ] build_pytorch_training_<X.Y>_sm
- [ ] build_pytorch_training_<X.Y>_ec2

- [ ] build_pytorch_inference_<X.Y>_sm
- [ ] build_pytorch_inference_<X.Y>_ec2
- [ ] build_pytorch_inference_<X.Y>_graviton
<summary>How to use PR description</summary>
Uncomment the commands in the code block below to run the PR CodeBuild jobs. Two commands are available:

- [ ] build_tensorflow_training_<X.Y>_sm
- [ ] build_tensorflow_training_<X.Y>_ec2
- `# /buildspec <buildspec_path>`
- e.g.: `# /buildspec pytorch/training/buildspec.yml`
- If this line is commented out, dlc_developer_config.toml will be used.
- `# /tests <test_list>`
- e.g.: `# /tests sanity security ec2`
- If this line is commented out, it will run the default set of tests (same as the defaults in dlc_developer_config.toml): `sanity, security, ec2, ecs, eks, sagemaker, sagemaker-local`.

- [ ] build_tensorflow_inference_<X.Y>_sm
- [ ] build_tensorflow_inference_<X.Y>_ec2
- [ ] build_tensorflow_inference_<X.Y>_graviton
</details>

### Additional context
```
# /buildspec <buildspec_path>
# /tests <test_list>
```

### Formatting
- [ ] I have run `black -l 100` on my code (formatting tool: https://black.readthedocs.io/en/stable/getting_started.html)

### PR Checklist
<details>
Expand All @@ -84,14 +68,6 @@ Fill out the template and click the checkbox of the builds you'd like to execute
- [ ] (If applicable) I've documented below the tests I've run on the DLC image
- [ ] (If applicable) I've reviewed the licenses of updated and new binaries and their dependencies to make sure all licenses are on the Apache Software Foundation Third Party License Policy Category A or Category B license list. See [https://www.apache.org/legal/resolved.html](https://www.apache.org/legal/resolved.html).
- [ ] (If applicable) I've scanned the updated and new binaries to make sure they do not have vulnerabilities associated with them.

#### NEURON/GRAVITON Testing Checklist
* When creating a PR:
- [ ] I've modified `dlc_developer_config.toml` in my PR branch by setting `neuron_mode = true` or `graviton_mode = true`

#### Benchmark Testing Checklist
* When creating a PR:
- [ ] I've modified `dlc_developer_config.toml` in my PR branch by setting `ec2_benchmark_tests = true` or `sagemaker_benchmark_tests = true`
</details>

### Pytest Marker Checklist
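The template above describes the `/buildspec` and `/tests` commands that drive the PR CodeBuild jobs. A minimal sketch of how such commands could be extracted from a PR body is shown below; the function name and parsing details are assumptions for illustration, not the pipeline's actual parser.

```python
def parse_pr_commands(pr_description: str):
    """Extract uncommented /buildspec and /tests commands from a PR body.

    Hypothetical helper: commented lines (starting with '#') are ignored,
    mirroring the template's 'uncomment to activate' convention.
    """
    buildspec, tests = None, None
    for line in pr_description.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # still commented out -> falls back to dlc_developer_config.toml
        if line.startswith("/buildspec "):
            buildspec = line.split(maxsplit=1)[1]
        elif line.startswith("/tests "):
            tests = line.split()[1:]
    return buildspec, tests

body = """
/buildspec pytorch/training/buildspec.yml
# /tests sanity security ec2
"""
print(parse_pr_commands(body))  # -> ('pytorch/training/buildspec.yml', None)
```

With `/tests` left commented, the default test set from dlc_developer_config.toml would apply.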
14 changes: 9 additions & 5 deletions src/main.py
@@ -40,11 +40,15 @@ def main():
# this build.
utils.write_to_json_file(constants.TEST_TYPE_IMAGES_PATH, {})

# Skip tensorflow-1 PR jobs, as there are no longer patch releases being added for TF1
# Purposefully not including this in developer config to make this difficult to enable
# TODO: Remove when we remove these jobs completely
build_name = get_codebuild_project_name()
if build_context == "PR" and build_name == "dlc-pr-tensorflow-1":
# Only bypass TOML checks if buildspec comes from PR description
if os.getenv("FRAMEWORK_BUILDSPEC_FILE") and os.getenv("FROM_PR_DESCRIPTION") == "true":
utils.build_setup(
args.framework,
device_types=device_types,
image_types=image_types,
py_versions=py_versions,
)
image_builder(args.buildspec, image_types, device_types)
return

# A general build will work if build job and build mode are in non-EI, non-NEURON
34 changes: 27 additions & 7 deletions src/start_testbuilds.py
@@ -57,6 +57,13 @@ def run_test_job(commit, codebuild_project, images_str=""):
if config.is_deep_canary_mode_enabled():
env_overrides.append({"name": "DEEP_CANARY_MODE", "value": "true", "type": "PLAINTEXT"})

# Get specified tests from PR description if any
specified_tests = os.getenv("SPECIFIED_TESTS")
if specified_tests:
env_overrides.append(
{"name": "SPECIFIED_TESTS", "value": specified_tests, "type": "PLAINTEXT"}
)

pr_num = os.getenv("PR_NUMBER")
LOGGER.debug(f"pr_num {pr_num}")
env_overrides.extend(
@@ -248,17 +255,29 @@ def main():
# Deep Canaries, as detailed in the docstring for run_deep_canary_pr_testbuilds().
return

# load the images for all test_types to pass on to code build jobs
# Load the mapping of test types to images
with open(constants.TEST_TYPE_IMAGES_PATH) as json_file:
test_images = json.load(json_file)

# Run necessary PR test jobs
commit = os.getenv("CODEBUILD_RESOLVED_SOURCE_VERSION")

specified_tests_env = os.getenv("SPECIFIED_TESTS")
if specified_tests_env:
specified_tests = specified_tests_env.split()
LOGGER.info(f"Running only specified tests from PR description: {specified_tests}")
else:
specified_tests = None

for test_type, images in test_images.items():
# only run the code build test jobs when the images are present
# Skip any test_type not explicitly requested
if specified_tests and test_type not in specified_tests:
LOGGER.info(f"Skipping {test_type} test because it was not in SPECIFIED_TESTS")
continue

LOGGER.debug(f"test_type : {test_type}")
LOGGER.debug(f"images: {images}")
# Only run the CodeBuild test jobs when images are present
if images:
pr_test_job = f"dlc-pr-{test_type}-test"
images_str = " ".join(images)
Expand All @@ -275,11 +294,12 @@ def main():
):
run_test_job(commit, pr_test_job, images_str)

if test_type == "autopr" and config.is_autopatch_build_enabled(
buildspec_path=config.get_buildspec_override()
or os.getenv("FRAMEWORK_BUILDSPEC_FILE"),
):
run_test_job(commit, f"dlc-pr-{test_type}", images_str)
# autopr is disabled
# if test_type == "autopr" and config.is_autopatch_build_enabled(
# buildspec_path=config.get_buildspec_override()
# or os.getenv("FRAMEWORK_BUILDSPEC_FILE"),
# ):
# run_test_job(commit, f"dlc-pr-{test_type}", images_str)

# Trigger sagemaker local test jobs when there are changes in sagemaker_tests
if test_type == "sagemaker" and config.is_sm_local_test_enabled():
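The filtering added to start_testbuilds.py can be demonstrated in isolation. The sketch below reproduces the selection logic under the same space-separated `SPECIFIED_TESTS` convention; the helper name is an assumption for illustration.

```python
import os

def select_test_types(test_images: dict) -> list:
    """Return the test types that would trigger jobs, honoring SPECIFIED_TESTS.

    Illustrative sketch of the loop in start_testbuilds.py: a test type runs
    only if it was requested (or nothing was requested) and has images.
    """
    specified = os.getenv("SPECIFIED_TESTS", "").split() or None
    selected = []
    for test_type, images in test_images.items():
        if specified and test_type not in specified:
            continue  # not requested via the PR description
        if images:  # only run jobs that have images to test
            selected.append(test_type)
    return selected

os.environ["SPECIFIED_TESTS"] = "sanity ec2"
images = {"sanity": ["img-a"], "ec2": ["img-a"], "eks": ["img-a"], "ecs": []}
print(select_test_types(images))  # -> ['sanity', 'ec2']
```

When `SPECIFIED_TESTS` is unset, every test type with images would be selected, matching the pre-existing behavior.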
24 changes: 14 additions & 10 deletions src/utils.py
@@ -462,16 +462,20 @@ def upload_data_to_pr_creation_s3_bucket(upload_data: str, s3_filepath: str, tag
:param tag_set: List[Dict], as described above
:return: str, s3 file path
"""
s3_resource = boto3.resource("s3")
s3object = s3_resource.Object(constants.PR_CREATION_DATA_HELPER_BUCKET, s3_filepath)
s3_client = s3_resource.meta.client
s3object.put(Body=(bytes(upload_data.encode("UTF-8"))))
if tag_set:
s3_client.put_object_tagging(
Bucket=constants.PR_CREATION_DATA_HELPER_BUCKET,
Key=s3_filepath,
Tagging={"TagSet": tag_set},
)
s3 = boto3.resource("s3")
bucket = constants.PR_CREATION_DATA_HELPER_BUCKET
obj = s3.Object(bucket, s3_filepath)
client = s3.meta.client
try:
obj.put(Body=upload_data.encode("utf-8"))
if tag_set:
client.put_object_tagging(
Bucket=bucket,
Key=s3_filepath,
Tagging={"TagSet": tag_set},
)
except ClientError as e:  # requires: from botocore.exceptions import ClientError
LOGGER.error(f"Could not write to s3://{bucket}/{s3_filepath}: {e}")


def get_unique_s3_path_for_uploading_data_to_pr_creation_bucket(image_uri: str, file_name: str):
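The `tag_set` passed to `put_object_tagging` above must follow the standard S3 Tagging payload shape. A small sketch of building that payload, with illustrative tag keys rather than the ones the DLC pipeline actually uses:

```python
def build_tagging_payload(tags: dict) -> dict:
    """Build the Tagging payload that S3 put_object_tagging expects.

    The {"TagSet": [{"Key": ..., "Value": ...}, ...]} shape is the standard
    S3 API format; the tag keys here are made up for the example.
    """
    return {"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]}

payload = build_tagging_payload({"upload_path": "pr-data/run-1.json"})
print(payload)
# -> {'TagSet': [{'Key': 'upload_path', 'Value': 'pr-data/run-1.json'}]}
```

The caller would pass `payload["TagSet"]` as the `tag_set` argument to the upload helper.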
8 changes: 8 additions & 0 deletions test/testrunner.py
@@ -441,6 +441,11 @@ def main():
else:
raise Exception(f"EKS cluster {eks_cluster_name} is not in active state")

# Get specified tests if any
specified_tests = os.getenv("SPECIFIED_TESTS")
if specified_tests:
specified_tests = specified_tests.split()

# Execute dlc_tests pytest command
pytest_cmd = [
"-s",
Expand All @@ -449,6 +454,9 @@ def main():
f"--junitxml={report}",
"-n=auto",
]
if specified_tests:
test_expr = " or ".join(f"test_{t}" for t in specified_tests)
pytest_cmd.extend(["-k", f"({test_expr})"])

is_habana_image = any("habana" in image_uri for image_uri in all_image_list)
if specific_test_type == "ec2":
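The testrunner.py change translates `SPECIFIED_TESTS` into a pytest `-k` keyword filter. The sketch below reproduces that expression construction; it assumes test function names follow the `test_<type>` prefix convention, as the diff's `f"test_{t}"` suggests.

```python
def build_k_expression(specified_tests: list) -> str:
    """Reproduce the pytest -k filter that testrunner.py builds.

    Each requested test type becomes a test_<type> keyword, OR-ed together.
    """
    return "(" + " or ".join(f"test_{t}" for t in specified_tests) + ")"

expr = build_k_expression(["sanity", "ec2"])
print(expr)  # -> (test_sanity or test_ec2)
# pytest would then receive: -k "(test_sanity or test_ec2)"
```

Pytest evaluates `-k` as a boolean expression over test names, so only tests whose names match one of the keywords are collected.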