Commit 5684175

Add distributed tests to run-readme-pr.yml (#1466)
* Add distributed tests to run-readme-pr.yml

  Need to ensure this is the right runner, @lessw2020 can you please have a look -- torchchat uses the same runners as pytorch.

* Update run-docs

  Remove HF login because tokens not available as git secret

* Update run-docs

  Replace llama3.1 with open-llama to avoid need for token. If this turns out running too long, then we can switch to stories110M

* Update run-docs

  open-llama -> stories.

1 parent 9686c79 commit 5684175

2 files changed, +24 -1 lines changed

.ci/scripts/run-docs (+2 -1)

@@ -129,7 +129,8 @@ fi
 if [ "$1" == "distributed" ]; then
 
   echo "::group::Create script to run distributed"
-  python3 torchchat/utils/scripts/updown.py --file docs/distributed.md > ./run-distributed.sh
+  python3 torchchat/utils/scripts/updown.py --file docs/distributed.md --replace 'llama3.1:stories110M,-l 3:-l 2' --suppress huggingface-cli,HF_TOKEN > ./run-distributed.sh
+  python3 torchchat/utils/scripts/updown.py --file docs/distributed.md --suppress huggingface-cli,HF_TOKEN > ./run-distributed.sh
   # for good measure, if something happened to updown processor,
   # and it did not error out, fail with an exit 1
   echo "exit 1" >> ./run-distributed.sh
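
A note on the hunk above: both added `updown.py` invocations redirect with `>` into the same `./run-distributed.sh`, so as written the second call truncates the file and the output of the first call (the one carrying `--replace`) is discarded before the `exit 1` sentinel is appended. The following is a minimal sketch of that shell behavior only. `generate` is a hypothetical stand-in for `updown.py`, whose real output is not shown in this commit, and the temporary file stands in for `./run-distributed.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for updown.py: emits one script line that
# records the arguments it was called with.
generate() {
  echo "echo generated with: $*"
}

out="$(mktemp)"

# Two consecutive '>' redirects to the same file:
# the second one truncates whatever the first one wrote.
generate --replace 'llama3.1:stories110M' > "$out"
generate --suppress huggingface-cli,HF_TOKEN > "$out"

# '>>' appends, so the sentinel lands after the surviving content.
echo "exit 1" >> "$out"

line_count="$(wc -l < "$out" | tr -d ' ')"
first_line="$(head -n 1 "$out")"
last_line="$(tail -n 1 "$out")"

cat "$out"
rm -f "$out"
```

If the `--replace` substitutions are meant to take effect in CI, one option is to keep only the first invocation; whether that matches the author's intent is not stated in the commit message.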

.github/workflows/run-readme-pr.yml (+22)

@@ -306,3 +306,25 @@ jobs:
         echo "::endgroup::"
 
         TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs native
+
+  test-distributed-cuda:
+    permissions:
+      id-token: write
+      contents: read
+    uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
+    with:
+      runner: linux.g5.4xlarge.nvidia.gpu
+      gpu-arch-type: cuda
+      gpu-arch-version: "12.4"
+      timeout: 60
+      script: |
+        echo "::group::Print machine info"
+        uname -a
+        echo "::endgroup::"
+
+        .ci/scripts/run-docs distributed
+
+        echo "::group::Completion"
+        echo "tests complete"
+        echo "*******************************************"
+        echo "::endgroup::"
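
The new job's script brackets each step with the GitHub Actions `::group::` and `::endgroup::` workflow commands so the log folds per step. As a sketch of how that pairing could be factored out, assuming a helper named `run_in_group` (hypothetical, not part of this commit or of torchchat):

```shell
#!/usr/bin/env bash
# Hypothetical helper: wrap one command in a collapsible log group
# and pass its exit status through to the caller.
run_in_group() {
  local title="$1"
  shift
  local status=0
  echo "::group::${title}"
  "$@" || status=$?
  echo "::endgroup::"
  return "$status"
}

# Same step as in the workflow: print machine info inside a group.
rc=0
output="$(run_in_group "Print machine info" uname -a)" || rc=$?

first="$(printf '%s\n' "$output" | head -n 1)"
last="$(printf '%s\n' "$output" | tail -n 1)"

printf '%s\n' "$output"
```

Closing the group before returning keeps the log well-formed even when the wrapped command fails, while the non-zero status still reaches the caller.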
