Run NCCL tests on the JAX-specific base container #4497

Triggered via pull request June 6, 2025 14:27
Status Failure
Total duration 2h 12m 22s
Artifacts 43

ci.yaml

on: pull_request
metadata (0s)
bump-manifest (24s)
Matrix: amd64 / test-distribution
Matrix: arm64 / test-distribution
amd64 / build-base / build-base (3m 17s)
arm64 / build-base / build-base (3m 47s)
amd64 / test-nccl / build-mpi-operator-compatible-base (1m 53s)
arm64 / test-nccl / build-mpi-operator-compatible-base
Matrix: amd64 / test-jax / run-unit-test
Matrix: amd64 / test-te-a100 / run-unit-test
Matrix: amd64 / test-te-h100 / te-test-h100
amd64 / test-jax / runner / launch-slurm-runner (26m 44s)
amd64 / test-nsys-jax-eks (5m 17s)
amd64 / test-te-a100 / runner / launch-slurm-runner (47m 0s)
amd64 / build-maxtext (9m 3s)
amd64 / build-upstream-t5x (6m 37s)
Matrix: amd64 / test-nsys-jax / run-unit-test
amd64 / test-nsys-jax / runner / launch-slurm-runner (36m 35s)
Matrix: amd64 / test-nccl / nccl-test
Matrix: arm64 / test-jax / run-unit-test (waiting for pending jobs)
Matrix: arm64 / test-te-a100 / run-unit-test (waiting for pending jobs)
Matrix: arm64 / test-te-h100 / te-test-h100 (waiting for pending jobs)
arm64 / test-nsys-jax-eks (0s)
arm64 / test-jax / runner / launch-slurm-runner
arm64 / test-te-a100 / runner / launch-slurm-runner
arm64 / build-upstream-t5x (9m 16s)
arm64 / build-axlearn (7m 2s)
Matrix: arm64 / test-nsys-jax / run-unit-test (waiting for pending jobs)
arm64 / test-nsys-jax / runner / launch-slurm-runner
Matrix: arm64 / test-nccl / nccl-test (waiting for pending jobs)
Matrix: amd64 / test-maxtext / maxtext-multinode
Matrix: amd64 / test-maxtext / single-process-multi-device
amd64 / build-rosetta-t5x / build-rosetta (14m 31s)
amd64 / test-axlearn-eks (19m 22s)
amd64 / test-axlearn-fuji-models-eks (13m 1s)
Matrix: amd64 / test-nsys-jax-archive
Matrix: arm64 / test-maxtext / maxtext-multinode (waiting for pending jobs)
Matrix: arm64 / test-maxtext / single-process-multi-device (waiting for pending jobs)
arm64 / build-rosetta-t5x / build-rosetta (16m 21s)
arm64 / test-axlearn-eks (0s)
arm64 / test-axlearn-fuji-models-eks (0s)
Matrix: arm64 / test-nsys-jax-archive
amd64 / test-maxtext / test-maxtext-metrics (25s)
amd64 / collect-docker-tags (0s)
Matrix: amd64 / test-rosetta-t5x / vit-multi-gpu-multi-node
arm64 / test-maxtext / test-maxtext-metrics
arm64 / collect-docker-tags (0s)
Matrix: arm64 / test-rosetta-t5x / vit-multi-gpu-multi-node (waiting for pending jobs)
amd64 / test-maxtext / test-maxtext-sitrep / sitrep (11s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-summary (0s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-metrics (13s)
arm64 / test-maxtext / test-maxtext-sitrep / sitrep
arm64 / test-rosetta-t5x / test-t5x-rosetta-summary
arm64 / test-rosetta-t5x / test-t5x-rosetta-metrics
amd64 / test-maxtext / test-maxtext-outcome (0s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-sitrep / sitrep (8s)
arm64 / test-maxtext / test-maxtext-outcome
arm64 / test-rosetta-t5x / test-t5x-rosetta-sitrep / sitrep
amd64 / test-rosetta-t5x / test-t5x-rosetta-outcome (0s)
arm64 / test-rosetta-t5x / test-t5x-rosetta-outcome
make-publish-configs (3s)
merge-new-manifest (0s)
Matrix: publish-containers
finalize / workflow-badge (9s)
finalize / report (15s)
finalize / upload-badge (5s)
finalize / publish-badge (3s)

Annotations

8 warnings
amd64 / test-nccl / nccl-test (all_reduce_perf_mpi)
This self-hosted runner is currently using runner version 2.323.0. This version is out of date. Please update to the latest version 2.325.0
amd64 / test-nccl / nccl-test (reduce_scatter_perf_mpi)
This self-hosted runner is currently using runner version 2.323.0. This version is out of date. Please update to the latest version 2.325.0
amd64 / test-nccl / nccl-test (broadcast_perf_mpi)
This self-hosted runner is currently using runner version 2.323.0. This version is out of date. Please update to the latest version 2.325.0
amd64 / test-nccl / nccl-test (all_gather_perf_mpi)
This self-hosted runner is currently using runner version 2.323.0. This version is out of date. Please update to the latest version 2.325.0
amd64 / test-nsys-jax-eks
This self-hosted runner is currently using runner version 2.323.0. This version is out of date. Please update to the latest version 2.325.0
amd64 / test-te-h100 / te-test-h100 (unittest, 8)
This self-hosted runner is currently using runner version 2.323.0. This version is out of date. Please update to the latest version 2.325.0
amd64 / test-axlearn-fuji-models-eks
This self-hosted runner is currently using runner version 2.323.0. This version is out of date. Please update to the latest version 2.325.0
amd64 / test-axlearn-eks
This job failure may be caused by using an out of date version of GitHub runner on your self-hosted runner. You are currently using GitHub runner version 2.323.0. Please update to the latest version 2.325.0

Artifacts

Produced during runtime
Name | Size | Digest
artifact-axlearn-build-amd64 (Expired) | 566 Bytes | sha256:356fabd0a330fda9ac82ac8998599207910a8ac1bd7ce1d10eb807e1170dee35
artifact-axlearn-build-arm64 (Expired) | 567 Bytes | sha256:cd21bbf731ccc4d363f317f043ceca24fd315e893760f252d48d0450bc1fb20d
artifact-axlearn-test (Expired) | 72.2 KB | sha256:7ff4c34f55a617abbfccb0fe2448e21bb986fb595b1979369ff7a91b2ec5ee5e
artifact-base-build-amd64 (Expired) | 567 Bytes | sha256:4c3582488fcc2ffab4deedd089badf64f4e733f253930304ffb6df6bd79af8d5
artifact-base-build-arm64 (Expired) | 567 Bytes | sha256:e136e70614e4f5ca6313b055dff9a7e1cf2b7188513f09c4ea5f02c1c0b3c77a
artifact-equinox-build-amd64 (Expired) | 568 Bytes | sha256:82e4b7ce4b9b48066e5f43d2115176b7d6a3a5acebc96cab23b52ba90a743d25
artifact-equinox-build-arm64 (Expired) | 570 Bytes | sha256:20dfee87d539f6641a422f7198113f6d781c7bbffe37f7f89662cd5482649b44
artifact-final-report (Expired) | 4 KB | sha256:d65cfa92fb06b95a83f00b1359f7b3cf045ae2bba3f10162c1e94f861c070e99
artifact-jax-build-amd64 (Expired) | 553 Bytes | sha256:7c3ebcab6118be385828df9a9ca37757e999e3c6644c26a9578286d9b399f1ba
artifact-jax-build-arm64 (Expired) | 553 Bytes | sha256:639d72b7759c2366d251d7f0d9073a853860abb2b8f5e8eec7c5399ce9d66dd7
artifact-maxtext-build-amd64 (Expired) | 567 Bytes | sha256:323ebfbf9503415477db5e18bfeabb5eac4c64d08ae8f0abdb60427f9eb47295
artifact-maxtext-build-arm64 (Expired) | 568 Bytes | sha256:dfc5205879f49d2711fe2001288f4025e7eeab7f16f84e526b7fed9e79ff4004
artifact-maxtext-test (Expired) | 1.83 KB | sha256:8bd5d632d9023fd91b484854afcdf209d6e2b449ee6e729988f16fda281c02e3
artifact-mpi-operator-compatible-base-build-amd64 (Expired) | 638 Bytes | sha256:2950c91178f9113890204a7d95fab34ae796f613b91c5dea5c2d88f910ccc62e
artifact-rosetta-build-t5x-amd64 (Expired) | 584 Bytes | sha256:e57b84a0c30f013fb4bf8fa9224dabc0245279992311aac98545d2b50bfc0a0a
artifact-rosetta-build-t5x-arm64 (Expired) | 584 Bytes | sha256:af2aafd5997d0ba2f0711075237cbabe3e6b80bdadb8428ba7cf922ff8e43e9d
artifact-rosetta-t5x-mgmn-test (Expired) | 1.28 KB | sha256:13bdb8410d099ff2b2ce70b69307ee9dde48416a2a5e3973f10d824eca68c53b
artifact-t5x-build-amd64 (Expired) | 569 Bytes | sha256:55fb27e81dceead5a4ada8d4b29d21b34e441947ab2a983a366cc3d87a0e5adc
artifact-t5x-build-arm64 (Expired) | 567 Bytes | sha256:74e85dba3d8031fb94d0ab417caef6e76af1d6b5411185b9140699504cde94d7
artifact-workflow-metadata (Expired) | 278 Bytes | sha256:c17c4b4d58aeb57528804e9855359fe00b2cbf0a1328ce130e64a304c14cf763
bumped-manifest (Expired) | 46.9 KB | sha256:2b18c8b527d30dea5fcf12e787c123c34a516fbeb185bb00fceb26771ae1d1e3
final-axlearn (Expired) | 263 Bytes | sha256:b11288999ebebdb07a9c686a5de174daf08745dfdccac3b00c3a69d87452a9aa
final-base (Expired) | 254 Bytes | sha256:49390bf89b7b89dc5d52ace772f091bebba1d9efdc88b97453a9a665bf329527
final-equinox (Expired) | 263 Bytes | sha256:006802c86614ee51be1bb7f7c318c4700e2fd3b3e9f37792692177a1739ead51
final-jax (Expired) | 251 Bytes | sha256:959a900c59b4a088cb7424b94001edf7703c115c6deac3401e6f1bd2bd94e82e
final-maxtext (Expired) | 263 Bytes | sha256:e78dcfc4860e5a9be87941d12f9fd40804866939653fcea12ce33315a4e53254
final-t5x (Expired) | 251 Bytes | sha256:bfe08b0a8425cd26ea358c569784bffbd558a385fcc9d9a98afd06559e5acff1
final-upstream-t5x (Expired) | 277 Bytes | sha256:0f69f9066629c41b968fe0784afc3c64dd27f7c0a669f83467060a3a92213ed3
jax-unit-test-A100 (Expired) | 20.3 KB | sha256:546bf44af02f9aac167974103c6334ec920cc608338ad43d0341fde9ab767de8
mealkit-axlearn (Expired) | 271 Bytes | sha256:4e835b30d9025f3661770122dc5b608812732d3f0a6daecc7a1cc8c9cc64e0d4
mealkit-equinox (Expired) | 272 Bytes | sha256:2b4d72071231aa4855a75f4886ae9e52ee7648435c8010ad3c6817bb3889af76
mealkit-jax (Expired) | 261 Bytes | sha256:59de82662c992abdf81ac03acda93cddd907e0afc235802d735cd81006a5e176
mealkit-maxtext (Expired) | 271 Bytes | sha256:2f389775be56de9c8050407134c9d1528afa438b735a484fc4a92558a616c9fb
mealkit-t5x (Expired) | 261 Bytes | sha256:eed664c5b777ad92ca0773d173018e487dc512bd247263a2eb35b83bc6b75304
mealkit-upstream-t5x (Expired) | 286 Bytes | sha256:ca4411b1ffa9875648cad7b8edaeb56923198142f8eb566c32f4d2ea20912a91
nsys-jax-unit-test-A100 (Expired) | 31.8 MB | sha256:64ed9477e83d8adbf876c9bc13cf2e1d8521774334c586ca5316304ee9250428
rosetta-t5x-metrics-test-log (Expired) | 1.03 KB | sha256:3188dc17d76725ca0ef849cb807cb6a5a6f8092ee43f5a70a63d7397fe0a5b29
rosetta-t5x-vit-15492909453-VIT8G1N (Expired) | 31.6 KB | sha256:90a833aa0c4ffbae6d3450219da4f459501012fe7242510450031bcc59a7da21
te-unit-test-A100 (Expired) | 815 KB | sha256:2301cba3c66f4fc41f4f709888dd0ae490a201a5d35c7dcff50aef10868639ed
te-unit-test-H100 (Expired) | 983 KB | sha256:8814688166a516808756cbd0fe5333d95f6cc368140d8e6f2b966430e477a541
upstream-maxtext-15492909453-1DP2FSDP4TP1PP_single_process (Expired) | 20.3 KB | sha256:d78e5bea5fc6fce2757e70d9c3bc36c0972feca30aa686e21e43a665ac3ecb3a
upstream-maxtext-15492909453-2DP2FSDP2TP1PP (Expired) | 32.9 KB | sha256:bd6f92cbaaece71867d4ff43d7e1615c74860a6e79e33b3084d6b115e755fad7
upstream-maxtext-metrics-test-log (Expired) | 1.82 KB | sha256:1675bda5a0f94373322d92a1a195b97dfe909ab1fba26b82987908b8f8401c00