
CI

CI #5018

Triggered via schedule November 5, 2025 09:35
Status Failure
Total duration 2h 36m 18s
Artifacts 53

ci.yaml

on: schedule
metadata (3s)
bump-manifest (18s)
Matrix: amd64 / test-distribution
Matrix: arm64 / test-distribution
amd64 / build-base / build-base (2m 32s)
arm64 / build-base / build-base (3m 1s)
amd64 / test-nccl / build-mpi-operator-compatible-base (1m 37s)
amd64 / test-nccl / nccl-test-gke / build-nccl-gke (1m 54s)
arm64 / test-nccl / build-mpi-operator-compatible-base
arm64 / test-nccl / nccl-test-gke / build-nccl-gke
Matrix: amd64 / test-jax-cutlass-h100 / jax-cutlass-test-h100
Matrix: amd64 / test-jax / run-unit-test
Matrix: amd64 / test-te-a100 / run-unit-test
Matrix: amd64 / test-te-h100 / te-test-h100
amd64 / test-jax / runner / launch-slurm-runner (1h 39m)
amd64 / test-nsys-jax-eks (39m 17s)
amd64 / test-te-a100 / runner / launch-slurm-runner (1h 27m)
amd64 / build-upstream-t5x (6m 56s)
amd64 / build-axlearn (6m 0s)
Matrix: amd64 / test-nsys-jax / run-unit-test
amd64 / test-nsys-jax / runner / launch-slurm-runner (1h 0m)
Matrix: amd64 / test-nccl / nccl-test
Matrix: amd64 / test-nccl / nccl-test-gke / nccl-gke
Matrix: arm64 / test-jax-cutlass-h100 / jax-cutlass-test-h100 (waiting for pending jobs)
Matrix: arm64 / test-jax / run-unit-test (waiting for pending jobs)
Matrix: arm64 / test-te-a100 / run-unit-test (waiting for pending jobs)
Matrix: arm64 / test-te-h100 / te-test-h100 (waiting for pending jobs)
arm64 / test-nsys-jax-eks (0s)
arm64 / test-jax / runner / launch-slurm-runner
arm64 / test-te-a100 / runner / launch-slurm-runner
arm64 / build-upstream-t5x (11m 16s)
Matrix: arm64 / test-nsys-jax / run-unit-test (waiting for pending jobs)
arm64 / test-nsys-jax / runner / launch-slurm-runner
Matrix: arm64 / test-nccl / nccl-test (waiting for pending jobs)
Matrix: arm64 / test-nccl / nccl-test-gke / nccl-gke (waiting for pending jobs)
amd64 / test-maxtext-gke / maxtext-gke-xpk (9m 41s)
Matrix: amd64 / test-maxtext / maxtext-multinode
Matrix: amd64 / test-maxtext / single-process-multi-device
amd64 / build-rosetta-t5x / build-rosetta (13m 20s)
amd64 / test-axlearn-eks (24m 37s)
amd64 / test-axlearn-fuji-models-eks (13m 39s)
Matrix: amd64 / test-nsys-jax-archive
arm64 / test-maxtext-gke / maxtext-gke-xpk
Matrix: arm64 / test-maxtext / maxtext-multinode (waiting for pending jobs)
Matrix: arm64 / test-maxtext / single-process-multi-device (waiting for pending jobs)
arm64 / build-rosetta-t5x / build-rosetta (16m 2s)
arm64 / test-axlearn-eks (0s)
arm64 / test-axlearn-fuji-models-eks (0s)
Matrix: arm64 / test-nsys-jax-archive
amd64 / test-maxtext / test-maxtext-metrics (51s)
amd64 / collect-docker-tags (3s)
Matrix: amd64 / test-rosetta-t5x / vit-multi-gpu-multi-node
arm64 / test-maxtext / test-maxtext-metrics
arm64 / collect-docker-tags (4s)
Matrix: arm64 / test-rosetta-t5x / vit-multi-gpu-multi-node (waiting for pending jobs)
amd64 / test-maxtext / test-maxtext-sitrep / sitrep (24s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-summary (2s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-metrics (21s)
arm64 / test-maxtext / test-maxtext-sitrep / sitrep
arm64 / test-rosetta-t5x / test-t5x-rosetta-summary
arm64 / test-rosetta-t5x / test-t5x-rosetta-metrics
amd64 / test-maxtext / test-maxtext-outcome (2s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-sitrep / sitrep (11s)
arm64 / test-maxtext / test-maxtext-outcome
arm64 / test-rosetta-t5x / test-t5x-rosetta-sitrep / sitrep
amd64 / test-rosetta-t5x / test-t5x-rosetta-outcome (7s)
arm64 / test-rosetta-t5x / test-t5x-rosetta-outcome
make-publish-configs (3s)
merge-new-manifest (13s)
Matrix: publish-containers
finalize / workflow-badge (5s)
finalize / report (32s)
finalize / upload-badge (20s)
finalize / publish-badge (4s)

Annotations

5 errors and 2 warnings
amd64 / test-te-h100 / te-test-h100 (unittest, 8)
Process completed with exit code 1.
amd64 / test-te-a100 / te-A100-unit-test
The self-hosted runner lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
amd64 / test-rosetta-t5x / test-t5x-rosetta-metrics
Process completed with exit code 1.
amd64 / test-rosetta-t5x / test-t5x-rosetta-outcome
Process completed with exit code 1.
amd64 / test-maxtext / test-maxtext-outcome
Process completed with exit code 1.
merge-new-manifest
Unexpected input(s) 'owner_and_repo', valid inputs are ['route', 'mediaType']
merge-new-manifest
Unexpected input(s) 'owner_and_repo', 'head', 'base', 'body', 'title', 'draft', valid inputs are ['route', 'mediaType']

Artifacts

Produced during runtime
Name Size Digest
artifact-axlearn-build-amd64
566 Bytes
sha256:5332077ed29bd0786e39e3c06353d053e9620229e49ac0f893d268aa5d9b0b09
artifact-axlearn-build-arm64
568 Bytes
sha256:e476703ecd010e761d1c8733896eeebe3787f9e5d66addeaa1f69fb75b80f512
artifact-axlearn-test
178 KB
sha256:e95a220c6a91a3892a20cc2ae0478785310bc3d1253ee46f4538a53830ede1c1
artifact-base-build-amd64
567 Bytes
sha256:0adcb113072356b3f9c68e09cc371e02a466ec730fe9972470268e57592b6576
artifact-base-build-arm64
566 Bytes
sha256:b730cb66fb34b39a8e303784c055d23b3d7b6c16c04599ed3920d5b6e0b6746e
artifact-equinox-build-amd64
568 Bytes
sha256:7a5ba398ef2141908e3483c8a0d37d68be28d2118726a36dd4090155d55fb92e
artifact-equinox-build-arm64
569 Bytes
sha256:a6e670a85e2b7e0d382f384f664b5949a9cde5e530565f9ece668c1dcbd5993e
artifact-final-report
3.9 KB
sha256:7892aab7210ffc0307f7c4ca752f1ccd472f76949bddcb32cc63da46372bb5ba
artifact-jax-build-amd64
553 Bytes
sha256:53226cf81ae99774cfb68caf6beea482b20485e05d2fefd7fb2ff3d7a1aed266
artifact-jax-build-arm64
554 Bytes
sha256:2cc77c4e2c9b070f692a3e2ebfd7b1f57b88c66b66728b3780cebe49381ec063
artifact-maxtext-build-amd64
568 Bytes
sha256:008a03e903a59d6baab13b76af2ac0adb536b1f2f62c103ddd5812fe497530cf
artifact-maxtext-build-arm64
568 Bytes
sha256:b973c13c527cd4c4c1258bf0dec621c255e4b59c8df0a68cd115c7cd41d77d3b
artifact-maxtext-test
1.47 KB
sha256:e240775fa9851ea43d3123f4f633975d334ba00622577beb9786235dae487a6b
artifact-mpi-operator-compatible-base-build-amd64
639 Bytes
sha256:18cbf79284c9ba6ca137fc6c6cfd17f56d8331612b968190f0e91899f9a97315
artifact-nccl-gke-build-amd64
571 Bytes
sha256:dc5b8ab80137b21c4473b9c842361ca0995d9b5a49e3ab1de76dbdecb4de004a
artifact-rosetta-build-t5x-amd64
584 Bytes
sha256:f0b7f7426fe1734e5cd98f14733f4b650ca0ff4246a988da961a3eba7aa4186d
artifact-rosetta-build-t5x-arm64
585 Bytes
sha256:5bbc3901dbfbc438f90c02e395e73e1f3803e993c3680d2c73c662b58775c564
artifact-rosetta-t5x-mgmn-test
624 Bytes
sha256:46f8872842ffbb35173b666a247772023fb82ddec2b97b382edded9fe63beca4
artifact-t5x-build-amd64
568 Bytes
sha256:e3e568df2125a12bc25b69441f55f2f689d7b28fe6ab88f5c42fa950e5fa6c9e
artifact-t5x-build-arm64
567 Bytes
sha256:f302f835a26cebb8192e1a76844c48ec9cdef9fcb613ca2db926c95dfebc2d32
artifact-workflow-metadata
278 Bytes
sha256:d276e6f553dd393db02d30dbb2d4d0b3819d42d5e171a5e85a50024d9e38d72f
bumped-manifest
51.5 KB
sha256:32afd9a18480a68b7b6d3dad2f2febfc4567b60c4bb5342a0a88d18e6aaab945
final-axlearn
258 Bytes
sha256:625e3be724de9a6039ea4b0a920eb3c3331ca0bcc10336ce16cfb0e557acaa06
final-base
249 Bytes
sha256:bc690366635a4cefbdb5d51158b549d0cefdc3ee4b47b8e1d3a3df78c6bf2e49
final-equinox
258 Bytes
sha256:7442a95ba87417bccac7775a7ec3f0c1e0f5eb089895ad093f50763573483cc6
final-jax
246 Bytes
sha256:3791b6fbbb142553aac91661519c99a35ba7ce598301e57fd7383199ac7e4886
final-maxtext
258 Bytes
sha256:6a30fe8c47166f207dcf8a0d42d533524ab3b4e4cea15ac9f11832945467a0d7
final-t5x
246 Bytes
sha256:b78aa8d52aa8a1dcccf80f5b2a58970c8162708dcae0ea4f1e6139f51b8999b7
final-upstream-t5x
273 Bytes
sha256:2a5c401227f2731bb1e7b10acce831f94604e2febfedf1e8a56f01f2c6a6b70f
gke-maxtext-train
368 MB
sha256:33fdf11a4799ad91dbd238c77669b6dbbaeb62b7d25091b43d8470cfab8d647c
gke-maxtext-train-sitrep
228 Bytes
sha256:9ee08f1e81c16cb9686b432d3bfe0ef126a81ce4d16bdf8f0bb4b4c4788bb0c1
jax-cutlass-test-H100
1.24 KB
sha256:4b6401e098329942a12f18a9ab01f8bb9bbd6ec34fa8c2c98ae0fefdcc839045
jax-unit-test-A100
22.3 KB
sha256:c6c5ef22e02de097d7cf206d0bd28007148e3e036499330b7ed75c0f4eaa170d
mealkit-axlearn
268 Bytes
sha256:75582554293e2d396d1b628ade8e7102abcb46b0d8c236a944b4aaea80e6a691
mealkit-equinox
269 Bytes
sha256:3cb0f823531759c98b67aac1fca47c5dacf2402ed06a44a789009e59e049f815
mealkit-jax
256 Bytes
sha256:65a334808d421191c5ef1eecad2978b2a70c67806d6ae7a81cd08d0dd46aa1c3
mealkit-maxtext
268 Bytes
sha256:ce1349a13ee97f00e44cae04e84c01443789813919c69a4abbf12521e02ad243
mealkit-t5x
258 Bytes
sha256:484d03db1a10b218ab6d21f61c43f3d931ea9a55d6537d10d6c6c1cc12c8e4a0
mealkit-upstream-t5x
283 Bytes
sha256:d1cdbf7ac9d269cdd1a7d2a5609ca2ec647fc156f6aea3f4d2964aaf8a4aab74
nccl-gke-all-gather
15.4 KB
sha256:e192814ee632a72014247aeb7c33603b188692f348c68f6221895a7fa8bd7d47
nccl-gke-all-gather-sitrep
231 Bytes
sha256:2a29b9dce5cbda40005ad710e566b45f0aff90f08da4b2b5ecb58c41d8e4a8ad
nccl-gke-all-reduce
15.6 KB
sha256:bd73dc91893d2db0d4bc333383cbc77ef8fd0a3f606e635b22865c552ea75b22
nccl-gke-all-reduce-sitrep
231 Bytes
sha256:9c9578977d5980ebbbf4494d9d0b0eef7f2c583e0651c6107511df2703741266
nccl-gke-broadcast
15.2 KB
sha256:98abfb42669c68cd5d5fc560f94caf19d8312782ea25beb61c811025ddbf5e71
nccl-gke-broadcast-sitrep
229 Bytes
sha256:093990d9b04c5c99f6c92e66382e420138e684f63903f50f70b374b55935be3d
nccl-gke-reduce-scatter
15.5 KB
sha256:746b3a86a651c204ee055e0e2be3786572dc90e731ed040a24fd4ad7a73c08cf
nccl-gke-reduce-scatter-sitrep
234 Bytes
sha256:9a5013fd2b784effb5cfb9dd0d6b755ab80478c87da832e25a8bdc63d56e1396
nsys-jax-unit-test-A100
126 MB
sha256:74229ee55b6c83c0d48bc11bb305b76813c50d5c0f36bdc8a7f3e5f1f6334a16
rosetta-t5x-vit-19097543395-VIT8G1N
15.7 KB
sha256:7ee7a0c2674f842c91b46074ad8dd98da315c7ca788df67b579448594a872563
te-unit-test-H100
2.09 MB
sha256:137f400bb636644c01bf284589a77c40297809b868274be4a96ad3f7ec78f543
upstream-maxtext-19097543395-1DP2FSDP4TP1PP_single_process
23.2 KB
sha256:1713ed90a720501afaa68a642724b9a8eac459b5291f79e37dac4d65a0cc5a9c
upstream-maxtext-19097543395-2DP2FSDP2TP1PP
26.9 KB
sha256:f95e51d16ab0fe211ca0d6e0fdca3427dd82866a2d0acd8e333fb1eba2f6842f
upstream-maxtext-metrics-test-log
2.53 KB
sha256:039918a2c2fb141462de85a84170a9ec6462936eea5ae736127bffe2524173d2
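
Each artifact above is listed with a `sha256:` digest, so a downloaded copy can be checked against the list. The following is a minimal sketch of such a check. It assumes the listed digest is the SHA-256 of the downloaded artifact archive as a whole, which this page does not state. The helper names `sha256_digest` and `matches_listed_digest` are hypothetical and introduced only for illustration.

```python
import hashlib


def sha256_digest(path, chunk_size=1 << 20):
    """Return the digest of a file in the 'sha256:<hex>' form used in the list above."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large artifacts (hundreds of MB) are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return "sha256:" + hasher.hexdigest()


def matches_listed_digest(path, listed):
    """Compare a local file against a digest string copied from the artifact list.

    Assumption: the listed digest covers the downloaded archive file itself.
    """
    return sha256_digest(path) == listed.strip().lower()


# Example call (the local file name is hypothetical):
# matches_listed_digest(
#     "artifact-final-report.zip",
#     "sha256:7892aab7210ffc0307f7c4ca752f1ccd472f76949bddcb32cc63da46372bb5ba",
# )
```

A mismatch would mean either that the download is corrupted or that the assumption about what the digest covers does not hold for that artifact.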