CI #5019

Triggered via schedule November 6, 2025 09:34
Status Failure
Total duration 3h 34m 38s
Artifacts 53

ci.yaml

on: schedule
metadata (3s)
bump-manifest (16s)
Matrix: amd64 / test-distribution
Matrix: arm64 / test-distribution
amd64 / build-base / build-base (2m 38s)
arm64 / build-base / build-base (3m 11s)
amd64 / test-nccl / build-mpi-operator-compatible-base (1m 32s)
amd64 / test-nccl / nccl-test-gke / build-nccl-gke (2m 53s)
arm64 / test-nccl / build-mpi-operator-compatible-base
arm64 / test-nccl / nccl-test-gke / build-nccl-gke
Matrix: amd64 / test-jax-cutlass-h100 / jax-cutlass-test-h100
Matrix: amd64 / test-jax / run-unit-test
Matrix: amd64 / test-te-a100 / run-unit-test
Matrix: amd64 / test-te-h100 / te-test-h100
amd64 / test-jax / runner / launch-slurm-runner (28m 57s)
amd64 / test-nsys-jax-eks (4m 7s)
amd64 / test-te-a100 / runner / launch-slurm-runner (2h 29m)
amd64 / build-upstream-t5x (7m 25s)
amd64 / build-axlearn (5m 6s)
Matrix: amd64 / test-nsys-jax / run-unit-test
amd64 / build-equinox (4m 3s)
amd64 / test-nsys-jax / runner / launch-slurm-runner (2h 41m)
Matrix: amd64 / test-nccl / nccl-test
Matrix: amd64 / test-nccl / nccl-test-gke / nccl-gke
Matrix: arm64 / test-jax-cutlass-h100 / jax-cutlass-test-h100 (Waiting for pending jobs)
Matrix: arm64 / test-jax / run-unit-test (Waiting for pending jobs)
Matrix: arm64 / test-te-a100 / run-unit-test (Waiting for pending jobs)
Matrix: arm64 / test-te-h100 / te-test-h100 (Waiting for pending jobs)
arm64 / test-nsys-jax-eks (0s)
arm64 / test-jax / runner / launch-slurm-runner
arm64 / test-te-a100 / runner / launch-slurm-runner
arm64 / build-upstream-t5x (8m 52s)
Matrix: arm64 / test-nsys-jax / run-unit-test (Waiting for pending jobs)
arm64 / test-nsys-jax / runner / launch-slurm-runner
Matrix: arm64 / test-nccl / nccl-test (Waiting for pending jobs)
Matrix: arm64 / test-nccl / nccl-test-gke / nccl-gke (Waiting for pending jobs)
amd64 / test-maxtext-gke / maxtext-gke-xpk (9m 25s)
Matrix: amd64 / test-maxtext / maxtext-multinode
Matrix: amd64 / test-maxtext / single-process-multi-device
amd64 / build-rosetta-t5x / build-rosetta (14m 11s)
amd64 / test-axlearn-eks (30m 24s)
amd64 / test-axlearn-fuji-models-eks (5m 22s)
Matrix: amd64 / test-nsys-jax-archive
arm64 / test-maxtext-gke / maxtext-gke-xpk
Matrix: arm64 / test-maxtext / maxtext-multinode (Waiting for pending jobs)
Matrix: arm64 / test-maxtext / single-process-multi-device (Waiting for pending jobs)
arm64 / build-rosetta-t5x / build-rosetta (16m 4s)
arm64 / test-axlearn-eks (0s)
arm64 / test-axlearn-fuji-models-eks (0s)
Matrix: arm64 / test-nsys-jax-archive
amd64 / test-maxtext / test-maxtext-metrics (23s)
amd64 / collect-docker-tags (3s)
Matrix: amd64 / test-rosetta-t5x / vit-multi-gpu-multi-node
arm64 / test-maxtext / test-maxtext-metrics
arm64 / collect-docker-tags (4s)
Matrix: arm64 / test-rosetta-t5x / vit-multi-gpu-multi-node (Waiting for pending jobs)
amd64 / test-maxtext / test-maxtext-sitrep / sitrep (19s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-summary (2s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-metrics (21s)
arm64 / test-maxtext / test-maxtext-sitrep / sitrep
arm64 / test-rosetta-t5x / test-t5x-rosetta-summary
arm64 / test-rosetta-t5x / test-t5x-rosetta-metrics
amd64 / test-maxtext / test-maxtext-outcome (3s)
amd64 / test-rosetta-t5x / test-t5x-rosetta-sitrep / sitrep (25s)
arm64 / test-maxtext / test-maxtext-outcome
arm64 / test-rosetta-t5x / test-t5x-rosetta-sitrep / sitrep
amd64 / test-rosetta-t5x / test-t5x-rosetta-outcome (9s)
arm64 / test-rosetta-t5x / test-t5x-rosetta-outcome
make-publish-configs (3s)
merge-new-manifest (10s)
Matrix: publish-containers
finalize / workflow-badge (4s)
finalize / report (13s)
finalize / upload-badge (23s)
finalize / publish-badge (6s)

Annotations

5 errors and 2 warnings
amd64 / test-te-h100 / te-test-h100 (unittest, 8): Process completed with exit code 1.
amd64 / test-te-a100 / te-A100-unit-test: The self-hosted runner lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
amd64 / test-rosetta-t5x / test-t5x-rosetta-metrics: Process completed with exit code 1.
amd64 / test-rosetta-t5x / test-t5x-rosetta-outcome: Process completed with exit code 1.
amd64 / test-maxtext / test-maxtext-outcome: Process completed with exit code 1.
merge-new-manifest: Unexpected input(s) 'owner_and_repo', valid inputs are ['route', 'mediaType']
merge-new-manifest: Unexpected input(s) 'owner_and_repo', 'head', 'base', 'body', 'title', 'draft', valid inputs are ['route', 'mediaType']

Artifacts

Produced during runtime
Name | Size | Digest
artifact-axlearn-build-amd64 | 565 Bytes | sha256:76e33e7613ee1359f022cd243fc713556148db8a181c842ebd23161da023a2f8
artifact-axlearn-build-arm64 | 567 Bytes | sha256:3de821382fcab9e8b90f4421ceff111e089e273e4fee10e99df76a1f2d6465b5
artifact-axlearn-test | 181 KB | sha256:d3aa9424575d1e65b9d75bf6a0876f5a3c0f8bf856a212c97abde0780fa2b2d8
artifact-base-build-amd64 | 569 Bytes | sha256:8b8ba1478f1cfd622ebb942f43ee9fca6063ad76b89d559665706a26a6b97400
artifact-base-build-arm64 | 565 Bytes | sha256:f77a50dd2482f8887348a693f2f3633d8f7e7091f5df4362ea334ad7d1b5bdf7
artifact-equinox-build-amd64 | 568 Bytes | sha256:2f7a988fdd677f54845511b75391de61ff9793ec0a9093ca403275f4a14c44c1
artifact-equinox-build-arm64 | 568 Bytes | sha256:076f56695a5a48a405c8d3aa16865f2870f38a81f40398d4165faa6b71b784fc
artifact-final-report | 3.9 KB | sha256:46de043e965dc3ba1b29fbbc7f31dab0af99694de5967d27bd343da66cf4c161
artifact-jax-build-amd64 | 553 Bytes | sha256:007fc0bac2162c62e833a802840bc74ca2b39ab98f7ae4a2c6f0a22dd8cc5025
artifact-jax-build-arm64 | 554 Bytes | sha256:2dcd8c318eb4567b07b51a86ec717ed24e969a20440c2890ea587cd608432f0f
artifact-maxtext-build-amd64 | 568 Bytes | sha256:74d3b4fd983e154489723ab540828b36948327a273ca3d6f17d5e9037ce02c24
artifact-maxtext-build-arm64 | 569 Bytes | sha256:d3e1d1342d94dd2660552c13b172dccda2b53d611d49b4930547aa74c4ee9959
artifact-maxtext-test | 1.46 KB | sha256:e48c2297ab5907c24d57d568b05e620295cb982317825a34f66948d7dfdbe3a4
artifact-mpi-operator-compatible-base-build-amd64 | 639 Bytes | sha256:d962f5df7b517072be37bfbcd35c488856c970834f95c59655079b4ce0ee4858
artifact-nccl-gke-build-amd64 | 571 Bytes | sha256:fe153308e79a1305988cb9c43d8ccec78a836bb2097d8a743742fadce0d6eca7
artifact-rosetta-build-t5x-amd64 | 585 Bytes | sha256:ae57f9dbd0c3e24905c68fe4ae68bc4d8ccee191de67e096097c9a7789ad64a7
artifact-rosetta-build-t5x-arm64 | 585 Bytes | sha256:091b5bbdcc572416e3a7717651b620572505eca1af189dd561687c4929a8a754
artifact-rosetta-t5x-mgmn-test | 624 Bytes | sha256:3b77974ea18b9e9d8be1261b629a9bf6c2dda8e7261092297763929ef7485c35
artifact-t5x-build-amd64 | 569 Bytes | sha256:49d253a849843e718fe381ee2c8027a6e93414a3a9e600036d1ea8e00b1ecfcb
artifact-t5x-build-arm64 | 568 Bytes | sha256:a5ee361d35c237586e0158d2e2d2b67bfbe9b1c1c446fc28a2c3e15a738a195b
artifact-workflow-metadata | 277 Bytes | sha256:8b60e24a1344befb4418fdc420965fc963a7aee8ee955f5dd2293f9c445e2a71
bumped-manifest | 51.5 KB | sha256:ca25dda8476ab03ddf7a992ff0f136704881cbf56791b721e695d5f0d9dd2041
final-axlearn | 258 Bytes | sha256:07c6bac9c074b8db6bb88c938d23343bbfc501c2b83ebff7890ceea89f557f57
final-base | 249 Bytes | sha256:09f0253aee7338b53bbc427c14770c4da0b6f5f6f2fcc6a7c4722433da638fa7
final-equinox | 258 Bytes | sha256:ad780373837472070603d203e33addc500490c90d3775acdd8300e4672e7e06d
final-jax | 246 Bytes | sha256:0ba324c4e04bd5f400cf5b82d3952f79e7b474065603e45b6dad8bb311664b85
final-maxtext | 258 Bytes | sha256:132fd1b6880592ebf55594a94147d7a74e1e969776b6abb932c71dcda754d7a5
final-t5x | 246 Bytes | sha256:8463049ade984a14824cf87cd6af6003c8cce711448420dc4bda28c71c7937af
final-upstream-t5x | 273 Bytes | sha256:f5038baf738581320ba30d301565db6b429e5f0deab89d6ecd4f829bf301e513
gke-maxtext-train | 366 MB | sha256:de739e6255518292682482a2071c9036b40fe84d3ec149b510457e17ec83eae8
gke-maxtext-train-sitrep | 228 Bytes | sha256:9c894dbdf65bd483f753029e43cf84b5bc8b8ef3a3f13f06a4cbf4a8bf7882a2
jax-cutlass-test-H100 | 1.24 KB | sha256:27386278d455c1ca94a6966c30958d9cb5e11bc6b59aba844e06986a0bc58de9
jax-unit-test-A100 | 22.2 KB | sha256:2f45ccc074db227df778b31077ab9fc08ad308fdd63c4a7efa5e612fa99229f9
mealkit-axlearn | 269 Bytes | sha256:4c201bc65e137273c160f6be0996aecb2968dd8cbcb4a423b8f873fd8d26a863
mealkit-equinox | 270 Bytes | sha256:dea706af60a090779f0b49fa463ddb136749c1e26cff1be2fdb278546bf0ebba
mealkit-jax | 256 Bytes | sha256:a945f12d3b47873a312ede96d11c7fddd2cf670a6302d769fb4c960c7c1f8241
mealkit-maxtext | 268 Bytes | sha256:90b37bf0afdb5c079f4ca22be34b1a797329d444ab902aa281aa70382d5082df
mealkit-t5x | 258 Bytes | sha256:950ec847d2156c0e44abd9aa142894b8fac1febb879611ffc850fd2c8ea97552
mealkit-upstream-t5x | 282 Bytes | sha256:a0ce91088e77fcbb7f65f27450b1a4cf7543dee463cc8028a9c8de61c508f82f
nccl-gke-all-gather | 15.4 KB | sha256:553b530314cb6da52fea3c1ca607a1c5f2c859a04b322e56bc1a0ad21b0ad5b8
nccl-gke-all-gather-sitrep | 231 Bytes | sha256:e12d6c2763f8d998acbb670c57d1eb15bd7a11ac21b8f151f7ffa6457e5910c5
nccl-gke-all-reduce | 15.6 KB | sha256:4725479e411cd30d2c4468d23064f424383a222d373efbf3871d5f6fb4df4fa8
nccl-gke-all-reduce-sitrep | 231 Bytes | sha256:49261de4204a487df0774151ce1203400324938af5873a1b4384ebdc2171e7c3
nccl-gke-broadcast | 15.2 KB | sha256:7ad500559a569a48f19e354e98e310d252e4c370e43c343a702f7554d11076b6
nccl-gke-broadcast-sitrep | 229 Bytes | sha256:253ae2a2d15ae3a52ae8f362573e7bad10287f848c78b2abf64071b0bd6013f3
nccl-gke-reduce-scatter | 15.4 KB | sha256:998b1f9cc7d5cbb185dc53c78ebd97c826c909de70808fd541cad7c083abede9
nccl-gke-reduce-scatter-sitrep | 234 Bytes | sha256:db0f53b6dba1ff0502c21f3781c7bc277f597ab18155c479d4b92839ce8ad441
nsys-jax-unit-test-A100 | 126 MB | sha256:4584da9a989a7601b55bba4aa95dd2a53746450f010a871faedae9c18aaf3f38
rosetta-t5x-vit-19131176479-VIT8G1N | 15.4 KB | sha256:bd3778d1f714792d9202002a590524099bba4ceaed5a30a96c4fae7e3034de34
te-unit-test-H100 | 2.08 MB | sha256:ea08fa7ffcae3f94d772f84addf6b362e691093f13c8293893cc07be4f2bea75
upstream-maxtext-19131176479-1DP2FSDP4TP1PP_single_process | 23.5 KB | sha256:57487e977844abbe884b620c4be36885b3f6277d3f6bd78a56a90878cb4dde71
upstream-maxtext-19131176479-2DP2FSDP2TP1PP | 29 KB | sha256:0d3bdb389f2f42c1a0f1840da9bf612733c18bc1e5203c51e4bce0b05dee0973
upstream-maxtext-metrics-test-log | 2.51 KB | sha256:678ac79a374f435e12fccd34ee890d413def79859872d864fe23d5647fab1b7e
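
The digests listed above can be checked against a locally downloaded artifact. The following is a minimal sketch, assuming the listed digest is the SHA-256 of the downloaded artifact archive (not of the files inside it); the helper names `sha256_digest` and `matches_listed_digest` and the example file name are hypothetical and not part of this workflow.

```python
import hashlib


def sha256_digest(path, chunk_size=1 << 20):
    """Return 'sha256:<hex>' for the file at `path`, reading it in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large artifacts (hundreds of MB) are
        # never loaded into memory all at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return "sha256:" + hasher.hexdigest()


def matches_listed_digest(path, listed):
    """Compare a local file against a digest string copied from the list."""
    return sha256_digest(path) == listed.strip().lower()


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python verify.py artifact-final-report.zip sha256:46de...
    if len(sys.argv) == 3:
        ok = matches_listed_digest(sys.argv[1], sys.argv[2])
        print("match" if ok else "MISMATCH")
```

A mismatch does not by itself indicate corruption if the assumption about what the digest covers is wrong, so it is worth confirming that first on a small artifact.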