(Galaxy) e2e tests #21

Triggered via schedule February 9, 2026 00:37
Status Failure
Total duration 15h 30m 14s
Artifacts 5

galaxy-e2e-tests.yaml

on: schedule
build-artifact / parse-platform: 6s
build-artifact / build-docker-image / check-docker-images: 17s
build-artifact / download-artifacts
build-artifact / determine-python-version: 2s
build-artifact / build-docker-image / 🐳️ Build Ubuntu images: 0s
build-artifact / build-docker-image / 🐳️ Build ManyLinux image: 0s
build-artifact / build-docker-image / 🔄 Update latest tag: 0s
build-artifact / 🛠️ Build Release ubuntu 22.04: 5m 37s
build-artifact / 🐍 Build wheel (Python 3.10) / 🐍 Build wheel (Python 3.10): 5m 24s
galaxy-e2e-tests / load-test-matrix: 6s
Matrix: galaxy-e2e-tests / galaxy-e2e-tests

Annotations

5 errors and 15 warnings
galaxy-e2e-tests / BH Galaxy Fabric Torus Stability Tests
Process completed with exit code 134.
galaxy-e2e-tests / Galaxy Fabric Multi-mesh 2x8 and 2x4s stability tests
The job has exceeded the maximum execution time of 6h0m0s
galaxy-e2e-tests / Galaxy Fabric 2D Torus Nightly Tests
The job has exceeded the maximum execution time of 6h0m0s
galaxy-e2e-tests / Galaxy CCL tests
Process completed with exit code 1.
galaxy-e2e-tests / Galaxy CCL tests: tests/nightly/tg/ccl/test_minimal_reduce_scatter_async.py#L95
test_reduce_scatter_async[wormhole_b0-mesh_device0-2-1-1-fabric_ring-mem_config_input0-mem_config_rs0-batch_1_sd35_prompt-check-4links]
RuntimeError: TT_FATAL @ /project/tt_metal/fabric/fabric.cpp:152: forwarding_direction.has_value()
info: Could not find any forwarding direction from src (M0, D0) to dst (M0, D12)
backtrace:
--- /opt/venv/lib/python3.10/site-packages/ttnn/build/lib/libtt_metal.so(+0x5679e9) [0x7b81717e89e9]
--- void tt::tt_fabric::append_fabric_connection_rt_args<tt::tt_metal::Program>(tt::tt_fabric::FabricNodeId const&, tt::tt_fabric::FabricNodeId const&, unsigned int, tt::tt_metal::Program&, tt::xy_pair const&, std::vector<unsigned int, std::allocator<unsigned int> >&, tt::CoreType)
--- ttnn::build_ring_reduce_scatter_minimal_async_program_artifacts(tt::tt_metal::Program&, tt::tt_metal::Tensor const&, tt::tt_metal::Tensor const&, tt::tt_metal::distributed::MeshCoordinate const&, std::optional<tt::tt_metal::distributed::MeshCoordinate> const&, std::optional<tt::tt_metal::distributed::MeshCoordinate> const&, tt::tt_metal::Tensor&, unsigned int, unsigned int, unsigned int, unsigned int, tt::tt_fabric::Topology, std::vector<tt::tt_metal::GlobalSemaphore, std::allocator<tt::tt_metal::GlobalSemaphore> > const&, std::optional<tt::tt_metal::GlobalSemaphore> const&, bool, std::optional const&)
--- ttnn::experimental::prim::RingReduceScatterMeshWorkloadFactory::create_at(ttnn::experimental::prim::ReduceScatterMinimalAsyncParams const&, tt::tt_metal::distributed::MeshCoordinate const&, ttnn::experimental::prim::ReduceScatterMinimalAsyncInputs const&, std::vector<tt::tt_metal::Tensor, std::allocator<tt::tt_metal::Tensor> >&)
--- ttnn::experimental::prim::RingReduceScatterMeshWorkloadFactory::create_mesh_workload(ttnn::experimental::prim::ReduceScatterMinimalAsyncParams const&, tt::tt_metal::distributed::MeshCoordinateRangeSet const&, ttnn::experimental::prim::ReduceScatterMinimalAsyncInputs const&, std::vector<tt::tt_metal::Tensor, std::allocator<tt::tt_metal::Tensor> >&)
--- /opt/venv/lib/python3.10/site-packages/ttnn/build/lib/_ttnncpp.so(+0x1793276) [0x7b81739f3276]
--- /opt/venv/lib/python3.10/site-packages/ttnn/build/lib/_ttnncpp.so(_ZN4ttnn16device_operation6detail29launch_operation_with_adapterITkNS0_36DeviceOperationWithMeshDeviceAdapterENS0_26MeshDeviceOperationAdapterINS_12experimental4prim40ReduceScatterMinimalAsyncDeviceOperationEEEEEvRKNT_22operation_attributes_tERKNS8_13tensor_args_tERNS8_21tensor_return_value_tEPN2tt8tt_metal11distributed10MeshDeviceE+0x21d) [0x7b81739f10fd]
--- /opt/venv/lib/python3.10/site-packages/ttnn/build/lib/_ttnncpp.so(_ZN4ttnn16device_operation6detail6launchITkNS0_22DeviceOperationConceptENS_12experimental4prim40ReduceScatterMinimalAsyncDeviceOperationEEENT_21tensor_return_value_tERKNS6_22operation_attributes_tERKNS6_13tensor_args_tE+0x3f7) [0x7b81739e8867]
--- ttnn::prim::reduce_scatter_minimal_async(tt::tt_metal::Tensor const&, std::optional<tt::tt_metal::Tensor> const&, std::optional<tt::tt_metal::Tensor> const&, unsigned int, unsigned int, unsigned int, tt::tt_metal::MemoryConfig, std::optional<tt::tt_metal::MemoryConfig>, tt::tt_fabric::Topology, std::vector<tt::tt_metal::GlobalSemaphore, std::allocator<tt::tt_metal::GlobalSemaphore> >, std::optional<tt::tt_metal::GlobalSemaphore>, bool, std::optional<ttsl::StrongType<unsigned char, tt::tt_metal::SubDeviceIdTag> >, std::optional<unsigned int>, std::optional<unsigned int>, std::optional<unsigned int>, std::optional<unsigned int>)
--- /opt/venv/lib/python3.10/site-packages/ttnn/build/lib/_ttnncpp.so(_ZN4ttnn10operations12experimental3ccl32ExecuteReduceScatterMinimalAsync6invokeERKN2tt8tt_metal6TensorERKSt8optionalISt6vectorIS6_SaIS6_EEEiRKSA_INS5_15GlobalSemaphoreESaISG_EERKS9_ISG_EjRKS9_INS5_12MemoryConfigEESR_NS4_9tt_fabric8TopologyES9_IN4ttsl10StrongTypeIhNS5_14SubDeviceIdTagEEEES9_IjESZ_SZ_SZ_+0x428) [0x7b81739d8128]
--- /opt/venv/lib/python3.10/site-packages/ttnn/_ttnn.cpython-310-x86_64-linux-gnu.so(+0x271f03) [0x7b8174d40f03]
--- /opt/venv/lib/python3.10/site-packages/ttnn/_ttnn.cpython-310-x86_64-lin
build-artifact / 🐍 Build wheel (Python 3.10) / 🐍 Build wheel (Python 3.10)
build_verbosity 2 is not supported for build frontend. Ignoring.
galaxy-e2e-tests / BH Galaxy CCL tests: /opt/venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py#L132
Field "model_name" in OpTest has conflict with protected namespace "model_". You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
galaxy-e2e-tests / BH Galaxy CCL tests: /opt/venv/lib/python3.10/site-packages/pydantic/_internal/_config.py#L291
Support for class-based `config` is deprecated, use ConfigDict instead.
galaxy-e2e-tests / BH Galaxy CCL tests: /opt/venv/lib/python3.10/site-packages/pydantic/_internal/_config.py#L291
Support for class-based `config` is deprecated, use ConfigDict instead.
galaxy-e2e-tests / BH Galaxy Fabric Torus Stability Tests
No files were found with the provided path: /work/generated/test_reports/. No artifacts will be uploaded.
galaxy-e2e-tests / BH Galaxy Fabric unit tests
No files were found with the provided path: /work/generated/test_reports/. No artifacts will be uploaded.
galaxy-e2e-tests / Galaxy Fabric Multi-Mesh 4x4 and 2x4s stability tests
No files were found with the provided path: /work/generated/test_reports/. No artifacts will be uploaded.
galaxy-e2e-tests / Galaxy Fabric unit tests
No files were found with the provided path: /work/generated/test_reports/. No artifacts will be uploaded.
galaxy-e2e-tests / Galaxy CCL tests: /opt/venv/lib/python3.10/site-packages/llama_models/llama3/api/datatypes.py#L162
Pydantic V1 style `@validator` validators are deprecated. You should migrate to Pydantic V2 style `@field_validator` validators, see the migration guide for more details
galaxy-e2e-tests / Galaxy CCL tests: /opt/venv/lib/python3.10/site-packages/llama_models/llama3/api/datatypes.py#L137
Pydantic V1 style `@validator` validators are deprecated. You should migrate to Pydantic V2 style `@field_validator` validators, see the migration guide for more details
galaxy-e2e-tests / Galaxy CCL tests: /opt/venv/lib/python3.10/site-packages/llama_models/llama3/api/datatypes.py#L120
Pydantic V1 style `@validator` validators are deprecated. You should migrate to Pydantic V2 style `@field_validator` validators, see the migration guide for more details
galaxy-e2e-tests / Galaxy CCL tests: /opt/venv/lib/python3.10/site-packages/pydantic/_internal/_config.py#L291
Support for class-based `config` is deprecated, use ConfigDict instead.
galaxy-e2e-tests / Galaxy CCL tests: /opt/venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py#L132
Field "model_name" in OpTest has conflict with protected namespace "model_". You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
galaxy-e2e-tests / Galaxy CCL tests: /opt/venv/lib/python3.10/site-packages/pydantic/_internal/_config.py#L291
Support for class-based `config` is deprecated, use ConfigDict instead.
galaxy-e2e-tests / Galaxy CCL tests: /opt/venv/lib/python3.10/site-packages/pydantic/_internal/_config.py#L291
Support for class-based `config` is deprecated, use ConfigDict instead.

Artifacts

Produced during runtime
Name: TTMetal_build_any_22.04_amd64_x86_64-linux-clang-20-libstdcpp_44e8cdc788cdbd85567fa34509342fab3b862bf1_21808413196
Size: 162 MB
Digest: sha256:7334d44961b0e3d4152c98fd5a01c90272707277c6df4312b2341df69d87c88a

Name: packages-ubuntu-22.04-amd64-Release-x86_64-linux-clang-20-libstdcpp-44e8cdc788cdbd85567fa34509342fab3b862bf1-21808413196
Size: 162 MB
Digest: sha256:d69faa87c06df8c66f7441b06721c8188e380ab4f6fdf3d4ce6d02ff49583ac1

Name: test_reports_7fe5562c-c376-4657-b7e4-b8f39e7596b3
Size: 1.95 KB
Digest: sha256:4439f3737380d548215bac3f0c45d65202686255096e416e5f8a6032ed77aaae

Name: test_reports_f4f34fbd-1407-4166-af0d-ffc02605ff19
Size: 15.7 KB
Digest: sha256:e920ce9644371692f79b0f2b880199819513c71ce2ddf2a2b9fdddc9c0b564e1

Name: ttnn-dist-cp310-Release-44e8cdc788cdbd85567fa34509342fab3b862bf1-21808413196
Size: 30.2 MB
Digest: sha256:407e47560ec044f883d94201883f8ef12ff576b548f6fe8f59cc2395daaaa286
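
A downloaded artifact can be checked against the digests listed above. The following is a minimal sketch, assuming the listed digest is the SHA-256 of the downloaded archive file as a whole; the helper names `sha256_digest` and `matches_listed_digest` and the local file name in the usage note are hypothetical and not part of this workflow.

```python
import hashlib


def sha256_digest(path, chunk_size=1 << 20):
    """Return the file's SHA-256 in the 'sha256:<hex>' form used in the artifact list."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large archives (for example 162 MB) are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return "sha256:" + hasher.hexdigest()


def matches_listed_digest(path, listed_digest):
    """Compare a local file against a digest string copied from the artifact list."""
    return sha256_digest(path) == listed_digest.strip().lower()
```

For example, `matches_listed_digest("ttnn-dist.zip", "sha256:407e47560ec044f883d94201883f8ef12ff576b548f6fe8f59cc2395daaaa286")` would return `True` only if the hypothetical local file `ttnn-dist.zip` hashes to the listed value.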