(Galaxy) e2e tests

(Galaxy) e2e tests #35

Re-run triggered February 12, 2026 04:12
Status Failure
Total duration 5h 29m 21s
Artifacts 5

galaxy-e2e-tests.yaml

on: workflow_dispatch
build-artifact / parse-platform  (4s)
build-artifact / build-docker-image / check-docker-images  (21s)
build-artifact / download-artifacts
build-artifact / determine-python-version  (3s)
build-artifact / build-docker-image / 🐳️ Build Ubuntu images
build-artifact / build-docker-image / 🐳️ Build ManyLinux image
build-artifact / build-docker-image / 🔄 Update latest tag
build-artifact / 🛠️ Build Release ubuntu 22.04  (8m 46s)
build-artifact / 🐍 Build wheel (Python 3.10) / 🐍 Build wheel (Python 3.10)  (8m 47s)
galaxy-e2e-tests / load-test-matrix  (7s)
Matrix: galaxy-e2e-tests / galaxy-e2e-tests

Annotations

18 errors and 15 warnings
galaxy-e2e-tests / Galaxy Fabric unit tests
Value cannot be null. (Parameter 'ContainerId')
galaxy-e2e-tests / Galaxy Fabric unit tests
Value cannot be null. (Parameter 'ContainerId')
galaxy-e2e-tests / Galaxy Fabric unit tests
Value cannot be null. (Parameter 'ContainerId')
galaxy-e2e-tests / Galaxy Fabric unit tests
Docker login for 'ghcr.io' failed with exit code 125
galaxy-e2e-tests / BH Galaxy CCL tests
Value cannot be null. (Parameter 'ContainerId')
galaxy-e2e-tests / BH Galaxy CCL tests
Value cannot be null. (Parameter 'ContainerId')
galaxy-e2e-tests / BH Galaxy CCL tests
Value cannot be null. (Parameter 'ContainerId')
galaxy-e2e-tests / BH Galaxy CCL tests
Docker login for 'ghcr.io' failed with exit code 125
galaxy-e2e-tests / BH Galaxy Socket Pipeline Latency SendRecv Single Galaxy
Value cannot be null. (Parameter 'ContainerId')
galaxy-e2e-tests / BH Galaxy Socket Pipeline Latency SendRecv Single Galaxy
Value cannot be null. (Parameter 'ContainerId')
galaxy-e2e-tests / BH Galaxy Socket Pipeline Latency SendRecv Single Galaxy
Value cannot be null. (Parameter 'ContainerId')
galaxy-e2e-tests / BH Galaxy Socket Pipeline Latency SendRecv Single Galaxy
Docker login for 'ghcr.io' failed with exit code 125
galaxy-e2e-tests / BH Galaxy Fabric Torus Stability Tests
Value cannot be null. (Parameter 'ContainerId')
galaxy-e2e-tests / BH Galaxy Fabric Torus Stability Tests
Value cannot be null. (Parameter 'ContainerId')
galaxy-e2e-tests / BH Galaxy Fabric Torus Stability Tests
Value cannot be null. (Parameter 'ContainerId')
galaxy-e2e-tests / BH Galaxy Fabric Torus Stability Tests
Docker login for 'ghcr.io' failed with exit code 125
galaxy-e2e-tests / Galaxy CCL tests
Process completed with exit code 1.
galaxy-e2e-tests / Galaxy CCL tests: tests/nightly/tg/ccl/test_minimal_reduce_scatter_async.py#L95
test_reduce_scatter_async[wormhole_b0-mesh_device0-2-1-1-fabric_ring-mem_config_input0-mem_config_rs0-batch_1_sd35_prompt-check-4links]
RuntimeError: TT_FATAL @ /project/tt_metal/fabric/fabric.cpp:152: forwarding_direction.has_value()
info: Could not find any forwarding direction from src (M0, D0) to dst (M0, D12)
backtrace:
 --- /opt/venv/lib/python3.10/site-packages/ttnn/build/lib/libtt_metal.so(+0x574149) [0x7fcded7b7149]
 --- void tt::tt_fabric::append_fabric_connection_rt_args<tt::tt_metal::Program>(tt::tt_fabric::FabricNodeId const&, tt::tt_fabric::FabricNodeId const&, unsigned int, tt::tt_metal::Program&, tt::xy_pair const&, std::vector<unsigned int, std::allocator<unsigned int> >&, tt::CoreType)
 --- ttnn::build_ring_reduce_scatter_minimal_async_program_artifacts(tt::tt_metal::Program&, tt::tt_metal::Tensor const&, tt::tt_metal::Tensor const&, tt::tt_metal::distributed::MeshCoordinate const&, std::optional<tt::tt_metal::distributed::MeshCoordinate> const&, std::optional<tt::tt_metal::distributed::MeshCoordinate> const&, tt::tt_metal::Tensor&, unsigned int, unsigned int, unsigned int, unsigned int, tt::tt_fabric::Topology, std::vector<tt::tt_metal::GlobalSemaphore, std::allocator<tt::tt_metal::GlobalSemaphore> > const&, std::optional<tt::tt_metal::GlobalSemaphore> const&, bool, std::optional const&)
 --- ttnn::experimental::prim::RingReduceScatterMeshWorkloadFactory::create_at(ttnn::experimental::prim::ReduceScatterMinimalAsyncParams const&, tt::tt_metal::distributed::MeshCoordinate const&, ttnn::experimental::prim::ReduceScatterMinimalAsyncInputs const&, std::vector<tt::tt_metal::Tensor, std::allocator<tt::tt_metal::Tensor> >&)
 --- ttnn::experimental::prim::RingReduceScatterMeshWorkloadFactory::create_mesh_workload(ttnn::experimental::prim::ReduceScatterMinimalAsyncParams const&, tt::tt_metal::distributed::MeshCoordinateRangeSet const&, ttnn::experimental::prim::ReduceScatterMinimalAsyncInputs const&, std::vector<tt::tt_metal::Tensor, std::allocator<tt::tt_metal::Tensor> >&)
 --- /opt/venv/lib/python3.10/site-packages/ttnn/build/lib/_ttnncpp.so(+0x17a6219) [0x7fcdef9db219]
 --- /opt/venv/lib/python3.10/site-packages/ttnn/build/lib/_ttnncpp.so(_ZN4ttnn16device_operation6detail29launch_operation_with_adapterITkNS0_36DeviceOperationWithMeshDeviceAdapterENS0_26MeshDeviceOperationAdapterINS_12experimental4prim40ReduceScatterMinimalAsyncDeviceOperationEEEEEvRKNT_22operation_attributes_tERKNS8_13tensor_args_tERNS8_21tensor_return_value_tEPN2tt8tt_metal11distributed10MeshDeviceE+0x21d) [0x7fcdef9d909d]
 --- /opt/venv/lib/python3.10/site-packages/ttnn/build/lib/_ttnncpp.so(_ZN4ttnn16device_operation6detail6launchITkNS0_22DeviceOperationConceptENS_12experimental4prim40ReduceScatterMinimalAsyncDeviceOperationEEENT_21tensor_return_value_tERKNS6_22operation_attributes_tERKNS6_13tensor_args_tE+0x3f7) [0x7fcdef9d0687]
 --- ttnn::prim::reduce_scatter_minimal_async(tt::tt_metal::Tensor const&, std::optional<tt::tt_metal::Tensor> const&, std::optional<tt::tt_metal::Tensor> const&, unsigned int, unsigned int, unsigned int, tt::tt_metal::MemoryConfig, std::optional<tt::tt_metal::MemoryConfig>, tt::tt_fabric::Topology, std::vector<tt::tt_metal::GlobalSemaphore, std::allocator<tt::tt_metal::GlobalSemaphore> >, std::optional<tt::tt_metal::GlobalSemaphore>, bool, std::optional<ttsl::StrongType<unsigned char, tt::tt_metal::SubDeviceIdTag> >, std::optional<unsigned int>, std::optional<unsigned int>, std::optional<unsigned int>, std::optional<unsigned int>)
 --- /opt/venv/lib/python3.10/site-packages/ttnn/build/lib/_ttnncpp.so(_ZN4ttnn10operations12experimental3ccl32ExecuteReduceScatterMinimalAsync6invokeERKN2tt8tt_metal6TensorERKSt8optionalISt6vectorIS6_SaIS6_EEEiRKSA_INS5_15GlobalSemaphoreESaISG_EERKS9_ISG_EjRKS9_INS5_12MemoryConfigEESR_NS4_9tt_fabric8TopologyES9_IN4ttsl10StrongTypeIhNS5_14SubDeviceIdTagEEEES9_IjESZ_SZ_SZ_+0x428) [0x7fcdef9bff48]
 --- /opt/venv/lib/python3.10/site-packages/ttnn/_ttnn.cpython-310-x86_64-linux-gnu.so(+0x2722e3) [0x7fcdf0d442e3]
 --- /opt/venv/lib/python3.10/site-packages/ttnn/_ttnn.cpython-310-x86_64-lin
galaxy-e2e-tests / Galaxy Fabric unit tests
Docker login for 'ghcr.io' failed with exit code 125, back off 2.088 seconds before retry.
galaxy-e2e-tests / Galaxy Fabric unit tests
Docker login for 'ghcr.io' failed with exit code 125, back off 7.986 seconds before retry.
galaxy-e2e-tests / BH Galaxy CCL tests
Docker login for 'ghcr.io' failed with exit code 125, back off 3.288 seconds before retry.
galaxy-e2e-tests / BH Galaxy CCL tests
Docker login for 'ghcr.io' failed with exit code 125, back off 5.057 seconds before retry.
galaxy-e2e-tests / BH Galaxy Socket Pipeline Latency SendRecv Single Galaxy
Docker login for 'ghcr.io' failed with exit code 125, back off 1.758 seconds before retry.
galaxy-e2e-tests / BH Galaxy Socket Pipeline Latency SendRecv Single Galaxy
Docker login for 'ghcr.io' failed with exit code 125, back off 9.796 seconds before retry.
galaxy-e2e-tests / BH Galaxy Fabric Torus Stability Tests
Docker login for 'ghcr.io' failed with exit code 125, back off 6.739 seconds before retry.
galaxy-e2e-tests / BH Galaxy Fabric Torus Stability Tests
Docker login for 'ghcr.io' failed with exit code 125, back off 2.34 seconds before retry.
galaxy-e2e-tests / Galaxy CCL tests: /opt/venv/lib/python3.10/site-packages/llama_models/llama3/api/datatypes.py#L162
Pydantic V1 style `@validator` validators are deprecated. You should migrate to Pydantic V2 style `@field_validator` validators, see the migration guide for more details
galaxy-e2e-tests / Galaxy CCL tests: /opt/venv/lib/python3.10/site-packages/llama_models/llama3/api/datatypes.py#L137
Pydantic V1 style `@validator` validators are deprecated. You should migrate to Pydantic V2 style `@field_validator` validators, see the migration guide for more details
galaxy-e2e-tests / Galaxy CCL tests: /opt/venv/lib/python3.10/site-packages/llama_models/llama3/api/datatypes.py#L120
Pydantic V1 style `@validator` validators are deprecated. You should migrate to Pydantic V2 style `@field_validator` validators, see the migration guide for more details
galaxy-e2e-tests / Galaxy CCL tests: /opt/venv/lib/python3.10/site-packages/pydantic/_internal/_config.py#L291
Support for class-based `config` is deprecated, use ConfigDict instead.
galaxy-e2e-tests / Galaxy CCL tests: /opt/venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py#L132
Field "model_name" in OpTest has conflict with protected namespace "model_". You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
galaxy-e2e-tests / Galaxy CCL tests: /opt/venv/lib/python3.10/site-packages/pydantic/_internal/_config.py#L291
Support for class-based `config` is deprecated, use ConfigDict instead.
galaxy-e2e-tests / Galaxy CCL tests: /opt/venv/lib/python3.10/site-packages/pydantic/_internal/_config.py#L291
Support for class-based `config` is deprecated, use ConfigDict instead.
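
The Docker login warnings above show each affected job retrying twice, each time after a different and apparently randomized back-off delay, before the login failed with exit code 125. A minimal Python sketch of that retry pattern follows. The helper name `retry_with_backoff`, the limit of two retries, and the 1 to 10 second delay range are assumptions read off the logged values, not the runner's actual implementation.

```python
import random
import time


def retry_with_backoff(action, retries=2, min_delay=1.0, max_delay=10.0,
                       sleep=time.sleep, rng=random.uniform):
    """Call `action` until it returns True or the retries are used up.

    Hypothetical helper: the retry count and delay bounds are assumptions.
    Returns (succeeded, delays), where `delays` lists the back-off times used.
    """
    delays = []
    for attempt in range(retries + 1):
        if action():
            return True, delays
        if attempt < retries:
            # Pick a random delay so simultaneous jobs do not retry in lockstep.
            delay = round(rng(min_delay, max_delay), 3)
            print(f"failed, back off {delay} seconds before retry.")
            delays.append(delay)
            sleep(delay)
    return False, delays


if __name__ == "__main__":
    # Simulate a login that always fails, without actually sleeping.
    ok, delays = retry_with_backoff(lambda: False, sleep=lambda _: None)
    print(ok, delays)
```

With an action that always fails, the helper makes three attempts and reports two back-off delays, which matches the two warnings and one error logged per job. Random delays only spread the retries out; they do not help when the underlying cause persists, as it did here.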

Artifacts

Produced during runtime
Name: test_reports_ec02438d-5193-4558-b7f1-4f8eb3a93783
Size: 15.8 KB
Digest: sha256:b702731ee5790906e6ad2cd06463b66934fd8ce23c56d7025b72b5102265826d