(Galaxy) e2e tests #35
galaxy-e2e-tests.yaml
on: workflow_dispatch
build-artifact / parse-platform: 4s
build-artifact / ... / check-docker-images: 21s
build-artifact / download-artifacts
build-artifact / ... / 🐳️ Build Ubuntu images
build-artifact / ... / 🐳️ Build ManyLinux image
build-artifact / ... / 🔄 Update latest tag
build-artifact / 🛠️ Build Release ubuntu 22.04: 8m 46s
build-artifact / ... / 🐍 Build wheel (Python 3.10): 8m 47s
Matrix: galaxy-e2e-tests / galaxy-e2e-tests
Annotations
18 errors and 15 warnings

galaxy-e2e-tests / Galaxy Fabric unit tests
Value cannot be null. (Parameter 'ContainerId')

galaxy-e2e-tests / Galaxy Fabric unit tests
Value cannot be null. (Parameter 'ContainerId')

galaxy-e2e-tests / Galaxy Fabric unit tests
Value cannot be null. (Parameter 'ContainerId')

galaxy-e2e-tests / Galaxy Fabric unit tests
Docker login for 'ghcr.io' failed with exit code 125

galaxy-e2e-tests / BH Galaxy CCL tests
Value cannot be null. (Parameter 'ContainerId')

galaxy-e2e-tests / BH Galaxy CCL tests
Value cannot be null. (Parameter 'ContainerId')

galaxy-e2e-tests / BH Galaxy CCL tests
Value cannot be null. (Parameter 'ContainerId')

galaxy-e2e-tests / BH Galaxy CCL tests
Docker login for 'ghcr.io' failed with exit code 125

galaxy-e2e-tests / BH Galaxy Socket Pipeline Latency SendRecv Single Galaxy
Value cannot be null. (Parameter 'ContainerId')

galaxy-e2e-tests / BH Galaxy Socket Pipeline Latency SendRecv Single Galaxy
Value cannot be null. (Parameter 'ContainerId')

galaxy-e2e-tests / BH Galaxy Socket Pipeline Latency SendRecv Single Galaxy
Value cannot be null. (Parameter 'ContainerId')

galaxy-e2e-tests / BH Galaxy Socket Pipeline Latency SendRecv Single Galaxy
Docker login for 'ghcr.io' failed with exit code 125

galaxy-e2e-tests / BH Galaxy Fabric Torus Stability Tests
Value cannot be null. (Parameter 'ContainerId')

galaxy-e2e-tests / BH Galaxy Fabric Torus Stability Tests
Value cannot be null. (Parameter 'ContainerId')

galaxy-e2e-tests / BH Galaxy Fabric Torus Stability Tests
Value cannot be null. (Parameter 'ContainerId')

galaxy-e2e-tests / BH Galaxy Fabric Torus Stability Tests
Docker login for 'ghcr.io' failed with exit code 125

galaxy-e2e-tests / Galaxy CCL tests
Process completed with exit code 1.

galaxy-e2e-tests / Galaxy CCL tests:
tests/nightly/tg/ccl/test_minimal_reduce_scatter_async.py#L95
test_reduce_scatter_async[wormhole_b0-mesh_device0-2-1-1-fabric_ring-mem_config_input0-mem_config_rs0-batch_1_sd35_prompt-check-4links]
RuntimeError: TT_FATAL @ /project/tt_metal/fabric/fabric.cpp:152: forwarding_direction.has_value()
info:
Could not find any forwarding direction from src (M0, D0) to dst (M0, D12)
backtrace:
--- /opt/venv/lib/python3.10/site-packages/ttnn/build/lib/libtt_metal.so(+0x574149) [0x7fcded7b7149]
--- void tt::tt_fabric::append_fabric_connection_rt_args<tt::tt_metal::Program>(tt::tt_fabric::FabricNodeId const&, tt::tt_fabric::FabricNodeId const&, unsigned int, tt::tt_metal::Program&, tt::xy_pair const&, std::vector<unsigned int, std::allocator<unsigned int> >&, tt::CoreType)
--- ttnn::build_ring_reduce_scatter_minimal_async_program_artifacts(tt::tt_metal::Program&, tt::tt_metal::Tensor const&, tt::tt_metal::Tensor const&, tt::tt_metal::distributed::MeshCoordinate const&, std::optional<tt::tt_metal::distributed::MeshCoordinate> const&, std::optional<tt::tt_metal::distributed::MeshCoordinate> const&, tt::tt_metal::Tensor&, unsigned int, unsigned int, unsigned int, unsigned int, tt::tt_fabric::Topology, std::vector<tt::tt_metal::GlobalSemaphore, std::allocator<tt::tt_metal::GlobalSemaphore> > const&, std::optional<tt::tt_metal::GlobalSemaphore> const&, bool, std::optional const&)
--- ttnn::experimental::prim::RingReduceScatterMeshWorkloadFactory::create_at(ttnn::experimental::prim::ReduceScatterMinimalAsyncParams const&, tt::tt_metal::distributed::MeshCoordinate const&, ttnn::experimental::prim::ReduceScatterMinimalAsyncInputs const&, std::vector<tt::tt_metal::Tensor, std::allocator<tt::tt_metal::Tensor> >&)
--- ttnn::experimental::prim::RingReduceScatterMeshWorkloadFactory::create_mesh_workload(ttnn::experimental::prim::ReduceScatterMinimalAsyncParams const&, tt::tt_metal::distributed::MeshCoordinateRangeSet const&, ttnn::experimental::prim::ReduceScatterMinimalAsyncInputs const&, std::vector<tt::tt_metal::Tensor, std::allocator<tt::tt_metal::Tensor> >&)
--- /opt/venv/lib/python3.10/site-packages/ttnn/build/lib/_ttnncpp.so(+0x17a6219) [0x7fcdef9db219]
--- /opt/venv/lib/python3.10/site-packages/ttnn/build/lib/_ttnncpp.so(_ZN4ttnn16device_operation6detail29launch_operation_with_adapterITkNS0_36DeviceOperationWithMeshDeviceAdapterENS0_26MeshDeviceOperationAdapterINS_12experimental4prim40ReduceScatterMinimalAsyncDeviceOperationEEEEEvRKNT_22operation_attributes_tERKNS8_13tensor_args_tERNS8_21tensor_return_value_tEPN2tt8tt_metal11distributed10MeshDeviceE+0x21d) [0x7fcdef9d909d]
--- /opt/venv/lib/python3.10/site-packages/ttnn/build/lib/_ttnncpp.so(_ZN4ttnn16device_operation6detail6launchITkNS0_22DeviceOperationConceptENS_12experimental4prim40ReduceScatterMinimalAsyncDeviceOperationEEENT_21tensor_return_value_tERKNS6_22operation_attributes_tERKNS6_13tensor_args_tE+0x3f7) [0x7fcdef9d0687]
--- ttnn::prim::reduce_scatter_minimal_async(tt::tt_metal::Tensor const&, std::optional<tt::tt_metal::Tensor> const&, std::optional<tt::tt_metal::Tensor> const&, unsigned int, unsigned int, unsigned int, tt::tt_metal::MemoryConfig, std::optional<tt::tt_metal::MemoryConfig>, tt::tt_fabric::Topology, std::vector<tt::tt_metal::GlobalSemaphore, std::allocator<tt::tt_metal::GlobalSemaphore> >, std::optional<tt::tt_metal::GlobalSemaphore>, bool, std::optional<ttsl::StrongType<unsigned char, tt::tt_metal::SubDeviceIdTag> >, std::optional<unsigned int>, std::optional<unsigned int>, std::optional<unsigned int>, std::optional<unsigned int>)
--- /opt/venv/lib/python3.10/site-packages/ttnn/build/lib/_ttnncpp.so(_ZN4ttnn10operations12experimental3ccl32ExecuteReduceScatterMinimalAsync6invokeERKN2tt8tt_metal6TensorERKSt8optionalISt6vectorIS6_SaIS6_EEEiRKSA_INS5_15GlobalSemaphoreESaISG_EERKS9_ISG_EjRKS9_INS5_12MemoryConfigEESR_NS4_9tt_fabric8TopologyES9_IN4ttsl10StrongTypeIhNS5_14SubDeviceIdTagEEEES9_IjESZ_SZ_SZ_+0x428) [0x7fcdef9bff48]
--- /opt/venv/lib/python3.10/site-packages/ttnn/_ttnn.cpython-310-x86_64-linux-gnu.so(+0x2722e3) [0x7fcdf0d442e3]
--- /opt/venv/lib/python3.10/site-packages/ttnn/_ttnn.cpython-310-x86_64-lin

galaxy-e2e-tests / Galaxy Fabric unit tests
Docker login for 'ghcr.io' failed with exit code 125, back off 2.088 seconds before retry.

galaxy-e2e-tests / Galaxy Fabric unit tests
Docker login for 'ghcr.io' failed with exit code 125, back off 7.986 seconds before retry.

galaxy-e2e-tests / BH Galaxy CCL tests
Docker login for 'ghcr.io' failed with exit code 125, back off 3.288 seconds before retry.

galaxy-e2e-tests / BH Galaxy CCL tests
Docker login for 'ghcr.io' failed with exit code 125, back off 5.057 seconds before retry.

galaxy-e2e-tests / BH Galaxy Socket Pipeline Latency SendRecv Single Galaxy
Docker login for 'ghcr.io' failed with exit code 125, back off 1.758 seconds before retry.

galaxy-e2e-tests / BH Galaxy Socket Pipeline Latency SendRecv Single Galaxy
Docker login for 'ghcr.io' failed with exit code 125, back off 9.796 seconds before retry.

galaxy-e2e-tests / BH Galaxy Fabric Torus Stability Tests
Docker login for 'ghcr.io' failed with exit code 125, back off 6.739 seconds before retry.

galaxy-e2e-tests / BH Galaxy Fabric Torus Stability Tests
Docker login for 'ghcr.io' failed with exit code 125, back off 2.34 seconds before retry.

galaxy-e2e-tests / Galaxy CCL tests:
/opt/venv/lib/python3.10/site-packages/llama_models/llama3/api/datatypes.py#L162
Pydantic V1 style `@validator` validators are deprecated. You should migrate to Pydantic V2 style `@field_validator` validators, see the migration guide for more details

galaxy-e2e-tests / Galaxy CCL tests:
/opt/venv/lib/python3.10/site-packages/llama_models/llama3/api/datatypes.py#L137
Pydantic V1 style `@validator` validators are deprecated. You should migrate to Pydantic V2 style `@field_validator` validators, see the migration guide for more details

galaxy-e2e-tests / Galaxy CCL tests:
/opt/venv/lib/python3.10/site-packages/llama_models/llama3/api/datatypes.py#L120
Pydantic V1 style `@validator` validators are deprecated. You should migrate to Pydantic V2 style `@field_validator` validators, see the migration guide for more details

galaxy-e2e-tests / Galaxy CCL tests:
/opt/venv/lib/python3.10/site-packages/pydantic/_internal/_config.py#L291
Support for class-based `config` is deprecated, use ConfigDict instead.

galaxy-e2e-tests / Galaxy CCL tests:
/opt/venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py#L132
Field "model_name" in OpTest has conflict with protected namespace "model_".
You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.

galaxy-e2e-tests / Galaxy CCL tests:
/opt/venv/lib/python3.10/site-packages/pydantic/_internal/_config.py#L291
Support for class-based `config` is deprecated, use ConfigDict instead.

galaxy-e2e-tests / Galaxy CCL tests:
/opt/venv/lib/python3.10/site-packages/pydantic/_internal/_config.py#L291
Support for class-based `config` is deprecated, use ConfigDict instead.

Artifacts
Produced during runtime
| Name | Size | Digest |
|---|---|---|
| test_reports_ec02438d-5193-4558-b7f1-4f8eb3a93783 | 15.8 KB | sha256:b702731ee5790906e6ad2cd06463b66934fd8ce23c56d7025b72b5102265826d |