SDXL L2 Nightly Unet_Loop Fail #37548

@jmitrovicTT

Problem Description

The SDXL unet_loop test failed in the L2 nightly run after hitting the 3000s timeout (link), but passed on rerun (link).

After digging in, it looks like this is machine-dependent.
On most CI machines the test completes in ~33 minutes, while on tt-metal-ci-vm-14 it consistently takes closer to 50 minutes, which pushed it over the timeout this time.

| Date | Machine | base_unet | refiner_unet | Job |
| --- | --- | --- | --- | --- |
| 10 Feb 11:21 AM | tt-metal-ci-vm-27 | 1980s | 952s | https://github.com/tenstorrent/tt-metal/actions/runs/21854126492/job/63087302056 |
| 10 Feb 7:21 AM | tt-metal-ci-vm-14 | >3000s | - | https://github.com/tenstorrent/tt-metal/actions/runs/21854126492/job/63067635989 |
| 9 Feb 7:21 AM | tt-metal-ci-vm-71 | 2003s | 967s | https://github.com/tenstorrent/tt-metal/actions/runs/21814599942/job/62947574846 |
| 8 Feb 7:14 AM | tt-metal-ci-vm-14 | 2950s | 1449s | https://github.com/tenstorrent/tt-metal/actions/runs/21793495785/job/62877313068 |
| 7 Feb 7:09 AM | tt-metal-ci-vm-97 | 2111s | 979s | - |
| 6 Feb 7:15 AM | tt-metal-ci-vm-121 | 1957s | 958s | - |
| 5 Feb 7:17 AM | tt-metal-ci-vm-166 | 2235s | 869s | - |
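
For anyone who wants to re-check the machine dependence as more nightlies come in, here is a rough sketch that groups the base_unet durations from the table above by machine and flags any machine whose median is well above the fleet median. The `slow_machines` helper and the 1.3x threshold are my own assumptions, not part of the test suite, and the timed-out run is entered as 3000s, which is only a lower bound.

```python
from statistics import median

# base_unet durations in seconds, copied from the table in this issue.
# The timed-out run on tt-metal-ci-vm-14 is entered as 3000 (a lower bound).
RUNS = [
    ("tt-metal-ci-vm-27", 1980),
    ("tt-metal-ci-vm-14", 3000),
    ("tt-metal-ci-vm-71", 2003),
    ("tt-metal-ci-vm-14", 2950),
    ("tt-metal-ci-vm-97", 2111),
    ("tt-metal-ci-vm-121", 1957),
    ("tt-metal-ci-vm-166", 2235),
]


def slow_machines(runs, threshold=1.3):
    """Map each slow machine to (its median duration / fleet median).

    A machine counts as slow when its median duration exceeds
    `threshold` times the median over all runs. The threshold is an
    arbitrary illustrative choice.
    """
    by_machine = {}
    for machine, seconds in runs:
        by_machine.setdefault(machine, []).append(seconds)

    fleet_median = median(seconds for _, seconds in runs)

    return {
        machine: median(durations) / fleet_median
        for machine, durations in by_machine.items()
        if median(durations) > threshold * fleet_median
    }


if __name__ == "__main__":
    for machine, ratio in slow_machines(RUNS).items():
        print(f"{machine}: {ratio:.2f}x fleet median")
```

With the seven runs listed, only tt-metal-ci-vm-14 is flagged, at roughly 1.4x the fleet median.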

Comment

A potential solution is to increase the timeout in the test, but it is unclear whether we should, because 50 minutes is already a long time.
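
To put numbers on that trade-off, here is a small sketch that sizes a timeout from observed base_unet durations. The policy (slowest completed run times a 1.2 safety margin) and the `suggested_timeout` helper are hypothetical, chosen only for illustration; the durations come from the table in the problem description.

```python
TIMEOUT_S = 3000  # current timeout reported in this issue


def suggested_timeout(durations, margin=1.2):
    """Hypothetical policy: slowest completed run times a safety margin."""
    return max(durations) * margin


# base_unet durations (seconds) on machines other than tt-metal-ci-vm-14.
typical = [1980, 2003, 2111, 1957, 2235]
# Same list plus the one vm-14 run that completed (2950s).
with_vm14 = typical + [2950]

if __name__ == "__main__":
    print(f"typical machines: {suggested_timeout(typical):.0f}s")
    print(f"including vm-14:  {suggested_timeout(with_vm14):.0f}s")
```

Under this assumed policy the current 3000s limit already leaves headroom for the typical machines, while covering tt-metal-ci-vm-14 would need a limit of about 3540s. That suggests the machine, rather than the timeout, is the thing to look at first.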
