Conv2d fail in DETR model #794

Closed
@ayerofieiev-tt

Description

The Conv2d call in the DETR model fails with an out-of-memory error on the latest main.
This started happening sometime in the past 3 weeks.

More info in this CI run:
https://github.com/tenstorrent/pytorch2.0_ttnn/actions/runs/13539288917/job/37836823583

pytest models/detr/test_detr.py

Log

    def conv2d(
        *,
        input_tensor: ttnn.Tensor,  # may or may not be sharded
        weight_tensor: ttnn.Tensor,
        device: ttnn.Device,
        in_channels: int,
        out_channels: int,
        batch_size: int,
        input_height: int,
        input_width: int,
        kernel_size: Union[int, Tuple[int, int]],
        stride: Union[int, Tuple[int, int]],
        padding: Union[int, Tuple[int, int]],
        dilation: Union[int, Tuple[int, int]] = (1, 1),
        groups: int = 1,
        bias_tensor: ttnn.Tensor = None,
        conv_config: Conv2dConfig = None,  # config overrides by user
        compute_config=None,  # compute config overrides by user
        memory_config: ttnn.MemoryConfig = None,  # memory config overrides by user
        conv_op_cache={},  # basic conv object caching in python needed for intermediate refactoring. Not needed after full op refactoring in C++.
        debug=False,  # ignored
        return_output_dim=False,
        return_weights_and_bias=False,
    ) -> Tuple[ttnn.Tensor, int, int, ttnn.Tensor, ttnn.Tensor]:
        (
            conv_output,
            output_height,
            output_width,
            prepared_device_weight,
            prepared_device_bias,
>       ) = ttnn._ttnn.operations.conv.conv2d(
            input_tensor=input_tensor,
            weight_tensor=weight_tensor,
            device=device,
            in_channels=in_channels,
            out_channels=out_channels,
            batch_size=batch_size,
            input_height=input_height,
            input_width=input_width,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            dilation=dilation,
            groups=groups,
            bias_tensor=bias_tensor,
            conv_config=conv_config,
            compute_config=compute_config,
            memory_config=memory_config,
        )
E       RuntimeError: TT_THROW @ /work/tt_metal/impl/program/program.cpp:905: tt::exception
E       info:
E       Statically allocated circular buffers in program 2372 clash with L1 buffers on core range [(x=0,y=0) - (x=7,y=3)]. L1 buffer allocated at 498368 and static circular buffer region ends at 560352
E       backtrace:
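
To put a number on the clash reported above, here is a minimal sketch that parses the error text and computes how far the static circular buffer region runs past the start of the L1 buffer, plus how many cores the range covers. The helper name `parse_cb_clash` is hypothetical and not part of ttnn or tt-metal. It also assumes the two reported addresses are byte offsets in the same L1 address space, so the overlap is their difference.

```python
import re

# Error text copied verbatim from the log above.
MESSAGE = (
    "Statically allocated circular buffers in program 2372 clash with L1 buffers "
    "on core range [(x=0,y=0) - (x=7,y=3)]. L1 buffer allocated at 498368 and "
    "static circular buffer region ends at 560352"
)


def parse_cb_clash(message: str) -> dict:
    """Hypothetical helper: pull the addresses and core range out of the message."""
    addr = re.search(
        r"L1 buffer allocated at (\d+) and static circular buffer region ends at (\d+)",
        message,
    )
    cores = re.search(r"\(x=(\d+),y=(\d+)\) - \(x=(\d+),y=(\d+)\)", message)
    if addr is None or cores is None:
        raise ValueError("message does not look like a circular buffer clash")

    l1_start, cb_end = (int(v) for v in addr.groups())
    x0, y0, x1, y1 = (int(v) for v in cores.groups())
    return {
        "l1_buffer_start": l1_start,
        "cb_region_end": cb_end,
        # Assumption: both values are byte addresses in the same L1 space,
        # so a positive difference is the size of the overlap.
        "overlap_bytes": cb_end - l1_start,
        # The core range is inclusive on both ends.
        "num_cores": (x1 - x0 + 1) * (y1 - y0 + 1),
    }


info = parse_cb_clash(MESSAGE)
print(info)
```

For the message in this issue, the sketch reports an overlap of 61984 bytes across 32 cores. Under the stated assumption, that is roughly how much the circular buffers, the L1 buffer, or both would need to shrink for the program to fit.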
