
[Issue]: Issue on simple MLP with large size inference on navi48 #1785

@thori-amd

Description

Problem Description

I have an issue with a simple PyTorch MLP program when running full HD (1920x1080) inference on navi48.

My environment:

  • OS: Windows 11 Pro 24H2
  • CPU: AMD Ryzen 9 9950X3D 16-Core Processor
  • GPU: AMD Radeon RX 9070 XT
  • GPU Driver Version: 32.0.21025.10016
  • (Get-WmiObject Win32_OperatingSystem).Version:
    • 10.0.26100
  • (Get-WmiObject win32_Processor).Name:
    • AMD Ryzen 9 9950X3D 16-Core Processor
  • (Get-WmiObject win32_VideoController).Name:
    • AMD Radeon(TM) Graphics
    • AMD Radeon RX 9070 XT
  • Python: 3.11


A minimal program that reproduces the issue:

requirements.txt:

--index-url https://rocm.nightlies.amd.com/v2/gfx120X-all/
--extra-index-url https://pypi.org/simple

torch==2.10.0a+rocm7.10.0a20251009
tqdm
numpy
matplotlib

main.py:

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
from tqdm import tqdm

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device = torch.device("cpu")

width = 1920
height = 1080

# simple MLP
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.fc0 = nn.Linear(3, 256)
        self.fc1 = nn.Linear(256, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, 256)
        self.fc4 = nn.Linear(256, 3)
    def forward(self, x):
        x = torch.relu(self.fc0(x))
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        x = torch.sigmoid(self.fc4(x))
        return x

BATCH_SIZE = 1024
ITERATION = 100

# all red
target = torch.tensor([1.0, 0.0, 0.0]).to(device)
target = target.repeat(BATCH_SIZE).reshape(BATCH_SIZE, 3)

# setup training mlp
mlp = MLP().to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(mlp.parameters(), lr=0.1)

# train random value to red
mlp.train()
for epoch in tqdm(range(ITERATION)):
    x = torch.rand_like(target).to(device)
    optimizer.zero_grad()
    x = mlp(x)
    loss = criterion(x, target)
    loss.backward()
    optimizer.step()

# eval mlp from random image to red image
with torch.no_grad():
    mlp.eval()
    x = torch.rand(height * width, 3).to(device)
    
    img = mlp(x)
    # torch.cuda.synchronize()

    img_py = img.reshape(height, width, 3).detach().cpu().numpy()
    plt.imsave("./test-img.png", img_py)

This program trains an MLP to map randomly colored inputs to pure red, then runs inference at full HD resolution (1920x1080). The expectation is that the resulting image is entirely red.
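
To make "entirely red" checkable numerically instead of by inspecting the PNG, a small check could be appended at the end of main.py (just a sketch; the 0.05 tolerance is arbitrary):

# sketch: count pixels that deviate from pure red (1, 0, 0)
with torch.no_grad():
    red = torch.tensor([1.0, 0.0, 0.0], device=img.device)
    off = (img - red).abs().max(dim=1).values > 0.05  # per-pixel max channel error
    print(f"pixels off from red: {off.sum().item()} / {img.shape[0]}")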

However, when run on navi48, the output image is not entirely red:
[Image: navi48 output at 1920x1080]

When I run the program with device = torch.device("cpu"), I get a completely red image as expected, so this looks like a GPU-related issue.
[Image: CPU output, completely red]
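
To rule out a rendering or colormap artifact, the GPU and CPU outputs of the same weights on the same input can also be compared numerically (a sketch only; the 1e-3 threshold is arbitrary):

# sketch: compare GPU and CPU inference of the same model on the same input
with torch.no_grad():
    mlp.eval()
    x_cpu = torch.rand(height * width, 3)      # keep a CPU copy of the input
    out_gpu = mlp(x_cpu.to(device)).cpu()      # GPU pass (device as defined in main.py)
    out_cpu = mlp.to("cpu")(x_cpu)             # CPU pass with the same weights
    diff = (out_gpu - out_cpu).abs()
    print("max abs diff:", diff.max().item())
    print("rows with diff > 1e-3:", (diff.max(dim=1).values > 1e-3).sum().item())
    mlp.to(device)                             # move the model back to the GPU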

There is no issue when the inference resolution is lowered to 960x540.
[Image: 960x540 output, completely red]
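
Since 960x540 (518,400 rows) works but 1920x1080 (2,073,600 rows) does not, the failure seems to depend on the size of the single inference call. Running the large input in chunks might serve as a workaround and help locate the threshold (a sketch; the chunk size of 262144 rows is arbitrary):

# sketch: chunked inference instead of one very large call
with torch.no_grad():
    mlp.eval()
    x = torch.rand(height * width, 3).to(device)
    parts = [mlp(chunk) for chunk in torch.split(x, 262144)]  # split along dim 0
    img = torch.cat(parts, dim=0)
    img_py = img.reshape(height, width, 3).cpu().numpy()
    plt.imsave("./test-img-chunked.png", img_py)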

Operating System

Windows 11 Pro 24H2

CPU

AMD Ryzen 9 9950X3D

GPU

AMD Radeon RX 9070 XT

ROCm Version

ROCm 6.4.0

ROCm Component

No response

Steps to Reproduce

Run the above program.

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response
