[BUG]: AMD Integrated GPU OpenCL Translation Produces Code That Yields Wrong Results #1539

@afmg-nmoesus

Describe the bug

The ILGPU OpenCL translation appears to produce incorrect code under certain conditions.
I have an algorithm that casts rays from an origin toward a rasterized plane (100 points -> 100 rays) and detects which rays are blocked by the given geometry. In a scenario where the geometry should never be hit, my kernel reports mostly hits and only some misses. The wrong behavior is most obvious when I refactor the code and extract the condition of an if into a variable one line above the if (see the code comments): the algorithm then works and reports no hits. Most other changes to the code, even to lines that should not be reached in this scenario, also make the wrong behavior disappear. In addition, the results are correct when I use a CPUDevice, and on a colleague's machine with an Nvidia GPU even the CLDevice yields correct results.

I have put considerable effort into simplifying my code and hope you can find the underlying issue. I couldn't simplify it further, because most changes made the error disappear.

using System;
using System.Linq;
using ILGPU;
using ILGPU.Runtime;
using ILGPU.Runtime.OpenCL;
using Xunit;

public sealed class ReproduceErrorFacts
{
    private const int NumberOfPoints = 100;
    private const byte False = 0;
    private const byte True = 1;

    [Fact]
    public void This_test_should_not_fail()
    {
        var context = Context.Create(builder => builder.AllAccelerators()
                                                       .EnableAlgorithms());
        var device = context.GetCLDevices()[0]; // The bug appears only on CLDevice, not on CPUDevice!
        var accelerator = device.CreateAccelerator(context);

        var resultBuffer = accelerator.Allocate1D<int>(NumberOfPoints);

        var loadedKernel = accelerator.LoadStreamKernel<ArrayView<int>>(KernelMethod);
        var kernelConfig = CreateKernelConfig(accelerator, NumberOfPoints, 256);
        loadedKernel.Invoke(kernelConfig, resultBuffer.View);

        accelerator.Synchronize();

        var result = new int[NumberOfPoints];
        resultBuffer.View.CopyToCPU(result);

        int hits = result.Count(hit => hit == True);
        int misses = result.Count(hit => hit == False);

        Assert.Equal(0, hits); // No ray should hit the geometry. A properly calculating kernel has 0 hits.
        Assert.Equal(NumberOfPoints, misses);

        resultBuffer.Dispose();
        accelerator.Dispose();
        context.Dispose();
    }

    private static KernelConfig CreateKernelConfig(Accelerator accelerator, int requiredThreadCount, int desiredThreadsPerGroup)
    {
        int threadsPerGroup = Math.Min(desiredThreadsPerGroup, accelerator.MaxNumThreadsPerGroup);
        int numberOfGroups = (requiredThreadCount + threadsPerGroup - 1) / threadsPerGroup;

        return new KernelConfig(numberOfGroups, threadsPerGroup);
    }

    private static void KernelMethod(ArrayView<int> resultBuffer)
    {
        int threadIndex = Grid.GlobalLinearIndex;
        if (threadIndex >= NumberOfPoints)
        {
            return;
        }

        var rayOrigin = new CustomVector(0f, 0f, 1f);

        // Create target points from (-5|-5|-10) to (4|4|-10).
        float x = threadIndex % 10 - 5;
        float y = threadIndex / 10 - 5;
        var rayTarget = new CustomVector(x, y, -10);

        // Create geometry that should not be hit by the rays. The geometry lies below the target plane.
        GeometryNode[] nodes =
        [
            new(new BoundingBox(new CustomVector(-2f, -2f, -15f), new CustomVector(2f, 2f, -18f)), 1, -1),  // root node
            new(new BoundingBox(new CustomVector(-1f, -1f, -16f), new CustomVector(1f, 1f, -17f)), 42, 43), // first leaf node
            new(new BoundingBox(new CustomVector(-1f, -1f, -16f), new CustomVector(1f, 1f, -17f)), 42, 43)  // second leaf node
        ];

        var originToTarget = rayTarget - rayOrigin;
        var originToTargetDistance = originToTarget.Length;

        var normalizedRayDirection = originToTarget * (1f / originToTargetDistance);
        var ray = new Ray(rayOrigin, normalizedRayDirection);

        float hitDistance = DistanceToGeometry(ray, nodes);

        bool geometryShadowsTargetPlane = hitDistance < originToTargetDistance;

        resultBuffer[threadIndex] = geometryShadowsTargetPlane
                                        ? True
                                        : False;
    }

    private static float DistanceToGeometry(Ray ray, GeometryNode[] nodes)
    {
        GeometryNode[] nodeStack = new GeometryNode[10];
        uint nodeStackPtr = 0;

        GeometryNode node = nodes[0]; // start algorithm with root node

        float nearestHitDistance = float.MaxValue; // Initialize with miss.

        while (true)
        {
            if (node.IsInnerNode())
            {
                GeometryNode child1 = nodes[node.ChildNodeIndex];
                GeometryNode child2 = nodes[node.ChildNodeIndex + 1];
                float dist1 = IntersectBoundingBox(ray, child1.BoundingBox);
                float dist2 = IntersectBoundingBox(ray, child2.BoundingBox);

                if (dist1 == float.MaxValue        // no hit...
                    || dist1 > nearestHitDistance) // ... or hit is further away     !!! WHEN THIS IS PUT INTO A VARIABLE THE ALGORITHM WORKS !!!
                {
                    if (nodeStackPtr == 0)
                    {
                        break; // no more nodes to check, quit algorithm
                    }

                    node = nodeStack[--nodeStackPtr]; // take next node from stack.
                }
                else // child 1 is nearest hit
                {
                    node = child1; // take child 1 as next node

                    if (dist2 != float.MaxValue        // hit...
                        && dist2 < nearestHitDistance) // ... and hit is closer
                    {
                        nodeStack[nodeStackPtr++] = child2; // put child 2 on the stack
                    }
                }
            }
            else // we have a leaf node
            {
                // Set some distance that is far larger than any ray to target distance, i.e. the ray should be considered a miss.
                nearestHitDistance = 1000;

                if (nodeStackPtr == 0)
                {
                    break; // no more nodes to check
                }

                node = nodeStack[--nodeStackPtr]; // take next node from stack.
            }
        }

        return nearestHitDistance;
    }

    private static float IntersectBoundingBox(Ray ray, BoundingBox box)
    {
        float tx1 = (box.Min.X - ray.Origin.X) * ray.ReciprocalDirection.X;
        float tx2 = (box.Max.X - ray.Origin.X) * ray.ReciprocalDirection.X;
        float tMin = MathF.Min(tx1, tx2);
        float tMax = MathF.Max(tx1, tx2);
        float ty1 = (box.Min.Y - ray.Origin.Y) * ray.ReciprocalDirection.Y;
        float ty2 = (box.Max.Y - ray.Origin.Y) * ray.ReciprocalDirection.Y;
        tMin = MathF.Max(tMin, MathF.Min(ty1, ty2));
        tMax = MathF.Min(tMax, MathF.Max(ty1, ty2));
        float tz1 = (box.Min.Z - ray.Origin.Z) * ray.ReciprocalDirection.Z;
        float tz2 = (box.Max.Z - ray.Origin.Z) * ray.ReciprocalDirection.Z;
        tMin = MathF.Max(tMin, MathF.Min(tz1, tz2));
        tMax = MathF.Min(tMax, MathF.Max(tz1, tz2));

        if (tMax >= tMin && tMax > 0f)
        {
            return tMin; // Hit.
        }

        return float.MaxValue; // Miss.
    }

    public readonly record struct CustomVector(float X, float Y, float Z)
    {
        public float Length => MathF.Sqrt(X * X + Y * Y + Z * Z);
        public static CustomVector operator -(CustomVector l, CustomVector r) => new(l.X - r.X, l.Y - r.Y, l.Z - r.Z);
        public static CustomVector operator *(CustomVector v, float d) => new(d * v.X, d * v.Y, d * v.Z);
    }

    private readonly record struct Ray(CustomVector Origin, CustomVector Direction)
    {
        public readonly CustomVector ReciprocalDirection = new(1 / Direction.X, 1 / Direction.Y, 1 / Direction.Z);
    }

    private readonly record struct GeometryNode(BoundingBox BoundingBox, int ChildNodeIndex, int InnerNodeIndicator)
    {
        public bool IsInnerNode() => InnerNodeIndicator == -1;
    }

    private readonly record struct BoundingBox(CustomVector Min, CustomVector Max);
}
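
The refactoring mentioned in the description (moving the compound condition out of the if into a local variable) can be sketched in isolation as follows. This is only an illustration in plain C# without ILGPU; the class name, method names, and sample values are hypothetical and not part of the original repro. It shows that the two forms are semantically identical on the host, which is why a difference in kernel results points at code generation or the device compiler rather than at the algorithm.

```csharp
using System;

// Hypothetical helper, not part of the original repro.
public static class ConditionExtractionSketch
{
    // Form used in the report: the compound condition is evaluated inline in the if.
    public static bool SkipChildInline(float dist1, float nearestHitDistance)
    {
        if (dist1 == float.MaxValue        // no hit...
            || dist1 > nearestHitDistance) // ... or hit is further away
        {
            return true;
        }

        return false;
    }

    // Refactored form: the same condition is stored in a local variable first.
    public static bool SkipChildExtracted(float dist1, float nearestHitDistance)
    {
        bool missOrFurtherAway = dist1 == float.MaxValue || dist1 > nearestHitDistance;
        if (missOrFurtherAway)
        {
            return true;
        }

        return false;
    }

    // Compares both forms over a few illustrative values, including edge cases.
    public static bool FormsAgree()
    {
        float[] samples = { 0f, 17f, 1000f, float.MaxValue, float.PositiveInfinity, float.NaN };
        foreach (float dist1 in samples)
        {
            foreach (float nearest in samples)
            {
                if (SkipChildInline(dist1, nearest) != SkipChildExtracted(dist1, nearest))
                {
                    return false;
                }
            }
        }

        return true;
    }
}
```

Calling ConditionExtractionSketch.FormsAgree() on the host compares the two forms for every pair of sample values; since they are the same expression, any divergence observed on the CLDevice would have to come from the translation or the driver.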

Environment

  • ILGPU version: 1.5.3
  • .NET version: .NET 9
  • Operating system: Windows 11
  • Hardware (if GPU-related): AMD Ryzen 9

Steps to reproduce

  1. Run the unit test.
  2. If the test fails, you have reproduced the issue.

Expected behavior

The test should succeed, as it appears to do on Nvidia cards and CPU devices, but it fails on my integrated AMD chip.

Additional context

No response
