Skip to content

Idea for Converter functions #1777

Open
@jdmsolarius

Description

@jdmsolarius

Is your feature request related to a problem? Please describe

The .ToBitmap Code has a few issues and I offer a potential solution. The code change I propose runs on 2-2.5 year old 12 Core AMD CPU this goes from 18 milliseconds to ~1.9 milliseconds.

The two problems with the existing code are

This is a highly Parallel Problem

Creating several thousand buffers is time consuming (assuming a 16 bit image)
Solution:

We already have a function that allows us to get a pointer to an area, we should use a pointer instead of the buffers. In the case of RGB we can also go two at a time packing six bytes into a ULong and assigning two pixels at once.

    using (var pixels = image.GetPixelsUnsafe());
    var mapping = GetMapping(format);

    var bitmap = new Bitmap(image.Width, image.Height, format);
    //First Pain Point, No Parallelization for a very Parallel problem.
    for (var y = 0; y < image.Height; y++)
    {
        var row = new Rectangle(0, y, image.Width, 1);
        var data = bitmap.LockBits(row, ImageLockMode.WriteOnly, format);
        var destination = data.Scan0;

        var bytes = pixels.ToByteArray(0, y, image.Width, 1, mapping);
        //This is the second Pain Point. Instead of getting a Pointer to an Area, we are materializing a buffer
        if (bytes is not null)
            Marshal.Copy(bytes, 0, destination, bytes.Length);
        
        bitmap.UnlockBits(data);
    }

    SetBitmapDensity(self, bitmap, useDensity);
    return bitmap;

Describe the solution you'd like
This is the code I use for Images at the lab. Usually we don't have an Alpha channel but I wrote code that accounts for both 32bppRgb and 24bppRgb. _so I go through and Assign two ULongs at once. The biggest gains though are from Parallel Proccessing and not materializing buffers (we never call .ToByteArray) conceptually it's very simple we just get a pointer to the unsafepixel collection and start iterating.

Describe alternatives you've considered

Describe the solution you'd like

This code works on a variety of images I acknowledge it doesn't have all the error checking and SetBitmapDensity isn't called but the General idea is sound and could easily be applied to several other slower functions.

  public static unsafe Bitmap ToBitmapFast(this IMagickImage self)
   where TQuantumType : struct, IConvertible{
   IMagickImage image = self;
   
   PixelFormat format = self.HasAlpha ? PixelFormat.Format32bppArgb : PixelFormat.Format24bppRgb;
   string mapping = format == PixelFormat.Format24bppRgb ? "BGR" : "BGRA";
   
   int height = image.Height;
   int width = image.Width;
   using IUnsafePixelCollection<TQuantumType> pixels = image.GetPixelsUnsafe();
   Bitmap bitmap = new Bitmap(width, height, format);
   
   // Lock the entire bitmap
   BitmapData data = bitmap.LockBits(
       new Rectangle(0, 0, width, height),
       ImageLockMode.WriteOnly,
       format
   );

 try
 {
     // Get the source pointer for the entire image (ushort image data)
     nint pointer = pixels.GetAreaPointer(0, 0, width, height);
     if (pointer == 0)
     {
         throw new InvalidOperationException("Invalid source pointer for the entire image.");
     }
     //if this is an even 
     int strideDecision = (width % 2 == 0) ? 1 : 0;
      
   ushort* srcPtr = (ushort*)pointer; // Source is ushort
   byte* destPtr = (byte*)data.Scan0; // Destination is byte

   int destinationStride = data.Stride;
   int channels = mapping.Length;

   Parallel.For(0, height, row =>
   {
       ushort* srcRowPtr = srcPtr + row * width; // Source data for the row
       byte* destRowPtr = destPtr + row * destinationStride; // Destination data for the row



       if (channels == 3) // BGR
       {
           for (int col = 0; col < width - 1 - strideDecision; col += 2)
           {
               // Load two ushort values
               ushort pixelValue1 = srcRowPtr[col];
               ushort pixelValue2 = srcRowPtr[col + 1];

               // Normalize to byte
               byte normalizedValue1 = (byte)(pixelValue1 >> 8); // Scale down by 256
               byte normalizedValue2 = (byte)(pixelValue2 >> 8);

               // Pack RGB values for two pixels into a ulong
               ulong packedPixels = ((ulong)normalizedValue2 << 40) | // Pixel 2 - Blue
                                    ((ulong)normalizedValue2 << 32) | // Pixel 2 - Green
                                    ((ulong)normalizedValue2 << 24) | // Pixel 2 - Red
                                    ((ulong)normalizedValue1 << 16) | // Pixel 1 - Blue
                                    ((ulong)normalizedValue1 << 8)  | // Pixel 1 - Green
                                    normalizedValue1;                // Pixel 1 - Red

               // Write the packed pixels directly
               *(ulong*)(destRowPtr + col * 3) = packedPixels;

           }
           if (strideDecision == 1)
           {
               int col = width - 1;
               ushort pixelValue = srcRowPtr[col];
               byte normalizedValue = (byte)(pixelValue >> 8);

               destRowPtr[col * channels + 0] = normalizedValue; // Blue
               destRowPtr[col * channels + 1] = normalizedValue; // Green
               destRowPtr[col * channels + 2] = normalizedValue; // Red
               if (channels == 4)
                   destRowPtr[col * channels + 3] = normalizedValue; // Red
           }
       }
       else if (channels == 4) // BGRA
       {
           
           for (int col = 0; col < width; col++)
           {
               // Load four ushort values for one pixel
               ushort pixelValueB = srcRowPtr[col * 4 + 0]; // Pixel Blue
               ushort pixelValueG = srcRowPtr[col * 4 + 1]; // Pixel Green
               ushort pixelValueR = srcRowPtr[col * 4 + 2]; // Pixel Red
               ushort pixelValueA = srcRowPtr[col * 4 + 3]; // Pixel Alpha

               // Normalize each channel to byte
               uint packedPixel =
                   ((uint)pixelValueA << 24) | // Alpha
                   ((uint)pixelValueR << 16) | // Red
                   ((uint)pixelValueG << 8)  | // Green
                   (uint)pixelValueB;          // Blue

               // Write the packed pixel as a single 32-bit value
               *(uint*)(destRowPtr + col * 4) = packedPixel;
           }
       }
       // Process any remaining pixel if the width is odd
 
   });
 }
 finally
 {
     // Unlock the Bitmap
     bitmap.UnlockBits(data);
 }
 return bitmap;
 }

Describe alternatives you've considered

I tried to avoid using Pointers and just run the code in Parallel with a single buffer. The increase in speed is only marginal the problem is that we still need to call ToByteArray() and materialize an enormous Byte Array with Millions of Bytes when we should really just be pointing to the Bytes.

Additional context

This is something I actually use in production code and I would love to see instantaneous conversion as part of the library. Happy to contribute more if necessary.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions