I'm currently rewriting SeeShark from scratch without FFmpeg. Right now the backends are mostly done for Linux and MacOS, and I'm about to implement the Windows one.
However, one problem that has become very apparent is managing image formats in a cross-platform way, as it turns out that every platform does it differently.
I've kept some notes on the differences, so I want to unfold them here and potentially see if I can find a better way to handle input formats.
Vocabulary
In SeeShark 5, I use "image format" for the thing that describes how an image is laid out in memory. It could be raw pixels arranged in a certain way like ARGB or YUYV, some planar configuration, or a compressed format like MJPEG. I chose this name because, in my view, it best fits how the concept is actually used. Operating systems and FFmpeg call it either an "input format" or a "pixel format" (the latter generally referring only to raw formats), but the terminology is applied inconsistently and is a bit of a mess.
And thus a video format contains an image format describing the format of its frames.
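To make that vocabulary concrete, here is a rough sketch of what those two notions could look like as types (the names and members are illustrative, not SeeShark's actual API):

```csharp
// Illustrative sketch only, not SeeShark's actual API: a video format
// contains an image format describing the layout of its frames.
public enum ImageFormat
{
    ARGB,  // raw interleaved pixels
    YUYV,  // raw packed YUV 4:2:2
    MJPEG, // compressed frames
}

public readonly record struct VideoFormat(
    ImageFormat ImageFormat, // how each frame is laid out in memory
    int Width,
    int Height,
    double FrameRate);
```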
How input formats are queried
Below, I illustrate how input formats are queried on each platform with some C#-ish pseudo-code.
Linux
```csharp
List<VideoFormat> supportedVideoFormats = [];
foreach (var inputFormat in SupportedInputFormats(selectedDevice))
    foreach (var frameSize in QueryFrameSizes(selectedDevice, inputFormat))
        foreach (var frameInterval in QueryFrameIntervals(selectedDevice, inputFormat, frameSize))
            supportedVideoFormats.Add(new VideoFormat(inputFormat, frameSize, frameInterval));
```

On Linux with V4L2, both frame sizes and frame intervals can be "discrete" (one value), "continuous" (a minimum and a maximum), or "stepwise" (a minimum and a maximum, along with a step size describing which values in between are also valid). I'm currently not handling continuous frame sizes and intervals, but it's probably important that I do, so I'll have to look more into it.
Question: are the supported input formats chosen by the device or by the kernel? Do they depend on the device?
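For what it's worth, the stepwise case can be expanded into discrete sizes mechanically. A minimal sketch, assuming a hypothetical StepwiseFrameSize type that isn't SeeShark's API:

```csharp
using System.Collections.Generic;

// Sketch: expanding a V4L2-style stepwise frame size range into discrete
// sizes. StepwiseFrameSize is a hypothetical type, not SeeShark's API.
public readonly record struct StepwiseFrameSize(
    int MinWidth, int MaxWidth, int StepWidth,
    int MinHeight, int MaxHeight, int StepHeight);

public static class FrameSizes
{
    public static IEnumerable<(int Width, int Height)> Expand(StepwiseFrameSize s)
    {
        for (int w = s.MinWidth; w <= s.MaxWidth; w += s.StepWidth)
            for (int h = s.MinHeight; h <= s.MaxHeight; h += s.StepHeight)
                yield return (w, h);
    }
}
```

V4L2 itself defines the continuous case as stepwise with a step of 1, so this covers it too in principle, though enumerating every value that way is obviously impractical, which is part of the problem.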
MacOS
```csharp
List<VideoFormat> supportedVideoFormats = [];
var pixelFormatTypes = AvailablePixelFormatTypes();
foreach (var deviceFormat in SupportedCaptureDeviceFormats(selectedDevice))
{
    var maxPhotoDimensions = SupportedMaxPhotoDimensions(deviceFormat);
    var frameRateRanges = VideoSupportedFrameRateRanges(deviceFormat);
    foreach (var maxPhotoDimension in maxPhotoDimensions)
        foreach (var frameRateRange in frameRateRanges)
            foreach (var pixelFormatType in pixelFormatTypes)
                supportedVideoFormats.Add(new VideoFormat(maxPhotoDimension, frameRateRange.Max, pixelFormatType));
}
```

On MacOS with AVFoundation, it seems like a device can have multiple "supported max photo dimensions". I assume these are basically the same as Linux's discrete frame sizes, but I'm not sure, and Apple's documentation doesn't say much else about them. There are also frame rate ranges, which seem equivalent to Linux's continuous frame intervals, and they carry no information about which in-between values are also valid. Maybe they all are.
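If every value within a range really is valid, then checking a requested rate against a device format becomes simple. A sketch, assuming a hypothetical FrameRateRange type standing in for AVFoundation's min/max pairs:

```csharp
using System.Collections.Generic;

// Sketch: checking a requested frame rate against AVFoundation-style
// min/max ranges. FrameRateRange is a hypothetical stand-in type.
public readonly record struct FrameRateRange(double Min, double Max);

public static class FrameRates
{
    public static bool Supports(IEnumerable<FrameRateRange> ranges, double fps)
    {
        foreach (var r in ranges)
            if (fps >= r.Min && fps <= r.Max)
                return true;
        return false;
    }
}
```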
Windows
TODO
FourCC codes for image formats
All these platforms use FourCC codes to represent image formats, but it seems like:
- there can be multiple FourCC codes for the same format, according to the documentation I found on wiki.multimedia.cx
- each platform uses only one of them, and not the same one.
For example:
- the UYVY 4:2:2 pixel format's FourCC code is
  - 'UYVY' on Linux (V4l2InputFormat.UYVY)
  - '2vuy' (or 'yuv2'?) on MacOS (CVPixelFormatType.k_422YpCbCr8)
- the RGBA 32-bit pixel format's FourCC code is
  - 'AB24' on Linux (V4l2InputFormat.RGBA32)
  - 'RGBA' on MacOS (CVPixelFormatType.k_32RGBA)
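To make matters worse, even the packing of the four characters into an integer differs between platforms: V4L2's v4l2_fourcc() macro puts the first character in the low byte, while Apple's four-character OSType codes put it in the high byte. A quick sketch of the two conventions:

```csharp
// Sketch: the same four characters pack into different integers depending
// on the platform. V4L2 packs the first character into the low byte;
// Apple's four-character OSType codes pack it into the high byte.
public static class FourCC
{
    public static uint V4l2(char a, char b, char c, char d) =>
        (uint)a | ((uint)b << 8) | ((uint)c << 16) | ((uint)d << 24);

    public static uint Apple(char a, char b, char c, char d) =>
        ((uint)a << 24) | ((uint)b << 16) | ((uint)c << 8) | (uint)d;
}
```

For example, V4L2's 'UYVY' packs to 0x59565955 while Apple's '2vuy' packs to 0x32767579, so the raw integer alone doesn't identify a format across platforms.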
So unfortunately, I can't just use a single integer to represent all of them in a standard way; it seems it has to be a giant enum instead, with big switch statements for each platform. And that requires testing them, which I might not be able to do with my mere two cameras and an OBS virtual camera (one of the cameras being my MacBook Air's built-in webcam).
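The giant-enum approach could look something like this sketch (the enum members and mappings are illustrative; only the two example formats above are filled in):

```csharp
using System;

// Sketch of a cross-platform image format enum with per-platform FourCC
// mappings. Only the two example formats are filled in.
public enum CrossPlatformImageFormat { UYVY, RGBA32 }

public static class FourCCMapping
{
    public static string ToLinux(CrossPlatformImageFormat f) => f switch
    {
        CrossPlatformImageFormat.UYVY => "UYVY",
        CrossPlatformImageFormat.RGBA32 => "AB24",
        _ => throw new NotSupportedException($"no V4L2 FourCC for {f}"),
    };

    public static string ToMacOS(CrossPlatformImageFormat f) => f switch
    {
        CrossPlatformImageFormat.UYVY => "2vuy",
        CrossPlatformImageFormat.RGBA32 => "RGBA",
        _ => throw new NotSupportedException($"no CoreVideo FourCC for {f}"),
    };
}
```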
Some useful links
- FourCC
- fourcc.org
- Video FourCCs
- OS sources
- Windows Video FourCCs
- Apple Pixel Format Identifiers
  CVPixelBuffer.h: header file containing all definitions with their corresponding FourCC if it exists, available on MacOS at /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/CoreVideo.framework/Versions/A/Headers/CVPixelBuffer.h
- Linux Pixel Formats
  videodev2.h: header file containing all definitions with their corresponding FourCC
Now what?
Some problems I see are as follows.
First, the individual parts of a video format are queried differently and with different dependencies on each platform (as illustrated by the nested foreaches in the pseudo-C#). Right now I brute-force the enumeration by returning all combinations of all possible configurations (most of the time), but this is probably not sustainable, and even if it were, it is jarring to see so many input formats when they could be expressed much more succinctly.
Second, some values are described as a range instead of a list of discrete values. If literally anything between the min and max is valid, then I'll have to make the way I list available video formats more intelligent and start describing these ranges too.
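One way to cover both cases with a single type is a small discriminated union over discrete and ranged values (continuous being a range with a step of 1, as V4L2 itself defines it). A sketch:

```csharp
// Sketch: one type covering discrete, stepwise, and continuous value sets.
public abstract record ValueSet
{
    public sealed record Discrete(int Value) : ValueSet;
    public sealed record Range(int Min, int Max, int Step) : ValueSet;

    public bool Contains(int v) => this switch
    {
        Discrete d => v == d.Value,
        Range r => v >= r.Min && v <= r.Max && (v - r.Min) % r.Step == 0,
        _ => false,
    };
}
```

Listing available video formats would then mean reporting these sets directly rather than expanding them into every combination.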
So yeah, there's a lot of stuff to think about here.