
[AudioFormat] Add FLAC and OPUS, correct AAC container#5218

Draft: Nadahar wants to merge 1 commit into openhab:main from Nadahar:audio-format




Nadahar (Contributor) commented Dec 21, 2025

I made these changes a while ago while working on the Chromecast binding, but I'm not quite sure what to do with them, so I've chosen to submit them here as a draft.

It's a draft not because it's incomplete, but because I'm not sure whether merging it would have unforeseen consequences. Most likely it wouldn't, but I don't know how to verify that, since it potentially affects audio sink/source negotiation.

Generally, I'd say that much more should be changed here. First, there are many formats/containers and codecs, and combinations of the two, that aren't included. Many are obscure and perhaps not relevant here, but some quite commonly used combinations are missing too. Second, as I see it, the "matching" logic is fundamentally flawed.

When you want to match sink and source capabilities, there are a lot of pesky details to handle if you want it to work reasonably well. In fact, there are so many that there's almost no limit to how far one can go. Some sinks impose the most "outrageous" restrictions on what they can play, e.g. whether the Huffman tables are "typical" (meaning copies of those published in some standard) or not, or how large (in bytes) the decoder's buffer must be. There's no way to know that about a source without fully analyzing how that particular source decodes. My conclusion is that it isn't reasonable or practical to take this "all the way", but that doesn't mean more can't (and shouldn't) be done than what's currently here.

If we look at the metadata that is currently considered:

        this.container = container;
        this.codec = codec;
        this.bigEndian = bigEndian;
        this.bitDepth = bitDepth;
        this.bitRate = bitRate;
        this.frequency = frequency;
        this.channels = channels;

There are some problems. A codec isn't just a codec, for example. Most codecs have lots of "sub-features", and some have standardized these into "profiles" and "levels", but very few decoders support every feature a codec defines. So to know whether a sink can play a source, one must dig a lot deeper than the codec itself. Going into all the details is an almost endless task, but matching profiles and levels (for the codecs that have this concept) would take care of the bulk of it.
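To make the profile/level idea a little more concrete, here is a minimal Java sketch. The `StreamInfo` and `CodecCapability` types are hypothetical and not part of openHAB's `AudioFormat`; it also assumes, for illustration only, that levels are ordered so that a decoder supporting level N handles every lower level too. The profile names and level numbers in the usage are placeholders.

```java
import java.util.Set;

// Hypothetical description of a source stream: codec plus the profile and level it was encoded with.
record StreamInfo(String codec, String profile, int level) {}

// Hypothetical sink capability for a single codec: the profiles it decodes and the highest level.
// Assumption: levels are ordered, so supporting level N implies supporting all levels below N.
record CodecCapability(String codec, Set<String> profiles, int maxLevel) {

    // The stream is playable only if codec, profile and level all fit this capability.
    boolean canDecode(StreamInfo stream) {
        return codec.equals(stream.codec())
                && profiles.contains(stream.profile())
                && stream.level() <= maxLevel;
    }
}
```

For example, `new CodecCapability("AAC", Set.of("LC", "HE-AAC"), 4)` would accept an AAC LC stream at level 2 but reject an HE-AACv2 stream or an LC stream at level 5, even though all three share the same codec.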

bigEndian is a bit strange: most codecs define the byte order in their specifications, so it isn't a relevant option for them. I assume it applies to PCM, which is the only "codec" I can think of at the moment that can be encoded either way. It should therefore rather be a "sub-property" of the codec, on the same level as "profile" and "level", so that it's only a factor where relevant.
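One way to express "byte order only where relevant" is to attach it to the PCM variant of the codec model only. The sketch below is a hypothetical design (the `Codec`, `Pcm`, `Compressed` and `CodecMatcher` names are not openHAB API), and it assumes that a `null` byte order on the sink side means the sink accepts both.

```java
// Hypothetical codec model: only PCM carries a byte order, every other codec is identified by name.
sealed interface Codec permits Pcm, Compressed {}

// PCM samples can be stored either way; null (sink side only) means "either byte order is fine".
record Pcm(Boolean bigEndian) implements Codec {}

// Compressed codecs fix their byte order in their own specifications, so nothing is stored here.
record Compressed(String name) implements Codec {}

final class CodecMatcher {
    private CodecMatcher() {}

    static boolean matches(Codec supported, Codec offered) {
        if (supported instanceof Pcm s && offered instanceof Pcm o) {
            // Byte order is only compared when both sides are PCM.
            return s.bigEndian() == null || s.bigEndian().equals(o.bigEndian());
        }
        // For all other combinations only the codec identity matters (record equality).
        return supported.equals(offered);
    }
}
```

With this shape, endianness simply cannot be specified for FLAC or Opus, so it can never cause a spurious mismatch for them.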

bitDepth is almost always a range (for sink support). The current implementation doesn't allow ranges: if you specify anything, you must specify one particular bit depth. That will usually exclude a lot of content the sink could play, so the only "realistic" option is to leave it unspecified, which indicates that all bit depths are supported. That is almost guaranteed to be wrong too, but it will probably affect users less, because most sources fall within "the usual range" of bit depths.
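The difference between single-value and range matching can be shown in a few lines. `exactMatch` below is a simplified model of "null means anything, otherwise the value must be equal", written for illustration and not copied from openHAB's code; `IntRange` and `BitDepthMatching` are hypothetical names.

```java
// Hypothetical inclusive range of supported values, e.g. the bit depths a sink can decode.
record IntRange(int min, int max) {
    IntRange {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
    }

    boolean contains(int value) {
        return value >= min && value <= max;
    }
}

final class BitDepthMatching {
    private BitDepthMatching() {}

    // Simplified single-value model: null accepts everything, otherwise only one exact bit depth.
    static boolean exactMatch(Integer sinkBitDepth, int sourceBitDepth) {
        return sinkBitDepth == null || sinkBitDepth == sourceBitDepth;
    }

    // Range model: the sink states what it can decode and the source must fall inside it.
    static boolean rangeMatch(IntRange sinkBitDepths, int sourceBitDepth) {
        return sinkBitDepths.contains(sourceBitDepth);
    }
}
```

A sink declaring 16 in the single-value model rejects a 24-bit source it might well play, and declaring nothing accepts 32-bit sources it might not; `new IntRange(16, 24)` expresses the actual capability.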

bitRate and frequency are in a similar situation to bitDepth: you would need ranges to make them useful. For frequency (sample rate), simple ranges wouldn't do either; you would need to be able to list specific values, e.g. "11025, 44100, 48000, 96000", meaning that e.g. 22050 is not supported by this particular sink. This is very common; decoders/devices usually only support select sample rates.
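A sketch of how a sink could describe both properties: sample rates as a discrete set and bit rate as an inclusive range. The `RateSupport` record is hypothetical, and the concrete values in the usage (sample rates in Hz, bit rates in bits per second) are placeholders.

```java
import java.util.Set;

// Hypothetical sink description: sample rates as a discrete set, bit rate as an inclusive range.
record RateSupport(Set<Long> sampleRates, int minBitRate, int maxBitRate) {

    // Only listed sample rates are playable; values "in between" are not implied.
    boolean supportsSampleRate(long frequency) {
        return sampleRates.contains(frequency);
    }

    // Bit rate, by contrast, is typically a continuous range.
    boolean supportsBitRate(int bitRate) {
        return bitRate >= minBitRate && bitRate <= maxBitRate;
    }
}
```

For example, a sink built with `Set.of(11025L, 44100L, 48000L, 96000L)` accepts 44100 Hz but rejects 22050 Hz, even though 22050 lies between two supported values, which is exactly what a min/max range could not express.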

channels is a can of worms in itself. It's fine when there are 1 or 2 channels, but beyond that, it's not just the number of channels that matters, but how they are organized. 5 channels, for example, can mean either a 5.0 or a 4.1 configuration. On top of this, there's the more "modern" concept of "channelless sources" in an audio stream: a single-channel source that is placed at 3D coordinates in a "virtual room", and the decoder then calculates the delay and volume this results in for each loudspeaker.
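Describing a layout as a set of speaker positions instead of a count makes the 5.0 vs. 4.1 ambiguity visible. The `Speaker` enum and `ChannelLayouts` class below are hypothetical and cover only a handful of positions; object-based ("channelless") audio is deliberately not modeled.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical speaker positions; real layout definitions have many more.
enum Speaker { FRONT_LEFT, FRONT_RIGHT, FRONT_CENTER, LFE, BACK_LEFT, BACK_RIGHT }

final class ChannelLayouts {
    private ChannelLayouts() {}

    // Two five-channel layouts that a plain channel count cannot tell apart.
    static final Set<Speaker> SURROUND_5_0 = Set.copyOf(EnumSet.of(Speaker.FRONT_LEFT,
            Speaker.FRONT_RIGHT, Speaker.FRONT_CENTER, Speaker.BACK_LEFT, Speaker.BACK_RIGHT));
    static final Set<Speaker> SURROUND_4_1 = Set.copyOf(EnumSet.of(Speaker.FRONT_LEFT,
            Speaker.FRONT_RIGHT, Speaker.LFE, Speaker.BACK_LEFT, Speaker.BACK_RIGHT));

    // The sink lists the layouts it can render; the source layout must be one of them.
    static boolean supports(Set<Set<Speaker>> sinkLayouts, Set<Speaker> sourceLayout) {
        return sinkLayouts.contains(sourceLayout);
    }
}
```

Both constants have five entries, yet a sink that only lists 5.0 correctly rejects a 4.1 source.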

Again, channels is difficult, and it's hard to know "where to draw the line", but at the very least, you would need to support a range and/or select values. This often applies per codec, meaning that a particular sink can support 6 channels for some codecs (e.g. AC-3), but only 2 channels for others (e.g. AAC), even though the codec itself supports more channels.
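A per-codec channel limit could be as simple as a map from codec to maximum channel count. The `ChannelLimits` record is hypothetical; the AC-3/AAC values mirror the example in the paragraph above, and treating an unlisted codec as unsupported (rather than unlimited) is a design assumption.

```java
import java.util.Map;

// Hypothetical sink description: the maximum number of channels it accepts, per codec.
record ChannelLimits(Map<String, Integer> maxChannelsByCodec) {

    boolean supports(String codec, int channels) {
        Integer max = maxChannelsByCodec.get(codec);
        // A codec missing from the map is treated as unsupported rather than unlimited.
        return max != null && channels >= 1 && channels <= max;
    }
}
```

With `Map.of("AC-3", 6, "AAC", 2)`, a six-channel AC-3 stream is accepted while a six-channel AAC stream is rejected, even though AAC itself can carry more than two channels.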

Signed-off-by: Ravi Nadahar <nadahar@rediffmail.com>