Conversation

@197g
Member

@197g 197g commented Nov 26, 2025

See #2245, the intended ImageDecoder changes.

This changes the ImageDecoder trait to fix some underlying issues. The main change is a clarification of responsibilities: the trait is an interface from an implementor towards the image library. That is, the protocol established by its interface should allow us to drive the decoder into our buffers and our metadata. It is not optimized for use by an external caller, which should prefer ImageReader and other inherent methods instead.

This is a work-in-progress; the list below motivates the changes and discusses open points.

  • ImageDecoder::peek_layout encourages decoders to read headers after the constructor. This fixes the inherent problem we had with communicating limits. The sequence for internal use is roughly:
    let mut decoder = …;
    decoder.set_limits(); // Other global configuration we have?
    
    { // Potentially multiple times:
      let layout_info = decoder.peek_layout()?;
      let mut buffer = allocate_for(&layout_info);
      decoder.xmp_metadata()?; // and other meta
      decoder.read_image(&mut buffer)?;
    }
    
    // … for sequences, start again from `peek_layout()`
  • ImageDecoder::read_image(&mut self) no longer consumes self. We no longer need the additional boxed method and its trait workaround; the trait is now dyn-compatible.
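To illustrate the dyn-compatibility point, here is a minimal sketch of the intended shape. `LayoutInfo`, `MockDecoder`, and the string error type are illustrative stand-ins, not the actual image API:

```rust
// Sketch only: with `read_image(&mut self)` instead of `read_image(self)`,
// the trait is dyn-compatible and a boxed/borrowed trait object can be
// driven through the peek-then-read protocol without a boxed workaround.

#[derive(Clone, Debug, PartialEq)]
struct LayoutInfo {
    width: u32,
    height: u32,
    bytes_per_pixel: usize,
}

impl LayoutInfo {
    fn buffer_len(&self) -> usize {
        self.width as usize * self.height as usize * self.bytes_per_pixel
    }
}

trait ImageDecoder {
    fn set_limits(&mut self, max_alloc: usize);
    fn peek_layout(&mut self) -> Result<LayoutInfo, String>;
    // `&mut self`, not `self`: no extra boxed method is needed.
    fn read_image(&mut self, buf: &mut [u8]) -> Result<(), String>;
}

struct MockDecoder {
    layout: LayoutInfo,
    max_alloc: usize,
    header_read: bool,
}

impl ImageDecoder for MockDecoder {
    fn set_limits(&mut self, max_alloc: usize) {
        self.max_alloc = max_alloc;
    }
    fn peek_layout(&mut self) -> Result<LayoutInfo, String> {
        if self.layout.buffer_len() > self.max_alloc {
            return Err("allocation limit exceeded".to_string());
        }
        self.header_read = true;
        Ok(self.layout.clone())
    }
    fn read_image(&mut self, buf: &mut [u8]) -> Result<(), String> {
        if !self.header_read {
            return Err("header not read".to_string());
        }
        buf.fill(0xAB); // stand-in for actual decoding
        Ok(())
    }
}

// The caller drives the protocol through a trait object.
fn decode(decoder: &mut dyn ImageDecoder) -> Result<Vec<u8>, String> {
    let layout = decoder.peek_layout()?;
    let mut buffer = vec![0; layout.buffer_len()];
    decoder.read_image(&mut buffer)?;
    Ok(buffer)
}
```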

Discussion

  • Implement peek_layout more consistently after read_image
    • avif
    • tga
    • pnm
    • tiff
    • dxt
    • qoi
    • dds
    • gif
  • Maybe peek_layout should return the full layout information in a single struct. We have a similar open issue for png in its own crate, and the related work for tiff is in the pipeline, where its BufferLayoutPreference already exists and can be extended with said information.
    • Review limits and remove its size bounds insofar as they can be checked against the communicated bounds in the metadata step by the image side. see: Replace ImageDecoder::set_limits with ImageDecoder::set_allocation_limit #2709, Add an atomically shared allocation limit #2708
    • Idea: If a decoder supports builtin transforms (e.g. YCbCr -> Rgb conversion, grayscale, thumbnailing) that are more efficient than post-processing then there could be a negotiation phase here where information is polled twice / multiple times by different methods. The design should leave this negative space to be added in 1.1, but it's not highly critical.
  • Fix the sequence decoder to use the new API
  • Make sure that read_image is 'destructive' in all decoders, i.e. re-reading an image or reading an image before init should never access an incorrect part of the underlying stream but instead return an error. This affects pnm and qoi, for instance, where the read interprets bytes based on the dimensions and color, which would be invalid before reading the header and only valid for one read.
  • Tests for reading an image with read_image then switching to a sequence reader. But that is supposed to become mainly an adapter that implements the iterator protocol.
  • Remove remnants of the dyn-compatibility issue.
  • Adapt to the possibility of fetching metadata after the image. This includes changing ImageReader with a new interface to return some of it. That may be better suited for a separate PR though.
    • Extract the CICP part of the metadata as CicpRgb and apply it to a decoded DynamicImage.
    • Ensure that this is supported by all the bindings.
  • Deal with limits: Decoder metadata interface #2672 (comment)

@mstoeckl
Contributor

mstoeckl commented Nov 27, 2025

The main change is a clarification to the responsibilities; the trait is an interface from an implementor towards the image library. That is, the protocol established from its interface should allow us to drive the decoder into our buffers and our metadata. It is not optimized to be used by an external caller which should prefer the use of ImageReader and other inherent methods instead.

With this framing, I think Limits::max_image_width and Limits::max_image_height no longer need to be communicated to or handled by the ImageDecoder trait, because the external code can check ImageDecoder::dimensions() before invoking ImageDecoder::read_image(); only the memory limit (Limits::max_alloc) is essential. That being said, the current way Limits are handled by ImageDecoder isn't that awkward to implement, so to reduce migration costs keeping the current ImageDecoder::set_limits() API may be OK.
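The point about dimension limits can be sketched externally; this `Limits` struct is a simplified stand-in for the real type, and the check is illustrative of what the calling side could do before ever invoking `read_image`:

```rust
// Sketch: with layout information available before decoding, width and
// height limits can be enforced by the caller, so only the allocation
// limit has to be pushed into the decoder itself.

pub struct Limits {
    pub max_image_width: Option<u32>,
    pub max_image_height: Option<u32>,
    pub max_alloc: Option<u64>,
}

/// Check dimensions on the calling side, before `read_image`.
pub fn check_dimensions(limits: &Limits, width: u32, height: u32) -> Result<(), String> {
    if limits.max_image_width.map_or(false, |max| width > max) {
        return Err(format!("image width {width} exceeds limit"));
    }
    if limits.max_image_height.map_or(false, |max| height > max) {
        return Err(format!("image height {height} exceeds limit"));
    }
    Ok(())
}
```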

@fintelia
Contributor

A couple thoughts...

I do like the idea of handling animation decoding with this same trait. To make sure I understand: are you thinking of "sequences" as animations only, or also things like the multiple images stored in a TIFF file? Even just handling animation has some tricky cases, though. For instance in PNG, the default image that you get if you treat the image as non-animated may be different from the first frame of the animation. We might need both a read_image and a read_frame method.

The addition of an init method doesn't seem like it gains us much. The tricky part of our current new+set_limits API is that you get to look at the image dimensions and total output size in bytes when deciding what decoding limits to set. Requiring init (and by extension set_limits) to be called before reading the dimensions makes it basically the same as just having a with_limits constructor.

@197g
Member Author

197g commented Nov 27, 2025

Requiring init (and by extension set_limits) to be called before reading the dimensions makes it basically the same as just having a with_limits constructor.

It's a dyn-compatible way of achieving the goal of the constructor, so it is actually an abstraction.

The tricky part of our current new+set_limits API is that you get to look at the image dimensions and total output size in bytes when deciding what decoding limits to set.

What do you mean by this? The main problem in png that I'm aware of is the lack of configured limits for reading the header in the ImageReader path, which motivated the extra constructor in the first place. With png we can not modify the limits after the fact, but we also don't really perform any large size-dependent allocation within the decoder.

I'm also not suggesting that calling set_limits after the layout inspection would be disallowed, but whether that 'frees' additional capacity is obviously decoder-dependent. I guess whether that is sufficient remains to be seen? When we allocate a buffer (with applied allocator limits), that allows forwarding the remaining buffer size to the decoder. Or, set aside a different buffer allowance for metadata vs. image data. Whatever change is necessary in png just comes on top anyway; the init flow just allows us to abstract this and thus apply it with an existing Box<dyn ImageDecoder>, so we don't have to do it all beforehand. Indeed, as the comment on size alludes to, we may want two different limit structs: one user-facing that we use in ImageReader and one binding-facing that is passed to ImageDecoder::set_limits. Then settings just need to be

@197g
Member Author

197g commented Dec 4, 2025

@fintelia This now includes the other changes including to ImageReader as a draft of what I meant in #2679 (comment). In short:

  • The file guessing routines and the construction of the boxed decoder are split into a separate type, ImageFile, which provides the previous methods of ImageReader but also an into_reader for the mutable, stateful interface.
  • ImageReader:
    • features all accessors for metadata considering that some formats fill said metadata (or a pointer to it) after an image is decoded.
    • has viewbox as a re-imagining of the previous ImageDecoderRect trait, but split into two responsibilities: the trait makes the efficiency decision on an image-by-image basis with an interface that allows a partial application of the viewbox (in jpeg and tiff we would decode whole tiles); the reader then takes care of translating that into an exact layout. Note that another type of image buffer with offset+rowpitch information could do that adjustment zero-copy; I still want to get those benefits of the type-erased buffer/image-canvas someday, and this fits in.
  • The code also retrieves the CICP from the color profile and annotates the DynamicImage with it where available. For sanity's sake the moxcms integration was rewritten to allow a smaller dependency to be used here, I'll split these off the PR if we decide to go that route.
  • Conceivably there's a gain_map (or similar) that may be queried similarly to the metadata methods. For that to be more ergonomic I'd like to seriously consider read_plane for, in tiff lingo, planar images as well as associated and non-associated mask data; and, more speculatively, other extra samples (bump maps? uv? true cmyk?). While not all of that necessarily goes into 1.* for any output that is not neatly statically sorted and sized as an Rgba 4-channel-homogeneous-host-order buffer, I imagine it will be much simpler for a decoder to provide its data successively in multiple calls instead of one contiguous large byte slice. Similar to viewbox, we'd allow this where ImageReader provides the compatibility to re-layout the image for the actual user, except where explicitly instructed. Adjusting ImageReader::decode to that effect should be no problem in principle.

@RunDevelopment
Contributor

I can't speak about image metadata, but I really don't like the new ImageDecoder interface as both an implementor of the interface and a potential user of it. Right now, it's just not clear to me at all how decoders should behave. My problems are:

  1. init. This is just two-phase initialization and opens so many questions.
    • Do users have to call it? The docs say "should" not "must" be called before read_image.
    • Are users allowed to call it multiple times? If so, the decoder has to keep track of whether the header has already been read.
    • Since init returns a layout, what's the point of dimensions() and color_type()? And what if they disagree?
    • What should dimensions and co do before init is called?
    • If init fails, what should happen to methods like dimensions and read_image? When called, should they panic, return an error, return default values?
    • After calling read_image, do you have to re-init before calling read_image again?
  2. viewbox makes it more difficult to implement decoders.
    • Now they always have to internally keep track of the viewbox rect if rect decoding is supported.
    • After calling viewbox, what should dimensions be? If they should be the viewbox size, should they reflect the new viewbox even before calling init?
    • It's not clear what should happen if viewbox returns ok, but init errors.
    • What should happen if users supply a viewbox outside the bounds of the image?
  3. When calling viewbox, is the offset of the rect relative to the (0,0) of the full image or the last set viewbox?
  4. What should happen if read_image is called twice? Should it read the same image again, error, read the next image in the sequence? The docs don't say.
    • If the intended behavior is "read the same image again", then those semantics would force all decoders to require Seek for the reader (or keep an in memory copy of the image for subsequent reads). Not an unreasonable requirement, but it should be explicitly documented.

Regarding rectangle decoding, I think it would be better if we force decoders to support arbitrary rects. That's because the current interface is actually less efficient by allowing decoders to support only certain rects. To read a specific rect that is not supported as is, ImageReader has to read a too-large rect and then crop the read image, allocating the memory for the too-large image only to throw it away. It is forced to do this because of the API.

However, most image formats are based on lines of blocks (macro pixels). So we can do a trick: decode a line according to the too-large rect, then copy only the pixels in the real rect to the output buffer. This reduces the memory overhead for unsupported rects from O(width*height) to O(width*block_height). Supported rects don't need this dance and can decode into the output buffer directly. That's roughly what DDS does.

And if a format can't do the line-based trick for unsupported rects, then decoders should just allocate a temp buffer for the too-large rect and then crop (=copy what is needed). This is still just as efficient as the best ImageReader can do.
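The row-by-row trick can be sketched as follows, under the assumption that the decoder can produce single rows of some supported rect containing the requested one; `Rect` and `decode_row` are illustrative names, not the actual API:

```rust
// Copy only the requested pixels out of decoded rows of a larger,
// decoder-supported rect. Memory overhead is one decoded row,
// O(supported.width * bpp), instead of a full intermediate image.

pub struct Rect {
    pub x: u32,
    pub y: u32,
    pub width: u32,
    pub height: u32,
}

pub fn crop_rows(
    requested: &Rect,
    supported: &Rect, // must contain `requested`
    bpp: usize,       // bytes per pixel
    mut decode_row: impl FnMut(u32, &mut [u8]), // fills one row of `supported`
    out: &mut [u8],   // requested.width * requested.height * bpp bytes
) {
    let mut row = vec![0u8; supported.width as usize * bpp];
    let x_off = (requested.x - supported.x) as usize * bpp;
    let len = requested.width as usize * bpp;
    for y in 0..requested.height {
        // Row index in the coordinates of the supported rect.
        decode_row(requested.y - supported.y + y, &mut row);
        let dst = y as usize * len;
        out[dst..dst + len].copy_from_slice(&row[x_off..x_off + len]);
    }
}
```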

For use cases where users can use rowpitch to ignore the excess parts of the too-large rect, we could just have a method that gives back a preferred rect, which can be decoded very efficiently.

So the API could look like this:

trait ImageDecoder {
    // ...
    /// Returns a viewbox that contains all pixels of the given rect but can potentially be decoded more efficiently.
    /// If rect decoding is not supported or no more-efficient rect exists, the given rect is returned as is.
    fn preferred_viewbox(&self, viewbox: Rect) -> Rect {
        viewbox // default impl
    }
    fn read_image_rect(&mut self, buf: &mut [u8], viewbox: Rect) -> ImageResult<()> {
        Err(ImageError::Decoding(Decoding::RectDecodingNotSupported)) // or similar
    }
}

This API should make rect decoding easier to use, easier to implement, and allow for more efficient implementations.

@197g 197g force-pushed the decoder-metadata-interface branch from 86c9194 to cdc0363 on December 7, 2025
@197g
Member Author

197g commented Dec 7, 2025

  1. init. This is just two-phase initialization and opens so many questions.

That was one of the open questions; the argument you're presenting makes it clear it should return the layout and that's it. Renamed to next_layout accordingly. I'd like to remove the existing dimensions()/color_type() methods from the trait as well; there's no point using separate method calls to communicate them.


  • For use cases where users can use rowpitch, […]

    That is ultimately the crux of the problem. I'd say it's pretty much the only problem, even though that does not do justice to the complexity. A lot of what you put forth is overly specific to solving one instance of it, obviously focusing on DDS. That's not bad, but take a step back to the larger picture: there's no good way to communicate all the kinds of layouts the caller could handle: tiled, planar, depths, sample types, and so on. With the information being exchanged right now, no one can find a best match between the requirements of image's data types (and Limits) and what the decoder can provide. This won't be solved by moving complexity into the decoders; we primarily need to get structured information out of them, then make that decision and handle the resulting byte data in image's code.

    1. viewbox makes it more difficult to implement decoders.

    The point of the default implementation in this PR is that it is purely opt-in: don't implement the method for decoders that can not provide viewbox decoding, and everything works correctly. The documentation seems to be confusing, point taken. We're always going to have inefficiencies; I'm for working through the distinct alternative layouts that allow an optimization one by one. More important for this PR right now is what outcome a caller may want and what interface would give it to them; here I've worked from the use case of extracting part of an atlas.

  • However, most image formats are based on lines of block (macro pixels). So we can do a trick.

    I'm not designing anything in this interface around a singular "trick"; that's the wrong way around, and it is how we got here. That's precisely what created ImageDecoderRect, almost to the dot. Falsehoods programmers assume about image decoding will lead to this breaking down and being horrible to maintain. The trick you mention should live in the decoder's trait impl and nowhere else, and we can bring it back where appropriate and possible. (Note that if you do it for a specific format, some formats will be even more efficient and not require decoding anything line by line, but can skip ahead, do tiles, and so on. That's just to drive home the point that you do not want to do this above the decoder abstraction but below it, in the ImageDecoder impl.)

  • It is forced to do this, because of the API.

    Decoder impls are only forced to do anything if we force them via an interface; this PR does not. read_image_rect(&mut self, buf, viewbox) does force a decoder to be able to handle all possible viewboxes; this PR does not. I'm definitely taking worse short-term efficiency over code maintenance problems, since the latter won't get us efficiency in the long run either.


When calling viewbox, is the offset of the rect relative to the (0,0) of the full image or the last set viewbox?

It's supposed to be relative to the full image. Yeah, that needs more documentation and pointers to the proper implementation.


impl ImageReader<'_> {
Contributor


nit: it is easier for scrolling through the file if the impl blocks for each struct immediately follow the definitions

self.viewbox = Some(viewbox);
}

/// Get the previously decoded EXIF metadata if any.
Contributor


The "previously decoded" part here makes me a bit nervous. I think we'll want to be clearer about what the user has to do to make sure they don't get None for images that actually do have EXIF data

Member Author


This is a bit odd and will depend on the format and metadatum. For instance, XMP is encoded per image in tiff but only once in gif (despite us representing that as an image sequence) and also only once in png (with no word about APNG). Problematically, in gif and png the standard requires absolutely no ordering with any of the other chunks, so it might be encountered before all of the header information is done, or after all the images have been consumed.

The problem with back references is of course the unclear association. And when multiple are included, we always have a problem with consuming them 'after the end', since they either need to be buffered or the decoder must be able to store seek-back points (like png). Encoding all that in the interface is a challenge; it will incur some unavoidable complexity.

Member Author


Moving the metadata query between peek_layout and read_image doesn't really affect this argument, given the variants of MetadataHint that I've found to be necessary. So that is cleaner; see #2672 (comment)

Of course we have last_attributes still remaining since that is information that combines the information the reader has retrieved, e.g. the orientation given directly from read_image together with the fallback of querying it from exif.

@RunDevelopment

This comment was marked as outdated.

@197g

This comment was marked as resolved.

@RunDevelopment

This comment was marked as resolved.

@197g 197g force-pushed the decoder-metadata-interface branch 2 times, most recently from 1a114c3 to 306c6d2 on December 16, 2025
@197g
Member Author

197g commented Dec 16, 2025

Resolving the naming question as peek_layout, hopefully satisfactory for now.

@197g 197g force-pushed the decoder-metadata-interface branch 7 times, most recently from e8d2713 to 4325060 on December 22, 2025
@197g
Member Author

197g commented Dec 22, 2025

@fintelia I understand this is too big for a code-depth review, but I'd be interested in the directional input. Is the merging of 'animations' and simple images, as well as the optimization hint methods, convincing enough? Is the idea of returning data from read_image something that works for you? The struct is meant to be Default-able and fills in the information for into_frames(), but I'll sketch out some way of putting the metadata indicators in there (i.e. should you poll xmp for this frame, wait until the end, or is it constant for the file).

As an aside, in wondermagick we basically find that sequence encoding is a missing API needed to match imagemagick. We can currently only do this with gif, despite tiff and webp having absolutely no conceptual problems with it (avif too, but imagemagick behaves oddly there, does not match libavif's decoding, and the rust libraries don't provide it). It would be nice to make those traits symmetric, so the direction here influences the encoding, too.

@197g 197g marked this pull request as ready for review December 22, 2025 22:17
@197g 197g force-pushed the decoder-metadata-interface branch from f6720de to c677c88 on December 29, 2025
197g added 8 commits December 29, 2025 17:45
The purpose of the trait is to be careful with the boundary to the
`moxcms` dependency. We use it because it is a quality implementation,
but it is heavyweight for what we need. There are other possible ways to
provide transfer functions and color space transforms. This now also
introduces ICC profile parsing, but again that could be done with a
*much* lighter dependency as we only need basic information from it. The
trait should make every little additional cross-dependency a conscious
decision.

Also it should be the start of a customization point, by feature flag or
actually at runtime.
No longer responsible for ensuring the size constraints are met under
the new policy and with the availability of constructing a reader from
an instance of a boxed decoder.
This allows us to write a generic iterator which uses the same decoder
function to generate a whole sequence of frames. The attributes are
designed to be extensible to describe changes in available metadata as
well, very concretely some formats require that XMP/ICC/… are polled for
each individual image whereas others have one for the whole file and put
that at the end. So there's no universal sequence for querying the
metadata, and we need to hold runtime information. This will be the
focus of the next commit.
@197g 197g force-pushed the decoder-metadata-interface branch from ffbcf67 to 97a043b on December 29, 2025
@fintelia
Contributor

@fintelia I understand this is too big for a code-depth review but I'd be interested in the directional input. Is the merging of 'animations' and simple images as well as the optimization hint methods convincing enough?

I do like having animation decoding also handled by ImageDecoder. I'm a bit torn about whether it should be a single method or having separate read_image and next_frame methods, mostly so that PNG's thumbnail can be extracted separately from the frames of an APNG animation.

We should probably also have an is_animated method to indicate whether there are multiple animation frames. And perhaps an is_image_sequence (open to other names) to support TIFF image sequences.

I also like the viewbox approach, and it is a good demonstration of why this PR is useful. However, it now seems like a good candidate to spin out into a separate PR. The mistake we made with the decode-rect API was not having implementations for very many formats and then discovering that it was too much effort to add them later. So to avoid repeating that, we should ideally include implementations for a few non-trivial formats when adding it.

Is the idea of returning data from read_image something that works for you? The struct is meant to be Default-able and fills in the information for into_frames() but I'll sketch out some way of putting the metadata indicators in there (i.e. should you poll xmp for this frame, or wait until the end, or is it constant of the file).

I think it is a bit awkward that you have to decode the image data to know whether the metadata could appear after it. My understanding is that it is mostly for PNG to say that the metadata could come late in the file? (The WebP and TIFF formats also can have metadata after the pixel data, but they provide file offsets so our decoders directly seek to/from that file position)

For PNG we're in an especially weird position where web browsers will generally ignore metadata after the image data, while certain other viewers load it normally. Picking either option means someone is going to think we're doing it wrong. My inclination is that the default should be matching web browsers with an API to opt-in to getting trailing metadata, but I don't know what that interface would look like.

Frame delay could also be a separate method on ImageDecoder, but it is probably fine either way.

Other thoughts

  • I think the ImageReader type should provide convenience methods for getting the color type, dimensions, and perhaps even width/height individually.
  • I'm not totally sold on ImageDecoder::peek_layout over separate dimensions/color_type/etc. methods, but it is probably fine for the lower level API.
  • We should remember to have ImageReader::decode query/apply the image orientation and as a possible addition maybe have a decode_raw method that skips applying any metadata.

@197g
Member Author

197g commented Dec 30, 2025

I also like the viewbox approach, and it is a good demonstration of why this PR is useful.

Great to hear. I'll add a commit that removes it, so it can be reverted as a basis for building it out more fully. It should be easy to provide more API surface in tiff for this, and then the performance benefits can be demonstrated.

I think it is a bit awkward that you have to decode the image data to know whether the metadata could appear after it. My understanding is that it is mostly for PNG to say that the metadata could come late in the file? (The WebP and TIFF formats also can have metadata after the pixel data, but they provide file offsets so our decoders directly seek to/from that file position)

I wouldn't say mostly; the other point is that TIFF has metadata per image, and advancing to the next image would be destructive in the sense that one would skip a set of possibly available metadata. With the hindsight bias of the structure in this PR, the case where metadata is delayed beyond all image data, as with IPTC-for-PNG (other kinds of metadata are sane), is less of a problem in my eyes since it at least behaves consistently for both single images and sequences.

  • That said, those fields can clearly be moved. Is it constant for the whole decoder? In that case my best idea is creating a struct for these which also contains the other information, is_animated and is_image_sequence (though I'm not quite sure what to expect from these semantically, but they should definitely discourage a check for more_images in into_frames). Then ImageReader should acquire these once, which also gets rid of the wart of last_attributes being a bad buffer. And should anything pop up where they are not constant, adding a VariesPerImage variant seems straightforward (independent of where that poll-each information should live, probably retrieved after peek_layout).

My inclination is that the default should be matching web browsers with an API to opt-in to getting trailing metadata, but I don't know what that interface would look like.

For metadata only available on finish, ImageReader could offer such a utility that would either consume the reader after a call to ImageDecoder::finish and the metadata read, or first add a seek method (modelled after tiff) with semantics ensuring they can still be used after finish.

Other thoughts

  • I think the ImageReader type should provide convenience methods for getting the color type, dimensions, and perhaps even width/height individually. 👍 will do when removing them from the trait.
  • I'm not totally sold on ImageDecoder::peek_layout over separate dimensions/color_type/etc. methods, but it is probably fine for the lower level API: Right, that was the idea. Also, the decoders I looked at internally have a clean sequence point where all that information is available and would poll until there either way, so it didn't really make sense to split them up too much. (Also it gives us one clean non_exhaustive struct as an extension point in the sequence should something come up.)
  • We should remember to have ImageReader::decode query/apply the image orientation and as a possible addition maybe have a decode_raw method that skips applying any metadata.

@197g 197g force-pushed the decoder-metadata-interface branch from d6c1cb4 to 4ad4a6f on December 31, 2025
This commit is meant to be revertible. The demonstration code was an
experiment at improving the protocol between the trait implementation
and the user facing `ImageReader`. While it successfully showed that the
interface would work with extension points as intended, the details are
not fleshed out and we would like at least some implementations to
provide actual `viewbox` internals before adding this feature.

Also remember: this is not yet tested; `ImageReader::peek_layout` and
all its derived methods should very likely consult the viewbox setting.
@197g
Member Author

197g commented Jan 1, 2026

Re: metadata and iterating image sequences. With tiff it becomes apparent that there should be an explicit iteration, or that the per-image metadata should be retrieved between a call to peek_layout and read_image. I think that would also address the concern about the 'previously read image' here, by instead requiring this sequence. Decoders that have to buffer some of it need to do so in either case, and it seems simpler to buffer in this sequence:

peek_layout()? // reference or buffer metadata
xmp_metadata()?
read_image()? // discard buffered per-image metadata

That would be compatible with a future set_hints, in which the reader can inform the decoder that some metadata will not be queried / can always return None; that method is backwards compatible and acts correctly as a hint, i.e. it can be ignored. In addition, the ImageReader can now react to available metadata with further performance-relevant hints before calling read_image. Example 1: for viewbox we could now choose between a viewbox in the pixel matrix vs. a box in the EXIF orientation (I don't know if that is a good idea). Example 2: subsampling information is part of EXIF and the underlying color profile is part of ICC, both of which can inform the ImageReader about duplicate transforms that could be avoided with some extensions to the protocol.
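The peek-metadata-read sequence can be sketched with a mock sequence decoder; all names and the buffering behavior here are illustrative stand-ins, not the actual image API:

```rust
// Mock of the proposed per-image sequence: metadata is polled between
// `peek_layout` and `read_image`, and `read_image` is destructive, i.e.
// it discards the buffered per-image metadata.

pub struct SeqDecoder {
    // (buffer length, per-image XMP) for each image in the file.
    images: Vec<(usize, Option<String>)>,
    current: Option<usize>, // image made current by `peek_layout`
    next: usize,
}

impl SeqDecoder {
    pub fn new(images: Vec<(usize, Option<String>)>) -> Self {
        SeqDecoder { images, current: None, next: 0 }
    }

    /// Make the next image current and return its buffer length, if any.
    pub fn peek_layout(&mut self) -> Option<usize> {
        let len = self.images.get(self.next).map(|(len, _)| *len)?;
        self.current = Some(self.next);
        Some(len)
    }

    /// Per-image metadata: only valid between `peek_layout` and `read_image`.
    pub fn xmp_metadata(&self) -> Option<&str> {
        self.current.and_then(|i| self.images[i].1.as_deref())
    }

    /// Destructive: consumes the current image and its buffered metadata.
    pub fn read_image(&mut self, buf: &mut [u8]) -> Result<(), String> {
        let idx = self.current.take().ok_or_else(|| "call peek_layout first".to_string())?;
        self.images[idx].1 = None; // discard buffered per-image metadata
        buf.fill(1); // stand-in for actual decoding
        self.next = idx + 1;
        Ok(())
    }
}
```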

@197g 197g force-pushed the decoder-metadata-interface branch from 2604a6e to d469ab8 on January 1, 2026
197g added 2 commits January 1, 2026 16:05
This method was mostly implemented to acquire EXIF information and
retrieve the entry from it. However, that is a weird way of using the
trait system, with this mutable method. Instead we move the
responsibility of extracting and manipulating EXIF to the ImageReader
and have the decoder interface only supply its input. The orientation
is added as a field to per-image decoded data for cases where such
information exists outside of EXIF (e.g. in TIFF).
At least the per-image metadata should be retrieved there. This also
ensures that there is a consistent location for data available after the
header for all images and for data that changes per image. The call to
`read_image` can then be destructive for all metadata that is only valid
for that particular frame (e.g. TIFF), which previously needed to be
delayed until the next `peek_layout`, leading to very odd interactions.
Previously, it meant that `read_image` was supposed to clean up the
image data itself but not the metadata. This split responsibility is not
very intuitive for many image types.
@197g 197g force-pushed the decoder-metadata-interface branch 4 times, most recently from c099e78 to 9d8b375 on January 1, 2026
@197g
Member Author

197g commented Jan 1, 2026

I think the ImageReader type should provide convenience methods for getting the color type, dimensions, and perhaps even width/height individually.

I've put these on the ImageLayout type now, not immediately on ImageReader. By way of example (while rewriting the necessary code) this seems reasonably ergonomic and incidentally addresses both interfaces.

Having separate methods for these is not very effective in terms of
protocol. All the formats we have can determine them at the same
sequence point and `peek_layout` has become somewhat central in the
definition of the protocol. It defines the buffer requirements for
`read_image`—having one method means one source of truth for this length
requirement which is far clearer than requiring that the caller run
multiple methods and combine their results (with unclear semantics of
running intermediate other methods).
@197g 197g force-pushed the decoder-metadata-interface branch from 9d8b375 to 2f71c9a on January 1, 2026
@197g
Member Author

197g commented Jan 2, 2026

I do like having animation decoding also handled by ImageDecoder. I'm a bit torn about whether it should be a single method or having separate read_image and next_frame methods, mostly so that PNG's thumbnail can be extracted separately from the frames of an APNG animation.

I think that's the main remaining interface question. My take on it would be to put this information into DecodedImageAttributes together with delay. Since the whole struct is a non_exhaustive bag, there should be ample room for 'relation'-kind information, including alternates for thumbnails, identifiers for frames you can seek back to, etc. How exactly should probably refer to some prior art. (And for that reason, acting similarly to a browser makes a lot of sense in other regards, too.)

@fintelia What's not entirely clear to me is whether we should turn such information into a separate call, like metadata, between peek_layout and read_image, or whether it should get attached to the return value of read_image. But given this is somewhat minor compared to the overall structural impact of the rest of this rework, can we move this question to a secondary PR? I'll add comments regarding this to the V1 issue and the relevant code.

@ronjakoi

ronjakoi commented Jan 6, 2026

I think it would be nice to have an iterator over all images in a file, where in the case of pyramid TIFFs, for example, I could filter for the image whose dimensions most closely match a given need.
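A sketch of the kind of filtering such an iterator would enable; the `(u32, u32)` dimension pairs stand in for a real per-image layout type, and the distance metric is an arbitrary illustration:

```rust
// Pick the pyramid level whose dimensions most closely match a target
// size, given the layouts of all images in a file.

pub fn closest_level(levels: &[(u32, u32)], target: (u32, u32)) -> Option<usize> {
    levels
        .iter()
        .enumerate()
        .min_by_key(|&(_, &(w, h))| {
            // Sum of absolute differences as a simple closeness measure.
            let dw = (i64::from(w) - i64::from(target.0)).abs();
            let dh = (i64::from(h) - i64::from(target.1)).abs();
            dw + dh
        })
        .map(|(i, _)| i)
}
```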

@197g 197g mentioned this pull request Jan 7, 2026