Decoder metadata interface #2672
Conversation
With this framing, I think Limits::max_image_width and Limits::max_image_height no longer need to be communicated to or handled by the ImageDecoder trait, because the external code can check ImageDecoder::dimensions() before invoking ImageDecoder::read_image(); only the memory limit (Limits::max_alloc) is essential. That being said, the current way Limits are handled by ImageDecoder isn't that awkward to implement, so to reduce migration costs, keeping the current ImageDecoder::set_limits() API may be OK.
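That external check is simple to express; here is a minimal sketch, where the `Limits` struct and `check_dimensions` function are illustrative stand-ins, not the crate's actual types:

```rust
// Sketch of the dimension check moving out of the decoder: external code
// (e.g. ImageReader) compares ImageDecoder::dimensions() against the
// limits before calling read_image. `Limits` here is a toy stand-in.
struct Limits {
    max_image_width: Option<u32>,
    max_image_height: Option<u32>,
}

fn check_dimensions(limits: &Limits, (width, height): (u32, u32)) -> Result<(), String> {
    if limits.max_image_width.map_or(false, |max| width > max) {
        return Err(format!("image width {width} exceeds limit"));
    }
    if limits.max_image_height.map_or(false, |max| height > max) {
        return Err(format!("image height {height} exceeds limit"));
    }
    Ok(())
}
```

With this in the caller, only the allocation limit still needs to cross the trait boundary.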
A couple thoughts... I do like the idea of handling animation decoding with this same trait. To make sure I understand: are you thinking of "sequences" as being animations, or also stuff like the multiple images stored in a TIFF file? Even just handling animation has some tricky cases, though. For instance in PNG, the default image that you get if you treat the image as non-animated may be different from the first frame of the animation. We might need both a … The addition of an …
It's a dyn-compatible way that achieves the goal of the constructor so it is actually an abstraction.
What do you mean by this? The main problem in … I'm also not suggesting that calling …
@fintelia This now includes the other changes, including to …
I can't speak about image metadata, but I really don't like the new …
Regarding rectangle decoding, I think it would be better if we force decoders to support arbitrary rects. That's because the current interface is actually less efficient by allowing decoders to support only certain rects. To read a specific rect that is not supported as is, …

However, most image formats are based on lines of blocks (macro pixels). So we can do a trick: decode a line according to the too-large rect, and then only copy the pixels in the real rect to the output buffer. This reduces the memory overhead for unsupported rects from … And if a format can't do the line-based trick for unsupported rects, then decoders should just allocate a temp buffer for the too-large rect and then crop (= copy what is needed). This is still just as efficient as the best …

For use cases where users can use row pitch to ignore the excess parts of the too-large rect, we could just have a method that gives back a preferred rect, which can be decoded very efficiently. So the API could look like this:

```rust
trait ImageDecoder {
    // ...

    /// Returns a viewbox that contains all pixels of the given rect but can
    /// potentially be decoded more efficiently.
    /// If rect decoding is not supported or no more-efficient rect exists,
    /// the given rect is returned as is.
    fn preferred_viewbox(&self, viewbox: Rect) -> Rect {
        viewbox // default impl
    }

    fn read_image_rect(&mut self, buf: &mut [u8], viewbox: Rect) -> ImageResult<()> {
        Err(ImageError::Decoding(Decoding::RectDecodingNotSupported)) // or similar
    }
}
```

This API should make rect decoding easier to use, easier to implement, and allow for more efficient implementations.
Force-pushed 86c9194 to cdc0363.
That was one of the open questions; the argument you're presenting makes it clear it should return the layout and that's it. Renamed to …
It's supposed to be relative to the full image. Yeah, that needs more documentation and pointers to the proper implementation.
`impl ImageReader<'_> {`
nit: it is easier to scroll through the file if the impl blocks for each struct immediately follow the definitions
src/io/image_reader_type.rs (outdated)
`/// Get the previously decoded EXIF metadata if any.`
The "previously decoded" part here makes me a bit nervous. I think we'll want to be clearer about what the user has to do to make sure they don't get `None` for images that actually do have EXIF data.
This is a bit odd and will depend on the format and metadatum. For instance, XMP is encoded per-image in tiff but only once in gif (despite us representing this as an image sequence) and also only once in png (no word about APNG). Problematically, in gif and png the standard requires absolutely no ordering with any of the other chunks. So it might be encountered before all of the header information is done, or after all the images have been consumed.
The problem with back references is of course the unclear association. And when multiple are included, we always have a problem with consuming them 'after the end', since it would need to be buffered or the decoder able to store seek-back points (like png). Encoding all that in the interface is a challenge; it will incur some unavoidable complexity.
Moving the metadata query between `peek_layout` and `read_image` doesn't really affect this argument with the variants of `MetadataHint` that I've found to be necessary. So that is cleaner, see #2672 (comment).
Of course we still have `last_attributes` remaining, since that is information that combines what the reader has retrieved, e.g. the orientation given directly from `read_image` together with the fallback of querying it from EXIF.
Force-pushed 1a114c3 to 306c6d2.
Resolving the naming question as …
Force-pushed e8d2713 to 4325060.
@fintelia I understand this is too big for an in-depth code review, but I'd be interested in directional input. Is the merging of 'animations' and simple images, as well as the optimization hint methods, convincing enough? Is the idea of returning data from … As an aside, in wondermagick we basically find that sequence encoding is a missing API to match imagemagick. We can currently only do this with …
Force-pushed f6720de to c677c88.
The purpose of the trait is to be careful with the boundary to the `moxcms` dependency. We use it because it is a quality implementation, but it is heavyweight for what we need. There are other possible ways to provide transfer functions and color space transforms. Now this also introduces ICC profile parsing, but again that could be done with a *much* lighter dependency as we only need basic information from it. The trait should make every little additional cross-dependency a conscious decision. Also, it should be the start of a customization point, by feature flag or actually at runtime.
No longer responsible for ensuring the size constraints are met under the new policy, and with the availability of constructing a reader from an instance of a boxed decoder.
This allows us to write a generic iterator which uses the same decoder function to generate a whole sequence of frames. The attributes are designed to be extensible to describe changes in available metadata as well; very concretely, some formats require that XMP/ICC/… be polled for each individual image whereas others have one for the whole file and put that at the end. So there's no universal sequence for querying the metadata, and we need to hold runtime information. This will be the focus of the next commit.
Force-pushed ffbcf67 to 97a043b.
I do like having animation decoding also handled by … We should probably also have an … I also like the …
I think it is a bit awkward that you have to decode the image data to know whether the metadata could appear after it. My understanding is that it is mostly for PNG to say that the metadata could come late in the file? (The WebP and TIFF formats can also have metadata after the pixel data, but they provide file offsets so our decoders directly seek to/from that file position.) For PNG we're in an especially weird position where web browsers will generally ignore metadata after the image data, while certain other viewers load it normally. Picking either option means someone is going to think we're doing it wrong. My inclination is that the default should be matching web browsers, with an API to opt in to getting trailing metadata, but I don't know what that interface would look like. Frame delay could also be a separate method on …

Other thoughts: …
Great to hear. I'll add a commit that removes it so it can be reverted as a basis for building it out more fully. It should be easy to provide more API surface in …
I wouldn't say mostly; the other point is that TIFF has metadata per image, and advancing to the next would be destructive in the sense that one would skip one set of possibly available metadata. With the hindsight bias of the structure in this PR, the case where it is delayed beyond all image data in IPTC-for-PNG (other kinds of metadata are sane) is less of a problem in my eyes, since it at least behaves consistently for both single images and sequences.
For metadata only available on finish, …

Other thoughts: …
Force-pushed d6c1cb4 to 4ad4a6f.
This commit is meant to be revertible. The demonstration code was an experiment at improving the protocol between the trait implementation and the user-facing `ImageReader`. While it successfully showed that the interface would work with extension points as intended, the details are not fleshed out and we would like at least some implementations to provide actual `viewbox` internals to add this feature. Also remember: this is not yet tested; `ImageReader::peek_layout` and all its derived methods should very likely consult the viewbox setting.
Re: metadata, and iterating image sequences. With:

```rust
peek_layout()?;  // reference or buffer metadata
xmp_metadata()?;
read_image()?;   // discard buffered per-image metadata
```

That would be compatible with a future …
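As a toy model of that per-image buffering (every name here is a stand-in, not the actual trait): `peek_layout` buffers the metadata for the current image, metadata accessors read from that buffer, and `read_image` discards it while advancing to the next image.

```rust
// Toy sequence decoder modelling the sketched protocol. Each frame
// carries optional per-image XMP plus its pixel bytes.
struct ToyDecoder {
    frames: Vec<(Option<String>, Vec<u8>)>,
    next: usize,
    buffered_xmp: Option<String>,
}

impl ToyDecoder {
    /// Buffer the per-image metadata and report the buffer length needed.
    fn peek_layout(&mut self) -> Option<usize> {
        let (xmp, data) = self.frames.get(self.next)?;
        self.buffered_xmp = xmp.clone(); // reference or buffer metadata
        Some(data.len())
    }

    /// Valid between peek_layout and read_image for the current image.
    fn xmp_metadata(&self) -> Option<&str> {
        self.buffered_xmp.as_deref()
    }

    /// Destructive: fills the buffer and drops the per-image metadata.
    fn read_image(&mut self, buf: &mut [u8]) {
        buf.copy_from_slice(&self.frames[self.next].1);
        self.buffered_xmp = None; // discard buffered per-image metadata
        self.next += 1;
    }
}
```

Iterating a sequence is then just repeating peek/query/read until `peek_layout` returns nothing.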
Force-pushed 2604a6e to d469ab8.
This method was mostly implemented to acquire EXIF information and retrieve the entry from it. However, that is a weird way of using the trait system with this mutable method. Instead we move the responsibility of extracting and manipulating EXIF to the ImageReader and have the decoder interface only supply its input. The orientation is added as a field to per-image decoded data for cases where such information exists outside of EXIF (e.g. in TIFF).
At least the per-image metadata should be retrieved there. This also ensures that there is a consistent location for data available after the header for all images, and for data that changes per image. The call to `read_image` can then be destructive for all metadata that is only valid for that particular frame (e.g. TIFF), which previously had to be delayed until the next `peek_layout`, leading to very odd interactions. Previously, it meant that `read_image` was supposed to clean up the image data itself but not the metadata. This split responsibility is not very intuitive for many image types.
Force-pushed c099e78 to 9d8b375.
I've put these on the …
Having separate methods for these is not very effective in terms of protocol. All the formats we have can determine them at the same sequence point, and `peek_layout` has become somewhat central in the definition of the protocol. It defines the buffer requirements for `read_image`; having one method means one source of truth for this length requirement, which is far clearer than requiring that the caller run multiple methods and combine their results (with unclear semantics of running intermediate other methods).
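The "one source of truth" point can be sketched with a hypothetical `Layout` struct standing in for whatever `peek_layout` returns (field names are illustrative): the caller derives the exact buffer length for the read call from that single value.

```rust
// Hypothetical result of a peek_layout-style call; not the actual API.
struct Layout {
    width: u32,
    height: u32,
    channels: u32,
    bytes_per_channel: u32,
}

impl Layout {
    /// The exact buffer length the subsequent read would require.
    fn buffer_len(&self) -> usize {
        self.width as usize
            * self.height as usize
            * self.channels as usize
            * self.bytes_per_channel as usize
    }
}
```

The caller allocates `vec![0u8; layout.buffer_len()]` once and passes it to the read call, with no second method to consult.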
Force-pushed 9d8b375 to 2f71c9a.
I think that's the main remaining interface question. My take on it would be to put this information into …

@fintelia What's not entirely clear to me is whether we should turn such information into a separate call, like metadata, between …
I think it would be nice to have an iterator of all images in a file, where, in the case of pyramid TIFFs for example, I could filter for the image with dimensions most closely matching a given need.
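That filtering step could look like this sketch, assuming the iterator yields per-image dimensions (a toy `Layout`, not the actual API): pick the pyramid level whose larger side is closest to a target size.

```rust
// Toy per-image dimensions as a sequence iterator might yield them.
#[derive(Clone, Copy)]
struct Layout {
    width: u32,
    height: u32,
}

/// Index of the level whose larger side is closest to `target`,
/// or None for an empty sequence.
fn closest_level(levels: &[Layout], target: u32) -> Option<usize> {
    levels
        .iter()
        .enumerate()
        .min_by_key(|(_, l)| l.width.max(l.height).abs_diff(target))
        .map(|(i, _)| i)
}
```

With a real sequence iterator this would just be a `min_by_key` over the layouts it yields.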
See #2245 for the intended `ImageDecoder` changes.

This changes the `ImageDecoder` trait to fix some underlying issues. The main change is a clarification of the responsibilities: the trait is an interface from an implementor towards the `image` library. That is, the protocol established by its interface should allow us to drive the decoder into our buffers and our metadata. It is not optimized for use by an external caller, which should prefer `ImageReader` and other inherent methods instead.

This is a work in progress; below motivates the changes and discusses open points.

- `ImageDecoder::peek_layout` encourages decoders to read headers after the constructor. This fixes the inherent problem we had with communicating limits. The sequence for internal use is roughly: …
- `ImageDecoder::read_image(&mut self)` no longer consumes `self`. We no longer need the additional `boxed` method and its trait workaround; the trait is now dyn-compatible.

Discussion

- Call `peek_layout` more consistently after `read_image` / init.
- `peek_layout` should return the full layout information in a single struct. We have a similar open issue for `png` in its own crate, and the related work for `tiff` is in the pipeline, where its `BufferLayoutPreference` already exists to be extended with said information.
- Review limits and remove the size bounds insofar as they can be checked against the communicated bounds in the metadata step on the `image` side. See: Replace `ImageDecoder::set_limits` with `ImageDecoder::set_allocation_limit` #2709, Add an atomically shared allocation limit #2708 (1.1, but it's not highly critical).
- Ensure `read_image` is 'destructive' in all decoders, i.e. re-reading an image and reading an image before init should never access an incorrect part of the underlying stream but instead return an error. This affects pnm and qoi for instance, where the read will interpret bytes based on the dimensions and color, which would be invalid before reading the header and only valid for one read.
- … `read_image` then switching to a sequence reader. But that is supposed to become mainly an adapter that implements the iterator protocol.
- … `ImageReader` with a new interface to return some of it. That may be better suited for a separate PR though.
- … `CicpRgb` and apply it to a decoded `DynamicImage`.
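The dyn-compatibility point can be illustrated with a toy trait (not the actual `ImageDecoder`): once the read method takes `&mut self` instead of consuming `self`, `Box<dyn …>` works directly and no `boxed` helper is needed.

```rust
// Toy stand-in trait: read_image takes &mut self, so the trait is
// dyn-compatible and usable behind Box<dyn Decoder>.
trait Decoder {
    fn read_image(&mut self, buf: &mut [u8]);
}

struct Gray;

impl Decoder for Gray {
    fn read_image(&mut self, buf: &mut [u8]) {
        buf.fill(0x7f); // pretend-decode: mid-gray pixels
    }
}

// Accepts any boxed decoder through the dyn-compatible trait; no
// special `boxed` method on the trait is required.
fn decode_boxed(decoder: &mut Box<dyn Decoder>, buf: &mut [u8]) {
    decoder.read_image(buf);
}
```

A consuming `fn read_image(self, …)` would instead force the `boxed` workaround, since `self`-by-value methods are not callable on `dyn` objects without extra machinery.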