-
-
Notifications
You must be signed in to change notification settings - Fork 424
refactor: remove poppler dependency, use hayro for PDF import #1628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Do you want to do a sanity check on pdf-related svg conversion issues ? Go over all issues related to pdf svg import and see whether or not they are fixed ? I'm thinking of doing a small list
And go over them, check what is fixed by this dependency swap, and maybe open issues upstream for the rest |
|
yes, will do 👍 |
|
and sorry about the diff - I took the opportunity to also separate |
|
Mh, this is a little bit annoying. The CI fails with out of space This time it got a little further down but still failed. Do we have too much thing in the cache from previous dependencies ? Should we split/try to separate the ui from the cli ? |
| /// Simplify the Svg by passing it through [usvg]. | ||
| /// | ||
| /// Also moves the bounds to mins: [0., 0.], maxs: extents | ||
| pub fn simplify(&mut self) -> anyhow::Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still go through simplification for pdf->svg ?
Do we have to? (how does typst makes svg in svg works). Or is it mostly for compression reasons ?
We'll have to keep in mind that rendering issues can come from here in addition to hayro. I think when going through the list of pdf with issues it'd be good to make COORDINATES_PREC and TRANSFORMS_PREC settable (if possible) from varenv for debug purposes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the only uses are now on export
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mh ... I think we still go through from the from_svg_str call inside the pdf_from_bytes function (and there are some COORDINATES_PREC/TRANSFORMS_PREC there).
Trying a (very dirty)
let coordinates_prec = std::env::var("COORDINATES_PREC")
.unwrap_or(String::from("3"))
.parse()
.unwrap_or(3);
let transform_prec = std::env::var("TRANSFORMS_PREC")
.unwrap_or(String::from("4"))
.parse()
.unwrap_or(3);varenv just to see if it matters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh my bad, I was only looking at invocations of simplify(). I'll see if skipping the simplification can be a special case just for SVGs generated by the PDF import without issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kinda need to go over the simplification-related issues (I mean who knows, maybe svg transforms end up cleaner with hayro vs poppler ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might actually be more tricky than that.
Because if we skip this step for the stroke creation, things will be fine in the rnote window but then exporting to svg will reintroduce this because the simplification will apply to the full svg (and may be actually worse because of added transforms on top of the already "numerically sensible" nested transforms that can't be simplified up to n decimals without snowball effects).
But the pdf export will be fine ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's not the time and place for this though .. I feel like we kinda have to do all or nothing here if we want to stay consistent. Or try to see if there is a way upstream linebender/resvg#877 to simplify whilst preserving positions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember why I set the values specifically to 3 and 4 anymore. In retrospect, I should have added documentation there. I assume it is about file sizes. If increasing these values to something more sensible (what about 6, 6 or just the default 8)? does not increase file sizes too much, we could just consider doing that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested
- import a pdf with previous/new settings and export to pdf/svg
- Scribble a lot (at this point this is a file for benchmarking) and export to svg with previous/updated precision settings
And it doesn't seem to do much.
Now on the pdf side, things are much better with 6, but I think that if the intent was to compress strokes by reducing decimal points, I'd be better to test 3 for coordinates and 8 for transforms. And if this is actually it, maybe it's better altogether to implement the coordinates simplification on the strokes type itself (truncate to the x-th decimal all coordinates then convert to svg using a boolean flag for compression upon export).
Comparing side by side (left to right)
- the original pdf
- pdf vector import and export as pdf (this PR + default precision)
- pdf vector import and export as pdf with both precision set to 6 (on this PR)
- pdf vector import and export as pdf with coordinates precision to 3, transform precision to 8 (from this PR)
We get a lot back with the precision set to 6, with some slight offsets still present, but keeping transforms up to max precision is better in that case.
|
Starting to test the change with the most challenging pdf firsts
I've not gone over every pdf in issues, it seems to fare better than |


This removes the C++ poppler dependency, and the rust poppler-rs bindings. The bindings are semi-unmaintained and often lagged behind in updates to their cairo dependency.
Often this was a blocker in updates for the gtk & cairo stack.
The replacement is hayro, a rust-only PDF rendering library.
It supports rendering as PNG and SVG, so supports our import functionality.
It also has gained support for encrypted PDFs.
AFAIK, it is also the library used by typst for rendering included PDFs.
The only regression with this change is the "page-borders" optiosn, which I removed.
This can be re-added as soon as hayro supports rendering to a existing vello Scene (see open issue LaurenzV/hayro#821).
In my opinion, the benefits outweigh the (temporary) removal of this feature.