5.0.0b1

@github-actions github-actions released this 03 Feb 21:35
· 17 commits to main since this release

Changes (Release 5.0.0b1)

Summary (pypdfium2)

API changes

  • Rendering / Bitmap
    • Removed PdfDocument.render() (see deprecation rationale in v4.25 changelog). Instead, use PdfPage.render() with a loop or process pool.
    • Removed PdfBitmap.get_info() and PdfBitmapInfo, which existed mainly to transfer data with PdfDocument.render(). Instead, take the info from the PdfBitmap object directly. (If using an adapter that copies, you may want to store the relevant info in variables to avoid holding a reference to the original buffer.)
    • PdfBitmap.fill_rect(): Changed argument order. The color parameter now goes first.
    • PdfBitmap.to_numpy(): If the bitmap is single-channel (grayscale), a 2D shape is now used to avoid needlessly wrapping each pixel value in a list.
    • PdfBitmap.from_pil(): Removed recopy parameter.
  • Pageobjects
    • Renamed PdfObject.get_pos() to .get_bounds().
    • Renamed PdfImage.get_size() to .get_px_size().
    • PdfImage.extract(): Removed fb_render option because it does not fit in this API. If the image's rendered bitmap is desired, use .get_bitmap(render=True) in the first place.
  • PdfDocument.get_toc(): Replaced the PdfOutlineItem namedtuple with method-oriented wrapper classes PdfBookmark and PdfDest, so callers may retrieve only the properties they actually need. This is closer to pdfium's original API and exposes the underlying raw objects. The signed count is now provided as-is rather than being split into n_kids and is_closed. The new API also distinguishes between dest is None and a dest with unknown mode.
  • Renamed the misleading PdfMatrix.mirror() parameters v, h to invert_x, invert_y, as the terms horizontal/vertical flip commonly refer to the transformation applied, not the axis around which the flip occurs (i.e. the previous v meant flipping around the Y axis, which is vertical, but the resulting transform inverts the X coordinates and is thus actually horizontal). No behavior change if you did not use keyword arguments.
  • get_text_range(): Removed implicit translation of default calls to get_text_bounded(), as pdfium reverted FPDFText_GetText() to UCS-2, which resolves the allocation concern. However, callers are encouraged to explicitly use get_text_bounded() for full Unicode support.
  • Removed legacy version flags.
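
To illustrate the PdfMatrix.mirror() renaming above, here is a minimal plain-Python sketch of what the two flags mean for a single point. Note that mirror_point() is a hypothetical helper written for this illustration, not part of pypdfium2, and it only models the sign flips rather than the full matrix arithmetic.

```python
def mirror_point(x, y, invert_x=False, invert_y=False):
    """Mirror a point the way the renamed flags describe.

    Hypothetical helper for illustration only (not pypdfium2 API).
    """
    # invert_x negates X coordinates: a horizontal flip (around the Y axis).
    # invert_y negates Y coordinates: a vertical flip (around the X axis).
    new_x = -x if invert_x else x
    new_y = -y if invert_y else y
    return (new_x, new_y)


print(mirror_point(2, 3, invert_x=True))  # (-2, 3)
print(mirror_point(2, 3, invert_y=True))  # (2, -3)
```

Since the positional order is unchanged (the old v corresponds to invert_x and the old h to invert_y), only calls that passed v= or h= as keywords should need updating.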

Improvements and new features

  • Added PdfPosConv and PdfBitmap.get_posconv(page) helper for bidirectional translation between page and bitmap coordinates.
  • Added PdfObject.get_quad_points() to get the corner points of an image or text object.
  • Exposed PdfPage.flatten() (previously semi-private _flatten()), after having found out how to correctly use it. Added check and updated docs accordingly.
  • With PdfImage.get_bitmap(render=True), added scale_to_original option (defaults to True) to temporarily scale the image to its pixel size. Thanks to Lei Zhang for the suggestion.
  • Added context manager support to PdfDocument, so it can be used in a with-statement. Opening from a file path binds a file descriptor (usually on the C side), which should be released explicitly, given OS limits.
  • If document loading failed, err_code is now assigned to the PdfiumError instance so callers may programmatically handle the error subtype.
  • In PdfPage.render(), added a new option use_bgra_on_transparency. If there is page content with transparency, using BGR(x) may slow down PDFium. Therefore, it is recommended to set this option to True if dynamic (page-dependent) pixel format selection is acceptable. Alternatively, you might want to use only BGRA via force_bitmap_format=pypdfium2.raw.FPDFBitmap_BGRA (at the cost of occupying more memory compared to BGR).
  • In PdfBitmap.new_*() methods, avoid use of .from_raw(), and instead call the constructor directly, as most parameters are already known on the caller side when creating a bitmap.
  • In the rendering CLI, added --invert-lightness and --exclude-images post-processing options to render with selective lightness inversion. This may be useful to achieve a "dark theme" for light PDFs while preserving different colors, but comes at the cost of performance. (PDFium also provides a color scheme option, but this only allows you to set colors for certain object types, which are then forced on all instances of the type in question. This may flatten different colors into one, leading to a loss of visual information.)
  • Corrected some null pointer checks: we have to use bool(ptr) rather than ptr is None.
  • Improved startup performance by deferring imports of optional dependencies to the point where they are actually needed, to avoid overhead if you do not use them.
  • Simplified version classes (no API change expected).
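
The null pointer item above can be reproduced with the standard library alone. The following is a minimal sketch, assuming only Python's ctypes module (no pdfium call is involved, and is_null() is a hypothetical helper name). A null ctypes pointer instance is still a Python object, so an identity comparison with None does not detect it, whereas its truth value does.

```python
import ctypes

IntPtr = ctypes.POINTER(ctypes.c_int)


def is_null(ptr):
    """Return True if a ctypes pointer instance is NULL (hypothetical helper)."""
    return not bool(ptr)


null_ptr = IntPtr()                           # no target given: a NULL pointer
valid_ptr = ctypes.pointer(ctypes.c_int(42))  # points to a real c_int

print(null_ptr is None)    # False: the identity check misses the NULL pointer
print(is_null(null_ptr))   # True
print(is_null(valid_ptr))  # False
```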

Platforms

  • Experimental Android support added (cf. PEP 738). arm64_v8a, armeabi_v7a, x86_64, x86 are now handled in setup and should implicitly download the right binaries. We do not publish any Android wheels at this time (for one thing, PyPI/warehouse does not support them yet). However, we might want to package arm64_v8a (and maybe armeabi_v7a) wheels in the future. Note that Android support is provided on a best-effort basis and is largely untested (only arm64 Termux prior to PEP 738 has been tested on the author's phone). Please report success or failure.
  • Experimental iOS support added as well (cf. PEP 730). arm64 device and simulator, and x86_64 simulator are now handled and should implicitly download the right binaries. However, this is untested and may not be enough to get all the way through. In particular, the PEP hints that the binary needs to be moved to a Frameworks location, in which case you'd also need to change the library search path. No iOS wheels will be provided at this time. However, if there are testers and an actual demand, iOS arm64 wheels may be enabled in the future.
  • Note that we have no intent to package wheels for the simulators (Android x86_64/x86, iOS arm64_simu/x86_64), as they are only relevant to developers, and installing from source with implicit binary download should be roughly equivalent.

Setup

  • Avoid needlessly calling _get_libc_ver(). Instead, call it only on Linux. A negative side effect of calling this unconditionally is that, on non-Linux platforms, an empty string may be returned, in which case the musllinux handler would be reached, which uses non-public API and isn't meant to be called on other platforms (though it seems to have passed).
  • If packaging with PDFIUM_PLATFORM=sourcebuild, forward the platform tag determined by bdist_wheel's wrapper, rather than using the underlying sysconfig.get_platform() directly. This may provide more accurate results, e.g. on macOS.

Project

  • Made the runfile fail fast and propagate errors via bash -eu. This is actually quite important to avoid potentially continuing on a broken state in CI.
  • CI: Added Linux aarch64 (GH now provides free runners) and Python 3.13 to the test matrix.
  • Merged tests_old/ back into tests/.
  • Migrated from the deprecated .reuse/dep5 and .reuse/dep5-wheel to the more visible REUSE.toml and REUSE-wheel.toml.
  • Docs: Improved the logic for when to include the unreleased version warning and upcoming changelog.
  • Bumped minimum pdfium requirement in conda recipe to >6635 (effectively >=6638), due to new errchecks that are not version-guarded.
  • Cleanly split out conda packaging into its own file, and confined it to the conda/ directory, to avoid polluting the main setup code.

pypdfium2 commit log

Commits between 4.30.1 and 5.0.0b1 (latest commit first):

PDFium commit log

Commits between 6899 and 6996 (latest commit first):

  • 012fe571c Fix unnecessary tree traversal in SearchNameNodeByNameInternal()
  • 3c2bfd785 Refactor SearchNameNodeByNameInternal()
  • a9f2f0f33 Use CIDToGIDMap to fill font widths in FPDFText_LoadCidType2Font()
  • 0d2d104ba Roll goldctl from 78856799f02f to 9389855cfb14
  • d69e9855e Add even better compiler-support section to README.md
  • 6c386f729 Always initialize CFX_SkiaDeviceDriver::m_bRgbByteOrder
  • a78c76720 Add supported compilers section to README.md
  • ef5fcdf6e Remove some MSVC-specific code
  • fa6581277 Allow options and input files in any order in pdfium_test
  • 170de1e03 Fix stack-use-after-scope in pdfium_test
  • 2febc2869 Fix FPDFText_GetLooseCharBox() to handle rotation
  • 4c7464b07 Add tests to show FPDFText_GetLooseCharBox() bug with rotated text
  • 89a94c1b9 Fix test helper to get correct indices from rotated_text.pdf
  • 603caea4e Add a helper to FPDFTextEmbedderTest for use with rotated_text.pdf
  • da069983b Roll libpng from cf7c36ed084c to 28213bcabe21 (1 revision)
  • 859f92a77 Check the font width array generated by FPDFText_LoadCidType2Font()
  • efe66807a Add GetWidthsArrayForCidFont() helper to fpdf_edit_embeddertest.cpp
  • f6da7d235 Add comment for subtle code in CPDF_StreamContentParser
  • fab1b6d64 Add debugging data to help diagnose a hang in fread()
  • 6be4f3be7 Rename pdfium_unsafe_buffers_paths.txt file
  • e99f1e8d5 Avoid out of bounds crash when reading fonts
  • 594caeb0e Avoid fixed-offset NULL-deref in XFA_Node::InsertChildAndNotify().
  • aacaea19d [AGG] Only add positive dash lengths and gap lengths
  • b4cf887f7 Add pixel test for negative dash scales
  • 3cd0a262c Use AutoRestorer in CPDF_StreamParser::ReadInlineStream()
  • 4bc397f60 Rename local variables in CPDF_StreamParser::ReadInlineStream()
  • 7420dfeed Fix pdfium_test in Chromium builds when Skia is enabled by default
  • d8b668c01 Making CPDF_SyntaxParser::FindTag(ByteStringView tag) robust
  • 320fc870f Roll Zlib from 82a5fecf8aae to b763971bcaa3 (1 revision)
  • e116b67b1 Fix bad refactoring in CXFA_TextParser::GetFont()
  • 67a00b167 Add test showing copies do not happen in fxcrt::Zip().
  • 20b8b48e4 Avoid UNSAFE_TODO() in AreColorIndicesOutOfBounds().
  • 28cfa3a8a Remove distinction between input/output views in fxcrt::Zip().
  • ea4eab892 Update documentation and tests for fxcrt::Zip()
  • da206beb2 Make PDFium's compiler_specific.h use clang's UNSAFE_BUFFERS_BUILD
  • 4adcb08d8 Roll build, clang, and rust
  • 4886ee0d3 Update gn_version to c97a86a72105f3328a540f5a5ab17d11989ab7dd
  • ed30f70b4 Roll buildtools and libc++
  • 16df41e4b Roll v8/ 3e984a9e0..75be3dcb5 (277 commits)
  • 0320375fa Add third_party/highway dependency
  • ab72191db Roll third_party/freetype/src/ 0ae7e6073..afc7000ca (9 commits)
  • 77e7dee60 Roll v8/ 313e6ed36..3e984a9e0 (189 commits)
  • 8198c4e98 Roll third_party/libc++abi/src/ 6c4fa00e4..83dfa1f5b (12 commits)
  • ac5bfacd6 Update reclient_version to re_client_version:0.172.0.3cf60ba5-gomaip
  • d1e80cff3 Roll testing/scripts/rust/ 347b3c20a..6712dc59f (1 commit)
  • 0af514970 Roll third_party/llvm-libc/src/ 4c70d6b5a..60b7db20a (87 commits)
  • 328507313 Roll third_party/libunwind/src/ 5b01ea4a6..d1e95b102 (4 commits)
  • 4ad60e37a Roll third_party/abseil-cpp/ 0b76dfe4f..72093794a (6 commits)
  • 84acf3a55 Roll third_party/googletest/src/ d14403194..7d76a231b (6 commits)
  • 7f588b3b4 Roll base/allocator/partition_allocator/ c551156ef..9cab8b0d1 (13 commits)
  • b5d8c977c Roll third_party/icu/ 4239b1559..bbccc2f6e (5 commits)
  • 1ef5cd32e Roll third_party/skia/ 3db026d62..975788ea9 (248 commits)
  • 8d0676e4f Roll third_party/clang-format/script/ 37f6e68a1..1549a8dba (3 commits)
  • c1992c827 Roll Catapult from 6a0960fe97ab to 86d6f8ee6130 (59 revisions)
  • 994b9858b Roll Code Coverage from 719f1eba4379 to 5e7c277c0d8c (2 revisions)
  • cbf0bb586 Roll Depot Tools from 8d20c1e0b56c to 58625e82c685 (39 revisions)
  • 55a8262e3 Roll goldctl from b2da51fa8d3a to db814b551104
  • a4cbdc9ed Update OpenJPEG to 2.5.3
  • d48287fd9 add missing includes for the build with use_libcxx_modules
  • b69783fd1 Begin marking unsafe libc functions as UNSAFE_BUFFERS().
  • bea10144d Roll Instrumented Libraries from 69291a3c7c79 to 3cc43119a291 (2 revisions)

Edit: Removed the *.publish.attestation files that were inadvertently included in the GH release.