README.md
+25-12
@@ -246,6 +246,7 @@ Here are some examples of using the support model API.
246
246
* Import the library
247
247
```python
248
248
import pypdfium2 as pdfium
249
+
import pypdfium2.raw as pdfium_c
249
250
```
250
251
251
252
* Open a PDF using the helper class `PdfDocument` (supports file path strings, bytes, and byte buffers)
@@ -266,6 +267,10 @@ Here are some examples of using the support model API.
266
267
pil_image = bitmap.to_pil()
267
268
pil_image.show()
268
269
```
270
+
271
+
Note: with the PIL adapter, it may be advantageous to pass `force_bitmap_format=pdfium_c.FPDFBitmap_BGRA, rev_byteorder=True`, or alternatively `prefer_bgrx=True, use_bgra_on_transparency=True, rev_byteorder=True`. This yields a pixel format that PIL supports natively, and avoids rendering with transparency to a non-alpha bitmap, which can slow down pdfium.
272
+
273
+
With `.to_numpy()`, all formats are zero-copy. Still, to avoid the transparency problem, it is recommended to pass either `use_bgra_on_transparency=True` (if a dynamic pixel format is acceptable) or `force_bitmap_format=pdfium_c.FPDFBitmap_BGRA`.
269
274
270
275
* Try some page methods
271
276
```python
@@ -371,6 +376,7 @@ Nonetheless, the following guide may be helpful to get started with the raw API,
371
376
[^pdfium_docs]: Unfortunately, no recent HTML-rendered docs are available for PDFium at the moment.
372
377
373
378
<!-- TODO write something about weakref.finalize(); add example on creating a C page array -->
379
+
<!-- TODO doctests? -->
374
380
375
381
* In general, PDFium functions can be called just like normal Python functions.
376
382
However, parameters may only be passed positionally, i.e. it is not possible to use keyword arguments.
@@ -478,25 +484,29 @@ Nonetheless, the following guide may be helpful to get started with the raw API,
478
484
479
485
* Leaving strings, let's suppose you have a C memory buffer allocated by PDFium and wish to read its data.
480
486
PDFium will provide you with a pointer to the first item of the byte array.
481
-
To access the data, you'll want to re-interpret the pointer with `ctypes.cast()` to encompass the whole array:
487
+
To access the data, you'll want to re-interpret the pointer as an array view using `.from_address()`:
482
488
```python
483
489
# (Assuming `bitmap` is an FPDF_BITMAP and `size` is the expected number of bytes in the buffer)
Note that you can achieve the same result with `ctypes.cast(ptr, POINTER(type * size)).contents`, but this is problematic because ctypes appears to cache pointer types indefinitely. As `size` may vary, this can lead to memory-leak-like behavior in long-running applications, so it is better avoided.
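The `.from_address()` approach can be tried without PDFium. The following is a minimal sketch using only `ctypes`; the `backing` array is a hypothetical stand-in for a buffer that PDFium would normally allocate, and `first_item` plays the role of the pointer PDFium would return.

```python
import ctypes

# Hypothetical stand-in for a C buffer (in practice, PDFium would own this memory)
size = 8
backing = (ctypes.c_ubyte * size)(*range(size))

# Pointer to the first item, similar to what a C API would hand back
first_item = ctypes.cast(backing, ctypes.POINTER(ctypes.c_ubyte))

# Array view onto the same memory, without creating a sized pointer type
view = (ctypes.c_ubyte * size).from_address(ctypes.addressof(first_item.contents))

# Independent copy as Python bytes
data = bytes(view)

# The view references the original memory, so writes go through to it
view[0] = 255
```

Here `data` is unaffected by the later write, whereas `backing[0]` changes, because `view` and `backing` share the same memory.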
491
501
492
502
* Writing data from Python into a C buffer works in a similar fashion:
493
503
```python
494
504
# (Assuming `buffer_ptr` is a pointer to the first item of a C buffer to write into,
495
505
# `size` the number of bytes it can store, and `py_buffer` a Python byte buffer)
# Read from the Python buffer, starting at its current position, directly into the C buffer
498
508
# (until the target is full or the end of the source is reached)
499
-
n_bytes = py_buffer.readinto(buffer_ptr.contents) # returns the number of bytes read
509
+
n_bytes = py_buffer.readinto(buffer) # returns the number of bytes read
500
510
```
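As a self-contained illustration of the write direction, here is a sketch that runs without PDFium. The `backing` array is again a hypothetical stand-in for a C buffer, and `target` is an assumed name for the array view passed to `readinto()`.

```python
import ctypes
import io

# Hypothetical stand-in for a C buffer of `size` bytes
size = 4
backing = (ctypes.c_ubyte * size)()
buffer_ptr = ctypes.cast(backing, ctypes.POINTER(ctypes.c_ubyte))

# Writable array view onto the C buffer
target = (ctypes.c_ubyte * size).from_address(ctypes.addressof(buffer_ptr.contents))

# Source is longer than the target, so only the first `size` bytes are read
py_buffer = io.BytesIO(b"abcdefgh")
n_bytes = py_buffer.readinto(target)
```

The source position advances by `n_bytes`, so a subsequent `readinto()` call would continue with the remaining data.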
501
511
502
512
* If you wish to check whether two objects returned by PDFium are the same, the `is` operator won't help because `ctypes` does not have original object return (OOR), i.e. new, equivalent Python objects are created each time, although they might represent one and the same C object.[^ctypes_no_oor]
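One possible way to compare such objects is by the address they point to. The sketch below uses plain `ctypes` objects in place of PDFium handles; `same_c_object()` is a hypothetical helper, not part of pypdfium2.

```python
import ctypes

def same_c_object(ptr_a, ptr_b):
    # Hypothetical helper: compare the addresses of the pointed-to objects
    return ctypes.addressof(ptr_a.contents) == ctypes.addressof(ptr_b.contents)

value = ctypes.c_int(42)
other = ctypes.c_int(42)

# Two distinct Python pointer objects referring to the same C object
ptr_a = ctypes.pointer(value)
ptr_b = ctypes.pointer(value)
ptr_c = ctypes.pointer(other)

identical = ptr_a is ptr_b                 # False: different Python objects
same_target = same_c_object(ptr_a, ptr_b)  # True: same underlying address
diff_target = same_c_object(ptr_a, ptr_c)  # False: equal value, different object
```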
@@ -642,13 +652,16 @@ Nonetheless, the following guide may be helpful to get started with the raw API,
docs/devel/changelog_staging.md
+5
@@ -33,6 +33,11 @@
33
33
- In `PdfBitmap.new_*()` methods, avoid use of `.from_raw()`, and instead call the constructor directly, as most parameters are already known on the caller side when creating a bitmap.
34
34
- In the rendering CLI, added `--invert-lightness --exclude-images` post-processing options to render with selective lightness inversion. This may be useful to achieve a "dark theme" for light PDFs while preserving different colors, but comes at the cost of performance. (PDFium also provides a color scheme option, but this only allows you to set colors for certain object types, which are then forced on all instances of the type in question. This may flatten different colors into one, leading to a loss of visual information.)
35
35
- Corrected some null pointer checks: we have to use `bool(ptr)` rather than `ptr is None`.
36
+
- Avoid creation of sized pointer types at runtime, to avoid blowing up Python's unbounded pointer type cache, which could effectively lead to a memory leak in a long-running application (i.e. do `(type * size).from_address(addressof(first_ptr.contents))` instead of `cast(first_ptr, POINTER(type * size)).contents`). In our opinion, the root issue is ctypes using an unlimited cache in the first place. Upstream have already signalled willingness to address this in a future version of Python. Thanks to Richard Hundt for the bug report, {issue}`346`. See below for a list of APIs that were affected:
37
+
* Anything using `_buffer_reader`/`_buffer_writer` under the hood (`PdfDocument` created from byte stream input, `PdfImage.load_jpeg()`, `PdfDocument.save()`).
38
+
* `PdfBitmap.from_raw()` and `PdfBitmap._get_buffer()`, along with their internal callers (`PdfBitmap` makers `new_foreign` and `new_foreign_simple`, `PdfImage.get_bitmap()`).
39
+
* Also, some Readme snippets were affected, including the raw API rendering example. The Readme has been updated to mention the problem and use `.from_address(...)` instead.
40
+
* *With older versions, periodically calling `ctypes._reset_cache()` can work around this issue.*
36
41
- Improved startup performance by deferring imports of optional dependencies to the point where they are actually needed, to avoid overhead if you do not use them.
37
42
- Simplified version classes (no API change expected).
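To illustrate the null pointer entry above: a null `ctypes` pointer instance is still a Python object, so an `is None` check does not detect it, while its boolean value is false. A minimal sketch with plain `ctypes` pointers standing in for PDFium handles:

```python
import ctypes

# A null pointer instance and a valid pointer, as stand-ins for C API return values
null_ptr = ctypes.POINTER(ctypes.c_int)()
valid_ptr = ctypes.pointer(ctypes.c_int(1))

null_is_none = null_ptr is None   # False: the pointer object itself exists
null_truthy = bool(null_ptr)      # False: null pointers evaluate to False
valid_truthy = bool(valid_ptr)    # True
```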