Description
page.extract_text() raises an IndexError on one page of this PDF, although Chrome and macOS Preview open the file without any issue.
pdf-online validator's output:
File | example.pdf
-- | --
Compliance | pdf1.7
Result | Document does not conform to PDF/A.
Details | Validating file "example.pdf" for conformance level pdf1.7 The name F1 of a font resource is unknown. The name FormXob.a31602a3f14463f4d5d3143608a8d452 of a xobject resource is unknown. The encoding for character code 8226 in font 'STSong-Light' is missing. The name F1 of a font resource is unknown. The name FormXob.a31602a3f14463f4d5d3143608a8d452 of a xobject resource is unknown. The encoding for character code 169 in font 'STSong-Light' is missing. The name F1 of a font resource is unknown. The name FormXob.a31602a3f14463f4d5d3143608a8d452 of a xobject resource is unknown. The name F1 of a font resource is unknown. The name FormXob.a31602a3f14463f4d5d3143608a8d452 of a xobject resource is unknown. The name F1 of a font resource is unknown. The name FormXob.a31602a3f14463f4d5d3143608a8d452 of a xobject resource is unknown. The encoding for character code 9899 in font 'STSong-Light' is missing. The name F1 of a font resource is unknown. The name FormXob.a31602a3f14463f4d5d3143608a8d452 of a xobject resource is unknown. The encoding for character code 8226 in font 'STSong-Light' is missing. The encoding for character code 183 in font 'STSong-Light' is missing. The name F1 of a font resource is unknown. The name FormXob.a31602a3f14463f4d5d3143608a8d452 of a xobject resource is unknown. The encoding for character code 8226 in font 'STSong-Light' is missing. The encoding for character code 9899 in font 'STSong-Light' is missing. The name F1 of a font resource is unknown. The name FormXob.a31602a3f14463f4d5d3143608a8d452 of a xobject resource is unknown. The encoding for character code 8226 in font 'STSong-Light' is missing.
Environment
$ python -m platform
macOS-15.3.1-arm64-arm-64bit
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.3.0, crypt_provider=('cryptography', '43.0.0'), PIL=none
Also reproduced on Ubuntu 22.04 and in a Jupyter notebook.
Code + PDF
import pdb
import sys
import traceback

from pypdf import PdfReader

reader = PdfReader("example.pdf")
for page in reader.pages:
    # page_number is zero-based, so this selects the 50th page
    if page.page_number != 49:
        continue
    try:
        text = page.extract_text()
    except Exception:
        _, _, tb = sys.exc_info()
        traceback.print_exc()  # Optional: print the full traceback
        pdb.post_mortem(tb)  # Drop into the debugger at the failing frame
I cannot share the PDF, since it contains proprietary information, nor do I know how it was encoded.
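Until the root cause is identified, a possible stopgap is to skip the failing page instead of aborting the whole extraction. The following is a minimal sketch, not a fix: `extract_text_tolerant` is a hypothetical helper name, it assumes that losing the text of the affected page is acceptable, and it only relies on each page object exposing `extract_text()`.

```python
import logging
from typing import Any, Dict, Iterable, Optional

logger = logging.getLogger(__name__)


def extract_text_tolerant(pages: Iterable[Any]) -> Dict[int, Optional[str]]:
    """Extract text page by page, recording None for pages that fail.

    Hypothetical helper: `pages` is any iterable of objects that have an
    extract_text() method, for example PdfReader(...).pages.
    """
    results: Dict[int, Optional[str]] = {}
    for index, page in enumerate(pages):
        try:
            results[index] = page.extract_text()
        except IndexError as exc:
            # Only the error seen in this report is caught; anything else
            # still propagates so that unrelated bugs are not hidden.
            logger.warning("Skipping page %d: %s", index, exc)
            results[index] = None
    return results
```

With pypdf installed, this would be called as `extract_text_tolerant(PdfReader("example.pdf").pages)`. Pages mapped to `None` are the ones that raised.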
Traceback
WARNING:pypdf._page:Impossible to decode XFormObject /FormXob.a31602a3f14463f4d5d3143608a8d452: '/XObject'
Traceback (most recent call last):
  File "<ipython-input-4-2a61f328ceb7>", line 6, in <cell line: 0>
    text = page.extract_text()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pypdf/_page.py", line 2378, in extract_text
    return self._extract_text(
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pypdf/_page.py", line 2148, in _extract_text
    process_operation(operator, operands)
  File "/usr/local/lib/python3.11/dist-packages/pypdf/_page.py", line 1961, in process_operation
    cm_matrix = mult(
                ^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pypdf/_text_extraction/__init__.py", line 72, in mult
    m[2] * n[0] + m[3] * n[2],
                  ~^^^
IndexError: list index out of range
> /usr/local/lib/python3.11/dist-packages/pypdf/_text_extraction/__init__.py(72)mult()
     70         m[0] * n[0] + m[1] * n[2],
     71         m[0] * n[1] + m[1] * n[3],
---> 72         m[2] * n[0] + m[3] * n[2],
     73         m[2] * n[1] + m[3] * n[3],
     74         m[4] * n[0] + m[5] * n[2] + n[4],
In the debugger, m is [0.70278, 65.3, 163.36]: a list of only three elements, whereas mult indexes m[0] through m[5].
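Since the PDF cannot be shared, the failure can at least be reproduced in isolation. The sketch below is a standalone copy of the arithmetic visible in the debugger listing, not an import from pypdf: the first five rows come from the listing above, while the sixth row is an assumption based on the usual composition of 2D affine matrices written as [a, b, c, d, e, f]. Where the three-element list originates in the PDF is not known.

```python
from typing import List


def mult(m: List[float], n: List[float]) -> List[float]:
    """Compose two 2D affine matrices given as [a, b, c, d, e, f]."""
    return [
        m[0] * n[0] + m[1] * n[2],
        m[0] * n[1] + m[1] * n[3],
        m[2] * n[0] + m[3] * n[2],
        m[2] * n[1] + m[3] * n[3],
        m[4] * n[0] + m[5] * n[2] + n[4],
        m[4] * n[1] + m[5] * n[3] + n[5],  # assumed; not shown in the listing
    ]


IDENTITY = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
SHORT_M = [0.70278, 65.3, 163.36]  # value of m observed in the debugger


def reproduce() -> str:
    """Call mult with the three-element list and report what happens."""
    try:
        mult(SHORT_M, IDENTITY)
    except IndexError as exc:
        return f"IndexError: {exc}"
    return "no error"


print(reproduce())  # prints "IndexError: list index out of range"
```

The first two rows succeed because they only read m[0] and m[1]; the third row reads m[3], which is the first index past the end of the list and matches the line flagged in the traceback.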