Cannot extract text from a certain page in a document, due to an unexpectedly low number of operands in a `cm` operator (#3262).

Chrome and macOS Preview open the PDF without any issue.

pdf-online validator's output:

```
File:       example.pdf
Compliance: pdf1.7
Result:     Document does not conform to PDF/A.
Details:
Validating file "example.pdf" for conformance level pdf1.7
The name F1 of a font resource is unknown.
The name FormXob.a31602a3f14463f4d5d3143608a8d452 of a xobject resource is unknown.
The encoding for character code 8226 in font 'STSong-Light' is missing.
The name F1 of a font resource is unknown.
The name FormXob.a31602a3f14463f4d5d3143608a8d452 of a xobject resource is unknown.
The encoding for character code 169 in font 'STSong-Light' is missing.
The name F1 of a font resource is unknown.
The name FormXob.a31602a3f14463f4d5d3143608a8d452 of a xobject resource is unknown.
The name F1 of a font resource is unknown.
The name FormXob.a31602a3f14463f4d5d3143608a8d452 of a xobject resource is unknown.
The name F1 of a font resource is unknown.
The name FormXob.a31602a3f14463f4d5d3143608a8d452 of a xobject resource is unknown.
The encoding for character code 9899 in font 'STSong-Light' is missing.
The name F1 of a font resource is unknown.
The name FormXob.a31602a3f14463f4d5d3143608a8d452 of a xobject resource is unknown.
The encoding for character code 8226 in font 'STSong-Light' is missing.
The encoding for character code 183 in font 'STSong-Light' is missing.
The name F1 of a font resource is unknown.
The name FormXob.a31602a3f14463f4d5d3143608a8d452 of a xobject resource is unknown.
The encoding for character code 8226 in font 'STSong-Light' is missing.
The encoding for character code 9899 in font 'STSong-Light' is missing.
The name F1 of a font resource is unknown.
The name FormXob.a31602a3f14463f4d5d3143608a8d452 of a xobject resource is unknown.
The encoding for character code 8226 in font 'STSong-Light' is missing.
```

## Environment

```bash
$ python -m platform
macOS-15.3.1-arm64-arm-64bit

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.3.0, crypt_provider=('cryptography', '43.0.0'), PIL=none
```

Also reproduced on Ubuntu 22.04 and in a Jupyter notebook.

## Code + PDF

```py
import pdb
import sys
import traceback
from pypdf import PdfReader

reader = PdfReader("example.pdf")
for page in reader.pages:
  # page_number is the 0-based page index; skip everything but the failing page
  if page.page_number != 49:
    continue

  try:
    text = page.extract_text()
  except Exception:
    _, _, tb = sys.exc_info()
    traceback.print_exc()  # Optional: print the full traceback
    pdb.post_mortem(tb)  # Inspect the failing frame interactively
```

I cannot share the PDF, since it contains proprietary information, nor do I know how it was encoded.
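Since the file itself cannot be attached, one way to share just the offending operator is to scan the parsed content stream for `cm` entries that do not carry six operands. The following is only a sketch: `find_malformed_cm` and `sample_ops` are hypothetical names, and the sample data is made up to mimic the operands reported under Traceback, not taken from the real PDF.

```python
from typing import Any, Iterable, List, Sequence, Tuple


def find_malformed_cm(
    operations: Iterable[Tuple[Sequence[Any], bytes]],
) -> List[Tuple[int, List[Any]]]:
    """Return (index, operands) for every `cm` that does not have six operands."""
    bad = []
    for index, (operands, operator) in enumerate(operations):
        if operator == b"cm" and len(operands) != 6:
            bad.append((index, list(operands)))
    return bad


# Hypothetical sample shaped like a parsed content stream:
# a list of (operands, operator) pairs, with the operator as bytes.
sample_ops = [
    ([], b"q"),
    ([1, 0, 0, 1, 10, 20], b"cm"),
    ([0.70278, 65.3, 163.36], b"cm"),  # three operands instead of six
    ([], b"Q"),
]

print(find_malformed_cm(sample_ops))
```

With pypdf, the input could presumably come from `page.get_contents().operations`, which yields `(operands, operator)` pairs. The faulty operator may instead live inside the form XObject named in the warning, which this sketch does not traverse.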

## Traceback

```
WARNING:pypdf._page:Impossible to decode XFormObject /FormXob.a31602a3f14463f4d5d3143608a8d452: '/XObject'
Traceback (most recent call last):
  File "<ipython-input-4-2a61f328ceb7>", line 6, in <cell line: 0>
    text = page.extract_text()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pypdf/_page.py", line 2378, in extract_text
    return self._extract_text(
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pypdf/_page.py", line 2148, in _extract_text
    process_operation(operator, operands)
  File "/usr/local/lib/python3.11/dist-packages/pypdf/_page.py", line 1961, in process_operation
    cm_matrix = mult(
                ^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pypdf/_text_extraction/__init__.py", line 72, in mult
    m[2] * n[0] + m[3] * n[2],
                  ~^^^
IndexError: list index out of range
> /usr/local/lib/python3.11/dist-packages/pypdf/_text_extraction/__init__.py(72)mult()
     70         m[0] * n[0] + m[1] * n[2],
     71         m[0] * n[1] + m[1] * n[3],
---> 72         m[2] * n[0] + m[3] * n[2],
     73         m[2] * n[1] + m[3] * n[3],
     74         m[4] * n[0] + m[5] * n[2] + n[4],
```

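In the debugger, `m` is `[0.70278, 65.3, 163.36]`: only three operands, whereas a `cm` operator requires six, so `m[3]` raises the `IndexError`.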
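The failure can be reproduced without the PDF. The sketch below re-creates the matrix product from the lines visible in the traceback (the last row is an assumption that follows the same pattern) and feeds it the three-element `m`. `apply_cm` and `IDENTITY` are hypothetical names, and skipping the malformed operator is just one possible handling, not necessarily what pypdf should do.

```python
from typing import List, Sequence

# Identity transformation in PDF's [a, b, c, d, e, f] layout.
IDENTITY = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]


def mult(m: Sequence[float], n: Sequence[float]) -> List[float]:
    """Multiply two affine matrices stored as [a, b, c, d, e, f].

    The first five rows mirror the traceback; the sixth is assumed.
    """
    return [
        m[0] * n[0] + m[1] * n[2],
        m[0] * n[1] + m[1] * n[3],
        m[2] * n[0] + m[3] * n[2],
        m[2] * n[1] + m[3] * n[3],
        m[4] * n[0] + m[5] * n[2] + n[4],
        m[4] * n[1] + m[5] * n[3] + n[5],
    ]


def apply_cm(operands: Sequence[float], current: Sequence[float]) -> List[float]:
    """Hypothetical guard: leave the matrix unchanged if `cm` lacks six operands."""
    if len(operands) != 6:
        return list(current)
    return mult(operands, current)


m = [0.70278, 65.3, 163.36]  # operands observed in the debugger

try:
    mult(m, IDENTITY)
except IndexError as exc:
    print("unguarded:", exc)

print("guarded:", apply_cm(m, IDENTITY))
```

Whether a malformed `cm` should be skipped, logged as a warning, or rejected in strict mode is a design decision for the maintainers. The guard only shows that the operand count can be checked before the multiplication.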
