extract_text() returns garbled characters

I get garbled characters when extracting text from a PDF file. The file I use is [this](http://www.aas.net.cn/fileZDHXB/journal/article/zdhxb/2012/8/PDF/20120812.pdf). Could this be an encoding issue?

## Environment

```bash
$ python -m platform
Linux-4.18.0-147.5.1.6.h841.eulerosv2r9.x86_64-x86_64-with-glibc2.17

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.17.1, crypt_provider=('pycryptodome', '3.19.0'), PIL=10.0.1
```

## Code + PDF

This is a minimal, complete example that shows the issue:

```python
from pypdf import PdfReader

file_path = '20120812.pdf'
page_idx = 0

reader = PdfReader(file_path)
page = reader.pages[page_idx]
text = page.extract_text()
print(text)
```

The PDF file can be downloaded from [this URL](http://www.aas.net.cn/fileZDHXB/journal/article/zdhxb/2012/8/PDF/20120812.pdf).

The output is:

```
38ֻ8࿐ Б Vol. 38, No. 8
2012୍8ᄅ ACTA AUTOMATICA SINICA August, 2012
م
ᇛ ਟ1ࡹ1ྷ ೦2ᅦ ม1
ᅋေم, ྛऊো ,ۋ, ০Ⴈ
......
```
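
To make "garbled" a bit more concrete, here is a rough sketch that counts which scripts the extracted letters belong to. `script_histogram` is a hypothetical helper (not part of pypdf), and it approximates the script by the first word of each character's Unicode name, so treat the labels as a hint rather than an exact classification.

```python
import unicodedata
from collections import Counter


def script_histogram(text: str) -> Counter:
    """Count letters by a rough script label (hypothetical helper).

    The label is the first word of the character's Unicode name,
    e.g. "CJK", "LATIN", "CYRILLIC". This is an approximation,
    not a lookup of the real Unicode Script property.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, whitespace and punctuation
        name = unicodedata.name(ch, "UNKNOWN")
        counts[name.split()[0]] += 1
    return counts


# Small hand-made sample: Latin, Cyrillic, Arabic and CJK letters.
sample = "Vol. 38 \u0411 \u0645 \u81ea\u52a8\u5316"
print(script_histogram(sample))
```

Calling `script_histogram(text)` on the `text` from the example above should show the same mix. For a Chinese-language paper I would expect mostly `CJK` and `LATIN` letters, whereas the excerpt above already contains Cyrillic and Arabic letters.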


