Cannot extract text correctly for some CJK fonts #2295

Hi there, we're trying to use this cool library to extract text for some downstream processing, but extraction fails on the attached PDF. The PDF contains Traditional Chinese characters, but the extracted output looks like random characters.

It looks like this PDF uses CFF-based fonts embedded with the CIDFontType0C subtype. Is that not currently supported by pypdf? Let us know if there's anything we can help with. We're not super familiar with the internals, but we're happy to help out.
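To double-check which font types a page actually uses, and whether they carry a /ToUnicode CMap, a small helper could walk a font dictionary. This is a hypothetical helper, not a pypdf API: it only reads standard PDF keys (/Subtype, /BaseFont, /Encoding, /DescendantFonts, /FontDescriptor, /FontFile3) and follows pypdf's indirect references by duck-typing `get_object()`, so it also works on plain dicts.

```python
def _resolve(obj):
    """Follow an indirect reference if the object supports it (pypdf objects do)."""
    return obj.get_object() if hasattr(obj, "get_object") else obj


def describe_font(font) -> dict:
    """Summarize a (Type0) font dictionary: subtypes, encoding, ToUnicode, embedded font format."""
    font = _resolve(font)
    info = {
        "subtype": font.get("/Subtype"),
        "base_font": font.get("/BaseFont"),
        "encoding": _resolve(font.get("/Encoding")),
        "has_tounicode": "/ToUnicode" in font,
        "cid_subtype": None,
        "fontfile3_subtype": None,
    }
    # Type0 fonts keep the actual glyph data on their single descendant CIDFont.
    descendants = _resolve(font.get("/DescendantFonts"))
    if descendants:
        cid_font = _resolve(descendants[0])
        info["cid_subtype"] = cid_font.get("/Subtype")
        descriptor = _resolve(cid_font.get("/FontDescriptor"))
        # A /FontFile3 stream with /Subtype /CIDFontType0C holds an embedded CFF font.
        if descriptor and "/FontFile3" in descriptor:
            info["fontfile3_subtype"] = _resolve(descriptor["/FontFile3"]).get("/Subtype")
    return info
```

With pypdf this might be called as `[describe_font(f) for f in reader.pages[0]["/Resources"]["/Font"].values()]`, assuming the page lists its fonts directly in its own /Resources rather than only inside form XObjects.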

## Environment

Which environment were you using when you encountered the problem?

```bash
$ python3 -m platform
macOS-13.5-x86_64-i386-64bit

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.16.2, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=none
```

## Code + PDF


This is a minimal, complete example that shows the issue:

```python
from pypdf import PdfReader

reader = PdfReader("caibao.pdf")
# Print the extracted text of every page
for page in reader.pages:
    text = page.extract_text()
    print(text)
```

Output:

```
2ࣨ
˜
ɚཧɚɧϋ
ʬ˜ɧɤ˚ɚཧɚɚϋ
ʬ˜ɧɤ˚ Νˢᜊਗ
ৰ̮
ϗɝ 299,194 269,505 11%
ˣл 139,022 114,941 21%
л 80,729 67,284 20%
л 53,417 42,963 24%
л 52,009 42,032 24%
ɛ͏࿆ʩ
 Ñਿ͉ 5.486 4.407 24%
 Ñᛅᑛ 5.334 4.320 23%
л 98,511 73,205 35%
л 70,086 53,684 31%
ɛ͏࿆ʩ
 Ñਿ͉ 7.393 5.628 31%
 Ñᛅᑛ 7.236 5.516 31%
```
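As far as we understand, pypdf (like most extractors) relies on a font's /ToUnicode CMap, or a known encoding, to turn 2-byte character codes into text, so the garbled lines above are consistent with codes that never got mapped to Unicode. The sketch below is only an illustration of how that mapping works, not pypdf's implementation: it parses the `bfchar` and `bfrange` sections of a ToUnicode CMap and decodes 2-byte codes as with an Identity-H encoding. The names `parse_tounicode` and `decode_cids` are hypothetical, the range handling increments the whole destination value (a simplification of the spec's last-byte rule), and unmapped codes fall back to `chr(code)` purely to mimic the symptom.

```python
import re

_HEX = r"<([0-9A-Fa-f]+)>"
_RANGE = re.compile(_HEX + r"\s*" + _HEX + r"\s*(?:" + _HEX + r"|\[([^\]]*)\])")


def _decode_utf16(hex_str: str) -> str:
    """Destination strings in a ToUnicode CMap are UTF-16BE encoded as hex."""
    return bytes.fromhex(hex_str).decode("utf-16-be")


def parse_tounicode(cmap_text: str) -> dict[int, str]:
    """Build a code -> text mapping from the bfchar/bfrange sections of a CMap."""
    mapping: dict[int, str] = {}
    # bfchar: one "<src> <dst>" pair per entry
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(_HEX + r"\s*" + _HEX, block):
            mapping[int(src, 16)] = _decode_utf16(dst)
    # bfrange: "<lo> <hi> <dstStart>" or "<lo> <hi> [<d1> <d2> ...]"
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst, arr in _RANGE.findall(block):
            lo_i, hi_i = int(lo, 16), int(hi, 16)
            if dst:
                start, width = int(dst, 16), len(dst) // 2
                for offset in range(hi_i - lo_i + 1):
                    value = (start + offset).to_bytes(width, "big")
                    mapping[lo_i + offset] = _decode_utf16(value.hex())
            else:
                for offset, d in enumerate(re.findall(_HEX, arr)[: hi_i - lo_i + 1]):
                    mapping[lo_i + offset] = _decode_utf16(d)
    return mapping


def decode_cids(data: bytes, mapping: dict[int, str]) -> str:
    """Decode 2-byte codes through mapping; unmapped codes become raw code points."""
    chars = []
    for i in range(0, len(data) - 1, 2):
        code = int.from_bytes(data[i:i + 2], "big")
        chars.append(mapping.get(code, chr(code)))
    return "".join(chars)
```

For example, a `bfchar` entry `<0A1B> <8CA1>` maps code 0x0A1B to U+8CA1 (財). With pypdf, the CMap text could presumably be obtained via `font["/ToUnicode"].get_data().decode("latin-1")`.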

PDF:
[caibao.pdf](https://github.com/py-pdf/pypdf/files/13331843/caibao.pdf)
