How to use with languages that use non-Latin characters? #219

Hi.
When I try to process a Japanese PDF, it produces gibberish like the following:
```
E -
6
) -
18 BE E# #B
BE
B
5E
A
471 64123
- 5
```

I tried setting the language to Japanese (`language="ja"`), based on code I found in this repo:
```py
import os

from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o", api_key=os.getenv("OPENAI_API_KEY"))  # type: ignore
parser_config = ParseFileConfig(
    llm_model_name="gpt-4o",
    # method=method,
    # strategy=strategy,
    model=model,
    language="ja",  # Japanese (ISO 639-1 code)
    # parsing_instruction=parsing_instruction,
)

parser_builder = ParserBuilder()
parser = parser_builder.build(parser_config)
megaparse = MegaParse(parser)
response = megaparse.load("./document.pdf")
print(response)
megaparse.save("./document.md")
```
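To make "gibberish" measurable when comparing settings (for example with and without `language="ja"`), a small check can count how much of the output is actually Japanese script. This is a minimal sketch; `japanese_ratio` is a hypothetical helper, not part of MegaParse, and it assumes the parsed result can be turned into a plain string (e.g. `str(response)`). It only covers Hiragana, Katakana and the main CJK Unified Ideographs block. A ratio near zero on a document that is mostly Japanese means the characters were lost somewhere in the pipeline.

```python
import re

# Hiragana (U+3040-309F), Katakana (U+30A0-30FF) and
# CJK Unified Ideographs (U+4E00-9FFF, the common kanji block).
_JAPANESE_CHARS = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def japanese_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are Japanese script."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if _JAPANESE_CHARS.match(c))
    return hits / len(chars)


# The garbled output shown above contains no Japanese characters at all.
garbled = "E -\n6\n) -\n18 BE E# #B\nBE\nB\n5E\nA\n471 64123\n- 5"
print(japanese_ratio(garbled))  # 0.0
print(japanese_ratio("日本語のテキスト"))  # 1.0
```

With the snippet above, something like `japanese_ratio(str(response))` gives a quick number to compare across parser settings.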

This is probably related to https://github.com/QuivrHQ/MegaParse/issues/92.