Hi.
When I try to process a PDF in Japanese, it produces gibberish like the following:
E -
6
) -
18 BE E# #B
BE
B
5E
A
471 64123
- 5
I've tried setting the language to Japanese, based on code I found inside this repo:
import os

from langchain_openai import ChatOpenAI
# MegaParse, ParserBuilder and ParseFileConfig come from the megaparse packages (imports omitted here).

model = ChatOpenAI(model="gpt-4o", api_key=os.getenv("OPENAI_API_KEY"))  # type: ignore
parser_config = ParseFileConfig(
    llm_model_name="gpt-4o",
    # method=method,
    # strategy=strategy,
    model=model,
    language="ja",  # request Japanese
    # parsing_instruction=parsing_instruction,
)
parser_builder = ParserBuilder()
parser = parser_builder.build(parser_config)
megaparse = MegaParse(parser)
response = megaparse.load("./document.pdf")
print(response)
megaparse.save("./document.md")
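For anyone trying to reproduce this, a quick sanity check on the returned text might help distinguish "gibberish" from a partially correct parse. The sketch below is not part of MegaParse; `japanese_ratio` is a hypothetical helper that measures what fraction of the non-whitespace characters fall in the hiragana, katakana, or common CJK ideograph ranges.

```python
def japanese_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are
    hiragana, katakana, or CJK unified ideographs (hypothetical helper)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0

    def is_japanese(c: str) -> bool:
        cp = ord(c)
        return (
            0x3040 <= cp <= 0x309F      # hiragana
            or 0x30A0 <= cp <= 0x30FF   # katakana
            or 0x4E00 <= cp <= 0x9FFF   # CJK unified ideographs
        )

    return sum(is_japanese(c) for c in chars) / len(chars)
```

Calling `japanese_ratio(str(response))` on the output above would give a value at or near zero, whereas a correctly parsed Japanese document should score much higher.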
Probably related to #92.