bug/Upgrading from version 0.14.10 to 0.15.0 results in the loss of code block formatting in Markdown. #3501

Open
@Fleandre

Describe the bug
After upgrading from version 0.14.10 to 0.15.0, I found that line breaks inside Markdown code blocks are lost, as shown in the image below.

image

To Reproduce
Simply call elements = partition_md(filename=file_path)

The source Markdown file and the resulting output files are attached below.
Each output file was created by concatenating the text of all elements returned by partition_md.
test_markdown_ch.md
output_0.14.10.txt
output_0.15.0.txt
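
For anyone who wants to regenerate the output files, the reproduction can be sketched roughly as follows. This is a minimal sketch, not the exact script used for the attachments: the helper names join_element_text and dump_partition_output are made up for illustration, and the blank-line separator is an assumption, since the report does not say how the element text was joined.

```python
from pathlib import Path


def join_element_text(elements, separator="\n\n"):
    """Concatenate the .text of each element, skipping empty ones.

    The separator is an assumption; the report only says the text
    of all elements was concatenated.
    """
    return separator.join(el.text for el in elements if getattr(el, "text", ""))


def dump_partition_output(file_path, output_path):
    """Partition a Markdown file and write the concatenated text to output_path."""
    # Imported lazily so join_element_text can be used without unstructured installed.
    from unstructured.partition.md import partition_md

    elements = partition_md(filename=str(file_path))
    text = join_element_text(elements)
    Path(output_path).write_text(text, encoding="utf-8")
    return text
```

Running dump_partition_output("test_markdown_ch.md", "output.txt") once in an environment with 0.14.10 and once with 0.15.0, then diffing the two files, should show whether the code block line breaks survive.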

Expected behavior
Line breaks inside Markdown code blocks are preserved in the partitioned output.

Screenshots
Please refer to the screenshot above.

Environment Info
Please run python scripts/collect_env.py and paste the output here.
This will help us understand more about the environment in which the bug occurred.

Additional context
In the changelog for version 0.15.0, I observed the following:

Improve text clearing process in email partitioning. Updated the email partitioner to remove both =\n and =\r\n characters during the clearing process. Previously, only =\n characters were removed.

Additionally, partition_md works by converting the Markdown to HTML and then calling partition_html.

Is this issue related to the new HTML partitioning strategy?
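
To make the suspicion concrete, here is a small standard-library sketch of how whitespace normalization during HTML text extraction could drop the newlines of a code block. It is not the code from unstructured, and I have not confirmed that this is what 0.15.0 does. The CodeTextExtractor class and extract_code function are hypothetical, and the HTML string is a hand-written stand-in for what a Markdown-to-HTML converter typically emits for a fenced code block.

```python
from html.parser import HTMLParser


class CodeTextExtractor(HTMLParser):
    """Collect the raw text found inside <pre> blocks."""

    def __init__(self):
        super().__init__()
        self._in_pre = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._in_pre = True
            self.chunks.append("")

    def handle_endtag(self, tag):
        if tag == "pre":
            self._in_pre = False

    def handle_data(self, data):
        if self._in_pre:
            self.chunks[-1] += data


def extract_code(html, collapse_whitespace):
    """Return the text of each <pre> block, optionally collapsing whitespace."""
    parser = CodeTextExtractor()
    parser.feed(html)
    parser.close()
    if collapse_whitespace:
        # Generic whitespace cleanup: every run of whitespace becomes one space,
        # which also removes the newlines that matter inside code.
        return [" ".join(chunk.split()) for chunk in parser.chunks]
    return parser.chunks


# Hand-written stand-in for a converted fenced code block (assumption).
html = "<pre><code>def f():\n    return 1\n</code></pre>"

print(extract_code(html, collapse_whitespace=True))
print(extract_code(html, collapse_whitespace=False))
```

If the new HTML partitioner applies the same cleanup to text under pre/code as it does to ordinary paragraphs, the result would look like the 0.15.0 output attached above.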

Labels: bug, html, markdown
