
bug/<partition_md> does not properly handle title hierarchy #3952

Open
@unemployed-denizen

Description


Describe the bug
When processing markdown with partition_md, the title 'Section' should have 'Chapter 1' as its parent, but in this case its parent is NaN. Could you please suggest whether there is a parameter I might have missed that could be causing this? Thanks.

To Reproduce
Provide a code snippet that reproduces the issue.

markdown_document = """# Title\n\n \
## Chapter 1\n\n \
Hi this is Jim\n\n Hi this is Joe\n\n \
### Section \n\n \
Hi this is Lance \n\n
## Chapter 2\n\n \
Hi this is Molly"""

from unstructured.partition.md import partition_md
from unstructured.staging.base import convert_to_dataframe

# file = r"C:\Users\Cjy\Desktop\fix\out.md"
raw_md_elements = partition_md(text=markdown_document)
df = convert_to_dataframe(raw_md_elements)
df

Expected behavior
A clear and concise description of what you expected to happen.
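To make the expected hierarchy concrete, here is a minimal sketch that derives each heading's parent from its depth alone. It does not use unstructured at all: `heading_parents` is a hypothetical helper written only for illustration, and it assumes ATX-style headings (`#`, `##`, ...) with at most three leading spaces. It is not how partition_md computes parent_id, only a statement of the result I would expect.

```python
import re
from typing import Optional


def heading_parents(markdown: str) -> dict[str, Optional[str]]:
    """Map each heading title to the title of its nearest shallower heading.

    Hypothetical illustration helper, not part of unstructured.
    """
    stack: list[tuple[int, str]] = []  # (depth, title) of headings still "open"
    parents: dict[str, Optional[str]] = {}
    for line in markdown.splitlines():
        match = re.match(r"^ {0,3}(#{1,6})\s+(.*\S)\s*$", line)
        if not match:
            continue  # body text, not a heading
        depth, title = len(match.group(1)), match.group(2)
        # Close every heading at the same or a deeper level.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        parents[title] = stack[-1][1] if stack else None
        stack.append((depth, title))
    return parents


doc = (
    "# Title\n\n## Chapter 1\n\nHi this is Jim\n\n"
    "### Section\n\nHi this is Lance\n\n"
    "## Chapter 2\n\nHi this is Molly"
)
expected = heading_parents(doc)
```

Under these assumptions, 'Section' maps to 'Chapter 1', both chapters map to 'Title', and only 'Title' has no parent.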

Screenshots
If applicable, add screenshots to help explain your problem.
Image

Environment Info
Please run python scripts/collect_env.py and paste the output here.
This will help us understand more about the environment in which the bug occurred.

Additional context
version: 0.12.5

Metadata

Assignees

No one assigned

    Labels

    bug: Something isn't working
