Open
Labels
bug: Something isn't working
Description
Describe the bug
When calling partition_pdf or partition(text.. ), partitioning works for docx and txt files. For some PDFs, however, especially academic papers, the output is not parsed well.
To Reproduce
elements = self._partition(**self._build_partition_kwargs(loaded))
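The line above goes through a wrapper, so a standalone reproduction may be easier to triage. The following is a minimal sketch, assuming the wrapper ultimately calls `partition_pdf` from `unstructured.partition.pdf` with a `filename` and a `strategy`. The helper names `summarize_elements` and `compare_strategies` are hypothetical and not part of unstructured, and the PDF path is a placeholder for the file that fails. The `hi_res` strategy usually needs the optional PDF extras installed.

```python
from collections import Counter


def summarize_elements(elements):
    """Summarize a list of partitioned elements (hypothetical helper).

    Uses duck typing: each element is expected to expose `.text` and,
    ideally, `.category`; the class name is used as a fallback.
    """
    by_category = Counter(
        getattr(el, "category", type(el).__name__) for el in elements
    )
    total_chars = sum(len(getattr(el, "text", "") or "") for el in elements)
    return {
        "count": len(elements),
        "by_category": dict(by_category),
        "total_chars": total_chars,
    }


def compare_strategies(pdf_path, strategies=("fast", "hi_res")):
    """Partition the same PDF with several strategies and summarize each run."""
    # Imported lazily so summarize_elements works without unstructured installed.
    from unstructured.partition.pdf import partition_pdf

    results = {}
    for strategy in strategies:
        elements = partition_pdf(filename=pdf_path, strategy=strategy)
        results[strategy] = summarize_elements(elements)
    return results
```

Running `compare_strategies("path/to/failing.pdf")` on the affected file and pasting the resulting dictionary would show whether the problem is missing text (low `total_chars`) or misclassified elements (unexpected `by_category` counts).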
Expected behavior
The call should return a list of elements, each containing plain text and some metadata.
Screenshots
**Environment Info**
name = "unstructured"
version = "0.17.2"
description = "A library that prepares raw documents for downstream ML tasks."
optional = false
python-versions = ">=3.9.0"
groups = ["main"]
files = [
{file = "unstructured-0.17.2-py3-none-any.whl", hash = ..
{file = "unstructured-0.17.2.tar.gz", hash = ..
Additional context
I will upload an example PDF that fails to parse.