bug/group_bullet_paragraph causes problems by returning a list #2547

**Describe the bug**
Passing `unstructured.cleaners.core.group_bullet_paragraph` in `UnstructuredBaseLoader`'s `post_processors` breaks loading. `group_bullet_paragraph` returns a `List[str]`, but the `unstructured.documents.elements.Text.apply()` method checks each cleaner's output and raises a `ValueError` if it is not a `str`; see [here](https://github.com/Unstructured-IO/unstructured/blob/1a706771facd0adef754e2f87ec58479e42251e6/unstructured/documents/elements.py#L811):

```python
if not isinstance(cleaned_text, str):  # pyright: ignore[reportUnnecessaryIsInstance]
    raise ValueError("Cleaner produced a non-string output.")
```

**To Reproduce**
```python
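# Import paths are assumed; adjust them to your installed langchain version.
from langchain_community.document_loaders import UnstructuredFileLoader
from unstructured.cleaners.core import group_bullet_paragraph

loader = UnstructuredFileLoader(
    "some_file_that_has_bullet_points.pdf",
    mode="elements",
    pdf_infer_table_structure=True,
    skip_infer_table_types=['jpg', 'png', 'xls', 'xlsx'],
    show_progress=True,
    post_processors=[group_bullet_paragraph],  # returns List[str], which triggers the error
)
docs = loader.load()  # raises ValueError: Cleaner produced a non-string output.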
```

**Expected behavior**
The list of strings should be joined into a single string instead of raising an error.
I propose replacing:

```python
if not isinstance(cleaned_text, str):  # pyright: ignore[reportUnnecessaryIsInstance]
    raise ValueError("Cleaner produced a non-string output.")
```

with something like:

```python
if isinstance(cleaned_text, list):
    cleaned_text = " ".join(cleaned_text)
if not isinstance(cleaned_text, str):  # pyright: ignore[reportUnnecessaryIsInstance]
    raise ValueError("Cleaner produced a non-string output.")
```
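
Until something like this lands, a possible workaround is to wrap the cleaner on the caller's side so it always returns a `str`. The sketch below is only an illustration: `join_list_output` and `split_bullets` are hypothetical names that are not part of `unstructured`, and `split_bullets` is a stand-in for any cleaner that returns a `List[str]`.

```python
from typing import Callable, List, Union

CleanerOutput = Union[str, List[str]]


def join_list_output(
    cleaner: Callable[[str], CleanerOutput], sep: str = " "
) -> Callable[[str], str]:
    """Wrap a cleaner so a list-of-strings result is joined into one str."""

    def wrapped(text: str) -> str:
        result = cleaner(text)
        if isinstance(result, list):
            # Same join as in the proposed fix, applied before apply() sees it.
            return sep.join(result)
        return result

    return wrapped


# Hypothetical stand-in for a cleaner that returns List[str].
def split_bullets(text: str) -> List[str]:
    return [part.strip() for part in text.split("•") if part.strip()]


joined_cleaner = join_list_output(split_bullets)
print(joined_cleaner("• first item • second item"))
```

With a wrapper like this, the reproduction would presumably pass `post_processors=[join_list_output(group_bullet_paragraph)]`; I have not verified that against the loader.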
