Number of chunks not always reliable

The number of chunks produced from a CSV file is not always consistent. During testing, the `.report_from_directory()` function generated 76 chunks for the test dataset. When the `report.csv` file it produced is copied and renamed, it sometimes returns 76 chunks and other times 75. One possible cause is the way whitespace or line breaks are treated in the document: some autoformatting may reduce the number of characters in the CSV file.
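
One way to test the whitespace and line-break hypothesis is to compare the original file and its copy at the byte level. The following is a minimal sketch, not part of the project: `describe_bytes` and `compare_files` are hypothetical helper names, and it assumes the difference (if any) shows up in file size, line endings, a byte-order mark, or the trailing newline.

```python
import hashlib
from pathlib import Path


def describe_bytes(data: bytes) -> dict:
    """Summarise byte-level features that often change when a file is copied or re-saved."""
    crlf = data.count(b"\r\n")
    return {
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "has_bom": data.startswith(b"\xef\xbb\xbf"),  # UTF-8 byte-order mark
        "crlf": crlf,                                 # Windows line endings
        "lone_lf": data.count(b"\n") - crlf,          # Unix line endings
        "lone_cr": data.count(b"\r") - crlf,          # old Mac line endings
        "ends_with_newline": data.endswith((b"\n", b"\r")),
    }


def compare_files(original, copy) -> dict:
    """Return only the features that differ, as (original, copy) pairs."""
    a = describe_bytes(Path(original).read_bytes())
    b = describe_bytes(Path(copy).read_bytes())
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}
```

If `compare_files("report.csv", "report_copy.csv")` returns an empty dictionary, the two files are byte-identical and the varying chunk count would have to come from somewhere other than the file contents.
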
The exact root cause of this issue should be investigated, along with whether it can be resolved by adding a pre-processing step that removes problematic characters before the chunking step to improve reliability.
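
A pre-processing step of the kind proposed here could look like the sketch below. It is only an illustration under assumptions: `normalize_csv_text` is a hypothetical name, and the "problematic characters" are assumed to be a byte-order mark, mixed line endings, trailing spaces or tabs, and trailing blank lines. Note that it also rewrites line breaks and trailing whitespace inside quoted CSV fields, which may or may not be acceptable for the dataset.

```python
def normalize_csv_text(text: str) -> str:
    """Return text with a canonical whitespace layout (assumed sources of drift only)."""
    # Drop a leading byte-order mark if the file was saved with one.
    text = text.lstrip("\ufeff")
    # Convert Windows (CRLF) and old Mac (CR) line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Remove trailing spaces and tabs from every line.
    lines = [line.rstrip(" \t") for line in text.split("\n")]
    # Remove blank lines at the end of the file.
    while lines and lines[-1] == "":
        lines.pop()
    # End non-empty output with exactly one newline.
    return "\n".join(lines) + "\n" if lines else ""
```

The function is idempotent, so running it on an already normalized file changes nothing. Applying it to both the original `report.csv` and the renamed copy before chunking would show whether the 76 versus 75 discrepancy disappears once the character counts match.
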

Number of chunks not always reliable #13
