Number of chunks not always reliable #13

Description

@TylerWilsonHC-SC

The number of chunks produced from a CSV file is not always reproducible. During testing, the .report_from_directory() function generated 76 chunks for the test dataset. When the report.csv file it produced is copied and renamed, chunking the copy sometimes returns 76 chunks and other times 75. One potential cause is how whitespace or line breaks are treated in the document: some autoformatting may be reducing the number of characters in the CSV file.
We should investigate the exact root cause of this issue and whether it can be resolved by adding a pre-processing step that removes problematic characters before the chunking step, to improve reliability.
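
As a starting point for that investigation, here is a minimal sketch of what such a pre-processing step might look like. It is not the project's actual code: `normalize_text` and `count_chunks` are hypothetical helpers, and the fixed-size character chunking is only an assumption used to show how a change in line endings alone can shift the chunk count by one.

```python
import math


def normalize_text(raw: bytes) -> str:
    """Hypothetical pre-processing step: remove byte-level differences
    that editors or copy tools commonly introduce into a CSV file."""
    # "utf-8-sig" drops a leading byte-order mark if one is present.
    text = raw.decode("utf-8-sig")
    # Unify Windows (\r\n) and old Mac (\r) line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Strip trailing whitespace on each line (assumes it is never meaningful).
    lines = [line.rstrip() for line in text.split("\n")]
    # End the text with exactly one newline.
    return "\n".join(lines).strip("\n") + "\n"


def count_chunks(text: str, chunk_size: int) -> int:
    """Assumed chunker: fixed-size character windows without overlap."""
    return math.ceil(len(text) / chunk_size)


if __name__ == "__main__":
    # Same rows, different line endings (made-up sample data).
    crlf_copy = b"id,text\r\n" + b"1,hello\r\n" * 10
    lf_copy = b"id,text\n" + b"1,hello\n" * 10

    for name, raw in [("CRLF", crlf_copy), ("LF", lf_copy)]:
        before = count_chunks(raw.decode("utf-8"), chunk_size=10)
        after = count_chunks(normalize_text(raw), chunk_size=10)
        print(f"{name}: {before} chunks before, {after} after normalization")
```

If the real chunker splits on characters or bytes, comparing the two report.csv copies byte by byte (for example, checking for `\r\n` versus `\n` or a byte-order mark) should confirm or rule out this explanation before any normalization is added.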
