Create paragraphs from a collection of lines.
It creates new paragraph elements, each containing an array of line elements.
It walks through the lines one by one in reading order, adding each line to the current paragraph. When the next line belongs to another paragraph, it closes the current paragraph and starts a new one.
- tolerance: Ratio used when merging lines into paragraphs, taking into account the line height and the distance from the bottom of a line to the next line. TIP: If two lines end up in the same paragraph but should be split into two paragraphs, decrease the tolerance value. If two lines end up in different paragraphs but should be part of the same paragraph, increase the tolerance value.
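
The grouping loop and the effect of tolerance can be illustrated with a minimal sketch. This is not the module's actual implementation: the `Line` class, the `group_lines` function, the default of 0.25, and the exact rule (split when the vertical gap exceeds `tolerance` times the previous line's height) are assumptions made for illustration. It also assumes the lines are already in reading order and sit in a single column.

```python
from dataclasses import dataclass


@dataclass
class Line:
    # Hypothetical line element: only the fields this sketch needs.
    text: str
    top: float     # y coordinate of the top edge, growing down the page
    height: float  # line height


def group_lines(lines, tolerance=0.25):
    """Group lines (already in reading order) into paragraphs.

    Assumed rule: start a new paragraph when the gap between the bottom
    of the previous line and the top of the next one is larger than
    tolerance * previous line height.
    """
    paragraphs = []
    current = []
    for line in lines:
        if current:
            prev = current[-1]
            gap = line.top - (prev.top + prev.height)
            if gap > tolerance * prev.height:
                # The next line is in another paragraph: close and restart.
                paragraphs.append(current)
                current = []
        current.append(line)
    if current:
        paragraphs.append(current)
    return paragraphs


lines = [
    Line("First line", top=0, height=10),
    Line("second line", top=12, height=10),    # gap of 2 after line 1
    Line("New paragraph", top=32, height=10),  # gap of 10 after line 2
]

for tol in (0.1, 0.25, 1.0):
    sizes = [len(p) for p in group_lines(lines, tol)]
    print(tol, sizes)
```

With these sample lines, a lower tolerance splits more eagerly and a higher one merges more, which matches the TIP: decrease the value to split lines that were wrongly merged, increase it to merge lines that were wrongly split.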
Almost perfect
- It depends on the reading order detection quality
- To detect the space between paragraphs, it currently uses a heuristic and does not adapt automatically to the interline spacing. So if a paragraph has a large interline spacing, the algorithm may fail and create one paragraph per line. That said, in our experience this rarely occurs.