Commit 318330b
fix: preserve newlines in Table and TableChunk elements during PDF partitioning
The RE_MULTISPACE_INCLUDING_NEWLINES regex was being applied to all Text
elements, including Table and TableChunk. This incorrectly removed newline
characters that carry structural meaning in tables (row separation).
Fixes #39831

parent 4bbb1ff, commit 318330b
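The behavior described in the commit message can be sketched as follows. This is a minimal illustration, not the code from this commit: the `Text`, `Table`, and `TableChunk` classes are simplified stand-ins, the regex definition is an assumption about what `RE_MULTISPACE_INCLUDING_NEWLINES` matches, and `normalize_whitespace` is a hypothetical helper name.

```python
import re
from dataclasses import dataclass

# Assumed definition: matches any run of whitespace, newlines included.
RE_MULTISPACE_INCLUDING_NEWLINES = re.compile(r"[\s\n]+")


@dataclass
class Text:
    """Stand-in for a generic text element."""
    text: str


@dataclass
class Table(Text):
    """Stand-in for a table element; newlines separate rows."""


@dataclass
class TableChunk(Table):
    """Stand-in for a chunk of a split table."""


def normalize_whitespace(element: Text) -> Text:
    """Collapse whitespace runs, but leave table text untouched.

    Hypothetical helper: tables are skipped so that the newlines
    marking row boundaries are not replaced by single spaces.
    """
    if isinstance(element, (Table, TableChunk)):
        return element
    element.text = RE_MULTISPACE_INCLUDING_NEWLINES.sub(" ", element.text).strip()
    return element
```

With this guard, a paragraph such as `"Hello   \n  world"` is flattened to a single line, while a table whose text is `"a b\nc d"` keeps its two rows.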
3 files changed, +15 -6 lines changed

(Diff content was not captured; only the hunk line numbers survive.)
File 1: 5 lines added at new lines 1-5.
File 2: line 1 replaced.
File 3: 2 lines added at new lines 37-38; old lines 826-830 replaced by new lines 828-834.