Commit 6388894
refactor: Refactor Parquet reader to avoid loading entire file in memory at once (#184)
* refactor: Refactor Parquet reader to avoid whole-file loading in memory
Read Parquet metadata from the footer and fetch column chunk bytes by seeking,
instead of loading the entire file into memory up front.
This keeps the current page decoding path intact while reducing peak memory
usage for normal file reads: only the column chunks that are needed are
loaded, one at a time, so the extra memory is bounded by the size of a
single column chunk.
This is also the first step towards a streaming reader.
* Use Map.Strict to reduce the risk of leaking unevaluated thunks
* Remove forceNonSeekable from ReaderOpts; use functions and partial application to inject testing behavior while preserving the readParquetWithOpts API
1 parent 727c579 commit 6388894
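The footer-then-seek approach from the first bullet can be sketched roughly as follows. This is not the code from the commit: `le32`, `readRange`, and `readFooterBytes` are hypothetical names chosen for illustration. The only format fact relied on is that a Parquet file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`. Once the metadata is decoded, the same `readRange` helper could fetch a single column chunk by its offset and length, so only that chunk is resident at a time.

```haskell
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as BS
import System.IO
  ( Handle, IOMode (ReadMode), SeekMode (AbsoluteSeek)
  , hFileSize, hSeek, withBinaryFile )

-- | The 4-byte magic "PAR1" that starts and ends a Parquet file.
parquetMagic :: BS.ByteString
parquetMagic = BS.pack [0x50, 0x41, 0x52, 0x31]

-- | Decode a little-endian unsigned integer (hypothetical helper).
le32 :: BS.ByteString -> Int
le32 = BS.foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- | Read exactly the bytes [offset, offset + len) by seeking, without
-- touching the rest of the file (hypothetical helper).
readRange :: Handle -> Integer -> Int -> IO BS.ByteString
readRange h offset len = do
  hSeek h AbsoluteSeek offset
  BS.hGet h len

-- | Return the raw, still Thrift-encoded footer metadata bytes.
readFooterBytes :: FilePath -> IO (Either String BS.ByteString)
readFooterBytes path = withBinaryFile path ReadMode $ \h -> do
  size <- hFileSize h
  -- Leading magic (4) + metadata length (4) + trailing magic (4).
  if size < 12
    then pure (Left "file too small to be a Parquet file")
    else do
      trailer <- readRange h (size - 8) 8
      let (lenBytes, magic) = BS.splitAt 4 trailer
          metaLen = le32 lenBytes
      if magic /= parquetMagic
        then pure (Left "missing PAR1 magic at end of file")
        else
          if toInteger metaLen > size - 12
            then pure (Left "footer length exceeds file size")
            else Right <$> readRange h (size - 8 - toInteger metaLen) metaLen
```

With this shape, peak memory for a read is roughly the footer plus the one byte range currently being decoded, rather than the whole file.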
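The `Map.Strict` bullet refers to a general property of `Data.Map.Strict`: values are forced to weak head normal form when they are inserted, so an accumulating map does not build up chains of suspended computations. A minimal illustration, using a hypothetical `totalBytesPerColumn` function that is not part of the commit:

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- | Sum byte counts per column name (hypothetical example function).
-- With Data.Map.Strict, insertWith forces each combined value before
-- storing it, so the map holds evaluated Ints rather than a growing
-- chain of (+) thunks, as could happen with Data.Map.Lazy.
totalBytesPerColumn :: [(String, Int)] -> M.Map String Int
totalBytesPerColumn = foldl' step M.empty
  where
    step acc (name, n) = M.insertWith (+) name n acc
```

The strict left fold keeps the map itself evaluated at each step; the strict map API does the same for the values stored inside it.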
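The last bullet describes replacing a test-only option field with an injected function. One way this pattern can look, as a sketch under assumptions: `Opts`, `Strategy`, and the `chooseStrategy*` functions below are hypothetical stand-ins, not the library's actual `ReaderOpts` or `readParquetWithOpts`. The seekability probe is passed as an argument to an internal function, and the public entry point is that function partially applied to the real probe, so its signature does not change.

```haskell
import System.IO (Handle, IOMode (ReadMode), hIsSeekable, withBinaryFile)

-- | Hypothetical stand-in for reader options; note that it carries no
-- test-only flag.
newtype Opts = Opts { optVerbose :: Bool }

-- | Which read path would be taken (hypothetical).
data Strategy = SeekPath | FallbackPath
  deriving (Eq, Show)

-- | Internal worker: the seekability probe is an ordinary argument.
chooseStrategyWith :: (Handle -> IO Bool) -> Opts -> FilePath -> IO Strategy
chooseStrategyWith isSeekable _opts path =
  withBinaryFile path ReadMode $ \h -> do
    ok <- isSeekable h
    pure (if ok then SeekPath else FallbackPath)

-- | Public entry point: partial application fixes the real probe.
chooseStrategy :: Opts -> FilePath -> IO Strategy
chooseStrategy = chooseStrategyWith hIsSeekable

-- | Test-only variant: pretend every handle is non-seekable.
chooseStrategyNonSeekable :: Opts -> FilePath -> IO Strategy
chooseStrategyNonSeekable = chooseStrategyWith (\_ -> pure False)
```

Tests can then exercise the fallback path through `chooseStrategyNonSeekable` without any testing knob leaking into the options record that callers see.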
File tree
6 files changed, +216 -26 lines changed
- src/DataFrame/IO
- Parquet
- tests