Commit 3395c4c
Avoid per-item numpy conversion in JaggedArrayStore write path
TreeStore.extend and extend_with_batch were converting each item to a
numpy array individually before passing to JaggedArrayStore.extend,
which then concatenated them. For a batch of 16K tokenized sequences,
this meant 16K np.asarray calls plus one np.concatenate.
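For illustration only, the per-item pattern described above might look roughly like the sketch below. The function name `flatten_per_item` and the `int32` default dtype are assumptions made for this example; they are not taken from the repository.

```python
import numpy as np

def flatten_per_item(sequences, dtype=np.int32):
    """Hypothetical stand-in for the old path: one array per item, then one concatenate."""
    # One np.asarray call (and one small temporary array) per sequence.
    arrays = [np.asarray(seq, dtype=dtype) for seq in sequences]
    if not arrays:
        # np.concatenate raises on an empty list, so handle that case explicitly.
        return np.empty(0, dtype=dtype)
    # A single concatenate copies every temporary into the final flat buffer.
    return np.concatenate(arrays)
```

Every item is materialized as its own array and then copied again by the concatenate, which is the overhead the commit message points at.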
Add PreparedBatch.from_sequences() that pre-allocates a single flat
array from the cumulative lengths and copies each sequence directly
into the right slice. JaggedArrayStore.extend now detects Python
sequences (lists) and uses this fast path automatically.
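A minimal sketch of the pre-allocation idea, assuming the batch is a list of Python lists. The name `flatten_preallocated`, the `int32` default dtype, and the returned end-offset array are hypothetical; the actual `PreparedBatch.from_sequences()` may be structured differently.

```python
import numpy as np

def flatten_preallocated(sequences, dtype=np.int32):
    """Hypothetical sketch: size the flat buffer once, then copy each sequence into its slice."""
    lengths = np.array([len(seq) for seq in sequences], dtype=np.int64)
    # offsets[i] is the exclusive end position of sequence i in the flat array.
    offsets = np.cumsum(lengths)
    total = int(offsets[-1]) if offsets.size else 0
    flat = np.empty(total, dtype=dtype)

    start = 0
    for seq, end in zip(sequences, offsets):
        end = int(end)
        if end > start:  # skip empty sequences
            flat[start:end] = seq
        start = end
    return flat, offsets
```

The result holds the same data a per-item `np.concatenate` would produce, but the flat buffer is allocated once and no list of intermediate arrays is built.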
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 493c9bb commit 3395c4c
3 files changed
Lines changed: 63 additions & 10 deletions