Commit c1cb835
Write Tokenized Data Sizes as metadata (#2431)
Writes out the token size info alongside the tokenized data itself
(request from
https://discord.com/channels/1354881461060243556/1366632114316906506/1458962443542724785).
This doesn't help for already-tokenized data, but it means that, moving
forward, reasonable stats will live alongside the data itself, so they can
be accessed easily to compute things like epochs.
1 parent 96fae76 commit c1cb835
2 files changed
Lines changed: 30 additions & 2 deletions
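The point of the change is that the token counts computed during tokenization are persisted next to the output, so a later training config can turn them into an epoch count without re-scanning the data. A minimal sketch, assuming a JSON sidecar file. The file name `token_stats.json`, its fields, and the helpers `write_token_stats`, `read_token_stats`, and `epochs_for_run` are all hypothetical and are not the names or schema used in this commit.

```python
import json
import os
import tempfile
from typing import Iterable, Sequence

# Hypothetical sidecar file name; the commit's real metadata file and schema are not shown here.
STATS_FILENAME = "token_stats.json"


def write_token_stats(output_dir: str, tokenized_docs: Iterable[Sequence[int]]) -> dict:
    """Count documents and tokens, then write the counts next to the tokenized data."""
    num_docs = 0
    total_tokens = 0
    for doc in tokenized_docs:
        num_docs += 1
        total_tokens += len(doc)
    stats = {"num_documents": num_docs, "total_tokens": total_tokens}
    with open(os.path.join(output_dir, STATS_FILENAME), "w") as f:
        json.dump(stats, f)
    return stats


def read_token_stats(output_dir: str) -> dict:
    """Load the previously written stats without touching the token data itself."""
    with open(os.path.join(output_dir, STATS_FILENAME)) as f:
        return json.load(f)


def epochs_for_run(train_steps: int, batch_size: int, seq_len: int, dataset_tokens: int) -> float:
    """Passes over the dataset: tokens consumed by training / tokens available."""
    return train_steps * batch_size * seq_len / dataset_tokens


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_token_stats(d, [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]])
        stats = read_token_stats(d)
        print(stats)  # {'num_documents': 3, 'total_tokens': 10}
        # 5 steps * 2 sequences * 4 tokens = 40 tokens consumed over a 10-token dataset
        print(epochs_for_run(train_steps=5, batch_size=2, seq_len=4, dataset_tokens=stats["total_tokens"]))  # 4.0
```

The epoch estimate assumes every training sequence is fully packed with real tokens (no padding), so treat it as an approximation rather than an exact count.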