Commit e889201
authored
Fix Hadoop multi-value string null value handling to match native batch (#18944)
Doing some more digging, I found another unfortunate data difference between native batch (on-cluster) and Hadoop batch ingest. Ingesting a multi-value string ["a","b",null] with Hadoop is treated as ["a","b","null"] and in native batch, this correctly ingests to ["a","b",null]. This is difference appears to be a bug in all Druid versions(even latest). While this will not affect the current null handling migration, this will affect the future Hadoop -> native batch ingestion migration that will also need to take place.
Hadoop doesn't allow for all-null columns in segments, it simply excludes them from the segment. I've updated the Hadoop job to support running druid.indexer.task.storeEmptyColumns=true, which allows us to store all NULL columns (how native/streaming ingest work today).
BREAKING CHANGES
1. Hadoop ingests will now process multi-value string inputs like ["a","b",null] -> ["a","b",null] instead of ["a","b","null"] to match native batch ingestion.
2. Hadoop ingests will now by default keep columns with all NULL values, instead of excluding them from the segment.
useStringValueOfNullInLists parameter in RowBasedColumnSelectorFactory.java has been removed.1 parent cec4f5b commit e889201
28 files changed
Lines changed: 728 additions & 61 deletions
File tree
- extensions-core/avro-extensions/src/test/java/org/apache/druid/data/input
- indexing-hadoop/src
- main/java/org/apache/druid/indexer
- test/java/org/apache/druid/indexer
- processing/src
- main/java/org/apache/druid
- data/input
- java/util/common
- query
- groupby/epinephelinae
- timeseries
- segment
- incremental
- transform
- test/java/org/apache/druid
- data/input/impl
- frame
- key
- write
- query
- filter
- groupby
- having
- orderby
- segment
- filter
- transform
- virtual
- sql/src/test/java/org/apache/druid/sql/calcite/expression
Lines changed: 4 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
45 | 45 | | |
46 | 46 | | |
47 | 47 | | |
| 48 | + | |
48 | 49 | | |
49 | 50 | | |
50 | 51 | | |
| |||
71 | 72 | | |
72 | 73 | | |
73 | 74 | | |
| 75 | + | |
74 | 76 | | |
75 | 77 | | |
76 | 78 | | |
| |||
366 | 368 | | |
367 | 369 | | |
368 | 370 | | |
| 371 | + | |
369 | 372 | | |
370 | | - | |
| 373 | + | |
371 | 374 | | |
372 | 375 | | |
373 | 376 | | |
| |||
Lines changed: 14 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
52 | 52 | | |
53 | 53 | | |
54 | 54 | | |
55 | | - | |
| 55 | + | |
| 56 | + | |
56 | 57 | | |
57 | 58 | | |
58 | 59 | | |
| |||
88 | 89 | | |
89 | 90 | | |
90 | 91 | | |
| 92 | + | |
| 93 | + | |
91 | 94 | | |
92 | 95 | | |
93 | 96 | | |
94 | 97 | | |
95 | 98 | | |
96 | | - | |
| 99 | + | |
97 | 100 | | |
98 | 101 | | |
99 | 102 | | |
| |||
129 | 132 | | |
130 | 133 | | |
131 | 134 | | |
132 | | - | |
133 | 135 | | |
134 | 136 | | |
135 | 137 | | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
136 | 146 | | |
137 | 147 | | |
138 | 148 | | |
| |||
262 | 272 | | |
263 | 273 | | |
264 | 274 | | |
| 275 | + | |
265 | 276 | | |
266 | 277 | | |
267 | 278 | | |
| |||
Lines changed: 2 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
602 | 602 | | |
603 | 603 | | |
604 | 604 | | |
605 | | - | |
| 605 | + | |
606 | 606 | | |
607 | 607 | | |
608 | 608 | | |
| |||
614 | 614 | | |
615 | 615 | | |
616 | 616 | | |
617 | | - | |
| 617 | + | |
618 | 618 | | |
619 | 619 | | |
620 | 620 | | |
| |||
Lines changed: 11 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
372 | 372 | | |
373 | 373 | | |
374 | 374 | | |
375 | | - | |
| 375 | + | |
376 | 376 | | |
377 | | - | |
| 377 | + | |
378 | 378 | | |
379 | 379 | | |
380 | 380 | | |
| |||
389 | 389 | | |
390 | 390 | | |
391 | 391 | | |
| 392 | + | |
392 | 393 | | |
393 | 394 | | |
394 | 395 | | |
395 | | - | |
| 396 | + | |
396 | 397 | | |
397 | 398 | | |
| 399 | + | |
398 | 400 | | |
399 | 401 | | |
400 | 402 | | |
| 403 | + | |
| 404 | + | |
| 405 | + | |
401 | 406 | | |
402 | 407 | | |
403 | 408 | | |
| |||
449 | 454 | | |
450 | 455 | | |
451 | 456 | | |
452 | | - | |
| 457 | + | |
| 458 | + | |
| 459 | + | |
453 | 460 | | |
454 | 461 | | |
455 | 462 | | |
| |||
Lines changed: 78 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
38 | 38 | | |
39 | 39 | | |
40 | 40 | | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
41 | 44 | | |
42 | 45 | | |
43 | 46 | | |
| |||
46 | 49 | | |
47 | 50 | | |
48 | 51 | | |
| 52 | + | |
49 | 53 | | |
50 | 54 | | |
51 | 55 | | |
52 | 56 | | |
53 | 57 | | |
| 58 | + | |
54 | 59 | | |
55 | 60 | | |
56 | 61 | | |
| |||
214 | 219 | | |
215 | 220 | | |
216 | 221 | | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
| 264 | + | |
| 265 | + | |
| 266 | + | |
| 267 | + | |
| 268 | + | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
| 276 | + | |
| 277 | + | |
| 278 | + | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
| 283 | + | |
| 284 | + | |
| 285 | + | |
| 286 | + | |
| 287 | + | |
| 288 | + | |
| 289 | + | |
| 290 | + | |
| 291 | + | |
| 292 | + | |
| 293 | + | |
| 294 | + | |
217 | 295 | | |
218 | 296 | | |
219 | 297 | | |
| |||
0 commit comments