rui-mo added the bug and triage labels and removed the bug label on Sep 25, 2024
Bug description
We found that non-ASCII field names are not converted to lower case, because 'folly::toLowerAscii' only applies to ASCII characters. This causes 'null' output: the characters in 'fileType' are still in upper case, while 'fieldName' is in lower case (provided by Gluten when isFileColumnNamesReadAsLowerCase is true). As a result, 'fileTypeIdx' cannot be found and the field is treated as a missing column.
E.g. 'Товары' needs to be converted to 'товары', and '国Ⅵ' to '国ⅵ'. To solve this issue, perhaps a library that handles UTF-8 characters, such as 'boost::locale::to_lower', could be used instead.
velox/velox/dwio/parquet/reader/ParquetReader.cpp
Lines 760 to 762 in 3a1a60a
velox/velox/connectors/hive/SplitReader.cpp
Lines 371 to 378 in 3a1a60a
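The mismatch described above can be reproduced without Velox. The following is a minimal sketch, not the Velox or folly code: `toLowerAsciiOnly` is a hypothetical stand-in that mimics an ASCII-only lower-casing routine, and `toLowerUtf8Sketch` is a hypothetical helper that decodes UTF-8 and maps only a few ranges (ASCII, basic Cyrillic capitals, Roman numerals) that happen to cover the two examples in this report. It assumes the source file and the input strings are valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for an ASCII-only lower-casing routine:
// only the bytes 'A'..'Z' are changed, every other byte is kept.
std::string toLowerAsciiOnly(std::string s) {
  for (char& c : s) {
    if (c >= 'A' && c <= 'Z') {
      c = static_cast<char>(c - 'A' + 'a');
    }
  }
  return s;
}

// Appends the UTF-8 encoding of a code point up to U+FFFF.
void appendUtf8(std::string& out, uint32_t cp) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Illustrative case mapping for a few ranges only (assumption: this is
// NOT a complete Unicode mapping, just enough for the examples here).
uint32_t toLowerCodePoint(uint32_t cp) {
  if (cp >= 'A' && cp <= 'Z') return cp + 0x20;       // A..Z
  if (cp >= 0x0410 && cp <= 0x042F) return cp + 0x20; // Cyrillic А..Я
  if (cp >= 0x2160 && cp <= 0x216F) return cp + 0x10; // Roman numerals Ⅰ..Ⅿ
  return cp;
}

// Decodes 1- to 3-byte UTF-8 sequences, lower-cases each code point, and
// re-encodes it. 4-byte sequences, stray bytes, and truncated tails are
// copied through unchanged.
std::string toLowerUtf8Sketch(const std::string& in) {
  std::string out;
  std::size_t i = 0;
  while (i < in.size()) {
    const unsigned char b = static_cast<unsigned char>(in[i]);
    uint32_t cp = 0;
    std::size_t len = 0;
    if (b < 0x80) {
      cp = b;
      len = 1;
    } else if ((b & 0xE0) == 0xC0) {
      cp = b & 0x1F;
      len = 2;
    } else if ((b & 0xF0) == 0xE0) {
      cp = b & 0x0F;
      len = 3;
    } else {
      out += in[i++];
      continue;
    }
    if (i + len > in.size()) {
      out.append(in, i, std::string::npos);
      break;
    }
    for (std::size_t k = 1; k < len; ++k) {
      cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
    }
    appendUtf8(out, toLowerCodePoint(cp));
    i += len;
  }
  return out;
}

// Mirrors the report: the ASCII-only routine leaves the file column name
// unchanged, so it never equals the lower-cased field name.
void checkIssueExamples() {
  assert(toLowerAsciiOnly("Товары") != "товары");
  assert(toLowerUtf8Sketch("Товары") == "товары");
  assert(toLowerUtf8Sketch("国Ⅵ") == "国ⅵ");
}
```

The hand-written ranges are only there to keep the sketch dependency-free. An actual fix would more likely delegate to a library with full Unicode case mapping, such as the 'boost::locale::to_lower' suggested above, rather than maintain a table like this.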
System information
Velox System Info v0.0.2
CMake Version: 3.28.3
System: Linux-5.4.0-189-generic
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 11.1.0
C Compiler: /usr/bin/cc
C Compiler Version: 11.1.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.8/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt
Relevant logs
No response