Non-ASCII field names cannot be converted to lower case #11092

Open
rui-mo opened this issue Sep 25, 2024 · 0 comments
Labels: enhancement (New feature or request), triage (Newly created issue that needs attention)

rui-mo (Collaborator) commented Sep 25, 2024

Bug description

Non-ASCII field names are not converted to lower case because 'folly::toLowerAscii' only affects ASCII characters. As a result the output is 'null': the characters in 'fileType' are still in upper case, while 'fieldName' is in lower case (provided by Gluten when isFileColumnNamesReadAsLowerCase is true). 'fileTypeIdx' cannot be found, so the field is treated as a missing column.

E.g. 'Товары' needs to be converted to 'товары', and '国Ⅵ' to '国ⅵ'. To solve this issue, a library that handles UTF-8 characters, such as 'boost::locale::to_lower', could perhaps be used instead.
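
To illustrate the difference, here is a minimal, self-contained sketch. `toLowerAsciiOnly` is a hypothetical stand-in for an ASCII-only conversion (it is not folly's implementation), and `toLowerUtf8` is an illustrative UTF-8 aware conversion that knows only a few simple case mappings (ASCII, basic Cyrillic, Roman numerals). It is not a full Unicode implementation, and a real fix would more likely rely on a proper library. All function names are assumptions made for illustration, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for an ASCII-only lowering (NOT folly's code).
// Bytes of multi-byte UTF-8 sequences are >= 0x80 and are left untouched.
std::string toLowerAsciiOnly(std::string s) {
  for (char& c : s) {
    if (c >= 'A' && c <= 'Z') {
      c = static_cast<char>(c - 'A' + 'a');
    }
  }
  return s;
}

// Illustrative simple case mapping covering a few ranges only.
char32_t toLowerCodePoint(char32_t cp) {
  if (cp >= U'A' && cp <= U'Z') return cp + 0x20;      // A..Z
  if (cp >= 0x0410 && cp <= 0x042F) return cp + 0x20;  // Cyrillic А..Я
  if (cp >= 0x0400 && cp <= 0x040F) return cp + 0x50;  // Cyrillic Ѐ..Џ
  if (cp >= 0x2160 && cp <= 0x216F) return cp + 0x10;  // Roman numerals Ⅰ..Ⅿ
  return cp;
}

// Appends the UTF-8 encoding of a code point to 'out'.
void appendUtf8(char32_t cp, std::string& out) {
  if (cp < 0x80) {
    out.push_back(static_cast<char>(cp));
  } else if (cp < 0x800) {
    out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else if (cp < 0x10000) {
    out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else {
    out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
}

// Decodes each UTF-8 sequence, lower-cases the code point, and re-encodes it.
// Sequences that cannot be decoded, or that have no mapping, are copied as-is.
std::string toLowerUtf8(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  std::size_t i = 0;
  while (i < in.size()) {
    const unsigned char b0 = static_cast<unsigned char>(in[i]);
    std::size_t len = b0 < 0x80          ? 1
                      : (b0 >> 5) == 0x06 ? 2
                      : (b0 >> 4) == 0x0E ? 3
                      : (b0 >> 3) == 0x1E ? 4
                                          : 0;
    bool valid = len != 0 && i + len <= in.size();
    for (std::size_t k = 1; valid && k < len; ++k) {
      valid = (static_cast<unsigned char>(in[i + k]) & 0xC0) == 0x80;
    }
    if (!valid) {
      out.push_back(in[i++]);  // Copy the unexpected byte unchanged.
      continue;
    }
    char32_t cp = len == 1 ? b0 : (b0 & (0xFF >> (len + 1)));
    for (std::size_t k = 1; k < len; ++k) {
      cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
    }
    const char32_t lower = toLowerCodePoint(cp);
    if (lower == cp) {
      out.append(in, i, len);  // No mapping: keep the original bytes.
    } else {
      appendUtf8(lower, out);
    }
    i += len;
  }
  return out;
}
```

With this sketch, `toLowerAsciiOnly("Товары")` returns its input unchanged, while `toLowerUtf8` yields 'товары' and '国ⅵ' for the two examples above (assuming the source file and string literals are UTF-8 encoded).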

if (fileColumnNamesReadAsLowerCase) {
  folly::toLowerAscii(childName);
}

auto fileTypeIdx = fileType->getChildIdxIfExists(fieldName);
if (!fileTypeIdx.has_value()) {
  // Column is missing. Most likely due to schema evolution.
  VELOX_CHECK(tableSchema);
  childSpec->setConstantValue(BaseVector::createNullConstant(
      tableSchema->findChild(fieldName),
      1,
      connectorQueryCtx_->memoryPool()));
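
The failure path above can be reproduced in isolation with a rough sketch like the following. It does not use Velox or folly: `findChildIdx`, `lowerAsciiInPlace`, and `resolveColumn` are hypothetical helpers that only mimic an exact by-name lookup after an ASCII-only lower-casing of the file column names. It assumes C++17 (for `std::optional`) and UTF-8 encoded string literals.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a by-name child lookup: exact byte comparison.
std::optional<std::size_t> findChildIdx(
    const std::vector<std::string>& names,
    const std::string& name) {
  for (std::size_t i = 0; i < names.size(); ++i) {
    if (names[i] == name) {
      return i;
    }
  }
  return std::nullopt;
}

// Hypothetical ASCII-only lowering; non-ASCII bytes are left unchanged.
void lowerAsciiInPlace(std::string& s) {
  for (char& c : s) {
    if (c >= 'A' && c <= 'Z') {
      c = static_cast<char>(c - 'A' + 'a');
    }
  }
}

// Lower-cases the file column names (ASCII only), then looks up a field name
// that the caller has already fully lower-cased.
std::optional<std::size_t> resolveColumn(
    std::vector<std::string> fileColumns,
    const std::string& fieldName) {
  for (auto& column : fileColumns) {
    lowerAsciiInPlace(column);
  }
  return findChildIdx(fileColumns, fieldName);
}
```

In this sketch, resolving "id" against the file columns {"ID", "Товары", "Price"} succeeds, but resolving "товары" returns an empty optional, which corresponds to the branch where the column is treated as missing and filled with a null constant.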

System information

Velox System Info v0.0.2
CMake Version: 3.28.3
System: Linux-5.4.0-189-generic
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 11.1.0
C Compiler: /usr/bin/cc
C Compiler Version: 11.1.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.8/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

No response

rui-mo added the bug and triage labels and removed the bug label Sep 25, 2024
rui-mo added the enhancement label Sep 25, 2024