remove debug table validation checks #70

Merged
merged 1 commit from ncb/remove-asserts into ClickHouse/release/16.1.0 on Apr 17, 2025

Conversation


@bharatnc bharatnc commented Apr 17, 2025

Follow-up to previous PR: #69.

Remove the debug validation checks on table. These checks, which validate a table's metadata, fail mainly in debug builds. When dealing with dictionary indices, it looks like we accept signed or unsigned 32- or 64-bit integers for compatibility. However, the Arrow dictionary builder's AppendIndices only accepts signed integers, so we end up appending signed indices while the dictionary schema can still declare unsigned indices. When validation runs, it checks that the dictionary schema and the actual column data have the same type. Since it's possible to have a uint64 schema with int64 columns, this leads to errors like the following (and a similar error for uint32):

```
Column data for field 0 with type dictionary<values=string, indices=int64,
ordered=0> is inconsistent with schema dictionary<values=string,
indices=uint64, ordered=0>
```

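For illustration, the check that trips here amounts to a strict type equality between the schema's declared index type and the column's actual index type. This is a minimal Python sketch with a hypothetical helper name; the real check lives in Arrow's C++ ValidateMeta:

```python
# Hypothetical sketch of what ValidateMeta effectively requires: the
# column's actual dictionary index type must equal the schema's declared
# index type, including signedness.
def index_types_consistent(schema_index_type: str, column_index_type: str) -> bool:
    return schema_index_type == column_index_type

# AppendIndices only ever produces signed indices, but the schema may
# declare unsigned indices for pandas compatibility:
print(index_types_consistent("uint64", "int64"))  # False -> Status::Invalid
print(index_types_consistent("int64", "int64"))   # True
```

In debug builds this strict comparison rejects tables that are otherwise perfectly usable, which is why the assert fires even though the data round-trips correctly.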
Unfortunately, I couldn't find a sane way to preserve this behavior since the current behavior exists for compatibility with pandas format it looks like. The settings to support both singed and unsigned dictionary indices was added in: ClickHouse/ClickHouse#58519.

This is where the current assert fails:

```
RETURN_NOT_OK(ValidateMeta());
```

which is ultimately checked here:

```
return Status::Invalid("Column data for field ", i, " with type ",
```

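One way to keep some validation while tolerating this mismatch would be a signedness-insensitive comparison of index types. This is a hypothetical sketch of that idea, not the PR's actual change (which simply removes the check):

```python
# Hypothetical alternative (not what this PR does): compare only the bit
# width of the index types, ignoring signedness, so int64 column data is
# accepted against a uint64 schema while width mismatches stay rejected.
INDEX_BIT_WIDTH = {"int32": 32, "uint32": 32, "int64": 64, "uint64": 64}

def indices_compatible(schema_index_type: str, column_index_type: str) -> bool:
    return INDEX_BIT_WIDTH[schema_index_type] == INDEX_BIT_WIDTH[column_index_type]

print(indices_compatible("uint64", "int64"))  # True: signedness mismatch tolerated
print(indices_compatible("uint32", "int64"))  # False: width mismatch still rejected
```

Since the mismatch here only ever comes from AppendIndices producing signed indices of the same width, this would be enough to keep the rest of the metadata validation intact.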

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

@bharatnc bharatnc merged commit 210f686 into ClickHouse/release/16.1.0 Apr 17, 2025
7 of 66 checks passed
@bharatnc bharatnc deleted the ncb/remove-asserts branch April 17, 2025 20:12