-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-29375: FULL OUTER JOIN is failing with Unexpected hash table key type DATE #6239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Explanation: In
But the Line 821 in c237510
For
|
|
Will address sonar issues post review comments. I',m willing to move the if-else pattern + old switch style with jdk21 'switch expressions'. Reviewers can let me know. will file separate jira to migrate especially in vectorization. |
|
CC @zabetak , can you please help with the review? |
|
Hey @Aggarwal-Raghav, I am on holidays till Dec 29, with intermittent and not stable internet connection. Not sure if I will find time to check this before then. |
No worries. Enjoy !! 😅 |
|
Hi @Aggarwal-Raghav ,
Also I could see there are many UT test cases, may be you can add more test cases related to DATE HashTableKeyType. For example MapJoinTestConfig.java |
Thanks for the thorough review @mdayakar , i checked the above places you mentioned. For [2] and [3], it is already handled in this PR. For [1] and |
zabetak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM! I left some comments for minor improvements and potentially few unit tests more.
(The Sonar issues are not worth fixing; at least not now)
ql/src/test/queries/clientpositive/vector_full_outer_join_date.q
Outdated
Show resolved
Hide resolved
ql/src/test/queries/clientpositive/vector_full_outer_join_date.q
Outdated
Show resolved
Hide resolved
...g/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongHashMap.java
Outdated
Show resolved
Hide resolved
ql/src/test/queries/clientpositive/vector_full_outer_join_date.q
Outdated
Show resolved
Hide resolved
...test/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/TestVectorMapJoinFastRowHashMap.java
Outdated
Show resolved
Hide resolved
...test/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/TestVectorMapJoinFastRowHashMap.java
Outdated
Show resolved
Hide resolved
| case DATE: | ||
| hashTableKeyType = HashTableKeyType.DATE; | ||
| break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have unit tests exploiting this config? Do we need to add something in TestMapJoinOperator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add a testcase testDate0 in TestMapJoinOperator. As testString0 makes use of the DATE type, but it does so as a Value column, not as a Join Key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well if its not a Join key then the tests are not directly targeting the fix so not sure how much are needed. Do we have unit tests for join keys with different types somewhere in the repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In TestMapJoinOperator.java, the following are the join keys for UT. The bigTableKeyColumnNums = new int[] {0}; determines the join column.
- TestLong0: long
- testLong0_NoRegularKeys: long
- testLong1: int
- testLong2: short
- testLong3: int
- testLong3_NoRegularKeys: int
- testLong4: int
- testLong5: long
- testLong6: long
- testDate0: date
- testMultiKey0: short, int
- testMultiKey1: timestamp, short, string
- testMultiKey2: long, short, string
- testMultiKey3: date, byte
- testString0: string
- testString1: binary
- testString2: string
|
Thanks for the review @zabetak , I'll accomodate the suggestions. |
|
Will push changes based on your input on #6239 (comment) |
ql/src/test/queries/clientpositive/vector_full_outer_join_date.q
Outdated
Show resolved
Hide resolved
| case DATE: | ||
| hashTableKeyType = HashTableKeyType.DATE; | ||
| break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well if its not a Join key then the tests are not directly targeting the fix so not sure how much are needed. Do we have unit tests for join keys with different types somewhere in the repo?
...g/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongHashMap.java
Outdated
Show resolved
Hide resolved
|
zabetak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Aggarwal-Raghav Can you please clarify what kind of coverage we are adding with the unit tests:
- TestVectorMapJoinFastRowHashMap#testDateRowsExact
- TestMapJoinOperator#testDate0
It seems that none of the above goes through the problematic code.
That's strange, it was failing for me. Let me re-check and confirm on that. |
|
Please test with just these files from this PR:
For |
|
Thanks for the detailed explanation @Aggarwal-Raghav! My workspace was not clean and didn't notice that I was missing |
zabetak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience. I will merge the PR soon!
No worries. Thanks for the thorough review. |
|
Many thanks for the PR @Aggarwal-Raghav and @mdayakar for the review! |





What changes were proposed in this pull request?
Check HIVE-29375 for repro and stacktrace
Why are the changes needed?
To make
DATEtype support for full outer join, which is, getting converted to map join because ofhive.optimize.dynamic.partition.hashjoin=true;Does this PR introduce any user-facing change?
No
How was this patch tested?
Wrote a new q file and will see CI output