Skip to content

Conversation

@gaurav7261
Copy link
Contributor

@gaurav7261 gaurav7261 commented Dec 19, 2025

Why I'm doing:

[BugFix] Fix SIGSEGV in Int32ToDateConverter for NULL values on ARM64

The Int32ToDateConverter::convert() method was unconditionally accessing                                                                                                                        
src_data[i] for all rows, including NULL values. When Parquet statistics                                                                                                                        
contain NULL dates (common in Delta Lake tables with deletion vectors),                                                                                                                         
the underlying data array is uninitialized, causing SIGSEGV when accessed.                                                                                                                      
                                                                                                                                                                                                
Reproduction case:                                                                                                                                                                              
- Delta Lake table with DATE column                                                                                                                                                             
- Query with date predicate: WHERE day > current_date - interval 1 day                                                                                                                          
- Parquet row group statistics contain NULL min/max                                                                                                                                             
- Zone map filtering calls Int32ToDateConverter::convert()                                                                                                                                      
- Crash at: dst_data[i]._julian = src_data[i] + UNIX_EPOCH_JULIAN                                                                                                                               

This crash was architecture-specific (ARM64/Graviton) because:
- ARM has stricter memory access patterns than x86_64
- Uninitialized memory contains different bit patterns on ARM
- x86_64 had the same latent bug but often didn't crash

Root cause:                                                                                                                                                                                     
- resize_uninitialized() uses RawAllocator (no initialization for performance)                                                                                                                  
- append_nulls() sets null flag but leaves data uninitialized                                                                                                                                   
- Int32ToDateConverter accessed uninitialized data for NULL rows                                                                                                                                

What I'm doing:

Fixes #issue

Fix: Add null check before accessing data, consistent with other
converters (Int32ToTimeConverter, Int32ToDateTimeConverter).

Changes:                                                                                                                                                                                        
- Add null flag check: if (!src_null_data[i])                                                                                                                                                   
- Only convert non-null date values                                                                                                                                                             
- Replace memcpy with loop for consistency                                                                                                                                                      
                                                                                                                                                                                                
Fixes: SIGSEGV during Parquet zone map filtering with NULL date statistics                                                                                                                      
Tested: ARM64 (Graviton) and x86_64            

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.0
    • 3.5
    • 3.4
    • 3.3

Note

Guard Int32-to-Date parquet conversion against NULLs by copying nulls and only converting non-null rows, preventing SIGSEGV.

  • Backend · Parquet:
    • Int32ToDateConverter::convert:
      • Copy null flags row-by-row and only set Date values for non-null entries.
      • Removes bulk memcpy of null bitmap and unconditional writes to dst_data that accessed uninitialized memory.

Written by Cursor Bugbot for commit 2ebd9f5. This will update automatically on new commits. Configure here.

@gaurav7261 gaurav7261 requested a review from a team as a code owner December 19, 2025 07:42
@gaurav7261
Copy link
Contributor Author

@alvin-celerdata @Youngwb does above changes make sense?

@github-actions
Copy link

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions
Copy link

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions
Copy link

[BE Incremental Coverage Report]

fail : 3 / 4 (75.00%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 src/formats/parquet/column_converter.cpp 3 4 75.00% [606]

@gaurav7261 gaurav7261 changed the title [BugFix] segmentation fault in parquet reading [BugFix] segmentation fault in parquet statistics reading Dec 19, 2025
@alvin-celerdata
Copy link
Contributor

@cursor review

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no bugs!


@stdpain
Copy link
Contributor

stdpain commented Dec 23, 2025

a similar fix #66864

@trueeyu
Copy link
Contributor

trueeyu commented Dec 23, 2025

a similar fix #66864

The fix: #66864 is better

@stdpain
Copy link
Contributor

stdpain commented Dec 23, 2025

closed via #66864

@stdpain stdpain closed this Dec 23, 2025
@gaurav7261
Copy link
Contributor Author

gaurav7261 commented Dec 26, 2025

Hi @stdpain @trueeyu i am facing random issue when applied above patch(#66864), in intel output of simple query is same, but in graviton it is giving random for example sometime 0, sometime x number, sometime y number etc.. how to debug

also one more thing i want to tell is same sigsev fault is not in 3.5, but in 4.x only, why?

i also want to add, my patch is working

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants