8371297134 Type promotion between integral and floating types #2173

Merged
merged 9 commits into from
Mar 4, 2025


poodlewars
Collaborator

To fix the bug reported in 8371297134

  • Write an int64 column
  • Append a float32 to it
  • Read it -> We blow up at read time

I've added logic so that when we merge descriptors we combine:

  • any integer + float64 -> float64
  • integer up to 16 bits + float32 -> float32
  • integer above 16 bits + float32 -> float64

This is because every 16-bit integer fits in a 32-bit float without loss of precision (float32 has a 24-bit significand), whereas a 32-bit integer may not. We could instead always promote up to float64, which would be simpler but more wasteful.
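A minimal Python sketch of the descriptor-merge rules above, not the actual C++ implementation. The names `promote_int_and_float` and `survives_float32` are hypothetical and exist only for illustration; the round trip through `struct` shows why 16 bits is a safe cut-off for float32.

```python
import struct


def promote_int_and_float(int_bits: int, float_bits: int) -> int:
    """Return the float width (in bits) for merging an integer with a float.

    Hypothetical helper that mirrors the three rules listed in this PR.
    """
    if float_bits == 64:
        return 64  # any integer + float64 -> float64
    if float_bits == 32:
        # integers up to 16 bits fit exactly in float32, wider ones may not
        return 32 if int_bits <= 16 else 64
    raise ValueError(f"unsupported float width: {float_bits}")


def survives_float32(value: int) -> bool:
    """True if value is unchanged after a round trip through float32."""
    as_float32 = struct.unpack("f", struct.pack("f", float(value)))[0]
    return as_float32 == value


print(promote_int_and_float(16, 32))  # float32 is enough
print(promote_int_and_float(32, 32))  # needs float64
print(survives_float32(32767))        # largest int16 value
print(survives_float32(2**24 + 1))    # first integer float32 cannot represent
```

The last call is the interesting one: 2**24 + 1 is within int32 range but is rounded when stored as float32, which is the precision loss the float64 promotion avoids.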

Separately, in query builder pipelines, combining a float and an integer now always promotes up to float64. This misses the cases where it is actually safe to promote to float32, but is simpler and matches Pandas. This is the cause of the test change in test_sort_merge.py.
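For contrast with the descriptor-merge rules, the query builder rule can be sketched in Python as below. This is an illustration only: `promote_in_pipeline` and the type-name strings are hypothetical, and only the mixed integer/float case described here is covered.

```python
FLOAT_TYPES = {"float32", "float64"}
INT_TYPES = {
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
}


def promote_in_pipeline(left: str, right: str) -> str:
    """Result type when a pipeline combines one integer and one float column.

    Hypothetical helper: other combinations are out of scope for this sketch.
    """
    for name in (left, right):
        if name not in FLOAT_TYPES and name not in INT_TYPES:
            raise ValueError(f"unknown type: {name}")
    left_is_float = left in FLOAT_TYPES
    right_is_float = right in FLOAT_TYPES
    if left_is_float != right_is_float:
        # Exactly one side is floating point: always widen to float64,
        # even for pairs such as int16 + float32 where float32 would do.
        return "float64"
    raise NotImplementedError("only the mixed integer/float case is sketched")


print(promote_in_pipeline("int16", "float32"))
print(promote_in_pipeline("float32", "int64"))
```

Both calls give float64; the first is the case where the descriptor-merge rules would have kept float32.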

I reworked has_valid_type_promotion, introducing a new function is_valid_type_promotion_to_target instead. The has_valid_type_promotion signature was dangerous: it returned an optional type that was often interpreted only as a bool, with the actual type inside it ignored. I've replaced the call sites that only tested the bool with calls to the new is_valid_type_promotion_to_target, which returns only a bool.
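The hazard can be illustrated with a simplified Python analogue. The two function names come from this PR, but the bodies, the string type names and the `_PROMOTIONS` table are assumptions made for the sketch; the real functions are C++ and work on type descriptors.

```python
from typing import Optional

# Hypothetical promotion table: (source, requested target) -> actual result.
_PROMOTIONS = {
    ("int16", "float32"): "float32",
    ("int32", "float32"): "float64",  # result differs from the requested target
    ("int32", "float64"): "float64",
}


def has_valid_type_promotion(source: str, target: str) -> Optional[str]:
    """Old shape: returns the promoted type, which may not equal target."""
    return _PROMOTIONS.get((source, target))


def is_valid_type_promotion_to_target(source: str, target: str) -> bool:
    """New shape: True only if source promotes to exactly target."""
    return has_valid_type_promotion(source, target) == target


# A call site that only checks "did I get something back" is misled:
old_style_ok = has_valid_type_promotion("int32", "float32") is not None
new_style_ok = is_valid_type_promotion_to_target("int32", "float32")
print(old_style_ok, new_style_ok)
```

In the old style the caller concludes that int32 can be promoted to float32, when the returned type is actually float64. The bool-only function removes that ambiguity.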

Update tests: I deliberately promote to float64 here so we don't lose precision from the int32
…get that just returns a bool

To prevent dangerous uses where only the static_cast<bool> of its return value was used, but where the type to be promoted to was not the second argument given to it.
poodlewars marked this pull request as ready for review February 11, 2025 09:22
// Otherwise, if only one type is floating point, always promote to double
// For example when combining int32 and float32 the result can only fit in float64 without loss of precision
// Special cases like int16 and float32 can fit in float32, but we always promote up to float64 (as does Pandas)
double
Collaborator

This is technically a semver change of behaviour



def test_resample_mean_large_arithmetic_error_repro(lmdb_version_store_v1):
Collaborator

Could move to nonreg tests

@@ -860,7 +860,7 @@ def generic_resample_test(
 but it cannot take parameters such as origin and offset.
 """
 # Pandas doesn't have a good date_range equivalent in resample, so just use read for that
-expected = lib.read(sym, date_range=date_range).data
+original_data = lib.read(sym, date_range=date_range).data
Collaborator Author

This change is just to help debugging tests

poodlewars added the bug (Something isn't working) and patch (Small change, should increase patch version) labels Mar 3, 2025
poodlewars merged commit e55aa2f into master Mar 4, 2025
155 of 156 checks passed
poodlewars deleted the aseaton/8371297134/type-promotion branch March 4, 2025 09:17
4 participants