8371297134 Type promotion between integral and floating types #2173

poodlewars · 2025-02-10T17:59:58Z

This fixes the bug reported in 8371297134. To reproduce:

Write an int64 column
Append float32 data to it
Read it -> we blow up at read time

I've added logic so that when we merge descriptors we combine:

any integer + float64 -> float64
integer up to 16 bits + float32 -> float32
integer above 16 bits + float32 -> float64

This is because float32 has a 24-bit significand, so every 16-bit integer fits in a float32 without loss of precision, whereas a 32-bit integer may not. We could instead always promote to float64, which would be more wasteful but simpler.
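
As a rough illustration of the merge rules and the precision argument, here is a minimal Python sketch. It is not the C++ implementation in type_utils.cpp: the names `promote_int_with_float` and `roundtrips_through_float32` are hypothetical, and types are modelled as plain bit widths. The second helper round-trips an integer through an IEEE 754 single-precision value using the standard `struct` module.

```python
import struct


def promote_int_with_float(int_bits: int, float_bits: int) -> int:
    """Return the float width (32 or 64) when merging an integer with a float.

    Hypothetical helper mirroring the three rules in the description;
    types are modelled as bit widths only.
    """
    if int_bits not in (8, 16, 32, 64):
        raise ValueError(f"unsupported integer width: {int_bits}")
    if float_bits not in (32, 64):
        raise ValueError(f"unsupported float width: {float_bits}")
    if float_bits == 64:
        return 64  # any integer + float64 -> float64
    if int_bits <= 16:
        return 32  # integer up to 16 bits + float32 -> float32
    return 64      # integer above 16 bits + float32 -> float64


def roundtrips_through_float32(value: int) -> bool:
    """True if `value` survives conversion to float32 and back unchanged."""
    as_float32 = struct.unpack("<f", struct.pack("<f", float(value)))[0]
    return as_float32 == value


# Every signed or unsigned 16-bit value is exactly representable in float32 ...
print(all(roundtrips_through_float32(v) for v in range(-2**15, 2**16)))  # True
# ... but 2**24 + 1 fits in an int32 and needs 25 significant bits.
print(roundtrips_through_float32(2**24 + 1))  # False
```

The simpler alternative mentioned above would amount to returning 64 unconditionally.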

Separately, for query builder pipelines I added changes so that when combining a float and an integer we always promote up to float64. This misses the cases where it is actually safe to promote to float32, but is simpler and matches Pandas. This is the cause of the test change in test_sort_merge.py.
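
A minimal sketch of the query builder rule described in this paragraph, assuming types are modelled as simple (kind, bits) pairs. The `DType` class and `arithmetic_result_type` function are hypothetical names for illustration and do not correspond to identifiers in operation_types.hpp. Only the mixed integer/float case is covered, since that is the only case described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DType:
    kind: str  # "int" or "float"
    bits: int


FLOAT64 = DType("float", 64)


def arithmetic_result_type(left: DType, right: DType) -> DType:
    """Result type when exactly one operand is floating point (hypothetical)."""
    if {left.kind, right.kind} != {"int", "float"}:
        raise NotImplementedError("only the mixed integer/float case is sketched")
    # Always widen to float64, even where float32 would be exact (e.g. int16).
    return FLOAT64
```

For example, `arithmetic_result_type(DType("int", 16), DType("float", 32))` returns `FLOAT64` under this rule, whereas the descriptor merge rules would keep float32.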

I reworked has_valid_type_promotion, introducing a new function is_valid_type_promotion_to_target instead. This is because the has_valid_type_promotion signature was dangerous: it returned an optional type that was often interpreted only as a bool (the actual type inside it was ignored). I've replaced the call sites that only tested the bool with calls to the new is_valid_type_promotion_to_target, which only returns a bool.
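
The hazard can be illustrated with a Python analogue. This is a sketch, not the C++ code: `promoted_type` and `is_valid_promotion_to_target` are hypothetical stand-ins for has_valid_type_promotion and is_valid_type_promotion_to_target, and the small promotion table is an assumption made up for the example.

```python
from typing import Optional

# Toy promotion table (assumed for illustration): (source, target) -> common type.
_PROMOTIONS = {
    ("int16", "float32"): "float32",
    ("int64", "float32"): "float64",
    ("int64", "float64"): "float64",
}


def promoted_type(source: str, target: str) -> Optional[str]:
    """Return the common type for source and target, or None if there is none."""
    if source == target:
        return source
    return _PROMOTIONS.get((source, target))


def is_valid_promotion_to_target(source: str, target: str) -> bool:
    """True only when source can be promoted to exactly `target`."""
    return promoted_type(source, target) == target


# Testing only truthiness says "some promotion exists" ...
print(bool(promoted_type("int64", "float32")))            # True
# ... but the promoted type is float64, not the float32 that was asked about.
print(is_valid_promotion_to_target("int64", "float32"))   # False
```

A caller that only checks the truthiness of the optional result would go on to treat the data as float32 in this example, which is the kind of misuse the bool-only function rules out.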

cpp/arcticdb/entity/type_utils.cpp

python/tests/unit/arcticdb/version_store/test_column_type_changes.py

cpp/arcticdb/pipeline/execution.hpp

alexowens90 · 2025-03-03T09:37:08Z

cpp/arcticdb/processing/operation_types.hpp

+                // Otherwise, if only one type is floating point, always promote to double
+                // For example when combining int32 and float32 the result can only fit in float64 without loss of precision
+                // Special cases like int16 and float32 can fit in float32, but we always promote up to float64 (as does Pandas)
+                double


This is technically a semver change of behaviour

alexowens90 · 2025-03-03T09:46:34Z

python/tests/hypothesis/arcticdb/test_resample.py

+
+
+def test_resample_mean_large_arithmetic_error_repro(lmdb_version_store_v1):


Could move to nonreg tests

python/tests/unit/arcticdb/version_store/test_column_type_changes.py

poodlewars · 2025-03-03T11:16:02Z

python/arcticdb/util/test.py

@@ -860,7 +860,7 @@ def generic_resample_test(
    but it cannot take parameters such as origin and offset.
    """
    # Pandas doesn't have a good date_range equivalent in resample, so just use read for that
-    expected = lib.read(sym, date_range=date_range).data
+    original_data = lib.read(sym, date_range=date_range).data


This change is just to help debugging tests

poodlewars added 5 commits February 10, 2025 17:01

Type promotion between integral and floating types

3f006eb

Update tests: I deliberately promote to float64 here so we don't lose precision from the int32

Make type arithmetic for float + int projections match Pandas

ad4b88b

Split has_valid_type_promotion up with is_valid_type_promotion_to_tar…

557f796

…get that just returns a bool To prevent dangerous uses where only the static_cast<bool> of its return value was used, but where the type to be promoted to was not the second argument given to it.

Fixup after rebase

064f14f

Remove a duplicated test I added by mistake

5280445

poodlewars requested a review from vasil-pashov February 10, 2025 18:00

poodlewars marked this pull request as ready for review February 11, 2025 09:22

poodlewars requested review from alexowens90 and willdealtry as code owners February 11, 2025 09:23

vasil-pashov reviewed Feb 12, 2025

poodlewars added 2 commits February 20, 2025 11:09

Code review comments

0c4b25b

Add repro of a resample test that also fails on master, increase test…

04eeafb

… tolerance as a result

vasil-pashov approved these changes Feb 20, 2025

willdealtry approved these changes Feb 26, 2025

alexowens90 reviewed Mar 3, 2025

cpp/arcticdb/pipeline/execution.hpp (outdated, resolved)

alexowens90 reviewed Mar 3, 2025

python/tests/unit/arcticdb/version_store/test_column_type_changes.py (resolved)

Implement Alex's simple PR comments

de6adfd

poodlewars commented Mar 3, 2025

Back out major API change fix to query builder type promotion

01d8b6c

poodlewars added the bug and patch labels Mar 3, 2025

alexowens90 approved these changes Mar 3, 2025

poodlewars merged commit e55aa2f into master Mar 4, 2025
155 of 156 checks passed

poodlewars deleted the aseaton/8371297134/type-promotion branch March 4, 2025 09:17
