Skip to content

Fix convert_int_to_float in compact_incomplete V1 library API method #2235

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 18, 2025

Conversation

vasil-pashov
Copy link
Collaborator

@vasil-pashov vasil-pashov commented Mar 12, 2025

Reference Issues/PRs

What does this implement or fix?

This fixes the behavior of convert_int_to_float exposed via the V1 Library API's compact_incomplete method.

Convert int to float will change the type of all integer columns (both signed and unsigned) to float64. In contrast to dynamic schema conversion this will static cast the values of all segments to float64 prior writing. This functionality was broken when we made the stricter type checks for staged segments.

Additional bugfix was added for dynamic schema. Dynamic schema has a different code path as it calls merge descriptors when it collects all incomplete segments thus the option was not properly applied even in arcticc. Dynamic schema used to throw exception when unsigned and signed int were mixed together (in the same column) but this should not matter as they are both going to be cast to double.

There are two main points:

  • columns_match was changed to take convert int to float as a parameter
  • merge_descriptors was changed to take convert int to float as a parameter

Any other comments?

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@vasil-pashov vasil-pashov added the patch Small change, should increase patch version label Mar 12, 2025
@vasil-pashov vasil-pashov force-pushed the vasil.pashov/fix-convert-int-to-float branch from 4fa6bb9 to 23af9d8 Compare March 14, 2025 13:20
Copy link
Collaborator

@IvoDD IvoDD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your tests pretty much cover all cases with convert_int_to_float=True.

What do you think about also adding tests that we get schema exseptions if convert_int_to_float=False? (e.g. to ensure we don't accidentally pass convert_int_to_float=True somewhere deep in C++ where we shouldn't). It is possible we already have a bunch of these tests somewhere else, haven't looked through them

auto new_descriptor = has_valid_common_type(existing_type_desc, type_desc);
if(new_descriptor) {
merged_fields_map[field.name()] = *new_descriptor;
if (convert_int_to_float && is_integer_type(existing_type_desc.data_type()) && is_integer_type(type_desc.data_type())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the existing type is float64? Or float32?

assert_frame_equal(expected, lib.read(sym).data, check_dtype=True)

@pytest.mark.parametrize("dtype1", [np.int32, np.uint16, np.int8, np.int64, np.uint64, np.float64])
@pytest.mark.parametrize("dtype2", [np.int32, np.uint16, np.int8, np.int64, np.uint64, np.float64])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include float32 in the tested dtypes?

expected = pd.DataFrame({"a": np.arange(1, 7, dtype=np.double)}, index=pd.date_range(pd.Timestamp(0), periods=6, freq="ns"))
assert_frame_equal(expected, lib.read(sym).data, check_dtype=True)

@pytest.mark.parametrize("dtype2", [np.int32, np.uint16, np.int8, np.int64, np.uint64, np.float64])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include float32?

@vasil-pashov vasil-pashov force-pushed the vasil.pashov/fix-convert-int-to-float branch from 37adac2 to 3d55226 Compare March 18, 2025 08:29
@vasil-pashov vasil-pashov merged commit 24dfb54 into master Mar 18, 2025
154 checks passed
@vasil-pashov vasil-pashov deleted the vasil.pashov/fix-convert-int-to-float branch March 18, 2025 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
patch Small change, should increase patch version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants