-
Notifications
You must be signed in to change notification settings - Fork 143
Fix convert_int_to_float in compact_incomplete V1 library API method #2235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4fa6bb9
to
23af9d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your tests pretty much cover all cases with convert_int_to_float=True
.
What do you think about also adding tests that we get schema exseptions if convert_int_to_float=False
? (e.g. to ensure we don't accidentally pass convert_int_to_float=True
somewhere deep in C++ where we shouldn't). It is possible we already have a bunch of these tests somewhere else, haven't looked through them
auto new_descriptor = has_valid_common_type(existing_type_desc, type_desc); | ||
if(new_descriptor) { | ||
merged_fields_map[field.name()] = *new_descriptor; | ||
if (convert_int_to_float && is_integer_type(existing_type_desc.data_type()) && is_integer_type(type_desc.data_type())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the existing type is float64? Or float32?
assert_frame_equal(expected, lib.read(sym).data, check_dtype=True) | ||
|
||
@pytest.mark.parametrize("dtype1", [np.int32, np.uint16, np.int8, np.int64, np.uint64, np.float64]) | ||
@pytest.mark.parametrize("dtype2", [np.int32, np.uint16, np.int8, np.int64, np.uint64, np.float64]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include float32 in the tested dtypes?
expected = pd.DataFrame({"a": np.arange(1, 7, dtype=np.double)}, index=pd.date_range(pd.Timestamp(0), periods=6, freq="ns")) | ||
assert_frame_equal(expected, lib.read(sym).data, check_dtype=True) | ||
|
||
@pytest.mark.parametrize("dtype2", [np.int32, np.uint16, np.int8, np.int64, np.uint64, np.float64]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include float32?
37adac2
to
3d55226
Compare
Reference Issues/PRs
What does this implement or fix?
This fixes the behavior of
convert_int_to_float
exposed via the V1 Library API'scompact_incomplete
method.Convert int to float will change the type of all integer columns (both signed and unsigned) to float64. In contrast to dynamic schema conversion this will static cast the values of all segments to float64 prior writing. This functionality was broken when we made the stricter type checks for staged segments.
Additional bugfix was added for dynamic schema. Dynamic schema has a different code path as it calls merge descriptors when it collects all incomplete segments thus the option was not properly applied even in arcticc. Dynamic schema used to throw exception when unsigned and signed int were mixed together (in the same column) but this should not matter as they are both going to be cast to double.
There are two main points:
Any other comments?
Checklist
Checklist for code changes...