Description
Float values with zero as the fractional component, e.g. `'0.0'`, `'1.0'`, `'1.00'`, are detected as `int` instead of `float`. This is because they can be parsed as `int` according to `fntools.is_int`. Although the data could be interpreted as an integer, given that the source has a decimal place I would argue that `detect_types` should not cast to an integer. For example, database reports may include data from float/decimal columns which, just by chance, have no fractional component; that doesn't mean they should not be treated as floats.
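To illustrate the behaviour, here is a simplified stand-in for a "can this be parsed as an int" check. `permissive_is_int` is a hypothetical name and this is not the actual `fntools.is_int` implementation, just a sketch of the kind of logic that would produce the result described above:

```python
def permissive_is_int(content):
    """Hypothetical stand-in: True if content parses to a whole number."""
    try:
        # '1.00' parses to 1.0, which has no fractional part
        return float(content).is_integer()
    except (TypeError, ValueError):
        # not numeric at all
        return False


for value in ['0.0', '1.00', '0.5', 'abc']:
    print(value, permissive_is_int(value))
```

Under a check like this, `'0.0'` and `'1.00'` are indistinguishable from `'0'` and `'1'`, so the decimal point in the source text is ignored.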
Example Test Case
diff --git a/tests/test_process.py b/tests/test_process.py
index cc538a2..9b720e5 100644
--- a/tests/test_process.py
+++ b/tests/test_process.py
@@ -76,6 +76,12 @@ class Test:
         nt.assert_equal(Decimal('0.87'), result['confidence'])
         nt.assert_false(result['accurate'])
 
+    def test_detect_types_floats_zero_fractional_component(self):
+        records = it.cycle([{"foo": '0.0'}, {"foo": "1.0"}, {"foo": "10.00"}])
+        records, result = pr.detect_types(records)
+
+        nt.assert_equal(result["types"], [{"id": "foo", "type": "float"}])
+
     def test_fillempty(self):
         records = [
             {'a': '1', 'b': '27', 'c': ''},
Fails with:
AssertionError: Lists differ: [{'id': 'foo', 'type': 'int'}] != [{'id': 'foo', 'type': 'float'}]
First differing element 0:
{'id': 'foo', 'type': 'int'}
{'id': 'foo', 'type': 'float'}
- [{'id': 'foo', 'type': 'int'}]
?                         ^^
+ [{'id': 'foo', 'type': 'float'}]
?                         ^^^^
Potential Solutions
- Prefer stricter type inference for floats by default; e.g. if the value has a decimal point, it's a float.
- Allow stricter type inference for floats via an option, e.g. a kwarg to `detect_types` that is passed down to `is_int` to change the behaviour from "can this be parsed as an int" to "this is definitely an int".
- Any other ideas of course!
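As a rough sketch of the opt-in variant: the `strict` kwarg and the function body below are assumptions for discussion, not the current `fntools.is_int` signature or implementation. How `detect_types` would forward the option is left open.

```python
def is_int(content, strict=False):
    """Hypothetical sketch of an is_int check with an opt-in strict mode."""
    try:
        number = float(content)
    except (TypeError, ValueError):
        return False

    if not number.is_integer():
        return False

    if strict and isinstance(content, str):
        # '1.0' and '1e3' are whole numbers, but a decimal point or
        # exponent in the source text marks the value as a float
        return not any(char in content for char in '.eE')

    return True


print(is_int('1.0'))               # permissive: parses as a whole number
print(is_int('1.0', strict=True))  # strict: rejected because of the '.'
print(is_int('10', strict=True))   # strict: plain integer text is accepted
```

With `strict=False` as the default, existing callers would keep the current behaviour; making strict mode the default would correspond to the first option instead.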
I am happy to implement the changes required after a decision is made on the correct behaviour 😄