Allow process.detect_types to match last type instead of the first #34

@SteadBytes

Float values whose fractional component is zero, e.g. '0.0', '1.0', '1.00', are detected as int instead of float. This is because they can be parsed as int according to fntools.is_int. Although the data could be interpreted as an integer, I would argue that detect_types should not cast to an integer when the source has a decimal place. For example, database reports may include data from float/decimal columns whose values, just by chance, have no fractional component; this doesn't mean they should not be treated as floats.
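To make the behaviour concrete, here is a minimal sketch of a permissive integer check. It is not the actual fntools.is_int implementation; `lenient_is_int` is a hypothetical stand-in that assumes the check amounts to "does the parsed number have a zero fractional part".

```python
def lenient_is_int(content):
    """Hypothetical stand-in: True if the value parses as a whole number."""
    try:
        # '1.0' and '10.00' parse to floats with no fractional part,
        # so a permissive check accepts them as integers.
        return float(content).is_integer()
    except (TypeError, ValueError):
        return False


for value in ['0.0', '1.0', '10.00', '0.5']:
    print(value, lenient_is_int(value))
```

Under that assumption, every value in the test case below passes the integer check before a float check gets a chance to match, which would explain the 'int' result.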

Example Test Case

diff --git a/tests/test_process.py b/tests/test_process.py
index cc538a2..9b720e5 100644
--- a/tests/test_process.py
+++ b/tests/test_process.py
@@ -76,6 +76,12 @@ class Test:
         nt.assert_equal(Decimal('0.87'), result['confidence'])
         nt.assert_false(result['accurate'])
 
+    def test_detect_types_floats_zero_fractional_component(self):
+        records = it.cycle([{"foo": '0.0'}, {"foo": "1.0"}, {"foo": "10.00"}])
+        records, result = pr.detect_types(records)
+
+        nt.assert_equal(result["types"], [{"id": "foo", "type": "float"}])
+
     def test_fillempty(self):
         records = [
             {'a': '1', 'b': '27', 'c': ''},

Fails with:

AssertionError: Lists differ: [{'id': 'foo', 'type': 'int'}] != [{'id': 'foo', 'type': 'float'}]

First differing element 0:
{'id': 'foo', 'type': 'int'}
{'id': 'foo', 'type': 'float'}

- [{'id': 'foo', 'type': 'int'}]
?                         ^^

+ [{'id': 'foo', 'type': 'float'}]
?                         ^^^^

Potential Solutions

  • Prefer stricter type inference for floats by default; e.g. if it has decimal places, it's a float.
  • Allow stricter type inference for floats via an option, e.g. a kwarg to detect_types that is passed down to is_int to change the behaviour from "can this be parsed as an int" to "this is definitely an int".
  • Any other ideas of course!
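As a rough sketch of the second option, the check could look something like the following. The `strict` kwarg and this body are assumptions for illustration only, not the existing fntools.is_int signature, and the sketch ignores thousands separators and locale-specific decimal marks.

```python
def is_int(content, strict=False):
    """Hypothetical sketch; `strict` is an assumed kwarg, not an existing option."""
    text = str(content).strip()

    try:
        number = float(text)
    except ValueError:
        return False

    if not number.is_integer():
        return False

    if strict:
        # "This is definitely an int": a decimal point or exponent in the
        # source text means the value was written as a float.
        return not any(ch in text for ch in '.eE')

    # "Can this be parsed as an int": any whole number is accepted.
    return True
```

With `strict=False` the behaviour matches the current permissive reading ('1.0' counts as an int); with `strict=True`, '1.0' is rejected and would fall through to the float check, while '10' is still accepted. Making strict the default would correspond to the first option.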

I am happy to implement the changes required after a decision is made on the correct behaviour 😄
