Surprised by type inference: csvjoin removing underscores in columns containing values like: 1_100 #1246

Hello,

I've spent a while figuring out why underscores were getting removed in what I believe is a fairly simple use case. I'm using `csvjoin` 2.0.0.

Here's a minimal example:

`file1.csv` contents:
```
Name,Type
MHYS,foo
JABI,bar
```

`file2.csv` contents:
```
Name,ID
MHYS,100_1
JABI,1030_11
```

Running `csvjoin -c Name file1.csv file2.csv` produces the following:
```
Name,Type,ID
MHYS,foo,1001
JABI,bar,103011
```

The underscores in the ID field are getting dropped.  I've tracked this down to type inference.  Running `csvjoin` with the `--no-inference` option produces the desired behaviour.
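A likely mechanism, stated here as an assumption rather than something confirmed in the csvkit source: since Python 3.6, Python's own number parsing accepts single underscores between digits as separators, so any inference step that tries a numeric conversion on these strings would succeed and drop the underscores. A minimal sketch using only the standard library:

```python
from decimal import Decimal

# Since Python 3.6, a single underscore between digits is accepted as a
# digit separator when parsing numeric strings, so these ID-like values
# convert to numbers instead of raising an error.
ids = ["100_1", "1030_11"]

as_ints = [int(value) for value in ids]
as_decimals = [Decimal(value) for value in ids]

print(as_ints)      # [1001, 103011]
print(as_decimals)  # [Decimal('1001'), Decimal('103011')]
```

The results match the `1001` and `103011` values seen in the joined output above.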

I was a bit surprised by this, as it seems a rather aggressive default for what I believe is a very common text pattern.  I've had my share of being bitten by type inference in the tidyverse and in pandas, but these sorts of fields were never an issue.

Finding a type inference method that handles all situations perfectly is a pipe dream, and I don't have an exact solution, but I'd like to point out that:
1. This happens silently; I was lucky to catch the error while debugging my data pipeline.
1. Figuring out the root cause was a bit of a time sink. My main reason for filing this issue is to ensure that even if there isn't a good way to fix the issue, this post may help others searching for 'missing/dropped/removed underscores'.  
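Because the change is silent, one way to guard a pipeline is to scan the inputs for at-risk values before joining. The sketch below is only an illustration: `find_underscore_numbers` and `UNDERSCORE_NUMBER` are hypothetical names, not part of csvkit, and the pattern assumes the problematic values are digit groups joined by single underscores, as in the example files.

```python
import csv
import io
import re

# Matches values such as "100_1" or "1030_11": digit groups joined by
# single underscores (assumed shape of the at-risk values).
UNDERSCORE_NUMBER = re.compile(r"^\d+(?:_\d+)+$")


def find_underscore_numbers(lines):
    """Return the sorted names of columns holding at least one matching value."""
    flagged = set()
    for row in csv.DictReader(lines):
        for column, value in row.items():
            # Skip missing cells and the list DictReader uses for extra fields.
            if isinstance(value, str) and UNDERSCORE_NUMBER.match(value):
                flagged.add(column)
    return sorted(flagged)


# Same contents as file2.csv above.
sample = io.StringIO("Name,ID\nMHYS,100_1\nJABI,1030_11\n")
flagged_columns = find_underscore_numbers(sample)
print(flagged_columns)  # ['ID']
```

Any flagged column is a candidate for running `csvjoin` with `--no-inference`.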
