mappify should set missing values (for rows shorter than header) to nil #65

Closed
@ABeltramo

Description

I found this behaviour weird and undocumented. Take the following malformed CSV (the first two data rows have fewer fields than the header):

sepal_length,sepal_width,petal_length,petal_width,label,id,test_train
5.8,4,1.2,0.2,15
4.8,3,1.4,
4.3,3,1.1,0.1,Iris-setosa,14,train

Using the mappify method will produce the following:

{:sepal_length "5.8", :sepal_width "4", :petal_length "1.2", :petal_width "0.2", :label "15"}
{:sepal_length "4.8", :sepal_width "3", :petal_length "1.4", :petal_width ""}
{:sepal_length "4.3", :sepal_width "3", :petal_length "1.1", :petal_width "0.1", :label "Iris-setosa", :id "14", :test_train "train"}
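The shape of this output matches what Clojure's core `zipmap` does when the key and value sequences differ in length: it stops at the shorter one. Whether `mappify` actually builds its row maps this way is an assumption, not something confirmed here; the sketch below only uses `clojure.core` and a shortened, hypothetical header to show the effect.

```clojure
;; Assumption: row maps are built by pairing header keys with row values.
;; The header and row below are shortened stand-ins for the CSV above.
(def header [:sepal_length :sepal_width :petal_length :petal_width :label])
(def short-row ["4.8" "3" "1.4" ""])

;; zipmap stops as soon as either sequence runs out, so keys with no
;; matching value (:label here) never make it into the map.
(def short-map (zipmap header short-row))

(contains? short-map :label) ; the key is absent, not mapped to nil
```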

As you can see, some of the resulting maps have fewer keys than others: the columns that a short row does not reach are missing entirely from the mappified result. I was expecting every row to have the same set of keys, with nil values wherever something is missing from the CSV.
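A minimal sketch of the expected behaviour, using only `clojure.core`. The function name `mappify-padded` is hypothetical and is not part of the library; it assumes the input is a sequence of row vectors whose first element is the header.

```clojure
;; Hypothetical helper (not the library's implementation): pads short rows
;; with nil so that every resulting map has one entry per header column.
(defn mappify-padded
  [[header & rows]]
  (let [ks (mapv keyword header)
        n  (count ks)]
    (map (fn [row]
           ;; (repeat nil) supplies nil for each column the row lacks;
           ;; (take n ...) cuts the padded row back to header width.
           (zipmap ks (take n (concat row (repeat nil)))))
         rows)))

(def padded
  (mappify-padded [["a" "b" "c"]
                   ["1"]
                   ["1" "2" "3"]]))
```

Note that in this sketch a row longer than the header is silently truncated to the header width; whether that is desirable is a separate design question.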

Bear in mind that using {:structs true} produces the expected result:

{:sepal_length "5.8", :sepal_width "4", :petal_length "1.2", :petal_width "0.2", :label "15", :id nil, :test_train nil}
{:sepal_length "4.8", :sepal_width "3", :petal_length "1.4", :petal_width "", :label nil, :id nil, :test_train nil}
{:sepal_length "4.3", :sepal_width "3", :petal_length "1.1", :petal_width "0.1", :label "Iris-setosa", :id "14", :test_train "train"}

I have some other problems with structs, but I will probably open a separate issue once I can reproduce them reliably.

I'll open a pull request with the fix I have made for this.
