tuple/table ambiguity #182

Open
@ablaom

The scitype of a tuple is intended to be the Tuple of the element scitypes. For example:

julia> scitype((1.0, 4))
Tuple{Continuous, Count}

By this logic, if I create a 1-tuple with a table t as its single element, then this tuple should have scitype Tuple{scitype(t)}. But this isn't always the case:

t = (x=[1, 2], y=["a", "b"])

julia> scitype(t)
Table{Union{AbstractVector{Count}, AbstractVector{Textual}}}

julia> scitype((t,))
Table{Union{AbstractVector{AbstractVector{Count}}, AbstractVector{AbstractVector{Textual}}}}

The problem is that (t,) is itself a table (with one row), so scitype treats it as a table rather than as a tuple:

julia> schema((t,))
┌───────┬─────────────────────────┬────────────────┐
│ names │ scitypes                │ types          │
├───────┼─────────────────────────┼────────────────┤
│ x     │ AbstractVector{Count}   │ Vector{Int64}  │
│ y     │ AbstractVector{Textual} │ Vector{String} │
└───────┴─────────────────────────┴────────────────┘

This is pretty awful 😢. For example, it makes it tricky in MLJBase to use the fit_data_scitype of a model to check the model's compatibility with data, as in JuliaAI/MLJBase.jl#731. That is, the test scitype(data) <: fit_data_scitype(model), where data is the tuple of data arguments, is not reliable.
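
To illustrate the intended behaviour, here is a minimal sketch of computing a tuple's scitype element by element, so the result does not depend on whether the tuple also happens to be a table. The helper name `tuple_scitype` is hypothetical and not part of ScientificTypes; this is one possible workaround, not a proposed fix.

```julia
using ScientificTypes

# Hypothetical helper (an assumption, not part of ScientificTypes):
# build the Tuple scitype from the scitypes of the individual elements,
# bypassing any table detection on the tuple itself.
tuple_scitype(data::Tuple) = Tuple{map(scitype, data)...}

t = (x=[1, 2], y=["a", "b"])

# By construction, a 1-tuple wrapping a table gets Tuple{scitype(t)}.
@assert tuple_scitype((t,)) == Tuple{scitype(t)}

# Ordinary tuples behave as in the first example above.
@assert tuple_scitype((1.0, 4)) == Tuple{Continuous, Count}
```

With something like this, a check of the form tuple_scitype(data) <: fit_data_scitype(model) would no longer be affected by the ambiguity, though it sidesteps rather than resolves what scitype((t,)) itself should return.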
