Can MLUtils play nicely with Tables.jl? #61

Closed
@ablaom

Description

I think one could greatly increase buy-in for MLUtils.jl if every Tables.jl-compatible table automatically implemented the "data container" API. For performance, one would still want dedicated implementations for concrete table types, but having it "just work" for all tables would be nice. Since "table" is itself just an interface rather than an abstract type, I guess this would need to be implemented as part of the data container API, right? As Tables.jl is very lightweight, I don't see that as a big issue (and I could probably find someone to help with the integration).

Even so, there seems to be a problem implementing the interface for certain tables. MLUtils.jl interprets tuples in a very specific way. For example, shuffleobs((x1, x2)) treats x1 and x2 as separate data containers that are shuffled together, using the same permutation of observation indices. But some tables are tuples. The following example is a tuple-table whose elements are themselves tables (of a different type):

julia> X
((a = [1, 3], b = [2, 3]), (a = [2, 5], b = [4, 7]))

julia> Tables.istable(X)
true

So is such a tuple a pair of data containers or a single data container? The current API cannot distinguish them.
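
To make the ambiguity concrete, here is a minimal sketch of the two readings of the same X. It assumes MLUtils exports numobs and getobs and that both accept tuples of named tuples; the variable names are only illustrative, and the "row table" reading is my assumption about how a Tables.jl consumer would iterate X.

```julia
using MLUtils  # assumption: numobs and getobs are exported and accept tuples

# The tuple from the example above.
X = ((a = [1, 3], b = [2, 3]), (a = [2, 5], b = [4, 7]))

# Reading 1 (MLUtils tuple dispatch): X is two coupled containers,
# so one index is applied to every element of the tuple.
n = numobs(X)
tuple_reading = getobs(X, 1)

# Reading 2 (row table, assumed): each element of X is one row,
# so the first observation is simply the first element.
row_reading = first(X)

# The two readings disagree on what "observation 1" means.
@show n tuple_reading row_reading
```

Nothing in the value itself says which reading was intended, so any fix would presumably need an explicit wrapper or trait to mark the intended one.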

I wonder:

  1. How attached are people to the current tuple-based dispatch for coupled multi-container processing?
  2. Is there a big use-case for tables that are also tuples? @quinnj

Possibly this discussion is related.

Tables that are tuples are problematic elsewhere.

@oxinabox @rikhuijzer @darsnack
