Tables.jl and DataAPI.jl interoperation #67

@ablaom I am not sure if this is the best place to start this discussion, but it is a follow-up to https://discourse.julialang.org/t/random-access-to-rows-of-a-table/77386 and https://github.com/JuliaData/Tables.jl/pull/278.

The key point is to avoid creating functions with essentially the same functionality across DataAPI.jl, Tables.jl, and MLUtils.jl (and possibly other ML packages I am not aware of).

Assume for a moment that a Tables.jl table is the source of data for some ML model and you want operations on it to be efficient.

My understanding is that your high-level workflow is the following:
1. the user starts with a Tables.jl table.
2. then the user performs observation subsetting, feature selection, and feature transformation on this table (either eagerly or lazily).
3. finally the user transforms the result of step 2 into an object of some other type (again, either lazily or eagerly) that can be accepted as input by the ML algorithm.
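
To make the three steps concrete, here is a minimal sketch that uses only Base Julia. It assumes the table is a `NamedTuple` of vectors (which Tables.jl accepts as a column table); the column names, the row indices, and the scaling transformation are made up for illustration, and this is not a proposal for what the shared API should look like.

```julia
# Step 1: a NamedTuple of equal-length vectors, i.e. a column table.
# The columns x1, x2, and y are hypothetical.
table = (x1 = [1.0, 2.0, 3.0, 4.0],
         x2 = [10.0, 20.0, 30.0, 40.0],
         y  = [0, 1, 0, 1])

# Step 2a: observation subsetting, done lazily with views (no copy).
rows = [1, 3, 4]
subset = map(col -> view(col, rows), table)

# Step 2b: feature selection, keeping only the x1 and x2 columns.
features = NamedTuple{(:x1, :x2)}(subset)

# Step 2c: feature transformation, done eagerly (x2 is rescaled into a new vector).
transformed = merge(features, (x2 = features.x2 ./ 10,))

# Step 3: materialize the result as a matrix that an ML algorithm could consume.
X = hcat(transformed.x1, transformed.x2)
```

Every step here relies on a Base interface (`view`, `map`, `merge`, `hcat`), which is the kind of consistency with Base Julia I have in mind.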

The question is:

What functionality do you need in DataAPI.jl and Tables.jl so that this workflow is efficient and you do not need to provide duplicate definitions of concepts in MLUtils.jl (or some other packages)?
Another consideration (raised in the linked discussions) is that what we develop should be consistent with the interfaces that Base Julia already defines (e.g. the iterator interface, abstract vector interface, indexing interface, and view interface).
