Add setkey to data.table operations. #90

Open
@ricardonovaes

setkey makes joins much faster in data.table. The code in the join benchmark does not set keys properly, which can affect the main results.

It is also important in other kinds of data manipulation, such as deduplication. For instance:
setkey(DT, key)
unique(DT, by = 'key')

is much faster than
unique(DT, by = 'key')

In my experience this can go from 15 minutes to seconds for 100GB+ datasets.
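
To make the deduplication comparison concrete, here is a minimal sketch that times both variants on synthetic data. The table, the column names `id` and `value`, and the sizes are hypothetical and not taken from the benchmark; the speed-up you actually see will depend on the data and the data.table version.

```r
library(data.table)

set.seed(42)
n <- 1e6
# Hypothetical table: `id` plays the role of `key` in the snippets above
DT <- data.table(id = sample(1e5, n, replace = TRUE), value = runif(n))

# Unkeyed deduplication
DT_unkeyed <- copy(DT)
t_unkeyed <- system.time(u1 <- unique(DT_unkeyed, by = "id"))

# Keyed deduplication: setkey() sorts by reference and marks DT_keyed as sorted by `id`
DT_keyed <- copy(DT)
t_setkey <- system.time(setkey(DT_keyed, id))
t_keyed <- system.time(u2 <- unique(DT_keyed, by = "id"))

# Both variants keep exactly one row per distinct id
stopifnot(haskey(DT_keyed), nrow(u1) == nrow(u2), nrow(u1) == uniqueN(DT$id))

print(rbind(unkeyed = t_unkeyed, setkey = t_setkey, keyed = t_keyed))
```

The sort done by `setkey` is timed separately here, so it is visible whether the gain in `unique` outweighs the one-off cost of keying.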

Joins work the same way:

setkey(DTA, key)
setkey(DTB, key)

DTA[DTB, on = .(key)]
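
As a small self-contained sketch of the join pattern above: the tables and the column names `id`, `a` and `b` are made up for illustration and are not from the benchmark. It keys both tables and checks that the `on =` join and the key-based join `DTA[DTB]` return the same rows.

```r
library(data.table)

# Hypothetical inputs; `id` stands in for `key` in the snippet above
DTA <- data.table(id = c(3L, 1L, 2L), a = c("c", "a", "b"))
DTB <- data.table(id = c(2L, 3L, 4L), b = c(20, 30, 40))

# Sort both tables by reference and mark them as keyed on `id`
setkey(DTA, id)
setkey(DTB, id)

# Join with an explicit `on =`, as in the issue text
res_on <- DTA[DTB, on = .(id)]

# Join that relies on the keys set above
res_key <- DTA[DTB]

# Both forms look up each row of DTB in DTA; id 4 has no match, so `a` is NA there
stopifnot(
  identical(res_key$id, c(2L, 3L, 4L)),
  identical(res_key$a, c("b", "c", NA)),
  identical(res_on$a, res_key$a),
  identical(res_on$b, res_key$b)
)
```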

I hope this can make the benchmark better!
