Description
setkey makes joins much faster in data.table. The code in the join benchmark does not set the keys properly, which can affect the main results.
Keys also matter in other kinds of data manipulation, such as deduplication. For instance:
setkey(DT, key)
unique(DT, by = 'key')
is much faster than the same call on a table whose key has not been set:
unique(DT, by = 'key')
This can cut the run time from 15 minutes to seconds on 100GB+ datasets.
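A minimal sketch of how the deduplication claim could be checked on a small table. The table, its size, and the column names `id` and `value` are hypothetical stand-ins (`id` plays the role of the key column above); the timings depend on data size and data.table version, so none are shown here.

```r
library(data.table)

set.seed(42)
n <- 1e5
# Hypothetical table: `id` stands in for the key column from the post.
DT <- data.table(id = sample(1e4, n, replace = TRUE), value = rnorm(n))

# Keyed copy; setkey() sorts the table by `id` in place.
DT_keyed <- copy(DT)
setkey(DT_keyed, id)

# Time the same unique() call on the unkeyed and the keyed table.
t_unkeyed <- system.time(u_unkeyed <- unique(DT, by = "id"))
t_keyed   <- system.time(u_keyed   <- unique(DT_keyed, by = "id"))

# Both calls keep the first row per id; only the row order differs.
setorder(u_unkeyed, id)
stopifnot(identical(u_unkeyed$id, u_keyed$id),
          identical(u_unkeyed$value, u_keyed$value))

cat("unkeyed elapsed:", t_unkeyed[["elapsed"]], "\n")
cat("keyed elapsed:  ", t_keyed[["elapsed"]], "\n")
```

Note that the keyed timing above excludes the cost of `setkey()` itself; a benchmark would need to state whether that one-time sort is counted.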
Joins work the same way:
setkey(DTA, key)
setkey(DTB, key)
DTA[DTB, on = .(key)]
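The join comparison could be sketched the same way. The tables and the column names `id`, `a`, and `b` are hypothetical; the point is only to show a keyed join next to an unkeyed `on =` join and to confirm that both return the same rows. Actual speed differences would have to be measured on the benchmark's own data.

```r
library(data.table)

set.seed(42)
# Hypothetical tables: `id` stands in for the key column from the post.
DTA <- data.table(id = sample(1e4, 1e5, replace = TRUE), a = rnorm(1e5))
DTB <- data.table(id = sample(1e4, 1e3), b = rnorm(1e3))

# Keyed copies; setkey() sorts each table by `id` in place.
DTA_keyed <- copy(DTA)
DTB_keyed <- copy(DTB)
setkey(DTA_keyed, id)
setkey(DTB_keyed, id)

# Unkeyed join via `on =` versus a join that relies on the keys.
t_unkeyed <- system.time(j_unkeyed <- DTA[DTB, on = .(id)])
t_keyed   <- system.time(j_keyed   <- DTA_keyed[DTB_keyed])

# Put both results in the same row order before comparing them.
setorder(j_unkeyed, id, a, b)
setorder(j_keyed, id, a, b)
stopifnot(identical(j_unkeyed$id, j_keyed$id),
          identical(j_unkeyed$a, j_keyed$a),
          identical(j_unkeyed$b, j_keyed$b))

cat("unkeyed elapsed:", t_unkeyed[["elapsed"]], "\n")
cat("keyed elapsed:  ", t_keyed[["elapsed"]], "\n")
```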
I hope this helps improve the benchmark!