compression size on highly redundant data #96

Open
@arunsrinivasan

Description

Hi Mark, I was wondering if you have an explanation for why fst's compression of this particular data.frame ends up larger than the native (rda) format. The entropy of each column is the minimum it could be... Do you think there's room for improvement in such cases?

require(fst) # CRAN version
df <- data.frame(
        x=rep(1, 1e8),
        y=rep(2, 1e8),
        z=rep(3, 1e8)
      )

fst <- tempfile()
rda <- tempfile()

write.fst(df, fst, compress=100) # 2s
save(list="df", file=rda)        # 22s

file.info(fst)$size/1024 # 5102.4 KB
file.info(rda)$size/1024 # 3410.6 KB
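
For scale, a rough back-of-the-envelope on the ratios (the memCompress check at the end is just an illustration of how compressible a single constant column is with a generic codec, not part of the benchmark above):

raw_kb <- 3 * 1e8 * 8 / 1024     # ~2,343,750 KB of doubles in memory
raw_kb / 5102.4                  # fst ratio, roughly 459x
raw_kb / 3410.6                  # rda ratio, roughly 687x

# gzip over the serialized bytes of one constant column gives a rough
# sense of how small generic entropy coding can get this data:
gz <- memCompress(serialize(df$x, NULL), type = "gzip")
length(gz) / 1024                # KB for one column under plain gzip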

Thank you.
