stars_proxy memory hog #708

@dazu89

I intend to build a high-dimensional data cube from raster files in plain-text ASCII grid format. To do so, I (1) read all files' metadata (file path and attributes) into a data frame, (2) group by dimensions and concatenate the files in each group into a stars_proxy, and (3) combine these stars_proxy objects into a higher-dimensional stars_proxy, similar to the process described in this post on StackExchange or this GitHub issue.

When loading the stars_proxy via my_star_proxy |> st_as_stars(), memory usage climbs to tens of GB, even if only a couple of files of 5-10 MB each are read. The problem only occurs with files of the following format

ncols                   500
nrows                  500
xllcorner              6.5
yllcorner              -65.5
cellsize                 0.002
NODATA_value            -9.9990E+03
-9.9990E+03 -9.9990E+03 -9.9990E+03 -9.9990E+03 -9.9990E+03 ...
-9.9990E+03 -9.9990E+03  0.5000E-02  1.5000E+02 -9.9990E+03 ...
-9.9990E+03 -9.9990E+03 -9.9990E+03 -9.9990E+03 -9.9990E+03 ...
...

whereas with standard data, such as the GeoTIFF used in the following example, no such problem occurs and only a few hundred MB are used.

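library(stars)
library(profmem)
# only log allocations of at least 1e6 bytes
options(profmem.threshold = 1e6)
tif = system.file("tif/L7_ETMs.tif", package = "stars")
# baseline: size of a single file read fully into memory
rs_mem = read_stars(tif)
print(object.size(rs_mem), standard = "SI", units = 'auto')
# two attributes (a, b), each built from two files, read lazily
r = read_stars(list(a = c(tif,tif), b = c(tif, tif)), proxy = TRUE)
# merge the attributes into a new dimension "foo"
(xx = st_redimension(r, along = list(foo = 1:4)))
(rr = c(xx, xx))
# merge again into a second new dimension "bar" (dates)
(rrr = st_redimension(rr, along = list(bar = as.Date(c("2001-01-01", "2002-01-01")))))
# profile the allocations made while materializing the proxy
p <- profmem({
  test = rrr |> st_as_stars()
})
# total logged allocations in MB (cumulative, not peak usage)
sum(p$bytes, na.rm=TRUE) / 1e6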
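
For comparison with the profmem total, a rough estimate of the size of the materialized cube can be worked out from its dimensions alone. The sketch below assumes every cell is stored as an 8-byte double; `expected_bytes` is a hypothetical helper, not part of stars, and the extra dimension lengths in the second call are only illustrative.

```r
# Hypothetical helper (not part of stars): bytes needed to hold an
# array with the given dimensions, assuming 8 bytes per cell (double).
expected_bytes <- function(dims, bytes_per_cell = 8) {
  prod(dims) * bytes_per_cell
}

# One 500 x 500 grid, as in the header shown above, in MB
one_grid_mb <- expected_bytes(c(500, 500)) / 1e6

# The same grid stacked along two extra dimensions of length 4 and 2
# (illustrative sizes only), in MB
cube_mb <- expected_bytes(c(500, 500, 4, 2)) / 1e6

print(c(one_grid = one_grid_mb, cube = cube_mb))
```

If the logged allocations exceed such an estimate by orders of magnitude, that would suggest the overhead comes from intermediate copies during reading rather than from the final array.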

I suspect I should pass some options to read_stars, but so far I have no good guess which ones.
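
To reproduce the problem with the format above rather than with the GeoTIFF, a small test file can be generated in base R. This is a minimal sketch: `write_ascii_grid` is a hypothetical helper, the header values are copied from the sample above, and the grid is shrunk to 5 x 5 cells. Note that `sprintf("%.4E", ...)` normalizes the mantissa, so a value shown above as 0.5000E-02 is written as 5.0000E-03.

```r
# Hypothetical helper: write a matrix as a plain-text ASCII grid using
# the header layout shown in the sample above. NA cells become `nodata`.
write_ascii_grid <- function(path, values, xll = 6.5, yll = -65.5,
                             cellsize = 0.002, nodata = -9999) {
  stopifnot(is.matrix(values))
  values[is.na(values)] <- nodata
  header <- c(
    sprintf("ncols %d", ncol(values)),
    sprintf("nrows %d", nrow(values)),
    sprintf("xllcorner %s", format(xll)),
    sprintf("yllcorner %s", format(yll)),
    sprintf("cellsize %s", format(cellsize)),
    sprintf("NODATA_value %.4E", nodata)
  )
  # one text line per matrix row, values in scientific notation
  body <- apply(values, 1, function(row) {
    paste(sprintf("%.4E", row), collapse = " ")
  })
  writeLines(c(header, body), path)
  invisible(path)
}

# Small example grid: all missing except two cells in the second row
m <- matrix(NA_real_, nrow = 5, ncol = 5)
m[2, 3:4] <- c(0.005, 150)
f <- file.path(tempdir(), "example.asc")
write_ascii_grid(f, m)
writeLines(readLines(f))

# Optional, only if stars is installed:
# x <- stars::read_stars(f, proxy = TRUE)
```

Several such files could then be substituted for `tif` in the example above to check whether the memory growth is tied to the file format.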
