vectorization refactor #1

@mdsumner

I'm in the middle of this refactor. Current draft:

compute_spatial_window <- function(locations) {
  locations |>
    mutate(
      crs = mk_utm_crs(lon, lat),
      utm_x = mk_utm_centre_x(lon, lat),
      utm_y = mk_utm_centre_y(lon, lat),
      utm_xmin = utm_x - radiusx,
      utm_xmax = utm_x + radiusx,
      utm_ymin = utm_y - radiusy,
      utm_ymax = utm_y + radiusy
    ) |>
    # buffer_extent and reproj_extent need rowwise or vectorized versions
    rowwise() |>
    mutate(
      utm_extent = list(buffer_extent(c(utm_xmin, utm_xmax, utm_ymin, utm_ymax), resolution)),
      ll_extent = list(reproj_extent_safe(utm_extent, crs))
    ) |>
    ungroup() |>
    mutate(
      lonmin = map_dbl(ll_extent, 1),
      lonmax = map_dbl(ll_extent, 2),
      latmin = map_dbl(ll_extent, 3),
      latmax = map_dbl(ll_extent, 4)
    ) |>
    select(-utm_extent, -ll_extent)
}
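
For reference, here is a minimal, self-contained sketch of the rowwise + list-column + `map_dbl()` pattern used above. `snap_extent()` is a hypothetical stand-in for `buffer_extent()` (it just snaps an extent outward to whole multiples of the resolution), and the column names are made up for the demo, so treat it as an illustration of the mechanics rather than the real helpers.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for buffer_extent(): snap an extent
# c(xmin, xmax, ymin, ymax) outward to whole multiples of `res`.
snap_extent <- function(ext, res) {
  c(floor(ext[1] / res) * res, ceiling(ext[2] / res) * res,
    floor(ext[3] / res) * res, ceiling(ext[4] / res) * res)
}

# Made-up input: two windows sharing one resolution
windows <- tibble(
  xmin = c(3, 12), xmax = c(17, 31),
  ymin = c(-8, 4), ymax = c(9, 26),
  resolution = 10
)

result <- windows |>
  rowwise() |>
  # inside rowwise() each column is a length-1 value, so c() builds one extent
  mutate(extent = list(snap_extent(c(xmin, xmax, ymin, ymax), resolution))) |>
  ungroup() |>
  # pull positions 1..4 of each list element back out into plain columns
  mutate(
    sxmin = map_dbl(extent, 1), sxmax = map_dbl(extent, 2),
    symin = map_dbl(extent, 3), symax = map_dbl(extent, 4)
  ) |>
  select(-extent)
```

The list column only exists between the `rowwise()` step and the final `select()`, which is the part the vectorized wrappers would remove.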

Or write thin vectorized wrappers for buffer_extent / reproj_extent that take column vectors and return data frames:

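buffer_extent_v <- function(xmin, xmax, ymin, ymax, res) {
  # walk the column vectors in parallel, one buffer_extent() call per row
  # (pmap_dfr() is superseded in purrr 1.0; pmap() |> list_rbind() is the newer spelling)
  pmap_dfr(list(xmin, xmax, ymin, ymax, res), \(xn, xx, yn, yx, r) {
    e <- buffer_extent(c(xn, xx, yn, yx), r)
    # one-row tibble per input row; the rows are bound into a single data frame
    tibble(utm_xmin = e[1], utm_xmax = e[2], utm_ymin = e[3], utm_ymax = e[4])
  })
}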
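
A wrapper of this shape can be dropped straight into `mutate()` without naming the result: dplyr unpacks an unnamed data frame into columns (the same mechanism `across()` relies on), so no `rowwise()` or list column is needed. A rough sketch, again with a hypothetical `snap_extent()` standing in for `buffer_extent()`, and using `pmap() |> list_rbind()`, which assumes purrr >= 1.0:

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for buffer_extent(), as an assumption for this demo
snap_extent <- function(ext, res) {
  c(floor(ext[1] / res) * res, ceiling(ext[2] / res) * res,
    floor(ext[3] / res) * res, ceiling(ext[4] / res) * res)
}

# Vectorized wrapper: column vectors in, one data frame out
snap_extent_v <- function(xmin, xmax, ymin, ymax, res) {
  pmap(list(xmin, xmax, ymin, ymax, res), \(xn, xx, yn, yx, r) {
    e <- snap_extent(c(xn, xx, yn, yx), r)
    tibble(utm_xmin = e[1], utm_xmax = e[2], utm_ymin = e[3], utm_ymax = e[4])
  }) |>
    list_rbind()
}

# Made-up input with the same column names as the draft above
windows <- tibble(
  utm_xmin = c(3, 12), utm_xmax = c(17, 31),
  utm_ymin = c(-8, 4), utm_ymax = c(9, 26),
  resolution = 10
)

# The unnamed data frame result replaces the four utm_* columns in place
snapped <- windows |>
  mutate(snap_extent_v(utm_xmin, utm_xmax, utm_ymin, utm_ymax, resolution))
```

The same trick should work for a `reproj_extent` wrapper, assuming it can be made to return a one-row tibble per input row.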

With data-frame targets, the pipeline might not even need iteration = "list" anymore:

tar_target(spatial_window, 
           compute_spatial_window(locations), 
           pattern = map(locations)),
# spatial_window: 1 or 2 rows per location (AM split)

tar_target(query_specs,
           prepare_query(spatial_window, collection, provider),
           pattern = map(spatial_window)),
# query_specs: data frame, 1 row per spatial_window row

tar_target(assets_table, 
           get_assets_from_urls(query_specs),
           pattern = map(query_specs))
# assets_table: data frame, auto row-binds
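
Why the row-binding comes for free: as far as I understand it, with the default `iteration = "vector"` targets slices the upstream target with `vctrs::vec_slice()` and combines the branch outputs with `vctrs::vec_c()`, and for data frames that amounts to one row per branch in, row-bind out. A small simulation of that behavior outside of targets (the data and the two-row rule for location "b" are invented for the demo):

```r
library(vctrs)

# Made-up locations; "b" sits close to the 180 degree meridian
locations <- data.frame(
  location_id = c("a", "b", "c"),
  lon = c(147, 179.9, -70),
  lat = c(-42, -16, -33)
)

# pattern = map(locations) hands each branch one row-slice
branches <- lapply(seq_len(vec_size(locations)), \(i) vec_slice(locations, i))

# Hypothetical per-branch step: most locations give 1 row, "b" gives 2
outputs <- lapply(branches, \(loc) {
  n <- if (loc$lon > 179) 2L else 1L
  data.frame(location_id = loc$location_id, part = seq_len(n))
})

# Aggregating the branches: vec_c() on data frames row-binds them
combined <- do.call(vec_c, outputs)
```

Here `combined` is a single four-row data frame, with no list of per-branch results to unpack.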

Data frames in, data frames out, targets just binds them together. No list wrangling.
The mental model is now just: locations → (explode at AM) → queries → assets → (regroup by location_id later if needed).
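
If the regrouping step is needed, it can stay a plain dplyr operation on the flat table. A sketch assuming `assets_table` carries a `location_id` column; `asset_url` and the example values are hypothetical:

```r
library(dplyr)

# Made-up flat assets table: one row per asset, several per location
assets_table <- tibble(
  location_id = c("a", "a", "b", "c", "c", "c"),
  asset_url = paste0("https://example.com/asset-", 1:6, ".tif")
)

# Collapse back to one row per location, keeping the assets as a list column
per_location <- assets_table |>
  group_by(location_id) |>
  summarise(
    n_assets = n(),
    assets = list(asset_url),
    .groups = "drop"
  )
```

This keeps list columns out of the pipeline until the very last step, where they are actually wanted.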
