Description
At the moment, when we read / write data to layers, obsm, varm, obsp, varp, and uns, the whole named list is loaded into memory in order to fetch / write a single object.
How about we create something like a LazyNamedList:
#' @title Lazy Named List
#'
#' @description A lazy named list that loads elements on-demand to avoid
#' materializing large objects unnecessarily. Used internally for efficient
#' access to layers, obsm, varm, obsp, varp slots.
#'
#' @keywords internal
LazyNamedList <- R6::R6Class(
  "LazyNamedList",
  public = list(
    #' @description Create a new LazyNamedList
    #' @param get_keys_fn Function that returns all available keys: function() -> keys
    #' @param set_keys_fn Function to set all keys: function(keys) -> invisible()
    #' @param get_value_fn Function to get element by key: function(key) -> object
    #' @param set_value_fn Function to set element by key: function(key, value) -> invisible()
    #' @param set_bulk_fn Function to set multiple elements: function(named_list) -> invisible()
    #' @param use_cache Whether to use caching (default TRUE)
    initialize = function(
      get_keys_fn,
      set_keys_fn,
      get_value_fn,
      set_value_fn,
      set_bulk_fn,
      use_cache = TRUE
    ) {
      # ...
    }
  ),
  # ...
)
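To make the idea concrete, here is a minimal runnable sketch of how such a class could behave. It is not the proposed implementation: it leaves out set_keys_fn, set_bulk_fn and caching, the method names keys(), get() and set() are made up for illustration, and an environment stands in for the HDF5 backend. It needs the R6 package.

```r
library(R6)

# Hypothetical, stripped-down LazyNamedList (no caching, no bulk setter).
LazyNamedList <- R6Class(
  "LazyNamedList",
  public = list(
    initialize = function(get_keys_fn, get_value_fn, set_value_fn) {
      private$get_keys_fn <- get_keys_fn
      private$get_value_fn <- get_value_fn
      private$set_value_fn <- set_value_fn
    },
    # List the available keys without touching any values.
    keys = function() private$get_keys_fn(),
    # Fetch a single element on demand.
    get = function(key) {
      if (!key %in% self$keys()) {
        stop("Unknown key: ", key)
      }
      private$get_value_fn(key)
    },
    # Write a single element.
    set = function(key, value) {
      private$set_value_fn(key, value)
      invisible(self)
    }
  ),
  private = list(
    get_keys_fn = NULL,
    get_value_fn = NULL,
    set_value_fn = NULL
  )
)

# Toy backend: an environment stands in for a file on disk, and a counter
# records how many elements were actually read.
store <- new.env()
store$data <- list(counts = matrix(1:4, 2), logcounts = matrix(5:8, 2))
store$reads <- 0L

layers <- LazyNamedList$new(
  get_keys_fn = function() names(store$data),
  get_value_fn = function(key) {
    store$reads <- store$reads + 1L
    store$data[[key]]
  },
  set_value_fn = function(key, value) {
    store$data[[key]] <- value
  }
)

counts <- layers$get("counts")        # reads only the "counts" element
layers$set("scaled", matrix(0, 2, 2)) # writes only the "scaled" element
```

In this sketch, fetching counts increments the read counter once, and logcounts is never touched.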
The HDF5AnnData, InMemoryAnnData, ReticulateAnnData and AnnDataView classes will then need to implement the get_keys_fn, set_keys_fn, get_value_fn, etc. functions. In AbstractAnnData, the slots are then updated to something like:
#' @field layers See [AnnData-usage]
layers = function(value) {
  proxy <- LazyNamedList$new(
    get_keys_fn = function() self$layers_keys(),
    set_keys_fn = function(keys) private$.set_layers_keys(keys),
    get_value_fn = function(name) private$.get_layers_value(name),
    set_value_fn = function(name, value) private$.set_layers_value(name, value),
    set_bulk_fn = function(named_list) private$.set_layers_values(named_list)
  )
  if (missing(value)) {
    proxy
  } else {
    proxy$set_bulk_fn(value)
  }
}
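As a rough illustration of the read/write dispatch in an active binding like the one above, here is a self-contained toy. ToyAnnData is hypothetical, and a plain list of closures stands in for the LazyNamedList proxy, so this only shows the pattern, not the real classes. It needs the R6 package.

```r
library(R6)

# Hypothetical toy class; not part of the package.
ToyAnnData <- R6Class(
  "ToyAnnData",
  public = list(
    layers_keys = function() names(private$.layers)
  ),
  active = list(
    layers = function(value) {
      if (missing(value)) {
        # Read access: return a lightweight proxy. No element is fetched
        # until get() is called on it.
        list(
          keys = function() self$layers_keys(),
          get = function(name) private$.layers[[name]]
        )
      } else {
        # Write access: replace all elements at once (the "bulk" path).
        private$.layers <- value
      }
    }
  ),
  private = list(
    .layers = list()
  )
)

adata <- ToyAnnData$new()
adata$layers <- list(counts = 1:3, logcounts = 4:6)
proxy <- adata$layers
one_layer <- proxy$get("counts")
```

Reading adata$layers hands back the proxy, and only the call to get() pulls out an element.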
Instead of a function for setting the values in bulk, we could also have a clear_fun which empties the current named list, and then use the set_value_fn to populate the struct with the values individually. It might make the implementation (especially in HDF5AnnData) simpler, but maybe less efficient.
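A sketch of what that alternative could look like, in base R. The helper name set_bulk_via_clear and the toy backend are made up for illustration; only clear_fun and set_value_fn come from the proposal.

```r
# Hypothetical helper: emulate a bulk assignment with clear_fun + set_value_fn.
set_bulk_via_clear <- function(named_list, clear_fun, set_value_fn) {
  keys <- names(named_list)
  has_bad_names <- is.null(keys) || any(keys == "") || anyDuplicated(keys) > 0
  if (length(named_list) > 0 && has_bad_names) {
    stop("All elements must have unique, non-empty names")
  }
  # Empty the existing named list, then write the elements one at a time.
  clear_fun()
  for (key in keys) {
    set_value_fn(key, named_list[[key]])
  }
  invisible(NULL)
}

# Toy backend: an environment stands in for the file, with a write counter.
backend <- new.env()
backend$layers <- list(old = 1)
backend$writes <- 0L

set_bulk_via_clear(
  list(counts = 1:3, logcounts = 4:6),
  clear_fun = function() {
    backend$layers <- list()
  },
  set_value_fn = function(key, value) {
    backend$layers[[key]] <- value
    backend$writes <- backend$writes + 1L
  }
)
```

The backend then only needs two primitives, at the cost of one write call per element, which is where the efficiency concern comes from.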
In this proposal I added the possibility for caching values. For InMemoryAnnData this wouldn't make sense, but for HDF5AnnData it might, though I'm worried it might cause more issues than it solves. I'd be inclined to not include any caching in this implementation actually.
@lazappi @LouiseDck Wdyt?