Normalization (and automated R/Python conversion) of data types? #110

@Artur-man

What would be the ideal input for the dtype normalization below? Say the input arrays are composed of characters: should we pass S8 or U8, or is it possible to pass character and let Dtype automatically figure out the correct NumPy data type?

pizzarr/R/normalize.R

Lines 106 to 130 in f84355d

#' @keywords internal
normalize_dtype <- function(dtype, object_codec = NA) {
  # Reference: https://github.com/zarr-developers/zarr-python/blob/5dd4a0e6cdc04c6413e14f57f61d389972ea937c/zarr/util.py#L152
  if(is_na(dtype)) {
    # np.dtype(None) returns 'float64'
    if(!is_na(object_codec)) {
      stop("expected object_codec to be NA due to NA dtype")
    }
    return(Dtype$new("<f8"))
  }
  # Construct Dtype instance.
  # convenience API for object arrays
  if("Dtype" %in% class(dtype)) {
    return(dtype)
  }
  if(is.character(dtype)) {
    # Filter list was NA but there could be non-NA object_codec parameter.
    return(Dtype$new(dtype, object_codec = object_codec))
  }
  stop("dtype must be NA, string/character vector, or Dtype instance")
}

The typical scenario would be that one passes a full character array and the appropriate type is then supplied to dtype automatically, without the user having to specify it.

zarr.array <- pizzarr::zarr_open(store = "data/string_test.zarr")
z1 <- zarr.array$create_dataset(name = "assay", data = array(rep("a", 10), dim = 10), shape = 10)
zarr.array$get_item("assay")$get_item("...")$data
[1] "Buffer has ${numDataElements} of dtype ${dtype}, shape is too large or small"
Error in private$chunk_getitem_part2(part1_result, proj$chunk_coords,  : 
  Different type of error - rethrow

It looks like the dtype currently defaults to "<f8" (float64) when it is not provided by the user.
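
To make the request concrete, here is a rough sketch of the kind of inference I have in mind. The helper infer_dtype_string is hypothetical and not part of pizzarr. It only guesses a NumPy-style dtype string from the R type of the data, and the mapping (character to fixed-width U or S, integer to "<i4", double to "<f8", logical to "|b1") is an assumption about what a sensible default might be, not what the package does today.

```r
# Hypothetical helper (not part of pizzarr): guess a NumPy-style dtype
# string from an R vector or array, using base R only.
infer_dtype_string <- function(x, unicode = TRUE) {
  if (is.character(x)) {
    # Fixed-width string dtypes need the width of the longest element:
    # characters for U (unicode), bytes for S (byte strings).
    count_type <- if (unicode) "chars" else "bytes"
    width <- max(nchar(x, type = count_type), 1L, na.rm = TRUE)
    if (unicode) paste0("<U", width) else paste0("|S", width)
  } else if (is.logical(x)) {
    "|b1"
  } else if (is.integer(x)) {
    "<i4"  # R integers are 32-bit
  } else if (is.double(x)) {
    "<f8"  # R doubles are 64-bit floats
  } else {
    stop("unsupported R type: ", typeof(x))
  }
}

# The character array from the example above
infer_dtype_string(array(rep("a", 10), dim = 10))
```

For the array above this gives "<U1" (or "|S1" with unicode = FALSE), and that string could then be handed to normalize_dtype() in place of the NA default.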
