Normalization (and automated R/Python conversion) of data types? #110

What would be the ideal input for the dtype normalization below? Say the input arrays are composed of characters: should we pass `S8` or `U8`, or is it possible to pass `character` and let `Dtype` automatically figure out the correct NumPy data type?

https://github.com/keller-mark/pizzarr/blob/f84355d2708c22dc6e703f3cdd83d218221b352a/R/normalize.R#L106-L130
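To make the `S8` vs `U8` trade-off concrete: in NumPy, `|S8` stores every element as exactly 8 bytes (so it counts bytes, and a non-ASCII character takes more than one), while `<U8` stores 8 code points as UTF-32, i.e. 32 bytes per element. The sketch below encodes an R character vector into either fixed-width layout. `encode_fixed_width` is a hypothetical helper written for illustration only; it is not part of pizzarr and is not how pizzarr's codecs are implemented.

```r
# Hypothetical helper (not part of pizzarr): encode a character vector into the
# zero-padded fixed-width byte layout NumPy uses for "|S<n>" or "<U<n>".
encode_fixed_width <- function(x, kind = c("S", "U"), n) {
  kind <- match.arg(kind)
  x <- enc2utf8(x)
  # Bytes per element: n bytes for "S", n code points * 4 bytes for "U".
  width <- if (kind == "S") n else 4L * n
  raws <- if (kind == "S") {
    lapply(x, charToRaw)                                      # raw UTF-8 bytes
  } else {
    iconv(x, from = "UTF-8", to = "UTF-32LE", toRaw = TRUE)   # 4 bytes per code point
  }
  out <- lapply(raws, function(r) {
    if (length(r) > width) stop("string longer than dtype width")
    c(r, raw(width - length(r)))                              # zero-pad to fixed width
  })
  unlist(out)
}

length(encode_fixed_width(rep("a", 10), "S", 8))  # 80 bytes
length(encode_fixed_width(rep("a", 10), "U", 8))  # 320 bytes
```

For short ASCII data such as `rep("a", 10)`, `S` is four times more compact; `U` is the safer choice when the strings may contain arbitrary Unicode text.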

The typical scenario would be that one inserts a full character array and the appropriate type is then passed to `dtype`. For example:

```
zarr.array <- pizzarr::zarr_open(store = "data/string_test.zarr")
z1 <- zarr.array$create_dataset(name = "assay", data = array(rep("a", 10), dim = 10), shape = 10)
zarr.array$get_item("assay")$get_item("...")$data
```

```
[1] "Buffer has ${numDataElements} of dtype ${dtype}, shape is too large or small"
Error in private$chunk_getitem_part2(part1_result, proj$chunk_coords,  : 
  Different type of error - rethrow
```

It looks like the dtype currently defaults to `"<f8"` (little-endian float64, not float32) when the user does not provide one.
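For reference, `"<f8"` follows NumPy's type-string convention: a byte-order character (`<` little-endian, `>` big-endian, `|` not applicable), a kind code (`f` float, `i` signed integer, `S` bytes, `U` unicode, ...) and a number. For most kinds the number is the size in bytes, so `"<f8"` is an 8-byte (64-bit) float, the same width as an R double; for `U` it is a character count. The sketch below splits such strings; `parse_dtype_str` is a hypothetical helper for illustration, not pizzarr's `Dtype` parser.

```r
# Hypothetical helper (not part of pizzarr): split a NumPy-style dtype string
# such as "<f8" into byte order, kind code and the trailing number.
parse_dtype_str <- function(s) {
  m <- regmatches(s, regexec("^([<>|=])([a-zA-Z])([0-9]+)$", s))[[1]]
  if (length(m) == 0) stop("unsupported dtype string: ", s)
  list(
    byteorder = m[2],       # "<" little-endian, ">" big-endian, "|" not applicable
    kind = m[3],            # "f" float, "i" int, "b" bool, "S" bytes, "U" unicode, ...
    n = as.integer(m[4])    # bytes for most kinds, characters for "U"
  )
}

d <- parse_dtype_str("<f8")
d$kind  # "f"
d$n     # 8, i.e. 8 bytes = 64-bit float (float64)
```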

```
#' @keywords internal
normalize_dtype <- function(dtype, object_codec = NA) {
  # Reference: https://github.com/zarr-developers/zarr-python/blob/5dd4a0e6cdc04c6413e14f57f61d389972ea937c/zarr/util.py#L152

  if(is_na(dtype)) {
    # np.dtype(None) returns 'float64'
    if(!is_na(object_codec)) {
      stop("expected object_codec to be NA due to NA dtype")
    }
    return(Dtype$new("<f8"))
  }

  # Construct Dtype instance.
  # convenience API for object arrays
  if("Dtype" %in% class(dtype)) {
    return(dtype)
  }

  if(is.character(dtype)) {
    # Filter list was NA but there could be non-NA object_codec parameter.
    return(Dtype$new(dtype, object_codec = object_codec))
  }

  stop("dtype must be NA, string/character vector, or Dtype instance")
}
```
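One possible approach, assuming the data are available when the dataset is created, is to infer the dtype from the R vector when `dtype` is `NA` instead of always falling back to `"<f8"`. The sketch below is hypothetical: `infer_dtype_from_data` and its `string_kind` parameter are invented for illustration, and whether pizzarr's `Dtype` class and codecs accept the resulting string dtypes for character data is not confirmed here.

```r
# Hypothetical helper (not part of pizzarr): choose a NumPy-style dtype string
# from an R vector/array, e.g. for use when the caller passes dtype = NA.
infer_dtype_from_data <- function(data, string_kind = c("U", "S")) {
  string_kind <- match.arg(string_kind)
  if (is.character(data)) {
    x <- enc2utf8(as.vector(data))
    x <- x[!is.na(x)]
    if (string_kind == "U") {
      width <- max(1L, nchar(x, type = "chars"))  # longest string in code points
      return(paste0("<U", width))
    }
    width <- max(1L, nchar(x, type = "bytes"))    # longest string in UTF-8 bytes
    return(paste0("|S", width))
  }
  if (is.logical(data)) return("|b1")
  if (is.integer(data)) return("<i4")  # R integers are 32-bit
  if (is.double(data)) return("<f8")   # R doubles are 64-bit
  stop("cannot infer dtype for class: ", class(data)[1])
}

infer_dtype_from_data(array(rep("a", 10), dim = 10))       # "<U1"
infer_dtype_from_data(array(rep("a", 10), dim = 10), "S")  # "|S1"
```

Defaulting to `U` keeps arbitrary Unicode text intact at four bytes per character; `S` is more compact but its width counts UTF-8 bytes, so non-ASCII strings need a wider dtype than their character count suggests.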
