Description
There is currently no way to express that a field should be a multidimensional array, for example a 4x4 matrix.
An example of a dataset with this need is MatrixCity (https://github.com/city-super/MatrixCity), whose data includes a rotation matrix field (distributed as JSON in this example):
{
"frame_index": 0,
"rot_mat": [
[
-0.009902680292725563,
0.0010966990375891328,
-0.0008568363264203072,
-590.0
],
[
-0.0013917317846789956,
-0.0078034186735749245,
0.006096699275076389,
590.0
],
[
-8.448758914703092e-10,
0.0061566149815917015,
0.007880106568336487,
200.0
],
[
0.0,
0.0,
0.0,
1.0
]
],
"euler": [
0.6632251739501953,
8.44875884808971e-08,
-3.0019662380218506
]
},

One possibility might be to use JSON Schema to represent such an array:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "array",
"items": {
"type": "array",
"items": {"type": "number"},
"minItems": 4,
"maxItems": 4
},
"minItems": 4,
"maxItems": 4
}

The benefit here is that JSON Schema is quite expressive, so it would be possible to describe complex cases, including arrays of different types (useful in multimodal prompts, for example).
The downside is that the range of possible schemas is quite large, and there is a risk that some datasets would end up with a single field defined in Croissant whose type is a complex object described entirely in JSON Schema. That would also significantly increase implementation complexity.
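
To make the complexity concern a bit more concrete, here is a minimal Python sketch of what checking a value against the 4x4 schema above involves. It handles only the four keywords that schema uses (`type`, `items`, `minItems`, `maxItems`) and is an illustration, not a conforming JSON Schema validator; the `validate` helper and the constant names are hypothetical.

```python
def validate(instance, schema):
    """Check `instance` against a tiny subset of JSON Schema.

    Only "type" ("array" or "number"), "items", "minItems" and "maxItems"
    are supported. Anything else raises ValueError.
    """
    expected = schema.get("type")
    if expected == "number":
        # bool is a subclass of int in Python, but it is not a JSON number.
        return isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if expected == "array":
        if not isinstance(instance, list):
            return False
        if len(instance) < schema.get("minItems", 0):
            return False
        if "maxItems" in schema and len(instance) > schema["maxItems"]:
            return False
        item_schema = schema.get("items")
        if item_schema is None:
            return True
        return all(validate(item, item_schema) for item in instance)
    raise ValueError(f"unsupported schema type: {expected!r}")


# The 4x4 schema from above, written as Python dicts.
ROW = {"type": "array", "items": {"type": "number"}, "minItems": 4, "maxItems": 4}
MATRIX_4X4 = {"type": "array", "items": ROW, "minItems": 4, "maxItems": 4}

identity = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(validate(identity, MATRIX_4X4))
```

Even this restricted subset needs recursion and per-keyword handling, which hints at the cost of supporting arbitrary schemas.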
A possible alternative might be to define our own Array dataType in the croissant namespace, similarly to cr:BoundingBox. For example, something like:
{
"@type": "cr:Field",
"@id": "recordsetName/rotation_matrix",
"description": "The rotation matrix.",
"dataType": "cr:Array",
"dataTypeParams": {
"dimensions": [4, 4],
"dataType": "sc:Float"
},
"source": {
"fileSet": { ... },
"extract": {
"jsonPath": "..."
}
}
}

What do you folks think?
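
For discussion, one possible reading of the proposed `dimensions` parameter can be sketched in Python as follows. This assumes values arrive as nested lists (as in the MatrixCity JSON) and that `dimensions` gives the exact size along each axis; both points are assumptions, and `shape_of` and `matches_array_params` are hypothetical helpers, not part of any existing Croissant API. Checking the element `dataType` is left out.

```python
def shape_of(value):
    """Return the shape of a regular nested list, or None if it is ragged."""
    if not isinstance(value, list):
        return ()  # a scalar has an empty shape
    if not value:
        return (0,)
    inner = [shape_of(item) for item in value]
    if inner[0] is None or any(s != inner[0] for s in inner):
        return None  # rows differ in shape, so this is not a proper array
    return (len(value),) + inner[0]


def matches_array_params(value, params):
    """Check `value` against the assumed meaning of `dataTypeParams`."""
    return shape_of(value) == tuple(params["dimensions"])


params = {"dimensions": [4, 4], "dataType": "sc:Float"}

# Values copied from the MatrixCity record above.
rot_mat = [
    [-0.009902680292725563, 0.0010966990375891328, -0.0008568363264203072, -590.0],
    [-0.0013917317846789956, -0.0078034186735749245, 0.006096699275076389, 590.0],
    [-8.448758914703092e-10, 0.0061566149815917015, 0.007880106568336487, 200.0],
    [0.0, 0.0, 0.0, 1.0],
]
euler = [0.6632251739501953, 8.44875884808971e-08, -3.0019662380218506]

print(matches_array_params(rot_mat, params))
print(matches_array_params(euler, params))
```

Under this reading, `rot_mat` matches `[4, 4]` while `euler` (shape `[3]`) does not.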