-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathnotes.txt
More file actions
155 lines (101 loc) · 6.3 KB
/
notes.txt
File metadata and controls
155 lines (101 loc) · 6.3 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
------------------------------------
SELECTORS:
two types:
builder selectors: i.e. path = "dataset_description.json"
or
datatype == "nirs"
suffix == "channels"
extension == ".tsv"
conditional selectors i.e. json.NIRSCoordinateSystem == "other"
or
!exists("citation.cff")
types of builder fields:
"subject": notImplemented,
"path": notImplemented,
"entities": notImplemented,
"datatype": notImplemented,
"suffix": notImplemented,
"extension": notImplemented,
"modality": notImplemented,
Primary issues:
seperating "builder" and "conditional" selectors:
builders must be used to add needed metadata to a file.
whereas conditionals must be used to hook to existing matching files and add metadata if hooked fields change
Different parsing depending on where in the schema:
add data:
rules/files/raw: datatypes are a "conditional",
-> Files are only added under rules/files
sidecars, tabular_data, json etc... are only selectors
Options:
- interpret "selectors" once at runtime (as much as we can) -> probably the better option?
OR
interpret them each time?
- interpret the schema completely? i.e. convert it into a graph and then fill in data, i.e. merge the metadata under rules and objects into one already
OR
- dynamically interpret at runtime...
Underlying issue:
a lot of extra stuff, possibly interpreting everything adds a couple seconds of unnecessary stuff each time we make a dataset even though 95% of it won't be used
also requires everything to be implemented, whereas if we interpret as we go then some stuff (which never gets touched really, i.e. some checks before they are implemented) can be ignored
and not needed to be implemented
solution:
dynamic approach -> interpret at runtime, but if something gets interpreted (a selector) then replace it in the schema with the selector object hook rather than re interpret if called again
-> high chance of reuse (i.e. if we do a eeg dataset and call on eeg hooks, then we will probably add more than one subject of eeg data)
-> do we call on files if they match modality/datatype/suffix? or call them all. Issue mainly regarding a couple have overlap -> intersect a couple of modalities, but I think this is
only ever in the general files, i.e. in MRI, and then it intersects lower modalities selectors:- intersects([suffix], ["physio", "stim"]) in "continuous.yaml"
as well as a couple of them can often apply (task/ events etc...)
OR
-> do we call every rule/json, sidecar etc.. file and perform all checks? is this redundant?
It seems there are a couple exception cases, i.e. continous.yaml is actually a correspondance to "pyhsio" or "stim" which are special timeseries files defined under rules/files/raw/task.yaml
solutions: map out every exception case and deal with them once we know all of them
or just take no shortcuts and iterate through everything so "exceptions" would be covered anyway
------------------------------------------------------
conditional/builder selectors UPDATE
it seems we can make files by iterating through rules/files/common for starting files
then rule/files/raw for any added data
then apply all selectors and json/ tabular/sidecar as conditionals ONLY
Add hooks using @property and @property.setter to intercept the changing of a parameter and look at linked "hooks"
hooks are found when parsing through selectors and resolved to the specific object if possible otherwise kept vague,
i.e. exists("citation.cff", "dataset") is always resolvable, but something more abstract, "task" in entities is not resolvable
Need to figure out how the hooks add / remove metadata / where it is stored
seperate wrapper dicts for entities/ metadata/ etc..?
------------------------------------------------------
FileTree is a standalone tree with attributes describing the paths as well as a reference to the object.
BidsDataset has a reference to File tree, as well as dictionary objects, i.e. Subject, session or datatype objects
Can use @property to easily perform conditions such as acq in file.entities. I set entities as a @property method,
which when called collates the entities from its parents
Input: -Github BIDS schema, YAML format so machine readable, can be used to define classes, requirments and options etc...
Kinds of data:
- Raw data (in unknown format) -> use MNE-BIDS tools to convert
- Data complementing raw data (JSON sidecars)
- Unknown data -> must be manually filled in (authors, readme, etc...)
UI:
Let you choose paths, for raw data /sidecar jsons.
have mapping dictionaries to allow for automatic conversion of data under different "key" names to the required ones
(i.e. in editable config).
Options to select and write down unknown data that must be manually filled in.
FILES HAVE:
core files (and their fields)
deriv files
(comprise of entities - name)
(accompanied by json sidecar - fields)
raw files:
(comprise of entities - name)
(accompanied by json sidecar - fields)
SCHEMA: using bidsschematools
- objects : descriptions for pretty much everything you need (files, what they do, entities, etc..)
- rules : the nitty gritty stuff, whether stuff is required, what entities it needs etc...
- files: describes the requirements for files, as well as entities, datatypes extensions etc...
Create a dataset Tree.
Can build it from rules/directories.yaml to describe core files / folders.
Inate tree structure to allow for backwards and forwards parsing for inheritance of METADATA
When a "dataset" is created, all the required directories / files are created, from rules/files/common/core etc..
with options for their metadata.
Options are added to add raw data. Then this is parsed through its specific file type modality/entity etc... updating relevant core files.
LABELS:
datatype
entities
subject
Create a subject
- create sessions
- create datatype
- create