Commit 15ba0d3

update data model documentation
1 parent a6de851 commit 15ba0d3

1 file changed

+53
-37
lines changed

docs/explanations/curator_data_model.md

Lines changed: 53 additions & 37 deletions
Original file line number | Diff line number | Diff line change
@@ -15,8 +15,8 @@ The CSV data model described in this tutorial formalizes this structure:
1515

1616
Here is the Patient described above represented as a CSV data model:
1717

18-
| Attribute | DependsOn |
19-
|---|---|
18+
| Attribute | DependsOn |
19+
|-----------|---------------------|
2020
| Patient | "Age, Gender, Name" |
2121
| Age | |
2222
| Gender | |
@@ -48,9 +48,20 @@ The end goal is to create a JSON Schema that can be used in Curator. A JSON Sche
4848

4949
Note: Individual columns are covered later on this page.
5050

51+
These columns must be present in your CSV data model:
52+
53+
- `Attribute`
54+
- `DependsOn`
55+
- `Description`
56+
- `Valid Values`
57+
- `Required`
58+
- `Parent`
59+
- `Validation Rules`
60+
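The required columns listed above can be checked before a data model is used. The following is a minimal sketch, not part of Curator: the function name `missing_columns` is hypothetical, and it assumes the data model is a plain CSV whose first row is the header.

```python
import csv
import io

# Column names copied from the list above; the function name is hypothetical.
REQUIRED_COLUMNS = [
    "Attribute", "DependsOn", "Description", "Valid Values",
    "Required", "Parent", "Validation Rules",
]


def missing_columns(csv_text: str) -> list[str]:
    """Return the required columns absent from the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]


example = 'Attribute,DependsOn,Parent\nPatient,"Age, Gender, Name",DataType\n'
print(missing_columns(example))
# ['Description', 'Valid Values', 'Required', 'Validation Rules']
```

An empty result means every required column is present; the columns may still be left blank where the documentation allows it.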
5161
Defining data types:
5262

5363
- Put a unique data type name in the `Attribute` column.
64+
- Put the value `DataType` in the `Parent` column.
5465
- List at least one attribute in the `DependsOn` column (comma-separated).
5566
- Optionally add a description to the `Description` column.
5667
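The rules for defining data types can be expressed as a small check. This is a sketch under stated assumptions rather than Curator's own validation: the helper names are hypothetical, `Attribute` names are assumed to be unique across the whole file, and a row counts as a data type only when `Parent` is exactly `DataType`.

```python
import csv
import io


def split_depends_on(cell: str) -> list[str]:
    """Split a comma-separated DependsOn cell into trimmed attribute names."""
    return [name.strip() for name in cell.split(",") if name.strip()]


def data_type_problems(csv_text: str) -> list[str]:
    """Return messages for rows that break the data type rules above."""
    problems = []
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get("Attribute") or "").strip()
        # Assumption: names must be unique across all rows, not only data types.
        if name in seen:
            problems.append(f"{name}: duplicate Attribute name")
        seen.add(name)
        # Only rows marked as data types need a DependsOn list.
        if (row.get("Parent") or "").strip() != "DataType":
            continue
        if not split_depends_on(row.get("DependsOn") or ""):
            problems.append(f"{name}: data type lists no attributes in DependsOn")
    return problems


example = (
    "Attribute,DependsOn,Parent\n"
    'Patient,"Age, Gender",DataType\n'
    "Visit,,DataType\n"
    "Age,,\n"
    "Age,,\n"
)
print(data_type_problems(example))
```

The `csv` module keeps a quoted cell such as `"Age, Gender"` as one field, so the comma-separated list is split only after the row has been parsed.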

@@ -79,8 +90,8 @@ Set of possible values for the current attribute. This attribute will be an enum
7990
Data Model:
8091

8192
| Attribute | DependsOn | Valid Values |
82-
|---|---|---|
83-
| Patient | "Gender" | |
93+
|-----------|-----------|-----------------------|
94+
| Patient | "Gender" | |
8495
| Gender | | "Female, Male, Other" |
8596

8697
JSON Schema output:
@@ -107,8 +118,8 @@ Note: Leaving this empty is the equivalent of `False`.
107118
Data Model:
108119

109120
| Attribute | DependsOn | Required |
110-
|---|---|---|
111-
| Patient | "Gender, Age" | |
121+
|-----------|----------------|----------|
122+
| Patient | "Gender, Age" | |
112123
| Gender | | True |
113124
| Age | | False |
114125

@@ -131,6 +142,10 @@ JSON Schema output:
131142
}
132143
```
133144

145+
### Parent
146+
147+
This is mostly a remnant of the Schematic data model. It is currently used to find all the data types in the data model. Put the value `DataType` in this column if this row is a data type. Other values are currently ignored.
148+
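As an illustration of how the `Parent` column identifies data types, the sketch below collects every row whose `Parent` is `DataType`. It is not Curator's implementation: the function name `data_type_names` is hypothetical, and the exact, case-sensitive match on `DataType` is an assumption.

```python
import csv
import io


def data_type_names(csv_text: str) -> list[str]:
    """Return the Attribute of every row whose Parent is exactly 'DataType'."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        row["Attribute"]
        for row in rows
        # Any other Parent value, including blank, is ignored.
        if (row.get("Parent") or "").strip() == "DataType"
    ]


example = (
    "Attribute,DependsOn,columnType,Parent\n"
    'Patient,"Gender, Hobbies",,DataType\n'
    "Gender,,string,\n"
    "Hobbies,,string_list,Other\n"
)
print(data_type_names(example))  # ['Patient']
```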
134149
### columnType
135150

136151
The data type of this attribute. See [type](https://json-schema.org/understanding-json-schema/reference/type).
@@ -147,11 +162,11 @@ Must be one of:
147162

148163
Data Model:
149164

150-
| Attribute | DependsOn | columnType |
151-
|---|---|---|
152-
| Patient | "Gender, Hobbies" | |
153-
| Gender | | string |
154-
| Hobbies | | string_list |
165+
| Attribute | DependsOn | columnType | Parent |
166+
|-----------|-------------------|-------------|----------|
167+
| Patient | "Gender, Hobbies" | | DataType |
168+
| Gender | | string | |
169+
| Hobbies | | string_list | |
155170

156171
JSON Schema output:
157172

@@ -196,11 +211,11 @@ The format of this attribute. See [format](https://json-schema.org/understanding
196211

197212
Data Model:
198213

199-
| Attribute | DependsOn | columnType | Format |
200-
|---|---|---|---|
201-
| Patient | "Gender, Birth Date" | | |
202-
| Gender | | string | |
203-
| Birth Date | | string | date |
214+
| Attribute | DependsOn | columnType | Format | Parent |
215+
|-----------------|----------------------|-------------|--------|----------|
216+
| Patient | "Gender, Birth Date" | | | DataType |
217+
| Gender | | string | | |
218+
| Birth Date | | string | date | |
204219

205220
JSON Schema output:
206221

@@ -229,11 +244,11 @@ The regex pattern this attribute must match. The type of this attribute must be
229244

230245
Data Model:
231246

232-
| Attribute | DependsOn | columnType | Pattern |
233-
|---|---|---|---|
234-
| Patient | "Gender, ID" | | |
235-
| Gender | | string | |
236-
| ID | | string | [a-f] |
247+
| Attribute | DependsOn | columnType | Pattern | Parent |
248+
|-----------|---------------|-------------|---------|----------|
249+
| Patient | "Gender, ID" | | | DataType |
250+
| Gender | | string | | |
251+
| ID | | string | [a-f] | |
237252

238253
JSON Schema output:
239254

@@ -262,12 +277,12 @@ The range that this attribute's numeric values must fall within. The type of thi
262277

263278
Data Model:
264279

265-
| Attribute | DependsOn | columnType | Minimum | Maximum |
266-
|---|---|---|---|---|
267-
| Patient | "Age, Weight, Health Score" | | | |
268-
| Age | | integer | 0 | 120 |
269-
| Weight | | number | 0.0 | |
270-
| Health Score | | number | 0.0 | 1.0 |
280+
| Attribute | DependsOn | columnType | Minimum | Maximum | Parent |
281+
|--------------|-----------------------------|-------------|---------|---------|----------|
282+
| Patient | "Age, Weight, Health Score" | | | | DataType |
283+
| Age | | integer | 0 | 120 | |
284+
| Weight | | number | 0.0 | | |
285+
| Health Score | | number | 0.0 | 1.0 | |
271286

272287
JSON Schema output:
273288

@@ -301,9 +316,9 @@ JSON Schema output:
301316

302317
### Validation Rules (deprecated)
303318

304-
This is a remnant from Schematic. It is still used (for now) to translate certain validation rules to other JSON Schema keywords.
319+
This is a remnant from Schematic. It is still required and in use (for now) to translate certain validation rules to other JSON Schema keywords.
305320

306-
If you are starting a new data model, DO NOT use this column.
321+
If you are starting a new data model, DO NOT fill out this column; leave it blank.
307322

308323
If you have an existing data model using any of the following validation rules, follow these instructions to update it:
309324

@@ -315,26 +330,27 @@ If you have an existing data model using any of the following validation rules,
315330

316331
## Conditional dependencies
317332

318-
The `DependsOn` and `Valid Values` columns can be used together to flexibly define conditional logic for determining the relevant attributes for a data type.
333+
The `DependsOn`, `Valid Values` and `Parent` columns can be used together to flexibly define conditional logic for determining the relevant attributes for a data type.
319334

320335
In this example we have the `Patient` data type. The `Patient` can be diagnosed as healthy or with cancer. For Patients with cancer we also want to collect info about their cancer type, and any cancers in their family history.
321336

322337
Data Model:
323338

324-
| Attribute | DependsOn | Valid Values | Required | columnType |
325-
|---|---|---|---|---|
326-
| Patient | "Diagnosis" | | | |
327-
| Diagnosis | | "Healthy, Cancer" | True | string |
328-
| Cancer | "Cancer Type, Family History" | | | |
329-
| Cancer Type | | "Brain, Lung, Skin" | True | string |
330-
| Family History | | "Brain, Lung, Skin" | True | string_list |
339+
| Attribute | DependsOn | Valid Values | Required | columnType | Parent |
340+
|----------------|-------------------------------|---------------------|----------|-------------|----------|
341+
| Patient | "Diagnosis" | | | | DataType |
342+
| Diagnosis | | "Healthy, Cancer" | True | string | |
343+
| Cancer | "Cancer Type, Family History" | | | | |
344+
| Cancer Type | | "Brain, Lung, Skin" | True | string | |
345+
| Family History | | "Brain, Lung, Skin" | True | string_list | |
331346

332347
To demonstrate this, see the above example with the `Patient` and `Cancer` data types:
333348

334349
- `Diagnosis` is an attribute of `Patient`.
335350
- `Diagnosis` has `Valid Values` of `Healthy` and `Cancer`.
336351
- `Cancer` is also a data type.
337352
- `Cancer Type` and `Family History` are attributes of `Cancer` and are both required.
353+
- `Patient` is a data type, but `Cancer` is not, as defined by the `Parent` column.
338354

339355
As a result of the above data model, in the JSON Schema:
340356
