Commit 0d7725b

Merge pull request #1298 from Sage-Bionetworks/update_relationships
Update relationships
2 parents 7b6b7d1 + 2158fdb commit 0d7725b

5 files changed: +101 -44 lines changed

docs/explanations/curator_data_model.md

Lines changed: 58 additions & 38 deletions
@@ -15,8 +15,8 @@ The CSV data model described in this tutorial formalizes this structure:
1515

1616
Here is the Patient described above represented as a CSV data model:
1717

18-
| Attribute | DependsOn |
19-
|---|---|
18+
| Attribute | DependsOn |
19+
|-----------|---------------------|
2020
| Patient | "Age, Gender, Name" |
2121
| Age | |
2222
| Gender | |
@@ -48,9 +48,20 @@ The end goal is to create a JSON Schema that can be used in Curator. A JSON Sche
4848

4949
Note: Individual columns are covered later on this page.
5050

51+
These columns must be present in your CSV data model:
52+
53+
- [Attribute](#attribute)
54+
- [DependsOn](#dependson)
55+
- [Description](#description)
56+
- [Valid Values](#valid-values)
57+
- [Required](#required)
58+
- [Parent](#parent)
59+
- [Validation Rules](#validation-rules)
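
As a rough illustration of the required-column list above, the following sketch checks a CSV header row for missing columns. It is not the library's own validation; the function name `missing_columns` and the sample model text are assumptions made for this example.

```python
import csv
import io

# Column names copied from the list above.
REQUIRED_COLUMNS = [
    "Attribute",
    "DependsOn",
    "Description",
    "Valid Values",
    "Required",
    "Parent",
    "Validation Rules",
]


def missing_columns(csv_text: str) -> list[str]:
    """Return the required columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty input yields an empty header
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Hypothetical one-row model with every required column present.
model = (
    "Attribute,Description,Valid Values,DependsOn,Required,Parent,Validation Rules\n"
    'Patient,,,"Age, Gender, Name",,DataType,\n'
)
print(missing_columns(model))
```

Column order does not matter in this sketch; only presence in the header row is checked.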
60+
5161
Defining data types:
5262

5363
- Put a unique data type name in the `Attribute` column.
64+
- Put the value `DataType` in the `Parent` column.
5465
- List at least one attribute in the `DependsOn` column (comma-separated).
5566
- Optionally add a description to the `Description` column.
5667

@@ -79,8 +90,8 @@ Set of possible values for the current attribute. This attribute will be an enum
7990
Data Model:
8091

8192
| Attribute | DependsOn | Valid Values |
82-
|---|---|---|
83-
| Patient | "Gender" | |
93+
|-----------|-----------|-----------------------|
94+
| Patient | "Gender" | |
8495
| Gender | | "Female, Male, Other" |
8596

8697
JSON Schema output:
@@ -107,8 +118,8 @@ Note: Leaving this empty is the equivalent of `False`.
107118
Data Model:
108119

109120
| Attribute | DependsOn | Required |
110-
|---|---|---|
111-
| Patient | "Gender, Age" | |
121+
|-----------|----------------|----------|
122+
| Patient | "Gender, Age" | |
112123
| Gender | | True |
113124
| Age | | False |
114125

@@ -131,8 +142,14 @@ JSON Schema output:
131142
}
132143
```
133144

145+
### Parent
146+
147+
Put the value `DataType` in this column if this row is a data type. This column is currently used to find all the data types in the data model; any other value is ignored.
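
One way to picture the rule described here is a filter over the `Parent` column. The sketch below is illustrative only and is not how the library reads the model; `find_data_types` is a hypothetical helper, and the sample rows are borrowed from the `columnType` example in this document.

```python
import csv
import io


def find_data_types(csv_text: str) -> list[str]:
    """Return the Attribute of every row whose Parent column is `DataType`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["Attribute"]
        for row in reader
        # Empty or missing Parent cells are treated as "not a data type".
        if (row.get("Parent") or "").strip() == "DataType"
    ]


model = """Attribute,DependsOn,columnType,Parent
Patient,"Gender, Hobbies",,DataType
Gender,,string,
Hobbies,,string_list,
"""
print(find_data_types(model))  # ['Patient']
```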
148+
134149
### columnType
135150

151+
**NOTE: While this is not required, it is strongly recommended that this column is present and that all attributes are typed explicitly.**
152+
136153
The data type of this attribute. See [type](https://json-schema.org/understanding-json-schema/reference/type).
137154

138155
Must be one of:
@@ -147,11 +164,11 @@ Must be one of:
147164

148165
Data Model:
149166

150-
| Attribute | DependsOn | columnType |
151-
|---|---|---|
152-
| Patient | "Gender, Hobbies" | |
153-
| Gender | | string |
154-
| Hobbies | | string_list |
167+
| Attribute | DependsOn | columnType | Parent |
168+
|-----------|-------------------|-------------|----------|
169+
| Patient | "Gender, Hobbies" | | DataType |
170+
| Gender | | string | |
171+
| Hobbies | | string_list | |
155172

156173
JSON Schema output:
157174

@@ -196,11 +213,11 @@ The format of this attribute. See [format](https://json-schema.org/understanding
196213

197214
Data Model:
198215

199-
| Attribute | DependsOn | columnType | Format |
200-
|---|---|---|---|
201-
| Patient | "Gender, Birth Date" | | |
202-
| Gender | | string | |
203-
| Birth Date | | string | date |
216+
| Attribute | DependsOn | columnType | Format | Parent |
217+
|-----------------|----------------------|-------------|--------|----------|
218+
| Patient | "Gender, Birth Date" | | | DataType |
219+
| Gender | | string | | |
220+
| Birth Date | | string | date | |
204221

205222
JSON Schema output:
206223

@@ -229,11 +246,11 @@ The regex pattern this attribute must match. The type of this attribute must be
229246

230247
Data Model:
231248

232-
| Attribute | DependsOn | columnType | Pattern |
233-
|---|---|---|---|
234-
| Patient | "Gender, ID" | | |
235-
| Gender | | string | |
236-
| ID | | string | [a-f] |
249+
| Attribute | DependsOn | columnType | Pattern | Parent |
250+
|-----------|---------------|-------------|---------|----------|
251+
| Patient | "Gender, ID" | | | DataType |
252+
| Gender | | string | | |
253+
| ID | | string | [a-f] | |
237254

238255
JSON Schema output:
239256

@@ -262,12 +279,12 @@ The range that this attribute's numeric values must fall within. The type of thi
262279

263280
Data Model:
264281

265-
| Attribute | DependsOn | columnType | Minimum | Maximum |
266-
|---|---|---|---|---|
267-
| Patient | "Age, Weight, Health Score" | | | |
268-
| Age | | integer | 0 | 120 |
269-
| Weight | | number | 0.0 | |
270-
| Health Score | | number | 0.0 | 1.0 |
282+
| Attribute | DependsOn | columnType | Minimum | Maximum | Parent |
283+
|--------------|-----------------------------|-------------|---------|---------|----------|
284+
| Patient | "Age, Weight, Health Score" | | | | DataType |
285+
| Age | | integer | 0 | 120 | |
286+
| Weight | | number | 0.0 | | |
287+
| Health Score | | number | 0.0 | 1.0 | |
271288

272289
JSON Schema output:
273290

@@ -299,11 +316,13 @@ JSON Schema output:
299316
}
300317
```
301318

302-
### Validation Rules (deprecated)
319+
### Validation Rules
320+
321+
This column is currently deprecated.
303322

304-
This is a remnant from Schematic. It is still used (for now) to translate certain validation rules to other JSON Schema keywords.
323+
This is a remnant from Schematic. It is still required and in use (for now) to translate certain validation rules to other JSON Schema keywords.
305324

306-
If you are starting a new data model, DO NOT use this column.
325+
If you are starting a new data model, DO NOT fill out this column; leave it blank.
307326

308327
If you have an existing data model using any of the following validation rules, follow these instructions to update it:
309328

@@ -315,22 +334,23 @@ If you have an existing data model using any of the following validation rules,
315334

316335
## Conditional dependencies
317336

318-
The `DependsOn` and `Valid Values` columns can be used together to flexibly define conditional logic for determining the relevant attributes for a data type.
337+
The `DependsOn`, `Valid Values` and `Parent` columns can be used together to flexibly define conditional logic for determining the relevant attributes for a data type.
319338

320339
In this example we have the `Patient` data type. The `Patient` can be diagnosed as healthy or with cancer. For Patients with cancer we also want to collect info about their cancer type, and any cancers in their family history.
321340

322341
Data Model:
323342

324-
| Attribute | DependsOn | Valid Values | Required | columnType |
325-
|---|---|---|---|---|
326-
| Patient | "Diagnosis" | | | |
327-
| Diagnosis | | "Healthy, Cancer" | True | string |
328-
| Cancer | "Cancer Type, Family History" | | | |
329-
| Cancer Type | | "Brain, Lung, Skin" | True | string |
330-
| Family History | | "Brain, Lung, Skin" | True | string_list |
343+
| Attribute | DependsOn | Valid Values | Required | columnType | Parent |
344+
|----------------|-------------------------------|---------------------|----------|-------------|----------|
345+
| Patient | "Diagnosis" | | | | DataType |
346+
| Diagnosis | | "Healthy, Cancer" | True | string | |
347+
| Cancer | "Cancer Type, Family History" | | | | |
348+
| Cancer Type | | "Brain, Lung, Skin" | True | string | |
349+
| Family History | | "Brain, Lung, Skin" | True | string_list | |
331350

332351
To demonstrate this, see the above example with the `Patient` and `Cancer` data types:
333352

353+
- `Patient` is a data type, but `Cancer` is not, as defined by the `Parent` column.
334354
- `Diagnosis` is an attribute of `Patient`.
335355
- `Diagnosis` has `Valid Values` of `Healthy` and `Cancer`.
336356
- `Cancer` is also a data type.
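
The conditional logic in this example can be sketched as a small lookup. This is one reading of the table above, not the generated JSON Schema or the library's algorithm: it assumes that choosing a valid value which is also an `Attribute` row (here `Cancer`) pulls in that row's `DependsOn` attributes. The `MODEL` dict and `relevant_attributes` function are hypothetical.

```python
# Hand-written copy of the DependsOn and Valid Values columns from the table above.
MODEL = {
    "Patient": {"depends_on": ["Diagnosis"], "valid_values": []},
    "Diagnosis": {"depends_on": [], "valid_values": ["Healthy", "Cancer"]},
    "Cancer": {"depends_on": ["Cancer Type", "Family History"], "valid_values": []},
    "Cancer Type": {"depends_on": [], "valid_values": ["Brain", "Lung", "Skin"]},
    "Family History": {"depends_on": [], "valid_values": ["Brain", "Lung", "Skin"]},
}


def relevant_attributes(data_type: str, record: dict) -> list[str]:
    """List the attributes that apply to `record`, following conditional rows."""
    relevant = []
    pending = list(MODEL[data_type]["depends_on"])
    while pending:
        attribute = pending.pop(0)
        if attribute in relevant:
            continue
        relevant.append(attribute)
        chosen = record.get(attribute)
        # Assumption: a chosen valid value that is itself a row in the model
        # contributes that row's DependsOn attributes.
        if (
            isinstance(chosen, str)
            and chosen in MODEL[attribute]["valid_values"]
            and chosen in MODEL
        ):
            pending.extend(MODEL[chosen]["depends_on"])
    return relevant


print(relevant_attributes("Patient", {"Diagnosis": "Healthy"}))
print(relevant_attributes("Patient", {"Diagnosis": "Cancer"}))
```

Under these assumptions a healthy patient only needs `Diagnosis`, while a patient diagnosed with cancer also needs `Cancer Type` and `Family History`.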

synapseclient/extensions/curator/schema_generation.py

Lines changed: 3 additions & 3 deletions
@@ -2955,7 +2955,7 @@ def define_data_model_relationships(self) -> dict:
29552955
"edge_dir": "out",
29562956
"type": list,
29572957
"edge_rel": True,
2958-
"required_header": True,
2958+
"required_header": False,
29592959
},
29602960
"required": {
29612961
"jsonld_key": "sms:required",
@@ -3004,7 +3004,7 @@ def define_data_model_relationships(self) -> dict:
30043004
"edge_dir": "in",
30053005
"type": list,
30063006
"edge_rel": True,
3007-
"required_header": True,
3007+
"required_header": False,
30083008
},
30093009
"isPartOf": {
30103010
"jsonld_key": "schema:isPartOf",
@@ -3023,7 +3023,7 @@ def define_data_model_relationships(self) -> dict:
30233023
"node_label": "uri",
30243024
"type": str,
30253025
"edge_rel": False,
3026-
"required_header": True,
3026+
"required_header": False,
30273027
"node_attr_dict": {
30283028
"default": get_label_from_display_name,
30293029
"standard": get_label_from_display_name,
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
Attribute,Description,Valid Values,DependsOn,Required,Parent,Validation Rules
2+
datatype,,,attribute,,DataType,
3+
attribute,,,,TRUE,DataProperty,
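
The three `required_header` flags flipped to `False` in this commit appear to correspond to the three headers dropped from the unit test (`DependsOn Component`, `Properties`, `Source`), which leaves exactly the header row of the minimal model above. The sketch below illustrates that relationship with a simplified stand-in for the relationships dict: keying it by CSV header is an assumption made for illustration, and `required_headers` is a hypothetical helper rather than a method of `DataModelRelationships`.

```python
# Simplified stand-in: the real dict returned by define_data_model_relationships()
# carries more fields per entry; only "required_header" is taken from the diff.
RELATIONSHIPS = {
    "Attribute": {"required_header": True},
    "Description": {"required_header": True},
    "Valid Values": {"required_header": True},
    "DependsOn": {"required_header": True},
    "DependsOn Component": {"required_header": False},  # no longer required
    "Required": {"required_header": True},
    "Parent": {"required_header": True},
    "Validation Rules": {"required_header": True},
    "Properties": {"required_header": False},  # no longer required
    "Source": {"required_header": False},  # no longer required
}


def required_headers(relationships: dict) -> list[str]:
    """Collect the headers whose entry is flagged as required, in dict order."""
    return [
        header
        for header, relationship in relationships.items()
        if relationship.get("required_header")
    ]


# Header row of the minimal model added in this commit.
MINIMAL_HEADER = "Attribute,Description,Valid Values,DependsOn,Required,Parent,Validation Rules"
print(required_headers(RELATIONSHIPS) == MINIMAL_HEADER.split(","))
```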

tests/unit/synapseclient/extensions/unit_test_curator.py

Lines changed: 37 additions & 0 deletions
@@ -1947,6 +1947,11 @@ def setUp(self):
19471947
"schema_files",
19481948
"data_models/example.model.csv",
19491949
)
1950+
self.minimal_test_schema_path = os.path.join(
1951+
os.path.dirname(__file__),
1952+
"schema_files",
1953+
"data_models/minimal_model.csv",
1954+
)
19501955

19511956
def test_generate_jsonschema_from_csv(self):
19521957
"""Test generate_jsonschema from CSV file."""
@@ -1980,6 +1985,38 @@ def test_generate_jsonschema_from_csv(self):
19801985
finally:
19811986
shutil.rmtree(temp_dir)
19821987

1988+
def test_generate_jsonschema_from_minimal_csv(self):
1989+
"""Test generate_jsonschema from a minimal CSV file."""
1990+
# GIVEN a CSV schema file
1991+
temp_dir = tempfile.mkdtemp()
1992+
try:
1993+
# WHEN I generate JSON schemas
1994+
schemas, file_paths = generate_jsonschema(
1995+
data_model_source=self.minimal_test_schema_path,
1996+
output=temp_dir,
1997+
data_types=None,
1998+
data_model_labels="class_label",
1999+
synapse_client=self.syn,
2000+
)
2001+
2002+
# THEN schemas should be generated
2003+
assert isinstance(schemas, list)
2004+
assert len(schemas) > 0
2005+
assert isinstance(file_paths, list)
2006+
assert len(file_paths) == len(schemas)
2007+
2008+
# AND files should exist
2009+
for file_path in file_paths:
2010+
assert os.path.exists(file_path), f"Expected file at {file_path}"
2011+
2012+
# AND each schema should be valid JSON Schema
2013+
for schema in schemas:
2014+
assert isinstance(schema, dict)
2015+
assert "$schema" in schema
2016+
assert "properties" in schema
2017+
finally:
2018+
shutil.rmtree(temp_dir)
2019+
19832020
def test_generate_jsonschema_from_jsonld(self):
19842021
"""Test generate_jsonschema from JSONLD file."""
19852022
# GIVEN a JSONLD file (first generate it from CSV)

tests/unit/synapseclient/extensions/unit_test_data_model_relationships.py

Lines changed: 0 additions & 3 deletions
@@ -39,12 +39,9 @@ def test_define_required_csv_headers(self, dmr: DataModelRelationships):
3939
"Description",
4040
"Valid Values",
4141
"DependsOn",
42-
"DependsOn Component",
4342
"Required",
4443
"Parent",
4544
"Validation Rules",
46-
"Properties",
47-
"Source",
4845
]
4946

5047
@pytest.mark.parametrize("edge", [True, False], ids=["True", "False"])
