
Commit 734e150

Fix conditional enum filtering and sync workflow
This commit fixes the conditional enum filtering system to work within Synapse's limitations and consolidates the sync scripts.

## Problem

1. Conditional filtering used $defs/$refs, which Synapse doesn't support
2. Two sync scripts (sync_model_systems.py and sync_model_systems_enhanced.py) were confusing
3. The weekly workflow didn't regenerate JSON schemas after syncing data
4. The modules/Sample/generated/ folder had no documentation

## Solution

### 1. Replace sync_model_systems.py with the enhanced version

- Merged sync_model_systems_enhanced.py functionality into sync_model_systems.py
- Added antibody and genetic reagent syncing to the enhanced script
- Deleted the "enhanced" version to avoid confusion
- Updated the weekly workflow to use the standard name

### 2. Fix add_conditional_enum_filtering.py to inline enums

- Changed from using $refs pointing to $defs
- Now directly inlines enum values in if/then conditionals
- Reads from modules/Sample/generated/*.yaml files
- Creates conditionals like:
  ```
  if: {modelSystemType: "cell line", modelSpecies: "Homo sapiens", ...}
  then: {modelSystemName: {items: {enum: ["90-8", "ST88-14", ...]}}}
  ```
- No $defs section in output (Synapse-compatible)

### 3. Update weekly-model-system-sync.yml workflow

- Added a step to regenerate JSON schemas after syncing data
- Now runs add_conditional_enum_filtering.py + gen-json-schema-class.py
- Ensures schemas stay in sync with the latest cell lines/models
- Updated the PR description to mention schema regeneration

### 4. Document modules/Sample/generated/ folder

- Added README.md explaining its purpose and the build process
- Clarifies that these are source files, not runtime files
- Documents the cascading filter approach for staying under the 100-value limit

## Result

- ✅ Conditional filtering works without $defs (Synapse-compatible)
- ✅ Single sync script handles all resource types
- ✅ Weekly workflow keeps schemas synchronized
- ✅ Clear documentation for generated enum files

## Files Changed

- utils/sync_model_systems.py - Now the main sync script (was "enhanced")
- utils/sync_model_systems_enhanced.py - Deleted (merged into main)
- utils/add_conditional_enum_filtering.py - Inline enums instead of $defs
- .github/workflows/weekly-model-system-sync.yml - Add schema regeneration
- modules/Sample/generated/README.md - New documentation

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
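The Synapse-compatibility goal above (no `$ref` or `$defs` left in the output) can be checked mechanically before upload. A minimal sketch, assuming the schemas are plain JSON objects loaded into Python dicts; `find_unsupported_keys` and the enum name `ExampleFilteredEnum` are hypothetical and not part of this repository:

```python
from typing import Any, List

# Keys this commit stops emitting because Synapse does not resolve them.
UNSUPPORTED_KEYS = ("$ref", "$defs")


def find_unsupported_keys(node: Any, path: str = "#") -> List[str]:
    """Return a JSON-pointer-style path for every $ref/$defs key in a schema."""
    hits: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key in UNSUPPORTED_KEYS:
                hits.append(child)
            hits.extend(find_unsupported_keys(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_unsupported_keys(item, f"{path}/{index}"))
    return hits


ENUM = "ExampleFilteredEnum"  # hypothetical enum name, for illustration only

# Old output shape: the then-branch points into $defs.
old_style = {
    "allOf": [{"then": {"properties": {"modelSystemName": {"items": {"$ref": f"#/$defs/{ENUM}"}}}}}],
    "$defs": {ENUM: {"enum": ["90-8", "ST88-14"], "type": "string"}},
}
# New output shape: the enum values are inlined in the then-branch.
new_style = {
    "allOf": [{"then": {"properties": {"modelSystemName": {"items": {"enum": ["90-8", "ST88-14"], "type": "string"}}}}}],
}

print(find_unsupported_keys(old_style))
print(find_unsupported_keys(new_style))  # [] means no $ref/$defs keys remain
```

Walking the whole tree (rather than only checking for a top-level `$defs`) also catches a stray `$ref` nested inside an `allOf` branch.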
1 parent 62a946f commit 734e150


5 files changed: +500 / -967 lines


.github/workflows/weekly-model-system-sync.yml

Lines changed: 17 additions & 5 deletions
```diff
@@ -40,7 +40,7 @@ jobs:
 
       - name: Sync model system data with conditional filtering
         run: |
-          python utils/sync_model_systems_enhanced.py --synapse-id ${{ env.SYNAPSE_TOOLS_TABLE_ID }}
+          python utils/sync_model_systems.py --synapse-id ${{ env.SYNAPSE_TOOLS_TABLE_ID }}
 
       - name: Check for changes
         id: changes
@@ -61,10 +61,20 @@ jobs:
           git clone --depth 1 https://github.com/anngvu/retold.git
           pip install linkml==v1.8.1
           npm install -g json-dereference-cli
-
+          pip install jsonref # Required for schema generation
+
           # Rebuild the data model
           make NF.yaml
-          bb ./retold/retold as-jsonld --dir modules --out NF.jsonld
+          bb ./retold/retold as-jsonld --dir modules --out NF.jsonld
+
+      - name: Regenerate JSON schemas
+        if: steps.changes.outputs.changes == 'true'
+        run: |
+          # Add conditional filtering (inlines enum values into schemas)
+          python utils/add_conditional_enum_filtering.py
+
+          # Regenerate all JSON schemas (this will inline everything properly)
+          python utils/gen-json-schema-class.py --skip-validation
 
       - name: Create Pull Request
         if: steps.changes.outputs.changes == 'true'
@@ -89,15 +99,17 @@ jobs:
             - Updated genetic reagents in `modules/Experiment/GeneticReagent.yaml`
             - Added/updated `source` links to NF Tools Central detail pages
             - Rebuilt data model artifacts (`NF.jsonld`, `dist/NF.yaml`)
+            - Regenerated JSON schemas with updated enum values
 
             ### Files Modified:
-            - `modules/Sample/CellLineModel.yaml` (638 cell lines)
-            - `modules/Sample/AnimalModel.yaml` (123 animal models)
+            - `modules/Sample/CellLineModel.yaml` (cell lines)
+            - `modules/Sample/AnimalModel.yaml` (animal models)
             - `modules/Sample/generated/*.yaml` (29 filtered enum subsets)
             - `modules/Experiment/Antibody.yaml`
             - `modules/Experiment/GeneticReagent.yaml`
             - `NF.jsonld`
             - `dist/NF.yaml`
+            - `registered-json-schemas/*.json` (63 schemas)
 
             ### What This Includes:
             - ✅ Cell line and animal model names from NF Tools Central (syn51730943)
```
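The updated workflow now runs sync, model rebuild, conditional filtering, and schema generation in that order. A minimal local sketch of the same sequence, assuming it is run from the repository root with the `retold` checkout and tools from the install step already present; it omits the workflow's `if: steps.changes.outputs.changes == 'true'` gate, uses the table ID given in the README (syn51730943), and the helper names `pipeline` and `run` are hypothetical. It defaults to a dry run that only prints the commands:

```python
import shlex
import subprocess
from typing import List


def pipeline(synapse_id: str) -> List[List[str]]:
    """Commands in the order the workflow runs them (hypothetical helper)."""
    return [
        ["python", "utils/sync_model_systems.py", "--synapse-id", synapse_id],
        ["make", "NF.yaml"],
        ["bb", "./retold/retold", "as-jsonld", "--dir", "modules", "--out", "NF.jsonld"],
        ["python", "utils/add_conditional_enum_filtering.py"],
        ["python", "utils/gen-json-schema-class.py", "--skip-validation"],
    ]


def run(synapse_id: str, dry_run: bool = True) -> List[str]:
    """Print each step and, unless dry_run is set, execute it; stop at the first failure."""
    lines: List[str] = []
    for cmd in pipeline(synapse_id):
        line = shlex.join(cmd)
        lines.append(line)
        print(("[dry-run] " if dry_run else "$ ") + line)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return lines


if __name__ == "__main__":
    run("syn51730943")
```

Keeping the filtering step after the sync step matters because `add_conditional_enum_filtering.py` reads the `modules/Sample/generated/*.yaml` files that the sync writes.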

modules/Sample/generated/README.md

Lines changed: 63 additions & 0 deletions
```diff
@@ -0,0 +1,63 @@
+# Filtered Enum Subsets for Conditional Filtering
+
+This directory contains auto-generated filtered enum subsets used for conditional enum filtering in JSON schemas.
+
+## Purpose
+
+Synapse has a **100-value limit** for enum fields. The full `CellLineModel` enum has 638 values, which exceeds this limit. To work around this, we use **conditional filtering** with cascading dropdowns:
+
+```
+modelSystemType → modelSpecies → cellLineCategory → cellLineGeneticDisorder → modelSystemName
+```
+
+Based on the user's selections in the filter fields, the JSON schema shows only the relevant subset of model systems (always <100 values).
+
+## Files
+
+Each file contains a filtered subset of cell lines or animal models:
+
+- `CellLineHomosapiensCancercelllineNeurofibromatosistype1Enum.yaml` - 54 human NF1 cancer cell lines
+- `CellLineHomosapiensInducedpluripotentstemcellNeurofibromatosistype1Enum.yaml` - 32 human NF1 iPSCs
+- etc. (29 files total)
+
+## Build Process
+
+These files are **source files** for the build process, not runtime files:
+
+1. **Weekly Sync** (`weekly-model-system-sync.yml`)
+   - Runs `utils/sync_model_systems.py`
+   - Queries syn51730943 (NF Tools Central) for latest cell lines/models
+   - Generates these filtered enum YAML files
+   - Updates `CellLineModel.yaml`, `AnimalModel.yaml`, etc.
+
+2. **Schema Generation** (manual or in CI)
+   - Runs `utils/add_conditional_enum_filtering.py`
+   - Reads these YAML files
+   - Creates if/then conditional rules in JSON schemas
+   - **Inlines enum values directly** (no $refs - Synapse doesn't support them)
+
+3. **Upload to Synapse**
+   - Only the final `registered-json-schemas/*.json` files are uploaded
+   - These contain fully inlined enum values from this directory
+
+## Data Source
+
+- Table: **syn51730943** (NF Tools Central)
+- Columns: `resourceName`, `species`, `cellLineCategory`, `cellLineGeneticDisorder`
+- Maintained by the NFTC team
+
+## Maintenance
+
+These files are auto-generated weekly. **Do not edit manually.**
+
+To regenerate:
+```bash
+python utils/sync_model_systems.py --synapse-id syn51730943
+python utils/add_conditional_enum_filtering.py
+python utils/gen-json-schema-class.py
+```
+
+## See Also
+
+- [Conditional Enum Filtering Plan](../../../docs/conditional-enum-filtering-plan.md)
+- Issue #797 (100-value enum limit)
```
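The cascading approach in this README only works if every leaf combination of filter values stays below Synapse's 100-value limit. A minimal sketch of that check, assuming each table row is a dict keyed by the column names the README lists (`resourceName`, `species`, `cellLineCategory`, `cellLineGeneticDisorder`) with a single value per column; the helper names are hypothetical and the rows below are synthetic, not real table contents:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

ENUM_LIMIT = 100  # Synapse enum limit cited in the README

# Cascade order for cell lines below modelSystemType, using the table's column names.
FILTER_FIELDS = ("species", "cellLineCategory", "cellLineGeneticDisorder")

Key = Tuple[str, ...]


def group_cell_lines(rows: Iterable[Dict[str, str]]) -> Dict[Key, List[str]]:
    """Bucket resource names by their (species, category, disorder) combination."""
    groups: Dict[Key, List[str]] = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in FILTER_FIELDS)
        groups[key].append(row["resourceName"])
    return dict(groups)


def oversized(groups: Dict[Key, List[str]], limit: int = ENUM_LIMIT) -> List[Key]:
    """Return the filter combinations whose subset would still exceed the limit."""
    return [key for key, names in groups.items() if len(set(names)) > limit]


# Synthetic rows for illustration only.
rows = [
    {
        "resourceName": f"line-{i}",
        "species": "Homo sapiens",
        "cellLineCategory": "Cancer cell line" if i < 120 else "Induced pluripotent stem cell",
        "cellLineGeneticDisorder": "Neurofibromatosis type 1",
    }
    for i in range(150)
]

groups = group_cell_lines(rows)
for key, names in groups.items():
    print(key, len(names))
print("over limit:", oversized(groups))
```

A non-empty result would mean that combination needs a further filter field (or a split) before it can become one of the generated enum files.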

utils/add_conditional_enum_filtering.py

Lines changed: 26 additions & 25 deletions
```diff
@@ -221,12 +221,13 @@
 ]
 
 
-def create_conditional_rule(mapping: Dict[str, str]) -> Dict[str, Any]:
+def create_conditional_rule(mapping: Dict[str, str], enum_values: List[str]) -> Dict[str, Any]:
     """
     Create an if/then conditional rule for a filter combination.
 
     Args:
         mapping: Dict with filter field values and enum name
+        enum_values: List of enum values to inline directly (no $refs)
 
     Returns:
         Dict with if/then structure for allOf
@@ -252,11 +253,12 @@ def create_conditional_rule(mapping: Dict[str, str]) -> Dict[str, Any]:
         if_properties["cellLineGeneticDisorder"] = {"const": mapping["cellLineGeneticDisorder"]}
         required_fields.append("cellLineGeneticDisorder")
 
-    # Build the then condition - reference the filtered enum
+    # Build the then condition - inline enum values directly (Synapse doesn't support $refs/$defs)
     then_properties = {
         "modelSystemName": {
             "items": {
-                "$ref": f"#/$defs/{mapping['enum']}"
+                "enum": enum_values,
+                "type": "string"
             }
         }
     }
@@ -280,31 +282,18 @@ def add_conditionals_to_schema(schema: Dict[str, Any]) -> Dict[str, Any]:
         schema: The JSON schema dict
 
     Returns:
-        Modified schema with conditionals added
+        Modified schema with conditionals added (enum values inlined, no $defs)
     """
     # Check if this schema has modelSystemName field
     if "properties" not in schema or "modelSystemName" not in schema.get("properties", {}):
         return schema # No modelSystemName, skip
 
     print(f" Adding {len(CONDITIONAL_MAPPINGS)} conditional rules...")
 
-    # Create allOf rules
-    conditional_rules = [create_conditional_rule(mapping) for mapping in CONDITIONAL_MAPPINGS]
-
-    # Add to schema
-    if "allOf" in schema:
-        # Append to existing allOf
-        schema["allOf"].extend(conditional_rules)
-    else:
-        # Create new allOf
-        schema["allOf"] = conditional_rules
-
-    # Add the filtered enum definitions to $defs
-    if "$defs" not in schema:
-        schema["$defs"] = {}
-
-    # Load enum definitions from generated files
+    # Load enum values from generated files
     generated_dir = Path("modules/Sample/generated")
+    conditional_rules = []
+
     for mapping in CONDITIONAL_MAPPINGS:
         enum_name = mapping["enum"]
         enum_file = generated_dir / f"{enum_name}.yaml"
@@ -318,11 +307,23 @@ def add_conditionals_to_schema(schema: Dict[str, Any]) -> Dict[str, Any]:
             if "enums" in enum_data and enum_name in enum_data["enums"]:
                 enum_values = list(enum_data["enums"][enum_name]["permissible_values"].keys())
 
-                # Add to $defs
-                schema["$defs"][enum_name] = {
-                    "enum": enum_values,
-                    "type": "string"
-                }
+                # Create conditional rule with inlined enum values
+                rule = create_conditional_rule(mapping, enum_values)
+                conditional_rules.append(rule)
+            else:
+                print(f" Warning: Could not find enum '{enum_name}' in {enum_file}")
+        else:
+            print(f" Warning: Enum file not found: {enum_file}")
+
+    # Add to schema
+    if "allOf" in schema:
+        # Append to existing allOf
+        schema["allOf"].extend(conditional_rules)
+    else:
+        # Create new allOf
+        schema["allOf"] = conditional_rules
+
+    print(f" ✓ Added {len(conditional_rules)} conditional rules with inlined enum values")
 
     return schema
```
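To see how a rule produced by `create_conditional_rule` constrains a record, here is a minimal sketch. It assumes the rule has the shape `{"if": {"properties": {field: {"const": ...}}, "required": [...]}, "then": {"properties": {"modelSystemName": {"items": {"enum": [...]}}}}}`, which the diff implies (`if_properties`, `required_fields`, `then_properties`) but does not show in full. `make_rule` is a hypothetical stand-in for the real function, and the evaluator handles only the `required`, `const`, and `enum` keywords these rules use. It is not a general JSON Schema validator; real validation would go through a library such as `jsonschema`.

```python
from typing import Any, Dict, List, Optional, Set


def make_rule(filters: Dict[str, str], enum_values: List[str]) -> Dict[str, Any]:
    """Hypothetical stand-in for create_conditional_rule(): const filters in `if`, inlined enum in `then`."""
    return {
        "if": {
            "properties": {field: {"const": value} for field, value in filters.items()},
            "required": list(filters),
        },
        "then": {
            "properties": {"modelSystemName": {"items": {"enum": enum_values, "type": "string"}}}
        },
    }


def if_matches(if_clause: Dict[str, Any], record: Dict[str, Any]) -> bool:
    """Evaluate only the `required` and `const` keywords used in these `if` clauses."""
    if any(field not in record for field in if_clause.get("required", [])):
        return False
    return all(
        record[field] == cond["const"]
        for field, cond in if_clause.get("properties", {}).items()
        if field in record and "const" in cond
    )


def allowed_names(rules: List[Dict[str, Any]], record: Dict[str, Any]) -> Optional[Set[str]]:
    """Intersect the enums of every rule whose `if` matches (allOf semantics); None means unconstrained."""
    allowed: Optional[Set[str]] = None
    for rule in rules:
        if if_matches(rule["if"], record):
            enum = set(rule["then"]["properties"]["modelSystemName"]["items"]["enum"])
            allowed = enum if allowed is None else allowed & enum
    return allowed


def is_valid(rules: List[Dict[str, Any]], record: Dict[str, Any]) -> bool:
    allowed = allowed_names(rules, record)
    return allowed is None or all(name in allowed for name in record.get("modelSystemName", []))


rules = [make_rule({"modelSystemType": "cell line", "modelSpecies": "Homo sapiens"}, ["90-8", "ST88-14"])]
ok = {"modelSystemType": "cell line", "modelSpecies": "Homo sapiens", "modelSystemName": ["90-8"]}
bad = {"modelSystemType": "cell line", "modelSpecies": "Homo sapiens", "modelSystemName": ["not-in-subset"]}
other = {"modelSystemType": "animal model", "modelSpecies": "Mus musculus", "modelSystemName": ["anything"]}

print(is_valid(rules, ok), is_valid(rules, bad), is_valid(rules, other))
```

When an `if` does not match, its `then` is simply not applied, so records outside every filter combination are left unconstrained by these rules. That is why the sketch returns `None` rather than an empty set in that case.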
