Add script to build access requirement JSON schemas and support registering + binding in Synapse #119

Bankso · 2025-09-09T21:58:15Z

`generate_duo_schema.py` builds JSON schemas from a pre-defined CSV specification; the resulting schemas allow access requirements to be activated conditionally. The CSV spec is described here.

Help text for the current script version:

usage: generate_duo_schema.py [-h] [-t TITLE] [-v VERSION] [-o ORG_ID] [-a ACCESS_REQUIREMENT] [-g GRANT_ID] [-m]
                              [-gc GRANT_COL] [-s STUDY_ID] [-sc STUDY_COL] [-d DATA_TYPE] [-dc DATA_COL]
                              [-p SPECIES_TYPE] [-pc SPECIES_COL]
                              csv_path output_path

  Generate Access Requirement JSON Schema from Data Dictionary CSV

  positional arguments:
    csv_path              Path to the data_dictionary.csv. See and example at https://github.com/Sage-Bionetworks/governanceDUO/blob/main/access_requirement_JSON/README.md
    output_path           Path to output directory for the JSON schema

  options:
  -h, --help            show this help message and exit
  -t TITLE, --title TITLE
                        Schema title
  -v VERSION, --version VERSION
                        Schema version
  -o ORG_ID, --org_id ORG_ID
                        Organization ID for $id field
  -a ACCESS_REQUIREMENT, --access_requirement ACCESS_REQUIREMENT
                        Access requirement ID to select conditions for from reference table. If nothing is provided, the
                        JSON schema will include all applicable conditions listed in the input table.
  -g GRANT_ID, --grant_id GRANT_ID
                        Grant number to select conditions for from reference table. If nothing is provided, the JSON
                        schema will include all conditions listed in the input table.
  -m, --multi_condition
                        Boolean. Generate schema with multiple conditions defined in the CSV
  -gc GRANT_COL, --grant_col GRANT_COL
                        Name of the column in the DCC AR data dictionary that will contain the identifier for the grant
  -s STUDY_ID, --study_id STUDY_ID
                        Study ID to select conditions for from reference table. If nothing is provided, the JSON schema
                        will include all applicable studies listed in the input table.
  -sc STUDY_COL, --study_col STUDY_COL
                        Name of the column in the DCC AR data dictionary that will contain the identifier for the study
  -d DATA_TYPE, --data_type DATA_TYPE
                        Data type to select conditions for from reference table. If nothing is provided, the JSON schema
                        will include all applicable data types listed in the input table.
  -dc DATA_COL, --data_col DATA_COL
                        Name of the column in the DCC AR data dictionary that will contain the identifier for the data
                        type
  -p SPECIES_TYPE, --species_type SPECIES_TYPE
                        Species to select conditions for from reference table. If nothing is provided, the JSON schema
                        will include all applicable species listed in the input table.
  -pc SPECIES_COL, --species_col SPECIES_COL
                        Name of the column in the DCC AR data dictionary that will contain the identifier for the
                        species

Updated `synapse_json_schema_bind.py` to generalize the input format and to support registering and/or binding non-AR JSON schemas. Note the JSON schema file naming conventions defined on lines 136 and 137 (these can be updated later, if needed):

Non-AR schema example: mc2.DatasetView-v1.0.0-schema.json
AR schema example: MC2.AccessRequirement-CA000001-v3.0.2-schema.json
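
To illustrate how the two example names above could be taken apart, here is a minimal sketch. It is not the parsing code from `synapse_json_schema_bind.py`; the helper name `parse_schema_name` and the returned keys are hypothetical, and the sketch assumes that `-` separates the fields, that the version is the second-to-last field, and that the first field has the form `<org>.<schema name>`.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def parse_schema_name(path_or_url: str) -> dict:
    """Split a schema file name into its parts (hypothetical helper)."""
    # Use the final path component, whether the input is a URL or a POSIX-style path.
    file_name = PurePosixPath(urlparse(path_or_url).path).name
    parts = file_name.split("-")
    if len(parts) < 3 or parts[-1] != "schema.json":
        raise ValueError(f"Unexpected schema file name: {file_name}")
    # Assumption: the first field is "<org>.<schema name>".
    org, _, schema_name = parts[0].partition(".")
    return {
        "org": org,
        "name": schema_name,
        # Assumption: anything between the first field and the version is an
        # extra identifier, such as the grant number in the AR example.
        "identifiers": parts[1:-2],
        # Assumption: the version is always the second-to-last field.
        "version": parts[-2].lstrip("v"),
    }


non_ar = parse_schema_name("mc2.DatasetView-v1.0.0-schema.json")
ar = parse_schema_name(
    "https://example.org/schemas/MC2.AccessRequirement-CA000001-v3.0.2-schema.json"
)
```

Keeping the version in a fixed position relative to the end of the name means extra identifiers can be added in the middle without changing how the version is found.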

Help text for the current script version:

usage: synapse_json_schema_bind.py [-h] [-t T] [-l L] [-p P] [-n N] [-ar] [--no_bind]

options:
  -h, --help  show this help message and exit
  -t T        Synapse Id of an entity to which a schema will be bound.
  -l L        The URL for the JSON schema to be bound to the requested entity.
  -p P        The file path for the JSON schema to be bound to the requested entity.
  -n N        The name of the organization with which the JSON schema should be associated. Default: 'Example
              Organization'.
  -ar         Indicates if the schema includes Access Requirement information.
  --no_bind   Indicates the schema should not be bound to the entity.
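
The flags in the help text above can be reproduced with a small `argparse` parser. The sketch below is an approximation, not the script itself: `get_args` is named in the commit notes, but its body here is a guess, and `plan_actions` is a hypothetical helper that only reports which steps would run. It assumes registration always happens and that binding is skipped when `--no_bind` is given or no target Synapse Id is provided.

```python
import argparse


def get_args(argv=None):
    """Parse command-line arguments before any Synapse login is attempted."""
    parser = argparse.ArgumentParser(prog="synapse_json_schema_bind.py")
    parser.add_argument("-t", help="Synapse Id of an entity to which a schema will be bound.")
    parser.add_argument("-l", help="The URL for the JSON schema to be bound.")
    parser.add_argument("-p", help="The file path for the JSON schema to be bound.")
    parser.add_argument("-n", default="Example Organization",
                        help="The name of the organization for the JSON schema.")
    parser.add_argument("-ar", action="store_true",
                        help="Indicates if the schema includes Access Requirement information.")
    parser.add_argument("--no_bind", action="store_true",
                        help="Indicates the schema should not be bound to the entity.")
    return parser.parse_args(argv)


def plan_actions(args):
    """Return the steps that would run (hypothetical helper, no Synapse calls)."""
    steps = ["register"]  # assumption: the schema is always registered
    if args.t and not args.no_bind:
        # Assumption: -ar switches on derived annotations when binding.
        steps.append("bind_with_derived_annotations" if args.ar else "bind")
    return steps


args = get_args(["-p", "MC2.AccessRequirement-CA000001-v3.0.2-schema.json", "-t", "syn123", "-ar"])
print(plan_actions(args))
```

Because `parse_args` exits on `--help`, calling `get_args` first means the help message is shown without any login taking place.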


Passing `-m` at runtime generates a JSON schema with additional conditions beyond "dataUseModifiers", as defined in the AR data dictionary CSV. If `-m` is not given, the JSON schema's "if-then" statement is controlled only by the value of the annotation "dataUseModifiers".
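
As an illustration of the difference `-m` makes, the sketch below builds one "if-then" rule with and without an extra condition. It is not the output of `generate_duo_schema.py`: the helper `build_rule`, the modifier value, the requirement Id, and the `studyKey` annotation are all placeholders, and the `then` clause assumes the schema sets the derived annotation `_accessRequirementIds`.

```python
import json


def build_rule(modifier, requirement_id, extra_conditions=None):
    """Build one if-then rule (hypothetical helper)."""
    # Base condition: the dataUseModifiers array contains the given modifier.
    conditions = {"dataUseModifiers": {"contains": {"const": modifier}}}
    # Additional conditions, as with -m: each annotation must equal a fixed value.
    for annotation, value in (extra_conditions or {}).items():
        conditions[annotation] = {"const": value}
    return {
        "if": {"properties": conditions, "required": sorted(conditions)},
        # Assumption: the requirement is applied through a derived annotation.
        "then": {
            "properties": {
                "_accessRequirementIds": {"contains": {"const": requirement_id}}
            }
        },
    }


single = build_rule("ExampleModifier", 1111111)
multi = build_rule("ExampleModifier", 1111111, {"studyKey": "StudyA"})
print(json.dumps(multi, indent=2))
```

Listing every conditioned annotation under `required` matters: without it, an entity that lacks the annotation would satisfy the `if` clause vacuously.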

- Added inputs for study id and study col, to designate an additional Id by which to filter ARs
- Adjusted existing inputs to have dedicated grant id and grant col
- Added logic to filter by study id if it is provided
- Added additional identifiers to schema Ids and output file names: study id and "mc" to indicate a multi-component schema
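
The filtering and naming behaviour described above can be sketched with the standard library alone. The column names, sample rows, and both helpers (`select_rows`, `output_name`) are hypothetical; the real script takes its column names from `--grant_col` and `--study_col`, and the order of identifiers in its file names may differ.

```python
import csv
import io

# Hypothetical excerpt of an AR data dictionary; all values are placeholders.
SAMPLE_CSV = """grantNumber,studyKey,accessRequirementId
CA000001,StudyA,1111111
CA000001,StudyB,2222222
CA000002,StudyA,3333333
"""


def select_rows(rows, filters):
    """Keep rows that match every filter whose value was actually provided."""
    active = {col: val for col, val in filters.items() if val is not None}
    return [row for row in rows
            if all(row.get(col) == val for col, val in active.items())]


def output_name(org, filters, multi_condition, version):
    """Join the provided identifiers into a file name (assumed field order)."""
    parts = [f"{org}.AccessRequirement"]
    parts += [val for val in filters.values() if val is not None]
    if multi_condition:
        parts.append("mc")  # marks a multi-component schema
    parts.append(f"v{version}")
    return "-".join(parts) + "-schema.json"


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_grant = select_rows(rows, {"grantNumber": "CA000001", "studyKey": None})
by_both = select_rows(rows, {"grantNumber": "CA000001", "studyKey": "StudyA"})
name = output_name("MC2", {"grantNumber": "CA000001", "studyKey": "StudyA"}, True, "1.0.0")
```

Treating a missing filter value as "no filter" matches the help text, where omitting an Id includes all applicable rows from the input table.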


Bankso added 30 commits May 19, 2025 11:45

Create generate_duo_schema.py

d3f8ead

Initial script version, created using ChatGPT

Update .gitignore

0767fb6

Update generate_duo_schema.py

d7ebc25

Adjust annotation names and argparse arguments; add an additional condition based on the column "Activated_By_Attribute" in the source AR data dictionary

Update generate_duo_schema.py

e02477d

Add attribute typing and adjust auto-generated schema name

Update synapse_json_schema_bind.py

bbdac94

- simplify input arguments to take either a path or url that points to a JSON schema
- update expected naming convention for JSON
- update path/url parsing to accept AR and non-AR schemas

Update generate_duo_schema.py

6bbf136

Modify script to automatically generate additional conditions based on columns in data dictionary CSV, provided they are not considered a "base condition", as defined on line 60

Update synapse_json_schema_bind.py

f952828

Don't require org name, since it has a default

Update generate_duo_schema.py

692c526

Adjust output file naming

43ccc8c

Make version the second to last argument, to simplify parsing

Account for version position in name

b832cf1

Adjusted slicing/list position references when parsing input URL/file path to ensure all info is accurately captured

Update synapse_json_schema_bind.py

04dbb00

Add option to identify that a schema has AR-related information integrated, which will ensure "enableDerivedAnnotations" is used when binding the JSON

Update synapse_json_schema_bind.py

121b4c8

Add flag to select based on data type

f932b7f

Select ARs based on data type. Add data type designation to file name and schema id if provided.

Add flag to select by species

27cd483

Add option to provide species when selecting conditions from input table. Add filtering conditions and integrate into file name + schema id

Make default org generic

644946f

Add no bind flag

66af239

Supports option to not bind schema

Add no bind logic and messaging

e97a945

Implement option to not bind the schema; this is useful if the schema will not be used directly, but will be referenced by other JSON schemas used in Synapse. Note that the URL and unique schema id will be printed to the terminal for reference.

Make default organization name generic

98bfafd

Add flag to select AR ID

31c16a6

Add AR ID to id field

dd3d9af

Update JSON description format and arg help

9de0769

Update doc strings

8f81dbf

Update .gitignore

c55a78e

Update example CSV URL

11da96e

Make default org tag DCC

63ff8fe

Improves naming convention clarity

Get args before synapse login

01b3a0e

When using --help flag, placing get_args first ensures the help message is printed and the script is stopped before logging into synapse

Merge branch 'main' into add-ARjson-build-script

8c4a2e4

Update input handling and printing

74a364e

Ensure schema bind functions are not run if no target synId is provided. Update print functions to improve readability.

Bankso added 7 commits September 18, 2025 15:22

Remove newline from print argument

d6d2fc4

Create add_json_conditions.py

a03da96

Add script that modifies JSON schemas from schematic to include a "contains" label for attributes with enums in conditionals. This is to allow multiple conditions to be applied, based on a single input array

Fix output filename and formatting in JSON schema generator

eb5ef7a

Changed the output filename separator from '.' to '-' for consistency. Also fixed minor formatting in the dictionary assignment for better readability.

Load input as strings

ceb4e4b

Addresses bug `TypeError: expected string or bytes-like object`

Add tools from curator examples

76074e2

Create query_schema_registry.py

8b408c4

Add ref to AR-related schema if passed

3828493
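
The `add_json_conditions.py` commit in the list above describes wrapping enum constraints in a "contains" label so that a single input array can trigger several conditions. A minimal sketch of that transformation follows. It is not the script from this PR: `add_contains` is a hypothetical helper, the attribute names and values are placeholders, and it assumes the conditionals sit under a top-level `allOf` as `if`/`then` pairs.

```python
import copy


def add_contains(schema: dict) -> dict:
    """Return a copy where enum constraints in 'if' clauses use 'contains'."""
    updated = copy.deepcopy(schema)
    for rule in updated.get("allOf", []):
        props = rule.get("if", {}).get("properties", {})
        for name, constraint in list(props.items()):
            if "enum" in constraint and "contains" not in constraint:
                # An array value now matches if any of its items is in the enum.
                props[name] = {"contains": {"enum": constraint["enum"]}}
    return updated


# Hypothetical schema fragment with one conditional.
original = {
    "allOf": [
        {
            "if": {"properties": {"dataType": {"enum": ["genomic"]}},
                   "required": ["dataType"]},
            "then": {"required": ["assay"]},
        }
    ]
}
converted = add_contains(original)
```

With a plain `enum`, a value such as `["genomic", "imaging"]` cannot match because the whole array is compared against each enum entry; with `contains`, the rule fires when any item matches.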
