@Bankso Bankso commented Sep 9, 2025

  • generate_duo_schema.py builds JSON schemas from a pre-defined CSV specification; the resulting schemas allow access requirements to be conditionally activated. The CSV spec is described here.

Help text for current script version (click details):

usage: generate_duo_schema.py [-h] [-t TITLE] [-v VERSION] [-o ORG_ID] [-a ACCESS_REQUIREMENT] [-g GRANT_ID] [-m]
                              [-gc GRANT_COL] [-s STUDY_ID] [-sc STUDY_COL] [-d DATA_TYPE] [-dc DATA_COL]
                              [-p SPECIES_TYPE] [-pc SPECIES_COL]
                              csv_path output_path

  Generate Access Requirement JSON Schema from Data Dictionary CSV

  positional arguments:
    csv_path              Path to the data_dictionary.csv. See an example at https://github.com/Sage-Bionetworks/governanceDUO/blob/main/access_requirement_JSON/README.md
    output_path           Path to output directory for the JSON schema

  options:
  -h, --help            show this help message and exit
  -t TITLE, --title TITLE
                        Schema title
  -v VERSION, --version VERSION
                        Schema version
  -o ORG_ID, --org_id ORG_ID
                        Organization ID for $id field
  -a ACCESS_REQUIREMENT, --access_requirement ACCESS_REQUIREMENT
                        Access requirement ID to select conditions for from reference table. If nothing is provided, the
                        JSON schema will include all applicable conditions listed in the input table.
  -g GRANT_ID, --grant_id GRANT_ID
                        Grant number to select conditions for from reference table. If nothing is provided, the JSON
                        schema will include all conditions listed in the input table.
  -m, --multi_condition
                        Boolean. Generate schema with multiple conditions defined in the CSV
  -gc GRANT_COL, --grant_col GRANT_COL
                        Name of the column in the DCC AR data dictionary that will contain the identifier for the grant
  -s STUDY_ID, --study_id STUDY_ID
                        Study ID to select conditions for from reference table. If nothing is provided, the JSON schema
                        will include all applicable studies listed in the input table.
  -sc STUDY_COL, --study_col STUDY_COL
                        Name of the column in the DCC AR data dictionary that will contain the identifier for the study
  -d DATA_TYPE, --data_type DATA_TYPE
                        Data type to select conditions for from reference table. If nothing is provided, the JSON schema
                        will include all applicable data types listed in the input table.
  -dc DATA_COL, --data_col DATA_COL
                        Name of the column in the DCC AR data dictionary that will contain the identifier for the data
                        type
  -p SPECIES_TYPE, --species_type SPECIES_TYPE
                        Species to select conditions for from reference table. If nothing is provided, the JSON schema
                        will include all applicable species listed in the input table.
  -pc SPECIES_COL, --species_col SPECIES_COL
                        Name of the column in the DCC AR data dictionary that will contain the identifier for the
                        species
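To illustrate the idea behind the default (non `-m`) mode, where the "if-then" statement is controlled only by the `dataUseModifiers` annotation, here is a minimal sketch. It is not the code in generate_duo_schema.py: the CSV column names, the sample rows, the `_accessRequirementIds` key, and the function name `build_conditional_schema` are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical data dictionary rows; the real column names come from the CSV spec.
SAMPLE_CSV = """dataUseModifiers,accessRequirementId
Institution-specific restriction,9999991
Collaboration required,9999992
"""


def build_conditional_schema(csv_text, title="Example AR schema"):
    """Return a JSON schema with one if/then rule per CSV row (illustrative only)."""
    rules = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rules.append({
            # The rule fires when the annotation array contains this modifier.
            "if": {
                "properties": {
                    "dataUseModifiers": {
                        "contains": {"const": row["dataUseModifiers"]}
                    }
                },
                "required": ["dataUseModifiers"],
            },
            # Assumed target key; how the real script references the AR may differ.
            "then": {
                "properties": {
                    "_accessRequirementIds": {
                        "contains": {"const": int(row["accessRequirementId"])}
                    }
                }
            },
        })
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": title,
        "type": "object",
        "allOf": rules,
    }
```

Calling `build_conditional_schema(SAMPLE_CSV)` returns a dictionary that can be written out with `json.dump`. With `-m`, the real script adds further conditions taken from other CSV columns, which this sketch does not attempt to reproduce.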

  • Updated synapse_json_schema_bind.py to generalize the input format and support registering and/or binding non-AR JSON schemas. Note the JSON schema file naming conventions defined on lines 136 and 137 (these can be updated later, if needed):
Non-AR schema example: mc2.DatasetView-v1.0.0-schema.json
AR schema example: MC2.AccessRequirement-CA000001-v3.0.2-schema.json
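
As a rough illustration of how file names following this convention could be split into their parts, here is a sketch. The actual parsing in synapse_json_schema_bind.py is not shown in this description, so the function name `parse_schema_filename`, the returned keys, and the assumption that identifiers never contain a hyphen are all hypothetical.

```python
def parse_schema_filename(path_or_url):
    """Split '<org>.<name>[-<id>...]-v<version>-schema.json' into its parts."""
    # Works for both local paths and URLs that use '/' as the separator.
    filename = path_or_url.rsplit("/", 1)[-1]
    parts = filename.split("-")
    if len(parts) < 3 or parts[-1] != "schema.json":
        raise ValueError(f"Unexpected schema file name: {filename}")

    org, _, schema_name = parts[0].partition(".")
    # The version is the second-to-last component, e.g. 'v3.0.2'.
    version = parts[-2]
    if version.startswith("v"):
        version = version[1:]

    return {
        "org": org,
        "name": schema_name,
        # Anything between the name and the version (grant, study, etc.).
        "identifiers": parts[1:-2],
        "version": version,
    }
```

For the AR example above, this sketch would yield the organization `MC2`, the name `AccessRequirement`, the identifier list `["CA000001"]`, and the version `3.0.2`.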

Help text for current script version (click details):

usage: synapse_json_schema_bind.py [-h] [-t T] [-l L] [-p P] [-n N] [-ar] [--no_bind]

options:
  -h, --help  show this help message and exit
  -t T        Synapse Id of an entity to which a schema will be bound.
  -l L        The URL for the JSON schema to be bound to the requested entity.
  -p P        The file path for the JSON schema to be bound to the requested entity.
  -n N        The name of the organization with which the JSON schema should be associated. Default: 'Example
              Organization'.
  -ar         Indicates if the schema includes Access Requirement information.
  --no_bind   Indicates the schema should not be bound to the entity.
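
For readers who want to see how the flags above fit together, the following is a sketch of an argparse setup that would produce a similar interface. It is reconstructed from the help text only and is not copied from the script; the helper `should_bind` is a hypothetical name that reflects the behavior described in the commits (no binding without a target Synapse Id, and no binding when `--no_bind` is passed).

```python
import argparse


def get_args(argv=None):
    """Parse arguments; calling this first lets --help exit before any login."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-t", help="Synapse Id of an entity to which a schema will be bound.")
    parser.add_argument("-l", help="The URL for the JSON schema to be bound.")
    parser.add_argument("-p", help="The file path for the JSON schema to be bound.")
    parser.add_argument("-n", default="Example Organization",
                        help="Organization name to associate with the schema.")
    parser.add_argument("-ar", action="store_true",
                        help="The schema includes Access Requirement information.")
    parser.add_argument("--no_bind", action="store_true",
                        help="Do not bind the schema to the entity.")
    return parser.parse_args(argv)


def should_bind(args):
    """Bind only when a target Synapse Id is given and --no_bind is absent."""
    return args.t is not None and not args.no_bind
```

Under these assumptions, `get_args(["-p", "schema.json", "--no_bind"])` leaves the organization at its default and `should_bind` returns `False`, so the schema would be registered for reference without being bound.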

Bankso added 30 commits May 19, 2025 11:45
Initial script version, created using ChatGPT
Adjust annotation names and argparse arguments
add additional condition, based on column "Activated_By_Attribute" in source AR data dictionary
Add attribute typing and adjust auto-generated schema name
- simplify input arguments to take either a path or url that points to a JSON schema
- update expected naming convention for JSON
- update path/url parsing to accept AR and non-AR schemas
Modify script to automatically generate additional conditions based on columns in data dictionary CSV, provided they are not considered a "base condition", as defined on line 60
Don't require org name, since it has a default
Passing -m at runtime will generate a JSON schema with additional conditions beyond dataUseModifiers, as defined in the AR data dictionary CSV. If -m is not given, then JSON schema "if-then" statement will only be controlled by the value of annotation "dataUseModifiers"
Added inputs for study id and study col, to designate an additional Id by which to filter ARs
Adjusted existing inputs to have dedicated grant id and grant col
Added logic to filter by study id if it is provided
Added additional identifiers to schema Ids and output file names: study id and "mc" to indicate a multi-component schema
Make version the second to last argument, to simplify parsing
Adjusted slicing/list position references when parsing input URL/file path to ensure all info is accurately captured
Add option to identify that a schema has AR-related information integrated, which will ensure "enableDerivedAnnotations" is used when binding the JSON
Select ARs based on data type. Add data type designation to file name and schema id if provided.
Add option to provide species when selecting conditions from input table. Add filtering conditions and integrate into file name + schema id
Supports option to not bind schema
Implement option to not bind the schema; this is useful if the schema will not be used directly, but will be referenced by other JSON schemas used in Synapse. Note that the URL and unique schema id will be printed to the terminal for reference.
Improves naming convention clarity
When using --help flag, placing get_args first ensures the help message is printed and the script is stopped before logging into synapse
Ensure schema bind functions are not run if no target synId is provided
Update print functions to improve readability
Add script that modifies JSON schemas from schematic to include a "contains" label for attributes with enums in conditionals. This is to allow multiple conditions to be applied, based on a single input array
Changed the output filename separator from '.' to '-' for consistency. Also fixed minor formatting in the dictionary assignment for better readability.
Addresses bug `TypeError: expected string or bytes-like object`
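
The commit above that adds a "contains" label to enum-based conditionals can be pictured with the sketch below. It is an assumption about the shape of the transformation, not the script from this PR: the function names are hypothetical, and the sketch replaces the whole property rule inside each "if" block, dropping any other keywords that rule carried.

```python
import copy


def wrap_conditional_enums(schema):
    """Return a copy of the schema with enums inside 'if' blocks wrapped in 'contains'."""
    result = copy.deepcopy(schema)
    _walk(result)
    return result


def _walk(node):
    if isinstance(node, dict):
        condition = node.get("if")
        if isinstance(condition, dict):
            properties = condition.get("properties", {})
            for name, rule in list(properties.items()):
                if isinstance(rule, dict) and "enum" in rule:
                    # An array annotation now matches if any item is in the enum.
                    properties[name] = {"contains": {"enum": rule["enum"]}}
        for value in node.values():
            _walk(value)
    elif isinstance(node, list):
        for item in node:
            _walk(item)
```

With this change, a single array-valued annotation can satisfy several "if" blocks at once, which is the stated goal of allowing multiple conditions to apply based on one input array.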