diff --git a/docs/source/asset_store.rst b/docs/source/asset_store.rst index 80cbd008b..cc54fd012 100644 --- a/docs/source/asset_store.rst +++ b/docs/source/asset_store.rst @@ -1,5 +1,6 @@ +############################ Setting up your asset store -=========================== +############################ .. note:: @@ -38,9 +39,9 @@ In each of these project setups, there are two ways you can lay out your data: ├── batch_1 <- annotated └── batch_2 <- annotated - +*************************************** Option 1: Distributed Synapse Projects --------------------------------------- +*************************************** Pick **option 1** if you answer "yes" to one or more of the following questions: @@ -50,7 +51,7 @@ Pick **option 1** if you answer "yes" to one or more of the following questions: - Are you not willing to annotate each DCC dataset folder with the annotation ``contentType:dataset``? Access & Project Setup - Multiple Contributing Projects -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +======================================================== 1. Create a DCC Admin Team with admin permissions. 2. Create a Team for each data contributing institution. Begin with a "Test Team" if all teams are not yet identified. @@ -70,9 +71,9 @@ Access & Project Setup - Multiple Contributing Projects distributed projects, just the ``contentType`` column to your fileview, and you will have to annotate your top level folders with ``contentType:dataset``. - +********************************** Option 2: Single Synapse Project --------------------------------- +********************************** Pick **option 2** if you don't select option 1 and you answer "yes" to any of these questions: @@ -84,7 +85,7 @@ If neither option fits, select option 1. Access & Project Setup - Single Contributing Project -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +======================================================= 1. Create a Team for each data contributing institution. 2. 
Create a single Synapse Project (e.g., MyDCC). @@ -115,7 +116,7 @@ Access & Project Setup - Single Contributing Project proliferation of folders per contributor and data type. Synapse External Cloud Buckets Setup ------------------------------------- +===================================== If DCC contributors require external cloud buckets, select one of the following configurations. For more information on how to set this up on Synapse, view this documentation: https://help.synapse.org/docs/Custom-Storage-Locations.2048327803.html diff --git a/docs/source/cli_reference.rst b/docs/source/cli_reference.rst index 83cda3e2c..9520d3f45 100644 --- a/docs/source/cli_reference.rst +++ b/docs/source/cli_reference.rst @@ -1,45 +1,47 @@ -============= +############# CLI Reference -============= +############# In this tool, the ``-d`` flag takes the Synapse ID of a folder under the Files tab that contains a manifest and data, also called a "Top Level Folder". Providing a ``dataset_id`` is optional, but if you want to pull existing annotations with the ``-a`` flag and the manifest is file-based, you must provide a ``dataset_id``. - +***************************************** Generate a new manifest as a Google Sheet ------------------------------------------ - +***************************************** .. code-block:: shell schematic manifest -c /path/to/config.yml get -dt -s +****************************************** Generate an existing manifest from Synapse ------------------------------------------- +****************************************** .. code-block:: shell schematic manifest -c /path/to/config.yml get -dt -d -s +***************************************** Validate a manifest -------------------- +***************************************** .. 
code-block:: shell schematic model -c /path/to/config.yml validate -dt -mp +***************************************** Submit a manifest as a file ---------------------------- +***************************************** .. code-block:: shell schematic model -c /path/to/config.yml submit -mp -d -vc -mrt file_only - +***************************************** In depth guide --------------- +***************************************** .. click:: schematic.__main__:main :prog: schematic diff --git a/docs/source/conf.py b/docs/source/conf.py index 6772850cb..ed63f1b61 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -44,6 +44,9 @@ # ones. extensions = ["sphinx_click", "sphinx_rtd_theme", "sphinx.ext.autosectionlabel"] +# Configure autosection label to prefix sections with document name. Requires referencing from directory index.rst is in. +autosectionlabel_prefix_document = True + # Add any paths that contain templates here, relative to this directory. templates_path = ["_templates"] diff --git a/docs/source/configuration.rst b/docs/source/configuration.rst index f8d458dbf..1edc8577d 100644 --- a/docs/source/configuration.rst +++ b/docs/source/configuration.rst @@ -1,7 +1,6 @@ -.. _configuration: - +################### Configure Schematic -=================== +################### This is an example config for Schematic. All listed values are those that are the default if a config is not used. Remove any fields in the config you don't want to change. If you remove all fields from a section, the entire section should be removed including the header. @@ -48,11 +47,12 @@ Change the values of any fields you do want to change. Please view the installa This document will go into detail what each of these configurations mean. +*********** Asset Store ------------ +*********** Synapse -~~~~~~~ +======== This describes where assets such as manifests are stored and the configurations of the asset store is described under the asset store section. 
@@ -60,23 +60,25 @@ under the asset store section. * config: Path to the synapse config file, either absolute or relative to this file. Note, if you use the `synapse config` command, you will have to provide the full path to the configuration file. * manifest_basename: Base name that manifest files will be saved as on Synapse. The Component will be appended to it, for example: `synapse_storage_manifest_biospecimen.csv` +********** Manifest --------- +********** This describes information about manifests as it relates to generation and validation. Note: some of these configurations can be overridden by the CLI commands. * manifest_folder: Location where manifests will be saved. This can be a relative or absolute path on your local machine. * title: Title or title prefix given to generated manifest(s). This is used to name the manifest file saved locally. * data_type: Data types of manifests to be generated or data type (singular) to validate manifest against. If you want all the available manifests, you can input "all manifests" - +****** Model ------ +****** Describes the location of your schema * location: This is the location of your schema jsonld; it must be a path relative to this file or an absolute path. Currently URLs are NOT supported, so you will have to download the jsonld data model. Here is an example: https://raw.githubusercontent.com/ncihtan/data-models/v24.9.1/HTAN.model.jsonld +************* Google Sheets -------------- +************* Schematic leverages the Google API to generate manifests. This section is for using Google Sheets with Schematic * service_acct_creds: Path to the Google service account creds, either absolute or relative to this file. This is the path to the service account credentials file that you download from Google Cloud Platform. diff --git a/docs/source/index.rst b/docs/source/index.rst index 975db9122..5ee86220a 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -5,8 +5,9 @@ .. 
_index: +###################################### Welcome to Schematic's documentation! -===================================== +###################################### .. warning:: This documentation site is a work in progress, and the sublinks may change. Apologies for the inconvenience. @@ -28,15 +29,16 @@ Schematic tackles these goals: :depth: 2 :local: +******************* Important Concepts ------------------- +******************* .. important:: Before reading more about Schematic, review this section, which covers essential concepts for using the Schematic tool effectively. Synapse FileViews -~~~~~~~~~~~~~~~~~ +================= Users are responsible for setting up a **FileView** that integrates with Schematic. Note that FileViews appear under the "Tables" tab in Synapse and can be named according to the project's needs. For instance, a FileView for **Project A** could have a different name than a FileView for **Project B**. For more information on Synapse projects, visit: @@ -45,17 +47,17 @@ For more information on Synapse projects, visit: - `Synapse annotations `_ Synapse Folders -~~~~~~~~~~~~~~~ +================ Folders in Synapse allow users to organize data within projects. More details on uploading and organizing data can be found at `Synapse folders `_ Synapse Datasets -~~~~~~~~~~~~~~~~ +================ This is an object in Synapse which appears under the "Dataset" tab and represents a user-defined collection of Synapse files and versions. https://help.synapse.org/docs/Datasets.2611281979.html JSON-LD -~~~~~~~ +======= JSON-LD is a lightweight Linked Data format. The usage of JSON-LD to capture our data models extends beyond the creation, validation, and submission of annotations/manifests into Synapse. It can create relationships between different data models and, in the future, drive @@ -64,7 +66,7 @@ and their relationships is also possible which allows the community to see the d connections between all the data uploaded into Synapse. 
Manifest -~~~~~~~~ +======== A manifest is a structured file that contains metadata about files under a "top level folder". The metadata includes information about the files, such as data type. @@ -72,14 +74,14 @@ The manifest can also be used to annotate the data on Synapse and create a file vie that enables the FAIR principles on each of the files in the "top level folder". Component/Data type -~~~~~~~~~~~~~~~~~~~ +=================== "component" and "data type" are used interchangeably. The component/data type is determined from the specified JSON-LD data model. If the string "component" exists in the depends on column, the "Attribute" value in that row is a data type. Examples of data types are "Biospecimen" and "Patient": https://github.com/Sage-Bionetworks/schematic/blob/develop/tests/data/example.model.csv#L3. Each data type/component should have a manifest template that has different columns. Project Data Layout -~~~~~~~~~~~~~~~~~~~ +=================== Regardless of data layout, the data in your Synapse Project(s) are uploaded into Synapse Folders to be curated and annotated by schematic. In both layouts listed below, the project administrators along with the data contributors may have preferences on how the @@ -93,19 +95,19 @@ different things under these two layouts. In both of these layouts, these are really just groupings of resources. - +******************* Schematic services ------------------- +******************* The following are the four main endpoints that assist with the high-level goals outlined above, with additional goals to come. Manifest Generation -~~~~~~~~~~~~~~~~~~~ +=================== Provides a manifest template for users for a particular project or data type. If a project with annotations already exists, a semi-filled-out template can be provided to the user. This ensures they do not start from scratch. If there are no existing annotations and manifests, an empty manifest template is provided. 
-Manifest Validation -~~~~~~~~~~~~~~~~~~~ +Validating a Manifest +===================== Given a filled-out manifest: @@ -116,7 +118,7 @@ Given a filled-out manifest: - Validation results are provided before the manifest file is uploaded into Synapse. Manifest Submission -~~~~~~~~~~~~~~~~~~~ +=================== Given a filled-out manifest, this will allow you to submit the manifest to the "top level folder". This validates the manifest and... @@ -130,13 +132,13 @@ This validates the manifest and... More validation documentation can be found here: https://sagebionetworks.jira.com/wiki/spaces/SCHEM/pages/3302785036/Schematic+Validation Data Model Visualization -~~~~~~~~~~~~~~~~~~~~~~~~ +======================== These endpoints allow you to visualize your data models and their relationships with each other. - +************** API reference -------------- +************** For the entire Python API reference documentation, you can visit the docs here: https://sage-bionetworks.github.io/schematic/ diff --git a/docs/source/installation.rst b/docs/source/installation.rst index 37254877b..edbddfb57 100644 --- a/docs/source/installation.rst +++ b/docs/source/installation.rst @@ -1,10 +1,10 @@ -.. _installation: - +############ Installation -============ +############ -Installation Requirements -------------------------- +************************* +Installation requirements +************************* - Your installed python version must be 3.9.0 ≤ version < 3.11.0 - You need to be a registered and certified user on `synapse.org `_ @@ -13,13 +13,14 @@ Installation Requirements To create Google Sheets files from Schematic, please follow our credential policy for Google credentials. You can find a detailed tutorial `Google Credentials Guide `_. If you're using ``config.yml``, make sure to specify the path to ``schematic_service_account_creds.json`` (see the ``google_sheets > service_account_creds`` section for more information). 
+***************************** Installation Guide For: Users ------------------------------ +***************************** The instructions below assume you have already installed `python `_, with the release version meeting the constraints set in the `Installation Requirements`_ section, and do not have a Python environment already active. 1. Verify your python version -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +============================== Ensure your python version meets the requirements from the `Installation Requirements`_ section using the following command: @@ -33,12 +34,12 @@ If your current Python version is not supported by Schematic, you can switch to You can double-check the current supported python version by opening up the `pyproject.toml `_ file in this repository and finding the supported versions of python in the script. 2. Set up your virtual environment -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +=================================== Once you are working with a python version supported by `schematic`, you will need to activate a virtual environment within which you can install the package. Below we will show how to create your virtual environment either with ``venv`` or with ``conda``. 2a. Set up your virtual environment with ``venv`` -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +-------------------------------------------------- Python 3 has built-in support for virtual environments with the ``venv`` module, so you no longer need to install ``virtualenv``: @@ -48,19 +49,19 @@ Python 3 has built-in support for virtual environments with the ``venv`` module, source .venv/bin/activate 2b. Set up your virtual environment with ``conda`` -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +---------------------------------------------------- ``conda`` is a powerful package and environment management tool that allows users to create isolated environments used particularly in data science and machine learning workflows. 
If you would like to manage your environments with ``conda``, continue reading: -1. **Download your preferred ``conda`` installer**: Begin by `installing conda `_. We personally recommend working with Miniconda, which is a lightweight installer for ``conda`` that includes only ``conda`` and its dependencies. -2. **Execute the ``conda`` installer**: Once you have downloaded your preferred installer, execute it using ``bash`` or ``zsh``, depending on the shell configured for your terminal environment. For example: +1. **Download your preferred conda installer**: Begin by `installing conda `_. We personally recommend working with Miniconda, which is a lightweight installer for ``conda`` that includes only ``conda`` and its dependencies. +2. **Execute the conda installer**: Once you have downloaded your preferred installer, execute it using ``bash`` or ``zsh``, depending on the shell configured for your terminal environment. For example: .. code-block:: shell bash Miniconda3-latest-MacOSX-arm64.sh -3. **Verify your ``conda`` setup**: Follow the prompts to complete your setup. Then verify your setup by running the ``conda`` command. -4. **Create your ``schematic`` environment**: Begin by creating a fresh ``conda`` environment for ``schematic`` like so: +3. **Verify your conda setup**: Follow the prompts to complete your setup. Then verify your setup by running the ``conda`` command. +4. **Create your schematic environment**: Begin by creating a fresh ``conda`` environment for ``schematic`` like so: .. code-block:: shell @@ -73,9 +74,9 @@ Python 3 has built-in support for virtual environments with the ``venv`` module, conda activate schematicpy 3. Install ``schematic`` dependencies -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +====================================== -Install the package using `pip `_: +Install the package using `pip `_: .. 
code-block:: shell @@ -88,7 +89,7 @@ If you run into ``ERROR: Failed building wheel for numpy``, the error might be a pip3 install --upgrade pip 4. Get your data model as a ``JSON-LD`` schema file -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +===================================================== Now you need a schema file, e.g. ``model.jsonld``, to have a data model that schematic can work with. While you can download a super basic `example data model `_, you'll probably be working with a DCC-specific data model. For non-Sage employees/contributors using the CLI, you might care only about the minimum needed artifact, which is the ``.jsonld``; locate and download only that from the right repo. @@ -98,7 +99,7 @@ Here are some example repos with schema files: - https://github.com/nf-osi/nf-metadata-dictionary/ 5. Obtain Google credential files -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +================================== Any function that interacts with a Google sheet (such as ``schematic manifest get``) requires Google Cloud credentials. @@ -121,7 +122,7 @@ Once you have obtained credentials, be sure that the json file generated is name .. _Set up configuration files: 6. Set up configuration files -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +============================== The following section will walk through setting up your configuration files with your credentials to allow for communication between ``schematic`` and the Synapse API. @@ -130,7 +131,7 @@ There are two main configuration files that need to be created and modified: - ``.synapseConfig`` - ``config.yml`` -**Create and modify the ``.synapseConfig``** +**Create and modify the .synapseConfig** The ``.synapseConfig`` file is what enables communication between ``schematic`` and the Synapse API using your credentials. You can automatically generate a ``.synapseConfig`` file by running the following in your command line and following the prompts. 
@@ -149,12 +150,13 @@ After following the prompts, a new ``.synapseConfig`` file and ``.synapseCache`` The ``.synapseConfig`` is used to log into Synapse if you are not using an environment variable (i.e. ``SYNAPSE_ACCESS_TOKEN``) for authentication, and the ``.synapseCache`` is where your assets are stored if you are not working with the CLI and/or you have specified ``.synapseCache`` as the location in which to store your manifests, in your ``config.yml``. -**Create and modify the ``config.yml``** +**Create and modify the config.yml** In this repository there is a ``config_example.yml`` file with default configurations to various components that are required before running ``schematic``, such as the Synapse ID of the main file view containing all your project assets, the +********************************** Installation Guide For: Developers ----------------------------------- +********************************** .. note:: This section is for people developing on Schematic only @@ -163,12 +165,13 @@ The instructions below assume you have already installed `python `_ so that we may track these changes. -Once you have finished setting up your development environment using the instructions below, please follow the guidelines in `CONTRIBUTION.md `_ during your development. +Once you have finished setting up your development environment using the instructions below, please follow the guidelines in `CONTRIBUTION.md `_ during your development. + +Please note we have a `code of conduct `_, please follow it in all your interactions with the project. -Please note we have a `code of conduct `_, please follow it in all your interactions with the project. 1. Clone the ``schematic`` package repository -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +============================================= For development, you will be working with the latest version of ``schematic`` on the repository to ensure compatibility between its latest state and your changes. 
Ensure your current working directory is where you would like to store your local fork before running the following command: @@ -176,8 +179,9 @@ For development, you will be working with the latest version of ``schematic`` on git clone https://github.com/Sage-Bionetworks/schematic.git + 2. Install ``poetry`` -~~~~~~~~~~~~~~~~~~~~~ +===================== Install ``poetry`` (version 1.3.0 or later) using either the `official installer `_ or ``pip``. If you have an older installation of Poetry, we recommend uninstalling it first. @@ -191,8 +195,9 @@ Check to make sure your version of poetry is > v1.3.0 poetry --version + 3. Start the virtual environment -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +================================= Change directory (``cd``) into your cloned ``schematic`` repository, and initialize the virtual environment using the following command with ``poetry``: @@ -206,8 +211,9 @@ To make sure your poetry version and python version are consistent with the vers poetry debug info + 4. Install ``schematic`` dependencies -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +====================================== Before you begin, make sure you are in the latest ``develop`` branch of the repository. @@ -223,7 +229,7 @@ This command will install: - Documentation dependencies such as ``sphinx`` for building and maintaining documentation. 5. Set up configuration files -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +================================= The following section will walk through setting up your configuration files with your credentials to allow for communication between ``schematic`` and the Synapse API. @@ -231,7 +237,7 @@ There are two main configuration files that need to be created and modified: - ``.synapseConfig`` - ``config.yml`` -**Create and modify the ``.synapseConfig``** +**Create and modify the .synapseConfig** The ``.synapseConfig`` file is what enables communication between ``schematic`` and the Synapse API using your credentials. 
You can automatically generate a ``.synapseConfig`` file by running the following in your command line and following the prompts. @@ -253,7 +259,7 @@ The ``.synapseConfig`` is used to log into Synapse if you are not using an envir .. important:: When developing on ``schematic``, keep your ``.synapseConfig`` in your current working directory to avoid authentication errors. -**Create and modify the ``config.yml``** +**Create and modify the config.yml** In this repository, there is a ``config_example.yml`` file with default configurations to various components required before running ``schematic``, such as the Synapse ID of the main file view containing all your project assets, the base name of your manifest files, etc. @@ -279,7 +285,7 @@ Once you've copied the file, modify its contents according to your use case. For ``config.yml`` is ignored by git. 6. Obtain Google credential files -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +================================== Any function that interacts with a Google Sheet (such as ``schematic manifest get``) requires Google Cloud credentials. @@ -305,6 +311,6 @@ Once you have obtained credentials, ensure that the JSON file generated is named 7. Verify your setup -~~~~~~~~~~~~~~~~~~~~ +===================== After running the steps above, your setup is complete, and you can test it in a ``python`` instance or by running a command based on the examples diff --git a/docs/source/jsonschema_generation.rst b/docs/source/jsonschema_generation.rst index 502304178..3aa3983bd 100644 --- a/docs/source/jsonschema_generation.rst +++ b/docs/source/jsonschema_generation.rst @@ -9,7 +9,7 @@ JSONSchema Components and How to Set from Data Model ==================================================== This document serves as a guide on what features in a CSV data model map to which components in a JSONSchema file. All examples of JSONSchema files were taken from this `example data model `_. 
-For documentation on how to generate a JSONSchema file, see the :ref:`cli documentation`. +For documentation on how to generate a JSONSchema file, see the :ref:`cli documentation`. Property Keys ------------- @@ -114,7 +114,7 @@ Type Checks ^^^^^^^^^^^^^^ Types discussed above are enforced in JSONSchema validation. -For more information about these rules see :ref:`the documentation for type rules`. +For more information about these rules see :ref:`the documentation for type rules`. Valid Values ^^^^^^^^^^^^^^^ @@ -160,7 +160,7 @@ An attribute with valid values specified along with the ``list`` rule:: "title": "Check List Enum" } -For more information about the ``list`` rule see :ref:`the rule documentation`. +For more information about the ``list`` rule see :ref:`the rule documentation`. Required Attributes ^^^^^^^^^^^^^^^^^^^^^ @@ -193,7 +193,7 @@ An attribute with an ``inRange`` validation rule:: "type": "number" } -For more information about the ``inRange`` rule see :ref:`the rule documentation`. +For more information about the ``inRange`` rule see :ref:`the rule documentation`. ``regex`` module """"""""""""""""""""" @@ -220,7 +220,7 @@ While an attribute with a ``regex`` rule ``regex match [a-f]`` specified will yi -For more information about the ``regex`` module rule see :ref:`the rule documentation`. +For more information about the ``regex`` module rule see :ref:`the rule documentation`. ``date`` @@ -237,7 +237,7 @@ An attribute with a ``date`` validation rule specified:: "title": "Check Date" } -For more information about the ``date`` rule see :ref:`the rule documentation`. +For more information about the ``date`` rule see :ref:`the rule documentation`. ``URL`` @@ -253,7 +253,7 @@ An attribute with a ``URL`` validation rule specified:: "title": "Check URL" } -For more information about the ``URL`` rule see :ref:`the rule documentation`. +For more information about the ``URL`` rule see :ref:`the rule documentation`. 
Conditional Dependencies diff --git a/docs/source/linkml.rst b/docs/source/linkml.rst index 7d7462842..878b0f766 100644 --- a/docs/source/linkml.rst +++ b/docs/source/linkml.rst @@ -1,14 +1,18 @@ -====== +###### LinkML -====== +###### + +********** Background -========== +********** -DPE is currently looking into what the future of Schematic might look like. This includes the possibility of completely reworking how we handle data models. Currently, Schematic supports data models in CSV or JSON-LD format. Several DCCs are either using or planning on using LinkML to create their data models and then port them to JsonLD for use in schematic. One possibility in Schematic 2.0 (placeholder name) is to make LinkML the format for data models and to use native LinkML functionality where possible to reduce the work that Schematic does. DPE is currently looking into what the future of Schematic might look like. This includes the possibility of completely reworking how we handle data models. Currently, Schematic supports data models in CSV or JSON-LD format. Several DCCs are either using or planning on using LinkML to create their data models and then port them to JsonLD for use in schematic. One possibility in Schematic 2.0 (placeholder name) is to make LinkML the format for data models and to use native LinkML functionality where possible to reduce the work that Schematic does. +DPE is currently looking into what the future of Schematic might look like. This includes the possibility of completely reworking how we handle data models. Currently, Schematic supports data models in CSV or JSON-LD format. Several DCCs are either using or planning on using LinkML to create their data models and then port them to JsonLD for use in schematic. One possibility in Schematic 2.0 (placeholder name) is to make LinkML the format for data models and to use native LinkML functionality where possible to reduce the work that Schematic does. 
+***** Links -===== +***** + LinkML `documentation `_ DPE performed a `comparison `_ between Schematics current functionality and what LinkML could provide via the CLI. This is currently restricted to Sage Bionetworks staff. diff --git a/docs/source/manifest_generation.rst b/docs/source/manifest_generation.rst index 560776d5e..9b00030aa 100644 --- a/docs/source/manifest_generation.rst +++ b/docs/source/manifest_generation.rst @@ -1,34 +1,39 @@ .. _manifest_generation: +#################### Generate a manifest -=================== +#################### A **manifest** is a structured file containing metadata that adheres to a specific data model. This page covers different ways to generate a manifest. +************* Prerequisites -------------- +************* **Before Using the Schematic CLI** +================================== - **Install and Configure Schematic**: Ensure you have installed `schematic` and set up its dependencies. See the :ref:`installation` section for more details. - **Understand Important Concepts**: - Understand Important Concepts: Familiarize yourself with key concepts outlined on the :ref:`index` of the documentation. + Understand Important Concepts: Familiarize yourself with key concepts outlined on the :ref:`Homepage ` of the documentation. - **Configuration File**: Learn more about each attribute in the configuration file by referring to the relevant documentation. **Using the Schematic API in Production** +========================================= Visit the **Schematic API (Production Environment)**: ``_ This will open the **Swagger UI**, where you can explore all available API endpoints. 
+**************** Run help command ----------------- +**************** You could run the following commands to learn about subcommands with manifest generation: @@ -42,12 +47,14 @@ You could also run the following commands to learn about all the options with ma schematic manifest --config path/to/config.yml get -h - +************************** Generate an empty manifest ---------------------------- +************************** + +.. _empty_manifest_gen_cli: Option 1: Use the CLI -~~~~~~~~~~~~~~~~~~~~~ +===================== You can generate a manifest by running the following command: @@ -71,15 +78,19 @@ And if you want to generate a manifest as a csv file, you could do: schematic manifest -c /path/to/config.yml get -dt --output-csv +.. _empty_manifest_gen_api: + Option 2: Use the API -~~~~~~~~~~~~~~~~~~~~~ +====================== 1. Visit the `manifest/generate endpoint `_. 2. Click "Try it out" to enable input fields. 3. Enter the following parameters and execute the request: - **schema_url**: The URL of your data model. - - If your data model is hosted on **GitHub**, the URL should follow this format: + + - If your data model is hosted on **GitHub**, the URL should follow this format: + - JSON-LD: `https://raw.githubusercontent.com//data-model.jsonld` - CSV: `https://raw.githubusercontent.com//data-model.csv` @@ -90,12 +101,14 @@ Option 2: Use the API This will generate a manifest directly from the API. - +********************************************** Generate a manifest using a dataset on synapse ----------------------------------------------- +********************************************** + +.. _synapse_data_manifest_gen_cli: Option 1: Use the CLI -~~~~~~~~~~~~~~~~~~~~~~ +===================== .. note:: @@ -138,9 +151,10 @@ Here you should use syn12345678 to generate a manifest - **-dt **: Defines the data type/schema model for the manifest (e.g., `"Patient"`, `"Biospecimen"`). 
- **-d **: Retrieves the existing manifest associated with a specific dataset on Synapse. +.. _synapse_data_manifest_gen_api: Option 2: Use the API -~~~~~~~~~~~~~~~~~~~~~~ +===================== To generate a manifest using the **Schematic API**, follow these steps: @@ -164,8 +178,9 @@ To generate a manifest using the **Schematic API**, follow these steps: - **asset_view**: The **Synapse ID of the fileview** containing the top-level dataset for which you want to generate a manifest. +******************************************************************** Generate a manifest using a dataset on synapse and pull annotations --------------------------------------------------------------------- +******************************************************************** .. note:: When you pull annotations from Synapse, the existing metadata (annotations) associated with files or folders in a Synapse dataset is automatically retrieved and pre-filled into the generated manifest. @@ -196,9 +211,10 @@ Generate a manifest using a dataset on synapse and pull annotations The generated manifest will include the above annotations pulled from Synapse when enabled. +.. _pull_annotations_manifest_gen_cli: Option 1: Use the CLI -~~~~~~~~~~~~~~~~~~~~~~ +===================== .. note:: @@ -219,8 +235,10 @@ The **top-level dataset** can be either an empty folder or a folder containing f - **-d **: Retrieves the existing manifest associated with a specific dataset on Synapse. +.. 
_pull_annotations_manifest_gen_api: + Option 2: Use the API -~~~~~~~~~~~~~~~~~~~~~~ +====================== To generate a manifest using the **Schematic API**, follow these steps: diff --git a/docs/source/manifest_submission.rst b/docs/source/manifest_submission.rst index 441f28119..2b851ec6d 100644 --- a/docs/source/manifest_submission.rst +++ b/docs/source/manifest_submission.rst @@ -1,38 +1,45 @@ +############################# Submit a manifest to Synapse -============================ +############################# +************* Prerequisites -------------- +************* **Obtain Synapse Credentials**: -Ensure you have a Synapse account and set up Synapse configuration file correctly. See the :ref:`installation` section for more details. +================================ +Ensure you have a Synapse account and set up Synapse configuration file correctly. See the :ref:`setting up configuration files ` section for more details. + + +**Using the Schematic API in Production** +========================================= + +Visit the **Schematic API (Production Environment)**: +``_ + +This will open the **Swagger UI**, where you can explore all available API endpoints. + **Before Using the Schematic CLI** +================================== - **Install and Configure Schematic**: - Ensure you have installed `schematic` and set up its dependencies. - See the :ref:`installation` section for more details. + Ensure you have installed ``schematic`` and set up its dependencies. + See the :ref:`installation:installation` section for more details. - **Understand Important Concepts**: Familiarize yourself with key concepts outlined on the :ref:`index` of the documentation. - **Configuration File**: - For more details on configuring Schematic, refer to the :ref:`configure schematic` section. + For more details on configuring Schematic, refer to the :ref:`configuration:Configure Schematic` section. 
- **Obtain a manifest**: - Please obtain a manifest by following the documentation of generating a manifest. - - -**Using the Schematic API in Production** - -Visit the **Schematic API (Production Environment)**: -``_ - -This will open the **Swagger UI**, where you can explore all available API endpoints. + Please obtain a manifest by following the documentation of :ref:`generating a manifest `. +**************** Run help command ----------------- +**************** You could run the following commands to learn about subcommands with manifest submission: @@ -46,9 +53,9 @@ You could also run the following commands to learn about all the options with ma schematic model --config path/to/config.yml submit -h - +********************************** Submit a Manifest File to Synapse ---------------------------------- +********************************** .. note:: @@ -87,9 +94,10 @@ Submit a Manifest File to Synapse Here is the top-level folder ID: syn12345678 +.. _submit_manifest_cli: Option 1: Use the CLI -~~~~~~~~~~~~~~~~~~~~~~ +===================== .. note:: @@ -109,8 +117,10 @@ Option 1: Use the CLI - **-tcn**: Table Column Names: This is optional, and the available options are "class_label", "display_label", and "display_name". The default is "class_label", but you can change it based on your requirements. +.. _submit_manifest_api: + Option 2: Use the API -~~~~~~~~~~~~~~~~~~~~~~ +====================== .. note:: @@ -143,18 +153,19 @@ Option 2: Use the API - **table_column_names**: This is optional. Available options are "class_label", "display_label", and "display_name". The default is "class_label". - +******************************************* Submit a Manifest file and Add Annotations -------------------------------------------- +******************************************* .. note:: Since annotations are enabled in the submission, if you are submitting a file-based manifest, you should see annotations attached to the entity IDs listed in the manifest. +.. 
_submit_manifest_add_annotations_cli: Option 1: Use the CLI -~~~~~~~~~~~~~~~~~~~~~~ +===================== .. note:: @@ -175,8 +186,10 @@ Option 1: Use the CLI - **-tcn**: Table Column Names: This is optional, and the available options are "class_label", "display_label", and "display_name". The default is "class_label", but you can change it based on your requirements. +.. _submit_manifest_add_annotations_api: + Option 2: Use the API -~~~~~~~~~~~~~~~~~~~~~~ +====================== .. note:: @@ -209,9 +222,9 @@ Option 2: Use the API - **table_column_names**: This is optional. Available options are "class_label", "display_label", and "display_name". The default is "class_label". - +************************************** Expedite submission process (Optional) ---------------------------------------- +************************************** If your asset view contains multiple projects, it might take some time for the submission to finish. @@ -219,9 +232,10 @@ You could expedite the submission process by specifying the project_scope parame To utilize this parameter, make sure that the projects listed there are part of the asset view. +.. _expedite_submission_cli: Option 1: Use the CLI -~~~~~~~~~~~~~~~~~~~~~~ +===================== .. code-block:: bash @@ -230,8 +244,10 @@ Option 1: Use the CLI - **-ps**: Specifies the project scope as a comma separated list of project IDs. +.. _expedite_submission_api: + Option 2: Use the API -~~~~~~~~~~~~~~~~~~~~~~ +====================== 1. Visit the `**model/submit** endpoint `_ 2. Click **"Try it out"** to enable input fields. @@ -261,22 +277,24 @@ Option 2: Use the API - **table_column_names**: This parameter is not applicable when uploading a manifest as a file. You can keep it as is and it will be ignored. 
+************************************* Enable upsert for manifest submission -------------------------------------- +************************************* By default, the CLI/API will replace the existing manifest and table with the new one. If you want to update the existing manifest and table, you could use the upsert option. -Pre-requisite -~~~~~~~~~~~~~~ +Pre-requisites +============== 1. Ensure that all your manifests, including both the initial manifests and those containing rows to be upserted, include a primary key: . For example, if your component name is "Patient", the primary key should be "Patient_id". 2. If you plan to use upsert in the future, select the upsert option during the initial table uploads. 3. Currently it is required to use -tcn "display_label" with table upserts. +.. _enable_upsert_cli: Option 1: Use the CLI -~~~~~~~~~~~~~~~~~~~~~~ +====================== .. code-block:: bash @@ -285,8 +303,11 @@ Option 1: Use the CLI - **-tm**: The default option is "replace". Change it to "upsert" for enabling upsert. - **-tcn**: Use display label for upsert. + +.. _enable_upsert_api: + Option 2: Use the API -~~~~~~~~~~~~~~~~~~~~~~ +====================== 1. Visit the `**model/submit** endpoint `_ 2. Click **"Try it out"** to enable input fields. diff --git a/docs/source/manifest_validation.rst b/docs/source/manifest_validation.rst index 7c1abc16f..bfe86059b 100644 --- a/docs/source/manifest_validation.rst +++ b/docs/source/manifest_validation.rst @@ -1,62 +1,74 @@ .. _Validating a Metadata Manifest: +############################## Validating a Metadata Manifest -================================================= +############################## +************* Prerequisites -------------- +************* **Obtain Synapse Credentials**: -Ensure you have a Synapse account and set up Synapse configuration file correctly. See the :ref:`installation` section for more details. 
+================================ +Ensure you have a Synapse account and set up Synapse configuration file correctly. See the :ref:`setting up configuration files ` section for more details. + + +**Using the Schematic API in Production** +========================================= + +Visit the **Schematic API (Production Environment)**: +``_ + +This will open the **Swagger UI**, where you can explore all available API endpoints. + **Before Using the Schematic CLI** +================================== - **Install and Configure Schematic**: - Ensure you have installed `schematic` and set up its dependencies. - See the :ref:`installation` section for more details. + Ensure you have installed ``schematic`` and set up its dependencies. + See the :ref:`installation:installation` section for more details. - **Understand Important Concepts**: Familiarize yourself with key concepts outlined on the :ref:`index` of the documentation. - **Configuration File**: - For more details on configuring Schematic, refer to the documentation on :ref:`creating a configuration file for schematic `. + For more details on configuring Schematic, refer to the :ref:`configuration:Configure Schematic` section. - **Obtain a manifest**: Please obtain a manifest by following the documentation of :ref:`generating a manifest `. -**Using the Schematic API in Production** - -Visit the **Schematic API (Production Environment)**: -``_ - -This will open the **Swagger UI**, where you can explore all available API endpoints. - - +************ Requirements -------------------------------------------------- +************ Authentication -~~~~~~~~~~~~~~~~~~~~ +============== + Authentication with Synapse is required for metadata validation that includes Cross Manifest Validation rules or the ``filenameExists`` rule. File Format -~~~~~~~~~~~~~~ +=========== + In general, metadata manifests must be stored as ``.CSV`` files. When validating through the api, manifests may alternatively be sent as a JSON string. 
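As a rough illustration of the two formats mentioned above, the following sketch reads a CSV manifest and serializes its rows to a JSON string using only the Python standard library. The row-oriented layout (a list of objects keyed by column header), the helper name ``manifest_csv_to_json``, and the sample columns are assumptions made for illustration; check the endpoint's parameter description for the exact shape the API expects.

```python
import csv
import io
import json


def manifest_csv_to_json(csv_text: str) -> str:
    """Serialize CSV manifest text to a JSON string.

    Assumption: one JSON object per manifest row, keyed by column header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps([dict(row) for row in reader])


# Hypothetical two-row manifest; the column names are illustrative only.
example_csv = "Component,PatientID\nPatient,P1\nPatient,P2\n"
manifest_json = manifest_csv_to_json(example_csv)
print(manifest_json)
```

All values are kept as strings here, since the CSV itself carries no type information.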
Required Column Headers -~~~~~~~~~~~~~~~~~~~~~~~~~ +======================= + A ``Component`` column that specifies the data type of the metadata must be present in the manifest. Additionally, columns must be present for each attribute in the component that you wish to validate. Restricted Column Headers -~~~~~~~~~~~~~~~~~~~~~~~~~~~ +========================= The columns ``Filename``, ``entityId``, and ``Component`` are reserved for use by schematic and should not be used as other attributes in a data model. - +******************* Manifest Validation -------------------------------------------------- +******************* + Overview -~~~~~~~~~ +======== + Invalidities within a manifest’s metadata are classified as either errors or warnings depending on the rule itself, whether the attribute is required, and what the data modeler has specified. Errors are considered serious invalidities that must be corrected before submission. Warnings are considered less serious invalidities that are acceptable. A manifest with errors should not be submitted and the presence of errors found during submission will block submission. The presence of warnings will not block submission. @@ -93,14 +105,15 @@ or by viewing the parameter descriptions under the endpoints linked above. With the CLI -~~~~~~~~~~~~~~~ +============= Authentication -^^^^^^^^^^^^^^^^ +-------------- + To authenticate for use with the CLI, follow the installation guide instructions on how to :ref:`set up configuration files ` Parameters -^^^^^^^^^^^^^^^ +---------- --manifest_path/-mp string @@ -150,14 +163,16 @@ The SynId of the fileview containing all relevant project assets should also be With the API -~~~~~~~~~~~~~~~ +============ Authentication -^^^^^^^^^^^^^^^^ +-------------- + Your Synapse token should be included in the request headers under the ``access_token`` key. In the SwaggerUI this can be added by clicking the padlock icon at the top right or next to the endpoints that accept it.
Parameters -^^^^^^^^^^^^^^^ +---------- + schema_url string url to the raw version of the data model in either ``.CSV`` or ``.JSONLD`` formats @@ -200,7 +215,8 @@ dataset_scope Specify a dataset to validate against for filename validation. Request Body -^^^^^^^^^^^^^ +------------ + file_name string($binary) @@ -208,13 +224,13 @@ file_name Response -^^^^^^^^^^^ +-------- If validation completes successfully, regardless of the presence of validation errors or warnings, you'll receive a ``200`` response code. The body will be a JSON string containing a list of validation errors and warnings in the format of ``{"errors": [list of errors], "warnings": [warnings]}`` Validating through the CLI will display all the errors and warnings found during validation or a message that no errors or warnings were found and the manifest is considered valid. - +***************** With the Library -~~~~~~~~~~~~~~~~~ +***************** TODO diff --git a/docs/source/troubleshooting.rst b/docs/source/troubleshooting.rst index ec8f33a4f..1548ab50f 100644 --- a/docs/source/troubleshooting.rst +++ b/docs/source/troubleshooting.rst @@ -1,10 +1,13 @@ +############### Troubleshooting -=============== +############### These are some common issues you may encounter when using schematic +********* Debugging ---------- +********* + Whether you are using DCA or schematic API or schematic library/CLI, the following are some steps that you want to take to debug your issues. Here are some steps to walk you through the process. 1. What was the command that caused the error? @@ -120,6 +123,9 @@ Whether you are using DCA or schematic API or schematic library/CLI, the followi -H 'accept: application/json' ... 
+Manifest Submission +=================== + Manifest Submit: `RuntimeError: failed with SynapseHTTPError('400 Client Error: nan is not a valid Synapse ID.')` ----------------------------------------------------------------------------------------------------------------- @@ -154,6 +160,10 @@ You may encounter this error if your manifest has an "id" (lower case) column du To fix: Delete the `id` (any case variation) and `eTag` column (any case variation) from your manifest and submit the manifest again. +Manifest Validation +=================== + + Manifest validation: `The submitted metadata does not contain all required column(s)` ------------------------------------------------------------------------------------- @@ -172,6 +182,8 @@ but the actual Component name is "ImagingAssayTemplate". To fix: Check if your manifest has invalid Component values and fill it out correctly. Using the above example, fill out your Component column with "ImagingAssayTemplate" +Manifest Generation +=================== Manifest Generate: `KeyError: entityId` --------------------------------------- diff --git a/docs/source/tutorials.rst b/docs/source/tutorials.rst index 6ecc1f6ff..414adf21c 100644 --- a/docs/source/tutorials.rst +++ b/docs/source/tutorials.rst @@ -1,9 +1,10 @@ +######### Tutorials -========= - +######### +*************************************** Contributing your manifest with the CLI ---------------------------------------- +*************************************** In this tutorial, you'll learn how to contribute your metadata manifests to Synapse using the `CLI`. Following best practices, we will cover generating, validating, and submitting your manifest in a structured workflow. @@ -21,15 +22,17 @@ we will cover generating, validating, and submitting your manifest in a structur We strongly recommend not doing that. +************* Prerequisites -~~~~~~~~~~~~~ +************* 1. 
**Install and configure Schematic**: Ensure that you have installed `schematic` and set up its dependencies. See "Installation Guide For: Users" for more information. 2. **Important Concepts**: Make sure you know the important concepts outlined on the home page of the doc site. 3. **Configuration**: Read more here about each of the attributes in the configuration file. +****************************** Steps to Contribute a Manifest -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +****************************** The contribution process includes three main commands. For information about the parameters of each of these commands, please refer to the CLI Reference section. @@ -40,7 +43,7 @@ For information about the parameters of each of these commands, please refer to Step 1: Generate a Manifest -~~~~~~~~~~~~~~~~~~~~~~~~~~~ +=========================== The `schematic manifest get` command that creates a manifest template based on a data model and existing manifests. @@ -60,7 +63,7 @@ The `schematic manifest get` command that creates a manifest template based on a This command will create a CSV file with the necessary columns and headers, which you can then fill with your metadata. Step 2: Validate the Manifest (Optional) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +======================================== Though optional, `schematic model validate` is a useful step to ensure that your manifest meets the required standards before submission. It checks for any errors, such as missing or incorrectly formatted values. @@ -76,7 +79,7 @@ It checks for any errors, such as missing or incorrectly formatted values. If validation passes, you'll see a success message; if there are errors, `schematic` will list them. Correct any issues before proceeding to submission. Step 3: Submit the Manifest to Synapse -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +====================================== The `schematic model submit` command uploads your manifest to Synapse.
This command will automatically validate the manifest as part of the submission process, so if you prefer, you can skip the standalone validation step. diff --git a/docs/source/validation_rules.rst b/docs/source/validation_rules.rst index b77f2cddd..932db93e7 100644 --- a/docs/source/validation_rules.rst +++ b/docs/source/validation_rules.rst @@ -1,14 +1,15 @@ -================ -Validation Rules -================ +############################### +Validation Rules Documentation +############################### .. contents:: - :depth: 2 + :depth: 3 :local: :backlinks: entry +********* Overview -======== +********* When Schematic validates a manifest, it uses a data model. The data model contains a list of validation rules for each component(data-type). This document describes all allowed validation rules currently implemented by Schematic. @@ -63,7 +64,7 @@ List Validation Type -------------------- list -~~~~ +^^^^^ - Use to parse the imported value to a list of values and (optionally) to verify that the user provided value was a comma separated list, depending on how strictly entries must conform to the list structure. Values can come from Valid Values. @@ -95,7 +96,7 @@ Regex Validation Type --------------------- regex -~~~~~ +^^^^^^ - Use the ``regex`` validation rule when you want to require that a user input values in a specific format, i.e. an ID that follows a particular format. @@ -136,22 +137,22 @@ Type Validation Type - Examples: [ ``str``, ``str error``, ``str warning``] float -~~~~~ +^^^^^^ - Checks that the value is a float. int -~~~ +^^^^ - Checks that the value is an integer. num -~~~ +^^^^ - Checks that the value is either an integer or float. str -~~~ +^^^ - Checks that the value is a string (not a number). @@ -159,7 +160,7 @@ URL Validation Type ------------------- url -~~~ +^^^ - Using the ``url`` rule implies the user should add a URL to a free text box as a string. This function will check that the user has provided a usable URL. 
It will check for any standard URL error and throw an error if one is found. Further additions to this rule can allow for checking that a specific type of URL is added. For example, if the user needs to ensure that the input contains a http://protocols.io URL string, http://protocols.io can be added after url to perform this check. @@ -183,7 +184,7 @@ Required Validation Type ------------------------ required -~~~~~~~~ +^^^^^^^^ An attribute's requirement is typically set using the required column (csv) or field (JSONLD) in the data model. A ``True`` value means a user must supply a value, ``False`` means they are allowed to skip providing a value. @@ -219,7 +220,7 @@ When using the ``required`` validation rule, the ``Required`` column must ``Fals - To verify that the ``required`` rule is working as expected, you can generate all impacted manifests; required columns should appear highlighted in light blue. -Examples: +**Examples:** - ``#BiospecimenManifest required`` @@ -244,28 +245,27 @@ There are three rules that do cross-manifest validation: [``matchAtLeastOne``, ` There are two scopes to choose from: [ ``value``, ``set``] -Value Scope -~~~~~~~~~~~ +``value`` Scope +^^^^^^^^^^^^^^^^ When the value scope is used, all values from the target attribute in all target manifests are combined. The values from the manifest being validated are compared to this combined list. In other words, there is no distinction between what values came from what target manifest. matchAtleastOne Value Scope -^^^^^^^^^^^^^^^^^^^^^^^^^^^ +""""""""""""""""""""""""""" The manifest is validated if each value in the target attribute exists at least once in the combined values of the target attribute of the target manifests. matchExactlyOne Value Scope -^^^^^^^^^^^^^^^^^^^^^^^^^^^ +""""""""""""""""""""""""""" The manifest is validated if each value in the target attribute exists once, and only once, in the combined values of the target attribute of the target manifests.
matchNone Value Scope -^^^^^^^^^^^^^^^^^^^^^ +""""""""""""""""""""" The manifest is validated if each value in the target attribute does not exist in the combined values of the target attribute of the target manifests. -Example 1 -^^^^^^^^^ +**Example 1** Tested manifest: ["A"] @@ -279,8 +279,7 @@ Target manifests: ["A", "B"] - because "A" is in the target manifest -Example 2 -^^^^^^^^^ +**Example 2** Tested manifest: ["A", "C"] @@ -298,8 +297,7 @@ Target manifests: ["A", "B"] - because "A" is in the target manifest -Example 3 -^^^^^^^^^ +**Example 3** Tested manifest: ["C"] @@ -315,8 +313,7 @@ Target manifests: ["A", "B"] - matchNone: passes -Example 4 -^^^^^^^^^ +**Example 4** Tested manifest: ["A", "A"] @@ -330,8 +327,7 @@ Target manifests: ["A", "B"] - because "A" is in the target manifest -Example 5 -^^^^^^^^^ +**Example 5** Tested manifest: ["A"] @@ -347,8 +343,7 @@ Target manifests: ["A", "A"] - because "A" is in the target manifest -Example 6 -^^^^^^^^^ +**Example 6** Tested manifest: ["A"] @@ -364,8 +359,7 @@ matchNone: fails because "A" is in the target manifest -Example 7 -^^^^^^^^^ +**Example 7** Tested manifest: ["A"] @@ -381,8 +375,8 @@ Target manifests: ["A", "B"], ["A", "B"] - because "A" is in the target manifest -Set scope -~~~~~~~~~ +``set`` Scope +^^^^^^^^^^^^^ When the set scope is used, the values from the tested manifest are compared **one at a time** against each target manifest, and the number of matches is counted. The test to determine if the tested manifest matches the target manifest is to see if the tested manifest values are a subset of the target manifest values. Imagine a target manifest whose values are ["A", "B", "C"]: @@ -391,22 +385,21 @@ When the set scope is used the values from the tested manifest are compared **on - [1], ["D"], ["D", "D"], ["D", "E"] are not subsets of the example target manifest.
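The value-scope rules and the subset test used by the set scope, as described above, can be sketched in Python as follows. This is a literal reading of the definitions in this section, not Schematic's own implementation; the function names ``value_scope`` and ``count_set_matches`` are hypothetical.

```python
from collections import Counter


def value_scope(tested, targets):
    """Evaluate the three value-scope rules.

    All target manifests are pooled into one combined collection, so the
    origin of each target value is ignored.
    """
    combined = Counter(v for manifest in targets for v in manifest)
    return {
        "matchAtLeastOne": all(combined[v] >= 1 for v in tested),
        "matchExactlyOne": all(combined[v] == 1 for v in tested),
        "matchNone": all(combined[v] == 0 for v in tested),
    }


def count_set_matches(tested, targets):
    """Count the target manifests that contain every tested value.

    A target manifest counts as a set match when the tested values are a
    subset of its values.
    """
    return sum(set(tested) <= set(manifest) for manifest in targets)


# Tested ["A"] against a single target manifest ["A", "B"].
print(value_scope(["A"], [["A", "B"]]))

# Two identical targets: "A" now appears twice in the combined values.
print(value_scope(["A"], [["A", "B"], ["A", "B"]]))

# Set scope: only the first target manifest contains "A".
print(count_set_matches(["A"], [["A", "B"], ["C", "D"]]))
```

Under this reading, the set-scope rules reduce to comparing the match count with zero or one.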
matchAtleastOne Set scope -^^^^^^^^^^^^^^^^^^^^^^^^^ +""""""""""""""""""""""""" The manifest is validated if there is at least one set match between the tested manifest and the target manifests matchExactlyOne Set scope -^^^^^^^^^^^^^^^^^^^^^^^^^ +""""""""""""""""""""""""" The manifest is validated if there is one and only one set match between the tested manifest and the target manifests matchNone Set scope -^^^^^^^^^^^^^^^^^^^ +"""""""""""""""""""" The manifest is validated if there are no set matches between the tested manifest and the target manifests -Example 1 -^^^^^^^^^ +**Example 1** Tested manifest: ["A"] @@ -420,8 +413,7 @@ matchNone: fails because "A" is in the target manifest -Example 2 -^^^^^^^^^ +**Example 2** Tested manifest: ["A"] @@ -435,8 +427,7 @@ Target manifests: ["A", "B"], ["C", "D"] - because "A" is in at least one of the target manifests -Example 3 -^^^^^^^^^ +**Example 3** Tested manifest: ["A"] @@ -452,8 +443,7 @@ Target manifests: ["A", "B"], ["A", "B"] - because "A" is in at least one of the target manifests -Example 4 -^^^^^^^^^ +**Example 4** Tested manifest: ["C"] @@ -475,7 +465,7 @@ Content Validation Type Rules can be used to validate the contents of entries for an attribute. recommended -~~~~~~~~~~~ +^^^^^^^^^^^^ - Use to raise a warning when a manifest column is not required but empty. If an attribute is always necessary then ``required`` should be set to ``TRUE`` instead of using the ``recommended`` validation rule. @@ -490,7 +480,7 @@ recommended - Default behavior: raises ``warning`` protectAges -~~~~~~~~~~~ +^^^^^^^^^^^^ - Use to ensure that patient ages under 18 and over 89 years of age are censored when uploading for sharing. If necessary, a censored version of the manifest will be created and uploaded along with the uncensored version. Uncensored versions will be uploaded as restricted and Terms of Use will need to be set.
Please follow up with governance after upload to set the terms of use @@ -505,7 +495,7 @@ protectAges - Default behavior: raises ``warning`` unique -~~~~~~ +^^^^^^^ - Use to ensure that attribute values are not duplicated within a column. @@ -520,7 +510,7 @@ unique - Default behavior: raises ``error`` inRange -~~~~~~~ +^^^^^^^ - Use to ensure that numerical data is within a specified range @@ -535,7 +525,7 @@ inRange - Default behavior: raises ``error`` date -~~~~ +^^^^ - Use to ensure the value parses as a date @@ -557,7 +547,7 @@ This requires paths to be enabled for the synapse master file view in use. Can b This should be used only with the Filename attribute in a data model and specified with `Component Based Rule Setting `_ filenameExists -~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^ - Used to validate that the filenames and paths as they exist in the metadata manifest match the paths that are in the Synapse master File View for the specified dataset @@ -606,7 +596,7 @@ We get the following results for this Manifest:: Rule Combinations ------------------ +================= Schematic allows certain combinations of existing validation rules to be used on a single attribute, where appropriate. @@ -627,7 +617,7 @@ Rule combinations: [``list::regex``, ``int::inRange``, ``float::inRange``, ``num - ``list :: regex search [HTAN][0-9]{1}_[0-9]{4}_[0-9]*`` Component-Based Rule Setting ----------------------------- +============================= **Component-Based Rule Setting** is a powerful feature in data modeling that enables users to create rules tailored to specific subsets of components or manifests. This functionality was developed to address scenarios where a data modeler needs to enforce uniqueness for certain attribute values within one manifest while allowing non-uniqueness in another.