17 changes: 9 additions & 8 deletions docs/source/asset_store.rst
@@ -1,5 +1,6 @@
############################
Setting up your asset store
===========================
############################

.. note::

@@ -38,9 +39,9 @@ In each of these project setups, there are two ways you can lay out your data:
├── batch_1 <- annotated
└── batch_2 <- annotated


***************************************
Option 1: Distributed Synapse Projects
--------------------------------------
***************************************

Pick **option 1** if you answer "yes" to one or more of the following questions:

@@ -50,7 +51,7 @@ Pick **option 1** if you answer "yes" to one or more of the following questions:
- Are you not willing to annotate each DCC dataset folder with the annotation ``contentType:dataset``?

Access & Project Setup - Multiple Contributing Projects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
========================================================

1. Create a DCC Admin Team with admin permissions.
2. Create a Team for each data contributing institution. Begin with a "Test Team" if not all teams have been identified yet.
@@ -70,9 +71,9 @@ Access & Project Setup - Multiple Contributing Projects
distributed projects, just the ``contentType`` column to your fileview, and you will have
to annotate your top level folders with ``contentType:dataset``.


**********************************
Option 2: Single Synapse Project
--------------------------------
**********************************

Pick **option 2** if you don't select option 1 and you answer "yes" to any of these questions:

@@ -84,7 +85,7 @@ If neither option fits, select option 1.


Access & Project Setup - Single Contributing Project
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
=======================================================

1. Create a Team for each data contributing institution.
2. Create a single Synapse Project (e.g., MyDCC).
@@ -115,7 +116,7 @@ Access & Project Setup - Single Contributing Project
proliferation of folders per contributor and data type.

Synapse External Cloud Buckets Setup
------------------------------------
=====================================

If DCC contributors require external cloud buckets, select one of the following configurations. For more information on how to
set this up on Synapse, view this documentation: https://help.synapse.org/docs/Custom-Storage-Locations.2048327803.html
22 changes: 12 additions & 10 deletions docs/source/cli_reference.rst
@@ -1,45 +1,47 @@
=============
#############
CLI Reference
=============
#############

When using this tool, the ``-d`` flag refers to the Synapse ID of a folder, found under the Files tab,
that contains a manifest and data. This folder is the "Top Level Folder". Providing a ``dataset_id`` is not required,
but if you want to pull existing annotations with the ``-a`` flag and the manifest is file-based, you
must provide a ``dataset_id``.


*****************************************
Generate a new manifest as a Google Sheet
-----------------------------------------

*****************************************

.. code-block:: shell

schematic manifest -c /path/to/config.yml get -dt <your data type> -s

******************************************
Generate an existing manifest from Synapse
------------------------------------------
******************************************

.. code-block:: shell

schematic manifest -c /path/to/config.yml get -dt <your data type> -d <your synapse "Top Level Folder" folder id> -s
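
The two ``get`` invocations above differ only in whether ``-d`` is passed. As a minimal sketch, the argument list could be assembled in Python as follows. ``build_get_command`` is a hypothetical helper for illustration and is not part of Schematic; it uses only the flags shown above, and ``syn123`` is a placeholder ID.

```python
from typing import List, Optional


def build_get_command(config_path: str, data_type: str,
                      dataset_id: Optional[str] = None) -> List[str]:
    """Assemble the `schematic manifest ... get` argument list.

    Hypothetical helper for illustration; it only uses flags shown above.
    """
    command = ["schematic", "manifest", "-c", config_path, "get", "-dt", data_type]
    if dataset_id is not None:
        # -d points at the Synapse ID of the "Top Level Folder".
        command += ["-d", dataset_id]
    # -s appears in both documented invocations, so it is always appended.
    command.append("-s")
    return command


if __name__ == "__main__":
    # New manifest: no dataset ID.
    print(" ".join(build_get_command("/path/to/config.yml", "Biospecimen")))
    # Existing manifest: "syn123" is a placeholder folder ID.
    print(" ".join(build_get_command("/path/to/config.yml", "Biospecimen", "syn123")))
```

The resulting list can be passed to ``subprocess.run`` without shell quoting concerns.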

*****************************************
Validate a manifest
-------------------
*****************************************

.. code-block:: shell

schematic model -c /path/to/config.yml validate -dt <your data type> -mp <your csv manifest path>

*****************************************
Submit a manifest as a file
---------------------------
*****************************************

.. code-block:: shell

schematic model -c /path/to/config.yml submit -mp <your csv manifest path> -d <your synapse "Top Level Folder" id> -vc <your data type> -mrt file_only


*****************************************
In depth guide
--------------
*****************************************

.. click:: schematic.__main__:main
:prog: schematic
3 changes: 3 additions & 0 deletions docs/source/conf.py
@@ -44,6 +44,9 @@
# ones.
extensions = ["sphinx_click", "sphinx_rtd_theme", "sphinx.ext.autosectionlabel"]

# Configure autosectionlabel to prefix section labels with the document name. References must use the document path relative to the directory index.rst is in.
autosectionlabel_prefix_document = True

# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]

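With ``autosectionlabel_prefix_document`` enabled, ``sphinx.ext.autosectionlabel`` builds each label from the document name, a colon, and the section title. The naming rule can be sketched as follows; ``section_label`` and ``ref_role`` are hypothetical helpers for illustration, not Sphinx functions.

```python
def section_label(docname: str, title: str, prefix_document: bool = True) -> str:
    """Return the label generated for a section title.

    With prefix_document=True the label is "<docname>:<title>";
    otherwise it is the bare title.
    """
    return f"{docname}:{title}" if prefix_document else title


def ref_role(docname: str, title: str) -> str:
    """Format a reStructuredText :ref: role pointing at the section."""
    return f":ref:`{section_label(docname, title)}`"


if __name__ == "__main__":
    # docs/source/configuration.rst has the document name "configuration".
    print(ref_role("configuration", "Asset Store"))
```

Prefixing avoids duplicate-label warnings when several documents share a section title such as "Manifest".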
20 changes: 11 additions & 9 deletions docs/source/configuration.rst
@@ -1,7 +1,6 @@
.. _configuration:

###################
Configure Schematic
===================
###################

This is an example config for Schematic. All listed values are the defaults used when no config is provided. Remove any fields in the config you don't want to change.
If you remove all fields from a section, the entire section should be removed including the header.
@@ -48,35 +47,38 @@ Change the values of any fields you do want to change. Please view the installa

This document describes in detail what each of these configurations means.

***********
Asset Store
-----------
***********

Synapse
~~~~~~~
========
This describes where assets such as manifests are stored. The configuration of the asset store is described
under the asset store section.

* master_fileview_id: Synapse ID of the file view listing all project data assets.
* config: Path to the Synapse config file, either absolute or relative to this file. Note: if you use the ``synapse config`` command, you will have to provide the full path to the configuration file.
* manifest_basename: Base name that manifest files will be saved as on Synapse. The component is appended to it, for example: ``synapse_storage_manifest_biospecimen.csv``
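
The ``manifest_basename`` rule above can be sketched in a few lines. This assumes the component is lower-cased and joined with an underscore, which is inferred only from the single example above; ``manifest_file_name`` is a hypothetical helper, not a Schematic function.

```python
def manifest_file_name(manifest_basename: str, component: str) -> str:
    """Build the manifest file name for a component.

    Assumption: the component is lower-cased and appended after an
    underscore, matching the one documented example.
    """
    return f"{manifest_basename}_{component.lower()}.csv"


if __name__ == "__main__":
    print(manifest_file_name("synapse_storage_manifest", "Biospecimen"))
```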

**********
Manifest
--------
**********
This describes information about manifests as it relates to generation and validation. Note: some of these configurations can be overwritten by the CLI commands.

* manifest_folder: Location where manifests will be saved. This can be a relative or absolute path on your local machine.
* title: Title or title prefix given to generated manifest(s). This is used to name the manifest file saved locally.
* data_type: Data types of manifests to be generated, or the data type (singular) to validate a manifest against. If you want all the available manifests, you can input "all manifests".


******
Model
-----
******
Describes the location of your schema

* location: This is the location of your schema JSON-LD file. It must be an absolute path or a path relative to this file. Currently URLs are NOT supported, so you will have to download the JSON-LD data model. Here is an example: https://raw.githubusercontent.com/ncihtan/data-models/v24.9.1/HTAN.model.jsonld
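
The path rule for ``location`` (absolute, or relative to the config file, with URLs rejected) can be sketched as follows. ``resolve_model_location`` is a hypothetical helper written for illustration; it is not how Schematic is known to implement the lookup.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_model_location(config_file: str, location: str) -> Path:
    """Resolve a model location against the directory of the config file.

    Hypothetical helper: absolute paths are returned unchanged, relative
    paths are joined to the config file's directory, and http(s) URLs are
    rejected because they are not supported.
    """
    if urlparse(location).scheme in ("http", "https"):
        raise ValueError("URLs are not supported; download the JSON-LD file first")
    path = Path(location)
    if path.is_absolute():
        return path
    return Path(config_file).parent / path


if __name__ == "__main__":
    print(resolve_model_location("/path/to/config.yml", "models/HTAN.model.jsonld"))
```

The same rule applies to the other path-valued fields described as "absolute or relative to this file".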

*************
Google Sheets
-------------
*************
Schematic leverages the Google API to generate manifests. This section covers using Google Sheets with Schematic.

* service_acct_creds: Path to the google service account creds, either absolute or relative to this file. This is the path to the service account credentials file that you download from Google Cloud Platform.
38 changes: 20 additions & 18 deletions docs/source/index.rst
@@ -5,8 +5,9 @@

.. _index:

######################################
Welcome to Schematic's documentation!
=====================================
######################################

.. warning::
This documentation site is a work in progress, and the sublinks may change. Apologies for the inconvenience.
@@ -28,15 +29,16 @@ Schematic tackles these goals:
:depth: 2
:local:

*******************
Important Concepts
------------------
*******************

.. important::

Before reading more about Schematic, review this section. It covers essential concepts for using the Schematic tool effectively.

Synapse FileViews
~~~~~~~~~~~~~~~~~
=================
Users are responsible for setting up a **FileView** that integrates with Schematic. Note that FileViews appear under the "Tables" tab in Synapse and can be named according to the project's needs. For instance, a FileView for **Project A** could have a different name than a FileView for **Project B**.

For more information on Synapse projects, visit:
@@ -45,17 +47,17 @@ For more information on Synapse projects, visit:
- `Synapse annotations <https://help.synapse.org/docs/Annotating-Data-With-Metadata.2667708522.html>`_

Synapse Folders
~~~~~~~~~~~~~~~
================

Folders in Synapse allow users to organize data within projects. More details on uploading and organizing data can be found at `Synapse folders <https://help.synapse.org/docs/Uploading-and-Organizing-Data-Into-Projects,-Files,-and-Folders.2048327716.html>`_

Synapse Datasets
~~~~~~~~~~~~~~~~
================

A Dataset is an object in Synapse that appears under the "Dataset" tab and represents a user-defined collection of Synapse files and versions. For details, see https://help.synapse.org/docs/Datasets.2611281979.html

JSON-LD
~~~~~~~
=======
JSON-LD is a lightweight Linked Data format. The usage of JSON-LD to capture our data models
extends beyond the creation, validation, and submission of annotations/manifests into Synapse.
It can create relationships between different data models and, in the future, drive
@@ -64,22 +66,22 @@ and their relationships is also possible which allows the community to see the d
connections between all the data uploaded into Synapse.

Manifest
~~~~~~~~
========

A manifest is a structured file that contains metadata about files under a "top level folder".
The metadata includes information about the files, such as their data type.
The manifest can also be used to annotate the data on Synapse and create a file view
that enables the FAIR principles on each of the files in the "top level folder".

Component/Data type
~~~~~~~~~~~~~~~~~~~
===================
"Component" and "data type" are used interchangeably. The component/data type is determined from the specified JSON-LD data model.
If the string "component" exists in the depends on column, the "Attribute" value in that row is a data type.
Examples of data types are "Biospecimen" and "Patient": https://github.com/Sage-Bionetworks/schematic/blob/develop/tests/data/example.model.csv#L3.
Each data type/component should have a manifest template with its own set of columns.
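
The rule above can be sketched against a CSV data model. The sample rows below are made up for illustration, and the column header ``DependsOn`` and the case-insensitive match are assumptions; ``find_data_types`` is a hypothetical helper, not a Schematic function.

```python
import csv
import io
from typing import List

# Made-up sample rows; only the "Attribute" column name comes from the text above.
SAMPLE_MODEL_CSV = """Attribute,DependsOn
Patient,"Patient ID, Sex, Component"
Patient ID,
Biospecimen,"Sample ID, Patient ID, Component"
Sample ID,
"""


def find_data_types(model_csv: str, depends_on_column: str = "DependsOn") -> List[str]:
    """Return attributes whose depends-on cell mentions "component"."""
    reader = csv.DictReader(io.StringIO(model_csv))
    data_types = []
    for row in reader:
        depends_on = row.get(depends_on_column) or ""
        # Assumption: the match is case-insensitive.
        if "component" in depends_on.lower():
            data_types.append(row["Attribute"])
    return data_types


if __name__ == "__main__":
    print(find_data_types(SAMPLE_MODEL_CSV))
```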

Project Data Layout
~~~~~~~~~~~~~~~~~~~
===================

Regardless of data layout, the data in your Synapse Project(s) are uploaded into Synapse Folders to be curated and annotated by schematic.
In both layouts listed below, the project administrators along with the data contributors may have preferences on how the
@@ -93,19 +95,19 @@ different things under these two layouts.

In both of these layouts, these are really just groupings of resources.


*******************
Schematic services
------------------
*******************

The following are the four main endpoints that assist with the high-level goals outlined above, with additional goals to come.

Manifest Generation
~~~~~~~~~~~~~~~~~~~
===================

Provides a manifest template for users for a particular project or data type. If a project with annotations already exists, a semi-filled-out template can be provided to the user. This ensures they do not start from scratch. If there are no existing annotations and manifests, an empty manifest template is provided.

Manifest Validation
~~~~~~~~~~~~~~~~~~~
Validating a Manifest
=====================

Given a filled-out manifest:

@@ -116,7 +118,7 @@ Given a filled-out manifest:
- Validation results are provided before the manifest file is uploaded into Synapse.

Manifest Submission
~~~~~~~~~~~~~~~~~~~
===================

Given a filled-out manifest, this allows you to submit the manifest to the "top level folder".
This validates the manifest and...
@@ -130,13 +132,13 @@ This is validates the manifest and...
More validation documentation can be found here: https://sagebionetworks.jira.com/wiki/spaces/SCHEM/pages/3302785036/Schematic+Validation

Data Model Visualization
~~~~~~~~~~~~~~~~~~~~~~~~
========================

These endpoints allow you to visualize your data models and their relationships with each other.


**************
API reference
-------------
**************

For the entire Python API reference documentation, you can visit the docs here: https://sage-bionetworks.github.io/schematic/
