Commit 019b0ba

Merge branch 'develop' into SYNPY-1856
2 parents 26bf808 + e8b3b61 commit 019b0ba

31 files changed, 5633 additions and 724 deletions

docs/source/asset_store.rst

Lines changed: 9 additions & 8 deletions
@@ -1,5 +1,6 @@
+############################
 Setting up your asset store
-===========================
+############################

 .. note::

@@ -38,9 +39,9 @@ In each of these project setups, there are two ways you can lay out your data:
 ├── batch_1 <- annotated
 └── batch_2 <- annotated

-
+***************************************
 Option 1: Distributed Synapse Projects
---------------------------------------
+***************************************

 Pick **option 1** if you answer "yes" to one or more of the following questions:

@@ -50,7 +51,7 @@ Pick **option 1** if you answer "yes" to one or more of the following questions:
 - Are you not willing to annotate each DCC dataset folder with the annotation ``contentType:dataset``?

 Access & Project Setup - Multiple Contributing Projects
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+========================================================

 1. Create a DCC Admin Team with admin permissions.
 2. Create a Team for each data contributing institution. Begin with a "Test Team" if all teams are not yet identified.
@@ -70,9 +71,9 @@ Access & Project Setup - Multiple Contributing Projects
 distributed projects, just the ``contentType`` column to your fileview, and you will have
 to annotate your top level folders with ``contentType:dataset``.

-
+**********************************
 Option 2: Single Synapse Project
---------------------------------
+**********************************

 Pick **option 2** if you don't select option 1 and you answer "yes" to any of these questions:

@@ -84,7 +85,7 @@ If neither option fits, select option 1.


 Access & Project Setup - Single Contributing Project
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+=======================================================

 1. Create a Team for each data contributing institution.
 2. Create a single Synapse Project (e.g., MyDCC).
@@ -115,7 +116,7 @@ Access & Project Setup - Single Contributing Project
 proliferation of folders per contributor and data type.

 Synapse External Cloud Buckets Setup
-------------------------------------
+=====================================

 If DCC contributors require external cloud buckets, select one of the following configurations. For more information on how to
 set this up on Synapse, view this documentation: https://help.synapse.org/docs/Custom-Storage-Locations.2048327803.html
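
The heading changes in this file (and in the files below) appear to follow the adornment hierarchy recommended in the Sphinx reStructuredText primer: ``#`` with overline for the document title, ``*`` with overline for chapters, and ``=`` underline for sections. A minimal sketch of a helper that emits headings in that scheme is shown here; the function name ``rst_heading`` and the level numbering are hypothetical and are not part of this commit.

```python
# Adornment character per level, and whether the level also gets an overline.
# The mapping mirrors the pattern visible in this diff; it is an assumption,
# not something reStructuredText itself enforces.
ADORNMENTS = {0: ("#", True), 1: ("*", True), 2: ("=", False)}


def rst_heading(title: str, level: int) -> str:
    """Return `title` formatted as an RST heading for level 0, 1 or 2."""
    char, overline = ADORNMENTS[level]
    rule = char * len(title)  # the adornment must be at least as long as the title
    lines = [rule, title, rule] if overline else [title, rule]
    return "\n".join(lines)


print(rst_heading("CLI Reference", 0))
print(rst_heading("Validate a manifest", 1))
print(rst_heading("Synapse", 2))
```

Sizing the adornment to the title length is a stylistic choice; several headings in the diff use a longer rule, which docutils also accepts.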

docs/source/cli_reference.rst

Lines changed: 12 additions & 10 deletions
@@ -1,45 +1,47 @@
-=============
+#############
 CLI Reference
-=============
+#############

 When you're using this tool ``-d`` flag is referring to the Synapse ID of a folder that would be found under the files tab
 that contains a manifest and data. This would be referring to a "Top Level Folder". It is not required to provide a ``dataset_id``
 but if you're trying to pull existing annotations by using the ``-a`` flag and the manifest is file-based then you would
 need to provide a ``dataset_id``.

-
+*****************************************
 Generate a new manifest as a Google Sheet
------------------------------------------
-
+*****************************************

 .. code-block:: shell

 schematic manifest -c /path/to/config.yml get -dt <your data type> -s

+******************************************
 Generate an existing manifest from Synapse
-------------------------------------------
+******************************************

 .. code-block:: shell

 schematic manifest -c /path/to/config.yml get -dt <your data type> -d <your synapse "Top Level Folder" folder id> -s

+*****************************************
 Validate a manifest
--------------------
+*****************************************

 .. code-block:: shell

 schematic model -c /path/to/config.yml validate -dt <your data type> -mp <your csv manifest path>

+*****************************************
 Submit a manifest as a file
----------------------------
+*****************************************

 .. code-block:: shell

 schematic model -c /path/to/config.yml submit -mp <your csv manifest path> -d <your synapse "Top Level Folder" id> -vc <your data type> -mrt file_only

-
+*****************************************
 In depth guide
---------------
+*****************************************

 .. click:: schematic.__main__:main
 :prog: schematic
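
The flag rules described at the top of ``cli_reference.rst`` can be sketched as a small command builder. This is only an illustration under stated assumptions: the helper ``manifest_get_command`` is hypothetical, ``syn123`` is a placeholder ID, ``-a`` is assumed to be an option of the ``get`` subcommand, and the check is stricter than the documentation (it rejects ``-a`` without ``-d`` in every case, not only for file-based manifests). The code builds the argument list and does not run ``schematic``.

```python
import shlex
from typing import List, Optional


def manifest_get_command(
    config_path: str,
    data_type: str,
    dataset_id: Optional[str] = None,
    use_annotations: bool = False,
) -> List[str]:
    """Assemble argv for `schematic manifest ... get` (hypothetical helper)."""
    if use_annotations and dataset_id is None:
        # Conservative reading of the docs: treat -a without -d as an error.
        raise ValueError("-a needs a dataset_id (-d) to locate existing annotations")
    cmd = ["schematic", "manifest", "-c", config_path, "get", "-dt", data_type]
    if dataset_id is not None:
        cmd += ["-d", dataset_id]  # Synapse ID of the "Top Level Folder"
    if use_annotations:
        cmd.append("-a")  # assumed to belong to the `get` subcommand
    cmd.append("-s")  # same trailing flag as the documented examples
    return cmd


# Placeholder values; substitute a real config path, data type and folder ID.
print(shlex.join(manifest_get_command("/path/to/config.yml", "Biospecimen", "syn123")))
```

Returning a list rather than a single string keeps the arguments safe to pass to ``subprocess.run`` without shell quoting issues.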

docs/source/conf.py

Lines changed: 4 additions & 1 deletion
@@ -42,7 +42,10 @@
 # Add any Sphinx extension module names here, as strings. They can be
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
-extensions = ["sphinx_click", "sphinx_rtd_theme"]
+extensions = ["sphinx_click", "sphinx_rtd_theme", "sphinx.ext.autosectionlabel"]
+
+# Configure autosection label to prefix sections with document name. Requires referencing from directory index.rst is in.
+autosectionlabel_prefix_document = True

 # Add any paths that contain templates here, relative to this directory.
 templates_path = ["_templates"]
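
With ``autosectionlabel_prefix_document = True``, each section label is the document name (relative to the source directory, without extension) joined to the section title by a colon. The sketch below shows how such a label and the matching ``:ref:`` text could be derived; the helper names ``section_label`` and ``ref_role`` are hypothetical, and the normalization step is assumed to match what docutils applies to reference names (lowercasing and collapsing whitespace).

```python
def section_label(docname: str, title: str) -> str:
    """Label registered for a section when the document-name prefix is enabled."""
    # Lowercase and collapse runs of whitespace, as docutils does for names.
    return " ".join(f"{docname}:{title}".lower().split())


def ref_role(docname: str, title: str) -> str:
    """RST cross-reference text that points at that section."""
    return f":ref:`{docname}:{title}`"


print(section_label("configuration", "Asset Store"))
print(ref_role("cli_reference", "Validate a manifest"))
```

Prefixing with the document name avoids duplicate-label warnings when several pages share a section title, such as "Manifest" in both ``configuration.rst`` and ``index.rst``.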

docs/source/configuration.rst

Lines changed: 11 additions & 9 deletions
@@ -1,7 +1,6 @@
-.. _configuration:
-
+###################
 Configure Schematic
-===================
+###################

 This is an example config for Schematic. All listed values are those that are the default if a config is not used. Remove any fields in the config you don't want to change.
 If you remove all fields from a section, the entire section should be removed including the header.
@@ -48,35 +47,38 @@ Change the values of any fields you do want to change. Please view the installa

 This document will go into detail what each of these configurations mean.

+***********
 Asset Store
------------
+***********

 Synapse
-~~~~~~~
+========
 This describes where assets such as manifests are stored and the configurations of the asset store is described
 under the asset store section.

 * master_fileview_id: Synapse ID of the file view listing all project data assets.
 * config: Path to the synapse config file, either absolute or relative to this file. Note, if you use `synapse config` command, you will have to provide the full path to the configuration file.
 * manifest_basename: Base name that manifest files will be saved as on Synapse. The Component will be appended to it so for example: `synapse_storage_manifest_biospecimen.csv`

+**********
 Manifest
---------
+**********
 This describes information about manifests as it relates to generation and validation. Note: some of these configurations can be overwritten by the CLI commands.

 * manifest_folder: Location where manifests will saved to. This can be a relative or absolute path on your local machine.
 * title: Title or title prefix given to generated manifest(s). This is used to name the manifest file saved locally.
 * data_type: Data types of manifests to be generated or data type (singular) to validate manifest against. If you wanted all the available manifests, you can input "all manifests"

-
+******
 Model
------
+******
 Describes the location of your schema

 * location: This is the location of your schema jsonld, it must be a path relative to this file or absolute path. Currently URL's are NOT supported, so you will have to download the jsonld data model. Here is an example: https://raw.githubusercontent.com/ncihtan/data-models/v24.9.1/HTAN.model.jsonld

+*************
 Google Sheets
--------------
+*************
 Schematic leverages the Google API to generate manifests. This section is for using google sheets with Schematic

 * service_acct_creds: Path to the google service account creds, either absolute or relative to this file. This is the path to the service account credentials file that you download from Google Cloud Platform.

docs/source/index.rst

Lines changed: 21 additions & 18 deletions
@@ -5,8 +5,9 @@

 .. _index:

+######################################
 Welcome to Schematic's documentation!
-=====================================
+######################################

 .. warning::
 This documentation site is a work in progress, and the sublinks may change. Apologies for the inconvenience.
@@ -28,15 +29,16 @@ Schematic tackles these goals:
 :depth: 2
 :local:

+*******************
 Important Concepts
-------------------
+*******************

 .. important::

 Before moving reading more about schematic, this section covers essential concepts relevant for using the Schematic tool effectively.

 Synapse FileViews
-~~~~~~~~~~~~~~~~~
+=================
 Users are responsible for setting up a **FileView** that integrates with Schematic. Note that FileViews appear under the "Tables" tab in Synapse and can be named according to the project's needs. For instance, a FileView for the **Project A** could have a different name than a FileView for the **Project B**.

 For more information on Synapse projects, visit:
@@ -45,17 +47,17 @@ For more information on Synapse projects, visit:
 - `Synapse annotations <https://help.synapse.org/docs/Annotating-Data-With-Metadata.2667708522.html>`_

 Synapse Folders
-~~~~~~~~~~~~~~~
+================

 Folders in Synapse allow users to organize data within projects. More details on uploading and organizing data can be found at `Synapse folders <https://help.synapse.org/docs/Uploading-and-Organizing-Data-Into-Projects,-Files,-and-Folders.2048327716.html>`_

 Synapse Datasets
-~~~~~~~~~~~~~~~~
+================

 This is an object in Synapse which appears under the "Dataset" tab and represents a user-defined collection of Synapse files and versions. https://help.synapse.org/docs/Datasets.2611281979.html

 JSON-LD
-~~~~~~~
+=======
 JSON-LD is a lightweight Linked Data format. The usage of JSON-LD to capture our data models
 extends beyond the creation, validation, and submission of annotations/manifests into Synapse
 It can create relationships between different data models and, in the future, drive
@@ -64,22 +66,22 @@ and their relationships is also possible which allows the community to see the d
 connections between all the data uploaded into Synapse.

 Manifest
-~~~~~~~~
+========

 A manifest is a structured file that contains metadata about files under a "top level folder".
 The metadata includes information of the files such as data type and etc.
 The manifest can also used to annotate the data on Synapse and create a file view
 that enables the FAIR principles on each of the files in the "top level folder".

 Component/Data type
-~~~~~~~~~~~~~~~~~~~
+===================
 "component" and "data type" are used interchangeably. The component/data type is determined from the specified JSON-LD data model.
 If the string "component" exists in the depends on column, the "Attribute" value in that row is a data type.
 Examples of a data type is "Biospecimen", "Patient": https://github.com/Sage-Bionetworks/schematic/blob/develop/tests/data/example.model.csv#L3.
 Each data type/component should a manifest template that has different columns.

 Project Data Layout
-~~~~~~~~~~~~~~~~~~~
+===================

 Regardless of data layout, the data in your Synapse Project(s) are uploaded into Synapse Folders to be curated and annotated by schematic.
 In both layouts listed below, the project administrators along with the data contributors may have preferences on how the
@@ -93,19 +95,19 @@ different things under these two layouts.

 In both of these layouts, these are really just groupings of resources.

-
+*******************
 Schematic services
-------------------
+*******************

 The following are the four main endpoints that assist with the high-level goals outlined above, with additional goals to come.

 Manifest Generation
-~~~~~~~~~~~~~~~~~~~
+===================

 Provides a manifest template for users for a particular project or data type. If a project with annotations already exists, a semi-filled-out template can be provided to the user. This ensures they do not start from scratch. If there are no existing annotations and manifests, an empty manifest template is provided.

-Manifest Validation
-~~~~~~~~~~~~~~~~~~~
+Validating a Manifest
+=====================

 Given a filled-out manifest:

@@ -116,7 +118,7 @@ Given a filled-out manifest:
 - Validation results are provided before the manifest file is uploaded into Synapse.

 Manifest Submission
-~~~~~~~~~~~~~~~~~~~
+===================

 Given a filled out manifest, this will allow you to submit the manifest to the "top level folder".
 This is validates the manifest and...
@@ -130,13 +132,13 @@ This is validates the manifest and...
 More validation documentation can be found here: https://sagebionetworks.jira.com/wiki/spaces/SCHEM/pages/3302785036/Schematic+Validation

 Data Model Visualization
-~~~~~~~~~~~~~~~~~~~~~~~~
+========================

 These endpoints allows you to visulize your data models and their relationships with each other.

-
+**************
 API reference
--------------
+**************

 For the entire Python API reference documentation, you can visit the docs here: https://sage-bionetworks.github.io/schematic/

@@ -155,3 +157,4 @@ For the entire Python API reference documentation, you can visit the docs here:
 troubleshooting
 cli_reference
 linkml
+jsonschema_generation
