12 changes: 6 additions & 6 deletions docs/explanations/manifest_csv.md
@@ -17,7 +17,7 @@ The format of the manifest file is a comma-separated value (CSV) file with one r
| parentId | Synapse ID of parent | syn1235 |

!!! note
The legacy TSV manifest used a column named `parent`. The CSV manifest uses `parentId` instead, which is consistent with the Synapse REST API field name. If you are migrating an existing TSV manifest to CSV, rename the `parent` column to `parentId`.
The legacy TSV manifest used the columns `parent` and `id`, while the CSV manifest uses `parentId` and `ID` to align with Synapse REST API field names. If you’re migrating a TSV manifest to CSV, you’ll need to rename `parent` to `parentId` and `id` to `ID`.
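
The column rename described in the note can be scripted. The following is a minimal sketch using only Python's standard `csv` module, not part of `synapseclient`. It assumes the legacy file is plain tab-separated text with a header row; the helper name `tsv_manifest_to_csv` and the ID `syn999` are hypothetical, chosen for illustration.

```python
import csv
import io

# Column renames described in the note (legacy TSV name -> CSV name).
RENAMES = {"parent": "parentId", "id": "ID"}


def tsv_manifest_to_csv(tsv_text: str) -> str:
    """Convert legacy TSV manifest text to CSV text, renaming columns."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = [RENAMES.get(name, name) for name in (reader.fieldnames or [])]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Apply the same renames to each row's keys so they match the new header.
        writer.writerow({RENAMES.get(key, key): value for key, value in row.items()})
    return out.getvalue()


# Hypothetical one-row legacy manifest; syn999 is a made-up ID.
legacy = "path\tparent\tid\n/path/file1.txt\tsyn1243\tsyn999\n"
converted = tsv_manifest_to_csv(legacy)
print(converted)
```

Columns other than `parent` and `id` pass through unchanged, so annotation columns keep their names.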

### Standard fields

@@ -60,11 +60,11 @@ Any columns that are not in the standard or metadata fields described above will

Adding annotations to each row:

| path | parentId | annot1 | annot2 | annot3 | annot4 | annot5 |
| --- | --- | --- | --- | --- | --- | --- |
| /path/file1.txt | syn1243 | bar | 3.1415 | "aaaa, bbbb" | "[14,27,30]" | "Annotation, with a comma" |
| /path/file2.txt | syn12433 | baz | 2.71 | value_1 | "[1,2,3]" | test 123 |
| /path/file3.txt | syn12455 | zzz | 3.52 | value_3 | "[42,56,77]" | a single annotation |
| path | parentId | annot1 | annot2 | annot3 | annot4 | annot5 | annot6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| /path/file1.txt | syn1243 | bar | 3.1415 | "aaaa, bbbb" | "[14,27,30]" | "Annotation, with a comma" | "True" |
| /path/file2.txt | syn12433 | baz | 2.71 | value_1 | "[1,2,3]" | string without commas | "[True,False]" |
| /path/file3.txt | syn12455 | zzz | 3.52 | value_3 | "[42,56,77]" | a_single_string | |
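
The quoting shown in the table does not have to be written by hand. As a rough sketch, Python's standard `csv` module quotes any value that contains a comma when it writes a row. The rows below copy a few columns from the table; how Synapse interprets a bracketed value such as `[14,27,30]` is defined by the manifest documentation, not by this snippet.

```python
import csv
import io

# A subset of the columns from the first two table rows, as plain Python strings.
rows = [
    {
        "path": "/path/file1.txt",
        "parentId": "syn1243",
        "annot1": "bar",
        "annot3": "aaaa, bbbb",
        "annot4": "[14,27,30]",
    },
    {
        "path": "/path/file2.txt",
        "parentId": "syn12433",
        "annot1": "baz",
        "annot3": "value_1",
        "annot4": "[1,2,3]",
    },
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)  # values containing commas are quoted automatically
manifest_text = buffer.getvalue()
print(manifest_text)

# Reading the text back recovers the original strings, commas included.
parsed = list(csv.DictReader(io.StringIO(manifest_text)))
```

Values without commas, such as `value_1`, are written unquoted, which matches the mixed quoting in the table.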

#### Multiple values of annotations per key

76 changes: 52 additions & 24 deletions docs/tutorials/python/download_data_in_bulk.md
@@ -28,6 +28,7 @@ With a project that has this example layout:
In this tutorial you will:

1. Download all files/folders from a project
1. Control manifest CSV generation during download
1. Download all files/folders for a specific folder within the project
1. Loop over all files/folders on the project/folder object instances

@@ -44,48 +45,75 @@ another desired directory exists.

#### First let's set up some constants we'll use in this script
```python
{!docs/tutorials/python/tutorial_scripts/download_data_in_bulk.py!lines=5-19}
--8<-- "docs/tutorials/python/tutorial_scripts/download_data_in_bulk.py:setup"
```

#### Next we'll create an instance of the Project we are going to sync
```python
{!docs/tutorials/python/tutorial_scripts/download_data_in_bulk.py!lines=20-22}
--8<-- "docs/tutorials/python/tutorial_scripts/download_data_in_bulk.py:get_project"
```

#### Finally we'll sync the project from Synapse to your local machine
```python
{!docs/tutorials/python/tutorial_scripts/download_data_in_bulk.py!lines=23-28}
--8<-- "docs/tutorials/python/tutorial_scripts/download_data_in_bulk.py:sync_project"
```

<details class="example">
<summary>While syncing your project you'll see results like:</summary>
```
Syncing Project (syn53185532:My uniquely named project about Alzheimer's Disease) from Synapse.
Syncing Folder (syn53205630:experiment_notes) from Synapse.
Syncing Folder (syn53205632:notes_2022) from Synapse.
Syncing Folder (syn53205629:single_cell_RNAseq_batch_1) from Synapse.
Syncing Folder (syn53205656:single_cell_RNAseq_batch_2) from Synapse.
Syncing Folder (syn53205631:notes_2023) from Synapse.
Downloading [####################]100.00% 4.0bytes/4.0bytes (1.8kB/s) fileA.txt Done...
Downloading [####################]100.00% 3.0bytes/3.0bytes (1.1kB/s) SRR92345678_R1.fastq.gz Done...
Downloading [####################]100.00% 4.0bytes/4.0bytes (1.7kB/s) SRR12345678_R1.fastq.gz Done...
Downloading [####################]100.00% 4.0bytes/4.0bytes (1.9kB/s) fileC.txt Done...
Downloading [####################]100.00% 4.0bytes/4.0bytes (2.7kB/s) fileB.txt Done...
Downloading [####################]100.00% 4.0bytes/4.0bytes (2.7kB/s) SRR12345678_R2.fastq.gz Done...
Downloading [####################]100.00% 4.0bytes/4.0bytes (2.6kB/s) SRR12345678_R2.fastq.gz Done...
Downloading [####################]100.00% 4.0bytes/4.0bytes (1.8kB/s) SRR12345678_R1.fastq.gz Done...
Downloading [####################]100.00% 3.0bytes/3.0bytes (1.5kB/s) SRR92345678_R2.fastq.gz Done...
Downloading [####################]100.00% 4.0bytes/4.0bytes (1.6kB/s) fileD.txt Done...
['single_cell_RNAseq_batch_2', 'single_cell_RNAseq_batch_1', 'experiment_notes']
[syn74583648:My uniquely named project about Alzheimer's Disease]: Syncing Project from Synapse.
[syn74584000:biospecimen_experiment_1]: Syncing Folder from Synapse.
[syn74584007:single_cell_RNAseq_batch_2]: Syncing Folder from Synapse.
[syn74584001:biospecimen_experiment_2]: Syncing Folder from Synapse.
[syn74584006:single_cell_RNAseq_batch_1]: Syncing Folder from Synapse.
[syn74584146]: Downloaded to <your_DIRECTORY_TO_SYNC_PROJECT_TO>/biospecimen_experiment_1/fileB.png
[syn74584154]: Downloaded to <your_DIRECTORY_TO_SYNC_PROJECT_TO>/biospecimen_experiment_2/fileD.png
[syn74584155]: Downloaded to <your_DIRECTORY_TO_SYNC_PROJECT_TO>/biospecimen_experiment_2/fileC.png
[syn74584188]: Downloaded to <your_DIRECTORY_TO_SYNC_PROJECT_TO>/single_cell_RNAseq_batch_1/SRR12345678_R1.fastq.png
[syn74584147]: Downloaded to <your_DIRECTORY_TO_SYNC_PROJECT_TO>/biospecimen_experiment_1/fileA.png
[syn74584206]: Downloaded to <your_DIRECTORY_TO_SYNC_PROJECT_TO>/single_cell_RNAseq_batch_2/SRR12345678_R1.fastq.png
[syn74584189]: Downloaded to <your_DIRECTORY_TO_SYNC_PROJECT_TO>/single_cell_RNAseq_batch_1/SRR12345678_R2.fastq.png
[syn74584207]: Downloaded to <your_DIRECTORY_TO_SYNC_PROJECT_TO>/single_cell_RNAseq_batch_2/SRR12345678_R2.fastq.png
Downloading files: 100%|████████████████████| 1.31M/1.31M [00:02<00:00, 606kB/s]
Project(id='syn74583648', name="My uniquely named project about Alzheimer's Disease", files=[], folders=[
Folder(id='syn74584000', name='biospecimen_experiment_1', parent_id='syn74583648', files=[
File(id='syn74584147', name='fileA.png', path='<DIRECTORY_TO_SYNC_PROJECT_TO>/biospecimen_experiment_1/fileA.png', parent_id='syn74584000', ...),
File(id='syn74584146', name='fileB.png', path='<DIRECTORY_TO_SYNC_PROJECT_TO>/biospecimen_experiment_1/fileB.png', parent_id='syn74584000', ...)
], folders=[], ...),
Folder(id='syn74584001', name='biospecimen_experiment_2', parent_id='syn74583648', files=[
File(id='syn74584155', name='fileC.png', path='<DIRECTORY_TO_SYNC_PROJECT_TO>/biospecimen_experiment_2/fileC.png', parent_id='syn74584001', ...),
File(id='syn74584154', name='fileD.png', path='<DIRECTORY_TO_SYNC_PROJECT_TO>/biospecimen_experiment_2/fileD.png', parent_id='syn74584001', ...)
], folders=[], ...),
Folder(id='syn74584006', name='single_cell_RNAseq_batch_1', parent_id='syn74583648', files=[
File(id='syn74584188', name='SRR12345678_R1.fastq.png', path='<DIRECTORY_TO_SYNC_PROJECT_TO>/single_cell_RNAseq_batch_1/SRR12345678_R1.fastq.png', parent_id='syn74584006', ...),
File(id='syn74584189', name='SRR12345678_R2.fastq.png', path='<DIRECTORY_TO_SYNC_PROJECT_TO>/single_cell_RNAseq_batch_1/SRR12345678_R2.fastq.png', parent_id='syn74584006', ...)
], folders=[], ...),
Folder(id='syn74584007', name='single_cell_RNAseq_batch_2', parent_id='syn74583648', files=[
File(id='syn74584206', name='SRR12345678_R1.fastq.png', path='<DIRECTORY_TO_SYNC_PROJECT_TO>/single_cell_RNAseq_batch_2/SRR12345678_R1.fastq.png', parent_id='syn74584007', ...),
File(id='syn74584207', name='SRR12345678_R2.fastq.png', path='<DIRECTORY_TO_SYNC_PROJECT_TO>/single_cell_RNAseq_batch_2/SRR12345678_R2.fastq.png', parent_id='syn74584007', ...)
], folders=[], ...)
], ...)
```
</details>

## 2. Download all files/folders for a specific folder within the project
## 2. Control manifest CSV generation during download

By default (`manifest="all"`), `sync_from_synapse` writes a `manifest.csv` into every
synced directory. The `manifest.csv` is interoperable with `sync_to_synapse`, the Synapse UI download cart, and `download_list_files`.

Use `manifest="root"` to write a single manifest at the root path, or
`manifest="suppress"` to skip manifest generation entirely.

```python
--8<-- "docs/tutorials/python/tutorial_scripts/download_data_in_bulk.py:sync_project_with_root_manifest"
```
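
To check which manifests a sync produced, you can list the `manifest.csv` files under the target directory. This is a minimal sketch using only `pathlib`; the helper name `find_manifests` is hypothetical, and the directory tree is a local stand-in so the snippet runs without Synapse. Going by the description above, you would expect one entry per synced directory with `manifest="all"`, a single entry with `manifest="root"`, and none with `manifest="suppress"`.

```python
import tempfile
from pathlib import Path


def find_manifests(sync_root: str) -> list[str]:
    """Return manifest.csv paths under sync_root, relative to it, sorted."""
    root = Path(sync_root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("manifest.csv"))


# Stand-in directory tree built locally; a real sync would create these files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "biospecimen_experiment_1").mkdir()
    (root / "manifest.csv").write_text("path,parentId\n")
    (root / "biospecimen_experiment_1" / "manifest.csv").write_text("path,parentId\n")
    found = find_manifests(tmp)

print(found)
```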

## 3. Download all files/folders for a specific folder within the project

Following the same set of steps, let's sync a specific folder:

```python
{!docs/tutorials/python/tutorial_scripts/download_data_in_bulk.py!lines=30-36}
--8<-- "docs/tutorials/python/tutorial_scripts/download_data_in_bulk.py:sync_folder"
```

<details class="example">
@@ -105,12 +133,12 @@ download the content again. If you were to use an `if_collision` of `"overwrite.
you would see that when the content on your machine does not match Synapse the file
will be overwritten.

## 3. Loop over all files/folders on the project/folder object instances
## 4. Loop over all files/folders on the project/folder object instances
Using `sync_from_synapse` will load into memory the state of all Folders and Files
retrieved from Synapse. This will allow you to loop over the contents of your container.

```python
{!docs/tutorials/python/tutorial_scripts/download_data_in_bulk.py!lines=37-47}
--8<-- "docs/tutorials/python/tutorial_scripts/download_data_in_bulk.py:loop_over_project_folder"
```
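
The tutorial loop visits two levels of folders. For deeper trees, the same idea can be written recursively. The sketch below uses stand-in dataclasses (`FakeFile`, `FakeFolder`) instead of the real `Project` and `Folder` models so that it runs without Synapse; it assumes only the `name`, `files`, and `folders` attributes that appear in the output above. The `walk` helper and the `nested` folder are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class FakeFile:
    name: str


@dataclass
class FakeFolder:
    name: str
    files: List[FakeFile] = field(default_factory=list)
    folders: List["FakeFolder"] = field(default_factory=list)


def walk(container, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (kind, path) for every file and folder below container, depth-first."""
    for file in container.files:
        yield "file", f"{prefix}{file.name}"
    for folder in container.folders:
        yield "folder", f"{prefix}{folder.name}"
        # Recurse with the folder name appended to the path prefix.
        yield from walk(folder, prefix=f"{prefix}{folder.name}/")


# Stand-in for a synced project; "nested" is a made-up third level.
project = FakeFolder(
    name="project",
    folders=[
        FakeFolder(
            name="biospecimen_experiment_1",
            files=[FakeFile("fileA.png"), FakeFile("fileB.png")],
        ),
        FakeFolder(
            name="single_cell_RNAseq_batch_1",
            folders=[FakeFolder(name="nested", files=[FakeFile("x.txt")])],
        ),
    ],
)

entries = list(walk(project))
for kind, path in entries:
    print(f"{kind}: {path}")
```

Because `walk` relies only on attribute names, it should accept any object that exposes `files` and `folders` lists, though that has not been verified here against the real models.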

<details class="example">
38 changes: 31 additions & 7 deletions docs/tutorials/python/tutorial_scripts/download_data_in_bulk.py
@@ -2,6 +2,7 @@
Here is where you'll find the code for the downloading data in bulk tutorial.
"""

# --8<-- [start:setup]
import os

import synapseclient
@@ -16,32 +17,55 @@
DIRECTORY_TO_SYNC_FOLDER_TO = os.path.join(
    DIRECTORY_TO_SYNC_PROJECT_TO, FOLDER_NAME_TO_SYNC
)
# --8<-- [end:setup]

# Step 1: Create an instance of the container I want to sync the data from and sync
project = Project(name="My uniquely named project about Alzheimer's Disease")

# We'll set the `if_collision` to `keep.local` so that we don't overwrite any files
# Step 1: Get an instance of the container I want to sync the data from and sync
# --8<-- [start:get_project]
project = Project(name="My uniquely named project about Alzheimer's Disease").get()
# --8<-- [end:get_project]

# By default, sync_from_synapse generates a manifest.csv in each synced directory.
# The manifest.csv is interoperable with sync_to_synapse, the Synapse
# UI download cart, and `download_list_files`.
# --8<-- [start:sync_project]
# We'll set the `if_collision` to `keep.local` so that we don't overwrite any files.
project.sync_from_synapse(path=DIRECTORY_TO_SYNC_PROJECT_TO, if_collision="keep.local")

# Print out the contents of the directory where the data was synced to
# Explore the directory to see the contents have been recursively synced.
print(os.listdir(DIRECTORY_TO_SYNC_PROJECT_TO))
# --8<-- [end:sync_project]
# Or, use `manifest="root"` to generate a single manifest.csv at the root directory
# instead of one in each sub-directory. Use `manifest="suppress"` to skip
# manifest generation entirely.

# --8<-- [start:sync_project_with_root_manifest]
project.sync_from_synapse(
    path=DIRECTORY_TO_SYNC_PROJECT_TO,
    if_collision="keep.local",
    manifest="root",
)
print(os.listdir(DIRECTORY_TO_SYNC_PROJECT_TO))
# --8<-- [end:sync_project_with_root_manifest]

# Step 2: The same as step 1, but for a single folder
# Step 3: The same as step 1, but for a single folder
# --8<-- [start:sync_folder]
folder = Folder(name=FOLDER_NAME_TO_SYNC, parent_id=project.id)

folder.sync_from_synapse(path=DIRECTORY_TO_SYNC_FOLDER_TO, if_collision="keep.local")

print(os.listdir(os.path.expanduser(DIRECTORY_TO_SYNC_FOLDER_TO)))
# --8<-- [end:sync_folder]

# Step 3: Loop over all files/folders on the project/folder object instances
# Step 4: Loop over all files/folders on the project/folder object instances
# --8<-- [start:loop_over_project_folder]
for folder_at_root in project.folders:
    print(f"Folder at root: {folder_at_root.name}")

    for file_in_root_folder in folder_at_root.files:
        print(f"File in {folder_at_root.name}: {file_in_root_folder.name}")

    for folder_in_folder in folder_at_root.folders:
        print(f"Folder in {folder_at_root.name}: {folder_in_folder.name}")
        for file_in_folder in folder_in_folder.files:
            print(f"File in {folder_in_folder.name}: {file_in_folder.name}")
# --8<-- [end:loop_over_project_folder]
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -164,7 +164,7 @@ nav:
- Domain Models of Synapse: explanations/domain_models_of_synapse.md
- Access Control: explanations/access_control.md
- Properties vs Annotations: explanations/properties_vs_annotations.md
- Manifest TSV: explanations/manifest_tsv.md
- Manifest CSV: explanations/manifest_csv.md
- Benchmarking: explanations/benchmarking.md
- Structuring Your Project: explanations/structuring_your_project.md
- Asyncio Changes in Python 3.14: explanations/asyncio_in_python_3_14.md