Manifest CSV

The manifest is a CSV file with file locations and metadata used to bulk upload and download files in Synapse. It is the standard manifest format used by Project.sync_from_synapse, Project.sync_to_synapse, Folder.sync_from_synapse, Folder.sync_to_synapse, the Synapse UI download cart, and the synapse get-download-list CLI command.

!!! note
    This CSV manifest replaces the legacy TSV manifest produced by synapseutils.syncFromSynapse. The syncFromSynapse and syncToSynapse utility functions are deprecated and will be removed in v5.0.0. Use Project.sync_from_synapse / Folder.sync_from_synapse and Project.sync_to_synapse / Folder.sync_to_synapse instead. See the legacy TSV manifest documentation for details on the old format.

Manifest file format

The manifest is a comma-separated values (CSV) file with one row per file and one column per field. Uploading requires at least the path and parentId columns, where path is the local file path and parentId is the Synapse ID of the project or folder that the file is uploaded to. Values that contain commas are automatically quoted (e.g., "hello, world").

Required fields for upload

| Field | Meaning | Example |
| --- | --- | --- |
| path | local file path or URL | /path/to/local/file.txt |
| parentId | Synapse ID of parent | syn1235 |
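As a rough illustration of the format, the sketch below writes a minimal manifest with Python's standard csv module. The rows, paths, and Synapse IDs are placeholders, and the `note` column is a hypothetical annotation added only to show how a value containing a comma gets quoted. This is not how the Synapse client itself produces manifests.

```python
import csv
import io

# Placeholder rows: the paths, Synapse IDs, and the "note" annotation
# column are illustrative assumptions, not values from a real project.
rows = [
    {"path": "/path/to/local/file.txt", "parentId": "syn1235", "note": "hello, world"},
    {"path": "/path/to/local/other.txt", "parentId": "syn1235", "note": "plain"},
]

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["path", "parentId", "note"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)

# The csv module quotes only the values that need it (here, the one with a comma).
manifest_text = buffer.getvalue()
print(manifest_text)
```

To produce a file on disk, the same writer can be pointed at a file opened with `newline=""` instead of the in-memory buffer.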

!!! note
    The legacy TSV manifest used the columns parent and id, while the CSV manifest uses parentId and ID to align with Synapse REST API field names. If you’re migrating a TSV manifest to CSV, you’ll need to rename parent to parentId and id to ID.
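The column rename described in the note can be sketched with the standard csv module. The helper name `tsv_manifest_to_csv` is hypothetical, and the sketch covers only the delimiter change and the two renamed columns; any other differences between the legacy and CSV formats are not handled here.

```python
import csv
import io

# Column renames stated in the note above.
RENAMES = {"parent": "parentId", "id": "ID"}


def tsv_manifest_to_csv(tsv_text: str) -> str:
    """Convert tab-separated manifest text to CSV, renaming parent and id."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return ""
    # Rename only the header cells; data rows are copied unchanged.
    header = [RENAMES.get(column, column) for column in rows[0]]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[1:])
    return out.getvalue()


# Placeholder legacy manifest with one row.
legacy = "path\tparent\tid\n/path/file1.txt\tsyn1243\tsyn5001\n"
print(tsv_manifest_to_csv(legacy))
```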

Standard fields

These columns are recognized by sync_to_synapse and have specific meanings. Any of them may appear in the manifest, but only path and parentId are required for upload. Each example in the table below shows the value a single row would hold in that column. In the used and executed columns, multiple items are separated by semicolons: "syn1235;/path/to_local/file.txt" states that both "syn1235" and "/path/to_local/file.txt" were used to generate the file. To specify a single item, give it on its own, for example "syn1234".

| Field | Meaning | Example |
| --- | --- | --- |
| path | local file path or URL | /path/to/local/file.txt |
| parentId | Synapse ID of parent container | syn1235 |
| ID | Synapse entity ID | syn2345 |
| name | name of file in Synapse | Example_file |
| synapseStore | whether to upload the file | True |
| contentType | content type of file to overwrite defaults | text/html |
| forceVersion | whether to update version | False |
| activityName | name of activity in provenance | Ran normalization |
| activityDescription | text description of what was done | Ran algorithm xyz with parameters... |
| used | list of items used to generate file | syn1235;/path/to_local/file.txt |
| executed | list of items executed | https://github.org/;/path/to_local/code.py |
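The used and executed examples in the table pack several items into one cell. A minimal sketch of reading such a cell, assuming items are separated by semicolons exactly as in those examples (the helper name `split_provenance` is hypothetical and is not part of the Synapse client):

```python
def split_provenance(cell: str) -> list[str]:
    """Split a used/executed cell into its individual items."""
    # Drop empty pieces so a blank cell yields an empty list.
    return [item.strip() for item in cell.split(";") if item.strip()]


print(split_provenance("syn1235;/path/to_local/file.txt"))
print(split_provenance("syn1234"))
```

Each resulting item is either a Synapse ID, a local path, or a URL; telling those apart is left out of this sketch.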

Metadata fields (ignored during upload)

These columns are present in manifests generated by the Synapse UI download cart and synapse get-download-list CLI. They are ignored by sync_to_synapse and are not treated as annotations.

| Field | Meaning |
| --- | --- |
| error | any error in downloading file |
| versionNumber | version of the file |
| dataFileSizeBytes | size of the file in bytes |
| createdBy | user who created the file |
| createdOn | date the file was created |
| modifiedBy | user who last modified |
| modifiedOn | date last modified |
| synapseURL | URL to the file in Synapse |
| dataFileMD5Hex | MD5 hash of the file |

Annotations

Any columns that are not in the standard or metadata fields described above will be interpreted as annotations of the file.

Adding annotations to each row:

| path | parentId | annot1 | annot2 | annot3 | annot4 | annot5 | annot6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| /path/file1.txt | syn1243 | bar | 3.1415 | "aaaa, bbbb" | "[14,27,30]" | "Annotation, with a comma" | "True" |
| /path/file2.txt | syn12433 | baz | 2.71 | value_1 | "[1,2,3]" | string without commas | "[True,False]" |
| /path/file3.txt | syn12455 | zzz | 3.52 | value_3 | "[42,56,77]" | a_single_string | |
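The rule that any column outside the standard and metadata fields is treated as an annotation can be sketched as a simple set lookup. The field names are copied from the tables in this document; the function name `annotation_columns` is hypothetical and the real client may apply additional checks.

```python
# Field names taken from the "Standard fields" table.
STANDARD_FIELDS = {
    "path", "parentId", "ID", "name", "synapseStore", "contentType",
    "forceVersion", "activityName", "activityDescription", "used", "executed",
}

# Field names taken from the "Metadata fields" table.
METADATA_FIELDS = {
    "error", "versionNumber", "dataFileSizeBytes", "createdBy", "createdOn",
    "modifiedBy", "modifiedOn", "synapseURL", "dataFileMD5Hex",
}


def annotation_columns(header: list[str]) -> list[str]:
    """Return the header columns that would be read as annotations."""
    reserved = STANDARD_FIELDS | METADATA_FIELDS
    return [column for column in header if column not in reserved]


print(annotation_columns(["path", "parentId", "annot1", "createdBy", "used", "annot2"]))
```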

Multiple values of annotations per key

Use multiple values for a single annotation sparingly, because they make the data harder to manage. However, they are supported.

Annotations can be comma-separated lists enclosed in square brackets [].

Because the manifest is a CSV file, multi-value annotations that contain commas are automatically quoted. For example, [a,b,c] will appear in the CSV as "[a,b,c]".

This is an annotation with 3 values:

| path | parentId | annot1 |
| --- | --- | --- |
| /path/file1.txt | syn1243 | "[a,b,c]" |
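To show how the CSV quoting and the bracket syntax fit together, the sketch below reads a one-row manifest and splits a bracketed cell into its values. The `parse_annotation` helper is a hypothetical, simplified parser: it keeps every value as a string and does not handle values that themselves contain commas, so it should not be read as the client's actual parsing logic.

```python
import csv
import io


def parse_annotation(cell: str) -> list[str]:
    """Return the values of an annotation cell as a list of strings."""
    text = cell.strip()
    if text.startswith("[") and text.endswith("]"):
        inner = text[1:-1]
        # Simplification: split on every comma inside the brackets.
        return [part.strip() for part in inner.split(",")] if inner.strip() else []
    return [text]


# The csv reader removes the surrounding quotes, leaving "[a,b,c]" in the cell.
manifest_text = 'path,parentId,annot1\n/path/file1.txt,syn1243,"[a,b,c]"\n'
row = next(csv.DictReader(io.StringIO(manifest_text)))
values = parse_annotation(row["annot1"])
print(values)
```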

Dates in the manifest file

Dates in the manifest file are always written in ISO 8601 format, in UTC and without milliseconds. For example: 2023-12-20T16:55:08Z.

Other ISO 8601 formats are recognized on input. For example, a datetime with an explicit UTC offset such as 2023-12-20 23:55:08-07:00 is accepted as a valid datetime. On output, however, sync_from_synapse always writes dates in the UTC format shown above.
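The normalization described above can be approximated with the standard library. This is a sketch under two assumptions: the helper name `to_manifest_date` is hypothetical, and a value without any offset is treated as UTC, which the document does not specify. Note also that `datetime.fromisoformat` accepts only a subset of ISO 8601 before Python 3.11 (for instance, a trailing `Z` is rejected on older versions).

```python
from datetime import datetime, timezone


def to_manifest_date(value: str) -> str:
    """Convert an ISO 8601 datetime string to UTC without sub-second precision."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    in_utc = parsed.astimezone(timezone.utc).replace(microsecond=0)
    return in_utc.strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_manifest_date("2023-12-20 23:55:08-07:00"))  # 2023-12-21T06:55:08Z
```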

Manifest sources

The CSV manifest format is shared across multiple tools:

| Source | Filename |
| --- | --- |
| Project.sync_from_synapse / Folder.sync_from_synapse | manifest.csv |
| Synapse UI download cart | manifest.csv |
| CLI synapse get-download-list | manifest_<timestamp>.csv |

A manifest generated by any of these sources can be used as input to sync_to_synapse, provided the path column is present with valid local file paths. Manifests from the Synapse UI do not include a path column by default, so users must add it before uploading.
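Before reusing a downloaded manifest for upload, it can help to check that the required columns are present. The sketch below inspects only the header row; the function name `missing_upload_columns` and the sample manifest are hypothetical, and the check does not verify that the paths actually exist on disk.

```python
import csv
import io

# Columns this document lists as required for upload.
REQUIRED_FOR_UPLOAD = ("path", "parentId")


def missing_upload_columns(manifest_text: str) -> list[str]:
    """Return the required upload columns absent from the manifest header."""
    header = next(csv.reader(io.StringIO(manifest_text)), [])
    return [column for column in REQUIRED_FOR_UPLOAD if column not in header]


# Placeholder manifest without a path column, as a UI download might produce.
ui_manifest = "ID,name,parentId\nsyn5001,file1.txt,syn1243\n"
print(missing_upload_columns(ui_manifest))
```

An empty list means the header has both required columns; anything else names the columns to add before uploading.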

Example manifest file

| path | parentId | ID | name | annot1 | annot2 | collection_date | used | executed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /path/file1.txt | syn1243 | syn5001 | file1.txt | bar | 3.1415 | 2023-12-04T07:00:00Z | syn124;/path/file2.txt | https://github.org/foo/bar |
| /path/file2.txt | syn12433 | syn5002 | file2.txt | baz | 2.71 | 2001-01-01T08:00:00Z | | https://github.org/foo/baz |
| /path/file3.txt | syn12455 | syn5003 | file3.txt | zzz | 3.52 | 2023-12-04T07:00:00Z | | https://github.org/foo/zzz |

References

  • [Project.sync_from_synapse][synapseclient.models.Project.sync_from_synapse]
  • [Project.sync_to_synapse][synapseclient.models.Project.sync_to_synapse]
  • [Folder.sync_from_synapse][synapseclient.models.Folder.sync_from_synapse]
  • [Folder.sync_to_synapse][synapseclient.models.Folder.sync_to_synapse]
  • Manifest TSV (legacy)
  • Managing custom metadata at scale