[SYNPY-1799, SYNPY-1804] Add manifest generation to sync_from_synapse#1354
[SYNPY-1799, SYNPY-1804] Add manifest generation to sync_from_synapse#1354
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds manifest CSV generation to StorableContainer.sync_from_synapse() so downloaded content can be re-uploaded via sync_to_synapse using an interoperable manifest.csv format (matching Synapse UI download cart / CLI column naming).
Changes:
- Added manifest CSV helpers and integrated manifest generation into
StorableContainer.sync_from_synapse_async()withmanifest={all,root,suppress}. - Deprecated legacy
synapseutils.syncFromSynapse()to steer users toStorableContainer.sync_from_synapse. - Added/updated unit tests around manifest behavior and the deprecation warning.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
synapseclient/models/mixins/storable_container.py |
Adds CSV manifest generation helpers and hooks manifest generation into sync flow. |
synapseclient/models/protocols/storable_container_protocol.py |
Updates protocol signature/docs to include manifest parameter. |
synapseutils/sync.py |
Marks syncFromSynapse as deprecated with guidance to use StorableContainer.sync_from_synapse. |
tests/unit/synapseclient/models/async/unit_test_folder_async.py |
Adds tests for manifest behavior and CSV helper functions. |
tests/unit/synapseutils/unit_test_synapseutils_sync.py |
Adds a test asserting the deprecation warning is emitted. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Thanks @danlu1! I left some comments. You'll also need to update the docs folder:
|
…eneration for recursive calls
| } | ||
| if entity.annotations: | ||
| for key, val in entity.annotations.items(): | ||
| annotation_keys.add(key) |
There was a problem hiding this comment.
Can you remind me, are the keys in the dict right above restricted in Synapse as annotation keys? For example, if a file had an annotation of name this would overwrite the name value in the above dict.
There was a problem hiding this comment.
There was a problem hiding this comment.
I think that Synapse doesn't allow these to be annotation keys in the first place. Pandas definitely doesn't allow multiple columns with the same name. At some point Pandas would change the name of the second name column (name1?) I think. Either way, I'm wondering if we shoudl raise an exception here if that happens.
There was a problem hiding this comment.
Dan will check the reserved annotation keys.
There was a problem hiding this comment.
@andrewelamb We do have property keys setup:
synapsePythonClient/synapseclient/entity.py
Lines 88 to 105 in 8cd7c05
name is one of them. As this doc, properties always take precedence over annotations in lookups. So I think we don't have to worry about the edge case here.
merge upstream changes
| link_hops: int = 1, | ||
| queue: asyncio.Queue = None, | ||
| include_types: Optional[List[str]] = None, | ||
| manifest: Literal["all", "root", "suppress"] = "all", |
There was a problem hiding this comment.
This doesn't match the order of the literal in storable container.py. I would suggest making the literal a global in storable_container.py and then importing it here.
There was a problem hiding this comment.
I did the other way around: define ManifestSetting in the protocol and then import it in the storable_container.py to fix circular import issue.
|
@danlu1 I just had a few suggestions, but it looks good otherwise, so I approved! |
merge upstream changes

Problem:
StorableContainer.sync_from_synapse() does not generate a manifest file after downloading content from Synapse. We want to add manifest generation similar to synapseutils.syncFromSynapse() so users can use that manifest for uploading data via sync_to_synapse.
Solution:
_extract_entity_metadata_for_manifest_csv()— extracts per-file rows and discovers annotation column keys dynamically_write_manifest_data_csv()— writes the CSV using DictWritergenerate_manifest_csv()— public entry point, skips gracefully if path or files are absentstorable_container_protocol.pywas updated to keep the sync and async signatures in alignment.Testing:
Unit tests and integration tests were added.