Commit 322cf42

Merge pull request #1014 from Sage-Bionetworks/develop
Release 22.11.2
2 parents 8a27c0c + b48329c commit 322cf42

18 files changed: +1094 -93 lines changed
.github/workflows/test.yml

Lines changed: 1 addition & 1 deletion
@@ -118,7 +118,7 @@ jobs:
         run: >
           source .venv/bin/activate;
           pytest --cov-report=term --cov-report=html:htmlcov --cov=schematic/
-          -m "not (google_credentials_needed or rule_combos)"
+          -m "not (google_credentials_needed or rule_combos or schematic_api)"
 
       - name: Upload pytest test results
         uses: actions/upload-artifact@v2
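
Note on the marker expression: `-m "not (google_credentials_needed or rule_combos or schematic_api)"` deselects any test that carries at least one of the three markers. The sketch below models that selection rule in plain Python. It is not pytest's implementation, and `is_selected` is a hypothetical helper name used only for illustration.

```python
# Markers excluded by the CI expression in test.yml after this change.
EXCLUDED_MARKERS = frozenset(
    {"google_credentials_needed", "rule_combos", "schematic_api"}
)


def is_selected(test_markers):
    """Return True if a test with these markers would still run under
    `-m "not (google_credentials_needed or rule_combos or schematic_api)"`.

    Hypothetical helper: it models the boolean expression only.
    """
    # "not (a or b or c)" holds exactly when none of a, b, c is present.
    return not (set(test_markers) & EXCLUDED_MARKERS)


if __name__ == "__main__":
    print(is_selected([]))                 # unmarked test
    print(is_selected(["schematic_api"]))  # test that needs a local API
```

Under this model, tests tagged `schematic_api` are skipped on CI alongside the two previously excluded groups, while unmarked tests are unaffected.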

README.md

Lines changed: 28 additions & 8 deletions
@@ -74,7 +74,7 @@ This command will install the dependencies based on what we specify in poetry.lo
 5. Fill in credential files:
 *Note*: If you won't interact with Synapse, please ignore this section.
 
-There are two main configuration files that need to be edited :
+There are two main configuration files that need to be edited:
 [config.yml](https://github.com/Sage-Bionetworks/schematic/blob/develop/config.yml)
 and [synapseConfig](https://raw.githubusercontent.com/Sage-Bionetworks/synapsePythonClient/v2.3.0-rc/synapseclient/.synapseConfig)
 
@@ -88,6 +88,8 @@ editor of your choice and edit the `username` and `authtoken` attribute under th
 
 <strong>Configure config.yml File</strong>
 
+*Note*: Below is only a brief explanation of some attributes in `config.yml`. <strong>Please use the link [here](https://github.com/Sage-Bionetworks/schematic/blob/develop/config.yml) to get the latest version of `config.yml` in `develop` branch</strong>.
+
 Description of `config.yml` attributes
 
     definitions:
@@ -104,20 +106,39 @@ Description of `config.yml` attributes
         service_acct_creds: "syn25171627" # synapse ID of service_account_creds.json file
 
     manifest:
-        title: "Patient Manifest " # title of metadata manifest file
-        data_type: "Patient" # component or data type from the data model
+        title: "example" # title of metadata manifest file
+        # to make all manifests enter only 'all manifests'
+        data_type:
+            - "Biospecimen"
+            - "Patient"
 
     model:
         input:
             location: "data/schema_org_schemas/example.jsonld" # path to JSON-LD data model
             file_type: "local" # only type "local" is supported currently
-            validation_schema: "~/path/to/validation_schema.json" # path to custom JSON Validation Schema JSON file
-            log_location: "~/path/to/log_folder/validation_schema.json" # auto-generated JSON Validation Schemas can be logged
-
+    style: # configuration of google sheet
+        google_manifest:
+            req_bg_color:
+                red: 0.9215
+                green: 0.9725
+                blue: 0.9803
+            opt_bg_color:
+                red: 1.0
+                green: 1.0
+                blue: 0.9019
+            master_template_id: '1LYS5qE4nV9jzcYw5sXwCza25slDfRA1CIg3cs-hCdpU'
+            strict_validation: true
 
 *Note*: Paths can be specified relative to the `config.yml` file or as absolute paths.
 
-6. Obtain Google credential Files
+6. Login to Synapse by using the command line
+On the CLI in your virtual environment, run the following command:
+```
+synapse login -u <synapse username> -p <synapse password> --rememberMe
+```
+Please make sure that you run the command before running `schematic init` below
+
+7. Obtain Google credential Files
 
 To obtain ``credentials.json`` and ``token.pickle``, please run:
 
@@ -152,7 +173,6 @@ requires token-based authentication. As browser support that requires the token-
 token-based authentication and keep only service account authentication in the future.
 
 
-
 ### Development process instruction
 
 For new features, bugs, enhancements
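
Note on `manifest.data_type`: the README hunk above changes the setting from a single string to a list and says that entering only `'all manifests'` requests every manifest. Below is a minimal sketch of how a consumer of that setting might normalize it, assuming the value can arrive as either a string or a list. `normalize_data_types` and `ALL_MANIFESTS` are hypothetical names, not functions or constants taken from schematic.

```python
# Sentinel value described in the README comment (assumed to be matched exactly).
ALL_MANIFESTS = "all manifests"


def normalize_data_types(value):
    """Return (data_types, wants_all) for a config `data_type` value.

    Hypothetical helper: accepts the old single-string form and the new
    list form, and flags the "only 'all manifests'" case.
    """
    # Accept the pre-change form, e.g. data_type: "Patient".
    if isinstance(value, str):
        value = [value]
    data_types = list(value)
    # "All manifests" applies only when it is the sole entry.
    wants_all = len(data_types) == 1 and data_types[0] == ALL_MANIFESTS
    return data_types, wants_all


if __name__ == "__main__":
    print(normalize_data_types(["Biospecimen", "Patient"]))
    print(normalize_data_types("all manifests"))
```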

api/openapi/api.yaml

Lines changed: 20 additions & 2 deletions
@@ -79,11 +79,22 @@ paths:
             nullable: true
           description: ID of view listing all project data assets. E.g. for Synapse this would be the Synapse ID of the fileview listing all data assets for a given project.(i.e. master_fileview in config.yml)
           required: false
+        - in: query
+          name: output_format
+          schema:
+            type: string
+            enum: ["excel", "google_sheet", "dataframe (only if getting existing manifests)"]
+          description: If "excel" gets selected, this approach would avoid sending metadata to Google sheet APIs; if "google_sheet" gets selected, this would return a Google sheet URL. This parameter could potentially override sheet_url parameter.
+          required: false
       operationId: api.routes.get_manifest_route
       responses:
-        "201":
-          description: Googlesheet link created
+        "200":
+          description: Googlesheet link created OR an excel file gets returned OR pandas dataframe gets returned
           content:
+            application/vnd.ms-excel:
+              schema:
+                type: string
+                format: binary
             application/json:
               schema:
                 type: string
@@ -381,6 +392,13 @@ paths:
           description: Title of Manifest
           example: Example
           required: false
+        - in: query
+          name: return_excel
+          schema:
+            type: boolean
+            nullable: true
+          description: If true, this would return an Excel spreadsheet.(This approach would avoid sending metadata to Google sheet APIs)
+          required: false
       operationId: api.routes.populate_manifest_route
       responses:
         "200":
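
Note on calling the new `output_format` parameter: one enum value, `"dataframe (only if getting existing manifests)"`, contains spaces and parentheses, so a client has to URL-encode it. The sketch below builds such a query string with the standard library. The base URL and path are placeholders assumed for illustration (the diff does not show them), and `build_manifest_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: placeholder host and path, not taken from this diff.
BASE_URL = "http://localhost:3001/v1/manifest/generate"


def build_manifest_url(schema_url, output_format="excel", title=None):
    """Build a GET URL using query parameters named in api.yaml and routes.py.

    Hypothetical helper; only parameter names visible in the diff are used.
    """
    params = {
        "schema_url": schema_url,
        "oauth": "true",
        "use_annotations": "false",
        "output_format": output_format,
    }
    # title is optional in the new route signature.
    if title is not None:
        params["title"] = title
    # urlencode percent-encodes reserved characters and turns spaces into "+".
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_manifest_url("https://example.org/model.jsonld"))
```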

api/routes.py

Lines changed: 59 additions & 18 deletions
@@ -7,7 +7,7 @@
 
 import connexion
 from connexion.decorators.uri_parsing import Swagger2URIParser
-from flask import current_app as app, request, g, jsonify
+from flask import current_app as app
 from werkzeug.debug import DebuggedApplication
 
 from schematic import CONFIG
@@ -25,6 +25,7 @@
 import json
 from schematic.utils.df_utils import load_df
 import pickle
+from flask import send_from_directory
 
 # def before_request(var1, var2):
 # # Do stuff before your route executes
@@ -196,7 +197,20 @@ def get_temp_jsonld(schema_url):
     return tmp_file.name
 
 # @before_request
-def get_manifest_route(schema_url, title, oauth, use_annotations, dataset_ids=None, asset_view = None):
+def get_manifest_route(schema_url: str, oauth: bool, use_annotations: bool, dataset_ids=None, asset_view = None, output_format=None, title=None):
+    """Get the immediate dependencies that are related to a given source node.
+        Args:
+            schema_url: link to data model in json ld format
+            title: title of a given manifest.
+            oauth: if user wants to use OAuth for Google authentication
+            dataset_id: Synapse ID of the "dataset" entity on Synapse (for a given center/project).
+            output_format: contains three option: "excel", "google_sheet", and "dataframe". if set to "excel", return an excel spreadsheet
+            use_annotations: Whether to use existing annotations during manifest generation
+            asset_view: ID of view listing all project data assets. For example, for Synapse this would be the Synapse ID of the fileview listing all data assets for a given project.
+        Returns:
+            Googlesheet URL (if sheet_url is True), or pandas dataframe (if sheet_url is False).
+    """
+
     # call config_handler()
     config_handler(asset_view = asset_view)
 
@@ -238,20 +252,32 @@ def get_manifest_route(schema_url, title, oauth, use_annotations, dataset_ids=No
         )
 
 
-    def create_single_manifest(data_type, dataset_id=None):
+    def create_single_manifest(data_type, title, dataset_id=None, output_format=None):
         # create object of type ManifestGenerator
         manifest_generator = ManifestGenerator(
             path_to_json_ld=jsonld,
-            title=t,
+            title=title,
             root=data_type,
             oauth=oauth,
             use_annotations=use_annotations,
             alphabetize_valid_values = 'ascending',
         )
 
+        # if returning a dataframe
+        if output_format:
+            if "dataframe" in output_format:
+                output_format = "dataframe"
+
         result = manifest_generator.get_manifest(
-            dataset_id=dataset_id, sheet_url=True,
+            dataset_id=dataset_id, sheet_url=True, output_format=output_format
         )
+
+        # return an excel file if output_format is set to "excel"
+        if output_format == "excel":
+            dir_name = os.path.dirname(result)
+            file_name = os.path.basename(result)
+            mimetype='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
+            return send_from_directory(directory=dir_name, filename=file_name, as_attachment=True, mimetype=mimetype, cache_timeout=0)
 
         return result
 
@@ -262,22 +288,37 @@ def create_single_manifest(data_type, dataset_id=None):
         component_digraph = sg.se.get_digraph_by_edge_type('requiresComponent')
         components = component_digraph.nodes()
         for component in components:
-            t = f'{title}.{component}.manifest'
-            result = create_single_manifest(data_type = component)
-            all_results.append(result)
+            if title:
+                t = f'{title}.{component}.manifest'
+            else:
+                t = f'Example.{component}.manifest'
+            if output_format != "excel":
+                result = create_single_manifest(data_type=component, output_format=output_format, title=t)
+                all_results.append(result)
+            else:
+                app.logger.error('Currently we do not support returning multiple files as Excel format at once. Please choose a different output format. ')
     else:
         for i, dt in enumerate(data_type):
-            if len(data_type) > 1:
-                t = f'{title}.{dt}.manifest'
-            else:
-                t = title
-
+            if not title:
+                t = f'Example.{dt}.manifest'
+            else:
+                if len(data_type) > 1:
+                    t = f'{title}.{dt}.manifest'
+                else:
+                    t = title
             if dataset_ids:
                 # if a dataset_id is provided add this to the function call.
-                result = create_single_manifest(data_type = dt, dataset_id = dataset_ids[i])
+                result = create_single_manifest(data_type=dt, dataset_id=dataset_ids[i], output_format=output_format, title=t)
             else:
-                result = create_single_manifest(data_type = dt)
-            all_results.append(result)
+                result = create_single_manifest(data_type=dt, output_format=output_format, title=t)
+
+            # if output is pandas dataframe or google sheet url
+            if isinstance(result, str) or isinstance(result, pd.DataFrame):
+                all_results.append(result)
+            else:
+                if len(data_type) > 1:
+                    app.logger.warning(f'Currently we do not support returning multiple files as Excel format at once. Only {t} would get returned. ')
+                return result
 
     return all_results
 
@@ -341,7 +382,7 @@ def submit_manifest_route(schema_url, asset_view=None, manifest_record_type=None
 
     return manifest_id
 
-def populate_manifest_route(schema_url, title=None, data_type=None):
+def populate_manifest_route(schema_url, title=None, data_type=None, return_excel=None):
     # call config_handler()
     config_handler()
 
@@ -355,7 +396,7 @@ def populate_manifest_route(schema_url, title=None, data_type=None):
     metadata_model = MetadataModel(inputMModelLocation=jsonld, inputMModelLocationType='local')
 
     #Call populateModelManifest class
-    populated_manifest_link = metadata_model.populateModelManifest(title=title, manifestPath=temp_path, rootNode=data_type)
+    populated_manifest_link = metadata_model.populateModelManifest(title=title, manifestPath=temp_path, rootNode=data_type, return_excel=return_excel)
 
     return populated_manifest_link
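
Note on the title and format handling in `get_manifest_route`: the route now collapses the long `dataframe (...)` enum label to `"dataframe"` and falls back to an `Example.<data_type>.manifest` title when none is supplied. The sketch below isolates those two rules as pure functions so they can be read and tested without Flask. Both function names are hypothetical, and `resolve_title` mirrors only the explicit `data_type` branch, not the component-graph branch.

```python
def normalize_output_format(output_format):
    """Collapse any label containing "dataframe" to plain "dataframe".

    Hypothetical helper mirroring the check added to create_single_manifest.
    """
    if output_format and "dataframe" in output_format:
        return "dataframe"
    return output_format


def resolve_title(title, data_type, all_data_types):
    """Pick the manifest title for one data type (explicit data_type branch).

    Hypothetical helper: falls back to "Example.<data_type>.manifest" when no
    title is given, and appends the data type only when several are requested.
    """
    if not title:
        return f"Example.{data_type}.manifest"
    if len(all_data_types) > 1:
        return f"{title}.{data_type}.manifest"
    return title


if __name__ == "__main__":
    print(normalize_output_format("dataframe (only if getting existing manifests)"))
    print(resolve_title(None, "Patient", ["Patient"]))
```

Keeping rules like these in small functions would also make the Excel special cases easier to cover with unit tests that do not need the `schematic_api` marker.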

pyproject.toml

Lines changed: 5 additions & 1 deletion
@@ -112,6 +112,10 @@ filterwarnings = [
 markers = [
     """\
     google_credentials_needed: marks tests requiring \
-    Google credentials (skipped on GitHub CI)\
+    Google credentials (skipped on GitHub CI) \
     """,
+    """\
+    schematic_api: marks tests requiring \
+    running API locally (skipped on GitHub CI)
+    """
 ]

schematic/manifest/commands.py

Lines changed: 28 additions & 6 deletions
@@ -1,6 +1,6 @@
 import os
 import logging
-
+from pathlib import Path
 import click
 import click_log
 import logging
@@ -12,7 +12,7 @@
 from schematic.help import manifest_commands
 from schematic import CONFIG
 from schematic.schemas.generator import SchemaGenerator
-from schematic.utils.google_api_utils import export_manifest_csv, export_manifest_excel
+from schematic.utils.google_api_utils import export_manifest_csv, export_manifest_excel, export_manifest_drive_service
 from schematic.store.synapse import SynapseStorage
 
 logger = logging.getLogger(__name__)
@@ -147,8 +147,26 @@ def create_single_manifest(data_type, output_csv=None, output_xlsx=None):
         )
 
         # call get_manifest() on manifest_generator
+        # if output_xlsx gets specified, output_format = "excel"
+        if output_xlsx:
+            output_format = "excel"
+
+            # if file name is in the path, and that file does not exist
+            if not os.path.exists(output_xlsx):
+                if ".xlsx" or ".xls" in output_xlsx:
+                    path = Path(output_xlsx)
+                    output_path = path.parent.absolute()
+                else:
+                    logger.error(f"{output_xlsx} does not exists. Please try a valid file path")
+
+            else:
+                output_path = output_xlsx
+        else:
+            output_format = None
+            output_path = None
+
         result = manifest_generator.get_manifest(
-            dataset_id=dataset_id, sheet_url=sheet_url, json_schema=json_schema,
+            dataset_id=dataset_id, sheet_url=sheet_url, json_schema=json_schema, output_format = output_format, output_path = output_path
         )
 
         if sheet_url:
@@ -160,13 +178,13 @@ def create_single_manifest(data_type, output_csv=None, output_xlsx=None):
             if prefix_ext == ".model":
                 prefix = prefix_root
             output_csv = f"{prefix}.{data_type}.manifest.csv"
+
         elif output_xlsx:
-            export_manifest_excel(output_excel=output_xlsx, manifest=result)
             logger.info(
                 f"Find the manifest template using this Excel file path: {output_xlsx}"
             )
             return result
-        export_manifest_csv(file_name=output_csv, manifest=result)
+        export_manifest_csv(file_path=output_csv, manifest=result)
         logger.info(
             f"Find the manifest template using this CSV file path: {output_csv}"
         )
@@ -184,8 +202,12 @@ def create_single_manifest(data_type, output_csv=None, output_xlsx=None):
             result = create_single_manifest(data_type = component)
     else:
         for dt in data_type:
-            if len(data_type) > 1:
+            if len(data_type) > 1 and not output_xlsx:
                 t = f'{title}.{dt}.manifest'
+            elif output_xlsx:
+                if ".xlsx" or ".xls" in output_xlsx:
+                    title_with_extension = os.path.basename(output_xlsx)
+                    t = title_with_extension.split('.')[0]
             else:
                 t = title
             result = create_single_manifest(data_type = dt, output_csv=output_csv, output_xlsx=output_xlsx)
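
Review note on `if ".xlsx" or ".xls" in output_xlsx:` (added twice in this file): Python parses the condition as `".xlsx" or (".xls" in output_xlsx)`, and the non-empty string `".xlsx"` is always truthy, so the `else` branch that logs an error can never run. Separately, `title_with_extension.split('.')[0]` stops at the first dot, which truncates file names such as `my.manifest.xlsx`. The sketch below shows one possible alternative using `pathlib`, presumably closer to the intended check. The helper names are hypothetical and this is not the project's eventual fix.

```python
from pathlib import Path

# Suffixes treated as Excel files (assumption based on the strings in the diff).
EXCEL_SUFFIXES = {".xlsx", ".xls"}


def check_as_written(output_xlsx):
    """Reproduce the condition from the diff; it ignores its argument's content."""
    return bool(".xlsx" or ".xls" in output_xlsx)


def has_excel_suffix(output_xlsx):
    """Hypothetical replacement: compare the real file suffix, case-insensitively."""
    return Path(output_xlsx).suffix.lower() in EXCEL_SUFFIXES


def title_from_path(output_xlsx):
    """Hypothetical replacement for split('.')[0]: drop only the final suffix."""
    return Path(output_xlsx).stem


if __name__ == "__main__":
    print(check_as_written("notes.txt"), has_excel_suffix("notes.txt"))
    print(title_from_path("out/my.manifest.xlsx"))
```

A suffix comparison also avoids matching `.xls` in the middle of a directory name, which a substring test would accept.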
