381 update import script to json payload #17
base: main

Conversation
scripts/api/api.py
Outdated
    headers=headers,
    data=data_file_data,
)
print(r)
I don't think it should print. Maybe a log.info?
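A minimal sketch of that suggestion, assuming the standard logging module; the wrapper name below is illustrative, not from the PR:

```python
import logging

import requests

logger = logging.getLogger(__name__)


def post_data_file(request_url, headers, data_file_data):
    r = requests.post(request_url, headers=headers, data=data_file_data)
    # Log the response instead of printing it, so output honors the
    # configured log level and handlers.
    logger.info("POST %s -> %s", request_url, r.status_code)
    return r
```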
scripts/api/api.py
Outdated
    headers=headers,
    data=metadata_file_data,
)
print(r)
Same as above. Now that I think about it, we probably want to return the status or success of r?
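A hedged sketch combining both comments, reusing the signature from the fuller hunk later in this conversation; `Response.ok` is True for any status below 400:

```python
import logging

import requests


def create_metadata_file(api_url, metadata_file_data):
    r = requests.post(
        api_url + "metadata",
        headers={"content-type": "application/json"},
        data=metadata_file_data,
    )
    logging.info("metadata upload -> %s", r.status_code)
    # Return a simple success signal so callers can react to failures.
    return r.ok
```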
def diverge_files(self, path):
    basename = os.path.basename(path)
    is_data_file = self.DATAFILE.match(basename)
What do you think about calling these vars data_file and metadata_file? It's not a boolean expression, I believe it returns the matched regex or None.
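The `.match()` behavior in question, shown with a simplified stand-in pattern (the real `DATAFILE` regex isn't visible in this hunk):

```python
import re

DATAFILE = re.compile(r"^(?P<name>.+)\.(?P<extension>csv)$")  # stand-in only

data_file = DATAFILE.match("study_day1.csv")
print(type(data_file))              # <class 're.Match'>, not a bool
print(DATAFILE.match("notes.txt"))  # None when the filename doesn't match

if data_file:  # truthiness still works for branching
    print(data_file.groupdict())    # {'name': 'study_day1', 'extension': 'csv'}
```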
is_metadata_file = self.METADATA.match(basename)

if is_data_file:
    file_extension_to_dict = self.DATAFILE.match(basename).groupdict()
Can use the var declared on 26 here.
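A sketch of what reusing that match object (together with the rename suggested above) might look like; the method body is illustrative and assumes the class's `DATAFILE` attribute and `process_data_file` method:

```python
import os


def diverge_files(self, path):
    basename = os.path.basename(path)
    data_file = self.DATAFILE.match(basename)  # renamed per the review
    if data_file:
        # Reuse the match object rather than running the regex again.
        return self.process_data_file(path, data_file.groupdict())
```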
is_metadata_file = self.METADATA.match(basename)

if is_data_file:
    file_extension_to_dict = self.DATAFILE.match(basename).groupdict()
Suggested change:
- file_extension_to_dict = self.DATAFILE.match(basename).groupdict()
+ data_file_matches = self.DATAFILE.match(basename).groupdict()
    return self.process_data_file(path, file_extension_to_dict)
if is_metadata_file:
    metadata_file_extension = self.METADATA.match(basename).groupdict()
Suggested change:
- metadata_file_extension = self.METADATA.match(basename).groupdict()
+ metadata_file_matches = self.METADATA.match(basename).groupdict()
self.data_file_list = data_file_list
self.metadata_file_list = metadata_file_list

def diverge_files(self, path):
Suggested change:
- def diverge_files(self, path):
+ def process_file(self, path):
@Elkrival Is there an example config containing the new URL directive pointing to the API endpoint? I couldn't find it.
Force-pushed from 17a12ba to 7858a39.
@ngmaloney 733d948 this commit updates the example config
build/scripts-3.11/import.py
Outdated
parser = ap.ArgumentParser()
parser.add_argument('-c', '--config')
parser.add_argument('-d', '--dbname', default='dpdata')
parser.add_argument('-v', '--verbose', action='store_true')
parser.add_argument('expr')
args = parser.parse_args()
👍 I'm a fan of the argparse library.
👍
build/scripts-3.11/import.py
Outdated
# dirname = os.path.dirname(f)
# basename = os.path.basename(f)
# # probe for dpdash-compatibility and gather information
# probe = dpimport.probe(f)
# if not probe:
#     logger.debug('document is unknown %s', basename)
#     continue
# # nothing to be done
# if db.exists(probe):
#     logger.info('document exists and is up to date %s', probe['path'])
#     continue
# logger.info('document does not exist or is out of date %s', probe['path'])
# # import the file
# logger.info('importing file %s', f)
# dppylib.import_file(db.db, probe)

# logger.info('cleaning metadata')
# lastday = get_lastday(db.db)
# if lastday:
#     clean_metadata(db.db, lastday)
Suggested change:
- # dirname = os.path.dirname(f)
- # basename = os.path.basename(f)
- # # probe for dpdash-compatibility and gather information
- # probe = dpimport.probe(f)
- # if not probe:
- #     logger.debug('document is unknown %s', basename)
- #     continue
- # # nothing to be done
- # if db.exists(probe):
- #     logger.info('document exists and is up to date %s', probe['path'])
- #     continue
- # logger.info('document does not exist or is out of date %s', probe['path'])
- # # import the file
- # logger.info('importing file %s', f)
- # dppylib.import_file(db.db, probe)
- # logger.info('cleaning metadata')
- # lastday = get_lastday(db.db)
- # if lastday:
- #     clean_metadata(db.db, lastday)
DPimport is a command line tool for importing files into DPdash using a
- simple [`glob`](https://en.wikipedia.org/wiki/Glob_(programming)) expression.
+ simple [`glob`](<https://en.wikipedia.org/wiki/Glob_(programming)>) expression.
Why does this need the <>s? Is it because of the parens in the url?
There were pushes done to main so maybe this was done for a reason. I'm unsure what.
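Most likely, yes: parentheses in a bare Markdown link destination such as `Glob_(programming)` are easy for parsers to misread as the end of the link, and CommonMark allows wrapping the destination in angle brackets (`<...>`) to avoid that. Formatters such as Prettier apply this rewrite automatically, which may be how it landed on main.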
## MongoDB
Is it worth noting something like "This used to require Mongo but doesn't anymore"?
The app still uses mongo, but we don't write to it directly anymore
build/scripts-3.11/import.py
Outdated
logging.basicConfig(level=level)

with open(os.path.expanduser(args.config), 'r') as fo:
    config = yaml.load(fo, Loader=yaml.SafeLoader)
This config isn't used anywhere, is it?
Yes, it's used in the import.py file within the scripts directory.
build/scripts-3.11/import.py
Outdated
studies[subject['_id']['study']] = {}
studies[subject['_id']['study']]['subject'] = []
studies[subject['_id']['study']]['max_day'] = 0
subject['_id']['study'] is referenced a lot in this loop. Feels like it would be worth it to both give it a clear name/reason and also to reduce noise on any given line.
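A sketch of that extraction; `subjects` stands in for the query results iterated by the script, and the dictionary shape follows the hunk above:

```python
studies = {}
for subject in subjects:
    study = subject['_id']['study']  # name the repeated lookup once
    if study not in studies:
        studies[study] = {'subject': [], 'max_day': 0}
```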
build/scripts-3.11/import.py
Outdated
{
    '_id': True,
    'collection': True,
    'synced': True
Does this line request that the sync status be part of the result set?
Not sure, but this code is not part of the work.
655c34d
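For the record, in PyMongo a truthy value in a `find()` projection does include that field in each result (all other fields except `_id` are then excluded). A sketch, with a hypothetical collection name:

```python
from pymongo import MongoClient

db = MongoClient()['dpdata']  # 'dpdata' is the script's default dbname

# 'toc' is a hypothetical collection name, not taken from this PR.
docs = db['toc'].find({}, {'_id': True, 'collection': True, 'synced': True})
```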
build/scripts-3.11/import.py
Outdated
if doc['synced'] is False and 'collection' in doc:
    db[doc['collection']].drop()
Why does it look like you're dropping the whole collection?
It's not part of the original work; it's been removed in 655c34d.
build/scripts-3.11/import.py
Outdated
subject_metadata['days'] = subject['days']
subject_metadata['study'] = subject['_id']['study']

studies[subject['_id']['study']]['max_day'] = studies[subject['_id']['study']]['max_day'] if (studies[subject['_id']['study']]['max_day'] >= subject['days']) else subject['days']
This feels like too much for one line
This is outdated; it's been removed in 655c34d.
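For reference, the conditional expression reduces to `max()` once the repeated lookup is named; a sketch assuming the loop variables from the hunk above:

```python
study = subject['_id']['study']
# Keep the larger of the stored maximum and this subject's day count.
studies[study]['max_day'] = max(studies[study]['max_day'], subject['days'])
```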
scripts/api/api.py
Outdated
def create_data_file(api_url, data_file_data):
    request_url = api_url + "day"
    headers = {"content-type": "application/json"}
    r = requests.post(
        request_url,
        headers=headers,
        data=data_file_data,
    )
    status = r.status_code
    if status != 200:
        response = r.json()["message"]
        logging.info(response)
    else:
        response = r.json()["data"]
        logging.info(response)


def create_metadata_file(api_url, metadata_file_data):
    request_url = api_url + "metadata"
    headers = {"content-type": "application/json"}
    r = requests.post(
        request_url,
        headers=headers,
        data=metadata_file_data,
    )
    status = r.status_code
    if status != 200:
        response = r.json()["message"]
        logging.info(response)
    else:
        response = r.json()["data"]
        logging.info(response)
Unless I'm missing something, these look identical aside from the url's postfix. Could you move this functionality into a common function that both create_data_file and create_metadata_file both call?
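A sketch of that extraction; `_post_payload` is an illustrative name, with the behavior copied from the two functions above:

```python
import logging

import requests


def _post_payload(api_url, endpoint, payload):
    r = requests.post(
        api_url + endpoint,
        headers={"content-type": "application/json"},
        data=payload,
    )
    # Same logging as the originals: message on error, data on success.
    if r.status_code != 200:
        logging.info(r.json()["message"])
    else:
        logging.info(r.json()["data"])
    return r.status_code == 200


def create_data_file(api_url, data_file_data):
    return _post_payload(api_url, "day", data_file_data)


def create_metadata_file(api_url, metadata_file_data):
    return _post_payload(api_url, "metadata", metadata_file_data)
```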
* Added service that parses data csv and metadata csv
* Using the new data structures the service parses the data and creates a json
* Added tests to the service

Updated import script for v2
* Removed all files that parse mongodb and removed dependency
* Added service that handles incoming csv data for metadata and participant csv data
* Service also has a method to convert data to json
* Added api request with a configuration for url

Add requests
* Added requests dependency

Updates to script to add hash collection
* Added hash collection
Force-pushed from 655c34d to 9fdfcc4.
Test CSV Load generator
Add authentication to request
* Subject ID and uppercase Study keys needed to be updated for import
* Updated key in metadata
* Updated tests
* Handle infinity and nan values
* If compute error, import as string values
* Updates to import
* Updates for new data migration structure
* Updated tests
* Test consent and synced
* extract variables from assessment
@Elkrival what's the status of this PR? Does it need to be rebased/merged or closed?
* removed print
* added variables as dictionaries
* PR comments
Extract variables
* print structure
When connecting with local/self-signed/dev environments it is sometimes necessary to bypass SSL verification. This commit adds a config option to do so but it is not the default nor is it considered secure/best-practice.
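In `requests` that presumably maps onto the `verify` keyword; a sketch with a hypothetical config key, since the actual option name isn't shown in this PR:

```python
import requests

# "verify_ssl" and the URL are stand-ins for the commit's config option.
config = {"api_url": "https://localhost:8443/api/v1/", "verify_ssl": False}

r = requests.post(
    config["api_url"] + "day",
    headers={"content-type": "application/json"},
    data="{}",
    verify=config["verify_ssl"],  # False skips TLS certificate verification
)
```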
This should also speed up the import API.
Omit verbosity
This PR updates the script to parse CSV files and make requests to the API. It no longer depends on pymongo to read the database and apply the updates.
The PR removes unused code and adds a service that collects the parsed data; once the data is collected, it makes a request to the backend Node API to insert the data.
It also adds an api_url configuration option which points to the server route.
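A hypothetical example of that directive in the YAML config (the real example lives in commit 733d948 and isn't shown here); the loader mirrors the script's `yaml.load` call:

```python
import yaml

# The endpoint value is illustrative only.
config = yaml.safe_load("api_url: http://localhost:8080/api/v1/\n")
print(config["api_url"])  # passed to create_data_file / create_metadata_file
```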
Tests and more tests.
Figma link to workflow. https://www.figma.com/file/EYWZPBDHD25Of0KUkwMksC/import-Workflow?type=whiteboard&node-id=0-1&t=DDCkCP8SeU9rSRy6-0