-
Notifications
You must be signed in to change notification settings - Fork 26
Hotfix for non-string fields in json data #455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -53,6 +53,10 @@ def __init__( | |||
stderr.print(f"[red] json data file {json_file} does not exist") | |||
sys.exit(1) | |||
self.json_data = relecov_tools.utils.read_json_file(json_file) | |||
for row in self.json_data: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be converted at the begining or the end just before uploading to database?
In any case put a comment explaning why this should be this way only here and not in the rest of modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By converting it at the beggining you avoid checking its datatype/converting it multiple times in the code.
Why this change should go here and not in other modules? Because I think that this should be a temporal solution until iskylims and platform accepts different datatypes instead of only strings (Charfield), I'd say forcing all data to be string is not a good practice whatsoever. For example, in this particular case the field that thrown the error was "host age" being an integer, which makes total sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but iskylims and the platform will ALWAYS accept only string
in the database, as we are storing property values for all data types in the same column, so we need to put property_value column as "string" datatype, then we have another column in the table that is property_data_type
and there for host_age will be "integrer" and for collecting institution will be set to "string", so if we need to check the datatype inside the platform (or convert it) we will use that column, as all property_values will be strings due to db constraints in the design.
Here my question is that the data json must be manipulated according to its datatype and only convert all the properties in the json in the moment you call the post to the API. I think as upload_db should NOT manipulate the data in the json as other modules, it's ok (and good practice to not duplicate the to_string
conversion several times in the code), but just make sure you had this though into account. If there is some manipulation, we should keep the datatype until the api upload (i think), if there is no manipulation is ok to leave it here, but not as temporal fix, but as the right way to go according to our database design (or until maybe we change the database to non-relational and maybe we can make different datatypes for different properties)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get your point. Anyway this manipulation is only done prior to API request as update-db does not modify any file nor creates any output (excluding logs), so its safe.
Right now update-db fails for non-string fields, which is fine since relecov-platform forces most fields to charfields. This PR creates a temporal workaround by converting all fields in json data to strings