-
Notifications
You must be signed in to change notification settings - Fork 8
monomer data updated to include protease contribution to degradation #1463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…justments based on N-end rule for ADENYLATECYC-MONOMER and SPOT-MONOMER, which now have updated sequences.
rjuenemann
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for submitting this @nvivanco ! I left some minor comments.
Could you edit the PR description to indicate explicitly that while these protease assignments and degradation contributions are being recorded for all proteins regardless of the rates used, we are not actually using the data from Gupta et al 2024 in the simulation at this point?
I just worry that out of context someone might see these protease assignments and data and think that we are already functionally modeling them. It would be worth adding a comment about this to the code as well before line 181 when you are actually filling in the values for each protein.
Great work! Let me know if you have any questions.
| @@ -0,0 +1,84 @@ | |||
| # Generated by /Users/noravivancogonzalez/code/wcEcoli/reconstruction/ecoli/scripts/protein_half_lives/convert_to_flat_Clim_protease_assignments.py on Fri Jan 24 13:17:32 2025 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this script also be added to the pull request?
| "ppgpp_regulation.tsv", | ||
| "ppgpp_regulation_added.tsv", | ||
| "ppgpp_regulation_removed.tsv", | ||
| "priority_protease_assignments_1.tsv", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor detail - could you update the description for the PR to include the _1 in the filename? Just so that way people are searching for the correct name if they come across this PR later
| len(cistron_id) for cistron_id in cistron_ids) | ||
| max_deg_source_id_length = max( | ||
| len(source_id) for source_id in deg_rate_source_id) | ||
| max_protease_length = max( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rename this as max_protease_id_length for clarity and consistency
| if protein['id'] in protease_dict.keys(): | ||
| protease_assignment[i] = protease_dict[protein['id']]['protease_assignment'] | ||
| ClpP_contribution[i] = protease_dict[protein['id']]['ClpP_fraction'] | ||
| Lon_contribution[i] = protease_dict[protein['id']]['Lon_fraction'] | ||
| HslV_contribution[i] = protease_dict[protein['id']]['HslV_fraction'] | ||
| Unexplained_contribution[i] = protease_dict[protein['id']]['Unexplained_fraction'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it seems like you are doing the same thing for all proteins (not just the ones with measured degradation rates), repeating the same code in lines 156-161, 165-170, and 181-186 is a bit redundant. I would instead move these 5 lines outside of the measured/pulsed/N-end branching, i.e. unindent lines 181-186 and add a new line before 181
| ClpP_contribution = np.full(len(all_proteins), None) | ||
| Lon_contribution = np.full(len(all_proteins), None) | ||
| HslV_contribution = np.full(len(all_proteins), None) | ||
| Unexplained_contribution = np.full(len(all_proteins), None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style suggestion: I understand this is the usual convention of capital letters for ClpP and the other proteases, but starting variables with capital letters in python is usually reserved for class names, and GitHub is coloring them as such. Could we change these instead to something like contribution_ClpP (if we want to keep the ClpP capitalization) to avoid confusion? The refactor->rename feature on PyCharm should help update all occurrences.
|
retest this please |
|
It looks like the reproducibility test is still failing - I Slacked you the error messages and the relevant runscript |
include ecocyc update
Under reconstruction, translation.py was edited to include degradation contributions (in fractions) of proteases to specific monomers. These data is derived from Gupta et al 2024, and has been added to a flat file named priority_protease_assingments.tsv, which has been added to knowledge_base_raw.py.