Skip to content

Conversation

@nvivanco
Copy link
Contributor

Under reconstruction, translation.py was edited to include degradation contributions (in fractions) of proteases to specific monomers. These data is derived from Gupta et al 2024, and has been added to a flat file named priority_protease_assingments.tsv, which has been added to knowledge_base_raw.py.

@nvivanco nvivanco requested a review from rjuenemann April 10, 2025 23:22
Copy link
Contributor

@rjuenemann rjuenemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting this @nvivanco ! I left some minor comments.

Could you edit the PR description to indicate explicitly that while these protease assignments and degradation contributions are being recorded for all proteins regardless of the rates used, we are not actually using the data from Gupta et al 2024 in the simulation at this point?

I just worry that out of context someone might see these protease assignments and data and think that we are already functionally modeling them. It would be worth adding a comment about this to the code as well before line 181 when you are actually filling in the values for each protein.

Great work! Let me know if you have any questions.

@@ -0,0 +1,84 @@
# Generated by /Users/noravivancogonzalez/code/wcEcoli/reconstruction/ecoli/scripts/protein_half_lives/convert_to_flat_Clim_protease_assignments.py on Fri Jan 24 13:17:32 2025
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this script also be added to the pull request?

"ppgpp_regulation.tsv",
"ppgpp_regulation_added.tsv",
"ppgpp_regulation_removed.tsv",
"priority_protease_assignments_1.tsv",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor detail - could you update the description for the PR to include the _1 in the filename? Just so that way people are searching for the correct name if they come across this PR later

len(cistron_id) for cistron_id in cistron_ids)
max_deg_source_id_length = max(
len(source_id) for source_id in deg_rate_source_id)
max_protease_length = max(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rename this as max_protease_id_length for clarity and consistency

Comment on lines +156 to +161
if protein['id'] in protease_dict.keys():
protease_assignment[i] = protease_dict[protein['id']]['protease_assignment']
ClpP_contribution[i] = protease_dict[protein['id']]['ClpP_fraction']
Lon_contribution[i] = protease_dict[protein['id']]['Lon_fraction']
HslV_contribution[i] = protease_dict[protein['id']]['HslV_fraction']
Unexplained_contribution[i] = protease_dict[protein['id']]['Unexplained_fraction']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it seems like you are doing the same thing for all proteins (not just the ones with measured degradation rates), repeating the same code in lines 156-161, 165-170, and 181-186 is a bit redundant. I would instead move these 5 lines outside of the measured/pulsed/N-end branching, i.e. unindent lines 181-186 and add a new line before 181

Comment on lines +146 to +149
ClpP_contribution = np.full(len(all_proteins), None)
Lon_contribution = np.full(len(all_proteins), None)
HslV_contribution = np.full(len(all_proteins), None)
Unexplained_contribution = np.full(len(all_proteins), None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style suggestion: I understand this is the usual convention of capital letters for ClpP and the other proteases, but starting variables with capital letters in python is usually reserved for class names, and GitHub is coloring them as such. Could we change these instead to something like contribution_ClpP (if we want to keep the ClpP capitalization) to avoid confusion? The refactor->rename feature on PyCharm should help update all occurrences.

@rjuenemann
Copy link
Contributor

retest this please

@rjuenemann
Copy link
Contributor

It looks like the reproducibility test is still failing - I Slacked you the error messages and the relevant runscript

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants