Param tuning: ensembling (version 2 but all the same code as version 1) #212


Merged: 32 commits merged into Reed-CompBio:main on Jul 11, 2025

Conversation

@ntalluri (Collaborator) commented Mar 24, 2025

@agitter my param-tuning-ensembling branch #207 was out of sync with the changes I had locally, so I needed to redo the branch to bring it up to date.

@agitter Review this PR second (then follow up with pull requests #208 and #209 after this one is merged).

Will need to merge with updated master after #193 is merged. (Hopefully this will remove the repeated files throughout the PRs.)
Included in this PR:

• an update to evaluation.py that computes node ensemble frequencies and then creates a node PR curve (see the sketch after this list)
• a new evaluate test suite covering only the ensembling idea
• updates to the Snakemake file that run evaluation per dataset and per algorithm-dataset pair
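For orientation, here is a minimal sketch of the node-ensembling idea (hypothetical names such as `node_ensemble_frequencies` and `node_pr_curve`; this is not the actual evaluation.py API): each algorithm-dataset pair yields several pathway outputs, one per parameter combination, a node's frequency is the fraction of those outputs containing it, and the frequencies are scored against gold standard membership to build the PR curve.

```python
# Hypothetical sketch, not the evaluation.py implementation.
from sklearn.metrics import average_precision_score, precision_recall_curve


def node_ensemble_frequencies(pathways: list[set[str]]) -> dict[str, float]:
    """Fraction of pathway outputs (one per parameter combination) containing each node."""
    counts: dict[str, int] = {}
    for nodes in pathways:
        for node in nodes:
            counts[node] = counts.get(node, 0) + 1
    return {node: count / len(pathways) for node, count in counts.items()}


def node_pr_curve(frequencies: dict[str, float], gold_standard_nodes: set[str]):
    """Score node frequencies against gold standard membership."""
    # Gold standard nodes absent from every output get frequency 0,
    # so they are counted as false negatives in the recall.
    scores = dict(frequencies)
    for node in gold_standard_nodes - scores.keys():
        scores[node] = 0.0
    nodes = sorted(scores)
    y_true = [int(node in gold_standard_nodes) for node in nodes]
    y_score = [scores[node] for node in nodes]
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    return precision, recall, thresholds, average_precision_score(y_true, y_score)
```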

@ntalluri (Collaborator, Author)

Reminder: There was one unresolved comment that we can continue discussing here #193 (comment).

@ntalluri (Collaborator, Author)

Issue #232 can be updated in this PR.

@ntalluri ntalluri requested a review from agitter May 29, 2025 20:07
@tristan-f-r tristan-f-r added the tuning (Workflow-spanning algorithm tuning) label May 30, 2025
@agitter (Collaborator) left a comment

I pushed formatting changes.

> Reminder: There was one unresolved comment that we can continue discussing here #193 (comment).

I opened #259 for this so we don't need to track it here.

Comment on lines 176 to 180
'Threshold': ["None"],
'Precision': ["None"],
'Recall': ["None"],
'Average_Precison': ["None"],
'Baseline_Precision': ["None"]
@agitter (Collaborator)

Is there a reason these are strings and not None? Can we set them to actual values? The precision and recall may default to 0; we can look for precedent. The AP may as well. The baseline precision we actually do have.

@ntalluri (Collaborator, Author)

I get a specific error: If using all scalar values, you must pass an index. To avoid this error, I need to add an index when I do pd.DataFrame(data). Another solution to get around that is to make everything a list.
https://stackoverflow.com/questions/17839973/constructing-dataframe-from-values-in-variables-yields-valueerror-if-using-all#:~:text=The%20error%20message,3%20%202%20%203

@ntalluri (Collaborator, Author)

Another option is pd.DataFrame.from_dict(dictionary, orient="index").

@ntalluri (Collaborator, Author)

> I get a specific error: If using all scalar values, you must pass an index. To avoid this error, I need to add an index when I do pd.DataFrame(data). Another solution to get around that is to make everything a list. https://stackoverflow.com/questions/17839973/constructing-dataframe-from-values-in-variables-yields-valueerror-if-using-all#:~:text=The%20error%20message,3%20%202%20%203

I'm planning on using the list method
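For reference, a minimal reproduction of the error and the workarounds discussed in this thread (illustrative values only, not the actual evaluation.py code):

```python
import pandas as pd

data = {'Threshold': "None", 'Precision': "None"}  # all scalar values

# pd.DataFrame(data) raises:
#   ValueError: If using all scalar values, you must pass an index

# Workaround 1: pass an index explicitly
df_with_index = pd.DataFrame(data, index=[0])

# Workaround 2 (the list method chosen here): wrap each value in a list
df_from_lists = pd.DataFrame({key: [value] for key, value in data.items()})

# Workaround 3: build the frame from a dict keyed by row label
df_from_dict = pd.DataFrame.from_dict(data, orient="index")
```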

@ntalluri (Collaborator, Author) commented Jun 11, 2025

Thinking about the conversation from our meeting:

The way I have been thinking about evaluation, I have been asking "Given the nodes the algorithm selected in its subnetworks, how well does the algorithm recover the gold standard ones?". This assesses the quality of what the algorithm selected.

I think if I was asking, "Did the algorithm select the correct nodes (the gold standard nodes) from the entire network?", then including all of the nodes from the full interactome with a frequency of 0 makes sense. I would be seeing if the algorithms were able to distinguish between the relevant gold standard nodes from the entire "universe" of possible nodes within all of their outputs.

  • I think this is what was done in Pathlinker as well

The right approach depends on which question we are trying to answer.

Adding only the missing gold standard nodes with a frequency of 0 ensures accurate recall calculation by capturing the correct number of false negatives (the gold standard nodes that should have been recovered but were not).

By adding the entire network's nodes with frequency 0, we would be penalizing an algorithm for not predicting the whole network, and this might also penalize methods that return sparser, smaller networks.
However, this seems to be the correct way to look at the ensembles: the gold standard nodes are the positive samples and the unpredicted part of the network provides the negative samples. This allows us to evaluate how well the algorithm prioritizes relevant nodes (the gold standard) over all possible alternatives.

add only the gold standard
Gold standard nodes: {"A", "B", "C"}

| Node | Is Gold Standard (y_true) | Frequency (y_score) |
|------|---------------------------|---------------------|
| A    | 1                         | 0.9                 |
| D    | 0                         | 0.8                 |
| B    | 1                         | 0.0                 |
| C    | 1                         | 0.0                 |
  • True Positives (TP): A
  • False Positives (FP): D
  • False Negatives (FN): B, C
  • Precision = TP / (TP + FP) = 1 / (1 + 1) = 0.5
  • Recall = TP / (TP + FN) = 1 / (1 + 2) = 0.33

add the whole network

| Node | Is Gold Standard | Frequency |
|------|------------------|-----------|
| A    | 1                | 0.9       |
| D    | 0                | 0.8       |
| B    | 1                | 0.0       |
| C    | 1                | 0.0       |
| X    | 0                | 0.0       |
| ...  | ...              | ...       |
| X997 | 0                | 0.0       |
  • True Positives (TP): A
  • False Positives (FP): D, X -> X997
  • False Negatives (FN): B, C
  • Precision = TP / (TP + FP) = 1 / (1 + 998) = 0.001
  • Recall = TP / (TP + FN) = 1 / (1 + 2) = 0.33

How I have been defining TP, FP, FN, TN for nodes:

| Term | Meaning |
|------|---------|
| True Positive (TP) | A predicted node that is also in the gold standard pathway |
| False Positive (FP) | A predicted node that is not in the gold standard pathway |
| False Negative (FN) | A node in the gold standard pathway that was not predicted |
| True Negative (TN) | A node that was not predicted and is not in the gold standard pathway |
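To make the comparison above concrete, here is a hedged sketch that builds the two labelings and computes PR curves with scikit-learn (toy values from the example; variable names are illustrative, not the evaluation.py implementation):

```python
from sklearn.metrics import precision_recall_curve

gold_standard = {"A", "B", "C"}
ensemble_freq = {"A": 0.9, "D": 0.8}  # nodes that appeared in at least one output

# Option 1: add only the missing gold standard nodes with frequency 0
option1 = {**ensemble_freq, **{n: 0.0 for n in gold_standard - ensemble_freq.keys()}}

# Option 2: add every node in the interactome with frequency 0
interactome = gold_standard | ensemble_freq.keys() | {f"X{i}" for i in range(1, 998)}
option2 = {n: ensemble_freq.get(n, 0.0) for n in interactome}


def pr_curve(freqs: dict[str, float]):
    nodes = sorted(freqs)
    y_true = [int(n in gold_standard) for n in nodes]
    y_score = [freqs[n] for n in nodes]
    return precision_recall_curve(y_true, y_score)


# Option 2 adds ~1000 extra negatives with score 0, so precision at low
# thresholds collapses toward |gold standard| / |interactome|, while the
# recall at each threshold is unchanged (TP and FN counts do not change).
p1, r1, _ = pr_curve(option1)
p2, r2, _ = pr_curve(option2)
```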

@ntalluri (Collaborator, Author) commented Jun 11, 2025

add the whole network plot:
[Screenshot 2025-06-11 at 17 02 15]

The baseline precision and average precision get messed up because I added the whole network.

@agitter (Collaborator) commented Jun 13, 2025

"Did the algorithm select the correct nodes (the gold standard nodes) from the entire network?"

I also believe this is the version of the question we want to ask. It makes for the most straightforward and fair comparison of evaluation metrics across methods that predict variable size networks.

> The baseline precision and average precision get messed up because I added the whole network.

Are they wrong in the plot now? Or are they different and lower? It may be okay if they are low. Do we have any example where a pathway reconstruction algorithm actually does recover all of the gold standard nodes so we can confirm the PR curve and AP look as expected in that case?

@ntalluri (Collaborator, Author)

The baseline precision is now calculated the same way for every algorithm: including all gold standard nodes (with a frequency of 0 when missing) gives a baseline of (number of gold standard nodes) / (total nodes). There is no longer a baseline per algorithm, since this value is identical for every algorithm.
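As a tiny illustration of that constant baseline (hypothetical variable names): with the whole network as the node universe, the baseline precision is just the prevalence of gold standard nodes, so it is the same regardless of which algorithm produced the ensemble.

```python
# Baseline precision when the whole network is the node universe:
# the fraction of nodes that are gold standard, independent of the algorithm.
gold_standard_nodes = {"A", "B", "C"}
all_network_nodes = gold_standard_nodes | {"D"} | {f"X{i}" for i in range(1, 998)}

baseline_precision = len(gold_standard_nodes) / len(all_network_nodes)  # 3 / 1001
```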

Average precision (AP) tends to be low because adding the entire network's nodes and edges to the ensembles introduces a large number of negatives, which skews the score. In a smaller test case where all gold standard nodes are recovered, the AP remains higher than in the image above (run on the egfr dataset) since the total number of nodes is limited.

Example:

Node Ensemble (bolded are the gold standard nodes)

| Node | Frequency |
|------|-----------|
| A    | 0.5       |
| B    | 0.5       |
| C    | 0.75      |
| D    | 0.75      |
| E    | 0.9       |
| F    | 0.9       |
| L    | 0.5       |
| M    | 0.5       |
| N    | 0.25      |
| O    | 0.25      |
| P    | 0.25      |
| Q    | 0.25      |
| Z    | 0.01      |
| G    | 0.0       |
| H    | 0.0       |
| I    | 0.0       |
| J    | 0.0       |
| K    | 0.0       |
| R    | 0.0       |
[Screenshot 2025-06-16 at 16 22 48]

@agitter (Collaborator) left a comment

I wanted to test this with the EGFR config, but it isn't set up to use the gold standard. Should we add that to the config as part of this pull request to demonstrate the behavior on a real dataset? The toy datasets are too small to see how the PR curves work.

@ntalluri (Collaborator, Author)

> I wanted to test this with the EGFR config, but it isn't set up to use the gold standard. Should we add that to the config as part of this pull request to demonstrate the behavior on a real dataset? The toy datasets are too small to see how the PR curves work.

I agree; I was testing the egfr dataset locally. I will add the evaluation dataset and more of the algorithms/parameter settings, though not all of the parameter settings from the egfr param tuning config file.

@ntalluri ntalluri added the needed for benchmarking (Priority PRs needed for the benchmarking paper) label Jun 25, 2025
@ntalluri ntalluri requested a review from agitter July 1, 2025 21:26
@agitter (Collaborator) left a comment

I pushed a few formatting changes. We can merge once the tests pass.

@agitter agitter merged commit 85a3184 into Reed-CompBio:main Jul 11, 2025
14 checks passed
Labels: needed for benchmarking (Priority PRs needed for the benchmarking paper), tuning (Workflow-spanning algorithm tuning)
Projects: None yet
3 participants