Param tuning: ensembling (version 2 but all the same code as version 1) #212
Conversation
Reminder: There was one unresolved comment that we can continue discussing here #193 (comment).
Issue #232 can be updated in this PR.
I pushed formatting changes.
Reminder: There was one unresolved comment that we can continue discussing here #193 (comment).
I opened #259 for this so we don't need to track it here.
spras/evaluation.py (Outdated)
```python
'Threshold': ["None"],
'Precision': ["None"],
'Recall': ["None"],
'Average_Precison': ["None"],
'Baseline_Precision': ["None"]
```
Is there a reason these are strings and not `None`? Can we set them to actual values? The precision and recall may default to 0; we can look for precedent. The AP may as well. The baseline precision we actually do have.
I get a specific error: `If using all scalar values, you must pass an index`. To avoid this error, I need to pass an index when I call pd.DataFrame(data). Another solution is to make every value a list.
https://stackoverflow.com/questions/17839973/constructing-dataframe-from-values-in-variables-yields-valueerror-if-using-all#:~:text=The%20error%20message,3%20%202%20%203
Another option is to convert the dictionary to a Series and then to a DataFrame.
Another option is pd.DataFrame.from_dict(dictionary, orient="index").
I get a specific error: `If using all scalar values, you must pass an index`. To avoid this error, I need to pass an index when I call pd.DataFrame(data). Another solution is to make every value a list. https://stackoverflow.com/questions/17839973/constructing-dataframe-from-values-in-variables-yields-valueerror-if-using-all#:~:text=The%20error%20message,3%20%202%20%203
I'm planning on using the list method.
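A minimal sketch of the three workarounds discussed above (the metric names and placeholder values are illustrative, not the actual evaluation.py code):

```python
import pandas as pd

# Placeholder metrics dictionary with all scalar values
metrics = {'Threshold': None, 'Precision': None, 'Recall': None,
           'Average_Precision': None, 'Baseline_Precision': None}

# pd.DataFrame(metrics) raises "If using all scalar values, you must pass an index"
# because every value is a scalar.

# Option 1: wrap each scalar in a list (the approach planned here)
df_lists = pd.DataFrame({key: [value] for key, value in metrics.items()})

# Option 2: build a Series first, then turn it into a one-row DataFrame
df_series = pd.Series(metrics).to_frame().T

# Option 3: from_dict with orient="index", then transpose to get one row
df_from_dict = pd.DataFrame.from_dict(metrics, orient="index").T
```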
Thinking about the conversation from our meeting: The way I have been thinking about evaluation, I have been asking "Given the nodes the algorithm selected in its subnetworks, how well does the algorithm recover the gold standard ones?". This assesses the quality of what the algorithm selected. I think if I was asking, "Did the algorithm select the correct nodes (the gold standard nodes) from the entire network?", then including all of the nodes from the full interactome with a frequency of 0 makes sense. I would be seeing if the algorithms were able to distinguish between the relevant gold standard nodes from the entire "universe" of possible nodes within all of their outputs.
The right answer depends on what question we are trying to answer. Adding only the missing gold standard nodes with a frequency of 0 ensures an accurate recall calculation by capturing the correct number of false negatives (the gold standard nodes that should have been recovered but were not). Adding the entire network's nodes with frequency 0 would penalize an algorithm for not predicting the whole network, and it might also penalize methods that return smaller, sparser networks. The two options:
- add only the gold standard
- add the whole network
How I have been defining TP, FP, FN, TN for nodes:
I also believe this is the version of the question we want to ask. It makes for the most straightforward and fair comparison of evaluation metrics across methods that predict variable size networks.
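A minimal sketch of the "add only the missing gold standard nodes with frequency 0" approach described above (the helper name and the 'Node'/'Frequency' column names are assumptions, written with pandas and scikit-learn):

```python
import pandas as pd
from sklearn.metrics import average_precision_score, precision_recall_curve


def node_pr_curve(ensemble_freqs: pd.DataFrame, gold_standard_nodes: set):
    """Illustrative helper: ensemble_freqs is assumed to have columns
    'Node' and 'Frequency' (fraction of parameter combinations that
    selected each node)."""
    # Add only the gold standard nodes missing from the ensemble, with
    # frequency 0, so the false negatives are counted; the rest of the
    # interactome is not added, so sparser networks are not penalized.
    missing = gold_standard_nodes - set(ensemble_freqs['Node'])
    padded = pd.concat([
        ensemble_freqs,
        pd.DataFrame({'Node': sorted(missing), 'Frequency': [0.0] * len(missing)})
    ], ignore_index=True)

    y_true = padded['Node'].isin(gold_standard_nodes).astype(int)
    y_score = padded['Frequency']
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    ap = average_precision_score(y_true, y_score)
    return precision, recall, thresholds, ap
```

Under this padding, recall can only reach 1.0 if every gold standard node appears somewhere in the ensemble, which matches the question "how well does the algorithm recover the gold standard nodes?"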
Are they wrong in the plot now? Or are they different and lower? It may be okay if they are low. Do we have any example where a pathway reconstruction algorithm actually does recover all of the gold standard nodes so we can confirm the PR curve and AP look as expected in that case?
I wanted to test this with the EGFR config, but it isn't set up to use the gold standard. Should we add that to the config as part of this pull request to demonstrate the behavior on a real dataset? The toy datasets are too small to see how the PR curves work.
I agree; I was testing the EGFR dataset locally. I will add the evaluation dataset and more of the algorithms/parameter settings, though not all of the parameter settings from the EGFR param tuning config file.
I pushed a few formatting changes. We can merge once the tests pass.
@agitter my param-tuning-ensembling branch #207 was out of sync with the changes I had locally. I needed to redo the branch to bring it up to date.
@agitter Please review this PR second (then follow up with pull requests #208 and #209 after this one is merged).
Will need to merge with updated master after #193 is merged (hopefully this will remove the repeated files throughout the PRs).
Included in this PR:
- an update to evaluation.py that computes node ensemble frequencies and then creates a node PR curve (see the sketch after this list)
- a new evaluate test suite covering only the ensembling functionality
- updates to the Snakemake file that run evaluation per dataset and per algorithm-dataset pair
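A rough sketch of how node ensemble frequencies could be computed from a set of pathway outputs; the tab-separated edge-list format and the 'Node1'/'Node2' column names are assumptions, not necessarily the actual SPRAS output schema:

```python
from collections import Counter

import pandas as pd


def node_ensemble_frequencies(pathway_files):
    """Illustrative only: for each node, count the fraction of pathway
    outputs (one per parameter combination) in which it appears."""
    counts = Counter()
    for path in pathway_files:
        # Assumed format: tab-separated edge list with 'Node1' and 'Node2' columns
        edges = pd.read_csv(path, sep='\t')
        # Each node is counted at most once per pathway output
        counts.update(set(edges['Node1']) | set(edges['Node2']))
    freqs = {node: count / len(pathway_files) for node, count in counts.items()}
    return pd.DataFrame({'Node': list(freqs), 'Frequency': list(freqs.values())})
```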