Skip to content
This repository was archived by the owner on Aug 14, 2024. It is now read-only.

Feature/orthoinference 71 updates #104

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from

Conversation

cookersjs
Copy link
Contributor

This PR addresses a couple of things that had been on the backburner for Orthoinference:

  1. Most PhysicalEntity instances are now screened before any inference is attempted on a ReactionlikeEvent. The only cases that aren't screened ahead of time are some edge cases for EntitySets that were very difficult to isolate out ahead of time. Safeguards still exist for preventing inference, so the ones that do squeak through this new screening system will still be rejected. This will prevent a significant majority of orphan PEs.

  2. Orthoinference is now interruptable. If the process is stopped in the middle of a species' inference, it can be restarted and it will still create Pathways and PathwayDiagrams for any automatically inferred RlEs. A question: Should manually inferred ones also be receiving Pathway and PathwayDiagram instances? Its a very simple adjustment if so. This addresses Orthoinference needs to be interruption proof #31

  3. The report file produced at the end of a species' inference now has a header that denotes the release. This addresses Orthoinference report needs a header to identify the Reactome version #83

Thanks!

@cookersjs cookersjs added enhancement New feature or request orthoinference Issues related to the Orthoinference project labels Oct 17, 2019
private static StableIdentifierGenerator stableIdentifierGenerator;
private static OrthologousPathwayDiagramGenerator orthologousPathwayDiagramGenerator;
private static final String summationText = "This event has been computationally inferred from an event that has been demonstrated in another species.<p>The inference is based on the homology mapping from PANTHER. Briefly, reactions for which all involved PhysicalEntities (in input, output and catalyst) have a mapped orthologue/paralogue (for complexes at least 75% of components must have a mapping) are inferred to the other species. High level events are also inferred for these events to allow for easier navigation.<p><a href='/electronic_inference_compara.html' target = 'NEW'>More details and caveats of the event inference in Reactome.</a> For details on PANTHER see also: <a href='http://www.pantherdb.org/about.jsp' target='NEW'>http://www.pantherdb.org/about.jsp</a>";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the "target = 'NEW'" attribute that is used two times is allowed in modern HTML. For a new window or tab, 'target=_blank' is the attribute value to use:

https://www.w3schools.com/tags/att_a_target.asp

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this string be split on to several lines? This is a bit wide, even for me.

@@ -113,6 +115,7 @@ public static void inferEvents(Properties props, String species) throws Exceptio
Map<String,String[]> homologueMappings = readHomologueMappingFile(species, "hsap", pathToOrthopairs);
ProteinCountUtility.setHomologueMappingFile(homologueMappings);
EWASInferrer.setHomologueMappingFile(homologueMappings);
SkipInstanceChecker.setHomologueMappingFile(homologueMappings);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name setHomologueMappingFile is a little misleading since it is taking the homologue map data structure itself rather than the file name. Could it be setHomologueMap instead?

Comment on lines +170 to +171
if (!seenPrecedingEvent.contains(inferrableEventInst)) {
if (inferrableEventInst.getAttributeValue(precedingEvent) != null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these conditions be merged into one if-statement as !seenPrecedingEvent.contains(inferrableEventInst) && inferrableEventInst.getAttributeValue(precedingEvent) != null ?
I don't see an else-statement pairing with the inner if-statement so it seems like it should be OK, and would make the rest of this a little more readable.

infReactionInst.addAttributeValue(stableIdentifier, orthoStableIdentifierInst);
dba.storeInstance(infReactionInst);
logger.info("Inferred RlE instance: " + infReactionInst);
return;
Copy link
Collaborator

@SolomonShorser-OICR SolomonShorser-OICR Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an if-statement that only contains a return? Could this be modified to be the negation of the same condition which allows the rest of the code to flow? This is a void function, so it's not even expected to return anything. I find that using return as a form of flow-control to be confusing. In short functions, this is less of an issue, but here, the exit point is hidden, easy to miss. There are other empty returns doing the same thing.

previouslyInferredInstances = checkIfPreviouslyInferred(reactionInst, inferredFrom, previouslyInferredInstances);
if (previouslyInferredInstances.size() > 0)
{
previouslyInferredInstances.addAll(checkIfPreviouslyInferred(reactionInst, inferredFrom, previouslyInferredInstances));
Copy link
Member

@jweiser jweiser Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since previouslyInferredInstances is only added to and not queried in the method checkIfPreviouslyInferred, passing it as a parameter isn't needed. It's best practice not to modify parameters, but instead to create and return a different variable in the method body.

You could do this part of the code as follows:

List<GKInstance> previouslyInferredInstances = checkIfPreviouslyInferred(
    reactionInst, Arrays.asList(inferredFrom, orthologousEvent)
);

Then the checkIfPreviouslyInferred method could be:

private static List<GKInstance> checkIfPreviouslyInferred(GKInstance reactionInst, List<String> attributes) throws Exception 
{
	List previouslyInferredInstances = new ArrayList<>();
	for (String attribute : attributes) {
		for (GKInstance attributeInst : (Collection<GKInstance>) reactionInst.getAttributeValuesList(attribute)) {
			GKInstance reactionSpeciesInst = (GKInstance) attributeInst.getAttributeValue(species);
			if (reactionSpeciesInst.getDBID().equals(speciesInst.getDBID()) && attributeInst.getAttributeValue(isChimeric) == null) {
				previouslyInferredInstances.add(attributeInst);
			}
		}
	}
	return previouslyInferredInstances;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it may be better to rename checkIfPreviouslyInferred to getInstancesIfPreviouslyInferred to make it clear that the list of previously inferred instances is what the method returns. A method name starting with "check" may be more suitable to a void or possibly a boolean return value.

if (previouslyInferredInstances.size() > 0)
{
previouslyInferredInstances.addAll(checkIfPreviouslyInferred(reactionInst, inferredFrom, previouslyInferredInstances));
if (previouslyInferredInstances.size() > 0) {
GKInstance prevInfInst = previouslyInferredInstances.get(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a check needed to see if the list contains more than one element? If the list has more than one previously inferred instances would that be an error?

manualEventToNonHumanSource.put(reactionInst, prevInfInst);
manualHumanEvents.add(reactionInst);
eventsAlreadyInferredMap.put(reactionInst, prevInfInst);
eventsAlreadyInferred.add(reactionInst);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the eventsAlreadyInferredMap and eventsAlreadyInferred data structures both needed? Could the list populated in eventsAlreadyInferred be obtained from the key set of the eventsAlreadyInferredMap?

{
for (GKInstance inferrableEventInst : updatedInferrableHumanEvents) {
if (!seenPrecedingEvent.contains(inferrableEventInst)) {
if (inferrableEventInst.getAttributeValue(precedingEvent) != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two conditions could be combined with && if seenPrecedingEvent.add(inferrableEventInst); (line 198) is moved outside the inner if condition. Since seenPrecedingEvent is a set, it won't cause a problem if you add an instance that is already present in the set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may also want to consider refactoring the body of the if statement to its own method so that you have the operation occurring for each inferrableEventInst isolated.

if (!inferredPrecedingEvents.contains(precedingEventInst.getDBID().toString()))
{
for (GKInstance precedingEventInst : precedingEventInstances) {
if (!inferredPrecedingEvents.contains(precedingEventInst.getDBID().toString())) {
Copy link
Member

@jweiser jweiser Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could inferredPrecedingEvents be typed as Set<Long> instead of Set<String> so a call to toString() isn't needed?

updatedPrecedingEventInstances.add(precedingEventInst);
}
}
// Add preceding event to inferred instance
if (updatedPrecedingEventInstances != null && updatedPrecedingEventInstances.size() > 0)
{
if (updatedPrecedingEventInstances != null && updatedPrecedingEventInstances.size() > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The null check shouldn't be necessary since updatingPrecedingEventInstances is initialized as an empty ArrayList.

logger.info("Inferring inputs...");
if (inferReactionInputsOrOutputs(reactionInst, infReactionInst, input))
logger.info("Inferring outputs...");
if (inferReactionInputsOrOutputs(reactionInst, infReactionInst, output))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be clearer to use guard clauses rather than nest.

https://stackoverflow.com/a/4887280/2295778
https://refactoring.guru/replace-nested-conditional-with-guard-clauses

// Attempt to infer all PhysicalEntities associated with this reaction's Input, Output, CatalystActivity and RegulatedBy attributes.
// Failure to successfully infer any of these attributes will end inference for this reaction.
logger.info("Inferring inputs...");
if (!(inferReactionInputsOrOutputs(reactionInst, infReactionInst, input)))
{
	logger.info("Input inference unsuccessful -- terminating inference for " + reactionInst);
	return;
}

logger.info("Inferring outputs...");
if (!(inferReactionInputsOrOutputs(reactionInst, infReactionInst, output)))
{
	logger.info("Output inference unsuccessful -- terminating inference for " + reactionInst);
	return;
}

logger.info("Inferring catalysts...");
if (!(inferReactionCatalysts(reactionInst, infReactionInst)))
{
	logger.info("Catalyst inference unsuccessful -- terminating inference for " + reactionInst);
	return;
}

inferredCount++;
inferrableHumanEvents.add(reactionInst);
String inferredEvent = infReactionInst.getAttributeValue(DB_ID).toString() + "\t" + infReactionInst.getDisplayName() + "\n";
Files.write(Paths.get(inferredFilehandle), inferredEvent.getBytes(), StandardOpenOption.APPEND);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inferredFilehandle might be better named inferredFilePath. It also may be better to store it as a Path object rather than a String:

private static Path inferredFilePath;

// Replacing the setInferredFilename method
public static void setInferredFilePath(String inferredFilename)
{
	inferredFilePath = Paths.get(inferredFilename);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this could confuse future developers, as Java does not really use file handles.

Comment on lines +110 to +111
String inferredEvent = infReactionInst.getAttributeValue(DB_ID).toString() + "\t" + infReactionInst.getDisplayName() + "\n";
Files.write(Paths.get(inferredFilehandle), inferredEvent.getBytes(), StandardOpenOption.APPEND);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the reaction db_id and display name are needed, the getExtendedDisplayName() method could be used in place of the inferredEvent temp variable:

Files.write(
    Paths.get(inferredFilehandle), 
    infReactionInst.getExtendedDisplayName().getBytes(), 
    StandardOpenOption.APPEND
);

Comment on lines +275 to 285
public static Map<GKInstance, GKInstance> getInferredEvent(Map<GKInstance, GKInstance> eventsAlreadyInferredMap)
{
inferredEvent.putAll(eventsAlreadyInferredMap);
return inferredEvent;
}

public static List<GKInstance> getInferrableHumanEvents()
public static List<GKInstance> getInferrableHumanEvents(List<GKInstance> eventsAlreadyInferred)
{
inferrableHumanEvents.addAll(eventsAlreadyInferred);
return inferrableHumanEvents;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +333 to +336
public static void setEligibleFilename(String eligibleFilename)
{
eligibleFilehandle = eligibleFilename;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*/
private static boolean reactionComponentsAreInferrable(GKInstance reactionInst) throws Exception {
// First gather all inputs, outputs and the PEs in catalyst activities
// Inputs/Outputs/CatalystPEs need to be stored in seperate collections. At time of writing, having it all stored in
Copy link
Member

@jweiser jweiser Oct 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo - separate?

Comment on lines +121 to +122
String eligibleEventName = reactionInst.getAttributeValue(DB_ID).toString() + "\t" + reactionInst.getDisplayName() + "\n";
Files.write(Paths.get(eligibleFilehandle), eligibleEventName.getBytes(), StandardOpenOption.APPEND);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +157 to +174
for (GKInstance reactionInput : reactionInputs) {
if (!componentIsInferrable(reactionInput)) {
return false;
}
}
// Screen outputs
for (GKInstance reactionOutput : reactionOutputs) {
if (!componentIsInferrable(reactionOutput)) {
return false;
}
}
// Screen catalyst PhysicalEntities
for (GKInstance reactionCatalystPE : reactionCatalystPEs) {
if (!componentIsInferrable(reactionCatalystPE)) {
return false;
}
}
return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could do this:

return Stream.of(reactionInputs, reactionOutputs, reactionCatalystPEs)
	.allMatch(reactionInput -> componentIsInferrable(reactionInput));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing is the componentIsInferrable exception would need to be handled in that method rather than thrown. If it's practical to handle the exception in the method it gets thrown rather than throwing it to the method caller it can help reduce the methods which throw exceptions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jweiser I'm not entirely sure what would be equivalent. In the code here, the function will return as soon as something is found to not be inferrable. Does Stream.of(...).allMatch(...) short-circuit at the first failed match, or will it evaluate everything? It's not clear to me. Is there an alternate to allMatch that will fail on the first failure?

Forcing an unnecessary (I assume) evaluation of everything in all of those collections seems rather inefficient, just for the sake of slightly niftier-looking code (ignoring the fact that lambdas already introduce performance overheads).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@SolomonShorser-OICR SolomonShorser-OICR Oct 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to simplify this might be to add all of the inputs, outputs, PEs into a big list and then just use one loop:

List<GKInstance> bigList = new ArrayList<>();
bigList.addAll(reactionInputs);
bigList.addAll(reactionCatalystPEs);
bigList.addAll(reactionOutputs);

for (GKInstance instance : bigList) {
    if (!componentIsInferrable(instance)) {
        return false;
    }
}
return true;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still has return in the middle of the function, but only in two places, and the code is smaller so it's not as bad ;)

Comment on lines +182 to +183
if (!SpeciesCheckUtility.checkForSpeciesAttribute(reactionComponent)) {
// return true;
Copy link
Member

@jweiser jweiser Oct 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a guard clause and the return true; not commented out. Empty blocks are best avoided.

// return true;
} else if (reactionComponent.getSchemClass().isa(GenomeEncodedEntity))
{
if (reactionComponent.getSchemClass().toString().contains(EntityWithAccessionedSequence)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why contains here rather than equals or isa?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isa probably wouldn't make sense since this is comparing Strings. It's possible that the String-representation of the SchemaClass is something like [EntityWithAccessionedSequence] so equals would fail as well. Not sure...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It think it could be
reactionComponent.getSchemClass().isa(EntityWithAccessionedSequence)
or
reactionComponent.getSchemClass().getName().equals(EntityWithAccessionedSequence)

Copy link
Member

@jweiser jweiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. Mostly style/organization related comments.

if (reactionComponent.getSchemClass().isa(Complex) || reactionComponent.getSchemClass().isa(Polymer)) {
int percent = 0;
if (totalProteinCounts > 0) {
percent = (inferrableProteinCounts * 100) / totalProteinCounts;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an int for percent doesn't cause any loss-of-precision problems?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double percent = 0 at line 199 would be good

Copy link
Collaborator

@SolomonShorser-OICR SolomonShorser-OICR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please have a look at the logic in ReactioneInferrer.inferReaction. There are two empty return statements. I feel that using empty returns in a large void function as a method of flow-control is not a good practice. They can easily get lost and make it more difficult to comprehend the logic flow. If possible, adjust this function so that it does not need any empty return statements.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request orthoinference Issues related to the Orthoinference project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants