Feature/orthoinference 71 updates #104

cookersjs · 2019-10-17T18:51:12Z

This PR addresses a couple of things that had been on the backburner for Orthoinference:

Most PhysicalEntity instances are now screened before any inference is attempted on a ReactionlikeEvent. The only cases that aren't screened ahead of time are some edge cases for EntitySets that were very difficult to isolate in advance. Safeguards for preventing inference still exist, so the ones that do squeak through this new screening system will still be rejected. This will prevent a significant majority of orphan PEs.
Orthoinference is now interruptible. If the process is stopped in the middle of a species' inference, it can be restarted and it will still create Pathways and PathwayDiagrams for any automatically inferred RlEs. A question: should manually inferred ones also be receiving Pathway and PathwayDiagram instances? It's a very simple adjustment if so. This addresses "Orthoinference needs to be interruption proof" #31.
The report file produced at the end of a species' inference now has a header that denotes the release. This addresses "Orthoinference report needs a header to identify the Reactome version" #83.

Thanks!

jweiser · 2019-10-28T20:44:44Z

orthoinference/src/main/java/org/reactome/orthoinference/EventsInferrer.java

 	private static StableIdentifierGenerator stableIdentifierGenerator;
 	private static OrthologousPathwayDiagramGenerator orthologousPathwayDiagramGenerator;
+	private static final String summationText = "This event has been computationally inferred from an event that has been demonstrated in another species.<p>The inference is based on the homology mapping from PANTHER. Briefly, reactions for which all involved PhysicalEntities (in input, output and catalyst) have a mapped orthologue/paralogue (for complexes at least 75% of components must have a mapping) are inferred to the other species. High level events are also inferred for these events to allow for easier navigation.<p><a href='/electronic_inference_compara.html' target = 'NEW'>More details and caveats of the event inference in Reactome.</a> For details on PANTHER see also: <a href='http://www.pantherdb.org/about.jsp' target='NEW'>http://www.pantherdb.org/about.jsp</a>";


I'm not sure the "target = 'NEW'" attribute that is used two times is allowed in modern HTML. For a new window or tab, 'target=_blank' is the attribute value to use:

https://www.w3schools.com/tags/att_a_target.asp

Could this string be split on to several lines? This is a bit wide, even for me.
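
Both points could be handled together. The following is a minimal sketch, not the code in the PR: the class name `SummationTextSketch` is hypothetical, and the text is abbreviated to its first sentence and the PANTHER link. It builds the constant from concatenated literals and uses `target='_blank'` as suggested above. Concatenating string literals into a `static final String` is folded at compile time, so splitting the line has no runtime cost.

```java
final class SummationTextSketch {
    // Abbreviated stand-in for the summation text; the full wording is not reproduced here.
    static final String SUMMATION_TEXT =
        "This event has been computationally inferred from an event that has been "
        + "demonstrated in another species.<p>"
        + "For details on PANTHER see also: "
        + "<a href='http://www.pantherdb.org/about.jsp' target='_blank'>"
        + "http://www.pantherdb.org/about.jsp</a>";

    public static void main(String[] args) {
        System.out.println(SUMMATION_TEXT);
    }
}
```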

jweiser · 2019-10-28T21:07:34Z

orthoinference/src/main/java/org/reactome/orthoinference/EventsInferrer.java

@@ -113,6 +115,7 @@ public static void inferEvents(Properties props, String species) throws Exceptio
 			Map<String,String[]> homologueMappings = readHomologueMappingFile(species, "hsap", pathToOrthopairs);
 			ProteinCountUtility.setHomologueMappingFile(homologueMappings);
 			EWASInferrer.setHomologueMappingFile(homologueMappings);
+			SkipInstanceChecker.setHomologueMappingFile(homologueMappings);


The name setHomologueMappingFile is a little misleading since it is taking the homologue map data structure itself rather than the file name. Could it be setHomologueMap instead?

SolomonShorser-OICR · 2019-10-28T21:15:58Z

orthoinference/src/main/java/org/reactome/orthoinference/PathwaysInferrer.java

+			if (!seenPrecedingEvent.contains(inferrableEventInst)) {
+				if (inferrableEventInst.getAttributeValue(precedingEvent) != null) {


Could these conditions be merged into one if-statement as !seenPrecedingEvent.contains(inferrableEventInst) && inferrableEventInst.getAttributeValue(precedingEvent) != null ?
I don't see an else-statement pairing with the inner if-statement so it seems like it should be OK, and would make the rest of this a little more readable.

SolomonShorser-OICR · 2019-10-28T21:21:57Z

orthoinference/src/main/java/org/reactome/orthoinference/ReactionInferrer.java

-							infReactionInst.addAttributeValue(stableIdentifier, orthoStableIdentifierInst);
-							dba.storeInstance(infReactionInst);
-							logger.info("Inferred RlE instance: " + infReactionInst);
+							return;


Is this an if-statement that only contains a return? Could it be changed to the negation of the same condition, so that the rest of the code flows inside it? This is a void function, so it's not even expected to return anything. I find using return as a form of flow control confusing. In short functions this is less of an issue, but here the exit point is hidden and easy to miss. There are other empty returns doing the same thing.

jweiser · 2019-10-28T21:25:27Z

orthoinference/src/main/java/org/reactome/orthoinference/EventsInferrer.java

-			previouslyInferredInstances = checkIfPreviouslyInferred(reactionInst, inferredFrom, previouslyInferredInstances);
-			if (previouslyInferredInstances.size() > 0)
-			{
+			previouslyInferredInstances.addAll(checkIfPreviouslyInferred(reactionInst, inferredFrom, previouslyInferredInstances));


Since previouslyInferredInstances is only added to and not queried in the method checkIfPreviouslyInferred, passing it as a parameter isn't needed. It's best practice not to modify parameters, but instead to create and return a different variable in the method body.

You could do this part of the code as follows:

List<GKInstance> previouslyInferredInstances = checkIfPreviouslyInferred(
    reactionInst, Arrays.asList(inferredFrom, orthologousEvent)
);

Then the checkIfPreviouslyInferred method could be:

private static List<GKInstance> checkIfPreviouslyInferred(GKInstance reactionInst, List<String> attributes) throws Exception {
    List<GKInstance> previouslyInferredInstances = new ArrayList<>();
    for (String attribute : attributes) {
        for (GKInstance attributeInst : (Collection<GKInstance>) reactionInst.getAttributeValuesList(attribute)) {
            GKInstance reactionSpeciesInst = (GKInstance) attributeInst.getAttributeValue(species);
            if (reactionSpeciesInst.getDBID().equals(speciesInst.getDBID()) && attributeInst.getAttributeValue(isChimeric) == null) {
                previouslyInferredInstances.add(attributeInst);
            }
        }
    }
    return previouslyInferredInstances;
}

Also, it may be better to rename checkIfPreviouslyInferred to getInstancesIfPreviouslyInferred to make it clear that the list of previously inferred instances is what the method returns. A method name starting with "check" may be more suitable to a void or possibly a boolean return value.

jweiser · 2019-10-28T21:32:01Z

orthoinference/src/main/java/org/reactome/orthoinference/EventsInferrer.java

-			if (previouslyInferredInstances.size() > 0)
-			{
+			previouslyInferredInstances.addAll(checkIfPreviouslyInferred(reactionInst, inferredFrom, previouslyInferredInstances));
+			if (previouslyInferredInstances.size() > 0) {
 				GKInstance prevInfInst = previouslyInferredInstances.get(0);


Is a check needed to see if the list contains more than one element? If the list has more than one previously inferred instance, would that be an error?

jweiser · 2019-10-28T21:44:26Z

orthoinference/src/main/java/org/reactome/orthoinference/EventsInferrer.java

-					manualEventToNonHumanSource.put(reactionInst, prevInfInst);
-					manualHumanEvents.add(reactionInst);
+					eventsAlreadyInferredMap.put(reactionInst, prevInfInst);
+					eventsAlreadyInferred.add(reactionInst);


Are the eventsAlreadyInferredMap and eventsAlreadyInferred data structures both needed? Could the list populated in eventsAlreadyInferred be obtained from the key set of the eventsAlreadyInferredMap?
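
As a rough illustration of that idea, the sketch below derives the list from the map's key set instead of maintaining both structures. It is written under assumptions: `AlreadyInferredSketch` and `keysAsList` are hypothetical names, Strings stand in for GKInstance, and a `LinkedHashMap` is assumed so that the keys come back in insertion order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AlreadyInferredSketch {
    // Copies the map's keys into a new list, so callers cannot modify the map through it.
    static <K, V> List<K> keysAsList(Map<K, V> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for eventsAlreadyInferredMap (human event -> inferred event).
        Map<String, String> eventsAlreadyInferredMap = new LinkedHashMap<>();
        eventsAlreadyInferredMap.put("humanReactionA", "inferredReactionA");
        eventsAlreadyInferredMap.put("humanReactionB", "inferredReactionB");
        System.out.println(keysAsList(eventsAlreadyInferredMap));
    }
}
```

One difference to keep in mind: a list can hold the same reaction twice, whereas a key set cannot, so the two are only interchangeable if duplicates are not meaningful.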

jweiser · 2019-10-28T21:54:15Z

orthoinference/src/main/java/org/reactome/orthoinference/PathwaysInferrer.java

-				{
+		for (GKInstance inferrableEventInst : updatedInferrableHumanEvents) {
+			if (!seenPrecedingEvent.contains(inferrableEventInst)) {
+				if (inferrableEventInst.getAttributeValue(precedingEvent) != null) {


These two conditions could be combined with && if seenPrecedingEvent.add(inferrableEventInst); (line 198) is moved outside the inner if condition. Since seenPrecedingEvent is a set, it won't cause a problem if you add an instance that is already present in the set.

You may also want to consider refactoring the body of the if statement to its own method so that you have the operation occurring for each inferrableEventInst isolated.
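
A minimal sketch of the combined condition, under assumptions: Strings stand in for GKInstance, a map lookup stands in for `getAttributeValue(precedingEvent)`, and the class and method names are hypothetical. The `add` call sits outside the `if`, relying on `Set.add` being a no-op for elements already present.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PrecedingEventSketch {
    // Returns the events that would be processed: first visit AND a non-null precedingEvent value.
    static List<String> eventsToProcess(List<String> events, Map<String, List<String>> precedingEvents) {
        Set<String> seenPrecedingEvent = new HashSet<>();
        List<String> processed = new ArrayList<>();
        for (String event : events) {
            if (!seenPrecedingEvent.contains(event) && precedingEvents.get(event) != null) {
                processed.add(event);
            }
            // Adding unconditionally is safe: a set ignores elements it already holds.
            seenPrecedingEvent.add(event);
        }
        return processed;
    }

    public static void main(String[] args) {
        Map<String, List<String>> precedingEvents = new java.util.HashMap<>();
        precedingEvents.put("A", java.util.Arrays.asList("X"));
        precedingEvents.put("C", java.util.Arrays.asList("Y"));
        System.out.println(eventsToProcess(java.util.Arrays.asList("A", "B", "A", "C"), precedingEvents));
    }
}
```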

jweiser · 2019-10-28T21:59:25Z

orthoinference/src/main/java/org/reactome/orthoinference/PathwaysInferrer.java

-						if (!inferredPrecedingEvents.contains(precedingEventInst.getDBID().toString()))
-						{
+					for (GKInstance precedingEventInst : precedingEventInstances) {
+						if (!inferredPrecedingEvents.contains(precedingEventInst.getDBID().toString())) {


Could inferredPrecedingEvents be typed as Set<Long> instead of Set<String> so a call to toString() isn't needed?
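
A small sketch of what that could look like, assuming `getDBID()` returns a `Long` (the class and method names here are hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

final class DbIdSetSketch {
    // With Set<Long>, the DB_ID can be looked up directly, with no toString() conversion.
    static boolean alreadyInferred(Set<Long> inferredPrecedingEvents, Long dbId) {
        return inferredPrecedingEvents.contains(dbId);
    }

    public static void main(String[] args) {
        Set<Long> inferredPrecedingEvents = new HashSet<>();
        inferredPrecedingEvents.add(12345L);
        System.out.println(alreadyInferred(inferredPrecedingEvents, 12345L));
    }
}
```

One thing to watch with `Set<Long>`: `contains` accepts any `Object`, so passing an `int` autoboxes to `Integer` and never matches a stored `Long`. Keeping the parameter typed as `Long`, as above, avoids that.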

jweiser · 2019-10-28T22:00:45Z

orthoinference/src/main/java/org/reactome/orthoinference/PathwaysInferrer.java

 							updatedPrecedingEventInstances.add(precedingEventInst);
 						}
 					}
 					// Add preceding event to inferred instance
-					if (updatedPrecedingEventInstances != null && updatedPrecedingEventInstances.size() > 0)
-					{
+					if (updatedPrecedingEventInstances != null && updatedPrecedingEventInstances.size() > 0) {


The null check shouldn't be necessary since updatedPrecedingEventInstances is initialized as an empty ArrayList.

jweiser · 2019-10-28T22:27:05Z

orthoinference/src/main/java/org/reactome/orthoinference/ReactionInferrer.java

-				logger.info("Inferring inputs...");
-				if (inferReactionInputsOrOutputs(reactionInst, infReactionInst, input))
+				logger.info("Inferring outputs...");
+				if (inferReactionInputsOrOutputs(reactionInst, infReactionInst, output))


It may be clearer to use guard clauses rather than nest.

https://stackoverflow.com/a/4887280/2295778
https://refactoring.guru/replace-nested-conditional-with-guard-clauses

// Attempt to infer all PhysicalEntities associated with this reaction's Input, Output, CatalystActivity and RegulatedBy attributes.
// Failure to successfully infer any of these attributes will end inference for this reaction.
logger.info("Inferring inputs...");
if (!(inferReactionInputsOrOutputs(reactionInst, infReactionInst, input))) {
    logger.info("Input inference unsuccessful -- terminating inference for " + reactionInst);
    return;
}

logger.info("Inferring outputs...");
if (!(inferReactionInputsOrOutputs(reactionInst, infReactionInst, output))) {
    logger.info("Output inference unsuccessful -- terminating inference for " + reactionInst);
    return;
}

logger.info("Inferring catalysts...");
if (!(inferReactionCatalysts(reactionInst, infReactionInst))) {
    logger.info("Catalyst inference unsuccessful -- terminating inference for " + reactionInst);
    return;
}

jweiser · 2019-10-29T13:46:46Z

orthoinference/src/main/java/org/reactome/orthoinference/ReactionInferrer.java

+						inferredCount++;
+						inferrableHumanEvents.add(reactionInst);
+						String inferredEvent = infReactionInst.getAttributeValue(DB_ID).toString() + "\t" + infReactionInst.getDisplayName() + "\n";
+						Files.write(Paths.get(inferredFilehandle), inferredEvent.getBytes(), StandardOpenOption.APPEND);


inferredFilehandle might be better named inferredFilePath. It also may be better to store it as a Path object rather than a String:

private static Path inferredFilePath;

// Replacing the setInferredFilename method
public static void setInferredFilePath(String inferredFilename) {
    inferredFilePath = Paths.get(inferredFilename);
}

Yeah, this could confuse future developers, as Java does not really use file handles.
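
To show how the renamed `Path` field would be used at the write site, here is a self-contained sketch. The class `InferredFileSketch` and the method `appendInferredEvent` are hypothetical, and `StandardOpenOption.CREATE` is an addition not present in the PR, included so the first append does not fail when the file is missing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class InferredFileSketch {
    private static Path inferredFilePath;

    static void setInferredFilePath(String inferredFilename) {
        inferredFilePath = Paths.get(inferredFilename);
    }

    // Appends one tab-separated "DB_ID<TAB>display name" line to the report file.
    static void appendInferredEvent(long dbId, String displayName) throws IOException {
        String line = dbId + "\t" + displayName + "\n";
        Files.write(inferredFilePath, line.getBytes(StandardCharsets.UTF_8),
            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("inferred", ".txt");
        setInferredFilePath(tmp.toString());
        appendInferredEvent(1L, "Example reaction A");
        appendInferredEvent(2L, "Example reaction B");
        System.out.print(new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```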

jweiser · 2019-10-29T13:49:53Z

orthoinference/src/main/java/org/reactome/orthoinference/ReactionInferrer.java

+						String inferredEvent = infReactionInst.getAttributeValue(DB_ID).toString() + "\t" + infReactionInst.getDisplayName() + "\n";
+						Files.write(Paths.get(inferredFilehandle), inferredEvent.getBytes(), StandardOpenOption.APPEND);


If the reaction db_id and display name are needed, the getExtendedDisplayName() method could be used in place of the inferredEvent temp variable:

Files.write(
    Paths.get(inferredFilehandle),
    infReactionInst.getExtendedDisplayName().getBytes(),
    StandardOpenOption.APPEND
);

jweiser · 2019-10-29T13:52:31Z

orthoinference/src/main/java/org/reactome/orthoinference/ReactionInferrer.java

+	public static Map<GKInstance, GKInstance> getInferredEvent(Map<GKInstance, GKInstance> eventsAlreadyInferredMap)
 	{
+		inferredEvent.putAll(eventsAlreadyInferredMap);
 		return inferredEvent;
 	}

-	public static List<GKInstance> getInferrableHumanEvents()
+	public static List<GKInstance> getInferrableHumanEvents(List<GKInstance> eventsAlreadyInferred)
 	{
+		inferrableHumanEvents.addAll(eventsAlreadyInferred);
 		return inferrableHumanEvents;
 	}


See https://github.com/reactome/data-release-pipeline/pull/104/files#r339807277

jweiser · 2019-10-29T13:54:18Z

orthoinference/src/main/java/org/reactome/orthoinference/SkipInstanceChecker.java

+	public static void setEligibleFilename(String eligibleFilename)
+	{
+		eligibleFilehandle = eligibleFilename;
+	}


Similar to https://github.com/reactome/data-release-pipeline/pull/104/files#r340082898

jweiser · 2019-10-29T13:55:26Z

orthoinference/src/main/java/org/reactome/orthoinference/SkipInstanceChecker.java

+	 */
+	private static boolean reactionComponentsAreInferrable(GKInstance reactionInst) throws Exception {
+		// First gather all inputs, outputs and the PEs in catalyst activities
+		// Inputs/Outputs/CatalystPEs need to be stored in seperate collections. At time of writing, having it all stored in


Typo - separate?

jweiser · 2019-10-29T14:21:12Z

orthoinference/src/main/java/org/reactome/orthoinference/SkipInstanceChecker.java

+			String eligibleEventName = reactionInst.getAttributeValue(DB_ID).toString() + "\t" + reactionInst.getDisplayName() + "\n";
+			Files.write(Paths.get(eligibleFilehandle), eligibleEventName.getBytes(), StandardOpenOption.APPEND);


Similar to https://github.com/reactome/data-release-pipeline/pull/104/files#r340084682

jweiser · 2019-10-29T14:41:28Z

orthoinference/src/main/java/org/reactome/orthoinference/SkipInstanceChecker.java

+		for (GKInstance reactionInput : reactionInputs) {
+			if (!componentIsInferrable(reactionInput)) {
+				return false;
+			}
+		}
+		// Screen outputs
+		for (GKInstance reactionOutput : reactionOutputs) {
+			if (!componentIsInferrable(reactionOutput)) {
+				return false;
+			}
+		}
+		// Screen catalyst PhysicalEntities
+		for (GKInstance reactionCatalystPE : reactionCatalystPEs) {
+			if (!componentIsInferrable(reactionCatalystPE)) {
+				return false;
+			}
+		}
+		return true;


You could do this:

return Stream.of(reactionInputs, reactionOutputs, reactionCatalystPEs)
    .flatMap(Collection::stream)
    .allMatch(reactionComponent -> componentIsInferrable(reactionComponent));

The only catch is that the exception from componentIsInferrable would need to be handled inside that method rather than thrown. If it's practical to handle an exception in the method where it is thrown, rather than propagating it to the caller, that helps reduce the number of methods that throw exceptions.

@jweiser I'm not entirely sure what would be equivalent. In the code here, the function will return as soon as something is found to not be inferrable. Does Stream.of(...).allMatch(...) short-circuit at the first failed match, or will it evaluate everything? It's not clear to me. Is there an alternative to allMatch that will fail on the first failure?

Forcing an unnecessary (I assume) evaluation of everything in all of those collections seems rather inefficient, just for the sake of slightly niftier-looking code (ignoring the fact that lambdas already introduce performance overheads).

Yes, allMatch is a short-circuiting operation: https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#allMatch-java.util.function.Predicate-

One way to simplify this might be to add all of the inputs, outputs, PEs into a big list and then just use one loop:

List<GKInstance> bigList = new ArrayList<>();
bigList.addAll(reactionInputs);
bigList.addAll(reactionCatalystPEs);
bigList.addAll(reactionOutputs);
for (GKInstance instance : bigList) {
    if (!componentIsInferrable(instance)) {
        return false;
    }
}
return true;

This still has return in the middle of the function, but only in two places, and the code is smaller so it's not as bad ;)
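
The short-circuiting question can be checked with a small experiment. The sketch below is illustrative only: Integers stand in for GKInstance, `componentIsInferrable` is a hypothetical stand-in that counts its own calls, and the checked exception is wrapped in an unchecked one so the method can be used in a stream, which is one possible way to deal with the exception issue raised above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

final class AllMatchSketch {
    // Counts how many components were actually evaluated.
    static final AtomicInteger evaluations = new AtomicInteger();

    // Hypothetical stand-in: non-negative values count as inferrable.
    static boolean componentIsInferrable(int component) throws Exception {
        evaluations.incrementAndGet();
        return component >= 0;
    }

    // Lambdas in allMatch cannot throw checked exceptions, so wrap them here.
    private static boolean isInferrableUnchecked(int component) {
        try {
            return componentIsInferrable(component);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static boolean allInferrable(List<Integer> inputs, List<Integer> outputs, List<Integer> catalystPEs) {
        return Stream.of(inputs, outputs, catalystPEs)
            .flatMap(Collection::stream)   // one stream of components across all three collections
            .allMatch(AllMatchSketch::isInferrableUnchecked);
    }

    public static void main(String[] args) {
        boolean result = allInferrable(Arrays.asList(1, -2, 3), Arrays.asList(4), Arrays.asList(5));
        System.out.println(result + " after " + evaluations.get() + " evaluations");
    }
}
```

With the second component failing, the predicate stops being evaluated after that element, which matches the early `return false` in the loop version.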

jweiser · 2019-10-29T16:05:55Z

orthoinference/src/main/java/org/reactome/orthoinference/SkipInstanceChecker.java

+		if (!SpeciesCheckUtility.checkForSpeciesAttribute(reactionComponent)) {
+//				return true;


This could be a guard clause and the return true; not commented out. Empty blocks are best avoided.

jweiser · 2019-10-29T16:08:29Z

orthoinference/src/main/java/org/reactome/orthoinference/SkipInstanceChecker.java

+//				return true;
+		} else if (reactionComponent.getSchemClass().isa(GenomeEncodedEntity))
+		{
+			if (reactionComponent.getSchemClass().toString().contains(EntityWithAccessionedSequence)) {


Why contains here rather than equals or isa?

isa probably wouldn't make sense since this is comparing Strings. It's possible that the String-representation of the SchemaClass is something like [EntityWithAccessionedSequence] so equals would fail as well. Not sure...

I think it could be
reactionComponent.getSchemClass().isa(EntityWithAccessionedSequence)
or
reactionComponent.getSchemClass().getName().equals(EntityWithAccessionedSequence)

jweiser

Looks good overall. Mostly style/organization related comments.

SolomonShorser-OICR · 2019-10-30T18:54:40Z

orthoinference/src/main/java/org/reactome/orthoinference/SkipInstanceChecker.java

+			if (reactionComponent.getSchemClass().isa(Complex) || reactionComponent.getSchemClass().isa(Polymer)) {
+				int percent = 0;
+				if (totalProteinCounts > 0) {
+					percent = (inferrableProteinCounts * 100) / totalProteinCounts;


Using an int for percent doesn't cause any loss-of-precision problems?

double percent = 0 at line 199 would be good
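
A small sketch of the arithmetic involved, with hypothetical method names. Note that changing only the declared type to `double` would not be enough on its own: `(inferrableProteinCounts * 100) / totalProteinCounts` is still integer division when both operands are ints, so one operand has to be a floating-point value such as `100.0`.

```java
final class PercentSketch {
    // Integer arithmetic, as in the diff: the fractional part is truncated.
    static int intPercent(int inferrable, int total) {
        return total > 0 ? (inferrable * 100) / total : 0;
    }

    // Declared double, but the division still happens in int and is truncated before widening.
    static double truncatedDoublePercent(int inferrable, int total) {
        return total > 0 ? (inferrable * 100) / total : 0;
    }

    // Using 100.0 forces floating-point division and keeps the fraction.
    static double doublePercent(int inferrable, int total) {
        return total > 0 ? (inferrable * 100.0) / total : 0.0;
    }

    public static void main(String[] args) {
        System.out.println(intPercent(2, 3));
        System.out.println(truncatedDoublePercent(2, 3));
        System.out.println(doublePercent(2, 3));
    }
}
```

Whether the truncation matters depends on how `percent` is used afterwards, which is not visible in this excerpt. For a `>= 75` comparison against a whole-number threshold, the truncated value gives the same answer as the exact one; for a strict `> 75` comparison or for reporting, it can differ.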

SolomonShorser-OICR

Please have a look at the logic in ReactionInferrer.inferReaction. There are two empty return statements. I feel that using empty returns in a large void function as a method of flow control is not good practice. They can easily get lost and make it more difficult to follow the logic. If possible, adjust this function so that it does not need any empty return statements.

cookersjs added 7 commits October 16, 2019 09:28

Inputs and outputs of Reactions screened before attempting inference

057b22c

Most inputs/outputs screened and all catalysts screened before attempting inference

063db2f

Orthoinference now creates hierarchies for ALL inferred reactions, not just ones created on current run

a4bfeba

Fixed bug in inferrable reaction checker causing output to also appear as an input

426c654

Prevent manually inferred RlEs getting into alreadyInferred data structures

77c4726

Comments for SkipInstanceChecker's new code

83299cd

Report header added; Moved eligibility-determining code

83dfd45

cookersjs added the enhancement and orthoinference labels Oct 17, 2019

cookersjs requested review from jweiser and SolomonShorser-OICR October 17, 2019 18:51

jweiser reviewed Oct 28, 2019

SolomonShorser-OICR reviewed Oct 28, 2019

jweiser reviewed Oct 28, 2019

jweiser reviewed Oct 29, 2019

SolomonShorser-OICR reviewed Oct 30, 2019

SolomonShorser-OICR suggested changes Oct 30, 2019

Shifted comment to more relevant place

93b0e15
