Skip to content
This repository was archived by the owner on Aug 14, 2024. It is now read-only.

Feature/orthoinference 71 updates #104

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from
Original file line number Diff line number Diff line change
Expand Up @@ -49,10 +49,12 @@ public class EventsInferrer
private static String releaseVersion;
private static GKInstance instanceEditInst;
private static GKInstance speciesInst;
private static Map<GKInstance,GKInstance> manualEventToNonHumanSource = new HashMap<>();
private static List<GKInstance> manualHumanEvents = new ArrayList<>();
private static Map<GKInstance, GKInstance> eventsAlreadyInferredMap = new HashMap<>();
private static List<GKInstance> eventsAlreadyInferred = new ArrayList<>();
private static StableIdentifierGenerator stableIdentifierGenerator;
private static OrthologousPathwayDiagramGenerator orthologousPathwayDiagramGenerator;
private static final String summationText = "This event has been computationally inferred from an event that has been demonstrated in another species.<p>The inference is based on the homology mapping from PANTHER. Briefly, reactions for which all involved PhysicalEntities (in input, output and catalyst) have a mapped orthologue/paralogue (for complexes at least 75% of components must have a mapping) are inferred to the other species. High level events are also inferred for these events to allow for easier navigation.<p><a href='/electronic_inference_compara.html' target = 'NEW'>More details and caveats of the event inference in Reactome.</a> For details on PANTHER see also: <a href='http://www.pantherdb.org/about.jsp' target='NEW'>http://www.pantherdb.org/about.jsp</a>";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the "target = 'NEW'" attribute that is used two times is allowed in modern HTML. For a new window or tab, 'target=_blank' is the attribute value to use:

https://www.w3schools.com/tags/att_a_target.asp

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this string be split on to several lines? This is a bit wide, even for me.



@SuppressWarnings("unchecked")
public static void inferEvents(Properties props, String species) throws Exception
Expand Down Expand Up @@ -103,7 +105,7 @@ public static void inferEvents(Properties props, String species) throws Exceptio
String inferredFilename = "inferred_" + species + "_75.txt";
createNewFile(eligibleFilename);
createNewFile(inferredFilename);
ReactionInferrer.setEligibleFilename(eligibleFilename);
SkipInstanceChecker.setEligibleFilename(eligibleFilename);
ReactionInferrer.setInferredFilename(inferredFilename);

stableIdentifierGenerator = new StableIdentifierGenerator(dbAdaptor, (String) speciesObject.get("abbreviation"));
Expand All @@ -113,6 +115,7 @@ public static void inferEvents(Properties props, String species) throws Exceptio
Map<String,String[]> homologueMappings = readHomologueMappingFile(species, "hsap", pathToOrthopairs);
ProteinCountUtility.setHomologueMappingFile(homologueMappings);
EWASInferrer.setHomologueMappingFile(homologueMappings);
SkipInstanceChecker.setHomologueMappingFile(homologueMappings);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name setHomologueMappingFile is a little misleading since it is taking the homologue map data structure itself rather than the file name. Could it be setHomologueMap instead?

} catch (FileNotFoundException e) {
logger.fatal("Unable to locate " + speciesName +" mapping file: hsap_" + species + "_mapping.txt. Orthology prediction not possible.");
return;
Expand Down Expand Up @@ -165,19 +168,19 @@ public static void inferEvents(Properties props, String species) throws Exceptio
logger.info("Attempting RlE inference: " + reactionInst);
// Check if the current Reaction already exists for this species, that it is a valid instance (passes some filters), and that it doesn't have a Disease attribute.
// Adds to manualHumanEvents array if it passes conditions. This code block allows you to re-run the code without re-inferring instances.
List<GKInstance> previouslyInferredInstances = new ArrayList<GKInstance>();
List<GKInstance> previouslyInferredInstances = new ArrayList<>();
previouslyInferredInstances = checkIfPreviouslyInferred(reactionInst, orthologousEvent, previouslyInferredInstances);
previouslyInferredInstances = checkIfPreviouslyInferred(reactionInst, inferredFrom, previouslyInferredInstances);
if (previouslyInferredInstances.size() > 0)
{
previouslyInferredInstances.addAll(checkIfPreviouslyInferred(reactionInst, inferredFrom, previouslyInferredInstances));
Copy link
Member

@jweiser jweiser Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since previouslyInferredInstances is only added to and not queried in the method checkIfPreviouslyInferred, passing it as a parameter isn't needed. It's best practice not to modify parameters, but instead to create and return a different variable in the method body.

You could do this part of the code as follows:

List<GKInstance> previouslyInferredInstances = checkIfPreviouslyInferred(
    reactionInst, Arrays.asList(inferredFrom, orthologousEvent)
);

Then the checkIfPreviouslyInferred method could be:

private static List<GKInstance> checkIfPreviouslyInferred(GKInstance reactionInst, List<String> attributes) throws Exception 
{
	List previouslyInferredInstances = new ArrayList<>();
	for (String attribute : attributes) {
		for (GKInstance attributeInst : (Collection<GKInstance>) reactionInst.getAttributeValuesList(attribute)) {
			GKInstance reactionSpeciesInst = (GKInstance) attributeInst.getAttributeValue(species);
			if (reactionSpeciesInst.getDBID().equals(speciesInst.getDBID()) && attributeInst.getAttributeValue(isChimeric) == null) {
				previouslyInferredInstances.add(attributeInst);
			}
		}
	}
	return previouslyInferredInstances;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it may be better to rename checkIfPreviouslyInferred to getInstancesIfPreviouslyInferred to make it clear that the list of previously inferred instances is what the method returns. A method name starting with "check" may be more suitable to a void or possibly a boolean return value.

if (previouslyInferredInstances.size() > 0) {
GKInstance prevInfInst = previouslyInferredInstances.get(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a check needed to see if the list contains more than one element? If the list has more than one previously inferred instances would that be an error?

if (prevInfInst.getAttributeValue(disease) == null)
{
GKInstance prevInfSummationInst = (GKInstance) prevInfInst.getAttributeValue(summation);
String prevInfSummationText = prevInfSummationInst.getAttributeValue(text).toString();
if (prevInfInst.getAttributeValue(disease) == null && prevInfSummationText.equals(summationText)) {
logger.info("Inferred RlE already exists, skipping inference");
manualEventToNonHumanSource.put(reactionInst, prevInfInst);
manualHumanEvents.add(reactionInst);
eventsAlreadyInferredMap.put(reactionInst, prevInfInst);
eventsAlreadyInferred.add(reactionInst);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the eventsAlreadyInferredMap and eventsAlreadyInferred data structures both needed? Could the list populated in eventsAlreadyInferred be obtained from the key set of the eventsAlreadyInferredMap?

} else {
logger.info("Disease reaction, skipping inference");
logger.info("Either a disease or manually inferred reaction, skipping inference");
}
continue;
}
Expand All @@ -191,8 +194,11 @@ public static void inferEvents(Properties props, String species) throws Exceptio
return;
}
}
PathwaysInferrer.setInferredEvent(ReactionInferrer.getInferredEvent());
PathwaysInferrer.inferPathways(ReactionInferrer.getInferrableHumanEvents());
// Retrieve events inferred from this run, and any that were already inferred. Combine them and then begin Pathway inference.
// The two methods below perform this for a map, containing the original RlE and the inferred RlE, and a List of just the inferred RlEs.
// The latter will be iterated through when building Pathway hierarchies, the former when information from original RlE is needed during this build.
PathwaysInferrer.setInferredEvent(ReactionInferrer.getInferredEvent(eventsAlreadyInferredMap));
PathwaysInferrer.inferPathways(ReactionInferrer.getInferrableHumanEvents(eventsAlreadyInferred));
orthologousPathwayDiagramGenerator.generateOrthologousPathwayDiagrams();
outputReport(species);
logger.info("Finished orthoinference of " + speciesName);
Expand All @@ -218,7 +224,7 @@ private static void setReleaseDates(String dateOfRelease)
}

@SuppressWarnings("unchecked")
private static List<GKInstance> checkIfPreviouslyInferred(GKInstance reactionInst, String attribute, List<GKInstance> previouslyInferredInstances) throws InvalidAttributeException, Exception
private static List<GKInstance> checkIfPreviouslyInferred(GKInstance reactionInst, String attribute, List<GKInstance> previouslyInferredInstances) throws Exception
{
for (GKInstance attributeInst : (Collection<GKInstance>) reactionInst.getAttributeValuesList(attribute))
{
Expand All @@ -233,16 +239,18 @@ private static List<GKInstance> checkIfPreviouslyInferred(GKInstance reactionIns

private static void outputReport(String species) throws IOException
{
int eligibleCount = ReactionInferrer.getEligibleCount();
int eligibleCount = SkipInstanceChecker.getEligibleCount();
int inferredCount = ReactionInferrer.getInferredCount();
float percentInferred = (float) 100*inferredCount/eligibleCount;
// Create file if it doesn't exist
String reportFilename = "report_ortho_inference_test_reactome_" + releaseVersion + ".txt";
logger.info("Updating " + reportFilename);
if (!Files.exists(Paths.get(reportFilename))) {
createNewFile(reportFilename);
String reportHeader = "## Number of inferred reactions by species for Reactome Release " + releaseVersion;
Files.write(Paths.get(reportFilename), reportHeader.getBytes(), StandardOpenOption.APPEND);
}
String results = "hsap to " + species + ":\t" + inferredCount + " out of " + eligibleCount + " eligible reactions (" + String.format("%.2f", percentInferred) + "%)\n";
String results = "hsap to " + species + ":\tInferred " + inferredCount + " out of " + eligibleCount + " eligible reactions (" + String.format("%.2f", percentInferred) + "%)\n";
Files.write(Paths.get(reportFilename), results.getBytes(), StandardOpenOption.APPEND);
}

Expand Down Expand Up @@ -302,7 +310,6 @@ private static void setSummationInstance() throws Exception
GKInstance summationInst = new GKInstance(dbAdaptor.getSchema().getClassByName(Summation));
summationInst.setDbAdaptor(dbAdaptor);
summationInst.addAttributeValue(created, instanceEditInst);
String summationText = "This event has been computationally inferred from an event that has been demonstrated in another species.<p>The inference is based on the homology mapping from PANTHER. Briefly, reactions for which all involved PhysicalEntities (in input, output and catalyst) have a mapped orthologue/paralogue (for complexes at least 75% of components must have a mapping) are inferred to the other species. High level events are also inferred for these events to allow for easier navigation.<p><a href='/electronic_inference_compara.html' target = 'NEW'>More details and caveats of the event inference in Reactome.</a> For details on PANTHER see also: <a href='http://www.pantherdb.org/about.jsp' target='NEW'>http://www.pantherdb.org/about.jsp</a>";
summationInst.addAttributeValue(text, summationText);
summationInst.addAttributeValue(_displayName, summationText);
summationInst = InstanceUtilities.checkForIdenticalInstances(summationInst, null);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -174,7 +174,7 @@ private static GKInstance createInfEWAS(GKInstance ewasInst, boolean override) t
}
// Infers Complex or Polymer instances. These instances are generally comprised of more than 1 PhysicalEntity, and calls 'createOrthoEntity' for each one. Complex/Polymer instances
// are also subject to the 'countDistinctProteins' function. The result from this needs to have at least 75% of total proteins to be inferrable for inference to continue.
private static GKInstance createInfComplexPolymer(GKInstance complexInst, boolean override) throws InvalidAttributeException, InvalidAttributeValueException, Exception
private static GKInstance createInfComplexPolymer(GKInstance complexInst, boolean override) throws Exception
{
if (complexPolymerIdenticals.get(complexInst) == null)
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -47,8 +47,6 @@ public static void inferPathways(List<GKInstance> inferrableHumanEvents) throws
addInferredEventsToInferredPathways();
logger.info("Finished populating inferred Pathways with inferred Events");

//TODO: LOG starting HERE

// Connect preceding events to RlEs, if they have any in the source species.
logger.info("Adding preceding events to inferred Events");
inferPrecedingEvents();
Expand Down Expand Up @@ -168,40 +166,31 @@ private static List<GKInstance> getInferredEventInstances(GKInstance humanPathwa
private static void inferPrecedingEvents() throws Exception
{
Set<GKInstance> seenPrecedingEvent = new HashSet<>();
for (GKInstance inferrableEventInst : updatedInferrableHumanEvents)
{
if (!seenPrecedingEvent.contains(inferrableEventInst))
{
if (inferrableEventInst.getAttributeValue(precedingEvent)!= null)
{
for (GKInstance inferrableEventInst : updatedInferrableHumanEvents) {
if (!seenPrecedingEvent.contains(inferrableEventInst)) {
if (inferrableEventInst.getAttributeValue(precedingEvent) != null) {
Comment on lines +170 to +171
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could these conditions be merged into one if-statement as !seenPrecedingEvent.contains(inferrableEventInst) && inferrableEventInst.getAttributeValue(precedingEvent) != null ?
I don't see an else-statement pairing with the inner if-statement so it seems like it should be OK, and would make the rest of this a little more readable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two conditions could be combined with && if seenPrecedingEvent.add(inferrableEventInst); (line 198) is moved outside the inner if condition. Since seenPrecedingEvent is a set, it won't cause a problem if you add an instance that is already present in the set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may also want to consider refactoring the body of the if statement to its own method so that you have the operation occurring for each inferrableEventInst isolated.

logger.info("Adding preceding event to " + inferrableEventInst);
List<GKInstance> precedingEventInstances = new ArrayList<>();
// Find all preceding events for source instance that have an inferred counterpart
for (GKInstance precedingEventInst : (Collection<GKInstance>) inferrableEventInst.getAttributeValuesList(precedingEvent))
{
if (inferredEventIdenticals.get(precedingEventInst) != null)
{
for (GKInstance precedingEventInst : (Collection<GKInstance>) inferrableEventInst.getAttributeValuesList(precedingEvent)) {
if (inferredEventIdenticals.get(precedingEventInst) != null) {
precedingEventInstances.add(inferredEventIdenticals.get(precedingEventInst));
}
}
Set<String> inferredPrecedingEvents = new HashSet<>();
// Find any inferred preceding events that already exist for the inferred instance (don't want to add any redundant preceding events)
for (GKInstance precedingEventInst : (Collection<GKInstance>) inferredEventIdenticals.get(inferrableEventInst).getAttributeValuesList(precedingEvent))
{
for (GKInstance precedingEventInst : (Collection<GKInstance>) inferredEventIdenticals.get(inferrableEventInst).getAttributeValuesList(precedingEvent)) {
inferredPrecedingEvents.add(precedingEventInst.getDBID().toString());
}
List<GKInstance> updatedPrecedingEventInstances = new ArrayList<>();
// Find existing preceding events that haven't already been attached to the inferred instance
for (GKInstance precedingEventInst : precedingEventInstances)
{
if (!inferredPrecedingEvents.contains(precedingEventInst.getDBID().toString()))
{
for (GKInstance precedingEventInst : precedingEventInstances) {
if (!inferredPrecedingEvents.contains(precedingEventInst.getDBID().toString())) {
Copy link
Member

@jweiser jweiser Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could inferredPrecedingEvents be typed as Set<Long> instead of Set<String> so a call to toString() isn't needed?

updatedPrecedingEventInstances.add(precedingEventInst);
}
}
// Add preceding event to inferred instance
if (updatedPrecedingEventInstances != null && updatedPrecedingEventInstances.size() > 0)
{
if (updatedPrecedingEventInstances != null && updatedPrecedingEventInstances.size() > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The null check shouldn't be necessary since updatingPrecedingEventInstances is initialized as an empty ArrayList.

inferredEventIdenticals.get(inferrableEventInst).addAttributeValue(precedingEvent, updatedPrecedingEventInstances);
dba.updateInstanceAttribute(inferredEventIdenticals.get(inferrableEventInst), precedingEvent);
}
Expand Down
Loading