
Path Query to Cypher

Yash Sharma edited this page Aug 29, 2017 · 1 revision

Note: the Path Query to Cypher conversion code is in the org.intermine.neo4j.cypher package.

Users query the InterMine graph through the API using Path Queries, whereas a Neo4j database is queried with the Cypher Query Language. Therefore, the Path Query given by the user has to be converted into a Cypher query. The following is an example Path Query.

<query name="" model="genomic" view="Gene.primaryIdentifier" longDescription="" sortOrder="Gene.primaryIdentifier asc">
    <constraint path="Gene.chromosomeLocation" op="DOES NOT OVERLAP">
        <value>2L:14615435..14619002
        </value>
    </constraint>
</query>

As the example shows, a Path Query is made up mostly of paths. The information represented by a path is typically

  • returned as a view,
  • constrained to filter the data, or
  • used to order the results.
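Step 2 of the approach below gathers these paths. As a rough sketch only: the real code works on a PathQuery object, but assuming the query is available as XML in the format shown above, the paths can be collected with the JDK's built-in DOM parser. PathCollector and collectPaths are hypothetical names, and the sketch assumes sortOrder alternates a path and a direction (e.g. "Gene.primaryIdentifier asc").

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects every path referenced by a Path Query given as XML.
public class PathCollector {

    public static Set<String> collectPaths(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid Path Query XML", e);
        }
        Element query = doc.getDocumentElement();
        // LinkedHashSet keeps first-seen order and drops duplicates.
        Set<String> paths = new LinkedHashSet<>();

        // Views: a space-separated list of paths.
        for (String view : query.getAttribute("view").trim().split("\\s+")) {
            if (!view.isEmpty()) {
                paths.add(view);
            }
        }
        // Constraints: each <constraint> carries its path in the "path" attribute.
        NodeList constraints = query.getElementsByTagName("constraint");
        for (int i = 0; i < constraints.getLength(); i++) {
            paths.add(((Element) constraints.item(i)).getAttribute("path"));
        }
        // Sort order (assumed format): "path direction path direction ...".
        String[] sort = query.getAttribute("sortOrder").trim().split("\\s+");
        for (int i = 0; i + 1 < sort.length; i += 2) {
            paths.add(sort[i]);
        }
        return paths;
    }

    public static void main(String[] args) {
        String xml = "<query name=\"\" model=\"genomic\" view=\"Gene.primaryIdentifier\""
                + " sortOrder=\"Gene.primaryIdentifier asc\">"
                + "<constraint path=\"Gene.chromosomeLocation\" op=\"DOES NOT OVERLAP\">"
                + "<value>2L:14615435..14619002</value></constraint></query>";
        System.out.println(collectPaths(xml)); // [Gene.primaryIdentifier, Gene.chromosomeLocation]
    }
}
```

Because Gene.primaryIdentifier appears both as a view and as a sort path, it is collected only once.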

A data structure called a PathTree is used during the Path Query to Cypher conversion.
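A minimal sketch of such a tree, assuming paths are plain dotted strings: each path is split on ".", and components shared with earlier paths reuse the existing TreeNodes, so "Gene.pathways.identifier" and a hypothetical "Gene.pathways.name" hang off the same pathways node. PathTreeSketch, addPath and find are illustrative names, not the classes in org.intermine.neo4j.cypher.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PathTreeSketch {

    // One component of a path, e.g. "pathways" in "Gene.pathways.identifier".
    public static class TreeNode {
        public final String name;
        public final TreeNode parent;
        // Children keyed by component name; insertion order is preserved.
        public final Map<String, TreeNode> children = new LinkedHashMap<>();

        TreeNode(String name, TreeNode parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    public static class PathTree {
        private TreeNode root;

        // Adds a dotted path; a shared prefix reuses the nodes already in the tree.
        public void addPath(String path) {
            String[] parts = path.split("\\.");
            if (root == null) {
                root = new TreeNode(parts[0], null);
            } else if (!root.name.equals(parts[0])) {
                throw new IllegalArgumentException("All paths must start at " + root.name);
            }
            TreeNode current = root;
            for (int i = 1; i < parts.length; i++) {
                TreeNode parent = current;
                current = parent.children.computeIfAbsent(parts[i], n -> new TreeNode(n, parent));
            }
        }

        public TreeNode getRoot() {
            return root;
        }

        // Follows a dotted path from the root; returns null if it is not in the tree.
        public TreeNode find(String path) {
            String[] parts = path.split("\\.");
            if (root == null || !root.name.equals(parts[0])) {
                return null;
            }
            TreeNode current = root;
            for (int i = 1; i < parts.length && current != null; i++) {
                current = current.children.get(parts[i]);
            }
            return current;
        }
    }

    public static void main(String[] args) {
        PathTree tree = new PathTree();
        tree.addPath("Gene.primaryIdentifier");
        tree.addPath("Gene.pathways.identifier");
        tree.addPath("Gene.pathways.name"); // hypothetical path sharing the Gene.pathways prefix
        System.out.println(tree.getRoot().children.keySet());             // [primaryIdentifier, pathways]
        System.out.println(tree.find("Gene.pathways").children.keySet()); // [identifier, name]
    }
}
```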

PathTree

Roughly, the following approach is used to convert a Path Query to Cypher:

1. Take a PathQuery object as input.
2. Retrieve all the Paths from the Views, Constraints & Sort Order.
3. Using all the Paths of the PathQuery, create a PathTree such that:
	1. Each TreeNode represents one component of a path. For example, the path "Gene.pathways.identifier" forms three TreeNodes: Gene, pathways and identifier.
	2. Paths with a common prefix share the TreeNodes of that prefix, and therefore a common ancestor.
	3. The root TreeNode represents a Graph Node.
	4. All leaves of the PathTree represent Graph Properties.
	5. All other (internal) TreeNodes represent either Graph Nodes or Graph Relationships.
4. Generate and store a unique variable name for each internal TreeNode of the PathTree.
	1. This variable name is used to refer to that TreeNode in the Cypher query.
	2. To generate the variable name, join the lower-cased components of the TreeNode's path with underscores. For example, the TreeNode pathways in "Gene.pathways.identifier" gets the variable name gene_pathways, and Gene.chromosomeLocation becomes gene_chromosomelocation.
5. Use the PathTree and the PathQuery to generate the Cypher query:
	1. To create the MATCH clause, start at the root and recursively match each TreeNode of the PathTree:
		1. If the TreeNode is the root, match the node itself, e.g. (gene).
		2. If the current TreeNode is a NODE:
			1. If its parent is also a NODE, fetch the relationship type from the XML data model file and create the match as (parentNode)-[relationshipFromXml]-(currentNode).
			2. If its parent is a RELATIONSHIP, fetch the grandparent from the PathTree and create the match as (grandParentNode)-[parentNode]-(currentNode).
		3. If the current TreeNode is a RELATIONSHIP:
			1. If it has no children, add a match with an empty node as (parentNode)-[currentNode]-().
			2. If it has children, do nothing; they are matched when the recursion reaches them.
	2. To create the WHERE clause:
		1. For each constraint in the PathQuery, generate an equivalent Cypher constraint.
		2. In the constraint logic of the PathQuery, replace the code of each constraint with its equivalent Cypher constraint.
	3. To create the RETURN clause:
		1. For each view, get its path.
		2. Get the variable name of the last TreeNode of the path.
		3. Join these variable names with commas.
	4. To create the ORDER BY clause:
		1. For each sort order, get its path.
		2. Get the variable name of the last TreeNode of the path.
		3. Append ASC or DESC to each variable name and join them with commas.
	5. To handle JOIN operations in the PathQuery:
		1. Add an OPTIONAL MATCH clause to the query for the corresponding paths.
6. Return the generated query.
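The MATCH rules in step 5.1 can be sketched as a short recursive function. This is a hedged illustration, not the QueryGenerator code: the Kind of each TreeNode is set by hand, the relationship types that the real code reads from the XML data model are replaced by a hypothetical Map, and the relationship type is attached to the relationship variable as in the generated query on this page. A RELATIONSHIP is treated as having "no children" when it has no NODE children, since property children are never matched (an interpretation). MatchClauseSketch, Kind and matchClause are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MatchClauseSketch {

    public enum Kind { NODE, RELATIONSHIP, PROPERTY }

    public static class TreeNode {
        final String name;
        final Kind kind;
        final TreeNode parent;
        final List<TreeNode> children = new ArrayList<>();

        TreeNode(String name, Kind kind, TreeNode parent) {
            this.name = name;
            this.kind = kind;
            this.parent = parent;
        }

        // Adds and returns a child TreeNode (helper for building a tree by hand).
        public TreeNode child(String name, Kind kind) {
            TreeNode c = new TreeNode(name, kind, this);
            children.add(c);
            return c;
        }

        // Dotted path from the root, e.g. "Gene.chromosomeLocation".
        String path() {
            return parent == null ? name : parent.path() + "." + name;
        }

        // Nodes/relationships: lower-cased components joined by '_'.
        // Properties (assumed convention): owner's variable + "." + property name.
        String variableName() {
            if (kind == Kind.PROPERTY) {
                return parent.variableName() + "." + name;
            }
            String own = name.toLowerCase();
            return parent == null ? own : parent.variableName() + "_" + own;
        }
    }

    public static TreeNode root(String label) {
        return new TreeNode(label, Kind.NODE, null);
    }

    // "[var:TYPE]" for a RELATIONSHIP; relTypes is a stand-in for the XML data model.
    static String relPattern(TreeNode rel, Map<String, String> relTypes) {
        return "[" + rel.variableName() + ":" + relTypes.get(rel.path()) + "]";
    }

    // Appends one MATCH pattern per TreeNode, recursing into the children.
    static void match(TreeNode n, Map<String, String> relTypes, List<String> out) {
        if (n.kind == Kind.PROPERTY) {
            return; // properties are used in WHERE/RETURN, never matched
        }
        if (n.parent == null) {
            out.add("(" + n.variableName() + " :" + n.name + ")");
        } else if (n.kind == Kind.NODE && n.parent.kind == Kind.NODE) {
            out.add("(" + n.parent.variableName() + ")-[:" + relTypes.get(n.path()) + "]-("
                    + n.variableName() + ")");
        } else if (n.kind == Kind.NODE) { // parent is a RELATIONSHIP: go through the grandparent
            TreeNode rel = n.parent;
            out.add("(" + rel.parent.variableName() + ")-" + relPattern(rel, relTypes)
                    + "-(" + n.variableName() + ")");
        } else if (n.children.stream().noneMatch(c -> c.kind == Kind.NODE)) {
            // RELATIONSHIP without NODE children: match it against an empty node
            out.add("(" + n.parent.variableName() + ")-" + relPattern(n, relTypes) + "-()");
        }
        for (TreeNode c : n.children) {
            match(c, relTypes, out);
        }
    }

    public static String matchClause(TreeNode root, Map<String, String> relTypes) {
        List<String> out = new ArrayList<>();
        match(root, relTypes, out);
        return "MATCH " + String.join(",\n", out);
    }

    public static void main(String[] args) {
        TreeNode gene = root("Gene");
        gene.child("primaryIdentifier", Kind.PROPERTY);
        gene.child("chromosomeLocation", Kind.RELATIONSHIP);
        // Hypothetical stand-in for the relationship types read from the XML data model.
        Map<String, String> relTypes = Map.of("Gene.chromosomeLocation", "LOCATED_ON");
        System.out.println(matchClause(gene, relTypes));
    }
}
```

For the tree built in main, this prints the same two MATCH patterns as the generated query for the example Path Query.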
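Steps 5.2 to 5.4 are mostly string assembly. The following sketch substitutes Cypher conditions for the codes in a constraint logic string such as "A and (B or C)", and builds RETURN and ORDER BY from view and sort-order paths. It assumes lower-case and/or in the logic and the naming convention visible in the generated query (gene.primaryIdentifier for a property of the Gene node); ClauseSketch and its methods are hypothetical, and the Cypher conditions in main are made-up placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClauseSketch {

    private static final Pattern WORD = Pattern.compile("\\b[A-Za-z]+\\b");

    // Step 5.2: replace each constraint code in the logic with its Cypher condition.
    public static String whereClause(String logic, Map<String, String> cypherByCode) {
        Matcher m = WORD.matcher(logic);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String token = m.group();
            String replacement;
            if (cypherByCode.containsKey(token)) {
                replacement = "(" + cypherByCode.get(token) + ")";
            } else if (token.equalsIgnoreCase("and") || token.equalsIgnoreCase("or")) {
                replacement = token.toUpperCase();
            } else {
                throw new IllegalArgumentException("Unknown constraint code: " + token);
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return "WHERE " + sb;
    }

    // Assumed convention: owner components lower-cased and joined by '_', then
    // '.' and the property, e.g. Gene.primaryIdentifier -> gene.primaryIdentifier.
    static String variableFor(String path) {
        String[] parts = path.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected a path ending in a property: " + path);
        }
        StringBuilder owner = new StringBuilder(parts[0].toLowerCase());
        for (int i = 1; i < parts.length - 1; i++) {
            owner.append('_').append(parts[i].toLowerCase());
        }
        return owner + "." + parts[parts.length - 1];
    }

    // Step 5.3: one variable per view, separated by commas.
    public static String returnClause(List<String> views) {
        List<String> vars = new ArrayList<>();
        for (String view : views) {
            vars.add(variableFor(view));
        }
        return "RETURN " + String.join(", ", vars);
    }

    // Step 5.4: sortOrder alternates path and direction, e.g. "Gene.primaryIdentifier asc".
    public static String orderByClause(String sortOrder) {
        String[] t = sortOrder.trim().split("\\s+");
        List<String> items = new ArrayList<>();
        for (int i = 0; i + 1 < t.length; i += 2) {
            items.add(variableFor(t[i]) + " " + t[i + 1].toUpperCase());
        }
        return items.isEmpty() ? "" : "ORDER BY " + String.join(", ", items);
    }

    public static void main(String[] args) {
        System.out.println(returnClause(List.of("Gene.primaryIdentifier")));
        System.out.println(orderByClause("Gene.primaryIdentifier asc"));
        System.out.println(whereClause("A and (B or C)",
                Map.of("A", "gene.length > 1000", "B", "gene.symbol = 'x'", "C", "gene.symbol = 'y'")));
    }
}
```

Matcher.quoteReplacement is used because a Cypher condition may contain "$" or "\", which appendReplacement would otherwise interpret as group references or escapes. Each condition is wrapped in parentheses so the grouping of the original constraint logic is preserved.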

The approach above is implemented in the QueryGenerator class. Converting the example Path Query above yields the following Cypher query.

MATCH (gene :Gene),
(gene)-[gene_chromosomelocation:LOCATED_ON]-()
WHERE NOT (gene_chromosomelocation.start <= 14619002 AND gene_chromosomelocation.end >= 14615435 AND ()-[gene_chromosomelocation]-(:Chromosome {primaryIdentifier:'2L'}))
RETURN gene.primaryIdentifier
ORDER BY gene.primaryIdentifier ASC
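The WHERE clause encodes DOES NOT OVERLAP as a negated interval test: a location overlaps the region 2L:14615435..14619002 when its start is at or before 14619002, its end is at or after 14615435, and it lies on chromosome 2L. A small sketch that reproduces that fragment, assuming the constraint value always has the form chromosome:start..end (surrounding whitespace from the XML is trimmed); OverlapSketch and overlapCondition are hypothetical names, not part of the QueryGenerator API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlapSketch {

    // Assumed region format "<chromosome>:<start>..<end>". Restricting the chromosome
    // name to word characters and dots keeps quotes out of the Cypher string literal.
    private static final Pattern REGION = Pattern.compile("([\\w.]+):(\\d+)\\.\\.(\\d+)");

    // WHERE fragment for OVERLAPS (negate = false) or DOES NOT OVERLAP (negate = true);
    // relVar is the variable of the location relationship, e.g. gene_chromosomelocation.
    public static String overlapCondition(String relVar, String region, boolean negate) {
        Matcher m = REGION.matcher(region.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported region: " + region);
        }
        String chromosome = m.group(1);
        long start = Long.parseLong(m.group(2));
        long end = Long.parseLong(m.group(3));
        // Closed intervals [s, e] and [start, end] overlap iff s <= end and e >= start.
        String overlap = relVar + ".start <= " + end
                + " AND " + relVar + ".end >= " + start
                + " AND ()-[" + relVar + "]-(:Chromosome {primaryIdentifier:'" + chromosome + "'})";
        return negate ? "NOT (" + overlap + ")" : "(" + overlap + ")";
    }

    public static void main(String[] args) {
        // The value from the example Path Query, including its trailing whitespace.
        String value = "2L:14615435..14619002\n        ";
        System.out.println("WHERE " + overlapCondition("gene_chromosomelocation", value, true));
    }
}
```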
