Commit 94aad97

Add Regular Path Patterns to Cypher
1 parent a670fe0 commit 94aad97

3 files changed

+353
-1
lines changed

@@ -0,0 +1,229 @@
1+
= CIP2017-02-06 Regular Path Patterns
2+
:numbered:
3+
:toc:
4+
:toc-placement: macro
5+
:source-highlighter: codemirror
6+
7+
*Authors:* Tobias Lindaaker <tobias[email protected]>
8+
9+
toc::[]
10+
11+
== Regular Path Patterns
12+
13+
Above and beyond the types of patterns that can be expressed in Cypher using the normal path syntax, Cypher also supports what amounts to regular expressions over paths.
14+
Queries of this type are typically referred to as Regular Path Queries (RPQs).
15+
In Cypher, Regular Path Queries are expressed through the use of _Regular Path Patterns_.
16+
17+
A Regular Path Pattern is defined as:
18+
19+
• A simple relationship type +
20+
`()-/:X/-()` denotes a Regular Path Pattern matching relationships of type `X`.
21+
• A predicate on the labels of a node +
22+
`()-/(:Z)/-()` denotes a Regular Path Pattern matching nodes with label `Z`.
23+
• A sequence of Regular Path Patterns +
24+
`()-/_a_ _b_/-()` denotes a Regular Path Pattern matching first the pattern defined by `_a_`, then the pattern defined by `_b_` (in order left to right).
25+
• An alternative between Regular Path Patterns +
26+
`()-/_a_ | _b_/-()` denotes a Regular Path Pattern matching either the pattern defined by `_a_` or the pattern defined by `_b_`.
27+
• A repetition of a Regular Path Pattern +
28+
`()-/_a_*/-()` denotes a Regular Path Pattern matching the pattern defined by `_a_` zero or more times. +
29+
`()-/_a_+/-()` denotes a Regular Path Pattern matching the pattern defined by `_a_` one or more times. +
30+
`()-/_a_*_x_../-()` denotes a Regular Path Pattern matching the pattern defined by `_a_` at least `_x_` times. +
31+
`()-/_a_*_x_.._y_/-()` denotes a Regular Path Pattern matching the pattern defined by `_a_` at least `_x_` times and at most `_y_` times.
32+
• A grouping of a Regular Path Pattern +
33+
`()-/[_a_]/-()` denotes a grouping of the pattern `_a_`.
34+
• A specification of direction for a Regular Path Pattern +
35+
`()-/ _a_ >/-()` denotes that the Regular Path Pattern `_a_` should be interpreted in a left-to-right direction. +
36+
`()-/< _a_ /-()` denotes that the Regular Path Pattern `_a_` should be interpreted in a right-to-left direction. +
37+
`()-/< _a_ >/-()` denotes that the Regular Path Pattern `_a_` should be interpreted in any direction.
38+
• A reference to a Defined Path Predicate +
39+
`()-/alpha/-()` denotes a reference to a Defined Path Predicate named `alpha`.
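
As an illustration of how these building blocks compose, the following sketch combines alternation, grouping, bounded repetition, and a node label predicate. The relationship types `KNOWS` and `WORKS_WITH` and the label `Person` are hypothetical, chosen only for this example, and the sketch assumes that a node predicate applies to the node at its position in the sequence.

[source, cypher]
.Combining several constructs (hypothetical schema)
----
// One to three hops, each either KNOWS or WORKS_WITH,
// ending at a node that has the label Person.
MATCH (a)-/[:KNOWS | :WORKS_WITH]*1..3 (:Person)/->(b)
RETURN b
----

The brackets are needed here so that the repetition applies to the whole alternative rather than to `:WORKS_WITH` alone.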
40+
41+
Regular Path Patterns are written similarly to how relationship patterns are written, but enclosed within two slash (`/`) characters instead of brackets (`[]`).
42+
43+
Contrary to Relationship Patterns, Regular Path Patterns do _not_ allow binding a relationship to a variable.
44+
In order to bind the matching path to a variable, a Path Assignment should be used, by preceding the path with an identifier and an equals sign (`=`).
45+
This avoids a problem that existed in the past with repetition of relationships (a syntax that is deprecated with the introduction of Regular Path Patterns), where a relationship variable would bind to a list, making it hard to express predicates over the actual relationships.
46+
Predicates on parts of a Regular Path Pattern are instead expressed through the use of explicitly defined path predicates.
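
A minimal sketch of such a Path Assignment, assuming it behaves as path assignment does for ordinary patterns; the relationship type `KNOWS` is hypothetical:

[source, cypher]
.Binding the matched path to a variable
----
// p is bound to the whole matching path.
// No relationship variable appears between the slashes.
MATCH p = (a)-/:KNOWS+/->(b)
RETURN p
----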
47+
48+
=== Syntax
49+
50+
The syntax of Regular Path Patterns fits into the greater Cypher syntax through `PatternElementChain`.
51+
52+
----
53+
PatternElementChain = (RelationshipPattern | RegularPathPattern), NodePattern ;
54+
55+
RegularPathPattern = (LeftArrowHead, Dash, '/', [RegularPathExpression], '/', Dash, RightArrowHead)
56+
| (LeftArrowHead, Dash, '/', [RegularPathExpression], '/', Dash)
57+
| (Dash, '/', [RegularPathExpression], '/', Dash, RightArrowHead)
58+
| (Dash, '/', [RegularPathExpression], '/', Dash)
59+
;
60+
RegularPathExpression = {RegularPathAlternative}- ;
61+
RegularPathAlternative = RegularPathSequence, {'|', RegularPathSequence} ;
62+
RegularPathSequence = {RegularPathStar}- ;
63+
RegularPathStar = RegularPathDirected, [('*', [RangeLiteral]) | '+'] ;
64+
RegularPathDirected = ['<'], RegularPathBase, ['>'] ;
65+
RegularPathBase = RegularPathRelationship
66+
| RegularPathAnyRelationship
67+
| RegularPathNode
68+
| RegularPathReference
69+
| '[', RegularPathExpression, ']'
70+
;
71+
RegularPathRelationship = RelType ;
72+
RegularPathAnyRelationship = '-' ;
73+
RegularPathNode = '(', NodeLabels, ')' ;
74+
RegularPathReference = SymbolicName ;
75+
----
76+
77+
The `RegularPathReference` is a reference to a Defined Path Predicate.
78+
These are defined using the following syntax:
79+
80+
----
81+
DefinedPathPredicate = 'PATH', PathPredicatePrototype, 'IS', Pattern, [Where] ;
82+
PathPredicatePrototype = '(', Variable, ')', RegularPathPrototype, '(', Variable, ')' ;
83+
RegularPathPrototype = (LeftArrowHead, Dash, '/', DefinedPathName, '/', Dash)
84+
| (Dash, '/', DefinedPathName, '/', Dash, RightArrowHead)
85+
| (Dash, '/', DefinedPathName, '/', Dash)
86+
;
87+
DefinedPathName = SymbolicName ;
88+
----
89+
90+
91+
=== Directions
92+
93+
The direction of relationships matched by a Regular Path Pattern is primarily decided by the directional arrow surrounding the pattern.
94+
If the arrow points from left to right (i.e. `(left)-/pattern/\->(right)`), the paths described by the pattern are paths in the left-to-right direction, i.e. paths that are _outgoing_ from the node to the left of the pattern, and _incoming_ to the node to the right of the pattern.
95+
If the arrow points from right to left (i.e. `(left)\<-/pattern/-(right)`), the paths described by the pattern are paths in the right-to-left direction, i.e. paths that are _incoming_ to the node to the left of the pattern, and _outgoing_ from the node to the right of the pattern.
96+
If there are no arrowheads (i.e. `(left)-/pattern/-(right)`), or if both arrowheads are present (i.e. `(left)\<-/pattern/\->(right)`), the paths described by the pattern are paths in either the left-to-right or the right-to-left direction.
97+
98+
All parts of a Regular Path Pattern will assume the direction of the surrounding arrow, unless the direction is explicitly overridden for that particular part of the pattern.
99+
A prefix of `<` to part of a pattern overrides the direction of that part to be right-to-left.
100+
A suffix of `>` to part of a pattern overrides the direction of that part to be left-to-right.
101+
Both a `<` prefix and a `>` suffix can be used on the same part of the pattern to override the direction of that part to be _either direction_.
102+
Direction overrides only apply to a single pattern part.
103+
In order to apply the direction override to multiple parts of the pattern, those parts should be grouped.
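
The difference between overriding a single part and overriding a group can be sketched as follows, using hypothetical relationship types `A`, `B`, and `C`. The relationship-pattern equivalents given in the comments follow the reading used in the direction examples of this document; they are an interpretation, not normative text.

[source, cypher]
.Overriding one part versus overriding a group (hypothetical types)
----
// Only :B is reversed; :C keeps the surrounding left-to-right direction.
// Reads as (a)-[:A]->()<-[:B]-()-[:C]->(b)
MATCH (a)-/:A <:B :C/->(b)
RETURN b

// The group is reversed as a whole, so both :B and :C point right-to-left.
// Reads as (a)-[:A]->()<-[:B]-()<-[:C]-(b)
MATCH (a)-/:A <[:B :C]/->(b)
RETURN b
----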
104+
105+
Using both a `<` prefix and a `>` suffix on the same pattern is always the same thing as a disjunction between that pattern with a `<` prefix and that pattern with a `>` suffix.
106+
This means that `()-/< _a_ >/-()` is the same as `()-/[< _a_] | [_a_ >]/-()`.
107+
108+
==== Directions and Defined Path Predicates
109+
110+
When a Defined Path Predicate is referenced, the direction of the reference is matched with the direction in the declaration of the Defined Path Predicate.
111+
If the Defined Path Predicate is declared left-to-right, but the direction of the reference is right-to-left, the definition of the Defined Path Predicate is reversed to match the direction of the reference.
112+
The same reversal applies if the Defined Path Predicate is defined right-to-left but the direction of the reference is left-to-right.
113+
If the direction of the reference is _either direction_, the Defined Path Predicate is matched both in its declared direction and its reversed direction.
114+
If a Defined Path Predicate is declared without a direction, the direction of the reference does not matter, since the direction of the Defined Path Predicate is inherently _any direction_.
115+
A Defined Path Predicate declared without a direction must have a definition that is equivalent if reversed.
116+
117+
==== Direction examples
118+
119+
• `()-/a <[b c] d/\->()` is the same as `()-/a/\->()\<-/b c/-()-/d/\->()`, i.e. the direction of the group `b c` has been overridden to be right-to-left in a pattern where the overall direction is left-to-right.
120+
• `()-/a <b> c/\->()` is the same as `()-/a/\->()-/b/-()-/c/\->()`, i.e. the direction of `b` has been overridden to be _either direction_.
121+
• `()-/a/-()`, `()-/<a>/-()`, `()-/<a>/\->()`, `()\<-/<a>/-()`, `()\<-/<a>/\->()`, and `()\<-/a/\->()` all mean the same thing: matching `a` in _either direction_.
122+
123+
Given these Defined Path Predicates:
124+
125+
[source, cypher]
126+
----
127+
PATH (l)-/alpha/->(r) IS (l)-[:X]->()-[:Y]->(r)
128+
PATH (l)-/beta/->(r) IS (l)<-[:Y]-()<-[:X]-(r)
129+
PATH (l)-/gamma/-(r) IS (l)-/[:X :Y]> | <[:Y :X]/-(r)
130+
----
131+
132+
• `()-/alpha/\->()` is equivalent to `()\<-/beta/-()`
133+
• `()\<-/alpha/-()` is equivalent to `()-/beta/\->()`
134+
• `()-/gamma/\->()` is equivalent to `()\<-/gamma/-()`, since both are equivalent to `()-/gamma/-()`
135+
• `()-/gamma/-()` is equivalent to `()-/alpha/-()`, since `()-/alpha/-()` is the same as `()-/alpha> | <alpha/-()`, which is equivalent to the declaration of `gamma`. +
136+
It is also equivalent to `()-/<beta | beta>/-()` which is the same as `()-/beta/-()`.
137+
138+
=== Regular Path Pattern Examples
139+
140+
The astute reader of the syntax will have noticed that it is possible to express a Regular Path Pattern with an empty path expression:
141+
142+
[source, cypher]
143+
----
144+
MATCH (a)-//-(b)
145+
----
146+
147+
This pattern simply states that `a` and `b` must be the same node, and is thus the same as:
148+
149+
[source, cypher]
150+
----
151+
MATCH (a), (b) WHERE a = b
152+
----
153+
154+
The same reader will also have noticed that it is possible to define a pattern containing just a relationship type:
155+
156+
[source, cypher]
157+
----
158+
MATCH (a)-/:KNOWS/->(b)
159+
----
160+
161+
That pattern is indeed equivalent to the very similar relationship pattern:
162+
163+
[source, cypher]
164+
----
165+
MATCH (a)-[:KNOWS]->(b)
166+
----
167+
168+
The main difference is that the variant with a relationship pattern is able to bind that relationship and express further predicates over it.
169+
170+
The Regular Path Patterns start becoming interesting when larger expressions are put together:
171+
172+
[source, cypher]
173+
.Finding someone loved by someone hated by someone you know, transitively
174+
----
175+
MATCH (you)-/[:KNOWS :HATES]+ :LOVES/->(someone)
176+
----
177+
178+
Note the `+` expressing one or more occurrences of the sequence `KNOWS` followed by `HATES`.
179+
180+
The direction of each relationship is governed by the overall direction of the Regular Path Pattern.
181+
It is however possible to explicitly define the direction for a particular part of the pattern.
182+
This is done by either prefixing that part with `<` for a right-to-left direction or suffixing it with `>` for a left-to-right direction.
183+
It is possible to both prefix the part with `<` and suffix it with `>`, indicating that this part of the pattern matches in any direction.
184+
185+
[source, cypher]
186+
.Specifying the direction for different parts of the pattern
187+
----
188+
MATCH (you)-/[:KNOWS <:HATES]+ :LOVES/->(someone)
189+
----
190+
191+
In the example above we say that the `HATES` relationships should have the opposite direction to the other relationships in the path.
192+
193+
Through the use of Defined Path Predicates we can express even more predicates over a path:
194+
195+
[source, cypher]
196+
.Find a chain of unreciprocated lovers
197+
----
198+
MATCH (you)-/unreciprocated_love*/->(someone)
199+
PATH (a)-/unreciprocated_love/->(b) IS
200+
(a)-[:LOVES]->(b)
201+
WHERE NOT EXISTS { (b)-[:LOVES]->(a) }
202+
----
203+
204+
Note how no colon is used when referencing the Defined Path Predicate; in Regular Path Patterns the colon is used only for referencing actual relationship types.
205+
206+
Sometimes it will be interesting to express a predicate on a node in a Regular Path Pattern.
207+
This can be achieved by using a Defined Path Predicate where the nodes on both ends are the same:
208+
209+
[source, cypher]
210+
.Find friends of friends that are not haters
211+
----
212+
MATCH (you)-/:KNOWS not_a_hater :KNOWS/-(friendly_friend_of_friend)
213+
PATH (x)-/not_a_hater/-(x) IS (x)
214+
WHERE NOT EXISTS { (x)-[:HATES]->() }
215+
----
216+
217+
In the case of a Defined Path Predicate where both nodes are the same, the direction of the predicate is irrelevant.
218+
In general the direction of a Defined Path Predicate is quite important, and used for mapping the pattern in the predicate into the Regular Path Patterns that reference it.
219+
The direction of a Defined Path Predicate may be omitted only when the defined predicate is symmetric, that is, equivalent when reversed.
220+
This is obviously the case when both nodes are the same, but it would also be the case when the internal pattern is symmetrical, such as in the following example:
221+
222+
[source, cypher]
223+
.Find chains of co-authorship
224+
----
225+
MATCH (you)-/co_author*/-(someone)
226+
PATH (a)-/co_author/-(b) IS
227+
(a)-[:AUTHORED]->(:Book)<-[:AUTHORED]-(b)
228+
WHERE a <> b
229+
----

grammar/basic-grammar.xml

+84-1
@@ -82,7 +82,10 @@
8282
</production>
8383

8484
<production name="PatternElementChain" rr:inline="true">
85-
<non-terminal ref="RelationshipPattern"/>
85+
<alt>
86+
<non-terminal ref="RelationshipPattern"/>
87+
<non-terminal ref="RegularPathPattern"/>
88+
</alt>
8689
&WS;
8790
<non-terminal ref="NodePattern"/>
8891
</production>
@@ -125,6 +128,86 @@
125128
]
126129
</production>
127130

131+
<production name="RegularPathPattern">
132+
<alt>
133+
<seq>
134+
<non-terminal ref="LeftArrowHead" rr:title="&lt;"/>
135+
&WS; <non-terminal ref="Dash" rr:title="-"/>
136+
&WS; / &WS; <opt><non-terminal ref="RegularPathExpression"/></opt> &WS; /
137+
&WS; <non-terminal ref="Dash" rr:title="-"/>
138+
&WS; <non-terminal ref="RightArrowHead" rr:title="&gt;"/>
139+
</seq>
140+
<seq>
141+
<non-terminal ref="LeftArrowHead" rr:title="&lt;"/>
142+
&WS; <non-terminal ref="Dash" rr:title="-"/>
143+
&WS; / &WS; <opt><non-terminal ref="RegularPathExpression"/></opt> &WS; /
144+
&WS; <non-terminal ref="Dash" rr:title="-"/>
145+
</seq>
146+
<seq>
147+
&WS; <non-terminal ref="Dash" rr:title="-"/>
148+
&WS; / &WS; <opt><non-terminal ref="RegularPathExpression"/></opt> &WS; /
149+
&WS; <non-terminal ref="Dash" rr:title="-"/>
150+
&WS; <non-terminal ref="RightArrowHead" rr:title="&gt;"/>
151+
</seq>
152+
<seq>
153+
&WS; <non-terminal ref="Dash" rr:title="-"/>
154+
&WS; / &WS; <opt><non-terminal ref="RegularPathExpression"/></opt> &WS; /
155+
&WS; <non-terminal ref="Dash" rr:title="-"/>
156+
</seq>
157+
</alt>
158+
</production>
159+
160+
<production name="RegularPathExpression">
161+
<repeat min="1"><non-terminal ref="RegularPathAlternative"/></repeat>
162+
</production>
163+
164+
<production name="RegularPathAlternative">
165+
<non-terminal ref="RegularPathSequence"/>
166+
<repeat>| <non-terminal ref="RegularPathSequence"/></repeat>
167+
</production>
168+
169+
<production name="RegularPathSequence">
170+
<repeat min="1"><non-terminal ref="RegularPathStar"/></repeat>
171+
</production>
172+
173+
<production name="RegularPathStar">
174+
<non-terminal ref="RegularPathDirected"/>
175+
<opt><alt>
176+
<seq>* <opt><non-terminal ref="RangeLiteral"/></opt></seq>
177+
+
178+
</alt></opt>
179+
</production>
180+
181+
<production name="RegularPathDirected">
182+
<opt>&lt;</opt> <non-terminal ref="RegularPathBase"/> <opt>&gt;</opt>
183+
</production>
184+
185+
<production name="RegularPathBase">
186+
<alt>
187+
<non-terminal ref="RegularPathRelationship"/>
188+
<non-terminal ref="RegularPathAnyRelationship"/>
189+
<non-terminal ref="RegularPathNode"/>
190+
<non-terminal ref="RegularPathReference"/>
191+
<seq>[ <non-terminal ref="RegularPathExpression"/> ]</seq>
192+
</alt>
193+
</production>
194+
195+
<production name="RegularPathRelationship">
196+
<non-terminal ref="RelType"/>
197+
</production>
198+
199+
<production name="RegularPathAnyRelationship">
200+
<non-terminal ref="Dash" rr:title="-"/>
201+
</production>
202+
203+
<production name="RegularPathNode">
204+
( <non-terminal ref="NodeLabels"/> )
205+
</production>
206+
207+
<production name="RegularPathReference">
208+
<non-terminal ref="SymbolicName"/>
209+
</production>
210+
128211
<production name="Properties" rr:inline="true">
129212
<alt>
130213
<non-terminal ref="MapLiteral"/>

grammar/cypher.xml

+40
@@ -56,6 +56,8 @@
5656
</production>
5757

5858
<production name="RegularQuery" rr:inline="true">
59+
<!-- TODO: is this where we declare path predicates? -->
60+
<repeat><non-terminal ref="DefinedPathPredicate"/></repeat>
5961
<non-terminal ref="SingleQuery"/>
6062
<repeat>&WS;<non-terminal ref="Union"/></repeat>
6163
</production>
@@ -129,6 +131,44 @@
129131
<opt>&WS; <non-terminal ref="Where"/></opt>
130132
</production>
131133

134+
<production name="DefinedPathPredicate">
135+
PATH <non-terminal ref="PathPredicatePrototype"/>
136+
IS <non-terminal ref="Pattern"/>
137+
<opt><non-terminal ref="Where"/></opt>
138+
</production>
139+
140+
<production name="PathPredicatePrototype">
141+
( <non-terminal ref="Variable"/> )
142+
<non-terminal ref="RegularPathPrototype"/>
143+
( <non-terminal ref="Variable"/> )
144+
</production>
145+
146+
<production name="RegularPathPrototype">
147+
<alt>
148+
<seq>
149+
<non-terminal ref="LeftArrowHead" rr:title="&lt;"/>
150+
&WS; <non-terminal ref="Dash" rr:title="-"/>
151+
&WS; / &WS; <opt><non-terminal ref="DefinedPathName"/></opt> &WS; /
152+
&WS; <non-terminal ref="Dash" rr:title="-"/>
153+
</seq>
154+
<seq>
155+
&WS; <non-terminal ref="Dash" rr:title="-"/>
156+
&WS; / &WS; <opt><non-terminal ref="DefinedPathName"/></opt> &WS; /
157+
&WS; <non-terminal ref="Dash" rr:title="-"/>
158+
&WS; <non-terminal ref="RightArrowHead" rr:title="&gt;"/>
159+
</seq>
160+
<seq>
161+
&WS; <non-terminal ref="Dash" rr:title="-"/>
162+
&WS; / &WS; <opt><non-terminal ref="DefinedPathName"/></opt> &WS; /
163+
&WS; <non-terminal ref="Dash" rr:title="-"/>
164+
</seq>
165+
</alt>
166+
</production>
167+
168+
<production name="DefinedPathName">
169+
<non-terminal ref="SymbolicName"/>
170+
</production>
171+
132172
<production name="Unwind">
133173
UNWIND &WS; &expr; &SP; AS &SP; &var;
134174
</production>
