| 1 | += CIP2017-02-06 Regular Path Patterns |
| 2 | +:numbered: |
| 3 | +:toc: |
| 4 | +:toc-placement: macro |
| 5 | +:source-highlighter: codemirror |
| 6 | + |
| 7 | +*Authors:* Tobias Lindaaker < tobias[email protected]> |
| 8 | + |
| 9 | +toc::[] |
| 10 | + |
| 11 | +== Regular Path Patterns |
| 12 | + |
| 13 | +Above and beyond the types of patterns that can be expressed in Cypher using the normal path syntax, Cypher also supports what amounts to regular expressions over paths. |
| 14 | +Queries of this type are typically referred to as Regular Path Queries (RPQs). |
| 15 | +In Cypher, Regular Path Queries are expressed through the use of _Regular Path Patterns_. |
| 16 | + |
| 17 | +A Regular Path Pattern is defined as: |
| 18 | + |
| 19 | +• A simple relationship type + |
| 20 | + `()-/:X/-()` denotes a Regular Path Pattern matching relationships of type `X`. |
| 21 | +• A predicate on the labels of a node + |
| 22 | + `()-/(:Z)/-()` denotes a Regular Path Pattern matching nodes with label `Z`. |
| 23 | +• A sequence of Regular Path Patterns + |
| 24 | + `()-/_a_ _b_/-()` denotes a Regular Path Pattern matching first the pattern defined by `_a_`, then the pattern defined by `_b_` (in order left to right). |
| 25 | +• An alternative between Regular Path Patterns + |
| 26 | + `()-/_a_ | _b_/-()` denotes a Regular Path Pattern matching either the pattern defined by `_a_` or the pattern defined by `_b_`. |
| 27 | +• A repetition of a Regular Path Pattern + |
| 28 | + `()-/_a_*/-()` denotes a Regular Path Pattern matching the pattern defined by `_a_` zero or more times. + |
| 29 | + `()-/_a_+/-()` denotes a Regular Path Pattern matching the pattern defined by `_a_` one or more times. + |
| 30 | + `()-/_a_*_x_../-()` denotes a Regular Path Pattern matching the pattern defined by `_a_` at least `_x_` times. + |
| 31 | + `()-/_a_*_x_.._y_/-()` denotes a Regular Path Pattern matching the pattern defined by `_a_` at least `_x_` times and at most `_y_` times. |
| 32 | +• A grouping of a Regular Path Pattern + |
| 33 | + `()-/[_a_]/-()` denotes a grouping of the pattern `_a_`. |
| 34 | +• A specification of direction for a Regular Path Pattern + |
| 35 | + `()-/ _a_ >/-()` denotes that the Regular Path Pattern `_a_` should be interpreted in a left-to-right direction. + |
| 36 | + `()-/< _a_ /-()` denotes that the Regular Path Pattern `_a_` should be interpreted in a right-to-left direction. + |
| 37 | + `()-/< _a_ >/-()` denotes that the Regular Path Pattern `_a_` should be interpreted in any direction. |
| 38 | +• A reference to a Defined Path Predicate + |
| 39 | + `()-/alpha/-()` denotes a reference to a Defined Path Predicate named `alpha`. |
| 40 | + |
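As a rough illustration of the constructs listed above, the following Python sketch models each pattern as a set of `(start, end)` node pairs. This is an assumption made for illustration only: it ignores path binding and Cypher's relationship-uniqueness rule, and the helper names (`compose`, `identity`, `repeat`) and the toy graph are hypothetical, not part of the proposal.

```python
# Endpoint-only model (an assumption of this sketch): a pattern denotes a set
# of (start, end) node pairs. Path binding and relationship uniqueness are ignored.

def compose(r, s):
    """Sequence `a b`: follow r, then s."""
    return {(x, z) for (x, y) in r for (w, z) in s if y == w}

def identity(nodes):
    """Empty pattern or node predicate: start and end are the same node."""
    return {(n, n) for n in nodes}

def repeat(r, nodes, lo=0, hi=None):
    """Repetition `a*lo..hi`; hi=None means no upper bound."""
    cur = identity(nodes)
    for _ in range(lo):
        cur = compose(cur, r)
    result = set(cur)
    k = lo
    while hi is None or k < hi:
        cur = compose(cur, r)
        if cur <= result:      # no new endpoint pairs: further steps add nothing
            break
        result |= cur
        k += 1
    return result

# Hypothetical toy graph.
NODES = {"a", "b", "c", "d"}
X = {("a", "b"), ("b", "c")}   # relationships of type X
Y = {("c", "d")}               # relationships of type Y
Z_NODES = {"b"}                # nodes with label Z

seq = compose(X, Y)                                 # ()-/:X :Y/->()
alt = X | Y                                         # ()-/:X | :Y/->()
star = repeat(X, NODES)                             # ()-/:X*/->()
plus = repeat(X, NODES, lo=1)                       # ()-/:X+/->()
exactly_two = repeat(X, NODES, lo=2, hi=2)          # ()-/:X*2..2/->()
via_z = compose(compose(X, identity(Z_NODES)), X)   # ()-/:X (:Z) :X/->()
```

In this model a node predicate is just the identity relation restricted to the matching nodes, which is why it can be placed between two relationship steps.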
| 41 | +Regular Path Patterns are written similarly to how relationship patterns are written, but enclosed within two slash (`/`) characters instead of brackets (`[]`). |
| 42 | + |
| 43 | +Contrary to Relationship Patterns, Regular Path Patterns do _not_ allow binding a relationship to a variable. |
| 44 | +In order to bind the matching path to a variable, a Path Assignment should be used, by preceding the path with an identifier and an equals sign (`=`). |
| 45 | +This avoids a problem that existed in the past with repetition of relationships (a syntax that is deprecated with the introduction of Regular Path Patterns), where a relationship variable would bind to a list, making it hard to express predicates over the actual relationships. |
| 46 | +Predicates on parts of a Regular Path Pattern are instead expressed through the use of explicitly defined path predicates. |
| 47 | + |
| 48 | +=== Syntax |
| 49 | + |
| 50 | +The syntax of Regular Path Patterns fits into the greater Cypher syntax through `PatternElementChain`. |
| 51 | + |
| 52 | +---- |
| 53 | +PatternElementChain = (RelationshipPattern | RegularPathPattern), NodePattern ; |
| 54 | + |
| 55 | +RegularPathPattern = (LeftArrowHead, Dash, '/', [RegularPathExpression], '/', Dash, RightArrowHead) |
| 56 | + | (LeftArrowHead, Dash, '/', [RegularPathExpression], '/', Dash) |
| 57 | + | (Dash, '/', [RegularPathExpression], '/', Dash, RightArrowHead) |
| 58 | + | (Dash, '/', [RegularPathExpression], '/', Dash) |
| 59 | + ; |
| 60 | +RegularPathExpression = {RegularPathAlternative}- ; |
| 61 | +RegularPathAlternative = RegularPathSequence, {'|', RegularPathSequence} ; |
| 62 | +RegularPathSequence = {RegularPathStar}- ; |
| 63 | +RegularPathStar = RegularPathDirected, [('*', [RangeLiteral]) | '+'] ; |
| 64 | +RegularPathDirected = ['<'], RegularPathBase, ['>'] ; |
| 65 | +RegularPathBase = RegularPathRelationship |
| 66 | + | RegularPathAnyRelationship |
| 67 | + | RegularPathNode |
| 68 | + | RegularPathReference |
| 69 | + | '[', RegularPathExpression, ']' |
| 70 | + ; |
| 71 | +RegularPathRelationship = RelType ; |
| 72 | +RegularPathAnyRelationship = '-' ; |
| 73 | +RegularPathNode = '(', NodeLabels, ')' ; |
| 74 | +RegularPathReference = SymbolicName ; |
| 75 | +---- |
| 76 | + |
| 77 | +The `RegularPathReference` is a reference to a Defined Path Predicate. |
| 78 | +These are defined using the following syntax: |
| 79 | + |
| 80 | +---- |
| 81 | +DefinedPathPredicate = 'PATH', PathPredicatePrototype, 'IS', Pattern, [Where] ; |
| 82 | +PathPredicatePrototype = '(', Variable, ')', RegularPathPrototype, '(', Variable, ')' ; |
| 83 | +RegularPathPrototype = (LeftArrowHead, Dash, '/', DefinedPathName, '/', Dash) |
| 84 | + | (Dash, '/', DefinedPathName, '/', Dash, RightArrowHead) |
| 85 | + | (Dash, '/', DefinedPathName, '/', Dash) |
| 86 | + ; |
| 87 | +DefinedPathName = SymbolicName ; |
| 88 | +---- |
| 89 | + |
| 90 | + |
| 91 | +=== Directions |
| 92 | + |
| 93 | +The direction of relationships matched by a Regular Path Pattern is primarily decided by the directional arrow surrounding the pattern. |
| 94 | +If the arrow points from left to right (i.e. `(left)-/pattern/\->(right)`), the paths described by the pattern are paths in the left-to-right direction, i.e. paths that are _outgoing_ from the node to the left of the pattern, and _incoming_ to the node to the right of the pattern. |
| 95 | +If the arrow points from right to left (i.e. `(left)\<-/pattern/-(right)`), the paths described by the pattern are paths in the right-to-left direction, i.e. paths that are _incoming_ to the node to the left of the pattern, and _outgoing_ from the node to the right of the pattern. |
| 96 | +If there are no arrowheads (i.e. `(left)-/pattern/-(right)`), or if both arrowheads are present (i.e. `(left)\<-/pattern/\->(right)`), the paths described by the pattern are paths in either the left-to-right or the right-to-left direction. |
| 97 | + |
| 98 | +All parts of a Regular Path Pattern will assume the direction of the surrounding arrow, unless the direction is explicitly overridden for that particular part of the pattern. |
| 99 | +A prefix of `<` to part of a pattern overrides the direction of that part to be right-to-left. |
| 100 | +A suffix of `>` to part of a pattern overrides the direction of that part to be left-to-right. |
| 101 | +Both a `<` prefix and a `>` suffix can be used on the same part of the pattern to override the direction of that part to be _either direction_. |
| 102 | +Direction overrides only apply to a single pattern part. |
| 103 | +In order to apply the direction override to multiple parts of the pattern, those parts should be grouped. |
| 104 | + |
| 105 | +Using both a `<` prefix and a `>` suffix on the same pattern is always the same thing as a disjunction between that pattern with a `<` prefix and that pattern with a `>` suffix. |
| 106 | +This means that `()-/< _a_ >/-()` is the same as `()-/[< _a_] | [_a_ >]/-()`. |
| 107 | + |
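The inheritance and override rules above can be sketched as a small evaluator. This is an illustration under stated assumptions, not the proposal's defined semantics: patterns are modelled as endpoint pairs only, the tuple encoding of patterns and the `evaluate` function are hypothetical, and _either direction_ is read as the union of the two single-direction readings, following the disjunction rule stated above.

```python
def inverse(r):
    """Swap start and end of every pair."""
    return {(y, x) for (x, y) in r}

def compose(r, s):
    """Sequence: follow r, then s."""
    return {(x, z) for (x, y) in r for (w, z) in s if y == w}

def evaluate(pattern, edges, direction):
    """Evaluate a pattern tree under an inherited direction.

    pattern is ("rel", type), ("seq", p, q), ("alt", p, q) or ("dir", d, p),
    where d is "<", ">" or "<>" (an explicit direction override).
    """
    if direction == "<>":
        # Either direction: union of the right-to-left and left-to-right readings.
        return evaluate(pattern, edges, "<") | evaluate(pattern, edges, ">")
    kind = pattern[0]
    if kind == "rel":
        r = edges[pattern[1]]
        return set(r) if direction == ">" else inverse(r)
    if kind == "seq":
        return compose(evaluate(pattern[1], edges, direction),
                       evaluate(pattern[2], edges, direction))
    if kind == "alt":
        return (evaluate(pattern[1], edges, direction)
                | evaluate(pattern[2], edges, direction))
    if kind == "dir":
        # An override replaces the inherited direction for this part only.
        return evaluate(pattern[2], edges, pattern[1])
    raise ValueError(f"unknown pattern kind: {kind}")

# Hypothetical toy graph: relationship type -> set of (source, target) pairs.
EDGES = {
    "a": {("n1", "n2")},
    "b": {("n3", "n2")},
    "c": {("n4", "n3")},
    "d": {("n4", "n5")},
}

# ()-/a <[b c] d/->() : the group `b c` is overridden to right-to-left.
grouped = ("seq", ("rel", "a"),
           ("seq", ("dir", "<", ("seq", ("rel", "b"), ("rel", "c"))),
            ("rel", "d")))
grouped_result = evaluate(grouped, EDGES, ">")

# ()-/< a >/-() versus ()-/[< a] | [a >]/-()
either = evaluate(("dir", "<>", ("rel", "a")), EDGES, ">")
split = evaluate(("alt", ("dir", "<", ("rel", "a")),
                  ("dir", ">", ("rel", "a"))), EDGES, ">")
```

Note that a direction override flips each relationship in the group while the textual left-to-right order of the sequence is kept, which mirrors how ordinary relationship patterns read.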
| 108 | +==== Directions and Defined Path Predicates |
| 109 | + |
| 110 | +When a Defined Path Predicate is referenced, the direction of the reference is matched with the direction in the declaration of the Defined Path Predicate. |
| 111 | +If the Defined Path Predicate is declared left-to-right, but the direction of the reference is right-to-left, the declared direction of the Defined Path Predicate is reversed to match that of the reference. |
| 112 | +The same reversal applies if the Defined Path Predicate is defined right-to-left but the direction of the reference is left-to-right. |
| 113 | +If the direction of the reference is _either direction_, the Defined Path Predicate is matched both in its declared direction and its reversed direction. |
| 114 | +If a Defined Path Predicate is declared without a direction, the direction of the reference does not matter, since the direction of the Defined Path Predicate is inherently _any direction_. |
| 115 | +A Defined Path Predicate declared without a direction must have a definition that is equivalent if reversed. |
| 116 | + |
| 117 | +==== Direction examples |
| 118 | + |
| 119 | +• `()-/a <[b c] d/\->()` is the same as `()-/a/\->()\<-/b c/-()-/d/\->()`, i.e. the direction of the group `b c` has been overridden to be right-to-left in a pattern where the overall direction is left-to-right. |
| 120 | +• `()-/a <b> c/\->()` is the same as `()-/a/\->()-/b/-()-/c/\->()`, i.e. the direction of `b` has been overridden to be _either direction_. |
| 121 | +• `()-/a/-()`, `()-/<a>/-()`, `()-/<a>/\->()`, `()\<-/<a>/-()`, `()\<-/<a>/\->()`, and `()\<-/a/\->()` all mean the same thing: matching `a` in _either direction_. |
| 122 | + |
| 123 | +Given these Defined Path Predicates: |
| 124 | + |
| 125 | +[source, cypher] |
| 126 | +---- |
| 127 | +PATH (l)-/alpha/->(r) IS (l)-[:X]->()-[:Y]->(r) |
| 128 | +PATH (l)-/beta/->(r) IS (l)<-[:Y]-()<-[:X]-(r) |
| 129 | +PATH (l)-/gamma/-(r) IS (l)-/[:X :Y]> | <[:Y :X]/-(r) |
| 130 | +---- |
| 131 | + |
| 132 | +• `()-/alpha/\->()` is equivalent to `()\<-/beta/-()` |
| 133 | +• `()\<-/alpha/-()` is equivalent to `()-/beta/\->()` |
| 134 | +• `()-/gamma/\->()` is equivalent to `()\<-/gamma/-()`, since both are equivalent to `()-/gamma/-()` |
| 135 | +• `()-/gamma/-()` is equivalent to `()-/alpha/-()`, since `()-/alpha/-()` is the same as `()-/alpha> | <alpha/-()`, which is equivalent to the declaration of `gamma`. + |
| 136 | + It is also equivalent to `()-/<beta | beta>/-()` which is the same as `()-/beta/-()`. |
| 137 | + |
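The equivalences listed above can be checked on a toy graph. The sketch below again assumes an endpoint-only model (pairs of start and end nodes, ignoring relationship uniqueness); the `reference` helper and the graph are hypothetical and only illustrate the reversal rule for references.

```python
def inverse(r):
    """Swap start and end of every pair."""
    return {(y, x) for (x, y) in r}

def compose(r, s):
    """Sequence: follow r, then s."""
    return {(x, z) for (x, y) in r for (w, z) in s if y == w}

def reference(declared, direction):
    """Relation seen by a reference to a predicate declared left-to-right."""
    if direction == ">":
        return set(declared)
    if direction == "<":
        return inverse(declared)           # reversed to match the reference
    return declared | inverse(declared)    # either direction

# Hypothetical toy graph: (a)-[:X]->(b)-[:Y]->(c)
X = {("a", "b")}
Y = {("b", "c")}

alpha = compose(X, Y)                                     # (l)-[:X]->()-[:Y]->(r)
beta = compose(inverse(Y), inverse(X))                    # (l)<-[:Y]-()<-[:X]-(r)
gamma = compose(X, Y) | compose(inverse(Y), inverse(X))   # [:X :Y]> | <[:Y :X]
```

Because `gamma` equals its own inverse in this model, the direction of a reference to it makes no difference, which matches the requirement that a predicate declared without a direction be equivalent when reversed.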
| 138 | +=== Regular Path Pattern Examples |
| 139 | + |
| 140 | +The astute reader of the syntax will have noticed that it is possible to express a Regular Path Pattern with an empty path expression: |
| 141 | + |
| 142 | +[source, cypher] |
| 143 | +---- |
| 144 | +MATCH (a)-//-(b) |
| 145 | +---- |
| 146 | + |
| 147 | +This pattern simply states that `a` and `b` must be the same node, and is thus the same as: |
| 148 | + |
| 149 | +[source, cypher] |
| 150 | +---- |
| 151 | +MATCH (a), (b) WHERE a = b |
| 152 | +---- |
| 153 | + |
| 154 | +The same reader will also have noticed that it is possible to define a pattern containing just a relationship type: |
| 155 | + |
| 156 | +[source, cypher] |
| 157 | +---- |
| 158 | +MATCH (a)-/:KNOWS/->(b) |
| 159 | +---- |
| 160 | + |
| 161 | +That pattern is indeed equivalent to the very similar relationship pattern: |
| 162 | + |
| 163 | +[source, cypher] |
| 164 | +---- |
| 165 | +MATCH (a)-[:KNOWS]->(b) |
| 166 | +---- |
| 167 | + |
| 168 | +The main difference is that the variant with a relationship pattern is able to bind that relationship and express further predicates over it. |
| 169 | + |
| 170 | +The Regular Path Patterns start becoming interesting when larger expressions are put together: |
| 171 | + |
| 172 | +[source, cypher] |
| 173 | +.Finding someone loved by someone hated by someone you know, transitively |
| 174 | +---- |
| 175 | +MATCH (you)-/[:KNOWS :HATES]+ :LOVES/->(someone) |
| 176 | +---- |
| 177 | + |
| 178 | +Note the `+` expressing one or more occurrences of the sequence `KNOWS` followed by `HATES`. |
| 179 | + |
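To make the effect of the `+` on the group concrete, here is a small Python sketch that evaluates `[:KNOWS :HATES]+ :LOVES` over a toy graph. It assumes an endpoint-only reading of the pattern (ignoring relationship uniqueness); the graph, the names in it, and the helpers `compose` and `one_or_more` are hypothetical.

```python
def compose(r, s):
    """Sequence: follow r, then s."""
    return {(x, z) for (x, y) in r for (w, z) in s if y == w}

def one_or_more(r):
    """Repetition `+`: transitive closure of r."""
    result = set(r)
    frontier = set(r)
    while frontier:
        # Extend only the newly found pairs by one more repetition of r.
        frontier = compose(frontier, r) - result
        result |= frontier
    return result

# Hypothetical toy graph.
KNOWS = {("you", "bob"), ("carol", "dave")}
HATES = {("bob", "carol"), ("dave", "erin")}
LOVES = {("erin", "frank"), ("carol", "gina")}

knows_then_hates = compose(KNOWS, HATES)                 # [:KNOWS :HATES]
pattern = compose(one_or_more(knows_then_hates), LOVES)  # [:KNOWS :HATES]+ :LOVES

someone = {end for (start, end) in pattern if start == "you"}
```

In this graph `gina` is reached after one repetition of the group and `frank` after two.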
| 180 | +The direction of each relationship is governed by the overall direction of the Regular Path Pattern. |
| 181 | +It is however possible to explicitly define the direction for a particular part of the pattern. |
| 182 | +This is done by either prefixing that part with `<` for a right-to-left direction or suffixing it with `>` for a left-to-right direction. |
| 183 | +It is possible to both prefix the part with `<` and suffix it with `>`, indicating that this part of the pattern matches in any direction. |
| 184 | + |
| 185 | +[source, cypher] |
| 186 | +.Specifying the direction for different parts of the pattern |
| 187 | +---- |
| 188 | +MATCH (you)-/[:KNOWS <:HATES]+ :LOVES/->(someone) |
| 189 | +---- |
| 190 | + |
| 191 | +In the example above we say that the `HATES` relationships should have the opposite direction to the other relationships in the path. |
| 192 | + |
| 193 | +Through the use of Defined Path Predicates we can express even more predicates over a path: |
| 194 | + |
| 195 | +[source, cypher] |
| 196 | +.Find a chain of unreciprocated lovers |
| 197 | +---- |
| 198 | +MATCH (you)-/unreciprocated_love*/->(someone) |
| 199 | +PATH (a)-/unreciprocated_love/->(b) IS |
| 200 | + (a)-[:LOVES]->(b) |
| 201 | + WHERE NOT EXISTS { (b)-[:LOVES]->(a) } |
| 202 | +---- |
| 203 | + |
| 204 | +Note that no colon is used when referencing the Defined Path Predicate; the colon is used in Regular Path Patterns only for referencing actual relationship types. |
| 205 | + |
| 206 | +Sometimes it will be interesting to express a predicate on a node in a Regular Path Pattern. |
| 207 | +This can be achieved by using a Defined Path Predicate where the nodes on both ends are the same: |
| 208 | + |
| 209 | +[source, cypher] |
| 210 | +.Find friends of friends that are not haters |
| 211 | +---- |
| 212 | +MATCH (you)-/:KNOWS not_a_hater :KNOWS/-(friendly_friend_of_friend) |
| 213 | +PATH (x)-/not_a_hater/-(x) IS (x) |
| 214 | + WHERE NOT EXISTS { (x)-[:HATES]->() } |
| 215 | +---- |
| 216 | + |
| 217 | +In the case of a Defined Path Predicate where both nodes are the same, the direction of the predicate is irrelevant. |
| 218 | +In general the direction of a Defined Path Predicate is quite important, and used for mapping the pattern in the predicate into the Regular Path Patterns that reference it. |
| 219 | +The only case in which the direction of a Defined Path Predicate may be omitted is when the defined predicate is symmetric, i.e. equivalent to its own reversal. |
| 220 | +This is obviously the case when both nodes are the same, but it would also be the case when the internal pattern is symmetrical, such as in the following example: |
| 221 | + |
| 222 | +[source, cypher] |
| 223 | +.Find chains of co-authorship |
| 224 | +---- |
| 225 | +MATCH (you)-/co_author*/-(someone) |
| 226 | +PATH (a)-/co_author/-(b) IS |
| 227 | + (a)-[:AUTHORED]->(:Book)<-[:AUTHORED]-(b) |
| 228 | + WHERE a <> b |
| 229 | +---- |
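
The symmetry that allows `co_author` to be declared without a direction can be checked directly. The sketch below is an illustration only: it models the predicate as a set of endpoint pairs over a hypothetical `AUTHORED` relation (all targets are assumed to carry the `Book` label), and evaluates `co_author*` as plain reachability, ignoring relationship uniqueness.

```python
def inverse(r):
    """Swap start and end of every pair."""
    return {(y, x) for (x, y) in r}

def reachable(start, r):
    """Nodes reachable from start via zero or more steps of r (the `*` form)."""
    seen = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for (x, y) in r:
            if x == node and y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen

# Hypothetical toy data: (author, book) pairs for :AUTHORED relationships.
AUTHORED = {
    ("ann", "book1"), ("bob", "book1"),
    ("bob", "book2"), ("cat", "book2"),
    ("dan", "book3"),
}

# (a)-[:AUTHORED]->(:Book)<-[:AUTHORED]-(b) WHERE a <> b
co_author = {
    (a, b)
    for (a, book_a) in AUTHORED
    for (b, book_b) in AUTHORED
    if book_a == book_b and a != b
}

is_symmetric = inverse(co_author) == co_author
chain_from_ann = reachable("ann", co_author)   # (you)-/co_author*/-(someone)
```

Since the relation equals its own inverse, reading it left-to-right or right-to-left yields the same pairs, so a reference in any direction matches the same chains.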