Commit 98d78b6

Minor changes and clarifications
1 parent ce09cf5 commit 98d78b6

1 file changed

+96
-51
lines changed

cip/CIP2017-06-18-multiple-graphs.adoc

+96-51
Original file line number | Diff line number | Diff line change
@@ -14,15 +14,6 @@ This CIP proposes to extend Cypher with support for the construction, transforma
1414

1515
toc::[]
1616

17-
```
18-
TODO:
19-
20-
* Parameter handling
21-
* Graph name syntax
22-
* Precise update semantics
23-
* Entity identity
24-
```
25-
2617
== Motivation
2718

2819
Cypher today is a query language for property graphs that provides access to a single, global, implicit graph in order to extract, transform, and return tabular data that is derived from it.
@@ -68,12 +59,24 @@ An entity is considered to be deleted if it is no longer part of any graph.
6859

6960
=== Graph Addressing
7061

71-
Graphs do not expose an identity like nodes or relationships. They may however be made addressable through other means by a conforming implementation (e.g. through exposing the graph under a _Graph URL_).
62+
Graphs do not expose an identity like nodes or relationships do.
63+
64+
Graphs may be made addressable through other means by a conforming implementation (e.g. through exposing the graph under a _graph URL_ for referencing and loading it).
65+
The details regarding the format and choice of graph URLs are outside the scope of this proposal.
66+
67+
A graph is considered to have been deleted if it is no longer registered under a graph URL and no other reference to it is retained (e.g. from a running query).
7268

73-
The details of such a mechanism are out of scope of this proposal.
69+
=== Entity Identity
7470

75-
However, a graph is considered to have been deleted if it is no longer registered under a Graph URL and no other reference to it is retained (e.g. from a running query).
71+
In the single property graph model, nodes and relationships are commonly identified by a single integer id.
72+
This model was originally not designed for sharing entities between many different graphs while ensuring that entity ids are unique.
7673

74+
In the multiple property graphs model, entities are additionally implicitly associated with a _graph space_ that makes it possible to distinguish between entities with the same original id from different sources (e.g. different databases or even snapshots of the same database).
75+
76+
In the multiple property graphs model, no graph may contain two entities from the same graph space that have the same original id.
77+
78+
Graph spaces may be made identifiable by a conforming implementation by assigning a _graph URI_ to them.
79+
The details regarding the format and choice of graph URIs are outside the scope of this proposal.
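
The uniqueness rule for entities can be illustrated with a minimal Python sketch. It assumes that a graph space is identified by a plain string and that an entity is keyed by the pair of graph space and original id. The names `EntityKey` and `Graph` are hypothetical and not part of this proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityKey:
    """Hypothetical entity key: graph space plus original id."""
    graph_space: str  # assumption: a string such as a graph URI
    original_id: int


class Graph:
    """Toy graph that rejects two entities with the same key."""

    def __init__(self):
        self._entities = {}

    def add(self, key, properties):
        # No graph may contain two entities from the same graph space
        # that share the same original id.
        if key in self._entities:
            raise ValueError(f"duplicate entity {key}")
        self._entities[key] = properties

    def __len__(self):
        return len(self._entities)


g = Graph()
# Same original id, different graph spaces: both are accepted.
g.add(EntityKey("space-a", 1), {"name": "Alice"})
g.add(EntityKey("space-b", 1), {"name": "Bob"})
```

Adding a second entity with the key `EntityKey("space-a", 1)` to `g` would raise `ValueError` in this sketch.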
7780

7881
== Background: Single Graph Execution Model
7982

@@ -109,8 +112,8 @@ This CIP proposes to redefine the *execution context* to be
109112
This CIP proposes to redefine the *query context* to be
110113

111114
* a set of named graphs from the *execution context*
112-
* an optional information that indicates which of these named graphs is the current *source graph*
113-
* an optional information that indicates which of these named graphs is the current *target graph*
115+
* optional information that indicates which of these named graphs, if any, is the *source graph*
116+
* optional information that indicates which of these named graphs, if any, is the *target graph*
114117
* optional *tabular data*, i.e. a potentially ordered bag of records, each having the same fixed set of fields
115118

116119
These redefinitions constitute the multiple graphs execution model. A parameterized Cypher query under this model can _also_ be described as executing within (and operating on) a given execution context and an initial query context and finally returning the query context produced as output for the top-most `RETURN` clause.
@@ -140,7 +143,7 @@ A query `Q1` whose output signature is an acceptable (in terms of provided bindi
140143

141144
This homogeneous query composition is enabled by using a uniform query context that is passed between clauses.
142145

143-
Note: The currently drafted subquery CIP proposes a language addition (e.g. `THEN`) for expressing this kind of query composition directly.
146+
Note: The currently drafted subquery CIP proposes a language addition (e.g. `THEN`) for expressing this kind of query composition directly. In terms of this CIP, `THEN` is simply syntactic sugar for `WITH * GRAPHS *`.
144147

145148
=== Query combinators
146149

@@ -188,27 +191,43 @@ This CIP proposes the following kinds of graph specifiers:
188191

189192
* `NEW GRAPH [<new-graph-name>] [AT <graph-url>]`: Reference to a newly created, empty graph that is to be bound as `<new-graph-name>` and may potentially overwrite any pre-existing graph at the provided `<graph-url>`
190193
* `GRAPH [<new-graph-name>] AT <graph-url>`: Reference to the graph at the given `<graph-url>` that is to be bound as `<new-graph-name>`
191-
* `GRAPH <graph-name> [AS <new-graph-name>]`: Reference to an already bound named graph
192-
* `SOURCE GRAPH [AS <new-graph-name>]`: Reference to the currently _provided source graph_, optionally to be bound as `<new-graph-name>`
193-
* `TARGET GRAPH [AS <new-graph-name>]`: Reference to the currently _provided target graph_, optionally to be bound as `<new-graph-name>`
194+
* `[GRAPH] <graph-name> [AS <new-graph-name>]`: Reference to an already bound named graph
195+
* `COPY [GRAPH] <graph-name> [AS <new-graph-name>]`: Reference to a copy of an already bound named graph
196+
* `SOURCE GRAPH [<new-graph-name>]`: Reference to the currently _provided source graph_, optionally to be bound as `<new-graph-name>`
197+
* `TARGET GRAPH [<new-graph-name>]`: Reference to the currently _provided target graph_, optionally to be bound as `<new-graph-name>`
194198

195199
If a graph specifier is not referencing an already bound named graph and does not specify a `<new-graph-name>`, it is bound to a fresh system generated name.
196200
The details of this are left to implementations.
197201

198202
It is an error to use a `<graph-specifier>` in a context where its introduced `<new-graph-name>` is already bound.
199203

200-
=== Changing back to the default graph
204+
==== Graph names
205+
206+
Graph names use the same syntax as existing variable names.
201207

202-
Additionally, this CIP proposes new syntax for changing the source and the target graph of the current query back to the the default graph provided by the outer execution context:
208+
It is an error to use the same name for both a regular variable and a graph.
209+
210+
==== Graph URLs
211+
212+
The exact shape and form of graph URLs lie outside the scope of this CIP.
213+
214+
This CIP however proposes that a `<graph-url>` must always be given as either a string literal or a query parameter.
215+
216+
This allows parameterization of queries by controlling which graphs from which graph URLs they should use.
217+
218+
=== Changing back to no graph
219+
220+
Additionally, this CIP proposes new syntax for discarding the source and the target graph of the current query:
203221

204222
[source, cypher]
205223
----
224+
FROM -
225+
INTO -
206226
----
207227

208-
`DEFAULT GRAPH` is not a graph specifier; rather this syntax is a special form for discarding the current source and target graph such that the provided source and target graph are again chosen to be the default graph as specified for partial query contexts.
209-
210-
In consequence, both `FROM DEFAULT GRAPH` and `INTO DEFAULT GRAPH` without an explicitly given `<new-graph-name>` will not bind the default graph to a generated fresh name.
228+
`-` is not a graph specifier; rather this syntax is a special form for discarding the current source and target graph such that the provided source and target graph are again chosen to be the default graph as specified for partial query contexts.
211229

230+
In consequence, both `FROM -` and `INTO -` will not bind the default graph to a generated fresh name.
212231
This is different from `<graph-specifier>` semantics that will ensure that referenced graphs are always bound to a name.
213232

214233
=== Returning, aliasing, and selecting graphs
@@ -218,33 +237,35 @@ The newly proposed syntax is:
218237

219238
[source, cypher]
220239
----
221-
WITH [ < return-items > ] [ GRAPHS < graph-return-items > ]
222-
RETURN [ < return-items > ] [ GRAPHS < graph-return-items > ]
240+
WITH [ < return-items > ] [ [ INPUT ] GRAPHS < graph-return-items > ]
241+
RETURN [ < return-items > ] [ [ INPUT ] GRAPHS < graph-return-items > ]
223242
----
224243

225244
This CIP proposes the following kinds of `<graph-return-items>`:
226245

227-
* `<graph-item-list`: A comma separated list of `<graph-return-item>` (defined below) that are to be passed on
246+
* `<graph-specifier-list>`: A comma separated list of `<graph-specifier>` that are to be passed on
228247
* `*`: All named graphs are to be passed on
229-
* `*, <graph-item-list>`: All named graphs are to be passed on together with any additional named graphs that are newly bound in `<graph-item-list>`
248+
* `*, <graph-specifier-list>`: All named graphs are to be passed on together with any additional named graphs that are newly bound in `<graph-specifier-list>`
230249
* `-`: No named graphs are to be passed on
231250

232-
The order of named graphs inherently given by `<graph-return-items` is semantically insignificant.
251+
The order of named graphs inherently given by `<graph-return-items>` is semantically insignificant.
233252
However it is recommended that conforming implementations preserve this order at least in programmatic output operations (e.g. a textual display of the list of returned graphs).
234253
This in essence mirrors the semantics for tabular data returned by Cypher.
235254

236-
This CIP proposes the introduction of the following kinds of graph return items that may be included in a `<graph-item-list>`:
255+
Both `WITH ... GRAPHS ...` and `RETURN ... GRAPHS ...` will pass on (or return respectively) exactly the set of described named graphs.
256+
To simplify passing on available graphs it is proposed by this CIP that regular `WITH <return-items>` is taken to be syntactic sugar for `WITH <return-items> GRAPHS -` and that regular `RETURN <return-items>` is taken to be syntactic sugar for `RETURN <return-items> GRAPHS -`.
237257

238-
* `<graph-specifier>`: Any graph that is described by a `<graph-specifier>` may be passed on under the provided `<new-graph-name>` (unless the given graph is an un-aliased already existing graph, it which case it's passed on with it's existing name)
239-
* `<graph-name> [AS <new-graph-name>], ...`: Syntactic sugar for `GRAPH <graph-name> [AS <new-graph-name>]`
258+
To even further simplify, it is additionally proposed that `WITH|RETURN <return-items> INPUT GRAPHS <graph-return-items>` is to be syntactic sugar for `WITH|RETURN <return-items> GRAPHS <graph-return-items>, SOURCE GRAPH, TARGET GRAPH`.
259+
However if `<graph-return-items>` already passes on a reference for the `SOURCE GRAPH`, no additional reference for it is added and if `<graph-return-items>` already passes on a reference for the `TARGET GRAPH`, no additional reference for it is added.
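
The rules above for which named graphs are passed on can be sketched in Python. This is a simplified model under stated assumptions: graphs are tracked by name only, aliasing and the `*, <graph-specifier-list>` form are left out, and the function `passed_on_graphs` with its parameters is hypothetical.

```python
def passed_on_graphs(named, items, source=None, target=None, input_graphs=False):
    """Return the named graphs passed on by WITH/RETURN ... GRAPHS ...

    named:        dict of graph name -> graph, the graphs currently in scope
    items:        "-" (none), "*" (all), or a list of graph names
    source:       name of the current source graph, if any
    target:       name of the current target graph, if any
    input_graphs: True models the INPUT GRAPHS form
    """
    if items == "-":
        selected = []
    elif items == "*":
        selected = list(named)
    else:
        selected = list(items)

    if input_graphs:
        # INPUT GRAPHS also passes on the source and target graph,
        # but never adds a second reference for either of them.
        for name in (source, target):
            if name is not None and name not in selected:
                selected.append(name)

    return {name: named[name] for name in selected}


graphs = {"a": "graph-a", "b": "graph-b", "c": "graph-c"}
# Plain WITH <return-items> is sugar for GRAPHS -, so nothing is passed on.
plain = passed_on_graphs(graphs, "-")
# INPUT GRAPHS a, with source a and target b, passes on a and b once each.
with_input = passed_on_graphs(graphs, ["a"], source="a", target="b", input_graphs=True)
```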
240260

241-
Both `WITH` and `RETURN` will pass on (or return respectively) exactly the set of described named graphs.
242261
If the current named source graph (or the current named target graph) is not passed on, it is discarded and, due to the rules regarding partial query contexts, the provided source graph (or target graph respectively) is again chosen to be the default graph of the outer execution context.
243262

263+
Note: `WITH <return-items> GRAPHS *` may be used to pass through the initial query context without having to alias source and target graphs explicitly.
264+
244265
=== Discarding available tabular data
245266

246-
It is additionally proposed that both `WITH GRAPHS <graph-return-items>` and `RETURN GRAPHS <graph-return-items>` are
247-
special forms for discarding all tabular data such that the provided tabular input for the following clause (or query respectively) would again be the provided single record without any fields as specified by the rules for partial query contexts.
267+
It is additionally proposed that both `WITH GRAPHS <graph-return-items>` and `RETURN GRAPHS <graph-return-items>` are syntactic sugar for `WITH - GRAPHS <graph-return-items>` (and `RETURN - GRAPHS <graph-return-items>` respectively).
268+
These special forms may be used for discarding all tabular data such that the provided tabular input for the following clause (or query respectively) would again be the provided single record without any fields as specified by the rules for partial query contexts.
248269

249270
Note: This syntax may be used to indicate when the gradual construction of a named graph is finished since neither fields nor the cardinality of tabular data is preserved after this point.
250271

@@ -259,35 +280,59 @@ The proposed syntax is:
259280

260281
[source, cypher]
261282
----
262-
FROM < graph-specifier > | DEFAULT GRAPH [AS < new-graph-name >] { < graph-construction-subquery > }
263-
INTO < graph-specifier > | DEFAULT GRAPH [AS < new-graph-name >] { < graph-construction-subquery > }
283+
FROM < graph-specifier > | '-' { < graph-construction-subquery > }
284+
INTO < graph-specifier > | '-' { < graph-construction-subquery > }
264285
----
265286

266287
A `<graph-construction-subquery>` is an updating subquery (i.e. a sequence of clauses, including update clauses) that may or may not end in `RETURN`.
267288
All variables bound before the nested `FROM` and `INTO` subqueries are made visible to the `<graph-construction-subquery>`.
268289
All variables bound at the end of the `<graph-construction-subquery>` are made visible to the remaining outer query.
269290

270-
These forms have the exact same effect as creating aliases for the current source and target graph, then changing the current source and target graph as specified before executing the given `<graph-construction-subquery>`, and finally restoring the original source and target graphs using the aliases followed by discarding those aliases from the current scope.
291+
These forms have the exact same effect as creating fresh aliases for the current source and target graph, then changing the current source and target graph as specified before executing the given `<graph-construction-subquery>`, and finally restoring the original source and target graphs using the aliases followed by discarding those aliases from the current scope.
292+
293+
=== Updating graphs
294+
295+
This CIP proposes the following update semantics for Cypher with support for multiple graphs.
296+
297+
Entities are always created in and deleted from the currently provided target graph.
298+
299+
Semantically, all effects of an updating clause must be made visible before proceeding with the execution of the next clause.
300+
In other words, a conforming implementation must ensure that a later clause always sees the complete set of updates of a preceding updating clause.
301+
302+
A single update clause may perform multiple conflicting updates on the same node or relationship.
303+
In this situation, the outcome is undefined.
304+
305+
Conflicting updates are considered to be out of scope of this CIP.
306+
307+
For now it is proposed that a conforming implementation must choose either the original value, one of the values written, or `NULL` as the final outcome of a conflicting update.
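
The permitted outcomes of a conflicting update can be expressed as a small Python check. This is only a sketch: `NULL` is modelled as `None`, property values are assumed to be hashable, and the helper names `allowed_outcomes` and `is_conforming` are hypothetical.

```python
def allowed_outcomes(original, written):
    """Values an implementation may pick after a conflicting update:
    the original value, any of the written values, or NULL (None)."""
    return {original, None, *written}


def is_conforming(original, written, final):
    """True if the final value is one of the permitted outcomes."""
    return final in allowed_outcomes(original, written)


# A single clause writes 2 and 3 to a property that originally held 1.
outcomes = allowed_outcomes(1, [2, 3])
```

In this example `outcomes` contains `1`, `2`, `3` and `None`, so any other final value would be non-conforming.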
271308

272309
=== Query signature declarations
273310

274-
Finally this CIP proposed using the `WITH` clause as the initial clause in a query for declaring all query input arguments:
311+
Finally, this CIP proposes using the `WITH` clause as the initial clause in a query for declaring all query inputs:
275312

276313
[source, cypher]
277314
----
278-
WITH [ < return-items > ] [ GRAPHS < graph-return-items > ]
315+
WITH < return-items > [ [ INPUT ] GRAPHS < graph-return-items > ]
316+
WITH [ < return-items > ] [ INPUT ] GRAPHS < graph-return-items >
279317
----
280318

281-
It is proposed that using `WITH` as the initial clause here is to be called a *query input declaration* while the use of `RETURN` as the last clause is to be called a *query output declaration* henceforth.
319+
It is proposed that using `WITH` as the initial clause in a query is to be called a *query input declaration* while the use of `RETURN` as the last clause is to be called a *query output declaration*.
282320

283321
Query input declarations are subject to the following limitations:
284322

285-
* All return items are expected to be over an imagined set of input variables from the previous query
286-
* All such referenced variables must be declared or aliased explicitly by another return item
287-
* The use of `WITH *` and `WITH *, ...` causes all undeclared incoming variables to be renamed to fresh system generated variable names
288-
* The use of `GRAPH *` and `GRAPH *, ...` causes all incoming graphs to be renamed to fresh system generated graph names
323+
* All return item expressions are expected to reference an imagined set of input variables from the previous query
324+
* All such referenced variables must be declared or aliased explicitly by another return item unless the query input declaration starts with `WITH *` or `WITH *,`
325+
* If the input query context provides additional, undeclared variables or graphs, those inputs are to be silently discarded by query composition or execution
326+
327+
A query that does not start with a query input declaration is assumed to start with `WITH - GRAPHS -`, i.e. to run in isolation and to initially read and write to the default graph.
289328

290-
If the input query context provides additional variables or graphs, those inputs are to be silently discarded by query composition or execution.
329+
== Grammar
330+
331+
Proposed syntax changes
332+
[source, ebnf]
333+
----
334+
// TODO
335+
----
291336

292337
== Examples
293338

@@ -327,7 +372,7 @@ INTO NEW GRAPH berlin
327372
CREATE (a)-[:FRIEND]->(b) WHERE c.name = "Berlin"
328373
INTO NEW GRAPH santiago
329374
CREATE (a)-[:FRIEND]->(b) WHERE c.name = "Santiago"
330-
FROM DEFAULT GRAPH
375+
FROM -
331376
RETURN c.name AS city, count(r) AS num_friends GRAPHS berlin, santiago
332377
----
333378

@@ -379,7 +424,7 @@ MATCH (a:Person)-[e:KNOWS]->(b:Person)
379424
WITH a.country AS a_country, b.country AS b_country, count(a) AS a_cnt, count(b) AS b_cnt, count(e) AS e_cnt
380425
INTO NEW GRAPH rollup {
381426
MERGE (:Persons {country: a_country, cnt: a_cnt})-[:KNOW {cnt: e_cnt}]->(:Persons {country: b_country, cnt: b_cnt})
382-
}
427+
}
383428
// Return final graph output
384429
RETURN GRAPHS rollup
385430
----
@@ -394,18 +439,18 @@ MATCH (a:Person)-[e]->(b:Person),
394439
(a)-[:LIVES_IN]->()-[:IS_LOCATED_IN]->(c:Country {name: 'Sweden'}),
395440
(b)-[:LIVES_IN]->()-[:IS_LOCATED_IN]->(c)
396441
// Create a persistent graph at 'graph://social-network/swe'
397-
INTO GRAPH sweden_people AT './swe' {
442+
INTO NEW GRAPH sweden_people AT './swe' {
398443
// connecting persons that live in the same city in Sweden.
399444
CREATE (a)-[e]->(b)
400-
}
445+
}
401446
// Finally discard all tabular data and cardinality
402447
WITH GRAPHS *
403448
404449
MATCH (a:Person)-[e]->(b:Person),
405450
(a)-[:LIVES_IN]->()-[:IS_LOCATED_IN]->(c:Country {name: 'Germany'}),
406451
(b)-[:LIVES_IN]->()-[:IS_LOCATED_IN]->(c)
407452
// Create a persistent graph at 'graph://social-network/ger'
408-
INTO GRAPH german_people AT './ger' {
453+
INTO NEW GRAPH german_people AT './ger' {
409454
// connecting persons that live in the same city in Germany.
410455
CREATE (a)-[e]->(b)
411456
}
@@ -416,7 +461,7 @@ WITH GRAPHS *
416461
FROM GRAPH sweden_people
417462
MATCH p=(a)--(b)--(c)--(a) WHERE NOT (a)--(c)
418463
// Create a temporary graph 'swedish_triangles'
419-
INTO GRAPH swedish_triangles {
464+
INTO NEW GRAPH swedish_triangles {
420465
ADD p
421466
}
422467
// and return it together with a count of its content
