Commit 931e0e8

Cypher: support bounded varlen WHERE pattern predicates (#973)
1 parent f8449bc commit 931e0e8

5 files changed

Lines changed: 105 additions & 79 deletions

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -10,6 +10,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
1010

1111
### Documentation
1212
- **GFQL component-labeling examples + README clarity (#1324)**: Added concise WCC/SCC labeling examples for `compute_cugraph`, `compute_igraph('clusters')`, and local Cypher `CALL graphistry.cugraph.*` write/row modes in GFQL docs, clarified that component IDs are partition labels (not stable semantic IDs), and tightened the main README GFQL intro sentence for readability.
13+
- **GFQL / Cypher docs — variable-length boundary refresh (#973)**: Updated direct-Cypher capability docs (`docs/source/gfql/cypher.rst`, `docs/source/gfql/spec/cypher_mapping.md`) to reflect current support for connected variable-length patterns and bounded/exact variable-length `WHERE` pattern predicates, while preserving explicit fail-fast notes for remaining path/list-carrier and advanced row-shaping gaps.
14+
15+
### Changed
16+
- **GFQL / Cypher lowering — bounded/exact variable-length `WHERE` pattern predicates (#973)**: Removed the pre-normalization compiler gate that rejected bounded/exact variable-length `WHERE` pattern predicates and now lower these shapes through the existing WHERE-pattern rewrite and row-filter paths. Converted the old fail-fast test into positive execution coverage and added boolean-wrapper amplification (`OR`/`XOR`/`NOT`) for bounded variable-length `WHERE` predicates in `graphistry/tests/compute/gfql/cypher/test_lowering.py`.
1317

1418
## [0.55.1 - 2026-05-05]
1519

docs/source/gfql/cypher.rst

Lines changed: 23 additions & 29 deletions
@@ -206,11 +206,11 @@ Support Matrix
206206
- Execute directly through ``g.gfql("...")``. Helper translation to a single ``Chain`` is stricter.
207207
* - Variable-length relationship patterns
208208
- Partial
209-
- Direct Cypher supports endpoint-only traversals such as ``[*2]``,
210-
``[*1..3]``, ``[*]``, and typed forms like ``[:R*2..4]``, plus bounded
211-
connected multi-relationship patterns where the row shape stays in the
212-
current supported subset. Path/list-carrier uses, bounded/exact
213-
``WHERE`` pattern predicates, and broader branching/path-shaping cases
209+
- Direct Cypher supports endpoint traversals such as ``[*2]``,
210+
``[*1..3]``, ``[*]``, and typed forms like ``[:R*2..4]``; connected
211+
multi-relationship variable-length patterns; and bounded/exact/fixed-point
212+
variable-length ``WHERE`` pattern predicates in the current row-shaped
213+
subset. Path/list-carrier uses and unsupported path/row-shaping cases
214214
still fail fast.
215215
* - ``CREATE`` / ``DELETE`` / ``SET``
216216
- Not supported
@@ -236,9 +236,10 @@ Pattern Matching Forms
236236
- Node labels and multi-label node patterns such as ``(p:Person:Admin)``.
237237
- Relationship direction forms ``->``, ``<-``, and undirected ``-[]-``.
238238
- Relationship type alternation such as ``[r:KNOWS|HATES]``.
239-
- Single variable-length relationship patterns when they are the only
240-
relationship in the connected pattern, including ``[*n]``, ``[*m..n]``,
241-
``[*]``, and typed forms such as ``[:R*2..4]``.
239+
- Single variable-length relationship patterns, including ``[*n]``,
240+
``[*m..n]``, ``[*]``, and typed forms such as ``[:R*2..4]``.
241+
- Connected patterns that mix variable-length and fixed-length relationships,
242+
such as ``MATCH (a)-[:R*2]->()-[:S]->(c) RETURN c``.
242243
- Connected comma-separated patterns such as
243244
``MATCH (a)-[:A]->(b), (b)-[:B]->(c)``.
244245
- Repeated ``MATCH`` clauses when they stay connected through shared aliases.
@@ -255,40 +256,35 @@ WHERE Forms
255256
- Label predicates such as ``WHERE b:Foo:Bar``.
256257
- Relationship-type predicates such as ``WHERE type(r) = 'KNOWS'``.
257258
- Positive relationship-existence pattern predicates such as
258-
``WHERE (n)-[:R]->()`` and bare fixed-point variable-length existence checks
259-
such as ``WHERE (n)-[*]-()``.
260-
- One positive relationship-existence pattern predicate may be combined with
261-
ordinary row filters through top-level ``AND``, for example
262-
``WHERE n.kind = 'x' AND (n)-[:R*]->() AND n.id <> 'a'``.
259+
``WHERE (n)-[:R]->()`` and variable-length existence checks such as
260+
``WHERE (n)-[*]-()`` and ``WHERE (n)-[:R*2]->()``.
261+
- Pattern predicates can be combined with row predicates in the current
262+
boolean subset, including ``AND`` / ``OR`` / ``XOR`` and ``NOT`` forms.
263263

264264
Variable-Length Relationship Boundary
265265
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
266266

267-
Direct Cypher multihop support is intentionally narrow in the current landing
268-
slice. The supported direct forms include endpoint traversals and bounded
269-
connected multi-relationship patterns where the result stays in the current
270-
row-shaping subset, for example:
267+
Direct Cypher multihop support remains intentionally bounded. The supported
268+
direct forms include endpoint traversals, connected multi-relationship
269+
patterns, and variable-length ``WHERE`` pattern predicates where the result
270+
stays in the current row-shaping subset, for example:
271271

272272
- ``MATCH (a)-[*2]->(b) RETURN b``
273273
- ``MATCH (a)-[:R*1..3]->(b) RETURN b``
274274
- ``MATCH (a)<-[*2]-(b) RETURN b``
275275
- ``MATCH (a)-[:R*1..2]-(b) RETURN b``
276276
- ``MATCH (a)-[:R*2]->(b)-[:S]->(c) RETURN c``
277277
- ``MATCH (a)-[:R]->(b), (b)-[:S*1..2]->(c) RETURN a.id AS a_id, c.id AS c_id``
278+
- ``MATCH (n) WHERE (n)-[:R*2]->() RETURN n``
279+
- ``MATCH (n) WHERE NOT (n)-[:R*2]->() RETURN n.id AS id``
278280

279281
The current compiler explicitly rejects these remaining subfamilies with
280282
``GFQLValidationError`` instead of attempting unsound execution:
281283

282284
- path/list-carrier use of a variable-length relationship alias, such as
283285
``RETURN r`` or ``count(r)``
284-
- exact or bounded variable-length ``WHERE`` pattern predicates such as
285-
``WHERE (n)-[:R*2]-()``
286-
- top-level ``OR`` / ``NOT`` around variable-length ``WHERE`` pattern
287-
predicates, or more than one positive pattern predicate in the same
288-
``WHERE`` clause
289-
- branching connected multihop patterns, or shapes that would require
290-
unsupported path/relationship-carrier row shaping around a variable-length
291-
segment
286+
- shapes that still require unsupported path/relationship-carrier row shaping
287+
around a variable-length segment
292288
- connected multi-pattern relationship-alias projection such as
293289
``RETURN r`` / ``r.prop`` when it would require unsupported row shaping
294290
- multi-alias ``RETURN *`` projections that would require unsupported
@@ -431,10 +427,8 @@ Not Supported Today
431427

432428
- Variable-length relationship aliases used as path/list carriers, such as
433429
``RETURN r`` or ``count(r)``.
434-
- Exact or bounded variable-length ``WHERE`` pattern predicates such as
435-
``WHERE (n)-[:R*2]-()``.
436-
- Branching connected multihop patterns, or connected multihop shapes that
437-
still require unsupported path/relationship-carrier row shaping.
430+
- Connected multihop shapes that still require unsupported
431+
path/relationship-carrier row shaping.
438432
- Multiple disconnected ``MATCH`` patterns used as arbitrary joins.
439433
- Multi-pattern re-entry shapes beyond the bounded single
440434
``MATCH ... WITH ... MATCH ... RETURN`` form.

docs/source/gfql/spec/cypher_mapping.md

Lines changed: 10 additions & 10 deletions
@@ -36,11 +36,11 @@ When translating from Cypher, you'll encounter three scenarios:
3636
### Direct Translations
3737
- Graph patterns: `(a)-[r]->(b)` → chain operations
3838
- Property filters: WHERE clauses embed into operations
39-
- Path traversals: direct `g.gfql("MATCH ...")` supports endpoint-only single
40-
variable-length relationship forms such as `[*2]`, `[*1..3]`, and `[*]`.
41-
Native GFQL still gives you the full explicit hop surface, including output
42-
slicing, intermediate-hop aliasing, and rewrites for currently unsupported
43-
direct-Cypher multihop shapes.
39+
- Path traversals: direct `g.gfql("MATCH ...")` supports single and connected
40+
variable-length relationship forms such as `[*2]`, `[*1..3]`, and `[*]`,
41+
including bounded/exact variable-length `WHERE` pattern predicates in the
42+
current row-shaped subset. Native GFQL still gives you the full explicit hop
43+
surface (output slicing, intermediate-hop aliasing, and custom rewrites).
4444
- Pattern composition: Multiple patterns become sequential operations
4545
- Same-path constraints: `WHERE` across steps → `g.gfql([...], where=[...])`
4646

@@ -255,10 +255,10 @@ g.gfql([
255255
### Edge Patterns
256256

257257
Rows using `[*...]` below show the native GFQL rewrite for the same traversal
258-
intent. Direct `g.gfql("MATCH ...")` now accepts these endpoint-only
259-
single-variable-length relationship forms, while native GFQL remains the more
260-
explicit option when you need intermediate-hop control or unsupported mixed
261-
pattern shapes.
258+
intent. Direct `g.gfql("MATCH ...")` accepts these variable-length forms in
259+
the supported direct-Cypher subset, while native GFQL remains the more explicit
260+
option when you need intermediate-hop control or advanced path/list-carrier
261+
semantics.
262262

263263
| Cypher / intent | Python | Wire Protocol (compact) |
264264
|-----------------|--------|-------------------------|
@@ -274,7 +274,7 @@ pattern shapes.
274274
| `-[r:BOUGHT {amount: gt(100)}]->` | `e_forward({"type": "BOUGHT", "amount": gt(100)}, name="r")` | `{"type": "Edge", "direction": "forward", "edge_match": {"type": "BOUGHT", "amount": {"type": "GT", "val": 100}}, "name": "r"}` |
275275

276276
When you need constraints on intermediate hops, path/list-carrier semantics, or
277-
mixed connected patterns beyond the current direct-Cypher subset, use repeated
277+
advanced row-shaping beyond the current direct-Cypher subset, use repeated
278278
single-hop GFQL steps with aliases instead of collapsing the traversal into one
279279
multihop edge operator.
280280

graphistry/compute/gfql/cypher/lowering.py

Lines changed: 0 additions & 29 deletions
@@ -5590,34 +5590,6 @@ def _is_variable_length_relationship_pattern(relationship: RelationshipPattern)
55905590
)
55915591

55925592

5593-
def _reject_unsupported_variable_length_where_pattern_predicates(query: CypherQuery) -> None:
5594-
if query.where is None:
5595-
return
5596-
predicates: List[WherePatternPredicate] = [
5597-
predicate for predicate in query.where.predicates if isinstance(predicate, WherePatternPredicate)
5598-
]
5599-
if query.where.expr_tree is not None:
5600-
predicates.extend(_where_expr_tree_pattern_predicates(query.where.expr_tree))
5601-
for predicate in predicates:
5602-
relationships = [
5603-
element
5604-
for element in predicate.pattern
5605-
if isinstance(element, RelationshipPattern)
5606-
]
5607-
for relationship in relationships:
5608-
if not _is_variable_length_relationship_pattern(relationship):
5609-
continue
5610-
if relationship.min_hops is None and relationship.max_hops is None and relationship.to_fixed_point:
5611-
continue
5612-
raise _unsupported(
5613-
"Cypher WHERE pattern predicates currently support only bare variable-length fixed-point relationships, not exact or bounded hop counts",
5614-
field="where",
5615-
value=boolean_expr_to_text(query.where.expr_tree) if query.where.expr_tree is not None else None,
5616-
line=predicate.span.line,
5617-
column=predicate.span.column,
5618-
)
5619-
5620-
56215593
def _reject_nonterminal_variable_length_relationship_patterns(query: CypherQuery) -> None: # noqa: ARG001
56225594
"""No-op: variable-length rels in connected patterns are now supported.
56235595
@@ -8331,7 +8303,6 @@ def _attach_graph_context(result: CompiledCypherQuery) -> CompiledCypherQuery:
83318303

83328304
normalizer = ASTNormalizer()
83338305
query = normalizer.rewrite_shortest_path(query)
8334-
_reject_unsupported_variable_length_where_pattern_predicates(query)
83358306
_reject_variable_length_path_alias_references(query, params=params)
83368307
query = normalizer.rewrite_where_pattern_predicates(query)
83378308

graphistry/tests/compute/gfql/cypher/test_lowering.py

Lines changed: 68 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -5258,22 +5258,79 @@ def test_connected_variable_length_typed_mixed() -> None:
52585258

52595259

52605260
@pytest.mark.parametrize(
5261-
"query",
5261+
"query,expected_rows",
52625262
[
5263-
"MATCH (n) WHERE (n)-[:REL1*2]-() RETURN n",
5264-
"MATCH (n) WHERE (n)-[*2]-() RETURN n",
5265-
"MATCH (n) WHERE (n)<-[:REL1*1..2]-() RETURN n",
5266-
"MATCH (n) WHERE (n)-[:REL1*2]-() AND n.id <> 'a' RETURN n",
5263+
(
5264+
"MATCH (n) WHERE (n)-[:REL1*2]->() RETURN n.id AS id ORDER BY id",
5265+
[{"id": "a"}, {"id": "b"}, {"id": "c"}],
5266+
),
5267+
(
5268+
"MATCH (n) WHERE (n)-[*2]->() RETURN n.id AS id ORDER BY id",
5269+
[{"id": "a"}, {"id": "b"}, {"id": "c"}],
5270+
),
5271+
(
5272+
"MATCH (n) WHERE (n)<-[:REL1*1..2]-() RETURN n.id AS id ORDER BY id",
5273+
[{"id": "b"}, {"id": "c"}, {"id": "d"}],
5274+
),
5275+
(
5276+
"MATCH (n) WHERE (n)-[:REL1*2]->() AND n.id <> 'a' RETURN n.id AS id ORDER BY id",
5277+
[{"id": "b"}, {"id": "c"}],
5278+
),
52675279
],
52685280
)
5269-
def test_string_cypher_failfast_rejects_bounded_variable_length_where_pattern_predicates(query: str) -> None:
5270-
graph = _mk_empty_graph()
5281+
def test_string_cypher_executes_bounded_variable_length_where_pattern_predicates(
5282+
query: str,
5283+
expected_rows: list[dict[str, object]],
5284+
) -> None:
5285+
graph = _mk_graph(
5286+
pd.DataFrame({"id": ["a", "b", "c", "d"]}),
5287+
pd.DataFrame(
5288+
{
5289+
"s": ["a", "b", "c"],
5290+
"d": ["b", "c", "d"],
5291+
"type": ["REL1", "REL1", "REL1"],
5292+
}
5293+
),
5294+
)
52715295

5272-
with pytest.raises(GFQLValidationError) as exc_info:
5273-
graph.gfql(query)
5296+
result = graph.gfql(query)
5297+
assert result._nodes.to_dict(orient="records") == expected_rows
52745298

5275-
assert exc_info.value.code == ErrorCode.E108
5276-
assert "WHERE pattern predicates" in exc_info.value.message
5299+
5300+
@pytest.mark.parametrize(
5301+
"query,expected_rows",
5302+
[
5303+
(
5304+
"MATCH (n) WHERE (n)-[:REL1*2]->() OR n.id = 'd' RETURN n.id AS id ORDER BY id",
5305+
[{"id": "a"}, {"id": "b"}, {"id": "d"}],
5306+
),
5307+
(
5308+
"MATCH (n) WHERE (n)-[:REL1*2]->() XOR n.id = 'd' RETURN n.id AS id ORDER BY id",
5309+
[{"id": "a"}, {"id": "b"}, {"id": "d"}],
5310+
),
5311+
(
5312+
"MATCH (n) WHERE NOT (n)-[:REL1*2]->() RETURN n.id AS id ORDER BY id",
5313+
[{"id": "c"}, {"id": "d"}],
5314+
),
5315+
],
5316+
)
5317+
def test_string_cypher_executes_bounded_variable_length_where_pattern_boolean_wrappers(
5318+
query: str,
5319+
expected_rows: list[dict[str, object]],
5320+
) -> None:
5321+
graph = _mk_graph(
5322+
pd.DataFrame({"id": ["a", "b", "c", "d"]}),
5323+
pd.DataFrame(
5324+
{
5325+
"s": ["a", "b", "c"],
5326+
"d": ["b", "c", "d"],
5327+
"type": ["REL1", "REL1", "REL1"],
5328+
}
5329+
),
5330+
)
5331+
5332+
result = graph.gfql(query)
5333+
assert result._nodes.to_dict(orient="records") == expected_rows
52775334

52785335

52795336
