Commit 5c27afa

Merge pull request #1400 from graphistry/issue-1396-worker-d
GFQL: tag-cooccurrence join aggregation cardinality hardening (#1396)
2 parents 21a65fd + 89edb2e commit 5c27afa

2 files changed

Lines changed: 78 additions & 0 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -48,6 +48,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 ### Tests
 - **GFQL / Cypher two-MATCH reentry varlen regression hardening (#1001)**: Strengthened reentry varlen acceptance assertions from shape-only checks to exact expected rows, and added forward/reverse split-vs-connected query equivalence regressions to guard against wrong-row drift in the `match5-25/26` query family.
 - **GFQL / Cypher reentry ordered-top-k amplification (#1342, #880 partial)**: Added lowering regressions for MATCH-after-WITH re-entry with single-column and multi-column ordered top-k prefixes, carried-scalar top-k alignment, `LIMIT 0` empty-prefix behavior, `SKIP` failfast retention, plus cuDF parity coverage for the multi-row top-k lane.
+- **GFQL / Cypher tag-cooccurrence join+aggregation cardinality amplification (#1396, #880 residual lane)**: Added focused IC6-shape regression coverage for `collect(distinct friend) -> UNWIND -> connected comma MATCH -> WITH tag.name, count(post)` with non-trivial grouped counts (`Alpha=2`, `Beta=1`) plus cuDF parity guard, so the residual tag-cooccurrence join-aggregation lane is pinned without adapter-side workaround assumptions.
 - **DataFrame join helper branch coverage expansion + RAPIDS matrix validation (#1380)**: Expanded `graphistry/tests/compute/dataframe/test_join.py` to cover non-overwrite and empty-join schema branches (`joined_hidden_scalar_columns`, `joined_alias_columns`, `connected_inner_join_rows`), prefix/no-label projection behavior (`project_node_attrs`), inequality branch coverage (`ineq_eval_pairs`), and semijoin no-shared/delegation paths (`semijoin_eval_pairs`). Kept GFQL reentry/collision semantics in integration suites. Validated on DGX GPU for both RAPIDS `25.02` and `26.02` using `docker/test-rapids-official-local.sh` with GFQL profile + dataframe join tests.

 ### Internal
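The grouped counts named in the new changelog entry (`Alpha=2`, `Beta=1`) can be cross-checked by hand with plain pandas joins over the fixture rows this commit adds to `test_lowering.py`. The following is a minimal sketch, not the GFQL lowering itself. It assumes that the query behind `_issue_1000_ic6_query()` follows the LDBC IC6 shape (friends of a person, their posts carrying a known tag, then counts of the other tags on those posts), that `_issue_1000_ic6_params()` supplies the person id `4398046511333` and the tag name `Carl_Gustaf_Emil_Mannerheim`, and that rows are ordered by count descending, then tag name. None of those three details are shown in this diff; the names `PERSON_ID`, `KNOWN_TAG`, and `expected_rows` are hypothetical.

```python
import pandas as pd

# Node and edge rows copied from the fixture added in this commit
# (label columns omitted; they are not needed for the hand check).
nodes = pd.DataFrame({
    "id": [501, 502, 503, 4398046511333, 2, 3, 9001, 9002, 9003],
    "name": ["Carl_Gustaf_Emil_Mannerheim", "Alpha", "Beta"] + [None] * 6,
})
edges = pd.DataFrame({
    "s": [4398046511333, 4398046511333, 9001, 9002, 9003,
          9001, 9002, 9003, 9001, 9002, 9003],
    "d": [2, 3, 2, 2, 3, 501, 501, 501, 502, 502, 503],
    "type": ["KNOWS"] * 2 + ["HAS_CREATOR"] * 3 + ["HAS_TAG"] * 6,
})

# ASSUMPTION: the real values come from _issue_1000_ic6_params(), not shown here.
PERSON_ID = 4398046511333
KNOWN_TAG = "Carl_Gustaf_Emil_Mannerheim"

knows = edges[edges["type"] == "KNOWS"]
creator = edges[edges["type"] == "HAS_CREATOR"].rename(
    columns={"s": "post", "d": "friend"}
)
has_tag = edges[edges["type"] == "HAS_TAG"].rename(columns={"s": "post", "d": "tag"})
has_tag = has_tag.merge(
    nodes.rename(columns={"id": "tag", "name": "tagName"}), on="tag"
)

# collect(distinct friend) -> UNWIND: exactly one row per distinct friend.
friends = (
    knows.loc[knows["s"] == PERSON_ID, "d"]
    .drop_duplicates()
    .rename("friend")
    .to_frame()
)

# Posts created by those friends that carry the known tag.
friend_posts = creator.merge(friends, on="friend")[["post"]]
known_posts = has_tag.loc[has_tag["tagName"] == KNOWN_TAG, ["post"]]
posts = friend_posts.merge(known_posts, on="post").drop_duplicates()

# Other tags on those posts, grouped by tag name with a per-tag post count.
other = has_tag[has_tag["tagName"] != KNOWN_TAG].merge(posts, on="post")
counts = (
    other.groupby("tagName")["post"]
    .nunique()
    .rename("postCount")
    .reset_index()
    .sort_values(["postCount", "tagName"], ascending=[False, True])
    .reset_index(drop=True)
)
expected_rows = counts.to_dict(orient="records")

# Same rows the new regression tests assert on.
assert expected_rows == [
    {"tagName": "Alpha", "postCount": 2},
    {"tagName": "Beta", "postCount": 1},
]
```

In this fixture each friend is reached by a single `KNOWS` edge, so the `drop_duplicates()` step is a no-op here. It is kept to mirror the `collect(distinct friend)` stage, which is where a duplicated friend row would otherwise inflate the grouped counts.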

graphistry/tests/compute/gfql/cypher/test_lowering.py

Lines changed: 77 additions & 0 deletions
@@ -619,6 +619,54 @@ def _mk_issue_1000_ic6_minimal_graph_cudf() -> _CypherTestGraph:
     return _mk_cudf_graph(graph._nodes, graph._edges)


+def _mk_issue_1396_tag_cooccurrence_join_aggregation_graph() -> _CypherTestGraph:
+    return _mk_graph(
+        pd.DataFrame(
+            {
+                "id": [501, 502, 503, 4398046511333, 2, 3, 9001, 9002, 9003],
+                "label__Tag": [True, True, True, False, False, False, False, False, False],
+                "label__Person": [False, False, False, True, True, True, False, False, False],
+                "label__Post": [False, False, False, False, False, False, True, True, True],
+                "name": [
+                    "Carl_Gustaf_Emil_Mannerheim",
+                    "Alpha",
+                    "Beta",
+                    None,
+                    None,
+                    None,
+                    None,
+                    None,
+                    None,
+                ],
+            }
+        ),
+        pd.DataFrame(
+            {
+                "s": [4398046511333, 4398046511333, 9001, 9002, 9003, 9001, 9002, 9003, 9001, 9002, 9003],
+                "d": [2, 3, 2, 2, 3, 501, 501, 501, 502, 502, 503],
+                "type": [
+                    "KNOWS",
+                    "KNOWS",
+                    "HAS_CREATOR",
+                    "HAS_CREATOR",
+                    "HAS_CREATOR",
+                    "HAS_TAG",
+                    "HAS_TAG",
+                    "HAS_TAG",
+                    "HAS_TAG",
+                    "HAS_TAG",
+                    "HAS_TAG",
+                ],
+            }
+        ),
+    )
+
+
+def _mk_issue_1396_tag_cooccurrence_join_aggregation_graph_cudf() -> _CypherTestGraph:
+    graph = _mk_issue_1396_tag_cooccurrence_join_aggregation_graph()
+    return _mk_cudf_graph(graph._nodes, graph._edges)
+
+
 def _prefix_scalar_reentry_query(
     *,
     tag_name: str = "topic",
@@ -10734,6 +10782,35 @@ def test_string_cypher_executes_issue_1000_ic6_exact_runtime_minimal_on_cudf() -
     ]


+def test_issue_1396_tag_cooccurrence_join_aggregation_counts() -> None:
+    """IC6 tag-cooccurrence join+aggregation shape keeps grouped post cardinality."""
+    result = _mk_issue_1396_tag_cooccurrence_join_aggregation_graph().gfql(
+        _issue_1000_ic6_query(),
+        params=_issue_1000_ic6_params(),
+    )
+
+    assert result._nodes.to_dict(orient="records") == [
+        {"tagName": "Alpha", "postCount": 2},
+        {"tagName": "Beta", "postCount": 1},
+    ]
+
+
+def test_issue_1396_tag_cooccurrence_join_aggregation_counts_on_cudf() -> None:
+    pytest.importorskip("cudf")
+
+    result = _mk_issue_1396_tag_cooccurrence_join_aggregation_graph_cudf().gfql(
+        _issue_1000_ic6_query(),
+        params=_issue_1000_ic6_params(),
+        engine="cudf",
+    )
+
+    assert type(result._nodes).__module__.startswith("cudf")
+    assert result._nodes.to_pandas().to_dict(orient="records") == [
+        {"tagName": "Alpha", "postCount": 2},
+        {"tagName": "Beta", "postCount": 1},
+    ]
+
+
 def test_string_cypher_executes_scalar_only_prefix_with_match_reentry() -> None:
     query = _prefix_scalar_reentry_query(order_by="id")
