name pygraphistry-gfql
description Construct and run GFQL graph queries in PyGraphistry using chain-list syntax or Cypher strings. Use when asked to "query my graph with GFQL", "MATCH pattern in graphistry", "find paths between nodes", "hop constraints", "let bindings", "GRAPH constructor", or "run Cypher on my graph". Also triggers on "g.gfql()", "n() e_forward() n()", "chain-list query", "subgraph extraction", "remote graph query", or "pattern matching in graphistry". Proactively suggest when the user wants multi-hop traversal or pattern matching on a graph already loaded in PyGraphistry.

PyGraphistry GFQL

Doc routing (local + canonical)

  • First route with ../pygraphistry/references/pygraphistry-readthedocs-toc.md.
  • Use ../pygraphistry/references/pygraphistry-readthedocs-top-level.tsv for section-level shortcuts.
  • Only scan ../pygraphistry/references/pygraphistry-readthedocs-sitemap.xml when a needed page is missing.
  • Use one batched discovery read before deep-page reads; avoid cat * and serial micro-reads.
  • In user-facing answers, prefer canonical https://pygraphistry.readthedocs.io/en/latest/... links.

Two syntaxes, one entrypoint

g.gfql() accepts both chain-list (Python AST objects) and Cypher strings. It auto-detects the language from the argument type:

from graphistry import n, e_forward

# Chain-list syntax (Python AST objects)
g2 = g.gfql([n({'type': 'person'}), e_forward(), n()])

# Cypher string syntax (auto-detected)
g2 = g.gfql("MATCH (p:Person)-[r:KNOWS]->(q:Person) RETURN p.name, q.name")

# Explicit language parameter (optional)
query_string = "MATCH (n) RETURN n.id"
g2 = g.gfql(query_string, language="cypher")

When to use which:

  • Chain-list: programmatic composition and dynamic parameterization, i.e. when queries are built from code
  • Cypher: readability, familiarity for Cypher users, and complex pattern matching with RETURN/ORDER BY/LIMIT

Quick start — chain-list

from graphistry import n, e_forward

g2 = g.gfql([
    n({'type': 'person'}),
    e_forward({'relation': 'transfers_to'}, min_hops=1, max_hops=3),
    n({'risk': True})
])

Quick start — Cypher

# Simple pattern match
g2 = g.gfql("MATCH (p:Person)-[r:KNOWS]->(q:Person) WHERE p.age > 30 RETURN p.name, q.name")

# Variable-length paths
g2 = g.gfql("MATCH (a:Account)-[*1..3]->(m:Merchant) RETURN a, m")

# Parameterized queries
g2 = g.gfql(
    "MATCH (n) WHERE n.score > $cutoff RETURN n.id, n.score ORDER BY n.score DESC LIMIT $top_n",
    params={"cutoff": 50, "top_n": 10}
)

# Relationship type alternation
g2 = g.gfql("MATCH (a:Person)-[:KNOWS|COLLABORATES_WITH]->(b:Person) RETURN a.name, b.name")

Cypher node labels and DataFrame columns

GFQL Cypher maps a :Label to a boolean column named label__<Label>, not to the values of a string column. Prefer property filters, which are simpler and work with any column:

# Recommended: property filter (works with any string/numeric column)
g2 = g.gfql("MATCH (p) WHERE p.type = 'Person' AND p.age > 30 RETURN p.name")

# Alternative: pre-create boolean label columns for Cypher :Label syntax
nodes['label__Person'] = nodes['type'] == 'Person'
g = graphistry.edges(edges, 'src', 'dst').nodes(nodes, 'id')
g2 = g.gfql("MATCH (p:Person) WHERE p.age > 30 RETURN p.name")
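When one string column holds several categories, the boolean label columns can be derived in a loop instead of one assignment per label. A minimal sketch, assuming each node has a single `type` value; plain dict rows stand in for DataFrame rows so the example runs without dependencies, and `add_label_columns` is a hypothetical helper, not part of PyGraphistry:

```python
def add_label_columns(rows, type_key="type"):
    """Return copies of rows with one boolean label__<Label> field per distinct type."""
    # Collect every distinct category found in the type column.
    labels = sorted({row[type_key] for row in rows})
    return [
        {**row, **{f"label__{label}": row[type_key] == label for label in labels}}
        for row in rows
    ]


# Hypothetical node rows; a real graph would keep these in a DataFrame.
nodes = [
    {"id": 1, "type": "Person", "age": 42},
    {"id": 2, "type": "Company", "age": None},
]
labeled = add_label_columns(nodes)
```

With pandas, the same idea is the single-column assignment shown above, repeated once per distinct value of the type column.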

Supported Cypher clauses

  • Full: MATCH, WHERE, RETURN, WITH, ORDER BY, SKIP, LIMIT, DISTINCT, CALL graphistry.*, GRAPH {}, USE
  • Partial: OPTIONAL MATCH (bounded subset), UNWIND (top-level), UNION/UNION ALL (direct g.gfql() only)
  • Not supported: CREATE, MERGE, DELETE, SET, REMOVE (GFQL is read-only)

Cypher functions

  • Scalar: labels(), type(), keys(), properties(), abs(), sqrt(), coalesce(), substring(), tointeger(), tofloat(), toboolean(), tostring()
  • Aggregation: count(), sum(), min(), max(), avg(), collect(), count(DISTINCT ...)
  • Operators: =, <>, <, <=, >, >=, IN, STARTS WITH, ENDS WITH, CONTAINS, IS NULL, IS NOT NULL, AND, OR, NOT

GRAPH constructor (Cypher extension)

# Extract subgraph as a graph object (not a table)
subgraph = g.gfql("GRAPH { MATCH (a)-[r]->(b) WHERE a.risk_score > 7 }")

# Multi-stage pipeline with named GRAPH bindings and USE
result = g.gfql("""
    GRAPH g1 = GRAPH { MATCH (a)-[r]->(b) WHERE a.event_count > 100 }
    GRAPH g2 = GRAPH { USE g1 CALL graphistry.degree.write() }
    USE g2 MATCH (n) RETURN n.id, n.degree ORDER BY n.degree DESC LIMIT 10
""")

Let/DAG bindings

from graphistry import n, e_forward, let, ref

# Named bindings forming a DAG
result = g.gfql(let({
    'high_risk': n({'risk_score': {'$gt': 0.8}}),
    'neighborhoods': ref('high_risk', [e_forward(max_hops=2), n()])
}))

# Select specific binding output
result = g.gfql(let({...}), output='neighborhoods')

# Multi-stage DAG: sequential refs build on each other
result = g.gfql(let({
    'people': n({'type': 'person'}),
    'contacts': ref('people', [e_forward({'rel': 'contacts'}), n()]),
    'owned': ref('contacts', [e_forward({'rel': 'owns'}), n()])
}), output='owned')

# Nested let: inner DAGs execute as opaque units for parallel-friendly pipelines
result = g.gfql(let({
    'social': let({
        'people': n({'type': 'person'}),
        'friends': ref('people', [e_forward({'rel': 'knows'}), n()]),
    }),
    'infra': let({
        'servers': n({'type': 'server'}),
        'traffic': ref('servers', [e_forward({'rel': 'serves'}), n()]),
    }),
    'combined': ref('social', [e_forward(), n()])
}), output='combined')

# Let + degree computation + visual encoding
from graphistry import n, e_forward, let, ref, call
result = g.gfql(let({
    'seeds': n({'risk_flag': True}),
    'neighborhood': ref('seeds', [e_forward(max_hops=2), n()]),
}))
# Then compute degrees and encode color
result = result.get_degrees().encode_point_color('degree', as_continuous=True)

  • Independent bindings operate on the root graph
  • ref() bindings operate on the referenced binding's output
  • Nested let scope rules (requires pygraphistry >= 0.53.7):
    • Inner bindings do NOT leak to outer scope
    • Inner bindings CAN read outer bindings (lexical closure)
    • Sibling nested lets may reuse names without collision
    • Each nested let is an opaque execution unit (parallel-friendly)
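The scope rules above behave like ordinary lexical scoping. As a conceptual model only (not PyGraphistry's implementation), they can be pictured with `collections.ChainMap` from the standard library. The binding names and string values below are placeholders chosen for illustration:

```python
from collections import ChainMap

# Outer scope: bindings declared in the top-level let.
outer = ChainMap({"seeds": "outer-seeds"})

# Each nested let gets its own child scope layered over the outer one.
social = outer.new_child({"people": "social-people"})
infra = outer.new_child({"people": "infra-people"})

# Inner scopes can read outer bindings (lexical closure).
inner_reads_outer = social["seeds"]

# Inner bindings do not leak into the outer scope.
leaked = "people" in outer

# Sibling scopes reuse the same name without colliding.
sibling_values = (social["people"], infra["people"])
```

Lookups fall through from the child mapping to the parent, while writes and membership in the parent are unaffected by the children, which mirrors the "read outward, never leak inward" rule.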

Targeted patterns (high signal)

# Edge query filtering
g2 = g.gfql([n(), e_forward(edge_query="type == 'replied_to' and submolt == 'X'"), n()])

# Same-path constraints with where + compare/col
from graphistry import col, compare
g2 = g.gfql(
    [n(name='a'), e_forward(name='e'), n(name='b')],
    where=[compare(col('a', 'owner_id'), '==', col('b', 'owner_id'))]
)

# Traverse 2-4 hops but only return hops 3-4
g2 = g.gfql([e_forward(min_hops=2, max_hops=4, output_min_hops=3, output_max_hops=4)])
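The last pattern separates how far the traversal reaches from which hops are returned. The toy model below illustrates that distinction on a plain edge list. It is a sketch under simplifying assumptions, not PyGraphistry's algorithm: it labels each edge with the hop at which a forward wavefront first crosses it, and it does not model `min_hops` pruning. The helpers `hop_labels` and `output_slice` are hypothetical names:

```python
def hop_labels(edges, seeds, max_hops):
    """Label each edge with the hop at which a forward wavefront first crosses it."""
    labels = {}
    frontier = set(seeds)
    visited = set(seeds)
    for hop in range(1, max_hops + 1):
        next_frontier = set()
        for src, dst in edges:
            if src in frontier and (src, dst) not in labels:
                labels[(src, dst)] = hop
                if dst not in visited:
                    next_frontier.add(dst)
        visited |= next_frontier
        frontier = next_frontier
        if not frontier:
            break
    return labels


def output_slice(labels, output_min_hops, output_max_hops):
    """Keep only edges whose hop label falls inside the requested output range."""
    return sorted(
        edge for edge, hop in labels.items()
        if output_min_hops <= hop <= output_max_hops
    )


# A four-edge chain a -> b -> c -> d -> e, traversed from seed "a".
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
labels = hop_labels(edges, {"a"}, max_hops=4)
returned = output_slice(labels, output_min_hops=3, output_max_hops=4)
```

In this model the traversal still has to walk hops 1 and 2 to reach the later edges; only the returned slice is narrowed.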

Edge direction variants

  • e_forward(): source-to-destination
  • e_reverse(): destination-to-source
  • e_undirected(): both directions
  • e(): alias for e_undirected() (matches either direction)
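To make the direction semantics concrete, here is a one-hop expansion over a plain edge list. This is a conceptual sketch rather than the library's code; `step` and its `direction` strings are hypothetical names used only for illustration:

```python
def step(edges, frontier, direction):
    """Return the nodes one hop away from frontier in the given direction."""
    reached = set()
    for src, dst in edges:
        # Forward: follow edges from source to destination.
        if direction in ("forward", "undirected") and src in frontier:
            reached.add(dst)
        # Reverse: follow edges from destination back to source.
        if direction in ("reverse", "undirected") and dst in frontier:
            reached.add(src)
    return reached


edges = [("a", "b"), ("c", "a")]
forward = step(edges, {"a"}, "forward")
reverse = step(edges, {"a"}, "reverse")
both = step(edges, {"a"}, "undirected")
```

Starting from node `a`, the forward step follows `a -> b`, the reverse step follows `c -> a` backwards, and the undirected step returns the union of the two.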

High-value patterns

  • g.gfql() is the unified entrypoint — pass chain-lists OR Cypher strings.
  • NEVER use .chain() or .hop() — they are deprecated and emit warnings. Always use g.gfql([...]) for chain-list syntax or g.gfql("MATCH ...") for Cypher.
  • When the user explicitly asks for GFQL, final snippets must include an explicit .gfql(...) call.
  • When the task calls for remote execution or a remote dataset, use gfql_remote(...).
  • Use name= labels on intermediate matches when you need to constrain them.
  • Use where=[...] for cross-step/path constraints.
  • Use min_hops/max_hops to bound the traversal and output_min_hops/output_max_hops to bound the returned slice.
  • Use predicates (is_in, numeric/date predicates) for concise filtering.
  • Use engine='auto' by default; force cudf/pandas only when needed.

Remote mode

# Remote with chain-list
rg = graphistry.bind(dataset_id='my-dataset')
res = rg.gfql_remote([n(), e_forward(), n()], engine='auto')

# Remote with Cypher string
res = rg.gfql_remote("MATCH (n:Person)-[r]->(m) WHERE n.risk_level = 'critical' RETURN n, r, m")

# Remote with Let/DAG
res = rg.gfql_remote(let({...}))

# Remote slim payload (only required columns)
res = rg.gfql_remote([n(), e_forward(), n()], output_type='nodes', node_col_subset=['node_id', 'time'])

# Post-process on remote side when you want trimmed transfer payloads
res = rg.python_remote_table(lambda g: g._edges[['src', 'dst']].head(1000))

Validation and safety

  • Validate user-derived query fragments before execution.
  • Normalize datetime columns before temporal predicates.
  • Prefer small column subsets for remote result transfer.
  • Preflight Cypher strings before execution with parse_cypher and compile_cypher: from graphistry.compute.gfql.cypher import parse_cypher, compile_cypher
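One way to apply the first point is to allowlist any user-chosen column name and pass every value through `params` rather than interpolating it into the query text. The sketch below uses only the standard library; `build_score_query` and its allowlist argument are hypothetical, and the final `g.gfql(...)` call is left as a comment because it needs a loaded graph:

```python
import re

# Conservative identifier pattern: letters, digits, underscore, no leading digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_score_query(column, cutoff, top_n, allowed_columns):
    """Build a Cypher string and params dict from user-derived inputs."""
    # Column names cannot be bound as parameters, so validate them strictly.
    if not _IDENTIFIER.fullmatch(column) or column not in allowed_columns:
        raise ValueError(f"column not allowed: {column!r}")
    query = (
        f"MATCH (n) WHERE n.{column} > $cutoff "
        f"RETURN n.id, n.{column} ORDER BY n.{column} DESC LIMIT $top_n"
    )
    # Values travel as parameters and are coerced to the expected types.
    params = {"cutoff": float(cutoff), "top_n": int(top_n)}
    return query, params


query, params = build_score_query("score", "50", 10, {"score", "risk_score"})
# g2 = g.gfql(query, params=params)  # requires a graph `g` bound elsewhere
```

Only the column name is formatted into the string, and only after it passes both the pattern check and the allowlist; the cutoff and limit never touch the query text.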

Canonical docs