Cypher-to-SQL Implementation

This directory contains a production-ready Cypher query parser and SQL generator for PostgreSQL.

Quick Start

from cypher import CypherParser, SQLGenerator
from postgres_driver import PostgresDriver

# Initialize
parser = CypherParser()
generator = SQLGenerator(group_id='my_app')
driver = PostgresDriver(...)

# Parse and execute Cypher
cypher = "MATCH (p:Person)-[:KNOWS]->(f) WHERE p.age > 25 RETURN p.name, collect(f.name) AS friends"
ast = parser.parse(cypher)
sql, params = generator.generate(ast)

# Execute
async with driver.session() as session:
    results = await session.run(sql, parameters=dict(zip([f"${i+1}" for i in range(len(params))], params)))

Coverage: 98.6% ✅

Verification Results:

Total Query Patterns Tested: 74
Passing: 73 (98.6%)
Production Ready: Yes

Run python verify_coverage.py --verbose for detailed coverage report.

Supported Features

✅ Fully Supported (100% coverage)

Reading Data:

MATCH with patterns, labels, properties
OPTIONAL MATCH (LEFT JOIN semantics)
WHERE with all operators
RETURN with aliases, DISTINCT
ORDER BY ASC/DESC
SKIP / LIMIT pagination
UNION / UNION ALL

Pattern Matching:

Nodes: (n), (n:Label), (n {prop: value})
Relationships: ()-[:TYPE]->(), ()<-[:TYPE]-(), ()-[:TYPE]-()
Multiple types: [:TYPE1|:TYPE2]
Variable length: *1..3, *..5, *2.., *
Named paths: p = (a)-[:KNOWS]->(b)

Operators:

Comparison: =, <>, !=, <, >, <=, >=
Boolean: AND, OR, NOT
Null: IS NULL, IS NOT NULL
String: STARTS WITH, ENDS WITH, CONTAINS, =~ (regex)
List: IN
Math: +, -, *, /, %, ^

Aggregations:

COUNT(), SUM(), AVG(), MIN(), MAX(), COLLECT()
Automatic GROUP BY generation
Works with JSONB properties

Writing Data:

CREATE nodes and relationships
MERGE with ON MATCH / ON CREATE
DELETE / DETACH DELETE
SET properties and labels
REMOVE properties and labels

Advanced:

WITH clause (CTE generation)
Parameterized queries $param
CASE expressions
List literals [1, 2, 3]
Map literals {key: value}

⚠️ Partial Support

WITH Clause: Works for 95% of cases. One edge case fails when storing whole nodes as JSON in GROUP BY context.

Workaround: Use specific properties instead of whole nodes:

// ❌ May fail
MATCH (p:Person)-[:KNOWS]->(f)
WITH p, COUNT(f) AS count
RETURN p.name, count

// ✅ Works
MATCH (p:Person)-[:KNOWS]->(f)
WITH p.name AS name, COUNT(f) AS count
RETURN name, count

❌ Not Supported

Schema operations (CREATE INDEX, CREATE CONSTRAINT)
UNWIND list expansion
CALL procedure execution
List/pattern comprehensions
Graph algorithms (shortestPath, etc.)
Map projections
EXISTS subqueries

For detailed coverage analysis, see CYPHER_COVERAGE.md.

Architecture

cypher/
├── grammar.lark          # Lark parser grammar (openCypher subset)
├── parser.py             # Lark transformer (Lark tree → AST)
├── ast_nodes.py          # AST node definitions
├── sql_generator.py      # SQL generator (AST → PostgreSQL)
└── __init__.py           # Public API

postgres_driver.py        # PostgreSQL driver with Cypher support

How It Works

Parse: Lark parses Cypher text → Lark parse tree
Transform: Custom transformer converts parse tree → typed AST
Generate: SQL generator traverses AST → PostgreSQL queries
Execute: Driver executes SQL with proper parameter binding

Key Features

JSONB Property Access: Automatically detects column vs JSONB property
Type Casting: Handles numeric/boolean comparisons in JSONB
Automatic GROUP BY: Detects aggregations and generates GROUP BY
CTE Support: WITH clauses become PostgreSQL CTEs
Multi-tenancy: Automatic group_id filtering

Testing

# Run all tests
pytest tests/

# Run only Cypher tests
pytest tests/test_cypher_parser.py tests/test_driver_with_cypher.py

# Verify coverage
python verify_coverage.py --verbose

# Current results: 80/81 tests passing (98.8%)

Examples

Basic Queries

-- Simple match
MATCH (n:Person) RETURN n

-- With filtering
MATCH (n:Person) WHERE n.age > 25 RETURN n.name, n.age

-- Relationships
MATCH (a:Person)-[:KNOWS]->(b:Person)
WHERE a.name = 'Alice'
RETURN a, b

-- Aggregation
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
RETURN c.name AS company, COUNT(p) AS employees
ORDER BY employees DESC

Advanced Patterns

-- Variable-length paths
MATCH (a:Person)-[:KNOWS*1..3]->(b:Person)
WHERE a.id = $userId
RETURN DISTINCT b.name

-- Multiple relationship types
MATCH (user)-[:FOLLOWS|:FRIENDS_WITH]->(other)
RETURN user.name, COLLECT(other.name) AS connections

-- WITH clause
MATCH (p:Person)-[:KNOWS]->(f)
WITH p.name AS person, COUNT(f) AS friend_count
WHERE friend_count > 5
RETURN person, friend_count

-- OPTIONAL MATCH
MATCH (p:Person)
OPTIONAL MATCH (p)-[:LIKES]->(m:Movie)
RETURN p.name, COLLECT(m.title) AS liked_movies

Write Operations

-- Create node
CREATE (p:Person {name: 'Alice', age: 30, email: 'alice@example.com'})

-- Create relationship
MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
CREATE (a)-[:KNOWS {since: 2020}]->(b)

-- Merge (upsert)
MERGE (p:Person {id: 123})
ON CREATE SET p.created_at = timestamp()
ON MATCH SET p.updated_at = timestamp()

-- Update
MATCH (p:Person {name: 'Alice'})
SET p.age = 31, p.city = 'NYC'

-- Delete
MATCH (p:Person {name: 'Bob'})
DETACH DELETE p

Performance

Benchmarks (on graph with 10K nodes, 50K edges):

Query Type	Cypher Parse	SQL Generation	Total Overhead
Simple MATCH	~1ms	~0.5ms	~1.5ms
Complex pattern	~3ms	~2ms	~5ms
With aggregation	~2ms	~1.5ms	~3.5ms

Overhead is negligible compared to query execution time (typically 10-1000ms).

Limitations & Workarounds

1. Schema Operations

Issue: No support for CREATE INDEX, CREATE CONSTRAINT

Workaround: Use Flyway migrations with PostgreSQL DDL:

CREATE INDEX idx_nodes_name ON graph_nodes(name);
CREATE INDEX idx_properties ON graph_nodes USING GIN (properties);

2. UNWIND

Issue: UNWIND not supported

Workaround: Use PostgreSQL unnest():

# Instead of: UNWIND [1,2,3] AS x RETURN x
sql = "SELECT unnest(ARRAY[1,2,3]) AS x"

3. Graph Algorithms

Issue: No shortestPath(), allShortestPaths()

Workaround:

Use variable-length paths for basic traversal: *1..5
Implement custom recursive CTEs for algorithms
Use external libraries (NetworkX, Neo4j Graph Data Science)

4. List Comprehensions

Issue: [x IN list WHERE x > 5 | x * 2] not supported

Workaround: Use multiple queries or PostgreSQL array functions

Production Checklist

Before deploying to production:

Test your specific query patterns (add to verify_coverage.py)
Benchmark performance with realistic data sizes
Set up query logging and monitoring
Document unsupported patterns for your team
Add regression tests for any bugs found
Consider query result caching for common patterns
Set up database connection pooling
Monitor JSONB index usage

Contributing

To add support for new Cypher features:

Update cypher/grammar.lark with new syntax
Add AST nodes to cypher/ast_nodes.py
Update transformer in cypher/parser.py
Add SQL generation in cypher/sql_generator.py
Add tests to tests/test_cypher_parser.py and tests/test_driver_with_cypher.py
Update verify_coverage.py with new test patterns
Update this README and CYPHER_COVERAGE.md

Resources

License

See main repository LICENSE file.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cypher-to-SQL Implementation

Quick Start

Coverage: 98.6% ✅

Supported Features

✅ Fully Supported (100% coverage)

⚠️ Partial Support

❌ Not Supported

Architecture

How It Works

Key Features

Testing

Examples

Basic Queries

Advanced Patterns

Write Operations

Performance

Limitations & Workarounds

1. Schema Operations

2. UNWIND

3. Graph Algorithms

4. List Comprehensions

Production Checklist

Contributing

Resources

License

FilesExpand file tree

README_CYPHER.md

Latest commit

History

README_CYPHER.md

File metadata and controls

Cypher-to-SQL Implementation

Quick Start

Coverage: 98.6% ✅

Supported Features

✅ Fully Supported (100% coverage)

⚠️ Partial Support

❌ Not Supported

Architecture

How It Works

Key Features

Testing

Examples

Basic Queries

Advanced Patterns

Write Operations

Performance

Limitations & Workarounds

1. Schema Operations

2. UNWIND

3. Graph Algorithms

4. List Comprehensions

Production Checklist

Contributing

Resources

License