This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Neo4j + Databricks Unity Catalog Integration — a federated query system enabling SQL queries against Neo4j graph databases from Databricks via Unity Catalog's JDBC support. The core component is a shaded (fat) JAR connector that bundles the Neo4j JDBC driver with SQL-to-Cypher translators.
The Neo4j Unity Catalog Connector JAR is built and released from a separate repo, neo4j-unity-catalog-connector. Download the latest release from that repo's releases page.
Translators are discovered via Java ServiceLoader (META-INF/services/org.neo4j.jdbc.translator.spi.TranslatorFactory). The pipeline chains translators by Translator.getOrder():
- SparkSubqueryCleaningTranslator (highest precedence): strips Spark's SPARK_GEN_SUBQ_0 WHERE 1=0 wrapping that Databricks adds to JDBC queries
- SqlToCypherTranslator: converts cleaned SQL into Cypher
Each translator returns null for queries it doesn't handle, passing to the next in the chain.
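The ordering and null fall-through described above can be sketched in plain Java. This is a minimal illustration, not the connector's code: SketchTranslator is a hypothetical stand-in for the real SPI, the order values are made up, the wrapper shape (SELECT * FROM (...) SPARK_GEN_SUBQ_0 WHERE 1=0) is an assumption, and the toy SQL translator handles a single pattern only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TranslatorChainSketch {

    // Hypothetical stand-in for the translator SPI; not the real org.neo4j.jdbc types.
    interface SketchTranslator {
        int getOrder();                      // assumption: lower value runs first
        String translate(String statement);  // null means "not handled"
    }

    // Strips a wrapper of the assumed form: SELECT * FROM (<inner>) SPARK_GEN_SUBQ_0 WHERE 1=0
    static final class SubqueryCleaner implements SketchTranslator {
        private static final String PREFIX = "SELECT * FROM (";
        private static final String SUFFIX = ") SPARK_GEN_SUBQ_0 WHERE 1=0";

        @Override public int getOrder() { return 0; }

        @Override public String translate(String statement) {
            String s = statement.trim();
            boolean wrapped = s.length() >= PREFIX.length() + SUFFIX.length()
                    && s.startsWith(PREFIX) && s.endsWith(SUFFIX);
            return wrapped ? s.substring(PREFIX.length(), s.length() - SUFFIX.length()).trim() : null;
        }
    }

    // Toy translator: handles only "SELECT COUNT(*) FROM <Label>"; the real one covers far more.
    static final class ToySqlToCypher implements SketchTranslator {
        private static final Pattern COUNT_ALL = Pattern.compile(
                "^SELECT\\s+COUNT\\(\\*\\)\\s+FROM\\s+(\\w+)$", Pattern.CASE_INSENSITIVE);

        @Override public int getOrder() { return 100; }

        @Override public String translate(String statement) {
            Matcher m = COUNT_ALL.matcher(statement.trim());
            return m.matches() ? "MATCH (n:" + m.group(1) + ") RETURN count(n)" : null;
        }
    }

    // Sorts by order, then feeds each translator the current statement.
    // A null result leaves the statement unchanged for the next translator.
    static String run(List<SketchTranslator> translators, String statement) {
        List<SketchTranslator> ordered = new ArrayList<>(translators);
        ordered.sort(Comparator.comparingInt(SketchTranslator::getOrder));
        String current = statement;
        for (SketchTranslator translator : ordered) {
            String result = translator.translate(current);
            if (result != null) {
                current = result;
            }
        }
        return current;
    }

    // Convenience overload; registration order is deliberately reversed to show that sorting decides.
    static String run(String statement) {
        return run(List.of(new ToySqlToCypher(), new SubqueryCleaner()), statement);
    }

    public static void main(String[] args) {
        System.out.println(run("SELECT * FROM (SELECT COUNT(*) FROM Person) SPARK_GEN_SUBQ_0 WHERE 1=0"));
        System.out.println(run("MATCH (n) RETURN n"));
    }
}
```

In this sketch a plain Cypher statement is handled by neither translator and comes back unchanged, which mirrors the passthrough behavior the tests check for.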
The Maven shade plugin merges neo4j-jdbc, neo4j-jdbc-translator-impl, and neo4j-jdbc-translator-sparkcleaner into a single JAR. All dependencies are relocated under org.neo4j.jdbc.internal.shaded.* to avoid classpath conflicts with Databricks SafeSpark's isolated JVM. The ServicesResourceTransformer merges SPI registration files across the bundled JARs — this is critical for translator discovery.
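To see why merging SPI registration files matters, here is a simplified model of the idea in Java. It is not the shade plugin's implementation: each map stands for one bundled JAR, and the provider class names are hypothetical placeholders. Same-named service files are combined so that every provider stays registered instead of one file overwriting the other.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ServiceFileMergeSketch {

    static final String SERVICE_FILE =
            "META-INF/services/org.neo4j.jdbc.translator.spi.TranslatorFactory";

    // Each input map models one JAR: service-file path -> provider class names listed in it.
    static Map<String, Set<String>> merge(List<Map<String, List<String>>> jars) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (Map<String, List<String>> jar : jars) {
            for (Map.Entry<String, List<String>> entry : jar.entrySet()) {
                // Append to the existing entry set rather than replacing it.
                merged.computeIfAbsent(entry.getKey(), key -> new LinkedHashSet<>())
                        .addAll(entry.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Placeholder provider names, for illustration only.
        Map<String, List<String>> cleanerJar =
                Map.of(SERVICE_FILE, List.of("example.shaded.CleanerFactory"));
        Map<String, List<String>> sqlJar =
                Map.of(SERVICE_FILE, List.of("example.shaded.SqlToCypherFactory"));

        Map<String, Set<String>> merged = merge(List.of(cleanerJar, sqlJar));
        merged.get(SERVICE_FILE).forEach(System.out::println);
    }
}
```

If the files were not merged, only one JAR's registration file would survive in the shaded JAR and ServiceLoader would find only that JAR's factory.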
Databricks runs custom JDBC drivers in an isolated JVM sandbox. The connector requires metaspace tuning:
spark.databricks.safespark.jdbcSandbox.jvm.maxMetaspace.mib 128
- neo4j-unity-catalog-connector/: Maven project (Java 17), the connector JAR
- neo4j-uc-federation-lab/: Databricks notebooks (5 numbered notebooks demonstrating patterns)
- site/: Antora documentation site (AsciiDoc, published to GitHub Pages)
- docs/: Markdown reference documentation
- Formatter: Palantir Java Format via Spotless Maven Plugin (enforced at compile phase)
- Run ./mvnw spotless:apply before committing Java changes
JUnit 5 tests in neo4j-unity-catalog-connector/src/test/java/. Key test (BundledTranslatorsTest.java) verifies:
- SPI discovery of both translator factories
- Spark subquery cleaning pipeline
- Cypher passthrough behavior
- Neo4j JDBC driver class loading
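The discovery and class-loading checks in that list can be approximated with plain JDK calls. The sketch below is not the contents of BundledTranslatorsTest.java. It uses java.sql.Driver as the service type so that it compiles without the connector on the classpath, and the driver class name org.neo4j.jdbc.Neo4jDriver is an assumption. With the shaded JAR on the classpath, the same helper could be pointed at the TranslatorFactory interface instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class DiscoveryCheckSketch {

    // Returns true if the class can be located, without running its static initializers.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, DiscoveryCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Lists the implementation class names that ServiceLoader finds for a service type.
    static <S> List<String> providerNames(Class<S> service) {
        List<String> names = new ArrayList<>();
        for (S provider : ServiceLoader.load(service)) {
            names.add(provider.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumed driver class name; prints false unless the connector JAR is on the classpath.
        System.out.println("driver loadable: " + isLoadable("org.neo4j.jdbc.Neo4jDriver"));
        System.out.println("JDBC drivers found: " + providerNames(java.sql.Driver.class));
    }
}
```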
Pushing a tag that matches the connector-* pattern triggers GitHub Actions to build and publish a release:
git tag connector-1.0.0
git push origin connector-1.0.0
Supported: SELECT COUNT(*), aggregates with WHERE, COUNT DISTINCT, NATURAL JOIN (graph traversals), subqueries with aggregates, GROUP BY (implicit and explicit WITH-clause generation), HAVING (simple, compound, mixed aggregates, without GROUP BY), ORDER BY (including on aggregate aliases and after WITH clauses), DISTINCT with GROUP BY/HAVING, LIMIT/OFFSET with WITH clauses, WHERE + GROUP BY combinations, JOIN + GROUP BY, COUNT(DISTINCT) in HAVING, additional aggregate functions (percentileCont, percentileDisc, stDev, stDevP), full clause combinations.
Not supported (use Spark Connector instead): non-aggregate SELECT, relationship property aggregation.