neo4j-partners/dbx_analyst

Databricks Table Access Audit Tool

An audit tool for analyzing user and group access to Unity Catalog tables. It stores permissions in a Neo4j knowledge graph so that access paths can be traced and the impact of a permission change can be assessed.

Quick Start

# Install dependencies
uv sync

# Copy and configure environment
cp .env.sample .env
# Edit .env with your Databricks and Neo4j credentials

# Run interactive demo
uv run src/local_demo.py

The interactive mode will:

  1. Connect to Databricks and list available SQL warehouses
  2. Let you select a warehouse
  3. Provide a menu to sync, view stats, or run sample queries

Configuration

Copy .env.sample to .env and fill in your Databricks and Neo4j credentials:

cp .env.sample .env

Running

Local Demo (Recommended)

The local demo provides an interactive menu and command-line options:

# Interactive mode - menu-driven interface
uv run src/local_demo.py

# Direct sync with a specific warehouse
uv run src/local_demo.py --warehouse-id <ID>

# Show graph statistics
uv run src/local_demo.py --stats

# Query: What tables can a user access?
uv run src/local_demo.py --user-access alice@example.com

# Query: Who can access a table?
uv run src/local_demo.py --table-access catalog.schema.table

CLI Module

The table_access_audit module provides additional commands:

# Test Databricks connection
uv run python -m table_access_audit test

# Test Neo4j connection
uv run python -m table_access_audit graph-test

# Initialize Neo4j schema (create constraints)
uv run python -m table_access_audit graph-init

# Show Neo4j graph status
uv run python -m table_access_audit graph-status

# List Databricks resources
uv run python -m table_access_audit list-users
uv run python -m table_access_audit list-groups
uv run python -m table_access_audit list-catalogs
uv run python -m table_access_audit list-tables <catalog> [--schema <schema>]

# Get table grants
uv run python -m table_access_audit get-grants <catalog.schema.table> [--direct-only]

# Sync permissions to Neo4j
uv run python -m table_access_audit sync [--catalog <name>] [--include-system]

Neo4j Setup Options

Option 1: Neo4j AuraDB Free (Recommended)

  1. Sign up at https://neo4j.com/cloud/aura-free/
  2. Create a free instance
  3. Copy the connection URI and password to .env

Option 2: Neo4j Desktop

  1. Download from https://neo4j.com/download/
  2. Create a local database
  3. Set NEO4J_URI=neo4j://localhost:7687 in .env

Option 3: Docker

docker run -d \
  --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  neo4j:5

Graph Data Model

Nodes:

  • User - Databricks users
  • Group - Databricks groups
  • ServicePrincipal - Service principals
  • Catalog - Unity Catalog catalogs
  • Schema - Schemas within catalogs
  • Table - Tables within schemas

Relationships:

  • MEMBER_OF - User/Group membership (supports nested groups)
  • CONTAINS_SCHEMA - Catalog → Schema
  • CONTAINS_TABLE - Schema → Table
  • HAS_PRIVILEGE - Grant with privilege property (SELECT, MODIFY, etc.)
  • OWNS - Ownership relationship
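
As an illustration of how this model can answer "what tables can a user access?", here is a minimal sketch that builds a Cypher query over the labels and relationships listed above. It is not the tool's own query. The `name` node property, the direction of `HAS_PRIVILEGE` (principal to securable), and the helper name `user_table_access_query` are assumptions made for this example; only the `privilege` property comes from the model description.

```python
def user_table_access_query(max_group_depth: int = 5) -> str:
    """Return a Cypher query listing the tables a user can reach.

    Assumptions (not confirmed by the project): nodes carry a `name`
    property and HAS_PRIVILEGE points from the principal to the securable.
    """
    if max_group_depth < 0:
        raise ValueError("max_group_depth must be non-negative")
    return "\n".join([
        # Zero hops keeps the user itself; more hops follow nested groups.
        "MATCH (u:User {name: $user})-[:MEMBER_OF*0.."
        + str(max_group_depth) + "]->(principal)",
        # A grant may sit on a catalog, a schema, or the table itself.
        "MATCH (principal)-[g:HAS_PRIVILEGE]->(securable)",
        # Walk down the containment hierarchy (0 to 2 hops) to reach tables.
        "MATCH (securable)-[:CONTAINS_SCHEMA|CONTAINS_TABLE*0..2]->(t:Table)",
        "RETURN DISTINCT t.name AS table_name, g.privilege AS privilege,",
        "       principal.name AS granted_via",
        "ORDER BY table_name, privilege",
    ])


if __name__ == "__main__":
    print(user_table_access_query(max_group_depth=3))
```

The user is passed as the `$user` query parameter rather than interpolated into the string, and the group depth is bounded so that a membership cycle cannot make the traversal run away.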

Why SQL Warehouse?

This tool uses SQL SHOW GRANTS commands instead of the REST API because:

  1. Reliability: The REST API call w.grants.get() fails on some Unity Catalog configurations with "SECURABLETYPE.CATALOG is not a valid securable type" errors
  2. Consistency: SQL SHOW GRANTS works reliably across all Unity Catalog deployments
  3. Same method: This is how Databricks Catalog Explorer retrieves permissions

A SQL warehouse must be running to execute queries. Use --list-warehouses to see available warehouses.

Sources and References

Key Learnings

REST API vs SQL for Grants:

  • The REST API w.grants.get() fails on some configurations with: "SECURABLETYPE.CATALOG is not a valid securable type"
  • SQL SHOW GRANTS ON <type> <name> works reliably
  • Requires a running SQL warehouse
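
The `SHOW GRANTS ON <type> <name>` form can be assembled with a small helper like the one below. This is a sketch, not the tool's code: the function name and the restriction to three securable types are assumptions for illustration, and the name is expected to arrive already quoted where needed.

```python
# Securable types covered by this sketch; Unity Catalog supports more.
VALID_SECURABLE_TYPES = {"CATALOG", "SCHEMA", "TABLE"}


def show_grants_statement(securable_type: str, quoted_name: str) -> str:
    """Build a SHOW GRANTS statement for one securable.

    `quoted_name` must already have any special parts wrapped in backticks.
    """
    kind = securable_type.strip().upper()
    if kind not in VALID_SECURABLE_TYPES:
        raise ValueError(f"unsupported securable type: {securable_type!r}")
    if not quoted_name.strip():
        raise ValueError("securable name must not be empty")
    return f"SHOW GRANTS ON {kind} {quoted_name.strip()}"


if __name__ == "__main__":
    print(show_grants_statement("table", "main.sales.orders"))
```

The resulting string still has to be executed against a running SQL warehouse.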

Unity Catalog Permissions Model:

  • Data is secure by default - users have no access until granted
  • Privileges inherit downward: Catalog → Schema → Table
  • Use account-level groups (not workspace-level) for Unity Catalog grants
  • BROWSE privilege enables discovery without USE CATALOG/SCHEMA
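
To make the downward inheritance concrete, the sketch below computes which principals hold which privileges on a table, counting grants made on the table itself, on its schema, or on its catalog. It is a simplified in-memory model, not the tool's implementation. The tuple layout and function name are assumptions, and it assumes plain dot-separated names with no dots inside a part.

```python
from collections import defaultdict


def effective_table_privileges(grants, table):
    """Map each principal to the privileges that apply to `table`.

    grants: iterable of (principal, privilege, securable), where securable
            is "catalog", "catalog.schema", or "catalog.schema.table".
    table:  fully qualified "catalog.schema.table" name.
    """
    catalog, schema, _ = table.split(".")
    # A grant on any of these three securables reaches the table.
    applicable = {catalog, f"{catalog}.{schema}", table}
    result = defaultdict(set)
    for principal, privilege, securable in grants:
        if securable in applicable:
            result[principal].add(privilege)
    return dict(result)


if __name__ == "__main__":
    sample = [
        ("analysts", "SELECT", "main"),            # catalog-level grant
        ("alice", "MODIFY", "main.sales.orders"),  # direct table grant
        ("bob", "SELECT", "other"),                # unrelated catalog
    ]
    print(effective_table_privileges(sample, "main.sales.orders"))
```

Group membership is deliberately left out here. In the graph, it is resolved separately through MEMBER_OF relationships.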

SQL Identifier Escaping:

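  • Don't wrap a fully qualified name in a single pair of backticks; that makes the whole string, dots included, one identifier
  • Quote each part separately, and only the parts that contain special characters: `catalog-name`.schema.table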
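
A minimal sketch of this quoting rule follows. The helper names are hypothetical, and the parts are passed separately because splitting a string on dots is ambiguous once a part may itself contain a dot. Embedded backticks are escaped by doubling them.

```python
import re

# Parts made only of letters, digits, and underscores (not starting with a
# digit) are left unquoted; anything else is wrapped in backticks.
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_part(part: str) -> str:
    """Quote a single identifier part if it contains special characters."""
    if not part:
        raise ValueError("identifier part must not be empty")
    if _PLAIN_IDENTIFIER.fullmatch(part):
        return part
    # Double any backtick inside the name, then wrap the part in backticks.
    return "`" + part.replace("`", "``") + "`"


def quote_qualified_name(*parts: str) -> str:
    """Join separately quoted parts into a dotted name."""
    if not parts:
        raise ValueError("at least one identifier part is required")
    return ".".join(quote_part(p) for p in parts)


if __name__ == "__main__":
    print(quote_qualified_name("catalog-name", "schema", "table"))
```

Quoting a part that did not strictly need it is harmless, so the pattern for "plain" names is kept conservative.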
