Data Exploration

Tools for discovering table schemas and executing SQL queries in Databricks.

Finding Tables by Keyword

⚠️ START HERE if you don't know which catalog/schema contains your data.

Use information_schema to search for tables by keyword — do NOT manually iterate through catalogs list → schemas list → tables list. Manual enumeration wastes 10+ steps.

# Find tables matching a keyword
databricks experimental aitools tools query \
  "SELECT table_catalog, table_schema, table_name FROM system.information_schema.tables WHERE table_name LIKE '%keyword%'" \
  --profile <PROFILE>

# Then discover schema for the tables you found
databricks experimental aitools tools discover-schema catalog.schema.table1 catalog.schema.table2 --profile <PROFILE>

Overview

The databricks experimental aitools tools command group provides tools for data discovery and exploration:

discover-schema: Batch discover table metadata, columns, types, sample data, and statistics
query: Execute SQL queries against Databricks SQL warehouses

When to use this: Use these commands whenever you need to:

Discover table schemas and metadata
Execute SQL queries against warehouse data
Explore data structure and content
Validate data or check table statistics

Prerequisites

Authenticated Databricks CLI - see CLI Authentication Guide for OAuth2 setup and profile configuration
Access to Unity Catalog tables with appropriate read permissions
SQL Warehouse (for query command - auto-detected unless DATABRICKS_WAREHOUSE_ID is set)

Discover Schema

Batch discover table metadata including columns, types, sample data, and null counts.

Command Syntax

databricks experimental aitools tools discover-schema TABLE... [flags]

Tables must be specified in CATALOG.SCHEMA.TABLE format.

What It Returns

For each table, returns:

Column names and types
Sample data (5 rows)
Null counts per column
Total row count

Examples

# Discover schema for a single table
databricks experimental aitools tools discover-schema samples.nyctaxi.trips --profile my-workspace

# Discover schema for multiple tables
databricks experimental aitools tools discover-schema \
  catalog.schema.table1 \
  catalog.schema.table2 \
  --profile my-workspace

# Get JSON output
databricks experimental aitools tools discover-schema \
  samples.nyctaxi.trips \
  --output json \
  --profile my-workspace

Common Use Cases

Understanding table structure before querying

databricks experimental aitools tools discover-schema catalog.schema.customer_data --profile my-workspace

Comparing schemas across multiple tables

databricks experimental aitools tools discover-schema \
  catalog.schema.table_v1 \
  catalog.schema.table_v2 \
  --profile my-workspace

Identifying columns with null values
- The null counts help identify data quality issues

Query

Execute SQL statements against a Databricks SQL warehouse and return results.

Command Syntax

databricks experimental aitools tools query "SQL" [flags]

Warehouse Selection

The command auto-detects an available warehouse unless:

DATABRICKS_WAREHOUSE_ID environment variable is set
You specify a warehouse using other configuration methods

To check which warehouse will be used:

# Get the default warehouse that would be auto-detected
databricks experimental aitools tools get-default-warehouse --profile my-workspace

Output

Returns:

Query results as JSON
Row count
Execution metadata

Examples

# Simple SELECT query
databricks experimental aitools tools query \
  "SELECT * FROM samples.nyctaxi.trips LIMIT 5" \
  --profile my-workspace

# Aggregation query
databricks experimental aitools tools query \
  "SELECT vendor_id, COUNT(*) as trip_count FROM samples.nyctaxi.trips GROUP BY vendor_id" \
  --profile my-workspace

# With JSON output
databricks experimental aitools tools query \
  "SELECT * FROM catalog.schema.table WHERE date > '2024-01-01'" \
  --output json \
  --profile my-workspace

# Using specific warehouse
DATABRICKS_WAREHOUSE_ID=abc123 databricks experimental aitools tools query \
  "SELECT * FROM samples.nyctaxi.trips LIMIT 10" \
  --profile my-workspace

Common Use Cases

Exploratory data analysis

# Check table size
databricks experimental aitools tools query \
  "SELECT COUNT(*) FROM catalog.schema.table" \
  --profile my-workspace

# View sample data
databricks experimental aitools tools query \
  "SELECT * FROM catalog.schema.table LIMIT 10" \
  --profile my-workspace

# Get column statistics
databricks experimental aitools tools query \
  "SELECT MIN(column), MAX(column), AVG(column) FROM catalog.schema.table" \
  --profile my-workspace

Data validation

# Check for null values
databricks experimental aitools tools query \
  "SELECT COUNT(*) FROM catalog.schema.table WHERE column IS NULL" \
  --profile my-workspace

# Verify data freshness
databricks experimental aitools tools query \
  "SELECT MAX(timestamp_column) FROM catalog.schema.table" \
  --profile my-workspace

Quick analytics

# Group by analysis
databricks experimental aitools tools query \
  "SELECT category, COUNT(*), AVG(value) FROM catalog.schema.table GROUP BY category" \
  --profile my-workspace

Workflow: Complete Data Exploration

Here's a typical workflow combining both commands:

# 1. Discover the schema first
databricks experimental aitools tools discover-schema \
  samples.nyctaxi.trips \
  --profile my-workspace

# 2. Based on discovered columns, run targeted queries
databricks experimental aitools tools query \
  "SELECT vendor_id, payment_type, COUNT(*) as trips, AVG(fare_amount) as avg_fare
   FROM samples.nyctaxi.trips
   GROUP BY vendor_id, payment_type
   ORDER BY trips DESC
   LIMIT 10" \
  --profile my-workspace

# 3. Investigate specific patterns found in the data
databricks experimental aitools tools query \
  "SELECT * FROM samples.nyctaxi.trips
   WHERE fare_amount > 100
   LIMIT 20" \
  --profile my-workspace

Claude Code-Specific Tips

Remember that each Bash command in Claude Code runs in a separate shell:

# ✅ RECOMMENDED: Use --profile flag
databricks experimental aitools tools discover-schema samples.nyctaxi.trips --profile my-workspace

# ✅ ALTERNATIVE: Chain with &&
export DATABRICKS_CONFIG_PROFILE=my-workspace && \
  databricks experimental aitools tools query "SELECT * FROM samples.nyctaxi.trips LIMIT 5"

# ❌ DOES NOT WORK: Separate export
export DATABRICKS_CONFIG_PROFILE=my-workspace
databricks experimental aitools tools query "SELECT * FROM samples.nyctaxi.trips LIMIT 5"

Flags

Both commands support:

Flag	Description	Default
`--profile`	Profile name from ~/.databrickscfg	Default profile
`--output`	Output format: `text` or `json`	`text`
`--debug`	Enable debug logging	`false`
`--target`	Bundle target to use (if applicable)	-

Troubleshooting

Table Not Found

Symptom: Error: TABLE_OR_VIEW_NOT_FOUND

Solution:

Verify table name format: CATALOG.SCHEMA.TABLE
Check if you have read permissions on the table

List available tables:

databricks tables list <catalog> <schema> --profile my-workspace

Warehouse Not Available

Symptom: Error: No available SQL warehouse found

Solution:

Check for default warehouse:

databricks experimental aitools tools get-default-warehouse --profile my-workspace

List available warehouses:

databricks warehouses list --profile my-workspace

Set specific warehouse:

DATABRICKS_WAREHOUSE_ID=<warehouse-id> databricks experimental aitools tools query "SELECT 1" --profile my-workspace

Start a stopped warehouse:

databricks warehouses start --id <warehouse-id> --profile my-workspace

Permission Denied

Symptom: Error: PERMISSION_DENIED

Solution:

Check Unity Catalog grants on the table:

databricks grants get --full-name catalog.schema.table --principal <user-email> --profile my-workspace

Request SELECT permission from your workspace administrator
Verify you have warehouse access (USAGE permission)

SQL Syntax Error

Symptom: Error: PARSE_SYNTAX_ERROR

Solution:

Check SQL syntax - use standard SQL
Verify column names match schema (use discover-schema first)
Ensure proper quoting for string literals
Test query incrementally (start simple, add complexity)

Best Practices

Always discover schema first - Use discover-schema before writing complex queries to understand:
- Available columns and their types
- Data distributions and null patterns
- Sample data for context

Use LIMIT for exploration - When exploring large tables, always use LIMIT to avoid long-running queries:

databricks experimental aitools tools query "SELECT * FROM large_table LIMIT 100" --profile my-workspace

JSON output for parsing - Use --output json when you need to process results programmatically:

databricks experimental aitools tools query "SELECT * FROM table" --output json --profile my-workspace | jq '.results'

Check table existence - Before querying, verify the table exists:

databricks tables get --full-name catalog.schema.table --profile my-workspace

Profile usage - Always specify --profile in Claude Code to avoid authentication issues

Related Commands

Declarative Automation Bundles - Deploy SQL, pipeline, and app resources as code

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data Exploration

Finding Tables by Keyword

Overview

Prerequisites

Discover Schema

Command Syntax

What It Returns

Examples

Common Use Cases

Query

Command Syntax

Warehouse Selection

Output

Examples

Common Use Cases

Workflow: Complete Data Exploration

Claude Code-Specific Tips

Flags

Troubleshooting

Table Not Found

Warehouse Not Available

Permission Denied

SQL Syntax Error

Best Practices

Related Commands

FilesExpand file tree

data-exploration.md

Latest commit

History

data-exploration.md

File metadata and controls

Data Exploration

Finding Tables by Keyword

Overview

Prerequisites

Discover Schema

Command Syntax

What It Returns

Examples

Common Use Cases

Query

Command Syntax

Warehouse Selection

Output

Examples

Common Use Cases

Workflow: Complete Data Exploration

Claude Code-Specific Tips

Flags

Troubleshooting

Table Not Found

Warehouse Not Available

Permission Denied

SQL Syntax Error

Best Practices

Related Commands