
Provisioning Databricks Cluster with Claude Code CLI

This template provides a self-contained deployment of a Databricks cluster pre-configured with Claude Code CLI for AI-assisted development directly on the cluster.

What Gets Deployed

  • Unity Catalog Volume for init script storage
  • Databricks cluster with Claude Code CLI auto-installed on startup
  • MLflow experiment for tracing Claude Code sessions
  • Bash helper functions for easy usage

How to Use

  1. Copy terraform.tfvars.example to terraform.tfvars
  2. Update terraform.tfvars with your values:
    • databricks_resource_id: Your Azure Databricks workspace resource ID
    • cluster_name: Name for your cluster
    • catalog_name: Unity Catalog name to use
  3. (Optional) Customize cluster configuration in terraform.tfvars (node type, autoscaling, etc.)
  4. (Optional) Configure your remote backend
  5. Run terraform init to initialize Terraform and download the required providers
  6. Run terraform plan to review the resources that will be created
  7. Run terraform apply to create the resources
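For step 2, a minimal terraform.tfvars might look like the sketch below. All values are placeholders (the resource ID, names, and overrides must match your environment); variable names follow the Inputs table at the end of this document.

```hcl
# terraform.tfvars -- placeholder values, replace with your own
databricks_resource_id = "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Databricks/workspaces/<workspace-name>"
cluster_name           = "claude-dev-cluster"
catalog_name           = "main"

# Optional overrides (defaults are listed in the Inputs table)
cluster_mode            = "SINGLE_NODE"
num_workers             = 0
autotermination_minutes = 30
```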

Prerequisites

  • Databricks workspace with Unity Catalog enabled
  • Unity Catalog with an existing catalog and schema
  • Unity Catalog metastore must have a root storage credential configured (required for volumes)
  • Permission to create clusters
  • (For Azure) Authenticated via az login or environment variables
  • Databricks Runtime 14.3 LTS or higher recommended

Note: If you encounter an error about missing root storage credential, you need to configure the metastore's root storage credential first. See Databricks documentation for details.

Post-Deployment

After the cluster starts, you can connect via SSH to use Claude Code and other development tools.

1. Configure SSH Tunnel

Use the Databricks CLI to set up SSH access to your new cluster:

# Authenticate if needed
databricks auth login --host https://your-workspace-url.cloud.databricks.com

# Set up SSH config (replace 'claude-dev' with your preferred alias)
databricks ssh setup --name claude-dev
# Select your cluster from the list when prompted

This creates an entry in your ~/.ssh/config file.

2. Connect via VSCode or Cursor

  1. Install the Remote - SSH extension in VSCode or Cursor.
  2. Open the Command Palette (Cmd+Shift+P / Ctrl+Shift+P).
  3. Select Remote-SSH: Connect to Host.
  4. Choose claude-dev (or the alias you created).
  5. Select Linux as the platform.
  6. Once connected, open your persistent workspace folder: /Workspace/Users/<your-email>/.

⚠️ Important: Work Storage Location

DO NOT use Databricks Repos (/Repos/...) for active development work. Repos folders can be unreliable for persistent storage and may lose uncommitted changes during cluster restarts or sync operations.

Use /Workspace/Users/<your-email>/ instead. This location provides reliable persistent storage. You can use regular git commands to manage version control (see "Using Git in /Workspace" section below).

3. Launch Claude Code

Open the terminal in your remote VSCode/Cursor session and run:

# 1. Load environment variables and helpers
source ~/.bashrc

# 2. Enable MLflow tracing (optional but recommended)
claude-tracing-enable

# 3. Start Claude Code
claude

First-time setup tips:

  • Claude will ask for file permissions; use Shift+Tab to auto-allow edits in the current directory.
  • If you need to refresh credentials, run claude-refresh-token.

4. Remote Web App Development (Port Forwarding)

VSCode and Cursor automatically forward ports. For example, to run a Streamlit app:

  1. Create app.py:
    import streamlit as st
    st.title("Databricks Remote App")
    st.write("Running on cluster!")
  2. Run it:
    streamlit run app.py --server.port 8501
  3. Click "Open in Browser" in the popup notification to view it at localhost:8501.
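If the automatic port-forwarding notification doesn't appear, you can forward the port manually from your local machine over the SSH alias created earlier (claude-dev and port 8501 here match the examples above):

```shell
# Forward local port 8501 to port 8501 on the cluster
ssh -L 8501:localhost:8501 claude-dev
# Then browse to http://localhost:8501
```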

5. Using the Databricks Python Interpreter

You don't need to configure a virtual environment. Databricks manages it for you.

  1. In the remote terminal, find the Python interpreter path:
    echo $DATABRICKS_VIRTUAL_ENV
    # Output example: /local_disk0/.ephemeral_nfs/envs/pythonEnv-xxxx/bin/python
  2. In VSCode/Cursor, open the Command Palette and select Python: Select Interpreter.
  3. Paste the path from above.
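To confirm the interpreter is active, run a quick sanity check from the integrated terminal or a scratch file (this is generic Python, not Databricks-specific); sys.executable should match the path printed by echo $DATABRICKS_VIRTUAL_ENV:

```python
import sys

# Path of the interpreter currently running this code; on the cluster it
# should match the path reported by $DATABRICKS_VIRTUAL_ENV
print(sys.executable)
```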

6. Persistent Sessions with tmux

To keep your agent running even if you disconnect:

# Start a new session
tmux new -s claude-session

# Detach (Ctrl+B, then D)
# Reattach later
tmux attach -t claude-session

This allows you to leave long-running tasks (like "Build a data pipeline") executing on the cluster while you are offline.
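You can also start the session already detached so the task begins immediately in the background (standard tmux flags; claude here stands in for any long-running command):

```shell
# Start detached (-d) and run a command inside the new session
tmux new-session -d -s claude-session 'claude'

# List running sessions, then reattach when you return
tmux ls
tmux attach -t claude-session
```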

7. Using Git in /Workspace

Since /Workspace doesn't have native Repos integration, use standard git commands:

# Navigate to your workspace directory
cd /Workspace/Users/<your-email>/

# Option 1: Clone an existing repository
git clone https://github.com/your-org/your-repo.git
cd your-repo

# Option 2: Initialize a new repository
mkdir my-project && cd my-project
git init
git remote add origin https://github.com/your-org/your-repo.git

# Configure git (first time only)
git config user.name "Your Name"
git config user.email "your.email@company.com"

# Regular git workflow
git add .
git commit -m "Your commit message"
git push origin main

Git Authentication Options:

  1. Personal Access Token (PAT) - Recommended:

    # GitHub: Create at https://github.com/settings/tokens
    # Use token as password when prompted
    git clone https://github.com/your-org/repo.git
  2. SSH Keys:

    # Generate SSH key on the cluster
    ssh-keygen -t ed25519 -C "your.email@company.com"
    
    # Add to GitHub: Copy output and add at https://github.com/settings/keys
    cat ~/.ssh/id_ed25519.pub
    
    # Clone using SSH
    git clone git@github.com:your-org/repo.git
  3. Git Credential Manager:

    # Store credentials to avoid repeated prompts
    git config --global credential.helper store
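If you want to inspect repository state by hand (roughly what the git-workspace-check helper reports; the helper may do more), standard git commands work from any repo under /Workspace:

```shell
# Run from inside your repo under /Workspace/Users/<your-email>/
git status --porcelain        # any output means uncommitted changes
git log --oneline @{u}..HEAD  # local commits not yet pushed to upstream
```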

Helper Commands

Claude CLI Commands

| Command | Purpose |
| --- | --- |
| check-claude | Verify Claude CLI installation and configuration |
| claude-debug | Show detailed Claude configuration |
| claude-refresh-token | Regenerate Claude settings from environment |
| claude-token-status | Check token freshness and auto-refresh status |
| claude-tracing-enable | Enable MLflow tracing for Claude sessions |
| claude-tracing-status | Check tracing status |
| claude-tracing-disable | Disable tracing |

Git Workspace Commands

| Command | Purpose |
| --- | --- |
| git-workspace-init | Interactive setup for git in /Workspace (clone or init) |
| git-workspace-check | Verify location and check for uncommitted/unpushed changes |
| git-workspace-setup-auth | Configure git authentication (PAT, SSH, or credential helper) |

These helpers warn you if you are working in /Repos and help ensure your work is backed up in git.

VS Code/Cursor Remote Commands

| Command | Purpose |
| --- | --- |
| claude-vscode-setup | Show Remote SSH setup instructions |
| claude-vscode-env | Get Python interpreter path for IDE |
| claude-vscode-check | Verify Remote SSH configuration |
| claude-vscode-config | Generate settings.json snippet |

Offline Installation

For air-gapped or restricted network environments, use the separate offline module: adb-coding-assistants-cluster-offline. See the Offline Installation Guide for detailed instructions.

Configuration Examples

Single-Node Development Cluster

cluster_mode = "SINGLE_NODE"
num_workers  = 0
node_type_id = "Standard_D8pds_v6"

Autoscaling Production Cluster

cluster_mode = "STANDARD"
num_workers  = null  # Enable autoscaling
min_workers  = 2
max_workers  = 8
node_type_id = "Standard_D8pds_v6"

Authentication

This example uses Databricks unified authentication. Authentication can be provided via:

  1. Azure CLI (recommended for local development):

    az login
    terraform apply
  2. Environment Variables (recommended for CI/CD):

    export DATABRICKS_HOST="https://adb-xxx.azuredatabricks.net"
    export DATABRICKS_TOKEN="dapi..."
    terraform apply
  3. Configuration Profile:

    export DATABRICKS_CONFIG_PROFILE="my-profile"
    terraform apply

For more details on authentication, see the Databricks unified authentication documentation.

Troubleshooting

Init Script Fails

Check cluster event logs in the Databricks UI under Compute > Your Cluster > Event Log.

Common issues:

  • Network connectivity to download packages
  • Unity Catalog volume permissions
  • Insufficient cluster permissions

Claude Not Found After Login

# Reload bashrc
source ~/.bashrc

# Verify PATH
check-claude

Authentication Issues

# Check environment variables
check-claude

# Regenerate configuration
claude-refresh-token


Requirements

| Name | Version |
| --- | --- |
| terraform | >= 1.0 |
| azurerm | >= 4.31.0 |
| databricks | >= 1.81.1 |

Providers

| Name | Version |
| --- | --- |
| azurerm | 4.57.0 |

Modules

No modules.

Resources

| Name | Type |
| --- | --- |
| azurerm_client_config.current | data source |
| azurerm_databricks_workspace.this | data source |
| azurerm_resource_group.this | data source |

Inputs

| Name | Description | Type | Default | Required |
| --- | --- | --- | --- | --- |
| catalog_name | Unity Catalog name for the volume | string | n/a | yes |
| cluster_name | Name of the Databricks cluster | string | n/a | yes |
| databricks_resource_id | The Azure resource ID for the Databricks workspace. Format: /subscriptions/{subscription-id}/resourceGroups/{resource-group}/providers/Microsoft.Databricks/workspaces/{workspace-name} | string | n/a | yes |
| autotermination_minutes | Minutes of inactivity before cluster auto-terminates | number | 30 | no |
| cluster_mode | Cluster mode: STANDARD or SINGLE_NODE | string | "STANDARD" | no |
| init_script_source_path | Local path to the init script | string | null | no |
| max_workers | Maximum number of workers for autoscaling | number | 3 | no |
| min_workers | Minimum number of workers for autoscaling | number | 1 | no |
| mlflow_experiment_name | MLflow experiment name for Claude Code tracing | string | "/Workspace/Shared/claude-code-tracing" | no |
| node_type_id | Node type for the cluster. Default is Standard_D8pds_v6 (modern, premium SSD + local NVMe). If unavailable in your region, consider Standard_DS13_v2 as fallback. | string | "Standard_D8pds_v6" | no |
| num_workers | Number of worker nodes (null for autoscaling) | number | null | no |
| schema_name | Schema name for the volume | string | "default" | no |
| spark_version | Databricks Runtime version | string | "17.3.x-cpu-ml-scala2.13" | no |
| tags | Custom tags for the cluster | map(string) | {"Environment": "dev", "Purpose": "coding-assistants"} | no |
| volume_name | Volume name to store init scripts | string | "coding_assistants" | no |

Outputs

| Name | Description |
| --- | --- |
| cluster_id | The ID of the created cluster |
| cluster_name | Name of the created cluster |
| cluster_url | URL to access the cluster in Databricks UI |
| init_script_path | Path to the init script in the volume |
| mlflow_experiment_name | MLflow experiment name for tracing |
| setup_instructions | Instructions for using the cluster |
| volume_full_name | Full name of the volume |
| volume_path | Path to the volume containing init scripts |