
# Provisioning Databricks Managed File Events on AWS

This example uses the `aws-managed-file-events` module.

This template deploys the AWS infrastructure required for Databricks Managed File Events, enabling Auto Loader's file notification mode with automatically provisioned S3 event notifications and SQS queues.

## How to use

1. Reference this module using one of the supported Terraform module source types.
2. Add a `variables.tf` file with the same content as this example's `variables.tf`.
3. Add a `terraform.tfvars` file and provide a value for each defined variable.
4. Configure authentication to your Databricks workspace and AWS account.
5. Add an `output.tf` file.
6. (Optional) Configure your remote backend.
7. Run `terraform init` to initialize Terraform and download the required providers.
8. Run `terraform apply` to create the resources.
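As a sketch of step 3, a `terraform.tfvars` file for this example might look like the following. Every value below is a placeholder (hypothetical prefix, region, and account IDs), not a real default:

```hcl
# terraform.tfvars -- example values only; replace with your own.
prefix                = "demo"         # used in generated resource names
region                = "us-east-1"    # AWS region to deploy to
aws_account_id        = "123456789012" # placeholder AWS account ID
databricks_account_id = "00000000-0000-0000-0000-000000000000" # placeholder

tags = {
  Environment = "dev"
  ManagedBy   = "terraform"
}
```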

## Complete Example with All Options

The following shows all available module options:

```hcl
module "managed_file_events" {
  source = "../../modules/aws-managed-file-events"

  # Required variables
  prefix                = var.prefix
  region                = var.region
  aws_account_id        = var.aws_account_id
  databricks_account_id = var.databricks_account_id

  # S3 Configuration
  create_bucket        = true                    # Set to false to use existing bucket
  existing_bucket_name = null                    # Required if create_bucket = false
  bucket_name          = "my-custom-bucket-name" # Custom bucket name (default: prefix-file-events)
  s3_path_prefix       = "data/incoming"         # Path prefix within the bucket
  force_destroy_bucket = false                   # Allow bucket deletion with objects

  # External Location Configuration
  external_location_name  = "my-external-location"  # Custom name (default: prefix-file-events-location)
  storage_credential_name = "my-storage-credential" # Custom name (default: prefix-file-events-credential)

  # Catalog Configuration (Optional)
  create_catalog         = true
  catalog_name           = "my_catalog"
  catalog_owner          = "data-engineers@company.com"
  catalog_isolation_mode = "OPEN"  # OPEN or ISOLATED

  # Grants Configuration
  external_location_grants = [
    {
      principal  = "data-engineers@company.com"
      privileges = ["READ_FILES", "WRITE_FILES"]
    }
  ]

  storage_credential_grants = [
    {
      principal  = "data-engineers@company.com"
      privileges = ["CREATE_EXTERNAL_LOCATION"]
    }
  ]

  catalog_grants = [
    {
      principal  = "data-engineers@company.com"
      privileges = ["USE_CATALOG", "CREATE_SCHEMA"]
    },
    {
      principal  = "analysts@company.com"
      privileges = ["USE_CATALOG"]
    }
  ]

  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
    Project     = "data-platform"
  }
}
```

## Using with Auto Loader

Once deployed, you can use Auto Loader with managed file events in your Databricks notebooks:

```python
df = spark.readStream.format("cloudFiles") \
    .option("cloudFiles.format", "json") \
    .option("cloudFiles.useManagedFileEvents", "true") \
    .load("s3://your-bucket/path")
```

Or in Lakeflow Declarative Pipelines:

```python
from pyspark import pipelines as dp

@dp.table
def my_table():
    # Ingesting from a volume that points to your S3 bucket is more
    # performant than reading the S3 location directly.
    return spark.readStream.format("cloudFiles") \
        .option("cloudFiles.format", "json") \
        .option("cloudFiles.useManagedFileEvents", "true") \
        .load("/Volumes")
```

## Reference

### Requirements

| Name | Version |
|------|---------|
| aws | >= 5.0 |
| databricks | >= 1.65.0 |

### Providers

No providers.

### Modules

| Name | Source | Version |
|------|--------|---------|
| managed_file_events | ../../modules/aws-managed-file-events | n/a |

### Resources

No resources.

### Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| aws_account_id | (Required) AWS Account ID | string | n/a | yes |
| databricks_account_id | (Required) Databricks Account ID | string | n/a | yes |
| databricks_client_id | (Required) Databricks service principal client ID | string | n/a | yes |
| databricks_client_secret | (Required) Databricks service principal client secret | string | n/a | yes |
| databricks_host | (Required) Databricks workspace URL (e.g., https://xxx.cloud.databricks.com) | string | n/a | yes |
| databricks_pat_token | (Required) Databricks personal access token | string | n/a | yes |
| prefix | (Required) Prefix for resource naming | string | n/a | yes |
| region | (Required) AWS region to deploy to | string | n/a | yes |
| aws_profile | (Optional) AWS CLI profile name for authentication | string | null | no |
| tags | (Optional) Tags to add to created resources | map(string) | {} | no |

### Outputs

| Name | Description |
|------|-------------|
| bucket_name | Name of the S3 bucket |
| external_location_name | Name of the external location |
| external_location_url | S3 URL of the external location |
| iam_role_arn | ARN of the IAM role |
| storage_credential_name | Name of the storage credential |