This example uses the `aws-managed-file-events` module.
This template deploys the AWS infrastructure for Databricks Managed File Events, enabling file notification mode for Auto Loader with automatically provisioned S3 event notifications and SQS queues.
- Reference this module using one of the supported module source types
- Add a `variables.tf` with the same content as the module's `variables.tf`
- Add a `terraform.tfvars` file and provide a value for each defined variable
- Configure authentication to your Databricks workspace and AWS account
- Add an `output.tf` file
- (Optional) Configure your remote backend
- Run `terraform init` to initialize Terraform and download the providers
- Run `terraform apply` to create the resources
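A minimal `terraform.tfvars` for the required variables might look like the following sketch. All values here are placeholders, not real account IDs or hosts; substitute your own.

```hcl
# terraform.tfvars -- placeholder values, replace with your own
prefix                   = "demo"
region                   = "us-east-1"
aws_account_id           = "111111111111"
databricks_account_id    = "00000000-0000-0000-0000-000000000000"
databricks_host          = "https://xxx.cloud.databricks.com"
databricks_client_id     = "00000000-0000-0000-0000-000000000000"
databricks_client_secret = "change-me"
databricks_pat_token     = "change-me"
```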
The following shows all available module options:
```hcl
module "managed_file_events" {
  source = "../../modules/aws-managed-file-events"

  # Required variables
  prefix                = var.prefix
  region                = var.region
  aws_account_id        = var.aws_account_id
  databricks_account_id = var.databricks_account_id

  # S3 configuration
  create_bucket        = true                    # Set to false to use an existing bucket
  existing_bucket_name = null                    # Required if create_bucket = false
  bucket_name          = "my-custom-bucket-name" # Custom bucket name (default: prefix-file-events)
  s3_path_prefix       = "data/incoming"         # Path prefix within the bucket
  force_destroy_bucket = false                   # Allow bucket deletion with objects

  # External location configuration
  external_location_name  = "my-external-location"  # Custom name (default: prefix-file-events-location)
  storage_credential_name = "my-storage-credential" # Custom name (default: prefix-file-events-credential)

  # Catalog configuration (optional)
  create_catalog         = true
  catalog_name           = "my_catalog"
  catalog_owner          = "data-engineers@company.com"
  catalog_isolation_mode = "OPEN" # OPEN or ISOLATED

  # Grants configuration
  external_location_grants = [
    {
      principal  = "data-engineers@company.com"
      privileges = ["READ_FILES", "WRITE_FILES"]
    }
  ]

  storage_credential_grants = [
    {
      principal  = "data-engineers@company.com"
      privileges = ["CREATE_EXTERNAL_LOCATION"]
    }
  ]

  catalog_grants = [
    {
      principal  = "data-engineers@company.com"
      privileges = ["USE_CATALOG", "CREATE_SCHEMA"]
    },
    {
      principal  = "analysts@company.com"
      privileges = ["USE_CATALOG"]
    }
  ]

  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
    Project     = "data-platform"
  }
}
```

Once deployed, you can use Auto Loader with managed file events in your Databricks notebooks:
```python
df = spark.readStream.format("cloudFiles") \
    .option("cloudFiles.format", "json") \
    .option("cloudFiles.useManagedFileEvents", "true") \
    .load("s3://your-bucket/path")
```

Or in Lakeflow Declarative Pipelines:
```python
from pyspark import pipelines as dp

@dp.table
def my_table():
    # Ingesting from a volume that points to your S3 bucket is more
    # performant than reading the S3 location directly.
    return spark.readStream.format("cloudFiles") \
        .option("cloudFiles.format", "json") \
        .option("cloudFiles.useManagedFileEvents", "true") \
        .load("/Volumes")
```

## Requirements

| Name | Version |
|---|---|
| aws | >= 5.0 |
| databricks | >= 1.65.0 |
## Providers

No providers.
## Modules

| Name | Source | Version |
|---|---|---|
| managed_file_events | ../../modules/aws-managed-file-events | n/a |
## Resources

No resources.
## Inputs

| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| aws_account_id | (Required) AWS Account ID | string | n/a | yes |
| databricks_account_id | (Required) Databricks Account ID | string | n/a | yes |
| databricks_client_id | (Required) Databricks service principal client ID | string | n/a | yes |
| databricks_client_secret | (Required) Databricks service principal client secret | string | n/a | yes |
| databricks_host | (Required) Databricks workspace URL (e.g., https://xxx.cloud.databricks.com) | string | n/a | yes |
| databricks_pat_token | (Required) Databricks personal access token | string | n/a | yes |
| prefix | (Required) Prefix for resource naming | string | n/a | yes |
| region | (Required) AWS region to deploy to | string | n/a | yes |
| aws_profile | (Optional) AWS CLI profile name for authentication | string | null | no |
| tags | (Optional) Tags to add to created resources | map(string) | {} | no |
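The Databricks inputs above are typically wired into provider blocks alongside the AWS provider. The following is a sketch of one plausible arrangement, not necessarily this example's actual provider configuration; the aliases and the split between account-level and workspace-level providers are assumptions.

```hcl
# Sketch only: provider wiring for the variables above may differ in this example.
provider "aws" {
  region  = var.region
  profile = var.aws_profile
}

# Account-level Databricks provider (e.g., for storage credentials)
provider "databricks" {
  alias         = "account"
  host          = "https://accounts.cloud.databricks.com"
  account_id    = var.databricks_account_id
  client_id     = var.databricks_client_id
  client_secret = var.databricks_client_secret
}

# Workspace-level Databricks provider (e.g., for external locations and grants)
provider "databricks" {
  alias = "workspace"
  host  = var.databricks_host
  token = var.databricks_pat_token
}
```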
## Outputs

| Name | Description |
|---|---|
| bucket_name | Name of the S3 bucket |
| external_location_name | Name of the external location |
| external_location_url | S3 URL of the external location |
| iam_role_arn | ARN of the IAM role |
| storage_credential_name | Name of the storage credential |
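The `output.tf` mentioned in the setup steps can simply re-export the module outputs listed in this table, for example:

```hcl
# output.tf -- re-export selected module outputs
output "bucket_name" {
  value = module.managed_file_events.bucket_name
}

output "external_location_url" {
  value = module.managed_file_events.external_location_url
}

output "iam_role_arn" {
  value = module.managed_file_events.iam_role_arn
}
```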