This Terraform module exports Azure cost-related data and forwards it to AWS S3. The supported data sets are described below:
- Cost Data: Daily parquet files containing standardized cost and usage details in FOCUS format. The daily schedule requires an end date, which defaults to 10 years from deployment and can be changed with the module variable cost_export_daily_schedule_to_years
- Azure Advisor Recommendations: Daily JSON files containing cost optimization recommendations from Azure Advisor
- Carbon Emissions Data: Monthly JSON reports with carbon footprint metrics across Scope 1 and Scope 3 emissions
Note
There is currently an issue with publishing Function App code on the Flex Consumption Plan using a managed identity. We have had to revert to using the storage account connection string for now. More details can be found here (behind a paywall, sadly).
This module creates a fully integrated solution for exporting multiple Azure datasets and forwarding them to AWS S3. The following diagram illustrates the data flow and component architecture for all three export types:
The module creates three distinct export pipelines, one for each of the data sets:
- Cost Data (FOCUS):
  - Daily Export: Cost Management exports daily FOCUS-format cost data (Parquet files) to Azure Storage
  - Event Trigger: Blob creation events trigger the CostExportProcessor function via a storage queue
  - Processing: Function processes and transforms the data (removes sensitive columns, restructures paths)
  - Upload: Processed data is uploaded to S3 in a partitioned structure: billing_period=YYYYMMDD/. Cost data for all billing accounts is written to the same folder, with each parquet object prefixed with the billing account name
- Azure Advisor Recommendations:
  - Daily Trigger: AdvisorRecommendationsExporter function runs daily at 2 AM (timer trigger)
  - API Call: Function calls Azure Advisor Recommendations API for all subscriptions in scope, filtering for cost category recommendations
  - Processing: Response data formatted as JSON with subscription tracking and date metadata
  - Upload: JSON data uploaded to S3 in partitioned structure: gds-recommendations-v1/billing_period=YYYYMMDD/
- Carbon Emissions:
  - Monthly Trigger: CarbonEmissionsExporter function runs every day to download the latest data as soon as it becomes available (around the 19th of each month)
  - API Call: Function calls Azure Carbon Optimization API against MonthlySummaryReport for the previous month's Scope 1 & 3 emissions
    - Batches the API call per 100 subscriptions and merges the resulting datasets into one (refer to "subscription batching" below)
  - Processing: Response data formatted as JSON with dynamic date range validation (12-month rolling window)
  - Upload: JSON data uploaded to S3 in partitioned structure: billing_period=YYYYMMDD/
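The partitioned S3 layouts described for the upload steps can be illustrated with a small sketch. The helper names, the underscore separator between billing account name and file name, and the choice of which date goes into the partition are assumptions made for illustration; the module's functions may build keys differently.

```python
from datetime import date

# Prefix taken from the Advisor upload layout described above.
ADVISOR_PREFIX = "gds-recommendations-v1"


def partition(billing_period: date) -> str:
    """Return the partition folder, e.g. 'billing_period=20250101/'."""
    return f"billing_period={billing_period:%Y%m%d}/"


def cost_key(billing_period: date, billing_account: str, filename: str) -> str:
    """Key for a cost parquet object (hypothetical helper).

    All billing accounts share one folder; the object name is prefixed with
    the billing account name. The '_' separator is an assumption.
    """
    return f"{partition(billing_period)}{billing_account}_{filename}"


def advisor_key(billing_period: date, filename: str) -> str:
    """Key for an Advisor recommendations JSON object (hypothetical helper)."""
    return f"{ADVISOR_PREFIX}/{partition(billing_period)}{filename}"
```

For example, `cost_key(date(2025, 1, 1), "acme", "part_0.parquet")` yields a key under `billing_period=20250101/` whose object name starts with `acme`.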
The Carbon Optimization API provides a rolling 12-month window of emissions data. The available date range is calculated dynamically based on Microsoft's data availability policy:
- Data Availability: Previous month's data becomes available by the 19th of the current month
- Rolling Window: API provides access to exactly 12 months of historical data
- Dynamic Calculation: Date ranges are recalculated on each function execution (no hard-coded dates)
- Automatic Adjustment: Functions automatically use the most recent available data within the API's current range
Example: On October 30, 2024 (day ≥19), the API would provide data for September 2024. The same function running on January 15, 2025 would provide data for November 2024.
A test endpoint is available at /api/carbon-date-range to view the current
calculated date range.
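The availability rule above can be sketched in a few lines of Python. This is a minimal illustration, not the module's implementation: the function names are hypothetical, and it assumes that the previous month counts as available from the 19th onwards and that the 12-month window ends at the latest available month.

```python
from datetime import date

AVAILABILITY_DAY = 19  # previous month's data is assumed available from this day
WINDOW_MONTHS = 12     # size of the rolling window provided by the API


def shift_month(year: int, month: int, delta: int) -> tuple[int, int]:
    """Move a (year, month) pair by delta months."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def latest_available_month(today: date) -> date:
    """First day of the most recent month with emissions data."""
    lag = 1 if today.day >= AVAILABILITY_DAY else 2
    year, month = shift_month(today.year, today.month, -lag)
    return date(year, month, 1)


def carbon_date_range(today: date) -> tuple[date, date]:
    """(oldest month, latest month) of the rolling window, both as first-of-month."""
    end = latest_available_month(today)
    year, month = shift_month(end.year, end.month, -(WINDOW_MONTHS - 1))
    return date(year, month, 1), end
```

With these assumptions, `latest_available_month(date(2024, 10, 30))` gives September 2024 and `latest_available_month(date(2025, 1, 15))` gives November 2024, matching the example above.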
The Carbon Optimization API accepts a maximum of 100 subscriptions per request. The functions handle larger subscription lists automatically by batching:
- Automatic Batching: Subscription lists >100 are automatically split into batches of 100 or fewer
- Result Merging: Responses from multiple batches are seamlessly merged into a single result
- Error Handling: Partial failures are handled gracefully - successful batches are preserved even if some fail
- Transparent Operation: Batching is completely transparent to users and maintains all existing functionality
- Enhanced Logging: Detailed logs show batch progress and any issues
Example: For 131 subscriptions (like GDS), the system automatically:
- Creates 2 batches: 100 + 31 subscriptions
- Makes 2 separate API calls
- Merges the results automatically
- Provides complete data as if from a single request
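The batching behaviour can be sketched as follows. The function names and the `fetch_batch` callable are hypothetical stand-ins for the real API call; the sketch only shows the split, merge and partial-failure pattern described above.

```python
from typing import Callable

MAX_SUBSCRIPTIONS_PER_REQUEST = 100  # limit stated for the Carbon Optimization API


def split_into_batches(subscription_ids: list[str],
                       size: int = MAX_SUBSCRIPTIONS_PER_REQUEST) -> list[list[str]]:
    """Split the list into consecutive batches of at most `size` items."""
    return [subscription_ids[i:i + size] for i in range(0, len(subscription_ids), size)]


def fetch_in_batches(subscription_ids: list[str],
                     fetch_batch: Callable[[list[str]], list[dict]]):
    """Call fetch_batch once per batch and merge the results.

    Successful batches are kept even if others fail; failed batches are
    returned alongside the error so the caller can log or retry them.
    """
    merged: list[dict] = []
    failed: list[tuple[list[str], Exception]] = []
    for batch in split_into_batches(subscription_ids):
        try:
            merged.extend(fetch_batch(batch))
        except Exception as exc:  # keep going: one bad batch should not lose the rest
            failed.append((batch, exc))
    return merged, failed
```

For 131 subscription IDs, `split_into_batches` returns two batches of 100 and 31 items, so `fetch_batch` is called twice.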
- Function Apps use Managed Identity to authenticate with Entra ID Application
- Entra ID Application uses OIDC federation to assume AWS IAM Role
- All data transfers secured with cross-cloud federation (no long-lived AWS credentials)
- Application Insights provides telemetry and monitoring for all pipelines
Endpoint: POST /api/cost-export-backfill
Can be called on-demand with a mandatory query parameter start_date in the
format YYYY-MM-DD.
The cost export backfill uses two separate lock files: one for the schedule (which creates the backfill Cost Management export tasks, one per month) and one for the run (which executes those exports in batches of six, i.e. half a year at a time). Each lock object is created only after the schedule has been created successfully or, for the run, once a full run across all tasks has completed successfully.
To run the full backfill, call this endpoint repeatedly. If a task is already running, it is not interrupted, but it counts as one of the batch of six. Each task takes around 15 minutes, and the tasks in a batch run concurrently.
The schedule is created for every month from the given backfill start date up to and including last month.
To remove a lock object, contact Appvia support.
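As a rough illustration of the schedule and batch sizes described above, the following sketch lists the months a backfill would cover and groups them into batches of six. The function names are hypothetical, and how the real function picks the next batch is not shown in this document.

```python
from datetime import date

BATCH_SIZE = 6  # the backfill runs six monthly export tasks per invocation


def monthly_schedule(start: date, today: date) -> list[date]:
    """First-of-month dates from the start month up to and including last month."""
    months = []
    year, month = start.year, start.month
    while (year, month) < (today.year, today.month):
        months.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def batches_of_six(months: list[date], size: int = BATCH_SIZE) -> list[list[date]]:
    """Group the schedule into consecutive batches of at most `size` months."""
    return [months[i:i + size] for i in range(0, len(months), size)]
```

A start date of 2024-01-01 evaluated on 2025-03-15 gives 14 months, which under these assumptions needs three invocations (6 + 6 + 2 tasks).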
Query Parameters:
- start_date - the backfill start date in format YYYY-MM-DD (e.g. 2025-01-01); there is no default, so it must be given
- force_overwrite=true - Overwrite existing data files (default: false); also sets skip_existing to false
- skip_existing=false - Process all months regardless of existing data (default: true)
Examples:
- POST /api/cost-export-backfill - Skip months that already have data (idempotent)
- POST /api/cost-export-backfill?force_overwrite=true - Overwrite all existing data
- POST /api/cost-export-backfill?skip_existing=false - Process all months, without skipping months where a cost export already exists
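A caller could assemble the request with the Python standard library as sketched below. The base URL is a placeholder and the helper name is hypothetical; authentication (for example a function key or network access to the private endpoint) is not covered by this document and is deliberately left out.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_cost_backfill_request(base_url: str, start_date: str,
                                force_overwrite: bool = False,
                                skip_existing: bool = True) -> Request:
    """Build (but do not send) a POST request for the cost export backfill."""
    params = {"start_date": start_date}  # mandatory, format YYYY-MM-DD
    if force_overwrite:
        params["force_overwrite"] = "true"
    if not skip_existing:
        params["skip_existing"] = "false"
    url = f"{base_url.rstrip('/')}/api/cost-export-backfill?{urlencode(params)}"
    return Request(url, method="POST")


# Placeholder host; replace with the function app's real hostname.
request = build_cost_backfill_request("https://example.invalid", "2025-01-01")
print(request.get_method(), request.full_url)
```

The request object can then be passed to `urllib.request.urlopen` from a machine that can reach the function app.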
Endpoint: POST /api/carbon-backfill
Can be called on-demand with a mandatory query parameter start_date in the format YYYY-MM-DD. It calls the same API as the monthly trigger, but for each month from the given start date.
Uses a "carbon export" lock object on the target S3 bucket as a semaphore: if the lock object exists, the carbon data backfill is skipped. The lock object is created only once a full carbon export backfill has completed successfully.
The Carbon Optimization API only provides up to 12 months of historical data; for any month that precedes that 12-month window, the backfill writes an empty file. The backfill runs from the start date up until the month prior to the current carbon export (note the 19th of the month, see above).
To remove the lock object, contact Appvia support.
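The interaction between the backfill range and the 12-month window can be sketched as a simple plan of months. This is one reading of the rules above, with hypothetical names: it assumes the latest available month follows the 19th-of-month rule, that the backfill stops one month before it, and that months older than the window produce an empty export.

```python
from datetime import date


def month_index(d: date) -> int:
    """Number of months since year 0, used for simple month arithmetic."""
    return d.year * 12 + d.month - 1


def from_index(index: int) -> date:
    return date(index // 12, index % 12 + 1, 1)


def carbon_backfill_plan(start: date, today: date,
                         availability_day: int = 19,
                         window_months: int = 12) -> list[tuple[date, bool]]:
    """Return (month, has_data) pairs for the backfill.

    has_data is False for months older than the API's rolling window,
    for which an empty file would be written.
    """
    lag = 1 if today.day >= availability_day else 2
    latest = month_index(today) - lag        # month covered by the current export
    oldest = latest - (window_months - 1)    # oldest month the API still serves
    return [(from_index(i), i >= oldest)
            for i in range(month_index(start), latest)]  # stops before `latest`
```

For a start date of 2023-06-01 evaluated on 2024-10-30, the plan covers June 2023 to August 2024, and the first four months fall outside the window.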
Query Parameters:
- start_date - the backfill start date in format YYYY-MM-DD (e.g. 2025-01-01); there is no default, so it must be given
- force_overwrite=true - Overwrite existing data files (default: false); also sets skip_existing to false
- skip_existing=false - Process all months regardless of existing data (default: true)
- write_empty_object - If no data exists for a given month, write an empty export (default: true)
Examples:
- POST /api/carbon-backfill - Skip months that already have data (idempotent)
- POST /api/carbon-backfill?force_overwrite=true - Overwrite all existing data
- POST /api/carbon-backfill?skip_existing=false - Process all months, without skipping months where a carbon export already exists
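How the query parameters combine can be sketched as below. The function names and the set of accepted truthy strings are assumptions; only the defaults and the rule that force_overwrite turns off skip_existing come from the parameter descriptions above.

```python
from datetime import date
from typing import Mapping, Optional


def parse_bool(value: Optional[str], default: bool) -> bool:
    """Interpret a query-string flag; the accepted spellings are an assumption."""
    if value is None:
        return default
    return value.strip().lower() in ("true", "1", "yes")


def resolve_backfill_options(query: Mapping[str, str]) -> dict:
    """Turn raw query parameters into effective backfill options."""
    if not query.get("start_date"):
        raise ValueError("start_date is required (format YYYY-MM-DD)")
    start = date.fromisoformat(query["start_date"])  # raises ValueError if malformed

    force_overwrite = parse_bool(query.get("force_overwrite"), default=False)
    skip_existing = parse_bool(query.get("skip_existing"), default=True)
    if force_overwrite:
        skip_existing = False  # overwriting implies existing months are not skipped

    return {
        "start_date": start,
        "force_overwrite": force_overwrite,
        "skip_existing": skip_existing,
        "write_empty_object": parse_bool(query.get("write_empty_object"), default=True),
    }
```

Calling it with only `start_date` returns the documented defaults, while adding `force_overwrite=true` also flips `skip_existing` to false.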
We don't provide a backfill for the Azure Advisor Recommendations dataset.
The backfill for cost exports and carbon exports runs automatically every weekday at 6 AM GMT: first costs, then carbon.
The Appvia analytics team can delete the associated lock file for each tenant to force the backfill to re-run. Because the cost export backfill only runs batches of six, it takes multiple days to export a full backfill schedule.
The backfill start date module variable (backfill_start_date) must be explicitly set.
- Private Networking: All components use private endpoints and VNet integration
- Zero Trust: No public network access (except during deployment if deploy_from_external_network=true)
- Managed Identity: Azure resources authenticate using system-assigned managed identities
- Cross-Cloud Federation: OIDC federation eliminates need for long-lived AWS credentials
- Hash-Pinned Dependencies: Python packages in requirements.txt are pinned to exact versions with SHA256 hashes, ensuring artifact integrity and protecting against supply-chain attacks
Python dependencies are managed using a two-file approach:
| File | Purpose | Edit manually? |
|---|---|---|
| src/cost_export/requirements.in | Direct dependencies only (7 packages) | Yes, this is the source of truth |
| src/cost_export/requirements.txt | Fully resolved lockfile with all transitive deps, each pinned with SHA256 hashes | No, always machine-generated |
requirements.txt is committed to the repository and is what Azure's Oryx build system installs using --require-hashes. It must contain every package in the dependency tree (direct and transitive) pinned with == and hashed. Do not edit it by hand.
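The "pinned with == and hashed" rule can be checked mechanically. The sketch below is a hypothetical helper, not part of the repository: it assumes the usual lockfile layout where a requirement is followed by backslash-continued --hash=sha256: lines, and it reports any package that is not pinned or has no hash.

```python
import re

# A requirement such as "name==1.2.3" or "name[extra]==1.2.3".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^\s;]+")


def logical_lines(text: str):
    """Yield requirement entries with backslash continuations joined."""
    buffer = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments such as "# via ..."
        if line.endswith("\\"):
            buffer += line[:-1].strip() + " "
            continue
        yield (buffer + line).strip()
        buffer = ""
    if buffer.strip():
        yield buffer.strip()


def unpinned_or_unhashed(text: str) -> list[str]:
    """Return names of packages that are not pinned with == or lack a sha256 hash."""
    problems = []
    for entry in logical_lines(text):
        if entry.startswith("-"):
            continue  # global options, e.g. an index URL
        name = re.split(r"[=<>!~ \[;]", entry, maxsplit=1)[0]
        if not PINNED.match(entry) or "--hash=sha256:" not in entry:
            problems.append(name)
    return problems
```

Running `unpinned_or_unhashed(open("src/cost_export/requirements.txt").read())` should return an empty list for a correctly generated lockfile.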
- Edit src/cost_export/requirements.in: add, remove, or change the version of the direct dependency. Versions are pinned with ==.

  Note on boto3/s3fs compatibility: boto3 is capped at <1.43 because s3fs pulls in aiobotocore, which requires botocore<1.43.1. boto3>=1.43 requires botocore>=1.43.15, making the two incompatible. If you bump either package, re-check this constraint.
- Regenerate the lockfile:

      make python-lock

  This resolves the full dependency tree for Linux / Python 3.13 (matching the Function App runtime) and overwrites requirements.txt with all packages pinned and hashed. uv is pre-installed in the dev container and fetches a Python 3.13 interpreter automatically, so no local Python 3.13 is required.
- Commit both files:

      git add src/cost_export/requirements.in src/cost_export/requirements.txt
      git commit -m "chore: update python dependencies"
- An existing virtual network with two subnets, one of which has a delegation for Microsoft.App/environments (function_app_subnet_id)
- Role assignments:
  - Azure RBAC:
    - Reader and Data Access, User Access Administrator and Contributor at the subscription scope (where you will be provisioning resources)
    - User Access Administrator at the Tenant Root Group management group scope*
  - Billing:
    - Enterprise Agreement (EA): EnrollmentReader at the billing account scope (see Assign Enterprise Agreement roles to service principals)
    - Microsoft Customer Agreement (MCA): Billing account contributor at the billing account scope
Tip
* Role assignment privileges can be constrained to Carbon Optimization Reader, Management Group Reader and Reader
provider "azurerm" {
  # These need to be explicitly registered
  resource_providers_to_register = ["Microsoft.CostManagementExports", "Microsoft.App"]

  # Required: the cost_export storage account disables shared access keys, so the provider
  # must use Entra ID for storage data-plane operations. Without this, apply fails with
  # KeyBasedAuthenticationNotPermitted (403).
  storage_use_azuread = true

  features {}
}

module "example" {
  source = "git::https://github.com/co-cddo/terraform-azure-focus?ref=1833bb30497da1b2faac808c0a4ba3adde71494e" # v0.0.2

  aws_account_id                      = "<aws-account-id>"
  billing_account_ids                 = ["<billing-account-id>"] # List of billing account IDs (applicable to FOCUS cost data only)
  subnet_id                           = "/subscriptions/<subscription-id>/resourceGroups/existing-infra/providers/Microsoft.Network/virtualNetworks/existing-vnet/subnets/default"
  function_app_subnet_id              = "/subscriptions/<subscription-id>/resourceGroups/existing-infra/providers/Microsoft.Network/virtualNetworks/existing-vnet/subnets/functionapp"
  virtual_network_name                = "existing-vnet"
  virtual_network_resource_group_name = "existing-infra"
  resource_group_name                 = "rg-cost-export"

  # Setting to false or omitting this argument assumes that you have
  # private GitHub runners configured in the existing virtual network.
  # It is not recommended to set this to true in production
  deploy_from_external_network = false

  # Uncomment when running in CI/CD with a service principal
  # (e.g., GitHub Actions)
  # current_principal_type = "ServicePrincipal"
}
Tip
If you don't have a suitable existing Virtual Network with two subnets (one of which has a delegation to Microsoft.App/environments), please refer to the example configuration here, which provisions the prerequisite baseline infrastructure before consuming the module.
Important
Use the dev container. It is the recommended way to work on this
repository. All required tooling (Terraform, uv, az, make, pre-commit
hooks, etc.) is pre-installed at pinned versions. You do not need to install
anything locally beyond Docker.
- Install Docker Desktop
- Open the repo in VS Code
- Install the Dev Containers extension if you don't already have it
- Select Reopen in Container (VS Code will prompt you automatically, or use the command palette: Dev Containers: Reopen in Container)
The container will build on first use and subsequent opens will be fast.
- Terraform & terraform-docs
- Azure CLI (az)
- uv (Python package manager)
- make
- Pre-commit hooks
- All VS Code extensions needed for this repo
- Docker Desktop
See examples/existing-infrastructure for a working example.
cd examples/existing-infrastructure
az login
terraform init
terraform plan
This module includes comprehensive tests for the carbon export functionality, including dynamic date range calculations, idempotency features, and subscription batching logic.
Use the Makefile targets for easy test execution:
# Run all Python tests
make tests-python
# Quick validation (syntax check + unit tests)
make python-test-quick
# Run individual test suites
cd src/cost_export
python3 test_carbon_date_range.py
python3 test_carbon_idempotency.py
python3 test_carbon_batching.py
python3 test_carbon_batching_unit.py
The test suite covers:
- Dynamic Date Range Calculation: Validates that carbon API date ranges are calculated correctly based on Microsoft's data availability rules
- Idempotency: Ensures carbon export functions can be safely re-run without duplicate processing
- Subscription Batching: Tests the automatic batching logic that handles large subscription lists (>100) for the Carbon API
- Error Handling: Validates graceful handling of API limits and failures
- Syntax Validation: Ensures all Python code compiles correctly
Terraform module documentation is maintained by a terraform-docs
pre-commit hook.
| Name | Version |
|---|---|
| archive | >= 2.0 |
| azapi | >= 1.7.0 |
| azuread | > 2.0 |
| azurerm | > 4.0 |
| null | >= 3.0 |
| random | >= 3.0 |
| time | >= 0.7.0 |
| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| aws_account_id | AWS account ID to use for the S3 bucket | string | n/a | yes |
| billing_account_ids | List of billing account IDs to create FOCUS cost exports for. Use the billing account ID format from Azure portal (e.g., 'bdfa614c-3bed-5e6d-313b-b4bfa3cefe1d:16e4ddda-0100-468b-a32c-abbfc29019d8_2019-05-31') | list(string) | n/a | yes |
| function_app_subnet_id | ID of the subnet to connect the function app to. This subnet must have delegation configured for Microsoft.App/environments and must be in the same virtual network as the private endpoints | string | n/a | yes |
| resource_group_name | Name of the new resource group | string | n/a | yes |
| subnet_id | ID of the subnet to deploy the private endpoints to. Must be a subnet in the existing virtual network | string | n/a | yes |
| virtual_network_name | Name of the existing virtual network | string | n/a | yes |
| virtual_network_resource_group_name | Name of the existing resource group where the virtual network is located | string | n/a | yes |
| aws_region | AWS region for the S3 bucket | string | "eu-west-2" | no |
| aws_s3_bucket_name | Name of the AWS S3 bucket to store cost data | string | "uk-gov-gds-cost-inbound-azure" | no |
| backfill_start_date | The year and month to start backfill - in the format 'YYYY-MM-01'; defaults to 2022-01-01 | string | "2022-01-01" | no |
| cost_export_daily_schedule_to_years | The number of years from initial deployment to set the end date of the daily schedule for cost export | number | 15 | no |
| cost_mgmt_suffix | [optional] suffix to add to cost mgmt export tasks - to allow multiple deployments of this module in one tenant | string | "" | no |
| current_principal_type | Type of the current principal running Terraform. Set to 'ServicePrincipal' when running in CI/CD with a service principal, 'User' for interactive usage. | string | "User" | no |
| deploy_from_external_network | If you don't have existing GitHub runners in the same virtual network, set this to true. This will enable 'public' access to the function app during deployment. This is added for convenience and is not recommended in production environments | bool | false | no |
| focus_dataset_version | Version of the cost and usage details (FOCUS) dataset to use | string | "1.0r2" | no |
| is_enterprise_customer | Set to true if you are an Enterprise Agreement customer | bool | false | no |
| location | The Azure region where resources will be created | string | "uksouth" | no |
| log_analytics_workspace_id | Resource ID of an existing Log Analytics workspace to use for diagnostic settings. If not provided, a new workspace will be created. | string | null | no |
| logging_level | Logging level for the app; can be DEBUG or INFO (default) | string | "INFO" | no |
| tags | Tags to apply to all resources | map(string) | {} | no |
| Name | Description |
|---|---|
| aws_app_client_id | The aws app client id |
| billing_account_ids | Billing account IDs configured for cost reporting |
| billing_accounts_map | Map of billing account indices to IDs and scopes |
| carbon_container_name | The storage container name for carbon data (not used - carbon data goes directly to S3) |
| carbon_export_name | The name of the carbon optimization export (timer-triggered function) |
| cost_export_app_principal_id | The principal id of the cost export app - use this to assign Enrollment Reader role |
| cost_export_storage_account_id | The resource id of the cost export storage account |
| cost_export_storage_account_name | The name of the cost export storage account |
| current_principal_type | Principal type of the current Azure client (ServicePrincipal or User) |
| deployment_storage_account_id | The resource id of the deployment storage account |
| deployment_storage_account_name | The name of the deployment storage account |
| deployment_storage_private_endpoint_ip | The private IP address of the deployment storage blob private endpoint |
| ea_billing_role_definition_ids | The set of roleDefinitionId - use each of these as input to the Enrollment Reader JSON body - must match the billing id in the URL |
| event_grid_subscription_name | The name of the Event Grid subscription for blob created events |
| event_grid_system_topic_name | The name of the Event Grid system topic for storage events |
| focus_container_name | The storage container name for FOCUS cost data |
| function_app_id | The resource id of the cost export function app |
| function_app_name | The name of the cost export function app |
| function_app_private_endpoint_ip | The private IP address of the function app private endpoint |
| log_analytics_workspace_id | The resource ID of the Log Analytics workspace used for diagnostic settings |
| publish_code_command | Publish code command for debugging |
| random_string_suffix | The random suffix appended to generated resource names |
| recommendations_export_name | The name of the Azure Advisor recommendations export (timer-triggered function) |
| report_scopes | Report scopes created for each billing account |
| storage_private_endpoint_ip | The private IP address of the cost export storage blob private endpoint |
| storage_queue_private_endpoint_ip | The private IP address of the cost export storage queue private endpoint |
| tenant_id | The tenant id - use this to assign the Enrollment Reader role |
