
Implement credential validation in AWS provider #6

@dcmcand


Problem

Currently, the AWS provider attempts to deploy/destroy infrastructure without validating credentials or permissions upfront. This can lead to failures deep into the operation with unclear error messages.

Proposed Solution

Implement credential validation that runs at the start of all operations (Deploy, Destroy, Reconcile, and their dry-run equivalents), ideally immediately after creating the AWS clients.

Validation Steps

  1. Credential Validity: Verify that AWS credentials are valid and can authenticate

    • Use sts:GetCallerIdentity to validate credentials work
    • Return clear error if credentials are invalid or expired
    • Display the identity (user/role ARN) and account ID
  2. Permission Check: Verify the credentials have all required IAM permissions

    • Use the IAM Policy Simulator API (iam:SimulatePrincipalPolicy) where possible; note that calling it requires its own IAM permission
    • Provide clear, actionable error messages listing missing permissions
  3. Fail Fast: If validation fails, return informative error before attempting any infrastructure operations
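
The three steps above could be sketched roughly as follows. This is only an illustration: the identityGetter and permissionChecker interfaces, the MissingPermissionsError type, and the in-memory fakes are all hypothetical names introduced here so the example runs without AWS access. A real implementation would wrap the STS and IAM clients behind them, and the function signature differs from the one proposed under Implementation Location.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Identity holds the two fields of the sts:GetCallerIdentity response
// that this issue asks to display.
type Identity struct {
	ARN     string
	Account string
}

// identityGetter and permissionChecker are hypothetical seams for this sketch;
// real implementations would wrap the STS and IAM clients.
type identityGetter interface {
	GetCallerIdentity(ctx context.Context) (Identity, error)
}

type permissionChecker interface {
	// MissingActions returns the actions from the list that are not allowed.
	MissingActions(ctx context.Context, arn string, actions []string) ([]string, error)
}

// MissingPermissionsError reports who was checked and what was denied.
type MissingPermissionsError struct {
	Identity Identity
	Missing  []string
}

func (e *MissingPermissionsError) Error() string {
	return fmt.Sprintf("%s (account %s) is missing %d required permission(s)",
		e.Identity.ARN, e.Identity.Account, len(e.Missing))
}

// validateCredentials checks identity first, then permissions, and returns
// on the first failure so no infrastructure work starts.
func validateCredentials(ctx context.Context, ids identityGetter, perms permissionChecker, actions []string) error {
	id, err := ids.GetCallerIdentity(ctx)
	if err != nil {
		return fmt.Errorf("AWS credentials are invalid or expired: %w", err)
	}
	missing, err := perms.MissingActions(ctx, id.ARN, actions)
	if err != nil {
		return fmt.Errorf("permission check failed for %s: %w", id.ARN, err)
	}
	if len(missing) > 0 {
		return &MissingPermissionsError{Identity: id, Missing: missing}
	}
	return nil
}

// In-memory fakes so the sketch runs without AWS access.
type fakeSTS struct {
	id  Identity
	err error
}

func (f fakeSTS) GetCallerIdentity(context.Context) (Identity, error) { return f.id, f.err }

type fakeIAM struct{ denied map[string]bool }

func (f fakeIAM) MissingActions(_ context.Context, _ string, actions []string) ([]string, error) {
	var missing []string
	for _, a := range actions {
		if f.denied[a] {
			missing = append(missing, a)
		}
	}
	return missing, nil
}

var errExpiredToken = errors.New("token expired")

func main() {
	ctx := context.Background()
	id := Identity{ARN: "arn:aws:iam::123456789012:user/my-user", Account: "123456789012"}
	actions := []string{"ec2:CreateVpc", "eks:CreateCluster"}

	fmt.Println(validateCredentials(ctx, fakeSTS{err: errExpiredToken}, fakeIAM{}, actions))
	fmt.Println(validateCredentials(ctx, fakeSTS{id: id}, fakeIAM{denied: map[string]bool{"eks:CreateCluster": true}}, actions))
	fmt.Println(validateCredentials(ctx, fakeSTS{id: id}, fakeIAM{}, actions))
}
```

Returning a typed error for the missing-permissions case keeps the identity and the denied actions available to the caller, which can then render them however the CLI prefers.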

Required AWS IAM Permissions

Based on analysis of the AWS provider implementation (pkg/provider/aws/) and Terraform modules, the following permissions are required:

STS (Security Token Service)

  • sts:GetCallerIdentity - Validate credentials

S3 (State Bucket Management)

These permissions are required by the Go CLI for Terraform state bucket lifecycle management:

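  • s3:ListBucket - Check if state bucket exists (this is the IAM action that authorizes the HeadBucket API call)
  • s3:CreateBucket - Create state bucket
  • s3:PutBucketVersioning - Enable versioning on state bucket
  • s3:PutBucketPublicAccessBlock - Block public access to state bucket (authorizes the PutPublicAccessBlock API call)
  • s3:ListBucketVersions - List object versions before deletion (destroy; authorizes the ListObjectVersions API call)
  • s3:DeleteObject - Delete objects in bucket (destroy)
  • s3:DeleteObjectVersion - Delete specific object versions in the versioned bucket (destroy)
  • s3:DeleteBucket - Delete state bucket (destroy)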

EC2 (VPC, Subnets, Network)

  • ec2:CreateVpc
  • ec2:DeleteVpc
  • ec2:DescribeVpcs
  • ec2:ModifyVpcAttribute (enable DNS)
  • ec2:CreateSubnet
  • ec2:DeleteSubnet
  • ec2:DescribeSubnets
  • ec2:CreateInternetGateway
  • ec2:DeleteInternetGateway
  • ec2:AttachInternetGateway
  • ec2:DetachInternetGateway
  • ec2:DescribeInternetGateways
  • ec2:AllocateAddress (Elastic IPs for NAT Gateways)
  • ec2:ReleaseAddress
  • ec2:DescribeAddresses
  • ec2:CreateNatGateway
  • ec2:DeleteNatGateway
  • ec2:DescribeNatGateways
  • ec2:CreateRouteTable
  • ec2:DeleteRouteTable
  • ec2:DescribeRouteTables
  • ec2:CreateRoute
  • ec2:AssociateRouteTable
  • ec2:DisassociateRouteTable
  • ec2:CreateSecurityGroup
  • ec2:DeleteSecurityGroup
  • ec2:DescribeSecurityGroups
  • ec2:AuthorizeSecurityGroupIngress
  • ec2:AuthorizeSecurityGroupEgress
  • ec2:CreateVpcEndpoint
  • ec2:DeleteVpcEndpoints
  • ec2:DescribeVpcEndpoints
  • ec2:DescribeNetworkInterfaces (wait for ENI cleanup)
  • ec2:DescribeAvailabilityZones
  • ec2:CreateTags
  • ec2:DeleteTags

EKS (Elastic Kubernetes Service)

  • eks:CreateCluster
  • eks:DeleteCluster
  • eks:DescribeCluster
  • eks:UpdateClusterVersion
  • eks:UpdateClusterConfig
  • eks:CreateNodegroup
  • eks:DeleteNodegroup
  • eks:DescribeNodegroup
  • eks:ListNodegroups
  • eks:UpdateNodegroupConfig
  • eks:TagResource
  • eks:UntagResource

IAM (Identity and Access Management)

  • iam:CreateRole
  • iam:DeleteRole
  • iam:GetRole
  • iam:AttachRolePolicy
  • iam:DetachRolePolicy
  • iam:ListAttachedRolePolicies
  • iam:PassRole (required for EKS to assume cluster/node roles)
  • iam:TagRole

EFS (Elastic File System) - Optional, only if EFS is enabled in config

  • elasticfilesystem:CreateFileSystem
  • elasticfilesystem:DeleteFileSystem
  • elasticfilesystem:DescribeFileSystems
  • elasticfilesystem:CreateMountTarget
  • elasticfilesystem:DeleteMountTarget
  • elasticfilesystem:DescribeMountTargets
  • elasticfilesystem:TagResource
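
One way the lists above might be represented in code is as per-service groups, with the EFS group marked optional. The sketch below is an assumption about structure, not existing code: actionGroup, permissionGroups, and requiredActions are hypothetical names, and the action lists are abbreviated to keep the example short.

```go
package main

import "fmt"

// actionGroup is a hypothetical container for one service's IAM actions.
type actionGroup struct {
	Service  string
	Optional bool // checked only when the matching feature is enabled
	Actions  []string
}

// Abbreviated for the sketch; the full lists are the ones given in this issue.
var permissionGroups = []actionGroup{
	{Service: "sts", Actions: []string{"sts:GetCallerIdentity"}},
	{Service: "ec2", Actions: []string{"ec2:CreateVpc", "ec2:CreateSubnet"}},
	{Service: "eks", Actions: []string{"eks:CreateCluster", "eks:CreateNodegroup"}},
	{Service: "iam", Actions: []string{"iam:CreateRole", "iam:PassRole"}},
	{Service: "efs", Optional: true, Actions: []string{
		"elasticfilesystem:CreateFileSystem",
		"elasticfilesystem:CreateMountTarget",
	}},
}

// requiredActions flattens the groups in declaration order, skipping the
// optional EFS group unless EFS is enabled in the config.
func requiredActions(efsEnabled bool) []string {
	var out []string
	for _, g := range permissionGroups {
		if g.Optional && !efsEnabled {
			continue
		}
		out = append(out, g.Actions...)
	}
	return out
}

func main() {
	fmt.Println(len(requiredActions(false)), "actions without EFS")
	fmt.Println(len(requiredActions(true)), "actions with EFS")
}
```

Keeping the groups in a slice rather than a map preserves a stable order, so any list of missing permissions derived from it is reported consistently between runs.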

Implementation Location

Add validation in pkg/provider/aws/provider.go:

  1. Create a validateCredentials(ctx context.Context, clients *Clients, cfg *config.NebariConfig) error method

  2. Call it from:

    • Deploy() - at the start, after client creation
    • Reconcile() - at the start, after client creation
    • Destroy() - at the start, after client creation
    • dryRunDeploy() - at the start, after client creation
    • dryRunDestroy() - at the start, after client creation
  3. Use sts:GetCallerIdentity as a basic credential check

  4. Check EFS permissions only if cfg.AWS.EFS.Enabled == true

  5. Provide clear error messages listing any missing permissions
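
To avoid repeating the same check in five entry points, the call sites listed above could share a small guard. The following is a sketch under assumptions: the provider struct, its validate field, the guarded helper, and the simplified Deploy/Destroy signatures are all hypothetical and do not reflect the actual code in pkg/provider/aws/provider.go.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// provider is a hypothetical stand-in for the AWS provider type.
type provider struct {
	validate func(ctx context.Context) error // e.g. a closure over validateCredentials
	steps    []string                        // records infrastructure work that actually ran
}

// guarded runs validation first and only then the operation body.
func (p *provider) guarded(ctx context.Context, name string, op func(context.Context) error) error {
	if err := p.validate(ctx); err != nil {
		return fmt.Errorf("%s aborted before any infrastructure changes: %w", name, err)
	}
	return op(ctx)
}

func (p *provider) Deploy(ctx context.Context) error {
	return p.guarded(ctx, "deploy", func(context.Context) error {
		p.steps = append(p.steps, "deploy")
		return nil
	})
}

func (p *provider) Destroy(ctx context.Context) error {
	return p.guarded(ctx, "destroy", func(context.Context) error {
		p.steps = append(p.steps, "destroy")
		return nil
	})
}

// Stub validators so the sketch runs without AWS access.
func validationOK(context.Context) error    { return nil }
func validationFails(context.Context) error { return errors.New("missing iam:PassRole") }

func main() {
	ctx := context.Background()

	good := &provider{validate: validationOK}
	err := good.Deploy(ctx)
	fmt.Println(err, good.steps)

	bad := &provider{validate: validationFails}
	err = bad.Destroy(ctx)
	fmt.Println(err, bad.steps)
}
```

Reconcile and the dry-run variants would go through the same guard, so a failed validation never reaches the operation body.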

Example Error Message

Error: AWS credentials validation failed

Identity: arn:aws:iam::123456789012:user/my-user
Account: 123456789012

The provided AWS credentials are missing required permissions:
  - ec2:CreateVpc
  - ec2:CreateSubnet
  - eks:CreateCluster
  - iam:CreateRole
  - iam:PassRole
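
A message in the shape shown above could be produced by a small formatter like the one below. This is a minimal sketch; formatValidationError is a hypothetical helper name, and the layout simply mirrors the example message.

```go
package main

import (
	"fmt"
	"strings"
)

// formatValidationError renders the identity and the missing permissions in
// the layout of the example message, preserving the order of the input list.
func formatValidationError(arn, account string, missing []string) string {
	var b strings.Builder
	b.WriteString("Error: AWS credentials validation failed\n\n")
	fmt.Fprintf(&b, "Identity: %s\nAccount: %s\n\n", arn, account)
	b.WriteString("The provided AWS credentials are missing required permissions:\n")
	for _, action := range missing {
		fmt.Fprintf(&b, "  - %s\n", action)
	}
	return b.String()
}

func main() {
	fmt.Print(formatValidationError(
		"arn:aws:iam::123456789012:user/my-user",
		"123456789012",
		[]string{"ec2:CreateVpc", "ec2:CreateSubnet", "eks:CreateCluster", "iam:CreateRole", "iam:PassRole"},
	))
}
```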


Labels: aws-provider, cloud-provider, enhancement
