This implementation adds caching for AWS accounts and permission sets using S3 to improve resilience during AWS service outages. The cache uses a parallel API/cache call strategy with automatic cache updates when data changes.
The implementation uses a resilient caching strategy:
- Parallel Execution: Both AWS API and S3 cache are called simultaneously using `ThreadPoolExecutor`
- API Success Path:
  - If API call succeeds, compare the result with cached data
  - If data differs, update the cache automatically
  - Return the fresh API data
- API Failure Path:
  - If API call fails but cache has data, return cached data as fallback
  - Log warning about using cached data
- Both Fail Path:
  - If both API and cache fail, raise an exception
- No TTL Management: Cache is kept indefinitely and updated automatically when data changes
- Maximum Resilience: Always prefers fresh API data and falls back to the cache only when the API call fails
- Automatic Updates: Cache stays fresh without manual intervention
- Parallel Performance: API and cache calls don't block each other
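The strategy described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/cache.py`: the signature of `with_cache_resilience()` shown here, the three callables, and the `resource` parameter are assumptions made for the example, and a cache miss is assumed to be represented by `None`.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def with_cache_resilience(
    api_call: Callable[[], T],
    cache_get: Callable[[], Optional[T]],
    cache_set: Callable[[T], None],
    resource: str = "accounts",
) -> T:
    """Run the API call and the cache read in parallel, preferring API data."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        api_future = pool.submit(api_call)
        cache_future = pool.submit(cache_get)

        # A failed cache read is treated like a cache miss.
        try:
            cached = cache_future.result()
        except Exception as exc:
            logger.warning("Failed to get cached %s: %s", resource, exc)
            cached = None

        try:
            fresh = api_future.result()
        except Exception as exc:
            if cached is None:
                raise  # Both fail path: nothing left to return.
            logger.warning(
                "API failed for %s, using cached data as fallback: %s", resource, exc
            )
            return cached

    # API success path: write to the cache only when the data changed.
    if fresh != cached:
        try:
            cache_set(fresh)
        except Exception as exc:
            logger.warning("Failed to cache %s: %s", resource, exc)
    return fresh
```

Passing the cache accessors in as callables keeps the sketch independent of S3, so the same function can be exercised in tests with an in-memory dictionary.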
- S3 Bucket (`s3.tf`):
  - Uses the same `fivexl/account-baseline/aws//modules/s3_baseline` module as the audit bucket
  - Bucket name: Configurable via `config_bucket_name` variable (default: `sso-elevator-config`)
  - Cache structure:
    - `accounts.json` - stores all accounts
    - `permission_sets/<arn_hash>.json` - stores permission sets per SSO instance
    - No TTL metadata - cache is kept indefinitely
  - Security Features:
    - Server-side encryption enabled (AES256 by default, KMS optional via `config_bucket_kms_key_arn`)
    - Public access blocked (via s3_baseline module)
    - Versioning enabled
    - Lifecycle policy to clean up old versions after 7 days
  - Note: The bucket is always created (not conditional) as it's intended for future config storage even when caching is disabled
- Variables (`vars.tf`):
  - `config_bucket_name`: Name of the S3 bucket (default: `sso-elevator-config`)
  - `cache_enabled`: Enable/disable caching (default: `true`)
  - `config_bucket_kms_key_arn`: Optional ARN of a customer-managed KMS key for encryption (default: `null`)
  - Variables are passed to Lambda functions as environment variables
- IAM Permissions:
  - S3 permissions added to the requester Lambda function:
    - `s3:GetObject`
    - `s3:PutObject`
    - `s3:ListBucket`
  - Note: The revoker Lambda does NOT use cache and does not have these permissions
- Outputs (`outputs.tf`):
  - `config_s3_bucket_name`: The name of the config S3 bucket
  - `config_s3_bucket_arn`: The ARN of the config S3 bucket
- Updated Cache Module (`src/cache.py`):
  - `CacheConfig`: Configuration class for cache settings (no TTL field)
  - `get_cached_accounts()`: Retrieve cached accounts from S3
  - `set_cached_accounts()`: Store accounts in S3 cache
  - `get_cached_permission_sets()`: Retrieve cached permission sets
  - `set_cached_permission_sets()`: Store permission sets in cache
  - `with_cache_resilience()`: New function implementing parallel API/cache strategy with automatic updates
- Updated Organizations Module (`src/organizations.py`):
  - `list_accounts_with_cache()`: Lists accounts with cache resilience
  - `get_accounts_from_config_with_cache()`: Gets filtered accounts with cache resilience
  - Uses parallel API/cache calls with automatic cache updates
- Updated SSO Module (`src/sso.py`):
  - `list_permission_sets_with_cache()`: Lists permission sets with cache resilience
  - `get_permission_sets_from_config_with_cache()`: Gets filtered permission sets with cache resilience
  - `get_account_assignment_information_with_cache()`: Combined function for both cached data
  - Uses parallel API/cache calls with automatic cache updates
- Updated Config (`src/config.py`):
  - `config_bucket_name`: S3 bucket name (default: `sso-elevator-config`)
  - `cache_enabled`: Boolean flag to enable/disable caching (default: `True`)
  - Removed `cache_ttl_minutes` field
- Updated Lambda Handlers:
  - `src/main.py`: Updated to use S3 cache for accounts and permission sets
  - `src/revoker.py`: Does NOT use cache - always fetches fresh data from AWS APIs for accuracy
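As an illustration of how the two cache settings might reach the Lambda code through environment variables, here is a small sketch. `CONFIG_BUCKET_NAME` is the variable named in the troubleshooting notes; the `CACHE_ENABLED` variable name, the field names on this `CacheConfig`, and the `load_cache_config()` helper are assumptions for the example and may not match `src/config.py` or `src/cache.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CacheConfig:
    """Cache settings; note there is no TTL field."""

    bucket_name: str = "sso-elevator-config"
    enabled: bool = True


def load_cache_config(env: Mapping[str, str] = os.environ) -> CacheConfig:
    # CACHE_ENABLED is a hypothetical variable name used only in this sketch.
    raw_enabled = env.get("CACHE_ENABLED", "true")
    return CacheConfig(
        bucket_name=env.get("CONFIG_BUCKET_NAME", "sso-elevator-config"),
        enabled=raw_enabled.strip().lower() in {"1", "true", "yes"},
    )
```

Accepting the environment as a mapping makes the defaults easy to check without touching the real process environment.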
The cache uses S3 object keys:
- Accounts Cache:
  - Key: `accounts.json`
  - Stores all accounts in a single JSON file
- Permission Sets Cache:
  - Key: `permission_sets/<arn_hash>.json`
  - Stores all permission sets for a specific SSO instance
  - ARN is hashed (colons and slashes replaced with underscores) for safe file naming
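The key layout above could be derived along these lines. The function and constant names are hypothetical, and the example ARN is a made-up placeholder; only the replacement rule (colons and slashes become underscores) comes from the description.

```python
ACCOUNTS_CACHE_KEY = "accounts.json"


def permission_sets_cache_key(sso_instance_arn: str) -> str:
    """Build the S3 object key for one SSO instance's permission sets."""
    # Replace characters that are awkward in file names with underscores.
    arn_hash = sso_instance_arn.replace(":", "_").replace("/", "_")
    return f"permission_sets/{arn_hash}.json"


if __name__ == "__main__":
    # Placeholder ARN, for illustration only.
    print(permission_sets_cache_key("arn:aws:sso:::instance/ssoins-1234567890abcdef"))
    # -> permission_sets/arn_aws_sso___instance_ssoins-1234567890abcdef.json
```

Because the mapping is a plain character substitution rather than a cryptographic hash, the resulting key stays human-readable when browsing the bucket.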
- First Request:
  - API and cache called in parallel
  - Cache miss (no data)
  - API succeeds → Store in cache → Return API data
- Subsequent Requests (Data Unchanged):
  - API and cache called in parallel
  - Both succeed
  - Data matches → No cache update → Return API data
- Subsequent Requests (Data Changed):
  - API and cache called in parallel
  - Both succeed
  - Data differs → Update cache → Return API data
- API Unavailable:
  - API and cache called in parallel
  - API fails, cache succeeds
  - Return cached data (logged as fallback)
- Cache Unavailable:
  - API and cache called in parallel
  - Cache fails, API succeeds
  - Store in cache → Return API data
The cache is designed to be fail-safe:
- All cache operations are wrapped in try-except blocks
- Cache failures NEVER prevent the application from functioning
- Warnings are logged when cache operations fail
- S3 unavailability automatically triggers fallback to direct AWS API calls
- Application works normally if:
  - S3 bucket doesn't exist
  - Bucket name is misconfigured
  - IAM permissions are missing
  - S3 service is down
Behavior on errors:
- Cache read errors: Log warning, continue with API data only
- Cache write errors: Log warning, return API data without caching
- API errors with cache available: Log warning, return cached data
- Both fail: Raise exception (application cannot function)
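The read and write rules above amount to wrapping every cache operation so that it logs and degrades instead of raising. A minimal sketch, assuming the raw S3 access is passed in as a callable (in a real deployment this would wrap something like a boto3 `get_object` or `put_object` call); the helper names are hypothetical:

```python
import json
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def safe_cache_read(read_bytes: Callable[[], bytes], resource: str) -> Optional[Any]:
    """Return the decoded cache content, or None if anything goes wrong."""
    try:
        return json.loads(read_bytes())
    except Exception as exc:  # missing bucket, denied access, bad JSON, ...
        logger.warning("Failed to get cached %s: %s", resource, exc)
        return None


def safe_cache_write(
    write_bytes: Callable[[bytes], None], data: Any, resource: str
) -> bool:
    """Try to store data in the cache; report success without raising."""
    try:
        write_bytes(json.dumps(data).encode("utf-8"))
        return True
    except Exception as exc:
        logger.warning("Failed to cache %s: %s", resource, exc)
        return False
```

Catching the broad `Exception` is deliberate in this sketch: the cache is an optimization, so no cache error should propagate to the request path.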
Caching is enabled by default:

```hcl
module "aws_sso_elevator" {
  source = "path/to/module"
  # Other configuration...

  # These are the defaults (no need to specify):
  # cache_enabled = true
  # config_bucket_name = "sso-elevator-config"
  # config_bucket_kms_key_arn = null # Uses AES256 encryption
}
```

To disable caching (S3 bucket will still be created for future config storage):

```hcl
module "aws_sso_elevator" {
  source = "path/to/module"
  # Other configuration...

  cache_enabled = false # Disable caching (bucket still created for future use)
}
```

To use a custom S3 bucket name:

```hcl
module "aws_sso_elevator" {
  source = "path/to/module"
  # Other configuration...

  config_bucket_name = "my-custom-sso-elevator-config"
}
```

To use a customer-managed KMS key instead of AES256:

```hcl
module "aws_sso_elevator" {
  source = "path/to/module"
  # Other configuration...

  config_bucket_kms_key_arn = "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
}
```

Cache operations are logged with the following patterns:
- `"Retrieved X accounts from cache"`: Accounts retrieved from cache
- `"Cache miss for accounts"`: No cached accounts found
- `"Successfully fetched accounts from API"`: API call succeeded
- `"API call failed for accounts"`: API call failed
- `"API data differs from cache"`: Cache will be updated
- `"API data matches cache"`: No cache update needed
- `"API failed for accounts, using cached data as fallback"`: Using cache due to API failure
- `"Failed to get cached accounts"`: Error reading from cache
- `"Failed to cache accounts"`: Error writing to cache
Monitor these CloudWatch metrics for the cache bucket:
- `NumberOfObjects`: Number of cached items
- `BucketSizeBytes`: Total cache size
- `AllRequests`: Cache read/write activity
- Data Sensitivity: Cache contains account IDs, names, and permission set information
- Encryption:
  - Server-side encryption is enabled by default using AES256
  - Optional customer-managed KMS key support via the `config_bucket_kms_key_arn` variable
  - Encryption at rest is always active
- Access Control: IAM permissions limit cache access to Lambda functions only
- Public Access: All public access is blocked by default
- Versioning: Enabled to protect against accidental overwrites
- No Expiration: Cache is kept indefinitely and updated automatically when data changes
- Parallel Calls: API and cache calls execute simultaneously (~50-100ms for cache)
- Cache Hit + API Success: Same latency as API-only (cache runs in parallel)
- API Failure with Cache: ~50-100ms (S3 GetObject latency)
- Cost: S3 request charges apply (at S3 Standard list prices in us-east-1, roughly $0.0004 per 1,000 GET requests and $0.005 per 1,000 PUT requests; check current pricing for your region)
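To get a feel for the request cost, a back-of-the-envelope estimate can be computed as below. The default prices are assumed S3 Standard list prices for us-east-1 and should be replaced with current rates for your region; the model also assumes one cache read per request and one cache write only when the data changed, which is a simplification.

```python
def monthly_s3_request_cost(
    requests_per_month: int,
    change_fraction: float,
    get_price_per_1000: float = 0.0004,  # assumed list price in USD
    put_price_per_1000: float = 0.005,  # assumed list price in USD
) -> float:
    """Estimate monthly S3 request charges for the cache, in USD."""
    gets = requests_per_month  # one cache read per request
    puts = requests_per_month * change_fraction  # writes only on changed data
    return gets / 1000 * get_price_per_1000 + puts / 1000 * put_price_per_1000


if __name__ == "__main__":
    # 100,000 requests per month, with data changing on 1% of them.
    print(f"${monthly_s3_request_cost(100_000, 0.01):.4f}")
```

Under these assumptions, 100,000 requests per month cost a few cents, so storage and versioning are more likely to dominate the bill than requests.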
- Existing deployments will need to update their Terraform configuration
- The variable `cache_ttl_minutes` has been removed and replaced with `cache_enabled`
- Cache behavior has changed from TTL-based to automatic update-based
- No data migration is needed as the cache will refresh automatically
If you're migrating from the TTL-based cache:
- Update your Terraform configuration to remove `cache_ttl_minutes` and use `cache_enabled` instead
- Apply the Terraform changes
- The cache will continue to work with the new strategy
- No manual intervention needed
- Test Normal Operation: Submit access request → Verify both API and cache are called
- Test Cache Update: Change AWS data → Submit request → Verify cache is updated
- Test API Fallback: Temporarily block API access → Verify cached data is used
- Test Cache Disabled: Set `cache_enabled = false` → Verify normal operation (no S3 calls)
- Test Missing Permissions: Temporarily remove S3 IAM permissions → Verify graceful fallback to API
Important: Cache read warnings (for example, "Failed to get cached accounts") do not break functionality. The application will continue to work using API data.
Common causes:
- S3 bucket doesn't exist (check Terraform deployment)
- Wrong bucket name (verify `config_bucket_name` matches between Terraform and Lambda environment variables)
- Lambda IAM permissions missing S3 read access
How to diagnose:
- Check CloudWatch Logs for the full error message
- Verify the bucket exists: `aws s3 ls s3://sso-elevator-config-<random-suffix>`
- Confirm bucket name environment variable: Check Lambda configuration `CONFIG_BUCKET_NAME`
- Verify IAM permissions include `s3:GetObject`, `s3:ListBucket`
Important: Cache write warnings (for example, "Failed to cache accounts") do not break functionality. The application will continue to work with API data.
Common causes:
- S3 bucket doesn't exist
- Wrong bucket name configuration
- Lambda IAM permissions missing S3 write access
How to diagnose:
- Check CloudWatch Logs for detailed error messages
- Verify IAM permissions include `s3:PutObject`
Seeing "API failed for accounts, using cached data as fallback" in the logs is expected behavior when AWS APIs are unavailable. The application is working correctly by using cached data.
If you notice the cache isn't updating when AWS data changes:
- Check CloudWatch Logs for cache write errors
- Verify S3 permissions include `s3:PutObject`
- Check if data comparison is working correctly (logs will show "API data differs from cache")
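When checking the data comparison, one thing worth ruling out is sensitivity to ordering: if the API returns the same items in a different order, a naive list comparison reports a difference on every request. The sketch below shows one order-insensitive way to compare, assuming the data is a list of JSON-serializable dictionaries; the helper names are hypothetical and the project's actual comparison may work differently.

```python
import json
from typing import List, Optional, Sequence


def canonical(items: Optional[Sequence[dict]]) -> Optional[List[str]]:
    """Normalize a list of dicts so that item and key order do not matter."""
    if items is None:
        return None
    return sorted(json.dumps(item, sort_keys=True) for item in items)


def data_differs(
    api_data: Optional[Sequence[dict]], cached_data: Optional[Sequence[dict]]
) -> bool:
    """Return True when the cache needs to be updated with the API data."""
    return canonical(api_data) != canonical(cached_data)
```

With this normalization, a cache miss (`None`) always counts as a difference, while a reordered but otherwise identical list does not.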