Feature/aws managed capacity #1418
When Zappa receives a compressed text/plain response from the application, it tries to process it as a text response. Instead, Zappa should treat the response as binary and base64-encode the response body. See issue Miserlou/Zappa#2080, "binary_support logic in handler.py (0.51.0) broke compressed text response".
Arm64 support
version bump
Support AWS Lambdas ephemeral storage setting (zappa#1120)
Cnametoalias
Fix handling of gzip-encoded text response
version bump
… (currently openapi schema can't be passed as text.
remove special s3 function ARN handling (zappa#1414)
Can you create a "feature issue" for tracking purposes?
added
Pull request overview
This PR adds support for AWS Lambda Managed Instances Capacity Providers, a feature that enables running Lambda functions on EC2 instances for more predictable capacity and cost optimization. The implementation includes configuration support, deployment/update workflows, validation logic, and documentation for this new capability.
Changes:
- Adds capacity_provider_config configuration option to specify Lambda capacity provider settings
- Implements mutual exclusion between VPC configurations and capacity providers with validation
- Adds waiting logic for capacity provider state transitions during deployments
- Updates function URL handling with improved return values and CloudFront custom domain cleanup
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 18 comments.
| File | Description |
|---|---|
| zappa/core.py | Core capacity provider implementation including create/update Lambda function support, VPC conflict validation, capacity provider polling logic, function URL improvements, and CloudFront cleanup |
| zappa/cli.py | CLI integration for capacity provider config loading, deploy/update command modifications, undeploy support for function URL custom domains, and certify command updates |
| tests/test_settings.yaml | Test configuration with capacity provider example |
| tests/test_wsgi_script_name_settings.py | Added EXCEPTION_HANDLER config for test consistency |
| tests/test_core.py | Comprehensive unit tests for capacity provider functionality including configuration, VPC rejection, concurrency handling, and state polling |
| tests/placebo/* | Mock AWS API responses for CloudFront and Route53 operations |
| README.md | Documentation for capacity_provider_config and function_url_domains settings |
    if s["Sid"] in ["FunctionURLAllowPublicAccess"]:
        continue
    delete_response = self.lambda_client.remove_permission(FunctionName=lambda_name, StatementId=s["Sid"])
This change removes ALL Lambda function permissions except FunctionURLAllowPublicAccess, not just CloudWatch Events permissions. This is a breaking change that will remove permissions for API Gateway, SNS, S3, and other AWS services that may have been configured. The original code specifically checked for principal.get("Service") == "events.amazonaws.com" to only remove CloudWatch Events permissions. This change should preserve other permissions beyond just FunctionURL permissions.
Suggested change:
    - if s["Sid"] in ["FunctionURLAllowPublicAccess"]:
    -     continue
    - delete_response = self.lambda_client.remove_permission(FunctionName=lambda_name, StatementId=s["Sid"])
    + # Always preserve Function URL public access statement
    + if s.get("Sid") in ["FunctionURLAllowPublicAccess"]:
    +     continue
    + # Only remove permissions created for CloudWatch Events
    + principal = s.get("Principal", {})
    + service = None
    + if isinstance(principal, dict):
    +     service = principal.get("Service")
    + # Service may be a string or a list
    + if isinstance(service, list):
    +     if "events.amazonaws.com" not in service:
    +         continue
    + elif service != "events.amazonaws.com":
    +     continue
    + delete_response = self.lambda_client.remove_permission(
    +     FunctionName=lambda_name,
    +     StatementId=s["Sid"],
    + )
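The filtering rule discussed in this comment can be checked in isolation, without calling AWS. The following is only a sketch: sids_to_remove and the two module-level constants are hypothetical names that do not exist in zappa/core.py, and the policy dictionary shape is assumed to match what the Lambda get_policy call returns once its JSON is parsed.

```python
from typing import Any, Dict, List

# Assumed constants for illustration; not taken from zappa/core.py.
EVENTS_PRINCIPAL = "events.amazonaws.com"
PRESERVED_SIDS = {"FunctionURLAllowPublicAccess"}


def sids_to_remove(policy: Dict[str, Any]) -> List[str]:
    """Return the Sids of statements granted to CloudWatch Events."""
    removable = []
    for statement in policy.get("Statement", []):
        sid = statement.get("Sid")
        if sid is None or sid in PRESERVED_SIDS:
            continue
        principal = statement.get("Principal", {})
        # Principal may be a plain string such as "*" instead of a dict.
        service = principal.get("Service") if isinstance(principal, dict) else None
        # Service may be a single string or a list of strings.
        services = service if isinstance(service, list) else [service]
        if EVENTS_PRINCIPAL in services:
            removable.append(sid)
    return removable
```

Keeping the decision in a pure function like this would make the regression easy to cover with a unit test: API Gateway, SNS, and S3 statements never appear in the returned list.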
    response = self.cloudfront_client.update_distribution(
        DistributionConfig=new_config, Id=id, IfMatch=distribution["ETag"]
    )
    response
This statement on line 2750 has no effect and appears to be a leftover debug statement or incomplete code. The variable response is assigned but never used after this line. This line should either be removed or the intended logging/assertion should be added.
Suggested change:
    - response
@@ -1292,13 +1297,11 @@ def update(self, source_zip=None, no_upload=False, docker_image_uri=None):

      if endpoint_url != api_url:
          deployed_string = deployed_string + " (" + api_url + ")"

    - if self.stage_config.get("touch", True):
    -     self.zappa.wait_until_lambda_function_is_updated(function_name=self.lambda_name)
    -     if api_url:
    -         self.touch_endpoint(api_url)
    -     elif endpoint_url:
    -         self.touch_endpoint(endpoint_url)
    +     touch_url = api_url

    + if self.stage_config.get("touch", True):
    +     self.touch_endpoint(touch_url)
The variable touch_url is uninitialized when neither self.use_apigateway nor the api_url condition on line 1300 is true. If self.use_apigateway is False, touch_url is never set, and calling touch_endpoint(touch_url) on line 1304 will raise a NameError. Initialize touch_url = endpoint_url before the conditional blocks or ensure all code paths define it.
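One way to guarantee that the name is always bound is to assign a default before any conditional. This is a minimal sketch rather than the actual update() code; choose_touch_url is a hypothetical helper, and it assumes the API Gateway URL should win whenever it is set.

```python
from typing import Optional


def choose_touch_url(endpoint_url: Optional[str], api_url: Optional[str]) -> Optional[str]:
    """Pick the URL to touch; every code path assigns a value."""
    touch_url = endpoint_url  # default, so the name is bound on all paths
    if api_url:
        touch_url = api_url
    return touch_url
```

A caller would still need to skip the touch step when the result is None, for example when neither API Gateway nor a function URL is configured.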
    versions_in_lambda = self.list_lambda_function_versions(function_name)
    versions_in_lambda.remove("$LATEST")
The special version identifier "$LATEST.PUBLISHED" is not a standard AWS Lambda version identifier. AWS Lambda uses "$LATEST" for the unpublished version and numeric strings (e.g., "1", "2", "3") for published versions. The identifier "$LATEST.PUBLISHED" appears to be specific to capacity provider functionality but is not documented in standard Lambda API. Verify this identifier is correct, and if it's capacity-provider-specific, add a comment explaining its purpose.
Suggested change:
    - versions_in_lambda = self.list_lambda_function_versions(function_name)
    - versions_in_lambda.remove("$LATEST")
    + versions_in_lambda = self.list_lambda_function_versions(function_name)
    + # "$LATEST" is the standard, unpublished version and should never be deleted.
    + versions_in_lambda.remove("$LATEST")
    + # "$LATEST.PUBLISHED" is not part of the documented Lambda versioning API.
    + # It can appear when Lambda capacity providers are used; we ignore it here so
    + # that we do not attempt to delete it as a normal published version.
          function_url_config=self.function_url_config,
      )
    - self.zappa.deploy_lambda_function_url(**kwargs)
    + endpoint_url = self.zappa.deploy_lambda_function_url(**kwargs)
In the deploy method, endpoint_url is assigned the return value of deploy_lambda_function_url on line 959, which returns a response dictionary, not a string URL. This will cause touch_endpoint on line 1015 to fail when trying to concatenate the response dictionary with a string path. The return value should be extracted as endpoint_url = self.zappa.deploy_lambda_function_url(**kwargs)["FunctionUrl"] to get the actual URL string.
Suggested change:
    - endpoint_url = self.zappa.deploy_lambda_function_url(**kwargs)
    + endpoint_url = self.zappa.deploy_lambda_function_url(**kwargs)["FunctionUrl"]
      response = self.lambda_client.list_function_url_configs(FunctionName=function_name, MaxItems=50)
      if not response.get("FunctionUrlConfigs", []):
    -     print("no function url configured on lambda, skip setting custom domains")
    +     logger.info("no function url configured on lambda, skip setting custom domains")
Similar to the issue in undeploy_function_url_custom_domain, this method logs a message when no function URL configs exist (line 1872) but doesn't return early, then unconditionally accesses response["FunctionUrlConfigs"][0] on line 1873, which will raise an IndexError. Add a return statement after the log message on line 1872.
Suggested change:
    - logger.info("no function url configured on lambda, skip setting custom domains")
    + logger.info("no function url configured on lambda, skip setting custom domains")
    + return
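The guard can also be factored out so that indexing into an empty list is impossible by construction. The sketch below is an assumption about how that might look, not code from the PR; first_function_url_config is a hypothetical helper that takes the already-fetched list_function_url_configs response.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


def first_function_url_config(response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the first function URL config, or None when there is none."""
    configs = response.get("FunctionUrlConfigs", [])
    if not configs:
        logger.info("no function url configured on lambda, skip setting custom domains")
        return None  # early exit: configs[0] below is never reached for an empty list
    return configs[0]
```

Both methods mentioned in this comment could then return immediately when the helper yields None.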
@@ -1672,19 +1846,21 @@ def update_lambda_function_url(self, function_name, function_url_config):
              )
          else:
              response = self.lambda_client.update_function_url_config(
    -             FunctionName=function_name, AuthType=function_url_config["authorizer"]
    +             FunctionName=function_name,
    +             AuthType=function_url_config["authorizer"],
              )
    -     print("function URL address: {}".format(response["FunctionUrl"]))
    +     logger.info("function URL address: {}".format(response["FunctionUrl"]))
          self.update_function_url_policy(config["FunctionArn"], function_url_config)
    +     return response["FunctionUrl"]
      else:
    -     self.deploy_lambda_function_url(function_name, function_url_config)
    +     return self.deploy_lambda_function_url(function_name, function_url_config)
Inconsistent return type between deploy_lambda_function_url and update_lambda_function_url. The deploy_lambda_function_url method returns the full response object (line 1828), while update_lambda_function_url returns just response["FunctionUrl"] as a string (line 1854) in the update path, but falls back to calling deploy_lambda_function_url (line 1856) which returns the full response. This inconsistency will cause issues for callers expecting a consistent return type. Both methods should return the same type, preferably just the FunctionUrl string.
    function_arn: str,
    capacity_provider_name: str,
    function_state: str = "Active",
    marker: str | None = None,
The type hint str | None uses the union type syntax with the pipe operator, which was introduced in Python 3.10. According to the PR description and README, this project supports Python 3.9. For Python 3.9 compatibility, use Optional[str] from the typing module instead, or use Union[str, None]. The code already imports Optional from typing on line 25.
Suggested change:
    - marker: str | None = None,
    + marker: Optional[str] = None,
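For reference, Optional[str] has the same meaning as str | None and is evaluated without error on Python 3.9. The sketch below shows the annotation on a small pagination helper; build_list_kwargs and the Marker key are illustrative assumptions, not the signature used in the PR.

```python
from typing import Any, Dict, Optional


def build_list_kwargs(function_name: str, marker: Optional[str] = None) -> Dict[str, Any]:
    """Build paginated-list kwargs, adding Marker only when one was supplied."""
    kwargs: Dict[str, Any] = {"FunctionName": function_name}
    if marker is not None:
        # Omit the key entirely for the first page rather than passing None.
        kwargs["Marker"] = marker
    return kwargs
```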
      logger.info("Updating Lambda function code..")

    - kwargs = dict(FunctionName=function_name, Publish=publish)
    + kwargs = dict(FunctionName=function_name, Publish=publish, Architectures=[self.architecture])
According to AWS Lambda API documentation, the Architectures parameter is not valid for the update_function_code API call. It can only be set during function creation or via update_function_configuration. Including it here will cause an API error. Remove Architectures from the kwargs on line 1343, or ensure architecture updates happen via update_function_configuration instead.
Suggested change:
    - kwargs = dict(FunctionName=function_name, Publish=publish, Architectures=[self.architecture])
    + kwargs = dict(FunctionName=function_name, Publish=publish)
    if capacity_provider_config:
        kwargs["CapacityProviderConfig"] = capacity_provider_config
        kwargs.pop("VpcConfig")
Same issue as in update_lambda_configuration: popping "VpcConfig" from kwargs after it's been added is redundant when using capacity providers. Since the validation on lines 1244-1248 already ensures VPC and capacity provider aren't used together, VpcConfig should either not be added to kwargs when capacity_provider_config is set, or this pattern should be documented with a comment explaining the reasoning.
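A sketch of the first option, where VpcConfig is never added when a capacity provider is configured, so nothing has to be popped afterwards. resolve_network_settings is a hypothetical helper; the vpc_config and capacity_provider_config setting names and the two kwargs keys are taken from this PR, while the error message is an assumption.

```python
from typing import Any, Dict


def resolve_network_settings(stage_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the kwargs fragment for either a VPC or a capacity provider."""
    vpc_config = stage_config.get("vpc_config")
    capacity_provider_config = stage_config.get("capacity_provider_config")

    # The two settings are treated as mutually exclusive.
    if vpc_config and capacity_provider_config:
        raise ValueError("vpc_config and capacity_provider_config cannot be used together")

    if capacity_provider_config:
        # VpcConfig is deliberately not added in this branch.
        return {"CapacityProviderConfig": capacity_provider_config}
    if vpc_config:
        return {"VpcConfig": vpc_config}
    return {}
```

The result could be merged into the create or update kwargs with kwargs.update(...), which keeps the exclusion rule in one place.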
monkut left a comment
PR Review: Feature/AWS Managed Capacity (#1418)
Thanks for the contribution! This adds support for AWS Lambda Managed Instances Capacity Providers — a useful feature for users needing predictable capacity on their own EC2 instances.
Overall the feature is well-structured with good test coverage for the new functionality. However, there are several issues that need to be addressed before merging, ranging from bugs that will cause runtime errors to a breaking behavioral change.
Critical Issues
1. _clear_policy() regression — will delete ALL Lambda permissions (zappa/core.py)
This is the most concerning change. The current code in master only removes CloudWatch Events permissions (created by schedule_events), intentionally preserving API Gateway, SNS, S3, and other service permissions. The PR changes this to delete every permission statement except FunctionURLAllowPublicAccess.
This will break any deployment that uses API Gateway, SNS triggers, S3 event notifications, or any other service-based permission. The original filter (principal.get("Service") == "events.amazonaws.com") was deliberate and well-documented.
The fix should preserve the original CloudWatch Events filter while also skipping FunctionURLAllowPublicAccess:
    if s["Sid"] in ["FunctionURLAllowPublicAccess"]:
        continue
    # Only remove CloudWatch Events permissions (created by schedule_events)
    principal = s.get("Principal", {})
    if isinstance(principal, dict) and principal.get("Service") == "events.amazonaws.com":
        delete_response = ...
2. Missing return after "no function url" check: will raise IndexError (zappa/core.py)
Both update_lambda_function_url_domains() (line ~1854) and undeploy_function_url_custom_domain() (line ~2783) log "no function url configured" but don't actually return. Execution continues and immediately hits response["FunctionUrlConfigs"][0] which raises IndexError on an empty list.
    # Both methods need:
    if not response.get("FunctionUrlConfigs", []):
        logger.info("no function url configured on lambda, skip...")
        return  # <-- missing
3. deploy_lambda_function_url() return value mismatch (zappa/cli.py:958)
In deploy(), the PR assigns endpoint_url = self.zappa.deploy_lambda_function_url(**kwargs) but deploy_lambda_function_url returns the full response dict, not a URL string. This will cause touch_endpoint() to fail. Should be:
    endpoint_url = self.zappa.deploy_lambda_function_url(**kwargs)["FunctionUrl"]
4. Inconsistent return types between deploy_lambda_function_url and update_lambda_function_url
deploy_lambda_function_url returns the full response dict. update_lambda_function_url returns response["FunctionUrl"] (a string) in the update path but falls through to deploy_lambda_function_url (returning a dict) in the create path. Callers can't rely on the return type. Normalize both to return the URL string.
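Until both methods return the same type, a small normalizer at the call site would tolerate either shape. This is only a stopgap sketch; function_url_from is a hypothetical name, and the cleaner fix remains returning the string from both methods.

```python
from typing import Any, Mapping, Union


def function_url_from(result: Union[str, Mapping[str, Any]]) -> str:
    """Accept either a URL string or a response mapping and return the URL."""
    if isinstance(result, str):
        return result
    # Full response dict, as currently returned by the create path.
    return result["FunctionUrl"]
```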
5. update_domain_name() — res undefined when self.apigateway is falsy (zappa/core.py:~3353)
The method wraps the apigateway_client.update_domain_name() call in if self.apigateway: but then unconditionally returns res. If self.apigateway is falsy, res is never defined → NameError.
6. Architectures added to update_function_code kwargs (zappa/core.py:1318)
Architectures is not a valid parameter for the update_function_code API. It can only be set via create_function or update_function_configuration. This will cause an API error on every update.
7. Python 3.9 compatibility — str | None type hint (zappa/core.py:1623)
The str | None union syntax requires Python 3.10+. The project supports Python 3.9. Use Optional[str] instead.
Moderate Issues
8. touch_url may be uninitialized (zappa/cli.py)
In the update() method, touch_url = endpoint_url is set, but endpoint_url itself may not be set if the function URL update path returns inconsistent types (see #4 above). The flow is fragile.
9. Hardcoded time.sleep(10) (zappa/core.py:~1524)
A hardcoded 10-second sleep with no explanation. This should either be documented with a reason, or replaced with a polling/waiter approach consistent with the rest of the codebase.
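A polling loop with a deadline is one possible replacement. The helper below is a generic sketch, not the waiter the codebase uses; wait_until and its parameters are hypothetical, and the predicate would wrap whatever state check the capacity provider flow needs.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll predicate until it returns True or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Passing sleep as a parameter lets tests substitute a no-op, so the polling logic can be exercised without real delays.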
10. max() on potentially empty sequence (zappa/core.py:~1518)
max(int(v) for v in versions_in_lambda if v.isdigit()) will raise ValueError if there are no numeric versions. Add a guard:
    numeric_versions = [int(v) for v in versions_in_lambda if v.isdigit()]
    if numeric_versions:
        latest_version = max(numeric_versions)
11. Capacity provider name extraction is fragile (zappa/core.py:~1500-1510)
The ARN parsing logic (split("capacity-provider/") then rsplit("/")) is hard to follow and may not handle all formats. Consider a cleaner approach by splitting the ARN by : and then parsing the resource component, or using a regex.
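A regex version might look like the sketch below. The ARN layout is an assumption: it accepts either "capacity-provider/NAME" or "capacity-provider:NAME" in the resource part and ignores anything after the name. The real format should be checked against the AWS documentation before relying on it, and capacity_provider_name is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed layout: arn:PARTITION:lambda:REGION:ACCOUNT:capacity-provider(/|:)NAME[...]
_CAPACITY_PROVIDER_RE = re.compile(
    r"^arn:[^:]+:lambda:[^:]*:[^:]*:capacity-provider[:/](?P<name>[^:/]+)"
)


def capacity_provider_name(arn: str) -> Optional[str]:
    """Extract the capacity provider name, or None if the ARN does not match."""
    match = _CAPACITY_PROVIDER_RE.match(arn)
    return match.group("name") if match else None
```

Returning None for unrecognized input lets the caller raise a clear configuration error instead of failing later with an index or attribute error.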
12. Standalone expression response does nothing (zappa/core.py:~2802)
    response = self.cloudfront_client.update_distribution(...)
    response  # <-- this line does nothing, leftover debug?
Remove it.
13. f-string nested quotes (zappa/core.py:~1516)
    function_arn=f"{response["FunctionArn"]}:{latest_version}"
This is a syntax error in Python <3.12. Use single quotes for the dict key inside the f-string:
    function_arn=f"{response['FunctionArn']}:{latest_version}"
Minor / Style
14. log_type="Tail" if not self.capacity_provider_config else "None" (zappa/cli.py:~1579)
Why does log_type change to "None" when using capacity providers? This needs a comment explaining the reasoning. Does capacity provider log behavior differ from standard Lambda?
15. "$LATEST.PUBLISHED" version identifier (zappa/core.py:~1382)
This doesn't appear in standard AWS Lambda documentation. If it's capacity-provider-specific, add a comment explaining its purpose so future maintainers understand why it's handled.
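Filtering on numeric identifiers sidesteps the question of which special identifiers exist, and it also covers the empty-sequence case from item 10. The helpers below are a sketch with hypothetical names; they assume published versions are always plain digit strings.

```python
from typing import Iterable, List, Optional


def numeric_versions(versions: Iterable[str]) -> List[int]:
    """Keep only published, numeric versions; "$LATEST"-style names are dropped."""
    return sorted(int(v) for v in versions if v.isdigit())


def latest_numeric_version(versions: Iterable[str]) -> Optional[int]:
    """Return the highest published version, or None if there is none yet."""
    numbers = numeric_versions(versions)
    return numbers[-1] if numbers else None
```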
16. Commit history is very messy
The branch has 80+ commits including many merges from master, unrelated changes (Python 3.11 support, werkzeug fixes, version bumps, etc.), and commits from other contributors' PRs that were merged into the fork. This makes the actual capacity provider changes hard to review. Consider squashing into a clean set of commits that only contain the capacity provider feature.
17. function_url_domains docs added to README but that setting is from a different feature
The README diff adds function_url_domains documentation — this appears to be from the function URL custom domains feature, not the capacity provider feature. If it's already in master, the diff is just noise. If not, it should be a separate PR.
Summary
The core capacity provider integration (config loading, create_lambda_function, update_lambda_configuration, wait_for_capacity_provider_response) is well-implemented with appropriate VPC validation and concurrency guards. The test coverage for these paths is good.
The main blocker is the _clear_policy() regression (#1), which will break existing deployments. The missing return statements (#2) and return type mismatches (#3, #4) will cause runtime errors. The Architectures in update_function_code (#6) will cause API errors on every update.
Recommend fixing issues #1-7 before merge, and squashing the commit history.
Description
closes #1420
AWS Lambda Managed Instances Capacity Providers enable running Lambda functions on your own EC2 instances, giving tighter control over compute capacity, networking, and cost. Capacity Providers define the instance types and scaling behavior used to execute your functions.
This PR updates the project to support (and document) configuring Lambda Managed Instances via Capacity Providers, making it possible to opt into managed instance execution when you need predictable capacity or specific instance characteristics.
Key changes
Adds/updates configuration and documentation for defining a Lambda Capacity Provider.
Wires the new settings into the deployment flow (no behavior change unless explicitly enabled).