Refactor deploying image logic #346

Haivilo · 2025-04-12T08:01:11Z

TODO

Fix integration tests

Summary of Changes

Image Mapping Changed from Per-Function to Per-Workflow
- The deployment logic was simplified to store a single image URI per workflow_instance_id instead of per function.
Introduced Generic Lambda Handler
- Added generic_handler.py, a dynamic entrypoint that routes invocations based on the target field in the event payload.
Payload Format Enhancement
- Payloads now include a top-level "target" field indicating the function name to invoke, to support the generic Lambda handler.
Minor Enhancements
- Docker image names now use workflow_instance_id instead of full function name.
- _copy_image_to_region was renamed to _copy_image_if_not_exists and now avoids redundant ECR operations when the image exists in the same region.
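
A minimal sketch of how a top-level "target" field can drive routing in one shared handler. This is not the contents of generic_handler.py: the `FUNCTIONS` registry, the function names `resize` and `publish`, the "payload" key, and `build_payload` are hypothetical stand-ins chosen for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: every workflow function ships in the same image,
# so the handler picks one by name at invocation time.
FUNCTIONS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "resize": lambda event: {"step": "resize", "input": event.get("payload")},
    "publish": lambda event: {"step": "publish", "input": event.get("payload")},
}


def build_payload(target: str, payload: Any) -> Dict[str, Any]:
    """Wrap a payload so the shared handler knows which function to run."""
    return {"target": target, "payload": payload}


def generic_handler(event: Any, context: Any = None) -> Any:
    """Dispatch to the function named by the top-level "target" field."""
    target = event.get("target") if isinstance(event, dict) else None
    if target not in FUNCTIONS:
        raise KeyError(f"Unknown or missing target: {target!r}")
    return FUNCTIONS[target](event)
```

Because routing happens inside the handler, a single image per workflow_instance_id can serve every function in the workflow.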

engshahrad · 2025-04-15T19:15:35Z

@Haivilo for the sake of documentation, can you provide details on performance gains as a result of this change in a comment?

@Danidite and @vGsteiger can you engage with Steve to fix rough edges and bring the new capability to the project? Thanks

Haivilo · 2025-04-15T23:39:42Z

Hi @engshahrad, build time improves as the number of functions increases, reaching a ~50% reduction with 5 workflow functions. This was tested locally. Here is an example of the build-time improvement:

As for migration speed, I will discuss it with @Danidite and get back to you. I was unable to retrieve those results efficiently.

…ing and function registration
- Changed `_copy_image_to_region` to `_copy_image_if_not_exists` and modified logic
- Added logic to skip image copying if it already exists in the current region.
- Updated Docker image naming to use `workflow_instance_id`, so every lambda function uses the same docker image.
- Introduced `set_wrapped_function` method in `CaribouFunction` to register wrapped functions.
- Enhanced `CaribouWorkflow` to pass `successor_function_name` in various methods for better tracking.
- Added a new `generic_handler.py` to handle dynamic function routing in Lambda.
- Updated deployment packager to include the new generic handler.
- Adjusted tests to reflect changes in image copying logic.

Danidite

Overall it looks very good, great work! However, I do have a few minor comments and concerns.

Danidite · 2025-05-20T18:47:21Z

caribou/deployment/client/generic_handler.py

+        workflow = app.workflow
+
+        # Get payload and target function
+        payload = _get_payload(event)


It seems like this payload is never used. The `result = target_function(event)` call appears to read the original event. Is the target function supposed to take the original event or the parsed payload?

Danidite · 2025-05-20T18:48:16Z

caribou/deployment/client/generic_handler.py

+        # Get payload and target function
+        payload = _get_payload(event)
+        target_function_name = event.get("target") if isinstance(event, dict) else None
+        target_function, func_name = _find_target_function(workflow, target_function_name)


The function name is also never used here. Is this intentional? If so, maybe `_find_target_function(...)` should return only the target function, as the name is not used anywhere else.

Danidite · 2025-05-20T20:22:17Z

caribou/deployment/client/generic_handler.py

+        target_function_name = event.get("target") if isinstance(event, dict) else None
+        target_function, func_name = _find_target_function(workflow, target_function_name)
+
+        _, _ = payload, func_name  # Unused variables, for now disabled to run tests


Make sure to remove this after the above fixes/changes. I added it temporarily to pass the compliance tests.

Danidite · 2025-05-20T20:30:10Z

caribou/common/models/remote_client/aws_remote_client.py

        parts = deployed_image_uri.split("/")
        original_region = parts[0].split(".")[3]
        original_image_name = parts[1]

        ecr_client = self._client("ecr")
        new_region = ecr_client.meta.region_name
+        if new_region == original_region:
+            logger.info("Image already exists in the %s region, skipping copy", new_region)
+            return deployed_image_uri


If I understand correctly, all this checks is whether the new region equals the original region (the home region, I assume). So for scenarios where this is not the first re-deployment, would it still perform redundant ECR operations? If so, would a better check be to ask ECR directly, via a boto3 call, whether the image already exists?
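
One way the suggested check could look, as a sketch rather than the project's implementation: ask ECR for the tagged image and treat "not found" as "needs copying". `image_exists` is a hypothetical helper; it assumes `ecr_client` is a boto3 ECR client for the destination region (for example `boto3.client("ecr", region_name=...)`) and that the repository name and tag have already been parsed from `deployed_image_uri`.

```python
from typing import Any


def image_exists(ecr_client: Any, repository_name: str, image_tag: str) -> bool:
    """Return True if the tagged image is already present in the client's region."""
    try:
        # describe_images raises if the repository or the image is missing.
        ecr_client.describe_images(
            repositoryName=repository_name,
            imageIds=[{"imageTag": image_tag}],
        )
    except (
        ecr_client.exceptions.ImageNotFoundException,
        ecr_client.exceptions.RepositoryNotFoundException,
    ):
        return False
    return True
```

A check like this would also cover repeated re-deployments to a non-home region, at the cost of one extra ECR API call per region.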

Danidite · 2025-05-20T20:41:18Z

caribou/deployment/client/generic_handler.py

+        for func_name, caribou_func in workflow.functions.items():
+            if caribou_func.name == target_name:
+                return caribou_func.wrapped_function, func_name
+    else:


Honestly, this else branch, which defaults to calling the entry point when no target_name is provided, seems potentially risky for future maintenance. If the target is somehow lost in a future change, it may result in infinite loops (if that change also bypasses our other failsafes). A better approach might be to modify the invoker (client CLI) so that even for the first function it must explicitly specify the target_name of the entry point function. Then this code can simply raise an error and terminate if target_name is missing. This doesn't need to be implemented in this PR; just create a new issue for the change.
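
A sketch of the stricter lookup proposed here, under stated assumptions: `FakeFunction` stands in for `CaribouFunction` with only the two attributes visible in the diff, and `TargetResolutionError` and `find_target_function_strict` are hypothetical names. It fails fast instead of falling back to the entry point, and returns only the callable.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


class TargetResolutionError(ValueError):
    """Raised when an invocation cannot be mapped to a workflow function."""


@dataclass
class FakeFunction:
    # Stand-in for CaribouFunction; only the fields used below.
    name: str
    wrapped_function: Callable[[Any], Any]


def find_target_function_strict(
    functions: Dict[str, FakeFunction], target_name: Optional[str]
) -> Callable[[Any], Any]:
    """Resolve target_name to a callable, with no entry-point fallback."""
    if not target_name:
        raise TargetResolutionError("Event has no 'target'; refusing to guess")
    for caribou_func in functions.values():
        if caribou_func.name == target_name:
            return caribou_func.wrapped_function
    raise TargetResolutionError(f"No function named {target_name!r} in this workflow")
```

With this shape, a dropped "target" field surfaces as an immediate error in the first invocation rather than as a silent re-run of the entry point.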

engshahrad requested a review from Danidite April 15, 2025 19:11

engshahrad added the enhancement New feature or request label Apr 15, 2025

Danidite assigned Haivilo Apr 25, 2025

Haivilo added 4 commits May 8, 2025 14:38

clean up generic handler

9c8fd6c

format

b686028

Fix most tests - not finished

f0b2c81

Haivilo force-pushed the feat-common-image-fix-geo branch from b1eb883 to f0b2c81 Compare May 9, 2025 04:22

Danidite added 2 commits May 20, 2025 18:56

fixed some pyline issues

42d6b83

fixed typing and linting issues, fixed unit tests for caribou workflows

b5b5ba5

Danidite requested changes May 20, 2025

Merge branch 'main' into feat-common-image-fix-geo

50d4bcb
