@nishi7409 nishi7409 commented Nov 21, 2025

Features

  • Daily scheduling (different times per day)
  • Recurring scheduling (same time daily)
  • Timezone support
  • Auto Scaling Group integration
  • LiteLLM automatic registration/deregistration
  • CloudWatch Events monitoring

Changes

Frontend

  • ScheduleConfig.tsx: Scheduling UI with validation
  • AutoScalingConfig.tsx: Toggle between 24/7 and scheduled operation
  • ModelManagementUtils.tsx: Schedule display in model cards
  • Added comprehensive test coverage

Backend

  • schedule_management.py: Core scheduling logic with ASG integration
  • schedule_monitoring.py: CloudWatch Events processing and LiteLLM lifecycle
  • schedule_handlers.py: State machine integration
  • domain_objects.py: Enhanced data models

Technical Details

LiteLLM Integration

  • Models register with LiteLLM when scaling up
  • Models deregister when scaling down
  • Retry logic with exponential backoff
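The retry behavior described above can be sketched as a small generic helper. This is a minimal illustration with hypothetical names (`call_with_backoff`, the delay constants), not the PR's actual retry code:

```python
import random
import time

def call_with_backoff(fn, max_attempts=4, base_delay=0.5):
    """Call fn(), retrying failures with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) plus small jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

The jitter keeps many models that scale up simultaneously from hammering LiteLLM in lockstep.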

Validation

  • 24-hour time format validation
  • 2-hour minimum schedule windows
  • Timezone validation
  • At least one day required for daily schedules
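These rules can be sketched as a standalone validator. The helper name below is hypothetical; the PR's real checks live in the backend modules and the frontend Zod schemas:

```python
import re
from datetime import datetime
from zoneinfo import ZoneInfo

TIME_24H = re.compile(r"^([01]\d|2[0-3]):[0-5]\d$")  # HH:MM, 00:00-23:59

def validate_schedule_window(start: str, stop: str, tz_name: str) -> None:
    """Raise ValueError if the window violates the rules listed above."""
    for value in (start, stop):
        if not TIME_24H.match(value):
            raise ValueError(f"Not a 24-hour HH:MM time: {value!r}")
    try:
        ZoneInfo(tz_name)  # raises for unknown IANA names
    except Exception:
        raise ValueError(f"Unknown timezone: {tz_name!r}")
    delta = datetime.strptime(stop, "%H:%M") - datetime.strptime(start, "%H:%M")
    minutes = delta.seconds // 60  # wraps correctly when stop is past midnight
    if minutes < 120:
        raise ValueError("Schedule window must be at least 2 hours")
```

Note the `delta.seconds` trick: for a window like 23:00-01:30 the subtraction yields a negative day plus a positive seconds remainder, so the 2-hour check still works across midnight.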

UI Changes

  • Simple Auto Scaling toggle
  • Progressive disclosure of schedule options
  • Clean timezone and time selection
  • Improved model card display format

Screenshots

Auto Scaling Configuration


Model Management Cards


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Summary (generated)

Summary of Changes

This pull request implements comprehensive schedule management features for model auto-scaling across the entire application stack. Below is an overview of the major changes:

Backend Implementation

Schedule Management Core

  • Introduced daily and recurring scheduling with timezone support
  • Auto Scaling Group (ASG) integration for capacity management
  • LiteLLM automatic registration and deregistration lifecycle management
  • Enum simplification: EACH_DAY/RECURRING_DAILY renamed to DAILY/RECURRING
  • DynamoDB field updates: autoScalingGroup → auto_scaling_group, status → model_status

New Functions and Modules

  • register_litellm() and remove_litellm() for model lifecycle management
  • schedule_handlers.py module implementing core scheduling logic and CloudWatch Events monitoring
  • scale_immediately() function to enforce schedules upon model creation
  • merge_schedule_data() to preserve existing schedule metadata during updates
  • adjust_initial_capacity_for_schedule() integrated into handle_create_model_stack()

Data Structure Changes

  • Schedule data now stored under model_config.autoScalingConfig.scheduling
  • New JobStatus Pydantic model for job status tracking
  • Removed default fallback values from Lambda function name environment variable lookups
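The fail-fast lookup pattern behind that last item can be illustrated with a tiny helper; `required_env` and the variable name in the usage example are hypothetical, not the PR's identifiers:

```python
import os

def required_env(name: str) -> str:
    """Look up a required environment variable, failing fast when unset."""
    try:
        return os.environ[name]
    except KeyError:
        # Better to crash at startup than silently invoke a Lambda whose
        # name fell back to a default that may not exist.
        raise RuntimeError(f"Required environment variable {name} is not set")

# Hypothetical usage: fn_name = required_env("SCHEDULE_MANAGEMENT_FN")
```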

Validation and Error Handling

  • 24-hour format validation for time inputs
  • 2-hour minimum scheduling window enforcement
  • Timezone verification
  • Enhanced error handling and logging throughout

Infrastructure Changes

CDK Stack Updates

  • New ScheduleManagementLambda for handling scheduled ASG actions
  • New ScheduleMonitoringLambda for processing CloudWatch events
  • CloudWatch Events rule for ASG instance launch/terminate events
  • Expanded IAM permissions for Auto Scaling operations: PutScheduledUpdateGroupAction, DeleteScheduledAction, DescribeScheduledActions
  • State machine workflow modified to include schedule creation step after stack creation

Constants Refactoring

  • Replaced hardcoded strings with APP_MANAGEMENT_KEY constant across infrastructure files for improved maintainability

Frontend UI Implementation

New Components

  • ScheduleConfig.tsx component providing React form interface for schedule management
  • Support for daily and recurring scheduling with timezone selection
  • Progressive disclosure of schedule options based on user selections

Features

  • Timezone selection from IANA timezone database with fallback support
  • Per-day schedule configuration
  • Comprehensive validation for time formats and scheduling windows
  • New card definition sections for displaying schedule information
  • Utility functions: formatScheduleType() and formatScheduleDetails()

Type Definitions

  • New ScheduleType enum and schedule-related TypeScript types
  • Zod validation schemas for time format (HH:MM), minimum windows, and timezone requirements
  • Integration of scheduling configuration into IAutoScalingConfig

API Integration

  • New useUpdateScheduleMutation hook for schedule-only updates
  • Dedicated schedule API endpoint routing
  • Improved error handling and loading state management

Testing

  • Comprehensive test coverage for new scheduling functionality
  • Tests refactored to use schedule management functions directly
  • Edge case and error scenario coverage
  • ASG integration and LiteLLM lifecycle management tests
  • Exception handling and retry logic validation

All changes maintain backward compatibility through optional scheduling fields.


logger.error(f"Failed to check ASG state: {e}")
raise ValueError(f"Failed to check ASG {asg_name}: {str(e)}")

# Update model status


Inconsistent indentation detected. This line has extra leading whitespace:

Suggested change

```diff
-        # Update model status
+    # Update model status
```

```diff
 try:
     response = model_table.scan(
-        FilterExpression="autoScalingGroup = :asg_name",
+        FilterExpression="auto_scaling_group = :asg_name",
```


The field name was changed from autoScalingGroup to auto_scaling_group. Verify that this matches the actual DynamoDB schema. If the schema uses snake_case, this is correct, but if it uses camelCase, this will cause the scan to fail silently and return no results.

Comment on lines 321 to 335
```diff
 def update_model_status(model_id: str, new_status: ModelStatus, reason: str) -> None:
     """Update model status in DynamoDB"""
     try:
         # Convert enum to string value for DynamoDB
         status_str = new_status.value if hasattr(new_status, "value") else str(new_status)

         model_table.update_item(
             Key={"model_id": model_id},
-            UpdateExpression="SET #status = :status, lastStatusUpdate = :timestamp, statusReason = :reason",
-            ExpressionAttributeNames={"#status": "status"},
+            UpdateExpression="SET model_status = :status, lastStatusUpdate = :timestamp, statusReason = :reason",
             ExpressionAttributeValues={
-                ":status": new_status,
+                ":status": status_str,
                 ":timestamp": datetime.now(dt_timezone.utc).isoformat(),
                 ":reason": reason,
             },
         )
```


Inconsistent indentation detected throughout this function. Lines 321-335 have extra leading whitespace:

Suggested change

```python
def update_model_status(model_id: str, new_status: ModelStatus, reason: str) -> None:
    """Update model status in DynamoDB"""
    try:
        # Convert enum to string value for DynamoDB
        status_str = new_status.value if hasattr(new_status, "value") else str(new_status)
        model_table.update_item(
            Key={"model_id": model_id},
            UpdateExpression="SET model_status = :status, lastStatusUpdate = :timestamp, statusReason = :reason",
            ExpressionAttributeValues={
                ":status": status_str,
                ":timestamp": datetime.now(dt_timezone.utc).isoformat(),
                ":reason": reason,
            },
        )
        logger.info(f"Updated model {model_id} model_status to {status_str}: {reason}")
```

```diff
             Key={"model_id": model_id},
-            UpdateExpression="SET #status = :status, lastStatusUpdate = :timestamp, statusReason = :reason",
-            ExpressionAttributeNames={"#status": "status"},
+            UpdateExpression="SET model_status = :status, lastStatusUpdate = :timestamp, statusReason = :reason",
```


The DynamoDB field name was changed from status to model_status. Verify that this matches the actual DynamoDB schema. The old code used ExpressionAttributeNames to handle the reserved keyword status, but the new field name model_status is not a reserved word. Ensure this field exists in your DynamoDB table schema.

Comment on lines +436 to +550
```python
def register_litellm(model_id: str) -> None:
    """Register model with LiteLLM if missing"""
    try:
        model_key = {"model_id": model_id}
        ddb_item = model_table.get_item(Key=model_key, ConsistentRead=True).get("Item")

        if not ddb_item:
            logger.warning(f"Model {model_id} not found in DynamoDB")
            return

        # Check if already registered
        existing_litellm_id = ddb_item.get("litellm_id")
        if existing_litellm_id:
            logger.info(f"Model {model_id} already registered with LiteLLM")
            return

        model_url = ddb_item.get("model_url")
        if not model_url:
            logger.warning(f"Model {model_id} has no model_url")
            return

        # Initialize LiteLLM client
        secrets_manager = boto3.client("secretsmanager", region_name=os.environ["AWS_REGION"], config=retry_config)
        iam_client = boto3.client("iam", region_name=os.environ["AWS_REGION"], config=retry_config)

        litellm_client = LiteLLMClient(
            base_uri=get_rest_api_container_endpoint(),
            verify=get_cert_path(iam_client),
            headers={
                "Authorization": secrets_manager.get_secret_value(
                    SecretId=os.environ.get("MANAGEMENT_KEY_NAME"), VersionStage="AWSCURRENT"
                )["SecretString"],
                "Content-Type": "application/json",
            },
        )

        litellm_config_str = os.environ.get("LITELLM_CONFIG_OBJ", json.dumps({}))
        try:
            litellm_params = json.loads(litellm_config_str)
            litellm_params = litellm_params.get("litellm_settings", {})
        except json.JSONDecodeError:
            litellm_params = {}

        model_name = ddb_item["model_config"]["modelName"]
        litellm_params["model"] = f"openai/{model_name}"
        litellm_params["api_base"] = model_url

        # Register with LiteLLM
        litellm_response = litellm_client.add_model(
            model_name=model_id,
            litellm_params=litellm_params,
        )

        litellm_id = litellm_response["model_info"]["id"]
        logger.info(f"Registered model {model_id} with LiteLLM: {litellm_id}")

        # Update DynamoDB with new litellm_id
        model_table.update_item(
            Key=model_key,
            UpdateExpression="SET litellm_id = :lid",
            ExpressionAttributeValues={":lid": litellm_id},
        )

    except Exception as e:
        logger.error(f"Failed to register {model_id} with LiteLLM: {e}", exc_info=True)


def remove_litellm(model_id: str) -> None:
    """Remove model from LiteLLM if registered"""
    try:
        model_key = {"model_id": model_id}
        ddb_item = model_table.get_item(Key=model_key, ConsistentRead=True).get("Item")

        if not ddb_item:
            logger.warning(f"Model {model_id} not found in DynamoDB")
            return

        litellm_id = ddb_item.get("litellm_id")
        if not litellm_id:
            logger.info(f"Model {model_id} has no LiteLLM registration to remove")
            return

        try:
            # Initialize LiteLLM client
            secrets_manager = boto3.client("secretsmanager", region_name=os.environ["AWS_REGION"], config=retry_config)
            iam_client = boto3.client("iam", region_name=os.environ["AWS_REGION"], config=retry_config)

            litellm_client = LiteLLMClient(
                base_uri=get_rest_api_container_endpoint(),
                verify=get_cert_path(iam_client),
                headers={
                    "Authorization": secrets_manager.get_secret_value(
                        SecretId=os.environ.get("MANAGEMENT_KEY_NAME"), VersionStage="AWSCURRENT"
                    )["SecretString"],
                    "Content-Type": "application/json",
                },
            )

            # Remove from LiteLLM
            litellm_client.delete_model(identifier=litellm_id)
            logger.info(f"Removed model {model_id} from LiteLLM: {litellm_id}")

            # Clear litellm_id from DynamoDB
            model_table.update_item(
                Key=model_key,
                UpdateExpression="SET litellm_id = :li",
                ExpressionAttributeValues={":li": None},
            )

        except Exception as e:
            logger.error(f"Failed to remove {model_id} from LiteLLM: {e}", exc_info=True)

    except Exception as e:
        logger.error(f"Error in remove_litellm for {model_id}: {e}", exc_info=True)
```

The register_litellm() and remove_litellm() functions are new and contain significant logic for LiteLLM client initialization. Consider extracting the common LiteLLM client initialization code (lines 458-470 and 520-532) into a shared helper function to reduce duplication:

Suggested change

```python
def _get_litellm_client() -> LiteLLMClient:
    """Initialize and return a LiteLLM client"""
    secrets_manager = boto3.client("secretsmanager", region_name=os.environ["AWS_REGION"], config=retry_config)
    iam_client = boto3.client("iam", region_name=os.environ["AWS_REGION"], config=retry_config)
    return LiteLLMClient(
        base_uri=get_rest_api_container_endpoint(),
        verify=get_cert_path(iam_client),
        headers={
            "Authorization": secrets_manager.get_secret_value(
                SecretId=os.environ.get("MANAGEMENT_KEY_NAME"), VersionStage="AWSCURRENT"
            )["SecretString"],
            "Content-Type": "application/json",
        },
    )
```

Both register_litellm() and remove_litellm() would then replace their inline client setup with litellm_client = _get_litellm_client().

Comment on lines +82 to +169
```python
def adjust_initial_capacity_for_schedule(prepared_event: Dict[str, Any]) -> None:
    """Adjust Auto Scaling Group initial capacity based on schedule configuration"""
    try:
        # Check if scheduling is configured
        auto_scaling_config = prepared_event.get("autoScalingConfig", {})
        scheduling_config = auto_scaling_config.get("scheduling")

        if not scheduling_config or not scheduling_config.get("scheduleEnabled"):
            logger.info("No scheduling configured - using original capacity settings")
            return

        schedule_type = scheduling_config.get("scheduleType")
        timezone_name = scheduling_config.get("timezone")

        try:
            tz = ZoneInfo(timezone_name)
            now = datetime.now(tz)
            current_time = now.time()
            current_day = now.strftime("%A").lower()
            is_within_schedule = False

            if schedule_type == "RECURRING" and scheduling_config.get("dailySchedule"):
                # Daily recurring schedule
                daily_schedule = scheduling_config["dailySchedule"]
                start_time_str = daily_schedule.get("startTime")
                stop_time_str = daily_schedule.get("stopTime")

                if start_time_str and stop_time_str:
                    # Parse times
                    start_hour, start_minute = map(int, start_time_str.split(":"))
                    stop_hour, stop_minute = map(int, stop_time_str.split(":"))

                    start_time_obj = datetime.min.time().replace(hour=start_hour, minute=start_minute)
                    stop_time_obj = datetime.min.time().replace(hour=stop_hour, minute=stop_minute)

                    # Check if current time is within the schedule
                    if start_time_obj <= stop_time_obj:
                        # Normal schedule within same day
                        is_within_schedule = start_time_obj <= current_time <= stop_time_obj
                    else:
                        # Schedule crosses midnight
                        is_within_schedule = current_time >= start_time_obj or current_time <= stop_time_obj

            elif schedule_type == "DAILY" and scheduling_config.get("weeklySchedule"):
                # Daily schedule
                weekly_schedule = scheduling_config["weeklySchedule"]
                today_schedule = weekly_schedule.get(current_day)

                if today_schedule and today_schedule.get("startTime") and today_schedule.get("stopTime"):
                    start_time_str = today_schedule["startTime"]
                    stop_time_str = today_schedule["stopTime"]

                    # Parse times
                    start_hour, start_minute = map(int, start_time_str.split(":"))
                    stop_hour, stop_minute = map(int, stop_time_str.split(":"))

                    start_time_obj = datetime.min.time().replace(hour=start_hour, minute=start_minute)
                    stop_time_obj = datetime.min.time().replace(hour=stop_hour, minute=stop_minute)

                    # Check if current time is within the schedule
                    if start_time_obj <= stop_time_obj:
                        # Normal schedule within same day
                        is_within_schedule = start_time_obj <= current_time <= stop_time_obj
                    else:
                        # Schedule crosses midnight
                        is_within_schedule = current_time >= start_time_obj or current_time <= stop_time_obj

            # Adjust capacity based on schedule
            if is_within_schedule:
                logger.info(f"Current time {current_time} ({timezone_name}) is within scheduled hours")
                # Keep original capacity settings
            else:
                logger.info(f"Current time {current_time} ({timezone_name}) is outside scheduled hours")
                # Set desired capacity to 0 for deployment outside scheduled hours
                auto_scaling_config["minCapacity"] = 0
                # Keep maxCapacity at original value - CloudFormation requires maxCapacity > 0
                auto_scaling_config["desiredCapacity"] = 0

        except Exception as time_error:
            logger.error(f"Error processing schedule time logic: {time_error}", exc_info=True)
            # If we can't determine the schedule, use default capacity to be safe
            logger.info("Using original capacity settings due to schedule processing error")

    except Exception as e:
        logger.error(f"Error adjusting initial capacity for schedule: {e}", exc_info=True)
        # If scheduling logic fails, proceed with original capacity settings
        logger.info("Using original capacity settings due to scheduling error")
```


The adjust_initial_capacity_for_schedule() function has several issues:

  1. Time parsing vulnerability: Lines 111-112 and 135-136 use split(":") without validation. If the time format is invalid (e.g., "25:70"), constructing the time object will raise ValueError at runtime. The PR description mentions "24-hour time format validation" but this function doesn't validate the format before parsing.

  2. Roundabout time object creation: Lines 114-115 and 138-139 use datetime.min.time().replace() to build time objects. This works for time-only values, but it is needlessly indirect. Consider using the time() constructor directly:

Suggested change

```diff
-                    start_time_obj = datetime.min.time().replace(hour=start_hour, minute=start_minute)
-                    stop_time_obj = datetime.min.time().replace(hour=stop_hour, minute=stop_minute)
+                    start_time_obj = time(start_hour, start_minute)
+                    stop_time_obj = time(stop_hour, stop_minute)
```
  3. Missing import: The suggested time() constructor isn't imported in this module. Add from datetime import time to the imports.

  4. Inconsistent error handling: The nested try-except blocks (lines 96-163) catch all exceptions broadly. The inner handler at line 160 logs and continues, and the outer handler at line 165 does the same, so it is unclear from the logs which error path was taken. Consider more specific exception handling.

  5. Logic issue with schedule type check: Line 103 checks schedule_type == "RECURRING" but line 125 checks schedule_type == "DAILY". According to the PR changes summary, the enum was renamed from RECURRING_DAILY → RECURRING and EACH_DAY → DAILY. Verify these string values match the frontend enum values being sent.
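The window comparison under discussion can be isolated into a pure function, which makes the midnight-crossing branch easy to unit test. A sketch with hypothetical names, not the PR's code:

```python
from datetime import time

def is_within_window(now: time, start: time, stop: time) -> bool:
    """True if `now` falls inside the window, handling windows that cross midnight."""
    if start <= stop:
        # Window contained in a single day, e.g. 09:00-17:00
        return start <= now <= stop
    # Window crosses midnight, e.g. 22:00-02:00
    return now >= start or now <= stop
```

Extracting this would also let both the RECURRING and DAILY branches share one implementation instead of duplicating the comparison.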

Comment on lines +337 to +338
# Adjust initial capacity based on schedule if scheduling is configured
adjust_initial_capacity_for_schedule(prepared_event)


The call to adjust_initial_capacity_for_schedule() modifies prepared_event in-place. Ensure this function is called at the right point in the workflow—after all necessary configuration is set but before the event is used downstream. Verify that modifying minCapacity and desiredCapacity at this stage doesn't conflict with any subsequent CloudFormation stack creation logic.

Comment on lines +361 to +363
# Remove scheduling configuration from autoScalingConfig before sending to ECS deployer
if "autoScalingConfig" in prepared_event and "scheduling" in prepared_event["autoScalingConfig"]:
del prepared_event["autoScalingConfig"]["scheduling"]


The scheduling configuration is removed from autoScalingConfig before invoking the ECS deployer. This is intentional per the PR description, but ensure the ECS deployer Lambda doesn't expect this field. If it does, this could cause silent failures. Consider adding a log statement to confirm the removal:

Suggested change

```diff
-# Remove scheduling configuration from autoScalingConfig before sending to ECS deployer
 if "autoScalingConfig" in prepared_event and "scheduling" in prepared_event["autoScalingConfig"]:
+    logger.info("Removing scheduling configuration before ECS deployer invocation")
     del prepared_event["autoScalingConfig"]["scheduling"]
```

Comment on lines +372 to +402
```python
try:
    response = lambdaClient.invoke(
        FunctionName=os.environ["ECS_MODEL_DEPLOYER_FN_ARN"],
        Payload=json.dumps({"modelConfig": prepared_event}),
    )

except Exception as invoke_error:
    raise StackFailedToCreateException(
        json.dumps(
            {
                "error": f"Failed to invoke ECS Model Deployer Lambda: {str(invoke_error)}",
                "event": event,
                "invoke_error": str(invoke_error),
            }
        )
    )

try:
    payload = response["Payload"].read()
    payload = json.loads(payload)
except Exception as parse_error:
    raise StackFailedToCreateException(
        json.dumps(
            {
                "error": f"Failed to parse ECS Model Deployer response: {str(parse_error)}",
                "event": event,
                "raw_response": str(response["Payload"].read()),
                "parse_error": str(parse_error),
            }
        )
    )
```

The error handling for the ECS deployer invocation has been improved with try-except blocks, but there's a potential issue: at line 398, response["Payload"].read() is called again after already being read at line 390. The stream is consumed after the first read, so the second read will return an empty result. This will cause the error message to show an empty string. Remove the second read:

Suggested change
try:
response = lambdaClient.invoke(
FunctionName=os.environ["ECS_MODEL_DEPLOYER_FN_ARN"],
Payload=json.dumps({"modelConfig": prepared_event}),
)
except Exception as invoke_error:
raise StackFailedToCreateException(
json.dumps(
{
"error": f"Failed to invoke ECS Model Deployer Lambda: {str(invoke_error)}",
"event": event,
"invoke_error": str(invoke_error),
}
)
)
try:
payload = response["Payload"].read()
payload = json.loads(payload)
except Exception as parse_error:
raise StackFailedToCreateException(
json.dumps(
{
"error": f"Failed to parse ECS Model Deployer response: {str(parse_error)}",
"event": event,
"raw_response": str(response["Payload"].read()),
"parse_error": str(parse_error),
}
)
)
except Exception as parse_error:
raise StackFailedToCreateException(
json.dumps(
{
"error": f"Failed to parse ECS Model Deployer response: {str(parse_error)}",
"event": event,
"raw_response": payload,
"parse_error": str(parse_error),
}
)
)
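The read-once behavior of the response stream can be demonstrated with a plain in-memory stream; botocore's StreamingBody behaves the same way for sequential reads:

```python
import io

# Simulate the Lambda response payload stream.
stream = io.BytesIO(b'{"stackName": "demo"}')

first = stream.read()   # consumes the entire stream
second = stream.read()  # stream is exhausted; returns b''

print(first)   # b'{"stackName": "demo"}'
print(second)  # b''
```

This is why the payload bytes should be read exactly once (e.g. `payload_bytes = response["Payload"].read()`) and reused in both the parsing path and the error message.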

Comment on lines 404 to +421
stack_name = payload.get("stackName", None)

if not stack_name:
# Log the full payload for debugging
logger.error(f"ECS Model Deployer response: {payload}")
error_message = payload.get("errorMessage", "Unknown error")
error_type = payload.get("errorType", "Unknown error type")

error_message = payload.get("errorMessage")
error_type = payload.get("errorType")
trace = payload.get("trace", [])
raise StackFailedToCreateException(
json.dumps(
{
"error": f"Failed to create Model CloudFormation Stack. {error_type}: {error_message}",
"event": event,
"deployer_response": payload,
"debug_info": {
"error_type": error_type,
"error_message": error_message,
"stack_trace": trace,
"full_response": payload,
},

Indentation is inconsistent in the new hunk. Lines 404-410 have incorrect indentation (extra spaces). Ensure proper Python indentation alignment with the surrounding code.

from decimal import Decimal
from functools import cache
from typing import Any, Callable, cast, Dict, Optional, TypeVar, Union
from typing import Any, Callable, cast, Dict, List, Optional, TypeVar, Union

The import statement adds List to the typing imports. Verify that List is actually used in this module; if not, remove it to keep imports minimal. If it is used, consider the built-in list type hint instead, which is the preferred style on Python 3.9+.
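The distinction is small but worth illustrating; on Python 3.9+ the built-in types are directly parameterizable, so `typing.List` is only needed for older runtimes:

```python
from typing import List  # only needed on Python < 3.9

def names_legacy(items: List[str]) -> List[str]:
    return sorted(items)

def names_modern(items: list[str]) -> list[str]:  # preferred on 3.9+
    return sorted(items)

print(names_modern(["b", "a"]))  # ['a', 'b']
```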


response = lambda_client.invoke(
FunctionName=os.environ.get("SCHEDULE_MANAGEMENT_FUNCTION_NAME", "LISA-ScheduleManagement"),
FunctionName=os.environ.get("SCHEDULE_MANAGEMENT_FUNCTION_NAME"),

Missing default fallback value for SCHEDULE_MANAGEMENT_FUNCTION_NAME environment variable. If this environment variable is not set, os.environ.get() will return None, causing the Lambda invocation to fail. Consider adding a sensible default value or ensuring this variable is always defined in the deployment configuration.

Suggested change
FunctionName=os.environ.get("SCHEDULE_MANAGEMENT_FUNCTION_NAME"),
FunctionName=os.environ.get("SCHEDULE_MANAGEMENT_FUNCTION_NAME", "LISA-ScheduleManagement"),


response = lambda_client.invoke(
FunctionName=os.environ.get("SCHEDULE_MANAGEMENT_FUNCTION_NAME", "LISA-ScheduleManagement"),
FunctionName=os.environ.get("SCHEDULE_MANAGEMENT_FUNCTION_NAME"),

Missing default fallback value for SCHEDULE_MANAGEMENT_FUNCTION_NAME environment variable. If this environment variable is not set, os.environ.get() will return None, causing the Lambda invocation to fail. Consider adding a sensible default value or ensuring this variable is always defined in the deployment configuration.

Suggested change
FunctionName=os.environ.get("SCHEDULE_MANAGEMENT_FUNCTION_NAME"),
FunctionName=os.environ.get("SCHEDULE_MANAGEMENT_FUNCTION_NAME", "LISA-ScheduleManagement"),

Comment on lines +175 to +192
const scheduleMonitoringLambda = new Function(this, 'ScheduleMonitoring', {
runtime: getDefaultRuntime(),
handler: 'models.scheduling.schedule_monitoring.lambda_handler',
code: Code.fromAsset(lambdaPath),
layers: lambdaLayers,
environment: {
MODEL_TABLE_NAME: modelTable.tableName,
ECS_CLUSTER_NAME: `${config.deploymentPrefix}-ECS-Cluster`,
LISA_API_URL_PS_NAME: lisaServeEndpointUrlPs.parameterName,
MANAGEMENT_KEY_NAME: managementKeyName,
REST_API_VERSION: 'v2',
},
role: stateMachinesLambdaRole,
vpc: vpc.vpc,
securityGroups: securityGroups,
timeout: Duration.minutes(5),
description: 'Processes Auto Scaling Group CloudWatch events to update model status',
});

The scheduleMonitoringLambda environment variable ECS_CLUSTER_NAME is set to ${config.deploymentPrefix}-ECS-Cluster, but according to the PR description, the monitoring logic has been refactored to use Auto Scaling Group state checking instead of ECS-based status tracking. Verify that this environment variable is still needed or remove it if it is no longer used.
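If the monitoring Lambda does derive model status from ASG state, the core logic reduces to a capacity comparison. The sketch below is a hypothetical illustration: the status names and thresholds are assumptions, and in the real handler the inputs would come from `autoscaling.describe_auto_scaling_groups(...)` rather than plain integers:

```python
# Hypothetical mapping from ASG capacity to a model status; the status
# names (Stopped/InService/Updating) are assumptions for illustration.
def model_status_from_asg(desired_capacity: int, in_service: int) -> str:
    if desired_capacity == 0:
        return "Stopped"
    if in_service >= desired_capacity:
        return "InService"
    return "Updating"

# With boto3, desired_capacity and in_service would be derived from
# describe_auto_scaling_groups(...)["AutoScalingGroups"][0].
print(model_status_from_asg(2, 2))  # InService
print(model_status_from_asg(2, 0))  # Updating
print(model_status_from_asg(0, 0))  # Stopped
```

If this is all the monitor needs, ECS_CLUSTER_NAME can likely be dropped from the environment.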

Comment on lines +476 to +510
public addScheduleAwareService (
taskName: ECSTasks,
taskDefinition: TaskDefinition,
scheduleConfig?: {
scheduleEnabled: boolean;
scheduleType: string;
timezone: string;
}
): { service: Ec2Service; targetGroup?: ApplicationTargetGroup } {
const result = this.addTask(taskName, taskDefinition);
const { service } = result;

// Add schedule-related tags to the service
if (scheduleConfig?.scheduleEnabled) {
Tags.of(service).add('ScheduleEnabled', 'true');
Tags.of(service).add('ScheduleType', scheduleConfig.scheduleType);
Tags.of(service).add('Timezone', scheduleConfig.timezone);
Tags.of(service).add('ScheduleManaged', 'true');
} else {
Tags.of(service).add('ScheduleEnabled', 'false');
Tags.of(service).add('RunsAllTime', 'true');
}

// Add schedule-related environment variables
if (scheduleConfig?.scheduleEnabled) {
const container = service.taskDefinition.findContainer(createCdkId([taskName, 'Container']));
if (container) {
container.addEnvironment('SCHEDULE_ENABLED', 'true');
container.addEnvironment('SCHEDULE_TYPE', scheduleConfig.scheduleType);
container.addEnvironment('SCHEDULE_TIMEZONE', scheduleConfig.timezone);
}
}

return result;
}

The addScheduleAwareService() method adds schedule-related tags and environment variables to services, but there's a potential issue with the implementation:

  1. The method modifies the task definition by adding environment variables directly to the container (lines 501-506). This happens after the service is created via this.addTask(), which may have already started the task. Consider whether these environment variables should be set before service creation.

  2. The method searches for a container using createCdkId([taskName, 'Container']) (line 501), but there's no validation that this container exists or that the naming convention matches. If the container isn't found, the environment variables silently fail to be added. Consider adding explicit error handling or logging.

Suggested change
public addScheduleAwareService (
taskName: ECSTasks,
taskDefinition: TaskDefinition,
scheduleConfig?: {
scheduleEnabled: boolean;
scheduleType: string;
timezone: string;
}
): { service: Ec2Service; targetGroup?: ApplicationTargetGroup } {
const result = this.addTask(taskName, taskDefinition);
const { service } = result;
// Add schedule-related tags to the service
if (scheduleConfig?.scheduleEnabled) {
Tags.of(service).add('ScheduleEnabled', 'true');
Tags.of(service).add('ScheduleType', scheduleConfig.scheduleType);
Tags.of(service).add('Timezone', scheduleConfig.timezone);
Tags.of(service).add('ScheduleManaged', 'true');
} else {
Tags.of(service).add('ScheduleEnabled', 'false');
Tags.of(service).add('RunsAllTime', 'true');
}
// Add schedule-related environment variables
if (scheduleConfig?.scheduleEnabled) {
const container = service.taskDefinition.findContainer(createCdkId([taskName, 'Container']));
if (container) {
container.addEnvironment('SCHEDULE_ENABLED', 'true');
container.addEnvironment('SCHEDULE_TYPE', scheduleConfig.scheduleType);
container.addEnvironment('SCHEDULE_TIMEZONE', scheduleConfig.timezone);
}
}
return result;
}
public addScheduleAwareService (
taskName: ECSTasks,
taskDefinition: TaskDefinition,
scheduleConfig?: {
scheduleEnabled: boolean;
scheduleType: string;
timezone: string;
}
): { service: Ec2Service; targetGroup?: ApplicationTargetGroup } {
// Add schedule-related environment variables before service creation
if (scheduleConfig?.scheduleEnabled) {
const container = taskDefinition.findContainer(createCdkId([taskName, 'Container']));
if (container) {
container.addEnvironment('SCHEDULE_ENABLED', 'true');
container.addEnvironment('SCHEDULE_TYPE', scheduleConfig.scheduleType);
container.addEnvironment('SCHEDULE_TIMEZONE', scheduleConfig.timezone);
}
}
const result = this.addTask(taskName, taskDefinition);
const { service } = result;
// Add schedule-related tags to the service
if (scheduleConfig?.scheduleEnabled) {
Tags.of(service).add('ScheduleEnabled', 'true');
Tags.of(service).add('ScheduleType', scheduleConfig.scheduleType);
Tags.of(service).add('Timezone', scheduleConfig.timezone);
Tags.of(service).add('ScheduleManaged', 'true');
} else {
Tags.of(service).add('ScheduleEnabled', 'false');
Tags.of(service).add('RunsAllTime', 'true');
}
return result;
}

Comment on lines +434 to +457
public createScheduledAction (
actionName: string,
schedule: string,
minSize?: number,
maxSize?: number,
desiredCapacity?: number,
timezone?: string
): CfnScheduledAction {
const scheduledAction = new CfnScheduledAction(this, createCdkId([this.identifier, actionName, 'ScheduledAction']), {
autoScalingGroupName: this.autoScalingGroup.autoScalingGroupName,
recurrence: schedule,
...(minSize !== undefined && { minSize }),
...(maxSize !== undefined && { maxSize }),
...(desiredCapacity !== undefined && { desiredCapacity }),
...(timezone && { timeZone: timezone })
});

// Add tags to track the scheduled action
Tags.of(scheduledAction).add('ActionType', 'Schedule');
Tags.of(scheduledAction).add('LISACluster', this.identifier);
Tags.of(scheduledAction).add('CreatedBy', 'LISA-ScheduleManagement');

return scheduledAction;
}

The createScheduledAction() method uses conditional spread operators to optionally include capacity parameters (lines 445-447). However, a scheduled action is only meaningful if at least one of minSize, maxSize, or desiredCapacity is specified; with all three omitted, the action changes nothing. Consider adding validation to ensure at least one capacity parameter is provided:

Suggested change
public createScheduledAction (
actionName: string,
schedule: string,
minSize?: number,
maxSize?: number,
desiredCapacity?: number,
timezone?: string
): CfnScheduledAction {
const scheduledAction = new CfnScheduledAction(this, createCdkId([this.identifier, actionName, 'ScheduledAction']), {
autoScalingGroupName: this.autoScalingGroup.autoScalingGroupName,
recurrence: schedule,
...(minSize !== undefined && { minSize }),
...(maxSize !== undefined && { maxSize }),
...(desiredCapacity !== undefined && { desiredCapacity }),
...(timezone && { timeZone: timezone })
});
// Add tags to track the scheduled action
Tags.of(scheduledAction).add('ActionType', 'Schedule');
Tags.of(scheduledAction).add('LISACluster', this.identifier);
Tags.of(scheduledAction).add('CreatedBy', 'LISA-ScheduleManagement');
return scheduledAction;
}
public createScheduledAction (
actionName: string,
schedule: string,
minSize?: number,
maxSize?: number,
desiredCapacity?: number,
timezone?: string
): CfnScheduledAction {
if (minSize === undefined && maxSize === undefined && desiredCapacity === undefined) {
throw new Error('At least one of minSize, maxSize, or desiredCapacity must be specified for a scheduled action');
}
const scheduledAction = new CfnScheduledAction(this, createCdkId([this.identifier, actionName, 'ScheduledAction']), {
autoScalingGroupName: this.autoScalingGroup.autoScalingGroupName,
recurrence: schedule,
...(minSize !== undefined && { minSize }),
...(maxSize !== undefined && { maxSize }),
...(desiredCapacity !== undefined && { desiredCapacity }),
...(timezone && { timeZone: timezone })
});
// Add tags to track the scheduled action
Tags.of(scheduledAction).add('ActionType', 'Schedule');
Tags.of(scheduledAction).add('LISACluster', this.identifier);
Tags.of(scheduledAction).add('CreatedBy', 'LISA-ScheduleManagement');
return scheduledAction;
}

Comment on lines +136 to +144
try:
result = schedule_management.delete_schedule(payload)

result = json.loads(response["Payload"].read())
if result.get("statusCode") != 200:
error_message = result.get("body", {}).get("message", "Unknown error")
raise ValueError(f"Failed to delete schedule: {error_message}")

if response["StatusCode"] != 200 or result.get("statusCode") != 200:
error_message = result.get("body", {}).get("message", "Unknown error")
raise ValueError(f"Failed to delete schedule: {error_message}")
except Exception as e:
raise ValueError(f"Failed to delete schedule: {str(e)}")

Similar to the UpdateScheduleHandler, verify that schedule_management.delete_schedule() returns a dict with a statusCode key. The error handling assumes this structure, but if the function raises exceptions instead, the try-except should be adjusted accordingly.

import boto3
from botocore.config import Config

from ..scheduling import schedule_management

The import of the schedule_management module is added here, and the old lambda_client initialization has been removed. Ensure that the schedule_management module is available in the deployment package and that its functions (update_schedule, delete_schedule) correctly handle the payloads being passed.

Comment on lines +58 to +69
# Call schedule management function directly
payload = {
"operation": "update",
"modelId": model_id,
"scheduleConfig": scheduling_config,
"autoScalingGroup": auto_scaling_group,
}

response = lambda_client.invoke(
FunctionName=os.environ.get("SCHEDULE_MANAGEMENT_FUNCTION_NAME"),
InvocationType="RequestResponse",
Payload=json.dumps(payload),
)

result = json.loads(response["Payload"].read())
result = schedule_management.update_schedule(payload)

if result.get("statusCode") == 200:
result_body = json.loads(result["body"])
result_body = json.loads(result["body"]) if isinstance(result["body"], str) else result["body"]

The refactoring changes from Lambda invocation to direct function calls. However, there's an inconsistency in result handling:

  • Line 69 checks if result.get("statusCode") == 200:, but a direct function call may not return a dict with a statusCode key
  • The body parsing that follows assumes the result matches the Lambda response envelope format

Verify that schedule_management.update_schedule() returns a dict with statusCode and body keys, or adjust the result handling accordingly.
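One defensive option is to normalize the result once, accepting either shape. Both return shapes below are assumptions about schedule_management's contract, not confirmed behavior:

```python
import json

def normalize_schedule_result(result):
    """Accept a Lambda-style {'statusCode', 'body'} envelope or a plain dict."""
    if isinstance(result, dict) and "statusCode" in result:
        if result["statusCode"] != 200:
            raise ValueError(f"Schedule operation failed: {result.get('body')}")
        body = result.get("body", {})
        return json.loads(body) if isinstance(body, str) else body
    return result  # already a plain payload

print(normalize_schedule_result({"statusCode": 200, "body": '{"ok": true}'}))  # {'ok': True}
print(normalize_schedule_result({"scheduleArns": []}))  # {'scheduleArns': []}
```

This keeps the handlers working whether update_schedule/delete_schedule keep the Lambda envelope or are refactored to return plain dicts.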

Comment on lines 116 to +126
payload = {
"operation": "update",
"modelId": model_id,
"scheduleConfig": new_scheduling_config,
"autoScalingGroup": auto_scaling_group,
}

response = lambda_client.invoke(
FunctionName=os.environ.get("SCHEDULE_MANAGEMENT_FUNCTION_NAME", "LISA-ScheduleManagement"),
InvocationType="RequestResponse",
Payload=json.dumps(payload),
)

result = json.loads(response["Payload"].read())
result = schedule_management.update_schedule(payload)

if result.get("statusCode") == 200:
result_body = json.loads(result["body"])
result_body = json.loads(result["body"]) if isinstance(result["body"], str) else result["body"]

Same issue as the previous hunk: the result handling assumes a Lambda response format with statusCode and body keys. Ensure schedule_management.update_schedule() returns the expected structure, or update the result parsing logic to match the actual return type.

Comment on lines 161 to +163
payload = {"operation": "delete", "modelId": model_id}

response = lambda_client.invoke(
FunctionName=os.environ.get("SCHEDULE_MANAGEMENT_FUNCTION_NAME", "LISA-ScheduleManagement"),
InvocationType="RequestResponse",
Payload=json.dumps(payload),
)

result = json.loads(response["Payload"].read())
result = schedule_management.delete_schedule(payload)

The result handling on line 164 (not shown but implied) checks result.get("statusCode") == 200:. Verify that schedule_management.delete_schedule() returns a dict with a statusCode key, or adjust the result handling to match the actual return type of the direct function call.

Comment on lines +194 to +208
// Add permission for state machine lambdas to invoke the ScheduleManagement lambda
const scheduleManagementPermission = new Policy(this, 'ScheduleManagementInvokePerms', {
statements: [
new PolicyStatement({
effect: Effect.ALLOW,
actions: [
'lambda:InvokeFunction',
],
resources: [
scheduleManagementLambda.functionArn,
],
})
]
});
stateMachinesLambdaRole.attachInlinePolicy(scheduleManagementPermission);

The inline policy ScheduleManagementInvokePerms grants permission to invoke scheduleManagementLambda, but there's no corresponding permission for the CloudWatch Events rule to invoke scheduleMonitoringLambda. Add a Lambda invoke permission for the CloudWatch Events rule:

Suggested change
// Add permission for state machine lambdas to invoke the ScheduleManagement lambda
const scheduleManagementPermission = new Policy(this, 'ScheduleManagementInvokePerms', {
statements: [
new PolicyStatement({
effect: Effect.ALLOW,
actions: [
'lambda:InvokeFunction',
],
resources: [
scheduleManagementLambda.functionArn,
],
})
]
});
stateMachinesLambdaRole.attachInlinePolicy(scheduleManagementPermission);
// Add permission for state machine lambdas to invoke the ScheduleManagement lambda
const scheduleManagementPermission = new Policy(this, 'ScheduleManagementInvokePerms', {
statements: [
new PolicyStatement({
effect: Effect.ALLOW,
actions: [
'lambda:InvokeFunction',
],
resources: [
scheduleManagementLambda.functionArn,
],
})
]
});
stateMachinesLambdaRole.attachInlinePolicy(scheduleManagementPermission);
scheduleMonitoringLambda.grantInvoke(new ServicePrincipal('events.amazonaws.com'));

Comment on lines +259 to +284
// Check if this is a scheduling-only update
const isSchedulingOnlyUpdate = updateFields.autoScalingConfig?.scheduling &&
Object.keys(updateFields).length === 2 && // modelId + autoScalingConfig
Object.keys(updateFields.autoScalingConfig).length === 1 && // only scheduling
Object.keys(updateFields.autoScalingConfig)[0] === 'scheduling';

if (isSchedulingOnlyUpdate) {
// Use separate schedule API for scheduling-only updates
resetScheduleUpdate();
updateScheduleMutation({
modelId: props.selectedItems[0].modelId,
scheduleConfig: state.form.autoScalingConfig.scheduling
});
} else {
// Handle autoScalingConfig if present (non-scheduling changes)
if (updateFields.autoScalingConfig) {
// Only pick instance-specific fields for autoScalingInstanceConfig
const instanceConfigFields = ['minCapacity', 'maxCapacity', 'desiredCapacity', 'cooldown', 'defaultInstanceWarmup'];
const autoScalingInstanceConfig = _.pick(updateFields.autoScalingConfig, instanceConfigFields);
const filteredInstanceConfig = _.pickBy(autoScalingInstanceConfig, (value) => value !== undefined);

// Only include autoScalingInstanceConfig if it has at least one instance-specific property
if (!_.isEmpty(filteredInstanceConfig)) {
updateRequest.autoScalingInstanceConfig = filteredInstanceConfig;
}
}

The logic for detecting scheduling-only updates has a potential issue. The condition checks if updateFields.autoScalingConfig has only a scheduling key, but this doesn't account for cases where other fields in autoScalingConfig might be undefined or null. Consider using a more robust check:

Suggested change
// Check if this is a scheduling-only update
const isSchedulingOnlyUpdate = updateFields.autoScalingConfig?.scheduling &&
Object.keys(updateFields).length === 2 && // modelId + autoScalingConfig
Object.keys(updateFields.autoScalingConfig).length === 1 && // only scheduling
Object.keys(updateFields.autoScalingConfig)[0] === 'scheduling';
if (isSchedulingOnlyUpdate) {
// Use separate schedule API for scheduling-only updates
resetScheduleUpdate();
updateScheduleMutation({
modelId: props.selectedItems[0].modelId,
scheduleConfig: state.form.autoScalingConfig.scheduling
});
} else {
// Handle autoScalingConfig if present (non-scheduling changes)
if (updateFields.autoScalingConfig) {
// Only pick instance-specific fields for autoScalingInstanceConfig
const instanceConfigFields = ['minCapacity', 'maxCapacity', 'desiredCapacity', 'cooldown', 'defaultInstanceWarmup'];
const autoScalingInstanceConfig = _.pick(updateFields.autoScalingConfig, instanceConfigFields);
const filteredInstanceConfig = _.pickBy(autoScalingInstanceConfig, (value) => value !== undefined);
// Only include autoScalingInstanceConfig if it has at least one instance-specific property
if (!_.isEmpty(filteredInstanceConfig)) {
updateRequest.autoScalingInstanceConfig = filteredInstanceConfig;
}
}
// Check if this is a scheduling-only update
const isSchedulingOnlyUpdate = updateFields.autoScalingConfig?.scheduling &&
Object.keys(updateFields).length === 2 && // modelId + autoScalingConfig
Object.keys(updateFields.autoScalingConfig).filter(key => updateFields.autoScalingConfig[key] !== undefined).length === 1 && // only scheduling
updateFields.autoScalingConfig.scheduling !== undefined;

Alternatively, explicitly check that other ASG fields are undefined:

Suggested change
// Check if this is a scheduling-only update
const isSchedulingOnlyUpdate = updateFields.autoScalingConfig?.scheduling &&
Object.keys(updateFields).length === 2 && // modelId + autoScalingConfig
Object.keys(updateFields.autoScalingConfig).length === 1 && // only scheduling
Object.keys(updateFields.autoScalingConfig)[0] === 'scheduling';
if (isSchedulingOnlyUpdate) {
// Use separate schedule API for scheduling-only updates
resetScheduleUpdate();
updateScheduleMutation({
modelId: props.selectedItems[0].modelId,
scheduleConfig: state.form.autoScalingConfig.scheduling
});
} else {
// Handle autoScalingConfig if present (non-scheduling changes)
if (updateFields.autoScalingConfig) {
// Only pick instance-specific fields for autoScalingInstanceConfig
const instanceConfigFields = ['minCapacity', 'maxCapacity', 'desiredCapacity', 'cooldown', 'defaultInstanceWarmup'];
const autoScalingInstanceConfig = _.pick(updateFields.autoScalingConfig, instanceConfigFields);
const filteredInstanceConfig = _.pickBy(autoScalingInstanceConfig, (value) => value !== undefined);
// Only include autoScalingInstanceConfig if it has at least one instance-specific property
if (!_.isEmpty(filteredInstanceConfig)) {
updateRequest.autoScalingInstanceConfig = filteredInstanceConfig;
}
}
// Check if this is a scheduling-only update
const isSchedulingOnlyUpdate = updateFields.autoScalingConfig?.scheduling &&
Object.keys(updateFields).length === 2 && // modelId + autoScalingConfig
updateFields.autoScalingConfig.minCapacity === undefined &&
updateFields.autoScalingConfig.maxCapacity === undefined &&
updateFields.autoScalingConfig.desiredCapacity === undefined &&
updateFields.autoScalingConfig.cooldown === undefined &&
updateFields.autoScalingConfig.defaultInstanceWarmup === undefined;

Comment on lines +349 to +358
useEffect(() => {
if (!isScheduleUpdating && isScheduleUpdateSuccess) {
notificationService.generateNotification(`Successfully updated schedule: ${state.form.modelId}`, 'success');
props.setVisible(false);
props.setIsEdit(false);
props.setSelectedItems([]);
resetState();
}
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [isScheduleUpdating, isScheduleUpdateSuccess]);

The useEffect hook for schedule update success references values that are missing from its dependency array (notificationService, props, and resetState, among others). While the eslint-disable comment suppresses the warning, this could lead to stale closures. Consider adding the referenced values to the dependency array or ensure they're properly memoized:

Suggested change
useEffect(() => {
if (!isScheduleUpdating && isScheduleUpdateSuccess) {
notificationService.generateNotification(`Successfully updated schedule: ${state.form.modelId}`, 'success');
props.setVisible(false);
props.setIsEdit(false);
props.setSelectedItems([]);
resetState();
}
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [isScheduleUpdating, isScheduleUpdateSuccess]);
useEffect(() => {
if (!isScheduleUpdating && isScheduleUpdateSuccess) {
notificationService.generateNotification(`Successfully updated schedule: ${state.form.modelId}`, 'success');
props.setVisible(false);
props.setIsEdit(false);
props.setSelectedItems([]);
resetState();
}
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [isScheduleUpdating, isScheduleUpdateSuccess, notificationService]);

Comment on lines +361 to +364
const reviewError = normalizeError('Model',
isCreateError ? createError :
isUpdateError ? updateError :
isScheduleUpdateError ? scheduleUpdateError : undefined);

The error normalization logic now includes scheduleUpdateError, but there's no corresponding error handling UI or user feedback mechanism shown for schedule update failures. Ensure that scheduleUpdateError is properly displayed to the user when a schedule update fails, similar to how other errors are handled.

Comment on lines +167 to +173
model_config = model_item.get("model_config", {})
auto_scaling_config = model_config.get("autoScalingConfig", {})
scheduling_config = auto_scaling_config.get("scheduling", {})

# Calculate next scheduled action if schedule is active
next_action = None
if scheduling_config.get("scheduleEnabled", False):
next_action = calculate_next_scheduled_action(scheduling_config)

return {
"statusCode": 200,
"body": json.dumps(
{"modelId": model_id, "scheduling": scheduling_config, "nextScheduledAction": next_action}, default=str
),
"body": json.dumps({"modelId": model_id, "scheduling": scheduling_config}, default=str),

The indentation is inconsistent in this block. Lines 169 and 171-173 have extra leading spaces that don't align with the surrounding code structure. This should be corrected to maintain consistent indentation.

Suggested change
model_config = model_item.get("model_config", {})
auto_scaling_config = model_config.get("autoScalingConfig", {})
scheduling_config = auto_scaling_config.get("scheduling", {})
# Calculate next scheduled action if schedule is active
next_action = None
if scheduling_config.get("scheduleEnabled", False):
next_action = calculate_next_scheduled_action(scheduling_config)
return {
"statusCode": 200,
"body": json.dumps(
{"modelId": model_id, "scheduling": scheduling_config, "nextScheduledAction": next_action}, default=str
),
"body": json.dumps({"modelId": model_id, "scheduling": scheduling_config}, default=str),
model_item = response["Item"]
model_config = model_item.get("model_config", {})
auto_scaling_config = model_config.get("autoScalingConfig", {})
scheduling_config = auto_scaling_config.get("scheduling", {})
return {
"statusCode": 200,
"body": json.dumps({"modelId": model_id, "scheduling": scheduling_config}, default=str),
}

Comment on lines +185 to +199
if schedule_config.scheduleType == ScheduleType.RECURRING:
# Create daily recurring schedule
if not schedule_config.dailySchedule:
raise ValueError("dailySchedule required for RECURRING_DAILY type")
raise ValueError("dailySchedule required for RECURRING type")

scheduled_action_arns.extend(
create_daily_scheduled_actions(
model_id, auto_scaling_group, schedule_config.dailySchedule, schedule_config.timezone
)
)

elif schedule_config.scheduleType == ScheduleType.EACH_DAY:
elif schedule_config.scheduleType == ScheduleType.DAILY:
# Create individual day schedules
if not schedule_config.weeklySchedule:
raise ValueError("weeklySchedule required for EACH_DAY type")
raise ValueError("weeklySchedule required for DAILY type")

The indentation is inconsistent in this block. Lines 186-188 and 197-199 have extra leading spaces. Note also that the type-to-field mapping is easy to misread: per the PR description, RECURRING means the same time every day (hence dailySchedule), while DAILY means different times per day (hence weeklySchedule), so the mapping shown here is consistent with the feature list even though the names suggest the opposite at first glance.

Suggested change
if schedule_config.scheduleType == ScheduleType.RECURRING:
# Create daily recurring schedule
if not schedule_config.dailySchedule:
raise ValueError("dailySchedule required for RECURRING_DAILY type")
raise ValueError("dailySchedule required for RECURRING type")
scheduled_action_arns.extend(
create_daily_scheduled_actions(
model_id, auto_scaling_group, schedule_config.dailySchedule, schedule_config.timezone
)
)
elif schedule_config.scheduleType == ScheduleType.EACH_DAY:
elif schedule_config.scheduleType == ScheduleType.DAILY:
# Create individual day schedules
if not schedule_config.weeklySchedule:
raise ValueError("weeklySchedule required for EACH_DAY type")
raise ValueError("weeklySchedule required for DAILY type")
if schedule_config.scheduleType == ScheduleType.RECURRING:
# Create daily recurring schedule
if not schedule_config.dailySchedule:
raise ValueError("dailySchedule required for RECURRING type")
scheduled_action_arns.extend(
create_daily_scheduled_actions(
model_id, auto_scaling_group, schedule_config.dailySchedule, schedule_config.timezone
)
)
elif schedule_config.scheduleType == ScheduleType.DAILY:
# Create individual day schedules
if not schedule_config.weeklySchedule:
raise ValueError("weeklySchedule required for DAILY type")

Comment on lines 289 to +298
# Create start action
start_cron = convert_to_utc_cron(day_schedule.startTime, timezone_name)
start_cron = time_to_cron(day_schedule.startTime)
start_action_name = f"{model_id}-daily-start"

try:
autoscaling_client.put_scheduled_update_group_action(
AutoScalingGroupName=auto_scaling_group,
ScheduledActionName=start_action_name,
Recurrence=start_cron,
TimeZone=timezone_name,

The indentation is inconsistent in this block. Lines 289-291 and 293 have extra leading spaces that don't align with the surrounding code.

Suggested change
# Create start action
start_cron = convert_to_utc_cron(day_schedule.startTime, timezone_name)
start_cron = time_to_cron(day_schedule.startTime)
start_action_name = f"{model_id}-daily-start"
try:
autoscaling_client.put_scheduled_update_group_action(
AutoScalingGroupName=auto_scaling_group,
ScheduledActionName=start_action_name,
Recurrence=start_cron,
TimeZone=timezone_name,
# Create start action
start_cron = time_to_cron(day_schedule.startTime)
start_action_name = f"{model_id}-daily-start"
try:
    autoscaling_client.put_scheduled_update_group_action(
        AutoScalingGroupName=auto_scaling_group,
        ScheduledActionName=start_action_name,
        Recurrence=start_cron,
        TimeZone=timezone_name,

Comment on lines 314 to +321
stop_action_name = f"{model_id}-daily-stop"

try:
autoscaling_client.put_scheduled_update_group_action(
AutoScalingGroupName=auto_scaling_group,
ScheduledActionName=stop_action_name,
Recurrence=stop_cron,
TimeZone=timezone_name,

The indentation is inconsistent in this block. Lines 314-315 and 316 have extra leading spaces that don't align with the surrounding code.

Suggested change
stop_action_name = f"{model_id}-daily-stop"
try:
autoscaling_client.put_scheduled_update_group_action(
AutoScalingGroupName=auto_scaling_group,
ScheduledActionName=stop_action_name,
Recurrence=stop_cron,
TimeZone=timezone_name,
# Create stop action
stop_cron = time_to_cron(day_schedule.stopTime)
stop_action_name = f"{model_id}-daily-stop"
try:
    autoscaling_client.put_scheduled_update_group_action(
        AutoScalingGroupName=auto_scaling_group,
        ScheduledActionName=stop_action_name,
        Recurrence=stop_cron,
        TimeZone=timezone_name,

Comment on lines 370 to +377
start_action_name = f"{model_id}-{day_name}-start"

try:
autoscaling_client.put_scheduled_update_group_action(
AutoScalingGroupName=auto_scaling_group,
ScheduledActionName=start_action_name,
Recurrence=start_cron,
TimeZone=timezone_name,

The indentation is inconsistent in this block. Lines 370-371 and 372 have extra leading spaces that don't align with the surrounding code.

Suggested change
start_action_name = f"{model_id}-{day_name}-start"
try:
autoscaling_client.put_scheduled_update_group_action(
AutoScalingGroupName=auto_scaling_group,
ScheduledActionName=start_action_name,
Recurrence=start_cron,
TimeZone=timezone_name,
# Create start action for this day
start_cron = time_to_cron_with_day(day_schedule.startTime, day_num)
start_action_name = f"{model_id}-{day_name}-start"
try:
    autoscaling_client.put_scheduled_update_group_action(
        AutoScalingGroupName=auto_scaling_group,
        ScheduledActionName=start_action_name,
        Recurrence=start_cron,
        TimeZone=timezone_name,

Comment on lines 394 to +401
stop_action_name = f"{model_id}-{day_name}-stop"

try:
autoscaling_client.put_scheduled_update_group_action(
AutoScalingGroupName=auto_scaling_group,
ScheduledActionName=stop_action_name,
Recurrence=stop_cron,
TimeZone=timezone_name,

The indentation is inconsistent in this block. Lines 394-395 and 396 have extra leading spaces that don't align with the surrounding code.

Suggested change
stop_action_name = f"{model_id}-{day_name}-stop"
try:
autoscaling_client.put_scheduled_update_group_action(
AutoScalingGroupName=auto_scaling_group,
ScheduledActionName=stop_action_name,
Recurrence=stop_cron,
TimeZone=timezone_name,
# Create stop action for this day
stop_cron = time_to_cron_with_day(day_schedule.stopTime, day_num)
stop_action_name = f"{model_id}-{day_name}-stop"
try:
    autoscaling_client.put_scheduled_update_group_action(
        AutoScalingGroupName=auto_scaling_group,
        ScheduledActionName=stop_action_name,
        Recurrence=stop_cron,
        TimeZone=timezone_name,

Comment on lines 421 to +428
hour, minute = map(int, time_str.split(":"))
return f"{minute} {hour} * * *"

# Create timezone-aware datetime for today
tz = ZoneInfo(timezone_name)
today = datetime.now(tz).date()
local_dt = datetime.combine(today, datetime.min.time().replace(hour=hour, minute=minute), tzinfo=tz)

# Convert to UTC
utc_dt = local_dt.astimezone(dt_timezone.utc)

# Return cron expression (minute hour * * *)
return f"{utc_dt.minute} {utc_dt.hour} * * *"


def convert_to_utc_cron_weekdays(time_str: str, timezone_name: str) -> str:
"""Convert local time to UTC cron expression for weekdays only (Mon-Fri)"""
from zoneinfo import ZoneInfo

# Parse time
def time_to_cron_with_day(time_str: str, day_of_week: int) -> str:
"""Convert time string (HH:MM) to cron expression with day"""
hour, minute = map(int, time_str.split(":"))

# Create timezone-aware datetime
tz = ZoneInfo(timezone_name)
today = datetime.now(tz).date()
local_dt = datetime.combine(today, datetime.min.time().replace(hour=hour, minute=minute), tzinfo=tz)

# Convert to UTC
utc_dt = local_dt.astimezone(dt_timezone.utc)

# Return cron expression with weekdays (minute hour * * 1-5)
return f"{utc_dt.minute} {utc_dt.hour} * * 1-5"


def convert_to_utc_cron_with_day(time_str: str, timezone_name: str, day_of_week: int) -> str:
"""Convert local time to UTC cron expression with specific day of week"""
from zoneinfo import ZoneInfo

# Parse time
hour, minute = map(int, time_str.split(":"))

# Create timezone-aware datetime
tz = ZoneInfo(timezone_name)
today = datetime.now(tz).date()
local_dt = datetime.combine(today, datetime.min.time().replace(hour=hour, minute=minute), tzinfo=tz)

# Convert to UTC
utc_dt = local_dt.astimezone(dt_timezone.utc)

# Return cron expression with day of week (minute hour * * day)
return f"{utc_dt.minute} {utc_dt.hour} * * {day_of_week}"
return f"{minute} {hour} * * {day_of_week}"

The indentation is inconsistent in this block. Lines 421 and 427 have extra leading spaces that don't align with the function body.

Suggested change
hour, minute = map(int, time_str.split(":"))
return f"{minute} {hour} * * *"
# Create timezone-aware datetime for today
tz = ZoneInfo(timezone_name)
today = datetime.now(tz).date()
local_dt = datetime.combine(today, datetime.min.time().replace(hour=hour, minute=minute), tzinfo=tz)
# Convert to UTC
utc_dt = local_dt.astimezone(dt_timezone.utc)
# Return cron expression (minute hour * * *)
return f"{utc_dt.minute} {utc_dt.hour} * * *"
def convert_to_utc_cron_weekdays(time_str: str, timezone_name: str) -> str:
"""Convert local time to UTC cron expression for weekdays only (Mon-Fri)"""
from zoneinfo import ZoneInfo
# Parse time
def time_to_cron_with_day(time_str: str, day_of_week: int) -> str:
"""Convert time string (HH:MM) to cron expression with day"""
hour, minute = map(int, time_str.split(":"))
# Create timezone-aware datetime
tz = ZoneInfo(timezone_name)
today = datetime.now(tz).date()
local_dt = datetime.combine(today, datetime.min.time().replace(hour=hour, minute=minute), tzinfo=tz)
# Convert to UTC
utc_dt = local_dt.astimezone(dt_timezone.utc)
# Return cron expression with weekdays (minute hour * * 1-5)
return f"{utc_dt.minute} {utc_dt.hour} * * 1-5"
def convert_to_utc_cron_with_day(time_str: str, timezone_name: str, day_of_week: int) -> str:
"""Convert local time to UTC cron expression with specific day of week"""
from zoneinfo import ZoneInfo
# Parse time
hour, minute = map(int, time_str.split(":"))
# Create timezone-aware datetime
tz = ZoneInfo(timezone_name)
today = datetime.now(tz).date()
local_dt = datetime.combine(today, datetime.min.time().replace(hour=hour, minute=minute), tzinfo=tz)
# Convert to UTC
utc_dt = local_dt.astimezone(dt_timezone.utc)
# Return cron expression with day of week (minute hour * * day)
return f"{utc_dt.minute} {utc_dt.hour} * * {day_of_week}"
return f"{minute} {hour} * * {day_of_week}"
def time_to_cron(time_str: str) -> str:
    """Convert time string (HH:MM) to cron expression"""
    hour, minute = map(int, time_str.split(":"))
    return f"{minute} {hour} * * *"


def time_to_cron_with_day(time_str: str, day_of_week: int) -> str:
    """Convert time string (HH:MM) to cron expression with day"""
    hour, minute = map(int, time_str.split(":"))
    return f"{minute} {hour} * * {day_of_week}"

Comment on lines +543 to 575
# Check if model_config.autoScalingConfig exists first
response = model_table.get_item(Key={"model_id": model_id})
if "Item" not in response:
raise ValueError(f"Model {model_id} not found")

model_item = response["Item"]
auto_scaling_config_exists = "autoScalingConfig" in model_item
model_config_exists = "model_config" in model_item
auto_scaling_config_exists = model_config_exists and "autoScalingConfig" in model_item.get("model_config", {})

if scheduling_config:
# Prepare the scheduling configuration for storage
schedule_data = scheduling_config.model_dump()
schedule_data["scheduledActionArns"] = scheduled_action_arns
schedule_data["scheduleEnabled"] = enabled
schedule_data["lastScheduleUpdate"] = datetime.now(dt_timezone.utc).isoformat()

schedule_data["scheduleConfigured"] = enabled
schedule_data["lastScheduleFailed"] = False

# Calculate next scheduled action
if enabled:
next_action = calculate_next_scheduled_action(schedule_data)
if next_action:
schedule_data["nextScheduledAction"] = next_action

if auto_scaling_config_exists:
# Update existing autoScalingConfig.scheduling
# Update existing model_config.autoScalingConfig.scheduling
model_table.update_item(
Key={"model_id": model_id},
UpdateExpression="SET autoScalingConfig.scheduling = :scheduling",
UpdateExpression="SET model_config.autoScalingConfig.scheduling = :scheduling",
ExpressionAttributeValues={":scheduling": schedule_data},
)
else:
# Create autoScalingConfig with scheduling
# Create model_config.autoScalingConfig with scheduling
model_table.update_item(
Key={"model_id": model_id},
UpdateExpression="SET autoScalingConfig = :autoScalingConfig",
UpdateExpression="SET model_config.autoScalingConfig = :autoScalingConfig",
ExpressionAttributeValues={":autoScalingConfig": {"scheduling": schedule_data}},
)
else:

The indentation is inconsistent throughout this block. Lines 544-546, 548, 552-559, 561-567, 569-574 have extra leading spaces that don't align with the surrounding code structure.

Suggested change
# Check if model_config.autoScalingConfig exists first
response = model_table.get_item(Key={"model_id": model_id})
if "Item" not in response:
raise ValueError(f"Model {model_id} not found")
model_item = response["Item"]
auto_scaling_config_exists = "autoScalingConfig" in model_item
model_config_exists = "model_config" in model_item
auto_scaling_config_exists = model_config_exists and "autoScalingConfig" in model_item.get("model_config", {})
if scheduling_config:
# Prepare the scheduling configuration for storage
schedule_data = scheduling_config.model_dump()
schedule_data["scheduledActionArns"] = scheduled_action_arns
schedule_data["scheduleEnabled"] = enabled
schedule_data["lastScheduleUpdate"] = datetime.now(dt_timezone.utc).isoformat()
schedule_data["scheduleConfigured"] = enabled
schedule_data["lastScheduleFailed"] = False
# Calculate next scheduled action
if enabled:
next_action = calculate_next_scheduled_action(schedule_data)
if next_action:
schedule_data["nextScheduledAction"] = next_action
if auto_scaling_config_exists:
# Update existing autoScalingConfig.scheduling
# Update existing model_config.autoScalingConfig.scheduling
model_table.update_item(
Key={"model_id": model_id},
UpdateExpression="SET autoScalingConfig.scheduling = :scheduling",
UpdateExpression="SET model_config.autoScalingConfig.scheduling = :scheduling",
ExpressionAttributeValues={":scheduling": schedule_data},
)
else:
# Create autoScalingConfig with scheduling
# Create model_config.autoScalingConfig with scheduling
model_table.update_item(
Key={"model_id": model_id},
UpdateExpression="SET autoScalingConfig = :autoScalingConfig",
UpdateExpression="SET model_config.autoScalingConfig = :autoScalingConfig",
ExpressionAttributeValues={":autoScalingConfig": {"scheduling": schedule_data}},
)
else:
# Check if model_config.autoScalingConfig exists first
response = model_table.get_item(Key={"model_id": model_id})
if "Item" not in response:
    raise ValueError(f"Model {model_id} not found")

model_item = response["Item"]
model_config_exists = "model_config" in model_item
auto_scaling_config_exists = model_config_exists and "autoScalingConfig" in model_item.get("model_config", {})

if scheduling_config:
    # Prepare the scheduling configuration for storage
    schedule_data = scheduling_config.model_dump()
    schedule_data["scheduledActionArns"] = scheduled_action_arns
    schedule_data["scheduleEnabled"] = enabled
    schedule_data["lastScheduleUpdate"] = datetime.now(dt_timezone.utc).isoformat()
    schedule_data["scheduleConfigured"] = enabled
    schedule_data["lastScheduleFailed"] = False

    if auto_scaling_config_exists:
        # Update existing model_config.autoScalingConfig.scheduling
        model_table.update_item(
            Key={"model_id": model_id},
            UpdateExpression="SET model_config.autoScalingConfig.scheduling = :scheduling",
            ExpressionAttributeValues={":scheduling": schedule_data},
        )
    else:
        # Create model_config.autoScalingConfig with scheduling
        model_table.update_item(
            Key={"model_id": model_id},
            UpdateExpression="SET model_config.autoScalingConfig = :autoScalingConfig",
            ExpressionAttributeValues={":autoScalingConfig": {"scheduling": schedule_data}},
        )
else:
    # Remove scheduling configuration for always run behavior
    if auto_scaling_config_exists:
        model_table.update_item(
            Key={"model_id": model_id},
            UpdateExpression="REMOVE model_config.autoScalingConfig.scheduling",
        )
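The existence check above matters because a DynamoDB SET on a deeply nested document path fails with a ValidationException when an intermediate map (here model_config.autoScalingConfig) does not exist. A small pure-function sketch of the branch, with the helper name invented for illustration:

```python
def build_schedule_update_args(model_id: str, schedule_data: dict, has_auto_scaling_config: bool) -> dict:
    """Choose update_item arguments based on whether the nested config map already exists."""
    if has_auto_scaling_config:
        # Safe to set the deeper path: autoScalingConfig already exists
        return {
            "Key": {"model_id": model_id},
            "UpdateExpression": "SET model_config.autoScalingConfig.scheduling = :scheduling",
            "ExpressionAttributeValues": {":scheduling": schedule_data},
        }
    # Create the whole autoScalingConfig map in one step instead
    return {
        "Key": {"model_id": model_id},
        "UpdateExpression": "SET model_config.autoScalingConfig = :autoScalingConfig",
        "ExpressionAttributeValues": {":autoScalingConfig": {"scheduling": schedule_data}},
    }
```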

Comment on lines +527 to 531
model_config = model_item.get("model_config", {})
auto_scaling_config = model_config.get("autoScalingConfig", {})
scheduling_config = auto_scaling_config.get("scheduling", {})

return scheduling_config.get("scheduledActionArns", [])

The indentation is inconsistent in this block. Line 527 has extra leading spaces that don't align with the surrounding code.

Suggested change
model_config = model_item.get("model_config", {})
auto_scaling_config = model_config.get("autoScalingConfig", {})
scheduling_config = auto_scaling_config.get("scheduling", {})
return scheduling_config.get("scheduledActionArns", [])
model_item = response["Item"]
model_config = model_item.get("model_config", {})
auto_scaling_config = model_config.get("autoScalingConfig", {})
scheduling_config = auto_scaling_config.get("scheduling", {})

url: '/models/metadata/instances'
})
}),
updateSchedule: builder.mutation<{message: string, modelId: string, scheduleEnabled: boolean}, {modelId: string, scheduleConfig: any}>({

The scheduleConfig parameter is typed as any. Consider using a more specific type definition (e.g., IScheduleConfig or similar) to improve type safety and enable better IDE support and compile-time validation.

Suggested change
updateSchedule: builder.mutation<{message: string, modelId: string, scheduleEnabled: boolean}, {modelId: string, scheduleConfig: any}>({
updateSchedule: builder.mutation<{message: string, modelId: string, scheduleEnabled: boolean}, {modelId: string, scheduleConfig: IScheduleConfig}>({

Comment on lines +83 to +90
transformErrorResponse: (baseQueryReturnValue) => {
return {
name: 'Update Schedule Error',
message: baseQueryReturnValue.data?.type === 'RequestValidationError' ?
baseQueryReturnValue.data.detail.map((error) => error.msg).join(', ') :
baseQueryReturnValue.data.message
};
},

The error transformation logic assumes baseQueryReturnValue.data exists and has specific properties. Add a null/undefined check to prevent potential runtime errors when the response structure is unexpected.

Suggested change
transformErrorResponse: (baseQueryReturnValue) => {
return {
name: 'Update Schedule Error',
message: baseQueryReturnValue.data?.type === 'RequestValidationError' ?
baseQueryReturnValue.data.detail.map((error) => error.msg).join(', ') :
baseQueryReturnValue.data.message
};
},
transformErrorResponse: (baseQueryReturnValue) => {
    const data = baseQueryReturnValue.data;
    return {
        name: 'Update Schedule Error',
        message: data?.type === 'RequestValidationError' ?
            data.detail?.map((error) => error.msg).join(', ') ?? 'Validation error occurred' :
            data?.message ?? 'An error occurred while updating the schedule'
    };
},

Comment on lines +101 to +121
const validateDailySchedule = (dailySchedule?: IDaySchedule): string | undefined => {
if (!dailySchedule) return 'Recurring schedule must be configured when selected.';

const { startTime, stopTime } = dailySchedule;

if (!startTime && !stopTime) {
return 'Recurring schedule must have both start and stop times.';
}

if (!startTime) return 'Start time is required for recurring schedule.';
if (!stopTime) return 'Stop time is required for recurring schedule.';

if (!isValidTimeFormat(startTime)) return 'Start time must be in HH:MM format (24-hour).';
if (!isValidTimeFormat(stopTime)) return 'Stop time must be in HH:MM format (24-hour).';

// Use enhanced validation for time pair
const timePairError = validateTimePair(startTime, stopTime);
if (timePairError) return timePairError;

return undefined;
};

The function name and error messages are inconsistent. The function is named validateDailySchedule but it validates a recurring schedule (same time daily). The error messages reference "Recurring schedule" when they should reference "Daily schedule" or vice versa. This naming mismatch contradicts the PR description which states RECURRING is for same time daily.

Suggested change
const validateDailySchedule = (dailySchedule?: IDaySchedule): string | undefined => {
if (!dailySchedule) return 'Recurring schedule must be configured when selected.';
const { startTime, stopTime } = dailySchedule;
if (!startTime && !stopTime) {
return 'Recurring schedule must have both start and stop times.';
}
if (!startTime) return 'Start time is required for recurring schedule.';
if (!stopTime) return 'Stop time is required for recurring schedule.';
if (!isValidTimeFormat(startTime)) return 'Start time must be in HH:MM format (24-hour).';
if (!isValidTimeFormat(stopTime)) return 'Stop time must be in HH:MM format (24-hour).';
// Use enhanced validation for time pair
const timePairError = validateTimePair(startTime, stopTime);
if (timePairError) return timePairError;
return undefined;
};
const validateRecurringSchedule = (dailySchedule?: IDaySchedule): string | undefined => {
    if (!dailySchedule) return 'Daily schedule must be configured when selected.';

    const { startTime, stopTime } = dailySchedule;

    if (!startTime && !stopTime) {
        return 'Daily schedule must have both start and stop times.';
    }

    if (!startTime) return 'Start time is required for daily schedule.';
    if (!stopTime) return 'Stop time is required for daily schedule.';

    if (!isValidTimeFormat(startTime)) return 'Start time must be in HH:MM format (24-hour).';
    if (!isValidTimeFormat(stopTime)) return 'Stop time must be in HH:MM format (24-hour).';

    // Use enhanced validation for time pair
    const timePairError = validateTimePair(startTime, stopTime);
    if (timePairError) return timePairError;

    return undefined;
};
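validateTimePair itself is outside this diff; given the PR's stated 2-hour minimum window, its core check presumably looks something like this Python sketch (function name, messages, and the minimum are assumptions):

```python
from typing import Optional

def validate_time_pair(start_time: str, stop_time: str, min_hours: int = 2) -> Optional[str]:
    """Return an error message for an invalid start/stop window, or None if valid."""
    start_h, start_m = map(int, start_time.split(":"))
    stop_h, stop_m = map(int, stop_time.split(":"))
    start_minutes = start_h * 60 + start_m
    stop_minutes = stop_h * 60 + stop_m
    if stop_minutes <= start_minutes:
        return "Stop time must be after start time."
    if stop_minutes - start_minutes < min_hours * 60:
        return f"Schedule window must be at least {min_hours} hours."
    return None
```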

Comment on lines +123 to +173
const validateWeeklySchedule = (weeklySchedule?: IWeeklySchedule): { [key: string]: string } => {
const errors: { [key: string]: string } = {};

if (!weeklySchedule) {
errors.general = 'Daily schedule must be configured when selected.';
return errors;
}

const daysOfWeek: (keyof IWeeklySchedule)[] = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday'];
let hasAtLeastOneDay = false;

daysOfWeek.forEach((day) => {
const daySchedule = weeklySchedule[day];
if (daySchedule && daySchedule.startTime && daySchedule.stopTime) {
hasAtLeastOneDay = true;

const { startTime, stopTime } = daySchedule;

// Validate time formats
if (startTime && !isValidTimeFormat(startTime)) {
errors[`${day}_startTime`] = 'Start time must be in HH:MM format (24-hour).';
}

if (stopTime && !isValidTimeFormat(stopTime)) {
errors[`${day}_stopTime`] = 'Stop time must be in HH:MM format (24-hour).';
}

// Use enhanced validation for time pair with clearer error messages
if (startTime && stopTime && isValidTimeFormat(startTime) && isValidTimeFormat(stopTime)) {
const timePairError = validateTimePair(startTime, stopTime);
if (timePairError) {
errors[`${day}_times`] = timePairError;
}
}
} else if (daySchedule && (daySchedule.startTime || daySchedule.stopTime)) {
// Partial schedule validation
if (daySchedule.startTime && !daySchedule.stopTime) {
errors[`${day}_stopTime`] = 'Stop time is required when Start time is provided.';
}
if (daySchedule.stopTime && !daySchedule.startTime) {
errors[`${day}_startTime`] = 'Start time is required when Stop time is provided.';
}
}
});

if (!hasAtLeastOneDay) {
errors.general = 'At least one day must have a schedule configured.';
}

return errors;
};

The function name and error messages are inconsistent. The function is named validateWeeklySchedule but it validates a daily schedule (different times per day). The error messages reference "Daily schedule" when they should reference "Weekly schedule" or vice versa. This naming mismatch contradicts the PR description which states DAILY is for different times per day.

Suggested change
const validateWeeklySchedule = (weeklySchedule?: IWeeklySchedule): { [key: string]: string } => {
const errors: { [key: string]: string } = {};
if (!weeklySchedule) {
errors.general = 'Daily schedule must be configured when selected.';
return errors;
}
const daysOfWeek: (keyof IWeeklySchedule)[] = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday'];
let hasAtLeastOneDay = false;
daysOfWeek.forEach((day) => {
const daySchedule = weeklySchedule[day];
if (daySchedule && daySchedule.startTime && daySchedule.stopTime) {
hasAtLeastOneDay = true;
const { startTime, stopTime } = daySchedule;
// Validate time formats
if (startTime && !isValidTimeFormat(startTime)) {
errors[`${day}_startTime`] = 'Start time must be in HH:MM format (24-hour).';
}
if (stopTime && !isValidTimeFormat(stopTime)) {
errors[`${day}_stopTime`] = 'Stop time must be in HH:MM format (24-hour).';
}
// Use enhanced validation for time pair with clearer error messages
if (startTime && stopTime && isValidTimeFormat(startTime) && isValidTimeFormat(stopTime)) {
const timePairError = validateTimePair(startTime, stopTime);
if (timePairError) {
errors[`${day}_times`] = timePairError;
}
}
} else if (daySchedule && (daySchedule.startTime || daySchedule.stopTime)) {
// Partial schedule validation
if (daySchedule.startTime && !daySchedule.stopTime) {
errors[`${day}_stopTime`] = 'Stop time is required when Start time is provided.';
}
if (daySchedule.stopTime && !daySchedule.startTime) {
errors[`${day}_startTime`] = 'Start time is required when Stop time is provided.';
}
}
});
if (!hasAtLeastOneDay) {
errors.general = 'At least one day must have a schedule configured.';
}
return errors;
};
const validateDailySchedule = (weeklySchedule?: IWeeklySchedule): { [key: string]: string } => {
    const errors: { [key: string]: string } = {};

    if (!weeklySchedule) {
        errors.general = 'Weekly schedule must be configured when selected.';
        return errors;
    }

    const daysOfWeek: (keyof IWeeklySchedule)[] = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday'];
    let hasAtLeastOneDay = false;

    daysOfWeek.forEach((day) => {
        const daySchedule = weeklySchedule[day];
        if (daySchedule && daySchedule.startTime && daySchedule.stopTime) {
            hasAtLeastOneDay = true;

            const { startTime, stopTime } = daySchedule;

            // Validate time formats
            if (startTime && !isValidTimeFormat(startTime)) {
                errors[`${day}_startTime`] = 'Start time must be in HH:MM format (24-hour).';
            }

            if (stopTime && !isValidTimeFormat(stopTime)) {
                errors[`${day}_stopTime`] = 'Stop time must be in HH:MM format (24-hour).';
            }

            // Use enhanced validation for time pair with clearer error messages
            if (startTime && stopTime && isValidTimeFormat(startTime) && isValidTimeFormat(stopTime)) {
                const timePairError = validateTimePair(startTime, stopTime);
                if (timePairError) {
                    errors[`${day}_times`] = timePairError;
                }
            }
        } else if (daySchedule && (daySchedule.startTime || daySchedule.stopTime)) {
            // Partial schedule validation
            if (daySchedule.startTime && !daySchedule.stopTime) {
                errors[`${day}_stopTime`] = 'Stop time is required when Start time is provided.';
            }
            if (daySchedule.stopTime && !daySchedule.startTime) {
                errors[`${day}_startTime`] = 'Start time is required when Stop time is provided.';
            }
        }
    });

    if (!hasAtLeastOneDay) {
        errors.general = 'At least one day must have a schedule configured.';
    }

    return errors;
};

Comment on lines +179 to +180
const showDailySchedule = showScheduleOptions && scheduleType === ScheduleType.RECURRING;
const showWeeklySchedule = showScheduleOptions && scheduleType === ScheduleType.DAILY;

The logic for determining which schedule UI to show is reversed. Line 179 shows daily schedule when scheduleType === ScheduleType.RECURRING, but based on the PR description, RECURRING means same time daily (which should show the daily schedule UI). Line 180 shows weekly schedule when scheduleType === ScheduleType.DAILY, but DAILY means different times per day (which should show the weekly schedule UI). These assignments are backwards.

Suggested change
const showDailySchedule = showScheduleOptions && scheduleType === ScheduleType.RECURRING;
const showWeeklySchedule = showScheduleOptions && scheduleType === ScheduleType.DAILY;
const showDailySchedule = showScheduleOptions && scheduleType === ScheduleType.DAILY;
const showWeeklySchedule = showScheduleOptions && scheduleType === ScheduleType.RECURRING;

Comment on lines +198 to +206
if (scheduleType === ScheduleType.RECURRING) {
const dailyError = validateDailySchedule(props.item.dailySchedule);
if (dailyError) {
errors.dailySchedule = dailyError;
}
} else if (scheduleType === ScheduleType.DAILY) {
const weeklyErrors = validateWeeklySchedule(props.item.weeklySchedule);
Object.assign(errors, weeklyErrors);
}

The validation logic is calling the wrong validation functions for each schedule type. Line 199 calls validateDailySchedule when scheduleType === ScheduleType.RECURRING, but based on the corrected function names, it should call validateRecurringSchedule. Line 204 calls validateWeeklySchedule when scheduleType === ScheduleType.DAILY, but it should call validateDailySchedule.

Suggested change
if (scheduleType === ScheduleType.RECURRING) {
const dailyError = validateDailySchedule(props.item.dailySchedule);
if (dailyError) {
errors.dailySchedule = dailyError;
}
} else if (scheduleType === ScheduleType.DAILY) {
const weeklyErrors = validateWeeklySchedule(props.item.weeklySchedule);
Object.assign(errors, weeklyErrors);
}
if (scheduleType === ScheduleType.RECURRING) {
    const recurringError = validateRecurringSchedule(props.item.dailySchedule);
    if (recurringError) {
        errors.dailySchedule = recurringError;
    }
} else if (scheduleType === ScheduleType.DAILY) {
    const dailyErrors = validateDailySchedule(props.item.weeklySchedule);
    Object.assign(errors, dailyErrors);
}

@github-actions github-actions bot left a comment

Commits reviewed: files that changed from the base of the PR, between commits a3405dc and 51ca92a.
Files selected (11)
  • ecs_model_deployer/src/lib/ecs-model.ts (2)
  • lib/core/apiBaseConstruct.ts (2)
  • lib/mcp/mcp-server-api.ts (2)
  • lib/models/model-api.ts (8)
  • lib/rag/ragConstruct.ts (2)
  • lib/schema/constants.ts (1)
  • lib/schema/index.ts (1)
  • lib/serve/serveApplicationConstruct.ts (2)
  • test/lambda/test_schedule_handlers.py (14)
  • test/lambda/test_schedule_management.py (14)
  • test/lambda/test_state_machine_schedule_handlers.py (12)
Files skipped from review due to trivial changes (1)
  • lib/schema/constants.ts
Review comments generated (33)
  • Review: 33
  • LGTM: 0

Tips

Chat with AI reviewer (/reviewbot)

  • Reply on review comments left by this bot to ask follow-up questions. A review comment is a comment on a diff or a file.
  • Invite the bot into a review comment chain by tagging /reviewbot in a reply.

Code suggestions

  • The bot may make code suggestions, but please review them carefully before committing since the line number ranges may be misaligned.
  • You can edit the comment made by the bot and manually tweak the suggestion if it is slightly off.

Pausing incremental reviews

  • Add /reviewbot: ignore anywhere in the PR description to pause further reviews from the bot.

Comment on lines +210 to +220
new Rule(this, 'AutoScalingEventsRule', {
eventPattern: {
source: ['aws.autoscaling'],
detailType: [
'EC2 Instance Launch Successful',
'EC2 Instance Terminate Successful'
]
},
targets: [new LambdaFunction(scheduleMonitoringLambda)],
description: 'Triggers ScheduleMonitoring Lambda when Auto Scaling Group instances launch or terminate successfully',
});

The CloudWatch Events rule is created but never stored or exported. If other constructs need to reference this rule (e.g., for adding additional targets or modifying it), consider storing it as a class property:

Suggested change
new Rule(this, 'AutoScalingEventsRule', {
eventPattern: {
source: ['aws.autoscaling'],
detailType: [
'EC2 Instance Launch Successful',
'EC2 Instance Terminate Successful'
]
},
targets: [new LambdaFunction(scheduleMonitoringLambda)],
description: 'Triggers ScheduleMonitoring Lambda when Auto Scaling Group instances launch or terminate successfully',
});
this.autoScalingEventsRule = new Rule(this, 'AutoScalingEventsRule', {
    eventPattern: {
        source: ['aws.autoscaling'],
        detailType: [
            'EC2 Instance Launch Successful',
            'EC2 Instance Terminate Successful'
        ]
    },
    targets: [new LambdaFunction(scheduleMonitoringLambda)],
    description: 'Triggers ScheduleMonitoring Lambda when Auto Scaling Group instances launch or terminate successfully',
});

Comment on lines +160 to +192
const scheduleManagementLambda = new Function(this, 'ScheduleManagement', {
    runtime: getDefaultRuntime(),
    handler: 'models.scheduling.schedule_management.lambda_handler',
    code: Code.fromAsset(lambdaPath),
    layers: lambdaLayers,
    environment: {
        MODEL_TABLE_NAME: modelTable.tableName,
    },
    role: stateMachinesLambdaRole,
    vpc: vpc.vpc,
    securityGroups: securityGroups,
    timeout: Duration.minutes(5),
    description: 'Manages Auto Scaling scheduled actions for LISA model scheduling',
});

const scheduleMonitoringLambda = new Function(this, 'ScheduleMonitoring', {
    runtime: getDefaultRuntime(),
    handler: 'models.scheduling.schedule_monitoring.lambda_handler',
    code: Code.fromAsset(lambdaPath),
    layers: lambdaLayers,
    environment: {
        MODEL_TABLE_NAME: modelTable.tableName,
        ECS_CLUSTER_NAME: `${config.deploymentPrefix}-ECS-Cluster`,
        LISA_API_URL_PS_NAME: lisaServeEndpointUrlPs.parameterName,
        MANAGEMENT_KEY_NAME: managementKeyName,
        REST_API_VERSION: 'v2',
    },
    role: stateMachinesLambdaRole,
    vpc: vpc.vpc,
    securityGroups: securityGroups,
    timeout: Duration.minutes(5),
    description: 'Processes Auto Scaling Group CloudWatch events to update model status',
});


Both scheduleManagementLambda and scheduleMonitoringLambda are created but only scheduleManagementLambda is passed to the state machine. Consider storing both as class properties to enable external access and potential future integrations:

Suggested change
-const scheduleManagementLambda = new Function(this, 'ScheduleManagement', {
+this.scheduleManagementLambda = new Function(this, 'ScheduleManagement', {
     runtime: getDefaultRuntime(),
     handler: 'models.scheduling.schedule_management.lambda_handler',
     code: Code.fromAsset(lambdaPath),
     layers: lambdaLayers,
     environment: {
         MODEL_TABLE_NAME: modelTable.tableName,
     },
     role: stateMachinesLambdaRole,
     vpc: vpc.vpc,
     securityGroups: securityGroups,
     timeout: Duration.minutes(5),
     description: 'Manages Auto Scaling scheduled actions for LISA model scheduling',
 });
-const scheduleMonitoringLambda = new Function(this, 'ScheduleMonitoring', {
+this.scheduleMonitoringLambda = new Function(this, 'ScheduleMonitoring', {
     runtime: getDefaultRuntime(),
     handler: 'models.scheduling.schedule_monitoring.lambda_handler',
     code: Code.fromAsset(lambdaPath),
     layers: lambdaLayers,
     environment: {
         MODEL_TABLE_NAME: modelTable.tableName,
         ECS_CLUSTER_NAME: `${config.deploymentPrefix}-ECS-Cluster`,
         LISA_API_URL_PS_NAME: lisaServeEndpointUrlPs.parameterName,
         MANAGEMENT_KEY_NAME: managementKeyName,
         REST_API_VERSION: 'v2',
     },
     role: stateMachinesLambdaRole,
     vpc: vpc.vpc,
     securityGroups: securityGroups,
     timeout: Duration.minutes(5),
     description: 'Processes Auto Scaling Group CloudWatch events to update model status',
 });

Comment on lines 77 to +78
-@patch("models.scheduling.schedule_management.calculate_next_scheduled_action")
 def test_update_operation_success(
-    self, mock_calculate_next, mock_autoscaling_client, mock_model_table, lambda_context, sample_schedule_config
+    self, mock_autoscaling_client, mock_model_table, lambda_context, sample_schedule_config


The @patch decorator for calculate_next_scheduled_action was removed from the method signature. Verify that the test logic no longer depends on this mock. If the function is still being called internally, this could cause the test to fail or behave unexpectedly.

Comment on lines 87 to +91
# Mock successful Auto Scaling operations
mock_autoscaling_client.put_scheduled_update_group_action.return_value = {}
mock_autoscaling_client.describe_scheduled_actions.return_value = {"ScheduledUpdateGroupActions": []}
mock_autoscaling_client.describe_auto_scaling_groups.return_value = {
    "AutoScalingGroups": [{"MinSize": 1, "MaxSize": 10, "DesiredCapacity": 3}]
}


The mock setup was simplified by removing mock_calculate_next and describe_scheduled_actions mocks. Ensure that the actual implementation of lambda_handler doesn't call describe_scheduled_actions or that it's mocked elsewhere. If it does, this test may not properly validate the behavior.

Comment on lines +250 to +273
+class TestHelperFunctions:
+    """Test helper functions."""

-        result = calculate_next_scheduled_action(schedule_config)
-
-        # Just verify it returns a result
-        assert result is not None
-        assert result["action"] in ["START", "STOP"]
-
-    def test_timezone_conversion(self):
-        """Test timezone conversion in schedule calculation."""
-        from models.scheduling.schedule_management import calculate_next_scheduled_action
-
-        schedule_config = {
-            "scheduleType": "RECURRING_DAILY",
-            "timezone": "America/New_York",
-            "dailySchedule": {"startTime": "09:00", "stopTime": "17:00"},
-        }
-
-        result = calculate_next_scheduled_action(schedule_config)
-
-        # Just verify it returns a result
-        assert result is not None
-        assert result["action"] in ["START", "STOP"]
-
-    def test_no_schedule_type(self):
-        """Test None schedule type."""
-        from models.scheduling.schedule_management import calculate_next_scheduled_action
-
-        schedule_config = {"scheduleType": None, "timezone": "UTC"}
-
-        result = calculate_next_scheduled_action(schedule_config)
-
-        assert result is None
+    def test_time_to_cron(self):
+        """Test time_to_cron function."""
+        from models.scheduling.schedule_management import time_to_cron
+
+        result = time_to_cron("09:30")
+        assert result == "30 9 * * *"
+
+        result = time_to_cron("23:45")
+        assert result == "45 23 * * *"
+
+    def test_time_to_cron_with_day(self):
+        """Test time_to_cron_with_day function."""
+        from models.scheduling.schedule_management import time_to_cron_with_day
+
+        # Monday is 1
+        result = time_to_cron_with_day("09:30", 1)
+        assert result == "30 9 * * 1"
+
+        # Sunday is 0
+        result = time_to_cron_with_day("14:15", 0)
+        assert result == "15 14 * * 0"


The entire TestCalculateNextScheduledAction class was replaced with TestHelperFunctions containing simpler utility function tests. According to the PR changes, complex scheduling calculation tests were removed. However, verify that the removed tests for calculate_next_scheduled_action with different schedule types (RECURRING_DAILY, EACH_DAY, timezone conversion) are not critical for validating the scheduling logic, or ensure equivalent coverage exists elsewhere.
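The cron conversions these tests expect can be sketched in a few lines. This is a minimal illustration consistent with the asserted outputs, not the actual code in models.scheduling.schedule_management, which may handle validation and edge cases differently:

```python
def time_to_cron(hhmm: str) -> str:
    """Convert a 24-hour 'HH:MM' string to a daily cron expression."""
    hour, minute = hhmm.split(":")
    return f"{int(minute)} {int(hour)} * * *"

def time_to_cron_with_day(hhmm: str, day_of_week: int) -> str:
    """Convert 'HH:MM' plus a day-of-week (0 = Sunday) to a weekly cron expression."""
    hour, minute = hhmm.split(":")
    return f"{int(minute)} {int(hour)} * * {day_of_week}"

print(time_to_cron("09:30"))              # → 30 9 * * *
print(time_to_cron_with_day("14:15", 0))  # → 15 14 * * 0
```

Note that cron puts minutes before hours, which is exactly the inversion the test expectations encode ("09:30" → "30 9 * * *").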

Comment on lines +851 to +875
 @patch("models.handler.schedule_handlers.schedule_management.delete_schedule")
 def test_delete_schedule_lambda_error_handling(
     self,
     mock_delete_schedule,
     mock_model_table,
     mock_guardrails_table,
     mock_autoscaling_client,
     mock_stepfunctions_client,
     sample_model_item,
 ):
-    """Test DeleteSchedule Lambda error handling."""
+    """Test DeleteSchedule schedule management error handling."""
     mock_model_table.get_item.return_value = {"Item": sample_model_item}

-    # Setup mock lambda error response
-    mock_lambda_response = {"StatusCode": 500, "Payload": MagicMock()}
-    mock_lambda_response["Payload"].read.return_value = json.dumps(
-        {"statusCode": 500, "body": {"message": "Lambda execution failed"}}
-    ).encode()
+    # Setup mock schedule management error response
+    mock_delete_schedule.return_value = {"statusCode": 500, "body": {"message": "Schedule management failed"}}

-    with patch("boto3.client") as mock_boto3:
-        mock_boto3.return_value.invoke.return_value = mock_lambda_response
-
-        handler = DeleteScheduleHandler(
-            autoscaling_client=mock_autoscaling_client,
-            stepfunctions_client=mock_stepfunctions_client,
-            model_table_resource=mock_model_table,
-            guardrails_table_resource=mock_guardrails_table,
-        )
+    handler = DeleteScheduleHandler(
+        autoscaling_client=mock_autoscaling_client,
+        stepfunctions_client=mock_stepfunctions_client,
+        model_table_resource=mock_model_table,
+        guardrails_table_resource=mock_guardrails_table,
+    )

-        with pytest.raises(ValueError, match="Failed to delete schedule: Lambda execution failed"):
-            handler(model_id="test-model")
+    with pytest.raises(ValueError, match="Failed to delete schedule: Schedule management failed"):
+        handler(model_id="test-model")


The test mocks schedule_management.delete_schedule to return an error response. The expected error message is "Failed to delete schedule: Schedule management failed". Verify the handler code properly extracts the message from the response body.
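The extraction the test depends on might look like the following. This is a hedged sketch; the helper name and its wiring into DeleteScheduleHandler are assumptions, not the actual handler code:

```python
import json

def raise_on_schedule_error(response: dict) -> None:
    """Surface the error message from a schedule_management-style response dict."""
    if response.get("statusCode", 200) >= 400:
        body = response.get("body", {})
        if isinstance(body, str):  # tolerate a JSON-encoded body as well
            body = json.loads(body)
        message = body.get("message", "Unknown error")
        raise ValueError(f"Failed to delete schedule: {message}")
```

With the mocked response above, this raises `ValueError("Failed to delete schedule: Schedule management failed")`, matching the test's `pytest.raises` pattern.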

Comment on lines +61 to 64
    "scheduleType": "RECURRING",
    "timezone": "UTC",
    "dailySchedule": {"startTime": "09:00", "stopTime": "17:00"},
}


The schedule type constant has been updated from RECURRING_DAILY to RECURRING which aligns with the PR changes. However, verify that all test data throughout this file uses the updated constant name consistently. Consider checking if there are other test methods that might still reference the old constant.
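A quick sweep for stale constant names can catch this class of inconsistency. The snippet below is illustrative only; the directory layout is a stand-in, not the repository's actual test tree:

```python
import pathlib
import tempfile

# Stand-in test tree; in practice you would point this at the real test directory.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "test_schedule.py").write_text('schedule = {"scheduleType": "RECURRING"}\n')

# Flag any Python file still referencing the pre-rename constant.
stale = [p.name for p in tmp.rglob("*.py") if "RECURRING_DAILY" in p.read_text()]
print("stale references found" if stale else "clean")  # → clean
```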

-        ).encode()
-        mock_lambda_client.invoke.return_value = mock_lambda_response
+        # Mock schedule management error response
+        mock_update_schedule.return_value = {"statusCode": 500, "body": {"message": "Schedule management failed"}}


The mock response structure has changed from a nested JSON string to a direct dict for the body. Verify that the actual schedule_management.update_schedule() function returns {"statusCode": 500, "body": {...}} with body as a dict (not JSON string). If the actual implementation returns body as a JSON string, this mock will not match the real behavior.
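The two shapes require different access patterns, which is why the mock must match the real return type. A small demonstration of the mismatch, using the literal values from the test:

```python
import json

# Shape 1: body is a plain dict (what the new mock assumes).
dict_style = {"statusCode": 500, "body": {"message": "Schedule management failed"}}

# Shape 2: body is a JSON-encoded string (the old Lambda-payload style).
str_style = {"statusCode": 500, "body": json.dumps({"message": "Schedule management failed"})}

print(dict_style["body"]["message"])             # → Schedule management failed
print(json.loads(str_style["body"])["message"])  # → Schedule management failed
# Handler code written for one shape breaks on the other:
# str_style["body"]["message"] raises TypeError (string indices must be integers).
```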

Comment on lines +118 to +120
# Verify result (should pass through unchanged, no schedule management call)
assert result["modelId"] == "test-model"
mock_lambda_client.invoke.assert_not_called()
assert "scheduled_action_arns" not in result


The comment on line 118 states "should pass through unchanged, no schedule management call" but there's no assertion verifying that mock_update_schedule was not called. Consider adding mock_update_schedule.assert_not_called() to verify the handler correctly skips schedule management when ASG is missing.
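The suggested assertion behaves as follows in isolation; `mock_update_schedule` here is a fresh mock for demonstration, not the fixture from the test file:

```python
from unittest.mock import MagicMock

mock_update_schedule = MagicMock()
mock_update_schedule.assert_not_called()  # passes: the mock has never been invoked

mock_update_schedule("model-id")  # simulate the handler calling schedule management
try:
    mock_update_schedule.assert_not_called()
    outcome = "unexpectedly passed"
except AssertionError:
    outcome = "failed as expected"
print(outcome)  # → failed as expected
```

Adding this assertion makes the test fail loudly if the handler ever starts calling schedule management on the missing-ASG path.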

Comment on lines +197 to +218
 @patch("models.state_machine.schedule_handlers.schedule_management.delete_schedule")
 def test_handle_cleanup_schedule_success(self, mock_delete_schedule, lambda_context):
     """Test successful schedule cleanup."""
     from models.state_machine.schedule_handlers import handle_cleanup_schedule

     # Mock event
     event = {"modelId": "test-model"}

-    # Mock successful lambda response
-    mock_lambda_response = {"StatusCode": 200, "Payload": MagicMock()}
-    mock_lambda_response["Payload"].read.return_value = json.dumps(
-        {"statusCode": 200, "body": json.dumps({"message": "Schedule deleted successfully"})}
-    ).encode()
-    mock_lambda_client.invoke.return_value = mock_lambda_response
+    # Mock successful schedule management response
+    mock_delete_schedule.return_value = {
+        "statusCode": 200,
+        "body": json.dumps({"message": "Schedule deleted successfully"}),
+    }

     # Execute
     result = handle_cleanup_schedule(event, lambda_context)

     # Verify result
     assert result["modelId"] == "test-model"

-    # Verify lambda invocation
-    mock_lambda_client.invoke.assert_called_once()
+    # Verify schedule management was called
+    mock_delete_schedule.assert_called_once()


The test mocks schedule_management.delete_schedule but the old code mocked lambda_client. Ensure that the actual handle_cleanup_schedule implementation now calls schedule_management.delete_schedule() directly instead of invoking a Lambda function. The mock response structure (body as JSON string) should also be verified against the actual implementation.
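The refactor this review describes, a direct module call replacing a Lambda invocation, can be sketched as below. Function names mirror the test; the internals of `delete_schedule` and the error path are assumptions:

```python
import json

def delete_schedule(model_id: str) -> dict:
    """Stand-in for schedule_management.delete_schedule (assumed shape)."""
    return {"statusCode": 200, "body": json.dumps({"message": "Schedule deleted successfully"})}

def handle_cleanup_schedule(event: dict, context: object = None) -> dict:
    """Call schedule management directly instead of invoking a Lambda."""
    response = delete_schedule(event["modelId"])
    if response["statusCode"] != 200:
        body = json.loads(response["body"])
        raise ValueError(f"Failed to delete schedule: {body['message']}")
    return event

result = handle_cleanup_schedule({"modelId": "test-model"})
print(result["modelId"])  # → test-model
```

Note the body here is a JSON string (hence `json.loads` on the error path), which is the shape the test's mock returns; if the real `delete_schedule` returns a dict body instead, both the mock and this sketch would need adjusting.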

@nishi7409
Member Author

Screen.Recording.2025-11-24.at.10.25.21.mov
update.demo.mov
ASG.auto.schedule.mov
