@jhjaggars jhjaggars commented Feb 9, 2026

User description

Summary

Removes the manual lifecycle scripts in deploy/regional/examples/ and consolidates on the Lambda/OIDC workflow as the single canonical path for investigation management.

Background

The Lambda/OIDC workflow (tools/create-investigation-lambda.sh + tools/sre-auth/) provides:

  • OIDC authentication via Keycloak
  • Group-based authorization (Lambda validates membership)
  • Per-user IAM roles with tag-based task isolation
  • Automatic role management
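
As a rough illustration of what tag-based task isolation can look like, the sketch below prints an IAM policy that allows ECS Exec only on resources tagged with the caller's username. The function name `print_isolation_policy`, the tag key `username`, the single action, and the wildcard resource are all assumptions made for illustration; the roles the Lambda actually creates may be shaped differently.

```bash
# Hypothetical sketch: the tag key "username" and the action list are
# assumptions, not taken from the repository.
print_isolation_policy() {
  local user="${1:-}"
  # Refuse empty names or characters that would need JSON escaping.
  case "$user" in
    ''|*[!A-Za-z0-9._-]*) echo "invalid username" >&2; return 1 ;;
  esac
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ecs:ExecuteCommand",
      "Resource": "*",
      "Condition": {
        "StringEquals": { "aws:ResourceTag/username": "$user" }
      }
    }
  ]
}
EOF
}
```

Calling `print_isolation_policy alice` writes the policy document to stdout, so it can be reviewed before being attached to a role.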

The manual scripts in deploy/regional/examples/ were a parallel path that:

  • Bypassed Keycloak authentication entirely
  • Required direct AWS IAM credentials
  • Created confusion about which workflow to use

Changes

Deleted Files

  • deploy/regional/examples/ directory (6 files: 5 shell scripts and 1 jq filter)
    • create_investigation.sh
    • launch_task.sh
    • join_task.sh
    • stop_task.sh
    • close_investigation.sh
    • build-task-def.jq
  • Untracked test scripts from tag-isolation development

Documentation Updates

  • CLAUDE.md - Removed examples/ from structure, removed "Manual Lifecycle Scripts" section
  • README.md - Removed manual scripts references
  • deploy/regional/README.md - Replaced manual workflow with Lambda-based workflow
  • docs/runbooks/investigation-workflow.md - Updated cleanup phases to use AWS CLI
  • docs/runbooks/troubleshooting.md - Updated troubleshooting commands
  • docs/configuration/hcp-boundary-setup.md - Removed create_incident.sh references
  • docs/configuration/integration-scripts.md - Updated to generic automation examples
  • docs/architecture/overview.md - Updated Mermaid diagram

Code Updates

  • lambda/create-investigation/handler.py - Updated test reference comment
  • deploy/regional/lambda-create-investigation.tf - Removed deleted script from excludes

Test Results

LocalStack integration tests: 22 passed ✓

  • All infrastructure tests (ECS, EFS, IAM, S3, KMS, tag isolation) passed
  • No broken references to deleted scripts found

Migration Path

The single recommended workflow is now:

tools/create-investigation-lambda.sh <cluster-id> <investigation-id> [oc-version]
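
For example, the argument shape above could be checked before the script is called. The helper `investigation_cmd` and the identifiers `demo-cluster`, `inv-0001`, and `4.18` are made up for illustration; only the script path and argument order come from this PR.

```bash
# Illustrative helper (not part of the repository): builds the command line
# for the documented entry point and rejects a wrong argument count.
investigation_cmd() {
  if [ "$#" -lt 2 ] || [ "$#" -gt 3 ]; then
    echo "usage: investigation_cmd <cluster-id> <investigation-id> [oc-version]" >&2
    return 2
  fi
  # "$*" joins the two or three arguments with single spaces.
  printf 'tools/create-investigation-lambda.sh %s\n' "$*"
}

# Example with made-up identifiers; the printed command can be reviewed first.
investigation_cmd demo-cluster inv-0001 4.18
```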

For manual cleanup operations, users can reference the AWS CLI commands documented in deploy/regional/README.md and docs/runbooks/investigation-workflow.md.

🤖 Generated with Claude Code


PR Type

Enhancement


Description

  • Removes 6 manual lifecycle scripts from deploy/regional/examples/

  • Consolidates on Lambda/OIDC workflow as single canonical path

  • Updates all documentation to reference Lambda-based workflow

  • Replaces manual scripts with AWS CLI commands for cleanup operations


Diagram Walkthrough

flowchart LR
  OLD["Manual Scripts<br/>deploy/regional/examples/"]
  LAMBDA["Lambda Workflow<br/>tools/create-investigation-lambda.sh"]
  DOCS["Documentation<br/>Updated to Lambda path"]
  
  OLD -->|"Removed"| DELETED["Deleted Files"]
  LAMBDA -->|"Replaces"| OLD
  LAMBDA -->|"Referenced in"| DOCS
  DELETED -->|"Cleanup via"| CLI["AWS CLI Commands"]

File Walkthrough

Relevant files

Documentation (9 files)

  • handler.py: Update test reference comment to new location (+1/-1)
  • CLAUDE.md: Remove examples directory from structure documentation (+1/-50)
  • README.md: Remove manual scripts section and examples references (+1/-27)
  • README.md: Replace manual workflow with Lambda-based workflow (+59/-81)
  • overview.md: Update Mermaid diagram to reflect Lambda workflow (+6/-11)
  • hcp-boundary-setup.md: Remove create_incident.sh references (+2/-2)
  • integration-scripts.md: Update to generic automation examples without scripts (+18/-38)
  • investigation-workflow.md: Update cleanup phases to use AWS CLI directly (+48/-33)
  • troubleshooting.md: Update troubleshooting commands to AWS CLI (+5/-2)

Miscellaneous (6 files)

  • close_investigation.sh: Remove manual investigation cleanup script (+0/-144)
  • create_investigation.sh: Remove manual investigation creation script (+0/-170)
  • join_task.sh: Remove manual task connection script (+0/-67)
  • launch_task.sh: Remove manual task launch script (+0/-103)
  • stop_task.sh: Remove manual task stop script (+0/-104)
  • build-task-def.jq: Remove jq task definition transformation script (+0/-31)

Configuration changes (1 file)

  • lambda-create-investigation.tf: Remove deleted test script from Lambda excludes (+0/-1)

The Lambda/OIDC workflow (tools/create-investigation-lambda.sh) is the
canonical path for investigation lifecycle management. The manual scripts
in deploy/regional/examples/ bypassed Keycloak authentication and required
direct AWS IAM credentials; they were a parallel path that caused confusion.

Changes:
- Remove deploy/regional/examples/ directory (6 files: 5 shell scripts and 1 jq filter)
- Remove untracked test scripts from tag-isolation development
- Update all documentation to reference Lambda workflow
- Update docs/runbooks to use AWS CLI directly for cleanup
- Update lambda handler and Terraform to remove deleted script references

The single recommended workflow is now:
  tools/create-investigation-lambda.sh <cluster-id> <investigation-id> [oc-version]

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
@qodo-code-review

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified. No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status: Passed

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Compliance status legend

🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
High-level
Replace manual cleanup with a script

The PR replaces the close_investigation.sh script with a complex manual cleanup
process. A new script should be created in the tools/ directory to automate
these steps, improving user experience and reducing error risk.

Examples:

deploy/regional/README.md [207-224]

### Cleanup

Investigation cleanup requires manual deletion of resources:

```bash
# 1. List and stop any running tasks
aws ecs list-tasks --cluster rosa-boundary-dev --family <task-family>
aws ecs stop-task --cluster rosa-boundary-dev --task <task-id>

# 2. Deregister task definition revisions
```

 ... (clipped 8 lines)

docs/runbooks/investigation-workflow.md [301-316]

Cleanup requires manual deletion of resources.

### Usage

```bash
# 1. Stop all running tasks
aws ecs list-tasks --cluster <cluster-name> --family <task-family>
aws ecs stop-task --cluster <cluster-name> --task <task-id>

# 2. Deregister task definition revisions
```

 ... (clipped 6 lines)

Solution Walkthrough:

Before:

In deploy/regional/README.md:

### Cleanup

Investigation cleanup requires manual deletion of resources:

```bash
# 1. List and stop any running tasks
aws ecs list-tasks --cluster rosa-boundary-dev --family <task-family>
aws ecs stop-task --cluster rosa-boundary-dev --task <task-id>

# 2. Deregister task definition revisions
aws ecs list-task-definitions --family-prefix <task-family>
aws ecs deregister-task-definition --task-definition <task-definition-arn>

# 3. Delete EFS access point
aws efs delete-access-point --access-point-id <fsap-id>
```

After:

```bash
#!/bin/bash
# In new file tools/close-investigation.sh
# Script to automate cleanup of an investigation
TASK_FAMILY=$1
ACCESS_POINT_ID=$2

# ... logic to find and stop tasks ...
aws ecs list-tasks ... | xargs aws ecs stop-task ...

# ... logic to find and deregister task definitions ...
aws ecs list-task-definitions ... | xargs aws ecs deregister-task-definition ...

# ... logic to delete access point with confirmation ...
read -r -p "Delete access point $ACCESS_POINT_ID? " CONFIRM
if [ "$CONFIRM" = "yes" ]; then
  aws efs delete-access-point --access-point-id "$ACCESS_POINT_ID"
fi

echo "Investigation closed successfully!"
```

Suggestion importance[1-10]: 8

Why: The suggestion correctly identifies that replacing the automated close_investigation.sh script with manual CLI commands is a significant regression in usability and safety for the new workflow.
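
One way the suggested helper could be fleshed out is sketched below. It is not the project's tooling: `close_investigation`, `run`, and the `DRY_RUN` switch are names invented here, and only the AWS CLI subcommands come from the README excerpt quoted in this suggestion. The sketch defaults to dry-run so the destructive commands are printed rather than executed.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of tools/close-investigation.sh; names and defaults are assumptions.

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: close_investigation <cluster> <task-family> <access-point-id>
close_investigation() {
  if [ "$#" -ne 3 ]; then
    echo "usage: close_investigation <cluster> <task-family> <access-point-id>" >&2
    return 2
  fi
  local cluster="$1"
  local family="$2"
  local access_point_id="$3"
  local arn

  # 1. Stop every task in the family (the read-only listing runs even in dry-run mode).
  for arn in $(aws ecs list-tasks --cluster "$cluster" --family "$family" \
      --query 'taskArns[]' --output text); do
    run aws ecs stop-task --cluster "$cluster" --task "$arn"
  done

  # 2. Deregister every ACTIVE revision of the task definition family.
  for arn in $(aws ecs list-task-definitions --family-prefix "$family" \
      --status ACTIVE --query 'taskDefinitionArns[]' --output text); do
    run aws ecs deregister-task-definition --task-definition "$arn"
  done

  # 3. Delete the EFS access point last.
  run aws efs delete-access-point --access-point-id "$access_point_id"
}
```

Running `close_investigation <cluster> <task-family> <fsap-id>` prints the planned commands; setting `DRY_RUN=0` would execute them. A production version would also need to check the exit status of each listing call before continuing.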

Medium
General
Improve task deregistration cleanup instructions

Improve the manual cleanup instructions by providing a single command that uses
xargs to deregister all revisions of a task definition family, instead of just
one.

deploy/regional/README.md [216-218]

-# 2. Deregister task definition revisions
-aws ecs list-task-definitions --family-prefix <task-family>
-aws ecs deregister-task-definition --task-definition <task-definition-arn>
+# 2. Deregister all task definition revisions for the family
+aws ecs list-task-definitions --family-prefix <task-family> --status ACTIVE --query 'taskDefinitionArns[]' --output text | \
+  xargs -n 1 aws ecs deregister-task-definition --task-definition
Suggestion importance[1-10]: 6

Why: The suggestion correctly identifies that the manual cleanup instructions for deregistering task definitions are incomplete, as they don't handle multiple revisions. The proposed xargs command provides a robust, one-line solution, improving the usability and correctness of the documentation.
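
To see what the proposed pipeline does without touching AWS, the sketch below swaps the listing call for a stand-in and previews each command with `echo`. The function names and the ARNs are made up; the point is only that `--output text` yields one tab-separated line and `xargs -n 1` turns it into one invocation per ARN.

```bash
# Stand-in for `aws ecs list-task-definitions ... --output text`, which prints
# all matching ARNs on one tab-separated line. The ARNs below are made up.
fake_list_task_definitions() {
  printf '%s\t%s\n' \
    'arn:aws:ecs:us-east-1:111111111111:task-definition/demo:1' \
    'arn:aws:ecs:us-east-1:111111111111:task-definition/demo:2'
}

# `xargs -n 1` appends one ARN per invocation; `echo` previews each command
# instead of running it.
preview_deregistration() {
  fake_list_task_definitions |
    xargs -n 1 echo aws ecs deregister-task-definition --task-definition
}
```

Calling `preview_deregistration` prints one `deregister-task-definition` command per revision. With GNU xargs, an empty listing still runs the command once without an ARN unless `-r` (`--no-run-if-empty`) is added; other xargs implementations may behave differently.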

Low
Provide a complete deregistration command

Improve the manual cleanup instructions by providing a single command that uses
xargs to deregister all revisions of a task definition family, instead of just
one.

docs/runbooks/investigation-workflow.md [310-312]

-# 2. Deregister task definition revisions
-aws ecs list-task-definitions --family-prefix <task-family>
-aws ecs deregister-task-definition --task-definition <task-definition-arn>
+# 2. Deregister all task definition revisions for the family
+aws ecs list-task-definitions --family-prefix <task-family> --status ACTIVE --query 'taskDefinitionArns[]' --output text | \
+  xargs -n 1 aws ecs deregister-task-definition --task-definition
Suggestion importance[1-10]: 6

Why: The suggestion correctly points out that the manual task deregistration command is incomplete, as it only handles a single revision. The proposed fix using xargs provides a complete, copy-pasteable command that makes the manual cleanup process more robust and user-friendly.

Low