Skip to content

Multi-job support (using job start/end hooks), optional CloudWatch logging, demo workflows, module/repo rename#2

Closed
ryan-williams wants to merge 0 commit intomainfrom
rw/hooks
Closed

Multi-job support (using job start/end hooks), optional CloudWatch logging, demo workflows, module/repo rename#2
ryan-williams wants to merge 0 commit intomainfrom
rw/hooks

Conversation

@ryan-williams
Copy link
Copy Markdown
Member

@ryan-williams ryan-williams commented Jul 31, 2025

  • Improved shutdown/cleanup behavior:
    • Deregisters GH runners (in addition to shutting down EC2 instance)
    • Supports multiple jobs (in sequence) on a given EC2 runner
      • Self-hosted runners don't support parallel job execution, out of the box (cf. orgs/community#26769)
      • #3 adds the ability to initialize+register multiple runners in different folders on a given instance
    • Uses ACTIONS_RUNNER_HOOK_JOB_* hooks for job tracking (instead of polling)
  • Added demo workflows (.github/workflows/demo*; see demos.yml runs)
  • Added (optional) CloudWatch integration (for debugging runners)
  • Improved runner name/labels, AWS tags
  • Renamed repo to ec2-gha (and Python module to ec2_gha)
  • Simplified repo stack:
    • Removed/Archived OA/ec2 (its runner.yml now lives in this repo, and wraps action.yml from this repo)
    • Copied demos from OA/ec2-gha-demos into this repo

Examples

demos#18

mamba#15

(these failures are expected, part of demonstrating issues with pip install mamba_ssm at different versions)

Ocean_Emulator#308

CloudWatch console cloudwatch streams
EC2 instances console (showing many terminat{ed,ing} instances instances

ryan-williams added a commit that referenced this pull request Aug 4, 2025
(for testing on #2)
@ryan-williams ryan-williams force-pushed the rw/hooks branch 2 times, most recently from 3f0abc9 to de868e0 Compare August 4, 2025 15:25
ryan-williams added a commit that referenced this pull request Aug 4, 2025
(for testing on #2)
@ryan-williams ryan-williams force-pushed the rw/hooks branch 11 times, most recently from 4b3f29f to e09ed8f Compare August 6, 2025 04:52
ryan-williams added a commit that referenced this pull request Aug 6, 2025
(for testing on #2)
ryan-williams added a commit that referenced this pull request Aug 6, 2025
(for testing on #2)
@ryan-williams ryan-williams marked this pull request as draft August 6, 2025 21:07
@ryan-williams ryan-williams force-pushed the rw/hooks branch 6 times, most recently from 6bec165 to efc6c98 Compare August 7, 2025 14:29
@ryan-williams ryan-williams force-pushed the rw/hooks branch 4 times, most recently from cdb6a54 to 8dcb14b Compare August 7, 2025 20:56
@ryan-williams ryan-williams marked this pull request as ready for review August 8, 2025 18:55
@ryan-williams ryan-williams changed the title v2: Enhanced runner lifecycle, CloudWatch logs, flexible configuration Multi-job support (using job start/end hooks), optional CloudWatch logging, demo workflows, module/repo rename Aug 8, 2025
@ryan-williams ryan-williams force-pushed the rw/hooks branch 2 times, most recently from c648782 to d4bb7fa Compare August 11, 2025 18:56
Copy link
Copy Markdown
Member

@jder jder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome stuff, this seems great. I didn't read the bash script super carefully but generally LGTM. Thanks!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐑 maybe next time we're in here, it would be nice to swap to a snapshot testing library rather than the many individual asserts on template outputs

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, Claude chose syrupy and added it

check_required(env, required)
# The timeout is infallible
timeout = int(os.environ["INPUT_GH_TIMEOUT"])
timeout = int(os.environ["INPUT_MAX_INSTANCE_LIFETIME"])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this timeout (passed to DeployInstance below) has different semantics from the previous GH_TIMEOUT. Was your thinking that we don't need a timeout for how long to wait for the instance to come up which is separate from the timeout for the instance lifetime?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Added runner_registration_timeout, essentially replacing the old gh_timeout, and used it here.

Comment on lines +103 to +110
# Determine the default user based on the home directory
if [ "$homedir" = "/home/ubuntu" ]; then
DEFAULT_USER="ubuntu"
elif [ "$homedir" = "/home/ec2-user" ]; then
DEFAULT_USER="ec2-user"
else
DEFAULT_USER="root"
fi
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps DEFAULT_USER=$$(stat -c "%U" "$homedir")?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done.

ryan-williams added a commit that referenced this pull request Aug 16, 2025
Removed workflows not in test branch:
- demo-00-minimal.yml
- demo-gpu-dbg.yml

Added placeholders for workflows from test branch:
- demo-gpu-minimal.yml
- demo-gpu.yml
- demo-multi-instance.yml
- demo-multi-job.yml

All placeholder workflows display a message that they're being developed
in the #2 / rw/hooks branch to avoid confusion.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
ryan-williams added a commit that referenced this pull request Aug 19, 2025
Removed workflows not in test branch:
- demo-00-minimal.yml
- demo-gpu-dbg.yml

Added placeholders for workflows from test branch:
- demo-gpu-minimal.yml
- demo-gpu.yml
- demo-multi-instance.yml
- demo-multi-job.yml

All placeholder workflows display a message that they're being developed
in the #2 / rw/hooks branch to avoid confusion.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@ryan-williams ryan-williams force-pushed the rw/hooks branch 7 times, most recently from a90bd76 to 59bd562 Compare August 20, 2025 02:41
ryan-williams added a commit that referenced this pull request Aug 20, 2025
Removed workflows not in test branch:
- demo-00-minimal.yml
- demo-gpu-dbg.yml

Added placeholders for workflows from test branch:
- demo-gpu-minimal.yml
- demo-gpu.yml
- demo-multi-instance.yml
- demo-multi-job.yml

All placeholder workflows display a message that they're being developed
in the #2 / rw/hooks branch to avoid confusion.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@ryan-williams ryan-williams force-pushed the rw/hooks branch 5 times, most recently from db06efe to 5a95a93 Compare August 20, 2025 06:06
@ryan-williams
Copy link
Copy Markdown
Member Author

I "merged" this to by pushing main up to the point that @jder had approved (plus a few commits addressing his CR comments).

I'll open a new PR to add further work I've done since then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants