Replace lifecycled and self-termination with ASG lifecycle hooks #964

Closed
wants to merge 42 commits into from
Changes from 22 commits
Commits
da91dd9 Add MaxInstanceLifetime (nitrocode, May 6, 2021)
155478b Remove extra spacing (nitrocode, May 6, 2021)
62c209b Merge branch 'master' into MaxInstanceLifetime (nitrocode, Sep 14, 2021)
d06ab5e Merge branch 'master' into MaxInstanceLifetime (nitrocode, Sep 24, 2021)
913a1b1 Remove disconnect after idle setting (keithduncan, Nov 22, 2021)
b458f03 Remove terminate instance after agent exit (keithduncan, Nov 22, 2021)
3fca7f3 Bump agent scaler and enable scale in (keithduncan, Nov 22, 2021)
be157f4 Remove lifecycled (keithduncan, Nov 22, 2021)
3a259c7 Don't start the agent on boot (keithduncan, Nov 22, 2021)
974946d Remove scale in protection (keithduncan, Nov 22, 2021)
462623d Remove AZ rebalance suspension lambda (keithduncan, Nov 22, 2021)
51188df Add skeleton of hooks and lambdas to respond to them (keithduncan, Nov 22, 2021)
bb40e70 Give instances permission to complete lifecycle actions (keithduncan, Nov 22, 2021)
5a23ec2 Make the terminate hook continue by default (keithduncan, Nov 22, 2021)
9b72181 Fix typo in SSM IAM action (keithduncan, Nov 22, 2021)
921301a Remove lambdas (keithduncan, Nov 22, 2021)
ba2fd09 Add an SSM role and boot hook automation (keithduncan, Nov 22, 2021)
9268219 Remove instance permission to complete lifecycle actions (keithduncan, Nov 22, 2021)
9700a02 Add a terminate instance ssm automation (keithduncan, Nov 22, 2021)
42d6860 Move the automation role (keithduncan, Nov 22, 2021)
aa1deb9 Add EventBridge rules to route ASG lifecycle events to the SSM automa… (keithduncan, Nov 22, 2021)
56ae297 Add account id field to the ssm document ARN (keithduncan, Nov 22, 2021)
5244f5d Add ssm:CreateDocument permission to the service role (keithduncan, Nov 22, 2021)
a02572f Fix capitalisation of the SSM document aws:runCommand (keithduncan, Nov 22, 2021)
c1f101e Add more ssm document permissions (keithduncan, Nov 22, 2021)
448e169 Add tag actions (keithduncan, Nov 22, 2021)
22fa12b Make systemctl stop wait 1 hour for the process to exit (keithduncan, Nov 23, 2021)
63e9a79 Fix event pattern structure (keithduncan, Nov 23, 2021)
223cb37 Add missing iam permission (keithduncan, Nov 23, 2021)
d1c312b Strings strings strings (keithduncan, Nov 23, 2021)
624ea72 Add more automation role iam requirements (keithduncan, Nov 23, 2021)
8bb0941 Add windows support for ssm automations (keithduncan, Nov 23, 2021)
ca1e1c4 Limit the event match to the specific hook we have created (keithduncan, Nov 23, 2021)
1352c0d Make the rules for boot and automation depend on the hooks (keithduncan, Nov 23, 2021)
8ae7f15 Add a spot interruption rule and automation that terminates the instance (keithduncan, Nov 23, 2021)
d839604 Fix reference to the shutdown hook from the shutdown rule (keithduncan, Nov 23, 2021)
c1b4421 Make the windows ssm start command timeout 10 minutes (keithduncan, Nov 23, 2021)
5675c98 Add comment to spot interruption event schema (keithduncan, Nov 23, 2021)
d48e9a0 Remove auto scaling group name from the TerminateInstanceInAutoScalin… (keithduncan, Nov 25, 2021)
2d30b78 Merge remote-tracking branch 'nitrocode/MaxInstanceLifetime' into kei… (keithduncan, Nov 25, 2021)
d50bb4b Enable capacity rebalancing (keithduncan, Nov 25, 2021)
b4dde06 Give the automation role permission to invoke autoscaling:TerminateIn… (keithduncan, Nov 25, 2021)
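
Commits aa1deb9 and ca1e1c4 describe EventBridge rules that route ASG lifecycle events to SSM automations and that match only the specific hook the stack creates. The template that defines those rules is not among the files shown on this page, so the following is only a minimal sketch of what such an event pattern could look like, not the PR's actual rule. The function name `terminate_hook_pattern` and the example ASG and hook names are hypothetical. The `source`, `detail-type`, and `detail` field names follow the event format AWS documents for Auto Scaling lifecycle actions.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not taken from the PR): print an EventBridge event
# pattern that matches the terminate lifecycle action of one named hook on
# one Auto Scaling group, and nothing else.
terminate_hook_pattern() {
  local asg_name="$1"
  local hook_name="$2"
  cat <<EOF
{
  "source": ["aws.autoscaling"],
  "detail-type": ["EC2 Instance-terminate Lifecycle Action"],
  "detail": {
    "AutoScalingGroupName": ["${asg_name}"],
    "LifecycleHookName": ["${hook_name}"]
  }
}
EOF
}

# Example invocation with placeholder names (assumptions, not stack values).
terminate_hook_pattern "example-asg" "example-terminate-hook"
```

Restricting the pattern to `LifecycleHookName` keeps the rule from firing for other hooks that might be attached to the same group, which appears to be the intent of ca1e1c4.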
7 changes: 0 additions & 7 deletions goss.yaml
@@ -44,10 +44,6 @@ service:
 enabled: true
 running: true

-lifecycled:
-enabled: true
-running: true
-
 sshd:
 enabled: true
 running: true
@@ -93,9 +89,6 @@ process:
 buildkite-agent:
 running: true

-lifecycled:
-running: true
-
 sshd:
 running: true

4 changes: 0 additions & 4 deletions packer/linux/buildkite-ami.json
@@ -45,10 +45,6 @@
 "type": "shell",
 "script": "scripts/install-cloudwatch-agent.sh"
 },
-{
-"type": "shell",
-"script": "scripts/install-lifecycled.sh"
-},
 {
 "type": "shell",
 "script": "scripts/install-docker.sh"
11 changes: 0 additions & 11 deletions packer/linux/conf/bin/bk-install-elastic-stack.sh
@@ -176,7 +176,6 @@ experiment="${BUILDKITE_AGENT_EXPERIMENTS}"
 priority=%n
 spawn=${BUILDKITE_AGENTS_PER_INSTANCE}
 no-color=true
-disconnect-after-idle-timeout=${BUILDKITE_SCALE_IN_IDLE_PERIOD}
 disconnect-after-job=${BUILDKITE_TERMINATE_INSTANCE_AFTER_JOB}

Review comment (Contributor, Author): Will need a story for this

 EOF

@@ -203,15 +202,6 @@ if [[ -n "${BUILDKITE_ELASTIC_BOOTSTRAP_SCRIPT}" ]] ; then
 rm /tmp/elastic_bootstrap
 fi

-cat << EOF > /etc/lifecycled
-AWS_REGION=${AWS_REGION}
-LIFECYCLED_HANDLER=/usr/local/bin/stop-agent-gracefully
-LIFECYCLED_CLOUDWATCH_GROUP=/buildkite/lifecycled
-EOF
-
-systemctl enable lifecycled.service
-systemctl start lifecycled
-
 # wait for docker to start
 next_wait_time=0
 until docker ps || [ $next_wait_time -eq 5 ]; do
@@ -224,7 +214,6 @@ if ! docker ps ; then
 fi

 systemctl enable "buildkite-agent"
-systemctl start "buildkite-agent"

 # let the stack know that this host has been initialized successfully
 /opt/aws/bin/cfn-signal \
@@ -12,9 +12,8 @@ Environment="HOME=/var/lib/buildkite-agent"
 Environment="PATH=/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"
 Environment="USER=buildkite-agent"
 ExecStart=/usr/bin/buildkite-agent start
-ExecStopPost=/usr/local/bin/terminate-instance
 RestartSec=5
-Restart=on-failure
+Restart=always
 RestartForceExitStatus=SIGPIPE
 TimeoutStartSec=10
 TimeoutStopSec=0
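
With `ExecStopPost` removed, the unit no longer terminates the instance when the agent exits, so stopping the agent has to be driven from outside the unit. Commit 22fa12b mentions making `systemctl stop` wait 1 hour for the process to exit, but the mechanism is not visible in the files shown here. The following is a minimal sketch under the assumption that GNU coreutils `timeout` bounds the wait. The function name `stop_unit_with_deadline` and the `SYSTEMCTL` override are hypothetical and are not taken from the PR.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not taken from the PR): ask systemd to stop a unit,
# but give up waiting once a deadline has passed.
# SYSTEMCTL can be overridden (for example with "echo") to dry-run the call
# on a machine without systemd.
stop_unit_with_deadline() {
  local unit="$1"
  local deadline_seconds="${2:-3600}"  # default of 1 hour, as in commit 22fa12b
  # GNU timeout exits with status 124 if the deadline passes first.
  timeout "${deadline_seconds}" "${SYSTEMCTL:-systemctl}" stop "${unit}"
}
```

Running `SYSTEMCTL=echo stop_unit_with_deadline buildkite-agent` prints the arguments that would be passed to `systemctl` without touching any service. An explicit `systemctl stop` is not followed by an automatic restart, so `Restart=always` does not bring the agent back after this call.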
22 changes: 0 additions & 22 deletions packer/linux/scripts/install-lifecycled.sh

This file was deleted.

4 changes: 0 additions & 4 deletions packer/windows/buildkite-ami.json
@@ -44,10 +44,6 @@
 "type": "powershell",
 "script": "scripts/install-cloudwatch-agent.ps1"
 },
-{
-"type": "powershell",
-"script": "scripts/install-lifecycled.ps1"
-},
 {
 "type": "powershell",
 "script": "scripts/install-docker.ps1"
11 changes: 1 addition & 10 deletions packer/windows/conf/bin/bk-install-elastic-stack.ps1
@@ -135,15 +135,10 @@ priority=%n
 spawn=${Env:BUILDKITE_AGENTS_PER_INSTANCE}
 no-color=true
 shell=powershell
-disconnect-after-idle-timeout=${Env:BUILDKITE_SCALE_IN_IDLE_PERIOD}
 disconnect-after-job=${Env:BUILDKITE_TERMINATE_INSTANCE_AFTER_JOB}
 "@
 $OFS=" "

-nssm set lifecycled AppEnvironmentExtra :AWS_REGION=$Env:AWS_REGION
-nssm set lifecycled AppEnvironmentExtra +LIFECYCLED_HANDLER="C:\buildkite-agent\bin\stop-agent-gracefully.ps1"
-Restart-Service lifecycled
-
 # wait for docker to start
 $next_wait_time=0
 do {
@@ -212,13 +207,9 @@ nssm set buildkite-agent AppEnvironmentExtra :HOME=C:\buildkite-agent
 If ($lastexitcode -ne 0) { Exit $lastexitcode }
 nssm set buildkite-agent AppExit Default Restart
 If ($lastexitcode -ne 0) { Exit $lastexitcode }
-nssm set buildkite-agent AppRestartDelay 10000
-If ($lastexitcode -ne 0) { Exit $lastexitcode }
-nssm set buildkite-agent AppEvents Exit/Post "powershell C:\buildkite-agent\bin\terminate-instance.ps1"
+nssm set buildkite-agent AppRestartDelay 5000
 If ($lastexitcode -ne 0) { Exit $lastexitcode }

-Restart-Service buildkite-agent
-
 # renable debug tracing
 Set-PSDebug -Trace 2

19 changes: 0 additions & 19 deletions packer/windows/scripts/install-lifecycled.ps1

This file was deleted.
