Description
Hi all,
I am using the CDK to deploy an EC2-backed ECS cluster with two of my application services and an X-Ray sidecar service. When I deploy, the X-Ray sidecar tasks restart continually (roughly every 30 minutes, it seems) and never reach a healthy state, which leaves the CloudFormation stack stuck in UPDATE_IN_PROGRESS, or in UPDATE_ROLLBACK_IN_PROGRESS if I attempt to cancel the change. I can find no documentation stating whether the container performs its own health checks and restarts itself, and unfortunately the ECS task has no logs to explain the exits.
Here is my CDK code provisioning the cluster, the service discovery DNS namespace, the server task definition, and the X-Ray sidecar task definition and service:
const vpc = new Vpc(this, `test-vpc`);
const cluster = new ECS.Cluster(this, `test-cluster`, {
  clusterName: clusterId,
  vpc,
  capacity: {
    instanceType: new EC2.InstanceType(props.clusterInstanceType),
  },
  containerInsights: true,
});

const namespace = 'testing';
const dnsNamespace = new ServiceDiscovery.PrivateDnsNamespace(this, `dns-namespace`, {
  vpc,
  name: namespace,
});

const serverTaskDefinition = new ECS.TaskDefinition(this, serverTaskId, {
  compatibility: ECS.Compatibility.EC2,
});
serverTaskDefinition.addContainer('ServerContainer', {
  image: ContainerImage.fromEcrRepository(serverRepo, latestTag),
  containerName: 'server',
  memoryReservationMiB: 1024,
  portMappings: [
    {
      containerPort: Port.HTTP,
    }
  ],
  healthCheck: {
    command: [ `curl localhost/healthcheck` ],
    interval: cdk.Duration.seconds(300),
  },
});

const xraySidecarTaskDefinition = new ECS.TaskDefinition(this, `xray-sidecar`, {
  compatibility: ECS.Compatibility.EC2,
});
xraySidecarTaskDefinition.addContainer('xray-sidecar-task', {
  containerName: XRayConfig.containerName,
  image: ContainerImage.fromRegistry("amazon/aws-xray-daemon"),
  cpu: 32,
  memoryReservationMiB: 256,
  environment: {
    AWS_XRAY_DAEMON_ADDRESS: `${XRayConfig.containerName}.${namespace}:${XRayConfig.port}`,
  },
  portMappings: [
    {
      hostPort: XRayConfig.port,
      containerPort: XRayConfig.port,
      protocol: ECS.Protocol.UDP,
    }
  ]
});

const xraySidecarService = new ECS.Ec2Service(this, `xray-sidecar-service`, {
  taskDefinition: xraySidecarTaskDefinition,
  cluster,
  desiredCount: 1,
  cloudMapOptions: {
    cloudMapNamespace: dnsNamespace,
    name: XRayConfig.containerName,
    containerPort: XRayConfig.port,
    dnsRecordType: ServiceDiscovery.DnsRecordType.SRV,
  },
});
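Since the sidecar task currently produces no logs, attaching a log driver to the container may help surface the reason for the exits. The following is a minimal sketch and not a confirmed fix. It assumes the v2 `aws-cdk-lib` package (the imports in the code above are not shown, so the module names here may differ), and the stack name, stream prefix, and retention period are hypothetical.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as logs from 'aws-cdk-lib/aws-logs';

// Hypothetical standalone app and stack, used only so the sketch synthesizes on its own.
const app = new cdk.App();
const stack = new cdk.Stack(app, 'XraySidecarLoggingSketch');

const xraySidecarTaskDefinition = new ecs.Ec2TaskDefinition(stack, 'xray-sidecar');

xraySidecarTaskDefinition.addContainer('xray-sidecar-task', {
  image: ecs.ContainerImage.fromRegistry('amazon/aws-xray-daemon'),
  cpu: 32,
  memoryReservationMiB: 256,
  // Send the container's stdout/stderr to CloudWatch Logs so that exits leave a trace.
  logging: ecs.LogDrivers.awsLogs({
    streamPrefix: 'xray-sidecar',              // hypothetical prefix
    logRetention: logs.RetentionDays.ONE_WEEK, // hypothetical retention
  }),
});

app.synth();
```

With a log driver attached, whatever the daemon writes before it exits should appear in the log group that CDK creates for the container.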
I would really appreciate it if anyone could point out an issue with my CDK approach, or something I've missed in the docs that explains how to prevent the X-Ray sidecar process from exiting. It seems to me that if there is some health check inside the container (e.g. UDP packets must be received within 30 seconds of the process starting), then a failure of that check should log an error to push users in the right direction.
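To check whether the daemon accepts traffic over UDP at all, one option is to send it a single hand-built segment. This is only a sketch: it assumes the daemon's documented datagram format (a JSON header line followed by the segment document), and the function name, segment name, and IDs are hypothetical values for illustration.

```typescript
// Header line the X-Ray daemon expects in front of each segment document.
const DAEMON_HEADER = '{"format": "json", "version": 1}';

interface MinimalSegment {
  name: string;
  id: string;         // 16 hex digits
  trace_id: string;   // "1-<hex epoch seconds>-<24 hex digits>"
  start_time: number; // epoch seconds
  end_time: number;   // epoch seconds
}

// Builds the text payload for one UDP datagram: header, newline, segment JSON.
function buildDaemonDatagram(segment: MinimalSegment): string {
  return `${DAEMON_HEADER}\n${JSON.stringify(segment)}`;
}

// Hypothetical probe segment; the trace ID embeds the current time in hex.
const nowSeconds = Date.now() / 1000;
const payload = buildDaemonDatagram({
  name: "udp-probe",
  id: "70de5b6f19ff9a0a",
  trace_id: `1-${Math.floor(nowSeconds).toString(16)}-a006649127e371903a2de979`,
  start_time: nowSeconds,
  end_time: nowSeconds + 0.1,
});

console.log(payload);
```

The resulting string can then be sent to the daemon's host and port with Node's `dgram` module (`createSocket('udp4')` followed by `socket.send`). UDP gives no delivery confirmation, so the daemon's own logs are still needed to confirm that the segment arrived.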
Thanks for any advice you can offer!