ECS task continually restarting, freezing CloudFormation deployments #201

Open
@nate-anderson

Hi all,

I am using the CDK to deploy an EC2-backed ECS cluster running two of my application services and an X-Ray sidecar service. When I deploy, the X-Ray sidecar task continually restarts (roughly every 30 minutes) and never reaches a healthy state. This leaves the CloudFormation stack stuck in UPDATE_IN_PROGRESS, or in UPDATE_ROLLBACK_IN_PROGRESS if I try to cancel the change. I can find no documentation stating whether the container performs its own health checks and restarts itself, and unfortunately the ECS task has no logs to explain the exits.

Here is my CDK code provisioning the cluster, the service discovery DNS namespace, the server task definition, and the X-Ray sidecar task definition and service.

const vpc = new Vpc(this, `test-vpc`);

const cluster = new ECS.Cluster(this, `test-cluster`, {
    clusterName: clusterId,
    vpc,
    capacity: {
      instanceType: new EC2.InstanceType(props.clusterInstanceType),
    },
    containerInsights: true,
});

const namespace = 'testing';
const dnsNamespace = new ServiceDiscovery.PrivateDnsNamespace(this, `dns-namespace`, {
    vpc,
    name: namespace,
});

const serverTaskDefinition = new ECS.TaskDefinition(this, serverTaskId, {
    compatibility: ECS.Compatibility.EC2,
});

serverTaskDefinition.addContainer('ServerContainer', {
    image: ContainerImage.fromEcrRepository(serverRepo, latestTag),
    containerName: 'server',
    memoryReservationMiB: 1024,
    portMappings: [
        {
            containerPort: Port.HTTP,
        }
    ],
    healthCheck: {
        command: [ `curl localhost/healthcheck` ],
        interval: cdk.Duration.seconds(300),  
    },
});

const xraySidecarTaskDefinition = new ECS.TaskDefinition(this, `xray-sidecar`, {
    compatibility: ECS.Compatibility.EC2,
});

xraySidecarTaskDefinition.addContainer('xray-sidecar-task', {
    containerName: XRayConfig.containerName,
    image: ContainerImage.fromRegistry("amazon/aws-xray-daemon"),
    cpu: 32,
    memoryReservationMiB: 256,
    environment: {
        AWS_XRAY_DAEMON_ADDRESS: `${XRayConfig.containerName}.${namespace}:${XRayConfig.port}`,
    },
    portMappings: [
        {
            hostPort: XRayConfig.port,
            containerPort: XRayConfig.port,
            protocol: ECS.Protocol.UDP,
        }
    ]
});

const xraySidecarService = new ECS.Ec2Service(this, `xray-sidecar-service`, {
    taskDefinition: xraySidecarTaskDefinition,
    cluster,
    desiredCount: 1,
    cloudMapOptions: {
        cloudMapNamespace: dnsNamespace,
        name: XRayConfig.containerName,
        containerPort: XRayConfig.port,
        dnsRecordType: ServiceDiscovery.DnsRecordType.SRV,
    },
});

I would really appreciate it if anyone could point out a problem with my CDK approach, or anything I've missed in the docs that explains how to keep the X-Ray sidecar process from exiting. If there is some health check inside the container (e.g. UDP packets must be received within 30 seconds of the process starting), it seems to me that a failure of that check should log an error to point users in the right direction.
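
Neither container definition above sets a `logging` option, which may be why the task shows no logs. As a minimal sketch (not a confirmed fix), the block below builds the `logConfiguration` entry that an ECS container definition uses for the `awslogs` driver. The helper name `awsLogsConfiguration`, the log group name, the region, and the stream prefix are all assumptions made for illustration.

```typescript
// Shape of the `logConfiguration` field of an ECS container definition
// when the `awslogs` driver is used (keys follow the ECS task definition JSON).
interface AwsLogsConfiguration {
    logDriver: 'awslogs';
    options: {
        'awslogs-group': string;
        'awslogs-region': string;
        'awslogs-stream-prefix': string;
    };
}

// Hypothetical helper: builds the log configuration for a single container.
function awsLogsConfiguration(
    group: string,
    region: string,
    streamPrefix: string,
): AwsLogsConfiguration {
    return {
        logDriver: 'awslogs',
        options: {
            'awslogs-group': group,
            'awslogs-region': region,
            'awslogs-stream-prefix': streamPrefix,
        },
    };
}

// Assumed values, for illustration only.
const xrayLogConfiguration = awsLogsConfiguration('/ecs/xray-sidecar', 'us-east-1', 'xray');
console.log(JSON.stringify(xrayLogConfiguration, null, 2));
```

In the CDK, passing `logging: ECS.LogDrivers.awsLogs({ streamPrefix: 'xray' })` to `addContainer` should render an equivalent block in the synthesized template, so that whatever the daemon writes before exiting reaches CloudWatch Logs.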

Thanks for any advice you can offer!
