Task ID wrong in health status event #296

@bartjolenovation

Hi,

We are looking to upgrade Marathon and also marathon-consul.
We are testing with marathon-consul 1.4.2 (from apt) and Marathon 1.9.

We see that when an application is stopped (by scaling to 0 instances) and then started again by scaling to 1 instance, the application is not registered in Consul (the unregister works fine, though). We see the following in the marathon-consul log when I grep for the instance UUID:

`grep b98f6bf3-eb61-11e9-af4c-02423f927891 /var/log/marathon-consul.log

time="2019-10-10T13:27:38Z" level=info msg="Got StatusEvent" Id=demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891._app.1 TaskStatus="TASK_STARTING"

time="2019-10-10T13:27:38Z" level=debug msg="Not handled task status" Id=demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891._app.1 taskStatus="TASK_STARTING"

time="2019-10-10T13:27:38Z" level=info msg="Got StatusEvent" Id=demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891._app.1 TaskStatus="TASK_RUNNING"

time="2019-10-10T13:27:38Z" level=debug msg="Not handled task status" Id=demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891._app.1 taskStatus="TASK_RUNNING"

time="2019-10-10T13:27:41Z" level=info msg="Got HealthStatusEvent" Id=demo_hello-world.b98f6bf3-eb61-11e9-af4c-02423f927891

time="2019-10-10T13:27:41Z" level=error msg="Task not found" Id=demo_hello-world.b98f6bf3-eb61-11e9-af4c-02423f927891

time="2019-10-10T13:27:41Z" level=info msg="Got HealthStatusEvent" Id=demo_hello-world.b98f6bf3-eb61-11e9-af4c-02423f927891

time="2019-10-10T13:27:41Z" level=error msg="Task not found" Id=demo_hello-world.b98f6bf3-eb61-11e9-af4c-02423f927891 `

So the "Got HealthStatusEvent" line logs the ID demo_hello-world.b98f6bf3-eb61-11e9-af4c-02423f927891, whereas the status events above show that the instance ID is demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891 (with an "instance-" prefix before the UUID).
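To make the difference between the two IDs explicit, here is a minimal Go sketch that compares the ID from the "Task not found" log line with the instance ID from the status events. The helper `describeMismatch` is hypothetical and written only for this illustration; it is not marathon-consul code and says nothing about where the prefix actually gets lost.

```go
package main

import (
	"fmt"
	"strings"
)

// describeMismatch reports how a logged ID relates to an instance ID.
// Hypothetical helper for illustration only, not part of marathon-consul.
func describeMismatch(loggedID, instanceID string) string {
	if loggedID == instanceID {
		return "identical"
	}
	// Drop the first ".instance-" marker and compare again, to check
	// whether the prefix is the only difference between the two IDs.
	if strings.Replace(instanceID, ".instance-", ".", 1) == loggedID {
		return "differs only by the \"instance-\" prefix"
	}
	return "differs in some other way"
}

func main() {
	// Both IDs are copied from the log excerpt above.
	logged := "demo_hello-world.b98f6bf3-eb61-11e9-af4c-02423f927891"
	instance := "demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891"
	fmt.Println(describeMismatch(logged, instance))
}
```

For the two IDs in the log, the check confirms that the "instance-" prefix is the only difference.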

So I suspected that the Marathon events endpoint was returning an incorrect instance ID, and captured the traffic going through the events endpoint with tcpflow (tcpflow -i eth1 -c port 8080 >> ~/tcpdump2):

grep b98f6bf3-eb61-11e9-af4c-02423f927891 ~/tcpdump2

data: {"instanceId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891","condition":"Scheduled","runSpecId":"/demo/hello-world","agentId":null,"host":null,"runSpecVersion":"2019-10-10T13:27:36.878Z","timestamp":"2019-10-10T13:27:36.906Z","eventType":"instance_changed_event"}

data: {"instanceId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891","condition":"Provisioned","runSpecId":"/demo/hello-world","agentId":"6b0d8829-4681-4638-bbe2-121d950e241e-S0","host":"10.141.141.10","runSpecVersion":"2019-10-10T13:27:36.878Z","timestamp":"2019-10-10T13:27:37.746Z","eventType":"instance_changed_event"}

data: {"slaveId":"6b0d8829-4681-4638-bbe2-121d950e241e-S0","taskId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891._app.1","taskStatus":"TASK_STARTING","message":"","appId":"/demo/hello-world","host":"10.141.141.10","ipAddresses":[{"ipAddress":"127.0.1.1","protocol":"IPv4"}],"ports":[31338],"version":"2019-10-10T13:27:36.878Z","eventType":"status_update_event","timestamp":"2019-10-10T13:27:38.063Z"}

data: {"instanceId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891","condition":"Starting","runSpecId":"/demo/hello-world","agentId":"6b0d8829-4681-4638-bbe2-121d950e241e-S0","host":"10.141.141.10","runSpecVersion":"2019-10-10T13:27:36.878Z","timestamp":"2019-10-10T13:27:38.063Z","eventType":"instance_changed_event"}

data: {"slaveId":"6b0d8829-4681-4638-bbe2-121d950e241e-S0","taskId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891._app.1","taskStatus":"TASK_RUNNING","message":"","appId":"/demo/hello-world","host":"10.141.141.10","ipAddresses":[{"ipAddress":"127.0.1.1","protocol":"IPv4"}],"ports":[31338],"version":"2019-10-10T13:27:36.878Z","eventType":"status_update_event","timestamp":"2019-10-10T13:27:38.769Z"}

data: {"instanceId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891","condition":"Running","runSpecId":"/demo/hello-world","agentId":"6b0d8829-4681-4638-bbe2-121d950e241e-S0","host":"10.141.141.10","runSpecVersion":"2019-10-10T13:27:36.878Z","timestamp":"2019-10-10T13:27:38.769Z","eventType":"instance_changed_event"}

data: {"appId":"/demo/hello-world","instanceId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891","version":"2019-10-10T13:27:36.878Z","alive":true,"eventType":"health_status_changed_event","timestamp":"2019-10-10T13:27:41.919Z"}

data: {"appId":"/demo/hello-world","instanceId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891","version":"2019-10-10T13:27:36.878Z","alive":true,"eventType":"health_status_changed_event","timestamp":"2019-10-10T13:27:41.929Z"}

data: {"instanceId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891","runSpecId":"/demo/hello-world","healthy":true,"runSpecVersion":"2019-10-10T13:27:36.878Z","timestamp":"2019-10-10T13:27:41.929Z","eventType":"instance_health_changed_event"}

I see no instance ID in the capture that lacks the "instance-" prefix. I haven't found any existing report that resembles this problem, and I am not sure whether this is a Marathon or a marathon-consul problem.

The "Task not found" error is logged by events/event_handler.go (line 142).
