Hi,
We are looking to upgrade Marathon and also marathon-consul.
We are testing with marathon-consul 1.4.2 (installed from apt) and Marathon 1.9.
When an application is stopped (by scaling it to 0 instances) and then started again (by scaling it to 1 instance), the application is not registered in Consul. Deregistration works fine. This is what we see in the marathon-consul log, filtered by the instance UUID:
```
grep b98f6bf3-eb61-11e9-af4c-02423f927891 /var/log/marathon-consul.log
time="2019-10-10T13:27:38Z" level=info msg="Got StatusEvent" Id=demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891._app.1 TaskStatus="TASK_STARTING"
time="2019-10-10T13:27:38Z" level=debug msg="Not handled task status" Id=demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891._app.1 taskStatus="TASK_STARTING"
time="2019-10-10T13:27:38Z" level=info msg="Got StatusEvent" Id=demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891._app.1 TaskStatus="TASK_RUNNING"
time="2019-10-10T13:27:38Z" level=debug msg="Not handled task status" Id=demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891._app.1 taskStatus="TASK_RUNNING"
time="2019-10-10T13:27:41Z" level=info msg="Got HealthStatusEvent" Id=demo_hello-world.b98f6bf3-eb61-11e9-af4c-02423f927891
time="2019-10-10T13:27:41Z" level=error msg="Task not found" Id=demo_hello-world.b98f6bf3-eb61-11e9-af4c-02423f927891
time="2019-10-10T13:27:41Z" level=info msg="Got HealthStatusEvent" Id=demo_hello-world.b98f6bf3-eb61-11e9-af4c-02423f927891
time="2019-10-10T13:27:41Z" level=error msg="Task not found" Id=demo_hello-world.b98f6bf3-eb61-11e9-af4c-02423f927891
```
So the "Got HealthStatusEvent" entries log the ID `demo_hello-world.b98f6bf3-eb61-11e9-af4c-02423f927891`, whereas the actual instance ID is `demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891`. The `instance-` prefix is missing, and the subsequent task lookup fails with "Task not found".
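To make the mismatch concrete, here is a small Go sketch (Go because marathon-consul is written in Go). It is not marathon-consul's code: the helper `instanceIDFromTaskID` is hypothetical and only assumes the Marathon 1.9 layout visible in the log above, where a task ID is the instance ID followed by a `._app.<n>` suffix.

```go
package main

import (
	"fmt"
	"strings"
)

// instanceIDFromTaskID is a hypothetical helper. It assumes the Marathon 1.9
// task ID layout "<app>.instance-<uuid>._app.<n>" seen in the logs and trims
// the "._app.<n>" suffix to recover the instance ID.
func instanceIDFromTaskID(taskID string) string {
	if i := strings.Index(taskID, "._app."); i >= 0 {
		return taskID[:i]
	}
	return taskID
}

func main() {
	// Task ID from the "Got StatusEvent" log lines.
	taskID := "demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891._app.1"
	// instanceId as sent by Marathon in health_status_changed_event.
	eventInstanceID := "demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891"
	// ID as logged by marathon-consul on "Got HealthStatusEvent".
	loggedID := "demo_hello-world.b98f6bf3-eb61-11e9-af4c-02423f927891"

	fmt.Println(instanceIDFromTaskID(taskID) == eventInstanceID) // true
	fmt.Println(instanceIDFromTaskID(taskID) == loggedID)        // false
}
```

Under that assumption the task ID maps cleanly onto the `instanceId` Marathon sends, but not onto the prefix-less ID marathon-consul logs, which would be consistent with the "Task not found" error.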
Suspecting that the Marathon event endpoint was returning an incorrect instance ID, I captured the traffic to the events endpoint with tcpflow (`tcpflow -i eth1 -c port 8080 >> ~/tcpdump2`):
```
grep b98f6bf3-eb61-11e9-af4c-02423f927891 ~/tcpdump2
data: {"instanceId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891","condition":"Scheduled","runSpecId":"/demo/hello-world","agentId":null,"host":null,"runSpecVersion":"2019-10-10T13:27:36.878Z","timestamp":"2019-10-10T13:27:36.906Z","eventType":"instance_changed_event"}
data: {"instanceId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891","condition":"Provisioned","runSpecId":"/demo/hello-world","agentId":"6b0d8829-4681-4638-bbe2-121d950e241e-S0","host":"10.141.141.10","runSpecVersion":"2019-10-10T13:27:36.878Z","timestamp":"2019-10-10T13:27:37.746Z","eventType":"instance_changed_event"}
data: {"slaveId":"6b0d8829-4681-4638-bbe2-121d950e241e-S0","taskId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891._app.1","taskStatus":"TASK_STARTING","message":"","appId":"/demo/hello-world","host":"10.141.141.10","ipAddresses":[{"ipAddress":"127.0.1.1","protocol":"IPv4"}],"ports":[31338],"version":"2019-10-10T13:27:36.878Z","eventType":"status_update_event","timestamp":"2019-10-10T13:27:38.063Z"}
data: {"instanceId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891","condition":"Starting","runSpecId":"/demo/hello-world","agentId":"6b0d8829-4681-4638-bbe2-121d950e241e-S0","host":"10.141.141.10","runSpecVersion":"2019-10-10T13:27:36.878Z","timestamp":"2019-10-10T13:27:38.063Z","eventType":"instance_changed_event"}
data: {"slaveId":"6b0d8829-4681-4638-bbe2-121d950e241e-S0","taskId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891._app.1","taskStatus":"TASK_RUNNING","message":"","appId":"/demo/hello-world","host":"10.141.141.10","ipAddresses":[{"ipAddress":"127.0.1.1","protocol":"IPv4"}],"ports":[31338],"version":"2019-10-10T13:27:36.878Z","eventType":"status_update_event","timestamp":"2019-10-10T13:27:38.769Z"}
data: {"instanceId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891","condition":"Running","runSpecId":"/demo/hello-world","agentId":"6b0d8829-4681-4638-bbe2-121d950e241e-S0","host":"10.141.141.10","runSpecVersion":"2019-10-10T13:27:36.878Z","timestamp":"2019-10-10T13:27:38.769Z","eventType":"instance_changed_event"}
data: {"appId":"/demo/hello-world","instanceId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891","version":"2019-10-10T13:27:36.878Z","alive":true,"eventType":"health_status_changed_event","timestamp":"2019-10-10T13:27:41.919Z"}
data: {"appId":"/demo/hello-world","instanceId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891","version":"2019-10-10T13:27:36.878Z","alive":true,"eventType":"health_status_changed_event","timestamp":"2019-10-10T13:27:41.929Z"}
data: {"instanceId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891","runSpecId":"/demo/hello-world","healthy":true,"runSpecVersion":"2019-10-10T13:27:36.878Z","timestamp":"2019-10-10T13:27:41.929Z","eventType":"instance_health_changed_event"}
```
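For reference, a captured health event can be decoded with Go's standard library alone. This sketch assumes only the JSON keys visible in the captured `health_status_changed_event`; the type `healthStatusChangedEvent` and the function `parseDataLine` are hypothetical names, not marathon-consul's types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// healthStatusChangedEvent mirrors the fields seen in the captured
// health_status_changed_event. The type name is hypothetical.
type healthStatusChangedEvent struct {
	AppID      string `json:"appId"`
	InstanceID string `json:"instanceId"`
	Version    string `json:"version"`
	Alive      bool   `json:"alive"`
	EventType  string `json:"eventType"`
	Timestamp  string `json:"timestamp"`
}

// parseDataLine strips the SSE "data:" prefix from one captured line and
// decodes the JSON payload into a healthStatusChangedEvent.
func parseDataLine(line string) (healthStatusChangedEvent, error) {
	var ev healthStatusChangedEvent
	payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
	err := json.Unmarshal([]byte(payload), &ev)
	return ev, err
}

func main() {
	line := `data: {"appId":"/demo/hello-world","instanceId":"demo_hello-world.instance-b98f6bf3-eb61-11e9-af4c-02423f927891","version":"2019-10-10T13:27:36.878Z","alive":true,"eventType":"health_status_changed_event","timestamp":"2019-10-10T13:27:41.919Z"}`
	ev, err := parseDataLine(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.InstanceID)
	fmt.Println(strings.Contains(ev.InstanceID, ".instance-")) // true
}
```

The decoded `instanceId` keeps the `instance-` prefix, and the event carries no `taskId` field at all, so a consumer has to map the instance ID back to a task on its own.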
None of the captured events contains an instance ID without the `instance-` prefix. I haven't found any existing report of this problem and am not sure whether it is a Marathon or a marathon-consul issue.
The error is thrown in `events/event_handler.go` (line 142).