
Chatie API Server Down Accident Report #73

@huan


Token Service Discovery Service Accident

Our Wechaty puppet service discovery service experienced an outage starting at 11 am on Jun 15.

  1. 11 am: the service went out of service because the SSL certificate expired.
  2. 2 pm: having noticed the problem around noon and started working on it, we found that port 80 of the server could not be reached from the public internet.
  3. 2:30 pm: the service came back in a degraded state by switching to Heroku dynos. It is degraded because we have to use two dynos to serve more than 1,300 concurrent WebSocket connections. You might notice that the token service sometimes returns 404; retrying 1-2 times should return the right result (the token is registered to one server, but not the other).
  4. 10 pm: the service was moved back to Azure by creating a new server, which fixed the unreachable port 80 problem. (It might be related to an Azure bug, because we could not make port 80 reachable from the internet.)
  5. 11 pm: the server was fully restored.
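
For clients affected during the degraded period in step 3, the "retry 1-2 times" advice can be sketched as a small helper. This is only an illustration, not the actual Wechaty client code: `lookup_with_retry`, `make_simulated_lookup`, the token name, and the address are all hypothetical. The simulation assumes requests simply alternate between the two dynos, which the real router may not do. A lookup that returns `None` stands in for an HTTP 404.

```python
import itertools
import time
from typing import Callable, Dict, List, Optional


def lookup_with_retry(
    lookup: Callable[[str], Optional[str]],
    token: str,
    attempts: int = 3,
    delay_seconds: float = 0.5,
) -> Optional[str]:
    """Call lookup(token) up to `attempts` times.

    Returns the first non-None result, or None if every attempt misses.
    A None result stands in for the 404 described in the report.
    """
    for attempt in range(attempts):
        result = lookup(token)
        if result is not None:
            return result
        # Pause before the next try, but not after the final one.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return None


def make_simulated_lookup(
    registry_per_dyno: List[Dict[str, str]],
) -> Callable[[str], Optional[str]]:
    """Hypothetical stand-in for the two-dyno setup.

    Each dict is one dyno's private token registry. Requests alternate
    between dynos (an assumption made only for this simulation).
    """
    dyno_cycle = itertools.cycle(registry_per_dyno)

    def lookup(token: str) -> Optional[str]:
        return next(dyno_cycle).get(token)

    return lookup


# The token is registered on the second dyno only, so the first
# request misses and the retry succeeds.
simulated = make_simulated_lookup([{}, {"example-token": "10.0.0.1:8788"}])
address = lookup_with_retry(simulated, "example-token", delay_seconds=0)
```

With the default of three attempts (one initial request plus two retries), a token registered on either dyno is found in this simulation, while a token that is registered nowhere still yields `None`, so a genuine 404 is not masked.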
