Description
api.data.gov holds open keepalive connections to the API backends to improve our proxying performance. This week, one of the NREL APIs changed the underlying routing of the API backend server api.data.gov was pointing to (so nothing changed on api.data.gov's side, only on the NREL load balancer's side). However, api.data.gov requests continued to hit older versions of the API somewhat randomly. I think I eventually figured out this was due to api.data.gov holding open keepalive connections that the NREL load balancer was still routing to the old APIs behind the scenes. While this ultimately seems like an issue the NREL load balancer needs to fix (since any external client could be holding open keepalive connections indefinitely), I was eventually able to fix things by restarting the api.data.gov processes to establish new connections, which I believe confirms the stale routing was tied to those long-lived keepalive connections.
So while I don't necessarily view this as an api.data.gov issue per se, api.data.gov holding open keepalive connections for multiple hours is perhaps a more niche use case, so I think we could tune our system so we don't hold open connections indefinitely. This should have a negligible performance impact if we cap the keepalive time at 15-30 minutes: reestablishing the TCP connection at that interval shouldn't really be noticeable, and the vast majority of requests would still reuse keepalive connections when possible. It may be rare for an API backend to misbehave like this (and the backend load balancer should probably fix it), but I think it could be worth capping our keepalive time just to ensure we don't indefinitely route to an old location in the face of unexpected backend behavior. I think this could be accomplished by just configuring the max_connection_duration setting on our end.
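To illustrate the idea of capping keepalive time, here is a minimal Python sketch of a holder that reuses one connection but replaces it once it reaches a maximum age. This is not how api.data.gov's proxy implements it, and it is not the max_connection_duration setting itself; the class name `AgeCappedConnection` and its parameters (`connect`, `max_age_seconds`, `clock`, `close`) are hypothetical and exist only for illustration.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class AgeCappedConnection(Generic[T]):
    """Reuse a single keepalive connection, but never past a maximum age.

    Hypothetical helper: `connect` opens a new connection, `close`
    (optional) tears one down, and `clock` returns seconds as a float.
    """

    def __init__(
        self,
        connect: Callable[[], T],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
        close: Optional[Callable[[T], None]] = None,
    ) -> None:
        self._connect = connect
        self._max_age = max_age_seconds
        self._clock = clock
        self._close = close
        self._conn: Optional[T] = None
        self._opened_at = 0.0

    def get(self) -> T:
        """Return the current connection, reconnecting if it is too old."""
        now = self._clock()
        # Drop the connection once it has reached the age cap, so the next
        # request is routed by the backend load balancer from scratch.
        if self._conn is not None and now - self._opened_at >= self._max_age:
            if self._close is not None:
                self._close(self._conn)
            self._conn = None
        if self._conn is None:
            self._conn = self._connect()
            self._opened_at = now
        return self._conn
```

With a cap of 15 minutes (`max_age_seconds=900`), requests within that window share one connection, and the first request after the cap pays for a single new TCP handshake. That is the trade-off described above: a small, infrequent reconnect cost in exchange for not staying pinned to an old backend route for hours.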