Description
/area networking
What version of Knative?
1.15.0
Expected Behavior
Legacy applications may have undefined behavior when HTTP/2 upgrade requests are made. Knative should gracefully handle those errors and downgrade the health check attempt to HTTP/1 or HTTP/1.1.
Actual Behavior
Applications that do not support HTTP/2 do not handle the upgrade request properly. In our case, a legacy application returns a 500 when an OPTIONS request is sent to upgrade the connection. Knative fails the entire health check because of this, even though the same check over HTTP/1 or HTTP/1.1 returns a 200.
Steps to Reproduce the Problem
- Create an application that does not support HTTP/2 or that returns a 500 on the OPTIONS request
- Observe that Knative starts failing the health checks and the pod is killed
Additional Context
The Kubernetes spec does not require an application to support HTTP/2, nor to expect an OPTIONS call to its health/liveness probes. Only GET is part of the contract, which the Queue Proxy does not follow.
I believe the logic in the queue proxy's HTTP probes is flawed here: serving/pkg/queue/health/probe.go, line 155 in 873602a.
When an error occurs during the upgrade, maxProto should be set to 1 and Knative should stop trying to make HTTP/2 requests. Currently, because of this line, the HTTP/2 upgrade is retried indefinitely and HTTP/1 is never attempted.