Description
Background and Motivation
An important gRPC feature being added in .NET 6 is client-side load balancing. The motivation is discussed in dotnet/core#5495.
tl;dr: Kubernetes pods can scale to multiple instances. Kubernetes' built-in load balancer works at L4 (the connection level), so it doesn't distribute load well with HTTP/2: multiplexing sends many requests over a single long-lived connection, and all of them land on the same pod.
Part of load balancing is discovering whether a connection can be successfully established with the server. For example, a client might be configured with 3 endpoints, and it will use the first endpoint it can successfully connect to.
The gRPC spec for connectivity status is here: https://github.com/grpc/grpc/blob/master/doc/connectivity-semantics-and-api.md
Proposed API
Add a new method to HttpClient that can be used to establish a connection to an endpoint without making an HTTP request.
The returned HttpConnection also provides the ability to track the state of the connection to the server.
public class SocketsHttpHandler
{
    public Task<HttpConnection> ConnectAsync(Uri, CancellationToken);
}
public class HttpConnection
{
    public Uri ServerUri { get; }
    public ConnectionState State { get; }
    // Plus an API that allows you to subscribe to state change updates.
    // I copied this off https://docs.microsoft.com/en-us/dotnet/api/system.threading.cancellationtoken.register
    public IDisposable StateChanged(Action action);
    public IDisposable StateChanged(Action<object> action, Object state);
}
public enum ConnectionState
{
    // ...
}
The returned connection would represent the HTTP connection to the server. So for HTTP/2 and HTTP/3 a successfully connected status would include exchanging SETTINGS frames.
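To make the subscription shape concrete, here is a minimal sketch of how a `StateChanged` registration might behave if it follows the `CancellationToken.Register` pattern. Everything in it is an assumption for illustration: `SketchConnection` is a stand-in, not the proposed `HttpConnection`; the enum values are loosely borrowed from the gRPC connectivity spec linked above (the proposal leaves the enum open), with `Connected` standing in for the spec's READY; and `SetState` exists only so the sketch can simulate transitions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical values; the proposal does not define the enum members.
public enum SketchConnectionState { Idle, Connecting, Connected, TransientFailure, Shutdown }

// Stand-in for the proposed HttpConnection. Not a real .NET type.
public sealed class SketchConnection
{
    private readonly object _gate = new object();
    private readonly List<Action> _callbacks = new List<Action>();
    private SketchConnectionState _state = SketchConnectionState.Idle;

    public SketchConnection(Uri serverUri) { ServerUri = serverUri; }

    public Uri ServerUri { get; }

    public SketchConnectionState State
    {
        get { lock (_gate) { return _state; } }
    }

    // Registers a callback; disposing the returned object unsubscribes it.
    public IDisposable StateChanged(Action action)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        lock (_gate) { _callbacks.Add(action); }
        return new Registration(this, action);
    }

    // Overload that passes caller-supplied state to the callback.
    public IDisposable StateChanged(Action<object> action, object state)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        return StateChanged(() => action(state));
    }

    // Simulation hook (assumption): a real connection would change state itself.
    public void SetState(SketchConnectionState newState)
    {
        Action[] snapshot;
        lock (_gate)
        {
            if (_state == newState) return; // no transition, no notification
            _state = newState;
            snapshot = _callbacks.ToArray();
        }
        // Invoke outside the lock so callbacks can read State or unsubscribe.
        foreach (var callback in snapshot) callback();
    }

    private sealed class Registration : IDisposable
    {
        private readonly SketchConnection _owner;
        private readonly Action _action;

        public Registration(SketchConnection owner, Action action)
        {
            _owner = owner;
            _action = action;
        }

        public void Dispose()
        {
            lock (_owner._gate) { _owner._callbacks.Remove(_action); }
        }
    }
}

public static class SketchDemo
{
    // Counts notifications received while a subscription is active.
    public static int CountNotifications()
    {
        var connection = new SketchConnection(new Uri("https://localhost"));
        var count = 0;
        using (connection.StateChanged(() => count++))
        {
            connection.SetState(SketchConnectionState.Connecting);
            connection.SetState(SketchConnectionState.Connected);
        }
        // The subscription was disposed above, so this transition is not counted.
        connection.SetState(SketchConnectionState.Shutdown);
        return count;
    }
}
```

Returning an `IDisposable` keeps unsubscription explicit and lets callers scope a subscription with `using`, which is the main reason to prefer this shape over a plain event.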
Things I'm not sure about:
- Is the returned connection an abstraction over multiple internal connections to the server? For example, there could be multiple HTTP/2 connections to http://localhost. If any have a state of Connected, does the abstraction have a state of Connected?
- Similar to the question above, how would this feature work in HTTP/1.1 (TCP connection per call) and HTTP/3 (UDP world)?
- You have a connection that is no longer connected. How to reconnect? Do you call ConnectAsync again with the same Uri? Does it return the same HttpConnection instance?
- Should ConnectAsync be async and start a connection? Or should it be sync and return an HttpConnection that then has a ConnectAsync method on it?
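One possible answer to the first question can be sketched as a small aggregation function: the abstraction reports Connected if any underlying connection is Connected. The enum `PoolConnectionState`, the class `ConnectionStateAggregator`, and the rule that Connecting takes precedence over Disconnected are all assumptions made for this sketch; only the "any Connected means Connected" rule comes from the question above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified states used only for this sketch.
public enum PoolConnectionState { Disconnected, Connecting, Connected }

public static class ConnectionStateAggregator
{
    // Connected if any connection is Connected;
    // otherwise Connecting if any is Connecting (assumed precedence);
    // otherwise Disconnected, which also covers an empty set.
    public static PoolConnectionState Aggregate(IEnumerable<PoolConnectionState> states)
    {
        if (states == null) throw new ArgumentNullException(nameof(states));

        var result = PoolConnectionState.Disconnected;
        foreach (var state in states)
        {
            if (state == PoolConnectionState.Connected)
            {
                return PoolConnectionState.Connected; // one usable connection is enough
            }
            if (state == PoolConnectionState.Connecting)
            {
                result = PoolConnectionState.Connecting;
            }
        }
        return result;
    }
}
```

Under this rule, a pool with one dropped HTTP/2 connection and one healthy one still reports Connected, so callers would only see a disconnect once every underlying connection is gone.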
Usage Examples
Basic usage (this would allow GrpcChannel to support a ConnectAsync method):
var client = new HttpClient();
var connection = await client.ConnectAsync("https://localhost");
if (connection.State != ConnectionState.Connected)
{
    throw new InvalidOperationException("Could not connect to server.");
}
// Request made using the previously established connection (open connection fetched from the pool using current behavior)
var response = await client.GetAsync("https://localhost/settings.json");
Basic load balancing that uses the first successful connection (this is called a pick-first strategy):
var client = new HttpClient();
var endpoints = new string[] { "https://localhost", "https://localhost:5000", "https://localhost:5001" };
foreach (var endpoint in endpoints)
{
    var connection = await client.ConnectAsync(endpoint);
    if (connection.State == ConnectionState.Connected)
    {
        return await client.GetAsync(endpoint + "/settings.json");
    }
}
throw new InvalidOperationException("Could not connect to the configured servers");
Risks
In the future we might want to implement channelz. channelz is about collecting gRPC call stats. You can view calls made down to a subchannel level (subchannel = individual TCP connection to a server). There hasn't been much demand for it (one issue with no upvotes) so right now it doesn't seem important.
We should consider a design that doesn't block channelz in the future.