Skip to content

Conversation

@justinsb
Copy link
Member

No description provided.

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 15, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign justinsb for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 15, 2025
@justinsb justinsb force-pushed the discovery branch 2 times, most recently from 8396a06 to eead354 Compare December 15, 2025 18:52
MinVersion: tls.VersionTLS12,
}

server := &http.Server{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http3 ? I had white wine before

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't get it ... would you like to use http3? Have I opted in to http3 automatically? Do you want us to be the "guinea pig" for http3 in kube (if so, I'm game!)

I don't think there's anything special we need from http3. We do need the client certificate information. I was thinking we would probably end up deploying this directly behind an L4 load balancer, or (failing that) using ingress or gateway with SNI routing.

In terms of backends, right now I have this with a simple in-memory implementation. Honestly that's probably good enough to get started, as we will not be offering any guarantee as to retention of these objects.

But ... if we wanted to do better, I think we should put them into etcd because (1) we should be able to run etcd pretty cheaply and we don't have to worry about wracking up a huge GCS bill if someone figures out how to make us send queries to GCS etc and (2) it means that we can use etcd-operator, which would be good from the "all the wood behind one arrow" perspective

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was half-joking about using http3 for the discovery server but looks like the OIDC protocol is only compatible with HTTP 1.1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can support http3, but let's start with whatever go gives us out of the box (which I think is still http1 or http2)

I do think a controversial one would be to support DNS over HTTP, if you're feeling spicy :-)

@@ -0,0 +1,82 @@
# Using kubectl with Discovery Service

Since the Discovery Service now emulates the Kubernetes API for `DiscoveryEndpoint` resources, you can use `kubectl` to interact with it.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to ask gemini to clean up these instructions / demo scripts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned up!

}

// DiscoveryEndpoint represents a registered client in the discovery service.
type DiscoveryEndpoint struct {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should move this to apis/discovery.kops.k8s.io/v1alpha1 for consistency (and probably change the version to v1alpha1)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

{
Name: "discoveryendpoints",
SingularName: "discoveryendpoint",
Namespaced: true,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this probably should be cluster scoped, although I guess we could use the namespace to indicate the cluster if we we wanted to allow multiple clusters to share the same CA (which isn't a terrible idea if someone is doing multicluster).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we consider RBAC in the equation ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this isn't technically kube-apiserver, and I haven't implemented RBAC.

Right now we have this: anyone that has any cert signed by a CA can read the objects for that CA's universe (defined by the hash of the CA public key). You can write an object that matches your own CN only.

I probably need to build out the client side here to better understand what we actually need, whether it's acceptable to have the same CA certificate etc. (e.g. maybe we should only let kubelet certificates register, or maybe we should only let control plane nodes register, or maybe we should create a dedicated CA only for discovery)


require (
k8s.io/apimachinery v0.34.3
k8s.io/client-go v0.34.3
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically client-go / apimachinery is only used by the clients / tests, so it might be nice to split them out. But to do that would require a separate go.mod, which is a bit of a pain.

@justinsb justinsb force-pushed the discovery branch 3 times, most recently from 62736c6 to 2136f14 Compare December 16, 2025 14:22
return s
}

func (s *Server) registerRoutes() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we never delete endpoints ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not currently, no. You're right - I should add a TTL. (Maybe 2 hours, and then we can have nodes register every hour?). It won't be a "hard" TTL - we reserve the rights to remove objects at any time (and I should add that to the README.md / GEMINI.md)

I should probably also add explicit deletion support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants