Warning
DOCS ARE OUTDATED!! This will be fixed soon. This repo is intended for educational purposes only. Conscious decisions were made to favour a quick setup and opinionated architecture choices over security (for example, best practices around handling secret keys) in order to get up and running as a proof-of-concept/learning-lab environment. Please do not attempt to use this for a production setup or anything serious.
Normally, when non-cluster workloads need to access cluster microservices, those microservices have to be exposed publicly as a LoadBalancer service. This incurs costs, since AWS spins up a network load balancer and charges both for it and for the traffic passing through it, and it is also a security risk if bad actors in the VPC are up to no good and try to intercept that traffic.
Tailscale lets us solve the connectivity problem and adds defense-in-depth by connecting both sides over WireGuard, while also allowing the non-cluster workloads to reach all the ClusterIP services in the cluster that would normally be inaccessible to them, bypassing the need for a cloud load balancer.
Similarly, when cluster workloads need to access an external non-cluster service, such as a database or some other service hosted outside the cluster, secure egress access is required as well, so that we don't have to ask for exceptions in the stringent firewall rules sitting in front of those non-cluster workloads.
Tailscale helps us solve the connectivity and security issues by proxying workload traffic securely and easily, without requiring any firewall modifications on those non-cluster workloads.
In this EKS-focused PoC, we will use everyone's favourite IaC tool, Terraform, to:
- Spin up a private EKS cluster with VPC CNI, then:
  - Install the Tailscale Operator and add it to our tailnet
- Spin up an EC2 instance in the same VPC but different subnet/availability zone, then:
  - Run Tailscale on it to add it to our tailnet
- Configure our tailnet with the following:
  - Split DNS to the `kube-dns` service, so that the EC2 client instance can resolve and access the `nginx` `ClusterIP` service by its cluster FQDN for the search domain `svc.cluster.local`
Subnet Router on K8s scenario:
- On the EKS cluster, we will:
  - Create a Tailscale pod as a subnet router via the `Connector` Custom Resource, add it to our tailnet and advertise the cluster's pod & service CIDR routes (see the sketch after this list)
  - Deploy a simple `nginx` pod with a `ClusterIP` service to act as our test server behind the subnet router
- On the EC2 instance, we will:
  - Accept the advertised routes from the EKS cluster subnet router
  - Use the `curl` binary as a simple test client to query the `ClusterIP` `nginx` service in the EKS cluster that is being advertised by the cluster's Tailscale subnet router, verifying connectivity via the cluster service FQDN
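Roughly, the cluster-side and client-side pieces could look like the sketch below (the resource names, tag, and CIDRs are illustrative assumptions, not the repo's actual values):

```bash
# Subnet router: a Connector custom resource handled by the Tailscale Operator.
# Replace the example CIDRs with your cluster's real pod and service ranges.
kubectl apply -f - <<'EOF'
apiVersion: tailscale.com/v1alpha1
kind: Connector
metadata:
  name: eks-subnet-router
spec:
  hostname: eks-subnet-router
  tags:
    - "tag:k8s"            # example tag; it must be allowed in your tailnet ACLs
  subnetRouter:
    advertiseRoutes:
      - "10.0.0.0/16"      # example pod CIDR (VPC CNI uses VPC addressing)
      - "172.20.0.0/16"    # example service CIDR (the EKS default)
EOF

# Test server: a plain nginx Deployment exposed via a ClusterIP service.
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=ClusterIP

# On the EC2 client: join the tailnet and accept the routes the Connector advertises.
sudo tailscale up --accept-routes
```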
Egress Service from K8s scenario:
- On the EC2 instance, we will:
  - Install Docker and run a simple `nginx` server container on it, which now acts as an external-to-the-cluster service
- On the EKS cluster, we will:
  - Create an `ExternalName` service that points to the Tailscale IPv4 IP of our external-to-the-cluster `nginx` server running as a standalone container on the EC2 instance (a rough sketch follows this list)
  - The Tailscale Operator will create a pod that acts as an egress proxy and seamlessly provides our cluster access to this cluster-external service
  - Create a simple client test pod, `netshoot`, in the EKS cluster that uses the `curl` binary as a simple client to query the external-to-the-cluster `nginx` service and validate connectivity
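A rough sketch of how this wiring might look (the service name, tailnet IP, and test image are placeholders/assumptions):

```bash
# On the EC2 instance: run the external-to-the-cluster nginx test server.
sudo docker run -d --name nginx -p 80:80 nginx

# On the EKS cluster: an ExternalName service annotated with the EC2 instance's
# Tailscale IPv4 address; the operator creates the egress proxy pod and rewrites
# spec.externalName to point at it.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: external-nginx
  annotations:
    tailscale.com/tailnet-ip: "100.64.0.42"   # placeholder: use the EC2 instance's tailnet IP
spec:
  type: ExternalName
  externalName: placeholder                   # any value; the operator overwrites it
EOF

# Client test: a throwaway netshoot pod curling the egress service by name.
kubectl run netshoot --rm -it --restart=Never --image=nicolaka/netshoot -- \
  curl -sI http://external-nginx
```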
Subnet Router scenario packet path:

- The client EC2 instance first makes a DNS request for the `nginx` (server) service to resolve the FQDN `nginx.default.svc.cluster.local`. The client's DNS request is directed to the split-DNS resolver, which is the `kube-dns` `ClusterIP` service reachable through the subnet router `Connector` pod that is advertising that entire prefix
- `kube-dns` returns a response to the client with the IP of the `nginx` `ClusterIP` service, which is also in the same advertised prefix
- Now for the actual HTTP request via `curl`: the source IP is the internal interface IP of the client and the destination IP is the `ClusterIP` of the `nginx` service, which is routed through the subnet router pod as next hop over the Tailscale overlay tunnel
- The subnet router pod receives the request, SNATs the source IP to its own pod IP and sends the request off to `kube-proxy`, since the destination is a service `ClusterIP`
- `kube-proxy` routes to the appropriate `nginx` endpoint (normal K8s business)
- `nginx` responds to the subnet router as the destination, and from there the subnet router knows to send the packet back to the client over the Tailscale overlay tunnel
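Assuming the sketch above, a few commands on the EC2 client should let you watch this resolution and routing happen:

```bash
# The Connector should show up as a peer of this machine.
tailscale status

# Accepted subnet routes are installed in Tailscale's policy routing table (table 52 on Linux).
ip route show table 52

# Split DNS should hand *.svc.cluster.local to kube-dns and return the nginx ClusterIP.
dig +short nginx.default.svc.cluster.local

# The HTTP request then takes the subnet router as next hop over the tunnel.
curl -sI http://nginx.default.svc.cluster.local
```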
Note
Since the `Connector` pod is a single point of failure, there is (maybe?) a Custom Resource `ConnectorList` to deploy it as a Deployment/StatefulSet for redundancy. This repo will implement that in the next iteration, but for now, as a PoC, it functions well enough.
Egress Service scenario packet path:

- The client `netshoot` pod on the EKS cluster first makes a DNS request via `kube-dns` to resolve the `ExternalName` service FQDN (generated by the Tailscale Operator), which resolves to the user-provided Tailscale IPv4 tailnet IP of the EC2 instance
- The packet from the `netshoot` pod hops to the egress proxy pod created by the Tailscale Operator. This proxy pod, which is also part of the tailnet, then forwards the packet out of the node, encapsulated on the Tailscale overlay network, towards the destination IPv4 tailnet IP of the EC2 instance
- The packet arrives on the Tailscale interface of the EC2 instance, gets decapsulated, is sent to the `docker0` bridge interface IP and finally hits the destination `nginx` container on port 80
- The `nginx` container responds by sending the packet in reverse: to its `docker0` bridge interface and then to the `tailscale0` interface, where the packet gets encapsulated and sent across the tunnel back to the proxy pod's Tailscale IPv4 tailnet IP/interface
- The proxy pod is nice enough to hand the packet off to the `netshoot` pod, and thus there is a response to our initial request
Note
Since the proxy pod is a single point of failure, there is a Custom Resource `ProxyGroup` to deploy it as a StatefulSet for redundancy. This repo will implement that in the next iteration, but for now, as a PoC, it functions well enough.
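For reference, a hedged sketch of what that could look like once implemented; the `ProxyGroup` name is a placeholder and the exact fields depend on the operator version you run:

```bash
# An egress ProxyGroup runs the egress proxies as a multi-replica StatefulSet.
kubectl apply -f - <<'EOF'
apiVersion: tailscale.com/v1alpha1
kind: ProxyGroup
metadata:
  name: ts-egress
spec:
  type: egress
  replicas: 2
EOF

# The egress ExternalName service then opts into the ProxyGroup via an annotation.
kubectl annotate service external-nginx tailscale.com/proxy-group=ts-egress
```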
Disclaimer: I may be way off here, but I need to do some packet captures to fully understand the packet paths for both scenarios when I get some more time.
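In the meantime, a couple of captures on the EC2 instance should at least confirm the encapsulate/decapsulate story for the egress path (interface names assume the defaults used in this PoC):

```bash
# Watch the request arrive over the Tailscale tunnel interface...
sudo tcpdump -ni tailscale0 tcp port 80

# ...and then hit the nginx container via the Docker bridge (bridge networking).
sudo tcpdump -ni docker0 tcp port 80
```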
This repo is organized in sequential sections. Each step builds on top of the previous one, so please follow the order proposed below. You can start by clicking on Step 1 - Tailscale Admin Portal Setup here.
Step 1 - Tailscale Admin Portal Setup
Step 2 - Local Environment Setup
Step 3 - Terraform Setup and Deploy
Step 4 - Subnet Router Validation/Testing
Step 5 - Egress Service Validation/Testing
Step 6 - Clean-up
- The Tailscale docs don't seem to have a full `.spec` reference for how to define the options under `subnetRouter`. I can guess, but it shouldn't have to be like that. I wanted to play with `no-snat-subnet-router` but was unsure how to define the key under the spec, so I left it for now.
- It is unclear whether `tags` and `autoApprovers` can be injected into the ACL configuration via the tailscale Terraform provider. The description and docs there again need some love.
- Same as #2 for creating an OAuth client dynamically - maybe this one is locked down to the UI for security, but I don't know. That automation would help properly create/revoke short-lived OAuth tokens with specific scopes for specific machines (subnet router vs. regular Tailscale client).
- Ephemeral authkey support for the tailscale-operator pod would be nice, see this issue
- There is probably something happening with the NAT Gateway endpoint that makes DERP kick in again on new packets, because the tailnet doesn't know of its existence. I think there was some experimental flag to do something about that, but I will explore it when I have more time.
- EC2 instance needs to have the public-IP removed and switch to Tailscale SSH and MagicDNS. Make it fully private.
- More complex network scenarios/topologies closer to real deployments across VPCs and regions. Testing with VPC peering and doing multi-cluster stuff with connectors for Ingress/Egress gateway functionality would be cool to set up. See how far we can get before it's all DERP.
- Test with real apps/databases. See how WireGuard throughput/performance is. Try to do some `Locust` testing to maximize throughput with multiple client streams.
- For the egress use case, it would probably be better to run the `nginx` container on the EC2 instance in host mode instead of bridge mode, so that we see a real client IP from the cluster when we test connectivity rather than the `docker0` bridge interface IP.
- Add more meaningful screenshots to this repo but I also don't want it to get too bloated. TBD what the solution is.
"If I have seen a little further today than yesterday, it is only because I stood on the shoulders of giants" - Isaac Newton (paraphrased)
Tailscale+K8s Docs
Terraform Tailscale Provider
Terraform AWS EKS Module
Terraform AWS VPC Module
Terraform-CloudInit-Tailscale Module
Terraform Kubectl Provider
Terraform Kubernetes Provider
"To err is human, to forgive is divine" - Latin proverb
There are probably a lot of mistakes, a lot of jank, and gaps in documenting and explaining this repo. I am always happy to listen and act on constructive feedback given with kind intent to continuously improve. Thank you!
