This is the documentation -- and executable code! -- for the Service Mesh Academy "Linkerd in Production" workshop. The easiest way to use this file is to execute it with `demosh`.

Things in Markdown comments are safe to ignore when reading this later. When executing this with `demosh`, things after the horizontal rule below (which is just before a commented `@SHOW` directive) will get displayed.
This workshop requires that you have a running Kubernetes cluster with at least three Nodes. This demo assumes that you're using a Civo cluster for this, but pretty much any cloud provider should work as long as your cluster has at least three Nodes. This demo also assumes that your cluster is called `sma` -- if you named it something else, you can either substitute its name for `sma` in the commands below, or use `kubectl config rename-context` to rename your cluster's context to match.
Start by installing cert-manager.
```bash
helm repo add jetstack https://charts.jetstack.io --force-update
helm repo update
helm install \
     cert-manager jetstack/cert-manager \
     --namespace cert-manager \
     --create-namespace \
     --set installCRDs=true \
     --wait
```

Now we have cert-manager running in our cluster, ready to manage certificates for us.
This is not what you want to do in production -- this one bit is still just a demo. In real-world production, you don't ever want the trust anchor's private key present in your cluster at all: instead, you want cert-manager to hand off a CSR to your off-cluster CA and get back a signed certificate. cert-manager supports several different mechanisms for this, including Vault, Venafi, and others.

All of those mechanisms are very much out of scope for this SMA, so we're going to load the trust anchor's private key into the cluster. Again, don't do this in the real world.
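Whatever mechanism actually issues your trust anchor, it's worth sanity-checking the certificate itself before anything depends on it. Here's a minimal sketch using `openssl` and a throwaway locally-generated root -- the filenames and CN are assumptions for illustration, not the workshop's actual script:

```shell
# Generate a throwaway ECDSA root certificate. In real use you'd skip
# this step and inspect the ca.crt your actual CA produced.
openssl req -x509 -newkey ec -pkeyopt ec_paramgen_curve:prime256v1 \
    -nodes -keyout demo-ca.key -out demo-ca.crt -days 365 \
    -subj "/CN=root.linkerd.cluster.local"

# Check the subject and expiration date -- an expired or misnamed
# trust anchor is a very common cause of identity failures.
openssl x509 -in demo-ca.crt -noout -subject -enddate
```

The `-noout -subject -enddate` combination is a quick way to confirm you're about to load the certificate you think you are.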
```bash
bash non-prod-trust-anchor.sh
```

At this point, we have a TLS Secret for our trust anchor certificate:

```bash
kubectl get secret -n linkerd linkerd-trust-anchor
```

We also have a cert-manager Issuer called `linkerd-trust-anchor` that will issue certs signed by the `linkerd-trust-anchor` Secret.
```bash
kubectl get issuer -n linkerd -o yaml linkerd-trust-anchor | bat -l yaml
```

Next, we tell cert-manager how to use our `linkerd-trust-anchor` Issuer to create identity issuer certificates. This is how you'll do things in production -- you'd define the `linkerd-trust-anchor` Issuer differently, but you'd use it the same way.
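For reference, a cert-manager Certificate resource driving the identity issuer typically looks something like the sketch below. The names and durations here are assumptions based on common Linkerd setups -- the workshop's actual `cert-manager.yaml` may differ:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-identity-issuer
  namespace: linkerd
spec:
  secretName: linkerd-identity-issuer
  duration: 48h        # rotate the identity issuer every two days
  renewBefore: 25h
  issuerRef:
    name: linkerd-trust-anchor
    kind: Issuer
  commonName: identity.linkerd.cluster.local
  dnsNames:
  - identity.linkerd.cluster.local
  isCA: true
  privateKey:
    algorithm: ECDSA
  usages:
  - cert sign
  - crl sign
  - server auth
  - client auth
```

The key points are `isCA: true` (the identity issuer signs workload certs) and the `issuerRef` pointing back at the `linkerd-trust-anchor` Issuer.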
```bash
bat cert-manager.yaml
kubectl apply -f cert-manager.yaml
```

We should now see the identity issuer certificate ready to go:

```bash
kubectl get certificate -n linkerd
kubectl get secret -n linkerd linkerd-identity-issuer
```

We're going to use Helm to install Linkerd in HA mode. We'll start by grabbing the Helm chart so we can take a look at `values-ha.yaml`:

```bash
#@immed
rm -rf linkerd-control-plane
helm fetch --untar linkerd/linkerd-control-plane
bat linkerd-control-plane/values-ha.yaml
```

Given `values-ha.yaml`, we can install Linkerd with Helm. First up, install the CRDs.
```bash
helm install linkerd-crds -n linkerd linkerd/linkerd-crds
```

Next up, install the Linkerd control plane. Note the `-f` parameter including `values-ha.yaml`, so that we install in HA mode.

Also note that we're passing the public half of the trust anchor to Helm, so that it can update the trust anchor bundle that Linkerd uses for workload identity verification. This is another thing that may need to change when you're using a proper off-cluster CA.
```bash
helm install linkerd-control-plane -n linkerd \
     --set-file identityTrustAnchorsPEM=./ca.crt \
     --set identity.issuer.scheme=kubernetes.io/tls \
     -f linkerd-control-plane/values-ha.yaml \
     linkerd/linkerd-control-plane
```

Once Helm says we're good, let's make sure everything is really on the level:

```bash
linkerd check
```

We can also take a look to verify that we really do have multiple Nodes and multiple control plane replicas:

```bash
kubectl get nodes
kubectl get pods -n linkerd
```

And, if we're paranoid, we can verify that no two replicas of a single Deployment share the same Node:

```bash
kubectl get pod -n linkerd -o go-template='{{ range .items }}{{ .metadata.name }}: {{ .spec.nodeName }}{{ "\n" }}{{ end }}'
```

Well... Linkerd is installed in HA mode, cert-manager is handling rotating the identity issuer every 48 hours... as far as installing Linkerd in a production-ready way, this really is pretty much all there is to it.
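If you want to automate that paranoia, the go-template output is easy to post-process in plain shell. Here's a sketch using hypothetical sample data -- in practice you'd pipe the `kubectl get pod` command itself into the same pipeline:

```shell
# Hypothetical "pod-name: node-name" output, including a deliberately
# bad placement so we can see what detection looks like.
cat > /tmp/placement.txt <<'EOF'
linkerd-destination-5d9f6-aaaaa: node-1
linkerd-destination-5d9f6-bbbbb: node-1
linkerd-identity-7c8d9-ccccc: node-2
EOF

# Strip the ReplicaSet and pod hashes, then flag any
# (deployment, node) pair that appears more than once.
sed -E 's/-[a-z0-9]+-[a-z0-9]+:/:/' /tmp/placement.txt | sort | uniq -d
```

This prints `linkerd-destination: node-1` for the sample above: two destination replicas landed on the same Node, which HA mode's anti-affinity rules should normally prevent.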
Next steps would be installing your application, setting up policy, etc. Policy is out of scope for this workshop, but let's go ahead and install emojivoto so we have something to debug with. There's nothing dramatic here: we're just doing a straightforward install using auto-injection.

```bash
kubectl create ns emojivoto
kubectl annotate ns emojivoto linkerd.io/inject=enabled
kubectl apply -f https://run.linkerd.io/emojivoto.yml
kubectl wait pod --for=condition=ready -n emojivoto --all
```

At the most basic level, Linkerd is just another Kubernetes workload, so the place to start getting a sense of what's up is events:

```bash
kubectl get event -n emojivoto --sort-by="{.lastTimestamp}" | tail -20
```

We'll probably see `IssuedLeafCertificate` events above -- these get posted when Linkerd issues workload identity certificates, so if they're missing, it's a problem. Let's make sure we see those:

```bash
kubectl get event -n emojivoto --field-selector reason=IssuedLeafCertificate
```

We should see four: one for each relevant ServiceAccount.
The logs can also be useful. Let's take a quick look at the logs for the Linkerd identity workload, `linkerd-identity`.
```bash
IDPOD=$(kubectl get pods -n linkerd -l 'linkerd.io/control-plane-component=identity' -o jsonpath='{ .items[0].metadata.name }')
#@print "# Found identity pod ${IDPOD}"
kubectl logs -n linkerd ${IDPOD} | head -10
```

`linkerd-identity` is responsible for managing workload identity, so it makes sense that we see things about identities in its logs -- but note that it mentions other containers, too. Checking those quickly...

```bash
kubectl logs -n linkerd ${IDPOD} -c linkerd-proxy | head -10
```

The `linkerd-proxy` container deals with... proxying things. You may see transient errors here (Kubernetes is only eventually consistent, after all), but persistent errors can point to real problems.
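A handy trick when eyeballing proxy logs is to filter by level. A sketch over hypothetical sample lines -- real `linkerd-proxy` logs put the level in a similar column, but these specific messages are made up:

```shell
# Made-up proxy-style log lines, for illustration only.
cat > /tmp/proxy.log <<'EOF'
[     0.002s]  INFO ThreadId(01) linkerd2_proxy: starting up
[     1.107s]  WARN ThreadId(01) outbound: connect timed out
[     1.240s]  INFO ThreadId(01) outbound: reconnected
EOF

# Surface only warnings and errors: transient ones are expected,
# persistent ones deserve investigation.
grep -E ' (WARN|ERROR) ' /tmp/proxy.log
```

In practice you'd pipe `kubectl logs ... -c linkerd-proxy` straight into the `grep` instead of using a file.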
```bash
kubectl logs -n linkerd ${IDPOD} -c linkerd-init | head -10
```

The `linkerd-init` container deals with network configuration at startup -- and a special note here is that this can be very different if you're using the Linkerd CNI plugin! We're not, though, so here we see the init container messing with kernel routing on our behalf.
One last note: let's take a look at the logs for one of our emojivoto containers.

```bash
EMOJIPOD=$(kubectl get pods -n emojivoto -l 'app=emoji-svc' -o jsonpath='{ .items[0].metadata.name }')
#@print "# Found emoji-svc pod ${EMOJIPOD}"
kubectl logs -n emojivoto ${EMOJIPOD} | head -10
```

Note that, by default, we get the `linkerd-proxy` container. Although it's nice to see what identities it's using, this may well not be what you're interested in -- it's worth remembering that you may well need to be explicit about the container you want:

```bash
kubectl logs -n emojivoto ${EMOJIPOD} -c emoji-svc | head -10
```

We'll take a quick look at two other debugging tools: the `linkerd identity` and `linkerd diagnostics` commands.
`linkerd identity` is a bit simpler, so let's take a look at it first. Its purpose in life is to show you what identity Linkerd is using for a given workload. For example, we can look at the identity in use for the `emoji-svc` workload -- the output is a dump of the workload's identity certificate:

```bash
linkerd identity -n emojivoto -l app=emoji-svc | more
```

There's a lot of detail there, so it can be instructive just to zoom in on the human-readable parts:

```bash
linkerd identity -n emojivoto -l app=emoji-svc | grep CN=
```

This shows us that `emoji-svc` uses an identity named `emoji.emojivoto.serviceaccount.identity.linkerd.cluster.local`, issued by `identity.linkerd.cluster.local` (AKA the Linkerd identity issuer).
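That identity name isn't arbitrary: Linkerd derives it from the workload's ServiceAccount and namespace. A tiny sketch of the naming scheme, assuming the default `cluster.local` trust domain (the helper function name is mine, not a Linkerd command):

```shell
# Compute the identity name Linkerd will use for a given
# ServiceAccount and namespace, under the default trust domain.
identity_name() {
    sa="$1"; ns="$2"
    printf '%s.%s.serviceaccount.identity.linkerd.cluster.local\n' "$sa" "$ns"
}

identity_name emoji emojivoto
# emoji.emojivoto.serviceaccount.identity.linkerd.cluster.local
```

Knowing this pattern makes it easy to predict what identity a workload *should* have when you're writing authorization policy.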
An aside: the control plane components have identities too! For example:

```bash
linkerd identity -n linkerd -l linkerd.io/control-plane-component=identity \
    | grep CN=
```

We see multiple outputs because there are multiple replicas for this workload, but we can clearly see that the `linkerd-identity` controller has its own identity (and that this identity is the same across all the replicas).
`linkerd diagnostics` has a few powerful functions:

- `linkerd diagnostics proxy-metrics` will fetch low-level metrics directly from Linkerd proxies.
- `linkerd diagnostics controller-metrics` does the same, but from control plane components.
- `linkerd diagnostics endpoints` will show you what endpoints Linkerd believes are alive for a given destination.
- `linkerd diagnostics policy` will show you information about active 2.13 policy.

These tend to be very, very verbose: get used to using `grep`.
Let's start with a simple one: what endpoints are active for the `emoji-svc`?

```bash
linkerd diagnostics endpoints emoji-svc.emojivoto.svc.cluster.local:8080
```

This shows us a single active endpoint. Note that you use the fully-qualified DNS name of the Service, plus the port you're interested in.

- Only active endpoints will be shown: if, for example, one replica is in failfast, it will not appear in this list.
- Policy is not taken into account here: if, for example, you're using an HTTPRoute to divert all the traffic going to a given Service, the active endpoints listed here won't change.
We'll take a quick look at `proxy-metrics` too:

```bash
linkerd diagnostics proxy-metrics po/${EMOJIPOD} -n emojivoto | more
```

This is... basically a firehose. There are a lot of metrics. The great win of `linkerd diagnostics proxy-metrics` is that it gives you a way to check metrics even if your metrics aggregator isn't working. For example, if you're trying to set up your own Prometheus and you don't see any metrics, this is the single best way to cross-check what's going on.
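Since `grep` is the survival tool here, a sketch of the kind of filtering you'd do, over a hypothetical Prometheus-text-format snippet -- the metric names mirror real proxy metrics, but the values are invented:

```shell
# Tiny made-up sample in Prometheus text exposition format.
cat > /tmp/metrics.txt <<'EOF'
# HELP request_total Total count of HTTP requests.
# TYPE request_total counter
request_total{direction="inbound"} 42
request_total{direction="outbound"} 17
response_latency_ms_bucket{direction="inbound",le="50"} 40
EOF

# Zoom in on just the request counters:
grep '^request_total' /tmp/metrics.txt
```

The same `grep '^metric_name'` pattern works directly against the `linkerd diagnostics proxy-metrics` firehose.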
We're not going to show `linkerd diagnostics controller-metrics` because it's pretty much like `proxy-metrics`, and we're not going to show `linkerd diagnostics policy` here because it's covered in the SMA on Linkerd 2.13+ circuit breaking and dynamic routing (at https://buoyant.io/service-mesh-academy/circuit-breaking-and-dynamic-routing-deep-dive).
So that's a wrap on our quick dive into production Linkerd -- thanks!