A guide to making it easier to develop weave-gitops-enterprise. If you came here expecting but not finding an answer, please make an issue to help improve these docs!
Weave GitOps Enterprise (WGE) is packaged as a Helm chart and currently consists of the following components:
- clusters-service: The API of WGE. This is the component that backend engineers will be changing most often. It uses grpc-gateway to convert our gRPC service definitions to HTTP endpoints, and also imports API handlers from other projects (Weave GitOps among others) and exposes them to consumers.
- ui-server: The UI of WGE. This is the component that frontend engineers will be changing most often. Built in React and uses yarn as the package manager.
- cluster-bootstrap-controller: Allows custom Jobs to be executed on newly provisioned CAPI clusters. Most often, this is used to install a CNI, which CAPI does not install. Without this controller, newly provisioned clusters would not be ready to be used by end users. Because it references the CAPI CRD, it requires CAPI tooling to be installed first.
- cluster-controller: Defines the CRD for declaring leaf clusters. A leaf cluster is a cluster that the management cluster can query via a kubeconfig. This controller ensures that kubeconfig secrets have been supplied for leaf clusters. Because it also references the CAPI CRD, it requires CAPI tooling to be installed first.
Docker Desktop runs a complete Linux virtual machine, which will be used for building and running the whole product - the defaults are far too conservative.
Dedicate at least 16GB RAM and as many CPU cores as you can spare - if you don't, everything will grind to a halt. Re-building the whole backend takes about a minute or two when there are enough resources, but if you're too stingy with resources you could easily find yourself waiting half an hour or more, every time you want to change any code.
As for disk space, if you can spare a few hundred gigabytes, allocate them. Every time your development environment loads a change, Docker stores another few megabytes of volumes, which adds up quicker than you'd think - and when you run out, you might get confusing mystery errors.
After making this change, remember to turn your Docker Desktop off and on again.
You need to enable the BuildKit feature in Docker, or Docker will complain. BuildKit is enabled by default for all users on Docker Desktop; elsewhere, export:
export DOCKER_BUILDKIT=1
A PAT is needed to get access to our private repositories and packages.
This token needs at least the repo
and read:packages
permissions. If you want to be able to delete the GitOps repo every
time you recreate your local Kind cluster, add the delete_repo
permission too and set the DELETE_GITOPS_DEV_REPO
flag to 1. You
can create a token here, and
export it as:
export GITHUB_TOKEN=your_token
In order for Go to be able to download private dependencies, you must also update your ~/.gitconfig with:
[url "ssh://git@github.com/"]
    insteadOf = https://github.com/
If you are running into issues installing Go modules, try also setting the GOPRIVATE
environment variable:
export GOPRIVATE=github.com/weaveworks/*
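If you prefer not to edit ~/.gitconfig by hand, the same rewrite rule can be written with a single standard git command:

```shell
# Writes the insteadOf rewrite rule into ~/.gitconfig,
# so Go module downloads of private repos go over SSH
git config --global url."ssh://git@github.com/".insteadOf "https://github.com/"
```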
You will also be using your personal GitHub account to host GitOps repositories. Therefore you need to export your GitHub username as well:
export GITHUB_USER=your_username
We use OCI artifacts hosted in GHCR, so you need your docker to be logged in to this repository:
echo $GITHUB_TOKEN | docker login --username $GITHUB_USER --password-stdin ghcr.io
To run a local development environment, you need to install Docker and kubectl.
Run make dependencies to download binaries needed by the scripts in tools/. These are saved in tools/bin/.
⚠️ The following script will delete a local Kind cluster named wge-dev and a remote repository named wge-dev in your personal GitHub account, if either of them exists. Take a look at the script to understand what it does and how to customize the cluster/repository names.
Run the following script to get a Kind cluster ready for Tilt:
./tools/reboot.sh
This will recreate a local Kind cluster, install CAPD and setup Flux to reconcile from a GitOps repository in your personal GitHub account. It will also create a file containing local settings such as your GitOps repository that the enterprise Helm chart will use in the next step.
If reboot.sh creates the Kind cluster but fails afterwards, you can run just ./tools/setup.sh to install CAPD and Flux and create the local config file. It is safe to run it repeatedly.
The reboot.sh
script has the capability to patch the default kind config
using custom configuration provided in the ./tools/custom/
directory. Place
any configuration you'd like in a file matching the pattern kind-cluster-patch-*.yaml
in that directory and it will get merged into the base configuration.
The reboot.sh
script will execute all scripts it finds in ./tools/custom/
that
match the file name pattern *.sh
after creating the cluster and installing
all components.
If you create a Tiltfile.local in the local directory, Tilt will detect and include any calls there, allowing you to add a bit of customization to your local environment. That's useful if you want to point to an external cluster (by default Tilt only allows local clusters like kind) or use a different registry:
allow_k8s_contexts('another-cluster')
default_registry('my-custom-registry:5000')
To start the development environment, run
make cluster-dev
and your system should build and start. The first time you run this, it will take ~10 mins (depending on your connection speed) to build all the containers and deploy them to your local cluster. This is because the docker builds have to download all the Go modules/JS libraries from scratch; use the Tilt UI to check progress. Subsequent runs should be a lot faster.
When cluster-service has become green, you should be able to access your cluster at http://localhost:8000. The login is username wego-admin and password dev.
Any change you make to local code will trigger tilt to rebuild and restart the pods running in your system.
Especially for frontend development, the time it takes for the pod to restart can be annoying. To spin up a local development frontend against your development cluster, run:
yarn
PROXY_HOST=http://localhost:8000 yarn start
Now you have a separate frontend running on http://localhost:3000 with in-process reload.
Tip: by starting tilt with the command MANUAL_MODE=true tilt up
instead,
you will cause tilt to stop restarting the backend unless you click
the button in the UI. If you find that the backend always restarts
while you're doing something, this might help.
Here's a list of things that might go wrong.
This means you have not installed a CAPI provider. If you set up your cluster with tools/reboot.sh then it should have been installed already.
If you want to fix it without a cluster reset, look for the function running clusterctl in tools/reboot.sh.
If that doesn't help, it could also mean that your version of
clusterctl
is too old, so you simply have the wrong version of the
cluster API. Note that make dependencies
does not support upgrades,
so running it does not keep you up to date - you have to manually
delete tools/bin/clusterctl
and then re-run make dependencies
to
upgrade it.
Note: this only applies to the building step - not after it's started!
This might mean your GitHub token isn't set correctly. Try turning tilt off, running export GITHUB_TOKEN=your_token in that terminal, and starting it again to see if that fixes it.
If that doesn't work, make sure your token has both repo and package permissions, and that it hasn't expired.
If you still experience problems, ask someone in your team if it works for them, as it could be a new dependency you don't have permissions to read. If it's stopped working for you both, you probably need to change the github permissions.
The UI is empty and you see an error mentioning "kubernetes client initialization failed: Unauthorized"
Whenever the Uncategorized
job runs in tilt, it resets all
permissions. When that happens, it can delete the permissions for the
chart-mccp-cluster-service
pod that's currently running.
Manually re-run the Uncategorized
job to make sure it runs as
expected. When it's finished, re-start the
chart-mccp-cluster-service
deployment. Once that's finished, it
should work again.
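If you prefer the command line over the Tilt UI for the restart step, something like the following should work (the flux-system namespace here is an assumption - adjust it to wherever the deployment actually runs in your cluster):

```shell
# Restart the cluster-service deployment and wait for it to come back
kubectl rollout restart deployment/chart-mccp-cluster-service -n flux-system
kubectl rollout status deployment/chart-mccp-cluster-service -n flux-system
```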
If you're on a mac, it should go away if you simply turn Docker Desktop off and on again - you don't have to delete the machine, just reboot. If you're on linux, try turning the kind docker container off and on again, or reboot your machine.
To get it to go away, on mac try to put ulimit -n <big_number>
in
your shell config file (probably ~/.zshrc
or maybe ~/.bashrc
) -
half a million or so is a big number.
On linux, you might need to set fs.nr_open
to a big value in
/etc/sysctl.conf, and you might need to override LimitNOFILE for the
containerd systemd unit. There's other related settings that can cause
similar-looking errors - the fs.inotify
family of settings often
start up very low and might need a bump.
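As a sketch, the Linux sysctl settings mentioned above might look like this (the values are illustrative starting points, not recommendations - tune them to your machine):

```
# /etc/sysctl.conf - apply with: sudo sysctl -p
fs.nr_open = 1048576
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 8192
```

For the containerd file-descriptor limit, the usual mechanism is a systemd drop-in override (e.g. via systemctl edit containerd) setting LimitNOFILE under the [Service] section.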
To build all containers use the following command:
# Builds everything - make sure you exported GITHUB_TOKEN as shown in
# the one-time setup
make GITHUB_BUILD_TOKEN=${GITHUB_TOKEN}
The following sections suggest some common dev workflows.
Before you start working on the code, you need to install the following tools:
- Go (1.18) for backend development
- Node.js (14) for frontend development
- kubectl for interacting with Kubernetes clusters
- Helm for working with Helm charts
- Buf for generating code from protobuf definitions
Most of the code for the API is under ./cmd/clusters-service
. There's a
Makefile in that directory with some helpful targets so when working on the API
make sure to run these from that location instead of root. The following
commands assume execution from ./cmd/clusters-service
.
To install gRPC tooling run:
make install
The API endpoints are defined as gRPC definitions in
./cmd/clusters-service/api/capi_server.proto
. Therefore if you need to add or
update an endpoint you need to first define it in there. For example the
following endpoint is used to return the version of WGE.
// GetEnterpriseVersion returns the WeGO Enterprise version
rpc GetEnterpriseVersion(GetEnterpriseVersionRequest)
returns (GetEnterpriseVersionResponse){
option (google.api.http) = {
get: "/v1/enterprise/version"
};
}
After making a change in the protobuf definition, you will need to run make generate
to regenerate the code.
To run the service locally, run:
export CAPI_CLUSTERS_NAMESPACE=default
go run main.go
You can execute HTTP requests to the API by pointing to an endpoint, for example:
curl --insecure http://localhost:8000/v1/enterprise/version
The --insecure flag is needed because the service will generate self-signed untrusted certificates by default.
To run all unit tests before pushing run:
make unit-tests
To run all tests, including integration tests run:
make test
To create leaf clusters to test out our features, we can rely on vcluster to deploy new clusters on the fly. That project creates an entire cluster inside your kind cluster without adding much overhead.
To get started, install the vcluster CLI first by following https://www.vcluster.com/docs/getting-started/setup and then just run the ./tools/create-leaf-cluster.sh script.
$ ./tools/create-leaf-cluster.sh leaf-cluster-01
This command will create a new cluster and configure the GitopsCluster CR pointing to the cluster's kubeconfig.
Note that this won't completely configure the cluster; you might need to install Flux and RBAC rules in order to query it properly. But it should already be visible in the Weave GitOps clusters tab.
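For reference, a GitopsCluster CR of the kind the script creates looks roughly like this (the names and secret reference below are hypothetical; check the script's output for the real ones):

```yaml
apiVersion: gitops.weave.works/v1alpha1
kind: GitopsCluster
metadata:
  name: leaf-cluster-01
  namespace: default
spec:
  # Secret containing the leaf cluster's kubeconfig
  secretRef:
    name: leaf-cluster-01-kubeconfig
```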
When you push your changes to a remote branch (even before creating a PR for it), CI will kick off a build that runs a quick suite of tests and then builds your containers and creates a new Helm chart tagged with the most recent commit. This Helm chart includes all the changes from your branch and can be used to deploy WGE as a whole to a cluster.
-
Find the version of the Helm chart you need to deploy:
# Add the Helm repo locally (needs to happen only once)
./tools/bin/helm repo add weave-gitops-enterprise-charts \
  https://charts.dev.wkp.weave.works/charts-v3 \
  --username wge --password gitops
"weave-gitops-enterprise-charts" has been added to your repositories
# Search the Helm repo for the commit SHA that corresponds to your most recent commit
./tools/bin/helm repo update > /dev/null 2>&1 \
  && ./tools/bin/helm search repo weave-gitops-enterprise-charts --devel --versions \
  | grep <commit-SHA>
weave-gitops-enterprise-charts/mccp  <chart-version-with-commit-SHA>  1.16.0  A Helm chart for Kubernetes
-
Create a new kind cluster and install flux
cat > kind-cluster-with-extramounts.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /var/run/docker.sock
    containerPath: /var/run/docker.sock
EOF
./tools/bin/kind create cluster \
  --name kind \
  --config=kind-cluster-with-extramounts.yaml
export GITHUB_TOKEN=<your-GH-token>
./tools/bin/flux bootstrap github \
  --owner=<your-GH-username> \
  --repository=config \
  --personal=true \
  --path=clusters/kind
-
Install CAPI
./tools/bin/clusterctl init --infrastructure docker
-
Install WGE
cat > values.yaml <<EOF
tls:
  enabled: true
config:
  capi:
    repositoryURL: <your config repo URL>
EOF
kubectl apply -f ./test/utils/data/entitlement/entitlement-secret.yaml
./tools/bin/flux create source helm weave-gitops-enterprise-charts \
  --url=https://charts.dev.wkp.weave.works/charts-v3 \
  --namespace=flux-system \
  --secret-ref=weave-gitops-enterprise-credentials
./tools/bin/flux create hr weave-gitops-enterprise \
  --namespace=flux-system \
  --interval=10m \
  --source=HelmRepository/weave-gitops-enterprise-charts \
  --chart=mccp \
  --chart-version=<chart-version-with-commit-SHA> \
  --values values.yaml
When making code modifications see if you can write a test first!
- Integration and unit tests should be placed in the _test.go file next to the source you're modifying.
- Acceptance tests live in ./test/acceptance
The clusters-service
caches the index.yaml
and values.yaml
files from
helm-repos (profiles) in a SQLite database. This is to avoid hammering the
source-controller with requests. The cache is stored in a file called
/tmp/helm-cache/charts.db
by default.
kubectl cp flux-system/$(kubectl get pods -A -l app=clusters-service --no-headers -o custom-columns=":metadata.name"):/tmp/helm-cache/charts.db mccp.db
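Once copied locally, the database can be inspected with any SQLite client; for example, this short Python sketch lists the tables (we make no assumptions about what the cache calls them - check before querying further):

```python
import sqlite3

# Open the copied cache database and list its tables
con = sqlite3.connect("mccp.db")
rows = con.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
for (name,) in rows:
    print(name)
con.close()
```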
We usually develop the UI against the test server and by default the UI dev server will use that.
yarn
yarn start
Open up http://localhost:3000. Changes to code will be hot-reloaded.
To start the jest
test runner CLI dialog:
$ yarn test
PASS src/components/Applications/__tests__/index.test.tsx
Applications index test
✓ renders table rows (349 ms)
snapshots
✓ loading (170 ms)
✓ success (168 ms)
Test Suites: 1 passed, 1 total
Tests: 3 passed, 3 total
Snapshots: 2 passed, 2 total
Time: 5.448 s
Ran all test suites.
Watch Usage: Press w to show more.
- The @testing-library/react package provides a test renderer as well as helpers for dealing with hooks and component state
- Snapshots alone generally aren't enough; you should do some assertions to validate component behavior
- Hooks can be tested in isolation from components using the act helper.
We use a technique called "Snapshots" to record the rendered output of components and track them in version control over time. Snapshots are not really tests, since they don't have any explicit assertions. Think of them more as a record of the output of a component.
When combined with the styled-components
integration, snapshots give us a way to track styling logic over time. This can be very helpful in debugging styling issues that would otherwise be hard to understand.
After any changes to styling logic, you should expect to update snapshots, else unit tests will fail.
To update snapshots:
yarn test -u
The easiest way to dev on the UI is to use an existing cluster. demo-01 is kept automatically up to date with every change that lands on main. To use it run the following command:
PROXY_HOST=https://demo-01.wge.dev.weave.works/ yarn start
The username/password used to login are stored in 1Password.
If you need to develop the UI against new features that haven't made it to the test cluster yet, you can run your own clusters-service locally and point the UI dev server at it with:
PROXY_HOST=http://localhost:8000 yarn start
Recommended UI Development environment variables:
# This will point your local FE dev server to the local cluster backend
export WEGO_EE_PROXY_HOST=http://localhost:8000
# This will skip building the UI docker container, which we don't need because we have a dev server.
export SKIP_UI_BUILD=true
# This will use your local Go compiler to build the binary, rather than building in a docker container via Tilt.
# You may need to install things like gcc to make this work, since we are using CGO=true.
export NATIVE_BUILD=true
It is possible to test against local files from gitops OSS during development. If you want to do this, you need to make sure you've started tilt using MANUAL_MODE=true tilt up so that it won't restart your backend. After that's running, develop using the yarn start method described above.
For another method that has fewer caveats, look at "How to update
weave-gitops
to a non-released version during development".
One magical command to "reload" core (assumes the project directories are located in the same directory):
# Note, this assumes you have core and EE at the same level in the file system
make core-ui && make core-lib
weave-gitops-enterprise
depends on weave-gitops
. When WG makes a new release we'll want to update the version WGE depends on. It goes a little something like this:
export WG_VERSION=0.2.4
# 1. Update the backend Golang code
go get -d github.com/weaveworks/weave-gitops@$WG_VERSION
go mod tidy
# 2. Update the frontend typescript/javascript code
yarn add @weaveworks/weave-gitops@$WG_VERSION
This will update WGE to use the latest main
of weave-gitops
make update-weave-gitops
You can also pick a different branch as long as that branch has a PR associated with it by running
make update-weave-gitops BRANCH=<my-branch-name>
When a new release of the cluster-controller is made we'll usually want to update it in WGE.
export CC_VERSION=1.2.0
# update the backend golang code in the weave-gitops-enterprise repo root
cd weave-gitops-enterprise
go get -d github.com/weaveworks/cluster-controller@$CC_VERSION
go mod tidy
Copy across the newer helm-chart
cd ../cluster-controller
# generates and copies a new helm subchart to ../weave-gitops-enterprise
make helm
cd ../weave-gitops-enterprise
# TODO: improve this:
# Carefully add back the "important" changes to the chart, read the comments as you go.
git add --patch
Update images.clusterBootstrapController
in https://github.com/weaveworks/weave-gitops-enterprise/blob/main/charts/mccp/values.yaml
Manually copy across any big changes to the deployment or CRDs from cluster-controller/config into weave-gitops-enterprise/charts/mccp/
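The values.yaml field in question looks something like this (the image reference below is hypothetical - use the repository and tag from the new release):

```yaml
# charts/mccp/values.yaml (excerpt)
images:
  clusterBootstrapController: weaveworks/cluster-bootstrap-controller:v1.2.0
```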
We have 3 demo clusters currently that we use to demonstrate our work and test new features.
UI | GitOps | CAPI |
---|---|---|
http://35.188.40.143:30080 | https://github.com/wkp-example-org/capd-demo-reloaded | CAPD |
https://demo-01.wge.dev.weave.works | https://gitlab.git.dev.weave.works/wge/demo-01 | CAPG |
As enterprise features are deployed, the multi-cluster permissions may need to be updated as well. For example viewing canaries from a leaf cluster did not work. Below is an example rbac config that resolved the canary issue:
apiVersion: v1
kind: ServiceAccount
metadata:
name: demo-02
namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: impersonate-user-groups
subjects:
- kind: ServiceAccount
name: demo-02
namespace: default
roleRef:
kind: ClusterRole
name: user-groups-impersonator
apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: user-groups-impersonator
rules:
- apiGroups: [""]
resources: ["users", "groups"]
verbs: ["impersonate"]
- apiGroups: [""]
resources: ["namespaces"]
verbs: ["get", "list", "watch"]
- apiGroups: ["apiextensions.k8s.io"] # required for canary support
resources: ["customresourcedefinitions"]
verbs: ["get", "list", "watch"]
Requires: gcloud CLI >= 352.0.0
Install and configure the gcloud CLI if needed. Then run:
gcloud container clusters get-credentials demo-01 --region europe-north1-a
The following steps use demo-03 as an example but the same concepts can be applied to all demo clusters. Depending on the cluster, you may need to sign up to our on-prem Gitlab instance using your @weave.works email address and request access to the Weave GitOps Enterprise group or get added to the wkp-example-org in Github.
-
Figure out the version of the WGE chart you want to deploy:
- If we've done a release recently you can change it to 0.0.19 or a version like that.
- Alternatively, to deploy an unreleased version from main or another branch, take a look at the branch or the charts repo to determine the version.
-
Find the HelmRelease definition for WGE in the repo. It is called weave-gitops-enterprise and is part of the flux-system namespace. Locate the spec.chart.spec.version field (example) and update it to the new version (i.e. 0.0.17-110-g485f9bf) by committing to main or via a PR.
  - If this is an official release (i.e. 0.0.19 etc) make sure the release repo is set:
      sourceRef:
        kind: HelmRepository
        name: weave-gitops-enterprise-mccp-chart-release
        namespace: flux-system
  - Otherwise make sure it's using dev:
      sourceRef:
        kind: HelmRepository
        name: weave-gitops-enterprise-mccp-chart-dev
        namespace: flux-system
-
Flux will detect this change and update the cluster with the version you specified in the previous step.
-
Voila
Note: As of writing, the HelmRelease for 35.188.40.143 lives in https://github.com/wkp-example-org/capd-demo-reloaded/blob/main/clusters/management/weave-gitops-enterprise.yaml but may have moved, so look around for the Helm release file if it has gone missing.
- Get your local copy of weave-gitops-enterprise up to date by running git fetch
- Figure out the git version ref of origin/main (for example) with: git describe --always --match "v*" --abbrev=7 origin/main | sed 's/^[^0-9]*//'. You could also provide origin/fixes-the-funny-bug as the branch name here.
- It will output a ref that looks like this: 0.0.7-10-g9838aff
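The sed at the end of that pipeline only strips the leading non-digit prefix (the v) from git describe's output; a quick check with a made-up ref:

```shell
# git describe emits something like v0.0.7-10-g9838aff; sed drops the leading "v"
echo "v0.0.7-10-g9838aff" | sed 's/^[^0-9]*//'
# -> 0.0.7-10-g9838aff
```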
Requires: helm CLI >= 3.8.1
- Add the charts repo locally:
./tools/bin/helm repo add wkp https://charts.dev.wkp.weave.works/charts-v3 \
--username wge --password gitops
- Use the commit sha to find the relevant chart version by running the following:
./tools/bin/helm repo update \
&& ./tools/bin/helm search repo wkp --devel --versions \
| grep e4e540d
where e4e540d is your commit sha. This will return wkp/mccp 0.0.17-88-ge4e540d 1.16.0 A Helm chart for Kubernetes, where 0.0.17-88-ge4e540d is the version you're looking for.
- If you are using a Helm version prior to v3.8.0, set the HELM_EXPERIMENTAL_OCI environment variable. Helm versions v3.8.0 and newer have OCI support enabled by default.
export HELM_EXPERIMENTAL_OCI=1
-
If you haven't already, install and configure the gcloud CLI
-
Use the gcloud cli to query registry artifacts
The Google Artifact Registry Docker repository can hold both helm charts and docker images. If both types will be deployed to the same registry, charts should be stored in the
charts
namespace and images in theimages
namespace as documented here
gcloud artifacts docker images list europe-west1-docker.pkg.dev/weave-gitops-clusters/weave-gitops-enterprise --include-tags
- Once you know the version tag you can use the oci image url and version to run helm show/pull/install commands
With OCI registries the --version flag is required:
helm show all oci://europe-west1-docker.pkg.dev/weave-gitops-clusters/weave-gitops-enterprise/charts/mccp --version 0.8.1-55
openssl req -x509 \
-newkey rsa:4096 \
-keyout key.pem -out cert.pem \
-sha256 -days 365 \
-nodes -subj '/CN=localhost' \
-addext "subjectAltName = DNS.1:localhost"
sudo security add-trusted-cert \
-d -r trustRoot \
-k /Library/Keychains/System.keychain cert.pem
clusters-service <OTHER_ARGS...> --tls-cert-file cert.pem --tls-private-key key.pem
Requires: azure CLI >= 2.36.0
Install and configure the azure CLI if needed. Then run:
az aks get-credentials --name <cluster-name> --resource-group <resource-group> --admin
Requires: aws CLI >= 2.5.2
Install and configure the aws CLI if needed. Then run:
aws eks --region <aws-region> update-kubeconfig --name <cluster-name>
First add a new feature flag in values.yaml (example)
For example:
# Turns on pipelines features if set to true. This includes the UI elements.
enablePipelines: false
Then add an if
block to the deployment.yaml (example) template of cluster-service in order to conditionally set an environment variable. This variable will be available to cluster-service at runtime. As a convention, the environment variable needs to be prefixed with WEAVE_GITOPS_FEATURE_
and takes a value of true
.
For example:
{{- if .Values.enablePipelines }}
- name: WEAVE_GITOPS_FEATURE_PIPELINES
value: "true"
{{- end }}
Finally, you can use the feature flag from Go through the featureflags
package, for example:
if featureflags.Get("WEAVE_GITOPS_FEATURE_PIPELINES") != "" {
// Feature flag has been set
}