
build: install eoAPI chart #17

Merged
spwoodcock merged 33 commits into main from infra/193-eoapi
Aug 6, 2025

Conversation

@aliziel (Collaborator) commented May 30, 2025

What type of PR is this? (check all applicable)

  • 🍕 Feature
  • 🐛 Bug Fix
  • 📝 Documentation
  • 🧑‍💻 Refactor
  • ✅ Test
  • 🤖 Build or CI
  • ❓ Other (please specify)

Related Issue

hotosm/openaerialmap#193

Describe this PR

  • Adds deployment via helmfile:
    • Ensures prerequisite releases are installed
    • Creates new revision on change only
    • Enables further customization, templating, isolation, etc.
  • Adds Prometheus and Grafana tooling via eoAPI support chart:
    • Enables custom metrics for HPA
    • Monitoring, observability, and alerting for admins
  • Adds local TF vars for added buffer during initial development
  • Documents initial implementation, basic setup, and areas for further improvement
  • Fixes CIDR collision during subnet generation
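On the CIDR collision fix: collisions typically arise when two subnet groups are carved from the same base block with overlapping `netnum` indices. A minimal illustrative sketch (not the actual code from this PR; names and the base CIDR are hypothetical) using OpenTofu/Terraform's `cidrsubnet()`:

```hcl
# Hypothetical example: offset the private subnets' netnum by the number
# of public subnets so the two ranges never overlap within 10.0.0.0/16.
locals {
  azs             = ["us-east-1a", "us-east-1b"]
  public_subnets  = [for i, az in local.azs : cidrsubnet("10.0.0.0/16", 8, i)]
  private_subnets = [for i, az in local.azs : cidrsubnet("10.0.0.0/16", 8, i + length(local.azs))]
}
# public:  10.0.0.0/24, 10.0.1.0/24
# private: 10.0.2.0/24, 10.0.3.0/24
```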

Screenshots

Grafana: (dashboard screenshot)
Prometheus: (metrics screenshot)

Review Guide

  1. Spin up a local cluster (e.g. kind, minikube, Docker Desktop)

  2. Install helm

  3. Install helmfile (or run in container)

  4. Add global resources

    helmfile.yaml
    releases:
      - name: cluster-autoscaler
        namespace: cluster-autoscaler
        chart: cluster-autoscaler/cluster-autoscaler
        version: 9.46.6
        values:
          - autoDiscovery:
              clusterName: hotosm-development-cluster
            awsRegion: us-east-1
    
      - name: ingress
        namespace: ingress-nginx
        chart: ingress-nginx/ingress-nginx
        version: 4.12.1
        values:
          - controller:
              enableLatencyMetrics: true
              metrics:
                enabled: true
                service:
                  annotations:
                    prometheus.io/scrape: "true"
                    prometheus.io/port: "10254"
    
      - name: cert-manager
        chart: cert-manager/cert-manager
        namespace: cert-manager
        version: 1.17.1
        values:
          - installCRDs: true
    
    repositories:
      - name: cluster-autoscaler
        url: https://kubernetes.github.io/autoscaler
      - name: ingress-nginx
        url: https://kubernetes.github.io/ingress-nginx
      - name: cert-manager
        url: https://charts.jetstack.io
    $ helmfile apply
  5. Pull down this branch

  6. Initialize helmfile (installing the diff plugin is recommended, to more easily view changes)

    $ cd kubernetes/helm
    $ helmfile init
  7. Set environment and apply helmfile → expect successful pgo + eoapi install

    $ export S3_BACKUP_ROLE=arn:aws:iam::0123456789:role/s3-backup
    $ sed -i '' '43d' eoapi-values.yaml # If not installing on AWS, delete setting referencing specific storage class
    $ helmfile apply
    # ...
    # UPDATED RELEASES:
    # NAME
    # pgo     ...
    # eoapi   ...
  8. Apply again without changes → expect no updates

    $ helmfile apply
    # Comparing release=pgo, ...
    # Comparing release=eoapi, ...
  9. Modify eoapi input and reapply → expect successful eoapi update and pgo skipped

    $ sed -i '' 's/100m/101m/' eoapi-values.yaml 
    $ helmfile apply
    # ...
    # UPDATED RELEASES:
    # NAME
    # eoapi   ...
    
    $ helm list -A
    # NAME              	REVISION
    # eoapi     ...     	2
    # pgo       ...     	1
  10. Explore cluster resources + deployed app → expect available eoapi services and interface

    $ kubectl get pod,svc,deploy -A
    $ kubectl -n ingress-nginx get svc/ingress-ingress-nginx-controller \
        -o=jsonpath='{.status.loadBalancer.ingress[0].hostname}'
    # <hostname>
    # ^^^^^^^^^ Plug output into browser
  11. Set eoapi value ingress.tls.enabled: true and reapply → expect eoapi-support chart to be installed

    $ sed -i '' '28,29s/# //' eoapi-values.yaml
    $ helmfile apply
    # ...
    # UPDATED RELEASES:
    # NAME
    # eoapi-support   ...

    NOTE: this chart should be installed once TLS is set up. We're using a shortcut for local testing, so we won't be able to interact with it.

  12. $ helmfile destroy
    # ...
    # DELETED RELEASES:
    # NAME 
    # eoapi-support   ...
    # eoapi           ...
    # pgo             ...
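One portability note on the guide above: the `sed -i ''` invocations are BSD/macOS syntax; GNU sed on Linux would treat `''` as a filename. A portable sketch of the same kind of edit (the file path here is a stand-in, not the real values file):

```shell
# Portable alternative to `sed -i ''`: write to a temp file, then replace.
printf 'cpu: 100m\n' > /tmp/eoapi-values-demo.yaml   # stand-in for eoapi-values.yaml
sed 's/100m/101m/' /tmp/eoapi-values-demo.yaml > /tmp/eoapi-values-demo.yaml.tmp
mv /tmp/eoapi-values-demo.yaml.tmp /tmp/eoapi-values-demo.yaml
cat /tmp/eoapi-values-demo.yaml   # cpu: 101m
```

On GNU sed, plain `sed -i 's/100m/101m/' eoapi-values.yaml` (no `''` argument) does the in-place edit directly.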

github-actions bot commented May 30, 2025

tofu plan -chdir=terraform -var-file=vars/production.tfvars
By @aliziel at 2025-06-24T22:13:28Z (view log).
No changes. Your infrastructure matches the configuration.

OpenTofu has compared your real infrastructure against your configuration and
found no differences, so no changes are needed.

@aliziel aliziel force-pushed the infra/193-eoapi branch from 90e8d58 to 666a9f4 on June 1, 2025 00:52
@gadomski (Collaborator) commented Jun 17, 2025

👋🏼 checking in ... I believe this PR will unblock a couple of other tasks (e.g. hotosm/openaerialmap#191) so curious if we can push to land this soon-ish?

@aliziel (Collaborator, Author) commented Jun 24, 2025

Will post in #oam-dev re: closing this out and tearing down the DS instance 👍

@aliziel (Collaborator, Author) commented Jun 24, 2025

@spwoodcock @dakotabenjamin Relinking some notes on cluster access in case either of you would like to poke around. I believe there was an auth strategy in mind here, so no action needed; just wanted to put it back on your radar in case it's helpful for review or TLS setup (outlined in eoapi-values.yaml). I also hadn't realized that I could mark ready for review again, so thank you for re-enabling!

cc @gadomski @ceholden

@aliziel aliziel marked this pull request as ready for review June 24, 2025 23:07
@spwoodcock (Member) commented Jun 25, 2025

@aliziel thanks for this!

Are there any docs for access you could provide?
Do we need a kubeconfig file with certs?
Ideally I just want to have my config set, then kubectl get pods.

By the sounds of it, one requirement is:

  • Add IAM role names to ADMIN_ROLES as a Github actions variable in this repo.
  • Redeploy via the opentofu Github workflow
  • This will provide access for the admins - is this via AWS console?

@aliziel (Collaborator, Author) commented Jun 26, 2025

@aliziel thanks for this!

Are there any docs for access you could provide? Do we need a kubeconfig file with certs? Ideally I just want to have my config set, then kubectl get pods.

By the sounds of it, one requirement is:

  • Add IAM role names to ADMIN_ROLES as a Github actions variable in this repo.
  • Redeploy via the opentofu Github workflow
  • This will provide access for the admins - is this via AWS console?

If you mean AWS docs, here's their section on access entries and their page on kubectl setup. I can also expand the docs on this PR, but your summary is correct. Adding access entries maps AWS IAM to Kubernetes permissions, so you can just pull the kubeconfig and start.
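To illustrate the IAM-to-Kubernetes mapping described above, an EKS access entry in OpenTofu might look roughly like this (resource names and the ARN are hypothetical; the repo's actual wiring via `ADMIN_ROLES` may differ):

```hcl
# Hypothetical sketch: map an IAM role to cluster-admin via EKS access entries.
resource "aws_eks_access_entry" "admin" {
  cluster_name  = "hotosm-production-cluster"
  principal_arn = "arn:aws:iam::0123456789:role/admin" # placeholder ARN
}

resource "aws_eks_access_policy_association" "admin" {
  cluster_name  = "hotosm-production-cluster"
  principal_arn = aws_eks_access_entry.admin.principal_arn
  policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"

  access_scope {
    type = "cluster"
  }
}
```

Once an entry like this exists for a role, anyone assuming that role can pull the kubeconfig and use kubectl without any in-cluster setup.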

I did add notes about AWS auth in the TF section, I'll add a similar mention in the Kubernetes one as well.

github-actions bot commented Jun 26, 2025

tofu init -chdir=terraform -var-file=vars/production.tfvars
View output.
By @aliziel at 2025-06-26T19:54:11Z (view log).
Too many command line arguments. Did you mean to use -chdir?

github-actions bot commented Jun 27, 2025

tofu plan -chdir=terraform -var-file=vars/production.tfvars
By @aliziel at 2025-06-27T19:42:01Z (view log).
Error: Missing item separator

  on <value for var.cluster_admin_access_role_arns> line 1:
  (source code not available)

Expected a comma to mark the beginning of the next item.

@spwoodcock (Member) commented:
Thanks - the kubeconfig generation docs above were the missing piece for me - I have never had to generate one via AWS CLI 😄 Thanks for linking the eoAPI docs too 👍

The process is:

# Run AWS CLI
docker run --rm -it --entrypoint=sh -v $PWD:$PWD --workdir $PWD public.ecr.aws/aws-cli/aws-cli:2.19.1

# Configure SSO
aws configure sso
	Session name: k8s
	Start URL: https://hotosm.awsapps.com/start/#
	Start region: eu-west-1

# Login via SSO
aws sso login --profile Admin

# View available clusters
aws eks list-clusters --profile Admin

# Generate a kubeconfig file:
aws eks update-kubeconfig --profile Admin --name hotosm-production-cluster --region us-east-1
# (I don't think the cluster name needs to remain secret)

# I didn't get this far - assuming this step
# Copy generated kubeconfig file to ~/.kube/config
cp kubeconfig ~/.kube/config

# Use kubectl as normal
kubectl get pods

@dakotabenjamin when I list clusters as AdministratorAccess-153950808028 profile, I see an empty list.
Also if I try to generate the kubeconfig file I get the same.
I assume I'm using the wrong role or something - could you point me in the right direction? 🙏

github-actions bot commented Jun 28, 2025

tofu validate -chdir=terraform -var-file=vars/production.tfvars
View output.
By @spwoodcock at 2025-08-06T14:05:39+01:00 (view log).

@aliziel (Collaborator, Author) commented Jul 1, 2025

# I didn't get this far - assuming this step
# Copy generated kubeconfig file to ~/.kube/config
cp kubeconfig ~/.kube/config

@spwoodcock So you actually shouldn't need this command, just the one before it, and it's ready as a new context:

aws eks update-kubeconfig --profile Admin --name hotosm-production-cluster --region us-east-1

# Verify
kubectl config get-contexts

The command docs outline the logic in case you have a more custom setup.
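For reference, the user entry that `aws eks update-kubeconfig` merges into `~/.kube/config` uses an exec credential plugin, shaped roughly like the sketch below (field values abbreviated; exact output can vary by CLI version):

```yaml
# Sketch of the generated user entry: kubectl shells out to
# `aws eks get-token` for short-lived cluster credentials.
users:
  - name: arn:aws:eks:us-east-1:0123456789:cluster/hotosm-production-cluster
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1beta1
        command: aws
        args: ["eks", "get-token", "--cluster-name", "hotosm-production-cluster"]
        env:
          - name: AWS_PROFILE
            value: Admin
```

This is why no certs need to be copied around: authentication happens through the AWS CLI at request time.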

when I list clusters as AdministratorAccess-153950808028 profile, I see an empty list. Also if I try to generate the kubeconfig file I get the same. I assume I'm using the wrong role or something - could you point me in the right direction? 🙏

I just tried to do a workaround apply but got stuck on permissions. An admin role might've been added manually for review, so the permission gap wouldn't have been flagged, since it wasn't through a locked-down CI role?

Comment thread kubernetes/manifests/cluster-issuer-staging.yaml Outdated
Comment thread kubernetes/manifests/cluster-issuer.yaml Outdated
ceholden and others added 2 commits July 11, 2025 09:41
@aliziel aliziel force-pushed the infra/193-eoapi branch 2 times, most recently from a64a8cf to 68970a4 on July 16, 2025 01:19
@spwoodcock (Member) commented:

I can connect to the cluster & the services are there - is there anything pending, or should we merge?

@spwoodcock (Member) commented:

By the way, I'm probably adding ArgoCD pretty soon, so will need to swap this for a pull based approach 👍

@spwoodcock (Member) commented Aug 6, 2025

I'll go ahead and merge this then - it looks good to me 👍

We can always iterate on the setup =)

Thanks for your work on this @aliziel!

@spwoodcock spwoodcock merged commit 834aaab into main Aug 6, 2025
1 check passed
@aliziel aliziel mentioned this pull request Aug 7, 2025
5 participants