Open
Description
This is a list, in priority order, of what the MetaCPAN NOC would like help with from DCL.
Action
- ?: Implement Loki/Grafana/Prometheus
- DCL: should we use DO S3 ($?? pm, subject to its limits) or https://www.backblaze.com/ ? The DO benefit is that it is local and all on the same DO bill, so sponsorship is possible
Planned
- Review DCL findings and recommendations
- Cluster node setup (what should we expand it to in order to reduce memory issues?)
- Collecting log output from containers, maybe via an ingress logging option? N.B. forwarding IPs!
- Best-practice recommendations, YAML lint, etc.
- Better container/node monitoring (how much memory does X container need, what is using all the processes in the cluster)
- Review and update all app configs, setting up best practices (affinity, limits, etc.)
- Support K8s access for multiple users/roles/projects in one cluster, e.g. if we want to give project X access how do we partition both access and resources (Rancher?)
- Discuss storage options for moving the CPAN store (the previous discussion is out of date)
- Simplest way to back up DO PG (ideally to Backblaze S3 storage); currently using https://app.snapshooter.com/ (the free account should be enough)?
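
As a starting point for the PG backup item above, here is a minimal Python sketch of one possible approach: dump the database with `pg_dump` and upload the file to an S3-compatible bucket using `boto3`. This is not the current setup. The key layout, function names, bucket, and endpoint URL are all hypothetical, and credentials are assumed to come from the standard `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables.

```python
import datetime
import subprocess


def backup_key(db_name, when):
    """Build a timestamped object key for one dump (hypothetical layout)."""
    return f"pg/{db_name}/{when.strftime('%Y-%m-%dT%H%M%SZ')}.dump"


def pg_dump_command(database_url, out_path):
    """Build a pg_dump invocation using the custom format, which pg_restore reads."""
    return [
        "pg_dump",
        "--format=custom",
        "--no-owner",
        f"--file={out_path}",
        f"--dbname={database_url}",
    ]


def run_backup(database_url, db_name, bucket, endpoint_url, out_path="/tmp/backup.dump"):
    """Dump the database, then upload the dump to an S3-compatible bucket."""
    import boto3  # third-party; imported here so the helpers above work without it

    # Fail loudly if pg_dump exits non-zero, so a broken dump is never uploaded.
    subprocess.run(pg_dump_command(database_url, out_path), check=True)

    key = backup_key(db_name, datetime.datetime.now(datetime.timezone.utc))
    # endpoint_url is whatever S3-compatible endpoint the storage provider gives you.
    s3 = boto3.client("s3", endpoint_url=endpoint_url)
    s3.upload_file(out_path, bucket, key)
    return key
```

Run from a Kubernetes CronJob or any scheduled host, `run_backup(...)` would produce one dated object per run. Retention could then be handled by the bucket's lifecycle rules rather than in the script.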
Completed
- Basic k8s monitoring https://opsview.dcmanaged.com/ - MetaCPAN NOC has access
- Slack channel for discussion + honeycomb.io error alert integration
- LL: Increase cluster memory, use 3 x 16G instances as we are running at ~90%
- LL: Start using https://k8slens.dev/ for viewing cluster information; this will improve once Grafana/Prometheus/Loki are implemented