What's Changed
- Update readme by @OguzPastirmaci in #87
- Update path for OCI CLI in helm-deployment.tf by @OguzPastirmaci in #89
- Fix Ubuntu repo by @robo-cap in #92
- Issue number: 90 - FSS mount on all worker nodes by @subburamoracle in #91
- Install OKE node client packages from local repo if it exists by @OguzPastirmaci in #93
- Improve ons-webhook resiliency by @robo-cap in #94
- Add retry function to cloud init by @OguzPastirmaci in #95
- Module fixes and improvements by @robo-cap in #96
- Use NSGs instead of SLs for Lustre Service by @robo-cap in #100
- Update NPD values file by @OguzPastirmaci in #102
- Add NCCL tests manifest for BM.GPU.GB200-v3.4 and update the other manifests to use NCCL 2.29 by @OguzPastirmaci in #103
- Add Terratest tests by @OguzPastirmaci in #101
- Add the document for replacing the boot volumes of self-managed nodes by @OguzPastirmaci in #106
- Update NCCL/RCCL images by @OguzPastirmaci in #107
- Add check to wait until kubeconfig exists by @OguzPastirmaci in #108
- Add MI355 manifest and update other manifests by @OguzPastirmaci in #109
- Move GPU Fryer active health checks to Python by @OguzPastirmaci in #110
- Update BM.GPU.MI355X-v1.8.yaml by @OguzPastirmaci in #111
- Add support for VM.DenseIO shapes by @shethdhvani in #114
- Update replacing node using BVR guide by @OguzPastirmaci in #115
- Fix pod logs mount by @robo-cap in #118
- Replace Nginx Ingress controller with Contour by @robo-cap in #117
- Fix: Set to retentionSize for Prometheus by @sam-andaluri in #119
- Update contour helm values by @robo-cap in #120
- Add NCCL tests manifest for BM.GPU.GB300.4 by @OguzPastirmaci in #121
- Update BM.GPU.GB300.4.yaml by @OguzPastirmaci in #122
- Add cloud-shell support to the BVR script by @robo-cap in #123
- Remove BV high storage class by @OguzPastirmaci in #126
- Add option to change services CIDR by @OguzPastirmaci in #127
- Add NCCL tests 2.29.3 images by @OguzPastirmaci in #124
- Update Node Problem Detector checks by @OguzPastirmaci in #130
- Add an option to the OKE stack to use an existing Dynamic Group by @subburamoracle in #105
- Bump chart versions by @OguzPastirmaci in #131
- Add per-pool Kubernetes version, max pods, and node cycling by @OguzPastirmaci in #128
- Larger CIDR to accommodate more nodes by @OguzPastirmaci in #129
- BugFix: Fix alert webhook to reduce chances of duplicate alerts by @sam-andaluri in #133
- Set kube-proxy to use IPVS and several small tweaks by @robo-cap in #132
- Increase DCGM Exporter memory limits by @OguzPastirmaci in #134
New Contributors
- @subburamoracle made their first contribution in #91
- @shethdhvani made their first contribution in #114
- @sam-andaluri made their first contribution in #119
Full Changelog: v25.11.0...v26.2.0