v26.01.0-beta1

@rchakode released this 28 Dec 14:55 · 2 commits to main since this release

Highlights of Release v26.01.0-beta1

This release introduces comprehensive GPU metrics support via NVIDIA DCGM Exporter integration, modernized Python tooling, and UI/UX improvements, including better dark mode support.

New Features

GPU Metrics Support

  • NVIDIA DCGM Integration: Full support for GPU metrics collection via NVIDIA DCGM Exporter
  • GPU Compute & Memory Views: Separate monitoring views for GPU utilization and memory usage
    • GPU Compute: Shows GPU utilization percentage per pod
    • GPU Memory: Shows GPU memory usage per pod
  • GPU Node Heatmap: Heatmap visualization for GPU nodes showing compute and memory utilization
  • Per-pod GPU Usage Charts: Breakdown of GPU usage by pod within each node
  • Conditional GPU Charts: GPU charts display only when DCGM endpoint is configured
  • Helm Chart Support: Added DCGM Exporter configuration options in Helm chart
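As a rough illustration of the per-pod views listed above, the sketch below parses Prometheus-format text of the kind DCGM Exporter serves and aggregates it by pod. It is not the project's implementation: the helper `gpu_usage_by_pod` and the sample payload are hypothetical, and it assumes the exporter emits `DCGM_FI_DEV_GPU_UTIL` and `DCGM_FI_DEV_FB_USED` samples carrying `namespace` and `pod` labels, as it typically does when running in Kubernetes.

```python
import re
from collections import defaultdict

# Matches one Prometheus sample line of the form: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Assumed metric names: GPU utilization (%) and framebuffer memory used (MiB).
METRICS = {"DCGM_FI_DEV_GPU_UTIL": "util_pct", "DCGM_FI_DEV_FB_USED": "mem_mib"}


def gpu_usage_by_pod(text):
    """Sum GPU samples per (namespace, pod). Hypothetical helper."""
    usage = defaultdict(lambda: {"util_pct": 0.0, "mem_mib": 0.0})
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match or match["name"] not in METRICS:
            continue  # skip comments, blank lines, and unrelated metrics
        labels = dict(LABEL_RE.findall(match["labels"]))
        pod = labels.get("pod")
        if not pod:
            continue  # this GPU is not attached to any pod
        key = (labels.get("namespace", ""), pod)
        usage[key][METRICS[match["name"]]] += float(match["value"])
    return dict(usage)


# Made-up payload for illustration only.
SAMPLE = '''
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a",namespace="ml",pod="train-0"} 80
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a",namespace="ml",pod="train-0"} 40
DCGM_FI_DEV_FB_USED{gpu="0",Hostname="node-a",namespace="ml",pod="train-0"} 1024
DCGM_FI_DEV_GPU_UTIL{gpu="2",Hostname="node-a",namespace="",pod=""} 0
'''

print(gpu_usage_by_pod(SAMPLE))
```

Values are summed across GPUs in this sketch, so a pod using two GPUs can report more than 100% utilization; averaging per GPU would be an equally reasonable choice.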

Theme & UI Improvements

  • Theme-aware Legend Colors: Legend text colors adapt to the selected theme
  • Enhanced Heatmap Labels: Improved label visibility and positioning
  • Better Tooltip Positioning: Tooltips now position correctly across all views

Node Heatmap

  • Resource Utilization View: Visual representation of CPU, Memory, and GPU (new) usage across nodes

Infrastructure & Build

Python Tooling Modernization

  • pyproject.toml: Migrated from setup.py to modern pyproject.toml-based configuration
  • uv Package Manager: Adopted uv for faster, more reliable dependency management
  • Ruff Linter: Integrated Ruff for code formatting and linting

Container & CI/CD

  • Ubuntu 24.04 Base Image: Upgraded container base image for improved security and performance
  • Docker Workflow Improvements: Enhanced CI/CD pipeline for main branch pushes and PR builds
  • Variable-based Docker Registry: Flexible Docker repository naming via CI variables
  • GitHub Actions Updates: Bumped astral-sh/setup-uv and github/codeql-action versions

Bug Fixes

  • Fixed regression in GPU metrics processing
  • Fixed Y-axis rounding to 2 decimal places in stacked bar charts
  • Fixed typo in pyproject.toml that was breaking Docker builds
  • Removed logging of sensitive billing hourly rate
  • Cleaned up unused variables and missing declarations
  • Removed deprecated Azure and GCP pricing code

Documentation

  • Updated README with accurate tech stack and installation instructions
  • Added CLAUDE.md with development guidelines and versioning information
  • Documented minimum version requirements for GPU support

New configuration variables

  • KOA_DCGM_EXPORTER_ENDPOINT: environment variable that must be set to enable GPU monitoring
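
A minimal sketch of how a deployment script might check this variable before expecting GPU charts. Only the variable name comes from this release; the helper `dcgm_endpoint`, the assumption that the value is an HTTP(S) URL, and the example address are hypothetical.

```python
import os
from urllib.parse import urlparse


def dcgm_endpoint(env=None):
    """Return the configured DCGM Exporter URL, or None when GPU monitoring is off.

    Hypothetical helper: assumes the value is an http(s) URL.
    """
    env = os.environ if env is None else env
    value = env.get("KOA_DCGM_EXPORTER_ENDPOINT", "").strip()
    if not value:
        return None  # variable unset or empty: no GPU charts expected
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"invalid KOA_DCGM_EXPORTER_ENDPOINT: {value!r}")
    return value


# Example address is made up; 9400 is DCGM Exporter's usual default port.
example_env = {"KOA_DCGM_EXPORTER_ENDPOINT": "http://dcgm-exporter:9400/metrics"}
print(dcgm_endpoint(example_env))
print(dcgm_endpoint({}))
```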

Breaking Changes

N/A


Acknowledgements

Big kudos to @ccamel for his huge contributions throughout this milestone. 💯


For deployment instructions, see the README.