OptimizationJob CRD for HPO: The Evolution of the Katib Project #2605

@akshaychitneni

This issue tracks the work to add an OptimizationJob CRD, as discussed in https://github.com/kubeflow/sdk/tree/main/docs/proposals/46-hyperparameter-optimization#potential-api-for-optimizationjob-crd. The CRD focuses on hyperparameter optimization (HPO) for TrainJobs.

It should include:

  • CRD specifically for hyperparameter optimization of TrainJobs
  • Integration with TrainJob API
  • Support for model/dataset initialization shared across trials
  • Support for push-based metrics collection via the SDK
  • Integration with SDK's OptimizerClient API
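
To make the list above more concrete, here is a rough sketch of how such a resource might be assembled as a plain Python dict. It is not the proposed API: the `apiVersion` value is a placeholder, and every field name (`objective`, `searchSpace`, `maxTrials`, `initializer`, `trialTemplate`) as well as the helper `build_optimization_job` is a hypothetical chosen for illustration only. The actual schema is whatever the linked proposal and design document settle on.

```python
import json


def build_optimization_job(name, trial_template, search_space,
                           objective_metric, max_trials, initializer=None):
    """Return a hypothetical OptimizationJob manifest as a plain dict.

    All field names are assumptions made for this sketch, not the real CRD.
    """
    if max_trials < 1:
        raise ValueError("max_trials must be at least 1")
    spec = {
        # Metric that trials would report (push-based) and how to compare it.
        "objective": {"metric": objective_metric, "direction": "minimize"},
        # Hyperparameters to search over, keyed by parameter name.
        "searchSpace": search_space,
        "maxTrials": max_trials,
        # Template for the TrainJob that each trial would create.
        "trialTemplate": trial_template,
    }
    if initializer is not None:
        # Model/dataset initialization shared across all trials.
        spec["initializer"] = initializer
    return {
        "apiVersion": "example.invalid/v1alpha1",  # placeholder, not the real group
        "kind": "OptimizationJob",
        "metadata": {"name": name},
        "spec": spec,
    }


manifest = build_optimization_job(
    name="hpo-example",
    trial_template={"kind": "TrainJob", "spec": {}},  # left empty on purpose
    search_space={"learning_rate": {"min": 1e-5, "max": 1e-2}},
    objective_metric="loss",
    max_trials=10,
    initializer={"dataset": {}, "model": {}},  # hypothetical shape
)
print(json.dumps(manifest, indent=2))
```

Keeping the trial template and the shared initializer as separate top-level fields in the sketch mirrors two of the items above: each trial maps to one TrainJob, while model and dataset initialization would happen once and be reused.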

Design Document

https://docs.google.com/document/d/1Y8IJ-UdZ7VCEAlax_xEFbbqEi7EB6SfIX4D7ua-xn4M/edit

cc @andreyvelich @kramaranya
