This is the Katib v0.18.0 release.
The key highlights:
- KEP-2339: Hyperparameter optimization for LLM fine-tuning.
- KEP-2340: Support for push-based metrics collection.
- KEP-2374: Advanced parameter distributions, such as uniform, log-uniform, normal, and log-normal.
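
To illustrate how the four distributions named in KEP-2374 differ, here is a minimal standard-library sketch that samples a value such as a learning rate from a bounded range. It is not Katib's implementation or API: the `sample` function, the distribution name strings, and the choice of mean and standard deviation for the normal variants (midpoint of the range, one sixth of its width) are assumptions made only for this example.

```python
import math
import random


def _clip(value, low, high):
    """Keep a sampled value inside the closed interval [low, high]."""
    return min(max(value, low), high)


def sample(distribution, low, high, rng):
    """Draw one value from [low, high] under the named distribution.

    Hypothetical helper for illustration; the names and the normal
    parameters below are assumptions, not Katib behaviour.
    """
    if distribution == "uniform":
        # Every value in the range is equally likely.
        value = rng.uniform(low, high)
    elif distribution == "log-uniform":
        # Uniform in log space: each order of magnitude is equally likely.
        value = math.exp(rng.uniform(math.log(low), math.log(high)))
    elif distribution == "normal":
        # Assumed: centred on the midpoint, range spans about six sigma.
        mean = (low + high) / 2
        std = (high - low) / 6
        value = rng.gauss(mean, std)
    elif distribution == "log-normal":
        # Same assumption as "normal", applied to the logarithm of the value.
        log_low, log_high = math.log(low), math.log(high)
        mean = (log_low + log_high) / 2
        std = (log_high - log_low) / 6
        value = math.exp(rng.gauss(mean, std))
    else:
        raise ValueError(f"unknown distribution: {distribution}")
    # Clipping guards against tails and floating-point round-off.
    return _clip(value, low, high)


if __name__ == "__main__":
    rng = random.Random(0)
    for name in ("uniform", "log-uniform", "normal", "log-normal"):
        print(name, sample(name, 1e-3, 1e-1, rng))
```

The log-scaled variants require `low > 0`. They tend to suit parameters like learning rates, where candidate values span several orders of magnitude.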
Breaking Changes
- Move Katib manifest image references to ghcr (#2535 by @saileshd1402)
- Migrate docker images to ghcr (#2531 by @mahdikhashan)
- Upgrade Kubernetes to v1.31.3 (#2478 by @Electronic-Waste)
- Upgrade Kubernetes to v1.30.7 (#2463 by @Electronic-Waste)
- Drop Python 3.7 and Support Python 3.11 in the SDK (#2337 by @tenzen-y)
New Features
Hyperparameter Optimization for LLMs
- [DOCS] move llm hyperparameter optimisation design image to the proposal directory and rename it (#2472 by @mahdikhashan)
- [GSoC] Update `tune` API for LLM hyperparameters optimization (#2393 by @helenxie-bit)
- [GSoC] Create LLM Hyperparameters Optimization API Proposal (#2333 by @helenxie-bit)
Support for Advanced Distributions for HPO
- [GSOC] `optuna` suggestion service logic update (#2446 by @shashank-iitbhu)
- [GSOC] `hyperopt` suggestion service logic update (#2412 by @shashank-iitbhu)
- [GSOC] Add validator for feasible space distribution (#2404 by @shashank-iitbhu)
- [GSOC] added Unknown distribution and convertDistribution in suggestion client (#2403 by @shashank-iitbhu)
- [GSOC] Support for various Parameter distributions in Katib (#2334 by @shashank-iitbhu)
- [GSoC] Added `DistributionType` to Experiment API (#2377 by @shashank-iitbhu)
Push-based Metrics Collector
- [GSoC] Provide a PyTorch MNIST Example for Push-based Metrics Collection (#2437 by @Electronic-Waste)
- [GSoC] Compatibility Changes in Trial Controller (#2394 by @Electronic-Waste)
- [GSoC] New Interface `report_metrics` in Python SDK (#2371 by @Electronic-Waste)
- [GSoC] KEP for Project 6: Push-based Metrics Collection for Katib (#2328 by @Electronic-Waste)
- [GSoC] Add New Parameter in `tune` (#2369 by @Electronic-Waste)
SDK Updates
- [SDK] Support PyTorchJob as a Trial Worker (#2512 by @andreyvelich)
- [SDK] test: Add e2e test for tune function. (#2399 by @Electronic-Waste)
- [SDK] improve PVC creation name error (#2496 by @mahdikhashan)
- [SDK] Fix empty list for env variables and numpy version (#2360 by @andreyvelich)
- [SDK] Explain Python version support cycle (#2354 by @andreyvelich)
Bug Fixes
- fix(webhook): fix validation message in experiment webhook (#2507 by @Electronic-Waste)
- Install typing-extensions v4.10.0 to fix Python test error (#2504 by @helenxie-bit)
- [SDK] Update `tune` API (#2497 by @helenxie-bit)
- fix(api): resolve all api voilation exceptions in katib api (#2482 by @truc0)
- fix(trial): use propagated gomega to improve debuggability. (#2432 by @Electronic-Waste)
- fix(ui): update None Collector with Push Collector. (#2418 by @Electronic-Waste)
- fix: Resolve errors in e2e tests for cypress in Katib UI (#2384 by @tariq-hasan)
- doc(example): fix the broken link. (#2433 by @Electronic-Waste)
- fix: remove remaining MXNet dependency. (#2456 by @Electronic-Waste)
- Remove Dropout layer from ENAS Trial container to fix E2E tests (#2455 by @andreyvelich)
- [SDK] fix grpc related bugs in Python SDK (#2398 by @Electronic-Waste)
- [SDK] Fix types error (#2424 by @helenxie-bit)
- fix: remove the dependency of `protocmp` in `google.golang.org/protobuf/testing/protocmp`. (#2391 by @Electronic-Waste)
- Fix TestReconcileBatchJob (#2350 by @forsaken628)
- Fix apple silicon rosetta error when building images from the source code (#2342 by @helenxie-bit)
- fix katib use crds token pipeline trail template guide (#2330 by @Jerry-yz)
- Fix Scikit-Learn Version for Skopt Tests (#2336 by @andreyvelich)
Misc
- Support old-style TensorFlow events (tensorboard) (#2517 by @garymm)
- Set experiment names at a max of 40 characters. (#2468 by @AydanPirani)
- [CI] optimize katib ui dockerfile (#2505 by @mahdikhashan)
- Sort experiments by descending creation date by default in katib-ui (#2498 by @Doris-xm)
- [GSoC] Add unit tests for `tune` API (#2423 by @helenxie-bit)
- Update MutatingWebhookConfiguration: Switch from objectSelector to AdmissionWebhookMatchConditions (#2241 by @lianghao208)
- chore: supporting the listen-address parameter on db-manager (#2465 by @caiofralmeida)
- Upgrade klog to v2 (#2470 by @Doris-xm)
- Ignore cache exporting errors in the image building workflows (#2487 by @Doris-xm)
- Upgrade grpcio version to v1.64.1 (#2483 by @Electronic-Waste)
- docs: remove katib workflow (#2443 by @gonmmarques)
- Migrate KatibCertGenerator to OPA CertController (#2345 by @forsaken628)
- Promote @Electronic-Waste and @helenxie-bit as Katib reviewers (#2439 by @andreyvelich)
- Update README and out-of-date docs (#2438 by @andreyvelich)
- Changes isort profile to black, to be fully compatible and adds 'pkg' dir for black and flake8 (#2413 by @Ygnas)
- Introduced error constants and replaced reflect with cmp (#2289 by @tariq-hasan)
- [Test] Refactor `inject_webhook_test.go` according to the Developer Guide (#2401 by @Electronic-Waste)
- Enhance pre-commit hooks with flake8 and black (#2407 by @Ygnas)
- added `Distribution` field to feasibleSpace in `api.proto` (#2397 by @shashank-iitbhu)
- Begin enabling pre-commit hooks (#2242 by @droctothorpe)
- Update Instructions for Argo Workflows (#2382 by @jaffe-fly)
- docs: update suggestion.md (#2387 by @eltociear)
- Add command to re-run GitHub Actions tests (#2385 by @andreyvelich)
- Bump Katib Python SDK to 0.17.0 version (#2379 by @andreyvelich)
- Add Changelog for Katib v0.17.0 (#2380 by @andreyvelich)
- Replaced hpcloud with nxadm for tail package in Go (#2375 by @tariq-hasan)
- Use ErrorList for experiment validator (#2329 by @ckcd)
- Add Changelog for Katib v0.17.0-rc.1 (#2370 by @andreyvelich)
- Remove default caBundle value (#2368 by @vihangm)
- Bump Katib Python SDK to 0.17.0rc1 version (#2365 by @andreyvelich)
- Add unit test for `create_experiment` in the `katib_client` module (#2325 by @tariq-hasan)
- Remove code generation from release script (#2363 by @andreyvelich)
- Upgrade the protobuf version to >=4.21.12,<5 (#2358 by @tenzen-y)
- Replace gRPC code generation tool from Znly/protoc to Buf (#2344 by @forsaken628)
- Replace already closed github.com/golang/mock with go.uber.org/mock (#2357 by @forsaken628)
- Use cache-dependency-path in actions/setup-go for CI workflow (#2355 by @forsaken628)
- Update Slack Invitation (#2349 by @andreyvelich)
- Update GitHub template to better triage Issues (#2335 by @andreyvelich)
- Add Changelog for Katib v0.17.0-rc.0 (#2319 by @andreyvelich)
- Update outdated actions (#2324 by @Mersho)
- Make test fields private in Go unit tests (#2316 by @tariq-hasan)
- Bump Katib Python SDK to 0.17.0rc0 Version (#2318 by @andreyvelich)
New Contributors
- @saileshd1402 made their first contribution in #2535
- @mahdikhashan made their first contribution in #2496
- @helenxie-bit made their first contribution in #2333
- @shashank-iitbhu made their first contribution in #2334
- @truc0 made their first contribution in #2482
- @tariq-hasan made their first contribution in #2289
- @Ygnas made their first contribution in #2407
- @jaffe-fly made their first contribution in #2382
- @eltociear made their first contribution in #2387
- @vihangm made their first contribution in #2368