v0.2.0

@kfswain released this 13 Mar 21:31
· 60 commits to main since this release

What's Changed

  • Revert "Replace EndpointSlice reconciler with pod list backed by informers" by @kfswain in #301
  • Fixing small linter complaints by @kfswain in #302
  • In hermetic test, add additional test cases and move k8sClient object creation so it's called once for all tests by @BenjaminBraunDev in #278
  • [Metrics] Add average kv cache and waiting queue size metrics for inference pool by @JeffLuoo in #304
  • Move getting started guide to docs site by @kfswain in #308
  • site-source: Fix 'Bakcground' misspell in API concepts page by @timflannagan in #309
  • Mkdocs fixes by @kfswain in #314
  • Bump google.golang.org/protobuf from 1.36.4 to 1.36.5 by @dependabot in #315
  • Remove gci linter by @ahg-g in #317
  • fix: adds ErrorNotFound Handling for InferenceModel Reconciler by @danehans in #286
  • site-src: Replace k8sgateway with kgateway & fix spelling in roles-and-personas.md by @timflannagan in #311
  • Fix: Go Mod Imports by @danehans in #318
  • Updates EPP Deployment and Release Doc/Script by @danehans in #322
  • Delete InferenceModels from the datastore when deletionTimestamp is set by @ahg-g in #319
  • Actually init logging using Zap by @tchap in #267
  • Remove fatal log calls in executable code by @tchap in #265
  • feat: Adds e2e test script by @danehans in #294
  • Replacing endpointSlice Reconciler with a direct Pod Reconciler by @kfswain in #300
  • Move manager from runserver to main by @tchap in #331
  • feat: adds image-load and kind-load Make targets by @danehans in #288
  • Use structured logging by @tchap in #330
  • Add TLS support with self-signed certificate. by @ahg-g in #335
  • Lora syncer docs by @coolkp in #320
  • Fix cloudbuild rule for the LoRA syncer image by @ahg-g in #339
  • fix: Corrects release branch naming by @danehans in #333
  • Use contextual logging by @tchap in #337
  • Bump the kubernetes group with 6 updates by @dependabot in #351
  • Bump sigs.k8s.io/controller-runtime from 0.20.1 to 0.20.2 by @dependabot in #352
  • Fixes to the adapter rollouts guide by @ahg-g in #338
  • Consolidating all storage behind datastore by @ahg-g in #350
  • fixed a typo - close a bash markdown by @nirrozenbaum in #364
  • Added controller and datastore package by @hzxuzhonghu in #363
  • Move pkg/ext-proc -> cmd/ext-proc by @tchap in #368
  • added license header to all .go files by @nirrozenbaum in #370
  • fix inference extension not correctly scrape pod metrics by @Kuromesi in #366
  • Move pkg/manifests -> config/manifests by @tchap in #371
  • [Metrics] Add request error metrics by @JeffLuoo in #269
  • Rename pkg/ext-proc to pkg/epp by @tchap in #372
  • Move pkg/ext-proc/metrics/README.md -> site-src/guides/metrics.md by @courageJ in #373
  • Defining an outer metadata struct as part of the extproc endpoint picking protocol by @ahg-g in #377
  • Draft a revised README.md by @smarterclayton in #374
  • Add README.md file to the epp pkg by @ahg-g in #386
  • Split the proxy and model server protocols for easy reference by @ahg-g in #387
  • [Metric] Add inference pool and request error metrics to the dashboard by @JeffLuoo in #389
  • Switch to gcr.io/distroless/static:nonroot base image by @ahg-g in #384
  • fix context canceled recv error handling by @Kuromesi in #390
  • Added endpoint picker diagram by @ahg-g in #396
  • Added v1alpha2 api by @hzxuzhonghu in #398
  • Adding a roadmap to README by @kfswain in #400
  • Bump github.com/prometheus/client_golang from 1.20.5 to 1.21.0 by @dependabot in #402
  • Bump github.com/google/go-cmp from 0.6.0 to 0.7.0 by @dependabot in #403
  • updated logging in inferencepool reconciler by @nirrozenbaum in #399
  • added inferencemodel predicate + minor changes in logging by @nirrozenbaum in #397
  • Syncing getting started guide all to main by @kfswain in #410
  • fixed typo in filepath in website guide page by @nirrozenbaum in #412
  • Fix InferenceModel deletion logic by @ahg-g in #393
  • Updated yamls to use v1alpha2 by @ahg-g in #420
  • Rm v1alpha1 api by @hzxuzhonghu in #405
  • removed the EndpointPickerNotHealthy condition from pool status by @ahg-g in #421
  • [Metrics] Add metrics validation in integration test by @JeffLuoo in #413
  • predicate follow up PR to remove the check from Reconcile func by @nirrozenbaum in #418
  • Misc cleanup by @hzxuzhonghu in #428
  • fix metric scrape port not updated when inference pool target port updated by @Kuromesi in #417
  • make ModelName immutable and fix model weight by @hzxuzhonghu in #427
  • Consistent validation for reference types by @robscott in #430
  • create pods during integration tests by @Kuromesi in #431
  • fix typos by @nirrozenbaum in #433
  • Adding Accepted and ResolvedRefs conditions to InferencePool by @robscott in #446
  • Add code for Envoy extension that supports body-to-header translation by @rramkumar1 in #355
  • Add Makefile + cloudbuild configs for body-based routing extension by @rramkumar1 in #442
  • added cpu based example by @nirrozenbaum in #436
  • updated cleanup section in quickstart by @nirrozenbaum in #448
  • scheduling changes for lora affinity load balancing by @kaushikmitr in #423
  • Fixing default status on InferencePool by @robscott in #449
  • Use server side namespace filter by @hzxuzhonghu in #429
  • fixed filepath that points to gpu based model server deployment in few places by @nirrozenbaum in #451
  • Add library for generating self-signed cert by @rramkumar1 in #453
  • Support full duplex streaming by @kfswain in #450
  • Renaming conditions and reasons used in InferencePool status by @robscott in #454
  • Move integration and e2e tests for epp into epp-specific directories by @rramkumar1 in #457
  • Add initial integration test for body-based routing extension by @rramkumar1 in #458
  • Each pod has independent loops to refresh metrics by @liu-cong in #460
  • fixed broken link in README by @nirrozenbaum in #467
  • fixed minimal requirement for envoy version by @nirrozenbaum in #466
  • Bump github.com/onsi/ginkgo/v2 from 2.22.2 to 2.23.0 by @dependabot in #473
  • Bump sigs.k8s.io/controller-runtime from 0.20.2 to 0.20.3 by @dependabot in #470
  • Bump google.golang.org/grpc from 1.70.0 to 1.71.0 by @dependabot in #471
  • Bump github.com/prometheus/client_golang from 1.21.0 to 1.21.1 by @dependabot in #474
  • Bump sigs.k8s.io/structured-merge-diff/v4 from 4.5.0 to 4.6.0 by @dependabot in #472
  • [BBR] Fix bug where request trailers were not being handled by @rramkumar1 in #477
  • Add the base model to InferenceModel sample manifest by @liu-cong in #479
  • Fix metrics debug log; change metrics client log level to reduce spam by @liu-cong in #478
  • Add support for OpenAI API streaming protocol by @kfswain in #469

Full Changelog: v0.1.0...v0.2.0