Cherry pick changes from main branch to release0.6 branch #1969

Merged
varungup90 merged 5 commits into release-0.6 from main on Mar 2, 2026

Conversation

@varungup90
Collaborator

Pull Request Description

Cherry-pick changes from the main branch to the release-0.6 branch.

Related Issues

Resolves: #[Insert issue number(s)]

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines (Expand for Details)

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

scarlet25151 and others added 5 commits February 26, 2026 23:02
* add more metrics
* pass label from the engine side
* chore: add grafana dashboard
* refactor the metrics to only expose one public method

Signed-off-by: chenyu.jiang <chenyu.jiang@bytedance.com>
Co-authored-by: chenyu.jiang <chenyu.jiang@bytedance.com>
Signed-off-by: varungupta <varungup90@gmail.com>
* Feat: Support vllm new kvevent format

Signed-off-by: qiupengfei <qiupengfei@baidu.com>
Change-Id: I5aafd7fbb42c72e62ba6ac06065a790e46c3e9f9

* Fix: isSamePod is wrong & incr RouterInit timeout

Signed-off-by: qiupengfei <qiupengfei@baidu.com>
Change-Id: Ia495ac8f352d0e92872b5d1847c4ea6330909c6c

* Fix: gemini cr suggestion & unittest

Signed-off-by: qiupengfei <qiupengfei@baidu.com>
Change-Id: I7bf72cfaa964beb4729db943111db0113ecac184

* Fix: gofmt error

Signed-off-by: qiupengfei <qiupengfei@baidu.com>
Change-Id: I979001891d1bf506afd3699911e50ad046caf878

* Fix: parseBlockHashToInt64 should handle float64 and float32

Signed-off-by: qiupengfei <qiupengfei@baidu.com>
Change-Id: I26a889bba182538bd2692c81301047f8b881b2ea

---------

Signed-off-by: qiupengfei <qiupengfei@baidu.com>
Co-authored-by: qiupengfei <qiupengfei@baidu.com>
Signed-off-by: chenyu.jiang <chenyu.jiang@bytedance.com>
Signed-off-by: varungupta <varungup90@gmail.com>
@varungup90 varungup90 self-assigned this Mar 2, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces several key enhancements to the aibrix platform. It focuses on improving model configuration flexibility, enhancing metric collection, ensuring compatibility with future vLLM changes, and refining PD disaggregation logic. These changes collectively contribute to a more robust, efficient, and adaptable system.

Highlights

  • Model Config Profiles: Introduces a new design for supplying model/gateway configuration via a single annotation with support for multiple named profiles selectable at runtime.
  • PromQL Worker: Implements a PromQL worker to handle Prometheus-based metrics, improving metric collection and preventing slow queries from affecting the main path.
  • KV Cache Event Decoding: Updates the KV cache event decoding to support both legacy int64 and new bytes formats for block hashes from vLLM, ensuring compatibility with future vLLM changes.
  • Metrics Emission: Refactors metrics emission to use a consistent method with labels and routing context, enhancing metric accuracy and flexibility.
  • PD Disaggregation: Enhances PD disaggregation logic to leverage model config profiles for prompt length bucketing and combined pod selection.
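
The KV Cache Event Decoding highlight, together with the commit note that parseBlockHashToInt64 should handle float64 and float32, might look roughly like the sketch below. This is not the code from msgpack_decoder.go: the helper name blockHashToInt64 is hypothetical, and folding a bytes hash into int64 by reading its last 8 bytes as big-endian is purely an assumption made for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// blockHashToInt64 is a hypothetical helper (not the aibrix implementation).
// It normalizes a msgpack-decoded block hash to int64, accepting the legacy
// integer form, floats produced by generic decoders, and a raw bytes form.
func blockHashToInt64(v interface{}) (int64, error) {
	switch h := v.(type) {
	case int64:
		return h, nil
	case uint64:
		return int64(h), nil
	case float64:
		return int64(h), nil
	case float32:
		return int64(h), nil
	case []byte:
		if len(h) < 8 {
			return 0, fmt.Errorf("block hash too short: %d bytes", len(h))
		}
		// Assumption: keep the last 8 bytes, interpreted as big-endian.
		return int64(binary.BigEndian.Uint64(h[len(h)-8:])), nil
	default:
		return 0, fmt.Errorf("unsupported block hash type %T", v)
	}
}

func main() {
	legacy, _ := blockHashToInt64(int64(42))
	fromBytes, _ := blockHashToInt64([]byte{0, 0, 0, 0, 0, 0, 0, 7})
	fmt.Println(legacy, fromBytes)
}
```

Normalizing at the decoder boundary is one way to let downstream handlers keep working with int64 only, which matches the changelog note that block hashes are already int64 after msgpack decoding.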

Changelog
  • development/app/config/mock/config-profile.yaml
    • Added a mock config-profile.yaml for testing model config profiles.
  • development/app/config/mock/kustomization.yaml
    • Modified kustomization.yaml to include the new config-profile.yaml.
  • docs/source/designs/model-config-profiles.rst
    • Added documentation for the model config and profiles design.
  • pkg/cache/cache_init.go
    • Modified cache_init.go to include a Prometheus event queue and initialize the PromQL worker.
  • pkg/cache/cache_metrics.go
    • Modified cache_metrics.go to add Prometheus query intervals, timeouts, and per-stage request latency metrics.
  • pkg/cache/cache_metrics_test.go
    • Modified cache_metrics_test.go to update metric emission tests with routing context.
  • pkg/cache/kvcache/event_types.go
    • Modified event_types.go to include notes on block hash conversion and support for both old and new formats.
  • pkg/cache/kvcache/msgpack_decoder.go
    • Modified msgpack_decoder.go to handle data parallel rank and support both legacy int64 and new bytes formats for block hashes.
  • pkg/cache/kvcache/msgpack_decoder_test.go
    • Modified msgpack_decoder_test.go to add tests for decoding block hashes in bytes format.
  • pkg/cache/kvcache/msgpack_encoder.go
    • Modified msgpack_encoder.go to specify parent_block_hash as nullable.
  • pkg/cache/kvcache/zmq_client.go
    • Modified zmq_client.go to send replay requests as multipart messages.
  • pkg/cache/utils.go
    • Modified utils.go to add functions for merging label pairs and handling metric labels.
  • pkg/cache/utils_test.go
    • Added utils_test.go to test label merging.
  • pkg/constants/model.go
    • Modified model.go to add ModelAnnoConfig for JSON model config with multiple profiles.
  • pkg/kvevent/handler.go
    • Modified handler.go to note that block hashes are already int64 after msgpack decoding.
  • pkg/kvevent/handler_test.go
    • Modified handler_test.go to use int32SliceToBytes for token IDs.
  • pkg/kvevent/integration_test.go
    • Modified integration_test.go to use int32SliceToBytes for token IDs.
  • pkg/kvevent/manager.go
    • Modified manager.go to correct pod comparison logic.
  • pkg/metrics/custom_metrics.go
    • Modified custom_metrics.go to refactor metric emission and add gateway pod name.
  • pkg/metrics/engine_fetcher.go
    • Modified engine_fetcher.go to emit metrics using EmitMetricToPrometheus and handle simple values.
  • pkg/metrics/metrics.go
    • Modified metrics.go to add per-stage request latency metrics.
  • pkg/metrics/types.go
    • Modified types.go to add labels to metric values and support Prometheus results.
  • pkg/metrics/utils.go
    • Modified utils.go to get counter/gauge values and histogram values.
  • pkg/plugins/gateway/algorithms/least_kv_cache.go
    • Modified least_kv_cache.go to use KVCacheUsagePerc instead of GPUCacheUsagePerc.
  • pkg/plugins/gateway/algorithms/least_kv_cache_test.go
    • Modified least_kv_cache_test.go to use KVCacheUsagePerc instead of GPUCacheUsagePerc.
  • pkg/plugins/gateway/algorithms/pd_disaggregation.go
    • Modified pd_disaggregation.go to leverage model config profiles for prompt length bucketing and combined pod selection.
  • pkg/plugins/gateway/algorithms/pd_disaggregation_test.go
    • Modified pd_disaggregation_test.go to use model config annotations and test prompt length bucketing.
  • pkg/plugins/gateway/algorithms/prefix_cache.go
    • Modified prefix_cache.go to change log level to V(4).
  • pkg/plugins/gateway/algorithms/router.go
    • Modified router.go to increase router initialization timeout.
  • pkg/plugins/gateway/algorithms/throughput.go
    • Modified throughput.go to use AvgPromptToksPerReq and AvgGenerationToksPerReq.
  • pkg/plugins/gateway/algorithms/throughput_test.go
    • Modified throughput_test.go to use AvgPromptToksPerReq and AvgGenerationToksPerReq.
  • pkg/plugins/gateway/configprofiles/configprofiles.go
    • Added configprofiles.go to handle model config profiles.
  • pkg/plugins/gateway/configprofiles/configprofiles_test.go
    • Added configprofiles_test.go to test model config profiles.
  • pkg/plugins/gateway/gateway.go
    • Modified gateway.go to add gateway request total metric and pod name.
  • pkg/plugins/gateway/gateway_req_body.go
    • Modified gateway_req_body.go to resolve model config profiles and derive routing strategies.
  • pkg/plugins/gateway/gateway_req_body_test.go
    • Modified gateway_req_body_test.go to validate routing strategy and request body handling.
  • pkg/plugins/gateway/gateway_req_headers.go
    • Modified gateway_req_headers.go to handle request headers and config profiles.
  • pkg/plugins/gateway/gateway_rsp_body.go
    • Modified gateway_rsp_body.go to update response headers and request end helper.
  • test/e2e/routing_config_profile_test.go
    • Added routing_config_profile_test.go to test config profile routing strategy.
  • test/e2e/util.go
    • Modified util.go to add model name and config profile.
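
The changelog mentions helpers in pkg/cache/utils.go for merging label pairs. A minimal sketch of that idea follows, assuming a "later set wins" rule on key conflicts. The function name mergeLabels, the label keys, and the conflict rule are all hypothetical and are not taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeLabels is a hypothetical helper. It merges several label sets into one,
// letting later sets override earlier ones, and returns label names in sorted
// order with their values aligned by index.
func mergeLabels(sets ...map[string]string) (names, values []string) {
	merged := map[string]string{}
	for _, s := range sets {
		for k, v := range s {
			merged[k] = v
		}
	}
	names = make([]string, 0, len(merged))
	for k := range merged {
		names = append(names, k)
	}
	// Sorting keeps the name/value order stable across calls.
	sort.Strings(names)
	values = make([]string, len(names))
	for i, k := range names {
		values[i] = merged[k]
	}
	return names, values
}

func main() {
	// Placeholder label keys and values, for illustration only.
	engine := map[string]string{"model": "m1", "pod": "pod-a"}
	routing := map[string]string{"pod": "pod-b", "strategy": "throughput"}
	names, values := mergeLabels(engine, routing)
	fmt.Println(names, values)
}
```

A deterministic name order matters when names and values are passed separately to a metrics library, because the two slices must line up.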
Activity
  • Introduced model config profiles for flexible routing.
  • Implemented PromQL worker for efficient metric collection.
  • Updated KV cache event decoding for vLLM compatibility.
  • Refactored metrics emission for consistency and accuracy.
  • Enhanced PD disaggregation with config profiles.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
--- | --- | ---
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request cherry-picks a large set of changes, primarily introducing a new "Model Config and Profiles" feature. This is a significant architectural improvement, moving model-specific configurations from scattered pod labels to a centralized JSON annotation. The implementation is well-documented and integrated across the gateway and routing logic. The PR also includes several other valuable changes, such as a new worker queue for Prometheus queries to improve stability, and a major refactoring of how metrics are emitted. I have identified a couple of issues that need attention: a hardcoded secret in a mock configuration and a potential logic change in the throughput router that should be clarified.
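
To illustrate the "centralized JSON annotation" idea described in the review, here is a minimal sketch of resolving a named profile with a fallback to a default. The JSON shape, the field names (defaultProfile, profiles, routingStrategy), the strategy values, and the resolveProfile function are all assumptions for illustration. The real schema lives in docs/source/designs/model-config-profiles.rst and pkg/plugins/gateway/configprofiles and may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Profile and ModelConfig are hypothetical shapes, not the aibrix schema.
type Profile struct {
	RoutingStrategy string `json:"routingStrategy"`
}

type ModelConfig struct {
	DefaultProfile string             `json:"defaultProfile"`
	Profiles       map[string]Profile `json:"profiles"`
}

// resolveProfile parses the annotation value and returns the requested
// profile, falling back to the configured default when the name is unknown.
func resolveProfile(annotation, requested string) (Profile, error) {
	var cfg ModelConfig
	if err := json.Unmarshal([]byte(annotation), &cfg); err != nil {
		return Profile{}, fmt.Errorf("invalid model config: %w", err)
	}
	if p, ok := cfg.Profiles[requested]; ok {
		return p, nil
	}
	if p, ok := cfg.Profiles[cfg.DefaultProfile]; ok {
		return p, nil
	}
	return Profile{}, fmt.Errorf("no profile %q and no usable default", requested)
}

func main() {
	// Placeholder annotation value; the strategy names are illustrative.
	annotation := `{
		"defaultProfile": "default",
		"profiles": {
			"default": {"routingStrategy": "throughput"},
			"cached":  {"routingStrategy": "prefix-cache"}
		}
	}`
	chosen, _ := resolveProfile(annotation, "cached")
	fallback, _ := resolveProfile(annotation, "unknown")
	fmt.Println(chosen.RoutingStrategy, fallback.RoutingStrategy)
}
```

Keeping every profile in one annotation means a request can switch behavior by naming a profile at runtime, without any change to pod labels.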

@varungup90 varungup90 merged commit 413bdd6 into release-0.6 Mar 2, 2026
33 of 34 checks passed

3 participants