Feat/llm cost user policy#1359
Conversation
|
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
Walkthrough

Adds llm-cost and llm-cost-based-ratelimit policies, pricing configuration and assets, runtime/build changes to include llm-pricing, many mock LLM OpenAPI paths, extensive integration tests for cost and budget enforcement, and small test-step and formatting tweaks.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Gateway as Gateway
    participant Provider as LLM Provider
    participant PolicyEngine as Policy Engine
    participant PricingDB as Pricing DB
    participant RateLimiter as Rate Limiter
    Client->>Gateway: Send LLM request
    Gateway->>Provider: Forward request
    Provider-->>Gateway: Response (model, usage)
    Gateway->>PolicyEngine: Run llm-cost policy (response)
    PolicyEngine->>PricingDB: Lookup model pricing
    PricingDB-->>PolicyEngine: Return rates
    PolicyEngine->>PolicyEngine: Compute USD cost
    PolicyEngine-->>Gateway: Inject `x-llm-cost` header
    Gateway->>PolicyEngine: Run llm-cost-based-ratelimit (pass cost)
    PolicyEngine->>RateLimiter: Check/deduct budget
    alt Budget available
        RateLimiter-->>PolicyEngine: Allow
        PolicyEngine-->>Gateway: Proceed
        Gateway-->>Client: 200 + cost headers
    else Budget exhausted
        RateLimiter-->>PolicyEngine: Reject
        PolicyEngine-->>Gateway: Deny (429)
        Gateway-->>Client: 429 Too Many Requests
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
Actionable comments posted: 3
🧹 Nitpick comments (3)
gateway/gateway-runtime/Makefile (1)
39-39: Deduplicate repeated target-context setup to reduce drift risk. The same prep command is repeated in 4 targets. Consider extracting it into a reusable make target/macro so future path changes stay consistent.
♻️ Suggested refactor
```diff
+.PHONY: prepare-target-context
+
+prepare-target-context:
+	@mkdir -p target/configs
+	@cp ../build.yaml target/
+	@cp -r ../configs/llm-pricing target/configs/
+
 build: ## Build Gateway Runtime Docker image using buildx
 	@echo "Building Gateway Runtime Docker image ($(IMAGE_NAME):$(VERSION))..."
-	@mkdir -p target/configs && cp ../build.yaml target/ && cp -r ../configs/llm-pricing target/configs/
+	@$(MAKE) prepare-target-context
 	docker buildx build -f Dockerfile \
 	...
 build-debug: ## Build debug Gateway Runtime image for remote debugging with dlv (VS Code attach on port 2346)
 	@echo "Building Gateway Runtime debug image ($(IMAGE_NAME)-debug:$(VERSION))..."
-	@mkdir -p target/configs && cp ../build.yaml target/ && cp -r ../configs/llm-pricing target/configs/
+	@$(MAKE) prepare-target-context
 	...
 build-coverage-image: test ## Build Gateway Runtime Docker image with coverage instrumentation
 	@echo "Building Gateway Runtime coverage image ($(IMAGE_NAME)-coverage:$(VERSION))..."
-	@mkdir -p target/configs && cp ../build.yaml target/ && cp -r ../configs/llm-pricing target/configs/
+	@$(MAKE) prepare-target-context
 	...
 build-and-push-multiarch: ## Build and push Gateway Runtime Docker image for multiple architectures (amd64, arm64)
 	@echo "Building and pushing multi-arch Gateway Runtime Docker image ($(VERSION))..."
-	@mkdir -p target/configs && cp ../build.yaml target/ && cp -r ../configs/llm-pricing target/configs/
+	@$(MAKE) prepare-target-context
```

Also applies to: 62-62, 85-85, 109-109
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/gateway-runtime/Makefile` at line 39, The repeated prep command (mkdir -p target/configs && cp ../build.yaml target/ && cp -r ../configs/llm-pricing target/configs/) should be extracted into a single reusable Makefile target or macro (e.g., a .PHONY target named prepare-configs or a make variable) and then referenced as a dependency or invoked from the four targets that currently duplicate the line; update each target to depend on or call prepare-configs instead of repeating the shell sequence, ensure the new target performs the same mkdir/cp steps and preserves exit behavior so build ordering and error handling remain unchanged.

gateway/it/mock-api/paths/unknown-llm/chat.yaml (1)
8-16: Make `model` and `messages` required in this endpoint to avoid scenario overlap. Right now this payload can also represent "missing model" cases, which dilutes the specific unknown-model path this mock is intended to cover.
♻️ Proposed schema tightening
```diff
       schema:
         type: object
+        required:
+          - model
+          - messages
         properties:
           model:
             type: string
           messages:
             type: array
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/it/mock-api/paths/unknown-llm/chat.yaml` around lines 8 - 16, The schema currently allows payloads missing fields; update the OpenAPI schema under the object schema to require both properties by adding a required array that lists "model" and "messages" so the endpoint always expects those fields (ensure the existing properties for model and messages remain unchanged and that "messages" stays typed as an array of objects).

gateway/it/features/llm-cost-combination.feature (1)
104-109: Assert the budget headers on rejected requests. Both scenarios only check `429`, so regressions in `x-ratelimit-cost-limit-dollars` and `x-ratelimit-cost-remaining-dollars` would still pass even though they are part of the rejection contract described for this policy pair.

Suggested coverage tweak
```diff
@@
       Then the response status code should be 429
+      And the response header "x-ratelimit-cost-limit-dollars" should exist
+      And the response header "x-ratelimit-cost-remaining-dollars" should exist
@@
       Then the response status code should be 429
+      And the response header "x-ratelimit-cost-limit-dollars" should exist
+      And the response header "x-ratelimit-cost-remaining-dollars" should exist
```

Also applies to: 186-191
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/it/features/llm-cost-combination.feature` around lines 104 - 109, Update the feature assertions for the POST to "http://localhost:8080/combo-openai/openai/v1/chat/completions" (and the other scenario at lines 186-191) to also validate the rejection headers: assert the response contains headers "x-ratelimit-cost-limit-dollars" and "x-ratelimit-cost-remaining-dollars", verify the limit header is present and numeric, and verify the remaining header is present and represents a value <= 0 (or equals "0" if your policy uses exact zero), so the 429 test covers the full rejection contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@gateway/gateway-controller/default-policies/llm-cost-based-ratelimit.yaml`:
- Around line 35-40: The duration pattern currently allows a leading + or -
which permits signed durations like "-1h"; update the regex for the duration
field in the llm-cost-based-ratelimit policy to disallow any sign prefix so only
non-negative Go duration strings are accepted (i.e. remove the leading "[-+]?"
from the pattern used by the duration property). Ensure the pattern still
matches valid Go duration units (ns, us, µs, ms, s, m, h) and the same numeric
formats used previously.
- Around line 11-16: The schema currently allows an empty params object because
budgetLimits is optional; update the parameters schema to make budgetLimits
mandatory by adding a required array that includes "budgetLimits" (i.e., add
required: ["budgetLimits"] alongside the existing
type/additionalProperties/properties) so a missing budgetLimits fails validation
rather than becoming a silent no-op; reference the parameters -> properties ->
budgetLimits entries when making this change.
- Around line 157-177: The Redis default-value bindings use camelCase keys
(keyPrefix, failureMode, connectionTimeout, readTimeout, writeTimeout) which
drift from the snake_case names used elsewhere (key_prefix, failure_mode,
connection_timeout, read_timeout, write_timeout) causing non-default config to
be ignored; update the property names in this policy so the "wso2/defaultValue"
mappings reference the snake_case identifiers (key_prefix, failure_mode,
connection_timeout, read_timeout, write_timeout) and keep the existing defaults
and enums (e.g., failureMode enum "open"/"closed") intact to restore correct
Redis config binding for distributed budget enforcement.
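The duration-pattern fix in the first comment above can be illustrated with a sign-free regex. This is a sketch, not the policy's exact pattern; it keeps the unit set accepted by Go's `time.ParseDuration` while rejecting any sign prefix:

```go
package main

import (
	"fmt"
	"regexp"
)

// Sign-free Go duration pattern: dropping the leading "[-+]?" rejects signed
// durations such as "-1h" while still accepting ns, us, µs, ms, s, m, h units.
var nonNegDuration = regexp.MustCompile(`^([0-9]+(\.[0-9]*)?(ns|us|µs|ms|s|m|h))+$`)

func main() {
	for _, d := range []string{"1h30m", "500ms", "1.5h", "-1h", "+2s"} {
		fmt.Printf("%-6s valid=%v\n", d, nonNegDuration.MatchString(d))
	}
}
```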
---
Nitpick comments:
In `@gateway/gateway-runtime/Makefile`:
- Line 39: The repeated prep command (mkdir -p target/configs && cp
../build.yaml target/ && cp -r ../configs/llm-pricing target/configs/) should be
extracted into a single reusable Makefile target or macro (e.g., a .PHONY target
named prepare-configs or a make variable) and then referenced as a dependency or
invoked from the four targets that currently duplicate the line; update each
target to depend on or call prepare-configs instead of repeating the shell
sequence, ensure the new target performs the same mkdir/cp steps and preserves
exit behavior so build ordering and error handling remain unchanged.
In `@gateway/it/features/llm-cost-combination.feature`:
- Around line 104-109: Update the feature assertions for the POST to
"http://localhost:8080/combo-openai/openai/v1/chat/completions" (and the other
scenario at lines 186-191) to also validate the rejection headers: assert the
response contains headers "x-ratelimit-cost-limit-dollars" and
"x-ratelimit-cost-remaining-dollars", verify the limit header is present and
numeric, and verify the remaining header is present and represents a value <= 0
(or equals "0" if your policy uses exact zero), so the 429 test covers the full
rejection contract.
In `@gateway/it/mock-api/paths/unknown-llm/chat.yaml`:
- Around line 8-16: The schema currently allows payloads missing fields; update
the OpenAPI schema under the object schema to require both properties by adding
a required array that lists "model" and "messages" so the endpoint always
expects those fields (ensure the existing properties for model and messages
remain unchanged and that "messages" stays typed as an array of objects).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 836c2554-009c-4196-b5ef-0b715dc89885
⛔ Files ignored due to path filters (1)
`go.work.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (38)
- .gitignore
- gateway/build-lock.yaml
- gateway/build.yaml
- gateway/configs/config-template.toml
- gateway/configs/config.toml
- gateway/configs/llm-pricing/model_prices.json
- gateway/gateway-controller/default-policies/llm-cost-based-ratelimit.yaml
- gateway/gateway-controller/default-policies/llm-cost.yaml
- gateway/gateway-controller/pkg/config/config.go
- gateway/gateway-controller/pkg/policy/builder.go
- gateway/gateway-runtime/Dockerfile
- gateway/gateway-runtime/Makefile
- gateway/it/features/llm-cost-combination.feature
- gateway/it/features/llm-cost.feature
- gateway/it/mock-api/openapi.yaml
- gateway/it/mock-api/paths/anthropic/messages.yaml
- gateway/it/mock-api/paths/anthropic/messages_cache_1hr.yaml
- gateway/it/mock-api/paths/anthropic/messages_cache_read.yaml
- gateway/it/mock-api/paths/anthropic/messages_geo_speed.yaml
- gateway/it/mock-api/paths/anthropic/messages_web_search.yaml
- gateway/it/mock-api/paths/custom-llm/chat.yaml
- gateway/it/mock-api/paths/custom-llm/gpt35.yaml
- gateway/it/mock-api/paths/gemini/generate_content.yaml
- gateway/it/mock-api/paths/gemini/generate_content_cached.yaml
- gateway/it/mock-api/paths/gemini/generate_content_thinking.yaml
- gateway/it/mock-api/paths/mistral/chat.yaml
- gateway/it/mock-api/paths/openai/chat.yaml
- gateway/it/mock-api/paths/openai/chat_batch.yaml
- gateway/it/mock-api/paths/openai/chat_cached.yaml
- gateway/it/mock-api/paths/openai/chat_flex.yaml
- gateway/it/mock-api/paths/openai/chat_priority.yaml
- gateway/it/mock-api/paths/openai/chat_reasoning.yaml
- gateway/it/mock-api/paths/openai/chat_web_search.yaml
- gateway/it/mock-api/paths/unknown-llm/chat.yaml
- gateway/it/mock-api/paths/unknown-llm/no_model_field.yaml
- gateway/it/steps_health.go
- gateway/it/suite_test.go
- gateway/it/test-config.toml
💤 Files with no reviewable changes (1)
- gateway/it/mock-api/paths/openai/chat.yaml
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@gateway/gateway-controller/default-policies/llm-cost-based-ratelimit.yaml`:
- Around line 138-145: The memory config bindings in this policy use camelCase
keys maxEntries and cleanupInterval which drift from the `advanced-ratelimit`
config keys max_entries and cleanup_interval causing overrides to be ignored;
update the wso2/defaultValue bindings for the fields named maxEntries and
cleanupInterval to reference
`${config.policy_configurations.ratelimit_v0.memory.max_entries}` and
`${config.policy_configurations.ratelimit_v0.memory.cleanup_interval}`
respectively (leave the schema property names as-is if needed but ensure the
bound config keys match `max_entries` and `cleanup_interval`).
- Line 64: The YAML references
config.policy_configurations.llm_cost_ratelimit.cost_scale_factor but that
section/field doesn't exist; add a cost_scale_factor entry under
policy_configurations.llm_cost_v0 in both config.toml and config-template.toml
(provide a sensible default, e.g., cost_scale_factor = 1.0), and update the YAML
binding to use the correct path
config.policy_configurations.llm_cost_v0.cost_scale_factor (the
wso2/defaultValue key should reference
${config.policy_configurations.llm_cost_v0.cost_scale_factor}); ensure the key
name cost_scale_factor matches exactly between the TOML and the YAML.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dd7ec1d8-b533-466b-90c5-4849b4ca4abb
📒 Files selected for processing (1)
gateway/gateway-controller/default-policies/llm-cost-based-ratelimit.yaml
52c1880 to 4897c21 (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (2)
gateway/gateway-controller/default-policies/llm-cost-based-ratelimit.yaml (2)
138-145: ⚠️ Potential issue | 🟠 Major: Memory config bindings drift from advanced-ratelimit. The bindings use `maxentries` and `cleanupinterval` but the reference `advanced-ratelimit.yaml` uses `max_entries` and `cleanup_interval`. Non-default memory config will be ignored.

Suggested fix
```diff
       maxEntries:
         type: integer
         default: 10000
-        "wso2/defaultValue": "${config.policy_configurations.ratelimit_v0.memory.maxentries}"
+        "wso2/defaultValue": "${config.policy_configurations.ratelimit_v0.memory.max_entries}"
       cleanupInterval:
         type: string
         default: "5m"
-        "wso2/defaultValue": "${config.policy_configurations.ratelimit_v0.memory.cleanupinterval}"
+        "wso2/defaultValue": "${config.policy_configurations.ratelimit_v0.memory.cleanup_interval}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/gateway-controller/default-policies/llm-cost-based-ratelimit.yaml` around lines 138 - 145, The policy's memory config binds use the wrong keys—change the wso2/defaultValue references for maxEntries and cleanupInterval to match the advanced-ratelimit names: replace "${config.policy_configurations.ratelimit_v0.memory.maxentries}" with "${config.policy_configurations.ratelimit_v0.memory.max_entries}" and replace "${config.policy_configurations.ratelimit_v0.memory.cleanupinterval}" with "${config.policy_configurations.ratelimit_v0.memory.cleanup_interval}" so the defaults for maxEntries and cleanupInterval in the schema (maxEntries, cleanupInterval) correctly pick up non-default values from the config.
62-64: ⚠️ Potential issue | 🔴 Critical: Config section reference and field are both missing. Line 64 references `${config.policy_configurations.llm_cost_ratelimit.cost_scale_factor}`, but the config template only has `[policy_configurations.llm_cost_v0]`, which contains only `pricing_file`. Neither the `llm_cost_ratelimit` section nor the `cost_scale_factor` field exists.

Suggested fix
Add `cost_scale_factor` to `[policy_configurations.llm_cost_v0]` in `config-template.toml`:

```diff
 [policy_configurations.llm_cost_v0]
 pricing_file = "/home/wso2/conf/llm-pricing/model_prices.json"
+cost_scale_factor = 1000000000
```

Then update the binding in this YAML:

```diff
-      "wso2/defaultValue": "${config.policy_configurations.llm_cost_ratelimit.cost_scale_factor}"
+      "wso2/defaultValue": "${config.policy_configurations.llm_cost_v0.cost_scale_factor}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/gateway-controller/default-policies/llm-cost-based-ratelimit.yaml` around lines 62 - 64, The policy YAML references ${config.policy_configurations.llm_cost_ratelimit.cost_scale_factor} but your config template only defines [policy_configurations.llm_cost_v0] with pricing_file; add the missing section/field: add cost_scale_factor under the appropriate policy_configurations block in config-template.toml (or rename the YAML binding to match the existing llm_cost_v0 keys). Specifically, either add a new llm_cost_ratelimit table with a cost_scale_factor entry in config-template.toml, or change the YAML's "wso2/defaultValue" to reference policy_configurations.llm_cost_v0.pricing_file (or the correct existing key) so the binding matches an existing config value.
🧹 Nitpick comments (5)
gateway/it/mock-api/paths/mistral/chat.yaml (1)
45-49: Unusual field `prompt_audio_seconds` for Mistral API response. The `prompt_audio_seconds: null` field in the usage object is atypical for Mistral's API response format. If this is intentional for testing edge cases where the cost policy handles unexpected fields, consider adding a comment to clarify. Otherwise, it may be removed for accuracy.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/it/mock-api/paths/mistral/chat.yaml` around lines 45 - 49, The usage object in the mock Mistral chat response contains an atypical field prompt_audio_seconds: null; either remove this field to match Mistral's expected response shape or explicitly document its presence as an intentional edge-case for cost-policy tests (add an inline comment in paths/mistral/chat.yaml near the usage object), and if it's retained update any test or validation code that consumes the usage object (e.g., mocks or parsers) to handle prompt_audio_seconds being present and null.

gateway/it/mock-api/paths/openai/chat_flex.yaml (1)
8-19: Consider adding `required` constraints for consistency. The `chat_cached.yaml` in the same directory specifies `required: [model, messages]` while this schema omits it. For consistency across OpenAI mock endpoints, consider adding the constraint.

Suggested fix
```diff
       schema:
         type: object
+        required:
+          - model
+          - messages
         properties:
           model:
             type: string
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/it/mock-api/paths/openai/chat_flex.yaml` around lines 8 - 19, Add a required constraint to the top-level schema by adding required: - model - messages so the schema object enforces presence of properties.model and properties.messages; update the schema node (referencing schema, properties.model, properties.messages) to match the other mock (chat_cached.yaml) for consistency across OpenAI mock endpoints.

gateway/gateway-runtime/Dockerfile (1)
157-159: Debug stage lacks directory creation and ownership setup for `llm-pricing`. The production stage (lines 203-205) explicitly creates `/home/wso2/conf/llm-pricing` and sets ownership. The debug stage copies to the same path but skips these steps. While the debug stage runs as root, consider adding consistency for maintainability:

Suggested fix for debug stage
```diff
 COPY --from=target configs/llm-pricing/ /home/wso2/conf/llm-pricing/
-RUN chmod +x /app/policy-engine /usr/local/bin/dlv /usr/local/bin/docker-entrypoint.sh /usr/local/bin/health-check.sh
+RUN mkdir -p /home/wso2/conf/llm-pricing && \
+    chmod +x /app/policy-engine /usr/local/bin/dlv /usr/local/bin/docker-entrypoint.sh /usr/local/bin/health-check.sh
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/gateway-runtime/Dockerfile` around lines 157 - 159, The debug stage copies configs/llm-pricing into /home/wso2/conf/llm-pricing but doesn't create the directory or set ownership like the production stage; update the debug-stage Dockerfile sequence around the COPY --from=target configs/llm-pricing/ /home/wso2/conf/llm-pricing/ and subsequent RUN chmod lines to also create /home/wso2/conf/llm-pricing (mkdir -p) and chown it to the same user/group used in production (e.g., chown -R wso2:wso2 /home/wso2/conf/llm-pricing) so permissions and layout match production.

gateway/it/mock-api/paths/custom-llm/chat.yaml (1)
8-16: Consider adding `required` constraints for request body fields. Unlike the Mistral mock (which specifies `required: [model, messages]`), this schema lacks required field constraints. For consistency across mock API definitions, consider adding them.

Suggested fix
```diff
       schema:
         type: object
+        required:
+          - model
+          - messages
         properties:
           model:
             type: string
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/it/mock-api/paths/custom-llm/chat.yaml` around lines 8 - 16, The request body schema is missing required constraints for the model and messages fields; update the schema for the chat operation by adding a required array that lists "model" and "messages" (i.e., add required: [model, messages] under the same schema object that contains properties: model and messages) so the mock matches the Mistral definition and enforces both fields are present.

gateway/it/features/llm-cost-combination.feature (1)
193-201: Consider increasing the wait buffer for window reset test. The 11-second wait after a 10-second window may be tight in slow CI environments, potentially causing flaky tests. Consider increasing to 12-13 seconds for more reliable test execution.

Suggested adjustment
```diff
-    When I wait for 11 seconds
+    When I wait for 12 seconds
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/it/features/llm-cost-combination.feature` around lines 193 - 201, Increase the wait buffer in the window-reset scenario by changing the step that currently says "When I wait for 11 seconds" to wait 12–13 seconds instead, so the feature's scenario (the step before sending POST to "http://localhost:8080/combo-reset/openai/v1/chat/completions") has a more reliable delay for slow CI environments.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@gateway/it/mock-api/paths/custom-llm/chat.yaml`:
- Around line 25-39: The mock chat example uses model "my-enterprise-llm-v1" but
that model is not defined in the pricing DB, causing llm-cost policy tests to
fail; add an entry for "my-enterprise-llm-v1" in the pricing file
model_prices.json with input rate 1e-5 and output rate 2e-5 so the policy can
compute the expected x-llm-cost (0.0016000000). Locate the model_prices.json and
add the object keyed by "my-enterprise-llm-v1" with fields matching existing
entries (e.g., "input_per_token": 0.00001, "output_per_token": 0.00002) and
ensure it follows the same structure/format as other models so the lookup used
by the llm-cost policy succeeds.
---
Duplicate comments:
In `@gateway/gateway-controller/default-policies/llm-cost-based-ratelimit.yaml`:
- Around line 138-145: The policy's memory config binds use the wrong
keys—change the wso2/defaultValue references for maxEntries and cleanupInterval
to match the advanced-ratelimit names: replace
"${config.policy_configurations.ratelimit_v0.memory.maxentries}" with
"${config.policy_configurations.ratelimit_v0.memory.max_entries}" and replace
"${config.policy_configurations.ratelimit_v0.memory.cleanupinterval}" with
"${config.policy_configurations.ratelimit_v0.memory.cleanup_interval}" so the
defaults for maxEntries and cleanupInterval in the schema (maxEntries,
cleanupInterval) correctly pick up non-default values from the config.
- Around line 62-64: The policy YAML references
${config.policy_configurations.llm_cost_ratelimit.cost_scale_factor} but your
config template only defines [policy_configurations.llm_cost_v0] with
pricing_file; add the missing section/field: add cost_scale_factor under the
appropriate policy_configurations block in config-template.toml (or rename the
YAML binding to match the existing llm_cost_v0 keys). Specifically, either add a
new llm_cost_ratelimit table with a cost_scale_factor entry in
config-template.toml, or change the YAML's "wso2/defaultValue" to reference
policy_configurations.llm_cost_v0.pricing_file (or the correct existing key) so
the binding matches an existing config value.
---
Nitpick comments:
In `@gateway/gateway-runtime/Dockerfile`:
- Around line 157-159: The debug stage copies configs/llm-pricing into
/home/wso2/conf/llm-pricing but doesn’t create the directory or set ownership
like the production stage; update the debug-stage Dockerfile sequence around the
COPY --from=target configs/llm-pricing/ /home/wso2/conf/llm-pricing/ and
subsequent RUN chmod lines to also create /home/wso2/conf/llm-pricing (mkdir -p)
and chown it to the same user/group used in production (e.g., chown -R wso2:wso2
/home/wso2/conf/llm-pricing) so permissions and layout match production.
In `@gateway/it/features/llm-cost-combination.feature`:
- Around line 193-201: Increase the wait buffer in the window-reset scenario by
changing the step that currently says "When I wait for 11 seconds" to wait 12–13
seconds instead, so the feature's scenario (the step before sending POST to
"http://localhost:8080/combo-reset/openai/v1/chat/completions") has a more
reliable delay for slow CI environments.
In `@gateway/it/mock-api/paths/custom-llm/chat.yaml`:
- Around line 8-16: The request body schema is missing required constraints for
the model and messages fields; update the schema for the chat operation by
adding a required array that lists "model" and "messages" (i.e., add required:
[model, messages] under the same schema object that contains properties: model
and messages) so the mock matches the Mistral definition and enforces both
fields are present.
In `@gateway/it/mock-api/paths/mistral/chat.yaml`:
- Around line 45-49: The usage object in the mock Mistral chat response contains
an atypical field prompt_audio_seconds: null; either remove this field to match
Mistral's expected response shape or explicitly document its presence as an
intentional edge-case for cost-policy tests (add an inline comment in
paths/mistral/chat.yaml near the usage object), and if it’s retained update any
test or validation code that consumes the usage object (e.g., mocks or parsers)
to handle prompt_audio_seconds being present and null.
In `@gateway/it/mock-api/paths/openai/chat_flex.yaml`:
- Around line 8-19: Add a required constraint to the top-level schema by adding
required: - model - messages so the schema object enforces presence of
properties.model and properties.messages; update the schema node (referencing
schema, properties.model, properties.messages) to match the other mock
(chat_cached.yaml) for consistency across OpenAI mock endpoints.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bfb38e64-99fd-488d-9192-5a90fcccd832
⛔ Files ignored due to path filters (1)
`go.work.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (38)
- .gitignore
- gateway/build-lock.yaml
- gateway/build.yaml
- gateway/configs/config-template.toml
- gateway/configs/config.toml
- gateway/configs/llm-pricing/model_prices.json
- gateway/gateway-controller/default-policies/llm-cost-based-ratelimit.yaml
- gateway/gateway-controller/default-policies/llm-cost.yaml
- gateway/gateway-controller/pkg/config/config.go
- gateway/gateway-controller/pkg/policy/builder.go
- gateway/gateway-runtime/Dockerfile
- gateway/gateway-runtime/Makefile
- gateway/it/features/llm-cost-combination.feature
- gateway/it/features/llm-cost.feature
- gateway/it/mock-api/openapi.yaml
- gateway/it/mock-api/paths/anthropic/messages.yaml
- gateway/it/mock-api/paths/anthropic/messages_cache_1hr.yaml
- gateway/it/mock-api/paths/anthropic/messages_cache_read.yaml
- gateway/it/mock-api/paths/anthropic/messages_geo_speed.yaml
- gateway/it/mock-api/paths/anthropic/messages_web_search.yaml
- gateway/it/mock-api/paths/custom-llm/chat.yaml
- gateway/it/mock-api/paths/custom-llm/gpt35.yaml
- gateway/it/mock-api/paths/gemini/generate_content.yaml
- gateway/it/mock-api/paths/gemini/generate_content_cached.yaml
- gateway/it/mock-api/paths/gemini/generate_content_thinking.yaml
- gateway/it/mock-api/paths/mistral/chat.yaml
- gateway/it/mock-api/paths/openai/chat.yaml
- gateway/it/mock-api/paths/openai/chat_batch.yaml
- gateway/it/mock-api/paths/openai/chat_cached.yaml
- gateway/it/mock-api/paths/openai/chat_flex.yaml
- gateway/it/mock-api/paths/openai/chat_priority.yaml
- gateway/it/mock-api/paths/openai/chat_reasoning.yaml
- gateway/it/mock-api/paths/openai/chat_web_search.yaml
- gateway/it/mock-api/paths/unknown-llm/chat.yaml
- gateway/it/mock-api/paths/unknown-llm/no_model_field.yaml
- gateway/it/steps_health.go
- gateway/it/suite_test.go
- gateway/it/test-config.toml
💤 Files with no reviewable changes (1)
- gateway/it/mock-api/paths/openai/chat.yaml
🚧 Files skipped from review as they are similar to previous changes (19)
- gateway/gateway-runtime/Makefile
- gateway/build.yaml
- gateway/it/mock-api/paths/unknown-llm/chat.yaml
- gateway/it/steps_health.go
- gateway/it/mock-api/paths/custom-llm/gpt35.yaml
- gateway/it/mock-api/paths/gemini/generate_content.yaml
- gateway/gateway-controller/default-policies/llm-cost.yaml
- gateway/it/mock-api/paths/openai/chat_priority.yaml
- gateway/build-lock.yaml
- gateway/it/mock-api/paths/anthropic/messages.yaml
- gateway/gateway-controller/pkg/policy/builder.go
- gateway/gateway-controller/pkg/config/config.go
- gateway/it/mock-api/paths/anthropic/messages_cache_read.yaml
- gateway/it/mock-api/paths/anthropic/messages_geo_speed.yaml
- gateway/it/test-config.toml
- gateway/configs/config.toml
- gateway/it/suite_test.go
- gateway/it/mock-api/paths/openai/chat_web_search.yaml
- gateway/it/mock-api/paths/gemini/generate_content_cached.yaml
```yaml
            example:
              id: resp-custom-001
              object: chat.completion
              created: 1741569952
              model: my-enterprise-llm-v1
              choices:
                - index: 0
                  message:
                    role: assistant
                    content: "Hello from the custom enterprise LLM!"
                  finish_reason: stop
              usage:
                prompt_tokens: 80
                completion_tokens: 40
                total_tokens: 120
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Verify the custom model exists in the pricing database with expected rates
fd -t f 'model_prices.json' --exec grep -A 10 'my-enterprise-llm-v1' {} \;
```

Repository: wso2/api-platform
Length of output: 43

🏁 Script executed:

```shell
# Search for model_prices.json or similar pricing files
fd -t f 'model_prices'
```

Repository: wso2/api-platform
Length of output: 106

🏁 Script executed:

```shell
# Also search broadly for any pricing-related files
fd -t f 'price'
```

Repository: wso2/api-platform
Length of output: 106

🏁 Script executed:

```shell
# Check the structure of the gateway/it directory
find gateway/it -type f -name "*.json" | head -20
```

Repository: wso2/api-platform
Length of output: 43

🏁 Script executed:

```shell
cat gateway/configs/llm-pricing/model_prices.json
```

Repository: wso2/api-platform
Length of output: 50374

🏁 Script executed:

```shell
grep -A 5 'my-enterprise-llm-v1' gateway/configs/llm-pricing/model_prices.json
```

Repository: wso2/api-platform
Length of output: 43
Add my-enterprise-llm-v1 to the pricing database.
The mock response uses a model name that does not exist in gateway/configs/llm-pricing/model_prices.json. The llm-cost policy tests will fail when attempting to look up pricing for this model. Define the model with the expected pricing rates (input=1e-5/token, output=2e-5/token) to produce the expected x-llm-cost header value of 0.0016000000.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@gateway/it/mock-api/paths/custom-llm/chat.yaml` around lines 25 - 39, The
mock chat example uses model "my-enterprise-llm-v1" but that model is not
defined in the pricing DB, causing llm-cost policy tests to fail; add an entry
for "my-enterprise-llm-v1" in the pricing file model_prices.json with input rate
1e-5 and output rate 2e-5 so the policy can compute the expected x-llm-cost
(0.0016000000). Locate the model_prices.json and add the object keyed by
"my-enterprise-llm-v1" with fields matching existing entries (e.g.,
"input_per_token": 0.00001, "output_per_token": 0.00002) and ensure it follows
the same structure/format as other models so the lookup used by the llm-cost
policy succeeds.
🧹 Nitpick comments (2)
gateway/it/features/llm-cost-based-ratelimit.feature (2)
85-92: Consider asserting header values for more robust verification.

Lines 91-92 only check that rate-limit headers exist. Verifying the actual values (e.g., `x-ratelimit-cost-limit-dollars` should be `"0.000236"` and `x-ratelimit-cost-remaining-dollars` should be approximately `"0.000118"` after the first request) would catch subtle bugs in cost calculation or header formatting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/it/features/llm-cost-based-ratelimit.feature` around lines 85 - 92, Update the scenario assertions to verify the actual rate-limit header values rather than only existence: check that the "x-ratelimit-cost-limit-dollars" header equals "0.000236" and that "x-ratelimit-cost-remaining-dollars" equals or approximately equals "0.000118" after the first request (use exact string match for the limit header and a numeric/regex or tolerance-based comparison for the remaining header to allow minor floating-point differences); update the step(s) that currently assert header existence to perform these value checks against the headers "x-ratelimit-cost-limit-dollars" and "x-ratelimit-cost-remaining-dollars" so the test catches cost-calculation/formatting regressions.
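The tolerance-based comparison suggested above could be sketched as a small helper in the Go test suite's style (the function name and epsilon are illustrative, not existing suite helpers):

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// headerApproxEquals parses a dollar-value header and compares it to the
// expected amount within an absolute tolerance, so minor floating-point
// drift in cost accounting does not fail the scenario.
func headerApproxEquals(headerValue string, expected, epsilon float64) (bool, error) {
	got, err := strconv.ParseFloat(headerValue, 64)
	if err != nil {
		return false, fmt.Errorf("header %q is not a number: %w", headerValue, err)
	}
	return math.Abs(got-expected) <= epsilon, nil
}

func main() {
	ok, err := headerApproxEquals("0.000118", 0.000118, 1e-9)
	fmt.Println(ok, err) // prints: true <nil>
}
```

An exact string match stays appropriate for the fixed limit header; the tolerance check only matters for the remaining-budget header, where accumulated subtraction can drift.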
675-676: Potential test flakiness due to tight timing buffer.

The 11-second wait for a 10-second window provides only a 1-second buffer. In CI environments with variable timing (container scheduling, resource contention), this could cause intermittent failures. Consider increasing the wait to 12-15 seconds, or alternatively using a shorter window (e.g., 5s) with a proportionally longer wait to increase the relative buffer.
🔧 Suggested adjustment

```diff
 params:
   budgetLimits:
   - amount: 0.000236
-    duration: "10s"
+    duration: "5s"
     name: llm-cost
```

Then update the wait:

```diff
-    When I wait for 11 seconds
+    When I wait for 8 seconds
```

This gives a 60% buffer (3s on 5s) instead of 10% (1s on 10s).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/it/features/llm-cost-based-ratelimit.feature` around lines 675 - 676, The test's timing is fragile: replace the step "When I wait for 11 seconds" (which only gives a 1s buffer on a 10-second window) with a longer wait to avoid CI flakiness—e.g., change it to "When I wait for 15 seconds" (or any 12–15s value) so the wait provides a safer buffer, or alternatively shorten the rate-limit window used by the scenario (e.g., switch the 10s window to 5s and wait ~8s) and update the corresponding step accordingly to ensure a larger relative buffer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@gateway/it/features/llm-cost-based-ratelimit.feature`:
- Around line 85-92: Update the scenario assertions to verify the actual
rate-limit header values rather than only existence: check that the
"x-ratelimit-cost-limit-dollars" header equals "0.000236" and that
"x-ratelimit-cost-remaining-dollars" equals or approximately equals "0.000118"
after the first request (use exact string match for the limit header and a
numeric/regex or tolerance-based comparison for the remaining header to allow
minor floating-point differences); update the step(s) that currently assert
header existence to perform these value checks against the headers
"x-ratelimit-cost-limit-dollars" and "x-ratelimit-cost-remaining-dollars" so the
test catches cost-calculation/formatting regressions.
- Around line 675-676: The test's timing is fragile: replace the step "When I
wait for 11 seconds" (which only gives a 1s buffer on a 10-second window) with a
longer wait to avoid CI flakiness—e.g., change it to "When I wait for 15
seconds" (or any 12–15s value) so the wait provides a safer buffer, or
alternatively shorten the rate-limit window used by the scenario (e.g., switch
the 10s window to 5s and wait ~8s) and update the corresponding step accordingly
to ensure a larger relative buffer.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f2fbd480-d885-4e29-a0df-6777b79dfe64
📒 Files selected for processing (2)
- gateway/it/features/llm-cost-based-ratelimit.feature
- gateway/it/suite_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- gateway/it/suite_test.go
Actionable comments posted: 1
🧹 Nitpick comments (2)
gateway/gateway-controller/default-policies/llm-cost-based-ratelimit.yaml (2)
133-145: Consider adding range constraints on `maxEntries` for consistency.

The memory config bindings are now correct. For parity with advanced-ratelimit.yaml, consider adding:

`maxEntries`: minimum: 100, maximum: 10000000

This is a minor consistency improvement.
♻️ Suggested constraint

```diff
 maxEntries:
   type: integer
   default: 10000
+  minimum: 100
+  maximum: 10000000
   "wso2/defaultValue": "${config.policy_configurations.ratelimit_v0.memory.max_entries}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/gateway-controller/default-policies/llm-cost-based-ratelimit.yaml` around lines 133 - 145, Add numeric range constraints to the memory.maxEntries schema by adding "minimum: 100" and "maximum: 10000000" under the maxEntries property so it matches the parity in advanced-ratelimit.yaml; update the maxEntries entry (within the memory properties block) to include these constraints alongside the existing type/default and wso2/defaultValue.
88-131: Consider adding range constraints for consistency with advanced-ratelimit.

The Redis config section works correctly, and the binding names now match the reference policy. For schema consistency, advanced-ratelimit.yaml defines:

- `port`: minimum: 1, maximum: 65535
- `db`: minimum: 0, maximum: 15

Adding these constraints would catch invalid values at validation time rather than runtime.
♻️ Suggested constraints

```diff
 port:
   type: integer
   default: 6379
+  minimum: 1
+  maximum: 65535
   "wso2/defaultValue": "${config.policy_configurations.ratelimit_v0.redis.port}"
 ...
 db:
   type: integer
   default: 0
+  minimum: 0
+  maximum: 15
   "wso2/defaultValue": "${config.policy_configurations.ratelimit_v0.redis.db}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gateway/gateway-controller/default-policies/llm-cost-based-ratelimit.yaml` around lines 88 - 131, The Redis schema is missing range constraints on the port and db properties; update the redis.port property to include minimum: 1 and maximum: 65535 and update redis.db to include minimum: 0 and maximum: 15 (keeping their type: integer), mirroring the constraints used in advanced-ratelimit to enforce validation at schema time and prevent invalid values at runtime.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@gateway/gateway-controller/default-policies/llm-cost-based-ratelimit.yaml`:
- Around line 62-64: The policy references
${config.policy_configurations.llm_cost_ratelimit_v0.cost_scale_factor} but that
section doesn't exist, so either add a new config section named
llm_cost_ratelimit_v0 with a cost_scale_factor key to both config.toml and
config-template.toml, or change the policy binding in
llm-cost-based-ratelimit.yaml to use the existing
policy_configurations.llm_cost_v0.cost_scale_factor instead (update the
"wso2/defaultValue" entry); ensure the chosen symbol (llm_cost_ratelimit_v0 or
llm_cost_v0 and cost_scale_factor) exists in both config files so the binding
overrides the schema default.
---
Nitpick comments:
In `@gateway/gateway-controller/default-policies/llm-cost-based-ratelimit.yaml`:
- Around line 133-145: Add numeric range constraints to the memory.maxEntries
schema by adding "minimum: 100" and "maximum: 10000000" under the maxEntries
property so it matches the parity in advanced-ratelimit.yaml; update the
maxEntries entry (within the memory properties block) to include these
constraints alongside the existing type/default and wso2/defaultValue.
- Around line 88-131: The Redis schema is missing range constraints on the port
and db properties; update the redis.port property to include minimum: 1 and
maximum: 65535 and update redis.db to include minimum: 0 and maximum: 15
(keeping their type: integer), mirroring the constraints used in
advanced-ratelimit to enforce validation at schema time and prevent invalid
values at runtime.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 19b789cd-668b-4818-93f2-4edb8868b8d5
📒 Files selected for processing (1)
gateway/gateway-controller/default-policies/llm-cost-based-ratelimit.yaml
Force-pushed 5e8e877 to 4ae9963, then 4ae9963 to 1142f1e.

Merge conflicts resolved in:
- gateway/gateway-controller/pkg/config/config.go
- gateway/it/suite_test.go
- go.work.sum
Purpose
LLM API consumers and platform teams need visibility into the monetary cost of each LLM call at the gateway level, without requiring application-side
instrumentation. This is a prerequisite for enforcing per-user or per-API spending budgets (e.g. "block after $10/hour").
Goals
- … existing advanced-ratelimit engine.
- … context-window tiering, service tiers (priority/flex/batch), prompt caching, reasoning tokens, audio/image modality tokens, and web search tool costs.
Approach

llm-cost policy (response-phase, WASM):
- … Gemini grounding fees).
- … calculation.

llm-cost-based-ratelimit policy (request-phase, WASM):
- … or Redis backend).

Both policies are published as Go modules under github.com/wso2/gateway-controllers/policies/ and referenced from build.yaml / build-lock.yaml.
Automation tests
Security checks
Samples
N/A
Related PRs
Summary by CodeRabbit
New Features
Configuration
Tests
Chores