feat(vertexai): add google_vertex_ai_online_evaluator resource #17944
Merged
+496 −0
Changes from all commits (20 commits):
- c0f799a feat(vertexai): add google_vertex_ai_online_evaluator resource (guvenenb)
- f533576 Refactor OnlineEvaluator and address PR feedback (guvenenb)
- 4808126 Fix OnlineEvaluator name and self_link mapping (guvenenb)
- c66ca34 Fix indentation (guvenenb)
- 952233e Add a basic reasoning engine to example of online evaluator (guvenenb)
- 3253f1d Improve OnlineEvaluator basic example with Reasoning Engine and m… (guvenenb)
- 08e6ce8 Document predefined metrics and add TODO in OnlineEvaluator.yaml (guvenenb)
- 65b23e2 trigger tests again (guvenenb)
- 44d9ec6 Merge branch 'GoogleCloudPlatform:main' into online_evals (guvenenb)
- 4389d7d Merge branch 'GoogleCloudPlatform:main' into online_evals (guvenenb)
- 0583245 Add online evaluator tests (guvenenb)
- 9a9cd1e Use handwritten destroy check helper to fix GA/Beta compile conflict (guvenenb)
- 2088314 Merge branch 'GoogleCloudPlatform:main' into online_evals (guvenenb)
- 854efc6 Address review comments for OnlineEvaluator (guvenenb)
- 26076e0 Merge branch 'GoogleCloudPlatform:main' into online_evals (guvenenb)
- 1c349d6 Add tf-test prefix to display_names and specify google-beta provider… (guvenenb)
- bca98b7 Fix Vertex AI Online Evaluator tests and service label (guvenenb)
- 5a24c63 Fix Vertex AI Online Evaluator tests and drift issues (guvenenb)
- 98ba0b3 Mark metricSources sub-fields as immutable in Vertex AI OnlineEvaluator (guvenenb)
- a5a761a Merge branch 'GoogleCloudPlatform:main' into online_evals (guvenenb)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,244 @@ | ||
| # Copyright 2026 Google Inc. | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| --- | ||
| name: OnlineEvaluator | ||
| description: Periodically evaluates sampled traces of an agent against a set of metric sources. | ||
| base_url: projects/{{project}}/locations/{{region}}/onlineEvaluators | ||
| min_version: beta | ||
| self_link: 'projects/{{project}}/locations/{{region}}/onlineEvaluators/{{name}}' | ||
| create_url: projects/{{project}}/locations/{{region}}/onlineEvaluators | ||
| update_mask: true | ||
| update_verb: PATCH | ||
| timeouts: | ||
| insert_minutes: 20 | ||
| update_minutes: 20 | ||
| delete_minutes: 60 | ||
| async: | ||
| type: 'OpAsync' | ||
| operation: | ||
| base_url: '{{op_id}}' | ||
| actions: | ||
| - create | ||
| - update | ||
| - delete | ||
| result: | ||
| resource_inside_response: true | ||
| autogen_status: T25saW5lRXZhbHVhdG9y | ||
| parameters: | ||
| - name: region | ||
| type: String | ||
| required: true | ||
| description: The region of the OnlineEvaluator. It forms the `locations/{region}` segment | ||
| of the resource `name`. | ||
| immutable: true | ||
| url_param_only: true | ||
| properties: | ||
| - name: agentResource | ||
| type: String | ||
| required: true | ||
| description: 'The name of the agent that the OnlineEvaluator evaluates periodically. | ||
| This value is used to filter the traces with a matching cloud.resource_id | ||
| and link the evaluation results with relevant dashboards/UIs. | ||
| This field is immutable. Once set, it cannot be changed.' | ||
| immutable: true | ||
| - name: cloudObservability | ||
| type: NestedObject | ||
| description: 'Data source for the OnlineEvaluator, based on Google Cloud Observability | ||
| stack (Cloud Trace & Cloud Logging).' | ||
| properties: | ||
| - name: logView | ||
| type: String | ||
| description: 'Optional log view that will be used to query logs. | ||
| If empty, the `_Default` view will be used.' | ||
| immutable: true | ||
| - name: openTelemetry | ||
| type: NestedObject | ||
| description: Configuration for data source following OpenTelemetry. | ||
| immutable: true | ||
| ignore_read: true | ||
| properties: | ||
| - name: semconvVersion | ||
| type: String | ||
| required: true | ||
| description: 'Defines which version of the OTel Semantic Conventions the data follows. | ||
| Can be "1.39.0" or newer.' | ||
| immutable: true | ||
| - name: traceScope | ||
| type: NestedObject | ||
| description: 'If chosen, the online evaluator will evaluate single traces matching | ||
| the specified `filter`.' | ||
| immutable: true | ||
| properties: | ||
| - name: filter | ||
| type: Array | ||
| description: 'A list of predicates to filter traces. Multiple predicates are | ||
| combined using AND. | ||
| The maximum number of predicates is 10.' | ||
| immutable: true | ||
| item_type: | ||
| type: NestedObject | ||
| properties: | ||
| - name: duration | ||
| type: NestedObject | ||
| description: Defines a predicate for filtering based on a numeric value. | ||
| immutable: true | ||
| properties: | ||
| - name: comparisonOperator | ||
| type: String | ||
| required: true | ||
| description: 'The comparison operator to apply. | ||
| Possible values: | ||
| LESS | ||
| LESS_OR_EQUAL | ||
| EQUAL | ||
| NOT_EQUAL | ||
| GREATER_OR_EQUAL | ||
| GREATER' | ||
| immutable: true | ||
| - name: value | ||
| type: Double | ||
| required: true | ||
| description: The value to compare against. | ||
| immutable: true | ||
| - name: totalTokenUsage | ||
| type: NestedObject | ||
| description: Defines a predicate for filtering based on a numeric value. | ||
| immutable: true | ||
| properties: | ||
| - name: comparisonOperator | ||
| type: String | ||
| required: true | ||
| description: 'The comparison operator to apply. | ||
| Possible values: | ||
| LESS | ||
| LESS_OR_EQUAL | ||
| EQUAL | ||
| NOT_EQUAL | ||
| GREATER_OR_EQUAL | ||
| GREATER' | ||
| immutable: true | ||
| - name: value | ||
| type: Double | ||
| required: true | ||
| description: The value to compare against. | ||
| immutable: true | ||
| - name: traceView | ||
| type: String | ||
| description: 'Optional trace view that will be used to query traces. | ||
| If empty, the `_Default` view will be used. | ||
| NOTE: This field is not supported yet and will be ignored if set.' | ||
| immutable: true | ||
| required: true | ||
| immutable: true | ||
| - name: config | ||
| type: NestedObject | ||
| required: true | ||
| description: 'Configuration for sampling behavior of the OnlineEvaluator. | ||
| The OnlineEvaluator runs at a fixed interval of 10 minutes.' | ||
| properties: | ||
| - name: maxEvaluatedSamplesPerRun | ||
| type: String | ||
| description: 'The maximum number of evaluations to perform per run. | ||
| If set to 0, the number is unbounded.' | ||
| - name: randomSampling | ||
| type: NestedObject | ||
| description: Configuration for random sampling. | ||
| properties: | ||
| - name: percentage | ||
| type: Integer | ||
| required: true | ||
| description: 'The percentage of traces to sample for evaluation. | ||
| Must be an integer between `1` and `100`.' | ||
| - name: createTime | ||
| type: String | ||
| description: Timestamp when the OnlineEvaluator was created. | ||
| output: true | ||
| - name: displayName | ||
| type: String | ||
| description: 'Human-readable name for the `OnlineEvaluator`. | ||
| The name doesn''t have to be unique. | ||
| The name can consist of any UTF-8 characters. The maximum length is `63` | ||
| characters. If the display name exceeds the maximum length, an | ||
| `INVALID_ARGUMENT` error is returned.' | ||
| - name: metricSources | ||
| type: Array | ||
| required: true | ||
| immutable: true | ||
| description: 'A list of metric sources to be used for evaluating samples. | ||
| At least one MetricSource must be provided. | ||
| Right now, only predefined metrics and registered metrics are supported. | ||
| Every registered metric must have `display_name` (or `title`) and | ||
| `score_range` defined. Otherwise, the evaluations will fail. | ||
| The maximum number of `metric_sources` is 25.' | ||
| item_type: | ||
| type: NestedObject | ||
| properties: | ||
| # TODO: Support structured metric config instead of JSON string. | ||
| - name: metric | ||
| type: String | ||
| description: | | ||
| Inline metric config. Provide this field as a JSON-formatted string. | ||
| Suggested predefined metricSpecName values: | ||
| - `final_response_quality_v1` | ||
| - `tool_use_quality_v1` | ||
| - `hallucination_v1` | ||
| - `safety_v1` | ||
| - `multi_turn_task_success_v1` | ||
| - `multi_turn_tool_use_quality_v1` | ||
| - `multi_turn_trajectory_quality_v1` | ||
| immutable: true | ||
| custom_expand: 'templates/terraform/custom_expand/json_value.tmpl' | ||
| custom_flatten: 'templates/terraform/custom_flatten/json_schema.tmpl' | ||
| - name: metricResourceName | ||
| type: String | ||
| description: Resource name for registered metric. | ||
| immutable: true | ||
| - name: name | ||
| type: String | ||
| description: 'Identifier. The resource name of the OnlineEvaluator. | ||
| Format: projects/{project}/locations/{region}/onlineEvaluators/{id}.' | ||
| output: true | ||
| custom_flatten: 'templates/terraform/custom_flatten/name_from_self_link.tmpl' | ||
| - name: state | ||
| type: String | ||
| description: 'The state of the OnlineEvaluator. | ||
| Possible values: | ||
| ACTIVE | ||
| SUSPENDED | ||
| FAILED | ||
| WARNING' | ||
| output: true | ||
| - name: stateDetails | ||
| type: Array | ||
| description: 'Contains additional information about the state of the OnlineEvaluator. | ||
| This is used to provide more details in the event of a failure.' | ||
| output: true | ||
| item_type: | ||
| type: NestedObject | ||
| output: true | ||
| properties: | ||
| - name: message | ||
| type: String | ||
| description: Human-readable message describing the state of the OnlineEvaluator. | ||
| output: true | ||
| - name: updateTime | ||
| type: String | ||
| description: Timestamp when the OnlineEvaluator was last updated. | ||
| output: true | ||
| | ||
| examples: | ||
| - name: 'vertex_ai_online_evaluator_basic' | ||
| primary_resource_id: 'evaluator' | ||
| vars: | ||
| evaluator_name: 'my-evaluator' | ||
| engine_name: 'my-engine' | ||
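
The schema above declares several fields that the basic example does not exercise: `metricResourceName`, the `totalTokenUsage` predicate, and the unbounded setting of `maxEvaluatedSamplesPerRun`. The following is a minimal sketch of how they might be written in HCL, assuming the generated attribute names are the snake_case forms of the camelCase property names (as they are for `max_evaluated_samples_per_run` in the basic example). The two variables, the resource label `filtered`, and the numeric thresholds are hypothetical placeholders, not values from this PR.

```hcl
# Hypothetical inputs: supply real resource names from your own project.
variable "agent_resource" {
  type        = string
  description = "Name of the agent whose traces are evaluated."
}

variable "registered_metric_name" {
  type        = string
  description = "Resource name of a registered metric."
}

resource "google_vertex_ai_online_evaluator" "filtered" {
  provider     = google-beta
  region       = "us-central1"
  display_name = "tf-test-filtered-evaluator"

  agent_resource = var.agent_resource

  config {
    # Per the schema description, 0 leaves the number of evaluations unbounded.
    max_evaluated_samples_per_run = "0"
    random_sampling {
      percentage = 25 # must be an integer between 1 and 100
    }
  }

  # A registered metric referenced by resource name instead of inline JSON.
  metric_sources {
    metric_resource_name = var.registered_metric_name
  }

  cloud_observability {
    open_telemetry {
      semconv_version = "1.39.0"
    }

    trace_scope {
      # Multiple predicates are combined using AND (at most 10).
      filter {
        duration {
          comparison_operator = "GREATER_OR_EQUAL"
          value               = 2 # placeholder; the schema does not state the unit
        }
      }
      filter {
        total_token_usage {
          comparison_operator = "GREATER"
          value               = 1000 # placeholder threshold
        }
      }
    }
  }
}
```

Each predicate is placed in its own `filter` block here because the schema does not say whether one list item may carry both `duration` and `totalTokenUsage`. Since `metricSources` and the `cloudObservability` sub-fields are marked immutable, changing any of them would be expected to force resource replacement.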
mmv1/templates/terraform/examples/vertex_ai_online_evaluator_basic.tf.tmpl
71 changes: 71 additions & 0 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,71 @@ | ||
| resource "google_vertex_ai_reasoning_engine" "engine" { | ||
| provider = google-beta | ||
| display_name = "{{index $.Vars "engine_name"}}" | ||
| description = "A basic reasoning engine" | ||
| labels = { | ||
| "key" = "value" | ||
| } | ||
| region = "us-central1" | ||
| } | ||
| | ||
| resource "google_vertex_ai_online_evaluator" "evaluator" { | ||
| provider = google-beta | ||
| region = "us-central1" | ||
| display_name = "{{index $.Vars "evaluator_name"}}" | ||
| | ||
| agent_resource = google_vertex_ai_reasoning_engine.engine.id | ||
| | ||
| config { | ||
| max_evaluated_samples_per_run = "100" | ||
| random_sampling { | ||
| percentage = 10 | ||
| } | ||
| } | ||
| | ||
| metric_sources { | ||
| metric = jsonencode({ | ||
| "predefinedMetricSpec" = { | ||
| "metricSpecName" = "safety_v1" | ||
| } | ||
| }) | ||
| } | ||
| | ||
| metric_sources { | ||
| metric = jsonencode({ | ||
| "predefinedMetricSpec" = { | ||
| "metricSpecName" = "hallucination_v1" | ||
| } | ||
| }) | ||
| } | ||
| | ||
| metric_sources { | ||
| metric = jsonencode({ | ||
| "predefinedMetricSpec" = { | ||
| "metricSpecName" = "final_response_quality_v1" | ||
| } | ||
| }) | ||
| } | ||
| | ||
| metric_sources { | ||
| metric = jsonencode({ | ||
| "predefinedMetricSpec" = { | ||
| "metricSpecName" = "tool_use_quality_v1" | ||
| } | ||
| }) | ||
| } | ||
| | ||
| cloud_observability { | ||
| open_telemetry { | ||
| semconv_version = "1.39.0" | ||
| } | ||
| | ||
| trace_scope { | ||
| filter { | ||
| duration { | ||
| comparison_operator = "GREATER" | ||
| value = 0 | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
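
The example template repeats an almost identical `metric_sources` block for each predefined metric. One possible alternative, shown here only as a sketch and not as part of this PR, is to generate those blocks from a list with a Terraform `dynamic` block. The local `predefined_metrics`, the variable `agent_resource`, and the resource label `compact` are hypothetical names introduced for illustration; the metric names are taken from the schema's `metric` description.

```hcl
# Hypothetical input: name of the agent to evaluate.
variable "agent_resource" {
  type = string
}

locals {
  # Same four predefined metrics as the basic example, in the same order.
  predefined_metrics = [
    "safety_v1",
    "hallucination_v1",
    "final_response_quality_v1",
    "tool_use_quality_v1",
  ]
}

resource "google_vertex_ai_online_evaluator" "compact" {
  provider       = google-beta
  region         = "us-central1"
  display_name   = "tf-test-compact-evaluator"
  agent_resource = var.agent_resource

  config {
    max_evaluated_samples_per_run = "100"
    random_sampling {
      percentage = 10
    }
  }

  # Emits one metric_sources block per list element, preserving list order.
  dynamic "metric_sources" {
    for_each = local.predefined_metrics
    content {
      # The schema expects the inline metric config as a JSON-formatted string.
      metric = jsonencode({
        predefinedMetricSpec = {
          metricSpecName = metric_sources.value
        }
      })
    }
  }

  cloud_observability {
    open_telemetry {
      semconv_version = "1.39.0"
    }
    trace_scope {
      filter {
        duration {
          comparison_operator = "GREATER"
          value               = 0
        }
      }
    }
  }
}
```

A list is used rather than a set so that the generated blocks keep a stable order. For documentation examples the explicit, repeated blocks may still be the clearer choice, since readers can copy a single block without following the indirection.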