
Project-scoped timeseries interact poorly with grouping #7532

Open
@iximeow


for project-scoped timeseries queries we append | filter silo_id == "<your silo>" && project_id == "<your project>" to the user's query, but a query over metrics that do have those fields may not have them anymore by the time the filter runs, e.g. if an earlier group_by dropped them.

i think this is most easily seen with a query like

get virtual_machine:vcpu_usage |
  filter timestamp >= @2025-02-11T00:59:44.938 && timestamp < @2025-02-11T01:20:24.938 &&
    instance_id == "cdffcf35-6ae3-488d-a03d-64cf45f88fb2" && state == "emulation" |
  align mean_within(20s) | group_by [instance_id], sum

where the appended filter makes the query fail with a pretty confusing error.
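to make the ordering concrete, here's a minimal Rust sketch of the rewrite, assuming the endpoint effectively tacks the filter onto the end of the user's pipeline as a final stage. scope_to_project is a hypothetical name, not Omicron's actual code, and the real endpoint (see #6873) likely does this on the parsed query rather than on strings; the point is only that the filter runs after every stage the user wrote, including group_by.

```rust
/// Hypothetical sketch of project scoping as described in this issue:
/// the silo/project filter is appended as the last stage of the pipeline.
fn scope_to_project(query: &str, silo_id: &str, project_id: &str) -> String {
    format!(
        "{} | filter silo_id == \"{}\" && project_id == \"{}\"",
        query.trim_end(),
        silo_id,
        project_id
    )
}

fn main() {
    // The user's query: after group_by, `instance_id` is the only field left.
    let user_query = "get virtual_machine:vcpu_usage | \
                      align mean_within(20s) | group_by [instance_id], sum";
    let scoped = scope_to_project(
        user_query,
        "7bd7623a-68ed-4636-8ecb-b59e3b068787",
        "9c4152f9-4317-4269-9018-66142964d21c",
    );
    // The appended filter comes after group_by, so it can only see the
    // fields group_by kept; silo_id and project_id are not among them.
    println!("{scoped}");
}
```

nothing in the user's query refers to silo_id or project_id, yet the final stage does, which is why the error blames a filter the user never wrote.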

of course, if your query ends up retaining silo and project IDs the whole way through, the extra filter is fine, and so

> ./target/debug/oxide --profile dogfood experimental timeseries query --project ixi --query "\
        get virtual_machine:vcpu_usage | \
          filter timestamp >= @2025-02-11T00:59:44.938 && timestamp < @2025-02-11T01:20:24.938 && \
            instance_id == \"ad5a6c89-2845-4c2e-b247-8ca034e10597\" && state == \"emulation\" | \
          align mean_within(20s) | group_by [instance_id, project_id, silo_id], sum"

(or the equivalent in your tool of choice) works with no issue.

how i got here, a moderately long adventure

included in more detail because there are several things we could do better along the way, and i'm filing other issues based on it.

i'd noticed this from the CLI:

./target/debug/oxide --profile dogfood \
    experimental timeseries query \
    --project ixi \
    --query "\
        get virtual_machine:vcpu_usage | \
          filter timestamp >= @2025-02-11T00:59:44.938 && timestamp < @2025-02-11T01:20:24.938 && \
            instance_id == \"cdffcf35-6ae3-488d-a03d-64cf45f88fb2\" && state == \"emulation\" | \
          align mean_within(20s) | group_by [instance_id], sum"

which got me...

Error Response: status: 400 Bad Request; headers: {"content-type":
"application/json", "x-request-id": "f96ba139-2229-4a39-8435-7f6b39d640fb",
"content-length": "551", "date": "Wed, 12 Feb 2025 20:15:45 GMT"}; value: Error
{ error_code: Some("InvalidRequest"), message: "The filter expression
\"(silo_id == \"7bd7623a-68ed-4636-8ecb-b59e3b068787\") && (project_id ==
\"9c4152f9-4317-4269-9018-66142964d21c\")\" is not valid, the following errors
were encountered\n  > The filter expression refers to identifiers that are not
valid for its input table \"virtual_machine:vcpu_usage\". Invalid identifiers:
[\"silo_id\", \"project_id\"], valid identifiers: [\"datum\", \"instance_id\",
\"start_time\", \"timestamp\"]", request_id:
"f96ba139-2229-4a39-8435-7f6b39d640fb" }

emphasis on

The filter expression "(silo_id == "7bd7623a-68ed-4636-8ecb-b59e3b068787") && (project_id == "9c4152f9-4317-4269-9018-66142964d21c")" is not valid, 

... which i'd never written! unfortunately, for users of the CLI or SDK it's not obvious that the extra filter expression is an implementation detail of the endpoint rather than something wrong with the query itself. to rule out the CLI as the source of that filter, i ran the same query against the API directly:

curl --fail-with-body -v -X POST \
    -H 'content-type:application/json' \
    -H 'cookie: session=[snip]' \
    --data "{\"query\": \
        \"get virtual_machine:vcpu_usage | \
            filter timestamp >= @2025-02-11T00:59:44.938 && timestamp < @2025-02-11T01:20:24.938 && \
              instance_id == \\\"cdffcf35-6ae3-488d-a03d-64cf45f88fb2\\\" && state == \\\"emulation\\\" | \
            align mean_within(20s) | group_by [instance_id], sum\" \
    }" \
    'https://oxide.sys.rack2.eng.oxide.computer/v1/timeseries/query?project=ixi'

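which got me the same error. on the Omicron side i pretty quickly found #6873, which explains where the extra filter expression came from. but the group_by [instance_id] in my query means that by the time the appended filter runs, virtual_machine:vcpu_usage no longer has fields like project_id and silo_id: only the grouping keys survive as fields (the error above lists the valid identifiers as datum, instance_id, start_time, and timestamp). so for any query that groups those fields away, the project filter will just produce an invalid query.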
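a toy model of the failure, in Rust: it treats group_by [keys], sum as keeping only the grouping keys as fields, plus the datum, start_time, and timestamp identifiers that the error message above reports as valid, and then checks the appended filter's identifiers against that set. identifiers_after_group_by and check_filter are hypothetical helpers for illustration, not oximeter's actual type checking.

```rust
use std::collections::BTreeSet;

/// Hypothetical model: after `group_by [keys], sum`, the only fields are the
/// grouping keys; `datum`, `start_time`, and `timestamp` stay valid too
/// (matching the valid identifiers in the error message in this issue).
fn identifiers_after_group_by(keys: &[&str]) -> BTreeSet<String> {
    let mut idents: BTreeSet<String> = ["datum", "start_time", "timestamp"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    idents.extend(keys.iter().map(|k| k.to_string()));
    idents
}

/// Returns the filter identifiers missing from the input table, in the order
/// the filter mentions them, or `Ok(())` if every identifier is valid.
fn check_filter(filter_idents: &[&str], table: &BTreeSet<String>) -> Result<(), Vec<String>> {
    let invalid: Vec<String> = filter_idents
        .iter()
        .copied()
        .filter(|id| !table.contains(*id))
        .map(String::from)
        .collect();
    if invalid.is_empty() {
        Ok(())
    } else {
        Err(invalid)
    }
}

fn main() {
    // The appended project filter references these two fields.
    let project_filter = ["silo_id", "project_id"];

    // `group_by [instance_id], sum`: the project fields are gone.
    let grouped = identifiers_after_group_by(&["instance_id"]);
    // prints: Err(["silo_id", "project_id"])
    println!("{:?}", check_filter(&project_filter, &grouped));

    // `group_by [instance_id, project_id, silo_id], sum`: they survive.
    let retained = identifiers_after_group_by(&["instance_id", "project_id", "silo_id"]);
    // prints: Ok(())
    println!("{:?}", check_filter(&project_filter, &retained));
}
```

the second case corresponds to the workaround that follows: grouping by [instance_id, project_id, silo_id] keeps both fields around, so the appended filter has something to match against.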

and indeed, grouping by [instance_id, project_id, silo_id] yields output more like you'd expect:

> ./target/debug/oxide --profile dogfood experimental timeseries query --project ixi --query "\
        get virtual_machine:vcpu_usage | \
          filter timestamp >= @2025-02-11T00:59:44.938 && timestamp < @2025-02-11T01:20:24.938 && \
            instance_id == \"ad5a6c89-2845-4c2e-b247-8ca034e10597\" && state == \"emulation\" | \
          align mean_within(20s) | group_by [instance_id, project_id, silo_id], sum"
{
  "tables": [
    {
      "name": "virtual_machine:vcpu_usage",
      "timeseries": {
        "8769668217919957407": {
          "fields": {
            "instance_id": {
              "type": "uuid",
              "value": "cdffcf35-6ae3-488d-a03d-64cf45f88fb2"
            },
            "project_id": {
              "type": "uuid",
              "value": "9c4152f9-4317-4269-9018-66142964d21c"
            },
            "silo_id": {
              "type": "uuid",
              "value": "7bd7623a-68ed-4636-8ecb-b59e3b068787"
            }
          },
          "points": {
            "timestamps": [
              "2025-02-11T01:00:04.938Z",
... eliding all the lines of data but it's all there and reasonable ...
