Skip to content

RFC: New vtgate Metrics #17585

Open
@systay

Description

@systay

1. Summary

Vitess currently classifies query plan types in a way that is neither intuitive nor helpful for performance analysis. In particular, QueriesProcessed and QueriesRouted rely on plan-type designations that are inconsistent across different operators (e.g., IN, Concatenate, DDL, Reference, FkCascade, and InsertSelect). This proposal introduces two new metrics and deprecates the older, less-informative ones.

2. Motivation

  1. Inconsistent Plan-Type Metric

    • The plan type was derived from the root operator’s name, leading to wide variability in reported plan types.
    • Some operators (like Route) reported the route type, while others forwarded whatever their child operator returned.
  2. Limited Usefulness

    • Metrics such as QueriesProcessed and QueriesRouted provide only a coarse breakdown.
    • Deeper insight (e.g., how complex or how many shards were involved) is missing or buried in logs/other metrics.
  3. Need for Clarity

    • We want an easily understandable categorization of query complexity to better diagnose query performance in vtgate.
    • We also want to track how many shards each query touches, to inform sharding logic decisions and query optimization.

3. Proposed Changes

3.1 Deprecation

  • Metrics to Deprecate:
    • QueriesProcessed
    • QueriesRouted
    • QueriesProcessedByTable
    • QueriesRoutedByTable
  • These will be marked as deprecated but still exist for at least one release to allow for a smooth transition.

3.2 Plan Classification

We propose categorizing queries into twelve distinct buckets to capture both common and potentially problematic execution patterns:

  1. Local
  • This query is solved at VTGate without calling any external system.
  1. Passthrough

    • The query is forwarded directly to a single shard without modification.
    • Minimal overhead in vtgate; the fastest possible path.
  2. MultiShard

    • The query is routed to multiple shards, but not all shards in the cluster.
    • vtgate may need to concatenate results but does not perform extensive operator logic beyond routing.
  3. Lookup

    • The query requires at least two calls: typically a vindex lookup first, followed by the main query.
    • A common pattern for partial fan-out or locating specific shard(s) through a lookup vindex.
  4. Scatter

    • The query is sent to all shards.
    • Indicates potential performance overhead, as every shard must be involved in the request.
  5. JoinOp

    • The plan includes at least one join operator at vtgate level (e.g., join of two routes).
    • Useful for quickly spotting queries that might be combining data across multiple shards.
  6. ForeignKey

    • The query involved foreign key constraint handling at vtgate level.
    • This happens when Vitess is managing the foreign key constraints.
  7. Complex

    • Catches any plan more involved than the above categories (e.g., subqueries, nested operators, multi-stage pipelines).
    • Indicates a need for further investigation or optimization if it appears frequently.
  8. OnlineDDL

    • Vitess-managed DDL statements performed online (e.g., schema migrations orchestrated by vtgate).
    • Tracked separately to measure usage and performance impact of online operations.
  9. DirectDDL

    • DDL statements that are directly passed to the underlying MySQL instances.
    • Does not go through Vitess’s online migration flow.
  10. Transaction

    • Statements that are related to transaction like (begin, commit, rollback, savepoint, etc.).
    • Does not go through Vitess’s online migration flow.
  11. Topology

    • Statements that calls into Topo server to retrieve the metadata information for response.

3.3 Query Classification

  • Purpose: Categorize queries by the high-level SQL statement type.

Possible Categories (not exhaustive):

  • SELECT
  • INSERT
  • UPDATE
  • DELETE
  • SET
  • DDL (could be further subdivided if desired)
  • Others as needed (ALTER, CREATE, DROP, etc.)

3.4 New Metrics

Three dimensions are added to the new metrics.
Plan - Plan classification
Query - Query classification
Tablet - Target Tablet Type (Primary, Replica, Rdonly, etc.)

  1. QueryExecutions - Counts queries executed at VTGate
  2. QueryRoutes - Counts queries routed from VTGate to VTTablet
  3. QueryExecutionsByTable - Counts queries executed at VTGate per table. This has Query and Table Name as dimension.

4. Backward Compatibility

  • We will keep the old metrics running in parallel, marked as “DEPRECATED,” for at least one release cycle.
  • Developers and operators should be encouraged to migrate to the new metrics.

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions