
Tracking issue: sample-based NDV for large ANALYZE jobs #67449

@0xPoe


Summary

Track the implementation of sample-based NDV (number of distinct values) collection for ANALYZE on large tables.

Goal

Reduce the TiKV-side cost of NDV collection for very large ANALYZE jobs while keeping NDV accuracy acceptable.

Scope

  • implement sample-based NDV collection for Analyze V2
  • define how it is enabled: an explicit knob, an automatic trigger for very large or slow ANALYZE jobs, or both
  • benchmark both NDV accuracy and resource usage on representative datasets
  • document its behavior and limitations clearly

Out of scope

The first step does not promise to avoid full scans or to reduce IOPS. The initial implementation mainly targets TiKV CPU cost.
