[Feature]: Add dedicated metrics for Snapshot feature

## Is there an existing issue for this?

- [x] I have searched the existing issues

## Is your feature request related to a problem? Please describe.

The Snapshot feature (introduced in #44358) currently relies only on generic Proxy-level metrics (`milvus_proxy_function_call_total` and `milvus_proxy_req_latency`). These track API call counts and latency but provide no snapshot-specific operational insight.

**Current limitations:**
1. Cannot monitor the total number of snapshots across collections
2. Cannot track snapshot storage consumption
3. Cannot observe restore operation progress in real time via metrics
4. Cannot measure snapshot creation/restore duration at the DataCoord level
5. No visibility into snapshot-referenced data that cannot be garbage collected

This makes it difficult to:
- Set up alerting for snapshot storage growth
- Monitor restore job progress via Grafana dashboards
- Understand the storage impact of snapshots on object storage costs
- Troubleshoot snapshot-related performance issues

## Describe the solution you'd like

Add dedicated Prometheus metrics for the Snapshot feature:

### Snapshot Inventory Metrics
| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `milvus_snapshot_total` | Gauge | `collection_id`, `db_name` | Number of snapshots per collection |
| `milvus_snapshot_storage_bytes` | Gauge | `collection_id`, `snapshot_name` | Storage size of each snapshot's data, in bytes |
| `milvus_snapshot_referenced_storage_bytes` | Gauge | `collection_id` | Storage size, in bytes, of data that snapshots reference and that therefore cannot be garbage collected |
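
To illustrate how the three inventory gauges could relate to each other, here is a minimal Python sketch. The `Snapshot` record and its `files` mapping are hypothetical and do not reflect the actual DataCoord metadata. It also assumes that a file shared by several snapshots counts once toward `milvus_snapshot_referenced_storage_bytes`, which this issue does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical snapshot record; the real DataCoord metadata may differ."""
    collection_id: int
    db_name: str
    name: str
    files: dict  # assumed layout: object-storage path -> size in bytes


def inventory_gauges(snapshots):
    """Return the values the three proposed inventory gauges might report."""
    total = defaultdict(int)              # (collection_id, db_name) -> count
    storage = {}                          # (collection_id, snapshot_name) -> bytes
    referenced_files = defaultdict(dict)  # collection_id -> {path: size}

    for snap in snapshots:
        total[(snap.collection_id, snap.db_name)] += 1
        storage[(snap.collection_id, snap.name)] = sum(snap.files.values())
        # Assumption: a file shared by several snapshots is counted once.
        referenced_files[snap.collection_id].update(snap.files)

    referenced = {
        cid: sum(files.values()) for cid, files in referenced_files.items()
    }
    return dict(total), storage, referenced
```

Under this assumption, the per-snapshot sizes for a collection can add up to more than its referenced size, because shared files appear in every snapshot that includes them.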

### Snapshot Operation Metrics
| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `milvus_snapshot_create_duration_seconds` | Histogram | `collection_id`, `status` | Time taken to create a snapshot, in seconds |
| `milvus_snapshot_restore_duration_seconds` | Histogram | `collection_id`, `status` | Time taken to restore a snapshot, in seconds |
| `milvus_snapshot_restore_progress_ratio` | Gauge | `job_id`, `snapshot_name` | Restore progress as a ratio from 0.0 to 1.0 |
| `milvus_snapshot_restore_jobs_total` | Gauge | `state` | Number of restore jobs in each state (`pending`, `in_progress`, `completed`, `failed`) |
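
As a rough illustration of the two restore gauges, the sketch below derives their values from a list of jobs. The `RestoreJob` fields, in particular `restored_segments` and `total_segments`, are hypothetical. Measuring progress in segments is an assumption, and the real restore job may track progress differently.

```python
from collections import Counter
from dataclasses import dataclass

# States listed in the proposal for milvus_snapshot_restore_jobs_total.
STATES = ("pending", "in_progress", "completed", "failed")


@dataclass
class RestoreJob:
    """Hypothetical restore job record, for illustration only."""
    job_id: str
    snapshot_name: str
    state: str
    restored_segments: int
    total_segments: int


def progress_ratio(job):
    """Value for milvus_snapshot_restore_progress_ratio, clamped to [0.0, 1.0]."""
    if job.state == "completed":
        return 1.0
    if job.total_segments <= 0:
        # Nothing planned yet (for example, a pending job): report no progress.
        return 0.0
    return min(1.0, max(0.0, job.restored_segments / job.total_segments))


def jobs_by_state(jobs):
    """Value per `state` label; states with no jobs are reported as 0."""
    counts = Counter(job.state for job in jobs)
    return {state: counts[state] for state in STATES}
```

Reporting a zero for empty states keeps every `state` series present, which tends to make dashboards and alert expressions simpler to write.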

### Snapshot Error Metrics
| Metric Name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `milvus_snapshot_operation_errors_total` | Counter | `operation`, `error_type` | Total number of snapshot operation errors, by operation and error type |
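
The duration histograms and the error counter would probably be recorded at the same point in the code. The Python sketch below shows one possible shape for that instrumentation, using in-memory dictionaries in place of a real metrics library. The `observe` helper, the `"success"`/`"fail"` values for `status`, and the use of the exception class name as `error_type` are all assumptions made for illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# In-memory stand-ins for the histogram and counter (not a real metrics client).
durations = defaultdict(list)  # (operation, collection_id, status) -> [seconds]
errors = defaultdict(int)      # (operation, error_type) -> count


@contextmanager
def observe(operation, collection_id):
    """Time a snapshot operation and record its outcome."""
    start = time.perf_counter()
    status = "success"
    try:
        yield
    except Exception as exc:
        status = "fail"
        # Assumption: the exception class name serves as the error_type label.
        errors[(operation, type(exc).__name__)] += 1
        raise
    finally:
        elapsed = time.perf_counter() - start
        durations[(operation, collection_id, status)].append(elapsed)
```

Wrapping a call as `with observe("restore", collection_id): ...` records the duration for both outcomes and re-raises any failure, so callers still see the original error.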

## Describe an alternate solution

1. **Extend existing metrics**: Add `operation_type=snapshot_*` labels to existing DataCoord metrics instead of creating new metric families.

2. **Expose via API only**: Keep metrics lightweight and expose detailed snapshot statistics only via `DescribeSnapshot` API responses, letting users build custom exporters.
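
For the second alternative, a custom exporter would mostly need to turn snapshot statistics into the Prometheus text exposition format. The sketch below shows only that rendering step. How the statistics are fetched from `DescribeSnapshot` is left out, and the `render_gauge` helper and its input shape are hypothetical.

```python
def escape_label_value(value):
    """Escape backslash, newline and double quote in a label value."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace("\n", "\\n")
        .replace('"', '\\"')
    )


def render_gauge(name, help_text, samples):
    """Render one gauge family in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs supplied by the caller,
    for example built from snapshot statistics the user has already fetched.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"
```

For example, `render_gauge("milvus_snapshot_total", "Total number of snapshots", [({"collection_id": 1, "db_name": "default"}, 2)])` yields a `# HELP` line, a `# TYPE` line, and one sample line that an HTTP endpoint could serve for Prometheus to scrape.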

## Anything else? (Additional Context)

- Related Feature Issue: https://github.com/milvus-io/milvus/issues/44358
- Current metrics location: `pkg/metrics/proxy_metrics.go`
- Snapshot implementation: `internal/proxy/snapshot_impl.go`, `internal/datacoord/snapshot*.go`

**User Guide Note**: The snapshot user guide mentions "Monitoring: Track snapshot creation times and storage usage" as a best practice, but currently there's no built-in way to do this via metrics.

[Feature]: Add dedicated metrics for Snapshot feature #47097
