Feature: retention time marker procedure #5518

Open
@dantengsky

Summary

Provide a way to mark historical snapshots as invisible, so that old snapshots (and possibly the data they reference) can be phased out gradually.


Basic description of the functionality:

Marks the latest visible snapshot of the given table.

  • A system configuration setting, say table_retention_time: Duration
  • A system procedure, which marks the latest visible snapshot of a table
    • by inserting or updating a designated key in the KV service
    • time travel over table data will respect this mark

NOTE: The query nodes rely on their local clocks, which are NOT perfectly synchronized.


Basic idea of the implementation:

  • provide a system procedure, say
    call system$retention_mark([database_name,] table_name)
    • fetch the metadata of the specified table
    • check whether the key LATEST_VISBLE_SNAPHOST of the given table exists
      LATEST_VISBLE_SNAPHOST/<tid> -> timestamp
      • if it exists and its value is less than (now() + table_retention_time)
        try to update it to (now() + table_retention_time)
      • if it does not exist
        try to insert the kv pair
    • the mutations should be executed in a KV transaction
      • the most important invariant of this operation:
        the value of LATEST_VISBLE_SNAPHOST/<tid> must only ever increase
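
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Databend implementation: `InMemoryKV`, its `compare_and_swap` method, the `retention_mark` signature, and the retry limit are all hypothetical stand-ins for the real meta KV service and its transaction API. The key prefix is spelled exactly as in this issue, and the candidate value uses the formula exactly as written above.

```python
import threading
from datetime import datetime, timedelta, timezone

# Key prefix spelled exactly as in the proposal text.
KEY_PREFIX = "LATEST_VISBLE_SNAPHOST"


class InMemoryKV:
    """Hypothetical stand-in for the KV service, offering a compare-and-swap."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def compare_and_swap(self, key, expected, new):
        """Store `new` only if the current value equals `expected` (None = absent)."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True


def retention_mark(kv, table_id, retention, now=None, max_retries=8):
    """Advance the mark of `table_id` and return the value stored afterwards."""
    if now is None:
        now = datetime.now(timezone.utc)  # local clock, possibly skewed
    candidate = now + retention  # formula as written in the proposal
    key = f"{KEY_PREFIX}/{table_id}"
    for _ in range(max_retries):
        current = kv.get(key)
        if current is not None and current >= candidate:
            # Invariant: the stored value is never moved backwards.
            return current
        # Insert (current is None) or update, failing if another writer raced us.
        if kv.compare_and_swap(key, current, candidate):
            return candidate
    raise RuntimeError("retention_mark: too many conflicting writers")
```

Re-reading the key and retrying after a failed compare-and-swap is one simple way to keep the "only increase" invariant when several query nodes call the procedure concurrently; a real KV transaction with a condition on the previous value would serve the same purpose.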

Notes:

  • if database_name is not provided, use the context's current database name
  • A "hurried" marker, i.e. a node whose clock runs far ahead of real time, may set LATEST_VISBLE_SNAPHOST "incorrectly"
    • We have to live with it and hope the skew is not too large : )

      e.g. if the clock is two months ahead, the history of the table may not be accessible for the next two months.

      To mitigate this situation:
      The value of LATEST_VISBLE_SNAPHOST/<tid> could instead be set to the timestamp of a snapshot, by navigating to the snapshot S at (now() + table_retention_time).
      Thus, snapshots generated after S can become accessible once the clocks return to normal.

    • The "current snapshot" referenced by the KV meta is always visible
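
The mitigation and the "current snapshot" rule might look something like the sketch below. It rests on assumptions the issue does not spell out: that navigating to "the snapshot at T" means the latest snapshot whose timestamp is at or before T, and that a snapshot counts as visible when its timestamp is not older than the mark. The function names `clamp_to_snapshot` and `is_visible` are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone


def clamp_to_snapshot(snapshot_times, target):
    """Return the timestamp of the latest snapshot at or before `target`.

    `snapshot_times` must be sorted in ascending order. Returns None if
    every snapshot is newer than `target`.
    """
    i = bisect_right(snapshot_times, target)
    return snapshot_times[i - 1] if i else None


def is_visible(snapshot_time, mark, current_snapshot_time):
    """Assumed visibility rule for time travel."""
    if snapshot_time == current_snapshot_time:
        return True  # the current snapshot is always visible
    # No mark yet means nothing has been hidden (assumption).
    return mark is None or snapshot_time >= mark


# A node whose clock is 60 days ahead marks a table with three snapshots.
base = datetime(2022, 5, 1, tzinfo=timezone.utc)
snaps = [base + timedelta(days=d) for d in (0, 1, 2)]
skewed_now = base + timedelta(days=60)
mark = clamp_to_snapshot(snaps, skewed_now + timedelta(days=1))
```

With the clamp, the stored mark is the timestamp of the newest existing snapshot rather than a moment far in the future, so snapshots written afterwards by well-behaved nodes compare as newer than the mark.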

Labels: A-storage (Area: databend storage), C-feature (Category: feature)