Clarify Empty region directory recovery workflow for standalone deployments #8374

Description

@yimeng

Summary

The current documentation for table metadata reconciliation describes the Empty region directory recovery path, but it is unclear whether the documented Recovery Mode + ADMIN reconcile_* workflow applies to standalone deployments.

In a standalone GreptimeDB v1.1.1 environment, I was unable to use the documented workflow to recover from an Empty region directory inconsistency between metadata and data.

Environment

  • GreptimeDB: v1.1.1
  • Deployment mode: standalone
  • Storage: local file storage
  • WAL provider: raft_engine

Symptom

Startup fails when a table/region still exists in metadata, but its physical region directory no longer exists:

Error: 0: Failed to start datanode
1: Unexpected, violated: Failed to open batch regions
1: Empty region directory, region_id: <region_id>(<table_id>, 0), region_dir: data/greptime/<database>/<table_id>/<table_id>_0000000000/

This matches the Empty region directory scenario described in the docs.
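For triage, the failing line can be parsed to confirm what is actually on disk. The following is a minimal sketch, not part of GreptimeDB: the function names, the assumption that the ids are printed as decimal integers, and the assumption that region_dir is relative to some base directory (`base_dir`) are all mine.

```python
import re
from pathlib import Path

# Pattern for the "Empty region directory" line shown in the startup error.
# Assumption: the ids are decimal integers and region_dir contains no spaces.
EMPTY_REGION_RE = re.compile(
    r"Empty region directory, region_id: (?P<region_id>\d+)"
    r"\((?P<table_id>\d+), (?P<region_number>\d+)\), "
    r"region_dir: (?P<region_dir>\S+)"
)


def parse_empty_region_error(line):
    """Return the ids and directory named in the error line, or None."""
    match = EMPTY_REGION_RE.search(line)
    if match is None:
        return None
    return {
        "region_id": int(match["region_id"]),
        "table_id": int(match["table_id"]),
        "region_number": int(match["region_number"]),
        "region_dir": match["region_dir"],
    }


def region_dir_state(base_dir, region_dir):
    """Classify the directory as 'missing', 'empty', or 'non-empty'."""
    path = Path(base_dir) / region_dir
    if not path.is_dir():
        return "missing"
    return "empty" if not any(path.iterdir()) else "non-empty"
```

With a made-up line such as `1: Empty region directory, region_id: 4398046511104(1024, 0), region_dir: data/greptime/public/1024/1024_0000000000/`, `parse_empty_region_error` yields the table id and directory, and `region_dir_state` then distinguishes a directory that is gone from one that exists but holds no files.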

What I tried

1. Enable recovery_mode = true in the standalone config

I created a recovery config based on the production standalone config and added:

recovery_mode = true

However, startup still attempted to open all regions synchronously and failed with the same Empty region directory error.

I also checked:

greptime standalone start --help

but the CLI help output did not list any option related to recovery or recovery_mode.

2. Use SQL reconciliation functions

After starting the standalone instance with background region initialization enabled:

init_regions_in_background = true

I tried:

ADMIN reconcile_database('some_database');

But standalone returned:

Unsupported operation reconcile

So it was not clear how the documented reconciliation workflow should be applied in standalone mode.

3. DROP TABLE works only when table metadata is still visible

For a Mito table that was still visible in information_schema.tables, DROP TABLE could remove it.

But for metadata residue where:

SHOW TABLES FROM some_database;

returned empty, and:

SELECT * FROM information_schema.tables WHERE table_schema = 'some_database';

also returned zero rows, DROP DATABASE could still fail with:

Table info not found: greptime.some_database.some_table

In that state, normal SQL operations could not clean the remaining metadata residue.

Documentation questions

Could the docs clarify the following?

  1. Is Recovery Mode supported in standalone mode, or only in distributed Metasrv/Datanode deployments?
  2. If standalone supports Recovery Mode, what is the exact config key and placement? For example, should it be top-level recovery_mode = true, under a specific section, or enabled by a CLI flag?
  3. Which GreptimeDB versions support this recovery workflow?
  4. Are ADMIN reconcile_table, ADMIN reconcile_database, and ADMIN reconcile_catalog expected to work in standalone mode?
  5. For standalone mode, what is the recommended recovery procedure when:
    • Empty region directory prevents normal startup, and
    • the problematic table is no longer visible in information_schema.tables, and
    • DROP DATABASE fails with Table info not found?
  6. Is init_regions_in_background = true an expected temporary workaround to bring the instance up, or is it unsafe for this scenario?

Suggested documentation improvement

It would be helpful to add a standalone-specific section for Empty region directory, including:

  • whether Recovery Mode is available in standalone;
  • exact example config/command;
  • limitations of ADMIN reconcile_* in standalone;
  • what to do when metadata residue is not visible via SQL;
  • a safe checklist for backup, temporary startup, cleanup, and verification.
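As an illustration of the backup step in such a checklist, here is a minimal sketch under stated assumptions: the instance is stopped, storage is local files as in this report, and a plain copy of the data directory is an acceptable backup. The function and parameter names (`backup_data_home`, `data_home`, `backup_root`) are hypothetical and not taken from GreptimeDB.

```python
import shutil
import time
from pathlib import Path


def backup_data_home(data_home, backup_root):
    """Copy data_home into a new timestamped directory under backup_root.

    Assumption: the GreptimeDB process is stopped, so files are not
    changing while they are copied.
    """
    source = Path(data_home)
    if not source.is_dir():
        raise FileNotFoundError(f"data home not found: {source}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{source.name}-{stamp}"
    # copytree refuses to overwrite an existing target, so an earlier
    # backup is never clobbered by a later run.
    shutil.copytree(source, target)
    return target
```

Taking the copy before any temporary startup or cleanup attempt keeps the original state available if a recovery step makes things worse. Whether a file-level copy is sufficient for every storage and WAL configuration is something the documentation would need to confirm.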
