CRITICAL: first_seen/last_seen history silently dropped on every Kvrocks rebuild #140

@t0kubetsu

Summary

Every --rebuild and --rebuild-from-meili run permanently discards all accumulated first_seen/last_seen history. This is a regression introduced in the v0.2606.0 refactor of index_kvrocks.py.

Root cause

tools/index_kvrocks.py:843–847:

seen_snapshot = (
    rebuild_kvrocks(indexer, include_tags=args.retag)   # always returns None
    if args.rebuild or args.rebuild_from_meili
    else None
)

rebuild_kvrocks() ends with an implicit return None. The return value is assigned to seen_snapshot, which is forwarded to apply_seen_snapshot(). That function bails immediately when seen_snapshot is falsy (line 307), so no timestamps are ever restored after a rebuild.
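The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: a plain dict stands in for the Kvrocks index, and the two functions are simplified stand-ins that borrow the real names only to mirror the call pattern. Their bodies and signatures are assumptions.

```python
def rebuild_kvrocks(index):
    """Stand-in: clears the index, then falls off the end (implicit None)."""
    index.clear()


def apply_seen_snapshot(index, seen_snapshot):
    """Stand-in: restores timestamps, bailing out on a falsy snapshot."""
    if not seen_snapshot:
        return 0  # nothing to restore
    for uid, seen in seen_snapshot.items():
        index.setdefault(uid, {}).update(seen)
    return len(seen_snapshot)


# Hypothetical document with accumulated history.
index = {"uid-1": {"first_seen": 1_700_000_000, "last_seen": 1_710_000_000}}

# Same shape as the buggy call site: the rebuild's return value is
# treated as the snapshot, but it is always None.
seen_snapshot = rebuild_kvrocks(index)
restored = apply_seen_snapshot(index, seen_snapshot)
```

After this runs, `seen_snapshot` is `None`, `restored` is `0`, and the history that lived in `index` is gone.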

The function snapshot_seen_values() still exists at line 414 and correctly captures timestamps from the live Kvrocks index before keys are cleared. It was called in the pre-v0.2606.0 flow but was silently removed during the refactor.

first_seen is documented in CLAUDE.md as accumulated Kvrocks state that cannot be reconstructed from Meilisearch alone. This regression means every operator-triggered rebuild permanently destroys historical first-seen timestamps with no warning.

Note: reimport_port_dump.py (line 666) explicitly deletes doc:* keys before calling the same pipeline, so the in-place-merge fallback documented in the rebuild_kvrocks() comment is also unavailable for the port-dump path.

Fix

seen_snapshot = None
if args.rebuild or args.rebuild_from_meili:
    seen_snapshot = snapshot_seen_values(indexer)   # capture before clearing
    rebuild_kvrocks(indexer, include_tags=args.retag)
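Why the ordering in the fix matters can be shown with a self-contained sketch. Everything here is an assumption made for illustration: a dict stands in for the Kvrocks index, the three functions are simplified stand-ins that reuse the real names, `run_rebuild` and `REBUILD_TIME` are hypothetical, and the merge rule (keep the earliest first_seen and the latest last_seen) is a guess at reasonable behaviour rather than the documented one.

```python
REBUILD_TIME = 1_720_000_000  # hypothetical timestamp assigned during a rebuild


def snapshot_seen_values(index):
    """Stand-in: copy the timestamps out of the live index."""
    return {
        uid: {"first_seen": doc["first_seen"], "last_seen": doc["last_seen"]}
        for uid, doc in index.items()
    }


def rebuild_kvrocks(index):
    """Stand-in: wipe the index and re-create each document without history."""
    uids = list(index)
    index.clear()
    for uid in uids:
        index[uid] = {"first_seen": REBUILD_TIME, "last_seen": REBUILD_TIME}


def apply_seen_snapshot(index, seen_snapshot):
    """Stand-in: merge saved timestamps back into the rebuilt documents."""
    if not seen_snapshot:
        return 0
    restored = 0
    for uid, seen in seen_snapshot.items():
        doc = index.get(uid)
        if doc is None:
            continue  # document no longer exists after the rebuild
        doc["first_seen"] = min(doc["first_seen"], seen["first_seen"])
        doc["last_seen"] = max(doc["last_seen"], seen["last_seen"])
        restored += 1
    return restored


def run_rebuild(index):
    """Capture first, clear second, restore last."""
    seen_snapshot = snapshot_seen_values(index)  # must happen before clearing
    rebuild_kvrocks(index)
    return apply_seen_snapshot(index, seen_snapshot)


index = {"uid-1": {"first_seen": 1_700_000_000, "last_seen": 1_710_000_000}}
restored = run_rebuild(index)
```

With the snapshot taken before the rebuild, the original first_seen for `uid-1` survives. Swapping the first two calls in `run_rebuild` would capture only the post-rebuild timestamps.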

File

tools/index_kvrocks.py

Verification

Run --rebuild-from-meili on a populated index. Record first_seen for a sample of UIDs before the run and confirm that each value is unchanged afterwards.
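The before/after comparison could be scripted along these lines. This is only a sketch: `diff_first_seen` is a hypothetical helper, and it assumes the sampled values have already been read into two plain `{uid: first_seen}` dicts. How they are fetched from Kvrocks is left out.

```python
def diff_first_seen(before, after):
    """Return {uid: (old, new)} for each UID whose first_seen changed or vanished."""
    return {
        uid: (ts, after.get(uid))
        for uid, ts in before.items()
        if after.get(uid) != ts
    }


# Hypothetical samples taken before and after a rebuild.
before = {"uid-1": 1_700_000_000, "uid-2": 1_705_000_000, "uid-3": 1_706_000_000}
after = {"uid-1": 1_700_000_000, "uid-2": 1_720_000_000}

regressions = diff_first_seen(before, after)
```

An empty result means every sampled timestamp survived. In the sample data above, `uid-2` was overwritten and `uid-3` disappeared, so both are reported.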
