[ENHANCEMENT] Implement tensor delta log, manifest freshness model, and snapshot-based update worker #5471

@makr-code

Summary

Implement a production-safe dynamic tensor update path for ThemisDB based on:

  • tensor delta logging
  • manifest-driven freshness/state metadata
  • snapshot-based rebuild
  • advisory-only tensor artifacts
  • exact graph fallback

This issue is the concrete engineering bridge between:

  • dynamic graph updates under RocksDB/MVCC,
  • distributed tensor artifacts,
  • hybrid query planning,
  • and graph-verified finalization.

Problem

A production knowledge graph changes continuously under:

  • insert
  • update
  • delete
  • MVCC / transactional write workloads

Exact RocksDB-backed state can be updated quickly, but tensor artifacts such as:

  • tensor summaries
  • routing tensors
  • factorized artifacts
  • shard summaries
  • optional TT-like structures

cannot safely be recomputed synchronously in the commit path without unacceptable complexity, latency, and correctness risk.

The system therefore needs a derived-artifact maintenance model that:

  • preserves exact graph truth,
  • supports asynchronous tensor freshness,
  • remains planner-visible,
  • and never treats tensor artifacts as final truth-bearing state.

Goals

Design and implement a dynamic tensor-maintenance path with:

  1. tensor delta log
  2. manifest freshness/state model
  3. snapshot-based update worker
  4. patch / partial-refit / rebuild lifecycle
  5. planner/runtime compatibility contract
  6. exact graph fallback guarantees

Architectural Position

This issue must follow the explicit Tensor-Graph architecture boundaries:

  • Graph Truth remains authoritative
  • Tensor artifacts remain advisory-only
  • summary-first retrieval may guide candidate selection
  • exact-on-demand graph loading remains mandatory where correctness matters
  • graph-verified finalization remains the final correctness boundary

This issue must not attempt to make tensor artifacts fully ACID-synchronous with every graph mutation.

Instead, it should define a controlled model where:

  • exact graph state is immediately correct,
  • tensor artifacts are versioned and freshness-scoped,
  • planner/runtime can accept or reject them based on policy.

Phase 1: Design / API Contract

  • Define tensor delta log schema
  • Define manifest freshness/state model
  • Define advisory-only artifact semantics
  • Define relationship between exact graph state and derived tensor state
  • Define planner/runtime contract for artifact usability

Required delta-log concepts

  • mutation type (insert, update, delete)
  • affected entity / relation / shard
  • sequence / commit ordering
  • source transaction / snapshot linkage
  • optional artifact partition hints
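
The concepts above could map onto a record type roughly as follows. This is only a minimal sketch: the names `TensorDeltaRecord`, `TensorDeltaLog`, and every field name are hypothetical and not taken from ThemisDB, and the in-memory list stands in for whatever persisted, append-only storage the real log would use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class MutationType(Enum):
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"


@dataclass(frozen=True)
class TensorDeltaRecord:
    """One delta-log entry (all field names are illustrative assumptions)."""
    seq: int                  # commit ordering; strictly increasing
    mutation: MutationType    # insert / update / delete
    entity_id: str            # affected entity or relation key
    shard_id: int             # affected shard
    txn_id: int               # source transaction linkage
    snapshot_seq: int         # snapshot at which the mutation became visible
    partition_hint: Optional[str] = None  # optional artifact partition hint


class TensorDeltaLog:
    """In-memory stand-in for a persisted, append-only delta log."""

    def __init__(self) -> None:
        self._records: List[TensorDeltaRecord] = []

    def append(self, record: TensorDeltaRecord) -> None:
        # Reject out-of-order or duplicate sequence numbers.
        if self._records and record.seq <= self._records[-1].seq:
            raise ValueError("delta log sequence numbers must strictly increase")
        self._records.append(record)

    def window(self, seq_start: int, seq_end: int) -> List[TensorDeltaRecord]:
        """Return records with seq_start <= seq <= seq_end, in commit order."""
        return [r for r in self._records if seq_start <= r.seq <= seq_end]
```

A worker would then call `window(source_seq_end + 1, head_seq)` to obtain the deltas not yet reflected in an artifact.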

Required manifest fields

  • source_seq_start
  • source_seq_end
  • delta_lag
  • artifact_age_ms
  • residual
  • rank_cap
  • rank_status
  • advisory_only
  • rebuild_state
  • invalidation_reason
  • update_mode = patch | partial_refit | rebuild
  • last_rebuild_at
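
To make the field list concrete, here is one possible shape for the manifest. The field names come from the list above; the types, the example value sets in the comments, the JSON encoding, and the `TensorManifest` class name are all assumptions made for this sketch.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class UpdateMode(Enum):
    PATCH = "patch"
    PARTIAL_REFIT = "partial_refit"
    REBUILD = "rebuild"


@dataclass
class TensorManifest:
    source_seq_start: int
    source_seq_end: int
    delta_lag: int                      # assumed: head seq minus source_seq_end
    artifact_age_ms: int
    residual: float
    rank_cap: int
    rank_status: str                    # assumed values: "ok" | "at_cap" | "breached"
    advisory_only: bool
    rebuild_state: str                  # assumed values: "fresh" | "stale" | "rebuilding" | ...
    invalidation_reason: Optional[str]
    update_mode: UpdateMode
    last_rebuild_at: Optional[str]      # assumed: ISO-8601 UTC timestamp

    def __post_init__(self) -> None:
        # The issue requires artifacts to be advisory-only; refuse anything else.
        if not self.advisory_only:
            raise ValueError("tensor artifacts must be marked advisory_only")

    def to_json(self) -> str:
        data = asdict(self)
        data["update_mode"] = self.update_mode.value  # enums are not JSON-serializable
        return json.dumps(data, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "TensorManifest":
        data = json.loads(text)
        data["update_mode"] = UpdateMode(data["update_mode"])
        return cls(**data)
```

Rejecting `advisory_only=False` at construction time is one way to keep a manifest from ever describing a truth-bearing artifact; a real implementation might enforce this elsewhere.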

Phase 2: Core Implementation

  • Implement tensor delta log writing on exact graph commit path
  • Implement manifest state persistence
  • Implement snapshot-based update worker
  • Implement artifact publish/swap model
  • Implement artifact invalidation path
  • Implement exact fallback integration hooks for planner/runtime
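
For the publish/swap item, a common file-based approach is to write the new manifest to a temporary file, flush it to disk, and then rename it over the live one, so readers see either the old or the new manifest and never a partial write. The sketch below assumes a manifest stored as a JSON file; the `publish_manifest` name and the `MANIFEST.json` file name are hypothetical, and ThemisDB may well persist manifests in RocksDB instead.

```python
import json
import os
import tempfile


def publish_manifest(directory: str, manifest: dict) -> str:
    """Atomically replace the live manifest file and return its path."""
    final_path = os.path.join(directory, "MANIFEST.json")  # hypothetical file name
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".manifest-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(manifest, f, sort_keys=True)
            f.flush()
            os.fsync(f.fileno())  # make the bytes durable before the swap
        os.replace(tmp_path, final_path)  # atomic rename on POSIX
    except BaseException:
        # Leave no half-written temp file behind if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

If the worker crashes before `os.replace`, the previous manifest is still in place, which fits the requirement that artifacts be immutable and swapped rather than edited in place.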

Worker responsibilities

  • consume delta windows
  • decide patch vs partial refit vs rebuild
  • create new immutable artifact outputs
  • publish updated manifest state
  • mark stale / failed / rebuilding states
  • recover safely after interruption
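
The "patch vs partial refit vs rebuild" decision could start out as a simple threshold rule. The sketch below is an assumption-heavy illustration: `WorkerPolicy`, `choose_update_mode`, and all threshold values are hypothetical placeholders, and the real crossover points are exactly what Phase 5 is meant to measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerPolicy:
    # Placeholder thresholds; real values should come from benchmarks.
    max_patch_deltas: int = 1_000
    max_refit_deltas: int = 50_000
    max_residual: float = 0.05


def choose_update_mode(delta_count: int, residual: float, rank: int,
                       rank_cap: int, policy: WorkerPolicy) -> str:
    """Pick an update_mode value for the next delta window."""
    # A rank-cap breach or a high residual means the artifact has drifted too far
    # for an incremental update; prefer the safe full rebuild.
    if rank > rank_cap or residual > policy.max_residual:
        return "rebuild"
    if delta_count <= policy.max_patch_deltas:
        return "patch"
    if delta_count <= policy.max_refit_deltas:
        return "partial_refit"
    return "rebuild"
```

Biasing toward `rebuild` whenever a quality signal is out of bounds matches the stated preference for rebuild safety over aggressive partial-refit sophistication in initial versions.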

Phase 3: Failure Handling & Edge Cases

  • Handle stale artifact windows that exceed policy
  • Handle failed partial refit
  • Handle rank-cap breach
  • Handle residual above planner threshold
  • Handle worker crash during rebuild
  • Handle snapshot incompatibility / missing source range
  • Handle artifact invalidation while queries are active

Important edge cases

  • exact graph updated but tensor artifact still stale
  • shard-local summaries disagree with exact fragment fetch
  • rebuild backlog grows faster than worker throughput
  • planner sees only advisory summaries and must fall back
  • partial update is more expensive than full rebuild

Phase 4: Tests

  • Add unit tests for tensor delta log
  • Add unit tests for manifest state transitions
  • Add integration tests for snapshot rebuild worker
  • Add planner compatibility tests
  • Add crash-recovery tests
  • Add exact-fallback enforcement tests

Required test groups

  • test_tensor_delta_log
  • test_tensor_manifest
  • test_tensor_update_worker
  • test_tensor_snapshot_consistency
  • test_tensor_planner_policy
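
As an illustration of what a `test_tensor_manifest` case might check, the sketch below encodes manifest state transitions as a table and tests both a legal path and an illegal jump. The state names are drawn from the states mentioned in this issue (stale, failed, rebuilding, invalid) plus an assumed `fresh` state; the transition table itself and the function names are hypothetical and would need to be agreed in Phase 1.

```python
# Assumed transition table: current state -> states it may move to.
ALLOWED_TRANSITIONS = {
    "fresh": {"stale", "invalid"},
    "stale": {"rebuilding", "invalid"},
    "rebuilding": {"fresh", "failed"},
    "failed": {"rebuilding", "invalid"},
    "invalid": {"rebuilding"},
}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError("illegal manifest transition: %s -> %s" % (current, target))
    return target


def test_tensor_manifest_transitions() -> None:
    # Legal lifecycle: fresh -> stale -> rebuilding -> fresh.
    state = "fresh"
    for nxt in ("stale", "rebuilding", "fresh"):
        state = transition(state, nxt)
    assert state == "fresh"

    # A stale artifact must not become fresh without going through a rebuild.
    try:
        transition("stale", "fresh")
    except ValueError:
        pass
    else:
        raise AssertionError("stale -> fresh should be rejected")
```

The second assertion captures the checklist item that stale artifacts never silently become usable again.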

Phase 5: Performance / Hardening

  • Measure commit overhead of delta logging
  • Measure worker throughput by delta window size
  • Measure patch vs partial refit vs rebuild crossover points
  • Measure stale-artifact backlog growth
  • Measure planner fallback frequency under update pressure
  • Measure artifact publish/swap overhead

Performance questions

  • How much write-path overhead does tensor delta logging add?
  • When is patch cheaper than rebuild?
  • When does partial refit become unstable?
  • How large can delta lag become before summaries lose routing value?
  • When is distributed freshness debt unacceptable?

Phase 6: Documentation & Acceptance

  • Document tensor delta log design
  • Document manifest freshness/state model
  • Document snapshot-based update worker lifecycle
  • Document advisory-only semantics
  • Document exact graph fallback behavior
  • Document operator/developer guidance for stale / invalid / rebuilding artifacts

Phase 7: Integration

  • Integrate with hybrid query planner
  • Integrate with distributed tensor summary flows
  • Integrate with observability / metrics pipeline
  • Integrate with benchmark suite
  • Integrate with CI test coverage

Acceptance Criteria

  • Commit path does not require synchronous full tensor recomputation
  • Tensor artifacts are explicitly marked advisory-only
  • Snapshot-based rebuild is defined and crash-safe
  • Planner/runtime can reject stale or invalid artifacts
  • Exact graph truth remains authoritative for final correctness
  • Artifact states are visible via manifest and testable in CI
  • Dynamic tensor maintenance path is benchmarkable and observable

Production Readiness Checklist

  • exact graph fallback always available
  • stale artifacts never silently become truth-bearing
  • manifest states are persisted and inspectable
  • worker restart/recovery is reproducible
  • planner freshness gating is enforced
  • metrics exist for lag, residual, rebuild state, and fallback frequency
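
The metrics item could begin with something as small as the collector below. It is a sketch only: `TensorMaintenanceMetrics` and its method names are hypothetical, and a real deployment would export these values through the existing observability pipeline rather than hold them in process memory.

```python
from collections import Counter
from typing import Mapping, Optional


class TensorMaintenanceMetrics:
    """Tracks lag, residual, rebuild state, and fallback frequency."""

    def __init__(self) -> None:
        self.queries = 0
        self.fallbacks = Counter()       # fallback reason -> count
        self.delta_lag = 0
        self.residual = 0.0
        self.rebuild_state = "unknown"

    def observe_manifest(self, manifest: Mapping) -> None:
        # Copy the latest gauge values from a published manifest.
        self.delta_lag = manifest["delta_lag"]
        self.residual = manifest["residual"]
        self.rebuild_state = manifest["rebuild_state"]

    def record_query(self, fallback_reason: Optional[str] = None) -> None:
        # Count every query; count a fallback only when a reason is given.
        self.queries += 1
        if fallback_reason is not None:
            self.fallbacks[fallback_reason] += 1

    def fallback_frequency(self) -> float:
        """Fraction of queries that fell back to the exact graph path."""
        if self.queries == 0:
            return 0.0
        return sum(self.fallbacks.values()) / self.queries
```

Keying fallbacks by reason makes it possible to tell lag-driven fallbacks apart from residual-driven ones when tuning policy.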

Known Issues & Limitations

  • Fully transactional inline TT-core maintenance is out of scope
  • Highly optimized incremental tensor-train update under MVCC is still research-heavy
  • GPU-based tensor update acceleration should remain optional until benchmarked
  • Initial versions may prefer rebuild safety over aggressive partial-refit sophistication

Breaking Changes

  • none initially

References

  • TARGET_ARCHITECTURE.md
  • HARDWARE_REQUIREMENTS.md
  • DISTRIBUTED_TENSOR_SHARDING.md
  • TENSOR_GRAPH_RESEARCH_ALIGNMENT.md
