
tracking: remote dyn filter #8127

@discord9


Summary

Support remote dynamic filter propagation for distributed queries, so that dynamic filters produced by joins on the frontend can be registered, serialized, and propagated to remote datanode scans, letting those scans prune data earlier. This is an optimization-only feature: any failure must safely degrade to local-only dynamic filtering and must never affect query correctness.
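The "optimization-only" rule can be read as follows: a remote filter may only remove rows it can prove do not match, and any state other than a fully received filter must behave as if no remote filter existed. Below is a minimal sketch of that rule. It assumes a simple min/max bounds filter on an `i64` join key. `RemoteFilterState`, `may_match`, and `can_skip_range` are hypothetical names, not GreptimeDB or DataFusion APIs.

```rust
/// Hypothetical state of a remote dynamic filter as seen by a datanode scan.
/// Names are illustrative only; they are not GreptimeDB or DataFusion APIs.
#[derive(Debug, Clone, PartialEq)]
pub enum RemoteFilterState {
    /// No update has arrived from the frontend yet.
    Pending,
    /// The frontend published inclusive join-key bounds [min, max].
    Ready { min: i64, max: i64 },
    /// Transport, decoding, or version negotiation failed.
    Failed,
}

impl RemoteFilterState {
    /// Returns `false` only when the row provably cannot match the join.
    /// Every state other than `Ready` keeps the row, so a missing or broken
    /// remote filter degrades to local-only filtering instead of dropping data.
    pub fn may_match(&self, key: i64) -> bool {
        match self {
            RemoteFilterState::Ready { min, max } => *min <= key && key <= *max,
            RemoteFilterState::Pending | RemoteFilterState::Failed => true,
        }
    }

    /// Coarse pruning from scan statistics: a row group whose
    /// [group_min, group_max] does not overlap the published bounds can be skipped.
    pub fn can_skip_range(&self, group_min: i64, group_max: i64) -> bool {
        match self {
            RemoteFilterState::Ready { min, max } => group_max < *min || group_min > *max,
            // Without a usable remote filter, never skip anything.
            RemoteFilterState::Pending | RemoteFilterState::Failed => false,
        }
    }
}
```

Because `Pending` and `Failed` both fall through to "keep everything", a lost update or a decode error costs a pruning opportunity but never correctness.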

Related PRs

  • docs: rfc for remote dyn filter #7931
  • feat: remote dyn filter basics #7979
  • frontend producer / bridge / bounded initial-register handoff (not yet submitted)
  • datanode remote filter state / apply runtime (not yet submitted)
  • observability, fallback, and lifecycle protection (not yet submitted)
  • end-to-end validation and performance baseline (not yet submitted)
  • large build-side membership payload, e.g. Bloom-based representation (future enhancement)
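A Bloom filter is one plausible fit for the large build-side membership item. It is a compact bit array that never produces false negatives, so a datanode could drop rows that are definitely absent from the build side and let false positives through to the join. The sketch below is an assumption about what such a payload could look like, not the planned design. `BloomPayload` and its byte layout are hypothetical. It uses `std`'s `DefaultHasher`, whose algorithm is not guaranteed stable across Rust releases, so a real wire format would need to pin a hash function that the frontend and datanode implement identically.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical Bloom-filter payload for build-side join keys.
/// Assumed byte layout: num_bits (u64 LE), num_hashes (u32 LE), then bit words (u64 LE).
pub struct BloomPayload {
    bits: Vec<u64>,
    num_bits: usize,
    num_hashes: u32,
}

impl BloomPayload {
    pub fn new(num_bits: usize, num_hashes: u32) -> Self {
        assert!(num_bits > 0 && num_hashes > 0, "empty Bloom filter");
        BloomPayload { bits: vec![0; (num_bits + 63) / 64], num_bits, num_hashes }
    }

    /// Two base hashes for double hashing: probe i lands on (h1 + i * h2) mod num_bits.
    /// NOTE: DefaultHasher is not guaranteed stable across Rust versions; a real
    /// wire format must pin a hash function shared by frontend and datanode.
    fn base_hashes<T: Hash>(item: &T) -> (u64, u64) {
        let mut first = DefaultHasher::new();
        item.hash(&mut first);
        let h1 = first.finish();
        let mut second = DefaultHasher::new();
        (h1, 0x9e37_79b9_7f4a_7c15u64).hash(&mut second);
        (h1, second.finish() | 1) // force h2 odd so it is never zero
    }

    fn probe(&self, h1: u64, h2: u64, i: u64) -> usize {
        (h1.wrapping_add(i.wrapping_mul(h2)) % self.num_bits as u64) as usize
    }

    /// Adds a build-side key (frontend side).
    pub fn insert<T: Hash>(&mut self, item: &T) {
        let (h1, h2) = Self::base_hashes(item);
        for i in 0..u64::from(self.num_hashes) {
            let bit = self.probe(h1, h2, i);
            self.bits[bit / 64] |= 1u64 << (bit % 64);
        }
    }

    /// `false` means the key is definitely absent and the row can be pruned;
    /// `true` means it may be present (datanode side).
    pub fn might_contain<T: Hash>(&self, item: &T) -> bool {
        let (h1, h2) = Self::base_hashes(item);
        (0..u64::from(self.num_hashes)).all(|i| {
            let bit = self.probe(h1, h2, i);
            (self.bits[bit / 64] & (1u64 << (bit % 64))) != 0
        })
    }

    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(12 + self.bits.len() * 8);
        out.extend_from_slice(&(self.num_bits as u64).to_le_bytes());
        out.extend_from_slice(&self.num_hashes.to_le_bytes());
        for word in &self.bits {
            out.extend_from_slice(&word.to_le_bytes());
        }
        out
    }

    /// Returns `None` on any malformed input so the caller can fall back to no pruning.
    pub fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < 12 {
            return None;
        }
        let mut nb = [0u8; 8];
        nb.copy_from_slice(&bytes[0..8]);
        let num_bits = u64::from_le_bytes(nb) as usize;
        let mut nh = [0u8; 4];
        nh.copy_from_slice(&bytes[8..12]);
        let num_hashes = u32::from_le_bytes(nh);
        if num_bits == 0 || num_hashes == 0 {
            return None;
        }
        let words = num_bits.checked_add(63)? / 64;
        if bytes.len() - 12 != words.checked_mul(8)? {
            return None;
        }
        let bits: Vec<u64> = bytes[12..]
            .chunks(8)
            .map(|chunk| {
                let mut w = [0u8; 8];
                w.copy_from_slice(chunk);
                u64::from_le_bytes(w)
            })
            .collect();
        Some(BloomPayload { bits, num_bits, num_hashes })
    }
}
```

`from_bytes` returns `None` on any length or header mismatch, which fits the safe-downgrade rule: a payload that cannot be decoded simply disables remote pruning for that filter.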

Component Breakdown

| Component | Description | Status |
| --- | --- | --- |
| RFC / design | Defines the high-level remote dyn filter propagation model and phased rollout | |
| Wire ABI / query identity | Defines query_id, filter_id, epoch / completion semantics, payload boundaries, and safe downgrade rules | |
| Region RPC control plane | Adds a unary frontend -> datanode control-plane entry for remote dyn filter update / unregister messages | 🔄 |
| Frontend producer / bridge | Identifies distributed join dynamic filters, generates stable query-local filter_ids, stores minimal query-scoped frontend state, and attaches bounded initial-register metadata to the first remote read | 🔄 |
| Datanode initial registration | Receives initial-register metadata and prepares query-scoped handoff state for later consumer/runtime installation | 🔄 |
| Datanode apply runtime | Maintains query_id + filter_id state, applies ordered updates, installs remote dynamic filter wrappers into scan predicates, handles remap, and owns successful-path cleanup | 🔜 |
| Unregister / lifecycle cleanup | Cleans up frontend and datanode state on end-of-use, query finish, cancel, stream drop, or TTL fallback | 🔜 |
| Observability / fallback | Adds metrics, tracing, budgets, error downgrade visibility, and control-plane protection | 🔜 |
| End-to-end validation | Covers distributed join pruning, correctness under downgrade/failure, epoch ordering, cleanup, and performance baselines | 🔜 |
| Large build-side membership | Designs a transportable representation for non-serializable HashTableLookupExpr-like membership, likely via Bloom / custom payloads | 🔜 |
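The datanode apply runtime and cleanup rows suggest a registry keyed by (query_id, filter_id) with three properties. It accepts only newer epochs, it freezes a filter once the filter is marked complete, and it can be cleared by an explicit unregister or by a TTL sweep. Below is a minimal sketch under those assumptions. `FilterUpdate`, `RemoteFilterRegistry`, the `is_complete` flag, and the rule that a completed filter rejects later updates are all hypothetical; they only borrow the issue's vocabulary.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical frontend -> datanode update message. Only the names
/// query_id, filter_id and epoch come from the issue; the rest is assumed.
#[derive(Debug, Clone)]
pub struct FilterUpdate {
    pub query_id: String,
    pub filter_id: u32,
    pub epoch: u64,
    /// Assumed completion flag: the producer will send no further updates.
    pub is_complete: bool,
    /// Opaque serialized filter (e.g. bounds or a Bloom payload).
    pub payload: Vec<u8>,
}

#[derive(Debug)]
struct FilterEntry {
    epoch: u64,
    is_complete: bool,
    payload: Vec<u8>,
    last_touched: Instant,
}

#[derive(Debug, PartialEq)]
pub enum ApplyOutcome {
    Applied,
    /// Epoch not newer than the installed one; ignored.
    Stale,
    /// Filter already marked complete; ignored.
    AlreadyComplete,
}

/// Hypothetical datanode-side state, keyed by (query_id, filter_id).
#[derive(Default)]
pub struct RemoteFilterRegistry {
    entries: HashMap<(String, u32), FilterEntry>,
}

impl RemoteFilterRegistry {
    /// Installs `update` only if its epoch is strictly newer than the stored one.
    pub fn apply(&mut self, update: FilterUpdate, now: Instant) -> ApplyOutcome {
        let new_entry = FilterEntry {
            epoch: update.epoch,
            is_complete: update.is_complete,
            payload: update.payload,
            last_touched: now,
        };
        match self.entries.entry((update.query_id, update.filter_id)) {
            Entry::Vacant(slot) => {
                slot.insert(new_entry);
                ApplyOutcome::Applied
            }
            Entry::Occupied(mut slot) => {
                let current = slot.get_mut();
                if current.is_complete {
                    ApplyOutcome::AlreadyComplete
                } else if new_entry.epoch <= current.epoch {
                    ApplyOutcome::Stale
                } else {
                    *current = new_entry;
                    ApplyOutcome::Applied
                }
            }
        }
    }

    /// Payload currently installed for a filter, if any.
    pub fn current(&self, query_id: &str, filter_id: u32) -> Option<&[u8]> {
        self.entries
            .get(&(query_id.to_string(), filter_id))
            .map(|e| e.payload.as_slice())
    }

    /// Explicit unregister (query finish, cancel, stream drop); returns entries removed.
    pub fn unregister_query(&mut self, query_id: &str) -> usize {
        let before = self.entries.len();
        self.entries.retain(|key, _| key.0 != query_id);
        before - self.entries.len()
    }

    /// TTL fallback for state whose unregister message never arrived.
    pub fn evict_expired(&mut self, now: Instant, ttl: Duration) -> usize {
        let before = self.entries.len();
        self.entries
            .retain(|_, e| now.saturating_duration_since(e.last_touched) < ttl);
        before - self.entries.len()
    }
}
```

Rejecting `epoch <= current` makes out-of-order or duplicated control-plane messages harmless. The TTL sweep bounds how long state can leak if an unregister message is lost.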
