Aggregating a router's upstream (northbound) declaration forwarding #2630

@BOURBONCASK

Describe the feature

Hi Zenoh team,

First off — the regions work in 1.9 (Longwang) is
fantastic. Being able to split a big flat topology into a tree of subregions is exactly what let me push
an edge-to-cloud setup I work on past the old flat-peer limits, and I lean on it heavily.

Running at that larger scale, I bumped into a follow-on declaration-propagation question on the cloud
side that I'd love your read on. I've prototyped one approach (described below) but wanted to
sanity-check the direction before proposing it as an actual change.

The situation I hit

In my setup many downstream routers each forward their local key-expression sub-tree up into a shared
router mesh (cloud), and clients reach resources by their full key-expressions. The shape is pretty general,
though — anything that looks like "many downstream branches feeding into an upstream mesh" runs into the
same thing.

What I see is that every upstream router ends up holding on the order of N×K routing-table Resources:
N downstream branches, K keys each. The footprint matters, but the cost that bites first isn't really
memory (a big cloud box has plenty) — it's the control-plane work that scales with N×K: per-declaration
admission, route (re)computation, cold-cache recompute on topology changes, and — in a router mesh —
cross-router propagation, since link-state replicates every declaration to every upstream router. So at
scale the routing table, not the transports, becomes the binding constraint, and it shows up as
convergence time and churn cost rather than just RAM.

Carving the topology into subregions (which I also lean on) helps a related but different cost — it
contains churn so one branch's reconnects don't ripple through a shared structure — but it doesn't reduce
the number of declarations that propagate upstream and replicate across a router mesh. That count is
what this idea targets, and it's the part isolation alone can't address.

Zenoh already has config-driven aggregation (aggregation.subscribers / publishers, applied when a
session opens), and it's genuinely useful — but as far as I can tell it only collapses a session's
own declarations at that session's boundary. It doesn't change what a router forwards upstream on
behalf of the sessions below it. So a forwarding router's own subscribers collapse to a single upstream
declaration, but the K subscribers of a downstream session that the router forwards remain K upstream
declarations. (I checked this experimentally to make sure I wasn't missing an existing knob.)

What I'm thinking

Extend that same config-aggregation idea to a router's northbound forwarding: when a northbound router
HAT is about to forward a downstream subscriber/queryable whose key-expression is included by a
configured prefix, fold it into a single ${prefix} declaration upstream and suppress the per-key
children up there — while keeping them registered locally so downward routing is completely unchanged.
The upstream router then keeps one Resource per configured prefix instead of one per forwarded key (in
a quick prototype that collapse is exact: N×K → N, one aggregate per branch).

The part I like is that there's no new wire type — the folded declaration is just an ordinary
DeclareSubscriber / DeclareQueryable, so mesh propagation, matching and admin all keep working
as-is. It's opt-in, and with the config empty it takes the existing propagation path unchanged.

aggregation: {
  upstream: {                 // opt-in; northbound forwarding only
    subscribers: ["example/**"],
    queryables:  ["example/**"],
  },
}

A prefix typically covers a sub-tree shared by a group of downstream sessions, but the mechanism only
requires that the keys be included by the prefix.
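
As a rough illustration of the fold (a sketch under stated assumptions, not the prototype's code): the `includes` helper below handles only prefixes made of literal chunks followed by a trailing `/**`, whereas real Zenoh key-expression inclusion also covers `*` and `**` in other positions. Both `includes` and `fold_upstream` are hypothetical names, not Zenoh APIs.

```rust
use std::collections::BTreeSet;

/// Simplified inclusion test (assumption): supports only prefixes of the
/// form `a/b/**`. Since `**` matches zero or more chunks, `a/b/**` also
/// includes the bare key `a/b`.
fn includes(prefix: &str, key: &str) -> bool {
    match prefix.strip_suffix("/**") {
        Some(stem) => key == stem || key.starts_with(&format!("{}/", stem)),
        None => prefix == key,
    }
}

/// Hypothetical fold: each forwarded key is replaced by the first configured
/// prefix that includes it; keys matched by no prefix pass through unchanged.
/// The returned set is what would be declared upstream.
fn fold_upstream(prefixes: &[&str], forwarded: &[&str]) -> BTreeSet<String> {
    forwarded
        .iter()
        .map(|&key| {
            prefixes
                .iter()
                .find(|&&p| includes(p, key))
                .map(|p| p.to_string())
                .unwrap_or_else(|| key.to_string())
        })
        .collect()
}
```

With `example/**` configured, forwarded keys `example/a/1` and `example/a/2` collapse to the single upstream declaration `example/**`, while `other/x` passes through unchanged.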

What I tried hard to keep intact

  • Data routing / matching / cross-mesh reachability — the aggregate is just a wildcard declaration, so a
    client on one mesh router still reaches a resource sitting behind another.
  • Queryable get (default BestMatching) — the aggregate is advertised complete=false: it's a
    presence hint, not a completeness claim, so BestMatching falls through to the real per-key queryable
    and never shadows a genuinely-complete source.
  • target=AllComplete — handled too. Because the aggregate hides children that might be complete, a
    non-complete route entry that covers the query and points at a router is treated as a transparent
    forwarder, so AllComplete passes through to the router that did the fold and it re-applies the filter
    against its real children.
  • Presence — the aggregate is declared while the forwarding router has at least one matching declaration,
    and withdrawn when the last one leaves.
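
The presence rule in the last bullet amounts to reference counting per configured prefix. A minimal sketch, assuming a hypothetical `AggregatePresence` tracker and `UpstreamAction` result type (neither exists in Zenoh):

```rust
use std::collections::HashMap;

/// What the forwarding router would send upstream (hypothetical type).
#[derive(Debug, PartialEq)]
enum UpstreamAction {
    Declare(String),
    Undeclare(String),
    Nothing,
}

/// Counts the local declarations currently folded into each prefix.
#[derive(Default)]
struct AggregatePresence {
    counts: HashMap<String, usize>,
}

impl AggregatePresence {
    /// A matching child appeared: declare the aggregate only for the first one.
    fn child_declared(&mut self, prefix: &str) -> UpstreamAction {
        let n = self.counts.entry(prefix.to_string()).or_insert(0);
        *n += 1;
        if *n == 1 {
            UpstreamAction::Declare(prefix.to_string())
        } else {
            UpstreamAction::Nothing
        }
    }

    /// A matching child left: withdraw the aggregate only when none remain.
    fn child_undeclared(&mut self, prefix: &str) -> UpstreamAction {
        let remaining = match self.counts.get_mut(prefix) {
            // Entries are removed at zero, so a stored count is always >= 1.
            Some(n) => {
                *n -= 1;
                *n
            }
            None => return UpstreamAction::Nothing,
        };
        if remaining == 0 {
            self.counts.remove(prefix);
            UpstreamAction::Undeclare(prefix.to_string())
        } else {
            UpstreamAction::Nothing
        }
    }
}
```

Only the first matching child triggers an upstream declare and only the last one triggers an undeclare, so churn among the children in between stays invisible upstream.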

Trade-offs I'm aware of (and would document)

Turning it on costs some granularity at the upstream node:

  • Per-key ACL / QoS-overwrite / interceptors at the upstream see the ${prefix} aggregate rather than
    individual keys, so per-key policy belongs on the forwarding router.
  • Admin-space enumeration upstream shows the aggregate, not the children.
  • Since the aggregate is a wildcard, data published to unsubscribed keys under the prefix still travels
    to the forwarding router (which drops it) — so prefixes want to be tight.
  • Liveliness tokens are intentionally not folded: a liveliness sample's key is the token's own key, so
    a single ${prefix}/** token couldn't enumerate the live set or signal a per-key removal. That felt out
    of scope.
  • To limit misconfiguration, a startup check warns about suspicious prefixes (a bare ** root, the @/
    admin-space, duplicates, mutually-including prefixes).
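
A minimal sketch of what such a startup check could look like. The function names and warning texts are hypothetical, and "mutually-including" is read here as "one prefix covers another", detected only for prefixes with a trailing `/**` (an assumption; a real check would use full key-expression inclusion).

```rust
use std::collections::HashSet;

/// Simplified cover test (assumption): `a/**` covers `a/b/**` and `a/b`.
fn covers(outer: &str, inner: &str) -> bool {
    match outer.strip_suffix("/**") {
        Some(stem) => inner.starts_with(&format!("{}/", stem)),
        None => false,
    }
}

/// Returns one warning string per suspicious finding; empty means clean.
fn check_prefixes(prefixes: &[&str]) -> Vec<String> {
    let mut warnings = Vec::new();
    let mut seen = HashSet::new();
    for &p in prefixes {
        if p == "**" {
            warnings.push(format!("`{}` folds the whole key space", p));
        }
        if p == "@" || p.starts_with("@/") {
            warnings.push(format!("`{}` targets the admin space", p));
        }
        // `insert` returns false when the prefix was already present.
        if !seen.insert(p) {
            warnings.push(format!("`{}` is listed more than once", p));
        }
    }
    // Pairwise scan for prefixes where one covers the other.
    for (i, &a) in prefixes.iter().enumerate() {
        for &b in &prefixes[i + 1..] {
            if a != b && (covers(a, b) || covers(b, a)) {
                warnings.push(format!("`{}` and `{}` overlap", a, b));
            }
        }
    }
    warnings
}
```

Disjoint prefixes such as `example/a/**` and `example/b/**` produce no warnings, while `example/**` next to `example/a/**` is reported as overlapping.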

A couple of things I'd really value your read on

  1. Does extending the existing config-aggregation idea to a router's northbound forwarding feel like a
    direction worth pursuing at all?
  2. If so, is the router HAT (propagate_subscriber / propagate_queryable) the right layer, or would you
    rather it sit in the existing LocalResources framework? (The linkstate forwarding path is
    interest-decoupled, unlike the interest-driven LocalResources in the broker/peer HATs — happy to
    write up the layering.)
  3. For AllComplete I let the aggregate act as a transparent forwarder (a small in-process flag on the
    route entry, no wire change). Does that feel right, or would you prefer deriving the aggregate's
    complete flag from its children (which reintroduces multi-owner shadow/stale risk)?
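
To make the transparent-forwarder option in question 3 concrete, here is a minimal sketch of the selection rule. It assumes a hypothetical `RouteEntry` carrying an in-process `transparent_forwarder` flag; neither is an actual Zenoh type, and real route computation involves more than this filter.

```rust
/// Hypothetical route entry; not a Zenoh type.
struct RouteEntry {
    name: &'static str,
    /// The `complete` value advertised by the queryable declaration.
    complete: bool,
    /// In-process flag: set on a non-complete entry that covers the query
    /// and points at a router, i.e. a possible folded aggregate.
    transparent_forwarder: bool,
}

/// Entries an AllComplete query is sent to: complete queryables, plus
/// transparent forwarders whose hidden children might be complete.
fn all_complete_targets(entries: &[RouteEntry]) -> Vec<&'static str> {
    entries
        .iter()
        .filter(|e| e.complete || e.transparent_forwarder)
        .map(|e| e.name)
        .collect()
}
```

The same filter runs twice along the path: the upstream router passes the query through the aggregate entry, and the router that did the fold applies it again to its real children, where no entry is a transparent forwarder and only genuinely complete queryables are selected.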

Labels: new feature