Describe the feature
Hi Zenoh team,
First off — the regions work in 1.9 (Longwang) is
fantastic. Being able to split a big flat topology into a tree of subregions is exactly what let me push
an edge-to-cloud setup I work on past the old flat-peer limits, and I lean on it heavily.
Running at that larger scale, I bumped into a follow-on declaration-propagation question on the cloud
side that I'd love your read on. I've prototyped one approach (described below) but wanted to
sanity-check the direction before proposing it as an actual change.
The situation I hit
In my setup many downstream routers each forward their local key-expression sub-tree up into a shared
router mesh (cloud), and clients reach resources by their full key-expressions. The shape is pretty general,
though — anything that looks like "many downstream branches feeding into an upstream mesh" runs into the
same thing.
What I see is that every upstream router ends up holding on the order of N×K routing-table Resources:
N downstream branches, K keys each. The footprint matters, but the cost that bites first isn't really
memory (a big cloud box has plenty) — it's the control-plane work that scales with N×K: per-declaration
admission, route (re)computation, cold-cache recompute on topology changes, and — in a router mesh —
cross-router propagation, since link-state replicates every declaration to every upstream router. So at
scale the routing table, not the transports, becomes the binding constraint, and it shows up as
convergence time and churn cost rather than just RAM.
Carving the topology into subregions (which I also lean on) helps a related but different cost — it
contains churn so one branch's reconnects don't ripple through a shared structure — but it doesn't reduce
the number of declarations that propagate upstream and replicate across a router mesh. That count is
what this idea targets, and it's the part isolation alone can't address.
Zenoh already has config-driven aggregation (aggregation.subscribers / publishers, applied when a
session opens), and it's genuinely useful — but as far as I can tell it only collapses a session's
own declarations at that session's boundary. It doesn't change what a router forwards upstream on
behalf of the sessions below it. So a forwarding router's own subscribers collapse to one upstream, but
a downstream session's K subscribers that the router forwards stay K. (I checked this experimentally to
make sure I wasn't missing an existing knob.)
What I'm thinking
Extend that same config-aggregation idea to a router's northbound forwarding: when a northbound router
HAT is about to forward a downstream subscriber/queryable whose key-expression is included by a
configured prefix, fold it into a single ${prefix} declaration upstream and suppress the per-key
children up there — while keeping them registered locally so downward routing is completely unchanged.
The upstream router then keeps one Resource per configured prefix instead of one per forwarded key (in
a quick prototype that collapse is exact: N×K → N, one aggregate per branch).
The part I like is that there's no new wire type — the folded declaration is just an ordinary
DeclareSubscriber / DeclareQueryable, so mesh propagation, matching and admin all keep working
as-is. It's opt-in, and with the config empty it takes the existing propagation path unchanged.
    aggregation: {
      upstream: {                      // opt-in; northbound forwarding only
        subscribers: ["example/**"],
        queryables: ["example/**"],
      },
    }
A prefix typically covers a sub-tree shared by a group of downstream sessions, but the mechanism only
requires that the keys be included by the prefix.
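To make the fold concrete, here is a minimal sketch of the rule described above. It is not Zenoh code: `prefix_includes`, `fold_upstream` and `upstream_declarations` are hypothetical names, and the inclusion test is a plain string check that only understands prefixes of the form `base/**`. A real implementation would rely on Zenoh's own key-expression inclusion instead.

```rust
use std::collections::BTreeSet;

/// Simplified inclusion test (assumption: prefixes look like `base/**`).
/// `base/**` covers `base` itself and anything below `base/`.
fn prefix_includes(prefix: &str, key: &str) -> bool {
    match prefix.strip_suffix("/**") {
        Some(base) => {
            key == base
                || key
                    .strip_prefix(base)
                    .map_or(false, |rest| rest.starts_with('/'))
        }
        // Not a `base/**` prefix: only an exact match counts in this sketch.
        None => prefix == key,
    }
}

/// Key-expression that would be declared upstream for `key`:
/// the first configured prefix that includes it, else the key unchanged.
fn fold_upstream<'a>(key: &'a str, prefixes: &[&'a str]) -> &'a str {
    prefixes
        .iter()
        .copied()
        .find(|p| prefix_includes(p, key))
        .unwrap_or(key)
}

/// Distinct (branch, upstream key) pairs an upstream router would hold.
/// Each branch is modelled as (branch id, keys it forwards); this is a
/// stand-in for routing-table Resources, not the real data structure.
fn upstream_declarations(
    branches: &[(&str, Vec<&str>)],
    prefixes: &[&str],
) -> BTreeSet<(String, String)> {
    let mut out = BTreeSet::new();
    for (branch, keys) in branches {
        for key in keys {
            let folded = fold_upstream(key, prefixes);
            out.insert((branch.to_string(), folded.to_string()));
        }
    }
    out
}
```

With an empty prefix list every key passes through untouched, which mirrors the "config empty means existing path" behaviour. With `["example/**"]`, each branch contributes a single entry regardless of how many keys it forwards under `example/`.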
What I tried hard to keep intact
- Data routing / matching / cross-mesh reachability — the aggregate is just a wildcard declaration, so a
client on one mesh router still reaches a resource sitting behind another.
- Queryable:
  - get (default BestMatching): the aggregate is advertised complete=false. It's a presence hint, not a
    completeness claim, so BestMatching falls through to the real per-key queryable and never shadows a
    genuinely-complete source.
  - target=AllComplete: handled too. Because the aggregate hides children that might be complete, a
    non-complete route entry that covers the query and points at a router is treated as a transparent
    forwarder, so AllComplete passes through to the router that did the fold and it re-applies the
    filter against its real children.
- Presence — the aggregate is declared while the forwarding router has at least one matching declaration,
and withdrawn when the last one leaves.
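The presence rule in the last bullet amounts to reference counting per prefix. A minimal sketch, assuming one counter per configured prefix on the forwarding router (`AggregatePresence`, `UpstreamAction` and the method names are hypothetical, not existing Zenoh types):

```rust
use std::collections::HashMap;

/// What the forwarding router would send upstream after a local change.
#[derive(Debug, PartialEq)]
enum UpstreamAction {
    Declare,
    Undeclare,
    Nothing,
}

/// Count of live matching declarations per configured prefix.
#[derive(Default)]
struct AggregatePresence {
    live: HashMap<String, usize>,
}

impl AggregatePresence {
    /// A downstream declaration included by `prefix` appeared.
    /// Only the first one triggers the upstream aggregate declaration.
    fn child_declared(&mut self, prefix: &str) -> UpstreamAction {
        let n = self.live.entry(prefix.to_string()).or_insert(0);
        *n += 1;
        if *n == 1 {
            UpstreamAction::Declare
        } else {
            UpstreamAction::Nothing
        }
    }

    /// A downstream declaration included by `prefix` went away.
    /// Only the last one leaving withdraws the upstream aggregate.
    fn child_undeclared(&mut self, prefix: &str) -> UpstreamAction {
        let remaining = match self.live.get_mut(prefix) {
            Some(n) => {
                *n -= 1;
                *n
            }
            // Unknown prefix: nothing was declared, so nothing to withdraw.
            None => return UpstreamAction::Nothing,
        };
        if remaining == 0 {
            self.live.remove(prefix);
            UpstreamAction::Undeclare
        } else {
            UpstreamAction::Nothing
        }
    }
}
```

Churn among children under an already-declared prefix produces `Nothing`, which is where the reduction in upstream control-plane traffic would come from.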
Trade-offs I'm aware of (and would document)
Turning it on costs some granularity at the upstream node:
- Per-key ACL / QoS-overwrite / interceptors at the upstream see the ${prefix} aggregate rather than
  individual keys, so per-key policy belongs on the forwarding router.
- Admin-space enumeration upstream shows the aggregate, not the children.
- Since the aggregate is a wildcard, data published to unsubscribed keys under the prefix still travels
to the forwarding router (which drops it) — so prefixes want to be tight.
- Liveliness tokens are intentionally not folded: a liveliness sample's key is the token's own key, so
a single ${prefix}/** token couldn't enumerate the live set or signal a per-key removal. That felt out
of scope.
- A startup check warns about suspicious prefixes (a bare ** root, the @/ admin-space, duplicates,
  mutually-including prefixes).
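The startup check in the last bullet could look roughly like the sketch below. Everything here is illustrative: `PrefixWarning`, `covers` and `check_prefixes` are hypothetical names, the inclusion test is the same simplified `base/**` string check rather than Zenoh's key-expression logic, and "mutually-including" is read loosely as "one prefix includes another".

```rust
#[derive(Debug, PartialEq)]
enum PrefixWarning {
    BareRoot(String),
    AdminSpace(String),
    Duplicate(String),
    Nested { outer: String, inner: String },
}

/// Simplified inclusion between two prefixes (assumption: `base/**` form).
fn covers(outer: &str, inner: &str) -> bool {
    if outer == "**" {
        return true;
    }
    match outer.strip_suffix("/**") {
        Some(base) => {
            inner == base
                || inner
                    .strip_prefix(base)
                    .map_or(false, |rest| rest.starts_with('/'))
        }
        None => outer == inner,
    }
}

/// Returns warnings only; the config is still accepted in this sketch.
fn check_prefixes(prefixes: &[&str]) -> Vec<PrefixWarning> {
    let mut warnings = Vec::new();
    for (i, p) in prefixes.iter().enumerate() {
        if *p == "**" {
            warnings.push(PrefixWarning::BareRoot(p.to_string()));
        }
        if p.starts_with("@/") {
            warnings.push(PrefixWarning::AdminSpace(p.to_string()));
        }
        // Compare each pair once.
        for q in &prefixes[i + 1..] {
            if p == q {
                warnings.push(PrefixWarning::Duplicate(p.to_string()));
            } else if covers(p, q) {
                warnings.push(PrefixWarning::Nested {
                    outer: p.to_string(),
                    inner: q.to_string(),
                });
            } else if covers(q, p) {
                warnings.push(PrefixWarning::Nested {
                    outer: q.to_string(),
                    inner: p.to_string(),
                });
            }
        }
    }
    warnings
}
```

Sibling prefixes such as `example/a/**` and `example/b/**` produce no warnings, while `a/**` next to `a/b/**` is flagged because the inner prefix could never be chosen independently of the outer one.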
A couple of things I'd really value your read on
- Does extending the existing config-aggregation idea to a router's northbound forwarding feel like a
direction worth pursuing at all?
- If so, is the router HAT (propagate_subscriber / propagate_queryable) the right layer, or would you
  rather it sit in the existing LocalResources framework? (The linkstate forwarding path is
  interest-decoupled, unlike the interest-driven LocalResources in the broker/peer HATs. Happy to
  write up the layering.)
- For AllComplete I let the aggregate act as a transparent forwarder (a small in-process flag on the
  route entry, no wire change). Does that feel right, or would you prefer deriving the aggregate's
  complete from its children (which reintroduces multi-owner shadow/stale risk)?