Need mechanism to avoid blackholing when DDM path between Transit Router and Server Router is lost #368

Open
@taspelund

Description

Server Routers act solely as stub routers and cannot be used as transit nodes for traffic originating from another DDM router.
In the current Oxide topology (2 sidecars/Transit Routers + N gimlets/Server Routers), a single backplane link failure can therefore result in traffic being blackholed.
For example, if the link between Sled 0 and Switch 0 goes down (marked `x` in the diagram), traffic destined for Sled 0 that arrives at Switch 0 will be lost.

 ┌──────────┐   ┌──────────┐
 │          │   │          │
 │ Switch 0 │   │ Switch 1 │
 └─┬─────┬──┘   └──────┬──┬┘
   │     │             │  │ 
   │     └──────────┐  │  │ 
   │                │  │  │ 
   x     ┌──────────┼──┘  │ 
   │     │          │     │ 
  ┌┴─────┴─┐      ┌─┴─────┴┐
  │ Sled 0 │      │ Sled 1 │
  └────────┘      └────────┘
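
The failure in the diagram can be reproduced with a small reachability model. This is only a sketch: the node names, the `LINKS` set, and the `find_path` helper are hypothetical and are not part of DDM. It assumes the only rule that matters is that a Server Router never forwards traffic it did not originate.

```python
from collections import deque

# Hypothetical model of the four backplane links in the diagram.
LINKS = {
    ("switch0", "sled0"),
    ("switch0", "sled1"),
    ("switch1", "sled0"),
    ("switch1", "sled1"),
}
SERVER_ROUTERS = {"sled0", "sled1"}


def find_path(links, src, dst, allow_server_transit):
    """Breadth-first search for a shortest path from src to dst.

    When allow_server_transit is False, a Server Router may be the source
    or the destination of a path, but never an intermediate hop.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    queue = deque([[src]])
    visited = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        # A stub router that is not the source does not forward transit traffic.
        if node in SERVER_ROUTERS and node != src and not allow_server_transit:
            continue
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# The Sled 0 <-> Switch 0 link is down.
failed = LINKS - {("switch0", "sled0")}

print(find_path(LINKS, "switch0", "sled0", allow_server_transit=False))
print(find_path(failed, "switch0", "sled0", allow_server_transit=False))
print(find_path(failed, "switch0", "sled0", allow_server_transit=True))
```

With all links up, Switch 0 reaches Sled 0 directly. With the link down and the stub rule in force, the search returns `None`, which is the blackhole. The only remaining path is the 3-link detour through Sled 1 and Switch 1, and it exists only if Server Routers are allowed to transit.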

This happens as a result of multiple factors coinciding.

  1. The physical topology inside the Oxide rack is a 3-stage Clos that uses the "spine" layer as the exit of the fabric. This means every alternative path from an exit to a leaf node crosses additional links (e.g. spine0 -> leaf0 = 1 link, vs. spine0 -> leaf1 -> spine1 -> leaf0 = 3 links).
  2. DDM does not allow Server Routers to be used for transit, which preserves the "valley free" property of the network by disallowing the 3-link routing path mentioned in the above bullet point. (We also likely wouldn't want to use Server Routers for transit because of the implications it would have on gimlet CPU load, network bandwidth, etc.)
  3. The exit nodes are effectively doing Northbound aggregation of individual External IPs via BGP advertisements. The External IPs are assigned to instances via omicron, and those instances can/will be dispersed across the Overlay, effectively partitioning the External IP subnets used in the rack. Unless the disaggregated External IPs are exposed Northbound via BGP (a /32 or /128 route advertised for each External IP in use), the Northbound network has no way to see or react to this failure.
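
The effect of the third factor can be illustrated with a toy longest-prefix-match model. This is a sketch under stated assumptions, not a proposed fix: the prefix `198.51.100.0/24` is a documentation range, and the IP-to-sled mapping, the `REACHABLE` table, and both helper functions are hypothetical. It assumes each switch advertises either the aggregate, or one host route per External IP whose sled it can still reach.

```python
import ipaddress

# Hypothetical aggregate and External IP placement (documentation prefix).
AGGREGATE = ipaddress.ip_network("198.51.100.0/24")
EXTERNAL_IPS = {
    ipaddress.ip_address("198.51.100.10"): "sled0",
    ipaddress.ip_address("198.51.100.20"): "sled1",
}
# Sleds each switch can still reach after the Sled 0 <-> Switch 0 link fails.
REACHABLE = {"switch0": {"sled1"}, "switch1": {"sled0", "sled1"}}


def advertisements(switch, disaggregate):
    """Prefixes a switch would advertise Northbound in this model."""
    if not disaggregate:
        # The aggregate is advertised regardless of per-sled reachability.
        return {AGGREGATE}
    return {
        ipaddress.ip_network(f"{ip}/{ip.max_prefixlen}")
        for ip, sled in EXTERNAL_IPS.items()
        if sled in REACHABLE[switch]
    }


def next_hops(dest, disaggregate):
    """Switches an upstream router would select by longest-prefix match."""
    best_len, best = -1, set()
    for switch in REACHABLE:
        for prefix in advertisements(switch, disaggregate):
            if dest not in prefix:
                continue
            if prefix.prefixlen > best_len:
                best_len, best = prefix.prefixlen, {switch}
            elif prefix.prefixlen == best_len:
                best.add(switch)
    return best


sled0_ip = ipaddress.ip_address("198.51.100.10")
print(sorted(next_hops(sled0_ip, disaggregate=False)))
print(sorted(next_hops(sled0_ip, disaggregate=True)))
```

With only the aggregate advertised, the upstream still considers both switches valid next hops for the Sled 0 address, so some traffic lands on Switch 0 and is dropped. With per-IP host routes, Switch 0 no longer advertises that address and only Switch 1 is selected.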

We need a mechanism to properly handle this failure case.

Labels: Bug, Idea (New ideas to consider.), bgp (Border Gateway Protocol), ddm (Delay Driven Multipath), want
