
Thomas Mangin edited this page Nov 10, 2025 · 1 revision

Facebook/Meta Katran: ExaBGP at Hyperscale

Source: https://engineering.fb.com/2018/05/22/open-sourcing-katran-a-scalable-network-load-balancer/
Company: Facebook/Meta (now open-source)
Date: May 2018
Scale: Serving billions of users globally

Overview

Katran is Meta's scalable network load balancer (L4LB) that uses ExaBGP for BGP-based service announcement. This represents one of the largest-scale production deployments of ExaBGP in the world.

Why ExaBGP?

Facebook/Meta selected ExaBGP specifically for its:

"lightweight, flexible design"

Key Decision Factors:

  • Lightweight resource footprint
  • Flexible API for dynamic control
  • Simple STDIN/STDOUT integration
  • Proven reliability at scale
  • Easy integration with custom control planes

Architecture

Component Overview

┌─────────────────────────────────────────────────────────┐
│                    Katran L4LB Stack                     │
├─────────────────────────────────────────────────────────┤
│  Control Plane (Health Checks, Config Management)       │
│  Data Plane (XDP/eBPF Packet Processing)               │
│  BGP Announcement (ExaBGP)                              │
└─────────────────────────────────────────────────────────┘
                         │ BGP Peering
                         ↓
┌─────────────────────────────────────────────────────────┐
│              Network Switch (Top-of-Rack)                │
│                   ECMP Routing                           │
└─────────────────────────────────────────────────────────┘
                         │ ECMP Distribution
              ┌──────────┼──────────┐
              ↓          ↓          ↓
         [Katran 1] [Katran 2] [Katran 3] ... [Katran N]
              │          │          │
              └──────────┴──────────┘
                         │ DSR Mode
                         ↓
              [Backend Servers with VIPs on Loopback]

ExaBGP's Role

Service Announcement:

"This component simply announces the virtual IP addresses that the L4LB is responsible for to the world by peering with the network element (typically a switch) in front of the L4LB."

How It Works:

  1. Katran control plane determines which VIPs to serve
  2. ExaBGP announces VIPs to network switches via BGP
  3. Switches use ECMP to distribute traffic across announcing L4LBs
  4. Each L4LB independently selects backend using consistent hashing
  5. Backends respond directly to clients (DSR)
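Steps 1–2 above rely on ExaBGP's process API: the control plane writes announce/withdraw commands to ExaBGP, which handles the BGP session to the switch. A minimal sketch of such an ExaBGP 4.x configuration follows; the process name, script path, addresses, and AS numbers are illustrative assumptions, not Katran's actual settings:

```
# Hypothetical ExaBGP 4.x configuration -- all names and addresses are examples
process vip-controller {
    run /usr/local/bin/vip-controller.py;
    encoder text;
}

neighbor 10.0.0.1 {                 # top-of-rack switch
    router-id 10.0.0.2;
    local-address 10.0.0.2;         # this L4LB instance
    local-as 65001;
    peer-as 65000;

    api {
        processes [ vip-controller ];
    }
}
```

ExaBGP launches the controller script itself and reads announcement commands from the script's stdout, which is what makes the STDIN/STDOUT integration noted above so simple.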

ECMP Load Distribution

"The switch then uses an equal-cost multipath (ECMP) mechanism to distribute packets among the L4LBs announcing the VIP."

Benefits:

  • Stateless distribution at network layer
  • Scales horizontally by adding L4LB instances
  • No state synchronization required between L4LBs
  • Hardware-speed packet distribution

Consistent Hashing: Each L4LB independently selects backends using hash of packet 5-tuple:

  • Source IP
  • Source Port
  • Destination IP (VIP)
  • Destination Port
  • Protocol

This ensures same backend selection across all L4LBs for a given flow.
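The property above can be illustrated with a small sketch. Katran's real implementation uses a more sophisticated consistent-hashing scheme that stays stable when the backend list changes; this simplified modulo version only shows why identical inputs yield identical backend choices on every L4LB, with no shared state:

```python
import hashlib

def pick_backend(src_ip, src_port, dst_ip, dst_port, proto, backends):
    """Deterministically map a flow's 5-tuple to a backend.

    Any L4LB running this same function over the same backend list picks
    the same backend for a given flow -- no state synchronization needed.
    (Illustrative sketch only; not Katran's actual hashing algorithm.)
    """
    key = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}:{proto}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]

backends = ["10.1.0.1", "10.1.0.2", "10.1.0.3"]
# The same 5-tuple always yields the same backend, on every L4LB instance
choice = pick_backend("198.51.100.7", 40000, "192.0.2.10", 443, "tcp", backends)
assert choice == pick_backend("198.51.100.7", 40000, "192.0.2.10", 443, "tcp", backends)
assert choice in backends
```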

Direct Server Return (DSR)

Architecture:

  • Backend servers configure VIPs on loopback interfaces
  • L4LB forwards packets with original VIP as destination
  • Backend responds directly to client
  • Return traffic bypasses L4LB entirely
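The backend side of this architecture can be sketched with a generic DSR setup. Note this is a classic Linux DSR illustration, not Katran's exact mechanism (Katran forwards to backends using IP-in-IP encapsulation); the VIP address is an example:

```shell
# Hypothetical backend-side DSR setup; 192.0.2.10 is an example VIP.
# Put the VIP on the loopback so the backend accepts traffic addressed to it
ip addr add 192.0.2.10/32 dev lo

# Keep the backend from answering ARP for the VIP on the local network,
# so only the L4LB attracts the VIP's inbound traffic
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
```

With the VIP on loopback, the backend's reply packets carry the VIP as source address and go straight back to the client, which is what lets return traffic bypass the L4LB.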

Advantages:

  • L4LB handles only inbound traffic (roughly half of total packet volume)
  • Scales to high bandwidth workloads
  • Lower latency for responses
  • Higher aggregate throughput

Control Plane

Responsibilities:

  • Health checking backend servers
  • Managing VIP-to-backend mappings
  • Configuration file management
  • API for state examination
  • ExaBGP control (announce/withdraw VIPs)

APIs:

"simple APIs to examine the state of the L4LB and backend servers"

Configuration:

  • File-based configuration
  • Dynamic add/remove of VIPs
  • Dynamic add/remove of backends
  • Health check policies

Data Plane (XDP/eBPF)

Technology:

  • XDP (eXpress Data Path) for packet processing
  • eBPF programs in kernel
  • Hardware offload capable
  • Line-rate packet forwarding

Not ExaBGP-related but completes the picture: Katran uses cutting-edge Linux kernel technology for high-performance packet processing.

ExaBGP Integration Pattern

Announcement Strategy

# Pseudo-code for Katran's ExaBGP integration

import sys


class KatranExaBGPController:
    """Tracks announced VIPs and drives ExaBGP via its text API.

    ExaBGP runs this controller as a child process and reads the
    commands it prints to stdout.
    """

    def __init__(self):
        self.announced_vips = set()

    def announce_vip(self, vip):
        """Announce a VIP via ExaBGP."""
        if vip not in self.announced_vips:
            # Commands written to stdout are consumed by ExaBGP
            print(f"announce route {vip}/32 next-hop self")
            sys.stdout.flush()
            self.announced_vips.add(vip)

    def withdraw_vip(self, vip):
        """Withdraw a VIP via ExaBGP."""
        if vip in self.announced_vips:
            print(f"withdraw route {vip}/32")
            sys.stdout.flush()
            self.announced_vips.remove(vip)

    def update_vips(self, desired_vips):
        """Reconcile announced VIPs with the desired set."""
        current = set(self.announced_vips)  # copy before mutating
        desired = set(desired_vips)

        # Announce new VIPs
        for vip in desired - current:
            self.announce_vip(vip)

        # Withdraw removed VIPs
        for vip in current - desired:
            self.withdraw_vip(vip)

Health Check Integration

# Simplified health check → BGP announcement flow

import time

def health_check_loop():
    # exabgp_controller: a KatranExaBGPController instance (defined above)
    while True:
        # Check backend health
        healthy_backends = check_all_backends()

        # Determine which VIPs can be served
        servable_vips = get_servable_vips(healthy_backends)

        # Update ExaBGP announcements
        exabgp_controller.update_vips(servable_vips)

        time.sleep(health_check_interval)

Logic:

  1. Health check all configured backends
  2. Determine which VIPs have sufficient healthy backends
  3. Announce VIPs that can be served
  4. Withdraw VIPs that cannot be served (all backends down)
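Step 2 above (deciding which VIPs have sufficient healthy backends) can be sketched as follows; the config shape (a VIP → backend-list mapping) and the quorum threshold are assumptions for illustration, not Katran's actual control-plane code:

```python
def get_servable_vips(vip_backends, healthy_backends, min_healthy=1):
    """Return the VIPs that have at least `min_healthy` healthy backends.

    vip_backends: dict mapping each VIP to its configured backend list
    healthy_backends: set of backends that passed health checks
    (Illustrative sketch only.)
    """
    servable = set()
    for vip, backends in vip_backends.items():
        healthy = [b for b in backends if b in healthy_backends]
        if len(healthy) >= min_healthy:
            servable.add(vip)
    return servable

vips = {
    "192.0.2.10": ["10.1.0.1", "10.1.0.2"],
    "192.0.2.11": ["10.1.0.3"],
}
# 10.1.0.3 is down, so 192.0.2.11 must be withdrawn
assert get_servable_vips(vips, {"10.1.0.1", "10.1.0.2"}) == {"192.0.2.10"}
```

Feeding the result into `update_vips()` then announces newly servable VIPs and withdraws the rest.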

Configuration Management

# Configuration file-based VIP management

import time

def config_watcher():
    """Watch the config file for VIP changes."""
    while True:
        config = load_config('/etc/katran/vips.conf')

        # Parse VIP configurations
        configured_vips = parse_vip_config(config)

        # Update announcements based on health and config
        update_announcements(configured_vips)

        time.sleep(config_check_interval)
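The source article does not specify the config file format; as an illustration, a JSON layout and the `parse_vip_config` helper from the loop above might look like this (file contents and field names are hypothetical):

```python
import json

# Hypothetical /etc/katran/vips.conf contents -- the real format is not
# documented in the source article
SAMPLE_CONFIG = """
{
  "vips": [
    {"address": "192.0.2.10", "port": 443, "proto": "tcp",
     "backends": ["10.1.0.1", "10.1.0.2"]},
    {"address": "192.0.2.11", "port": 53, "proto": "udp",
     "backends": ["10.1.0.3"]}
  ]
}
"""

def parse_vip_config(text):
    """Return {vip_address: [backends]} from the JSON config text."""
    config = json.loads(text)
    return {v["address"]: v["backends"] for v in config["vips"]}

print(parse_vip_config(SAMPLE_CONFIG))
# {'192.0.2.10': ['10.1.0.1', '10.1.0.2'], '192.0.2.11': ['10.1.0.3']}
```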

Scale and Performance

Production Scale

  • Deployment: Global Facebook/Meta infrastructure
  • Traffic: Billions of requests per second aggregate
  • VIPs: Thousands of virtual IPs announced
  • L4LBs: Hundreds or thousands of Katran instances

ExaBGP Performance

  • Lightweight: Minimal CPU/memory overhead
  • Fast convergence: BGP updates propagate quickly
  • Reliable: No publicly reported ExaBGP-related outages at this scale
  • Efficient: Handles thousands of route announcements

ECMP Distribution

  • Hardware-speed packet distribution
  • Consistent hashing ensures flow stability
  • Scales linearly with additional L4LB instances
  • No state synchronization overhead

Key Advantages

1. Horizontal Scalability

Add more L4LB instances → ExaBGP announces from new instances → ECMP distributes traffic

2. Fault Tolerance

  • L4LB failure → Switch detects BGP session down → ECMP reroutes to remaining L4LBs
  • Backend failure → Health checks detect → L4LB continues serving with remaining backends
  • No single point of failure

3. Operational Simplicity

  • File-based configuration
  • Simple BGP announcement (ExaBGP handles protocol complexity)
  • No complex state management
  • Easy to add/remove capacity

4. Cost Efficiency

  • Commodity servers instead of expensive hardware load balancers
  • Open source software (Katran + ExaBGP)
  • Scales with general-purpose Linux infrastructure

5. Performance

  • XDP/eBPF data plane for high throughput
  • DSR for efficient return path
  • ECMP for hardware-speed distribution
  • ExaBGP's lightweight footprint

Comparison with Traditional Load Balancers

| Aspect      | Katran + ExaBGP       | Traditional Hardware LB |
|-------------|-----------------------|-------------------------|
| Cost        | Commodity servers     | Expensive appliances    |
| Scalability | Horizontal (add more) | Vertical (limited)      |
| State       | Stateless             | Stateful (complex sync) |
| Performance | Line rate (XDP)       | Hardware limited        |
| Flexibility | Software-defined      | Vendor-locked           |
| BGP         | ExaBGP (flexible)     | Proprietary             |
| Open Source | Yes                   | No                      |

Lessons for ExaBGP Documentation

1. Production Validation

Katran proves ExaBGP works at massive scale:

  • Billions of users
  • Global deployment
  • Mission-critical infrastructure
  • Years of production use

2. Use Case: Large-Scale L4 Load Balancing

This is a major use case to document:

  • ExaBGP + ECMP for stateless load distribution
  • DSR architecture with VIPs on loopback
  • Health check integration
  • Dynamic VIP management
  • Horizontal scalability pattern

3. Integration Pattern: Control Plane + ExaBGP

Standard pattern:

Control Plane (Config + Health Checks) → ExaBGP (Announcements) → Network (ECMP) → Backends

4. Design Principles

  • Lightweight: ExaBGP's minimal overhead is critical at scale
  • Flexible: STDIN/STDOUT API enables easy integration
  • Reliable: BGP provides proven control plane
  • Simple: Offload BGP complexity to ExaBGP

5. Architecture Benefits

  • Stateless L4LBs (no state sync)
  • Consistent hashing (flow stability)
  • ECMP distribution (hardware speed)
  • DSR (efficiency)
  • ExaBGP (dynamic announcements)

Technical Details

BGP Peering

  • Each Katran instance peers with local switch (eBGP or iBGP)
  • Announces /32 VIPs
  • Switch sees equal-cost paths from multiple L4LBs
  • ECMP distributes traffic
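On ExaBGP's text API, the /32 announcements and withdrawals take this shape (addresses are illustrative):

```
announce route 192.0.2.10/32 next-hop self
announce route 192.0.2.11/32 next-hop self
withdraw route 192.0.2.11/32
```

Because every Katran instance announces the same /32 with itself as next hop, the switch installs one equal-cost path per instance.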

Failure Handling

L4LB Failure:

  1. BGP session drops (hold timer expiry)
  2. Switch removes routes from failed L4LB
  3. ECMP recalculates distribution among remaining L4LBs
  4. Flows hashed to the failed L4LB shift to the remaining instances; because every L4LB uses the same consistent hashing, redirected flows still reach the same backends

Backend Failure:

  1. Health check detects failure
  2. L4LB updates backend pool (removes failed backend)
  3. New flows distributed to healthy backends
  4. L4LB continues serving VIP if any backends remain

All Backends Down:

  1. Health checks detect all backends down
  2. L4LB withdraws VIP announcement via ExaBGP
  3. Switch stops routing to this L4LB for this VIP
  4. Other L4LBs continue serving VIP (if they have healthy backends)

Recovery

L4LB Recovery:

  1. L4LB starts, establishes BGP session
  2. Health checks run
  3. ExaBGP announces VIPs with healthy backends
  4. Switch adds L4LB to ECMP pool
  5. Traffic starts flowing

Backend Recovery:

  1. Health check detects backend recovery
  2. L4LB adds backend to pool
  3. New flows distributed to recovered backend
  4. If VIP was withdrawn (all backends were down), re-announce via ExaBGP

Open Source

Katran is open source: https://github.com/facebookincubator/katran

Quotes and Key Statements

"lightweight, flexible design" — Why Facebook chose ExaBGP

"announces the virtual IP addresses that the L4LB is responsible for to the world by peering with the network element (typically a switch) in front of the L4LB" — ExaBGP's role

"The switch then uses an equal-cost multipath (ECMP) mechanism to distribute packets among the L4LBs announcing the VIP" — ECMP distribution pattern

"simple APIs to examine the state of the L4LB and backend servers" — Control plane design

Documentation Recommendations

High Priority

  1. Create "Large-Scale L4 Load Balancing" Use Case

    • Document Katran architecture
    • ExaBGP + ECMP pattern
    • DSR with loopback VIPs
    • Health check integration
    • Real-world scale validation
  2. Add Facebook/Meta to Users Section

    • High-profile user
    • Validates ExaBGP at extreme scale
    • Open source project
  3. Update Load Balancing Documentation

    • Add ECMP section
    • Document stateless L4LB pattern
    • Include Katran as reference implementation

Medium Priority

  1. Integration Guide

    • File-based configuration patterns
    • Control plane design
    • Health check to BGP announcement flow
    • Graceful degradation
  2. Architecture Patterns

    • ExaBGP + ECMP + DSR
    • Horizontal scaling
    • Fault tolerance

Code Examples

  1. Example: Katran-style VIP Management
    • Configuration watcher
    • Health check integration
    • Dynamic announcement

Summary

Katran proves ExaBGP's value at hyperscale:

  • ✅ Billions of users served
  • ✅ Mission-critical production deployment
  • ✅ Selected specifically for "lightweight, flexible design"
  • ✅ Years of reliable operation
  • ✅ Open source for community validation

Key Pattern: ExaBGP + ECMP + DSR for stateless, horizontally-scalable L4 load balancing

Documentation Impact: This is a major reference implementation that validates ExaBGP's production readiness at the highest scale.


Related: Facebook's DHCPLB Experience

Source

https://engineering.fb.com/2019/05/28/data-infrastructure/dhcplb-server/ (May 2019)

Initial ExaBGP + Anycast Approach

Facebook initially used ExaBGP for DHCP server load balancing:

"each Kea server advertise a global Anycast IP using ExaBGP"

Architecture:

  • Multiple DHCP servers (Kea)
  • Each server runs ExaBGP
  • Announces same anycast IP via BGP
  • Network uses ECMP to distribute requests

Limitations Discovered

Problem: Traffic was not evenly balanced across DHCP servers.

Root Cause:

"BGP/ECMP cares only about the topology of the network it is operating on and the number of hops between the source and destination; it does not take into account the number of requests hitting each path."

Key Insight: ECMP distributes based on:

  • Network topology
  • Path costs
  • Hop counts

NOT based on:

  • Request rates
  • Server load
  • Connection counts

Solution: DHCPLB

Facebook built DHCPLB as "a relay that balances incoming requests across a list of DHCP servers" with:

  • Application-layer load balancing
  • Traffic pool management (stable vs canary)
  • A/B testing capabilities
  • Even request distribution

Performance: Handled "the same volume of traffic with 10 times fewer servers"

Lessons Learned

When BGP/ECMP + ExaBGP Works Well:

  • ✅ Stateless services (like Katran's L4LB)
  • ✅ Long-lived connections (flow-based distribution OK)
  • ✅ Geographically distributed (topology matters)
  • ✅ Consistent hashing sufficient

When Application-Layer LB Needed:

  • ❌ Request-based distribution required
  • ❌ Uneven request patterns
  • ❌ Short-lived connections (many new flows)
  • ❌ A/B testing and canary deployments
  • ❌ Load-aware distribution

Documentation Implications

This demonstrates when NOT to use ExaBGP + ECMP alone:

  • DHCP servers (short requests, need even distribution)
  • RPC servers (request-level balancing)
  • Any service requiring load-aware distribution

Solution: Combine approaches:

  • ExaBGP announces service IP
  • Application-layer LB (like DHCPLB, HAProxy) receives traffic
  • LB distributes to backends based on load

This is similar to Vincent Bernat's multi-tier architecture (Tier 1: ExaBGP+ECMP, Tier 2: IPVS/HAProxy).


For Documentation Writers: Use Katran as the premier example of:

  1. Large-scale production deployment
  2. ExaBGP + ECMP architecture
  3. Stateless load balancing design
  4. Health check integration
  5. Horizontal scalability
  6. Why companies choose ExaBGP ("lightweight, flexible")

Status: Critical reference implementation — must be prominently featured in documentation.
