Facebook Katran
Source: https://engineering.fb.com/2018/05/22/open-sourcing-katran-a-scalable-network-load-balancer/
Company: Facebook/Meta (now open-source)
Date: May 2018
Scale: Serving billions of users globally
Katran is Meta's scalable network load balancer (L4LB) that uses ExaBGP for BGP-based service announcement. This represents one of the largest-scale production deployments of ExaBGP in the world.
Facebook/Meta selected ExaBGP specifically for its "lightweight, flexible design".
Key Decision Factors:
- Lightweight resource footprint
- Flexible API for dynamic control
- Simple STDIN/STDOUT integration
- Proven reliability at scale
- Easy integration with custom control planes
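The STDIN/STDOUT integration works through ExaBGP's process API: ExaBGP runs the control plane as a child process and reads announcement commands from its standard output. A minimal configuration sketch (the process name, script path, ASNs, and addresses here are illustrative, not taken from Katran):

```
# Illustrative ExaBGP configuration: the controller script is run as a
# child process and drives announcements by writing text commands.
process katran-controller {
    run /usr/local/bin/katran-vip-controller;
    encoder text;
}

neighbor 192.0.2.1 {                 # top-of-rack switch
    router-id 192.0.2.10;
    local-address 192.0.2.10;
    local-as 65010;
    peer-as 65000;

    api {
        processes [ katran-controller ];
    }
}
```

With this arrangement, the controller never speaks BGP itself; it only emits `announce route` / `withdraw route` lines, and ExaBGP handles session management and protocol details.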
```
┌─────────────────────────────────────────────────────────┐
│                   Katran L4LB Stack                     │
├─────────────────────────────────────────────────────────┤
│   Control Plane (Health Checks, Config Management)      │
│   Data Plane (XDP/eBPF Packet Processing)               │
│   BGP Announcement (ExaBGP)                             │
└─────────────────────────────────────────────────────────┘
                          │ BGP Peering
                          ↓
┌─────────────────────────────────────────────────────────┐
│              Network Switch (Top-of-Rack)               │
│                      ECMP Routing                       │
└─────────────────────────────────────────────────────────┘
                          │ ECMP Distribution
               ┌──────────┼──────────┐
               ↓          ↓          ↓
          [Katran 1] [Katran 2] [Katran 3] ... [Katran N]
               │          │          │
               └──────────┴──────────┘
                          │ DSR Mode
                          ↓
          [Backend Servers with VIPs on Loopback]
```
Service Announcement:
"This component simply announces the virtual IP addresses that the L4LB is responsible for to the world by peering with the network element (typically a switch) in front of the L4LB."
How It Works:
- Katran control plane determines which VIPs to serve
- ExaBGP announces VIPs to network switches via BGP
- Switches use ECMP to distribute traffic across announcing L4LBs
- Each L4LB independently selects backend using consistent hashing
- Backends respond directly to clients (DSR)
"The switch then uses an equal-cost multipath (ECMP) mechanism to distribute packets among the L4LBs announcing the VIP."
Benefits:
- Stateless distribution at network layer
- Scales horizontally by adding L4LB instances
- No state synchronization required between L4LBs
- Hardware-speed packet distribution
Consistent Hashing: Each L4LB independently selects backends using hash of packet 5-tuple:
- Source IP
- Source Port
- Destination IP (VIP)
- Destination Port
- Protocol
This ensures same backend selection across all L4LBs for a given flow.
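The determinism property can be sketched in a few lines of Python. This is a plain hash-mod illustration only: Katran's real data plane implements consistent hashing in eBPF, which also minimizes flow remapping when the backend pool changes, whereas hash-mod remaps many flows on any pool change. The addresses below are illustrative.

```python
import hashlib

def select_backend(src_ip, src_port, dst_ip, dst_port, proto, backends):
    # Hash the packet 5-tuple. Any L4LB computing the same hash over
    # the same backend list picks the same backend for a given flow,
    # so no state synchronization between L4LBs is needed.
    key = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}:{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]

backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
# The same flow maps to the same backend on every L4LB instance.
first = select_backend("198.51.100.7", 41000, "203.0.113.10", 443, "tcp", backends)
second = select_backend("198.51.100.7", 41000, "203.0.113.10", 443, "tcp", backends)
assert first == second
```

Because the selection is a pure function of the 5-tuple and the backend list, ECMP can reshuffle flows among L4LBs mid-connection and packets still reach the same backend.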
Architecture:
- Backend servers configure VIPs on loopback interfaces
- L4LB forwards packets with original VIP as destination
- Backend responds directly to client
- Return traffic bypasses L4LB entirely
Advantages:
- L4LB handles inbound traffic only; return traffic bypasses it entirely (roughly halving the packets it must process)
- Scales to high bandwidth workloads
- Lower latency for responses
- Higher aggregate throughput
Responsibilities:
- Health checking backend servers
- Managing VIP-to-backend mappings
- Configuration file management
- API for state examination
- ExaBGP control (announce/withdraw VIPs)
APIs:
"simple APIs to examine the state of the L4LB and backend servers"
Configuration:
- File-based configuration
- Dynamic add/remove of VIPs
- Dynamic add/remove of backends
- Health check policies
Technology:
- XDP (eXpress Data Path) for packet processing
- eBPF programs in kernel
- Hardware offload capable
- Line-rate packet forwarding
Not ExaBGP-related but completes the picture: Katran uses cutting-edge Linux kernel technology for high-performance packet processing.
```python
# Pseudo-code for Katran's ExaBGP integration
import sys


class KatranExaBGPController:
    def __init__(self):
        self.announced_vips = set()

    def announce_vip(self, vip):
        """Announce a VIP via ExaBGP."""
        if vip not in self.announced_vips:
            # Write an announce command to ExaBGP's stdin
            print(f"announce route {vip}/32 next-hop self")
            sys.stdout.flush()
            self.announced_vips.add(vip)

    def withdraw_vip(self, vip):
        """Withdraw a VIP via ExaBGP."""
        if vip in self.announced_vips:
            print(f"withdraw route {vip}/32")
            sys.stdout.flush()
            self.announced_vips.remove(vip)

    def update_vips(self, desired_vips):
        """Update announced VIPs based on health and config."""
        current = set(self.announced_vips)  # snapshot before mutating
        desired = set(desired_vips)
        # Announce new VIPs
        for vip in desired - current:
            self.announce_vip(vip)
        # Withdraw removed VIPs
        for vip in current - desired:
            self.withdraw_vip(vip)
```

```python
# Simplified health check → BGP announcement flow
def health_check_loop():
    while True:
        # Check backend health
        healthy_backends = check_all_backends()
        # Determine which VIPs can be served
        servable_vips = get_servable_vips(healthy_backends)
        # Update ExaBGP announcements
        exabgp_controller.update_vips(servable_vips)
        sleep(health_check_interval)
```

Logic:
- Health check all configured backends
- Determine which VIPs have sufficient healthy backends
- Announce VIPs that can be served
- Withdraw VIPs that cannot be served (all backends down)
```python
# Configuration file-based VIP management
def config_watcher():
    """Watch the config file for VIP changes."""
    while True:
        config = load_config('/etc/katran/vips.conf')
        # Parse VIP configurations
        configured_vips = parse_vip_config(config)
        # Update announcements based on health and config
        update_announcements(configured_vips)
        sleep(config_check_interval)
```

- Deployment: Global Facebook/Meta infrastructure
- Traffic: Billions of requests per second aggregate
- VIPs: Thousands of virtual IPs announced
- L4LBs: Hundreds or thousands of Katran instances
- Lightweight: Minimal CPU/memory overhead
- Fast convergence: BGP updates propagate quickly
- Reliable: No reported ExaBGP-related outages at scale
- Efficient: Handles thousands of route announcements
- Hardware-speed packet distribution
- Consistent hashing ensures flow stability
- Scales linearly with additional L4LB instances
- No state synchronization overhead
Add more L4LB instances → ExaBGP announces from new instances → ECMP distributes traffic
- L4LB failure → Switch detects BGP session down → ECMP reroutes to remaining L4LBs
- Backend failure → Health checks detect → L4LB continues serving with remaining backends
- No single point of failure
- File-based configuration
- Simple BGP announcement (ExaBGP handles protocol complexity)
- No complex state management
- Easy to add/remove capacity
- Commodity servers instead of expensive hardware load balancers
- Open source software (Katran + ExaBGP)
- Scales with general-purpose Linux infrastructure
- XDP/eBPF data plane for high throughput
- DSR for efficient return path
- ECMP for hardware-speed distribution
- ExaBGP's lightweight footprint
| Aspect | Katran + ExaBGP | Traditional Hardware LB |
|---|---|---|
| Cost | Commodity servers | Expensive appliances |
| Scalability | Horizontal (add more) | Vertical (limited) |
| State | Stateless | Stateful (complex sync) |
| Performance | Line rate (XDP) | Hardware limited |
| Flexibility | Software-defined | Vendor-locked |
| BGP | ExaBGP (flexible) | Proprietary |
| Open Source | Yes | No |
Katran proves ExaBGP works at massive scale:
- Billions of users
- Global deployment
- Mission-critical infrastructure
- Years of production use
This is a major use case to document:
- ExaBGP + ECMP for stateless load distribution
- DSR architecture with VIPs on loopback
- Health check integration
- Dynamic VIP management
- Horizontal scalability pattern
Standard pattern:
Control Plane (Config + Health Checks) → ExaBGP (Announcements) → Network (ECMP) → Backends
- Lightweight: ExaBGP's minimal overhead is critical at scale
- Flexible: STDIN/STDOUT API enables easy integration
- Reliable: BGP provides proven control plane
- Simple: Offload BGP complexity to ExaBGP
- Stateless L4LBs (no state sync)
- Consistent hashing (flow stability)
- ECMP distribution (hardware speed)
- DSR (efficiency)
- ExaBGP (dynamic announcements)
- Each Katran instance peers with local switch (eBGP or iBGP)
- Announces /32 VIPs
- Switch sees equal-cost paths from multiple L4LBs
- ECMP distributes traffic
L4LB Failure:
- BGP session drops (hold timer expiry)
- Switch removes routes from failed L4LB
- ECMP recalculates distribution among remaining L4LBs
- Existing flows on failed L4LB break, new flows go to healthy L4LBs
Backend Failure:
- Health check detects failure
- L4LB updates backend pool (removes failed backend)
- New flows distributed to healthy backends
- L4LB continues serving VIP if any backends remain
All Backends Down:
- Health checks detect all backends down
- L4LB withdraws VIP announcement via ExaBGP
- Switch stops routing to this L4LB for this VIP
- Other L4LBs continue serving VIP (if they have healthy backends)
L4LB Recovery:
- L4LB starts, establishes BGP session
- Health checks run
- ExaBGP announces VIPs with healthy backends
- Switch adds L4LB to ECMP pool
- Traffic starts flowing
Backend Recovery:
- Health check detects backend recovery
- L4LB adds backend to pool
- New flows distributed to recovered backend
- If VIP was withdrawn (all backends were down), re-announce via ExaBGP
Katran is open source:
- GitHub: https://github.com/facebookincubator/katran
- Uses ExaBGP as dependency
- Production-tested at Facebook/Meta scale
- Available for community use
"lightweight, flexible design" — Why Facebook chose ExaBGP
"announces the virtual IP addresses that the L4LB is responsible for to the world by peering with the network element (typically a switch) in front of the L4LB" — ExaBGP's role
"The switch then uses an equal-cost multipath (ECMP) mechanism to distribute packets among the L4LBs announcing the VIP" — ECMP distribution pattern
"simple APIs to examine the state of the L4LB and backend servers" — Control plane design
Create "Large-Scale L4 Load Balancing" Use Case:
- Document Katran architecture
- ExaBGP + ECMP pattern
- DSR with loopback VIPs
- Health check integration
- Real-world scale validation
Add Facebook/Meta to Users Section:
- High-profile user
- Validates ExaBGP at extreme scale
- Open source project
Update Load Balancing Documentation:
- Add ECMP section
- Document stateless L4LB pattern
- Include Katran as reference implementation
Integration Guide:
- File-based configuration patterns
- Control plane design
- Health check to BGP announcement flow
- Graceful degradation
Architecture Patterns:
- ExaBGP + ECMP + DSR
- Horizontal scaling
- Fault tolerance
Example: Katran-style VIP Management
- Configuration watcher
- Health check integration
- Dynamic announcement
Katran proves ExaBGP's value at hyperscale:
- ✅ Billions of users served
- ✅ Mission-critical production deployment
- ✅ Selected specifically for "lightweight, flexible design"
- ✅ Years of reliable operation
- ✅ Open source for community validation
Key Pattern: ExaBGP + ECMP + DSR for stateless, horizontally-scalable L4 load balancing
Documentation Impact: This is a major reference implementation that validates ExaBGP's production readiness at the highest scale.
Source: https://engineering.fb.com/2019/05/28/data-infrastructure/dhcplb-server/ (May 2019)
Facebook initially used ExaBGP for DHCP server load balancing:
"each Kea server advertise a global Anycast IP using ExaBGP"
Architecture:
- Multiple DHCP servers (Kea)
- Each server runs ExaBGP
- Announces same anycast IP via BGP
- Network uses ECMP to distribute requests
Problem: Traffic was not evenly balanced across DHCP servers.
Root Cause:
"BGP/ECMP cares only about the topology of the network it is operating on and the number of hops between the source and destination; it does not take into account the number of requests hitting each path."
Key Insight: ECMP distributes based on:
- Network topology
- Path costs
- Hop counts
NOT based on:
- Request rates
- Server load
- Connection counts
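The distinction above can be seen in a small simulation (the flows and servers are hypothetical, not Facebook data): hash-based ECMP spreads *flows* across paths, but it is blind to how many requests each flow carries, so one heavy flow skews per-server *request* load.

```python
import hashlib
from collections import Counter

def ecmp_pick(flow_id, paths):
    # Hash-based ECMP: the path choice depends only on the flow
    # identifier, never on request rate or server load.
    h = int(hashlib.sha256(flow_id.encode()).hexdigest(), 16)
    return paths[h % len(paths)]

paths = ["server-a", "server-b", "server-c"]
# Ten flows: nine small ones, plus one "elephant" carrying 1000 requests.
flows = {f"flow-{i}": 10 for i in range(9)}
flows["flow-elephant"] = 1000

requests = Counter()
for flow_id, count in flows.items():
    requests[ecmp_pick(flow_id, paths)] += count

# Whichever server the elephant flow hashed to now carries the vast
# majority of requests, even though flows were spread across paths.
print(requests)
```

This is exactly the DHCP case: many short, uneven request streams need request-level balancing, which ECMP alone cannot provide.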
Facebook built DHCPLB as "a relay that balances incoming requests across a list of DHCP servers" with:
- Application-layer load balancing
- Traffic pool management (stable vs canary)
- A/B testing capabilities
- Even request distribution
Performance: Handled "the same volume of traffic with 10 times fewer servers"
When BGP/ECMP + ExaBGP Works Well:
- ✅ Stateless services (like Katran's L4LB)
- ✅ Long-lived connections (flow-based distribution OK)
- ✅ Geographically distributed (topology matters)
- ✅ Consistent hashing sufficient
When Application-Layer LB Needed:
- ❌ Request-based distribution required
- ❌ Uneven request patterns
- ❌ Short-lived connections (many new flows)
- ❌ A/B testing and canary deployments
- ❌ Load-aware distribution
This demonstrates when NOT to use ExaBGP + ECMP alone:
- DHCP servers (short requests, need even distribution)
- RPC servers (request-level balancing)
- Any service requiring load-aware distribution
Solution: Combine approaches:
- ExaBGP announces service IP
- Application-layer LB (like DHCPLB, HAProxy) receives traffic
- LB distributes to backends based on load
This is similar to Vincent Bernat's multi-tier architecture (Tier 1: ExaBGP+ECMP, Tier 2: IPVS/HAProxy).
For Documentation Writers: Use Katran as the premier example of:
- Large-scale production deployment
- ExaBGP + ECMP architecture
- Stateless load balancing design
- Health check integration
- Horizontal scalability
- Why companies choose ExaBGP ("lightweight, flexible")
Status: Critical reference implementation — must be prominently featured in documentation.