Anycast Management
Automated anycast network control for high availability and geographic distribution
🌐 ExaBGP enables application-driven anycast - your services control their own routing
- Overview
- How Anycast Works
- Why ExaBGP for Anycast?
- Architecture Patterns
- Service IP Management
- Health Check Integration
- Common Use Cases
- Implementation Guide
- Failover Scenarios
- Best Practices
- Troubleshooting
- Real-World Examples
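Anycast is a network addressing method in which the same IP address is announced from multiple locations. The network routes each client to the nearest instance (as determined by BGP best-path selection), and health checks ensure that only healthy instances keep announcing.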
Traditional Unicast:
Client → Single Server IP (100.10.0.100)
One location, single point of failure
Anycast:
Client → Same IP (100.10.0.100) announced from multiple locations
├─ Location A (if A is closest/healthy)
├─ Location B (if B is closest/healthy)
└─ Location C (if C is closest/healthy)
ExaBGP's Role:
- Services announce their availability via BGP
- Health checks control announcements
- Automatic failover when services fail
- No manual intervention required
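In practice, the role described above comes down to a helper process writing text commands to its standard output, which ExaBGP reads. The following is a minimal sketch, assuming the text encoder and the `announce route` / `withdraw route` command forms used throughout this page. The function names `format_command` and `send` are illustrative and not part of ExaBGP.

```python
import sys

def format_command(action: str, ip: str) -> str:
    """Build one text command line for a /32 service IP."""
    if action not in ("announce", "withdraw"):
        raise ValueError(f"unsupported action: {action}")
    return f"{action} route {ip}/32 next-hop self\n"

def send(action: str, ip: str, out=sys.stdout) -> None:
    """Write the command and flush so it is not held in a buffer."""
    out.write(format_command(action, ip))
    out.flush()
```

Flushing after every write matters because stdout is a pipe here, and a buffered command would delay the announcement or withdrawal.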
1. Service IPs Configured
# On each server (loopback interface)
ip addr add 100.10.0.100/32 dev lo
2. ExaBGP Announces Availability
# When service is healthy
announce route 100.10.0.100/32 next-hop self
3. Network Routes Traffic
Router receives announcements from multiple locations
→ BGP best path selection (closest/lowest metric wins)
→ Traffic routed to nearest healthy instance
4. Automatic Failover
# When service fails health check
withdraw route 100.10.0.100/32
Network converges to remaining healthy instances
→ Traffic automatically rerouted
→ Users experience minimal disruption (BGP convergence ~5-15 seconds)
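The four steps above can be pictured with a toy model of one router's view. This is a deliberately simplified sketch: real BGP best-path selection weighs several attributes (local preference, AS path length, MED, IGP cost), whereas here a single hypothetical metric per location stands in for all of them. The location names and metric values are made up for illustration.

```python
def best_path(paths):
    """Return the location with the lowest metric, or None if no paths remain."""
    if not paths:
        return None
    return min(paths, key=paths.get)

# Hypothetical metrics as seen from one router (lower is preferred)
paths = {"location-a": 10, "location-b": 20, "location-c": 30}
before = best_path(paths)   # location-a carries the traffic

# Location A fails its health check and withdraws its route
paths.pop("location-a")
after = best_path(paths)    # traffic moves to location-b
```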
vs. Hardware Load Balancers:
- ✅ No single point of failure: Distributed across all nodes
- ✅ Lower cost: No expensive load balancer hardware
- ✅ Geographic distribution: Works across data centers/regions
- ✅ Application-aware: Services control their own routing
vs. DNS-based Failover:
- ✅ Faster failover: BGP convergence (~5-15 sec) vs. DNS TTL (minutes)
- ✅ Automatic: No DNS record updates required
- ✅ Client-agnostic: Works regardless of DNS caching
- ✅ Connection-level: Connections to healthy nodes are unaffected during failover
vs. Static BGP Announcements:
- ✅ Health-aware: Automatic withdrawal when service fails
- ✅ Dynamic: Announce/withdraw based on real-time service state
- ✅ Flexible: Custom health check logic
- ✅ Application-driven: Service determines when it's ready
Simple deployment - each server peers directly with network:
┌──────────────────┐ ┌──────────────────┐
│ Server 1 │ │ Server 2 │
│ ┌──────────────┐ │ │ ┌──────────────┐ │
│ │ Service │ │ │ │ Service │ │
│ │ 100.10.0.100 │ │ │ │ 100.10.0.100 │ │
│ └──────┬───────┘ │ │ └──────┬───────┘ │
│ ┌──────┴───────┐ │ │ ┌──────┴───────┐ │
│ │ ExaBGP │◄┼─────────┼─┤ ExaBGP │ │
│ └──────┬───────┘ │ iBGP │ └──────┬───────┘ │
└────────┼─────────┘ └────────┼─────────┘
│ │
└────────────┬───────────────┘
│
▼
┌──────────────┐
│ Edge Router │
└──────────────┘
│
▼
Client Traffic
Characteristics:
- Simple topology
- Each server announces its own service IP
- Edge router sees multiple paths, chooses best
- Good for small deployments (< 50 servers)
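With direct peering, every server needs its own ExaBGP neighbor block that differs only in the local address. A small generator can keep those blocks consistent. The sketch below assumes an iBGP setup (same AS on both sides) and uses only directives that appear in this page's configuration example; `render_neighbor` and its parameters are hypothetical names.

```python
def render_neighbor(peer_ip: str, local_ip: str, asn: int,
                    process: str = "healthcheck") -> str:
    """Render one neighbor block for a server peering with the edge router."""
    lines = [
        f"neighbor {peer_ip} {{",
        f"    router-id {local_ip};",
        f"    local-address {local_ip};",
        f"    local-as {asn};",
        f"    peer-as {asn};",   # assumption: iBGP, so both AS numbers match
        "    family {",
        "        ipv4 unicast;",
        "    }",
        "    api {",
        f"        processes [ {process} ];",
        "    }",
        "}",
    ]
    return "\n".join(lines) + "\n"

config = render_neighbor("192.168.1.1", "192.168.1.10", 65001)
```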
Scalable deployment with route reflectors:
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│Server 1 │ │Server 2 │ │Server 3 │ │Server N │
│ExaBGP │ │ExaBGP │ │ExaBGP │ │ExaBGP │
└────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘
│ │ │ │
└────────────┴────────────┴────────────┘
│ iBGP
▼
┌──────────────┐
│ Route Server │ (Route Reflector)
│ (Redundant) │
└──────┬───────┘
│
┌───────────┼───────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Edge │ │ Edge │ │ Edge │
│ Router │ │ Router │ │ Router │
└─────────┘ └─────────┘ └─────────┘
Benefits:
- Scales to thousands of servers
- Reduced BGP session count
- Centralized policy control
- Standard BGP best practice
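The session-count benefit can be made concrete with some arithmetic. This sketch assumes that in the direct model every server peers with every edge router, and that in the reflector model every server and every edge router peers with every route reflector. The numbers in the example are illustrative only.

```python
def session_counts(servers: int, edge_routers: int, reflectors: int) -> dict:
    """Compare BGP session counts for direct peering and route reflectors."""
    return {
        # Direct: each server holds one session per edge router
        "direct_total": servers * edge_routers,
        "direct_per_edge_router": servers,
        # Reflected: servers and edge routers only talk to the reflectors
        "reflected_total": servers * reflectors + reflectors * edge_routers,
        "reflected_per_edge_router": reflectors,
    }

counts = session_counts(servers=1000, edge_routers=3, reflectors=2)
```

Under these assumptions, the largest saving is on the edge routers, which hold one session per reflector instead of one per server.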
Global anycast across multiple regions:
Region A (US-East) Region B (EU-West)
┌──────────────┐ ┌──────────────┐
│ Servers │ │ Servers │
│ ExaBGP │ │ ExaBGP │
│ 100.10.0.100 │ │ 100.10.0.100 │
└──────┬───────┘ └──────┬───────┘
│ │
▼ ▼
┌──────────────┐ ┌──────────────┐
│ Route Server │ │ Route Server │
│ US-East │ │ EU-West │
└──────┬───────┘ └──────┬───────┘
│ │
└──────────┬─────────────────┘
│
▼
┌──────────────┐
│ Global │
│ Backbone │
└──────────────┘
Clients routed to nearest region automatically
Use cases:
- CDN-style content delivery
- Global DNS anycast
- Latency-sensitive applications
- Geographic redundancy
Configure service IPs on loopback interface (all servers):
Linux:
# Add service IP to loopback
ip addr add 100.10.0.100/32 dev lo
# Make persistent (Ubuntu/Debian - /etc/network/interfaces)
auto lo:0
iface lo:0 inet static
address 100.10.0.100
netmask 255.255.255.255
# Or systemd-networkd (/etc/systemd/network/10-lo0.network)
[Match]
Name=lo
[Address]
Address=100.10.0.100/32
FreeBSD:
# /etc/rc.conf
ifconfig_lo0_alias0="inet 100.10.0.100 netmask 255.255.255.255"
Announce multiple service IPs per server:
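#!/usr/bin/env python3
"""
Multi-service anycast announcements
"""
import socket
import sys
import time

SERVICES = {
    '100.10.0.100': {'port': 80, 'name': 'web'},
    '100.10.0.101': {'port': 53, 'name': 'dns'},
    '100.10.0.102': {'port': 3306, 'name': 'db'},
}

def is_service_healthy(ip, port):
    """Check if service is healthy (plain TCP connect; replace with your own logic)"""
    try:
        with socket.create_connection((ip, port), timeout=2):
            return True
    except OSError:
        return False

# Brief startup delay before sending commands
time.sleep(2)

# Track which service IPs are currently announced
announced = {service_ip: False for service_ip in SERVICES}

# Monitor and update
while True:
    for service_ip, config in SERVICES.items():
        healthy = is_service_healthy(service_ip, config['port'])
        # Announce or withdraw based on health, only on a state change
        if healthy and not announced[service_ip]:
            sys.stdout.write(f"announce route {service_ip}/32 next-hop self\n")
            sys.stdout.flush()
            sys.stderr.write(f"[ANNOUNCE] {config['name']} at {service_ip}\n")
            announced[service_ip] = True
        elif not healthy and announced[service_ip]:
            sys.stdout.write(f"withdraw route {service_ip}/32 next-hop self\n")
            sys.stdout.flush()
            sys.stderr.write(f"[WITHDRAW] {config['name']} at {service_ip}\n")
            announced[service_ip] = False
    time.sleep(5)
Simple TCP port check: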
#!/usr/bin/env python3
"""
Basic anycast with TCP health check
"""
import sys
import time
import socket

SERVICE_IP = "100.10.0.100"
SERVICE_PORT = 80
CHECK_INTERVAL = 5

def is_service_healthy():
    """TCP health check"""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(2)
            # connect_ex returns 0 on success and an errno value otherwise
            return sock.connect_ex(('127.0.0.1', SERVICE_PORT)) == 0
    except OSError:
        return False

time.sleep(2)
announced = False

while True:
    healthy = is_service_healthy()
    if healthy and not announced:
        # Service is up, announce route
        sys.stdout.write(f"announce route {SERVICE_IP}/32 next-hop self\n")
        sys.stdout.flush()
        sys.stderr.write(f"[ANNOUNCE] Service healthy, announcing {SERVICE_IP}\n")
        announced = True
    elif not healthy and announced:
        # Service is down, withdraw route
        sys.stdout.write(f"withdraw route {SERVICE_IP}/32 next-hop self\n")
        sys.stdout.flush()
        sys.stderr.write(f"[WITHDRAW] Service unhealthy, withdrawing {SERVICE_IP}\n")
        announced = False
    time.sleep(CHECK_INTERVAL)
Check HTTP endpoint:
#!/usr/bin/env python3
import sys
import time
import urllib.request

SERVICE_IP = "100.10.0.100"
HEALTH_URL = "http://127.0.0.1:80/health"

def is_service_healthy():
    """HTTP health check"""
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=2) as response:
            return response.getcode() == 200
    except Exception:
        # Connection errors, timeouts, and HTTP error responses count as unhealthy
        return False

time.sleep(2)
announced = False

while True:
    healthy = is_service_healthy()
    if healthy and not announced:
        sys.stdout.write(f"announce route {SERVICE_IP}/32 next-hop self\n")
        sys.stdout.flush()
        announced = True
    elif not healthy and announced:
        sys.stdout.write(f"withdraw route {SERVICE_IP}/32 next-hop self\n")
        sys.stdout.flush()
        announced = False
    time.sleep(5)
Prevent flapping with dampening:
#!/usr/bin/env python3
"""
Anycast with rise/fall dampening
Requires N consecutive successes to announce
Requires M consecutive failures to withdraw
"""
import sys
import time
import socket

SERVICE_IP = "100.10.0.100"
SERVICE_PORT = 80
RISE_COUNT = 3  # Consecutive successes to announce
FALL_COUNT = 2  # Consecutive failures to withdraw

rise_counter = 0
fall_counter = 0
announced = False

def is_service_healthy():
    """TCP health check"""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(2)
            return sock.connect_ex(('127.0.0.1', SERVICE_PORT)) == 0
    except OSError:
        return False

time.sleep(2)

while True:
    healthy = is_service_healthy()
    if healthy:
        rise_counter += 1
        fall_counter = 0
        # Announce after N consecutive successes
        if rise_counter >= RISE_COUNT and not announced:
            sys.stdout.write(f"announce route {SERVICE_IP}/32 next-hop self\n")
            sys.stdout.flush()
            sys.stderr.write(f"[ANNOUNCE] Service stable, announcing {SERVICE_IP}\n")
            announced = True
    else:
        fall_counter += 1
        rise_counter = 0
        # Withdraw after M consecutive failures
        if fall_counter >= FALL_COUNT and announced:
            sys.stdout.write(f"withdraw route {SERVICE_IP}/32 next-hop self\n")
            sys.stdout.flush()
            sys.stderr.write(f"[WITHDRAW] Service failed, withdrawing {SERVICE_IP}\n")
            announced = False
    time.sleep(5)
Gracefully drain traffic for maintenance:
#!/usr/bin/env python3
"""
Anycast with maintenance mode support
Touch /var/run/service-maintenance to enter maintenance
"""
import sys
import time
import socket
import os

SERVICE_IP = "100.10.0.100"
MAINTENANCE_FILE = "/var/run/service-maintenance"

def is_maintenance_mode():
    """Check if maintenance mode enabled"""
    return os.path.exists(MAINTENANCE_FILE)

def is_service_healthy():
    """Health check"""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(2)
            return sock.connect_ex(('127.0.0.1', 80)) == 0
    except OSError:
        return False

time.sleep(2)
announced = False

while True:
    # Don't announce if in maintenance mode
    if is_maintenance_mode():
        if announced:
            sys.stdout.write(f"withdraw route {SERVICE_IP}/32 next-hop self\n")
            sys.stdout.flush()
            sys.stderr.write(f"[MAINTENANCE] Entering maintenance, withdrawing {SERVICE_IP}\n")
            announced = False
    else:
        healthy = is_service_healthy()
        if healthy and not announced:
            sys.stdout.write(f"announce route {SERVICE_IP}/32 next-hop self\n")
            sys.stdout.flush()
            announced = True
        elif not healthy and announced:
            sys.stdout.write(f"withdraw route {SERVICE_IP}/32 next-hop self\n")
            sys.stdout.flush()
            announced = False
    time.sleep(5)
Enter maintenance:
# Drain traffic from this server
touch /var/run/service-maintenance
# Perform maintenance
systemctl restart nginx
# Exit maintenance
rm /var/run/service-maintenance
Multiple DNS servers announce same IP:
#!/usr/bin/env python3
"""
Anycast DNS with health checking
"""
import sys
import time
import socket

DNS_IP = "100.10.0.53"
DNS_PORT = 53

def is_dns_healthy():
    """Check if DNS server responds"""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.settimeout(2)
        # Send DNS query for example.com (A record, class IN)
        query = b'\x00\x00\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x07example\x03com\x00\x00\x01\x00\x01'
        sock.sendto(query, ('127.0.0.1', DNS_PORT))
        # Expect response
        data, _ = sock.recvfrom(512)
        return len(data) > 0
    except OSError:
        # Includes timeouts and ICMP port-unreachable errors
        return False
    finally:
        sock.close()

time.sleep(2)
announced = False

while True:
    # Run the check once per cycle so both branches see the same result
    healthy = is_dns_healthy()
    if healthy and not announced:
        sys.stdout.write(f"announce route {DNS_IP}/32 next-hop self\n")
        sys.stdout.flush()
        announced = True
    elif not healthy and announced:
        sys.stdout.write(f"withdraw route {DNS_IP}/32 next-hop self\n")
        sys.stdout.flush()
        announced = False
    time.sleep(5)
Benefits:
- Geographic distribution (low latency)
- Automatic failover (no DNS changes)
- DDoS resilience (distributed)
- Simple scaling (add more servers)
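A DNS health check that only tests for a non-empty reply would also accept a SERVFAIL answer. A stricter variant could inspect the reply header. The sketch below is one possible way to do that, assuming a standard 12-byte DNS header; `build_query` and `is_valid_response` are hypothetical helper names, and the check deliberately ignores the answer section.

```python
import struct

def build_query(name: str, query_id: int) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def is_valid_response(data: bytes, query_id: int) -> bool:
    """Accept only a reply that matches our ID and reports RCODE 0 (no error)."""
    if len(data) < 12:
        return False
    resp_id, flags = struct.unpack("!HH", data[:4])
    is_response = bool(flags & 0x8000)   # QR bit
    rcode = flags & 0x000F               # low four bits of the flags word
    return resp_id == query_id and is_response and rcode == 0
```

Using a fresh `query_id` for each probe also guards against a late reply to an earlier probe being counted as a success.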
Load balancing across web servers:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Web Server 1 │ │ Web Server 2 │ │ Web Server 3 │
│ NGINX │ │ NGINX │ │ NGINX │
│ ExaBGP │ │ ExaBGP │ │ ExaBGP │
│ 100.10.0.80 │ │ 100.10.0.80 │ │ 100.10.0.80 │
└──────────────┘ └──────────────┘ └──────────────┘
│ │ │
└─────────────────┴─────────────────┘
│
▼
Client traffic distributed
Health check:
import urllib.request

def is_web_healthy():
    """Return True when the local /health endpoint answers with HTTP 200"""
    try:
        with urllib.request.urlopen('http://127.0.0.1/health', timeout=2) as response:
            return response.getcode() == 200
    except Exception:
        return False
Distribute read queries across replicas:
#!/usr/bin/env python3
"""
Anycast for database read replicas
Only announce when replication lag is acceptable
"""
import sys
import time
import psycopg2

DB_IP = "100.10.0.54"
MAX_LAG_SECONDS = 10

def get_replication_lag():
    """Check PostgreSQL replication lag"""
    try:
        conn = psycopg2.connect(
            host='127.0.0.1',
            database='postgres',
            user='monitor'
        )
        try:
            cursor = conn.cursor()
            # Check lag on replica
            cursor.execute("""
                SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))
            """)
            lag = cursor.fetchone()[0]
        finally:
            conn.close()
        # NULL (nothing replayed yet) is treated as zero lag
        return float(lag) if lag else 0.0
    except Exception:
        return float('inf')  # Infinite lag = unhealthy

time.sleep(2)
announced = False

while True:
    lag = get_replication_lag()
    healthy = lag < MAX_LAG_SECONDS
    if healthy and not announced:
        sys.stdout.write(f"announce route {DB_IP}/32 next-hop self\n")
        sys.stdout.flush()
        sys.stderr.write(f"[ANNOUNCE] Replication lag OK ({lag:.1f}s)\n")
        announced = True
    elif not healthy and announced:
        sys.stdout.write(f"withdraw route {DB_IP}/32 next-hop self\n")
        sys.stdout.flush()
        sys.stderr.write(f"[WITHDRAW] Replication lag too high ({lag:.1f}s)\n")
        announced = False
    time.sleep(10)
Content delivery network edge nodes:
User Request → Anycast IP (100.10.0.80)
→ Routed to nearest healthy edge node
→ Content served from cache
→ If cache miss, origin fetch
Features:
- Geographic distribution
- Automatic failover
- Cache server health awareness
- Origin shield integration
Decide on IP addressing:
Service          IP             Port
─────────────────────────────────────
Web              100.10.0.80    80/443
DNS              100.10.0.53    53
Database (RO)    100.10.0.54    5432
API              100.10.0.43    443
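Before rolling out an addressing plan, it can help to check it mechanically. Every entry has to be a valid IPv4 address, so each octet must be in the range 0-255 and an octet that mirrors a port number such as 5432 is rejected. A minimal sketch using the standard `ipaddress` module follows; `invalid_addresses` and the sample plan are illustrative.

```python
import ipaddress

def invalid_addresses(plan: dict) -> list:
    """Return the service names whose address is not a valid IPv4 address."""
    bad = []
    for name, address in plan.items():
        try:
            ipaddress.IPv4Address(address)
        except ipaddress.AddressValueError:
            bad.append(name)
    return bad

# Hypothetical plan: the second entry has an out-of-range last octet
sample_plan = {"web": "100.10.0.80", "db": "100.10.0.5432"}
problems = invalid_addresses(sample_plan)
```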
On all servers:
# Add service IP
ip addr add 100.10.0.80/32 dev lo
# Verify
ip addr show lo
Install ExaBGP:
pip install exabgp
Create config (/etc/exabgp/anycast.conf):
neighbor 192.168.1.1 {
    router-id 192.168.1.10;
    local-address 192.168.1.10;
    local-as 65001;
    peer-as 65001;
    family {
        ipv4 unicast;
    }
    api {
        processes [ healthcheck ];
    }
}
process healthcheck {
    run /etc/exabgp/healthcheck.py;
    encoder text;
}
Create the health check script (/etc/exabgp/healthcheck.py):
#!/usr/bin/env python3
import sys
import time
import socket

SERVICE_IP = "100.10.0.80"
SERVICE_PORT = 80

def is_healthy():
    """TCP health check against the local service"""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(2)
            return sock.connect_ex(('127.0.0.1', SERVICE_PORT)) == 0
    except OSError:
        return False

time.sleep(2)
announced = False

while True:
    # Run the check once per cycle so both branches see the same result
    healthy = is_healthy()
    if healthy and not announced:
        sys.stdout.write(f"announce route {SERVICE_IP}/32 next-hop self\n")
        sys.stdout.flush()
        announced = True
    elif not healthy and announced:
        sys.stdout.write(f"withdraw route {SERVICE_IP}/32 next-hop self\n")
        sys.stdout.flush()
        announced = False
    time.sleep(5)
Make the script executable:
chmod +x /etc/exabgp/healthcheck.py
Start ExaBGP:
# Foreground (testing)
exabgp /etc/exabgp/anycast.conf
# Background (production - systemd)
systemctl start exabgp
systemctl enable exabgp
On router:
# Cisco
show ip bgp 100.10.0.80
# Juniper
show route 100.10.0.80
Should see multiple paths (one per server)
Timeline:
T+0s : Service crashes on Server 1
T+0-5s : Health check detects failure
T+5s : ExaBGP withdraws route
T+5-20s: BGP convergence
T+20s : All traffic routed to Server 2
User impact: Active connections to Server 1 fail, new connections go to Server 2
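The timeline is the sum of detection time and BGP convergence time, so it shifts as soon as rise/fall dampening is used. The sketch below makes that arithmetic explicit. It assumes the health check itself takes negligible time (the check timeout is ignored) and uses the timeline's convergence window of up to 15 seconds; `worst_case_failover` is a hypothetical helper.

```python
def worst_case_failover(check_interval: float, fall_count: int,
                        convergence: float) -> float:
    """Worst-case seconds from service failure to traffic fully rerouted."""
    # The failure must be observed on fall_count consecutive checks
    detection = check_interval * fall_count
    return detection + convergence

single_check = worst_case_failover(check_interval=5, fall_count=1, convergence=15)
dampened = worst_case_failover(check_interval=5, fall_count=2, convergence=15)
```

With a 5-second interval and a single failed check, this reproduces the 20 seconds in the timeline. Requiring two consecutive failures adds one more interval.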
Graceful drain:
# 1. Enter maintenance mode
touch /var/run/service-maintenance
# 2. ExaBGP withdraws route (existing connections continue)
# 3. Wait for connections to drain (check with: ss -tan | grep :80 | wc -l)
# 4. Perform maintenance
systemctl restart nginx
# 5. Exit maintenance
rm /var/run/service-maintenance
# 6. Service announces again, receives new traffic
Split-brain scenario:
Data Center A <--X--> Data Center B
(Servers 1-2) (Servers 3-4)
Result:
- Clients in region A see Servers 1-2
- Clients in region B see Servers 3-4
- Each region has local service (good!)
- But services are partitioned (application-dependent)
Mitigation:
- Use external quorum/consensus service
- Monitor for partitions
- Alert operations team
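One way to act on a detected partition is to tie the announcement to a majority rule. The sketch below is a simplified stand-in for an external quorum service: it only counts how many peer servers a node can currently reach, and how that reachability is measured is left open. The names `has_quorum` and `should_announce` are hypothetical.

```python
def has_quorum(reachable_peers: int, total_peers: int) -> bool:
    """True when this node plus its reachable peers form a strict majority."""
    cluster_size = total_peers + 1          # peers plus this node
    return (reachable_peers + 1) > cluster_size / 2

def should_announce(service_healthy: bool, reachable_peers: int,
                    total_peers: int) -> bool:
    """Announce only when the service is healthy and the node is in the majority."""
    return service_healthy and has_quorum(reachable_peers, total_peers)
```

Note that in the even 2/2 split shown above, neither side holds a strict majority, so this rule would withdraw the route everywhere. Whether that is preferable to serving from both partitions depends on the application.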
Always use /32 for service IPs:
# ✅ Correct
ip addr add 100.10.0.80/32 dev lo
# ❌ Wrong
ip addr add 100.10.0.80/24 dev lo # Too broad!
Why: A /32 limits the address and its announcement to the single service IP, which prevents unintended routing conflicts
Prevent route flapping:
RISE_COUNT = 3 # 3 consecutive successes
FALL_COUNT = 2 # 2 consecutive failures
Ensure BGP sessions stay up:
import subprocess

def check_bgp_session():
    """Check that an ExaBGP process is running (this does not inspect BGP session state)"""
    # pgrep exits with 0 when at least one process matches.
    # Note: -f matches full command lines, so a script under /etc/exabgp/ also matches.
    result = subprocess.run(['pgrep', '-f', 'exabgp'], capture_output=True)
    return result.returncode == 0
Log route changes:
import logging
import sys

logging.basicConfig(filename='/var/log/exabgp-anycast.log', level=logging.INFO)

def announce(ip):
    """Announce the route and record the change in the log file"""
    sys.stdout.write(f"announce route {ip}/32 next-hop self\n")
    sys.stdout.flush()
    logging.info(f"ANNOUNCE: {ip}")
Schedule failover tests:
# Simulate failure
systemctl stop nginx
# Wait for withdrawal
sleep 30
# Verify traffic rerouted
# Restart service
systemctl start nginx
Symptoms: Service healthy but route not visible on router
Check:
# 1. Verify ExaBGP running
ps aux | grep exabgp
# 2. Check BGP session
# Look for "neighbor 192.168.1.1 up" in logs
# 3. Verify health check script running
ps aux | grep healthcheck.py
# 4. Test health check manually
/etc/exabgp/healthcheck.py
Common causes:
- ExaBGP not running
- BGP session down
- Health check script not executable
- Service IP not on loopback
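Two of these causes can be checked from a script on the server itself. The following sketch assumes a Linux host where the output of `ip addr show lo` is passed in as text; the helper names are hypothetical, and the BGP session state still has to be checked in the ExaBGP logs or on the router.

```python
import os

def script_is_executable(path: str) -> bool:
    """True when the health check script exists and has an execute bit we can use."""
    return os.path.isfile(path) and os.access(path, os.X_OK)

def ip_on_loopback(service_ip: str, ip_addr_output: str) -> bool:
    """Look for an 'inet <ip>/32' entry in the output of `ip addr show lo`."""
    wanted = f"{service_ip}/32"
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # Address lines look like: inet 100.10.0.80/32 scope global lo
        if len(fields) >= 2 and fields[0] == "inet" and fields[1] == wanted:
            return True
    return False
```

The output text could be captured with `subprocess.run(["ip", "addr", "show", "lo"], capture_output=True, text=True).stdout`.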
Symptoms: Route announced/withdrawn repeatedly
Diagnosis:
# Monitor route changes (router CLI, Cisco syntax; re-run to see the path count change)
show ip bgp 100.10.0.80
Solutions:
- Implement rise/fall counters
- Increase health check interval
- Fix unstable service
- Adjust health check sensitivity
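To notice flapping from inside the health check script, the script can count its own announce/withdraw transitions over a sliding time window and raise an alert when the count is too high. This is a sketch of that idea, separate from the rise/fall counters; the `FlapDetector` class and its thresholds are illustrative.

```python
from collections import deque

class FlapDetector:
    """Count state changes inside a sliding time window."""

    def __init__(self, max_changes: int, window_seconds: float):
        self.max_changes = max_changes
        self.window_seconds = window_seconds
        self._changes = deque()

    def record_change(self, timestamp: float) -> None:
        """Call this each time a route is announced or withdrawn."""
        self._changes.append(timestamp)

    def is_flapping(self, now: float) -> bool:
        """True when more than max_changes happened within the window."""
        # Drop changes that have aged out of the window
        while self._changes and now - self._changes[0] > self.window_seconds:
            self._changes.popleft()
        return len(self._changes) > self.max_changes
```

In a health check loop, `time.monotonic()` is a reasonable source for the timestamps, since it is not affected by clock adjustments.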
Symptoms: All traffic goes to one server despite multiple announcements
Causes:
- Routers see different AS paths (prefer shorter)
- Different MED values
- Different router IDs
- ECMP not enabled on routers
Solutions:
# Ensure identical announcements from all servers
announce route 100.10.0.80/32 next-hop self
# Or use MED for preference
announce route 100.10.0.80/32 next-hop self med 100
Enable ECMP on routers:
# Cisco (for iBGP-learned paths, use: maximum-paths ibgp 4)
maximum-paths 4
# Juniper
set protocols bgp group anycast multipath
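Once ECMP is enabled, a router typically spreads traffic per flow rather than per packet, so that all packets of one connection reach the same server. The sketch below illustrates the idea with a hash over the flow 5-tuple. Real routers use vendor-specific hash functions and field selections, so the SHA-256 hash, the `pick_next_hop` name, and the addresses here are purely illustrative.

```python
import hashlib

def pick_next_hop(flow: tuple, next_hops: list) -> str:
    """Map a flow (src IP, src port, dst IP, dst port, protocol) to one next hop."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Use the first four bytes of the digest as an index into the path list
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

servers = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
flow = ("198.51.100.7", 40000, "100.10.0.80", 80, "tcp")
chosen = pick_next_hop(flow, servers)
```

With a simple modulo scheme like this one, removing a next hop can remap flows that were on healthy servers as well, which is one reason established connections may reset when the set of announcing servers changes.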
Architecture:
- 30+ global DNS servers
- All announce same anycast IPs
- ExaBGP with health checks
- Sub-second failover
Results:
- 99.99% uptime
- Geographic load distribution
- DDoS resilience
- Simple management
Use case: "Stop Buying Load Balancers"
Architecture:
- 100+ web servers in multiple DCs
- ExaBGP for anycast management
- Custom health checks (HTTP /health)
- ECMP across edge routers
Benefits:
- No hardware load balancers
- Automatic failover
- Cross-DC redundancy
- Cost savings
Architecture:
- 10 PostgreSQL read replicas
- Anycast read-only IP
- Health check includes replication lag
- Automatic removal when lag > threshold
Benefits:
- Automatic read load distribution
- Lag-aware routing
- Simple client configuration (single IP)
- No application changes
- Service High Availability - Complete HA patterns
- Quick Start - Basic setup tutorial
- API Overview - API architecture
- Configuration Syntax - Config reference
- First BGP Session - BGP setup
- Debugging - Troubleshooting guide
- Monitoring - Monitoring setup
Ready to deploy anycast? See Quick Start →