# Debugging ExaBGP

**Troubleshooting guide for common ExaBGP issues**

> 🔍 **Most issues are configuration or API process problems** - not ExaBGP bugs

---

## Table of Contents

- [Quick Diagnosis](#quick-diagnosis)
- [Common Issues](#common-issues)
- [Debug Mode](#debug-mode)
- [Logging](#logging)
- [BGP Session Issues](#bgp-session-issues)
- [API Process Issues](#api-process-issues)
- [Route Announcement Issues](#route-announcement-issues)
- [Performance Issues](#performance-issues)
- [Tools and Commands](#tools-and-commands)
- [Getting Help](#getting-help)

---

## Quick Diagnosis

**Start here for fast troubleshooting:**

### 1. Is ExaBGP Running?

```bash
# Check process
ps aux | grep exabgp

# Check with pgrep
pgrep -f exabgp

# Check via systemd
systemctl status exabgp
```

---

### 2. Is BGP Session Established?

```bash
# In ExaBGP logs, look for:
grep "neighbor.*up" /var/log/exabgp.log

# On router (Cisco):
show bgp summary | include 192.168.1.2

# On router (Juniper):
show bgp summary | match 192.168.1.2
```

---

### 3. Are Routes Being Announced?

```bash
# Check ExaBGP process output
tail -f /var/log/exabgp.log | grep announce

# Check router (Cisco):
show ip bgp neighbors 192.168.1.2 received-routes

# Check router (Juniper):
show route receive-protocol bgp 192.168.1.2
```

---

### 4. Is API Process Running?

```bash
# Check if health check / API script is running
ps aux | grep healthcheck.py

# Check for errors in stderr
tail -f /var/log/exabgp.log | grep ERROR
```

---

## Common Issues

### Issue 1: ExaBGP Won't Start

**Symptoms:** ExaBGP exits immediately after starting

**Check logs:**

```bash
env exabgp.log.level=DEBUG exabgp /etc/exabgp/exabgp.conf 2>&1 | tee /tmp/exabgp-debug.log
```

**Common causes:**

#### A. Configuration Syntax Error

**Error message:**

```
configuration issue: syntax error
```

**Fix:**

```bash
# Test configuration
exabgp configuration validate /etc/exabgp/exabgp.conf

# Check for common mistakes:
# - Missing semicolons
# - Unbalanced or missing braces
# - Typos in directives
```

**Example error:**

```ini
# ❌ WRONG (missing semicolon)
neighbor 192.168.1.1 {
    router-id 192.168.1.2
    local-as 65001
}

# ✅ CORRECT
neighbor 192.168.1.1 {
    router-id 192.168.1.2;
    local-as 65001;
}
```

---

#### B. API Process Not Found

**Error message:**

```
process healthcheck run /etc/exabgp/healthcheck.py - [Errno 2] No such file or directory
```

**Fix:**

```bash
# Verify script exists
ls -l /etc/exabgp/healthcheck.py

# Make executable
chmod +x /etc/exabgp/healthcheck.py

# Test manually
/etc/exabgp/healthcheck.py
```

---

#### C. Python Version Mismatch

**Error message:**

```
python3: No module named exabgp
```

**Fix:**

```bash
# Check Python version
python3 --version

# ExaBGP requires Python 3.8.1+
# Reinstall with the same interpreter if needed
python3 -m pip install --upgrade exabgp

# Verify installation
python3 -m exabgp version
```

---

### Issue 2: BGP Session Not Establishing

**Symptoms:** BGP session stuck in "Connect" or "Active" state

**Debug:**

```bash
# Run ExaBGP in debug mode
env exabgp.log.level=DEBUG exabgp /etc/exabgp/exabgp.conf 2>&1 | grep -i "neighbor\|tcp"
```

**Common causes:**

#### A. TCP Connection Failure

**Error message:**

```
Connection refused
```

**Check:**

```bash
# Test TCP connection to router
telnet 192.168.1.1 179

# Check if router is listening
# On router (Cisco):
show tcp brief | include 179

# Verify firewall allows BGP
iptables -L -n | grep 179
```

**Fix:**

```cisco
# On router, ensure BGP neighbor configured
router bgp 65000
 neighbor 192.168.1.2 remote-as 65001
```

---

#### B. Authentication Failure

**Error message:**

```
NOTIFICATION sent to peer 192.168.1.1 code 2 (OPEN Message Error)
```

A mismatched TCP MD5 password often produces no NOTIFICATION at all: the peer silently drops the segments, so the session simply stays in "Connect" or "Active".

**Check:**

```bash
# Verify MD5 password matches
grep md5-password /etc/exabgp/exabgp.conf
```

**Fix:**

```ini
# ExaBGP config
neighbor 192.168.1.1 {
    md5-password "secret123";  # Must match router
}
```

```cisco
# Router config (must match!)
router bgp 65000
 neighbor 192.168.1.2 password secret123
```

---

#### C. ASN Mismatch

**Error message:**

```
NOTIFICATION sent code 2 subcode 2 (Bad Peer AS)
```

**Fix:**

```ini
# Verify ASNs match
# ExaBGP:
local-as 65001;
peer-as 65000;

# Router:
# router bgp 65000
#  neighbor 192.168.1.2 remote-as 65001
```

---

### Issue 3: Routes Not Being Announced

**Symptoms:** BGP session up, but routes not on router

**Debug:**

```bash
# Watch API process output
tail -f /var/log/exabgp.log | grep -i "announce\|withdraw"
```

**Common causes:**

#### A. API Process Not Sending Commands

**Check:**

```bash
# Is API process running?
ps aux | grep healthcheck.py

# Test API process manually
/etc/exabgp/healthcheck.py

# Should see output like:
# announce route 100.10.0.100/32 next-hop self
```

**Fix:**

```python
import sys

# Common mistake: forgetting to flush stdout
sys.stdout.write("announce route 100.10.0.100/32 next-hop self\n")
sys.stdout.flush()  # ← CRITICAL!
```

---

#### B. Address Family Not Enabled

**Error:** No error, routes just not visible

**Check:**

```bash
# Verify address family in config
grep "family" /etc/exabgp/exabgp.conf
```

**Fix:**

```ini
neighbor 192.168.1.1 {
    family {
        ipv4 unicast;  # Must be enabled!
    }
}
```

```cisco
# Router must also enable address family
router bgp 65000
 neighbor 192.168.1.2 remote-as 65001
 !
 address-family ipv4 unicast
  neighbor 192.168.1.2 activate
 !
```

---

#### C. Routes Filtered by Router Policy

**Check:**

```cisco
# Cisco - check for route-map
show bgp neighbors 192.168.1.2 | include route-map

# Check if routes rejected
show ip bgp neighbors 192.168.1.2 received-routes
```

**Fix:**

```cisco
# Remove or adjust route-map
router bgp 65000
 neighbor 192.168.1.2 route-map ACCEPT in
!
route-map ACCEPT permit 10
```

---

### Issue 4: Routes Announced but Not Installed

**Symptoms:** Routes visible in BGP table but not in routing table

**Check:**

```cisco
# Cisco
show ip bgp 100.10.0.100    # Shows in BGP table?
show ip route 100.10.0.100  # Shows in routing table?
```

**Common causes:**

#### A. Invalid Next-Hop

**Error:** Route in BGP table but marked invalid

**Fix:**

```python
# ✅ Use "next-hop self"
print("announce route 100.10.0.100/32 next-hop self", flush=True)

# Or an explicit, reachable next-hop
print("announce route 100.10.0.100/32 next-hop 192.168.1.2", flush=True)
```

---

#### B. Better Path Exists

**Router prefers different route** (lower MED, shorter AS-PATH, etc.)

**Check:**

```cisco
show ip bgp 100.10.0.100
# Look for "best" marker
```

**Fix:**

```python
# Adjust BGP attributes to make the route preferred
print("announce route 100.10.0.100/32 next-hop self local-preference 200", flush=True)
```

Note that `local-preference` is only carried over iBGP sessions. For an eBGP peer (as in the AS 65001 / AS 65000 examples in this guide), influence path selection with MED or AS-PATH length instead.

---

### Issue 5: API Process Crashes

**Symptoms:** ExaBGP runs but API process keeps exiting

**Check logs:**

```bash
tail -f /var/log/exabgp.log | grep -i "process.*exit\|error"
```

**Common causes:**

#### A. Python Exception

**Error:**

```
Process healthcheck exited with code 1
```

**Debug:**

```bash
# Run API process manually
python3 /etc/exabgp/healthcheck.py
```

**Add error handling:**

```python
import sys
import traceback

try:
    # Your code
    pass
except Exception as e:
    sys.stderr.write(f"ERROR: {e}\n")
    traceback.print_exc(file=sys.stderr)
```

---

#### B. Missing Python Modules

**Error:**

```
ModuleNotFoundError: No module named 'requests'
```

**Fix:**

```bash
# Install missing module
pip install requests

# Or add to requirements
echo "requests" >> requirements.txt
pip install -r requirements.txt
```

---

### Issue 6: FlowSpec Rules Not Applied

**Symptoms:** FlowSpec announced but traffic not filtered

**Check:**

```cisco
# Cisco
show flowspec ipv4
show flowspec ipv4 detail

# Juniper
show firewall filter __flowspec_default_inet__
```

**Common causes:**

#### A. FlowSpec Not Enabled on Router

**Fix:**

```cisco
# Cisco IOS-XR
router bgp 65000
 address-family ipv4 flowspec
  neighbor 192.168.1.2 activate
 !
!
flowspec
 local-install interface-all
!
```

---

#### B. FlowSpec Validation Failing

**Error:** Rules received but not installed

**Fix:**

```cisco
# Disable validation (testing only!)
flowspec validation off  # or "local"
```

---

## Debug Mode

### Enable Full Debugging

**Command line:**

```bash
env exabgp.log.level=DEBUG exabgp /etc/exabgp/exabgp.conf
```

**Environment variables:**

Most shells reject dotted names in `export`, so use the underscore form (`exabgp_log_level`) with `export`, or pass the dotted form through `env` as shown above.

```bash
# Enable all debug logging
export exabgp_log_all=true
export exabgp_log_level=DEBUG
exabgp /etc/exabgp/exabgp.conf
```

---

### Selective Debugging

**Enable specific subsystems:**

```bash
# Debug BGP packets
export exabgp_log_packets=true

# Debug BGP messages
export exabgp_log_message=true

# Debug configuration parsing
export exabgp_log_configuration=true

# Debug process communication
export exabgp_log_processes=true

# Debug network events
export exabgp_log_network=true

exabgp /etc/exabgp/exabgp.conf
```

---

### Decode BGP Messages

**Decode captured BGP packets:**

```bash
# Capture BGP traffic
tcpdump -i eth0 -w bgp.pcap port 179

# Decode with ExaBGP
env exabgp.tcp.bind='' exabgp decode -c /etc/exabgp/exabgp.conf \
  FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:003C:02:0000001C4001010040020040030465016501800404000000C840050400000064000000002001010101
```

---

## Logging

### Log Levels

```bash
# Set log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
export exabgp_log_level=DEBUG
```

---

### Log Destinations

**Stdout (default):**

```bash
exabgp /etc/exabgp/exabgp.conf
```

**File:**

```bash
exabgp /etc/exabgp/exabgp.conf > /var/log/exabgp.log 2>&1
```

**Syslog:**

```ini
[exabgp.log]
destination = syslog
level = INFO
```

---

### Useful Log Patterns

**Search for errors:**

```bash
grep -i error /var/log/exabgp.log
```

**Track BGP session state:**

```bash
grep "neighbor.*up\|neighbor.*down" /var/log/exabgp.log
```

**Monitor route announcements:**

```bash
grep "announce\|withdraw" /var/log/exabgp.log
```

**Find process crashes:**

```bash
grep "process.*exit" /var/log/exabgp.log
```

---

## BGP Session Issues

### Session Won't Establish

**Check TCP connectivity:**

```bash
# Test connection
telnet 192.168.1.1 179

# Check routing
traceroute 192.168.1.1

# Verify firewall
iptables -L -n | grep 179
```

---

### Session Flapping

**Symptoms:** BGP session repeatedly going up/down

**Check:**

```bash
# Monitor session state
watch -n 1 'grep "neighbor.*up\|neighbor.*down" /var/log/exabgp.log | tail'
```

**Common causes:**

- Network instability
- Keepalive/hold-time mismatch
- Process crashes
- Memory/CPU exhaustion

**Fix:**

```ini
# Adjust BGP timers
neighbor 192.168.1.1 {
    hold-time 180;  # Increase if needed
}
```

---

## API Process Issues

### Process Not Starting

**Check:**

```bash
# Verify script exists and is executable
ls -l /etc/exabgp/healthcheck.py
chmod +x /etc/exabgp/healthcheck.py

# Test manually
/etc/exabgp/healthcheck.py
```

---

### Process Crashes Immediately

**Debug:**

```bash
# Run with Python directly
python3 /etc/exabgp/healthcheck.py

# Check for:
# - Syntax errors
# - Missing imports
# - Exceptions
```

---

### Process Hangs

**Symptoms:** API process runs but doesn't send commands

**Debug:**

```python
import sys

# Add debug output on stderr (stdout is reserved for ExaBGP commands)
# is_healthy() stands for your script's own health check
sys.stderr.write("[DEBUG] Script started\n")
sys.stderr.write(f"[DEBUG] Service healthy: {is_healthy()}\n")
sys.stderr.write("[DEBUG] Announcing route\n")
sys.stdout.write("announce route 100.10.0.100/32 next-hop self\n")
sys.stdout.flush()
sys.stderr.write("[DEBUG] Route announced\n")
```

---

## Route Announcement Issues

### Routes Not Visible on Router

**Checklist:**

1. ✅ BGP session established?
2. ✅ Address family enabled?
3. ✅ API process sending commands?
4. ✅ Commands have newline + flush?
5. ✅ Router policy allowing routes?

---

### Routes Announced but Invalid

**Check next-hop:**

```cisco
show ip bgp 100.10.0.100
# Look for "inaccessible" or "invalid"
```

**Fix:**

```python
# Use next-hop self or a reachable IP
print("announce route 100.10.0.100/32 next-hop self", flush=True)
```

---

## Performance Issues

### High CPU Usage

**Symptoms:** ExaBGP consuming excessive CPU

**Check:**

```bash
# Monitor CPU (pgrep -d, joins multiple PIDs with commas for top)
top -p "$(pgrep -d, -f exabgp)"

# Check API process
ps aux | grep healthcheck.py
```

**Common causes:**

- Tight loop in API process
- No sleep in while loop
- Processing large BGP tables

**Fix:**

```python
import time

while True:
    # Your code
    time.sleep(5)  # ← Add sleep!
```

---

### High Memory Usage

**Symptoms:** ExaBGP using excessive RAM

**Check:**

```bash
# Monitor memory (column 6 is RSS in KB)
ps aux | grep exabgp | awk '{print $6 " " $11}'
```

**Common causes:**

- Large number of routes
- Memory leak in API process
- Large BGP tables from peers

---

## Tools and Commands

### Useful ExaBGP Commands

```bash
# Test configuration
exabgp configuration validate /etc/exabgp/exabgp.conf

# Show version
exabgp version

# Decode BGP message
exabgp decode -c config.conf

# Run health check
exabgp --run healthcheck --help
```

---

### Network Debugging Tools

```bash
# Capture BGP traffic
tcpdump -i eth0 port 179 -w bgp.pcap

# Test TCP connection
telnet 192.168.1.1 179
nc -zv 192.168.1.1 179

# Monitor connections
ss -tan | grep :179
netstat -an | grep :179

# Check routing
ip route get 192.168.1.1
traceroute 192.168.1.1
```

---

### Router Commands

**Cisco IOS/IOS-XR:**

```cisco
! BGP summary
show bgp summary
show ip bgp summary

! Specific neighbor
show bgp neighbors 192.168.1.2
show ip bgp neighbors 192.168.1.2 received-routes
show ip bgp neighbors 192.168.1.2 routes

! Route details
show ip bgp 100.10.0.100

! FlowSpec
show flowspec ipv4
show flowspec ipv4 detail
```

**Juniper Junos:**

```juniper
# BGP summary
show bgp summary

# Specific neighbor
show bgp neighbor 192.168.1.2

# Routes from peer
show route receive-protocol bgp 192.168.1.2

# Route details
show route 100.10.0.100 detail
```

---

## Getting Help

### Before Asking for Help

**Gather this information:**

1. **ExaBGP version**

   ```bash
   exabgp version
   ```

2. **Full debug log**

   ```bash
   env exabgp.log.level=DEBUG exabgp /etc/exabgp/exabgp.conf 2>&1 | tee debug.log
   ```

3. **Configuration file**

   ```bash
   cat /etc/exabgp/exabgp.conf
   ```

4. **API process code**

   ```bash
   cat /etc/exabgp/healthcheck.py
   ```

5. **Router BGP config** (sanitized)

6. **Error messages** (exact text)

---

### Where to Get Help

**GitHub Issues:**

- https://github.com/Exa-Networks/exabgp/issues
- Search existing issues first
- Include all debug information

**Slack:**

- https://exabgp.slack.com/
- Real-time help during business hours (GMT/BST)
- Share logs/config (pastebin/gist)

**Documentation:**

- [ExaBGP Wiki](https://github.com/Exa-Networks/exabgp/wiki)
- [API Reference](Text-API-Reference)
- [Configuration Reference](Configuration-Syntax)

---

### How to Report Bugs

**Include:**

1. ExaBGP version (`exabgp version`)
2. Python version (`python3 --version`)
3. Operating system (`uname -a`)
4. Full configuration file (sanitized)
5. Complete debug output (`env exabgp.log.level=DEBUG exabgp server`)
6. Steps to reproduce
7. Expected vs actual behavior

**Format:**

````markdown
## Environment
- ExaBGP version: 4.2.25
- Python version: 3.9.2
- OS: Ubuntu 20.04

## Configuration
```ini
neighbor 192.168.1.1 {
    ...
}
```

## Steps to Reproduce
1. Start ExaBGP with config
2. Run health check script
3. Observe...

## Expected Behavior
Routes should be announced

## Actual Behavior
No routes announced, error: ...

## Debug Log
```
```
````

---

## Quick Reference

### Checklist for Common Issues

**ExaBGP won't start:**

- [ ] Configuration syntax correct?
- [ ] API process script exists?
- [ ] Script is executable?
- [ ] Python version >= 3.8.1?

**BGP session won't establish:**

- [ ] TCP connection works? (telnet)
- [ ] ASNs match?
- [ ] MD5 password matches?
- [ ] Address family enabled?

**Routes not announced:**

- [ ] API process running?
- [ ] stdout.flush() called?
- [ ] Address family enabled?
- [ ] Router policy allowing?

**FlowSpec not working:**

- [ ] FlowSpec family enabled?
- [ ] Router supports FlowSpec?
- [ ] FlowSpec locally installed?
- [ ] Validation passing?

---

## Next Steps

### Learn More

- **[Monitoring](Monitoring)** - Monitor ExaBGP in production
- **[API Overview](API-Overview)** - API architecture
- **[Quick Start](Quick-Start)** - Getting started

### Configuration

- **[Configuration Syntax](Configuration-Syntax)** - Config reference
- **[Directives Reference](Directives-Reference)** - A-Z directives

---

**Still stuck?** Join our [Slack community](https://exabgp.slack.com/) or [file an issue](https://github.com/Exa-Networks/exabgp/issues) →

---
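
## Appendix: API Process Skeleton

Several fixes in this guide concern the API process: end each command with a newline, flush stdout, send diagnostics to stderr, and sleep between iterations. The sketch below combines them in one place. It is a minimal illustration and not ExaBGP's bundled healthcheck: `is_healthy()`, `command()`, `send()` and `run()` are hypothetical names, the prefix is the example one used above, and the `iterations` parameter exists only so the loop can be exercised in a test.

```python
import sys
import time

ROUTE = "100.10.0.100/32"  # example prefix used throughout this guide


def is_healthy():
    """Hypothetical health check; replace with a real probe of your service."""
    return True


def command(healthy, route=ROUTE):
    """Build the text-API line for the given health state."""
    verb = "announce" if healthy else "withdraw"
    return f"{verb} route {route} next-hop self"


def send(line, out=sys.stdout):
    """Write one command, terminated by a newline, and flush immediately."""
    out.write(line + "\n")
    out.flush()


def run(interval=5, iterations=None, out=sys.stdout, err=sys.stderr):
    """Send a command only when the health state changes, then sleep."""
    announced = False  # nothing has been announced yet
    count = 0
    while iterations is None or count < iterations:
        try:
            healthy = is_healthy()
        except Exception as exc:
            # Diagnostics go to stderr; stdout carries commands only
            err.write(f"ERROR: {exc}\n")
            healthy = False
        if healthy != announced:
            send(command(healthy), out)
            announced = healthy
        count += 1
        time.sleep(interval)  # avoid a tight loop
```

In a real script, finish the file with a call to `run()`. Sending a command only on a state change keeps the log readable and avoids re-announcing the same route every few seconds.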