This project is part of a complete end-to-end trading system:
- Main Repository: fpga-trading-systems
- Project Number: 33 of 38 (for now, more to come)
- Category: FPGA Core
- Dependencies: None (foundation project for new projects)

Platform: Xilinx Kintex-7 (XC7K325T on ALINX AX7325B)

Technology: Pure VHDL, no vendor IP cores (except GTX primitives)

Status: Hardware Tested - WNS +1.194ns, 0 critical warnings, PHY working reliably at 10 Gbps full duplex

A complete custom implementation of the 10GBASE-R Physical Layer (PHY) in VHDL, designed for the Phase 3 multi-FPGA trading system architecture. This implementation provides full control over the 10 Gigabit Ethernet physical layer without relying on encrypted vendor IP.
Key Innovation: Full custom PCS (Physical Coding Sublayer) implementation with:
- 64B/66B encoder/decoder
- Self-synchronizing scrambler/descrambler
- Block lock state machine
- Direct GTX transceiver control
Target Use Case: Phase 3 Aurora-style low-latency links between FPGAs in the multi-FPGA trading system. The custom implementation allows fine-tuning for minimal latency.
┌─────────────────────────────────────────────────────────────────────────┐
│ PHY_10GBASE_R_TOP │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ PCS Layer │ │
│ │ │ │
│ │ TX Path: │ │
│ │ XGMII ──► 64B/66B Encoder ──► Scrambler ──► GTX TX │ │
│ │ │ │
│ │ RX Path: │ │
│ │ GTX RX ──► Block Lock ──► Descrambler ──► 64B/66B Decoder ──► XGMII │
│ │ │ │ │
│ │ └──► Slip Control │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────────────┴───────────────────────────────┐ │
│ │ GTX Wrapper │ │
│ │ │ │
│ │ QPLL (10.3125 GHz) ──► GTX Channel ──► SFP+ Serial Interface │ │
│ │ 156.25 MHz refclk Gearbox │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
- 01: Data block (all 8 bytes are data)
- 10: Control block (contains control characters)
- 00, 11: Invalid (used for block alignment detection)
| Type Field | Format | Description |
|---|---|---|
| 0x1E | C0-C7 | All control (idle) |
| 0x78 | S,D1-D7 | Start in lane 0 |
| 0x87 | T,C1-C7 | Terminate in lane 0 |
| 0x99 | D0,T,C2-C7 | Terminate in lane 1 |
| 0xAA | D0D1,T,C3-C7 | Terminate in lane 2 |
| 0xB4 | D0D1D2,T,C4-C7 | Terminate in lane 3 |
| 0xCC | D0D1D2D3,T,C5-C7 | Terminate in lane 4 |
| 0xD2 | D0-D4,T,C6C7 | Terminate in lane 5 |
| 0xE1 | D0-D5,T,C7 | Terminate in lane 6 |
| 0xFF | D0-D6,T | Terminate in lane 7 |
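
To make the table concrete, here is a minimal Python reference model of the encoding step. It is a sketch, not the project's VHDL encoder: the function names, the payload layout (block type field in bits 7:0, lane order rising with bit position) and the restriction to idle, start-in-lane-0, terminate and all-data blocks are assumptions made for illustration. The XGMII control characters (Idle 0x07, Start 0xFB, Terminate 0xFD) are the standard Clause 46 values, and an idle character is taken to encode to the 7-bit control code 0x00.

```python
# XGMII control characters (IEEE 802.3 Clause 46 values).
IDLE, START, TERM = 0x07, 0xFB, 0xFD

# Sync headers, written the same way as in the list above.
SYNC_DATA, SYNC_CTRL = 0b01, 0b10

# Block type field for "Terminate in lane n", taken from the table above.
TERM_TYPE = [0x87, 0x99, 0xAA, 0xB4, 0xCC, 0xD2, 0xE1, 0xFF]


def pack(byte_values):
    """Pack bytes into an integer, first byte in bits 7:0 (model assumption)."""
    word = 0
    for i, b in enumerate(byte_values):
        word |= (b & 0xFF) << (8 * i)
    return word


def encode_block(data, ctrl):
    """Encode 8 XGMII bytes plus an 8-bit control mask into (sync, payload).

    data: list of 8 byte values, lane 0 first.
    ctrl: bit i set means lane i carries a control character.
    """
    if ctrl == 0x00:
        return SYNC_DATA, pack(data)
    if ctrl == 0xFF and all(b == IDLE for b in data):
        # Idle encodes to control code 0x00, so only the type field is non-zero.
        return SYNC_CTRL, 0x1E
    if ctrl == 0x01 and data[0] == START:
        return SYNC_CTRL, pack([0x78] + data[1:])
    for lane in range(8):
        # Terminate in lane n: lanes 0..n-1 are data, lanes n..7 are control.
        ctrl_ok = ctrl == ((0xFF << lane) & 0xFF)
        idle_after = all(b == IDLE for b in data[lane + 1:])
        if ctrl_ok and data[lane] == TERM and idle_after:
            return SYNC_CTRL, pack([TERM_TYPE[lane]] + data[:lane])
    raise ValueError("block type not covered by this sketch")


preamble = [START, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0xD5]
sync, payload = encode_block(preamble, 0x01)
print(f"sync={sync:02b} payload=0x{payload:016X}")
```

The ordered sets (block types 0x2D, 0x33, 0x4B, 0x55, 0x66) and Start in lane 4 are left out to keep the sketch short; a full encoder would need them as well.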
G(X) = 1 + X^39 + X^58
Self-synchronizing: RX descrambler automatically locks within 58 bits.
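
The self-synchronizing property can be checked with a small bit-serial Python model. This is a sketch for illustration: the function names and the state layout (bit 0 holds the most recent bit) are assumptions, not the project's VHDL. The scrambler feeds back its own output, while the descrambler shifts in the received bits, so after 58 received bits both shift registers hold the same contents regardless of their starting values.

```python
import random

MASK58 = (1 << 58) - 1  # 58-bit shift register


def scramble(bits, state=0):
    """Scramble a list of bits with G(X) = 1 + X^39 + X^58."""
    out = []
    for b in bits:
        # Bit 38 is the output 39 bits ago, bit 57 the output 58 bits ago.
        s = b ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        state = ((state << 1) | s) & MASK58   # feed back the scrambled bit
        out.append(s)
    return out, state


def descramble(bits, state=0):
    """Descramble a list of bits; the state is built from received bits only."""
    out = []
    for b in bits:
        d = b ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        state = ((state << 1) | b) & MASK58   # shift in the received bit
        out.append(d)
    return out, state


rng = random.Random(1)
data = [rng.getrandbits(1) for _ in range(256)]

# TX starts from an arbitrary state, RX starts from all zeros.
tx_bits, _ = scramble(data, state=rng.getrandbits(58))
rx_bits, _ = descramble(tx_bits, state=0)

# The first 58 bits may be wrong; everything after them matches.
print("recovered after 58 bits:", rx_bits[58:] == data[58:])
```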
- Configures GTXE2_COMMON (QPLL) for 10.3125 GHz
- Configures GTXE2_CHANNEL for 64B/66B mode
- Handles gearbox (64-bit user <-> 32-bit GTX internal)
- Reset sequencing and status monitoring
- XGMII (64+8) -> 66-bit encoded blocks
- Detects Start/Terminate/Control characters
- Handles all IEEE 802.3 block types
- 58-bit LFSR with polynomial X^58 + X^39 + 1
- Parallel implementation (64 bits per clock)
- Header bypassed (not scrambled)
- Self-synchronizing (shifts in received data)
- Automatic lock within 58 bits
- Sync status output
- Finds 66-bit block boundaries
- 64 consecutive valid headers -> lock
- 16 invalid headers within a 64-header window -> unlock
- Slip control for gearbox alignment
- 66-bit encoded blocks -> XGMII (64+8)
- Decodes all block types
- Error character insertion on invalid blocks
- Integrates TX and RX paths
- Status aggregation
- Debug outputs
- Complete PHY with GTX + PCS
- XGMII interface for MAC
- SFP+ control signals
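
The block lock rules listed above (64 valid headers to lock, 16 invalid headers in a window of 64 to unlock, slip to try the next alignment) can be modelled in a few lines of Python. This is a behavioural sketch based on the Clause 49 lock state diagram as I read it, including the rule that a single invalid header triggers a slip while unlocked. The class and attribute names are hypothetical, and the extra states of the project's FSM (WAIT_BLOCK, SLIP_WAIT) are not modelled.

```python
class BlockLock:
    """Behavioural model of 64B/66B block lock; one step() per sync header."""

    def __init__(self):
        self.locked = False
        self.sh_cnt = 0          # headers seen in the current window
        self.sh_invalid_cnt = 0  # invalid headers seen in the current window

    def step(self, header):
        """Feed one 2-bit sync header; return True if a slip is requested."""
        valid = header in (0b01, 0b10)
        self.sh_cnt += 1
        slip = False
        if not valid:
            self.sh_invalid_cnt += 1
            # Locked: tolerate up to 15 bad headers per window.
            # Unlocked: any bad header means the alignment is wrong.
            if self.sh_invalid_cnt == 16 or not self.locked:
                self.locked = False
                slip = True
        if slip or self.sh_cnt == 64:
            if not slip and self.sh_invalid_cnt == 0:
                self.locked = True   # a full window without errors
            self.sh_cnt = 0
            self.sh_invalid_cnt = 0
        return slip


fsm = BlockLock()
for _ in range(64):
    fsm.step(0b01)
print("locked after 64 valid headers:", fsm.locked)
```

In hardware the slip request would drive the gearbox slip input; here it is only returned to the caller.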
| Parameter | Value | Notes |
|---|---|---|
| Line Rate | 10.3125 Gbps | 64B/66B encoded |
| Reference Clock | 156.25 MHz | Differential |
| XGMII Clock | 156.25 MHz | 64-bit @ 156.25 MHz = 10 Gbps |
| Block Rate | 10.3125 Gbps / 66 = 156.25 MHz | 66-bit blocks |
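
The relationship between the line rate, the 66-bit block rate and the XGMII payload rate can be checked with exact arithmetic. The short Python sketch below uses only the 10.3125 Gbps line rate from the table; the variable names are illustrative. The last value, the clock needed to move 64 bits per cycle, comes out at about 161.13 MHz, which agrees with the tx_mmcm_clk1 frequency in the timing summary, assuming that clock is the 64-bit GTX user clock.

```python
from fractions import Fraction

LINE_RATE = Fraction(10_312_500_000)  # serial bits per second

block_rate = LINE_RATE / 66      # one 66-bit block per tick
payload_rate = block_rate * 64   # 64 payload bits per block
usrclk_64 = LINE_RATE / 64       # clock that moves 64 line bits per cycle

print(f"block rate    : {float(block_rate) / 1e6} MHz")
print(f"payload rate  : {float(payload_rate) / 1e9} Gbps")
print(f"64-bit clock  : {float(usrclk_64) / 1e6} MHz")
print(f"coding ratio  : {float(Fraction(64, 66)):.4f}")
```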
| Stage | Cycles | Time (ns) |
|---|---|---|
| XGMII -> Encoder | 1 | 6.4 |
| Scrambler | 1 | 6.4 |
| GTX TX serializer | ~2-3 | 12.8-19.2 |
| Wire (1m) | - | 5 |
| GTX RX deserializer | ~2-3 | 12.8-19.2 |
| Block lock (steady) | 0 | 0 |
| Descrambler | 1 | 6.4 |
| Decoder | 1 | 6.4 |
| Total (estimate) | 8-10 | ~50-80 ns |
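
As a cross-check of the total, the per-stage figures from the table can be summed directly. The sketch below assumes a 6.4 ns cycle (156.25 MHz) for every stage and a fixed 5 ns for the wire, exactly as the table does; the list and variable names are illustrative. The result falls inside the quoted ~50-80 ns estimate.

```python
CYCLE_NS = 6.4   # one cycle at 156.25 MHz, as used in the table
WIRE_NS = 5.0    # 1 m of cable, from the table

# (stage, minimum cycles, maximum cycles)
stages = [
    ("XGMII -> Encoder", 1, 1),
    ("Scrambler", 1, 1),
    ("GTX TX serializer", 2, 3),
    ("GTX RX deserializer", 2, 3),
    ("Block lock (steady)", 0, 0),
    ("Descrambler", 1, 1),
    ("Decoder", 1, 1),
]

min_cycles = sum(lo for _, lo, _ in stages)
max_cycles = sum(hi for _, _, hi in stages)
min_ns = min_cycles * CYCLE_NS + WIRE_NS
max_ns = max_cycles * CYCLE_NS + WIRE_NS

print(f"cycles : {min_cycles}-{max_cycles}")
print(f"latency: {min_ns:.1f}-{max_ns:.1f} ns")
```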
Build top: phy_10gbase_r_test_top (PHY + debug UART + test infrastructure)
| Resource | Used | Available | Util% |
|---|---|---|---|
| Slice LUTs | 704 | 203,800 | 0.35% |
| LUT as Logic | 701 | 203,800 | 0.34% |
| Slice Registers | 1,240 | 407,600 | 0.30% |
| BRAM | 0 | 445 | 0.00% |
| F7 Muxes | 43 | 101,900 | 0.04% |
| GTX Transceivers | 1 | 16 | 6.25% |
| BUFG | 5 | 32 | 15.63% |
| MMCM | 1 | 10 | 10.00% |
Timing Summary:
- sys_clk (200 MHz): WNS +1.194ns, 0 failing paths
- tx_mmcm_clk1 (161.13 MHz): WNS +2.669ns, 0 failing paths
- 0 TIMING-17 critical warnings, 0 unconstrained registers
Note on sys_clk WNS (+1.194ns): The critical path is in the debug UART module (uart_debug_inst), not in the PHY core. The PHY itself (PCS + GTX) runs entirely on tx_mmcm_clk1 with +2.669ns margin. The test top layer (phy_10gbase_r_test_top) and debug reporter are required to test the PHY on hardware -- without them there is no way to verify link status, block lock, or packet counters. The PHY core alone (phy_10gbase_r_top) has no sys_clk logic and would show only the tx_mmcm_clk1 WNS of +2.669ns.
33-10gbe-phy-custom/
├── README.md
├── src/
│ ├── phy_10gbase_r_top.vhd # Top-level PHY
│ ├── phy_10gbase_r_test_top.vhd # Test wrapper (PHY + debug UART)
│ ├── gtx/
│ │ └── gtx_10g_wrapper.vhd # GTX transceiver wrapper
│ ├── pcs/
│ │ ├── pcs_10gbase_r.vhd # PCS top module
│ │ ├── encoder_64b66b.vhd # 64B/66B encoder
│ │ ├── decoder_64b66b.vhd # 64B/66B decoder
│ │ └── block_lock_fsm.vhd # Block synchronization
│ ├── scrambler/
│ │ ├── scrambler_tx.vhd # TX scrambler
│ │ └── descrambler_rx.vhd # RX descrambler
│ └── debug/
│ ├── gtx_debug_reporter.v # UART debug output
│ └── uart_tx_simple.v # UART TX primitive
├── constraints/
│ └── (pin assignments for target board)
├── test/
│ └── (testbenches)
├── scripts/
│ └── (build scripts)
└── docs/
└── (detailed documentation)
- Vivado 2024.1+ (for Kintex-7 GTX support)
- Target board with SFP+ cage and GTX transceiver access
cd 33-10gbe-phy-custom
vivado -mode batch -source scripts/build.tcl
Self-checking testbench with TX->RX loopback:
- Generate random XGMII frames
- Encode, scramble, loopback
- Descramble, decode
- Verify XGMII output matches input
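
The scramble/descramble part of that loopback can be prototyped in software before writing the VHDL testbench. The Python sketch below is a simplified model under stated assumptions: it handles 64-bit payload words only (the 64B/66B encode and decode steps are left out), it processes bit 0 of each word first, and all function names are hypothetical. One warm-up word is sent so the descrambler can synchronise before the comparison starts.

```python
import random

MASK58 = (1 << 58) - 1  # the scrambler state is the last 58 bits on the wire


def scramble_word(word, state):
    """Scramble one 64-bit word, bit 0 first; returns (scrambled, new_state)."""
    out = 0
    for i in range(64):
        s = ((word >> i) & 1) ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        state = ((state << 1) | s) & MASK58
        out |= s << i
    return out, state


def descramble_word(word, state):
    """Descramble one 64-bit word; the state is built from received bits only."""
    out = 0
    for i in range(64):
        b = (word >> i) & 1
        out |= (b ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)) << i
        state = ((state << 1) | b) & MASK58
    return out, state


def run_loopback(num_frames=20, seed=0):
    """Return the number of payload words that differ after TX -> RX loopback."""
    rng = random.Random(seed)
    tx_state = rng.getrandbits(58)  # arbitrary TX start state
    rx_state = 0                    # RX starts unsynchronised

    # One warm-up word (64 bits, more than 58) synchronises the descrambler.
    warmup, tx_state = scramble_word(0, tx_state)
    _, rx_state = descramble_word(warmup, rx_state)

    mismatches = 0
    for _ in range(num_frames):
        frame = [rng.getrandbits(64) for _ in range(rng.randint(8, 190))]
        for word in frame:
            wire, tx_state = scramble_word(word, tx_state)        # TX path
            received, rx_state = descramble_word(wire, rx_state)  # RX path
            mismatches += received != word
    return mismatches


print("mismatching words:", run_loopback())
```

The same random frames and seed could later serve as stimulus for the VHDL testbench, so that software and simulation results can be compared word for word.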
Most developers will not have 10GbE networking at home (2.5GbE is common at best). This project was verified using a dedicated 10GbE fiber-optic test setup:
┌──────────────┐ ┌─────────────────────┐ ┌──────────────┐
│ PC │ RJ45 │ 10GbE Managed │ SFP+ │ AX7325B │
│ (AQC107 NIC)│◄───────►│ Switch (Binardat) │◄───────►│ FPGA Board │
│ 10G RJ45 │ 10Gb │ 4xRJ45 + 4xSFP+ │ Fiber │ (SFP+ Cage) │
└──────────────┘ └─────────────────────┘ └──────────────┘
│
OM3 LC-LC Fiber
+ 10G SFP+ Modules
Hardware used:
| Component | Product | Specs |
|---|---|---|
| SFP+ Modules | 10G SFP+ Fiber Transceiver | SR, multimode 850 nm, 300 m range, Duplex LC |
| Fiber Cable | Tunghey OM3 LC to LC Patch Cable | Multimode Duplex 50/125 µm, 15 m, LSZH |
| 10GbE Switch | Binardat 8-Port 10G Managed Switch | 4x10G RJ45 + 4x10G SFP+, 160Gbps, L3 |
| PC NIC | Binardat 10G PCIe Network Adapter | Aquantia AQC107 chip, RJ45, PXE support |
Important notes:
- DAC (Direct Attach Copper) cables did not work with the AX7325B SFP+ cage -- fiber optics required
- The switch bridges 10G RJ45 (PC side) to 10G SFP+ (FPGA side)
- SFP+ modules must be inserted into both the switch SFP+ port and the FPGA board SFP+ cage
- PC sends test packets via raw sockets or packet generator at 10Gbps line rate
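
For the raw-socket option, a minimal Python sender might look like the sketch below. Several things here are assumptions rather than part of the project: the interface name, the MAC addresses, and the use of EtherType 0x88B5 (one of the IEEE local experimental EtherTypes). `socket.AF_PACKET` is Linux-only and needs root privileges, and a plain Python loop will not reach 10 Gbps line rate, so this is suited to functional checks rather than throughput tests.

```python
import socket

ETHERTYPE_TEST = 0x88B5  # IEEE 802 local experimental EtherType (assumption)


def build_frame(dst, src, payload, ethertype=ETHERTYPE_TEST):
    """Build an Ethernet frame without FCS, padded to the 60-byte minimum."""
    header = dst + src + ethertype.to_bytes(2, "big")
    return (header + payload).ljust(60, b"\x00")


def send_frames(ifname, frame, count):
    """Send the same frame `count` times on a Linux interface (needs root)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((ifname, 0))
        for _ in range(count):
            sock.send(frame)


# Hypothetical addresses: broadcast destination, locally administered source.
frame = build_frame(b"\xff" * 6, bytes.fromhex("020000000001"), b"phy-test")
print(len(frame), frame[:14].hex())

# Example call, commented out because it needs root and a real interface:
# send_frames("eth0", frame, 1000)
```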
| Component | Status | Notes |
|---|---|---|
| GTX Wrapper | Verified | Hardware tested on AX7325B |
| 64B/66B Encoder | Verified | All block types supported |
| TX Scrambler | Verified | Parallel implementation |
| RX Descrambler | Verified | Self-synchronizing |
| Block Lock FSM | Verified | IEEE 802.3 compliant, edge detection fix applied |
| 64B/66B Decoder | Verified | All block types supported |
| PCS Top | Verified | Integration complete |
| PHY Top | Verified | Block lock achieved (BL:1 ST:7) |
| Testbench | Pending | Loopback test needed |
| Hardware Test | Complete | SFP+ loopback on port 2, stable lock |
Block Lock FSM Timing Issues
- Symptom: FSM cycling through states but never achieving stable lock (BL:0)
- Root Cause: Multiple issues in original FSM design:
  - Counter incrementing every cycle in VALID_SH state instead of once per block
  - No edge detection for rx_datavalid (continuous high in 64-bit gearbox mode)
  - No settle time after gearbox slip operations
- Fix: Redesigned FSM with:
  - Rising edge detection on rx_datavalid for new block identification
  - WAIT_BLOCK state to wait for block boundaries
  - SLIP_WAIT state (8 cycles) for gearbox settling
  - Header latching on datavalid edge for stable testing
- Result: Stable block lock achieved (BL:1, ST:7)
Reset Polarity
- Symptom: PCS held in permanent reset (PR:1)
- Root Cause: AX7325B reset button is active-LOW, code assumed active-HIGH
- Fix: Changed reset synchronizer polarity check
GTX Gearbox Configuration
- Symptom: rx_header_valid and rx_datavalid not producing valid output
- Root Cause: TX/RX_INT_DATAWIDTH set to 0 (2-byte) instead of 1 (4-byte)
- Fix: Set INT_DATAWIDTH to 1 for 64-bit external width compatibility
GTX/IEEE Bit Order Mismatch (January 2026)
- Symptom: Block lock achieved (BL:1, ST:7) but no XGMII Start characters detected (SD:0000)
- Root Cause: Bit ordering mismatch between Xilinx GTX gearbox and IEEE 802.3 scrambler:
  - GTX gearbox outputs data MSB-first: `RXDATA[63]` is the first received bit
  - IEEE 802.3 scrambler operates LSB-first: bit 0 is processed first
  - Descrambler was processing `data_in[0]` first, but that was actually the last received bit
  - This resulted in garbage output from descrambler despite valid block lock
- Fix: Added bit reversal in `pcs_10gbase_r.vhd`:
  ```vhdl
  -- RX path: Reverse GTX MSB-first to IEEE LSB-first
  rx_data_reversed   <= bit_reverse(gtx_rx_data);
  rx_header_reversed <= gtx_rx_header(0) & gtx_rx_header(1);

  -- TX path: Reverse IEEE LSB-first to GTX MSB-first
  gtx_tx_data   <= bit_reverse(scram_data);
  gtx_tx_header <= scram_header(0) & scram_header(1);
  ```
- Key insight: Block lock FSM still works with raw headers because "01" and "10" are symmetric under bit swap
- Result: XGMII Start characters now detected, MAC frames parsed correctly
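
The effect of the bit reversal, and the reason block lock was unaffected by the mismatch, can be illustrated with a small Python model. The `bit_reverse` function here is a software stand-in with the same name as the VHDL helper; its implementation is an assumption, since the VHDL body is not shown above.

```python
def bit_reverse(value, width=64):
    """Return `value` with its lowest `width` bits in reversed order."""
    result = 0
    for i in range(width):
        result = (result << 1) | ((value >> i) & 1)
    return result


# Bit 0 moves to bit 63 and back again: reversing twice is the identity.
word = 0x0123456789ABCDEF
assert bit_reverse(1) == 1 << 63
assert bit_reverse(bit_reverse(word)) == word

# Swapping the two header bits maps valid headers to valid headers
# and invalid headers to invalid headers, so block lock sees no difference.
for header in (0b00, 0b01, 0b10, 0b11):
    swapped = bit_reverse(header, width=2)
    print(f"{header:02b} -> {swapped:02b}")
```

Only the 64-bit payload changes meaning under reversal, which matches the symptom described above: stable block lock, but garbage after descrambling.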
- IEEE 802.3-2018 Clause 49: Physical Coding Sublayer (PCS)
- IEEE 802.3-2018 Clause 51: Physical Medium Attachment (PMA) sublayer, type Serial
- Xilinx UG476: 7 Series FPGAs GTX/GTH Transceivers User Guide
- Xilinx UG482: 7 Series FPGAs GTP Transceivers User Guide
- 31-10gbe-uart-debug - 10GbE with Xilinx IP (reference)
- 23-order-book - FPGA order book (integration target)
- Phase 3 Architecture - Multi-FPGA system using this PHY

Created: January 2026

Last Updated: February 16, 2026

Hardware Status: Tested on AX7325B, WNS +1.194ns, 0 critical warnings

Target Board: ALINX AX7325B (Kintex-7 XC7K325T-2FFG900I)