chili-chips-ba/openPCIE


Computing is about communicating. Some would also say about networking. Digital independence tags along on the wave of "Recommendations and Roadmap for European Sovereignty in open source HW, SW and RISC-V Technologies (2021)", calling for the development of critical open source IP blocks, such as PCIE Root Complex (RC). This is the first step in that direction. And, if you are looking for an opensource, soft PCIE EndPoint (EP) core, please check our other repo.

This project looks to open Artix7 PCIe Gen2 RC IP blocks for use outside of proprietary tool flows. While still reliant on Xilinx Series7 Hard Macros (HMs), it will surround them with open-source soft logic for PIO accesses: the RTL and, even more importantly, the layered software Driver with Demo App.

All that with full HW/SW opensource co-sim. Augmented with a rock-solid openBackplane in the basement of our hardware solution, the geek community will thus get all it takes for building their own, end-to-end openCompute systems.

The project's immediate goal is to empower makers with the ability to drive PCIE-based peripherals from their own soft RISC-V SOCs.

Given that the PCIE End-Point (EP) with DMA is already available in opensource, the opensource PCIE peripherals do exist for Artix7. Except that they are always, without exception, controlled by the proprietary RC on the motherboard side, typically in the form of RaspberryPi ASIC, or x86 PC. This project intends to change that status quo.

Our long-term goal is to set the stage for the development of a full opensource PCIE stack, gradually phasing out Xilinx HMs from the solution. That's a long, ambitious track, especially when it comes to mixed-signal SerDes and high-quality PLLs. We therefore anticipate a series of follow-on projects that would build on the foundations we hereby set.

This first phase is about implementing an open source PCIE Root Complex (RC) for Artix7 FPGA, utilizing Xilinx Series7 PCIE HM and GTP IP blocks, along with their low-jitter PLL.


Project Status

PART 1. Mini PCIE Backplane PCB

Almost all consumer PCIE installations have the RC chip soldered down on the motherboard, typically embodied in the CPU or "North Bridge" ASIC, where PCIE connectors are used solely for the EP cards. Similarly, all FPGA boards on the market are designed for EP applications. As such, they expect clock, reset and a few other signals from the infrastructure. It is only the professional and military-grade electronics that may have both RC and EP functions on add-on cards, with a backplane or mid-plane connecting them (see VPX chassis, or VITA 46.4).

This dev activity is about creating the minimal PCIE infrastructure necessary for using a plethora of ready-made FPGA EP cards as a Root Complex. This infrastructure takes the physical form of a mini backplane that provides the necessary PCIE context similarly to what a typical motherboard would give, but without a soldered-down RC chip that would be conflicting with our own FPGA RC node.

Such an approach is less work and less risk than designing our own PCIE motherboard with a large FPGA on it. But it is also a task that we did not appreciate from the get-go. In a bit of a surprise, halfway through planning, we realized that a suitable, ready-made backplane was not available on the market. This initial disappointment then turned into excitement, knowing that this new outcome would make the project even more attractive and valuable for the community, especially since Envox.eu has agreed to step in and help. They will take on the PCIE backplane PCB development activity.

  • ✔ Create requirements document.
  • ✔ Select components. Schematic and PCB layout design.
  • ✔ Review and iterate design to ensure robust operation at PCIe Gen2 rates (5 GT/s), possibly using openEMS for simulation of high-speed traces.
  • ✔ Manufacture prototype. Debug and bringup, using AMD-proprietary on-chip IBERT IP core to assess Signal Integrity.
  • Produce second batch that includes all improvements. Distribute it, and release design files with full documentation.

PART 2. Project setup and preparatory activities

  • ✔ Procure FPGA development boards and PCIE accessories.
  • ✔ Put together a prototype system. Bring it up using proprietary RTL IP, proprietary SW Driver, TestApp and Vivado toolchain.

PART 3. Initial HW/SW implementation

  • HW development of opensource RTL that mimics the functionality of PCIE RC proprietary solution.
  • ✔ SW development of opensource driver for the PCIE RC HW function. This may, or may not be done within Linux framework.
  • ✔ Design SOC based on RISC-V CPU with PCIE RC as its main peripheral.

PART 4. HW/SW co-simulation using full PCIE EP model

This dev activity is significantly beefed up compared to our original plan, which was to use a much simpler PCIE EP BFM and a non-SOC sim framework. That would have reduced the time and effort spent on the sim. However, prompted by NLnet's astute questions, we expanded the scope, and we're happy to announce that wyvernSemi is now also onboard!

Their VProc can be used not only to faithfully model the RISC-V CPU and SW interactions with HW, but it also comes with an implementation of the PCIE model. The PCIE model has some EP capabilities with a configurable configuration space, which can be paired in sim with our RC RTL design. Moreover, the existence of both RC and EP models paves the way for future plug-and-play, pick-and-choose opensource sims of the entire PCIE subsystem.

With the full end-to-end simulation thus in place, we hope to reduce the need for hardware debugging with ChipScope, expensive test equipment and PCIE protocol analyzers.

  • ✔ Extension of the existing PCIE RC model for some additional configurability of the EP capabilities.
  • ✔ Testbench development and build up. Execution and debug of sim testcases.
  • Documentation of EP model, TB and sim environment, with the objective of making it all simple enough to pick up, adapt and deploy in other projects.

PART 5. Integration, testing and iterative design refinements

  • One-by-one replace proprietary design elements from PART2.b with our opensource versions (except for Vivado and TestApp). Test it along the way, fixing problems as they occur.

PART 6. Prepare Demo and port it to openXC7

  • ✔ Develop our opensource PIO TestApp software and representative Demo.
  • Build design with openXC7, reporting issues and working with developers to fix them, possibly also trying ScalePNR flow.

Given that PCIE is an advanced, high-speed design, and given our acute awareness of nextpnr-xilinx and openXC7 shortcomings, we expect to run into showstoppers on the timing closure front. We therefore hope that the upcoming ScalePNR flow will be ready for heavy-duty testing within this project.


HW Architecture

The project relies on a modular hardware ecosystem that combines our custom-designed openPCIE Backplane with SQRL Acorn CLE-215+ FPGA modules and various PCIe adapters to create a flexible testing platform.

FPGA hardware platform

The hardware platform for this project is the SQRL Acorn CLE-215+, a versatile FPGA development board. Although originally designed as a crypto-accelerator, its powerful Artix-7 FPGA and modular design make it an excellent choice for general-purpose PCIe development.

The system consists of two main components:

  • M.2 FPGA Module (Acorn CLE-215+): This is the core of the system, a compact board in an M.2 form factor. It houses the Xilinx Artix-7 XC7A200T FPGA and is designed to be plugged into a standard M.2 M-key slot.
M.2 FPGA Module (Top View) M.2 FPGA Module (Bottom View)
  • PCIe Adapter Board (Acorn Baseboard Mini): A carrier board that holds the M.2 FPGA module. Its primary function is to adapt the M.2 interface to a standard PCIe x4 edge connector, allowing the entire assembly to be installed and tested in a regular PC motherboard slot.
PCIe Adapter Board (Top View) PCIe Adapter Board (Bottom View)


The fully assembled Acorn CLE-215+ development board, ready for use in a PCIe slot.

It is important to note that the Acorn CLE-215+ is functionally identical to the more widely known NiteFury board, with the primary difference being the amount of onboard memory. The Acorn model features 1 GB of DDR3 RAM, while the standard NiteFury has 512 MB. Therefore, the NiteFury schematic serves as a direct and accurate reference for the board's hardware layout.

The central component of the SQRL Acorn CLE-215+ system is the Xilinx Artix-7 XC7A200T-FBG484 chip. This FPGA is crucial for implementing the PCIe Root Complex and EndPoint functionality, possessing a range of features that make it highly suitable for this purpose.

The key specifications are summarized below:

| Specification | Value |
|---|---|
| Family | Xilinx Artix-7 |
| Speed Grade | -3 |
| Logic Cells (LUT4-Equivalent)¹ | 215,360 |
| LUT6 | 134,600 |
| Flip-Flops | 269,200 |
| Block RAM | 13 Mbit |
| DSP Slices | 740 |
| GTP Transceivers | 4 (up to 6.6 Gbit/s) |
| DDR3 SDRAM (Board) | 1 GB, 16-bit |
| QSPI Flash (Board) | 32 MB |
| User LEDs | 4 |
| General Purpose IOs | 4 |
| LVDS Pairs | 4 |

¹ The 'Logic Cells' count is a Xilinx metric derived from the physical 6-input LUTs to provide an estimated equivalent in simpler 4-input LUTs for comparison purposes. The number of physical LUTs and other resources are the exact counts for the XC7A200T chip.
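The relationship between the table entries can be checked with a few lines of arithmetic. The sketch below is only a sanity check, not part of the project: the per-slice counts (4 LUT6 and 8 flip-flops per slice) and the 1.6 logic cells per LUT6 factor are assumptions taken from general 7-series documentation, and the link figures use the standard PCIe Gen2 numbers (5 GT/s per lane, 8b/10b encoding) before any protocol overhead.

```python
# Figures copied from the specification table.
LUT6 = 134_600
FLIP_FLOPS = 269_200
LOGIC_CELLS = 215_360

# Assumptions from general 7-series documentation (not from this repo):
# one slice holds 4 LUT6 and 8 flip-flops; 1 LUT6 counts as 1.6 logic cells.
slices = LUT6 // 4
flip_flops = slices * 8
logic_cells = LUT6 * 16 // 10  # 1.6 x LUT6, kept in integer arithmetic

# PCIe Gen2: 5 GT/s per lane, 8b/10b encoding (8 data bits per 10 line bits).
GEN2_MTRANSFERS = 5_000            # mega-transfers per second, per lane
lane_mbit = GEN2_MTRANSFERS * 8 // 10
x4_mbyte = lane_mbit * 4 // 8      # four lanes, 8 bits per byte

print(f"slices={slices} flip_flops={flip_flops} logic_cells={logic_cells}")
print(f"Gen2 per lane: {lane_mbit} Mbit/s, x4 link: {x4_mbyte} MB/s")
```

The 5 Gbit/s line rate needed for Gen2 sits below the 6.6 Gbit/s GTP limit listed in the table.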

FPGA Board Setup

Properly programming and operating the Artix-7 FPGA on the SQRL board required two key hardware modifications.

1. Custom JTAG Cable

The JTAG connector on the Acorn CLE-215+ is non-standard and not directly compatible with the standard 14-pin connector on the Xilinx Platform Cable. A custom adapter cable is therefore required.

Figures: Custom JTAG cable connecting the Xilinx programmer to the board; JTAG connector pinout on the board.

The connector on the board is a Molex Pico-Lock 1.50mm pitch male header. This is not a standard 2.54mm or 2.00mm header, so standard DuPont-style cables will not fit.

To simplify making the cable, we highly recommend purchasing a pre-assembled cable with the correct female connector.

This cable has the correct female connector on both ends. The easiest method is to cut the cable in half, which gives you two connector cables with open ends. You can then splice one of these cable ends onto the wires of your Xilinx programmer cable, matching the signals according to the following wiring diagram.


JTAG Connection Guide: Physical Pinout and Wiring Diagram.

2. External 12V Power Supply

The board cannot be programmed or operated solely from the PCIe/M.2 slot power. It requires an external 12V supply to function correctly, especially when complex designs and high-speed transceivers are active. Power is provided via a standard 6-pin PCIe power connector from an ATX power supply.


External 12V power connection.

3. Final Assembly

The complete system, including the custom cabling, is mounted in a test PC chassis for verification.


The complete FPGA system mounted in a PCIe slot.

4. Connection Verification

After the hardware was prepared, the connection was verified using the Vivado Hardware Manager. As shown below, the tool successfully detected the JTAG programmer and identified the xc7a200t_0 FPGA chip. This confirms that the physical connections are correct and the board is ready for programming.


Successful device detection in Vivado Hardware Manager.


openBackplane PCB

Please refer to 1.pcb for additional detail.


TB/Sim Architecture

Simulation Test Bench

The openpcie2-rc test bench takes a flexible approach to simulation: a common test environment is used while selecting between alternative CPU components, one of which is the VProc virtual processor co-simulation element. Simulations can thus be fully HDL, with a RISC-V processor RTL implementation such as picoRV32, IBEX or EDUBOS5, or can co-simulate software using the virtual processor, with a significant speed-up in simulation times. The test bench has the following features:

The figure below shows an overview block diagram of the test bench HDL.

More details on the architecture and usage of the test bench can be found in the README.md in the 5.sim directory.

Co-simulation HAL

The control and status register (CSR) hardware abstraction layer (HAL) software is auto-generated, as is the CSR RTL, using peakrdl. For co-simulation purposes, an additional layer is auto-generated from the same SystemRDL specification using the systemrdl-compiler that accompanies the peakrdl tools. This produces two header files that define a common API to the application layer for both the RISC-V platform and the VProc-based co-simulation verification environment. The details of the HAL generation can be found in the README.md in the 4.build/ directory.
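The idea of one application-facing API with two platform-specific implementations can be illustrated with a small model. This is a Python sketch of the concept only, not the generated headers: every class name, the register address and the field layout are hypothetical, and the two backends are simple stand-ins for memory-mapped access on the RISC-V target and for accesses forwarded to the simulator.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Platform-specific 32-bit register access (hypothetical interface)."""

    @abstractmethod
    def read32(self, addr: int) -> int: ...

    @abstractmethod
    def write32(self, addr: int, value: int) -> None: ...


class TargetBackend(Backend):
    """Stands in for direct memory-mapped access on the RISC-V SoC."""

    def __init__(self):
        self.mem = {}

    def read32(self, addr):
        return self.mem.get(addr, 0)

    def write32(self, addr, value):
        self.mem[addr] = value & 0xFFFFFFFF


class CosimBackend(Backend):
    """Stands in for accesses forwarded to the simulator; logs each one."""

    def __init__(self):
        self.mem = {}
        self.log = []

    def read32(self, addr):
        value = self.mem.get(addr, 0)
        self.log.append(("rd", addr, value))
        return value

    def write32(self, addr, value):
        self.mem[addr] = value & 0xFFFFFFFF
        self.log.append(("wr", addr, value & 0xFFFFFFFF))


class Field:
    """One CSR bit field; the application only ever sees this API."""

    def __init__(self, backend, addr, lsb, width):
        self.backend, self.addr, self.lsb = backend, addr, lsb
        self.mask = (1 << width) - 1

    def get(self):
        return (self.backend.read32(self.addr) >> self.lsb) & self.mask

    def set(self, value):
        # Read-modify-write so neighbouring fields are preserved.
        reg = self.backend.read32(self.addr)
        reg &= ~(self.mask << self.lsb) & 0xFFFFFFFF
        reg |= (value & self.mask) << self.lsb
        self.backend.write32(self.addr, reg)


def app(backend):
    """Application code, identical for both platforms."""
    led = Field(backend, addr=0x1000_0000, lsb=0, width=4)  # hypothetical CSR
    led.set(0xA)
    return led.get()
```

Calling `app(TargetBackend())` and `app(CosimBackend())` runs the same application code against either platform, which is the property the common API is meant to provide.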

More details of the test bench, the pcievhost component and its usage can be found in the 5.sim/README.md file.


SW Architecture

The software stack is designed to run bare-metal on the soft RISC-V SoC embedded within the FPGA. In a standard PC, the Operating System (Linux/Windows) manages PCIe enumeration automatically in the background, hiding the complexity from the developer. In this project, our open-source driver takes full control, manually performing every step of the enumeration process to act as the PCIe Host.

The architecture follows a layered approach:

  1. Application Layer (The "User" Logic):

    • The final stage of the program that executes the high-level task.
    • It performs Memory Write operations to send a data payload to the Endpoint and uses Memory Read to verify the integrity of the data path.
  2. PCIe Driver (Enumeration & Setup):

    • Responsible for the initialization sequence required to perform enumeration and establish a functional connection (link).
    • It manually performs device discovery, probes BAR sizes, assigns memory addresses, and configures the Command Register to enable the device for communication.
  3. HAL (Hardware Abstraction Layer):

    • Low-level helper functions that interact with the hardware by reading and writing data to specific memory addresses.

Note: This structure allows developers to treat PCIe devices just like any other local peripheral, abstracting away the complexities of the physical link.
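The enumeration steps handled by the driver layer (discovery, BAR sizing, address assignment, Command Register setup) can be sketched against a toy configuration space. This is a Python model under stated assumptions, not the project's driver: `MockEndpoint`, `probe_bar_size` and `enumerate_endpoint` are hypothetical names, the endpoint has a single 32-bit memory BAR, and only the standard PCI header offsets and Command Register bits are taken as given.

```python
# Standard PCI configuration header offsets and Command Register bits.
VENDOR_DEVICE = 0x00
COMMAND_STATUS = 0x04
BAR0 = 0x10
CMD_MEM_ENABLE = 1 << 1
CMD_BUS_MASTER = 1 << 2


class MockEndpoint:
    """Toy config space with one 32-bit memory BAR (hypothetical model)."""

    def __init__(self, vendor_id, device_id, bar_size):
        self.regs = {VENDOR_DEVICE: (device_id << 16) | vendor_id,
                     COMMAND_STATUS: 0, BAR0: 0}
        # Address bits below the BAR size are hardwired to zero.
        self.bar_mask = ~(bar_size - 1) & 0xFFFFFFF0

    def cfg_read(self, offset):
        return self.regs.get(offset, 0)

    def cfg_write(self, offset, value):
        if offset == VENDOR_DEVICE:
            return  # read-only register
        if offset == BAR0:
            value &= self.bar_mask
        self.regs[offset] = value & 0xFFFFFFFF


def probe_bar_size(ep, bar=BAR0):
    """Write all ones, read back, and derive the size from the zero bits."""
    original = ep.cfg_read(bar)
    ep.cfg_write(bar, 0xFFFFFFFF)
    readback = ep.cfg_read(bar)
    ep.cfg_write(bar, original)
    mask = readback & 0xFFFFFFF0  # drop the low type/prefetch bits
    return (~mask + 1) & 0xFFFFFFFF if mask else 0


def enumerate_endpoint(ep, next_free_addr):
    """Discover the device, size and place BAR0, then enable it."""
    ids = ep.cfg_read(VENDOR_DEVICE)
    if ids & 0xFFFF == 0xFFFF:
        return None  # all ones in Vendor ID means no device present
    size = probe_bar_size(ep)
    if size == 0:
        return {"vendor": ids & 0xFFFF, "device": ids >> 16,
                "bar0_base": None, "bar0_size": 0}
    base = (next_free_addr + size - 1) & ~(size - 1)  # align up to BAR size
    ep.cfg_write(BAR0, base)
    cmd = ep.cfg_read(COMMAND_STATUS)
    ep.cfg_write(COMMAND_STATUS, cmd | CMD_MEM_ENABLE | CMD_BUS_MASTER)
    return {"vendor": ids & 0xFFFF, "device": ids >> 16,
            "bar0_base": base, "bar0_size": size}
```

For example, `enumerate_endpoint(MockEndpoint(0x10EE, 0x7021, 4096), 0x8000_0100)` places a 4 KB BAR at the next 4 KB-aligned address; the device ID and addresses here are made up for illustration.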


Implementation Workflow

  • WIP

Debug, Bringup, Testing

Hardware Infrastructure

  • PCB: openPCIE Backplane (see 1.pcb) which provides slots for the RC, EP, and the Switch.
  • FPGA Boards: Two SQRL Acorn CLE-215+ (Artix-7) boards.
  • JTAG Programmers: Two Xilinx Platform Cable USB units are used, each equipped with a custom adapter cable.

The "Dual-PC" Debugging Approach

We used a two-PC setup to streamline development:

  • PC 1: Connected to the Root Complex (RC) FPGA via JTAG.
  • PC 2: Connected to the EndPoint (EP) FPGA via JTAG.

Why this setup?

  • Speed and Efficiency: This setup avoids the repetitive task of manually swapping the JTAG cable between boards and eliminates the time-consuming process of flashing onboard memory for every test iteration.
  • Simultaneous Debugging: This allows us to run two instances of Vivado Hardware Manager (ILA - Integrated Logic Analyzer) at the same time. We can trigger on the RC and EP simultaneously to view the transaction from both sides of the link.

Testing Procedure

To bring up the system and verify the PCIe link, follow these steps:

  • Hardware Assembly: Insert the FPGA cards into their designated slots on the openPCIE Backplane (RC and EP) and connect the external 12V power supply.
  • Bitstream Programming: Program both FPGAs using Vivado Hardware Manager. PC 1 is used to program the Root Complex, while PC 2 programs the EndPoint.
  • Manual Reset: Once both devices are programmed, press the manual reset button on the backplane.
  • Enumeration: Upon releasing the reset button, the Root Complex initiates the enumeration process and establishes the link with the EndPoint.
  • Re-Initialization: Every subsequent press of the reset button triggers a full re-initialization of the PCIe connection, allowing for repeated testing and debugging without the need to re-program the FPGAs.

Objective

The goal of this test is to verify that the Root Complex can successfully enumerate the link and perform both Memory Write and Memory Read transactions.

  1. Write: The RC sends a data payload to the EP.
  2. Transfer: The EP receives the TLP and writes it into its internal Block RAM (BRAM).
  3. Read: The RC requests to read the data back from the EP memory.
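The three steps above amount to a write-then-read-back check, which can be modelled in a few lines. This is a Python sketch under stated assumptions, not the project's TestApp: `EndpointBram` and `loopback_test` are hypothetical names, the EP is reduced to a word-addressed memory behind BAR0, and the BAR base and memory depth are illustrative values.

```python
class EndpointBram:
    """Toy EP: a word-addressed BRAM mapped at BAR0 (hypothetical model)."""

    def __init__(self, bar_base, words):
        self.bar_base = bar_base
        self.mem = [0] * words

    def _index(self, addr):
        offset = addr - self.bar_base
        if offset < 0 or offset % 4 or offset // 4 >= len(self.mem):
            raise ValueError(f"address {addr:#x} is outside BAR0 or unaligned")
        return offset // 4

    def mem_write(self, addr, data_words):
        """Step 2: the EP stores the received payload into its BRAM."""
        for i, word in enumerate(data_words):
            self.mem[self._index(addr + 4 * i)] = word & 0xFFFFFFFF

    def mem_read(self, addr, n_words):
        """Step 3: the EP returns the requested words."""
        return [self.mem[self._index(addr + 4 * i)] for i in range(n_words)]


def loopback_test(ep, addr, payload):
    """Steps 1 to 3 as seen from the RC: write, read back, compare."""
    ep.mem_write(addr, payload)
    return ep.mem_read(addr, len(payload)) == payload


if __name__ == "__main__":
    ep = EndpointBram(bar_base=0x8000_1000, words=256)
    ok = loopback_test(ep, 0x8000_1000, [0xDEADBEEF, 0x12345678])
    print("loopback", "passed" if ok else "failed")
```

On hardware the comparison result would be observed through the LEDs and the ILA rather than a return value.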

Functional Verification:

Verification Environment

Before diving into the results, here is the complete hardware validation setup used for the Direct (Point-to-Point) connection scenario. It features the openPCIE Backplane powered by an external 12V supply, populated with two Acorn CLE-215+ FPGA modules (one configured as RC, the other as EP).

Dual Xilinx Platform Cable USB units are connected via custom adapters to allow simultaneous debugging and bitstream loading from two separate host workstations.


The complete hardware validation environment for the Direct connection test.

Verification Methods

1. Visual Verification (LEDs)

Visual feedback is provided via the 4 user LEDs on both FPGA boards:

  • RC Board: The LEDs indicate the Link Status, confirming that the physical connection is established and the devices are ready to communicate.
  • EP Board: The LEDs display the data received by the EP.

2. Internal Signal Monitoring (ILA)

Using the Vivado Integrated Logic Analyzer (ILA) on both PCs enables detailed monitoring of internal signals to see exactly what data was sent and received, its precise timing, and the low-level transaction details.
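When reading captured transactions, it helps to decode the TLP header words into named fields. The sketch below is a Python helper under stated assumptions, not a tool from this repository: `decode_mem_header` is a hypothetical name, it handles only 3DW memory request headers (32-bit addressing) as laid out in the PCIe specification, and how the header words map onto the probed ILA signals depends on the interface and is not covered here.

```python
# Fmt/Type encodings for 32-bit-address memory requests (PCIe specification).
KINDS = {
    (0b000, 0b00000): "MRd",  # 3DW header, no data
    (0b010, 0b00000): "MWr",  # 3DW header, with data
}


def decode_mem_header(dw0, dw1, dw2):
    """Split a 3DW memory request header into its fields."""
    fmt = (dw0 >> 29) & 0x7
    typ = (dw0 >> 24) & 0x1F
    length = dw0 & 0x3FF
    return {
        "kind": KINDS.get((fmt, typ), "other"),
        "length_dw": length if length else 1024,  # 0 encodes 1024 DW
        "requester_id": (dw1 >> 16) & 0xFFFF,
        "tag": (dw1 >> 8) & 0xFF,
        "last_be": (dw1 >> 4) & 0xF,
        "first_be": dw1 & 0xF,
        "address": dw2 & 0xFFFFFFFC,  # low two bits are reserved
    }


if __name__ == "__main__":
    # Example values only: a one-DW write to a made-up address.
    hdr = decode_mem_header(0x4000_0001, 0x0000_010F, 0x8000_1000)
    print(hdr)
```

A one-DW write shows up as kind "MWr" with `length_dw` of 1 and `first_be` of 0xF; the matching read request carries the same address with kind "MRd".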

Verification Results

  • WIP
RC-LED-Link-Up EP-LED-Data-Payload


PCIE Protocol Analyzer

References


Acknowledgements

We are grateful to NLnet Foundation for their sponsorship of this development activity.

wyvernSemi's wisdom and contribution made a great deal of difference -- Thank you, we are honored to have you on the project.

Envox, our next-door buddy, is responsible for the birth of our backplane, which we like to call BB (not to be mistaken for their gorgeous blue beauty BB3).

Public posts:


End of Document

About

Peripheral Component Interconnect (PCI) has taken the Express lane long ago, moving to xGbps SerDes. Now for the first time in opensource on the Host side too. Our project roots for the Root Port in 4 ways: |1|openRTL |2|openBFM with unique SIM setup, way faster than vendor's |3|openSW stack |4|one-of-a-kind open backplane.
