A CDC-safe asynchronous FIFO written in SystemVerilog, designed for robust clock-domain crossing (CDC) between independent write and read clocks.
This repository targets an "industry-style" IP deliverable: clean RTL, clear interface contract, repeatable simulation, and verification artifacts (SystemVerilog testbench + optional assertions).
- True async FIFO for CDC: independent `write_clk` and `read_clk`
- Modular RTL: `rtl/async_fifo.sv` + dedicated `rtl/sync_2ff.sv`
- Gray-coded pointers with 2FF synchronizers (classic, silicon-proven approach)
- Parameterized `BITS` and `SIZE` (power-of-two `SIZE` recommended)
- Clean flags: `full`, `empty`, optional `almost_full/empty` and fill levels
- Verification-ready: self-checking tests + stress tests across clock ratios/jitter
- Optional SystemVerilog Assertions (SVA) to lock down protocol and invariants
- Linux shell environment
- GNU Make
- Python 3 (for `scripts/report.py`)
- Verilator (required by `make lint`, `make lint-rtl`, `make lint-tb`, `make lint-all`)
- Bash-compatible shell (used by `Makefile` recipe shell flags)
- ModelSim Intel FPGA Starter Edition `2020.1` was used for baseline simulation (`/opt/intelFPGA/20.1/modelsim_ase/bin` in this project setup).
- Questa `2025.3` is required for full assertion support and coverage flow.
- Cadence Xcelium `23.03` (`xrun`) for gate-level activity capture (`dut.shm`)
- Cadence Genus `21.1` for logical synthesis and power estimation
- Cadence DDI `23.1` (Digital Design and Implementation) for power-flow environment
- Access to configured TSMC 28nm library/PDK paths used by synthesis scripts
- `module load questa/2025.3` (if your environment provides this module)
- `module load genus` (compatible with Genus `21.1`)
- `module load xcelium` (compatible with Xcelium `23.03`)
- `module load ddi` (compatible with DDI `23.1`)
- `p_`: module ports (inputs/outputs), except clock/reset.
- `r_`: registered signals (`always_ff` state).
- `w_`: combinational/internal wires.
- Clock/reset exception: use plain names without prefix, e.g. `write_clk`, `write_rst_n`, `read_clk`, `read_rst_n`.
Async FIFO correctness hinges on avoiding metastability propagation across domains. This implementation follows the standard approach:
- Maintain binary pointers locally in each domain (write/read).
- Convert pointers to Gray code.
- Send Gray pointers across domains through 2-flop synchronizers.
- Convert synchronized Gray pointers back to binary locally.
- Compute `full`/`empty` by comparing pointers in the same clock domain.
This avoids sampling multi-bit binary counters asynchronously, which can break due to metastability and intermediate transitions.
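The benefit of Gray coding can be checked with a small behavioral sketch. The Python below is not taken from the RTL; the function names and the 5-bit pointer width (`ADDR_WIDTH+1` for `SIZE = 16`) are assumptions chosen for illustration. It shows that a binary increment may flip several bits at once, while the Gray-coded equivalent always flips exactly one, including at wrap-around.

```python
PTR_WIDTH = 5                      # assumed: ADDR_WIDTH + 1 for SIZE = 16
PTR_MASK = (1 << PTR_WIDTH) - 1


def bin_to_gray(value: int) -> int:
    """Binary to Gray: XOR each bit with its more significant neighbour."""
    return value ^ (value >> 1)


def gray_to_bin(gray: int) -> int:
    """Gray to binary: XOR together all right-shifted copies of the input."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Binary 7 -> 8 (0111 -> 1000) toggles four bits; the Gray codes differ in one.
binary_flips = bits_changed(7, 8)
gray_flips = bits_changed(bin_to_gray(7), bin_to_gray(8))
```

A synchronizer that samples during the 7 to 8 transition could capture any mix of old and new bits in binary, whereas in Gray code it can only capture the old or the new value.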
This implementation is based on the design approach presented in the paper Simulation and Synthesis Techniques for Asynchronous FIFO Design.
| Signal | Dir | Description |
|---|---|---|
| `write_clk` | in | Write clock |
| `write_rst_n` | in | Active-low asynchronous write reset |
| `p_write_en` | in | Write request (one entry per cycle when accepted) |
| `p_write_data[BITS-1:0]` | in | Data to write |
| `p_write_full` | out | FIFO full flag (do not write when 1) |
| `p_write_almost_full` | out | (Optional) Programmable threshold |
| `p_write_level` | out | (Optional) Approximate fill level (write domain view) |
Write acceptance rule
A write is accepted on a rising edge of `write_clk` when:
- `p_write_en == 1` and `p_write_full == 0`
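The "approximate" qualifier on `p_write_level` follows from synchronizer latency: the write domain only sees the read pointer after it has passed through the synchronizer stages. The sketch below is a simplified Python model, not the RTL. It assumes both domains tick in lockstep, models the synchronizer as a plain shift register, and skips the Gray conversion (which does not change stable values). `PointerSync` and `write_level` are hypothetical names.

```python
from collections import deque

PTR_MASK = (1 << 5) - 1            # assumed 5-bit pointers (SIZE = 16)


class PointerSync:
    """N-stage synchronizer modelled as a shift register in the destination domain."""

    def __init__(self, stages: int = 2):
        self.regs = deque([0] * stages, maxlen=stages)

    def clock(self, src_value: int) -> int:
        """One destination clock edge: shift in src_value, return the last stage."""
        self.regs.appendleft(src_value)
        return self.regs[-1]


def write_level(wptr: int, rptr_synced: int) -> int:
    """Fill level as seen from the write domain."""
    return (wptr - rptr_synced) & PTR_MASK


sync = PointerSync(stages=2)
wptr, rptr = 8, 0                  # eight entries already written, none read
history = []                       # (true occupancy, write-domain level) per cycle
for cycle in range(6):
    seen = sync.clock(rptr)        # write domain samples the read pointer
    history.append((wptr - rptr, write_level(wptr, seen)))
    if cycle < 4:
        rptr += 1                  # reader pops one entry per cycle, four in total
```

In this model the write-domain level never under-reports occupancy: it lags behind the true value while reads are in flight and converges once the read pointer stops moving, so the stale view errs toward reporting more occupancy rather than less.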
| Signal | Dir | Description |
|---|---|---|
| `read_clk` | in | Read clock |
| `read_rst_n` | in | Active-low asynchronous read reset |
| `p_read_en` | in | Read request (one entry per cycle when accepted) |
| `p_read_data[BITS-1:0]` | out | Data read |
| `p_read_empty` | out | FIFO empty flag (do not read when 1) |
| `p_read_almost_empty` | out | (Optional) Programmable threshold |
| `p_read_level` | out | (Optional) Approximate fill level (read domain view) |
Read acceptance rule
A read is accepted on a rising edge of `read_clk` when:
- `p_read_en == 1` and `p_read_empty == 0`
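The two acceptance rules can be captured in a small reference model of the kind a scoreboard might use. This is an idealized Python sketch, not the repository's testbench: it ignores clock domains and synchronizer delay, so its flags update instantly. `FifoModel` and its methods are hypothetical names.

```python
from collections import deque


class FifoModel:
    """Idealized FIFO: a request is accepted only when the matching flag allows it."""

    def __init__(self, size: int = 16):
        self.size = size
        self.entries = deque()

    @property
    def full(self) -> bool:
        return len(self.entries) == self.size

    @property
    def empty(self) -> bool:
        return len(self.entries) == 0

    def write(self, en: bool, data: int) -> bool:
        """Mirror of the write rule: accepted when en == 1 and full == 0."""
        accepted = en and not self.full
        if accepted:
            self.entries.append(data)
        return accepted

    def read(self, en: bool):
        """Mirror of the read rule: accepted when en == 1 and empty == 0."""
        if en and not self.empty:
            return True, self.entries.popleft()
        return False, None


fifo = FifoModel(size=2)
write_results = [fifo.write(True, d) for d in (10, 20, 30)]   # third write hits full
read_results = [fifo.read(True) for _ in range(3)]            # third read hits empty
```

Rejected requests leave the model unchanged, which corresponds to the pointers not advancing on `full` or `empty`.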
- `BITS` (default: 32): Width of each FIFO entry.
- `SIZE` (default: 16): Number of entries. Expected to be a power of two (`async_fifo.sv` enforces this), which also keeps the pointer logic simple.
- `ADDR_WIDTH` (optional derived): `$clog2(SIZE)`; pointers are typically `ADDR_WIDTH+1` bits wide so the extra MSB can detect wrap.

Optional:
- `ALMOST_FULL_TH` / `ALMOST_EMPTY_TH`
- `SYNC_STAGES` (default 2)
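The role of the extra pointer bit can be illustrated with a short sketch. The Python below is an assumption-labeled model rather than the RTL: the helper names are hypothetical and `SIZE` must be a power of two. With `ADDR_WIDTH+1`-bit pointers, equal pointers mean empty, and pointers that match in the address bits but differ in the wrap bit mean full. The Gray-domain form of the full test (top two bits inverted, remaining bits equal) is the comparison described in the Cummings paper listed in the references.

```python
SIZE = 16                                  # README default; power of two assumed
ADDR_WIDTH = (SIZE - 1).bit_length()       # same value as $clog2(SIZE)
PTR_WIDTH = ADDR_WIDTH + 1                 # extra MSB acts as the wrap bit
PTR_MASK = (1 << PTR_WIDTH) - 1


def bin_to_gray(value: int) -> int:
    return value ^ (value >> 1)


def advance(ptr: int) -> int:
    """Increment a pointer modulo 2**PTR_WIDTH."""
    return (ptr + 1) & PTR_MASK


def mem_addr(ptr: int) -> int:
    """Memory address: the pointer without its wrap bit."""
    return ptr & (SIZE - 1)


def is_empty(wptr: int, rptr: int) -> bool:
    return wptr == rptr


def is_full(wptr: int, rptr: int) -> bool:
    """Binary view: same address, opposite wrap bit."""
    return wptr == (rptr ^ SIZE)


def is_full_gray(wgray: int, rgray: int) -> bool:
    """Gray view: top two bits inverted, all lower bits equal."""
    return wgray == (rgray ^ (0b11 << (PTR_WIDTH - 2)))
```

After reset both pointers are zero, so `is_empty(0, 0)` holds and `is_full(0, 0)` does not, matching the documented reset flags.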
- The FIFO uses per-domain active-low asynchronous resets (`write_rst_n`, `read_rst_n`).
- On reset, pointers go to zero; flags initialize to:
  - `empty = 1`
  - `full = 0`

CDC recommendation: ensure both domains are reset to a consistent state, and synchronize reset deassertion per clock domain when required by your integration guidelines.
- Max throughput: 1 write per `write_clk` cycle + 1 read per `read_clk` cycle (when not full/empty)
- Latency depends on the chosen memory style (reg array vs inferred RAM).

For ASIC/FPGA inference, the FIFO can be adapted to:
- Distributed regs (small `SIZE` values)
- SRAM/BRAM (larger `SIZE` values)

.
├── rtl/
│ ├── async_fifo.sv
│ └── sync_2ff.sv
├── tb/
│ ├── test_async_fifo.sv # SystemVerilog testbench
│ └── assertions.sv # SVA checks
├── sim/
│ ├── Makefile
│ └── waves/ # generated
└── docs/
└── design.md # deeper notes & diagrams
- Reset sanity and initial flags
- Smoke sequence (write N then read N)
- Interleaved traffic (ping-pong write/read)
- Write clock faster than read clock (stress full-side behavior)
- Read clock faster than write clock (stress empty-side behavior)
| `TEST` generic | Objective | Status |
|---|---|---|
| `reset` | Reset sanity (`empty=1`, `full=0`) | Ready |
| `smoke` | Ordered write/read data integrity | Ready |
| `interleaved` | Ping-pong write/read flow | Ready |
| `write-clock-faster` | Stress when write dominates read | Ready |
| `read-clock-faster` | Stress when read dominates write | Ready |
| `""` (empty / regress) | Run all tests above in sequence | Ready |
- Write domain protocol safety (no illegal pointer advance on full)
- Read domain protocol safety (no illegal pointer advance on empty)
- Pointer/Gray encoding consistency checks
- Gray transition sanity checks
- Flag consistency checks (`full`/`empty` equations)
- Unknown/X checks after reset deassertion
Note: ModelSim Intel Edition compiles SVA but reports limited support warnings; full SVA feature support is available in Questa.
From project root (uses `Makefile`):

    make build
    make run
    make test
    make regress
    make waves
    make clean

`make test` and `make regress` both execute the regression when `TEST` is empty.

Run a single test:

    make test TEST=smoke
    make test TEST=write-clock-faster

Run with explicit generics:

    make test TEST=read-clock-faster SEED=11 BITS=8 SIZE=8

Generate coverage (Questa required):

    make coverage
    make coverage-html

Coverage outputs:
- UCDB database: `sim/coverage.ucdb`
- HTML report: `sim/coverage_html/index.html`
- Aggregate summary source used in this repo: `sim/coverage_html/files/overalldu.js`
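The explicit-generics form of `make test` lends itself to a parameter sweep. The sketch below is a hypothetical helper, not part of the repository: it only builds the command lines for every combination of `BITS`, `SIZE`, `SEED`, and `TEST`, using the test names from the table above. The sweep values are arbitrary examples.

```python
from itertools import product

# Test names taken from the TEST generic table in this README.
TESTS = ["reset", "smoke", "interleaved", "write-clock-faster", "read-clock-faster"]


def make_commands(bits_list, size_list, seeds, tests=TESTS):
    """Return one `make test` argv list per (BITS, SIZE, SEED, TEST) combination."""
    return [
        ["make", "test", f"TEST={test}", f"SEED={seed}", f"BITS={bits}", f"SIZE={size}"]
        for bits, size, seed, test in product(bits_list, size_list, seeds, tests)
    ]


# Example sweep: two widths, two depths, one seed (values are illustrative only).
commands = make_commands(bits_list=[8, 32], size_list=[8, 16], seeds=[11])
```

Each entry can be passed to `subprocess.run(cmd, check=True)` from the project root to execute the sweep and stop on the first failing configuration.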
`Makefile` prepends the ModelSim path via:

    MODELSIM_BIN ?= /opt/intelFPGA/20.1/modelsim_ase/bin

If your installation is in another location:

    make MODELSIM_BIN=/path/to/modelsim/bin build
    make MODELSIM_BIN=/path/to/modelsim/bin run

Synthesis is executed in three explicit steps (no automatic loop):
- `logical`: run Genus logic synthesis
- `sim-netlist`: run gate-level simulation with Xcelium (generates `dut.shm`)
- `power`: run Genus power analysis using netlist DB + `dut.shm`

Area is measured after logical synthesis in Cadence Genus using the TSMC 28nm technology setup.
Power uses an activity-annotated flow:
- Synthesize RTL in Genus (logical step).
- Simulate the synthesized netlist in Cadence Xcelium to capture switching activity (`dut.shm`).
- Back-annotate this activity in Genus and run `report_power`.

This flow provides more representative power numbers than pure vectorless estimation, because switching comes from real simulation stimulus.
Run step by step for one configuration:

    make logical-run-env BITS=32 SIZE=16
    make sim-netlist-run-env BITS=32 SIZE=16
    make power-run-env BITS=32 SIZE=16

Or run all steps:

    make synthesis-run-env BITS=32 SIZE=16

For each synthesis step there are two target variants:
- `*-run`: executes only the tool command (`genus`, `xrun`, etc.), assuming your shell environment is already configured.
- `*-run-env`: first executes `module purge` + `module load ...`, then runs the same step command.

Examples:
- `logical-run` vs `logical-run-env`
- `sim-netlist-run` vs `sim-netlist-run-env`
- `power-run` vs `power-run-env`

Recommended usage:
- Use `*-run-env` on clean shells, shared servers, and CI.
- Use `*-run` only when modules were loaded manually beforehand.

To generate consolidated synthesis tables (area + power):

    python3 scripts/report.py

The script reads:
- `syntesis/logical/results/BITS*_SIZE*/reports/async_fifo_area.rpt`
- `syntesis/power/results/BITS*_SIZE*/power_evaluation.txt`

Generated outputs:
- `syntesis/reports/area_table.csv`
- `syntesis/reports/power_table.csv`
- `syntesis/reports/summary.md`

The consolidated data summary is in:
- `syntesis/reports/summary.md`
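The `BITS*_SIZE*` directory naming makes it straightforward to recover the configuration from a result path. The sketch below is a hypothetical illustration and not the contents of `scripts/report.py`; it makes no assumption about the report file formats and only parses the directory name. `parse_config` is an invented name.

```python
import re
from pathlib import PurePosixPath

# Matches result directory names such as BITS32_SIZE16.
CONFIG_RE = re.compile(r"BITS(\d+)_SIZE(\d+)")


def parse_config(path: str):
    """Return (bits, size) from the first BITS<n>_SIZE<m> path component, else None."""
    for part in PurePosixPath(path).parts:
        match = CONFIG_RE.fullmatch(part)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None


area_cfg = parse_config("syntesis/logical/results/BITS32_SIZE16/reports/async_fifo_area.rpt")
no_cfg = parse_config("syntesis/logical/results/gate_level/async_fifo_logic_mapped.v")
```

Sorting result paths by the returned tuple gives a stable row order for tables keyed on `BITS` and `SIZE`.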
- Configuration archive (kept per run):
  - `syntesis/logical/results/BITS32_SIZE16/...`
- Canonical simulation/power input paths (always overwritten by latest `logical` run):
  - `syntesis/logical/results/gate_level/async_fifo_logic_mapped.v`
  - `syntesis/logical/results/gate_level/async_fifo_logic_mapped.db`
  - `syntesis/logical/results/gate_level/async_fifo_analysis_view_0p90v_25c_captyp_nominal.sdf`
- Gate-level switching activity (fixed path from `xrun`):
  - `syntesis/sim/dut.shm`
- Power report (kept per configuration):
  - `syntesis/power/results/BITS32_SIZE16/power_evaluation.txt`

- `sim-netlist` and `power` always read the canonical netlist/DB/SDF paths.
- Therefore, run `logical` first for the same `BITS`/`SIZE` before `sim-netlist` and `power`.

- Connect `p_write_full` to your upstream backpressure logic
- Connect `p_read_empty` to your downstream request logic
- Avoid combinational paths between clock domains

- `SIZE` is expected to be power-of-two (`async_fifo.sv` enforces this)
- SVA execution in ModelSim Intel Edition has feature limitations/warnings
- Current verification is simulation-based (no formal proof in this repo yet)
- Complete assertions 5 and 11 (full/empty flag equivalence with Gray-domain next-state conditions)
- Implement assertions 6 and 12 (no X/Z after reset deassertion for flags and pointers)
- Add optional assertions 13 and 14 (synced Gray one-bit change, no flag glitching between edges)
- Review and fix antecedents in existing properties to match accepted transactions (`p_write_en && !p_write_full`, `p_read_en && !p_read_empty`)
- Run SVA-enabled simulation and capture a clean pass report
- Reach 100% verification coverage (current total coverage: 78.55% from `sim/coverage_html/files/overalldu.js`, field `ds.tc`)
- Add `make regress` matrix with multiple `BITS`/`SIZE` combinations
- Add deterministic multi-test runs by `NAME` and fixed `SEED` set
NAMEand fixedSEEDset - Add wrap-around focused regression case (>= 10x depth transactions)
- Add overflow/underflow stress regression with scoreboard checks
- Publish regression summary (tests, seeds, params, pass/fail) in CI/log output
- Add formal properties
- Add programmable thresholds (`almost_full/empty`)
- Add dual-port RAM inference templates for FPGA/ASIC
- Add optional fall-through (FWFT) read mode
- Clifford E. Cummings and Peter Alfke, Simulation and Synthesis Techniques for Asynchronous FIFO Design, SNUG San Jose 2002. Official technical library listing: https://www.sunburst-design.com/papers/ . Public PDF access: https://www.researchgate.net/publication/252160343_Simulation_and_Synthesis_Techniques_for_Asynchronous_FIFO_Design
- Jason Yu, Dual-Clock Asynchronous FIFO in SystemVerilog, VerilogPro: https://www.verilogpro.com/asynchronous-fifo-design/
MIT (or your preferred license).