Commit ebfc14a

Merge branch 'master' into temp_organize_place_timng
2 parents 5656830 + 848d1e7 commit ebfc14a

222 files changed

+5603
-4364
lines changed


.github/workflows/test.yml

-1
Original file line number | Diff line number | Diff line change
@@ -439,7 +439,6 @@ jobs:
439439
- { name: 'GCC 11 (Ubuntu Noble - 24.04)', eval: 'CC=gcc-11 && CXX=g++-11', }
440440
- { name: 'GCC 12 (Ubuntu Noble - 24.04)', eval: 'CC=gcc-12 && CXX=g++-12', }
441441
- { name: 'GCC 14 (Ubuntu Noble - 24.04)', eval: 'CC=gcc-14 && CXX=g++-14', }
442-
- { name: 'Clang 15 (Ubuntu Noble - 24.04)', eval: 'CC=clang-15 && CXX=clang++-15', }
443442
- { name: 'Clang 16 (Ubuntu Noble - 24.04)', eval: 'CC=clang-16 && CXX=clang++-16', }
444443
- { name: 'Clang 17 (Ubuntu Noble - 24.04)', eval: 'CC=clang-17 && CXX=clang++-17', }
445444
- { name: 'Clang 18 (Ubuntu Noble - 24.04)', eval: 'CC=clang-18 && CXX=clang++-18', }

CHANGELOG.md

+58
@@ -47,6 +47,64 @@ _The following are changes which have been implemented in the VTR master branch
4747

4848
### Removed
4949

50+
51+
## v9.0.0 - 2024-12-23
52+
53+
### Added
54+
* Support for Advanced Architectures:
55+
* 3D FPGA and RAD architectures.
56+
* Architectures with hard Networks-on-Chip (NoCs).
57+
* Distinct horizontal and vertical channel widths and types.
58+
* Diagonal routing wires and other complex wire shapes (L-shaped, T-shaped, ...).
59+
60+
* New Benchmark Suites:
61+
* Koios: A deep-learning-focused benchmark suite with various design sizes.
62+
* Hermes: Benchmarks utilizing hard NoCs.
63+
* TitanNew: Large benchmarks targeting the Stratix 10 architecture.
64+
65+
* Commercial FPGA Architecture Captures:
66+
* Intel’s Stratix 10 FPGA architecture.
67+
* AMD’s 7-series FPGA architecture.
68+
69+
* Parmys Logic Synthesis Flow:
70+
* Better Verilog language coverage
71+
* More efficient hard block mapping
72+
73+
* VPR Graphics Visualizations:
74+
* New interface for improved usability and underlying graphics rewritten using EZGL/GTK to allow more UI widgets.
75+
* Algorithm breakpoint visualizations for placement and routing algorithm debugging.
76+
* User-guided (manual) placement optimization features.
77+
* Enabled a live connection for client graphical application to VTR engines through sockets (server mode).
78+
* Interactive timing path analysis (IPA) client using server mode.
79+
80+
* Performance Enhancements:
81+
* Parallel router for faster inter-cluster routing or flat routing.
82+
83+
* Re-clustering API to modify packing decisions during the flow.
84+
* Support for floorplanning and placement constraints.
85+
* Unified intra- and inter-cluster (flat) routing.
86+
* Comprehensive web-based VTR utilities and API documentation.
87+
88+
### Changed
89+
* The default values of many command line options (e.g. inner_num is 0.5 instead of 1.0)
90+
* Changes to placement engine
91+
* Smart centroid initial placement algorithm.
92+
* Multiple smart placement directed moves.
93+
* Reinforcement learning-based placement algorithm.
94+
* Changes to routing engine
95+
* Faster lookahead creation.
96+
* More accurate lookahead for large blocks.
97+
* More efficient heap and pruning strategies.
98+
* max `pres_fac` capped to avoid possible numeric issues.
99+
100+
101+
### Fixed
102+
* Many algorithmic and coding bugs are fixed in this release
103+
104+
### Removed
105+
* Breadth-first (non-timing-driven) router.
106+
* Non-linear congestion placement cost.
107+
50108
## v8.0.0 - 2020-03-24
51109

52110
### Added

CMakeLists.txt

+7-7
@@ -62,8 +62,8 @@ option(ODIN_SANITIZE "Enable building odin with sanitize flags" OFF)
6262
option(WITH_PARMYS "Enable Yosys as elaborator and parmys-plugin as partial mapper" ON)
6363
option(YOSYS_F4PGA_PLUGINS "Enable building and installing Yosys SystemVerilog and UHDM plugins" OFF)
6464

65-
set(VTR_VERSION_MAJOR 8)
66-
set(VTR_VERSION_MINOR 1)
65+
set(VTR_VERSION_MAJOR 9)
66+
set(VTR_VERSION_MINOR 0)
6767
set(VTR_VERSION_PATCH 0)
6868
set(VTR_VERSION_PRERELEASE "dev")
6969

@@ -93,9 +93,9 @@ add_definitions("-DVTR_ASSERT_LEVEL=${VTR_ASSERT_LEVEL}")
9393
include(CheckCXXCompilerFlag)
9494

9595
#
96-
# We require c++17 support
96+
# We require c++20 support
9797
#
98-
set(CMAKE_CXX_STANDARD 17)
98+
set(CMAKE_CXX_STANDARD 20)
9999
set(CMAKE_CXX_STANDARD_REQUIRED ON)
100100
set(CMAKE_CXX_EXTENSIONS OFF) #No compiler specific extensions
101101

@@ -160,7 +160,7 @@ else()
160160
"-Wcast-align" #Warn if a cast causes memory alignment changes
161161
"-Wshadow" #Warn if local variable shadows another variable
162162
"-Wformat=2" #Sanity checks for printf-like formatting
163-
"-Wno-format-nonliteral" # But don't worry about non-literal formtting (i.e. run-time printf format strings)
163+
"-Wno-format-nonliteral" # But don't worry about non-literal formatting (i.e. run-time printf format strings)
164164
"-Wlogical-op" #Checks for logical op when bit-wise expected
165165
"-Wmissing-declarations" #Warn if a global function is defined with no declaration
166166
"-Wmissing-include-dirs" #Warn if a user include directory is missing
@@ -178,10 +178,10 @@ else()
178178
"-Wduplicated-cond" #Warn about identical conditions in if-else chains
179179
"-Wduplicated-branches" #Warn when different branches of an if-else chain are equivalent
180180
"-Wnull-dereference" #Warn about null pointer dereference execution paths
181-
"-Wuninitialized" #Warn about unitialized values
181+
"-Wuninitialized" #Warn about uninitialized values
182182
"-Winit-self" #Warn about self-initialization
183183
"-Wcatch-value=3" #Warn when catch statements don't catch by reference
184-
"-Wextra-semi" #Warn about redudnant semicolons
184+
"-Wextra-semi" #Warn about redundant semicolons
185185
"-Wimplicit-fallthrough=3" #Warn about case fallthroughs, but allow 'fallthrough' comments to suppress warnings
186186
#GCC-like optional
187187
#"-Wsuggest-final-types" #Suggest where 'final' would help if specified on a type methods

README.developers.md

+15-12
@@ -637,6 +637,10 @@ They can be used for FPGA architecture exploration for DL and also for tuning CA
637637

638638
A typical approach to evaluating an algorithm change would be to run `koios_medium` (or `koios_medium_no_hb`) tasks from the nightly regression test (vtr_reg_nightly_test4), the `koios_large` (or `koios_large_no_hb`) and the `koios_proxy` (or `koios_proxy_no_hb`) tasks from the weekly regression test (vtr_reg_weekly). The nightly test contains smaller benchmarks, whereas the large designs are in the weekly regression test. To measure QoR for the entire benchmark suite, both nightly and weekly tests should be run and the results should be concatenated.
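The concatenation step mentioned above can be sketched in Python. This is a minimal illustration, not a VTR utility: the helper name `concat_parse_results` is hypothetical, and it assumes each `parse_results.txt` is tab-separated with a single header row and that every file shares the same columns.

```python
import csv
import io
from typing import Iterable


def concat_parse_results(tables: Iterable[str]) -> str:
    """Concatenate the text of several parse_results.txt files.

    Assumption (not confirmed by the VTR docs quoted here): files are
    tab-separated and start with one header row naming the columns.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    header = None
    for text in tables:
        # Drop completely empty lines, keep everything else as-is.
        rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
        if not rows:
            continue
        if header is None:
            # The first non-empty file defines the header, written once.
            header = rows[0]
            writer.writerow(header)
        elif rows[0] != header:
            raise ValueError("parse_results files have different columns")
        writer.writerows(rows[1:])
    return out.getvalue()


# Tiny in-memory stand-ins for a nightly and a weekly result file.
nightly = "arch\tcircuit\nA.xml\tsmall.v\n"
weekly = "arch\tcircuit\nA.xml\tlarge.v\n"
combined = concat_parse_results([nightly, weekly])
print(combined)
```

In practice the strings would come from reading the `parse_results.txt` files in the latest run directory of each task; the header check guards against mixing results produced with different parse configurations.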
639639

640+
Because 3 of the `koios_large` circuits have long DSP chains and require special settings, they are split into separate tasks as follows:
641+
* `bwave_like.float.large.v` and `bwave_like.fixed.large.v` are in `vtr_reg_weekly/koios_bwave_large` task
642+
* `dla_like.large.v` is in `vtr_reg_weekly/koios_dla_large` task
643+
640644
For evaluating an algorithm change in the Odin frontend, run `koios_medium` (or `koios_medium_no_hb`) tasks from the nightly regression test (vtr_reg_nightly_test4_odin) and the `koios_large_odin` (or `koios_large_no_hb_odin`) tasks from the weekly regression test (vtr_reg_weekly).
641645

642646
The `koios_medium`, `koios_large`, and `koios_proxy` regression tasks run these benchmarks with complex_dsp functionality enabled, whereas `koios_medium_no_hb`, `koios_large_no_hb` and `koios_proxy_no_hb` regression tasks run these benchmarks without complex_dsp functionality. Normally, only the `koios_medium`, `koios_large`, and `koios_proxy` tasks should be enough for QoR.
@@ -651,6 +655,8 @@ The following table provides details on available Koios settings in VTR flow:
651655
| Nightly | Medium designs | k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml | ✓ | vtr_reg_nightly_test4_odin/koios_medium | Odin | |
652656
| Nightly | Medium designs | k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml | | vtr_reg_nightly_test4_odin/koios_medium_no_hb | Odin | |
653657
| Weekly | Large designs | k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml | ✓ | vtr_reg_weekly/koios_large | Parmys | |
658+
| Weekly | Large designs | k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml | ✓ | vtr_reg_weekly/koios_dla_large | Parmys | |
659+
| Weekly | Large designs | k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml | ✓ | vtr_reg_weekly/koios_bwave_large | Parmys | |
654660
| Weekly | Large designs | k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml | | vtr_reg_weekly/koios_large_no_hb | Parmys | |
655661
| Weekly | Large designs | k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml | ✓ | vtr_reg_weekly/koios_large_odin | Odin | |
656662
| Weekly | Large designs | k6FracN10LB_mem20K_complexDSP_customSB_22nm.xml | | vtr_reg_weekly/koios_large_no_hb_odin | Odin | |
@@ -661,7 +667,15 @@ The following table provides details on available Koios settings in VTR flow:
661667

662668
For more information refer to the [Koios benchmark home page](vtr_flow/benchmarks/verilog/koios/README.md).
663669

664-
The following steps show a sequence of commands to run the `koios` tasks on the Koios benchmarks:
670+
To make running all the Koios benchmarks easier, especially with those circuits scattered across different tasks, there is an overall task list that runs all 40 Koios circuits, as shown below. This runs every circuit with complex DSP functionality enabled; to disable the complex DSP, edit the file to point to the `koios_*_no_hb` tasks:
671+
672+
```shell
673+
$ ../scripts/run_vtr_task.py -l koios_task_list.txt
674+
675+
#Several hours later... they complete
676+
#
677+
678+
If you want to run a subset of the koios benchmarks or run them without hard DSP blocks, you can run lower-level 'koios' tasks as follows:
665679

666680
```shell
667681
#From the VTR root
@@ -681,17 +695,6 @@ $ ../scripts/run_vtr_task.py regression_tests/vtr_reg_weekly/koios_sv_no_hb &
681695
682696
#Several hours later... they complete
683697
684-
#Parse the results
685-
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_nightly_test4/koios_medium
686-
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_weekly/koios_large
687-
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_weekly/koios_proxy
688-
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_weekly/koios_sv
689-
690-
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_nightly_test4/koios_medium_no_hb
691-
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_weekly/koios_large_no_hb
692-
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_weekly/koios_proxy_no_hb
693-
$ ../scripts/python_libs/vtr/parse_vtr_task.py regression_tests/vtr_reg_weekly/koios_sv_no_hb
694-
695698
#The run directory should now contain a summary parse_results.txt file
696699
$ head -5 vtr_reg_nightly_test4/koios_medium/<latest_run_dir>/parse_results.txt
697700
arch circuit script_params vtr_flow_elapsed_time vtr_max_mem_stage vtr_max_mem error odin_synth_time max_odin_mem parmys_synth_time max_parmys_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_mem pack_time placed_wirelength_est place_mem place_time place_quench_time placed_CPD_est placed_setup_TNS_est placed_setup_WNS_est placed_geomean_nonvirtual_intradomain_critical_path_delay_est place_delay_matrix_lookup_time place_quench_timing_analysis_time place_quench_sta_time place_total_timing_analysis_time place_total_sta_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time min_chan_width_total_timing_analysis_time min_chan_width_total_sta_time crit_path_routed_wirelength crit_path_route_success_iteration crit_path_total_nets_routed crit_path_total_connections_routed crit_path_total_heap_pushes crit_path_total_heap_pops critical_path_delay geomean_nonvirtual_intradomain_critical_path_delay setup_TNS setup_WNS hold_TNS hold_WNS crit_path_routing_area_total crit_path_routing_area_per_tile router_lookahead_computation_time crit_path_route_time crit_path_total_timing_analysis_time crit_path_total_sta_time

doc/src/vpr/command_line_usage.rst

+51-51
@@ -408,6 +408,50 @@ Use the options below to override this default naming behaviour.
408408

409409
Prefix for output files
410410

411+
.. option:: --read_flat_place <file>
412+
413+
Reads a file containing the locations of each atom on the FPGA.
414+
This is used by the packer to better cluster atoms together.
415+
416+
The flat placement file (which often ends in ``.fplace``) is a text file
417+
where each line describes the location of an atom. Each line in the flat
418+
placement file should have the following syntax:
419+
420+
.. code-block:: none
421+
422+
<atom_name : str> <x : float> <y : float> <layer : float> <atom_sub_tile : int> <atom_site_idx? : int>
423+
424+
For example:
425+
426+
.. code-block:: none
427+
428+
n523 6 8 0 0 3
429+
n522 6 8 0 0 5
430+
n520 6 8 0 0 2
431+
n518 6 8 0 0 16
432+
433+
The position of the atom on the FPGA is given by 3 floating point values
434+
(``x``, ``y``, ``layer``). We allow for the positions of atoms to be not
435+
quite legal (ok to be off-grid) since this flat placement will be fed into
436+
the packer and placer, which will snap the positions to grid locations. By
437+
allowing for off-grid positions, the packer can better trade-off where to
438+
move atom blocks if they cannot be placed at the given position.
439+
For 2D FPGA architectures, the ``layer`` should be 0.
440+
441+
The ``sub_tile`` is a clustered placement construct: which cluster-level
442+
location at a given (x, y, layer) should these atoms go at (relevant when
443+
multiple clusters can be stacked there). A sub-tile of -1 may be used when
444+
the sub-tile of an atom is unknown (allowing the packing algorithm to choose
445+
any sub-tile at the given (x, y, layer) location).
446+
447+
The ``site_idx`` is an optional index into a linearized list of primitive
448+
locations within a cluster-level block which may be used as a hint to
449+
reconstruct clusters.
450+
451+
.. warning::
452+
453+
This interface is currently experimental and under active development.
454+
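To make the line syntax of the flat placement file concrete, here is a minimal Python sketch of a reader. It is an illustration only, not VPR's parser: the names `AtomLoc` and `parse_flat_place` are hypothetical, and it assumes fields are separated by whitespace, that blank lines can be skipped, and that the trailing site index is the only optional field.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AtomLoc:
    """One line of a flat placement file (hypothetical container)."""
    name: str
    x: float
    y: float
    layer: float
    sub_tile: int                   # -1 means "sub-tile unknown"
    site_idx: Optional[int] = None  # optional hint, may be absent


def parse_flat_place(text: str) -> List[AtomLoc]:
    """Parse '<name> <x> <y> <layer> <sub_tile> [<site_idx>]' lines."""
    locs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines carry no information
        if len(fields) not in (5, 6):
            raise ValueError(
                f"line {line_no}: expected 5 or 6 fields, got {len(fields)}"
            )
        name, x, y, layer, sub_tile = fields[:5]
        site_idx = int(fields[5]) if len(fields) == 6 else None
        locs.append(
            AtomLoc(name, float(x), float(y), float(layer), int(sub_tile), site_idx)
        )
    return locs


example = """\
n523 6 8 0 0 3
n522 6 8 0 0 5
n900 6.5 8.25 0 -1
"""
atoms = parse_flat_place(example)
for atom in atoms:
    print(atom)
```

The third example line is made up to show an off-grid position with an unknown sub-tile and no site index; the first two are taken from the example in the option description.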
411455
.. option:: --write_flat_place <file>
412456

413457
Writes the post-placement locations of each atom into a flat placement file.
@@ -830,55 +874,9 @@ If any of init_t, exit_t or alpha_t is specified, the user schedule, with a fixe
830874

831875
**Default:** ``0.0``
832876

833-
.. _dusty_sa_options:
834-
Setting any of the following 5 options selects :ref:`Dusty's annealing schedule <dusty_sa>` .
835-
836-
.. option:: --alpha_min <float>
837-
838-
The minimum (starting) update factor (alpha) used.
839-
Ranges between 0 and alpha_max.
840-
841-
**Default:** ``0.2``
842-
843-
.. option:: --alpha_max <float>
844-
845-
The maximum (stopping) update factor (alpha) used after which simulated annealing will complete.
846-
Ranges between alpha_min and 1.
847-
848-
**Default:** ``0.9``
849-
850-
.. option:: --alpha_decay <float>
851-
852-
The rate at which alpha will approach 1: alpha(n) = 1 - (1 - alpha(n-1)) * alpha_decay
853-
Ranges between 0 and 1.
854-
855-
**Default:** ``0.7``
856-
857-
.. option:: --anneal_success_min <float>
858-
859-
The minimum success ratio after which the temperature will reset to maintain the target success ratio.
860-
Ranges between 0 and anneal_success_target.
861-
862-
**Default:** ``0.1``
863-
864-
.. option:: --anneal_success_target <float>
865-
866-
The temperature after each reset is selected to keep this target success ratio.
867-
Ranges between anneal_success_target and 1.
868-
869-
**Default:** ``0.25``
870-
871-
.. option:: --place_cost_exp <float>
872-
873-
Wiring cost is divided by the average channel width over a net's bounding box
874-
taken to this exponent. Only impacts devices with different channel widths in
875-
different directions or regions.
876-
877-
**Default:** ``1``
878-
879877
.. option:: --RL_agent_placement {on | off}
880878

881-
Uses a Reinforcement Learning (RL) agent in choosing the appropiate move type in placement.
879+
Uses a Reinforcement Learning (RL) agent in choosing the appropriate move type in placement.
882880
It activates the RL agent placement instead of using a fixed probability for each move type.
883881

884882
**Default:** ``on``
@@ -907,7 +905,7 @@ Setting any of the following 5 options selects :ref:`Dusty's annealing schedule
907905

908906
Controls how quickly the agent's memory decays. Values between [0., 1.] specify
909907
the fraction of weight in the exponentially weighted reward average applied to moves
910-
which occured greater than moves_per_temp moves ago. Values < 0 cause the
908+
which occurred greater than moves_per_temp moves ago. Values < 0 cause the
911909
unweighted reward sample average to be used (all samples are weighted equally)
912910

913911
**Default:** ``0.05``
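One way to read the decay description above is sketched below in Python. This is an interpretation, not VPR's confirmed implementation: the class name `RewardAverage` and the parameter name `gamma` are hypothetical. It assumes the update rule `value += alpha * (reward - value)`, under which everything older than `N` moves carries a total weight of `(1 - alpha)**N`, so choosing `alpha = 1 - gamma**(1/N)` with `N = moves_per_temp` gives older moves the requested fraction `gamma`. A negative value falls back to the plain sample average.

```python
class RewardAverage:
    """Running reward estimate with optional exponential decay (sketch)."""

    def __init__(self, gamma: float, moves_per_temp: int):
        self.count = 0
        self.value = 0.0
        if gamma < 0:
            # Negative decay value: unweighted sample average.
            self.alpha = None
        else:
            # Weight left on samples older than moves_per_temp moves is
            # (1 - alpha) ** moves_per_temp; solve that for gamma.
            self.alpha = 1.0 - gamma ** (1.0 / moves_per_temp)

    def update(self, reward: float) -> float:
        self.count += 1
        # Sample average uses a shrinking step 1/n; the exponentially
        # weighted average uses the fixed step alpha.
        step = self.alpha if self.alpha is not None else 1.0 / self.count
        self.value += step * (reward - self.value)
        return self.value


decayed = RewardAverage(gamma=0.05, moves_per_temp=100)
plain = RewardAverage(gamma=-1.0, moves_per_temp=100)
for r in (1.0, 2.0, 3.0):
    decayed.update(r)
    plain.update(r)
print(decayed.alpha, decayed.value, plain.value)
```

With the default of 0.05, only 5% of the weight in the average remains on moves made more than `moves_per_temp` moves ago, so the estimate tracks recent rewards closely.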
@@ -926,6 +924,8 @@ Setting any of the following 5 options selects :ref:`Dusty's annealing schedule
926924

927925
**Default:** ``move_block_type``
928926

927+
928+
929929
.. option:: --placer_debug_block <int>
930930

931931
.. note:: This option is likely only of interest to developers debugging the placement algorithm
@@ -1023,7 +1023,7 @@ The following options are only valid when the placement engine is in timing-driv
10231023

10241024
.. option:: --place_delay_model_reducer {min, max, median, arithmean, geomean}
10251025

1026-
When calculating delta delays for the placment delay model how are multiple values combined?
1026+
When calculating delta delays for the placement delay model how are multiple values combined?
10271027

10281028
**Default:** ``min``
10291029

@@ -1056,15 +1056,15 @@ The following options are only valid when the placement engine is in timing-driv
10561056

10571057
.. option:: --place_tsu_abs_margin <float>
10581058

1059-
Specifies an absolute offest added to cell setup times used by the placer.
1059+
Specifies an absolute offset added to cell setup times used by the placer.
10601060
This effectively controls whether the placer should try to achieve extra margin on setup paths.
10611061
For example a value of 500e-12 corresponds to requesting an extra 500ps of setup margin.
10621062

10631063
**Default:** ``0.0``
10641064

10651065
.. option:: --post_place_timing_report <file>
10661066

1067-
Name of the post-placement timing report file to generate (not generated if unspecfied).
1067+
Name of the post-placement timing report file to generate (not generated if unspecified).
10681068

10691069

10701070
.. _noc_placement_options:

libs/EXTERNAL/libezgl/include/ezgl/point.hpp

+1-1
@@ -36,7 +36,7 @@ class point2d {
3636
/**
3737
* Create a point at the given x and y position.
3838
*/
39-
point2d(double x_coord, double y_coord) : x(x_coord), y(y_coord)
39+
point2d(double x_coord, double y_coord) noexcept : x(x_coord), y(y_coord)
4040
{
4141
}
4242

libs/EXTERNAL/libezgl/include/ezgl/rectangle.hpp

+1-1
@@ -33,7 +33,7 @@ class rectangle {
3333
/**
3434
* Default constructor: Create a zero-sized rectangle at {0,0}.
3535
*/
36-
rectangle() : m_first({0, 0}), m_second({0, 0})
36+
rectangle() noexcept : m_first({0, 0}), m_second({0, 0})
3737
{
3838
}
3939
