Accurate Xilinx 7 series architecture #2301

Merged
merged 15 commits into from
Oct 21, 2024

Conversation

WhiteNinjaZ
Contributor

Description

This PR provides the most up-to-date arch file for the Xilinx 7 series chipset. This PR is a WIP and is expected to change often.

@vaughnbetz
Contributor

@WhiteNinjaZ @jgoeders : I think this is the most recent 7-series capture. If it is, I think it would be good to merge into the master so we have an architecture (which can still be refined) on the master branch that is the current best capture. It would also be good to add a CI test or two targeting it to make sure it doesn't get broken.

@WhiteNinjaZ WhiteNinjaZ marked this pull request as ready for review June 11, 2024 00:54
@WhiteNinjaZ
Contributor Author

Yes, this is the most up-to-date arch!

@vaughnbetz
Contributor

vaughnbetz commented Jun 24, 2024

@WhiteNinjaZ, @jgoeders : @MohamedElgammal and I are going through the VTR 9 paper and trying to make sure Mohamed's 7-series figure is accurate, and that the capture is fairly accurate.

  1. Are all 'stubs' vertical? I see only stub_y in the architecture (length 1 V wires). I see len2x_stub and len4x_stub wires connected to them, but I never see any of the len6y_stub etc. wires with x-directed stubs.
  2. It looks like CLBs have their pins on all 4 sides. Is this the best capture, or would it be better to have pins on only two sides (say top and right)? The INT blocks might be better modeled with a 2-sided architecture, as each CLB would connect to a single vertical and single horizontal channel, which is closer to a separate INT block I think. However, I think Xilinx has direct connect to all 4 sides (actually can reach 8 CLBs with no wiring), which might be better modeled with pins on all 4 sides. Are the diagonal length 1 + 1 wires capturing some of these direct connects?

@WhiteNinjaZ
Contributor Author

WhiteNinjaZ commented Jun 24, 2024

@vaughnbetz In answer to your questions

  1. If you look at Figure 7 of the Netcracker paper, you can see that there are two types of stubs: one type that connects to an INT in the same line of travel as the wire (i.e. L2 and L4 vertical) and another type where a stub branches to an INT perpendicular to the wire's direction of travel (i.e. L2 and L4 horizontal traveling west). The first type of stub we modeled using the SB/CB patterns of the wire (i.e. just adding an extra 1 in the connectivity of these patterns, 1 1 1 instead of 1 0 1). The second type of stub, as you can see from the figure, only occurs on a branch going from a horizontal wire in the vertical direction, which is why that stub type exists only in the Y direction. I should note that we do not implement stubs for the L-shaped wires simply because it causes many issues with the way we implement them. It might be possible to include branching stubs for L-shaped wires, but I think this would be quite a bit of extra work, and I thought it would be more beneficial to focus on other aspects that seem to have a greater influence on channel width.
    image
  2. I have experimented quite extensively with both types of capture and decided on pins on all 4 sides of the block for the following reasons: (1) Due to the somewhat unique layout of INT and CLB tiles in Xilinx devices, around half of the CLBs can connect to wires on the right/top/bottom and the other half can connect to wires on the left/top/bottom. As far as I understand, this type of separation is not possible with current VTR. (2) In a model where the pins of the CLB are on only 2 or 3 sides of the tile, the only time we could get reasonable channel widths was when we also added a more Xilinx-accurate representation of the IO placement (i.e. IOs only on the left/right of the chip, not the top/bottom). However, when we limited the IO to only two sides of the chip, the auto-scale feature in VTR had to make the chip very large for some designs, since the number of IOs was essentially cut in half and became the limiting factor in placement. This caused excessive runtimes, since the device was a lot larger than it needed to be, and it also made our device utilization very low, since the chip was scaled to include a lot of extra CLBs in order to get sufficient IO. We thought this would give a less accurate comparison to the Wmin of other VTR architectures, since our packing was significantly less dense (i.e. more routing headroom). (3) When experimenting with pins on only two/three sides of the CLB, even with IOs locked onto only the left/right of the architecture, we still had routing overhead caused by the CLB's pin sides being mismatched with the side the IOs were coming from (i.e. if we locked the pins to the left/top/bottom, then the right side of the arch sometimes had to do extra routing in order to get to the opposite side of the CLB). This issue was caused by reason 1 above (i.e. a really accurate capture would have half the CLB inputs locked to the left/top/bottom and the other half locked to the right/top/bottom).
    Given the limitations above, we thought it would be best to do what has been done in other architectures and have the IO surround the chip; to avoid significant overhead, that also meant we needed to have the pins on all 4 sides of the CLBs.
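
To make reason (2) above concrete, here is a rough back-of-the-envelope sketch of how an IO-limited design forces the device to grow when IO is restricted to two sides. Everything in it is an assumption for illustration: a square device with a one-tile IO ring, a hypothetical 8 pads per IO tile, a core made up entirely of CLBs, and made-up design sizes. It is not how VTR's auto-sizing is implemented.

```python
def min_grid_side(num_ios, num_clbs, io_sides, ios_per_tile=8):
    """Smallest square grid side W (in tiles, IO ring included) that fits the design.

    Illustrative assumptions (not taken from the architecture file):
    - the device is a W x W square with IO tiles on its perimeter,
    - only `io_sides` of the four edges carry IO tiles,
    - each IO tile holds `ios_per_tile` pads (hypothetical value),
    - the (W-2) x (W-2) core consists only of CLBs.
    """
    w = 3
    while True:
        core = w - 2
        io_capacity = io_sides * core * ios_per_tile
        clb_capacity = core * core
        if io_capacity >= num_ios and clb_capacity >= num_clbs:
            return w
        w += 1


def clb_utilization(num_clbs, w):
    """Fraction of core CLB tiles actually used on a W x W device."""
    return num_clbs / ((w - 2) ** 2)


# Hypothetical IO-heavy design: 800 IOs, 400 CLBs.
num_ios, num_clbs = 800, 400
w4 = min_grid_side(num_ios, num_clbs, io_sides=4)  # IO on all four edges
w2 = min_grid_side(num_ios, num_clbs, io_sides=2)  # IO on left/right only

print(w4, round(clb_utilization(num_clbs, w4), 2))  # 27 0.64
print(w2, round(clb_utilization(num_clbs, w2), 2))  # 52 0.16
```

Under these assumptions, halving the IO-carrying edges doubles the required core side, so the core area grows by about 4x and CLB utilization drops by the same factor. That matches the low utilization and long runtimes described above.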

@vaughnbetz
Contributor

Thanks for the fast and detailed answers @WhiteNinjaZ . That all makes sense.
@MohamedElgammal : I guess that means we should either not show the extra stubs in the figure, or show them vertically.
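
For readers following the "1 1 1 instead of 1 0 1" remark, the small sketch below shows one way to read such a pattern. It assumes the usual VTR convention that a length-L segment has L+1 switch-block entries along its span, with 1 meaning a switch block is populated at that point; the function name `switch_points` is hypothetical and the patterns are the two quoted in the thread, not values read from the merged architecture file.

```python
def switch_points(pattern):
    """Return the offsets along a wire at which a switch block is populated.

    `pattern` is a whitespace-separated string of 0/1 entries, one per
    switch-block position along the wire (assumed: L+1 entries for length L).
    """
    bits = [int(tok) for tok in pattern.split()]
    return [i for i, bit in enumerate(bits) if bit == 1]


# Length-2 wire without an in-line stub: connects only at its two ends.
no_stub = switch_points("1 0 1")

# Same wire with the extra 1: it can also connect at its midpoint.
with_stub = switch_points("1 1 1")

# The positions gained by adding the stub.
stub_positions = sorted(set(with_stub) - set(no_stub))

print(no_stub, with_stub, stub_positions)  # [0, 2] [0, 1, 2] [1]
```

This only covers the first (in-line) stub type. The perpendicular stubs are modeled as separate length-1 vertical wires, as described above.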

@vaughnbetz
Contributor

@WhiteNinjaZ : can this PR be merged? It has a couple of CI failures -- are they spurious?

I think we need to get this merged ASAP.

@WhiteNinjaZ
Contributor Author

@vaughnbetz: I am putting the finishing touches on the last version of this architecture and should be done in the next few days. I believe the CI failures are spurious since this PR does not make any changes to the VTR code base and only adds a new architecture.

@vaughnbetz
Contributor

vaughnbetz commented Sep 30, 2024 via email

@WhiteNinjaZ
Contributor Author

@vaughnbetz I just pushed the most up-to-date architecture. Note that we are still running a few tests to try to figure out the unexpected increase in wirelength for the designs and, if needed, will make a few final changes to this architecture.

Contributor

@vaughnbetz vaughnbetz left a comment

Some suggested minor fixes to the comment at the top.
Also, the filename has a typo I think: cary should be carry

@vaughnbetz
Contributor

If you make these changes and remove the WIP we can merge this.
Is there a test for the architecture? If not, please make one or file an issue to make one. It can just test the 6 VTR circuits at the fixed channel width, with very loose QoR bounds.
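
As a rough illustration of what "very loose QoR bounds" could mean in practice, the sketch below checks measured metrics against golden values with a wide relative tolerance. It does not reproduce VTR's own regression-test mechanism; the function names, the 50% tolerance, the circuit name, and all numbers are hypothetical.

```python
def within_bounds(measured, golden, rel_tol):
    """True if `measured` lies within golden * (1 +/- rel_tol)."""
    lo = golden * (1.0 - rel_tol)
    hi = golden * (1.0 + rel_tol)
    return lo <= measured <= hi


def check_qor(results, golden, rel_tol=0.5):
    """Return a list of (circuit, metric, measured, golden) tuples that fail."""
    failures = []
    for circuit, metrics in golden.items():
        for metric, gold_value in metrics.items():
            measured = results[circuit][metric]
            if not within_bounds(measured, gold_value, rel_tol):
                failures.append((circuit, metric, measured, gold_value))
    return failures


# Hypothetical golden and measured values for one circuit.
golden = {"circuit_a": {"wirelength": 1000, "crit_path_ns": 5.0}}
results = {"circuit_a": {"wirelength": 1300, "crit_path_ns": 5.2}}

loose = check_qor(results, golden, rel_tol=0.5)  # passes: 30% drift is tolerated
tight = check_qor(results, golden, rel_tol=0.1)  # fails on wirelength only

print(loose)  # []
print(tight)  # [('circuit_a', 'wirelength', 1300, 1000)]
```

Loose bounds like this catch an architecture that stops working outright without flagging ordinary QoR drift from unrelated changes.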

@WhiteNinjaZ WhiteNinjaZ changed the title WIP: Accurate Xilinx 7 series architecture Accurate Xilinx 7 series architecture Oct 15, 2024
@WhiteNinjaZ
Contributor Author

@vaughnbetz This should be good to merge as soon as I add some tests. I will be able to do that tomorrow evening. A few notes: since the last major push of this architecture (in June), minW and wirelength have gone up by about 15% on average (the values collected for this version of the architecture have been updated in the VTR paper). The main culprits of this increase were the more tightly spaced BRAM and DSP tiles in the earlier architecture and the fact that some of our FC override specifications for the CLB tile were never actually being implemented in the generated RR graph for the earlier arch version.

@vaughnbetz
Contributor

Thanks @WhiteNinjaZ (and @jgoeders). We should merge this very soon .. the VTR paper is pretty close to done. We can talk about it today, but the 7-series arch is now one of a very small number of outstanding items, so we should merge this and close out that section ASAP.

@jgoeders
Contributor

@vaughnbetz Sounds good. I'm out of town today and can't make the meeting, but it sounds like Joshua is done with the paper and just working on getting this PR through. I'll give our section another read through tomorrow to double check everything.

@vaughnbetz
Contributor

Looks good. I also suggest adding a single small circuit (in another test suite) to vtr_reg_strong so it runs on the github runners, and is tested more frequently (even if the google CI is down). I'd make that one a fixed channel width run, again to keep the runtime low (and please post what the runtime is for that design). Ideally this would be a design that runs in seconds or tens of seconds.

@vaughnbetz
Contributor

One failure, but it is a known problem with odd-even routing unit tests (@soheilshahrouz : please disable that one). Merging this and filing an issue for the strong (small circuit) test.

@vaughnbetz vaughnbetz merged commit b5d01dc into verilog-to-routing:master Oct 21, 2024
36 of 53 checks passed
3 participants