Description
I am using an iCE40 HX8K device, Yosys 0.9+932 (git sha1 65f197e2, clang 6.0.0-1ubuntu2 -fPIC -Os) and nextpnr-ice40 git sha1 6a33541.
When I synthesise the following code
module top (input clk, output reg [7:0] z, input [7:0] x, input [7:0] y);
  reg [7:0] xr, yr;
  always @(posedge clk) begin
    xr <= x;
    yr <= y;
    // xr <= {1'b0, x[6:0]};
    // yr <= {1'b0, y[6:0]};
    z <= xr + yr;
  end
endmodule
using Yosys' default iCE40 synthesis script, and then place and route it with nextpnr (again with default settings), nextpnr estimates a maximum clock frequency of 365 MHz. If I replace the assignments to xr and yr with the commented-out ones, the estimate drops to 263 MHz. That is about 28% lower in frequency, or a critical path roughly 40% longer (3.80 ns instead of 2.74 ns). In that variant the top bits of xr and yr are constant zero, so z[7] is simply the carry out of the 7-bit addition. The slowdown appears to occur because nextpnr does not manage to pack the LUT required to extract this carry out signal into the same logic cell as the flip-flop that registers it, which adds considerable routing delay to the critical path.
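To make the equivalence concrete, here is a small hypothetical Verilog check. The module and port names are made up for illustration and are not part of the original design. It shows that with the top operand bits forced to zero, bit 7 of the 8-bit sum is exactly the carry out of the 7-bit addition of the low bits. This is why, in the second variant, z[7] has to come from the end of the carry chain rather than from an ordinary sum LUT.

```verilog
// Hypothetical illustration (not part of the original design): with the
// top bits of both operands forced to zero, as in the commented-out
// variant, bit 7 of the 8-bit sum is the carry out of the 7-bit addition.
module carry_out_equiv (
    input  [6:0] a,     // low bits, i.e. x[6:0]
    input  [6:0] b,     // low bits, i.e. y[6:0]
    output [7:0] z,     // zero-extended 8-bit sum, as computed by the design
    output       cout,  // carry out of the 7-bit adder
    output [6:0] s      // 7-bit sum
);
    // Same arithmetic as z <= xr + yr with xr <= {1'b0, x[6:0]} etc.
    assign z = {1'b0, a} + {1'b0, b};
    // The 8-bit concatenation on the left widens a + b to 8 bits, so the
    // bit that would overflow a 7-bit result lands in cout.
    assign {cout, s} = a + b;
    // For every input pair: cout == z[7] and s == z[6:0].
endmodule
```

In the original 8-bit design, z[7] instead depends on x[7] and y[7] as well as the carry into bit 7, so it is an ordinary sum bit computed in a LUT.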
In practice it seems to be difficult to obtain the carry out of any adder without incurring a delay penalty like this. I imagine that such paths are often critical, for example in processor cores and in some designs that use comparators, so many designs could see a noticeable performance improvement if this could be addressed.
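As a hypothetical example of the comparator case, an unsigned "less than" result is essentially the borrow out of a subtraction. The module and port names are illustrative only. Whether a given synthesis flow actually maps a comparison onto the carry chain this way is an assumption, not something measured here. The result is registered so that it has the same carry-out-to-flip-flop path described above.

```verilog
// Hypothetical example (names are illustrative): unsigned a < b taken from
// the borrow out of a 9-bit subtraction, then registered, mirroring the
// carry-out-to-flip-flop path in the design above.
module lt_via_borrow (
    input            clk,
    input      [7:0] a,
    input      [7:0] b,
    output reg       lt     // 1 when a < b, valid one clock after a and b
);
    // Zero-extend both operands so the subtraction has a spare top bit.
    wire [8:0] diff = {1'b0, a} - {1'b0, b};
    always @(posedge clk)
        // diff[8] is the borrow out: it is set exactly when a - b wraps
        // below zero, i.e. when a < b.
        lt <= diff[8];
endmodule
```

If a >= b, the 9-bit difference is a - b, which is at most 255, so bit 8 is clear. If a < b, the result wraps to 512 + (a - b), which lies between 257 and 511, so bit 8 is set.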