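In FPGA design, "speed" has three distinct meanings: throughput, latency, and timing. Throughput is the amount of data processed per clock cycle. Latency is the number of clock cycles between data entering the design and the processed result leaving it. Timing refers to the delay between sequential elements. When we say a design fails timing, we mean that the delay of its critical path, i.e. the delay between two flip-flops, is longer than the clock period.
Pipelining can raise a design's throughput, at the cost of extra area. Consider, for example, the following iterative implementation of a cube (third-power) calculation: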
module power3(
  output reg [7:0] XPower,    // X^3, truncated to 8 bits
  output           finished,
  input      [7:0] X,         // must stay stable until finished goes high
  input            clk,
  input            start      // the duration of start is a single clock
);
  reg [7:0] ncount;           // multiplications still to perform
  assign finished = (ncount == 0);

  always @(posedge clk)
    if (start) begin
      XPower <= X;            // load X^1
      ncount <= 2;
    end
    else if (!finished) begin
      ncount <= ncount - 1;
      XPower <= XPower * X;   // X^2, then X^3, reusing one multiplier
    end
endmodule
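[Figure: synthesis result of the iterative implementation]
This design has a throughput of 8/3, or about 2.7 bits per clock; a latency of 3 clocks; and a timing (critical-path delay) of one multiplier.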
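The throughput figure follows from the design producing one 8-bit result every 3 clocks, because the single multiplier is reused for each iteration:

$$\text{Throughput} = \frac{\text{bits per result}}{\text{clocks per result}} = \frac{8\ \text{bits}}{3\ \text{clocks}} \approx 2.67\ \text{bits/clock}$$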
Unrolling the iterative loop gives the following pipelined implementation:
module power3(
  output reg [7:0] XPower,
  input            clk,
  input      [7:0] X
);
  reg [7:0] XPower1, XPower2;
  reg [7:0] X1, X2;

  always @(posedge clk) begin
    // Pipeline stage 1
    X1      <= X;
    XPower1 <= X;
    // Pipeline stage 2
    X2      <= X1;
    XPower2 <= XPower1 * X1;
    // Pipeline stage 3
    XPower  <= XPower2 * X2;
  end
endmodule
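[Figure: synthesis result of the pipelined implementation]
This design has a throughput of 8 bits per clock and a latency of 3 clocks, and its critical path is still one multiplier.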
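A low-latency design propagates its input to its output as quickly as possible by minimizing intermediate delays: it exploits parallelism, removes pipeline registers, and shortens the logic. For example, removing the pipeline registers from the pipelined cube implementation above minimizes the input-to-output delay: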
module power3(
  output [7:0] XPower,
  input  [7:0] X
);
  reg [7:0] XPower1, XPower2;
  reg [7:0] X1, X2;

  assign XPower = XPower2 * X2;   // second multiplier

  always @* begin                 // former stage 1, now combinational
    X1      = X;
    XPower1 = X;
  end

  always @* begin                 // former stage 2, now combinational
    X2      = X1;
    XPower2 = XPower1 * X1;       // first multiplier
  end
endmodule
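[Figure: synthesis result of the combinational implementation]
The throughput is 8 bits per clock. The latency is 0 clocks: the result appears after a purely combinational delay of between one and two multiplier delays. The critical path is two multipliers.
Removing the pipeline therefore reduces latency but increases the combinational delay.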
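The critical-path delay determines the maximum system clock frequency. That delay is made up of the clock-to-output delay (from the clock edge until the register's output is valid), the combinational logic delay, the routing delay, and the setup time.
Method 1: add registers. When the extra latency does not harm the design, inserting registers into the critical path shortens the critical-path delay and allows a faster clock. Consider the following FIR filter example: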
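Summing those four components gives the minimum clock period and hence the maximum clock frequency. This is a simplified sketch that ignores clock skew and jitter; the numbers below are hypothetical, chosen only to show the arithmetic:

$$T_{clk} \ge t_{co} + t_{logic} + t_{route} + t_{su}, \qquad f_{max} = \frac{1}{t_{co} + t_{logic} + t_{route} + t_{su}}$$

$$t_{co} = 0.5\ \text{ns},\ t_{logic} = 3.0\ \text{ns},\ t_{route} = 1.0\ \text{ns},\ t_{su} = 0.5\ \text{ns} \;\Rightarrow\; T_{clk} \ge 5.0\ \text{ns},\ f_{max} = 200\ \text{MHz}$$

Because only $t_{logic}$ and $t_{route}$ depend on what lies between the two registers, splitting that logic with an extra register is what raises $f_{max}$.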
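module fir(
  output reg [7:0] Y,           // filter output, truncated to 8 bits
  input      [7:0] A, B, C, X,  // A, B, C: coefficients; X: input sample
  input            clk,
  input            validsample  // high when X carries a new sample
);
  reg [7:0] X1, X2;             // delay line: the two previous samples

  always @(posedge clk)
    if (validsample) begin
      X1 <= X;
      X2 <= X1;
      Y  <= A * X + B * X1 + C * X2;  // multipliers feed the adders directly
    end
endmodule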
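[Figure: synthesis result of the FIR filter]
Here the critical path runs through a multiplier followed by the adder that sums the three products. Inserting a register between the multipliers and the adder shortens it to roughly one multiplier delay, at the cost of one extra clock of latency: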
module fir(
  output reg [7:0] Y,
  input      [7:0] A, B, C, X,
  input            clk,
  input            validsample
);
  reg [7:0] X1, X2;
  reg [7:0] prod1, prod2, prod3;  // registers between multipliers and adder

  always @(posedge clk) begin
    if (validsample) begin
      X1    <= X;
      X2    <= X1;
      prod1 <= A * X;             // stage 1: multiply only
      prod2 <= B * X1;
      prod3 <= C * X2;
    end
    Y <= prod1 + prod2 + prod3;   // stage 2: add only, one clock after the products
  end
endmodule
[Figure: synthesis result of the pipelined FIR filter]
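The effect of the added register on the combinational part of the path can be written out directly. Here $t_{mult}$ is one multiplier delay and $t_{sum}$ is the delay of adding the three products; the register overhead ($t_{co}$, $t_{su}$) is unchanged by the transformation:

$$t_{comb,\,before} \approx t_{mult} + t_{sum}, \qquad t_{comb,\,after} \approx \max\left(t_{mult},\ t_{sum}\right)$$

When the multiplier is the slower of the two, as the text assumes, the critical path reduces to a single multiplier.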
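Other ways to shorten the critical path include restructuring the design into parallel sub-units, and removing priority encoding from designs that contain it.
Summary: speed and area are conflicting, mutually constraining goals. Raising speed generally costs area, and reducing area generally lowers speed, so a design must weigh these factors against each other to reach a good balance.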
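Removing priority encoding can be sketched as follows. The module names `regwrite_priority` and `regwrite_parallel` are hypothetical. In the if/else-if chain, the write enable for each bit depends on every earlier `ctrl` bit, which lengthens the logic in front of the higher bits. Using independent `if` statements gives each bit an enable that depends only on its own control bit. The two are equivalent only under the assumption that at most one `ctrl` bit is high in any cycle; the surrounding design must guarantee that.

```verilog
// Priority-encoded version: ctrl[3] is honored only if ctrl[2:0] are all 0,
// so the enable logic for rout[3] depends on all four ctrl bits.
module regwrite_priority(
  output reg [3:0] rout,
  input            clk,
  input            in,
  input      [3:0] ctrl
);
  always @(posedge clk)
    if      (ctrl[0]) rout[0] <= in;
    else if (ctrl[1]) rout[1] <= in;
    else if (ctrl[2]) rout[2] <= in;
    else if (ctrl[3]) rout[3] <= in;
endmodule

// Priority removed: each bit's enable depends only on its own ctrl bit.
// Assumption: ctrl is one-hot or zero; otherwise behavior differs.
module regwrite_parallel(
  output reg [3:0] rout,
  input            clk,
  input            in,
  input      [3:0] ctrl
);
  always @(posedge clk) begin
    if (ctrl[0]) rout[0] <= in;
    if (ctrl[1]) rout[1] <= in;
    if (ctrl[2]) rout[2] <= in;
    if (ctrl[3]) rout[3] <= in;
  end
endmodule
```

If two `ctrl` bits can be high together, the versions diverge (the parallel one writes both bits), so this transformation is only valid when the mutual exclusion is known to hold.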