So far we have covered the basics of Chisel. In this section we use what we have learned to build an FIR (Finite Impulse Response) filter module.
FIR filters are very common in digital signal processing and will come up often later on, so they are worth mastering.
Here is the definition of an FIR filter from Baidu Baike:
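FIR (Finite Impulse Response) filter: a filter whose unit impulse response has finite length, also called a non-recursive filter. It is one of the most basic elements of a digital signal processing system. It can have a strictly linear phase response while achieving an arbitrary magnitude response, and because its unit sample response is finite, the filter is a stable system. For these reasons FIR filters are widely used in communications, image processing, pattern recognition, and other fields.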
The FIR filter designed in this section needs to perform the following operation:
In essence, the module multiplies the filter coefficients element-wise with the input samples and then sums the products; this operation is also called convolution.
Mathematically, in terms of signals, this is:
$$y_n = b_0 x_n + b_1 x_{n-1} + b_2 x_{n-2} + \cdots$$
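where $y_n$ is the output at time $n$, $x_n$ is the input at time $n$, $b_0, b_1, b_2, \ldots$ are the filter coefficients, and $x_{n-1}, x_{n-2}, \ldots$ are the input delayed by 1, 2, ... cycles.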
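To make the formula concrete, here is a minimal software reference model in plain Scala (no Chisel required). FirReference and filter are hypothetical names chosen for illustration; the sketch assumes that samples before time 0 are zero, which matches delay registers that reset to 0.

```scala
// Plain-Scala reference model of an FIR filter (software only, not hardware).
object FirReference {
  // y(n) = sum over k of coeffs(k) * x(n - k), with x(m) = 0 for m < 0
  def filter(coeffs: Seq[Int], xs: Seq[Int]): Seq[Int] = {
    val x = xs.toIndexedSeq
    x.indices.map { n =>
      coeffs.zipWithIndex.map { case (b, k) =>
        if (n - k >= 0) b * x(n - k) else 0 // samples before time 0 count as 0
      }.sum
    }
  }
}
```

With coefficients 1, 2, 3, 4 and inputs 1, 4, 3, 2, 7, 0 this model gives 1, 6, 14, 24, 36, 32, the same values expected by the nonsymmetric test case later in this section. It ignores the 8-bit output width, so it only matches the hardware while results stay below 256.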
Now let's create a 4-tap FIR filter module whose coefficients are parameters of the module.
Both the input and the output are 8-bit unsigned integers.
Hint: the delay line can be implemented with ShiftRegister, or built from RegNext. First, determine the inputs and outputs:
Input: one 8-bit unsigned integer input signal;
Output: one 8-bit unsigned integer output signal.
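Next, determine which state must be stored: the three previous inputs $x_{n-1}$, $x_{n-2}$ and $x_{n-3}$. Each of them can be held in a register created with RegNext(next_val, init_value). The implementation is: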
// MyModule.scala
import chisel3._
import chisel3.util._
class MyModule(b0: Int, b1: Int, b2: Int, b3: Int) extends Module {
  val io = IO(new Bundle {
    val in = Input(UInt(8.W))
    val out = Output(UInt(8.W))
  })
  val x_n1 = RegNext(io.in, 0.U) // x[n-1], reset to 0
  val x_n2 = RegNext(x_n1, 0.U)  // x[n-2], reset to 0
  val x_n3 = RegNext(x_n2, 0.U)  // x[n-3], reset to 0
  io.out := io.in * b0.U + x_n1 * b1.U + x_n2 * b2.U + x_n3 * b3.U
}
object MyModule extends App {
  println(getVerilogString(new MyModule(0, 0, 0, 0)))
}
// MyModuleTest.scala
import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec
class MyModuleTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior of "MyModule"
  it should "get right results" in {
    // Simple sanity check: a filter with all-zero coefficients should always produce zero
    test(new MyModule(0, 0, 0, 0)) { c =>
      c.io.in.poke(0.U)
      c.io.out.expect(0.U)
      c.clock.step(1)
      c.io.in.poke(4.U)
      c.io.out.expect(0.U)
      c.clock.step(1)
      c.io.in.poke(5.U)
      c.io.out.expect(0.U)
      c.clock.step(1)
      c.io.in.poke(2.U)
      c.io.out.expect(0.U)
    }
    // Simple 4-point moving average
    test(new MyModule(1, 1, 1, 1)) { c =>
      c.io.in.poke(1.U)
      c.io.out.expect(1.U) // 1, 0, 0, 0
      c.clock.step(1)
      c.io.in.poke(4.U)
      c.io.out.expect(5.U) // 4, 1, 0, 0
      c.clock.step(1)
      c.io.in.poke(3.U)
      c.io.out.expect(8.U) // 3, 4, 1, 0
      c.clock.step(1)
      c.io.in.poke(2.U)
      c.io.out.expect(10.U) // 2, 3, 4, 1
      c.clock.step(1)
      c.io.in.poke(7.U)
      c.io.out.expect(16.U) // 7, 2, 3, 4
      c.clock.step(1)
      c.io.in.poke(0.U)
      c.io.out.expect(12.U) // 0, 7, 2, 3
    }
    // Nonsymmetric filter
    test(new MyModule(1, 2, 3, 4)) { c =>
      c.io.in.poke(1.U)
      c.io.out.expect(1.U) // 1*1, 0*2, 0*3, 0*4
      c.clock.step(1)
      c.io.in.poke(4.U)
      c.io.out.expect(6.U) // 4*1, 1*2, 0*3, 0*4
      c.clock.step(1)
      c.io.in.poke(3.U)
      c.io.out.expect(14.U) // 3*1, 4*2, 1*3, 0*4
      c.clock.step(1)
      c.io.in.poke(2.U)
      c.io.out.expect(24.U) // 2*1, 3*2, 4*3, 1*4
      c.clock.step(1)
      c.io.in.poke(7.U)
      c.io.out.expect(36.U) // 7*1, 2*2, 3*3, 4*4
      c.clock.step(1)
      c.io.in.poke(0.U)
      c.io.out.expect(32.U) // 0*1, 7*2, 2*3, 3*4
    }
    println("SUCCESS!!")
  }
}
The generated Verilog is:
module MyModule(
  input        clock,
  input        reset,
  input  [7:0] io_in,
  output [7:0] io_out
);
`ifdef RANDOMIZE_REG_INIT
  reg [31:0] _RAND_0;
  reg [31:0] _RAND_1;
  reg [31:0] _RAND_2;
`endif // RANDOMIZE_REG_INIT
  reg [7:0] x_n1; // @[MyModule.scala 12:21]
  reg [7:0] x_n2; // @[MyModule.scala 13:21]
  reg [7:0] x_n3; // @[MyModule.scala 14:21]
  wire [8:0] _io_out_T = io_in * 1'h0; // @[MyModule.scala 15:19]
  wire [8:0] _io_out_T_1 = x_n1 * 1'h0; // @[MyModule.scala 15:33]
  wire [8:0] _io_out_T_3 = _io_out_T + _io_out_T_1; // @[MyModule.scala 15:26]
  wire [8:0] _io_out_T_4 = x_n2 * 1'h0; // @[MyModule.scala 15:47]
  wire [8:0] _io_out_T_6 = _io_out_T_3 + _io_out_T_4; // @[MyModule.scala 15:40]
  wire [8:0] _io_out_T_7 = x_n3 * 1'h0; // @[MyModule.scala 15:61]
  wire [8:0] _io_out_T_9 = _io_out_T_6 + _io_out_T_7; // @[MyModule.scala 15:54]
  assign io_out = _io_out_T_9[7:0]; // @[MyModule.scala 15:10]
  always @(posedge clock) begin
    if (reset) begin // @[MyModule.scala 12:21]
      x_n1 <= 8'h0; // @[MyModule.scala 12:21]
    end else begin
      x_n1 <= io_in; // @[MyModule.scala 12:21]
    end
    if (reset) begin // @[MyModule.scala 13:21]
      x_n2 <= 8'h0; // @[MyModule.scala 13:21]
    end else begin
      x_n2 <= x_n1; // @[MyModule.scala 13:21]
    end
    if (reset) begin // @[MyModule.scala 14:21]
      x_n3 <= 8'h0; // @[MyModule.scala 14:21]
    end else begin
      x_n3 <= x_n2; // @[MyModule.scala 14:21]
    end
  end
// Register and memory initialization
  ... // omitted
endmodule
The test passes.
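The generated Verilog shows that io_out keeps only the low 8 bits of the sum (_io_out_T_9[7:0]), so results above 255 wrap around. The sketch below is a plain-Scala cycle model of the 4-tap module that mirrors the RegNext chain and that truncation; FourTapModel and run are hypothetical names, and inputs are assumed to be valid 8-bit values (0 to 255). A model like this could serve as a golden reference for computing expected values instead of writing them by hand.

```scala
// Cycle-level software model of the 4-tap MyModule(b0, b1, b2, b3).
object FourTapModel {
  // b = Seq(b0, b1, b2, b3); inputs are assumed to be in 0..255 (8-bit unsigned)
  def run(b: Seq[Int], inputs: Seq[Int]): Seq[Int] = {
    require(b.length == 4, "MyModule has exactly four coefficients")
    var regs = Vector(0, 0, 0) // x_n1, x_n2, x_n3 right after reset
    inputs.toVector.map { in =>
      val window = in +: regs // io.in, x_n1, x_n2, x_n3
      // combinational output; only the low 8 bits reach io.out
      val out = window.zip(b).map { case (x, c) => x * c }.sum & 0xFF
      regs = window.init // clock.step(1): each register takes the previous stage's value
      out
    }
  }
}
```

For example, MyModule(1, 1, 1, 1) fed 200 and then 100 would output 200 and then 44, because 300 mod 256 = 44. The hand-written tests above never exceed 255, so they do not exercise this wrap-around.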
The next part relies on material covered later, but for now we introduce the basic idea of building an FIR filter generator.
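The generator has a length parameter, length, which sets the number of taps of the filter; the coefficient of each tap is supplied through a hardware input of the module.
The generator therefore has three inputs:
in: the filter input;
valid: the input valid bit;
consts: the vector of tap coefficients;
and one output:
out: the filter output.
The implementation is: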
import chisel3._
import chisel3.util._
class MyModule(length: Int) extends Module {
  val io = IO(new Bundle {
    val in = Input(UInt(8.W))
    val valid = Input(Bool())
    val consts = Input(Vec(length, UInt(8.W))) // usage covered later
    val out = Output(UInt(8.W))
  })
  // usage covered later
  val taps = Seq(io.in) ++ Seq.fill(io.consts.length - 1)(RegInit(0.U(8.W)))
  taps.zip(taps.tail).foreach { case (a, b) => when (io.valid) { b := a } }
  io.out := taps.zip(io.consts).map { case (a, b) => a * b }.reduce(_ + _)
}
object MyModule extends App {
  println(getVerilogString(new MyModule(4)))
}
The generated Verilog is:
module MyModule(
  input        clock,
  input        reset,
  input  [7:0] io_in,
  input        io_valid,
  input  [7:0] io_consts_0,
  input  [7:0] io_consts_1,
  input  [7:0] io_consts_2,
  input  [7:0] io_consts_3,
  output [7:0] io_out
);
`ifdef RANDOMIZE_REG_INIT
  reg [31:0] _RAND_0;
  reg [31:0] _RAND_1;
  reg [31:0] _RAND_2;
`endif // RANDOMIZE_REG_INIT
  reg [7:0] taps_1; // @[MyModule.scala 13:66]
  reg [7:0] taps_2; // @[MyModule.scala 13:66]
  reg [7:0] taps_3; // @[MyModule.scala 13:66]
  wire [15:0] _io_out_T = io_in * io_consts_0; // @[MyModule.scala 16:56]
  wire [15:0] _io_out_T_1 = taps_1 * io_consts_1; // @[MyModule.scala 16:56]
  wire [15:0] _io_out_T_2 = taps_2 * io_consts_2; // @[MyModule.scala 16:56]
  wire [15:0] _io_out_T_3 = taps_3 * io_consts_3; // @[MyModule.scala 16:56]
  wire [15:0] _io_out_T_5 = _io_out_T + _io_out_T_1; // @[MyModule.scala 16:71]
  wire [15:0] _io_out_T_7 = _io_out_T_5 + _io_out_T_2; // @[MyModule.scala 16:71]
  wire [15:0] _io_out_T_9 = _io_out_T_7 + _io_out_T_3; // @[MyModule.scala 16:71]
  assign io_out = _io_out_T_9[7:0]; // @[MyModule.scala 16:10]
  always @(posedge clock) begin
    if (reset) begin // @[MyModule.scala 13:66]
      taps_1 <= 8'h0; // @[MyModule.scala 13:66]
    end else if (io_valid) begin // @[MyModule.scala 14:64]
      taps_1 <= io_in; // @[MyModule.scala 14:68]
    end
    if (reset) begin // @[MyModule.scala 13:66]
      taps_2 <= 8'h0; // @[MyModule.scala 13:66]
    end else if (io_valid) begin // @[MyModule.scala 14:64]
      taps_2 <= taps_1; // @[MyModule.scala 14:68]
    end
    if (reset) begin // @[MyModule.scala 13:66]
      taps_3 <= 8'h0; // @[MyModule.scala 13:66]
    end else if (io_valid) begin // @[MyModule.scala 14:64]
      taps_3 <= taps_2; // @[MyModule.scala 14:68]
    end
  end
// Register and memory initialization
  ... // omitted
endmodule
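As you can see, both the Vec used in the IO bundle and the Seq used in the module body are flattened into individual signals in the generated Verilog; their usage is covered in detail later.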
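Using and testing DSP blocks (DspBlock). Integrating DSP components into a larger system is challenging and error-prone. The rocket directory of the ucb-bar/dsptools repository on GitHub (dsptools/rocket at master) contains useful generators that help with this kind of work.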
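One of its core abstractions is the DspBlock. A DspBlock should have:
a streaming input and output (AXI4-Stream);
a memory-mapped status and control interface (AXI4 in this example).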
Note: AXI (Advanced eXtensible Interface) is an on-chip bus protocol from Arm's AMBA family.
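DspBlock uses Rocket Chip's diplomatic interfaces. The article "Diplomacy and TileLink from the Rocket Chip" on the lowRISC site (lowRISC: Collaborative open silicon engineering) summarizes the basics of diplomacy, but you do not need to understand how it works for this example.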
Diplomacy really shines when many different DspBlocks are connected together to build a complex SoC.
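In this example we build only a single peripheral. The StandaloneBlock traits are mixed in so that the diplomatic interfaces work as top-level IOs. The StandaloneBlock traits are only needed when a DspBlock is used on its own, without any diplomatic connections.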
Using DspBlock requires adding the following dependency to build.sbt:
libraryDependencies += "edu.berkeley.cs" %% "rocket-dsptools" % "1.4.3"
The example below wraps the FIR filter in an AXI4 interface:
import chisel3._
import chisel3.util._
import dspblocks._
import freechips.rocketchip.amba.axi4._
import freechips.rocketchip.amba.axi4stream._
import freechips.rocketchip.config._
import freechips.rocketchip.diplomacy._
import freechips.rocketchip.regmapper._
class MyModule(length: Int) extends Module {
  val io = IO(new Bundle {
    val in = Input(UInt(8.W))
    val valid = Input(Bool())
    val consts = Input(Vec(length, UInt(8.W))) // usage covered later
    val out = Output(UInt(8.W))
  })
  // usage covered later
  val taps = Seq(io.in) ++ Seq.fill(io.consts.length - 1)(RegInit(0.U(8.W)))
  taps.zip(taps.tail).foreach { case (a, b) => when (io.valid) { b := a } }
  io.out := taps.zip(io.consts).map { case (a, b) => a * b }.reduce(_ + _)
}
object MyModule extends App {
  println(getVerilogString(new MyModule(4)))
}
//
// Base class for all FIRBlocks.
// This can be extended to make TileLink, AXI4, APB, AHB, etc. flavors of the FIR filter
//
abstract class FIRBlock[D, U, EO, EI, B <: Data](val nFilters: Int, val nTaps: Int)(implicit p: Parameters)
  // HasCSR means that the memory interface will be using the RegMapper API to define status and control registers
  extends DspBlock[D, U, EO, EI, B] with HasCSR {
  // diplomatic node for the streaming interface
  // identity node means the output and input are parameterized to be the same
  val streamNode = AXI4StreamIdentityNode()
  // define what hardware will be elaborated
  lazy val module = new LazyModuleImp(this) {
    // get streaming input and output wires from diplomatic node
    val (in, _) = streamNode.in(0)
    val (out, _) = streamNode.out(0)
    require(in.params.n >= nFilters,
      s"""AXI-4 Stream port must be big enough for all
         |the filters (need $nFilters, only have ${in.params.n})""".stripMargin)
    // make registers to store taps
    val taps = Reg(Vec(nFilters, Vec(nTaps, UInt(8.W))))
    // memory map the taps, plus the first address is a read-only field that says how many filter lanes there are
    val mmap = Seq(
      RegField.r(64, nFilters.U, RegFieldDesc("nFilters", "Number of filter lanes"))
    ) ++ taps.flatMap(_.map(t => RegField(8, t, RegFieldDesc("tap", "Tap"))))
    // generate the hardware for the memory interface
    // in this class, regmap is abstract (unimplemented). mixing in something like AXI4HasCSR or TLHasCSR
    // will define regmap for the particular memory interface
    regmap(mmap.zipWithIndex.map({case (r, i) => i * 8 -> Seq(r)}): _*)
    // make the FIR lanes and connect inputs and taps
    val outs = for (i <- 0 until nFilters) yield {
      val fir = Module(new MyModule(nTaps))
      // lane i occupies the 8 bits from (i+1)*8-1 down to i*8 of the stream data
      fir.io.in := in.bits.data((i+1)*8 - 1, i*8)
      fir.io.valid := in.valid && out.ready
      fir.io.consts := taps(i)
      fir.io.out
    }
    val output = if (outs.length == 1) {
      outs.head
    } else {
      outs.reduce((x: UInt, y: UInt) => Cat(y, x))
    }
    out.bits.data := output
    in.ready := out.ready
    out.valid := in.valid
  }
}
// make AXI4 flavor of FIRBlock
abstract class AXI4FIRBlock(nFilters: Int, nTaps: Int)(implicit p: Parameters) extends FIRBlock[AXI4MasterPortParameters, AXI4SlavePortParameters, AXI4EdgeParameters, AXI4EdgeParameters, AXI4Bundle](nFilters, nTaps) with AXI4DspBlock with AXI4HasCSR {
  override val mem = Some(AXI4RegisterNode(
    AddressSet(0x0, 0xffffL), beatBytes = 8
  ))
}
// running the code below will show what firrtl is generated
// note that LazyModules aren't really chisel modules- you need to call ".module" on them when invoking the chisel driver
// also note that AXI4StandaloneBlock is mixed in- if you forget it, you will get weird diplomacy errors because the memory
// interface expects a master and the streaming interface expects to be connected. AXI4StandaloneBlock will add top level IOs
// println(chisel3.Driver.emit(() => LazyModule(new AXI4FIRBlock(1, 8)(Parameters.empty) with AXI4StandaloneBlock).module))
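FIRBlock packs several 8-bit filter lanes into one AXI4-Stream data word. As a sketch of the bit layout only (plain Scala on BigInt, not Chisel), the hypothetical LanePacking object below takes lane i from bits 8i+7 down to 8i and mirrors the outs.reduce((x, y) => Cat(y, x)) step, which leaves lane 0 in the least-significant byte.

```scala
// Plain-Scala model of how FIRBlock splits and re-packs the stream data word.
object LanePacking {
  // Lane i is taken from bits 8*i+7 down to 8*i of the stream data word.
  def unpack(data: BigInt, nLanes: Int): Seq[Int] =
    (0 until nLanes).map(i => ((data >> (8 * i)) & 0xFF).toInt)

  // Mirrors outs.reduce((x, y) => Cat(y, x)) on (value, width) pairs:
  // Cat(y, x) places y above x, so the accumulated result keeps lane 0 in the low byte.
  def packLikeCat(lanes: Seq[Int]): BigInt =
    lanes
      .map(l => (BigInt(l & 0xFF), 8))
      .reduce { (x, y) => ((y._1 << x._2) | x._1, x._2 + y._2) }
      ._1
}
```

Packing lanes 0x11, 0x22, 0x33 this way yields 0x332211, and unpacking that word with three lanes returns the original order, so a lane's output lands in the same byte position its input came from.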
Testing a DspBlock is a little different, because we now have to deal with memory interfaces and LazyModules. dsptools provides some traits that help with testing DspBlocks.
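An important one is MemMasterModel, which defines generic functions such as memReadWord and memWriteWord for memory operations. This lets you write a generic test and then specify which memory interface it uses; for example, you can write one test and specialize it for both TileLink and AXI4.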
The following example tests FIRBlock this way:
import dsptools.tester.MemMasterModel
import freechips.rocketchip.amba.axi4
import chisel3.iotesters._
abstract class FIRBlockTester[D, U, EO, EI, B <: Data](c: FIRBlock[D, U, EO, EI, B]) extends PeekPokeTester(c.module) with MemMasterModel {
  // check that address 0 is the number of filters
  require(memReadWord(0) == c.nFilters)
  // write 1 to all the taps
  for (i <- 0 until c.nFilters * c.nTaps) {
    memWriteWord(8 + i * 8, 1)
  }
}
// specialize the generic tester for axi4
class AXI4FIRBlockTester(c: AXI4FIRBlock with AXI4StandaloneBlock) extends FIRBlockTester(c) with AXI4MasterModel {
  def memAXI = c.ioMem.get
}
// invoking testers on lazymodules is a little strange.
// note that the firblocktester takes a lazymodule, not a module (it calls .module in "extends PeekPokeTester()").
val lm = LazyModule(new AXI4FIRBlock(1, 8)(Parameters.empty) with AXI4StandaloneBlock)
chisel3.iotesters.Driver(() => lm.module) { _ => new AXI4FIRBlockTester(lm) }
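The addresses used by the tester follow from the regmap call in FIRBlock: entry i of mmap is placed at byte offset i * 8, entry 0 is the read-only nFilters field, and the taps follow filter by filter (taps.flatMap). The hypothetical FirAddressMap helper below reproduces only that address arithmetic in plain Scala, which can help check that 8 + i * 8 in the tester lines up with the intended tap; it does not model the AXI4 transactions themselves.

```scala
// Plain-Scala sketch of the address layout produced by FIRBlock's regmap call:
//   entry 0         -> nFilters (read-only), at byte address 0
//   entries 1, 2... -> taps, flattened filter by filter, each at byte address entry * 8
object FirAddressMap {
  def tapAddress(nTaps: Int, filter: Int, tap: Int): Int =
    8 * (1 + filter * nTaps + tap)

  def allTapAddresses(nFilters: Int, nTaps: Int): Seq[Int] =
    for (f <- 0 until nFilters; t <- 0 until nTaps) yield tapAddress(nTaps, f, t)
}
```

For AXI4FIRBlock(1, 8) the taps sit at addresses 8, 16, ..., 64, which is exactly the sequence memWriteWord(8 + i * 8, 1) walks through.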
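Unfortunately, this test example from the tutorial did not run successfully for me, and I could not find a fix, so I have set it aside; I will look into it further when this topic comes up again later.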