Chisel Tutorial 06. Putting It Together: Implementing an FIR Filter (a Four-Tap FIR Filter and a Parameterized FIR Filter in Chisel)

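Putting It Together: Implementing an FIR Filter

Motivation

Now that we have covered the basics of Chisel, this section uses what we have learned to build an FIR (Finite Impulse Response) filter module.

FIR filters are very common in digital signal processing and will come up repeatedly in later sections, so it is worth getting comfortable with them.

Here is the definition of an FIR filter from Baidu Baike:

FIR (Finite Impulse Response) filter: a filter with a finite-length unit impulse response, also known as a non-recursive filter. It is one of the most basic building blocks of digital signal processing systems. It can realize an arbitrary magnitude response while maintaining a strictly linear phase response, and because its unit sample response is finite in length, the filter is a stable system. FIR filters are therefore widely used in communications, image processing, pattern recognition, and other fields.

FIR Filter

The FIR filter designed in this section needs to perform the following operation: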

[Figure 1: the operation performed by the FIR filter]

In effect, the module performs an element-wise multiplication of the filter coefficients with the elements of the input signal and then sums the products, an operation also known as convolution.

The signal-based mathematical definition is:
$$y_n = b_0 x_n + b_1 x_{n-1} + b_2 x_{n-2} + \dots$$
where:

  1. $y_n$ is the output signal at time $n$;
  2. $x_n$ is the input signal at time $n$;
  3. $b_i$ are the filter coefficients, also called the impulse response;
  4. $n-1, n-2, \dots$ denote time $n$ delayed by 1, 2, ... cycles.
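The formula can be checked in software before any hardware is written. The following is a minimal plain-Scala sketch (no Chisel involved); the object name FirReference and the function fir are illustrative names, and it assumes that samples before time 0 are zero, which matches registers that reset to 0.

```scala
object FirReference {
  // Direct evaluation of y_n = sum over i of b_i * x_{n-i}.
  // Assumption: samples before time 0 are treated as zero.
  def fir(b: Seq[Int], x: Seq[Int]): Seq[Int] =
    x.indices.map { n =>
      b.indices.map { i =>
        if (n - i >= 0) b(i) * x(n - i) else 0
      }.sum
    }

  def main(args: Array[String]): Unit = {
    // Four-point moving sum over a short input sequence.
    println(fir(Seq(1, 1, 1, 1), Seq(1, 4, 3, 2, 7, 0)))
  }
}
```

With coefficients (1, 1, 1, 1) and inputs 1, 4, 3, 2, 7, 0 this yields 1, 5, 8, 10, 16, 12, the same values the moving-average test later in this section expects.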

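Implementing an 8-bit, Four-Element FIR Filter

Now let's build a four-element FIR filter module whose coefficients are parameters of the module.

The input and output are both 8-bit unsigned integers.

Hints:

  1. You will need a shift-register-like construct to store the necessary state (such as the delayed signal values);
  2. A register that delays its input by one cycle can be built with ShiftRegister using a shift amount of 1, or with the RegNext construct;
  3. All registers are initialized to 0.

First, pin down the inputs and outputs:

  • Input: an 8-bit unsigned integer input signal;

  • Output: an 8-bit unsigned integer output signal.

Next, decide which state needs to be stored:

  • The filter coefficients can be hard-wired and do not need to be stored; their values come from the module parameters;
  • The delayed input samples do need to be stored; the syntax is RegNext(next_val, init_value).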
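To make the role of the stored state concrete, here is a rough software stand-in for three chained registers that all reset to 0. It is plain Scala rather than Chisel, the names DelayLineModel, Fir4, out, step, and run are hypothetical, and bit widths are deliberately not modeled.

```scala
object DelayLineModel {
  // Software stand-in for three chained registers, all reset to 0.
  // Assumption: bit widths are ignored in this sketch.
  final class Fir4(b0: Int, b1: Int, b2: Int, b3: Int) {
    private var xN1 = 0
    private var xN2 = 0
    private var xN3 = 0

    // Combinational output for the current input, before the clock edge.
    def out(in: Int): Int = in * b0 + xN1 * b1 + xN2 * b2 + xN3 * b3

    // Clock edge: every register takes its neighbor's old value,
    // so the oldest register must be updated first.
    def step(in: Int): Unit = {
      xN3 = xN2
      xN2 = xN1
      xN1 = in
    }
  }

  // Drive each input, sample the output, then advance one clock cycle.
  def run(f: Fir4, inputs: Seq[Int]): Seq[Int] =
    inputs.map { in =>
      val y = f.out(in)
      f.step(in)
      y
    }
}
```

Calling run(new Fir4(1, 2, 3, 4), Seq(1, 4, 3, 2, 7, 0)) follows the same poke, expect, step rhythm as a testbench and gives 1, 6, 14, 24, 36, 32.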

The implementation is as follows:

// MyModule.scala
import chisel3._
import chisel3.util._

class MyModule(b0: Int, b1: Int, b2: Int, b3: Int) extends Module {
  val io = IO(new Bundle {
    val in  = Input(UInt(8.W))
    val out = Output(UInt(8.W))
  })

  val x_n1 = RegNext(io.in, 0.U)
  val x_n2 = RegNext(x_n1, 0.U)
  val x_n3 = RegNext(x_n2, 0.U)
  io.out := io.in * b0.U + x_n1 * b1.U + x_n2 * b2.U + x_n3 * b3.U
}

object MyModule extends App {
  println(getVerilogString(new MyModule(0, 0, 0, 0)))
}

// MyModuleTest.scala
import chisel3._
import chiseltest._
import org.scalatest.flatspec.AnyFlatSpec

class MyModuleTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior of "MyModule"
  it should "get right results" in {
    // Simple sanity check: a filter with all-zero coefficients should always produce zero
    test(new MyModule(0, 0, 0, 0)) { c =>
        c.io.in.poke(0.U)
        c.io.out.expect(0.U)
        c.clock.step(1)
        c.io.in.poke(4.U)
        c.io.out.expect(0.U)
        c.clock.step(1)
        c.io.in.poke(5.U)
        c.io.out.expect(0.U)
        c.clock.step(1)
        c.io.in.poke(2.U)
        c.io.out.expect(0.U)
    }
    // Simple 4-point moving average
    test(new MyModule(1, 1, 1, 1)) { c =>
        c.io.in.poke(1.U)
        c.io.out.expect(1.U)  // 1, 0, 0, 0
        c.clock.step(1)
        c.io.in.poke(4.U)
        c.io.out.expect(5.U)  // 4, 1, 0, 0
        c.clock.step(1)
        c.io.in.poke(3.U)
        c.io.out.expect(8.U)  // 3, 4, 1, 0
        c.clock.step(1)
        c.io.in.poke(2.U)
        c.io.out.expect(10.U)  // 2, 3, 4, 1
        c.clock.step(1)
        c.io.in.poke(7.U)
        c.io.out.expect(16.U)  // 7, 2, 3, 4
        c.clock.step(1)
        c.io.in.poke(0.U)
        c.io.out.expect(12.U)  // 0, 7, 2, 3
    }
    // Nonsymmetric filter
    test(new MyModule(1, 2, 3, 4)) { c =>
        c.io.in.poke(1.U)
        c.io.out.expect(1.U)  // 1*1, 0*2, 0*3, 0*4
        c.clock.step(1)
        c.io.in.poke(4.U)
        c.io.out.expect(6.U)  // 4*1, 1*2, 0*3, 0*4
        c.clock.step(1)
        c.io.in.poke(3.U)
        c.io.out.expect(14.U)  // 3*1, 4*2, 1*3, 0*4
        c.clock.step(1)
        c.io.in.poke(2.U)
        c.io.out.expect(24.U)  // 2*1, 3*2, 4*3, 1*4
        c.clock.step(1)
        c.io.in.poke(7.U)
        c.io.out.expect(36.U)  // 7*1, 2*2, 3*3, 4*4
        c.clock.step(1)
        c.io.in.poke(0.U)
        c.io.out.expect(32.U)  // 0*1, 7*2, 2*3, 3*4
    }
    println("SUCCESS!!")
  }
}

The generated Verilog is as follows:

module MyModule(
  input        clock,
  input        reset,
  input  [7:0] io_in,
  output [7:0] io_out
);
`ifdef RANDOMIZE_REG_INIT
  reg [31:0] _RAND_0;
  reg [31:0] _RAND_1;
  reg [31:0] _RAND_2;
`endif // RANDOMIZE_REG_INIT
  reg [7:0] x_n1; // @[MyModule.scala 12:21]
  reg [7:0] x_n2; // @[MyModule.scala 13:21]
  reg [7:0] x_n3; // @[MyModule.scala 14:21]
  wire [8:0] _io_out_T = io_in * 1'h0; // @[MyModule.scala 15:19]
  wire [8:0] _io_out_T_1 = x_n1 * 1'h0; // @[MyModule.scala 15:33]
  wire [8:0] _io_out_T_3 = _io_out_T + _io_out_T_1; // @[MyModule.scala 15:26]
  wire [8:0] _io_out_T_4 = x_n2 * 1'h0; // @[MyModule.scala 15:47]
  wire [8:0] _io_out_T_6 = _io_out_T_3 + _io_out_T_4; // @[MyModule.scala 15:40]
  wire [8:0] _io_out_T_7 = x_n3 * 1'h0; // @[MyModule.scala 15:61]
  wire [8:0] _io_out_T_9 = _io_out_T_6 + _io_out_T_7; // @[MyModule.scala 15:54]
  assign io_out = _io_out_T_9[7:0]; // @[MyModule.scala 15:10]
  always @(posedge clock) begin
    if (reset) begin // @[MyModule.scala 12:21]
      x_n1 <= 8'h0; // @[MyModule.scala 12:21]
    end else begin
      x_n1 <= io_in; // @[MyModule.scala 12:21]
    end
    if (reset) begin // @[MyModule.scala 13:21]
      x_n2 <= 8'h0; // @[MyModule.scala 13:21]
    end else begin
      x_n2 <= x_n1; // @[MyModule.scala 13:21]
    end
    if (reset) begin // @[MyModule.scala 14:21]
      x_n3 <= 8'h0; // @[MyModule.scala 14:21]
    end else begin
      x_n3 <= x_n2; // @[MyModule.scala 14:21]
    end
  end
// Register and memory initialization
... // omitted
endmodule

The tests pass.
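One detail in the generated Verilog is easy to miss: io_out is assigned only bits [7:0] of the sum, so any result above 255 wraps around modulo 256. The largest value the tests above expect is 36, so they never exercise this. A small plain-Scala sketch of the effect follows; WrapAroundModel, truncate8, and out8 are illustrative names, not part of Chisel.

```scala
object WrapAroundModel {
  // Keep only the low 8 bits, as an assignment to an 8-bit output does.
  def truncate8(v: Int): Int = v & 0xFF

  // One output sample from a window holding the current input
  // followed by the delayed samples, truncated to 8 bits.
  def out8(b: Seq[Int], window: Seq[Int]): Int =
    truncate8(b.zip(window).map { case (c, x) => c * x }.sum)
}
```

For example, with coefficients (1, 1, 1, 1) and four samples of 255 the full sum is 1020, but out8 returns 252. If the full-precision result matters, the output port would need to be wider than 8 bits.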

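FIR Filter Generator

This part relies on material covered later, but for now here is the basic idea behind building an FIR filter generator.

The generator takes a length parameter that sets the number of taps of the filter; the coefficient for each tap is supplied through an input of the hardware module.

The generator therefore has three inputs:

  1. in, the input to the filter;
  2. valid, the valid bit for the input;
  3. consts, a vector of constants holding the tap coefficients;

and one output:

  1. out, the output of the filter.

The implementation is as follows: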

import chisel3._
import chisel3.util._

class MyModule(length: Int) extends Module {
  val io = IO(new Bundle {
    val in  = Input(UInt(8.W))
    val valid = Input(Bool())
    val consts = Input(Vec(length, UInt(8.W))) // usage covered later
    val out = Output(UInt(8.W))
  })

  // usage covered later
  val taps = Seq(io.in) ++ Seq.fill(io.consts.length - 1)(RegInit(0.U(8.W)))
  taps.zip(taps.tail).foreach { case (a, b) => when (io.valid) { b := a } }

  io.out := taps.zip(io.consts).map { case (a, b) => a * b }.reduce(_ + _)
}

object MyModule extends App {
  println(getVerilogString(new MyModule(4)))
}
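The behavior of the module above, in particular the effect of valid, can be sketched in plain Scala. This is a rough model under stated assumptions: GatedFirModel and its methods are hypothetical names, the first tap is the live input, the delay registers reset to 0, and the output is truncated to 8 bits like the hardware port.

```scala
object GatedFirModel {
  final class Fir(length: Int) {
    require(length >= 1, "need at least one tap")

    // Delay registers, all reset to 0; the first tap is the live input.
    private var regs: Vector[Int] = Vector.fill(length - 1)(0)

    // Output from the live input and the stored samples, truncated to 8 bits.
    // It does not depend on valid, only on the current register contents.
    def out(in: Int, consts: Seq[Int]): Int = {
      require(consts.length == length, "one coefficient per tap")
      (in +: regs).zip(consts).map { case (x, c) => x * c }.sum & 0xFF
    }

    // Clock edge: the delay line shifts only when valid is asserted.
    def step(in: Int, valid: Boolean): Unit =
      if (valid) regs = (in +: regs).take(length - 1)
  }
}
```

Stepping with valid set to false leaves the stored samples untouched, so an input presented during that cycle contributes to the output of that cycle but never enters the delay line.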

The Verilog generated for MyModule(4) is as follows:

module MyModule(
  input        clock,
  input        reset,
  input  [7:0] io_in,
  input        io_valid,
  input  [7:0] io_consts_0,
  input  [7:0] io_consts_1,
  input  [7:0] io_consts_2,
  input  [7:0] io_consts_3,
  output [7:0] io_out
);
`ifdef RANDOMIZE_REG_INIT
  reg [31:0] _RAND_0;
  reg [31:0] _RAND_1;
  reg [31:0] _RAND_2;
`endif // RANDOMIZE_REG_INIT
  reg [7:0] taps_1; // @[MyModule.scala 13:66]
  reg [7:0] taps_2; // @[MyModule.scala 13:66]
  reg [7:0] taps_3; // @[MyModule.scala 13:66]
  wire [15:0] _io_out_T = io_in * io_consts_0; // @[MyModule.scala 16:56]
  wire [15:0] _io_out_T_1 = taps_1 * io_consts_1; // @[MyModule.scala 16:56]
  wire [15:0] _io_out_T_2 = taps_2 * io_consts_2; // @[MyModule.scala 16:56]
  wire [15:0] _io_out_T_3 = taps_3 * io_consts_3; // @[MyModule.scala 16:56]
  wire [15:0] _io_out_T_5 = _io_out_T + _io_out_T_1; // @[MyModule.scala 16:71]
  wire [15:0] _io_out_T_7 = _io_out_T_5 + _io_out_T_2; // @[MyModule.scala 16:71]
  wire [15:0] _io_out_T_9 = _io_out_T_7 + _io_out_T_3; // @[MyModule.scala 16:71]
  assign io_out = _io_out_T_9[7:0]; // @[MyModule.scala 16:10]
  always @(posedge clock) begin
    if (reset) begin // @[MyModule.scala 13:66]
      taps_1 <= 8'h0; // @[MyModule.scala 13:66]
    end else if (io_valid) begin // @[MyModule.scala 14:64]
      taps_1 <= io_in; // @[MyModule.scala 14:68]
    end
    if (reset) begin // @[MyModule.scala 13:66]
      taps_2 <= 8'h0; // @[MyModule.scala 13:66]
    end else if (io_valid) begin // @[MyModule.scala 14:64]
      taps_2 <= taps_1; // @[MyModule.scala 14:68]
    end
    if (reset) begin // @[MyModule.scala 13:66]
      taps_3 <= 8'h0; // @[MyModule.scala 13:66]
    end else if (io_valid) begin // @[MyModule.scala 14:64]
      taps_3 <= taps_2; // @[MyModule.scala 14:68]
    end
  end
// Register and memory initialization
... // omitted
endmodule

As you can see, the Vec in the IO bundle and the Seq used afterwards are both unrolled in the generated Verilog; their usage is covered in detail later.
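The two collection idioms in the generator are ordinary Scala and can be tried without any hardware. The sketch below uses strings in place of hardware signals; ZipTailDemo, stages, and dot are illustrative names.

```scala
object ZipTailDemo {
  // Names standing in for the input wire and the three delay registers.
  val taps: Seq[String] = Seq("in", "taps_1", "taps_2", "taps_3")

  // Each element paired with its successor:
  // the (source, destination) of one shift stage.
  val stages: Seq[(String, String)] = taps.zip(taps.tail)

  // zip + map + reduce computes a dot product, one multiply per tap.
  def dot(xs: Seq[Int], cs: Seq[Int]): Int =
    xs.zip(cs).map { case (x, c) => x * c }.reduce(_ + _)
}
```

Here stages contains (in, taps_1), (taps_1, taps_2), (taps_2, taps_3), which lines up with the three register assignments in the Verilog listing, and dot corresponds to the multiply-and-add chain that drives io_out.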

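Using and Testing DspBlock

Integrating DSP components into a larger system can be challenging and error-prone. The repository dsptools/rocket at master · ucb-bar/dsptools (github.com) contains a number of useful generators that help with this kind of work.

One abstraction of a core is called a DspBlock. A DspBlock has:

  • AXI4-Stream input and output;
  • memory-mapped status and control (AXI4 in this example).

Note: AXI (Advanced eXtensible Interface) is a bus protocol.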


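DspBlock uses the diplomatic interfaces from rocket. Diplomacy and TileLink from the Rocket Chip · lowRISC: Collaborative open silicon engineering gives an overview of the basics of diplomacy, but you do not need to worry about how it works for this example.

Diplomacy really shines when many different DspBlocks are connected together to form a complex SoC.

In this example we are only making a single peripheral. The StandaloneBlock trait is mixed in so that the diplomatic interfaces work as top-level IOs. StandaloneBlock is only needed when a DspBlock is used as a top-level interface without any diplomatic connections.

To use DspBlock, add the following dependency to build.sbt: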

libraryDependencies += "edu.berkeley.cs" %% "rocket-dsptools" % "1.4.3"

The following example wraps the FIR filter in an AXI4 interface:

import chisel3._
import chisel3.util._
import dspblocks._
import freechips.rocketchip.amba.axi4._
import freechips.rocketchip.amba.axi4stream._
import freechips.rocketchip.config._
import freechips.rocketchip.diplomacy._
import freechips.rocketchip.regmapper._

class MyModule(length: Int) extends Module {
  val io = IO(new Bundle {
    val in  = Input(UInt(8.W))
    val valid = Input(Bool())
    val consts = Input(Vec(length, UInt(8.W))) // usage covered later
    val out = Output(UInt(8.W))
  })

  // usage covered later
  val taps = Seq(io.in) ++ Seq.fill(io.consts.length - 1)(RegInit(0.U(8.W)))
  taps.zip(taps.tail).foreach { case (a, b) => when (io.valid) { b := a } }

  io.out := taps.zip(io.consts).map { case (a, b) => a * b }.reduce(_ + _)
}

object MyModule extends App {
  println(getVerilogString(new MyModule(4)))
}

//
// Base class for all FIRBlocks.
// This can be extended to make TileLink, AXI4, APB, AHB, etc. flavors of the FIR filter
//
abstract class FIRBlock[D, U, EO, EI, B <: Data](val nFilters: Int, val nTaps: Int)(implicit p: Parameters)
// HasCSR means that the memory interface will be using the RegMapper API to define status and control registers
extends DspBlock[D, U, EO, EI, B] with HasCSR {
    // diplomatic node for the streaming interface
    // identity node means the output and input are parameterized to be the same
    val streamNode = AXI4StreamIdentityNode()
    
    // define what hardware will be elaborated
    lazy val module = new LazyModuleImp(this) {
        // get streaming input and output wires from diplomatic node
        val (in, _)  = streamNode.in(0)
        val (out, _) = streamNode.out(0)

        require(in.params.n >= nFilters,
                s"""AXI-4 Stream port must be big enough for all 
                   |the filters (need $nFilters, only have ${in.params.n})""".stripMargin)

        // make registers to store taps
        val taps = Reg(Vec(nFilters, Vec(nTaps, UInt(8.W))))

        // memory map the taps, plus the first address is a read-only field that says how many filter lanes there are
        val mmap = Seq(
            RegField.r(64, nFilters.U, RegFieldDesc("nFilters", "Number of filter lanes"))
        ) ++ taps.flatMap(_.map(t => RegField(8, t, RegFieldDesc("tap", "Tap"))))

        // generate the hardware for the memory interface
        // in this class, regmap is abstract (unimplemented). mixing in something like AXI4HasCSR or TLHasCSR
        // will define regmap for the particular memory interface
        regmap(mmap.zipWithIndex.map({case (r, i) => i * 8 -> Seq(r)}): _*)

        // make the FIR lanes and connect inputs and taps
        val outs = for (i <- 0 until nFilters) yield {
            val fir = Module(new MyModule(nTaps))
            
            fir.io.in := in.bits.data((i+1)*8 - 1, i*8) // bit extraction is inclusive: lane i is bits [8i+7 : 8i]
            fir.io.valid := in.valid && out.ready
            fir.io.consts := taps(i)            
            fir.io.out
        }

        val output = if (outs.length == 1) {
            outs.head
        } else {
            outs.reduce((x: UInt, y: UInt) => Cat(y, x))
        }

        out.bits.data := output
        in.ready  := out.ready
        out.valid := in.valid
    }
}

// make AXI4 flavor of FIRBlock
abstract class AXI4FIRBlock(nFilters: Int, nTaps: Int)(implicit p: Parameters) extends FIRBlock[AXI4MasterPortParameters, AXI4SlavePortParameters, AXI4EdgeParameters, AXI4EdgeParameters, AXI4Bundle](nFilters, nTaps) with AXI4DspBlock with AXI4HasCSR {
    override val mem = Some(AXI4RegisterNode(
        AddressSet(0x0, 0xffffL), beatBytes = 8
    ))
}

// running the code below will show what firrtl is generated
// note that LazyModules aren't really chisel modules- you need to call ".module" on them when invoking the chisel driver
// also note that AXI4StandaloneBlock is mixed in- if you forget it, you will get weird diplomacy errors because the memory
// interface expects a master and the streaming interface expects to be connected. AXI4StandaloneBlock will add top level IOs
// println(chisel3.Driver.emit(() => LazyModule(new AXI4FIRBlock(1, 8)(Parameters.empty) with AXI4StandaloneBlock).module))
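The way the filter lanes share one stream word can be illustrated with plain integers. The sketch below assumes one byte per lane, with lane 0 in the least significant byte, which is what reducing with Cat(y, x) produces; LanePacking, lane, and pack are hypothetical helper names and not part of dsptools or rocket-chip.

```scala
object LanePacking {
  // Extract lane i from a packed word, assuming one byte per lane:
  // lane i occupies bits [8i+7 : 8i].
  def lane(data: BigInt, i: Int): Int = ((data >> (8 * i)) & 0xFF).toInt

  // Pack per-lane bytes so that lane 0 ends up in the least significant byte.
  def pack(lanes: Seq[Int]): BigInt =
    lanes.zipWithIndex.map { case (v, i) => BigInt(v & 0xFF) << (8 * i) }.sum
}
```

For instance, packing lanes 0x12 and 0x34 gives 0x3412, and extracting lane 1 from that word returns 0x34.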

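Testing a DspBlock is a little different, because we now have to deal with a memory interface and a LazyModule. dsptools has some features that help with testing DspBlocks.

One important feature is MemMasterModel. This trait defines generic functions such as memReadWord and memWriteWord for memory operations. It lets you write a generic test once and then specify which memory interface is in use, for example writing a single test and specializing it to both TileLink and AXI4.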
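The idea of writing a test against an abstract memory interface and specializing it later can be shown without dsptools at all. The following plain-Scala sketch is only an analogy: MemModel, programTaps, and MapBackedMem are hypothetical names, not the dsptools API, and the register layout (lane count at address 0, one tap every 8 bytes starting at address 8) is an assumption taken from the FIRBlock code in this section.

```scala
object MemModelSketch {
  // Hypothetical stand-in for a bus-agnostic memory interface
  // (this is not the dsptools MemMasterModel trait).
  trait MemModel {
    def memReadWord(addr: BigInt): BigInt
    def memWriteWord(addr: BigInt, value: BigInt): Unit
  }

  // Generic routine written only against MemModel:
  // check the lane count at address 0, then set every tap to 1.
  def programTaps(m: MemModel, nFilters: Int, nTaps: Int): Unit = {
    require(m.memReadWord(0) == BigInt(nFilters), "address 0 should hold the lane count")
    for (i <- 0 until nFilters * nTaps) m.memWriteWord(8 + i * 8, 1)
  }

  // One concrete "bus": a plain map acting as the register file.
  final class MapBackedMem(nFilters: Int) extends MemModel {
    private val regs =
      scala.collection.mutable.Map[BigInt, BigInt](BigInt(0) -> BigInt(nFilters))
    def memReadWord(addr: BigInt): BigInt = regs.getOrElse(addr, BigInt(0))
    def memWriteWord(addr: BigInt, value: BigInt): Unit = regs(addr) = value
  }
}
```

programTaps never mentions a bus protocol; a second class implementing MemModel over a different transport could reuse it unchanged, which is the same separation the generic tester and its AXI4 specialization aim for.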

The following example tests FIRBlock in this way:

import dsptools.tester.MemMasterModel
import freechips.rocketchip.amba.axi4
import chisel3.iotesters._

abstract class FIRBlockTester[D, U, EO, EI, B <: Data](c: FIRBlock[D, U, EO, EI, B]) extends PeekPokeTester(c.module) with MemMasterModel {
    // check that address 0 is the number of filters
    require(memReadWord(0) == c.nFilters)
    // write 1 to all the taps
    for (i <- 0 until c.nFilters * c.nTaps) {
        memWriteWord(8 + i * 8, 1)
    }
}

// specialize the generic tester for axi4
class AXI4FIRBlockTester(c: AXI4FIRBlock with AXI4StandaloneBlock) extends FIRBlockTester(c) with AXI4MasterModel {
    def memAXI = c.ioMem.get
}

// invoking testers on lazymodules is a little strange.
// note that the firblocktester takes a lazymodule, not a module (it calls .module in "extends PeekPokeTester()").
val lm = LazyModule(new AXI4FIRBlock(1, 8)(Parameters.empty) with AXI4StandaloneBlock)
chisel3.iotesters.Driver(() => lm.module) { _ => new AXI4FIRBlockTester(lm) }

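Unfortunately, I could not get this test example from the tutorial to run and was unable to find a fix, so I will leave it here and look into it further when this topic comes up again.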
