CPU Design, Part 2: Developing a Pipelined Processor in Verilog HDL (42 Instructions Supported)

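CPU Design, Part 1: Developing a Single-Cycle Processor in Verilog HDL (10 Instructions Supported)
CPU Design, Part 3: Developing a Pipelined Processor in Verilog HDL (50 Instructions Supported)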

All code and reference files have been uploaded to GitHub: https://github.com/lcy19981225/Multi-Cycle-CPU-42

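Developing a Pipelined Processor in Verilog HDL

  • Notes
  • Before We Start
    • When to insert a nop
    • Where to place the bypasses and how to select them
    • Controller design
  • Module Design
    • IF stage
      • PC
      • PCAdd4
      • im_4k
      • REG_IF_ID
    • ID stage
      • CU
      • RegisterFile
      • if_c_adventure
      • FU
      • REG_ID_EX
    • EX stage
      • ALU
      • REG_EX_MEM
    • MEM stage
      • save_to_BE
      • dm_4k
      • REG_MEM_WB
    • WB stage
      • data_ext_load
    • MUX
    • Extend
    • shifter
    • mips
    • test
  • Result Verification
    • My verification method
    • Final result
  • Lab Summary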

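Notes

  • This post assumes you already understand pipelining, data hazards, and related topics, so I will not repeat them here. If you are not familiar with them, review your lecture material or consult related blog posts.
  • The processor should support the MIPS-C3 instruction set.
    • MIPS-C3 = { LB, LBU, LH, LHU, LW, SB, SH, SW, ADD, ADDU, SUB, SUBU, MULT, MULTU, DIV, DIVU, SLL, SRL, SRA, SLLV, SRLV, SRAV, AND, OR, XOR, NOR, ADDI, ADDIU, ANDI, ORI, XORI, LUI, SLT, SLTI, SLTIU, SLTU, BEQ, BNE, BLEZ, BGTZ, BLTZ, BGEZ, J, JAL, JALR, JR, MFHI, MFLO, MTHI, MTLO }.
    • For the exact meaning of each instruction, see the "MIPS32 Instruction Set" document in the GitHub repository from my single-cycle design post. I recommend reading it carefully and understanding every instruction before you start; otherwise you will probably end up revising your design over and over.
    • This lab implements only the 42 instructions other than the multiply/divide ones. Some of the code contains multiply/divide logic, which I do not explain.
  • The processor uses a pipelined design.
  • Reference top-level pipeline design view (for reference only; it does not fully support all the instructions in this lab, so I suggest planning your own):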
    [Figure 1: reference top-level view of the pipeline]
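  • The pipeline is designed with performance as the primary goal, so it must support forwarding as far as possible to resolve data hazards.
  • The instruction memory (IM) and data memory (DM) are both expanded to 8 KB (32 bit × 2K).
  • Starting to code in Verilog HDL before you have thought the pipeline through is unwise. A great deal of past experience and hard lessons show that, for a large project, designing at the code level is short-sighted: it seems to give quick early results, but over the whole project the developer pays a huge price in repeated trial and error (yes, that was me). I strongly recommend separating the design phase from the implementation phase.
    • Design: working the design out in detail in a tool such as Excel, and reasoning through the logic there, is an efficient way to design.
    • Implementation: once you believe your design is correct, or nearly so, describe it in Verilog HDL.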

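Before We Start

A pipelined design differs from a single-cycle one in several obvious ways. I picked out the ones I consider most important and explain them up front:

When to insert a nop

First, inserting a nop takes three operations: freeze the PC, freeze IF/ID, and clear ID/EX. The PC, the IF/ID register, and the ID/EX register therefore each need an input signal that tells them a nop must be inserted.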
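
To make the three operations concrete, here is a minimal Python sketch of what one rising clock edge does to the front of the pipeline. It is only a behavioral model for reasoning at the design stage, not part of the Verilog. The names FrontEnd, clock_edge, and NOP are made up for this illustration, and a single instruction word stands in for everything that ID/EX really carries.

```python
from dataclasses import dataclass, replace

NOP = 0x00000000  # an all-zero word; the cleared ID/EX register acts as a nop


@dataclass(frozen=True)
class FrontEnd:
    pc: int          # value held by the PC register
    if_id_inst: int  # instruction held in IF/ID
    id_ex_inst: int  # instruction held in ID/EX (stands in for its control signals)


def clock_edge(state, next_pc, fetched_inst, stall):
    """Return the register contents after one rising clock edge."""
    if stall:
        # Freeze PC and IF/ID, clear ID/EX: the instruction in ID is retried
        # next cycle and a bubble moves down the pipeline in its place.
        return replace(state, id_ex_inst=NOP)
    return FrontEnd(pc=next_pc, if_id_inst=fetched_inst, id_ex_inst=state.if_id_inst)


# lw $1,0($0) is in ID/EX and add $3,$1,$2 is in IF/ID when the stall is raised.
before = FrontEnd(pc=0x3008, if_id_inst=0x00221820, id_ex_inst=0x8C010000)
stalled = clock_edge(before, next_pc=0x300C, fetched_inst=0x12345678, stall=True)
resumed = clock_edge(stalled, next_pc=0x300C, fetched_inst=0x12345678, stall=False)
print(stalled)
print(resumed)
```

After the stalled edge the PC and IF/ID are unchanged and ID/EX holds the bubble; one edge later the add finally moves into ID/EX.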

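Second, there is the question of when a nop is needed. After many attempts, including problems discovered while running the test cases, I concluded that three situations require a nop: lw+add, add+branch (or jalr), and lw+beq. Each is analyzed below.

The first case is when the register written by lw is used as rs or rt by the next instruction. lw does not have its result until the MEM stage, so forwarding alone cannot solve this, and a nop must be inserted.

The second case is when an instruction modifies a register and the next instruction is a branch or jalr. A branch compares rs and rt in the ID stage to decide which instruction enters the pipeline next, and jalr needs the latest value of rs to determine its jump target. The previous instruction, however, only produces its result at the end of EX, and a value cannot be forwarded within the same cycle, so a nop must be inserted.

The third case is lw followed by a branch, where the register written back by lw is used by the branch to decide whether to jump. The value loaded by lw is only available in MEM, while the branch must be resolved in ID, so this case needs two nops. Likewise, the lw+x+beq case (one other instruction in between) needs one nop.

All of these cases are detected in the hazard detection unit FU. If one nop is needed, the stall signal is set to 1; if two nops are needed, the stallstall signal is set to 1.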
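
The three cases can be checked on paper with a small Python model before any Verilog is written. The sketch below mirrors the stall and stallstall expressions of the FU module shown later in this post (which test only beq, bne, and jalr on the branch side); the function name and its keyword arguments are hypothetical and exist only for this illustration.

```python
def hazard_stalls(id_rs, id_rt, *, id_is_beq=False, id_is_bne=False, id_is_jalr=False,
                  e_write_reg=0, e_reg_write=False, e_mem_to_reg=False,
                  m_write_reg=0, m_reg_write=False, m_mem_to_reg=False):
    """Return (stall, stallstall) for the instruction currently in ID."""
    # Does the instruction in EX or MEM write a register that ID wants to read?
    ex_match = e_reg_write and e_write_reg != 0 and e_write_reg in (id_rs, id_rt)
    mem_match = m_reg_write and m_write_reg != 0 and m_write_reg in (id_rs, id_rt)

    load_use = ex_match and e_mem_to_reg                                   # lw + add
    alu_then_branch = ex_match and (id_is_beq or id_is_bne or id_is_jalr)  # add + branch/jalr
    load_x_branch = mem_match and m_mem_to_reg and id_is_beq               # lw + x + beq
    load_then_branch = ex_match and e_mem_to_reg and id_is_beq             # lw + beq

    stall = load_use or alu_then_branch or load_x_branch
    return bool(stall), bool(load_then_branch)


# lw $1,... is in EX while add $3,$1,$2 is in ID: one nop.
print(hazard_stalls(1, 2, e_write_reg=1, e_reg_write=True, e_mem_to_reg=True))
# lw $1,... is in EX while beq $1,$2,... is in ID: two nops.
print(hazard_stalls(1, 2, id_is_beq=True, e_write_reg=1, e_reg_write=True, e_mem_to_reg=True))
```

Writes to register 0 never cause a stall in this model, matching the E_WriteReg!=0 and M_WriteReg!=0 terms in the Verilog.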

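Where to place the bypasses and how to select them

Two bypasses are needed in total: one in ID and one in EX.

A bypass selector in the EX stage lets an instruction obtain data early from the EX/MEM or MEM/WB register.

A bypass selector is placed in the ID stage because instructions such as branches must decide in ID which instruction enters the pipeline next, so the values of rs and rt are needed in ID. The same problem of a value not yet being written back arises here. Take the sequence ori $1,$0,1 + ori $2,$0,1 + beq $1,$2,l: the branch needs values in ID that have not been written back yet. In this case, however, the value only has to come from the MEM/WB register, which is how this selector differs from the one in the EX stage.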
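
As a rough illustration of how a forwarding selector picks its source, the Python sketch below mirrors the basic ID_FwdA/ID_FwdB encoding used by the FU module later in this post: 001 when the producer is the instruction currently in EX, 010 when it is the instruction in MEM, and 000 otherwise. The extra mfhi/mflo codes are left out, and the function and constant names are hypothetical.

```python
NO_FORWARD, FROM_EX_INSTR, FROM_MEM_INSTR = 0b000, 0b001, 0b010


def forward_select(src_reg, e_write_reg, e_reg_write, m_write_reg, m_reg_write):
    """Pick the forwarding code for one source register of the instruction in ID."""
    # The instruction in EX is younger than the one in MEM, so it is checked
    # first: when both write the same register, the newer value must win.
    if e_reg_write and e_write_reg != 0 and src_reg == e_write_reg:
        return FROM_EX_INSTR
    if m_reg_write and m_write_reg != 0 and src_reg == m_write_reg:
        return FROM_MEM_INSTR
    return NO_FORWARD


# Both older instructions write $1; the younger one (in EX) is selected.
print(forward_select(1, e_write_reg=1, e_reg_write=True, m_write_reg=1, m_reg_write=True))
```

The priority order is the important design choice here: checking the MEM-stage producer first would forward a stale value whenever two consecutive instructions write the same register.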

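Controller design

I designed two controllers:

Main controller: instruction decoding, functional-unit control, MUX control (excluding the forwarding MUXes), and so on.

Hazard controller: nop insertion and forwarding control.

Module Design

IF stage

PC

The PC checks reset: if no reset is requested, it simply loads the address of the next instruction. It must also check whether a nop is being inserted and, if so, freeze.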

module PC(Result,Clk,Reset,Address,stall,stallstall);  // En=1: the PC may be written; stall=0: no load-use data hazard
    input Clk;               // clock
    input Reset;             // active-low reset: 0 initializes the PC, 1 accepts the new address
    input [31:0] Result;     // next PC value
    input stall,stallstall;  // freeze requests from the hazard unit
    output reg [31:0] Address;
    wire En;

    assign En=(~stall)&(~stallstall);

    initial begin
        Address = 32'h00003000;
    end

    always @(posedge Clk or negedge Reset)
    begin
        if (!Reset)          // reset takes priority, even during a stall
            Address <= 32'h00003000;
        else if (En)         // hold the PC while a nop is being inserted
            Address <= Result;
    end
endmodule

PCAdd4

PCAdd4 only needs to compute PC+4.

module PCAdd4(PC,PCadd4);
    input [31:0] PC;       // current instruction address
    output [31:0] PCadd4;  // address of the next sequential instruction
    assign PCadd4 = PC+4;
endmodule

im_4k

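Instructions are read from code42.txt. Pay special attention here to the effect of expanding the memory to 2048 words: the index must be taken from address bits 2 to 12 (rather than 2 to 11), and 11'b10000000000 must then be subtracted. With an address such as 0x00003000, bit 12 (the lowest 1 bit of the 3) is included in the slice, and without the subtraction the index would clearly be invalid.

Remember to change the file path, as explained in the single-cycle design post.

module im_4k(Addr,Inst); // instruction memory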
    input[31:0]Addr;
    reg [31:0]Rom[2047:0];

    output[31:0]Inst;

    initial 
    begin
        $readmemh("C:\\Users\\Y\\Desktop\\code42.txt", Rom);
    end
    
    integer i;
    initial begin
        $display("start simulation");
        for (i=0;i<20;i=i+1)
            $display("%d %h", i,Rom[i]);
    end
    assign Inst=Rom[Addr[12:2]-11'b10000000000];

endmodule
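
The index arithmetic is easy to get wrong, so here is a small Python check of it. It assumes the subtraction wraps at 11 bits (both operands in the Verilog expression are 11 bits wide) and that the program starts at 0x00003000, as in the PC module; rom_index is a hypothetical helper name.

```python
def rom_index(addr):
    """Model Rom[Addr[12:2] - 11'b10000000000] for a 2048-word memory."""
    word_bits = (addr >> 2) & 0x7FF              # Addr[12:2], an 11-bit field
    return (word_bits - 0b10000000000) & 0x7FF   # 11-bit subtraction wraps around


for addr in (0x3000, 0x3004, 0x3FFC, 0x4000, 0x4FFC):
    print(hex(addr), rom_index(addr))
```

Under these assumptions the byte addresses 0x3000 through 0x4FFC map one-to-one onto word indices 0 through 2047.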

REG_IF_ID

This register mainly passes inst and pcadd4 along, and it must freeze when a nop is inserted.

module REG_IF_ID (IF_PCadd4,IF_Inst,Clk,Reset,ID_PCadd4,ID_Inst,stall,stallstall);
    input [31:0] IF_PCadd4,IF_Inst;
    input Clk,Reset,stall,stallstall;
    output reg[31:0] ID_PCadd4,ID_Inst;

    wire En,clr;
    assign En=(~stall)&(~stallstall);
    //assign clr=~condep;
    initial begin
        ID_Inst = 0;
        ID_PCadd4= 0 ;
    end
    
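    always @(posedge Clk) begin 
        if(En==1)begin   // keep the current contents while a nop is being inserted
            // non-blocking assignments avoid races with the other stage registers
            ID_PCadd4 <= IF_PCadd4;
            ID_Inst <= IF_Inst;             
        end
    end  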
endmodule

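ID stage

CU

The control unit sets the values of the control signals, which are:

  • RegDst: selects the write-back address for the register file
  • Se: selects zero extension or sign extension in the extend unit
  • RegWrite: whether the register file is written
  • ALUYSrc: selects whether the ALU's B input comes from the register file or from the extend unit
  • ALUOp: selects the operation the ALU performs (the ALUControl port in the code)
  • MemWrite: whether dm is written; asserted only by the store instructions (sb, sh, sw)
  • PCSrc: selects how the next instruction address is formed
  • MemtoReg: selects whether the data written back to the register file comes from the ALU or from dm; asserted only by the load instructions

The multi-cycle (pipelined) design needs a few additional signals:

  • B_code: bltz and bgez are distinguished differently from all other instructions, so an extra signal is used to tell them apart
  • ALUXSrc: the ALU's first input is normally rs, but in shift operations sa can be passed in instead
  • load_option: identifies the type of load instruction
  • save_option: identifies the type of store instruction
  • usigned: indicates whether the instruction is an unsigned one

The values for each instruction are listed in CU.xlsx, which I have uploaded to GitHub. Here is a screenshot:
[Figure 2: screenshot of the control-signal table in CU.xlsx]
As you can see, there are a lot of signals, so be sure to check everything carefully against the MIPS manual; otherwise debugging later will be painful.

These control signals require several multiplexers; their implementation is covered in the MUX.v section.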

module CU(ID_mfhi,ID_mflo,Inst,Func,ID_B_code,RegDst,Se,RegWrite,ALUXSrc,ALUYSrc,ALUControl,md_control,MemWrite,PCSrc,MemtoReg,load_option,save_option,usigned,c_adventure,md_signal);
    input [5:0]Func;
    input [31:0]Inst;
    input c_adventure,ID_B_code;
    output RegDst,Se,RegWrite,ALUXSrc,ALUYSrc,MemWrite,MemtoReg,usigned,md_signal;
    output [2:0]PCSrc,load_option,md_control;
    output [1:0]save_option;
    output [3:0]ALUControl;
    output ID_mfhi,ID_mflo;

    wire R_type=~Inst[31] & ~Inst[30] & ~Inst[29] & ~Inst[28] & ~Inst[27] & ~Inst[26];
    wire I_lb = Inst[31] & ~Inst[30] & ~Inst[29] & ~Inst[28] & ~Inst[27] & ~Inst[26];
    wire I_lbu = Inst[31] & ~Inst[30] & ~Inst[29] & Inst[28] & ~Inst[27] & ~Inst[26];
    wire I_lh = Inst[31] & ~Inst[30] & ~Inst[29] & ~Inst[28] & ~Inst[27] & Inst[26];
    wire I_lhu = Inst[31] & ~Inst[30] & ~Inst[29] & Inst[28] & ~Inst[27] & Inst[26];
    wire I_lw = Inst[31] & ~Inst[30] & ~Inst[29] & ~Inst[28] & Inst[27] & Inst[26];
    wire I_sb = Inst[31] & ~Inst[30] & Inst[29] & ~Inst[28] & ~Inst[27] & ~Inst[26];
    wire I_sh = Inst[31] & ~Inst[30] & Inst[29] & ~Inst[28] & ~Inst[27] & Inst[26];
    wire I_sw = Inst[31] & ~Inst[30] & Inst[29] & ~Inst[28] & Inst[27] & Inst[26];
    wire I_add = R_type & Func[5] & ~Func[4] & ~Func[3] & ~Func[2] & ~Func[1] & ~Func[0];
    wire I_addu = R_type & Func[5] & ~Func[4] & ~Func[3] & ~Func[2] & ~Func[1] & Func[0];
    wire I_sub = R_type & Func[5] & ~Func[4] & ~Func[3] & ~Func[2] & Func[1] & ~Func[0];
    wire I_subu = R_type & Func[5] & ~Func[4] & ~Func[3] & ~Func[2] & Func[1] & Func[0];
    wire I_sll = R_type & ~Func[5] & ~Func[4] & ~Func[3] & ~Func[2] & ~Func[1] & ~Func[0] & (Inst!=0);
    wire I_srl = R_type & ~Func[5] & ~Func[4] & ~Func[3] & ~Func[2] & Func[1] & ~Func[0];
    wire I_sra = R_type & ~Func[5] & ~Func[4] & ~Func[3] & ~Func[2] & Func[1] & Func[0];
    wire I_sllv = R_type & ~Func[5] & ~Func[4] & ~Func[3] & Func[2] & ~Func[1] & ~Func[0] & (Inst!=0);
    wire I_srlv = R_type & ~Func[5] & ~Func[4] & ~Func[3] & Func[2] & Func[1] & ~Func[0];
    wire I_srav = R_type & ~Func[5] & ~Func[4] & ~Func[3] & Func[2] & Func[1] & Func[0];
    wire I_and = R_type & Func[5] & ~Func[4] & ~Func[3] & Func[2] & ~Func[1] & ~Func[0];
    wire I_or = R_type & Func[5] & ~Func[4] & ~Func[3] & Func[2] & ~Func[1] & Func[0];
    wire I_xor = R_type & Func[5] & ~Func[4] & ~Func[3] & Func[2] & Func[1] & ~Func[0];
    wire I_nor = R_type & Func[5] & ~Func[4] & ~Func[3] & Func[2] & Func[1] & Func[0];
    wire I_addi = ~Inst[31] & ~Inst[30] & Inst[29] & ~Inst[28] & ~Inst[27] & ~Inst[26];
    wire I_addiu = ~Inst[31] & ~Inst[30] & Inst[29] & ~Inst[28] & ~Inst[27] & Inst[26];
    wire I_andi = ~Inst[31] & ~Inst[30] & Inst[29] & Inst[28] & ~Inst[27] & ~Inst[26];
    wire I_ori = ~Inst[31] & ~Inst[30] & Inst[29] & Inst[28] & ~Inst[27] & Inst[26];
    wire I_xori = ~Inst[31] & ~Inst[30] & Inst[29] & Inst[28] & Inst[27] & ~Inst[26];
    wire I_lui = ~Inst[31] & ~Inst[30] & Inst[29] & Inst[28] & Inst[27] & Inst[26];
    wire I_slt = R_type & Func[5] & ~Func[4] & Func[3] & ~Func[2] & Func[1] & ~Func[0];
    wire I_slti = ~Inst[31] & ~Inst[30] & Inst[29] & ~Inst[28] & Inst[27] & ~Inst[26];
    wire I_sltiu = ~Inst[31] & ~Inst[30] & Inst[29] & ~Inst[28] & Inst[27] & Inst[26];
    wire I_sltu = R_type & Func[5] & ~Func[4] & Func[3] & ~Func[2] & Func[1] & Func[0];
    wire I_beq = ~Inst[31] & ~Inst[30] & ~Inst[29] & Inst[28] & ~Inst[27] & ~Inst[26];
    wire I_bne = ~Inst[31] & ~Inst[30] & ~Inst[29] & Inst[28] & ~Inst[27] & Inst[26];
    wire I_blez = ~Inst[31] & ~Inst[30] & ~Inst[29] & Inst[28] & Inst[27] & ~Inst[26];
    wire I_bgtz = ~Inst[31] & ~Inst[30] & ~Inst[29] & Inst[28] & Inst[27] & Inst[26];
    wire I_bltz = ~Inst[31] & ~Inst[30] & ~Inst[29] & ~Inst[28] & ~Inst[27] & Inst[26] & ~ID_B_code;
    wire I_bgez = ~Inst[31] & ~Inst[30] & ~Inst[29] & ~Inst[28] & ~Inst[27] & Inst[26] & ID_B_code;
    wire I_j = ~Inst[31] & ~Inst[30] & ~Inst[29] & ~Inst[28] & Inst[27] & ~Inst[26];
    wire I_jal = ~Inst[31] & ~Inst[30] & ~Inst[29] & ~Inst[28] & Inst[27] & Inst[26];
    wire I_jalr = R_type & ~Func[5] & ~Func[4] & Func[3] & ~Func[2] & ~Func[1] & Func[0];
    wire I_jr = R_type & ~Func[5] & ~Func[4] & Func[3] & ~Func[2] & ~Func[1] & ~Func[0];
    wire I_mult = R_type & ~Func[5] & Func[4] & Func[3] & ~Func[2] & ~Func[1] & ~Func[0];
    wire I_multu = R_type & ~Func[5] & Func[4] & Func[3] & ~Func[2] & ~Func[1] & Func[0];
    wire I_div = R_type & ~Func[5] & Func[4] & Func[3] & ~Func[2] & Func[1] & ~Func[0];
    wire I_divu = R_type & ~Func[5] & Func[4] & Func[3] & ~Func[2] & Func[1] & Func[0];
    wire I_mthi = R_type & ~Func[5] & Func[4] & ~Func[3] & ~Func[2] & ~Func[1] & Func[0];
    wire I_mtlo = R_type & ~Func[5] & Func[4] & ~Func[3] & ~Func[2] & Func[1] & Func[0];
    wire I_mfhi = R_type & ~Func[5] & Func[4] & ~Func[3] & ~Func[2] & ~Func[1] & ~Func[0];
    wire I_mflo = R_type & ~Func[5] & Func[4] & ~Func[3] & ~Func[2] & Func[1] & ~Func[0];  
      
    assign RegDst = I_lb | I_lbu | I_lh | I_lhu | I_lw | I_addi | I_addiu | I_andi | I_ori | I_xori | I_lui | I_slti | I_sltiu;
    assign Se = I_lb | I_lbu | I_lh | I_lhu | I_lw | I_sb | I_sh | I_sw | I_addi | I_addiu | I_slti | I_sltiu | I_beq | I_bne | I_blez | I_bgtz | I_bltz | I_bgez;
    assign RegWrite = I_lb | I_lbu | I_lh | I_lhu | I_lw | I_add | I_addu | I_sub | I_subu | I_sll | I_srl | I_sra | I_sllv | I_srlv | I_srav | I_and | I_or | I_xor | I_nor | I_addi | I_addiu | I_andi | I_ori | I_xori | I_lui | I_slt | I_slti | I_sltiu | I_sltu | I_mfhi | I_mflo;
    assign ALUXSrc = I_sll | I_srl | I_sra;
    assign ALUYSrc = I_add | I_addu | I_sub | I_subu | I_and | I_or | I_xor | I_nor | I_slt | I_sltu | I_beq | I_bne | I_j | I_jal | I_jalr | I_jr | I_sll | I_srl | I_sra | I_sllv | I_srlv | I_srav;
    assign ALUControl[0] =I_sub | I_subu | I_sll | I_sllv | I_or | I_nor | I_ori | I_slt | I_slti | I_sltiu | I_sltu | I_beq | I_bne | I_bgtz | I_bgez;
    assign ALUControl[1] =I_sll | I_sllv | I_and | I_or | I_andi | I_ori | I_lui | I_blez | I_bgtz | I_sra | I_srav;
    assign ALUControl[2] =I_sll | I_sllv | I_xor | I_nor | I_xori | I_lui | I_bltz | I_bgez | I_sra | I_srav;
    assign ALUControl[3] =I_srl | I_sra | I_srlv | I_srav | I_slt | I_slti | I_sltiu | I_sltu | I_blez | I_bgtz | I_bltz | I_bgez | I_sll | I_sllv;
    assign MemWrite = I_sb | I_sh | I_sw;
    assign PCSrc[0] = (I_blez & c_adventure) | (I_bgtz & c_adventure)| (I_bltz & c_adventure)| (I_bgez & c_adventure)| I_jal | I_jalr | (I_beq & c_adventure) | (I_bne & ~c_adventure);
    assign PCSrc[1] = I_j | I_jal;
    assign PCSrc[2] = I_jalr | I_jr;
    assign MemtoReg = I_lb | I_lbu | I_lh | I_lhu | I_lw;
    assign load_option[0] = I_lb | I_lbu | I_lh | I_lhu;
    assign load_option[1] = I_lh | I_lhu;
    assign load_option[2] = I_lh | I_lb;
    assign save_option[0] = I_sb;
    assign save_option[1] = I_sh;
    assign usigned = I_lbu | I_lhu | I_addu | I_subu | I_addiu | I_sltiu | I_sltu;
    assign md_control[0] = I_multu | I_divu | I_mtlo | I_mflo;
    assign md_control[1] = I_div | I_divu | I_mfhi | I_mflo;
    assign md_control[2] = I_mthi | I_mtlo | I_mfhi | I_mflo;
    assign md_signal = I_mfhi | I_mflo;
    assign ID_mfhi = I_mfhi;
    assign ID_mflo = I_mflo;
endmodule

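I define signals such as I_add to make debugging easier, since they show which instruction is currently executing, and they also simplify assigning the various control signals afterwards.

The code above was generated with Python. I have uploaded my Python script to GitHub as well; you can use it directly or write your own, it is quite simple.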
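
If you want to write your own generator, the following is a minimal sketch of the idea. It is not the script from the repository; the table contents, function names, and layout are my own assumptions, and only a handful of instructions are listed. It prints decode wires in the same form as the CU module above.

```python
# Opcode (Inst[31:26]) for I/J-type and funct (Func[5:0]) for R-type instructions.
OPCODES = {"lw": "100011", "sw": "101011", "beq": "000100"}
FUNCTS = {"add": "100000", "sub": "100010", "jr": "001000"}


def bit_terms(signal, msb, bits):
    """Turn a bit string into 'sig[msb] & ~sig[msb-1] & ...'."""
    return " & ".join(
        ("" if bit == "1" else "~") + f"{signal}[{msb - i}]"
        for i, bit in enumerate(bits)
    )


def decode_wire(name):
    """Build the Verilog wire declaration that recognizes one instruction."""
    if name in FUNCTS:
        return f"wire I_{name} = R_type & {bit_terms('Func', 5, FUNCTS[name])};"
    return f"wire I_{name} = {bit_terms('Inst', 31, OPCODES[name])};"


for instruction in ("lw", "add"):
    print(decode_wire(instruction))
```

Generating the lines from a table keeps the encodings in one place, so a wrong bit is fixed once instead of being hunted down in a long hand-written expression.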

RegisterFile

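The register file must read the values stored in its registers and also write values into them. jal and jalr also write a register, which can be detected from the value of PCSrc. For the conflict where a register is read and written in the same cycle, reads are always combinational and writes happen on the falling edge of Clk, which ensures the new value is visible to a read in the same cycle.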

module RegisterFile(ReadReg1,ReadReg2,rd,WriteData,WriteReg,RegWrite,CLK,Reset,ReadData1,ReadData2,PCSrc,PcAdd4);
    input [4:0] ReadReg1,rd;   // rs, and rd (link destination for jalr)
    input [4:0] ReadReg2;      // rt
    input [31:0] WriteData;    // data to write
    input [4:0] WriteReg;      // write address
    input RegWrite;            // write enable
    input CLK;
    input Reset;
    input [2:0] PCSrc;
    input [31:0] PcAdd4;
    output [31:0] ReadData1;
    output [31:0] ReadData2;

    reg [31:0] regFile[31:0];

    integer i;
    initial begin
        for (i=0;i<32;i=i+1)
            regFile[i] = 0;
    end

    // Reads are combinational
    assign    ReadData1 = regFile[ReadReg1];
    assign    ReadData2 = regFile[ReadReg2];

    // All writes happen on the falling edge, so a value written here
    // can be read by the ID stage within the same clock cycle
    always@(negedge CLK)
    begin
        if(RegWrite && WriteReg)    // normal write-back; register 0 is never written
            regFile[WriteReg] <= WriteData;
        if(PCSrc == 3'b011)         // jal: the link address goes to $31
            regFile[31] <= PcAdd4+4;
        if(PCSrc == 3'b101)         // jalr: the link address goes to rd
            regFile[rd] <= PcAdd4+4;
    end
endmodule
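
The read-any-time, write-on-the-falling-edge behavior can be pictured with a tiny Python model. This is only a sketch of the timing idea with hypothetical names (RegFileModel, falling_edge); it ignores the jal/jalr link writes.

```python
class RegFileModel:
    """Register file whose writes land in the middle of the clock cycle."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, index):
        # Combinational read: always returns the current contents.
        return self.regs[index]

    def falling_edge(self, reg_write, write_reg, write_data):
        # Write-back happens on the falling edge; register 0 is never written.
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegFileModel()
early_read = rf.read(5)          # first half of the cycle: old value
rf.falling_edge(True, 5, 42)     # WB stage writes $5 at mid-cycle
late_read = rf.read(5)           # second half of the same cycle: new value
print(early_read, late_read)
```

Because the ID/EX register samples the read data on the next rising edge, it captures the value from the second half of the cycle, so no forwarding path is needed between WB and ID.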

if_c_adventure

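This module decides in the ID stage whether a branch is taken. For instructions such as bne and beq it checks whether the operands are equal; for instructions that compare against 0 to decide whether to jump, the decision is made in this module as well.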

module if_c_adventure(A,B,Op,usigned,c_adventure);
    input [31:0] A,B;
    input [3:0] Op;
    input usigned;
    output c_adventure;

    wire less_res,less_v_res;
    wire unsigned_less_res,unsigned_less_v_res;
    wire less_equal_res;
    wire greater_equal_res;
    wire greater_res;
    wire eq_res;
    
	assign less_v_res = ($signed(A)<$signed(B)) ? 1:0;
	assign unsigned_less_v_res = A<B ? 1:0;
    assign less_res = $signed(A)<0 ? 1:0;
	//assign unsigned_less_res = A<0 ? 1:0;    
	assign less_equal_res = $signed(A)<=0 ? 1:0;
	assign greater_equal_res = $signed(A)>=0 ? 1:0;
	assign greater_res = $signed(A)>0 ? 1:0;
	assign eq_res = A==B?1:0;
	
	// Op[3:1]==000: equality test; Op[2:0]==001: A<B (signed or unsigned); otherwise A is compared with 0
	assign c_adventure = (~Op[3]&&~Op[2]&&~Op[1]) ? eq_res :
	                     (Op[2]==0 && Op[1]==0 && Op[0]==1) ? (usigned==1 ? unsigned_less_v_res : less_v_res) :
	                     (Op[2]==1) ? (Op[0]==1 ? greater_equal_res : less_res) :
	                                  (Op[0]==1 ? greater_res : less_equal_res);

endmodule
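
To sanity-check the Op decoding, the comparison can be modeled in a few lines of Python. The Op values in the comments (0001 for beq/bne, 1010 for blez, 1011 for bgtz, 1100 for bltz, 1101 for bgez) are what I get by working through the ALUControl assignments in the CU module above; the helper names are hypothetical.

```python
def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def branch_condition(a, b, op, usigned=False):
    """Python model of the c_adventure expression."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    if (op >> 1) & 0b111 == 0:             # Op[3:1] == 000: beq / bne
        return a == b
    if op & 0b111 == 0b001:                # Op[2:0] == 001: A < B
        return a < b if usigned else to_signed(a) < to_signed(b)
    signed_a = to_signed(a)
    if op & 0b100:                         # Op[2] == 1: bgez / bltz
        return signed_a >= 0 if op & 1 else signed_a < 0
    return signed_a > 0 if op & 1 else signed_a <= 0   # bgtz / blez


print(branch_condition(0xFFFFFFFF, 0, 0b1100))   # bltz with A = -1
print(branch_condition(0, 0, 0b1011))            # bgtz with A = 0
```

Note that bne shares the equality result with beq; the CU module inverts it when it builds PCSrc.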

FU

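The hazard detection unit checks for data hazards and outputs FwdA and FwdB, which indicate whether a value must be forwarded from EX/MEM or MEM/WB. It also decides whether stall or stallstall is needed. The details are all explained in the "Before We Start" section.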

module FU(ID_mfhi,ID_mflo,E_md_signal,E_RegWrite,E_WriteReg,E_MemtoReg,M_RegWrite,M_WriteReg,M_MemtoReg,ID_rs,ID_rt,ID_FwdA,ID_FwdB,ID_Op,ID_func,c_adventure,stall,stallstall);
    input [4:0] E_WriteReg,M_WriteReg,ID_rs,ID_rt;
    input  E_RegWrite,M_RegWrite,E_MemtoReg,M_MemtoReg,c_adventure,E_md_signal,ID_mfhi,ID_mflo;
    input [5:0] ID_Op,ID_func;
    output reg [2:0] ID_FwdA,ID_FwdB;
    output stall,stallstall;

    always@(*)begin   // combinational; the block also reads ID_mfhi and ID_mflo
        ID_FwdA=3'b000;
        if((ID_rs==E_WriteReg)&(E_WriteReg!=0)&(E_RegWrite==1))begin
            ID_FwdA=3'b001;
        end 
        else begin
            if((ID_rs==M_WriteReg)&(M_WriteReg!=0)&(M_RegWrite==1))begin
                ID_FwdA=3'b010;
            end
        end
        if((ID_rs==E_WriteReg)&(E_WriteReg!=0)&(E_RegWrite==1)&&ID_mfhi)begin
            ID_FwdA=3'b100;
        end 
        else begin
            if((ID_rs==E_WriteReg)&(E_WriteReg!=0)&(E_RegWrite==1)&&ID_mflo)begin
                ID_FwdA=3'b101;
            end
        end
    end

    always@(*)begin   // combinational; the block also reads ID_mfhi and ID_mflo
        ID_FwdB=3'b000;
        if((ID_rt==E_WriteReg)&(E_WriteReg!=0)&(E_RegWrite==1))begin
            ID_FwdB=3'b001;
        end 
        else begin
            if((ID_rt==M_WriteReg)&(M_WriteReg!=0)&(M_RegWrite==1))begin
                ID_FwdB=3'b010;
            end
        end
        if((ID_rt==E_WriteReg)&(E_WriteReg!=0)&(E_RegWrite==1)&&ID_mfhi)begin
            ID_FwdB=3'b100;
        end 
        else begin
            if((ID_rt==E_WriteReg)&(E_WriteReg!=0)&(E_RegWrite==1)&&ID_mflo)begin
                ID_FwdB=3'b101;
            end
        end
    end

    wire ID_beq=~ID_Op[5]&~ID_Op[4]&~ID_Op[3]&ID_Op[2]&~ID_Op[1]&~ID_Op[0];
    wire ID_bne=~ID_Op[5]&~ID_Op[4]&~ID_Op[3]&ID_Op[2]&~ID_Op[1]&ID_Op[0];
    wire ID_jalr=~ID_Op[5]&~ID_Op[4]&~ID_Op[3]&~ID_Op[2]&~ID_Op[1]&~ID_Op[0]&~ID_func[5]&~ID_func[4]&ID_func[3]&~ID_func[2]&~ID_func[1]&ID_func[0];
    
    //lw+add  add+beq(jalr) lw+x+beq
    assign stall=(((ID_rs==E_WriteReg)|(ID_rt==E_WriteReg))&(E_MemtoReg==1)&(E_WriteReg!=0)&(E_RegWrite==1))|((ID_beq | ID_bne | ID_jalr)&((ID_rs==E_WriteReg)|(ID_rt==E_WriteReg))&(E_WriteReg!=0)&(E_RegWrite==1))|(((ID_rs==M_WriteReg)|(ID_rt==M_WriteReg))&(M_MemtoReg==1)&(M_WriteReg!=0)&(M_RegWrite==1)&ID_beq);
    //lw+beq
    assign stallstall=((ID_rs==E_WriteReg)|(ID_rt==E_WriteReg))&(E_MemtoReg==1)&(E_WriteReg!=0)&(E_RegWrite==1)&ID_beq;

endmodule

REG_ID_EX

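This register passes signals along. Be sure to initialize it: at first I had not initialized the inter-stage registers, and the pipeline would not run. When a nop has to be inserted, the register must be cleared so that an empty instruction enters the pipeline. Otherwise it simply passes the signals from ID to EX, namely: ID_md_signal, ID_B_code, ID_sa, ID_RegDst, ID_RegWrite, ID_ALUXSrc, ID_ALUYSrc, ID_ALUControl, ID_md_control, ID_MemWrite, ID_MemtoReg, ID_WriteReg, ID_usigned, ID_Qa, ID_Qb, ID_ext32, ID_FwdA, ID_FwdB, ID_load_option, ID_save_option.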

module REG_ID_EX(ID_md_signal,ID_B_code,ID_sa,ID_RegDst,ID_RegWrite,ID_ALUXSrc,ID_ALUYSrc,ID_ALUControl,ID_md_control,ID_MemWrite,ID_MemtoReg,ID_WriteReg,ID_usigned,ID_Qa,ID_Qb,ID_ext32,ID_FwdA,ID_FwdB,ID_load_option,ID_save_option,Clk,Reset,
E_md_signal,E_B_code,E_sa,E_RegDst,E_RegWrite,E_ALUXSrc,E_ALUYSrc,E_ALUControl,E_md_control,E_MemWrite,E_MemtoReg,E_WriteReg,E_usigned,E_Qa,E_Qb,E_ext32,E_FwdA,E_FwdB,E_load_option,E_save_option,stall,stallstall);

    input [31:0] ID_Qa,ID_Qb,ID_ext32,ID_sa;
    input [4:0] ID_WriteReg;
    input [2:0] ID_FwdA,ID_FwdB;
    input [3:0] ID_ALUControl;
    input [2:0] ID_load_option,ID_md_control;
    input [1:0] ID_save_option;
    input Clk,Reset,stall,stallstall;
    input ID_RegDst,ID_RegWrite,ID_ALUYSrc,ID_MemWrite,ID_MemtoReg,ID_ALUXSrc,ID_usigned,ID_B_code,ID_md_signal;

    wire clr;
    assign clr=(~stall)&(~stallstall);

    output reg[31:0] E_Qa,E_Qb,E_ext32,E_sa;
    output reg[2:0] E_FwdA,E_FwdB;
    output reg[3:0] E_ALUControl;
    output reg[4:0] E_WriteReg;
    output reg[2:0] E_load_option,E_md_control;
    output reg[1:0] E_save_option;
    output reg E_RegDst,E_RegWrite,E_ALUYSrc,E_MemWrite,E_MemtoReg,E_ALUXSrc,E_usigned,E_B_code,E_md_signal;

    initial begin
        E_sa = 0;
        E_ALUControl = 0;
        E_ALUXSrc = 0;
        E_ALUYSrc = 0;
        E_ext32 = 0;
        E_FwdA = 0;
        E_FwdB = 0;
        E_MemtoReg = 0;
        E_MemWrite = 0;
        E_Qa = 0;
        E_Qb = 0;
        E_RegDst = 0;
        E_RegWrite = 0;
        E_WriteReg = 0;  
        E_usigned = 0;
        E_B_code = 0;
        E_load_option = 0;
        E_save_option = 0;
        E_md_control = 0;
        E_md_signal = 0;
    end

    // Synchronous update; non-blocking assignments avoid races between the stage registers
    always @(posedge Clk)  
    begin  
    if (!Reset || clr==0)   // active-low reset or a stall: clear the register, which inserts a nop
        begin  
            E_sa <= 0;
            E_ALUControl <= 0;
            E_ALUXSrc <= 0;
            E_ALUYSrc <= 0;
            E_ext32 <= 0;
            E_FwdA <= 0;
            E_FwdB <= 0;
            E_MemtoReg <= 0;
            E_MemWrite <= 0;
            E_Qa <= 0;
            E_Qb <= 0;
            E_RegDst <= 0;
            E_RegWrite <= 0;
            E_WriteReg <= 0;  
            E_usigned <= 0;
            E_B_code <= 0;
            E_load_option <= 0;
            E_save_option <= 0;
            E_md_control <= 0;
            E_md_signal <= 0;
        end  
    else   
        begin
            E_sa <= ID_sa;
            E_ALUControl <= ID_ALUControl;
            E_ALUXSrc <= ID_ALUXSrc;
            E_ALUYSrc <= ID_ALUYSrc;
            E_ext32 <= ID_ext32;
            E_FwdA <= ID_FwdA;
            E_FwdB <= ID_FwdB;
            E_MemtoReg <= ID_MemtoReg;
            E_MemWrite <= ID_MemWrite;
            E_Qa <= ID_Qa;
            E_Qb <= ID_Qb;
            E_RegDst <= ID_RegDst;
            E_RegWrite <= ID_RegWrite;
            E_WriteReg <= ID_WriteReg;  
            E_usigned <= ID_usigned;
            E_B_code <= ID_B_code;
            E_load_option <= ID_load_option;
            E_save_option <= ID_save_option;
            E_md_control <= ID_md_control;
            E_md_signal <= ID_md_signal;
        end  
    end

endmodule

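EX stage

ALU

Here I reuse the ALU we wrote for an earlier assignment. It consists of four submodules, adder, shifter, aoxn, and leg, which handle addition and subtraction, shifts, the bitwise logic operations (and, or, xor, nor), and comparisons, respectively.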

module ADDER(input [31:0] A, input [31:0]B, input cin, output [31:0] s, output [0:0]cout);

	wire[31:0] g;
	wire[31:0] p;
	wire[31:0] c;

	assign g = A & B;
	assign p = A ^ B;

	assign c[0] = g[0] | (p[0] & cin);
	assign c[1] = g[1] | (p[1] & (g[0] | (p[0] & cin)));
	assign c[2] = g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))));
	assign c[3] = g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))));
	assign c[4] = g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))));
	assign c[5] = g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))));
	assign c[6] = g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))));
	assign c[7] = g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))));
	assign c[8] = g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))));
	assign c[9] = g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))));
	assign c[10] = g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))));
	assign c[11] = g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))));
	assign c[12] = g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))));
	assign c[13] = g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))));
	assign c[14] = g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))));
	assign c[15] = g[15] | (p[15] & (g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))))));
	assign c[16] = g[16] | (p[16] & (g[15] | (p[15] & (g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))))))));
	assign c[17] = g[17] | (p[17] & (g[16] | (p[16] & (g[15] | (p[15] & (g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))))))))));
	assign c[18] = g[18] | (p[18] & (g[17] | (p[17] & (g[16] | (p[16] & (g[15] | (p[15] & (g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))))))))))));
	assign c[19] = g[19] | (p[19] & (g[18] | (p[18] & (g[17] | (p[17] & (g[16] | (p[16] & (g[15] | (p[15] & (g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))))))))))))));
	assign c[20] = g[20] | (p[20] & (g[19] | (p[19] & (g[18] | (p[18] & (g[17] | (p[17] & (g[16] | (p[16] & (g[15] | (p[15] & (g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))))))))))))))));
	assign c[21] = g[21] | (p[21] & (g[20] | (p[20] & (g[19] | (p[19] & (g[18] | (p[18] & (g[17] | (p[17] & (g[16] | (p[16] & (g[15] | (p[15] & (g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))))))))))))))))));
	assign c[22] = g[22] | (p[22] & (g[21] | (p[21] & (g[20] | (p[20] & (g[19] | (p[19] & (g[18] | (p[18] & (g[17] | (p[17] & (g[16] | (p[16] & (g[15] | (p[15] & (g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))))))))))))))))))));
	assign c[23] = g[23] | (p[23] & (g[22] | (p[22] & (g[21] | (p[21] & (g[20] | (p[20] & (g[19] | (p[19] & (g[18] | (p[18] & (g[17] | (p[17] & (g[16] | (p[16] & (g[15] | (p[15] & (g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))))))))))))))))))))));
	assign c[24] = g[24] | (p[24] & (g[23] | (p[23] & (g[22] | (p[22] & (g[21] | (p[21] & (g[20] | (p[20] & (g[19] | (p[19] & (g[18] | (p[18] & (g[17] | (p[17] & (g[16] | (p[16] & (g[15] | (p[15] & (g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))))))))))))))))))))))));
	assign c[25] = g[25] | (p[25] & (g[24] | (p[24] & (g[23] | (p[23] & (g[22] | (p[22] & (g[21] | (p[21] & (g[20] | (p[20] & (g[19] | (p[19] & (g[18] | (p[18] & (g[17] | (p[17] & (g[16] | (p[16] & (g[15] | (p[15] & (g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))))))))))))))))))))))))));
	assign c[26] = g[26] | (p[26] & (g[25] | (p[25] & (g[24] | (p[24] & (g[23] | (p[23] & (g[22] | (p[22] & (g[21] | (p[21] & (g[20] | (p[20] & (g[19] | (p[19] & (g[18] | (p[18] & (g[17] | (p[17] & (g[16] | (p[16] & (g[15] | (p[15] & (g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))))))))))))))))))))))))))));
	assign c[27] = g[27] | (p[27] & (g[26] | (p[26] & (g[25] | (p[25] & (g[24] | (p[24] & (g[23] | (p[23] & (g[22] | (p[22] & (g[21] | (p[21] & (g[20] | (p[20] & (g[19] | (p[19] & (g[18] | (p[18] & (g[17] | (p[17] & (g[16] | (p[16] & (g[15] | (p[15] & (g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))))))))))))))))))))))))))))));
	assign c[28] = g[28] | (p[28] & (g[27] | (p[27] & (g[26] | (p[26] & (g[25] | (p[25] & (g[24] | (p[24] & (g[23] | (p[23] & (g[22] | (p[22] & (g[21] | (p[21] & (g[20] | (p[20] & (g[19] | (p[19] & (g[18] | (p[18] & (g[17] | (p[17] & (g[16] | (p[16] & (g[15] | (p[15] & (g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))))))))))))))))))))))))))))))));
	assign c[29] = g[29] | (p[29] & (g[28] | (p[28] & (g[27] | (p[27] & (g[26] | (p[26] & (g[25] | (p[25] & (g[24] | (p[24] & (g[23] | (p[23] & (g[22] | (p[22] & (g[21] | (p[21] & (g[20] | (p[20] & (g[19] | (p[19] & (g[18] | (p[18] & (g[17] | (p[17] & (g[16] | (p[16] & (g[15] | (p[15] & (g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))))))))))))))))))))))))))))))))));
	assign c[30] = g[30] | (p[30] & (g[29] | (p[29] & (g[28] | (p[28] & (g[27] | (p[27] & (g[26] | (p[26] & (g[25] | (p[25] & (g[24] | (p[24] & (g[23] | (p[23] & (g[22] | (p[22] & (g[21] | (p[21] & (g[20] | (p[20] & (g[19] | (p[19] & (g[18] | (p[18] & (g[17] | (p[17] & (g[16] | (p[16] & (g[15] | (p[15] & (g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))));
	assign c[31] = g[31] | (p[31] & (g[30] | (p[30] & (g[29] | (p[29] & (g[28] | (p[28] & (g[27] | (p[27] & (g[26] | (p[26] & (g[25] | (p[25] & (g[24] | (p[24] & (g[23] | (p[23] & (g[22] | (p[22] & (g[21] | (p[21] & (g[20] | (p[20] & (g[19] | (p[19] & (g[18] | (p[18] & (g[17] | (p[17] & (g[16] | (p[16] & (g[15] | (p[15] & (g[14] | (p[14] & (g[13] | (p[13] & (g[12] | (p[12] & (g[11] | (p[11] & (g[10] | (p[10] & (g[9] | (p[9] & (g[8] | (p[8] & (g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & (g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & (g[0] | (p[0] & cin)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))));

	assign s[0] = p[0] ^ cin;
	assign s[1] = p[1] ^ c[0];
	assign s[2] = p[2] ^ c[1];
	assign s[3] = p[3] ^ c[2];
	assign s[4] = p[4] ^ c[3];
	assign s[5] = p[5] ^ c[4];
	assign s[6] = p[6] ^ c[5];
	assign s[7] = p[7] ^ c[6];
	assign s[8] = p[8] ^ c[7];
	assign s[9] = p[9] ^ c[8];
	assign s[10] = p[10] ^ c[9];
	assign s[11] = p[11] ^ c[10];
	assign s[12] = p[12] ^ c[11];
	assign s[13] = p[13] ^ c[12];
	assign s[14] = p[14] ^ c[13];
	assign s[15] = p[15] ^ c[14];
	assign s[16] = p[16] ^ c[15];
	assign s[17] = p[17] ^ c[16];
	assign s[18] = p[18] ^ c[17];
	assign s[19] = p[19] ^ c[18];
	assign s[20] = p[20] ^ c[19];
	assign s[21] = p[21] ^ c[20];
	assign s[22] = p[22] ^ c[21];
	assign s[23] = p[23] ^ c[22];
	assign s[24] = p[24] ^ c[23];
	assign s[25] = p[25] ^ c[24];
	assign s[26] = p[26] ^ c[25];
	assign s[27] = p[27] ^ c[26];
	assign s[28] = p[28] ^ c[27];
	assign s[29] = p[29] ^ c[28];
	assign s[30] = p[30] ^ c[29];
	assign s[31] = p[31] ^ c[30];
	
	assign cout[0] = c[31];
endmodule

module SHIFT(input [31:0] B, input[3:0] Op, input [31:0] A, input usigned,output [31:0] res);
	wire [31:0] left_shift;
	wire [31:0] right_shift;
	wire [31:0] aright_shift;

	assign left_shift = B << A;
	assign right_shift = B >> A;
	assign aright_shift = $signed(B)>>> A;

	assign res = (Op[3] == 1) ? left_shift:(usigned == 1 ? aright_shift : right_shift);
endmodule

module AOXN(input[31:0] A, input[31:0] B, input[3:0] Op, output [31:0] res);
	wire [31:0] and_res;
	wire [31:0] or_res;
	wire [31:0] xor_res;
	wire [31:0] nor_res;
	
    
	assign and_res = A & B;
	assign or_res = A | B;
	assign xor_res = A ^ B;
	assign nor_res = ~or_res;

	assign res = Op[2] == 0 ? (Op[0]==0 ? and_res : or_res) : (Op[0] == 0 ? xor_res : nor_res);
endmodule

module LEG(input[31:0] A, input[31:0] B, input[3:0] Op ,input usigned,output [31:0] res);
    wire less_res,less_v_res;
    wire unsigned_less_res,unsigned_less_v_res;
    wire less_equal_res;
    wire greater_equal_res;
    wire greater_res;
    wire eq_res;
    
	assign less_v_res = ($signed(A)<$signed(B)) ? 1:0;
	assign unsigned_less_v_res = A<B ? 1:0;
    assign less_res = $signed(A)<0 ? 1:0;
	//assign unsigned_less_res = A<0 ? 1:0;    
	assign less_equal_res = $signed(A)<=0 ? 1:0;
	assign greater_equal_res = $signed(A)>=0 ? 1:0;
	assign greater_res = $signed(A)>0 ? 1:0;
	assign eq_res = A==B?1:0;
	
	assign res = (~Op[3]&&~Op[2]&&~Op[1])?eq_res:(Op[2]==0 && Op[1]==0 && Op[0]==1) ? (usigned==1 ? unsigned_less_v_res: less_v_res):(Op[2]==1 ? (Op[0]==1 ? greater_equal_res : less_res):(Op[0]==1 ? greater_res : less_equal_res)); 	
endmodule

 module ALU(ReadData1, ReadData2,ALUOp,usigned,result,zero,over);
    input [31:0] ReadData1;
    input [31:0] ReadData2;
    input [3:0] ALUOp;
    input usigned;
    output [31:0] result;
    output zero;
    output over;
    
    wire [31:0] shift_res,aoxn_res,sum_res,leg_res;
    wire [31:0]b_input;
	wire cout;
	assign b_input = ALUOp[0] ? ~ReadData2 : ReadData2;
    
    SHIFT my_shift(ReadData2, ALUOp, ReadData1, usigned,shift_res);
	AOXN my_aoxn(ReadData1, ReadData2, ALUOp, aoxn_res);
	ADDER my_adder(ReadData1, b_input, ALUOp[0], sum_res, cout);
    LEG my_leg(ReadData1, ReadData2, ALUOp,usigned,leg_res);
    
	assign over = (~ALUOp[3] && ~ALUOp[2] && ~ALUOp[1]) ? ((usigned) & ((ReadData1[31] == b_input[31]) && (ReadData1[31] != sum_res[31]))) : 0;
    
    assign result = (ALUOp[3])?((~ALUOp[2]&&~ALUOp[1]&&ALUOp[0])?leg_res:shift_res):((~ALUOp[2]&&~ALUOp[1])? sum_res : (ALUOp[2]&&ALUOp[1])? (ReadData2<<16) :aoxn_res);
    
    assign zero = (result == 0) ? 1 : 0;
endmodule

REG_EX_MEM

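Passes signals from the EX stage to the MEM stage. Note that the registers must be initialized at the start. The signals carried over are: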

E_md_control,E_md_signal,E_res_hi,E_res_lo,E_RegWrite,E_RegDst,E_MemWrite,E_MemtoReg,E_WriteReg,E_Qb,E_ALUanswer,E_load_option,E_save_option

module REG_EX_MEM(E_md_control,E_md_signal,E_res_hi,E_res_lo,E_RegWrite,E_RegDst,E_MemWrite,E_MemtoReg,E_WriteReg,E_Qb,E_ALUanswer,E_load_option,E_save_option,Clk,Reset,
M_md_control,M_md_signal,M_res_hi,M_res_lo,M_RegWrite,M_RegDst,M_MemWrite,M_MemtoReg,M_WriteReg,M_Qb,M_ALUanswer,M_load_option,M_save_option);

    input [31:0] E_ALUanswer,E_Qb,E_res_hi,E_res_lo;
    input [4:0] E_WriteReg;
    input [2:0] E_load_option,E_md_control;
    input [1:0] E_save_option;
    input Clk,Reset,E_md_signal;
    input E_RegWrite,E_RegDst,E_MemWrite,E_MemtoReg;

    output reg[31:0] M_ALUanswer ,M_Qb,M_res_hi,M_res_lo;
    output reg[4:0] M_WriteReg;
    output reg[2:0] M_load_option,M_md_control;
    output reg[1:0] M_save_option;
    output reg M_RegWrite,M_RegDst,M_MemWrite,M_MemtoReg,M_md_signal;

    initial begin
        M_ALUanswer = 0;
        M_MemtoReg = 0;
        M_MemWrite = 0;
        M_Qb = 0;
        M_RegDst = 0;
        M_RegWrite = 0;
        M_WriteReg = 0;
        M_load_option = 0;
        M_save_option = 0;
        M_md_signal = 0;
        M_res_hi = 0;
        M_res_lo = 0;
        M_md_control = 0;
    end

    // Non-blocking assignments (<=) are used so that every pipeline register
    // samples its inputs before any register updates on the same clock edge
    always @(posedge Clk or negedge Reset)
    begin
    if (!Reset)
        begin
            M_ALUanswer <= 0;
            M_MemtoReg <= 0;
            M_MemWrite <= 0;
            M_Qb <= 0;
            M_RegDst <= 0;
            M_RegWrite <= 0;
            M_WriteReg <= 0;
            M_load_option <= 0;
            M_save_option <= 0;
            M_md_signal <= 0;
            M_res_hi <= 0;
            M_res_lo <= 0;
            M_md_control <= 0;
        end
    else
        begin
            M_ALUanswer <= E_ALUanswer;
            M_MemtoReg <= E_MemtoReg;
            M_MemWrite <= E_MemWrite;
            M_Qb <= E_Qb;
            M_RegDst <= E_RegDst;
            M_RegWrite <= E_RegWrite;
            M_WriteReg <= E_WriteReg;
            M_load_option <= E_load_option;
            M_save_option <= E_save_option;
            M_md_signal <= E_md_signal;
            M_res_hi <= E_res_hi;
            M_res_lo <= E_res_lo;
            M_md_control <= E_md_control;
        end
    end

endmodule

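MEM stage

To support the sb and sh instructions, the DM module's interface specification is updated as follows.

| Signal   | Direction | Description |
|----------|-----------|-------------|
| A[13:2]  | Input     | DM address |
| BE[3:0]  | Input     | 4-bit byte enable, one bit per byte. BE[x] = 1: byte x of WD is valid; BE[x] = 0: byte x of WD is invalid |
| WD[31:0] | Input     | 32-bit write data |
| RD[31:0] | Output    | 32-bit read data |
| We       | Input     | Write enable |
| Clk      | Input     | Clock |

Note that the dm_4k implementation below indexes its 2048-word array with Addr[12:2] rather than A[13:2].

Because the low two bits of the DM address carry no meaning for word addressing, they can be used to pass extra information. BE only takes three kinds of states (byte, halfword, word), so the low two bits of the DM address are used for the encoding, and a BE extension unit decodes them before they enter the DM module, as shown in the figure.
[Image 3 from the original post]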
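As a minimal sketch of the BE extension idea described above, the following module decodes a store type plus the low two address bits into lane-aligned byte enables. This is one possible reading of the scheme, not the author's implementation: the module name `be_ext_sketch` and the port `addr_low` are hypothetical, and the store-type encoding (2'b00 = sw, 2'b01 = sb, 2'b10 = sh) is assumed from the save_to_BE module in this post.

```verilog
// Hypothetical helper, not part of the original design.
// Assumed encoding: 2'b00 = sw, 2'b01 = sb, 2'b10 = sh.
module be_ext_sketch(save_option, addr_low, BE);
    input  [1:0] save_option;  // store type
    input  [1:0] addr_low;     // low two bits of the byte address
    output [3:0] BE;           // BE[x] = 1 means byte lane x is written

    wire sw = (save_option == 2'b00);
    wire sb = (save_option == 2'b01);
    wire sh = (save_option == 2'b10);

    assign BE = sw ? 4'b1111 :                               // whole word
                sh ? (addr_low[1] ? 4'b1100 : 4'b0011) :     // upper or lower halfword
                sb ? (4'b0001 << addr_low) :                 // exactly one byte lane
                     4'b0000;                                // unused encoding: write nothing
endmodule
```

With this decoding, the byte enables already say which lanes to write, so the memory itself would not need to look at the low address bits again.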

save_to_BE

This module controls the store type; its job is to generate BE.

module save_to_BE(save_option,BE);
    input [1:0]save_option;
    output [3:0]BE;
    
    wire sb = ~save_option[1]&save_option[0];
    wire sh = save_option[1]&~save_option[0];
    wire sw = ~save_option[1] & ~save_option[0];
    assign BE[0] = sb | sh | sw;
    assign BE[1] = sh | sw;
    assign BE[2] = sw;
    assign BE[3] = sw;    
endmodule 

dm_4k

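Handles load and store operations, that is, reading and writing data. Note that word accesses require an address that is a multiple of 4, halfword accesses require a multiple of 2, and byte accesses have no alignment requirement. At first it took me a long time to understand what instructions such as sb and sh actually do.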

module dm_4k(Addr,BE,Din,Dout,MemWrite,Clk);
    input[31:0]Din;
    input[31:0]Addr;
    input [3:0]BE;
    input Clk,MemWrite;
    output[31:0]Dout;
    reg[31:0]Ram[2047:0];
    
    assign Dout=Ram[Addr[12:2]];
    wire [7:0] Din_b;
    wire [15:0] Din_h;
    assign Din_b = Din[7:0];
    assign Din_h = Din[15:0];
    
    // Note: this block is sensitive to any change of Clk, so it runs on both clock edges
    always@(Clk)begin
        if(MemWrite && BE[3]==1)begin
            Ram[Addr[12:2]]<=Din;
        end
        else if(MemWrite && BE[1]==1) begin
            if(Addr[1]==1)begin
                Ram[Addr[12:2]][31:16]<=Din_h;
            end
            else if (Addr[1]==0)begin
                Ram[Addr[12:2]][15:0]<=Din_h;            
            end
        end
        else if(MemWrite && BE[0]==1) begin
            if(Addr[1]==1)begin
                if(Addr[0]==1)begin
                    Ram[Addr[12:2]][31:24]<=Din_b;
                end
                else if (Addr[0]==0)begin
                    Ram[Addr[12:2]][23:16]<=Din_b;            
                end
            end
            else if (Addr[1]==0)begin
                if(Addr[0]==1)begin
                    Ram[Addr[12:2]][15:8]<=Din_b;
                end
                else if (Addr[0]==0)begin
                    Ram[Addr[12:2]][7:0]<=Din_b;            
                end   
            end
        end
    end

    integer i;
    initial begin
        for(i=0;i<2048;i=i+1)
            Ram[i]=0;
    end
endmodule
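For comparison, here is a hedged sketch of a data memory that follows the byte-enable interface from the table above more literally. It is not the author's dm_4k: the module name `dm_lane_sketch` is hypothetical, it assumes BE is lane-aligned (BE[x] = 1 means byte x of the addressed word is written), and it writes on the rising clock edge only, which may need adjusting to fit the timing of the rest of this pipeline.

```verilog
// Hypothetical variant, not the author's dm_4k.
// Assumption: BE is lane-aligned (BE[x] = 1 means byte x of the word is written).
module dm_lane_sketch(Addr, BE, Din, Dout, MemWrite, Clk);
    input  [31:0] Addr;
    input  [31:0] Din;
    input  [3:0]  BE;
    input         MemWrite, Clk;
    output [31:0] Dout;

    reg [31:0] Ram [2047:0];

    // Move the low byte/halfword of Din into the lane selected by Addr[1:0]
    wire [4:0]  shamt = {Addr[1:0], 3'b000};   // 0, 8, 16 or 24
    wire [31:0] wdata = Din << shamt;

    assign Dout = Ram[Addr[12:2]];

    always @(posedge Clk) begin
        if (MemWrite) begin
            if (BE[0]) Ram[Addr[12:2]][7:0]   <= wdata[7:0];
            if (BE[1]) Ram[Addr[12:2]][15:8]  <= wdata[15:8];
            if (BE[2]) Ram[Addr[12:2]][23:16] <= wdata[23:16];
            if (BE[3]) Ram[Addr[12:2]][31:24] <= wdata[31:24];
        end
    end

    integer i;
    initial begin
        for (i = 0; i < 2048; i = i + 1)
            Ram[i] = 0;
    end
endmodule
```

The design choice here is that each byte lane has its own independent write condition, so sb, sh, and sw all go through the same four lines instead of a nested address decode.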

REG_MEM_WB

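Passes signals from the MEM stage to the WB stage. Note that the registers must be initialized at the start. The signals carried over are: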

M_md_control,M_md_signal,M_res_hi,M_res_lo,M_RegWrite,M_MemtoReg,M_ALUanswer,M_Dout,M_WriteReg,M_load_option

module REG_MEM_WB(M_md_control,M_md_signal,M_res_hi,M_res_lo,M_RegWrite,M_MemtoReg,M_ALUanswer,M_Dout,M_WriteReg,M_load_option,Clk,Reset,
W_md_control,W_md_signal,W_res_hi,W_res_lo,W_RegWrite,W_MemtoReg,W_ALUanswer,W_Dout,W_WriteReg,W_load_option);

    input [31:0] M_ALUanswer,M_Dout,M_res_hi,M_res_lo;
    input [4:0] M_WriteReg;
    input [2:0] M_load_option,M_md_control;
    input Clk,Reset,M_RegWrite,M_MemtoReg,M_md_signal;

    output reg[31:0] W_ALUanswer,W_Dout,W_res_hi,W_res_lo;
    output reg[4:0] W_WriteReg;
    output reg[2:0] W_load_option,W_md_control;
    output reg W_RegWrite,W_MemtoReg,W_md_signal;

    initial begin
        W_RegWrite = 0;
        W_MemtoReg = 0;
        W_ALUanswer = 0;
        W_Dout = 0;
        W_WriteReg = 0;
        W_load_option = 0;
        W_md_signal = 0;
        W_res_hi = 0;
        W_res_lo = 0;
        W_md_control = 0;
    end

    // Non-blocking assignments (<=) are used so that every pipeline register
    // samples its inputs before any register updates on the same clock edge
    always @(posedge Clk or negedge Reset)
    begin
    if (!Reset)
        begin
            W_RegWrite <= 0;
            W_MemtoReg <= 0;
            W_ALUanswer <= 0;
            W_Dout <= 0;
            W_WriteReg <= 0;
            W_load_option <= 0;
            W_md_signal <= 0;
            W_res_hi <= 0;
            W_res_lo <= 0;
            W_md_control <= 0;
        end
    else
        begin
            W_RegWrite <= M_RegWrite;
            W_MemtoReg <= M_MemtoReg;
            W_ALUanswer <= M_ALUanswer;
            W_Dout <= M_Dout;
            W_WriteReg <= M_WriteReg;
            W_load_option <= M_load_option;
            W_md_signal <= M_md_signal;
            W_res_hi <= M_res_hi;
            W_res_lo <= M_res_lo;
            W_md_control <= M_md_control;
        end
    end
endmodule

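WB stage

data_ext_load

Data loaded from dm must be trimmed in the WB stage according to the specific load instruction: we need to consider whether it is a byte, a halfword, or a word, and whether the extension is sign extension or zero extension.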

module data_ext_load(Dout,Addr,load_option,ext_Dout);
    input [31:0] Dout;
    input [31:0] Addr;
    input [2:0] load_option;
    output [31:0] ext_Dout;
    
    wire [31:0] lb,lbu,lh,lhu,lw;
    wire [23:0] e1 = {24{Dout[7]}};
    wire [23:0] e2 = {24{Dout[15]}};
    wire [23:0] e3 = {24{Dout[23]}};
    wire [23:0] e4 = {24{Dout[31]}};
    wire [15:0] e5 = {16{Dout[15]}};
    wire [15:0] e6 = {16{Dout[31]}};
    parameter z1 = 24'b0;
    parameter z2 = 16'b0;
    assign lb = (Addr[1]==1)?(Addr[0]==1?{e4, Dout[31:24]}:{e3, Dout[23:16]}):(Addr[0]==1?{e2, Dout[15:8]}:{e1, Dout[7:0]});
    assign lh = (Addr[1]==1)?{e6, Dout[31:16]}:{e5, Dout[15:0]};
    assign lbu = (Addr[1]==1)?(Addr[0]==1?{z1, Dout[31:24]}:{z1, Dout[23:16]}):(Addr[0]==1?{z1, Dout[15:8]}:{z1, Dout[7:0]});
    assign lhu = (Addr[1]==1)?{z2, Dout[31:16]}:{z2, Dout[15:0]};
    assign lw = Dout;
    
    MUX5X32_load i(lb,lbu,lh,lhu,lw,load_option,ext_Dout);

endmodule
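The byte and halfword selection in data_ext_load can also be expressed with Verilog-2001 indexed part-selects, which avoids spelling out every address case. The following is only a sketch under assumptions: the module name `load_ext_sketch` is hypothetical, and the load_option encodings (3'b000 = lw, 3'b101 = lb, 3'b001 = lbu, 3'b111 = lh, 3'b011 = lhu) are taken from the MUX5X32_load module in this post.

```verilog
// Hypothetical compact variant of the load extension, not the author's module.
// load_option encodings are assumed from MUX5X32_load in this post.
module load_ext_sketch(Dout, Addr, load_option, ext_Dout);
    input  [31:0] Dout;
    input  [31:0] Addr;
    input  [2:0]  load_option;
    output reg [31:0] ext_Dout;

    // Pick the addressed byte / halfword out of the 32-bit word
    wire [7:0]  b = Dout[{Addr[1:0], 3'b000} +: 8];   // base bit 0, 8, 16 or 24
    wire [15:0] h = Dout[{Addr[1], 4'b0000} +: 16];   // base bit 0 or 16

    always @(*) begin
        case (load_option)
            3'b101:  ext_Dout = {{24{b[7]}}, b};    // lb: sign-extend the byte
            3'b001:  ext_Dout = {24'b0, b};         // lbu: zero-extend the byte
            3'b111:  ext_Dout = {{16{h[15]}}, h};   // lh: sign-extend the halfword
            3'b011:  ext_Dout = {16'b0, h};         // lhu: zero-extend the halfword
            default: ext_Dout = Dout;               // lw (3'b000) and unused codes
        endcase
    end
endmodule
```

The port list matches data_ext_load, so in principle it could be instantiated in the same place, though I have not verified it against the author's test programs.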

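Finally, because there are many multiplexers and several stages involve sign extension and shifting, I describe the MUX, shifter, and Extend modules last. Each module is preceded by a comment explaining its purpose.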

MUX

// Selects the next instruction address, i.e. where to jump: based on PCSrc, choose among PC+4, the branch target, the jump target, and the address returned for jr
module MUX4X32_addr (PCAdd4, B, J, Jr, PCSrc, nextAddr);
    input [31:0] PCAdd4, B, J, Jr;
    input [2:0] PCSrc;
    output [31:0] nextAddr;

    function [31:0] select;
        input [31:0] PCAdd4, B, J, Jr;
        input [2:0] PCSrc;
        case(PCSrc)
            3'b000: select = PCAdd4;
            3'b001: select = B;
            3'b010: select = J;
            3'b011: select = J;
            3'b100: select = Jr;
            3'b101: select = Jr;
            default: select = PCAdd4;  // unused PCSrc codes fall through to PC+4 instead of returning x
        endcase
    endfunction

    assign nextAddr = select (PCAdd4, B, J, Jr, PCSrc);
endmodule

// Selects the register-file write address, rd or rt; 1 selects rt
module MUX2X5(rd,rt,RegDst,Y);
    input [4:0] rd,rt;
    input RegDst;
    output [4:0] Y;

    function [4:0] select;
        input [4:0] rd,rt;
        input RegDst;
        case(RegDst)
            1:select=rt;
            0:select=rd;
        endcase
    endfunction
    assign Y=select(rd,rt,RegDst);
endmodule

// Forwarding mux: choose among the value read in ID, the value returned from EX/MEM, and the value returned from MEM/WB
module MUX5X32 (Q, EX_MEM, MEM_WB, res_hi,res_lo,S, Y);
    input [31:0] Q, EX_MEM, MEM_WB,res_hi,res_lo;
    input [2:0] S;
    output [31:0] Y;

    function [31:0] select;
        input [31:0] Q, EX_MEM, MEM_WB,res_hi,res_lo;
        input [2:0] S;
        case(S)
            3'b000: select = Q;
            3'b001: select = EX_MEM;
            3'b010: select = MEM_WB;
            3'b100: select = res_hi;
            3'b101: select = res_lo;
            default: select = Q;  // unused select codes fall back to the unforwarded value
        endcase
    endfunction
    assign Y = select (Q, EX_MEM, MEM_WB,res_hi,res_lo, S);
endmodule

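// Selects the source of ALU input x: rs (after the forwarding mux) or sa
// Selects the source of ALU input y: the immediate extended to 32 bits, or rt (after the forwarding mux); 0 selects the extended immediate
// Selects the data written back to the register file: from dm or from the ALU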
module MUX2X32(EXT,Qb_FORWARD,S,Y);
    input [31:0] EXT,Qb_FORWARD;
    input S;
    output [31:0] Y;

    function [31:0] select;
        input [31:0] EXT,Qb_FORWARD;
        input S;
        case(S)
            0:select=EXT;
            1:select=Qb_FORWARD;
        endcase
    endfunction
    assign Y=select(EXT,Qb_FORWARD,S);
endmodule

// Forwarding mux for the ID stage, similar to the one in EX but with one fewer option, for the reason explained earlier
module MUX4X32_forward(ID_Q,ALU_OUT,res_hi,res_lo,Fwd,Y);
    input [31:0] ID_Q,ALU_OUT,res_hi,res_lo;
    input [2:0] Fwd;
    output [31:0] Y;
    
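    function [31:0] select;
        input [31:0] ID_Q,ALU_OUT,res_hi,res_lo;
        input [2:0] Fwd;  // 3 bits to match the module port; [1:0] would truncate 3'b100 and 3'b101
        case(Fwd)
            3'b000: select = ID_Q;
            3'b001: select = ALU_OUT;
            3'b100: select = res_hi;
            3'b101: select = res_lo;
            default: select = ID_Q;  // unused codes fall back to the unforwarded value
        endcase
    endfunction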

    assign Y = select (ID_Q,ALU_OUT,res_hi,res_lo,Fwd);
endmodule

// Selects the loaded data among lb, lbu, lh, lhu, and lw according to load_option
module MUX5X32_load(lb,lbu,lh,lhu,lw,load_option,ext_Dout);
    input [31:0] lb,lbu,lh,lhu,lw;
    input [2:0]load_option;
    output [31:0] ext_Dout;

    function [31:0] select;
        input [31:0]lb,lbu,lh,lhu,lw;
        input [2:0]load_option;
        case(load_option)
            3'b000:select=lw;
            3'b101:select=lb;
            3'b001:select=lbu;
            3'b111:select=lh;
            3'b011:select=lhu;
            default:select=lw;  // unused load_option codes return the whole word
        endcase
    endfunction
    assign ext_Dout=select(lb,lbu,lh,lhu,lw,load_option);
endmodule

// Selects the final write-back data, taking multiplication and division into account
module MUX2X32_md(WriteData,res_hi,res_lo,md_signal,md_control,WriteData_final);
    input [31:0] WriteData,res_hi,res_lo;
    input md_signal;
    input [2:0] md_control;
    output [31:0] WriteData_final;

    assign WriteData_final = md_signal?((md_control[2]&md_control[1]&~md_control[0])?res_hi:res_lo):WriteData;
endmodule

Extend

Performs the extension operations.

// Extends 16 bits to 32 bits; zero extension when Se is 0
module EXT16T32 (X, Se, Y);
    input [15:0] X;
    input Se;
    output [31:0] Y;

    wire [31:0] E0, E1;
    wire [15:0] e = {16{X[15]}};
    parameter z = 16'b0;
    assign E0 = {z, X};
    assign E1 = {e, X};
    MUX2X32 i(E0, E1, Se, Y);
endmodule

// Extends sa from 5 bits to 32 bits
module EXT5T32 (sa, Y);
    input [4:0] sa;
    output [31:0] Y;

    wire [31:0] E;
    parameter z = 27'b0;
    assign Y = {z, sa};
endmodule

shifter

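Performs the shift operations: shifting the immediate left by two bits, and concatenating the upper 4 bits of PC+4, the 26-bit address field, and two zero bits.

// Shift left by two bits
module SHIFTER32_L2(X,Sh);
    input [31:0] X;
    output [31:0] Sh;
    parameter z=2'b00;
    assign Sh={X[29:0],z};
endmodule

// Forms the target address of J instructions
module SHIFTER_COMBINATION(X,PCADD4,Sh);
    input [25:0] X;
    input [31:0] PCADD4;
    output [31:0] Sh;
    parameter z=2'b00;
    assign Sh={PCADD4[31:28],X[25:0],z};
endmodule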

mips

Ties all the modules together.

module mips(Clk,Reset);

input Clk,Reset;

wire [31:0] IF_NextAddr,IF_Addr,WriteData,WriteData_final,Alu_Y,Alu_X,E_NUM_X,E_NUM_Y,ID_ext32_L2,ID_B,ID_J,IF_PCAdd4,ID_PCAdd4,IF_Inst,ID_Qa,ID_Qb,ID_rs,ID_rt,ID_ext32,E_Alu_Out,ID_Inst,M_Dout,W_ext_Dout,E_res_hi,E_res_lo;
wire [31:0] E_Qa,E_Qb,M_Alu_Out,M_NUM_Y,W_Alu_Out,W_Dout,E_ext32,E_sa,ID_sa,M_res_hi,M_res_lo,W_res_hi,W_res_lo;
wire [4:0] W_WriteReg,M_WriteReg;
wire [4:0] E_WriteReg;
wire [4:0] ID_WriteReg;
wire [1:0] E_save_option;
wire [1:0] ID_save_option,M_save_option;
wire [2:0] E_FwdA,E_FwdB,ID_FwdB,ID_FwdA,ID_PCSrc,ID_load_option,E_load_option,M_load_option,W_load_option,ID_md_control,E_md_control,M_md_control,W_md_control;
wire [3:0] ID_ALUControl,E_ALUControl,BE;
wire Se,Z,E_RegDst,c_adventure,ID_RegDst,M_RegWrite,E_RegWrite,W_RegWrite,ID_RegWrite,E_ALUXSrc,E_ALUYSrc,M_RegDst,W_MemtoReg,ID_ALUXSrc,ID_ALUYSrc;
wire ID_MemtoReg,ID_MemWrite,ID_usigned;
wire E_MemtoReg,E_MemWrite,E_usigned,M_MemtoReg,M_MemWrite,ID_B_code,E_B_code,over,ID_md_signal,E_md_signal,M_md_signal,W_md_signal,start,busy;
wire stall,stallstall,Cout,ID_mfhi,ID_mflo;


//IF

MUX4X32_addr mux4x32(IF_PCAdd4,ID_B,ID_J,ID_rs,ID_PCSrc,IF_NextAddr);
PC PC(IF_NextAddr,Clk,Reset,IF_Addr,stall,stallstall);
PCAdd4 PCAdd4(IF_Addr,IF_PCAdd4);
im_4k im_4k(IF_Addr,IF_Inst);

REG_IF_ID REG_IF_ID(IF_PCAdd4,IF_Inst,Clk,Reset,ID_PCAdd4,ID_Inst,stall,stallstall);

//ID
CU CU(ID_mfhi,ID_mflo,ID_Inst,ID_Inst[5:0],ID_Inst[16],ID_RegDst,Se,ID_RegWrite,ID_ALUXSrc,ID_ALUYSrc,ID_ALUControl,ID_md_control,ID_MemWrite,ID_PCSrc,ID_MemtoReg,ID_load_option,ID_save_option,ID_usigned,c_adventure,ID_md_signal);

MUX2X5 mux2x5(ID_Inst[15:11],ID_Inst[20:16],ID_RegDst,ID_WriteReg);// choose whether to write rt or rd
RegisterFile RegisterFile(ID_Inst[25:21],ID_Inst[20:16],ID_Inst[15:11],WriteData_final,W_WriteReg,W_RegWrite,Clk,Reset,ID_Qa,ID_Qb,ID_PCSrc,ID_PCAdd4);

MUX4X32_forward mux2x32_ID_X(ID_Qa,M_Alu_Out,E_res_hi,E_res_lo,ID_FwdA,ID_rs);
MUX4X32_forward mux2x32_ID_Y(ID_Qb,M_Alu_Out,E_res_hi,E_res_lo,ID_FwdB,ID_rt);

if_c_adventure if_c_adventure(ID_rs,ID_rt,ID_ALUControl,ID_usigned,c_adventure);
FU FU(ID_mfhi,ID_mflo,E_md_signal,E_RegWrite,E_WriteReg,E_MemtoReg,M_RegWrite,M_WriteReg,M_MemtoReg,ID_Inst[25:21],ID_Inst[20:16],ID_FwdA,ID_FwdB,ID_Inst[31:26],ID_Inst[5:0],c_adventure,stall,stallstall);

EXT16T32 ext16t32(ID_Inst[15:0],Se,ID_ext32);
EXT5T32 ext5t32(ID_Inst[10:6],ID_sa);
SHIFTER32_L2 shifter(ID_ext32,ID_ext32_L2);
CLA_32 get_b_address(ID_PCAdd4,ID_ext32_L2,0,ID_B,Cout);

SHIFTER_COMBINATION get_j_address(ID_Inst[25:0],ID_PCAdd4,ID_J);// jump target address of J instructions

REG_ID_EX REG_ID_EX(ID_md_signal,ID_Inst[16],ID_sa,ID_RegDst,ID_RegWrite,ID_ALUXSrc,ID_ALUYSrc,ID_ALUControl,ID_md_control,ID_MemWrite,ID_MemtoReg,ID_WriteReg,ID_usigned,ID_Qa,ID_Qb,ID_ext32,ID_FwdA,ID_FwdB,ID_load_option,ID_save_option,Clk,Reset,
E_md_signal,E_B_code,E_sa,E_RegDst,E_RegWrite,E_ALUXSrc,E_ALUYSrc,E_ALUControl,E_md_control,E_MemWrite,E_MemtoReg,E_WriteReg,E_usigned,E_Qa,E_Qb,E_ext32,E_FwdA,E_FwdB,E_load_option,E_save_option,stall,stallstall);

//EX
MUX5X32 mux3x32_ex_X(E_Qa,M_Alu_Out,WriteData_final,M_res_hi,M_res_lo,E_FwdA,E_NUM_X);
MUX2X32 choose_alu_x(E_NUM_X,E_sa,E_ALUXSrc,Alu_X);

MUX5X32 mux3x32_ex_Y(E_Qb,M_Alu_Out,WriteData_final,M_res_hi,M_res_lo,E_FwdB,E_NUM_Y);
MUX2X32 choose_alu_y(E_ext32,E_NUM_Y,E_ALUYSrc,Alu_Y);
ALU ALU(Alu_X,Alu_Y,E_ALUControl,E_usigned,E_Alu_Out,Z,over);
md md(Clk,E_NUM_X,E_NUM_Y,E_md_control,start,busy,E_res_hi,E_res_lo);

REG_EX_MEM REG_EX_MEM(E_md_control,E_md_signal,E_res_hi,E_res_lo,E_RegWrite,E_RegDst,E_MemWrite,E_MemtoReg,E_WriteReg,E_NUM_Y,E_Alu_Out,E_load_option,E_save_option,Clk,Reset,
M_md_control,M_md_signal,M_res_hi,M_res_lo,M_RegWrite,M_RegDst,M_MemWrite,M_MemtoReg,M_WriteReg,M_NUM_Y,M_Alu_Out,M_load_option,M_save_option);

//MEM
save_to_BE save_to_BE(M_save_option,BE);
dm_4k dm_4k(M_Alu_Out,BE,M_NUM_Y,M_Dout,M_MemWrite,Clk);

REG_MEM_WB REG_MEM_WB(M_md_control,M_md_signal,M_res_hi,M_res_lo,M_RegWrite,M_MemtoReg,M_Alu_Out,M_Dout,M_WriteReg,M_load_option,Clk,Reset,
W_md_control,W_md_signal,W_res_hi,W_res_lo,W_RegWrite,W_MemtoReg,W_Alu_Out,W_Dout,W_WriteReg,W_load_option);

//WB
data_ext_load data_ext_load(W_Dout,W_Alu_Out,W_load_option,W_ext_Dout);
MUX2X32 mux2x322(W_Alu_Out,W_ext_Dout,W_MemtoReg,WriteData);
MUX2X32_md choose_md(WriteData,W_res_hi,W_res_lo,W_md_signal,W_md_control,WriteData_final);
endmodule

test

The final simulation file. It drives clk (toggling continuously) and reset (asserted once at the start for initialization and never needed again).

module test;
    reg CLK;
    reg Reset;
    mips uut(
    .Clk(CLK),
    .Reset(Reset)
    );

    initial begin
        CLK = 0;
        Reset = 0;
        //#1
        CLK = !CLK;  // toggle the clock once (0 -> 1) so that the PC is cleared first
        Reset = 1;  // release the active-low reset
        forever #2
        begin 
             CLK = !CLK;
        end
    end
endmodule
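In the testbench above, Reset is asserted and released at the same simulation time. A hedged alternative, shown below, holds the active-low reset across the first rising clock edge and stops the simulation after a fixed time. The module name `test_sketch`, the 5-unit reset duration, and the 2000-unit run length are all arbitrary assumptions; it also assumes that the PC and the other modules not shown in this section react to an active-low Reset in the same way as the pipeline registers.

```verilog
`timescale 1ns / 1ps
// Hypothetical testbench variant; durations are arbitrary assumptions.
module test_sketch;
    reg CLK;
    reg Reset;

    mips uut(
        .Clk(CLK),
        .Reset(Reset)
    );

    // Free-running clock with the same half-period (#2) as the original test
    initial CLK = 0;
    always #2 CLK = ~CLK;

    initial begin
        Reset = 0;        // assert the active-low reset
        #5 Reset = 1;     // release it after the first rising edge at t = 2
        #2000 $finish;    // stop after an arbitrary run length
    end
endmodule
```

Ending the run with $finish keeps the waveform to a bounded length, which makes it easier to compare the final contents of Ram against the Mars result.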

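Result verification

My verification method

  • Check signals such as I_addu to determine which instruction is executing
  • Check whether signals such as Regrt are correct
  • Check the register file's reads and writes, which is a good way to judge whether an operation is correct
  • Check each stage's outputs and how they propagate

Final results

Here are two test programs, test10.asm and test42.asm. The first uses 10 instructions and the second is the full version using all 42. Both have been uploaded to GitHub.

The Mars result for test42.asm is shown in the figure:
[Image 4 from the original post]

You can inspect the values of Ram in dm_4k in the waveform viewer to check correctness. Below is a screenshot of some of the values I got from 0x394 (entry 221) downward.

[Image 5 from the original post]
The results match, which concludes the 42-instruction experiment.

I suggest using the Mars tool and starting the tests with simple instructions; I covered the details in the single-cycle design post.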

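Summary

  1. The CU file had errors several times. For example, I initially did not use BE to determine the store type, and when I changed that I forgot to update the CU, which cost a long debugging session. Always keep your Excel sheet and CU.v consistent.
  2. At first the pipeline would not run at all, because the inter-stage registers were not initialized.
  3. Initially I had not written the forwarding mux for the ID stage, so the test cases kept failing, and it took a long time to find the problem.
  4. At first I did not realize that jalr also needs forwarding in the ID stage.
  5. I did not understand instructions such as sb and sh at first and implemented them according to my initial understanding, which kept giving wrong results. Only after rereading everything carefully from the beginning did I understand them. This took a lot of time, but it also gave me a deeper understanding of offsets.
  6. As the project grew, the code became messy; for example, finding a variable in mips.v took ages. My coding conventions need to improve.

Single-cycle and pipelined designs differ greatly in difficulty. I suggest first implementing a pipelined CPU that supports 10 instructions; extending it to 42 is then not too hard, since much of the work is repetitive. The remaining 8 instructions, which relate to multiplication and division, will be covered in the next post. If I have missed anything, please point it out, and you are welcome to discuss.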
