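7.1 The module design in this project uses the simplest Verilog style: multiplication and division are written directly with the * and / operators. As a result, this ALU may not be usable on a typical small FPGA, where these operators (a / with a variable divisor in particular) can be expensive or unsupported in synthesis. Larger FPGA designs would normally instantiate dedicated arithmetic IP cores instead.
Design Plan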
Three-Cycle Multiplier
Module design:
module three_cycle_mult(A, B, clk, rst_n, start, done_mult, result_mult);
  input  [7:0]      A;
  input  [7:0]      B;
  input             clk;
  input             rst_n;
  input             start;
  output            done_mult;    // wire: driven by the continuous assign below
  output reg [15:0] result_mult;

  reg [7:0]  a_int;
  reg [7:0]  b_int;
  reg [15:0] mult1;
  reg        done2;
  reg        done1;
  reg        done_mult_int;

  // multiplier: operands, product and result are each registered once,
  // and the done chain delays start by the same three cycles
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      done_mult_int <= 1'b0;
      done2         <= 1'b0;
      done1         <= 1'b0;
      a_int         <= 8'd0;
      b_int         <= 8'd0;
      mult1         <= 16'd0;
      result_mult   <= 16'd0;
    end else begin
      a_int         <= A;
      b_int         <= B;
      mult1         <= a_int * b_int;
      result_mult   <= mult1;
      done2         <= start & (~done_mult_int);
      done1         <= done2 & (~done_mult_int);
      done_mult_int <= done1 & (~done_mult_int);
    end
  end

  assign done_mult = done_mult_int;
endmodule
This uses the simplest form of multiplication, the built-in * operator.
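Where a target device lacks hard multipliers, or a synthesis flow does not map * well, a multiplier is commonly built from shifts and adds. The following is only a minimal sketch of that idea and is not part of this project's design. The package name mult_sketch_pkg and the function shift_add_mult are hypothetical names introduced for illustration.

```systemverilog
package mult_sketch_pkg;
  // Hypothetical helper: 8x8 unsigned multiply without the "*" operator.
  // For every set bit i of b, add (a << i) to the accumulator.
  function automatic logic [15:0] shift_add_mult(input logic [7:0] a,
                                                 input logic [7:0] b);
    logic [15:0] acc;
    acc = '0;
    for (int i = 0; i < 8; i++) begin
      if (b[i]) acc += (16'(a) << i);
    end
    return acc;
  endfunction
endpackage

module mult_sketch_demo;
  import mult_sketch_pkg::*;
  initial begin
    // Compare the sketch against the built-in operator for a few operands.
    assert (shift_add_mult(8'd12,  8'd10)  == 16'd12 * 16'd10);
    assert (shift_add_mult(8'd255, 8'd255) == 16'd255 * 16'd255);
    assert (shift_add_mult(8'd0,   8'd77)  == 16'd0);
    $display("shift_add_mult(12, 10) = %0d", shift_add_mult(8'd12, 8'd10));
  end
endmodule
```

In hardware the same loop is usually unrolled over several clock cycles (one conditional add per cycle) so that only a single adder is needed.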
Three-Cycle Divider
Module design:
module three_cycle_div(A, B, clk, rst_n, start, done_div, result_div);
  input  [7:0]      A;
  input  [7:0]      B;
  input             clk;
  input             rst_n;
  input             start;
  output            done_div;     // wire: driven by the continuous assign below
  output reg [15:0] result_div;

  reg [7:0]  a_int;
  reg [7:0]  b_int;
  reg [15:0] div1;
  reg        done2;
  reg        done1;
  reg        done_div_int;

  // divider: same three-stage structure as the multiplier
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      done_div_int <= 1'b0;
      done2        <= 1'b0;
      done1        <= 1'b0;
      a_int        <= 8'd0;
      b_int        <= 8'd0;
      div1         <= 16'd0;
      result_div   <= 16'd0;
    end else begin
      a_int        <= A;
      b_int        <= B;
      // division by zero is defined to return 0
      if (b_int == 0)
        div1 <= 'h0;
      else
        div1 <= a_int / b_int;
      result_div   <= div1;
      done2        <= start & (~done_div_int);
      done1        <= done2 & (~done_div_int);
      done_div_int <= done1 & (~done_div_int);
    end
  end

  assign done_div = done_div_int;
endmodule
This uses the simplest form of division, the built-in / operator.
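When checking these two operators in a testbench, the expected results implied by the code above (a 16-bit product of two 8-bit operands, and a quotient that is 0 when the divisor is 0) can be captured in a small reference model. The sketch below is one possible way to write it. The package alu_ref_sketch_pkg and the functions ref_mult and ref_div are hypothetical names and do not come from the project.

```systemverilog
package alu_ref_sketch_pkg;
  // Expected product: operands are widened to 16 bits before multiplying,
  // matching the 16-bit result register of the multiplier.
  function automatic logic [15:0] ref_mult(input logic [7:0] a,
                                           input logic [7:0] b);
    return 16'(a) * 16'(b);
  endfunction

  // Expected quotient: mirrors the divider's convention that x / 0 gives 0.
  function automatic logic [15:0] ref_div(input logic [7:0] a,
                                          input logic [7:0] b);
    if (b == 8'd0) return 16'd0;
    return 16'(a / b);
  endfunction
endpackage

module alu_ref_sketch_demo;
  import alu_ref_sketch_pkg::*;
  initial begin
    assert (ref_mult(8'd255, 8'd255) == 16'd65025);
    assert (ref_div(8'd200, 8'd7)    == 16'd28);   // integer division truncates
    assert (ref_div(8'd5,   8'd0)    == 16'd0);    // divide-by-zero convention
    $display("ref model checks done");
  end
endmodule
```

A scoreboard could call these functions with the operands of each observed transaction and compare the return value with the result read from the bus.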
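Verification Plan
This design needs two sets of test cases: random test cases, in which the ALU operands and operation codes are generated randomly, and directed test cases, written by hand to close the remaining coverage holes.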
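The two kinds of test cases can be sketched with a single transaction class: randomize() produces the random cases, and a small helper builds the directed ones. This is an illustration under assumptions, not the project's actual transaction class (which lives in alu_pkg and is not shown here). The names alu_stim_sketch_pkg, sketch_op_t, sketch_txn and directed are hypothetical, and the opcode values are copied from enum2op in the bus interface.

```systemverilog
package alu_stim_sketch_pkg;
  // Hypothetical opcode enum; encodings follow enum2op in the ALU interface.
  typedef enum bit [2:0] {
    S_NO  = 3'b000, S_ADD = 3'b001, S_AND = 3'b010, S_XOR = 3'b011,
    S_MUL = 3'b100, S_DIV = 3'b101, S_RST = 3'b111
  } sketch_op_t;

  class sketch_txn;
    rand bit [7:0]   A;
    rand bit [7:0]   B;
    rand sketch_op_t op;

    // Bias operands toward the corner values 0x00 and 0xFF.
    constraint c_operands {
      A dist {8'h00 := 1, 8'hFF := 1, [8'h01:8'hFE] :/ 6};
      B dist {8'h00 := 1, 8'hFF := 1, [8'h01:8'hFE] :/ 6};
    }

    // Keep no_op and reset rare so random runs are mostly arithmetic.
    constraint c_op {
      op dist {S_NO := 1, S_RST := 1, S_ADD := 4, S_AND := 4,
               S_XOR := 4, S_MUL := 4, S_DIV := 4};
    }

    // Directed case: all fields are fixed by the caller.
    static function sketch_txn directed(sketch_op_t op_i,
                                        bit [7:0] a_i, bit [7:0] b_i);
      sketch_txn t;
      t = new();
      t.op = op_i;
      t.A  = a_i;
      t.B  = b_i;
      return t;
    endfunction
  endclass
endpackage

module stim_sketch_demo;
  import alu_stim_sketch_pkg::*;
  sketch_txn t;
  initial begin
    // Random part of the plan.
    t = new();
    repeat (5) begin
      if (!t.randomize()) $fatal(1, "randomize failed");
      $display("random:   op=%s A=%0h B=%0h", t.op.name(), t.A, t.B);
    end
    // Directed part of the plan: cases random stimulus may hit too rarely.
    t = sketch_txn::directed(S_DIV, 8'd1, 8'd0);    // divide by zero
    $display("directed: op=%s A=%0h B=%0h", t.op.name(), t.A, t.B);
    t = sketch_txn::directed(S_MUL, 8'hFF, 8'hFF);  // largest product
    $display("directed: op=%s A=%0h B=%0h", t.op.name(), t.A, t.B);
  end
endmodule
```

The dist weights are arbitrary starting points; in practice they would be tuned against the coverage report.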
Bus Interfaces
ALU Bus Interface
`ifndef ALU_INTERFACE
`define ALU_INTERFACE
interface alu_interface(input wire clk);
  import alu_pkg::*;

  parameter tsu = 1ps;
  parameter tco = 0ps;

  logic [7:0]  A;
  logic [7:0]  B;
  logic [2:0]  op;
  logic        start;
  logic        done;
  logic [15:0] result;
  logic        rst_n;

  clocking drv @(posedge clk);
    output #tco A;
    output #tco B;
    output #tco op;
    output #tco start;
    input  #tsu done;
    input  #tsu result;
  endclocking

  clocking mon @(posedge clk);
    input #tsu A;
    input #tsu B;
    input #tsu op;
    input #tsu start;
    input #tsu done;
    input #tsu result;
  endclocking

  task send_op(input transaction req, output bit [15:0] alu_result);
    case (req.op.name())
      "no_op": begin
        @(drv);
        drv.op    <= enum2op(req.op);
        drv.start <= 1'b1;
        @(drv);
        drv.start <= 1'b0;
        @(drv);
        alu_result = drv.result;
      end
      "rst_op": begin
        @(drv);
        drv.op    <= enum2op(req.op);
        rst_n     <= 0;
        drv.start <= 1'b1;
        @(drv);
        rst_n     <= 1;
        drv.start <= 1'b0;
        @(drv);
        alu_result = drv.result;
      end
      "mul_op", "div_op": begin
        @(drv);
        drv.op    <= enum2op(req.op);
        drv.A     <= req.A;
        drv.B     <= req.B;
        drv.start <= 1'b1;
        repeat (2) begin
          @(drv);
        end
        @(drv);
        alu_result = drv.result;
        @(drv);
        drv.start <= 1'b0;
      end
      default: begin
        @(drv);
        drv.op    <= enum2op(req.op);
        drv.A     <= req.A;
        drv.B     <= req.B;
        drv.start <= 1'b1;
        @(drv);
        alu_result = drv.result;
        @(drv);
        drv.start <= 1'b0;
      end
    endcase
  endtask

  task init();
    start <= 0;
    A     <= 'dx;
    B     <= 'dx;
    op    <= 'd0;
  endtask

  function bit [2:0] enum2op(operation_t op);
    case (op)
      no_op  : return 3'b000;
      add_op : return 3'b001;
      and_op : return 3'b010;
      xor_op : return 3'b011;
      mul_op : return 3'b100;
      div_op : return 3'b101;
      rst_op : return 3'b111;
    endcase
  endfunction
endinterface
`endif
The clocking blocks in this interface deserve a closer look:
clocking drv @(posedge clk);
  output #tco A;
  output #tco B;
  output #tco op;
  output #tco start;
  input  #tsu done;
  input  #tsu result;
endclocking

clocking mon @(posedge clk);
  input #tsu A;
  input #tsu B;
  input #tsu op;
  input #tsu start;
  input #tsu done;
  input #tsu result;
endclocking
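Here, the clocking block drv drives signal A 0ps after the rising edge of clk (that is, at the clock edge itself), while the clocking block mon samples A 1ps before the same clock edge. These are two different views of the same signal; it does not mean that A is updated twice.
drv view (driving): output #tco A in the drv block, with tco = 0ps (equivalent to output #0 A), means that the value of A is driven onto the corresponding signal at the clock edge, or extremely close to it, depending on how the simulator schedules the drive. This is the point in time at which the signal changes.
mon view (monitoring): input #tsu A in the mon block, with tsu = 1ps (equivalent to input #1 A), means that A is sampled 1ps before the clock edge, because an input skew in a clocking block always refers to a time before the clocking event. Sampling just ahead of the edge guarantees a stable read: the monitor sees the value that had settled before the edge, the same value a flip-flop clocked by clk would capture, and not a value that is being driven at the edge itself.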
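The two views can be observed in a small stand-alone simulation. The sketch below is not part of the project: the module name clocking_skew_demo, the 10ps clock period and the variables first_sample and second_sample are assumptions made for illustration. A new value is driven through drv at the first rising edge, and mon only reports it one edge later, because at the first edge it had sampled A 1ps earlier.

```systemverilog
module clocking_skew_demo;
  timeunit 1ps;
  timeprecision 1ps;

  logic       clk = 1'b0;
  logic [7:0] A   = 8'h00;
  logic [7:0] first_sample, second_sample;

  always #5 clk = ~clk;        // rising edges at 5ps, 15ps, ...

  clocking drv @(posedge clk);
    output #0ps A;             // drive A at the edge (tco = 0ps)
  endclocking

  clocking mon @(posedge clk);
    input #1ps A;              // sample A 1ps before the edge (tsu = 1ps)
  endclocking

  // Driver view: request a new value at the first rising edge (5ps).
  initial begin
    @(drv);
    drv.A <= 8'hAA;
  end

  // Monitor view: read the sampled value at the first two rising edges.
  initial begin
    @(mon);                    // edge at 5ps, A was sampled at 4ps
    first_sample = mon.A;      // still the old value 8'h00
    @(mon);                    // edge at 15ps, A was sampled at 14ps
    second_sample = mon.A;     // now the driven value 8'hAA
    $display("first=%h second=%h", first_sample, second_sample);
    assert (first_sample === 8'h00 && second_sample === 8'hAA)
      else $error("unexpected sampling order");
    $finish;
  end
endmodule
```

The monitor lags the driver by one clock, which matches what a register in the DUT would do with the same signal.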
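So signal A is not physically updated twice. The two clocking blocks describe how the simulation environment times its operations on the signal so that they reflect real hardware behavior: the drive defines when the signal changes, and the sampling defines which value is considered stable and safe to read. In real hardware, a register needs its input to be stable for a setup time before the clock edge, and its output changes a clock-to-output delay after the edge. The skews tsu and tco stand in for these two quantities, so that the simulation drives and samples without races and produces correct results.
Register Bus Interface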
`ifndef ALU_REG_INTERFACE
`define ALU_REG_INTERFACE
interface alu_reg_interface(input wire clk, input wire rst_n);
  import alu_pkg::*;

  parameter tsu = 1ps;
  parameter tco = 0ps;

  logic        vld;
  logic        op;
  logic [15:0] wr_data;
  logic [15:0] addr;
  logic [15:0] rd_data;

  clocking drv @(posedge clk);
    output #tco vld;
    output #tco op;
    output #tco wr_data;
    output #tco addr;
    input  #tsu rd_data;
  endclocking

  clocking mon @(posedge clk);
    input #tsu vld;
    input #tsu op;
    input #tsu wr_data;
    input #tsu addr;
    input #tsu rd_data;
  endclocking

  task send_op(input reg_transaction req, output bit [15:0] o_rd_data);
    // drive vld cmd
    begin
      @(drv);
      drv.vld <= 1'b1;
      case (req.op.name())
        "reg_wr": drv.op <= 1'b1;
        "reg_rd": drv.op <= 1'b0;
      endcase
      drv.addr <= req.addr;
      case (req.op.name())
        "reg_rd": drv.wr_data <= 16'h0;
        "reg_wr": drv.wr_data <= req.wr_data;
      endcase
    end
    // drive idle cmd
    begin
      @(drv);
      drv.vld     <= 1'b0;
      drv.op      <= 1'b0;
      drv.addr    <= 16'h0;
      drv.wr_data <= 16'h0;
    end
    // monitor rd data
    begin
      @(drv);
      if (req.op.name() == "reg_rd")
        o_rd_data = drv.rd_data;
    end
  endtask

  task init();
    vld     <= 0;
    op      <= 'dx;
    wr_data <= 'dx;
    addr    <= 'dx;
  endtask

  function reg_operation_t op2enum();
    case (op)
      1'b0    : return reg_rd;
      1'b1    : return reg_wr;
      // $fatal takes a finish number as its first argument
      default : $fatal(1, "Illegal operation on reg interface");
    endcase
  endfunction

  // backdoor read of a register field through its hierarchical path
  function bit [15:0] peek(string reg_name);
    case (reg_name)
      "cfg_cfg_ctrl"      : return $root.top.DUT.i_demo_reg_slave.cfg_cfg_ctrl.reg_field;
      "sta_sta_status"    : return $root.top.DUT.i_demo_reg_slave.sta_sta_status.reg_field;
      "cnt_cnt_operation" : return $root.top.DUT.i_demo_reg_slave.cnt_cnt_operation.reg_field;
      "int_int_interrupt" : return $root.top.DUT.i_demo_reg_slave.int_int_interrupt.reg_field;
    endcase
  endfunction
endinterface
`endif