Verify backend on Verilog simulator¶
So far, we have developed an LLVM backend that compiles C or assembly code, as illustrated by the white part of Fig. 60. If the program does not contain global variables, the ELF object file can be dumped to a hex file using the following command:
llvm-objdump -d
This functionality was completed in Chapter ELF Support.
![digraph G {
rankdir=LR;
"Verilog machine" [style=filled, color=gray];
"clang" -> "llc" [label="IR"];
"llvm backend asm parser" -> "llc" [label="asm"];
"llc" -> "llvm-objdump -d" [label="obj"];
"llvm-objdump -d" -> "Verilog machine" [label="hex"];
// label = "Figure: Cpu0 backend without linker";
}](_images/graphviz-7f3372150b115295375f540a388a53ddb15b4fca.png)
Fig. 60 Cpu0 backend without linker¶
This chapter implements the Cpu0 instructions using the Verilog language, as represented by the gray part in the figure above.
With this Verilog-based machine, we can execute the hex program generated by the LLVM backend on the Cpu0 Verilog simulator running on a PC. This allows us to observe and verify the execution results of Cpu0 instructions directly on the hardware model.
Create Verilog Simulator of Cpu0¶
Verilog is an IEEE-standard language widely used in IC design. There are many books and free online resources available for learning Verilog [1] [2] [3] [4] [5].
Verilog is also known as Verilog HDL (Hardware Description Language), not to be confused with VHDL, which serves the same purpose but is a competing language [6].
An example implementation, lbdex/verilog/cpu0.v, contains the Cpu0 processor design written in Verilog. As described in Appendix A, we have installed the Icarus Verilog tool on both iMac and Linux systems. The cpu0.v design is relatively simple, with only a few hundred lines of code in total.
Although this implementation does not include pipelining, it simulates delay slots (via the SIMULATE_DELAY_SLOT section of the code) to accurately estimate pipeline machine cycles.
Verilog has a C-like syntax, and since this book focuses on compiler implementation, we present the cpu0.v code and the build commands below without an in-depth explanation. We expect that readers with some patience and curiosity will be able to understand the Verilog code.
Cpu0 supports memory-mapped I/O, one of the two primary I/O models in computer architecture (the other being instruction-based I/O). Cpu0 maps the output port to the memory address defined by IOADDR in cpu0.v (0xff000000 in the listing below). When executing the instruction:
st $ra, cx($rb)
where cx($rb) equals IOADDR, the Cpu0 processor outputs the content of $ra to that I/O port, as demonstrated below.
ST : begin
...
if (R[b]+c16 == `IOADDR) begin
outw(R[a]);
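To make the memory-mapped I/O idea concrete, the following is a minimal C++ sketch of a bus model in which a store to the I/O address is routed to an output stream instead of RAM. It is not part of lbdex: the class name MmioBus, its methods, and the 0xFF fill value are assumptions made for illustration, and kIoAddr simply mirrors the IOADDR define in the cpu0.v listing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical software model of a memory-mapped output port.
// kIoAddr mirrors the `IOADDR define in cpu0.v; all other names are
// illustrative and do not appear in lbdex.
constexpr std::uint32_t kIoAddr = 0xff000000u;

class MmioBus {
public:
  // RAM starts filled with 0xFF, loosely following `MEMEMPTY in cpu0.v.
  explicit MmioBus(std::size_t ramBytes) : ram_(ramBytes, 0xFF) {}

  // A byte store reaches either the output port or ordinary RAM.
  void storeByte(std::uint32_t addr, std::uint8_t value) {
    if (addr == kIoAddr) {
      console_.push_back(static_cast<char>(value));  // port write: emit a char
      return;
    }
    assert(addr < ram_.size() && "store outside modeled RAM");
    ram_[addr] = value;
  }

  std::uint8_t loadByte(std::uint32_t addr) const {
    assert(addr < ram_.size() && "load outside modeled RAM");
    return ram_[addr];
  }

  // Everything written to the port so far.
  const std::string& console() const { return console_; }

private:
  std::vector<std::uint8_t> ram_;
  std::string console_;
};
```

For example, after `MmioBus bus(16); bus.storeByte(kIoAddr, 'O');` the character appears in `bus.console()` while RAM is left untouched, which is the same split that the `if (R[b]+c16 == IOADDR)` test performs in the Verilog code.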
lbdex/verilog/cpu0.v
// https://www.francisz.cn/download/IEEE_Standard_1800-2012%20SystemVerilog.pdf
// configurable values below
`define SIMULATE_DELAY_SLOT
// cpu032I memory limit, jsub:24-bit
`define MEMSIZE 'h1000000
`define MEMEMPTY 8'hFF
`define NULL 8'h00
`define IOADDR 'hff000000 // IO mapping address
`define TIMEOUT #3000000000
// Operand width
`define INT32 2'b11 // 32 bits
`define INT24 2'b10 // 24 bits
`define INT16 2'b01 // 16 bits
`define BYTE 2'b00 // 8 bits
`define EXE 3'b000
`define RESET 3'b001
`define ABORT 3'b010
`define IRQ 3'b011
`define ERROR 3'b100
// Reference web: http://ccckmit.wikidot.com/ocs:cpu0
module cpu0(input clock, reset, input [2:0] itype, output reg [2:0] tick,
output reg [31:0] ir, pc, mar, mdr, inout [31:0] dbus,
output reg m_en, m_rw, output reg [1:0] m_size,
input cfg);
reg signed [31:0] R [0:15];
reg signed [31:0] C0R [0:1]; // co-processor 0 register
// High and Low part of 64 bit result
reg [7:0] op;
reg [3:0] a, b, c;
reg [4:0] c5;
reg signed [31:0] c12, c16, c24, Ra, Rb, Rc, pc0; // pc0: instruction pc
reg [31:0] uc16, URa, URb, URc, HI, LO, CF, tmp;
reg [63:0] cycles;
// register name
`define SP R[13] // Stack Pointer
`define LR R[14] // Link Register
`define SW R[15] // Status Word
// C0 register name
`define PC C0R[0] // Program Counter
`define EPC C0R[1] // exception PC value
// SW flags
`define I2 `SW[16] // Hardware Interrupt 1, IO1 interrupt, status,
// 1: in interrupt
`define I1 `SW[15] // Hardware Interrupt 0, timer interrupt, status,
// 1: in interrupt
`define I0 `SW[14] // Software interrupt, status, 1: in interrupt
`define I `SW[13] // Interrupt, 1: in interrupt
`define I2E `SW[12] // Hardware Interrupt 1, IO1 interrupt, Enable
`define I1E `SW[11] // Hardware Interrupt 0, timer interrupt, Enable
`define I0E `SW[10] // Software Interrupt Enable
`define IE `SW[9] // Interrupt Enable
`define M `SW[8:6] // Mode bits, itype
`define D `SW[5] // Debug Trace
`define V `SW[3] // Overflow
`define C `SW[2] // Carry
`define Z `SW[1] // Zero
`define N `SW[0] // Negative flag
`define LE CF[0] // Endian bit, Big Endian:0, Little Endian:1
// Instruction Opcode
parameter [7:0] NOP=8'h00,LD=8'h01,ST=8'h02,LB=8'h03,LBu=8'h04,SB=8'h05,
LH=8'h06,LHu=8'h07,SH=8'h08,ADDiu=8'h09,MOVZ=8'h0A,MOVN=8'h0B,ANDi=8'h0C,
ORi=8'h0D,XORi=8'h0E,LUi=8'h0F,
ADDu=8'h11,SUBu=8'h12,ADD=8'h13,SUB=8'h14,CLZ=8'h15,CLO=8'h16,MUL=8'h17,
AND=8'h18,OR=8'h19,XOR=8'h1A,NOR=8'h1B,
ROL=8'h1C,ROR=8'h1D,SHL=8'h1E,SHR=8'h1F,
SRA=8'h20,SRAV=8'h21,SHLV=8'h22,SHRV=8'h23,ROLV=8'h24,RORV=8'h25,
`ifdef CPU0II
SLTi=8'h26,SLTiu=8'h27, SLT=8'h28,SLTu=8'h29,
`endif
CMP=8'h2A,
CMPu=8'h2B,
JEQ=8'h30,JNE=8'h31,JLT=8'h32,JGT=8'h33,JLE=8'h34,JGE=8'h35,
JMP=8'h36,
`ifdef CPU0II
BEQ=8'h37,BNE=8'h38,
`endif
JALR=8'h39,BAL=8'h3A,JSUB=8'h3B,RET=8'h3C,
MULT=8'h41,MULTu=8'h42,DIV=8'h43,DIVu=8'h44,
MFHI=8'h46,MFLO=8'h47,MTHI=8'h48,MTLO=8'h49,
MFC0=8'h50,MTC0=8'h51,C0MOV=8'h52;
reg [0:0] inExe = 0;
reg [2:0] state, next_state;
reg [2:0] st_taskInt, ns_taskInt;
parameter Reset=3'h0, Fetch=3'h1, Decode=3'h2, Execute=3'h3, MemAccess=3'h4,
WriteBack=3'h5;
integer i;
`ifdef SIMULATE_DELAY_SLOT
reg [0:0] nextInstIsDelaySlot;
reg [0:0] isDelaySlot;
reg signed [31:0] delaySlotNextPC;
`endif
//transform data from the memory to little-endian form
task changeEndian(input [31:0] value, output [31:0] changeEndian); begin
changeEndian = {value[7:0], value[15:8], value[23:16], value[31:24]};
end endtask
// Read Memory Word
task memReadStart(input [31:0] addr, input [1:0] size); begin
mar = addr; // read(m[addr])
m_rw = 1; // Access Mode: read
m_en = 1; // Enable read
m_size = size;
end endtask
// Read Memory Finish, get data
task memReadEnd(output [31:0] data); begin
mdr = dbus; // get memory, dbus = m[addr]
data = mdr; // return to data
m_en = 0; // read complete
end endtask
// Write memory -- addr: address to write, data: data to write
task memWriteStart(input [31:0] addr, input [31:0] data, input [1:0] size);
begin
mar = addr; // write(m[addr], data)
mdr = data;
m_rw = 0; // access mode: write
m_en = 1; // Enable write
m_size = size;
end endtask
task memWriteEnd; begin // Write Memory Finish
m_en = 0; // write complete
end endtask
task regSet(input [3:0] i, input [31:0] data); begin
if (i != 0) R[i] = data;
end endtask
task C0regSet(input [3:0] i, input [31:0] data); begin
if (i < 2) C0R[i] = data;
end endtask
task PCSet(input [31:0] data); begin
`ifdef SIMULATE_DELAY_SLOT
nextInstIsDelaySlot = 1;
delaySlotNextPC = data;
`else
`PC = data;
`endif
end endtask
task retValSet(input [3:0] i, input [31:0] data); begin
if (i != 0)
`ifdef SIMULATE_DELAY_SLOT
R[i] = data + 4;
`else
R[i] = data;
`endif
end endtask
task regHILOSet(input [31:0] data1, input [31:0] data2); begin
HI = data1;
LO = data2;
end endtask
// output a word to Output port (equal to display the word to terminal)
task outw(input [31:0] data); begin
if (`LE) begin // Little Endian
changeEndian(data, data);
end
if (data[7:0] != 8'h00) begin
$write("%c", data[7:0]);
if (data[15:8] != 8'h00)
$write("%c", data[15:8]);
if (data[23:16] != 8'h00)
$write("%c", data[23:16]);
if (data[31:24] != 8'h00)
$write("%c", data[31:24]);
end
end endtask
// output a character (a byte)
task outc(input [7:0] data); begin
$write("%c", data);
end endtask
task taskInterrupt(input [2:0] iMode); begin
if (inExe == 0) begin
case (iMode)
`RESET: begin
`PC = 0; tick = 0; R[0] = 0; `SW = 0; `LR = -1;
`IE = 0; `I0E = 1; `I1E = 1; `I2E = 1;
`I = 0; `I0 = 0; `I1 = 0; `I2 = 0; inExe = 1;
`LE = cfg;
cycles = 0;
end
`ABORT: begin `PC = 4; end
`IRQ: begin `PC = 8; `IE = 0; inExe = 1; end
`ERROR: begin `PC = 12; end
endcase
end
$display("taskInterrupt(%3b)", iMode);
end endtask
task taskExecute; begin
tick = tick+1;
case (state)
Fetch: begin // Tick 1 : instruction fetch, throw PC to address bus,
// memory.read(m[PC])
memReadStart(`PC, `INT32);
pc0 = `PC;
`ifdef SIMULATE_DELAY_SLOT
if (nextInstIsDelaySlot == 1) begin
isDelaySlot = 1;
nextInstIsDelaySlot = 0;
`PC = delaySlotNextPC;
end
else begin
if (isDelaySlot == 1) isDelaySlot = 0;
`PC = `PC+4;
end
`else
`PC = `PC+4;
`endif
next_state = Decode;
end
Decode: begin // Tick 2 : instruction decode, ir = m[PC]
memReadEnd(ir); // IR = dbus = m[PC]
{op,a,b,c} = ir[31:12];
c24 = $signed(ir[23:0]);
c16 = $signed(ir[15:0]);
uc16 = ir[15:0];
c12 = $signed(ir[11:0]);
c5 = ir[4:0];
Ra = R[a];
Rb = R[b];
Rc = R[c];
URa = R[a];
URb = R[b];
URc = R[c];
next_state = Execute;
end
Execute: begin // Tick 3 : instruction execution
case (op)
NOP: ;
// load and store instructions
LD: memReadStart(Rb+c16, `INT32); // LD Ra,[Rb+Cx]; Ra<=[Rb+Cx]
ST: memWriteStart(Rb+c16, Ra, `INT32); // ST Ra,[Rb+Cx]; Ra=>[Rb+Cx]
// LB Ra,[Rb+Cx]; Ra<=(byte)[Rb+Cx]
LB: memReadStart(Rb+c16, `BYTE);
// LBu Ra,[Rb+Cx]; Ra<=(byte)[Rb+Cx]
LBu: memReadStart(Rb+c16, `BYTE);
// SB Ra,[Rb+Cx]; Ra=>(byte)[Rb+Cx]
SB: memWriteStart(Rb+c16, Ra, `BYTE);
LH: memReadStart(Rb+c16, `INT16); // LH Ra,[Rb+Cx]; Ra<=(2bytes)[Rb+Cx]
LHu: memReadStart(Rb+c16, `INT16); // LHu Ra,[Rb+Cx]; Ra<=(2bytes)[Rb+Cx]
// SH Ra,[Rb+Cx]; Ra=>(2bytes)[Rb+Cx]
SH: memWriteStart(Rb+c16, Ra, `INT16);
// Conditional move
MOVZ: if (Rc==0) regSet(a, Rb); // move if Rc equal to 0
MOVN: if (Rc!=0) regSet(a, Rb); // move if Rc not equal to 0
// Mathematic
ADDiu: regSet(a, Rb+c16); // ADDiu Ra, Rb+Cx; Ra<=Rb+Cx
CMP: begin
if (Rb < Rc) `N=1; else `N=0;
// `N=(Rb-Rc<0); // why not work for bash make.sh cpu032I el Makefile.builtins?
`Z=(Rb-Rc==0);
end // CMP Rb, Rc; SW=(Rb >=< Rc)
CMPu: begin
if (URb < URc) `N=1; else `N=0;
`Z=(URb-URc==0);
end // CMPu URb, URc; SW=(URb >=< URc)
ADDu: regSet(a, Rb+Rc); // ADDu Ra,Rb,Rc; Ra<=Rb+Rc
ADD: begin regSet(a, Rb+Rc); if (a < Rb) `V = 1; else `V = 0;
if (`V) begin `I0 = 1; `I = 1; end
end
// ADD Ra,Rb,Rc; Ra<=Rb+Rc
SUBu: regSet(a, Rb-Rc); // SUBu Ra,Rb,Rc; Ra<=Rb-Rc
SUB: begin regSet(a, Rb-Rc); if (Rb < 0 && Rc > 0 && a >= 0)
`V = 1; else `V =0;
if (`V) begin `I0 = 1; `I = 1; end
end // SUB Ra,Rb,Rc; Ra<=Rb-Rc
CLZ: begin
for (i=0; (i<32)&&((Rb&32'h80000000)==32'h00000000); i=i+1) begin
Rb=Rb<<1;
end
regSet(a, i);
end
CLO: begin
for (i=0; (i<32)&&((Rb&32'h80000000)==32'h80000000); i=i+1) begin
Rb=Rb<<1;
end
regSet(a, i);
end
MUL: regSet(a, Rb*Rc); // MUL Ra,Rb,Rc; Ra<=Rb*Rc
DIVu: regHILOSet(URa%URb, URa/URb); // DIVu URa,URb; HI<=URa%URb;
// LO<=URa/URb
// without exception overflow
DIV: begin regHILOSet(Ra%Rb, Ra/Rb);
if ((Ra < 0 && Rb < 0) || (Ra == 0)) `V = 1;
else `V =0; end // DIV Ra,Rb; HI<=Ra%Rb; LO<=Ra/Rb; With overflow
AND: regSet(a, Rb&Rc); // AND Ra,Rb,Rc; Ra<=(Rb and Rc)
ANDi: regSet(a, Rb&uc16); // ANDi Ra,Rb,c16; Ra<=(Rb and c16)
OR: regSet(a, Rb|Rc); // OR Ra,Rb,Rc; Ra<=(Rb or Rc)
ORi: regSet(a, Rb|uc16); // ORi Ra,Rb,c16; Ra<=(Rb or c16)
XOR: regSet(a, Rb^Rc); // XOR Ra,Rb,Rc; Ra<=(Rb xor Rc)
NOR: regSet(a, ~(Rb|Rc)); // NOR Ra,Rb,Rc; Ra<=(Rb nor Rc)
XORi: regSet(a, Rb^uc16); // XORi Ra,Rb,c16; Ra<=(Rb xor c16)
LUi: regSet(a, uc16<<16);
SHL: regSet(a, Rb<<c5); // Shift Left; SHL Ra,Rb,Cx; Ra<=(Rb << Cx)
SRA: regSet(a, (Rb>>>c5)); // Shift Right with signed bit fill;
// https://stackoverflow.com/questions/39911655/how-to-synthesize-hardware-for-sra-instruction
SHR: regSet(a, Rb>>c5); // Shift Right with 0 fill;
// SHR Ra,Rb,Cx; Ra<=(Rb >> Cx)
SHLV: regSet(a, Rb<<Rc); // Shift Left; SHLV Ra,Rb,Rc; Ra<=(Rb << Rc)
SRAV: regSet(a, (Rb>>>Rc)); // Shift Right with signed bit fill;
SHRV: regSet(a, Rb>>Rc); // Shift Right with 0 fill;
// SHRV Ra,Rb,Rc; Ra<=(Rb >> Rc)
ROL: regSet(a, (Rb<<c5)|(Rb>>(32-c5))); // Rotate Left;
ROR: regSet(a, (Rb>>c5)|(Rb<<(32-c5))); // Rotate Right;
ROLV: begin // Can set Rc to -32<=Rc<=32 more efficiently.
while (Rc < -32) Rc=Rc+32;
while (Rc > 32) Rc=Rc-32;
regSet(a, (Rb<<Rc)|(Rb>>(32-Rc))); // Rotate Left;
end
RORV: begin
while (Rc < -32) Rc=Rc+32;
while (Rc > 32) Rc=Rc-32;
regSet(a, (Rb>>Rc)|(Rb<<(32-Rc))); // Rotate Right;
end
MFLO: regSet(a, LO); // MFLO Ra; Ra<=LO
MFHI: regSet(a, HI); // MFHI Ra; Ra<=HI
MTLO: LO = Ra; // MTLO Ra; LO<=Ra
MTHI: HI = Ra; // MTHI Ra; HI<=Ra
MULT: {HI, LO}=Ra*Rb; // MULT Ra,Rb; HI<=((Ra*Rb)>>32);
// LO<=((Ra*Rb) and 0x00000000ffffffff);
// with exception overflow
MULTu: {HI, LO}=URa*URb; // MULT URa,URb; HI<=((URa*URb)>>32);
// LO<=((URa*URb) and 0x00000000ffffffff);
// without exception overflow
MFC0: regSet(a, C0R[b]); // MFC0 a, b; Ra<=C0R[Rb]
MTC0: C0regSet(a, Rb); // MTC0 a, b; C0R[a]<=Rb
C0MOV: C0regSet(a, C0R[b]); // C0MOV a, b; C0R[a]<=C0R[b]
`ifdef CPU0II
// set
SLT: if (Rb < Rc) R[a]=1; else R[a]=0;
SLTu: if (URb < URc) R[a]=1; else R[a]=0;
SLTi: if (Rb < c16) R[a]=1; else R[a]=0;
SLTiu: if (URb < uc16) R[a]=1; else R[a]=0;
// Branch Instructions
BEQ: if (Ra==Rb) PCSet(`PC+c16);
BNE: if (Ra!=Rb) PCSet(`PC+c16);
`endif
// Jump Instructions
JEQ: if (`Z) PCSet(`PC+c24); // JEQ Cx; if SW(=) PC PC+Cx
JNE: if (!`Z) PCSet(`PC+c24); // JNE Cx; if SW(!=) PC PC+Cx
JLT: if (`N) PCSet(`PC+c24); // JLT Cx; if SW(<) PC PC+Cx
JGT: if (!`N&&!`Z) PCSet(`PC+c24); // JGT Cx; if SW(>) PC PC+Cx
JLE: if (`N || `Z) PCSet(`PC+c24); // JLE Cx; if SW(<=) PC PC+Cx
JGE: if (!`N || `Z) PCSet(`PC+c24); // JGE Cx; if SW(>=) PC PC+Cx
JMP: `PC = `PC+c24; // JMP Cx; PC <= PC+Cx
JALR: begin retValSet(a, `PC); PCSet(Rb); end // JALR Ra,Rb; Ra<=PC; PC<=Rb
BAL: begin `LR = `PC; `PC = `PC+c24; end // BAL Cx; LR<=PC; PC<=PC+Cx
JSUB: begin retValSet(14, `PC); PCSet(`PC+c24); end // JSUB Cx; LR<=PC; PC<=PC+Cx
RET: begin PCSet(Ra); end // RET; PC <= Ra
default :
$display("%4dns %8x : OP code %8x not support", $stime, pc0, op);
endcase
if (`IE && `I && (`I0E && `I0 || `I1E && `I1 || `I2E && `I2)) begin
`EPC = `PC;
next_state = Fetch;
inExe = 0;
end else
next_state = MemAccess;
end
MemAccess: begin
case (op)
ST, SB, SH :
memWriteEnd(); // write memory complete
endcase
next_state = WriteBack;
end
WriteBack: begin // Read/Write finish, close memory
case (op)
LB, LBu :
memReadEnd(R[a]); //read memory complete
LH, LHu :
memReadEnd(R[a]);
LD : begin
memReadEnd(R[a]);
if (`D)
$display("%4dns %8x : %8x m[%-04x+%-04x]=%8x SW=%8x", $stime, pc0,
ir, R[b], c16, R[a], `SW);
end
endcase
case (op)
LB : begin
if (R[a] > 8'h7f) R[a]=R[a]|32'hffffff80;
end
LH : begin
if (R[a] > 16'h7fff) R[a]=R[a]|32'hffff8000;
end
endcase
case (op)
MULT, MULTu, DIV, DIVu, MTHI, MTLO :
if (`D)
$display("%4dns %8x : %8x HI=%8x LO=%8x SW=%8x", $stime, pc0, ir, HI,
LO, `SW);
ST : begin
if (`D)
$display("%4dns %8x : %8x m[%-04x+%-04x]=%8x SW=%8x", $stime, pc0,
ir, R[b], c16, R[a], `SW);
if (R[b]+c16 == `IOADDR) begin
outw(R[a]);
end
end
SB : begin
if (`D)
$display("%4dns %8x : %8x m[%-04x+%-04x]=%c SW=%8x, R[a]=%8x",
$stime, pc0, ir, R[b], c16, R[a][7:0], `SW, R[a]);
if (R[b]+c16 == `IOADDR) begin
if (`LE)
outc(R[a][7:0]);
else
outc(R[a][7:0]);
end
end
MFC0, MTC0 :
if (`D)
$display("%4dns %8x : %8x R[%02d]=%-8x C0R[%02d]=%-8x SW=%8x",
$stime, pc0, ir, a, R[a], a, C0R[a], `SW);
C0MOV :
if (`D)
$display("%4dns %8x : %8x C0R[%02d]=%-8x C0R[%02d]=%-8x SW=%8x",
$stime, pc0, ir, a, C0R[a], b, C0R[b], `SW);
default :
if (`D) // Display the written register content
$display("%4dns %8x : %8x R[%02d]=%-8x SW=%8x", $stime, pc0, ir,
a, R[a], `SW);
endcase
if (`PC < 0) begin
$display("total cpu cycles = %-d", cycles);
$display("RET to PC < 0, finished!");
$finish;
end
next_state = Fetch;
end
endcase
end endtask
always @(posedge clock) begin
if (inExe == 0 && (state == Fetch) && (`IE && `I) && (`I0E && `I0)) begin
// software int
`M = `IRQ;
taskInterrupt(`IRQ);
m_en = 0;
state = Fetch;
end else if (inExe == 0 && (state == Fetch) && (`IE && `I) &&
((`I1E && `I1) || (`I2E && `I2)) ) begin
`M = `IRQ;
taskInterrupt(`IRQ);
m_en = 0;
state = Fetch;
end else if (inExe == 0 && itype == `RESET) begin
// Condition itype == `RESET must after the other `IE condition
taskInterrupt(`RESET);
`M = `RESET;
state = Fetch;
end else begin
`ifdef TRACE
`D = 1; // Trace register content at beginning
`endif
taskExecute();
state = next_state;
end
pc = `PC;
cycles = cycles + 1;
end
endmodule
module memory0(input clock, reset, en, rw, input [1:0] m_size,
input [31:0] abus, dbus_in, output [31:0] dbus_out,
output cfg);
reg [31:0] mconfig [0:0];
reg [7:0] m [0:`MEMSIZE-1];
reg [31:0] data;
integer i;
`define LE mconfig[0][0:0] // Endian bit, Big Endian:0, Little Endian:1
initial begin
// erase memory
for (i=0; i < `MEMSIZE; i=i+1) begin
m[i] = `MEMEMPTY;
end
// load config from file to memory
$readmemh("cpu0.config", mconfig);
// load program from file to memory
$readmemh("cpu0.hex", m);
// display memory contents
`ifdef TRACE
for (i=0; i < `MEMSIZE && (m[i] != `MEMEMPTY || m[i+1] != `MEMEMPTY ||
m[i+2] != `MEMEMPTY || m[i+3] != `MEMEMPTY); i=i+4) begin
$display("%8x: %8x", i, {m[i], m[i+1], m[i+2], m[i+3]});
end
`endif
end
always @(clock or abus or en or rw or dbus_in)
begin
if (abus >= 0 && abus <= `MEMSIZE-4) begin
if (en == 1 && rw == 0) begin // r_w==0:write
data = dbus_in;
if (`LE) begin // Little Endian
case (m_size)
`BYTE: {m[abus]} = dbus_in[7:0];
`INT16: {m[abus], m[abus+1] } = {dbus_in[7:0], dbus_in[15:8]};
`INT24: {m[abus], m[abus+1], m[abus+2]} =
{dbus_in[7:0], dbus_in[15:8], dbus_in[23:16]};
`INT32: {m[abus], m[abus+1], m[abus+2], m[abus+3]} =
{dbus_in[7:0], dbus_in[15:8], dbus_in[23:16], dbus_in[31:24]};
endcase
end else begin // Big Endian
case (m_size)
`BYTE: {m[abus]} = dbus_in[7:0];
`INT16: {m[abus], m[abus+1] } = dbus_in[15:0];
`INT24: {m[abus], m[abus+1], m[abus+2]} = dbus_in[23:0];
`INT32: {m[abus], m[abus+1], m[abus+2], m[abus+3]} = dbus_in;
endcase
end
end else if (en == 1 && rw == 1) begin // r_w==1:read
if (`LE) begin // Little Endian
case (m_size)
`BYTE: data = {8'h00, 8'h00, 8'h00, m[abus]};
`INT16: data = {8'h00, 8'h00, m[abus+1], m[abus]};
`INT24: data = {8'h00, m[abus+2], m[abus+1], m[abus]};
`INT32: data = {m[abus+3], m[abus+2], m[abus+1], m[abus]};
endcase
end else begin // Big Endian
case (m_size)
`BYTE: data = {8'h00 , 8'h00, 8'h00, m[abus] };
`INT16: data = {8'h00 , 8'h00, m[abus], m[abus+1]};
`INT24: data = {8'h00 , m[abus], m[abus+1], m[abus+2]};
`INT32: data = {m[abus], m[abus+1], m[abus+2], m[abus+3]};
endcase
end
end else
data = 32'hZZZZZZZZ;
end else
data = 32'hZZZZZZZZ;
end
assign dbus_out = data;
assign cfg = mconfig[0][0:0];
endmodule
module main;
reg clock;
reg [2:0] itype;
wire [2:0] tick;
wire [31:0] pc, ir, mar, mdr, dbus;
wire m_en, m_rw;
wire [1:0] m_size;
wire cfg;
cpu0 cpu(.clock(clock), .itype(itype), .pc(pc), .tick(tick), .ir(ir),
.mar(mar), .mdr(mdr), .dbus(dbus), .m_en(m_en), .m_rw(m_rw), .m_size(m_size),
.cfg(cfg));
memory0 mem(.clock(clock), .reset(reset), .en(m_en), .rw(m_rw),
.m_size(m_size), .abus(mar), .dbus_in(mdr), .dbus_out(dbus), .cfg(cfg));
initial
begin
clock = 0;
itype = `RESET;
`TIMEOUT $finish;
end
always #10 clock=clock+1;
endmodule
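The delay-slot handling in PCSet and the Fetch state can be hard to follow in the flat listing above, so here is a small C++ sketch of the same bookkeeping. It is a simplified model under stated assumptions: only the program counter is tracked, a single taken branch is simulated, and the names DelaySlotPc, branchTo, and fetchTrace are hypothetical rather than taken from lbdex.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the SIMULATE_DELAY_SLOT bookkeeping in cpu0.v.
struct DelaySlotPc {
  std::uint32_t pc = 0;
  bool nextIsDelaySlot = false;     // mirrors nextInstIsDelaySlot
  std::uint32_t pendingTarget = 0;  // mirrors delaySlotNextPC

  // Fetch state: return the address being fetched, then advance pc.
  std::uint32_t fetch() {
    const std::uint32_t fetched = pc;
    if (nextIsDelaySlot) {
      nextIsDelaySlot = false;
      pc = pendingTarget;  // the instruction after the delay slot is the target
    } else {
      pc += 4;
    }
    return fetched;
  }

  // Execute state of a taken branch: defer the jump by one instruction.
  void branchTo(std::uint32_t target) {
    nextIsDelaySlot = true;
    pendingTarget = target;
  }
};

// Addresses fetched when the instruction at branchAt jumps to target.
std::vector<std::uint32_t> fetchTrace(std::uint32_t start,
                                      std::uint32_t branchAt,
                                      std::uint32_t target, int count) {
  assert(count >= 0);
  DelaySlotPc cpu;
  cpu.pc = start;
  std::vector<std::uint32_t> trace;
  for (int i = 0; i < count; ++i) {
    const std::uint32_t addr = cpu.fetch();
    trace.push_back(addr);
    if (addr == branchAt) cpu.branchTo(target);
  }
  return trace;
}
```

Calling `fetchTrace(0x100, 0x104, 0x200, 5)` yields the addresses 0x100, 0x104, 0x108, 0x200, 0x204: the instruction at 0x108 sits in the delay slot and still executes before control reaches the branch target.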
lbdex/verilog/Makefile
#TRACE=-D TRACE
all:
iverilog ${TRACE} -o cpu0Is cpu0.v
iverilog ${TRACE} -D CPU0II -o cpu0IIs cpu0.v
.PHONY: clean
clean:
rm -rf cpu0.hex cpu0Is cpu0IIs
rm -f *~ cpu0.config
Since the Cpu0 Verilog machine supports both big-endian and little-endian modes, the memory and CPU modules communicate this configuration through a dedicated wire.
The endian information is stored in the ROM of the memory module. Upon system startup, the memory module reads this configuration and sends the endian setting to the CPU via the connected wire.
This mechanism is implemented according to the following code snippet:
lbdex/verilog/cpu0.v
assign cfg = mconfig[0][0:0];
...
wire cfg;
cpu0 cpu(.clock(clock), .itype(itype), .pc(pc), .tick(tick), .ir(ir),
.mar(mar), .mdr(mdr), .dbus(dbus), .m_en(m_en), .m_rw(m_rw), .m_size(m_size),
.cfg(cfg));
memory0 mem(.clock(clock), .reset(reset), .en(m_en), .rw(m_rw),
.m_size(m_size), .abus(mar), .dbus_in(mdr), .dbus_out(dbus), .cfg(cfg));
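As an illustration of what the endian setting changes, the sketch below assembles a 32-bit word from four bytes in either byte order, following the INT32 read cases of the memory0 module. The function name readWord and the use of a std::vector as memory are assumptions for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: read a 32-bit word from byte-addressed memory.
// littleEndian plays the role of the cfg wire (0: big endian, 1: little endian).
std::uint32_t readWord(const std::vector<std::uint8_t>& mem, std::size_t addr,
                       bool littleEndian) {
  assert(addr + 4 <= mem.size() && "word read outside modeled memory");
  const std::uint32_t b0 = mem[addr];
  const std::uint32_t b1 = mem[addr + 1];
  const std::uint32_t b2 = mem[addr + 2];
  const std::uint32_t b3 = mem[addr + 3];
  if (littleEndian) {
    // Lowest address holds the least significant byte.
    return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
  }
  // Lowest address holds the most significant byte.
  return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
}
```

With the bytes 09 dd ff f8 stored at address 0, the big-endian read returns 0x09ddfff8 and the little-endian read returns 0xf8ffdd09, which is why the CPU and the memory module must agree on the setting.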
Verify backend¶
Now let’s compile ch_run_backend.cpp as shown below. Since code grows from low to high addresses and the stack grows from high to low addresses, the $sp register is set to 0x7fffc. This is because cpu0.v is assumed to use 0x80000 bytes of memory.
lbdex/input/start.h
#ifndef _START_H_
#define _START_H_
#include "config.h"
#define SET_SW \
asm("andi $sw, $zero, 0"); \
asm("ori $sw, $sw, 0x1e00"); // enable all interrupts
#define initRegs() \
asm("addiu $1, $zero, 0"); \
asm("addiu $2, $zero, 0"); \
asm("addiu $3, $zero, 0"); \
asm("addiu $4, $zero, 0"); \
asm("addiu $5, $zero, 0"); \
asm("addiu $t9, $zero, 0"); \
asm("addiu $7, $zero, 0"); \
asm("addiu $8, $zero, 0"); \
asm("addiu $9, $zero, 0"); \
asm("addiu $10, $zero, 0"); \
SET_SW; \
asm("addiu $fp, $zero, 0");
#endif
lbdex/input/boot.cpp
#include "start.h"
// boot:
asm("boot:");
// asm("_start:");
asm("jmp 12"); // RESET: jmp RESET_START;
asm("jmp 4"); // ERROR: jmp ERR_HANDLE;
asm("jmp 4"); // IRQ: jmp IRQ_HANDLE;
asm("jmp -4"); // ERR_HANDLE: jmp ERR_HANDLE; (loop forever)
// RESET_START:
initRegs();
asm("addiu $gp, $ZERO, 0");
asm("addiu $lr, $ZERO, -1");
INIT_SP;
asm("mfc0 $3, $pc");
asm("addiu $3, $3, 0x8"); // Assume main() entry point is at the next next
// instruction.
asm("jr $3");
asm("nop");
lbdex/input/print.h
#ifndef _PRINT_H_
#define _PRINT_H_
#include "start.h"
void print_char(const char c);
void dump_mem(unsigned char *str, int n);
void print_string(const char *str);
void print_integer(int x);
#endif
lbdex/input/print.cpp
#include "print.h"
#include "itoa.cpp"
// For memory IO
void print_char(const char c)
{
char *p = (char*)IOADDR;
*p = c;
return;
}
void print_string(const char *str)
{
const char *p;
for (p = str; *p != '\0'; p++)
print_char(*p);
print_char(*p);
print_char('\n');
return;
}
// For memory IO
void print_integer(int x)
{
char str[INT_DIGITS + 2];
itoa(str, x);
print_string(str);
return;
}
lbdex/input/ch_nolld.h
#include "debug.h"
#include "boot.cpp"
#include "print.h"
int test_nolld();
lbdex/input/ch_nolld.cpp
#define TEST_ROXV
#define RUN_ON_VERILOG
#include "print.cpp"
#include "ch4_1_math.cpp"
#include "ch4_1_rotate.cpp"
#include "ch4_1_mult2.cpp"
#include "ch4_1_mod.cpp"
#include "ch4_1_div.cpp"
#include "ch4_2_logic.cpp"
#include "ch7_1_localpointer.cpp"
#include "ch7_1_char_short.cpp"
#include "ch7_1_bool.cpp"
#include "ch7_1_longlong.cpp"
#include "ch7_1_vector.cpp"
#include "ch8_1_ctrl.cpp"
#include "ch8_2_deluselessjmp.cpp"
#include "ch8_2_select.cpp"
#include "ch9_1_longlong.cpp"
#include "ch9_3_vararg.cpp"
#include "ch9_3_stacksave.cpp"
#include "ch9_3_bswap.cpp"
#include "ch9_3_alloc.cpp"
#include "ch11_2.cpp"
// Test build only for the following files on build-run_backend.sh since it
// needs lld linker support.
// Test in build-slink.sh
#include "ch6_1.cpp"
#include "ch9_1_struct.cpp"
#include "ch9_1_constructor.cpp"
#include "ch9_3_template.cpp"
#include "ch12_inherit.cpp"
void test_asm_build()
{
#include "ch11_1.cpp"
#ifdef CPU032II
#include "ch11_1_2.cpp"
#endif
}
int test_rotate()
{
int a = test_rotate_left1(); // rolv 4, 30 = 1
int b = test_rotate_left(); // rol 8, 30 = 2
int c = test_rotate_right(); // rorv 1, 30 = 4
return (a+b+c);
}
int test_nolld()
{
bool pass = true;
int a = 0;
a = test_math();
print_integer(a); // a = 68
if (a != 68) pass = false;
a = test_rotate();
print_integer(a); // a = 7
if (a != 7) pass = false;
a = test_mult();
print_integer(a); // a = 0
if (a != 0) pass = false;
a = test_mod();
print_integer(a); // a = 0
if (a != 0) pass = false;
a = test_div();
print_integer(a); // a = 253
if (a != 253) pass = false;
a = test_local_pointer();
print_integer(a); // a = 3
if (a != 3) pass = false;
a = (int)test_load_bool();
print_integer(a); // a = 1
if (a != 1) pass = false;
a = test_andorxornotcomplement();
print_integer(a); // a = 13
if (a != 13) pass = false;
a = test_setxx();
print_integer(a); // a = 3
if (a != 3) pass = false;
a = test_signed_char();
print_integer(a); // a = -126
if (a != -126) pass = false;
a = test_unsigned_char();
print_integer(a); // a = 130
if (a != 130) pass = false;
a = test_signed_short();
print_integer(a); // a = -32766
if (a != -32766) pass = false;
a = test_unsigned_short();
print_integer(a); // a = 32770
if (a != 32770) pass = false;
long long b = test_longlong();
print_integer((int)(b >> 32)); // 393307
if ((int)(b >> 32) != 393307) pass = false;
print_integer((int)b); // 16777218
if ((int)(b) != 16777218) pass = false;
a = test_cmplt_short();
print_integer(a); // a = -3
if (a != -3) pass = false;
a = test_cmplt_long();
print_integer(a); // a = -4
if (a != -4) pass = false;
a = test_control1();
print_integer(a); // a = 51
if (a != 51) pass = false;
a = test_DelUselessJMP();
print_integer(a); // a = 2
if (a != 2) pass = false;
a = test_movx_1();
print_integer(a); // a = 3
if (a != 3) pass = false;
a = test_movx_2();
print_integer(a); // a = 1
if (a != 1) pass = false;
print_integer(2147483647); // test mod % (mult) from itoa.cpp
print_integer(-2147483648); // test mod % (multu) from itoa.cpp
a = test_sum_longlong();
print_integer(a); // a = 9
if (a != 9) pass = false;
a = test_va_arg();
print_integer(a); // a = 12
if (a != 12) pass = false;
a = test_stacksaverestore(100);
print_integer(a); // a = 5
if (a != 5) pass = false;
a = test_bswap();
print_integer(a); // a = 0
if (a != 0) pass = false;
a = test_alloc();
print_integer(a); // a = 31
if (a != 31) pass = false;
a = test_inlineasm();
print_integer(a); // a = 49
if (a != 49) pass = false;
return pass;
}
lbdex/input/ch_run_backend.cpp
#include "ch_nolld.h"
int main()
{
bool pass = true;
pass = test_nolld();
return pass;
}
#include "ch_nolld.cpp"
lbdex/input/functions.sh
prologue() {
if [ $ARG_NUM == 0 ]; then
echo "usage: bash $sh_name cpu_type endian"
echo " cpu_type: cpu032I or cpu032II"
echo " endian: eb (big endian, default) or el (little endian)"
echo "for example:"
echo " bash build-slinker.sh cpu032I eb"
exit 1;
fi
if [ "$CPU" != cpu032I ] && [ "$CPU" != cpu032II ]; then
echo "1st argument is cpu032I or cpu032II"
exit 1
fi
OS=`uname -s`
echo "OS =" ${OS}
TOOLDIR=~/llvm/test/build/bin
CLANG=~/llvm/test/build/bin/clang
CPU=$CPU
echo "CPU =" "${CPU}"
if [ "$ENDIAN" != "" ] && [ $ENDIAN != el ] && [ $ENDIAN != eb ]; then
echo "2nd argument is eb (big endian, default) or el (little endian)"
exit 1
fi
if [ "$ENDIAN" == eb ]; then
ENDIAN=
fi
echo "ENDIAN =" "${ENDIAN}"
bash clean.sh
}
isLittleEndian() {
echo "ENDIAN = " "$ENDIAN"
if [ "$ENDIAN" == "LittleEndian" ] ; then
LE="true"
elif [ "$ENDIAN" == "BigEndian" ] ; then
LE="false"
else
echo "!ENDIAN unknown"
exit 1
fi
}
elf2hex() {
${TOOLDIR}/llvm-objdump -elf2hex -le=$LE a.out > ../verilog/cpu0.hex
if [ $LE == "true" ] ; then
echo "1 /* 0: big endian, 1: little endian */" > ../verilog/cpu0.config
else
echo "0 /* 0: big endian, 1: little endian */" > ../verilog/cpu0.config
fi
cat ../verilog/cpu0.config
}
epilogue() {
ENDIAN=`${TOOLDIR}/llvm-readobj -h a.out|grep "DataEncoding"|awk '{print $2}'`
isLittleEndian;
elf2hex;
}
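The isLittleEndian and elf2hex helpers above boil down to a small mapping from the DataEncoding value reported by llvm-readobj to the one-line cpu0.config file. The C++ sketch below restates that mapping; the function names are hypothetical, and throwing an exception stands in for the script's `exit 1`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical restatement of the shell logic: map the DataEncoding value
// ("LittleEndian" or "BigEndian") to the line written to cpu0.config.
std::string configLineFor(const std::string& dataEncoding) {
  const std::string comment = " /* 0: big endian, 1: little endian */";
  if (dataEncoding == "LittleEndian") return "1" + comment;
  if (dataEncoding == "BigEndian") return "0" + comment;
  // The script prints "!ENDIAN unknown" and exits in this case.
  throw std::invalid_argument("unknown ENDIAN: " + dataEncoding);
}

// Small self-check showing the two accepted inputs.
void checkConfigLines() {
  assert(configLineFor("LittleEndian") ==
         "1 /* 0: big endian, 1: little endian */");
  assert(configLineFor("BigEndian") ==
         "0 /* 0: big endian, 1: little endian */");
}
```

The Verilog memory module then reads this line with $readmemh, so only the leading 0 or 1 matters; the rest is a comment.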
lbdex/input/build-run_backend.sh
#!/usr/bin/env bash
# for example:
# bash build-run_backend.sh cpu032I el
# bash build-run_backend.sh cpu032II eb
source functions.sh
sh_name=build-run_backend.sh
ARG_NUM=$#
CPU=$1
ENDIAN=$2
DEFFLAGS=""
if [ "$CPU" == cpu032II ] ; then
DEFFLAGS=${DEFFLAGS}" -DCPU032II"
fi
echo ${DEFFLAGS}
prologue;
# ch8_2_select_global_pic.cpp just for compile build test only, without running
# on verilog.
$CLANG ${DEFFLAGS} -target mips-unknown-linux-gnu -c ch8_2_select_global_pic.cpp \
-emit-llvm -o ch8_2_select_global_pic.bc
${TOOLDIR}/llc -march=cpu0${ENDIAN} -mcpu=${CPU} -relocation-model=pic \
-filetype=obj ch8_2_select_global_pic.bc -o ch8_2_select_global_pic.cpu0.o
$CLANG ${DEFFLAGS} -target mips-unknown-linux-gnu -c ch_run_backend.cpp \
-emit-llvm -o ch_run_backend.bc
echo "${TOOLDIR}/llc -march=cpu0${ENDIAN} -mcpu=${CPU} -relocation-model=static \
-filetype=obj -enable-cpu0-tail-calls ch_run_backend.bc -o ch_run_backend.cpu0.o"
${TOOLDIR}/llc -march=cpu0${ENDIAN} -mcpu=${CPU} -relocation-model=static \
-filetype=obj -enable-cpu0-tail-calls ch_run_backend.bc -o ch_run_backend.cpu0.o
# print must be on a single line; otherwise the output splits into 2 lines
${TOOLDIR}/llvm-objdump --section=.text -d ch_run_backend.cpu0.o | tail -n +8| awk \
'{print "/* " $1 " */\t" $2 " " $3 " " $4 " " $5 "\t/* " $6"\t" $7" " $8" " $9" " $10 "\t*/"}' \
> ../verilog/cpu0.hex
ENDIAN=`${TOOLDIR}/llvm-readobj -h ch_run_backend.cpu0.o|grep "DataEncoding"|awk '{print $2}'`
isLittleEndian;
if [ $LE == "true" ] ; then
echo "1 /* 0: big ENDIAN, 1: little ENDIAN */" > ../verilog/cpu0.config
else
echo "0 /* 0: big ENDIAN, 1: little ENDIAN */" > ../verilog/cpu0.config
fi
cat ../verilog/cpu0.config
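The awk one-liner in the script rewrites each disassembly line into a form that $readmemh accepts: the address and the mnemonic are wrapped in comments, leaving only the four instruction bytes as data. The following C++ sketch reproduces that formatting for a single line. The function name toHexLine is hypothetical, and the sample input in the self-check assumes an llvm-objdump line of the form "address: four bytes, mnemonic, operands", which may differ between LLVM versions.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical C++ equivalent of the awk print statement in
// build-run_backend.sh. Fields are whitespace-separated and 1-based, as in awk.
std::string toHexLine(const std::string& objdumpLine) {
  std::istringstream in(objdumpLine);
  std::vector<std::string> f(11);  // f[1]..f[10]; missing fields stay empty
  for (int i = 1; i <= 10 && (in >> f[i]); ++i) {
  }
  return "/* " + f[1] + " */\t" +                        // address as a comment
         f[2] + " " + f[3] + " " + f[4] + " " + f[5] +   // four data bytes
         "\t/* " + f[6] + "\t" + f[7] + " " + f[8] + " " + f[9] + " " + f[10] +
         "\t*/";                                         // disassembly as a comment
}

// Self-check on an assumed sample line.
void checkToHexLine() {
  const std::string line = "       0:\t09 dd ff f8 \taddiu\t$sp, $sp, -8";
  assert(toHexLine(line) ==
         "/* 0: */\t09 dd ff f8\t/* addiu\t$sp, $sp, -8 \t*/");
}
```

Because everything except the four bytes ends up inside comments, the simulator loads exactly one byte per memory cell, matching the 8-bit wide memory array in memory0.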
To run the program without a linker at this point, boot.cpp must be placed at the beginning of the code, with main() of ch_run_backend.cpp immediately after it.
Let’s run Chapter11_2/ on the input file ch_run_backend.cpp, using llvm-objdump -d to generate the hex file via build-run_backend.sh, then feed the hex file to the Cpu0 Verilog simulator to get the output shown below.
Note that ch_run_backend.cpp has to be compiled with the option clang -target mips-unknown-linux-gnu, since the example code ch9_3_vararg.cpp, which uses vararg, needs to be compiled with this option. For the other example code, this option makes no difference compared with the default.
JonathantekiiMac:input Jonathan$ pwd
/Users/Jonathan/llvm/test/lbdex/input
JonathantekiiMac:input Jonathan$ bash build-run_backend.sh cpu032I eb
JonathantekiiMac:input Jonathan$ cd ../verilog
JonathantekiiMac:verilog Jonathan$ pwd
/Users/Jonathan/llvm/test/lbdex/verilog
JonathantekiiMac:verilog Jonathan$ make
JonathantekiiMac:verilog Jonathan$ ./cpu0Is
WARNING: cpu0Is.v:386: $readmemh(cpu0.hex): Not enough words in the file for the
taskInterrupt(001)
68
7
0
0
253
3
1
13
3
-126
130
-32766
32770
393307
16777218
3
4
51
2
3
1
2147483647
-2147483648
15
5
0
31
49
total cpu cycles = 50645
RET to PC < 0, finished!
JonathantekiiMac:input Jonathan$ bash build-run_backend.sh cpu032II eb
JonathantekiiMac:input Jonathan$ cd ../verilog
JonathantekiiMac:verilog Jonathan$ ./cpu0IIs
...
total cpu cycles = 48335
RET to PC < 0, finished!
The “total CPU cycles” are calculated in this Verilog simulator to allow performance review of both the backend compiler and the CPU.
Only CPU cycles are counted in this implementation, as I/O cycle times are unknown.
As explained in Chapter “Control Flow Statements”, cpu032II, which uses the instructions slt and beq, performs better than cpu032I, which uses cmp and jeq.
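To put the two cycle counts from the runs above side by side, the short C++ sketch below computes the relative reduction. The function name is hypothetical; the inputs 50645 (cpu032I) and 48335 (cpu032II) are the "total cpu cycles" values printed by the simulator.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: percentage of cycles saved relative to a baseline run.
double cycleReductionPercent(std::uint64_t baseline, std::uint64_t improved) {
  assert(baseline > 0 && improved <= baseline);
  return 100.0 * static_cast<double>(baseline - improved) /
         static_cast<double>(baseline);
}
```

For `cycleReductionPercent(50645, 48335)` the difference is 2310 cycles, or roughly 4.6% fewer cycles for cpu032II on this test program.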
The instruction jmp has no delay slot, making it preferable for use in dynamic linker implementations.
You can trace memory binary code and changes to destination registers at every instruction execution by uncommenting TRACE in the Makefile, as shown below:
lbdex/verilog/Makefile
TRACE=-D TRACE
JonathantekiiMac:raw Jonathan$ ./cpu0Is
WARNING: cpu0.v:386: $readmemh(cpu0.hex): Not enough words in the file for the
requested range [0:28671].
00000000: 2600000c
00000004: 26000004
00000008: 26000004
0000000c: 26fffffc
00000010: 09100000
00000014: 09200000
...
taskInterrupt(001)
1530ns 00000054 : 02ed002c m[28620+44 ]=-1 SW=00000000
1610ns 00000058 : 02bd0028 m[28620+40 ]=0 SW=00000000
...
RET to PC < 0, finished!
As shown in the result above, cpu0.v dumps the memory content after reading the input file cpu0.hex. Next, it runs instructions from address 0 and prints each destination register value in the fourth column.
The first column is the timestamp in nanoseconds. The second column is the instruction address. The third column is the instruction content.
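The first three columns of a trace line can be pulled apart mechanically, and the instruction word can then be split into the same fields that the Decode state extracts. The C++ sketch below does both. The type and function names are hypothetical, and the parser only assumes the "time ns, address, colon, instruction" prefix seen in the trace above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical record for the first three trace columns.
struct TraceRecord {
  unsigned long long timeNs = 0;  // column 1: simulation time in ns
  unsigned int pc = 0;            // column 2: instruction address
  unsigned int ir = 0;            // column 3: instruction word
};

// Fields of a Cpu0 instruction word, following {op,a,b,c} = ir[31:12].
struct Fields {
  unsigned int op;  // ir[31:24]
  unsigned int a;   // ir[23:20]
  unsigned int b;   // ir[19:16]
  int c16;          // ir[15:0], sign-extended
};

// Parse the leading columns of a TRACE line; returns false on a mismatch.
bool parseTraceLine(const char* line, TraceRecord& out) {
  assert(line != nullptr);
  return std::sscanf(line, "%lluns %x : %x", &out.timeNs, &out.pc, &out.ir) == 3;
}

Fields decodeFields(std::uint32_t ir) {
  Fields f;
  f.op = ir >> 24;
  f.a = (ir >> 20) & 0xF;
  f.b = (ir >> 16) & 0xF;
  f.c16 = static_cast<std::int16_t>(ir & 0xFFFF);
  return f;
}
```

Applied to the line at 1530ns, the instruction word 02ed002c decodes to op 0x02 (ST), a = 14 (the link register), b = 13 (the stack pointer), and c16 = 44, which agrees with the m[28620+44]=-1 text in the fourth column.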
Most of the example code discussed in previous chapters is verified by printing variables using print_integer().
Since the cpu0.v machine is written in Verilog, it is assumed to be capable of running on a real FPGA device (though I have not tested this myself). The actual output hardware interface or port depends on the specific output device, such as RS232, speaker, LED, etc. You must implement the I/O interface or port and wire your I/O device accordingly when programming an FPGA.
By running the compiled code on the Verilog simulator, the compiled result from the Cpu0 backend and the total CPU cycles can be verified and measured.
Currently, this Cpu0 Verilog implementation does not support pipeline architecture. However, based on the instruction set, it can be extended to a pipelined model.
The cycle count of the pipelined Cpu0 model is expected to be more than 1/5 of the “total CPU cycles” shown above, due to dependencies between instructions.
Although the Verilog simulator is slow for running full system programs and does not count cycles for cache and I/O operations, it provides a simple and effective way to validate CPU design ideas in the early development stages using small program patterns.
Creating a full system simulator is complex. While the Wiki website [7] provides tools for building simulators, doing so requires significant effort.
To generate cpu032I code in little-endian format, you can run the following command. The script build-run_backend.sh writes the endian configuration to ../verilog/cpu0.config as shown below.
JonathantekiiMac:input Jonathan$ bash build-run_backend.sh cpu032I el
../verilog/cpu0.config
1 /* 0: big endian, 1: little endian */
The following files test more features.
lbdex/input/ch_nolld2.h
#include "debug.h"
#include "boot.cpp"
#include "print.h"
int test_nolld2();
lbdex/input/ch_nolld2.cpp
#include "print.cpp"
#include "ch9_3_alloc.cpp"
int test_nolld2()
{
bool pass = true;
int a = 0;
a = test_alloc();
print_integer(a); // a = 31
if (a != 31) pass = false;
return pass;
}
lbdex/input/ch_run_backend2.cpp
#include "ch_nolld2.h"
int main()
{
bool pass = true;
pass = test_nolld2();
return pass;
}
#include "ch_nolld2.cpp"
lbdex/input/build-run_backend2.sh
#!/usr/bin/env bash
source functions.sh
sh_name=build-run_backend2.sh
ARG_NUM=$#
CPU=$1
ENDIAN=$2
DEFFLAGS=""
if [ "$CPU" == cpu032II ] ; then
DEFFLAGS=${DEFFLAGS}" -DCPU032II"
fi
echo ${DEFFLAGS}
prologue;
${CLANG} ${DEFFLAGS} -c ch_run_backend2.cpp \
-emit-llvm -o ch_run_backend2.bc
${TOOLDIR}/llc -march=cpu0${ENDIAN} -mcpu=${CPU} -relocation-model=static \
-filetype=obj ch_run_backend2.bc -o ch_run_backend2.cpu0.o
${TOOLDIR}/llvm-objdump -d ch_run_backend2.cpu0.o | tail -n +8| awk \
'{print "/* " $1 " */\t" $2 " " $3 " " $4 " " $5 "\t/* " $6"\t" $7" " $8" " $9" " $10 "\t*/"}' > ../verilog/cpu0.hex
ENDIAN=`${TOOLDIR}/llvm-readobj -h ch_run_backend2.cpu0.o|grep "DataEncoding"|awk '{print $2}'`
isLittleEndian;
if [ $LE == "true" ] ; then
echo "1 /* 0: big endian, 1: little endian */" > ../verilog/cpu0.config
else
echo "0 /* 0: big endian, 1: little endian */" > ../verilog/cpu0.config
fi
cat ../verilog/cpu0.config
JonathantekiiMac:input Jonathan$ bash build-run_backend2.sh cpu032II el
...
JonathantekiiMac:input Jonathan$ cd ../verilog
JonathantekiiMac:verilog Jonathan$ ./cpu0IIs
...
31
...
Other LLVM-Based Tools for Cpu0 Processor¶
You can find the Cpu0 ELF linker implementation based on lld, which is the official LLVM linker project, as well as elf2hex, which is modified from the llvm-objdump driver, at the following website:
http://jonathan2251.github.io/lbt/index.html