Verify backend on Verilog simulator

Until now, we have developed an LLVM backend capable of compiling C or assembly code, as illustrated in the white part of Fig. 60. If the program does not contain global variables, the ELF object file can be dumped to a hex file using the following command:

llvm-objdump -d

This functionality was completed in Chapter ELF Support.

digraph G {
  rankdir=LR;
  "Verilog machine" [style=filled, color=gray];
  "clang" -> "llc" [label="IR"];
  "llvm backend asm parser" -> "llc" [label="asm"];
  "llc" -> "llvm-objdump -d" [label="obj"];
  "llvm-objdump -d" -> "Verilog machine" [label="hex"];
  
//  label = "Figure: Cpu0 backend without linker";
}

Fig. 60 Cpu0 backend without linker

This chapter implements the Cpu0 instructions using the Verilog language, as represented by the gray part in the figure above.

With this Verilog-based machine, we can execute the hex program generated by the LLVM backend on the Cpu0 Verilog simulator running on a PC. This allows us to observe and verify the execution results of Cpu0 instructions directly on the hardware model.

Create Verilog Simulator of Cpu0

Verilog is an IEEE-standard language widely used in IC design. There are many books and free online resources available for learning Verilog [1] [2] [3] [4] [5].

Verilog is also known as Verilog HDL (Hardware Description Language), not to be confused with VHDL, which serves the same purpose but is a competing language [6].

An example implementation, lbdex/verilog/cpu0.v, contains the Cpu0 processor design written in Verilog. As described in Appendix A, we have installed the Icarus Verilog tool on both iMac and Linux systems. The cpu0.v design is relatively simple, with only a few hundred lines of code in total.

Although this implementation does not include pipelining, it simulates delay slots (via the SIMULATE_DELAY_SLOT section of the code) to accurately estimate pipeline machine cycles.

Verilog has a C-like syntax, and since this book focuses on compiler implementation, we present the cpu0.v code and the build commands below without an in-depth explanation. We expect that readers with some patience and curiosity will be able to understand the Verilog code.

Cpu0 supports memory-mapped I/O, one of the two primary I/O models in computer architecture (the other being instruction-based I/O). Cpu0 maps the output port to memory address 0x80000. When executing the instruction:

st $ra, cx($rb)

where cx($rb) equals 0x80000, the Cpu0 processor outputs the content to that I/O port, as demonstrated below.

ST : begin
  ...
  if (R[b]+c16 == `IOADDR) begin
    outw(R[a]);

lbdex/verilog/cpu0.v

// https://www.francisz.cn/download/IEEE_Standard_1800-2012%20SystemVerilog.pdf

// configuable value below
`define SIMULATE_DELAY_SLOT
// cpu032I memory limit, jsub:24-bit
`define MEMSIZE   'h1000000
`define MEMEMPTY   8'hFF
`define NULL       8'h00
`define IOADDR    'hff000000  // IO mapping address
`define TIMEOUT   #3000000000


// Operand width
`define INT32 2'b11     // 32 bits
`define INT24 2'b10     // 24 bits
`define INT16 2'b01     // 16 bits
`define BYTE  2'b00     // 8  bits

`define EXE 3'b000
`define RESET 3'b001
`define ABORT 3'b010
`define IRQ 3'b011
`define ERROR 3'b100

// Reference web: http://ccckmit.wikidot.com/ocs:cpu0
module cpu0(input clock, reset, input [2:0] itype, output reg [2:0] tick, 
            output reg [31:0] ir, pc, mar, mdr, inout [31:0] dbus, 
            output reg m_en, m_rw, output reg [1:0] m_size, 
            input cfg);
  reg signed [31:0] R [0:15];
  reg signed [31:0] C0R [0:1]; // co-processor 0 register
  // High and Low part of 64 bit result
  reg [7:0] op;
  reg [3:0] a, b, c;
  reg [4:0] c5;
  reg signed [31:0] c12, c16, c24, Ra, Rb, Rc, pc0; // pc0: instruction pc
  reg [31:0] uc16, URa, URb, URc, HI, LO, CF, tmp;
  reg [63:0] cycles;

  // register name
  `define SP   R[13]   // Stack Pointer
  `define LR   R[14]   // Link Register
  `define SW   R[15]   // Status Word

  // C0 register name
  `define PC   C0R[0]   // Program Counter
  `define EPC  C0R[1]  // exception PC value

  // SW Flage
  `define I2   `SW[16] // Hardware Interrupt 1, IO1 interrupt, status, 
                      // 1: in interrupt
  `define I1   `SW[15] // Hardware Interrupt 0, timer interrupt, status, 
                      // 1: in interrupt
  `define I0   `SW[14] // Software interrupt, status, 1: in interrupt
  `define I    `SW[13] // Interrupt, 1: in interrupt
  `define I2E  `SW[12]  // Hardware Interrupt 1, IO1 interrupt, Enable
  `define I1E  `SW[11]  // Hardware Interrupt 0, timer interrupt, Enable
  `define I0E  `SW[10]  // Software Interrupt Enable
  `define IE   `SW[9]  // Interrupt Enable
  `define M    `SW[8:6]  // Mode bits, itype
  `define D    `SW[5]  // Debug Trace
  `define V    `SW[3]  // Overflow
  `define C    `SW[2]  // Carry
  `define Z    `SW[1]  // Zero
  `define N    `SW[0]  // Negative flag
  
  `define LE   CF[0]  // Endian bit, Big Endian:0, Little Endian:1
  // Instruction Opcode 
  parameter [7:0] NOP=8'h00,LD=8'h01,ST=8'h02,LB=8'h03,LBu=8'h04,SB=8'h05,
  LH=8'h06,LHu=8'h07,SH=8'h08,ADDiu=8'h09,MOVZ=8'h0A,MOVN=8'h0B,ANDi=8'h0C,
  ORi=8'h0D,XORi=8'h0E,LUi=8'h0F,
  ADDu=8'h11,SUBu=8'h12,ADD=8'h13,SUB=8'h14,CLZ=8'h15,CLO=8'h16,MUL=8'h17,
  AND=8'h18,OR=8'h19,XOR=8'h1A,NOR=8'h1B,
  ROL=8'h1C,ROR=8'h1D,SHL=8'h1E,SHR=8'h1F,
  SRA=8'h20,SRAV=8'h21,SHLV=8'h22,SHRV=8'h23,ROLV=8'h24,RORV=8'h25,
`ifdef CPU0II
  SLTi=8'h26,SLTiu=8'h27, SLT=8'h28,SLTu=8'h29,
`endif
  CMP=8'h2A,
  CMPu=8'h2B,
  JEQ=8'h30,JNE=8'h31,JLT=8'h32,JGT=8'h33,JLE=8'h34,JGE=8'h35,
  JMP=8'h36,
`ifdef CPU0II
  BEQ=8'h37,BNE=8'h38,
`endif
  JALR=8'h39,BAL=8'h3A,JSUB=8'h3B,RET=8'h3C,
  MULT=8'h41,MULTu=8'h42,DIV=8'h43,DIVu=8'h44,
  MFHI=8'h46,MFLO=8'h47,MTHI=8'h48,MTLO=8'h49,
  MFC0=8'h50,MTC0=8'h51,C0MOV=8'h52;

  reg [0:0] inExe = 0;
  reg [2:0] state, next_state; 
  reg [2:0] st_taskInt, ns_taskInt; 
  parameter Reset=3'h0, Fetch=3'h1, Decode=3'h2, Execute=3'h3, MemAccess=3'h4, 
            WriteBack=3'h5;
  integer i;
`ifdef SIMULATE_DELAY_SLOT
  reg [0:0] nextInstIsDelaySlot;
  reg [0:0] isDelaySlot;
  reg signed [31:0] delaySlotNextPC;
`endif

  //transform data from the memory to little-endian form
  task changeEndian(input [31:0] value, output [31:0] changeEndian); begin
    changeEndian = {value[7:0], value[15:8], value[23:16], value[31:24]};
  end endtask

  // Read Memory Word
  task memReadStart(input [31:0] addr, input [1:0] size); begin 
    mar = addr;     // read(m[addr])
    m_rw = 1;     // Access Mode: read 
    m_en = 1;     // Enable read
    m_size = size;
  end endtask

  // Read Memory Finish, get data
  task memReadEnd(output [31:0] data); begin
    mdr = dbus; // get momory, dbus = m[addr]
    data = mdr; // return to data
    m_en = 0; // read complete
  end endtask

  // Write memory -- addr: address to write, data: date to write
  task memWriteStart(input [31:0] addr, input [31:0] data, input [1:0] size); 
  begin 
    mar = addr;    // write(m[addr], data)
    mdr = data;
    m_rw = 0;    // access mode: write
    m_en = 1;     // Enable write
    m_size  = size;
  end endtask

  task memWriteEnd; begin // Write Memory Finish
    m_en = 0; // write complete
  end endtask

  task regSet(input [3:0] i, input [31:0] data); begin
    if (i != 0) R[i] = data;
  end endtask

  task C0regSet(input [3:0] i, input [31:0] data); begin
    if (i < 2) C0R[i] = data;
  end endtask

  task PCSet(input [31:0] data); begin
  `ifdef SIMULATE_DELAY_SLOT
    nextInstIsDelaySlot = 1;
    delaySlotNextPC = data;
  `else
    `PC = data;
  `endif
  end endtask

  task retValSet(input [3:0] i, input [31:0] data); begin
    if (i != 0)
    `ifdef SIMULATE_DELAY_SLOT
      R[i] = data + 4;
    `else
      R[i] = data;
    `endif
  end endtask
  
  task regHILOSet(input [31:0] data1, input [31:0] data2); begin
    HI = data1;
    LO = data2;
  end endtask

  // output a word to Output port (equal to display the word to terminal)
  task outw(input [31:0] data); begin
    if (`LE) begin // Little Endian
      changeEndian(data, data);
    end 
    if (data[7:0] != 8'h00) begin
      $write("%c", data[7:0]);
      if (data[15:8] != 8'h00) 
        $write("%c", data[15:8]);
      if (data[23:16] != 8'h00) 
        $write("%c", data[23:16]);
      if (data[31:24] != 8'h00) 
        $write("%c", data[31:24]);
    end
  end endtask

  // output a character (a byte)
  task outc(input [7:0] data); begin
    $write("%c", data);
  end endtask

  task taskInterrupt(input [2:0] iMode); begin
  if (inExe == 0) begin
    case (iMode)
      `RESET: begin 
        `PC = 0; tick = 0; R[0] = 0; `SW = 0; `LR = -1;
        `IE = 0; `I0E = 1; `I1E = 1; `I2E = 1;
        `I = 0; `I0 = 0; `I1 = 0; `I2 = 0; inExe = 1;
        `LE = cfg;
        cycles = 0;
      end
      `ABORT: begin `PC = 4; end
      `IRQ:   begin `PC = 8; `IE = 0; inExe = 1; end
      `ERROR: begin `PC = 12; end
    endcase
  end
  $display("taskInterrupt(%3b)", iMode);
  end endtask

  task taskExecute; begin
    tick = tick+1;
    case (state)
    Fetch: begin  // Tick 1 : instruction fetch, throw PC to address bus, 
                  // memory.read(m[PC])
      memReadStart(`PC, `INT32);
      pc0  = `PC;
   `ifdef SIMULATE_DELAY_SLOT
     if (nextInstIsDelaySlot == 1) begin
       isDelaySlot = 1;
       nextInstIsDelaySlot = 0;
       `PC = delaySlotNextPC;
     end
     else begin
       if (isDelaySlot == 1) isDelaySlot = 0;
       `PC = `PC+4;
     end
   `else
     `PC = `PC+4;
   `endif
      next_state = Decode;
    end
    Decode: begin  // Tick 2 : instruction decode, ir = m[PC]
      memReadEnd(ir); // IR = dbus = m[PC]
      {op,a,b,c} = ir[31:12];
      c24 = $signed(ir[23:0]);
      c16 = $signed(ir[15:0]);
      uc16 = ir[15:0];
      c12 = $signed(ir[11:0]);
      c5  = ir[4:0];
      Ra = R[a];
      Rb = R[b];
      Rc = R[c];
      URa = R[a];
      URb = R[b];
      URc = R[c];
      next_state = Execute;
    end
    Execute: begin // Tick 3 : instruction execution
      case (op)
      NOP:   ;
      // load and store instructions
      LD:    memReadStart(Rb+c16, `INT32);      // LD Ra,[Rb+Cx]; Ra<=[Rb+Cx]
      ST:    memWriteStart(Rb+c16, Ra, `INT32); // ST Ra,[Rb+Cx]; Ra=>[Rb+Cx]
      // LB Ra,[Rb+Cx]; Ra<=(byte)[Rb+Cx]
      LB:    memReadStart(Rb+c16, `BYTE);
      // LBu Ra,[Rb+Cx]; Ra<=(byte)[Rb+Cx]
      LBu:   memReadStart(Rb+c16, `BYTE);
      // SB Ra,[Rb+Cx]; Ra=>(byte)[Rb+Cx]
      SB:    memWriteStart(Rb+c16, Ra, `BYTE);
      LH:    memReadStart(Rb+c16, `INT16); // LH Ra,[Rb+Cx]; Ra<=(2bytes)[Rb+Cx]
      LHu:   memReadStart(Rb+c16, `INT16); // LHu Ra,[Rb+Cx]; Ra<=(2bytes)[Rb+Cx]
      // SH Ra,[Rb+Cx]; Ra=>(2bytes)[Rb+Cx]
      SH:    memWriteStart(Rb+c16, Ra, `INT16);
      // Conditional move
      MOVZ:  if (Rc==0) regSet(a, Rb);             // move if Rc equal to 0
      MOVN:  if (Rc!=0) regSet(a, Rb);             // move if Rc not equal to 0
      // Mathematic 
      ADDiu: regSet(a, Rb+c16);                   // ADDiu Ra, Rb+Cx; Ra<=Rb+Cx
      CMP:  begin 
        if (Rb < Rc) `N=1; else `N=0;
        // `N=(Rb-Rc<0); // why not work for bash make.sh cpu032I el Makefile.builtins?
       `Z=(Rb-Rc==0); 
      end // CMP Rb, Rc; SW=(Rb >=< Rc)
      CMPu:  begin 
        if (URb < URc) `N=1; else `N=0; 
       `Z=(URb-URc==0); 
      end // CMPu URb, URc; SW=(URb >=< URc)
      ADDu:  regSet(a, Rb+Rc);               // ADDu Ra,Rb,Rc; Ra<=Rb+Rc
      ADD:   begin regSet(a, Rb+Rc); if (a < Rb) `V = 1; else `V = 0; 
        if (`V) begin `I0 = 1; `I = 1; end
      end
                                             // ADD Ra,Rb,Rc; Ra<=Rb+Rc
      SUBu:  regSet(a, Rb-Rc);               // SUBu Ra,Rb,Rc; Ra<=Rb-Rc
      SUB:   begin regSet(a, Rb-Rc); if (Rb < 0 && Rc > 0 && a >= 0) 
             `V = 1; else `V =0; 
        if (`V) begin `I0 = 1; `I = 1; end
      end         // SUB Ra,Rb,Rc; Ra<=Rb-Rc
      CLZ:   begin
        for (i=0; (i<32)&&((Rb&32'h80000000)==32'h00000000); i=i+1) begin
            Rb=Rb<<1;
        end
        regSet(a, i);
      end
      CLO:   begin
        for (i=0; (i<32)&&((Rb&32'h80000000)==32'h80000000); i=i+1) begin
            Rb=Rb<<1;
        end
        regSet(a, i);
      end
      MUL:   regSet(a, Rb*Rc);               // MUL Ra,Rb,Rc;     Ra<=Rb*Rc
      DIVu:  regHILOSet(URa%URb, URa/URb);   // DIVu URa,URb; HI<=URa%URb; 
                                             // LO<=URa/URb
                                             // without exception overflow
      DIV:   begin regHILOSet(Ra%Rb, Ra/Rb); 
             if ((Ra < 0 && Rb < 0) || (Ra == 0)) `V = 1; 
             else `V =0; end  // DIV Ra,Rb; HI<=Ra%Rb; LO<=Ra/Rb; With overflow
      AND:   regSet(a, Rb&Rc);               // AND Ra,Rb,Rc; Ra<=(Rb and Rc)
      ANDi:  regSet(a, Rb&uc16);             // ANDi Ra,Rb,c16; Ra<=(Rb and c16)
      OR:    regSet(a, Rb|Rc);               // OR Ra,Rb,Rc; Ra<=(Rb or Rc)
      ORi:   regSet(a, Rb|uc16);             // ORi Ra,Rb,c16; Ra<=(Rb or c16)
      XOR:   regSet(a, Rb^Rc);               // XOR Ra,Rb,Rc; Ra<=(Rb xor Rc)
      NOR:   regSet(a, ~(Rb|Rc));            // NOR Ra,Rb,Rc; Ra<=(Rb nor Rc)
      XORi:  regSet(a, Rb^uc16);             // XORi Ra,Rb,c16; Ra<=(Rb xor c16)
      LUi:   regSet(a, uc16<<16);
      SHL:   regSet(a, Rb<<c5);     // Shift Left; SHL Ra,Rb,Cx; Ra<=(Rb << Cx)
      SRA:   regSet(a, (Rb>>>c5));  // Shift Right with signed bit fill;
        // https://stackoverflow.com/questions/39911655/how-to-synthesize-hardware-for-sra-instruction
      SHR:   regSet(a, Rb>>c5);     // Shift Right with 0 fill; 
                                    // SHR Ra,Rb,Cx; Ra<=(Rb >> Cx)
      SHLV:  regSet(a, Rb<<Rc);     // Shift Left; SHLV Ra,Rb,Rc; Ra<=(Rb << Rc)
      SRAV:  regSet(a, (Rb>>>Rc));  // Shift Right with signed bit fill;
      SHRV:  regSet(a, Rb>>Rc);     // Shift Right with 0 fill; 
                                    // SHRV Ra,Rb,Rc; Ra<=(Rb >> Rc)
      ROL:   regSet(a, (Rb<<c5)|(Rb>>(32-c5)));     // Rotate Left;
      ROR:   regSet(a, (Rb>>c5)|(Rb<<(32-c5)));     // Rotate Right;
      ROLV:  begin // Can set Rc to -32<=Rc<=32 more efficently.
        while (Rc < -32) Rc=Rc+32;
        while (Rc > 32) Rc=Rc-32;
        regSet(a, (Rb<<Rc)|(Rb>>(32-Rc)));     // Rotate Left;
      end
      RORV:  begin 
        while (Rc < -32) Rc=Rc+32;
        while (Rc > 32) Rc=Rc-32;
        regSet(a, (Rb>>Rc)|(Rb<<(32-Rc)));     // Rotate Right;
      end
      MFLO:  regSet(a, LO);         // MFLO Ra; Ra<=LO
      MFHI:  regSet(a, HI);         // MFHI Ra; Ra<=HI
      MTLO:  LO = Ra;               // MTLO Ra; LO<=Ra
      MTHI:  HI = Ra;               // MTHI Ra; HI<=Ra
      MULT:  {HI, LO}=Ra*Rb;        // MULT Ra,Rb; HI<=((Ra*Rb)>>32); 
                                    // LO<=((Ra*Rb) and 0x00000000ffffffff);
                                    // with exception overflow
      MULTu: {HI, LO}=URa*URb;      // MULT URa,URb; HI<=((URa*URb)>>32); 
                                    // LO<=((URa*URb) and 0x00000000ffffffff);
                                    // without exception overflow
      MFC0:  regSet(a, C0R[b]);     // MFC0 a, b; Ra<=C0R[Rb]
      MTC0:  C0regSet(a, Rb);       // MTC0 a, b; C0R[a]<=Rb
      C0MOV: C0regSet(a, C0R[b]);   // C0MOV a, b; C0R[a]<=C0R[b]
   `ifdef CPU0II
      // set
      SLT:   if (Rb < Rc) R[a]=1; else R[a]=0;
      SLTu:  if (URb < URc) R[a]=1; else R[a]=0;
      SLTi:  if (Rb < c16) R[a]=1; else R[a]=0;
      SLTiu: if (URb < uc16) R[a]=1; else R[a]=0;
      // Branch Instructions
      BEQ:   if (Ra==Rb) PCSet(`PC+c16);
      BNE:   if (Ra!=Rb) PCSet(`PC+c16);
    `endif
      // Jump Instructions
      JEQ:   if (`Z) PCSet(`PC+c24);            // JEQ Cx; if SW(=) PC  PC+Cx
      JNE:   if (!`Z) PCSet(`PC+c24);           // JNE Cx; if SW(!=) PC PC+Cx
      JLT:   if (`N) PCSet(`PC+c24);            // JLT Cx; if SW(<) PC  PC+Cx
      JGT:   if (!`N&&!`Z) PCSet(`PC+c24);      // JGT Cx; if SW(>) PC  PC+Cx
      JLE:   if (`N || `Z) PCSet(`PC+c24);      // JLE Cx; if SW(<=) PC PC+Cx    
      JGE:   if (!`N || `Z) PCSet(`PC+c24);     // JGE Cx; if SW(>=) PC PC+Cx
      JMP:   `PC = `PC+c24;                     // JMP Cx; PC <= PC+Cx
      JALR:  begin retValSet(a, `PC); PCSet(Rb); end    // JALR Ra,Rb; Ra<=PC; PC<=Rb
      BAL:   begin `LR = `PC; `PC = `PC+c24; end // BAL Cx; LR<=PC; PC<=PC+Cx
      JSUB:  begin retValSet(14, `PC); PCSet(`PC+c24); end // JSUB Cx; LR<=PC; PC<=PC+Cx
      RET:   begin PCSet(Ra); end               // RET; PC <= Ra
      default : 
        $display("%4dns %8x : OP code %8x not support", $stime, pc0, op);
      endcase
      if (`IE && `I && (`I0E && `I0 || `I1E && `I1 || `I2E && `I2)) begin
        `EPC = `PC;
        next_state = Fetch;
        inExe = 0;
      end else
        next_state = MemAccess;
    end
    MemAccess: begin
      case (op)
      ST, SB, SH  :
        memWriteEnd();                // write memory complete
      endcase
      next_state = WriteBack;
    end
    WriteBack: begin // Read/Write finish, close memory
      case (op)
      LB, LBu  :
        memReadEnd(R[a]);        //read memory complete
      LH, LHu  :
        memReadEnd(R[a]);
      LD  : begin
        memReadEnd(R[a]);
        if (`D)
          $display("%4dns %8x : %8x m[%-04x+%-04x]=%8x  SW=%8x", $stime, pc0, 
                   ir, R[b], c16, R[a], `SW);
      end
      endcase
      case (op)
      LB  : begin 
        if (R[a] > 8'h7f) R[a]=R[a]|32'hffffff80;
      end
      LH  : begin 
        if (R[a] > 16'h7fff) R[a]=R[a]|32'hffff8000;
      end
      endcase
      case (op)
      MULT, MULTu, DIV, DIVu, MTHI, MTLO :
        if (`D)
          $display("%4dns %8x : %8x HI=%8x LO=%8x SW=%8x", $stime, pc0, ir, HI, 
                   LO, `SW);
      ST : begin
        if (`D)
          $display("%4dns %8x : %8x m[%-04x+%-04x]=%8x  SW=%8x", $stime, pc0, 
                   ir, R[b], c16, R[a], `SW);
        if (R[b]+c16 == `IOADDR) begin
          outw(R[a]);
        end
      end
      SB : begin
        if (`D)
          $display("%4dns %8x : %8x m[%-04x+%-04x]=%c  SW=%8x, R[a]=%8x", 
                   $stime, pc0, ir, R[b], c16, R[a][7:0], `SW, R[a]);
        if (R[b]+c16 == `IOADDR) begin
          if (`LE)
            outc(R[a][7:0]);
          else
            outc(R[a][7:0]);
        end
      end
      MFC0, MTC0 :
        if (`D)
          $display("%4dns %8x : %8x R[%02d]=%-8x  C0R[%02d]=%-8x SW=%8x", 
                   $stime, pc0, ir, a, R[a], a, C0R[a], `SW);
      C0MOV :
        if (`D)
          $display("%4dns %8x : %8x C0R[%02d]=%-8x C0R[%02d]=%-8x SW=%8x", 
                   $stime, pc0, ir, a, C0R[a], b, C0R[b], `SW);
      default :
        if (`D) // Display the written register content
          $display("%4dns %8x : %8x R[%02d]=%-8x SW=%8x", $stime, pc0, ir, 
                   a, R[a], `SW);
      endcase
      if (`PC < 0) begin
        $display("total cpu cycles = %-d", cycles);
        $display("RET to PC < 0, finished!");
        $finish;
      end
      next_state = Fetch;
    end
    endcase
  end endtask

  always @(posedge clock) begin
    if (inExe == 0 && (state == Fetch) && (`IE && `I) && (`I0E && `I0)) begin
    // software int
      `M = `IRQ;
      taskInterrupt(`IRQ);
      m_en = 0;
      state = Fetch;
    end else if (inExe == 0 && (state == Fetch) && (`IE && `I) && 
                 ((`I1E && `I1) || (`I2E && `I2)) ) begin
      `M = `IRQ;
      taskInterrupt(`IRQ);
      m_en = 0;
      state = Fetch;
    end else if (inExe == 0 && itype == `RESET) begin
    // Condition itype == `RESET must after the other `IE condition
      taskInterrupt(`RESET);
      `M = `RESET;
      state = Fetch;
    end else begin
    `ifdef TRACE
      `D = 1; // Trace register content at beginning
    `endif
      taskExecute();
      state = next_state;
    end
    pc = `PC;
    cycles = cycles + 1;
  end
endmodule

module memory0(input clock, reset, en, rw, input [1:0] m_size, 
               input [31:0] abus, dbus_in, output [31:0] dbus_out, 
               output cfg);
  reg [31:0] mconfig [0:0];
  reg [7:0] m [0:`MEMSIZE-1];
  reg [31:0] data;

  integer i;

  `define LE  mconfig[0][0:0]   // Endian bit, Big Endian:0, Little Endian:1

  initial begin
  // erase memory
    for (i=0; i < `MEMSIZE; i=i+1) begin
       m[i] = `MEMEMPTY;
    end
  // load config from file to memory
    $readmemh("cpu0.config", mconfig);
  // load program from file to memory
    $readmemh("cpu0.hex", m);
  // display memory contents
    `ifdef TRACE
      for (i=0; i < `MEMSIZE && (m[i] != `MEMEMPTY || m[i+1] != `MEMEMPTY || 
         m[i+2] != `MEMEMPTY || m[i+3] != `MEMEMPTY); i=i+4) begin
        $display("%8x: %8x", i, {m[i], m[i+1], m[i+2], m[i+3]});
      end
    `endif
  end

  always @(clock or abus or en or rw or dbus_in) 
  begin
    if (abus >= 0 && abus <= `MEMSIZE-4) begin
      if (en == 1 && rw == 0) begin // r_w==0:write
        data = dbus_in;
        if (`LE) begin // Little Endian
          case (m_size)
          `BYTE:  {m[abus]} = dbus_in[7:0];
          `INT16: {m[abus], m[abus+1] } = {dbus_in[7:0], dbus_in[15:8]};
          `INT24: {m[abus], m[abus+1], m[abus+2]} = 
                  {dbus_in[7:0], dbus_in[15:8], dbus_in[23:16]};
          `INT32: {m[abus], m[abus+1], m[abus+2], m[abus+3]} = 
                  {dbus_in[7:0], dbus_in[15:8], dbus_in[23:16], dbus_in[31:24]};
          endcase
        end else begin // Big Endian
          case (m_size)
          `BYTE:  {m[abus]} = dbus_in[7:0];
          `INT16: {m[abus], m[abus+1] } = dbus_in[15:0];
          `INT24: {m[abus], m[abus+1], m[abus+2]} = dbus_in[23:0];
          `INT32: {m[abus], m[abus+1], m[abus+2], m[abus+3]} = dbus_in;
          endcase
        end
      end else if (en == 1 && rw == 1) begin // r_w==1:read
        if (`LE) begin // Little Endian
          case (m_size)
          `BYTE:  data = {8'h00,     8'h00,     8'h00,     m[abus]};
          `INT16: data = {8'h00,     8'h00,     m[abus+1], m[abus]};
          `INT24: data = {8'h00,     m[abus+2], m[abus+1], m[abus]};
          `INT32: data = {m[abus+3], m[abus+2], m[abus+1], m[abus]};
          endcase
        end else begin // Big Endian
          case (m_size)
          `BYTE:  data = {8'h00  , 8'h00,     8'h00,     m[abus]  };
          `INT16: data = {8'h00  , 8'h00,     m[abus],   m[abus+1]};
          `INT24: data = {8'h00  , m[abus],   m[abus+1], m[abus+2]};
          `INT32: data = {m[abus], m[abus+1], m[abus+2], m[abus+3]};
          endcase
        end
      end else
        data = 32'hZZZZZZZZ;
    end else 
      data = 32'hZZZZZZZZ;
  end
  assign dbus_out = data;
  assign cfg = mconfig[0][0:0];
endmodule

module main;
  reg clock;
  reg [2:0] itype;
  wire [2:0] tick;
  wire [31:0] pc, ir, mar, mdr, dbus;
  wire m_en, m_rw;
  wire [1:0] m_size;
  wire cfg;

  cpu0 cpu(.clock(clock), .itype(itype), .pc(pc), .tick(tick), .ir(ir),
  .mar(mar), .mdr(mdr), .dbus(dbus), .m_en(m_en), .m_rw(m_rw), .m_size(m_size),
  .cfg(cfg));

  memory0 mem(.clock(clock), .reset(reset), .en(m_en), .rw(m_rw), 
  .m_size(m_size), .abus(mar), .dbus_in(mdr), .dbus_out(dbus), .cfg(cfg));

  initial
  begin
    clock = 0;
    itype = `RESET;
    `TIMEOUT $finish;
  end

  always #10 clock=clock+1;

endmodule

lbdex/verilog/Makefile

#TRACE=-D TRACE
all:
	iverilog ${TRACE} -o cpu0Is cpu0.v 
	iverilog ${TRACE} -D CPU0II -o cpu0IIs cpu0.v 

.PHONY: clean
clean:
	rm -rf cpu0.hex cpu0Is cpu0IIs 
	rm -f *~ cpu0.config

Since the Cpu0 Verilog machine supports both big-endian and little-endian modes, the memory and CPU modules communicate this configuration through a dedicated wire.

The endian information is stored in the ROM of the memory module. Upon system startup, the memory module reads this configuration and sends the endian setting to the CPU via the connected wire.

This mechanism is implemented according to the following code snippet:

lbdex/verilog/cpu0.v

assign cfg = mconfig[0][0:0];
...
wire cfg;

cpu0 cpu(.clock(clock), .itype(itype), .pc(pc), .tick(tick), .ir(ir),
.mar(mar), .mdr(mdr), .dbus(dbus), .m_en(m_en), .m_rw(m_rw), .m_size(m_size),
.cfg(cfg));

memory0 mem(.clock(clock), .reset(reset), .en(m_en), .rw(m_rw),
.m_size(m_size), .abus(mar), .dbus_in(mdr), .dbus_out(dbus), .cfg(cfg));

Verify backend

Now let’s compile ch_run_backend.cpp as shown below. Since code size grows from low to high addresses and the stack grows from high to low addresses, the $sp register is set to 0x7fffc. This is because cpu0.v is assumed to use 0x80000 bytes of memory.

lbdex/input/start.h


#ifndef _START_H_
#define _START_H_

#include "config.h"

#define SET_SW \
asm("andi $sw, $zero, 0"); \
asm("ori  $sw, $sw, 0x1e00"); // enable all interrupts

#define initRegs() \
  asm("addiu $1,  $zero, 0"); \
  asm("addiu $2,  $zero, 0"); \
  asm("addiu $3,  $zero, 0"); \
  asm("addiu $4,  $zero, 0"); \
  asm("addiu $5,  $zero, 0"); \
  asm("addiu $t9, $zero, 0"); \
  asm("addiu $7,  $zero, 0"); \
  asm("addiu $8,  $zero, 0"); \
  asm("addiu $9,  $zero, 0"); \
  asm("addiu $10, $zero, 0"); \
  SET_SW;                     \
  asm("addiu $fp, $zero, 0");

#endif

lbdex/input/boot.cpp


#include "start.h"

// boot:
  asm("boot:");
//  asm("_start:");
  asm("jmp 12"); // RESET: jmp RESET_START;
  asm("jmp 4");  // ERROR: jmp ERR_HANDLE;
  asm("jmp 4");  // IRQ: jmp IRQ_HANDLE;
  asm("jmp -4"); // ERR_HANDLE: jmp ERR_HANDLE; (loop forever)

// RESET_START:
  initRegs();
  asm("addiu $gp, $ZERO, 0");
  asm("addiu $lr, $ZERO, -1");
  
  INIT_SP;
  asm("mfc0 $3, $pc");
  asm("addiu $3, $3, 0x8"); // Assume main() entry point is at the next next 
                             // instruction.
  asm("jr $3");
  asm("nop");

lbdex/input/print.h

#ifndef _PRINT_H_
#define _PRINT_H_

#include "start.h"

void print_char(const char c);
void dump_mem(unsigned char *str, int n);
void print_string(const char *str);
void print_integer(int x);
#endif

lbdex/input/print.cpp

#include "print.h"
#include "itoa.cpp"

// For memory IO
void print_char(const char c)
{
  char *p = (char*)IOADDR;
  *p = c;

  return;
}

void print_string(const char *str)
{
  const char *p;

  for (p = str; *p != '\0'; p++)
    print_char(*p);
  print_char(*p);
  print_char('\n');

  return;
}

// For memory IO
void print_integer(int x)
{
  char str[INT_DIGITS + 2];
  itoa(str, x);
  print_string(str);

  return;
}

lbdex/input/ch_nolld.h


#include "debug.h"
#include "boot.cpp"

#include "print.h"

int test_nolld();

lbdex/input/ch_nolld.cpp

#define TEST_ROXV
#define RUN_ON_VERILOG

#include "print.cpp"

#include "ch4_1_math.cpp"
#include "ch4_1_rotate.cpp"
#include "ch4_1_mult2.cpp"
#include "ch4_1_mod.cpp"
#include "ch4_1_div.cpp"
#include "ch4_2_logic.cpp"
#include "ch7_1_localpointer.cpp"
#include "ch7_1_char_short.cpp"
#include "ch7_1_bool.cpp"
#include "ch7_1_longlong.cpp"
#include "ch7_1_vector.cpp"
#include "ch8_1_ctrl.cpp"
#include "ch8_2_deluselessjmp.cpp"
#include "ch8_2_select.cpp"
#include "ch9_1_longlong.cpp"
#include "ch9_3_vararg.cpp"
#include "ch9_3_stacksave.cpp"
#include "ch9_3_bswap.cpp"
#include "ch9_3_alloc.cpp"
#include "ch11_2.cpp"

// Test build only for the following files on build-run_backend.sh since it 
// needs lld linker support.
// Test in build-slink.sh
#include "ch6_1.cpp"
#include "ch9_1_struct.cpp"
#include "ch9_1_constructor.cpp"
#include "ch9_3_template.cpp"
#include "ch12_inherit.cpp"

void test_asm_build()
{
  #include "ch11_1.cpp"
#ifdef CPU032II
  #include "ch11_1_2.cpp"
#endif
}

int test_rotate()
{
  int a = test_rotate_left1(); // rolv 4, 30 = 1
  int b = test_rotate_left(); // rol 8, 30  = 2
  int c = test_rotate_right(); // rorv 1, 30 = 4
  
  return (a+b+c);
}

int test_nolld()
{
  bool pass = true;
  int a = 0;

  a = test_math();
  print_integer(a);  // a = 68
  if (a != 68) pass = false;
  a = test_rotate();
  print_integer(a);  // a = 7
  if (a != 7) pass = false;
  a = test_mult();
  print_integer(a);  // a = 0
  if (a != 0) pass = false;
  a = test_mod();
  print_integer(a);  // a = 0
  if (a != 0) pass = false;
  a = test_div();
  print_integer(a);  // a = 253
  if (a != 253) pass = false;
  a = test_local_pointer();
  print_integer(a);  // a = 3
  if (a != 3) pass = false;
  a = (int)test_load_bool();
  print_integer(a);  // a = 1
  if (a != 1) pass = false;
  a = test_andorxornotcomplement();
  print_integer(a); // a = 13
  if (a != 13) pass = false;
  a = test_setxx();
  print_integer(a); // a = 3
  if (a != 3) pass = false;
  a = test_signed_char();
  print_integer(a); // a = -126
  if (a != -126) pass = false;
  a = test_unsigned_char();
  print_integer(a); // a = 130
  if (a != 130) pass = false;
  a = test_signed_short();
  print_integer(a); // a = -32766
  if (a != -32766) pass = false;
  a = test_unsigned_short();
  print_integer(a); // a = 32770
  if (a != 32770) pass = false;
  long long b = test_longlong();
  print_integer((int)(b >> 32)); // 393307
  if ((int)(b >> 32) != 393307) pass = false;
  print_integer((int)b); // 16777218
  if ((int)(b) != 16777218) pass = false;
  a = test_cmplt_short();
  print_integer(a); // a = -3
  if (a != -3) pass = false;
  a = test_cmplt_long();
  print_integer(a); // a = -4
  if (a != -4) pass = false;
  a = test_control1();
  print_integer(a);	// a = 51
  if (a != 51) pass = false;
  a = test_DelUselessJMP();
  print_integer(a); // a = 2
  if (a != 2) pass = false;
  a = test_movx_1();
  print_integer(a); // a = 3
  if (a != 3) pass = false;
  a = test_movx_2();
  print_integer(a); // a = 1
  if (a != 1) pass = false;
  print_integer(2147483647); // test mod % (mult) from itoa.cpp
  print_integer(-2147483648); // test mod % (multu) from itoa.cpp
  a = test_sum_longlong();
  print_integer(a); // a = 9
  if (a != 9) pass = false;
  a = test_va_arg();
  print_integer(a); // a = 12
  if (a != 12) pass = false;
  a = test_stacksaverestore(100);
  print_integer(a); // a = 5
  if (a != 5) pass = false;
  a = test_bswap();
  print_integer(a); // a = 0
  if (a != 0) pass = false;
  a = test_alloc();
  print_integer(a); // a = 31
  if (a != 31) pass = false;
  a = test_inlineasm();
  print_integer(a); // a = 49
  if (a != 49) pass = false;

  return pass;
}

lbdex/input/ch_run_backend.cpp


#include "ch_nolld.h"

int main()
{
  bool pass = true;
  pass = test_nolld();

  return pass;
}

#include "ch_nolld.cpp"

lbdex/input/functions.sh

prologue() {
  if [ $ARG_NUM == 0 ]; then
    echo "useage: bash $sh_name cpu_type endian"
    echo "  cpu_type: cpu032I or cpu032II"
    echo "  endian: eb (big endian, default) or el (little endian)"
    echo "for example:"
    echo "  bash build-slinker.sh cpu032I be"
    exit 1;
  fi
  if [ $CPU != cpu032I ] && [ $CPU != cpu032II ]; then
    echo "1st argument is cpu032I or cpu032II"
    exit 1
  fi

  OS=`uname -s`
  echo "OS =" ${OS}

  TOOLDIR=~/llvm/test/build/bin
  CLANG=~/llvm/test/build/bin/clang

  CPU=$CPU
  echo "CPU =" "${CPU}"

  if [ "$ENDIAN" != "" ] && [ $ENDIAN != el ] && [ $ENDIAN != eb ]; then
    echo "2nd argument is eb (big endian, default) or el (little endian)"
    exit 1
  fi
  if [ $ENDIAN == eb ]; then
    ENDIAN=
  fi
  echo "ENDIAN =" "${ENDIAN}"

  bash clean.sh
}

isLittleEndian() {
  echo "ENDIAN = " "$ENDIAN"
  if [ "$ENDIAN" == "LittleEndian" ] ; then
    LE="true"
  elif [ "$ENDIAN" == "BigEndian" ] ; then
    LE="false"
  else
    echo "!ENDIAN unknown"
    exit 1
  fi
}

elf2hex() {
  ${TOOLDIR}/llvm-objdump -elf2hex -le=$LE a.out > ../verilog/cpu0.hex
  if [ $LE == "true" ] ; then
    echo "1   /* 0: big endian, 1: little endian */" > ../verilog/cpu0.config
  else
    echo "0   /* 0: big endian, 1: little endian */" > ../verilog/cpu0.config
  fi
  cat ../verilog/cpu0.config
}

epilogue() {
  endian=`${TOOLDIR}/llvm-readobj -h a.out|grep "DataEncoding"|awk '{print $2}'`
  isLittleEndian;
  elf2hex;
}

lbdex/input/build-run_backend.sh

#!/usr/bin/env bash

# for example:
# bash build-run_backend.sh cpu032I el
# bash build-run_backend.sh cpu032II eb

source functions.sh

sh_name=build-run_backend.sh
ARG_NUM=$#
CPU=$1
ENDIAN=$2

DEFFLAGS=""
if [ "$CPU" == cpu032II ] ; then
  DEFFLAGS=${DEFFLAGS}" -DCPU032II"
fi
echo ${DEFFLAGS}

prologue;

# ch8_2_select_global_pic.cpp just for compile build test only, without running 
# on verilog.
$CLANG ${DEFFLAGS} -target mips-unknown-linux-gnu -c ch8_2_select_global_pic.cpp \
-emit-llvm -o ch8_2_select_global_pic.bc
${TOOLDIR}/llc -march=cpu0${ENDIAN} -mcpu=${CPU} -relocation-model=pic \
-filetype=obj ch8_2_select_global_pic.bc -o ch8_2_select_global_pic.cpu0.o

$CLANG ${DEFFLAGS} -target mips-unknown-linux-gnu -c ch_run_backend.cpp \
-emit-llvm -o ch_run_backend.bc
echo "${TOOLDIR}/llc -march=cpu0${ENDIAN} -mcpu=${CPU} -relocation-model=static \
-filetype=obj -enable-cpu0-tail-calls ch_run_backend.bc -o ch_run_backend.cpu0.o"
${TOOLDIR}/llc -march=cpu0${ENDIAN} -mcpu=${CPU} -relocation-model=static \
-filetype=obj -enable-cpu0-tail-calls ch_run_backend.bc -o ch_run_backend.cpu0.o
# print must at the same line, otherwise it will spilt into 2 lines
${TOOLDIR}/llvm-objdump --section=.text -d ch_run_backend.cpu0.o | tail -n +8| awk \
'{print "/* " $1 " */\t" $2 " " $3 " " $4 " " $5 "\t/* " $6"\t" $7" " $8" " $9" " $10 "\t*/"}' \
 > ../verilog/cpu0.hex

ENDIAN=`${TOOLDIR}/llvm-readobj -h ch_run_backend.cpu0.o|grep "DataEncoding"|awk '{print $2}'`
isLittleEndian;

if [ $LE == "true" ] ; then
  echo "1   /* 0: big ENDIAN, 1: little ENDIAN */" > ../verilog/cpu0.config
else
  echo "0   /* 0: big ENDIAN, 1: little ENDIAN */" > ../verilog/cpu0.config
fi
cat ../verilog/cpu0.config

To run program without linker implementation at this point, the boot.cpp must be set at the beginning of code, and the main() of ch_run_backend.cpp comes immediately after it.

Let’s run Chapter11_2/ with llvm-objdump -d for input file ch_run_backend.cpp to generate the hex file via build-run_bacekend.sh, then feed the hex file to cpu0’s Verilog simulator to get the output result as below.

Remind that ch_run_backend.cpp has to be compiled with the option clang -target mips-unknown-linux-gnu since the example code ch9_3_vararg.cpp, which uses vararg, needs to be compiled with this option. Other example codes have no differences between this option and the default option.

JonathantekiiMac:input Jonathan$ pwd
/Users/Jonathan/llvm/test/lbdex/input
JonathantekiiMac:input Jonathan$ bash build-run_backend.sh cpu032I eb
JonathantekiiMac:input Jonathan$ cd ../verilog cd ../verilog
JonathantekiiMac:input Jonathan$ pwd
/Users/Jonathan/llvm/test/lbdex/verilog
JonathantekiiMac:verilog Jonathan$ make
JonathantekiiMac:verilog Jonathan$ ./cpu0Is
WARNING: cpu0Is.v:386: $readmemh(cpu0.hex): Not enough words in the file for the
taskInterrupt(001)
68
7
0
0
253
3
1
13
3
-126
130
-32766
32770
393307
16777218
3
4
51
2
3
1
2147483647
-2147483648
15
5
0
31
49
total cpu cycles = 50645
RET to PC < 0, finished!

JonathantekiiMac:input Jonathan$ bash build-run_backend.sh cpu032II eb
JonathantekiiMac:input Jonathan$ cd ../verilog
JonathantekiiMac:verilog Jonathan$ ./cpu0IIs
...
total cpu cycles = 48335
RET to PC < 0, finished!

The “total CPU cycles” are calculated in this Verilog simulator to allow performance review of both the backend compiler and the CPU.

Only CPU cycles are counted in this implementation, as I/O cycle times are unknown.

As explained in Chapter “Control Flow Statements”, cpu032II, which uses instructions slt and beq, performs better than cmp and jeq in cpu032I.

The instruction jmp has no delay slot, making it preferable for use in dynamic linker implementations.

You can trace memory binary code and changes to destination registers at every instruction execution by unmarking TRACE in the Makefile, as shown below:

lbdex/verilog/Makefile

TRACE=-D TRACE
JonathantekiiMac:raw Jonathan$ ./cpu0Is
WARNING: cpu0.v:386: $readmemh(cpu0.hex): Not enough words in the file for the
requested range [0:28671].
00000000: 2600000c
00000004: 26000004
00000008: 26000004
0000000c: 26fffffc
00000010: 09100000
00000014: 09200000
...
taskInterrupt(001)
1530ns 00000054 : 02ed002c m[28620+44  ]=-1          SW=00000000
1610ns 00000058 : 02bd0028 m[28620+40  ]=0           SW=00000000
...
RET to PC < 0, finished!

As shown in the result above, cpu0.v dumps the memory content after reading the input file cpu0.hex. Next, it runs instructions from address 0 and prints each destination register value in the fourth column.

The first column is the timestamp in nanoseconds. The second column is the instruction address. The third column is the instruction content.

Most of the example codes discussed in previous chapters are verified by printing variables using print_integer().

Since the cpu0.v machine is written in Verilog, it is assumed to be capable of running on a real FPGA device (though I have not tested this myself). The actual output hardware interface or port depends on the specific output device, such as RS232, speaker, LED, etc. You must implement the I/O interface or port and wire your I/O device accordingly when programming an FPGA.

By running the compiled code on the Verilog simulator, the compiled result from the Cpu0 backend and the total CPU cycles can be verified and measured.

Currently, this Cpu0 Verilog implementation does not support pipeline architecture. However, based on the instruction set, it can be extended to a pipelined model.

The cycle time of the pipelined Cpu0 model is expected to be more than 1/5 of the “total CPU cycles” shown above, due to dependencies between instructions.

Although the Verilog simulator is slow for running full system programs and does not count cycles for cache and I/O operations, it provides a simple and effective way to validate CPU design ideas in the early development stages using small program patterns.

Creating a full system simulator is complex. While the Wiki website [7] provides tools for building simulators, doing so requires significant effort.

To generate cpu032I code with little-endian format, you can run the following command. The script build-run_backend.sh writes the endian configuration to ../verilog/cpu0.config as shown below.

JonathantekiiMac:input Jonathan$ bash build-run_backend.sh cpu032I el

../verilog/cpu0.config

1   /* 0: big endian, 1: little endian */

The following files test more features.

lbdex/input/ch_nolld2.h


#include "debug.h"
#include "boot.cpp"

#include "print.h"

int test_nolld2();

lbdex/input/ch_nolld2.cpp


#include "print.cpp"

#include "ch9_3_alloc.cpp"

int test_nolld2()
{
  bool pass = true;
  int a = 0;

  a = test_alloc();
  print_integer(a);  // a = 31
  if (a != 31) pass = false;
  return pass;
}

lbdex/input/ch_run_backend2.cpp


#include "ch_nolld2.h"

int main()
{
  bool pass = true;
  pass = test_nolld2();

  return pass;
}

#include "ch_nolld2.cpp"

lbdex/input/build-run_backend2.sh

#!/usr/bin/env bash

source functions.sh

sh_name=build-run_backend.sh
ARG_NUM=$#
CPU=$1
ENDIAN=$2

DEFFLAGS=""
if [ "$arg1" == cpu032II ] ; then
  DEFFLAGS=${DEFFLAGS}" -DCPU032II"
fi
echo ${DEFFLAGS}

prologue;

${CLANG} ${DEFFLAGS} -c ch_run_backend2.cpp \
-emit-llvm -o ch_run_backend2.bc
${TOOLDIR}/llc -march=cpu0${ENDIAN} -mcpu=${CPU} -relocation-model=static \
-filetype=obj ch_run_backend2.bc -o ch_run_backend2.cpu0.o
${TOOLDIR}/llvm-objdump -d ch_run_backend2.cpu0.o | tail -n +8| awk \
'{print "/* " $1 " */\t" $2 " " $3 " " $4 " " $5 "\t/* " $6"\t" $7" " $8" \
" $9" " $10 "\t*/"}' > ../verilog/cpu0.hex

ENDIAN=`${TOOLDIR}/llvm-readobj -h ch_run_backend2.cpu0.o|grep "DataEncoding"|awk '{print $2}'`
isLittleEndian;

if [ $LE == "true" ] ; then
  echo "1   /* 0: big endian, 1: little endian */" > ../verilog/cpu0.config
else
  echo "0   /* 0: big endian, 1: little endian */" > ../verilog/cpu0.config
fi
cat ../verilog/cpu0.config
JonathantekiiMac:input Jonathan$ bash build-run_backend.sh cpu032II el
...
JonathantekiiMac:input Jonathan$ cd ../verilog
JonathantekiiMac:verilog Jonathan$ ./cpu0IIs
...
31
...

Other LLVM-Based Tools for Cpu0 Processor

You can find the Cpu0 ELF linker implementation based on lld, which is the official LLVM linker project, as well as elf2hex, which is modified from the llvm-objdump driver, at the following website:

http://jonathan2251.github.io/lbt/index.html