Global variables

In the last three chapters, we only accessed local variables. This chapter deals with the translation of global variable access.

Global variable DAG translation differs from the DAG translations covered so far. It creates IR DAG nodes at run time in backend C++ code according to the llc -relocation-model option, whereas the other DAGs are translated from IR DAG to machine DAG directly according to the IR DAGs in the input file (except the pseudo instruction RetLR used in Chapter3_4). Readers should focus on how to add code that creates DAG nodes at run time and how to define the pattern match in td for those run-time-created DAG nodes. In addition, take care of the machine instruction printing function for assembly directives (macros) related to global variables, if your backend has them.

Chapter6_1/ supports global variables. Let’s compile ch6_1.cpp with this version first, and then explain the code changes.

lbdex/input/ch6_1.cpp

int gStart = 3;
int gI = 100;
int test_global()
{
  int c = 0;

  c = gI;

  return c;
}
118-165-78-166:input Jonathan$ llvm-dis ch6_1.bc -o -
...
@gStart = global i32 2, align 4
@gI = global i32 100, align 4

define i32 @_Z3funv() nounwind uwtable ssp {
  %1 = alloca i32, align 4
  %c = alloca i32, align 4
  store i32 0, i32* %1
  store i32 0, i32* %c, align 4
  %2 = load i32* @gI, align 4
  store i32 %2, i32* %c, align 4
  %3 = load i32* %c, align 4
  ret i32 %3
}

Cpu0 global variable options

Just like Mips, Cpu0 supports both static and pic modes. In static mode there are two different layouts for global variables, controlled by the option cpu0-use-small-section. Let’s run Chapter6_1/ on ch6_1.cpp with four different option combinations, llc -relocation-model=static -cpu0-use-small-section=false, llc -relocation-model=static -cpu0-use-small-section=true, llc -relocation-model=pic -cpu0-use-small-section=false and llc -relocation-model=pic -cpu0-use-small-section=true, to trace the DAGs and Cpu0 instructions.

118-165-78-166:input Jonathan$ clang -target mips-unknown-linux-gnu -c
ch6_1.cpp -emit-llvm -o ch6_1.bc
118-165-78-166:input Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
Debug/bin/llc -march=cpu0 -relocation-model=static -cpu0-use-small-section=false
-filetype=asm -debug ch6_1.bc -o -

...
Type-legalized selection DAG: BB#0 '_Z11test_globalv:'
SelectionDAG has 12 nodes:
  ...
      0x7ffd5902cc10: <multiple use>
    0x7ffd5902cf10: ch = store 0x7ffd5902cd10, 0x7ffd5902ca10, 0x7ffd5902ce10,
    0x7ffd5902cc10<ST4[%c]> [ORD=2] [ID=-3]

    0x7ffd5902d010: i32 = GlobalAddress<i32* @gI> 0 [ORD=3] [ID=-3]

    0x7ffd5902cc10: <multiple use>
  0x7ffd5902d110: i32,ch = load 0x7ffd5902cf10, 0x7ffd5902d010,
  0x7ffd5902cc10<LD4[@gI]> [ORD=3] [ID=-3]
  ...

Legalized selection DAG: BB#0 '_Z11test_globalv:'
SelectionDAG has 16 nodes:
  ...
      0x7ffd5902cc10: <multiple use>
    0x7ffd5902cf10: ch = store 0x7ffd5902cd10, 0x7ffd5902ca10, 0x7ffd5902ce10,
    0x7ffd5902cc10<ST4[%c]> [ORD=2] [ID=8]

        0x7ffd5902d310: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=5]

      0x7ffd5902d710: i32 = Cpu0ISD::Hi 0x7ffd5902d310

        0x7ffd5902d610: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=6]

      0x7ffd5902d810: i32 = Cpu0ISD::Lo 0x7ffd5902d610

    0x7ffd5902fe10: i32 = add 0x7ffd5902d710, 0x7ffd5902d810

    0x7ffd5902cc10: <multiple use>
  0x7ffd5902d110: i32,ch = load 0x7ffd5902cf10, 0x7ffd5902fe10,
  0x7ffd5902cc10<LD4[@gI]> [ORD=3] [ID=9]
  ...
  lui $2, %hi(gI)
  ori $2, $2, %lo(gI)
      ld      $2, 0($2)
      ...
      .type   gStart,@object          # @gStart
      .data
      .globl  gStart
      .align  2
gStart:
      .4byte  2                       # 0x2
      .size   gStart, 4

      .type   gI,@object              # @gI
      .globl  gI
      .align  2
gI:
      .4byte  100                     # 0x64
      .size   gI, 4
118-165-78-166:input Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
Debug/bin/llc -march=cpu0 -relocation-model=static -cpu0-use-small-section=true
-filetype=asm -debug ch6_1.bc -o -

...
Type-legalized selection DAG: BB#0 '_Z11test_globalv:'
SelectionDAG has 12 nodes:
  ...
      0x7fc5f382cc10: <multiple use>
    0x7fc5f382cf10: ch = store 0x7fc5f382cd10, 0x7fc5f382ca10, 0x7fc5f382ce10,
    0x7fc5f382cc10<ST4[%c]> [ORD=2] [ID=-3]

    0x7fc5f382d010: i32 = GlobalAddress<i32* @gI> 0 [ORD=3] [ID=-3]

    0x7fc5f382cc10: <multiple use>
  0x7fc5f382d110: i32,ch = load 0x7fc5f382cf10, 0x7fc5f382d010,
  0x7fc5f382cc10<LD4[@gI]> [ORD=3] [ID=-3]
  ...
Legalized selection DAG: BB#0 '_Z11test_globalv:'
SelectionDAG has 15 nodes:
  ...
      0x7fc5f382cc10: <multiple use>
    0x7fc5f382cf10: ch = store 0x7fc5f382cd10, 0x7fc5f382ca10, 0x7fc5f382ce10,
    0x7fc5f382cc10<ST4[%c]> [ORD=2] [ID=8]

      0x7fc5f382d710: i32 = register %GP

        0x7fc5f382d310: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=4]

      0x7fc5f382d610: i32 = Cpu0ISD::GPRel 0x7fc5f382d310

    0x7fc5f382d810: i32 = add 0x7fc5f382d710, 0x7fc5f382d610

    0x7fc5f382cc10: <multiple use>
  0x7fc5f382d110: i32,ch = load 0x7fc5f382cf10, 0x7fc5f382d810,
  0x7fc5f382cc10<LD4[@gI]> [ORD=3] [ID=9]
  ...

      ori     $2, $gp, %gp_rel(gI)
      ld      $2, 0($2)
      ...
      .type   gStart,@object          # @gStart
      .section        .sdata,"aw",@progbits
      .globl  gStart
      .align  2
gStart:
      .4byte  2                       # 0x2
      .size   gStart, 4

      .type   gI,@object              # @gI
      .globl  gI
      .align  2
gI:
      .4byte  100                     # 0x64
      .size   gI, 4
118-165-78-166:input Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
Debug/bin/llc -march=cpu0 -relocation-model=pic -cpu0-use-small-section=false
-filetype=asm -debug ch6_1.bc -o -

  ...
Type-legalized selection DAG: BB#0 '_Z11test_globalv:'
SelectionDAG has 11 nodes:
  ...
      0x7fe03c02e010: <multiple use>
    0x7fe03c02e118: ch = store 0x7fe03b50dee0, 0x7fe03c02de00, 0x7fe03c02df08,
    0x7fe03c02e010<ST4[%c]> [ORD=3] [ID=-3]

    0x7fe03c02e220: i32 = GlobalAddress<i32* @gI> 0 [ORD=4] [ID=-3]

    0x7fe03c02e010: <multiple use>
  0x7fe03c02e328: i32,ch = load 0x7fe03c02e118, 0x7fe03c02e220,
  0x7fe03c02e010<LD4[@gI]> [ORD=4] [ID=-3]
  ...
Legalized selection DAG: BB#0 '_Z11test_globalv:'
SelectionDAG has 15 nodes:
  ...
      0x7fe03c02e010: <multiple use>
    0x7fe03c02e118: ch = store 0x7fe03b50dee0, 0x7fe03c02de00, 0x7fe03c02df08,
    0x7fe03c02e010<ST4[%c]> [ORD=3] [ID=6]

        0x7fe03c02e538: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=5] [ORD=4]

      0x7fe03c02ea60: i32 = Cpu0ISD::Hi 0x7fe03c02e538 [ORD=4]

        0x7fe03c02e958: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=6] [ORD=4]

      0x7fe03c02eb68: i32 = Cpu0ISD::Lo 0x7fe03c02e958 [ORD=4]

    0x7fe03c02ec70: i32 = add 0x7fe03c02ea60, 0x7fe03c02eb68 [ORD=4]

    0x7fe03c02e010: <multiple use>
  0x7fe03c02e328: i32,ch = load 0x7fe03c02e118, 0x7fe03c02ec70,
  0x7fe03c02e010<LD4[@gI]> [ORD=4] [ID=7]
  ...
        lui   $2, %got_hi(gI)
        addu  $2, $2, $gp
        ld    $2, %got_lo(gI)($2)
  ...
    .type gStart,@object          # @gStart
  .data
  .globl  gStart
  .align  2
gStart:
  .4byte  3                       # 0x3
  .size gStart, 4

  .type gI,@object              # @gI
  .globl  gI
  .align  2
gI:
  .4byte  100                     # 0x64
  .size gI, 4
118-165-78-166:input Jonathan$ /Users/Jonathan/llvm/test/cmake_debug_build/
Debug/bin/llc -march=cpu0 -relocation-model=pic -cpu0-use-small-section=true
-filetype=asm -debug ch6_1.bc -o -

...
Type-legalized selection DAG: BB#0 '_Z11test_globalv:'
SelectionDAG has 11 nodes:
  ...
      0x7fad7102cc10: <multiple use>
    0x7fad7102cf10: ch = store 0x7fad7102cd10, 0x7fad7102ca10, 0x7fad7102ce10,
    0x7fad7102cc10<ST4[%c]> [ORD=2] [ID=-3]

    0x7fad7102d010: i32 = GlobalAddress<i32* @gI> 0 [ORD=3] [ID=-3]

    0x7fad7102cc10: <multiple use>
  0x7fad7102d110: i32,ch = load 0x7fad7102cf10, 0x7fad7102d010,
  0x7fad7102cc10<LD4[@gI]> [ORD=3] [ID=-3]
  ...
Legalized selection DAG: BB#0 '_Z11test_globalv:'
SelectionDAG has 14 nodes:
  0x7ff3c9c10b98: ch = EntryToken [ORD=1] [ID=0]
  ...
      0x7fad7102cc10: <multiple use>
    0x7fad7102cf10: ch = store 0x7fad7102cd10, 0x7fad7102ca10, 0x7fad7102ce10,
    0x7fad7102cc10<ST4[%c]> [ORD=2] [ID=8]

      0x7fad70c10b98: <multiple use>
        0x7fad7102d610: i32 = Register %GP

        0x7fad7102d310: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=1]

      0x7fad7102d710: i32 = Cpu0ISD::Wrapper 0x7fad7102d610, 0x7fad7102d310

      0x7fad7102cc10: <multiple use>
    0x7fad7102d810: i32,ch = load 0x7fad70c10b98, 0x7fad7102d710,
    0x7fad7102cc10<LD4[<unknown>]>

    0x7ff3ca02cc10: <multiple use>
  0x7ff3ca02d110: i32,ch = load 0x7ff3ca02cf10, 0x7ff3ca02d810,
  0x7ff3ca02cc10<LD4[@gI]> [ORD=3] [ID=9]
  ...
        .set  noreorder
        .cpload       $6
        .set  nomacro
  ...
      ld      $2, %got(gI)($gp)
      ld      $2, 0($2)
  ...
      .type   gStart,@object          # @gStart
      .data
      .globl  gStart
      .align  2
gStart:
      .4byte  2                       # 0x2
      .size   gStart, 4

      .type   gI,@object              # @gI
      .globl  gI
      .align  2
gI:
      .4byte  100                     # 0x64
      .size   gI, 4

The information above is summarized in the following tables, starting with Table: Cpu0 global variable options.

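Table 22 Cpu0 global variable options

+-------------------------+---------+--------------------+----------------------------------------------+
| option name             | default | other option value | description                                  |
+=========================+=========+====================+==============================================+
| -relocation-model       | pic     | static             | • pic: Position Independent Address          |
|                         |         |                    | • static: Absolute Address                   |
+-------------------------+---------+--------------------+----------------------------------------------+
| -cpu0-use-small-section | false   | true               | • false: .data or .bss, 32 bits addressable  |
|                         |         |                    | • true: .sdata or .sbss, 16 bits addressable |
+-------------------------+---------+--------------------+----------------------------------------------+
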
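Table 23 Cpu0 DAGs and instructions for -relocation-model=static

+--------------------------------+----------------------------------+-----------------------------+
| option: cpu0-use-small-section | false                            | true                        |
+================================+==================================+=============================+
| addressing mode                | absolute                         | $gp relative                |
+--------------------------------+----------------------------------+-----------------------------+
| addressing                     | absolute                         | $gp+offset                  |
+--------------------------------+----------------------------------+-----------------------------+
| Legalized selection DAG        | (add Cpu0ISD::Hi<gI offset Hi16> | (add register %GP,          |
|                                |  Cpu0ISD::Lo<gI offset Lo16>)    |  Cpu0ISD::GPRel<gI offset>) |
+--------------------------------+----------------------------------+-----------------------------+
| Cpu0                           | lui $2, %hi(gI);                 | ori $2, $gp, %gp_rel(gI);   |
|                                | ori $2, $2, %lo(gI);             |                             |
+--------------------------------+----------------------------------+-----------------------------+
| relocation records solved      | link time                        | link time                   |
+--------------------------------+----------------------------------+-----------------------------+
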
  • In “static, cpu0-use-small-section=true”, the offset of gI can be calculated at link time since $gp is assigned a fixed address, the start of the global address table.
  • In “static, cpu0-use-small-section=false”, the high and low parts of the gI address (%hi(gI) and %lo(gI)) are translated into an absolute address.
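
The $gp relative case in the notes above can be sketched as follows. This is only an illustration under stated assumptions: $gp is taken to hold a fixed address known at link time, and the %gp_rel(sym) value is assumed to be an unsigned 16-bit field. The function names and any addresses passed to them are hypothetical and not part of the Cpu0 backend.

```cpp
#include <cstdint>

// Hypothetical helper: compute the %gp_rel(sym) value a linker would patch
// into the instruction. Returns false if the value does not fit.
// Assumption: the immediate is an unsigned 16-bit field.
bool gpRelOffset(uint32_t symAddr, uint32_t gp, uint16_t &offset) {
  if (symAddr < gp)
    return false;                 // symbol lies below $gp
  uint32_t diff = symAddr - gp;
  if (diff > 0xFFFFu)
    return false;                 // outside the 16-bit addressable window
  offset = static_cast<uint16_t>(diff);
  return true;
}

// Address computed at run time by the (add register %GP, GPRel<sym>) DAG.
uint32_t gpRelAddress(uint32_t gp, uint16_t offset) {
  return gp + offset;
}
```

Because both the symbol address and $gp are fixed once the program is linked statically, the offset is a link-time constant, which is why the table lists this relocation as solved at link time.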
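Table 24 Cpu0 DAGs and instructions for -relocation-model=pic

+--------------------------------+-------------------------------------+-------------------------------+
| option: cpu0-use-small-section | false                               | true                          |
+================================+=====================================+===============================+
| addressing mode                | $gp relative                        | $gp relative                  |
+--------------------------------+-------------------------------------+-------------------------------+
| addressing                     | $gp+offset                          | $gp+offset                    |
+--------------------------------+-------------------------------------+-------------------------------+
| Legalized selection DAG        | (load EntryToken,                   | (load (Cpu0ISD::Wrapper       |
|                                |  (Cpu0ISD::Wrapper                  |   register %GP, <gI offset>)) |
|                                |   (add Cpu0ISD::Hi<gI offset Hi16>, |                               |
|                                |    Register %GP),                   |                               |
|                                |   Cpu0ISD::Lo<gI offset Lo16>))     |                               |
+--------------------------------+-------------------------------------+-------------------------------+
| Cpu0                           | lui $2, %got_hi(gI);                | ld $2, %got(gI)($gp);         |
|                                | addu $2, $2, $gp;                   |                               |
|                                | ld $2, %got_lo(gI)($2);             |                               |
+--------------------------------+-------------------------------------+-------------------------------+
| relocation records solved      | link/load time                      | link/load time                |
+--------------------------------+-------------------------------------+-------------------------------+
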
  • In pic, the offset between gI and .data cannot be calculated if the function is loaded at run time (dynamic link); the offset can be calculated if static link is used.
  • In C, all variable and function names are bound statically. In C++, virtual functions are bound dynamically.

According to textbooks on system programming, there are two addressing modes: Absolute Addressing Mode and Position Independent Addressing Mode. A dynamically linked function must be compiled in Position Independent Addressing Mode. In general, the option -relocation-model selects between Absolute Addressing and Position Independent Addressing. The exception is -relocation-model=static with -cpu0-use-small-section=true. In this case, the register $gp is reserved and set to the start address of the global variable area, and Cpu0 uses $gp relative addressing.
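
The four option combinations can be summarized as a small decision function. The sketch below is not backend code: the enum and function names are hypothetical, it assumes the variable is small enough to qualify for the small section and is not a local (internal) symbol, and the instruction strings are copied from the llc output earlier in this chapter.

```cpp
#include <string>

// Hypothetical names for the four addressing cases.
enum class GlobalAddrKind { AbsHiLo, GpRel, GotSmall, GotLarge };

// `pic` stands for -relocation-model=pic and `smallSection` for
// -cpu0-use-small-section=true.
GlobalAddrKind selectGlobalAddrKind(bool pic, bool smallSection) {
  if (!pic)
    return smallSection ? GlobalAddrKind::GpRel : GlobalAddrKind::AbsHiLo;
  return smallSection ? GlobalAddrKind::GotSmall : GlobalAddrKind::GotLarge;
}

// Instruction sequence that forms (or loads) the address of gI in each case.
std::string addressSequence(GlobalAddrKind kind) {
  switch (kind) {
  case GlobalAddrKind::AbsHiLo:
    return "lui $2, %hi(gI); ori $2, $2, %lo(gI)";
  case GlobalAddrKind::GpRel:
    return "ori $2, $gp, %gp_rel(gI)";
  case GlobalAddrKind::GotSmall:
    return "ld $2, %got(gI)($gp)";
  case GlobalAddrKind::GotLarge:
    return "lui $2, %got_hi(gI); addu $2, $2, $gp; ld $2, %got_lo(gI)($2)";
  }
  return "";
}
```

For example, addressSequence(selectGlobalAddrKind(false, true)) yields the single $gp relative ori instruction of the static small-section case.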

To support global variables, first add the UseSmallSectionOpt command line variable to Cpu0Subtarget.cpp. After that, the user can run llc with the option -cpu0-use-small-section=false to set UseSmallSectionOpt to false. UseSmallSectionOpt defaults to false when the option is not specified. For more about the cl::opt command line variable, refer to [1].
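
The behavior of a boolean option with a default value can be sketched without LLVM. The function below is a hypothetical stand-in, loosely modelled on cl::opt<bool> with cl::init(false); it is not LLVM's implementation. It accepts only the spellings "-name", "-name=true" and "-name=false", and in this sketch the last occurrence wins.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a boolean command line option with a default.
bool parseBoolOption(const std::vector<std::string> &args,
                     const std::string &name, bool defaultValue) {
  bool value = defaultValue;          // used when the option never appears
  const std::string flag = "-" + name;
  for (const std::string &arg : args) {
    if (arg == flag || arg == flag + "=true")
      value = true;
    else if (arg == flag + "=false")
      value = false;
  }
  return value;
}
```

Calling parseBoolOption(args, "cpu0-use-small-section", false) with an argument list that does not mention the option returns false, which matches the default described above.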

lbdex/chapters/Chapter6_1/Cpu0Subtarget.h

extern bool Cpu0ReserveGP;
extern bool Cpu0NoCpload;
class Cpu0Subtarget : public Cpu0GenSubtargetInfo {
  ...
  // UseSmallSection - Small section is used.
  bool UseSmallSection;
  bool useSmallSection() const { return UseSmallSection; }
  ...
};

lbdex/chapters/Chapter6_1/Cpu0Subtarget.cpp

static cl::opt<bool> UseSmallSectionOpt
                ("cpu0-use-small-section", cl::Hidden, cl::init(false),
                 cl::desc("Use small section. Only work when -relocation-model="
                 "static. pic always not use small section."));

static cl::opt<bool> ReserveGPOpt
                ("cpu0-reserve-gp", cl::Hidden, cl::init(false),
                 cl::desc("Never allocate $gp to variable"));

static cl::opt<bool> NoCploadOpt
                ("cpu0-no-cpload", cl::Hidden, cl::init(false),
                 cl::desc("No issue .cpload"));

bool Cpu0ReserveGP;
bool Cpu0NoCpload;
Cpu0Subtarget::Cpu0Subtarget(const Triple &TT, const std::string &CPU,
                             const std::string &FS, bool little, 
                             const Cpu0TargetMachine &_TM) :
  ... {
  // Set UseSmallSection.
  UseSmallSection = UseSmallSectionOpt;
  Cpu0ReserveGP = ReserveGPOpt;
  Cpu0NoCpload = NoCploadOpt;
  ...
}

The options ReserveGPOpt and NoCploadOpt will be used by the Cpu0 linker in a later chapter. Next, add the following code to the files Cpu0BaseInfo.h, Cpu0TargetObjectFile.h, Cpu0TargetObjectFile.cpp, Cpu0RegisterInfo.cpp, Cpu0ISelLowering.h and Cpu0ISelLowering.cpp.

lbdex/chapters/Chapter6_1/Cpu0BaseInfo.h

enum TOF {
  ...
  /// MO_GOT16 - Represents the offset into the global offset table at which
  /// the address the relocation entry symbol resides during execution.
  MO_GOT16,
  MO_GOT,
...
}; // enum TOF {

lbdex/chapters/Chapter6_1/Cpu0TargetObjectFile.h

    /// IsGlobalInSmallSection - Return true if this global address should be
    /// placed into small data/bss section.
    bool IsGlobalInSmallSection(const GlobalValue *GV,
                                const TargetMachine &TM, SectionKind Kind) const;
    bool IsGlobalInSmallSection(const GlobalValue *GV,
                                const TargetMachine &TM) const;
    bool IsGlobalInSmallSectionImpl(const GlobalValue *GV,
                                    const TargetMachine &TM) const;

    MCSection *SelectSectionForGlobal(const GlobalValue *GV, SectionKind Kind,
                                      Mangler &Mang,
                                      const TargetMachine &TM) const override;

lbdex/chapters/Chapter6_1/Cpu0TargetObjectFile.cpp

// An address must be loaded from a small section if its size is less than the
// small section size threshold. Data in this section must be addressed using
// gp_rel operator.
static bool IsInSmallSection(uint64_t Size) {
  return Size > 0 && Size <= SSThreshold;
}

bool Cpu0TargetObjectFile::IsGlobalInSmallSection(const GlobalValue *GV,
                                                const TargetMachine &TM) const {
  if (GV->isDeclaration() || GV->hasAvailableExternallyLinkage())
    return false;

  return IsGlobalInSmallSection(GV, TM, getKindForGlobal(GV, TM));
}

/// IsGlobalInSmallSection - Return true if this global address should be
/// placed into small data/bss section.
bool Cpu0TargetObjectFile::
IsGlobalInSmallSection(const GlobalValue *GV, const TargetMachine &TM,
                       SectionKind Kind) const {
  return (IsGlobalInSmallSectionImpl(GV, TM) &&
          (Kind.isData() || Kind.isBSS() || Kind.isCommon()));
}

/// Return true if this global address should be placed into small data/bss
/// section. This method does all the work, except for checking the section
/// kind.
bool Cpu0TargetObjectFile::
IsGlobalInSmallSectionImpl(const GlobalValue *GV,
                           const TargetMachine &TM) const {
  const Cpu0Subtarget &Subtarget =
      *static_cast<const Cpu0TargetMachine &>(TM).getSubtargetImpl();

  // Return if small section is not available.
  if (!Subtarget.useSmallSection())
    return false;

  // Only global variables, not functions.
  const GlobalVariable *GVA = dyn_cast<GlobalVariable>(GV);
  if (!GVA)
    return false;

  Type *Ty = GV->getValueType();
  return IsInSmallSection(
      GV->getParent()->getDataLayout().getTypeAllocSize(Ty));
}


MCSection *
Cpu0TargetObjectFile::SelectSectionForGlobal(const GlobalValue *GV, 
                                             SectionKind Kind, Mangler &Mang,
                                             const TargetMachine &TM) const {
  // TODO: Could also support "weak" symbols as well with ".gnu.linkonce.s.*"
  // sections?

  // Handle Small Section classification here.
  if (Kind.isBSS() && IsGlobalInSmallSection(GV, TM, Kind))
    return SmallBSSSection;
  if (Kind.isData() && IsGlobalInSmallSection(GV, TM, Kind))
    return SmallDataSection;

  // Otherwise, we work the same as ELF.
  return TargetLoweringObjectFileELF::SelectSectionForGlobal(GV, Kind, Mang,TM);
}
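
The classification logic in the listing above can be tried out on plain values. The sketch below is a simplified model and not backend code: the Kind enum is a hypothetical reduction of LLVM's SectionKind, and the threshold of 8 bytes is an assumption, since the definition of SSThreshold is not shown in this section.

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified section kinds.
enum class Kind { Text, ReadOnly, Data, BSS, Common };

// Assumed threshold in bytes; the real SSThreshold is defined elsewhere.
const uint64_t kAssumedSSThreshold = 8;

bool isInSmallSection(uint64_t size) {
  return size > 0 && size <= kAssumedSSThreshold;
}

// Same order of checks as the listing, expressed on plain values.
bool isGlobalInSmallSection(bool useSmallSection, bool isDeclaration,
                            bool isVariable, uint64_t size, Kind kind) {
  if (isDeclaration) return false;       // no storage in this module
  if (!useSmallSection) return false;    // option is off
  if (!isVariable) return false;         // functions never qualify
  if (!isInSmallSection(size)) return false;
  return kind == Kind::Data || kind == Kind::BSS || kind == Kind::Common;
}

// Section name for a defined global variable; only the data and bss kinds
// are modelled, everything else falls back to the default ELF choice.
std::string sectionFor(bool useSmallSection, uint64_t size, Kind kind) {
  bool small = isGlobalInSmallSection(useSmallSection, false, true, size, kind);
  if (kind == Kind::BSS) return small ? ".sbss" : ".bss";
  if (kind == Kind::Data) return small ? ".sdata" : ".data";
  return "(default ELF choice)";
}
```

With these assumptions, a 4-byte initialized int such as gStart goes to .sdata when the option is on and to .data when it is off, which agrees with the .section directives in the llc output earlier in this chapter.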

lbdex/chapters/Chapter6_1/Cpu0RegisterInfo.cpp

BitVector Cpu0RegisterInfo::
getReservedRegs(const MachineFunction &MF) const {
  ...
    Reserved.set(Cpu0::GP);
  ...
}

lbdex/chapters/Chapter6_1/Cpu0ISelLowering.h

    SDValue getGlobalReg(SelectionDAG &DAG, EVT Ty) const;

    // This method creates the following nodes, which are necessary for
    // computing a local symbol's address:
    //
    // (add (load (wrapper $gp, %got(sym)), %lo(sym))
    template<class NodeTy>
    SDValue getAddrLocal(NodeTy *N, EVT Ty, SelectionDAG &DAG) const {
      SDLoc DL(N);
      unsigned GOTFlag = Cpu0II::MO_GOT;
      SDValue GOT = DAG.getNode(Cpu0ISD::Wrapper, DL, Ty, getGlobalReg(DAG, Ty),
                                getTargetNode(N, Ty, DAG, GOTFlag));
      SDValue Load =
          DAG.getLoad(Ty, DL, DAG.getEntryNode(), GOT,
                      MachinePointerInfo::getGOT(DAG.getMachineFunction()));
      unsigned LoFlag = Cpu0II::MO_ABS_LO;
      SDValue Lo = DAG.getNode(Cpu0ISD::Lo, DL, Ty,
                               getTargetNode(N, Ty, DAG, LoFlag));
      return DAG.getNode(ISD::ADD, DL, Ty, Load, Lo);
    }

    //@getAddrGlobal {
    // This method creates the following nodes, which are necessary for
    // computing a global symbol's address:
    //
    // (load (wrapper $gp, %got(sym)))
    template<class NodeTy>
    SDValue getAddrGlobal(NodeTy *N, EVT Ty, SelectionDAG &DAG,
                          unsigned Flag, SDValue Chain,
                          const MachinePointerInfo &PtrInfo) const {
      SDLoc DL(N);
      SDValue Tgt = DAG.getNode(Cpu0ISD::Wrapper, DL, Ty, getGlobalReg(DAG, Ty),
                                getTargetNode(N, Ty, DAG, Flag));
      return DAG.getLoad(Ty, DL, Chain, Tgt, PtrInfo);
    }
    //@getAddrGlobal }

    //@getAddrGlobalLargeGOT {
    // This method creates the following nodes, which are necessary for
    // computing a global symbol's address in large-GOT mode:
    //
    // (load (wrapper (add %hi(sym), $gp), %lo(sym)))
    template<class NodeTy>
    SDValue getAddrGlobalLargeGOT(NodeTy *N, EVT Ty, SelectionDAG &DAG,
                                  unsigned HiFlag, unsigned LoFlag,
                                  SDValue Chain,
                                  const MachinePointerInfo &PtrInfo) const {
      SDLoc DL(N);
      SDValue Hi = DAG.getNode(Cpu0ISD::Hi, DL, Ty,
                               getTargetNode(N, Ty, DAG, HiFlag));
      Hi = DAG.getNode(ISD::ADD, DL, Ty, Hi, getGlobalReg(DAG, Ty));
      SDValue Wrapper = DAG.getNode(Cpu0ISD::Wrapper, DL, Ty, Hi,
                                    getTargetNode(N, Ty, DAG, LoFlag));
      return DAG.getLoad(Ty, DL, Chain, Wrapper, PtrInfo);
    }
    //@getAddrGlobalLargeGOT }

    //@getAddrNonPIC
    // This method creates the following nodes, which are necessary for
    // computing a symbol's address in non-PIC mode:
    //
    // (add %hi(sym), %lo(sym))
    template<class NodeTy>
    SDValue getAddrNonPIC(NodeTy *N, EVT Ty, SelectionDAG &DAG) const {
      SDLoc DL(N);
      SDValue Hi = getTargetNode(N, Ty, DAG, Cpu0II::MO_ABS_HI);
      SDValue Lo = getTargetNode(N, Ty, DAG, Cpu0II::MO_ABS_LO);
      return DAG.getNode(ISD::ADD, DL, Ty,
                         DAG.getNode(Cpu0ISD::Hi, DL, Ty, Hi),
                         DAG.getNode(Cpu0ISD::Lo, DL, Ty, Lo));
    }
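
To make the node shapes built by these templates easier to see, the sketch below rebuilds them as s-expression strings. It is a toy model: strings stand in for SDValue nodes, and every function name is hypothetical and only mirrors the templates above.

```cpp
#include <string>

// A target symbol operand carrying a relocation flag, e.g. %hi(gI).
std::string targetNode(const std::string &sym, const std::string &flag) {
  return "%" + flag + "(" + sym + ")";
}

// Unary and binary node constructors.
std::string node(const std::string &op, const std::string &a) {
  return "(" + op + " " + a + ")";
}
std::string node(const std::string &op, const std::string &a,
                 const std::string &b) {
  return "(" + op + " " + a + " " + b + ")";
}

// Shape of getAddrNonPIC: (add (Hi %hi(sym)) (Lo %lo(sym)))
std::string addrNonPIC(const std::string &sym) {
  return node("add", node("Hi", targetNode(sym, "hi")),
              node("Lo", targetNode(sym, "lo")));
}

// Shape of getAddrGlobal: (load (Wrapper $gp %got(sym)))
std::string addrGlobal(const std::string &sym) {
  return node("load", node("Wrapper", "$gp", targetNode(sym, "got")));
}

// Shape of getAddrGlobalLargeGOT:
// (load (Wrapper (add (Hi %got_hi(sym)) $gp) %got_lo(sym)))
std::string addrGlobalLargeGOT(const std::string &sym) {
  std::string hi = node("add", node("Hi", targetNode(sym, "got_hi")), "$gp");
  return node("load", node("Wrapper", hi, targetNode(sym, "got_lo")));
}
```

Printing addrNonPIC("gI") and addrGlobalLargeGOT("gI") side by side shows why the static case needs no load to obtain the address while both PIC cases do: the PIC address itself is fetched from the GOT.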

lbdex/chapters/Chapter6_1/Cpu0ISelLowering.cpp

SDValue Cpu0TargetLowering::getGlobalReg(SelectionDAG &DAG, EVT Ty) const {
  Cpu0FunctionInfo *FI = DAG.getMachineFunction().getInfo<Cpu0FunctionInfo>();
  return DAG.getRegister(FI->getGlobalBaseReg(), Ty);
}

//@getTargetNode(GlobalAddressSDNode
SDValue Cpu0TargetLowering::getTargetNode(GlobalAddressSDNode *N, EVT Ty,
                                          SelectionDAG &DAG,
                                          unsigned Flag) const {
  return DAG.getTargetGlobalAddress(N->getGlobal(), SDLoc(N), Ty, 0, Flag);
}

//@getTargetNode(ExternalSymbolSDNode
SDValue Cpu0TargetLowering::getTargetNode(ExternalSymbolSDNode *N, EVT Ty,
                                          SelectionDAG &DAG,
                                          unsigned Flag) const {
  return DAG.getTargetExternalSymbol(N->getSymbol(), Ty, Flag);
}
Cpu0TargetLowering::Cpu0TargetLowering(const Cpu0TargetMachine &TM,
                                       const Cpu0Subtarget &STI)
    : TargetLowering(TM), Subtarget(STI), ABI(TM.getABI()) {

  setOperationAction(ISD::GlobalAddress,      MVT::i32,   Custom);
}
SDValue Cpu0TargetLowering::
LowerOperation(SDValue Op, SelectionDAG &DAG) const
{
  switch (Op.getOpcode())
  {
  case ISD::GlobalAddress:      return lowerGlobalAddress(Op, DAG);
  }
  return SDValue();
}
SDValue Cpu0TargetLowering::lowerGlobalAddress(SDValue Op,
                                               SelectionDAG &DAG) const {
  //@lowerGlobalAddress }
  SDLoc DL(Op);
  const Cpu0TargetObjectFile *TLOF =
        static_cast<const Cpu0TargetObjectFile *>(
            getTargetMachine().getObjFileLowering());
  //@lga 1 {
  EVT Ty = Op.getValueType();
  GlobalAddressSDNode *N = cast<GlobalAddressSDNode>(Op);
  const GlobalValue *GV = N->getGlobal();
  //@lga 1 }

  if (!isPositionIndependent()) {
    //@ %gp_rel relocation
    if (TLOF->IsGlobalInSmallSection(GV, getTargetMachine())) {
      SDValue GA = DAG.getTargetGlobalAddress(GV, DL, MVT::i32, 0,
                                              Cpu0II::MO_GPREL);
      SDValue GPRelNode = DAG.getNode(Cpu0ISD::GPRel, DL,
                                      DAG.getVTList(MVT::i32), GA);
      SDValue GPReg = DAG.getRegister(Cpu0::GP, MVT::i32);
      return DAG.getNode(ISD::ADD, DL, MVT::i32, GPReg, GPRelNode);
    }

    //@ %hi/%lo relocation
    return getAddrNonPIC(N, Ty, DAG);
  }

  if (GV->hasInternalLinkage() || (GV->hasLocalLinkage() && !isa<Function>(GV)))
    return getAddrLocal(N, Ty, DAG);

  //@large section
  if (!TLOF->IsGlobalInSmallSection(GV, getTargetMachine()))
    return getAddrGlobalLargeGOT(
        N, Ty, DAG, Cpu0II::MO_GOT_HI16, Cpu0II::MO_GOT_LO16, 
        DAG.getEntryNode(), 
        MachinePointerInfo::getGOT(DAG.getMachineFunction()));
  return getAddrGlobal(
      N, Ty, DAG, Cpu0II::MO_GOT, DAG.getEntryNode(), 
      MachinePointerInfo::getGOT(DAG.getMachineFunction()));
}

The call setOperationAction(ISD::GlobalAddress, MVT::i32, Custom) tells llc that we implement the global address operation in the C++ function Cpu0TargetLowering::LowerOperation(). LLVM calls this function for the global address only when it needs to translate the IR DAG for loading a global variable into machine code. Every operation marked Custom by setOperationAction(ISD::XXX, MVT::XXX, Custom) in the constructor Cpu0TargetLowering() makes llvm call Cpu0TargetLowering::LowerOperation() in the “Legalized selection DAG” stage, so LowerOperation() identifies the global address access by checking whether the opcode of the DAG node is ISD::GlobalAddress.
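
The dispatch described above can be modelled in a few lines. This is a toy sketch and not LLVM code: the Opcode and Action enums, the ToyLowering class and its string results are all hypothetical. It only shows the idea that operations registered as Custom are handed to one lowering function, which then switches on the opcode.

```cpp
#include <map>
#include <string>

enum class Opcode { Add, Load, GlobalAddress };
enum class Action { Legal, Custom };

class ToyLowering {
  std::map<Opcode, Action> actions;   // anything not listed is Legal

public:
  ToyLowering() { setOperationAction(Opcode::GlobalAddress, Action::Custom); }

  void setOperationAction(Opcode op, Action a) { actions[op] = a; }

  Action getOperationAction(Opcode op) const {
    auto it = actions.find(op);
    return it == actions.end() ? Action::Legal : it->second;
  }

  // Reached only for Custom operations; dispatches on the opcode.
  std::string lowerOperation(Opcode op, const std::string &operand) const {
    switch (op) {
    case Opcode::GlobalAddress:
      return "(add (Hi " + operand + ") (Lo " + operand + "))";
    default:
      return "";                      // no custom lowering for this opcode
    }
  }

  // Legalization step: Legal nodes pass through, Custom nodes are lowered.
  std::string legalize(Opcode op, const std::string &operand) const {
    if (getOperationAction(op) == Action::Custom)
      return lowerOperation(op, operand);
    return operand;
  }
};
```

In this model, legalize(Opcode::GlobalAddress, "gI") produces the Hi/Lo shape of the static case, while an Add node is returned unchanged because it was never marked Custom.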

Finally, add the following code to Cpu0ISelDAGToDAG.h, Cpu0ISelDAGToDAG.cpp and Cpu0InstrInfo.td.

lbdex/chapters/Chapter6_1/Cpu0ISelDAGToDAG.h

  SDNode *getGlobalBaseReg();

lbdex/chapters/Chapter6_1/Cpu0ISelDAGToDAG.cpp

/// getGlobalBaseReg - Output the instructions required to put the
/// GOT address into a register.
SDNode *Cpu0DAGToDAGISel::getGlobalBaseReg() {
  unsigned GlobalBaseReg = MF->getInfo<Cpu0FunctionInfo>()->getGlobalBaseReg();
  return CurDAG->getRegister(GlobalBaseReg, getTargetLowering()->getPointerTy(
                                                CurDAG->getDataLayout()))
      .getNode();
}
/// ComplexPattern used on Cpu0InstrInfo
/// Used on Cpu0 Load/Store instructions
bool Cpu0DAGToDAGISel::
SelectAddr(SDNode *Parent, SDValue Addr, SDValue &Base, SDValue &Offset) {
  // on PIC code Load GA
  if (Addr.getOpcode() == Cpu0ISD::Wrapper) {
    Base   = Addr.getOperand(0);
    Offset = Addr.getOperand(1);
    return true;
  }

  //@static
  if (TM.getRelocationModel() != Reloc::PIC_) {
    if ((Addr.getOpcode() == ISD::TargetExternalSymbol ||
        Addr.getOpcode() == ISD::TargetGlobalAddress))
      return false;
  }
  ...
}
/// Select instructions not customized! Used for
/// expanded, promoted and normal instructions
void Cpu0DAGToDAGISel::Select(SDNode *Node) {
  ...
  switch (Node->getOpcode()) {
  default: break;
  // Get target GOT address.
  case ISD::GLOBAL_OFFSET_TABLE:
    ReplaceNode(Node, getGlobalBaseReg());
    return;
  }
  ...
}

lbdex/chapters/Chapter6_1/Cpu0InstrInfo.td

// Hi and Lo nodes are used to handle global addresses. Used on
// Cpu0ISelLowering to lower stuff like GlobalAddress, ExternalSymbol
// static model. (nothing to do with Cpu0 Registers Hi and Lo)
def Cpu0Hi    : SDNode<"Cpu0ISD::Hi", SDTIntUnaryOp>;
def Cpu0Lo    : SDNode<"Cpu0ISD::Lo", SDTIntUnaryOp>;
def Cpu0GPRel : SDNode<"Cpu0ISD::GPRel", SDTIntUnaryOp>;
def Cpu0Wrapper    : SDNode<"Cpu0ISD::Wrapper", SDTIntBinOp>;
def RelocPIC    :     Predicate<"TM.getRelocationModel() == Reloc::PIC_">;
// hi/lo relocs
let Predicates = [Ch6_1] in {
def : Pat<(Cpu0Hi tglobaladdr:$in), (LUi tglobaladdr:$in)>;
}
let Predicates = [Ch6_1] in {
def : Pat<(Cpu0Lo tglobaladdr:$in), (ORi ZERO, tglobaladdr:$in)>;
}
let Predicates = [Ch6_1] in {
def : Pat<(add CPURegs:$hi, (Cpu0Lo tglobaladdr:$lo)),
          (ORi CPURegs:$hi, tglobaladdr:$lo)>;
}
// gp_rel relocs
let Predicates = [Ch6_1] in {
def : Pat<(add CPURegs:$gp, (Cpu0GPRel tglobaladdr:$in)),
          (ORi CPURegs:$gp, tglobaladdr:$in)>;
}

//@ wrapper_pic
let Predicates = [Ch6_1] in {
class WrapperPat<SDNode node, Instruction ORiOp, RegisterClass RC>:
      Pat<(Cpu0Wrapper RC:$gp, node:$in),
              (ORiOp RC:$gp, node:$in)>;

def : WrapperPat<tglobaladdr, ORi, GPROut>;
}

Static mode

According to Table: Cpu0 global variable options, the option cpu0-use-small-section=false puts global variables in data/bss, while cpu0-use-small-section=true puts them in sdata/sbss. The name sdata stands for small data area. Sections data and sdata hold global variables with an initial value (such as int gI = 100 in this example), while sections bss and sbss hold global variables without an initial value (for instance, int gI;).

data or bss

The data/bss sections are 32-bit addressable areas since Cpu0 is a 32-bit architecture. Option cpu0-use-small-section=false generates the following instructions.

  ...
  lui $2, %hi(gI)
  ori $2, $2, %lo(gI)
  ld  $2, 0($2)
  ...
  .type       gStart,@object          # @gStart
  .data
  .globl      gStart
  .align      2
gStart:
  .4byte      2                       # 0x2
  .size       gStart, 4

  .type       gI,@object              # @gI
  .globl      gI
  .align      2
gI:
  .4byte      100                     # 0x64
  .size       gI, 4

As shown above, lui loads the high 16 bits of gI’s absolute address, %hi(gI), into the upper half of register $2 (the value is shifted left by 16 bits). Next, ori combines register $2 with the low 16 bits of gI’s absolute address, %lo(gI), and writes the result to $2. At this point $2 holds the memory address of gI. Finally, the instruction “ld $2, 0($2)” loads the content of gI.

The option llc -relocation-model=static selects absolute address mode, which must be used with static linking; dynamic linking requires Position Independent Addressing. Where an access is encoded PC relative, it can be solved at static link time (the offset between the address of gI and the instruction that refers to it can be calculated), and such code can be loaded at any address and run correctly there. With absolute addresses, if the program is loaded at a specific address known at the link stage, the relocation records of the gI access instructions, “lui $2, %hi(gI)” and “ori $2, $2, %lo(gI)”, can be solved at link time. On the other hand, if the loading address is only known at load time, these relocation records are solved by the loader at load time.
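
The arithmetic behind the lui/ori pair can be checked with a short sketch. It assumes that %hi is simply the upper 16 bits of the address and %lo the lower 16 bits, with no carry adjustment, which is what a zero-extending ori needs; the relocation arithmetic actually used by the Cpu0 backend is not shown in this section. The function names are hypothetical.

```cpp
#include <cstdint>

// Assumed meaning of the %hi and %lo operators (see the lead-in).
uint16_t hi16(uint32_t addr) { return static_cast<uint16_t>(addr >> 16); }
uint16_t lo16(uint32_t addr) { return static_cast<uint16_t>(addr & 0xFFFFu); }

// lui: place the 16-bit immediate in the upper half, clear the lower half.
uint32_t lui(uint16_t imm) { return static_cast<uint32_t>(imm) << 16; }

// ori: OR a zero-extended 16-bit immediate into a register value.
uint32_t ori(uint32_t reg, uint16_t imm) { return reg | imm; }

// lui $2, %hi(sym); ori $2, $2, %lo(sym)
uint32_t materialize(uint32_t symAddr) {
  return ori(lui(hi16(symAddr)), lo16(symAddr));
}
```

Since lui leaves the lower 16 bits of the register zero, the OR in ori gives the same result as an addition, so materialize() returns the original 32-bit address for any input.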

IsGlobalInSmallSection() returns true or false depending on UseSmallSectionOpt; when the option is true, the variable’s size must also fall within SSThreshold.

The following code fragment of lowerGlobalAddress(), which corresponds to the option llc -relocation-model=static -cpu0-use-small-section=false, translates the DAG (GlobalAddress<i32* @gI> 0) into (add Cpu0ISD::Hi<gI offset Hi16> Cpu0ISD::Lo<gI offset Lo16>) in the “Legalized selection DAG” stage, as shown below.

lbdex/chapters/Chapter6_1/Cpu0ISelLowering.h

    // This method creates the following nodes, which are necessary for
    // computing a symbol's address in non-PIC mode:
    //
    // (add %hi(sym), %lo(sym))
    template<class NodeTy>
    SDValue getAddrNonPIC(NodeTy *N, EVT Ty, SelectionDAG &DAG) const {
      SDLoc DL(N);
      SDValue Hi = getTargetNode(N, Ty, DAG, Cpu0II::MO_ABS_HI);
      SDValue Lo = getTargetNode(N, Ty, DAG, Cpu0II::MO_ABS_LO);
      return DAG.getNode(ISD::ADD, DL, Ty,
                         DAG.getNode(Cpu0ISD::Hi, DL, Ty, Hi),
                         DAG.getNode(Cpu0ISD::Lo, DL, Ty, Lo));
    }

lbdex/chapters/Chapter6_1/Cpu0ISelLowering.cpp

SDValue Cpu0TargetLowering::getTargetNode(GlobalAddressSDNode *N, EVT Ty,
                                          SelectionDAG &DAG,
                                          unsigned Flag) const {
  return DAG.getTargetGlobalAddress(N->getGlobal(), SDLoc(N), Ty, 0, Flag);
}

SDValue Cpu0TargetLowering::lowerGlobalAddress(SDValue Op,
                                               SelectionDAG &DAG) const {
  ...
  EVT Ty = Op.getValueType();
  GlobalAddressSDNode *N = cast<GlobalAddressSDNode>(Op);
  ...

  if (!isPositionIndependent()) {
    ...
    // %hi/%lo relocation
    return getAddrNonPIC(N, Ty, DAG);
  }
  ...
}
118-165-78-166:input Jonathan$ clang -target mips-unknown-linux-gnu -c
ch6_1.cpp -emit-llvm -o ch6_1.bc
118-165-78-166:input Jonathan$ ~/llvm/test/cmake_debug_build/Debug/bin/llc
-march=cpu0 -relocation-model=static -cpu0-use-small-section=false
-filetype=asm -debug ch6_1.bc -o -

...
Type-legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 12 nodes:
  ...
      0x7ffd5902cc10: <multiple use>
    0x7ffd5902cf10: ch = store 0x7ffd5902cd10, 0x7ffd5902ca10, 0x7ffd5902ce10,
    0x7ffd5902cc10<ST4[%c]> [ORD=2] [ID=-3]

    0x7ffd5902d010: i32 = GlobalAddress<i32* @gI> 0 [ORD=3] [ID=-3]

    0x7ffd5902cc10: <multiple use>
  0x7ffd5902d110: i32,ch = load 0x7ffd5902cf10, 0x7ffd5902d010,
  0x7ffd5902cc10<LD4[@gI]> [ORD=3] [ID=-3]
  ...

Legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 16 nodes:
  ...
      0x7ffd5902cc10: <multiple use>
    0x7ffd5902cf10: ch = store 0x7ffd5902cd10, 0x7ffd5902ca10, 0x7ffd5902ce10,
    0x7ffd5902cc10<ST4[%c]> [ORD=2] [ID=8]

        0x7ffd5902d310: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=5]

      0x7ffd5902d710: i32 = Cpu0ISD::Hi 0x7ffd5902d310

        0x7ffd5902d610: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=6]

      0x7ffd5902d810: i32 = Cpu0ISD::Lo 0x7ffd5902d610

    0x7ffd5902fe10: i32 = add 0x7ffd5902d710, 0x7ffd5902d810

    0x7ffd5902cc10: <multiple use>
  0x7ffd5902d110: i32,ch = load 0x7ffd5902cf10, 0x7ffd5902fe10,
  0x7ffd5902cc10<LD4[@gI]> [ORD=3] [ID=9]

Finally, the following patterns defined in Cpu0InstrInfo.td translate the DAG (add Cpu0ISD::Hi<gI offset Hi16> Cpu0ISD::Lo<gI offset Lo16>) into the Cpu0 instructions shown below.

lbdex/chapters/Chapter6_1/Cpu0InstrInfo.td

// Hi and Lo nodes are used to handle global addresses. Used on
// Cpu0ISelLowering to lower stuff like GlobalAddress, ExternalSymbol
// static model. (nothing to do with Cpu0 Registers Hi and Lo)
def Cpu0Hi    : SDNode<"Cpu0ISD::Hi", SDTIntUnaryOp>;
def Cpu0Lo    : SDNode<"Cpu0ISD::Lo", SDTIntUnaryOp>;
// hi/lo relocs
let Predicates = [Ch6_1] in {
def : Pat<(Cpu0Hi tglobaladdr:$in), (LUi tglobaladdr:$in)>;
}
let Predicates = [Ch6_1] in {
def : Pat<(Cpu0Lo tglobaladdr:$in), (ORi ZERO, tglobaladdr:$in)>;
}
let Predicates = [Ch6_1] in {
def : Pat<(add CPURegs:$hi, (Cpu0Lo tglobaladdr:$lo)),
          (ORi CPURegs:$hi, tglobaladdr:$lo)>;
}
...
lui $2, %hi(gI)
ori $2, $2, %lo(gI)
...

As shown above, Pat<(...),(...)> takes two DAGs: the left one is the IR DAG to match and the right one is the machine instruction DAG to emit. “Pat<(Cpu0Hi tglobaladdr:$in), (LUi tglobaladdr:$in)>;” translates the DAG (Cpu0ISD::Hi tglobaladdr) into (lui tglobaladdr). “Pat<(add CPURegs:$hi, (Cpu0Lo tglobaladdr:$lo)), (ORi CPURegs:$hi, tglobaladdr:$lo)>;” translates the DAG (add Cpu0ISD::Hi, Cpu0ISD::Lo) into the Cpu0 instruction (ori Cpu0ISD::Hi, Cpu0ISD::Lo).
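The lui/ori pair can be modelled in a few lines of C++ to see why the two halves recombine into the full address. This is only a sketch: the helper names (hi16, lo16, lui, ori, materialize) are hypothetical and not part of the Cpu0 backend, and it assumes that ori zero-extends its 16-bit immediate.

```cpp
#include <cassert>
#include <cstdint>

// Upper 16 bits of a 32-bit address: the value a %hi(sym) relocation supplies.
constexpr std::uint32_t hi16(std::uint32_t addr) { return addr >> 16; }

// Lower 16 bits of a 32-bit address: the value a %lo(sym) relocation supplies.
constexpr std::uint32_t lo16(std::uint32_t addr) { return addr & 0xFFFFu; }

// Model of "lui $r, imm": put imm in the upper half and clear the lower half.
constexpr std::uint32_t lui(std::uint32_t imm) { return (imm & 0xFFFFu) << 16; }

// Model of "ori $r, $s, imm": OR the zero-extended immediate into $s
// (zero-extension is an assumption of this sketch).
constexpr std::uint32_t ori(std::uint32_t reg, std::uint32_t imm) {
  return reg | (imm & 0xFFFFu);
}

// lui $2, %hi(sym)
// ori $2, $2, %lo(sym)
constexpr std::uint32_t materialize(std::uint32_t addr) {
  return ori(lui(hi16(addr)), lo16(addr));
}

// The two halves recombine into the original address.
static_assert(materialize(0x0012ABCDu) == 0x0012ABCDu, "hi/lo round trip");
```

Because lui leaves the lower half zero, the ori never overlaps with bits that are already set, so the OR behaves like the add in the matched DAG.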

sdata or sbss

The .sdata and .sbss sections are 16-bit addressable areas placed in the ELF file for fast access. Option cpu0-use-small-section=true generates the following instructions.

  ori $2, $gp, %gp_rel(gI)
  ld  $2, 0($2)
  ...
  .type       gStart,@object          # @gStart
  .section    .sdata,"aw",@progbits
  .globl      gStart
  .align      2
gStart:
  .4byte      2                       # 0x2
  .size       gStart, 4

  .type       gI,@object              # @gI
  .globl      gI
  .align      2
gI:
  .4byte      100                     # 0x64
  .size       gI, 4

The following code fragment of lowerGlobalAddress(), which corresponds to option llc -relocation-model=static -cpu0-use-small-section=true, translates the DAG (GlobalAddress<i32* @gI> 0) into (add register %GP Cpu0ISD::GPRel<gI offset>) at the “Legalized selection DAG” stage, as shown below.

lbdex/chapters/Chapter6_1/Cpu0ISelLowering.cpp

SDValue Cpu0TargetLowering::lowerGlobalAddress(SDValue Op,
                                               SelectionDAG &DAG) const {
  //@lowerGlobalAddress }
  SDLoc DL(Op);
  const Cpu0TargetObjectFile *TLOF =
        static_cast<const Cpu0TargetObjectFile *>(
            getTargetMachine().getObjFileLowering());
  //@lga 1 {
  EVT Ty = Op.getValueType();
  GlobalAddressSDNode *N = cast<GlobalAddressSDNode>(Op);
  const GlobalValue *GV = N->getGlobal();
  //@lga 1 }

  if (!isPositionIndependent()) {
    //@ %gp_rel relocation
    if (TLOF->IsGlobalInSmallSection(GV, getTargetMachine())) {
      SDValue GA = DAG.getTargetGlobalAddress(GV, DL, MVT::i32, 0,
                                              Cpu0II::MO_GPREL);
      SDValue GPRelNode = DAG.getNode(Cpu0ISD::GPRel, DL,
                                      DAG.getVTList(MVT::i32), GA);
      SDValue GPReg = DAG.getRegister(Cpu0::GP, MVT::i32);
      return DAG.getNode(ISD::ADD, DL, MVT::i32, GPReg, GPRelNode);
    }

    ...
  }
  ...
}
...
Type-legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 12 nodes:
  ...
      0x7fc5f382cc10: <multiple use>
    0x7fc5f382cf10: ch = store 0x7fc5f382cd10, 0x7fc5f382ca10, 0x7fc5f382ce10,
    0x7fc5f382cc10<ST4[%c]> [ORD=2] [ID=-3]

    0x7fc5f382d010: i32 = GlobalAddress<i32* @gI> 0 [ORD=3] [ID=-3]

    0x7fc5f382cc10: <multiple use>
  0x7fc5f382d110: i32,ch = load 0x7fc5f382cf10, 0x7fc5f382d010,
  0x7fc5f382cc10<LD4[@gI]> [ORD=3] [ID=-3]

Legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 15 nodes:
  ...
      0x7fc5f382cc10: <multiple use>
    0x7fc5f382cf10: ch = store 0x7fc5f382cd10, 0x7fc5f382ca10, 0x7fc5f382ce10,
    0x7fc5f382cc10<ST4[%c]> [ORD=2] [ID=8]

      0x7fc5f382d710: i32 = register %GP

        0x7fc5f382d310: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=4]

      0x7fc5f382d610: i32 = Cpu0ISD::GPRel 0x7fc5f382d310

    0x7fc5f382d810: i32 = add 0x7fc5f382d710, 0x7fc5f382d610

    0x7fc5f382cc10: <multiple use>
  0x7fc5f382d110: i32,ch = load 0x7fc5f382cf10, 0x7fc5f382d810,
  0x7fc5f382cc10<LD4[@gI]> [ORD=3] [ID=9]
  ...

Finally, the following pattern defined in Cpu0InstrInfo.td translates the DAG (add register %GP Cpu0ISD::GPRel<gI offset>) into the Cpu0 instruction shown below.

lbdex/chapters/Chapter6_1/Cpu0InstrInfo.td

def Cpu0GPRel : SDNode<"Cpu0ISD::GPRel", SDTIntUnaryOp>;
// gp_rel relocs
let Predicates = [Ch6_1] in {
def : Pat<(add CPURegs:$gp, (Cpu0GPRel tglobaladdr:$in)),
          (ORi CPURegs:$gp, tglobaladdr:$in)>;
}

ori $2, $gp, %gp_rel(gI)
...

“Pat<(add CPURegs:$gp, (Cpu0GPRel tglobaladdr:$in)), (ORi CPURegs:$gp, tglobaladdr:$in)>;” translates (add register %GP Cpu0ISD::GPRel tglobaladdr) into (ori $gp, tglobaladdr).

In this mode, the content of $gp is determined at compile/link time, is set only when the program is loaded, and stays fixed while the program runs. By contrast, with -relocation-model=pic the $gp can change while the program runs. For this example, if the loader sets $gp to the start address of .sdata when the program ch6_1.cpu0.s is loaded, then the linker can calculate %gp_rel(gI) (= the distance between the address of gI and the start of the .sdata section). This means the relocation record can be resolved at link time, which is why this is static mode.
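The link-time calculation of %gp_rel can be sketched as follows. The function names and the addresses used with them are hypothetical, and the sketch assumes that $gp holds the start of .sdata and that the 16-bit offset is unsigned.

```cpp
#include <cassert>
#include <cstdint>

// True if sym lies inside the 64KB window that a 16-bit unsigned offset
// from gp can reach (unsigned offset is an assumption of this sketch).
bool fitsGpRel(std::uint32_t gp, std::uint32_t sym) {
  return sym >= gp && sym - gp <= 0xFFFFu;
}

// Link-time value of %gp_rel(sym): the distance from gp to the symbol.
// Only meaningful when fitsGpRel() holds.
std::uint32_t gpRel(std::uint32_t gp, std::uint32_t sym) {
  assert(fitsGpRel(gp, sym));
  return sym - gp;
}

// Run-time address as described by the DAG node (add $gp, %gp_rel(sym)).
std::uint32_t addressOf(std::uint32_t gp, std::uint32_t offset) {
  return gp + offset;
}
```

For instance, if .sdata started at 0x00020000 with gStart first and gI right after it, gpRel(0x00020000, 0x00020004) would be 4, and adding that offset to $gp at run time gives back the address of gI. A symbol more than 0xFFFF bytes away from $gp cannot be reached this way.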

In this mode, $gp is reserved to hold a specific fixed address once the program is loaded. As a result, $gp cannot be allocated as a general purpose register for variables. The following code tells llvm never to allocate $gp for variables.

lbdex/chapters/Chapter6_1/Cpu0Subtarget.cpp

Cpu0Subtarget::Cpu0Subtarget(const Triple &TT, const std::string &CPU,
                             const std::string &FS, bool little,
                             const Cpu0TargetMachine &_TM) :
  ...
{
  ...
  // Set UseSmallSection.
  UseSmallSection = UseSmallSectionOpt;
  Cpu0ReserveGP = ReserveGPOpt;
  Cpu0NoCpload = NoCploadOpt;
#ifdef ENABLE_GPRESTORE
  if (!TM.isPositionIndependent() && !UseSmallSection && !Cpu0ReserveGP)
    FixGlobalBaseReg = false;
  else
#endif
    FixGlobalBaseReg = true;
}

lbdex/chapters/Chapter6_1/Cpu0RegisterInfo.cpp

BitVector Cpu0RegisterInfo::
getReservedRegs(const MachineFunction &MF) const {
//@getReservedRegs body {
#ifdef ENABLE_GPRESTORE //1
  const Cpu0FunctionInfo *Cpu0FI = MF.getInfo<Cpu0FunctionInfo>();
  // Reserve GP if globalBaseRegFixed()
  if (Cpu0FI->globalBaseRegFixed())
#endif
    Reserved.set(Cpu0::GP);
  ...
}

pic mode

sdata or sbss

Option llc -relocation-model=pic -cpu0-use-small-section=true will generate the following instructions.

  ...
  .set        noreorder
  .cpload     $6
  .set        nomacro
  ...
  ld  $2, %got(gI)($gp)
  ld  $2, 0($2)
  ...
  .type       gStart,@object          # @gStart
  .data
  .globl      gStart
  .align      2
gStart:
  .4byte      2                       # 0x2
  .size       gStart, 4

  .type       gI,@object              # @gI
  .globl      gI
  .align      2
gI:
  .4byte      100                     # 0x64
  .size       gI, 4

The following code fragment of Cpu0AsmPrinter.cpp emits the .cpload asm pseudo instruction at the function entry point, as shown below.

lbdex/chapters/Chapter6_1/Cpu0MachineFunction.h

/// Cpu0FunctionInfo - This class is derived from MachineFunction private
/// Cpu0 target-specific information for each MachineFunction.
class Cpu0FunctionInfo : public MachineFunctionInfo {
public:
  Cpu0FunctionInfo(MachineFunction& MF)
  : MF(MF), 
    GlobalBaseReg(0),
    ...
  bool globalBaseRegFixed() const;
  bool globalBaseRegSet() const;
  unsigned getGlobalBaseReg();
  /// GlobalBaseReg - keeps track of the virtual register initialized for
  /// use as the global base register. This is used for PIC in some PIC
  /// relocation models.
  unsigned GlobalBaseReg;
  int GPFI; // Index of the frame object for restoring $gp
  ...
};

lbdex/chapters/Chapter6_1/Cpu0MachineFunction.cpp

bool Cpu0FunctionInfo::globalBaseRegFixed() const {
  return FixGlobalBaseReg;
}

bool Cpu0FunctionInfo::globalBaseRegSet() const {
  return GlobalBaseReg;
}

unsigned Cpu0FunctionInfo::getGlobalBaseReg() {
  return GlobalBaseReg = Cpu0::GP;
}

lbdex/chapters/Chapter6_1/Cpu0AsmPrinter.cpp

/// EmitFunctionBodyStart - Targets can override this to emit stuff before
/// the first basic block in the function.
void Cpu0AsmPrinter::EmitFunctionBodyStart() {
  ...
  bool EmitCPLoad = (MF->getTarget().getRelocationModel() == Reloc::PIC_) &&
    Cpu0FI->globalBaseRegSet() &&
    Cpu0FI->globalBaseRegFixed();
  if (Cpu0NoCpload)
    EmitCPLoad = false;

  // Text (assembly) output: print the directive itself.
  if (OutStreamer->hasRawTextSupport()) {
    ...
    // Emit .cpload directive if needed.
    if (EmitCPLoad)
      OutStreamer->EmitRawText(StringRef("\t.cpload\t$t9"));
    ...
  } else if (EmitCPLoad) {
    // Object output: emit the instructions that .cpload expands to.
    SmallVector<MCInst, 4> MCInsts;
    MCInstLowering.LowerCPLOAD(MCInsts);
    for (SmallVector<MCInst, 4>::iterator I = MCInsts.begin();
       I != MCInsts.end(); ++I)
      OutStreamer->EmitInstruction(*I, getSubtargetInfo());
  }
  ...
}
...
.set        noreorder
.cpload     $6
.set        nomacro
...

The .cpload is an assembly directive (macro) that expands to several instructions. Issue .cpload before .set nomacro, since the .set nomacro option causes the assembler to print a warning whenever an assembler operation generates more than one machine language instruction; see the Mips ABI [2].

The following code expands .cpload into machine instructions. “0fa00000 0daa0000 13aa6000” in the object dump below is the machine code of the three instructions that .cpload expands to, which are listed in the comments of Cpu0MCInstLower.cpp.

lbdex/chapters/Chapter6_1/Cpu0MCInstLower.h

/// This class is used to lower an MachineInstr into an MCInst.
class LLVM_LIBRARY_VISIBILITY Cpu0MCInstLower {
  void LowerCPLOAD(SmallVector<MCInst, 4>& MCInsts);
private:
  MCOperand LowerSymbolOperand(const MachineOperand &MO,
                               MachineOperandType MOTy, unsigned Offset) const;
  ...
}

lbdex/chapters/Chapter6_1/Cpu0MCInstLower.cpp

// Lower ".cpload $reg" to
//  "lui   $gp, %hi(_gp_disp)"
//  "ori   $gp, $gp, %lo(_gp_disp)"
//  "add   $gp, $gp, $t9"
void Cpu0MCInstLower::LowerCPLOAD(SmallVector<MCInst, 4>& MCInsts) {
  MCOperand GPReg = MCOperand::createReg(Cpu0::GP);
  MCOperand T9Reg = MCOperand::createReg(Cpu0::T9);
  StringRef SymName("_gp_disp");
  const MCSymbol *Sym = Ctx->getOrCreateSymbol(SymName);
  const Cpu0MCExpr *MCSym;

  MCSym = Cpu0MCExpr::create(Sym, Cpu0MCExpr::CEK_ABS_HI, *Ctx);
  MCOperand SymHi = MCOperand::createExpr(MCSym);
  MCSym = Cpu0MCExpr::create(Sym, Cpu0MCExpr::CEK_ABS_LO, *Ctx);
  MCOperand SymLo = MCOperand::createExpr(MCSym);

  MCInsts.resize(3);

  CreateMCInst(MCInsts[0], Cpu0::LUi, GPReg, SymHi);
  CreateMCInst(MCInsts[1], Cpu0::ORi, GPReg, GPReg, SymLo);
  CreateMCInst(MCInsts[2], Cpu0::ADD, GPReg, GPReg, T9Reg);
}
118-165-76-131:input Jonathan$ /Users/Jonathan/llvm/test/
cmake_debug_build/Debug/bin/llc -march=cpu0 -relocation-model=pic -filetype=
obj ch6_1.bc -o ch6_1.cpu0.o
118-165-76-131:input Jonathan$ gobjdump -s ch6_1.cpu0.o

ch6_1.cpu0.o:     file format elf32-big

Contents of section .text:
 0000 0fa00000 0daa0000 13aa6000  ...
...

118-165-76-131:input Jonathan$ gobjdump -tr ch6_1.cpu0.o
...
RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE
00000000 UNKNOWN           _gp_disp
00000008 UNKNOWN           _gp_disp
00000020 UNKNOWN           gI

Note

// Mips ABI: _gp_disp After calculating the gp, a function allocates the local stack space and saves the gp on the stack, so it can be restored after subsequent function calls. In other words, the gp is a caller saved register.

...

_gp_disp represents the offset between the beginning of the function and the global offset table. Various optimizations are possible in this code example and the others that follow. For example, the calculation of gp need not be done for a position-independent function that is strictly local to an object module.

The _gp_disp above is a relocation symbol. It means that both machine instructions 0fa00000 (offset 0) and 0daa0000 (offset 8), which correspond to the assembly “lui $gp, %hi(_gp_disp)” and “ori $gp, $gp, %lo(_gp_disp)” respectively, have relocation records that depend on _gp_disp. The loader or OS can calculate _gp_disp as (x - start address of .data) when it loads the dynamic function at memory address x, and then patches the offsets in these two instructions accordingly. Since a shared function is loaded when it is called, the relocation record for “ld $2, %got(gI)($gp)” cannot be resolved at link time. Although this relocation record is resolved at load time, the name binding is static: the linker passes the memory address to the loader, and the loader can resolve it just by calculating the offset directly. The memory reference is bound to the offset of _gp_disp at link time. ELF relocation records are introduced in Chapter ELF Support, so don’t worry if you don’t quite understand this yet.
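The effect of the three instructions that .cpload expands to can be sketched in plain C++. The function name cpload and the addresses used with it are hypothetical. The sketch assumes, following the Mips convention, that $t9 holds the entry address of the current function and that _gp_disp is the distance from that entry to the address $gp should hold.

```cpp
#include <cassert>
#include <cstdint>

// Model of the three instructions .cpload expands to:
//   lui $gp, %hi(_gp_disp)
//   ori $gp, $gp, %lo(_gp_disp)
//   add $gp, $gp, $t9
// gpDisp is the patched value of _gp_disp; t9 is assumed to hold the
// entry address of the current function.
std::uint32_t cpload(std::uint32_t gpDisp, std::uint32_t t9) {
  std::uint32_t gp = (gpDisp >> 16) << 16;  // lui: upper half, lower half zero
  gp |= gpDisp & 0xFFFFu;                   // ori: fill in the lower half
  gp += t9;                                 // add: make it relative to the function
  return gp;
}
```

With the same _gp_disp of 0x00010010, a function loaded at 0x00400000 gets $gp = 0x00410010, and the same code loaded at 0x70000000 gets $gp = 0x70010010. Only $t9 differs between the two cases, which is what makes the sequence position independent.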

The following code fragment of lowerGlobalAddress(), which corresponds to option llc -relocation-model=pic, translates the DAG (GlobalAddress<i32* @gI> 0) into (load EntryToken, (Cpu0ISD::Wrapper Register %GP, TargetGlobalAddress<i32* @gI> 0)) at the “Legalized selection DAG” stage, as shown below.

lbdex/chapters/Chapter6_1/Cpu0ISelLowering.h

    // This method creates the following nodes, which are necessary for
    // computing a global symbol's address:
    //
    // (load (wrapper $gp, %got(sym)))
    template<class NodeTy>
    SDValue getAddrGlobal(NodeTy *N, EVT Ty, SelectionDAG &DAG,
                          unsigned Flag, SDValue Chain,
                          const MachinePointerInfo &PtrInfo) const {
      SDLoc DL(N);
      SDValue Tgt = DAG.getNode(Cpu0ISD::Wrapper, DL, Ty, getGlobalReg(DAG, Ty),
                                getTargetNode(N, Ty, DAG, Flag));
      return DAG.getLoad(Ty, DL, Chain, Tgt, PtrInfo);
    }

lbdex/chapters/Chapter6_1/Cpu0ISelLowering.cpp

SDValue Cpu0TargetLowering::lowerGlobalAddress(SDValue Op,
                                               SelectionDAG &DAG) const {
  EVT Ty = Op.getValueType();
  GlobalAddressSDNode *N = cast<GlobalAddressSDNode>(Op);
  const GlobalValue *GV = N->getGlobal();
    if (TLOF->IsGlobalInSmallSection(GV, getTargetMachine())) {
      SDValue GA = DAG.getTargetGlobalAddress(GV, DL, MVT::i32, 0,
                                              Cpu0II::MO_GPREL);
      SDValue GPRelNode = DAG.getNode(Cpu0ISD::GPRel, DL,
                                      DAG.getVTList(MVT::i32), GA);
      SDValue GPReg = DAG.getRegister(Cpu0::GP, MVT::i32);
      return DAG.getNode(ISD::ADD, DL, MVT::i32, GPReg, GPRelNode);
    }

  ...
}

lbdex/chapters/Chapter6_1/Cpu0ISelDAGToDAG.cpp

/// ComplexPattern used on Cpu0InstrInfo
/// Used on Cpu0 Load/Store instructions
bool Cpu0DAGToDAGISel::
SelectAddr(SDNode *Parent, SDValue Addr, SDValue &Base, SDValue &Offset) {
  // on PIC code Load GA
  if (Addr.getOpcode() == Cpu0ISD::Wrapper) {
    Base   = Addr.getOperand(0);
    Offset = Addr.getOperand(1);
    return true;
  }

  ...
}
...
Type-legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 12 nodes:
  ...
      0x7fad7102cc10: <multiple use>
    0x7fad7102cf10: ch = store 0x7fad7102cd10, 0x7fad7102ca10, 0x7fad7102ce10,
    0x7fad7102cc10<ST4[%c]> [ORD=2] [ID=-3]

    0x7fad7102d010: i32 = GlobalAddress<i32* @gI> 0 [ORD=3] [ID=-3]

    0x7fad7102cc10: <multiple use>
  0x7fad7102d110: i32,ch = load 0x7fad7102cf10, 0x7fad7102d010,
  0x7fad7102cc10<LD4[@gI]> [ORD=3] [ID=-3]
  ...
Legalized selection DAG: BB#0 '_Z3funv:entry'
SelectionDAG has 15 nodes:
  0x7ff3c9c10b98: ch = EntryToken [ORD=1] [ID=0]
  ...
      0x7fad7102cc10: <multiple use>
    0x7fad7102cf10: ch = store 0x7fad7102cd10, 0x7fad7102ca10, 0x7fad7102ce10,
    0x7fad7102cc10<ST4[%c]> [ORD=2] [ID=8]

      0x7fad70c10b98: <multiple use>
        0x7fad7102d610: i32 = Register %GP

        0x7fad7102d310: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=1]

      0x7fad7102d710: i32 = Cpu0ISD::Wrapper 0x7fad7102d610, 0x7fad7102d310

      0x7fad7102cc10: <multiple use>
    0x7fad7102d810: i32,ch = load 0x7fad70c10b98, 0x7fad7102d710,
    0x7fad7102cc10<LD4[<unknown>]>
    0x7ff3ca02cc10: <multiple use>
  0x7ff3ca02d110: i32,ch = load 0x7ff3ca02cf10, 0x7ff3ca02d810,
  0x7ff3ca02cc10<LD4[@gI]> [ORD=3] [ID=9]
  ...

Finally, the pattern of the Cpu0 instruction ld, defined earlier in Cpu0InstrInfo.td, translates the DAG (load EntryToken, (Cpu0ISD::Wrapper Register %GP, TargetGlobalAddress<i32* @gI> 0)) into the following Cpu0 instruction.

...
ld  $2, %got(gI)($gp)
...

Recall that in pic mode, Cpu0 uses “.cpload” and “ld $2, %got(gI)($gp)” to access a global variable, as Mips does. This takes 4 instructions in both Cpu0 and Mips. The cost comes from not assuming that register $gp always holds the address of .sdata and stays fixed there. Even if we reserve $gp in this function, the $gp register can be changed by other functions. In the last sub-section, $gp is assumed to be preserved across all functions. If $gp is fixed at run time, then “.cpload” can be removed and a global variable access costs only one instruction. The price of removing “.cpload” is losing one general purpose register, $gp, which could otherwise be allocated for variables. In the last sub-section, the .sdata mode, we remove “.cpload” because it is static linking. In pic mode, dynamic loading takes too much time, so removing “.cpload” at the cost of losing one general purpose register in all functions is not worthwhile. The relocation records of “.cpload” from llc -relocation-model=pic can also be resolved at the link stage if we want to link this function statically.
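The two loads of the pic access can be pictured with a toy memory model. This is a sketch only: Memory and loadGlobalViaGot are hypothetical names, and memory is modelled as a map from word address to word value.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy memory: word address -> 32-bit word (hypothetical model).
using Memory = std::map<std::uint32_t, std::uint32_t>;

// Model of:
//   ld $2, %got(gI)($gp)   // first load: address of gI from its GOT entry
//   ld $2, 0($2)           // second load: value of gI
std::uint32_t loadGlobalViaGot(const Memory &mem, std::uint32_t gp,
                               std::uint32_t gotOffset) {
  std::uint32_t addrOfGlobal = mem.at(gp + gotOffset);
  return mem.at(addrOfGlobal);
}
```

If the loader places gI somewhere else, only the GOT entry changes. The two instructions and the %got(gI) offset stay the same, which is why the code itself needs no patching when the data moves.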

data or bss

The following code fragment of lowerGlobalAddress(), which corresponds to option llc -relocation-model=pic, translates the DAG (GlobalAddress<i32* @gI> 0) into (load EntryToken, (Cpu0ISD::Wrapper (add Cpu0ISD::Hi<gI offset Hi16>, Register %GP), TargetGlobalAddress<i32* @gI> 0)) at the “Legalized selection DAG” stage, as shown below.

lbdex/chapters/Chapter6_1/Cpu0ISelLowering.h

    // This method creates the following nodes, which are necessary for
    // computing a global symbol's address in large-GOT mode:
    //
    // (load (wrapper (add %hi(sym), $gp), %lo(sym)))
    template<class NodeTy>
    SDValue getAddrGlobalLargeGOT(NodeTy *N, EVT Ty, SelectionDAG &DAG,
                                  unsigned HiFlag, unsigned LoFlag,
                                  SDValue Chain,
                                  const MachinePointerInfo &PtrInfo) const {
      SDLoc DL(N);
      SDValue Hi = DAG.getNode(Cpu0ISD::Hi, DL, Ty,
                               getTargetNode(N, Ty, DAG, HiFlag));
      Hi = DAG.getNode(ISD::ADD, DL, Ty, Hi, getGlobalReg(DAG, Ty));
      SDValue Wrapper = DAG.getNode(Cpu0ISD::Wrapper, DL, Ty, Hi,
                                    getTargetNode(N, Ty, DAG, LoFlag));
      return DAG.getLoad(Ty, DL, Chain, Wrapper, PtrInfo);
    }

lbdex/chapters/Chapter6_1/Cpu0ISelLowering.cpp

SDValue Cpu0TargetLowering::lowerGlobalAddress(SDValue Op,
                                               SelectionDAG &DAG) const {
  EVT Ty = Op.getValueType();
  GlobalAddressSDNode *N = cast<GlobalAddressSDNode>(Op);
  const GlobalValue *GV = N->getGlobal();
  if (!TLOF->IsGlobalInSmallSection(GV, getTargetMachine()))
    return getAddrGlobalLargeGOT(
        N, Ty, DAG, Cpu0II::MO_GOT_HI16, Cpu0II::MO_GOT_LO16, 
        DAG.getEntryNode(), 
        MachinePointerInfo::getGOT(DAG.getMachineFunction()));
  return getAddrGlobal(
      N, Ty, DAG, Cpu0II::MO_GOT, DAG.getEntryNode(), 
      MachinePointerInfo::getGOT(DAG.getMachineFunction()));
}
...
Type-legalized selection DAG: BB#0 '_Z3funv:'
SelectionDAG has 10 nodes:
  ...
    0x7fb77a02cd10: ch = store 0x7fb779c10a08, 0x7fb77a02ca10, 0x7fb77a02cb10,
    0x7fb77a02cc10<ST4[%c]> [ORD=1] [ID=-3]

    0x7fb77a02ce10: i32 = GlobalAddress<i32* @gI> 0 [ORD=2] [ID=-3]

    0x7fb77a02cc10: <multiple use>
  0x7fb77a02cf10: i32,ch = load 0x7fb77a02cd10, 0x7fb77a02ce10,
  0x7fb77a02cc10<LD4[@gI]> [ORD=2] [ID=-3]
  ...

Legalized selection DAG: BB#0 '_Z3funv:'
SelectionDAG has 16 nodes:
  ...
    0x7fb77a02cd10: ch = store 0x7fb779c10a08, 0x7fb77a02ca10, 0x7fb77a02cb10,
    0x7fb77a02cc10<ST4[%c]> [ORD=1] [ID=6]

      0x7fb779c10a08: <multiple use>
            0x7fb77a02d110: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=19]

          0x7fb77a02d410: i32 = Cpu0ISD::Hi 0x7fb77a02d110

          0x7fb77a02d510: i32 = Register %GP

        0x7fb77a02d610: i32 = add 0x7fb77a02d410, 0x7fb77a02d510

        0x7fb77a02d710: i32 = TargetGlobalAddress<i32* @gI> 0 [TF=20]

      0x7fb77a02d810: i32 = Cpu0ISD::Wrapper 0x7fb77a02d610, 0x7fb77a02d710

      0x7fb77a02cc10: <multiple use>
    0x7fb77a02fe10: i32,ch = load 0x7fb779c10a08, 0x7fb77a02d810,
    0x7fb77a02cc10<LD4[GOT]>

    0x7fb77a02cc10: <multiple use>
  0x7fb77a02cf10: i32,ch = load 0x7fb77a02cd10, 0x7fb77a02fe10,
  0x7fb77a02cc10<LD4[@gI]> [ORD=2] [ID=7]
  ...

Finally, the pattern of the Cpu0 instruction ld, defined earlier in Cpu0InstrInfo.td, translates the DAG (load EntryToken, (Cpu0ISD::Wrapper (add Cpu0ISD::Hi<gI offset Hi16>, Register %GP), TargetGlobalAddress<i32* @gI> 0)) into the following Cpu0 instructions.

...
ori $2, $zero, %got_hi(gI)
shl $2, $2, 16
add $2, $2, $gp
ld  $2, %got_lo(gI)($2)
...
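The address arithmetic of the four instructions above can be sketched as below. The function name gotEntryAddress and the sample values are hypothetical. As a simplification, the sketch treats %got_lo as an unsigned 16-bit value and ignores any sign extension of the load displacement.

```cpp
#include <cassert>
#include <cstdint>

// Address of the GOT entry read by the final ld, following the listing:
//   ori $2, $zero, %got_hi(gI)
//   shl $2, $2, 16
//   add $2, $2, $gp
//   ld  $2, %got_lo(gI)($2)
// gotOffset is the 32-bit offset of gI's entry from $gp. Simplification:
// the low half is treated as unsigned (no sign extension).
std::uint32_t gotEntryAddress(std::uint32_t gp, std::uint32_t gotOffset) {
  std::uint32_t r = gotOffset >> 16;   // ori: %got_hi
  r <<= 16;                            // shl: move it to the upper half
  r += gp;                             // add: make it relative to $gp
  return r + (gotOffset & 0xFFFFu);    // effective address of the ld
}
```

Compared with the single “ld $2, %got(gI)($gp)” of the small-section case, the extra instructions let the GOT offset span 32 bits instead of 16.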

The following code in Cpu0InstrInfo.td is needed for the example input ch8_2_select_global_pic.cpp. Since ch8_2_select_global_pic.cpp uses the llvm IR select instruction, it cannot be run at this point. It will be run later, in Chapter Control flow statements.

lbdex/chapters/Chapter6_1/Cpu0InstrInfo.td

def Cpu0Wrapper    : SDNode<"Cpu0ISD::Wrapper", SDTIntBinOp>;
let Predicates = [Ch6_1] in {
class WrapperPat<SDNode node, Instruction ORiOp, RegisterClass RC>:
      Pat<(Cpu0Wrapper RC:$gp, node:$in),
              (ORiOp RC:$gp, node:$in)>;

def : WrapperPat<tglobaladdr, ORi, GPROut>;
}

lbdex/input/ch8_2_select_global_pic.cpp

volatile int a1 = 1;
volatile int b1 = 2;

int gI1 = 100;
int gJ1 = 50;

int test_select_global_pic()
{
  if (a1 < b1)
    return gI1;
  else
    return gJ1;
}

Global variable print support

The code above handles global address DAG translation. Next, add the following code to Cpu0MCInstLower.cpp and Cpu0ISelLowering.cpp to support printing global variable operands.

lbdex/chapters/Chapter6_1/Cpu0MCInstLower.cpp

MCOperand Cpu0MCInstLower::LowerSymbolOperand(const MachineOperand &MO,
                                              MachineOperandType MOTy,
                                              unsigned Offset) const {
  MCSymbolRefExpr::VariantKind Kind = MCSymbolRefExpr::VK_None;
  Cpu0MCExpr::Cpu0ExprKind TargetKind = Cpu0MCExpr::CEK_None;
  const MCSymbol *Symbol;

  switch(MO.getTargetFlags()) {
  default:                   llvm_unreachable("Invalid target flag!");
  case Cpu0II::MO_NO_FLAG:
    break;

// Cpu0_GPREL is for llc -march=cpu0 -relocation-model=static
//  -cpu0-use-small-section=true (global var in .sdata).
  case Cpu0II::MO_GPREL:
    TargetKind = Cpu0MCExpr::CEK_GPREL;
    break;

  case Cpu0II::MO_GOT:
    TargetKind = Cpu0MCExpr::CEK_GOT;
    break;
// ABS_HI and ABS_LO is for llc -march=cpu0 -relocation-model=static (global 
//  var in .data).
  case Cpu0II::MO_ABS_HI:
    TargetKind = Cpu0MCExpr::CEK_ABS_HI;
    break;
  case Cpu0II::MO_ABS_LO:
    TargetKind = Cpu0MCExpr::CEK_ABS_LO;
    break;
  case Cpu0II::MO_GOT_HI16:
    TargetKind = Cpu0MCExpr::CEK_GOT_HI16;
    break;
  case Cpu0II::MO_GOT_LO16:
    TargetKind = Cpu0MCExpr::CEK_GOT_LO16;
    break;
  }

  switch (MOTy) {
  case MachineOperand::MO_GlobalAddress:
    Symbol = AsmPrinter.getSymbol(MO.getGlobal());
    Offset += MO.getOffset();
    break;

  default:
    llvm_unreachable("<unknown operand type>");
  }

  const MCExpr *Expr = MCSymbolRefExpr::create(Symbol, Kind, *Ctx);

  if (Offset) {
    // Assume offset is never negative.
    assert(Offset > 0);
    Expr = MCBinaryExpr::createAdd(Expr, MCConstantExpr::create(Offset, *Ctx),
                                   *Ctx);
  }

  if (TargetKind != Cpu0MCExpr::CEK_None)
    Expr = Cpu0MCExpr::create(TargetKind, Expr, *Ctx);

  return MCOperand::createExpr(Expr);

}
MCOperand Cpu0MCInstLower::LowerOperand(const MachineOperand& MO,
                                        unsigned offset) const {
  MachineOperandType MOTy = MO.getType();

  switch (MOTy) {
  case MachineOperand::MO_GlobalAddress:
//@1
    return LowerSymbolOperand(MO, MOTy, offset);
  ...
  }
...
}

The Cpu0MCExpr::printImpl() of Cpu0InstPrinter.cpp, added in the last chapter, is also part of the operand printing for global variables.
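The relation between the expression kinds chosen in LowerSymbolOperand() and the operators seen in the assembly listings of this chapter can be sketched as a standalone function. This is not the backend's printImpl(): ExprKind, relocOperator and formatSymbolOperand are hypothetical stand-ins, and the operator strings are taken from the assembly output shown earlier.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the Cpu0MCExpr kinds used in this chapter.
enum class ExprKind { None, GPRel, Got, AbsHi, AbsLo, GotHi16, GotLo16 };

// Relocation operator as it appears in the assembly listings above.
const char *relocOperator(ExprKind kind) {
  switch (kind) {
  case ExprKind::GPRel:   return "%gp_rel";
  case ExprKind::Got:     return "%got";
  case ExprKind::AbsHi:   return "%hi";
  case ExprKind::AbsLo:   return "%lo";
  case ExprKind::GotHi16: return "%got_hi";
  case ExprKind::GotLo16: return "%got_lo";
  case ExprKind::None:    return "";
  }
  return "";
}

// Formats a symbol operand, e.g. "%hi(gI)", or plain "gI" for ExprKind::None.
std::string formatSymbolOperand(ExprKind kind, const std::string &sym) {
  if (kind == ExprKind::None)
    return sym;
  return std::string(relocOperator(kind)) + "(" + sym + ")";
}
```

For example, formatSymbolOperand(ExprKind::GPRel, "gI") gives the “%gp_rel(gI)” operand of the static small-section case.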

The following function prints the DAG node names of this chapter for llc -debug. It was already added in Chapter3_1.

lbdex/chapters/Chapter3_1/Cpu0ISelLowering.cpp

const char *Cpu0TargetLowering::getTargetNodeName(unsigned Opcode) const {
  switch (Opcode) {
  ..
  case Cpu0ISD::GPRel:             return "Cpu0ISD::GPRel";
  ...
  case Cpu0ISD::Wrapper:           return "Cpu0ISD::Wrapper";
  ...
  }
}

OS is the output stream that writes to the assembly file.

Summary

Instruction selection for global variables differs from ordinary IR node translation: it has a static (absolute address) mode and a pic mode. The backend handles this translation by creating DAG nodes in function lowerGlobalAddress(), which is called by LowerOperation(). Function LowerOperation() handles all operations of Custom type. The backend marks global address as a Custom operation with “setOperationAction(ISD::GlobalAddress, MVT::i32, Custom);” in the Cpu0TargetLowering() constructor. Each address mode creates its own DAG nodes at run time. With the Pat<> patterns set in Cpu0InstrInfo.td, llvm can apply its pattern match mechanism in the Instruction Selection stage.
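The four cases of this chapter can be condensed into a small decision function. It is a standalone sketch, not backend code: RelocModel and loweredForm are hypothetical names, and the returned strings only abbreviate the DAG shapes shown in the debug dumps above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the llc -relocation-model option.
enum class RelocModel { Static, PIC };

// Abbreviated DAG shape created for a global address, by relocation model
// and by whether the variable lives in the small section (.sdata/.sbss).
std::string loweredForm(RelocModel model, bool inSmallSection) {
  if (model == RelocModel::Static)
    return inSmallSection ? "(add $gp, (GPRel sym))"
                          : "(add (Hi sym), (Lo sym))";
  return inSmallSection
             ? "(load (Wrapper $gp, %got(sym)))"
             : "(load (Wrapper (add (Hi %got_hi(sym)), $gp), %got_lo(sym)))";
}
```

Each returned shape corresponds to one of the Pat<> or ld patterns discussed in the sections above.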

There are three types of action for setXXXAction(): Promote, Expand and Custom. Except for Custom, the other two usually need no extra coding. See [3] for reference.

As shown in this chapter, global variables can be placed in .sdata/.sbss with option -cpu0-use-small-section=true. It is possible for the small data section (16-bit addressable) to overflow at the link stage. When that happens, the linker reports the error and forces the toolchain user to fix it. As a result, the toolchain user needs to reconsider which global variables should be moved from .sdata/.sbss to .data/.bss by setting option -cpu0-use-small-section=false in the Makefile, as follows.

Makefile

# Set the global variables declared in a.cpp to .data/.bss
llc  -march=cpu0 -relocation-model=static -cpu0-use-small-section=false \
-filetype=obj a.bc -o a.cpu0.o
# Set the global variables declared in b.cpp to .sdata/.sbss
llc  -march=cpu0 -relocation-model=static -cpu0-use-small-section=true \
-filetype=obj b.bc -o b.cpu0.o

The rule for global variable allocation is “put the small and frequently used variables in the small 16-bit addressable area”.

[1]http://llvm.org/docs/CommandLine.html
[2]http://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf
[3]http://llvm.org/docs/WritingAnLLVMBackend.html#the-selectiondag-legalize-phase