Library

The theory of Floating Point Implementation

Fixed Point Representation

Fixed-point representation is used to implement floating-point numbers, as illustrated in Fig. 6. The calculation is described below.

_images/fixed-point.png

Fig. 6 Fixed point representation

Assume Sign part: 1-bit (0 is +, 1 is -), Integer part: 2-bit, Fraction part: 2-bit.

Table 2 Fixed Point Representation.

| value | sign | integer+fraction |
|---|---|---|
| \(3.0\) | \(0\) | \(11\,00\) |
| \(0.5\) | \(0\) | \(00\,10\) |
| \(3.0 \, \mathsf{x} \, 0.5\) | \(0 \oplus 0 = 0\) | \(1100 \, \mathsf x \, 0010 >> 2 = 0110\) |

The fixed-point result \(0\,01\,10\) represents \(1.5\).
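The calculation above can be sketched in C. This is only an illustration of the 1-bit sign, 2-bit integer, 2-bit fraction layout assumed in this section; the names `fixed_t`, `fixed_mul`, and `fixed_to_double` are hypothetical and do not come from any library.

```c
#include <assert.h>
#include <stdint.h>

#define FRACTION_BITS 2     /* two fraction bits, as assumed above */
#define MAGNITUDE_MASK 0xFu /* 2 integer bits + 2 fraction bits */

/* Hypothetical sign-magnitude fixed-point value: value = magnitude / 4. */
typedef struct {
  uint8_t sign;      /* 0 is +, 1 is - */
  uint8_t magnitude; /* integer+fraction bits */
} fixed_t;

/* Multiply: XOR the signs, multiply the magnitudes, then drop the extra
 * fraction bits (the raw product carries 4 fraction bits instead of 2). */
static fixed_t fixed_mul(fixed_t a, fixed_t b) {
  assert(a.magnitude <= MAGNITUDE_MASK && b.magnitude <= MAGNITUDE_MASK);
  fixed_t r;
  r.sign = a.sign ^ b.sign;
  unsigned product = (unsigned)a.magnitude * b.magnitude;
  /* Bits that do not fit the 4-bit magnitude are silently dropped here. */
  r.magnitude = (uint8_t)((product >> FRACTION_BITS) & MAGNITUDE_MASK);
  return r;
}

/* Convert back to a host double for inspection. */
static double fixed_to_double(fixed_t x) {
  double v = x.magnitude / 4.0;
  return x.sign ? -v : v;
}
```

Multiplying the bit patterns 0 11 00 (3.0) and 0 00 10 (binary 0.10, that is 0.5) with `fixed_mul` yields 0 01 10, which `fixed_to_double` reports as 1.5.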

IEEE Floating Point Representation

The layout for half-precision floating-point format is shown in Fig. 7.

_images/floating-point-half.png

Fig. 7 IEEE 754 half precision of Floating point representation [2]

The IEEE 754 floating-point standard also defines NaN (Not a Number), produced by operations such as 0/0, and \(\infty\), as shown in Fig. 8.

Floating-point arithmetic can be implemented in both software and hardware.

The 16-bit product \(a \mathsf x b\) can be computed by first converting both a and b to fixed-point format using more bits of memory or registers. After performing the multiplication as fixed-point arithmetic, the result is converted back, as described on this website [1].

✅ An example of multiplication \(a \, \mathsf{x} \, b\) using exponent base 2 is given below:

  • Precondition: a and b are normalized IEEE half-precision (16-bit) floating-point values [2]. The exponent bias is 0b01111 = 15, so the value 15 in the exponent field means \(2^0\). Some exponent field values are listed below:

\[ \begin{align}\begin{aligned}15 \rightarrow 2^0,\\1 \rightarrow 2^{-14},\\30 \rightarrow 2^{15},\\31 \rightarrow \infty \text{ or } NaN.\end{aligned}\end{align} \]
  • Transformation for \(a \, \mathsf{x} \, b\):

    1. Calculation:

      Table 3 \(a \, \mathsf{x} \, b\)

      | variable | sign | exponent | significand |
      |---|---|---|---|
      | \(a \, \mathsf{x} \, b\) | \(sign(a) \oplus sign(b)\) | \(exponent(a) + exponent(b) - 15\) | \(significand(a)\, \mathsf{x} \,significand(b) >> 10\) |

      • \(exponent(a) + exponent(b) - 15\): both exponent fields carry the bias, so one bias (01111 = 15 for the 5-bit exponent) is subtracted.

      • \(significand(a) \, \mathsf{x} \, significand(b) >> 10\): the significand is 10 bits wide, so the 20-bit product is shifted right by 10.

    2. Normalize the result.

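  • Example (to keep the arithmetic visible, the significand field here holds the fraction \(0.1xxxxxxxxx\) with its leading 1 stored explicitly; real IEEE 754 encodings leave the leading 1 implicit):

    Table 4 \(a \, \mathsf{x} \, b\)

    | variable | binary value | sign | exponent | significand |
    |---|---|---|---|---|
    | \(a\) | \(0.01\) | \(0\) | \(01110\) | \(1000000000\) |
    | \(b\) | \(1.1\) | \(0\) | \(10000\) | \(1100000000\) |
    | \(a \, \mathsf{x} \, b\) | \(0.01 \, \mathsf{x} \, 1.1\) | \(0 \oplus 0 = 0\) | \(01110 + 10000 - 01111 = 01111\) | \(1000000000\, \mathsf{x} \, 1100000000 >> 10 = 0110000000\) |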

After normalization, \(0 01111 0110000000\) becomes:

\[0\, 01110\, 1100000000 = 0.011\]
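The multiplication steps above can be sketched in C. This follows the simplified convention of the example (a 10-bit significand whose top bit is the explicit leading 1) rather than a full IEEE 754 implementation: there is no rounding and no handling of zero, subnormals, \(\infty\), or NaN. The type `half_fields` and the function `half_mul_sketch` are hypothetical names used only for illustration.

```c
#include <assert.h>

#define EXP_BIAS 15
#define SIG_BITS 10
#define SIG_TOP (1u << (SIG_BITS - 1)) /* 1000000000: explicit leading 1 */

/* Hypothetical unpacked form of the fields used in the example tables. */
typedef struct {
  unsigned sign;        /* 0 is +, 1 is - */
  int exponent;         /* biased 5-bit exponent field */
  unsigned significand; /* 10 bits, read as 0.xxxxxxxxxx */
} half_fields;

static half_fields half_mul_sketch(half_fields a, half_fields b) {
  /* Both inputs must already be normalized (top significand bit set). */
  assert((a.significand & SIG_TOP) && (b.significand & SIG_TOP));
  half_fields r;
  r.sign = a.sign ^ b.sign;
  r.exponent = a.exponent + b.exponent - EXP_BIAS;
  r.significand = (a.significand * b.significand) >> SIG_BITS;
  /* Normalize: the product of two values in [0.5, 1) lies in [0.25, 1),
   * so at most one left shift is needed. */
  while (!(r.significand & SIG_TOP)) {
    r.significand <<= 1;
    r.exponent -= 1;
  }
  assert(r.exponent >= 1 && r.exponent <= 30); /* no overflow/underflow */
  return r;
}
```

With a = (0, 01110, 1000000000) and b = (0, 10000, 1100000000), the sketch first produces exponent 01111 and significand 0110000000, then normalizes to (0, 01110, 1100000000), matching the result above.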

✅ Division is handled similarly:

  • Transformation for a / b:

    1. Calculation:

      Table 5 \(a / b\)

      | variable | sign | exponent | significand |
      |---|---|---|---|
      | \(a / b\) | \(sign(a) \oplus sign(b)\) | \(exponent(a) - exponent(b) + 15\) | \((significand(a) << 10) / significand(b)\) |

    2. Normalize the result.

  • Example:

    Here \(b = 10.0\) is encoded with sign 0, exponent 10001 and fraction 1000000000.

    Table 6 \(a / b\)

    | variable | binary value | sign | exponent | significand |
    |---|---|---|---|---|
    | \(a\) | \(0.01\) | \(0\) | \(01110\) | \(1000000000\) |
    | \(b\) | \(10.0\) | \(0\) | \(10001\) | \(1000000000\) |
    | \(a / b\) | \(0.01 / 10.0\) | \(0 \oplus 0 = 0\) | \(01110 - 10001 + 01111 = 01100\) | \((1000000000 << 10) / 1000000000 = 1000000000\,0\) |

After normalization, \(0 01100 1000000000 0\) becomes:

\[0\, 01101\, 1000000000 = 0.001\]
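The division steps can be sketched the same way. As before, this uses the simplified convention of the example (explicit leading 1 in a 10-bit significand), truncates instead of rounding, and ignores zero, subnormals, \(\infty\), and NaN. `half_fields` and `half_div_sketch` are hypothetical names.

```c
#include <assert.h>

#define EXP_BIAS 15
#define SIG_BITS 10
#define SIG_TOP (1u << (SIG_BITS - 1)) /* 1000000000: explicit leading 1 */

/* Hypothetical unpacked form of the fields used in the example tables. */
typedef struct {
  unsigned sign;        /* 0 is +, 1 is - */
  int exponent;         /* biased 5-bit exponent field */
  unsigned significand; /* 10 bits, read as 0.xxxxxxxxxx */
} half_fields;

static half_fields half_div_sketch(half_fields a, half_fields b) {
  /* Both inputs must already be normalized (top significand bit set). */
  assert((a.significand & SIG_TOP) && (b.significand & SIG_TOP));
  half_fields r;
  r.sign = a.sign ^ b.sign;
  r.exponent = a.exponent - b.exponent + EXP_BIAS;
  /* Pre-shift the dividend so the quotient keeps 10 fraction bits. */
  r.significand = (a.significand << SIG_BITS) / b.significand;
  /* Normalize: the quotient of two values in [0.5, 1) lies in (0.5, 2),
   * so an 11th bit may appear and needs at most one right shift. */
  if (r.significand >> SIG_BITS) {
    r.significand >>= 1;
    r.exponent += 1;
  }
  assert(r.exponent >= 1 && r.exponent <= 30); /* no overflow/underflow */
  return r;
}
```

With a = (0, 01110, 1000000000) and b = (0, 10001, 1000000000), the raw quotient is the 11-bit value 10000000000 with exponent 01100; one right shift gives (0, 01101, 1000000000), matching the result above.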

The IEEE-754 floating-point standard also includes special cases such as NaN (Not a Number), which can result from operations like 0/0, and \(\infty\), as illustrated in Fig. 8.

_images/exp-enc.png

Fig. 8 Encoding of exponent for IEEE 754 half precision [2]
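As a small illustration of the exponent encoding, the following C sketch classifies a raw 16-bit half-precision pattern by its exponent and fraction fields. The enum and function names are hypothetical; only the field layout (1 sign bit, 5 exponent bits, 10 fraction bits, bias 15) comes from the format itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification labels for a half-precision bit pattern. */
typedef enum {
  HALF_ZERO,
  HALF_SUBNORMAL,
  HALF_NORMAL,
  HALF_INFINITY,
  HALF_NAN
} half_class;

static half_class classify_half(uint16_t bits) {
  unsigned exponent = (bits >> 10) & 0x1F; /* 5-bit exponent field */
  unsigned fraction = bits & 0x3FF;        /* 10-bit fraction field */
  if (exponent == 0) /* all zeros: zero or subnormal */
    return fraction == 0 ? HALF_ZERO : HALF_SUBNORMAL;
  if (exponent == 31) /* all ones: infinity or NaN */
    return fraction == 0 ? HALF_INFINITY : HALF_NAN;
  return HALF_NORMAL;
}

/* Power of two selected by the exponent field of a normal number. */
static int half_unbiased_exponent(uint16_t bits) {
  unsigned exponent = (bits >> 10) & 0x1F;
  assert(exponent != 0 && exponent != 31); /* normal numbers only */
  return (int)exponent - 15;               /* remove the bias */
}
```

For example, 0x3C00 (1.0) is normal with unbiased exponent 0, 0x7C00 is \(+\infty\), and 0x7E00 is a NaN.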

Since normalization in floating-point arithmetic is a critical operation, the Cpu0 hardware provides clz (count leading zeros) and clo (count leading ones) instructions to speed up the normalization process.

The compiler-rt library implements floating-point multiplication by handling special cases like NaN and \(\infty\) in the same way as described in the implementation above. It also uses clz and clo instructions to accelerate normalization.

~/llvm/debug/compiler-rt/lib/builtins/int_types.h

#if UINT_MAX == 0xFFFFFFFF
#define clzsi __builtin_clz
#define ctzsi __builtin_ctz
#elif ULONG_MAX == 0xFFFFFFFF
#define clzsi __builtin_clzl
#define ctzsi __builtin_ctzl
#else
#error could not determine appropriate clzsi macro for this system
#endif

~/llvm/debug/compiler-rt/lib/builtins/fp_lib.h

#if defined(SINGLE_PRECISION)
...
#define significandBits 23
..
#define implicitBit (REP_C(1) << significandBits)
...
static __inline int rep_clz(rep_t a) { return clzsi(a); }
...
static __inline int rep_clz(rep_t a) { return __builtin_clz(a); }
...
#elif defined(DOUBLE_PRECISION)
...
#define significandBits 52
...
static inline int rep_clz(rep_t a) {
#if defined __LP64__
    return __builtin_clzl(a);
#else
    if (a & REP_C(0xffffffff00000000))
        return __builtin_clz(a >> 32);
    else
        return 32 + __builtin_clz(a & REP_C(0xffffffff));
#endif
...
#elif defined(QUAD_PRECISION)
...
// Note: Since there is no explicit way to tell compiler the constant is a
// 128-bit integer, we let the constant be casted to 128-bit integer
#define significandBits 112
...
#define implicitBit     (REP_C(1) << significandBits)
...
static inline int rep_clz(rep_t a) {
    const union
        {
             __uint128_t ll;
#if _YUGA_BIG_ENDIAN
             struct { uint64_t high, low; } s;
#else
             struct { uint64_t low, high; } s;
#endif
        } uu = { .ll = a };

    uint64_t word;
    uint64_t add;

    if (uu.s.high){
        word = uu.s.high;
        add = 0;
    }
    else{
        word = uu.s.low;
        add = 64;
    }
    return __builtin_clzll(word) + add;
}
...
#endif // __LDBL_MANT_DIG__ == 113
...
static __inline rep_t toRep(fp_t x) {
  const union {
    fp_t f;
    rep_t i;
  } rep = {.f = x};
  return rep.i;
}
...
static __inline int normalize(rep_t *significand) {
  const int shift = rep_clz(*significand) - rep_clz(implicitBit);
  *significand <<= shift;
  return 1 - shift;
}
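To make the role of clz in normalize() concrete, here is a standalone sketch for the single-precision case. It mirrors the logic of the listing above but is not the library code: the names `normalize_sketch`, `SIGNIFICAND_BITS`, and `IMPLICIT_BIT` are local to this example, and it assumes a GCC- or Clang-compatible compiler that provides `__builtin_clz` with a 32-bit `unsigned int`.

```c
#include <assert.h>
#include <stdint.h>

#define SIGNIFICAND_BITS 23
#define IMPLICIT_BIT (UINT32_C(1) << SIGNIFICAND_BITS)

/* Shift a denormal significand left until its leading 1 sits at the
 * implicit-bit position, and return the exponent that compensates. */
static int normalize_sketch(uint32_t *significand) {
  /* __builtin_clz(0) is undefined, and only denormal inputs need this. */
  assert(*significand != 0 && *significand < IMPLICIT_BIT);
  const int shift =
      __builtin_clz(*significand) - __builtin_clz(IMPLICIT_BIT);
  *significand <<= shift;
  return 1 - shift;
}
```

For the smallest denormal significand 0x000001, clz gives 31 while the implicit bit has 8 leading zeros, so the shift is 23: the significand becomes 0x800000 and the function returns -22. A single clz replaces a bit-by-bit loop, which is why a hardware clz instruction speeds up normalization.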
_images/floating-point-single.png

Fig. 9 IEEE 754 single precision of Floating point representation [3]

The following implementation for 32-bit float multiplication of a*b uses the algorithm introduced above and proceeds as follows:

  1. Normalize a and b.

  2. Computation:

Table 7 \(a \, \mathsf{x} \, b\)

| variable | sign | exponent | significand |
|---|---|---|---|
| \(a \, \mathsf{x} \, b\) | \(sign(a) \oplus sign(b)\) | \(exponent(a) + exponent(b) - 127\) | \(significand(a)\, \mathsf{x} \,significand(b) >> 23\) |

  3. Normalize the result.

~/llvm/debug/compiler-rt/lib/builtins/fp_mul_impl.inc

#include "fp_lib.h"
...
static __inline fp_t __mulXf3__(fp_t a, fp_t b) {
  const unsigned int aExponent = toRep(a) >> significandBits & maxExponent;
  const unsigned int bExponent = toRep(b) >> significandBits & maxExponent;
  ...

  // Detect if a or b is zero, denormal, infinity, or NaN.
  if (aExponent - 1U >= maxExponent - 1U ||
      bExponent - 1U >= maxExponent - 1U) {
    ....
    // One or both of a or b is denormal.  The other (if applicable) is a
    // normal number.  Renormalize one or both of a and b, and set scale to
    // include the necessary exponent adjustment.
    if (aAbs < implicitBit)
      scale += normalize(&aSignificand);
    if (bAbs < implicitBit)
      scale += normalize(&bSignificand);
  }
  ...

  // exponentBias is 127 for 32-bit single precision.
  // scale is 0 when both a and b are normal form.
  int productExponent = aExponent + bExponent - exponentBias + scale;

  // Normalize the significand and adjust the exponent if needed.
  ...
    // For 32-bit float, productExponent << 23 since exponent is bit(1..23)
    productHi |= (rep_t)productExponent << significandBits;
  ...
  return fromRep(productHi);
}
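The overall flow can also be illustrated with a much smaller standalone sketch. It is not the compiler-rt code: it handles only normal inputs whose product is also normal, and it truncates the low product bits instead of rounding to nearest even. The function name `soft_mul_normal` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified single-precision multiply for normal numbers only. */
static float soft_mul_normal(float a, float b) {
  uint32_t ra, rb;
  memcpy(&ra, &a, sizeof ra); /* reinterpret the bits, like toRep() */
  memcpy(&rb, &b, sizeof rb);

  uint32_t sign = (ra ^ rb) & 0x80000000u;
  int ea = (int)((ra >> 23) & 0xFF);
  int eb = (int)((rb >> 23) & 0xFF);
  /* Zero, denormal, infinity and NaN are outside this sketch. */
  assert(ea != 0 && ea != 0xFF && eb != 0 && eb != 0xFF);

  /* Restore the implicit leading 1 of each 24-bit significand. */
  uint64_t sa = (ra & 0x7FFFFFu) | 0x800000u;
  uint64_t sb = (rb & 0x7FFFFFu) | 0x800000u;
  uint64_t product = sa * sb; /* lies in [2^46, 2^48) */
  int exponent = ea + eb - 127;

  /* Normalize: if the product reached 2^47, shift right once. */
  if (product & (UINT64_C(1) << 47)) {
    product >>= 1;
    exponent += 1;
  }
  assert(exponent > 0 && exponent < 0xFF); /* result must stay normal */

  /* Move the leading 1 from bit 46 to bit 23, dropping the low bits. */
  uint32_t significand = (uint32_t)(product >> 23);
  uint32_t rr =
      sign | ((uint32_t)exponent << 23) | (significand & 0x7FFFFFu);

  float result;
  memcpy(&result, &rr, sizeof result); /* like fromRep() */
  return result;
}
```

For inputs whose product is exactly representable, such as 1.5f and 2.5f, the sketch returns the same value as the hardware multiplication (3.75f). Results that would need rounding may differ in the last bit because of the truncation.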

Cpu0’s dependence on compiler-rt’s builtins

Since Cpu0 does not have hardware floating-point instructions, it requires a software floating-point library to perform floating-point operations.

The LLVM compiler-rt project provides a software floating-point implementation (Fig. 10), so I chose it for this purpose.

As compiler-rt assumes a Unix/Linux rootfs structure, we bridge the gap by adding a few empty header files in exlbt/include.

// dot -Tpng lib.gv -o lib.png
digraph G {
  rankdir=LR;

  node [shape="",style=filled,fillcolor=lightyellow]; lib [label="lib (libm/soft-float/\nscanf/printf)"];
  node [shape="",style=solid,color=black];
  "User program" -> "clang/llc" [ label = "c/c++" ];
  lib -> "clang/llc" [ label = "c" ];
  "clang/llc" -> lld [ label = "obj" ];
}

Fig. 10 compiler-rt/lib/builtins’ software float library

The dependencies for compiler-rt on libm are shown in Fig. 11.

// dot -Tpng compiler-rt-dep-short.gv -o compiler-rt-dep-short.png
digraph G {
  rankdir=LR;

  compound=true;
  node [shape=record];

  subgraph cluster_compiler_rt {
    label = "compiler-rt";
    utb [label="test/builtins/Unit"];
    subgraph cluster_builtins {
      label = "lib/builtins";
      builtins [label="<fdt> float and double types | <ct> complex type"];
    }
  }

  node [label = "sanitizer_printf(%lld)"]; sanitizer_printf;
  node [label = "Cpu0 backend of llvm"]; cpu0;

  subgraph cluster_libm {
    label = "libm";
    libm [label="<c> common | <ma> math"];
  }

  builtins:ct -> libm:c;
  builtins:ct:se -> libm:ma;
  builtins:fdt -> cpu0;
  utb -> sanitizer_printf;
}

Fig. 11 Dependences for compiler-rt on libm

Table 8 lib dependences

| functions | depend on |
|---|---|
| scanf | newlib/libc |
| printf | sanitizer_printf.cpp of compiler-rt |

Table 9 sanitizer_printf.cpp of compiler-rt dependences

| functions | depend on |
|---|---|
| sanitizer_printf.cpp | builtins of compiler-rt |

C Library (Newlib)

Since the complex type in compiler-rt depends on libm, I port Newlib in this section.

Newlib is a C library designed for bare-metal platforms. It consists of two libraries: libc and libm. The libc library supports I/O, file, and string functions, while libm provides mathematical functions.

The official website for Newlib is available here [4], and the libm library can be found here [5].

Since the next section, compiler-rt/builtins, depends on libm, please run the following bash script to install and build Newlib for Cpu0.

lbt/exlbt/newlib-cpu0.sh

#!/usr/bin/env bash

# change this dir for newlib-cygwin
NEWLIB_PARENT_DIR=$HOME/git

NEWLIB_DIR=$NEWLIB_PARENT_DIR/newlib-cygwin
CURR_DIR=`pwd`
CC=$HOME/llvm/test/build/bin/clang
CFLAGS="-target cpu0el-unknown-linux-gnu -static -fintegrated-as -Wno-error=implicit-function-declaration"
AS="$HOME/llvm/test/build/bin/clang -static -fintegrated-as -c"
AR="$HOME/llvm/test/build/bin/llvm-ar"
RANLIB="$HOME/llvm/test/build/bin/llvm-ranlib"
READELF="$HOME/llvm/test/build/bin/llvm-readelf"

install_newlib() {
  pushd $NEWLIB_PARENT_DIR
  git clone git://sourceware.org/git/newlib-cygwin.git
  cd newlib-cygwin
  git checkout dcb25665be227fb5a05497b7178a3d5df88050ec
  cp $CURR_DIR/newlib.patch .
  git apply newlib.patch
  cp -rf $CURR_DIR/newlib-cygwin/newlib/libc/machine/cpu0 newlib/libc/machine/. 
  cp -rf $CURR_DIR/newlib-cygwin/libgloss/cpu0 libgloss/. 
  popd
}

build_cpu0() {
  rm -rf build-$CPU-$ENDIAN
  mkdir build-$CPU-$ENDIAN
  cd build-$CPU-$ENDIAN
  CFLAGS="-target cpu0$ENDIAN-unknown-linux-gnu -mcpu=$CPU -static -fintegrated-as -Wno-error=implicit-function-declaration"
  CC=$CC CFLAGS=$CFLAGS AS=$AS AR=$AR RANLIB=$RANLIB READELF=$READELF ../newlib/configure --host=cpu0
  make
  cd ..
}

build_newlib() {
  pushd $NEWLIB_DIR
  CPU=cpu032I
  ENDIAN=eb
  build_cpu0;
  CPU=cpu032I
  ENDIAN=el
  build_cpu0;
  CPU=cpu032II
  ENDIAN=eb
  build_cpu0;
  CPU=cpu032II
  ENDIAN=el
  build_cpu0;
  popd
}

install_newlib;
build_newlib;

Note

To add the Cpu0 backend to Newlib, the following changes are needed (see lbt/exlbt/newlib.patch):

  • lbt/exlbt/newlib-cygwin/newlib/libc/machine/cpu0/setjmp.S is added;

  • newlib-cygwin/config.sub, newlib-cygwin/newlib/configure.host, newlib-cygwin/newlib/libc/include/machine/ieeefp.h, newlib-cygwin/newlib/libc/include/sys/unistd.h and newlib-cygwin/newlib/libc/machine/configure are modified for adding cpu0.

lbt/exlbt/newlib.patch

diff --git a/config.sub b/config.sub
index 63c1f1c8b..575e8d9d7 100755
--- a/config.sub
+++ b/config.sub
@@ -1177,6 +1177,7 @@ case $cpu-$vendor in
 			| d10v | d30v | dlx | dsp16xx \
 			| e2k | elxsi | epiphany \
 			| f30[01] | f700 | fido | fr30 | frv | ft32 | fx80 \
+                        | cpu0 \
 			| h8300 | h8500 \
 			| hppa | hppa1.[01] | hppa2.0 | hppa2.0[nw] | hppa64 \
 			| hexagon \
diff --git a/newlib/configure.host b/newlib/configure.host
index ca6b46f03..7bbf46f25 100644
--- a/newlib/configure.host
+++ b/newlib/configure.host
@@ -176,6 +176,10 @@ case "${host_cpu}" in
   fr30)
 	machine_dir=fr30
 	;;
+  cpu0)
+	machine_dir=cpu0
+	newlib_cflags="${newlib_cflags} -DCOMPACT_CTYPE"
+	;;
   frv)
 	machine_dir=frv
         ;;
@@ -751,6 +755,9 @@ newlib_cflags="${newlib_cflags} -DCLOCK_PROVIDED -DMALLOC_PROVIDED -DEXIT_PROVID
   fr30-*-*)
 	syscall_dir=syscalls
 	;;
+  cpu0-*)
+	syscall_dir=syscalls
+	;;
   frv-*-*)
         syscall_dir=syscalls
 	default_newlib_io_long_long="yes"
diff --git a/newlib/libc/include/machine/ieeefp.h b/newlib/libc/include/machine/ieeefp.h
index 3c1f41e03..1e79a6b26 100644
--- a/newlib/libc/include/machine/ieeefp.h
+++ b/newlib/libc/include/machine/ieeefp.h
@@ -249,6 +249,16 @@
 #define __IEEE_BIG_ENDIAN
 #endif
 
+// pre-defined compiler macro (from llc -march=cpu0${ENDIAN} or 
+// clang -target cpu0${ENDIAN}-unknown-linux-gnu 
+// http://beefchunk.com/documentation/lang/c/pre-defined-c/prearch.html 
+#ifdef __CPU0EL__
+#define __IEEE_LITTLE_ENDIAN
+#endif
+#ifdef __CPU0EB__
+#define __IEEE_BIG_ENDIAN
+#endif
+
 #ifdef __MMIX__
 #define __IEEE_BIG_ENDIAN
 #endif
@@ -507,4 +517,3 @@
 
 #endif /* not __IEEE_LITTLE_ENDIAN */
 #endif /* not __IEEE_BIG_ENDIAN */
-
diff --git a/newlib/libc/include/sys/unistd.h b/newlib/libc/include/sys/unistd.h
index 3cc313080..605929173 100644
--- a/newlib/libc/include/sys/unistd.h
+++ b/newlib/libc/include/sys/unistd.h
@@ -50,7 +50,7 @@ int     dup3 (int __fildes, int __fildes2, int flags);
 int	eaccess (const char *__path, int __mode);
 #endif
 #if __XSI_VISIBLE
-void	encrypt (char *__block, int __edflag);
+void	encrypt (char *__libc_block, int __edflag);
 #endif
 #if __BSD_VISIBLE || (__XSI_VISIBLE && __XSI_VISIBLE < 500)
 void	endusershell (void);
diff --git a/newlib/libc/machine/configure b/newlib/libc/machine/configure
index 62064cdfd..5ef5eec08 100755
--- a/newlib/libc/machine/configure
+++ b/newlib/libc/machine/configure
@@ -823,6 +823,7 @@ csky
 d10v
 d30v
 epiphany
+cpu0
 fr30
 frv
 ft32
@@ -12007,6 +12008,8 @@ subdirs="$subdirs a29k"
 	d30v) subdirs="$subdirs d30v"
  ;;
 	epiphany) subdirs="$subdirs epiphany"
+ ;;
+	cpu0) subdirs="$subdirs cpu0"
  ;;
 	fr30) subdirs="$subdirs fr30"
  ;;

lbt/exlbt/newlib-cygwin/newlib/libc/machine/cpu0/setjmp.S

# setjmp/longjmp for cpu0.  The jmpbuf looks like this:
#	
# Register	jmpbuf offset
# $9		0x00
# $10		0x04
# $11		0x08
# $12		0x0c
# $13		0x10
# $14		0x14
# $15		0x18
	
.macro save reg
	st	\reg,@r4
	add	#4,r4
.endm
	
.macro restore reg
	ld	@r4,\reg
	add	#4,r4
.endm


	.text
	.global	setjmp
setjmp:
	st $9,  0($a0)
	st $10, 4($a0)
	st $11, 8($a0)
	st $12, 12($a0)
	st $13, 16($a0)
	st $14, 20($a0)
	st $15, 24($a0)
# Return 0 to caller.
	addiu $lr, $zero, 0x0
	ret $lr

	.global	longjmp
longjmp:
	ld $9,  0($a0)
	ld $10, 4($a0)
	ld $11, 8($a0)
	ld $12, 12($a0)
	ld $13, 16($a0)
	ld $14, 20($a0)
	ld $15, 24($a0)

# If caller attempted to return 0, return 1 instead.
        cmp     $sw, $5,$0
        jne     $sw, $BB1
        addiu   $5,$0,1
$BB1:
        addu    $2,$0,$5
        ret	$lr
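For readers unfamiliar with the C-level contract that setjmp.S has to satisfy, the following host-side sketch shows the standard behaviour: setjmp returns 0 when called directly, and returns the value passed to longjmp afterwards, except that a value of 0 is turned into 1. The function name `setjmp_result_after` is hypothetical, and the example exercises the host C library, not the Cpu0 assembly above.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;

/* Returns what setjmp() reports after longjmp(env, value) is taken. */
static int setjmp_result_after(int value) {
  assert(value >= 0 && value <= 2); /* keep the sketch to three cases */
  switch (setjmp(env)) {
  case 0: /* direct call: jump back into the switch above */
    longjmp(env, value);
  case 1: /* reached for value 1, and for value 0 (0 is turned into 1) */
    return 1;
  case 2:
    return 2;
  default:
    return -1;
  }
}
```

Calling `setjmp_result_after(0)` yields 1, which is the case handled by the "return 1 instead" branch at the end of longjmp in the listing.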
cschen@cschendeiMac exlbt % bash newlib-cpu0.sh

The libm.a library depends on the errno variable from libc, which is defined in sys/errno.h.

  • libgloss is BSP license [6]

Compiler-rt’s builtins

Compiler-rt is a project that implements runtime libraries [7]. The compiler-rt/lib/builtins directory provides functions for basic operations such as +, -, *, and / on float or double types. It also supports conversions between floating-point and integer types, as well as conversions involving types wider than 32 bits, such as long long.
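Besides floating point, the builtins also contain helpers for 64-bit integers on 32-bit targets, such as __muldi3. The sketch below shows, under simplifying assumptions, how a 64-bit multiply can be assembled from 32-bit pieces. It is an illustration of the idea only, not the compiler-rt source; `u32_pair`, `mul32x32`, and `soft_mul64` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of 32-bit words holding a 64-bit value. */
typedef struct {
  uint32_t lo, hi;
} u32_pair;

/* Full 32x32 -> 64-bit product built from 16-bit halves. */
static u32_pair mul32x32(uint32_t a, uint32_t b) {
  uint32_t al = a & 0xFFFF, ah = a >> 16;
  uint32_t bl = b & 0xFFFF, bh = b >> 16;
  uint32_t ll = al * bl, lh = al * bh, hl = ah * bl, hh = ah * bh;
  /* Middle column: carry out of ll plus the low halves of the cross terms. */
  uint32_t mid = (ll >> 16) + (lh & 0xFFFF) + (hl & 0xFFFF);
  u32_pair r;
  r.lo = (ll & 0xFFFF) | (mid << 16);
  r.hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
  return r;
}

/* Low 64 bits of a 64x64 product, using only 32-bit multiplications. */
static uint64_t soft_mul64(uint64_t a, uint64_t b) {
  uint32_t alo = (uint32_t)a, ahi = (uint32_t)(a >> 32);
  uint32_t blo = (uint32_t)b, bhi = (uint32_t)(b >> 32);
  u32_pair p = mul32x32(alo, blo);
  /* Cross terms only affect the high word; ahi * bhi overflows entirely. */
  uint32_t hi = p.hi + alo * bhi + ahi * blo;
  assert(sizeof(uint64_t) == 2 * sizeof(uint32_t));
  return ((uint64_t)hi << 32) | p.lo;
}
```

On a host with native 64-bit arithmetic, `soft_mul64(x, y)` can be compared directly against `x * y`, since both wrap modulo \(2^{64}\).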

The compiler-rt/lib/builtins/README.txt [8] lists the dependent functions used throughout the builtins. These dependent functions are a small subset of libm, which are defined in compiler-rt/lib/builtins/int_math.h [9].

~/git/newlib-cygwin/build-cpu032I-eb/Makefile

MATHDIR = math

# The newlib hardware floating-point routines have been disabled due to
# inaccuracy.  If you wish to work on them, you will need to edit the
# configure.in file to re-enable the configuration option.  By default,
# the NEWLIB_HW_FP variable will always be false.
#MATHDIR = mathfp

As shown in the Makefile above, Newlib uses the libm/math directory.

The dependencies for the builtin functions of compiler-rt on libm are shown in Fig. 12.

// dot -Tpng compiler-rt-dep.gv -o compiler-rt-dep.png
digraph G {
  rankdir=LR;

  compound=true;
  node [shape=record];

  subgraph cluster_compiler_rt {
    label = "compiler-rt";
    utb [label="test/builtins/Unit"];
    subgraph cluster_builtins {
      label = "lib/builtins";
      builtins [label="<fdt> float and double types | <ct> complex type"];
    }
  }

  node [label = "sanitizer_printf(%lld)"]; sanitizer_printf;
  node [label = "Cpu0 backend of llvm"]; cpu0;

  subgraph cluster_libm {
    label = "libm";
    libm [label="<c> common | <ma> math"];
  }

  builtins:ct -> libm:c [label = "the __builtin functions of isinf, isnan, fabsl, \n fmax, fmaxf, fmaxl, log, logf, logl, scalbn, scalbnf, \n scalbnl, copysign, copysignf, copysignl, fabsl" ];
  builtins:ct:se -> libm:ma [label = "the __builtin functions of fabs, fabsf" ];
  builtins:fdt -> cpu0 [label = "__builtin_clz(), __builtin_clo() and abort()" ];
  utb -> sanitizer_printf [label = "sanitizer_printf.cpp and sanitizer_internal_defs.h \n of compiler-rt/test/builtins/Unit" ];
}

Fig. 12 Dependences for builtin functions of compiler-rt on libm

In this section, I copied test cases for verification of software floating point (SW FP) from compiler-rt/test/builtins/Unit to compiler-rt-test/builtins/Unit/.

Since lbt/exlbt/input/printf-stdarg.c does not support %lld (long long integer, 64-bit), and the test cases in compiler-rt/test/builtins/Unit require this format to verify SW FP results, I ported sanitizer_printf.cpp and sanitizer_internal_defs.h to lbt/exlbt/input/ from compiler-rt/lib/sanitizer_common/.

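Table 10 compiler-rt builtins dependences on newlib/libm (open source libc for bare metal)

| function | file | directory of libm |
|---|---|---|
| abort | lbt/exlbt/compiler-rt/cpu0/abort.c | |
| isinf | s_isinf.c | newlib-cygwin/newlib/libm/common |
| isnan | s_isnan.c | newlib-cygwin/newlib/libm/common |
| fabsl | fabsl.c | newlib-cygwin/newlib/libm/common |
| fmax | s_fmax.c | newlib-cygwin/newlib/libm/common |
| fmaxf | sf_fmax.c | newlib-cygwin/newlib/libm/common |
| fmaxl | fmaxl.c | newlib-cygwin/newlib/libm/common |
| log | log.c | newlib-cygwin/newlib/libm/common |
| logf | sf_log.c | newlib-cygwin/newlib/libm/common |
| logl | logl.c | newlib-cygwin/newlib/libm/common |
| scalbn | s_scalbn.c | newlib-cygwin/newlib/libm/common |
| scalbnf | sf_scalbn.c | newlib-cygwin/newlib/libm/common |
| scalbnl | scalblnl.c | newlib-cygwin/newlib/libm/common |
| copysign | s_copysign.c | newlib-cygwin/newlib/libm/common |
| copysignf | sf_copysign.c | newlib-cygwin/newlib/libm/common |
| copysignl | copysignl.c | newlib-cygwin/newlib/libm/common |
| fabs | s_fabs.c | newlib-cygwin/newlib/libm/math |
| fabsf | sf_fabs.c | newlib-cygwin/newlib/libm/math |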

  • Libm has no dependencies on any other library.

  • Only the complex type in compiler-rt/lib/builtins depends on libm. Other types (float and double) only depend on __builtin_clz(), __builtin_clo(), and abort(). I have ported abort() in lbt/exlbt/compiler-rt/cpu0/abort.c.

  • All test cases in compiler-rt/test/builtins/Unit depend on printf(%lld or %llX, …). I ported this functionality from compiler-rt/lib/sanitizer_common/sanitizer_printf.cpp to lbt/exlbt/input/sanitizer_printf.cpp.

  • The dependent functions for complex type have been ported from newlib/libm.

  • Apart from builtins, the other three components of compiler-rt (sanitizer runtimes, profile, and BlocksRuntime) are not needed for my embedded Cpu0.

The libgcc integer and soft float libraries [10] [11] [12] are functionally equivalent to the builtins in compiler-rt.

In compiler-rt/lib/builtins, the file-level dependencies are listed in the following table.

Table 11 dependence between files for compiler-rt/lib/builtins

| files | depend on |
|---|---|
| *.c | *.inc |
| *.inc | *.h |

Though the ‘rt’ stands for Runtime Libraries, most functions in the builtins library are written in target-independent C code. These functions can be compiled and statically linked into the target.

When you compile the following C code, llc will generate a call to __addsf3 to invoke the compiler-rt floating-point function for Cpu0.

This is because Cpu0 does not have hardware floating-point instructions, so the Cpu0 backend provides no instruction for the fadd DAG node. As a result, LLVM lowers fadd into a call to the library function __addsf3, rather than into a direct float-add instruction.

lbt/exlbt/input/ch_call_compilerrt_func.c

// clang -target mips-unknown-linux-gnu -S ch_call_compilerrt_func.c -emit-llvm -o ch_call_compilerrt_func.ll
// ~/llvm/test/build/bin/llc -march=cpu0 -mcpu=cpu032II -relocation-model=static -filetype=asm ch_call_compilerrt_func.ll -o -

/// start
float ch_call_compilerrt_func()
{
  float a = 3.1;
  float b = 2.2;
  float c = a + b;

  return c;
}

chungshu@ChungShudeMacBook-Air input % clang -target mips-unknown-linux-gnu -S
ch_call_compilerrt_func.c -emit-llvm
chungshu@ChungShudeMacBook-Air input % cat ch_call_compilerrt_func.ll
  ...
  %4 = load float, float* %1, align 4
  %5 = load float, float* %2, align 4
  %6 = fadd float %4, %5

chungshu@ChungShudeMacBook-Air input % ~/llvm/test/build/bin/llc -march=cpu0
-mcpu=cpu032II -relocation-model=static -filetype=asm ch_call_compilerrt_func.ll -o -
      ...
      ld      $4, 20($fp)
      ld      $5, 16($fp)
      jsub    __addsf3

For some bare-metal or embedded applications, the C code does not need the file and high-level I/O features provided by libc.

libm provides a wide range of functions to support software floating-point operations beyond basic arithmetic [13].

libc provides file handling, high-level I/O functions, and some basic float operations [14].

Cpu0 uses compiler-rt/lib/builtins and compiler-rt/lib/sanitizer_common/sanitizer_printf.cpp to support software floating-point.

The compiler-rt/lib/builtins is a target-independent C implementation of a software floating-point library. Cpu0 currently implements only compiler-rt-12.x/cpu0/abort.c to support this functionality.

Note

Why are these libm functions called builtins in compiler-rt/lib/builtins?

Though these compiler-rt built-in functions are written in C, CPUs can provide hardware float or high-level instructions to accelerate them. Compilers like Clang can convert float-type operations in C into LLVM IR. Then, the backend compiles them into specific hardware instructions for performance.

To optimize libm functions, many CPUs include hardware floating-point instructions.

For example, the Clang compilation and backend translation go as follows:

  • float a, b, c; a = b * c; → (Clang) → %mul = fmul float %0, %1 [16]

The MIPS backend compiles fmul into a hardware instruction:

  • %mul = fmul float %0, %1 → (LLVM-MIPS) → mul.s [16] [15]

The Cpu0 backend, without hardware float support, compiles fmul into a call to a library function:

  • %mul = fmul float %0, %1 → (LLVM-Cpu0) → jsub __mulsf3 [16]

For high-level math functions, Clang compiles float-type operations in C into LLVM intrinsic functions. Then, LLVM backends for different CPUs compile these intrinsics into hardware instructions when available.

For example, Clang compiles pow() into @llvm.pow.f32 as follows:

  • %pow = call float @llvm.pow.f32(float %x, float %y) [17]

The AMDGPU backend compiles @llvm.pow.f32 into a sequence of instructions:

  • %pow = call float @llvm.pow.f32(float %x, float %y) → (LLVM-AMDGPU) → … + v_exp_f32_e32 v0, v0 + … [17]

The MIPS backend compiles @llvm.pow.f32 into a function call:

  • %pow = call float @llvm.pow.f32(float %x, float %y) → (LLVM-MIPS) → jal powf [17]

Clang treats these libm functions as built-ins and compiles them into LLVM IR or intrinsics. Then, the LLVM backend can either lower them into hardware instructions (if available) or generate function calls to built-in implementations in libm.

The following quote is from Clang’s documentation [18]:

RValue CodeGenFunction::EmitBuiltinExpr(...)
  ...
  // There are LLVM math intrinsics/instructions corresponding to math library
  // functions except the LLVM op will never set errno while the math library
  // might. Also, math builtins have the same semantics as their math library
  // twins. Thus, we can transform math library and builtin calls to their
  // LLVM counterparts if the call is marked 'const' (known to never set errno).

Verification

The following sanitizer_printf.cpp, extended from compiler-rt, supports printf("%lld"). Its implementation calls some floating-point library functions in compiler-rt/lib/builtins.

exlbt/include/math.h

#ifndef _MATH_H_
#define	_MATH_H_

//#ifdef HAS_COMPLEX
 #ifndef HUGE_VALF
  #define HUGE_VALF (1.0e999999999F)
 #endif

 #if !defined(INFINITY)
  #define INFINITY (HUGE_VALF)
 #endif

 #if !defined(NAN)
  #define NAN (0.0F/0.0F)
 #endif

 float cabsf(float complex) ;
//#endif
#endif

exlbt/include/stdio.h

#ifndef _STDIO_H_
#define	_STDIO_H_

#define stdin   0
#define stdout  1
#define stderr  2

#define size_t unsigned int

#endif

exlbt/include/stdlib.h

#ifndef _STDLIB_H_
#define	_STDLIB_H_

#ifdef __cplusplus
extern "C" {
#endif

void abort();

#ifdef __cplusplus
}
#endif

#endif

exlbt/include/string.h

#ifndef _STRING_H_
#define	_STRING_H_


#endif

exlbt/compiler-rt/cpu0/abort.c

void abort() {
  // cpu0.v: ABORT at mem 0x04
  asm("addiu $lr, $ZERO, 4");
  asm("ret $lr"); 
}

exlbt/input/sanitizer_internal_defs.h

//===-- sanitizer_internal_defs.h -------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file is shared between AddressSanitizer and ThreadSanitizer.
// It contains macro used in run-time libraries code.
//===----------------------------------------------------------------------===//
#ifndef SANITIZER_DEFS_H
#define SANITIZER_DEFS_H

// For portability reasons we do not include stddef.h, stdint.h or any other
// system header, but we do need some basic types that are not defined
// in a portable way by the language itself.
namespace __sanitizer {

#if defined(_WIN64)
// 64-bit Windows uses LLP64 data model.
typedef unsigned long long uptr;
typedef signed long long sptr;
#else
typedef unsigned long uptr;
typedef signed long sptr;
#endif  // defined(_WIN64)
#if defined(__x86_64__)
// Since x32 uses ILP32 data model in 64-bit hardware mode, we must use
// 64-bit pointer to unwind stack frame.
typedef unsigned long long uhwptr;
#else
typedef uptr uhwptr;
#endif

typedef unsigned char u8;
typedef unsigned short u16;
typedef unsigned int u32;
typedef unsigned long long u64;
typedef signed char s8;
typedef signed short s16;
typedef signed int s32;
typedef signed long long s64;

// Check macro
#define RAW_CHECK_MSG(expr, msg) 

#define RAW_CHECK(expr) RAW_CHECK_MSG(expr, #expr)

#define CHECK_IMPL(c1, op, c2)

#define CHECK(a)       CHECK_IMPL((a), !=, 0)
#define CHECK_EQ(a, b) CHECK_IMPL((a), ==, (b))
#define CHECK_NE(a, b) CHECK_IMPL((a), !=, (b))
#define CHECK_LT(a, b) CHECK_IMPL((a), <,  (b))
#define CHECK_LE(a, b) CHECK_IMPL((a), <=, (b))
#define CHECK_GT(a, b) CHECK_IMPL((a), >,  (b))
#define CHECK_GE(a, b) CHECK_IMPL((a), >=, (b))

}  // namespace __sanitizer

#endif

exlbt/input/sanitizer_printf.cpp

//===-- sanitizer_printf.cpp ----------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file is shared between AddressSanitizer and ThreadSanitizer.
//
// Internal printf function, used inside run-time libraries.
// We can't use libc printf because we intercept some of the functions used
// inside it.
//===----------------------------------------------------------------------===//

#include "sanitizer_internal_defs.h"

#include <stdio.h>
#include <stdarg.h>

#include "debug.h"

extern "C" int putchar(int c);

extern void* internal_memset(void* b, int c, size_t len);

#if SANITIZER_WINDOWS && defined(_MSC_VER) && _MSC_VER < 1800 &&               \
      !defined(va_copy)
# define va_copy(dst, src) ((dst) = (src))
#endif

namespace __sanitizer {

static int strlen(const char* s) {
  int len = 0;
  for (const char* p = s; *p != '\0'; p++) {
    len++;
  }
  return len;
}

static int AppendChar(char **buff, const char *buff_end, char c) {
  if (*buff < buff_end) {
    **buff = c;
    (*buff)++;
  }
  return 1;
}

// Appends number in a given base to buffer. If its length is less than
// |minimal_num_length|, it is padded with leading zeroes or spaces, depending
// on the value of |pad_with_zero|.
static int AppendNumber(char **buff, const char *buff_end, u64 absolute_value,
                        u8 base, u8 minimal_num_length, bool pad_with_zero,
                        bool negative, bool uppercase, bool left_justified) {
  uptr const kMaxLen = 30;
  RAW_CHECK(base == 10 || base == 16);
  RAW_CHECK(base == 10 || !negative);
  RAW_CHECK(absolute_value || !negative);
  RAW_CHECK(minimal_num_length < kMaxLen);
  int result = 0;
  if (negative && minimal_num_length)
    --minimal_num_length;
  if (negative && pad_with_zero)
    result += AppendChar(buff, buff_end, '-');
  uptr num_buffer[kMaxLen];
  int num_pads = 0;
  int pos = 0;
  do {
    RAW_CHECK_MSG((uptr)pos < kMaxLen, "AppendNumber buffer overflow");
    num_buffer[pos++] = absolute_value % base;
    absolute_value /= base;
  } while (absolute_value > 0);
  if (pos < minimal_num_length) {
    // Make sure compiler doesn't insert call to memset here.
    internal_memset(&num_buffer[pos], 0,
                    sizeof(num_buffer[0]) * (minimal_num_length - pos));
    num_pads = minimal_num_length - pos;
    pos = minimal_num_length;
  }
  RAW_CHECK(pos > 0);
  pos--;
  for (; pos >= 0 && num_buffer[pos] == 0; pos--) {
    char c = (pad_with_zero || pos == 0) ? '0' : ' ';
    if (!left_justified)
      result += AppendChar(buff, buff_end, c);
  }
  if (negative && !pad_with_zero) result += AppendChar(buff, buff_end, '-');
  for (; pos >= 0; pos--) {
    char digit = static_cast<char>(num_buffer[pos]);
    digit = (digit < 10) ? '0' + digit : (uppercase ? 'A' : 'a') + digit - 10;
    result += AppendChar(buff, buff_end, digit);
  }
  if (left_justified) {
    for (int i = 0; i < num_pads; i++)
      result += AppendChar(buff, buff_end, ' ');
  }
  return result;
}

static int AppendUnsigned(char **buff, const char *buff_end, u64 num, u8 base,
                          u8 minimal_num_length, bool pad_with_zero,
                          bool uppercase, bool left_justified) {
  return AppendNumber(buff, buff_end, num, base, minimal_num_length,
                      pad_with_zero, false /* negative */, uppercase, 
                      left_justified);
}

static int AppendSignedDecimal(char **buff, const char *buff_end, s64 num,
                               u8 minimal_num_length, bool pad_with_zero,
                               bool left_justified) {
  bool negative = (num < 0);
  return AppendNumber(buff, buff_end, (u64)(negative ? -num : num), 10,
                      minimal_num_length, pad_with_zero, negative,
                      false /* uppercase */, left_justified);
}


// Use the fact that explicitly requesting 0 width (%0s) results in UB and
// interpret width == 0 as "no width requested":
// width == 0 - no width requested
// width  < 0 - left-justify s within and pad it to -width chars, if necessary
// width  > 0 - right-justify s within width chars, if necessary (added for Cpu0)
static int AppendString(char **buff, const char *buff_end, int width,
                        int max_chars, const char *s, bool left_justified) {
  if (!s)
    s = "<null>";
  int result = 0;
  if (!left_justified) {
    int s_len = strlen(s);
    while (result < width - s_len)
      result += AppendChar(buff, buff_end, ' ');
  }
  for (; *s; s++) {
    if (max_chars >= 0 && result >= max_chars)
      break;
    result += AppendChar(buff, buff_end, *s);
  }
  if (left_justified) {
    while (width < -result)
      result += AppendChar(buff, buff_end, ' ');
  }
  return result;
}

static int AppendPointer(char **buff, const char *buff_end, u64 ptr_value,
                         bool left_justified) {
  int result = 0;
  result += AppendString(buff, buff_end, 0, -1, "0x", left_justified);
  result += AppendUnsigned(buff, buff_end, ptr_value, 16,
// Running clang -E shows that the macro SANITIZER_POINTER_FORMAT_LENGTH expands to (12).
//                           SANITIZER_POINTER_FORMAT_LENGTH,
                           (12),
                           true /* pad_with_zero */, false /* uppercase */,
                           left_justified);
  return result;
}

int VSNPrintf(char *buff, int buff_length,
              const char *format, va_list args) {
  static const char *kPrintfFormatsHelp =
      "Supported Printf formats: %([0-9]*)?(z|ll)?{d,u,x,X}; %p; "
      "%[-]([0-9]*)?(\\.\\*)?s; %c\n";
  RAW_CHECK(format);
  RAW_CHECK(buff_length > 0);
  const char *buff_end = &buff[buff_length - 1];
  const char *cur = format;
  int result = 0;
  for (; *cur; cur++) {
    if (*cur != '%') {
      result += AppendChar(&buff, buff_end, *cur);
      continue;
    }
    cur++;
    bool left_justified = *cur == '-';
    if (left_justified)
      cur++;
    bool have_width = (*cur >= '0' && *cur <= '9');
    bool pad_with_zero = (*cur == '0');
    int width = 0;
    if (have_width) {
      while (*cur >= '0' && *cur <= '9') {
        width = width * 10 + *cur++ - '0';
      }
    }
    bool have_precision = (cur[0] == '.' && cur[1] == '*');
    int precision = -1;
    if (have_precision) {
      cur += 2;
      precision = va_arg(args, int);
    }
    bool have_z = (*cur == 'z');
    cur += have_z;
    bool have_ll = !have_z && (cur[0] == 'l' && cur[1] == 'l');
    cur += have_ll * 2;
    s64 dval;
    u64 uval;
    const bool have_length = have_z || have_ll;
    const bool have_flags = have_width || have_length;
    // At the moment only %s supports precision and left-justification.
    CHECK(!((precision >= 0 || left_justified) && *cur != 's'));
    switch (*cur) {
      case 'd': {
        dval = have_ll ? va_arg(args, s64)
             : have_z ? va_arg(args, sptr)
             : va_arg(args, int);
        result += AppendSignedDecimal(&buff, buff_end, dval, width,
                                      pad_with_zero, left_justified);
        break;
      }
      case 'u':
      case 'x':
      case 'X': {
        uval = have_ll ? va_arg(args, u64)
             : have_z ? va_arg(args, uptr)
             : va_arg(args, unsigned);
        bool uppercase = (*cur == 'X');
        result += AppendUnsigned(&buff, buff_end, uval, (*cur == 'u') ? 10 : 16,
                                 width, pad_with_zero, uppercase, left_justified);
        break;
      }
      case 'p': {
        RAW_CHECK_MSG(!have_flags, kPrintfFormatsHelp);
        result += AppendPointer(&buff, buff_end, va_arg(args, uptr),
                                left_justified);
        break;
      }
      case 's': {
        RAW_CHECK_MSG(!have_length, kPrintfFormatsHelp);
        CHECK(!have_width || left_justified);
        result += AppendString(&buff, buff_end, left_justified ? -width : width,
                               precision, va_arg(args, char*), left_justified);
        break;
      }
      case 'c': {
        RAW_CHECK_MSG(!have_flags, kPrintfFormatsHelp);
        result += AppendChar(&buff, buff_end, va_arg(args, int));
        break;
      }
      case '%' : {
        RAW_CHECK_MSG(!have_flags, kPrintfFormatsHelp);
        result += AppendChar(&buff, buff_end, '%');
        break;
      }
      default: {
        RAW_CHECK_MSG(false, kPrintfFormatsHelp);
      }
    }
  }
  RAW_CHECK(buff <= buff_end);
  AppendChar(&buff, buff_end + 1, '\0');
  return result;
}

} // namespace __sanitizer

int prints(const char *string)
{
  int pc = 0, padchar = ' ';

  for ( ; *string ; ++string) {
    putchar (*string);
    ++pc;
  }

  return pc;
}

extern "C" int sprintf(char *buffer, const char *format, ...) {
  int length = 1000;  // sprintf has no size argument; cap the output at 1000 bytes
  va_list args;
  va_start(args, format);
  int needed_length = __sanitizer::VSNPrintf(buffer, length, format, args);
  va_end(args);
  return needed_length;  // length reported by VSNPrintf instead of a constant 0
}

extern "C" int printf(const char *format, ...) {
  int length = 1000;
  char buffer[1000];
  va_list args;
  va_start(args, format);
  int needed_length = __sanitizer::VSNPrintf(buffer, length, format, args);
  va_end(args);
  prints(buffer);
  return needed_length;
}

extern "C" int san_printf(const char *format, ...) {
  int length = 1000;
  char buffer[1000];
  va_list args;
  va_start(args, format);
  int needed_length = __sanitizer::VSNPrintf(buffer, length, format, args);
  va_end(args);
  prints(buffer);
  return needed_length;
}

The two sanitizer_*.* files above are ported from compiler-rt. I added code to support left-justified output for numbers and right-justified output for strings in printf. The Makefile below builds the builtins library, and the ch_float.cpp that follows it tests the floating-point library.
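
The justification rule can be illustrated on the host with a small sketch. It is not the code from sanitizer_printf.cpp: the helper name `PadString` and its use of `std::string` are assumptions made for illustration. It follows the same convention as the ported `AppendString`, where padding goes before the text when right-justified and after it when left-justified.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of sanitizer_printf.cpp): pad `s` with spaces
// up to `width` characters, after the text when left_justified is true and
// before the text otherwise.
std::string PadString(const std::string &s, std::size_t width,
                      bool left_justified) {
  if (s.size() >= width)
    return s;  // Already wide enough: no padding is added.
  const std::string pad(width - s.size(), ' ');
  return left_justified ? s + pad : pad + s;
}

// Self-check mirroring the "%-10s" and "%10s" cases used in ch_float.cpp.
void CheckPadString() {
  assert(PadString("left", 10, true) == "left      ");
  assert(PadString("right", 10, false) == "     right");
}
```

A string longer than the requested width is returned unchanged, which matches the usual printf behaviour of never truncating because of the width alone.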

lbt/exlbt/compiler-rt-12.x/builtins/Makefile

# Thanks to Job Vranish (https://spin.atomicobject.com/2016/08/26/makefile-c-projects/)

# CPU and endian passed from command line, such as "make CPU=cpu032II ENDIAN=el"

TARGET_LIB := libbuiltins.a
BUILD_DIR := ./build-$(CPU)-$(ENDIAN)
TARGET := $(BUILD_DIR)/$(TARGET_LIB)

SRC_DIR := $(HOME)/llvm/llvm-project/compiler-rt/lib/builtins

PWD := $(shell pwd)

TOOLDIR := ~/llvm/test/build/bin
CC := $(TOOLDIR)/clang
AR := $(TOOLDIR)/llvm-ar

# Copy GENERIC_SOURCES from compiler-rt/lib/builtins/CMakeLists.txt
GENERIC_SOURCES := \
  absvdi2.c \
  absvsi2.c \
  absvti2.c \
  adddf3.c \
  addsf3.c \
  addvdi3.c \
  addvsi3.c \
  addvti3.c \
  apple_versioning.c \
  ashldi3.c \
  ashlti3.c \
  ashrdi3.c \
  ashrti3.c \
  bswapdi2.c \
  bswapsi2.c \
  clzdi2.c \
  clzsi2.c \
  clzti2.c \
  cmpdi2.c \
  cmpti2.c \
  comparedf2.c \
  comparesf2.c \
  ctzdi2.c \
  ctzsi2.c \
  ctzti2.c \
  divdc3.c \
  divdf3.c \
  divdi3.c \
  divtc3.c \
  divmoddi4.c \
  divmodsi4.c \
  divmodti4.c \
  divsc3.c \
  divsf3.c \
  divsi3.c \
  divti3.c \
  divxc3.c \
  extendsfdf2.c \
  extendhfsf2.c \
  ffsdi2.c \
  ffssi2.c \
  ffsti2.c \
  fixdfdi.c \
  fixdfsi.c \
  fixdfti.c \
  fixsfdi.c \
  fixsfsi.c \
  fixsfti.c \
  fixunsdfdi.c \
  fixunsdfsi.c \
  fixunsdfti.c \
  fixunssfdi.c \
  fixunssfsi.c \
  fixunssfti.c \
  floatdidf.c \
  floatdisf.c \
  floatsidf.c \
  floatsisf.c \
  floattidf.c \
  floattisf.c \
  floatundidf.c \
  floatundisf.c \
  floatunsidf.c \
  floatunsisf.c \
  floatuntidf.c \
  floatuntisf.c \
  fp_mode.c \
  int_util.c \
  lshrdi3.c \
  lshrti3.c \
  moddi3.c \
  modsi3.c \
  modti3.c \
  muldc3.c \
  muldf3.c \
  muldi3.c \
  mulodi4.c \
  mulosi4.c \
  muloti4.c \
  mulsc3.c \
  mulsf3.c \
  multi3.c \
  mulvdi3.c \
  mulvsi3.c \
  mulvti3.c \
  mulxc3.c \
  negdf2.c \
  negdi2.c \
  negsf2.c \
  negti2.c \
  negvdi2.c \
  negvsi2.c \
  negvti2.c \
  os_version_check.c \
  paritydi2.c \
  paritysi2.c \
  parityti2.c \
  popcountdi2.c \
  popcountsi2.c \
  popcountti2.c \
  powidf2.c \
  powisf2.c \
  subdf3.c \
  subsf3.c \
  subvdi3.c \
  subvsi3.c \
  subvti3.c \
  trampoline_setup.c \
  truncdfhf2.c \
  truncdfsf2.c \
  truncsfhf2.c \
  ucmpdi2.c \
  ucmpti2.c \
  udivdi3.c \
  udivmoddi4.c \
  udivmodsi4.c \
  udivmodti4.c \
  udivsi3.c \
  udivti3.c \
  umoddi3.c \
  umodsi3.c \
  umodti3.c

SRCS := $(GENERIC_SOURCES)

# String substitution for every C file.
# As an example, absvdi2.c turns into $(SRC_DIR)/absvdi2.c
SRCS := $(SRCS:%=$(SRC_DIR)/%) $(PWD)/../cpu0/abort.c

# String substitution for every C/C++ file.
# As an example, $(SRC_DIR)/absvdi2.c turns into
# $(BUILD_DIR)/$(SRC_DIR)/absvdi2.c.o
OBJS := $(SRCS:%=$(BUILD_DIR)/%.o)

# String substitution (suffix version without %).
# As an example, absvdi2.c.o turns into absvdi2.c.d in the same directory
DEPS := $(OBJS:.o=.d)

# Every folder under $(SRC_DIR) needs to be passed to clang so that it can find
# header files. stdlib.h and the other C library headers are in ../../include
INC_DIRS := $(shell find $(SRC_DIR) -type d)  ../../include
# Add a prefix to INC_DIRS. So moduleA would become -ImoduleA. clang understands this -I flag
INC_FLAGS := $(addprefix -I,$(INC_DIRS))

# The -MMD and -MP flags together generate Makefiles for us!
# These files will have .d instead of .o as the output.
CPPFLAGS := -MMD -MP -target cpu0$(ENDIAN)-unknown-linux-gnu -static \
  -fintegrated-as $(INC_FLAGS) -mcpu=$(CPU) -mllvm -has-lld=true

# The final build step.
$(TARGET): $(OBJS)
	$(AR) -rcs $@ $(OBJS)

# Build step for C source
$(BUILD_DIR)/%.c.o: %.c
	mkdir -p $(dir $@)
	$(CC) $(CPPFLAGS) $(CFLAGS) -c $< -o $@


.PHONY: clean
clean:
	rm -rf $(BUILD_DIR)

# Include the .d makefiles. The - at the front suppresses the errors of missing
# Makefiles. Initially, all the .d files will be missing, and we don't want those
# errors to show up.
-include $(DEPS)
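
Several of the sources listed in GENERIC_SOURCES implement 64-bit integer operations for 32-bit targets; ashldi3.c, for instance, provides __ashldi3 for 64-bit left shifts. The following host-side sketch shows one way such a shift can be composed from two 32-bit halves. It is an illustration and not the compiler-rt source; the names `Pair32`, `ShiftLeft64`, and `Join` are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical representation of a 64-bit value as two 32-bit words.
struct Pair32 {
  std::uint32_t high;
  std::uint32_t low;
};

// Sketch of a 64-bit left shift built from 32-bit operations.
// Assumes 0 <= b < 64, since larger shift counts are undefined in C and C++.
Pair32 ShiftLeft64(Pair32 a, unsigned b) {
  Pair32 r;
  if (b >= 32) {
    // The whole low word moves into the high word.
    r.low = 0;
    r.high = a.low << (b - 32);
  } else if (b == 0) {
    r = a;  // Avoids the undefined 32-bit shift by 32 in the branch below.
  } else {
    r.low = a.low << b;
    // Bits shifted out of the low word carry into the high word.
    r.high = (a.high << b) | (a.low >> (32 - b));
  }
  return r;
}

// Join the two halves so the result can be compared with a native shift.
std::uint64_t Join(Pair32 p) {
  return (static_cast<std::uint64_t>(p.high) << 32) | p.low;
}

void CheckShiftLeft64() {
  // Same operands as the __ashldi3 case in ch_float.cpp: 0x12 << 4 = 0x120.
  assert(Join(ShiftLeft64({0, 0x12}, 4)) == 0x120);
  assert(Join(ShiftLeft64({0, 0x80000001u}, 33)) == (0x80000001ull << 33));
}
```

The split into a "shift count at least one word" case and a "shift count below one word" case is the usual structure for double-word shifts on a machine whose registers hold a single word.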

exlbt/input/ch_float.cpp

//#include "debug.h"

extern "C" int printf(const char *format, ...);
extern "C" int sprintf(char *out, const char *format, ...);

#include "ch9_3_longlongshift.cpp"

void test_printf()
{
  char buf[80];
  long long a = 0x100000007fffffff;
  printf("a: %llX, %llx, %lld\n", a, a, a);
  int b = 0x10000000;
  printf("b: %x, %d\n", b, b);
  sprintf(buf, "b: %x, %d\n", b, b); printf("%s", buf);

  // The original sanitizer_printf.cpp supports right-justify for numbers only
  // and left-justify for strings only. The Cpu0 port adds left-justify for
  // numbers and right-justify for strings.
  char ptr[] = "Hello world!";
  char *np = 0;
  int i = 5;
  unsigned int bs = sizeof(int)*8;
  int mi;

  mi = (1 << (bs-1)) + 1;
  printf("%s\n", ptr);
  printf("printf test\n");
  printf("%s is null pointer\n", np);
  printf("%d = 5\n", i);
  printf("%d = - max int\n", mi);
  printf("char %c = 'a'\n", 'a');
  printf("hex %x = ff\n", 0xff);
  printf("hex %02x = 00\n", 0);
  printf("signed %d = unsigned %u = hex %x\n", -3, -3, -3);
  printf("%d %s(s)", 0, "message");
  printf("\n");
  printf("%d %s(s) with %%\n", 0, "message");
  sprintf(buf, "justif: \"%-10s\"\n", "left"); printf("%s", buf);
  sprintf(buf, "justif: \"%10s\"\n", "right"); printf("%s", buf);
  sprintf(buf, " 3: %04d zero padded\n", 3); printf("%s", buf);
  sprintf(buf, " 3: %-4d left justif.\n", 3); printf("%s", buf);
  sprintf(buf, " 3: %4d right justif.\n", 3); printf("%s", buf);
  sprintf(buf, "-3: %04d zero padded\n", -3); printf("%s", buf);
  sprintf(buf, "-3: %-4d left justif.\n", -3); printf("%s", buf);
  sprintf(buf, "-3: %4d right justif.\n", -3); printf("%s", buf);
}

template <class T>
T test_shift_left(T a, T b) {
  return (a << b);
}

template <class T>
T test_shift_right(T a, T b) {
  return (a >> b);
}

template <class T1, class T2, class T3>
T1 test_add(T2 a, T3 b) {
  T1 c = a + b;
  return c;
}

template <class T1, class T2, class T3>
T1 test_mul(T2 a, T3 b) {
  T1 c = a * b;
  return c;
}

template <class T1, class T2, class T3>
T1 test_div(T2 a, T3 b) {
  T1 c = a / b;
  return c;
}

bool check_result(const char* fn, long long res, long long expected) {
  printf("%s = %lld\n", fn, res);
  if (res != expected) {
    printf("\terror: result %lld, expected %lld\n", res, expected);
  }
  return (res == expected);
}

bool check_result(const char* fn, unsigned long long res, unsigned long long expected) {
  printf("%s = %llu\n", fn, res);
  if (res != expected) {
    printf("\terror: result %llu, expected %llu\n", res, expected);
  }
  return (res == expected);
}

bool check_result(const char* fn, int res, int expected) {
  printf("%s = %d\n", fn, res);
  if (res != expected) {
    printf("\terror: result %d, expected %d\n", res, expected);
  }
  return (res == expected);
}

int main() {
  long long a;
  unsigned long long b;
  int c;

  test_printf();

  a = test_longlong_shift1();
  check_result("test_longlong_shift1()", a, 289LL);

  a = test_longlong_shift2();
  check_result("test_longlong_shift2()", a, 22LL);

// call __ashldi3
  a = test_shift_left<long long>(0x12LL, 4LL); // 0x120 = 288
  check_result("test_shift_left<long long>(0x12LL, 4LL)", a, 288LL);
  
// call __ashrdi3
  a = test_shift_right<long long>(0x001666660000000a, 48LL); // 0x16 = 22
  check_result("test_shift_right<long long>(0x001666660000000a, 48LL)", a, 22LL);
  
// call __lshrdi3
  b = test_shift_right<unsigned long long>(0x001666660000000a, 48LLu); // 0x16 = 22
  check_result("test_shift_right<unsigned long long>(0x001666660000000a, 48LLu)", b, 22LLu);
  
// call __addsf3, __fixsfsi
  c = (int)test_add<float, float, float>(-2.2, 3.3); // (int)1.1 = 1
  check_result("(int)test_add<float, float, float>(-2.2, 3.3)", c, 1);
  
// call __mulsf3, __fixsfsi
  c = (int)test_mul<float, float, float>(-2.2, 3.3); // (int)-7.26 = -7
  check_result("(int)test_mul<float, float, float>(-2.2, 3.3)", c, -7);
  
// call __divsf3, __fixsfsi
  c = (int)test_div<float, float, float>(-1.8, 0.5); // (int)-3.6 = -3
  check_result("(int)test_div<float, float, float>(-1.8, 0.5)", c, -3);
  
// call __extendsfdf2, __adddf3, __fixdfsi
  c = (int)test_add<double, double, float>(-2.2, 3.3); // (int)1.1 = 1
  check_result("(int)test_add<double, double, float>(-2.2, 3.3)", c, 1);
  
// call __extendsfdf2, __adddf3, __fixdfsi
  c = (int)test_add<double, float, double>(-2.2, 3.3); // (int)1.1 = 1
  check_result("(int)test_add<double, float, double>(-2.2, 3.3)", c, 1);
  
// call __extendsfdf2, __adddf3, __fixdfsi
  c = (int)test_add<float, float, double>(-2.2, 3.3); // (int)1.1 = 1
  check_result("(int)test_add<float, float, double>(-2.2, 3.3)", c, 1);
  
// call __extendsfdf2, __muldf3, __fixdfsi
  c = (int)test_mul<double, float, double>(-2.2, 3.3); // (int)-7.26 = -7
  check_result("(int)test_mul<double, float, double>(-2.2, 3.3)", c, -7);
  
// call __extendsfdf2, __muldf3, __truncdfsf2, __fixdfsi
// Note: __truncdfsf2 in truncdfsf2.c does not work for Cpu0
  c = (int)test_mul<float, float, double>(-2.2, 3.3); // (int)-7.26 = -7
  check_result("(int)test_mul<float, float, double>(-2.2, 3.3)", c, -7);
  
// call __divdf3, __fixdfsi
  c = (int)test_div<double, double, double>(-1.8, 0.5); // (int)-3.6 = -3
  check_result("(int)test_div<double, double, double>(-1.8, 0.5)", c, -3);

#if 0 // these three do call builtins  
  c = (int)test_mul<int, int, int>(-2, 3); // -6
  check_result("(int)test_mul<int, int, int>(-2, 3)", c, -6);
  
  c = (int)test_div<int, int, int>(-10, 4); // -2: quotient -2, remainder -2 (division truncates toward zero)
  check_result("(int)test_div<int, int, int>(-10, 4)", c, -2);
  
  a = test_mul<long long, long long, long long>(-2LL, 3LL); // -6LL
  check_result("test_mul<long long, long long, long long>(-2LL, 3LL)", a, -6LL);
#endif

// call __divdi3
  a = test_div<long long, long long, long long>(-10LL, 4LL); // -2
  check_result("test_div<long long, long long, long long>(-10LL, 4LL)", a, -2LL);
  
  return 0;
}
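
The expected values in the checks above rely on C++ rounding toward zero, both for floating-point to int conversion and for signed integer division. According to the comments in ch_float.cpp, on Cpu0 these operations become calls such as __fixsfsi, __fixdfsi, and __divdi3. A short host-side sketch, independent of Cpu0 and using the hypothetical helper names `ToInt` and `Div`, makes the rule explicit:

```cpp
#include <cassert>

// Hypothetical helpers: plain C++ conversions with no Cpu0-specific code.
int ToInt(double x) { return static_cast<int>(x); }        // discards the fraction
long long Div(long long a, long long b) { return a / b; }  // truncates toward zero

void CheckTruncation() {
  assert(ToInt(-3.6) == -3);  // not -4: rounding is toward zero
  assert(ToInt(-7.26) == -7);
  assert(ToInt(1.1) == 1);
  assert(Div(-10, 4) == -2);  // quotient -2, remainder -2
  assert(-10 % 4 == -2);
}
```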

exlbt/input/Makefile.float


SRCS := start.cpp debug.cpp sanitizer_printf.cpp printf-stdarg-def.c \
        cpu0-builtins.cpp ch_float.cpp lib_cpu0.c
LIBBUILTINS_DIR := ../compiler-rt/builtins
INC_DIRS := ../ $(NEWLIB_DIR)/newlib/libc/include $(LBDEX_DIR)/input
LIBS := $(LIBBUILTINS_DIR)/build-$(CPU)-$(ENDIAN)/libbuiltins.a

include Common.mk

chungshu@ChungShudeMacBook-Air input % bash make.sh cpu032II eb Makefile.float
...
endian =  BigEndian
ISR address:00020614
0   /* 0: big endian, 1: little endian */

chungshu@ChungShudeMacBook-Air verilog % iverilog -o cpu0IIs cpu0IIs.v
chungshu@ChungShudeMacBook-Air verilog % ./cpu0IIs
...

a: 100000007FFFFFFF, 100000007fffffff, 1152921506754330623
b: 10000000, 268435456
b: 10000000, 268435456
Hello world!
printf test
<null> is null pointer
5 = 5
-2147483647 = - max int
char a = 'a'
hex ff = ff
hex 00 = 00
signed -3 = unsigned 4294967293 = hex fffffffd
0 message(s)
0 message(s) with %
justif: "left      "
justif: "     right"
 3: 0003 zero padded
 3: 3    left justif.
 3:    3 right justif.
-3: -003 zero padded
-3: -3   left justif.
-3:   -3 right justif.
test_longlong_shift1() = 289
test_longlong_shift2() = 22
test_shift_left<long long>(0x12, 4LL) = 288
test_shift_right<long long>(0x001666660000000a, 48LL) = 22
test_shift_right<unsigned long long>(0x001666660000000a, 48LLu) = 22
(int)test_add<float, float, float>(-2.2, 3.3) = 1
(int)test_mul<float, float, float>(-2.2, 3.3) = -7
(int)test_div<float, float, float>(-1.8, 0.5) = -3
(int)test_add<double, double, float>(-2.2, 3.3) = 1
(int)test_add<double, float, double>(-2.2, 3.3) = 1
(int)test_add<float, float, double>(-2.2, 3.3) = 1
(int)test_mul<double, float, double>(-2.2, 3.3) = -7
(int)test_mul<float, float, double>(-2.2, 3.3) = -7
(int)test_div<double, double, double>(-1.8, 0.5) = -3
test_div<long long, long long, long long>(-10LL, 4LL) = -2
...
RET to PC < 0, finished!
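
The number justification lines in this log agree with what a hosted C library produces for the same format strings. The sketch below checks this on the host with `std::snprintf` from `<cstdio>`. The helper name `Fmt` is an assumption, and the code does not use the Cpu0 printf.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: format one int with the host C library.
std::string Fmt(const char *format, int value) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), format, value);
  return buf;
}

// The same six conversions that ch_float.cpp prints for 3 and -3.
void CheckHostFormatting() {
  assert(Fmt("%04d", 3) == "0003");
  assert(Fmt("%-4d", 3) == "3   ");
  assert(Fmt("%4d", 3) == "   3");
  assert(Fmt("%04d", -3) == "-003");
  assert(Fmt("%-4d", -3) == "-3  ");
  assert(Fmt("%4d", -3) == "  -3");
}
```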

The exlbt/input/compiler-rt-test/builtins/Unit directory is copied from compiler-rt/test/builtins/Unit. The following ch_builtins.cpp calls the test functions from that directory and reports each result.

exlbt/input/ch_builtins.cpp

#include "debug.h"
#include <stdlib.h>

extern "C" int printf(const char *format, ...);
extern "C" int sprintf(char *out, const char *format, ...);

extern "C" int absvdi2_test();
extern "C" int absvsi2_test();
extern "C" int absvti2_test();
extern "C" int adddf3vfp_test();
extern "C" int addsf3vfp_test();
extern "C" int addvdi3_test();
extern "C" int addvsi3_test();
extern "C" int addvti3_test();
extern "C" int ashldi3_test();
extern "C" int ashlti3_test();
extern "C" int ashrdi3_test();
extern "C" int ashrti3_test();

// atomic.c needs memcmp(...)
//extern "C" int atomic_test();
extern "C" int bswapdi2_test();
extern "C" int bswapsi2_test();

extern "C" int clzdi2_test();
extern "C" int clzsi2_test();
extern "C" int clzti2_test();
extern "C" int cmpdi2_test();
extern "C" int cmpti2_test();
extern "C" int comparedf2_test();
extern "C" int comparesf2_test();

// No need to compare compiler_rt_logb() with logb() of libm
//extern "C" int compiler_rt_logb_test();
//extern "C" int compiler_rt_logbf_test();
//extern "C" int compiler_rt_logbl_test();

extern "C" int cpu_model_test();
extern "C" int ctzdi2_test();
extern "C" int ctzsi2_test();
extern "C" int ctzti2_test();

// Division for complex types needs libm (fabs, isinf, ...), so skip it for now
#ifdef HAS_COMPLEX
extern "C" int divdc3_test();
#endif
extern "C" int divdf3_test();
extern "C" int divdf3vfp_test();
extern "C" int divdi3_test();
extern "C" int divmodsi4_test();
extern "C" int divmodti4_test();
#ifdef HAS_COMPLEX
extern "C" int divsc3_test();
#endif
extern "C" int divsf3_test();
extern "C" int divsf3vfp_test();
extern "C" int divsi3_test();
#ifdef HAS_COMPLEX
extern "C" int divtc3_test();
#endif
extern "C" int divtf3_test();
extern "C" int divti3_test();
#ifdef HAS_COMPLEX
extern "C" int divxc3_test();
#endif
extern "C" int enable_execute_stack_test();
extern "C" int eqdf2vfp_test();
extern "C" int eqsf2vfp_test();
extern "C" int eqtf2_test();
extern "C" int extenddftf2_test();
extern "C" int extendhfsf2_test();
extern "C" int extendhftf2_test();
extern "C" int extendsfdf2vfp_test();
extern "C" int extendsftf2_test();
#if 0
extern "C" int gcc_personality_test();
#endif
extern "C" int gedf2vfp_test();
extern "C" int gesf2vfp_test();
extern "C" int getf2_test();
extern "C" int gtdf2vfp_test();
extern "C" int gtsf2vfp_test();
extern "C" int gttf2_test();
extern "C" int ledf2vfp_test();
extern "C" int lesf2vfp_test();
extern "C" int letf2_test();
extern "C" int lshrdi3_test();
extern "C" int lshrti3_test();
extern "C" int ltdf2vfp_test();
extern "C" int ltsf2vfp_test();
extern "C" int lttf2_test();
extern "C" int moddi3_test();
extern "C" int modsi3_test();
extern "C" int modti3_test();
#ifdef HAS_COMPLEX
extern "C" int muldc3_test();
#endif
extern "C" int muldf3vfp_test();
extern "C" int muldi3_test();
extern "C" int mulodi4_test();
extern "C" int mulosi4_test();
extern "C" int muloti4_test();
#ifdef HAS_COMPLEX
extern "C" int mulsc3_test();
#endif
extern "C" int mulsf3vfp_test();
//extern "C" int mulsi3_test(); // there is no mulsi3.c
#ifdef HAS_COMPLEX
extern "C" int multc3_test();
#endif
extern "C" int multf3_test();
extern "C" int multi3_test();
extern "C" int mulvdi3_test();
extern "C" int mulvsi3_test();
extern "C" int mulvti3_test();
#ifdef HAS_COMPLEX
extern "C" int mulxc3_test();
#endif
extern "C" int nedf2vfp_test();
extern "C" int negdf2vfp_test();
extern "C" int negdi2_test();
extern "C" int negsf2vfp_test();
extern "C" int negti2_test();
extern "C" int negvdi2_test();
extern "C" int negvsi2_test();
extern "C" int negvti2_test();
extern "C" int nesf2vfp_test();
extern "C" int netf2_test();
/* need rand, signbit, ...
extern "C" int paritydi2_test();
extern "C" int paritysi2_test();
extern "C" int parityti2_test();
extern "C" int popcountdi2_test();
extern "C" int popcountsi2_test();
extern "C" int popcountti2_test();
extern "C" int powidf2_test();
extern "C" int powisf2_test();
extern "C" int powitf2_test();
extern "C" int powixf2_test();
*/
extern "C" int subdf3vfp_test();
extern "C" int subsf3vfp_test();
extern "C" int subtf3_test();
extern "C" int subvdi3_test();
extern "C" int subvsi3_test();
extern "C" int subvti3_test();
extern "C" int trampoline_setup_test();
extern "C" int truncdfhf2_test();
extern "C" int truncdfsf2_test();
extern "C" int truncdfsf2vfp_test();
extern "C" int truncsfhf2_test();
extern "C" int trunctfdf2_test();
extern "C" int trunctfhf2_test();
extern "C" int trunctfsf2_test();
extern "C" int ucmpdi2_test();
extern "C" int ucmpti2_test();
extern "C" int udivdi3_test();
extern "C" int udivmoddi4_test();
extern "C" int udivmodsi4_test();
extern "C" int udivmodti4_test();
extern "C" int udivsi3_test();
extern "C" int udivti3_test();
extern "C" int umoddi3_test();
extern "C" int umodsi3_test();
extern "C" int umodti3_test();
extern "C" int unorddf2vfp_test();
extern "C" int unordsf2vfp_test();
extern "C" int unordtf2_test();

void show_result(const char *fn, int res) {
  if (res == 1)
    printf("%s: FAIL!\n", fn);
  else if (res == 0)
    printf("%s: PASS!\n", fn);
  else if (res == -1)
    printf("%s: SKIPPED!\n", fn);
  else {
    printf("FIXME!");
    abort();
  }
}

int main() {
  int res = 0;

// Pre-defined compiler macros (from llc -march=cpu0${ENDIAN} or
// clang -target cpu0${ENDIAN}-unknown-linux-gnu)
#ifdef __CPU0EB__
  printf("__CPU0EB__\n");
#endif
#ifdef __CPU0EL__
  printf("__CPU0EL__\n");
#endif

  res = absvdi2_test();
  show_result("absvdi2_test()", res);

  res = absvsi2_test();
  show_result("absvsi2_test()", res);

  res = absvti2_test();
  show_result("absvti2_test()", res);

  res = adddf3vfp_test();
  show_result("adddf3vfp_test()", res);

  res = addsf3vfp_test();
  show_result("addsf3vfp_test()", res);

  res = addvdi3_test();
  show_result("addvdi3_test()", res);

  res = addvsi3_test();
  show_result("addvsi3_test()", res);

  res = addvti3_test();
  show_result("addvti3_test()", res);

  res = ashldi3_test();
  show_result("ashldi3_test()", res);

  res = ashlti3_test();
  show_result("ashlti3_test()", res);

  res = ashrdi3_test();
  show_result("ashrdi3_test()", res);

  res = ashrti3_test();
  show_result("ashrti3_test()", res);

#if 0 // atomic.c needs memcmp(...)
  res = atomic_test();
  show_result("atomic_test()", res);
#endif

  res = bswapdi2_test();
  show_result("bswapdi2_test()", res);

  res = bswapsi2_test();
  show_result("bswapsi2_test()", res);

  res = clzdi2_test();
  show_result("clzdi2_test()", res);

  res = clzsi2_test();
  show_result("clzsi2_test()", res);

  res = clzti2_test();
  show_result("clzti2_test()", res);

  res = cmpdi2_test();
  show_result("cmpdi2_test()", res);

  res = cmpti2_test();
  show_result("cmpti2_test()", res);

  res = comparedf2_test();
  show_result("comparedf2_test()", res);

  res = comparesf2_test();
  show_result("comparesf2_test()", res);

//  res = compiler_rt_logb_test();
//  show_result("compiler_rt_logb_test()", res);

//  res = compiler_rt_logbf_test();
//  show_result("compiler_rt_logbf_test()", res);

//  res = compiler_rt_logbl_test();
//  show_result("compiler_rt_logbl_test()", res);

  res = cpu_model_test();
  show_result("cpu_model_test()", res);

  res = ctzdi2_test();
  show_result("ctzdi2_test()", res);

  res = ctzsi2_test();
  show_result("ctzsi2_test()", res);

  res = ctzti2_test();
  show_result("ctzti2_test()", res);

#ifdef HAS_COMPLEX
  res = divdc3_test();
  show_result("divdc3_test()", res);
#endif

  res = divdf3_test();
  show_result("divdf3_test()", res);

  res = divdf3vfp_test();
  show_result("divdf3vfp_test()", res);

  res = divdi3_test();
  show_result("divdi3_test()", res);

  res = divmodsi4_test();
  show_result("divmodsi4_test()", res);

  res = divmodti4_test();
  show_result("divmodti4_test()", res);

#ifdef HAS_COMPLEX
  res = divsc3_test();
  show_result("divsc3_test()", res);
#endif

  res = divsf3_test();
  show_result("divsf3_test()", res);

  res = divsf3vfp_test();
  show_result("divsf3vfp_test()", res);

  res = divsi3_test();
  show_result("divsi3_test()", res);

#ifdef HAS_COMPLEX
  res = divtc3_test();
  show_result("divtc3_test()", res);
#endif

  res = divtf3_test();
  show_result("divtf3_test()", res);

  res = divti3_test();
  show_result("divti3_test()", res);

#ifdef HAS_COMPLEX
  res = divxc3_test();
  show_result("divxc3_test()", res);
#endif

#if 0
  res = enable_execute_stack_test();
  show_result("enable_execute_stack_test()", res);
#endif

  res = eqdf2vfp_test();
  show_result("eqdf2vfp_test()", res);

  res = eqsf2vfp_test();
  show_result("eqsf2vfp_test()", res);

  res = eqtf2_test();
  show_result("eqtf2_test()", res);

  res = extenddftf2_test();
  show_result("extenddftf2_test()", res);

  res = extendhfsf2_test();
  show_result("extendhfsf2_test()", res);

  res = extendhftf2_test();
  show_result("extendhftf2_test()", res);

  res = extendsfdf2vfp_test();
  show_result("extendsfdf2vfp_test()", res);

  res = extendsftf2_test();
  show_result("extendsftf2_test()", res);

#if 0
  res = gcc_personality_test();
  show_result("gcc_personality_test()", res);
#endif

  res = gedf2vfp_test();
  show_result("gedf2vfp_test()", res);

  res = gesf2vfp_test();
  show_result("gesf2vfp_test()", res);

  res = getf2_test();
  show_result("getf2_test()", res);

  res = gtdf2vfp_test();
  show_result("gtdf2vfp_test()", res);

  res = gtsf2vfp_test();
  show_result("gtsf2vfp_test()", res);

  res = gttf2_test();
  show_result("gttf2_test()", res);

  res = ledf2vfp_test();
  show_result("ledf2vfp_test()", res);

  res = lesf2vfp_test();
  show_result("lesf2vfp_test()", res);

  res = letf2_test();
  show_result("letf2_test()", res);

  res = lshrdi3_test();
  show_result("lshrdi3_test()", res);

  res = lshrti3_test();
  show_result("lshrti3_test()", res);

  res = ltdf2vfp_test();
  show_result("ltdf2vfp_test()", res);

  res = ltsf2vfp_test();
  show_result("ltsf2vfp_test()", res);

  res = lttf2_test();
  show_result("lttf2_test()", res);

  res = moddi3_test();
  show_result("moddi3_test()", res);

  res = modsi3_test();
  show_result("modsi3_test()", res);

  res = modti3_test();
  show_result("modti3_test()", res);

#ifdef HAS_COMPLEX
  res = muldc3_test();
  show_result("muldc3_test()", res);
#endif

  res = muldf3vfp_test();
  show_result("muldf3vfp_test()", res);

  res = muldi3_test();
  show_result("muldi3_test()", res);

  res = mulodi4_test();
  show_result("mulodi4_test()", res);

  res = mulosi4_test();
  show_result("mulosi4_test()", res);

  res = muloti4_test();
  show_result("muloti4_test()", res);

#ifdef HAS_COMPLEX
  res = mulsc3_test();
  show_result("mulsc3_test()", res);
#endif

  res = mulsf3vfp_test();
  show_result("mulsf3vfp_test()", res);

// no mulsi3.c
//  res = mulsi3_test();
//  show_result("mulsi3_test()", res);

#ifdef HAS_COMPLEX
  res = multc3_test();
  show_result("multc3_test()", res);
#endif

  res = multf3_test();
  show_result("multf3_test()", res);

  res = multi3_test();
  show_result("multi3_test()", res);

  res = mulvdi3_test();
  show_result("mulvdi3_test()", res);

  res = mulvsi3_test();
  show_result("mulvsi3_test()", res);

  res = mulvti3_test();
  show_result("mulvti3_test()", res);

#ifdef HAS_COMPLEX
  res = mulxc3_test();
  show_result("mulxc3_test()", res);
#endif

  res = nedf2vfp_test();
  show_result("nedf2vfp_test()", res);

  res = negdf2vfp_test();
  show_result("negdf2vfp_test()", res);

  res = negdi2_test();
  show_result("negdi2_test()", res);

  res = negsf2vfp_test();
  show_result("negsf2vfp_test()", res);

  res = negti2_test();
  show_result("negti2_test()", res);

  res = negvdi2_test();
  show_result("negvdi2_test()", res);

  res = negvsi2_test();
  show_result("negvsi2_test()", res);

  res = negvti2_test();
  show_result("negvti2_test()", res);

  res = nesf2vfp_test();
  show_result("nesf2vfp_test()", res);

  res = netf2_test();
  show_result("netf2_test()", res);

/* need rand, signbit, ...
  res = paritydi2_test();
  show_result("paritydi2_test()", res);

  res = paritysi2_test();
  show_result("paritysi2_test()", res);

  res = parityti2_test();
  show_result("parityti2_test()", res);

  res = popcountdi2_test();
  show_result("popcountdi2_test()", res);

  res = popcountsi2_test();
  show_result("popcountsi2_test()", res);

  res = popcountti2_test();
  show_result("popcountti2_test()", res);

  res = powidf2_test();
  show_result("powidf2_test()", res);

  res = powisf2_test();
  show_result("powisf2_test()", res);

  res = powitf2_test();
  show_result("powitf2_test()", res);

  res = powixf2_test();
  show_result("powixf2_test()", res);
*/

  res = subdf3vfp_test();
  show_result("subdf3vfp_test()", res);

  res = subsf3vfp_test();
  show_result("subsf3vfp_test()", res);

  res = subtf3_test();
  show_result("subtf3_test()", res);

  res = subvdi3_test();
  show_result("subvdi3_test()", res);

  res = subvsi3_test();
  show_result("subvsi3_test()", res);

  res = subvti3_test();
  show_result("subvti3_test()", res);

  res = trampoline_setup_test();
  show_result("trampoline_setup_test()", res);

  res = truncdfhf2_test();
  show_result("truncdfhf2_test()", res);

  res = truncdfsf2_test();
  show_result("truncdfsf2_test()", res);

  res = truncdfsf2vfp_test();
  show_result("truncdfsf2vfp_test()", res);

  res = truncsfhf2_test();
  show_result("truncsfhf2_test()", res);

  res = trunctfdf2_test();
  show_result("trunctfdf2_test()", res);

  res = trunctfhf2_test();
  show_result("trunctfhf2_test()", res);

  res = trunctfsf2_test();
  show_result("trunctfsf2_test()", res);

  res = ucmpdi2_test();
  show_result("ucmpdi2_test()", res);

  res = ucmpti2_test();
  show_result("ucmpti2_test()", res);

  res = udivdi3_test();
  show_result("udivdi3_test()", res);

  res = udivmoddi4_test();
  show_result("udivmoddi4_test()", res);

  res = udivmodsi4_test();
  show_result("udivmodsi4_test()", res);

  res = udivmodti4_test();
  show_result("udivmodti4_test()", res);

  res = udivsi3_test();
  show_result("udivsi3_test()", res);

  res = udivti3_test();
  show_result("udivti3_test()", res);

  res = umoddi3_test();
  show_result("umoddi3_test()", res);

  res = umodsi3_test();
  show_result("umodsi3_test()", res);

  res = umodti3_test();
  show_result("umodti3_test()", res);

  res = unorddf2vfp_test();
  show_result("unorddf2vfp_test()", res);

  res = unordsf2vfp_test();
  show_result("unordsf2vfp_test()", res);

  res = unordtf2_test();
  show_result("unordtf2_test()", res);

  return 0;
}

exlbt/input/Makefile.builtins

# CPU and endian are passed from the command line, for example:
#   make -f Makefile.builtins CPU=cpu032II ENDIAN=eb
#   make -f Makefile.builtins CPU=cpu032I ENDIAN=el

# start.cpp must be listed first
SRCS :=  start.cpp debug.cpp syscalls.c sanitizer_printf.cpp printf-stdarg-def.c \
  compiler-rt-test/builtins/Unit/absvdi2_test.c \
  compiler-rt-test/builtins/Unit/absvsi2_test.c \
  compiler-rt-test/builtins/Unit/absvti2_test.c \
  compiler-rt-test/builtins/Unit/adddf3vfp_test.c \
  compiler-rt-test/builtins/Unit/addsf3vfp_test.c \
  compiler-rt-test/builtins/Unit/addvdi3_test.c \
  compiler-rt-test/builtins/Unit/addvsi3_test.c \
  compiler-rt-test/builtins/Unit/addvti3_test.c \
  compiler-rt-test/builtins/Unit/ashldi3_test.c \
  compiler-rt-test/builtins/Unit/ashlti3_test.c \
  compiler-rt-test/builtins/Unit/ashrdi3_test.c \
  compiler-rt-test/builtins/Unit/ashrti3_test.c \
  compiler-rt-test/builtins/Unit/bswapdi2_test.c \
  compiler-rt-test/builtins/Unit/bswapsi2_test.c \
  compiler-rt-test/builtins/Unit/clzdi2_test.c \
  compiler-rt-test/builtins/Unit/clzsi2_test.c \
  compiler-rt-test/builtins/Unit/clzti2_test.c \
  compiler-rt-test/builtins/Unit/cmpdi2_test.c \
  compiler-rt-test/builtins/Unit/cmpti2_test.c \
  compiler-rt-test/builtins/Unit/comparedf2_test.c \
  compiler-rt-test/builtins/Unit/comparesf2_test.c \
  compiler-rt-test/builtins/Unit/cpu_model_test.c \
  compiler-rt-test/builtins/Unit/ctzdi2_test.c \
  compiler-rt-test/builtins/Unit/ctzsi2_test.c \
  compiler-rt-test/builtins/Unit/ctzti2_test.c \
  compiler-rt-test/builtins/Unit/divdc3_test.c \
  compiler-rt-test/builtins/Unit/divdf3_test.c \
  compiler-rt-test/builtins/Unit/divdf3vfp_test.c \
  compiler-rt-test/builtins/Unit/divdi3_test.c \
  compiler-rt-test/builtins/Unit/divmodsi4_test.c \
  compiler-rt-test/builtins/Unit/divmodti4_test.c \
  compiler-rt-test/builtins/Unit/divsc3_test.c \
  compiler-rt-test/builtins/Unit/divsf3_test.c \
  compiler-rt-test/builtins/Unit/divsf3vfp_test.c \
  compiler-rt-test/builtins/Unit/divsi3_test.c \
  compiler-rt-test/builtins/Unit/divtc3_test.c \
  compiler-rt-test/builtins/Unit/divtf3_test.c \
  compiler-rt-test/builtins/Unit/divti3_test.c \
  compiler-rt-test/builtins/Unit/divxc3_test.c \
  compiler-rt-test/builtins/Unit/enable_execute_stack_test.c \
  compiler-rt-test/builtins/Unit/eqdf2vfp_test.c \
  compiler-rt-test/builtins/Unit/eqsf2vfp_test.c \
  compiler-rt-test/builtins/Unit/eqtf2_test.c \
  compiler-rt-test/builtins/Unit/extenddftf2_test.c \
  compiler-rt-test/builtins/Unit/extendhfsf2_test.c \
  compiler-rt-test/builtins/Unit/extendhftf2_test.c \
  compiler-rt-test/builtins/Unit/extendsfdf2vfp_test.c \
  compiler-rt-test/builtins/Unit/extendsftf2_test.c \
  compiler-rt-test/builtins/Unit/gedf2vfp_test.c \
  compiler-rt-test/builtins/Unit/gesf2vfp_test.c \
  compiler-rt-test/builtins/Unit/getf2_test.c \
  compiler-rt-test/builtins/Unit/gtdf2vfp_test.c \
  compiler-rt-test/builtins/Unit/gtsf2vfp_test.c \
  compiler-rt-test/builtins/Unit/gttf2_test.c \
  compiler-rt-test/builtins/Unit/ledf2vfp_test.c \
  compiler-rt-test/builtins/Unit/lesf2vfp_test.c \
  compiler-rt-test/builtins/Unit/letf2_test.c \
  compiler-rt-test/builtins/Unit/lshrdi3_test.c \
  compiler-rt-test/builtins/Unit/lshrti3_test.c \
  compiler-rt-test/builtins/Unit/ltdf2vfp_test.c \
  compiler-rt-test/builtins/Unit/ltsf2vfp_test.c \
  compiler-rt-test/builtins/Unit/lttf2_test.c \
  compiler-rt-test/builtins/Unit/moddi3_test.c \
  compiler-rt-test/builtins/Unit/modsi3_test.c \
  compiler-rt-test/builtins/Unit/modti3_test.c \
  compiler-rt-test/builtins/Unit/muldc3_test.c \
  compiler-rt-test/builtins/Unit/muldf3vfp_test.c \
  compiler-rt-test/builtins/Unit/muldi3_test.c \
  compiler-rt-test/builtins/Unit/mulodi4_test.c \
  compiler-rt-test/builtins/Unit/mulosi4_test.c \
  compiler-rt-test/builtins/Unit/muloti4_test.c \
  compiler-rt-test/builtins/Unit/mulsc3_test.c \
  compiler-rt-test/builtins/Unit/mulsf3vfp_test.c \
  compiler-rt-test/builtins/Unit/mulsi3_test.c \
  compiler-rt-test/builtins/Unit/multc3_test.c \
  compiler-rt-test/builtins/Unit/multf3_test.c \
  compiler-rt-test/builtins/Unit/multi3_test.c \
  compiler-rt-test/builtins/Unit/mulvdi3_test.c \
  compiler-rt-test/builtins/Unit/mulvsi3_test.c \
  compiler-rt-test/builtins/Unit/mulvti3_test.c \
  compiler-rt-test/builtins/Unit/mulxc3_test.c \
  compiler-rt-test/builtins/Unit/nedf2vfp_test.c \
  compiler-rt-test/builtins/Unit/negdf2vfp_test.c \
  compiler-rt-test/builtins/Unit/negdi2_test.c \
  compiler-rt-test/builtins/Unit/negsf2vfp_test.c \
  compiler-rt-test/builtins/Unit/negti2_test.c \
  compiler-rt-test/builtins/Unit/negvdi2_test.c \
  compiler-rt-test/builtins/Unit/negvsi2_test.c \
  compiler-rt-test/builtins/Unit/negvti2_test.c \
  compiler-rt-test/builtins/Unit/nesf2vfp_test.c \
  compiler-rt-test/builtins/Unit/netf2_test.c \
  compiler-rt-test/builtins/Unit/paritydi2_test.c \
  compiler-rt-test/builtins/Unit/paritysi2_test.c \
  compiler-rt-test/builtins/Unit/parityti2_test.c \
  compiler-rt-test/builtins/Unit/popcountdi2_test.c \
  compiler-rt-test/builtins/Unit/popcountsi2_test.c \
  compiler-rt-test/builtins/Unit/popcountti2_test.c \
  compiler-rt-test/builtins/Unit/powidf2_test.c \
  compiler-rt-test/builtins/Unit/powisf2_test.c \
  compiler-rt-test/builtins/Unit/powitf2_test.c \
  compiler-rt-test/builtins/Unit/powixf2_test.c \
  compiler-rt-test/builtins/Unit/subdf3vfp_test.c \
  compiler-rt-test/builtins/Unit/subsf3vfp_test.c \
  compiler-rt-test/builtins/Unit/subtf3_test.c \
  compiler-rt-test/builtins/Unit/subvdi3_test.c \
  compiler-rt-test/builtins/Unit/subvsi3_test.c \
  compiler-rt-test/builtins/Unit/subvti3_test.c \
  compiler-rt-test/builtins/Unit/trampoline_setup_test.c \
  compiler-rt-test/builtins/Unit/truncdfhf2_test.c \
  compiler-rt-test/builtins/Unit/truncdfsf2_test.c \
  compiler-rt-test/builtins/Unit/truncdfsf2vfp_test.c \
  compiler-rt-test/builtins/Unit/truncsfhf2_test.c \
  compiler-rt-test/builtins/Unit/trunctfdf2_test.c \
  compiler-rt-test/builtins/Unit/trunctfhf2_test.c \
  compiler-rt-test/builtins/Unit/trunctfsf2_test.c \
  compiler-rt-test/builtins/Unit/ucmpdi2_test.c \
  compiler-rt-test/builtins/Unit/ucmpti2_test.c \
  compiler-rt-test/builtins/Unit/udivdi3_test.c \
  compiler-rt-test/builtins/Unit/udivmoddi4_test.c \
  compiler-rt-test/builtins/Unit/udivmodsi4_test.c \
  compiler-rt-test/builtins/Unit/udivmodti4_test.c \
  compiler-rt-test/builtins/Unit/udivsi3_test.c \
  compiler-rt-test/builtins/Unit/udivti3_test.c \
  compiler-rt-test/builtins/Unit/umoddi3_test.c \
  compiler-rt-test/builtins/Unit/umodsi3_test.c \
  compiler-rt-test/builtins/Unit/umodti3_test.c \
  compiler-rt-test/builtins/Unit/unorddf2vfp_test.c \
  compiler-rt-test/builtins/Unit/unordsf2vfp_test.c \
  compiler-rt-test/builtins/Unit/unordtf2_test.c \
  cpu0-builtins.cpp ch_builtins.cpp lib_cpu0.c

INC_DIRS := ./ $(LBDEX_DIR)/input \
            $(HOME)/llvm/llvm-project/compiler-rt/lib/builtins \
            $(NEWLIB_DIR)/newlib/libc/include \
            $(NEWLIB_DIR)/libgloss 
LIBBUILTINS_DIR := ../compiler-rt/builtins
LIBS := $(LIBBUILTINS_DIR)/build-$(CPU)-$(ENDIAN)/libbuiltins.a \
        $(NEWLIB_DIR)/build-$(CPU)-$(ENDIAN)/libm.a \
        $(NEWLIB_DIR)/build-$(CPU)-$(ENDIAN)/libc.a

include Common.mk

Run the tests as follows:

chungshu@ChungShudeMacBook-Air input % bash make.sh cpu032II eb Makefile.builtins
...
chungshu@ChungShudeMacBook-Air verilog % ./cpu0IIs
...
absvdi2_test(): PASS!
absvsi2_test(): PASS!
absvti2_test(): SKIPPED!
adddf3vfp_test(): SKIPPED!
addsf3vfp_test(): SKIPPED!
addvdi3_test(): PASS!
addvsi3_test(): PASS!
addvti3_test(): SKIPPED!
ashldi3_test(): PASS!
ashlti3_test(): SKIPPED!
ashrdi3_test(): PASS!
ashrti3_test(): SKIPPED!
bswapdi2_test(): PASS!
bswapsi2_test(): PASS!
clzdi2_test(): PASS!
clzsi2_test(): PASS!
clzti2_test(): SKIPPED!
cmpdi2_test(): PASS!
cmpti2_test(): SKIPPED!
comparedf2_test(): PASS!
comparesf2_test(): PASS!
cpu_model_test(): SKIPPED!
ctzdi2_test(): PASS!
ctzsi2_test(): PASS!
ctzti2_test(): SKIPPED!
divdc3_test(): PASS!
divdf3_test(): PASS!
divdf3vfp_test(): SKIPPED!
divdi3_test(): PASS!
divmodsi4_test(): PASS!
divmodti4_test(): SKIPPED!
divsf3_test(): PASS!
divsf3vfp_test(): SKIPPED!
divsi3_test(): PASS!
divtc3_test(): PASS!
divtf3_test(): SKIPPED!
divti3_test(): SKIPPED!
divxc3_test(): PASS!
eqdf2vfp_test(): SKIPPED!
eqsf2vfp_test(): SKIPPED!
eqtf2_test(): SKIPPED!
extenddftf2_test(): SKIPPED!
extendhfsf2_test(): PASS!
extendhftf2_test(): SKIPPED!
extendsfdf2vfp_test(): SKIPPED!
extendsftf2_test(): SKIPPED!
gedf2vfp_test(): SKIPPED!
gesf2vfp_test(): SKIPPED!
getf2_test(): SKIPPED!
gtdf2vfp_test(): SKIPPED!
gtsf2vfp_test(): SKIPPED!
gttf2_test(): SKIPPED!
ledf2vfp_test(): SKIPPED!
lesf2vfp_test(): SKIPPED!
letf2_test(): SKIPPED!
lshrdi3_test(): PASS!
lshrti3_test(): SKIPPED!
ltdf2vfp_test(): SKIPPED!
ltsf2vfp_test(): SKIPPED!
lttf2_test(): SKIPPED!
moddi3_test(): PASS!
modsi3_test(): PASS!
modti3_test(): SKIPPED!
muldc3_test(): PASS!
muldf3vfp_test(): SKIPPED!
muldi3_test(): PASS!
mulodi4_test(): PASS!
mulosi4_test(): PASS!
muloti4_test(): SKIPPED!
mulsc3_test(): PASS!
mulsf3vfp_test(): SKIPPED!
multc3_test(): SKIPPED!
multf3_test(): SKIPPED!
multi3_test(): SKIPPED!
mulvdi3_test(): PASS!
mulvsi3_test(): PASS!
mulvti3_test(): SKIPPED!
mulxc3_test(): PASS!
nedf2vfp_test(): SKIPPED!
negdf2vfp_test(): SKIPPED!
negdi2_test(): PASS!
negsf2vfp_test(): SKIPPED!
negti2_test(): SKIPPED!
negvdi2_test(): PASS!
negvsi2_test(): PASS!
negvti2_test(): SKIPPED!
nesf2vfp_test(): SKIPPED!
netf2_test(): SKIPPED!
subdf3vfp_test(): SKIPPED!
subsf3vfp_test(): SKIPPED!
subtf3_test(): SKIPPED!
subvdi3_test(): PASS!
subvsi3_test(): PASS!
subvti3_test(): SKIPPED!
trampoline_setup_test(): SKIPPED!
truncdfhf2_test(): PASS!
truncdfsf2_test(): PASS!
truncdfsf2vfp_test(): SKIPPED!
truncsfhf2_test(): PASS!
trunctfdf2_test(): SKIPPED!
trunctfhf2_test(): SKIPPED!
trunctfsf2_test(): SKIPPED!
ucmpdi2_test(): PASS!
ucmpti2_test(): SKIPPED!
udivdi3_test(): PASS!
udivmoddi4_test(): PASS!
udivmodsi4_test(): PASS!
udivmodti4_test(): SKIPPED!
udivsi3_test(): PASS!
udivti3_test(): SKIPPED!
umoddi3_test(): PASS!
umodsi3_test(): PASS!
umodti3_test(): SKIPPED!
unorddf2vfp_test(): SKIPPED!
unordsf2vfp_test(): SKIPPED!
unordtf2_test(): SKIPPED!
...
RET to PC < 0, finished!
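
With this many result lines, it can help to count the statuses instead of reading them by eye. The following is a minimal sketch, not part of lbdex: the names `struct tally`, `ends_with`, `tally_line` and `tally_stream` are hypothetical. It assumes only the line format visible in the log above, `<name>_test(): PASS!` or `<name>_test(): SKIPPED!`. Any other status is counted as "other", and lines such as `...` or `RET to PC < 0, finished!` are ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tally of the result lines printed by the test run. */
struct tally {
  int pass;
  int skipped;
  int other; /* result lines whose status is neither PASS! nor SKIPPED! */
};

/* Return 1 if s ends with suffix, 0 otherwise. */
int ends_with(const char *s, const char *suffix) {
  size_t n = strlen(s), m = strlen(suffix);
  return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Classify one log line (no trailing newline) and update the tally.
   Lines that do not contain "_test(): " are not result lines and are ignored. */
void tally_line(struct tally *t, const char *line) {
  assert(t != NULL && line != NULL);
  if (strstr(line, "_test(): ") == NULL)
    return;
  if (ends_with(line, ": PASS!"))
    t->pass++;
  else if (ends_with(line, ": SKIPPED!"))
    t->skipped++;
  else
    t->other++;
}

/* Read a log from fp line by line and tally every result line. */
void tally_stream(struct tally *t, FILE *fp) {
  char buf[256];
  while (fgets(buf, sizeof buf, fp) != NULL) {
    buf[strcspn(buf, "\r\n")] = '\0'; /* strip the line ending */
    tally_line(t, buf);
  }
}
```

One way to use it is to build this on the host with a small `main` that zero-initializes a `struct tally`, calls `tally_stream(&t, stdin)`, and prints the three counters, then pipe the output of `./cpu0IIs` into it. A non-zero `other` count marks lines worth inspecting by hand.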