This removes the compiler_rt.setXmm0 hack. Instead, for
the functions that use i128 or u128 in their parameter and
return types, we use `@Vector(2, u64)` which generates
the LLVM IR `<2 x i64>` type that matches what Clang
generates for `typedef int ti_int __attribute__ ((mode (TI)))`
when targeting Windows x86_64.
* add __multi3 compiler rt function. See #1290
* compiler rt includes ARM functions for thumb and aarch64 and
other sub-arches left out. See #1526
* support C ABI for returning structs on ARM. see #1481