* add `@bswap` builtin function. See #767
* comptime evaluation facilities are improved to be able to
handle a `@ptrCast` with a backing array.
* `@truncate` allows "truncating" a u0 value to any integer
type, and the result is always comptime known to be `0`.
* when specifying pointer alignment in a type expression,
the alignment value of pointers which do not have addresses
at runtime is ignored, and always has the default/ABI alignment
* threw in a fix to freebsd/x86_64.zig to update syntax from
language changes
* some improvements are pending #863closes#638closes#1733
std lib API changes
* io.InStream().readIntNe renamed to readIntNative
* io.InStream().readIntLe renamed to readIntLittle
* io.InStream().readIntBe renamed to readIntBig
* introduced io.InStream().readIntForeign
* io.InStream().readInt has parameter order changed
* io.InStream().readVarInt has parameter order changed
* io.InStream().writeIntNe renamed to writeIntNative
* introduced io.InStream().writeIntForeign
* io.InStream().writeIntLe renamed to writeIntLittle
* io.InStream().writeIntBe renamed to writeIntBig
* io.InStream().writeInt has parameter order changed
* mem.readInt has different parameters and semantics
* introduced mem.readIntNative
* introduced mem.readIntForeign
* mem.readIntBE renamed to mem.readIntBig and different API
* mem.readIntLE renamed to mem.readIntLittle and different API
* introduced mem.readIntSliceNative
* introduced mem.readIntSliceForeign
* introduced mem.readIntSliceLittle
* introduced mem.readIntSliceBig
* introduced mem.readIntSlice
* mem.writeInt has different parameters and semantics
* introduced mem.writeIntNative
* introduced mem.writeIntForeign
* mem.writeIntBE renamed to mem.readIntBig and different semantics
* mem.writeIntLE renamed to mem.readIntLittle and different semantics
* introduced mem.writeIntSliceForeign
* introduced mem.writeIntSliceNative
* introduced mem.writeIntSliceBig
* introduced mem.writeIntSliceLittle
* introduced mem.writeIntSlice
* removed mem.endianSwapIfLe
* removed mem.endianSwapIfBe
* removed mem.endianSwapIf
* added mem.littleToNative
* added mem.bigToNative
* added mem.toNative
* added mem.nativeTo
* added mem.nativeToLittle
* added mem.nativeToBig
These are translated from [monocypher](https://monocypher.org/) which
has fairly competitive performance while remaining quite simple.
Initial performance comparision:
Zig:
Poly1305: 1423 MiB/s
X25519: 8671 exchanges per second
Monocypher:
Poly1305: 1567 MiB/s
X25519: 10539 exchanges per second
There is room for improvement and no real effort has been made at all in
optimization beyond a direct translation.
The main changes are:
Unrolling the inner rounds of salsa20_wordtobyte which doubles the speed.
Passing the slice explicitly instead of returning the array saves a copy (can optimize out in future with copy elision) and gives ~10% improvement.
Inlining the outer loop gives ~15-20% improvement but it costs an extra 4Kb of code space. I think the tradeoff is worthwhile here.
The other inline loops are small and can be done by the compiler if it is worthwhile.
The rotate function replacement doesn't alter the performance from the former.
The modified throughput test I've used to benchmark is as follows. Interestingly we need to allocate memory instead of using a fixed buffer else Zig optimizes the whole thing out.
https://github.com/ziglang/zig/pull/1369#issuecomment-416456628
See #770
To help automatically translate code, see the
zig-fmt-pointer-reform-2 branch.
This will convert all & into *. Due to the syntax
ambiguity (which is why we are making this change),
even address-of & will turn into *, so you'll have
to manually fix thes instances. You will be guaranteed
to get compile errors for them - expected 'type', found 'foo'
Issue #699 since fixed. Nearly a x3 perf improvement.
Using --release-fast.
Sha3_256 (before): 96 Mb/s
Sha3_256 (after): 267 Mb/s
Sha3_512 (before): 53 Mb/s
Sha3_512 (after): 142 Mb/s
No real gains from unrolling other initialization loops in crypto
functions so have been left as is.
The purpose of this is:
* Only one way to do things
* Changing a function with void return type to return a possible
error becomes a 1 character change, subtly encouraging
people to use errors.
See #632
Here are some imperfect sed commands for performing this update:
remove arrow:
```
sed -i 's/\(\bfn\b.*\)-> /\1/g' $(find . -name "*.zig")
```
add void:
```
sed -i 's/\(\bfn\b.*\))\s*{/\1) void {/g' $(find ../ -name "*.zig")
```
Some cleanup may be necessary, but this should do the bulk of the work.
These are on the slower side and could be improved. No performance optimizations
yet have been done.
```
Cpu: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
```
-- Sha3-256
```
Zig --release-fast
93 Mb/s
Zig --release-safe
99 Mb/s
Zig
4 Mb/s
```
-- Sha3-512
```
Zig --release-fast
49 Mb/s
Zig --release-safe
54 Mb/s
Zig
2 Mb/s
```
Interestingly, release-safe is producing slightly better code than
release-fast.
Some performance comparisons to C.
We take the fastest time measurement taken across multiple runs.
The block hashing functions use the same md5/sha1 methods.
```
Cpu: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
Gcc: 7.2.1 20171224
Clang: 5.0.1
Zig: 0.1.1.304f6f1d
```
See https://www.nayuki.io/page/fast-md5-hash-implementation-in-x86-assembly:
```
gcc -O2
661 Mb/s
clang -O2
490 Mb/s
zig --release-fast and zig --release-safe
570 Mb/s
zig
50 Mb/s
```
See https://www.nayuki.io/page/fast-sha1-hash-implementation-in-x86-assembly:
```
gcc -O2
588 Mb/s
clang -O2
563 Mb/s
zig --release-fast and zig --release-safe
610 Mb/s
zig
21 Mb/s
```
In short, zig provides pretty useful tools for writing this sort of
code. We are in the lead against clang (which uses the same LLVM
backend) with us being slower only against md5 with GCC.