blake256
========

[![Build Status](https://github.com/decred/dcrd/workflows/Build%20and%20Test/badge.svg)](https://github.com/decred/dcrd/actions)
[![ISC License](https://img.shields.io/badge/license-ISC-blue.svg)](http://copyfree.org)
[![Doc](https://img.shields.io/badge/doc-reference-blue.svg)](https://pkg.go.dev/github.com/decred/dcrd/crypto/blake256)

## Overview

Package `blake256` implements the [BLAKE-256 and BLAKE-224 cryptographic hash
functions](https://www.aumasson.jp/blake/blake.pdf) (SHA-3 candidate) in pure Go
along with highly optimized SSE2, SSE4.1, and AVX acceleration.

It provides an API that enables zero allocations and the ability to save and
restore the intermediate state (also often called the midstate). The design
philosophy has a strong emphasis on correctness, readability, and efficiency
while also aiming to provide an ergonomic API.

In addition to the zero allocation API, it also implements the standard library
interfaces `hash.Hash`, `encoding.BinaryMarshaler`, and
`encoding.BinaryUnmarshaler` for callers that are not as concerned about
avoiding allocations. No dependencies beyond the standard library are required.

A full suite of tests with 100% branch coverage and benchmarks are provided to
help ensure proper functionality and analyze performance characteristics.

The core assembly code to take advantage of the `amd64` SIMD vector extensions
is generated with Go via [avo](https://github.com/mmcloughlin/avo).

[Show me the benchmarks already](#benchmarks)!

[Example Usage?](#examples)

## Hashing Data

The simplest way to hash data that is already serialized into bytes is via the
global `Sum224` (BLAKE-224) or `Sum256` (BLAKE-256) functions. This is
demonstrated for BLAKE-256 via the "Basic Usage" example linked in the
[Examples](#examples) section.

However, since hashing typically involves writing various pieces of information
that aren't already serialized, this package provides `NewHasher224` (BLAKE-224)
and `NewHasher256` (BLAKE-256) (and their respective variants `NewHasher224Salt`
and `NewHasher256Salt` that accept salt).

These functions return rolling hasher instances that support writing an
arbitrary amount of data along with several convenience methods for writing
various data types in either big-endian or little-endian byte order. For
example, `WriteString` adds a string encoded as its UTF-8 byte sequence to the
rolling hash and `WriteUint64BE` adds an unsigned 64-bit integer encoded as an
8-byte big-endian byte sequence to the rolling hash.

The hash is then obtained via the `Sum224` (BLAKE-224) or `Sum256` (BLAKE-256)
method on the respective hasher instance.

See the "Rolling Hasher Usage" example linked in the [Examples](#examples)
section to see rolling hashing in action.

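Conceptually, the convenience writers described above encode a value into a
fixed byte sequence and append it to the rolling hash. The following is a
minimal sketch of that idea using only the standard library, with
`crypto/sha256` standing in for BLAKE-256 so that it runs without any extra
dependencies. The helpers `writeString` and `writeUint64BE` are hypothetical
and written purely for illustration; they are not this package's methods.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"hash"
)

// writeString is a hypothetical helper that adds the UTF-8 bytes of s to h.
func writeString(h hash.Hash, s string) {
	h.Write([]byte(s)) // Writes to a hash.Hash never return an error.
}

// writeUint64BE is a hypothetical helper that adds v to h encoded as an
// 8-byte big-endian sequence.
func writeUint64BE(h hash.Hash, v uint64) {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], v)
	h.Write(buf[:])
}

// rollingMatchesOneShot reports whether writing the pieces incrementally
// produces the same digest as hashing the equivalent serialized buffer.
func rollingMatchesOneShot(s string, v uint64) bool {
	// Rolling approach: write each piece as it becomes available.
	h := sha256.New() // Stand-in for a BLAKE-256 rolling hasher.
	writeString(h, s)
	writeUint64BE(h, v)
	rolling := h.Sum(nil)

	// One-shot approach: serialize everything first and then hash it.
	var enc [8]byte
	binary.BigEndian.PutUint64(enc[:], v)
	serialized := append([]byte(s), enc[:]...)
	oneShot := sha256.Sum256(serialized)

	return bytes.Equal(rolling, oneShot[:])
}

func main() {
	fmt.Println(rollingMatchesOneShot("header", 42))
}
```

Both approaches yield the same digest, so the rolling form mainly saves the
caller from building an intermediate serialized buffer.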
## Saving and Resuming Intermediate States

Many applications involve hashing data that always starts with the same sequence
of bytes (aka a shared prefix). Whenever that prefix is larger than the block
size (`BlockSize`), or it is otherwise costly to generate and serialize, it is
typically more efficient to save the intermediate state (midstate) after writing
the shared prefix so that all future hashes can resume from that midstate and
thereby avoid redoing work.

To that end, the aforementioned rolling hasher instances support being copied to
save and restore the current midstate within the same process. This is
demonstrated via the "Same Process Save and Restore" example linked in the
[Examples](#examples) section.

Alternatively, when a simple copy of the instance is not possible, such as when
the midstate is needed among multiple processes, perhaps on entirely different
hardware, it can be serialized via `SaveState` and restored via
`UnmarshalBinary`. Note that there is necessarily additional overhead involved
with serializing and deserializing the intermediate state, so callers should be
sure to compare that overhead with rehashing the shared data to see which
approach yields better results for their particular application.

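The serialize-and-resume pattern described above can be sketched with the
standard library alone. This example uses `crypto/sha256` as a stand-in, since
its hasher also implements `encoding.BinaryMarshaler` and
`encoding.BinaryUnmarshaler`. The function names `saveMidstate`,
`resumeAndSum`, and `midstateMatchesFull` are hypothetical and exist only for
this illustration.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding"
	"errors"
	"fmt"
)

// saveMidstate hashes the shared prefix once and returns the serialized
// intermediate state.
func saveMidstate(prefix []byte) ([]byte, error) {
	h := sha256.New() // Stand-in for a BLAKE-256 rolling hasher.
	h.Write(prefix)
	m, ok := h.(encoding.BinaryMarshaler)
	if !ok {
		return nil, errors.New("hasher does not support marshaling")
	}
	return m.MarshalBinary()
}

// resumeAndSum restores a previously saved midstate into a fresh hasher,
// writes the suffix, and returns the final digest.
func resumeAndSum(midstate, suffix []byte) ([]byte, error) {
	h := sha256.New()
	u, ok := h.(encoding.BinaryUnmarshaler)
	if !ok {
		return nil, errors.New("hasher does not support unmarshaling")
	}
	if err := u.UnmarshalBinary(midstate); err != nil {
		return nil, err
	}
	h.Write(suffix)
	return h.Sum(nil), nil
}

// midstateMatchesFull reports whether resuming from the saved midstate gives
// the same digest as hashing the prefix followed by the suffix from scratch.
func midstateMatchesFull(prefix, suffix []byte) bool {
	midstate, err := saveMidstate(prefix)
	if err != nil {
		return false
	}
	resumed, err := resumeAndSum(midstate, suffix)
	if err != nil {
		return false
	}
	full := sha256.Sum256(append(append([]byte{}, prefix...), suffix...))
	return bytes.Equal(resumed, full[:])
}

func main() {
	prefix := bytes.Repeat([]byte{0xab}, 100) // Longer than one 64-byte block.
	for _, suffix := range []string{"first", "second"} {
		fmt.Println(midstateMatchesFull(prefix, []byte(suffix)))
	}
}
```

In a real deployment the serialized midstate would be stored or sent to
another process, and only the restore step would run per message.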
## Hashing With Salt

This implementation also provides `NewHasher224Salt` (BLAKE-224) and
`NewHasher256Salt` (BLAKE-256) which accept a 16-byte salt input as described by
the specification. Hashing with distinct salts effectively provides an
efficient method to hash with different functions while using the same
underlying algorithm. Other than incorporating the salt, the salted variants
behave exactly the same as the normal unsalted variants described throughout the
documentation.

## Benchmarks

The following benchmarks are from a Ryzen 7 5800X3D processor on Linux and are
the result of feeding `benchstat` 10 iterations of each. Benchmarks for both
BLAKE-224 and BLAKE-256 are provided. They are essentially identical (within the
margin of error) as expected since the only notable difference as it pertains to
performance is that the final output is 4 bytes shorter.

### BLAKE-256 Hashing Benchmarks

The following results demonstrate the performance of hashing various amounts of
data for both small and larger inputs with the `Sum256` method.

Operation        | Pure Go      | SSE2         | SSE4.1       | AVX
-----------------|--------------|--------------|--------------|-------------
`Sum256` (32b)   | 168MB/s ± 1% | 188MB/s ± 1% | 232MB/s ± 0% | 234MB/s ± 1%
`Sum256` (64b)   | 187MB/s ± 0% | 208MB/s ± 0% | 270MB/s ± 1% | 271MB/s ± 1%
`Sum256` (1KiB)  | 378MB/s ± 1% | 421MB/s ± 1% | 536MB/s ± 1% | 539MB/s ± 1%
`Sum256` (8KiB)  | 405MB/s ± 1% | 448MB/s ± 0% | 573MB/s ± 0% | 573MB/s ± 0%
`Sum256` (16KiB) | 402MB/s ± 1% | 449MB/s ± 0% | 575MB/s ± 0% | 575MB/s ± 0%

Operation        | Pure Go     | SSE2        | SSE4.1      | AVX         | Allocs / Op
-----------------|-------------|-------------|-------------|-------------|------------
`Sum256` (32b)   | 190ns ± 1%  | 170ns ± 1%  | 138ns ± 0%  | 137ns ± 1%  | 0
`Sum256` (64b)   | 342ns ± 0%  | 308ns ± 0%  | 237ns ± 1%  | 236ns ± 1%  | 0
`Sum256` (1KiB)  | 2.71µs ± 1% | 2.43µs ± 1% | 1.91µs ± 1% | 1.90µs ± 1% | 0
`Sum256` (8KiB)  | 20.2µs ± 1% | 18.3µs ± 0% | 14.3µs ± 0% | 14.3µs ± 0% | 0
`Sum256` (16KiB) | 40.8µs ± 1% | 36.5µs ± 0% | 28.5µs ± 0% | 28.5µs ± 0% | 0

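The throughput and time-per-operation figures for BLAKE-256 are two views of
the same measurement. As a quick cross-check, the following sketch converts a
few of the Pure Go timings back into throughput, assuming the sizes are in
bytes and that 1 MB means 10^6 bytes. The helper `throughputMBps` is a
hypothetical name used only here.

```go
package main

import "fmt"

// throughputMBps converts an input size in bytes and a time per operation in
// nanoseconds into megabytes per second, where 1 MB = 10^6 bytes.
func throughputMBps(sizeBytes int, nsPerOp float64) float64 {
	bytesPerSec := float64(sizeBytes) / (nsPerOp * 1e-9)
	return bytesPerSec / 1e6
}

func main() {
	// Pure Go timings taken from the BLAKE-256 time-per-operation table.
	fmt.Printf("32 bytes: %.0f MB/s\n", throughputMBps(32, 190))
	fmt.Printf("64 bytes: %.0f MB/s\n", throughputMBps(64, 342))
	fmt.Printf("1 KiB:    %.0f MB/s\n", throughputMBps(1024, 2710))
}
```

The results come out to roughly 168, 187, and 378 MB/s, which agrees with the
Pure Go column of the throughput table.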
### BLAKE-224 Hashing Benchmarks

The following results demonstrate the performance of hashing various amounts of
data for both small and larger inputs with the `Sum224` method.

Operation        | Pure Go      | SSE2         | SSE4.1       | AVX
-----------------|--------------|--------------|--------------|-------------
`Sum224` (32b)   | 171MB/s ± 1% | 188MB/s ± 1% | 232MB/s ± 1% | 234MB/s ± 1%
`Sum224` (64b)   | 187MB/s ± 2% | 209MB/s ± 1% | 269MB/s ± 1% | 271MB/s ± 1%
`Sum224` (1KiB)  | 378MB/s ± 1% | 423MB/s ± 1% | 539MB/s ± 1% | 536MB/s ± 1%
`Sum224` (8KiB)  | 404MB/s ± 1% | 447MB/s ± 1% | 577MB/s ± 1% | 577MB/s ± 0%
`Sum224` (16KiB) | 401MB/s ± 1% | 453MB/s ± 0% | 577MB/s ± 0% | 577MB/s ± 0%

Operation        | Pure Go     | SSE2        | SSE4.1      | AVX         | Allocs / Op
-----------------|-------------|-------------|-------------|-------------|------------
`Sum224` (32b)   | 187ns ± 1%  | 170ns ± 1%  | 138ns ± 1%  | 137ns ± 1%  | 0
`Sum224` (64b)   | 342ns ± 2%  | 306ns ± 1%  | 238ns ± 1%  | 236ns ± 1%  | 0
`Sum224` (1KiB)  | 2.71µs ± 1% | 2.42µs ± 1% | 1.90µs ± 1% | 1.91µs ± 1% | 0
`Sum224` (8KiB)  | 20.3µs ± 1% | 18.3µs ± 1% | 14.2µs ± 1% | 14.2µs ± 0% | 0
`Sum224` (16KiB) | 40.9µs ± 1% | 36.2µs ± 0% | 28.4µs ± 0% | 28.4µs ± 0% | 0

### State Serialization Benchmarks

The following results demonstrate the performance of serializing the
intermediate state for both BLAKE-224 and BLAKE-256 using the zero-alloc
`SaveState` method versus the `MarshalBinary` method of the standard library
`encoding.BinaryMarshaler` interface.

Metric      | `MarshalBinary` | `SaveState` | Delta
------------|-----------------|-------------|---------------------------
Time / Op   | 40.6ns ± 1%     | 16.0ns ± 0% | -60.60% (p=0.000 n=10+10)
Allocs / Op | 1               | 0           | -100.00% (p=0.000 n=10+10)

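The Delta column is the relative change from `MarshalBinary` to `SaveState`.
The short sketch below recomputes it from the rounded means shown in the
table. The helper name `percentDelta` is hypothetical, and the small
difference in the last digit for the time delta is presumably because the
reported value was derived from the unrounded measurements.

```go
package main

import "fmt"

// percentDelta returns the relative change from the baseline value to the
// updated value as a percentage. Negative results indicate a reduction.
func percentDelta(baseline, updated float64) float64 {
	return (updated - baseline) / baseline * 100
}

func main() {
	// Rounded means taken from the state serialization table.
	fmt.Printf("time:   %.1f%%\n", percentDelta(40.6, 16.0))
	fmt.Printf("allocs: %.1f%%\n", percentDelta(1, 0))
}
```

This gives about -60.6% for time per operation and -100.0% for allocations.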
## Disabling Assembler Optimizations

The `purego` build tag may be used to disable all assembly code.

Additionally, when built normally without the `purego` build tag, the assembly
optimizations for each of the supported vector extensions can individually be
disabled at runtime by setting the following environment variables to `1`.

* `BLAKE256_DISABLE_AVX=1`: Disable Advanced Vector Extensions (AVX) optimizations
* `BLAKE256_DISABLE_SSE41=1`: Disable Streaming SIMD Extensions 4.1 (SSE4.1) optimizations
* `BLAKE256_DISABLE_SSE2=1`: Disable Streaming SIMD Extensions 2 (SSE2) optimizations

The package will automatically use the fastest available extensions that are not
disabled.

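To make the selection rule concrete, here is a hypothetical sketch of how such
a fallback could be expressed. It is not the package's actual code: the
`cpuFeatures` type and `selectImpl` function are invented for illustration,
the CPU capabilities are passed in rather than detected, and the preference
order AVX, then SSE4.1, then SSE2 is an assumption based on the benchmark
results.

```go
package main

import (
	"fmt"
	"os"
)

// cpuFeatures is a hypothetical description of what the CPU supports.
type cpuFeatures struct {
	sse2, sse41, avx bool
}

// selectImpl returns the name of the fastest implementation that is both
// supported by the CPU and not disabled via its environment variable. The
// getenv parameter allows the lookup to be replaced for testing.
func selectImpl(f cpuFeatures, getenv func(string) string) string {
	disabled := func(name string) bool { return getenv(name) == "1" }
	switch {
	case f.avx && !disabled("BLAKE256_DISABLE_AVX"):
		return "AVX"
	case f.sse41 && !disabled("BLAKE256_DISABLE_SSE41"):
		return "SSE4.1"
	case f.sse2 && !disabled("BLAKE256_DISABLE_SSE2"):
		return "SSE2"
	}
	return "Pure Go"
}

func main() {
	// Assume, for illustration, a CPU that supports all three extensions.
	all := cpuFeatures{sse2: true, sse41: true, avx: true}
	fmt.Println(selectImpl(all, os.Getenv))
}
```

Under these assumptions, setting only `BLAKE256_DISABLE_AVX=1` falls back to
SSE4.1, and setting all three variables falls back to the pure Go code.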
## Examples

* [Basic Usage](https://pkg.go.dev/github.com/decred/dcrd/crypto/blake256#example-package-BasicUsage)
  Demonstrates the simplest method of hashing an existing serialized data buffer
  with BLAKE-256.
* [Rolling Hasher Usage](https://pkg.go.dev/github.com/decred/dcrd/crypto/blake256#example-package-RollingHasherUsage)
  Demonstrates creating a rolling BLAKE-256 hasher, writing various data types
  to it, computing the hash, writing more data, and finally computing the
  cumulative hash.
* [Same Process Save and Restore](https://pkg.go.dev/github.com/decred/dcrd/crypto/blake256#example-package-SameProcessSaveRestore)
  Demonstrates creating a rolling BLAKE-256 hasher, writing some data to it,
  making a copy of the intermediate state, restoring the intermediate state in
  multiple goroutines, writing more data to each of those restored copies, and
  computing the final hashes.

## Installation and Updating

This package is part of the `github.com/decred/dcrd/crypto/blake256` module.
Use the standard Go tooling for working with modules to incorporate it.

## License

Package blake256 is licensed under the [copyfree](http://copyfree.org) ISC
License.