blake256
========
[![Build Status](https://github.com/decred/dcrd/workflows/Build%20and%20Test/badge.svg)](https://github.com/decred/dcrd/actions)
[![ISC License](https://img.shields.io/badge/license-ISC-blue.svg)](http://copyfree.org)
[![Doc](https://img.shields.io/badge/doc-reference-blue.svg)](https://pkg.go.dev/github.com/decred/dcrd/crypto/blake256)
## Overview
Package `blake256` implements the [BLAKE-256 and BLAKE-224 cryptographic hash
functions](https://www.aumasson.jp/blake/blake.pdf) (SHA-3 candidate) in pure Go
along with highly optimized SSE2, SSE4.1, and AVX acceleration.
It provides an API that enables zero allocations and the ability to save and
restore the intermediate state (also often called the midstate). The design
philosophy has a strong emphasis on correctness, readability, and efficiency
while also aiming to provide an ergonomic API.
In addition to the zero allocation API, it also implements the standard library
interfaces `hash.Hash`, `encoding.BinaryMarshaler`, and
`encoding.BinaryUnmarshaler` for callers that are not as concerned about
avoiding allocations. No dependencies beyond the standard library are required.
A full suite of tests with 100% branch coverage, along with benchmarks, is
provided to help ensure proper functionality and analyze performance
characteristics.
The core assembly code to take advantage of the `amd64` SIMD vector extensions
is generated with Go via [avo](https://github.com/mmcloughlin/avo).
[Show me the benchmarks already](#benchmarks)!
[Example Usage?](#examples)
## Hashing Data
The simplest way to hash data that is already serialized into bytes is via the
global `Sum224` (BLAKE-224) or `Sum256` (BLAKE-256) functions. This is
demonstrated for BLAKE-256 via the "Basic Usage" example linked in the
[Examples](#examples) section.
However, since hashing typically involves writing various pieces of information
that aren't already serialized, this package provides `NewHasher224` (BLAKE-224)
and `NewHasher256` (BLAKE-256) (and their respective variants `NewHasher224Salt`
and `NewHasher256Salt` that accept salt).
These functions return rolling hasher instances that support writing an arbitrary
amount of data along with several convenience methods for writing various data
types in either big endian or little endian. For example, `WriteString` adds a
string encoded as its UTF-8 byte sequence to the rolling hash and
`WriteUint64BE` adds an unsigned 64-bit integer encoded as an 8-byte big-endian
byte sequence to the rolling hash.
The hash is then obtained via the `Sum224` (BLAKE-224) or `Sum256` (BLAKE-256)
method on the respective hasher instance.
See the "Rolling Hasher Usage" example linked in the [Examples](#examples)
section to see rolling hashing in action.
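To make the encodings described above concrete, the following minimal sketch builds the byte sequences that `WriteString` and `WriteUint64BE` are described as adding to the rolling hash. It deliberately uses only the standard library `encoding/binary` package and does not call this package at all. The helper names `stringBytes`, `uint64BE`, and `uint64LE` are hypothetical and exist only for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// stringBytes returns the UTF-8 byte sequence of s. Go strings are already
// stored as bytes, so this is a plain conversion.
func stringBytes(s string) []byte {
	return []byte(s)
}

// uint64BE returns v encoded as an 8-byte big-endian sequence
// (most significant byte first).
func uint64BE(v uint64) []byte {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], v)
	return buf[:]
}

// uint64LE returns v encoded as an 8-byte little-endian sequence
// (least significant byte first).
func uint64LE(v uint64) []byte {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], v)
	return buf[:]
}

func main() {
	fmt.Printf("%x\n", stringBytes("abc"))
	fmt.Printf("%x\n", uint64BE(0x0102030405060708))
	fmt.Printf("%x\n", uint64LE(0x0102030405060708))
}
```

The two integer encodings contain the same bytes in opposite order, which is why the choice of the big-endian or little-endian convenience method changes the resulting hash.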
## Saving and Resuming Intermediate States
Many applications involve hashing data that always starts with the same sequence
of bytes (aka a shared prefix). Whenever that prefix is larger than the block
size (`BlockSize`), or it is otherwise costly to generate and serialize, it is
typically more efficient to save the intermediate state (midstate) after writing
the shared prefix so that all future hashes can resume from that midstate and
thereby avoid redoing work.
To that end, the aforementioned rolling hasher instances support being copied to
save and restore the current midstate within the same process. This is
demonstrated via the "Same Process Save and Restore" example linked in the
[Examples](#examples) section.
Alternatively, when a simple copy of the instance is not possible, such as when
the midstate is needed among multiple processes, perhaps on entirely different
hardware, it can be serialized via `SaveState` and restored via
`UnmarshalBinary`. Note that there is necessarily additional overhead involved
with serializing and deserializing the intermediate state, so callers should be
sure to compare that overhead with rehashing the shared data to see which
approach yields better results for their particular application.
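The serialize-and-resume flow can be sketched with the standard library alone. The example below is not this package's API: it swaps in `crypto/sha256` purely as a stand-in, because its hashers implement the same `encoding.BinaryMarshaler` and `encoding.BinaryUnmarshaler` interfaces mentioned in the overview. The digests it produces are SHA-256, not BLAKE-256, and the zero-allocation `SaveState` path is not shown. The helper names `saveMidstate`, `resumeSum`, and `matchesFullHash` are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding"
	"errors"
	"fmt"
)

// saveMidstate hashes the shared prefix and returns the serialized
// intermediate state of the hasher.
func saveMidstate(prefix []byte) ([]byte, error) {
	h := sha256.New()
	h.Write(prefix)
	m, ok := h.(encoding.BinaryMarshaler)
	if !ok {
		return nil, errors.New("hasher does not support marshaling")
	}
	return m.MarshalBinary()
}

// resumeSum restores a fresh hasher from the serialized midstate, writes the
// suffix, and returns the final digest.
func resumeSum(midstate, suffix []byte) ([]byte, error) {
	h := sha256.New()
	u, ok := h.(encoding.BinaryUnmarshaler)
	if !ok {
		return nil, errors.New("hasher does not support unmarshaling")
	}
	if err := u.UnmarshalBinary(midstate); err != nil {
		return nil, err
	}
	h.Write(suffix)
	return h.Sum(nil), nil
}

// matchesFullHash reports whether resuming from a saved midstate yields the
// same digest as hashing prefix followed by suffix from scratch.
func matchesFullHash(prefix, suffix []byte) (bool, error) {
	midstate, err := saveMidstate(prefix)
	if err != nil {
		return false, err
	}
	resumed, err := resumeSum(midstate, suffix)
	if err != nil {
		return false, err
	}
	full := sha256.Sum256(append(append([]byte{}, prefix...), suffix...))
	return bytes.Equal(resumed, full[:]), nil
}

func main() {
	ok, err := matchesFullHash([]byte("shared prefix"), []byte("unique suffix"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("resumed digest matches full digest:", ok)
}
```

In a real deployment the serialized midstate would be produced once, shipped to the other processes, and restored there, so only the unique suffix is hashed per message.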
## Hashing With Salt
This implementation also provides `NewHasher224Salt` (BLAKE-224) and
`NewHasher256Salt` (BLAKE-256) which accept a 16-byte salt input as described by
the specification. Each distinct salt effectively selects a different hash
function while reusing the same underlying algorithm, which makes salting an
efficient way to obtain independent functions. Other than incorporating the
salt, the salted variants behave exactly the same as the normal unsalted
variants described throughout the documentation.
## Benchmarks
The following benchmarks are from a Ryzen 7 5800X3D processor on Linux and are
the result of feeding `benchstat` 10 iterations of each. Benchmarks for both
BLAKE-224 and BLAKE-256 are provided. They are essentially identical (within the
margin of error) as expected since the only notable difference as it pertains to
performance is that the final output is 4 bytes shorter.
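The throughput and time-per-operation tables in this section are two views of the same measurements. As a quick sanity check, the sketch below converts a time per operation into MB/s and computes a relative delta. It assumes 1 MB means 10^6 bytes, which is consistent with the published figures, and the function names `throughputMBps` and `deltaPercent` are hypothetical.

```go
package main

import "fmt"

// throughputMBps converts a benchmark result into throughput in MB/s, taking
// 1 MB as 10^6 bytes. Bytes per nanosecond equals GB/s, so the ratio is
// scaled by 1000.
func throughputMBps(inputBytes int, nsPerOp float64) float64 {
	return float64(inputBytes) / nsPerOp * 1000
}

// deltaPercent returns the relative change from before to after as a
// percentage. Negative values mean after is smaller, that is, faster.
func deltaPercent(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Pure Go Sum256 figures taken from the tables in this section.
	fmt.Printf("32 bytes in 190ns: %.0f MB/s\n", throughputMBps(32, 190))
	fmt.Printf("1 KiB in 2.71µs: %.0f MB/s\n", throughputMBps(1024, 2710))

	// MarshalBinary (40.6ns) versus SaveState (16.0ns).
	fmt.Printf("serialization delta: %.1f%%\n", deltaPercent(40.6, 16.0))
}
```

The computed values round to the published 168MB/s and 378MB/s entries and to a serialization delta of roughly -60.6%. Small differences from the `benchstat` output are expected because it works from unrounded measurements.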
### BLAKE-256 Hashing Benchmarks
The following results demonstrate the performance of hashing various amounts of
data for both small and larger inputs with the `Sum256` method.
Operation | Pure Go | SSE2 | SSE4.1 | AVX
-----------------|--------------|--------------|--------------|-------------
`Sum256` (32b) | 168MB/s ± 1% | 188MB/s ± 1% | 232MB/s ± 0% | 234MB/s ± 1%
`Sum256` (64b) | 187MB/s ± 0% | 208MB/s ± 0% | 270MB/s ± 1% | 271MB/s ± 1%
`Sum256` (1KiB) | 378MB/s ± 1% | 421MB/s ± 1% | 536MB/s ± 1% | 539MB/s ± 1%
`Sum256` (8KiB) | 405MB/s ± 1% | 448MB/s ± 0% | 573MB/s ± 0% | 573MB/s ± 0%
`Sum256` (16KiB) | 402MB/s ± 1% | 449MB/s ± 0% | 575MB/s ± 0% | 575MB/s ± 0%

Operation | Pure Go | SSE2 | SSE4.1 | AVX | Allocs / Op
-----------------|-------------|-------------|-------------|-------------|------------
`Sum256` (32b) | 190ns ± 1% | 170ns ± 1% | 138ns ± 0% | 137ns ± 1% | 0
`Sum256` (64b) | 342ns ± 0% | 308ns ± 0% | 237ns ± 1% | 236ns ± 1% | 0
`Sum256` (1KiB) | 2.71µs ± 1% | 2.43µs ± 1% | 1.91µs ± 1% | 1.90µs ± 1% | 0
`Sum256` (8KiB) | 20.2µs ± 1% | 18.3µs ± 0% | 14.3µs ± 0% | 14.3µs ± 0% | 0
`Sum256` (16KiB) | 40.8µs ± 1% | 36.5µs ± 0% | 28.5µs ± 0% | 28.5µs ± 0% | 0
### BLAKE-224 Hashing Benchmarks
The following results demonstrate the performance of hashing various amounts of
data for both small and larger inputs with the `Sum224` method.
Operation | Pure Go | SSE2 | SSE4.1 | AVX
-----------------|--------------|--------------|--------------|-------------
`Sum224` (32b) | 171MB/s ± 1% | 188MB/s ± 1% | 232MB/s ± 1% | 234MB/s ± 1%
`Sum224` (64b) | 187MB/s ± 2% | 209MB/s ± 1% | 269MB/s ± 1% | 271MB/s ± 1%
`Sum224` (1KiB) | 378MB/s ± 1% | 423MB/s ± 1% | 539MB/s ± 1% | 536MB/s ± 1%
`Sum224` (8KiB) | 404MB/s ± 1% | 447MB/s ± 1% | 577MB/s ± 1% | 577MB/s ± 0%
`Sum224` (16KiB) | 401MB/s ± 1% | 453MB/s ± 0% | 577MB/s ± 0% | 577MB/s ± 0%

Operation | Pure Go | SSE2 | SSE4.1 | AVX | Allocs / Op
-----------------|-------------|-------------|-------------|-------------|------------
`Sum224` (32b) | 187ns ± 1% | 170ns ± 1% | 138ns ± 1% | 137ns ± 1% | 0
`Sum224` (64b) | 342ns ± 2% | 306ns ± 1% | 238ns ± 1% | 236ns ± 1% | 0
`Sum224` (1KiB) | 2.71µs ± 1% | 2.42µs ± 1% | 1.90µs ± 1% | 1.91µs ± 1% | 0
`Sum224` (8KiB) | 20.3µs ± 1% | 18.3µs ± 1% | 14.2µs ± 1% | 14.2µs ± 0% | 0
`Sum224` (16KiB) | 40.9µs ± 1% | 36.2µs ± 0% | 28.4µs ± 0% | 28.4µs ± 0% | 0
### State Serialization Benchmarks
The following results demonstrate the performance of serializing the
intermediate state for both BLAKE-224 and BLAKE-256 using the zero-alloc
`SaveState` method versus the `MarshalBinary` method of the standard library
`encoding.BinaryMarshaler` interface.
Metric | `MarshalBinary` | `SaveState` | Delta
------------|-----------------|-------------|---------------------------
Time / Op | 40.6ns ± 1% | 16.0ns ± 0% | -60.60% (p=0.000 n=10+10)
Allocs / Op | 1 | 0 | -100.00% (p=0.000 n=10+10)
## Disabling Assembler Optimizations
The `purego` build tag may be used to disable all assembly code.
Additionally, when built normally without the `purego` build tag, the assembly
optimizations for each of the supported vector extensions can individually be
disabled at runtime by setting the following environment variables to `1`.
* `BLAKE256_DISABLE_AVX=1`: Disable Advanced Vector Extensions (AVX) optimizations
* `BLAKE256_DISABLE_SSE41=1`: Disable Streaming SIMD Extensions 4.1 (SSE4.1) optimizations
* `BLAKE256_DISABLE_SSE2=1`: Disable Streaming SIMD Extensions 2 (SSE2) optimizations
The package will automatically use the fastest available extensions that are not
disabled.
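One way to compare implementations is to run a benchmark as a child process with some of these variables set. The sketch below only constructs such a command and prints it without running it. The `go test` invocation, its arguments, and the `withDisabled` helper are illustrative assumptions rather than anything provided by this package.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// withDisabled returns a copy of env with BLAKE256_DISABLE_<ext>=1 appended
// for each named extension, for example "AVX", "SSE41", or "SSE2".
func withDisabled(env []string, extensions ...string) []string {
	out := append([]string{}, env...)
	for _, ext := range extensions {
		out = append(out, "BLAKE256_DISABLE_"+ext+"=1")
	}
	return out
}

func main() {
	// Hypothetical benchmark run with AVX and SSE4.1 disabled. On hardware
	// that supports SSE2, the package would then be expected to fall back to
	// the SSE2 implementation.
	cmd := exec.Command("go", "test", "-run=^$", "-bench=.")
	cmd.Env = withDisabled(os.Environ(), "AVX", "SSE41")

	fmt.Println("command:", cmd.Args)
	fmt.Println("added:", cmd.Env[len(cmd.Env)-2:])
}
```

Setting the variables on a child process rather than in the running program avoids depending on when the package inspects its environment.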
## Examples
* [Basic Usage](https://pkg.go.dev/github.com/decred/dcrd/crypto/blake256#example-package-BasicUsage)
Demonstrates the simplest method of hashing an existing serialized data buffer
with BLAKE-256.
* [Rolling Hasher Usage](https://pkg.go.dev/github.com/decred/dcrd/crypto/blake256#example-package-RollingHasherUsage)
Demonstrates creating a rolling BLAKE-256 hasher, writing various data types
to it, computing the hash, writing more data, and finally computing the
cumulative hash.
* [Same Process Save and Restore](https://pkg.go.dev/github.com/decred/dcrd/crypto/blake256#example-package-SameProcessSaveRestore)
Demonstrates creating a rolling BLAKE-256 hasher, writing some data to it,
making a copy of the intermediate state, restoring the intermediate state in
multiple goroutines, writing more data to each of those restored copies, and
computing the final hashes.
## Installation and Updating
This package is part of the `github.com/decred/dcrd/crypto/blake256` module.
Use the standard Go tooling for working with modules to incorporate it.
## License
Package blake256 is licensed under the [copyfree](http://copyfree.org) ISC
License.