Asara 1dbe1dc2c7 replace context with caller for zerolog, add vendor

2024-09-16 21:01:50 -04:00

blake256

Overview

Package blake256 implements the BLAKE-256 and BLAKE-224 cryptographic hash functions (SHA-3 candidate) in pure Go along with highly optimized SSE2, SSE4.1, and AVX acceleration.

It provides an API that enables zero allocations and the ability to save and restore the intermediate state (also often called the midstate). The design philosophy has a strong emphasis on correctness, readability, and efficiency while also aiming to provide an ergonomic API.

In addition to the zero allocation API, it also implements the standard library interfaces hash.Hash, encoding.BinaryMarshaler, and encoding.BinaryUnmarshaler for callers that are not as concerned about avoiding allocations. No dependencies beyond the standard library are required.

A full test suite with 100% branch coverage, along with benchmarks, is provided to help ensure proper functionality and analyze performance characteristics.

The core assembly code to take advantage of the amd64 SIMD vector extensions is generated with Go via avo.

Show me the benchmarks already!

Example Usage?

Hashing Data

The simplest way to hash data that is already serialized into bytes is via the global Sum224 (BLAKE-224) or Sum256 (BLAKE-256) functions. This is demonstrated for BLAKE-256 via the "Basic Usage" example linked in the Examples section.

However, since hashing typically involves writing various pieces of information that aren't already serialized, this package provides NewHasher224 (BLAKE-224) and NewHasher256 (BLAKE-256) (and their respective variants NewHasher224Salt and NewHasher256Salt that accept salt).

These functions return rolling hasher instances that support writing an arbitrary amount of data along with several convenience methods for writing various data types in either big endian or little endian. For example, WriteString adds a string encoded as its UTF-8 byte sequence to the rolling hash and WriteUint64BE adds an unsigned 64-bit integer encoded as an 8-byte big-endian byte sequence to the rolling hash.

The hash is then obtained via the Sum224 (BLAKE-224) or Sum256 (BLAKE-256) method on the respective hasher instance.

See the "Rolling Hasher Usage" example linked in the Examples section to see rolling hashing in action.
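
To make the encoding rules above concrete, here is a minimal standard-library sketch of what writing a string and a big-endian 64-bit integer to a rolling hasher amounts to. It deliberately uses crypto/sha256 as a stand-in hasher so that it runs without any dependencies; the helper names (hashPieces, hashSerialized, sameDigest) are hypothetical and are not part of this package.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// hashPieces writes a string and a uint64 to a rolling hasher one piece at a
// time. NOTE: crypto/sha256 is only a stand-in for the hasher (assumption).
func hashPieces(name string, height uint64) []byte {
	h := sha256.New()

	// Same idea as WriteString: the raw UTF-8 bytes of the string.
	h.Write([]byte(name))

	// Same idea as WriteUint64BE: 8 bytes, most significant byte first.
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], height)
	h.Write(buf[:])

	return h.Sum(nil)
}

// hashSerialized builds the identical byte sequence up front and hashes it in
// a single call, which is the extra work a rolling hasher avoids.
func hashSerialized(name string, height uint64) []byte {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], height)
	serialized := append([]byte(name), buf[:]...)
	sum := sha256.Sum256(serialized)
	return sum[:]
}

// sameDigest reports whether both approaches produce the same digest.
func sameDigest(name string, height uint64) bool {
	return bytes.Equal(hashPieces(name, height), hashSerialized(name, height))
}

func main() {
	fmt.Println(sameDigest("block", 12345))
}
```

Both paths feed the same bytes to the hash function, so the digests agree. The rolling form simply skips the intermediate buffer.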

Saving and Resuming Intermediate States

Many applications involve hashing data that always starts with the same sequence of bytes (aka a shared prefix). Whenever that prefix is larger than the block size (BlockSize), or it is otherwise costly to generate and serialize, it is typically more efficient to save the intermediate state (midstate) after writing the shared prefix so that all future hashes can resume from that midstate and thereby avoid redoing work.

To that end, the aforementioned rolling hasher instances support being copied to save and restore the current midstate within the same process. This is demonstrated via the "Same Process Save and Restore" example linked in the Examples section.
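
The idea of saving a midstate by copying can be sketched with a toy hasher. The example below assumes that "copying" means a plain value copy of the hasher struct. The toyHasher type is hypothetical: it is a simple FNV-1a style accumulator used only to show the copy semantics, and it is neither BLAKE-256 nor cryptographically secure.

```go
package main

import "fmt"

// toyHasher is a hypothetical FNV-1a style rolling hasher that only serves to
// illustrate value-copy semantics. It is NOT BLAKE-256 and NOT secure.
type toyHasher struct {
	state uint64
}

func newToyHasher() *toyHasher {
	return &toyHasher{state: 14695981039346656037} // FNV-1a 64-bit offset basis
}

// WriteString folds each byte of s into the running state.
func (h *toyHasher) WriteString(s string) {
	for i := 0; i < len(s); i++ {
		h.state ^= uint64(s[i])
		h.state *= 1099511628211 // FNV-1a 64-bit prime
	}
}

func (h *toyHasher) Sum64() uint64 { return h.state }

// resumeFrom receives the saved midstate by value, so appending the suffix
// never modifies the caller's saved copy.
func resumeFrom(midstate toyHasher, suffix string) uint64 {
	h := midstate
	h.WriteString(suffix)
	return h.Sum64()
}

// hashWhole hashes prefix followed by suffix from scratch for comparison.
func hashWhole(prefix, suffix string) uint64 {
	h := newToyHasher()
	h.WriteString(prefix)
	h.WriteString(suffix)
	return h.Sum64()
}

// copyMatchesRehash hashes the prefix once, saves the midstate with a value
// copy, and checks that resuming from it equals rehashing everything.
func copyMatchesRehash(prefix, suffix string) bool {
	h := newToyHasher()
	h.WriteString(prefix)
	saved := *h // save the midstate
	return resumeFrom(saved, suffix) == hashWhole(prefix, suffix)
}

func main() {
	for _, suffix := range []string{"a", "b"} {
		fmt.Println(copyMatchesRehash("shared prefix|", suffix))
	}
}
```

Because the saved value is never mutated, it can be reused for any number of suffixes, including from multiple goroutines that each work on their own copy.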

Alternatively, when a simple copy of the instance is not possible, such as when the midstate is needed among multiple processes, perhaps on entirely different hardware, it can be serialized via SaveState and restored via UnmarshalBinary. Note that there is necessarily additional overhead involved with serializing and deserializing the intermediate state, so callers should be sure to compare that overhead with rehashing the shared data to see which approach yields better results for their particular application.
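
The serialize-and-restore flow can be sketched with nothing but the standard library interfaces mentioned in the overview (encoding.BinaryMarshaler and encoding.BinaryUnmarshaler). The sketch below uses crypto/sha256 as a stand-in hasher because it implements the same two interfaces and needs no dependencies; it does not call SaveState, whose signature is not shown in this document. All helper names are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding"
	"errors"
	"fmt"
)

// saveMidstate hashes the shared prefix and serializes the hasher state.
// NOTE: crypto/sha256 is only a stand-in hasher here (assumption).
func saveMidstate(prefix []byte) ([]byte, error) {
	h := sha256.New()
	h.Write(prefix)
	m, ok := h.(encoding.BinaryMarshaler)
	if !ok {
		return nil, errors.New("hasher cannot be serialized")
	}
	return m.MarshalBinary()
}

// resumeAndSum restores a serialized midstate into a fresh hasher, possibly
// in another process, then writes the suffix and returns the final digest.
func resumeAndSum(midstate, suffix []byte) ([]byte, error) {
	h := sha256.New()
	u, ok := h.(encoding.BinaryUnmarshaler)
	if !ok {
		return nil, errors.New("hasher cannot be restored")
	}
	if err := u.UnmarshalBinary(midstate); err != nil {
		return nil, err
	}
	h.Write(suffix)
	return h.Sum(nil), nil
}

// roundTripMatches checks that resuming from the serialized midstate gives
// the same digest as hashing prefix followed by suffix from scratch.
func roundTripMatches(prefix, suffix []byte) bool {
	midstate, err := saveMidstate(prefix)
	if err != nil {
		return false
	}
	got, err := resumeAndSum(midstate, suffix)
	if err != nil {
		return false
	}
	full := append(append([]byte{}, prefix...), suffix...)
	want := sha256.Sum256(full)
	return bytes.Equal(got, want[:])
}

func main() {
	fmt.Println(roundTripMatches([]byte("shared prefix"), []byte("unique suffix")))
}
```

The serialized bytes are an opaque snapshot of the hasher. Whether shipping them is cheaper than rehashing the prefix depends on the prefix length, which is why measuring both is worthwhile.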

Hashing With Salt

This implementation also provides NewHasher224Salt (BLAKE-224) and NewHasher256Salt (BLAKE-256), which accept a 16-byte salt input as described by the specification. Hashing with distinct salts is an efficient way to obtain different hash functions from the same underlying algorithm. Apart from the salt, the salted variants behave exactly the same as the normal unsalted variants described throughout the documentation.

Benchmarks

The following benchmarks are from a Ryzen 7 5800X3D processor on Linux and are the result of feeding benchstat 10 iterations of each. Benchmarks for both BLAKE-224 and BLAKE-256 are provided. They are essentially identical (within the margin of error) as expected since the only notable difference as it pertains to performance is that the final output is 4 bytes shorter.

BLAKE-256 Hashing Benchmarks

The following results demonstrate the performance of hashing various amounts of data for both small and larger inputs with the Sum256 method.

Operation	Pure Go	SSE2	SSE4.1	AVX
`Sum256` (32b)	168MB/s ± 1%	188MB/s ± 1%	232MB/s ± 0%	234MB/s ± 1%
`Sum256` (64b)	187MB/s ± 0%	208MB/s ± 0%	270MB/s ± 1%	271MB/s ± 1%
`Sum256` (1KiB)	378MB/s ± 1%	421MB/s ± 1%	536MB/s ± 1%	539MB/s ± 1%
`Sum256` (8KiB)	405MB/s ± 1%	448MB/s ± 0%	573MB/s ± 0%	573MB/s ± 0%
`Sum256` (16KiB)	402MB/s ± 1%	449MB/s ± 0%	575MB/s ± 0%	575MB/s ± 0%

Operation	Pure Go	SSE2	SSE4.1	AVX
`Sum256` (32b)	190ns ± 1%	170ns ± 1%	138ns ± 0%	137ns ± 1%
`Sum256` (64b)	342ns ± 0%	308ns ± 0%	237ns ± 1%	236ns ± 1%
`Sum256` (1KiB)	2.71µs ± 1%	2.43µs ± 1%	1.91µs ± 1%	1.90µs ± 1%
`Sum256` (8KiB)	20.2µs ± 1%	18.3µs ± 0%	14.3µs ± 0%	14.3µs ± 0%
`Sum256` (16KiB)	40.8µs ± 1%	36.5µs ± 0%	28.5µs ± 0%	28.5µs ± 0%
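
The throughput and time-per-operation tables are two views of the same measurements. Go benchmark output reports MB/s using decimal megabytes (1e6 bytes), so one can be derived from the other. The following sketch shows the conversion; throughputMBps is a hypothetical helper written for this illustration.

```go
package main

import "fmt"

// throughputMBps converts an input size and a per-operation time into
// throughput in decimal megabytes per second (1 MB = 1e6 bytes).
// bytes per nanosecond equals GB/s, so multiplying by 1e3 yields MB/s.
func throughputMBps(bytesPerOp int, nsPerOp float64) float64 {
	return float64(bytesPerOp) / nsPerOp * 1e3
}

func main() {
	// Pure Go timings taken from the table above.
	fmt.Printf("32 B:   %.0f MB/s\n", throughputMBps(32, 190))
	fmt.Printf("1 KiB:  %.0f MB/s\n", throughputMBps(1024, 2710))
	fmt.Printf("16 KiB: %.0f MB/s\n", throughputMBps(16384, 40800))
}
```

For example, 32 bytes in 190 ns works out to roughly 168 MB/s, which matches the Pure Go column of the throughput table.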

BLAKE-224 Hashing Benchmarks

The following results demonstrate the performance of hashing various amounts of data for both small and larger inputs with the Sum224 method.

Operation	Pure Go	SSE2	SSE4.1	AVX
`Sum224` (32b)	171MB/s ± 1%	188MB/s ± 1%	232MB/s ± 1%	234MB/s ± 1%
`Sum224` (64b)	187MB/s ± 2%	209MB/s ± 1%	269MB/s ± 1%	271MB/s ± 1%
`Sum224` (1KiB)	378MB/s ± 1%	423MB/s ± 1%	539MB/s ± 1%	536MB/s ± 1%
`Sum224` (8KiB)	404MB/s ± 1%	447MB/s ± 1%	577MB/s ± 1%	577MB/s ± 0%
`Sum224` (16KiB)	401MB/s ± 1%	453MB/s ± 0%	577MB/s ± 0%	577MB/s ± 0%

Operation	Pure Go	SSE2	SSE4.1	AVX
`Sum224` (32b)	187ns ± 1%	170ns ± 1%	138ns ± 1%	137ns ± 1%
`Sum224` (64b)	342ns ± 2%	306ns ± 1%	238ns ± 1%	236ns ± 1%
`Sum224` (1KiB)	2.71µs ± 1%	2.42µs ± 1%	1.90µs ± 1%	1.91µs ± 1%
`Sum224` (8KiB)	20.3µs ± 1%	18.3µs ± 1%	14.2µs ± 1%	14.2µs ± 0%
`Sum224` (16KiB)	40.9µs ± 1%	36.2µs ± 0%	28.4µs ± 0%	28.4µs ± 0%

State Serialization Benchmarks

The following results demonstrate the performance of serializing the intermediate state for both BLAKE-224 and BLAKE-256 using the zero-alloc SaveState method versus the MarshalBinary method of the standard library encoding.BinaryMarshaler interface.

Metric	`MarshalBinary`	`SaveState`	Delta
Time / Op	40.6ns ± 1%	16.0ns ± 0%	-60.60% (p=0.000 n=10+10)
Allocs / Op	1	0	-100.00% (p=0.000 n=10+10)

Disabling Assembler Optimizations

The purego build tag may be used to disable all assembly code.

Additionally, when built normally without the purego build tag, the assembly optimizations for each of the supported vector extensions can individually be disabled at runtime by setting the following environment variables to 1.

BLAKE256_DISABLE_AVX=1: Disable Advanced Vector Extensions (AVX) optimizations
BLAKE256_DISABLE_SSE41=1: Disable Streaming SIMD Extensions 4.1 (SSE4.1) optimizations
BLAKE256_DISABLE_SSE2=1: Disable Streaming SIMD Extensions 2 (SSE2) optimizations

The package will automatically use the fastest available extensions that are not disabled.
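
The selection rule described above can be sketched as follows. This is a hypothetical illustration, not the package's actual code: the selectImpl function, its parameters, and the preference order AVX, then SSE4.1, then SSE2 are assumptions made for the example, and real CPU feature detection is left out.

```go
package main

import (
	"fmt"
	"os"
)

// selectImpl is a hypothetical sketch of picking the fastest implementation
// that the CPU supports and that has not been disabled via the environment.
// The preference order AVX > SSE4.1 > SSE2 > Pure Go is an assumption.
func selectImpl(hasAVX, hasSSE41, hasSSE2 bool, getenv func(string) string) string {
	// An extension counts as disabled only when its variable is set to "1".
	disabled := func(name string) bool { return getenv(name) == "1" }

	switch {
	case hasAVX && !disabled("BLAKE256_DISABLE_AVX"):
		return "AVX"
	case hasSSE41 && !disabled("BLAKE256_DISABLE_SSE41"):
		return "SSE4.1"
	case hasSSE2 && !disabled("BLAKE256_DISABLE_SSE2"):
		return "SSE2"
	}
	return "Pure Go"
}

func main() {
	// Pretend the CPU supports every extension; only the environment decides.
	fmt.Println(selectImpl(true, true, true, os.Getenv))
}
```

Passing the environment lookup as a function keeps the sketch easy to exercise with different settings without touching the real process environment.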

Examples

Basic Usage
Demonstrates the simplest method of hashing an existing serialized data buffer with BLAKE-256.
Rolling Hasher Usage
Demonstrates creating a rolling BLAKE-256 hasher, writing various data types to it, computing the hash, writing more data, and finally computing the cumulative hash.
Same Process Save and Restore
Demonstrates creating a rolling BLAKE-256 hasher, writing some data to it, making a copy of the intermediate state, restoring the intermediate state in multiple goroutines, writing more data to each of those restored copies, and computing the final hashes.

Installation and Updating

This package is part of the github.com/decred/dcrd/crypto/blake256 module. Use the standard Go tooling for working with modules to incorporate it.

License

Package blake256 is licensed under the copyfree ISC License.
