pax_global_header00006660000000000000000000000064137756626340014535gustar00rootroot0000000000000052 comment=776275e0c9a74ceebbd50fe5c1d61b0c80c608df md5-simd-1.1.2/000077500000000000000000000000001377566263400131555ustar00rootroot00000000000000md5-simd-1.1.2/LICENSE000066400000000000000000000261361377566263400141720ustar00rootroot00000000000000 Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). 
"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. 
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the 
Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. 
Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. 
To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. md5-simd-1.1.2/LICENSE.Golang000066400000000000000000000027071377566263400153760ustar00rootroot00000000000000Copyright (c) 2009 The Go Authors. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Google Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
md5-simd-1.1.2/README.md000066400000000000000000000256471377566263400144520ustar00rootroot00000000000000

# md5-simd

This is a SIMD accelerated MD5 package, allowing up to either 8 (AVX2) or 16 (AVX512) independent MD5 sums to be calculated on a single CPU core.

It was originally based on the [md5vec](https://github.com/igneous-systems/md5vec) repository by Igneous Systems, but has been made more flexible by, among other things, supporting different message sizes per lane and adding AVX512.

`md5-simd` integrates a mechanism similar to the one described in [minio/sha256-simd](https://github.com/minio/sha256-simd#support-for-avx512) to make it easy for clients to take advantage of the parallel nature of the MD5 calculation. This will result in reduced overall CPU load.

It is important to understand that `md5-simd` **does not speed up** a single threaded MD5 hash sum.
Rather it allows multiple __independent__ MD5 sums to be computed in parallel on the same CPU core, thereby making more efficient usage of the computing resources.
## Usage

[![Documentation](https://godoc.org/github.com/minio/md5-simd?status.svg)](https://pkg.go.dev/github.com/minio/md5-simd?tab=doc)

In order to use `md5-simd`, you must first create a `Server` which can be used to instantiate one or more objects for MD5 hashing.

These objects conform to the regular [`hash.Hash`](https://pkg.go.dev/hash?tab=doc#Hash) interface and as such the normal Write/Reset/Sum functionality works as expected.

As an example:
```
    // Create server
    server := md5simd.NewServer()
    defer server.Close()

    // Create hashing object (conforming to hash.Hash)
    md5Hash := server.NewHash()
    defer md5Hash.Close()

    // Write one (or more) blocks
    md5Hash.Write(block)

    // Return digest
    digest := md5Hash.Sum([]byte{})
```

To maintain performance, both a [Server](https://pkg.go.dev/github.com/minio/md5-simd?tab=doc#Server) and each individual [Hasher](https://pkg.go.dev/github.com/minio/md5-simd?tab=doc#Hasher) should be closed using the `Close()` function when no longer needed.

A Hasher can efficiently be re-used by calling [`Reset()`](https://pkg.go.dev/hash?tab=doc#Hash).

If your system does not support the required instructions, the package falls back to using `crypto/md5` for hashing.

## Limitations

As explained above, `md5-simd` does not speed up an individual MD5 hash sum computation, unless some hierarchical tree construct is used, but this will result in different outcomes.
Running a single hash on a server results in approximately half the throughput.

Instead, it allows running multiple MD5 calculations in parallel on a single CPU core. This can be beneficial in e.g. multi-threaded server applications where many goroutines are dealing with many requests and multiple MD5 calculations can be packed/scheduled for parallel execution on a single core.

This will result in a lower overall CPU usage as compared to using the standard `crypto/md5` functionality where each MD5 hash computation will consume a single thread (core).
It is best to test and measure the overall CPU usage in a representative usage scenario in your application to get an overall understanding of the benefits of `md5-simd` as compared to `crypto/md5`, ideally under heavy CPU load.

Also note that `md5-simd` works best with large objects, so if your application only hashes small objects of a few kilobytes you may be better off using `crypto/md5`.

## Performance

For the best performance writes should be a multiple of 64 bytes, ideally a multiple of 32KB.
To help with that a [`buffered := bufio.NewWriterSize(hasher, 32<<10)`](https://golang.org/pkg/bufio/#NewWriterSize) can be inserted if you are unsure of the sizes of the writes.
Remember to [flush](https://golang.org/pkg/bufio/#Writer.Flush) `buffered` before reading the hash.

A single 'server' can process 16 streams concurrently with 1 core (AVX-512) or 2 cores (AVX2).
In situations where it is likely that more than 16 streams are fully loaded it may be beneficial to use multiple servers.
The following chart compares the multi-core performance of `crypto/md5`, the AVX2 code, and the AVX512 code:

![md5-performance-overview](chart/Multi-core-MD5-Aggregated-Hashing-Performance.png)

Compared to `crypto/md5`, the AVX2 version is up to 4x faster:

```
$ benchcmp crypto-md5.txt avx2.txt
benchmark                     old MB/s     new MB/s     speedup
BenchmarkParallel/32KB-4      2229.22      7370.50      3.31x
BenchmarkParallel/64KB-4      2233.61      8248.46      3.69x
BenchmarkParallel/128KB-4     2235.43      8660.74      3.87x
BenchmarkParallel/256KB-4     2236.39      8863.87      3.96x
BenchmarkParallel/512KB-4     2238.05      8985.39      4.01x
BenchmarkParallel/1MB-4       2233.56      9042.62      4.05x
BenchmarkParallel/2MB-4       2224.11      9014.46      4.05x
BenchmarkParallel/4MB-4       2199.78      8993.61      4.09x
BenchmarkParallel/8MB-4       2182.48      8748.22      4.01x
```

Compared to `crypto/md5`, the AVX512 version is up to 8x faster (for larger block sizes):

```
$ benchcmp crypto-md5.txt avx512.txt
benchmark                     old MB/s     new MB/s     speedup
BenchmarkParallel/32KB-4      2229.22      11605.78     5.21x
BenchmarkParallel/64KB-4      2233.61      14329.65     6.42x
BenchmarkParallel/128KB-4     2235.43      16166.39     7.23x
BenchmarkParallel/256KB-4     2236.39      15570.09     6.96x
BenchmarkParallel/512KB-4     2238.05      16705.83     7.46x
BenchmarkParallel/1MB-4       2233.56      16941.95     7.59x
BenchmarkParallel/2MB-4       2224.11      17136.01     7.70x
BenchmarkParallel/4MB-4       2199.78      17218.61     7.83x
BenchmarkParallel/8MB-4       2182.48      17252.88     7.91x
```

These measurements were performed on an AWS EC2 instance of type `c5.xlarge` equipped with a Xeon Platinum 8124M CPU at 3.0 GHz.

If only one or two inputs are available the scalar calculation method will be used for the optimal speed in these cases.

## Operation

To make operation as easy as possible there is a “Server” coordinating everything. The server keeps track of individual hash states and updates them as new data comes in. This can be visualized as follows:

![server-architecture](chart/server-architecture.png)

The data is sent to the server from each hash input in blocks of up to 32KB per round.
In our testing we found this to be the block size that yielded the best results.

Whenever there is data available the server will collect data for up to 16 hashes and process all 16 lanes in parallel.
This means that if 16 hashes have data available all the lanes will be filled.
However, since that may not be the case, the server will fill fewer lanes and do a round anyway.
Lanes can also be partially filled if less than 32KB of data is written.

![server-lanes-example](chart/server-lanes-example.png)

In this example 4 lanes are fully filled and 2 lanes are partially filled.
In this case the black areas will simply be masked out from the results and ignored.
This is also why calculating a single hash on a server will not result in any speedup and hash writes should be a multiple of 32KB for the best performance.

For AVX512 all 16 calculations will be done on a single core; on AVX2 they are done on 2 cores if there is data for more than 8 lanes.
So for optimal usage there should be data available for all 16 hashes. It may be perfectly reasonable to use more than 16 concurrent hashes.

## Design & Tech

md5-simd has both an AVX2 (8-lane parallel) and an AVX512 (16-lane parallel) algorithm to accelerate the computation, with the following function definitions:
```
//go:noescape
func block8(state *uint32, base uintptr, bufs *int32, cache *byte, n int)

//go:noescape
func block16(state *uint32, ptrs *int64, mask uint64, n int)
```

The AVX2 version is based on the [md5vec](https://github.com/igneous-systems/md5vec) repository and is essentially unchanged except for minor (cosmetic) changes.

The AVX512 version is derived from the AVX2 version but adds some further optimizations and simplifications.

### Caching in upper ZMM registers

The AVX2 version passes in a `cache8` block of memory (about 0.5 KB) for temporary storage of intermediate results during `ROUND1` which are subsequently used during `ROUND2` through to `ROUND4`.
Since AVX512 has double the number of registers (32 ZMM registers as compared to 16 YMM registers), it is possible to use the upper 16 ZMM registers for keeping the intermediate states on the CPU. As such, there is no need to pass in a corresponding `cache16` into the AVX512 block function.

### Direct loading using 64-bit pointers

The AVX2 version uses the `VPGATHERDD` instruction (for YMM) to do a parallel load of 8 lanes using (8 independent) 32-bit offsets. Since there is no control over how the 8 slices that are passed into the (Golang) `blockMd5` function are laid out in memory, it is not possible to derive a "base" address and corresponding offsets (all within 32-bits) for all 8 slices.

As such the AVX2 version uses an interim buffer to collect the byte slices to be hashed from all 8 input slices and passes this buffer along with (fixed) 32-bit offsets into the assembly code.

For the AVX512 version this interim buffer is not needed since the AVX512 code uses a pair of `VPGATHERQD` instructions to directly dereference 64-bit pointers (from a base register address that is initialized to zero).

Note that two load (gather) instructions are needed because the AVX512 version processes 16 lanes in parallel, requiring 16 times 64-bit = 1024 bits in total to be loaded. A simple `VALIGND` and `VPORD` are subsequently used to merge the lower and upper halves together into a single ZMM register (that contains 16 lanes of 32-bit DWORDS).

### Masking support

Due to the fact that pointers are passed directly from the Golang slices, we need to protect against NULL pointers.
For this a 16-bit mask is passed in the AVX512 assembly code which is used during the `VPGATHERQD` instructions to mask out lanes that could otherwise result in segmentation violations.

### Minor optimizations

The `roll` macro (three instructions on AVX2) is no longer needed for AVX512 and is replaced by a single `VPROLD` instruction.
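To illustrate the rotate replacement in scalar terms: a rotate without a dedicated instruction is typically built from two shifts and an OR, which is presumably what the three-instruction `roll` macro does per lane (an assumption; the AVX2 macro body is not shown here). The sketch below uses hypothetical helper names and checks that the composed form matches a single rotate for the MD5 shift amounts.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotlThreeOps rotates x left by s bits (0 < s < 32) using two shifts and an
// OR. This mirrors, per lane, how a rotate is commonly composed when no
// rotate instruction exists; it is an assumption about the roll macro.
func rotlThreeOps(x uint32, s uint) uint32 {
	return (x << s) | (x >> (32 - s))
}

// rotlSingle performs the same rotation in one operation, analogous to what
// a single vector rotate instruction provides per 32-bit lane.
func rotlSingle(x uint32, s uint) uint32 {
	return bits.RotateLeft32(x, int(s))
}

func main() {
	x := uint32(0x80000001)
	// Shift amounts used in the first MD5 round.
	for _, s := range []uint{7, 12, 17, 22} {
		fmt.Printf("s=%2d three-op=%08x single=%08x\n", s, rotlThreeOps(x, s), rotlSingle(x, s))
	}
}
```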
Also several logical operations from the various ROUNDS of the AVX2 version could be combined into a single instruction using ternary logic (with the `VPTERNLOGD` instruction), resulting in a further simplification and speed-up.

## Low level block function performance

The benchmark below shows the (single thread) maximum performance of the `block()` function for AVX2 (having 8 lanes) and AVX512 (having 16 lanes). Also the baseline single-core performance from the standard `crypto/md5` package is shown for comparison.

```
BenchmarkCryptoMd5-4      687.66 MB/s    0 B/op    0 allocs/op
BenchmarkBlock8-4        4144.80 MB/s    0 B/op    0 allocs/op
BenchmarkBlock16-4       8228.88 MB/s    0 B/op    0 allocs/op
```

## License

`md5-simd` is released under the Apache License v2.0. You can find the complete text in the file LICENSE.

## Contributing

Contributions are welcome, please send PRs for any enhancements.
md5-simd-1.1.2/_gen/000077500000000000000000000000001377566263400140655ustar00rootroot00000000000000md5-simd-1.1.2/_gen/gen.go000066400000000000000000000175441377566263400152000ustar00rootroot00000000000000
package main

//go:generate go run gen.go -out ../md5block_amd64.s -stubs ../md5block_amd64.go -pkg=md5simd

import (
	x "github.com/mmcloughlin/avo/build"
	"github.com/mmcloughlin/avo/buildtags"
	o "github.com/mmcloughlin/avo/operand"
	"github.com/mmcloughlin/avo/reg"
)

// AMD:
// 2025 BMI2 :RORX r32, r32, r32 L: 0.29ns= 1.0c T: 0.15ns= 0.50c
// 271 X86 :ROL r32, imm8 L: 0.29ns= 1.0c T: 0.15ns= 0.50c
//
// INTEL:
// 271 X86 :ROL r32, imm8 L: 0.27ns= 1.0c T: 0.14ns= 0.50c
// 2025 BMI2 :RORX r32, r32, r32 L: 0.27ns= 1.0c T: 0.14ns= 0.50c

// Neither appear to have any gains
// Don't bother with BMI2
const useROLX = false

func ROLL(imm int, gpr reg.GPVirtual) {
	if useROLX {
		x.RORXL(o.U8(32-imm), gpr, gpr)
	} else {
		x.ROLL(o.U8(imm), gpr)
	}
}

// AMD:
// 154 X86 :XOR r32, r32 L: 0.06ns= 0.2c T: 0.06ns= 0.25c
// 166 X86 :NOT r32 L: 0.26ns= 1.0c T: 0.11ns= 0.43c
//
// INTEL:
// Inst 166 X86 : NOT r32 L: 0.45ns= 1.0c T: 0.11ns= 0.25c
// Inst 154 X86 : XOR r32, r32 L: 0.11ns= 0.2c T: 0.11ns= 0.25c
func NOTL(gpr, ones reg.GPVirtual) {
	// Use XOR
	if false {
		x.NOTL(gpr)
	} else {
		x.XORL(ones, gpr)
	}
}

func main() {
	x.Constraint(buildtags.Not("appengine").ToConstraint())
	x.Constraint(buildtags.Not("noasm").ToConstraint())
	x.Constraint(buildtags.Term("gc").ToConstraint())
	x.TEXT("blockScalar", 0, "func(dig *[4]uint32, p []byte)")
	x.Doc("Encode p to digest")
	x.Pragma("noescape")
	srcLen := x.Load(x.Param("p").Len(), x.GP64())
	digest := x.Load(x.Param("dig"), x.GP64())
	src := x.Load(x.Param("p").Base(), x.GP64())
	x.SHRQ(o.U8(6), srcLen)
	x.SHLQ(o.U8(6), srcLen)
	end := x.GP64()
	x.LEAQ(o.Mem{Base: src, Index: srcLen, Scale: 1}, end)
	x.CMPQ(src, end)
	x.JEQ(o.LabelRef("end"))

	var dig [4]reg.GPVirtual
	for i := range dig {
		dig[i] = x.GP32()
		x.MOVL(o.Mem{Base: digest, Disp: i * 4}, dig[i])
	}
	AX, BX, CX, DX := dig[0], dig[1], dig[2], dig[3]

	// Keep ones in a register
	ones := x.GP32()
	x.MOVL(o.U32(0xffffffff), ones)

	x.Label("loop")
	var block [4]reg.VecVirtual
	R8, R9 := x.GP32(), x.GP32()

	// load source. Skipped if idx < 0
	var loadSrc func(idx int, dst reg.GPVirtual)

	// Appears slower.
	const useXMM = false
	if useXMM {
		for i := range block {
			block[i] = x.XMM()
			x.MOVUPS(o.Mem{Base: src, Disp: 16 * i}, block[i])
		}
		// load source. Skipped if idx < 0
		loadSrc = func(idx int, dst reg.GPVirtual) {
			if idx < 0 {
				return
			}
			// 4 per block
			xmm := block[idx/4]
			x.PEXTRD(o.U8(idx&3), xmm, dst)
		}
	} else {
		loadSrc = func(idx int, dst reg.GPVirtual) {
			if idx < 0 {
				return
			}
			x.MOVL(o.Mem{Base: src, Disp: idx * 4}, dst)
		}
	}

	const useLEA = false
	loadSrc(0, R8)
	x.MOVL(DX, R9)

	// Copy digest
	R12, R13, R14, R15 := x.GP32(), x.GP32(), x.GP32(), x.GP32()
	x.MOVL(AX, R12)
	x.MOVL(BX, R13)
	x.MOVL(CX, R14)
	x.MOVL(DX, R15)

	// ROUND 1:
	x.Comment("ROUND1")
	ROUND1 := func(a, b, c, d reg.GPVirtual, index, con, shift int) {
		x.XORL(c, R9)
		if useLEA {
			x.LEAL(o.Mem{Base: a, Disp: con, Index: R8, Scale: 1}, a)
		} else {
			x.ADDL(o.U32(con), a)
			x.ADDL(R8, a)
		}
		x.ANDL(b, R9)
		x.XORL(d, R9)
		loadSrc(index, R8)
		x.ADDL(R9, a)
		ROLL(shift, a)
		x.MOVL(c, R9)
		x.ADDL(b, a)
	}
	ROUND1(AX, BX, CX, DX, 1, 0xd76aa478, 7)
	ROUND1(DX, AX, BX, CX, 2, 0xe8c7b756, 12)
	ROUND1(CX, DX, AX, BX, 3, 0x242070db, 17)
	ROUND1(BX, CX, DX, AX, 4, 0xc1bdceee, 22)
	ROUND1(AX, BX, CX, DX, 5, 0xf57c0faf, 7)
	ROUND1(DX, AX, BX, CX, 6, 0x4787c62a, 12)
	ROUND1(CX, DX, AX, BX, 7, 0xa8304613, 17)
	ROUND1(BX, CX, DX, AX, 8, 0xfd469501, 22)
	ROUND1(AX, BX, CX, DX, 9, 0x698098d8, 7)
	ROUND1(DX, AX, BX, CX, 10, 0x8b44f7af, 12)
	ROUND1(CX, DX, AX, BX, 11, 0xffff5bb1, 17)
	ROUND1(BX, CX, DX, AX, 12, 0x895cd7be, 22)
	ROUND1(AX, BX, CX, DX, 13, 0x6b901122, 7)
	ROUND1(DX, AX, BX, CX, 14, 0xfd987193, 12)
	ROUND1(CX, DX, AX, BX, 15, 0xa679438e, 17)
	// adjusted to load index 1
	ROUND1(BX, CX, DX, AX, 1, 0x49b40821, 22)

	x.Comment("ROUND2")
	x.MOVL(DX, R9)
	R10 := x.GP32()
	x.MOVL(DX, R10)
	ROUND2 := func(a, b, c, d reg.GPVirtual, index, con, shift int) {
		NOTL(R9, ones)
		if useLEA {
			x.LEAL(o.Mem{Base: a, Disp: con, Index: R8, Scale: 1}, a)
		} else {
			x.ADDL(o.U32(con), a)
			x.ADDL(R8, a)
		}
		x.ANDL(b, R10)
		x.ANDL(c, R9)
		loadSrc(index, R8)
		x.ORL(R9, R10)
		x.MOVL(c, R9)
		x.ADDL(R10, a)
		x.MOVL(c, R10)
		ROLL(shift, a)
		x.ADDL(b, a)
	}
	ROUND2(AX, BX, CX, DX, 6, 0xf61e2562, 5)
	ROUND2(DX, AX, BX, CX, 11, 0xc040b340, 9)
	ROUND2(CX, DX, AX, BX, 0, 0x265e5a51, 14)
	ROUND2(BX, CX, DX, AX, 5, 0xe9b6c7aa, 20)
	ROUND2(AX, BX, CX, DX, 10, 0xd62f105d, 5)
	ROUND2(DX, AX, BX, CX, 15, 0x2441453, 9)
	ROUND2(CX, DX, AX, BX, 4, 0xd8a1e681, 14)
	ROUND2(BX, CX, DX, AX, 9, 0xe7d3fbc8, 20)
	ROUND2(AX, BX, CX, DX, 14, 0x21e1cde6, 5)
	ROUND2(DX, AX, BX, CX, 3, 0xc33707d6, 9)
	ROUND2(CX, DX, AX, BX, 8, 0xf4d50d87, 14)
	ROUND2(BX, CX, DX, AX, 13, 0x455a14ed, 20)
	ROUND2(AX, BX, CX, DX, 2, 0xa9e3e905, 5)
	ROUND2(DX, AX, BX, CX, 7, 0xfcefa3f8, 9)
	ROUND2(CX, DX, AX, BX, 12, 0x676f02d9, 14)
	// Adjusted to load index 5
	ROUND2(BX, CX, DX, AX, 5, 0x8d2a4c8a, 20)

	x.Comment("ROUND3")
	x.MOVL(CX, R9)
	ROUND3 := func(a, b, c, d reg.GPVirtual, index, con, shift int) {
		// LEAL const(a)(R8*1), a; \
		if useLEA {
			x.LEAL(o.Mem{Base: a, Disp: con, Index: R8, Scale: 1}, a)
		} else {
			x.ADDL(o.U32(con), a)
			x.ADDL(R8, a)
		}
		loadSrc(index, R8)
		x.XORL(d, R9)
		x.XORL(b, R9)
		x.ADDL(R9, a)
		ROLL(shift, a)
		x.MOVL(b, R9)
		x.ADDL(b, a)
	}
	ROUND3(AX, BX, CX, DX, 8, 0xfffa3942, 4)
	ROUND3(DX, AX, BX, CX, 11, 0x8771f681, 11)
	ROUND3(CX, DX, AX, BX, 14, 0x6d9d6122, 16)
	ROUND3(BX, CX, DX, AX, 1, 0xfde5380c, 23)
	ROUND3(AX, BX, CX, DX, 4, 0xa4beea44, 4)
	ROUND3(DX, AX, BX, CX, 7, 0x4bdecfa9, 11)
	ROUND3(CX, DX, AX, BX, 10, 0xf6bb4b60, 16)
	ROUND3(BX, CX, DX, AX, 13, 0xbebfbc70, 23)
	ROUND3(AX, BX, CX, DX, 0, 0x289b7ec6, 4)
	ROUND3(DX, AX, BX, CX, 3, 0xeaa127fa, 11)
	ROUND3(CX, DX, AX, BX, 6, 0xd4ef3085, 16)
	ROUND3(BX, CX, DX, AX, 9, 0x4881d05, 23)
	ROUND3(AX, BX, CX, DX, 12, 0xd9d4d039, 4)
	ROUND3(DX, AX, BX, CX, 15, 0xe6db99e5, 11)
	ROUND3(CX, DX, AX, BX, 2, 0x1fa27cf8, 16)
	ROUND3(BX, CX, DX, AX, 0, 0xc4ac5665, 23)

	// Use extra reg for constant
	x.Comment("ROUND4")
	x.MOVL(ones, R9)
	x.XORL(DX, R9)
	ROUND4 := func(a, b, c, d reg.GPVirtual, index, con, shift int) {
		// LEAL const(a)(R8*1), a; \
		if useLEA {
			x.LEAL(o.Mem{Base: a, Disp: con, Index: R8, Scale: 1}, a)
		} else {
			x.ADDL(o.U32(con), a)
			x.ADDL(R8, a)
		}
		x.ORL(b, R9)
		x.XORL(c, R9)
		x.ADDL(R9, a)
		loadSrc(index, R8)
		if index >= 0 {
			x.MOVL(ones, R9)
		}
		ROLL(shift, a)
		if index >= 0 {
			x.XORL(c, R9)
		}
		x.ADDL(b, a)
	}
	ROUND4(AX, BX, CX, DX, 7, 0xf4292244, 6)
	ROUND4(DX, AX, BX, CX, 14, 0x432aff97, 10)
	ROUND4(CX, DX, AX, BX, 5, 0xab9423a7, 15)
	ROUND4(BX, CX, DX, AX, 12, 0xfc93a039, 21)
	ROUND4(AX, BX, CX, DX, 3, 0x655b59c3, 6)
	ROUND4(DX, AX, BX, CX, 10, 0x8f0ccc92, 10)
	ROUND4(CX, DX, AX, BX, 1, 0xffeff47d, 15)
	ROUND4(BX, CX, DX, AX, 8, 0x85845dd1, 21)
	ROUND4(AX, BX, CX, DX, 15, 0x6fa87e4f, 6)
	ROUND4(DX, AX, BX, CX, 6, 0xfe2ce6e0, 10)
	ROUND4(CX, DX, AX, BX, 13, 0xa3014314, 15)
	ROUND4(BX, CX, DX, AX, 4, 0x4e0811a1, 21)
	ROUND4(AX, BX, CX, DX, 11, 0xf7537e82, 6)
	ROUND4(DX, AX, BX, CX, 2, 0xbd3af235, 10)
	ROUND4(CX, DX, AX, BX, 9, 0x2ad7d2bb, 15)
	ROUND4(BX, CX, DX, AX, -1, 0xeb86d391, 21)
	x.ADDL(R12, AX)
	x.ADDL(R13, BX)
	x.ADDL(R14, CX)
	x.ADDL(R15, DX)

	// NEXT LOOP
	x.Comment("Prepare next loop")
	x.ADDQ(o.U8(64), src)
	x.CMPQ(src, end)
	x.JB(o.LabelRef("loop"))

	// Write...
	x.Comment("Write output")
	digest = x.Load(x.Param("dig"), x.GP64())
	for i := range dig {
		x.MOVL(dig[i], o.Mem{Base: digest, Disp: i * 4})
	}
	x.Label("end")
	x.RET()
	x.Generate()
}
md5-simd-1.1.2/_gen/go.mod000066400000000000000000000002021377566263400151650ustar00rootroot00000000000000
module github.com/minio/md5-simd/_gen

go 1.14

require github.com/mmcloughlin/avo v0.0.0-20210104032911-599bdd1269f4 // indirect
md5-simd-1.1.2/_gen/go.sum000066400000000000000000000054541377566263400152280ustar00rootroot00000000000000
github.com/mmcloughlin/avo v0.0.0-20210104032911-599bdd1269f4 h1:ExoghBBFY7A3RzgkAOq0XbHs9zaT/bHq7xysgyp3z3Q=
github.com/mmcloughlin/avo v0.0.0-20210104032911-599bdd1269f4/go.mod h1:6aKT4zZIrpGqB3RpFU14ByCSSyKY6LfJz4J/JJChHfI=
github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
golang.org/x/arch v0.0.0-20201008161808-52c3e6f60cff/go.mod h1:flIaEI6LNU6xOCD5PaJvn9wGP0agmIOqjrtsKGRguv4=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto=
golang.org/x/mod v0.3.0 h1:RM4zey1++hCTbCVQfnWeKs9/IEsaBLA8vTkd0WVtmH4=
golang.org/x/mod v0.3.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
golang.org/x/net v0.0.0-20201021035429-f5854403a974/go.mod h1:sp8m0HH+o8qH0wwXwYZr8TS3Oi6o0r6Gce1SSxlDquU=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
golang.org/x/tools v0.0.0-20201105001634-bc3cf281b174 h1:0rx0F4EjJNbxTuzWe0KjKcIzs+3VEb/Mrs/d1ciNz1c=
golang.org/x/tools v0.0.0-20201105001634-bc3cf281b174/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 h1:go1bK/D/BFZV2I8cIQd1NKEZ+0owSTG1fDTci4IqFcE=
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
rsc.io/pdf v0.1.1/go.mod h1:n8OzWcQ6Sp37PL01nO98y4iUCRdTGarVfzxY20ICaU4=
md5-simd-1.1.2/block16_amd64.s000066400000000000000000000125771377566263400156110ustar00rootroot00000000000000
// Copyright (c) 2020 MinIO Inc. All rights reserved.
// Use of this source code is governed by a license that can be
// found in the LICENSE file.

//+build !noasm,!appengine,gc

// This is the AVX512 implementation of the MD5 block function (16-way parallel)

#define prep(index) \
    KMOVQ kmask, ktmp \
    VPGATHERDD index*4(base)(ptrs*1), ktmp, mem

#define ROUND1(a, b, c, d, index, const, shift) \
    VPXORQ c, tmp, tmp \
    VPADDD 64*const(consts), a, a \
    VPADDD mem, a, a \
    VPTERNLOGD $0x6C, b, d, tmp \
    prep(index) \
    VPADDD tmp, a, a \
    VPROLD $shift, a, a \
    VMOVAPD c, tmp \
    VPADDD b, a, a

#define ROUND1noload(a, b, c, d, const, shift) \
    VPXORQ c, tmp, tmp \
    VPADDD 64*const(consts), a, a \
    VPADDD mem, a, a \
    VPTERNLOGD $0x6C, b, d, tmp \
    VPADDD tmp, a, a \
    VPROLD $shift, a, a \
    VMOVAPD c, tmp \
    VPADDD b, a, a

#define ROUND2(a, b, c, d, zreg, const, shift) \
    VPADDD 64*const(consts), a, a \
    VPADDD zreg, a, a \
    VANDNPD c, tmp, tmp \
    VPTERNLOGD $0xEC, b, tmp, tmp2 \
    VMOVAPD c, tmp \
    VPADDD tmp2, a, a \
    VMOVAPD c, tmp2 \
    VPROLD $shift, a, a \
    VPADDD b, a, a

#define ROUND3(a, b, c, d, zreg, const, shift) \
    VPADDD 64*const(consts), a, a \
    VPADDD zreg, a, a \
    VPTERNLOGD $0x96, b, d, tmp \
    VPADDD tmp, a, a \
    VPROLD $shift, a, a \
    VMOVAPD b, tmp \
    VPADDD b, a, a

#define ROUND4(a, b, c, d, zreg, const, shift) \
    VPADDD 64*const(consts), a, a \
    VPADDD zreg, a, a \
    VPTERNLOGD $0x36, b, c, tmp \
    VPADDD tmp, a, a \
    VPROLD $shift, a, a \
    VPXORQ c, ones, tmp \
    VPADDD b, a, a

TEXT ·block16(SB), 4, $0-40

    MOVQ state+0(FP), BX
    MOVQ base+8(FP), SI
    MOVQ ptrs+16(FP), AX
    KMOVQ mask+24(FP), K1
    MOVQ n+32(FP), DX
    MOVQ ·avx512md5consts+0(SB), DI

#define a Z0
#define b Z1
#define c Z2
#define d Z3

#define sa Z4
#define sb Z5
#define sc Z6
#define sd Z7

#define tmp Z8
#define tmp2 Z9
#define ptrs Z10
#define ones Z12
#define mem Z15

#define kmask K1
#define ktmp K3

// ----------------------------------------------------------
// Registers Z16 through to Z31 are used for caching purposes
// ----------------------------------------------------------

#define dig BX
#define count DX
#define base SI
#define consts DI

    // load digest into state registers
    VMOVUPD (dig), a
    VMOVUPD 0x40(dig), b
    VMOVUPD 0x80(dig), c
    VMOVUPD 0xc0(dig), d

    // load source pointers
    VMOVUPD 0x00(AX), ptrs

    MOVQ $-1, AX
    VPBROADCASTQ AX, ones

loop:
    VMOVAPD a, sa
    VMOVAPD b, sb
    VMOVAPD c, sc
    VMOVAPD d, sd

    prep(0)
    VMOVAPD d, tmp
    VMOVAPD mem, Z16

    ROUND1(a,b,c,d, 1,0x00, 7)
    VMOVAPD mem, Z17
    ROUND1(d,a,b,c, 2,0x01,12)
    VMOVAPD mem, Z18
    ROUND1(c,d,a,b, 3,0x02,17)
    VMOVAPD mem, Z19
    ROUND1(b,c,d,a, 4,0x03,22)
    VMOVAPD mem, Z20
    ROUND1(a,b,c,d, 5,0x04, 7)
    VMOVAPD mem, Z21
    ROUND1(d,a,b,c, 6,0x05,12)
    VMOVAPD mem, Z22
    ROUND1(c,d,a,b, 7,0x06,17)
    VMOVAPD mem, Z23
    ROUND1(b,c,d,a, 8,0x07,22)
    VMOVAPD mem, Z24
    ROUND1(a,b,c,d, 9,0x08, 7)
    VMOVAPD mem, Z25
    ROUND1(d,a,b,c,10,0x09,12)
    VMOVAPD mem, Z26
    ROUND1(c,d,a,b,11,0x0a,17)
    VMOVAPD mem, Z27
    ROUND1(b,c,d,a,12,0x0b,22)
    VMOVAPD mem, Z28
    ROUND1(a,b,c,d,13,0x0c, 7)
    VMOVAPD mem, Z29
    ROUND1(d,a,b,c,14,0x0d,12)
    VMOVAPD mem, Z30
    ROUND1(c,d,a,b,15,0x0e,17)
    VMOVAPD mem, Z31

    ROUND1noload(b,c,d,a, 0x0f,22)

    VMOVAPD d, tmp
    VMOVAPD d, tmp2

    ROUND2(a,b,c,d, Z17,0x10, 5)
    ROUND2(d,a,b,c, Z22,0x11, 9)
    ROUND2(c,d,a,b, Z27,0x12,14)
    ROUND2(b,c,d,a, Z16,0x13,20)
    ROUND2(a,b,c,d, Z21,0x14, 5)
    ROUND2(d,a,b,c, Z26,0x15, 9)
    ROUND2(c,d,a,b, Z31,0x16,14)
    ROUND2(b,c,d,a, Z20,0x17,20)
    ROUND2(a,b,c,d, Z25,0x18, 5)
    ROUND2(d,a,b,c, Z30,0x19, 9)
    ROUND2(c,d,a,b, Z19,0x1a,14)
    ROUND2(b,c,d,a, Z24,0x1b,20)
    ROUND2(a,b,c,d, Z29,0x1c, 
5) ROUND2(d,a,b,c, Z18,0x1d, 9) ROUND2(c,d,a,b, Z23,0x1e,14) ROUND2(b,c,d,a, Z28,0x1f,20) VMOVAPD c, tmp ROUND3(a,b,c,d, Z21,0x20, 4) ROUND3(d,a,b,c, Z24,0x21,11) ROUND3(c,d,a,b, Z27,0x22,16) ROUND3(b,c,d,a, Z30,0x23,23) ROUND3(a,b,c,d, Z17,0x24, 4) ROUND3(d,a,b,c, Z20,0x25,11) ROUND3(c,d,a,b, Z23,0x26,16) ROUND3(b,c,d,a, Z26,0x27,23) ROUND3(a,b,c,d, Z29,0x28, 4) ROUND3(d,a,b,c, Z16,0x29,11) ROUND3(c,d,a,b, Z19,0x2a,16) ROUND3(b,c,d,a, Z22,0x2b,23) ROUND3(a,b,c,d, Z25,0x2c, 4) ROUND3(d,a,b,c, Z28,0x2d,11) ROUND3(c,d,a,b, Z31,0x2e,16) ROUND3(b,c,d,a, Z18,0x2f,23) VPXORQ d, ones, tmp ROUND4(a,b,c,d, Z16,0x30, 6) ROUND4(d,a,b,c, Z23,0x31,10) ROUND4(c,d,a,b, Z30,0x32,15) ROUND4(b,c,d,a, Z21,0x33,21) ROUND4(a,b,c,d, Z28,0x34, 6) ROUND4(d,a,b,c, Z19,0x35,10) ROUND4(c,d,a,b, Z26,0x36,15) ROUND4(b,c,d,a, Z17,0x37,21) ROUND4(a,b,c,d, Z24,0x38, 6) ROUND4(d,a,b,c, Z31,0x39,10) ROUND4(c,d,a,b, Z22,0x3a,15) ROUND4(b,c,d,a, Z29,0x3b,21) ROUND4(a,b,c,d, Z20,0x3c, 6) ROUND4(d,a,b,c, Z27,0x3d,10) ROUND4(c,d,a,b, Z18,0x3e,15) ROUND4(b,c,d,a, Z25,0x3f,21) VPADDD sa, a, a VPADDD sb, b, b VPADDD sc, c, c VPADDD sd, d, d LEAQ 64(base), base SUBQ $64, count JNE loop VMOVUPD a, (dig) VMOVUPD b, 0x40(dig) VMOVUPD c, 0x80(dig) VMOVUPD d, 0xc0(dig) VZEROUPPER RET md5-simd-1.1.2/block16_amd64_test.go000066400000000000000000000176011377566263400170040ustar00rootroot00000000000000//+build !noasm,!appengine,gc // Copyright (c) 2020 MinIO Inc. All rights reserved. // Use of this source code is governed by a license that can be // found in the LICENSE file. 
package md5simd import ( "bytes" "encoding/binary" "encoding/hex" "strings" "testing" "unsafe" "github.com/klauspost/cpuid/v2" ) func reverse(s string) string { runes := []rune(s) for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 { runes[i], runes[j] = runes[j], runes[i] } return string(runes) } func block16Inputs() (input [16][]byte) { gld, i := golden[8:], 0 // fill initial test vectors from golden test vectors with length >= 64 for g := range gld { if len(gld[g].in) >= 64 { input[i] = []byte(gld[g].in[:64]) i++ if i >= 8 /*len(input)*/ { break } } } // fill upper 8 test vectors with the reverse strings of lower for ; i < len(input); i++ { input[i] = []byte(reverse(string(input[i-8]))) } return } func TestBlock16(t *testing.T) { if !hasAVX512 { t.SkipNow() } input := block16Inputs() var s digest16 for i := 0; i < 16; i++ { s.v0[i], s.v1[i], s.v2[i], s.v3[i] = init0, init1, init2, init3 } bufs := [16]int32{4, 4 + internalBlockSize, 4 + internalBlockSize*2, 4 + internalBlockSize*3, 4 + internalBlockSize*4, 4 + internalBlockSize*5, 4 + internalBlockSize*6, 4 + internalBlockSize*7, 4 + internalBlockSize*8, 4 + internalBlockSize*9, 4 + internalBlockSize*10, 4 + internalBlockSize*11, 4 + internalBlockSize*12, 4 + internalBlockSize*13, 4 + internalBlockSize*14, 4 + internalBlockSize*15} base := make([]byte, 4+16*internalBlockSize) for i := 0; i < len(input); i++ { copy(base[bufs[i]:], input[i]) } block16(&s.v0[0], uintptr(unsafe.Pointer(&(base[0]))), &bufs[0], 0xffff, 64) want := `00000000 82 3c 09 52 b9 77 11 2a 65 ee 4c 82 f9 ad 4d 28 |.<.R.w.*e.L...M(| 00000010 82 53 aa b9 4d 9c 94 07 93 c7 ce 70 9b 18 c9 a7 |.S..M......p....| 00000020 4a 95 90 aa 8f 8f f6 29 59 b3 95 9f 5f 3c b0 08 |J......)Y..._<..| 00000030 56 1d 88 66 26 d7 12 cc e4 41 2f 07 1b 7c 1a 4d |V..f&....A/..|.M| 00000040 ab 71 57 fc 43 2d ee a3 b5 a8 11 9a 3d e2 33 84 |.qW.C-......=.3.| 00000050 41 b0 a7 71 38 3e 16 e6 8c 23 80 fa f2 18 45 c3 |A..q8>...#....E.| 00000060 72 08 7e 17 a6 52 b7 a9 24 38 
d1 44 f1 12 ec a2 |r.~..R..$8.D....| 00000070 bb 0a 2c c5 7a cc a2 49 bf 44 a2 1b 0f fe 08 49 |..,.z..I.D.....I| 00000080 9f 5d 41 c2 1b 45 75 aa 36 3a 05 f9 36 a9 14 18 |.]A..Eu.6:..6...| 00000090 e1 1c f8 67 52 f4 59 c8 de 2e c6 c1 24 f3 fd 82 |...gR.Y.....$...| 000000a0 7c 0d c0 7d 2a 1e f4 9e 60 f9 0e 11 b9 fd a5 79 ||..}*..........y| 000000b0 57 9d 20 80 cc f3 da 4e ec 7b 5d 2b 71 86 1d e0 |W. ....N.{]+q...| 000000c0 06 db 9c fa 5d fa 1f 90 fc 1f f4 61 cc 2c 8e 3a |....]......a.,.:| 000000d0 87 84 9f 50 39 78 ec 5b 01 a8 be fa 0a 0b 5f 9d |...P9x.[......_.| 000000e0 75 e1 ce 30 97 4c 9e 87 6d b4 1c e8 ae 59 0f cd |u..0.L..m....Y..| 000000f0 7e 4d a1 cf 85 2d 33 1d 4a a7 0f 36 26 9e fd 37 |~M...-3.J..6&..7| ` state := [256]byte{} for i := 0; i < 16; i++ { binary.LittleEndian.PutUint32(state[0x00+i*4:], s.v0[i]) binary.LittleEndian.PutUint32(state[0x40+i*4:], s.v1[i]) binary.LittleEndian.PutUint32(state[0x80+i*4:], s.v2[i]) binary.LittleEndian.PutUint32(state[0xc0+i*4:], s.v3[i]) } got := hex.Dump(state[:]) got = strings.ReplaceAll(got, "`", ".") if got != want { t.Fatalf("got %s\n want %s", got, want) } } func TestBlock16Masked(t *testing.T) { if !hasAVX512 { t.SkipNow() } input := block16Inputs() // Nil out every other input vector for i := range input { if (i & 1) == 1 { input[i] = nil } } const mask = 0x5555 var s digest16 for i := 0; i < 16; i++ { s.v0[i], s.v1[i], s.v2[i], s.v3[i] = init0, init1, init2, init3 } bufs := [16]int32{4, 4 + internalBlockSize, 4 + internalBlockSize*2, 4 + internalBlockSize*3, 4 + internalBlockSize*4, 4 + internalBlockSize*5, 4 + internalBlockSize*6, 4 + internalBlockSize*7, 4 + internalBlockSize*8, 4 + internalBlockSize*9, 4 + internalBlockSize*10, 4 + internalBlockSize*11, 4 + internalBlockSize*12, 4 + internalBlockSize*13, 4 + internalBlockSize*14, 4 + internalBlockSize*15} base := make([]byte, 4+16*internalBlockSize) for i := 0; i < len(input); i++ { if input[i] != nil { copy(base[bufs[i]:], input[i]) } } block16(&s.v0[0], 
uintptr(unsafe.Pointer(&(base[0]))), &bufs[0], mask, 64) want := `00000000 82 3c 09 52 ac 1d 1f 03 65 ee 4c 82 ac 1d 1f 03 |.<.R....e.L.....| 00000010 82 53 aa b9 ac 1d 1f 03 93 c7 ce 70 ac 1d 1f 03 |.S.........p....| 00000020 4a 95 90 aa ac 1d 1f 03 59 b3 95 9f ac 1d 1f 03 |J.......Y.......| 00000030 56 1d 88 66 ac 1d 1f 03 e4 41 2f 07 ac 1d 1f 03 |V..f.....A/.....| 00000040 ab 71 57 fc d0 8e a5 6e b5 a8 11 9a d0 8e a5 6e |.qW....n.......n| 00000050 41 b0 a7 71 d0 8e a5 6e 8c 23 80 fa d0 8e a5 6e |A..q...n.#.....n| 00000060 72 08 7e 17 d0 8e a5 6e 24 38 d1 44 d0 8e a5 6e |r.~....n$8.D...n| 00000070 bb 0a 2c c5 d0 8e a5 6e bf 44 a2 1b d0 8e a5 6e |..,....n.D.....n| 00000080 9f 5d 41 c2 b7 67 ab 1f 36 3a 05 f9 b7 67 ab 1f |.]A..g..6:...g..| 00000090 e1 1c f8 67 b7 67 ab 1f de 2e c6 c1 b7 67 ab 1f |...g.g.......g..| 000000a0 7c 0d c0 7d b7 67 ab 1f 60 f9 0e 11 b7 67 ab 1f ||..}.g.......g..| 000000b0 57 9d 20 80 b7 67 ab 1f ec 7b 5d 2b b7 67 ab 1f |W. ..g...{]+.g..| 000000c0 06 db 9c fa 91 77 31 74 fc 1f f4 61 91 77 31 74 |.....w1t...a.w1t| 000000d0 87 84 9f 50 91 77 31 74 01 a8 be fa 91 77 31 74 |...P.w1t.....w1t| 000000e0 75 e1 ce 30 91 77 31 74 6d b4 1c e8 91 77 31 74 |u..0.w1tm....w1t| 000000f0 7e 4d a1 cf 91 77 31 74 4a a7 0f 36 91 77 31 74 |~M...w1tJ..6.w1t| ` state := [256]byte{} for i := 0; i < 16; i++ { binary.LittleEndian.PutUint32(state[0x00+i*4:], s.v0[i]) binary.LittleEndian.PutUint32(state[0x40+i*4:], s.v1[i]) binary.LittleEndian.PutUint32(state[0x80+i*4:], s.v2[i]) binary.LittleEndian.PutUint32(state[0xc0+i*4:], s.v3[i]) } got := hex.Dump(state[:]) got = strings.ReplaceAll(got, "`", ".") if got != want { t.Fatalf("got %s\n want %s", got, want) } } func BenchmarkBlock8(b *testing.B) { if !cpuid.CPU.Supports(cpuid.AVX2) { b.SkipNow() } const size = 64 input := [8][]byte{} for i := range input { input[i] = bytes.Repeat([]byte{0x61 + byte(i*1)}, size) } var s digest8 for i := 0; i < 8; i++ { s.v0[i], s.v1[i], s.v2[i], s.v3[i] = init0, init1, init2, init3 } 
	var cache cache8 // stack storage for block8 tmp state

	bufs := [8]int32{4, 4 + internalBlockSize, 4 + internalBlockSize*2, 4 + internalBlockSize*3, 4 + internalBlockSize*4, 4 + internalBlockSize*5, 4 + internalBlockSize*6, 4 + internalBlockSize*7}

	base := make([]byte, 4+16*internalBlockSize)

	for i := 0; i < len(input); i++ {
		copy(base[bufs[i]:], input[i])
	}

	b.SetBytes(int64(size * 8))
	b.ReportAllocs()
	b.ResetTimer()

	for j := 0; j < b.N; j++ {
		block8(&s.v0[0], uintptr(unsafe.Pointer(&(base[0]))), &bufs[0], &cache[0], size)
	}
}

func BenchmarkBlock16(b *testing.B) {
	if !hasAVX512 {
		b.SkipNow()
	}

	const size = 64
	input := [16][]byte{}
	for i := range input {
		input[i] = bytes.Repeat([]byte{0x61 + byte(i*1)}, size)
	}

	var s digest16
	for i := 0; i < 16; i++ {
		s.v0[i], s.v1[i], s.v2[i], s.v3[i] = init0, init1, init2, init3
	}

	bufs := [16]int32{4, 4 + internalBlockSize, 4 + internalBlockSize*2, 4 + internalBlockSize*3, 4 + internalBlockSize*4, 4 + internalBlockSize*5, 4 + internalBlockSize*6, 4 + internalBlockSize*7, 4 + internalBlockSize*8, 4 + internalBlockSize*9, 4 + internalBlockSize*10, 4 + internalBlockSize*11, 4 + internalBlockSize*12, 4 + internalBlockSize*13, 4 + internalBlockSize*14, 4 + internalBlockSize*15}

	base := make([]byte, 4+16*internalBlockSize)

	for i := 0; i < len(input); i++ {
		copy(base[bufs[i]:], input[i])
	}

	b.SetBytes(int64(size * 16))
	b.ReportAllocs()
	b.ResetTimer()

	for j := 0; j < b.N; j++ {
		block16(&s.v0[0], uintptr(unsafe.Pointer(&(base[0]))), &bufs[0], 0xffff, size)
	}
}
md5-simd-1.1.2/block8_amd64.s000066400000000000000000000155421377566263400155250ustar00rootroot00000000000000
//+build !noasm,!appengine,gc

// Copyright (c) 2018 Igneous Systems
// MIT License
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute,
sublicense, and/or sell // copies of the Software, and to permit persons to whom the Software is // furnished to do so, subject to the following conditions: // // The above copyright notice and this permission notice shall be included in all // copies or substantial portions of the Software. // // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE // AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER // LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE // SOFTWARE. // Copyright (c) 2020 MinIO Inc. All rights reserved. // Use of this source code is governed by a license that can be // found in the LICENSE file. // This is the AVX2 implementation of the MD5 block function (8-way parallel) // block8(state *uint64, base uintptr, bufs *int32, cache *byte, n int) TEXT ·block8(SB), 4, $0-40 MOVQ state+0(FP), BX MOVQ base+8(FP), SI MOVQ bufs+16(FP), AX MOVQ cache+24(FP), CX MOVQ n+32(FP), DX MOVQ ·avx256md5consts+0(SB), DI // Align cache (which is stack allocated by the compiler) // to a 256 bit boundary (ymm register alignment) // The cache8 type is deliberately oversized to permit this. 
ADDQ $31, CX ANDB $-32, CL #define a Y0 #define b Y1 #define c Y2 #define d Y3 #define sa Y4 #define sb Y5 #define sc Y6 #define sd Y7 #define tmp Y8 #define tmp2 Y9 #define mask Y10 #define off Y11 #define ones Y12 #define rtmp1 Y13 #define rtmp2 Y14 #define mem Y15 #define dig BX #define cache CX #define count DX #define base SI #define consts DI #define prepmask \ VPXOR mask, mask, mask \ VPCMPGTD mask, off, mask #define prep(index) \ VMOVAPD mask, rtmp2 \ VPGATHERDD rtmp2, index*4(base)(off*1), mem #define load(index) \ VMOVAPD index*32(cache), mem #define store(index) \ VMOVAPD mem, index*32(cache) #define roll(shift, a) \ VPSLLD $shift, a, rtmp1 \ VPSRLD $32-shift, a, a \ VPOR rtmp1, a, a #define ROUND1(a, b, c, d, index, const, shift) \ VPXOR c, tmp, tmp \ VPADDD 32*const(consts), a, a \ VPADDD mem, a, a \ VPAND b, tmp, tmp \ VPXOR d, tmp, tmp \ prep(index) \ VPADDD tmp, a, a \ roll(shift,a) \ VMOVAPD c, tmp \ VPADDD b, a, a #define ROUND1load(a, b, c, d, index, const, shift) \ VXORPD c, tmp, tmp \ VPADDD 32*const(consts), a, a \ VPADDD mem, a, a \ VPAND b, tmp, tmp \ VPXOR d, tmp, tmp \ load(index) \ VPADDD tmp, a, a \ roll(shift,a) \ VMOVAPD c, tmp \ VPADDD b, a, a #define ROUND2(a, b, c, d, index, const, shift) \ VPADDD 32*const(consts), a, a \ VPADDD mem, a, a \ VPAND b, tmp2, tmp2 \ VANDNPD c, tmp, tmp \ load(index) \ VPOR tmp, tmp2, tmp2 \ VMOVAPD c, tmp \ VPADDD tmp2, a, a \ VMOVAPD c, tmp2 \ roll(shift,a) \ VPADDD b, a, a #define ROUND3(a, b, c, d, index, const, shift) \ VPADDD 32*const(consts), a, a \ VPADDD mem, a, a \ load(index) \ VPXOR d, tmp, tmp \ VPXOR b, tmp, tmp \ VPADDD tmp, a, a \ roll(shift,a) \ VMOVAPD b, tmp \ VPADDD b, a, a #define ROUND4(a, b, c, d, index, const, shift) \ VPADDD 32*const(consts), a, a \ VPADDD mem, a, a \ VPOR b, tmp, tmp \ VPXOR c, tmp, tmp \ VPADDD tmp, a, a \ load(index) \ roll(shift,a) \ VPXOR c, ones, tmp \ VPADDD b, a, a // load digest into state registers VMOVUPD (dig), a VMOVUPD 32(dig), b VMOVUPD 64(dig), c 
VMOVUPD 96(dig), d // load source buffer offsets VMOVUPD (AX), off prepmask VPCMPEQD ones, ones, ones loop: VMOVAPD a, sa VMOVAPD b, sb VMOVAPD c, sc VMOVAPD d, sd prep(0) VMOVAPD d, tmp store(0) ROUND1(a,b,c,d, 1,0x00, 7) store(1) ROUND1(d,a,b,c, 2,0x01,12) store(2) ROUND1(c,d,a,b, 3,0x02,17) store(3) ROUND1(b,c,d,a, 4,0x03,22) store(4) ROUND1(a,b,c,d, 5,0x04, 7) store(5) ROUND1(d,a,b,c, 6,0x05,12) store(6) ROUND1(c,d,a,b, 7,0x06,17) store(7) ROUND1(b,c,d,a, 8,0x07,22) store(8) ROUND1(a,b,c,d, 9,0x08, 7) store(9) ROUND1(d,a,b,c,10,0x09,12) store(10) ROUND1(c,d,a,b,11,0x0a,17) store(11) ROUND1(b,c,d,a,12,0x0b,22) store(12) ROUND1(a,b,c,d,13,0x0c, 7) store(13) ROUND1(d,a,b,c,14,0x0d,12) store(14) ROUND1(c,d,a,b,15,0x0e,17) store(15) ROUND1load(b,c,d,a, 1,0x0f,22) VMOVAPD d, tmp VMOVAPD d, tmp2 ROUND2(a,b,c,d, 6,0x10, 5) ROUND2(d,a,b,c,11,0x11, 9) ROUND2(c,d,a,b, 0,0x12,14) ROUND2(b,c,d,a, 5,0x13,20) ROUND2(a,b,c,d,10,0x14, 5) ROUND2(d,a,b,c,15,0x15, 9) ROUND2(c,d,a,b, 4,0x16,14) ROUND2(b,c,d,a, 9,0x17,20) ROUND2(a,b,c,d,14,0x18, 5) ROUND2(d,a,b,c, 3,0x19, 9) ROUND2(c,d,a,b, 8,0x1a,14) ROUND2(b,c,d,a,13,0x1b,20) ROUND2(a,b,c,d, 2,0x1c, 5) ROUND2(d,a,b,c, 7,0x1d, 9) ROUND2(c,d,a,b,12,0x1e,14) ROUND2(b,c,d,a, 0,0x1f,20) load(5) VMOVAPD c, tmp ROUND3(a,b,c,d, 8,0x20, 4) ROUND3(d,a,b,c,11,0x21,11) ROUND3(c,d,a,b,14,0x22,16) ROUND3(b,c,d,a, 1,0x23,23) ROUND3(a,b,c,d, 4,0x24, 4) ROUND3(d,a,b,c, 7,0x25,11) ROUND3(c,d,a,b,10,0x26,16) ROUND3(b,c,d,a,13,0x27,23) ROUND3(a,b,c,d, 0,0x28, 4) ROUND3(d,a,b,c, 3,0x29,11) ROUND3(c,d,a,b, 6,0x2a,16) ROUND3(b,c,d,a, 9,0x2b,23) ROUND3(a,b,c,d,12,0x2c, 4) ROUND3(d,a,b,c,15,0x2d,11) ROUND3(c,d,a,b, 2,0x2e,16) ROUND3(b,c,d,a, 0,0x2f,23) load(0) VPXOR d, ones, tmp ROUND4(a,b,c,d, 7,0x30, 6) ROUND4(d,a,b,c,14,0x31,10) ROUND4(c,d,a,b, 5,0x32,15) ROUND4(b,c,d,a,12,0x33,21) ROUND4(a,b,c,d, 3,0x34, 6) ROUND4(d,a,b,c,10,0x35,10) ROUND4(c,d,a,b, 1,0x36,15) ROUND4(b,c,d,a, 8,0x37,21) ROUND4(a,b,c,d,15,0x38, 6) ROUND4(d,a,b,c, 6,0x39,10) 
ROUND4(c,d,a,b,13,0x3a,15) ROUND4(b,c,d,a, 4,0x3b,21) ROUND4(a,b,c,d,11,0x3c, 6) ROUND4(d,a,b,c, 2,0x3d,10) ROUND4(c,d,a,b, 9,0x3e,15) ROUND4(b,c,d,a, 0,0x3f,21) VPADDD sa, a, a VPADDD sb, b, b VPADDD sc, c, c VPADDD sd, d, d LEAQ 64(base), base SUBQ $64, count JNE loop VMOVUPD a, (dig) VMOVUPD b, 32(dig) VMOVUPD c, 64(dig) VMOVUPD d, 96(dig) VZEROUPPER RET md5-simd-1.1.2/block_amd64.go000066400000000000000000000145051377566263400155760ustar00rootroot00000000000000//+build !noasm,!appengine,gc // Copyright (c) 2020 MinIO Inc. All rights reserved. // Use of this source code is governed by a license that can be // found in the LICENSE file. package md5simd import ( "fmt" "math" "unsafe" "github.com/klauspost/cpuid/v2" ) var hasAVX512 bool func init() { // VANDNPD requires AVX512DQ. Technically it could be VPTERNLOGQ which is AVX512F. hasAVX512 = cpuid.CPU.Supports(cpuid.AVX512F, cpuid.AVX512DQ) } //go:noescape func block8(state *uint32, base uintptr, bufs *int32, cache *byte, n int) //go:noescape func block16(state *uint32, base uintptr, ptrs *int32, mask uint64, n int) // 8-way 4x uint32 digests in 4 ymm registers // (ymm0, ymm1, ymm2, ymm3) type digest8 struct { v0, v1, v2, v3 [8]uint32 } // Stack cache for 8x64 byte md5.BlockSize bytes. // Must be 32-byte aligned, so allocate 512+32 and // align upwards at runtime. type cache8 [512 + 32]byte // MD5 magic numbers for one lane of hashing; inflated // 8x below at init time. 
var md5consts = [64]uint32{ 0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee, 0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501, 0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be, 0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821, 0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa, 0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8, 0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed, 0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a, 0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c, 0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70, 0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05, 0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665, 0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039, 0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1, 0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1, 0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391, } // inflate the consts 8-way for 8x md5 (256 bit ymm registers) var avx256md5consts = func(c []uint32) []uint32 { inf := make([]uint32, 8*len(c)) for i := range c { for j := 0; j < 8; j++ { inf[(i*8)+j] = c[i] } } return inf }(md5consts[:]) // 16-way 4x uint32 digests in 4 zmm registers type digest16 struct { v0, v1, v2, v3 [16]uint32 } // inflate the consts 16-way for 16x md5 (512 bit zmm registers) var avx512md5consts = func(c []uint32) []uint32 { inf := make([]uint32, 16*len(c)) for i := range c { for j := 0; j < 16; j++ { inf[(i*16)+j] = c[i] } } return inf }(md5consts[:]) // Interface function to assembly code func (s *md5Server) blockMd5_x16(d *digest16, input [16][]byte, half bool) { if hasAVX512 { blockMd5_avx512(d, input, s.allBufs, &s.maskRounds16) return } // Preparing data using copy is slower since copies aren't inlined. 
	// Calculate on this goroutine
	if half {
		for i := range s.i8[0][:] {
			s.i8[0][i] = input[i]
		}
		for i := range s.d8a.v0[:] {
			s.d8a.v0[i], s.d8a.v1[i], s.d8a.v2[i], s.d8a.v3[i] = d.v0[i], d.v1[i], d.v2[i], d.v3[i]
		}
		blockMd5_avx2(&s.d8a, s.i8[0], s.allBufs, &s.maskRounds8a)
		for i := range s.d8a.v0[:] {
			d.v0[i], d.v1[i], d.v2[i], d.v3[i] = s.d8a.v0[i], s.d8a.v1[i], s.d8a.v2[i], s.d8a.v3[i]
		}
		return
	}

	for i := range s.i8[0][:] {
		s.i8[0][i], s.i8[1][i] = input[i], input[8+i]
	}

	for i := range s.d8a.v0[:] {
		j := (i + 8) & 15
		s.d8a.v0[i], s.d8a.v1[i], s.d8a.v2[i], s.d8a.v3[i] = d.v0[i], d.v1[i], d.v2[i], d.v3[i]
		s.d8b.v0[i], s.d8b.v1[i], s.d8b.v2[i], s.d8b.v3[i] = d.v0[j], d.v1[j], d.v2[j], d.v3[j]
	}

	// Benchmarks appear to be slightly faster when spinning up 2 goroutines instead
	// of using the current goroutine for one of the blocks.
	s.wg.Add(2)
	go func() { blockMd5_avx2(&s.d8a, s.i8[0], s.allBufs, &s.maskRounds8a); s.wg.Done() }()
	go func() { blockMd5_avx2(&s.d8b, s.i8[1], s.allBufs, &s.maskRounds8b); s.wg.Done() }()
	s.wg.Wait()

	for i := range s.d8a.v0[:] {
		d.v0[i], d.v1[i], d.v2[i], d.v3[i] = s.d8a.v0[i], s.d8a.v1[i], s.d8a.v2[i], s.d8a.v3[i]
	}
	for i := range s.d8b.v0[:] {
		j := (i + 8) & 15
		d.v0[j], d.v1[j], d.v2[j], d.v3[j] = s.d8b.v0[i], s.d8b.v1[i], s.d8b.v2[i], s.d8b.v3[i]
	}
}

// Interface function to AVX512 assembly code
func blockMd5_avx512(s *digest16, input [16][]byte, base []byte, maskRounds *[16]maskRounds) {
	baseMin := uint64(uintptr(unsafe.Pointer(&(base[0]))))
	ptrs := [16]int32{}

	for i := range ptrs {
		if len(input[i]) > 0 {
			if len(input[i]) > internalBlockSize {
				panic(fmt.Sprintf("Sanity check fails for lane %d: maximum input length cannot exceed internalBlockSize", i))
			}

			off := uint64(uintptr(unsafe.Pointer(&(input[i][0])))) - baseMin
			if off > math.MaxUint32 {
				panic(fmt.Sprintf("invalid buffer sent with offset %x", off))
			}
			ptrs[i] = int32(off)
		}
	}

	sdup := *s // create copy of initial states to receive intermediate updates

	rounds := generateMaskAndRounds16(input,
maskRounds)

	for r := 0; r < rounds; r++ {
		m := maskRounds[r]

		block16(&sdup.v0[0], uintptr(baseMin), &ptrs[0], m.mask, int(64*m.rounds))

		for j := 0; j < len(ptrs); j++ {
			ptrs[j] += int32(64 * m.rounds) // update pointers for next round
			if m.mask&(1< 0 {
			if len(input[i]) > internalBlockSize {
				panic(fmt.Sprintf("Sanity check fails for lane %d: maximum input length cannot exceed internalBlockSize", i))
			}

			off := uint64(uintptr(unsafe.Pointer(&(input[i][0])))) - baseMin
			if off > math.MaxUint32 {
				panic(fmt.Sprintf("invalid buffer sent with offset %x", off))
			}
			ptrs[i] = int32(off)
		}
	}

	sdup := *s // create copy of initial states to receive intermediate updates

	rounds := generateMaskAndRounds8(input, maskRounds)

	for r := 0; r < rounds; r++ {
		m := maskRounds[r]
		var cache cache8 // stack storage for block8 tmp state
		block8(&sdup.v0[0], uintptr(baseMin), &ptrs[0], &cache[0], int(64*m.rounds))

		for j := 0; j < len(ptrs); j++ {
			ptrs[j] += int32(64 * m.rounds) // update pointers for next round
			if m.mask&(1<
^;ud(*@*M6U`s!, t IJʜ9s5u떣ffBÇ<TjņP Q Uִ3eRիW'|Ҋ/O:ezmb +Q=6yd{SNk׮]ȑ#MNjeʔD-kײ(26hͣo߾VlYeebŊ:\;c['NXzj׮g'}.]ڞz)o}޽{+ ߉G?*U>nݺzUg϶;}*THߤ{Νky Œߌy6T .N?vW^yBGסC}ݧBA~G^*xΝ;)R*WloŊs!Sϻzlذ$!!z-f~g=~}_ Mu P=RSZp.}Gs5ktx%XE;IO>v绥TUVnͮz馛 UB $ӧI&܁P<6-k]oT:TU6GYRRyh?~٥7p*S 7nh:tZ䞩=z`o)Gv;otӦM& ^=Իرc.z뮳8`jU}i3Py܁P:@uļv"PeN># Uip15 ˕+g۶ms/|LUOU*_3*U%F PO5m_͛gUݻD ]v骚=-T0a 25:AwcbbUπ~D^:܃P<b4a`R=rb P70^u]իm]sٲe>ϣzU|m*Pk?*@TٳGT]tUVz|Qz^hO>=Rާkhl UBU|J՟L8b)IZf\rʳU k L+]̄>O U}bt]kG.]͛%TUzfjU-hWPSN/k ]Ǎguo?~PL5l05jd*JO7 oG b}I7՗^zɛucWPP]}=\p*jT>cOUsVA?MehbK.[=JW i[o)u{6m4vmzj݋~e}.7:3L:JʑUԖ-~_ӌ.].b[4e@Ou?rUf͚ϧ`]ǎ3_ j?GQH3]g>zO 4j@ a UU֌Pw ZuwP2$%C @ :$t(jeF RIKlIVK6ۚ\vR]ztpj6;^=޻q~}=,{T[/Rw_WXFtUj!VVV6*O?O=T477G{f͊s;bٲe)>^@TQ5;E7O>佫UG{QDN$կ~5x(*/~EM}}} UuSc/_xӧOGUUU?>:rطo_:t(#1KpN6MO_ף2d[7n\uΝXpa~zM6ER1JpN2G;N:u*^ؽ{wj.cѱk׮bn{QFEKKKTTTkצ=z4&Oޫ,y8nU@TMQm89YlYگ6Zf[>={ٳg&M*s꘨ J%~1q@sРA1vnOsټy̝6mZŕ+W8:ɮMtǎ1c1c:?gΜ DUyȴƛoy/_N-X _͘1#2K.Mϝ;yWNێ;V1ҟSD޸qc~YX1c1c:? DU~󟧠~6>OG>}}>%Z*ڑT~֬Yik׮E s̼S~-Jlя~Ok7ވǏgۑ5u̜937ݞaÆc$_UUo}+rtRsFիWrUŋ'OFފ+ҶSNqvDUQsٲe}.mxb477Gyyy1"ٓʔ~۷GĉcT1Usꐨ ۷}o :4EʂU=XwFf)LD[s קO>[P[[[dId8fuuuQPs꘨ 3<_,>O^|(8|p@`lUUUí~+W6OϢgzԶmb1xt}g9r JyNUUimm/ѷo,:f~;p@3&{Uoذanݺvoͯn?qD9a„xG" 3gƍ6>[Ϟ=kLJXsj ** " *  * " **  ** **  **  ** " *  * " **  ** **  **  ** " *  * u* " **gΜqƍxӧOGUUU?>:rطo_:t(#1KuNUUٿ<#i/_y5551dȐ0ƍK?ܹ .=zۯW^iӦbWs* JЇ` ˖-ꢼяƄ ݻ/B<1~L2%{֭mKq̜ҝSDUDUtEHU?noGzԩS#3slt{~ކ Ҷ"Ssꐨ ʐ!CbĈ?׾"'?u2Bh;wn]z5m7o^d/^^}m'N1'L>hSDUDU5ƍַn?{}WZ1ӱR-9u **  **  ** " *  * " **  ** **  **  ** " *  * " **  ** ** DUUU@TETͦ3;bc4c1ӳ-8" c1c1濘vEj*CnO~"0c1O@TET}/c1cw[@TET}a1c1w*èοb'c1c̋BQPWW1zصkW"t֭ѻwa[ZZ"cڵsGɓ'*++=f^Ωc**wލgF[1r?Fe{֚5k{LCCC3fϞ}_h۷oL4c' 2cƌ%Aرcۻ>|l޼9޹sgM6-ʕ+3TԩŸ{dWͦEcǎc1c13g*cƌIŋ|rz`Ȼ~z-ҥKsEիmaXE3dTt%ƍ-"M1c19x`:RSTd{Vv?n5kVڵk~mE3T  },{xUzSACӿ?~|dLuVd]ʶ8fNΩ(**K,cn6v@M:523gMmذ!m.%?f y.]JΝyW^M͛ŋ'O+VmN*IIT#ug͚1bĈ۳gOPL^o߾=&NDUޞO~PӧP dzU[s>gݽ{7 jkk~J⎙444D]]] ϩc**YLӧOׯ_ :4lUUUhѢٕ+WdP:`8n?qD++JtɁb̘1QVV6lX[.FT%ݾَ.Oۊnܸљ@TUUDUQ@TUDUDUQ@T㗿ekqx^z)NhP Baf`)B!}W8yR0Wc <8@QJoo =)%užbx^Z?_s?9i9пhA 
!=2`Yl=~Zq]>d1϶m۶m۶m۶m۶mR_zWoy$3u?7oW`r?>ȱcǔ}_یXm}CJxdΜYϟ4mT-H0`M߄ Dž؏)SHC# l5k&Z }ĴYׯ_S|]Xy rɒ%ݻ&GAO;vA\?jӦLzyQXlYk8#F3ȕ+ן<{L/A|mF!PT%jJgώ!櫍kDԤ>w\1xZ84h@6l v풪U~qƕ_ cƌQc_P!c&M0L,$NS~B4i҈e*ݻ>wԨQBo/ zy :K? Tƍ'W\'OȖ-[qmҥK')S4"=Ld"رc{*hB$I"'N̙3%bĈq c)A x)BB{ǎrIɝ;16.\랧O 5華ځ7pSkf͚5j2[kT>̋Wcz ! @! D;ȑ#բqƍU=`ҥKΌ 3C;a۷obMQz:/^\ϯ֭[{*Ϙy%J =TP2zhz:Gz7$˗G?.Ɓl-Z4Tԯ_D|mF!PT%,5j$QT)e>|8\Ejwlr΋K,A%WkQqhCq3_ҍ J4WftjBEUBÇUݻc( 0 7*D:H̘1aSUNDG<{ PT>~B 砠/^xFQ|E<‚ k˖-S5kĵo*Q?GU7ocs bnhfC?U!ܢ?5b-Qh{ꕘ۷CmmHaYUیXm}CJڵ LB5e/O!J"6HNC1JJ C9a†TUw#B,mڴ*={H0a;TDb{Ft("P'J(̘1P 6UM^ :b% l\ɓş?r1U=3aCՊ\e6,FD{q,Cj2 π*kaM4 1%VX!UQmfp}9e>}8qy JtWftBEUBİ $|>n^&,X0 #G&8~Ad" _ڵkB=[nU?lRA; AG^b?{-[?l01F;b) sDU&5[Ě!p]|Y̴hmȉm CMuOƌ=zYf 5華ځ7\D L:ըTac ULq 9>F>DDg\%AOD5ϋ5j8l~nݺ'NpiϢע1(QEUD0H*XWT>m3ٲeC ra:e#Gĺ݃96A|mF!PT%ܾ}[̼yQX -Z^+\(# F,Ǝ+ 2 {\'~B w Q%NJg0C*5D>n>P:$ҶӢwI*zX[T~:ws!b' یXe}CJAa >6mB $jܭ[7r|lt?*o V,B{#eʔȏiSQ }sQDy(A{sND`Mn/ȝ+/͛7Qȡ>}uU͚5dɐ!zmҥB\PQa!P V@|HTԩGB}$W!^#տUVB=≣U;q?b}{ADx  ¯[hFG-ї6ȇ)S&D/~fԩ}!AٽE%n_7x:sq @JpQurkτz_]ѵ o!U !䑘ݩE!Ϟ=)1tA$y-6/k+cܙ+RI]م`c_@Wn޼(A;(qA&Nf* * ** * lDlf_[DUxػX9( m۶mm۶m۶m۶ܷݝmӼ3;;ϹE͓AJ@wy7o,]v' $_|I!˗/J.m]?~BDDDD%}.]:eϞ]*T ۷sΉ3 Bfmkזk߿>߱c']V_KʕԦM?$KL6n(ϟ?KŊ%^x4iRY|WPB^=,)RDԩ#3f̐?#TׯGDDDD%Kc[v<2T})1=*> E)SLبQԥK} !ٳgU>J(rU1={LV,1T%"""(f3gN[/_^Efkq*`ƍe„ >ŋj6`u+T=pW9r(hUn5kJܹiҤGyq""""PYD)Sc͚5CUawѢEE U߽{zrc… N5;VkРLJDDDD U=UձeRyiҤd͚UB e-#ϓ'K9өz0 6 F .eUKQa/_>J*pEI:5)X`V?1cƈzk/^\޼y#k.)X ؇۷o2|p‡/"E¯~]}2ĝPjUPu;T}P̙3bB u,nܸEDDDD U" T@.YA*9%J$׮]Ç%UTN۷or"p%?qqSs?.J~:Զ ̰\>&{'mڴ9hImi&{SN :tH :O˗/X1y\JDDDPuֲn:kwZ~@[* D ^za8AҥK<*GQJ˘1c.;x`U7n;K.ՏGgٽ{7چsʥ7L7R8V(QBm&#F@c)_~Ν;mcϞ=%jԨӦM U#Fh5kѣ%F:О1c~6DDDDA*CUw)~zc . 
!9eӧOo:;/$ImUj z vCUg6leŊ N?$ITۥKl^'MT&Tݚ6m>{GXw^!""""DDDD UN'N֭(E2y!L<*TJV6l7ҜzꯪaT"Hu:FǏl^M]>U~}D+582DžDDDD U'NVwaPpm߾}*Jk$~9VPq }FERxLGիb9 m~դIkTT6TTR]vgpf_R9e{^j?c+M65[A]ºܪU&ʕ+q7kLfv JY旦+VL޿/GzDDDD1*aؒ  S3fvljJR֭[ {y]A(&;~zL7 JXo{^ϙ3>!Q9rD'OSx;wn+,4AXDut !۶mx>N?,Y;/˖-8Bc8ǜ9s~8#G*U/zB zjq߭^DDDDDDDDe_Kݻ͛d~ 8s _:/#u/j>zH_xa S@۷c>V&op~sW""""0T%""""ReM6GhСCDDDDܹSŋ2n8};v""""g1T%""""J6-L вz… [߿K7=x@֭[g)/^kZ=o~c'0b%U*T@HU U*T@HU U*T@HU U*T@HU U*T@P;v`0 Da*]+YB6x$祥Tq7^EDDD[rEak$'c`:$RqM@ծ 5}3'F`RRIi (|ctvUvi*Z">k:f-ikA7hV)8QaWs_1MmJ0J@+h^EDD\%B@FN#kJ`Xp(w&`㩎G;`؁8A07& `N>y|'ȷg@G=Ե؂:A?BityYo9<k\ro}eT݆a( [mRL:,/Inib7*S1չ^֞Jk$OҨ~ps.䜜Y{^8,Gˠ>7m"O֞S<ʋ @6chǖA .!`q1 |f2U+6GvD"p&E]DȩԾ }| &QoN.ntx9" I Vq;E{9l.I'>!`>KV3f[^e.b^5D>@8)p1w@D,^UQk?!AC ĖɀȦNGc5SYVsr ,.-q ٠k+ D"ȏ$۩n^"ecJ k.Z>m5_DĒoǦ'(:]Z="/w<`nJG$D"%7QktE @5Dv'N]X3|@&ZYNZ/_;O?vǡٮ {uD"HjE<oDn@$fHX{UmVcA\HT{/HxXwr=}2J֏>nj3} sk7;֏^}KRH$@l&"^UG#{,, ~ekcKv9O{~זs@o֚d58@Y[N4ڄz2+ĭRt_/ԁt3Z{u5'N|]J zoEoVEE{e3ôW {]:w=|bv{b #??A0 8 GCpe\YD` ChѨ. UQQk{mjU6̼×}I{ߦ4` { @ 0@0`0NLO#c\_ʽym>aJ&a+ }{z^`mY>|9|%9WfZ;գ[l 3z<=@7qYh*:u6/uM2 6jЋ뙟g3 zQf1I4kͱX[/3Zȼ_] SlMopoL7Zު-YMӫ{@n氆2\m$s/ham@`Pۭ6Zr?h74ÃQL,gF1ugõ 6318pw?@g*(\iwZ_O Vf:UO [1MVVkɴ I@g>tc@`!l3 `@9bÀ Ef#! ^)&Ccj8höJ^Ly\Gk-:Cu 2fK^jmŤ83ہ^;>:CةD X(S߷ S'mi=3t̐^]`,dv3N:Cf԰wqz_zZ@gȌ;=v3R!3d,װLjz. ΐy4ȜK3djcoj~atЭ-o`Ԯy\Z=!umQتۇ_.3a`.VP351"ZxQ7 aRpW[*ŭQj+㱗ta9F>\Uth߯w=tܡog.bh~}T of~U+fRf|PCF3dz?e^gA̶~'Π> ޑ2j_1sU[mc-еD!cr/?2?0C)̿7Jtt2{T'J/sRΐ qt2;S7ΐh'X Fw^m!+z&:3_dCoModꔫ{@z3>\U>󸨭-`٫hب @ԞhX3`21Bj_̯3_g`Et` ~||7/3?Ӯ^8 3mW3;7 8 ۙ+\f|1B&JD4@=a޿m۶϶m۶m۶m۶m&Iٳu{I~ۙ)rogf 2qRjW                   0,̀  PZuʱUpTu_;O=XU/}o~>kmܭkV={{3.kO|ǿß<1j r_}Ox]n=ɔ}:Ԛt[~d>gPcL 8;"?C Țlw-vv1\>!' 
('knvm YݟuWLfEŷ>q k_hz֙Wl/f^G|:TJ底ݬ@Q_5s.DB@^*9aπ$4h,51,IƝsev,v䣭^|˿3nߋ6ܪeV[?4 Hfܴ,JlɌDM?<7sʾkHƐ N33qH4Ɣ3wRտ%,@B@JبoFRܜg`H@d̬d!}|}3˝j3 o (=nJ/*A| 0{'tԗe_L^gV/B9Y^dI-"WnėQj,jV#cpfJ^y<27_w =3vxҧ"$,9~<= =_*JeCxA[H)OʿY47LLGKflrMDNU.}']6ے6/H[,}*@R0j@@@@@Vc`@@W @WojCf@@@@@@@@@@@@@Rj|:wG#8ͨzgDW)ƨZU |pTmT@@@@@&0 `d{;Fi8L #uޥWp ]Z\T>!C7I%M?xy @|\@ @ @ @c@@6O:] @>`@y>Y_~<^ ]@4yz 8sA0>~O~bv^/eכ>]_Mo{ծv]߀Sf   @@ڷi/{rUfH``H   ӕ( !98a# IN"6<х8&f{|69޺p!ݗu/nĦM }$eA/֭x=ws!w_yu8l3na0 TV-^ܽ`@/ J,ݟ=F#RLJAͱXX}0 !x…W׮>|+z qr<= j JrE?A;'NTtEëWc%^6 Aro zq؋{PSɾxAӾ1]_tw_#&o8p˖ bo\OϞto%2#q4Z,5c%KZ,܇W#qZFS f]6 a6 &rzAFl+áِXg.=m*S!hT.|昭:*@+!yp^qT:) \Qk%\5cPCu2,YCQǫNJ%7Qj0n Ǘf$nH@9d}Wu!{Lu'L( fKd8$uekJ1<ǞNY=2]7 a6 *-Z2U(ow 7pBW~*'*2Z?G%ҪV{ڱ!P$V3يe7yTch;֎1T~!X-Ζd!̈AUcң`*YFٞk!JjaSz$61HU Je1/=zb #y$6!_/^ǹg2:L{=u\ƞm0 Ȍ23ID*AR)V1 0J\(@Ȃȩ[S*v&F" S+׋5J>$VsS2b2uc RbZ)#eNd0`*96h5^DT5>kKXJ/_ fX԰_q5X~p.`ڀta؀ H$6G*u^"bgU*L_[O2Es!TncԐ 0*QΘBr~S=V DHb5s!X/_=Vr"9XMe,C[L0!%ci>OXu-̮iE'SB^V;V22I=>^ܘr. ð5b3aDjZW~O%ʶ_9| P=Vb!T42 ]ppWf SeS8] Uc(XrR3.qRbsXU ~Ǟn*5L3TlG CIs_aj@Uo 1pd0 Ħ*(*-X_=t{#,MBT<2%J9ęp Y!X)0*Qr/?[{b3/C0Q`y?J TSƊMJb18щ!XJΩS偘KbU<> ^_0+C@SczKuL%S+"Tj! 0 AE p"V%jDX|"I-q'TX,b,ʢ$tSc%w`A|sSj*5 q'k^R߲Xm7{/ fCb_OKBTp{CĪDyj*1z)J$V59s2X>JMB@< ==Ir*JMB.^y:y^Iz5߲TF~I݆a6 J>JbLSW8QV} o|Tp|hI ^#nhk>W2015yPbOnp ʧcx5qrQ2Sɾ Ȍ| o|TpҪϺX0qCOd?aMS>,VTT=z7q5bգK֑GM<%SI[sEߑ5VTwR xیl@ 0g@+Q5Xr`R>{f[#rc1;w:7X~53*u:y/U,Q5X-Nb$_Y=ȓO o_:5e{@ 7̛X}{^Ko;o+>~wSc0 aaaFO?iٴ~=Kq;'N4OnڬYyd͚={ol˗77y!,^z<{vBﻧN޸1Gv0evp__8'pp2!\:ystΜmC[b)b`ֱ1lCY+Euyg69Wrm(#~3q`>NHm(+oZ]!,9bl󗜴ytΙ_\?Qn亾 U5'Λku,]/gʠ<3{/_~]ۄ69?lܮߗɇk0/e@@_]X*7 aF zTDRR/%0Clo |j վTT Hْ„[NuB?ZWQI;Ze#2/][b*P@.] 
/_ggcڼU_XfT1G_z5 \m+7&"Ǝ ;"Mbl@ 0:k@h@b7߭UT-LMu2lbq%jl8*$3$Czv1#+ST݀ y, T!l"iAzX\w$3P݀*L1XR݀Pጿl;64Խay6 a7 "h)͸l@ FV.eʬ}nGPcBsb@\#R]0 G!+-_sT*W7 /I5Qyd .|/=sw___H a - @%0ÙRۀ0TK aGqh@h!D1!G1ۍe@p6 v*7z2n rRk 5rUa+w!6$w'<,aҵZzFWK9%z2&EbaM=dSc@t)C -ű*O0.Dy8DbIc*[1_eZp Go5c1$š`8D0m4 0" $8J:_rgcu݀a؀:FZQ=F╛8DyT^* лDI=G 1c ==0,  !a|0o4 qoM%\Pяe@3r xÞiA+ ADXNc6-RwkbU`>/p_839W9GGX&}cL&_8y0c[qC=ɟI/=I03@<_[<=+0 Dl 7lnT Bb.i% /YhU>7ʅʒIVqYn i&+#Q1.1 AGDb- ᗾ#FqN./eQ#9Db%Wټ"]s#E'n=|M$NEC81WmS e0-=_̟̕΋k;//121}bƫIz H5a2BełЙCh˝.ħX9vbEI%yՏsR~JRc5__j0 tձ:Njt qԤaaa aaa aaa6 aaa6 aaa؀aaaݷ 0 0 paaaaE.\h]y|޾ɓ_tl߼[_޿yxEVh߽7Mf9ӬYydކݻXڶٴ~}\xt/sJ>شip.]dpյk䯸ux Oٹsp?ĵ /f󮼈0 ØX}qϞ {.l\ܺ9~@wǎ-C*VyG1Oo[A V/\]SI˷7n.À Y\=}7K6lk!x}[HM^m@ ðbYZzk@csv ~^JƲ+iu>xobe g Z%~ǚ/+cIh}Oo% NBW*&Ͻ7 Ҿ};snX5{l9=%V:0 ð0Κ7k/c}1 ONb=b!J}kUE#n@ ..f@x|Eg&sޗ]{0\a<aFl@0 d]=V՛X_?r!܀a.;BoGJ})Oɲ 0  !Yz Mob%Ɛ=X!;L<&9/Sd~KCCwzSrH.@M #> ;cH H0/yEuo`1SOĸ"Է|gl=,ayL'ы1 a$-KEyһ!,Xv scݧXBsx<ìXcz Fl1azy 0<gcSxڛ.S=+ VW9p_1 ðqEXtd~øk0 0 0 0 0 0 0 0 0 qaaa aaa6 aaa6 aaa؀aaa؀aaa= ?:͋8,XylivY'blYt=2 0ç΋8VXt/x0 2 0 0 0 0 0 0 0 0 0 0 0l@ 0 0 0l@ 0 0 ð1 0 0 ð1 0 0 0 0 0 0 0 0 0 0 0 2 0 0 ð1ϾroL]o~_WN/~1ǩ618Vm\|7UCϑu_qYGĿ ynˋ' mL9>*0+Y_W78۷Ʃ{Fr5 qJ g9K@Y(>mmo_r#1_%byw0 >77e~u== 6߼ټv0w6:4S>ǟ"VΑsX[7(q?شic6 յk3ٴ~}sҥͷ7nTby̙8g߻cG/fG wѢA\ԶmUs֭c-]ܥܻztw/X|}:-iXu\r'2Њl3}SUsΉ+opn[.4~A<֮moU۷s,·ܣ<}_ڷc m= /'p~S%r[ 8٥KZ)G;v`6 jXEE*q*5VQH ʅU+3G+ UEQJBj%k)SIeM>X/8cÏNJɓ%:TQ-q]#$6əz/IahiC`y{eSCj/&۞<[bf=Fس J̳g=kW~* EEg!-\]5 a6 J+Q*HڪK+@%6G'3gjug42[%*5T'8%S}WdX(wb|ڱb*%ܕl\ \2;RY/\tdMe2Me)S}CL'ôRSv\Wy*JT> 2sT7 a6 f+pUZ冉UrzQ^ +@ZЈ!bb5-ĞbD0ieQSUz==Ʉn8QNP u&r(j*JLP*Q^4Qqm*upNΑuXo[9p 7@e6:LTX-+MΟ2w)Şn37 a6 y p*lhz`Yl"V3ش I& hed ԊScce-(T:9 뚩XC*g\= ӰFSYy@pW:g*uҋy?L_aX5yT:z ^XsSɢ9Gq;6Sakl}1 ðQwqe*Tx0fX.9L2/!Hk)Y5XELL 0],-ْXU9R&%J,V/T2L z&RsXMMe|2OtʦZYT1VF[sSF=܇kJ<4B8un4nǽL.<l鉋]4 a6  RlbPA7(xXxdj =W`qT5xqr^22Vq^tJQ"V552Vom3X O2ToxQqĸo)P:Y*P$6 `4\ ;g732Vh45}Gt-g@^3S)$4 a6 Э=Xֵ[b *|{0F:\4aQCn$G.(yX:$S ,%V%IjŚ5fj/SY2qIihdm3arXռtB/'|g R8<+3\ou5=4i&s 55RS+&اk0  
ل#-[*Y.JBHb7`xRF쓽Eq:xA)gB/?`0 ØznӧEI}?~@~g^y߳0 0 0 0 0?|H۷7í[ͥ'>;X9qcǚ/\in~sv=օ +/0u1be-ڵÇ97Αs?:wNw<qQVm^޿'{gΰ<^j^=t(@؟8`׏Ic:$˗_Οs8?s`b!<&_k)9OG‡9fgO.^ُ(7__Z/~zy(Yve$ 0l@XYdɠذv`'nMz}7^ܿti*  [4,\7%$^CCܵq)~A`E/a+/눓x}:a@߽_A18U[Wˢ|%61Uϱ_'ߵ H+% מ٥K3(1 ðپ-VLņX~TlT"x8ձEM]-@ q!vLI[ QR"Vm@ȓ<=yxfNq2B0P󿾣kS"63"Nߑ_DǬm@a/9 u ř vp}}K4Tema0 oyUrnmj/Ʋ!Ysb>վQ9ܺn@ 0l@ =O[ǥhWZd߸G+1*'6mgE2ߒl@SrԎ$Tٷc0ԡu,c=C& l1鶩_S>1wOjoXmbo(dxQ3&~5봷N1Tf[܃|4s~v360 IVӢV:eí}L "*Rc4t#5 mYT^lXѸxBND<81`*@ߘkze{Cu9/r o}1?λ_w_/!kD;cb*0 Z4O*5Z6 Bde'NZ5 N+u \*VDט&}p"_z*Z/Ba?k@H+eZۀP?%]1`8;/\(_ۀh=2-k@01%v݀a؀h1F$VZ+ 5 tӺfTB뱆K=B.9C4$CXp1Lφhm@\4Ab+R=1w_L4El)m)Kmübǀ;[\o@ 0l@T"&cƝ' = x*D8Cd(($>шɫ>,2bzv}Z5s1*SZ㣛J _)<Sv#eXӀ81ƦO_/ۏ_-8"vPN r1 \)1 ð FACSg(8%UDz+DHJeG+>TbFԍe@3zz&9%,ιz}4cT'aCY鱘z+'&S^u<Ӥؔ Abi΍|pe@6)G=h/G^ޘյ)&S 5Kq5G(|zM-u틿6 a -`XYݨ22Zh1v dl6vzia 7iĤI alv^ޕT"15iR9Us'Va;۱ɣ8\\xw-s1VޖMNytsbTNy49Eo&ν;vu}i߾%>ފ͵[*yx\S_ba+0a؀\:ygfʕ#k4kyr…fڵT{lÆɭ[/n_a;gT%Ϲ1ۆ߸Q繖SL&Y1fի~t:g;r:׌%"_wѢ+f\Ol4)^dI߰po{sΟ˗XW  H>P2Wb,(;GrJ˾ #'  P:ctf=AXI_uz7 $%)s"En7@د9io0 /@*Jr\9- ׀|YAH3uLrm/{nR EAuz v8Ɯ^ rd3oRT*"Pm-o-~HHי]01.\`:-?GȪ8~wrY yRT ۹[%)_G$@U `p{)yq♔3@ps)@ج:<. 
md5-simd-1.1.2/chart/server-lanes-example.png000066400000000000000000000152041377566263400210250ustar00rootroot00000000000000
(binary PNG image data omitted)
md5-simd-1.1.2/go.mod000066400000000000000000000001301377566263400142550ustar00rootroot00000000000000
module github.com/minio/md5-simd

go 1.14

require github.com/klauspost/cpuid/v2 v2.0.1

md5-simd-1.1.2/go.sum000066400000000000000000000002611377566263400143070ustar00rootroot00000000000000
github.com/klauspost/cpuid/v2 v2.0.1 h1:lb04bBEJoAoV48eHs4Eq0UyhmJCkRSdIjQ3uS8WJRM4=
github.com/klauspost/cpuid/v2 v2.0.1/go.mod h1:FInQzS24/EEf25PyTYn52gqo7WaD8xa0213Md/qVLRg=

md5-simd-1.1.2/md5-digest_amd64.go000066400000000000000000000077121377566263400164500ustar00rootroot00000000000000
//+build !noasm,!appengine,gc

// Copyright (c) 2020 MinIO Inc. All rights reserved.
// Use of this source code is governed by a license that can be
// found in the LICENSE file.

package md5simd

import (
	"encoding/binary"
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// md5Digest - Type for computing MD5 using either AVX2 or AVX512
type md5Digest struct {
	uid         uint64
	blocksCh    chan blockInput
	cycleServer chan uint64
	x           [BlockSize]byte
	nx          int
	len         uint64
	buffers     <-chan []byte
}

// NewHash - initialize instance for Md5 implementation.
func (s *md5Server) NewHash() Hasher {
	uid := atomic.AddUint64(&s.uidCounter, 1)
	blockCh := make(chan blockInput, buffersPerLane)
	s.newInput <- newClient{
		uid:   uid,
		input: blockCh,
	}
	return &md5Digest{
		uid:         uid,
		buffers:     s.buffers,
		blocksCh:    blockCh,
		cycleServer: s.cycle,
	}
}

// Size - Return size of checksum
func (d *md5Digest) Size() int { return Size }

// BlockSize - Return blocksize of checksum
// (pointer receiver, consistent with the other methods, avoids copying the digest)
func (d *md5Digest) BlockSize() int { return BlockSize }

// Reset - Reset the digest to its initial state
func (d *md5Digest) Reset() {
	if d.blocksCh == nil {
		panic("reset after close")
	}
	d.nx = 0
	d.len = 0
	d.sendBlock(blockInput{uid: d.uid, reset: true}, false)
}

// Write - Write to digest
func (d *md5Digest) Write(p []byte) (nn int, err error) {
	if d.blocksCh == nil {
		return 0, errors.New("md5Digest closed")
	}

	// break input into chunks of maximum internalBlockSize size
	for {
		l := len(p)
		if l > internalBlockSize {
			l = internalBlockSize
		}
		nnn, err := d.write(p[:l])
		if err != nil {
			return nn, err
		}
		nn += nnn
		p = p[l:]

		if len(p) == 0 {
			break
		}
	}
	return
}

func (d *md5Digest) write(p []byte) (nn int, err error) {
	nn = len(p)
	d.len += uint64(nn)
	if d.nx > 0 {
		n := copy(d.x[d.nx:], p)
		d.nx += n
		if d.nx == BlockSize {
			// Create a copy of the overflow buffer in order to send it async over the channel
			// (since we will modify the overflow buffer down below with any access beyond multiples of 64)
			tmp := <-d.buffers
			tmp = tmp[:BlockSize]
			copy(tmp, d.x[:])
			d.sendBlock(blockInput{uid: d.uid, msg: tmp}, len(p)-n < BlockSize)
			d.nx = 0
		}
		p = p[n:]
	}
	if len(p) >= BlockSize {
		n := len(p) &^ (BlockSize - 1)
		buf := <-d.buffers
		buf = buf[:n]
		copy(buf, p)
		d.sendBlock(blockInput{uid: d.uid, msg: buf}, len(p)-n < BlockSize)
		p = p[n:]
	}
	if len(p) > 0 {
		d.nx = copy(d.x[:], p)
	}
	return
}

// Close - Release the digest; it must not be used afterwards
func (d *md5Digest) Close() {
	if d.blocksCh != nil {
		close(d.blocksCh)
		d.blocksCh = nil
	}
}

var sumChPool sync.Pool

func init() {
	sumChPool.New = func() interface{} {
		return make(chan sumResult, 1)
	}
}

// Sum - Return MD5 sum in bytes
func (d *md5Digest) Sum(in []byte)
(result []byte) {
	if d.blocksCh == nil {
		panic("sum after close")
	}

	trail := <-d.buffers
	trail = append(trail[:0], d.x[:d.nx]...)

	length := d.len
	// Padding. Add a 1 bit and 0 bits until 56 bytes mod 64.
	var tmp [64]byte
	tmp[0] = 0x80
	if length%64 < 56 {
		trail = append(trail, tmp[0:56-length%64]...)
	} else {
		trail = append(trail, tmp[0:64+56-length%64]...)
	}

	// Length in bits.
	length <<= 3
	binary.LittleEndian.PutUint64(tmp[:], length) // append length in bits

	trail = append(trail, tmp[0:8]...)
	if len(trail)%BlockSize != 0 {
		panic(fmt.Errorf("internal error: sum block was not aligned. len=%d, nx=%d", len(trail), d.nx))
	}
	sumCh := sumChPool.Get().(chan sumResult)
	d.sendBlock(blockInput{uid: d.uid, msg: trail, sumCh: sumCh}, true)

	sum := <-sumCh
	sumChPool.Put(sumCh)

	return append(in, sum.digest[:]...)
}

// sendBlock will send a block for processing.
// If cycle is true we will block on cycle, otherwise we will only block
// if the block channel is full.
func (d *md5Digest) sendBlock(bi blockInput, cycle bool) {
	if cycle {
		select {
		case d.blocksCh <- bi:
			d.cycleServer <- d.uid
		}
		return
	}
	// Only block on cycle if we filled the buffer
	select {
	case d.blocksCh <- bi:
		return
	default:
		d.cycleServer <- d.uid
		d.blocksCh <- bi
	}
}

md5-simd-1.1.2/md5-server_amd64.go000066400000000000000000000232561377566263400165000ustar00rootroot00000000000000
//+build !noasm,!appengine,gc

// Copyright (c) 2020 MinIO Inc. All rights reserved.
// Use of this source code is governed by a license that can be
// found in the LICENSE file.

package md5simd

import (
	"encoding/binary"
	"fmt"
	"runtime"
	"sync"

	"github.com/klauspost/cpuid/v2"
)

// MD5 initialization constants
const (
	// Lanes is the number of concurrently calculated hashes.
	Lanes = 16

	init0 = 0x67452301
	init1 = 0xefcdab89
	init2 = 0x98badcfe
	init3 = 0x10325476

	// Use scalar routine when below this many lanes
	useScalarBelow = 3
)

// md5ServerUID - Does not start at 0 but next multiple of 16 so as to be able to
// differentiate with default initialisation value of 0
const md5ServerUID = Lanes

const buffersPerLane = 3

// Message to send across input channel
type blockInput struct {
	uid   uint64
	msg   []byte
	sumCh chan sumResult
	reset bool
}

type sumResult struct {
	digest [Size]byte
}

type lanesInfo [Lanes]blockInput

// md5Server - Type to implement parallel handling of MD5 invocations
type md5Server struct {
	uidCounter   uint64
	cycle        chan uint64           // client with uid has update.
	newInput     chan newClient        // Add new client.
	digests      map[uint64][Size]byte // Map of uids to (interim) digest results
	maskRounds16 [16]maskRounds        // Pre-allocated static array for max 16 rounds
	maskRounds8a [8]maskRounds         // Pre-allocated static array for max 8 rounds (1st AVX2 core)
	maskRounds8b [8]maskRounds         // Pre-allocated static array for max 8 rounds (2nd AVX2 core)
	allBufs      []byte                // Preallocated buffer.
	buffers      chan []byte           // Preallocated buffers, sliced from allBufs.

	i8       [2][8][]byte // avx2 temporary vars
	d8a, d8b digest8
	wg       sync.WaitGroup
}

// NewServer - Create new object for parallel processing handling
func NewServer() Server {
	if !cpuid.CPU.Supports(cpuid.AVX2) {
		return &fallbackServer{}
	}
	md5srv := &md5Server{}
	md5srv.digests = make(map[uint64][Size]byte)
	md5srv.newInput = make(chan newClient, Lanes)
	md5srv.cycle = make(chan uint64, Lanes*10)
	md5srv.uidCounter = md5ServerUID - 1
	md5srv.allBufs = make([]byte, 32+buffersPerLane*Lanes*internalBlockSize)
	md5srv.buffers = make(chan []byte, buffersPerLane*Lanes)
	// Fill buffers.
	for i := 0; i < buffersPerLane*Lanes; i++ {
		s := 32 + i*internalBlockSize
		md5srv.buffers <- md5srv.allBufs[s : s+internalBlockSize : s+internalBlockSize]
	}

	// Start a single thread for reading from the input channel
	go md5srv.process(md5srv.newInput)
	return md5srv
}

type newClient struct {
	uid   uint64
	input chan blockInput
}

// process - Sole handler for reading from the input channel.
func (s *md5Server) process(newClients chan newClient) {
	// To fill up as many lanes as possible:
	//
	// 1. Wait for a cycle id.
	// 2. If not already in a lane, add, otherwise leave on channel
	// 3. Start timer
	// 4. Check if lanes is full, if so, goto 10 (process).
	// 5. If timeout, goto 10.
	// 6. Wait for new id (goto 2) or timeout (goto 10).
	// 10. Process.
	// 11. Check all input if there is already input, if so add to lanes.
	// 12. Goto 1

	// lanes contains the lanes.
	var lanes lanesInfo
	// lanesFilled contains the number of filled lanes for current cycle.
	var lanesFilled int
	// clients contains active clients
	var clients = make(map[uint64]chan blockInput, Lanes)

	addToLane := func(uid uint64) {
		cl, ok := clients[uid]
		if !ok {
			// Unknown client. Maybe it was already removed.
			return
		}
		// Check if we already have it.
		for _, lane := range lanes[:lanesFilled] {
			if lane.uid == uid {
				return
			}
		}
		// Continue until we get a block or there is nothing on channel
		for {
			select {
			case block, ok := <-cl:
				if !ok {
					// Client disconnected. A receive from a closed channel yields
					// the zero value, so block.uid is 0 here; delete by the
					// client's own uid instead.
					delete(clients, uid)
					return
				}
				if block.uid != uid {
					panic(fmt.Errorf("uid mismatch, %d (block) != %d (client)", block.uid, uid))
				}
				// If reset message, drop any interim digest and keep reading
				if block.reset {
					delete(s.digests, uid)
					continue
				}

				// If requesting sum, we will need to maintain state.
				if block.sumCh != nil {
					var dig digest
					d, ok := s.digests[uid]
					if ok {
						dig.s[0] = binary.LittleEndian.Uint32(d[0:4])
						dig.s[1] = binary.LittleEndian.Uint32(d[4:8])
						dig.s[2] = binary.LittleEndian.Uint32(d[8:12])
						dig.s[3] = binary.LittleEndian.Uint32(d[12:16])
					} else {
						dig.s[0], dig.s[1], dig.s[2], dig.s[3] = init0, init1, init2, init3
					}

					sum := sumResult{}
					// Add end block to current digest.
					blockScalar(&dig.s, block.msg)

					binary.LittleEndian.PutUint32(sum.digest[0:], dig.s[0])
					binary.LittleEndian.PutUint32(sum.digest[4:], dig.s[1])
					binary.LittleEndian.PutUint32(sum.digest[8:], dig.s[2])
					binary.LittleEndian.PutUint32(sum.digest[12:], dig.s[3])
					block.sumCh <- sum
					if block.msg != nil {
						s.buffers <- block.msg
					}
					continue
				}
				if len(block.msg) == 0 {
					continue
				}
				lanes[lanesFilled] = block
				lanesFilled++
				return
			default:
				return
			}
		}
	}
	addNewClient := func(cl newClient) {
		if _, ok := clients[cl.uid]; ok {
			panic("internal error: duplicate client registration")
		}
		clients[cl.uid] = cl.input
	}

	allLanesFilled := func() bool {
		return lanesFilled == Lanes || lanesFilled >= len(clients)
	}

	for {
		// Step 1.
		for lanesFilled == 0 {
			select {
			case cl, ok := <-newClients:
				if !ok {
					return
				}
				addNewClient(cl)
				// Check if it already sent a payload.
				addToLane(cl.uid)
				continue
			case uid := <-s.cycle:
				addToLane(uid)
			}
		}

	fillLanes:
		for !allLanesFilled() {
			select {
			case cl, ok := <-newClients:
				if !ok {
					return
				}
				addNewClient(cl)

			case uid := <-s.cycle:
				addToLane(uid)
			default:
				// Nothing more queued...
				break fillLanes
			}
		}

		// If we did not fill all lanes, check if there is more waiting
		if !allLanesFilled() {
			runtime.Gosched()
			for uid := range clients {
				addToLane(uid)
				if allLanesFilled() {
					break
				}
			}
		}
		if false {
			if !allLanesFilled() {
				fmt.Println("Not all lanes filled", lanesFilled, "of", len(clients))
				//pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)
			} else if true {
				fmt.Println("all lanes filled")
			}
		}
		// Process the lanes we could collect
		s.blocks(lanes[:lanesFilled])

		// Clear lanes...
		lanesFilled = 0
		// Add all current queued
		for uid := range clients {
			addToLane(uid)
			if allLanesFilled() {
				break
			}
		}
	}
}

func (s *md5Server) Close() {
	if s.newInput != nil {
		close(s.newInput)
		s.newInput = nil
	}
}

// Invoke assembly and send results back
func (s *md5Server) blocks(lanes []blockInput) {
	if len(lanes) < useScalarBelow {
		// Use scalar routine when below this many lanes
		switch len(lanes) {
		case 0:
		case 1:
			lane := lanes[0]
			var d digest
			a, ok := s.digests[lane.uid]
			if ok {
				d.s[0] = binary.LittleEndian.Uint32(a[0:4])
				d.s[1] = binary.LittleEndian.Uint32(a[4:8])
				d.s[2] = binary.LittleEndian.Uint32(a[8:12])
				d.s[3] = binary.LittleEndian.Uint32(a[12:16])
			} else {
				d.s[0] = init0
				d.s[1] = init1
				d.s[2] = init2
				d.s[3] = init3
			}
			if len(lane.msg) > 0 {
				// Update...
				blockScalar(&d.s, lane.msg)
			}
			dig := [Size]byte{}
			binary.LittleEndian.PutUint32(dig[0:], d.s[0])
			binary.LittleEndian.PutUint32(dig[4:], d.s[1])
			binary.LittleEndian.PutUint32(dig[8:], d.s[2])
			binary.LittleEndian.PutUint32(dig[12:], d.s[3])
			s.digests[lane.uid] = dig

			if lane.msg != nil {
				s.buffers <- lane.msg
			}
			lanes[0] = blockInput{}

		default:
			s.wg.Add(len(lanes))
			var results [useScalarBelow]digest
			for i := range lanes {
				lane := lanes[i]
				go func(i int) {
					var d digest
					defer s.wg.Done()
					a, ok := s.digests[lane.uid]
					if ok {
						d.s[0] = binary.LittleEndian.Uint32(a[0:4])
						d.s[1] = binary.LittleEndian.Uint32(a[4:8])
						d.s[2] = binary.LittleEndian.Uint32(a[8:12])
						d.s[3] = binary.LittleEndian.Uint32(a[12:16])
					} else {
						d.s[0] = init0
						d.s[1] = init1
						d.s[2] = init2
						d.s[3] = init3
					}
					if len(lane.msg) == 0 {
						results[i] = d
						return
					}
					// Update...
					blockScalar(&d.s, lane.msg)
					results[i] = d
				}(i)
			}
			s.wg.Wait()
			for i, lane := range lanes {
				dig := [Size]byte{}
				binary.LittleEndian.PutUint32(dig[0:], results[i].s[0])
				binary.LittleEndian.PutUint32(dig[4:], results[i].s[1])
				binary.LittleEndian.PutUint32(dig[8:], results[i].s[2])
				binary.LittleEndian.PutUint32(dig[12:], results[i].s[3])
				s.digests[lane.uid] = dig

				if lane.msg != nil {
					s.buffers <- lane.msg
				}
				lanes[i] = blockInput{}
			}
		}
		return
	}

	inputs := [16][]byte{}
	for i := range lanes {
		inputs[i] = lanes[i].msg
	}

	// Collect active digests...
	state := s.getDigests(lanes)
	// Process all lanes...
	s.blockMd5_x16(&state, inputs, len(lanes) <= 8)

	for i, lane := range lanes {
		uid := lane.uid
		dig := [Size]byte{}
		binary.LittleEndian.PutUint32(dig[0:], state.v0[i])
		binary.LittleEndian.PutUint32(dig[4:], state.v1[i])
		binary.LittleEndian.PutUint32(dig[8:], state.v2[i])
		binary.LittleEndian.PutUint32(dig[12:], state.v3[i])

		s.digests[uid] = dig
		if lane.msg != nil {
			s.buffers <- lane.msg
		}
		lanes[i] = blockInput{}
	}
}

func (s *md5Server) getDigests(lanes []blockInput) (d digest16) {
	for i, lane := range lanes {
		a, ok := s.digests[lane.uid]
		if ok {
			d.v0[i] = binary.LittleEndian.Uint32(a[0:4])
			d.v1[i] = binary.LittleEndian.Uint32(a[4:8])
			d.v2[i] = binary.LittleEndian.Uint32(a[8:12])
			d.v3[i] = binary.LittleEndian.Uint32(a[12:16])
		} else {
			d.v0[i] = init0
			d.v1[i] = init1
			d.v2[i] = init2
			d.v3[i] = init3
		}
	}
	return
}

md5-simd-1.1.2/md5-server_fallback.go000066400000000000000000000005141377566263400173140ustar00rootroot00000000000000
//+build !amd64 appengine !gc noasm

// Copyright (c) 2020 MinIO Inc. All rights reserved.
// Use of this source code is governed by a license that can be
// found in the LICENSE file.

package md5simd

// NewServer - Create new object for parallel processing handling
func NewServer() *fallbackServer {
	return &fallbackServer{}
}

md5-simd-1.1.2/md5-util_amd64.go000066400000000000000000000037751377566263400161530ustar00rootroot00000000000000
//+build !noasm,!appengine,gc

// Copyright (c) 2020 MinIO Inc. All rights reserved.
// Use of this source code is governed by a license that can be
// found in the LICENSE file.

package md5simd

// Helper struct for sorting blocks based on length
type lane struct {
	len uint
	pos uint
}

type digest struct {
	s [4]uint32
}

// Helper struct for generating number of rounds in combination with mask for valid lanes
type maskRounds struct {
	mask   uint64
	rounds uint64
}

func generateMaskAndRounds8(input [8][]byte, mr *[8]maskRounds) (rounds int) {
	// Sort on blocks length small to large
	var sorted [8]lane
	for c, inpt := range input[:] {
		sorted[c] = lane{uint(len(inpt)), uint(c)}
		for i := c - 1; i >= 0; i-- {
			// swap so largest is at the end...
			if sorted[i].len > sorted[i+1].len {
				sorted[i], sorted[i+1] = sorted[i+1], sorted[i]
				continue
			}
			break
		}
	}

	// Create mask array including 'rounds' (of processing blocks of 64 bytes) between masks
	m, round := uint64(0xff), uint64(0)

	for _, s := range sorted[:] {
		if s.len > 0 {
			if uint64(s.len)>>6 > round {
				mr[rounds] = maskRounds{m, (uint64(s.len) >> 6) - round}
				rounds++
			}
			round = uint64(s.len) >> 6
		}
		m = m & ^(1 << uint(s.pos))
	}
	return
}

func generateMaskAndRounds16(input [16][]byte, mr *[16]maskRounds) (rounds int) {
	// Sort on blocks length small to large
	var sorted [16]lane
	for c, inpt := range input[:] {
		sorted[c] = lane{uint(len(inpt)), uint(c)}
		for i := c - 1; i >= 0; i-- {
			// swap so largest is at the end...
			if sorted[i].len > sorted[i+1].len {
				sorted[i], sorted[i+1] = sorted[i+1], sorted[i]
				continue
			}
			break
		}
	}

	// Create mask array including 'rounds' (of processing blocks of 64 bytes) between masks
	m, round := uint64(0xffff), uint64(0)

	for _, s := range sorted[:] {
		if s.len > 0 {
			if uint64(s.len)>>6 > round {
				mr[rounds] = maskRounds{m, (uint64(s.len) >> 6) - round}
				rounds++
			}
			round = uint64(s.len) >> 6
		}
		m = m & ^(1 << uint(s.pos))
	}
	return
}

md5-simd-1.1.2/md5-util_amd64_test.go000066400000000000000000000042421377566263400172000ustar00rootroot00000000000000
//+build !noasm,!appengine,gc

// Copyright (c) 2020 MinIO Inc. All rights reserved.
// Use of this source code is governed by a license that can be
// found in the LICENSE file.

package md5simd

import (
	"reflect"
	"testing"
)

type maskTest struct {
	in  [8]int
	out []maskRounds
}

var goldenMask = []maskTest{
	{[8]int{0, 0, 0, 0, 0, 0, 0, 0}, []maskRounds{}},
	{[8]int{64, 0, 64, 0, 64, 0, 64, 0}, []maskRounds{{0x55, 1}}},
	{[8]int{0, 64, 0, 64, 0, 64, 0, 64}, []maskRounds{{0xaa, 1}}},
	{[8]int{64, 64, 64, 64, 64, 64, 64, 64}, []maskRounds{{0xff, 1}}},
	{[8]int{128, 128, 128, 128, 128, 128, 128, 128}, []maskRounds{{0xff, 2}}},
	{[8]int{64, 128, 64, 128, 64, 128, 64, 128}, []maskRounds{{0xff, 1}, {0xaa, 1}}},
	{[8]int{128, 64, 128, 64, 128, 64, 128, 64}, []maskRounds{{0xff, 1}, {0x55, 1}}},
	{[8]int{64, 192, 64, 192, 64, 192, 64, 192}, []maskRounds{{0xff, 1}, {0xaa, 2}}},
	{[8]int{0, 64, 128, 0, 64, 128, 0, 64}, []maskRounds{{0xb6, 1}, {0x24, 1}}},
	{[8]int{1 * 64, 2 * 64, 3 * 64, 4 * 64, 5 * 64, 6 * 64, 7 * 64, 8 * 64}, []maskRounds{{0xff, 1}, {0xfe, 1}, {0xfc, 1}, {0xf8, 1}, {0xf0, 1}, {0xe0, 1}, {0xc0, 1}, {0x80, 1}}},
	{[8]int{2 * 64, 1 * 64, 3 * 64, 4 * 64, 5 * 64, 6 * 64, 7 * 64, 8 * 64}, []maskRounds{{0xff, 1}, {0xfd, 1}, {0xfc, 1}, {0xf8, 1}, {0xf0, 1}, {0xe0, 1}, {0xc0, 1}, {0x80, 1}}},
	{[8]int{10 * 64, 20 * 64, 30 * 64, 40 * 64, 50 * 64, 60 * 64, 70 * 64, 80 * 64}, []maskRounds{{0xff, 10}, {0xfe, 10}, {0xfc, 10}, {0xf8,
10}, {0xf0, 10}, {0xe0, 10}, {0xc0, 10}, {0x80, 10}}},
	{[8]int{10 * 64, 19 * 64, 27 * 64, 34 * 64, 40 * 64, 45 * 64, 49 * 64, 52 * 64}, []maskRounds{{0xff, 10}, {0xfe, 9}, {0xfc, 8}, {0xf8, 7}, {0xf0, 6}, {0xe0, 5}, {0xc0, 4}, {0x80, 3}}},
}

func TestGenerateMaskAndRounds(t *testing.T) {
	input := [8][]byte{}
	maskRound := [8]maskRounds{}
	for gcase, g := range goldenMask {
		for i, l := range g.in {
			buf := make([]byte, l)
			input[i] = buf[:]
		}

		rounds := generateMaskAndRounds8(input, &maskRound)
		mr := make([]maskRounds, 0, 8)
		for r := 0; r < rounds; r++ {
			mr = append(mr, maskRound[r])
		}

		if !reflect.DeepEqual(mr, g.out) {
			t.Fatalf("case %d: got %04x\n want %04x", gcase, mr, g.out)
		}
	}
}
md5-simd-1.1.2/md5.go000066400000000000000000000021421377566263400141700ustar00rootroot00000000000000
package md5simd

import (
	"crypto/md5"
	"hash"
	"sync"
)

const (
	// The blocksize of MD5 in bytes.
	BlockSize = 64

	// The size of an MD5 checksum in bytes.
	Size = 16

	// internalBlockSize is the internal block size.
	internalBlockSize = 32 << 10
)

type Server interface {
	NewHash() Hasher
	Close()
}

type Hasher interface {
	hash.Hash
	Close()
}

// StdlibHasher returns a Hasher that uses the stdlib for hashing.
// Used hashers are stored in a pool for fast reuse.
func StdlibHasher() Hasher {
	// Get reuses a hasher returned to the pool by Close (which resets it first)
	// and falls back to the pool's New function when the pool is empty.
	return &md5Wrapper{Hash: md5Pool.Get().(hash.Hash)}
}

// md5Wrapper is a wrapper around the builtin hasher.
type md5Wrapper struct {
	hash.Hash
}

var md5Pool = sync.Pool{New: func() interface{} {
	return md5.New()
}}

// fallbackServer - Fallback when no assembly is available.
type fallbackServer struct {
}

// NewHash returns a Hasher backed by the standard library crypto/md5 implementation.
func (s *fallbackServer) NewHash() Hasher {
	// Get reuses a hasher returned to the pool by Close and falls back to
	// the pool's New function when the pool is empty.
	return &md5Wrapper{Hash: md5Pool.Get().(hash.Hash)}
}

func (s *fallbackServer) Close() {
}

// Close resets the wrapped hasher and returns it to the pool.
func (m *md5Wrapper) Close() {
	if m.Hash != nil {
		m.Reset()
		md5Pool.Put(m.Hash)
		m.Hash = nil
	}
}
md5-simd-1.1.2/md5_amd64_test.go000066400000000000000000000143121377566263400162240ustar00rootroot00000000000000
//+build !noasm,!appengine,gc

// Copyright (c) 2020 MinIO Inc. All rights reserved.
// Use of this source code is governed by a license that can be
// found in the LICENSE file.

package md5simd

import (
	"bytes"
	"hash"
	"runtime"
	"sync"
	"testing"

	"github.com/klauspost/cpuid/v2"
)

const benchmarkWithSum = true

func BenchmarkAvx512(b *testing.B) {
	if !hasAVX512 {
		b.SkipNow()
	}

	b.Run("32KB", func(b *testing.B) { benchmarkSingle(b, 32*1024) })
	b.Run("64KB", func(b *testing.B) { benchmarkSingle(b, 64*1024) })
	b.Run("128KB", func(b *testing.B) { benchmarkSingle(b, 128*1024) })
	b.Run("256KB", func(b *testing.B) { benchmarkSingle(b, 256*1024) })
	b.Run("512KB", func(b *testing.B) { benchmarkSingle(b, 512*1024) })
	b.Run("1MB", func(b *testing.B) { benchmarkSingle(b, 1024*1024) })
	b.Run("2MB", func(b *testing.B) { benchmarkSingle(b, 2*1024*1024) })
	b.Run("4MB", func(b *testing.B) { benchmarkSingle(b, 4*1024*1024) })
	b.Run("8MB", func(b *testing.B) { benchmarkSingle(b, 8*1024*1024) })
}

func BenchmarkAvx512Parallel(b *testing.B) {
	if !hasAVX512 {
		b.SkipNow()
	}

	b.Run("32KB", func(b *testing.B) { benchmarkParallel(b, 32*1024) })
	b.Run("64KB", func(b *testing.B) { benchmarkParallel(b, 64*1024) })
	b.Run("128KB", func(b *testing.B) { benchmarkParallel(b, 128*1024) })
	b.Run("256KB", func(b *testing.B) { benchmarkParallel(b, 256*1024) })
	b.Run("512KB", func(b *testing.B) { benchmarkParallel(b, 512*1024) })
	b.Run("1MB", func(b *testing.B) { benchmarkParallel(b, 1024*1024) })
	b.Run("2MB", func(b *testing.B) { benchmarkParallel(b, 2*1024*1024) })
	b.Run("4MB",
func(b *testing.B) { benchmarkParallel(b, 4*1024*1024) })
	b.Run("8MB", func(b *testing.B) { benchmarkParallel(b, 8*1024*1024) })
}

func benchmarkSingle(b *testing.B, blockSize int) {
	server := NewServer()
	defer server.Close()
	h16 := [16]hash.Hash{}
	input := [16][]byte{}
	for i := range h16 {
		h16[i] = server.NewHash()
		input[i] = bytes.Repeat([]byte{0x61 + byte(i)}, blockSize)
	}

	// Technically this uses up to 2 cores, but it is the throughput of a single server.
	b.SetBytes(int64(blockSize * 16))
	b.ReportAllocs()
	b.ResetTimer()

	var tmp [Size]byte
	for j := 0; j < b.N; j++ {
		var wg sync.WaitGroup
		wg.Add(16)
		for i := range h16 {
			go func(i int) {
				// write to all concurrently
				defer wg.Done()
				h16[i].Reset()
				h16[i].Write(input[i])
				if benchmarkWithSum {
					_ = h16[i].Sum(tmp[:0])
				}
			}(i)
		}
		wg.Wait()
	}
}

func benchmarkSingleWriter(b *testing.B, blockSize int) {
	server := NewServer()
	defer server.Close()
	h := server.NewHash()
	input := bytes.Repeat([]byte{0x61}, blockSize)

	b.SetBytes(int64(blockSize))
	b.ReportAllocs()
	b.ResetTimer()

	var tmp [Size]byte
	for j := 0; j < b.N; j++ {
		h.Write(input)
		if benchmarkWithSum {
			_ = h.Sum(tmp[:0])
		}
	}
}

func benchmarkParallel(b *testing.B, blockSize int) {
	// We write input 16x per loop.
	// Input, server and hashers are allocated separately for each parallel worker.
	b.SetBytes(int64(blockSize * 16))
	b.ReportAllocs()
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		input := bytes.Repeat([]byte{0x61}, blockSize)
		server := NewServer()
		defer server.Close()
		var h16 [16]Hasher
		for i := range h16 {
			h16[i] = server.NewHash()
			defer h16[i].Close()
		}

		var tmp [Size]byte
		for pb.Next() {
			var wg sync.WaitGroup
			wg.Add(16)
			for i := range h16 {
				// Write to all concurrently
				go func(i int) {
					defer wg.Done()
					h16[i].Reset()
					h16[i].Write(input)
					if benchmarkWithSum {
						_ = h16[i].Sum(tmp[:0])
					}
				}(i)
			}
			wg.Wait()
		}
	})
}

func BenchmarkAvx2(b *testing.B) {
	// Make sure AVX512 is disabled
	restore := hasAVX512
	hasAVX512 = false

	b.Run("32KB", func(b *testing.B) { benchmarkSingle(b, 32*1024) })
	b.Run("64KB", func(b *testing.B) { benchmarkSingle(b, 64*1024) })
	b.Run("128KB", func(b *testing.B) { benchmarkSingle(b, 128*1024) })
	b.Run("256KB", func(b *testing.B) { benchmarkSingle(b, 256*1024) })
	b.Run("512KB", func(b *testing.B) { benchmarkSingle(b, 512*1024) })
	b.Run("1MB", func(b *testing.B) { benchmarkSingle(b, 1024*1024) })
	b.Run("2MB", func(b *testing.B) { benchmarkSingle(b, 2*1024*1024) })
	b.Run("4MB", func(b *testing.B) { benchmarkSingle(b, 4*1024*1024) })
	b.Run("8MB", func(b *testing.B) { benchmarkSingle(b, 8*1024*1024) })

	hasAVX512 = restore
}

func BenchmarkAvx2Parallel(b *testing.B) {
	if !cpuid.CPU.Supports(cpuid.AVX2) {
		b.SkipNow()
	}
	restore := hasAVX512

	// Make sure AVX512 is disabled
	hasAVX512 = false
	b.SetParallelism((runtime.GOMAXPROCS(0) + 1) / 2)

	b.Run("32KB", func(b *testing.B) { benchmarkParallel(b, 32*1024) })
	b.Run("64KB", func(b *testing.B) { benchmarkParallel(b, 64*1024) })
	b.Run("128KB", func(b *testing.B) { benchmarkParallel(b, 128*1024) })
	b.Run("256KB", func(b *testing.B) { benchmarkParallel(b, 256*1024) })
	b.Run("512KB", func(b *testing.B) { benchmarkParallel(b, 512*1024) })
	b.Run("1MB", func(b *testing.B) { benchmarkParallel(b, 1024*1024) })
	b.Run("2MB", func(b *testing.B) { benchmarkParallel(b,
2*1024*1024) })
	b.Run("4MB", func(b *testing.B) { benchmarkParallel(b, 4*1024*1024) })
	b.Run("8MB", func(b *testing.B) { benchmarkParallel(b, 8*1024*1024) })

	hasAVX512 = restore
}

// BenchmarkAvx2SingleWriter will benchmark the speed having only a single writer
// writing blocks with the specified size.
// This is pretty much the worst case scenario.
func BenchmarkAvx2SingleWriter(b *testing.B) {
	// Make sure AVX512 is disabled
	restore := hasAVX512
	hasAVX512 = false

	b.Run("32KB", func(b *testing.B) { benchmarkSingleWriter(b, 32*1024) })
	b.Run("64KB", func(b *testing.B) { benchmarkSingleWriter(b, 64*1024) })
	b.Run("128KB", func(b *testing.B) { benchmarkSingleWriter(b, 128*1024) })
	b.Run("256KB", func(b *testing.B) { benchmarkSingleWriter(b, 256*1024) })
	b.Run("512KB", func(b *testing.B) { benchmarkSingleWriter(b, 512*1024) })
	b.Run("1MB", func(b *testing.B) { benchmarkSingleWriter(b, 1024*1024) })
	b.Run("2MB", func(b *testing.B) { benchmarkSingleWriter(b, 2*1024*1024) })
	b.Run("4MB", func(b *testing.B) { benchmarkSingleWriter(b, 4*1024*1024) })
	b.Run("8MB", func(b *testing.B) { benchmarkSingleWriter(b, 8*1024*1024) })

	hasAVX512 = restore
}
md5-simd-1.1.2/md5_test.go000066400000000000000000000265141377566263400152400ustar00rootroot00000000000000
// Copyright (c) 2020 MinIO Inc. All rights reserved.
// Use of this source code is governed by a license that can be
// found in the LICENSE file.
package md5simd import ( "bytes" "crypto/md5" "encoding/hex" "fmt" "hash" "io" "math/rand" "runtime" "sync" "testing" ) type md5Test struct { in string want string } var golden = []md5Test{ {"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "014842d480b571495a4a0363793f7367"}, {"bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb", "0b649bcb5a82868817fec9a6e709d233"}, {"cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc", "bcd5708ed79b18f0f0aaa27fd0056d86"}, {"dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd", "e987c862fbd2f2f0ca859cb8d7806bf3"}, {"eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee", "982731671f0cd82cafce8d96a98e7a48"}, {"ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff", "baf13e8b16d8c06324d7c9ab32cb7ff0"}, {"gggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggggg", "8ea3109cbd951bba1ace2f401a784ae4"}, {"hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh", "d141045bfb385cad357e7c39c60e5da0"}, {"", "d41d8cd98f00b204e9800998ecf8427e"}, {"a", "0cc175b9c0f1b6a831c399e269772661"}, {"ab", "187ef4436122d1cc2f40dc2b92f0eba0"}, {"abc", "900150983cd24fb0d6963f7d28e17f72"}, {"abcd", "e2fc714c4727ee9395f324cd2e7f331f"}, {"abcde", "ab56b4d92b40713acc5af89985d4b786"}, {"abcdef", "e80b5017098950fc58aad83c8c14978e"}, {"abcdefg", "7ac66c0f148de9519b8bd264312c4d64"}, {"abcdefgh", "e8dc4081b13434b45189a720b77b6818"}, {"abcdefghi", "8aa99b1f439ff71293e95357bac6fd94"}, {"abcdefghij", "a925576942e94b2ef57a066101b48876"}, {"Discard medicine more than two years old.", "d747fc1719c7eacb84058196cfe56d57"}, {"He who has a shady past knows that nice guys finish last.", "bff2dcb37ef3a44ba43ab144768ca837"}, {"I wouldn't marry him with a ten foot pole.", "0441015ecb54a7342d017ed1bcfdbea5"}, {"Free! Free!/A trip/to Mars/for 900/empty jars/Burma Shave", "9e3cac8e9e9757a60c3ea391130d3689"}, {"The days of the digital watch are numbered. 
-Tom Stoppard", "a0f04459b031f916a59a35cc482dc039"}, {"Nepal premier won't resign.", "e7a48e0fe884faf31475d2a04b1362cc"}, {"For every action there is an equal and opposite government program.", "637d2fe925c07c113800509964fb0e06"}, {"His money is twice tainted: 'taint yours and 'taint mine.", "834a8d18d5c6562119cf4c7f5086cb71"}, {"There is no reason for any individual to have a computer in their home. -Ken Olsen, 1977", "de3a4d2fd6c73ec2db2abad23b444281"}, {"It's a tiny change to the code and not completely disgusting. - Bob Manchek", "acf203f997e2cf74ea3aff86985aefaf"}, {"size: a.out: bad magic", "e1c1384cb4d2221dfdd7c795a4222c9a"}, {"The major problem is with sendmail. -Mark Horton", "c90f3ddecc54f34228c063d7525bf644"}, {"Give me a rock, paper and scissors and I will move the world. CCFestoon", "cdf7ab6c1fd49bd9933c43f3ea5af185"}, {"If the enemy is within range, then so are you.", "83bc85234942fc883c063cbd7f0ad5d0"}, {"It's well we cannot hear the screams/That we create in others' dreams.", "277cbe255686b48dd7e8f389394d9299"}, {"You remind me of a TV show, but that's all right: I watch it anyway.", "fd3fb0a7ffb8af16603f3d3af98f8e1f"}, {"C is as portable as Stonehedge!!", "469b13a78ebf297ecda64d4723655154"}, {"Even if I could be Shakespeare, I think I should still choose to be Faraday. - A. Huxley", "63eb3a2f466410104731c4b037600110"}, {"The fugacity of a constituent in a mixture of gases at a given temperature is proportional to its mole fraction. Lewis-Randall Rule", "72c2ed7592debca1c90fc0100f931a2f"}, {"How can you write a big system without C++? 
-Paul Glick", "132f7619d33b523b1d9e5bd8e0928355"},
	{"", "d41d8cd98f00b204e9800998ecf8427e"},
}

func testGolden16(t *testing.T, megabyte int) {
	server := NewServer()
	defer server.Close()
	h16 := [16]hash.Hash{}
	input := [16][]byte{}
	for i := range h16 {
		h16[i] = server.NewHash()
		input[i] = bytes.Repeat([]byte{0x61 + byte(i)}, megabyte*1024*1024)
	}

	for i := range h16 {
		h16[i].Write(input[i])
	}

	for i := range h16 {
		digest := h16[i].Sum([]byte{})
		got := fmt.Sprintf("%x\n", digest)

		// Compare against the standard library implementation.
		h := md5.New()
		h.Write(input[i])
		want := fmt.Sprintf("%x\n", h.Sum(nil))

		if got != want {
			t.Errorf("TestGolden16[%d], got %v, want %v", i, got, want)
		}
	}
}

func TestGolden16(t *testing.T) {
	t.Run("1MB", func(t *testing.T) { testGolden16(t, 1) })
	t.Run("2MB", func(t *testing.T) { testGolden16(t, 2) })
}

func TestGolangGolden16(t *testing.T) {
	server := NewServer()
	defer server.Close()
	h16 := [16]Hasher{}
	for i := range h16 {
		h16[i] = server.NewHash()
		defer h16[i].Close()
	}

	// Skip the first 8 entries so the remainder forms exactly 2 rounds of 16 test vectors
	golden16 := golden[8:]

	for tc := 0; tc < len(golden16); tc += 16 {
		for i := range h16 {
			h16[i].Reset()
			h16[i].Write([]byte(golden16[tc+i].in))
		}

		for i := range h16 {
			digest := h16[i].Sum([]byte{})
			if fmt.Sprintf("%x", digest) != golden16[tc+i].want {
				t.Errorf("TestGolangGolden16[%d], got %v, want %v, uid:%+v", tc+i, fmt.Sprintf("%x", digest), golden16[tc+i].want, h16[i])
			}
		}
	}
}

func testMultipleSums(t *testing.T, incr, incr2 int) {
	server := NewServer()
	defer server.Close()

	h := server.NewHash()
	var tmp [Size]byte

	h.Write(bytes.Repeat([]byte{0x61}, 64+incr))
	digestMiddle1 := fmt.Sprintf("%x", h.Sum(tmp[:0]))
	digestMiddle1b := fmt.Sprintf("%x", h.Sum(tmp[:0]))
	if digestMiddle1 != digestMiddle1b {
		t.Errorf("TestMultipleSums, got %s, want %s", digestMiddle1, digestMiddle1b)
	}
	h.Write(bytes.Repeat([]byte{0x62}, 64+incr2))
	digestMiddle2 := fmt.Sprintf("%x", h.Sum(tmp[:0]))
	digestMiddle2b := fmt.Sprintf("%x", h.Sum(tmp[:0]))
	if digestMiddle2 != digestMiddle2b {
		t.Errorf("TestMultipleSums, got %s,
want %s", digestMiddle2, digestMiddle2b)
	}
	h.Write(bytes.Repeat([]byte{0x63}, 64))
	digestFinal := fmt.Sprintf("%x", h.Sum(tmp[:0]))

	h2 := md5.New()
	h2.Write(bytes.Repeat([]byte{0x61}, 64+incr))
	digestCryptoMiddle1 := fmt.Sprintf("%x", h2.Sum(tmp[:0]))
	if digestMiddle1 != digestCryptoMiddle1 {
		t.Errorf("TestMultipleSums, got %s, want %s", digestMiddle1, digestCryptoMiddle1)
	}

	h2.Write(bytes.Repeat([]byte{0x62}, 64+incr2))
	digestCryptoMiddle2 := fmt.Sprintf("%x", h2.Sum(tmp[:0]))
	if digestMiddle2 != digestCryptoMiddle2 {
		t.Errorf("TestMultipleSums, got %s, want %s", digestMiddle2, digestCryptoMiddle2)
	}

	h2.Write(bytes.Repeat([]byte{0x63}, 64))
	digestCryptoFinal := fmt.Sprintf("%x", h2.Sum(tmp[:0]))
	if digestFinal != digestCryptoFinal {
		t.Errorf("TestMultipleSums, got %s, want %s", digestFinal, digestCryptoFinal)
	}
}

func TestMultipleSums(t *testing.T) {
	t.Run("", func(t *testing.T) {
		for i := 0; i < 64*2; i++ {
			for j := 0; j < 64; j++ {
				testMultipleSums(t, i, j)
			}
		}
	})
}

func testMd5Simulator(t *testing.T, concurrency, iterations, maxSize int, server Server) {
	// Use deterministic RNG.
	rng := rand.New(rand.NewSource(0xabad1dea))
	for i := 0; i < iterations; i++ {
		var wg sync.WaitGroup
		wg.Add(concurrency)

		for j := 0; j < concurrency; j++ {
			size := 1 + rng.Intn(maxSize)
			go func(j int) {
				defer wg.Done()
				h := server.NewHash()
				defer h.Close()
				input := bytes.Repeat([]byte{0x61 + byte(i^j)}, size)

				// Copy using odd-sized buffer
				n, err := io.CopyBuffer(h, bytes.NewBuffer(input), make([]byte, 13773))
				if int(n) != size || err != nil {
					panic(fmt.Errorf("wrote %d of %d, err: %v", n, size, err))
				}
				got := h.Sum([]byte{})

				// Calculate reference
				want := md5.Sum(input)
				if !bytes.Equal(got, want[:]) {
					panic(fmt.Errorf("got %s, want %s", hex.EncodeToString(got), hex.EncodeToString(want[:])))
				}
			}(j)
		}
		wg.Wait()
	}
}

func TestMd5Simulator(t *testing.T) {
	iterations := 400
	if testing.Short() {
		iterations = 40
	}

	t.Run("c16", func(t *testing.T) {
		server := NewServer()
		t.Cleanup(server.Close)
		t.Parallel()
		testMd5Simulator(t, 16, iterations/10, 20<<20, server)
	})
	t.Run("c1", func(t *testing.T) {
		server := NewServer()
		t.Cleanup(server.Close)
		t.Parallel()
		testMd5Simulator(t, 1, iterations, 5<<20, server)
	})
	t.Run("c19", func(t *testing.T) {
		server := NewServer()
		t.Cleanup(server.Close)
		t.Parallel()
		testMd5Simulator(t, 19, iterations*2, 100<<10, server)
	})
}

// TestRandomInput tests a number of random inputs.
func TestRandomInput(t *testing.T) {
	n := 500
	if testing.Short() {
		n = 100
	}
	conc := runtime.GOMAXPROCS(0)
	for c := 0; c < conc; c++ {
		t.Run(fmt.Sprint("routine-", c), func(t *testing.T) {
			server := NewServer()
			t.Cleanup(server.Close)
			for i := 0; i < n; i++ {
				rng := rand.New(rand.NewSource(0xabad1dea + int64(c*n+i)))
				// Up to 1 MB
				length := rng.Intn(1 << 20)
				baseBuf := make([]byte, length)
				t.Run(fmt.Sprint("hash-", i), func(t *testing.T) {
					t.Parallel()
					testBuffer := baseBuf
					rng.Read(testBuffer)
					wantMD5 := md5.Sum(testBuffer)
					h := server.NewHash()
					for len(testBuffer) > 0 {
						wrLen := rng.Intn(len(testBuffer) + 1)
						n, err := h.Write(testBuffer[:wrLen])
						if err != nil {
							t.Fatal(err)
						}
						if n != wrLen {
							t.Fatalf("write mismatch, want %d, got %d", wrLen, n)
						}
						testBuffer = testBuffer[n:]
						if len(testBuffer) == 0 {
							// Test if we can use the buffer without races.
							rng.Read(baseBuf)
						}
					}
					got := h.Sum(nil)
					if !bytes.Equal(wantMD5[:], got) {
						t.Fatalf("mismatch, want %v, got %v", wantMD5[:], got)
					}
					h.Close()
				})
			}
		})
	}
}

func benchmarkCryptoMd5(b *testing.B, blockSize int) {
	input := bytes.Repeat([]byte{0x61}, blockSize)

	b.SetBytes(int64(blockSize))
	b.ReportAllocs()
	b.ResetTimer()

	h := md5.New()
	var tmp [Size]byte
	for j := 0; j < b.N; j++ {
		h.Write(input)
		h.Sum(tmp[:0])
	}
}

func benchmarkCryptoMd5P(b *testing.B, blockSize int) {
	b.SetBytes(int64(blockSize))
	b.ReportAllocs()
	b.ResetTimer()

	var tmp [Size]byte
	b.RunParallel(func(pb *testing.PB) {
		input := bytes.Repeat([]byte{0x61}, blockSize)
		h := md5.New()
		for pb.Next() {
			h.Write(input)
			h.Sum(tmp[:0])
		}
	})
}

func BenchmarkCryptoMd5(b *testing.B) {
	b.Run("32KB", func(b *testing.B) { benchmarkCryptoMd5(b, 32*1024) })
	b.Run("64KB", func(b *testing.B) { benchmarkCryptoMd5(b, 64*1024) })
	b.Run("128KB", func(b *testing.B) { benchmarkCryptoMd5(b, 128*1024) })
	b.Run("256KB", func(b *testing.B) { benchmarkCryptoMd5(b, 256*1024) })
	b.Run("512KB", func(b *testing.B) { benchmarkCryptoMd5(b, 512*1024) })
	b.Run("1MB", func(b *testing.B) {
benchmarkCryptoMd5(b, 1024*1024) })
	b.Run("2MB", func(b *testing.B) { benchmarkCryptoMd5(b, 2*1024*1024) })
}

func BenchmarkCryptoMd5Parallel(b *testing.B) {
	b.Run("32KB", func(b *testing.B) { benchmarkCryptoMd5P(b, 32*1024) })
	b.Run("64KB", func(b *testing.B) { benchmarkCryptoMd5P(b, 64*1024) })
	b.Run("128KB", func(b *testing.B) { benchmarkCryptoMd5P(b, 128*1024) })
	b.Run("256KB", func(b *testing.B) { benchmarkCryptoMd5P(b, 256*1024) })
	b.Run("512KB", func(b *testing.B) { benchmarkCryptoMd5P(b, 512*1024) })
	b.Run("1MB", func(b *testing.B) { benchmarkCryptoMd5P(b, 1024*1024) })
	b.Run("2MB", func(b *testing.B) { benchmarkCryptoMd5P(b, 2*1024*1024) })
	b.Run("4MB", func(b *testing.B) { benchmarkCryptoMd5P(b, 4*1024*1024) })
	b.Run("8MB", func(b *testing.B) { benchmarkCryptoMd5P(b, 8*1024*1024) })
}
md5-simd-1.1.2/md5block_amd64.go000066400000000000000000000004211377566263400161740ustar00rootroot00000000000000
// Code generated by command: go run gen.go -out ../md5block_amd64.s -stubs ../md5block_amd64.go -pkg=md5simd. DO NOT EDIT.

// +build !appengine
// +build !noasm
// +build gc

package md5simd

// Encode p to digest
//go:noescape
func blockScalar(dig *[4]uint32, p []byte)
md5-simd-1.1.2/md5block_amd64.s000066400000000000000000000243551377566263400160450ustar00rootroot00000000000000
// Code generated by command: go run gen.go -out ../md5block_amd64.s -stubs ../md5block_amd64.go -pkg=md5simd. DO NOT EDIT.
// +build !appengine // +build !noasm // +build gc // func blockScalar(dig *[4]uint32, p []byte) TEXT ·blockScalar(SB), $0-32 MOVQ p_len+16(FP), AX MOVQ dig+0(FP), CX MOVQ p_base+8(FP), DX SHRQ $0x06, AX SHLQ $0x06, AX LEAQ (DX)(AX*1), AX CMPQ DX, AX JEQ end MOVL (CX), BX MOVL 4(CX), BP MOVL 8(CX), SI MOVL 12(CX), CX MOVL $0xffffffff, DI loop: MOVL (DX), R8 MOVL CX, R9 MOVL BX, R10 MOVL BP, R11 MOVL SI, R12 MOVL CX, R13 // ROUND1 XORL SI, R9 ADDL $0xd76aa478, BX ADDL R8, BX ANDL BP, R9 XORL CX, R9 MOVL 4(DX), R8 ADDL R9, BX ROLL $0x07, BX MOVL SI, R9 ADDL BP, BX XORL BP, R9 ADDL $0xe8c7b756, CX ADDL R8, CX ANDL BX, R9 XORL SI, R9 MOVL 8(DX), R8 ADDL R9, CX ROLL $0x0c, CX MOVL BP, R9 ADDL BX, CX XORL BX, R9 ADDL $0x242070db, SI ADDL R8, SI ANDL CX, R9 XORL BP, R9 MOVL 12(DX), R8 ADDL R9, SI ROLL $0x11, SI MOVL BX, R9 ADDL CX, SI XORL CX, R9 ADDL $0xc1bdceee, BP ADDL R8, BP ANDL SI, R9 XORL BX, R9 MOVL 16(DX), R8 ADDL R9, BP ROLL $0x16, BP MOVL CX, R9 ADDL SI, BP XORL SI, R9 ADDL $0xf57c0faf, BX ADDL R8, BX ANDL BP, R9 XORL CX, R9 MOVL 20(DX), R8 ADDL R9, BX ROLL $0x07, BX MOVL SI, R9 ADDL BP, BX XORL BP, R9 ADDL $0x4787c62a, CX ADDL R8, CX ANDL BX, R9 XORL SI, R9 MOVL 24(DX), R8 ADDL R9, CX ROLL $0x0c, CX MOVL BP, R9 ADDL BX, CX XORL BX, R9 ADDL $0xa8304613, SI ADDL R8, SI ANDL CX, R9 XORL BP, R9 MOVL 28(DX), R8 ADDL R9, SI ROLL $0x11, SI MOVL BX, R9 ADDL CX, SI XORL CX, R9 ADDL $0xfd469501, BP ADDL R8, BP ANDL SI, R9 XORL BX, R9 MOVL 32(DX), R8 ADDL R9, BP ROLL $0x16, BP MOVL CX, R9 ADDL SI, BP XORL SI, R9 ADDL $0x698098d8, BX ADDL R8, BX ANDL BP, R9 XORL CX, R9 MOVL 36(DX), R8 ADDL R9, BX ROLL $0x07, BX MOVL SI, R9 ADDL BP, BX XORL BP, R9 ADDL $0x8b44f7af, CX ADDL R8, CX ANDL BX, R9 XORL SI, R9 MOVL 40(DX), R8 ADDL R9, CX ROLL $0x0c, CX MOVL BP, R9 ADDL BX, CX XORL BX, R9 ADDL $0xffff5bb1, SI ADDL R8, SI ANDL CX, R9 XORL BP, R9 MOVL 44(DX), R8 ADDL R9, SI ROLL $0x11, SI MOVL BX, R9 ADDL CX, SI XORL CX, R9 ADDL $0x895cd7be, BP ADDL R8, BP ANDL SI, R9 XORL BX, R9 
MOVL 48(DX), R8 ADDL R9, BP ROLL $0x16, BP MOVL CX, R9 ADDL SI, BP XORL SI, R9 ADDL $0x6b901122, BX ADDL R8, BX ANDL BP, R9 XORL CX, R9 MOVL 52(DX), R8 ADDL R9, BX ROLL $0x07, BX MOVL SI, R9 ADDL BP, BX XORL BP, R9 ADDL $0xfd987193, CX ADDL R8, CX ANDL BX, R9 XORL SI, R9 MOVL 56(DX), R8 ADDL R9, CX ROLL $0x0c, CX MOVL BP, R9 ADDL BX, CX XORL BX, R9 ADDL $0xa679438e, SI ADDL R8, SI ANDL CX, R9 XORL BP, R9 MOVL 60(DX), R8 ADDL R9, SI ROLL $0x11, SI MOVL BX, R9 ADDL CX, SI XORL CX, R9 ADDL $0x49b40821, BP ADDL R8, BP ANDL SI, R9 XORL BX, R9 MOVL 4(DX), R8 ADDL R9, BP ROLL $0x16, BP MOVL CX, R9 ADDL SI, BP // ROUND2 MOVL CX, R9 MOVL CX, R14 XORL DI, R9 ADDL $0xf61e2562, BX ADDL R8, BX ANDL BP, R14 ANDL SI, R9 MOVL 24(DX), R8 ORL R9, R14 MOVL SI, R9 ADDL R14, BX MOVL SI, R14 ROLL $0x05, BX ADDL BP, BX XORL DI, R9 ADDL $0xc040b340, CX ADDL R8, CX ANDL BX, R14 ANDL BP, R9 MOVL 44(DX), R8 ORL R9, R14 MOVL BP, R9 ADDL R14, CX MOVL BP, R14 ROLL $0x09, CX ADDL BX, CX XORL DI, R9 ADDL $0x265e5a51, SI ADDL R8, SI ANDL CX, R14 ANDL BX, R9 MOVL (DX), R8 ORL R9, R14 MOVL BX, R9 ADDL R14, SI MOVL BX, R14 ROLL $0x0e, SI ADDL CX, SI XORL DI, R9 ADDL $0xe9b6c7aa, BP ADDL R8, BP ANDL SI, R14 ANDL CX, R9 MOVL 20(DX), R8 ORL R9, R14 MOVL CX, R9 ADDL R14, BP MOVL CX, R14 ROLL $0x14, BP ADDL SI, BP XORL DI, R9 ADDL $0xd62f105d, BX ADDL R8, BX ANDL BP, R14 ANDL SI, R9 MOVL 40(DX), R8 ORL R9, R14 MOVL SI, R9 ADDL R14, BX MOVL SI, R14 ROLL $0x05, BX ADDL BP, BX XORL DI, R9 ADDL $0x02441453, CX ADDL R8, CX ANDL BX, R14 ANDL BP, R9 MOVL 60(DX), R8 ORL R9, R14 MOVL BP, R9 ADDL R14, CX MOVL BP, R14 ROLL $0x09, CX ADDL BX, CX XORL DI, R9 ADDL $0xd8a1e681, SI ADDL R8, SI ANDL CX, R14 ANDL BX, R9 MOVL 16(DX), R8 ORL R9, R14 MOVL BX, R9 ADDL R14, SI MOVL BX, R14 ROLL $0x0e, SI ADDL CX, SI XORL DI, R9 ADDL $0xe7d3fbc8, BP ADDL R8, BP ANDL SI, R14 ANDL CX, R9 MOVL 36(DX), R8 ORL R9, R14 MOVL CX, R9 ADDL R14, BP MOVL CX, R14 ROLL $0x14, BP ADDL SI, BP XORL DI, R9 ADDL $0x21e1cde6, BX ADDL R8, BX ANDL 
BP, R14 ANDL SI, R9 MOVL 56(DX), R8 ORL R9, R14 MOVL SI, R9 ADDL R14, BX MOVL SI, R14 ROLL $0x05, BX ADDL BP, BX XORL DI, R9 ADDL $0xc33707d6, CX ADDL R8, CX ANDL BX, R14 ANDL BP, R9 MOVL 12(DX), R8 ORL R9, R14 MOVL BP, R9 ADDL R14, CX MOVL BP, R14 ROLL $0x09, CX ADDL BX, CX XORL DI, R9 ADDL $0xf4d50d87, SI ADDL R8, SI ANDL CX, R14 ANDL BX, R9 MOVL 32(DX), R8 ORL R9, R14 MOVL BX, R9 ADDL R14, SI MOVL BX, R14 ROLL $0x0e, SI ADDL CX, SI XORL DI, R9 ADDL $0x455a14ed, BP ADDL R8, BP ANDL SI, R14 ANDL CX, R9 MOVL 52(DX), R8 ORL R9, R14 MOVL CX, R9 ADDL R14, BP MOVL CX, R14 ROLL $0x14, BP ADDL SI, BP XORL DI, R9 ADDL $0xa9e3e905, BX ADDL R8, BX ANDL BP, R14 ANDL SI, R9 MOVL 8(DX), R8 ORL R9, R14 MOVL SI, R9 ADDL R14, BX MOVL SI, R14 ROLL $0x05, BX ADDL BP, BX XORL DI, R9 ADDL $0xfcefa3f8, CX ADDL R8, CX ANDL BX, R14 ANDL BP, R9 MOVL 28(DX), R8 ORL R9, R14 MOVL BP, R9 ADDL R14, CX MOVL BP, R14 ROLL $0x09, CX ADDL BX, CX XORL DI, R9 ADDL $0x676f02d9, SI ADDL R8, SI ANDL CX, R14 ANDL BX, R9 MOVL 48(DX), R8 ORL R9, R14 MOVL BX, R9 ADDL R14, SI MOVL BX, R14 ROLL $0x0e, SI ADDL CX, SI XORL DI, R9 ADDL $0x8d2a4c8a, BP ADDL R8, BP ANDL SI, R14 ANDL CX, R9 MOVL 20(DX), R8 ORL R9, R14 MOVL CX, R9 ADDL R14, BP MOVL CX, R14 ROLL $0x14, BP ADDL SI, BP // ROUND3 MOVL SI, R9 ADDL $0xfffa3942, BX ADDL R8, BX MOVL 32(DX), R8 XORL CX, R9 XORL BP, R9 ADDL R9, BX ROLL $0x04, BX MOVL BP, R9 ADDL BP, BX ADDL $0x8771f681, CX ADDL R8, CX MOVL 44(DX), R8 XORL SI, R9 XORL BX, R9 ADDL R9, CX ROLL $0x0b, CX MOVL BX, R9 ADDL BX, CX ADDL $0x6d9d6122, SI ADDL R8, SI MOVL 56(DX), R8 XORL BP, R9 XORL CX, R9 ADDL R9, SI ROLL $0x10, SI MOVL CX, R9 ADDL CX, SI ADDL $0xfde5380c, BP ADDL R8, BP MOVL 4(DX), R8 XORL BX, R9 XORL SI, R9 ADDL R9, BP ROLL $0x17, BP MOVL SI, R9 ADDL SI, BP ADDL $0xa4beea44, BX ADDL R8, BX MOVL 16(DX), R8 XORL CX, R9 XORL BP, R9 ADDL R9, BX ROLL $0x04, BX MOVL BP, R9 ADDL BP, BX ADDL $0x4bdecfa9, CX ADDL R8, CX MOVL 28(DX), R8 XORL SI, R9 XORL BX, R9 ADDL R9, CX ROLL $0x0b, CX MOVL 
BX, R9 ADDL BX, CX ADDL $0xf6bb4b60, SI ADDL R8, SI MOVL 40(DX), R8 XORL BP, R9 XORL CX, R9 ADDL R9, SI ROLL $0x10, SI MOVL CX, R9 ADDL CX, SI ADDL $0xbebfbc70, BP ADDL R8, BP MOVL 52(DX), R8 XORL BX, R9 XORL SI, R9 ADDL R9, BP ROLL $0x17, BP MOVL SI, R9 ADDL SI, BP ADDL $0x289b7ec6, BX ADDL R8, BX MOVL (DX), R8 XORL CX, R9 XORL BP, R9 ADDL R9, BX ROLL $0x04, BX MOVL BP, R9 ADDL BP, BX ADDL $0xeaa127fa, CX ADDL R8, CX MOVL 12(DX), R8 XORL SI, R9 XORL BX, R9 ADDL R9, CX ROLL $0x0b, CX MOVL BX, R9 ADDL BX, CX ADDL $0xd4ef3085, SI ADDL R8, SI MOVL 24(DX), R8 XORL BP, R9 XORL CX, R9 ADDL R9, SI ROLL $0x10, SI MOVL CX, R9 ADDL CX, SI ADDL $0x04881d05, BP ADDL R8, BP MOVL 36(DX), R8 XORL BX, R9 XORL SI, R9 ADDL R9, BP ROLL $0x17, BP MOVL SI, R9 ADDL SI, BP ADDL $0xd9d4d039, BX ADDL R8, BX MOVL 48(DX), R8 XORL CX, R9 XORL BP, R9 ADDL R9, BX ROLL $0x04, BX MOVL BP, R9 ADDL BP, BX ADDL $0xe6db99e5, CX ADDL R8, CX MOVL 60(DX), R8 XORL SI, R9 XORL BX, R9 ADDL R9, CX ROLL $0x0b, CX MOVL BX, R9 ADDL BX, CX ADDL $0x1fa27cf8, SI ADDL R8, SI MOVL 8(DX), R8 XORL BP, R9 XORL CX, R9 ADDL R9, SI ROLL $0x10, SI MOVL CX, R9 ADDL CX, SI ADDL $0xc4ac5665, BP ADDL R8, BP MOVL (DX), R8 XORL BX, R9 XORL SI, R9 ADDL R9, BP ROLL $0x17, BP MOVL SI, R9 ADDL SI, BP // ROUND4 MOVL DI, R9 XORL CX, R9 ADDL $0xf4292244, BX ADDL R8, BX ORL BP, R9 XORL SI, R9 ADDL R9, BX MOVL 28(DX), R8 MOVL DI, R9 ROLL $0x06, BX XORL SI, R9 ADDL BP, BX ADDL $0x432aff97, CX ADDL R8, CX ORL BX, R9 XORL BP, R9 ADDL R9, CX MOVL 56(DX), R8 MOVL DI, R9 ROLL $0x0a, CX XORL BP, R9 ADDL BX, CX ADDL $0xab9423a7, SI ADDL R8, SI ORL CX, R9 XORL BX, R9 ADDL R9, SI MOVL 20(DX), R8 MOVL DI, R9 ROLL $0x0f, SI XORL BX, R9 ADDL CX, SI ADDL $0xfc93a039, BP ADDL R8, BP ORL SI, R9 XORL CX, R9 ADDL R9, BP MOVL 48(DX), R8 MOVL DI, R9 ROLL $0x15, BP XORL CX, R9 ADDL SI, BP ADDL $0x655b59c3, BX ADDL R8, BX ORL BP, R9 XORL SI, R9 ADDL R9, BX MOVL 12(DX), R8 MOVL DI, R9 ROLL $0x06, BX XORL SI, R9 ADDL BP, BX ADDL $0x8f0ccc92, CX ADDL R8, CX ORL 
BX, R9 XORL BP, R9 ADDL R9, CX MOVL 40(DX), R8 MOVL DI, R9 ROLL $0x0a, CX XORL BP, R9 ADDL BX, CX ADDL $0xffeff47d, SI ADDL R8, SI ORL CX, R9 XORL BX, R9 ADDL R9, SI MOVL 4(DX), R8 MOVL DI, R9 ROLL $0x0f, SI XORL BX, R9 ADDL CX, SI ADDL $0x85845dd1, BP ADDL R8, BP ORL SI, R9 XORL CX, R9 ADDL R9, BP MOVL 32(DX), R8 MOVL DI, R9 ROLL $0x15, BP XORL CX, R9 ADDL SI, BP ADDL $0x6fa87e4f, BX ADDL R8, BX ORL BP, R9 XORL SI, R9 ADDL R9, BX MOVL 60(DX), R8 MOVL DI, R9 ROLL $0x06, BX XORL SI, R9 ADDL BP, BX ADDL $0xfe2ce6e0, CX ADDL R8, CX ORL BX, R9 XORL BP, R9 ADDL R9, CX MOVL 24(DX), R8 MOVL DI, R9 ROLL $0x0a, CX XORL BP, R9 ADDL BX, CX ADDL $0xa3014314, SI ADDL R8, SI ORL CX, R9 XORL BX, R9 ADDL R9, SI MOVL 52(DX), R8 MOVL DI, R9 ROLL $0x0f, SI XORL BX, R9 ADDL CX, SI ADDL $0x4e0811a1, BP ADDL R8, BP ORL SI, R9 XORL CX, R9 ADDL R9, BP MOVL 16(DX), R8 MOVL DI, R9 ROLL $0x15, BP XORL CX, R9 ADDL SI, BP ADDL $0xf7537e82, BX ADDL R8, BX ORL BP, R9 XORL SI, R9 ADDL R9, BX MOVL 44(DX), R8 MOVL DI, R9 ROLL $0x06, BX XORL SI, R9 ADDL BP, BX ADDL $0xbd3af235, CX ADDL R8, CX ORL BX, R9 XORL BP, R9 ADDL R9, CX MOVL 8(DX), R8 MOVL DI, R9 ROLL $0x0a, CX XORL BP, R9 ADDL BX, CX ADDL $0x2ad7d2bb, SI ADDL R8, SI ORL CX, R9 XORL BX, R9 ADDL R9, SI MOVL 36(DX), R8 MOVL DI, R9 ROLL $0x0f, SI XORL BX, R9 ADDL CX, SI ADDL $0xeb86d391, BP ADDL R8, BP ORL SI, R9 XORL CX, R9 ADDL R9, BP ROLL $0x15, BP ADDL SI, BP ADDL R10, BX ADDL R11, BP ADDL R12, SI ADDL R13, CX // Prepare next loop ADDQ $0x40, DX CMPQ DX, AX JB loop // Write output MOVQ dig+0(FP), AX MOVL BX, (AX) MOVL BP, 4(AX) MOVL SI, 8(AX) MOVL CX, 12(AX) end: RET