Writing high-performance SM3 hash algorithm code in x64 assembly language

Keywords: Programming less Assembly Language C

The following C-style operators are used in this article:

=     assignment
==    equal to
<     less than
<=    less than or equal to
+     addition modulo 2^32
~     bitwise NOT
&     bitwise AND
^     bitwise XOR
|     bitwise OR
<<<   rotate left
>>>   rotate right

4.1 Initial value
The initial value is copied directly from the standard document:

7380166f 4914b2b9 172442d7 da8a0600 a96f30bc 163138aa e38dee4d b0fb0e4e

4.2 Constants
The standard document defines the constant Tj as follows:

0 <= j <= 15    Tj = 0x79cc4519
16 <= j <= 63   Tj = 0x7a879d8a

Here j is the round index of the compression function (0 to 63 in turn, 64 rounds in total). In each round, Tj is rotated left by j bits (the rotation count is taken modulo 32) before it is added. Precomputing these rotations turns Tj into the following constant array:

0 <= j <= 15
79cc4519 f3988a32 e7311465 ce6228cb 9cc45197 3988a32f 7311465e e6228cbc
cc451979 988a32f3 311465e7 6228cbce c451979c 88a32f39 11465e73 228cbce6

16 <= j <= 47
9d8a7a87 3b14f50f 7629ea1e ec53d43c d8a7a879 b14f50f3 629ea1e7 c53d43ce
8a7a879d 14f50f3b 29ea1e76 53d43cec a7a879d8 4f50f3b1 9ea1e762 3d43cec5
7a879d8a f50f3b14 ea1e7629 d43cec53 a879d8a7 50f3b14f a1e7629e 43cec53d
879d8a7a 0f3b14f5 1e7629ea 3cec53d4 79d8a7a8 f3b14f50 e7629ea1 cec53d43

48 <= j <= 63
9d8a7a87 3b14f50f 7629ea1e ec53d43c d8a7a879 b14f50f3 629ea1e7 c53d43ce
8a7a879d 14f50f3b 29ea1e76 53d43cec a7a879d8 4f50f3b1 9ea1e762 3d43cec5

With this precomputed array, Tj no longer needs to be rotated at run time.
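As a sanity check, the table above can be regenerated in a few lines. The following Python is a minimal sketch and not the author's code; the names `rotl32` and `T_TABLE` are made up for this illustration.

```python
def rotl32(x, n):
    """Rotate the 32-bit value x left by n bits (n is reduced modulo 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

# Pre-rotated constants: T_TABLE[j] holds Tj <<< (j mod 32).
T_TABLE = [rotl32(0x79CC4519 if j < 16 else 0x7A879D8A, j) for j in range(64)]

# Print eight entries per line, in the same layout as the table above.
for j in range(0, 64, 8):
    print(" ".join(f"{t:08x}" for t in T_TABLE[j:j + 8]))
```

Because the rotation count wraps at 32, the entries for j = 48 to 63 repeat those for j = 16 to 31, which is why the last block of the table matches the first two rows of the middle block.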

4.3 Boolean functions
The standard document defines the Boolean functions as follows:

0 <= j <= 15    FFj(X,Y,Z) = X ^ Y ^ Z
16 <= j <= 63   FFj(X,Y,Z) = (X & Y) | (X & Z) | (Y & Z)
0 <= j <= 15    GGj(X,Y,Z) = X ^ Y ^ Z
16 <= j <= 63   GGj(X,Y,Z) = (X & Y) | (~X & Z)

By the distributive law of Boolean algebra, FFj for rounds 16 to 63 can be rewritten with one fewer AND operation:

16 <= j <= 63   FFj(X,Y,Z) = (X & (Y | Z)) | (Y & Z)
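The rewrite can be checked exhaustively. The sketch below (Python, with illustrative function names that do not come from the article) compares both forms on every combination of single-bit inputs.

```python
from itertools import product

def ff_standard(x, y, z):
    # Form given in the standard: three ANDs and two ORs.
    return (x & y) | (x & z) | (y & z)

def ff_optimized(x, y, z):
    # Rewritten form: two ANDs and two ORs.
    return (x & (y | z)) | (y & z)

# Bitwise functions treat every bit position independently, so checking
# all eight single-bit input combinations proves the identity for any width.
identity_holds = all(
    ff_standard(x, y, z) == ff_optimized(x, y, z)
    for x, y, z in product((0, 1), repeat=3)
)
print(identity_holds)
```

The saving is a single instruction per round, but it applies to 48 of the 64 rounds of every block.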

4.4 Permutation functions
The standard document defines the permutation functions as follows:

P0(X) = X ^ (X <<< 9) ^ (X <<< 17)
P1(X) = X ^ (X <<< 15) ^ (X <<< 23)

When working in registers, the following equivalent form is used, in which the second rotation is applied to the value that has already been rotated:

P0(X) = X ^ (X <<< 9) ^ ((X <<< 9) <<< 8)
P1(X) = X ^ (X <<< 15) ^ ((X <<< 15) <<< 8)
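The equivalence holds because rotations compose: rotating by 9 and then by 8 is the same as rotating by 17. A plausible benefit, stated here as an assumption, is that one scratch register can be rotated twice in place instead of keeping a second copy of X. A short Python sketch with illustrative names:

```python
import random

def rotl32(x, n):
    """Rotate the 32-bit value x left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def p0_standard(x):
    return x ^ rotl32(x, 9) ^ rotl32(x, 17)

def p1_standard(x):
    return x ^ rotl32(x, 15) ^ rotl32(x, 23)

def p0_chained(x):
    r = rotl32(x, 9)            # scratch value: X <<< 9
    return x ^ r ^ rotl32(r, 8) # rotate the scratch value 8 more: X <<< 17

def p1_chained(x):
    r = rotl32(x, 15)           # scratch value: X <<< 15
    return x ^ r ^ rotl32(r, 8) # rotate the scratch value 8 more: X <<< 23

# Compare both forms on a fixed set of pseudo-random 32-bit inputs.
rng = random.Random(0)
samples = [rng.getrandbits(32) for _ in range(1000)]
forms_agree = all(
    p0_standard(x) == p0_chained(x) and p1_standard(x) == p1_chained(x)
    for x in samples
)
print(forms_agree)
```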

5.3.2 Message expansion
The standard document describes message expansion as follows:

Each message block (fixed length, 512 bits) is expanded into 132 words, which are used by the compression function CF.

In the standard document, the expanded words are stored in two arrays, Wj[68] and Wj'[64]. To get the most out of an x64 processor, this implementation reduces the size of array Wj from 68 to 20 and drops array Wj' entirely; the compression function CF is modified accordingly.
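The reason Wj' can be dropped is that each of its words is just the XOR of two Wj words. The Python sketch below uses the expansion recurrence from the SM3 standard and computes W'[j] on demand. It keeps all 68 words of W for clarity and does not attempt to reproduce the article's 20-word layout, which is not described here; the function names are illustrative.

```python
def rotl32(x, n):
    """Rotate the 32-bit value x left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def p1(x):
    return x ^ rotl32(x, 15) ^ rotl32(x, 23)

def expand(block):
    """Expand a 64-byte block into the 68 words W[0..67] of the standard."""
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl32(w[j - 3], 15))
                 ^ rotl32(w[j - 13], 7) ^ w[j - 6])
    return w

def w_prime(w, j):
    """W'[j] computed on demand, so no 64-word W' array is stored."""
    return w[j] ^ w[j + 4]

# The message "abc", padded by hand to a single 512-bit block:
# 0x80 marker, zero fill, then the bit length (24) as a 64-bit integer.
block = b"abc" + b"\x80" + bytes(52) + (24).to_bytes(8, "big")
w = expand(block)
print(f"{w[16]:08x} {w_prime(w, 0):08x}")
```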

5.3.3 Compression function
The compression function CF is the core of the whole algorithm, and its formulation in the standard document is not designed with x64 processors in mind. While keeping the results identical, this implementation adjusts the order of operations considerably.
The relevant part of the standard document reads:

SS1 = ((A <<< 12) + E + (Tj <<< j)) <<< 7
SS2 = SS1 ^ (A <<< 12)
TT1 = FFj(A,B,C) + D + SS2 + Wj'
TT2 = GGj(E,F,G) + H + SS1 + Wj
D   = C
C   = B <<< 9
B   = A
A   = TT1
H   = G
G   = F <<< 19
F   = E
E   = P0(TT2)

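This article reorders the operations as follows (Tj here is the pre-rotated table entry from section 4.2, and F >>> 13 is the same rotation as F <<< 19):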

SS0 = A <<< 12
SS1 = (SS0 + E + Tj) <<< 7
TT2 = GGj(E,F,G) + H + SS1 + W[j]
TT2 = P0(TT2)
SS2 = SS1 ^ SS0
Wj' = W[j] ^ W[j+4]
TT1 = FFj(A,B,C) + D + SS2 + Wj'
B   = B <<< 9
F   = F >>> 13

The reordering follows one rule: data dependencies must not change. It relies on a register-redefinition technique that is only available when programming in assembly language, and it can be fully understood only by reading the complete code and its comments. Redefinition means that once a round has finished, a register takes on a different role in the algorithm instead of having its value copied elsewhere, which removes some assignment instructions. In effect, four assignments from the standard document are omitted:

D   = C
B   = A
H   = G
F   = E

Accordingly, each following round must keep track of which register now holds which variable; a single mix-up produces a wrong hash.
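The renaming idea can be modeled outside assembly. In the Python sketch below, a list `r` stands in for eight registers; nothing is copied between them, and only the mapping from role (A to H) to list index shifts each round. This is an assumption-laden illustration rather than the author's code: the index arithmetic and all names are invented here, and the full 68-word W array is kept for simplicity. The result is compared against the digest of "abc" published in the standard's example.

```python
M = 0xFFFFFFFF

def rotl32(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & M

def p0(x): return x ^ rotl32(x, 9) ^ rotl32(x, 17)
def p1(x): return x ^ rotl32(x, 15) ^ rotl32(x, 23)

IV = [0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E]
T = [rotl32(0x79CC4519 if j < 16 else 0x7A879D8A, j) for j in range(64)]

def compress(v, block):
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl32(w[j - 3], 15))
                 ^ rotl32(w[j - 13], 7) ^ w[j - 6])
    r = list(v)  # r[0..3] and r[4..7] model two groups of four registers
    for j in range(64):
        # Role-to-register mapping for this round; it shifts one slot per round.
        a, b, c, d = [(k - j) % 4 for k in range(4)]
        e, f, g, h = [4 + (k - j) % 4 for k in range(4)]
        ss0 = rotl32(r[a], 12)
        ss1 = rotl32((ss0 + r[e] + T[j]) & M, 7)
        if j < 16:
            ff = r[a] ^ r[b] ^ r[c]
            gg = r[e] ^ r[f] ^ r[g]
        else:
            ff = (r[a] & (r[b] | r[c])) | (r[b] & r[c])
            gg = (r[e] & r[f]) | (~r[e] & r[g])
        tt2 = p0((gg + r[h] + ss1 + w[j]) & M)
        tt1 = (ff + r[d] + (ss1 ^ ss0) + (w[j] ^ w[j + 4])) & M
        r[b] = rotl32(r[b], 9)   # this register is C in the next round
        r[f] = rotl32(r[f], 19)  # same as F >>> 13; it is G in the next round
        r[d] = tt1               # D's register becomes the new A
        r[h] = tt2               # H's register becomes the new E
    # After 64 rounds (a multiple of 4) every role is back in its first slot.
    return [x ^ y for x, y in zip(r, v)]

# Single hand-padded block for the message "abc".
block = b"abc" + b"\x80" + bytes(52) + (24).to_bytes(8, "big")
digest = "".join(f"{x:08x}" for x in compress(IV, block))
EXPECTED = "66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0"
print(digest == EXPECTED)
```

In real assembly the mapping is not computed at run time. A typical approach, assumed here, is to unroll the rounds in groups of four so that each unrolled copy hard-codes its own register assignment.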

(to be continued)

Posted by nEmoGrinder on Tue, 17 Dec 2019 20:08:40 -0800