What is an intrinsic?
Intrinsics are a way to access special CPU instructions through special C functions and data types. They seem fairly portable (YMMV), and multiple mainstream compilers (GCC, Intel, Visual Studio, and LLVM) seem to provide support.
Here I'll be looking at how to use them from a typical C program, which gives quick access to fancy instructions without having to drop down to inline assembly or a separate assembly file.
In this adventure, I'm particularly interested in the AES instruction set found in recent AMD and Intel CPUs.
Intrinsics have their own data types, and the only one I'm interested in is the __m128i type, which represents a 128-bit value. This simple program loads a value into that type from two uint64_t. Aligned SSE loads and stores need their memory operand aligned to 16 bytes, which is the reason for the align attribute.
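A minimal sketch of such a load, assuming GCC or Clang for the aligned attribute. The helper name load_u64_pair is made up for illustration and is not taken from any library.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2: __m128i, _mm_load_si128 */

/* Hypothetical helper: pack two 64-bit halves into one __m128i.
 * lo ends up in bits 0..63, hi in bits 64..127. */
static __m128i load_u64_pair(uint64_t lo, uint64_t hi)
{
    /* _mm_load_si128 requires a 16-byte aligned address. */
    uint64_t in[2] __attribute__((aligned(16))) = { lo, hi };

    return _mm_load_si128((const __m128i *) in);
}
```

On 32-bit x86 this probably needs -msse2 on the command line; on x86-64, SSE2 is part of the baseline.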
Now we can get the value back out using:
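Going the other way can be sketched like this, again assuming GCC or Clang. store_u64_pair is a hypothetical helper name. It stores into an aligned scratch buffer first, so the caller's pointers do not need any particular alignment.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2: _mm_store_si128 */

/* Hypothetical helper: split a __m128i back into its two 64-bit halves. */
static void store_u64_pair(__m128i x, uint64_t *lo, uint64_t *hi)
{
    /* _mm_store_si128 requires a 16-byte aligned destination. */
    uint64_t out[2] __attribute__((aligned(16)));

    _mm_store_si128((__m128i *) out, x);
    *lo = out[0]; /* bits 0..63   */
    *hi = out[1]; /* bits 64..127 */
}
```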
From there it's easy to use a simple operation like XOR:
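As a sketch, a 128-bit XOR built on _mm_xor_si128 could look like the following. The function name xor_u64_pairs and its array-based interface are assumptions made for the example.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2: _mm_load_si128, _mm_store_si128, _mm_xor_si128 */

/* Hypothetical helper: r = a ^ b, each operand being 128 bits
 * expressed as two uint64_t. */
static void xor_u64_pairs(const uint64_t a[2], const uint64_t b[2], uint64_t r[2])
{
    /* Copy into aligned scratch buffers so the aligned load/store are valid
     * whatever the alignment of the caller's arrays. */
    uint64_t ta[2] __attribute__((aligned(16))) = { a[0], a[1] };
    uint64_t tb[2] __attribute__((aligned(16))) = { b[0], b[1] };
    uint64_t tr[2] __attribute__((aligned(16)));

    __m128i x = _mm_load_si128((const __m128i *) ta);
    __m128i y = _mm_load_si128((const __m128i *) tb);

    /* One instruction XORs all 128 bits at once. */
    _mm_store_si128((__m128i *) tr, _mm_xor_si128(x, y));

    r[0] = tr[0];
    r[1] = tr[1];
}
```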
AES is composed of two phases: key expansion, and then the rounds.
The first phase is covered by the AESKEYGENASSIST instruction, which is available as the intrinsic _mm_aeskeygenassist_si128.
This operation takes two parameters: the previous round key, and the round constant (the value from AES's rcon table) for the round being generated. The round constant is encoded as an 8-bit immediate, so it has to be a compile-time constant. Here's how to generate an AES 128-bit expanded key:
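One common way to write this expansion is sketched below. The names AESNI_FN, aes128_key_step, AES128_EXPAND and aes128_expand_key are invented for the example, and the target attribute assumes GCC or Clang (building with -maes should work as well). Because the round constant must be an immediate, the ten steps are spelled out through a macro instead of a loop.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2 */
#include <wmmintrin.h> /* AES-NI: _mm_aeskeygenassist_si128 */

/* Assumption: GCC/Clang. Enables AES-NI for the marked function only. */
#if defined(__GNUC__)
#define AESNI_FN __attribute__((target("aes")))
#else
#define AESNI_FN
#endif

/* Combine the previous round key with the AESKEYGENASSIST output. */
static __m128i aes128_key_step(__m128i key, __m128i assist)
{
    /* Broadcast the top 32-bit word: RotWord(SubWord(w3)) ^ rcon. */
    assist = _mm_shuffle_epi32(assist, _MM_SHUFFLE(3, 3, 3, 3));
    /* Three shift-and-XOR passes give each word the XOR of all the
     * previous words below it (a running prefix XOR). */
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    key = _mm_xor_si128(key, _mm_slli_si128(key, 4));
    return _mm_xor_si128(key, assist);
}

/* rcon must be a literal: it is encoded in the instruction itself. */
#define AES128_EXPAND(ks, i, rcon) \
    ((ks)[i] = aes128_key_step((ks)[(i) - 1], \
                               _mm_aeskeygenassist_si128((ks)[(i) - 1], (rcon))))

/* Expand a 16-byte key into the 11 round keys of AES-128. */
static AESNI_FN void aes128_expand_key(const uint8_t key[16], __m128i ks[11])
{
    ks[0] = _mm_loadu_si128((const __m128i *) key); /* unaligned load is fine */
    AES128_EXPAND(ks, 1, 0x01);
    AES128_EXPAND(ks, 2, 0x02);
    AES128_EXPAND(ks, 3, 0x04);
    AES128_EXPAND(ks, 4, 0x08);
    AES128_EXPAND(ks, 5, 0x10);
    AES128_EXPAND(ks, 6, 0x20);
    AES128_EXPAND(ks, 7, 0x40);
    AES128_EXPAND(ks, 8, 0x80);
    AES128_EXPAND(ks, 9, 0x1b);
    AES128_EXPAND(ks, 10, 0x36);
}
```

The code only runs on a CPU that actually has AES-NI; a real program would probably want to check for that before calling it.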
Then come the rounds: 10 in total for AES-128, where the last round is special because it skips the MixColumns step. These are covered by AESENC and AESENCLAST, available as _mm_aesenc_si128 and _mm_aesenclast_si128 respectively.
Both intrinsics take the current encryption state as the first parameter and the round key for that round as the second.
A typical AES 128-bit encryption looks like this:
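A sketch of a single-block encryption, under the same GCC/Clang assumption for the target attribute. The names AESNI_FN, next_key, NEXT_KEY and aes128_encrypt are hypothetical. The block carries its own small key-expansion helper so that it compiles on its own; real code would expand the key once and reuse the round keys for every block.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2 */
#include <wmmintrin.h> /* AES-NI: _mm_aesenc_si128, _mm_aesenclast_si128 */

/* Assumption: GCC/Clang. Enables AES-NI for the marked function only. */
#if defined(__GNUC__)
#define AESNI_FN __attribute__((target("aes")))
#else
#define AESNI_FN
#endif

/* Derive the next round key from the previous one and its assist value. */
static __m128i next_key(__m128i k, __m128i assist)
{
    assist = _mm_shuffle_epi32(assist, _MM_SHUFFLE(3, 3, 3, 3));
    k = _mm_xor_si128(k, _mm_slli_si128(k, 4));
    k = _mm_xor_si128(k, _mm_slli_si128(k, 4));
    k = _mm_xor_si128(k, _mm_slli_si128(k, 4));
    return _mm_xor_si128(k, assist);
}

/* The round constant has to be a literal, hence a macro and not a loop. */
#define NEXT_KEY(k, rcon) next_key((k), _mm_aeskeygenassist_si128((k), (rcon)))

/* Encrypt one 16-byte block with a 16-byte key. */
static AESNI_FN void aes128_encrypt(const uint8_t key[16],
                                    const uint8_t in[16], uint8_t out[16])
{
    __m128i ks[11], state;
    int i;

    /* Phase 1: key expansion into 11 round keys. */
    ks[0] = _mm_loadu_si128((const __m128i *) key);
    ks[1] = NEXT_KEY(ks[0], 0x01);
    ks[2] = NEXT_KEY(ks[1], 0x02);
    ks[3] = NEXT_KEY(ks[2], 0x04);
    ks[4] = NEXT_KEY(ks[3], 0x08);
    ks[5] = NEXT_KEY(ks[4], 0x10);
    ks[6] = NEXT_KEY(ks[5], 0x20);
    ks[7] = NEXT_KEY(ks[6], 0x40);
    ks[8] = NEXT_KEY(ks[7], 0x80);
    ks[9] = NEXT_KEY(ks[8], 0x1b);
    ks[10] = NEXT_KEY(ks[9], 0x36);

    /* Phase 2: initial key XOR, 9 full rounds, then the special last round. */
    state = _mm_xor_si128(_mm_loadu_si128((const __m128i *) in), ks[0]);
    for (i = 1; i < 10; i++)
        state = _mm_aesenc_si128(state, ks[i]);
    state = _mm_aesenclast_si128(state, ks[10]);

    _mm_storeu_si128((__m128i *) out, state);
}
```

Note the XOR with the first round key before the loop: AESENC only covers the rounds themselves, so the initial key addition is done with a plain _mm_xor_si128.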
That's it. It's a very small tutorial that doesn't go into the depth of every operation used. The code serves as the AES-NI backend for my AES implementation. It might also help people understand how to use those fancy instructions (SSE, NEON, ...), which more often than not yield a very significant improvement over standard instructions.
Posted on April 12, 2012.