07-24-2023, 12:12 PM
I'm trying to find an efficient way to load compile-time constant floats into SSE (SSE2/SSE3) registers. I tried simple code like this:
const __m128 x = { 1.0f, 2.0f, 3.0f, 4.0f };
but that generates four movss loads from memory:
movss xmm0,dword ptr [__real@3f800000 (14048E534h)]
movss xmm1,dword ptr [__real@40000000 (14048E530h)]
movaps xmm6,xmm12
shufps xmm6,xmm12,0C6h
movss dword ptr [rsp],xmm0
movss xmm0,dword ptr [__real@40400000 (14048E52Ch)]
movss dword ptr [rsp+4],xmm1
movss xmm1,dword ptr [__real@40a00000 (14048E528h)]
which loads each scalar from memory and then stores it back out to the stack (?!?!)
Doing this, though:
float Align(16) myarray4[4] = { 1.0f, 2.0f, 3.0f, 4.0f, }; // out in global scope
generates a single aligned load:
movaps xmm5,xmmword ptr [::myarray4 (140512050h)]
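For comparison, here is a minimal sketch of the two source forms side by side: an aligned file-scope table read with _mm_load_ps, and the same lanes built with _mm_setr_ps. The names kVec1234, load_from_table and load_from_setr are made up for illustration, and the alignment macro is spelled out per compiler as an assumption about what Align(16) expands to. Which instructions actually come out depends on the compiler and optimization settings, so treat this as something to inspect in the disassembly rather than a guarantee.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps, _mm_setr_ps */

/* 16-byte-aligned constant table at file scope (hypothetical name). */
#if defined(_MSC_VER)
__declspec(align(16)) static const float kVec1234[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
#else
static const float kVec1234[4] __attribute__((aligned(16))) = { 1.0f, 2.0f, 3.0f, 4.0f };
#endif

/* Aligned load from the table; the pointer must be 16-byte aligned. */
static inline __m128 load_from_table(void)
{
    return _mm_load_ps(kVec1234);
}

/* Same lanes via an intrinsic. _mm_setr_ps takes its arguments in
   memory order, so lane 0 is 1.0f and lane 3 is 4.0f. */
static inline __m128 load_from_setr(void)
{
    return _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
}
```

With optimizations enabled, many compilers fold the _mm_setr_ps form into one 16-byte constant in read-only data, but that is worth confirming in the generated code for the build in question.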
Ideally, for constants there would be a way to avoid touching memory at all and use immediate-style instructions instead (i.e. with the constants encoded in the instruction itself).
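On the immediate idea: SSE has no instruction that moves an arbitrary float immediate into an XMM register. The usual memory-free tricks are zeroing a register (xorps) and setting all bits (pcmpeqd), after which a few constants can be derived with shifts that do take immediates. The sketch below builds 1.0f in every lane that way; the function names are hypothetical, it needs SSE2 for the integer ops, and an optimizing compiler is free to fold the whole sequence back into a constant load.

```c
#include <emmintrin.h>  /* SSE2: integer ops on XMM registers, cast intrinsics */

/* All lanes 0.0f; typically a register-only xorps. */
static inline __m128 make_zero_ps(void)
{
    return _mm_setzero_ps();
}

/* All lanes 1.0f without reading a constant from memory (in principle). */
static inline __m128 make_one_ps(void)
{
    __m128i z    = _mm_setzero_si128();
    __m128i all  = _mm_cmpeq_epi32(z, z);      /* 0xFFFFFFFF in each lane */
    __m128i low7 = _mm_srli_epi32(all, 25);    /* 0x0000007F */
    __m128i bits = _mm_slli_epi32(low7, 23);   /* 0x3F800000, the bits of 1.0f */
    return _mm_castsi128_ps(bits);             /* reinterpret, no conversion */
}
```

This only covers values whose bit patterns are easy to reach with shifts; a vector of four different constants such as { 1.0f, 2.0f, 3.0f, 4.0f } would still normally come from a single aligned 16-byte load.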
Thanks