Apparently, the Pentium 4 has no SSE integer comparison operation. And emulating an integer comparison with floating-point compare operations is SLOW.
I have a 128-bit XMM register containing a bitfield. There are no restrictions on which bits are set (although it is highly probable that none are), and I need to find out whether any of them are set. The best I have been able to come up with (that works in all cases) is
movq %xmm7, %xmm1 # set %xmm1 to 0
cmpneqpd %xmm0, %xmm1 # compare u to 0, store result in %xmm1
movmskpd %xmm1, %edi # move sign bits to %edi
testl %edi, %edi # if u != 0f
jne .L67
movq %xmm7, %xmm1 # set %xmm1 to 0
cmpunordpd %xmm0, %xmm1 # check if u is NaN
movmskpd %xmm1, %edi # move result to %edi
testl %edi, %edi # if u == NaN
je .L67
Anyone know of a faster way of checking this?
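For comparison, here is a minimal sketch of the same test done with integer instructions instead of floating-point compares. It assumes SSE2 is available and is written with C intrinsics rather than assembly; `any_bit_set` is a made-up name. `_mm_cmpeq_epi8` and `_mm_movemask_epi8` normally compile to `pcmpeqb` and `pmovmskb`. Because it compares raw bytes, a lone sign bit (the pattern for -0.0, which a floating-point compare treats as equal to 0.0) still counts as set. Whether it is actually faster on a given chip would need measuring.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: returns nonzero if any of the 128 bits of v is set. */
static int any_bit_set(__m128i v)
{
    /* pcmpeqb: every byte of v that equals zero becomes 0xFF, others 0x00 */
    __m128i is_zero = _mm_cmpeq_epi8(v, _mm_setzero_si128());

    /* pmovmskb: collect the top bit of each of the 16 bytes into an int.
       The mask is 0xFFFF only when all 16 bytes of v were zero. */
    return _mm_movemask_epi8(is_zero) != 0xFFFF;
}
```

A caller would branch on the result, for example `if (any_bit_set(u)) { /* scan the bits */ }`.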
The next step in my algorithm, if the register is non-zero, is to extract one word at a time from %xmm0 and find out which bits are set using a bunch of shifts and ANDs. If the register is zero, this next part does nothing but waste time. As it is, my floating-point compare is so slow that the code runs faster if I just leave it out and let the processor waste time shifting zeroes around.
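The overall shape described here (skip everything when the register is zero, otherwise walk the words with shifts and ANDs) can be sketched in C as below. This is only an illustration under assumptions: `collect_set_bits` and its output array are hypothetical, the register is spilled to memory instead of being read word by word from %xmm0, and the zero test uses SSE2 integer instructions rather than the floating-point compares above.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: writes the index (0..127) of every set bit of v
   into out[] (room for 128 entries) and returns how many were found. */
static int collect_set_bits(__m128i v, int out[128])
{
    /* Early exit: the byte mask is 0xFFFF only when all 16 bytes are zero. */
    if (_mm_movemask_epi8(_mm_cmpeq_epi8(v, _mm_setzero_si128())) == 0xFFFF)
        return 0;

    uint32_t words[4];
    _mm_storeu_si128((__m128i *)words, v);  /* spill the four 32-bit words */

    int n = 0;
    for (int w = 0; w < 4; ++w) {
        uint32_t bits = words[w];
        /* Shift-and-AND scan; stops as soon as no set bits remain. */
        for (int b = 0; bits != 0; ++b, bits >>= 1) {
            if (bits & 1u)
                out[n++] = 32 * w + b;
        }
    }
    return n;
}
```

The inner loop already ends early for a zero word, so the up-front test only pays off if it is cheaper than four trivial loop iterations, which is the trade-off in question.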