I recently finished implementing constant addition on images with MMX.
During the implementation, there were several things I had to think about.
I process the image row by row. Suppose the image width is not a
multiple of 8 bytes, say 34 bytes.
I then use the following MMX register
mm0 = x|x|x|x|x|x|x|x   where x is the constant to be added
to perform the addition 4 times (32 bytes).
For the remaining bytes (34 - 32 = 2 bytes), I use the register
mm1 = 0|0|0|0|0|0|x|x
to perform the last addition in that row.
Bytes 35 to 40 of the result are left unchanged, since 0 is added to
them.
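A scalar C model of the row loop above may make the tail handling concrete. It is a sketch only: `mmx64` and `paddb` are hypothetical helpers that mimic one MMX register and the PADDB instruction byte by byte (lane 0 is the rightmost field in the x|x|...|x notation), and the row buffer is assumed to be padded to `stride` bytes, because the last 8-byte load and store reach past byte 34. PADDB wraps modulo 256; PADDUSB is the saturating variant if clamping at 255 is wanted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of one 64-bit MMX register.
   Lane 0 is the lowest byte (the rightmost field in x|x|...|x). */
typedef struct { uint8_t b[8]; } mmx64;

/* Models PADDB: per-lane add, wrapping modulo 256. */
static mmx64 paddb(mmx64 a, mmx64 c) {
    mmx64 r;
    for (int i = 0; i < 8; i++)
        r.b[i] = (uint8_t)(a.b[i] + c.b[i]);
    return r;
}

/* Adds x to the first `width` bytes of `row`. `stride` is the padded
   buffer length (an assumption), which must cover the tail quadword. */
static void add_constant_row(uint8_t *row, size_t width, size_t stride,
                             uint8_t x) {
    size_t full = width / 8;  /* 34 / 8 = 4 quadwords (32 bytes) */
    size_t tail = width % 8;  /* 34 % 8 = 2 leftover bytes */
    mmx64 mm0, mm1, v;

    assert(stride >= 8 * (full + (tail != 0)));
    memset(mm0.b, x, 8);      /* x|x|x|x|x|x|x|x */
    memset(mm1.b, 0, 8);
    memset(mm1.b, x, tail);   /* 0|0|0|0|0|0|x|x when tail == 2 */

    for (size_t q = 0; q < full; q++) {
        memcpy(v.b, row + 8 * q, 8);
        v = paddb(v, mm0);
        memcpy(row + 8 * q, v.b, 8);
    }
    if (tail != 0) {
        /* Lanes beyond `width` get +0, so the padding is left unchanged. */
        memcpy(v.b, row + 8 * full, 8);
        v = paddb(v, mm1);
        memcpy(row + 8 * full, v.b, 8);
    }
}
```

For example, with a 40-byte padded row holding 0..39, `add_constant_row(row, 34, 40, 10)` raises bytes 0..33 by 10 and leaves bytes 34..39 (the 35th to 40th) as they were.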
However, when performing thresholding (PCMPEQB, Packed Compare for
Equal, or PCMPGTB, Packed Compare for Greater Than), things are different.
For the first 32 bytes, I can of course still perform PCMPEQB or
PCMPGTB 4 times using
mm0 = x|x|x|x|x|x|x|x   where x is the threshold to be compared.
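A sketch of that aligned part, with one caveat worth checking: PCMPGTB compares bytes as signed values, so a pixel of 128 or more counts as negative and would test below a small threshold. A common workaround, shown here, is to XOR both operands with 0x80 before the signed compare, which yields an unsigned greater-than. `mmx64`, `pcmpgtb`, `pxor`, `cmpgt_unsigned` and `threshold_aligned` are hypothetical scalar models, not MMX intrinsics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of one MMX register; lane 0 = rightmost field. */
typedef struct { uint8_t b[8]; } mmx64;

/* Reinterprets a byte as two's complement, as PCMPGTB does. */
static int as_signed(uint8_t v) { return v < 128 ? v : v - 256; }

/* Models PCMPGTB: 0xFF in lanes where a > b (signed), else 0x00. */
static mmx64 pcmpgtb(mmx64 a, mmx64 b) {
    mmx64 r;
    for (int i = 0; i < 8; i++)
        r.b[i] = as_signed(a.b[i]) > as_signed(b.b[i]) ? 0xFF : 0x00;
    return r;
}

/* Models PXOR. */
static mmx64 pxor(mmx64 a, mmx64 b) {
    mmx64 r;
    for (int i = 0; i < 8; i++)
        r.b[i] = (uint8_t)(a.b[i] ^ b.b[i]);
    return r;
}

/* Unsigned a > b: flip the sign bit of both operands, then compare signed. */
static mmx64 cmpgt_unsigned(mmx64 a, mmx64 b) {
    mmx64 bias;
    memset(bias.b, 0x80, 8);
    return pcmpgtb(pxor(a, bias), pxor(b, bias));
}

/* Thresholds only the 8-byte-aligned part of a row (width / 8 quadwords);
   bytes from 8 * (width / 8) onward are not touched. */
static void threshold_aligned(uint8_t *row, size_t width, uint8_t t) {
    mmx64 mm0, v;
    assert(row != NULL);
    memset(mm0.b, t, 8);                 /* t|t|t|t|t|t|t|t */
    for (size_t q = 0; q < width / 8; q++) {
        memcpy(v.b, row + 8 * q, 8);
        v = cmpgt_unsigned(v, mm0);
        memcpy(row + 8 * q, v.b, 8);
    }
}
```

On real hardware the XOR of the threshold register would be done once outside the loop, leaving one PXOR and one PCMPGTB per quadword.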
But what about the remaining bytes, which do not fill a whole 8-byte
quadword? How can I leave the 35th to 40th bytes untouched? Is there
any way to get the following?
source (before) = 6|5|4|3|2|1|4|3
mm1             = ?|?|?|?|?|?|3|3
source (after)  = 6|5|4|3|2|1|ff|0
The comparison is done only on the last two bytes.
Is there an efficient way to achieve that?
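One common approach, sketched here under stated assumptions rather than as the only answer, is to compare all 8 bytes of the last quadword anyway and then blend: keep the compare result where a mask holds 0xFF and the original bytes where it holds 0x00, i.e. result = (cmp AND mask) OR (source ANDNOT mask). In MMX terms that is PAND, PANDN and POR, and the mask 0|0|0|0|0|0|ff|ff depends only on the width, so it can be built once per image. The helpers below are hypothetical scalar models of those instructions (lane 0 = rightmost field), the compare is signed like PCMPGTB, and the row buffer is assumed to be padded so the 8-byte load at offset 32 stays in bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of one MMX register; lane 0 = rightmost field. */
typedef struct { uint8_t b[8]; } mmx64;

static int as_signed(uint8_t v) { return v < 128 ? v : v - 256; }

/* Models PCMPGTB: 0xFF where a > b (signed bytes), else 0x00. */
static mmx64 pcmpgtb(mmx64 a, mmx64 b) {
    mmx64 r;
    for (int i = 0; i < 8; i++)
        r.b[i] = as_signed(a.b[i]) > as_signed(b.b[i]) ? 0xFF : 0x00;
    return r;
}

/* Models PAND, PANDN ((NOT a) AND b) and POR. */
static mmx64 pand(mmx64 a, mmx64 b) {
    mmx64 r;
    for (int i = 0; i < 8; i++) r.b[i] = (uint8_t)(a.b[i] & b.b[i]);
    return r;
}
static mmx64 pandn(mmx64 a, mmx64 b) {
    mmx64 r;
    for (int i = 0; i < 8; i++) r.b[i] = (uint8_t)(~a.b[i] & b.b[i]);
    return r;
}
static mmx64 por(mmx64 a, mmx64 b) {
    mmx64 r;
    for (int i = 0; i < 8; i++) r.b[i] = (uint8_t)(a.b[i] | b.b[i]);
    return r;
}

/* 0xFF in the low `tail` lanes, 0x00 elsewhere: 0|0|0|0|0|0|ff|ff for 2. */
static mmx64 tail_mask(size_t tail) {
    mmx64 m;
    assert(tail < 8);
    memset(m.b, 0x00, 8);
    memset(m.b, 0xFF, tail);
    return m;
}

/* Keeps the compare result under the mask and the source bytes elsewhere. */
static mmx64 masked_cmpgt(mmx64 src, mmx64 thresh, mmx64 mask) {
    mmx64 cmp = pcmpgtb(src, thresh);
    return por(pand(cmp, mask), pandn(mask, src));
}

/* Thresholds the partial last quadword of a row in place. */
static void threshold_tail(uint8_t *row, size_t width, uint8_t t) {
    size_t off = 8 * (width / 8);
    mmx64 thresh, v;
    if (width % 8 == 0) return;          /* no partial quadword */
    memset(thresh.b, t, 8);              /* t|t|t|t|t|t|t|t */
    memcpy(v.b, row + off, 8);
    v = masked_cmpgt(v, thresh, tail_mask(width % 8));
    memcpy(row + off, v.b, 8);
}
```

Applied to the example in the question (last quadword 6|5|4|3|2|1|4|3, threshold 3, width 34), this produces 6|5|4|3|2|1|ff|0. The cost is three extra logical operations on one quadword per row; an alternative is to compute the compare in a register and store back only the `width % 8` low bytes with scalar writes.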