MMX Comparison Instructions



I have just finished implementing image constant addition with MMX.
While implementing it, I ran into a few issues.

I process the image row by row. Suppose the row width is not a
multiple of 8 bytes, say 34 bytes.

I use the following MMX register

mm0 = x|x|x|x|x|x|x|x   (x is the constant to be added)

to perform the addition 4 times (32 bytes).

For the remaining bytes (34 - 32 = 2 bytes), I use the following
register

mm1 = 0|0|0|0|0|0|x|x

to perform the last addition in that row.
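The tail-handling scheme above can be modelled in plain Python to check that the padding bytes really stay untouched. This is only a sketch that emulates the byte lanes in software, not the poster's MMX code. It assumes saturating unsigned adds (PADDUSB), since the post does not say which add instruction is used (adding 0 leaves a lane unchanged with PADDB too), and it assumes the row buffer is padded to a multiple of 8 bytes, because the last 8-byte chunk also covers bytes 35 to 40. The helper names are hypothetical; lane 0 is the rightmost byte in the post's x|x|...|x notation.

```python
def paddusb(dst, src):
    """Like PADDUSB: unsigned byte-wise add over 8 lanes, saturating at 255."""
    return [min(d + s, 255) for d, s in zip(dst, src)]


def add_constant_row(row, width, x):
    """Add x to the first `width` bytes of `row`, one 8-byte chunk at a time.

    Assumes `row` is padded to a multiple of 8 bytes covering `width`,
    because the last chunk also touches the padding bytes.
    """
    n_chunks = (width + 7) // 8
    if len(row) < 8 * n_chunks:
        raise ValueError("row must be padded to a multiple of 8 bytes")
    rest = width % 8                        # 34 % 8 = 2 leftover bytes
    full = [x] * 8                          # x|x|x|x|x|x|x|x
    tail = [x] * rest + [0] * (8 - rest)    # 0|0|0|0|0|0|x|x when rest = 2
    out = list(row)
    for start in range(0, width, 8):
        end = start + 8
        const = full if end <= width else tail
        out[start:end] = paddusb(out[start:end], const)
    return out


row = list(range(40))                   # 34 pixels followed by 6 padding bytes
result = add_constant_row(row, 34, 10)
print(result[0], result[33], result[34], result[39])  # 10 43 34 39
```

Bytes 1 to 34 gain the constant, while bytes 35 to 40 come back with their original values because only 0 was added to them.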

Bytes 35 to 40 of the result remain unchanged, since 0 is added to
them.

However, thresholding (PCMPEQB, Packed Compare for Equal, or PCMPGTB,
Packed Compare for Greater Than) is a different story.

Of course, for the first 32 bytes I can still perform PCMPEQB or
PCMPGTB 4 times using

mm0 = x|x|x|x|x|x|x|x   (x is the threshold to compare against)
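One detail worth keeping in mind for these full 8-byte compares: PCMPGTB treats each byte as a signed value, so pixel values of 128 and above compare as negative numbers. The sketch below only emulates that behaviour in Python and shows the usual workaround of flipping the top bit of both operands (a PXOR with 0x80 in every lane) to get an unsigned "greater than". The function names are hypothetical; lane 0 is the rightmost byte in the post's notation.

```python
def to_signed(b):
    """Reinterpret an unsigned byte (0..255) as a signed byte (-128..127)."""
    return b - 256 if b > 127 else b


def pcmpgtb(dst, src):
    """Like PCMPGTB: each lane becomes 0xFF if dst > src as *signed* bytes, else 0x00."""
    return [0xFF if to_signed(d) > to_signed(s) else 0x00 for d, s in zip(dst, src)]


def pcmpgtb_unsigned(dst, src):
    """Unsigned 'greater than': flip bit 7 of both operands (PXOR with 0x80 per lane) first."""
    return pcmpgtb([d ^ 0x80 for d in dst], [s ^ 0x80 for s in src])


pixels = [200, 50, 128, 127, 0, 255, 100, 101]
threshold = [100] * 8
print(pcmpgtb(pixels, threshold))           # [0, 0, 0, 255, 0, 0, 0, 255]
print(pcmpgtb_unsigned(pixels, threshold))  # [255, 0, 255, 255, 0, 255, 0, 255]
```

With the signed compare, 200 and 255 are wrongly reported as not above a threshold of 100; the biased version gives the result expected for 8-bit image data.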

But what about the remaining bytes that do not fill a whole 8-byte
block? How can I make the comparison ignore bytes 35 to 40? For
example, is there any way that

source = 6|5|4|3|2|1|4|3

PCMPGTB

mm1    = ?|?|?|?|?|?|3|3

will give

source = 6|5|4|3|2|1|ff|0

so that the comparison is only done on the last two bytes?

Is there an optimized way to achieve this?
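One common way to get the result asked for above (not something proposed in the original thread) is to compare the whole 8-byte block anyway and then blend with a precomputed mask, so the lanes outside the row keep their original values: result = (cmp AND mask) OR (source AND NOT mask). On MMX this would map to PAND, PANDN and POR; note that PANDN complements its destination operand, so the mask has to be in the destination register. The Python sketch below only emulates that blend on the example from the post; the helper names are hypothetical, and lane 0 is the rightmost byte in the post's notation.

```python
def to_signed(b):
    """Reinterpret an unsigned byte (0..255) as a signed byte (-128..127)."""
    return b - 256 if b > 127 else b


def pcmpgtb(dst, src):
    """Like PCMPGTB: 0xFF where dst > src as signed bytes, else 0x00."""
    return [0xFF if to_signed(d) > to_signed(s) else 0x00 for d, s in zip(dst, src)]


def pand(a, b):
    return [x & y for x, y in zip(a, b)]


def pandn(a, b):
    """Like PANDN: (NOT a) AND b, so `a` is the operand that gets complemented."""
    return [(~x & 0xFF) & y for x, y in zip(a, b)]


def por(a, b):
    return [x | y for x, y in zip(a, b)]


def compare_tail(source, threshold, valid):
    """Compare only the low `valid` lanes; the other lanes keep their source value."""
    mask = [0xFF] * valid + [0x00] * (8 - valid)  # 00|00|00|00|00|00|ff|ff for valid = 2
    gt = pcmpgtb(source, threshold)
    return por(pand(gt, mask), pandn(mask, source))


source = [3, 4, 1, 2, 3, 4, 5, 6]          # 6|5|4|3|2|1|4|3
threshold = [3, 3] + [0x77] * 6            # ?|?|?|?|?|?|3|3, "?" lanes arbitrary
print(compare_tail(source, threshold, 2))  # [0, 255, 1, 2, 3, 4, 5, 6]
```

The printed list is 6|5|4|3|2|1|ff|0 in the post's notation. Because the mask discards the upper lanes, the "?" bytes in mm1 can hold anything, and since the row width is fixed for a given image the mask only needs to be built once, outside the row loop.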

Thank you.

yccheok

spamtrap2, 6/8/2006 5:11:22 PM

comp.lang.asm.x86

Hi All, Quick question here. In the code I am working on I am apparently doing a lot of dictionary lookups and those are taking a lot of time. I looked at the possibility of changing the structure and I found about the numpy structured arrays. The concrete question is: what would be the difference between using one or the other in terms of performance? meaning looping and performing checks an operations on them? Thanks, On Tue, Sep 23, 2014 at 2:57 AM, LJ <luisjosenovoa@gmail.com> wrote: > Quick question here. In the code I am working on I am apparently doing a lot of dict...