PIII Integer Performance Problems

Initially I posted the problem below to a webforum at SandPile.org, and I
was redirected to a post in this group:
http://groups.google.com/groups?q=andy+glew+processor+scheduler&hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=8as0ju%244pp%40spool.cs.wisc.edu&rnum=2

I don't think that post answered my questions, but it gave me a good hint
about where to ask such things (i.e. here).

Any help with the anomaly I discuss below will be highly appreciated!

My intent is to detect the number and type of functional units in a CPU
by running benchmark tests. For the sake of clarity I will talk in "C"
below, but be assured that I looked at the assembly and it is what you
would expect.

If you repeat "r1+=r2" on an X GHz P4, you get 2X BIPS (billion instructions
per second), because the ALU is double-pumped. Now, if you repeat
"r1+=r3;r2+=r3", these are two independent instructions, which can go to the
two different ALUs (the P4 has 2 double-pumped integer ALUs). So one would
expect a twofold improvement in BIPS (i.e. 4X BIPS). The cruel fact is that
you only get 3X BIPS. The reason, as far as I understand it now, is that the
trace cache can only pass 3 instructions per cycle to the pipe (as opposed
to the 4 we are trying to sustain).

Everything looks fine so far, until you repeat the same experiment on a
PIII. The PIII also has 2 ALUs, but they are not double-pumped. When I try
the first experiment above on an X GHz PIII, I get X BIPS as expected. When
I try the second experiment I get 1.5X BIPS (instead of the expected 2X). I
don't really see where the problem is here! We are trying to sustain 2
integer instructions per cycle and the PIII does not cope with it!
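
For anyone who wants to reproduce the two experiments, here is a minimal sketch in C. It is not the exact code I ran. The names chain_one, chain_two, bips, KEEP and REP8 are made up for illustration, and the KEEP barrier assumes a GCC-style compiler (without it the compiler may fold the whole loop into a single multiply-add, and the measurement means nothing). The body is unrolled 8 times so that the loop counter and branch are a small part of the instruction stream, but they are still there, so check the generated assembly before trusting any number.

```c
#include <stdint.h>

/* Assumption: GCC/Clang inline asm. The empty asm forces x to stay in a
   register and stops the compiler from collapsing the repeated adds. */
#if defined(__GNUC__)
#define KEEP(x) __asm__ volatile("" : "+r"(x))
#else
#define KEEP(x) ((void)(x)) /* no barrier: results are correct, timing is not */
#endif

/* Repeat a statement sequence 8 times to amortize loop overhead. */
#define REP8(s) s s s s s s s s

/* Experiment 1: a single dependent chain, r1 += r2.
   Executes 8 * blocks adds. */
uint32_t chain_one(uint32_t r1, uint32_t r2, unsigned long blocks)
{
    for (unsigned long i = 0; i < blocks; i++) {
        REP8(r1 += r2; KEEP(r1);)
    }
    return r1;
}

/* Experiment 2: two independent chains, r1 += r3 and r2 += r3.
   Executes 16 * blocks adds. Returns r1 + r2 so both chains stay live. */
uint32_t chain_two(uint32_t r1, uint32_t r2, uint32_t r3, unsigned long blocks)
{
    for (unsigned long i = 0; i < blocks; i++) {
        REP8(r1 += r3; KEEP(r1); r2 += r3; KEEP(r2);)
    }
    return r1 + r2;
}

/* Convert an instruction count and elapsed seconds into BIPS. */
double bips(double instructions, double seconds)
{
    return instructions / seconds / 1e9;
}
```

To use it, time a call such as chain_one(1, 2, blocks) with a large blocks value using whatever timer you trust (clock() from <time.h> is enough for runs of a second or more), pass 8.0 * blocks (or 16.0 * blocks for chain_two) and the elapsed seconds to bips, and divide by the clock frequency in GHz to get instructions per cycle.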

Please help with any ideas!



Kamen
7/5/2003 9:26:27 PM
comp.arch