Memory bandwidth on Northwood vs Madison

The current Pentium 4 has a 64-bit-wide bus, clocked at 200MHz with
quadrupled data rate, so a theoretical speed of 6.4GB/s; a
configuration might feed this chip with two channels of DDR400 memory.

The current Itanium2 has a 128-bit-wide bus, clocked at 200MHz with
double data rate, so a theoretical speed of 6.4GB/s; a configuration
might feed this chip with four channels of DDR200 memory.
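
The two peak figures above are just width times clock times transfers per clock. A minimal Python sketch of that arithmetic follows; the helper name peak_bandwidth_gbs is made up for illustration, 1 GB is taken as 10^9 bytes, and each DDR channel is assumed to be the usual 64 bits wide.

```python
def peak_bandwidth_gbs(width_bits, clock_mhz, transfers_per_clock):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9

# Front-side buses as described in the post.
p4_bus = peak_bandwidth_gbs(64, 200, 4)         # quad-pumped
itanium2_bus = peak_bandwidth_gbs(128, 200, 2)  # double-pumped

# Memory side (assumption: 64-bit channels, double data rate).
two_ddr400 = 2 * peak_bandwidth_gbs(64, 200, 2)
four_ddr200 = 4 * peak_bandwidth_gbs(64, 100, 2)

for name, gbs in [("P4 bus", p4_bus), ("Itanium2 bus", itanium2_bus),
                  ("2 x DDR400", two_ddr400), ("4 x DDR200", four_ddr200)]:
    print(f"{name:13s} {gbs:.1f} GB/s")
```

On paper all four figures agree, which is why the benchmark gap is surprising.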

The swim sub-benchmark of SPEC CPU2000 is often used as a proxy for
memory bandwidth, so you'd expect the P4 and the Itanium2 to score
similarly.

But a P4/3200 scores 2053 on this benchmark (versus 2034 for a P4/2400
on the same platform); a Madison/1300 scores 3690.

What am I missing here?  Have compilers recently become good enough
that swim is no longer a proxy for bandwidth?  I suppose the
bus has been the same from the McKinley/900 through the Madison/1500,
and scores have ranged from 2328 to 3952 in that group, though that's
with a range of compilers and chipsets.

Tom


pmxtow
7/2/2003 12:24:32 PM
comp.arch


In article <bduiu0$oqt$1@oyez.ccc.nottingham.ac.uk>,
pmxtow@merlot.uucp (Thomas Womack) writes:
|> 
|> But a P4/3200 scores 2053 on this benchmark (versus 2034 for a P4/2400
|> on the same platform); a Madison/1300 scores 3690.
|> 
|> What am I missing here?  Have compilers recently become good enough
|> that swim is no longer a proxy for bandwidth?  I suppose the
|> bus has been the same from the McKinley/900 through the Madison/1500,
|> and scores have ranged from 2328 to 3952 in that group, though that's
|> with a range of compilers and chipsets.

Not necessarily.  Another possibility is that bandwidth is no longer
a scalar value.  I have measured differences of a factor of 2-4 on
'pure' bandwidth tests (STREAM and my equivalent), depending on details
of usage.

For example, some systems might have the property that the bandwidth
for a single CPU is, say, 4 GB/sec theoretical, 2 GB/sec practical,
and only 1 GB/sec for reading sequentially.

This might be because only half the bandwidth is available for
reading from memory, and the other half is reserved for writing
back dirty pages and loading lines that are being written into.

Similarly, a CPU might be good at sequential reads but poor at
random ones (even if whole cache lines were being read).  Or it
might deliver at most 1 GB/sec to a single stream, but 3 GB/sec
overall.  And so on.
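
A toy model of why one bus gives several different "bandwidths": count bus traffic per useful byte. The Python sketch below assumes a write-allocate cache, so every line that is stored to is first loaded from memory and later written back. The 4 GB/sec figure is the hypothetical one from the example above, and the kernel names are illustrative only.

```python
BUS_GBS = 4.0  # hypothetical theoretical bus bandwidth from the example above

# (lines read, lines written) by the program per loop iteration
KERNELS = {
    "sum:  s += a[i]": (1, 0),
    "copy: a[i] = b[i]": (1, 1),
    "fill: a[i] = 0": (0, 1),
}

def useful_fraction(reads, writes):
    """Fraction of bus traffic that the program actually asked for.

    Assumes write-allocate: each written line costs one fill from
    memory plus one write-back, i.e. two bus transfers.
    """
    useful = reads + writes
    traffic = reads + 2 * writes
    return useful / traffic

for name, (reads, writes) in KERNELS.items():
    gbs = BUS_GBS * useful_fraction(reads, writes)
    print(f"{name:20s} {gbs:.2f} GB/sec useful")
```

Under this assumption a pure fill sees only half the bus and a copy two thirds of it, before any other inefficiency is counted.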


Regards,
Nick Maclaren.
nmm1
7/2/2003 1:10:04 PM
In article <bduiu0$oqt$1@oyez.ccc.nottingham.ac.uk>, 
pmxtow@merlot.uucp (Thomas Womack) wrote:

>What am I missing here?  Have compilers recently become good enough
>that swim is no longer a proxy for bandwidth?  I suppose the

Isn't it common for P4 systems not to be configured to maximize 
use of FSB bandwidth for memory?  (I.e., did the SPECed P4 
machine have "two channels of DDR400"?)
memorymorass
7/4/2003 8:47:32 AM
In article <20030704044732.23307.00001154@mb-m29.aol.com>,
Paul A. Clayton <memorymorass@aol.com> wrote:
>In article <bduiu0$oqt$1@oyez.ccc.nottingham.ac.uk>, 
>pmxtow@merlot.uucp (Thomas Womack) wrote:
>
>>What am I missing here?  Have compilers recently become good enough
>>that swim is no longer a proxy for bandwidth?  I suppose the
>
>Isn't it common for P4 systems not to be configured to maximize 
>use of FSB bandwidth for memory?  (I.e., did the SPECed P4 
>machine have "two channels of DDR400"?)

Yes, it did; they also had results for a P4 with two channels of DDR333,
which consistently scored about 1660 independent of clock speed, as you'd
expect from the scaling.
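
As a back-of-envelope check of that scaling, assuming swim scales linearly with the nominal memory transfer rate (an assumption, not something the SPEC results establish), using the scores quoted in this thread:

```python
ddr400_score = 2053   # P4/3200 with two channels of DDR400
observed_ddr333 = 1660  # approximate score with two channels of DDR333

# Nominal transfer rates in MT/s; the linear-scaling model is an assumption.
ratio = 333 / 400
predicted_ddr333 = ddr400_score * ratio
relative_gap = abs(predicted_ddr333 - observed_ddr333) / observed_ddr333

print(f"predicted {predicted_ddr333:.0f}, observed about {observed_ddr333}")
print(f"relative gap {relative_gap:.1%}")
```

The prediction lands within a few percent of the observed score, which fits swim being bandwidth-bound on this platform.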

The referenced result is

http://www.specbench.org/cpu2000/results/res2003q3/cpu2000-20030616-02266.asc

The Dell machine tested is built around the Intel 875 chipset, for which
Intel's own result at

http://www.specbench.org/cpu2000/results/res2003q2/cpu2000-20030505-02156.asc

gets a score of around 1890.

Tom


Thomas
7/4/2003 2:30:59 PM