We are currently migrating from an IBM RS/6000 p660 (4 CPUs) to a V1280 (12
CPUs) due to a corporate directive to standardize on Sun.
We are running Oracle 8.1.7.4 (32 bit). During our performance and parallel
tests we are finding that the IBM is twice as fast as the V1280.
Our processes are only using 1 CPU.
Management does not want to go to 64 bit, so that is not on the agenda.
I have pulled the SPEC CPU2000 results for the two chips, and they say the
UltraSPARC III should be faster than the IBM processor, but we are not seeing
that with our application.
We also ran a simple CPU-only C program, and it shows the IBM running twice
as fast for CPU operations.
Any comments?
[computer, 12/28/2003 2:37:16 PM]
On Sun, 28 Dec 2003 14:37:16 +0000, computer person wrote:
<snip>
You are taking too simplistic an approach when trying to verify
expectations for the V1280. Oracle is NOT strictly a CPU-bound
application, and it is NOT a "simple C program": it is a database
that taxes disk I/O, memory and CPU. Did someone just "load up" Oracle on
the V1280 to run some benchmarks, or has it been set up properly? I
would guess that it has something to do with configuration and not any
particular deficiency of the machine.
[Tom, 12/28/2003 4:00:10 PM]
computer person wrote:
<snip>
I'm curious about which version and patch level of Solaris you are running, if that is
what you are using, and if not, then what...
Also, what about I/O? How is your storage arranged? That is probably significant,
given that the application is a database.
We are using an elderly E4000, recently upgraded from two 250 MHz CPUs to
eight 400 MHz CPUs, and did not see more than an incremental
improvement. I expected it to improve much more. I do not have detailed
performance info, however, so I am certainly missing the details.
Perhaps in our case we were not loading the system enough for the
extra processors to make a significant difference, and most of our
improvement is simply because the processors run at a faster clock.
[Chuck, 12/28/2003 4:03:28 PM]
"Tom Hamilton" <sg7188@snet.net> wrote in message
news:pan.2003.12.28.16.00.00.833832@snet.net...
> On Sun, 28 Dec 2003 14:37:16 +0000, computer person wrote:
> > <snip>
>
> You are taking a too simplistic approach when trying to verify
> expectations from the V1280. Oracle is NOT strictly a "CPU" based
> application and is NOT a "simple C program" It is a database
> that taxes disk IO, memory and CPU. Did someone just "load up" oracle on
> the V1280 just to run some benchmarks or has it properly been setup? I
> would guess that "it has something to do with configuration" and not any
> particular deficency of the machine
The queries that run during the parallel test use sqlplus as the
client. They seem to be very CPU-intensive, because I do not see any I/O
waits. Total CPU usage runs at 8.3% during the run, which is 1 full CPU:
we have 12 CPUs, and 100/12 = 8.3%.
We are using 3510 storage with an extra JBOD. We are using Veritas
VM/filesystem 3.5 and have tuned it as much as we could, with no
improvement. It keeps coming back to CPU speed.
We are running Solaris 8 with the latest recommended patches. We have no plans for
Solaris 9 right now because there are too many machines to upgrade.
Hope that answers your questions. By the way, the simple C program counts
from 1 to 100 million or something like that, and was supplied by Sun
Engineering because of our concern about CPU speed. They have not gotten back to
us since I published the fact that the IBM machine ran it twice as fast on
1 CPU. When I run 8 copies concurrently the V1280 is faster by a few %, and when I run
24 the V1280 is faster than our 4-CPU IBM box by 27%. That's because there are
12 processors vs 4, I guess. That does not help us, because our Oracle
queries only use 1 CPU during the query. Sure, we could look at parallel
query using more CPUs, but this test is pretty valid if you want to compare
like scenarios. Oracle has been tuned by Oracle Professional Services and
Sun Professional Services. Several of the changes we made improved things a little
bit.
We have done our homework, so don't think we simply banged up a machine and
ran stuff :)
Any other comments on why a single IBM p660 processor is kicking the ass of a
900 MHz UltraSPARC III on a V1280?
Thanks
[computer, 12/28/2003 4:36:15 PM]
At Sun, 28 Dec 2003 14:37:16 GMT, "computer person" <fake_address@nothing.com> writes:
> We are currently migrating from IBM RS/6000 p660 4 cpu machine to v1280 12
> cpu machine due to corporate directive to standardize on Sun.
> We ran a simple C program test and did find that just CPU use it says the
> IBM is twice the speed for cpu operations.
This is a bit vague. What C program is it, exactly? Can you post
it here? Exactly how did you compile and time it on both hosts?
Also, exactly which processors are in your two servers?
SPEC says the IBM eServer pSeries 660 Model 6H0 (750 MHz) got a
SPECint_base2000 score of 431, whereas the Sun Fire V1280 (900 MHz)
got a 479. That's pretty close. I wouldn't be surprised if the IBM
beat the Sun by a factor of two on some CPU benchmarks, but I would be
a bit surprised if it won by that much overall on a broader CPU test.
[Paul, 12/28/2003 6:27:07 PM]
computer person wrote:
<snip>
I imagine the devil is in the details. You've probably tuned your IBM
installation over the years you've had it. Identify what is limiting
performance on the new machine and tune IT.
BTW, is there any particular reason to just use one processor?
--
After being targeted with gigabytes of trash by the "SWEN" worm, I have
concluded we must conceal our e-mail address. Our true address is the
mirror image of what you see before the "@" symbol. It's a shame such
steps are necessary. ...Charlie
[CJT, 12/28/2003 7:08:50 PM]
On Sun, 28 Dec 2003, Tom Hamilton wrote:
> You are taking a too simplistic approach when trying to verify
> expectations from the V1280. Oracle is NOT strictly a "CPU" based
> application and is NOT a "simple C program" It is a database
> that taxes disk IO, memory and CPU. Did someone just "load up" oracle on
> the V1280 just to run some benchmarks or has it properly been setup? I
> would guess that "it has something to do with configuration" and not any
> particular deficency of the machine
Don't forget that IBM's POWER4 CPUs have two cores per processor
(or is the OP's machine too old for this to be a factor?).
That might explain some of the discrepancy.
--
Rich Teer, SCNA, SCSA . * * . * .* .
. * . .*
President, * . . /\ ( . . *
Rite Online Inc. . . / .\ . * .
.*. / * \ . .
. /* o \ .
Voice: +1 (250) 979-1638 * '''||''' .
URL: http://www.rite-online.net ******************
[Rich, 12/28/2003 9:20:49 PM]
"CJT" <abujlehc@prodigy.net> wrote in message
news:3FEF2A46.7000402@prodigy.net...
> computer person wrote:
>
> <snip>
>
> I imagine the devil is in the details. You've probably tuned your IBM
> installation over the years you've had it. Identify what is limiting
> performance on the new machine and tune IT.
>
> BTW, is there any particular reason to just use one processor?
Our IBM system is only 1 year old, and no tuning whatsoever has been done
to it; the kernel parameters are all defaults.
We use 1 processor because that is what Oracle uses for a single query.
You are right about SPEC2000, as I already mentioned above, but thanks for
posting the exact numbers.
The simple C program from Sun Engineering is as follows:
#include <stdio.h>

#define LIMIT (1000000*1000)

int main(void)
{
    int cnt = 0;
    /* Busy loop: count from 0 up to LIMIT (1000 million). */
    while (cnt < LIMIT)
        cnt++;
    printf("final = %d million\n", cnt/1000000);
    return 0;
}
It was compiled with default options using GCC on both the IBM and the Sun. It counts to
1000 million in 10 seconds on the IBM p660 and in 20 seconds on the V1280.
As for Oracle tuning, as I said in an earlier follow-up post,
Oracle Professional Services and Sun Professional Services were hired to tune the Sun
machine after we noticed the performance issue.
Perhaps others can compile that C program, run it on their boxes (IBM and
Sun), and post the results just for fun!
[computer, 12/28/2003 9:21:09 PM]
"computer person" <fake_address@nothing.com> writes in comp.unix.solaris:
|The simple C program is as follows from Sun Egineering:
|#define LIMIT (1000000*1000)
|main()
|{
| int cnt = 0;
| while (cnt < LIMIT)
| cnt++;
| printf("final = %d million\n", cnt/1000000);
|}
|
|It was compiled with default option on GCC on both IBM and Sun..It counts to
|1000 million in 10 seconds on IBM p660 and 20 seconds on v1280.
I think the lesson I'd learn here is not to use gcc with default
options:
System: Sun E250 with 2x400MHz UltraSPARC II
Compilers: gcc 2.95, SunOne CC 8.0 (patch 112760-04)
gcc w/no options: 32.72u 0.01s 0:32.73 100.0%
gcc -O2: 5.03u 0.01s 0:05.04 100.0%
cc w/no options: 5.04u 0.00s 0:05.04 100.0%
cc -fast: 0.01u 0.00s 0:00.01 100.0%
--
________________________________________________________________________
Alan Coopersmith alanc@alum.calberkeley.org
http://www.CSUA.Berkeley.EDU/~alanc/ aka: Alan.Coopersmith@Sun.COM
Working for, but definitely not speaking for, Sun Microsystems, Inc.
[Alan, 12/28/2003 9:33:28 PM]
Alan Coopersmith <alanc@alum.calberkeley.org> writes in comp.unix.solaris:
|<snip>
|System: Sun E250 with 2x400Mhz UltraSPARC II
|Compilers: gcc 2.95, SunOne CC 8.0 (patch 112760-04)
|
| gcc w/no options: 32.72u 0.01s 0:32.73 100.0%
| gcc -O2: 5.03u 0.01s 0:05.04 100.0%
| cc w/no options: 5.04u 0.00s 0:05.04 100.0%
| cc -fast: 0.01u 0.00s 0:00.01 100.0%
BTW, in fairness, cc -fast seems to figure out how useless the loop is
and optimizes it out completely:
! SUBROUTINE main
!
! OFFSET    SOURCE LINE    LABEL    INSTRUCTION
                        .global main
                        main:
/* 000000   7 */        sethi   %hi(.L23),%o5
/* 0x0004     */        or      %g0,%o7,%g1
/* 0x0008     */        add     %o5,%lo(.L23),%o0
/* 0x000c     */        or      %g0,1000,%o1
/* 0x0010     */        call    printf  ! params = %o0 %o1 ! Result = ! (tail call)
/* 0x0014     */        or      %g0,%g1,%o7
/* 0x0018   0 */        .type   main,2
/* 0x0018   0 */        .size   main,(.-main)
/* 0x0018   0 */        .global __fsr_init_value
/* 0x0018     */        __fsr_init_value=1
That's the entire program when compiled with -fast.
Without -fast there are actually compare & branch instructions in the
assembler output, so it does run through the loop.
[Alan, 12/28/2003 9:41:19 PM]
>
> The simple C program is as follows from Sun Egineering:
> #define LIMIT (1000000*1000)
> main()
> {
> int cnt = 0;
> while (cnt < LIMIT)
> cnt++;
> printf("final = %d million\n", cnt/1000000);
> }
>
> It was compiled with default option on GCC on both IBM and Sun..It counts to
> 1000 million in 10 seconds on IBM p660 and 20 seconds on v1280.
>
> Perhaps thers can compile that C program and run it on their boxes (IBM and
> SUN) and post the results just for fun!
>
This test program is not suitable for performance tests.
Depending on the compiler and compiler flags, you'll get widely varying times.
Here are the times I got on my Sun system (Sun Blade 150 at 650 MHz,
Solaris 9 08/03):
compiler      flags    time
gcc-3.3.2     (none)   ~20.2 s
gcc-3.3.2     -O       ~ 1.6 s
Sun cc 5.5    (none)   ~ 3.1 s
Sun cc 5.5    -O       ~ 0.0 s
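In the last run the compiler has probably recognized that the loop is pointless
and transformed the program into something like this:
#include <stdio.h>

#define LIMIT (1000000*1000)

int main(void)
{
    int cnt = LIMIT;  /* loop folded away: cnt simply ends up at LIMIT */
    printf("final = %d million\n", cnt/1000000);
    return 0;
}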
But this UltraSPARC IIi at 650 MHz is not as fast as an UltraSPARC III at 900 MHz.
According to SPEC CINT2000, the US-III should be twice as fast as the US-IIi.
Regards
Frank
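P.S.:
A modified version that prints a progress line every million iterations
takes a lot more time:
#include <stdio.h>

#define LIMIT (1000000*1000)

int main(void)
{
    int cnt = 0;
    while (cnt < LIMIT) {
        cnt++;
        /* print a progress line every million iterations */
        if (cnt % 1000000 == 0) {
            printf("cnt = %d million\n", cnt/1000000);
        }
    }
    return 0;
}
gcc-3.3.2 -O3      ~112 s
Sun cc 5.5 -fast   ~55 s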
[Frank, 12/28/2003 10:02:25 PM]
"Alan Coopersmith" <alanc@alum.calberkeley.org> wrote in message
news:bsnilv$llh$2@agate.berkeley.edu...
> Alan Coopersmith <alanc@alum.calberkeley.org> writes in comp.unix.solaris:
> <snip>
> BTW, in fairness, cc -fast seems to figure out how useless the loop is
> and optimize it out completely:
> <snip>
> That's the entire program when compiled with -fast.
> Without -fast there are actually compare & branch instructions in the
> assembler output, so it does run through the loop.
OK... You guys are lots of fun. What would be a more valid C program to test
CPU speed, then? Obviously, if you play with the compiler's optimization options,
you are changing the generated machine instructions. The goal
is to measure exactly the same cycles of instructions on each platform,
and at 32 bit, not 64 bit, because that is what we use. All the SPEC2000 results are on
Solaris 9 and probably 64 bit, so that is an apples-to-oranges comparison with our
environment.
[computer, 12/28/2003 11:19:33 PM]
computer person wrote:
<snip>
> OK..You guys are lots of fun..What would be a more valid C program to test
> CPU speed then? Obviously if you playing with options to optimize on the
> compiler you are messing with the generated machine instructions. The goal
> is to have exactly the same cycles of instructions measured on each platform
> and at 32 bit not 64bit because that is what we use ..All spec2000 are
> solaris9 and probably 64bit so that is apples to orange compare to our
> environment.
>
>
Having "the same cycles of instructions" might or might not make sense
when you're crossing architecture lines. Ultimately what you want
benchmarked is your specific workload; no synthetic benchmark is a
perfect predictor of what YOUR hardware/software/workload combination
will be called upon to accomplish.
[CJT, 12/28/2003 11:34:11 PM]
On Sun, 28 Dec 2003, computer person wrote:
> and at 32 bit not 64bit because that is what we use ..All spec2000 are
> solaris9 and probably 64bit so that is apples to orange compare to our
> environment.
Not likely: 64-bit wouldn't give the SPEC tests any advantage
that I can think of. There's nothing about being 64-bit
that makes an app magically faster than an equivalent 32-bit app,
ya know (although it CAN be). If anything, a 64-bit app
might expect a small performance degradation, due to the increased
size of pointers and longs (hence, fewer cache hits).
[Rich, 12/28/2003 11:59:08 PM]
"CJT" <abujlehc@prodigy.net> wrote in message
news:3FEF6875.9000509@prodigy.net...
> computer person wrote:
>
> <snip>
>
> Having "the same cycles of instructions" might or might not make sense
> when you're crossing architecture lines. Ultimately what you want
> benchmarked is your specific workload; no synthetic benchmark is a
> perfect predictor of what YOUR hardware/software/workload combination
> will be called upon to accomplish.
Well, with all that said, if we run our application with exactly the same
data and an optimized database and disk setup, the IBM is twice as fast. Cheers
[Computer, 12/29/2003 12:00:34 AM]
"Rich Teer" <rich.teer@rite-group.com> wrote in message
news:Pine.SOL.4.58.0312281556170.9814@zaphod.rite-group.com...
> On Sun, 28 Dec 2003, computer person wrote:
>
> > and at 32 bit not 64bit because that is what we use ..All spec2000 are
> > solaris9 and probably 64bit so that is apples to orange compare to our
> > environment.
>
> Not likely: 64-bit wouldn't give the SPEC tests any advantage
> that I can think of. There's nothing magic about being 64-bit
> that makes it magically faster than an equivelent 32-bit app,
> ya know (although they CAN be). If anything, a 64-bit app
> might expect a small performance degradation, due to the increased
> size of pointers and longs (hence, fewer cache hits).
Yes, I agree about the 64 bit, but perhaps Solaris 9 vs. Solaris 8 makes a difference.
[computer, 12/29/2003 12:38:45 AM]
On Mon, 29 Dec 2003, computer person wrote:
> yes, I agree about the 64 bit but perhaps sol9 vs sol8 makes a diff
Yes, that's possible, due to performance improvements in
Solaris. But there's only so much you can do to improve
the performance of a tight loop of code!
[Rich, 12/29/2003 1:33:33 AM]
Rich Teer wrote:
> On Sun, 28 Dec 2003, computer person wrote:
>
>
>>and at 32 bit not 64bit because that is what we use ..All spec2000 are
>>solaris9 and probably 64bit so that is apples to orange compare to our
>>environment.
>
>
> Not likely: 64-bit wouldn't give the SPEC tests any advantage
> that I can think of. There's nothing magic about being 64-bit
> that makes it magically faster than an equivelent 32-bit app,
> ya know (although they CAN be). If anything, a 64-bit app
> might expect a small performance degradation, due to the increased
> size of pointers and longs (hence, fewer cache hits).
>
Rich,
A couple of FP intensive apps seem to run faster
in 64-bit mode (Fortran 77, mostly double precision)
with the Forte compiler. I gather that there are
more FP registers available in 64-bit mode (I'm no
expert, just a user) and the compiler puts them to
good use or some such.
Stuart
[Stuart, 12/29/2003 1:36:00 AM]
On Sun, 28 Dec 2003, Stuart Biggar wrote:
> A couple of FP intensive apps seem to run faster
> in 64-bit mode (Fortran 77, mostly double precision)
> with the Forte compiler. I gather that there are
> more FP registers available in 64-bit mode (I'm no
> expert, just a user) and the compiler puts them to
> good use or some such.
Good point; although I was thinking of integer performance,
where the numbers involved can fit into a 32-bit quantity.
[Rich, 12/29/2003 4:05:07 AM]
On Sun, 28 Dec 2003 21:21:09 GMT "computer person" <fake_address@nothing.com> wrote:
>
> The simple C program is as follows from Sun Egineering:
> #define LIMIT (1000000*1000)
> main()
> {
> int cnt = 0;
> while (cnt < LIMIT)
> cnt++;
> printf("final = %d million\n", cnt/1000000);
> }
>
> It was compiled with default option on GCC on both IBM and Sun..It counts to
> 1000 million in 10 seconds on IBM p660 and 20 seconds on v1280.
Make cnt 'volatile' for a fair comparison. It's hard to know what gcc
might do on different platforms otherwise. Also, please show us the
generated .s files (gcc -S). Since there's a branch in a
tight loop, wasting the delay slot on SPARC may be significant (gcc
may not fill the delay slot at -O0, giving the IBM CPU an unfair
advantage).
A fairer comparison is likely to be -O2 on both processors, with cnt
declared volatile.
Given that the SPEC numbers are close, I think you'll find that in
a fair comparison the systems come much closer.
Not that this little test is at all indicative of real-world performance
with real-world applications. If the "experts" who helped you tune the
system gave you the above as a test, I suggest you find new "experts".
/fc
[Frank, 12/29/2003 5:29:24 AM]
In article <9PHHb.224413$%TO.53235@twister01.bloor.is.net.cable.rogers.com>,
"computer person" <fake_address@nothing.com> writes:
> "CJT" <abujlehc@prodigy.net> wrote in message
> news:3FEF2A46.7000402@prodigy.net...
>> computer person wrote:
>>
>> > We are currently migrating from IBM RS/6000 p660 4 cpu machine to v1280 12
>> > cpu machine due to corporate directive to standardize on Sun.
A wise choice. Unusual from a "corporate directive"...
>> > We are running Oracle 8.1.7.4 (32 bit). We are finding during our
>> > performance and parallel tests that our IBM is 2 times as fast as the v1280.
>> > Management does not want to go to 64 bit so that is not on the agenda.
Ah! That's better... I knew it wouldn't last.
> The simple C program is as follows from Sun Egineering:
> #define LIMIT (1000000*1000)
> main()
> {
> int cnt = 0;
> while (cnt < LIMIT)
> cnt++;
> printf("final = %d million\n", cnt/1000000);
> }
> It was compiled with default option on GCC on both IBM and Sun..It counts to
> 1000 million in 10 seconds on IBM p660 and 20 seconds on v1280.
Weird. My Ultra 2 (400 MHz) using gcc 3.x.x can do this in:
date; ./ztst; date
Mon Dec 29 09:36:16 PST 2003
final = 1000 million
real 0m2.544s
user 0m2.530s
sys 0m0.010s
Mon Dec 29 09:36:18 PST 2003
Yep. Less than 3 seconds.
> Perhaps thers can compile that C program and run it on their boxes (IBM and
> SUN) and post the results just for fun!
OK, I admit I cheated: I compiled it 64-bit with -O6...
32-bit with no optimization took 33 seconds.
[gerryt, 12/29/2003 5:42:30 PM]
"Frank Cusack" <fcusack@fcusack.com> wrote in message
news:m3hdzk8a4r.fsf@magma.savecore.net...
> On Sun, 28 Dec 2003 21:21:09 GMT "computer person" <fake_address@nothing.com> wrote:
> <snip>
> Not that this little test is at all indicative of real world performance
> with real world applications. If the "experts" who helped you tune the
> system gave you the above as a test, I suggest you find new "experts".
>
> /fc
Well, after all, we are dealing with so-called "Sun Microsystems experts".
That says it all, I guess...
[computer, 12/29/2003 7:27:11 PM]
"Frank Cusack" <fcusack@fcusack.com> wrote in message
news:m3hdzk8a4r.fsf@magma.savecore.net...
> On Sun, 28 Dec 2003 21:21:09 GMT "computer person" <fake_address@nothing.com> wrote:
> <snip>
> Make cnt 'volatile' for a fair comparison. It's hard to know what gcc
> might do on different platforms otherwise. Also, please show us the
> generated .s files as well (gcc -S). Since there's a branch in a
> tight loop, wasting the delay slot on sparc may be significant. (gcc
> may not use the delay slot with -O0, giving the IBM CPU an unfair
> advantage.)
> <snip>
Here is the AIX version, from gcc with the -S option:
.file "bm.c"
..toc
..csect .text[PR]
gcc2_compiled.:
__gnu_compiled_c:
..csect _bm.rw_c[RO],3
.align 2
LC..0:
.byte "final = %d million"
.byte 10, 0
..toc
LC..1:
.tc LC..0[TC],LC..0
..csect .text[PR]
.align 2
.globl main
.globl .main
..csect main[DS]
main:
.long .main, TOC[tc0], 0
..csect .text[PR]
..main:
.extern __mulh
.extern __mull
.extern __divss
.extern __divus
.extern __quoss
.extern __quous
mflr 0
stw 31,-4(1)
stw 0,8(1)
stwu 1,-72(1)
mr 31,1
bl .__main
cror 31,31,31
li 0,0
stw 0,56(31)
L..3:
lwz 0,56(31)
lis 9,0x3b9a
ori 9,9,51711
cmpw 0,0,9
bc 4,1,L..5
b L..4
L..5:
lwz 9,56(31)
addi 0,9,1
stw 0,56(31)
b L..3
L..4:
lwz 0,56(31)
lis 9,0x431b
ori 9,9,56963
mr 3,0
mr 4,9
bla __mulh
mr 9,3
srawi 11,9,18
srawi 9,0,31
subfc 0,9,11
lwz 3,LC..1(2)
mr 4,0
bl .printf
cror 31,31,31
L..2:
lwz 1,0(1)
lwz 0,8(1)
mtlr 0
lwz 31,-4(1)
blr
LT..main:
.long 0
.byte 0,0,32,97,128,1,0,1
.long LT..main-.main
.short 4
.byte "main"
.byte 31
_section_.text:
.csect .data[RW],3
.long _section_.text
Here is the Sun version, compiled the same way:
.file "bm.c"
gcc2_compiled.:
.section ".rodata"
.align 8
.LLC0:
.asciz "final = %d million\n"
.global .div
.section ".text"
.align 4
.global main
.type main,#function
.proc 04
main:
!#PROLOGUE# 0
save %sp, -120, %sp
!#PROLOGUE# 1
st %g0, [%fp-20]
.LL3:
ld [%fp-20], %o0
sethi %hi(999999488), %o2
or %o2, 511, %o1
cmp %o0, %o1
ble .LL5
nop
b .LL4
nop
.LL5:
ld [%fp-20], %o0
add %o0, 1, %o1
st %o1, [%fp-20]
b .LL3
nop
.LL4:
ld [%fp-20], %o0
sethi %hi(999424), %o1
or %o1, 576, %o1
call .div, 0
nop
mov %o0, %o1
sethi %hi(.LLC0), %o2
or %o2, %lo(.LLC0), %o0
call printf, 0
nop
.LL2:
ret
restore
.LLfe1:
.size main,.LLfe1-main
.ident "GCC: (GNU) 2.95.3 20010315 (release)"
Comments, please!
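Frank's suggestion quoted above can be sketched as follows: the posted loop with `cnt` declared `volatile`, meant to be built with `gcc -O2` on both machines. The function name `count_volatile` is hypothetical. `volatile` only guarantees that every read and write of the counter is actually performed. It says nothing about Oracle performance.

```c
/* Same bound as the posted benchmark: 1000 million iterations. */
#define LIMIT (1000000 * 1000)

/*
 * The posted loop with the counter declared volatile. volatile obliges
 * the compiler to perform every read and write of cnt, so even at -O2
 * the loop cannot be deleted or folded into a single assignment.
 */
int count_volatile(int limit)
{
    volatile int cnt = 0;

    while (cnt < limit)
        cnt++;
    return cnt;
}
```

To use it, call it from `main` as `printf("final = %d million\n", count_volatile(LIMIT) / 1000000);` with `<stdio.h>` included. Build it with `gcc -O2` on each box and compare the `gcc -O2 -S` listings. That gives roughly the like-for-like comparison Frank describes, but it still times only a load/add/store/branch loop.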

[computer, 12/29/2003 7:35:02 PM]

computer person wrote:
> "Frank Cusack" <fcusack@fcusack.com> wrote in message
> news:m3hdzk8a4r.fsf@magma.savecore.net...
>
>>On Sun, 28 Dec 2003 21:21:09 GMT "computer person" <fake_address@nothing.com> wrote:
>
>>>The simple C program is as follows from Sun Engineering:
>>>#define LIMIT (1000000*1000)
>>>main()
>>>{
>>> int cnt = 0;
>>> while (cnt < LIMIT)
>>> cnt++;
>>> printf("final = %d million\n", cnt/1000000);
>>>}
>>>
>>>It was compiled with default options on GCC on both IBM and Sun. It counts to
>>>1000 million in 10 seconds on IBM p660 and 20 seconds on v1280.
>>
>>Make cnt 'volatile' for a fair comparison. It's hard to know what gcc
>>might do on different platforms otherwise. Also, please show us the
>>generated .s files as well (gcc -S). Since there's a branch in a
>>tight loop, wasting the delay slot on sparc may be significant. (gcc
>>may not use the delay slot with -O0, giving the IBM CPU an unfair
>>advantage.)
>>
>>A fairer comparison is likely to be -O2 on both processors, with cnt
>>as volatile.
>>
>>Given that the spec #'s are close, I think you'll find that when doing
>>a fair comparison the systems will come much closer.
>>
>>Not that this little test is at all indicative of real world performance
>>with real world applications. If the "experts" who helped you tune the
>>system gave you the above as a test, I suggest you find new "experts".
>>
>>/fc
>
> Well, after all, we are dealing with so-called "Sun Microsystems experts".
> That says it all, I guess.
>
>
You wouldn't have a hidden agenda in all this, would you?
--
After being targeted with gigabytes of trash by the "SWEN" worm, I have
concluded we must conceal our e-mail address. Our true address is the
mirror image of what you see before the "@" symbol. It's a shame such
steps are necessary. ...Charlie

[CJT, 12/29/2003 7:45:00 PM]

At Mon, 29 Dec 2003 19:35:02 GMT, "computer person" <fake_address@nothing.com> writes:
> Here is Sun version with same:
You compiled the SPARC version without any optimization!
What kind of benchmark is that?
GCC is notably inefficient in that mode on the SPARC. Try compiling it
with -O2 at least; this is typical for production code, and speeds up
the SPARC version by a factor of 12.5 on my old 440MHz UltraSPARC II
host.
However, as someone else has pointed out, your benchmark can be
optimized away entirely by Sun's C compiler. That's probably the way
that Oracle is built, so your benchmark is irrelevant for your real
problem (whatever it is).
If all you care about is single-threaded CPU performance on 32-bit
Oracle, your best bet by far, price/performance wise, has got to be
x86. It will stomp both POWER4 and SPARC into the ground. (So you
needn't waste our time with silly benchmarks; we already know x86 is
the way to go for your problem. :-)
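Paul's point that an optimizing compiler may delete the benchmark entirely can be illustrated with a hypothetical pair of functions. The first is the posted loop, parameterized on its limit. The second is the closed form that loop computes. Because the loop has no side effects, a compiler is allowed to emit the second in place of the first. Whether a particular gcc or Sun cc release actually does so at a given optimization level is something to check in its `-S` output; this sketch only shows that the two are equivalent.

```c
/*
 * The loop from the posted benchmark, taking the limit as a parameter.
 * Nothing observable happens inside the loop, so an optimizer is free
 * to replace it with its closed-form result.
 */
int count_loop(int limit)
{
    int cnt = 0;

    while (cnt < limit)
        cnt++;
    return cnt;
}

/* The equivalent closed form an optimizer may legally emit instead. */
int count_closed_form(int limit)
{
    return limit > 0 ? limit : 0;
}
```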

[Paul, 12/29/2003 8:20:06 PM]

computer person wrote:
This is Your loop from GCC on Sun:
>.LL3:
> ld [%fp-20], %o0
> sethi %hi(999999488), %o2
> or %o2, 511, %o1
> cmp %o0, %o1
> ble .LL5
> nop
> b .LL4
> nop
>.LL5:
> ld [%fp-20], %o0
> add %o0, 1, %o1
> st %o1, [%fp-20]
> b .LL3
> nop
>.LL4:
Obviously, GCC without optimization knows little about pipelining.
Also, hinting to the compiler that cnt should be kept in a register
might eliminate the expensive ld.
To illustrate my point, I am using S1CC8, also without any optimization,
just so that the compiler doesn't optimize away the loop altogether:
! File doit.c:
! 1 #include <stdio.h>
! 2
! 3 #define LIMIT (1000000*1000)
! 4
! 5 int main(int argc, char* argv[])
! 6 {
! 7 register int cnt = 0;
mov %g0,%i5
! 8 while ( cnt < LIMIT)
sethi %hi(0x3b9aca00),%l0
or %l0,%lo(0x3b9aca00),%l0
cmp %i5,%l0
bge .L97
nop
! block 2
.L_y0:
sethi %hi(0x3b9aca00),%l0
or %l0,%lo(0x3b9aca00),%l0
.L98:
.L95:
! 9 cnt++;
add %i5,1,%i5
cmp %i5,%l0
bl .L95
nop
! block 3
[...]
Already You can see how S1CC8 creates much better code. But to drive home
the pipelining, one could "optimize" like this:
.L95:
! 9 cnt++;
cmp %i5,%l0
bl .L95
add %i5,1,%i5
i.e. executing the add in the branch delay slot instead of a nop. [NB I cannot
be bothered to check invariants for this.] For this extremely trivial
example, You will be able to observe a speedup of 100% doing so.
All in all I agree with others who suggested at least -O2 optimization in
order to fairly use RISC CPUs as they are meant to be.
Also, I cannot see why on earth anyone would use anything else but Sun's
own C compilers in a production environment, unless GCC is really necessary.
If You can afford a V1280, You most certainly can shell out the USD999 for
the S1CC8. Finally, I am not sure Sun cc's -fast option should be used
unchecked for production code.
regards
Torsten
ps. Did I read You right? It was some "professional services" people who
provided You with this alleged benchmark? Obviously, this code is extremely
trivial and dependent on the quality and optimization of the C compiler. At
best, it exercises a CPU's integer pipeline. It does *not* illustrate a
machine's cache or memory bandwidth, and it certainly doesn't show secondary
storage I/O performance. In other words, it has absolutely no relevance to
Oracle, or really to anything at all. In case these were paid consultants,
I'd at least ask for my money back.
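Torsten's observation is that the unoptimized gcc loop spends its time loading and storing `cnt` through memory. A minimal sketch of how that cost could be measured times two versions of the loop with the standard `clock()` function, which reports the process's CPU time rather than wall-clock time. All function names are hypothetical. At `-O2` the register version may be collapsed to its closed form. Compiling this comparison at `-O0`, and confirming with `gcc -S` that both loops survive, is an assumption of the sketch.

```c
#include <time.h>

/* Counter kept in memory: volatile forces a load and a store on every
 * iteration, much like the [%fp-20] traffic in the gcc -O0 listing. */
int count_in_memory(int limit)
{
    volatile int cnt = 0;

    while (cnt < limit)
        cnt++;
    return cnt;
}

/* Counter the compiler is asked to keep in a register. 'register' is
 * only a hint; at -O2 the compiler makes its own choice and may even
 * replace this loop with its closed form. */
int count_in_register(int limit)
{
    register int cnt = 0;

    while (cnt < limit)
        cnt++;
    return cnt;
}

/* CPU time (not wall-clock time) used by one call of fn(limit), in
 * seconds, as reported by clock(). The loop's result is stored through
 * *result so the caller can print it. */
double cpu_seconds(int (*fn)(int), int limit, int *result)
{
    clock_t start = clock();

    *result = fn(limit);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```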

[Torsten, 12/29/2003 8:58:00 PM]

computer person wrote:
>
>
> The queries that run during the parallel test are using sqlplus as the
> client. They seem to be very CPU intensive because I do not see any Wait for
> IO. The CPU total runs at 8.3 % which is 1 full cpu during the run. We have
> 12 cpus so 100/12=8.3%.
>
It appears to me your Oracle binary (licence?) is a single-processor
version.
As sqlplus is single-threaded, I can imagine it would run on 1
processor, with more processes using more processors.
BUT if the Oracle engine is single-processor (I guess you have a new
Oracle for Solaris, as the AIX version probably won't run), you will never
reach more than your 8.3% system usage.
>
> We have done our homework so don't think we simply banged up a machine and
> ran stuff :)
>
You did. But maybe your Oracle licence is a multi-user, but not
multi-processor, version and thus driving you ...
Bart

[Bart, 12/29/2003 9:04:27 PM]

"Torsten Kirschner" <torsten.kirschner@sandbox.no> wrote in message
news:3ff0964b@news.broadpark.no...
> computer person wrote:
>
> This is Your loop from GCC on Sun:
>
> >.LL3:
> > ld [%fp-20], %o0
> > sethi %hi(999999488), %o2
> > or %o2, 511, %o1
> > cmp %o0, %o1
> > ble .LL5
> > nop
> > b .LL4
> > nop
> >.LL5:
> > ld [%fp-20], %o0
> > add %o0, 1, %o1
> > st %o1, [%fp-20]
> > b .LL3
> > nop
> >.LL4:
>
>
> Obviously, GCC without optimization knows little about pipelining.
> Also, hinting to the compiler that cnt should be kept in a register
> might eliminate the expensive ld.
>
> To illustrate my point, I am using S1CC8, also without any optimization,
> just so that the compiler doesn't optimize away the loop altogether:
>
> ! File doit.c:
> ! 1 #include <stdio.h>
> ! 2
> ! 3 #define LIMIT (1000000*1000)
> ! 4
> ! 5 int main(int argc, char* argv[])
> ! 6 {
> ! 7 register int cnt = 0;
> mov %g0,%i5
> ! 8 while ( cnt < LIMIT)
> sethi %hi(0x3b9aca00),%l0
> or %l0,%lo(0x3b9aca00),%l0
> cmp %i5,%l0
> bge .L97
> nop
> ! block 2
> .L_y0:
> sethi %hi(0x3b9aca00),%l0
> or %l0,%lo(0x3b9aca00),%l0
> .L98:
> .L95:
> ! 9 cnt++;
> add %i5,1,%i5
> cmp %i5,%l0
> bl .L95
> nop
> ! block 3
> [...]
>
> Already You can see how S1CC8 creates much better code. But to drive home
> the pipelining, one could "optimize" like this
>
> .L95:
> ! 9 cnt++;
>
> cmp %i5,%l0
> bl .L95
> add %i5,1,%i5
>
> i.e. executing the add in the branch delay slot instead of a nop. [NB I cannot
> be bothered to check invariants for this.] For this extremely trivial
> example, You will be able to observe a speedup of 100% doing so.
>
> All in all I agree with others who suggested at least -O2 optimization in
> order to fairly use RISC CPUs as they are meant to be.
> Also, I cannot see why on earth anyone would use anything else but Sun's
> own C compilers in a production environment, unless GCC is really necessary.
> If You can afford a V1280, You most certainly can shell out the USD999 for
> the S1CC8. Finally, I am not sure Sun cc's -fast option should be used
> unchecked for production code.
>
> regards
> Torsten
> ps. Did I read You right? It was some "professional services" people who
> provided You with this alleged benchmark? Obviously, this code is extremely
> trivial and dependent on the quality and optimization of the C compiler. At
> best, it exercises a CPU's integer pipeline. It does *not* illustrate a
> machine's cache or memory bandwidth, and it certainly doesn't show secondary
> storage I/O performance. In other words, it has absolutely no relevance to
> Oracle, or really to anything at all. In case these were paid consultants,
> I'd at least ask for my money back.
>
>
You're confusing the Oracle benchmarks and the simple cpu test, which have
nothing to do with each other. If we optimize the code we do not have a test
case anymore. Suggest how we can use an exact program on each platform to
demonstrate the 32 bit single cpu case, since that is what my whole email and
issue is about. Do not get me wrong, I too could optimize the crap out of
anything and disprove anything anyone has the guts to throw as a test case.
That's why this is fun, I guess.
Thanks for all the input, it is appreciated!

[computer, 12/29/2003 9:09:54 PM]

"Bart Somers" <nospam@localhost.lan> wrote in message
news:vv15jn746bc024@corp.supernews.com...
>
>
> computer person wrote:
> >
> >
> > The queries that run during the parallel test are using sqlplus as the
> > client. They seem to be very CPU intensive because I do not see any Wait for
> > IO. The CPU total runs at 8.3 % which is 1 full cpu during the run. We have
> > 12 cpus so 100/12=8.3%.
> >
> It appears to me your Oracle binary (licence?) is a single-processor
> version.
> As sqlplus is single-threaded, I can imagine it would run on 1
> processor, with more processes using more processors.
> BUT if the Oracle engine is single-processor (I guess you have a new
> Oracle for Solaris, as the AIX version probably won't run), you will never
> reach more than your 8.3% system usage.
>
> >
> > We have done our homework so don't think we simply banged up a machine and
> > ran stuff :)
> >
> You did. But maybe your Oracle licence is a multi-user, but not
> multi-processor, version and thus driving you ...
>
> Bart
>
Oracle is fully licensed for 12 CPUs on Sun. It only uses 1 CPU because we
only ran 1 sqlplus session.

[computer, 12/29/2003 9:21:56 PM]

"CJT" <abujlehc@prodigy.net> wrote in message
news:3FF08441.9010604@prodigy.net...
> computer person wrote:
>
> > "Frank Cusack" <fcusack@fcusack.com> wrote in message
> > news:m3hdzk8a4r.fsf@magma.savecore.net...
> >
> >>On Sun, 28 Dec 2003 21:21:09 GMT "computer person" <fake_address@nothing.com> wrote:
> >
> >>>The simple C program is as follows from Sun Engineering:
> >>>#define LIMIT (1000000*1000)
> >>>main()
> >>>{
> >>> int cnt = 0;
> >>> while (cnt < LIMIT)
> >>> cnt++;
> >>> printf("final = %d million\n", cnt/1000000);
> >>>}
> >>>
> >>>It was compiled with default options on GCC on both IBM and Sun. It counts to
> >>>1000 million in 10 seconds on IBM p660 and 20 seconds on v1280.
> >>
> >>Make cnt 'volatile' for a fair comparison. It's hard to know what gcc
> >>might do on different platforms otherwise. Also, please show us the
> >>generated .s files as well (gcc -S). Since there's a branch in a
> >>tight loop, wasting the delay slot on sparc may be significant. (gcc
> >>may not use the delay slot with -O0, giving the IBM CPU an unfair
> >>advantage.)
> >>
> >>A fairer comparison is likely to be -O2 on both processors, with cnt
> >>as volatile.
> >>
> >>Given that the spec #'s are close, I think you'll find that when doing
> >>a fair comparison the systems will come much closer.
> >>
> >>Not that this little test is at all indicative of real world performance
> >>with real world applications. If the "experts" who helped you tune the
> >>system gave you the above as a test, I suggest you find new "experts".
> >>
> >>/fc
> >
> > Well, after all, we are dealing with so-called "Sun Microsystems experts".
> > That says it all, I guess.
> >
> >
> You wouldn't have a hidden agenda in all this, would you?
>
The agenda is to ask you guys for advice on how to demonstrate the cpu power
of the following (and nothing more):
IBM p660 750 MHz CPU
Sun v1280 900 MHz CPU
Run 1 process that shows a fair test of raw CPU power, eliminating I/O as
an obstacle. If you can do this, then you have met the agenda. I attempted it
with a simple C program I got from Sun Engineering, but everyone says this is
not a fair test because the object is not optimized. By optimizing the exe,
the test gets out of whack. Any suggestions! If we use Oracle we are
complicating the issue, even though that is where our problems started, but
now I believe the CPU is slow on Sun with our queries.
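Taking the request above literally, here is a sketch of a single-process, I/O-free integer test. It stays meaningful when compiled with the same optimization flags (for example `gcc -O2`) on both machines. The kernel steps a 32-bit linear congruential generator: each step depends on the previous one and the final value is returned. An optimizer can therefore neither skip the work nor fold the loop into one assignment the way it can with the counting loop. The choice of kernel, its constants (the ones popularized by Numerical Recipes) and the function names are all assumptions, not anything from the thread. It remains a micro-benchmark of the integer pipeline, not of Oracle, and several replies in this thread point to the SPEC CINT2000 results as the better comparison.

```c
#include <inttypes.h>
#include <time.h>

/*
 * Hypothetical CPU-bound kernel: step a 32-bit linear congruential
 * generator 'iterations' times (multiplier 1664525, increment
 * 1013904223). Each step depends on the previous one and the final
 * value is returned, so the work cannot simply be skipped, and it
 * needs no memory traffic beyond registers.
 */
uint32_t lcg_work(uint32_t seed, long iterations)
{
    uint32_t x = seed;
    long i;

    for (i = 0; i < iterations; i++)
        x = x * 1664525u + 1013904223u;
    return x;
}

/* CPU seconds spent in lcg_work, as reported by clock(); the result
 * goes to *out so the caller can print it, which keeps the computation
 * observable. */
double time_lcg(uint32_t seed, long iterations, uint32_t *out)
{
    clock_t start = clock();

    *out = lcg_work(seed, iterations);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Reading the seed and iteration count from the command line, and printing both the result and the time, means nothing is known at compile time on either platform. Comparing `gcc -S` output from both boxes remains the way to confirm that the loop is intact.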

[computer, 12/29/2003 9:26:52 PM]

At Mon, 29 Dec 2003 21:26:52 GMT, "computer person" <fake_address@nothing.com> writes:
> Any suggestions! If we use Oracle we are complicating the issue
But Oracle is where your problem lies. You can't avoid that
complication. The benchmarks that you've posted are red herrings.
Try filing a bug report with Oracle, explaining the problem in a
reproducible way, and complaining that unithreaded performance is
unreasonably slow on Solaris SPARC even though your CPU is pegged.
Quite possibly it's something silly like a misconfiguration on the
Solaris box; for example, perhaps you're running the Solaris version
with some internationalization setting that slows string processing
down dramatically. It's also quite possible that Oracle screwed up
some system calls somewhere in their server or their sqlplus client.
If you don't have Oracle support, get it: you need it.

[Paul, 12/29/2003 10:32:19 PM]

"Paul Eggert" <eggert@twinsun.com> wrote in message
news:7wekunz24s.fsf@sic.twinsun.com...
> At Mon, 29 Dec 2003 21:26:52 GMT, "computer person" <fake_address@nothing.com> writes:
>
> > Any suggestions! If we use Oracle we are complicating the issue
>
> But Oracle is where your problem lies. You can't avoid that
> complication. The benchmarks that you've posted are red herrings.
>
> Try filing a bug report with Oracle, explaining the problem in a
> reproducible way, and complaining that unithreaded performance is
> unreasonably slow on Solaris SPARC even though your CPU is pegged.
> Quite possibly it's something silly like a misconfiguration on the
> Solaris box; for example, perhaps you're running the Solaris version
> with some internationalization setting that slows string processing
> down dramatically. It's also quite possible that Oracle screwed up
> some system calls somewhere in their server or their sqlplus client.
>
> If you don't have Oracle support, get it: you need it.
Oracle professional services have been involved since we noticed the issue; they have
reviewed all the config and suggested a few things, but those have not made
any difference to the query times. Sun professional services Oracle staff have also
looked and came back with some block size suggestions, which also had no
effect. Veritas has been involved and tuned the crap out of VxFS with no
impact. Any other suggestions? How about we forget about Oracle for a minute?
Give me the benefit of the doubt and let's see how Mr CPU is
performing without all the Oracle crap. Seems you do not know how to do a
test like this.

[computer, 12/29/2003 10:46:34 PM]

computer person wrote:
> "CJT" <abujlehc@prodigy.net> wrote in message
> news:3FF08441.9010604@prodigy.net...
>
>>computer person wrote:
>>
>>
>>>"Frank Cusack" <fcusack@fcusack.com> wrote in message
>>>news:m3hdzk8a4r.fsf@magma.savecore.net...
>>>
>>>
>>>>On Sun, 28 Dec 2003 21:21:09 GMT "computer person" <fake_address@nothing.com> wrote:
>>>
>>>
>>>>>The simple C program is as follows from Sun Engineering:
>>>>>#define LIMIT (1000000*1000)
>>>>>main()
>>>>>{
>>>>> int cnt = 0;
>>>>> while (cnt < LIMIT)
>>>>> cnt++;
>>>>> printf("final = %d million\n", cnt/1000000);
>>>>>}
>>>>>
>>>>>It was compiled with default options on GCC on both IBM and Sun. It counts to
>>>>>1000 million in 10 seconds on IBM p660 and 20 seconds on v1280.
>>>>
>>>>Make cnt 'volatile' for a fair comparison. It's hard to know what gcc
>>>>might do on different platforms otherwise. Also, please show us the
>>>>generated .s files as well (gcc -S). Since there's a branch in a
>>>>tight loop, wasting the delay slot on sparc may be significant. (gcc
>>>>may not use the delay slot with -O0, giving the IBM CPU an unfair
>>>>advantage.)
>>>>
>>>>A fairer comparison is likely to be -O2 on both processors, with cnt
>>>>as volatile.
>>>>
>>>>Given that the spec #'s are close, I think you'll find that when doing
>>>>a fair comparison the systems will come much closer.
>>>>
>>>>Not that this little test is at all indicative of real world performance
>>>>with real world applications. If the "experts" who helped you tune the
>>>>system gave you the above as a test, I suggest you find new "experts".
>>>>
>>>>/fc
>>>
>>>Well, after all, we are dealing with so-called "Sun Microsystems experts".
>>>That says it all, I guess.
>>>
>>>
>>
>>You wouldn't have a hidden agenda in all this, would you?
>>
>
>
> The agenda is to ask you guys for advice on how to demonstrate the cpu power
> of the following (and nothing more):
>
> IBM p660 750 MHz CPU
> Sun v1280 900 MHz CPU
>
> Run 1 process that shows a fair test of raw CPU power, eliminating I/O as
> an obstacle. If you can do this, then you have met the agenda. I attempted it
> with a simple C program I got from Sun Engineering, but everyone says this is
> not a fair test because the object is not optimized. By optimizing the exe,
> the test gets out of whack. Any suggestions! If we use Oracle we are
> complicating the issue, even though that is where our problems started, but
> now I believe the CPU is slow on Sun with our queries.
>
>
Just use the SPEC numbers if all you care about is "raw CPU power"
(whatever that means). I'd be much more concerned about your
saying your Oracle application is slower by a factor of 2 with the
same data (although I'm not so sure I understand why you're trying
to do everything with one CPU on multi-CPU boxes -- what's important
is the system, it seems to me).
Benchmarking with code that is irrelevant to the job you have to do
seems a waste of time to me -- but it's your time to spend as you like.

[CJT, 12/29/2003 11:09:05 PM]

In article
<e92Ib.208969$ea%.43302@news01.bloor.is.net.cable.rogers.com>,
"computer person" <fake_address@nothing.com> wrote:
> "Paul Eggert" <eggert@twinsun.com> wrote in message
> news:7wekunz24s.fsf@sic.twinsun.com...
> > At Mon, 29 Dec 2003 21:26:52 GMT, "computer person" <fake_address@nothing.com> writes:
> >
> > > Any suggestions! If we use Oracle we are complicating the issue
> >
> > But Oracle is where your problem lies. You can't avoid that
> > complication. The benchmarks that you've posted are red herrings.
> >
> > Try filing a bug report with Oracle, explaining the problem in a
> > reproducible way, and complaining that unithreaded performance is
> > unreasonably slow on Solaris SPARC even though your CPU is pegged.
> > Quite possibly it's something silly like a misconfiguration on the
> > Solaris box; for example, perhaps you're running the Solaris version
> > with some internationalization setting that slows string processing
> > down dramatically. It's also quite possible that Oracle screwed up
> > some system calls somewhere in their server or their sqlplus client.
> >
> > If you don't have Oracle support, get it: you need it.
>
> Oracle professional services have been involved since we noticed the issue; they have
> reviewed all the config and suggested a few things, but those have not made
> any difference to the query times. Sun professional services Oracle staff have also
> looked and came back with some block size suggestions, which also had no
> effect. Veritas has been involved and tuned the crap out of VxFS with no
> impact. Any other suggestions? How about we forget about Oracle for a minute?
> Give me the benefit of the doubt and let's see how Mr CPU is
> performing without all the Oracle crap. Seems you do not know how to do a
> test like this.
Whoa there, computer boy. Be careful. Paul's been in this business
probably longer than you've been alive (I seem to recall running across
him when I was at UCLA back in the late 1970s).
We're all telling you that the CPU benchmark you're using is so bogus
that you're wasting your time. Forget who gave it to you. Focus on the
goal--improving performance on the Solaris machine for this
database--rather than spitting in the face of those who are trying to
help you. You'll get more help that way, unless you're really a troll
trying to do the Dr. Don Cool thing (where is he anyway?).
Does EXPLAIN (EXPLAIN PLAN in Oracle; EXPLAIN SELECT works on MySQL!),
which tells you the strategy the Oracle engine is using, produce the same
output on both AIX and Solaris? If not, have Oracle explain why. I suspect
the DBMS is making wrong assumptions about how to do the query or there's
an undiscovered bug in the Solaris version.
Try other simpler queries on both systems. Do they consistently
perform differently? At what point do they diverge?
--
DeeDee, don't press that button! DeeDee! NO! Dee...

[Michael, 12/29/2003 11:10:24 PM]

computer person wrote:
> "Paul Eggert" <eggert@twinsun.com> wrote in message
> news:7wekunz24s.fsf@sic.twinsun.com...
>
>>At Mon, 29 Dec 2003 21:26:52 GMT, "computer person" <fake_address@nothing.com> writes:
>
>>>Any suggestions! If we use Oracle we are complicating the issue
>>
>>But Oracle is where your problem lies. You can't avoid that
>>complication. The benchmarks that you've posted are red herrings.
>>
>>Try filing a bug report with Oracle, explaining the problem in a
>>reproducible way, and complaining that unithreaded performance is
>>unreasonably slow on Solaris SPARC even though your CPU is pegged.
>>Quite possibly it's something silly like a misconfiguration on the
>>Solaris box; for example, perhaps you're running the Solaris version
>>with some internationalization setting that slows string processing
>>down dramatically. It's also quite possible that Oracle screwed up
>>some system calls somewhere in their server or their sqlplus client.
>>
>>If you don't have Oracle support, get it: you need it.
>
>
> Oracle professional services have been involved since we noticed the issue; they have
> reviewed all the config and suggested a few things, but those have not made
> any difference to the query times. Sun professional services Oracle staff have also
> looked and came back with some block size suggestions, which also had no
> effect. Veritas has been involved and tuned the crap out of VxFS with no
> impact. Any other suggestions? How about we forget about Oracle for a minute?
> Give me the benefit of the doubt and let's see how Mr CPU is
> performing without all the Oracle crap. Seems you do not know how to do a
> test like this.
>
>
One thing you could do rather quickly is put multiple CPUs on the
task and see whether it goes faster. Right now you don't even know
whether you're CPU-bound, AFAICT.

[CJT, 12/29/2003 11:12:09 PM]

computer person wrote:
> You're confusing the Oracle benchmarks and the simple cpu test, which have
> nothing to do with each other.
No, I'm not. You wrote in Your article
<9PHHb.224413$%TO.53235@twister01.bloor.is.net.cable.rogers.com> :
> The simple C program is as follows from Sun Engineering:
> #define LIMIT (1000000*1000)
> main()
> {
> int cnt = 0;
> while (cnt < LIMIT)
> cnt++;
> printf("final = %d million\n", cnt/1000000);
> }
>
> It was compiled with default options on GCC on both IBM and Sun. It counts
> to 1000 million in 10 seconds on IBM p660 and 20 seconds on v1280.
>
[...]
> Perhaps others can compile that C program and run it on their boxes (IBM
> and SUN) and post the results just for fun!
This is what I commented on from memory.
> If we optimize the code we do not have a test case anymore.
As pointed out by others and myself, this "test case" is both irrelevant
and badly crafted, for the reasons stated. As a matter of fact, I doubt very
much that "Sun Engineering" provided the original code to begin with. Like
others here, I happen to know some Sun engineers, and none of them could be
forced to write such bad code, certainly not with regard to Your stated
Oracle problem.
On optimization, there is that school that says one should run identical
code on different machines and compare performance. This is what the SPEC
benchmarks are for. They've been at it for years and I deem any attempt to
compete with them futile.
While I agree that SPEC yields some rule-of-thumb results, I subscribe to
a further step, which is allowing a finite time for all-out optimization.
For any given case, that is what I really care about: "how fast can You
make it solve my actual problems?"
> Suggest how we can use an exact program on each platform to demonstrate
> the 32 bit single cpu case since that is what my whole email and issue is
> about.
Then go to www.spec.org.
However, I was under the impression that You wanted help with making
Your specific Oracle query run faster, or at least an explanation of why the
IBM machine outperforms the Sun.
> Do not get me wrong, I too could optimize the crap out of anything
> and disprove anything anyone has the guts to throw as a test case.
Really? Then why do You have Oracle trouble in the first place?
> That's why this is fun, I guess.
I wonder, You might not be a Sun-bashing troll by any chance?
In the increasingly unlikely case that You're for real, I suggest the
following:
DB-performance optimization is non-trivial. It depends on many factors.
- the query
- the way Oracle executes Your query (PQO)
- Oracle configuration
- RAM
- #ofCPU
- I/O
If You cannot work out the optimal setup in co-operation with the mentioned
Oracle and Sun PSO, and then make a case of it with "management" as to how
much it might cost, You're probably out of luck, anyway. As far as
single-CPU performance goes, others have pointed out that You'd probably be
better off running a COTS PC. Only, why on earth did Your organization buy
that heavy SMP gear, then?
I suppose Your chances of getting more Oracle-related help might
increase by asking qualified questions in an Oracle-related newsgroup, e.g.
comp.databases.oracle.server.

[Torsten, 12/29/2003 11:14:34 PM]

<Michael Vilain <vilain@spamcop.net>> wrote in message
news:vilain-AF77AC.15102429122003@comcast.ash.giganews.com...
> In article
> <e92Ib.208969$ea%.43302@news01.bloor.is.net.cable.rogers.com>,
> "computer person" <fake_address@nothing.com> wrote:
>
> > "Paul Eggert" <eggert@twinsun.com> wrote in message
> > news:7wekunz24s.fsf@sic.twinsun.com...
> > > At Mon, 29 Dec 2003 21:26:52 GMT, "computer person" <fake_address@nothing.com> writes:
> > >
> > > > Any suggestions! If we use Oracle we are complicating the issue
> > >
> > > But Oracle is where your problem lies. You can't avoid that
> > > complication. The benchmarks that you've posted are red herrings.
> > >
> > > Try filing a bug report with Oracle, explaining the problem in a
> > > reproducible way, and complaining that unithreaded performance is
> > > unreasonably slow on Solaris SPARC even though your CPU is pegged.
> > > Quite possibly it's something silly like a misconfiguration on the
> > > Solaris box; for example, perhaps you're running the Solaris version
> > > with some internationalization setting that slows string processing
> > > down dramatically. It's also quite possible that Oracle screwed up
> > > some system calls somewhere in their server or their sqlplus client.
> > >
> > > If you don't have Oracle support, get it: you need it.
> >
> > Oracle professional services have been involved since we noticed the issue; they have
> > reviewed all the config and suggested a few things, but those have not made
> > any difference to the query times. Sun professional services Oracle staff have also
> > looked and came back with some block size suggestions, which also had no
> > effect. Veritas has been involved and tuned the crap out of VxFS with no
> > impact. Any other suggestions? How about we forget about Oracle for a minute?
> > Give me the benefit of the doubt and let's see how Mr CPU is
> > performing without all the Oracle crap. Seems you do not know how to do a
> > test like this.
>
> Whoa there, computer boy. Be careful. Paul's been in this business
> probably longer than you've been alive (I seem to recall running across
> him when I was at UCLA back in the late 1970s).
>
> We're all telling you that the CPU benchmark you're using is so bogus
> that you're wasting your time. Forget who gave it to you. Focus on the
> goal--improving performance on the Solaris machine for this
> database--rather than spitting in the face of those who are trying to
> help you. You'll get more help that way, unless you're really a troll
> trying to do the Dr. Don Cool thing (where is he anyway?).
>
> Does the EXPLAIN SELECT query (works on MySQL!) tell you the strategy
> the Oracle engine is using produce the same output on both AIX and
> Solaris? If not, have Oracle explain why. I suspect the DBMS is making
> wrong assumptions about how to do the query or there's an undiscovered
> bug in the Solaris version.
>
> Try other simpler queries on both systems. Do they consistently
> perform differently? At what point is there divergence?
>
> --
> DeeDee, don't press that button! DeeDee! NO! Dee...
>
>
>
No offence intended. I will go elsewhere. Thanks.
computer
12/30/2003 12:35:30 AM
On Mon, 29 Dec 2003 23:12:09 GMT CJT <abujlehc@prodigy.net> wrote:
> One thing you could do rather quickly is put multiple CPUs on the
> task and see whether it goes faster. Right now you don't even know
> whether you're CPU-bound, AFAICT.
Yes he does. He indicated in an earlier message that CPU %util was 8.3%,
in other words 100% of 1 CPU (one CPU is 1/12 of the V1280's 12 CPUs).
/fc
Frank
12/30/2003 3:00:20 AM
On Mon, 29 Dec 2003 21:58:00 +0100 Torsten Kirschner <torsten.kirschner@sandbox.no> wrote:
> ps. Did I read You right? It was some "professional services" people who
> provided You with this alleged benchmark? Obviously, this code is extremely
> trivial and dependent on the quality and optimization of the C compiler. At
> best, it exercises a CPU's integer pipeline. It does *not* illustrate a
> machine's cache or memory bandwidth, and it certainly doesn't show secondary
> storage i/o performance. In other words, it has absolutely no relevance to
> Oracle or really anything whatsoever. If these were paid consultants, I'd at
> least ask for my money back.
The OP has already spent a lot of time with both Sun and Oracle
prof. serv. They are unable to get the Sun to perform even reasonably
close to the IBM box (2x slowdown, even IF it were un-tuned vs. tuned
(which it isn't) is ridiculous).
The OP has also reasonably determined that the app is CPU bound (simple
vmstat/top data).
So he's trying to find a *simple* benchmark which shows just the CPU
difference. You guys have all been focusing on the micro problem of
blasting the benchmarks. OK, the benchmark is bad, but it's not enough
to say it's useless; someone who understands STAR assembly should
compare the IBM code to the SPARC code!
I suggested -O2 and volatile int cnt so that the gcc effects on different
platforms would tend to be equalized. (volatile int won't hit memory
bandwidth problems because it'll stick to L1 cache, and it will tell
the compiler it can't just optimize the loop away completely).
If the OP will compile with -O2 and use a volatile int, i.e. get to a
more fair comparison, we can show him that the CPU is not inherently
2x slower. Then he'll see that it's not the hardware's fault ...
The docs I've found on STAR say that the CPU does not do branch prediction
and always assumes conditional branches are not taken. So once we get
the delay slot to be used on SPARC, we may see that the sparc whoops butt.
(I predict this based on someone's post of their results with this code.)
If we see that, it will be interesting to rewrite the code as a while
loop with a conditional break instead of a conditional loop.
This doesn't really matter to Oracle performance, but at least we can show
that the hardware isn't inherently slower ... and that Oracle itself is
simply tuned better for IBM.
/fc
Frank
12/30/2003 3:29:08 AM
On Mon, 29 Dec 2003 21:09:54 GMT "computer person" <fake_address@nothing.com> wrote:
> You're confusing the Oracle benchmarks and the simple cpu test which have
> nothing to do with each other. If we optimize the code we do not have a test
> case anymore.
No, actually if you DON'T optimize you don't have a test case.
The code emitted by gcc without optimization will be wildly different
on the 2 platforms. You need to optimize on BOTH platforms to begin
to get a fair comparison. Did you see the results from someone who
showed Sun CC had a 6x speedup (600% if you want to see a big-looking number)
over gcc, both WITHOUT any optimization flags? That shows you what a
big difference the compiler makes. gcc on the IBM may be much smarter
than gcc on SPARC when it doesn't do optimization. You can't really
tell, that's why I asked you to post the assembly. Hopefully someone
can look at the IBM code and see if it's as bad as the SPARC or if
it's actually reasonable (in which case you've unfairly given the IBM
an advantage).
> Suggest how we can use an exact program on each platform to
> demonstrate the 32 bit single cpu case since that is what my whole
> email and issue is about.
That can't be done. You can't demonstrate why Oracle is slow with a
simple benchmark, period. You'd need to collect detailed info on both
sides of things like memory usage, L2 cache misses, etc. It'd take you
ages.
But at least I can hope to show you that SPARC isn't 2x slower as you
seem to be believing.
On Tue, 30 Dec 2003 00:14:34 +0100 Torsten Kirschner <torsten.kirschner@sandbox.no> wrote:
> If You cannot work out the optimal setup in co-operation with the mentioned
> Oracle and Sun PSO, and then make a case of it with "management" as to how
> much it might cost, You're probably out of luck, anyway. As far as
> single-CPU performance goes, others have pointed out that You'd probably be
> better off running a COTS PC. Only, why on earth did Your organization buy
> that heavy SMP gear, then?
huh. That's a good point. Who cares anyway? If the OP is going to just
run on one CPU, *32-bit*, then the SPARC is an incredibly bad choice.
/fc
Frank
12/30/2003 3:40:39 AM
Frank Cusack wrote:
> On Mon, 29 Dec 2003 21:58:00 +0100 Torsten Kirschner <torsten.kirschner@sandbox.no> wrote:
>
>
>>ps. Did I read You right? It was some "professional services" people who
>>provided You with this alleged benchmark? Obviously, this code is extremely
>>trivial and dependent on the quality and optimization of the C compiler. At
>>best, it exercises a CPU's integer pipeline. It does *not* illustrate a
>>machine's cache or memory bandwidth, and it certainly doesn't show secondary
>>storage i/o performance. In other words, it has absolutely no relevance to
>>Oracle or really anything whatsoever. If these were paid consultants, I'd at
>>least ask for my money back.
>
>
> The OP has already spent a lot of time with both Sun and Oracle
> prof. serv. They are unable to get the Sun to perform even reasonably
> close to the IBM box (2x slowdown, even IF it were un-tuned vs. tuned
> (which it isn't) is ridiculous).
>
> The OP has also reasonably determined that the app is CPU bound (simple
> vmstat/top data).
>
> So he's trying to find a *simple* benchmark which shows just the CPU
> difference. You guys have all been focusing on the micro problem of
> blasting the benchmarks. OK, the benchmark is bad, but it's not enough
> to say it's useless; someone who understands STAR assembly should
> compare the IBM code to the SPARC code!
>
> I suggested -O2 and volatile int cnt so that the gcc effects on different
> platforms would tend to be equalized. (volatile int won't hit memory
> bandwidth problems because it'll stick to L1 cache, and it will tell
> the compiler it can't just optimize the loop away completely).
>
> If the OP will compile with -O2 and use a volatile int, i.e. get to a
> more fair comparison, we can show him that the CPU is not inherently
> 2x slower. Then he'll see that it's not the hardware's fault ...
>
> The docs I've found on STAR say that the CPU does not do branch prediction
> and always assumes conditional branches are not taken. So once we get
> the delay slot to be used on SPARC, we may see that the sparc whoops butt.
> (I predict this based on someone's post of their results with this code.)
> If we see that, it will be interesting to rewrite the code as a while
> loop with a conditional break instead of a conditional loop.
>
> This doesn't really matter to Oracle performance, but at least we can show
> that the hardware isn't inherently slower ... and that Oracle itself is
> simply tuned better for IBM.
>
> /fc
OK. I've been quickly reading through this thread.
I too have a V1280: 8x900 MHz, 16GB RAM, Solaris 9.
FWIW (probably not much), here is that C program compiled with gcc 3.3.2
in both 32 bit and 64 bit. Sorry I don't have the Sun C compiler.
Watching mpstat over 5-second intervals, it wasn't until about 15 seconds
into the run that a CPU would momentarily go to zero % available (not
every time, though).
Used -O2 and volatile:
# cat test.c
#include <stdio.h>
#define LIMIT (1000000*1000)
main()
{
    volatile int cnt = 0;
    while (cnt < LIMIT)
        cnt++;
    printf("final = %d million\n", cnt/1000000);
}
# gcc -O2 test.c
# time ./a.out
final = 1000 million
real 17.6
user 17.6
sys 0.0
# gcc -O2 -m64 test.c
# time ./a.out
final = 1000 million
real 19.2
user 19.2
sys 0.0
# gcc --v
Reading specs from /usr/local/lib/gcc-lib/sparc-sun-solaris2.9/3.3.2/specs
Configured with: ../configure --with-as=/usr/ccs/bin/as
--with-ld=/usr/ccs/bin/ld --disable-nls --disable-libgcj
--enable-languages=c,c++ : (reconfigured) ../configure
--with-as=/usr/ccs/bin/as --with-ld=/usr/ccs/bin/ld --disable-nls
--disable-libgcj --enable-languages=c,c++
Thread model: posix
gcc version 3.3.2
# uname -a
SunOS v1280 5.9 Generic_112233-08 sun4u sparc SUNW,Netra-T12
#
Roger
12/30/2003 4:56:40 AM
computer person wrote:
> We are currently migrating from IBM RS/6000 p660 4 cpu machine to v1280 12
> cpu machine due to corporate directive to standardize on Sun.
>
> We are running Oracle 8.1.7.4 (32 bit). We are finding during our
> performance and parallel tests that our IBM is 2 times as fast as the v1280.
> Our processes are only using 1 CPU.
>
> Management does not want to go to 64 bit so that is not on the agenda.
>
> I have pulled off the spec2000 for the 2 chips and it says the Ultrasparc
> III should be faster than the IBM processor but we are not finding that with
> our application.
>
> We ran a simple C program test and did find that just CPU use it says the
> IBM is twice the speed for cpu operations.
>
> Any comments?
>
>
You can roughly measure the speed of your CPU with OpenSSL:
openssl speed
Serge
12/30/2003 7:16:10 AM
On Mon, 29 Dec 2003 22:56:40 -0600 "Roger P. Johnson" <roger0080@netscape.net> wrote:
> OK. I've been quickly reading through this thread.
> I too have a V1280: 8x900 MHz, 16GB RAM, Solaris 9.
>
> FWIW (probably not much), here is that C program compiled with gcc
> 3.3.2 in both 32 bit and 64 bit. Sorry I don't have the Sun C
> compiler. Watching mpstat over 5-second intervals, it wasn't until
> about 15 seconds into the run that a CPU would momentarily go to zero %
> available (not every time, though).
>
> Used -O2 and volatile:
....
> # gcc -O2 test.c
> # time ./a.out
> final = 1000 million
>
> real 17.6
> user 17.6
> sys 0.0
> # gcc -O2 -m64 test.c
> # time ./a.out
> final = 1000 million
>
> real 19.2
> user 19.2
> sys 0.0
What do you get without -O2?
These numbers sound really slow. You mention that CPU doesn't go to 100%
until 15s into it, which jibes with the numbers below. Is your system otherwise
busy? The scheduling delay is odd.
On Sun, 28 Dec 2003 21:33:28 +0000 (UTC) Alan Coopersmith <alanc@alum.calberkeley.org> wrote:
> System: Sun E250 with 2x400Mhz UltraSPARC II
> Compilers: gcc 2.95, SunOne CC 8.0 (patch 112760-04)
>
> gcc w/no options: 32.72u 0.01s 0:32.73 100.0%
> gcc -O2: 5.03u 0.01s 0:05.04 100.0%
> cc w/no options: 5.04u 0.00s 0:05.04 100.0%
> cc -fast: 0.01u 0.00s 0:00.01 100.0%
/fc
Frank
12/30/2003 9:00:30 AM
Boy is unoptimized gcc slow (400MHz US-II):
tt30446@terrance[747]> gcc foo.c
tt30446@terrance[748]> timex ./a.out
final = 1000 million
real 35.43
user 35.31
sys 0.01
tt30446@terrance[750]> gcc --version
3.0.2
I get the same times with -O2 and -O4.
tt30446@terrance[756]> cc foo.c
"foo.c", line 5: warning: old-style declaration or incorrect type for: main
tt30446@terrance[757]> timex ./a.out
final = 1000 million
real 5.40
user 5.37
sys 0.02
tt30446@terrance[758]> cc -O foo.c
"foo.c", line 5: warning: old-style declaration or incorrect type for: main
tt30446@terrance[759]> timex ./a.out
final = 1000 million
real 0.02
user 0.01
sys 0.01
tt30446@terrance[764]> cc -V -O foo.c
cc: Sun C 5.5 Patch 112760-01 2003/05/18
acomp: Sun C 5.5 Patch 112760-01 2003/05/18
"foo.c", line 5: warning: old-style declaration or incorrect type for: main
iropt: Sun Compiler Common 7.1 Patch 112763-03 2003/08/21
cg: Sun Compiler Common 7.1 Patch 112763-03 2003/08/21
ld: Software Generation Utilities - Solaris Link Editors: 5.9-1.375
Thomas
12/30/2003 9:54:11 AM
"Michael Vilain <vilain@spamcop.net>" wrote:
> Does the EXPLAIN SELECT query (works on MySQL!), which tells you the strategy
> the Oracle engine is using, produce the same output on both AIX and
> Solaris? If not, have Oracle explain why. I suspect the DBMS is making
> wrong assumptions about how to do the query or there's an undiscovered
> bug in the Solaris version.
That's usually called EXPLAIN PLAN on Oracle. I suppose you're right. If
he didn't analyze the schema, Oracle would use the old (rule-based)
optimizer, and that can produce results two or three times worse.
OTOH, even if the plans are exactly the same, tkprof output should be able
to point at the problem. But Oracle people did all that already, right?
--
.-. .-. Yes, I am an agent of Satan, but my duties are largely
(_ \ / _) ceremonial.
|
| dave@fly.srk.fer.hr
Drazen
12/30/2003 12:44:13 PM
"Drazen Kacar" <dave@fly.srk.fer.hr> wrote in message
news:slrnbv2sot.d3i.dave@fly.srk.fer.hr...
> "Michael Vilain <vilain@spamcop.net>" wrote:
>
> > Does the EXPLAIN SELECT query (works on MySQL!), which tells you the strategy
> > the Oracle engine is using, produce the same output on both AIX and
> > Solaris? If not, have Oracle explain why. I suspect the DBMS is making
> > wrong assumptions about how to do the query or there's an undiscovered
> > bug in the Solaris version.
>
> That's usually called EXPLAIN PLAN on Oracle. I suppose you're right. If
> he didn't analyze the schema, Oracle would use the old (rule-based)
> optimizer, and that can produce results two or three times worse.
>
> OTOH, even if the plans are exactly the same, tkprof output should be able
> to point at the problem. But Oracle people did all that already, right?
>
> --
> .-. .-. Yes, I am an agent of Satan, but my duties are largely
> (_ \ / _) ceremonial.
> |
> | dave@fly.srk.fer.hr
Hi, I am back after a rough day at the office with this issue. We have
discovered other issues as well. Our ESSBASE cubes are running twice as fast
on AIX too.
Anyway, we think we have figured out something at least. It looks like the
LINESIZE in the SQL is causing severe performance problems with sqlplus on
Solaris. The programmer has it set at 2500 or some really high number, even
though the query's maximum record length is only 250. We changed it to 500
(just to be safe) and now that query runs in 1/3 the time. The CPU looks a
lot better too. I also found some mention of lowering a TCP ACK setting with
ndd, or something like that, which will speed up sqlplus sessions coming from
other Sun boxes to the DB server. Have not tried that one yet. When we run the
client query from an IBM box the query screams, but from any Sun client it is
slow unless we change the LINESIZE. Good thing we do not need a huge LINESIZE.
After I ran "mpstat 1" it was clear that ESSBASE is pegging a CPU at 100%.
ESSBASE is doing what is called a CALC, which basically reads a bunch of flat
extracted files and builds cubes out of them. It's a very CPU-intensive
operation. I guess we have to find out why the thing runs slower on the Sun
box. I noticed that the (single) CPU on the IBM box is also pegged with no I/O
wait, but it runs faster than the Sun. That looks like a good CPU power
comparison. I know, I know, you probably think ESSBASE needs tuning on
Sun. We will see :)
Anyway, I have to thank this group for mentioning the mpstat command, because
that is cool. I was trying to figure out how to get CPU usage per processor,
and that fits the bill.
I will keep plugging away on this. Thanks again!
computer
12/31/2003 2:04:55 AM
Frank Cusack wrote:
> On Mon, 29 Dec 2003 22:56:40 -0600 "Roger P. Johnson" <roger0080@netscape.net> wrote:
>
>>OK. I've been quickly reading through this thread.
>>I too have a V1280: 8x900 MHz, 16GB RAM, Solaris 9.
>>
>>FWIW (probably not much), here is that C program compiled with gcc
>>3.3.2 in both 32 bit and 64 bit. Sorry I don't have the Sun C
>>compiler. Watching mpstat over 5-second intervals, it wasn't until
>>about 15 seconds into the run that a CPU would momentarily go to zero %
>>available (not every time, though).
>>
>>Used -O2 and volatile:
>
> ...
>
>># gcc -O2 test.c
>># time ./a.out
>>final = 1000 million
>>
>>real 17.6
>>user 17.6
>>sys 0.0
>># gcc -O2 -m64 test.c
>># time ./a.out
>>final = 1000 million
>>
>>real 19.2
>>user 19.2
>>sys 0.0
>
>
> What do you get without -O2?
>
I can't run it this evening; the machine is doing a conversion now.
> These numbers sound really slow. You mention that CPU doesn't go to 100%
> until 15s into it, which jibes with the numbers below. Is your system otherwise
> busy? The scheduling delay is odd.
Indeed it was busy at the time. I didn't think the machine was toooo busy to
interfere with the results too terribly :(
>
> On Sun, 28 Dec 2003 21:33:28 +0000 (UTC) Alan Coopersmith <alanc@alum.calberkeley.org> wrote:
>
>>System: Sun E250 with 2x400Mhz UltraSPARC II
>>Compilers: gcc 2.95, SunOne CC 8.0 (patch 112760-04)
>>
>> gcc w/no options: 32.72u 0.01s 0:32.73 100.0%
>> gcc -O2: 5.03u 0.01s 0:05.04 100.0%
>> cc w/no options: 5.04u 0.00s 0:05.04 100.0%
>> cc -fast: 0.01u 0.00s 0:00.01 100.0%
>
>
> /fc
Roger
12/31/2003 2:11:36 AM