Itanium performance problem


I'm having a strange performance problem on Itanium HP-UX.

I have a C program that gets data every 30 seconds, does calculations on the
data and outputs the results.

This program currently runs on RISC and SPARC with the same code and has no
issues, and there is now a desire to run it on Itanium HP-UX.

The performance problem is in the calculation loop.

I put in debug code to get the wall clock time at the start and end of the
loop.

On SPARC and RISC machines the loop typically takes less than a second
of wall clock time.

On an Itanium virtual machine it takes typically around 45 seconds and on
an Itanium host machine it takes typically around 5 seconds to complete
the loop.

All the performance tools (gprof, caliper, etc.) say the program is spending
small fractions of a second of CPU time in the loop, which is what I would
expect and is consistent with RISC and SPARC testing.

When in the loop all data and results are in memory and there is no I/O and
no waits.

System performance tools say the CPUs are 90%+ idle while running the
application.

While the data arrays are huge, there is plenty of memory for it and the
system is not swapping.

As a test I ran the RISC version in Itanium emulation mode and got the
same slow results.

Anyone have a clue as to why there is this huge difference between the CPU
time and wall clock time?


-- 
Jim Pennino

Remove .spam.sux to reply.
Reply jimp 11/4/2010 4:04:36 PM

jimp@specsol.spam.sux.com wrote:
> I'm having a strange performance problem on Itanium HP-UX.

> I have a C program that gets data every 30 seconds, does
> calculations on the data and outputs the results.

What sort of data, and how does it end up being aligned in memory?

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1402251
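
For anyone wanting to answer the alignment question directly, here is a minimal sketch in C. It assumes the data is held in ordinary arrays whose addresses can be inspected (the thread does not say how the data is laid out), and the helper name is_aligned is made up for illustration. Misaligned accesses can be costly on this architecture, so checking the base address of each large array against the size of its element type is a cheap first test.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uintptr_t */

/* Hypothetical helper: return nonzero if p sits on a multiple of
   `alignment` bytes. `alignment` must be nonzero. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

A call such as is_aligned(samples, sizeof(double)) on each input and output array (samples being whatever the real array is called) would show whether any of them starts on an odd boundary, for example because it was carved out of a packed buffer.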

rick jones
-- 
oxymoron n, commuter in a gas-guzzling luxury SUV with an American flag
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
Reply Rick 11/4/2010 11:13:22 PM


Rick Jones <rick.jones2@hp.com> wrote:
> jimp@specsol.spam.sux.com wrote:
>> I'm having a strange performance problem on Itanium HP-UX.
> 
>> I have a C program that gets data every 30 seconds, does
>> calculations on the data and outputs the results.
> 
> What sort of data, and how does it end up being aligned in memory?

Floating point numbers in the range of about 0 to about 100 that provide
results in the range of 0 to about 50.

Caliper says the application isn't using any CPU time (relative to wall clock
time) and isn't trapping, i.e. the CPU time is milliseconds and the wall
clock time is about 45 seconds.

The exact same code runs fine on SPARC and RISC.

The RISC executable run under emulation has the same problem as the native
Itanium executable.



-- 
Jim Pennino

Remove .spam.sux to reply.
Reply jimp 11/5/2010 12:36:22 AM

jimp@specsol.spam.sux.com wrote:
> Floating point numbers in the range of about 0 to about 100 that provide
> results in the range of 0 to about 50.

We don't want no close to zeros.  :-) What is the lower bound?

> Caliper says the application isn't using any CPU time

What does Caliper say the kernel is doing?  If you have traps, it is all 
there.

Do you have a real tool, such as glance, to see what's going on?


> The RISC executable run under emulation has the same problem as the native
> Integrity executable.

Naturally, it is the same hardware.
To prevent denorms, link with +FPD.
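
Whether denormals are present at all can be checked in the program itself. The following is a minimal sketch in C, assuming the values are doubles stored in a plain array; the function names is_denormal and count_denormals are invented for this example and are not part of any HP tool. A value is denormal when it is nonzero but smaller in magnitude than DBL_MIN, the smallest normal double.

```c
#include <float.h>   /* DBL_MIN */
#include <stddef.h>  /* size_t */

/* Return nonzero if x is a denormal (subnormal) double: nonzero, but
   smaller in magnitude than the smallest normal value. NaN and zero
   are not counted. */
int is_denormal(double x)
{
    double mag = x < 0.0 ? -x : x;
    return mag > 0.0 && mag < DBL_MIN;
}

/* Count the denormal entries in data[0..n-1]. */
size_t count_denormals(const double *data, size_t n)
{
    size_t i, count = 0;
    for (i = 0; i < n; i++)
        if (is_denormal(data[i]))
            count++;
    return count;
}
```

Running count_denormals over the input array before the loop and over the results after it would show whether the "close to zero" case ever occurs in practice.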
Reply Dennis 11/5/2010 5:02:02 AM

Dennis Handly <dhandly@convex.hp.com> wrote:
> jimp@specsol.spam.sux.com wrote:
>> Floating point numbers in the range of about 0 to about 100 that provide
>> results in the range of 0 to about 50.
> 
> We don't want no close to zeros.  :-) What is the lower bound?

Zero.

>> Caliper says the application isn't using any CPU time
> 
> What does Caliper say the kernel is doing?  If you have traps, it is all 
> there.

Caliper, prof, gprof, tusc, top, and vmstat say the kernel isn't doing much
of anything. Certainly nothing that takes 45 seconds.

Once more, there are no traps reported by anything.

> Do you have a real tool, such as glance, to see what's going on?

No glance.

> 
>> The RISC executable run under emulation has the same problem as the native
>> Integrity executable.
> 
> Naturally, it is the same hardware.

The virtual machine and the host machine are the same hardware yet the
virtual machine takes 45 seconds of wall clock while the host machine takes
5 seconds of wall clock.

> To prevent denorms, link with +FPD.

Done that; makes no difference whatsoever.



-- 
Jim Pennino

Remove .spam.sux to reply.
Reply jimp 11/5/2010 2:39:21 PM

jimp@specsol.spam.sux.com wrote:
> The virtual machine and the host machine are the same hardware yet
> the virtual machine takes 45 seconds of wall clock while the host
> machine takes 5 seconds of wall clock.

Which suggests that whatever is happening, it is forcing trips to the
hypervisor/host.

Do you have a small(ish) example you can share?

rick jones
-- 
Process shall set you free from the need for rational thought. 
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
Reply Rick 11/5/2010 5:06:54 PM

jimp@specsol.spam.sux.com wrote:
> Rick Jones <rick.jones2@hp.com> wrote:
> > jimp@specsol.spam.sux.com wrote:
> >> The virtual machine and the host machine are the same hardware yet
> >> the virtual machine takes 45 seconds of wall clock while the host
> >> machine takes 5 seconds of wall clock.
> > 
> > Which suggests that whatever is happening, it is forcing trips to the
> > hypervisor/host.
> > 
> > Do you have a small(ish) example you can share?
> > 
> > rick jones

> Example of what exactly?

Example of the code, i.e. a version that someone else could try running
on their system.

rick jones
-- 
Wisdom Teeth are impacted, people are affected by the effects of events.
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
Reply Rick 11/5/2010 8:10:52 PM

Rick Jones <rick.jones2@hp.com> wrote:
> jimp@specsol.spam.sux.com wrote:
>> The virtual machine and the host machine are the same hardware yet
>> the virtual machine takes 45 seconds of wall clock while the host
>> machine takes 5 seconds of wall clock.
> 
> Which suggests that whatever is happening, it is forcing trips to the
> hypervisor/host.
> 
> Do you have a small(ish) example you can share?
> 
> rick jones

Example of what exactly?

Here are the gprof results for the function in question:

0.1  197.60  0.16   24    6.67    0   _ZN14CalcTravelTime16calc_travel_timeEv

Each invocation took 45 seconds of wall clock time, yet gprof says the
execution time for this function is 6.67 milliseconds per call and that
it took only 0.16 s in total to be executed 24 times.

Wall clock time is determined by calling:

start_time = time (0);

just before the function, and:

syslog (LOG_DEBUG, "Loop time %ld", (long) (time (0) - start_time));

just after the function.
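
Since time (0) only has one-second resolution, a finer-grained comparison of wall clock against CPU time may help narrow things down. This is a minimal sketch in C, assuming gettimeofday() and the standard clock() are available on the target; busy_work is a made-up stand-in for the real calculation, and all function names here are hypothetical.

```c
#include <stdio.h>
#include <sys/time.h>   /* gettimeofday */
#include <time.h>       /* clock, CLOCKS_PER_SEC */

/* Wall clock time in seconds, with microsecond resolution. */
double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* CPU time consumed by this process so far, in seconds. */
double cpu_seconds(void)
{
    return (double)clock() / (double)CLOCKS_PER_SEC;
}

/* Hypothetical stand-in for the real calculation loop. */
double busy_work(long n)
{
    double sum = 0.0;
    long i;
    for (i = 0; i < n; i++)
        sum += (double)i * 0.5;
    return sum;
}

/* Run the work once and report wall and CPU time side by side. */
void time_once(long n, double *wall, double *cpu)
{
    double w0 = wall_seconds();
    double c0 = cpu_seconds();
    volatile double sink = busy_work(n);  /* keep the call from being optimized away */
    (void)sink;
    *wall = wall_seconds() - w0;
    *cpu = cpu_seconds() - c0;
    printf("wall %.6f s, cpu %.6f s\n", *wall, *cpu);
}
```

Wrapping the real function the same way, and logging both numbers from inside the process, gives a CPU figure that does not depend on any external profiler.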


-- 
Jim Pennino

Remove .spam.sux to reply.
Reply jimp 11/5/2010 8:26:39 PM

Rick Jones <rick.jones2@hp.com> wrote:
> jimp@specsol.spam.sux.com wrote:
>> Rick Jones <rick.jones2@hp.com> wrote:
>> > jimp@specsol.spam.sux.com wrote:
>> >> The virtual machine and the host machine are the same hardware yet
>> >> the virtual machine takes 45 seconds of wall clock while the host
>> >> machine takes 5 seconds of wall clock.
>> > 
>> > Which suggests that whatever is happening, it is forcing trips to the
>> > hypervisor/host.
>> > 
>> > Do you have a small(ish) example you can share?
>> > 
>> > rick jones
> 
>> Example of what exactly?
> 
> Example of the code, i.e. a version that someone else could try running
> on their system.
> 
> rick jones

That isn't practical.

The data comes from proprietary devices via a third party licensed comm
package.

Also, the calculations alone are about 20 KB of C++ code, a bit much to
post even if there were a data simulator, which there isn't.

I think I said the program was C. That was a typo; it is C++.

And again, the exact same code has been running as expected on SPARC and
PA-RISC machines for years.


-- 
Jim Pennino

Remove .spam.sux to reply.
Reply jimp 11/5/2010 10:20:49 PM

You wrote:
>On SPARC and RISC machines the loop typically takes less than a second
>of wall clock time.
>
>On an Itanium virtual machine it takes typically around 45 seconds and on
>an Itanium host machine it takes typically around 5 seconds to complete
>the loop.
....
>Caliper says the application isn't using any CPU time (relative to wall 
>time) 

Focusing on the 45-second run, how's overall performance on that HPVM
Guest? Are resources being managed by anything like PRM/gWLM? Is this
a recent version of HPVM? Are there any system configuration issues
such as would be pointed to by check_patches (on the Host or Guest)?
If more than one Guest on the Host, then how are those Guests
performing in relation to one another from the Host perspective? Have
any resource limiters been configured for the Guest in question e.g.
when it was created? 

For the five-second versus one-second difference it sounds like you're
saying that each system call and app code function is running quickly
with little or no difference between this and other platforms - but
that time around syscalls/functions just seems to disappear for
4 seconds or so? Anything of interest in syslog/dmesg/diags? For
libraries being accessed by the app on the Guest, are there any
missing HP-UX patches that might address known issues? You might want
to open a case with the HP Response Center - perhaps a tool like
kitrace would be useful. 
Reply Eric 11/8/2010 7:33:03 AM

Eric Stahl <eric.stahl@earthlink.net> wrote:
> You wrote:
>>On SPARC and RISC machines the loop typically takes less than a second
>>of wall clock time.
>>
>>On an Itanium virtual machine it takes typically around 45 seconds and on
>>an Itanium host machine it takes typically around 5 seconds to complete
>>the loop.
> ...
>>Caliper says the application isn't using any CPU time (relative to wall 
>>time) 
> 
> Focusing on the 45-second run, how's overall performance on that HPVM
> Guest? Are resources being managed by anything like PRM/gWLM? Is this
> a recent version of HPVM? Are there any system configuration issues
> such as would be pointed to by check_patches (on the Host or Guest)?
> If more than one Guest on the Host, then how are those Guests
> performing in relation to one another from the Host perspective? Have
> any resource limiters been configured for the Guest in question e.g.
> when it was created? 

Gauging general performance is a bit difficult as essentially these are
development machines at this point and the program that has the problems
is the only thing that is anywhere near CPU intensive so far.

AFAIK, there is only the host and one virtual machine.

Supposedly all the latest patches have been applied.

The guys that admin this hardware are new to HP-UX virtualization and
could well have something hosed in the setup.

> For the five-second versus one-second difference it sounds like you're
> saying that each system call and app code function is running quickly
> with little or no difference between this and other platforms - but
> that time around syscalls/functions just seems to disappear for
> 4 seconds or so? Anything of interest in syslog/dmesg/diags? For
> libraries being accessed by the app on the Guest, are there any
> missing HP-UX patches that might address known issues? You might want
> to open a case with the HP Response Center - perhaps a tool like
> kitrace would be useful. 

What it looks like is for the host machine the CPU disappears for about 4
seconds while on the virtual machine the CPU disappears for about 45 seconds.

There is nothing in any of the logs that gives a clue.

At this point I'm pretty much convinced that the issue has something to do
with the virtualization software's CPU scheduling, which hides its actions
from the regular OS utilities/tools.
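
One way to test the "CPU disappears" theory from inside the process is to spin on the clock and look for holes. The following is a minimal sketch in C, assuming gettimeofday() is available; the function names now_seconds and max_clock_gap are made up for this example. A loop that does nothing but read the clock should normally see only tiny gaps between consecutive reads, so a gap of whole seconds would suggest the process (or the whole guest) was simply not running during that time.

```c
#include <stddef.h>     /* NULL */
#include <sys/time.h>   /* gettimeofday */

/* Wall clock time in seconds, with microsecond resolution. */
double now_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Spin for about `duration` seconds, reading the clock continuously,
   and return the largest gap (in seconds) seen between two
   consecutive reads. */
double max_clock_gap(double duration)
{
    double start = now_seconds();
    double prev = start;
    double worst = 0.0;

    for (;;) {
        double t = now_seconds();
        if (t - prev > worst)
            worst = t - prev;
        prev = t;
        if (t - start >= duration)
            break;
    }
    return worst;
}
```

Running max_clock_gap(60.0) on the host and on the guest and comparing the results would not say why the time goes missing, but it would show whether it goes missing even when none of the application code is involved.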

I have found some HP recommended monitoring/diagnostics that deal with
virtualization that are not installed and asked the admins to install them.

The admins have opened a case with HP, so maybe.

Hopefully HP will eventually send out an expert on virtualization who will
find something less than optimal in the setup.

In the meantime, if anyone knows of any documentation on optimizing performance
on virtual setups it would be appreciated.


-- 
Jim Pennino

Remove .spam.sux to reply.
Reply jimp 11/8/2010 5:18:55 PM

jimp@specsol.spam.sux.com wrote:
> Dennis Handly <dhandly@convex.hp.com> wrote:
>> We don't want no close to zeros.  :-) What is the lower bound?
> 
> Zero.

What I meant is what is the lowest lower bound, besides zero.  Are you 
getting denorms?

>> To prevent denorms, link with +FPD.
> 
> Done that; makes no difference whatsoever.

Ok, it seems an HPVM issue.  Or with your expectations.  ;-)

I assume you have at least one real CPU for each VM?
Reply Dennis 11/9/2010 5:32:25 AM

Dennis Handly <dhandly@convex.hp.com> wrote:
> jimp@specsol.spam.sux.com wrote:
>> Dennis Handly <dhandly@convex.hp.com> wrote:
>>> We don't want no close to zeros.  :-) What is the lower bound?
>> 
>> Zero.
> 
> What I meant is what is the lowest lower bound, besides zero.  Are you 
> getting denorms?
> 
>>> To prevent denorms, link with +FPD.
>> 
>> Done that; makes no difference whatsoever.
> 
> Ok, it seems an HPVM issue.  Or with your expectations.  ;-)
> 
> I assume you have at least one real CPU for each VM?

I currently have no way of knowing.

I found some management/diagnostic software on the HP site and have asked
the admins to install it and give me access.


-- 
Jim Pennino

Remove .spam.sux to reply.
Reply jimp 11/9/2010 5:45:17 PM

jimp@specsol.spam.sux.com wrote:
> I currently have no way of knowing.

How many CPUs on the host?  How many VMs and how many CPUs does each VM 
have?
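
Part of this can be answered without admin access. As a minimal sketch in C, assuming the system's sysconf() supports the common (but not POSIX-mandated) _SC_NPROCESSORS_ONLN name, a program can ask how many CPUs are online in the environment it runs in. The function name online_cpus is invented for this example. Inside a guest this only reports the CPUs the guest sees, not the physical CPUs on the host.

```c
#include <unistd.h>  /* sysconf */

/* Number of CPUs currently online as seen by this OS instance,
   or -1 if the system does not offer this sysconf name. */
long online_cpus(void)
{
#ifdef _SC_NPROCESSORS_ONLN
    return sysconf(_SC_NPROCESSORS_ONLN);
#else
    return -1;
#endif
}
```

The host CPU count and the number of VMs would still have to come from the admins or from the virtualization tools.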
Reply Dennis 11/10/2010 3:37:52 AM

Dennis Handly <dhandly@convex.hp.com> wrote:
> jimp@specsol.spam.sux.com wrote:
>> I currently have no way of knowing.
> 
> How many CPUs on the host?  How many VMs and how many CPUs does each VM 
> have?

I currently have no way of knowing other than asking the admins.

Like I said, I've asked for access.

-- 
Jim Pennino

Remove .spam.sux to reply.
Reply jimp 11/10/2010 5:43:54 AM
