Execution times running multiple instances of an application


I'm doing some testing of an application using a Pentium 4 HT cpu. The 
application runs under Cygwin using the XWin X Server and an xterm window. 
It takes 1.9 hrs to complete using gcc 4.2.1 configured with thread model: 
single. When I look at Windows Task Manager, it shows a cpu usage of 50% 
with no other apps running at the time. From a previous optimization thread, 
I tried running 2 instances of the application at the same time by opening 
up another window under XWin. Task manager shows 100% cpu usage but instead 
of completing the 2 applications in 2 hours, it takes 3.9 hours; slightly 
more than twice the time if I had run the applications sequentially. Was 
this to be expected?
Mike 


Reply deltaseq0 (54) 11/26/2007 5:55:56 PM

In article <JMD2j.18$gL1.13@newsfe08.lga>,
deltaseq0 <deltaseq0@nospam.net> wrote:
>I'm doing some testing of an application using a Pentium 4 HT cpu. 
(snip)
>up another window under XWin. Task manager shows 100% cpu usage but instead 
>of completing the 2 applications in 2 hours, it takes 3.9 hours; slightly 
>more than twice the time if I had run the applications sequentially. Was 
>this to be expected?

Hyperthreading is complicated and can tend to confuse the operating
system. The OS thinks that you have two real CPUs, but under the hood
they are actually the same execution unit, so only one of the two 'virtual'
CPUs can actually be doing anything at any given time. This can be a win
if your applications spend a lot of time in cache misses, waiting for
disk reads and the like, in which case the otherwise wasted time can be
spent running other processes. However, IME this is pretty rare, and the
maximum potential speedup in the real world is maybe 5% or so (i.e. two
runs will take slightly less than twice the time of one run). If your
app is largely CPU-bound with medium-sized working sets, the problem of
the cache requirements of the two processes conflicting is likely to
hurt you more than the potential hyperthreading can help you: this is
what's probably happened here.

In essence, your problem is that Task Manager doesn't actually fully
understand your CPU, and so its "50% usage" for the single-app scenario
is wrong. If you had a real dual-core CPU then the situation would be
more like what you expect.
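
To put numbers on the figures quoted in this thread (1.9 hours for one run, 3.9 hours for two concurrent runs), here is a minimal Python sketch of the arithmetic. The function name throughput_gain is made up for illustration; it is not part of any tool mentioned here.

```python
def throughput_gain(single_run_hours, concurrent_hours, n_jobs=2):
    """Ratio of back-to-back time to concurrent time for n_jobs runs.

    A value above 1.0 means running the jobs at once was a win;
    a value below 1.0 means it was slower than running them in sequence.
    """
    sequential_hours = n_jobs * single_run_hours
    return sequential_hours / concurrent_hours


# Timings reported by the original poster.
gain = throughput_gain(1.9, 3.9)
print(f"sequential: {2 * 1.9:.1f} h, concurrent: 3.9 h, ratio: {gain:.3f}")
```

A ratio just under 1.0, as here, means the two concurrent runs were marginally slower than two runs back to back.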

-- 
Mark Mackey   http://www.swallowtail.org/
code code code code code code code code code code code code code bug code co
de code code code bug code code code code code code code code code code code
code code code code code code code code code code code code code code code c

Reply markm13 (51) 11/26/2007 6:23:18 PM


"Mark Mackey" <markm@chiark.greenend.org.uk> wrote in message 
news:klh*fpS0r@news.chiark.greenend.org.uk...
> (snip)

When the single app runs, it fills 1 "virtual" core to 100% throughout the 
run with no dead time. Is that an indication that the app is cpu-bound?
I was thinking of installing gcc 4.3 with multi-threading and modifying the 
code to accept OpenMP directives. Would that help in this case?
-Mike 


Reply deltaseq0 (54) 11/26/2007 6:51:12 PM

On Mon, 26 Nov 2007 12:55:56 -0500, deltaseq0 <deltaseq0@nospam.net>
 wrote in <JMD2j.18$gL1.13@newsfe08.lga>:
> (snip)

	I have seen instances where a programme has completely slowed down
due to its "hopping" from one CPU to another and (presumably) losing its
cache in the process -- although this may have been on a proper dual-core
rather than a P4.  You can check if it makes a difference in Task Manager --
in the Processes tab, right-click on the programme and select "Set Affinity"
from the drop-down menu.  Set one copy of the programme to run on CPU 0
and the other to run on CPU 1 and see if it runs any faster.
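
For reference, the check boxes in the "Set Affinity" dialog correspond to a per-process bitmask in which bit n stands for logical CPU n. The sketch below only computes that mask in Python; the helper name affinity_mask is hypothetical, and the code does not change the affinity of any process.

```python
def affinity_mask(cpus):
    """Build a CPU affinity bitmask: bit n is set when logical CPU n is allowed."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    return mask


# One copy pinned to CPU 0, the other to CPU 1, as suggested above.
for name, cpus in (("first copy", [0]), ("second copy", [1])):
    print(f"{name}: CPUs {cpus} -> mask {affinity_mask(cpus):#x}")
```

On a single-core HT chip the two logical CPUs still share one physical core, so pinning removes migration between them but not the sharing itself.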

-- 
Ivan Reid, School of Engineering & Design, _____________  CMS Collaboration,
Brunel University.    Ivan.Reid@[brunel.ac.uk|cern.ch]    Room 40-1-B12, CERN
        KotPT -- "for stupidity above and beyond the call of duty".
Reply Ivan.Reid (496) 11/26/2007 9:04:12 PM

"Dr Ivan D. Reid" <Ivan.Reid@brunel.ac.uk> wrote in message 
news:slrnfkmd6c.53d.Ivan.Reid@loki.brunel.ac.uk...
> (snip)

Dr. Reid:
Thanks for the suggestion. Unfortunately, the execution times were not 
changed by setting affinities. - Mike 


Reply deltaseq0 (54) 11/27/2007 12:47:24 AM

"deltaseq0" <deltaseq0@nospam.net> wrote in message 
news:wAE2j.12$nr.1@newsfe09.lga...

> When the single app runs, it fills 1 "virtual" core to 100% throughout the 
> run with no dead time. Is that an indication that the app is cpu-bound?
> I was thinking of installing gcc 4.3 with multi-threading and modifying 
> the code to accept OpenMP directives. Would that help in this case?
> -Mike

In your case it probably won't do a lick of good.  One thing you could
do is to cut it down to a benchmark that takes a minute or so to run.
Put that on a thumb drive and take it down to your local store and see
what your times look like running one or two instances on a Core 2 Duo
or one, two, and four instances on a Core 2 Quad.  I have found that
the clerks at the store are just as curious about the performance of
their machines as you are, and they will often let you attempt a
benchmark if it doesn't take too long.
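
A rough sketch of such a timing harness in Python, assuming a Python interpreter is available on the test machine (it may well not be on a store demo box). The function name time_instances and the placeholder workload are illustrative only; substitute the real benchmark command.

```python
import subprocess
import sys
import time


def time_instances(command, n_instances):
    """Start n_instances copies of command together and return the
    wall-clock seconds until the last one has finished."""
    start = time.perf_counter()
    procs = [subprocess.Popen(command) for _ in range(n_instances)]
    for proc in procs:
        proc.wait()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder CPU-bound workload; replace with the cut-down benchmark.
    command = [sys.executable, "-c", "sum(i * i for i in range(200000))"]
    baseline = time_instances(command, 1)
    for n in (1, 2, 4):
        elapsed = time_instances(command, n)
        print(f"{n} instance(s): {elapsed:.2f} s, "
              f"{elapsed / baseline:.2f}x the single-run time")
```

If n instances finish in about the same wall-clock time as one, the machine is scaling across cores; if the time grows roughly in proportion to n, the instances are competing for a shared resource.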

If your program is waiting on memory for the most part, running multiple
copies of it won't do much good on the above-mentioned processors
because they only offer a single path to memory and one job will
saturate that path by itself.  If that's the case you can go back to
your code and try to iron out its memory problems or look at a system
that has multiple paths to memory.

-- 
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end


Reply not_valid (1681) 11/27/2007 2:56:01 AM

Dr Ivan D. Reid wrote:
> 	I have seen instances where a programme has completely slowed down
> due to its "hopping" from one CPU to another and (presumably) losing its
> cache in the process -- although this may have been on a proper dual-core
> rather than a P4.  You can check if it makes a difference in Task Manager --
> in the Processes tab, right-click on the programme and select "Set Affinity"
> from the drop-down menu.  Set one copy of the programme to run on CPU 0
> and the other to run on CPU 1 and see if it runs any faster.
> 
As OP is running on a single core HyperThreaded processor, where both
jobs use the same L2 cache, the most likely cache problem is contention
for cache resource.  Swapping L1 is not as big a problem.
My primary use for HyperThreading is running cygwin gcc/gfortran
testsuite, where most of the time is spent on the extremely slow disk
operations.
Under Windows, you are lucky if raising the task manager meter from 50%
to 100% gets you a 20% increase in throughput with HT. It may approach
50% increased throughput with a recent linux, in cases where your
application runs threads without much increase in memory requirement.
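
To see what those percentages would mean for the 1.9 hour job discussed in this thread, here is a small Python sketch. It assumes the throughput increase applies uniformly over the whole run, which is a simplification, and the name concurrent_hours is invented for the example.

```python
def concurrent_hours(single_run_hours, gain_fraction, n_jobs=2):
    """Estimated wall-clock hours to finish n_jobs concurrent runs when
    concurrency raises total throughput by gain_fraction (0.20 = 20%)."""
    return n_jobs * single_run_hours / (1.0 + gain_fraction)


for label, gain_fraction in (("no gain", 0.0), ("20% gain", 0.20), ("50% gain", 0.50)):
    hours = concurrent_hours(1.9, gain_fraction)
    print(f"{label}: {hours:.2f} h for two runs")
```

The reported 3.9 hours is slightly worse than the "no gain" case, so HT is providing no benefit for this workload.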
Reply timothyprince1 (449) 11/27/2007 3:23:51 AM

"James Van Buskirk" <not_valid@comcast.net> wrote in message 
news:QNidnZpxCY7YGtbanZ2dnUVZ_rWtnZ2d@comcast.com...
> (snip)

James:
Good advice! I'll probably go down to the computer store but it may not work 
out because the application was targeted to run on cygwin and I would guess 
that most of their boxes are WinTel. On the other hand, I tried running the 
application from the Command Prompt window and it worked! I would not have 
expected that to be the case.
In any event, if I need to recompile, targeting Windows from cygwin is 
beyond me for now. - Mike 


Reply deltaseq0 (54) 11/27/2007 1:54:19 PM
