I'm doing some testing of an application using a Pentium 4 HT cpu. The
application runs under Cygwin using the XWin X Server and an xterm window.
It takes 1.9 hrs to complete using gcc 4.2.1 configured with thread model:
single. When I look at Windows Task Manager, it shows a cpu usage of 50%
with no other apps running at the time. Following a suggestion in a previous
optimization thread, I tried running 2 instances of the application at the
same time by opening up another window under XWin. Task Manager shows 100%
cpu usage, but instead of completing the 2 applications in about 2 hours, it
takes 3.9 hours: slightly more than the 3.8 hours it would take to run them
sequentially. Was this to be expected?
Mike
deltaseq0 (54)
11/26/2007 5:55:56 PM

In article <JMD2j.18$gL1.13@newsfe08.lga>,
deltaseq0 <deltaseq0@nospam.net> wrote:
>I'm doing some testing of an application using a Pentium 4 HT cpu.
(snip)
>up another window under XWin. Task manager shows 100% cpu usage but instead
>of completing the 2 applications in 2 hours, it takes 3.9 hours; slightly
>more than twice the time if I had run the applications sequentially. Was
>this to be expected?
Hyperthreading is complicated and can tend to confuse the operating
system. The OS thinks that you have two real CPUs, but under the hood
they actually share the same execution unit so only one of the two 'virtual'
CPUs can actually be doing anything at any given time. This can be a win
if your applications spend a lot of time in cache misses, waiting for
disk reads and the like, in which case the otherwise wasted time can be
spent running other processes. However, IME this is pretty rare, and the
maximum potential speedup in the real world is maybe 5% or so (i.e. two
runs will take slightly less than twice the time of one run). If your
app is largely CPU-bound with medium-sized working sets, the problem of
the cache requirements of the two processes conflicting is likely to
hurt you more than the potential hyperthreading can help you: this is
what's probably happened here.
In essence, your problem is that Task Manager doesn't actually fully
understand your CPU, and so its "50% usage" for the single-app scenario
is wrong. If you had a real dual-core CPU then the situation would be
more like what you expect.
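
As a rough check on the numbers quoted in this thread, the sketch below compares running two copies at once against running them back to back. The function name `concurrent_speedup` is made up for illustration; the 1.9 h and 3.9 h figures are the ones reported in the original post. A result above 1.0 would mean hyperthreading helped, and a result below 1.0 means it hurt.

```python
def concurrent_speedup(single_hours, concurrent_hours, copies=2):
    """Throughput ratio of running `copies` at once versus back to back.

    single_hours:     wall-clock time of one copy running alone
    concurrent_hours: wall-clock time until all concurrent copies finish
    """
    sequential_hours = copies * single_hours
    return sequential_hours / concurrent_hours


# Figures from the original post: 1.9 h alone, 3.9 h for two copies at once.
speedup = concurrent_speedup(1.9, 3.9)
print(f"speedup: {speedup:.3f}")  # prints "speedup: 0.974"
```

In other words, the two concurrent runs were about 2.6% slower than two sequential runs would have been, which is consistent with the cache-conflict explanation above rather than with any gain from the second virtual CPU.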
--
Mark Mackey http://www.swallowtail.org/
code code code code code code code code code code code code code bug code co
de code code code bug code code code code code code code code code code code
code code code code code code code code code code code code code code code c
markm13 (51)
11/26/2007 6:23:18 PM

"Mark Mackey" <markm@chiark.greenend.org.uk> wrote in message
news:klh*fpS0r@news.chiark.greenend.org.uk...
> In article <JMD2j.18$gL1.13@newsfe08.lga>,
> deltaseq0 <deltaseq0@nospam.net> wrote:
>>I'm doing some testing of an application using a Pentium 4 HT cpu.
> (snip)
>>up another window under XWin. Task manager shows 100% cpu usage but
>>instead
>>of completing the 2 applications in 2 hours, it takes 3.9 hours; slightly
>>more than twice the time if I had run the applications sequentially. Was
>>this to be expected?
>
> Hyperthreading is complicated and can tend to confuse the operating
> system. The OS thinks that you have two real CPUs, but under the hood
> they actually the same execution unit so only one of the two 'virtual'
> CPUs can actually be doing anything at any given time. This can be a win
> if your applications spend a lot of time in cache misses, waiting for
> disk reads and the like, in which case the otherwise wasted time can be
> spent running other processes. However, IME this is pretty rare, and the
> maximum potential speedup in the real world is maybe 5% or so (i.e. two
> runs will take slightly less than twice the time of one run). If your
> app is largely CPU-bound with medium-sized working sets, the problem of
> the cache requirements of the two processes conflicting is likely to
> hurt you more than the potential hyperthreading can help you: this is
> what's probably happened here.
>
> In essence, your problem is that Task Manager doesn't actually fully
> understand your CPU, and so its "50% usage" for the single-app scenario
> is wrong. If you had a real dual-core CPU then the situation would be
> more like what you expect.
>
> --
> Mark Mackey http://www.swallowtail.org/
> code code code code code code code code code code code code code bug code
> co
> de code code code bug code code code code code code code code code code
> code
> code code code code code code code code code code code code code code code
> c
>
When the single app runs, it fills 1 "virtual" core to 100% throughout the
run with no dead time. Is that an indication that the app is cpu-bound?
I was thinking of installing gcc 4.3 with multi-threading and modifying the
code to accept OpenMP directives. Would that help in this case?
-Mike
deltaseq0 (54)
11/26/2007 6:51:12 PM

On Mon, 26 Nov 2007 12:55:56 -0500, deltaseq0 <deltaseq0@nospam.net>
wrote in <JMD2j.18$gL1.13@newsfe08.lga>:
> I'm doing some testing of an application using a Pentium 4 HT cpu. The
> application runs under Cygwin using the XWin X Server and an xterm window.
> It takes 1.9 hrs to complete using gcc 4.2.1 configured with thread model:
> single. When I look at Windows Task Manager, it shows a cpu usage of 50%
> with no other apps running at the time. From a previous optimization thread,
> I tried running 2 instances of the application at the same time by opening
> up another window under XWin. Task manager shows 100% cpu usage but instead
> of completing the 2 applications in 2 hours, it takes 3.9 hours; slightly
> more than twice the time if I had run the applications sequentially. Was
> this to be expected?
I have seen instances where a programme has completely slowed down
due to its "hopping" from one CPU to another and (presumably) losing its
cache in the process -- although this may have been on a proper dual-core
rather than a P4. You can check if it makes a difference in Task Manager --
in the Processes tab, right-click on the programme and select "Set Affinity"
from the drop-down menu. Set one copy of the programme to run on CPU 0
and the other to run on CPU 1 and see if it runs any faster.
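
For anyone who would rather script the affinity test than click through Task Manager, here is a minimal sketch of the idea. The helper names `affinity_mask` and `pin_current_process` are hypothetical. The bitmask form (bit n set for logical CPU n) is what the Win32 `SetProcessAffinityMask` call expects; `os.sched_setaffinity` exists only on some platforms (notably Linux), so on Windows the function below simply reports that it could not pin the process.

```python
import os


def affinity_mask(cpus):
    """Return a bitmask with bit n set for each logical CPU n in `cpus`."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def pin_current_process(cpus):
    """Pin this process to `cpus` where the standard library supports it.

    Returns True if the affinity was set, False if this platform has no
    os.sched_setaffinity (for example Windows, where Task Manager or the
    Win32 API would have to be used instead).
    """
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, set(cpus))  # pid 0 means the calling process
        return True
    return False


# Masks for "CPU 0 only", "CPU 1 only", and "both virtual CPUs".
print(affinity_mask([0]), affinity_mask([1]), affinity_mask([0, 1]))  # prints "1 2 3"
```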
--
Ivan Reid, School of Engineering & Design, _____________ CMS Collaboration,
Brunel University. Ivan.Reid@[brunel.ac.uk|cern.ch] Room 40-1-B12, CERN
KotPT -- "for stupidity above and beyond the call of duty".
Ivan.Reid (496)
11/26/2007 9:04:12 PM

"Dr Ivan D. Reid" <Ivan.Reid@brunel.ac.uk> wrote in message
news:slrnfkmd6c.53d.Ivan.Reid@loki.brunel.ac.uk...
> On Mon, 26 Nov 2007 12:55:56 -0500, deltaseq0 <deltaseq0@nospam.net>
> wrote in <JMD2j.18$gL1.13@newsfe08.lga>:
>> I'm doing some testing of an application using a Pentium 4 HT cpu. The
>> application runs under Cygwin using the XWin X Server and an xterm
>> window.
>> It takes 1.9 hrs to complete using gcc 4.2.1 configured with thread
>> model:
>> single. When I look at Windows Task Manager, it shows a cpu usage of 50%
>> with no other apps running at the time. From a previous optimization
>> thread,
>> I tried running 2 instances of the application at the same time by
>> opening
>> up another window under XWin. Task manager shows 100% cpu usage but
>> instead
>> of completing the 2 applications in 2 hours, it takes 3.9 hours; slightly
>> more than twice the time if I had run the applications sequentially. Was
>> this to be expected?
>
> I have seen instances where a programme has completely slowed down
> due to its "hopping" from one CPU to another and (presumably) losing its
> cache in the process -- although this may have been on a proper dual-core
> rather than a P4. You can check if it makes a difference in Task
> Manager --
> in the Processes tab, right-click on the programme and select "Set
> Affinity"
> from the drop-down menu. Set one copy of the programme to run on CPU 0
> and the other to run on CPU 1 and see if it runs any faster.
>
> --
> Ivan Reid, School of Engineering & Design, _____________ CMS
> Collaboration,
> Brunel University. Ivan.Reid@[brunel.ac.uk|cern.ch] Room 40-1-B12,
> CERN
> KotPT -- "for stupidity above and beyond the call of duty".
Dr. Reid:
Thanks for the suggestion. Unfortunately, the execution times were not
changed by setting affinities. - Mike
deltaseq0 (54)
11/27/2007 12:47:24 AM

"deltaseq0" <deltaseq0@nospam.net> wrote in message
news:wAE2j.12$nr.1@newsfe09.lga...
> When the single app runs, it fills 1 "virtual" core to 100% throughout the
> run with no dead time. Is that an indication that the app is cpu-bound?
> I was thinking of installing gcc 4.3 with multi-threading and modifying
> the code to accept OpenMP directives. Would that help in this case?
> -Mike
In your case it probably won't do a lick of good. One thing you could
do is to cut it down to a benchmark that takes a minute or so to run.
Put that on a thumb drive and take it down to your local store and see
what your times look like running one or two instances on a Core 2 Duo
or one, two, and four instances on a Core 2 Quad. I have found that
the clerks at the store are just as curious about the performance of
their machines as you are, and they will often let you attempt a
benchmark if it doesn't take too long.
If your program is waiting on memory for the most part, running multiple
copies of it won't do much good on the above-mentioned processors
because they only offer a single path to memory and one job will
saturate that path by itself. If that's the case you can go back to
your code and try to iron out its memory problems or look at a system
that has multiple paths to memory.
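
The one-, two- and four-instance comparison suggested above can be automated with a small timing harness. This is only a sketch under assumptions: the function name `time_concurrent` is made up, and the command being timed is a placeholder CPU-bound one-liner that would need to be replaced with the cut-down benchmark itself.

```python
import subprocess
import sys
import time


def time_concurrent(cmd, copies):
    """Return wall-clock seconds needed to run `copies` instances of `cmd` at once."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    for proc in procs:
        proc.wait()  # wait until every instance has finished
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder workload (an assumption): swap in the real benchmark command.
    cmd = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]
    for copies in (1, 2, 4):
        elapsed = time_concurrent(cmd, copies)
        print(f"{copies} instance(s): {elapsed:.2f} s")
```

If the elapsed time stays roughly flat as the number of instances goes up to the number of cores, the extra cores are doing useful work. If it grows in proportion to the number of instances, the copies are competing for a shared resource such as the execution unit or the path to memory.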
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
not_valid (1681)
11/27/2007 2:56:01 AM

Dr Ivan D. Reid wrote:
> On Mon, 26 Nov 2007 12:55:56 -0500, deltaseq0 <deltaseq0@nospam.net>
> wrote in <JMD2j.18$gL1.13@newsfe08.lga>:
>> I'm doing some testing of an application using a Pentium 4 HT cpu. The
>> application runs under Cygwin using the XWin X Server and an xterm window.
>> It takes 1.9 hrs to complete using gcc 4.2.1 configured with thread model:
>> single. When I look at Windows Task Manager, it shows a cpu usage of 50%
>> with no other apps running at the time. From a previous optimization thread,
>> I tried running 2 instances of the application at the same time by opening
>> up another window under XWin. Task manager shows 100% cpu usage but instead
>> of completing the 2 applications in 2 hours, it takes 3.9 hours; slightly
>> more than twice the time if I had run the applications sequentially. Was
>> this to be expected?
>
> I have seen instances where a programme has completely slowed down
> due to its "hopping" from one CPU to another and (presumably) losing its
> cache in the process -- although this may have been on a proper dual-core
> rather than a P4. You can check if it makes a difference in Task Manager --
> in the Processes tab, right-click on the programme and select "Set Affinity"
> from the drop-down menu. Set one copy of the programme to run on CPU 0
> and the other to run on CPU 1 and see if it runs any faster.
>
As OP is running on a single core HyperThreaded processor, where both
jobs use the same L2 cache, the most likely cache problem is contention
for cache resource. Swapping L1 is not as big a problem.
My primary use for HyperThreading is running the Cygwin gcc/gfortran
testsuite, where most of the time is spent on the extremely slow disk
operations.
Under Windows, you are lucky if raising the Task Manager meter from 50%
to 100% gets you a 20% increase in throughput with HT. It may approach
50% increased throughput with a recent Linux, in cases where your
application runs threads without much increase in memory requirement.
timothyprince1 (449)
11/27/2007 3:23:51 AM

"James Van Buskirk" <not_valid@comcast.net> wrote in message
news:QNidnZpxCY7YGtbanZ2dnUVZ_rWtnZ2d@comcast.com...
> "deltaseq0" <deltaseq0@nospam.net> wrote in message
> news:wAE2j.12$nr.1@newsfe09.lga...
>
>> When the single app runs, it fills 1 "virtual" core to 100% throughout
>> the run with no dead time. Is that an indication that the app is
>> cpu-bound?
>> I was thinking of installing gcc 4.3 with multi-threading and modifying
>> the code to accept OpenMP directives. Would that help in this case?
>> -Mike
>
> In your case it probably won't do a lick of good. One thing you could
> do is to cut it down to a benchmark that takes a minute or so to run.
> Put that on a thumb drive and take it down to your local store and see
> what your times look like running one or two instances on a Core 2 Duo
> or one, two, and four instances on a Core 2 Quad. I have found that
> the clerks at the store are just as curious about the performance of
> their machines as you are, and they will often let you attempt a
> benchmark if it doesn't take too long.
>
> If your program is waiting on memory for the most part running multiple
> copies of it won't do much good on the above-mentioned processors
> because they only offer a single path to memory and one job will
> saturate that path by itself. If that's the case you can go back to
> your code and try to iron out its memory problems or look at a system
> that has multiple paths to memory.
>
> --
> write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
> 6.0134700243160014d-154/),(/'x'/)); end
>
>
James:
Good advice! I'll probably go down to the computer store but it may not work
out because the application was targeted to run on Cygwin and I would guess
that most of their boxes are WinTel. On the other hand, I tried running the
application from the Command Prompt window and it worked! I would not have
expected that to be the case.
In any event, if I need to recompile, targeting Windows from cygwin is
beyond me for now. - Mike
deltaseq0 (54)
11/27/2007 1:54:19 PM