How to profile code parts without thread scheduling errors?



I want to get some timings for code on an i386/amd64 system, but I
don't want the data corrupted by task/thread switches. I haven't seen
any answer on whether the TSC performance counter register is saved
during task switches, but unfortunately I doubt it (even with enforced
CPU affinity).

Is there any way to measure this on Solaris, Windows Vista/XP, Linux
or any of the three BSDs?
It can't be true that all of them suck so terribly.

Any CPU's clock cycle counting feature would be absolute useless
then.

Reply llothar 12/8/2007 5:43:23 AM

In article <e8f6eb49-04da-4bb9-8daf-59f58dde0bc9
@l1g2000hsa.googlegroups.com>, spamtrap@crayne.org says...
> I want to get some timings for the code on and i386/amd64 system but
> don't want to get the data
> corrupted by tasks/thread switches. I haven't seen any answer if the
> performance counter TSC register is saved during task switches but
> unfortunately i doubt it (together with enforced CPU affinity).
> 
> Is there any way to measure this on Solaris, Windows Vista/XP, Linux
> or any of the three BSD's?
> Can't be true that all of them sucks so terrible.
> 
> Any CPU's clock cycle counting feature would be absolute useless
> then.

No, the TSC is not saved/restored on a per-thread basis on any OS of 
which I'm aware. Fortunately, far from rendering it "absolute useless", 
it has virtually no effect whatsoever.

The TSC is useful primarily for timing small, short code sequences. For 
them, the chances of a thread switch happening during a timing run are 
extremely remote (I've been doing such timing semi-regularly for years, 
and seen it happen once). When it does happen, it's quite easy to 
detect: you get one run after another that take, for example, 20 cycles 
-- then you get one that takes tens of thousands of cycles. Clearly the 
latter had a thread switch (or interrupt) take place, and you simply 
throw it out.
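
A minimal sketch of this kind of timing run in C, assuming GCC or Clang on x86 (the `__rdtsc()` intrinsic from `<x86intrin.h>`). The names `read_ticks`, `code_under_test` and `min_ticks` are made up for illustration, and the non-x86 fallback is an addition so the sketch builds elsewhere. Taking the minimum over many runs is one simple way to "throw out" the disturbed runs; it is not necessarily what the poster does. The sketch also ignores that `rdtsc` is not a serializing instruction, so very short sequences may need extra care.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
/* Read the time-stamp counter. It is a per-CPU counter, not per-thread. */
uint64_t read_ticks(void) { return __rdtsc(); }
#else
#include <time.h>
/* Assumption for non-x86 builds: monotonic nanoseconds instead of cycles. */
uint64_t read_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical short code sequence to be timed. */
volatile uint32_t sink;
void code_under_test(void) {
    uint32_t x = sink;
    for (int i = 0; i < 16; i++)
        x = x * 1664525u + 1013904223u;
    sink = x;
}

/* Time fn `runs` times and return the smallest tick count seen.
   A run hit by an interrupt or thread switch is far larger than the
   others, so taking the minimum discards it automatically. */
uint64_t min_ticks(void (*fn)(void), size_t runs) {
    assert(runs > 0);
    uint64_t best = UINT64_MAX;
    for (size_t i = 0; i < runs; i++) {
        uint64_t start = read_ticks();
        fn();
        uint64_t elapsed = read_ticks() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}
```

Calling `min_ticks(code_under_test, 1000)` gives the best-case tick count, which includes the small fixed cost of the two counter reads.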

If you're timing pieces of code that take long enough that you're at all 
likely to see thread switches while they run, you generally want to use 
the OS' thread timing functions instead. On POSIX-style systems you're 
looking for times(), and on Windows GetThreadTimes.
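
A rough sketch of the POSIX side in C. `times()` is the call named above; note that it reports CPU time for the whole process, in clock ticks of `sysconf(_SC_CLK_TCK)` per second. The second function uses `clock_gettime()` with `CLOCK_THREAD_CPUTIME_ID`, which is not mentioned in the post and is assumed to be available on the target system, to get the calling thread's CPU time. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/times.h>
#include <time.h>
#include <unistd.h>

/* CPU seconds (user + system) charged to the whole process so far.
   Time spent switched out in favour of other processes is not counted. */
double process_cpu_seconds(void) {
    struct tms t;
    clock_t rc = times(&t);
    assert(rc != (clock_t)-1);
    (void)rc;
    long ticks_per_sec = sysconf(_SC_CLK_TCK);
    return (double)(t.tms_utime + t.tms_stime) / (double)ticks_per_sec;
}

/* CPU seconds charged to the calling thread only.
   Assumes the platform provides CLOCK_THREAD_CPUTIME_ID. */
double thread_cpu_seconds(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
```

Call one of these before and after the region of interest and subtract the two values. The resolution of `times()` is coarse, so it only suits code that runs for a comparatively long time.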

-- 
    Later,
    Jerry.

The universe is a figment of its own imagination.

Reply Jerry 12/8/2007 3:39:19 PM


llothar  <spamtrap@crayne.org> wrote:
>
>I want to get some timings for the code on and i386/amd64 system but
>don't want to get the data corrupted by tasks/thread switches. I 
>haven't seen any answer if the performance counter TSC register is
>saved during task switches but unfortunately i doubt it

It wouldn't be much good if it did.  It counts cycles since power was
applied.

A rough average is the best you can do.  Besides task switches, you also
have to compete with interrupts, and execution time on current CPUs is
dramatically affected by caches.

>Is there any way to measure this on Solaris, Windows Vista/XP, Linux
>or any of the three BSD's?
>Can't be true that all of them sucks so terrible.

Nonsense.  It is simply that your expectations are naive.  If you need
precise cycle counts, you will have to do it in an MS-DOS environment,
without an operating system, with interrupts disabled.

Fortunately, you don't really need precise cycle counts.  You need an
approximation, and that you CAN get.
-- 
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

Reply Tim 12/9/2007 5:16:50 AM

Tim Roberts  <spamtrap@crayne.org> writes:
> llothar  <spamtrap@crayne.org> wrote:
> >
> >I want to get some timings for the code on and i386/amd64 system but
> >don't want to get the data corrupted by tasks/thread switches. I 
> >haven't seen any answer if the performance counter TSC register is
> >saved during task switches but unfortunately i doubt it
> 
> It wouldn't be much good if it did.  It counts cycles since power was
> applied.

Why do you say it wouldn't be much good? The concept of 
a per-task cycle counter isn't at all alien in modern CPU 
architectures.

Phil
-- 
Dear aunt, let's set so double the killer delete select all.
-- Microsoft voice recognition live demonstration

Reply Phil 12/9/2007 10:14:36 AM

Tim Roberts <spamtrap@crayne.org> wrote in part:
> llothar  <spamtrap@crayne.org> wrote:
>>haven't seen any answer if the performance counter TSC register is
>>saved during task switches but unfortunately i doubt it
> 
> It wouldn't be much good if it did.  It counts cycles since
> power was applied.

By default, yes.  But TSC is writeable in ring0 and this is
a good feature for undetectable virtualization or to defeat
anti-debugger measures.

-- Robert

Reply Robert 12/9/2007 6:08:33 PM

In article <nbuml3hunqniplegs4hkj8tsbv4cmgagti@4ax.com>, 
spamtrap@crayne.org says...
> llothar  <spamtrap@crayne.org> wrote:
> >
> >I want to get some timings for the code on and i386/amd64 system but
> >don't want to get the data corrupted by tasks/thread switches. I 
> >haven't seen any answer if the performance counter TSC register is
> >saved during task switches but unfortunately i doubt it
> 
> It wouldn't be much good if it did.  It counts cycles since power was
> applied.

You _can_ write to the TSC from privileged mode if you see fit to do so. 
It's a lousy idea as a rule, but it is possible.
 
> A rough average is the best you can do.  Besides task switches, you also
> have to compete with interrupts, and the execution time on current CPUs are
> dramatically affected by caches.

You can do a LOT better than a rough average. For the length of code 
you're normally dealing with, a task switch or interrupt sticks out like 
a sore thumb, making a sequence take (at least) several hundred times as 
long as it would otherwise. Cache effects can be dealt with fairly 
dependably as well, simply by running the same sequence in a tight loop 
a few times in a row.
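
One way the "tight loop" idea could look in C: run the sequence a few times untimed so that code and data are in cache, then record each following run separately. This is only a sketch, assuming GCC or Clang on x86; `read_ticks`, `workload`, `calls` and `sample_warm_runs` are hypothetical names, and the non-x86 fallback is an addition so the sketch builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
uint64_t read_ticks(void) { return __rdtsc(); }
#else
#include <time.h>
/* Assumption for non-x86 builds: monotonic nanoseconds instead of cycles. */
uint64_t read_ticks(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical sequence to be timed; `calls` counts how often it ran. */
unsigned long calls;
volatile uint32_t sink;
void workload(void) {
    calls++;
    sink = sink * 1664525u + 1013904223u;
}

/* Run fn `warmup` times untimed to warm the caches, then store the
   tick count of each of the next `n` runs in out[0..n-1]. */
void sample_warm_runs(void (*fn)(void), size_t warmup,
                      uint64_t *out, size_t n) {
    assert(out != NULL);
    for (size_t i = 0; i < warmup; i++)
        fn();
    for (size_t i = 0; i < n; i++) {
        uint64_t start = read_ticks();
        fn();
        out[i] = read_ticks() - start;
    }
}
```

Keeping the individual samples rather than a sum makes a run disturbed by an interrupt or task switch visible as a single large entry.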

> Nonsense.  It is simply that your expectations are naive.  If you need
> precise cycle counts, you will have to do it in an MS-DOS environment,
> without an operating system, with interrupts disabled.

No you don't. As I pointed out above, an interrupt or other task switch 
is _extremely_ obvious unless you're starting with a code sequence so 
long there's no point in worrying about cycle-level precision anyway.

> Fortunately, you don't really need precise cycle counts.  You need an
> approximation, and that you CAN get.

It certainly makes the most sense to benchmark the code in the same 
environment that'll be used to run it.

-- 
    Later,
    Jerry.

The universe is a figment of its own imagination.

Reply Jerry 12/9/2007 6:13:12 PM

Phil Carmody <thefatphil_demunged@yahoo.co.uk> wrote:
>Tim Roberts  <spamtrap@crayne.org> writes:
>> llothar  <spamtrap@crayne.org> wrote:
>> >
>> >I want to get some timings for the code on and i386/amd64 system but
>> >don't want to get the data corrupted by tasks/thread switches. I 
>> >haven't seen any answer if the performance counter TSC register is
>> >saved during task switches but unfortunately i doubt it
>> 
>> It wouldn't be much good if it did.  It counts cycles since power was
>> applied.
>
>Why do you say it wouldn't be much good? The concept of 
>a per-task cycle counter isn't at all alien in modern CPU 
>architectures.

I was on a roll, and abbreviated too much.  Yes, I agree that the concept
of a per-task cycle counter is valuable.

And in fact, such a thing would not be hard to implement in Windows.  The
cycle counter is, after all, writable.
-- 
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

Reply Tim 12/11/2007 4:32:03 AM

On Tue, 11 Dec 2007 04:32:03 GMT, Tim Roberts  <spamtrap@crayne.org>
wrote:

>Phil Carmody <thefatphil_demunged@yahoo.co.uk> wrote:
>>Tim Roberts  <spamtrap@crayne.org> writes:
>>> llothar  <spamtrap@crayne.org> wrote:
>>> >
>>> >I want to get some timings for the code on and i386/amd64 system but
>>> >don't want to get the data corrupted by tasks/thread switches. I 
>>> >haven't seen any answer if the performance counter TSC register is
>>> >saved during task switches but unfortunately i doubt it
>>> 
>>> It wouldn't be much good if it did.  It counts cycles since power was
>>> applied.
>>
>>Why do you say it wouldn't be much good? The concept of 
>>a per-task cycle counter isn't at all alien in modern CPU 
>>architectures.
>
>I was on a roll, and abbreviated too much.  Yes, I agree that the concept
>of a per-task cycle counter is valuable.
>
>And in fact, such a thing would not be hard to implement in Windows.  The
>cycle counter is, after all, writable.

Sorry if I'm being dense here, but why would you need or want to
write the TSC for this?    I don't see where writing the TSC gains
you anything that could not be done by simply storing an initial
value and subtracting it later.   But either way, I don't see how it
gets down to a per-task timing, since you'd still have to allow for
interrupts.  (Though, as has been pointed out, they usually would
stick out like a sore thumb and could thus perhaps be excluded
via a deviation threshold test.)
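
A small sketch in C of such a deviation threshold test, applied to tick counts that were already obtained by storing a start value and subtracting it from the end value. The function name `drop_outliers` and the choice of "a multiple of the smallest sample" as the threshold are assumptions made for illustration; other thresholds (for example one based on the median) would work as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Keep only samples that are at most `factor` times the smallest sample.
   Survivors are compacted to the front of the array, in their original
   order, and the number kept is returned. Overflow of min * factor is
   ignored in this sketch. */
size_t drop_outliers(uint64_t *samples, size_t n, uint64_t factor) {
    assert(factor >= 1);
    if (n == 0)
        return 0;

    uint64_t min = samples[0];
    for (size_t i = 1; i < n; i++)
        if (samples[i] < min)
            min = samples[i];

    /* Treat a minimum of 0 as 1 so the limit is never 0. */
    uint64_t limit = (min ? min : 1) * factor;

    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (samples[i] <= limit)
            samples[kept++] = samples[i];
    return kept;
}
```

With the samples {20, 21, 20, 48213, 22, 20} and a factor of 4, the limit is 80 ticks, so the 48213-tick run is dropped and five samples remain.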

Best regards,


Bob Masta
 
              DAQARTA  v3.50
   Data AcQuisition And Real-Time Analysis
             www.daqarta.com
Scope, Spectrum, Spectrogram, FREE Signal Generator
        Science with your sound card!

Reply NoSpam 12/11/2007 12:55:01 PM
