Hi guys,
I need to implement a low-cost, high-resolution timer (on a scale of
1 ms). It should run on Windows and Linux, so I can't use the Win32 API.
The first thing to use is RDTSC. Everybody expresses concern about CPU
throttling. What about multiple CPUs? Does RDTSC return the same
number for different CPUs?
Sergey, 6/15/2008 2:51:18 AM

Sergey.skv wrote:
> I need to implement a low-cost, high-resolution timer (on a scale of
> 1 ms). It should run on Windows and Linux, so I can't use the Win32 API.
> The first thing to use is RDTSC. Everybody expresses concern about CPU
> throttling. What about multiple CPUs? Does RDTSC return the same
> number for different CPUs?
Use QueryPerformanceCounter / QueryPerformanceFrequency
on Windows and gettimeofday on Linux.
Those are independent of CPU cores and clock changes.
Hendrik vdH
Hendrik, 6/15/2008 8:08:27 AM

On Jun 14, 7:51 pm, Sergey.skv <spamt...@crayne.org> wrote:
> Hi guys,
> I need to implement a low-cost, high-resolution timer (on a scale of
> 1 ms). It should run on Windows and Linux, so I can't use the Win32 API.
> The first thing to use is RDTSC. Everybody expresses concern about CPU
> throttling. What about multiple CPUs? Does RDTSC return the same
> number for different CPUs?
No, TSCs are independent, and they may count at slightly different
rates. And AFAIK power management is not global either, so you can
never know when the rate may drop on some CPU or come back up.
How about implementing two versions (OS-specific) of the timer and
hiding the implementations behind a generic interface?
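A minimal sketch of that suggestion in C, under stated assumptions: the function names timer_now_us and timer_elapsed_ms are hypothetical, the Windows branch uses QueryPerformanceCounter/QueryPerformanceFrequency as Hendrik suggested, and the non-Windows branch swaps in clock_gettime(CLOCK_MONOTONIC) instead of gettimeofday, because gettimeofday follows the wall clock and can jump when the system time is set. It does nothing about the multi-core QueryPerformanceCounter issues discussed later in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>

/* Windows back end: QueryPerformanceCounter scaled to microseconds. */
uint64_t timer_now_us(void)
{
    static LARGE_INTEGER freq;   /* counts per second, queried once */
    LARGE_INTEGER now;
    if (freq.QuadPart == 0)
        QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&now);
    /* Split into whole seconds and remainder so the multiply cannot overflow. */
    return (uint64_t)(now.QuadPart / freq.QuadPart) * 1000000u
         + (uint64_t)(now.QuadPart % freq.QuadPart) * 1000000u
           / (uint64_t)freq.QuadPart;
}
#else
#include <time.h>

/* POSIX back end: CLOCK_MONOTONIC is not stepped by settimeofday(). */
uint64_t timer_now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}
#endif

/* The generic interface the rest of the program would call. */
uint64_t timer_elapsed_ms(uint64_t start_us)
{
    return (timer_now_us() - start_us) / 1000u;
}
```

Callers only ever see timer_now_us() and timer_elapsed_ms(), so either back end can be replaced without touching them.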
Alex
Alexei, 6/15/2008 9:07:49 AM

On Sat, 14 Jun 2008 19:51:18 -0700 (PDT), Sergey.skv
<spamtrap@crayne.org> wrote:
>Hi guys,
>I need to implement a low-cost, high-resolution timer (on a scale of
>1 ms). It should run on Windows and Linux, so I can't use the Win32 API.
>The first thing to use is RDTSC. Everybody expresses concern about CPU
>throttling. What about multiple CPUs? Does RDTSC return the same
>number for different CPUs?
>
You are right to be concerned about RDTSC and power management,
and no, it does not return the same number on different CPUs... it
returns the number of clock ticks since power-up, which of course
depends upon the CPU speed (which can change during power management).
You need to explain exactly what you are trying to do, because the
best answer depends upon multiple factors. 1 msec resolution is
available from the CMOS clock, but I'm not sure how to get at that
under Windows. The time-of-day clock has sub-microsecond resolution,
but is probably out of the question for Windows. RDTSC has incredible
resolution, but has the variable-speed problem... though it is
available in either OS.
Windows timer APIs have nominal 1 msec resolution, but in reality it's
probably more like 10-20 msec unless you use the multimedia timers.
Note that in any multi-tasking OS, you don't get unlimited access to
the timers... your task has to wait its turn with all the others,
including internal OS tasks you don't even know about.
That's true no matter which timer you are using... even RDTSC. You
never know when the OS is going to snatch the thread from you to go
off and scratch its proverbial behind. So you may need to make some
compromises, unless you want to write Ring 0 code that can take
absolute control and hold up the other tasks while it does so.
Best regards,
Bob Masta
DAQARTA v4.00
Data AcQuisition And Real-Time Analysis
www.daqarta.com
Scope, Spectrum, Spectrogram, Sound Level Meter
FREE Signal Generator
Science with your sound card!
NoSpam, 6/15/2008 12:25:23 PM

On 15 Jun, 03:51, Sergey.skv <spamt...@crayne.org> wrote:
> Hi guys,
> I need to implement a low-cost, high-resolution timer (on a scale of
> 1 ms). It should run on Windows and Linux, so I can't use the Win32 API.
> The first thing to use is RDTSC. Everybody expresses concern about CPU
> throttling. What about multiple CPUs? Does RDTSC return the same
> number for different CPUs?
If you are writing in C, take a look at the gettimeofday and select
calls. The timer resolution will depend on your OS but these calls
should make use of whatever resolution is available.
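A sketch of those two calls working together, assuming a POSIX system: select() with no descriptors is used purely as a timeout-based sleep, and gettimeofday() measures how long the sleep really took. The helper name sleep_and_measure_us is hypothetical. On a busy system the measured value will usually be somewhat longer than requested, which is the scheduling latency Bob describes above.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Sleeps for about `ms` milliseconds with select() (no descriptors,
   timeout only) and returns how many microseconds gettimeofday() saw
   pass, or -1 if select() failed, e.g. because a signal arrived. */
long long sleep_and_measure_us(long ms)
{
    struct timeval timeout, before, after;
    timeout.tv_sec = ms / 1000;
    timeout.tv_usec = (ms % 1000) * 1000;

    gettimeofday(&before, NULL);
    if (select(0, NULL, NULL, NULL, &timeout) != 0)
        return -1;
    gettimeofday(&after, NULL);

    return (long long)(after.tv_sec - before.tv_sec) * 1000000LL
         + (after.tv_usec - before.tv_usec);
}
```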
James, 6/15/2008 1:54:55 PM

Hendrik van der Heijden <spamtrap@crayne.org> wrote:
>Sergey.skv wrote:
>> I need to implement a low-cost, high-resolution timer (on a scale of
>> 1 ms). It should run on Windows and Linux, so I can't use the Win32 API.
>> The first thing to use is RDTSC. Everybody expresses concern about CPU
>> throttling. What about multiple CPUs? Does RDTSC return the same
>> number for different CPUs?
>
>Use QueryPerformanceCounter / QueryPerformanceFrequency
>on Windows and gettimeofday on Linux.
>Those are independent of CPU cores and clock changes.
Unfortunately, that's not true. On a multiprocessor machine,
QueryPerformanceCounter returns the raw cycle counter. The returned value
can even go backwards if you switch CPUs.
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.
Tim, 6/16/2008 12:01:37 AM

Sergey.skv wrote:
> Hi guys,
> I need to implement a low-cost, high-resolution timer (on a scale of
> 1 ms). It should run on Windows and Linux, so I can't use the Win32 API.
> The first thing to use is RDTSC. Everybody expresses concern about CPU
> throttling. What about multiple CPUs? Does RDTSC return the same
> number for different CPUs?
>
TSCs of different CPUs should be synchronized automatically by the CPU
hardware itself. So yes, RDTSC should return the same value.
Thanks,
Jike
Jike, 6/16/2008 2:05:39 AM

On Jun 15, 7:05 pm, Jike Song <spamt...@crayne.org> wrote:
> Sergey.skv wrote:
> > Hi guys,
> > I need to implement a low-cost, high-resolution timer (on a scale of
> > 1 ms). It should run on Windows and Linux, so I can't use the Win32 API.
> > The first thing to use is RDTSC. Everybody expresses concern about CPU
> > throttling. What about multiple CPUs? Does RDTSC return the same
> > number for different CPUs?
>
> TSCs of different CPUs should be synchronized automatically by the CPU
> hardware itself. So yes, RDTSC should return the same value.
>
> Thanks,
> Jike
By saying "should", are you expressing a desire, or do you know it for a fact?
If you know it, please provide supportive references.
Alex
Alexei, 6/16/2008 4:56:38 AM

Tim Roberts wrote:
> Hendrik van der Heijden <spamtrap@crayne.org> wrote:
>> Sergey.skv wrote:
>>> I need to implement a low-cost, high-resolution timer (on a scale of
>>> 1 ms). It should run on Windows and Linux, so I can't use the Win32 API.
>>> The first thing to use is RDTSC. Everybody expresses concern about CPU
>>> throttling. What about multiple CPUs? Does RDTSC return the same
>>> number for different CPUs?
>> Use QueryPerformanceCounter / QueryPerformanceFrequency
>> on Windows and gettimeofday on Linux.
>> Those are independent of CPU cores and clock changes.
>
> Unfortunately, that's not true. On a multiprocessor machine,
> QueryPerformanceCounter returns the raw cycle counter. The returned value
> can even go backwards if you switch CPUs.
AFAIK, the idea is that QueryPerformanceCounter should not have
the problems of RDTSC with core migration and changing clock frequency.
However, there were patches from MS (KB896256) and AMD to get
the intended behaviour right on multicore/MP systems.
Hendrik vdH
Hendrik, 6/16/2008 6:18:16 AM

Alexei A. Frounze wrote:
>
> By saying "should", are you expressing a desire, or do you know it for a fact?
> If you know it, please provide supportive references.
>
Sorry, what I mean is that the hardware should take responsibility for
making sure that different TSCs have the same value, or at least values
very close to each other.
Jike
Jike, 6/16/2008 9:14:58 AM

Hi,
On Jun 16, 11:05 am, Jike Song <spamt...@crayne.org> wrote:
> TSCs of different CPUs should be synchronized automatically by the CPU
> hardware itself. So yes, RDTSC should return the same value.
In this case "should" means "it'd be nice if it did", but in practice
you can't assume this.
For an example, see this web page: http://msdn.microsoft.com/en-us/library/bb173458(VS.85).aspx
Here's a quote from that web page:
"Discontinuous values. Using RDTSC directly assumes that the thread is
always running on the same processor. Multiprocessor and dual-core
systems do not guarantee synchronization of their cycle counters
between cores. This is exacerbated when combined with modern power
management technologies that idle and restore various cores at
different times, which results in the cores typically being out of
synchronization. For an application, this generally results in
glitches or in potential crashes as the thread jumps between the
processors and gets timing values that result in large deltas,
negative deltas, or halted timing."
AMD invented a new instruction to work around this problem - the
RDTSCP instruction. This instruction returns the TSC for the current
CPU and also returns the processor ID. The idea is that when you're
timing something you'd do something like:
    startTime and startCPU = RDTSCP
    *** something ***
    endTime and endCPU = RDTSCP
    if(startCPU != endCPU) {
        printf("Can't measure the cycles used - different CPUs used\n");
    } else {
        printf("Cycles used: %llu\n", endTime - startTime);
    }
Or alternatively (if what you're measuring can safely be repeated),
you could try:
    do {
        startTime and startCPU = RDTSCP
        *** something ***
        endTime and endCPU = RDTSCP
    } while(startCPU != endCPU);
    printf("Cycles used: %llu\n", endTime - startTime);
Of course this DOES NOT WORK - it suffers from the "ABA
problem" (note: for a description of this problem see:
"http://en.wikipedia.org/wiki/ABA_problem").
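A runnable version of the first RDTSCP pseudocode above, as a sketch for GCC or Clang on x86 (the __get_cpuid and __rdtscp intrinsics come from those compilers' <cpuid.h> and <x86intrin.h>). Assumptions to flag: RDTSCP actually returns the contents of the IA32_TSC_AUX register alongside the TSC, so it only identifies the CPU if the OS has loaded the CPU number there (Linux does, as far as I know); and something() is a hypothetical stand-in workload. Matching values do not rule out the "ABA" case, where the thread moved away and came back between the two reads.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <x86intrin.h>

/* CPUID leaf 8000_0001h, EDX bit 27 reports RDTSCP support. */
static int have_rdtscp(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((edx >> 27) & 1u);
}

/* Hypothetical workload standing in for "*** something ***". */
static volatile uint64_t sink;
static void something(void)
{
    for (uint64_t i = 0; i < 100000; i++)
        sink += i;
}

/* Times something() once. Returns 1 and stores the TSC delta in
   *cycles if both reads saw the same TSC_AUX value, 0 if they did not
   (the thread was seen on two CPUs), or -1 without RDTSCP support. */
int measure_cycles(uint64_t *cycles)
{
    unsigned int start_aux, end_aux;
    if (!have_rdtscp())
        return -1;
    uint64_t start = __rdtscp(&start_aux);
    something();
    uint64_t end = __rdtscp(&end_aux);
    if (start_aux != end_aux)
        return 0;
    *cycles = end - start;
    return 1;
}
#else
/* Not x86: no TSC, so report "unsupported". */
int measure_cycles(uint64_t *cycles)
{
    (void)cycles;
    return -1;
}
#endif
```

The retry loop in the second pseudocode would simply call measure_cycles() until it returns 1, with the same ABA caveat.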
The solution is for the OS to set the TSD flag in CR4 and to
virtualise the time stamp counter, such that each thread has its own
virtual time stamp counter. With TSD set, RDTSC at CPL=3 raises a
general protection fault, so you'd have a #GP handler that emulates
the RDTSC instruction by doing "virtualTSC =
virtual_TSC_for_this_thread + RDTSC - RDTSC_at_last_thread_switch", and
the thread switch code would need to do "virtual_TSC_for_this_thread
+= RDTSC - RDTSC_at_last_thread_switch" and then record the current
RDTSC as the new RDTSC_at_last_thread_switch.
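The bookkeeping in that paragraph can be modelled in a few lines. This is a single-CPU simulation, not kernel code: struct thread, emulate_rdtsc and on_thread_switch are hypothetical names, and the raw TSC is passed in as a parameter so the arithmetic can be checked without a real fault handler or scheduler. A real kernel would keep the last-switch timestamp per CPU.

```c
#include <stdint.h>

/* Hypothetical per-thread state for a virtualised TSC. */
struct thread {
    uint64_t virtual_tsc;   /* cycles accumulated while this thread ran */
};

/* Raw TSC recorded at the last thread switch (one CPU in this model). */
static uint64_t tsc_at_last_switch;

/* Value the fault handler would hand back to a thread executing RDTSC. */
uint64_t emulate_rdtsc(const struct thread *t, uint64_t raw_tsc)
{
    return t->virtual_tsc + (raw_tsc - tsc_at_last_switch);
}

/* Called by the scheduler when `prev` stops running at raw_tsc. */
void on_thread_switch(struct thread *prev, uint64_t raw_tsc)
{
    prev->virtual_tsc += raw_tsc - tsc_at_last_switch;
    tsc_at_last_switch = raw_tsc;
}
```

For example, if thread A runs from raw TSC 0 to 1000 and thread B from 1000 to 1500, A's virtual TSC is 1000, B's is 500, and an RDTSC by A at raw 1700 returns 1200.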
A better idea would be to use performance monitoring counters (e.g.
"non-halted clock ticks"), and give each thread 2 separate virtual
TSCs - one for "cycles spent running this thread at CPL=3" and the
other for "cycles spent running this thread at CPL=0", so that the
thread's "CPL=3 TSC" isn't affected as much by IRQs, etc.
And yes, I do worry about the sanity of Microsoft, Linux and AMD -
none of them get it right... ;-)
However, I'm talking about measuring the cycles used by a certain
piece of code, and not talking about using TSC to measure real time.
To measure real time you need a timer that measures real time, not a
timer that measures CPU cycles. Basically the TSC is not the right
tool for the job. Use HPET, the local APIC timer, the RTC or the PIT;
otherwise you'll need a collection of work-arounds for different cases
just to get it slightly right (which will involve messing with the
scheduler and using the local APIC's "thermal sensor" interrupt).
Cheers,
Brendan
Brendan, 6/16/2008 2:12:40 PM

Brendan wrote:
> AMD invented a new instruction to work around this problem - the
> RDTSCP instruction. This instruction returns the TSC for the current
> CPU and also returns the processor ID. The idea is that when you're
> timing something you'd do something like:
>
>     startTime and startCPU = RDTSCP
>     *** something ***
>     endTime and endCPU = RDTSCP
>     if(startCPU != endCPU) {
>         printf("Can't measure the cycles used - different CPUs used\n");
>     } else {
>         printf("Cycles used: %llu\n", endTime - startTime);
>     }
>
> Or alternatively (if what you're measuring can safely be repeated),
> you could try:
>
>     do {
>         startTime and startCPU = RDTSCP
>         *** something ***
>         endTime and endCPU = RDTSCP
>     } while(startCPU != endCPU);
>     printf("Cycles used: %llu\n", endTime - startTime);
>
>
> Of course this DOES NOT WORK - it suffers from the "ABA
> problem" (note: for a description of this problem see:
> "http://en.wikipedia.org/wiki/ABA_problem").
Why doesn't it work, in capital letters? It does work. It gives you the exact
amount of real time the execution of "*** something ***" took, even if it was
partially executed on another CPU.
Cyril, 6/16/2008 8:52:35 PM

"Brendan" <spamtrap@crayne.org> wrote in message
news:11c1c9ad-1923-49cf-b786-0a4c13724ed2@u12g2000prd.googlegroups.com...
> To measure real time you need a timer that measures real time,
True, but that requires hardware and privilege to access the hardware -
since it's not on the CPU. Wasn't the OP looking for a non-hardware solution?
OP:
>>>> It should run on Windows and Linux so there is no Win32 API.
> To measure real time you need a timer that measures real time, not a
> timer that measures CPU cycles.
If you measure a fixed period in cycles (and are using a uniprocessor), you
can convert (with slight error) from cycles to time. There are many things
in a PC you can measure in cycles. The issue is finding an event which has
a fixed period over many generations of PC designs, where measurement of the
event is not affected by multiple cores and is not affected by SMM or other
processor takeover/sleep events/interrupts, and the event has little or no
privilege and few programming issues...
E.g., soundcard timing has sufficient accuracy, but requires programming and
privilege, and audio isn't ubiquitous or standardized...
http://www.leapsecond.com/pages/sound-1pps/
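The "measure a fixed period in cycles, then convert" approach could look roughly like this. The assumptions are loud ones: x86 with GCC or Clang (__rdtsc from <x86intrin.h>), CLOCK_MONOTONIC plus nanosleep() standing in for the fixed-period reference event, and a TSC whose rate does not change after calibration. calibrate_tsc_hz and cycles_to_seconds are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Counts TSC ticks across a ~interval_ms real-time period and returns
   the estimated ticks per second. The estimate stays valid only while
   the TSC keeps counting at the rate it had during calibration. */
double calibrate_tsc_hz(long interval_ms)
{
    struct timespec req = { interval_ms / 1000, (interval_ms % 1000) * 1000000L };
    double t0 = now_seconds();
    uint64_t c0 = __rdtsc();
    nanosleep(&req, NULL);
    uint64_t c1 = __rdtsc();
    double t1 = now_seconds();
    return (double)(c1 - c0) / (t1 - t0);
}
#else
/* Not x86: no TSC to calibrate. */
double calibrate_tsc_hz(long interval_ms) { (void)interval_ms; return 0.0; }
#endif

/* Converts a cycle count to seconds using a previously calibrated rate. */
double cycles_to_seconds(uint64_t cycles, double hz)
{
    return (double)cycles / hz;
}
```

The conversion is only as good as the constant-rate assumption; the replies below show what happens when throttling breaks it.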
> Basically the TSC is not the right
> tool for the job.
True, but I suspect a USB GPS (1ms resolution) was out of the question (due
to cost, privilege, hardware solution, lack of ubiquity etc.):
>>>> I need to implement low cost high resolution timer...
Rod Pemberton
Rod, 6/17/2008 12:51:46 AM

Hi,
On Jun 17, 9:51 am, "Rod Pemberton" <spamt...@crayne.org> wrote:
> "Brendan" <spamt...@crayne.org> wrote in message
>
> news:11c1c9ad-1923-49cf-b786-0a4c13724ed2@u12g2000prd.googlegroups.com...
>
> > To measure real time you need a timer that measures real time,
>
> True, but that requires hardware and privilege to access the hardware -
> since it's not on the CPU. Wasn't the OP looking for a non-hardware solution?
> OP:
>
> >>>> It should run on Windows and Linux so there is no Win32 API.
My comments are in reply to Jike Song's post (ie. "TSCs of different
CPUs should be synchronized automatically by the CPU hardware
itself."), which is why I quoted Jike Song's post. I didn't quote the
OP's post because I wasn't replying to the OP's post.
The OP wants to write a Windows application that doesn't use the Win32
API (and a Linux application that doesn't use the Linux API, I assume)
and I'm supposed to think it's a sane request? Alex was smart enough
to suggest the right solution ("How about implementing two versions
(OS-specific) of the timer and hiding the implementations behind a
generic interface?").
> > To measure real time you need a timer that measures real time, not a
> > timer that measures CPU cycles.
>
> If you measure a fixed period in cycles (and are using a uniprocessor), you
> can convert (with slight error) from cycles to time. There are many things
> in a PC you can measure in cycles.
There's lots of things I could measure with my left buttock, but that
doesn't make a "buttocks-worth" an appropriate unit of measurement.
If the CPU is running at 100% and you measure how many cycles per
second the CPU is running at; then the CPU gets a little warmer and
thermal throttling kicks in and the CPU automatically starts running
at 25%; then you use the "cycles per second" you worked out earlier to
measure what you think is a 1 second delay and the delay will actually
be a 4 second delay.
If the CPU is running at 25% and you measure how many cycles per
second the CPU is running at; then the CPU gets a little cooler and
thermal throttling stops and the CPU automatically starts running at
100%; then you use the "cycles per second" you worked out earlier to
measure what you think is a 1 second delay and the delay will actually
be a 250 ms delay.
Is this what you mean by "slight error"? In some circles this amount
of error is called "gross negligence"...
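Both scenarios reduce to the same arithmetic: the delay loop waits for the number of cycles that took the intended time at the calibration rate, and those cycles then elapse at whatever the current rate is. A small sketch (actual_delay_seconds is a hypothetical helper; rates are fractions of full speed, so the absolute clock frequency cancels out):

```c
/* A delay calibrated at one clock rate but executed at another.
   Rates are fractions of full speed (1.0 = 100%, 0.25 = 25%). */
double actual_delay_seconds(double intended_seconds,
                            double calibration_rate,
                            double current_rate)
{
    /* Cycles the loop will wait for, based on the calibration. */
    double cycles = intended_seconds * calibration_rate;
    /* Time those cycles actually take at the current rate. */
    return cycles / current_rate;
}
```

Calibrating at 100% and running at 25% turns an intended 1 second into 4 seconds; the reverse turns it into 0.25 seconds.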
Cheers,
Brendan
Brendan, 6/17/2008 9:53:20 AM

Hi,
On Jun 17, 5:52 am, Cyril Novikov <spamt...@crayne.org> wrote:
> Brendan wrote:
> > AMD invented a new instruction to work around this problem - the
> > RDTSCP instruction. This instruction returns the TSC for the current
> > CPU and also returns the processor ID. The idea is that when you're
> > timing something you'd do something like:
>
> >     startTime and startCPU = RDTSCP
> >     *** something ***
> >     endTime and endCPU = RDTSCP
> >     if(startCPU != endCPU) {
> >         printf("Can't measure the cycles used - different CPUs used\n");
> >     } else {
> >         printf("Cycles used: %llu\n", endTime - startTime);
> >     }
>
> > Or alternatively (if what you're measuring can safely be repeated),
> > you could try:
>
> >     do {
> >         startTime and startCPU = RDTSCP
> >         *** something ***
> >         endTime and endCPU = RDTSCP
> >     } while(startCPU != endCPU);
> >     printf("Cycles used: %llu\n", endTime - startTime);
>
> > Of course this DOES NOT WORK - it suffers from the "ABA
> > problem" (note: for a description of this problem see:
> > "http://en.wikipedia.org/wiki/ABA_problem").
>
> Why doesn't it work, in capital letters? It does work. It gives you the exact
> amount of real time the execution of "*** something ***" took, even if it was
> partially executed on another CPU.
There are 2 uses for the TSC - measuring how many cycles a piece of code
consumes (e.g. for profiling purposes), and measuring real time.
For measuring how many cycles a piece of code consumes; if the piece
of code being measured runs for 1000 cycles on one CPU, then 1000
cycles on another CPU, then 1000 cycles on the first CPU, then the
correct result would be 3000 cycles. If the CPUs are running at
different speeds (which does happen due to power management
technologies like AMD's Cool'N'Quiet and Intel's SpeedStep) then the
result could be 6000 cycles, or 1250 cycles, or something else. There
are work-arounds for this problem (e.g. doing the test lots of times
and picking the most common result to avoid erroneous results), but if
you're using work-arounds like this then RDTSC will probably work as
well as RDTSCP.
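The "do the test lots of times and pick the most common result" work-around can be sketched as a mode over the collected samples. most_common is a hypothetical helper, and exact-value matching is an assumption; real cycle counts often jitter by a few cycles, in which case you would round them into buckets first.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Returns the most frequent value in samples[0..n-1] (n > 0), sorting
   the array in place. On a tie the smallest value wins. */
uint64_t most_common(uint64_t *samples, size_t n)
{
    qsort(samples, n, sizeof samples[0], cmp_u64);
    uint64_t best = samples[0];
    size_t best_run = 0, run = 0;
    for (size_t i = 0; i < n; i++) {
        /* Length of the run of equal values ending at i. */
        run = (i > 0 && samples[i] == samples[i - 1]) ? run + 1 : 1;
        if (run > best_run) {
            best_run = run;
            best = samples[i];
        }
    }
    return best;
}
```

For example, samples {3000, 6000, 3000, 1250, 3000, 6000} give 3000, discarding the readings distorted by a CPU switch or speed change.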
For measuring real time; RDTSC and RDTSCP are mostly useless within an
application - there are too many things that can go wrong that need to
be worked around by kernel code; and if the kernel code does implement
these work-arounds then the application would end up using an API
function instead of using RDTSC or RDTSCP directly. Of course in this
case the kernel doesn't really need RDTSCP to figure out which CPU
it's running on.
Cheers,
Brendan
Brendan, 6/17/2008 12:33:32 PM

On Mon, 16 Jun 2008 07:12:40 -0700 (PDT), Brendan
<spamtrap@crayne.org> wrote:
<snip>
> Basically the TSC is not the right tool for the job.
Especially since the OP has not told us what "the job" is yet!
>Use HPET, the local APIC timer, the RTC or the PIT;
>otherwise you'll need a collection of work-arounds for different cases
>just to get it slightly right (which will involve messing with the
>scheduler and using the local APIC's "thermal sensor" interrupt).
I'm concerned that since the OP says "I need to implement low cost
high resolution timer (on a scale of 1ms)" he may not be timing code,
but external events... like a stopwatch. If so, he needs to
understand that a multitasking OS like Windows or Linux is not going
to give 1msec resolution with *any* timer, since there will always be
much more than 1 msec uncertainty in the latency.
The only sure way that I can think of to do a decent stopwatch
(without messing with Ring 0) is to use the sound card. Connect a
switch and battery so the Line input sees 1.5V to start the interval
and 0V to end it. The card is AC-coupled so the Line input should not
be left open... use a SPDT switch between 1.5 and 0V, or use an SPST
switch to 1.5V but keep (say) 1K across the input.
Now when the input goes to 1.5V the ADC sees a positive spike, and when
it goes to 0 you get a negative spike. Monitor the data stream and
detect the 2 spikes, and count the samples between them. At 48000
samples per sec, that gives about 20.8 microsecond resolution and
essentially no latency issues.
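The counting step might look like this, as a sketch assuming the captured audio is already available as signed 16-bit samples; samples_between_spikes and interval_seconds are hypothetical helpers, and the single fixed threshold is an assumption that a real, noisier input might not satisfy. At 48000 samples per second each sample is 1/48000 s, about 20.8 microseconds.

```c
#include <stddef.h>

/* Returns the number of samples from the first sample above +threshold
   (switch closed: positive spike) to the first later sample below
   -threshold (switch opened: negative spike), or -1 if either is missing. */
long samples_between_spikes(const short *pcm, size_t n, short threshold)
{
    size_t i = 0;
    while (i < n && pcm[i] <= threshold)
        i++;
    if (i == n)
        return -1;
    size_t start = i;
    while (i < n && pcm[i] >= -threshold)
        i++;
    if (i == n)
        return -1;
    return (long)(i - start);
}

/* Converts a sample count to seconds at the given rate (e.g. 48000). */
double interval_seconds(long samples, double sample_rate)
{
    return samples / sample_rate;
}
```

With the positive spike at sample 10 and the negative spike at sample 58, for example, it counts 48 samples, i.e. 1 ms at 48000 samples per second.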
You never know the absolute start or stop time, but the interval is
accurate. There is a latency between the events and your knowledge of
them, since you have to wait for the sound card to return full buffers
of data (maybe 20 msec or so), but that is likely of no concern.
Obviously, this is non-trivial unless you are already comfortable with
sound card functions. And it will need different implementations for
Windows and Linux. But it will definitely work as a stopwatch!
Best regards,
Bob Masta
NoSpam, 6/17/2008 1:03:30 PM

Brendan mentioned:
....
> If the CPU is running at 100% and you measure how many cycles per
> second the CPU is running at; then the CPU gets a little warmer and
> thermal throttling kicks in and the CPU automatically starts running
> at 25%; then you use the "cycles per second" you worked out earlier to
> measure what you think is a 1 second delay and the delay will actually
> be a 4 second delay.
> If the CPU is running at 25% and you measure how many cycles per
> second the CPU is running at; then the CPU gets a little cooler and
> thermal throttling stops and the CPU automatically starts running at
> 100%; then you use the "cycles per second" you worked out earlier to
> measure what you think is a 1 second delay and the delay will actually
> be a 250 ms delay.
> Is this what you mean by "slight error"? In some circles this amount
> of error is called "gross negligence"...
Regardless of a CPU's 'thermal slowdown':
If I found any HW that needed "seconds" to respond, I'd throw it as
far as I can. The longest response delay I've ever seen is initialising a
PS/2 wheel mouse (~200 ms). Everything else needs 1 ms to 20 ms at most.
But right, the only reliable timer sources are the RTC and PIT, besides
VGA counters, USB frame counters, audio I/O or memory-mapped timers, and
of course my own designed/manufactured HW (a nanosecond crystal circuit
as a standard part of my own stereoscopic video-acquisition cards).
__
wolfgang
Wolfgang, 6/17/2008 10:18:42 PM

Hi,
On Jun 18, 7:18 am, "Wolfgang Kern" <spamt...@crayne.org> wrote:
> Brendan mentioned:
> > If the CPU is running at 100% and you measure how many cycles per
> > second the CPU is running at; then the CPU gets a little warmer and
> > thermal throttling kicks in and the CPU automatically starts running
> > at 25%; then you use the "cycles per second" you worked out earlier to
> > measure what you think is a 1 second delay and the delay will actually
> > be a 4 second delay.
> > If the CPU is running at 25% and you measure how many cycles per
> > second the CPU is running at; then the CPU gets a little cooler and
> > thermal throttling stops and the CPU automatically starts running at
> > 100%; then you use the "cycles per second" you worked out earlier to
> > measure what you think is a 1 second delay and the delay will actually
> > be a 250 ms delay.
> > Is this what you mean by "slight error"? In some circles this amount
> > of error is called "gross negligence"...
>
> Regardless of a CPU's 'thermal slowdown':
> If I found any HW that needed "seconds" to respond, I'd throw it as
> far as I can. The longest response delay I've ever seen is initialising a
> PS/2 wheel mouse (~200 ms). Everything else needs 1 ms to 20 ms at most.
Same thing at a different scale. For example, if you need a 1 ms delay and
don't know what the thermal throttling is doing, then you'd need to
use an "8 ms" delay to make sure the delay isn't too short (in case
calibration was done with thermal throttling but the delay occurs
without throttling). This "8 ms" delay may end up being a 1 ms delay
or a 64 ms delay, or something in-between.
> But right, the only reliable timer sources are the RTC and PIT, besides
> VGA counters, USB frame counters, audio I/O or memory-mapped timers, and
> of course my own designed/manufactured HW (a nanosecond crystal circuit
> as a standard part of my own stereoscopic video-acquisition cards).
The full list of possible timers would be RTC, PIT, local APIC timer
and HPET, where all of them may or may not be present but at least one
will be present (as HPET is used to emulate RTC and PIT in some
motherboards, and this emulation could be disabled by the OS - don't
know if any OSs do that though). I wouldn't rely on VGA-counters
unless you're writing a VGA driver, or USB framecounters unless you're
writing a USB controller driver, etc.
Cheers,
Brendan
Brendan, 6/18/2008 12:19:15 AM

Brendan wrote:
>
> The full list of possible timers would be RTC, PIT, local APIC timer
> and HPET,
and the ACPI Power Management Timer?
Regards,
Jike
Jike, 6/18/2008 2:30:31 AM

Hendrik van der Heijden <spamtrap@crayne.org> wrote:
>Tim Roberts wrote:
>>
>> Unfortunately, that's not true. On a multiprocessor machine,
>> QueryPerformanceCounter returns the raw cycle counter. The returned value
>> can even go backwards if you switch CPUs.
>
>AFAIK, the idea is that QueryPerformanceCounter should not have
>the problems of RDTSC with core migration and changing clock frequency.
>
>However, there were patches from MS (KB896256) and AMD to get
>the intended behaviour right on multicore/MP systems.
That's what the documents would lead you to think, but I have XP SP2 with
KB896256 installed on my Intel Core2 Duo 2GHz laptop, and after being
booted up for just about an hour, my two cores are currently 533,500,000
cycles apart.
However, QueryPerformanceCounter now seems to be using the ACPI timer, and
all calls seem to be forced to a single core, so the results are once again
monotonic.
--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.
Tim, 6/18/2008 5:34:26 AM

On Jun 17, 7:30 pm, Jike Song <spamt...@crayne.org> wrote:
> Brendan wrote:
>
> > The full list of possible timers would be RTC, PIT, local APIC timer
> > and HPET,
>
> and the ACPI Power Management Timer?
Isn't that just a slow counter from which you can't get interrupts?
Alex
Alexei, 6/18/2008 8:49:53 AM

Brendan replied:
....
>> Regardless of a CPU's 'thermal slowdown':
>> If I found any HW that needed "seconds" to respond, I'd throw it as
>> far as I can. The longest response delay I've ever seen is initialising a
>> PS/2 wheel mouse (~200 ms). Everything else needs 1 ms to 20 ms at most.
> Same thing at a different scale. For example, if you need a 1 ms delay and
> don't know what the thermal throttling is doing, then you'd need to
> use an "8 ms" delay to make sure the delay isn't too short (in case
> calibration was done with thermal throttling but the delay occurs
> without throttling). This "8 ms" delay may end up being a 1 ms delay
> or a 64 ms delay, or something in-between.
And that's why I use the PIT (1 ms IRQ) for HW delays and timeouts.
The CPU would have to throttle down by a huge factor before that added any
noticeable delay. 1 ms means 10**6 cycles on a 1 GHz CPU, and the
HW routines are short, i.e. ~500 cycles (+ I/O access) per IRQ on my OS.
The I/O access time doesn't get longer when the CPU is slowed down.
>> But right, the only reliable timer sources are the RTC and PIT, besides
>> VGA counters, USB frame counters, audio I/O or memory-mapped timers, and
>> of course my own designed/manufactured HW (a nanosecond crystal circuit
>> as a standard part of my own stereoscopic video-acquisition cards).
> The full list of possible timers would be RTC, PIT, local APIC timer
> and HPET, where all of them may or may not be present but at least one
> will be present (as HPET is used to emulate RTC and PIT in some
> motherboards, and this emulation could be disabled by the OS - don't
> know if any OSs do that though). I wouldn't rely on VGA-counters
> unless you're writing a VGA driver, or USB framecounters unless you're
> writing a USB controller driver, etc.
Right, USB and VGA-timers are not the best choice because they tend to
vary with modes and rates.
I remember playing with the LPT port as an additional IRQ timer
by just mounting an RC component between Strobe and ACK, so the
IRQ routine just had to toggle the Strobe pin to keep it running.
But the achievable frequency range was rather limited anyway.
__
wolfgang
Wolfgang, 6/18/2008 11:37:29 AM

"Tim Roberts" <spamtrap@crayne.org> wrote in message
news:an6h545q9dleuvb9c4t3kh1b43iivle1sj@4ax.com...
> Hendrik van der Heijden <spamtrap@crayne.org> wrote:
>
> >Tim Roberts wrote:
> >>
> >> Unfortunately, that's not true. On a multiprocessor machine,
> >> QueryPerformanceCounter returns the raw cycle counter. The returned value
> >> can even go backwards if you switch CPUs.
> >
> >AFAIK, the idea is that QueryPerformanceCounter should not have
> >the problems of RDTSC with core migration and changing clock frequency.
> >
> >However, there were patches from MS (KB896256) and AMD to get
> >the intended behaviour right on multicore/MP systems.
>
> That's what the documents would lead you to think, but I have XP SP2 with
> KB896256 installed on my Intel Core2 Duo 2GHz laptop, and after being
> booted up for just about an hour, my two cores are currently 533,500,000
> cycles apart.
>
> However, QueryPerformanceCounter now seems to be using the ACPI timer, and
> all calls seem to be forced to a single core, so the results are once again
> monotonic.
But the problem is you don't know whether MS's QueryPerformanceCounter is
going to use RDTSC (for uniprocessor machines), or use PIT or ACPI when
RDTSC is inconsistent (multi-core or multiple CPUs). So, if you don't know
the timebase QPC is using, how do you get the OP's reliable and consistent
1ms clock from QPC on Windows (forgetting that the OP needed Linux too...)?
According to the AMD manuals, even RDTSCP is inconsistent. It's only accurate
if and when "CPUID 8000_0007.edx[8] = 1"... QPC also has to come up with a
"solution" for processors like the Efficeon, which thermally throttles the CPU
but doesn't throttle RDTSC.
--quotes--
"QueryPerformanceCounter will use either the programmable-interval-timer
(PIT), or the ACPI power management timer (PMT), or the CPU-level
timestamp-counter (TSC). Accessing the PIT/PMT requires execution of slow
I/O port instructions and as a result the execution time for QPC is in the
order of microseconds. In contrast reading the TSC is on the order of 100
clock cycles (to read the TSC from the chip and convert it to a time value
based on the operating frequency). "
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6440250
"On single-core processors without Intel SpeedStep, QueryPerformanceCounter
is implemented using RDTSC. "
http://smallcode.weblogs.us/2007/12/07/performance-measurements-with-rdtsc/
"Vista actually uses the RDTSC instruction as a better way of ensuring
fairness when scheduling threads."
http://lyon-smith.org/blogs/code-o-rama/archive/2007/07/17/timing-code-on-windows-with-the-rdtsc-instruction.aspx
"...efficeon updates RDTSC independently of the actual CPU clock speed and
at a rate that corresponds to the maximum CPU frequency."
http://www.vanshardware.com/articles/2004/05/040517_efficeonFreeze/040517_efficeonFreeze.htm
"TSC drift can occur on K8 AMD multi-processor platforms and
single-processor dual-core platforms as they do not provide frequency
independent TSC. This drift does not occur on single-processor single-core
platforms for obvious reasons."
http://developer.amd.com/documentation/articles/Pages/1214200692_5.aspx
"The behavior of the RDTSCP instruction is implementation dependent. The TSC
counts at a constant rate, but may be affected by power management events
(such as frequency changes), depending on the processor implementation. If
CPUID 8000_0007.edx[8] = 1, then the TSC rate is ensured to be invariant
across all P-States, C-States, and stop-grant transitions (such as STPCLK
Throttling); therefore, the TSC is suitable for use as a source of time." -
AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System
Instructions, AMD64 24594 r3.14
Rod Pemberton
Rod, 6/18/2008 10:19:02 PM

Hi,
On Jun 18, 11:30 am, Jike Song <spamt...@crayne.org> wrote:
> Brendan wrote:
> > The full list of possible timers would be RTC, PIT, local APIC timer
> > and HPET,
>
> and the ACPI Power Management Timer?
Doh, I forgot about that one (never used it, to be honest).
On Jun 18, 5:49 pm, "Alexei A. Frounze" <spamt...@crayne.org> wrote:
> On Jun 17, 7:30 pm, Jike Song <spamt...@crayne.org> wrote:
> > and the ACPI Power Management Timer?
>
> Isn't that just a slow counter from which you can't get interrupts?
It's a 24-bit or 32-bit timer (width depends on chipset) operating at
roughly 3.5 MHz (roughly 286 ns resolution) that generates an "ACPI
event" (SMI or SCI) when it rolls over (which would happen about every
4.7 seconds with a 24-bit count, or about every 20 minutes with a 32-bit
count). It counts up, so if you can set its count (I'm not sure
whether its count can be set), you'd be able to use "0xFFFFFFFF -
delayTicks" (or "0x00FFFFFF - delayTicks") so that it rolls over and
generates an "ACPI event" after delayTicks have passed.
Note: AFAIK the ACPI specification doesn't say anything about being
able to change the count, and it does say that the OS is meant to use it
to measure idle time by taking the difference between readings. The
"ACPI event" is meant to be used to extend the counter's width, e.g.
turning it into a wider count with "fullCount =
(number_of_rollovers << N) | currentCount", where N is 32 or 24
depending on the size of the chipset's counter.
Cheers,
Brendan
Brendan, 6/19/2008 3:38:49 AM

Brendan wrote:
> It's a 24-bit or 32-bit timer (width depends on chipset) operating at
> roughly 3.5 MHz
To be specific, it operates at a nominal frequency of 3.579545 MHz,
a.k.a. the NTSC color burst frequency; also precisely 3 times the
frequency of the 8253/8254 PIT. In most PCs this is provided with an
accuracy of ±50 ppm (±180 Hz), but in systems equipped with temperature-
compensated crystal oscillators (TCXOs) it can be as good as ±1 ppm
(±3.6 Hz).
-hpa
H, 6/19/2008 6:50:55 AM