Hi,
I went through this mailing list and searched Google several
times, but I still found no answer or hint about where to look.
So, the basic problem is that I have a couple of linux boxes
that need to be synchronized via NTP. Those boxes have all the
very same hardware and configuration.
Running the kernel 2.4.24 NTP works fine. For various reasons
there is the need to run -- at least a few of them -- with
kernel 2.6.9 and here it happens. NTP complains. The Log states:
ntpd[22804]: frequency error -512 PPM exceeds tolerance 500 PPM
Using "adjtimex" does not alleviate the problem at all. The
kernel 2.6.9 was configured via "make oldconfig" from kernel
2.4.24.
Does anybody have a hint about where to look, or even a solution to
the problem? For better debugging, here are some further facts about the
environment:
ntp 4.2.0a-11
os Debian Sarge with kernel 2.6.9 resp. 2.4.24
Any help is welcome!
- goetz
--
Dipl. Inform Goetz Lichtwald
University of Karlsruhe, Germany
Institute of Telematics
Zirkel 2, 76128 Karlsruhe
Phone: +49 (0) 721 - 608 6404
Fax: +49 (0) 721 - 38 80 97
PGP-Key: http://www.tm.uka.de/~lichtwald/mykey.asc

Goetz | 1/17/2005 7:29:20 AM

On 2005-01-17, Goetz Lichtwald <lichtwald@tm.uka.de> wrote:
> Running the kernel 2.4.24 NTP works fine. For various reasons
> there is the need to run -- at least a few of them -- with
> kernel 2.6.9 and here it happens. NTP complains. The Log states:
>
> ntpd[22804]: frequency error -512 PPM exceeds tolerance 500 PPM
There have been reports that disabling the APIC at boot time helps
with this issue.
--
Steve Kostecke <kostecke@ntp.isc.org>
NTP Public Services Project - http://ntp.isc.org/

Steve | 1/17/2005 2:17:47 PM

A long shot (in the dark):
Check the value of HZ. If it's not 100, try 100.
/Björn

Bjorn | 1/17/2005 4:29:05 PM

In article <mailman.25.1105956066.588.questions@lists.ntp.isc.org>,
Goetz Lichtwald <lichtwald@tm.uka.de> wrote:
> ntpd[22804]: frequency error -512 PPM exceeds tolerance 500 PPM
Either you are losing a lot of timer interrupts (I can't remember
if this would be reported as a positive (correction) or negative
(error) value), or your motherboard crystal is way out.
In the former case, you must fix the device driver that is losing the
interrupts (or reduce the kernel interrupt rate - 100Hz, as on older
and slower systems, is usually OK).
In the latter case, it is indicative of hardware that is too unreliable
to use for precision timekeeping, and you should replace the hardware,
although the tickadj utility, which modifies the kernel's idea of
the duration of a tick, can be used to compensate in multiples of 100ppm.
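
To make the 100ppm granularity concrete, here is a minimal Python sketch of the arithmetic. It assumes a nominal tick of 10000 microseconds (a 100Hz user-visible rate); the function name, the sign convention and the example value of -512 are illustrative assumptions, not the output of any real tool.

```python
NOMINAL_TICK_US = 10000  # assumed: microseconds per tick at a 100 Hz user-visible rate

def split_correction(ppm_needed, nominal_tick_us=NOMINAL_TICK_US):
    """Split a frequency correction into a coarse tick change and a fine residual.

    Changing the tick by 1 us out of 10000 us shifts the clock rate by 100 ppm,
    so the tick can only absorb the correction in 100 ppm steps.
    """
    ppm_per_tick_step = 1_000_000 / nominal_tick_us
    tick_steps = round(ppm_needed / ppm_per_tick_step)
    residual_ppm = ppm_needed - tick_steps * ppm_per_tick_step
    return nominal_tick_us + tick_steps, residual_ppm

# Hypothetical example: a -512 ppm correction (sign convention assumed).
tick, residual = split_correction(-512)
print(tick, residual)
```

Under these assumptions the coarse step takes up 500ppm and leaves a residual well inside the 500ppm tolerance that ntpd complains about, which is why a tick adjustment can mask the problem without fixing its cause.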

david | 1/17/2005 10:00:03 PM

Goetz Lichtwald <lichtwald@tm.uka.de> writes:
> Hi,
>
>
> I got through this mailing list and asked google for several
> times and still, I found no answer or hint where to look at.
>
> So, the basic problem is that I have a couple of linux boxes
> that need to be synchronized via NTP. Those boxes have all the
> very same hardware and configuration.
>
> Running the kernel 2.4.24 NTP works fine. For various reasons
> there is the need to run -- at least a few of them -- with
> kernel 2.6.9 and here it happens. NTP complains. The Log states:
>
> ntpd[22804]: frequency error -512 PPM exceeds tolerance 500 PPM
I hope you did both tests on the same hardware...
>
> Using "adjtimex" does not alleviate the problem at all. The
> kernel 2.6.9 was configured via "make oldconfig" from kernel
> 2.4.24.
>
> Does anybody has a hint where to look at or even a solution to
> the problem. For better debugging, some further facts about the
> environment:
> ntp 4.2.0a-11
> os Debian Sarge with kernel 2.6.9 resp. 2.4.24
I can really recommend the 2.4 kernels with PPSkit.
Regards,
Ulrich

Ulrich | 1/21/2005 10:35:38 AM

You didn't mention the hardware that you were using. I was just
looking at the 2.6 kernel source in the time keeping area the
other day. There actually appear to be quite a few changes
since I looked at it a while back (not sure if that was 2.2 or 2.4).
I am NOT an expert kernel hacker.
However, as I read the 2.6 code, it seems to me:
-- HZ has been taken out of the .config, so it is not possible
to just change HZ in the .config and recompile
-- it looks like the "actual HZ" used by the timer interrupt
has been fixed at 1000 in the ".h" file (for the Intel i386
architecture anyway)
-- the kernel "lies" to "user space" (i.e. applications) that
the HZ is 100 (called "user HZ") so that applications think
that the standard "jiffy" is 10 ms.
-- the timer interrupt does some "fancy footwork" using either
the TSC (a simple clock pulse counter built into later
Pentium chips) or a thing called something like the "HPET" timer
(doing this from memory, so I may have that acronym wrong)
to deduce when timer interrupts are lost and correct for this
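
The "user HZ" point can be checked from user space. The following is a minimal Python sketch, assuming a Unix-like system where `os.sysconf` is available; it shows only what the kernel reports to applications, not the internal timer interrupt rate.

```python
import os

# SC_CLK_TCK is the tick rate the kernel reports to user space ("user HZ").
# It need not match the rate at which the timer interrupt actually fires.
user_hz = os.sysconf("SC_CLK_TCK")

# The jiffy length that applications would infer from that value.
apparent_jiffy_ms = 1000 / user_hz

print(f"user-visible HZ: {user_hz}, apparent jiffy: {apparent_jiffy_ms} ms")
```

If this reports 100 while the kernel was built with an internal rate of 1000, that matches the reading of the source described above.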
I will add to this some speculation:
-- Originally, almost all Unixes and Linuxes on PC style hardware
ran the clock interrupt at 100Hz, or once every 10 ms. Note
that various I/O driver code must lock off interrupts occasionally
to avoid "race conditions" in messing with data structures that
are modified both in interrupt code and in non-interrupt code.
An interrupt occurring while interrupts are locked out is
stored in a one-bit "register". If two interrupts should occur
while interrupts are locked out then when interrupts are again
enabled, the interrupt service routine is only executed once.
This means that one increment of the system time will not be made.
-- A lost clock interrupt (without the fancy footwork of the 2.6
kernel) would cause time to fall behind by 10 ms. (100 Hz clock)
or by 1 ms. (1000 Hz clock).
-- If the interrupts are locked out for more than 10 ms. (100 Hz
clock) or for more than 1 ms. (1000 Hz clock) the time would
fall behind.
-- The time the interrupts are locked out depends critically on
the type of other hardware on this system (e.g. whether DMA
is available for certain devices, and which drivers are
being used)
-- The time interrupts are locked out depends on the speed of
the CPU; the same code takes longer to run on slower CPUs
-- Older CPUs may not have the timers to correct for lost interrupts;
it DOESN'T look like the 2.6 kernel sets the clock to 100 Hz
in the absence of a TSC or something to use to avoid lost
interrupts.
-- The 2.6 kernel seems to have more support for the newer CPUs
that allow the speed to be stepped up and down depending on
computation needs, temperature, or who knows what else. I
haven't been able to follow what happens in this case, e.g.
whether the TSC becomes an unreliable indicator for lost
interrupts.
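
As a rough sanity check on the lost-interrupt speculation, the following Python sketch estimates how often a tick would have to be lost to account for an error of the size in the original report. It assumes that lost ticks are the only source of error and that nothing corrects for them; the function name is illustrative.

```python
def lost_tick_interval(hz, error_ppm):
    """Seconds between lost ticks that would produce the given error.

    Each lost tick drops 1/hz seconds; an error of N ppm means the clock
    loses N microseconds for every second of real time.
    """
    tick_seconds = 1.0 / hz
    lost_seconds_per_second = abs(error_ppm) / 1_000_000
    return tick_seconds / lost_seconds_per_second

# The 512 ppm figure comes from the ntpd log line in the original post.
for hz in (100, 1000):
    print(hz, "Hz:", round(lost_tick_interval(hz, 512), 2), "s between lost ticks")
```

With these assumptions, 512 PPM corresponds to losing one tick roughly every 20 seconds at 100 Hz, or roughly every 2 seconds at 1000 Hz.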
Perhaps someone who works on the time keeping code in the kernel
monitors this list and can comment on the above. (Then again,
based on my experience, maybe not.) Back in the RedHat 7.x series,
someone at RedHat stepped up the HZ from 100 to 1000 and timekeeping
on my old machine was rotten until I recompiled the kernel for
100 Hz. Of course, I don't believe the kernel in those days
attempted to detect lost interrupts. Now it appears this is
happening again, but Linux is supporting only the newer machines
that have the TSC timer and/or are fast enough that interrupts
aren't locked out for too long.

John | 1/23/2005 3:04:01 AM