Now this is really annoying me now. I simply cannot find the reason, why this is happening. Is there any hope to turn this off? Regards, Edin -- Sicherheit gibt es nicht umsonst - zu keiner Zeit und nirgendwo! (Das BSI IT-Sicherheitshandbuch)
![]() |
0 |
![]() |
>Now this is really annoying me now. I simply cannot find the reason, why this is >happening. > >Is there any hope to turn this off? This should be in the FAQ. Have you looked there? One common cause is a software bug in the kernel timekeeping area. It adds too much/little to the time on each interrupt. Or your crystal could be way off. Or you could be missing interrupts. Does it happen when the system is idle, or only when it's busy? Laptops may have troubles with power-saving modes. I'm not familiar with this area. What is your "drift"? It's usually in /etc/ntp.drift, or something like that. (Look in your config file.) Or try the ntptime utility where it's called "frequency". If the drift is off by more than 500 ppm, ntpd has a sanity check/limit and won't correct enough to keep the clock accurate. So it drifts, and after a while it drifts far enough that ntpd takes a big step, prints that message, and starts over. -- The suespammers.org mail server is located in California. So are all my other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited commercial e-mail to my suespammers.org address or any of my other addresses. These are my opinions, not necessarily my employer's. I hate spam.
![]() |
0 |
![]() |
Hi Hal and thanks for the reply, Hal Murray wrote: >>Now this is really annoying me now. I simply cannot find the reason, why this is >>happening. >> >>Is there any hope to turn this off? > > > This should be in the FAQ. Have you looked there? Yes I have, but found nothing about it. I'll look again. > > One common cause is a software bug in the kernel timekeeping area. > It adds too much/little to the time on each interrupt. I use kernel 2.4.20. I have 8 other systems with the same configuration working perfectly fine. > > Or your crystal could be way off. :( > > Or you could be missing interrupts. Does it happen when the system > is idle, or only when it's busy? Every ~22 minutes: Jul 3 19:58:02 rufus ntpd[1285]: synchronisation lost Jul 3 20:21:41 rufus ntpd[1285]: time reset 0.395221 s Jul 3 20:21:41 rufus ntpd[1285]: synchronisation lost Jul 3 20:43:10 rufus ntpd[1285]: time reset 1.205013 s Jul 3 20:43:10 rufus ntpd[1285]: synchronisation lost Jul 3 21:04:46 rufus ntpd[1285]: time reset 0.467360 s Jul 3 21:04:46 rufus ntpd[1285]: synchronisation lost Jul 3 21:26:26 rufus ntpd[1285]: time reset 1.306115 s Jul 3 21:26:26 rufus ntpd[1285]: synchronisation lost Jul 3 21:48:08 rufus ntpd[1285]: time reset 0.556156 s Jul 3 21:48:08 rufus ntpd[1285]: synchronisation lost Jul 3 22:09:50 rufus ntpd[1285]: time reset 1.317970 s Jul 3 22:09:50 rufus ntpd[1285]: synchronisation lost Jul 3 22:32:29 rufus ntpd[1285]: time reset 0.568680 s Jul 3 22:32:29 rufus ntpd[1285]: synchronisation lost Jul 3 22:55:20 rufus ntpd[1285]: time reset 1.325117 s Jul 3 22:55:20 rufus ntpd[1285]: synchronisation lost > > Laptops may have troubles with power-saving modes. I'm not familiar > with this area. It's not a laptop but a big server. > > > What is your "drift"? It's usually in /etc/ntp.drift, or > something like that. (Look in your config file.) Or try the > ntptime utility where it's called "frequency". Jul 3 18:27:07 rufus ntpd[1285]: ntpd 4.1.1b@1.829 Fri Jun 13 16:00:49 CEST 2003 (1) Jul 3 18:27:07 rufus ntpd[1285]: precision = 17 usec Jul 3 18:27:07 rufus ntpd[1285]: kernel time discipline status 0040 Jul 3 18:27:07 rufus ntpd[1285]: frequency initialized 88.736 from /etc/ntpd.drift Jul 3 18:30:19 rufus ntpd[1285]: time reset -0.175969 s Jul 3 18:30:19 rufus ntpd[1285]: kernel time discipline status change 41 Jul 3 18:30:19 rufus ntpd[1285]: synchronisation lost > > If the drift is off by more than 500 ppm, ntpd has a sanity > check/limit and won't correct enough to keep the clock accurate. > So it drifts, and after a while it drifts far enough that ntpd > takes a big step, prints that message, and starts over. > Regards, Edin
![]() |
0 |
![]() |
In article <be28f0$e5mc$1@ID-139806.news.dfncis.de>, Edin Dizdarevic <edin.dizdarevic@interActive-Systems.de> wrote: >It's not a laptop but a big server. SMP? I've seen the existence of two cycle clocks make NTP unhappy, but nothing as bad as what you are seeing. Have you considered other things like heavy network load, lost interrupts, ... Do the other machines have identical hardware, or just something close?
![]() |
0 |
![]() |
Allen McIntosh wrote: > In article <be28f0$e5mc$1@ID-139806.news.dfncis.de>, > Edin Dizdarevic <edin.dizdarevic@interActive-Systems.de> wrote: > >>It's not a laptop but a big server. > > > SMP? I've seen the existence of two cycle clocks make NTP unhappy, but > nothing as bad as what you are seeing. Dual board but single Xeon 1.8G CPU > > Have you considered other things like heavy network load, lost interrupts, ... > Do the other machines have identical hardware, or just something close? This machine has nothing to do. Lost interrupts... could be. It's an SCSI sysstem with only one IDE CD-ROM drive. I disabled APIC, disabled DMA for CD-ROM, updated SCSI drivers, kernel 2.4.20...nothing helped. Other machines are "simple" Pentiums 4... Regards, Edin
![]() |
0 |
![]() |
-----BEGIN PGP SIGNED MESSAGE----- In article <be28f0$e5mc$1@ID-139806.news.dfncis.de>, edin.dizdarevic@interActive-Systems.de says... > [snip] >> What is your "drift"? It's usually in /etc/ntp.drift, or >> something like that. (Look in your config file.) Or try the >> ntptime utility where it's called "frequency". > >Jul 3 18:27:07 rufus ntpd[1285]: ntpd 4.1.1b@1.829 Fri Jun 13 16:00:49 >CEST 2003 (1) I have a client with a similar poweredge system running rh7.2, but we run ntpq 4.1.71-a Thu Jun 13 11:32:54 PDT 2002 (1) built from source. The last sync lost message was about a month ago. -----BEGIN PGP SIGNATURE----- Version: 4.5 iQCVAgUBPwhw19ZjPoeWO7BhAQGfewP/eACoEYjIPpw/3m2mZ2fgm9dE8A3v8TBZ sUubhOXlMQhM0oB8g91YGht+aZ5XrJOHeFCKjSNkYrfnt3WkLmb9O+hxvMbBAXjP FLmy2wuKuL5QQcT/A7lr+rKvwYn9oiYya8aoo+ChQiGDLJAATyTki0tkydpgunPg N5W26bLlTJw= =WCPk -----END PGP SIGNATURE-----
![]() |
0 |
![]() |
Carl Byington wrote: > -----BEGIN PGP SIGNED MESSAGE----- > > In article <be28f0$e5mc$1@ID-139806.news.dfncis.de>, > edin.dizdarevic@interActive-Systems.de says... > > > [snip] > > >>>What is your "drift"? It's usually in /etc/ntp.drift, or >>>something like that. (Look in your config file.) Or try the >>>ntptime utility where it's called "frequency". >> >>Jul 3 18:27:07 rufus ntpd[1285]: ntpd 4.1.1b@1.829 Fri Jun 13 16:00:49 >>CEST 2003 (1) > > > I have a client with a similar poweredge system running rh7.2, but we run > ntpq 4.1.71-a Thu Jun 13 11:32:54 PDT 2002 (1) built from source. The last > sync lost message was about a month ago. How similar and which kernel version do you use? Greetings, Edin > > > > -----BEGIN PGP SIGNATURE----- > Version: 4.5 > > iQCVAgUBPwhw19ZjPoeWO7BhAQGfewP/eACoEYjIPpw/3m2mZ2fgm9dE8A3v8TBZ > sUubhOXlMQhM0oB8g91YGht+aZ5XrJOHeFCKjSNkYrfnt3WkLmb9O+hxvMbBAXjP > FLmy2wuKuL5QQcT/A7lr+rKvwYn9oiYya8aoo+ChQiGDLJAATyTki0tkydpgunPg > N5W26bLlTJw= > =WCPk > -----END PGP SIGNATURE----- >
![]() |
0 |
![]() |
On Mon, 07 Jul 2003 16:27:15 +0000, Hal Murray Combobulated: >>How would one go about diagnosing lost interrupts? > > Write a loop that: > reads the cycle counter and the system time waits/sleeps a while > reads them again > does the arithmetic to compare the differences. > > You have to calibrate the speed of your CPU. > > If you want something more than a hack, you need to do something like > read the cycle counter both before and after getting the system time, > and try again if the difference is large, say because the scheduler gave > the CPU to another task or you got a page fault or ... > > You can obviously use the reads from the end of one loop as the readings > for the beginning of the next loop. Uhhhg........ You're way over my head here, Hal! I'm not a programmer but I can follow code somewhat. You wouldn't have a link to some documentation or examples of what you just explained would you? -- Ciao, CRH 8^)>
![]() |
0 |
![]() |
-----BEGIN PGP SIGNED MESSAGE----- In article <bea6d1$2t592$1@ID-139806.news.dfncis.de>, edin.dizdarevic@interActive-Systems.de says... > >> I have a client with a similar poweredge system running rh7.2, but we run >> ntpq 4.1.71-a Thu Jun 13 11:32:54 PDT 2002 (1) built from source. The last >> sync lost message was about a month ago. > >How similar and which kernel version do you use? One system is stock RH7.2 2.4.7-10smp #1 SMP Thu Sep 6 17:09:31 EDT 2001 i686 unknown on a dual processor poweredge of some flavor. That system is about 18 months old at this point, and I don't have the exact model number here. That one is running just fine with ntp 4.1.71-a. Another system is stock RH8.0 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386 GNU/Linux on a single processor poweredge of some flavor. That system is about 6 months old, and I don't have the exact model number here either. This was running ntp 4.1.1.a from the stock RH8.0 distribution, but I just noticed that it had your same problem with excessive lost sync. So I compiled ntp from source on that machine, and it is now running 4.1.71-a. But it still has the same problem. Jul 7 18:12:45 mail3 ntpd[12685]: time reset 1.470749 s Jul 7 18:12:45 mail3 ntpd[12685]: synchronisation lost Jul 7 18:32:05 mail3 ntpd[12685]: time reset 1.031199 s Jul 7 18:32:05 mail3 ntpd[12685]: synchronisation lost Jul 7 18:55:51 mail3 ntpd[12685]: time reset 1.140142 s Jul 7 18:55:51 mail3 ntpd[12685]: synchronisation lost Jul 7 19:28:15 mail3 ntpd[12685]: time reset 0.529740 s Jul 7 19:28:15 mail3 ntpd[12685]: synchronisation lost Jul 7 20:15:51 mail3 ntpd[12685]: time reset 0.561480 s Jul 7 20:15:51 mail3 ntpd[12685]: synchronisation lost -----BEGIN PGP SIGNATURE----- Version: 4.5 iQCVAgUBPwpBttZjPoeWO7BhAQG15wP9FBpEHmF/fFVKfy4Qo6dZvtBXaLshT5Qn wTp+5Q8GKUUAgyqZx/ADnjf6+Kzc6aIv9cr+MxwUx8E8UAv58nBmq4ZYkGQxj8M5 ehkDKW+wwHbzLXnSyW+r9V8j60AfJDlKLHBbKT/TYoPIb8qI4sIMBbpdwMnhJ69l oitWHOvIWHY= =xFVM -----END PGP SIGNATURE-----
![]() |
0 |
![]() |
>Uhhhg........ You're way over my head here, Hal! I'm not a programmer but >I can follow code somewhat. You wouldn't have a link to some documentation >or examples of what you just explained would you? I don't have any clean code. It's the sort of thing that gets hacked up to chase some particular problem. You can try the program at: http://www.megapathdsl.net/~hmurray/temp/cpu (Compiled on RH 7.2) On my system, it prints this: Cycles MicroSec MHz uSec to gettimeofday first second 2821021234 4994934 564.776 2.243 0.930 2824212558 5000583 564.777 3.070 0.930 2824204896 5000568 564.777 2.943 0.930 2824212945 5000581 564.777 3.205 0.930 It's computing the CPU frequency from the cycle counter. This is a 566 MHz machine, but the spread-spectrum hacks on the CPU clock reduce that a bit. It may not work right on an SMP system. -- The suespammers.org mail server is located in California. So are all my other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited commercial e-mail to my suespammers.org address or any of my other addresses. These are my opinions, not necessarily my employer's. I hate spam.
![]() |
0 |
![]() |
Hal Murray wrote: >>Uhhhg........ You're way over my head here, Hal! I'm not a programmer but >>I can follow code somewhat. You wouldn't have a link to some documentation >>or examples of what you just explained would you? > > > I don't have any clean code. It's the sort of thing that > gets hacked up to chase some particular problem. > > You can try the program at: > http://www.megapathdsl.net/~hmurray/temp/cpu > (Compiled on RH 7.2) > > On my system, it prints this: > > Cycles MicroSec MHz uSec to gettimeofday > first second > > 2821021234 4994934 564.776 2.243 0.930 > 2824212558 5000583 564.777 3.070 0.930 > 2824204896 5000568 564.777 2.943 0.930 > 2824212945 5000581 564.777 3.205 0.930 > > It's computing the CPU frequency from the cycle counter. > > This is a 566 MHz machine, but the spread-spectrum hacks > on the CPU clock reduce that a bit. > > It may not work right on an SMP system. > But what does it all mean? Cycles MicroSec MHz uSec to gettimeofday first second 8981007560 5005743 1794.141 1.157 0.798 8988970204 5010181 1794.141 1.113 0.798 8988971524 5010182 1794.141 1.454 0.798 8988968944 5010180 1794.141 1.228 0.798 8988971748 5010182 1794.141 1.503 0.798 8988970560 5010179 1794.142 1.271 0.798 8988973648 5010183 1794.141 1.135 0.798 8988965460 5010178 1794.141 1.026 0.798 8988969384 5010180 1794.141 0.898 0.805 8988971760 5010181 1794.141 0.903 0.798 8988968960 5010179 1794.141 0.916 0.798 8988970564 5010180 1794.141 0.910 0.798 8988970432 5010180 1794.141 1.026 0.798 8988970604 5010181 1794.141 1.032 0.798 8988969732 5010178 1794.142 0.903 0.798 8988970172 5010180 1794.141 0.910 0.798 8988976384 5010182 1794.142 2.127 0.798 8988967324 5010179 1794.141 1.300 0.798 8988968224 5010177 1794.142 0.916 1.699 8988972180 5010180 1794.142 1.066 0.798 8988969140 5010179 1794.141 0.901 0.805 8988970104 5010179 1794.141 1.101 0.798 8988970304 5010178 1794.142 0.914 0.798 8988970104 5010179 1794.141 1.014 0.798 8988969612 5010178 1794.142 0.916 0.798 8988972016 5010180 1794.141 1.026 0.798 8988973296 6210114 1447.473 1.467 0.989 <<<<<<< What's this? 8988982348 5010185 1794.142 1.028 0.798 8988957572 5010172 1794.141 1.063 0.798 8988968676 5010177 1794.142 1.039 0.798 8988969552 5010178 1794.142 1.037 0.798 8988970416 5010179 1794.142 0.903 0.798 8988972852 5010179 1794.142 1.034 0.798 8988967596 5010176 1794.142 0.907 0.798 8988973560 5010180 1794.142 1.126 0.805 8988968020 5010177 1794.142 0.898 0.805 8988970004 5010991 1793.851 0.981 0.805 8988970260 5011419 1793.698 1.033 0.798 8988970132 5011286 1793.745 1.061 0.798 8988970488 5009279 1794.464 0.905 0.805 8988971156 5009393 1794.423 1.126 0.798 8988971480 5009499 1794.385 1.070 0.798 8988969384 5009574 1794.358 1.030 0.805 8988968584 5009637 1794.335 0.905 0.805 8988971276 5009725 1794.304 1.030 0.805 8988971068 5009626 1794.340 1.034 0.798 8988970832 5009363 1794.434 0.912 0.798 8988970420 5009396 1794.422 1.005 0.798 8988970004 5009705 1794.311 1.043 0.805 8988969440 5010015 1794.200 1.164 0.798 8988968804 5010071 1794.180 0.916 0.798 8988971656 5009922 1794.234 1.030 0.805 8988971620 5009552 1794.366 1.023 0.798 8988969760 5011050 1793.830 1.015 0.798 8988970204 5011026 1793.838 1.342 0.798 Regards, Edin -- Sicherheit gibt es nicht umsonst - zu keiner Zeit und nirgendwo! (Das BSI IT-Sicherheitshandbuch)
![]() |
0 |
![]() |
>But what does it all mean? It goes into a loop, sleeping for 5 seconds... First column is number of cycles (during that sleep) Second column is microseconds. Third column is computed cycle counter frequency. > Cycles MicroSec MHz uSec to gettimeofday > first second > > 8981007560 5005743 1794.141 1.157 0.798 > 8988970204 5010181 1794.141 1.113 0.798 > 8988971524 5010182 1794.141 1.454 0.798 > 8988968944 5010180 1794.141 1.228 0.798 .... Looks normal. You have a 1.8 GHz system. > 8988969612 5010178 1794.142 0.916 0.798 > 8988972016 5010180 1794.141 1.026 0.798 > 8988973296 6210114 1447.473 1.467 0.989 <<<<<<< What's this? > 8988982348 5010185 1794.142 1.028 0.798 That's why this program is a hack. You have to poke around and/or hack the code to print more info when strange things happen. Did your time get reset by 1.2 seconds at that point? -- The suespammers.org mail server is located in California. So are all my other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited commercial e-mail to my suespammers.org address or any of my other addresses. These are my opinions, not necessarily my employer's. I hate spam.
![]() |
0 |
![]() |
Hal Murray wrote: >>But what does it all mean? > > > It goes into a loop, sleeping for 5 seconds... > > First column is number of cycles (during that sleep) > Second column is microseconds. > Third column is computed cycle counter frequency. > > > >> Cycles MicroSec MHz uSec to gettimeofday >> first second >> >> 8981007560 5005743 1794.141 1.157 0.798 >> 8988970204 5010181 1794.141 1.113 0.798 >> 8988971524 5010182 1794.141 1.454 0.798 >> 8988968944 5010180 1794.141 1.228 0.798 > > > ... > > Looks normal. You have a 1.8 GHz system. ;) > > > >> 8988969612 5010178 1794.142 0.916 0.798 >> 8988972016 5010180 1794.141 1.026 0.798 >> 8988973296 6210114 1447.473 1.467 0.989 <<<<<<< What's this? >> 8988982348 5010185 1794.142 1.028 0.798 > > > That's why this program is a hack. You have to poke around > and/or hack the code to print more info when strange things > happen. > > Did your time get reset by 1.2 seconds at that point? > How do you get that value? -- Sicherheit gibt es nicht umsonst - zu keiner Zeit und nirgendwo! (Das BSI IT-Sicherheitshandbuch)
![]() |
0 |
![]() |