Hi,
I have the following configuration in ntp.conf:
server 127.127.1.0 version 3
fudge 127.127.1.0 stratum 12
server timeclripfa.akadns.net version 3
peer cla2astr version 3
peer cla3astr version 3
peer cla4astr version 3
If I run my xntpd (4.1.1) daemon with the above configuration, I find
that after some time the clock syncs with the local clock or with the other
peer servers (cla2astr-ics0, cla3astr-ics0 and cla4astr-ics0),
which have a very high stratum value (12). The server
timeclripfa.akadns.net has a stratum value of 2. I believe
that xntpd should choose a low-stratum server as its time source.
I ran ntpdate timeclripfa.akadns.net before starting xntpd.
Is there any reason why xntpd does not choose the low-stratum server
and falls back on the local clock for synchronization?
I ran xntpd in debug mode (immediately after running ntpdate with
timeclripfa.akadns.net) and found that the initial offset
immediately goes up to around 174 ms. This value keeps increasing to
more than +100 s and my system does not sync at all.
Thanks,
Balaji
Sankaran, 6/17/2005 1:17:07 PM
What does 'ntpq -p' show on this system?
H
Harlan, 6/17/2005 7:14:34 PM
In article <mailman.101.1119014403.91305.questions@lists.ntp.isc.org>,
Sankaran Balaji <balajisa@india.hp.com> wrote:
> I have a configuration which i have pasted below in ntp.conf
You have not specified anything like the minimum required information.
We need to know hardware and operating system and the output of
the ntpq peers command for each of your servers, as a minimum.
> server 127.127.1.0 version 3
Specifying version 3 on a reference clock makes no sense, but is
probably harmless. The first question though is why you have
the local clock configured at all; if you cannot answer that, you
should not have it.
> If i run my xntpd(4.1.1) daemon with the above configuration, i found
xntpd is version 3, so 4.1.1 cannot be xntpd! Version 3 is obsolete.
> I ran xntpd in debug mode ( immediately after running ntpdate with
> timeclripfa.akadns.net) and found that the initial offset difference
> immediately goes up to around 174ms.. This value keeps increasing to
> more than +100secs and my system doesnot sync at all..
If it takes less than 2.5 days to reach 100s offset, you need to fix
your clock frequency problem before you even think about running
time discipline software. There really isn't enough information
in your posting to speculate more, but have a look at the recent
posting about Linux and Windows lost interrupts. (Drifting out to
100s over any period of time is a no no, but 2.5 days would mean
that you are drifting at nearly 500ppm, the maximum possible
correctable frequency error.)
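As a rough check of the arithmetic above, here is a minimal Python sketch. The function name is illustrative only and is not part of any NTP tool; it simply converts an accumulated offset and an elapsed time into an average frequency error.

```python
SECONDS_PER_DAY = 86_400

def drift_ppm(offset_seconds, elapsed_days):
    """Average frequency error, in parts per million, implied by an offset
    that accumulated over the given number of days."""
    return offset_seconds / (elapsed_days * SECONDS_PER_DAY) * 1e6

# 100 s accumulated over 2.5 days, the case discussed above
print(round(drift_ppm(100, 2.5), 1))
```

This comes out at about 463 ppm, which is why 2.5 days is described as being close to the 500 ppm limit.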
> timeclripfa.akadns.net has a stratum value of 2. I believe
> that xntpd should choose a low stratum server as its time source.
> I ran ntpdate timeclripfa.akadns.net before starting xntpd.
> Is there any reason why xntpd does not choose the low startum server
> and falls back on local clock for synchronization ?
It's almost certainly being rejected as a falseticker before its
stratum can be considered. One of the problems with local clocks
is that they give a falsely narrow error tolerance.
I think you are misusing local clocks, and probably misusing peering.
It is best to have at most one machine configured with the local clock as
a fallback, but if you have multiple ones, you should stagger their
strata by at least two and should not use peer relationships, but
establish a clear server hierarchy.
If you are going to use a local clock, you really need to have multiple
sources of true time that can outvote it. However, most people shouldn't
have a local clock configured, and if you haven't understood why local
clocks can be undesirable, you probably shouldn't have one configured.
PS. If at all possible, please use proper USENET software, as the
email gateway to this newsgroup is broken.
david, 6/18/2005 9:49:05 AM
Sankaran Balaji wrote:
>Hi,
>
>I have a configuration which i have pasted below in ntp.conf
>
>server 127.127.1.0 version 3
>
>fudge 127.127.1.0 stratum 12
>
>server timeclripfa.akadns.net version 3
>
>peer cla2astr version 3
>
>peer cla3astr version 3
>
>peer cla4astr version 3
>
>
>If i run my xntpd(4.1.1) daemon with the above configuration, i found
>that after some time clock syncs with the local clock or other
>peer servers( cla2astr-ics0, cla3astr-ics0 and cla4astr-ics0 )
>which have a very high stratum value (12). The server
>timeclripfa.akadns.net has a stratum value of 2. I believe
>that xntpd should choose a low stratum server as its time source.
>I ran ntpdate timeclripfa.akadns.net before starting xntpd.
>
>Is there any reason why xntpd does not choose the low startum server
>and falls back on local clock for synchronization ?
>
>I ran xntpd in debug mode ( immediately after running ntpdate with
>timeclripfa.akadns.net) and found that the initial offset difference
>immediately goes up to around 174ms.. This value keeps increasing to
>more than +100secs and my system doesnot sync at all..
>
>
>Thanks,
>Balaji
>
>
>
>
The most likely reason I can think of is that timeclripfa.akadns.net
becomes unreachable for some reason.
The output of ntpq -p taken, say, five minutes after startup and again
after the local clock is selected as the synchronization source might be
helpful.
Richard, 6/18/2005 2:55:29 PM
Hi,
Here is the information you requested:
1) Hardware is Alpha and the OS is Tru64.
xntpdc -p output (every 900 seconds; offsets in seconds)
--------------------------------------------------------
remote        local          st  poll  reach    delay      offset     disp
==========================================================================
=LOCAL(0)     127.0.0.1      12    64      3  0.00000    0.000000  3.93846
+cla2astr     5.0.0.0        16    64      0  0.00000    0.000000  0.00000
+cla3astr     5.0.0.0        16    64      0  0.00000    0.000000  0.00000
=timeclripfa  131.98.196.54   2    64      3  0.00188   -0.174708  3.93811

remote        local          st  poll  reach    delay      offset     disp
==========================================================================
*LOCAL(0)     127.0.0.1      12    64    377  0.00000    0.000000  0.00189
+cla2astr     10.0.0.1       14    64     17  0.00049   -0.275757  0.93919
+cla3astr     10.0.0.1       16    64      0  0.00000    0.000000  0.00000
=timeclripfa  131.98.196.54   2    64    377  0.00285    0.477008  0.00142

remote        local          st  poll  reach    delay      offset     disp
==========================================================================
*LOCAL(0)     127.0.0.1      12    64    377  0.00000    0.000000  0.00189
+cla2astr     10.0.0.1       14    64     17  0.00049    0.191304  0.43945
+cla3astr     10.0.0.1       14    64    374  0.00049    0.993265  0.00208
=timeclripfa  131.98.196.54   2    64    377  0.00284    1.122346  0.00143

remote        local          st  poll  reach    delay      offset     disp
==========================================================================
*LOCAL(0)     127.0.0.1      12    64    377  0.00000    0.000000  0.00191
+cla2astr     10.0.0.1       14    64    377  0.00049    0.462898  0.00200
+cla3astr     10.0.0.1       14    64    277  0.00049   -0.428132  0.00192
=timeclripfa  131.98.196.54   2    64    377  0.00188    1.834147  0.00143

The offset of server timeclripfa increased by approximately 0.5-0.7 s
every 900-second interval. After around three days, xntpdc -p showed
the following:

remote        local          st  poll  reach    delay      offset     disp
==========================================================================
=LOCAL(0)     127.0.0.1      12    64    377  0.00000    0.000000  0.00191
*cla2astr     10.0.0.1       13  1024    376  0.00615    0.002557  0.02705
+cla3astr     10.0.0.1       14  1024    377  0.00049   -0.000086  0.01656
=timeclripfa  131.98.196.54   2  1024    377  0.00383   131.74644  0.01530

The drift file has the following value: 1.470 (after 3 days).
I used the local clock as a fallback reference just in case server
timeclripfa goes down.
The other local peer servers (cla2astr, cla3astr) also have approximately
the same kind of stats as this one (cla2astr), i.e. 130 s drift after
three days.
Thanks,
Balaji
sbalaji79, 6/20/2005 9:31:39 AM
sbalaji79@gmail.com wrote:
>Hi,
>
>
>Following are the information that you have requested:-
>
>
>1) Hardware is alpha and Os is Tru64
>
>
> XNTPDC -P output ( every 900 seconds)
> -------------------------------------
>
>
>
>
>remote local st poll reach delay offset(secs)
>disp^M
>=======================================================================
>=LOCAL(0) 127.0.0.1 12 64 3 0.00000 0.000000 3.93846
>+cla2astr 5.0.0.0 16 64 0 0.00000 0.000000 0.00000
>+cla3astr 5.0.0.0 16 64 0 0.00000 0.000000 0.00000
>=timeclripfa 131.98.196.54 2 64 3 0.00188 -0.174708 3.93811
>
>
>
cla2astr and cla3astr have "reach" fields of 0, meaning that they are
unreachable.
The delay and offset fields are in milliseconds, not seconds.
>remote local st poll reach delay offset disp
>=======================================================================
>*LOCAL(0) 127.0.0.1 12 64 377 0.00000 0.000000 0.00189
>+cla2astr 10.0.0.1 14 64 17 0.00049 -0.275757
>0.93919
>+cla3astr 10.0.0.1 16 64 0 0.00000 0.000000
>0.00000
>=timeclripfa 131.98.196.54 2 64 377 0.00285 0.477008 0.00142
>
cla3astr is still unreachable.
>remote local st poll reach delay offset disp^M
>=======================================================================
>*LOCAL(0) 127.0.0.1 12 64 377 0.00000 0.000000 0.00189
>+cla2astr 10.0.0.1 14 64 17 0.00049 0.191304
>0.43945
>+cla3astr 10.0.0.1 14 64 374 0.00049 0.993265
>0.00208
>=timeclripfa 131.98.196.54 2 64 377 0.00284 1.122346 0.00143
>
>
>
cla2astr (reach 17) failed to respond at four of the last eight polling
intervals; cla3astr (reach 374) answered six polls and then missed the two
most recent. The "reach" field is an eight-bit shift register.
Each response to a query shifts a 1 bit into the register from the right.
Each failure to respond shifts in a 0 bit.
The values are displayed in octal. 377 represents eight 1 bits. 17
represents four 0 bits followed by four 1 bits, i.e. four failures
followed by four successes.
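To make the reach values easier to read, here is a minimal Python sketch that expands the octal number into its eight poll results. The helper name is made up for illustration; it is not an ntpq or xntpdc feature.

```python
def decode_reach(octal_text):
    """Expand an octal reach value into eight characters, oldest poll first.
    '1' means the peer answered that poll, '0' means it did not."""
    value = int(octal_text, 8) & 0xFF   # the register holds eight bits
    return format(value, "08b")

for reach in ("377", "17", "374", "277", "376", "0"):
    bits = decode_reach(reach)
    print(reach.rjust(3), bits, "answered", bits.count("1"), "of 8")
```

Reading left to right gives the history from oldest to newest poll, so 17 expands to 00001111 (four misses, then four answers) and 374 expands to 11111100 (six answers, then the two most recent polls missed).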
>remote local st poll reach delay offset disp^M
>=======================================================================
>*LOCAL(0) 127.0.0.1 12 64 377 0.00000 0.000000 0.00191
>+cla2astr 10.0.0.1 14 64 377 0.00049 0.462898
>0.00200
>+cla3astr 10.0.0.1 14 64 277 0.00049 -0.428132
>0.00192
>=timeclripfa 131.98.196.54 2 64 377 0.00188 1.834147 0.00143
>
>
>The offset value of server timeclripfa increased
>0.5-0.7s(approximately) every
>900 seconds interval. After around three days , xntpdc -p value showed
>as following:
>remote local st poll reach delay offset disp
>=======================================================================
>=LOCAL(0) 127.0.0.1 12 64 377 0.00000 0.000000 0.00191
>*cla2astr 10.0.0.1 13 1024 376 0.00615 0.002557
>0.02705
>+cla3astr 10.0.0.1 14 1024 377 0.00049 -0.000086
>0.01656
>=timeclripfa 131.98.196.54 2 1024 377 0.00383 131.74644 0.01530
>
>
>Drift file has got the following value : 1.470(after 3 days).
>
>
>I used local clock as a fall back reference just in case server
>timeclripfa
>goes down.
>
>Other local peer servers ( cla2astr,cla3astr) also has approximately
>same kind of stats as this one(cla2astr) i.e 130s drift after three
>days.
>
>
>Thanks,
>Balaji
>
>
In all cases, it appears that your local clock has been selected as the
synchronization source (the * in the first column). I'm not as
familiar with the ntpdc display as I am with the ntpq display and I'm
not certain what the "=" in column one means. The local clock is
drifting farther and farther from the correct time because it is
unsynchronized.
In your original post you mentioned using "xntpd 4.1.1" but the "x" was
dropped from the name at version 4.0. So either it should be ntpd or
it's not 4.1.1.
Finally, you mention a drift file, but the ntp.conf in your original
post does not specify a drift file! Did you really cut and paste the
entire ntp.conf file or just the parts you thought were significant?
I would suggest the following ntp.conf:
logfile /var/ntp/ntp.log
driftfile /var/ntp/ntp.drift
server timeclripfa.akadns.net prefer
peer cla2astr
peer cla3astr
peer cla4astr
Be sure to create /var/ntp with permissions allowing ntpd to write to it.
Richard, 6/20/2005 12:58:57 PM
Thanks for your suggestion. I will try it. Tru64 keeps the same
naming convention for both the NTP v3 and NTP v4.1.1 binaries, so we
still use the name xntpd for version 4.
xntpdc displays the offset in seconds, whereas ntpq displays it in
milliseconds.
I pasted only the relevant portion of ntp.conf, which is why my initial
posting does not show a drift file.
The "=" in xntpdc output means that ntpd is operating in client mode with
that server (here, timeclripfa).
sbalaji79, 6/20/2005 1:29:24 PM
In article <1119259899.825087.5260@g49g2000cwa.googlegroups.com>,
sbalaji79@gmail.com wrote:
> Following are the information that you have requested:-
You haven't provided corresponding information for the other cla
machines, but I would speculate that they are configured the same
except that they don't have any true source of time at all (or
are failing in the same way).
What you need to do is:
1) replace peering by server client relationships, leading away from
the machine with the real source of time. If you retain peering,
you must eliminate the local clocks.
2) Preferably remove all local clocks. The machines will free run using
the last correction data if you don't have a local clock driver.
3) If you keep the local clocks, make sure that the machine with real
time sources has several (e.g. four) good ones, so that it will outvote
bogus local clocks in the falseticker algorithm, and remove them from
the other machines.
4) If you cannot remove them from the other machines, stagger their
strata by at least two and use server-client relationships, with the
client's local clock at the higher stratum (two above its server's).
> 1) Hardware is alpha and Os is Tru64
If Tru64 has chosen to call ntpd xntpd they have made a mistake that
is going to cause a lot of support problems. It is generally reckoned
that xntpd was the wrong name to use.
> XNTPDC -P output ( every 900 seconds)
The preferred format is ntpq output.
> *LOCAL(0) 127.0.0.1 12 64 377 0.00000 0.000000 0.00189
This has been selected because there are no overlapping error
intervals, so all but one source gets eliminated before considering
strata.
> =LOCAL(0) 127.0.0.1 12 64 377 0.00000 0.000000 0.00191
> *cla2astr 10.0.0.1 13 1024 376 0.00615 0.002557 0.02705
> +cla3astr 10.0.0.1 14 1024 377 0.00049 -0.000086 0.01656
cla2 gets selected because its offset is close enough to the local
clock that the error bounds overlap. cla3's error bounds also overlap.
The local clock is now ruled out because of the rule that says that
it is the source of last resort during clustering. timeclr is
ruled out in the falseticker check because it doesn't overlap with the
other three.
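The overlap argument above can be illustrated with a small Python sketch. This is a deliberately simplified stand-in, not ntpd's actual intersection and clustering code: it assumes each source's error interval is offset +/- (dispersion + delay / 2), and it only finds the largest group of sources whose intervals share a common point. The function and variable names are invented for the example.

```python
def largest_overlapping_group(sources):
    """Return the names of the largest set of sources whose error intervals
    share at least one common point.

    sources maps a name to (offset, delay, dispersion), all in seconds.
    Each interval is offset +/- (dispersion + delay / 2): an assumed,
    simplified error bound, not ntpd's exact root-distance calculation.
    """
    events = []
    for name, (offset, delay, disp) in sources.items():
        half_width = disp + delay / 2
        events.append((offset - half_width, 0, name))  # 0: interval opens
        events.append((offset + half_width, 1, name))  # 1: interval closes
    events.sort()  # at equal points, openings sort before closings

    active, best = set(), set()
    for _, kind, name in events:
        if kind == 0:
            active.add(name)
            if len(active) > len(best):
                best = set(active)
        else:
            active.discard(name)
    return best

# Values from the xntpdc snapshot taken after about three days
after_three_days = {
    "LOCAL(0)":    (0.000000, 0.00000, 0.00191),
    "cla2astr":    (0.002557, 0.00615, 0.02705),
    "cla3astr":    (-0.000086, 0.00049, 0.01656),
    "timeclripfa": (131.74644, 0.00383, 0.01530),
}
print(sorted(largest_overlapping_group(after_three_days)))
```

With these numbers the local clock, cla2astr and cla3astr form the agreeing group, and timeclripfa, more than 131 s away, falls outside it, which matches the falseticker explanation given here.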
I guess that cla2 is using its local clock as source, and that is
why it shows stratum 13.
> Drift file has got the following value : 1.470(after 3 days).
That's remarkably low. I suspect it is seeing its own local clock
reflected back from the other machines.
> I used local clock as a fall back reference just in case server
> timeclripfa
> goes down.
In most cases, just letting the machines free run is enough.
> Other local peer servers ( cla2astr,cla3astr) also has approximately
> same kind of stats as this one(cla2astr) i.e 130s drift after three
> days.
The hardware needs fixing. If the local clock is running with only 1.47ppm
correction, the true correction required (just) exceeds the capture range
of ntpd. You need to find out why you have a 500ppm frequency error
and fix that before you do anything else, otherwise your clock will
get continually stepped (assuming you allow that - otherwise it will
run away by the excess of the required correction over 500ppm).
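For a rough sense of scale, the reported timeclripfa offsets can be turned into frequency errors and compared with the 500 ppm limit mentioned above. This Python sketch assumes the four snapshots really were 900 s apart and that "around three days" means exactly three; both are approximations, and the names are illustrative.

```python
MAX_CORRECTION_PPM = 500.0   # correction limit as stated in this thread

def rate_ppm(offset_change_s, interval_s):
    """Frequency error implied by an offset change over an interval."""
    return offset_change_s / interval_s * 1e6

# timeclripfa offsets (seconds) from the four xntpdc snapshots
snapshots = [-0.174708, 0.477008, 1.122346, 1.834147]
short_term = [rate_ppm(b - a, 900) for a, b in zip(snapshots, snapshots[1:])]

# 131.74644 s accumulated over roughly three days
long_term = rate_ppm(131.74644, 3 * 86_400)

for label, rate in [("per snapshot", r) for r in short_term] + [("three days", long_term)]:
    verdict = "exceeds" if rate > MAX_CORRECTION_PPM else "within"
    print(f"{label}: {rate:.1f} ppm ({verdict} the {MAX_CORRECTION_PPM:.0f} ppm limit)")
```

Under those assumptions every figure is above 500 ppm, the three-day average only slightly so, which is consistent with the remark that the required correction just exceeds the capture range.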
david, 6/20/2005 8:53:09 PM
sbalaji79@gmail.com wrote:
> Hi,
>
>
> Following are the information that you have requested:-
>
>
> 1) Hardware is alpha and Os is Tru64
>
>
> XNTPDC -P output ( every 900 seconds)
> -------------------------------------
>
>
>
>
> remote local st poll reach delay offset(secs)
> disp^M
> =======================================================================
> =LOCAL(0) 127.0.0.1 12 64 3 0.00000 0.000000 3.93846
> +cla2astr 5.0.0.0 16 64 0 0.00000 0.000000 0.00000
> +cla3astr 5.0.0.0 16 64 0 0.00000 0.000000 0.00000
> =timeclripfa 131.98.196.54 2 64 3 0.00188 -0.174708 3.93811
>
>
> remote local st poll reach delay offset disp
> =======================================================================
> *LOCAL(0) 127.0.0.1 12 64 377 0.00000 0.000000 0.00189
> +cla2astr 10.0.0.1 14 64 17 0.00049 -0.275757
> 0.93919
> +cla3astr 10.0.0.1 16 64 0 0.00000 0.000000
> 0.00000
> =timeclripfa 131.98.196.54 2 64 377 0.00285 0.477008 0.00142
> remote local st poll reach delay offset disp^M
> =======================================================================
> *LOCAL(0) 127.0.0.1 12 64 377 0.00000 0.000000 0.00189
> +cla2astr 10.0.0.1 14 64 17 0.00049 0.191304
> 0.43945
> +cla3astr 10.0.0.1 14 64 374 0.00049 0.993265
> 0.00208
> =timeclripfa 131.98.196.54 2 64 377 0.00284 1.122346 0.00143
>
>
> remote local st poll reach delay offset disp^M
> =======================================================================
> *LOCAL(0) 127.0.0.1 12 64 377 0.00000 0.000000 0.00191
> +cla2astr 10.0.0.1 14 64 377 0.00049 0.462898
> 0.00200
> +cla3astr 10.0.0.1 14 64 277 0.00049 -0.428132
> 0.00192
> =timeclripfa 131.98.196.54 2 64 377 0.00188 1.834147 0.00143
>
>
> The offset value of server timeclripfa increased
> 0.5-0.7s(approximately) every
> 900 seconds interval. After around three days , xntpdc -p value showed
> as following:
> remote local st poll reach delay offset disp
> =======================================================================
> =LOCAL(0) 127.0.0.1 12 64 377 0.00000 0.000000 0.00191
> *cla2astr 10.0.0.1 13 1024 376 0.00615 0.002557
> 0.02705
> +cla3astr 10.0.0.1 14 1024 377 0.00049 -0.000086
> 0.01656
> =timeclripfa 131.98.196.54 2 1024 377 0.00383 131.74644 0.01530
>
>
> Drift file has got the following value : 1.470(after 3 days).
>
>
> I used local clock as a fall back reference just in case server
> timeclripfa
> goes down.
>
> Other local peer servers ( cla2astr,cla3astr) also has approximately
> same kind of stats as this one(cla2astr) i.e 130s drift after three
> days.
>
>
> Thanks,
> Balaji
>
Balaji,
Your client has 5 sources of time - the local clock, 3 peers using their
local clocks (one apparently down), and one actual server. NTP is looking
for some consensus in the time among all its servers and peers. What
I believe is happening is that all of the client systems are initially
closer to agreement with each other than with the server (they all choose
to synch with the first client), so they form that consensus among
themselves. The actual server, in the meantime, is proceeding forward at
a different frequency and diverges even further, so they all ignore it and
remain synchronized to one another instead.
You should have at least 3, preferably 4, real servers configured for
each client. If this is not possible, you will have to remove the local
clock (and probably the peers) from each client ntp.conf for at least as
long as it takes for each client to stabilize against the real server and
establish a characteristic drift rate. To re-initialize everything properly
on each client, you should stop (x)ntpd, delete ntp.drift, set the time
using ntpdate, and then start (x)ntpd with only real servers in ntp.conf.
You should verify that the server itself is a proper precision server,
configured and behaving properly, and that it does not have an absurd drift
rate. It does not appear to be unstable from the above data, but it or its
server(s) might, for example, be phony servers serving out local clocks.
-Tom
Tom, 6/21/2005 1:52:52 PM