clock not synching with low stratum time server



Hi,

I have pasted the relevant part of my ntp.conf below:

server 127.127.1.0 version 3

fudge 127.127.1.0 stratum 12

server timeclripfa.akadns.net version 3

peer cla2astr version 3

peer cla3astr version 3

peer cla4astr version 3


When I run the xntpd (4.1.1) daemon with the above configuration, I find
that after some time the clock syncs with the local clock or with the other
peer servers (cla2astr-ics0, cla3astr-ics0 and cla4astr-ics0),
which have a very high stratum value (12). The server
timeclripfa.akadns.net has a stratum value of 2. I believe
that xntpd should choose a low-stratum server as its time source.
I ran ntpdate timeclripfa.akadns.net before starting xntpd.

Is there any reason why xntpd does not choose the low-stratum server
and falls back on the local clock for synchronization?

I ran xntpd in debug mode (immediately after running ntpdate against
timeclripfa.akadns.net) and found that the initial offset
immediately goes up to around 174 ms. This value keeps increasing to
more than +100 s, and my system does not sync at all.


Thanks,
Balaji


Reply Sankaran 6/17/2005 1:17:07 PM

What does 'ntpq -p' show on this system?

H
Reply Harlan 6/17/2005 7:14:34 PM


In article <mailman.101.1119014403.91305.questions@lists.ntp.isc.org>,
Sankaran Balaji <balajisa@india.hp.com> wrote:

> I have a configuration which i have pasted below in ntp.conf

You have not specified anything like the minimum required information.
We need to know hardware and operating system and the output of
the ntpq peers command for each of your servers, as a minimum.

> server 127.127.1.0 version 3

Specifying version 3 on a reference clock makes no sense, but is 
probably harmless.  The first question though is why you have
the local clock configured at all; if you cannot answer that, you
should not have it.

> If i run my xntpd(4.1.1) daemon with the above configuration, i found

xntpd is version 3, so 4.1.1 cannot be xntpd!  Version 3 is obsolete.

> I ran xntpd in debug mode ( immediately after running ntpdate with
> timeclripfa.akadns.net) and found that the initial offset difference
> immediately goes up to  around 174ms.. This value keeps increasing to
> more than +100secs and my system doesnot sync at all..

If it takes less than 2.5 days to reach 100s offset, you need to fix
your clock frequency problem before you even think about running 
time discipline software.  There really isn't enough information
in your posting to speculate more, but have a look at the recent
posting about Linux and Windows lost interrupts.  (Drifting out to
100s over any period of time is a no no, but 2.5 days would mean
that you are drifting at nearly 500ppm, the maximum possible
correctable frequency error.)
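
The arithmetic behind that parenthetical can be checked with a short sketch. The function name `drift_ppm` is only illustrative; the 100 s and 2.5 day figures are the ones quoted above.

```python
SECONDS_PER_DAY = 86_400


def drift_ppm(offset_seconds, elapsed_seconds):
    """Average frequency error, in parts per million, implied by an
    offset that accumulated over the given elapsed time."""
    return offset_seconds / elapsed_seconds * 1_000_000


# 100 s of accumulated offset over 2.5 days.
ppm = drift_ppm(100.0, 2.5 * SECONDS_PER_DAY)
print(f"{ppm:.0f} ppm")
```

Reaching 100 s any faster than that implies a frequency error above the 500 ppm limit mentioned above.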

> timeclripfa.akadns.net has a stratum value of 2. I believe
> that xntpd should choose a low stratum server as its time source.
> I ran ntpdate timeclripfa.akadns.net before starting xntpd.

> Is there any reason why xntpd does not choose the low startum server
> and falls back on local clock for synchronization ?

It's almost certainly being rejected as a falseticker before its
stratum can be considered.  One of the problems with local clocks
is that they give a falsely narrow error tolerance.

I think you are misusing local clocks, and probably misusing peering.
It is best to have at most one machine configured with the local clock as
a fallback, but if you have multiple ones, you should stagger their
strata by at least two and should not use peer relationships, but
establish a clear server hierarchy.

If you are going to use a local clock, you really need to have multiple
sources of true time that can outvote it.  However, most people shouldn't
have a local clock configured, and if you haven't understood why local
clocks can be undesirable, you probably shouldn't have one configured.

PS.  If at all possible, please use proper USENET software, as the 
email gateway to this newsgroup is broken.
Reply david 6/18/2005 9:49:05 AM

The most likely reason I can think of is that timeclripfa.akadns.net
becomes unreachable for some reason.

The output of ntpq -p taken, say, five minutes after startup and again 
after the local clock is selected as the synchronization source might be 
helpful.
Reply Richard 6/18/2005 2:55:29 PM

Hi,


Here is the information that you requested:


1) The hardware is Alpha and the OS is Tru64.


 xntpdc -p output (every 900 seconds)
 ------------------------------------

remote           local      st poll reach  delay   offset(secs)  disp
=======================================================================
=LOCAL(0)        127.0.0.1       12   64    3 0.00000  0.000000 3.93846
+cla2astr        5.0.0.0         16   64    0 0.00000  0.000000 0.00000
+cla3astr        5.0.0.0         16   64    0 0.00000  0.000000 0.00000
=timeclripfa     131.98.196.54    2   64    3 0.00188 -0.174708 3.93811


remote           local      st poll reach  delay   offset    disp
=======================================================================
*LOCAL(0)        127.0.0.1       12   64  377 0.00000  0.000000 0.00189
+cla2astr        10.0.0.1        14   64   17 0.00049 -0.275757 0.93919
+cla3astr        10.0.0.1        16   64    0 0.00000  0.000000 0.00000
=timeclripfa     131.98.196.54    2   64  377 0.00285  0.477008 0.00142


remote           local      st poll reach  delay   offset    disp
=======================================================================
*LOCAL(0)        127.0.0.1       12   64  377 0.00000  0.000000 0.00189
+cla2astr        10.0.0.1        14   64   17 0.00049  0.191304 0.43945
+cla3astr        10.0.0.1        14   64  374 0.00049  0.993265 0.00208
=timeclripfa     131.98.196.54    2   64  377 0.00284  1.122346 0.00143


remote           local      st poll reach  delay   offset    disp
=======================================================================
*LOCAL(0)        127.0.0.1       12   64  377 0.00000  0.000000 0.00191
+cla2astr        10.0.0.1        14   64  377 0.00049  0.462898 0.00200
+cla3astr        10.0.0.1        14   64  277 0.00049 -0.428132 0.00192
=timeclripfa     131.98.196.54    2   64  377 0.00188  1.834147 0.00143


The offset of server timeclripfa increased by approximately 0.5-0.7 s every
900 seconds. After around three days, xntpdc -p showed the following:

remote           local      st poll reach  delay   offset    disp
=======================================================================
=LOCAL(0)        127.0.0.1       12   64  377 0.00000  0.000000 0.00191
*cla2astr        10.0.0.1        13 1024  376 0.00615  0.002557 0.02705
+cla3astr        10.0.0.1        14 1024  377 0.00049 -0.000086 0.01656
=timeclripfa     131.98.196.54    2 1024  377 0.00383 131.74644 0.01530


The drift file contains the value 1.470 (after 3 days).


I used the local clock as a fallback reference in case server timeclripfa
goes down.

The other local peer servers (cla2astr, cla3astr) show approximately the
same kind of stats as this one (cla2astr), i.e. about 130 s of drift after
three days.
 
                                               
Thanks,
Balaji

Reply sbalaji79 6/20/2005 9:31:39 AM

sbalaji79@gmail.com wrote:

>Hi,
>
>
>Following are the information that you have requested:-
>
>
>1) Hardware is alpha and Os is Tru64
>
>
> XNTPDC -P output ( every 900 seconds)
> -------------------------------------
>
>
>
>
>remote           local      st poll reach  delay   offset(secs)  disp
>=======================================================================
>=LOCAL(0)        127.0.0.1       12   64    3 0.00000  0.000000 3.93846
>+cla2astr        5.0.0.0         16   64    0 0.00000  0.000000 0.00000
>+cla3astr        5.0.0.0         16   64    0 0.00000  0.000000 0.00000
>=timeclripfa     131.98.196.54    2   64    3 0.00188 -0.174708 3.93811
>
>
>
cla2astr and cla3astr have "reach" fields of 0, meaning that they are
unreachable.
The delay and offset fields are in milliseconds, not seconds.

>remote           local      st poll reach  delay   offset    disp
>=======================================================================
>*LOCAL(0)        127.0.0.1       12   64  377 0.00000  0.000000 0.00189
>+cla2astr        10.0.0.1        14   64   17 0.00049 -0.275757 0.93919
>+cla3astr        10.0.0.1        16   64    0 0.00000  0.000000 0.00000
>=timeclripfa     131.98.196.54    2   64  377 0.00285  0.477008 0.00142
>

cla3astr is still unreachable.

>remote           local      st poll reach  delay   offset    disp
>=======================================================================
>*LOCAL(0)        127.0.0.1       12   64  377 0.00000  0.000000 0.00189
>+cla2astr        10.0.0.1        14   64   17 0.00049  0.191304 0.43945
>+cla3astr        10.0.0.1        14   64  374 0.00049  0.993265 0.00208
>=timeclripfa     131.98.196.54    2   64  377 0.00284  1.122346 0.00143
>
>
>
cla2astr failed to respond at four of the last eight polling intervals;
cla3astr (reach 374) missed the two most recent polls.  The "reach" field
is an eight-bit shift register.  Each response to a query shifts a 1 bit
into the register from the right; each failure to respond shifts in a 0
bit.  The value is displayed in octal.  377 represents eight 1 bits.  17
represents four 0 bits followed by four 1 bits, i.e. four failures
followed by four successes.
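
As a rough illustration of the shift-register description, the sketch below expands an octal reach value into its eight poll results, oldest first. The function names are made up for this example and are not part of any NTP tool.

```python
def decode_reach(reach_octal):
    """Return the last eight poll results, oldest first (True = reply seen)."""
    value = int(reach_octal, 8) & 0xFF
    # Bit 7 is the oldest poll, bit 0 the most recent one.
    return [bool((value >> bit) & 1) for bit in range(7, -1, -1)]


def reach_bits(reach_octal):
    """Same information as a string of 1s and 0s, oldest poll on the left."""
    return "".join("1" if ok else "0" for ok in decode_reach(reach_octal))


# Reach values that appear in the tables in this thread.
for reach in ("377", "17", "374", "277", "0"):
    print(f"{reach:>3} -> {reach_bits(reach)}")
```

Reading the string left to right gives the poll history from oldest to newest, so a 0 at the right-hand end means the most recent poll went unanswered.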

>remote           local      st poll reach  delay   offset    disp
>=======================================================================
>*LOCAL(0)        127.0.0.1       12   64  377 0.00000  0.000000 0.00191
>+cla2astr        10.0.0.1        14   64  377 0.00049  0.462898 0.00200
>+cla3astr        10.0.0.1        14   64  277 0.00049 -0.428132 0.00192
>=timeclripfa     131.98.196.54    2   64  377 0.00188  1.834147 0.00143
>
>
>The offset value of server timeclripfa  increased
>0.5-0.7s(approximately) every
>900 seconds interval. After around three days , xntpdc -p value showed
>as following:
>remote           local      st poll reach  delay   offset    disp
>=======================================================================
>=LOCAL(0)        127.0.0.1       12   64  377 0.00000  0.000000 0.00191
>*cla2astr        10.0.0.1        13 1024  376 0.00615  0.002557 0.02705
>+cla3astr        10.0.0.1        14 1024  377 0.00049 -0.000086 0.01656
>=timeclripfa     131.98.196.54    2 1024  377 0.00383 131.74644 0.01530
>
>
>Drift file has got the following value : 1.470(after 3 days).
>
>
>I used local clock as a fall back reference just in case server
>timeclripfa
>goes down.
>
>Other local peer servers ( cla2astr,cla3astr) also has approximately
>same kind of stats as this one(cla2astr) i.e 130s drift after three
>days.
> 
>                                               
>Thanks,
>Balaji
>
>
In all cases, it appears that your local clock has been selected as the 
synchronization source (the * in the first column).   I'm not as 
familiar with the ntpdc display as I am with the ntpq display and I'm 
not certain what the "=" in column one means.  The local clock is 
drifting farther and farther from the correct time because it is 
unsynchronized.

In your original post you mentioned using "xntpd 4.1.1" but the "x" was 
dropped from the name at version 4.0.   So either it should be ntpd or 
it's not 4.1.1.

Finally, you mention a drift file, but the ntp.conf in your original 
post does not specify a drift file!   Did you really cut and paste the 
entire ntp.conf file or just the parts you thought were significant?

I would suggest the following ntp.conf:

logfile /var/ntp/ntp.log
driftfile /var/ntp/ntp.drift
server timeclripfa.akadns.net prefer
peer cla2astr
peer cla3astr
peer cla4astr

Be sure to create /var/ntp with permissions allowing ntpd to write to it.
Reply Richard 6/20/2005 12:58:57 PM

Thanks for your suggestion. I will try it. Tru64 keeps the same
naming convention for both the NTP v3 and NTP v4.1.1 binaries, so we still
use the name xntpd for version 4.

xntpdc displays the offset in seconds, whereas ntpq displays it in
milliseconds.

I pasted only the relevant portion of ntp.conf, so my initial posting does
not show the drift file.

The "=" in xntpdc output means that ntpd is operating in client mode with
server timeclripfa.

Reply sbalaji79 6/20/2005 1:29:24 PM

In article <1119259899.825087.5260@g49g2000cwa.googlegroups.com>,
sbalaji79@gmail.com wrote:

> Following are the information that you have requested:-

You haven't provided corresponding information for the other cla 
machines, but I would speculate that they are configured the same
except that they don't have any true source of time at all (or
are failing in the same way).

What you need to do is:

1) Replace peering with server-client relationships, leading away from
   the machine with the real source of time.  If you retain peering,
   you must eliminate the local clocks.

2) Preferably remove all local clocks.  The machines will free run using
   the last correction data if you don't have a local clock driver.

3) If you keep the local clocks, make sure that the machine with real
   time sources has several (e.g. four) good ones, so that it will outvote
   bogus local clocks in the falseticker algorithm, and remove them from
   the other machines.

4) If you cannot remove them from the other machines, stagger their
   strata by at least two and use server-client relationships, with each
   client's local clock stratum two higher than its server's.


> 1) Hardware is alpha and Os is Tru64

If Tru64 has chosen to call ntpd xntpd they have made a mistake that
is going to cause a lot of support problems.  It is generally reckoned
that xntpd was the wrong name to use.

> XNTPDC -P output ( every 900 seconds)

The preferred format is ntpq output.

> *LOCAL(0)        127.0.0.1       12   64  377 0.00000  0.000000 0.00189

This has been selected because there are no overlapping error 
intervals, so all but one source gets eliminated before considering
strata.

> =LOCAL(0)        127.0.0.1       12   64  377 0.00000  0.000000 0.00191
> *cla2astr        10.0.0.1        13 1024  376 0.00615  0.002557 0.02705
> +cla3astr        10.0.0.1        14 1024  377 0.00049 -0.000086 0.01656

cla2 gets selected because its offset is close enough to the local
clock that the error bounds overlap.  cla3's error bounds also overlap.
The local clock is now ruled out because of the rule that says that
it is the source of last resort during clustering.  timeclr is 
ruled out in the falseticker check because it doesn't overlap with the
other three.
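
The overlap argument can be illustrated with a simplified sketch. This is not ntpd's actual selection code: it assumes each source's error interval is offset +/- (delay/2 + disp) using the per-peer values from the xntpdc -p tables earlier in the thread, whereas ntpd works with root distance and then applies further clustering rules. The function names are hypothetical.

```python
def correctness_interval(offset, delay, disp):
    """Interval assumed to contain true time: offset +/- (delay/2 + disp)."""
    half_width = delay / 2 + disp
    return (offset - half_width, offset + half_width)


def largest_agreeing_group(sources):
    """Return the largest set of sources whose intervals share a common point."""
    intervals = {name: correctness_interval(*vals) for name, vals in sources.items()}
    best = set()
    # Any group with a common point also contains the largest lower edge
    # among its members, so testing every lower edge is sufficient.
    for candidate, _ in intervals.values():
        group = {n for n, (lo, hi) in intervals.items() if lo <= candidate <= hi}
        if len(group) > len(best):
            best = group
    return best


# (offset, delay, disp) in seconds, copied from the tables in this thread.
early = {
    "LOCAL(0)": (0.0, 0.0, 0.00191),
    "cla2astr": (0.462898, 0.00049, 0.00200),
    "cla3astr": (-0.428132, 0.00049, 0.00192),
    "timeclripfa": (1.834147, 0.00188, 0.00143),
}
after_three_days = {
    "LOCAL(0)": (0.0, 0.0, 0.00191),
    "cla2astr": (0.002557, 0.00615, 0.02705),
    "cla3astr": (-0.000086, 0.00049, 0.01656),
    "timeclripfa": (131.74644, 0.00383, 0.01530),
}

print(len(largest_agreeing_group(early)))
print(sorted(largest_agreeing_group(after_three_days)))
```

Under these assumptions no two sources overlap in the early snapshot, while after three days the local clock, cla2astr and cla3astr agree with one another and timeclripfa is left on its own.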

I guess that cla2 is using its local clock as source, and that is
why it shows stratum 13.

> Drift file has got the following value : 1.470(after 3 days).

That's remarkably low.  I suspect it is seeing its own local clock 
reflected back from the other machines.

> I used local clock as a fall back reference just in case server
> timeclripfa
> goes down.

In most cases, just letting the machines free run is enough.

> Other local peer servers ( cla2astr,cla3astr) also has approximately
> same kind of stats as this one(cla2astr) i.e 130s drift after three
> days.

The hardware needs fixing.  If the local clock is running with only 1.47ppm
correction, the true correction required (just) exceeds the capture range
of ntpd.  You need to find out why you have a 500ppm frequency error
and fix that before you do anything else, otherwise your clock will
get continually stepped (assuming you allow that - otherwise it will
run away by the excess of the required correction over 500ppm).
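
To put rough numbers on the "excess over 500ppm" remark, here is a sketch using the figures reported in the thread. It assumes "around three days" means exactly three days and treats 500 ppm as a hard cap on the correction; `runaway_seconds_per_day` is an illustrative name only.

```python
MAX_CORRECTION_PPM = 500.0  # capture range quoted in the thread
SECONDS_PER_DAY = 86_400


def runaway_seconds_per_day(required_ppm, cap_ppm=MAX_CORRECTION_PPM):
    """Offset growth per day left over once the correction is capped."""
    excess_ppm = max(0.0, abs(required_ppm) - cap_ppm)
    return excess_ppm * 1e-6 * SECONDS_PER_DAY


# 131.74644 s of offset against timeclripfa after an assumed three days.
observed_ppm = 131.74644 / (3 * SECONDS_PER_DAY) * 1e6
print(f"required correction: {observed_ppm:.1f} ppm")
print(f"uncorrectable drift: {runaway_seconds_per_day(observed_ppm):.2f} s/day")
```

With these inputs the required correction comes out only slightly above 500 ppm, which matches the "(just) exceeds" wording above.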
 
Reply david 6/20/2005 8:53:09 PM

Balaji,

Your client has 5 sources of time - the local clock, 3 peers using their
local clocks (one apparently down), and one actual server. NTP is looking
for some consensus in the time among all its servers and peers. What
I believe is happening is that all of the client systems are initially
closer to agreement with each other than with the server (they all choose
to synch with the first client), so they form that consensus among
themselves. The actual server, in the meantime, is proceeding forward at
a different frequency and diverges even further, so they all ignore it and
remain synchronized to one another instead.

You should have at least 3, preferably 4, real servers configured for
each client. If this is not possible, you will have to remove the local
clock (and probably the peers) from each client ntp.conf for at least as
long as it takes for each client to stabilize against the real server and
establish a characteristic drift rate. To re-initialize everything properly
on each client, you should stop (x)ntpd, delete ntp.drift, set the time
using ntpdate, and then start (x)ntpd with only real servers in ntp.conf.

You should verify that the server itself is a proper precision server,
configured and behaving properly, and that it does not have an absurd drift
rate. It does not appear to be unstable from the above data, but it or its
server(s) might, for example, be phony servers serving out local clocks.

-Tom
Reply Tom 6/21/2005 1:52:52 PM
