Client doesn't drop failed source


We were trying to see how NTP failover worked before implementing an
accurate time system on an isolated network.  Situation: Two NTP GPS
clocks are defined as the primary source, two others (belonging to a
different organization) are secondary sources.  To test we set up a client
that synchronizes with the four clocks, and ntpq> peers shows one of the
primaries with a "*" in the first column, meaning it was selected as the
time source; the other 3 show "+" as expected, meaning they are usable
sources.  We disconnect the antenna of the selected clock.  As expected
the clock (Spectracom) changes its stratum from 1 to 16 after a delay, and
on the next poll (interval is 1024 seconds) ntpq shows its stratum as 16
and the clock is not considered a source (first column shows blank), and
the second primary has the "*".  We reconnect the antenna and disconnect
the other primary antenna.  It selects the first primary as expected.  We
disconnect both antennas and it selects one of the backup clocks as source
as expected.  We connect the antennas and it goes back to one of the
primaries.

Now the question: For the next test, we disconnect the LAN cable of the
selected primary clock, and we expect the next poll to fail and the client
to give up on it and to select the other primary clock.  No, we just see 
the time from the last poll incrementing well beyond the poll interval
(1024 seconds) but the disconnected clock still selected as primary!
When we connect the cable it fails over to the other primary.

How do we get the system to fail over to another clock if the one it was
using fails?  The system isn't Unix (it is OpenVMS), but it does use
Unix-type NTP commands/programs (ntpq, ntpdate, ntpdc).
Reply moroney 1/8/2010 8:45:04 PM

On 1/8/2010 12:45 PM, Michael Moroney wrote:
> Now the question: For the next test, we disconnect the
>   LAN cable of the selected primary clock, and we expect
>   the next poll to fail and the client to give up on it
>   and to select the other primary clock.
>  No, we just see the time from the last poll incrementing
>   well beyond the poll interval (1024 seconds) but the
>   disconnected clock still selected as primary!
>  When we connect the cable it fails over to the other primary.
>
> How do we get the system to fail over to another clock
>  if the one it was using fails?  The system isn't a Unix
>  (OpenVMS) it does use Unix type NTP commands/programs
>  (ntpq, ntpdate, ntpdc)

I suspect you will have to supply your .conf file,
 or otherwise explain your minclock minsane prefer, ...
 settings, if you want many useful answers.

-- 
E-Mail Sent to this address <BlackList@Anitech-Systems.com>
  will be added to the BlackLists.
Reply E 1/8/2010 9:41:07 PM


Michael Moroney wrote:

> that synchronizes with the four clocks, and ntpq> peers shows one of the
> primaries with a "*" in the first column meaning it was synchronized to as
> time source, the other 3 show "+" as expected, meaning they are usable
> sources.  We disconnect the antenna of the selected clock.  As expected

* and + mean they are used, not just that they are usable.  The time 
used to discipline the clock is a weighted average of all of them.

> the clock (Spectracom) changes its stratum from 1 to 16 after a delay, and

> Now the question: For the next test, we disconnect the LAN cable of the
> selected primary clock, and we expect the next poll to fail and the client
> to give up on it and to select the other primary clock.  No, we just see 
> the time from the last poll incrementing well beyond the poll interval
> (1024 seconds) but the disconnected clock still selected as primary!

If it remains the best choice it could take up to 8192 seconds before 
all the samples are flushed from the FIFO and it becomes completely 
ineligible.  There is logic to discourage early switching, as clock 
hopping is, itself, considered a bad thing.
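
To make the 8192 figure concrete, here is a rough Python sketch of the 8-bit reachability register that ntpq shows in octal in its "reach" column. It assumes the register is shifted left once per poll and that a reply sets the low bit. The function name and the 1024-second poll interval are only for illustration; this is not ntpd's actual code.

```python
POLL_SECONDS = 1024  # poll interval from the original post


def reach_after(missed_polls, reach=0o377):
    """Return the 8-bit reachability register after N polls with no reply."""
    for _ in range(missed_polls):
        # Shift left once per poll; a reply would OR in a 1 here.
        reach = (reach << 1) & 0xFF
    return reach


for missed in range(9):
    elapsed = missed * POLL_SECONDS
    print(f"{missed} missed polls ({elapsed:4d} s): reach = {reach_after(missed):03o}")
```

Starting from a fully reachable source (377), the register only reaches 000 after eight missed polls, which is 8 * 1024 = 8192 seconds at this poll interval.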

> When we connect the cable it fails over to the other primary.
> 
> How do we get the system to fail over to another clock if the one it was
> using fails?  The system isn't a Unix (OpenVMS) it does use Unix type NTP
> commands/programs (ntpq, ntpdate, ntpdc)

As hinted at the beginning, except in terms of the stratum, root 
distance and root dispersion that it announces downstream, ntpd does not 
fail over; it uses all the good clocks up to a configurable limit.


Reply David 1/8/2010 9:57:02 PM

David Woolley <david@ex.djwhome.demon.invalid> writes:

>Michael Moroney wrote:

>> that synchronizes with the four clocks, and ntpq> peers shows one of the
>> primaries with a "*" in the first column meaning it was synchronized to as
>> time source, the other 3 show "+" as expected, meaning they are usable
>> sources.  We disconnect the antenna of the selected clock.  As expected

>* and + mean they are used, not just that they are usable.  The time 
>used to discipline the clock is a weighted average of all of them.

If it uses all of them, what is the difference between "*" and "+"?
It still doesn't explain why the disconnected clock doesn't get dropped.

>> the clock (Spectracom) changes its stratum from 1 to 16 after a delay, and

>If it remains the best choice it could take up to 8192 seconds before 
>all the samples are flushed from the FIFO and it becomes completely 
>ineligible.  There is logic to discourage early switching, as clock 
>hopping is, itself, considered a bad thing.

OK, is 8192 the default time before a nonresponsive host gets dropped?
Reply moroney 1/9/2010 2:32:52 AM

E-Mail Sent to this address will be added to the BlackLists <Null@BlackList.Anitech-Systems.invalid> writes:

>I suspect you will have to supply your .conf file,
> or otherwise explain your minclock minsane prefer, ...
> settings, if you want many useful answers.

Very simple.

driftfile SYS$SPECIFIC:[TCPIP$NTP]TCPIP$NTP.DRIFT

server 10.136.3.70 prefer
server 10.136.3.71 prefer
server 10.98.63.50
server 10.98.63.150


Reply moroney 1/9/2010 2:34:56 AM

Michael Moroney wrote:
> David Woolley <david@ex.djwhome.demon.invalid> writes:
> 
>> Michael Moroney wrote:
> 
>>> that synchronizes with the four clocks, and ntpq> peers shows one of the
>>> primaries with a "*" in the first column meaning it was synchronized to as
>>> time source, the other 3 show "+" as expected, meaning they are usable
>>> sources.  We disconnect the antenna of the selected clock.  As expected
> 
>> * and + mean they are used, not just that they are usable.  The time 
>> used to discipline the clock is a weighted average of all of them.
> 
> If it uses all of them, what is the difference between "*" and "+"?
> It still doesn't explain why the disconnected clock doesn't get dropped.
> 

"*" designates the primary source of time.  "+" designates an "advisory" 
source.  It's available for use if something happens to the primary 
source.

See Dave Mills' book on NTP, and RFC-1305.  RFC-1305 is for V3 of NTP. 
There's an RFC in preparation for V4 but as of the last time I heard it 
was being written by a committee and may be released sometime in the 
next decade!


Reply Richard 1/9/2010 2:48:15 AM

 	Hello Richard ,

On Fri, 8 Jan 2010, Richard B. Gilbert wrote:
> Michael Moroney wrote:
>> David Woolley <david@ex.djwhome.demon.invalid> writes:
>> 
>>> Michael Moroney wrote:
>> 
>>>> that synchronizes with the four clocks, and ntpq> peers shows one of the
>>>> primaries with a "*" in the first column meaning it was synchronized to 
>>>> as
>>>> time source, the other 3 show "+" as expected, meaning they are usable
>>>> sources.  We disconnect the antenna of the selected clock.  As expected
>> 
>>> * and + mean they are used, not just that they are usable.  The time used 
>>> to discipline the clock is a weighted average of all of them.
>> 
>> If it uses all of them, what is the difference between "*" and "+"?
>> It still doesn't explain why the disconnected clock doesn't get dropped.
>> 
>
> "*" designates the primary source of time.  "+" designates an "advisory" 
> source.  It's available for use if something happens to the primary source.
>
> See Dave Mills' book on NTP, and RFC-1305.  RFC-1305 is for V3 of NTP. 
> There's an RFC in preparation for V4 but as of the last time I heard it was 
> being written by a committee and may be released sometime in the next decade!

 	http://www.eecis.udel.edu/~mills/ntp/html/ntpdc.html

 	To Quote:
"
peers
     Obtains a list of peers for which the server is maintaining state, along 
with a summary of that state. Summary information includes the address of the 
remote peer, the local interface address (0.0.0.0 if a local address has yet to 
be determined), the stratum of the remote peer (a stratum of 16 indicates the 
remote peer is unsynchronized), the polling interval, in seconds, the 
reachability register, in octal, and the current estimated delay, offset and 
dispersion of the peer, all in seconds.

     The character in the left margin indicates the mode this peer entry is 
operating in. A + denotes symmetric active, a - indicates symmetric passive, a = 
means the remote server is being polled in client mode, a ^ indicates that the 
server is broadcasting to this address, a ~ denotes that the remote peer is 
sending broadcasts and a * marks the peer the server is currently synchronizing 
to.

     The contents of the host field may be one of four forms. It may be a host 
name, an IP address, a reference clock implementation name with its parameter or 
REFCLK(implementation number, parameter). On hostnames no only IP-addresses will 
be displayed.
"
 		Hth ,  JimL
-- 
+------------------------------------------------------------------+
| James   W.   Laferriere | System    Techniques | Give me VMS     |
| Network&System Engineer | 3237     Holden Road |  Give me Linux  |
| babydr@baby-dragons.com | Fairbanks, AK. 99709 |   only  on  AXP |
+------------------------------------------------------------------+
Reply Mr 1/9/2010 3:34:11 AM

On 2010-01-09, Michael Moroney <moroney@world.std.spaamtrap.com> wrote:
> David Woolley <david@ex.djwhome.demon.invalid> writes:
>
>>Michael Moroney wrote:
>
>>> that synchronizes with the four clocks, and ntpq> peers shows one of the
>>> primaries with a "*" in the first column meaning it was synchronized to as
>>> time source, the other 3 show "+" as expected, meaning they are usable
>>> sources.  We disconnect the antenna of the selected clock.  As expected
>
>>* and + mean they are used, not just that they are usable.  The time 
>>used to discipline the clock is a weighted average of all of them.
>
> If it uses all of them, what is the difference between "*" and "+"?
> It still doesn't explain why the disconnected clock doesn't get dropped.
>
>>> the clock (Spectracom) changes its stratum from 1 to 16 after a delay, and
>
>>If it remains the best choice it could take up to 8192 seconds before 
>>all the samples are flushed from the FIFO and it becomes completely 
>>ineligible.  There is logic to discourage early switching, as clock 
>>hopping is, itself, considered a bad thing.
>
> OK, is 8192 the default time before a nonresponsive host gets dropped?

No, 8 poll intervals, whatever your poll interval is. 

Reply unruh 1/9/2010 4:18:16 AM

On 1/8/2010 6:34 PM, Michael Moroney wrote:
> BlackLists writes:
>> I suspect you will have to supply your .conf file,
>> or otherwise explain your minclock minsane prefer, ...
>> settings, if you want many useful answers.
>
> Very simple.
>
> driftfile SYS$SPECIFIC:[TCPIP$NTP]TCPIP$NTP.DRIFT
>
> server 10.136.3.70 prefer
> server 10.136.3.71 prefer
> server 10.98.63.50
> server 10.98.63.150

I mentioned this about a month ago;
 the preferred servers are always included:

<http://www.eecis.udel.edu/~mills/ntp/html/prefer.html>
The prefer Peer
"In the prefer scheme the clustering algorithm is modified
   so that the prefer peer is never discarded"

"Ordinarily, the combining algorithm computes a weighted average of
    the survivor offsets to produce the final synchronization source.
   However, if a prefer peer is among the survivors, the combining
    algorithm is not used. Instead, the offset of the prefer peer
       is used exclusively as the final synchronization source."

Mitigation Rules
"As the selection algorithm scans the associations for selectable
   candidates, the modem driver and local driver are segregated
   for later, but only if not designated a prefer peer.  If so
   designated, a driver is included among the candidate population."

  ...

"If the prefer peer is among the survivors, it becomes the system
   peer and its clock offset and jitter are inherited by the
   corresponding system variables. Otherwise, the combining
   algorithm computes these variables from the survivor population."

The minsane Option
"The minsane option specifies the minimum number of survivors
   required to synchronize the system clock."


<http://www.eecis.udel.edu/~mills/ntp/html/miscopt.html>
minsane
"minsane Specify the number of servers used by the selection
   algorithm as the minimum to set the system clock.
  The default is 1 for legacy purposes; however, for critical
   applications the value should be somewhat higher but less
   than minclock."


Reply E 1/9/2010 6:30:34 AM

Michael Moroney wrote:

> Very simple.
> 
> driftfile SYS$SPECIFIC:[TCPIP$NTP]TCPIP$NTP.DRIFT
> 
> server 10.136.3.70 prefer
> server 10.136.3.71 prefer

That's not simple.  You are only supposed to have, at most, one prefer 
source.
> server 10.98.63.50
> server 10.98.63.150
> 
> 
Reply David 1/9/2010 9:24:14 AM

Richard B. Gilbert wrote:
> Michael Moroney wrote:
>> David Woolley <david@ex.djwhome.demon.invalid> writes:
>>
>>> Michael Moroney wrote:
>>
>>>> that synchronizes with the four clocks, and ntpq> peers shows one of 
>>>> the
>>>> primaries with a "*" in the first column meaning it was synchronized 
>>>> to as
>>>> time source, the other 3 show "+" as expected, meaning they are usable
>>>> sources.  We disconnect the antenna of the selected clock.  As expected
>>
>>> * and + mean they are used, not just that they are usable.  The time 
>>> used to discipline the clock is a weighted average of all of them.
>>
>> If it uses all of them, what is the difference between "*" and "+"?
>> It still doesn't explain why the disconnected clock doesn't get dropped.
>>
> 
> "*" designates the primary source of time.  "+" designates an "advisory" 
> source.  It's available for use if something happens to the primary source.

You are making the same mistake that he was making.  * means the source 
of time used to determine the stratum, and downstream root distance and 
dispersion.  + and * both mean that the time is being used to discipline 
the local clock.  + is not a standby, as far as disciplining the clock 
is concerned; + entries are actively used to discipline the clock.

On V3, the time used to discipline the local clock is a weighted average 
based on a quality metric - I forget exactly which one.  I'm not so sure 
that V4 applies as much weighting.
Reply David 1/9/2010 9:31:34 AM

E-Mail Sent to this address will be added to the BlackLists wrote:

>    However, if a prefer peer is among the survivors, the combining
>     algorithm is not used. Instead, the offset of the prefer peer
>        is used exclusively as the final synchronization source."

OK.  This is why this configuration is NOT simple!  My original 
comments, and my reply to Michael, are based on not using prefer.  You 
have used two prefers, and I would have to check the source code to see 
how ntpd handles that.  It may average the two, or it may stop on the 
first prefer peer that is still valid.
Reply David 1/9/2010 9:39:07 AM

David Woolley wrote:

> OK.  This is why this configuration is NOT simple!  My original 
> comments, and my reply to Michael, are based on not using prefer.  You 

Sorry, reply to Richard.
Reply David 1/9/2010 9:40:28 AM

Michael Moroney wrote:
> David Woolley <david@ex.djwhome.demon.invalid> writes:

>> If it remains the best choice it could take up to 8192 seconds before 
>> all the samples are flushed from the FIFO and it becomes completely 
>> ineligible.  There is logic to discourage early switching, as clock 
>> hopping is, itself, considered a bad thing.
> 
> OK, is 8192 the default time before a nonresponsive host gets dropped?


8192 was an oversimplification, as:

1) I think that the reachability has to be more than 1/8 before the 
source is considered valid - however a reachability of 7/8 is definitely 
still considered valid;

2) In a failing situation, the poll rate may start to reduce;

3) If the peer isn't the preferred peer, you might trip the rules for 
switching peers simply because another one is better and enough time has 
passed (I'd have to check whether the anti-clock hopping is based on 
sample counts or elapsed time).

8 * maxpoll guarantees that all samples for a source have been flushed 
and it cannot possibly be used, but the cutoff may be more like 4 * 
maxpoll (I need to check the code - this apparent limit may actually be 
more the result of jitter calculations during startup and might not 
apply to the loss of a source).

NTP tries to set the poll rate so that the error due to using a sample 
that is 8 poll intervals old is negligible when taking into account the 
PLL bandwidth.  It is oversampling the source by a factor of more than 
8.  If a source continues to respond to polls it can and often does use 
a sample that is 8192 seconds old for a poll interval of 1024.
Reply David 1/9/2010 9:55:47 AM

On 1/9/2010 1:24 AM, David Woolley wrote:
> Michael Moroney wrote:
>> server 10.136.3.70 prefer
>> server 10.136.3.71 prefer
>
> You are only supposed to have, at most, one prefer source.
>> server 10.98.63.50
>> server 10.98.63.150

(Shrug)

I see nothing "wrong" with having several {I've done it},
 as long as it is understood that they are always included in
 the combining algorithm and don't get discarded.  AFAICT

Reply E 1/11/2010 7:54:18 PM

Can anyone point me to anything similar to "How to explain NTP to Project
Managers", esp. explaining how a preferred clock server is included even
though the LAN cable is known to be dangling from the rack?  Most of what
I find with Google is too technical.  This is confusing to me as well,
with the algorithm tracking round trip times, compensating for offsets,
drift, jitter etc., plus this is all new to me.

What I was asked to do is pretty much as follows: Track Clocks A and/or B
as long as either are available, otherwise track Clocks C and/or D.

All 4 clocks are GPS clocks (stratum 1) and clocks C and D belong to 
another group and are supposed to be "backup" while A and B are "our"
clocks.

Manager expects that if the LAN cable to A is pulled, A will be dropped 
and system will synch to B and vice versa.  The "disconnect antenna" test
worked perfectly since the clocks changed their stratum to 16 and the
system dropped them as a source on the next poll.  I do understand that in
actuality all the available Stratum 1 Clocks will be used, and I understand
that the actual NTP client and clocks don't care who "owns" them.
Reply moroney 1/11/2010 10:12:57 PM

E-Mail Sent to this address will be added to the BlackLists wrote:

> 
> I see nothing "wrong" with having several, {I've done it}
>  as long as it is understood that are always included in
>  the combining algorithm and don't get discarded.  AFAICT
> 

Nothing is guaranteed when you go outside the specification.  Another 
possible implementation is to stop looking once you find the first 
prefer peer.
Reply David 1/11/2010 10:24:33 PM

> 
> I see nothing "wrong" with having several, {I've done it}
>  as long as it is understood that are always included in
>  the combining algorithm and don't get discarded.  AFAICT
> 

At least in 4.2.4p4, they are never included in the combining algorithm. 
  If one ignores some complications due to local and PPS clocks, the 
first peer found (I'm not sure if the list is sorted at this point) is 
the only source of time used.  Any redundant prefer peers are excluded. 
  clock_combine only appears on the branch that is executed when there 
is no prefer peer.

The other big effect of prefer is that candidates that have better 
metrics than the worst prefer peer are selected, although I think this 
is only significant if there is a PPS peer that would otherwise have been 
rejected.
Reply David 1/11/2010 11:13:35 PM

Michael Moroney wrote:
> Can anyone point me to anything similar to "How to explain NTP to Project
> Managers" esp. explaining how a preferred clock server is included even

Unfortunately its primary author fancies himself as a mathematician, so 
the main documentation is a mathematical treatise.  He has produced some 
powerpoint sort of material, but probably more for academics than 
financially oriented managers.

> though the LAN cable is known to be dangling from the rack.  Most of which

For a start, if it dropped a source on the first missing reply it would 
result in clock hopping, which is undesirable.  UDP is unreliable, and 
you cannot rely on getting every query returned.

Looking at the, slightly out of date, 4.2.4p4, a reachability of 1/8 is 
still acceptable.  Rejection during startup will be based on a high 
jitter, causing a high root distance, rather than on the number of 
replies in the last 8 polls.

Basically sources are dropped when they can no longer provide valid 
input to the time estimation process.  The quality of their input 
reduces with age, but it doesn't suddenly drop to zero after one poll 
interval; in fact a source may still have lower error bounds than the 
other options even when it has been rejected on reachability.

Using prefer will probably result in using sources whose error bounds 
are rather high.

Incidentally, another thing that managers like doing is analysing the 
dynamics of the system when they change the client clock manually.  ntpd 
can help cope with poor quality hardware, but it is working completely 
out of specification when asked to deal with a simulation of such a 
gross hardware fault.
Reply David 1/11/2010 11:25:39 PM

David Woolley <david@ex.djwhome.demon.invalid> wrote:
> For a start, if it dropped a source on the first missing reply it
> would result in clock hopping, which is undesirable.  UDP is
> unreliable, and you cannot rely on getting every query returned.

In that sense then, even TCP is unreliable for it cannot guarantee
getting every query returned.  What distinguishes TCP from UDP is that
TCP will make multiple attempts to deliver the data and then signal
the probable (but not certain) non-delivery of data.

rick jones
-- 
oxymoron n, commuter in a gas-guzzling luxury SUV with an American flag
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
Reply Rick 1/11/2010 11:47:52 PM

On 1/11/2010 2:24 PM, David Woolley wrote:
> BlackLists wrote:
>> I see nothing "wrong" with having several, {I've done it}
>>  as long as it is understood that are always included in
>>  the combining algorithm and don't get discarded.  AFAICT
>
> Nothing is guaranteed when you go outside the specification.
> Another possible implementation is to stop looking once you
>  find the first prefer peer.

<http://www.eecis.udel.edu/~mills/ntp/html/prefer.html#prefer>
"... The prefer option designates one or more survivors as
  preferred over all others. While the rules do not forbid it,
  it is usually not useful to designate more than one source
  as preferred; however, if more than one source is so designated,
  they are used in the order specified in the configuration file;
  that is, if the first one becomes unselectable, the second one
  is considered and so forth. ..."

Reply E 1/12/2010 12:00:29 AM

On 2010-01-11, Michael Moroney <moroney@world.std.spaamtrap.com> wrote:
> Can anyone point me to anything similar to "How to explain NTP to Project
> Managers" esp. explaining how a preferred clock server is included even
> though the LAN cable is known to be dangling from the rack.  Most of which
> I find with Google is too technical.  This is confusing to me as well,
> with the algorithm tracking round trip times, compensating for offsets, 
> drift, jitters etc., plus this is all new to me.

The procedure is as follows.
Once every poll interval (2^poll seconds, so 1024 seconds for poll 10)
a packet is sent out to a server. The result of that packet is put
into the filter pool. Then the sample with the lowest round-trip time
of the last 8 is selected as the time from that peer. Thus, there is
always a possible selection even if the line has been disconnected for
7000 sec (about 2 hr).
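
A toy Python model of that filter pool may help. It is only a sketch under a simplifying assumption of my own: every poll, answered or not, occupies one of the 8 slots, so an old sample is pushed out after 8 further polls. The class and method names are made up, and the real ntpd filter also tracks dispersion and sample age.

```python
from collections import deque


class ClockFilter:
    """Toy 8-stage filter: keep the last 8 poll results, pick the lowest delay."""

    def __init__(self, stages=8):
        self.slots = deque(maxlen=stages)  # oldest entry drops off automatically

    def add_reply(self, offset, delay):
        self.slots.append((delay, offset))

    def add_missed(self):
        # Assumption: a poll with no reply still uses up a slot.
        self.slots.append(None)

    def best(self):
        """Return (offset, delay) of the lowest-delay sample, or None if empty."""
        samples = [s for s in self.slots if s is not None]
        if not samples:
            return None
        delay, offset = min(samples)
        return offset, delay


f = ClockFilter()
f.add_reply(offset=0.0004, delay=0.0021)
for _ in range(7):
    f.add_missed()
print(f.best())   # the one old sample is still selectable after 7 missed polls
f.add_missed()
print(f.best())   # after the 8th missed poll nothing is left
```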

ntp does NOT continuously test the line to see that there is a connection
to the remote server. ntp does not throw its hands into the air and give
up if it misses one or two responses. 

Now you could always put in a little program to continuously ping the
server, and if it comes down, remove that server line from /etc/ntp.conf
and tell ntp to reread its config file, but ntp does not do that for you. 
One of its design criteria is to have as little on the net as possible. 
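
The config-editing half of such a little program could look something like the Python sketch below. The function name is hypothetical, the sample text is the configuration posted earlier in the thread, and the reachability test itself and the way the daemon is made to pick up the change are platform-specific and deliberately left out.

```python
def drop_servers(conf_text, unreachable):
    """Return conf_text without the 'server' lines for unreachable addresses."""
    kept = []
    for line in conf_text.splitlines():
        fields = line.split()
        is_server = len(fields) >= 2 and fields[0] == "server"
        if is_server and fields[1] in unreachable:
            continue  # skip the line for a source known to be down
        kept.append(line)
    return "\n".join(kept) + "\n"


CONF = """\
driftfile SYS$SPECIFIC:[TCPIP$NTP]TCPIP$NTP.DRIFT

server 10.136.3.70 prefer
server 10.136.3.71 prefer
server 10.98.63.50
server 10.98.63.150
"""

print(drop_servers(CONF, {"10.136.3.70"}))
```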

Note that since it is your own server, you could always tell your
machines to poll it every 16 sec (maxpoll 4, minpoll 4).
That way you would have to wait at most about 2 min before it would
realise that the site had gone down. It is your own server, so you
really do not care if you send packets to it that often. (The maxpoll of
10 is designed so that you do not swamp the large NIST servers, for
example.) Your network can take it, and unless your server is an iPhone,
it can take it as well.
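
The arithmetic behind those numbers, as a small Python sketch. It assumes that 8 consecutive missed polls are needed before every sample is gone, as discussed earlier in the thread; the function name is just for illustration.

```python
FILTER_STAGES = 8  # samples kept per source, per the discussion above


def flush_seconds(poll_exponent, stages=FILTER_STAGES):
    """Worst-case seconds until all samples are flushed: stages * 2**poll."""
    return stages * 2 ** poll_exponent


for poll in (4, 6, 10):
    print(f"poll {poll:2d}: interval {2 ** poll:4d} s, "
          f"flushed after {flush_seconds(poll):4d} s")
```

With poll 4 that is 8 * 16 = 128 seconds, a little over 2 minutes, against 8192 seconds at poll 10.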



>
> What I was asked to do is pretty much as follows: Track Clocks A and/or B
> as long as either are available, otherwise track Clocks C and/or D.
>
> All 4 clocks are GPS clocks (stratum 1) and clocks C and D belong to 
> another group and are supposed to be "backup" while A and B are "our"
> clocks.
>
> Manager expects that if the LAN cable to A is pulled, A will be dropped 
> and system will synch to B and vice versa.  The "disconnect antenna" test
> worked perfectly since the clocks changed their stratum to 16 and the
> system dropped them as a source on the next poll.  I do understand that in
> actuality all the available Stratum 1 Clocks will be used, and I understand
> that the actual NTP client and clocks don't care who "owns" them.
Reply unruh 1/12/2010 12:24:25 AM

E-Mail Sent to this address will be added to the BlackLists wrote:
> On 1/11/2010 2:24 PM, David Woolley wrote:
>> BlackLists wrote:
>>> I see nothing "wrong" with having several, {I've done it}
>>>  as long as it is understood that are always included in
>>>  the combining algorithm and don't get discarded.  AFAICT
> 
>   as preferred; however, if more than one source is so designated,
>   they are used in the order specified in the configuration file;

This is consistent with the actual behaviour, which is that they are not 
all used in the combining algorithm.  Only the first viable prefer peer 
in the configuration file is used.  (All peers are used in excluding 
false tickers.)
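
As a rough Python sketch of that behaviour: if any survivor is marked prefer, the first one in configuration order supplies the offset on its own, and otherwise the survivors are averaged. The class, the function and the 1/distance weighting are illustrative assumptions, not code taken from ntpd.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    offset: float      # measured offset in seconds
    distance: float    # root distance in seconds, used here as the weight
    prefer: bool = False


def final_offset(survivors):
    """survivors must be listed in configuration-file order."""
    if not survivors:
        raise ValueError("no survivors")
    for s in survivors:
        if s.prefer:
            return s.name, s.offset  # first prefer peer wins outright
    # No prefer peer: weighted average (1/distance weights are an assumption).
    weights = [1.0 / s.distance for s in survivors]
    combined = sum(w * s.offset for w, s in zip(weights, survivors)) / sum(weights)
    return "combined", combined


sources = [
    Source("10.136.3.70", 0.0005, 0.020, prefer=True),
    Source("10.136.3.71", 0.0001, 0.005, prefer=True),
    Source("10.98.63.50", 0.0020, 0.010),
]
print(final_offset(sources))      # the first prefer source alone
print(final_offset(sources[2:]))  # no prefer source: combined value
```

Note that in this model the second prefer source is ignored even though its distance is smaller, which is the point being made about redundant prefer peers.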
Reply David 1/12/2010 7:32:15 AM

David Woolley wrote:

> 
> Looking at the, slightly out of date, 4.2.4p4, a reachability of 1/8 is 
> still acceptable.  Rejection during startup will be based on a high 

On a further look, it looks like the source's quality measures are 
invalidated after three consecutive missed replies.
Reply David 1/19/2010 10:41:38 PM

David,

Not true. The dispersion does increase, but the server is valid until 
the dispersion exceeds the select threshold, usually in four or five 
more missed messages.

Dave

David Woolley wrote:

> David Woolley wrote:
>
>>
>> Looking at the, slightly out of date, 4.2.4p4, a reachability of 1/8 
>> is still acceptable.  Rejection during startup will be based on a high 
>
>
> On a further look, it looks like the source's quality measures are 
> invalidated after three consecutive missed replies.
Reply David 1/20/2010 4:25:53 AM

Rick Jones wrote:
> David Woolley <david@ex.djwhome.demon.invalid> wrote:
>> For a start, if it dropped a source on the first missing reply it
>> would result in clock hopping, which is undesirable.  UDP is
>> unreliable, and you cannot rely on getting every query returned.
> 
> In that sense then, even TCP is unreliable for it cannot guarantee
> getting every query returned.  What distinguishes TCP from UDP is that
> TCP will make multiple attempts to deliver the data and then signal
> the probable (but not certain) non-delivery of data.
> 
> rick jones

More importantly, TCP requires a 3-way handshake before it can deliver
the packet, so there is setup time, delivery and tear-down time involved
as well just to deliver one small packet.

Danny

Reply Danny 1/20/2010 5:07:59 AM

Danny Mayer wrote:

> 
> More importantly, TCP requires a 3-way handshake before it can deliver
> the packet, so there is setup time, delivery and tear-down time involved
> as well just to deliver one small packet.

One wouldn't set up a TCP connection for each message!
Reply David 1/20/2010 7:51:15 AM

David Mills wrote:
> 
> Not true. The dispersion does increase, but the server is valid until 
> the dispersion exceeds the select threshold, usually in four or five 
> more missed messages.

That's why I said the quality measures are invalidated, rather than the 
server is invalidated.  I probably should have said "effectively".  I 
guess that anti-clock hopping may reduce the effect.

I was really trying to correct my initial reading, that the only 
reachability check was reachability == 0.  There is also a reachability 
& 7 == 0 check, earlier in the code, which results in a sample with the 
maximum possible dispersion value being loaded.
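
As a rough illustration of why a handful of such samples is enough to push a source over the selection threshold, here is a Python sketch. It assumes an 8-stage filter, a maximum dispersion of 16 s, a threshold of 1 s, and a peer dispersion equal to the sum of the sorted samples' dispersions weighted by 1/2^(i+1), with the dummy samples sorting last. Those constants and the weighting are my reading of the NTPv4 specification and have not been checked against 4.2.4p4; delay and the small dispersion of the real samples are ignored.

```python
NSTAGE = 8        # filter stages (assumed)
MAXDISP = 16.0    # dispersion of a dummy sample, seconds (assumed)
MAXDIST = 1.0     # selection threshold, seconds (assumed)


def peer_dispersion(dummy_count, real_disp=0.0):
    """Weighted dispersion with dummy samples sorted after the real ones."""
    disps = [real_disp] * (NSTAGE - dummy_count) + [MAXDISP] * dummy_count
    return sum(d / 2 ** (i + 1) for i, d in enumerate(disps))


def dummies_until_rejected(threshold=MAXDIST):
    """Smallest number of dummy samples whose dispersion exceeds threshold."""
    for k in range(NSTAGE + 1):
        if peer_dispersion(k) > threshold:
            return k
    return None


for k in range(6):
    print(f"{k} dummy samples: dispersion {peer_dispersion(k):.4f} s")
print("rejected after", dummies_until_rejected(), "dummy samples")
```

Under these assumptions four dummy samples give 0.9375 s and the fifth is the first to exceed 1 s, which is broadly in line with the "four or five" estimate.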

I can't tell, quickly, what gets loaded when there are only one or two 
missed polls.  I'd always assumed that the filter slots tracked the 
reachability ones, but maybe not.

Reply David 1/20/2010 8:04:29 AM

David Woolley wrote:
> Danny Mayer wrote:
>> More importantly, TCP requires a 3-way handshake
>>  before it can deliver the packet, so there is setup
>>  time, delivery and tear-down time involved as well
>>  just to deliver one small packet.
>
> One wouldn't set up a TCP connection for each message!

Really?

I can perhaps see idling the connection to keep it open when the
 poll rate is at ~ 1 minute, however what about when the poll
 rate decreases to ~ 17 minutes? (or less often if so configured)

Reply E 1/21/2010 3:21:47 AM

E-Mail Sent to this address will be added to the BlackLists wrote:

> 
> I can perhaps see idling the connection to keep it open when the
>  poll rate is at ~ 1 minute, however what about when the poll
>  rate decreases to ~ 17 minutes? (or less often if so configured)

There is no network cost in keeping a TCP connection "idling", as there 
is no traffic in that state.  (The exception is if you enable 
keepalives, and you wait hours between real traffic, but even then the 
traffic is very very small.)
> 
Reply David 1/21/2010 7:23:48 AM

David Woolley <david@ex.djwhome.demon.invalid> wrote:
> E-Mail Sent to this address will be added to the BlackLists wrote:
>
>> 
>> I can perhaps see idling the connection to keep it open when the
>>  poll rate is at ~ 1 minute, however what about when the poll
>>  rate decreases to ~ 17 minutes? (or less often if so configured)
>
> There is no network cost in keeping a TCP connection "idling", as there 
> is no traffic in that state.  (The exception is if you enable 
> keepalives, and you wait hours between real traffic, but even then the 
> traffic is very very small.)

It would be very unwise to use TCP for something like NTP.
Information sent by an application via a TCP socket will be re-sent by
the OS when no acknowledgement is received.  When the original network
packet had been lost, the receiver will get a retransmitted copy which
contains the original timestamp but which arrives much later in time.
(The retry timer in TCP is usually on the order of seconds.)

It is better to send a time message and lose it (with UDP), than to
receive it later because it has been re-transmitted.  With UDP the
re-transmit is done at the application layer, and the application can
put fresh time information in the re-transmission.
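
A small worked example may help show the size of the error. It uses the standard NTP on-wire formulas, offset = ((T2 - T1) + (T3 - T4)) / 2 and delay = (T4 - T1) - (T3 - T2). The exchange() helper and its parameters are hypothetical: it assumes both clocks agree exactly (true offset zero), a symmetric path, and a request that the application stamped at T1 but that only got through after a transport-level retransmission.

```python
def ntp_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """Standard NTP clock offset estimate from the four timestamps."""
    return ((t2 - t1) + (t3 - t4)) / 2.0


def ntp_delay(t1: float, t2: float, t3: float, t4: float) -> float:
    """Standard NTP round-trip delay estimate."""
    return (t4 - t1) - (t3 - t2)


def exchange(one_way: float, server_proc: float, retransmit_wait: float):
    """Timestamps for one exchange between two perfectly synchronized clocks.

    The request carries t1 = 0, but (hypothetically) the first copy is lost
    and the copy that arrives was re-sent retransmit_wait seconds later
    with the original t1 still inside it.
    """
    t1 = 0.0
    t2 = t1 + retransmit_wait + one_way   # server receive time
    t3 = t2 + server_proc                 # server transmit time
    t4 = t3 + one_way                     # client receive time
    return t1, t2, t3, t4


if __name__ == "__main__":
    for wait in (0.0, 1.0):
        stamps = exchange(one_way=0.010, server_proc=0.001, retransmit_wait=wait)
        print(wait, ntp_offset(*stamps), ntp_delay(*stamps))
```

With a 1 second retransmission wait, the computed offset comes out at about half a second even though the clocks agree, and the measured delay is inflated by the full second. With UDP the lost request would simply be missed and the next poll would carry a fresh T1.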
0
Reply Rob 1/21/2010 9:06:02 AM

Rob wrote:

> 
> It would be very unwise to use TCP for something like NTP.
> Information sent by an application via a TCP socket will be re-sent by

Agreed, but the thread had gone off topic.
0
Reply David 1/21/2010 9:54:07 PM

In article <hj8ve7$lm7$1@news.eternal-september.org>,
 David Woolley <david@ex.djwhome.demon.invalid> writes:

>There is no network cost in keeping a TCP connection "idling", as there 
>is no traffic in that state.  (The exception is if you enable 
>keepalives, and you wait hours between real traffic, but even then the 
>traffic is very very small.)

There is a cost at both ends.  It's the memory required to maintain
the state of the connection.  (and possibly the CPU to find the
connection you want)

If you don't do some form of keepalive, you will end up with a huge
collection of dead connections when the other end crashes without
closing their end. 

-- 
These are my opinions, not necessarily my employer's.  I hate spam.

0
Reply hal 1/24/2010 4:51:06 AM

E-Mail Sent to this address will be added to the BlackLists wrote:
> David Woolley wrote:
>> Danny Mayer wrote:
>>> More importantly, TCP requires a 3-way handshake
>>>  before it can deliver the packet, so there is setup
>>>  time, delivery and tear-down time involved as well
>>>  just to deliver one small packet.
>> One wouldn't set up a TCP connection for each message!
> 
> Really?
> 
> I can perhaps see idling the connection to keep it open when the
>  poll rate is at ~ 1 minute, however what about when the poll
>  rate decreases to ~ 17 minutes? (or less often if so configured)
> 

How many connections do you think it takes to overwhelm a server? Even on
an idle connection TCP normally requires at least a minute of timeout
(lots of caveats, etc. here) but the server needs to drop these
connections as quickly as possible in order to get the next one. Don't
go there, there's a lot more going on under the hood than you seem to
realise from this response.

Danny

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.
0
Reply Danny 1/31/2010 5:10:29 AM

Danny Mayer wrote:

> 
> How many connections do you think it takes to overwhelm a server? Even on
> an idle connection TCP normally requires at least a minute of timeout

If you enable keepalives, the default timeout is several hours.  If you 
don't enable keepalives, there is no idle traffic at all.  You only need 
keepalives if there is no natural traffic in one direction.  Otherwise 
the first packet towards the side that has forgotten the connection will 
cause it to reset it.
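
For reference, keepalives are opt-in per socket. The sketch below shows one way to turn them on from Python; make_keepalive_socket is a hypothetical helper name, and the TCP_KEEPIDLE option used to shorten the idle time is platform specific (present on Linux, for example), so the code checks for it before using it.

```python
import socket


def make_keepalive_socket(idle_seconds=None):
    """Create a TCP socket with keepalive probes enabled.

    Without SO_KEEPALIVE an idle connection generates no traffic at all.
    idle_seconds optionally overrides the system default idle time, but
    only where the platform exposes TCP_KEEPIDLE.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if idle_seconds is not None and hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    return sock
```

Leaving idle_seconds unset keeps whatever default the operating system uses, which is typically the multi-hour value mentioned above.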

> (lots of caveats, etc. here) but the server needs to drop these
> connections as quickly as possible in order to get the next one. Don't
> go there, there's a lot more going on under the hood than you seem to
> realise from this response.

TCP is actually designed to produce minimal idle traffic.

As noted elsewhere, there are other reasons for not using TCP for NTP.
0
Reply David 1/31/2010 9:39:25 AM

>If you enable keepalives, the default timeout is several hours.  If you 
>don't enable keepalives, there is no idle traffic at all.  You only need 
>keepalives if there is no natural traffic in one direction.  Otherwise 
>the first packet towards the side that has forgotten the connection will 
>cause it to reset it.

Right.  But an NTP server doesn't normally initiate any traffic,
so if the client crashes the TCP connection will get left
dangling.

I suppose it would be simple to set up a background task
to kill any connections that hadn't seen any traffic for
N hours.  (and/or kill the oldest one if you have too many)
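
A rough sketch of such a housekeeping task follows. Everything in it is hypothetical: the ConnectionTable class, its method names, and the reading of "oldest" as "least recently active". It only tracks connection ids and timestamps; a real server would also close the underlying sockets for whatever ids are returned.

```python
import time


class ConnectionTable:
    """Tracks last-activity times so idle or excess connections can be dropped."""

    def __init__(self, max_idle_seconds: float, max_connections: int):
        self.max_idle_seconds = max_idle_seconds
        self.max_connections = max_connections
        self._last_seen = {}  # connection id -> time of last traffic

    def touch(self, conn_id, now=None):
        """Record traffic on conn_id; return ids evicted to stay under the cap."""
        now = time.monotonic() if now is None else now
        self._last_seen[conn_id] = now
        evicted = []
        while len(self._last_seen) > self.max_connections:
            # Drop the least recently active connection first.
            oldest = min(self._last_seen, key=self._last_seen.get)
            del self._last_seen[oldest]
            evicted.append(oldest)
        return evicted

    def reap(self, now=None):
        """Return and forget every connection idle longer than the limit."""
        now = time.monotonic() if now is None else now
        dead = [c for c, seen in self._last_seen.items()
                if now - seen > self.max_idle_seconds]
        for c in dead:
            del self._last_seen[c]
        return dead
```

A background task would call reap() periodically, for example once a minute, and close every connection it returns.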

-- 
These are my opinions, not necessarily my employer's.  I hate spam.

0
Reply hal 2/2/2010 5:44:46 AM

David Woolley wrote:
> E-Mail Sent to this address will be added to the BlackLists wrote:
> 
>>
>> I can perhaps see idling the connection to keep it open when the
>>  poll rate is at ~ 1 minute, however what about when the poll
>>  rate decreases to ~ 17 minutes? (or less often if so configured)
> 
> There is no network cost in keeping a TCP connection "idling", as there
> is no traffic in that state.  (The exception is if you enable
> keepalives, and you wait hours between real traffic, but even then the
> traffic is very very small.)

While there may be no network cost, it is a major cost to the server,
which not only uses up memory holding the connection open but also
consumes a file descriptor, and those are in limited supply.
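
As a rough illustration of the descriptor limit, on a Unix-like system the per-process cap can be read with Python's resource module. The helper name idle_connection_budget and the reserved count are hypothetical, and this says nothing about how OpenVMS or any particular NTP server accounts for its resources.

```python
import resource  # Unix-only module


def idle_connection_budget(reserved: int = 32):
    """Estimate how many sockets a process could hold open at once.

    reserved is an assumed allowance for log files, listening sockets and
    so on. Returns None if the soft limit is reported as unlimited.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return max(0, soft - reserved)
```

Each idle TCP client would consume one descriptor from this budget for as long as its connection stays open, whereas a UDP server can answer any number of clients from a single socket.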

Danny


0
Reply Danny 2/7/2010 4:29:54 AM
