Hello,
thanks to all who replied. Unfortunately the moderation bit made all of my
last replies expire, and instead of reposting them I chose to sum things up
in a single post.
The original question was answered. What I really wanted was a libntpq anyway,
and Heiko has already contributed a libntpq.a as well as an NTP SNMP daemon,
both of which we want. I understand it's part of ntp-dev, but I haven't had the
chance to look at it (I was on a site visit this week). The plan certainly is
to use that work, and I am greatly relieved that one apparently no longer has
to fork the ntpq code base privately just to avoid the external process and
parser.
But I also have to thank you all for the extra points made and the things I
have learned:
1. Mentioning the "25ms" as too long was a very bad idea on my part; it only
served as a distraction. I wasn't interested at all in discussing whether
that's a time frame that matters or not. In my world, every delay in
accessing information needs a better justification than "not needed that fast",
but that leads to distraction, as not everybody subscribes to that
point of view.
2. I understood that it will be far better to monitor the offset of "external"
NTP servers via the small time query packets that ntpd instances use among
themselves. We will be able to determine an offset and restrict NTP servers
that appear to malfunction. Benefit: we could potentially disable such a
server before it ever has an impact on our ntpd servers.
3. I was not clear enough that our NTP interest has two roles. One is as the
maker of a specific system with a specific NTP setup in place, for which we
have provided support. In that role I have come to learn about the necessity of
NTP monitoring. The other is as the maker of a middleware, where we can't tell
people to change their environment, but are to monitor it. Avoiding errors is
interesting in the first setup; in the second it's neither our option nor our
responsibility.
4. I also learned about orphan mode. It wasn't available when our
current setup was designed (more than 7 years ago, I think) and will make for
a nice enhancement proposal for our system.
5. The NTP rules about how often to query a daemon. I have tried to read up
about it, but only found general advice. Under the assumption that ntpd is
not multithreaded, querying it at the time it should respond to other servers
is slightly unfortunate. I was thinking that the query is so fast that it
doesn't matter. I will have to back that up with numbers. I presume we could
query the local ntpd for its time in a loop and compare with the local current
time to get an idea of whether extra libntpq queries degrade it or not.
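
The measurement loop proposed in point 5 could be sketched roughly as follows. This is a minimal illustration, not part of libntpq or ntpq: it sends ordinary client-mode (mode 3) time requests to a local ntpd and applies the standard NTP on-wire offset/delay formulas. The function names (`query`, `sample`) and the defaults (host, pause, count) are assumptions made for the example, and it assumes the local ntpd is configured to answer time requests from localhost.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def ntp_to_unix(seconds, fraction):
    """Convert a 64-bit NTP timestamp (two 32-bit words) to Unix seconds."""
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive (all in Unix seconds).
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def query(host="127.0.0.1", port=123, timeout=1.0):
    """Send one client-mode packet and return (offset, delay) in seconds."""
    request = b"\x23" + 47 * b"\x00"  # LI=0, VN=4, Mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(request, (host, port))
        reply, _ = sock.recvfrom(512)
        t4 = time.time()
    words = struct.unpack("!12I", reply[:48])
    t2 = ntp_to_unix(words[8], words[9])    # receive timestamp
    t3 = ntp_to_unix(words[10], words[11])  # transmit timestamp
    return offset_and_delay(t1, t2, t3, t4)


def sample(count=10, pause=1.0, host="127.0.0.1"):
    """Collect `count` (offset, delay) pairs, pausing between queries."""
    results = []
    for _ in range(count):
        results.append(query(host))
        time.sleep(pause)
    return results
```

Running `sample()` once while the monitoring queries are active and once while they are not, then comparing the spread of the reported delays, would give a first rough number. The delay against localhost mostly reflects how long ntpd took to get around to answering.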
Best regards,
Kay Hayen
kayhayen | 9/5/2008 6:55:26 AM
Kay Hayen wrote:
> Hello,
>
> thanks to all who replied. Unfortunately the moderation bit made all of my
> last replies expire and instead of reposting them, I choose to sum things up
> in a single post.
<snip>
> 5. The NTP rules about how often to query a daemon. I have tried to read up
> about it, but only found general advice. Under the assumption that ntpd is
> not multithreaded querying it at the time it should respond to other servers
> is slightly unfortunate. I was thinking that the query is so fast that it
> doesn't matter. I will have to back it up with numbers. I presume we could
> query the local ntpd for its time in a loop and compare with local current
> time to get an idea if extra libntpq queries degrade it or not.
>
The "rules" about how often to query a daemon are not all that
complicated. The fact that there ARE rules is due to some history;
google for "Netgear Wisconsin" for the sordid details. For a "second
opinion" google for "DLink PHK".
Briefly, you use the defaults for MINPOLL and MAXPOLL. You may use the
"iburst" keyword in a server statement for fast startup. You may use
the "burst" keyword ONLY with the permission of the server's owner.
99.99% of NTP installations will work very well using these rules. If
yours does not, ask here for help!
"burst" is intended for systems that make a dialup telephone connection
to a server three or four times a day.
"iburst" sends an initial burst of eight request packets at intervals of
two seconds. Thereafter, the server is polled at intervals between 64
and 1024 seconds; ntpd adjusts the poll interval within this range as
needed.
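
As a small worked example of the numbers above: ntpd expresses poll intervals as exponents of two, in seconds, so the default minpoll of 6 and maxpoll of 10 correspond to the 64 and 1024 second limits mentioned. The helper name below is made up for the illustration.

```python
def poll_interval_seconds(exponent):
    """ntpd expresses poll intervals as powers of two, in seconds."""
    return 2 ** exponent


DEFAULT_MINPOLL = 6   # 2**6  = 64 s
DEFAULT_MAXPOLL = 10  # 2**10 = 1024 s

for name, exponent in (("minpoll", DEFAULT_MINPOLL), ("maxpoll", DEFAULT_MAXPOLL)):
    seconds = poll_interval_seconds(exponent)
    per_day = 86400 // seconds
    print(f"{name} {exponent}: every {seconds} s, about {per_day} polls per day")
```

By the same arithmetic, a setting of minpoll=maxpoll=5 pins the interval at 32 seconds, half the default minimum.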
Richard | 9/5/2008 12:17:46 PM
Hello Richard,
you wrote:
> The "rules" about how often to query a daemon are not all that
> complicated. The fact that there ARE rules is due to some history;
> google for "Netgear Wisconsin" for the sordid details. For a "second
> opinion" google for "DLink PHK".
Fascinating reads indeed, thanks for the pointers.
What worried me more was how often we can query the local ntpd before it will
have an adverse effect. In the meantime I tried to convince myself that ntpq
requests are served at a different priority (on another socket) than ntpd
requests are. I didn't find 2 sockets, though.
> Briefly, you use the defaults for MINPOLL and MAXPOLL. You may use the
> "iburst" keyword in a server statement for fast startup. You may use
> the "burst" keyword ONLY with the permission of the the server's owner.
> 99.99% of NTP installations will work very well using these rules". If
> yours does not, ask here for help!
Now speaking about our system, not the middleware, with connections as
follows:
External NTPs <-> 2 entry hosts <-> 8 other hosts.
And iburst and minpoll=maxpoll=5 to improve the results.
Currently we observe that both entry hosts can become restricted due to
large offsets on the other hosts, and that will make
the software refuse to go on. Ideally that would not happen.
I will try to formulate questions:
When the other hosts synchronize to the entry hosts of our system, don't the
other hosts' ntpd know when and by how much these entry hosts changed their time
due to input?
Would NTP be more robust if we configured routing on the entry
hosts, so that they can all speak directly with the external NTPs on their
own?
Is the use of ntpdate before starting ntpd recommended, and/or does the iburst
option replace it?
Best regards,
Kay Hayen
kayhayen | 9/5/2008 5:41:16 PM
Kay Hayen wrote:
> Hello Richard,
>
> you wrote:
>
>> The "rules" about how often to query a daemon are not all that
>> complicated. The fact that there ARE rules is due to some history;
>> google for "Netgear Wisconsin" for the sordid details. For a "second
>> opinion" google for "DLink PHK".
>
> Fascinating reads indeed, thanks for the pointers.
>
> What worried me more was how often we can query the local ntpd before it will
> have an adverse effect. Meantime I somehow I sought to convince me I should
> be able to convince myself that ntpq requests are served at a different
> priority (other socket) than ntpd requests are. I didn't find 2 sockets
> though.
>
>> Briefly, you use the defaults for MINPOLL and MAXPOLL. You may use the
>> "iburst" keyword in a server statement for fast startup. You may use
>> the "burst" keyword ONLY with the permission of the the server's owner.
>> 99.99% of NTP installations will work very well using these rules". If
>> yours does not, ask here for help!
>
> Now speaking about our system, not the middleware, with connections as
> follows:
>
> External NTPs <-> 2 entry hosts <-> 8 other hosts.
>
What do you mean by "entry hosts"?
> And iburst and minpoll=maxpoll=5 to improve the results.
Use the default values of minpoll and maxpoll! Ntpd will adjust the
polling interval within those limits. Ntpd is far smarter than you or
I. It will normally start by using minpoll and increase the interval
after it has initial synchronization. If network conditions deteriorate
it will decrease the poll interval and increase it as conditions
improve. IOW it will use the optimum poll interval for the conditions
then obtaining. If you configured seven servers, you might observe ntpd
using seven DIFFERENT poll intervals, one for each server because seven
different servers will be reached by at least seven different network paths!
>
> Currently we observe that both entry hosts can both become restricted due to
> large offsets on other hosts, so they become restricted and that will make
> the software refuse to go on. Ideally that would not happen.
>
> I will try to formulate questions:
>
> When the other hosts synchronize to the entry hosts of our system, don't the
> other hosts ntpd know when and how much these entry hosts changed their time
> due to input?
>
> Would NTP would be more robust if we would configure routing on the entry
> hosts, so that they can all speak directly with the external NTPs on their
> own?
>
> Is the use of ntpdate before starting ntpd recommended and/or does the iburst
> option replace it?
>
> Best regards,
> Kay Hayen
Richard | 9/5/2008 6:42:32 PM
Kay,
I think most of your questions are answered at:
http://support.ntp.org/Support
I'd also be happy to discuss with you or anybody else at your company how
membership in the NTP Forum would be of benefit.
--
Harlan Stenn <stenn@ntp.org>
http://ntpforum.isc.org - be a member!
Harlan | 9/5/2008 7:30:09 PM
kayhayen@gmx.de (Kay Hayen) writes:
>Hello Richard,
>you wrote:
>> The "rules" about how often to query a daemon are not all that
>> complicated. The fact that there ARE rules is due to some history;
>> google for "Netgear Wisconsin" for the sordid details. For a "second
>> opinion" google for "DLink PHK".
>Fascinating reads indeed, thanks for the pointers.
>What worried me more was how often we can query the local ntpd before it will
>have an adverse effect. Meantime I somehow I sought to convince me I should
>be able to convince myself that ntpq requests are served at a different
>priority (other socket) than ntpd requests are. I didn't find 2 sockets
>though.
Depends on the system, but thousands of times per second is not out of the
ballpark. I assume you are not planning anything that severe.
(Some servers bombarded by those idiotic people, I believe, managed those
kinds of rates.)
>> Briefly, you use the defaults for MINPOLL and MAXPOLL. You may use the
>> "iburst" keyword in a server statement for fast startup. You may use
>> the "burst" keyword ONLY with the permission of the the server's owner.
>> 99.99% of NTP installations will work very well using these rules". If
>> yours does not, ask here for help!
>Now speaking about our system, not the middleware, with connections as
>follows:
>External NTPs <-> 2 entry hosts <-> 8 other hosts.
>And iburst and minpoll=maxpoll=5 to improve the results.
On which? That should NOT be on the external NTPs unless you own them. That
will not necessarily improve results; it depends on whether you want short-term
accuracy or long-term accuracy (e.g. what happens if the connection with the
outside world goes down for 3 days? Do you want to make sure your systems
will keep good time during those three days? Are you willing to buy 25 usec
rather than 50 usec short-term accuracy for 10 sec drift over those 3 days?).
>Currently we observe that both entry hosts can both become restricted due to
>large offsets on other hosts, so they become restricted and that will make
>the software refuse to go on. Ideally that would not happen.
>I will try to formulate questions:
>When the other hosts synchronize to the entry hosts of our system, don't the
>other hosts ntpd know when and how much these entry hosts changed their time
>due to input?
Yes, and no. On one level no-- they trust their sources. However part of
the information they get is the dispersion. That gives some info about how
well those servers are tracking the outside world.
>Would NTP would be more robust if we would configure routing on the entry
>hosts, so that they can all speak directly with the external NTPs on their
>own?
Multiple paths are always more robust than one path.
>Is the use of ntpdate before starting ntpd recommended and/or does the iburst
>option replace it?
Not recommended.
Unruh | 9/5/2008 9:41:59 PM
Kay Hayen wrote:
>
> External NTPs <-> 2 entry hosts <-> 8 other hosts.
>
> And iburst and minpoll=maxpoll=5 to improve the results.
If these External NTPs really are external, i.e. not owned by you, do
not do this without explicit permission from their owners. There is a
real risk of countermeasures if you don't. These may result in poor
time or no time. Generally polling with anything less than the default
MINPOLL and MAXPOLL can be considered abusive and polling with a MAXPOLL
less than the default MINPOLL will trigger countermeasures in any system
configured to apply them.
>
> Currently we observe that both entry hosts can both become restricted due to
> large offsets on other hosts, so they become restricted and that will make
> the software refuse to go on. Ideally that would not happen.
I've never triggered countermeasures (kiss of death), but I have a
feeling that that is what you will observe on an NTP client that is too
old to recognize the warning it will get from the server.
If you are not subject to countermeasures, you have something very very
broken if you reach the 1000s drop dead point. You should be worried,
but it can happen legitimately, if you exceed the 128ms step threshold.
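
The two thresholds mentioned here (128 ms for a step, 1000 s as the point where ntpd gives up) can be illustrated with a deliberately simplified classifier. This is a sketch of the decision only, not ntpd's actual code: the real daemon also waits out a stepout interval before stepping, and the `-g` option only waives the 1000 s limit once. The function name and return strings are invented for the example.

```python
STEP_THRESHOLD = 0.128    # seconds; larger offsets are stepped, not slewed
PANIC_THRESHOLD = 1000.0  # seconds; larger offsets make ntpd exit by default


def classify_offset(offset_seconds, allow_first_panic=False):
    """Return "slew", "step" or "exit" for a measured clock offset.

    allow_first_panic models starting ntpd with -g (assumption: this is
    the first correction after startup).
    """
    magnitude = abs(offset_seconds)
    if magnitude > PANIC_THRESHOLD:
        return "step" if allow_first_panic else "exit"
    if magnitude > STEP_THRESHOLD:
        return "step"
    return "slew"


for offset in (0.020, -0.300, 4000.0):
    print(offset, classify_offset(offset), classify_offset(offset, True))
```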
>
> I will try to formulate questions:
>
> When the other hosts synchronize to the entry hosts of our system, don't the
> other hosts ntpd know when and how much these entry hosts changed their time
> due to input?
You seem to be under the misapprehension that ntpd makes step changes on
each measurement. It actually makes slow adjustments to effective
frequency and rate of change of frequency based on a significant number
of preceding measurements (Unruh: I'm over-simplifying both the 8 step
filter and the low pass loop filter here).
>
> Would NTP would be more robust if we would configure routing on the entry
> hosts, so that they can all speak directly with the external NTPs on their
> own?
Ask permission from the owners of the external hosts before doing this,
as it increases the load you impose. Also, it is likely to result in
larger offsets between machines.
>
> Is the use of ntpdate before starting ntpd recommended and/or does the iburst
> option replace it?
ntpdate is deprecated. -g is the nearest equivalent function in ntpd.
David | 9/5/2008 9:50:39 PM
"Richard B. Gilbert" <rgilbert88@comcast.net> writes:
>Kay Hayen wrote:
>> Hello Richard,
>>
>> you wrote:
>>
>>> The "rules" about how often to query a daemon are not all that
>>> complicated. The fact that there ARE rules is due to some history;
>>> google for "Netgear Wisconsin" for the sordid details. For a "second
>>> opinion" google for "DLink PHK".
>>
>> Fascinating reads indeed, thanks for the pointers.
>>
>> What worried me more was how often we can query the local ntpd before it will
>> have an adverse effect. Meantime I somehow I sought to convince me I should
>> be able to convince myself that ntpq requests are served at a different
>> priority (other socket) than ntpd requests are. I didn't find 2 sockets
>> though.
>>
>>> Briefly, you use the defaults for MINPOLL and MAXPOLL. You may use the
>>> "iburst" keyword in a server statement for fast startup. You may use
>>> the "burst" keyword ONLY with the permission of the the server's owner.
>>> 99.99% of NTP installations will work very well using these rules". If
>>> yours does not, ask here for help!
>>
>> Now speaking about our system, not the middleware, with connections as
>> follows:
>>
>> External NTPs <-> 2 entry hosts <-> 8 other hosts.
>>
>What do you mean by "entry hosts"?
>> And iburst and minpoll=maxpoll=5 to improve the results.
>Use the default values of minpoll and maxpoll! Ntpd will adjust the
>polling interval within those limits. Ntpd is far smarter than you or
Well, you have too much faith in ntp. It is a whole series of compromises,
many set up in the days when one-second network delays were not unknown.
And one of the key design criteria in that minpoll/maxpoll is to relieve
congestion on the servers. If he is using his own servers (not outside
servers) then he can decrease the minpoll/maxpoll pair (after all, the
refclocks run at minpoll=maxpoll=4). But there is a tradeoff. Because of the
design of ntp, if you choose a low maxpoll, you will keep the phase errors
smaller, but at the expense of larger drift errors (it basically averages
over a time interval a few times longer than the maxpoll interval). A longer
timebase means a longer lever arm for determining the drift, but at the
expense of not having as much data to beat down the statistical errors in
the offset.
Thus, with ntp, if you want an accurate determination of the clock drift
(e.g. if there is a chance of your system losing connectivity for a few
days), use a longer poll. If you want lower phase noise while connected,
use a shorter poll. But remember that servers out there will get extremely
upset if you query them too often.
Essentially you want to be working at the Allan minimum to get rid of both
short- and long-term errors. But NTP does not determine where that is. It
simply assumes a value. That assumption is not necessarily very good.
(With close-by clock servers and heavily used machines, which see lots of
temperature fluctuations, the optimum point is much shorter than the
assumption, i.e. statistical errors are much smaller than clock drift errors.)
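
To make the "Allan minimum" concrete: given a log of measured clock offsets taken at a fixed spacing, the overlapping Allan deviation can be computed for several averaging times, and the averaging time with the smallest deviation is the one referred to above. The sketch below uses the textbook overlapping estimator; it is not something ntpd itself runs, and the function names and the power-of-two search are assumptions made for the illustration.

```python
import math


def allan_deviation(phase, tau0, m):
    """Overlapping Allan deviation at averaging time m * tau0.

    phase: clock offsets in seconds, sampled every tau0 seconds.
    """
    terms = len(phase) - 2 * m
    if m < 1 or terms < 1:
        raise ValueError("need at least 2*m + 1 phase samples")
    tau = m * tau0
    total = sum(
        (phase[i + 2 * m] - 2 * phase[i + m] + phase[i]) ** 2
        for i in range(terms)
    )
    return math.sqrt(total / (2 * tau * tau * terms))


def allan_minimum(phase, tau0):
    """Return (tau, adev) for the averaging time with the smallest deviation.

    Only averaging factors that are powers of two are tried.
    """
    candidates = []
    m = 1
    while 2 * m + 1 <= len(phase):
        candidates.append((m * tau0, allan_deviation(phase, tau0, m)))
        m *= 2
    return min(candidates, key=lambda pair: pair[1])
```

Fed with offsets logged at, say, 16 second spacing, `allan_minimum(offsets, 16.0)` gives a rough idea of which averaging time suits a particular machine and network, which can then be compared with the poll interval actually in use.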
>I. It will normally start by using minpoll and increase the interval
>after it has initial synchronization. If network conditions deteriorate
>it will decrease the poll interval and increase it as conditions
>improve. IOW it will use the optimum poll interval for the conditions
>then obtaining. If you configured seven servers, you might observe ntpd
>using seven DIFFERENT poll intervals, one for each server because seven
>different servers will be reached by at least seven different network paths!
>>
>> Currently we observe that both entry hosts can both become restricted due to
>> large offsets on other hosts, so they become restricted and that will make
>> the software refuse to go on. Ideally that would not happen.
>>
>> I will try to formulate questions:
>>
>> When the other hosts synchronize to the entry hosts of our system, don't the
>> other hosts ntpd know when and how much these entry hosts changed their time
>> due to input?
>>
>> Would NTP would be more robust if we would configure routing on the entry
>> hosts, so that they can all speak directly with the external NTPs on their
>> own?
>>
>> Is the use of ntpdate before starting ntpd recommended and/or does the iburst
>> option replace it?
>>
>> Best regards,
>> Kay Hayen
Unruh | 9/6/2008 12:31:59 AM
Kay Hayen wrote:
> thanks to all who replied. Unfortunately the moderation bit made all of my
> last replies expire and instead of reposting them, I choose to sum things up
> in a single post.
Expiration is not a moderation thing, it is something done by your
USENET service provider to manage disk space. I think mine still has
the whole thread, and Google groups certainly will, except for any that
people have told it not to store.
The other thing that can have this effect, is that many Usenet readers,
by default, hide messages you have already read.
On the other hand, it appears that you are submitting via the mail-to-news
gateway. If your ISP is expiring items in your mailbox so soon, you
need a new ISP.
David | 9/6/2008 9:14:24 AM
Hello,
> > Now speaking about our system, not the middleware, with connections as
> > follows:
> >
> > External NTPs <-> 2 entry hosts <-> 8 other hosts.
>
> What do you mean by "entry hosts"?
Of our 10 machines, only 2 have a connection to the "external" NTP servers.
The "entry hosts" are these two, and they act as servers for the "other" 8.
> > And iburst and minpoll=maxpoll=5 to improve the results.
>
> Use the default values of minpoll and maxpoll! Ntpd will adjust the
> polling interval within those limits. Ntpd is far smarter than you or
> I. It will normally start by using minpoll and increase the interval
> after it has initial synchronization. If network conditions deteriorate
> it will decrease the poll interval and increase it as conditions
> improve. IOW it will use the optimum poll interval for the conditions
> then obtaining. If you configured seven servers, you might observe ntpd
> using seven DIFFERENT poll intervals, one for each server because seven
> different servers will be reached by at least seven different network
> paths!
Well, to my knowledge we did it because we observed improved convergence
behaviour on the 8 "other hosts", and particularly because it was not
working before. At the time they do an "iburst", none of the entry machines
may be running an ntpd yet, nor may it have completed its own iburst yet.
They all boot at the same time, so that would be why the low poll value is
used. As our system runs in isolated environments where people have full
control, polling this frequently (still only every 32 seconds) does no great
harm.
We have requirements to be able to run the software in x seconds after reboot
or else our customers' acceptance tests fail. The requirement makes sense as
we are talking here about availability of service or not. Obviously the time
should be as small as possible.
For the servers behind the "entry hosts" I don't see how we could let ntpd
have its way when it's too slow.
Our requirements are abnormal, admitted. We require "equal" time on all 10
machines and that very fast.
I somehow think we should have something with ntpdate before ntpd is run. It
would wait for reachability of "ntpd" on the entry hosts and do an ntpdate
before running the local ntpd with an iburst, which will then have less work to
do. (We shouldn't use a drift file in that case I presume, but due to issues
with the old middleware NTP supervision, we can't anyway.)
Then we could be faster and be robust against boot order variations.
Best regards,
Kay Hayen
kayhayen | 9/7/2008 5:50:21 AM
Hello Mr. Unruh,
> >What worried me more was how often we can query the local ntpd before it
> > will have an adverse effect. Meantime I somehow I sought to convince me I
> > should be able to convince myself that ntpq requests are served at a
> > different priority (other socket) than ntpd requests are. I didn't find 2
> > sockets though.
>
> Depends on the system but thousands of times per second is not out of the
> ballpark. I assume you are not planning anything that severe.
> (Some servers bombarded by those idiotic people I believed managed those
> kinds of rates.)
No, not at all. We will only be targeting our local ntpd with ntpq requests
and then we will likely be able to use low rates.
Since we are now going to monitor the offsets ourselves by contacting the
external ntpd at some rate, we will only need to know when it is going to
contact an ntpd, and then possibly restrict via another ntpq request.
For all of that, it is no longer critical to be fast. Thank you for pounding
on me with that. :-)
> >> Briefly, you use the defaults for MINPOLL and MAXPOLL. You may use the
> >> "iburst" keyword in a server statement for fast startup. You may use
> >> the "burst" keyword ONLY with the permission of the the server's owner.
> >> 99.99% of NTP installations will work very well using these rules". If
> >> yours does not, ask here for help!
> >
> >Now speaking about our system, not the middleware, with connections as
> >follows:
> >
> >External NTPs <-> 2 entry hosts <-> 8 other hosts.
> >
> >And iburst and minpoll=maxpoll=5 to improve the results.
>
> On which? That should NOT be on the external NTPs unless you own them. That
> will not necessarily improve results-- depends on whether you want short
> term accuracy or long term (eg what happens if the connection with the
> outside world goes down for 3 days. Do you want to make sure your systems
> will keep good time during those three days? Are you willing to buy 25usec
> rather thahn 50usec short term accuracy for 10 sec drift over that 3 days?
If the NTP connections fail, we can accept a slow drift very well. But see my
last response to Richard B. Gilbert about why this is needed. We want the 8
other hosts to synchronize fast.
When they "iburst" none of the entry hosts may already have completed its own
startup, so they need to poll quickly even after the "iburst" or else
synchronization after reboot will take too long.
> >Currently we observe that both entry hosts can both become restricted due
> > to large offsets on other hosts, so they become restricted and that will
> > make the software refuse to go on. Ideally that would not happen.
> >
> >
> >I will try to formulate questions:
> >
> >When the other hosts synchronize to the entry hosts of our system, don't
> > the other hosts ntpd know when and how much these entry hosts changed
> > their time due to input?
>
> Yes, and no. On one level no-- they trust their sources. However part of
> the information they get is the dispersion. That gives some info about how
> well those servers are tracking the outside world.
But that would be more of "no". All the increased dispersion on "entry hosts"
due to required time shifting is going to give us is a slow down in the
synchronization of the "other hosts".
> >Is the use of ntpdate before starting ntpd recommended and/or does the
> > iburst option replace it?
>
> Not recommended.
I sort of think that we can build something for the "other" hosts that makes
them wait for the "entry" hosts to be synchronized. See that response to
Richard B. Gilbert again.
Alternatively, we could change ntpd so that the iburst lasts
until sufficient synchronization has been achieved. But it appears to be
simpler to delay the iburst by delaying the ntpd start until sufficient
conditions are met.
For the startup of our system, that could be a solution that removes the need
for permanently low poll intervals, which are only needed initially.
Best regards,
Kay Hayen
kayhayen | 9/7/2008 6:15:59 AM
Hello David,
On Friday, 5 September 2008 23:50:39, David Woolley wrote:
> Kay Hayen wrote:
> > External NTPs <-> 2 entry hosts <-> 8 other hosts.
> >
> > And iburst and minpoll=maxpoll=5 to improve the results.
>
> If these External NTPs really are external, i.e. not owned by you, do
> not do this without explicit permission from their owners. There is a
> real risk of countermeasures if you don't. These may result in poor
> time or no time. Generally polling with anything less than the default
> MINPOLL and MAXPOLL can be considered abusive and polling with a MAXPOLL
> less than the default MINPOLL will trigger countermeasures in any system
> configure to apply them.
They are owned by the same people who then own installations of our system, so
that wouldn't be an issue.
When I say "restrict", it is our own system that decides that a ">x ms" offset
is too bad and prevents ntpd from talking to that server any further with a
"restrict" command. If both servers of an "other host" are "restricted", it
will crash the software.
All of that is of our own making and under our control.
Regarding the poll values: I am not sure why we do it for the external NTPs as
well. It could be that the dispersion can be brought down quicker this way
on the "entry hosts", allowing the "other hosts" to synchronize faster with
them, or it could be that we never considered it worthwhile to optimize it away.
> > I will try to formulate questions:
> >
> > When the other hosts synchronize to the entry hosts of our system, don't
> > the other hosts ntpd know when and how much these entry hosts changed
> > their time due to input?
>
> You seem to be under the misapprehension that ntpd makes step changes on
> each measurement. It actually makes slow adjustments to effective
> frequency and rate of change of frequency based on s signficant number
> of preceding measurements (Unruh: I'm over-simplifying both the 8 step
> filter and the low pass loop filter here).
Well yes, but between 2 queries from the same client the ntpd will have made a
certain adjustment. If the client gets to know this value, it will have to
blame its own clock for that extra difference and assign it dispersion that
it doesn't deserve.
So, what I don't get is probably more like: How much will a stratum 3 server
be able to use the stratum 2 server only as an indirection of the stratum 1
datation?
In my mind the stratum 2 server was only trying to be accurate about how old
the stratum 1 datation is, and that the stratum 3 would be enabled to try and
converge towards the stratum 1 clock.
In other words I could say: I was expecting the ntpd answer to contain
upstream ntpd answers as well. And I was expecting the processing ntpd trying
to guess the stratum 1 time. Instead it seems, it is "only" trying to guess
the next upstream time and based on dispersions (its own and upstream) it's
following the direction of the guessed time more or less closely.
That's a different model and I think Mr. Unruh already clarified to me that
it's not the model that NTP uses. I think "my" model has little experience or
qualification behind it. Current NTP on the other hand is proven.
But I guess, it explains, why I have had a hard time to ever understand why
the "entry hosts" are so bad at forwarding the time after reboot. Obviously
NTP is not and need not be optimized towards simultaneous initialization.
Best regards,
Kay Hayen
kayhayen | 9/7/2008 6:47:47 AM
Kay,
If you use iburst in your config files and have a "good" value in the
ntp.drift file, ntpd should sync up and be ready to go in about 11 seconds.
Please see:
https://support.ntp.org/bin/view/Support/ConfiguringNTP
and
https://support.ntp.org/bin/view/Support/StartingNTP
--
Harlan Stenn <stenn@ntp.org>
http://ntpforum.isc.org - be a member!
Harlan | 9/7/2008 7:03:34 AM
Kay Hayen wrote:
> We could alternatively want to change ntpd in a way that the iburst lasts
> until a sufficient synchronization was achieved. But it appears to be more
> simply to delay the iburst by delaying the ntpd start until sufficient
> conditions are met.
>
That's not going to be desirable. Although you might only use it on
your internal servers, it will soon get round on the grapevine that it is
a good thing to do, which will result in servers that are down or denied
to the client, or the networks of ex-servers getting bombarded with
large numbers of requests, whereas I believe the standard behaviour is
to back off under those circumstances.
David | 9/7/2008 8:36:35 AM
Hello Harlan,
you wrote:
> If you use iburst in your config files and have a "good" value in the
> ntp.drift file, ntpd should sync up and be ready to go in about 11 seconds.
>
> Please see:
>
> https://support.ntp.org/bin/view/Support/ConfiguringNTP
>
> and
>
> https://support.ntp.org/bin/view/Support/StartingNTP
I wasn't aware of "ntp-wait" yet. Seems to do (almost) what we might want:
Quote :
is never "stepped" backwards, run:
ntp-wait -v
as late as possible in the boot sequence, before starting these time-sensitive
services.
----
In effect that is what we want to do before the start of our application. But
it doesn't solve the problem fully for us. We would want on our "other hosts"
to have it check remote ntpd. That way we would have:
External NTPs <-> Entry Hosts <-> Other Hosts
The "entry hosts" would do simply local ntp-wait before starting the
application, but otherwise behave as normal. They only need to iburst and
then use default poll values.
The "other hosts" would run 2 ntp-wait instances against the 2 entry hosts.
Only once either of them finishes is the ntpd started and the boot sequence
continued.
Et voila, our simultaneous initialization problems would be gone. Checking the
man page of ntp-wait on my Debian Testing here (4.2.4p4) it seems we would
have to enable the query of remote hosts first, but that sounds like a rather
simple patch.
The fundamental issue is that the "iburst" of the "other" hosts gets done
before it is entirely useful (the entry hosts are only just synchronizing at
best) and a remote ntp-wait could solve that.
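The remote wait described above can be sketched roughly as follows. This is only a minimal sketch, not an existing ntp-wait feature: the names `wait_for_any_entry_host` and `is_synced` are hypothetical, and the actual sync check is passed in as a callable so that it could be backed by libntpq, by parsing ntpq output, or by something else entirely.

```python
import time
from typing import Callable, Iterable, Optional


def wait_for_any_entry_host(
    hosts: Iterable[str],
    is_synced: Callable[[str], bool],   # hypothetical check, e.g. backed by libntpq
    timeout: float = 300.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll the entry hosts until one of them reports itself synchronized.

    Returns the first synchronized host, or None if the timeout expires.
    """
    hosts = list(hosts)
    deadline = clock() + timeout
    while True:
        for host in hosts:
            try:
                if is_synced(host):
                    return host
            except OSError:
                # Host not reachable yet (still booting); treat it as unsynced.
                pass
        if clock() >= deadline:
            return None
        sleep(interval)
```

An "other host" would call this with its two entry hosts and only start its own ntpd once a host name comes back. The low polling rate and the hard timeout are deliberate, so that the wait itself does not turn into the kind of request flood discussed elsewhere in this thread.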
Best regards,
Kay Hayen
PS: Addressing the support suggestion too: we will definitely consider it. So
far only our customers have had such contracts for their operational use. But
as we start to provide an NTP monitoring middleware as well, the situation
will be entirely different. There we don't control the NTP setup at all, but
only monitor it and raise alarms that will frequently result in support
questions to us. I presume we would like to have a partner for these. I will
raise the issue in a meeting next week.
Posted by kayhayen, 9/7/2008 8:36:59 AM

Harlan Stenn wrote:
>
> If you use iburst in your config files and have a "good" value in the
> ntp.drift file, ntpd should sync up and be ready to go in about 11 seconds.
It may be in error by up to 128ms under these circumstances, which will
take an hour or so to correct, during which there will be significant
frequency excursions. Note that this isn't a consequence of iburst;
iburst simply means that it will accept the current local time faster.
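For a sense of scale, a lower bound on removing such an error by slewing alone can be computed. This is a rough sketch that assumes ntpd's usual maximum slew rate of 500 ppm, a figure not stated in this thread; the function name is made up for illustration.

```python
def min_slew_time_s(offset_s: float, max_slew_ppm: float = 500.0) -> float:
    """Shortest time needed to remove offset_s when slewing at max_slew_ppm."""
    return abs(offset_s) / (max_slew_ppm * 1e-6)
```

For 128 ms this gives about 256 seconds. The clock discipline loop does not run at the maximum slew rate throughout, so the real correction takes considerably longer, which fits the "hour or so" estimate above.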
Posted by David, 9/7/2008 8:42:28 AM

Hello David,
Am Sonntag, 7. September 2008 10:36:35 schrieb David Woolley:
> Kay Hayen wrote:
> > We could alternatively want to change ntpd in a way that the iburst lasts
> > until a sufficient synchronization was achieved. But it appears to be
> > more simply to delay the iburst by delaying the ntpd start until
> > sufficient conditions are met.
>
> That's not going to be desirable. Although you might only use it on
> your internal severs, it will soon get round on the grapevine that it is
> a good thing to do, which will result in servers that are down or denied
> to the client, or the networks of ex-servers getting bombarded with
> large numbers of requests, whereas I believe the standard behaviour is
> to back off under those circumstances.
Which is why I assume that the project won't accept patches that make ntp-wait
work with remote hosts. Anybody correct me if I am wrong, because we would
gladly contribute such patches. In the meantime, we can use the new libntpq
to achieve the same effect.
The use of ntp-wait on external servers would be pointless and harmful. It
only makes sense for us because we sort of _know_ that at startup all our
machines race each other at the same time, due to the joint reboot or the
joint power-up after a (simulated) power failure.
I think the 11 seconds that Harlan mentioned are something we definitely want
to have, but we fail to meet the preconditions on the "other" hosts, due to the
lack of a waiting step. But if we have measures in place to make sure that
our "entry" servers are themselves synchronized before our "other" hosts
burst on them, then it will all be graceful, I guess.
If done correctly (async to other boot tasks), the delay in starting our
application could become difficult to notice. And using the newborn libntpq
with remote hosts, an implementation of ntp-wait for remote servers will be
rather trivial to make.
Well, to sum it up: I think I have a plan now.
Best regards,
Kay Hayen
Posted by kayhayen, 9/7/2008 9:33:11 AM

>>> In article <48c393fb$0$522$5a6aecb4@news.aaisp.net.uk>, David Woolley <david@ex.djwhome.demon.co.uk.invalid> writes:
David> Harlan Stenn wrote:
>> If you use iburst in your config files and have a "good" value in the
>> ntp.drift file, ntpd should sync up and be ready to go in about 11
>> seconds.
David> It may be in error by up to 128ms under these circumstances, which
David> will take an hour or so to correct, during which there will be
David> significant frequency excursions. Note that this isn't a consequence
David> of iburst, iburst simply means that it will accept the current local
David> time faster.
Please see (and add useful content to):
http://support.ntp.org/bin/view/Dev/NtpdsSyncStatus
--
Harlan Stenn <stenn@ntp.org>
http://ntpforum.isc.org - be a member!
Posted by Harlan, 9/7/2008 6:19:05 PM

Kay Hayen wrote:
>
> They are owned by the same people who then own installations of our system, so
> that wouldn't be an issue.
You will still have to ensure that they do not enable kiss of death on
those servers. Also you should make sure that they don't try to use
w32time, especially older versions, and if your timing requirements are
tighter than a few tens of milliseconds, that there are no Windows
machines involved.
>
> When I say "restrict" it is our own system that decides that ">x ms" offset is
> too bad and prevents ntpd from talking to it any further with a "restrict"
> command. If all 2 servers of an "other host" are "restricted", it will crash
> the software.
You are overriding NTP's selection algorithms. Effectively you are no
longer running NTP.
>
> All of that is own our making and control.
>
> Regarding the poll values. I am not sure why we do it the external NTPs as
> well. Could be that the dispersion can be brought down quicker this way
You are misusing "dispersion". Dispersion is an estimate of worst case
drift and reading resolution errors.
> on "entry hosts" and allow the "other hosts" to synchronize faster with them,
> or could be that we never considered it worthwhile to optimize it away.
> Well yes, but between 2 queries from the same client the ntpd will have made a
> certain adjustment. If the client gets to know this value, it will have to
ntpd is making adjustments at least every 4 seconds (old versions) and
as often as every clock tick. It does this by adjusting frequency not
by directly adjusting time.
> blame its own clock for that extra difference and assign it dispersion that
> it doesn't deserve.
>>
> That's a different model and I think Mr. Unruh already clarified to me that
> it's not the model that NTP uses. I think "my" model has little experience or
> qualification behind it. Current NTP on the other hand is proven.
>
Firstly, I don't know any time synchronisation software that doesn't
have a large step by step element.
More importantly, if you are going to micro-manage ntpd, you need a deep
understanding of how it works to know what the statistics really mean
and know what are realistic expectations.
Posted by David, 9/8/2008 7:14:14 AM

David,
David Woolley wrote:
> Kay Hayen wrote:
>
>> thanks to all who replied. Unfortunately the moderation bit made all of
>> my last replies expire and instead of reposting them, I choose to sum
>> things up in a single post.
>
> Expiration is not a moderation thing, it is something done by your
> USENET service provider to manage disk space. I think mine still has
> the whole thread, and Google groups certainly will, except for any that
> people have told it not to store.
>
> The other thing that can have this effect, is that many Usenet readers,
> by default, hide messages you have already read.
>
> On the other hand, it appears that you are submitting via the mail to
> new gateway. If your ISP is expiring items in your mailbox so soon, you
> need a new ISP.
If I understand Kay correctly then the problem is that he responded via the
questions@ list and the moderation bit was set there, which prevented some
of his articles from being gatewayed to the usenet.
In Kay's original thread Steve Kostecke mentioned he had removed the
moderation bit for Kay, but obviously that did not fully help.
Martin
--
Martin Burnicki
Meinberg Funkuhren
Bad Pyrmont
Germany
Posted by Martin, 9/8/2008 9:20:06 AM

Hello David,
> > When I say "restrict" it is our own system that decides that ">x ms"
> > offset is too bad and prevents ntpd from talking to it any further with a
> > "restrict" command. If all 2 servers of an "other host" are "restricted",
> > it will crash the software.
>
> You are overriding NTP's selection algorithms. Effectively you are no
> longer running NTP.
How would it be different from using the restrict command manually?
And why would it not be NTP?
> > All of that is own our making and control.
> >
> > Regarding the poll values. I am not sure why we do it the external NTPs
> > as well. Could be that the dispersion can be brought down quicker this
> > way
>
> You are misusing "dispersion". Dispersion is an estimate of worst case
> drift and reading resolution errors.
Well, dispersion only goes down with more samples to base the estimation on,
doesn't it? And we need that quickly if we want the server to influence the
hosts behind it quickly, say after an "NTP LAN" failure has ended (some people
have dedicated LANs for NTP).
> > on "entry hosts" and allow the "other hosts" to synchronize faster with
> > them, or could be that we never considered it worthwhile to optimize it
> > away. Well yes, but between 2 queries from the same client the ntpd will
> > have made a certain adjustment. If the client gets to know this value, it
> > will have to
>
> ntpd is making adjustments at least every 4 seconds (old versions) and
> as often as every clock tick. It does this by adjusting frequency not
> by directly adjusting time.
I was not concerned with how the kernel makes the adjustments, but rather that
a fixed time change over the period is known. The slew rate is known,
isn't it?
Let me use a car analogy, these things work. :-)
Let's assume a three-lane highway with 3 cars that try to drive at the same
speed. The car to the left is driving at (near) constant speed. The driver in
the middle accelerates and brakes according to his motor behaviour as well as
the observed difference in speed between him and the other one. Now what
should the driver to the right do?
In my view, he could take the acceleration of his neighbour into account when
making estimates of his own error.
Best regards,
Kay Hayen
Posted by kayhayen, 9/8/2008 3:43:55 PM

kayhayen@gmx.de (Kay Hayen) writes:
>Hello David,
>> > When I say "restrict" it is our own system that decides that ">x ms"
>> > offset is too bad and prevents ntpd from talking to it any further with a
>> > "restrict" command. If all 2 servers of an "other host" are "restricted",
>> > it will crash the software.
>>
>> You are overriding NTP's selection algorithms. Effectively you are no
>> longer running NTP.
>How would it be difference from using the restrict command manually?
>And why would it not be NTP?
>> > All of that is own our making and control.
>> >
>> > Regarding the poll values. I am not sure why we do it the external NTPs
>> > as well. Could be that the dispersion can be brought down quicker this
>> > way
>>
>> You are misusing "dispersion". Dispersion is an estimate of worst case
>> drift and reading resolution errors.
>Well, dispersion is going down only with more samples to base estimation on,
>isn't it? And we need that quick, if we want the server to influence the
>hosts behind it quickly, say after a "NTP LAN" failure ended (some people
>have dedicated LANs for NTP).
>> > on "entry hosts" and allow the "other hosts" to synchronize faster with
>> > them, or could be that we never considered it worthwhile to optimize it
>> > away. Well yes, but between 2 queries from the same client the ntpd will
>> > have made a certain adjustment. If the client gets to know this value, it
>> > will have to
>>
>> ntpd is making adjustments at least every 4 seconds (old versions) and
>> as often as every clock tick. It does this by adjusting frequency not
>> by directly adjusting time.
>I was not concerned with how the kernel makes the adjustments, but rather that
>the a fixed time change over the period is known. The slew rate is known,
>isn't it?
>Let me use a car analogy, these things work. :-)
>Lets assume a three lane high way with 3 cars that try to drive at the same
>speed. The car to the left is driving at (near) constant speed. The driver in
>the middle accelerates and braces according to his motor behaviour as well as
>the observed difference in speed between him and the other one. Now what
>should the driver to the right do?
The cars have the road as a reference. However, without the road, how does
car 3 know that car 2 is accelerating and decelerating and that it is not
his own car that is misbehaving? He does not. All he
can do is collect more cars and use the average behaviour to determine who
is behaving badly.
With only two other cars as a reference there is no way of deciding which
is weird.
And if he has the road as a reference, then use the road, not either of the
other cars (i.e. buy yourself a GPS receiver with PPS and then you will not
have to worry about what other cars are doing).
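The "collect more cars" point can be illustrated with a small sketch. It is emphatically not ntpd's real selection algorithm, just a median comparison under the assumption that a majority of sources is sane; the function name and the tolerance parameter are hypothetical.

```python
from statistics import median
from typing import Dict, List


def suspicious_sources(offsets_ms: Dict[str, float], tolerance_ms: float) -> List[str]:
    """Return sources whose offset is farther than tolerance_ms from the median.

    With fewer than three sources there is no majority to compare against,
    so nothing can be singled out and an empty list is returned.
    """
    if len(offsets_ms) < 3:
        return []
    mid = median(offsets_ms.values())
    return sorted(name for name, off in offsets_ms.items()
                  if abs(off - mid) > tolerance_ms)
```

With only two sources the function refuses to blame either one, which is the situation described above. A third source is the minimum needed before a disagreement can be attributed to anybody.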
>In my view, he could take the acceleration of his neighbour into account when
>making estimates of his own error.
>Best regards,
>Kay Hayen
Posted by Unruh, 9/8/2008 5:13:40 PM

Kay Hayen wrote:
> Hello David,
>
>>> When I say "restrict" it is our own system that decides that ">x ms"
>>> offset is too bad and prevents ntpd from talking to it any further with a
>>> "restrict" command. If all 2 servers of an "other host" are "restricted",
>>> it will crash the software.
>> You are overriding NTP's selection algorithms. Effectively you are no
>> longer running NTP.
>
> How would it be difference from using the restrict command manually?
>
> And why would it not be NTP?
>
>>> All of that is own our making and control.
>>>
>>> Regarding the poll values. I am not sure why we do it the external NTPs
>>> as well. Could be that the dispersion can be brought down quicker this
>>> way
>> You are misusing "dispersion". Dispersion is an estimate of worst case
>> drift and reading resolution errors.
>
> Well, dispersion is going down only with more samples to base estimation on,
> isn't it? And we need that quick, if we want the server to influence the
> hosts behind it quickly, say after a "NTP LAN" failure ended (some people
> have dedicated LANs for NTP).
>
>>> on "entry hosts" and allow the "other hosts" to synchronize faster with
>>> them, or could be that we never considered it worthwhile to optimize it
>>> away. Well yes, but between 2 queries from the same client the ntpd will
>>> have made a certain adjustment. If the client gets to know this value, it
>>> will have to
>> ntpd is making adjustments at least every 4 seconds (old versions) and
>> as often as every clock tick. It does this by adjusting frequency not
>> by directly adjusting time.
>
> I was not concerned with how the kernel makes the adjustments, but rather that
> the a fixed time change over the period is known. The slew rate is known,
> isn't it?
>
> Let me use a car analogy, these things work. :-)
>
> Lets assume a three lane high way with 3 cars that try to drive at the same
> speed. The car to the left is driving at (near) constant speed. The driver in
> the middle accelerates and braces according to his motor behaviour as well as
> the observed difference in speed between him and the other one. Now what
> should the driver to the right do?
>
> In my view, he could take the acceleration of his neighbour into account when
> making estimates of his own error.
Why should the driver in the right lane not ignore the driver in the
middle and try to match his speed to the leftmost driver? It seems to
me that this is analogous to preferring the stratum one server to the
stratum two server!
Posted by Richard, 9/8/2008 6:20:30 PM

Unruh wrote:
> kayhayen@gmx.de (Kay Hayen) writes:
>
>> Hello David,
>
>>>> When I say "restrict" it is our own system that decides that ">x ms"
>>>> offset is too bad and prevents ntpd from talking to it any further with a
>>>> "restrict" command. If all 2 servers of an "other host" are "restricted",
>>>> it will crash the software.
>>> You are overriding NTP's selection algorithms. Effectively you are no
>>> longer running NTP.
>
>> How would it be difference from using the restrict command manually?
>
>> And why would it not be NTP?
>
>>>> All of that is own our making and control.
>>>>
>>>> Regarding the poll values. I am not sure why we do it the external NTPs
>>>> as well. Could be that the dispersion can be brought down quicker this
>>>> way
>>> You are misusing "dispersion". Dispersion is an estimate of worst case
>>> drift and reading resolution errors.
>
>> Well, dispersion is going down only with more samples to base estimation on,
>> isn't it? And we need that quick, if we want the server to influence the
>> hosts behind it quickly, say after a "NTP LAN" failure ended (some people
>> have dedicated LANs for NTP).
>
>>>> on "entry hosts" and allow the "other hosts" to synchronize faster with
>>>> them, or could be that we never considered it worthwhile to optimize it
>>>> away. Well yes, but between 2 queries from the same client the ntpd will
>>>> have made a certain adjustment. If the client gets to know this value, it
>>>> will have to
>>> ntpd is making adjustments at least every 4 seconds (old versions) and
>>> as often as every clock tick. It does this by adjusting frequency not
>>> by directly adjusting time.
>
>> I was not concerned with how the kernel makes the adjustments, but rather that
>> the a fixed time change over the period is known. The slew rate is known,
>> isn't it?
>
>> Let me use a car analogy, these things work. :-)
>
>> Lets assume a three lane high way with 3 cars that try to drive at the same
>> speed. The car to the left is driving at (near) constant speed. The driver in
>> the middle accelerates and braces according to his motor behaviour as well as
>> the observed difference in speed between him and the other one. Now what
>> should the driver to the right do?
>
> The cars have the road as a reference. However without the road, how does
> car 3 know that car 2 is accelerating and decelerating and that it is not
> hiw own car that is misbehaving? He does not. All he
> can do is collect more cars and use the average behaviour to determine who
> is behaving badly.
Car 3 has a speedometer!
>
> With two other cars only as a reference there is no way of deciding which
> is weird.
>
> And if he has the road as a reference, then use the road, not either of the
> other cars ( ie buy yourself a GPS receiver with PPS and then you will not
> have to worry about what other cars are doing).
>
>
>> In my view, he could take the acceleration of his neighbour into account when
>> making estimates of his own error.
>
>> Best regards,
>> Kay Hayen
Posted by Richard, 9/8/2008 6:23:24 PM

On 2008-09-08, Martin Burnicki <martin.burnicki@meinberg.de> wrote:
> If I understand Kay correctly then the problem is that he responded
> via the questions@ list and the moderation bit was set there, which
> prevented some of his articles from being gatewayed to the usenet.
When messages are held for moderation they are not sent to the
mailing-list. _None_ of the list subscribers (which includes the
gateway) see those messages until they are released.
> In Kay's original thread Steve Kostecke mentioned he had removed the
> moderation bit for Kay, but obviously that did not fully help.
I stated that I released Kay's messages but that I left the moderation
bit alone and deferred to the list-master.
--
Steve Kostecke <kostecke@ntp.org>
NTP Public Services Project - http://support.ntp.org/
Posted by Steve, 9/8/2008 6:50:53 PM

Kay Hayen wrote:
>
>
> How would it be difference from using the restrict command manually?
Because manual use would normally be based on significant thought and
measurements over an extended period.
>
> And why would it not be NTP?
Because a key part of NTP is the algorithm used to identify and reject
unreliable sources of time. These actually work better if you have many
sources.
> >
> Well, dispersion is going down only with more samples to base estimation on,
The calculation initially assumes that the source jitter might be very
large until it has evidence to the contrary.
> isn't it? And we need that quick, if we want the server to influence the
> hosts behind it quickly, say after a "NTP LAN" failure ended (some people
> have dedicated LANs for NTP).
iburst covers that.
>
> I was not concerned with how the kernel makes the adjustments, but rather that
> the a fixed time change over the period is known. The slew rate is known,
> isn't it?
The actual change in time in any period should be zero, within
statistical error. The real excess slew rate should also be zero within
statistical error. The assumed length of a tick, which is probably the
reciprocal of what you mean by the slew rate, is continuously varying.
You would need to integrate this to get the excess number of ticks over
a period, which is, I think, your concept of error.
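The integration mentioned here can be sketched as below, under the simplifying assumption that the frequency correction is constant within each sampling interval. The sample format and the function name are made up for illustration; ntpd does not report its state in this form.

```python
from typing import Iterable, Tuple


def accumulated_offset_s(samples: Iterable[Tuple[float, float]]) -> float:
    """Integrate piecewise-constant frequency corrections over time.

    Each sample is (interval_seconds, frequency_correction_ppm); the result
    is the total time, in seconds, gained or lost over all intervals.
    """
    return sum(interval * ppm * 1e-6 for interval, ppm in samples)
```

A correction of +10 ppm for 64 s followed by -10 ppm for 64 s integrates to zero, which matches the statement that the net change over a period should be zero within statistical error.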
(The big argument between chrony and ntpd is about whether ntpd really
gives the best estimate of true time for real inputs.)
>
> Let me use a car analogy, these things work. :-)
>
> Lets assume a three lane high way with 3 cars that try to drive at the same
> speed. The car to the left is driving at (near) constant speed. The driver in
> the middle accelerates and braces according to his motor behaviour as well as
> the observed difference in speed between him and the other one. Now what
> should the driver to the right do?
>
> In my view, he could take the acceleration of his neighbour into account when
> making estimates of his own error.
Analogies are always unsafe in fora, but the second car doesn't actually
know its acceleration (remember, if they could actually see the road,
they would use that as reference). All it knows is how hard its driver
is pushing on the accelerator.
Moreover, the drivers are looking at each other through mirrors that are
vibrating violently and unpredictably, such that the apparent position
of the neighbours is varying much more than their true relative
position. To a significant extent the mirrors are moving independently
of each other (this probably requires that the third driver actually be
the middle one, to make the physical model sensible).
Posted by David, 9/8/2008 9:19:52 PM

"Richard B. Gilbert" <rgilbert88@comcast.net> writes:
>Unruh wrote:
>> kayhayen@gmx.de (Kay Hayen) writes:
>>
>>> Hello David,
>>
>>>>> When I say "restrict" it is our own system that decides that ">x ms"
>>>>> offset is too bad and prevents ntpd from talking to it any further with a
>>>>> "restrict" command. If all 2 servers of an "other host" are "restricted",
>>>>> it will crash the software.
>>>> You are overriding NTP's selection algorithms. Effectively you are no
>>>> longer running NTP.
>>
>>> How would it be difference from using the restrict command manually?
>>
>>> And why would it not be NTP?
>>
>>>>> All of that is own our making and control.
>>>>>
>>>>> Regarding the poll values. I am not sure why we do it the external NTPs
>>>>> as well. Could be that the dispersion can be brought down quicker this
>>>>> way
>>>> You are misusing "dispersion". Dispersion is an estimate of worst case
>>>> drift and reading resolution errors.
>>
>>> Well, dispersion is going down only with more samples to base estimation on,
>>> isn't it? And we need that quick, if we want the server to influence the
>>> hosts behind it quickly, say after a "NTP LAN" failure ended (some people
>>> have dedicated LANs for NTP).
>>
>>>>> on "entry hosts" and allow the "other hosts" to synchronize faster with
>>>>> them, or could be that we never considered it worthwhile to optimize it
>>>>> away. Well yes, but between 2 queries from the same client the ntpd will
>>>>> have made a certain adjustment. If the client gets to know this value, it
>>>>> will have to
>>>> ntpd is making adjustments at least every 4 seconds (old versions) and
>>>> as often as every clock tick. It does this by adjusting frequency not
>>>> by directly adjusting time.
>>
>>> I was not concerned with how the kernel makes the adjustments, but rather that
>>> the a fixed time change over the period is known. The slew rate is known,
>>> isn't it?
>>
>>> Let me use a car analogy, these things work. :-)
>>
>>> Lets assume a three lane high way with 3 cars that try to drive at the same
>>> speed. The car to the left is driving at (near) constant speed. The driver in
>>> the middle accelerates and braces according to his motor behaviour as well as
>>> the observed difference in speed between him and the other one. Now what
>>> should the driver to the right do?
>>
>> The cars have the road as a reference. However without the road, how does
>> car 3 know that car 2 is accelerating and decelerating and that it is not
>> hiw own car that is misbehaving? He does not. All he
>> can do is collect more cars and use the average behaviour to determine who
>> is behaving badly.
>Car 3 has a speedometer!
Yes, that is with reference to the road. Car three should thus completely
ignore the other two cars and use his speedometer.
I.e., put up a GPS receiver with a PPS and use that as your time source, and
ignore all the other ntp time sources, except perhaps as sanity checks (e.g.
if your speedometer breaks you should get to know about it by occasionally
looking at the other cars).
>>
>> With two other cars only as a reference there is no way of deciding which
>> is weird.
>>
>> And if he has the road as a reference, then use the road, not either of the
>> other cars ( ie buy yourself a GPS receiver with PPS and then you will not
>> have to worry about what other cars are doing).
>>
>>
>>> In my view, he could take the acceleration of his neighbour into account when
>>> making estimates of his own error.
>>
>>> Best regards,
>>> Kay Hayen
Posted by Unruh, 9/8/2008 9:36:16 PM

Steve,
Steve Kostecke wrote:
> On 2008-09-08, Martin Burnicki <martin.burnicki@meinberg.de> wrote:
>> In Kay's original thread Steve Kostecke mentioned he had removed the
>> moderation bit for Kay, but obviously that did not fully help.
>
> I stated that I released Kay's messages but that I left the moderation
> bit alone and deferred to the list-master.
Sorry, I mis-remembered this.
Has the moderation bit for Kay been set because he posted to the questions@
list without having subscribed to the list?
Martin
--
Martin Burnicki
Meinberg Funkuhren
Bad Pyrmont
Germany
Posted by Martin, 9/9/2008 7:41:15 AM

>The "rules" about how often to query a daemon are not all that
>complicated. The fact that there ARE rules is due to some history;
>google for "Netgear Wisconsin" for the sordid details. For a "second
>opinion" google for "DLink PHK".
There is a good summary at:
NTP server misuse and abuse
http://en.wikipedia.org/wiki/NTP_server_misuse_and_abuse
--
These are my opinions, not necessarily my employer's. I hate spam.
Posted by hal, 9/9/2008 8:04:41 AM

Unruh wrote:
>
> Yes, that is with reference to the road. Car three should thus completely
> ignore the other two cars and use his speedometer.
>
> Ie, put up a GPS receiver with a PPS and use that as your time source, and
> ignore all the other ntp time sources, except perhaps as sanity checks (eg
> if you r speedometer breaks you should get to know about it by occasionally
> looking at the other cars)
A) One GPS for each box, or
B) a single GPS with a PPS line to all boxes?
A:
Doesn't that impact reliability?
You add the failure probability of a GPS-unit to each Box
where one failure will make the whole system fail.
What about doing startup of all involved boxes from the (outside)
upstream timeserver?
a question in this context:
could I use something like this for a group of boxes to sync:
server $external_upstream_host
foreach box $neighbours
    peer $box
########################
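Spelled out, the pseudo-configuration above could be generated per box roughly like this. This is only a sketch of the question as asked, not a recommendation: `server` and `peer` are the ntp.conf directives named in the pseudocode, while the function name, the host names in the usage note, and the rule that a box skips itself are assumptions added here.

```python
from typing import Iterable, List


def render_ntp_conf(upstream: str, neighbours: Iterable[str], own_name: str) -> str:
    """Build ntp.conf lines: one upstream server plus a peer line per neighbour."""
    lines: List[str] = [f"server {upstream}"]
    for box in neighbours:
        if box != own_name:          # a box should not peer with itself
            lines.append(f"peer {box}")
    return "\n".join(lines) + "\n"
```

For a box named box2 in a group of box1, box2 and box3, this yields one server line for the upstream host and peer lines for box1 and box3.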
uwe
Posted by Uwe, 9/9/2008 8:34:17 AM

On 2008-09-09, Martin Burnicki <martin.burnicki@meinberg.de> wrote:
> Has the moderation bit for Kay been set because he posted to the questions@
> list without having subscribed to the list?
Kay did the right thing and subscribed to the list before posting to it.
Posts from all new subscribers are held for moderation (i.e. "their
moderation bit is set") until they have demonstrated that they are not
attempting to use the list in an abusive manner. This policy keeps out
the "drive-by" spammers.
--
Steve Kostecke <kostecke@ntp.org>
NTP Public Services Project - http://support.ntp.org/
Posted by Steve, 9/9/2008 12:15:57 PM

Uwe Klein <uwe_klein_habertwedt@t-online.de> writes:
>Unruh wrote:
>>
>> Yes, that is with reference to the road. Car three should thus completely
>> ignore the other two cars and use his speedometer.
>>
>> Ie, put up a GPS receiver with a PPS and use that as your time source, and
>> ignore all the other ntp time sources, except perhaps as sanity checks (eg
>> if you r speedometer breaks you should get to know about it by occasionally
>> looking at the other cars)
>A)One GPS to each box or
>B) a single GPS with PPS line to all boxes?
Whichever you want. Up to you.
>A:
>Doesn't that impact reliability?
>You add the failure probability of a GPS-unit to each Box
>where one failure will make the whole system fail.
So, that is why ntp has backup servers. You have a single failure point
anyway-- the network. It goes down, and nothing can get the time.
>What about doing startup of all involved boxes from the (outside)
>upstream timeserver?
???
>a question in this context:
>could I use something like this for a group of boxes to sync:
>server $external_upstream_host
>foreach box $neighbours
> peer $box
>########################
>uwe
Posted by Unruh, 9/9/2008 6:39:53 PM

Unruh wrote:
> Uwe Klein <uwe_klein_habertwedt@t-online.de> writes:
>
>
>>Unruh wrote:
>>
>>>Yes, that is with reference to the road. Car three should thus completely
>>>ignore the other two cars and use his speedometer.
>>>
>>>Ie, put up a GPS receiver with a PPS and use that as your time source, and
>>>ignore all the other ntp time sources, except perhaps as sanity checks (eg
>>>if you r speedometer breaks you should get to know about it by occasionally
>>>looking at the other cars)
>
>
>>A)One GPS to each box or
>>B) a single GPS with PPS line to all boxes?
>
>
> Whichever you want. Up to you.
>
>
>>A:
>>Doesn't that impact reliability?
>
>
>>You add the failure probability of a GPS-unit to each Box
>>where one failure will make the whole system fail.
>
>
> So, that is why ntp has backup servers. You have a single failure point
> anyway-- the network. It goes down, and nothing can get the time.
That actually is _three_ different scenarios.

time over the network:
network fails
    1: time
    2: the system as a whole
    failure of network infrastructure
    thus does not add to the probability of the complete system failing.

time over PPS/GPS 1 unit with signaling to each box:
network fails
    2: the system as a whole
GPS fails
    1: time
    -> the system as a whole
This adds up to a higher failure rate/probability.

time over PPS/GPS unit per box:
network fails
    2: the system as a whole
GPS fails
    1: time
    -> the system as a whole
This adds up to a higher failure rate/probability.
With the added disadvantage that GPS failure overall
is single failure times number of boxes.
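The three scenarios above can be put into numbers with a small sketch. It assumes independent failures and takes the premise of this post at face value, namely that the loss of any single time source brings the whole system down, with no backup servers. The probabilities in the usage note are invented purely for illustration.

```python
def system_failure_probability(p_network: float, p_gps: float, gps_units: int) -> float:
    """Probability that at least one required component fails.

    Assumes independent failures and that the system needs the network
    and every one of the gps_units receivers to keep working.
    """
    p_all_ok = (1.0 - p_network) * (1.0 - p_gps) ** gps_units
    return 1.0 - p_all_ok
```

With a made-up network failure probability of 0.01 and GPS failure probability of 0.02, time over the network (`gps_units=0`) gives 0.01, one shared GPS gives about 0.0298, and one GPS per box on four boxes gives more again. Configuring fallback sources would change the model, since a GPS failure would then no longer count as a system failure.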
uwe
Posted by Uwe, 9/9/2008 7:08:57 PM

Unruh wrote:
> Uwe Klein <uwe_klein_habertwedt@t-online.de> writes:
>
>> Unruh wrote:
>>> Yes, that is with reference to the road. Car three should thus completely
>>> ignore the other two cars and use his speedometer.
>>>
>>> Ie, put up a GPS receiver with a PPS and use that as your time source, and
>>> ignore all the other ntp time sources, except perhaps as sanity checks (eg
>>> if you r speedometer breaks you should get to know about it by occasionally
>>> looking at the other cars)
>
>> A)One GPS to each box or
>> B) a single GPS with PPS line to all boxes?
>
> Whichever you want. Up to you.
>
>> A:
>> Doesn't that impact reliability?
>
>> You add the failure probability of a GPS-unit to each Box
>> where one failure will make the whole system fail.
>
> So, that is why ntp has backup servers. You have a single failure point
> anyway-- the network. It goes down, and nothing can get the time.
>
<snip>
If the possibility of failure of your network or your internet
connection worries you, you can use a modem and a telephone line as a
backup! Or you can get a GPS receiver, WWV/WWVH/WWVB receiver or an
atomic clock of your very own. Most sites don't bother because their
requirements are not that tight. FWIW, a system that has been
synchronized by NTP will tend to stay close to the correct time for a
reasonable period of time as long as the environment does not change
significantly. If the network fails AND the air conditioning fails you
are in trouble!
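How long "a reasonable period" lasts depends on the residual frequency error of the free-running clock. A rough sketch, assuming a constant frequency error; the ppm values are hypothetical, and a temperature change (the air conditioning failing) would shift the frequency and make things worse:

```python
def holdover_error_seconds(freq_error_ppm: float, elapsed_seconds: float) -> float:
    """Time error accumulated by a clock with a constant frequency error."""
    return freq_error_ppm * 1e-6 * elapsed_seconds


SECONDS_PER_DAY = 86_400

# Hypothetical residual frequency errors after losing all time sources.
for ppm in (0.1, 1.0, 10.0):
    error_ms = holdover_error_seconds(ppm, SECONDS_PER_DAY) * 1000.0
    print(f"{ppm:5.1f} ppm -> {error_ms:7.2f} ms of drift per day")
```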
Richard, 9/9/2008 7:52:30 PM
Uwe Klein <uwe_klein_habertwedt@t-online.de> writes:
>Unruh wrote:
>> Uwe Klein <uwe_klein_habertwedt@t-online.de> writes:
>>
>>
>>>Unruh wrote:
>>>
>>>>Yes, that is with reference to the road. Car three should thus completely
>>>>ignore the other two cars and use his speedometer.
>>>>
>>>>Ie, put up a GPS receiver with a PPS and use that as your time source, and
>>>>ignore all the other ntp time sources, except perhaps as sanity checks (eg
>>>>if you r speedometer breaks you should get to know about it by occasionally
>>>>looking at the other cars)
>>
>>
>>>A)One GPS to each box or
>>>B) a single GPS with PPS line to all boxes?
>>
>>
>> Whichever you want. Up to you.
>>
>>
>>>A:
>>>Doesn't that impact reliability?
>>
>>
>>>You add the failure probability of a GPS-unit to each Box
>>>where one failure will make the whole system fail.
>>
>>
>> So, that is why ntp has backup servers. You have a single failure point
>> anyway-- the network. It goes down, and nothing can get the time.
>That actually is _three_ different scenarios.
>time over the network:
>network fails
> 1: time
> 2: the system as a whole
> failure of network infrastructure
> thus does not add to the probability of the complete system failing.
>time over PPS/GPS 1 unit with signaling to each box:
>network fails
> 2: the system as a whole
>GPS fails
> 1: time
> -> the system as a whole
> This adds up to a higher failure rate/probability.
>time over PPS/GPS unit per box:
>network fails
> 2: the system as a whole
>GPS fails
> 1: time
> -> the system as a whole
> This adds up to a higher failure rate/probability.
> With the added disadvantage that GPS failure overall
> is single failure times number of boxes.
So, put a GPS connected to each box. That will be a stratum 0 source and
will be selected by ntp. If that fails, have each of the other machines as
a backup. They will be stratum 1 sources. Then have the system go out onto
the world wide net to pool.ntp.org. Those will be stratum 2 or numerically
higher. Each backs up the other. Thus each machine will get its time from
GPS (usec precision). If that fails, they get it from the local machines
(10s of usec precision). If that all fails, they get it from the net (ms
precision). If that all fails, you are SOOL. You probably have other
worries anyway.
How many belts and braces you want is entirely up to you.
I would have one GPS on one machine. Everything gets its time from that,
unless it fails, in which case pool.ntp.org would act as a backup. But it is
entirely up to you.
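The layered fallback can be illustrated with a deliberately simplified sketch that picks the reachable source with the lowest stratum. This is not how ntpd actually selects sources (it uses intersection and clustering algorithms over all candidates); the class, function, and source names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TimeSource:
    name: str        # hypothetical label
    stratum: int     # stratum of the source itself
    reachable: bool


def pick_source(sources: Sequence[TimeSource]) -> Optional[TimeSource]:
    """Return the reachable source with the lowest stratum, or None."""
    candidates = [s for s in sources if s.reachable]
    return min(candidates, key=lambda s: s.stratum, default=None)


def tiers(gps_ok: bool, peers_ok: bool, net_ok: bool) -> list:
    return [
        TimeSource("local GPS/PPS", 0, gps_ok),
        TimeSource("neighbouring box", 1, peers_ok),
        TimeSource("pool server", 2, net_ok),
    ]


for state in [(True, True, True), (False, True, True),
              (False, False, True), (False, False, False)]:
    chosen = pick_source(tiers(*state))
    print(state, "->", chosen.name if chosen else "no time source left")
```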
>uwe
Unruh, 9/9/2008 8:23:06 PM
Kay Hayen wrote:
> What worried me more was how often we can query the local ntpd before it will
> have an adverse effect. Meantime I somehow I sought to convince me I should
> be able to convince myself that ntpq requests are served at a different
> priority (other socket) than ntpd requests are. I didn't find 2 sockets
> though.
>
They aren't. It's the same socket and each packet is responded to in
turn irrespective of the content. It's also not possible to create a
separate socket unless we have a separate command channel and that does
not currently exist and is nowhere defined in the protocol.
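Both kinds of packet arrive on the same UDP port and are told apart only by the mode field in the first header byte (leap indicator in the top 2 bits, version in the next 3, mode in the low 3). A minimal sketch of that classification; the function names are hypothetical and this is not ntpd's own code:

```python
MODE_NAMES = {
    1: "symmetric active",
    2: "symmetric passive",
    3: "client",
    4: "server",
    5: "broadcast",
    6: "control message (ntpq)",
    7: "private (ntpdc)",
}


def ntp_version(first_byte: int) -> int:
    """Version number: bits 3-5 of the first header byte."""
    return (first_byte >> 3) & 0x07


def ntp_mode(first_byte: int) -> int:
    """Mode: the low three bits of the first header byte."""
    return first_byte & 0x07


# 0x23 is a typical version 4 client request; 0x16 is a version 2 control message.
for byte in (0x23, 0x16):
    mode = ntp_mode(byte)
    print(f"0x{byte:02x}: version {ntp_version(byte)}, mode {mode} ({MODE_NAMES[mode]})")
```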
>> Briefly, you use the defaults for MINPOLL and MAXPOLL. You may use the
>> "iburst" keyword in a server statement for fast startup. You may use
>> the "burst" keyword ONLY with the permission of the the server's owner.
>> 99.99% of NTP installations will work very well using these rules". If
>> yours does not, ask here for help!
>
> Now speaking about our system, not the middleware, with connections as
> follows:
>
> External NTPs <-> 2 entry hosts <-> 8 other hosts.
>
> And iburst and minpoll=maxpoll=5 to improve the results.
This indicates that you don't understand NTP. You should never ever
change the minpoll and maxpoll values unless you understand the NTP
algorithms in detail and understand the consequences of changing them.
The default values were very carefully chosen to provide a balance
between various conflicting requirements to provide the most stable
clock discipline over a wide range of environments. You are
undersampling at the start of NTP and then oversampling as it starts to
stabilize the discipline loop.
>
> Currently we observe that both entry hosts can both become restricted due to
> large offsets on other hosts, so they become restricted and that will make
> the software refuse to go on. Ideally that would not happen.
>
If the servers that it uses become divergent it will be unable to pick
the "best" one and it will become unsynchronized.
> I will try to formulate questions:
>
> When the other hosts synchronize to the entry hosts of our system, don't the
> other hosts ntpd know when and how much these entry hosts changed their time
> due to input?
>
> Would NTP would be more robust if we would configure routing on the entry
> hosts, so that they can all speak directly with the external NTPs on their
> own?
>
Yes, since the stratum will be lower, so the error budget will also
be lower.
> Is the use of ntpdate before starting ntpd recommended and/or does the iburst
> option replace it?
>
ntpdate is deprecated and is not normally needed. Make sure you start
ntpd with the -g option to step the clock initially to close to the
correct tick.
Danny
> Best regards,
> Kay Hayen
mayer, 10/13/2008 3:43:02 AM
Danny Mayer wrote:
> Kay Hayen wrote:
>> And iburst and minpoll=maxpoll=5 to improve the results.
>
> This indicates that you don't understand NTP. You should never ever
> change the minpoll and maxpoll values unless you understand the NTP
> algorithms in detail and understand the consequences of changing them.
> The default values were very carefully chosen to provide a balance
> between various conflicting requirements to provide the most stable
Those conflicting requirements make assumptions about the environment in
which NTP is operating. Those assumptions probably aren't valid when
the servers are on the same high speed, low traffic, network. Having
said that, one shouldn't just set minpoll and maxpoll low, but should
actually measure the results and find optimum values for the actual
conditions.
> clock discipline over a wide range of environments. You are
> undersampling at the start of NTP and then oversampling as it starts to
> stabilize the discipline loop.
ntpd always oversamples. Changing the limits restricts the range of filter
time constants used. Setting it low improves convergence on startup,
and re-convergence after a temperature change, which is why there is so
much use of it - ntpd is failing to meet a market demand, and setting
both these low has become the urban folklore solution. It also tends to
minimise the value of "offset" at other times, but that is not
necessarily good, as offset is not the same thing as error, and,
ideally, would be uncorrelated with it.
(ntpd starts to back off the time constant long before the startup
transient is complete, so keeping it artificially low helps there. For
temperature changes, it takes time for the time constant to ramp down,
which is avoided by keeping it low.)
The reasons for not doing it are that it makes ntpd try to follow short
term variations in offset, which are likely to be due to network
conditions, rather than true time errors, and it makes the frequency
less stable, which means that short durations are measured less
accurately and time will diverge more quickly if the connection to the
servers is lost. It also imposes an unnecessary load on the servers.
> ntpdate is deprecated and is not normally needed. Make sure you start
> ntpd with the -g option to step the clock initially to close to the
> correct tick.
-g doesn't step the clock; it simply allows the clock to be stepped by
more than 1000s, the first time only. Clock stepping is still subject to
the 128ms minimum offset. Both numbers are configurable, although changing
them may disable some functions.
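The interplay of the two thresholds can be sketched as a small decision function. It uses the default values quoted above and deliberately ignores details such as how long an offset must persist before ntpd actually steps; the function and parameter names are hypothetical.

```python
STEP_THRESHOLD = 0.128    # seconds; below this the clock is slewed
PANIC_THRESHOLD = 1000.0  # seconds; above this ntpd normally gives up


def clock_action(offset: float, first_adjustment: bool, g_flag: bool) -> str:
    """Simplified decision: 'slew', 'step', or 'exit' for a given offset."""
    magnitude = abs(offset)
    if magnitude > PANIC_THRESHOLD and not (g_flag and first_adjustment):
        return "exit"   # -g only lifts the limit for the first adjustment
    if magnitude > STEP_THRESHOLD:
        return "step"
    return "slew"


print(clock_action(0.050, first_adjustment=True, g_flag=True))     # small offset
print(clock_action(0.500, first_adjustment=False, g_flag=False))   # above 128 ms
print(clock_action(5000.0, first_adjustment=True, g_flag=True))    # large, first time, -g
print(clock_action(5000.0, first_adjustment=False, g_flag=True))   # large, later on
```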
David, 10/13/2008 9:08:18 AM
On Tue, Sep 9, 2008 at 2:52 PM, Richard B. Gilbert
<rgilbert88@comcast.net> wrote:
> FWIW, a system that has been
> synchronized by NTP will tend to stay close to the correct time for a
> reasonable period of time as long as the environment does not change
> significantly. If the network fails AND the air conditioning fails you
> are in trouble!
That is, of course, precisely what happens in many long-term power
outages. Typical UPS battery run times in a datacenter are in minutes,
not hours. And UPSes rarely back up the cooling system. If you don't have
a working generator on standby with plenty of fuel, you're up the
proverbial creek.
Even if you have the generators, you have to be careful. A colocation
provider recently had an outage that was interesting. A truck ran into
their (exterior) transformers, cutting utility power. No problem, they
have generators, right? Well, their water chillers could not restart
fast enough after the generators came on line, so the rapidly
increasing temperature caused about 1/3 of the servers in their
datacenter to shut down. All told, their SLA credits amounted to millions
of dollars.
Focusing on extreme redundancy for one piece of your infrastructure
(time) is sort of pointless if you don't have fully tested redundancy
in the lower layers of the system (physical plant, power, cooling,
network, etc.).
--
RPM
malayter, 10/13/2008 1:30:22 PM
mayer@ntp.isc.org (Danny Mayer) writes:
....
>>> Briefly, you use the defaults for MINPOLL and MAXPOLL. You may use the
>>> "iburst" keyword in a server statement for fast startup. You may use
>>> the "burst" keyword ONLY with the permission of the the server's owner.
>>> 99.99% of NTP installations will work very well using these rules". If
>>> yours does not, ask here for help!
>>
>> Now speaking about our system, not the middleware, with connections as
>> follows:
>>
>> External NTPs <-> 2 entry hosts <-> 8 other hosts.
>>
>> And iburst and minpoll=maxpoll=5 to improve the results.
>This indicates that you don't understand NTP. You should never ever
>change the minpoll and maxpoll values unless you understand the NTP
>algorithms in detail and understand the consequences of changing them.
>The default values were very carefully chosen to provide a balance
>between various conflicting requirements to provide the most stable
>clock discipline over a wide range of environments. You are
>undersampling at the start of NTP and then oversampling as it starts to
>stabilize the discipline loop.
The lower value on startup is to try to make ntp responsive at the
beginning, because it is so slow to correct errors. The reason for the
longer value when running is twofold: to reduce the network demands on
servers (probably the most important) and to increase the baseline for
drift determination (because of NTP's memoryless design). The former is
important if you are using public servers. The latter is important if you
lose network connectivity for days at a time. If you use your own server,
and your network is stable, a shorter maxpoll is better: better control
and faster response to computer clock changes.
>>
>> Currently we observe that both entry hosts can both become restricted due to
>> large offsets on other hosts, so they become restricted and that will make
>> the software refuse to go on. Ideally that would not happen.
>>
>If the servers that it uses become divergent it will be unable to pick
>the "best" one and it will become unsynchronized.
>> I will try to formulate questions:
>>
>> When the other hosts synchronize to the entry hosts of our system, don't the
>> other hosts ntpd know when and how much these entry hosts changed their time
>> due to input?
No idea what this means. All a client gets is the offset of its clock with
respect to the server clock, and an estimate of the dispersion of the
server's clock. I do not know what "how much these entry hosts changed
their time due to input" means, but my guess is that the answer is "No, the
clients do not get any information about the internal workings of the
server".
>>
>> Would NTP would be more robust if we would configure routing on the entry
>> hosts, so that they can all speak directly with the external NTPs on their
>> own?
>>
>Yes since the stratum will be lower so that the error budget will also
>be lower.
That depends on your routers. If you have routers with bad
latency/dispersion, it may be worse due to network delays/variability.
>> Is the use of ntpdate before starting ntpd recommended and/or does the iburst
>> option replace it?
>>
>ntpdate is deprecated and is not normally needed. Make sure you start
>ntpd with the -g option to step the clock initially to close to the
>correct tick.
>Danny
>> Best regards,
>> Kay Hayen
Unruh, 10/13/2008 5:14:17 PM
David,
The ntpd parameter constellation is indeed tuned for a necessarily wide
range of scenarios and may not be optimal for any particular case. From
an engineering point of view the solution for the minpoll/maxpoll issue
is obvious. Determine the Allan intercept as described in several
places, my papers and my book. The poll interval is carefully set at
1/32 the time constant, which should be at the intercept. So set minpoll
and maxpoll to the log2 of that value. Yes, the loop is purposely
oversampled with respect to the time constant, but not with respect to
the Allan intercept.
The Allan deviation characteristic displayed in the briefings on the NTP
project page should give a hint how the intercept varies with different
operating systems and network links. Indeed, if you have a fast LAN,
PCnet NIC and 3-GHz machine, the optimum poll interval is probably more
like 4 (16 s), but probably not 3 (8 s), as that invites increased
vulnerability to frequency surges.
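As a rough sketch of the arithmetic: minpoll and maxpoll are exponents of two, and the recipe above reads as "poll interval = Allan intercept / 32, then take log2". The code below follows that reading of the post; the intercept values are hypothetical and would have to be measured for a real system.

```python
import math


def poll_seconds(exponent: int) -> int:
    """Poll interval in seconds for a given minpoll/maxpoll exponent."""
    return 2 ** exponent


def poll_exponent_for_intercept(allan_intercept_seconds: float) -> int:
    """Poll exponent for a time constant placed at the Allan intercept.

    Assumes the poll interval is 1/32 of the time constant, as stated above.
    """
    poll_interval = allan_intercept_seconds / 32.0
    return round(math.log2(poll_interval))


for exponent in (3, 4, 5, 6, 10):
    print(f"poll exponent {exponent:2d} -> {poll_seconds(exponent):5d} s")

# Hypothetical intercepts: a fast LAN versus a more typical network path.
for intercept in (512.0, 2048.0):
    print(f"intercept {intercept:6.0f} s -> minpoll = maxpoll = "
          f"{poll_exponent_for_intercept(intercept)}")
```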
The poll adjust algorithm does not do what you expect. See line 644 et
seq in ntp_loopfilter.c and the commentary there. This algorithm is the
result of literally 25 years of experiment and refinement. It is not
necessarily designed for rapid initial convergence; it is designed to be
sensitive to frequency surges once convergence has stabilized. If ntpd is
restarted after that, the frequency file avoids the initial convergence.
Dave
David Woolley wrote:
> Danny Mayer wrote:
>
>> Kay Hayen wrote:
>
>
>>> And iburst and minpoll=maxpoll=5 to improve the results.
>>
>>
>> This indicates that you don't understand NTP. You should never ever
>> change the minpoll and maxpoll values unless you understand the NTP
>> algorithms in detail and understand the consequences of changing them.
>> The default values were very carefully chosen to provide a balance
>> between various conflicting requirements to provide the most stable
>
>
> Those conflicting requirements make assumptions about the environment in
> which NTP is operating. Those assumptions probably aren't valid when
> the servers are on the same high speed, low traffic, network. Having
> said that, one shouldn't just set minpoll and maxpoll low, but should
> actually measure the results and find optimum values for the actual
> conditions.
>
>> clock discipline over a wide range of environments. You are
>> undersampling at the start of NTP and then oversampling as it starts to
>> stabilize the discipline loop.
>
>
> ntpd always oversamples. Changing the limits limits the range of filter
> time constants used. Setting it low, improves convergence on startup,
> and re-convergence after a temperature change, which is why there is so
> much use of it - ntpd is failing to meet a market demand, and setting
> both these low has become the urban folklore solution. It also tends to
> minimise the value of "offset" at other times, but that is not
> necessarily good, as offset is not the same thing as error, and,
> ideally, would be uncorrelated with it.
>
> (ntpd starts to back off the time constant long before the startup
> transient is complete, so keeping it artificially low helps there. For
> temperature changes, it takes time for the time constant to ramp down,
> which is avoided by keeping it low.)
>
> The reasons for not doing it are that it makes ntpd try to follow short
> term variations in offset, which are likely to be due to network
> conditions, rather than true time errors, and it makes the frequency
> less stable, which means that short durations are measured less
> accurately and time will diverge more quickly if connections to the
> servers is lost. It also imposes an unnecessary load on the servers.
>
>> ntpdate is deprecated and is not normally needed. Make sure you start
>> ntpd with the -g option to step the clock initially to close to the
>> correct tick.
>
>
> -g doesn't step the clock, it simply allows the clock to be stepped by
> more than 1000s, the first time. Clock stepping is still subject to the
> 128ms minimum offset. Both numbers are configurable, although changing
> them may disable some functions.
>
David, 10/14/2008 3:53:03 AM