There's been some discussion on the Fedora-devel list about ways to speed
up booting for workstations. One of the things that slows down the boot
process is waiting for an initial network time sync. I'd like to solicit
opinions on how to organize the interaction between ntpd, the OS, the boot
scripts, the network interfaces (which may come and go; think mobile
devices), and possible hot-plugged local time sources. There was also
discussion a few months ago about getting NTP server addresses from DHCP,
so that should be considered.
Kenneth | 2/2/2005 2:48:02 PM

With a good ntp.drift file and the use of iburst, ntpd should be able to get
things in line in 11-15 seconds.
H
Harlan | 2/2/2005 4:21:44 PM

At 8:48 AM -0600 2005-02-02, Kenneth Porter wrote:
> There's been some discussion on the Fedora-devel list about ways to speed
> up booting for workstations. One of the things that slows down the boot
> process is waiting for an initial network time sync. I'd like to solicit
> opinions on how to organize the interaction between ntpd, the OS, the boot
> scripts, the network interfaces (which may come and go; think mobile
> devices), and possible hot-plugged local time sources.
The problem is that there are many services which really need
proper time sync in order to operate correctly. You could just turn
off time sync and let these things run freely, or you could find some
way to shift the startup sequence so that only those things that
depend on time sync are started after ntpd, and everything else is
started before.
But beyond the standard mechanisms to speed up the initialization
process of starting ntpd (e.g., using "iburst" on all the server
configuration lines in your /etc/ntp.conf, etc...), I don't see any
other ways to make this process faster.
Either you do or do not run ntpd. Either you do or do not run
those services which depend on good time sync. I don't see that
there can be any other options.
--
Brad Knowles, <brad@stop.mail-abuse.org>
"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin (1706-1790), reply of the Pennsylvania
Assembly to the Governor, November 11, 1755
SAGE member since 1995. See <http://www.sage.org/> for more info.
Brad | 2/2/2005 4:23:04 PM

Brad Knowles <brad@stop.mail-abuse.org> wrote in
news:mailman.4.1107361920.583.questions@lists.ntp.isc.org:
> The problem is that there are many services which really need
> proper time sync in order to operate correctly. You could just turn
> off time sync and let these things run freely, or you could find some
> way to shift the startup sequence so that only those things that
> depend on time sync are started after ntpd, and everything else is
> started before.
Does not "needing good time sync" imply communications with the outside
world? So wouldn't those items fail for other reasons if NTP wasn't up yet?
Ideally things that need quality time *and* connectivity would wake up when
both conditions came true, and take some other action when connectivity was
removed. It would be desirable for applications to be able to register to
be notified of these events.
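
The registration idea above can be sketched as a small in-process registry. This is only an illustration under stated assumptions: the class name `ReadinessNotifier` and its methods are hypothetical, and nothing here corresponds to an existing ntpd or OS interface. Some other component would have to feed it the "time synced" and "network up" states.

```python
from typing import Callable, List, Optional


class ReadinessNotifier:
    """Hypothetical registry: calls back when time sync AND network are both up."""

    def __init__(self) -> None:
        self.time_synced = False
        self.network_up = False
        self._on_ready: List[Callable[[], None]] = []
        self._on_lost: List[Callable[[], None]] = []

    @property
    def ready(self) -> bool:
        # "Ready" means both conditions hold at the same time.
        return self.time_synced and self.network_up

    def register(self, on_ready: Callable[[], None],
                 on_lost: Callable[[], None]) -> None:
        """Register callbacks; a late subscriber is told at once if already ready."""
        self._on_ready.append(on_ready)
        self._on_lost.append(on_lost)
        if self.ready:
            on_ready()

    def update(self, *, time_synced: Optional[bool] = None,
               network_up: Optional[bool] = None) -> None:
        """Record a state change and fire callbacks only on ready/lost transitions."""
        was_ready = self.ready
        if time_synced is not None:
            self.time_synced = time_synced
        if network_up is not None:
            self.network_up = network_up
        if self.ready and not was_ready:
            for callback in self._on_ready:
                callback()
        elif was_ready and not self.ready:
            for callback in self._on_lost:
                callback()
```

An application would call `register()` once at startup, and whatever watches ntpd and the interfaces would call `update()` as conditions change.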
Kenneth | 2/2/2005 5:53:18 PM

On 2005-02-02, Brad Knowles <brad@stop.mail-abuse.org> wrote:
> At 8:48 AM -0600 2005-02-02, Kenneth Porter wrote:
>
>> There's been some discussion on the Fedora-devel list about ways to
>> speed up booting for workstations. One of the things that slows down
>> the boot process is waiting for an initial network time sync. I'd
>> like to solicit opinions on how to organize the interaction between
>> ntpd, the OS, the boot scripts, the network interfaces (which may
>> come and go; think mobile devices), and possible hot-plugged local
>> time sources.
>
> The problem is that there are many services which really need proper
> time sync in order to operate correctly.
<snip>
> But beyond the standard mechanisms to speed up the initialization
> process of starting ntpd (e.g., using "iburst" on all the server
> configuration lines in your /etc/ntp.conf, etc...), I don't see any
> other ways to make this process faster.
My informal tests show that ntpd needs somewhere between 7 and 20
seconds to initially set the clock (using 'ntpd -gq'). It would be
reasonable to assume that starting ntpd with '-g' will take roughly the
same amount of time.
We need to keep in mind the fact that we're talking about workstations,
not servers, and that 'ntpd -g' can, and usually does, run in the
background.
--
Steve Kostecke <kostecke@ntp.isc.org>
NTP Public Services Project - http://ntp.isc.org/
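
A boot script that sets the clock once with 'ntpd -gq' before starting time-sensitive services could look roughly like the sketch below. The function name `initial_sync`, the 30-second timeout, and the injectable `run` parameter are assumptions made for illustration; only the `-g` and `-q` options come from the discussion above.

```python
import subprocess
from typing import Callable, List


def initial_sync(timeout_s: float = 30.0,
                 run: Callable[..., object] = subprocess.run) -> bool:
    """Run 'ntpd -gq' once; return True if it exits cleanly within the timeout.

    The timeout is an assumed upper bound, chosen to sit above the 7-20
    seconds reported in informal tests. It is not an ntpd default.
    """
    cmd: List[str] = ["ntpd", "-g", "-q"]
    try:
        result = run(cmd, timeout=timeout_s, check=False)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # Too slow, or ntpd is not installed: the caller decides what to do.
        return False
    return getattr(result, "returncode", 1) == 0


if __name__ == "__main__":
    if initial_sync():
        print("clock set; safe to start time-sensitive services")
    else:
        print("no initial sync; start ntpd -g in the background anyway")
```

Whether to block the rest of the boot sequence on this call, or to run it in the background, is exactly the tradeoff being argued in this thread.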
Steve | 2/2/2005 6:08:39 PM

At 11:53 AM -0600 2005-02-02, Kenneth Porter wrote:
> Ideally things that need quality time *and* connectivity would wake up when
> both conditions came true, and take some other action when connectivity was
> removed. It would be desirable for applications to be able to register to
> be notified of these events.
Feel free to rewrite the entire OS and all the applications to
work in this manner.
In the meanwhile, the rest of us will try to find what solutions
we can that will work with the existing OSes and applications we have
available to us.
--
Brad Knowles, <brad@stop.mail-abuse.org>
Brad | 2/2/2005 6:52:35 PM

Kenneth,
This is the single most persistent issue in the engineering design of
NTP. There must be tradeoffs between security, robustness, accuracy and
initial delay. In the current design compromise, a server is acceptable
only after three/four rounds of messages and the ensemble time is
acceptable with at least one of possibly several acceptable servers.
With IBURST mode, this takes 6-8 seconds.
For better robustness use "tos minclock N", where at least N
(default 1) servers must be acceptable to set the clock. Tonight I put
in a "tos maxdist M", where M is the distance threshold below which the
server is acceptable. Set "tos maxdist 16" and the first sample received
from any server will set the clock lickety-split. Of course, essentially
all the mitigation algorithms using multiple-sample redundancy and
multiple-server diversity are systematically defeated. You might as well
use SNTP.
Dave
David | 2/3/2005 3:16:46 AM

At 3:00 PM +0000 2005-02-03, Tom Smith wrote:
> I know the subject has been workstations, but let's talk for a moment
> about this religion as it concerns servers - like the ones that run
> telephone companies, stock exchanges, and banks inside heavily
> defended firewalls. It's the same issue, it's just that the stakes
> are higher. The issue is how quickly can you get these
> systems back up at boot. 15-30 seconds is a long time to wait.
> Too long.
With a decent drift file and using iburst throughout the server
definitions, Steve has demonstrated that you can get this down to
about seven seconds across a cable modem line, without any local
Stratum 1 time servers. This is real-world experience.
If your servers are time-sensitive, then they should be the ones
best able to tolerate that extra seven seconds during the startup
phase. The more important it is to have the time correct, the more
important it is that you be able to tolerate short delays on startup.
If you want to make that delay shorter, I guess you could package
Stratum 1 refclocks with every machine.
> We're not talking about one-shot sampling for maintaining the time,
> so comparisons to SNTP are not helpful. We're talking about speed of
> acquisition of an initial "good enough" time, keeping in mind that the
> perfect is often the enemy of the good.
Seven seconds to find "good enough" seems to be a pretty good
balance to me.
However, if you want to shoot yourself in the foot with a
thermonuclear bomb, please feel free to do so.
> The reason why so many of your constituency keep bringing this
> subject up is that they know that ntpd needs a good (not perfect)
> estimate of the time before it starts and that critical systems
> can't wait for perfection to get that estimate.
I don't know how much more perfection you want. If you can't
tolerate seven seconds during the startup phase, then you're using
the wrong protocols.
If you need a true fault-tolerant real-time system with
resolution down to attoseconds, and those seven additional seconds
during startup are effectively seven additional aeons for your
application and you cannot possibly tolerate them, then you shouldn't
be using TCP/IP, Unix, or anything else that anyone on this list
would recognize.
In this case, ntpd is the least of your worries.
--
Brad Knowles, <brad@stop.mail-abuse.org>
Brad | 2/3/2005 2:56:51 PM

David L. Mills wrote:
> Kenneth,
>
> This is the single most persistent issue in the engineering design of
> NTP. There must be tradeoffs between security, robustness, accuracy and
> initial delay. In the current design compromise, a server is acceptable
> only after three/four rounds of messages and the ensemble time is
> acceptable with at least one of possibly several acceptable servers.
> With IBURST mode, this takes 6-8 seconds.
>
> For better robustness use "tos minclock N", where at least N
> (default 1) servers must be acceptable to set the clock. Tonight I put
> in a "tos maxdist M", where M is the distance threshold below which the
> server is acceptable. Set "tos maxdist 16" and the first sample received
> from any server will set the clock lickety-split. Of course, essentially
> all the mitigation algorithms using multiple-sample redundancy and
> multiple-server diversity are systematically defeated. You might as well
> use SNTP.
David,
I know the subject has been workstations, but let's talk for a moment
about this religion as it concerns servers - like the ones that run
telephone companies, stock exchanges, and banks inside heavily
defended firewalls. It's the same issue, it's just that the stakes
are higher. The issue is how quickly can you get these
systems back up at boot. 15-30 seconds is a long time to wait.
Too long.
We're not talking about one-shot sampling for maintaining the time,
so comparisons to SNTP are not helpful. We're talking about speed of
acquisition of an initial "good enough" time, keeping in mind that the
perfect is often the enemy of the good.
You might argue that if boot time is critical, just let the server come
up with whatever random time it comes up with and let ntpd fix
it up later. Give it a "-g" so it doesn't complain. A lot of folks
have tried this in the past inadvertently (and continue to do so)
by neglecting to put ntpdate into their boot sequence ahead of ntpd.
I've fixed a lot of systems whose drift files were pinned
at 500 ppm and whose systems ran perpetually fast or slow as
a result. We've also spent a lot of money fruitlessly replacing
motherboards on those systems. Turning a large initial offset over
to ntpd is decidedly NOT a Good Idea.
The reason why so many of your constituency keep bringing this
subject up is that they know that ntpd needs a good (not perfect)
estimate of the time before it starts and that critical systems
can't wait for perfection to get that estimate.
-Tom
________________________________________________________________________
Tom Smith smith@alum.mit.edu,smith@cag.lkg.hp.com
Hewlett-Packard Company Tel: +1 (603) 884-6329
110 Spit Brook Road ZKO1-3/H42 FAX: +1 (603) 884-6484
Nashua, New Hampshire 03062-2698, USA Mobile: +1 978 397 3411
Tom | 2/3/2005 3:00:21 PM

Brad Knowles wrote:
> At 3:00 PM +0000 2005-02-03, Tom Smith wrote:
>
>> I know the subject has been workstations, but let's talk for a moment
>> about this religion as it concerns servers - like the ones that run
>> telephone companies, stock exchanges, and banks inside heavily
>> defended firewalls. It's the same issue, it's just that the stakes
>> are higher. The issue is how quickly can you get these
>> systems back up at boot. 15-30 seconds is a long time to wait.
>> Too long.
>
>
> With a decent drift file ...
Precisely. The decent drift file is a problem. It sometimes doesn't
exist after a large initial offset has been turned over to ntpd.
Now, if ntpd all by itself did a quick acquisition, didn't
count that initial clock setting in any way into the frequency
correction, and blocked the startup script progress until that
was complete and it was safe to proceed with starting the
time-sensitive stuff, all would be well with the world.
If I've missed how that happens, I apologize.
> If your servers are time-sensitive, then they should be the ones
> best able to tolerate that extra seven seconds during the startup
> phase.
You should discuss that with a bank or stock exchange that
is losing millions in transactions during those seconds
or with a public utility that is paying the government
penalties for downtime. :-)
> The more important it is to have the time correct, the more
> important it is that you be able to tolerate short delays on startup.
Well, no. As David pointed out in his posting, all engineering
is a matter of tradeoffs. For many users, the tradeoff needs
to be 'Get these applications up fast on a "good enough"
time and refine the time (and frequency) in the background.'
> Seven seconds to find "good enough" seems to be a pretty good
> balance to me.
>
Perhaps it is. For you. If it's seven seconds.
> I don't know how much more perfection you want. If you can't
> tolerate seven seconds during the startup phase, then you're using the
> wrong protocols.
I don't want perfection at all. That's the point. ntpd gets it as right
as it needs to be. It just has to have something reasonable
to work with when it starts.
Tom | 2/3/2005 4:06:42 PM

Brad Knowles <brad@stop.mail-abuse.org> writes:
> If you want to make that delay shorter, I guess you could
> package Stratum 1 refclocks with every machine.
I'd be waiting for minutes if I waited till ntpd decided it had a
"good enough" estimate of the Motorola Oncore's time. Ntpd *could*
have its first time estimate accurate to well under 1ms in half a
second on average. The problem is, it won't tell you the time or step
the system clock until it filters the crap out of the refclock signal.
Ntpd has a hard act to follow. The old ntpdate program was a near
ideal solution from the perspective of booting. It did its job
quickly and got off the pot.
-wolfgang
--
Wolfgang S. Rupprecht http://www.wsrcc.com/wolfgang/
Hate software patents? Sign here: http://thankpoland.info/
Wolfgang | 2/3/2005 5:23:57 PM

> Precisely. The decent drift file is a problem. It sometimes doesn't
> exist after a large initial offset has been turned over to ntpd.
> Now, if ntpd all by itself did a quick acquisition, didn't
> count that initial clock setting in any way into the frequency
> correction, and blocked the startup script progress until that
> was complete and it was safe to proceed with starting the
> time-sensitive stuff, all would be well with the world.
> If I've missed how that happens, I apologize.
I always wondered why ntpd would throw a valuable drift value out the window
when it encounters an offset at startup, and would try to explain the
offset with a ridiculous frequency error of +- 500ppm and take forever to
settle, rather than correct the initial offset, load the drift value, and
be happy.
Such behaviour would also make the startup scripts easier.
Roman Maeder
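
To put the +-500 ppm figure in perspective, the arithmetic can be worked through in a few lines. This is a sketch: the helper names are hypothetical, and treating 500 ppm as the maximum rate at which an offset can be slewed away is an assumption made for the second function.

```python
SECONDS_PER_DAY = 86_400


def drift_seconds_per_day(ppm: float) -> float:
    """Time gained or lost per day by a clock running `ppm` parts per million off."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def seconds_to_slew(offset_s: float, max_slew_ppm: float = 500.0) -> float:
    """Time needed to remove `offset_s` by slewing at `max_slew_ppm` (assumed limit)."""
    return abs(offset_s) / (max_slew_ppm * 1e-6)


if __name__ == "__main__":
    # A frequency pinned at 500 ppm is roughly 43 seconds of error per day.
    print(f"{drift_seconds_per_day(500.0):.1f} s/day")
    # Slewing away a 1-second offset at that rate takes about 33 minutes.
    print(f"{seconds_to_slew(1.0) / 60:.1f} min")
```

So a clock whose frequency correction is stuck at the limit drifts by well under a minute per day, while an initial offset of even one second takes tens of minutes to slew out. That gap is why attributing a startup offset to frequency error settles so slowly.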
Roman | 2/3/2005 6:33:16 PM

At 10:21 AM -0500 2005-02-03, Tom Smith wrote:
>> With a decent drift file ...
>
> Precisely. The decent drift file is a problem. It sometimes doesn't
> exist after a large initial offset has been turned over to ntpd.
Even without a good drift file, you can still sync very quickly.
It may not be seven seconds, it may be fifteen. But that should
still be tolerable.
> You should discuss that with a bank or stock exchange that
> is losing millions in transactions during those seconds
> or with a public utility that is paying the government
> penalties for downtime. :-)
My wife is general counsel, head of legal, and secretary to the
board for the world's largest clearing and settlement firm for
European stocks and bonds, with an annual turnover in excess of 256
trillion Euro last year, and assets under management in excess of
twelve trillion Euros. Yes, I mean trillion.
When Argentina decides not to make their interest payments on
their Brady bond debt, because 80% of their bonds are held through
her company, the final decision of whether or not to declare what
used to be the world's seventh largest economy officially bankrupt,
arrives on her desk.
I understand the scale of the problem. With over a trillion Euro
of turnover in a single workday, milliseconds do count.
> Well, no. As David pointed out in his posting, all engineering
> is a matter of tradeoffs. For many users, the tradeoff needs
> to be 'Get these applications up fast on a "good enough"
> time and refine the time (and frequency) in the background.'
So, doing a single query and taking whatever bogus time may be
set from that server, is more important than waiting a few more
seconds to make sure that you've got a pretty good timesync?
I'm sorry, I don't buy it. The bigger the application, the more
you have to lose, the more important it is to have good time sync.
See above -- milliseconds do count.
> Perhaps it is. For you. If it's seven seconds.
For financial applications, if the server goes down, then your
N+M fault-tolerant systems take over that load, and not a single
transaction is dropped or excessively delayed. If your main server
facility is taken out by terrorists or natural disaster, then your
hot spare facility, that is located hundreds or thousands of miles
away, takes over and a few transactions might be delayed, but nothing
is dropped.
If you're running something that mission-critical and you don't
have those kinds of systems (which can tolerate a few extra seconds
of startup time in order to ensure that the time is set reasonably
well), then you are shooting yourself in the foot with a
thermonuclear weapon, and you will get what you deserve.
--
Brad Knowles, <brad@stop.mail-abuse.org>
Brad | 2/3/2005 7:19:12 PM

Tom,
It is true that ntpd is specifically engineered for Internet badlands
where popcorn spikes, evil masqueraders, misbehaving clocks and other
vermin might poison your DNS cache. A client using the pool servers
scheme really needs this ammunition.
The current version tries hard to strike a compromise between verifiable
assertions and acquisition speed. However, with the twinkle I described
earlier you can engineer any compromise you wish, including setting the
clock on the first response received or, in principle, on the first two
responses from at least two servers, and so on.
In very many simulation runs here I found it hard to get into a true
lockup condition where the daemon did not recover from large initial time
or frequency offsets, even with the -x option. However, there are some
things the simulator can't pick up, like a large frequency offset
pre-programmed in the kernel. The recommended repair procedure should
that somehow happen is to run "ntptime -f 0" to kill the kernel offset
and then remove the ntp.drift file. Upon restart ntpd measures the
intrinsic frequency offset over about fifteen minutes, sets the clock
and resumes normal operation. I would think this a good way to determine
if a motherboard is or is not acceptable. I've seen lots of motherboards
and found most of them within 100 PPM and all of them within 500 PPM.
Even if over 500 PPM the clock is still disciplined but the offset
cannot be forced to zero.
So, best advice is to run ntpd with -g and "tos maxdist 16" in the
configuration file. I assume the version with this command will soon
appear as a snapshot. Note that the only thing this does is admit
servers to the selection algorithm no matter what the synchronization
distance is. Ordinarily the distance starts from 16 and reduces by half
for each response received. Other than this criterion, the algorithms
operate without change. You can do your own security analysis.
Dave
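
The admission rule described above (the distance starts at 16 and halves with each response, and a server is admitted once the distance is below the "tos maxdist" threshold) can be illustrated with a toy model. This is a simplification for illustration only, not ntpd's actual distance computation; the function name and the strict less-than comparison are assumptions.

```python
def responses_until_acceptable(maxdist: float,
                               initial_distance: float = 16.0) -> int:
    """Count responses needed before a server's distance drops below maxdist.

    Toy model: the distance halves with each response received, and at
    least one response is always required.
    """
    if maxdist <= 0:
        raise ValueError("maxdist must be positive")
    distance = initial_distance
    responses = 0
    while True:
        distance /= 2.0          # each response halves the distance
        responses += 1
        if distance < maxdist:   # admitted to the selection algorithm
            return responses


if __name__ == "__main__":
    for threshold in (16.0, 4.0):
        n = responses_until_acceptable(threshold)
        print(f"maxdist {threshold:>4}: admitted after {n} response(s)")
```

In this model a threshold of 16 admits a server on its first response, which matches the behaviour described for "tos maxdist 16"; smaller thresholds require more rounds.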
David | 2/3/2005 7:50:06 PM

Tom,
I get nervous about nonquantitative statements, since they might start
urban legends. A "decent" frequency file is one created when first
starting ntpd without the file and letting it determine the intrinsic
frequency error. This takes about fifteen minutes. However, the
frequency file itself is written only after the first hour and at hourly
intervals after that. The discipline should be stable even if the
frequency file is present and intentionally set as much as +-500 PPM in
error and that even with a large initial time offset. This has been
confirmed by simulation; however, the simulations assume the adjtime()
system call operates as in original Unix model; the Solaris adjtime() is
a killer when large offsets are involved.
Dave
David | 2/3/2005 8:09:16 PM

Brad Knowles <brad@stop.mail-abuse.org> wrote in
news:mailman.20.1107442717.583.questions@lists.ntp.isc.org:
> If you want to make that delay shorter, I guess you could package
> Stratum 1 refclocks with every machine.
Given the cheap price of a consumer GPS receiver, this doesn't sound that
crazy. How fast can ntpd lock on to the NMEA messages?
Kenneth | 2/3/2005 8:20:53 PM

"David L. Mills" <mills@udel.edu> wrote in news:cttv9l$ql7$1@dewey.udel.edu:
> It is true that ntpd is specifically engineered for Internet badlands
> where popcorn spikes, evil masqueraders, misbehaving clocks and other
> vermin might poison your DNS cache. A client using the pool servers
> scheme really needs this ammunition.
What about mobile clients? In the mobile environment, what does ntpd do
when you sever the network connection (i.e., undock)? Suppose I undock
(taking down eth1) and plug in down the hall with the built-in NIC
(bringing up eth0). What must one do to make ntpd tolerant of that? Or must
mobile apps give up quality time because their network interfaces are
transient? Would one need to script a complete stop/start of ntpd whenever
interfaces come and go?
Kenneth | 2/3/2005 8:23:12 PM

How timely this thread is. Given my volatile NTP situation...I am
starting to believe that using a VME-based GPS source as a reference
clock to my NTP daemon in a VME-based SPARC SBC isn't a good idea. It
has been one thing after another...and the "clients" are not getting
good time from their sole-source server. The main problem that I've
encountered is +500 PPM and steps during operations - which has
effectively ruined my data. So...on to plan B???
I am wondering if a simple, static network time server is a better
option...guess so at this point.
Thanks for the info.
Kit
------------------------------------------------------------------------
Kit Plummer
Operations Research and System Performance Dept.
Raytheon Missile Systems
On Feb 3, 2005, at 2:13 PM, Tom Smith wrote:
> David L. Mills wrote:
>> I get nervous about nonquantitative statements, since they might
>> start urban legends. A "decent" frequency file is one created when
>> first starting ntpd without the file and letting it determine the
>> intrinsic frequency error. This takes about fifteen minutes. However,
>> the frequency file itself is written only after the first hour and at
>> hourly intervals after that. The discipline should be stable even if
>> the frequency file is present and intentionally set as much as +-500
>> PPM in error and that even with a large initial time offset. This has
>> been confirmed by simulation; however, the simulations assume the
>> adjtime() system call operates as in original Unix model; the Solaris
>> adjtime() is a killer when large offsets are involved.
>
> A physicist I worked with early in my career taught me a very
> useful law. "Different things vary."
>
> I couldn't tell you how many ntp.drift files I've encountered
> with a value of +-500.000. It's a lot. There are many ways this
> can occur, but all of them involve ntpd starting up against a large
> offset with its reference clocks and/or shutting down while it is
> working one off, the latter usually because of an NTP misconfiguration,
> but also sometimes because of thunderstorms in July. Others have
> also observed how this happens on mobile systems that get booted
> and shut down a lot.
>
> Once a system is in this state, it depends on the specifics of the OS
> how long or even if that system will "settle". I can assure you that
> for some systems, if not most, this is most assuredly not 15 minutes,
> might be days, or might, for all practical purposes, be never. These
> are the systems on which the ordinary non-NTP-expert system
> manager or field support team will go through several rounds of
> battery or crystal or motherboard or system replacement before
> anyone tells them to just delete the drift file and start over.
>
(posted by Kit, 2/3/2005 9:09:23 PM)

David L. Mills wrote:
> I get nervous about nonquantitative statements, since they might start
> urban legends. A "decent" frequency file is one created when first
> starting ntpd without the file and letting it determine the intrinsic
> frequency error. This takes about fifteen minutes. However, the
> frequency file itself is written only after the first hour and at hourly
> intervals after that. The discipline should be stable even if the
> frequency file is present and intentionally set as much as +-500 PPM in
> error and that even with a large initial time offset. This has been
> confirmed by simulation; however, the simulations assume the adjtime()
> system call operates as in original Unix model; the Solaris adjtime() is
> a killer when large offsets are involved.
A physicist I worked with early in my career taught me a very
useful law. "Different things vary."
I couldn't tell you how many ntp.drift files I've encountered
with a value of +-500.000. It's a lot. There are many ways this
can occur, but all of them involve ntpd starting up against a large
offset with its reference clocks and/or shutting down while it is
working one off, the latter usually because of an NTP misconfiguration,
but also sometimes because of thunderstorms in July. Others have
also observed how this happens on mobile systems that get booted
and shut down a lot.
Once a system is in this state, it depends on the specifics of the OS
how long or even if that system will "settle". I can assure you that
for some systems, if not most, this is most assuredly not 15 minutes,
might be days, or might, for all practical purposes, be never. These
are the systems on which the ordinary non-NTP-expert system
manager or field support team will go through several rounds of
battery or crystal or motherboard or system replacement before
anyone tells them to just delete the drift file and start over.
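The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the drift file's first whitespace-separated field is taken to be the frequency offset in PPM, the 500 PPM rail is the figure quoted in this thread, and the function names are made up for the example. Whether a pinned file should actually be deleted remains the operator's decision.

```python
from pathlib import Path

PINNED_PPM = 500.0  # rail value quoted in the thread; an assumption here


def read_drift_ppm(path):
    """Return the frequency offset (PPM) stored in a drift file, or None."""
    try:
        text = Path(path).read_text()
    except OSError:
        return None  # missing or unreadable file
    fields = text.split()
    if not fields:
        return None  # empty file
    try:
        return float(fields[0])
    except ValueError:
        return None  # file does not start with a number


def drift_is_pinned(ppm, limit=PINNED_PPM):
    """True when the stored value sits at (or beyond) the +-limit rail."""
    return ppm is not None and abs(ppm) >= limit
```

A boot script could call drift_is_pinned(read_drift_ppm(path)) with its own drift file path and log a warning rather than silently continuing.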
________________________________________________________________________
Tom Smith smith@alum.mit.edu,smith@cag.lkg.hp.com
Hewlett-Packard Company Tel: +1 (603) 884-6329
110 Spit Brook Road ZKO1-3/H42 FAX: +1 (603) 884-6484
Nashua, New Hampshire 03062-2698, USA Mobile: +1 978 397 3411
(posted by Tom, 2/3/2005 9:13:47 PM)

Tom Smith wrote:
> David L. Mills wrote:
>
>> Kenneth,
>>
>> This is the single most persistent issue in the engineering design of
> >> NTP. There must be tradeoffs between security, robustness, accuracy
> >> and initial delay. In the current design compromise, a server is
> >> acceptable only after three/four rounds of messages and the ensemble
> >> time is acceptable with at least one of possibly several acceptable
> >> servers. With IBURST mode, this takes 6-8 seconds.
> >>
> >> For better robustness use "tos minclock N", where at least N
> >> (default 1) servers must be acceptable to set the clock. Tonight I
> >> put in a "tos maxdist M", where M is the distance threshold below
> >> which the server is acceptable. Set "tos maxdist 16" and the first
> >> sample received from any server will set the clock lickety-split. Of
>> course, essentially all the mitigation algorithms using
>> multiple-sample redundancy and multiple-server diversity are
>> systematically defeated. You might as well use SNTP.
>
>
> David,
>
> I know the subject has been workstations, but let's talk for a moment
> about this religion as it concerns servers - like the ones that run
> telephone companies, stock exchanges, and banks inside heavily
> defended firewalls. It's the same issue, it's just that the stakes
> are higher. The issue is how quickly can you get these
> systems back up at boot. 15-30 seconds is a long time to wait.
> Too long.
>
> We're not talking about one-shot sampling for maintaining the time,
> so comparisons to SNTP are not helpful. We're talking about speed of
> acquisition of an initial "good enough" time, keeping in mind that the
> perfect is often the enemy of the good.
>
> You might argue that if boot time is critical, just let the server come
> up with whatever random time it comes up with and let ntpd fix
> it up later. Give it a "-g" so it doesn't complain. A lot of folks
> have tried this in the past inadvertently (and continue to do so)
> by neglecting to put ntpdate into their boot sequence ahead of ntpd.
> I've fixed a lot of systems whose drift files were pinned
> at 500 ppm and whose systems ran perpetually fast or slow as
> a result. We've also spent a lot of money fruitlessly replacing
> motherboards on those systems. Turning a large initial offset over
> to ntpd is decidedly NOT a Good Idea.
>
> The reason why so many of your constituency keep bringing this
> subject up is that they know that ntpd needs a good (not perfect)
> estimate of the time before it starts and that critical systems
> can't wait for perfection to get that estimate.
>
> -Tom
> ________________________________________________________________________
> Tom Smith smith@alum.mit.edu,smith@cag.lkg.hp.com
> Hewlett-Packard Company Tel: +1 (603) 884-6329
> 110 Spit Brook Road ZKO1-3/H42 FAX: +1 (603) 884-6484
> Nashua, New Hampshire 03062-2698, USA Mobile: +1 978 397 3411
>
Tom,
I think it all boils down to how good is "good enough"? Your snail
mail address suggests that you're in VMS Engineering or, if not, you
could throw rocks at them! VMS, although it keeps time in units of 100
nanosecond "ticks", only updates the clock every ten milliseconds!
(Measure with micrometer, mark with chalk, cut with ax?)
The documented and supported interfaces in VMS only permit you to set
the clock and read the clock to the nearest ten milliseconds.
If you are willing to have a server come up with a clock error of one
second, just boot and start ntpd later. If you need to have time
correct to the nearest microsecond, you are using the wrong tools.
If you are, in fact, talking about VMS and TCP/IP services, porting the
latest version of the NTP reference implementation would help you speed
up the startup. The last time I looked, TCP/IP Services (V5.1) was
using a port of NTP V3-5.91 which does not support the iburst
qualifier. Iburst allows much faster initialization; it gets you a
"good enough" time and frequency correction in about 8 seconds.
If eight seconds is too long, you need to specify how quickly you need
to acquire the correct time and how accurate the time must be. These
two specifications pretty much determine the tools you must use to meet
them; e.g. if you need time correct to +/- 50 nanoseconds and need to
set it within 100 microseconds, you will almost certainly need to use a
hardware reference clock such as a cesium or rubidium standard.
(posted by Richard, 2/3/2005 9:52:53 PM)

Wolfgang S. Rupprecht wrote:
>Brad Knowles <brad@stop.mail-abuse.org> writes:
>
>
>> If you want to make that delay shorter, I guess you could
>>package Stratum 1 refclocks with every machine.
>>
>>
>
>I'd be waiting for minutes if I waited till ntpd decided it had a
>"good enough" estimate of the Motorola Oncore's time. Ntpd *could*
>have its first time estimate accurate to well under 1ms in half a
>second on average. The problem is, it won't tell you the time or step
>the system clock until it filters the crap out of the refclock signal.
>
>Ntpd has a hard act to follow. The old ntpdate program was a near
>ideal solution from the perspective of booting. It did its job
>quickly and got off the pot.
>
>-wolfgang
>
>
Since the Motorola driver polls the clock every sixteen seconds and
since four samples are required, sixty-four seconds should be sufficient!
(posted by Richard, 2/3/2005 10:06:05 PM)

Brad Knowles wrote:
> At 10:21 AM -0500 2005-02-03, Tom Smith wrote:
>
>>> With a decent drift file ...
>>
>>
>> Precisely. The decent drift file is a problem. It sometimes doesn't
>> exist after a large initial offset has been turned over to ntpd.
>
>
> Even without a good drift file, you can still sync very quickly.
> It may not be seven seconds, it may be fifteen. But that should still
> be tolerable.
>
>> You should discuss that with a bank or stock exchange that
>> is losing millions in transactions during those seconds
>> or with public utility that is paying the government
>> penalties for downtime. :-)
>
>
> My wife is general counsel, head of legal, and secretary to the
> board for the world's largest clearing and settlement firm for
> European stocks and bonds, with an annual turnover in excess of 256
> trillion Euro last year, and assets under management in excess of
> twelve trillion Euros. Yes, I mean trillion.
>
> When Argentina decides not to make their interest payments on
> their Brady bond debt, because 80% of their bonds are held through her
> company, the final decision of whether or not to declare what used to
> be the world's seventh largest economy officially bankrupt, arrives on
> her desk.
>
> I understand the scale of the problem. With over a trillion Euro
> of turnover in a single workday, milliseconds do count.
>
>> Well, no. As David pointed out in his posting, all engineering
>> is a matter of tradeoffs. For many users, the tradeoff needs
>> to be 'Get these applications up fast on a "good enough"
>> time and refine the time (and frequency) in the background.'
>
>
> So, doing a single query and taking whatever bogus time may be set
> from that server, is more important than waiting a few more seconds to
> make sure that you've got a pretty good timesync?
>
> I'm sorry, I don't buy it. The bigger the application, the more
> you have to lose, the more important it is to have good time sync.
>
>
> See above -- milliseconds do count.
>
>> Perhaps it is. For you. If it's seven seconds.
>
>
> For financial applications, if the server goes down, then your N+M
> fault-tolerant systems take over that load, and not a single
> transaction is dropped or excessively delayed. If your main server
> facility is taken out by terrorists or natural disaster, then your hot
> spare facility, that is located hundreds or thousands of miles away,
> takes over and a few transactions might be delayed, but nothing is
> dropped.
>
> If you're running something that mission-critical and you don't
> have those kinds of systems (which can tolerate a few extra seconds of
> startup time in order to ensure that the time is set reasonably well),
> then you are shooting yourself in the foot with a thermonuclear
> weapon, and you will get what you deserve.
>
It's worth noting that, on September 11, 2001, Merrill-Lynch "failed
over" to a duplicate data center in Westchester County in something
like four minutes, without losing a single transaction or a byte of
data. If downtime costs you $50,000,000/minute, the budget to ensure
that there isn't any downtime is practically infinite!
(posted by Richard, 2/3/2005 10:14:26 PM)

Tom Smith wrote:
> David L. Mills wrote:
>
>> I get nervous about nonquantitative statements, since they might start
>> urban legends. A "decent" frequency file is one created when first
>> starting ntpd without the file and letting it determine the intrinsic
>> frequency error. This takes about fifteen minutes. However, the
>> frequency file itself is written only after the first hour and at
>> hourly intervals after that. The discipline should be stable even if
>> the frequency file is present and intentionally set as much as +-500
>> PPM in error and that even with a large initial time offset. This has
>> been confirmed by simulation; however, the simulations assume the
>> adjtime() system call operates as in original Unix model; the Solaris
>> adjtime() is a killer when large offsets are involved.
>
>
> A physicist I worked with early in my career taught me a very
> useful law. "Different things vary."
>
> I couldn't tell you how many ntp.drift files I've encountered
> with a value of +-500.000. It's a lot. There are many ways this
> can occur, but all of them involve ntpd starting up against a large
> offset with its reference clocks and/or shutting down while it is
> working one off, the latter usually because of an NTP misconfiguration,
> but also sometimes because of thunderstorms in July. Others have
> also observed how this happens on mobile systems that get booted
> and shut down a lot.
>
> Once a system is in this state, it depends on the specifics of the OS
> how long or even if that system will "settle". I can assure you that
> for some systems, if not most, this is most assuredly not 15 minutes,
> might be days, or might, for all practical purposes, be never. These
> are the systems on which the ordinary non-NTP-expert system
> manager or field support team will go through several rounds of
> battery or crystal or motherboard or system replacement before
> anyone tells them to just delete the drift file and start over.
So, if I manage to set the initial time to a good approximation of
"real" time using ntpdate with 5 servers, as explained earlier, would
you recommend deleting the drift file and letting it start all over again?
Does this affect the time for the server to start serving?
Alain
(posted by Alain, 2/3/2005 11:03:22 PM)

At 9:03 PM -0200 2005-02-03, Alain wrote:
> So, if I manage to set the initial time to a good approximation of
> "real" time using ntpdate with 5 servers, as explained earlier, would
> you recommend deleting the drift file and letting it start all over again?
If the drift file states +/- 500 PPM, then I would be inclined to
remove it on startup no matter what. Of course, I would also be
inclined to use "ntpd -g" instead of ntpdate, for the reasons I've
previously given you, not to mention the issues that you have
mentioned here regarding the use of ntpdate against servers that are
not up.
> Does this affect the time for the server to start serving?
If the drift file is whacked out, it's better to remove it and
let the server re-calculate, than to try to compensate for a whacked
out drift file. You'll start serving time faster if you let the
system try to calculate the real situation once the clock has been
set to a reasonable value on startup.
Keep in mind that the server will have to take extra time to get
to a state where it can start serving time to clients, if it has had
to calculate a new drift file or try to deal with a whacked out drift
file. If you're sensitive to seven seconds of additional startup
time, then you really, really want to make sure that you always keep
a reasonable drift file.
Given the kinds of time periods we're talking about, I still have
yet to see a good reason for using "ntpdate; ntpd" instead of "ntpd
-g".
--
Brad Knowles, <brad@stop.mail-abuse.org>
"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin (1706-1790), reply of the Pennsylvania
Assembly to the Governor, November 11, 1755
SAGE member since 1995. See <http://www.sage.org/> for more info.
(posted by Brad, 2/3/2005 11:55:01 PM)

Alain wrote:
> So, if I manage to set the initial time to a good approximation of
> "real" time using ntpdate with 5 servers, as explained earlier, would
> you recommend deleting the drift file and letting it start all over again?
>
> Does this affect the time for the server to start serving?
I assume you asked this of me, rather than of Dave. Your method
seems to me to be excellent.
If you do that, there should be no need to clean out the
characteristic drift that was so carefully computed over a long
period of time. Cleaning out the drift file is a drastic measure
required only if you do NOT pre-set the clock before starting ntpd
and you end up, as a result, with a bogus re-calculated drift rate.
What you have to calculate into the boot time is the time you have
to block while stepping the clock to the "right" time in the first place.
Putting ntpdate into the boot sequence means that time will be however
long it takes for ntpdate to complete, exactly that long and no longer,
and that when ntpdate completes, the rest of your boot sequence, including
ntpd, is "safe". That's why people are unhappy about ntpdate being
removed without an equivalent way of doing the same thing with
ntpd:
1) step the clock to a "good enough" time
   do this within a few seconds
   block while doing it
   don't touch/change the pre-computed drift while doing this
2) start ntpd
3) start time-dependent services
It would certainly be possible for ntpd to do this all by itself,
but unless I misunderstand, the only way to do this with just
ntpd at the moment is:
1) ntpd -gq
2) sleep [guess how long ntpd -gq will take, worst case, to set the clock]
   -or-
   spin on ps looking for ntpd to start and then end
3) start ntpd [normal operation options]
4) start time-dependent services
Which is not the same thing.
________________________________________________________________________
Tom Smith smith@alum.mit.edu,smith@cag.lkg.hp.com
Hewlett-Packard Company Tel: +1 (603) 884-6329
110 Spit Brook Road ZKO1-3/H42 FAX: +1 (603) 884-6484
Nashua, New Hampshire 03062-2698, USA Mobile: +1 978 397 3411
(posted by Tom, 2/4/2005 12:30:40 AM)

At 12:30 AM +0000 2005-02-04, Tom Smith wrote:
> 1) ntpd -gq
> 2) sleep [guess how long ntpd -gq will take, worst case, to set the clock]
>    -or-
>    spin on ps looking for ntpd to start and then end
> 3) start ntpd [normal operation options]
> 4) start time-dependent services
If you're going to follow this model, then in step #2 you can
send queries to ntpd to see what its status is, or you could just
monitor the clock and see if/when there is a large step. Or, you
could monitor syslog output, or look for changes in the ntpd.pid
file, or any number of other things.
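Any of those checks fits the same pattern: poll a condition until it holds or a deadline passes. Here is a minimal Python sketch of that pattern; the helper name and its parameters are illustrative assumptions, not part of ntpd or any NTP tool. It measures the deadline with a monotonic clock, because the wall clock is exactly what may be stepped while the script is waiting.

```python
import time


def wait_until(condition, timeout_s, interval_s=0.2,
               clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it returns True or timeout_s elapses.

    Returns True if the condition was met, False on timeout.
    The deadline uses a monotonic clock, so a step of the wall
    clock during the wait does not shorten or lengthen it.
    """
    deadline = clock() + timeout_s
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The condition can be any callable returning a boolean, for example one that tests whether a pid file (at whatever path the local init script uses) has appeared or disappeared.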
--
Brad Knowles, <brad@stop.mail-abuse.org>
(posted by Brad, 2/4/2005 12:32:08 AM)

At 12:30 AM +0000 2005-02-04, Tom Smith wrote:
> 1) step the clock to a "good enough" time
>    do this within a few seconds
>    block while doing it
>    don't touch/change the pre-computed drift while doing this
> 2) start ntpd
> 3) start time-dependent services
Thinking about this some more, what I hear you saying is that
ntpd should not background itself until such time as it has
calculated the initial offset and stepped the clock as necessary to
get you within normal slew distance, at which point it backgrounds
itself and continues normal startup operations, starts working on
calculating/updating the drift, etc....
At least, it should have this as an optional startup mode,
perhaps as a part of "-g".
--
Brad Knowles, <brad@stop.mail-abuse.org>
(posted by Brad, 2/4/2005 12:59:00 AM)

At 2:23 PM -0600 2005-02-03, Kenneth Porter wrote:
> What about mobile clients? In the mobile environment, what does ntpd do
> when you sever the network connection (i.e., undock)?
The current version of ntpd is not well-suited for use with
mobile clients. It assumes that your IP address does not change. It
assumes that your local network latency is pretty constant, and any
variation in network latency is largely due to WAN issues.
It assumes that there is just one absolute "right" canonical
time, and that all servers are closer or farther away from that, and
that its job is to try to figure out which server is currently the
closest (using long-term statistical data) and then to make that one
the syspeer.
It assumes a whole host of things that are not suitable to a
mobile environment.
> Suppose I undock
> (taking down eth1) and plug in down the hall with the built-in NIC
> (bringing up eth0).
You may no longer be anywhere "close" to the upstream time
servers you had previously configured, and may have to tear down all
your server associations and put up all new ones.
Any time you switch interfaces, get a new IP address, or any of
the other things that are typical for mobile environments, you're
basically looking at a complete stop and restart, if not a complete
stop, re-configure (presumably with totally different servers), and
re-start.
> What must one do to make ntpd tolerant of that?
I'm not convinced that is possible. At least, not in the way
you're thinking of.
> Or must mobile apps give up quality time because their network
> interfaces are transient?
I think you have to assume that a mobile client would have to be
a lot more dependent on the local network services that are provided
wherever they are, and the DHCP server to tell you what the
appropriate time servers are for you to use, etc.... Then you stop
ntpd, throw away everything you previously had, completely
re-configure with the new information, and restart ntpd.
> Would one need to script a complete
> stop/start of ntpd whenever interfaces come and go?
Yup.
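For the re-configure step, here is a minimal Python sketch that builds a fresh ntp.conf from whatever server list the DHCP client hands over. The function name, the default drift file path and the example addresses are assumptions for illustration; stopping and restarting ntpd itself is left to the local init system.

```python
def render_ntp_conf(servers, driftfile="/etc/ntp.drift"):
    """Build ntp.conf text from a list of server names or addresses.

    Each server line carries "iburst" so the restarted daemon
    synchronizes as quickly as the protocol allows.
    """
    if not servers:
        raise ValueError("at least one time server is required")
    lines = [f"driftfile {driftfile}"]
    for host in servers:
        lines.append(f"server {host} iburst")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Documentation-range addresses, standing in for DHCP-supplied servers.
    print(render_ntp_conf(["192.0.2.1", "192.0.2.2"]), end="")
```

A hook run on interface changes could write this text to the configuration file and then ask the init system to restart ntpd.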
--
Brad Knowles, <brad@stop.mail-abuse.org>
(posted by Brad, 2/4/2005 1:12:33 AM)

At 2:20 PM -0600 2005-02-03, Kenneth Porter wrote:
>> If you want to make that delay shorter, I guess you could package
>> Stratum 1 refclocks with every machine.
>
> Given the cheap price of a consumer GPS receiver, this doesn't sound that
> crazy. How fast can ntpd lock on to the NMEA messages?
Cheap GPS receivers don't do NMEA. Those that do NMEA don't give
you a PPS signal, so they're pretty much useless in this role. You
have to look very carefully at GPS devices before you can be sure
that you've found one that will be able to give you a good time
reference.
For example, the Motorola Oncore 12 might be a piece of crap,
while the Motorola Oncore 12+ might be good. Or the Garmin 18 LCS
might be good, but all the other Garmin 18 models might be garbage.
You've got to know what you're looking for.
Please note that I don't have any specific knowledge of which
models are good or bad, I just pulled these examples out of the air.
You need to do your own research to ensure that you've got a GPS
device that will be useful as input to a proposed Stratum 1 time
source.
--
Brad Knowles, <brad@stop.mail-abuse.org>
(posted by Brad, 2/4/2005 1:15:57 AM)

Brad Knowles wrote:
> At 12:30 AM +0000 2005-02-04, Tom Smith wrote:
>
> >> 1) step the clock to a "good enough" time
> >>    do this within a few seconds
> >>    block while doing it
> >>    don't touch/change the pre-computed drift while doing this
> >> 2) start ntpd
> >> 3) start time-dependent services
>
>
> Thinking about this some more, what I hear you saying is that ntpd
> should not background itself until such time as it has calculated the
> initial offset and stepped the clock as necessary to get you within
> normal slew distance, at which point it backgrounds itself and continues
> normal startup operations, starts working on calculating/updating the
> drift, etc....
>
> At least, it should have this as an optional startup mode, perhaps
> as a part of "-g".
>
Bingo. And DON'T TOUCH THE DRIFT until the second phase starts.
I'd also say that should always be the startup mode, -g or no -g
(or at least -g should be implied during that first phase, whether
or not it's requested for the "permanent" phase).
-Tom
________________________________________________________________________
Tom Smith smith@alum.mit.edu,smith@cag.lkg.hp.com
Hewlett-Packard Company Tel: +1 (603) 884-6329
110 Spit Brook Road ZKO1-3/H42 FAX: +1 (603) 884-6484
Nashua, New Hampshire 03062-2698, USA Mobile: +1 978 397 3411
(posted by Tom, 2/4/2005 1:18:14 AM)

On 2005-02-04, Tom Smith <smith@cag.lkg.hp.com> wrote:
> Putting ntpdate into the boot sequence means that time will be however
> long it takes for ntpdate to complete, exactly that long and no longer,
> and that when ntpdate completes, the rest of your boot sequence, including
> ntpd, is "safe". That's why people are unhappy about ntpdate being
> removed without an equivalent way of doing the same thing with
> ntpd:
'ntpd -gq' blocks just like ntpdate. Try it and you'll see. The
difference is that ntpd does a bit more work before it sets the clock
and 'ntpd -gq' can use NTP authentication.
> 1) step the clock to a "good enough" time
>    do this within a few seconds
>    block while doing it
>    don't touch/change the pre-computed drift while doing this
If a drift file exists ntpd uses its contents. Any updates to the drift
file are not written out until ntpd has been running for an hour.
> 2) start ntpd
> 3) start time-dependent services
>
> It would certainly be possible for ntpd to do this all by itself,
It is.
> but unless I misunderstand, the only way to do this with just
> ntpd at the moment is:
>
> 1) ntpd -gq
> 2) sleep [guess how long ntpd -gq will take, worst case, to set the clock]
>    -or-
>    spin on ps looking for ntpd to start and then end
That's not necessary because 'ntpd -gq' blocks just like ntpdate ... as
long as the init script doesn't background it.
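To illustrate the point about not backgrounding the command, here is a minimal Python sketch of a boot step that runs the one-shot clock set in the foreground and only returns once it has finished. The function name, the timeout value and the assumption that the command exits with status 0 on success are mine, not taken from the ntpd documentation.

```python
import subprocess


def boot_time_sync(step_cmd=("ntpd", "-gq"), timeout_s=60.0,
                   run=subprocess.run):
    """Run the one-shot clock set in the foreground.

    subprocess.run() does not return until the child exits, so the
    caller blocks for exactly as long as the command takes, up to
    timeout_s. Returns True if the command exited with status 0.
    """
    try:
        result = run(list(step_cmd), timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False  # gave up waiting; the caller decides what to do next
    except OSError:
        return False  # command not found or not executable
    return result.returncode == 0
```

A boot script would call boot_time_sync() first, then start the long-running ntpd, then start the time-dependent services, whatever the return value policy at that site happens to be.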
--
Steve Kostecke <kostecke@ntp.isc.org>
NTP Public Services Project - http://ntp.isc.org/
(posted by Steve, 2/4/2005 2:03:50 AM)

Brad & Co.,
I make no value judgements in any form here, but I do observe just about
every motherboard on the planet today has a TOY clock which is updated
occasionally by the operating system. So, a machine coming up to sell an
Airbus is probably pretty close, at least within the second if the TOY
time since the last update was not too long and in that case you might
have lost a large number of Airbus sales anyway.
Here's how to sell more Airba. The operating system should write the TOY
time and offset within the second to a file from time to time keeping a
history of at least the last two updates. Assuming the motherboard is
not tossed in the frigid sea or boiling desert, the operating system (or
NTP) can retrieve these values, compute the time and current offset and
set the clock with good accuracy.
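One way to read that scheme, sketched in Python: keep (TOY time, offset) records, estimate how fast the offset grows from the last two, and extrapolate to the current TOY reading. The linear extrapolation, the record layout and the function names are assumptions made for this illustration; the description above does not spell out the arithmetic.

```python
def predict_offset(history, toy_now):
    """Linearly extrapolate the TOY clock's offset to time toy_now.

    history: sequence of (toy_seconds, offset_seconds) records, oldest
    first, where offset = reference time - TOY time when recorded.
    Only the last two records are used.
    """
    if len(history) < 2:
        raise ValueError("need at least two records")
    (t1, o1), (t2, o2) = history[-2], history[-1]
    if t2 <= t1:
        raise ValueError("records must be in increasing time order")
    rate = (o2 - o1) / (t2 - t1)  # offset growth per TOY second
    return o2 + rate * (toy_now - t2)


def corrected_time(history, toy_now):
    """TOY reading plus the predicted offset."""
    return toy_now + predict_offset(history, toy_now)
```

With records taken at TOY times 1000 and 2000 showing offsets of 0.5 s and 0.75 s, the estimated rate is 0.25 ms per second, so a reading at 4000 would be corrected by roughly 1.25 s.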
Dave
Brad Knowles wrote:
> At 10:21 AM -0500 2005-02-03, Tom Smith wrote:
>
>>> With a decent drift file ...
>>
>>
>> Precisely. The decent drift file is a problem. It sometimes doesn't
>> exist after a large initial offset has been turned over to ntpd.
>
>
> Even without a good drift file, you can still sync very quickly. It
> may not be seven seconds, it may be fifteen. But that should still be
> tolerable.
>
>> You should discuss that with a bank or stock exchange that
>> is losing millions in transactions during those seconds
>> or with public utility that is paying the government
>> penalties for downtime. :-)
>
>
> My wife is general counsel, head of legal, and secretary to the
> board for the world's largest clearing and settlement firm for European
> stocks and bonds, with an annual turnover in excess of 256 trillion Euro
> last year, and assets under management in excess of twelve trillion
> Euros. Yes, I mean trillion.
>
> When Argentina decides not to make their interest payments on their
> Brady bond debt, because 80% of their bonds are held through her
> company, the final decision of whether or not to declare what used to be
> the world's seventh largest economy officially bankrupt, arrives on her
> desk.
>
> I understand the scale of the problem. With over a trillion Euro of
> turnover in a single workday, milliseconds do count.
>
>> Well, no. As David pointed out in his posting, all engineering
>> is a matter of tradeoffs. For many users, the tradeoff needs
>> to be 'Get these applications up fast on a "good enough"
>> time and refine the time (and frequency) in the background.'
>
>
> So, doing a single query and taking whatever bogus time may be set
> from that server, is more important than waiting a few more seconds to
> make sure that you've got a pretty good timesync?
>
> I'm sorry, I don't buy it. The bigger the application, the more you
> have to lose, the more important it is to have good time sync.
>
>
> See above -- milliseconds do count.
>
>> Perhaps it is. For you. If it's seven seconds.
>
>
> For financial applications, if the server goes down, then your N+M
> fault-tolerant systems take over that load, and not a single transaction
> is dropped or excessively delayed. If your main server facility is
> taken out by terrorists or natural disaster, then your hot spare
> facility, that is located hundreds or thousands of miles away, takes
> over and a few transactions might be delayed, but nothing is dropped.
>
> If you're running something that mission-critical and you don't have
> those kinds of systems (which can tolerate a few extra seconds of
> startup time in order to ensure that the time is set reasonably well),
> then you are shooting yourself in the foot with a thermonuclear weapon,
> and you will get what you deserve.
>
(posted by David, 2/4/2005 2:15:44 AM)

Kenneth,
I have no idea what you are assuming about start/stop. The scheme I
mentioned has nothing to do with that, just the number of responses
necessary to set the clock. The only practical way to discipline the time
when hibernating or changing wifi cards is using the TOY chip and this
can be quite accurate if the scheme I mentioned in my last is adopted.
PLEASE NOTE: There have been a number of changes since the 4.2.0
distribution, which is now well over a year old. Some of the schemes I
have mentioned here are only in recent versions.
Dave
Kenneth Porter wrote:
> "David L. Mills" <mills@udel.edu> wrote in news:cttv9l$ql7$1@dewey.udel.edu:
>
>
>>It is true that ntpd is specifically engineered for Internet badlands
>>where popcorn spikes, evil masqueraders, misbehaving clocks and other
>>vermin might poison your DNS cache. A client using the pool servers
>>scheme really needs this ammunition.
>
>
> What about mobile clients? In the mobile environment, what does ntpd do
> when you sever the network connection (i.e., undock)? Suppose I undock
> (taking down eth1) and plug in down the hall with the built-in NIC
> (bringing up eth0). What must one do to make ntpd tolerant of that? Or must
> mobile apps give up quality time because their network interfaces are
> transient? Would one need to script a complete stop/start of ntpd whenever
> interfaces come and go?
(posted by David, 2/4/2005 2:24:15 AM)

Tom,
Hangups such as you describe are exactly what the simulator is designed to
reveal and it has revealed them from time to time as folks learn new
ways to misconfigure and warp the hardware and new operating system
violations of the Principle of Least Astonishment occur. I try to keep
ahead of those as they pop up, but do confirm that any combination of
broken frequency file and initial clock error does damp out eventually,
although sometimes with behavior like a pinball machine. Emphasis added:
I can confirm all atrocities found do damp out only on the latest
(development) version.
There are a great many divergent views on what to expect of the NTP
model, some very contradictory and unworkable in a specific combination
of ntpd and j-random operating system. Solaris 2.7 comes to mind where
the CPU clock frequency was determined by the kernel in error and far
beyond the tolerance of ntpd. I've had to simulate all of these things,
including a system with a clock resolution of one second (sic) (and it
works). It could be that your systems suffer from one or another of such
ills, or simply that you are using an older version not yet tamed. It
could be that your operating system has the same ill-mannered behavior
as the current Solaris adjtime(). With large time adjustments, this
turns ntpd into a megawatt oscillator.
Dave
Tom Smith wrote:
> David L. Mills wrote:
>
>> I get nervous about nonquantitative statements, since they might start
>> urban legends. A "decent" frequency file is one created when first
>> starting ntpd without the file and letting it determine the intrinsic
>> frequency error. This takes about fifteen minutes. However, the
>> frequency file itself is written only after the first hour and at
>> hourly intervals after that. The discipline should be stable even if
>> the frequency file is present and intentionally set as much as +-500
>> PPM in error and that even with a large initial time offset. This has
>> been confirmed by simulation; however, the simulations assume the
>> adjtime() system call operates as in original Unix model; the Solaris
>> adjtime() is a killer when large offsets are involved.
>
>
> A physicist I worked with early in my career taught me a very
> useful law. "Different things vary."
>
> I couldn't tell you how many ntp.drift files I've encountered
> with a value of +-500.000. It's a lot. There are many ways this
> can occur, but all of them involve ntpd starting up against a large
> offset with its reference clocks and/or shutting down while it is
> working one off, the latter usually because of an NTP misconfiguration,
> but also sometimes because of thunderstorms in July. Others have
> also observed how this happens on mobile systems that get booted
> and shut down a lot.
>
> Once a system is in this state, it depends on the specifics of the OS
> how long or even if that system will "settle". I can assure you that
> for some systems, if not most, this is most assuredly not 15 minutes,
> might be days, or might, for all practical purposes, be never. These
> are the systems on which the ordinary non-NTP-expert system
> manager or field support team will go through several rounds of
> battery or crystal or motherboard or system replacement before
> anyone tells them to just delete the drift file and start over.
>
> ________________________________________________________________________
> Tom Smith smith@alum.mit.edu,smith@cag.lkg.hp.com
> Hewlett-Packard Company Tel: +1 (603) 884-6329
> 110 Spit Brook Road ZKO1-3/H42 FAX: +1 (603) 884-6484
> Nashua, New Hampshire 03062-2698, USA Mobile: +1 978 397 3411
David | 2/4/2005 2:44:50 AM
Kit,
At least until recently USNO was using GPS radios and VME interfaces on
HP machines with excellent results. You can easily find out whether the
CPU clock is the culprit by starting ntpd with a "disable ntp" in the
configuration file and watching another server. Compute the intrinsic
frequency error from the change in offsets over a few hours. A fix that
works even with older untamed versions is to craft a frequency file with
the value computed.
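
The frequency computation described above is a simple rate: the change in
offset divided by the elapsed time. A minimal sketch follows; the function
name and the sample readings are hypothetical, and the sign convention of
the offsets you read (and of the value you then put in the frequency file)
is an assumption to check against your own ntpd before using the result.

```python
def frequency_error_ppm(offset_start_s, offset_end_s, elapsed_s):
    """Average frequency error in PPM between two offset readings.

    offset_start_s / offset_end_s: offsets (seconds) against the watched
    server, taken elapsed_s seconds apart while the discipline is disabled.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (offset_end_s - offset_start_s) / elapsed_s * 1e6


# Hypothetical readings: the offset grew from 12 ms to 372 ms over 3 hours.
ppm = frequency_error_ppm(0.012, 0.372, 3 * 3600)
print(f"{ppm:.1f} PPM")  # prints "33.3 PPM"
```

A longer observation window averages out network jitter, which is why a
few hours is suggested rather than a few minutes.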
Dave
Kit Plummer wrote:
> How timely this thread is. Given my volatile NTP situation...I am
> starting to believe that using a VME-based GPS source as a reference
> clock to my NTP daemon in a VME-based SPARC SBC isn't a good idea. It
> has been one thing after another...and the "clients" are not getting
> good time from their sole-source server. The main problem that I've
> encountered is +500 PPM and steps during operations - which has
> effectively ruined my data. So...on to plan B???
>
> I am wondering if a simple, static network time server is a better
> option...guess so at this point.
>
> Thanks for the info.
>
> Kit
>
> ------------------------------------------------------------------------
> Kit Plummer
> Operations Research and System Performance Dept.
> Raytheon Missile Systems
>
> On Feb 3, 2005, at 2:13 PM, Tom Smith wrote:
>
>> David L. Mills wrote:
>>
>>> I get nervous about nonquantitative statements, since they might
>>> start urban legends. A "decent" frequency file is one created when
>>> first starting ntpd without the file and letting it determine the
>>> intrinsic frequency error. This takes about fifteen minutes.
>>> However, the frequency file itself is written only after the first
>>> hour and at hourly intervals after that. The discipline should be
>>> stable even if the frequency file is present and intentionally set
>>> as much as +-500 PPM in error and that even with a large initial
>>> time offset. This has been confirmed by simulation; however, the
>>> simulations assume the adjtime() system call operates as in original
>>> Unix model; the Solaris adjtime() is a killer when large offsets are
>>> involved.
>>
>>
>> A physicist I worked with early in my career taught me a very
>> useful law. "Different things vary."
>>
>> I couldn't tell you how many ntp.drift files I've encountered
>> with a value of +-500.000. It's a lot. There are many ways this
>> can occur, but all of them involve ntpd starting up against a large
>> offset with its reference clocks and/or shutting down while it is
>> working one off, the latter usually because of an NTP misconfiguration,
>> but also sometimes because of thunderstorms in July. Others have
>> also observed how this happens on mobile systems that get booted
>> and shut down a lot.
>>
>> Once a system is in this state, it depends on the specifics of the OS
>> how long or even if that system will "settle". I can assure you that
>> for some systems, if not most, this is most assuredly not 15 minutes,
>> might be days, or might, for all practical purposes, be never. These
>> are the systems on which the ordinary non-NTP-expert system
>> manager or field support team will go through several rounds of
>> battery or crystal or motherboard or system replacement before
>> anyone tells them to just delete the drift file and start over.
>>
>> _______________________________________________________________________ _
>> Tom Smith smith@alum.mit.edu,smith@cag.lkg.hp.com
>> Hewlett-Packard Company Tel: +1 (603) 884-6329
>> 110 Spit Brook Road ZKO1-3/H42 FAX: +1 (603) 884-6484
>> Nashua, New Hampshire 03062-2698, USA Mobile: +1 978 397 3411
>
David | 2/4/2005 2:52:12 AM
Alain,
No, don't use ntpdate at all. Use ntpd -g with the tos maxdist 16 if you
want instant synchronization. There are several combinations of the tos
command that could be used to modify behavior, such as the minsane and
minclocks arguments. To determine what they do you have to understand
the icky algorithms; however, after studying the briefings at the
project page and understanding how the algorithms work, it should be
fairly obvious.
Dave
Alain wrote:
>
>
> Tom Smith escreveu:
>
>> David L. Mills wrote:
>>
>>> I get nervous about nonquantitative statements, since they might
>>> start urban legends. A "decent" frequency file is one created when
>>> first starting ntpd without the file and letting it determine the
>>> intrinsic frequency error. This takes about fifteen minutes. However,
>>> the frequency file itself is written only after the first hour and at
>>> hourly intervals after that. The discipline should be stable even if
>>> the frequency file is present and intentionally set as much as +-500
>>> PPM in error and that even with a large initial time offset. This has
>>> been confirmed by simulation; however, the simulations assume the
>>> adjtime() system call operates as in original Unix model; the Solaris
>>> adjtime() is a killer when large offsets are involved.
>>
>>
>>
>> A physicist I worked with early in my career taught me a very
>> useful law. "Different things vary."
>>
>> I couldn't tell you how many ntp.drift files I've encountered
>> with a value of +-500.000. It's a lot. There are many ways this
>> can occur, but all of them involve ntpd starting up against a large
>> offset with its reference clocks and/or shutting down while it is
>> working one off, the latter usually because of an NTP misconfiguration,
>> but also sometimes because of thunderstorms in July. Others have
>> also observed how this happens on mobile systems that get booted
>> and shut down a lot.
>>
>> Once a system is in this state, it depends on the specifics of the OS
>> how long or even if that system will "settle". I can assure you that
>> for some systems, if not most, this is most assuredly not 15 minutes,
>> might be days, or might, for all practical purposes, be never. These
>> are the systems on which the ordinary non-NTP-expert system
>> manager or field support team will go through several rounds of
>> battery or crystal or motherboard or system replacement before
>> anyone tells them to just delete the drift file and start over.
>
>
> So, if I manage to set the initial time within a good approximation of
> "real" time using ntpdate using 5 servers as explained earlier, would
> you recommend deleting the drift file and letting it start all over again?
>
> Does this affect the time for the server to start serving?
>
> Alain
David | 2/4/2005 4:13:40 AM
Steve Kostecke wrote:
> 'ntpd -gq' blocks just like ntpdate. Try it and you'll see. The
> difference is that ntpd does a bit more work before it sets the clock
> and 'ntpd -gq' can use NTP authentication.
You're absolutely right. My mistake.
>>It would certainly be possible for ntpd to do this all by itself,
>
>
> It is.
>
Yes, I wasn't clear. I meant that a single invocation of
ntpd could do this all by itself, as suggested by Brad.
The remaining issue, then, is that of the time required.
On a system already within less than a millisecond of nearly
all of its servers:
# time ntpd -gq [13 servers in ntp.conf]
ntpd: time slew -0.000373s
real 1m43.03s
user 0m0.10s
sys 0m0.60s
# time ntpdate -b [three selected servers]
3 Feb 22:44:45 ntpdate[186032]: step time server [IP address] offset -0.000157 sec
real 0m0.90s
user 0m0.00s
sys 0m0.00s
I'm sure the time would be less with ntpd with fewer servers,
provided that all of them were up at boot time.
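
For anyone who wants to repeat this comparison from a boot script, here is
a minimal sketch of a blocking, timed, time-limited call. The helper name
and the 120-second limit are hypothetical; the commands in the usage
comment are the ones shown above and need root and an installed ntpd or
ntpdate to do anything useful.

```python
import subprocess
import sys
import time


def run_blocking(cmd, timeout_s):
    """Run cmd and block until it exits or timeout_s elapses.

    Returns (exit_code, elapsed_seconds); exit_code is None on timeout,
    in which case subprocess.run has already killed the child.
    """
    start = time.monotonic()
    try:
        code = subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        code = None
    return code, time.monotonic() - start


# Usage in a boot script (not executed here):
#   code, elapsed = run_blocking(["ntpd", "-gq"], timeout_s=120)
#   if code is None:
#       print("clock not set within the limit", file=sys.stderr)
#   ...then start ntpd normally, then the time-dependent services.
```

The upper bound matters for boot time: the caller knows the worst case in
advance instead of guessing at a sleep interval.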
-Tom
________________________________________________________________________
Tom Smith smith@alum.mit.edu,smith@cag.lkg.hp.com
Hewlett-Packard Company Tel: +1 (603) 884-6329
110 Spit Brook Road ZKO1-3/H42 FAX: +1 (603) 884-6484
Nashua, New Hampshire 03062-2698, USA Mobile: +1 978 397 3411
Tom | 2/4/2005 4:21:19 AM
Tom,
The code I see in the ntpdate source does an adjtime() for all offsets,
even large ones. I don't see a settimeofday() or equivalent. Thus, if
you run ntpdate and it produces a large correction (maybe a second or
more), you should wait until that adjustment is made before starting
ntpd. That's about 2000 s of wait for a 1-s adjustment with stock Unix
kernels and slew rate 500 PPM. You wouldn't have to wait that long for a
Solaris kernel, but you would have to wait. Why not give up and use the
-g option?
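
The 2000 s figure is just the offset divided by the slew rate. A minimal
sketch of the arithmetic, with a hypothetical function name and the 500 PPM
rate quoted above as the default:

```python
def slew_time_s(offset_s, slew_rate_ppm=500.0):
    """Seconds needed to amortize offset_s at a constant slew rate in PPM."""
    return abs(offset_s) * 1e6 / slew_rate_ppm


print(slew_time_s(1.0))  # prints 2000.0, about 33 minutes
```

At the same rate, a 10-second residual would take 20000 s (over five and a
half hours) to work off, which is the argument for stepping large offsets.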
Dave
Tom Smith wrote:
> Alain wrote:
>
>> So, if I manage to set the initial time within a good approximation of
>> "real" time using ntpdate using 5 servers as explained earlier, would
>> you recommend deleting the drift file and letting it start all over again?
>>
>> Does this affect the time for the server to start serving?
>
>
> I assume you asked this of me, rather than of Dave. Your method
> seems to me to be excellent.
>
> If you do that, there should be no need to clean out the
> characteristic drift that was so carefully computed over a long
> period of time. Cleaning out the drift file is a drastic measure
> required only if you do NOT pre-set the clock before starting ntpd
> and you end up, as a result, with a bogus re-calculated drift rate.
>
> What you have to calculate into the boot time is the time you have
> to block while stepping the clock to the "right" time in the first place.
> Putting ntpdate into the boot sequence means that time will be however
> long it takes for ntpdate to complete, exactly that long, and no longer
> and that when ntpdate completes the rest of your boot sequence, including
> ntpd is "safe". That's why people are unhappy about ntpdate being
> removed without an equivalent way of doing the same thing with
> ntpd:
>
> 1) step the clock to a "good enough" time
> do this within a few seconds
> block while doing it
> don't touch/change the pre-computed drift while doing this
> 2) start ntpd
> 3) start time-dependent services
>
> It would certainly be possible for ntpd to do this all by itself,
> but unless I misunderstand, the only way to do this with just
> ntpd at the moment is:
>
> 1) ntpd -gq
> 2) sleep [guess how long ntpd -gq will take, worst case, to set the clock]
> -or-
> spin on ps looking for ntpd to start and then end
> 3) start ntpd [normal operation options]
> 4) start time-dependent services
>
> Which is not the same thing.
>
> ________________________________________________________________________
> Tom Smith smith@alum.mit.edu,smith@cag.lkg.hp.com
> Hewlett-Packard Company Tel: +1 (603) 884-6329
> 110 Spit Brook Road ZKO1-3/H42 FAX: +1 (603) 884-6484
> Nashua, New Hampshire 03062-2698, USA Mobile: +1 978 397 3411
David | 2/4/2005 4:25:13 AM
Brad,
Why do you need to wait? The ntpd will exit when the clock is set, just
like ntpdate and, like ntpdate, it will either step the clock or leave
the residual adjustment in the kernel to be amortized over whatever
period the particular kernel supports. This could be a considerable
interval, just like with ntpdate.
Dave
Brad Knowles wrote:
> At 12:30 AM +0000 2005-02-04, Tom Smith wrote:
>
>> 1) ntpd -gq
>> 2) sleep [guess how long ntpd -gq will take, worst case, to set the
>> clock]
>> -or-
>> spin on ps looking for ntpd to start and then end
>> 3) start ntpd [normal operation options]
>> 4) start time-dependent services
>
>
> If you're going to follow this model, then in step #2 you can send
> queries to ntpd to see what its status is, or you could just monitor
> the clock and see if/when there is a large step. Or, you could monitor
> syslog output, or look for changes in the ntpd.pid file, or any number
> of other things.
>
David | 2/4/2005 4:29:16 AM
Tom,
I don't think you have the model right. The ntpd never "backgrounds
itself", but keeps on ticking according to a state machine which
controls whether or not to do a direct frequency measurement rather than
the usual incremental feedback loop, which depends on whether the
frequency file is present. The only thing the -q does is exit the daemon
when the clock is first set.
Dave
Tom Smith wrote:
> Brad Knowles wrote:
>
>> At 12:30 AM +0000 2005-02-04, Tom Smith wrote:
>>
>>> 1) step the clock to a "good enough" time
>>> do this within a few seconds
>>> block while doing it
>>> don't touch/change the pre-computed drift while doing this
>>> 2) start ntpd
>>> 3) start time-dependent services
>>
>>
>>
>> Thinking about this some more, what I hear you saying is that ntpd
>> should not background itself until such time as it has calculated the
>> initial offset and stepped the clock as necessary to get you within
>> normal slew distance, at which point it backgrounds itself and
>> continues normal startup operations, starts working on
>> calculating/updating the drift, etc....
>>
>> At least, it should have this as an optional startup mode, perhaps
>> as a part of "-g".
>>
>
> Bingo. And DON'T TOUCH THE DRIFT until the second phase starts.
>
> I'd also say that should always be the startup mode, -g or no -g
> (or at least -g should be implied during that first phase, whether
> or not it's requested for the "permanent" phase).
>
> -Tom
> ________________________________________________________________________
> Tom Smith smith@alum.mit.edu,smith@cag.lkg.hp.com
> Hewlett-Packard Company Tel: +1 (603) 884-6329
> 110 Spit Brook Road ZKO1-3/H42 FAX: +1 (603) 884-6484
> Nashua, New Hampshire 03062-2698, USA Mobile: +1 978 397 3411
David | 2/4/2005 4:39:00 AM
Wolfgang S. Rupprecht wrote:
> Brad Knowles <brad@stop.mail-abuse.org> writes:
>
>> If you want to make that delay shorter, I guess you could
>>package Stratum 1 refclocks with every machine.
>
>
> I'd be waiting for minutes if I waited till ntpd decided it had a
> "good enough" estimate of the Motorola Oncore's time. Ntpd *could*
> have its first time estimate accurate to well under 1ms in half a
> second on average. The problem is, it won't tell you the time or step
> the system clock until it filters the crap out of the refclock signal.
>
> Ntpd has a hard act to follow. The old ntpdate program was a near
> ideal solution from the perspective of booting. It did its job
> quickly and got off the pot.
Except the few times when it messed up horribly:
I had one of my GPS-based stratum 1 servers go bad on me a few years
ago, and found out the hard way that _one_ critical server in our
infrastructure had been configured to use this particular server as the
only ntpdate reference:
It was rebooted during the time interval before the failing ntp/gps
server was located and turned off, with the result that this unix db
machine came up with a wildly wrong date.
After ntpdate had run, ntpd was started, but even though it had been
configured with four server lines, only the temporarily broken server
used by ntpdate was sufficiently close to be accepted, all the others
were deemed falsetickers.
A perfect ntpdate replacement needs to locate all or most of the
configured servers, and run at least a couple of packets to each server,
and wait until a plurality have agreed on what the time is. This can be
handled in an absolute minimum of about 3-5 seconds without polling each
server more often than every two seconds.
If you configure fewer servers, then you'd need more packets to each
server before a sufficient reach value could be achieved for the set of
servers that agree.
The same goes in case of one or more falsetickers of course: They must
be detected and filtered out, which requires more packets to all the
servers than if they all agree.
Prof. Mills' latest tweak, allowing you to tune (i.e. reduce) the
required quality of the initial time estimate could be used to balance
your need for believable time vs time to achieve first sync.
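
The "wait until a plurality agree" idea can be illustrated with a
deliberately simplified sketch. This is not ntpd's intersection and
clustering algorithm; the function name, the 128 ms agreement tolerance
and the strict-majority rule are all assumptions chosen for illustration.

```python
def plurality_offset(offsets_s, tolerance_s=0.128, min_agree=None):
    """Median offset of the largest group of servers agreeing within
    tolerance_s, or None if that group is smaller than min_agree
    (default: a strict majority of the servers that answered)."""
    if not offsets_s:
        return None
    need = min_agree if min_agree is not None else len(offsets_s) // 2 + 1
    ordered = sorted(offsets_s)
    best = []
    lo = 0
    # Sliding window over the sorted offsets: find the longest run whose
    # spread (largest minus smallest) stays within the tolerance.
    for hi in range(len(ordered)):
        while ordered[hi] - ordered[lo] > tolerance_s:
            lo += 1
        if hi - lo + 1 > len(best):
            best = ordered[lo:hi + 1]
    if len(best) < need:
        return None
    return best[len(best) // 2]


# Three servers agree; one (hypothetical) falseticker is an hour off.
print(plurality_offset([0.0004, -0.0002, 0.0001, 3600.0]))  # prints 0.0001
# Two servers that disagree give no answer at all.
print(plurality_offset([0.0, 3600.0]))  # prints None
```

Note that a single configured server always "agrees with itself", which is
the failure described above: with only one ntpdate reference there is
nothing to cross-check it against.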
Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
Terje | 2/4/2005 7:31:52 AM
At 4:21 AM +0000 2005-02-04, Tom Smith wrote:
> # time ntpd -gq [13 servers in ntp.conf]
> ntpd: time slew -0.000373s
>
> real 1m43.03s
> user 0m0.10s
> sys 0m0.60s
Yes, but what does that ntp.conf look like? Are you using
iburst? Any authentication? Manually setting minpoll and/or
maxpoll? Did you have a good drift file to start with?
You need to provide some more specifics before you can make an
attempt to compare this to ntpdate.
> # time ntpdate -b [three selected servers]
> 3 Feb 22:44:45 ntpdate[186032]: step time server [IP address]
>offset -0.000157 sec
Not comparable. You need to include all thirteen servers before
this could potentially be considered comparable.
--
Brad Knowles, <brad@stop.mail-abuse.org>
"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin (1706-1790), reply of the Pennsylvania
Assembly to the Governor, November 11, 1755
SAGE member since 1995. See <http://www.sage.org/> for more info.
Brad | 2/4/2005 8:50:43 AM
At 8:18 PM -0500 2005-02-03, Tom Smith wrote:
> I'd also say that should always be the startup mode, -g or no -g
> (or at least -g should be implied during that first phase, whether
> or not it's requested for the "permanent" phase).
No. The point of "-g" is that the admin has to explicitly
request that ntpd be allowed to make large-scale changes, and ntpd
should not assume that it can do that unless specifically requested.
No. Absolutely not.
--
Brad Knowles, <brad@stop.mail-abuse.org>
Brad | 2/4/2005 8:52:09 AM
Brad Knowles <brad@stop.mail-abuse.org> wrote in
news:mailman.36.1107479858.583.questions@lists.ntp.isc.org:
> At 2:20 PM -0600 2005-02-03, Kenneth Porter wrote:
>
>>> If you want to make that delay shorter, I guess you could package
>>> Stratum 1 refclocks with every machine.
>>
>> Given the cheap price of a consumer GPS receiver, this doesn't sound
>> that crazy. How fast can ntpd lock on to the NMEA messages?
>
> Cheap GPS receivers don't do NMEA. Those that do NMEA don't give
> you a PPS signal, so they're pretty much useless in this role. You
> have to look very carefully at GPS devices before you can be sure
> that you've found one that will be able to give you a good time
> reference.
Point taken, but I have a "looser" definition of "cheap", so a receiver
that costs a couple hundred dollars would be acceptable for a system that
was so critical.
How "bad" is the time from a GPS receiver with NMEA but no PPS? Is it
sufficient to use for an initial setting (replacing ntpdate and ntpd -g)?
IIRC these things typically issue an update once a second.
> For example, the Motorola Oncore 12 might be a piece of crap,
> while the Motorola Oncore 12+ might be good. Or the Garmin 18 LCS
> might be good, but all the other Garmin 18 models might be garbage.
> You've got to know what you're looking for.
>
>
> Please note that I don't have any specific knowledge of which
> models are good or bad, I just pulled these examples out of the air.
>
> You need to do your own research to ensure that you've got a GPS
> device that will be useful as input to a proposed Stratum 1 time
> source.
I got an eTrex Vista for xmas and will have to see what it's capable
of....
Kenneth | 2/4/2005 1:52:03 PM
Brad Knowles escreveu:
>> Suppose I undock
>> (taking down eth1) and plug in down the hall with the built-in NIC
>> (bringing up eth0).
>
> You may no longer be anywhere "close" to the upstream time servers
> you had previously configured, and may have to tear down all your server
> associations and put up all new ones.
>
> Any time you switch interfaces, get a new IP address, or any of the
> other things that are typical for mobile environments, you're basically
> looking at a complete stop and restart, if not a complete stop,
> re-configure (presumably with totally different servers), and re-start.
Wouldn't normal operation cope with that? If it is a server at all, it
can continue with the calculated historical drift. If both sets of servers
are in ntp.conf, after a while it will acquire sync again.
>> What must one do to make ntpd tolerant of that?
>
> I'm not convinced that is possible. At least, not in the way you're
> thinking of.
Maybe just be a bit more tolerant to a situation where all servers
become unavailable?
And maybe just use longer times to calculate drift if connection times
are not stable. Just a parameter adjustment maybe?
Alain
Alain | 2/4/2005 4:47:59 PM
Terje,
I am baffled by your comments. Your "perfect ntpdate replacement"
described is precisely what ntpd -g does.
Dave
Terje Mathisen wrote:
> Wolfgang S. Rupprecht wrote:
>
>> Brad Knowles <brad@stop.mail-abuse.org> writes:
>>
>>> If you want to make that delay shorter, I guess you could
>>> package Stratum 1 refclocks with every machine.
>>
>>
>>
>> I'd be waiting for minutes if I waited till ntpd decided it had a
>> "good enough" estimate of the Motorola Oncore's time. Ntpd *could*
>> have its first time estimate accurate to well under 1ms in half a
>> second on average. The problem is, it won't tell you the time or step
>> the system clock until it filters the crap out of the refclock signal.
>>
>> Ntpd has a hard act to follow. The old ntpdate program was a near
>> ideal solution from the perspective of booting. It did its job
>> quickly and got off the pot.
>
>
> Except the few times when it messed up horribly:
>
> I had one of my GPS-based stratum 1 servers go bad on me a few years
> ago, and found out the hard way that _one_ critical server in our
> infrastructure had been configured to use this particular server as the
> only ntpdate reference:
>
> It was rebooted during the time interval before the failing ntp/gps
> server was located and turned off, with the result that this unix db
> machine came up with a wildly wrong date.
>
> After ntpdate had run, ntpd was started, but even though it had been
> configured with four server lines, only the temporarily broken server
> used by ntpdate was sufficiently close to be accepted, all the others
> were deemed falsetickers.
>
> A perfect ntpdate replacement needs to locate all or most of the
> configured servers, and run at least a couple of packets to each server,
> and wait until a plurality have agreed on what the time is. This can be
> handled in an absolute minimum of about 3-5 seconds without polling each
> server more often than every two seconds.
>
> If you configure fewer servers, then you'd need more packets to each
> server before a sufficient reach value could be achieved for the set of
> servers that agree.
>
> The same goes in case of one or more falsetickers of course: They must
> be detected and filtered out, which requires more packets to all the
> servers than if they all agree.
>
> Prof. Mills' latest tweak, allowing you to tune (i.e. reduce) the
> required quality of the initial time estimate could be used to balance
> your need for believable time vs time to achieve first sync.
>
> Terje
>
David | 2/4/2005 4:58:22 PM
At 2:47 PM -0200 2005-02-04, Alain wrote:
>> Any time you switch interfaces, get a new IP address, or any of
>> the other things that are typical for mobile environments, you're
>> basically looking at a complete stop and restart, if not a complete
>> stop, re-configure (presumably with totally different servers), and
>> re-start.
>
> Wouldn't normal operation cope with that? If it is a server at all,
> continue with calculated historical drift. If both sets of servers
> are in ntp.conf, after a while it will acquire sync again.
The way ntpd works is that it looks up the IP addresses of the
interfaces it is listening on when it starts, and then it explicitly
binds to those IP addresses. It never again looks up that
information. So, when your IP address changes, you have to restart
ntpd. You have the same problem if you switch interfaces.
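
A dock/undock hook could decide whether a restart is needed by comparing
the addresses ntpd saw at startup with the current ones. The sketch below
is only the comparison; the function names are hypothetical, how the two
address lists are collected is left to the platform, and the sample
addresses are documentation-range placeholders.

```python
def address_changes(bound_at_start, current):
    """Return (lost, gained) address sets relative to the startup snapshot."""
    bound, now = set(bound_at_start), set(current)
    return bound - now, now - bound


def needs_restart(bound_at_start, current):
    """True if any address ntpd bound to is gone or a new one has appeared."""
    lost, gained = address_changes(bound_at_start, current)
    return bool(lost or gained)


# Hypothetical undock: eth1's address goes away, eth0 comes up with a new one.
print(needs_restart({"127.0.0.1", "192.0.2.10"},
                    {"127.0.0.1", "198.51.100.7"}))  # prints True
```

A brief loss of connectivity that leaves the address set unchanged returns
False, which matches the no-restart case described next.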
If you briefly lose connectivity, and you keep the same
interface, and you keep the same IP address (maybe you were walking
around in your house and were in an area that doesn't have good
wireless coverage for a while), then you can help make recovery
easier by configuring the LOCAL refclock.
With a LOCAL refclock, if all the other servers go away, ntpd
will at least continue running in degraded mode and using the latest
calculated variables, and while the stratum value will drop (to
whatever you fudged for the LOCAL refclock), it won't go out of
state=4, and will start recovering as soon as you regain connectivity.
If you don't have a LOCAL refclock, I'm not entirely certain what
will happen, but it most likely won't be good and you will have to
restart.
> Maybe just be a bit more tolerant to a situation where all servers
> become unavailable?
Within limits, you can address that with a LOCAL refclock.
Of course, not all vendors ship ntpd compiled with support for a
LOCAL refclock, so you may have to rebuild your own ntpd binary.
> And maybe just use longer times to calculate drift if connection times
> are not stable. Just a parameter adjustment maybe?
You should leave that up to ntpd. These algorithms have been
developed the hard way over thirty years, and unless you're Einstein
or Dr. Mills, you're unlikely to be able to improve on them.
--
Brad Knowles, <brad@stop.mail-abuse.org>
Brad | 2/4/2005 5:02:42 PM
Terje,
Ick. Replace my -g with -q. Mea stromboli.
Dave
David L. Mills wrote:
> Terje,
>
> I am baffled by your comments. Your "perfect ntpdate replacement"
> described is precisely what ntpd -g does.
>
> Dave
>
> Terje Mathisen wrote:
>
>> Wolfgang S. Rupprecht wrote:
>>
>>> Brad Knowles <brad@stop.mail-abuse.org> writes:
>>>
>>>> If you want to make that delay shorter, I guess you could
>>>> package Stratum 1 refclocks with every machine.
>>>
>>>
>>>
>>>
>>> I'd be waiting for minutes if I waited till ntpd decided it had a
>>> "good enough" estimate of the Motorola Oncore's time. Ntpd *could*
>>> have its first time estimate accurate to well under 1ms in half a
>>> second on average. The problem is, it won't tell you the time or step
>>> the system clock until it filters the crap out of the refclock signal.
>>>
>>> Ntpd has a hard act to follow. The old ntpdate program was a near
>>> ideal solution from the perspective of booting. It did its job
>>> quickly and got off the pot.
>>
>>
>>
>> Except the few times when it messed up horribly:
>>
>> I had one of my GPS-based stratum 1 servers go bad on me a few
>> years ago, and found out the hard way that _one_ critical server in
>> our infrastructure had been configured to use this particular server
>> as the only ntpdate reference:
>>
>> It was rebooted during the time interval before the failing ntp/gps
>> server was located and turned off, with the result that this unix db
>> machine came up with a wildly wrong date.
>>
>> After ntpdate had run, ntpd was started, but even though it had been
>> configured with four server lines, only the temporarily broken server
>> used by ntpdate was sufficiently close to be accepted, all the others
>> were deemed falsetickers.
>>
>> A perfect ntpdate replacement needs to locate all or most of the
>> configured servers, and run at least a couple of packets to each
>> server, and wait until a plurality have agreed on what the time is.
>> This can be handled in an absolute minimum of about 3-5 seconds
>> without polling each server more often than every two seconds.
>>
>> If you configure fewer servers, then you'd need more packets to each
>> server before a sufficient reach value could be achieved for the set
>> of servers that agree.
>>
>> The same goes in case of one or more falsetickers of course: They must
>> be detected and filtered out, which requires more packets to all the
>> servers than if they all agree.
>>
>> Prof. Mills' latest tweak, allowing you to tune (i.e. reduce) the
>> required quality of the initial time estimate could be used to balance
>> your need for believable time vs time to achieve first sync.
>>
>> Terje
>>
David | 2/4/2005 5:21:08 PM
Answers to several messages:
----------
David L. Mills escreveu:
> Alain,
> No, don't use ntpdate at all. Use ntpd -g with the tos maxdist 16 if
> you want instant synchronization. There are several combinations of the
> tos command that could be used to modify behavior, such as the minsane
> and minclocks arguments. To determine what they do you have to
> understand the icky algorithms; however, after studying the briefings at
> the project page and understanding how the algorithms work, it should be
> fairly obvious.
The problem is that -g has a limit of 1000 seconds. My worst-case
scenario (I have seen it a lot) includes dead batteries and initial times
of 1/1/1980. Second worst is UTC misconfiguration, many hours off. And
not to forget summer daylight saving time, which is more than 1000 s.
Even worse is that it exits ntpd. This requires another daemon to check
whether it happened :(
If ntpdate disappears, there should be *some* way of handling this.
-----------
Tom Smith wrote:
>> So, if I manage to set the initial time within a good approximation of
>> "real" time using ntpdate using 5 servers as explained earlier, would
>> you recommend deleting the drift file and letting it start all over again?
>
> If you do that, there should be no need to clean out the
> characteristic drift that was so carefully computed over a long
> period of time. Cleaning out the drift file is a drastic measure
> required only if you do NOT pre-set the clock before starting ntpd
> and you end up, as a result, with a bogus re-calculated drift rate.
So I understand that testing the drift and deleting it only if it is
+-500.00 is the more generic approach for all situations?
----------
Terje Mathisen wrote:
> I had one of my GPS-based stratum 1 servers go bad on me a few years
> ago, and found out the hard way that _one_ critical server in our
> infrastructure had been configured to use this particular server as the
> only ntpdate reference:
>
> It was rebooted during the time interval before the failing ntp/gps
> server was located and turned off, with the result that this unix db
> machine came up with a wildly wrong date.
>
> After ntpdate had run, ntpd was started, but even though it had been
> configured with four server lines, only the temporarily broken server
> used by ntpdate was sufficiently close to be accepted, all the others
> were deemed falsetickers.
>
> A perfect ntpdate replacement needs to locate all or most of all the
> configured servers, and run at least a couple of packets to each server,
> and wait until a plurality have agreed on what the time is. This can be
> handled in an absolute minimum of about 3-5 seconds without polling each
> server more often than every two seconds.
>
> If you configure fewer servers, then you'd need more packets to each
> server before a sufficient reach value could be achieved for the set of
> servers that agree.
>
> The same goes in case of one or more falsetickers of course: They must
> be detected and filtered out, which requires more packets to all the
> servers than if they all agree.
Behind this is a misconfigured and forgotten configuration. I believe
a script that uses many of the servers that are *in* ntp.conf would
prevent that, especially since these servers will probably be checked
periodically with ntpq -p.
-----------
I believe, then, that for a more general situation:
1) Run ntpdate with many servers from ntp.conf (5 is reasonable).
2) Check if the drift is +-500.000; if so, delete the drift file.
3) Start ntpd.
4) Periodically check external (and internal) servers to see if a
reasonable number of them are still there.
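Step 2 of the list above can be sketched as a small check. This is a minimal illustration in Python, not part of ntpd or ntpdate: the function name `drift_is_pinned`, the `limit_ppm` parameter and the tolerance are assumptions made for the example, and it assumes the first number in the drift file is the frequency offset in ppm.

```python
def drift_is_pinned(drift_text, limit_ppm=500.0, tolerance_ppm=0.001):
    """Return True when the recorded drift sits at the +/-limit_ppm rail.

    drift_text is the content of the drift file, e.g. "-2.653\n".
    All names and defaults here are hypothetical, chosen for illustration.
    """
    value_ppm = float(drift_text.split()[0])
    return abs(value_ppm) >= limit_ppm - tolerance_ppm


if __name__ == "__main__":
    # A healthy value (the one shown elsewhere in this thread) is kept.
    print(drift_is_pinned("-2.653\n"))
    # A value pinned at the rail is a candidate for deletion.
    print(drift_is_pinned("500.000\n"))
```

A boot script would delete the drift file only when the check returns True, which leaves a carefully computed drift value alone in the normal case.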
Alain
|
|
0
|
|
|
|
Reply
|
Alain
|
2/4/2005 5:33:18 PM
|
|
On 2005-02-04, Alain <alainm@pobox.com> wrote:
> The problem is that -g has a limit of 1000 seconds.
No. -g allows ntpd to EXCEED the sanity limit of 1000 seconds.
-q by itself is subject to the 1000 second limit.
-gq is not.
--
Steve Kostecke <kostecke@ntp.isc.org>
NTP Public Services Project - http://ntp.isc.org/
|
|
0
|
|
|
|
Reply
|
Steve
|
2/4/2005 6:18:17 PM
|
|
Brad Knowles wrote:
> At 4:21 AM +0000 2005-02-04, Tom Smith wrote:
>
>> # time ntpd -gq [13 servers in ntp.conf]
>> ntpd: time slew -0.000373s
>>
>> real 1m43.03s
>> user 0m0.10s
>> sys 0m0.60s
>
>
> Yes, but what does that ntp.conf look like? Are you using iburst?
> Any authentication? Manually setting minpoll and/or maxpoll? Did you
> have a good drift file to start with?
>
> You need to provide some more specifics before you can make an
> attempt to compare this to ntpdate.
>
>> # time ntpdate -b [three selected servers]
>> 3 Feb 22:44:45 ntpdate[186032]: step time server [IP address] offset
>> -0.000157 sec
>
>
> Not comparable. You need to include all thirteen servers before
> this could potentially be considered comparable.
>
As you wish...
# cat /etc/ntp.drift
-2.653
# ntpq -p
 remote          refid          st t when poll reach    delay   offset  jitter
==============================================================================
 LOCAL(1)        LOCAL(1)        5 l   41   64   377    0.000    0.000   0.004
 [IP.255]        0.0.0.0        16 u    -   64     0    0.000    0.000 4000.00
-[name]          .TRUE.          1 u  468 1024   377   12.741   -0.950   0.092
-[name]          .WWVB.          1 u  470 1024   377    1.388    1.015   1.216
-[name]          [name]          2 u  135  256   337    0.004   -0.490   0.440
-[name]          .GPS.           1 u  997 1024   377   88.936    0.153   1.854
+[name]          .GPS.           1 u  603 1024   377   88.146    0.546   0.119
*[name]          .GPS.           1 u  764 1024   377   88.311    0.605   1.387
-[name]          .GPS.           1 u  783 1024   377   73.649    0.766  12.499
#[name]          [name]          2 u  652 1024   377   83.509   -0.371   0.725
-[name]          .GPS.           1 u  720 1024   377   32.581    1.203   0.082
+[name]          .GPS.           1 u  744 1024   377  105.719    0.604   1.952
-[name]          .GPS.           1 u  686 1024   377   92.301    3.050   0.447
#[name]          [name]          2 u  339 1024   176    0.796   -1.401   0.245
#[name]          [name]          2 u  520 1024   376   10.376   -2.257   0.814
 [name]          0.0.0.0        16 u    - 1024     0    0.000    0.000 4000.00
# time ntpd -gq [above 14 servers/peers with iburst, plus local. No authentication.
Default min/max poll. All appropriate for long-term time
maintenance.]
ntpd: time slew 0.000527s
real 0m45.00s
user 0m0.11s
sys 0m0.55s
[Note that ntpd -gq isn't really done when it unblocks. It has only
initiated a slew. But we'll let that pass because the time is
already way more than "good enough".]
# ntpdate -b [same set of servers and peers, except no local clock. Way
overconfigured for one-time "good enough" acquisition at boot.]
4 Feb 13:21:32 ntpdate[201766]: step time server [IP] offset 0.000647 sec
real 0m6.73s
user 0m0.00s
sys 0m0.00s
________________________________________________________________________
Tom Smith smith@alum.mit.edu,smith@cag.lkg.hp.com
Hewlett-Packard Company Tel: +1 (603) 884-6329
110 Spit Brook Road ZKO1-3/H42 FAX: +1 (603) 884-6484
Nashua, New Hampshire 03062-2698, USA Mobile: +1 978 397 3411
|
|
0
|
|
|
|
Reply
|
Tom
|
2/4/2005 6:59:39 PM
|
|
In article <ctutfh$4fq$1@dewey.udel.edu> "David L. Mills"
<mills@udel.edu> writes:
>
>The code I see in the ntpdate source does an adjtime() for all offsets,
>even large ones. I don't see a settimeofday() or equivalent.
That's just because it calls step_systime() in libntp/systime.c for that
- as it should, of course. Surely you didn't think the -b option was a
no-op and the documentation full of lies... (Well, I'm looking at 4.2.0
and assuming this functionality hasn't been removed in the development
version...)
--Per Hedeland
per@hedeland.org
|
|
0
|
|
|
|
Reply
|
per
|
2/4/2005 7:46:07 PM
|
|
In article <mailman.44.1107536681.583.questions@lists.ntp.isc.org> Brad
Knowles <brad@stop.mail-abuse.org> writes:
>
> If you briefly lose connectivity, and you keep the same
>interface, and you keep the same IP address (maybe you were walking
>around in your house and were in an area that doesn't have good
>wireless coverage for a while), then you can help make recovery
>easier by configuring the LOCAL refclock.
>
> With a LOCAL refclock, if all the other servers go away, ntpd
>will at least continue running in degraded mode and using the latest
>calculated variables
As has been pointed out many times here, there's no need to have a LOCAL
clock configured for that - ntpd will do it anyway. The only reason to
have a LOCAL clock configured is if you need ntpd to serve time to
others in this situation.
--Per Hedeland
per@hedeland.org
|
|
0
|
|
|
|
Reply
|
per
|
2/4/2005 8:02:19 PM
|
|
David L. Mills wrote:
> Terje,
>
> I am baffled by your comments. Your "perfect ntpdate replacement"
> described is precisely what ntpd -g does.
Exactly!
That was what I hoped the original poster would realize. :-)
Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
|
|
0
|
|
|
|
Reply
|
Terje
|
2/4/2005 8:05:39 PM
|
|
Tom Smith wrote:
> # time ntpd -gq [above 14 servers/peers with iburst, plus local. No
> authentication.
> Default min/max poll. All appropriate for long-term time
> maintenance.]
I believe using fewer servers, but minpoll 4, would result in faster
initial acquisition.
Prof. Mills' recent addition of a tuning knob to reduce the confidence
level for first time acceptance will make it even faster.
Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
|
|
0
|
|
|
|
Reply
|
Terje
|
2/4/2005 8:12:12 PM
|
|
"David L. Mills" <mills@udel.edu> writes:
> Hangups as you describe is exactly what the simulator is designed
> to reveal and it has revealed them from time to time as folks learn
> new ways to misconfigure and warp the hardware and new operating
I know this is both a newsgroup and a mailing list, but would it be
possible not to top post? :>
Regards,
David
|
|
0
|
|
|
|
Reply
|
David
|
2/4/2005 8:18:00 PM
|
|
Terje Mathisen wrote:
> Tom Smith wrote:
>
>> # time ntpd -gq [above 14 servers/peers with iburst, plus local. No
>> authentication.
>> Default min/max poll. All appropriate for long-term time
>> maintenance.]
>
>
> I believe using less servers, but minpoll 4, would result in faster
> initial aquisition.
>
> Prof. Mills' recent addition of a tuning knob to reduce the confidence
> level for first time acceptance will make it even faster.
>
> Terje
So you're suggesting that the requirements for initial acquisition
of an estimated time necessary to get ntpd off the ground are different
from the requirements for long-term maintenance of an accurate and stable time.
I certainly agree.
|
|
0
|
|
|
|
Reply
|
Tom
|
2/4/2005 8:55:57 PM
|
|
Is it intentional or a bug that this list has no "Reply-To:" attribute?
The list just moved; is this simply a missing configuration?
I just sent a message to Tom Smith that was intended for the list, sorry
Tom ;-)
Alain
|
|
0
|
|
|
|
Reply
|
Alain
|
2/4/2005 9:34:08 PM
|
|
Alain,
In the ntpd documentation for the -g option:
Normally, ntpd exits with a message to the system log if the offset
exceeds the panic threshold, which is 1000 s by default. This option
allows the time to be set to any value without restriction; however,
this can happen only once. If the threshold is exceeded after that, ntpd
will exit with a message to the system log. This option can be used with
the -q and -x options. See the tinker command for other options.
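The documented behaviour quoted above can be summarized as a small decision function. This is a simplified model in Python, not ntpd source code: the function name, its arguments and the `g_already_used` flag are assumptions introduced for illustration; only the 1000 s default threshold and the "only once" rule come from the documentation text.

```python
PANIC_THRESHOLD_S = 1000.0  # default panic threshold stated in the ntpd docs


def offset_is_accepted(offset_s, g_flag, g_already_used=False):
    """Model whether ntpd would correct the clock or exit.

    Returns True if the offset would be corrected, False if ntpd would
    exit with a message to the system log. Hypothetical helper, for
    illustration only.
    """
    if abs(offset_s) <= PANIC_THRESHOLD_S:
        return True
    # Beyond the threshold, only the first correction under -g is allowed.
    return g_flag and not g_already_used


if __name__ == "__main__":
    five_hours = 5 * 3600.0
    print(offset_is_accepted(five_hours, g_flag=False))
    print(offset_is_accepted(five_hours, g_flag=True))
    print(offset_is_accepted(five_hours, g_flag=True, g_already_used=True))
```

Under this model a clock that is many hours off is corrected once when -g is given, and the same offset without -g, or a second large offset later, causes an exit.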
Dave
Alain wrote:
> Answers to several messages:
>
> ----------
> David L. Mills wrote:
>
> > Alain,
> > No, don't use ntpdate at all. Use ntpd -g with the tos maxdist 16 if
> you want instant synchronization. There are several combinations of the
> tos commmand that could be used to modify behavior, such as the minsane
> and minclocks arguments. TO determine what they do you have to
> understand the icky algorithms; however, after studying the briefings at
> the project page and understanding how the algorithms work, it should be
> fairly obvious.
>
>
> The problem is that -g has a limit of 1000 seconds. My worst case
> scenario (I have seen it a lot) includes dead bateries and initial times
> of 1/1/1980. Second worst is UCT missconfiguration, many hours off. And
> not to forguet Summer Daylight savings time that is more than 1000s.
>
> Even worse is that it exits ntpd. This requires another deamon to check
> if it happened :(
>
> If ntpdate disapears, there should be *some* way of handling this.
>
> -----------
> Tom Smith wrote:
> >> So, if I manage to set the initial time within a good aproximation od
> >> "real" time using ntpdate using 5 servers as explained earlyer, would
> >> you recomend to delete the drift file and let it start all over again?
> >
> > If you do that, there should be no need to clean out the
> > characteristic drift that was so carefully computed over a long
> > period of time. Cleaning out the drift file is a drastic measure
> > required only if you do NOT pre-set the clock before starting ntpd
> > and you end up, as a result, with a bogus re-calculated drift rate.
>
> So I understand that testing drift and deleting it only if it is
> +-500.00 is the more generic aproach for for all situations?
>
> ----------
> Terje Mathisen wrote:
> > I had one of the my GPS-based stratum 1 server go bad on me a few years
> > ago, and found out the hard way that _one_ critical server in our
> > infrastructure had been configured to use this particular server as the
> > only ntpdate reference:
> >
> > It was rebooted during the time interval before the failing ntp/gps
> > server was located and turned off, with the result that this unix db
> > machine came up with a wildly wrong date.
> >
> > After ntpdate had run, ntpd was started, but even though it had been
> > configured with four server lines, only the temporarily broken server
> > used by ntpdate was sufficiently close to be accepted, all the others
> > were deemed falsetickers.
> >
> > A perfect ntpdate replacement needs to locate all or most of all the
> > configured servers, and run at least a couple of packets to each server,
> > and wait until a plurality have agreed on what the time is. This can be
> > handled in an absolute minimum of about 3-5 seconds without polling each
> > server more often than every two seconds.
> >
> > If you configure fewer servers, then you'd need more packets to each
> > server before a sufficient reach value could be achieved for the set of
> > servers that agree.
> >
> > The same goes in case of one or more falsetickers of course: They must
> > be detected and filtered out, which requires more packets to all the
> > servers than if they all agree.
>
> Behind this is a missconfigured and forgotten configuration. I believe
> the script to use many servers that are *in* ntp.conf would prvent that,
> specialy that these servers will probably be checked periodicaly with
> ntpq -p.
>
> -----------
> I believe than that for a more general situation:
> 1) ntpdate with many servers from ntp.conf (5 is reasonable)
> 2) check if drift is +-500.000, if so delete it.
> 3) Start ntpd
> 4) periodicaly check external (and internal) servers to see if a
> reasonable number of them are still there.
>
> Alain
|
|
0
|
|
|
|
Reply
|
David
|
2/4/2005 9:53:31 PM
|
|
David,
With genuine respect, I won't do that. I have enough trouble with
degraded eyesight and enlarged font as it is. The messages go from most
recent to oldest and I keep that order.
Dave
David Magda wrote:
> "David L. Mills" <mills@udel.edu> writes:
>
>
>>Hangups as you describe is exactly what the simulator is designed
>>to reveal and it has revealed them from time to time as folks learn
>>new ways to misconfigure and warp the hardware and new operating
>
>
> I know this is both a newsgroup and a mailing list, but would it be
> possible not to top post? :>
>
> Regards,
> David
|
|
0
|
|
|
|
Reply
|
David
|
2/4/2005 9:58:35 PM
|
|
Tom Smith wrote:
> Brad Knowles wrote:
>
>> At 4:21 AM +0000 2005-02-04, Tom Smith wrote:
>>
>>> # time ntpd -gq [13 servers in ntp.conf]
>>> ntpd: time slew -0.000373s
>>>
>>> real 1m43.03s
>>> user 0m0.10s
>>> sys 0m0.60s
>>
>>
>>
>> Yes, but what does that ntp.conf look like? Are you using
>> iburst? Any authentication? Manually setting minpoll and/or
>> maxpoll? Did you have a good drift file to start with?
>>
>> You need to provide some more specifics before you can make an
>> attempt to compare this to ntpdate.
>>
>>> # time ntpdate -b [three selected servers]
>>> 3 Feb 22:44:45 ntpdate[186032]: step time server [IP address]
>>> offset -0.000157 sec
>>
>>
>>
>> Not comparable. You need to include all thirteen servers before
>> this could potentially be considered comparable.
>>
>
> As you wish...
>
> # cat /etc/ntp.drift
> -2.653
>
> # ntpq -p
>  remote          refid          st t when poll reach    delay   offset  jitter
> ==============================================================================
>  LOCAL(1)        LOCAL(1)        5 l   41   64   377    0.000    0.000   0.004
>  [IP.255]        0.0.0.0        16 u    -   64     0    0.000    0.000 4000.00
> -[name]          .TRUE.          1 u  468 1024   377   12.741   -0.950   0.092
> -[name]          .WWVB.          1 u  470 1024   377    1.388    1.015   1.216
> -[name]          [name]          2 u  135  256   337    0.004   -0.490   0.440
> -[name]          .GPS.           1 u  997 1024   377   88.936    0.153   1.854
> +[name]          .GPS.           1 u  603 1024   377   88.146    0.546   0.119
> *[name]          .GPS.           1 u  764 1024   377   88.311    0.605   1.387
> -[name]          .GPS.           1 u  783 1024   377   73.649    0.766  12.499
> #[name]          [name]          2 u  652 1024   377   83.509   -0.371   0.725
> -[name]          .GPS.           1 u  720 1024   377   32.581    1.203   0.082
> +[name]          .GPS.           1 u  744 1024   377  105.719    0.604   1.952
> -[name]          .GPS.           1 u  686 1024   377   92.301    3.050   0.447
> #[name]          [name]          2 u  339 1024   176    0.796   -1.401   0.245
> #[name]          [name]          2 u  520 1024   376   10.376   -2.257   0.814
>  [name]          0.0.0.0        16 u    - 1024     0    0.000    0.000 4000.00
>
> # time ntpd -gq [above 14 servers/peers with iburst, plus local. No
> authentication. Default min/max poll. All appropriate for long-term
> time maintenance.]
> ntpd: time slew 0.000527s
>
> real 0m45.00s
> user 0m0.11s
> sys 0m0.55s
>
> [Note that ntpd -gq isn't really done when it unblocks. It has only
> initiated a slew. But we'll let that pass because the time is
> already way more than "good enough".]
>
> # ntpdate -b [same set of servers and peers, except no local clock. Way
> overconfigured for one-time "good enough" acquisition at boot.]
> 4 Feb 13:21:32 ntpdate[201766]: step time server [IP] offset 0.000647 sec
>
> real 0m6.73s
> user 0m0.00s
> sys 0m0.00s
>
> ________________________________________________________________________
> Tom Smith smith@alum.mit.edu,smith@cag.lkg.hp.com
> Hewlett-Packard Company Tel: +1 (603) 884-6329
> 110 Spit Brook Road ZKO1-3/H42 FAX: +1 (603) 884-6484
> Nashua, New Hampshire 03062-2698, USA Mobile: +1 978 397 3411
If you omit the "q" option, ntpd will simply set the clock and keep
running! If the error is greater than 128 milliseconds it will step the
clock, otherwise it will slew the clock. In either case you are up and
running with a clock error of less than 128 milliseconds. Is this not
good enough? If not, what is your requirement for accuracy?
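The step-versus-slew rule described in this message can be written down as a tiny function. This is only a sketch in Python of the rule as stated here; the function name and return strings are assumptions for illustration, and real ntpd behaviour depends on further options not modelled.

```python
STEP_THRESHOLD_S = 0.128  # 128 milliseconds, as stated in the message above


def correction_mode(offset_s):
    """Return "step" for offsets above the threshold, otherwise "slew"."""
    return "step" if abs(offset_s) > STEP_THRESHOLD_S else "slew"


if __name__ == "__main__":
    # Offset reported by the ntpd -gq run quoted above.
    print(correction_mode(0.000527))
    # A clock that is two seconds off.
    print(correction_mode(-2.0))
```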
|
|
0
|
|
|
|
Reply
|
Richard
|
2/4/2005 9:59:10 PM
|
|
Per,
Well, I found the step routine call; however, that code has grown so
weedy with intricate evil little OS-dependencies that I find it
unreadable. I haven't touched the ntpdate code since it first appeared
probably fifteen years ago. So far as I can see, if somebody opts out of
the step correction, ntpdate can leave a big offset for later ntpd to
chew on.
Dave
Per Hedeland wrote:
> In article <ctutfh$4fq$1@dewey.udel.edu> "David L. Mills"
> <mills@udel.edu> writes:
>
>>The code I see in the ntpdate source does an adjtime() for all offsets,
>>even large ones. I don't see a settimeofday() or equivalent.
>
>
> That's just because it calls step_systime() in libntp/systime.c for that
> - as it should, of course. Surely you didn't think the -b option was a
> no-op and the documentation full of lies... (Well, I'm looking at 4.2.0
> and assuming this functionality hasn't been removed in the development
> version...)
>
> --Per Hedeland
> per@hedeland.org
|
|
0
|
|
|
|
Reply
|
David
|
2/4/2005 10:41:35 PM
|
|
In article <cu0tn7$jmh$1@dewey.udel.edu> "David L. Mills"
<mills@udel.edu> writes:
>
>Well, I found the step routine call; however, that code has grown so
>weedy with intricate evil little OS-dependencies that I find it
>unreadable. I haven't touched the ntpdate code since it first appeared
>probably fifteen years ago. So far as I can see, if somebody opts out
>the step correction, ntpdate can leave a big offset for later ntpd to
>chew on.
Yes, and along these lines are the reasons for dropping ntpdate that you
have given in the past:
a) It's hideously complex
b) It does something that is similar to / a subset of what ntpd does (or
rather what it did many years ago), but not the same, which together
with a) makes it a pain to maintain
c) Its feature set gives the impression that it might be reasonable to
use *instead* of ntpd (e.g. running hourly from cron) - there's no
point having the slew modes otherwise
d) If widely used as in c), it's quite unfriendly to servers, when
gazillions of boxes send their 4 packets * N servers exactly on the
hour.
FWIW, I find them perfectly valid - I'm just still not quite happy with
the replacement.:-)
If your latest ntpd tweak knobs can really achieve the low-quality-but-
really-quick time setting that ntpdate provides, *and* the combination
of knob settings needed for that is codified into another ntpd option
(--impatient maybe?:-), I think ntpdate can finally be laid to rest.
Requiring a separate config file just for this boot-time setting, with
parameters and values that are even more esoteric to the average user,
is really a show stopper IMHO.
--Per Hedeland
per@hedeland.org
|
|
0
|
|
|
|
Reply
|
per
|
2/4/2005 11:34:34 PM
|
|
Hi all,
I made some tests with extremely bad initial clocks:
# date --set "01/01/00 00:00"
Sáb Jan 1 00:00:00 BRDT 2000
# ntpd -gq
ntpd: time reset 160871283.056670s
This works great, but takes 45 seconds (8 servers with iburst).
The -gq combination was not clear to me from the docs. Now that I know
about it I read them again and it is in fact there, but I had read them
many times and simply missed it.
ntpdate with 5 servers: 3 seconds
ntpdate with 8 servers: 4 seconds
It looks like ntpdate does most things in parallel.
Per Hedeland wrote:
> If your latest ntpd tweak knobs can really achieve the low-quality-but-
> really-quick time setting that ntpdate provides, *and* the combination
> of knob settings needed for that is codified into another ntpd option
> (--impatient maybe?:-), I think ntpdate can finally be laid to rest.
> Requiring a separate config file just for this boot-time setting, with
> parameters and values that are even more esoteric to the average user,
> is really a show stopper IMHO.
Agreed. Maybe the "tweak knobs" + "minpoll=1" (this looks similar to what
ntpdate uses) could be set on the command line, or be just temporary and
disabled after the initial time set. I hope that whatever replaces ntpdate
is approximately as fast (say, <5 s).
Thanks for your patience again,
Alain
|
|
0
|
|
|
|
Reply
|
Alain
|
2/5/2005 12:52:37 AM
|
|
At 8:02 PM +0000 2005-02-04, Per Hedeland wrote:
> As has been pointed out many times here, there's no need to have a LOCAL
> clock configured for that - ntpd will do it anyway. The only reason to
> have a LOCAL clock configured is if you need ntpd to serve time to
> others in this situation.
I described the behaviour that I have personally experienced.
No more, no less.
--
Brad Knowles, <brad@stop.mail-abuse.org>
"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin (1706-1790), reply of the Pennsylvania
Assembly to the Governor, November 11, 1755
SAGE member since 1995. See <http://www.sage.org/> for more info.
|
|
0
|
|
|
|
Reply
|
Brad
|
2/5/2005 1:12:04 AM
|
|
At 7:34 PM -0200 2005-02-04, Alain wrote:
> Is it intentional or a but that this list has no "Reply to:" attribute?
> The list just moved, is this simply a a missing configuration?
There is no Reply-to: header, nor will there be so long as I am
in charge of running the mail system and the mailing list server
system.
See <http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq03.048.htp>.
--
Brad Knowles, <brad@stop.mail-abuse.org>
"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin (1706-1790), reply of the Pennsylvania
Assembly to the Governor, November 11, 1755
SAGE member since 1995. See <http://www.sage.org/> for more info.
|
|
0
|
|
|
|
Reply
|
Brad
|
2/5/2005 1:22:03 AM
|
|
Per Hedeland wrote:
>In article <cu0tn7$jmh$1@dewey.udel.edu> "David L. Mills"
><mills@udel.edu> writes:
>
>
>>Well, I found the step routine call; however, that code has grown so
>>weedy with intricate evil little OS-dependencies that I find it
>>unreadable. I haven't touched the ntpdate code since it first appeared
>>probably fifteen years ago. So far as I can see, if somebody opts out
>>the step correction, ntpdate can leave a big offset for later ntpd to
>>chew on.
>>
>>
>
>Yes, and along these lines are the reasons for dropping ntpdate that you
>have given in the past:
>
>a) It's hideously complex
>b) It does something that is similar to / a subset of what ntpd does (or
> rather what it did many years ago), but not the same, which together
> with a) makes it a pain to maintain
>c) It's feature set gives the impression that it might be reasonable to
> use *instead* of ntpd (e.g. running hourly from cron) - there's no
> point having the slew modes otherwise
>d) If widely used as in c), it's quite unfriendly to servers, when
> gazillions of boxes send their 4 packets * N servers exactly on the
> hour.
>
>FWIW, I find them perfectly valid - I'm just still not quite happy with
>the replacement.:-)
>
>If your latest ntpd tweak knobs can really achieve the low-quality-but-
>really-quick time setting that ntpdate provides, *and* the combination
>of knob settings needed for that is codified into another ntpd option
>(--impatient maybe?:-), I think ntpdate can finally be laid to rest.
>Requiring a separate config file just for this boot-time setting, with
>parameters and values that are even more esoteric to the average user,
>is really a show stopper IMHO.
>
>--Per Hedeland
>per@hedeland.org
>
>
I think maybe what people are looking for, that ntpdate seems to do and
ntpd does not, is:
1. Get the time accurate to within X seconds quickly.
2. Set that time quickly. The presumption here is that we are in a
state where it's OK to step the clock; e.g. we are booting and nothing
has started yet that will be upset by a change in time. Slewing the
clock to correct an offset of, say, 127 milliseconds, to within, say, 5
milliseconds, will take about 250 seconds and that is unacceptable when
we need to be within 5 milliseconds of the correct time as quickly as
possible.
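The "about 250 seconds" figure in item 2 can be checked with a short calculation. The sketch below is Python and assumes a maximum slew rate of 500 ppm (0.5 ms of correction per second); that rate and the function name are assumptions stated for the example, not values taken from this message.

```python
MAX_SLEW_RATE = 500e-6  # assumed: 500 ppm, seconds corrected per second


def slew_time_s(initial_offset_s, target_offset_s, rate=MAX_SLEW_RATE):
    """Seconds needed to slew from the initial offset down to the target."""
    remaining_s = abs(initial_offset_s) - abs(target_offset_s)
    return max(remaining_s, 0.0) / rate


if __name__ == "__main__":
    # 127 ms down to 5 ms: 122 ms of correction at 0.5 ms per second.
    print(round(slew_time_s(0.127, 0.005)))  # 244
```

Under that assumption the result is 244 seconds, which agrees with the rough figure given above.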
The user may wish to specify the required accuracy, knowing that there
is a trade off with elapsed time to achieve that accuracy. How good
"good enough" must be will vary. Someone recording the time workers
started work and the time they stopped may be satisfied with plus/minus
one minute but it's 8:00AM local time and he needs it right now!
Someone else might really need plus/minus ten microseconds and be
willing and able to wait two or three hours to get it.
NTP may not be the right tool to meet some requirements but I think it
should be possible to satisfy a lot of people with something that has
capabilities similar to what I've outlined.
|
|
0
|
|
|
|
Reply
|
Richard
|
2/5/2005 1:55:55 AM
|
|
Alain,
I did the same thing as you, but ntpd -gq with 8 servers and iburst set
the clock in 8 seconds, not 45. This includes DNS. Did yours stall in
DNS? Each of the g, q and x option descriptions has a sentence that
mentions that the option can be used in conjunction with the other two.
ntpd polls the servers in parallel, but offsets the polls to avoid
bunching. Note that ntpd uses a two-second poll interval to avoid
violating the KoD rules. ntpdate uses one second, with the result that
up to half the polls can be ignored and (if configured; our servers are)
a KoD sent.
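The effect of the poll interval on time-to-first-set can be estimated with simple arithmetic. The Python sketch below is a rough lower bound only: it ignores DNS lookups, network round-trip time and the staggering between servers, and the function name is hypothetical. The packet count of 4 is the figure quoted for ntpdate earlier in this thread.

```python
def burst_duration_s(packets, interval_s):
    """Time from the first to the last packet sent to a single server."""
    if packets < 1:
        raise ValueError("at least one packet is required")
    return (packets - 1) * interval_s


if __name__ == "__main__":
    # Four packets at a one-second interval (ntpdate-style polling).
    print(burst_duration_s(4, 1.0))  # 3.0
    # Four packets at a two-second interval (ntpd-style polling).
    print(burst_duration_s(4, 2.0))  # 6.0
```

Because servers are polled in parallel, adding servers adds little to this figure, which is consistent with the measurements reported for 5 and 8 servers.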
The Corps is working on an SNTP replacement for ntpdate, presumably
compliant with the ID I mentioned earlier. I can't tell you how relieved
I will be when this comes to pass and evil ntpdate can finally be torched.
I'm really nervous about the kit of knobs now in ntpd, which may be ripe
for misinterpretation, misuse and misunderstood documentation. Every new
"feature" increases the complexity and fragility of the program and
consumes lots of my time in testing, documentation and mail
correspondence. The IETF task force now studying the specification issue
should take up the issue of a standard set of parameters selectable at
configuration time which would cater to whatever the community wants.
Dave
Alain wrote:
> Hi all,
>
> I made some tests with extreme bad initial clocks:
>
> # date --set "01/01/00 00:00"
> Sáb Jan 1 00:00:00 BRDT 2000
> # ntpd -gq
> ntpd: time reset 160871283.056670s
>
> This does works great, but takes 45 seconds (8 servers with iburst).
> This -gq combination was not clear in the docs, now that I know about it
> I read it again and in fact it is there, but I did read it many times
> and missed just something.
>
> ntpdate with 5 servers 3 seconds
> ntpdate with 8 servers 4 seconds
>
> It looks like ntpdate does most things in parelel.
>
> Per Hedeland wrote:
>
>> If your latest ntpd tweak knobs can really achieve the low-quality-but-
>> really-quick time setting that ntpdate provides, *and* the combination
>> of knob settings needed for that is codified into another ntpd option
>> (--impatient maybe?:-), I think ntpdate can finally be laid to rest.
>> Requiring a separate config file just for this boot-time setting, with
>> parameters and values that are even more esoteric to the average user,
>> is really a show stopper IMHO.
>
>
> Agreed. Maybe the "twek knobs" + "minpoll=1" (this look similar to what
> ntpdate uses) could be command line or just temporary and disabled after
> the initial time set. I hope that whatever replaces ntpdate is
> aproximately as fast (say like <5s)
>
> Thanks for your patience again,
> Alain
|
|
0
|
|
|
|
Reply
|
David
|
2/5/2005 5:07:05 PM
|
|
In article <mailman.52.1107569121.583.questions@lists.ntp.isc.org> Brad
Knowles <brad@stop.mail-abuse.org> writes:
>At 8:02 PM +0000 2005-02-04, Per Hedeland wrote:
>
>> As has been pointed out many times here, there's no need to have a LOCAL
>> clock configured for that - ntpd will do it anyway. The only reason to
>> have a LOCAL clock configured is if you need ntpd to serve time to
>> others in this situation.
>
> I described the behaviour that I have personally experienced.
>
> No more, no less.
Not so - the experience you described was that having a LOCAL clock
configured worked well. I don't dispute that, but what you actually
wrote was that this was something that was needed: "you can help make
recovery easier by configuring the LOCAL refclock" and (without LOCAL)
"it most likely won't be good and you will have to restart". Both of
those statements are blatantly incorrect, and the configuration of a
LOCAL clock will often cause serious problems, in particular when the
appropriate "fudge" is omitted.
--Per Hedeland
per@hedeland.org
|
|
0
|
|
|
|
Reply
|
per
|
2/5/2005 9:13:13 PM
|
|
Richard B. Gilbert wrote:
> Tom Smith wrote:
>
> > David L. Mills wrote:
> >
> >> Kenneth,
> >>
> >> This is the single most persistent issue in the engineering design
of
> >> NTP. There must be tradeoffs between security, robustenss,
accuracy
> >> and initial delay. In the current design compromise, a server is
> >> acceptable only after three/four rounds of messages and the
ensemble
> >> time is acceptable with at least one of possibly several
acceptable
> >> servers. With IBURST mode, takes takes 6-8 seconds.
> >>
> >> For better robustness use "tos minclock N", where the at least N
> >> (default 1) servers must be acceptable to set the clock. Tonight I
> >> put in a "tos maxdist M", where M is the distance threshold below
> >> which the server is acceptable. Set "tos maxdist 16" and the first
> >> sample received from any server will set the clock likety-split.
Of
> >> course, essentially all the mitigation algorithms using
> >> multiple-sample redundancy and multiple-server diversity are
> >> systematically defeated. You might as well use SNTP.
> >
> >
> > David,
> >
> > I know the subject has been workstations, but let's talk for a moment
> > about this religion as it concerns servers - like the ones that run
> > telephone companies, stock exchanges, and banks inside heavily
> > defended firewalls. It's the same issue, it's just that the stakes
> > are higher. The issue is how quickly can you get these
> > systems back up at boot. 15-30 seconds is a long time to wait.
> > Too long.
> >
> > We're not talking about one-shot sampling for maintaining the time,
> > so comparisons to SNTP are not helpful. We're talking about speed of
> > acquisition of an initial "good enough" time, keeping in mind that the
> > perfect is often the enemy of the good.
> >
> > You might argue that if boot time is critical, just let the server come
> > up with whatever random time it comes up with and let ntpd fix
> > it up later. Give it a "-g" so it doesn't complain. A lot of folks
> > have tried this in the past inadvertently (and continue to do so)
> > by neglecting to put ntpdate into their boot sequence ahead of ntpd.
> > I've fixed a lot of systems whose drift files were pinned
> > at 500 ppm and whose systems ran perpetually fast or slow as
> > a result. We've also spent a lot of money fruitlessly replacing
> > motherboards on those systems. Turning a large initial offset over
> > to ntpd is decidedly NOT a Good Idea.
> >
> > The reason why so many of your constituency keep bringing this
> > subject up is that they know that ntpd needs a good (not perfect)
> > estimate of the time before it starts and that critical systems
> > can't wait for perfection to get that estimate.
> >
> > -Tom
> >
> > ________________________________________________________________________
> > Tom Smith                     smith@alum.mit.edu,smith@cag.lkg.hp.com
> > Hewlett-Packard Company                         Tel: +1 (603) 884-6329
> > 110 Spit Brook Road ZKO1-3/H42                  FAX: +1 (603) 884-6484
> > Nashua, New Hampshire 03062-2698, USA        Mobile: +1 978 397 3411
> >
> Tom,
>
> I think it all boils down to how good is "good enough"? Your snail
> mail address suggests that you're in VMS Engineering or, if not, you
> could throw rocks at them! VMS, although it keeps time in units of 100
> nanosecond "ticks", only updates the clock every ten milliseconds!
> (Measure with micrometer, mark with chalk, cut with ax?)
> The documented and supported interfaces in VMS only permit you to set
> the clock and read the clock to the nearest ten milliseconds.
>

Actually Richard, if you look again he is using an lkg email address
and this is the networks group that recently moved from LKG to ZKO.
You shouldn't make any assumptions about which group he's talking
about since the Tru64 Unix group is also in ZKO. VMS Engineering
used to be in ZKO3 but I don't know if that's changed.

I've forgotten how the VMS APIs look as that's been a few years,
though I still have access to the doc sets. There's also likely
to be a difference between what you can do with a VAX as opposed to
an Alpha.

> If you are willing to have a server come up with a clock error of one
> second, just boot and start ntpd later. If you need to have time
> correct to the nearest microsecond, you are using the wrong tools.
>
> If you are, in fact, talking about VMS and TCP/IP services, porting the
> latest version of the NTP reference implementation would help you speed
> up the startup. The last time I looked, TCP/IP Services (V5.1) was
> using a port of NTP V3-5.91 which does not support the iburst
> qualifier. Iburst allows much faster initialization; it gets you a
> "good enough" time and frequency correction in about 8 seconds.
>

Jason has been doing that work for VMS. Tom sounds more like a
support person trying to deal with customers having problems like
this.

> If eight seconds is too long, you need to specify how quickly you need
> to acquire the correct time and how accurate the time must be. These
> two specifications pretty much determine the tools you must use to meet
> them; e.g. if you need time correct to +/- 50 nanoseconds and need to
> set it within 100 microseconds, you will almost certainly need to use a
> hardware reference clock such as a cesium or rubidium standard.

If you are a client trying to deal with mission-critical systems
I can well imagine a need to get the initial time accurate very
quickly, but then if it's that critical, a refclock is the proper
solution rather than relying on another system to provide it.
Don't forget you need the IP stack running first before you can
check externally for a time reference.

Danny
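
The acceptance rule described in the quoted "tos minclock N" and "tos maxdist M" text can be pictured as a simple predicate. What follows is only a sketch of the rule as worded in that quote, not ntpd's actual code; the function name, the default threshold of 1.0 and the sample distances are all assumptions made up for the illustration.

```python
def clock_may_be_set(distances, minclock=1, maxdist=1.0):
    """Return True when at least `minclock` servers are acceptable.

    A server counts as acceptable when its distance is below `maxdist`.
    `distances` holds one distance value (in seconds) per server.
    The default maxdist of 1.0 is an assumption of this sketch.
    """
    acceptable = [d for d in distances if d < maxdist]
    return len(acceptable) >= minclock


# Hypothetical distances after a single sample from three servers.
first_samples = [8.1, 8.0, 8.3]

print(clock_may_be_set(first_samples))                # strict threshold
print(clock_may_be_set(first_samples, maxdist=16.0))  # like "tos maxdist 16"
print(clock_may_be_set(first_samples, minclock=4, maxdist=16.0))
```

With a generous threshold every first sample qualifies, which is the trade-off the quote warns about: the clock gets set sooner, but without the redundancy checks.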
mayer | 2/5/2005 10:27:36 PM
At 9:13 PM +0000 2005-02-05, Per Hedeland wrote:
> I don't dispute that, but what you actually
> wrote was that this was something that was needed: "you can help make
> recovery easier by configuring the LOCAL refclock" and (without LOCAL)
> "it most likely won't be good and you will have to restart".
Which is precisely the behaviour that I experienced. Without a
LocalCLK, my machine would never recover from a loss of LAN
connectivity. With a LocalCLK, it would. Therefore, I came to the
conclusion that I described.
Now, what part of my own personal experience are you supposedly
not disputing?
--
Brad Knowles, <brad@stop.mail-abuse.org>
"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin (1706-1790), reply of the Pennsylvania
Assembly to the Governor, November 11, 1755
SAGE member since 1995. See <http://www.sage.org/> for more info.
Brad | 2/6/2005 1:01:35 AM
At 10:52 PM -0200 2005-02-04, Alain wrote:
> ntpdate with 5 servers 3 seconds
> ntpdate with 8 servers 4 seconds
>
> It looks like ntpdate does most things in parallel.
No, it does them in sequence. Witness:
# /usr/bin/time -p ntpdate -b 10.0.1.240 time.euro.apple.com \
    de.pool.ntp.org fr.pool.ntp.org nl.pool.ntp.org uk.pool.ntp.org \
    0.europe.pool.ntp.org 1.europe.pool.ntp.org 2.europe.pool.ntp.org \
    europe.pool.ntp.org
Password:
Looking for host 10.0.1.240 and service ntp
host found : 10.0.1.240
Looking for host time.euro.apple.com and service ntp
host found : interweb.euro.apple.com
Looking for host de.pool.ntp.org and service ntp
host found : ipx10540.ipxserver.de
Looking for host fr.pool.ntp.org and service ntp
host found : granny.lievin.net
Looking for host nl.pool.ntp.org and service ntp
host found : i157107.upc-i.chello.nl
Looking for host uk.pool.ntp.org and service ntp
host found : cheddar.halon.org.uk
Looking for host 0.europe.pool.ntp.org and service ntp
host found : time.as-computer.biz
Looking for host 1.europe.pool.ntp.org and service ntp
host found : change2linux.com
Looking for host 2.europe.pool.ntp.org and service ntp
host found : 62.101.81.203
Looking for host europe.pool.ntp.org and service ntp
host found : aszlig.net
6 Feb 04:05:36 ntpdate[28684]: sendto(10.0.1.240): Host is down
6 Feb 04:05:37 ntpdate[28684]: sendto(10.0.1.240): Host is down
6 Feb 04:05:38 ntpdate[28684]: sendto(10.0.1.240): Host is down
6 Feb 04:05:39 ntpdate[28684]: sendto(10.0.1.240): Host is down
6 Feb 04:05:40 ntpdate[28684]: step time server 193.201.200.139 offset -0.013175 sec
real 14.43
user 0.01
sys 0.04
And here's ntpd:
# /usr/bin/time -p ntpd -gq -f /var/run/ntp.drift -p /var/run/ntpd.pid
ntpd: time slew -0.009592s
real 16.07
user 0.01
sys 0.01
It just so happens that, in many cases, you can get those few
packets out to the servers and back quickly enough that ntpdate will
finish within a few seconds.
Steve has shown that ntpd will get up and running in about seven
seconds on his machine, whereas it's a bit slower on mine since some
of the servers I've selected in my ntp.conf are down/non-responsive.
I'm also on a slow ADSL line, I'm connected to that through an Apple
Airport Extreme base station on a wireless 802.11b network with WEP
encryption (which always slows down wireless network throughput), and
I'm running on an ancient PowerBook G3 laptop running MacOS X that is
being pushed fairly hard to just run the OS and keep a few programs
in memory, much less do anything useful.
Contrariwise, Steve is running on a Soekris 4801 single board
computer which is pretty much otherwise unloaded (except for ntpd),
running a fairly stripped version of FreeBSD, and I believe he's
connected to his cable modem line directly via 10/100 Base-T/TX
Ethernet as opposed to WLAN.
I don't know how much of my slowdown is due to which factors, but
I'm not too surprised.
--
Brad Knowles, <brad@stop.mail-abuse.org>
Brad | 2/6/2005 3:07:36 AM
Brad Knowles wrote:
> At 10:52 PM -0200 2005-02-04, Alain wrote:
>
>> ntpdate with 5 servers 3 seconds
>> ntpdate with 8 servers 4 seconds
>>
>> It looks like ntpdate does most things in parallel.
>
>
> No, it does them in sequence. Witness:
>
> # /usr/bin/time -p ntpdate -b 10.0.1.240 time.euro.apple.com
> de.pool.ntp.org fr.pool.ntp.org nl.pool.ntp.org uk.pool.ntp.org
> 0.europe.pool.ntp.org 1.europe.pool.ntp.org 2.europe.pool.ntp.org
> europe.pool.ntp.org
> Password:
> Looking for host 10.0.1.240 and service ntp
> host found : 10.0.1.240
> Looking for host time.euro.apple.com and service ntp
> host found : interweb.euro.apple.com
> Looking for host de.pool.ntp.org and service ntp
> host found : ipx10540.ipxserver.de
> Looking for host fr.pool.ntp.org and service ntp
> host found : granny.lievin.net
> Looking for host nl.pool.ntp.org and service ntp
> host found : i157107.upc-i.chello.nl
> Looking for host uk.pool.ntp.org and service ntp
> host found : cheddar.halon.org.uk
> Looking for host 0.europe.pool.ntp.org and service ntp
> host found : time.as-computer.biz
> Looking for host 1.europe.pool.ntp.org and service ntp
> host found : change2linux.com
> Looking for host 2.europe.pool.ntp.org and service ntp
> host found : 62.101.81.203
> Looking for host europe.pool.ntp.org and service ntp
> host found : aszlig.net
> 6 Feb 04:05:36 ntpdate[28684]: sendto(10.0.1.240): Host is down
> 6 Feb 04:05:37 ntpdate[28684]: sendto(10.0.1.240): Host is down
> 6 Feb 04:05:38 ntpdate[28684]: sendto(10.0.1.240): Host is down
> 6 Feb 04:05:39 ntpdate[28684]: sendto(10.0.1.240): Host is down
> 6 Feb 04:05:40 ntpdate[28684]: step time server 193.201.200.139 offset
> -0.013175 sec
> real 14.43
> user 0.01
> sys 0.04
>
> And here's ntpd:
>
> # /usr/bin/time -p ntpd -gq -f /var/run/ntp.drift -p /var/run/ntpd.pid
> ntpd: time slew -0.009592s
> real 16.07
> user 0.01
> sys 0.01
>
>
> It just so happens that, in many cases, you can get those few
> packets out to the servers and back quickly enough that ntpdate will
> finish within a few seconds.
>
> Steve has shown that ntpd will get up and running in about seven
> seconds on his machine, whereas it's a bit slower on mine since some of
> the servers I've selected in my ntp.conf are down/non-responsive. I'm
> also on a slow ADSL line, I'm connected to that through an Apple Airport
> Extreme base station on a wireless 802.11b network with WEP encryption
> (which always slows down wireless network throughput), and I'm running
> on an ancient PowerBook G3 laptop running MacOS X that is being pushed
> fairly hard to just run the OS and keep a few programs in memory, much
> less do anything useful.
>
> Contrariwise, Steve is running on a Soekris 4801 single board
> computer which is pretty much otherwise unloaded (except for ntpd),
> running a fairly stripped version of FreeBSD, and I believe he's
> connected to his cable modem line directly via 10/100 Base-T/TX Ethernet
> as opposed to WLAN.
>
> I don't know how much of my slow down is due to what factors, but
> I'm not too surprised.
>
Maybe a large part of your ntpdate time was printing out the messages.
There was some question earlier about whether DNS delays might
explain the lengthy time for ntpd. So I performed the same experiment
on the same system, which is its own DNS server, caching the
names first, and doing ntpdate before ntpd to make doubly sure.
There is no meaningful difference.
Starting with:
# cat /etc/ntp.drift
-2.300
# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 LOCAL(1)        LOCAL(1)         5 l    9   64  377    0.000    0.000   0.004
 [network].255   0.0.0.0         16 u    -   64    0    0.000    0.000 4000.00
-[name]          .TRUE.           1 u  752 1024  377   30.571    6.192   8.958
-[name]          .WWVB.           1 u  694 1024  377    0.176    2.184   1.237
-[name]          [name]           2 u  888  512  156    0.004    1.563   1.258
+[name]          .GPS.            1 u  671 1024  377   87.984    0.651   0.047
+[name*]         .GPS.            1 u  752 1024  377   88.089    0.590   0.055
*[name]          .GPS.            1 u  338 1024  377   87.950    0.649   0.014
-[name]          .GPS.            1 u  749 1024  377   73.684    0.285   0.227
#[name]          [name]           2 u  811 1024  377   83.563    0.551   0.681
-[name]          .GPS.            1 u  766 1024  377   32.532    1.301   1.149
-[name]          .GPS.            1 u  401 1024  377  105.854    0.586   0.048
-[name*]         .GPS.            1 u  771 1024  377   92.284    3.090   0.076
#[name]          [name]           2 u  106 1024  377    0.512    1.809   0.057
#[name*]         [name*]          2 u  463 1024  376    9.281    1.583   0.269
 [name]          0.0.0.0         16 u    - 1024    0    0.000    0.000 4000.00
[name*] = servers chosen for ntpdate/ntpd -gq
All are on an internal network, physical distances from feet to 3000 miles.
Delays from home across my own wireless onto the Internet to servers 3000
miles away are 93 milliseconds, so I wouldn't put much faith in that
as an important difference.
[shut down ntpd]
# time ntpdate -b [3 servers]
5 Feb 22:58:12 ntpdate[234803]: step time server [IP] offset -0.000136 sec
real 0m0.71s
user 0m0.00s
sys 0m0.00s
[you'll recall that with all the above servers, including the one
that's down, this still took only 6.73 seconds]
# time ntpd -gq -c ntp.boot [same 3 servers, iburst minpoll 4]
ntpd: time slew 0.001556s
real 0m41.01s
user 0m0.06s
sys 0m0.60s
# time ntpd -gq -c ntp.boot [same 3 servers, iburst minpoll 1]
ntpd: time slew 0.002653s
real 0m37.00s
user 0m0.10s
sys 0m0.56s
-Tom
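
To put the three "real" figures above side by side, here is a small Python sketch that converts the shell's time format to seconds and prints the ratios. The helper name is made up for the illustration, and it only handles the "NmS.SSs" form shown in this post.

```python
import re


def parse_time(value):
    """Convert a shell `time` value such as '0m41.01s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# "real" times copied from the three runs above.
ntpdate_real = parse_time("0m0.71s")
for label, real in [("minpoll 4", "0m41.01s"), ("minpoll 1", "0m37.00s")]:
    ratio = parse_time(real) / ntpdate_real
    print(f"ntpd -gq ({label}) took {ratio:.0f}x as long as ntpdate -b")
```

Both ratios come out above 50 for these particular runs; they say nothing about other networks or server sets.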
Tom | 2/6/2005 4:53:32 AM
At 4:53 AM +0000 2005-02-06, Tom Smith wrote:
> Maybe a large part of your ntpdate time was printing out the messages.
Could be. I doubt that could make it go from 0.7 seconds to 14
seconds, but it might have added a bit.
> There was some question earlier about whether DNS delays might
> explain the lengthy time for ntpd. So I performed the same experiment
> on the same system, which is its own DNS server, caching the
> names first, and doing ntpdate before ntpd to make doubly sure.
> There is no meaningful difference.
I ran ntpdate and ntpd multiple times myself, so as to make sure
that DNS caching was not an issue. I also didn't muck about with
minpoll or maxpoll, although I did use iburst. Still, my ntpd
execution wasn't very much longer than my ntpdate (which was slower
than your full one), and my ntpd startup was considerably faster than
yours.
At this point, all I'll say is that there are a lot of factors
involved, and if you try to set up the situation so as to be as
comparable as possible, ntpdate does not fare well. Moreover,
ntpdate has some nasty failure modes (which have been described by
others) if you don't give it enough servers to check against and/or
if some of them are down.
I know what you want to use it for.
You want a guaranteed less-than-one-second "good enough" answer
for doing any necessary large-scale changes to the clock; afterwards
you can start up various somewhat time-sensitive applications while
the system gets into the detailed long-term clock maintenance "in the
background".
Problem is, the things you can do in order to get the upper limit
down below one second are the same sorts of things which tend to give
you really nasty failure modes.
If you're *that* sensitive to time on startup, then you're
probably also sensitive to nasty failure modes.
I am not at all convinced that you can have your cake and eat it,
too -- past illusions of being able to do so with ntpdate
notwithstanding.
--
Brad Knowles, <brad@stop.mail-abuse.org>
Brad | 2/6/2005 5:59:36 AM
In article <vaKdnSNBMfU2upnfRVn-uQ@comcast.com> "Richard B. Gilbert"
<rgilbert88@comcast.net> writes:
>
>I think maybe what people are looking for, that ntpdate seems to do and
>ntpd does not, is:
>1. Get the time accurate to within X seconds quickly.
>2. Set that time quickly. The presumption here is that we are in a
>state where it's OK to step the clock; e.g. we are booting and nothing
>has started yet that will be upset by a change in time. Slewing the
>clock to correct an offset of, say, 127 milliseconds, to within, say, 5
>milliseconds, will take about 250 seconds and that is unacceptable when
>we need to be within 5 milliseconds of the correct time as quickly as
>possible.
Agreed.
>The user may wish to specify the required accuracy, knowing that there
>is a trade off with elapsed time to achieve that accuracy. How good
>"good enough" must be will vary. Someone recording the time workers
>started work and the time they stopped may be satisfied with plus/minus
>one minute but it's 8:00AM local time and he needs it right now!
>Someone else might really need plus/minus ten microseconds and be
>willing and able to wait two or three hours to get it.
This I think is way beyond "what people are looking for" in this
particular area - and I'm not sure it's very meaningful either: It makes
no sense to set the time with great accuracy and then just leave it,
since the clock will quickly drift away. Achieving and *maintaining*
great accuracy, by calculating and taking the drift into account, is
precisely what ntpd already does in its normal mode of operation, of
course.
The goal, whether achieved with a separate program or a special startup
mode of ntpd, is to "quickly" (5 seconds sounds like a reasonable upper
bound) set the clock "well enough" that ntpd in normal operation can
from that point on maintain it without steps. I'm intentionally using
"goal" rather than "requirement", because with a big clock drift and
lack of previous knowledge of it (i.e. drift file), the goal may not be
possible to achieve. In "normal circumstances" it should be no problem
at all though, and ntpdate does it easily.
--Per Hedeland
per@hedeland.org
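
One way to picture the goal described in this post: the initial correction is good enough once the remaining offset is small enough for ntpd to handle by slewing rather than stepping. The sketch below assumes ntpd's default step threshold of 128 ms, which is not stated in this thread; the function name is made up for the illustration.

```python
STEP_THRESHOLD_S = 0.128  # assumed ntpd default step threshold, in seconds


def correction_mode(offset_s, threshold_s=STEP_THRESHOLD_S):
    """Return 'step' if the offset exceeds the threshold, else 'slew'."""
    return "step" if abs(offset_s) > threshold_s else "slew"


# The first offset is the one reported by ntpdate earlier in the thread;
# the other two are illustrative values.
for offset in (-0.013175, 0.127, 2.5):
    print(f"{offset:+.6f} s -> {correction_mode(offset)}")
```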
per | 2/6/2005 10:35:36 AM
In article <mailman.57.1107659271.583.questions@lists.ntp.isc.org> Brad
Knowles <brad@stop.mail-abuse.org> writes:
>At 10:52 PM -0200 2005-02-04, Alain wrote:
>
>> ntpdate with 5 servers 3 seconds
>> ntpdate with 8 servers 4 seconds
>>
>> It looks like ntpdate does most things in parallel.
>
> No, it does them in sequence. Witness:
>
># /usr/bin/time -p ntpdate -b 10.0.1.240 time.euro.apple.com
>de.pool.ntp.org fr.pool.ntp.org nl.pool.ntp.org uk.pool.ntp.org
>0.europe.pool.ntp.org 1.europe.pool.ntp.org 2.europe.pool.ntp.org
>europe.pool.ntp.org
Well, that output only shows it doing the DNS lookups in sequence - it's
a bit of a pain to do it any other way given the synchronous nature of
gethostby*(). To see the actual queries, use -d - it will reveal that
there is parallelism "when needed":
$ time ntpdate -ud 10.1.1.6 10.1.1.17 de.pool.ntp.org fr.pool.ntp.org
[snip]
transmit(10.1.1.6)
transmit(10.1.1.17)
transmit(193.218.127.251)
receive(193.218.127.251)
transmit(193.218.127.251)
receive(193.218.127.251)
transmit(193.218.127.251)
receive(193.218.127.251)
transmit(193.218.127.251)
receive(193.218.127.251)
transmit(193.218.127.251)
transmit(81.56.134.142)
receive(81.56.134.142)
transmit(81.56.134.142)
receive(81.56.134.142)
transmit(81.56.134.142)
transmit(10.1.1.6)
receive(81.56.134.142)
transmit(81.56.134.142)
receive(81.56.134.142)
transmit(81.56.134.142)
transmit(10.1.1.17)
transmit(10.1.1.6)
transmit(10.1.1.17)
transmit(10.1.1.6)
transmit(10.1.1.17)
transmit(10.1.1.6)
transmit(10.1.1.17)
10.1.1.6: Server dropped: no data
10.1.1.17: Server dropped: no data
[snip]
0.017u 0.017s 0:04.64 0.4% 56+448k 0+0io 0pf+0w
I.e. even with the two unresponsive servers I intentionally gave it, it
finishes in less than 5 seconds (the retransmits to those servers are
done at 1-second intervals, but in parallel).
>real 14.43
>user 0.01
>sys 0.04
To get this kind of runtime, I think DNS lookup delays are the only
possible explanation. Is the DNS server local to your box, or behind the
slow network? Just out of curiosity - it doesn't help to have a local
server in the scenario where ntpdate is used, i.e. at boot, of course.
If I put an unreachable DNS server first in my resolv.conf, ntpdate's
runtime in the above case skyrockets to 45 seconds (OK, this is a
weakness:-). And while ntpd too still suffers from synchronous DNS
lookups done sequentially, the sequence of lookups is done in parallel
with its normal operation, i.e. it can at least get to work as soon as
the first answer has arrived.
--Per Hedeland
per@hedeland.org
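
The point about sequential lookups and parallel queries can be made concrete with a toy model: lookup delays add up, while query times overlap, so only the slowest query counts. This is a rough sketch for intuition only, not a description of ntpdate's internals; the function name and every number in it are hypothetical.

```python
def estimated_runtime(dns_delays_s, query_times_s):
    """Toy model: lookups add up, queries overlap.

    dns_delays_s  -- time spent resolving each name, one after another
    query_times_s -- time until each server answers or is given up on
    """
    return sum(dns_delays_s) + max(query_times_s, default=0.0)


# Hypothetical numbers: four fast lookups, two servers that never answer
# (given up on after about 4 s) and two that answer quickly.
fast_dns = estimated_runtime([0.05] * 4, [4.0, 4.0, 0.3, 0.3])

# Same servers, but every lookup first waits 10 s on a dead resolver.
slow_dns = estimated_runtime([10.05] * 4, [4.0, 4.0, 0.3, 0.3])

print(f"fast DNS: {fast_dns:.1f} s, slow DNS: {slow_dns:.1f} s")
```

In this model two dead NTP servers cost a few seconds in total, while one dead resolver costs its timeout once per name, which matches the shape of the numbers reported above.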
per | 2/6/2005 11:13:25 AM
Brad Knowles wrote:
> At this point, all I'll say is that there are a lot of factors
> involved, and if you try to set up the situation so as to be as
> comparable as possible, ntpdate does not fare well.
Maybe you missed the data showing identical conditions and
a greater than 50:1 difference between the two? One is two notes back,
in the note you replied to. There are two or three other previous posts
with detailed data showing the same thing.
> Moreover, ntpdate
> has some nasty failure modes (which have been described by others) if
> you don't give it enough servers to check against and/or if some of them
> are down.
>
My data included a down server. After all, ntpd has the same problems
if you don't give it enough servers or if they're down.
>
> I know what you want to use it for.
>
> You want a guaranteed less-than-one-second "good enough" answer for
> doing any necessary large-scale changes to the clock, afterwards you can
> start up various somewhat time-sensitive applications while the system
> can start getting into the detailed long-term clock maintenance "in the
> background".
>
Exactly. And ntpd itself is one of those time-sensitive applications
that happens sometimes to react badly to starting in the wrong place
(yielding drift rates pinned at +-500).
>
> Problem is, the things you can do in order to get the upper limit
> down below one second are the same sorts of things which tend to give
> you really nasty failure modes.
>
Yes, well, perhaps. I'll take the demonstrated and often seen failure
modes of not doing it over the theoretical ones, though.
> I am not at all convinced that you can have your cake and eat it,
> too -- Past illusions of being able to do so in the past with ntpdate
> not withstanding.
>
I guess it's all a matter of experience. I assure you I have very
few illusions left.
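
For anyone wanting to spot the "pinned at +-500" condition mentioned above, a drift file can be checked with a few lines of Python. This is a minimal sketch, assuming the file holds the frequency value in ppm as its first field (as in the "-2.300" example earlier in the thread); the function name and the 1 ppm margin are arbitrary choices for the illustration.

```python
def drift_status(drift_text, limit_ppm=500.0, margin_ppm=1.0):
    """Classify the frequency value stored in a drift file.

    drift_text is the file's content; the first field is taken to be the
    frequency offset in ppm. Values within margin_ppm of the limit are
    reported as 'pinned'. Both defaults are assumptions of this sketch.
    """
    value = float(drift_text.split()[0])
    if abs(value) >= limit_ppm - margin_ppm:
        return value, "pinned"
    return value, "ok"


print(drift_status("-2.300\n"))   # value from the ntp.drift shown earlier
print(drift_status("500.000\n"))  # hypothetical pinned value
```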
Tom | 2/6/2005 11:51:12 AM
In article <mailman.55.1107652713.583.questions@lists.ntp.isc.org> Brad
Knowles <brad@stop.mail-abuse.org> writes:
>At 9:13 PM +0000 2005-02-05, Per Hedeland wrote:
>
>> I don't dispute that, but what you actually
>> wrote was that this was something that was needed: "you can help make
>> recovery easier by configuring the LOCAL refclock" and (without LOCAL)
>> "it most likely won't be good and you will have to restart".
>
> Which is precisely the behaviour that I experienced. Without a
>LocalCLK, my machine would never recover from a loss of LAN
>connectivity. With a LocalCLK, it would. Therefore, I came to the
>conclusion that I described.
OK, I failed to realize that you were actually talking about your
experience with the wording "I'm not entirely certain what will happen,
but it most likely won't be good". Anyway, configuring a local clock
should not be needed for this purpose, and if it was in your setup,
maybe there is a bug somewhere.
--Per Hedeland
per@hedeland.org
per | 2/6/2005 11:51:26 AM
Brad Knowles wrote:
> At 4:53 AM +0000 2005-02-06, Tom Smith wrote:
>
>> Maybe a large part of your ntpdate time was printing out the messages.
>
>
> Could be. I doubt that could make it go from 0.7 seconds to 14
> seconds, but it might have added a bit.
> <snip>
>
> I know what you want to use it for.
>
> You want a guaranteed less-than-one-second "good enough" answer
> for doing any necessary large-scale changes to the clock, afterwards
> you can start up various somewhat time-sensitive applications while
> the system can start getting into the detailed long-term clock
> maintenance "in the background".
>
>
> Problem is, the things you can do in order to get the upper limit
> down below one second are the same sorts of things which tend to give
> you really nasty failure modes.
>
How is this different from setting the correct time by hand from your
cell phone or wrist watch? Doing it by hand is clumsy, slow, and, at
best, only accurate to within a second or so (unless your reflexes are
far faster than mine). What nasty failure modes would that induce?
What additional failure modes would doing it with ntpd induce?
I'm suggesting that ntpd query the usual suspects using iburst and then
unconditionally set (not slew) the clock. Assuming that you have a more
or less accurate drift file, and use it, why would that not give a fast
startup and a time accurate to within, say, twenty milliseconds? The
time budget would be something like eight seconds to send four queries,
get responses, and make initial time and frequency settings. Compared
with 214 seconds to remove 107 milliseconds of a 127 millisecond offset,
that looks pretty good when you are in a hurry to get your system back
online.
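
The 214-second figure above follows from simple arithmetic, sketched here in Python. It assumes a maximum slew rate of 500 ppm (0.5 ms of correction per second), which is what that figure implies; the function name is made up for the illustration.

```python
def slew_seconds(offset_ms, target_ms, slew_rate_ppm=500.0):
    """Seconds needed to slew from offset_ms down to target_ms.

    A rate of 1 ppm removes 1 microsecond of offset per second, so the
    time is the offset to remove (in microseconds) divided by the rate.
    """
    remove_us = (abs(offset_ms) - abs(target_ms)) * 1000.0
    return max(remove_us, 0.0) / slew_rate_ppm


print(slew_seconds(127, 20))  # 107 ms to remove
print(slew_seconds(127, 5))   # 122 ms to remove
```

The first call gives 214 seconds; the second gives 244 seconds, in line with the "about 250 seconds" mentioned elsewhere in the thread for reaching 5 milliseconds.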
Richard | 2/6/2005 1:46:47 PM
Per Hedeland wrote:
>In article <vaKdnSNBMfU2upnfRVn-uQ@comcast.com> "Richard B. Gilbert"
><rgilbert88@comcast.net> writes:
>
>
>>I think maybe what people are looking for, that ntpdate seems to do and
>>ntpd does not, is:
>>1. Get the time accurate to within X seconds quickly.
>>2. Set that time quickly. The presumption here is that we are in a
>>state where it's OK to step the clock; e.g. we are booting and nothing
>>has started yet that will be upset by a change in time. Slewing the
>>clock to correct an offset of, say, 127 milliseconds, to within, say, 5
>>milliseconds, will take about 250 seconds and that is unacceptable when
>>we need to be within 5 milliseconds of the correct time as quickly as
>>possible.
>>
>>
>
>Agreed.
>
>
>
>>The user may wish to specify the required accuracy, knowing that there
>>is a trade off with elapsed time to achieve that accuracy. How good
>>"good enough" must be will vary. Someone recording the time workers
>>started work and the time they stopped may be satisfied with plus/minus
>>one minute but it's 8:00AM local time and he needs it right now!
>>Someone else might really need plus/minus ten microseconds and be
>>willing and able to wait two or three hours to get it.
>>
>>
>
>This I think is way beyond "what people are looking for" in this
>particular area - and I'm not sure it's very meaningful either: It makes
>no sense to set the time with great accuracy and then just leave it,
>since the clock will quickly drift away. Achieving and *maintaining*
>great accuracy, by calculating and taking the drift into account, is
>precisely what ntpd already does in its normal mode of operation, of
>course.
>
>
I'm not suggesting that ntpd should set the clock and let it drift!
Having once set the clock, ntpd would resume normal operation with "good
enough" values of time and frequency. An additional minute or two would
be required to check the clock frequency. If it takes two minutes and
the frequency error is less than 500ppm, as it must be if ntpd is to
work at all, the clock could not drift more than sixty milliseconds in
two minutes.
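
The sixty-millisecond bound is just elapsed time multiplied by frequency error. A minimal Python sketch of that arithmetic follows; the function name is made up, and the 50 ppm value in the second call is only an illustrative number.

```python
def max_drift_ms(elapsed_s, freq_error_ppm):
    """Worst-case drift in milliseconds after elapsed_s seconds."""
    # 1 ppm is 1 microsecond of error per second of elapsed time.
    return elapsed_s * freq_error_ppm / 1000.0


print(max_drift_ms(120, 500))  # two minutes at the 500 ppm limit
print(max_drift_ms(120, 50))   # two minutes at a hypothetical 50 ppm error
```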
If a user elects to use "fast startup" he must understand and accept the
risks and the fact that it may still take anywhere from several minutes
to several hours to achieve the best synchronization the system is
capable of. I've seen "cold" startups, with the time set by ntpdate
and no drift file, take up to twelve hours to stabilize with offsets
below 500 microseconds. The offsets never exceeded twenty milliseconds
and decreased steadily.
>The goal, whether achieved with a separate program or a special startup
>mode of ntpd, is to "quickly" (5 seconds sounds like a reasonable upper
>bound) set the clock "well enough" that ntpd in normal operation can
>from that point on maintain it without steps. I'm intentionally using
>"goal" rather than "requirement", because with a big clock drift and
>lack of previous knowledge of it (i.e. drift file), the goal may not be
>possible to achieve. In "normal circumstances" it should be no problem
>at all though, and ntpdate does it easily.
>
>--Per Hedeland
>per@hedeland.org
>
>
Richard | 2/6/2005 2:08:39 PM
In article <ZLidneYGhOB1uZvfRVn-pA@comcast.com> "Richard B. Gilbert"
<rgilbert88@comcast.net> writes:
>Per Hedeland wrote:
>
>>In article <vaKdnSNBMfU2upnfRVn-uQ@comcast.com> "Richard B. Gilbert"
>><rgilbert88@comcast.net> writes:
>>
>>>The user may wish to specify the required accuracy, knowing that there
>>>is a trade off with elapsed time to achieve that accuracy. How good
>>>"good enough" must be will vary. Someone recording the time workers
>>>started work and the time they stopped may be satisfied with plus/minus
>>>one minute but it's 8:00AM local time and he needs it right now!
>>>Someone else might really need plus/minus ten microseconds and be
>>>willing and able to wait two or three hours to get it.
>>>
>>>
>>
>>This I think is way beyond "what people are looking for" in this
>>particular area - and I'm not sure it's very meaningful either: It makes
>>no sense to set the time with great accuracy and then just leave it,
>>since the clock will quickly drift away. Achieving and *maintaining*
>>great accuracy, by calculating and taking the drift into account, is
>>precisely what ntpd already does in its normal mode of operation, of
>>course.
>>
>>
>I'm not suggesting that ntpd should set the clock and let it drift!
>Having once set the clock, ntpd would resume normal operation with "good
>enough" values of time and frequency. An additional minute or two would
>be required to check the clock frequency. If it takes two minutes and
>the frequency error is less than 500ppm, as it must be if ntpd is to
>work at all, the clock could not drift more than sixty milliseconds in
>two minutes.
OK, I probably got confused by the "The user may wish to specify the
required accuracy" part - this made me think that you were talking about
a separate program (or possibly ntpd in "-q mode"), that would exit when
the required accuracy had been achieved. In the context of a special
startup mode for ntpd, with ntpd resuming normal operation before that
accuracy was achieved, I'm not sure I understand where this "required
accuracy" would fit in... - should ntpd somehow signal the environment
at the point when it was achieved?
--Per Hedeland
per@hedeland.org
per | 2/6/2005 4:22:05 PM
At 11:13 AM +0000 2005-02-06, Per Hedeland wrote:
> Well, that output only shows it doing the DNS lookups in sequence - it's
> a bit of a pain to do it any other way given the synchronous nature of
> gethostby*().
You are correct. It does the DNS queries in sequence, before it
does anything else. Even though we do not currently have an
asynchronous resolver built into the system, it would still be
possible for the program to try to process information about other
servers (perhaps only those that are specified by IP address), while
waiting for the answers regarding the servers it's trying to look up.
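
The first step of that idea, separating servers given as IP addresses from those that need a lookup, is easy to sketch. This is only an illustration in Python of the partitioning, not anything ntpdate does today; the function name is made up, and the server names are taken from the command line quoted earlier.

```python
import ipaddress


def split_servers(servers):
    """Separate literal IP addresses from names that need a DNS lookup."""
    literal, named = [], []
    for server in servers:
        try:
            ipaddress.ip_address(server)
        except ValueError:
            named.append(server)   # not an IP literal, must be resolved
        else:
            literal.append(server)
    return literal, named


literal, named = split_servers(
    ["10.0.1.240", "time.euro.apple.com", "de.pool.ntp.org"]
)
print("can be queried at once:", literal)
print("need a lookup first:", named)
```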
> To get this kind of runtime, I think DNS lookup delays is the only
> possible explanation. Is the DNS server local to your box, or behind the
> slow network? Just out of curiosity - it doesn't help to have a local
> server in the scenario where ntpdate is used, i.e. at boot, of course.
Using ntpdate exactly as I had done before, with a fresh local
nameserver (i.e., the nameserver was just started and no other
commands had been run which would have been likely to generate DNS
traffic), it took about 20.5 seconds to execute. Running the same
command again, immediately afterwards, took about 12.5 seconds.
Running it a third and a fourth time, it took about 8.5 and 9.9
seconds respectively. However, after the fourth execution, my system
locked up to the point where I couldn't shut down some processes and
I had to pull the plug.
Using ntpdate exactly as before, but using my ISP's DNS servers,
it took about 12 seconds to start up the first time, then settled
down to a pretty reliable 10.07 seconds. However, after the fourth
execution of ntpdate, my system locked up again.
Switching to testing ntpd, using my ISP's upstream nameservers, I
can't tell you how long it took the first time, because my system
clock was so screwed up by the previous recovery that it came up with
a date/time stamp in 1970, after which the execution of ntpd caused
it to reset its clock to 1934. Running ntpd a second time, it
executed in 10.07 seconds.
That's when I realized that the time was horribly off, so I
corrected the clock manually (to the second, set from my wristwatch),
and then re-ran ntpd, taking 64.18 seconds. The second and third
times executed in 10.07 seconds. The fourth time was 11.08 seconds,
the fifth was 9.08 seconds, and the sixth was 10.08 seconds.
Starting up my own local nameserver, the first execution of ntpd
took 15.07 seconds. All the subsequent executions of ntpd took
exactly 8.07 seconds.
Please note that all of my tests were done without a drift file.
I didn't discover that until after I had completed them all. With a
drift file, I imagine that the ntpd executions could well have
completed even faster.
Also note that I didn't muck with minpoll, and that the versions
of ntpdate and ntpd I'm working with are from 4.1.80-rc1@1.1111-r
which I had built and installed myself from a BitKeeper extract of
ntp-dev at a time that was a little before 4.2.0 was officially
released. So, the new "tos" option that Dr. Mills has added was not
available to me, and I'm not testing from the latest code.
Finally, the earlier tests I posted about were all using an
/etc/resolv.conf which pointed first to a server on my local LAN that
has actually been down for days, and the tests I've run here are more
reflective of what you should expect to see in the real world, with
properly operating nameservers.
There is a slight advantage to having a nameserver running
locally to the same system running ntpd, but not a whole lot.
Obviously, you don't want to point your resolv.conf to
non-operational nameservers, but with ntpd, this shouldn't kill you.
So long as the nameservers are reasonably "close" to your ntpd
server, and are not excessively overloaded, the additional delay
should be relatively minimal. And caching effects resulting from a
larger central nameserver can have a big impact over a de-populated
nameserver running on the same machine.
However, I still see no advantage to using ntpdate over ntpd --
correcting for DNS caching issues, I did not see any ntpdate runs
that were faster than the corresponding ntpd startups. While I'm not
100% convinced that my lockup problems are the fault of ntpdate (they
could be an OS fault, or a problem with my particular machine), it is
clear that ntpdate causes problems in this area for me that ntpd does
not.
As I said before, there are lots of factors at work here.
--
Brad Knowles, <brad@stop.mail-abuse.org>
"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin (1706-1790), reply of the Pennsylvania
Assembly to the Governor, November 11, 1755
SAGE member since 1995. See <http://www.sage.org/> for more info.
|
|
0
|
|
|
|
Reply
|
Brad
|
2/6/2005 6:49:54 PM
|
|
At 11:51 AM +0000 2005-02-06, Tom Smith wrote:
> Maybe you missed the data showing identical conditions and
> a greater than 50:1 difference between the 2? One is 2 notes back.
> The note you replied to. There are 2 or 3 other previous posts with
> detailed data showing the same thing.
Those could be reasonably explained by DNS caching effects. When
comparing the performance of ntpdate to ntpd, you need to compensate
for that.
> Including a down server. ntpd has the same problems if you don't
> give it enough servers if they're down, after all.
Right, but if you feed ntpdate only one server, or only three
servers (as cut down from your "large" ntp.conf), in order to try to
get it to start up that vitally critical few seconds faster, and that
server is down (or one or more of those servers is down), you could
well be seriously toasted.
This is a case which ntpd handles better than ntpdate, given
suitably large numbers of servers to each.
> Yes, well, perhaps. I'll take the demonstrated and often seen failure
> modes of not doing it over the theoretical ones, though.
There's a limit to what we can do when comparing the
dain-bramaged use of simple tools with which people have in the past
shot themselves in the foot. The best we can do is to try to improve the
tools in the future, so as to try to make it more difficult for
people to shoot themselves in the foot.
The problem is that while we're trying to improve the overall
performance of the tools as-used in the field (and help protect
people from their own stupidity), you're asking us to give you more
thermonuclear foot-shooting leeway, because you believe that you know
how better to deal with this problem than Dr. Mills.
I am not convinced that these are design goals that can be made
to be compatible.
--
Brad Knowles, <brad@stop.mail-abuse.org>
|
|
0
|
|
|
|
Reply
|
Brad
|
2/6/2005 6:58:59 PM
|
|
At 11:51 AM +0000 2005-02-06, Per Hedeland wrote:
> Anyway, configuring a local clock
> should not be needed for this purpose, and if it was in your setup,
> maybe there is a bug somewhere.
I will concede that I am running a somewhat old version of the
code, and I may be experiencing problems which have already been
corrected in more recent versions, or I may be having problems on my
system which are unique to the OS or perhaps even unique to my
particular machine.
But I described the situation as I have experienced it.
--
Brad Knowles, <brad@stop.mail-abuse.org>
|
|
0
|
|
|
|
Reply
|
Brad
|
2/6/2005 7:00:36 PM
|
|
At 8:46 AM -0500 2005-02-06, Richard B. Gilbert wrote:
> I'm suggesting that ntpd query the usual suspects using iburst and
> then unconditionally set (not slew) the clock.
I will allow that perhaps there should be a startup mode where
this behaviour is used, but I do not believe that this should be the
default.
If nothing else, we have the 34 year problem whereby you could
easily have your clock mis-set to a completely inappropriate value,
if it's not set closely enough on startup.
> Assuming that you have
> a more or less accurate drift file, and use it, why would that not give
> a fast startup and a time accurate to within, say, twenty milliseconds?
> The time budget would be something like eight seconds to send four
> queries, get responses, and make initial time and frequency settings.
> Compared with 214 seconds to remove 107 milliseconds of a 127
> millisecond offset, that looks pretty good when you are in a hurry to
> get your system back on line.
I don't think that setting a hard eight second limit would be
wise. If nothing else, there are potentially serious delays that can
be caused while waiting for DNS queries to be answered.
If your primary nameserver is located on the same machine as your
ntpd server, there could be very nasty name resolution deadlock
issues which might result, if your startup sequence chooses to start
named after ntpd, instead of the reverse. Throw DNSSEC into the mix,
and you could be in for a very serious world of hurt.
If all goes well, even without a good local Stratum 1 timeserver,
and without a good drift file, you can get pretty much full startup
of ntpd in about eight seconds.
But then all hell could break loose if things don't go well, and
tying yourself to a hard eight second time limit would be about the
worst possible thing you could do under those circumstances.
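As an aside, the 214-second figure quoted above follows directly from the slew-rate limit. The sketch below assumes the 500ppm maximum slew rate mentioned elsewhere in this thread; the function name is invented for illustration.

```python
MAX_SLEW_PPM = 500.0  # assumed slew-rate limit, taken from the 500ppm figure in this thread


def slew_seconds(offset_seconds, slew_ppm=MAX_SLEW_PPM):
    """Time needed to remove an offset by slewing at a constant rate."""
    return abs(offset_seconds) / (slew_ppm * 1e-6)


# Bring a 127 ms offset down to a 20 ms residual, i.e. remove 107 ms.
print(round(slew_seconds(0.127 - 0.020)))  # prints 214
```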
--
Brad Knowles, <brad@stop.mail-abuse.org>
|
|
0
|
|
|
|
Reply
|
Brad
|
2/6/2005 7:08:16 PM
|
|
Brad Knowles wrote:
> At 11:51 AM +0000 2005-02-06, Tom Smith wrote:
>
>> Maybe you missed the data showing identical conditions and
>> a greater than 50:1 difference between the 2? One is 2 notes back.
>> The note you replied to. There are 2 or 3 other previous posts with
>> detailed data showing the same thing.
>
> Those could be reasonably explained by DNS caching effects. When
> comparing the performance of ntpdate to ntpd, you need to compensate for
> that.
>
Yes, that's right. By removing DNS from the equation, the
experiment clearly shows the difference between the two. When the
experiment is instead constructed to make both dependent on the
same DNS delays, it's really not very surprising that both look
similar. When designing an experiment to compare X and Y,
it's often quite useful to eliminate things that aren't X and Y.
I think what your experiment actually showed was that if you
make both dependent on DNS timeouts, you can make ntpdate as
slow as ntpd at this task.
A lot of folks don't depend on DNS in the first place and
place critical servers in /etc/hosts (or in very secure
environments use only /etc/hosts). With respect to ntpd,
a lot of folks use IP addresses in ntp.conf instead of
names if there is any doubt about DNS server availability.
>> Including a down server. ntpd has the same problems if you don't
>> give it enough servers if they're down, after all.
>
> Right, but if you feed ntpdate only one server, or only three
> servers (as cut down from your "large" ntp.conf), in order to try to get
> it to start up that vitally critical few seconds faster, and that server
> is down (or one or more of those servers is down), you could well be
> seriously toasted.
>
> This is a case which ntpd handles better than ntpdate, given
> suitably large numbers of servers to each.
>
And if you feed ntpd only one server, or only three servers, and
one or more of those servers is down, you can just as well
be just as seriously toasted with ntpd. Dumb is dumb whether
you're using a hammer or a screwdriver. How you choose boot time
servers is a consideration that needs to be made carefully no
matter which method you use. Too few, too many, or subject to
single points of failure (e.g. site power failures) are all bad
choices.
The data in fact show that ntpdate in fact made adjustments closer
to zero to an already stable time than ntpd in all but one case. Not
that I attribute that to anything other than coincidence or consider
a difference of +- 3 milliseconds of any significance at all for
the stated purpose.
>> Yes, well, perhaps. I'll take the demonstrated and often seen failure
>> modes of not doing it over the theoretical ones, though.
>
> There's a limit to what we can do when comparing the dain-bramaged
> use of simple tools with which people have in the past shot themselves
> in the foot. The best we can do is to try to improve the tools in the future,
> so as to try to make it more difficult for people to shoot themselves in
> the foot.
>
> The problem is that while we're trying to improve the overall
> performance of the tools as-used in the field (and help protect people
> from their own stupidity), you're asking us to give you more
> thermonuclear foot-shooting leeway, because you believe that you know
> how better to deal with this problem than Dr. Mills.
What I believe is that you should let Dave speak for himself and let
carefully chosen and presented data speak for you. Like Dave, I prefer
to base opinions on actual data, and, like Dave, I tend to place more
faith in opinions similarly supported.
>
> I am not convinced that these are design goals that can be made to
> be compatible.
>
Oh ye of little faith.
|
|
0
|
|
|
|
Reply
|
Tom
|
2/6/2005 8:01:16 PM
|
|
At 8:08 PM +0100 2005-02-06, Brad Knowles quoted "Richard B. Gilbert"
<rgilbert88@comcast.net>:
>> The time budget would be something like eight seconds to send four
>> queries, get responses, and make initial time and frequency settings.
>> Compared with 214 seconds to remove 107 milliseconds of a 127
>> millisecond offset, that looks pretty good when you are in a hurry to
>> get your system back on line.
>
> I don't think that setting a hard eight second limit would be wise.
> If nothing else, there are potentially serious delays that can be caused
> while waiting for DNS queries to be answered.
Thinking about this a bit more, there are factors which influence
the startup time for ntpd (and ntpdate) which are beyond our control.
The best I think we could do would be to identify a lower bound on
the accuracy/precision that you would be willing to accept, and an
upper bound on the amount of time you'd like to be spent trying to
achieve that.
It's easy enough to figure out that if you reach the
accuracy/precision lower boundary before you reach the time limit,
you could either go ahead and set the time, or continue to try to get
more accuracy/precision up until you do reach the time limit. That's
easy.
The hard part is figuring out what to do if you reach the time
limit before you reach the accuracy/precision limit. Do you take the
risk of going ahead and setting the clock to a value which could turn
out to be catastrophic? Or do you wait until you have reached the
specified accuracy/precision limit?
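The two cases can be written down as a small decision function. This is only a sketch of the trade-off described here, not anything ntpd implements; all of the names and thresholds are hypothetical, and the hard case is deliberately left as an explicit "policy decision" rather than answered.

```python
from enum import Enum


class Action(Enum):
    SET_CLOCK = "set the clock now"
    KEEP_REFINING = "keep polling for a better estimate"
    POLICY_DECISION = "time limit reached before accuracy target; policy must decide"


def startup_action(est_error_s, elapsed_s, max_error_s, max_wait_s,
                   refine_until_deadline=False):
    """Pick the next step from the estimated error and the elapsed time.

    max_error_s is the worst accuracy the operator will accept;
    max_wait_s is the longest the operator is willing to wait.
    """
    accurate = est_error_s <= max_error_s
    timed_out = elapsed_s >= max_wait_s
    if accurate:
        # Easy case: either set now, or use the remaining time to improve.
        if refine_until_deadline and not timed_out:
            return Action.KEEP_REFINING
        return Action.SET_CLOCK
    if timed_out:
        # Hard case: risk a bad clock setting, or keep waiting past the limit.
        return Action.POLICY_DECISION
    return Action.KEEP_REFINING
```

For example, with a 20 ms target and an 8 second limit, startup_action(0.05, 9, 0.02, 8) lands in the hard case.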
I think that ntpd has achieved the best overall balance between
these two issues that it could reasonably do so far, and while I
think we may be able to tune this further (perhaps by using an async
resolver, among other things), I don't know how much more improvement
we can make in this area.
The introduction of the new "tos" control by Dr. Mills will give
you more thermonuclear foot-shooting room. However, I am not at all
convinced that it should actually be used by applications where those
few additional seconds of startup time are critical, for the reasons
previously discussed -- if you're that sensitive to startup time,
then you're almost certainly also more sensitive to time
accuracy/precision, and you're in precisely the situation where you
should avoid scratching that itch.
--
Brad Knowles, <brad@stop.mail-abuse.org>
|
|
0
|
|
|
|
Reply
|
Brad
|
2/6/2005 8:12:37 PM
|
|
At 3:01 PM -0500 2005-02-06, Tom Smith wrote:
> I think what your experiment actually showed was that if you
> make both dependent on DNS timeouts, you can make ntpdate as
> slow as ntpd at this task.
No, my results clearly show that ntpdate is roughly as slow as (or
slower than) ntpd on startup for comparable sets of servers,
independent of DNS slowdowns.
> A lot of folks don't depend on DNS in the first place and
> place critical servers in /etc/hosts (or in very secure
> environments use only /etc/hosts). With respect to ntpd,
> a lot of folks use IP addresses in ntp.conf instead of
> names if there is any doubt about DNS server availability.
As we know, IP addresses of servers can change. And when you're
talking about services like pool.ntp.org, since you're using a DNS
round-robin "rotor", the IP addresses are supposed to change on every
query.
So, just using IP addresses alone does not work in the general
case. Indeed, with the recent changes in the Debian, Gentoo,
OpenBSD, NetBSD, and FreeBSD camps, I would submit that there are
probably now more people who are dependent on using pool.ntp.org than
have ever previously hand-coded their own ntp.conf and plugged in
servers listed primarily by IP address or specified in their
/etc/hosts. Then add to that all the MacOS X clients that are using
a time server provided by Apple, and the Windows clients that are
using a time server provided by Microsoft, and you push out the
numbers of DNS-dependent clients much, much further.
If you want to compare carefully created hand-crafted
configurations to anything else, you can show anything you want.
That's a clear case of rigging the jury.
> And if you feed ntpd only one server, or only three servers, and
> one or more of those servers is down, you can just as well
> be just as seriously toasted with ntpd. Dumb is dumb whether
> you're using a hammer or a screwdriver.
If you want to make a tool analogy, try the model of a Yankee
screwdriver as compared to a regular one, or a power model. Anyone
who has ever used a Yankee screwdriver knows that they can be
powerful and faster than a regular model, but they are also much more
likely to seriously injure you than either a regular screwdriver or
an electric one, exclusively because of the inherent design
differences.
Experience teaches us that ntpdate is far more likely to be
abused in stupid ways than ntpd, although stupidity with either can
be fatal.
> The data in fact show that ntpdate made adjustments closer
> to zero to an already stable time than ntpd in all but one case.
No, the data doesn't show that. Your data is clearly different
from my data, and I've gone to significant lengths to try to make the
comparisons as clear and simple as possible.
> What I believe is that you should let Dave speak for himself and let
> carefully chosen and presented data speak for you. Like Dave, I prefer
> to base opinions on actual data, and, like Dave, I tend to place more
> faith in opinions similarly supported.
Well, we've got some actual data here, and I don't see any
practical advantage to using ntpdate over ntpd.
>> I am not convinced that these are design goals that can be made
>> to be compatible.
>
> Oh ye of little faith.
You're always welcome to step up to the plate and contribute code
which proves your claims. This is an open source project, after all.
--
Brad Knowles, <brad@stop.mail-abuse.org>
|
|
0
|
|
|
|
Reply
|
Brad
|
2/6/2005 8:29:51 PM
|
|
Abandoning the right to remain silent, Brad Knowles at Sun, 06 Feb 2005
19:49:54 +0100 said:
> At 11:13 AM +0000 2005-02-06, Per Hedeland wrote:
>
>> Well, that output only shows it doing the DNS lookups in sequence - it's
>> a bit of a pain to do it any other way given the synchronous nature of
>> gethostby*().
>
I tried it by doing a 'ntpq -n -c pe' and putting the resulting 16 IPs in
"ntpdate -d ..."
The first run took ~19 secs, most of which was *reverse* lookups of all
the IPs.
The following 3 runs each took between 3 and 4 secs.
I do run a nameserver on the same box. Apart from caching responses, this
smooths out response from dead nameservers because named caches the
response time and tries the relevant NS with the best response time.
--
Avoid reality at all costs.
$email =~ s/n(.)a(.)n(.)a(.)e(.+)invalid/$1$2$3$4$5au/;
icbm: 33.43.46S 150.59.27E
|
|
0
|
|
|
|
Reply
|
You
|
2/8/2005 10:05:51 AM
|
|
At 10:05 AM +0000 2005-02-08, You have no need to know wrote:
> I tried it by doing a 'ntpq -n -c pe' and putting the resulting 16 IPs in
> "ntpdate -d ..."
>
> The first run took ~19 secs, most of which was *reverse* lookups of all
> the IPs.
>
> The following 3 runs each took between 3 and 4 secs.
Listing four servers by IP address and making use of iburst,
Steve has demonstrated that you can reliably get initial startup of
ntpd in seven seconds. Listing six servers by IP address and using
iburst, along with Dr. Mills' recommendation of "tos minclock 4
minsane 4", Steve has shown that you can get startup in eleven
seconds.
In my own testing, I've demonstrated that listing nine servers by
host name (not IP address), with a good local nameserver running on
the same machine, along with using iburst, you can get startup in
eight seconds once the DNS cache is primed, which worsens to twenty
seconds when the DNS cache is empty.
But if seven seconds still isn't fast enough, see
<https://ntp.isc.org/bin/view/Support/StartingNTP4#Section_6.1.4.4.>.
As far as I'm concerned, seven to eight seconds is more than fast
enough for all situations I can think of, including the financial
industry where you might be losing $50 million per second of
downtime, or military applications.
However, if you absolutely, positively, must have initial ntpd
startup in less time than that, otherwise you will die the most
horrible and painful possible death, other options are documented
that can get your startup down as low as three seconds.
If that's not fast enough for you, then you might as well give up hope.
--
Brad Knowles, <brad@stop.mail-abuse.org>
|
|
0
|
|
|
|
Reply
|
Brad
|
2/8/2005 10:23:20 AM
|
|
Tom Smith wrote:
> [snip]
> # time ntpdate -b [3 servers]
> 5 Feb 22:58:12 ntpdate[234803]: step time server [IP] offset -0.000136 sec
>
> real 0m0.71s [you'll recall that with all the above servers, including
> user 0m0.00s the one that's down, this still took only 6.73 seconds]
> sys 0m0.00s
>
> # time ntpd -gq -c ntp.boot [same 3 servers, iburst minpoll 4]
> ntpd: time slew 0.001556s
>
> real 0m41.01s
> user 0m0.06s
> sys 0m0.60s
>
> # time ntpd -gq -c ntp.boot [same 3 servers, iburst minpoll 1]
> ntpd: time slew 0.002653s
>
> real 0m37.00s
> user 0m0.10s
> sys 0m0.56s
As it turns out, this happened to have been using V4.1.1. The 4.2.0
build on this system was defective and had been moved out of the way.
Retested identically with 4.2.0:
# cat /etc/ntp.drift
-2.157
# time ntpdate -b [3 servers]
Looking for host [name] and service ntp
host found : [name]
Looking for host [name] and service ntp
host found : [name]
Looking for host [name] and service ntp
host found : [name]
8 Feb 16:47:53 ntpdate[488600]: step time server [IP] offset -0.000395 sec
real 0m0.71s
user 0m0.00s
sys 0m0.01s
# time ntpd -gq -c ntp.boot [same 3 servers, iburst minpoll 1]
ntpd: time slew -0.000711s
real 0m7.13s
user 0m0.00s
sys 0m0.03s
So, without DNS in the way, 4.2.0 gets it from a 50:1 difference to a
10:1 difference. A big improvement, to be sure, but this is only with
3 servers (none of them the local clock, of course).
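For reference, the ratios can be recomputed from the "real" times shown in the transcripts above. The snippet only redoes that division; the dictionary labels are shorthand for the three ntpd runs and are not from the original output.

```python
NTPDATE_REAL = 0.71  # seconds, "real" time of the ntpdate -b run

ntpd_real = {
    "ntpd 4.1.1, minpoll 4": 41.01,
    "ntpd 4.1.1, minpoll 1": 37.00,
    "ntpd 4.2.0, minpoll 1": 7.13,
}


def speed_ratio(slow_seconds, fast_seconds):
    """How many times longer the slow run took than the fast one."""
    return slow_seconds / fast_seconds


for label, seconds in ntpd_real.items():
    print(f"{label}: {speed_ratio(seconds, NTPDATE_REAL):.1f}:1")
# ntpd 4.1.1, minpoll 4: 57.8:1
# ntpd 4.1.1, minpoll 1: 52.1:1
# ntpd 4.2.0, minpoll 1: 10.0:1
```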
-Tom
|
|
0
|
|
|
|
Reply
|
Tom
|
2/8/2005 10:09:06 PM
|
|
At 5:09 PM -0500 2005-02-08, Tom Smith wrote:
> As it turns out, this happened to have been using V4.1.1. The 4.2.0
> build on this system was defective and had been moved out of the way.
As Dr. Mills has said, the code has been significantly improved
since then. Even the current ntp-dev is significantly improved over
4.2.0-REL. Try building from the latest snapshot tarball. That's
what Steve and I have been doing our most recent testing with.
> So, without DNS in the way, 4.2.0 gets it from a 50:1 difference to a
> 10:1 difference. A big improvement, to be sure, but this is only with
> 3 servers (none of them the local clock, of course).
To the best of my knowledge, all of our collected experience so
far is detailed at
<https://ntp.isc.org/bin/view/Support/StartingNTP4>, with the issues
regarding sensitivity to startup delays in section 6.1.4.4.
In short, with a reasonable number of servers (i.e., six to
nine), with a reasonable /etc/ntp.conf (making use of iburst, "tos
minclock 4 minsane 4", etc...), using NTP servers by hostname instead
of IP address, making sure that the NTP servers are "good" as well as
close by (within 20-50ms delay), a good nameserver running on the
local machine, etc... you should be able to see startup times on the
order of fifteen seconds. I think that's perfectly reasonable for
virtually all situations, including high-value financial businesses
(including those which would lose $50 million per second of downtime)
and military applications.
However, if you absolutely positively have to get that down
further, then listing six servers by IP address and not name, you
should be able to see startup times around eleven seconds. Removing
the "tos minclock 4 minsane 4" and using four servers listed by IP
address, you should be able to get down to seven seconds.
If you must cut that down further, you can make use of "tos
maxdist 16", and get it all the way down to three seconds, but
further reductions in the number of servers won't help you, nor will
anything else we've tried. However, if you do that, then I believe
that you will get what you deserve.
Also note that the lowest minpoll the code allows is four. If
you try to go below that, it silently limits you to this floor.
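To make the variants above concrete, here is a small Python sketch that renders the corresponding ntp.conf text. The directives ("server ... iburst", "tos minclock 4 minsane 4", "tos maxdist 16") are the ones named in this thread; the function name, the /etc/ntp.drift path as a default, and the 192.0.2.x addresses (a documentation-only range) are placeholders, so substitute your own servers.

```python
def render_ntp_conf(servers, tos=None, iburst=True, driftfile="/etc/ntp.drift"):
    """Render a minimal ntp.conf as text.

    servers: hostnames or IP addresses, one "server" line each.
    tos:     optional dict of tos options, e.g. {"minclock": 4, "minsane": 4}.
    """
    lines = [f"driftfile {driftfile}"]
    if tos:
        options = " ".join(f"{key} {value}" for key, value in tos.items())
        lines.append(f"tos {options}")
    for server in servers:
        lines.append(f"server {server} iburst" if iburst else f"server {server}")
    return "\n".join(lines) + "\n"


# Six servers by IP address with the recommended tos settings.
six_by_ip = render_ntp_conf(
    [f"192.0.2.{n}" for n in range(1, 7)],
    tos={"minclock": 4, "minsane": 4},
)

# Four servers by IP address, without the tos line.
four_by_ip = render_ntp_conf([f"192.0.2.{n}" for n in range(1, 5)])

# The most aggressive variant described above.
aggressive = render_ntp_conf(
    [f"192.0.2.{n}" for n in range(1, 5)],
    tos={"maxdist": 16},
)

print(six_by_ip)
```

The startup times attached to each variant in this message are measurements reported in the thread, not something the sketch can predict.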
--
Brad Knowles, <brad@stop.mail-abuse.org>
|
|
0
|
|
|
|
Reply
|
Brad
|
2/8/2005 10:41:16 PM
|
|
Brad,
Let me be sure this is your intent. Your minclock 4 says you must have
at least 4 servers pass the maxdist threshold on the way down before
setting the clock. Against my better judgement the default minclock is
1, so ordinarily the clock is set by the first server found, and it
might be a falseticker. I am happier with 4, because that's the
Byzantine minimum that reliably kicks out a single falseticker. Your
minsane 4 requires the clustering algorithm to stop tossing out outliers
when 4 remain. Also good Byzantine. But, if you did that, it would be a
good idea to use a couple more servers, say a total of 6, so the
clustering algorithm could improve the quality. However, with minclock 4,
if for some reason fewer than that number of servers were actually
available, the clock would never be set.
Is that what you had in mind? I have no problem with it, but I do want
to make sure the model is clearly understood.
Dave
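The model described in this message can be sketched as a toy in Python. This follows the wording of the message only and is not the ntpd source: the real clustering algorithm has further stopping conditions, whereas this toy simply discards the offset farthest from the median until minsane survivors remain. All function names are hypothetical, and the 3f+1 rule is the classic Byzantine agreement bound, which gives 4 for a single falseticker.

```python
import statistics


def byzantine_minimum(falsetickers):
    """Classic Byzantine bound: 3f + 1 participants tolerate f faulty ones."""
    return 3 * falsetickers + 1


def can_set_clock(acceptable_servers, minclock):
    """With fewer than minclock acceptable servers the clock is never set."""
    return acceptable_servers >= minclock


def prune_outliers(offsets, minsane):
    """Toy clustering: drop the offset farthest from the median until
    only minsane survivors remain."""
    survivors = sorted(offsets)
    while len(survivors) > minsane:
        middle = statistics.median(survivors)
        worst = max(survivors, key=lambda offset: abs(offset - middle))
        survivors.remove(worst)
    return survivors


# Six servers (offsets in seconds), two of them well off.
offsets = [0.001, 0.002, -0.001, 0.0, 0.5, -0.3]
print(byzantine_minimum(1))                       # 4
print(can_set_clock(len(offsets), minclock=4))    # True
print(can_set_clock(3, minclock=4))               # False: clock never set
print(prune_outliers(offsets, minsane=4))         # [-0.001, 0.0, 0.001, 0.002]
```

With only four servers configured there would be nothing for the pruning step to discard, which is the reason for suggesting a total of about six.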
Brad Knowles wrote:
> At 5:09 PM -0500 2005-02-08, Tom Smith wrote:
>
>> As it turns out, this happened to have been using V4.1.1. The 4.2.0
>> build on this system was defective and had been moved out of the way.
>
>
> As Dr. Mills has said, the code has been significantly improved
> since then. Even the current ntp-dev is significantly improved over
> 4.2.0-REL. Try building from the latest snapshot tarball. That's what
> Steve and I have been doing our most recent testing with.
>
>> So, without DNS in the way, 4.2.0 gets it from a 50:1 difference to a
>> 10:1 difference. A big improvement, to be sure, but this is only with
>> 3 servers (none of them the local clock, of course).
>
>
> To the best of my knowledge, all of our collected experience so far
> is detailed at <https://ntp.isc.org/bin/view/Support/StartingNTP4>, with
> the issues regarding sensitivity to startup delays in section 6.1.4.4.
>
> In short, with a reasonable number of servers (i.e., six to nine),
> with a reasonable /etc/ntp.conf (making use of iburst, "tos minclock 4
> minsane 4", etc...), using NTP servers by hostname instead of IP
> address, making sure that the NTP servers are "good" as well as close by
> (within 20-50ms delay), a good nameserver running on the local machine,
> etc... you should be able to see startup times on the order of fifteen
> seconds. I think that's perfectly reasonable for virtually all
> situations, including high-value financial businesses (including those
> which would lose $50 million per second of downtime) and military
> applications.
>
> However, if you absolutely positively have to get that down further,
> then listing six servers by IP address and not name, you should be able
> to see startup times around eleven seconds. Removing the "tos minclock
> 4 minsane 4" and using four servers listed by IP address, you should be
> able to get down to seven seconds.
>
> If you must cut that down further, you can make use of "tos maxdist
> 16", and get it all the way down to three seconds, but further
> reductions in the number of servers won't help you, nor will anything
> else we've tried. However, if you do that, then I believe that you will
> get what you deserve.
>
>
> Also note that the lowest minpoll the code allows is four. If you
> try to go below that, it silently limits you to this floor.
>
|
|
0
|
|
|
|
Reply
|
David
|
2/9/2005 12:43:14 AM
|
|
At 05:05 AM 2/8/2005, You have no need to know wrote:
>Abandoning the right to remain silent, Brad Knowles at Sun, 06 Feb 2005
>19:49:54 +0100 said:
>
> > At 11:13 AM +0000 2005-02-06, Per Hedeland wrote:
> >
> >> Well, that output only shows it doing the DNS lookups in sequence - it's
> >> a bit of a pain to do it any other way given the synchronous nature of
> >> gethostby*().
> >
>
>I tried it by doing a 'ntpq -n -c pe' and putting the resulting 16 IPs in
>"ntpdate -d ..."
>
>The first run took ~19 secs, most of which was *reverse* lookups of all
>the IPs.
Did you create a query log in your DNS? If so can you send it to me
so I can analyze it? Did you do the same thing running with ntpd?
You can send this to me directly rather than to the mailing list.
Danny
>The following 3 runs each took between 3 and 4 secs.
>
>I do run a nameserver on the same box. Apart from caching responses, this
>smooths out response from dead nameservers because named caches the
>response time and tries the relevant NS with the best response time.
>
>--
>Avoid reality at all costs.
>$email =~ s/n(.)a(.)n(.)a(.)e(.+)invalid/$1$2$3$4$5au/;
>icbm: 33.43.46S 150.59.27E
>
|
|
0
|
|
|
|
Reply
|
Danny
|
2/9/2005 1:58:20 AM
|
|
Hello,
Is there any specific reason behind the change in name for xntpd in NTP
version 4?
There are many scripts which assume "xntpd" as the name. Is it a good
idea to rename ntpd to xntpd and use it?
Regards,
Sajitha
|
|
0
|
|
|
|
Reply
|
Sajitha
|
2/13/2005 7:20:15 AM
|
|
It should never have been released as xntpd in the first place.
H
|
|
0
|
|
|
|
Reply
|
Harlan
|
2/14/2005 4:40:58 AM
|
|
"David L. Mills" <mills@udel.edu> wrote:
> Let me be sure this is your intent. Your minclock 4 says you must have
> at least 4 servers pass the maxdist threshold on the way down before
> setting the clock.
That's basically what I had understood.
> Against my better judgement the default
> minclock is 1, so ordinarily the clock is set by the first server found,
> and it might be a falseticker. I am happier with 4, because that's the
> Byzantine minimum that reliably kicks out a single falseticker.
Which was the recommendation of your message in
<http://lists.ntp.isc.org/pipermail/questions/2003-September/000737.html>.
> Your
> minsane 4 requires the clustering algorithm to stop tossing out outliers
> when 4 remain. Also good Byzantine. But, if you did that, it would be a
> good idea to use a couple more servers, say a total of 6, so the
> clustering algorithm could improve the quality. However, with minclock 4,
> if for some reason fewer than that number of servers were actually
> available, the clock would never be set.
I am currently using more than six upstream servers. Actually, I'm
currently using fourteen, and I should kick that down quite a bit, so as to
reduce my use of the pool.ntp.org servers.
> Is that what you had in mind? I have no problem with it, but I do want
> to make sure the model is clearly understood.
Yup, that was what I wanted. However, based on your comments here, I
should update the page at
<https://ntp.isc.org/bin/view/Support/StartingNTP4#Section_6.1.4.3.> to
include the appropriate caveats.
Thanks!
--
Brad Knowles, <brad@stop.mail-abuse.org>
"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-Benjamin Franklin, Historical Review of Pennsylvania.
SAGE member since 1995. See <http://www.sage.org/> for more info.
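The arithmetic behind the "Byzantine minimum" of 4 and the caveat about the clock never being set can be sketched as follows. This assumes the classic 3f + 1 bound for tolerating f faulty participants; the function names are hypothetical and this is not ntpd's actual selection code.

```python
def min_servers_for(falsetickers: int) -> int:
    """Smallest server count that can outvote `falsetickers` bad clocks (3f + 1)."""
    if falsetickers < 0:
        raise ValueError("falsetickers must be non-negative")
    return 3 * falsetickers + 1


def clock_can_be_set(reachable_servers: int, required: int) -> bool:
    """True when enough servers are available to meet the configured minimum."""
    return reachable_servers >= required


print(min_servers_for(1))        # 4
print(clock_can_be_set(3, 4))    # False: the clock would never be set
print(clock_can_be_set(6, 4))    # True
```

With a required minimum of 4, losing even one of exactly four configured servers leaves the clock unset, which is why a couple of spare servers are suggested above.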
|
|
0
|
|
|
|
Reply
|
brad
|
2/15/2005 9:41:13 AM
|
|
> <https://ntp.isc.org/bin/view/Support/StartingNTP4#Section_6.1.4.3.> to
This page glosses over one important little detail. Boot order. Can
the folks pushing for "ntpd -g -N" over "ntpdate" please put a
concrete boot order proposal forward? Of particular interest would be
to see their proposed ordering for:
bind ntpd syslogd
Under the old way of doing things we had a relatively clean order of:
ntpdate
syslogd
bind
ntpd
That allowed for: 1) the rough date to be set before syslogd started
writing into the logfiles. 2) bind to be started before the symbolic
names in ntp.conf needed processing. 3) the ntpd (and bind) startup
messages to be saved to the syslogs so one had a record if things
went wrong. One still needed to have some of the ntpd server names in
/etc/hosts for ntpdate, but one didn't need to have all of the names
in there and the IP addresses didn't all have to be 100% correct since
a few resolve failures at this point wouldn't really matter.
-wolfgang
--
Wolfgang S. Rupprecht http://www.wsrcc.com/wolfgang/
Hate software patents? Sign here: http://thankpoland.info/
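The old boot order above can be restated as a dependency graph and checked mechanically. The edges below are only my reading of the three reasons given in the post (rough date before logging, bind before ntp.conf names are resolved, startup messages captured by syslogd); a proposal for "ntpd -g" without ntpdate would need its own set of edges.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each service maps to the set of services that must start before it.
starts_after = {
    "syslogd": {"ntpdate"},        # rough date set before logs are written
    "bind": {"syslogd"},           # bind startup messages reach the syslogs
    "ntpd": {"bind", "syslogd"},   # names in ntp.conf resolve; messages logged
}


def boot_order(deps):
    """Return one start order that satisfies every dependency in `deps`."""
    return list(TopologicalSorter(deps).static_order())


print(boot_order(starts_after))  # ['ntpdate', 'syslogd', 'bind', 'ntpd']
```

If a proposed ordering makes two services depend on each other, TopologicalSorter raises graphlib.CycleError, which is a quick way to spot an inconsistent proposal.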
|
|
0
|
|
|
|
Reply
|
Wolfgang
|
2/15/2005 5:47:59 PM
|
|
http://ntp.isc.org/Support/StartingNTPDev would be a great place to discuss
this as well.
H
|
|
0
|
|
|
|
Reply
|
Harlan
|
2/15/2005 11:41:46 PM
|
|