ntpd, boot time, and hot plugging

There's been some discussion on the Fedora-devel list about ways to speed 
up booting for workstations. One of the things that slows down the boot 
process is waiting for an initial network time sync. I'd like to solicit 
opinions on how to organize the interaction between ntpd, the OS, the boot 
scripts, the network interfaces (which may come and go; think mobile 
devices), and possible hot-plugged local time sources. There was also 
discussion a few months ago about getting NTP server addresses from DHCP, 
so that should be considered.
Reply Kenneth 2/2/2005 2:48:02 PM

With a good ntp.drift file and the use of iburst, ntpd should be able to get
things in line in 11-15 seconds.

H
Reply Harlan 2/2/2005 4:21:44 PM


At 8:48 AM -0600 2005-02-02, Kenneth Porter wrote:

>  There's been some discussion on the Fedora-devel list about ways to speed
>  up booting for workstations. One of the things that slows down the boot
>  process is waiting for an initial network time sync. I'd like to solicit
>  opinions on how to organize the interaction between ntpd, the OS, the boot
>  scripts, the network interfaces (which may come and go; think mobile
>  devices), and possible hot-plugged local time sources.

	The problem is that there are many services which really need 
proper time sync in order to operate correctly.  You could just turn 
off time sync and let these things run freely, or you could find some 
way to shift the startup sequence so that only those things that 
depend on time sync are started after ntpd, and everything else is 
started before.

	But beyond the standard mechanisms to speed up the initialization 
process of starting ntpd (e.g., using "iburst" on all the server 
configuration lines in your /etc/ntp.conf, etc...), I don't see any 
other ways to make this process faster.
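
For illustration, a minimal ntp.conf sketch of the "iburst on every
server line plus a drift file" advice might look like the following. The
server names and the drift file path are assumptions, not taken from
this thread; substitute your own.

```
# /etc/ntp.conf (sketch; server names and drift file path are assumptions)
driftfile /var/lib/ntp/drift

# iburst sends a quick burst of packets at startup to speed up the first sync
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
```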

	Either you do or do not run ntpd.  Either you do or do not run 
those services which depend on good time sync.  I don't see that 
there can be any other options.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>

"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."

     -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
     Assembly to the Governor, November 11, 1755

   SAGE member since 1995.  See <http://www.sage.org/> for more info.
Reply Brad 2/2/2005 4:23:04 PM

Brad Knowles <brad@stop.mail-abuse.org> wrote in 
news:mailman.4.1107361920.583.questions@lists.ntp.isc.org:

>      The problem is that there are many services which really need 
> proper time sync in order to operate correctly.  You could just turn 
> off time sync and let these things run freely, or you could find some 
> way to shift the startup sequence so that only those things that 
> depend on time sync are started after ntpd, and everything else is 
> started before.

Does not "needing good time sync" imply communications with the outside 
world? So wouldn't those items fail for other reasons if NTP wasn't up yet? 
Ideally things that need quality time *and* connectivity would wake up when 
both conditions came true, and take some other action when connectivity was 
removed. It would be desirable for applications to be able to register to 
be notified of these events.
Reply Kenneth 2/2/2005 5:53:18 PM

On 2005-02-02, Brad Knowles <brad@stop.mail-abuse.org> wrote:

> At 8:48 AM -0600 2005-02-02, Kenneth Porter wrote:
>
>>  There's been some discussion on the Fedora-devel list about ways to
>>  speed up booting for workstations. One of the things that slows down
>>  the boot process is waiting for an initial network time sync. I'd
>>  like to solicit opinions on how to organize the interaction between
>>  ntpd, the OS, the boot scripts, the network interfaces (which may
>>  come and go; think mobile devices), and possible hot-plugged local
>>  time sources.
>
>  The problem is that there are many services which really need proper
> time sync in order to operate correctly.

<snip>

> But beyond the standard mechanisms to speed up the initialization
> process of starting ntpd (e.g., using "iburst" on all the server
> configuration lines in your /etc/ntp.conf, etc...), I don't see any
> other ways to make this process faster.

My informal tests show that ntpd needs somewhere between 7 and 20
seconds to initially set the clock (using 'ntpd -gq'). It would be
reasonable to assume that starting ntpd with '-g' will take roughly the
same amount of time.

We need to keep in mind the fact that we're talking about workstations,
not servers, and that 'ntpd -g' can, and usually does, run in the
background.
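
One way a boot script could take advantage of this is sketched below:
start ntpd in the background, bring up everything that does not care
about the time, and hold back only the time-sensitive services until a
check succeeds or a timeout expires. This is a sketch under stated
assumptions, not an existing init script: wait_until and the names in
the usage comment (clock_is_synced, start_time_insensitive_services,
start_time_sensitive_services) are hypothetical.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Polls COMMAND about once per second. Returns 0 as soon as COMMAND
# succeeds, or 1 if TIMEOUT_SECONDS elapse first.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    while ! "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Hypothetical boot-script usage (all three helper names are placeholders):
#   ntpd -g                          # daemonizes; sync continues in background
#   start_time_insensitive_services
#   wait_until 30 clock_is_synced && start_time_sensitive_services
```

What clock_is_synced should actually test, and what to do when the
timeout expires, are exactly the policy questions under discussion here.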

-- 
Steve Kostecke <kostecke@ntp.isc.org>
NTP Public Services Project - http://ntp.isc.org/
Reply Steve 2/2/2005 6:08:39 PM

At 11:53 AM -0600 2005-02-02, Kenneth Porter wrote:

>  Ideally things that need quality time *and* connectivity would wake up when
>  both conditions came true, and take some other action when connectivity was
>  removed. It would be desirable for applications to be able to register to
>  be notified of these events.

	Feel free to rewrite the entire OS and all the applications to 
work in this manner.

	In the meanwhile, the rest of us will try to find what solutions 
we can that will work with the existing OSes and applications we have 
available to us.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>
Reply Brad 2/2/2005 6:52:35 PM

Kenneth,

This is the single most persistent issue in the engineering design of
NTP. There must be tradeoffs between security, robustness, accuracy and
initial delay. In the current design compromise, a server is acceptable
only after three/four rounds of messages and the ensemble time is
acceptable with at least one of possibly several acceptable servers.
With IBURST mode, this takes 6-8 seconds.

For better robustness use "tos minclock N", where at least N
(default 1) servers must be acceptable to set the clock. Tonight I put
in a "tos maxdist M", where M is the distance threshold below which the
server is acceptable. Set "tos maxdist 16" and the first sample received
from any server will set the clock lickety-split. Of course, essentially
all the mitigation algorithms using multiple-sample redundancy and
multiple-server diversity are systematically defeated. You might as well
use SNTP.
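
As a configuration fragment, the two options described above might be
written as follows. This is only a sketch: the values are the ones named
in this message, and "tos maxdist" is available only in versions of ntpd
that include the change described here.

```
# /etc/ntp.conf fragment (sketch; values are illustrative only)

# Require at least one acceptable server before setting the clock
tos minclock 1

# Accept a server even at the initial distance of 16, so the first
# sample can set the clock (this defeats most of the mitigation)
tos maxdist 16
```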

Dave

Reply David 2/3/2005 3:16:46 AM

At 3:00 PM +0000 2005-02-03, Tom Smith wrote:

>  I know the subject has been workstations, but let's talk for a moment
>  about this religion as it concerns servers - like the ones that run
>  telephone companies, stock exchanges, and banks inside heavily
>  defended firewalls. It's the same issue, it's just that the stakes
>  are higher. The issue is how quickly can you get these
>  systems back up at boot. 15-30 seconds is a long time to wait.
>  Too long.

	With a decent drift file and using iburst throughout the server 
definitions, Steve has demonstrated that you can get this down to 
about seven seconds across a cable modem line, without any local 
Stratum 1 time servers.  This is real-world experience.

	If your servers are time-sensitive, then they should be the ones 
best able to tolerate that extra seven seconds during the startup 
phase.  The more important it is to have the time correct, the more 
important it is that you be able to tolerate short delays on startup.

	If you want to make that delay shorter, I guess you could package 
Stratum 1 refclocks with every machine.

>  We're not talking about one-shot sampling for maintaining the time,
>  so comparisons to SNTP are not helpful. We're talking about speed of
>  acquisition of an initial "good enough" time, keeping in mind that the
>  perfect is often the enemy of the good.

	Seven seconds to find "good enough" seems to be a pretty good 
balance to me.

	However, if you want to shoot yourself in the foot with a 
thermonuclear bomb, please feel free to do so.

>  The reason why so many of your constituency keep bringing this
>  subject up is that they know that ntpd needs a good (not perfect)
>  estimate of the time before it starts and that critical systems
>  can't wait for perfection to get that estimate.

	I don't know how much more perfection you want.  If you can't 
tolerate seven seconds during the startup phase, then you're using 
the wrong protocols.

	If you need a true fault-tolerant real-time system with 
resolution down to attoseconds, and those seven additional seconds 
during startup are effectively seven additional aeons for your 
application and you cannot possibly tolerate them, then you shouldn't 
be using TCP/IP, Unix, or anything else that anyone on this list 
would recognize.

	In this case, ntpd is the least of your worries.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>
Reply Brad 2/3/2005 2:56:51 PM

David L. Mills wrote:
> Kenneth,
> 
> This is the single most persistent issue in the engineering design of
> NTP. There must be tradeoffs between security, robustness, accuracy and
> initial delay. In the current design compromise, a server is acceptable
> only after three/four rounds of messages and the ensemble time is
> acceptable with at least one of possibly several acceptable servers.
> With IBURST mode, this takes 6-8 seconds.
>
> For better robustness use "tos minclock N", where at least N
> (default 1) servers must be acceptable to set the clock. Tonight I put
> in a "tos maxdist M", where M is the distance threshold below which the
> server is acceptable. Set "tos maxdist 16" and the first sample received
> from any server will set the clock lickety-split. Of course, essentially
> all the mitigation algorithms using multiple-sample redundancy and
> multiple-server diversity are systematically defeated. You might as well
> use SNTP.

David,

I know the subject has been workstations, but let's talk for a moment
about this religion as it concerns servers - like the ones that run
telephone companies, stock exchanges, and banks inside heavily
defended firewalls. It's the same issue, it's just that the stakes
are higher. The issue is how quickly can you get these
systems back up at boot. 15-30 seconds is a long time to wait.
Too long.

We're not talking about one-shot sampling for maintaining the time,
so comparisons to SNTP are not helpful. We're talking about speed of
acquisition of an initial "good enough" time, keeping in mind that the
perfect is often the enemy of the good.

You might argue that if boot time is critical, just let the server come
up with whatever random time it comes up with and let ntpd fix
it up later. Give it a "-g" so it doesn't complain. A lot of folks
have tried this in the past inadvertently (and continue to do so)
by neglecting to put ntpdate into their boot sequence ahead of ntpd.
I've fixed a lot of systems whose drift files were pinned
at 500 ppm and whose systems ran perpetually fast or slow as
a result. We've also spent a lot of money fruitlessly replacing
motherboards on those systems. Turning a large initial offset over
to ntpd is decidedly NOT a Good Idea.
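
A quick check for the "pinned" condition described above can be sketched
in shell. It assumes the drift file holds a single frequency value in
PPM on its first line; the function name drift_is_pinned and the 499.9
cutoff are illustrative choices, not part of ntpd.

```shell
#!/bin/sh
# drift_is_pinned FILE
# Exit status 0 if the first value in FILE has magnitude >= 499.9 PPM
# (assumed cutoff for "pinned at +-500"); non-zero otherwise, including
# when FILE is missing or empty.
drift_is_pinned() {
    awk 'NR == 1 { v = ($1 < 0) ? -$1 : $1; pinned = (v >= 499.9) }
         END { exit pinned ? 0 : 1 }' "$1"
}
```

For example, `drift_is_pinned /var/lib/ntp/drift && echo "drift file is
pinned"` would flag a suspect system, where the path is again an
assumption that varies by distribution.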

The reason why so many of your constituency keep bringing this
subject up is that they know that ntpd needs a good (not perfect)
estimate of the time before it starts and that critical systems
can't wait for perfection to get that estimate.

-Tom
________________________________________________________________________
Tom Smith                       smith@alum.mit.edu,smith@cag.lkg.hp.com
Hewlett-Packard Company                          Tel: +1 (603) 884-6329
110 Spit Brook Road ZKO1-3/H42                   FAX: +1 (603) 884-6484
Nashua, New Hampshire 03062-2698, USA           Mobile: +1 978 397 3411

Reply Tom 2/3/2005 3:00:21 PM

Brad Knowles wrote:
 > At 3:00 PM +0000 2005-02-03, Tom Smith wrote:
 >
 >>  I know the subject has been workstations, but let's talk for a moment
 >>  about this religion as it concerns servers - like the ones that run
 >>  telephone companies, stock exchanges, and banks inside heavily
 >>  defended firewalls. It's the same issue, it's just that the stakes
 >>  are higher. The issue is how quickly can you get these
 >>  systems back up at boot. 15-30 seconds is a long time to wait.
 >>  Too long.
 >
 >
 >     With a decent drift file ...

Precisely. The decent drift file is a problem. It sometimes doesn't
exist after a large initial offset has been turned over to ntpd.
Now, if ntpd all by itself did a quick acquisition, didn't
count that initial clock setting in any way into the frequency
correction, and blocked the startup script progress until that
was complete and it was safe to proceed with starting the
time-sensitive stuff, all would be well with the world.
If I've missed how that happens, I apologize.

 >     If your servers are time-sensitive, then they should be the ones
 > best able to tolerate that extra seven seconds during the startup
 > phase.

You should discuss that with a bank or stock exchange that
is losing millions in transactions during those seconds
or with a public utility that is paying the government
penalties for downtime. :-)

 >  The more important it is to have the time correct, the more
 > important it is that you be able to tolerate short delays on startup.

Well, no. As David pointed out in his posting, all engineering
is a matter of tradeoffs. For many users, the tradeoff needs
to be 'Get these applications up fast on a "good enough"
time and refine the time (and frequency) in the background.'

 >     Seven seconds to find "good enough" seems to be a pretty good
 > balance to me.
 >

Perhaps it is. For you. If it's seven seconds.

 >     I don't know how much more perfection you want.  If you can't
 > tolerate seven seconds during the startup phase, then you're using the
 > wrong protocols.

I don't want perfection at all. That's the point. ntpd gets it as right
as it needs to be. It just has to have something reasonable
to work with when it starts.
Reply Tom 2/3/2005 4:06:42 PM

Brad Knowles <brad@stop.mail-abuse.org> writes:
> 	If you want to make that delay shorter, I guess you could
> package Stratum 1 refclocks with every machine.

I'd be waiting for minutes if I waited till ntpd decided it had a
"good enough" estimate of the Motorola Oncore's time.  Ntpd *could*
have its first time estimate accurate to well under 1ms in half a
second on average.  The problem is, it won't tell you the time or step
the system clock until it filters the crap out of the refclock signal.

Ntpd has a hard act to follow.  The old ntpdate program was a near
ideal solution from the perspective of booting.  It did its job
quickly and got off the pot.

-wolfgang
-- 
Wolfgang S. Rupprecht                http://www.wsrcc.com/wolfgang/
     Hate software patents?  Sign here: http://thankpoland.info/
Reply Wolfgang 2/3/2005 5:23:57 PM

> Precisely. The decent drift file is a problem. It sometimes doesn't
> exist after a large initial offset has been turned over to ntpd.
> Now, if ntpd all by itself did a quick acquisition, didn't
> count that initial clock setting in any way into the frequency
> correction, and blocked the startup script progress until that
> was complete and it was safe to proceed with starting the
> time-sensitive stuff, all would be well with the world.
> If I've missed how that happens, I apologize.

I always wondered why ntpd would throw a valuable drift value out the window
when it encounters an offset at startup, and would try to explain the
offset with a ridiculous frequency error of +- 500ppm and take forever to
settle, rather than correct the initial offset, load the drift value, and
be happy.

Such behaviour would also make the startup scripts easier.

Roman Maeder

Reply Roman 2/3/2005 6:33:16 PM

At 10:21 AM -0500 2005-02-03, Tom Smith wrote:

>>      With a decent drift file ...
>
>  Precisely. The decent drift file is a problem. It sometimes doesn't
>  exist after a large initial offset has been turned over to ntpd.

	Even without a good drift file, you can still sync very quickly. 
It may not be seven seconds, it may be fifteen.  But that should 
still be tolerable.

>  You should discuss that with a bank or stock exchange that
>  is losing millions in transactions during those seconds
>  or with public utility that is paying the government
>  penalties for downtime. :-)

	My wife is general counsel, head of legal, and secretary to the 
board for the world's largest clearing and settlement firm for 
European stocks and bonds, with an annual turnover in excess of 256 
trillion Euro last year, and assets under management in excess of 
twelve trillion Euros.  Yes, I mean trillion.

	When Argentina decides not to make their interest payments on 
their Brady bond debt, because 80% of their bonds are held through 
her company, the final decision of whether or not to declare what 
used to be the world's seventh largest economy officially bankrupt, 
arrives on her desk.

	I understand the scale of the problem.  With over a trillion Euro 
of turnover in a single workday, milliseconds do count.

>  Well, no. As David pointed out in his posting, all engineering
>  is a matter of tradeoffs. For many users, the tradeoff needs
>  to be 'Get these applications up fast on a "good enough"
>  time and refine the time (and frequency) in the background.'

	So, doing a single query and taking whatever bogus time may be 
set from that server, is more important than waiting a few more 
seconds to make sure that you've got a pretty good timesync?

	I'm sorry, I don't buy it.  The bigger the application, the more 
you have to lose, the more important it is to have good time sync.


	See above -- milliseconds do count.

>  Perhaps it is. For you. If it's seven seconds.

	For financial applications, if the server goes down, then your 
N+M fault-tolerant systems take over that load, and not a single 
transaction is dropped or excessively delayed.  If your main server 
facility is taken out by terrorists or natural disaster, then your 
hot spare facility, that is located hundreds or thousands of miles 
away, takes over and a few transactions might be delayed, but nothing 
is dropped.

	If you're running something that mission-critical and you don't 
have those kinds of systems (which can tolerate a few extra seconds 
of startup time in order to ensure that the time is set reasonably 
well), then you are shooting yourself in the foot with a 
thermonuclear weapon, and you will get what you deserve.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>
Reply Brad 2/3/2005 7:19:12 PM

Tom,

It is true that ntpd is specifically engineered for Internet badlands 
where popcorn spikes, evil masqueraders, misbehaving clocks and other 
vermin might poison your DNS cache. A client using the pool servers 
scheme really needs this ammunition.

The current version tries hard to strike a compromise between verifiable
assertions and acquisition speed. However, with the twinkle I described
earlier you can engineer any compromise you wish, including setting the
clock on the first response received or, in principle, only after the
first two responses from at least two servers, and so on.

In very many simulation runs here I found it hard to get into a true
lockup condition where the daemon did not recover from large initial time
or frequency offsets, even with the -x option. However, there are some
things the simulator can't pick up, like a large frequency offset
pre-programmed in the kernel. The recommended repair procedure, should
that somehow happen, is to run "ntptime -f 0" to kill the kernel offset
and then remove the ntp.drift file. Upon restart ntpd measures the
intrinsic frequency offset over about fifteen minutes, sets the clock 
and resumes normal operation. I would think this a good way to determine 
if a motherboard is or is not acceptable. I've seen lots of motherboards 
and found most of them within 100 PPM and all of them within 500 PPM. 
Even if over 500 PPM the clock is still disciplined but the offset 
cannot be forced to zero.
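
The repair procedure above could be scripted roughly as follows. This is
a sketch, not a tested recipe: the init script path and the drift file
location are assumptions that vary by system, and by default the
function only prints the commands it would run.

```shell
#!/bin/sh
# Assumed locations; adjust for your system.
DRIFT_FILE="${DRIFT_FILE:-/var/lib/ntp/drift}"
NTPD_INIT="${NTPD_INIT:-/etc/init.d/ntpd}"

# RUN is prefixed to every command. It defaults to "echo" (dry run);
# set RUN="" in the environment to execute the commands for real (as root).
RUN="${RUN-echo}"

reset_ntp_frequency() {
    $RUN "$NTPD_INIT" stop      # stop the daemon before touching its state
    $RUN ntptime -f 0           # zero the kernel frequency offset
    $RUN rm -f "$DRIFT_FILE"    # discard the stored frequency estimate
    $RUN "$NTPD_INIT" start     # ntpd re-measures the intrinsic frequency
}
```

Calling reset_ntp_frequency with the defaults prints the four commands
for review before anything is changed.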

So, best advice is to run ntpd with -g and "tos maxdist 16" in the 
configuration file. I assume the version with this command will soon 
appear as a snapshot. Note that the only thing this does is admit 
servers to the selection algorithm no matter what the synchronization 
distance is. Ordinarily the distance starts from 16 and reduces by half 
for each response received. Other than this criterion, the algorithms 
operate without change. You can do your own security analysis.
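
The halving described above can be worked through with a toy
calculation. This is only an arithmetic illustration of "starts from 16
and reduces by half for each response"; it is not how ntpd computes
synchronization distance, and the function name and the threshold of 1
used in the example are assumptions.

```shell
#!/bin/sh
# responses_needed START THRESHOLD
# Counts how many halvings it takes for an integer distance that
# begins at START to fall to THRESHOLD or below.
responses_needed() {
    dist=$1
    threshold=$2
    n=0
    while [ "$dist" -gt "$threshold" ]; do
        dist=$((dist / 2))
        n=$((n + 1))
    done
    echo "$n"
}

responses_needed 16 1     # prints 4
responses_needed 16 16    # prints 0
```

With an assumed threshold of 1, four halvings are needed, in line with
the "three/four rounds of messages" mentioned earlier in the thread;
with the threshold raised to 16, none are needed, which is why the first
sample can then set the clock.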

Dave

Reply David 2/3/2005 7:50:06 PM

Tom,

I get nervous about nonquantitative statements, since they might start 
urban legends. A "decent" frequency file is one created when first 
starting ntpd without the file and letting it determine the intrinsic 
frequency error. This takes about fifteen minutes. However, the 
frequency file itself is written only after the first hour and at hourly 
intervals after that. The discipline should be stable even if the 
frequency file is present and intentionally set as much as +-500 PPM in 
error and that even with a large initial time offset. This has been 
confirmed by simulation; however, the simulations assume the adjtime() 
system call operates as in original Unix model; the Solaris adjtime() is 
a killer when large offsets are involved.

Dave

Reply David 2/3/2005 8:09:16 PM

Brad Knowles <brad@stop.mail-abuse.org> wrote in 
news:mailman.20.1107442717.583.questions@lists.ntp.isc.org:

>      If you want to make that delay shorter, I guess you could package 
> Stratum 1 refclocks with every machine.

Given the cheap price of a consumer GPS receiver, this doesn't sound that 
crazy. How fast can ntpd lock on to the NMEA messages?
Reply Kenneth 2/3/2005 8:20:53 PM

"David L. Mills" <mills@udel.edu> wrote in news:cttv9l$ql7$1
@dewey.udel.edu:

> It is true that ntpd is specifically engineered for Internet badlands 
> where popcorn spikes, evil masqueraders, misbehaving clocks and other 
> vermin might poison your DNS cache. A client using the pool servers 
> scheme really needs this ammunition.

What about mobile clients? In the mobile environment, what does ntpd do 
when you sever the network connection (i.e., undock)? Suppose I undock 
(taking down eth1) and plug in down the hall with the built-in NIC 
(bringing up eth0). What must one do to make ntpd tolerant of that? Or must 
mobile apps give up quality time because their network interfaces are 
transient? Would one need to script a complete stop/start of ntpd whenever 
interfaces come and go?
Reply Kenneth 2/3/2005 8:23:12 PM

How timely this thread is.  Given my volatile NTP situation...I am  
starting to believe that using a VME-based GPS source as a reference  
clock to my NTP daemon in a VME-based SPARC SBC isn't a good idea.  It  
has been one thing after another...and the "clients" are not getting  
good time from their sole-source server.  The main problem that I've  
encountered is +500 PPM and steps during operations - which has  
effectively ruined my data.  So...on to plan B???

I am wondering if a simple, static network time server is a better  
option...guess so at this point.

Thanks for the info.

Kit

------------------------------------------------------------------------
Kit Plummer
Operations Research and System Performance Dept.
Raytheon Missile Systems

On Feb 3, 2005, at 2:13 PM, Tom Smith wrote:

> David L. Mills wrote:
>> I get nervous about nonquantitative statements, since they might  
>> start urban legends. A "decent" frequency file is one created when  
>> first starting ntpd without the file and letting it determine the  
>> intrinsic frequency error. This takes about fifteen minutes. However,  
>> the frequency file itself is written only after the first hour and at  
>> hourly intervals after that. The discipline should be stable even if  
>> the frequency file is present and intentionally set as much as +-500  
>> PPM in error and that even with a large initial time offset. This has  
>> been confirmed by simulation; however, the simulations assume the  
>> adjtime() system call operates as in original Unix model; the Solaris  
>> adjtime() is a killer when large offsets are involved.
>
> A physicist I worked with early in my career taught me a very
> useful law. "Different things vary."
>
> I couldn't tell you how many ntp.drift files I've encountered
> with a value of +-500.000. It's a lot. There are many ways this
> can occur, but all of them involve ntpd starting up against a large
> offset with its reference clocks and/or shutting down while it is
> working one off, the latter usually because of an NTP misconfiguration,
> but also sometimes because of thunderstorms in July. Others have
> also observed how this happens on mobile systems that get booted
> and shut down a lot.
>
> Once a system is in this state, it depends on the specifics of the OS
> how long or even if that system will "settle". I can assure you that
> for some systems, if not most, this is most assuredly not 15 minutes,
> might be days, or might, for all practical purposes, be never. These
> are the systems on which the ordinary non-NTP-expert system
> manager or field support team will go through several rounds of
> battery or crystal or motherboard or system replacement before
> anyone tells them to just delete the drift file and start over.
>
>

Reply Kit 2/3/2005 9:09:23 PM

David L. Mills wrote:
> I get nervous about nonquantitative statements, since they might start 
> urban legends. A "decent" frequency file is one created when first 
> starting ntpd without the file and letting it determine the intrinsic 
> frequency error. This takes about fifteen minutes. However, the 
> frequency file itself is written only after the first hour and at hourly 
> intervals after that. The discipline should be stable even if the 
> frequency file is present and intentionally set as much as +-500 PPM in 
> error and that even with a large initial time offset. This has been 
> confirmed by simulation; however, the simulations assume the adjtime() 
> system call operates as in original Unix model; the Solaris adjtime() is 
> a killer when large offsets are involved.

A physicist I worked with early in my career taught me a very
useful law. "Different things vary."

I couldn't tell you how many ntp.drift files I've encountered
with a value of +-500.000. It's a lot. There are many ways this
can occur, but all of them involve ntpd starting up against a large
offset with its reference clocks and/or shutting down while it is
working one off, the latter usually because of an NTP misconfiguration,
but also sometimes because of thunderstorms in July. Others have
also observed how this happens on mobile systems that get booted
and shut down a lot.

Once a system is in this state, it depends on the specifics of the OS
how long or even if that system will "settle". I can assure you that
for some systems, if not most, this is most assuredly not 15 minutes,
might be days, or might, for all practical purposes, be never. These
are the systems on which the ordinary non-NTP-expert system
manager or field support team will go through several rounds of
battery or crystal or motherboard or system replacement before
anyone tells them to just delete the drift file and start over.

________________________________________________________________________
Tom Smith                       smith@alum.mit.edu,smith@cag.lkg.hp.com
Hewlett-Packard Company                          Tel: +1 (603) 884-6329
110 Spit Brook Road ZKO1-3/H42                   FAX: +1 (603) 884-6484
Nashua, New Hampshire 03062-2698, USA           Mobile: +1 978 397 3411
Reply Tom 2/3/2005 9:13:47 PM

Tom Smith wrote:

> David L. Mills wrote:
>
>> Kenneth,
>>
>> This is the single most persistent issue in the engineering design of 
>> NTP. There must be tradeoffs between security, robustness, accuracy
>> and initial delay. In the current design compromise, a server is
>> acceptable only after three/four rounds of messages and the ensemble
>> time is acceptable with at least one of possibly several acceptable
>> servers. With IBURST mode, this takes 6-8 seconds.
>>
>> For better robustness use "tos minclock N", where at least N
>> (default 1) servers must be acceptable to set the clock. Tonight I
>> put in a "tos maxdist M", where M is the distance threshold below
>> which the server is acceptable. Set "tos maxdist 16" and the first
>> sample received from any server will set the clock lickety-split. Of
>> course, essentially all the mitigation algorithms using 
>> multiple-sample redundancy and multiple-server diversity are 
>> systematically defeated. You might as well use SNTP.
>
>
> David,
>
> I know the subject has been workstations, but let's talk for a moment
> about this religion as it concerns servers - like the ones that run
> telephone companies, stock exchanges, and banks inside heavily
> defended firewalls. It's the same issue, it's just that the stakes
> are higher. The issue is how quickly can you get these
> systems back up at boot. 15-30 seconds is a long time to wait.
> Too long.
>
> We're not talking about one-shot sampling for maintaining the time,
> so comparisons to SNTP are not helpful. We're talking about speed of
> acquisition of an initial "good enough" time, keeping in mind that the
> perfect is often the enemy of the good.
>
> You might argue that if boot time is critical, just let the server come
> up with whatever random time it comes up with and let ntpd fix
> it up later. Give it a "-g" so it doesn't complain. A lot of folks
> have tried this in the past inadvertently (and continue to do so)
> by neglecting to put ntpdate into their boot sequence ahead of ntpd.
> I've fixed a lot of systems whose drift files were pinned
> at 500 ppm and whose systems ran perpetually fast or slow as
> a result. We've also spent a lot of money fruitlessly replacing
> motherboards on those systems. Turning a large initial offset over
> to ntpd is decidedly NOT a Good Idea.
>
> The reason why so many of your constituency keep bringing this
> subject up is that they know that ntpd needs a good (not perfect)
> estimate of the time before it starts and that critical systems
> can't wait for perfection to get that estimate.
>
> -Tom
>
Tom,

I think it all boils down to how good is "good enough"?   Your snail 
mail address suggests that you're in VMS Engineering or, if not, you 
could throw rocks at them!   VMS, although it keeps time in units of 100 
nanosecond "ticks", only updates the clock every ten milliseconds!  
(Measure with micrometer, mark with chalk, cut with ax?)
The documented and supported interfaces in VMS only permit you to set 
the clock and read the clock to the nearest ten milliseconds.

If you are willing to have a server come up with a clock error of one 
second, just boot and start ntpd later.  If you need to have time 
correct to the nearest microsecond, you are using the wrong tools.

If you are, in fact, talking about VMS and TCP/IP services, porting the 
latest version of the NTP reference implementation would help you speed 
up the startup.  The last time I looked, TCP/IP Services (V5.1)  was 
using a port of NTP V3-5.91 which does not support the iburst 
qualifier.  Iburst allows much faster initialization; it gets you a 
"good enough" time and frequency correction in about 8 seconds.

If eight seconds is too long, you need to specify how quickly you need 
to acquire the correct time and how accurate the time must be.   These 
two specifications pretty much determine the tools you must use to meet 
them; e.g. if you need time correct to +/- 50 nanoseconds and need to 
set it within 100 microseconds, you will almost certainly need to use a 
hardware reference clock such as a cesium or rubidium standard.
Reply Richard 2/3/2005 9:52:53 PM

Wolfgang S. Rupprecht wrote:

>Brad Knowles <brad@stop.mail-abuse.org> writes:
>
>>	If you want to make that delay shorter, I guess you could
>>package Stratum 1 refclocks with every machine.
>
>I'd be waiting for minutes if I waited till ntpd decided it had a
>"good enough" estimate of the Motorola Oncore's time.  Ntpd *could*
>have its first time estimate accurate to well under 1ms in half a
>second on average.  The problem is, it won't tell you the time or step
>the system clock until it filters the crap out of the refclock signal.
>
>Ntpd has a hard act to follow.  The old ntpdate program was a near
>ideal solution from the perspective of booting.  It did its job
>quickly and got off the pot.
>
>-wolfgang
>
Since the Motorola driver polls the clock every sixteen seconds and 
since four samples are required, sixty-four seconds should be sufficient!
Reply Richard 2/3/2005 10:06:05 PM

Brad Knowles wrote:

> At 10:21 AM -0500 2005-02-03, Tom Smith wrote:
>
>>>      With a decent drift file ...
>>
>>
>>  Precisely. The decent drift file is a problem. It sometimes doesn't
>>  exist after a large initial offset has been turned over to ntpd.
>
>
>     Even without a good drift file, you can still sync very quickly. 
> It may not be seven seconds, it may be fifteen.  But that should still 
> be tolerable.
>
>>  You should discuss that with a bank or stock exchange that
>>  is losing millions in transactions during those seconds
>>  or with public utility that is paying the government
>>  penalties for downtime. :-)
>
>
>     My wife is general counsel, head of legal, and secretary to the 
> board for the world's largest clearing and settlement firm for 
> European stocks and bonds, with an annual turnover in excess of 256 
> trillion Euro last year, and assets under management in excess of 
> twelve trillion Euros.  Yes, I mean trillion.
>
>     When Argentina decides not to make their interest payments on 
> their Brady bond debt, because 80% of their bonds are held through her 
> company, the final decision of whether or not to declare what used to 
> be the world's seventh largest economy officially bankrupt, arrives on 
> her desk.
>
>     I understand the scale of the problem.  With over a trillion Euro 
> of turnover in a single workday, milliseconds do count.
>
>>  Well, no. As David pointed out in his posting, all engineering
>>  is a matter of tradeoffs. For many users, the tradeoff needs
>>  to be 'Get these applications up fast on a "good enough"
>>  time and refine the time (and frequency) in the background.'
>
>
>     So, doing a single query and taking whatever bogus time may be set 
> from that server, is more important than waiting a few more seconds to 
> make sure that you've got a pretty good timesync?
>
>     I'm sorry, I don't buy it.  The bigger the application, the more 
> you have to lose, the more important it is to have good time sync.
>
>
>     See above -- milliseconds do count.
>
>>  Perhaps it is. For you. If it's seven seconds.
>
>
>     For financial applications, if the server goes down, then your N+M 
> fault-tolerant systems take over that load, and not a single 
> transaction is dropped or excessively delayed.  If your main server 
> facility is taken out by terrorists or natural disaster, then your hot 
> spare facility, that is located hundreds or thousands of miles away, 
> takes over and a few transactions might be delayed, but nothing is 
> dropped.
>
>     If you're running something that mission-critical and you don't 
> have those kinds of systems (which can tolerate a few extra seconds of 
> startup time in order to ensure that the time is set reasonably well), 
> then you are shooting yourself in the foot with a thermonuclear 
> weapon, and you will get what you deserve.
>
It's worth noting that, on September 11, 2001, Merrill-Lynch "failed
over" to a duplicate data center in Westchester County in something
like four minutes, without losing a single transaction or a byte of
data.  If downtime costs you $50,000,000/minute, the budget to ensure
that there isn't any downtime is practically infinite!
Reply Richard 2/3/2005 10:14:26 PM


Tom Smith wrote:
> David L. Mills wrote:
> 
>> I get nervous about nonquantitative statements, since they might start 
>> urban legends. A "decent" frequency file is one created when first 
>> starting ntpd without the file and letting it determine the intrinsic 
>> frequency error. This takes about fifteen minutes. However, the 
>> frequency file itself is written only after the first hour and at 
>> hourly intervals after that. The discipline should be stable even if 
>> the frequency file is present and intentionally set as much as +-500 
>> PPM in error and that even with a large initial time offset. This has 
>> been confirmed by simulation; however, the simulations assume the 
>> adjtime() system call operates as in original Unix model; the Solaris 
>> adjtime() is a killer when large offsets are involved.
> 
> 
> A physicist I worked with early in my career taught me a very
> useful law. "Different things vary."
> 
> I couldn't tell you how many ntp.drift files I've encountered
> with a value of +-500.000. It's a lot. There are many ways this
> can occur, but all of them involve ntpd starting up against a large
> offset with its reference clocks and/or shutting down while it is
> working one off, the latter usually because of an NTP misconfiguration,
> but also sometimes because of thunderstorms in July. Others have
> also observed how this happens on mobile systems that get booted
> and shut down a lot.
> 
> Once a system is in this state, it depends on the specifics of the OS
> how long or even if that system will "settle". I can assure you that
> for some systems, if not most, this is most assuredly not 15 minutes,
> might be days, or might, for all practical purposes, be never. These
> are the systems on which the ordinary non-NTP-expert system
> manager or field support team will go through several rounds of
> battery or crystal or motherboard or system replacement before
> anyone tells them to just delete the drift file and start over.

So, if I manage to set the initial time to a good approximation of
"real" time using ntpdate with 5 servers as explained earlier, would
you recommend deleting the drift file and letting it start all over again?

Does this affect the time for the server to start serving?

Alain
Reply Alain 2/3/2005 11:03:22 PM

At 9:03 PM -0200 2005-02-03, Alain wrote:

>  So, if I manage to set the initial time to a good approximation of
>  "real" time using ntpdate with 5 servers as explained earlier, would
>  you recommend deleting the drift file and letting it start all over again?

	If the drift file states +/- 500 PPM, then I would be inclined to
remove it on startup no matter what.  Of course, I would also be 
inclined to use "ntpd -g" instead of ntpdate, for the reasons I've 
previously given you, not to mention the issues that you have 
mentioned here regarding the use of ntpdate against servers that are 
not up.

>  Does this affect the time for the server to start serving?

	If the drift file is whacked out, it's better to remove it and 
let the server re-calculate, than to try to compensate for a whacked 
out drift file.  You'll start serving time faster if you let the 
system try to calculate the real situation once the clock has been 
set to a reasonable value on startup.

	Keep in mind that the server will have to take extra time to get 
to a state where it can start serving time to clients, if it has had 
to calculate a new drift file or try to deal with a whacked out drift 
file.  If you're sensitive to seven seconds of additional startup 
time, then you really, really want to make sure that you always keep 
a reasonable drift file.
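
The "whacked out drift file" check described above can be sketched in a few
lines of Python. This is only an illustration, not anything ntpd ships: it
assumes the drift file holds the frequency offset in PPM as its first
whitespace-separated token, and it treats a value at or within 1 PPM of the
+-500 PPM limit discussed in this thread as pinned. The function names and
the margin are assumptions made for the example.

```python
import os

PINNED_PPM = 500.0   # limit discussed in this thread
MARGIN_PPM = 1.0     # assumed tolerance for "close enough to the limit"

def drift_is_suspect(path):
    """Return True if the drift file is unreadable, malformed, or pinned."""
    try:
        with open(path) as f:
            tokens = f.read().split()
        ppm = float(tokens[0])
    except (OSError, IndexError, ValueError):
        return True
    return abs(ppm) >= PINNED_PPM - MARGIN_PPM

def remove_if_suspect(path):
    """Delete a suspect drift file so ntpd recomputes the frequency."""
    if os.path.exists(path) and drift_is_suspect(path):
        os.remove(path)
        return True
    return False
```

A boot script could call remove_if_suspect() on whatever path the driftfile
line in ntp.conf names, before ntpd is started.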


	Given the kinds of time periods we're talking about, I still have 
yet to see a good reason for using "ntpdate; ntpd" instead of "ntpd 
-g".

-- 
Brad Knowles, <brad@stop.mail-abuse.org>

"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."

     -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
     Assembly to the Governor, November 11, 1755

   SAGE member since 1995.  See <http://www.sage.org/> for more info.
Reply Brad 2/3/2005 11:55:01 PM

Alain wrote:
> So, if I manage to set the initial time to a good approximation of
> "real" time using ntpdate with 5 servers as explained earlier, would
> you recommend deleting the drift file and letting it start all over again?
>
> Does this affect the time for the server to start serving?

I assume you asked this of me, rather than of Dave. Your method
seems to me to be excellent.

If you do that, there should be no need to clean out the
characteristic drift that was so carefully computed over a long
period of time. Cleaning out the drift file is a drastic measure
required only if you do NOT pre-set the clock before starting ntpd
and you end up, as a result, with a bogus re-calculated drift rate.

What you have to factor into the boot time is the time you have
to block while stepping the clock to the "right" time in the first place.
Putting ntpdate into the boot sequence means that this time will be
exactly as long as ntpdate takes to complete, and no longer, and that
once ntpdate completes, the rest of your boot sequence, including ntpd,
is "safe". That's why people are unhappy about ntpdate being
removed without an equivalent way of doing the same thing with
ntpd:

1) step the clock to a "good enough" time
       do this within a few seconds
       block while doing it
       don't touch/change the pre-computed drift while doing this
2) start ntpd
3) start time-dependent services

It would certainly be possible for ntpd to do this all by itself,
but unless I misunderstand, the only way to do this with just
ntpd at the moment is:

1) ntpd -gq
2) sleep [guess how long ntpd -gq will take, worst case, to set the clock]
    -or-
    spin on ps looking for ntpd to start and then end
3) start ntpd [normal operation options]
4) start time-dependent services

Which is not the same thing.
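
For comparison, here is a rough Python sketch of the ntpd-only sequence,
under the assumption that "ntpd -gq" run in the foreground does not return
until it has set the clock or given up, and that it exits with status 0 on
success. If that holds, no sleep or ps loop is needed, only a timeout as a
guard. The function name, the 30-second timeout, and the injectable "run"
parameter (there so the logic can be exercised without a real ntpd) are all
made up for the example.

```python
import subprocess

def set_clock_then_start(run=subprocess.run, timeout_s=30):
    """Step the clock once, blocking, then start the daemon.

    Returns True if the one-shot step reported success.
    """
    try:
        # -g: allow a large initial offset; -q: set the clock once and exit.
        # Run in the foreground so this call does not return early.
        result = run(["ntpd", "-gq"], timeout=timeout_s)
        stepped = result.returncode == 0
    except subprocess.TimeoutExpired:
        stepped = False  # no usable server in time; carry on regardless
    # Normal operation; -g is kept in case the one-shot step did not happen.
    run(["ntpd", "-g"])
    return stepped
```

Time-dependent services would be started by the caller once this function
returns, and the return value says whether they are starting on a stepped
clock or not.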

Reply Tom 2/4/2005 12:30:40 AM

At 12:30 AM +0000 2005-02-04, Tom Smith wrote:

>  1) ntpd -gq
>  2) sleep [guess how long ntpd -gq will take, worst case, to set the clock]
>     -or-
>     spin on ps looking for ntpd to start and then end
>  3) start ntpd [normal operation options]
>  4) start time-dependent services

	If you're going to follow this model, then in step #2 you can 
send queries to ntpd to see what its status is, or you could just
monitor the clock and see if/when there is a large step.  Or, you 
could monitor syslog output, or look for changes in the ntpd.pid 
file, or any number of other things.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>
Reply Brad 2/4/2005 12:32:08 AM

At 12:30 AM +0000 2005-02-04, Tom Smith wrote:

>  1) step the clock to a "good enough" time
>        do this within a few seconds
>        block while doing it
>        don't touch/change the pre-computed drift while doing this
>  2) start ntpd
>  3) start time-dependent services

	Thinking about this some more, what I hear you saying is that 
ntpd should not background itself until such time as it has 
calculated the initial offset and stepped the clock as necessary to 
get you within normal slew distance, at which point it backgrounds 
itself and continues normal startup operations, starts working on 
calculating/updating the drift, etc....

	At least, it should have this as an optional startup mode, 
perhaps as a part of "-g".

-- 
Brad Knowles, <brad@stop.mail-abuse.org>
Reply Brad 2/4/2005 12:59:00 AM

At 2:23 PM -0600 2005-02-03, Kenneth Porter wrote:

>  What about mobile clients? In the mobile environment, what does ntpd do
>  when you sever the network connection (ie. undock)?

	The current version of ntpd is not well-suited for use with 
mobile clients.  It assumes that your IP address does not change.  It 
assumes that your local network latency is pretty constant, and any 
variation in network latency is largely due to WAN issues.

	It assumes that there is just one absolute "right" canonical 
time, and that all servers are closer or farther away from that, and 
that its job is to try to figure out which server is currently the
closest (using long-term statistical data) and then to make that one 
the syspeer.

	It assumes a whole host of things that are not suitable to a 
mobile environment.

>                                                         Suppose I undock
>  (taking down eth1) and plug in down the hall with the built-in NIC
>  (bringing up eth0).

	You may no longer be anywhere "close" to the upstream time 
servers you had previously configured, and may have to tear down all 
your server associations and put up all new ones.

	Any time you switch interfaces, get a new IP address, or any of 
the other things that are typical for mobile environments, you're 
basically looking at a complete stop and restart, if not a complete 
stop, re-configure (presumably with totally different servers), and 
re-start.

>                      What must one do to make ntpd tolerant of that?

	I'm not convinced that is possible.  At least, not in the way 
you're thinking of.

>  Or must mobile apps give up quality time because their network
>  interfaces are transient?

	I think you have to assume that a mobile client would have to be 
a lot more dependent on the local network services that are provided
wherever they are, and the DHCP server to tell you what the 
appropriate time servers are for you to use, etc....  Then you stop 
ntpd, throw away everything you previously had, completely 
re-configure with the new information, and restart ntpd.

>                             Would one need to script a complete
>  stop/start of ntpd whenever interfaces come and go?

	Yup.
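
A minimal sketch of what such a script might do, in Python. It assumes the
DHCP client hands over a list of time server addresses when an interface
comes up, rewrites ntp.conf with "server ... iburst" lines, and then calls a
restart routine supplied by the caller. The function names, the drift file
path, and the idea of passing the restart action in as a callable are all
illustrative; how you actually stop and start ntpd depends on your init
system.

```python
def render_ntp_conf(servers, driftfile="/var/lib/ntp/ntp.drift"):
    """Build ntp.conf text from the servers supplied for this network."""
    lines = ["driftfile " + driftfile]
    for host in servers:
        # iburst shortens the initial synchronization after the restart
        lines.append("server " + host + " iburst")
    return "\n".join(lines) + "\n"

def on_interface_change(servers, conf_path, restart):
    """Rewrite the config and restart ntpd; `restart` comes from the caller."""
    if not servers:
        return False      # nothing offered by DHCP: leave ntpd alone
    with open(conf_path, "w") as f:
        f.write(render_ntp_conf(servers))
    restart()
    return True
```

The drift file is deliberately kept across restarts here, since the
oscillator's frequency error belongs to the machine and not to the network
it happens to be plugged into.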

-- 
Brad Knowles, <brad@stop.mail-abuse.org>
Reply Brad 2/4/2005 1:12:33 AM

At 2:20 PM -0600 2005-02-03, Kenneth Porter wrote:

>>       If you want to make that delay shorter, I guess you could package
>>  Stratum 1 refclocks with every machine.
>
>  Given the cheap price of a consumer GPS receiver, this doesn't sound that
>  crazy. How fast can ntpd lock on to the NMEA messages?

	Cheap GPS receivers don't do NMEA.  Those that do NMEA don't give 
you a PPS signal, so they're pretty much useless in this role.  You 
have to look very carefully at GPS devices before you can be sure 
that you've found one that will be able to give you a good time 
reference.

	For example, the Motorola Oncore 12 might be a piece of crap, 
while the Motorola Oncore 12+ might be good.  Or the Garmin 18 LCS 
might be good, but all the other Garmin 18 models might be garbage. 
You've got to know what you're looking for.


	Please note that I don't have any specific knowledge of which 
models are good or bad, I just pulled these examples out of the air.

	You need to do your own research to ensure that you've got a GPS 
device that will be useful as input to a proposed Stratum 1 time 
source.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>
Reply Brad 2/4/2005 1:15:57 AM

Brad Knowles wrote:
> At 12:30 AM +0000 2005-02-04, Tom Smith wrote:
> 
>>  1) step the clock to a "good enough" time
>>        do this within a few seconds
>>        block while doing it
>>        don't touch/change the pre-computed drift while doing this
>>  2) start ntpd
>>  3) start time-dependent services
> 
> 
>     Thinking about this some more, what I hear you saying is that ntpd 
> should not background itself until such time as it has calculated the 
> initial offset and stepped the clock as necessary to get you within 
> normal slew distance, at which point it backgrounds itself and continues 
> normal startup operations, starts working on calculating/updating the 
> drift, etc....
> 
>     At least, it should have this as an optional startup mode, perhaps 
> as a part of "-g".
> 

Bingo. And DON'T TOUCH THE DRIFT until the second phase starts.

I'd also say that should always be the startup mode, -g or no -g
(or at least -g should be implied during that first phase, whether
or not it's requested for the "permanent" phase).

-Tom
Reply Tom 2/4/2005 1:18:14 AM

On 2005-02-04, Tom Smith <smith@cag.lkg.hp.com> wrote:

> Putting ntpdate into the boot sequence means that time will be however
> long it takes for ntpdate to complete, exactly that long, and no longer
> and that when ntpdate completes the rest of your boot sequence, including
> ntpd is "safe". That's why people are unhappy about ntpdate being
> removed without an equivalent way of doing the same thing with
> ntpd:

'ntpd -gq' blocks just like ntpdate. Try it and you'll see. The
difference is that ntpd does a bit more work before it sets the clock
and 'ntpd -gq' can use NTP authentication.

> 1) step the clock to a "good enough" time
>        do this within a few seconds
>        block while doing it
>        don't touch/change the pre-computed drift while doing this

If a drift file exists ntpd uses its contents. Any updates to the drift
file are not written out until ntpd has been running for an hour.

> 2) start ntpd
> 3) start time-dependent services
>
> It would certainly be possible for ntpd to do this all by itself,

It is.

> but unless I misunderstand, the only way to do this with just
> ntpd at the moment is:
>
> 1) ntpd -gq
> 2) sleep [guess how long ntpd -gq will take, worst case, to set the clock]
>     -or-
>     spin on ps looking for ntpd to start and then end

That's not necessary because 'ntpd -gq' blocks just like ntpdate ... as
long as the init script doesn't background it.

-- 
Steve Kostecke <kostecke@ntp.isc.org>
NTP Public Services Project - http://ntp.isc.org/
Reply Steve 2/4/2005 2:03:50 AM

Brad & Co.,

I make no value judgements in any form here, but I do observe just about 
every motherboard on the planet today has a TOY clock which is updated 
occasionally by the operating system. So, a machine coming up to sell an 
Airbus is probably pretty close, at least within the second if the TOY 
time since the last update was not too long and in that case you might 
have lost a large number of Airbus sales anyway.

Here's how to sell more Airba. The operating system should write the TOY 
time and offset within the second to a file from time to time keeping a 
history of at least the last two updates. Assuming the motherboard is 
not tossed in the frigid sea or boiling desert, the operating system (or 
NTP) can retrieve these values, compute the time and current offset and 
set the clock with good accuracy.
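
One possible reading of that scheme, sketched in Python: each saved record
is a pair (TOY reading in seconds, measured offset in seconds, taken as true
time minus TOY time). The two most recent records give the TOY clock's drift
rate, which is extrapolated linearly to the TOY reading at boot. The record
layout and the function name are assumptions for illustration, and the
linear model only holds if the temperature has stayed reasonably steady.

```python
def predict_time(rec_old, rec_new, toy_now):
    """Each record is (toy_seconds, offset_seconds), offset = true - TOY."""
    toy1, off1 = rec_old
    toy2, off2 = rec_new
    if toy2 <= toy1:
        raise ValueError("records must be in increasing TOY time")
    rate = (off2 - off1) / (toy2 - toy1)      # TOY drift, seconds per second
    offset_now = off2 + rate * (toy_now - toy2)
    # Corrected time estimate and the offset that was applied to get it.
    return toy_now + offset_now, offset_now
```

The first returned value is what the clock would be set to at boot; the
second is the estimated current TOY error, which could be logged or compared
against the first network sample once one arrives.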

Dave

Reply David 2/4/2005 2:15:44 AM

Kenneth,

I have no idea what you are assuming about start/stop. The scheme I 
mentioned has nothing to do with that, just the number of responses 
necessary to set the clock. The only practical way to discipline the time
when hibernating or changing wifi cards is using the TOY chip and this 
can be quite accurate if the scheme I mentioned in my last is adopted.

PLEASE NOTE: There have been a number of changes since the 4.2.0 
distribution, which is now well over a year old. Some of the schemes I 
have mentioned here are only in recent versions.

Dave

Kenneth Porter wrote:

> "David L. Mills" <mills@udel.edu> wrote in news:cttv9l$ql7$1
> @dewey.udel.edu:
> 
> 
>>It is true that ntpd is specifically engineered for Internet badlands 
>>where popcorn spikes, evil masqueraders, misbehaving clocks and other 
>>vermin might poison your DNS cache. A client using the pool servers 
>>scheme really needs this ammunition.
> 
> 
> What about mobile clients? In the mobile environment, what does ntpd do 
> when you sever the network connection (ie. undock)? Suppose I undock 
> (taking down eth1) and plug in down the hall with the built-in NIC 
> (bringing up eth0). What must one do to make ntpd tolerant of that? Or must 
> mobile apps give up quality time because their network interfaces are 
> transient? Would one need to script a complete stop/start of ntpd whenever 
> interfaces come and go?
Reply David 2/4/2005 2:24:15 AM

Tom,

Hangups such as you describe are exactly what the simulator is designed to 
reveal and it has revealed them from time to time as folks learn new 
ways to misconfigure and warp the hardware and new operating system 
violations of the Principle of Least Astonishment occur. I try to keep 
ahead of those as they pop up, but do confirm that any combination of 
broken frequency file and initial clock error does damp out eventually, 
although sometimes with behavior like a pinball machine. Emphasis added: 
I can confirm all atrocities found do damp out only on the latest 
(development) version.

There are a great many divergent views on what to expect of the NTP 
model, some very contradictory and unworkable in a specific combination 
of ntpd and j-random operating system. Solaris 2.7 comes to mind where 
the CPU clock frequency was determined by the kernel in error and far 
beyond the tolerance of ntpd. I've had to simulate all of these things, 
including a system with a clock resolution of one second (sic) (and it 
works). It could be your systems suffer from one or another of such ills 
or simply that you are using an older version not yet recently tamed. It 
could be your operating system has the same ill-mannered behavior as 
current Solaris adjtime(). With large time adjustments, this turns ntpd 
into a megawatt oscillator.

Dave

Tom Smith wrote:

> David L. Mills wrote:
> 
>> I get nervous about nonquantitative statements, since they might start 
>> urban legends. A "decent" frequency file is one created when first 
>> starting ntpd without the file and letting it determine the intrinsic 
>> frequency error. This takes about fifteen minutes. However, the 
>> frequency file itself is written only after the first hour and at 
>> hourly intervals after that. The discipline should be stable even if 
>> the frequency file is present and intentionally set as much as +-500 
>> PPM in error and that even with a large initial time offset. This has 
>> been confirmed by simulation; however, the simulations assume the 
>> adjtime() system call operates as in original Unix model; the Solaris 
>> adjtime() is a killer when large offsets are involved.
> 
> 
> A physicist I worked with early in my career taught me a very
> useful law. "Different things vary."
> 
> I couldn't tell you how many ntp.drift files I've encountered
> with a value of +-500.000. It's a lot. There are many ways this
> can occur, but all of them involve ntpd starting up against a large
> offset with its reference clocks and/or shutting down while it is
> working one off, the latter usually because of an NTP misconfiguration,
> but also sometimes because of thunderstorms in July. Others have
> also observed how this happens on mobile systems that get booted
> and shut down a lot.
> 
> Once a system is in this state, it depends on the specifics of the OS
> how long or even if that system will "settle". I can assure you that
> for some systems, if not most, this is most assuredly not 15 minutes,
> might be days, or might, for all practical purposes, be never. These
> are the systems on which the ordinary non-NTP-expert system
> manager or field support team will go through several rounds of
> battery or crystal or motherboard or system replacement before
> anyone tells them to just delete the drift file and start over.
> 
> ________________________________________________________________________
> Tom Smith                       smith@alum.mit.edu,smith@cag.lkg.hp.com
> Hewlett-Packard Company                          Tel: +1 (603) 884-6329
> 110 Spit Brook Road ZKO1-3/H42                   FAX: +1 (603) 884-6484
> Nashua, New Hampshire 03062-2698, USA           Mobile: +1 978 397 3411
Reply David 2/4/2005 2:44:50 AM

Kit,

At least until recently USNO was using GPS radios and VME interfaces on 
HP machines with excellent results. You can easily find out whether the 
CPU clock is the culprit by starting ntpd with a "disable ntp" in the 
configuration file and watching another server. Compute the intrinsic 
frequency error from the change in offsets over a few hours. A fix that 
works even with older untamed versions is to craft a frequency file with 
the value computed.

Dave
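
The frequency computation described above can be sketched in a few lines. This is only an illustration of the arithmetic, not anything taken from ntpd itself: the function name and the sample offsets are made up, and the sign convention (whether a growing offset means a fast or a slow clock) depends on how your tool reports offsets, so check it before writing the result into a frequency file.

```python
def frequency_error_ppm(offset_start_s, offset_end_s, elapsed_s):
    """Average frequency error in PPM from two offset readings.

    offset_start_s / offset_end_s: offsets (seconds) against the watched
    server at the start and end of the observation window.
    elapsed_s: length of the window in seconds.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    # Change in offset per second of elapsed time, scaled to parts per million.
    return (offset_end_s - offset_start_s) / elapsed_s * 1e6


# Hypothetical readings: the offset grows from 10 ms to 370 ms over two hours.
ppm = frequency_error_ppm(0.010, 0.370, 2 * 3600)
print(f"{ppm:.3f} PPM")
```

A longer observation window averages out more measurement noise, which is why "a few hours" is suggested rather than a few minutes.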

Kit Plummer wrote:

> How timely this thread is.  Given my volatile NTP situation...I am  
> starting to believe that using a VME-based GPS source as a reference  
> clock to my NTP daemon in a VME-based SPARC SBC isn't a good idea.  It  
> has been one thing after another...and the "clients" are not getting  
> good time from their sole-source server.  The main problem that I've  
> encountered is +500 PPM and steps during operations - which has  
> effectively ruined my data.  So...on to plan B???
> 
> I am wondering if a simple, static network time server is a better  
> option...guess so at this point.
> 
> Thanks for the info.
> 
> Kit
> 
> ------------------------------------------------------------------------
> Kit Plummer
> Operations Research and System Performance Dept.
> Raytheon Missile Systems
> 
> On Feb 3, 2005, at 2:13 PM, Tom Smith wrote:
> 
>> David L. Mills wrote:
>>
>>> I get nervous about nonquantitative statements, since they might  
>>> start urban legends. A "decent" frequency file is one created when  
>>> first starting ntpd without the file and letting it determine the  
>>> intrinsic frequency error. This takes about fifteen minutes. 
>>> However,  the frequency file itself is written only after the first 
>>> hour and at  hourly intervals after that. The discipline should be 
>>> stable even if  the frequency file is present and intentionally set 
>>> as much as +-500  PPM in error and that even with a large initial 
>>> time offset. This has  been confirmed by simulation; however, the 
>>> simulations assume the  adjtime() system call operates as in original 
>>> Unix model; the Solaris  adjtime() is a killer when large offsets are 
>>> involved.
>>
>>
>> A physicist I worked with early in my career taught me a very
>> useful law. "Different things vary."
>>
>> I couldn't tell you how many ntp.drift files I've encountered
>> with a value of +-500.000. It's a lot. There are many ways this
>> can occur, but all of them involve ntpd starting up against a large
>> offset with its reference clocks and/or shutting down while it is
>> working one off, the latter usually because of an NTP misconfiguration,
>> but also sometimes because of thunderstorms in July. Others have
>> also observed how this happens on mobile systems that get booted
>> and shut down a lot.
>>
>> Once a system is in this state, it depends on the specifics of the OS
>> how long or even if that system will "settle". I can assure you that
>> for some systems, if not most, this is most assuredly not 15 minutes,
>> might be days, or might, for all practical purposes, be never. These
>> are the systems on which the ordinary non-NTP-expert system
>> manager or field support team will go through several rounds of
>> battery or crystal or motherboard or system replacement before
>> anyone tells them to just delete the drift file and start over.
>>
>> ________________________________________________________________________
>> Tom Smith                       smith@alum.mit.edu,smith@cag.lkg.hp.com
>> Hewlett-Packard Company                          Tel: +1 (603) 884-6329
>> 110 Spit Brook Road ZKO1-3/H42                   FAX: +1 (603) 884-6484
>> Nashua, New Hampshire 03062-2698, USA           Mobile: +1 978 397 3411
>>
> 
Reply David 2/4/2005 2:52:12 AM

Alain,

No, don't use ntpdate at all. Use ntpd -g with the tos maxdist 16 if you 
want instant synchronization. There are several combinations of the tos 
command that could be used to modify behavior, such as the minsane and 
minclocks arguments. To determine what they do you have to understand 
the icky algorithms; however, after studying the briefings at the 
project page and understanding how the algorithms work, it should be 
fairly obvious.

Dave

Alain wrote:

> 
> 
> Tom Smith wrote:
> 
>> David L. Mills wrote:
>>
>>> I get nervous about nonquantitative statements, since they might 
>>> start urban legends. A "decent" frequency file is one created when 
>>> first starting ntpd without the file and letting it determine the 
>>> intrinsic frequency error. This takes about fifteen minutes. However, 
>>> the frequency file itself is written only after the first hour and at 
>>> hourly intervals after that. The discipline should be stable even if 
>>> the frequency file is present and intentionally set as much as +-500 
>>> PPM in error and that even with a large initial time offset. This has 
>>> been confirmed by simulation; however, the simulations assume the 
>>> adjtime() system call operates as in original Unix model; the Solaris 
>>> adjtime() is a killer when large offsets are involved.
>>
>>
>>
>> A physicist I worked with early in my career taught me a very
>> useful law. "Different things vary."
>>
>> I couldn't tell you how many ntp.drift files I've encountered
>> with a value of +-500.000. It's a lot. There are many ways this
>> can occur, but all of them involve ntpd starting up against a large
>> offset with its reference clocks and/or shutting down while it is
>> working one off, the latter usually because of an NTP misconfiguration,
>> but also sometimes because of thunderstorms in July. Others have
>> also observed how this happens on mobile systems that get booted
>> and shut down a lot.
>>
>> Once a system is in this state, it depends on the specifics of the OS
>> how long or even if that system will "settle". I can assure you that
>> for some systems, if not most, this is most assuredly not 15 minutes,
>> might be days, or might, for all practical purposes, be never. These
>> are the systems on which the ordinary non-NTP-expert system
>> manager or field support team will go through several rounds of
>> battery or crystal or motherboard or system replacement before
>> anyone tells them to just delete the drift file and start over.
> 
> 
> So, if I manage to set the initial time within a good approximation of 
> "real" time using ntpdate with 5 servers as explained earlier, would 
> you recommend deleting the drift file and letting it start all over again?
> 
> Does this affect the time for the server to start serving?
> 
> Alain
Reply David 2/4/2005 4:13:40 AM

Steve Kostecke wrote:
> 'ntpd -gq' blocks just like ntpdate. Try it and you'll see. The
> difference is that ntpd does a bit more work before it sets the clock
> and 'ntpd -gq' can use NTP authentication.

You're absolutely right. My mistake.

>>It would certainly be possible for ntpd to do this all by itself,
> 
> 
> It is.
> 

Yes, I wasn't clear. I meant that a single invocation of
ntpd could do this all by itself, as suggested by Brad.

The remaining issue, then, is that of the time required.

On a system already within less than a millisecond of nearly
all of its servers:

# time ntpd -gq [13 servers in ntp.conf]
ntpd: time slew -0.000373s

real    1m43.03s
user    0m0.10s
sys     0m0.60s

# time ntpdate -b [three selected servers]
  3 Feb 22:44:45 ntpdate[186032]: step time server [IP address] offset -0.000157 sec

real    0m0.90s
user    0m0.00s
sys     0m0.00s

I'm sure the time would be less with ntpd with fewer servers,
provided that all of them were up at boot time.

-Tom
________________________________________________________________________
Tom Smith                       smith@alum.mit.edu,smith@cag.lkg.hp.com
Hewlett-Packard Company                          Tel: +1 (603) 884-6329
110 Spit Brook Road ZKO1-3/H42                   FAX: +1 (603) 884-6484
Nashua, New Hampshire 03062-2698, USA           Mobile: +1 978 397 3411
Reply Tom 2/4/2005 4:21:19 AM

Tom,

The code I see in the ntpdate source does an adjtime() for all offsets, 
even large ones. I don't see a settimeofday() or equivalent. Thus, if 
you run ntpdate and it produces a large correction (maybe a second or 
more), you should wait until that adjustment is made before starting 
ntpd. That's about 2000 s of wait for a 1-s adjustment with stock Unix 
kernels and slew rate 500 PPM. You wouldn't have to wait that long for a 
Solaris kernel, but you would have to wait. Why not give up and use the 
-g option?

Dave
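
The 2000 s figure follows directly from dividing the offset by the slew rate. A minimal sketch of that arithmetic, assuming a constant slew rate for the whole adjustment (the function name is made up for illustration, and kernels that slew faster, as mentioned for Solaris, would need a different rate passed in):

```python
def slew_time_seconds(offset_s, slew_rate_ppm=500.0):
    """Time needed to amortize an offset at a constant slew rate.

    offset_s: residual clock offset in seconds (sign is ignored).
    slew_rate_ppm: slew rate in parts per million; 500 PPM is the
    stock Unix figure quoted above.
    """
    if slew_rate_ppm <= 0:
        raise ValueError("slew_rate_ppm must be positive")
    return abs(offset_s) / (slew_rate_ppm * 1e-6)


# A 1-s adjustment at 500 PPM, as in the message above.
print(f"{slew_time_seconds(1.0):.0f} s")
```

The same function shows why even modest offsets are expensive to slew: 100 ms at 500 PPM still takes about 200 s.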

Tom Smith wrote:

> Alain wrote:
> 
>> So, if I manage to set the initial time within a good approximation of 
>> "real" time using ntpdate with 5 servers as explained earlier, would 
>> you recommend deleting the drift file and letting it start all over again?
>>
>> Does this affect the time for the server to start serving?
> 
> 
> I assume you asked this of me, rather than of Dave. Your method
> seems to me to be excellent.
> 
> If you do that, there should be no need to clean out the
> characteristic drift that was so carefully computed over a long
> period of time. Cleaning out the drift file is a drastic measure
> required only if you do NOT pre-set the clock before starting ntpd
> and you end up, as a result, with a bogus re-calculated drift rate.
> 
> What you have to calculate into the boot time is the time you have
> to block while stepping the clock to the "right" time in the first place.
> Putting ntpdate into the boot sequence means that time will be however
> long it takes for ntpdate to complete, exactly that long, and no longer
> and that when ntpdate completes the rest of your boot sequence, including
> ntpd is "safe". That's why people are unhappy about ntpdate being
> removed without an equivalent way of doing the same thing with
> ntpd:
> 
> 1) step the clock to a "good enough" time
>       do this within a few seconds
>       block while doing it
>       don't touch/change the pre-computed drift while doing this
> 2) start ntpd
> 3) start time-dependent services
> 
> It would certainly be possible for ntpd to do this all by itself,
> but unless I misunderstand, the only way to do this with just
> ntpd at the moment is:
> 
> 1) ntpd -gq
> 2) sleep [guess how long ntpd -gq will take, worst case, to set the clock]
>    -or-
>    spin on ps looking for ntpd to start and then end
> 3) start ntpd [normal operation options]
> 4) start time-dependent services
> 
> Which is not the same thing.
> 
> ________________________________________________________________________
> Tom Smith                       smith@alum.mit.edu,smith@cag.lkg.hp.com
> Hewlett-Packard Company                          Tel: +1 (603) 884-6329
> 110 Spit Brook Road ZKO1-3/H42                   FAX: +1 (603) 884-6484
> Nashua, New Hampshire 03062-2698, USA           Mobile: +1 978 397 3411
Reply David 2/4/2005 4:25:13 AM

Brad,

Why do you need to wait? The ntpd will exit when the clock is set, just 
like ntpdate and, like ntpdate, it will either step the clock or leave 
the residual adjustment in the kernel to be amortized over whatever 
period the particular kernel supports. This could be a considerable 
interval, just like with ntpdate.

Dave
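
Since 'ntpd -gq' blocks until the clock is set and then exits, the boot sequence being discussed needs no sleep or polling step. A rough sketch of that ordering follows. Everything beyond the -g and -q options is an assumption made for illustration: the function name, the 120-second timeout, the idea that exit status 0 means the clock was set, and the expectation that the second invocation returns once the daemon has detached. A real init script would use the distribution's own service machinery.

```python
import subprocess


def boot_time_sync(run=subprocess.run, timeout_s=120):
    """Set the clock once with 'ntpd -gq', then start ntpd normally.

    run: callable with the subprocess.run interface (injectable so the
    sequence can be exercised without a real ntpd).
    timeout_s: hypothetical upper bound on how long boot may block.
    Returns True if the one-shot run finished with exit status 0.
    """
    try:
        # Blocks until ntpd has set the clock and exited, or until timeout.
        first = run(["ntpd", "-gq"], timeout=timeout_s)
        clock_set = first.returncode == 0
    except subprocess.TimeoutExpired:
        # Give up waiting and let the daemon sort things out on its own.
        clock_set = False
    # Normal operation; time-dependent services would be started after this.
    run(["ntpd", "-g"])
    return clock_set
```

The timeout matters because the one-shot run can take a while: the 13-server measurement quoted earlier in the thread took 1m43s.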

Brad Knowles wrote:

> At 12:30 AM +0000 2005-02-04, Tom Smith wrote:
> 
>>  1) ntpd -gq
>>  2) sleep [guess how long ntpd -gq will take, worst case, to set the 
>> clock]
>>     -or-
>>     spin on ps looking for ntpd to start and then end
>>  3) start ntpd [normal operation options]
>>  4) start time-dependent services
> 
> 
>     If you're going to follow this model, then in step #2 you can send 
> queries to ntpd to see what its status is, or you could just monitor 
> the clock and see if/when there is a large step.  Or, you could monitor 
> syslog output, or look for changes in the ntpd.pid file, or any number 
> of other things.
> 
Reply David 2/4/2005 4:29:16 AM

Tom,

I don't think you have the model right. The ntpd never "backgrounds 
itself", but keeps on ticking according to a state machine which 
controls whether or not to do a direct frequency measurement rather than 
the usual incremental feedback loop, which depends on whether the 
frequency file is present. The only thing the -g does is exit the daemon 
when the clock is first set.

Dave

Tom Smith wrote:

> Brad Knowles wrote:
> 
>> At 12:30 AM +0000 2005-02-04, Tom Smith wrote:
>>
>>>  1) step the clock to a "good enough" time
>>>        do this within a few seconds
>>>        block while doing it
>>>        don't touch/change the pre-computed drift while doing this
>>>  2) start ntpd
>>>  3) start time-dependent services
>>
>>
>>
>>     Thinking about this some more, what I hear you saying is that ntpd 
>> should not background itself until such time as it has calculated the 
>> initial offset and stepped the clock as necessary to get you within 
>> normal slew distance, at which point it backgrounds itself and 
>> continues normal startup operations, starts working on 
>> calculating/updating the drift, etc....
>>
>>     At least, it should have this as an optional startup mode, perhaps 
>> as a part of "-g".
>>
> 
> Bingo. And DON'T TOUCH THE DRIFT until the second phase starts.
> 
> I'd also say that should always be the startup mode, -g or no -g
> (or at least -g should be implied during that first phase, whether
> or not it's requested for the "permanent" phase).
> 
> -Tom
> ________________________________________________________________________
> Tom Smith                       smith@alum.mit.edu,smith@cag.lkg.hp.com
> Hewlett-Packard Company                          Tel: +1 (603) 884-6329
> 110 Spit Brook Road ZKO1-3/H42                   FAX: +1 (603) 884-6484
> Nashua, New Hampshire 03062-2698, USA           Mobile: +1 978 397 3411
Reply David 2/4/2005 4:39:00 AM

Wolfgang S. Rupprecht wrote:

> Brad Knowles <brad@stop.mail-abuse.org> writes:
> 
>>	If you want to make that delay shorter, I guess you could
>>package Stratum 1 refclocks with every machine.
> 
> 
> I'd be waiting for minutes if I waited till ntpd decided it had a
> "good enough" estimate of the Motorola Oncore's time.  Ntpd *could*
> have its first time estimate accurate to well under 1ms in half a
> second on average.  The problem is, it won't tell you the time or step
> the system clock until it filters the crap out of the refclock signal.
> 
> Ntpd has a hard act to follow.  The old ntpdate program was a near
> ideal solution from the perspective of booting.  It did its job
> quickly and got off the pot.

Except the few times when it messed up horribly:

I had one of my GPS-based stratum 1 servers go bad on me a few years 
ago, and found out the hard way that _one_ critical server in our 
infrastructure had been configured to use this particular server as the 
only ntpdate reference:

It was rebooted during the time interval before the failing ntp/gps 
server was located and turned off, with the result that this unix db 
machine came up with a wildly wrong date.

After ntpdate had run, ntpd was started, but even though it had been 
configured with four server lines, only the temporarily broken server 
used by ntpdate was sufficiently close to be accepted, all the others 
were deemed falsetickers.

A perfect ntpdate replacement needs to locate all or most of all the 
configured servers, and run at least a couple of packets to each server, 
and wait until a plurality have agreed on what the time is. This can be 
handled in an absolute minimum of about 3-5 seconds without polling each 
server more often than every two seconds.

If you configure fewer servers, then you'd need more packets to each 
server before a sufficient reach value could be achieved for the set of 
servers that agree.

The same goes in case of one or more falsetickers of course: They must 
be detected and filtered out, which requires more packets to all the 
servers than if they all agree.

Prof. Mills' latest tweak, allowing you to tune (i.e. reduce) the 
required quality of the initial time estimate could be used to balance 
your need for believable time vs time to achieve first sync.

Terje
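
The "wait until a plurality have agreed" idea can be illustrated with a deliberately simplified sketch. This is not ntpd's actual selection or clustering algorithm; it just looks for the largest group of servers whose measured offsets fit inside one window and refuses to pick a time unless that group is a strict majority. The function names, the 128 ms window, and the sample offsets are all assumptions made for the example.

```python
def largest_agreeing_group(offsets, tolerance_s):
    """Return names of the largest set of servers whose offsets all lie
    within tolerance_s of each other.

    offsets: dict mapping server name -> measured offset in seconds.
    """
    ordered = sorted(offsets.items(), key=lambda kv: kv[1])
    best = []
    start = 0
    for end in range(len(ordered)):
        # Shrink the window from the left until it fits the tolerance.
        while ordered[end][1] - ordered[start][1] > tolerance_s:
            start += 1
        if end - start + 1 > len(best):
            best = ordered[start:end + 1]
    return [name for name, _ in best]


def initial_offset(offsets, tolerance_s=0.128):
    """Mean offset of the agreeing majority, or None if there is none."""
    group = largest_agreeing_group(offsets, tolerance_s)
    if len(group) * 2 <= len(offsets):
        return None  # no strict majority: do not trust any of them yet
    return sum(offsets[name] for name in group) / len(group)


# Hypothetical readings: three sane servers and one broken GPS server.
readings = {"a": 0.002, "b": -0.001, "c": 0.004, "gps": 86400.0}
print(initial_offset(readings))
```

With a single configured server, as in the ntpdate incident described above, no such cross-check is possible, which is exactly how the broken server got to set the time.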

-- 
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
Reply Terje 2/4/2005 7:31:52 AM

At 4:21 AM +0000 2005-02-04, Tom Smith wrote:

>  # time ntpd -gq [13 servers in ntp.conf]
>  ntpd: time slew -0.000373s
>
>  real    1m43.03s
>  user    0m0.10s
>  sys     0m0.60s

	Yes, but what does that ntp.conf look like?  Are you using 
iburst?  Any authentication?  Manually setting minpoll and/or 
maxpoll?  Did you have a good drift file to start with?

	You need to provide some more specifics before you can make an 
attempt to compare this to ntpdate.

>  # time ntpdate -b [three selected servers]
>   3 Feb 22:44:45 ntpdate[186032]: step time server [IP address] offset -0.000157 sec

	Not comparable.  You need to include all thirteen servers before 
this could potentially be considered comparable.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>

"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."

     -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
     Assembly to the Governor, November 11, 1755

   SAGE member since 1995.  See <http://www.sage.org/> for more info.
Reply Brad 2/4/2005 8:50:43 AM

At 8:18 PM -0500 2005-02-03, Tom Smith wrote:

>  I'd also say that should always be the startup mode, -g or no -g
>  (or at least -g should be implied during that first phase, whether
>  or not it's requested for the "permanent" phase).

	No.  The point of "-g" is that the admin has to explicitly 
request that ntpd be allowed to make large-scale changes, and ntpd 
should not assume that it can do that unless specifically requested.

	No.  Absolutely not.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>

"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."

     -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
     Assembly to the Governor, November 11, 1755

   SAGE member since 1995.  See <http://www.sage.org/> for more info.
Reply Brad 2/4/2005 8:52:09 AM

Brad Knowles <brad@stop.mail-abuse.org> wrote in 
news:mailman.36.1107479858.583.questions@lists.ntp.isc.org:

> At 2:20 PM -0600 2005-02-03, Kenneth Porter wrote: 
> 
>>>  If you want to make that delay shorter, I guess you could package 
>>>  Stratum 1 refclocks with every machine. 
>>
>>  Given the cheap price of a consumer GPS receiver, this doesn't sound 
>>  that crazy. How fast can ntpd lock on to the NMEA messages? 
>
> Cheap GPS receivers don't do NMEA.  Those that do NMEA don't give 
> you a PPS signal, so they're pretty much useless in this role.  You 
> have to look very carefully at GPS devices before you can be sure 
> that you've found one that will be able to give you a good time 
> reference.

Point taken, but I have a "looser" definition of "cheap", so a receiver 
that costs a couple hundred dollars would be acceptable for a system that 
was so critical.

How "bad" is the time from a GPS receiver with NMEA but no PPS? Is it 
sufficient to use for an initial setting (replacing ntpdate and ntpd -g)? 
IIRC these things typically issue an update once a second.

> 	For example, the Motorola Oncore 12 might be a piece of crap, 
> while the Motorola Oncore 12+ might be good.  Or the Garmin 18 LCS 
> might be good, but all the other Garmin 18 models might be garbage. 
> You've got to know what you're looking for.
> 
> 
> 	Please note that I don't have any specific knowledge of which 
> models are good or bad, I just pulled these examples out of the air.
> 
> 	You need to do your own research to ensure that you've got a GPS 
> device that will be useful as input to a proposed Stratum 1 time 
> source.

I got an eTrex Vista for xmas and will have to see what it's capable 
of....
Reply Kenneth 2/4/2005 1:52:03 PM


Brad Knowles wrote:
>>                                                         Suppose I undock
>>  (taking down eth1) and plug in down the hall with the built-in NIC
>>  (bringing up eth0).
> 
>     You may no longer be anywhere "close" to the upstream time servers 
> you had previously configured, and may have to tear down all your server 
> associations and put up all new ones.
> 
>     Any time you switch interfaces, get a new IP address, or any of the 
> other things that are typical for mobile environments, you're basically 
> looking at a complete stop and restart, if not a complete stop, 
> re-configure (presumably with totally different servers), and re-start.

Wouldn't normal operation cope with that? If it is a server at all, 
continue with calculated historical drift. If both sets of servers are 
in ntp.conf, after a while it will acquire sync again.

>>                      What must one do to make ntpd tolerant of that?
> 
>     I'm not convinced that is possible.  At least, not in the way you're 
> thinking of.

Maybe just be a bit more tolerant to a situation where all servers 
become unavailable?

And maybe just use longer times to calculate drift if connection times 
are not stable. Just a parameter adjustment maybe?

Alain
Reply Alain 2/4/2005 4:47:59 PM

Terje,

I am baffled by your comments. Your "perfect ntpdate replacement" 
described is precisely what ntpd -g does.

Dave

Terje Mathisen wrote:
> Wolfgang S. Rupprecht wrote:
> 
>> Brad Knowles <brad@stop.mail-abuse.org> writes:
>>
>>>     If you want to make that delay shorter, I guess you could
>>> package Stratum 1 refclocks with every machine.
>>
>>
>>
>> I'd be waiting for minutes if I waited till ntpd decided it had a
>> "good enough" estimate of the Motorola Oncore's time.  Ntpd *could*
>> have its first time estimate accurate to well under 1ms in half a
>> second on average.  The problem is, it won't tell you the time or step
>> the system clock until it filters the crap out of the refclock signal.
>>
>> Ntpd has a hard act to follow.  The old ntpdate program was a near
>> ideal solution from the perspective of booting.  It did its job
>> quickly and got off the pot.
> 
> 
> Except the few times when it messed up horribly:
> 
> I had one of my GPS-based stratum 1 servers go bad on me a few years 
> ago, and found out the hard way that _one_ critical server in our 
> infrastructure had been configured to use this particular server as the 
> only ntpdate reference:
> 
> It was rebooted during the time interval before the failing ntp/gps 
> server was located and turned off, with the result that this unix db 
> machine came up with a wildly wrong date.
> 
> After ntpdate had run, ntpd was started, but even though it had been 
> configured with four server lines, only the temporarily broken server 
> used by ntpdate was sufficiently close to be accepted, all the others 
> were deemed falsetickers.
> 
> A perfect ntpdate replacement needs to locate all or most of all the 
> configured servers, and run at least a couple of packets to each server, 
> and wait until a plurality have agreed on what the time is. This can be 
> handled in an absolute minimum of about 3-5 seconds without polling each 
> server more often than every two seconds.
> 
> If you configure fewer servers, then you'd need more packets to each 
> server before a sufficient reach value could be achieved for the set of 
> servers that agree.
> 
> The same goes in case of one or more falsetickers of course: They must 
> be detected and filtered out, which requires more packets to all the 
> servers than if they all agree.
> 
> Prof. Mills' latest tweak, allowing you to tune (i.e. reduce) the 
> required quality of the initial time estimate could be used to balance 
> your need for believable time vs time to achieve first sync.
> 
> Terje
> 
Reply David 2/4/2005 4:58:22 PM

At 2:47 PM -0200 2005-02-04, Alain wrote:

>>      Any time you switch interfaces, get a new IP address, or any of
>>  the other things that are typical for mobile environments, you're
>>  basically looking at a complete stop and restart, if not a complete
>>  stop, re-configure (presumably with totally different servers), and
>>  re-start.
>
>  Wouldn't normal operation cope with that? If it is a server at all,
>  continue with calculated historical drift. If both sets of servers
>  are in ntp.conf, after a while it will acquire sync again.

	The way ntpd works is that it looks up the IP address of the 
interfaces it is listening on when it boots, and then it explicitly 
binds to those IP addresses.  It never again looks up that 
information.  So, when your IP address changes, you have to restart 
ntpd.  You have the same problem if you switch interfaces.


	If you briefly lose connectivity, and you keep the same 
interface, and you keep the same IP address (maybe you were walking 
around in your house and were in an area that doesn't have good 
wireless coverage for a while), then you can help make recovery 
easier by configuring the LOCAL refclock.

	With a LOCAL refclock, if all the other servers go away, ntpd 
will at least continue running in degraded mode and using the latest 
calculated variables, and while the stratum value will drop (to 
whatever you fudged for the LOCAL refclock), it won't go out of 
state=4, and will start recovering as soon as you regain connectivity.

	If you don't have a LOCAL refclock, I'm not entirely certain what 
will happen, but it most likely won't be good and you will have to 
restart.
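
	For anyone who wants to try the LOCAL refclock arrangement, here is a small sketch that renders an ntp.conf fragment with it.  The directives themselves (server with iburst, the 127.127.1.0 local clock address, fudge with a stratum) are standard ntp.conf syntax; the helper name, the example host names, the drift file path, and the choice of stratum 10 are assumptions for illustration only.

```python
def render_ntp_conf(servers, local_stratum=10, driftfile="/etc/ntp.drift"):
    """Render an ntp.conf fragment with a LOCAL refclock as a fallback.

    servers: iterable of upstream server host names.
    local_stratum: stratum to fudge onto the LOCAL clock; keep it high
    (numerically) so real servers are always preferred.
    driftfile: path of the frequency file (hypothetical default).
    """
    lines = [f"driftfile {driftfile}"]
    # iburst speeds up the initial synchronization with each server.
    lines += [f"server {host} iburst" for host in servers]
    # 127.127.1.0 is the pseudo-address of the LOCAL (undisciplined) clock.
    lines.append("server 127.127.1.0")
    lines.append(f"fudge 127.127.1.0 stratum {local_stratum}")
    return "\n".join(lines) + "\n"


# Hypothetical upstream servers.
print(render_ntp_conf(["ntp1.example.org", "ntp2.example.org"]), end="")
```

	This only helps with brief outages on the same interface and address; it does nothing for the rebinding problem described above.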

>  Maybe just be a bit more tolerant to a situation where all servers
>  become unavailable?

	Within limits, you can address that with a LOCAL refclock.

	Of course, not all vendors ship ntpd compiled with support for a 
LOCAL refclock, so you may have to rebuild your own ntpd binary.

>  And maybe just use longer times to calculate drift if connection times
>  are not stable. Just a parameter adjustment maybe?

	You should leave that up to ntpd.  These algorithms have been 
developed the hard way over thirty years, and unless you're Einstein 
or Dr. Mills, you're unlikely to be able to improve on them.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>

"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."

     -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
     Assembly to the Governor, November 11, 1755

   SAGE member since 1995.  See <http://www.sage.org/> for more info.
Reply Brad 2/4/2005 5:02:42 PM

Terje,

Ick. replace my -g with -q. Mea stromboli.

Dave

David L. Mills wrote:

> Terje,
> 
> I am baffled by your comments. Your "perfect ntpdate replacement" 
> described is precisely what ntpd -g does.
> 
> Dave
> 
> Terje Mathisen wrote:
> 
>> Wolfgang S. Rupprecht wrote:
>>
>>> Brad Knowles <brad@stop.mail-abuse.org> writes:
>>>
>>>>     If you want to make that delay shorter, I guess you could
>>>> package Stratum 1 refclocks with every machine.
>>>
>>>
>>>
>>>
>>> I'd be waiting for minutes if I waited till ntpd decided it had a
>>> "good enough" estimate of the Motorola Oncore's time.  Ntpd *could*
>>> have its first time estimate accurate to well under 1ms in half a
>>> second on average.  The problem is, it won't tell you the time or step
>>> the system clock until it filters the crap out of the refclock signal.
>>>
>>> Ntpd has a hard act to follow.  The old ntpdate program was a near
>>> ideal solution from the perspective of booting.  It did its job
>>> quickly and got off the pot.
>>
>>
>>
>> Except the few times when it messed up horribly:
>>
>> I had one of my GPS-based stratum 1 servers go bad on me a few 
>> years ago, and found out the hard way that _one_ critical server in 
>> our infrastructure had been configured to use this particular server 
>> as the only ntpdate reference:
>>
>> It was rebooted during the time interval before the failing ntp/gps 
>> server was located and turned off, with the result that this unix db 
>> machine came up with a wildly wrong date.
>>
>> After ntpdate had run, ntpd was started, but even though it had been 
>> configured with four server lines, only the temporarily broken server 
>> used by ntpdate was sufficiently close to be accepted, all the others 
>> were deemed falsetickers.
>>
>> A perfect ntpdate replacement needs to locate all or most of all the 
>> configured servers, and run at least a couple of packets to each 
>> server, and wait until a plurality have agreed on what the time is. 
>> This can be handled in an absolute minimum of about 3-5 seconds 
>> without polling each server more often than every two seconds.
>>
>> If you configure fewer servers, then you'd need more packets to each 
>> server before a sufficient reach value could be achieved for the set 
>> of servers that agree.
>>
>> The same goes in case of one or more falsetickers of course: They must 
>> be detected and filtered out, which requires more packets to all the 
>> servers than if they all agree.
>>
>> Prof. Mills' latest tweak, allowing you to tune (i.e. reduce) the 
>> required quality of the initial time estimate could be used to balance 
>> your need for believable time vs time to achieve first sync.
>>
>> Terje
>>
0
Reply David 2/4/2005 5:21:08 PM

Answers to several messages:

----------
David L. Mills wrote:

 > Alain,
 > No, don't use ntpdate at all. Use ntpd -g with the tos maxdist 16 if
 > you want instant synchronization. There are several combinations of the
 > tos command that could be used to modify behavior, such as the minsane
 > and minclock arguments. To determine what they do you have to
 > understand the icky algorithms; however, after studying the briefings at
 > the project page and understanding how the algorithms work, it should be
 > fairly obvious.


The problem is that -g has a limit of 1000 seconds. My worst-case 
scenario (I have seen it a lot) includes dead batteries and initial times 
of 1/1/1980. Second worst is UTC misconfiguration, many hours off. And 
not to forget summer daylight saving time, which is more than 1000 s.

Even worse is that ntpd exits. This requires another daemon to check 
whether that happened :(

If ntpdate disappears, there should be *some* way of handling this.

-----------
Tom Smith wrote:
 >> So, if I manage to set the initial time to a good approximation of
 >> "real" time using ntpdate with 5 servers as explained earlier, would
 >> you recommend deleting the drift file and letting it start all over again?
 >
 > If you do that, there should be no need to clean out the
 > characteristic drift that was so carefully computed over a long
 > period of time. Cleaning out the drift file is a drastic measure
 > required only if you do NOT pre-set the clock before starting ntpd
 > and you end up, as a result, with a bogus re-calculated drift rate.

So I understand that testing the drift value and deleting the file only if 
it is ±500.00 is the more generic approach for all situations?

----------
Terje Mathisen wrote:
 > I had one of the my GPS-based stratum 1 server go bad on me a few years
 > ago, and found out the hard way that _one_ critical server in our
 > infrastructure had been configured to use this particular server as the
 > only ntpdate reference:
 >
 > It was rebooted during the time interval before the failing ntp/gps
 > server was located and turned off, with the result that this unix db
 > machine came up with a wildly wrong date.
 >
 > After ntpdate had run, ntpd was started, but even though it had been
 > configured with four server lines, only the temporarily broken server
 > used by ntpdate was sufficiently close to be accepted, all the others
 > were deemed falsetickers.
 >
 > A perfect ntpdate replacement needs to locate all or most of all the
 > configured servers, and run at least a couple of packets to each server,
 > and wait until a plurality have agreed on what the time is. This can be
 > handled in an absolute minimum of about 3-5 seconds without polling each
 > server more often than every two seconds.
 >
 > If you configure fewer servers, then you'd need more packets to each
 > server before a sufficient reach value could be achieved for the set of
 > servers that agree.
 >
 > The same goes in case of one or more falsetickers of course: They must
 > be detected and filtered out, which requires more packets to all the
 > servers than if they all agree.

Behind this is a misconfigured and forgotten setup. I believe a script 
that uses the many servers that are *in* ntp.conf would prevent that, 
especially since these servers will probably be checked periodically with 
ntpq -p.
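
The agreement check Terje describes (poll several servers and only accept a time that most of them agree on) can be sketched roughly as follows. This is a deliberately simplified illustration, not ntpd's actual selection algorithm: the function name `agreeing_offset` and the 0.128 s tolerance are assumptions made for the example, and it requires a strict majority rather than a mere plurality.

```python
from statistics import median

def agreeing_offset(offsets, tolerance=0.128):
    """Return the median offset of the largest group of servers that agree,
    or None if that group is not a strict majority of the servers polled.

    offsets   -- measured clock offsets in seconds, one per server
    tolerance -- hypothetical: how far apart offsets may be and still "agree"
    """
    best = []
    ordered = sorted(offsets)
    # For each offset, collect the values that lie within `tolerance`
    # above it; remember the largest such group.
    for i, low in enumerate(ordered):
        group = [x for x in ordered[i:] if x - low <= tolerance]
        if len(group) > len(best):
            best = group
    if len(best) * 2 <= len(ordered):
        return None  # no majority: keep polling rather than trust anyone
    return median(best)
```

With four servers of which one is an hour off, the three good ones outvote it; with only one good and one bad server there is no majority, and the sketch declines to pick either.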

-----------
I believe, then, that for a more general situation:
1) run ntpdate with many servers from ntp.conf (5 is reasonable)
2) check if the drift is ±500.000; if so, delete the drift file.
3) start ntpd
4) periodically check external (and internal) servers to see if a 
reasonable number of them are still there.
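
Step 2 above could look something like the sketch below. It assumes the drift file holds a single number in PPM (a healthy one in this thread reads -2.653) and that a value at the ±500 limit means the frequency estimate is no longer trustworthy. The function name `drift_is_pegged` is made up for the example, and treating an unreadable file as "pegged" is a design choice, not something ntpd prescribes.

```python
def drift_is_pegged(text, limit=500.0):
    """True if the drift value (in PPM) read from a drift file is at or
    beyond +/-limit, or cannot be parsed at all."""
    try:
        value = float(text.split()[0])
    except (IndexError, ValueError):
        return True  # empty or garbled file: treat as unusable
    return abs(value) >= limit
```

A boot script would read the contents of the drift file, pass them to this check, and remove the file before starting ntpd only when it returns True.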

Alain
0
Reply Alain 2/4/2005 5:33:18 PM

On 2005-02-04, Alain <alainm@pobox.com> wrote:

> The problem is that -g has a limit of 1000 seconds.

No. -g allows ntpd to EXCEED the sanity limit of 1000 seconds.

-q by itself is subject to the 1000 second limit.

-gq is not.
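
To make the three cases concrete, here is a small model of the rule as stated in this message. It is only a sketch of the decision, not ntpd code: the names are invented for the example, and -q is left out because, per the above, it only makes ntpd exit after setting the clock and does not change the limit.

```python
PANIC_THRESHOLD_S = 1000.0  # default panic threshold quoted in this thread

def first_correction_allowed(offset_s, g_flag):
    """Model whether ntpd would apply its first correction or exit.

    With -g the first correction may be of any size; without it, an
    offset beyond the panic threshold makes ntpd exit instead.
    """
    return g_flag or abs(offset_s) <= PANIC_THRESHOLD_S
```

So a clock that is five hours off (18000 s) is corrected by `ntpd -g` or `ntpd -gq`, while plain `ntpd -q` would give up.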

-- 
Steve Kostecke <kostecke@ntp.isc.org>
NTP Public Services Project - http://ntp.isc.org/
0
Reply Steve 2/4/2005 6:18:17 PM

Brad Knowles wrote:
> At 4:21 AM +0000 2005-02-04, Tom Smith wrote:
> 
>>  # time ntpd -gq [13 servers in ntp.conf]
>>  ntpd: time slew -0.000373s
>>
>>  real    1m43.03s
>>  user    0m0.10s
>>  sys     0m0.60s
> 
> 
>     Yes, but what does that ntp.conf look like?  Are you using iburst?  
> Any authentication?  Manually setting minpoll and/or maxpoll?  Did you 
> have a good drift file to start with?
> 
>     You need to provide some more specifics before you can make an 
> attempt to compare this to ntpdate.
> 
>>  # time ntpdate -b [three selected servers]
>>   3 Feb 22:44:45 ntpdate[186032]: step time server [IP address] offset 
>> -0.000157 sec
> 
> 
>     Not comparable.  You need to include all thirteen servers before 
> this could potentially be considered comparable.
> 

As you wish...

# cat /etc/ntp.drift
-2.653

# ntpq -p
      remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
  LOCAL(1)        LOCAL(1)         5 l   41   64  377    0.000    0.000   0.004
  [IP.255]        0.0.0.0         16 u    -   64    0    0.000    0.000 4000.00
-[name]          .TRUE.           1 u  468 1024  377   12.741   -0.950   0.092
-[name]          .WWVB.           1 u  470 1024  377    1.388    1.015   1.216
-[name]          [name]           2 u  135  256  337    0.004   -0.490   0.440
-[name]          .GPS.            1 u  997 1024  377   88.936    0.153   1.854
+[name]          .GPS.            1 u  603 1024  377   88.146    0.546   0.119
*[name]          .GPS.            1 u  764 1024  377   88.311    0.605   1.387
-[name]          .GPS.            1 u  783 1024  377   73.649    0.766  12.499
#[name]          [name]           2 u  652 1024  377   83.509   -0.371   0.725
-[name]          .GPS.            1 u  720 1024  377   32.581    1.203   0.082
+[name]          .GPS.            1 u  744 1024  377  105.719    0.604   1.952
-[name]          .GPS.            1 u  686 1024  377   92.301    3.050   0.447
#[name]          [name]           2 u  339 1024  176    0.796   -1.401   0.245
#[name]          [name]           2 u  520 1024  376   10.376   -2.257   0.814
  [name]          0.0.0.0         16 u    - 1024    0    0.000    0.000 4000.00

# time ntpd -gq [above 14 servers/peers with iburst, plus local. No authentication.
                  Default min/max poll. All appropriate for long-term time
                  maintenance.]
ntpd: time slew 0.000527s

real    0m45.00s
user    0m0.11s
sys     0m0.55s

[Note that ntpd -gq isn't really done when it unblocks. It has only
  initiated a slew. But we'll let that pass because the time is
  already way more than "good enough".]

# ntpdate -b [same set of servers and peers, except no local clock. Way
               overconfigured for one-time "good enough" acquisition at boot.]
  4 Feb 13:21:32 ntpdate[201766]: step time server [IP] offset 0.000647 sec

real    0m6.73s
user    0m0.00s
sys     0m0.00s
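
For readers unfamiliar with the `ntpq -p` listing above: the `reach` column is an octal display of an 8-bit shift register recording which of the last eight polls were answered. A minimal sketch for decoding it follows; the helper name `polls_answered` is made up for this illustration.

```python
def polls_answered(reach_octal):
    """Count how many of the last eight polls were answered, given the
    octal `reach` value printed by ntpq -p."""
    register = int(reach_octal, 8) & 0xFF  # 8-bit shift register
    return bin(register).count("1")
```

In the listing, 377 means all eight recent polls succeeded, 337 and 376 mean one was missed, 176 means two were missed, and 0 means the server has not answered at all.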

________________________________________________________________________
Tom Smith                       smith@alum.mit.edu,smith@cag.lkg.hp.com
Hewlett-Packard Company                          Tel: +1 (603) 884-6329
110 Spit Brook Road ZKO1-3/H42                   FAX: +1 (603) 884-6484
Nashua, New Hampshire 03062-2698, USA           Mobile: +1 978 397 3411
0
Reply Tom 2/4/2005 6:59:39 PM

In article <ctutfh$4fq$1@dewey.udel.edu> "David L. Mills"
<mills@udel.edu> writes:
>
>The code I see in the ntpdate source does an adjtime() for all offsets, 
>even large ones. I don't see a settimeofday() or equivalent.

That's just because it calls step_systime() in libntp/systime.c for that
- as it should, of course. Surely you didn't think the -b option was a
no-op and the documentation full of lies... (Well, I'm looking at 4.2.0
and assuming this functionality hasn't been removed in the development
version...)

--Per Hedeland
per@hedeland.org
0
Reply per 2/4/2005 7:46:07 PM

In article <mailman.44.1107536681.583.questions@lists.ntp.isc.org> Brad
Knowles <brad@stop.mail-abuse.org> writes:
>
>	If you briefly lose connectivity, and you keep the same 
>interface, and you keep the same IP address (maybe you were walking 
>around in your house and were in an area that doesn't have good 
>wireless coverage for a while), then you can help make recovery 
>easier by configuring the LOCAL refclock.
>
>	With a LOCAL refclock, if all the other servers go away, ntpd 
>will at least continue running in degraded mode and using the latest 
>calculated variables

As has been pointed out many times here, there's no need to have a LOCAL
clock configured for that - ntpd will do it anyway. The only reason to
have a LOCAL clock configured is if you need ntpd to serve time to
others in this situation.

--Per Hedeland
per@hedeland.org
0
Reply per 2/4/2005 8:02:19 PM

David L. Mills wrote:

> Terje,
> 
> I am baffled by your comments. Your "perfect ntpdate replacement" 
> described is precisely what ntpd -g does.

Exactly!

That was what I hoped the original poster would realize. :-)

Terje

-- 
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
0
Reply Terje 2/4/2005 8:05:39 PM

Tom Smith wrote:
> # time ntpd -gq [above 14 servers/peers with iburst, plus local. No 
> authentication.
>                  Default min/max poll. All appropriate for long-term time
>                  maintenance.]

I believe using fewer servers, but minpoll 4, would result in faster 
initial acquisition.

Prof. Mills' recent addition of a tuning knob to reduce the confidence 
level for first time acceptance will make it even faster.
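
As background for the minpoll suggestion: minpoll and maxpoll are exponents of two, in seconds. The one-line helper below (its name is invented for this note) shows the conversion.

```python
def poll_interval_seconds(exponent):
    """Convert an ntp.conf minpoll/maxpoll exponent to seconds (2**exponent)."""
    return 1 << exponent
```

So minpoll 4 means polling every 16 seconds, whereas the defaults of 6 and 10 correspond to the 64 and 1024 seen in the poll column of Tom's ntpq output.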

Terje
-- 
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
0
Reply Terje 2/4/2005 8:12:12 PM

"David L. Mills" <mills@udel.edu> writes:

> Hangups as you describe is exactly what the simulator is designed
> to reveal and it has revealed them from time to time as folks learn
> new ways to misconfigure and warp the hardware and new operating

I know this is both a newsgroup and a mailing list, but would it be
possible not to top post?  :>

Regards,
David
0
Reply David 2/4/2005 8:18:00 PM

Terje Mathisen wrote:
> Tom Smith wrote:
> 
>> # time ntpd -gq [above 14 servers/peers with iburst, plus local. No 
>> authentication.
>>                  Default min/max poll. All appropriate for long-term time
>>                  maintenance.]
> 
> 
> I believe using less servers, but minpoll 4, would result in faster 
> initial aquisition.
> 
> Prof. Mills' recent addition of a tuning knob to reduce the confidence 
> level for first time acceptance will make it even faster.
> 
> Terje

So you're suggesting that the requirements for initial acquisition
of an estimated time, enough to get ntpd off the ground, are different
from the requirements for long-term maintenance of an accurate and stable time.

I certainly agree.
0
Reply Tom 2/4/2005 8:55:57 PM

Is it intentional or a bug that this list has no "Reply to:" attribute? 
The list just moved; is this simply a missing configuration?

I just sent a message to Tom Smith that was intended for the list, sorry 
Tom ;-)

Alain


0
Reply Alain 2/4/2005 9:34:08 PM

Alain,

In the ntpd documentation for the -g option:

Normally, ntpd exits with a message to the system log if the offset 
exceeds the panic threshold, which is 1000 s by default. This option 
allows the time to be set to any value without restriction; however, 
this can happen only once. If the threshold is exceeded after that, ntpd 
will exit with a message to the system log. This option can be used with 
the -q and -x options. See the tinker command for other options.

Dave

0
Reply David 2/4/2005 9:53:31 PM

David,

With genuine respect, I won't do that. I have enough trouble with 
degraded eyesight and enlarged font as it is. The messages go from most 
recent to oldest and I keep that order.

Dave

David Magda wrote:

> "David L. Mills" <mills@udel.edu> writes:
> 
> 
>>Hangups as you describe is exactly what the simulator is designed
>>to reveal and it has revealed them from time to time as folks learn
>>new ways to misconfigure and warp the hardware and new operating
> 
> 
> I know this is both a newsgroup and a mailing list, but would it be
> possible not to top post?  :>
> 
> Regards,
> David
0
Reply David 2/4/2005 9:58:35 PM

Tom Smith wrote:

> # time ntpd -gq [above 14 servers/peers with iburst, plus local. No
> authentication. Default min/max poll. All appropriate for long-term
> time maintenance.]
> ntpd: time slew 0.000527s
>
> real    0m45.00s
> user    0m0.11s
> sys     0m0.55s

If you omit the "q" option, ntpd will simply set the clock and keep 
running!  If the error is greater than 128 milliseconds it will step the 
clock, otherwise it will slew the clock.   In either case you are up and 
running with a clock error of less than 128 milliseconds.  Is this not 
good enough?   If not, what is your requirement for accuracy?
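
The step-or-slew rule just described can be written down in a couple of lines. This is a simplification for illustration only: the names are made up, and the real daemon has further conditions (and options such as -x) that affect the choice.

```python
STEP_THRESHOLD_S = 0.128  # 128 ms, as stated in the message above

def correction_mode(offset_s):
    """Return 'step' when the offset exceeds the step threshold, else 'slew'."""
    return "step" if abs(offset_s) > STEP_THRESHOLD_S else "slew"
```

This matches the outputs reported in this thread: an offset of 0.000527 s produced "time slew", while an offset of many years produced "time reset".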
0
Reply Richard 2/4/2005 9:59:10 PM

Per,

Well, I found the step routine call; however, that code has grown so 
weedy with intricate evil little OS-dependencies that I find it 
unreadable. I haven't touched the ntpdate code since it first appeared 
probably fifteen years ago. So far as I can see, if somebody opts out 
of the step correction, ntpdate can leave a big offset for later ntpd to 
chew on.

Dave

Per Hedeland wrote:

> In article <ctutfh$4fq$1@dewey.udel.edu> "David L. Mills"
> <mills@udel.edu> writes:
> 
>>The code I see in the ntpdate source does an adjtime() for all offsets, 
>>even large ones. I don't see a settimeofday() or equivalent.
> 
> 
> That's just because it calls step_systime() in libntp/systime.c for that
> - as it should, of course. Surely you didn't think the -b option was a
> no-op and the documentation full of lies... (Well, I'm looking at 4.2.0
> and assuming this functionality hasn't been removed in the development
> version...)
> 
> --Per Hedeland
> per@hedeland.org
0
Reply David 2/4/2005 10:41:35 PM

In article <cu0tn7$jmh$1@dewey.udel.edu> "David L. Mills"
<mills@udel.edu> writes:
>
>Well, I found the step routine call; however, that code has grown so 
>weedy with intricate evil little OS-dependencies that I find it 
>unreadable. I haven't touched the ntpdate code since it first appeared 
>probably fifteen years ago. So far as I can see, if somebody opts out 
>the step correction, ntpdate can leave a big offset for later ntpd to 
>chew on.

Yes, and along these lines are the reasons for dropping ntpdate that you
have given in the past:

a) It's hideously complex
b) It does something that is similar to / a subset of what ntpd does (or
   rather what it did many years ago), but not the same, which together
   with a) makes it a pain to maintain
c) Its feature set gives the impression that it might be reasonable to
   use *instead* of ntpd (e.g. running hourly from cron) - there's no
   point having the slew modes otherwise
d) If widely used as in c), it's quite unfriendly to servers, when
   gazillions of boxes send their 4 packets * N servers exactly on the
   hour.

FWIW, I find them perfectly valid - I'm just still not quite happy with
the replacement.:-)

If your latest ntpd tweak knobs can really achieve the low-quality-but-
really-quick time setting that ntpdate provides, *and* the combination
of knob settings needed for that is codified into another ntpd option
(--impatient maybe?:-), I think ntpdate can finally be laid to rest.
Requiring a separate config file just for this boot-time setting, with
parameters and values that are even more esoteric to the average user,
is really a show stopper IMHO.

--Per Hedeland
per@hedeland.org
0
Reply per 2/4/2005 11:34:34 PM

Hi all,

I made some tests with extreme bad initial clocks:

# date --set "01/01/00 00:00"
Sáb Jan  1 00:00:00 BRDT 2000
# ntpd -gq
ntpd: time reset 160871283.056670s

This works great, but takes 45 seconds (8 servers with iburst).
The -gq combination was not clear to me from the docs. Now that I know about 
it I read them again and in fact it is there; I had read them many times 
and simply missed it.

ntpdate with 5 servers: 3 seconds
ntpdate with 8 servers: 4 seconds

It looks like ntpdate does most things in parallel.

Per Hedeland wrote:
> If your latest ntpd tweak knobs can really achieve the low-quality-but-
> really-quick time setting that ntpdate provides, *and* the combination
> of knob settings needed for that is codified into another ntpd option
> (--impatient maybe?:-), I think ntpdate can finally be laid to rest.
> Requiring a separate config file just for this boot-time setting, with
> parameters and values that are even more esoteric to the average user,
> is really a show stopper IMHO.

Agreed. Maybe the "tweak knobs" + "minpoll=1" (this looks similar to what 
ntpdate uses) could be set on the command line, or be temporary and disabled 
after the initial time set. I hope that whatever replaces ntpdate is 
approximately as fast (say, <5 s).

Thanks for your patience again,
Alain
0
Reply Alain 2/5/2005 12:52:37 AM

At 8:02 PM +0000 2005-02-04, Per Hedeland wrote:

>  As has been pointed out many times here, there's no need to have a LOCAL
>  clock configured for that - ntpd will do it anyway. The only reason to
>  have a LOCAL clock configured is if you need ntpd to serve time to
>  others in this situation.

	I described the behaviour that I have personally experienced.

	No more, no less.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>
0
Reply Brad 2/5/2005 1:12:04 AM

At 7:34 PM -0200 2005-02-04, Alain wrote:

>  Is it intentional or a bug that this list has no "Reply to:" attribute?
>  The list just moved; is this simply a missing configuration?

	There is no Reply-to: header, nor will there be so long as I am 
in charge of running the mail system and the mailing list server 
system.

	See <http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq03.048.htp>.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>
0
Reply Brad 2/5/2005 1:22:03 AM

Per Hedeland wrote:

>In article <cu0tn7$jmh$1@dewey.udel.edu> "David L. Mills"
><mills@udel.edu> writes:
>  
>
>>Well, I found the step routine call; however, that code has grown so 
>>weedy with intricate evil little OS-dependencies that I find it 
>>unreadable. I haven't touched the ntpdate code since it first appeared 
>>probably fifteen years ago. So far as I can see, if somebody opts out 
>>the step correction, ntpdate can leave a big offset for later ntpd to 
>>chew on.
>>    
>>
>
>Yes, and along these lines are the reasons for dropping ntpdate that you
>have given in the past:
>
>a) It's hideously complex
>b) It does something that is similar to / a subset of what ntpd does (or
>   rather what it did many years ago), but not the same, which together
>   with a) makes it a pain to maintain
>c) It's feature set gives the impression that it might be reasonable to
>   use *instead* of ntpd (e.g. running hourly from cron) - there's no
>   point having the slew modes otherwise
>d) If widely used as in c), it's quite unfriendly to servers, when
>   gazillions of boxes send their 4 packets * N servers exactly on the
>   hour.
>
>FWIW, I find them perfectly valid - I'm just still not quite happy with
>the replacement.:-)
>
>If your latest ntpd tweak knobs can really achieve the low-quality-but-
>really-quick time setting that ntpdate provides, *and* the combination
>of knob settings needed for that is codified into another ntpd option
>(--impatient maybe?:-), I think ntpdate can finally be laid to rest.
>Requiring a separate config file just for this boot-time setting, with
>parameters and values that are even more esoteric to the average user,
>is really a show stopper IMHO.
>
>--Per Hedeland
>per@hedeland.org
>  
>
I think maybe what people are looking for, that ntpdate seems to do and 
ntpd does not, is:
1.  Get the time accurate to within  X seconds quickly.
2.  Set that time quickly.  The presumption here is that we are in a 
state where it's OK to step the clock; e.g. we are booting and nothing 
has started yet that will be upset by a change in time.  Slewing the 
clock to correct an offset of, say, 127 milliseconds, to within, say, 5 
milliseconds,  will take about 250 seconds and that is unacceptable when 
we need to be within 5 milliseconds of the correct time as quickly as 
possible.
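
The "about 250 seconds" figure can be checked with a little arithmetic. The sketch below assumes the clock is slewed at a maximum rate of 500 PPM (0.5 ms of correction per second); the names are invented for the example.

```python
MAX_SLEW_RATE = 500e-6  # seconds of correction per second (500 PPM), assumed

def slew_duration_seconds(start_offset_s, target_offset_s, rate=MAX_SLEW_RATE):
    """Time needed to slew from one offset magnitude down to another."""
    return (abs(start_offset_s) - abs(target_offset_s)) / rate
```

Going from 127 ms down to 5 ms means removing 122 ms, which at that rate takes roughly 244 seconds, in line with the estimate above.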

The user may wish to specify the required accuracy, knowing that there 
is a trade off with elapsed time to achieve that accuracy.  How good 
"good enough" must be will vary.   Someone recording the time workers 
started work and the time they stopped may be satisfied with plus/minus 
one minute but it's 8:00AM local time and he needs it right now!  
Someone else might really need plus/minus ten microseconds and be 
willing and able to wait two or three hours to get it.

NTP may not be the right tool to meet some requirements but I think it 
should be possible to satisfy a lot of people with something that has 
capabilities similar to what I've outlined.

0
Reply Richard 2/5/2005 1:55:55 AM

Alain,

I did the same thing as you, but ntpd -gq with 8 servers and iburst set 
the clock in 8 seconds, not 45. This includes DNS. Did yours stall in 
DNS? Each of the g, q and x option descriptions has a sentence that 
mentions that the option can be used in conjunction with the other two.

ntpd polls the servers in parallel, but offsets the polls to avoid bunching. 
Note that ntpd uses a two-second poll interval to avoid violating the KoD 
rules. ntpdate uses one second, with the result that up to half the polls 
can be ignored and (if configured; our servers are) a KoD sent.
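
A toy model shows where "up to half" comes from. It assumes a server that ignores any poll arriving less than two seconds after the last one it accepted; the function name and this exact rule are assumptions for illustration, and real server rate limiting is more involved.

```python
def count_ignored(send_times_s, min_interval_s=2.0):
    """Count polls a rate-limiting server would ignore because they arrive
    less than min_interval_s after the last poll it accepted."""
    ignored = 0
    last_accepted = None
    for t in send_times_s:
        if last_accepted is not None and t - last_accepted < min_interval_s:
            ignored += 1
        else:
            last_accepted = t
    return ignored
```

Four polls sent one second apart lose every second packet, while four polls sent two seconds apart all get through.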

The Corps is working on a SNTP replacement for ntpdate, presumably 
compliant with the ID I mentioned earlier. I can't tell you how relieved 
I would be when this comes to pass and evil ntpdate can be finally torched.

I'm really nervous about the kit of knobs now in ntpd which may be ripe 
for misinterpretation, misuse and misunderstood documentation. Every new 
"feature" increases the complexity and fragility of the program and 
consumes lots of my time in testing, documentation and mail 
correspondence. The IETF task force now studying the specification issue 
should take up the issue of a standard set of parameters selectable at 
configuration time which would cater to whatever the community wants.

Dave

Alain wrote:
> Hi all,
> 
> I made some tests with extreme bad initial clocks:
> 
> # date --set "01/01/00 00:00"
> S�b Jan  1 00:00:00 BRDT 2000
> # ntpd -gq
> ntpd: time reset 160871283.056670s
> 
> This works great, but takes 45 seconds (8 servers with iburst).
> This -gq combination was not clear in the docs; now that I know about it 
> I read it again and in fact it is there, but I had read it many times 
> and just missed it.
> 
> ntpdate with 5 servers 3 seconds
> ntpdate with 8 servers 4 seconds
> 
> It looks like ntpdate does most things in parallel.
> 
> Per Hedeland escreveu:
> 
>> If your latest ntpd tweak knobs can really achieve the low-quality-but-
>> really-quick time setting that ntpdate provides, *and* the combination
>> of knob settings needed for that is codified into another ntpd option
>> (--impatient maybe?:-), I think ntpdate can finally be laid to rest.
>> Requiring a separate config file just for this boot-time setting, with
>> parameters and values that are even more esoteric to the average user,
>> is really a show stopper IMHO.
> 
> 
> Agreed. Maybe the "tweak knobs" + "minpoll=1" (this looks similar to what 
> ntpdate uses) could be command line or just temporary and disabled after 
> the initial time set. I hope that whatever replaces ntpdate is 
> approximately as fast (say <5s).
> 
> Thanks for your patience again,
> Alain
0
Reply David 2/5/2005 5:07:05 PM

In article <mailman.52.1107569121.583.questions@lists.ntp.isc.org> Brad
Knowles <brad@stop.mail-abuse.org> writes:
>At 8:02 PM +0000 2005-02-04, Per Hedeland wrote:
>
>>  As has been pointed out many times here, there's no need to have a LOCAL
>>  clock configured for that - ntpd will do it anyway. The only reason to
>>  have a LOCAL clock configured is if you need ntpd to serve time to
>>  others in this situation.
>
>	I described the behaviour that I have personally experienced.
>
>	No more, no less.

Not so - the experience you described was that having a LOCAL clock
configured worked well. I don't dispute that, but what you actually
wrote was that this was something that was needed: "you can help make
recovery easier by configuring the LOCAL refclock" and (without LOCAL)
"it most likely won't be good and you will have to restart". Both of
those statements are blatantly incorrect, and the configuration of a
LOCAL clock will often cause serious problems, in particular when the
appropriate "fudge" is omitted.

--Per Hedeland
per@hedeland.org
0
Reply per 2/5/2005 9:13:13 PM

Richard B. Gilbert wrote:
> Tom Smith wrote:
>
> > David L. Mills wrote:
> >
> >> Kenneth,
> >>
> >> This is the single most persistent issue in the engineering design of
> >> NTP. There must be tradeoffs between security, robustness, accuracy
> >> and initial delay. In the current design compromise, a server is
> >> acceptable only after three/four rounds of messages and the ensemble
> >> time is acceptable with at least one of possibly several acceptable
> >> servers. With IBURST mode, this takes 6-8 seconds.
> >>
> >> For better robustness use "tos minclock N", where at least N
> >> (default 1) servers must be acceptable to set the clock. Tonight I
> >> put in a "tos maxdist M", where M is the distance threshold below
> >> which the server is acceptable. Set "tos maxdist 16" and the first
> >> sample received from any server will set the clock lickety-split. Of
> >> course, essentially all the mitigation algorithms using
> >> multiple-sample redundancy and multiple-server diversity are
> >> systematically defeated. You might as well use SNTP.
> >
> >
> > David,
> >
> > I know the subject has been workstations, but let's talk for a moment
> > about this religion as it concerns servers - like the ones that run
> > telephone companies, stock exchanges, and banks inside heavily
> > defended firewalls. It's the same issue, it's just that the stakes
> > are higher. The issue is how quickly can you get these
> > systems back up at boot. 15-30 seconds is a long time to wait.
> > Too long.
> >
> > We're not talking about one-shot sampling for maintaining the time,
> > so comparisons to SNTP are not helpful. We're talking about speed of
> > acquisition of an initial "good enough" time, keeping in mind that the
> > perfect is often the enemy of the good.
> >
> > You might argue that if boot time is critical, just let the server come
> > up with whatever random time it comes up with and let ntpd fix
> > it up later. Give it a "-g" so it doesn't complain. A lot of folks
> > have tried this in the past inadvertently (and continue to do so)
> > by neglecting to put ntpdate into their boot sequence ahead of ntpd.
> > I've fixed a lot of systems whose drift files were pinned
> > at 500 ppm and whose systems ran perpetually fast or slow as
> > a result. We've also spent a lot of money fruitlessly replacing
> > motherboards on those systems. Turning a large initial offset over
> > to ntpd is decidedly NOT a Good Idea.
> >
> > The reason why so many of your constituency keep bringing this
> > subject up is that they know that ntpd needs a good (not perfect)
> > estimate of the time before it starts and that critical systems
> > can't wait for perfection to get that estimate.
> >
> > -Tom
> >
> > ________________________________________________________________________
> > Tom Smith                    smith@alum.mit.edu,smith@cag.lkg.hp.com
> > Hewlett-Packard Company                          Tel: +1 (603) 884-6329
> > 110 Spit Brook Road ZKO1-3/H42                   FAX: +1 (603) 884-6484
> > Nashua, New Hampshire 03062-2698, USA           Mobile: +1 978 397 3411
> >
> Tom,
>
> I think it all boils down to how good is "good enough"?   Your snail
> mail address suggests that you're in VMS Engineering or, if not, you
> could throw rocks at them!   VMS, although it keeps time in units of 100
> nanosecond "ticks", only updates the clock every ten milliseconds!
> (Measure with micrometer, mark with chalk, cut with ax?)
> The documented and supported interfaces in VMS only permit you to set
> the clock and read the clock to the nearest ten milliseconds.
>

Actually Richard, if you look again he is using an lkg email address
and this is the networks group that recently moved from LKG to ZKO.
You shouldn't make any assumptions about which group he's talking
about since the Tru64 Unix group is also in ZKO. VMS Engineering
used to be in ZKO3 but I don't know if that's changed.
I've forgotten what the VMS APIs look like as that's been a few years,
though I still have access to the doc sets. There's also likely
to be a difference between what you can do with a VAX as opposed to
an Alpha.

> If you are willing to have a server come up with a clock error of one
> second, just boot and start ntpd later.  If you need to have time
> correct to the nearest microsecond, you are using the wrong tools.
>
> If you are, in fact, talking about VMS and TCP/IP services, porting the
> latest version of the NTP reference implementation would help you speed
> up the startup.  The last time I looked, TCP/IP Services (V5.1)  was
> using a port of NTP V3-5.91 which does not support the iburst
> qualifier.  Iburst allows much faster initialization; it gets you a
> "good enough" time and frequency correction in about 8 seconds.
>

Jason has been doing that work for VMS. Tom sounds more like a
support person trying to deal with customers having problems like
this.

> If eight seconds is too long, you need to specify how quickly you need
> to acquire the correct time and how accurate the time must be.  These
> two specifications pretty much determine the tools you must use to meet
> them; e.g. if you need time correct to +/- 50 nanoseconds and need to
> set it within 100 microseconds, you will almost certainly need to use a
> hardware reference clock such as a cesium or rubidium standard.

If you are a client trying to deal with mission-critical systems
I can well imagine a need to get the initial time accurate very
quickly, but then if it's that critical, a refclock is the proper
solution rather than relying on another system to provide it.
Don't forget you need the IP stack running first before you can
check externally for a time reference.

Danny

0
Reply mayer 2/5/2005 10:27:36 PM

At 9:13 PM +0000 2005-02-05, Per Hedeland wrote:

>                           I don't dispute that, but what you actually
>  wrote was that this was something that was needed: "you can help make
>  recovery easier by configuring the LOCAL refclock" and (without LOCAL)
>  "it most likely won't be good and you will have to restart".

	Which is precisely the behaviour that I experienced.  Without a 
LocalCLK, my machine would never recover from a loss of LAN 
connectivity.  With a LocalCLK, it would.  Therefore, I came to the 
conclusion that I described.

	Now, what part of my own personal experience are you supposedly 
not disputing?

-- 
Brad Knowles, <brad@stop.mail-abuse.org>

"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."

     -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
     Assembly to the Governor, November 11, 1755

   SAGE member since 1995.  See <http://www.sage.org/> for more info.
0
Reply Brad 2/6/2005 1:01:35 AM

At 10:52 PM -0200 2005-02-04, Alain wrote:

>  ntpdate with 5 servers 3 seconds
>  ntpdate with 8 servers 4 seconds
>
>  It looks like ntpdate does most things in parallel.

	No, it does them in sequence.  Witness:

# /usr/bin/time -p ntpdate -b 10.0.1.240 time.euro.apple.com 
de.pool.ntp.org fr.pool.ntp.org nl.pool.ntp.org uk.pool.ntp.org 
0.europe.pool.ntp.org 1.europe.pool.ntp.org 2.europe.pool.ntp.org 
europe.pool.ntp.org
Password:
Looking for host 10.0.1.240 and service ntp
host found : 10.0.1.240
Looking for host time.euro.apple.com and service ntp
host found : interweb.euro.apple.com
Looking for host de.pool.ntp.org and service ntp
host found : ipx10540.ipxserver.de
Looking for host fr.pool.ntp.org and service ntp
host found : granny.lievin.net
Looking for host nl.pool.ntp.org and service ntp
host found : i157107.upc-i.chello.nl
Looking for host uk.pool.ntp.org and service ntp
host found : cheddar.halon.org.uk
Looking for host 0.europe.pool.ntp.org and service ntp
host found : time.as-computer.biz
Looking for host 1.europe.pool.ntp.org and service ntp
host found : change2linux.com
Looking for host 2.europe.pool.ntp.org and service ntp
host found : 62.101.81.203
Looking for host europe.pool.ntp.org and service ntp
host found : aszlig.net
  6 Feb 04:05:36 ntpdate[28684]: sendto(10.0.1.240): Host is down
  6 Feb 04:05:37 ntpdate[28684]: sendto(10.0.1.240): Host is down
  6 Feb 04:05:38 ntpdate[28684]: sendto(10.0.1.240): Host is down
  6 Feb 04:05:39 ntpdate[28684]: sendto(10.0.1.240): Host is down
  6 Feb 04:05:40 ntpdate[28684]: step time server 193.201.200.139 
offset -0.013175 sec
real        14.43
user         0.01
sys          0.04

	And here's ntpd:

# /usr/bin/time -p ntpd -gq -f /var/run/ntp.drift -p 
/var/run/ntpd.pid 
ntpd: time slew -0.009592s
real        16.07
user         0.01
sys          0.01


	It just so happens that, in many cases, you can get those few 
packets out to the servers and back quickly enough that ntpdate will 
finish within a few seconds.

	Steve has shown that ntpd will get up and running in about seven 
seconds on his machine, whereas it's a bit slower on mine since some 
of the servers I've selected in my ntp.conf are down/non-responsive. 
I'm also on a slow ADSL line, I'm connected to that through an Apple 
Airport Extreme base station on a wireless 802.11b network with WEP 
encryption (which always slows down wireless network throughput), and 
I'm running on an ancient PowerBook G3 laptop running MacOS X that is 
being pushed fairly hard to just run the OS and keep a few programs 
in memory, much less do anything useful.

	Contrariwise, Steve is running on a Soekris 4801 single board 
computer which is pretty much otherwise unloaded (except for ntpd), 
running a fairly stripped version of FreeBSD, and I believe he's 
connected to his cable modem line directly via 10/100 Base-T/TX 
Ethernet as opposed to WLAN.

	I don't know how much of my slow down is due to what factors, but 
I'm not too surprised.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>

"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."

     -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
     Assembly to the Governor, November 11, 1755

   SAGE member since 1995.  See <http://www.sage.org/> for more info.
0
Reply Brad 2/6/2005 3:07:36 AM

Brad Knowles wrote:
> At 10:52 PM -0200 2005-02-04, Alain wrote:
> 
>>  ntpdate with 5 servers 3 seconds
>>  ntpdate with 8 servers 4 seconds
>>
>>  It looks like ntpdate does most things in parallel.
> 
> 
>     No, it does them in sequence.  Witness:
> 
> # /usr/bin/time -p ntpdate -b 10.0.1.240 time.euro.apple.com 
> de.pool.ntp.org fr.pool.ntp.org nl.pool.ntp.org uk.pool.ntp.org 
> 0.europe.pool.ntp.org 1.europe.pool.ntp.org 2.europe.pool.ntp.org 
> europe.pool.ntp.org
> Password:
> Looking for host 10.0.1.240 and service ntp
> host found : 10.0.1.240
> Looking for host time.euro.apple.com and service ntp
> host found : interweb.euro.apple.com
> Looking for host de.pool.ntp.org and service ntp
> host found : ipx10540.ipxserver.de
> Looking for host fr.pool.ntp.org and service ntp
> host found : granny.lievin.net
> Looking for host nl.pool.ntp.org and service ntp
> host found : i157107.upc-i.chello.nl
> Looking for host uk.pool.ntp.org and service ntp
> host found : cheddar.halon.org.uk
> Looking for host 0.europe.pool.ntp.org and service ntp
> host found : time.as-computer.biz
> Looking for host 1.europe.pool.ntp.org and service ntp
> host found : change2linux.com
> Looking for host 2.europe.pool.ntp.org and service ntp
> host found : 62.101.81.203
> Looking for host europe.pool.ntp.org and service ntp
> host found : aszlig.net
>  6 Feb 04:05:36 ntpdate[28684]: sendto(10.0.1.240): Host is down
>  6 Feb 04:05:37 ntpdate[28684]: sendto(10.0.1.240): Host is down
>  6 Feb 04:05:38 ntpdate[28684]: sendto(10.0.1.240): Host is down
>  6 Feb 04:05:39 ntpdate[28684]: sendto(10.0.1.240): Host is down
>  6 Feb 04:05:40 ntpdate[28684]: step time server 193.201.200.139 offset 
> -0.013175 sec
> real        14.43
> user         0.01
> sys          0.04
> 
>     And here's ntpd:
> 
> # /usr/bin/time -p ntpd -gq -f /var/run/ntp.drift -p /var/run/ntpd.pid 
> ntpd: time slew -0.009592s
> real        16.07
> user         0.01
> sys          0.01
> 
> 
>     It just so happens that, in many cases, you can get those few 
> packets out to the servers and back quickly enough that ntpdate will 
> finish within a few seconds.
> 
>     Steve has shown that ntpd will get up and running in about seven 
> seconds on his machine, whereas it's a bit slower on mine since some of 
> the servers I've selected in my ntp.conf are down/non-responsive. I'm 
> also on a slow ADSL line, I'm connected to that through an Apple Airport 
> Extreme base station on a wireless 802.11b network with WEP encryption 
> (which always slows down wireless network throughput), and I'm running 
> on an ancient PowerBook G3 laptop running MacOS X that is being pushed 
> fairly hard to just run the OS and keep a few programs in memory, much 
> less do anything useful.
> 
>     Contrariwise, Steve is running on a Soekris 4801 single board 
> computer which is pretty much otherwise unloaded (except for ntpd), 
> running a fairly stripped version of FreeBSD, and I believe he's 
> connected to his cable modem line directly via 10/100 Base-T/TX Ethernet 
> as opposed to WLAN.
> 
>     I don't know how much of my slow down is due to what factors, but 
> I'm not too surprised.
> 

Maybe a large part of your ntpdate time was printing out the messages.

There was some question earlier about whether DNS delays might
explain the lengthy time for ntpd. So I performed the same experiment
on the same system, which is its own DNS server, caching the
names first, and doing ntpdate before ntpd to make doubly sure.
There is no meaningful difference.

Starting with:

# cat /etc/ntp.drift
-2.300

# ntpq -p
      remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
  LOCAL(1)        LOCAL(1)         5 l    9   64  377    0.000    0.000   0.004
  [network].255   0.0.0.0         16 u    -   64    0    0.000    0.000 4000.00
-[name]          .TRUE.           1 u  752 1024  377   30.571    6.192   8.958
-[name]          .WWVB.           1 u  694 1024  377    0.176    2.184   1.237
-[name]          [name]           2 u  888  512  156    0.004    1.563   1.258
+[name]          .GPS.            1 u  671 1024  377   87.984    0.651   0.047
+[name*]         .GPS.            1 u  752 1024  377   88.089    0.590   0.055
*[name]          .GPS.            1 u  338 1024  377   87.950    0.649   0.014
-[name]          .GPS.            1 u  749 1024  377   73.684    0.285   0.227
#[name]          [name]           2 u  811 1024  377   83.563    0.551   0.681
-[name]          .GPS.            1 u  766 1024  377   32.532    1.301   1.149
-[name]          .GPS.            1 u  401 1024  377  105.854    0.586   0.048
-[name*]         .GPS.            1 u  771 1024  377   92.284    3.090   0.076
#[name]          [name]           2 u  106 1024  377    0.512    1.809   0.057
#[name*]         [name*]          2 u  463 1024  376    9.281    1.583   0.269
  [name]          0.0.0.0         16 u    - 1024    0    0.000    0.000 4000.00

[name*] = servers chosen for ntpdate/ntpd -gq

All are on an internal network, physical distances from feet to 3000 miles.
Delays from home across my own wireless onto the Internet to servers 3000
miles away are 93 milliseconds, so I wouldn't put much faith in that
as an important difference.

[shut down ntpd]

# time ntpdate -b [3 servers]
  5 Feb 22:58:12 ntpdate[234803]: step time server [IP] offset -0.000136 sec

real    0m0.71s [you'll recall that with all the above servers, including
user    0m0.00s  the one that's down, this still took only 6.73 seconds]
sys     0m0.00s

# time ntpd -gq -c ntp.boot [same 3 servers, iburst minpoll 4]
ntpd: time slew 0.001556s

real    0m41.01s
user    0m0.06s
sys     0m0.60s

# time ntpd -gq -c ntp.boot [same 3 servers, iburst minpoll 1]
ntpd: time slew 0.002653s

real    0m37.00s
user    0m0.10s
sys     0m0.56s

-Tom
0
Reply Tom 2/6/2005 4:53:32 AM

At 4:53 AM +0000 2005-02-06, Tom Smith wrote:

>  Maybe a large part of your ntpdate time was printing out the messages.

	Could be.  I doubt that could make it go from 0.7 seconds to 14 
seconds, but it might have added a bit.

>  There was some question earlier about whether DNS delays might
>  explain the lengthy time for ntpd. So I performed the same experiment
>  on the same system, which is its own DNS server, caching the
>  names first, and doing ntpdate before ntpd to make doubly sure.
>  There is no meaningful difference.

	I ran ntpdate and ntpd multiple times myself, so as to make sure 
that DNS caching was not an issue.  I also didn't muck about with 
minpoll or maxpoll, although I did use iburst.  Still, my ntpd 
execution wasn't very much longer than my ntpdate (which was slower 
than your full one), and my ntpd startup was considerably faster than 
yours.

	At this point, all I'll say is that there are a lot of factors 
involved, and if you try to set up the situation so as to be as 
comparable as possible, ntpdate does not fare well.  Moreover, 
ntpdate has some nasty failure modes (which have been described by 
others) if you don't give it enough servers to check against and/or 
if some of them are down.


	I know what you want to use it for.

	You want a guaranteed less-than-one-second "good enough" answer 
for doing any necessary large-scale changes to the clock, afterwards 
you can start up various somewhat time-sensitive applications while 
the system can start getting into the detailed long-term clock 
maintenance "in the background".


	Problem is, the things you can do in order to get the upper limit 
down below one second are the same sorts of things which tend to give 
you really nasty failure modes.

	If you're *that* sensitive to time on startup, then you're 
probably also sensitive to nasty failure modes.

	I am not at all convinced that you can have your cake and eat it, 
too -- illusions of having been able to do so in the past with ntpdate 
notwithstanding.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>

"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."

     -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
     Assembly to the Governor, November 11, 1755

   SAGE member since 1995.  See <http://www.sage.org/> for more info.
0
Reply Brad 2/6/2005 5:59:36 AM

In article <vaKdnSNBMfU2upnfRVn-uQ@comcast.com> "Richard B. Gilbert"
<rgilbert88@comcast.net> writes:
>
>I think maybe what people are looking for, that ntpdate seems to do and 
>ntpd does not, is:
>1.  Get the time accurate to within  X seconds quickly.
>2.  Set that time quickly.  The presumption here is that we are in a 
>state where it's OK to step the clock; e.g. we are booting and nothing 
>has started yet that will be upset by a change in time.  Slewing the 
>clock to correct an offset of, say, 127 milliseconds, to within, say, 5 
>milliseconds,  will take about 250 seconds and that is unacceptable when 
>we need to be within 5 milliseconds of the correct time as quickly as 
>possible.

Agreed.

>The user may wish to specify the required accuracy, knowing that there 
>is a trade off with elapsed time to achieve that accuracy.  How good 
>"good enough" must be will vary.   Someone recording the time workers 
>started work and the time they stopped may be satisfied with plus/minus 
>one minute but it's 8:00AM local time and he needs it right now!  
>Someone else might really need plus/minus ten microseconds and be 
>willing and able to wait two or three hours to get it.

This I think is way beyond "what people are looking for" in this
particular area - and I'm not sure it's very meaningful either: It makes
no sense to set the time with great accuracy and then just leave it,
since the clock will quickly drift away. Achieving and *maintaining*
great accuracy, by calculating and taking the drift into account, is
precisely what ntpd already does in its normal mode of operation, of
course.

The goal, whether achieved with a separate program or a special startup
mode of ntpd, is to "quickly" (5 seconds sounds like a reasonable upper
bound) set the clock "well enough" that ntpd in normal operation can
from that point on maintain it without steps. I'm intentionally using
"goal" rather than "requirement", because with a big clock drift and
lack of previous knowledge of it (i.e. drift file), the goal may not be
possible to achieve. In "normal circumstances" it should be no problem
at all though, and ntpdate does it easily.

--Per Hedeland
per@hedeland.org
0
Reply per 2/6/2005 10:35:36 AM

In article <mailman.57.1107659271.583.questions@lists.ntp.isc.org> Brad
Knowles <brad@stop.mail-abuse.org> writes:
>At 10:52 PM -0200 2005-02-04, Alain wrote:
>
>>  ntpdate with 5 servers 3 seconds
>>  ntpdate with 8 servers 4 seconds
>>
>>  It looks like ntpdate does most things in parallel.
>
>	No, it does them in sequence.  Witness:
>
># /usr/bin/time -p ntpdate -b 10.0.1.240 time.euro.apple.com 
>de.pool.ntp.org fr.pool.ntp.org nl.pool.ntp.org uk.pool.ntp.org 
>0.europe.pool.ntp.org 1.europe.pool.ntp.org 2.europe.pool.ntp.org 
>europe.pool.ntp.org

Well, that output only shows it doing the DNS lookups in sequence - it's
a bit of a pain to do it any other way given the synchronous nature of
gethostby*(). To see the actual queries, use -d - it will reveal that
there is parallelism "when needed":

$ time ntpdate -ud 10.1.1.6 10.1.1.17 de.pool.ntp.org fr.pool.ntp.org
[snip]
transmit(10.1.1.6)
transmit(10.1.1.17)
transmit(193.218.127.251)
receive(193.218.127.251)
transmit(193.218.127.251)
receive(193.218.127.251)
transmit(193.218.127.251)
receive(193.218.127.251)
transmit(193.218.127.251)
receive(193.218.127.251)
transmit(193.218.127.251)
transmit(81.56.134.142)
receive(81.56.134.142)
transmit(81.56.134.142)
receive(81.56.134.142)
transmit(81.56.134.142)
transmit(10.1.1.6)
receive(81.56.134.142)
transmit(81.56.134.142)
receive(81.56.134.142)
transmit(81.56.134.142)
transmit(10.1.1.17)
transmit(10.1.1.6)
transmit(10.1.1.17)
transmit(10.1.1.6)
transmit(10.1.1.17)
transmit(10.1.1.6)
transmit(10.1.1.17)
10.1.1.6: Server dropped: no data
10.1.1.17: Server dropped: no data
[snip]
0.017u 0.017s 0:04.64 0.4%      56+448k 0+0io 0pf+0w

I.e. even with the two unresponsive servers I intentionally gave it, it
finishes in less than 5 seconds (the retransmits to those servers are
done at 1-second intervals, but in parallel).
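
For anyone who wants to tally a -d trace like the one above rather than eyeball it, here is a minimal Python sketch. It only assumes the "transmit(host)" / "receive(host)" line shapes shown in the trace; the function name count_events and the shortened sample are made up for illustration and are not part of ntpdate.

```python
import re
from collections import Counter

# Matches the two event shapes seen in the debug trace, e.g. "transmit(10.1.1.6)".
TRACE_LINE = re.compile(r"^(transmit|receive)\((\S+)\)$")


def count_events(trace: str):
    """Return (sent, received) counters keyed by server address."""
    sent, received = Counter(), Counter()
    for line in trace.splitlines():
        match = TRACE_LINE.match(line.strip())
        if match is None:
            continue  # skip "[snip]", "Server dropped", timing lines, etc.
        kind, host = match.groups()
        (sent if kind == "transmit" else received)[host] += 1
    return sent, received


# Shortened, illustrative excerpt in the same format as the trace above.
sample = """\
transmit(10.1.1.6)
transmit(193.218.127.251)
receive(193.218.127.251)
transmit(193.218.127.251)
transmit(10.1.1.6)
10.1.1.6: Server dropped: no data
"""

sent, received = count_events(sample)
for host in sent:
    print(host, "sent", sent[host], "received", received[host])
```

A server with transmits but no receives is one that never answered, which is the case for the two deliberately unresponsive addresses in the real trace.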

>real        14.43
>user         0.01
>sys          0.04

To get this kind of runtime, I think DNS lookup delays are the only
possible explanation. Is the DNS server local to your box, or behind the
slow network? Just out of curiosity - it doesn't help to have a local
server in the scenario where ntpdate is used, i.e. at boot, of course.

If I put an unreachable DNS server first in my resolv.conf, ntpdate's
runtime in the above case skyrockets to 45 seconds (OK, this is a
weakness:-). And while ntpd too still suffers from synchronous DNS
lookups done sequentially, the sequence of lookups is done in parallel
with its normal operation, i.e. it can at least get to work as soon as
the first answer has arrived.
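
To illustrate the general idea of not letting one slow lookup hold up the rest, here is a minimal Python sketch that runs blocking lookups in worker threads and hands back each answer as it arrives. This is not how ntpd or ntpdate is implemented; resolve_in_parallel, the stub resolver and the example host names are all made up for illustration.

```python
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed


def resolve_in_parallel(hosts, resolver=socket.gethostbyname, max_workers=8):
    """Yield (host, address or None) pairs in the order lookups finish."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(resolver, host): host for host in hosts}
        for future in as_completed(pending):
            host = pending[future]
            try:
                yield host, future.result()
            except OSError:
                # socket.gaierror is a subclass of OSError: lookup failed.
                yield host, None


def stub_resolver(host):
    """Stand-in for a real resolver so the demo needs no network."""
    if host == "down.example":
        raise OSError("lookup failed")
    return "192.0.2.1"


for host, address in resolve_in_parallel(
    ["a.example", "down.example"], resolver=stub_resolver
):
    print(host, "->", address)
```

With the default resolver the same loop would call socket.gethostbyname in each worker, so the caller can start work on the first address while the slower lookups are still outstanding.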

--Per Hedeland
per@hedeland.org
0
Reply per 2/6/2005 11:13:25 AM

Brad Knowles wrote:
>     At this point, all I'll say is that there are a lot of factors 
> involved, and if you try to set up the situation so as to be as 
> comparable as possible, ntpdate does not fare well.

Maybe you missed the data showing identical conditions and
a greater than 50:1 difference between the two? One is two notes back:
the note you replied to. There are two or three other previous posts with
detailed data showing the same thing.
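
For reference, the ratio can be recomputed from the "real" figures in the runs posted two notes back. This is a minimal Python sketch; the helper name time_to_seconds is made up for illustration, and it only handles the "0m41.01s" style of output shown in that post.

```python
import re


def time_to_seconds(stamp: str) -> float:
    """Convert a shell `time` value such as '0m41.01s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# "real" figures copied from the three runs in the earlier post.
ntpdate_real = time_to_seconds("0m0.71s")
ntpd_minpoll4_real = time_to_seconds("0m41.01s")
ntpd_minpoll1_real = time_to_seconds("0m37.00s")

for label, value in (
    ("minpoll 4", ntpd_minpoll4_real),
    ("minpoll 1", ntpd_minpoll1_real),
):
    print(f"ntpd -gq ({label}) / ntpdate -b = {value / ntpdate_real:.1f}")
```

Both ratios come out above 50, which is where the 50:1 figure comes from.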

>  Moreover, ntpdate 
> has some nasty failure modes (which have been described by others) if 
> you don't give it enough servers to check against and/or if some of them 
> are down.
> 

Including a down server. ntpd has the same problems if you don't
give it enough servers if they're down, after all.

> 
>     I know what you want to use it for.
> 
>     You want a guaranteed less-than-one-second "good enough" answer for 
> doing any necessary large-scale changes to the clock, afterwards you can 
> start up various somewhat time-sensitive applications while the system 
> can start getting into the detailed long-term clock maintenance "in the 
> background".
> 

Exactly. And ntpd itself is one of those time-sensitive applications
that sometimes happens to react badly to starting in the wrong place
(yielding drift rates pinned at +-500).
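
A quick way to spot a pinned drift file is to read the value and compare it with the limit. This is a minimal Python sketch: it assumes the drift file starts with a single frequency value in ppm, as in the "-2.300" example earlier in the thread. The function name drift_is_pinned and the 1 ppm margin are made up for illustration.

```python
MAX_FREQ_PPM = 500.0  # the +-500 ppm ceiling mentioned in this thread


def drift_is_pinned(drift_text: str, margin_ppm: float = 1.0) -> bool:
    """Return True if the drift value sits at (or within margin of) the limit.

    drift_text is the content of the drift file; only the first
    whitespace-separated field is used. margin_ppm is an arbitrary
    illustration value, not an ntpd parameter.
    """
    fields = drift_text.split()
    if not fields:
        raise ValueError("empty drift file")
    freq_ppm = float(fields[0])
    return abs(freq_ppm) >= MAX_FREQ_PPM - margin_ppm


print(drift_is_pinned("-2.300\n"))   # healthy value from the earlier post
print(drift_is_pinned("500.000\n"))  # a file pinned at the limit
```

In practice the text would come from reading the drift file itself (for example /etc/ntp.drift on the system shown earlier).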

> 
>     Problem is, the things you can do in order to get the upper limit 
> down below one second are the same sorts of things which tend to give 
> you really nasty failure modes.
> 

Yes, well, perhaps. I'll take the demonstrated and often seen failure
modes of not doing it over the theoretical ones, though.

>     I am not at all convinced that you can have your cake and eat it, 
> too -- illusions of having been able to do so in the past with ntpdate 
> notwithstanding.
> 
> 

I guess it's all a matter of experience. I assure you I have very
few illusions left.
0
Reply Tom 2/6/2005 11:51:12 AM

In article <mailman.55.1107652713.583.questions@lists.ntp.isc.org> Brad
Knowles <brad@stop.mail-abuse.org> writes:
>At 9:13 PM +0000 2005-02-05, Per Hedeland wrote:
>
>>                           I don't dispute that, but what you actually
>>  wrote was that this was something that was needed: "you can help make
>>  recovery easier by configuring the LOCAL refclock" and (without LOCAL)
>>  "it most likely won't be good and you will have to restart".
>
>	Which is precisely the behaviour that I experienced.  Without a 
>LocalCLK, my machine would never recover from a loss of LAN 
>connectivity.  With a LocalCLK, it would.  Therefore, I came to the 
>conclusion that I described.

OK, I failed to realize that you were actually talking about your
experience with the wording "I'm not entirely certain what will happen,
but it most likely won't be good". Anyway, configuring a local clock
should not be needed for this purpose, and if it was in your setup,
maybe there is a bug somewhere.

--Per Hedeland
per@hedeland.org

0
Reply per 2/6/2005 11:51:26 AM

Brad Knowles wrote:

> At 4:53 AM +0000 2005-02-06, Tom Smith wrote:
>
>>  Maybe a large part of your ntpdate time was printing out the messages.
>
>
>     Could be.  I doubt that could make it go from 0.7 seconds to 14 
> seconds, but it might have added a bit.
> <snip>


>
>     I know what you want to use it for.
>
>     You want a guaranteed less-than-one-second "good enough" answer 
> for doing any necessary large-scale changes to the clock, afterwards 
> you can start up various somewhat time-sensitive applications while 
> the system can start getting into the detailed long-term clock 
> maintenance "in the background".
>
>
>     Problem is, the things you can do in order to get the upper limit 
> down below one second are the same sorts of things which tend to give 
> you really nasty failure modes.
>
How is this different from setting the correct time by hand from your 
cell phone or wrist watch?   Doing it by hand is clumsy, slow, and, at 
best, only accurate to within a second or so (unless your reflexes are 
far faster than mine).  What nasty failure modes would that induce?  
What additional failure modes would doing it with ntpd induce?

I'm suggesting that ntpd query the usual suspects using iburst and then 
unconditionally set (not slew) the clock.  Assuming that you have a more 
or less accurate drift file, and use it, why would that not give a fast 
startup and a time accurate to within, say, twenty milliseconds?   The 
time budget would be something like eight seconds to send four queries, 
get responses, and make initial time and frequency settings.   Compared 
with 214 seconds to remove 107 milliseconds of a 127 millisecond offset, 
that looks pretty good when you are in a hurry to get your system back on 
line.
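
As a rough check of the slew arithmetic above, here is a minimal Python sketch. It assumes the clock is slewed at a constant 500 ppm, which is a simplification of what ntpd's discipline loop actually does; the function name slew_seconds is made up for illustration.

```python
MAX_SLEW_PPM = 500.0  # assumed constant slew rate, in parts per million


def slew_seconds(start_offset_s, target_offset_s, slew_ppm=MAX_SLEW_PPM):
    """Seconds needed to slew from start_offset_s down to target_offset_s."""
    to_remove = abs(start_offset_s) - abs(target_offset_s)
    if to_remove <= 0:
        return 0.0  # already within the target
    return to_remove / (slew_ppm * 1e-6)


# 127 ms down to 20 ms: removes 107 ms.
print(f"{slew_seconds(0.127, 0.020):.0f} s")
# 127 ms down to 5 ms: removes 122 ms.
print(f"{slew_seconds(0.127, 0.005):.0f} s")
```

Under that assumption, removing 107 ms takes 214 seconds, and removing 122 ms takes 244 seconds, in line with the "about 250 seconds" quoted earlier.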
0
Reply Richard 2/6/2005 1:46:47 PM

Per Hedeland wrote:

>In article <vaKdnSNBMfU2upnfRVn-uQ@comcast.com> "Richard B. Gilbert"
><rgilbert88@comcast.net> writes:
>  
>
>>I think maybe what people are looking for, that ntpdate seems to do and 
>>ntpd does not, is:
>>1.  Get the time accurate to within  X seconds quickly.
>>2.  Set that time quickly.  The presumption here is that we are in a 
>>state where it's OK to step the clock; e.g. we are booting and nothing 
>>has started yet that will be upset by a change in time.  Slewing the 
>>clock to correct an offset of, say, 127 milliseconds, to within, say, 5 
>>milliseconds,  will take about 250 seconds and that is unacceptable when 
>>we need to be within 5 milliseconds of the correct time as quickly as 
>>possible.
>>    
>>
>
>Agreed.
>
>  
>
>>The user may wish to specify the required accuracy, knowing that there 
>>is a trade off with elapsed time to achieve that accuracy.  How good 
>>"good enough" must be will vary.   Someone recording the time workers 
>>started work and the time they stopped may be satisfied with plus/minus 
>>one minute but it's 8:00AM local time and he needs it right now!  
>>Someone else might really need plus/minus ten microseconds and be 
>>willing and able to wait two or three hours to get it.
>>    
>>
>
>This I think is way beyond "what people are looking for" in this
>particular area - and I'm not sure it's very meaningful either: It makes
>no sense to set the time with great accuracy and then just leave it,
>since the clock will quickly drift away. Achieving and *maintaining*
>great accuracy, by calculating and taking the drift into account, is
>precisely what ntpd already does in its normal mode of operation, of
>course.
>  
>
I'm not suggesting that ntpd should set the clock and let it drift!  
Having once set the clock, ntpd would resume normal operation with "good 
enough" values of time and frequency.  An additional minute or two would 
be required to check the clock frequency.  If it takes two minutes and 
the frequency error is less than 500ppm, as it must be if ntpd is to 
work at all, the clock could not drift more than sixty milliseconds in 
two minutes.
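
As a rough check of the arithmetic above, here is a minimal Python sketch. It assumes that the 500 ppm ceiling mentioned above bounds both the free-running frequency error and the slew rate (which is what the 214-second figure quoted earlier implies). The function names are made up for illustration.

```python
# Assumption: 500 ppm bounds both the frequency error and the slew rate,
# as the figures in the post imply.
MAX_RATE_PPM = 500

def worst_case_drift_ms(interval_s, rate_ppm=MAX_RATE_PPM):
    """Largest offset (in ms) a clock can accumulate over interval_s seconds.

    seconds * ppm * 1e-6 gives seconds of drift; * 1e3 converts to ms,
    which simplifies to seconds * ppm / 1000.
    """
    return interval_s * rate_ppm / 1000

def slew_time_s(offset_ms, rate_ppm=MAX_RATE_PPM):
    """Seconds needed to slew away offset_ms at a rate of rate_ppm."""
    return offset_ms * 1000 / rate_ppm

print(worst_case_drift_ms(120))  # two minutes of drift at 500 ppm -> 60.0 ms
print(slew_time_s(107))          # slewing away 107 ms -> 214.0 s
print(slew_time_s(127 - 5))      # slewing 127 ms down to 5 ms -> 244.0 s
```

The last figure is the "about 250 seconds" quoted earlier in the thread.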

If a user elects to use "fast startup" he must understand and accept the 
risks and the fact that it may still take anywhere from several minutes 
to several hours to achieve the best synchronization the system is 
capable of.  I've seen "cold" startups where the time was set by ntpdate 
and no drift file take up to twelve hours to stabilize with offsets 
below 500 microseconds.  The offsets never exceeded twenty milliseconds 
and decreased steadily.

>The goal, whether achieved with a separate program or a special startup
>mode of ntpd, is to "quickly" (5 seconds sounds like a reasonable upper
>bound) set the clock "well enough" that ntpd in normal operation can
>from that point on maintain it without steps. I'm intentionally using
>"goal" rather than "requirement", because with a big clock drift and
>lack of previous knowledge of it (i.e. drift file), the goal may not be
>possible to achieve. In "normal circumstances" it should be no problem
>at all though, and ntpdate does it easily.
>
>--Per Hedeland
>per@hedeland.org
>  
>
Reply Richard 2/6/2005 2:08:39 PM

In article <ZLidneYGhOB1uZvfRVn-pA@comcast.com> "Richard B. Gilbert"
<rgilbert88@comcast.net> writes:
>Per Hedeland wrote:
>
>>In article <vaKdnSNBMfU2upnfRVn-uQ@comcast.com> "Richard B. Gilbert"
>><rgilbert88@comcast.net> writes:
>>  
>>>The user may wish to specify the required accuracy, knowing that there 
>>>is a trade off with elapsed time to achieve that accuracy.  How good 
>>>"good enough" must be will vary.   Someone recording the time workers 
>>>started work and the time they stopped may be satisfied with plus/minus 
>>>one minute but it's 8:00AM local time and he needs it right now!  
>>>Someone else might really need plus/minus ten microseconds and be 
>>>willing and able to wait two or three hours to get it.
>>>    
>>>
>>
>>This I think is way beyond "what people are looking for" in this
>>particular area - and I'm not sure it's very meaningful either: It makes
>>no sense to set the time with great accuracy and then just leave it,
>>since the clock will quickly drift away. Achieving and *maintaining*
>>great accuracy, by calculating and taking the drift into account, is
>>precisely what ntpd already does in its normal mode of operation, of
>>course.
>>  
>>
>I'm not suggesting that ntpd should set the clock and let it drift!  
>Having once set the clock, ntpd would resume normal operation with "good 
>enough" values of time and frequency.  An additional minute or two would 
>be required to check the clock frequency.  If it takes two minutes and 
>the frequency error is less than 500ppm, as it must be if ntpd is to 
>work at all, the clock could not drift more than sixty milliseconds in 
>two minutes.

OK, I probably got confused by the "The user may wish to specify the
required accuracy" part - this made me think that you were talking about
a separate program (or possibly ntpd in "-q mode"), that would exit when
the required accuracy had been achieved. In the context of a special
startup mode for ntpd, with ntpd resuming normal operation before that
accuracy was achieved, I'm not sure I understand where this "required
accuracy" would fit in... - should ntpd somehow signal the environment
at the point when it was achieved?

--Per Hedeland
per@hedeland.org
Reply per 2/6/2005 4:22:05 PM

At 11:13 AM +0000 2005-02-06, Per Hedeland wrote:

>  Well, that output only shows it doing the DNS lookups in sequence - it's
>  a bit of a pain to do it any other way given the synchronous nature of
>  gethostby*().

	You are correct.  It does the DNS queries in sequence, before it 
does anything else.  Even though we do not currently have an 
asynchronous resolver built into the system, it would still be 
possible for the program to try to process information about other 
servers (perhaps only those that are specified by IP address), while 
waiting for the answers regarding the servers it's trying to look up.
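
The idea of not letting one slow lookup hold up the rest can be sketched in Python with a thread pool around the blocking resolver call. This is only an illustration of the approach, not how ntpd or ntpdate is implemented; `resolve` and `resolve_all` are hypothetical names, and the resolver is passed in as a parameter so it can be swapped out.

```python
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed

def resolve(host):
    """Blocking lookup: return the first address found for host, NTP port."""
    infos = socket.getaddrinfo(host, 123, type=socket.SOCK_DGRAM)
    return infos[0][4][0]  # sockaddr tuple -> address string

def resolve_all(hosts, resolver=resolve, max_workers=8):
    """Yield (host, address-or-None) pairs as each lookup finishes.

    Servers given as IP addresses, or names with a fast answer, become
    usable immediately instead of queueing behind a slow or dead lookup.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(resolver, h): h for h in hosts}
        for done in as_completed(futures):
            host = futures[done]
            try:
                yield host, done.result()
            except OSError:  # socket.gaierror is a subclass of OSError
                yield host, None
```

A caller would iterate over `resolve_all([...])` and start polling each server as soon as its address arrives. The total wait is then bounded by the slowest lookup rather than the sum of all of them.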

>  To get this kind of runtime, I think DNS lookup delays is the only
>  possible explanation. Is the DNS server local to your box, or behind the
>  slow network? Just out of curiosity - it doesn't help to have a local
>  server in the scenario where ntpdate is used, i.e. at boot, of course.

	Using ntpdate exactly as I had done before, with a fresh local 
nameserver (i.e., the nameserver was just started and no other 
commands had been run which would have been likely to generate DNS 
traffic), it took about 20.5 seconds to execute.  Running the same 
command again, immediately afterwards, took about 12.5 seconds. 
Running it a third and a fourth time, it took about 8.5 and 9.9 
seconds respectively.  However, after the fourth execution, my system 
locked up to the point where I couldn't shut down some processes and 
I had to pull the plug.

	Using ntpdate exactly as before, but using my ISP's DNS servers, 
it took about 12 seconds to start up the first time, then settled 
down to a pretty reliable 10.07 seconds.  However, after the fourth 
execution of ntpdate, my system locked up again.


	Switching to testing ntpd, using my ISP's upstream nameservers, I 
can't tell you how long it took the first time, because my system 
clock was so screwed up by the previous recovery that it came up with 
a date/time stamp in 1970, after which the execution of ntpd caused 
it to reset its clock to 1934.  Running ntpd a second time, it 
executed in 10.07 seconds.

	That's when I realized that the time was horribly off, so I 
corrected the clock manually (to the second, set from my wristwatch), 
and then re-ran ntpd, taking 64.18 seconds.  The second and third 
times executed in 10.07 seconds.  The fourth time was 11.08 seconds, 
the fifth was 9.08 seconds, and the sixth was 10.08 seconds.

	Starting up my own local nameserver, the first execution of ntpd 
took 15.07 seconds.  All the subsequent executions of ntpd took 
exactly 8.07 seconds.


	Please note that all of my tests were done without a drift file. 
I didn't discover that until after I had completed them all.  With a 
drift file, I imagine that the ntpd executions could well have 
completed even faster.

	Also note that I didn't muck with minpoll, and that the versions 
of ntpdate and ntpd I'm working with are from 4.1.80-rc1@1.1111-r 
which I had built and installed myself from a BitKeeper extract of 
ntp-dev at a time that was a little before 4.2.0 was officially 
released.  So, the new "tos" option that Dr. Mills has added was not 
available to me, and I'm not testing from the latest code.

	Finally, the earlier tests I posted about were all using an 
/etc/resolv.conf which pointed first to a server on my local LAN that 
has actually been down for days, and the tests I've run here are more 
reflective of what you should expect to see in the real world, with 
properly operating nameservers.


	There is a slight advantage to having a nameserver running 
locally to the same system running ntpd, but not a whole lot. 
Obviously, you don't want to point your resolv.conf to 
non-operational nameservers, but with ntpd, this shouldn't kill you. 
So long as the nameservers are reasonably "close" to your ntpd 
server, and are not excessively overloaded, the additional delay 
should be relatively minimal.  And caching effects resulting from a 
larger central nameserver can have a big impact over a de-populated 
nameserver running on the same machine.

	However, I still see no advantage to using ntpdate over ntpd -- 
correcting for DNS caching issues, I did not see any ntpdate runs 
that were faster than the corresponding ntpd startups.  While I'm not 
100% convinced that my lockup problems are the fault of ntpdate (they 
could be an OS fault, or a problem with my particular machine), it is 
clear that ntpdate causes problems in this area for me that ntpd does 
not.


	As I said before, there are lots of factors at work here.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>

"Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety."

     -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
     Assembly to the Governor, November 11, 1755

   SAGE member since 1995.  See <http://www.sage.org/> for more info.
Reply Brad 2/6/2005 6:49:54 PM

At 11:51 AM +0000 2005-02-06, Tom Smith wrote:

>  Maybe you missed the data showing identical conditions and
>  a greater than 50:1 difference between the 2? One is 2 notes back.
>  The note you replied to. There are 2 or 3 other previous posts with
>  detailed data showing the same ting.

	Those could be reasonably explained by DNS caching effects.  When 
comparing the performance of ntpdate to ntpd, you need to compensate 
for that.

>  Including a down server. ntpd has the same problems if you don't
>  give it enough servers if they're down, after all.

	Right, but if you feed ntpdate only one server, or only three 
servers (as cut down from your "large" ntp.conf), in order to try to 
get it to start up that vitally critical few seconds faster, and that 
server is down (or one or more of those servers is down), you could 
well be seriously toasted.

	This is a case which ntpd handles better than ntpdate, given 
suitably large numbers of servers to each.

>  Yes, welll, perhaps. I'll take the demonstrated and often seen failure
>  modes of not doing it over the theoretical ones, though.

	There's a limit to what we can do when comparing the 
dain-bramaged use of simple tools with which people have in the past 
shot themselves in the foot.  The best we can do is to try to improve 
the tools in the future, so as to make it more difficult for people 
to shoot themselves in the foot.


	The problem is that while we're trying to improve the overall 
performance of the tools as-used in the field (and help protect 
people from their own stupidity), you're asking us to give you more 
thermonuclear foot-shooting leeway, because you believe that you know 
how better to deal with this problem than Dr. Mills.

	I am not convinced that these are design goals that can be made 
to be compatible.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>
Reply Brad 2/6/2005 6:58:59 PM

At 11:51 AM +0000 2005-02-06, Per Hedeland wrote:

>                                     Anyway, configuring a local clock
>  should not be needed for this purpose, and if it was in your setup,
>  maybe there is a bug somewhere.

	I will concede that I am running a somewhat old version of the 
code, and I may be experiencing problems which have already been 
corrected in more recent versions, or I may be having problems on my 
system which are unique to the OS or perhaps even unique to my 
particular machine.

	But I described the situation as I have experienced it.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>
Reply Brad 2/6/2005 7:00:36 PM

At 8:46 AM -0500 2005-02-06, Richard B. Gilbert wrote:

>  I'm suggesting that ntpd query the usual suspects using iburst and
>  then unconditionally set (not slew) the clock.

	I will allow that perhaps there should be a startup mode where 
this behaviour is used, but I do not believe that this should be the 
default.

	If nothing else, we have the 34 year problem whereby you could 
easily have your clock mis-set to a completely inappropriate value, 
if it's not set closely enough on startup.

>                                                  Assuming that you have
>  a more or less accurate drift file, and use it, why would that not give
>  a fast startup and a time accurate to within, say, twenty milliseconds?
>  The time budget would be something like eight seconds to send four
>  queries, get responses, and make initial time and frequency settings.
>  Compared with 214 seconds to remove 107 milliseconds of a 127
>  millisecond offset, that looks pretty good when you are in a hurry to
>  get you system back on line.

	I don't think that setting a hard eight second limit would be 
wise.  If nothing else, there are potentially serious delays that can 
be caused while waiting for DNS queries to be answered.

	If your primary nameserver is located on the same machine as your 
ntpd server, there could be very nasty name resolution deadlock 
issues which might result, if your startup sequence chooses to start 
named after ntpd, instead of the reverse.  Throw DNSSEC into the mix, 
and you could be in for a very serious world of hurt.


	If all goes well, even without a good local Stratum 1 timeserver, 
and without a good drift file, you can get pretty much full startup 
of ntpd in about eight seconds.

	But then all hell could break loose if things don't go well, and 
tying yourself to a hard eight second time limit would be about the 
worst possible thing you could do under those circumstances.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>
Reply Brad 2/6/2005 7:08:16 PM

Brad Knowles wrote:
> At 11:51 AM +0000 2005-02-06, Tom Smith wrote:
> 
>>  Maybe you missed the data showing identical conditions and
>>  a greater than 50:1 difference between the 2? One is 2 notes back.
>>  The note you replied to. There are 2 or 3 other previous posts with
>>  detailed data showing the same ting.
>  
>     Those could be reasonably explained by DNS caching effects.  When 
> comparing the performance of ntpdate to ntpd, you need to compensate for 
> that.
> 

Yes, that's right. By removing DNS from the equation, the
experiment clearly shows the difference between the two. When the
experiment is instead constructed to make both dependent on the
same DNS delays, it's really not very surprising that both look
similar. When designing an experiment to compare X and Y,
it's often quite useful to eliminate things that aren't X and Y.

I think what your experiment actually showed was that if you
make both dependent on DNS timeouts, you can make ntpdate as
slow as ntpd at this task.

A lot of folks don't depend on DNS in the first place and
place critical servers in /etc/hosts (or in very secure
environments use only /etc/hosts). With respect to ntpd,
a lot of folks use IP addresses in ntp.conf instead of
names if there is any doubt about DNS server availability.

>>  Including a down server. ntpd has the same problems if you don't
>>  give it enough servers if they're down, after all.
>  
>     Right, but if you feed ntpdate only one server, or only three 
> servers (as cut down from your "large" ntp.conf), in order to try to get 
> it to start up that vitally critical few seconds faster, and that server 
> is down (or one or more of those servers is down), you could well be 
> seriously toasted.
> 
>     This is a case which ntpd handles better than ntpdate, given 
> suitably large numbers of servers to each.
> 

And if you feed ntpd only one server, or only three servers, and
one or more of those servers is down, you can just as well
be just as seriously toasted with ntpd. Dumb is dumb whether
you're using a hammer or a screwdriver. How you choose boot time
servers is a consideration that needs to be made carefully no
matter which method you use. Too few, too many, or subject to
single points of failure (e.g. site power failures) are all bad
choices.

The data in fact show that ntpdate in fact made adjustments closer
to zero to an already stable time than ntpd in all but one case. Not
that I attribute that to anything other than coincidence or consider
a difference of +- 3 milliseconds of any significance at all for
the stated purpose.

>>  Yes, welll, perhaps. I'll take the demonstrated and often seen failure
>>  modes of not doing it over the theoretical ones, though.
>  
>     There's a limit to what we can do when comparing the dain-bramaged 
> use of simple tools which people have in the past shot themselves in the 
> foot.  The best we can do is to try to improve the tools in the future, 
> so as to try to make it more difficult for people to shoot themselves in 
> the foot. 
> 
>     The problem is that while we're trying to improve the overall 
> performance of the tools as-used in the field (and help protect people 
> from their own stupidity), you're asking us to give you more 
> thermonuclear foot-shooting leeway, because you believe that you know 
> how better to deal with this problem than Dr. Mills.

What I believe is that you should let Dave speak for himself and let
carefully chosen and presented data speak for you. Like Dave, I prefer
to base opinions on actual data, and, like Dave, I tend to place more
faith in opinions similarly supported.

> 
>     I am not convinced that these are design goals that can be made to 
> be compatible.
> 

Oh ye of little faith.
Reply Tom 2/6/2005 8:01:16 PM

At 8:08 PM +0100 2005-02-06, Brad Knowles quoted "Richard B. Gilbert" 
<rgilbert88@comcast.net>:

>>   The time budget would be something like eight seconds to send four
>>   queries, get responses, and make initial time and frequency settings.
>>   Compared with 214 seconds to remove 107 milliseconds of a 127
>>   millisecond offset, that looks pretty good when you are in a hurry to
>>   get you system back on line.
>
>  	I don't think that setting a hard eight second limit would be wise.
>  If nothing else, there are potentially serious delays that can be caused
>  while waiting for DNS queries to be answered.

	Thinking about this a bit more, there are factors which influence 
the startup time for ntpd (and ntpdate) which are beyond our control. 
The best I think we could do would be to identify a lower bound on 
the accuracy/precision that you would be willing to accept, and an 
upper bound on the amount of time you'd like to be spent trying to 
achieve that.

	It's easy enough to figure out that if you reach the 
accuracy/precision lower boundary before you reach the time limit, 
you could either go ahead and set the time, or continue to try to get 
more accuracy/precision up until you do reach the time limit.  That's 
easy.

	The hard part is figuring out what to do if you reach the time 
limit before you reach the accuracy/precision limit.  Do you take the 
risk of going ahead and setting the clock to a value which could turn 
out to be catastrophic?  Or do you wait until you have reached the 
specified accuracy/precision limit?
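
The two-bound idea can be written down as a small decision function. This is a minimal sketch of the trade-off described above, not a description of what ntpd does; the names (`Action`, `startup_decision`, `set_on_timeout`) are hypothetical, and the sketch sets the clock as soon as the accuracy target is met rather than refining until the deadline.

```python
from enum import Enum

class Action(Enum):
    SET_CLOCK = "accuracy target met: set the clock"
    KEEP_SAMPLING = "still inside the time budget: keep sampling"
    SET_ANYWAY = "budget spent: set the clock with what we have"
    KEEP_WAITING = "budget spent: wait for the accuracy target"

def startup_decision(elapsed_s, est_error_s, max_error_s, time_limit_s,
                     set_on_timeout=False):
    """Pick an action from the accuracy bound and the time bound."""
    # Easy case: the accuracy bound is already satisfied.
    if est_error_s <= max_error_s:
        return Action.SET_CLOCK
    # Not accurate enough yet, but time remains.
    if elapsed_s < time_limit_s:
        return Action.KEEP_SAMPLING
    # Hard case: out of time and still not accurate enough.
    # The caller has to choose which risk to accept.
    return Action.SET_ANYWAY if set_on_timeout else Action.KEEP_WAITING

print(startup_decision(9.0, 0.100, 0.020, 8.0))
print(startup_decision(9.0, 0.100, 0.020, 8.0, set_on_timeout=True))
```

The code itself is trivial. The difficulty is entirely in choosing `set_on_timeout`, which is the policy question raised above.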


	I think that ntpd has achieved the best overall balance between 
these two issues that it could reasonably do so far, and while I 
think we may be able to tune this further (perhaps by using an async 
resolver, among other things), I don't know how much more improvement 
we can make in this area.

	The introduction of the new "tos" control by Dr. Mills will give 
you more thermonuclear foot-shooting room.  However, I am not at all 
convinced that it should actually be used by applications where those 
few additional seconds of startup time are critical, for the reasons 
previously discussed -- if you're that sensitive to startup time, 
then you're almost certainly also more sensitive to time 
accuracy/precision, and you're in precisely the situation where you 
should avoid scratching that itch.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>
Reply Brad 2/6/2005 8:12:37 PM

At 3:01 PM -0500 2005-02-06, Tom Smith wrote:

>  I think what your experiment actually showed was that if you
>  make both dependent on DNS timeouts, you can make ntpdate as
>  slow as ntpd at this task.

	No, my results clearly show that ntpdate is roughly as slow as 
(or slower than) ntpd on startup for comparable sets of servers, 
independent of DNS slowdowns.

>  A lot of folks don't depend on DNS in the first place and
>  place critical servers in /etc/hosts (or in very secure
>  environments use only /etc/hosts). With respect to ntpd,
>  a lot of folks use IP addresses in ntp.conf instead of
>  names if there is any doubt about DNS server availability.

	As we know, IP addresses of servers can change.  And when you're 
talking about services like pool.ntp.org, since you're using a DNS 
round-robin "rotor", the IP addresses are supposed to change on every 
query.

	So, just using IP addresses alone does not work in the general 
case.  Indeed, with the recent changes in the Debian, Gentoo, 
OpenBSD, NetBSD, and FreeBSD camps, I would submit that there are 
probably now more people who are dependent on using pool.ntp.org than 
have ever previously hand-coded their own ntp.conf and plugged in 
primarily servers by IP address or which were specified in their 
/etc/hosts.  Then add to that all the MacOS X clients that are using 
a time server provided by Apple, and the Windows clients that are 
using a time server provided by Microsoft, and you push out the 
numbers of DNS-dependent clients much, much further.


	If you want to compare carefully created hand-crafted 
configurations to anything else, you can show anything you want. 
That's a clear case of rigging the jury.

>  And if you feed ntpd only one server, or only three servers, and
>  one or more of those servers is down, you can just as well
>  be just as seriously toasted with ntpd. Dumb is dumb whether
>  you're using a hammer or a screwdriver.

	If you want to make a tool analogy, try the model of a Yankee 
screwdriver as compared to a regular one, or a power model.  Anyone 
who has ever used a Yankee screwdriver knows that they can be 
powerful and faster than a regular model, but they are also much more 
likely to seriously injure you than either a regular screwdriver or 
an electric one, exclusively because of the inherent design 
differences.

	Experience teaches us that ntpdate is far more likely to be 
abused in stupid ways than ntpd, although stupidity with either can 
be fatal.

>  The data in fact show that ntpdate in fact made adjustments closer
>  to zero to an already stable time than ntpd in all but one case.

	No, the data doesn't show that.  Your data is clearly different 
from my data, and I've gone to significant lengths to try to make the 
comparisons as clear and simple as possible.

>  What I beieve is that you should let Dave speak for himself and let
>  carefully chosen and presented data speak for you. Like Dave, I prefer
>  to base opinions on actual data, and, like Dave, I tend to place more
>  faith in opinions similarly supported.

	Well, we've got some actual data here, and I don't see any 
practical advantage to using ntpdate over ntpd.

>>      I am not convinced that these are design goals that can be made
>>  to be compatible.
>
>  Oh ye of little faith.

	You're always welcome to step up to the plate and contribute code 
which proves your claims.  This is an open source project, after all.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>
Reply Brad 2/6/2005 8:29:51 PM

Abandoning the right to remain silent, Brad Knowles at Sun, 06 Feb 2005
19:49:54 +0100 said:

> At 11:13 AM +0000 2005-02-06, Per Hedeland wrote:
> 
>>  Well, that output only shows it doing the DNS lookups in sequence - it's
>>  a bit of a pain to do it any other way given the synchronous nature of
>>  gethostby*().
> 

I tried it by doing a 'ntpq -n -c pe' and putting the resulting 16 IPs in
"ntpdate -d ..."

The first run took ~19 secs, most of which was *reverse* lookups of all
the IPs.

The following 3 runs each took between 3 and 4 secs.

I do run a nameserver on the same box. Apart from caching responses, this
smooths out response from dead nameservers because named caches the
response time and tries the relevant NS with the best response time.

-- 
Avoid reality at all costs.
$email =~ s/n(.)a(.)n(.)a(.)e(.+)invalid/$1$2$3$4$5au/;
icbm: 33.43.46S 150.59.27E

Reply You 2/8/2005 10:05:51 AM

At 10:05 AM +0000 2005-02-08, You have no need to know wrote:

>  I tried it by doing a 'ntp -n -c pe' and putting the resulting 16 IPs in
>  "ntpdate -d ..."
>
>  The first run took ~19 secs, most of which was *reverse* lookups of all
>  the IPs.
>
>  The following 3 runs each took between 3 and 4 secs.

	Listing four servers by IP address and making use of iburst, 
Steve has demonstrated that you can reliably get initial startup of 
ntpd in seven seconds.  Listing six servers by IP address and using 
iburst, along with Dr. Mills' recommendation of "tos minclock 4 
minsane 4", Steve has shown that you can get startup in eleven 
seconds.

	In my own testing, I've demonstrated that listing nine servers by 
host name (not IP address), with a good local nameserver running on 
the same machine, along with using iburst, you can get startup in 
eight seconds once the DNS cache is primed, which worsens to twenty 
seconds when the DNS cache is empty.

	But if seven seconds still isn't fast enough, see 
<https://ntp.isc.org/bin/view/Support/StartingNTP4#Section_6.1.4.4.>.


	As far as I'm concerned, seven to eight seconds is more than fast 
enough for all situations I can think of, including the financial 
industry where you might be losing $50 million per second of 
downtime, or military applications.

	However, if you absolutely, positively, must have initial ntpd 
startup in less time than that, otherwise you will die the most 
horrible and painful possible death, other options are documented 
that can get your startup down as low as three seconds.

	If that's not fast enough for you, then you might as well give up hope.

-- 
Brad Knowles, <brad@stop.mail-abuse.org>
Reply Brad 2/8/2005 10:23:20 AM

Tom Smith wrote:
 > [snip]
> # time ntpdate -b [3 servers]
>  5 Feb 22:58:12 ntpdate[234803]: step time server [IP] offset -0.000136 sec
> 
> real    0m0.71s [you'll recall that with all the above servers, including
> user    0m0.00s  the one that's down, this still took only 6.73 seconds]
> sys     0m0.00s
> 
> # time ntpd -gq -c ntp.boot [same 3 servers, iburst minpoll 4]
> ntpd: time slew 0.001556s
> 
> real    0m41.01s
> user    0m0.06s
> sys     0m0.60s
> 
> # time ntpd -gq -c ntp.boot [same 3 servers, iburst minpoll 1]
> ntpd: time slew 0.002653s
> 
> real    0m37.00s
> user    0m0.10s
> sys     0m0.56s

As it turns out, this happened to have been using V4.1.1. The 4.2.0
build on this system was defective and had been moved out of the way.

Retested identically with 4.2.0:

# cat /etc/ntp.drift
-2.157

# time ntpdate -b [3 servers]
Looking for host [name] and service ntp
host found : [name]
Looking for host [name] and service ntp
host found : [name]
Looking for host [name] and service ntp
host found : [name]
  8 Feb 16:47:53 ntpdate[488600]: step time server [IP] offset -0.000395 sec

real    0m0.71s
user    0m0.00s
sys     0m0.01s

# time ntpd -gq -c ntp.boot [same 3 servers, iburst minpoll 1]
ntpd: time slew -0.000711s

real    0m7.13s
user    0m0.00s
sys     0m0.03s

So, without DNS in the way, 4.2.0 gets it from a 50:1 difference to a
10:1 difference. A big improvement, to be sure, but this is only with
3 servers (none of them the local clock, of course).
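
The ratios can be recomputed from the "real" times reported above. A minimal Python sketch follows; only the wall-clock figures are taken from the runs shown, and the labels are informal.

```python
# "real" wall-clock seconds copied from the timing output above.
NTPDATE_S = 0.71
NTPD_RUNS = {
    "ntpd 4.1.1, iburst minpoll 4": 41.01,
    "ntpd 4.1.1, iburst minpoll 1": 37.00,
    "ntpd 4.2.0, iburst minpoll 1": 7.13,
}

def ratio(ntpd_s, ntpdate_s=NTPDATE_S):
    """How many times longer the ntpd run took than the ntpdate run."""
    return ntpd_s / ntpdate_s

for label, seconds in NTPD_RUNS.items():
    print(f"{label}: about {ratio(seconds):.0f}:1")
```

This gives roughly 58:1 and 52:1 for the two 4.1.1 runs and about 10:1 for 4.2.0, consistent with the "50:1" and "10:1" summary.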

-Tom
Reply Tom 2/8/2005 10:09:06 PM

At 5:09 PM -0500 2005-02-08, Tom Smith wrote:

>  As it turns out, this happened to have been using V4.1.1. The 4.2.0
>  build on this system was defective and had been moved out of the way.

	As Dr. Mills has said, the code has been significantly improved 
since then.  Even the current ntp-dev is significantly improved over 
4.2.0-REL.  Try building from the latest snapshot tarball.  That's 
what Steve and I have been doing our most recent testing with.

>  So, without DNS in the way, 4.2.0 gets it from a 50:1 difference to a
>  10:1 difference. A big improvement, to be sure, but this is only with
>  3 servers (none of them the local clock, of course).

	To the best of my knowledge, all of our collected experience so 
far is detailed at 
<https://ntp.isc.org/bin/view/Support/StartingNTP4>, with the issues 
regarding sensitivity to startup delays in section 6.1.4.4.

	In short, with a reasonable number of servers (i.e., six to 
nine), with a reasonable /etc/ntp.conf (making use of iburst, "tos 
minclock 4 minsane 4", etc...), using NTP servers by hostname instead 
of IP address, making sure that the NTP servers are "good" as well as 
close by (within 20-50ms delay), a good nameserver running on the 
local machine, etc... you should be able to see startup times on the 
order of fifteen seconds.  I think that's perfectly reasonable for 
virtually all situations, including high-value financial businesses 
(including those which would lose $50 million per second of downtime) 
and military applications.

	However, if you absolutely positively have to get that down 
further, then listing six servers by IP address and not name, you 
should be able to see startup times around eleven seconds.  Removing 
the "tos minclock 4 minsane 4" and using four servers listed by IP 
address, you should be able to get down to seven seconds.

	If you must cut that down further, you can make use of "tos 
maxdist 16", and get it all the way down to three seconds, but 
further reductions in the number of servers won't help you, nor will 
anything else we've tried.  However, if you do that, then I believe 
that you will get what you deserve.
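
For concreteness, the three IP-address variants described above can be written out as ntp.conf text. The Python sketch below only renders the files. The server addresses are placeholders from the 192.0.2.0/24 documentation range, the drift file path is assumed, and using four servers in the "maxdist" variant is a guess, since the count is not stated above. `render_ntp_conf` is a hypothetical helper, and the startup times in the comments are the ones claimed in this thread, not measurements.

```python
def render_ntp_conf(servers, tos=None, driftfile="/etc/ntp.drift"):
    """Return ntp.conf text: a driftfile line, an optional tos line,
    and one 'server <addr> iburst' line per server."""
    lines = [f"driftfile {driftfile}"]
    if tos:
        lines.append(f"tos {tos}")
    lines.extend(f"server {s} iburst" for s in servers)
    return "\n".join(lines) + "\n"

# Placeholder addresses (documentation range), not real NTP servers.
SERVERS = [f"192.0.2.{n}" for n in range(1, 7)]

# Six servers by IP with the stricter tos settings: ~11 s per the thread.
conservative = render_ntp_conf(SERVERS, tos="minclock 4 minsane 4")
# Four servers by IP, no tos line: ~7 s per the thread.
faster = render_ntp_conf(SERVERS[:4])
# Adding "tos maxdist 16": ~3 s per the thread, with the caveats above.
riskiest = render_ntp_conf(SERVERS[:4], tos="maxdist 16")

print(conservative)
```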


	Also note that the lowest minpoll the code allows is four.  If 
you try to go below that, it silently limits you to this floor.
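
Poll values in NTP are exponents of two, in seconds, so a floor of four means a shortest poll interval of 16 seconds. A minimal sketch of the clamping described above, assuming that floor applies to the version in use (`effective_poll_seconds` is a made-up name):

```python
MINPOLL_FLOOR = 4  # floor stated above; assumed to apply to this ntpd version

def effective_poll_seconds(requested_minpoll, floor=MINPOLL_FLOOR):
    """Poll interval in seconds after the requested exponent is clamped."""
    exponent = max(requested_minpoll, floor)
    return 2 ** exponent

print(effective_poll_seconds(1))  # "minpoll 1" behaves like minpoll 4: 16 s
print(effective_poll_seconds(6))  # minpoll 6: 64 s
```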

-- 
Brad Knowles, <brad@stop.mail-abuse.org>
Reply Brad 2/8/2005 10:41:16 PM

Brad,

Let me be sure this is your intent. Your minclock 4 says you must have 
at least 4 servers pass the maxdist threshold on the way down before 
setting the clock. Against my better judgement the default minclock is 
1, so ordinarily the clock is set by the first server found, and it 
might be a falseticker. I am happier with 4, because that's the 
Byzantine minimum that reliably kicks out a single falseticker. Your 
minsane 4 requires the clustering algorithm to stop tossing out outliers 
when 4 remain. Also good Byzantine. But, if you did that, it would be a 
good idea to use a couple more servers, say a total of 6, so the 
clustering algorithm could improve the quality. However, with minclock 4, 
if for some reason fewer than that number of servers were actually 
available, the clock would never be set.
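
A minimal sketch of the arithmetic behind "4 is the Byzantine minimum", assuming this refers to the classical Byzantine agreement bound of 3f + 1 participants to outvote f faulty ones. The function names are made up for illustration, and this is only the counting argument, not ntpd's selection or clustering code.

```python
def min_servers(falsetickers):
    """Classical Byzantine bound: n >= 3f + 1 servers to outvote f faulty ones."""
    return 3 * falsetickers + 1

def tolerated_falsetickers(servers):
    """Largest f with 3f + 1 <= servers (0 when there are fewer than 4)."""
    return max((servers - 1) // 3, 0)

for n in (1, 3, 4, 6, 7):
    print(f"{n} servers -> tolerates {tolerated_falsetickers(n)} falseticker(s)")
```

Under that assumption, going from four to six servers does not raise the number of tolerated falsetickers; the extra two give the clustering step more candidates to choose from, as suggested above.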

Is that what you had in mind? I have no problem with it, but I do want 
to make sure the model is clearly understood.

Dave

Brad Knowles wrote:

> At 5:09 PM -0500 2005-02-08, Tom Smith wrote:
> 
>>  As it turns out, this happened to have been using V4.1.1. The 4.2.0
>>  build on this system was defective and had been moved out of the way.
> 
> 
>     As Dr. Mills has said, the code has been significantly improved 
> since then.  Even the current ntp-dev is significantly improved over 
> 4.2.0-REL.  Try building from the latest snapshot tarball.  That's what 
> Steve and I have been doing our most recent testing with.
> 
>>  So, without DNS in the way, 4.2.0 gets it from a 50:1 difference to a
>>  10:1 difference. A big improvement, to be sure, but this is only with
>>  3 servers (none of them the local clock, of course).
> 
> 
>     To the best of my knowledge, all of our collected experience so far 
> is detailed at <https://ntp.isc.org/bin/view/Support/StartingNTP4>, with 
> the issues regarding sensitivity to startup delays in section 6.1.4.4.
> 
>     In short, with a reasonable number of servers (i.e., six to nine), 
> with a reasonable /etc/ntp.conf (making use of iburst, "tos minclock 4 
> minsane 4", etc...), using NTP servers by hostname instead of IP 
> address, making sure that the NTP servers are "good" as well as close by 
> (within 20-50ms delay), a good nameserver running on the local machine, 
> etc... you should be able to see startup times on the order of fifteen 
> seconds.  I think that's perfectly reasonable for virtually all 
> situations, including high-value financial businesses (including those 
> which would lose $50 million per second of downtime) and military 
> applications.
> 
>     However, if you absolutely positively have to get that down further, 
> then by listing six servers by IP address rather than by name, you should 
> be able to see startup times around eleven seconds.  By removing the "tos 
> minclock 4 minsane 4" and using four servers listed by IP address, you 
> should be able to get down to seven seconds.
> 
>     If you must cut that down further, you can make use of "tos maxdist 
> 16", and get it all the way down to three seconds, but further 
> reductions in the number of servers won't help you, nor will anything 
> else we've tried.  However, if you do that, then I believe that you will 
> get what you deserve.
> 
> 
>     Also note that the lowest minpoll the code allows is four.  If you 
> try to go below that, it silently limits you to this floor.
> 
Reply David 2/9/2005 12:43:14 AM

At 05:05 AM 2/8/2005, You have no need to know wrote:
>Abandoning the right to remain silent, Brad Knowles at Sun, 06 Feb 2005
>19:49:54 +0100 said:
>
> > At 11:13 AM +0000 2005-02-06, Per Hedeland wrote:
> >
> >>  Well, that output only shows it doing the DNS lookups in sequence - it's
> >>  a bit of a pain to do it any other way given the synchronous nature of
> >>  gethostby*().
> >
>
>I tried it by doing a 'ntpq -n -c pe' and putting the resulting 16 IPs in
>"ntpdate -d ..."
>
>The first run took ~19 secs, most of which was *reverse* lookups of all
>the IPs.

Did you create a query log in your DNS? If so can you send it to me
so I can analyze it? Did you do the same thing running with ntpd?

You can send this to me directly rather than to the mailing list.

Danny

>The following 3 runs each took between 3 and 4 secs.
>
>I do run a nameserver on the same box. Apart from caching responses, this
>smooths out response from dead nameservers because named caches the
>response time and tries the relevant NS with the best response time.
>
>--
>Avoid reality at all costs.
>$email =~ s/n(.)a(.)n(.)a(.)e(.+)invalid/$1$2$3$4$5au/;
>icbm: 33.43.46S 150.59.27E
>
>_______________________________________________
>questions mailing list
>questions@lists.ntp.isc.org
>https://lists.ntp.isc.org/mailman/listinfo/questions

Reply Danny 2/9/2005 1:58:20 AM

Hello,

Is there any specific reason behind the change in name for xntpd in NTP 
version 4?
There are many scripts which assume "xntpd" as the name. Is it a good 
idea to rename ntpd to xntpd and use it?

Regards,
Sajitha



Reply Sajitha 2/13/2005 7:20:15 AM

It should never have been released as xntpd in the first place.

H
Reply Harlan 2/14/2005 4:40:58 AM

"David L. Mills" <mills@udel.edu> wrote:

> Let me be sure this is your intent. Your minclock 4 says you must have
> at least 4 servers pass the maxdist threshold on the way down before
> setting the clock.

That's basically what I had understood.

>                           Against my better judgement the default
> minclock is 1, so ordinarily the clock is set by the first server found,
> and it might be a falseticker. I am happier with 4, because that's the
> Byzantine minimum that reliably kicks out a single falseticker.

Which was the recommendation of your message in
<http://lists.ntp.isc.org/pipermail/questions/2003-September/000737.html>.

>                                               Your
> minsane 4 requires the clustering algorithm to stop tossing out outliers
> when 4 remain. Also good Byzantine. But, if you did that, it would be a
> good idea to use a couple more servers, say a total of 6, so the
> clustering algorithm could improve the quality. However, with minclock 4,
> if for some reason fewer than that number of servers were actually
> available, the clock would never be set.

I am currently using more than six upstream servers.  Actually, I'm
currently using fourteen, and I should kick that down quite a bit, so as to
reduce my use of the pool.ntp.org servers.

> Is that what you had in mind? I have no problem with it, but I do want
> to make sure the model is clearly understood.

Yup, that was what I wanted.  However, based on your comments here, I
should update the page at
<https://ntp.isc.org/bin/view/Support/StartingNTP4#Section_6.1.4.3.> to
include the appropriate caveats.

Thanks!

-- 
Brad Knowles, <brad@stop.mail-abuse.org>

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
    -Benjamin Franklin, Historical Review of Pennsylvania.

  SAGE member since 1995.  See <http://www.sage.org/> for more info.
Reply brad 2/15/2005 9:41:13 AM

> <https://ntp.isc.org/bin/view/Support/StartingNTP4#Section_6.1.4.3.> to

This page glosses over one important little detail.  Boot order.  Can
the folks pushing for "ntpd -g -N" over "ntpdate" please put a
concrete boot order proposal forward?  Of particular interest would be
to see their proposed ordering for:

    bind ntpd syslogd

Under the old way of doing things we had a relatively clean order of:

    ntpdate
    syslogd
    bind
    ntpd

That allowed for: 1) the rough date to be set before syslogd started
writing into the logfiles.  2) bind to be started before the symbolic
names in ntp.conf needed processing.  3) the ntpd (and bind) startup
messages would be saved to the syslogs so one had a record if things
went wrong.  One still needed to have some of the ntpd server names in
/etc/hosts for ntpdate, but one didn't need to have all of the names
in there and the IP addresses didn't all have to be 100% correct since
a few resolve failures at this point wouldn't really matter.
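
A minimal sketch of the old ordering expressed as dependencies, assuming Python 3.9+ for the standard-library `graphlib` module. The dictionary name `starts_after` is made up for illustration, and the edges are one reading of the three reasons given above; this does not describe any particular distribution's init scripts.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each service maps to the set of services that must start before it.
starts_after = {
    "ntpdate": set(),
    "syslogd": {"ntpdate"},          # rough date is set before logging begins
    "bind":    {"syslogd"},          # bind startup messages reach the syslogs
    "ntpd":    {"bind", "syslogd"},  # names resolvable, startup messages logged
}

boot_order = list(TopologicalSorter(starts_after).static_order())
print(boot_order)
```

A proposal for "ntpd -g -N" without ntpdate could be tested the same way: drop the ntpdate entry, adjust the edges, and see whether a consistent order still exists.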

-wolfgang
-- 
Wolfgang S. Rupprecht                http://www.wsrcc.com/wolfgang/
     Hate software patents?  Sign here: http://thankpoland.info/
Reply Wolfgang 2/15/2005 5:47:59 PM

http://ntp.isc.org/Support/StartingNTPDev would be a great place to discuss
this as well.

H
Reply Harlan 2/15/2005 11:41:46 PM
