Stabilizing the drift file?

Hi,

Has anybody else had the problem of needing to stabilize the frequency discipline
faster than the normal 15 minutes to hours? Does any solution exist?

We have a requirement that a replaced board (a fresh one from the factory,
for which the correct content of the drift file is unknown) be up in a cluster after 5 minutes,
which also means that its time must not stray more than 10 ms (within the cluster).

Any tips?
We tried here to "play" with minpoll, burst, initial drift files (value 0).


Regards, Robert

Reply Robert 11/22/2006 10:01:19 AM

"Robert Wachinger" <job@Robert-Wachinger.de> wrote in message
news:ek175f$sp9$1@daniel-new.mch.sbs.de...
[...]
> We have here the requirement to have a replaced board (a fresh one
> from the factory, where a correct content of the drift file is
> unknown) up after 5 minutes in a cluster, which also means, that
> its time does not stray more than 10ms (within the cluster).
>
> Any tips?
> We tried here to "play" with minpoll, burst, initial drift files
> (value 0).

Iburst would start you with a low offset. But that has really
absolutely nothing to do with intrinsic frequency error.

Can you run (NTP on) the board outside the cluster? Your problem is
that you don't know the correct drift value. It seems simple then:
you should find out.

I'm not sure if limiting maxpoll is guaranteed to keep your offset
low the way running ntpdate often would; frankly I doubt it.

Doesn't the cluster allow you to add nodes in standby mode?

Groetjes,
Maarten Wiltink


Reply Maarten 11/22/2006 10:51:51 AM


Maarten Wiltink <maarten@kittensandcats.net> wrote:
> "Robert Wachinger" <job@Robert-Wachinger.de> wrote in message
> news:ek175f$sp9$1@daniel-new.mch.sbs.de...
> [...]
>> We have here the requirement to have a replaced board (a fresh one
>> from the factory, where a correct content of the drift file is
>> unknown) up after 5 minutes in a cluster, which also means, that
>> its time does not stray more than 10ms (within the cluster).
>>
>> Any tips?
>> We tried here to "play" with minpoll, burst, initial drift files
>> (value 0).

> Iburst would start you with a low offset. But that has really
> absolutely nothing to do with intrinsic frequency error.

> Can you run (NTP on) the board outside the cluster? Your problem is
> that you don't know the correct drift value. It seems simple then:
> you should find out.

That would mean additional manual handling (and is therefore costly).

> I'm not sure if limiting maxpoll is guaranteed to keep your offset
> low the way running ntpdate often would; frankly I doubt it.

So there is no way to reduce the time until the drift value is stable enough?

> Doesn't the cluster allow you to add nodes in standby mode?

Hm, nice idea. Maybe that could work ...

Regards, Robert

Reply Robert 11/22/2006 12:28:27 PM

Use "iburst" on all your server statements in ntp.conf.  Be sure to use 
-q as an option when you start ntpd.  This may not be enough but there 
is not much more you can do.  A very good server or several very good 
servers may also help.  The goodness of a server is strongly dependent 
on the quality of the network path between the server and your client.

Is it possible to run this new board standalone before joining the 
cluster?  If so, that could get you a valid drift file.
Reply Richard 11/22/2006 3:02:00 PM

Robert,

I've had this discussion literally hundreds of times over the last 25 
years. Somebody tells me they absolutely must have residual error less 
than x milliseconds guaranteed not more than y minutes after startup. But 
this pits the Principle of Least Astonishment against the constraints of 
physics. There is a tradeoff between the precision in estimating the 
intrinsic frequency of the oscillator and the time to make and refine 
that estimate.

When first starting ntpd without the drift file, by default the state 
machine takes fifteen minutes to directly compute the initial frequency 
estimate within about 1 PPM, then enables the native clock discipline 
algorithm. You can change this using the "tinker stepout" command to 
shorten the initial time; however, the estimation error will be worse 
and could well cause the succeeding offsets to exceed 10 ms until the 
loop settles down.

The clock discipline algorithm takes far longer than five minutes to 
refine the time and frequency estimate within 10 ms confidence. At the 
lowest poll interval of 16 s, the discipline crosses zero in about 
fifteen minutes, but the frequency convergence can take four times as 
long. So, even if you shorten the stepout threshold to five minutes and 
rely on the discipline to complete the initial convergence, you are 
still left with some uncertainty that your 10-ms constraints might be 
violated.

Notwithstanding the constraints you are faced with, the following are 
your ONLY choices:

1. Pay more money for the oscillator and/or select the crystal within a 
tolerance of 10 PPM or less. That's what Digital did for the Alpha. I've 
never seen an Alpha with more than 5 PPM frequency error.

2. Provide a fine oscillator frequency adjustment (VXO) and calibrate 
during initial test.

3. Measure the oscillator frequency error during manufacturing and save 
this in the BIOS flash where the operating system can find it.

4. Set the stepout interval to five minutes and accept what errors 
remain once the state machine has enabled the discipline.

5. Don't do anything, and require that a hot spare always be available with a 
burn-in of several hours.

Dave

Reply mills 11/22/2006 4:42:35 PM

Hi Dave,

mills@udel.edu wrote:
> Robert,

> I've had this discussion literally hundreds of times over the last 25 
> years.

So this seems to be a heavy case of a FAQ. (I did not find a proper topic
in the FAQ, hope I was not too blind).

Nevertheless: Thank you for your clear answer. It helps.

Robert

Reply Robert 11/23/2006 11:03:09 AM

mills@udel.edu wrote :

> When first starting ntpd without the drift file, by default the state
> machine takes fifteen minutes to directly compute the initial frequency
> estimate within about 1 PPM, then enables the native clock discipline
> algorithm.

Hi Dave.

Your statement is correct for the complete variety of platforms ntp
supports.
However, on a Linux system (with at least 1us time resolution) you can
get reasonable estimates for the drift much faster.
Have a look at my script in
https://ntp.isc.org/bugs/show_bug.cgi?id=742
Changing the stepout as you proposed should give the same results (on
Linux).
Unfortunately, I didn't understand the meaning of 'stepout' from the
documentation. ;-(

Bye
 Juergen

Reply Juergen 11/23/2006 1:13:54 PM

Juergen,

No, you can't always measure the frequency accurately. It depends on the 
initial offset when you start and what happens after the initial 
frequency measurement. As most folks don't want to set the clock 
directly unless forced, the daemon allows an initial offset of up to the 
step threshold (128 ms) and does the best it can after the stepout 
interval (900 s). Note that the code carefully separates the frequency 
measurement from the time offset measurement, so even after an accurate 
frequency measurement the discipline has to wrangle both the time and 
frequency when the initial time is as much as 128 ms in error. This 
initial transient can take several hours to dissipate.

This could be avoided by forcefully setting the clock at initial 
startup, but in some systems the setting function can have a large error 
in and of itself. The only workaround would be to disable all 
applications until the residual offset was within the tolerance, circa 
10 ms. Setting the stepout interval to something less (300 s) moves up the 
time the discipline is enabled, but brings with it additional error due to 
the expected error in the offset measurements. For instance, if the 
expected error was 10 ms in a 300-s stepout interval, the maximum 
frequency error could be as high as 67 PPM. The discipline could take up 
to a day to amortize this error.
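
The 67 PPM figure can be reproduced with simple arithmetic. This is a minimal sketch under an assumption that is not stated in the post: the offset measured at each end of the stepout interval may be wrong by up to the expected error, in opposite directions, so the worst case is twice the error divided by the interval. The function name is made up for illustration.

```python
def max_freq_error_ppm(offset_error_s: float, interval_s: float) -> float:
    """Worst-case frequency error in PPM when the offsets at both ends of the
    interval may each be wrong by up to offset_error_s (assumed: opposite signs)."""
    return 2.0 * offset_error_s / interval_s * 1e6


# 10 ms expected error with a shortened 300-s stepout interval
print(round(max_freq_error_ppm(0.010, 300.0)))
# the same error with the default 900-s stepout interval
print(round(max_freq_error_ppm(0.010, 900.0)))
```

Under this assumption, tripling the interval cuts the bound to a third, which is the tradeoff between measurement time and frequency precision described earlier in the thread.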

Dave

Reply David 11/23/2006 5:00:07 PM

Hi Dave.

> This could be avoided by forcefully setting the clock at initial
> startup

That's exactly what we do.

You have to know that Robert and I are working on the same project.
Robert is a systems engineer of the platform we use to build our
device.

The requirements of max. 100ms external offset, max. 10ms internal
offset and 5 minutes until service availability are coming from our
requirement specification. ;-)
And we are always talking about a reboot scenario. So we don't need to
compensate large offsets.

What we do now during reboot is
a) Measure the drift with my script for 2 seconds
b) Set the time with ntpdate -b
c) start ntpd with "iburst burst minpoll 4 maxpoll 6" for the
controller blades requiring <100ms offset and with "iburst burst
minpoll 4 maxpoll 4" for those blades requiring <10ms among themselves

So ntpd can start its operation with (almost) no initial offset and a
drift that is +-2ppm away from the long-term value.
This gives us a max. offset of <<1 ms right from the start.
And that's what we were looking for. :-)

Bye
 Juergen

Reply Juergen 11/24/2006 2:36:09 PM

>a) Measure the drift with my script for 2 seconds

>So ntpd can start its operation with (almost) no initial offset and a
>drift that is +-2ppm away from the long-term value.

Are you sure that is going to work correctly?

If one of those readings is off by 5 microseconds, that will turn
into an error of 2.5 ppm.
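
The 2.5 ppm figure follows directly from dividing the timestamp error by the length of the measurement window. The sketch below shows that arithmetic and its inverse (how long a window would be needed for a given target). The function names are hypothetical, and the model is deliberately simple: one bad reading, no averaging.

```python
def freq_error_ppm(timestamp_error_s: float, window_s: float) -> float:
    """Frequency error (PPM) caused by one timestamp being off by
    timestamp_error_s over a measurement window of window_s seconds."""
    return timestamp_error_s / window_s * 1e6


def window_needed_s(timestamp_error_s: float, target_ppm: float) -> float:
    """Measurement window (seconds) needed so that the same timestamp error
    stays within target_ppm."""
    return timestamp_error_s / (target_ppm * 1e-6)


# 5 microseconds of reading error over a 2-second measurement
print(f"{freq_error_ppm(5e-6, 2.0):.1f} ppm")
# window needed to push the same reading error down to 1 ppm
print(f"{window_needed_s(5e-6, 1.0):.1f} s")
```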

-- 
These are my opinions, not necessarily my employer's.  I hate spam.

Reply hal 11/24/2006 5:59:48 PM

Juergen,

Assuming your system specification allows for up to 10 ms measurement 
error over two seconds, your script can result in a maximum frequency 
error of 5000 PPM. Golly.

Dave

Reply David 11/24/2006 8:27:04 PM

Hi Hal.

> If one of those readings is off by 5 microseconds, that will turn
> into an error of 2.5 ppm.

As I  wrote, +-2ppm is fine. Even +-5ppm would be sufficient not to
exceed 10ms offset in the transient phase after ntpd is started up.
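
A rough way to see why a few ppm may be tolerable here: a residual frequency error accumulates offset linearly with time. The sketch below assumes a free-running clock that starts at zero offset and receives no correction at all, which is pessimistic once ntpd is polling every 16 s (minpoll 4). The function names are made up for illustration.

```python
def accumulated_offset_ms(freq_error_ppm: float, elapsed_s: float) -> float:
    """Offset (ms) built up by an uncorrected frequency error over elapsed_s."""
    return freq_error_ppm * 1e-6 * elapsed_s * 1e3


def seconds_until_offset(freq_error_ppm: float, limit_ms: float) -> float:
    """Time (s) an uncorrected clock needs to drift by limit_ms."""
    return (limit_ms * 1e-3) / (freq_error_ppm * 1e-6)


# drift accumulated between two polls at minpoll 4 (16 s) with 5 ppm error
print(f"{accumulated_offset_ms(5.0, 16.0):.3f} ms per poll interval")
# time needed to reach the 10 ms limit with no correction at all
print(f"{seconds_until_offset(5.0, 10.0):.0f} s to reach 10 ms")
```

This only bounds the drift contribution. It says nothing about the transient behaviour of the discipline loop itself, which is the concern raised elsewhere in the thread.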

Bye
 Juergen

Reply Juergen 11/26/2006 9:00:41 AM

Hi Dave.

> Assuming your system specification allows for up to 10 ms measurement
> error over two seconds, your script can result in a maximum frequency
> error of 5000 PPM. Golly.

With 10ms measurement error, we would never be able to fulfil the
requirement of less than 10ms offset between the blades. ;-(

However, the blades are communicating with each other via a gigabit
ethernet backplane.
The poll delay is usually less than 150us. And to cope with single
outliers, I use '-p 8' in the script. In addition, the script uses the
server timestamp information to calculate the duration of the
measurement period to be independent of the potentially high error
of the local clock. And I anyway have to trust the server.
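
The idea of timing the measurement with server timestamps can be sketched as follows. This is not the script from the bug report, only an illustration of the principle: take an offset sample at the start and at the end, and divide the change in offset by the elapsed time as seen by the server. The `Sample` type and the function name are hypothetical, and the sign convention is an assumption that should be checked against your ntpd before a value is written to a drift file.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    server_time_s: float  # server timestamp of the exchange, in seconds
    offset_s: float       # measured offset of the local clock, in seconds


def estimate_drift_ppm(first: Sample, last: Sample) -> float:
    """Rate of change of the measured offset, in PPM, using the server's
    timestamps for the elapsed time so the local clock error does not enter."""
    elapsed = last.server_time_s - first.server_time_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing server-time order")
    return (last.offset_s - first.offset_s) / elapsed * 1e6


# hypothetical readings: offset grows from 100 us to 104 us over 2 s
start = Sample(server_time_s=0.0, offset_s=100e-6)
end = Sample(server_time_s=2.0, offset_s=104e-6)
print(f"{estimate_drift_ppm(start, end):.1f} ppm")
```

With only two samples, any error in either offset reading goes straight into the result, which is why the quality of each reading matters so much for a short window.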

Bye
 Juergen

Reply Juergen 11/26/2006 9:14:33 AM

Robert Wachinger <job@Robert-Wachinger.de> writes:

> Hi,
> 
> did anybody else have the problem, to have to stabilize the frequency discipline
> faster than the normal 15 minutes to hours? Does any solution exist?

"Hack the daemon" is the most likely solution (despite "iburst" and "burst").

> 
> We have here the requirement to have a replaced board (a fresh one from the factory,
> where a correct content of the drift file is unknown) up after 5 minutes in a cluster,
> which also means, that its time does not stray more than 10ms (within the cluster).
> 
> Any tips?
> We tried here to "play" with minpoll, burst, initial drift files (value 0).

What about not running the new board in the cluster first?

Ulrich
Reply Ulrich 11/27/2006 10:48:02 AM

Ulrich Windl wrote:
> What about not running the new board in the cluster first?
> 
> Ulrich
Does it have to play "nice" or work as a "reliable server" instantly?

In the "nice" case (for "single action replacement" requirements):
write a shell script that evaluates the drift file on a regular basis (via cron).

Start with, and keep, the machine mum as an NTP server.

Change the NTP configuration (via script) when your quality requirements are met.

Switch back ( and mail for help ) when the box misbehaves.
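
The "evaluate the drift file" step could look roughly like the sketch below. It is written in Python rather than shell, and every name and threshold in it is an assumption: it treats the drift as stable once the last few recorded values stay within a small band. A cron job would append the current drift file value to a history and call the check.

```python
from pathlib import Path


def read_drift_ppm(path: str) -> float:
    """Read the frequency value (PPM) from an ntpd drift file.
    Assumes the first whitespace-separated token is the number."""
    return float(Path(path).read_text().split()[0])


def drift_is_stable(readings_ppm: list[float],
                    window: int = 6,
                    tolerance_ppm: float = 1.0) -> bool:
    """True if the last `window` readings span no more than tolerance_ppm.
    Both defaults are illustrative, not recommendations."""
    if len(readings_ppm) < window:
        return False
    recent = readings_ppm[-window:]
    return max(recent) - min(recent) <= tolerance_ppm


# hypothetical history collected by cron, oldest first
history = [0.0, 12.4, 17.9, 18.6, 18.8, 18.7, 18.9, 18.8, 18.7]
print(drift_is_stable(history))       # last six values lie within 1 ppm
print(drift_is_stable(history[:5]))   # too few readings so far
```

Because ntpd rewrites the drift file only occasionally (about hourly by default, as far as I know), a check like this suits the "keep the machine mum until it qualifies" approach better than a five-minute deadline.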

uwe


Reply Uwe 11/27/2006 11:08:31 AM

Juergen,

You are badly misguided. While the resolution of the system clock on your 
Linux system (and many other systems) might be better than a microsecond, the 
precision of reading it can be much worse. Moreover, you haven't 
calibrated for the maximum intrinsic error in reading the system clock 
upon reboot, which can be in the many milliseconds, much more if the TOY 
clock chip cannot be read to that precision. Suggesting that you can 
with confidence estimate the intrinsic clock frequency within a PPM 
after two seconds of observation is statistical irresponsibility. I said 
what I said; now we should get on to other business.

Dave

Reply mills 11/27/2006 5:00:59 PM
