Leap second smearing test results

Folks,

I've run some tests with smearing of leap seconds. If you're interested,
you can find the results here:
https://www.meinberg.de/download/burnicki/ntp_leap_smearing_test_results.pdf

Martin
--
Martin Burnicki

Meinberg Funkuhren
Bad Pyrmont
Germany
Martin
11/16/2016 6:42:56 PM

Martin Burnicki wrote:
> Folks,
>
> I've run some tests with smearing of leap seconds. If you're
> interested, you can find the results here:
> https://www.meinberg.de/download/burnicki/ntp_leap_smearing_test_results.pdf
>
>
Thanks for doing these tests!

Not too surprising to see that any smear interval close to 2000 seconds
will cause problems for all clients, simply because 500 ppm is the
maximum allowable steering rate in the ntpd control loop: smearing a
full second over 2000 s requires an average slew of 1/2000 = 500 ppm,
i.e. the entire budget. If a client starts with a small fixed negative
frequency offset, the steering headroom left for the smear is reduced
by the same amount.

From a mathematical viewpoint it might be slightly better to use a
smoother function than a cosine, i.e. something which tries to minimize 
not just the first derivative, i.e. change in frequency, but also the 
second derivative or rate of change of the same.

By default (which means almost all clients) a client which is running at 
normal max (1024 s) polling and observes that the server seems to drift 
away will automatically reduce its polling interval, so it is more 
important to be smooth during startup than at the end of the smear 
interval when most clients will have dropped down to 64/128 s polling.

When fixed at 1024 s you need about 5000 seconds before the client will 
know for sure (i.e. 5 latest polls) that the server is really moving 
away, and your testing shows a maximum offset of 15-20 ms after 4+ hours,
i.e. around 15000 seconds.

It takes much longer to reach this point because the server is NOT doing 
a linear smear, so there is no new fixed frequency to lock onto. With
the linear smear test the offset is nearly twice as large but reached 
after just 1-2 hours.

Have you tried running the same test with clients that are allowed to
tune the polling interval?

It could actually be quite good to spend maybe 1-2 hours to do a smooth 
change in frequency, then 20-22 hours with a fixed slew rate, followed 
by 1-2 more hours to change back to the original frequency, since this 
would allow all clients to spend most of the time locked onto a fixed 
frequency!
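
Something like this, as a quick sketch in C (mine, untested, and with
the 2 h / 20 h / 2 h split picked purely for illustration):

   /* Three-phase smear sketch (untested, values for illustration):
    * quadratic ramps of RAMP seconds on either side of HOLD seconds
    * at a fixed slew rate, so the frequency never jumps. */
   #define RAMP   7200.0               /* 2 h ramp up resp. down */
   #define HOLD  72000.0               /* 20 h at fixed slew */
   #define TOTAL (2.0 * RAMP + HOLD)   /* 24 h in total */

   /* The fixed rate follows from requiring the offset to integrate
    * to exactly 1 s: rate * (RAMP + HOLD) = 1, i.e. ~12.6 ppm. */
   static const double rate = 1.0 / (RAMP + HOLD);

   /* Smear offset in seconds, t seconds after the start. */
   double smear_offset(double t)
   {
       double u;

       if (t <= 0.0)
           return 0.0;
       if (t < RAMP)                   /* constant acceleration */
           return 0.5 * rate * t * t / RAMP;
       if (t < RAMP + HOLD)            /* fixed slew rate */
           return 0.5 * rate * RAMP + rate * (t - RAMP);
       if (t < TOTAL) {                /* constant deceleration */
           u = TOTAL - t;
           return 1.0 - 0.5 * rate * u * u / RAMP;
       }
       return 1.0;                     /* full leap second applied */
   }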

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Terje
11/16/2016 9:14:54 PM
On 2016-11-16, Terje Mathisen <terje.mathisen@tmsw.no> wrote:
> From a mathematical viewpoint it might be slightly better to use a
> smoother function than a cosine, i.e. something which tries to minimize 
> not just the first derivative, i.e. change in frequency, but also the 
> second derivative or rate of change of the same.
>
> By default (which means almost all clients) a client which is running at 
> normal max (1024 s) polling and observes that the server seems to drift 
> away will automatically reduce its polling interval, so it is more 
> important to be smooth during startup than at the end of the smear 
> interval when most clients will have dropped down to 64/128 s polling.

I did some tests with a cubic leap smear, but it didn't seem to be
better with ntpd as a client than a quadratic leap smear performed in
the same interval. IIRC it actually made it worse as ntpd had to deal
with a faster rate later during the leap smear.

We just use the quadratic leap smear.

> It could actually be quite good to spend maybe 1-2 hours to do a smooth 
> change in frequency, then 20-22 hours with a fixed slew rate, followed 
> by 1-2 more hours to change back to the original frequency, since this 
> would allow all clients to spend most of the time locked onto a fixed 
> frequency!

I think it depends on what you are trying to minimize. A shorter interval
in which the frequency is changing will increase the maximum phase error
of the clients. In my experience it's better to spend all of the leap
smearing time slowly speeding up and down in order to minimize the
maximum error.

-- 
Miroslav Lichvar
Miroslav
11/17/2016 8:51:53 AM
Miroslav Lichvar wrote:
> We just use the quadratic leap smear.
>
>> It could actually be quite good to spend maybe 1-2 hours to do a smooth
>> change in frequency, then 20-22 hours with a fixed slew rate, followed
>> by 1-2 more hours to change back to the original frequency, since this
>> would allow all clients to spend most of the time locked onto a fixed
>> frequency!
>
> I think it depends on what you are trying to minimize. A shorter interval
> in which the frequency is changing will increase the maximum phase error
> of the clients. In my experience it's better to spend all of the leap
> smearing time slowly speeding up and down in order to minimize the
> maximum error.
>
Knowing what we do about the ntpd control loop, there is no real
need to look at higher derivatives; it should suffice to have a smear
which minimizes the frequency change rate, i.e. constant acceleration
for the entire smear period.

This is _not_ a cosine, since all its derivatives are also sine/cosine
functions; it is rather like the transfer trajectory of a vessel with
small but constant acceleration that does a 180 degree flip at the
midpoint to brake again.

Since a quadratic function does have a constant second derivative I 
agree that it seems like the best choice here:

Assuming 12 hours to achieve the first 0.5 seconds, and then a mirror 
image for the second half I get something like this:

   offset = 0.5 * (t/43200)^2; // t in seconds since start of smear

with a derivative of

   doffset = t/(43200*43200);

which reaches a maximum slew rate of (1/43200) at the halfway point, or 
about 23 ppm.

and a second derivative which is just

   1/(43200*43200)

or a constant acceleration of about 5.36e-10 s/s^2 (roughly 0.0005 ppm/s).
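
In C, with the mirror image included, that would be something like this
(just a sketch of the formulas above, untested):

   #define HALF 43200.0   /* 12 h for each half of the smear */

   /* Quadratic smear offset in seconds, t seconds after the start:
    * constant acceleration up to 0.5 s at the midpoint, then the
    * mirror image, reaching the full 1.0 s after 24 h. */
   double quadratic_smear(double t)
   {
       double u;

       if (t <= 0.0)
           return 0.0;
       if (t < HALF)                   /* first half: accelerate */
           return 0.5 * (t / HALF) * (t / HALF);
       if (t < 2.0 * HALF) {           /* second half: decelerate */
           u = 2.0 * HALF - t;
           return 1.0 - 0.5 * (u / HALF) * (u / HALF);
       }
       return 1.0;
   }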

Is this how you have implemented it?

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Terje
11/17/2016 9:22:40 AM
Terje Mathisen wrote:
> Martin Burnicki wrote:
>> Folks,
>>
>> I've run some tests with smearing of leap seconds. If you're
>> interested, you can find the results here:
>> https://www.meinberg.de/download/burnicki/ntp_leap_smearing_test_results.pdf
>>
>>
>>
> Thanks for doing these tests!
> 
> Not too surprising to see that any smear interval close to 2000 seconds
> will cause problems for all clients, simply because 500 ppm is the
> maximum allowable steering rate in the ntpd control loop: smearing a
> full second over 2000 s requires an average slew of 1/2000 = 500 ppm,
> i.e. the entire budget. If a client starts with a small fixed negative
> frequency offset, the steering headroom left for the smear is reduced
> by the same amount.

Agreed. I just ran the tests to visualize the results and make the
problems more obvious. Recently there have been announcements in some
channels that some companies are trying to do the smearing over 2000 s.
I'm not sure if they have a specific solution for the potential
problems, or if they are just not aware of what can go wrong.

> From a mathematical viewpoint it might be slightly better to use a
> smoother function than a cosine, i.e. something which tries to minimize
> not just the first derivative, i.e. change in frequency, but also the
> second derivative or rate of change of the same.

Agreed, again. However, the cosine approach is what was implemented in
ntpd last year, and as expected it yields smoother results than a
linear approach.

That's why I wonder why Google, who invented cosine smearing some
years ago, have lately changed to the linear approach.

> By default (which means almost all clients) a client which is running at
> normal max (1024 s) polling and observes that the server seems to drift
> away will automatically reduce its polling interval, so it is more
> important to be smooth during startup than at the end of the smear
> interval when most clients will have dropped down to 64/128 s polling.
> 
> When fixed at 1024 s you need about 5000 seconds before the client will
> know for sure (i.e. 5 latest polls) that the server is really moving
> away, and your testing shows a maximum offset of 15-20 ms after 4+ hours,
> i.e. around 15000 seconds.

Yes, I know. For now I just wanted to demonstrate the worst cases with a
fixed polling interval.

> It takes much longer to reach this point because the server is NOT doing
> a linear smear, so there is no new fixed frequency to lock onto. With
> the linear smear test the offset is nearly twice as large but reached
> after just 1-2 hours.
> 
> Have you tried running the same test with clients that are allowed to
> tune the polling interval?

Not yet, but I'm planning to do this.

> It could actually be quite good to spend maybe 1-2 hours to do a smooth
> change in frequency, then 20-22 hours with a fixed slew rate, followed
> by 1-2 more hours to change back to the original frequency, since this
> would allow all clients to spend most of the time locked onto a fixed
> frequency!

Yes, this also sounds good.

Martin
-- 
Martin Burnicki

Meinberg Funkuhren
Bad Pyrmont
Germany
Martin
11/17/2016 9:43:45 AM
Miroslav Lichvar wrote:
> On 2016-11-16, Terje Mathisen <terje.mathisen@tmsw.no> wrote:
>> From a mathematical viewpoint it might be slightly better to use a
>> smoother function than a cosine, i.e. something which tries to minimize 
>> not just the first derivative, i.e. change in frequency, but also the 
>> second derivative or rate of change of the same.
>>
>> By default (which means almost all clients) a client which is running at 
>> normal max (1024 s) polling and observes that the server seems to drift 
>> away will automatically reduce its polling interval, so it is more 
>> important to be smooth during startup than at the end of the smear 
>> interval when most clients will have dropped down to 64/128 s polling.
> 
> I did some tests with a cubic leap smear, but it didn't seem to be
> better with ntpd as a client than a quadratic leap smear performed in
> the same interval. IIRC it actually made it worse as ntpd had to deal
> with a faster rate later during the leap smear.
> 
> We just use the quadratic leap smear.
> 
>> It could actually be quite good to spend maybe 1-2 hours to do a smooth 
>> change in frequency, then 20-22 hours with a fixed slew rate, followed 
>> by 1-2 more hours to change back to the original frequency, since this 
>> would allow all clients to spend most of the time locked onto a fixed 
>> frequency!
> 
> I think it depends on what you are trying to minimize. A shorter interval
> in which the frequency is changing will increase the maximum phase error
> of the clients. In my experience it's better to spend all of the leap
> smearing time slowly speeding up and down in order to minimize the
> maximum error.

Agreed. If you *can* do leap smearing at all, i.e. if the applications
running on the clients can accept that their time is temporarily off
true UTC, then preferably all clients should at least agree on the same
time.

Martin
-- 
Martin Burnicki

Meinberg Funkuhren
Bad Pyrmont
Germany
Martin
11/17/2016 9:47:42 AM
On 2016-11-17, Terje Mathisen <terje.mathisen@tmsw.no> wrote:
> Assuming 12 hours to achieve the first 0.5 seconds, and then a mirror 
> image for the second half I get something like this:
>
>    offset = 0.5 * (t/43200)^2; // t in seconds since start of smear
>
> with a derivative of
>
>    doffset = t/(43200*43200);
>
> which reaches a maximum slew rate of (1/43200) at the halfway point, or 
> about 23 ppm.
>
> and a second derivative which is just
>
>    1/(43200*43200)
>
> or a constant acceleration of about 5.36e-10 s/s^2 (roughly 0.0005 ppm/s).
>
> Is this how you have implemented it?

Yes, except it's configured by the acceleration instead of the interval.
We recommend 0.001 ppm/s, which gives an interval of about 17.5 hours.
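
The interval follows directly from the acceleration; a quick check (my
arithmetic, not code from the implementation):

   #include <math.h>
   #include <stdio.h>

   /* Each half of the smear covers 0.5 s under constant acceleration
    * a, so 0.5 = 0.5 * a * (T/2)^2, i.e. T = 2 * sqrt(1/a). */
   int main(void)
   {
       double a = 0.001e-6;            /* 0.001 ppm/s = 1e-9 s/s^2 */
       double interval = 2.0 * sqrt(1.0 / a);

       printf("%.0f s = %.1f h\n", interval, interval / 3600.0);
       return 0;                       /* prints: 63246 s = 17.6 h */
   }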

-- 
Miroslav Lichvar
Miroslav
11/18/2016 8:40:34 AM
Miroslav Lichvar wrote:
> On 2016-11-17, Terje Mathisen <terje.mathisen@tmsw.no> wrote:
>> Assuming 12 hours to achieve the first 0.5 seconds, and then a mirror
>> image for the second half I get something like this:
>>
>>     offset = 0.5 * (t/43200)^2; // t in seconds since start of smear
>>
>> with a derivative of
>>
>>     doffset = t/(43200*43200);
>>
>> which reaches a maximum slew rate of (1/43200) at the halfway point, or
>> about 23 ppm.
>>
>> and a second derivative which is just
>>
>>     1/(43200*43200)
>>
>> or a constant acceleration of about 5.36e-10 s/s^2 (roughly 0.0005 ppm/s).
>>
>> Is this how you have implemented it?
>
> Yes, except it's configured by the acceleration instead of the interval.
> We recommend 0.001 ppm/s, which gives an interval of about 17.5 hours.
>

That sounds fine; the main reason for using exactly 24 hours is that it
is easier to explain and remember when the smear starts and ends.

Do you have tests showing the client responses (with varying poll 
intervals) to such a smear?

It is of course easy to model as long as the poll is constant, but with 
the default/normal dynamic poll determination it does become a bit more 
complicated. :-)

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Terje
11/18/2016 9:15:32 AM
On 2016-11-18, Terje Mathisen <terje.mathisen@tmsw.no> wrote:
> Miroslav Lichvar wrote:
>> Yes, except it's configured by the acceleration instead of the interval.
>> We recommend 0.001 ppm/s, which gives an interval of about 17.5 hours.
>
> That sounds fine, the main reason for using exactly 24 hours is that it 
> is easier to explain and remember when the smear starts and ends.

It's a bit harder to explain, but at least clients have some time to
settle down and after 24 hours they should be pretty close to the state
before the leap smear started.

> Do you have tests showing the client responses (with varying poll 
> intervals) to such a smear?

For a blog post I wrote last year I actually did some tests showing how
the two NTP implementations deal with leap smear as clients. These two
graphs show the frequency and phase offsets of clients using the default
polling interval range of 6 to 10 (64 s to 1024 s).

https://rhdevelopers.files.wordpress.com/2015/05/smear_freq1.png
https://rhdevelopers.files.wordpress.com/2015/05/smear_server_offset1.png

> It is of course easy to model as long as the poll is constant, but with 
> the default/normal dynamic poll determination it does become a bit more 
> complicated. :-)

Yes, and with the clock filter, which can drop up to 7 consecutive
samples, it's even more complicated. :)

-- 
Miroslav Lichvar
Miroslav
11/18/2016 10:17:56 AM
On 18/11/16 09:15, Terje Mathisen wrote:
> Do you have tests showing the client responses (with varying poll
> intervals) to such a smear?

That's going to depend on how close the client is to 500 ppm already.
David
11/18/2016 12:29:06 PM
David Woolley wrote:
> On 18/11/16 09:15, Terje Mathisen wrote:
>> Do you have tests showing the client responses (with varying poll
>> intervals) to such a smear?
>
> That's going to depend on how close the client is to 500 ppm already.

Any client with a baseline frequency offset of more than 200-300 ppm is
in fact broken; it has lost over half the available slew budget before
you start adjusting it.

I assume that any exceptional event like a leap second must be handled 
without ever needing more than a 100 ppm delta, and use that to model 
the minimum time needed, i.e. 10K seconds or about 3 hours, plus the 
time needed to ramp up/down.
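
Or as a trivial helper, just to make the model explicit (my sketch):

   /* Minimum time needed to smear a full 1 s leap second given a
    * frequency budget in ppm; 100 ppm gives 10000 s. The ramp
    * up/down time comes on top of this. */
   double min_smear_time(double budget_ppm)
   {
       return 1.0 / (budget_ppm * 1e-6);
   }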

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
Terje
11/18/2016 5:54:10 PM