Folks,

I've run some tests with smearing of leap seconds. If you're interested, you can find the results here:
https://www.meinberg.de/download/burnicki/ntp_leap_smearing_test_results.pdf

Martin
--
Martin Burnicki
Meinberg Funkuhren
Bad Pyrmont
Germany


11/16/2016 6:42:56 PM

Martin Burnicki wrote:
> Folks,
>
> I've run some tests with smearing of leap seconds. If you're
> interested, you can find the results here:
> https://www.meinberg.de/download/burnicki/ntp_leap_smearing_test_results.pdf

Thanks for doing these tests!

Not too surprising to see that any smear interval close to 2000 seconds will cause problems for all clients, simply because 500 ppm is the maximum allowable steering rate in the ntpd control loop. If a client starts with a small fixed negative frequency offset, then the maximum additional steering rate is reduced by the same amount.

From a mathematical viewpoint it might be slightly better to use a smoother function than a cosine, i.e. something which tries to minimize not just the first derivative, i.e. change in frequency, but also the second derivative or rate of change of the same.

By default (which means almost all clients) a client which is running at normal max (1024 s) polling and observes that the server seems to drift away will automatically reduce its polling interval, so it is more important to be smooth during startup than at the end of the smear interval when most clients will have dropped down to 64/128 s polling.

When fixed at 1024 s you need about 5000 seconds before the client will know for sure (i.e. 5 latest polls) that the server is really moving away, and your testing shows a maximum offset of 15-20 ms after 4+ hours, i.e. around 15000 seconds.

It takes much longer to reach this point because the server is NOT doing a linear smear, so there is no new fixed frequency to lock onto. With the linear smear test the offset is nearly twice as large but reached after just 1-2 hours.

Have you tried to make the same test with clients that are allowed to tune the polling interval?

It could actually be quite good to spend maybe 1-2 hours to do a smooth change in frequency, then 20-22 hours with a fixed slew rate, followed by 1-2 more hours to change back to the original frequency, since this would allow all clients to spend most of the time locked onto a fixed frequency!

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
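
For anyone who wants to check the arithmetic behind the 2000 s remark, here is a minimal Python sketch. It assumes a 1 s leap second and, for the cosine case, the commonly described form offset(t) = 0.5 * (1 - cos(pi * t / T)); the function names are made up for illustration and do not come from ntpd or any other implementation.

```python
import math

NTPD_MAX_SLEW_PPM = 500.0  # steering limit quoted in the post above


def linear_peak_ppm(interval_s, leap_s=1.0):
    """Constant rate (ppm) needed to absorb leap_s over interval_s seconds."""
    return leap_s / interval_s * 1e6


def cosine_peak_ppm(interval_s, leap_s=1.0):
    """Peak rate (ppm) of offset(t) = leap_s/2 * (1 - cos(pi*t/interval_s)).

    The derivative is leap_s*pi/(2*interval_s) * sin(pi*t/interval_s),
    which peaks at the midpoint of the smear.
    """
    return math.pi * leap_s / (2.0 * interval_s) * 1e6


for interval in (2000, 86400):
    lin = linear_peak_ppm(interval)
    cos = cosine_peak_ppm(interval)
    print(f"{interval:6d} s: linear {lin:7.2f} ppm, cosine peak {cos:7.2f} ppm, "
          f"limit {NTPD_MAX_SLEW_PPM:.0f} ppm")
```

Under these assumptions a 2000 s linear smear sits right at the 500 ppm limit with no headroom left for the client's own frequency error, and a cosine smear of the same length peaks well above it, while a 24 hour smear stays below 20 ppm either way.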


11/16/2016 9:14:54 PM

On 2016-11-16, Terje Mathisen <terje.mathisen@tmsw.no> wrote:
> From a mathematical viewpoint it might be slightly better to use a
> smoother function than a cosine, i.e. something which tries to minimize
> not just the first derivative, i.e. change in frequency, but also the
> second derivative or rate of change of the same.
>
> By default (which means almost all clients) a client which is running at
> normal max (1024 s) polling and observes that the server seems to drift
> away will automatically reduce its polling interval, so it is more
> important to be smooth during startup than at the end of the smear
> interval when most clients will have dropped down to 64/128 s polling.

I did some tests with a cubic leap smear, but it didn't seem to be better with ntpd as a client than a quadratic leap smear performed in the same interval. IIRC it actually made it worse as ntpd had to deal with a faster rate later during the leap smear.

We just use the quadratic leap smear.

> It could actually be quite good to spend maybe 1-2 hours to do a smooth
> change in frequency, then 20-22 hours with a fixed slew rate, followed
> by 1-2 more hours to change back to the original frequency, since this
> would allow all clients to spend most of the time locked onto a fixed
> frequency!

I think it depends on what you are trying to minimize. Shorter interval in which the frequency is changing will increase the maximum phase error of the clients. In my experience it's better to spend all of the leap smearing time slowly speeding up and down in order to minimize the maximum error.

--
Miroslav Lichvar


11/17/2016 8:51:53 AM

Miroslav Lichvar wrote:
> We just use the quadratic leap smear.
>
>> It could actually be quite good to spend maybe 1-2 hours to do a smooth
>> change in frequency, then 20-22 hours with a fixed slew rate, followed
>> by 1-2 more hours to change back to the original frequency, since this
>> would allow all clients to spend most of the time locked onto a fixed
>> frequency!
>
> I think it depends on what you are trying to minimize. Shorter interval
> in which the frequency is changing will increase the maximum phase error
> of the clients. In my experience it's better to spend all of the leap
> smearing time slowly speeding up and down in order to minimize the
> maximum error.

Knowing what we do know about the ntpd control loop there is no real need to look at higher derivatives, it should suffice to have a smear which minimizes the frequency change rate, i.e. constant acceleration for the entire smear period.

This is _not_ a cosine since all derivatives are also sine/cosine functions, it is rather a satellite transfer function for a vessel with small but constant acceleration that uses a 180 degree flip at the midpoint to brake down again.

Since a quadratic function does have a constant second derivative I agree that it seems like the best choice here:

Assuming 12 hours to achieve the first 0.5 seconds, and then a mirror image for the second half I get something like this:

    offset = 0.5 * (t/43200)^2; // t in seconds since start of smear

with a derivative of

    doffset = t/(43200*43200);

which reaches a maximum slew rate of (1/43200) at the halfway point, or about 23 ppm.

and a second derivative which is just

    1/(43200*43200)

or a constant acceleration of 5.35e-10.

Is this how you have implemented it?

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
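
The formulas in the post above can be tried out with a short Python sketch. The first half follows the post directly; the expression for the second half is my own reading of "mirror image" (the same parabola run backwards from the end of the smear), so treat it as an assumption. The names `smear_offset` and `smear_rate` are hypothetical and do not come from any NTP implementation.

```python
HALF_S = 43200.0  # 12 hours, as assumed in the post


def smear_offset(t, half=HALF_S):
    """Smeared offset in seconds, t seconds after the start of the smear."""
    if t <= 0.0:
        return 0.0
    if t >= 2.0 * half:
        return 1.0
    if t <= half:
        # First half: constant acceleration, reaches 0.5 s at t = half.
        return 0.5 * (t / half) ** 2
    # Second half (assumed mirror image): constant deceleration.
    return 1.0 - 0.5 * ((2.0 * half - t) / half) ** 2


def smear_rate(t, half=HALF_S):
    """Derivative of smear_offset (s/s): a triangle peaking at the midpoint."""
    if t <= 0.0 or t >= 2.0 * half:
        return 0.0
    if t <= half:
        return t / (half * half)
    return (2.0 * half - t) / (half * half)


print("offset at midpoint  :", smear_offset(HALF_S), "s")
print("offset at end       :", smear_offset(2 * HALF_S), "s")
print("peak rate           : %.2f ppm" % (smear_rate(HALF_S) * 1e6))
print("acceleration        : %.3e 1/s" % (1.0 / (HALF_S * HALF_S)))
```

The peak rate works out to about 23 ppm and the acceleration to roughly 5.36e-10 per second, in line with the figures quoted in the post.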


11/17/2016 9:22:40 AM

Terje Mathisen wrote:
> Martin Burnicki wrote:
>> Folks,
>>
>> I've run some tests with smearing of leap seconds. If you're
>> interested, you can find the results here:
>> https://www.meinberg.de/download/burnicki/ntp_leap_smearing_test_results.pdf
>
> Thanks for doing these tests!
>
> Not too surprising to see that any smear interval close to 2000 seconds
> will cause problems for all clients, simply because 500 ppm is the
> maximum allowable steering rate in the ntpd control loop. If a client
> starts with a small fixed negative frequency offset, then the maximum
> additional steering rate is reduced by the same amount.

Agreed. I just ran the tests to visualize the results and make it more obvious.

Recently there have been announcements in some channels that some companies are trying to do the smearing over 2000 s. I'm not sure if they have a specific solution for the potential problems, or if they are just not aware of what can go wrong.

> From a mathematical viewpoint it might be slightly better to use a
> smoother function than a cosine, i.e. something which tries to minimize
> not just the first derivative, i.e. change in frequency, but also the
> second derivative or rate of change of the same.

Agreed, again. However, the cosine approach is what was implemented in ntpd last year, and as expected it yields smoother results than a linear approach.

That's why I wonder why Google, who invented cosine smearing some years ago, have changed to the linear approach lately.

> By default (which means almost all clients) a client which is running at
> normal max (1024 s) polling and observes that the server seems to drift
> away will automatically reduce its polling interval, so it is more
> important to be smooth during startup than at the end of the smear
> interval when most clients will have dropped down to 64/128 s polling.
>
> When fixed at 1024 s you need about 5000 seconds before the client will
> know for sure (i.e. 5 latest polls) that the server is really moving
> away, and your testing shows a maximum offset of 15-20 ms after 4+ hours,
> i.e. around 15000 seconds.

Yes, I know. For now I just wanted to demonstrate the worst cases with a fixed polling interval.

> It takes much longer to reach this point because the server is NOT doing
> a linear smear, so there is no new fixed frequency to lock onto. With
> the linear smear test the offset is nearly twice as large but reached
> after just 1-2 hours.
>
> Have you tried to make the same test with clients that are allowed to
> tune the polling interval?

Not yet, but I'm planning to do this.

> It could actually be quite good to spend maybe 1-2 hours to do a smooth
> change in frequency, then 20-22 hours with a fixed slew rate, followed
> by 1-2 more hours to change back to the original frequency, since this
> would allow all clients to spend most of the time locked onto a fixed
> frequency!

Yes, this also sounds good.

Martin
--
Martin Burnicki
Meinberg Funkuhren
Bad Pyrmont
Germany


11/17/2016 9:43:45 AM

Miroslav Lichvar wrote:
> On 2016-11-16, Terje Mathisen <terje.mathisen@tmsw.no> wrote:
>> From a mathematical viewpoint it might be slightly better to use a
>> smoother function than a cosine, i.e. something which tries to minimize
>> not just the first derivative, i.e. change in frequency, but also the
>> second derivative or rate of change of the same.
>>
>> By default (which means almost all clients) a client which is running at
>> normal max (1024 s) polling and observes that the server seems to drift
>> away will automatically reduce its polling interval, so it is more
>> important to be smooth during startup than at the end of the smear
>> interval when most clients will have dropped down to 64/128 s polling.
>
> I did some tests with a cubic leap smear, but it didn't seem to be
> better with ntpd as a client than a quadratic leap smear performed in
> the same interval. IIRC it actually made it worse as ntpd had to deal
> with a faster rate later during the leap smear.
>
> We just use the quadratic leap smear.
>
>> It could actually be quite good to spend maybe 1-2 hours to do a smooth
>> change in frequency, then 20-22 hours with a fixed slew rate, followed
>> by 1-2 more hours to change back to the original frequency, since this
>> would allow all clients to spend most of the time locked onto a fixed
>> frequency!
>
> I think it depends on what you are trying to minimize. Shorter interval
> in which the frequency is changing will increase the maximum phase error
> of the clients. In my experience it's better to spend all of the leap
> smearing time slowly speeding up and down in order to minimize the
> maximum error.

Agreed. If you *can* do leap smearing at all, i.e. if the applications running on the clients can accept that their time temporarily be off true UTC, then preferably all clients should have at least the same time.

Martin
--
Martin Burnicki
Meinberg Funkuhren
Bad Pyrmont
Germany


11/17/2016 9:47:42 AM

On 2016-11-17, Terje Mathisen <terje.mathisen@tmsw.no> wrote:
> Assuming 12 hours to achieve the first 0.5 seconds, and then a mirror
> image for the second half I get something like this:
>
> offset = 0.5 * (t/43200)^2; // t in seconds since start of smear
>
> with a derivative of
>
> doffset = t/(43200*43200);
>
> which reaches a maximum slew rate of (1/43200) at the halfway point, or
> about 23 ppm.
>
> and a second derivative which is just
>
> 1/(43200*43200)
>
> or a constant acceleration of 5.35e-10.
>
> Is this how you have implemented it?

Yes, except it's configured by the acceleration instead of the interval. We recommend 0.001 ppm/s, which gives an interval of about 17.5 hours.

--
Miroslav Lichvar
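
The step from 0.001 ppm/s to "about 17.5 hours" can be reproduced with the sketch below. It assumes the same symmetric quadratic smear discussed above (constant acceleration for the first half, constant deceleration for the second) and a 1 s leap second; the function names are illustrative only and are not taken from any implementation's configuration interface.

```python
import math


def smear_from_acceleration(accel_ppm_per_s, leap_s=1.0):
    """Return (total interval in s, peak rate in ppm) for a symmetric
    quadratic smear driven by a constant acceleration.

    Each half accumulates leap_s/2 = 0.5 * a * T_half**2,
    so T_half = sqrt(leap_s / a) and the peak rate is a * T_half.
    """
    accel = accel_ppm_per_s * 1e-6          # ppm/s -> 1/s
    half = math.sqrt(leap_s / accel)
    return 2.0 * half, accel * half * 1e6


def acceleration_from_interval(total_s, leap_s=1.0):
    """Inverse direction: acceleration (ppm/s) implied by a total interval."""
    half = total_s / 2.0
    return leap_s / (half * half) * 1e6


total, peak = smear_from_acceleration(0.001)
print("0.001 ppm/s -> %.2f hours, peak rate %.1f ppm" % (total / 3600.0, peak))
print("24 hours    -> %.6f ppm/s" % acceleration_from_interval(86400.0))
```

With these assumptions 0.001 ppm/s gives an interval of roughly 17.6 hours and a peak rate of about 32 ppm, while a 24 hour interval corresponds to roughly 0.00054 ppm/s.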


11/18/2016 8:40:34 AM

Miroslav Lichvar wrote:
> On 2016-11-17, Terje Mathisen <terje.mathisen@tmsw.no> wrote:
>> Assuming 12 hours to achieve the first 0.5 seconds, and then a mirror
>> image for the second half I get something like this:
>>
>> offset = 0.5 * (t/43200)^2; // t in seconds since start of smear
>>
>> with a derivative of
>>
>> doffset = t/(43200*43200);
>>
>> which reaches a maximum slew rate of (1/43200) at the halfway point, or
>> about 23 ppm.
>>
>> and a second derivative which is just
>>
>> 1/(43200*43200)
>>
>> or a constant acceleration of 5.35e-10.
>>
>> Is this how you have implemented it?
>
> Yes, except it's configured by the acceleration instead of the interval.
> We recommend 0.001 ppm/s, which gives an interval of about 17.5 hours.

That sounds fine, the main reason for using exactly 24 hours is that it is easier to explain and remember when the smear starts and ends.

Do you have tests showing the client responses (with varying poll intervals) to such a smear?

It is of course easy to model as long as the poll is constant, but with the default/normal dynamic poll determination it does become a bit more complicated. :-)

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"


11/18/2016 9:15:32 AM

On 2016-11-18, Terje Mathisen <terje.mathisen@tmsw.no> wrote:
> Miroslav Lichvar wrote:
>> Yes, except it's configured by the acceleration instead of the interval.
>> We recommend 0.001 ppm/s, which gives an interval of about 17.5 hours.
>
> That sounds fine, the main reason for using exactly 24 hours is that it
> is easier to explain and remember when the smear starts and ends.

It's a bit harder to explain, but at least clients have some time to settle down and after 24 hours they should be pretty close to the state before the leap smear started.

> Do you have tests showing the client responses (with varying poll
> intervals) to such a smear?

I actually did some tests showing how the two NTP implementations deal with leap smear as clients for a blog post I wrote last year. These two graphs show the frequency and phase offsets of clients using the default polling interval between 6-10.

https://rhdevelopers.files.wordpress.com/2015/05/smear_freq1.png
https://rhdevelopers.files.wordpress.com/2015/05/smear_server_offset1.png

> It is of course easy to model as long as the poll is constant, but with
> the default/normal dynamic poll determination it does become a bit more
> complicated. :-)

Yes, and with the clock filter, which can drop up to 7 consecutive samples, it's complicated even more. :)

--
Miroslav Lichvar


11/18/2016 10:17:56 AM

On 18/11/16 09:15, Terje Mathisen wrote:
> Do you have tests showing the client responses (with varying poll
> intervals) to such a smear?

That's going to depend on how close the client is to 500ppm already.


11/18/2016 12:29:06 PM

David Woolley wrote:
> On 18/11/16 09:15, Terje Mathisen wrote:
>> Do you have tests showing the client responses (with varying poll
>> intervals) to such a smear?
>
> That's going to depend on how close the client is to 500ppm already.

Any client with a baseline offset of more than 2-300 ppm is in fact broken; it has lost over half the available skew before you start adjusting it.

I assume that any exceptional event like a leap second must be handled without ever needing more than a 100 ppm delta, and use that to model the minimum time needed, i.e. 10K seconds or about 3 hours, plus the time needed to ramp up/down.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
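
The two numbers in the post above (lost headroom and the 10K second minimum) follow from simple arithmetic, sketched here in Python. The 500 ppm limit and the 100 ppm budget are taken from the thread; the helper names are hypothetical, and the headroom figure is the worst case where the smear pushes the clock in the same direction as the client's existing frequency error.

```python
NTPD_MAX_SLEW_PPM = 500.0  # limit quoted earlier in the thread


def headroom_ppm(baseline_ppm, limit_ppm=NTPD_MAX_SLEW_PPM):
    """Worst-case steering range (ppm) left for a client whose oscillator
    already needs baseline_ppm of correction."""
    return limit_ppm - abs(baseline_ppm)


def min_constant_rate_seconds(delta_ppm, leap_s=1.0):
    """Shortest time to absorb leap_s at a constant rate of delta_ppm,
    ignoring the extra time needed to ramp the rate up and down."""
    return leap_s / (delta_ppm * 1e-6)


for baseline in (0.0, 250.0, 300.0):
    print("baseline %5.0f ppm -> headroom %5.0f ppm" % (baseline, headroom_ppm(baseline)))

seconds = min_constant_rate_seconds(100.0)
print("100 ppm budget -> %.0f s (%.2f hours)" % (seconds, seconds / 3600.0))
```

A 100 ppm budget gives 10000 s, a little under 3 hours, before any ramp time is added.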


11/18/2016 5:54:10 PM