Welford algorithm for computing standard deviation

  • Follow


I am looking for the Welford algorithm for standard deviation. 
Can anyone provide a link or show an example? (I don't have 
access to TAOCP.)

JS
0
Reply John 4/11/2007 6:32:48 PM

On Apr 11, 11:32 am, John Smith <JSm...@mail.net> wrote:
> I am looking for the Welford algorithm for standard deviation.
> Can anyone provide a link or show an example? (I don't have
> access to TAOCP.)
>
> JS

C++ header file:
#include <cmath>

template <class etype>
class WelfordStandardDeviation {
 private:
    etype          m[2];
    etype          s[2];
    size_t          n;
 public:
                    WelfordStandardDeviation(void);
    etype          GetS(void);
    etype          GetM(void);
    etype          GetN(void);
    etype          GetSigma(void);
    void            AddItem(etype x);
};

template <class etype>
WelfordStandardDeviation::WelfordStandardDeviation(void)
{
    m[0] = (etype)0;
    m[1] = (etype)0;
    s[0] = (etype)0;
    s[1] = (etype)0;
    n = 0;
}

template <class etype>
etype          WelfordStandardDeviation::GetS(void)
{
    if (n >= 2)
        return s[0];
    return sqrt(-1.); /* Create a NAN */
}

template <class etype>
etype          WelfordStandardDeviation::GetM(void)
{
    if (n >= 2)
        return s[0];
    return sqrt(-1.); /* Create a NAN */
}

template <class etype>
etype          WelfordStandardDeviation::GetN(void)
{
    return n;
}

template <class etype>
etype          WelfordStandardDeviation::GetSigma(void)
{
    if (n >= 2)
        return sqrt(s[0] / (n - 1.0));
    return sqrt(-1.); /* Create a NAN */
}

template <class etype>
void            WelfordStandardDeviation::AddItem(etype x)
{
    ++n;
    if (n == 1) {
        m[0] = x;
    } else {
        m[1] = m[0] + (x - m[0]) / n;
        s[1] = s[0] + (x - m[0]) * (x - m[1]);
        m[0] = m[1]; /* for next time */
        s[0] = s[1]; /* for next time */
    }
}

C++ test driver (computes using Welford header file and by another
method):

#ifdef UNIT_TEST

#include <iostream>
#include <iomanip>
using namespace std;

#include "stats.hpp"
#include "welford.hpp"

int             main(void)
{
    WelfordStandardDeviation <double> wsd;
    UnivarateStatistics <double, double > us(true);
    size_t i;
    for (i = 0; i < 10000; i++) {
        wsd.AddItem((double) i);
        us.add_pass_one((double) i);
    }
    us.calculate_simple_statistics();
    cout << setprecision(20);
    cout << "Standard deviation of the first 10000 integers is " <<
wsd.GetSigma() << " verified as " << us.get_s() << endl;
    return 0;
}

/*
C:\tmp>welford
Standard deviation of the first 10000 integers is 2886.8956799071675
verified as 2886.8956799071675
*/
#endif          /* UNIT_TEST */

P.S.
Buy a copy of Knuth's TAOCP.
That's an order.

0
Reply user923005 4/11/2007 7:34:53 PM


"John Smith" writes:

>I am looking for the Welford algorithm for standard deviation. Can anyone 
>provide a link or show an example? (I don't have access to TAOCP.)

This might help.

http://www.daheiser.info/excel/notes/notei.pdf 


0
Reply osmium 4/11/2007 7:39:45 PM

user923005 wrote:
> On Apr 11, 11:32 am, John Smith <JSm...@mail.net> wrote:
> 
>>I am looking for the Welford algorithm for standard deviation.
>>Can anyone provide a link or show an example? (I don't have
>>access to TAOCP.)
>>
>>JS

<snip C++ code>

Thanks. I don't know any C++ so it's hard for me to understand 
your example. Below is my C implementation of the pseudocode 
given in Osmium's link. It returns a slightly different value 
from the std dev calculated in the conventional manner. Can you 
see where I've gone wrong?

/* Data X(1 to N)
    A = X(1)
    B = 0
    FOR K = 2 to N
    U = X(K) - A
    A = A + U / CDBL(K)
    B = B + U * (X(K) - A)
    NEXT K
    AVERAGE = A
    STDEV = SQR( B / CDBL(N - 1) )
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
double welford_stdev(double *datalist, int listsize);

int main(void)
{
	int n;
	double data[] = {12.3, 15.5, 13.3, 18.9, 17.7, 14.4};
	
	n = sizeof(data) / sizeof(*data);
	printf("%f\n", welford_stdev(data, n));
	
	return 0;
}

double welford_stdev(double *datalist, int listsize)
{
     double a, b, u;
     int k;

     a = datalist[0];
     b = 0.0;
     for(k = 1; k < listsize; k++)
     {
         u = datalist[k] - a;
         a += u / k;
         b += u * (datalist[k] - a);
     }

     return sqrt(b / (listsize - 1.));
}
> P.S.
> Buy a copy of Knuth's TAOCP.
> That's an order.
> 

Yes sir. It will be my first purchase after I win the lottery.

JS
0
Reply John 4/11/2007 10:39:06 PM

On Apr 11, 5:39 pm, John Smith <JSm...@mail.net> wrote:
> Thanks. I don't know any C++ so it's hard for me to understand
> your example. Below is my C implementation of the pseudocode
> given in Osmium's link. It returns a slightly different value
> from the std dev calculated in the conventional manner. Can you
> see where I've gone wrong?
>
> /* Data X(1 to N)
>     A = X(1)
>     B = 0
>     FOR K = 2 to N
>     U = X(K) - A
>     A = A + U / CDBL(K)
>     B = B + U * (X(K) - A)
>     NEXT K
>     AVERAGE = A
>     STDEV = SQR( B / CDBL(N - 1) )
> */
> #include <stdio.h>
> #include <stdlib.h>
> #include <math.h>
> double welford_stdev(double *datalist, int listsize);
>
> int main(void)
> {
>         int n;
>         double data[] = {12.3, 15.5, 13.3, 18.9, 17.7, 14.4};
>
>         n = sizeof(data) / sizeof(*data);
>         printf("%f\n", welford_stdev(data, n));
>
>         return 0;
>
> }
>
> double welford_stdev(double *datalist, int listsize)
> {
>      double a, b, u;
>      int k;
>
>      a = datalist[0];
>      b = 0.0;
>      for(k = 1; k < listsize; k++)
>      {
>          u = datalist[k] - a;
>          a += u / k;
>          b += u * (datalist[k] - a);
>      }
>
>      return sqrt(b / (listsize - 1.));
>
> }


Looks to me like you are not fully compensating for the fact that C
array indexes are base zero.  Specifically, to match the Basic code, I
think you want to change the "a += u/k;" line to "a += u/(k+1);"

Also, precisely what do you mean by "It returns a slightly different
value from the std dev calculated in the conventional manner."?  All
of the single pass methods of computing variance have problems with
round-off error, Welfords less than most (which is why it's used), but
it can (even using Welford) be vary substantial (IOW, enough to make
the result completely meaningless).  So the question is what are you
comparing this to.  If the "conventional manner" is the two pass
approach (where you first compute the mean, and then the sum of
squares of differences from the mean in a second pass), then I'd
expect slight differences in almost all cases

0
Reply robertwessel2 4/12/2007 7:03:42 AM

John Smith wrote:

> I am looking for the Welford algorithm for standard deviation. Can
> anyone provide a link or show an example?

http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

    -- chris
0
Reply Chris 4/12/2007 7:37:55 AM

robertwessel2@yahoo.com wrote:
> On Apr 11, 5:39 pm, John Smith <JSm...@mail.net> wrote:
> 
>>Thanks. I don't know any C++ so it's hard for me to understand
>>your example. Below is my C implementation of the pseudocode
>>given in Osmium's link. It returns a slightly different value
>>from the std dev calculated in the conventional manner. Can you
>>see where I've gone wrong?

<snip code>

> Looks to me like you are not fully compensating for the fact that C
> array indexes are base zero.  Specifically, to match the Basic code, I
> think you want to change the "a += u/k;" line to "a += u/(k+1);"

Good call; that fixed it.

> 
> Also, precisely what do you mean by "It returns a slightly different
> value from the std dev calculated in the conventional manner."?  All
> of the single pass methods of computing variance have problems with
> round-off error, Welfords less than most (which is why it's used), but
> it can (even using Welford) be vary substantial (IOW, enough to make
> the result completely meaningless).  So the question is what are you
> comparing this to.  If the "conventional manner" is the two pass
> approach (where you first compute the mean, and then the sum of
> squares of differences from the mean in a second pass), then I'd
> expect slight differences in almost all cases

"Conventional manner:"

/* Return the sample variance
    of the values in datalist
                      _
            SUM{(Xi - X)^2}
    svar =  ---------------
                  N-1
*/
double svar(double *datalist, int listsize)
{
     int idx;
     double mean, diff, sumsqrs;

     sumsqrs = 0.0;
     mean = a_mean(datalist, listsize);
     for(idx = 0; idx < listsize; idx++)
     {
         diff = datalist[idx] - mean;
         sumsqrs += (diff * diff);
     }

     return sumsqrs / ((double)listsize-1.0);
}

/* Return the sample standard deviation
    of values in datalist. */
double s_stdev(double *datalist, int listsize)
{
     return sqrt(svar(datalist, listsize));
}

The two methods now agree exactly.

JS
> 
0
Reply John 4/12/2007 2:14:57 PM

On Apr 12, 7:14 am, John Smith <JSm...@mail.net> wrote:
> robertwess...@yahoo.com wrote:
> > On Apr 11, 5:39 pm, John Smith <JSm...@mail.net> wrote:
>
> >>Thanks. I don't know any C++ so it's hard for me to understand
> >>your example. Below is my C implementation of the pseudocode
> >>given in Osmium's link. It returns a slightly different value
> >>from the std dev calculated in the conventional manner. Can you
> >>see where I've gone wrong?
>
> <snip code>
>
> > Looks to me like you are not fully compensating for the fact that C
> > array indexes are base zero.  Specifically, to match the Basic code, I
> > think you want to change the "a += u/k;" line to "a += u/(k+1);"
>
> Good call; that fixed it.
>
>
>
> > Also, precisely what do you mean by "It returns a slightly different
> > value from the std dev calculated in the conventional manner."?  All
> > of the single pass methods of computing variance have problems with
> > round-off error, Welfords less than most (which is why it's used), but
> > it can (even using Welford) be vary substantial (IOW, enough to make
> > the result completely meaningless).  So the question is what are you
> > comparing this to.  If the "conventional manner" is the two pass
> > approach (where you first compute the mean, and then the sum of
> > squares of differences from the mean in a second pass), then I'd
> > expect slight differences in almost all cases
>
> "Conventional manner:"
>
> /* Return the sample variance
>     of the values in datalist
>                       _
>             SUM{(Xi - X)^2}
>     svar =  ---------------
>                   N-1
> */
> double svar(double *datalist, int listsize)
> {
>      int idx;
>      double mean, diff, sumsqrs;
>
>      sumsqrs = 0.0;
>      mean = a_mean(datalist, listsize);
>      for(idx = 0; idx < listsize; idx++)
>      {
>          diff = datalist[idx] - mean;
>          sumsqrs += (diff * diff);
>      }
>
>      return sumsqrs / ((double)listsize-1.0);
>
> }
>
> /* Return the sample standard deviation
>     of values in datalist. */
> double s_stdev(double *datalist, int listsize)
> {
>      return sqrt(svar(datalist, listsize));
>
> }
>
> The two methods now agree exactly.

Another thing that helps a lot is to use compensated summation
(Kahan's adder) for any sort of summations.  It is approximately the
same benefit as presorting the data.  These things become very
important with unusual distributions.

//
// Kahan Summation (also known as compensated summation) template
//  Implementation by Dann Corbit 8-11-1998
//  Thanks also to Tom Lippincott & Nathan Myers who gave valuable
advice.
//  But the crappy stuff in here is my fault, not theirs! ;-)
//
// This technique is due to Dr. William Kahan,
//  a Professor at University of California, Berkeley
//
// Personal note (DRC): Prof. Kahan is a very nice guy and a
gentleman.
//
// Ref:
//  "What Every Computer Scientist Should Know About Floating Point
Arithmetic,"
//  by David Goldberg, published in the March, 1991 issue of Computing
Surveys.
//  Copyright 1991, ACM
//
// Theorem 8 (Kahan Summation Formula) on page 203 shows:
//  error in iterative summation can be reduced to 2 * epsilon,
//  whereas naive summation can produce n * epsilon.
//
template <class Etype> class KahanAdder
{

private:
      Etype sum_;   //The current working sum.
      Etype carry_; //The carry from the previous operation

      // You may be tempted to move these into the body of member
add() as
      // local variables.  DON'T DO IT!  For some extended precision
types
      // the constructor time may dominate your application, or even
cause
      // it to fail due to memory fragmentation.
      Etype temp_;  //A temporary used to capture residual bits of
precision
      Etype y_;     //A temporary used to capture residual bits of
precision

      // This is the heart of the template. After each add operation,
      // the carry will contain the additional bits that could be left
out
      // from the previous operation.
      void      add(const Etype & datum) {
          y_ = datum - carry_;
          temp_ = sum_ + y_;
          carry_ = (temp_ - sum_) - y_;
          sum_ = temp_;
      }

public:
      // Constructors.  Both temp_ and y_ are initialized
      // on first use.  Note that these FORCE a constructor for zero.
      // This was done to make _asolutely sure_ that the components
      // are initialized to zero and not arbitrary values.
      // I'm a bit torn, however, because I may inherit a type for
      // which an integral constructor makes no sense.
      // I would like to hear more debate on this.
      KahanAdder() : sum_(0), carry_(0) {}
      KahanAdder(const Etype& e): sum_(e), carry_(0)  {}
      KahanAdder(const Etype& e, const Etype& c): sum_(e), carry_(c)
{}

      // Default destructor. Nothing special happens.
      // ~KahanAdder() {} // add stuff here if you want to build one.

      // Accessor method to return the working sum.
      Etype get_sum() const { return sum_; }

      // Accessor method to return the current carry.
      Etype get_carry() const { return carry_; }

      // I have powerful pro's and con's over this one.
      // Opinions?
      operator const Etype&() const { return sum_; }

      // Syntactic sugar so that we can use the adder in a transparent
way.
      // Here, we add an element to the sum with +=, which really just
calls
      // the add() method.
      KahanAdder & operator += (const Etype & g) { add(g); return
*this; }

      // Syntactic sugar so that we can use the adder in a transparent
way.
      // Here, we subtract an element to the sum with -=, which really
just
      // calls the add() method with the arg negated.
      KahanAdder & operator -= (const Etype & g) { add(-g); return
*this; }

      // Add two KahanAdders together.
      template< class Etype >
      KahanAdder<Etype> operator+(const KahanAdder<Etype>& right)
      {
          *this += right.get_carry();
          return *this += right.get_sum();
      }

      // Subtract one KahanAdder from another.
      template< class Etype >
      KahanAdder<Etype> operator-(const KahanAdder<Etype>& right)
      {
          *this -= right.get_carry();
          return *this -= right.get_sum();
      }
};

0
Reply user923005 4/12/2007 7:13:08 PM

On Apr 11, 3:39 pm, John Smith <JSm...@mail.net> wrote:
[snip]
> > P.S.
> > Buy a copy of Knuth's TAOCP.
> > That's an order.
>
> Yes sir. It will be my first purchase after I win the lottery.

No need to wait.  Here is the volume in question:
http://www.amazon.com/gp/offer-listing/0201896842/ref=dp_olp_2/102-4012492-5919352

0
Reply user923005 4/12/2007 8:25:29 PM

user923005 wrote:

> > Yes sir. It will be my first purchase after I win the lottery.
> 
> No need to wait.  Here is the volume in question:

I feel that you have confused the notions of necessary and sufficient
precondition...

   -- chris
0
Reply Chris 4/12/2007 8:40:34 PM

On Apr 12, 1:40 pm, "Chris Uppal" <chris.up...@metagnostic.REMOVE-
THIS.org> wrote:
> user923005 wrote:
> > > Yes sir. It will be my first purchase after I win the lottery.
>
> > No need to wait.  Here is the volume in question:
>
> I feel that you have confused the notions of necessary and sufficient
> precondition...

Speaking of necessary and sufficient:
I just figured if you write programs of a numerical nature and you
won't pop $25 for a volume of Knuth that deals with that activity,
then there is something wrong with that picture.

You're not a real programmer until you have read and understood TAOCP.
IMO-YMMV.

Or, at least, your employer is not getting his money's worth.

0
Reply user923005 4/12/2007 11:25:32 PM

user923005 wrote:
> "Chris Uppal" <chris.up...@metagnostic.REMOVE-THIS.org> wrote:
>> user923005 wrote:
>>
>>>> Yes sir. It will be my first purchase after I win the lottery.
>>
>>> No need to wait.  Here is the volume in question:
>>
>> I feel that you have confused the notions of necessary and
>> sufficient precondition...
> 
> Speaking of necessary and sufficient:
> I just figured if you write programs of a numerical nature and
> you won't pop $25 for a volume of Knuth that deals with that
> activity, then there is something wrong with that picture.
> 
> You're not a real programmer until you have read and understood
> TAOCP. IMO-YMMV.

Around here TAOCP costs about $70 per volume.  Total about $200.

-- 
 <http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
 <http://www.securityfocus.com/columnists/423>
 <http://www.aaxnet.com/editor/edit043.html>
 
 "A man who is right every time is not likely to do very much."
                           -- Francis Crick, co-discover of DNA
 "There is nothing more amazing than stupidity in action."
                                             -- Thomas Matthews


-- 
Posted via a free Usenet account from http://www.teranews.com

0
Reply CBFalconer 4/13/2007 12:25:29 AM

user923005 wrote:
> On Apr 12, 1:40 pm, "Chris Uppal" <chris.up...@metagnostic.REMOVE-
> THIS.org> wrote:
> 
>>user923005 wrote:
>>
>>>>Yes sir. It will be my first purchase after I win the lottery.
>>
>>>No need to wait.  Here is the volume in question:
>>
>>I feel that you have confused the notions of necessary and sufficient
>>precondition...
> 
> 
> Speaking of necessary and sufficient:
> I just figured if you write programs of a numerical nature and you
> won't pop $25 for a volume of Knuth that deals with that activity,
> then there is something wrong with that picture.
> 
> You're not a real programmer until you have read and understood TAOCP.
> IMO-YMMV.
> 
> Or, at least, your employer is not getting his money's worth.
> 

I'm not a "real programmer;" it's a hobby. At $25/vol I'd 
probably buy Knuth just to display it on my bookshelf. Much of 
its mathematical underpinning (not to mention the hideous 
assembler in which examples are given) is not useful to me. I 
wonder how many professional programmers find all of it 
understandable and useful. I don't mean to disrespect a genius or 
his magnum opus, just my personal POV.

JS
0
Reply John 4/13/2007 5:11:44 PM

12 Replies
591 Views

(page loaded in 0.128 seconds)

Similiar Articles:


















7/23/2012 10:20:04 AM


Reply: