COMPGROUPS.NET | Search | Post Question | Groups | Stream | About | Register

### Welford algorithm for computing standard deviation

• Follow

```I am looking for the Welford algorithm for standard deviation.
Can anyone provide a link or show an example? (I don't have

JS
```
 0

```On Apr 11, 11:32 am, John Smith <JSm...@mail.net> wrote:
> I am looking for the Welford algorithm for standard deviation.
> Can anyone provide a link or show an example? (I don't have
>
> JS

#include <cmath>

template <class etype>
class WelfordStandardDeviation {
private:
etype          m[2];
etype          s[2];
size_t          n;
public:
WelfordStandardDeviation(void);
etype          GetS(void);
etype          GetM(void);
etype          GetN(void);
etype          GetSigma(void);
};

template <class etype>
WelfordStandardDeviation::WelfordStandardDeviation(void)
{
m[0] = (etype)0;
m[1] = (etype)0;
s[0] = (etype)0;
s[1] = (etype)0;
n = 0;
}

template <class etype>
etype          WelfordStandardDeviation::GetS(void)
{
if (n >= 2)
return s[0];
return sqrt(-1.); /* Create a NAN */
}

template <class etype>
etype          WelfordStandardDeviation::GetM(void)
{
if (n >= 2)
return s[0];
return sqrt(-1.); /* Create a NAN */
}

template <class etype>
etype          WelfordStandardDeviation::GetN(void)
{
return n;
}

template <class etype>
etype          WelfordStandardDeviation::GetSigma(void)
{
if (n >= 2)
return sqrt(s[0] / (n - 1.0));
return sqrt(-1.); /* Create a NAN */
}

template <class etype>
{
++n;
if (n == 1) {
m[0] = x;
} else {
m[1] = m[0] + (x - m[0]) / n;
s[1] = s[0] + (x - m[0]) * (x - m[1]);
m[0] = m[1]; /* for next time */
s[0] = s[1]; /* for next time */
}
}

C++ test driver (computes using Welford header file and by another
method):

#ifdef UNIT_TEST

#include <iostream>
#include <iomanip>
using namespace std;

#include "stats.hpp"
#include "welford.hpp"

int             main(void)
{
WelfordStandardDeviation <double> wsd;
UnivarateStatistics <double, double > us(true);
size_t i;
for (i = 0; i < 10000; i++) {
}
us.calculate_simple_statistics();
cout << setprecision(20);
cout << "Standard deviation of the first 10000 integers is " <<
wsd.GetSigma() << " verified as " << us.get_s() << endl;
return 0;
}

/*
C:\tmp>welford
Standard deviation of the first 10000 integers is 2886.8956799071675
verified as 2886.8956799071675
*/
#endif          /* UNIT_TEST */

P.S.
Buy a copy of Knuth's TAOCP.
That's an order.

```
 0

```"John Smith" writes:

>I am looking for the Welford algorithm for standard deviation. Can anyone

This might help.

http://www.daheiser.info/excel/notes/notei.pdf

```
 0

```user923005 wrote:
> On Apr 11, 11:32 am, John Smith <JSm...@mail.net> wrote:
>
>>I am looking for the Welford algorithm for standard deviation.
>>Can anyone provide a link or show an example? (I don't have
>>
>>JS

<snip C++ code>

Thanks. I don't know any C++ so it's hard for me to understand
your example. Below is my C implementation of the pseudocode
given in Osmium's link. It returns a slightly different value
from the std dev calculated in the conventional manner. Can you
see where I've gone wrong?

/* Data X(1 to N)
A = X(1)
B = 0
FOR K = 2 to N
U = X(K) - A
A = A + U / CDBL(K)
B = B + U * (X(K) - A)
NEXT K
AVERAGE = A
STDEV = SQR( B / CDBL(N - 1) )
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
double welford_stdev(double *datalist, int listsize);

int main(void)
{
int n;
double data[] = {12.3, 15.5, 13.3, 18.9, 17.7, 14.4};

n = sizeof(data) / sizeof(*data);
printf("%f\n", welford_stdev(data, n));

return 0;
}

double welford_stdev(double *datalist, int listsize)
{
double a, b, u;
int k;

a = datalist[0];
b = 0.0;
for(k = 1; k < listsize; k++)
{
u = datalist[k] - a;
a += u / k;
b += u * (datalist[k] - a);
}

return sqrt(b / (listsize - 1.));
}
> P.S.
> Buy a copy of Knuth's TAOCP.
> That's an order.
>

Yes sir. It will be my first purchase after I win the lottery.

JS
```
 0

```On Apr 11, 5:39 pm, John Smith <JSm...@mail.net> wrote:
> Thanks. I don't know any C++ so it's hard for me to understand
> your example. Below is my C implementation of the pseudocode
> given in Osmium's link. It returns a slightly different value
> from the std dev calculated in the conventional manner. Can you
> see where I've gone wrong?
>
> /* Data X(1 to N)
>     A = X(1)
>     B = 0
>     FOR K = 2 to N
>     U = X(K) - A
>     A = A + U / CDBL(K)
>     B = B + U * (X(K) - A)
>     NEXT K
>     AVERAGE = A
>     STDEV = SQR( B / CDBL(N - 1) )
> */
> #include <stdio.h>
> #include <stdlib.h>
> #include <math.h>
> double welford_stdev(double *datalist, int listsize);
>
> int main(void)
> {
>         int n;
>         double data[] = {12.3, 15.5, 13.3, 18.9, 17.7, 14.4};
>
>         n = sizeof(data) / sizeof(*data);
>         printf("%f\n", welford_stdev(data, n));
>
>         return 0;
>
> }
>
> double welford_stdev(double *datalist, int listsize)
> {
>      double a, b, u;
>      int k;
>
>      a = datalist[0];
>      b = 0.0;
>      for(k = 1; k < listsize; k++)
>      {
>          u = datalist[k] - a;
>          a += u / k;
>          b += u * (datalist[k] - a);
>      }
>
>      return sqrt(b / (listsize - 1.));
>
> }

Looks to me like you are not fully compensating for the fact that C
array indexes are base zero.  Specifically, to match the Basic code, I
think you want to change the "a += u/k;" line to "a += u/(k+1);"

Also, precisely what do you mean by "It returns a slightly different
value from the std dev calculated in the conventional manner."?  All
of the single pass methods of computing variance have problems with
round-off error, Welfords less than most (which is why it's used), but
it can (even using Welford) be vary substantial (IOW, enough to make
the result completely meaningless).  So the question is what are you
comparing this to.  If the "conventional manner" is the two pass
approach (where you first compute the mean, and then the sum of
squares of differences from the mean in a second pass), then I'd
expect slight differences in almost all cases

```
 0

```John Smith wrote:

> I am looking for the Welford algorithm for standard deviation. Can
> anyone provide a link or show an example?

http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance

-- chris
```
 0

```robertwessel2@yahoo.com wrote:
> On Apr 11, 5:39 pm, John Smith <JSm...@mail.net> wrote:
>
>>Thanks. I don't know any C++ so it's hard for me to understand
>>your example. Below is my C implementation of the pseudocode
>>given in Osmium's link. It returns a slightly different value
>>from the std dev calculated in the conventional manner. Can you
>>see where I've gone wrong?

<snip code>

> Looks to me like you are not fully compensating for the fact that C
> array indexes are base zero.  Specifically, to match the Basic code, I
> think you want to change the "a += u/k;" line to "a += u/(k+1);"

Good call; that fixed it.

>
> Also, precisely what do you mean by "It returns a slightly different
> value from the std dev calculated in the conventional manner."?  All
> of the single pass methods of computing variance have problems with
> round-off error, Welfords less than most (which is why it's used), but
> it can (even using Welford) be vary substantial (IOW, enough to make
> the result completely meaningless).  So the question is what are you
> comparing this to.  If the "conventional manner" is the two pass
> approach (where you first compute the mean, and then the sum of
> squares of differences from the mean in a second pass), then I'd
> expect slight differences in almost all cases

"Conventional manner:"

/* Return the sample variance
of the values in datalist
_
SUM{(Xi - X)^2}
svar =  ---------------
N-1
*/
double svar(double *datalist, int listsize)
{
int idx;
double mean, diff, sumsqrs;

sumsqrs = 0.0;
mean = a_mean(datalist, listsize);
for(idx = 0; idx < listsize; idx++)
{
diff = datalist[idx] - mean;
sumsqrs += (diff * diff);
}

return sumsqrs / ((double)listsize-1.0);
}

/* Return the sample standard deviation
of values in datalist. */
double s_stdev(double *datalist, int listsize)
{
return sqrt(svar(datalist, listsize));
}

The two methods now agree exactly.

JS
>
```
 0

```On Apr 12, 7:14 am, John Smith <JSm...@mail.net> wrote:
> robertwess...@yahoo.com wrote:
> > On Apr 11, 5:39 pm, John Smith <JSm...@mail.net> wrote:
>
> >>Thanks. I don't know any C++ so it's hard for me to understand
> >>your example. Below is my C implementation of the pseudocode
> >>given in Osmium's link. It returns a slightly different value
> >>from the std dev calculated in the conventional manner. Can you
> >>see where I've gone wrong?
>
> <snip code>
>
> > Looks to me like you are not fully compensating for the fact that C
> > array indexes are base zero.  Specifically, to match the Basic code, I
> > think you want to change the "a += u/k;" line to "a += u/(k+1);"
>
> Good call; that fixed it.
>
>
>
> > Also, precisely what do you mean by "It returns a slightly different
> > value from the std dev calculated in the conventional manner."?  All
> > of the single pass methods of computing variance have problems with
> > round-off error, Welfords less than most (which is why it's used), but
> > it can (even using Welford) be vary substantial (IOW, enough to make
> > the result completely meaningless).  So the question is what are you
> > comparing this to.  If the "conventional manner" is the two pass
> > approach (where you first compute the mean, and then the sum of
> > squares of differences from the mean in a second pass), then I'd
> > expect slight differences in almost all cases
>
> "Conventional manner:"
>
> /* Return the sample variance
>     of the values in datalist
>                       _
>             SUM{(Xi - X)^2}
>     svar =  ---------------
>                   N-1
> */
> double svar(double *datalist, int listsize)
> {
>      int idx;
>      double mean, diff, sumsqrs;
>
>      sumsqrs = 0.0;
>      mean = a_mean(datalist, listsize);
>      for(idx = 0; idx < listsize; idx++)
>      {
>          diff = datalist[idx] - mean;
>          sumsqrs += (diff * diff);
>      }
>
>      return sumsqrs / ((double)listsize-1.0);
>
> }
>
> /* Return the sample standard deviation
>     of values in datalist. */
> double s_stdev(double *datalist, int listsize)
> {
>      return sqrt(svar(datalist, listsize));
>
> }
>
> The two methods now agree exactly.

Another thing that helps a lot is to use compensated summation
(Kahan's adder) for any sort of summations.  It is approximately the
same benefit as presorting the data.  These things become very
important with unusual distributions.

//
// Kahan Summation (also known as compensated summation) template
//  Implementation by Dann Corbit 8-11-1998
//  Thanks also to Tom Lippincott & Nathan Myers who gave valuable
//  But the crappy stuff in here is my fault, not theirs! ;-)
//
// This technique is due to Dr. William Kahan,
//  a Professor at University of California, Berkeley
//
// Personal note (DRC): Prof. Kahan is a very nice guy and a
gentleman.
//
// Ref:
//  "What Every Computer Scientist Should Know About Floating Point
Arithmetic,"
//  by David Goldberg, published in the March, 1991 issue of Computing
Surveys.
//
// Theorem 8 (Kahan Summation Formula) on page 203 shows:
//  error in iterative summation can be reduced to 2 * epsilon,
//  whereas naive summation can produce n * epsilon.
//
{

private:
Etype sum_;   //The current working sum.
Etype carry_; //The carry from the previous operation

// You may be tempted to move these into the body of member
// local variables.  DON'T DO IT!  For some extended precision
types
// the constructor time may dominate your application, or even
cause
// it to fail due to memory fragmentation.
Etype temp_;  //A temporary used to capture residual bits of
precision
Etype y_;     //A temporary used to capture residual bits of
precision

// This is the heart of the template. After each add operation,
// the carry will contain the additional bits that could be left
out
// from the previous operation.
void      add(const Etype & datum) {
y_ = datum - carry_;
temp_ = sum_ + y_;
carry_ = (temp_ - sum_) - y_;
sum_ = temp_;
}

public:
// Constructors.  Both temp_ and y_ are initialized
// on first use.  Note that these FORCE a constructor for zero.
// This was done to make _asolutely sure_ that the components
// are initialized to zero and not arbitrary values.
// I'm a bit torn, however, because I may inherit a type for
// which an integral constructor makes no sense.
// I would like to hear more debate on this.
KahanAdder(const Etype& e): sum_(e), carry_(0)  {}
KahanAdder(const Etype& e, const Etype& c): sum_(e), carry_(c)
{}

// Default destructor. Nothing special happens.
// ~KahanAdder() {} // add stuff here if you want to build one.

// Accessor method to return the working sum.
Etype get_sum() const { return sum_; }

// Accessor method to return the current carry.
Etype get_carry() const { return carry_; }

// I have powerful pro's and con's over this one.
// Opinions?
operator const Etype&() const { return sum_; }

// Syntactic sugar so that we can use the adder in a transparent
way.
// Here, we add an element to the sum with +=, which really just
calls
*this; }

// Syntactic sugar so that we can use the adder in a transparent
way.
// Here, we subtract an element to the sum with -=, which really
just
// calls the add() method with the arg negated.
*this; }

template< class Etype >
{
*this += right.get_carry();
return *this += right.get_sum();
}

// Subtract one KahanAdder from another.
template< class Etype >
{
*this -= right.get_carry();
return *this -= right.get_sum();
}
};

```
 0

```On Apr 11, 3:39 pm, John Smith <JSm...@mail.net> wrote:
[snip]
> > P.S.
> > Buy a copy of Knuth's TAOCP.
> > That's an order.
>
> Yes sir. It will be my first purchase after I win the lottery.

No need to wait.  Here is the volume in question:
http://www.amazon.com/gp/offer-listing/0201896842/ref=dp_olp_2/102-4012492-5919352

```
 0

```user923005 wrote:

> > Yes sir. It will be my first purchase after I win the lottery.
>
> No need to wait.  Here is the volume in question:

I feel that you have confused the notions of necessary and sufficient
precondition...

-- chris
```
 0

```On Apr 12, 1:40 pm, "Chris Uppal" <chris.up...@metagnostic.REMOVE-
THIS.org> wrote:
> user923005 wrote:
> > > Yes sir. It will be my first purchase after I win the lottery.
>
> > No need to wait.  Here is the volume in question:
>
> I feel that you have confused the notions of necessary and sufficient
> precondition...

Speaking of necessary and sufficient:
I just figured if you write programs of a numerical nature and you
won't pop \$25 for a volume of Knuth that deals with that activity,
then there is something wrong with that picture.

You're not a real programmer until you have read and understood TAOCP.
IMO-YMMV.

Or, at least, your employer is not getting his money's worth.

```
 0

```user923005 wrote:
> "Chris Uppal" <chris.up...@metagnostic.REMOVE-THIS.org> wrote:
>> user923005 wrote:
>>
>>>> Yes sir. It will be my first purchase after I win the lottery.
>>
>>> No need to wait.  Here is the volume in question:
>>
>> I feel that you have confused the notions of necessary and
>> sufficient precondition...
>
> Speaking of necessary and sufficient:
> I just figured if you write programs of a numerical nature and
> you won't pop \$25 for a volume of Knuth that deals with that
> activity, then there is something wrong with that picture.
>
> You're not a real programmer until you have read and understood
> TAOCP. IMO-YMMV.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>
<http://www.aaxnet.com/editor/edit043.html>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews

--
Posted via a free Usenet account from http://www.teranews.com

```
 0

```user923005 wrote:
> On Apr 12, 1:40 pm, "Chris Uppal" <chris.up...@metagnostic.REMOVE-
> THIS.org> wrote:
>
>>user923005 wrote:
>>
>>>>Yes sir. It will be my first purchase after I win the lottery.
>>
>>>No need to wait.  Here is the volume in question:
>>
>>I feel that you have confused the notions of necessary and sufficient
>>precondition...
>
>
> Speaking of necessary and sufficient:
> I just figured if you write programs of a numerical nature and you
> won't pop \$25 for a volume of Knuth that deals with that activity,
> then there is something wrong with that picture.
>
> You're not a real programmer until you have read and understood TAOCP.
> IMO-YMMV.
>
> Or, at least, your employer is not getting his money's worth.
>

I'm not a "real programmer;" it's a hobby. At \$25/vol I'd
probably buy Knuth just to display it on my bookshelf. Much of
its mathematical underpinning (not to mention the hideous
assembler in which examples are given) is not useful to me. I
wonder how many professional programmers find all of it
understandable and useful. I don't mean to disrespect a genius or
his magnum opus, just my personal POV.

JS
```
 0

12 Replies
591 Views

Similiar Articles:

7/23/2012 10:20:04 AM