I am looking for the Welford algorithm for standard deviation.
Can anyone provide a link or show an example? (I don't have
access to TAOCP.)
JS
|
|
0
|
|
|
|
Reply
|
John
|
4/11/2007 6:32:48 PM |
|
On Apr 11, 11:32 am, John Smith <JSm...@mail.net> wrote:
> I am looking for the Welford algorithm for standard deviation.
> Can anyone provide a link or show an example? (I don't have
> access to TAOCP.)
>
> JS
C++ header file:
#include <cmath>
template <class etype>
class WelfordStandardDeviation {
private:
etype m[2];
etype s[2];
size_t n;
public:
WelfordStandardDeviation(void);
etype GetS(void);
etype GetM(void);
etype GetN(void);
etype GetSigma(void);
void AddItem(etype x);
};
template <class etype>
WelfordStandardDeviation::WelfordStandardDeviation(void)
{
m[0] = (etype)0;
m[1] = (etype)0;
s[0] = (etype)0;
s[1] = (etype)0;
n = 0;
}
template <class etype>
etype WelfordStandardDeviation::GetS(void)
{
if (n >= 2)
return s[0];
return sqrt(-1.); /* Create a NAN */
}
template <class etype>
etype WelfordStandardDeviation::GetM(void)
{
if (n >= 2)
return s[0];
return sqrt(-1.); /* Create a NAN */
}
template <class etype>
etype WelfordStandardDeviation::GetN(void)
{
return n;
}
template <class etype>
etype WelfordStandardDeviation::GetSigma(void)
{
if (n >= 2)
return sqrt(s[0] / (n - 1.0));
return sqrt(-1.); /* Create a NAN */
}
template <class etype>
void WelfordStandardDeviation::AddItem(etype x)
{
++n;
if (n == 1) {
m[0] = x;
} else {
m[1] = m[0] + (x - m[0]) / n;
s[1] = s[0] + (x - m[0]) * (x - m[1]);
m[0] = m[1]; /* for next time */
s[0] = s[1]; /* for next time */
}
}
C++ test driver (computes using Welford header file and by another
method):
#ifdef UNIT_TEST
#include <iostream>
#include <iomanip>
using namespace std;
#include "stats.hpp"
#include "welford.hpp"
int main(void)
{
WelfordStandardDeviation <double> wsd;
UnivarateStatistics <double, double > us(true);
size_t i;
for (i = 0; i < 10000; i++) {
wsd.AddItem((double) i);
us.add_pass_one((double) i);
}
us.calculate_simple_statistics();
cout << setprecision(20);
cout << "Standard deviation of the first 10000 integers is " <<
wsd.GetSigma() << " verified as " << us.get_s() << endl;
return 0;
}
/*
C:\tmp>welford
Standard deviation of the first 10000 integers is 2886.8956799071675
verified as 2886.8956799071675
*/
#endif /* UNIT_TEST */
P.S.
Buy a copy of Knuth's TAOCP.
That's an order.
|
|
0
|
|
|
|
Reply
|
user923005
|
4/11/2007 7:34:53 PM
|
|
"John Smith" writes:
>I am looking for the Welford algorithm for standard deviation. Can anyone
>provide a link or show an example? (I don't have access to TAOCP.)
This might help.
http://www.daheiser.info/excel/notes/notei.pdf
|
|
0
|
|
|
|
Reply
|
osmium
|
4/11/2007 7:39:45 PM
|
|
user923005 wrote:
> On Apr 11, 11:32 am, John Smith <JSm...@mail.net> wrote:
>
>>I am looking for the Welford algorithm for standard deviation.
>>Can anyone provide a link or show an example? (I don't have
>>access to TAOCP.)
>>
>>JS
<snip C++ code>
Thanks. I don't know any C++ so it's hard for me to understand
your example. Below is my C implementation of the pseudocode
given in Osmium's link. It returns a slightly different value
from the std dev calculated in the conventional manner. Can you
see where I've gone wrong?
/* Data X(1 to N)
A = X(1)
B = 0
FOR K = 2 to N
U = X(K) - A
A = A + U / CDBL(K)
B = B + U * (X(K) - A)
NEXT K
AVERAGE = A
STDEV = SQR( B / CDBL(N - 1) )
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
double welford_stdev(double *datalist, int listsize);
int main(void)
{
int n;
double data[] = {12.3, 15.5, 13.3, 18.9, 17.7, 14.4};
n = sizeof(data) / sizeof(*data);
printf("%f\n", welford_stdev(data, n));
return 0;
}
double welford_stdev(double *datalist, int listsize)
{
double a, b, u;
int k;
a = datalist[0];
b = 0.0;
for(k = 1; k < listsize; k++)
{
u = datalist[k] - a;
a += u / k;
b += u * (datalist[k] - a);
}
return sqrt(b / (listsize - 1.));
}
> P.S.
> Buy a copy of Knuth's TAOCP.
> That's an order.
>
Yes sir. It will be my first purchase after I win the lottery.
JS
|
|
0
|
|
|
|
Reply
|
John
|
4/11/2007 10:39:06 PM
|
|
On Apr 11, 5:39 pm, John Smith <JSm...@mail.net> wrote:
> Thanks. I don't know any C++ so it's hard for me to understand
> your example. Below is my C implementation of the pseudocode
> given in Osmium's link. It returns a slightly different value
> from the std dev calculated in the conventional manner. Can you
> see where I've gone wrong?
>
> /* Data X(1 to N)
> A = X(1)
> B = 0
> FOR K = 2 to N
> U = X(K) - A
> A = A + U / CDBL(K)
> B = B + U * (X(K) - A)
> NEXT K
> AVERAGE = A
> STDEV = SQR( B / CDBL(N - 1) )
> */
> #include <stdio.h>
> #include <stdlib.h>
> #include <math.h>
> double welford_stdev(double *datalist, int listsize);
>
> int main(void)
> {
> int n;
> double data[] = {12.3, 15.5, 13.3, 18.9, 17.7, 14.4};
>
> n = sizeof(data) / sizeof(*data);
> printf("%f\n", welford_stdev(data, n));
>
> return 0;
>
> }
>
> double welford_stdev(double *datalist, int listsize)
> {
> double a, b, u;
> int k;
>
> a = datalist[0];
> b = 0.0;
> for(k = 1; k < listsize; k++)
> {
> u = datalist[k] - a;
> a += u / k;
> b += u * (datalist[k] - a);
> }
>
> return sqrt(b / (listsize - 1.));
>
> }
Looks to me like you are not fully compensating for the fact that C
array indexes are base zero. Specifically, to match the Basic code, I
think you want to change the "a += u/k;" line to "a += u/(k+1);"
Also, precisely what do you mean by "It returns a slightly different
value from the std dev calculated in the conventional manner."? All
of the single pass methods of computing variance have problems with
round-off error, Welfords less than most (which is why it's used), but
it can (even using Welford) be vary substantial (IOW, enough to make
the result completely meaningless). So the question is what are you
comparing this to. If the "conventional manner" is the two pass
approach (where you first compute the mean, and then the sum of
squares of differences from the mean in a second pass), then I'd
expect slight differences in almost all cases
|
|
0
|
|
|
|
Reply
|
robertwessel2
|
4/12/2007 7:03:42 AM
|
|
John Smith wrote:
> I am looking for the Welford algorithm for standard deviation. Can
> anyone provide a link or show an example?
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
-- chris
|
|
0
|
|
|
|
Reply
|
Chris
|
4/12/2007 7:37:55 AM
|
|
robertwessel2@yahoo.com wrote:
> On Apr 11, 5:39 pm, John Smith <JSm...@mail.net> wrote:
>
>>Thanks. I don't know any C++ so it's hard for me to understand
>>your example. Below is my C implementation of the pseudocode
>>given in Osmium's link. It returns a slightly different value
>>from the std dev calculated in the conventional manner. Can you
>>see where I've gone wrong?
<snip code>
> Looks to me like you are not fully compensating for the fact that C
> array indexes are base zero. Specifically, to match the Basic code, I
> think you want to change the "a += u/k;" line to "a += u/(k+1);"
Good call; that fixed it.
>
> Also, precisely what do you mean by "It returns a slightly different
> value from the std dev calculated in the conventional manner."? All
> of the single pass methods of computing variance have problems with
> round-off error, Welfords less than most (which is why it's used), but
> it can (even using Welford) be vary substantial (IOW, enough to make
> the result completely meaningless). So the question is what are you
> comparing this to. If the "conventional manner" is the two pass
> approach (where you first compute the mean, and then the sum of
> squares of differences from the mean in a second pass), then I'd
> expect slight differences in almost all cases
"Conventional manner:"
/* Return the sample variance
of the values in datalist
_
SUM{(Xi - X)^2}
svar = ---------------
N-1
*/
double svar(double *datalist, int listsize)
{
int idx;
double mean, diff, sumsqrs;
sumsqrs = 0.0;
mean = a_mean(datalist, listsize);
for(idx = 0; idx < listsize; idx++)
{
diff = datalist[idx] - mean;
sumsqrs += (diff * diff);
}
return sumsqrs / ((double)listsize-1.0);
}
/* Return the sample standard deviation
of values in datalist. */
double s_stdev(double *datalist, int listsize)
{
return sqrt(svar(datalist, listsize));
}
The two methods now agree exactly.
JS
>
|
|
0
|
|
|
|
Reply
|
John
|
4/12/2007 2:14:57 PM
|
|
On Apr 12, 7:14 am, John Smith <JSm...@mail.net> wrote:
> robertwess...@yahoo.com wrote:
> > On Apr 11, 5:39 pm, John Smith <JSm...@mail.net> wrote:
>
> >>Thanks. I don't know any C++ so it's hard for me to understand
> >>your example. Below is my C implementation of the pseudocode
> >>given in Osmium's link. It returns a slightly different value
> >>from the std dev calculated in the conventional manner. Can you
> >>see where I've gone wrong?
>
> <snip code>
>
> > Looks to me like you are not fully compensating for the fact that C
> > array indexes are base zero. Specifically, to match the Basic code, I
> > think you want to change the "a += u/k;" line to "a += u/(k+1);"
>
> Good call; that fixed it.
>
>
>
> > Also, precisely what do you mean by "It returns a slightly different
> > value from the std dev calculated in the conventional manner."? All
> > of the single pass methods of computing variance have problems with
> > round-off error, Welfords less than most (which is why it's used), but
> > it can (even using Welford) be vary substantial (IOW, enough to make
> > the result completely meaningless). So the question is what are you
> > comparing this to. If the "conventional manner" is the two pass
> > approach (where you first compute the mean, and then the sum of
> > squares of differences from the mean in a second pass), then I'd
> > expect slight differences in almost all cases
>
> "Conventional manner:"
>
> /* Return the sample variance
> of the values in datalist
> _
> SUM{(Xi - X)^2}
> svar = ---------------
> N-1
> */
> double svar(double *datalist, int listsize)
> {
> int idx;
> double mean, diff, sumsqrs;
>
> sumsqrs = 0.0;
> mean = a_mean(datalist, listsize);
> for(idx = 0; idx < listsize; idx++)
> {
> diff = datalist[idx] - mean;
> sumsqrs += (diff * diff);
> }
>
> return sumsqrs / ((double)listsize-1.0);
>
> }
>
> /* Return the sample standard deviation
> of values in datalist. */
> double s_stdev(double *datalist, int listsize)
> {
> return sqrt(svar(datalist, listsize));
>
> }
>
> The two methods now agree exactly.
Another thing that helps a lot is to use compensated summation
(Kahan's adder) for any sort of summations. It is approximately the
same benefit as presorting the data. These things become very
important with unusual distributions.
//
// Kahan Summation (also known as compensated summation) template
// Implementation by Dann Corbit 8-11-1998
// Thanks also to Tom Lippincott & Nathan Myers who gave valuable
advice.
// But the crappy stuff in here is my fault, not theirs! ;-)
//
// This technique is due to Dr. William Kahan,
// a Professor at University of California, Berkeley
//
// Personal note (DRC): Prof. Kahan is a very nice guy and a
gentleman.
//
// Ref:
// "What Every Computer Scientist Should Know About Floating Point
Arithmetic,"
// by David Goldberg, published in the March, 1991 issue of Computing
Surveys.
// Copyright 1991, ACM
//
// Theorem 8 (Kahan Summation Formula) on page 203 shows:
// error in iterative summation can be reduced to 2 * epsilon,
// whereas naive summation can produce n * epsilon.
//
template <class Etype> class KahanAdder
{
private:
Etype sum_; //The current working sum.
Etype carry_; //The carry from the previous operation
// You may be tempted to move these into the body of member
add() as
// local variables. DON'T DO IT! For some extended precision
types
// the constructor time may dominate your application, or even
cause
// it to fail due to memory fragmentation.
Etype temp_; //A temporary used to capture residual bits of
precision
Etype y_; //A temporary used to capture residual bits of
precision
// This is the heart of the template. After each add operation,
// the carry will contain the additional bits that could be left
out
// from the previous operation.
void add(const Etype & datum) {
y_ = datum - carry_;
temp_ = sum_ + y_;
carry_ = (temp_ - sum_) - y_;
sum_ = temp_;
}
public:
// Constructors. Both temp_ and y_ are initialized
// on first use. Note that these FORCE a constructor for zero.
// This was done to make _asolutely sure_ that the components
// are initialized to zero and not arbitrary values.
// I'm a bit torn, however, because I may inherit a type for
// which an integral constructor makes no sense.
// I would like to hear more debate on this.
KahanAdder() : sum_(0), carry_(0) {}
KahanAdder(const Etype& e): sum_(e), carry_(0) {}
KahanAdder(const Etype& e, const Etype& c): sum_(e), carry_(c)
{}
// Default destructor. Nothing special happens.
// ~KahanAdder() {} // add stuff here if you want to build one.
// Accessor method to return the working sum.
Etype get_sum() const { return sum_; }
// Accessor method to return the current carry.
Etype get_carry() const { return carry_; }
// I have powerful pro's and con's over this one.
// Opinions?
operator const Etype&() const { return sum_; }
// Syntactic sugar so that we can use the adder in a transparent
way.
// Here, we add an element to the sum with +=, which really just
calls
// the add() method.
KahanAdder & operator += (const Etype & g) { add(g); return
*this; }
// Syntactic sugar so that we can use the adder in a transparent
way.
// Here, we subtract an element to the sum with -=, which really
just
// calls the add() method with the arg negated.
KahanAdder & operator -= (const Etype & g) { add(-g); return
*this; }
// Add two KahanAdders together.
template< class Etype >
KahanAdder<Etype> operator+(const KahanAdder<Etype>& right)
{
*this += right.get_carry();
return *this += right.get_sum();
}
// Subtract one KahanAdder from another.
template< class Etype >
KahanAdder<Etype> operator-(const KahanAdder<Etype>& right)
{
*this -= right.get_carry();
return *this -= right.get_sum();
}
};
|
|
0
|
|
|
|
Reply
|
user923005
|
4/12/2007 7:13:08 PM
|
|
On Apr 11, 3:39 pm, John Smith <JSm...@mail.net> wrote:
[snip]
> > P.S.
> > Buy a copy of Knuth's TAOCP.
> > That's an order.
>
> Yes sir. It will be my first purchase after I win the lottery.
No need to wait. Here is the volume in question:
http://www.amazon.com/gp/offer-listing/0201896842/ref=dp_olp_2/102-4012492-5919352
|
|
0
|
|
|
|
Reply
|
user923005
|
4/12/2007 8:25:29 PM
|
|
user923005 wrote:
> > Yes sir. It will be my first purchase after I win the lottery.
>
> No need to wait. Here is the volume in question:
I feel that you have confused the notions of necessary and sufficient
precondition...
-- chris
|
|
0
|
|
|
|
Reply
|
Chris
|
4/12/2007 8:40:34 PM
|
|
On Apr 12, 1:40 pm, "Chris Uppal" <chris.up...@metagnostic.REMOVE-
THIS.org> wrote:
> user923005 wrote:
> > > Yes sir. It will be my first purchase after I win the lottery.
>
> > No need to wait. Here is the volume in question:
>
> I feel that you have confused the notions of necessary and sufficient
> precondition...
Speaking of necessary and sufficient:
I just figured if you write programs of a numerical nature and you
won't pop $25 for a volume of Knuth that deals with that activity,
then there is something wrong with that picture.
You're not a real programmer until you have read and understood TAOCP.
IMO-YMMV.
Or, at least, your employer is not getting his money's worth.
|
|
0
|
|
|
|
Reply
|
user923005
|
4/12/2007 11:25:32 PM
|
|
user923005 wrote:
> "Chris Uppal" <chris.up...@metagnostic.REMOVE-THIS.org> wrote:
>> user923005 wrote:
>>
>>>> Yes sir. It will be my first purchase after I win the lottery.
>>
>>> No need to wait. Here is the volume in question:
>>
>> I feel that you have confused the notions of necessary and
>> sufficient precondition...
>
> Speaking of necessary and sufficient:
> I just figured if you write programs of a numerical nature and
> you won't pop $25 for a volume of Knuth that deals with that
> activity, then there is something wrong with that picture.
>
> You're not a real programmer until you have read and understood
> TAOCP. IMO-YMMV.
Around here TAOCP costs about $70 per volume. Total about $200.
--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>
<http://www.aaxnet.com/editor/edit043.html>
"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
--
Posted via a free Usenet account from http://www.teranews.com
|
|
0
|
|
|
|
Reply
|
CBFalconer
|
4/13/2007 12:25:29 AM
|
|
user923005 wrote:
> On Apr 12, 1:40 pm, "Chris Uppal" <chris.up...@metagnostic.REMOVE-
> THIS.org> wrote:
>
>>user923005 wrote:
>>
>>>>Yes sir. It will be my first purchase after I win the lottery.
>>
>>>No need to wait. Here is the volume in question:
>>
>>I feel that you have confused the notions of necessary and sufficient
>>precondition...
>
>
> Speaking of necessary and sufficient:
> I just figured if you write programs of a numerical nature and you
> won't pop $25 for a volume of Knuth that deals with that activity,
> then there is something wrong with that picture.
>
> You're not a real programmer until you have read and understood TAOCP.
> IMO-YMMV.
>
> Or, at least, your employer is not getting his money's worth.
>
I'm not a "real programmer;" it's a hobby. At $25/vol I'd
probably buy Knuth just to display it on my bookshelf. Much of
its mathematical underpinning (not to mention the hideous
assembler in which examples are given) is not useful to me. I
wonder how many professional programmers find all of it
understandable and useful. I don't mean to disrespect a genius or
his magnum opus, just my personal POV.
JS
|
|
0
|
|
|
|
Reply
|
John
|
4/13/2007 5:11:44 PM
|
|
|
12 Replies
593 Views
(page loaded in 0.132 seconds)
Similiar Articles: Welford algorithm for computing standard deviation - comp ...I am looking for the Welford algorithm for standard deviation. Can anyone provide a link or show an example? (I don't have access to TAOCP.) JS ... How can I compute a Hessian or Jacobian using the genetic ...Welford algorithm for computing standard deviation - comp ..... looking for the Welford algorithm for standard deviation. Can ... No' question regarding optimization ... mean ,variance,standard deviation ,kurtosis of image - comp.soft ...Welford algorithm for computing standard deviation - comp ... mean ,variance,standard deviation ,kurtosis of image - comp.soft ... Welford algorithm for computing standard ... Normrnd of matrix - comp.soft-sys.matlabWelford algorithm for computing standard deviation - comp ... Jacobian matrix from lsqnonlin - comp.soft-sys.matlab | Computer Group Welford ... comp ... compute standard ... scoring algorithm - comp.soft-sys.matlabWelford algorithm for computing standard deviation - comp ... scoring algorithm - comp.soft-sys.matlab Statistics area from z-score function - comp.sys.hp48 The Inverse ... VOICE ACTIVITY DETECTION BASED ON ZERO CROSIING RATE - comp.soft ...Welford algorithm for computing standard deviation - comp ... VOICE ACTIVITY DETECTION BASED ON ZERO CROSIING RATE - comp.soft ... Welford algorithm for computing standard ... Jacobian matrix from lsqnonlin - comp.soft-sys.matlabWelford algorithm for computing standard deviation - comp ... Jacobian matrix from lsqnonlin - comp.soft-sys.matlab | Computer Group Welford algorithm for computing ... cumulative bivariate normal distribution as a function of radius ...Welford algorithm for computing standard deviation - comp ... cumulative bivariate normal distribution as a function of radius ... Welford algorithm ... double integral of normal distribution Follow - Computer GroupWelford algorithm for computing standard deviation - comp ..... main(void) { WelfordStandardDeviation <double ... may inherit a type for // which an integral ... of ... variance x n is called? - comp.soft-sys.matlabWelford algorithm for computing standard deviation - comp ... variance x n is called? - comp.soft-sys.matlab Welford algorithm for computing standard deviation - comp ... thomas alogrithm - comp.soft-sys.matlabThe tridiagonal matrix algorithm (TDMA), also known as the Thomas algorithm, is a ... thomas alogrithm - comp.soft-sys.matlab | Computer Group i made a program for thomas ... How to compute confidence interval of the mean - comp.soft-sys ...Welford algorithm for computing standard deviation - comp ... How to compute confidence interval of the mean - comp.soft-sys ... Genetic Algorithms - comp.soft-sys.matlab ... The Inverse cumulative distribution function for Inverse Gaussian ...Welford algorithm for computing standard deviation - comp ... The Inverse cumulative distribution function for Inverse Gaussian ... Welford algorithm for computing ... Mean Variance optimization with FMINCON Follow - Computer GroupWelford algorithm for computing standard deviation - comp ..... or 'No' question regarding optimization algorithms Follow Welford algorithm for computing standard ... Numerical Errors when calculating a Minimum version FIR Filter ...Welford algorithm for computing standard deviation - comp ... Numerical Errors when calculating a Minimum version FIR Filter ... Welford algorithm for computing standard ... Welford algorithm for computing standard deviation - comp ...I am looking for the Welford algorithm for standard deviation. Can anyone provide a link or show an example? (I don't have access to TAOCP.) JS ... Algorithms for calculating variance - Wikipedia, the free encyclopedia... if the standard deviation is small relative to the mean. However, the algorithm ... This algorithm is due to Knuth, who cites Welford, and it ... pass algorithm for computing ... 7/23/2012 10:20:04 AM
|