grouping data to equalize mean

hi all,

i have some objects that, over the course of my experiment,
will be growing larger.  in an effort to minimize any
initial bias, i want to sort the objects before my
experiment begins, such that the average size of the objects
is nearly the same across my experimental groups.

for example, if my objects were plants, and i wanted to
monitor the rate at which the plants grow across a set of 4
experimental conditions, i might start with 40 seedlings.  i
want to measure the lengths of the seedlings, and then
assign them into 4 groups such that the starting lengths are
approximately equal between the 4 groups.

anyone have some insight as to how this could be
accomplished in matlab?

thanks,
bryan
cssmwbs (263)
5/27/2008 8:19:02 PM
comp.soft-sys.matlab

In article <g1hqbm$4f4$1@fred.mathworks.com>, Bryan  <cssmwbs@gmail.com> wrote:

>i have some objects that, over the course of my experiment,
>will be growing larger.  in an effort to minimize any
>initial bias, i want to sort the objects before my
>experiment begins, such that the average size of the objects
>is nearly the same across my experimental groups.

You can use something like this:


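% sizedata: column vector of object sizes; objectinformation: one entry per object
clusteridx = kmeans(sizedata,numberofgroups);   % cluster label (1..numberofgroups) per object
clusters = cell(numberofgroups,1);
for K=1:numberofgroups
  clusters{K} = objectinformation(clusteridx == K);   % objects assigned to cluster K
end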


Note that the initial points for growing the clusters are chosen
at random, so if the clusters are not well separated, the clustering
may differ from run to run. You may wish to use some of the optional
kmeans() parameters to control this. And of course the absolute cluster
numbers don't mean anything, so what comes out as cluster index 1
in one run might come out as cluster index 4 in another.
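Walter's point about the optional kmeans() parameters can be sketched as follows. This assumes the Statistics and Machine Learning Toolbox is installed and uses made-up data: sizedata and numberofgroups follow his names, while the 40 random lengths are hypothetical. Seeding the generator with rng makes runs repeatable, and 'Replicates' restarts the clustering several times and keeps the solution with the lowest total within-cluster distance. Bear in mind that k-means puts objects of similar size into the same cluster, so the per-cluster means computed at the end will generally differ from one another rather than match.

```matlab
% Hypothetical example data: 40 seedling lengths (column vector)
rng(0);                              % fix the seed so every run starts the same way
sizedata = 5 + 2*randn(40,1);
numberofgroups = 4;

% 'Replicates' runs k-means from several random starts and keeps the
% solution with the lowest total within-cluster distance
clusteridx = kmeans(sizedata, numberofgroups, 'Replicates', 10);

% Mean size in each cluster (base MATLAB, no grpstats needed)
clustermeans = accumarray(clusteridx, sizedata, [], @mean)
```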
-- 
  "Product of a myriad various minds and contending tongues, compact of
  obscure and minute association, a language has its own abundant and
  often recondite laws, in the habitual and summary recognition of
  which scholarship consists."                -- Walter Pater
roberson2 (8605)
5/27/2008 8:58:11 PM
On May 27, 4:19 pm, "Bryan " <cssm...@gmail.com> wrote:
> i want to measure the lengths of the seedlings, and then
> assign them into 4 groups such that the starting lengths are
> approximately equal between the 4 groups.

y = sort(x);
for i = 1:4
    z(:,i) = y(i:4:end);   % group i gets the i-th, (i+4)-th, (i+8)-th, ... smallest values
end% for i
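For reference, the same cyclic deal can be written without a loop by using reshape. This is a sketch with hypothetical example data (x holds 40 random values). Like the loop, it assumes length(x) is a multiple of the number of groups.

```matlab
% Hypothetical example: 40 measured lengths, 4 groups
x = rand(40,1);
ngroups = 4;

y = sort(x);
% reshape fills column by column, so row i of reshape(y, ngroups, [])
% is y(i:ngroups:end); transposing turns that row into column i
z = reshape(y, ngroups, [])';

% Loop version for comparison
zloop = zeros(numel(y)/ngroups, ngroups);
for i = 1:ngroups
    zloop(:,i) = y(i:ngroups:end);
end
isequal(z, zloop)
```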

Hope this helps.

Greg
heath (3983)
5/27/2008 9:00:31 PM
Greg Heath <heath@alumni.brown.edu> wrote in message
<32779933-71c2-46aa-aa1b-923b4a42471e@8g2000hse.googlegroups.com>...
> y = sort(x);
> for i = 1:4
>     z(:,i) = y(i:4:end);
> end% for i
> 
> Hope this helps.
> 
> Greg

hi,

thanks for the suggestions... i have not yet tried the
kmeans clustering (clever tactic!).  but i did want to point
out that the simple 'sort and bin' method described above
does not work at all.  it certainly does not approach the
equivalence of means that i was searching for.  this can
even be seen with random data:

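a = sort(rand(100,1));                 % 100 random values, sorted ascending
idx = repmat(1:10,1,10)';              % deal them cyclically into 10 groups: 1,2,...,10,1,2,...
[grpMeans, grpSems] = grpstats(a,idx,{'mean','sem'});   % grpstats needs the Statistics Toolbox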

grpMeans =

    0.4164
    0.4209
    0.4342
    0.4467
    0.4556
    0.4642
    0.4740
    0.4908
    0.4965
    0.5108

note the lack of equivalence of the means... rather, they
are in ascending order.
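The same demonstration can be run in base MATLAB with accumarray in place of grpstats. A minimal sketch, reusing the names a, idx, grpMeans and grpSems from the snippet above. It also shows why the means must come out in order: every element of group k+1 is at least as large as the matching element of group k, because the data are sorted.

```matlab
% 100 sorted values dealt cyclically into 10 groups (group k gets elements k, k+10, k+20, ...)
a = sort(rand(100,1));
idx = repmat((1:10)', 10, 1);

grpMeans = accumarray(idx, a, [], @mean);                        % mean per group
grpSems  = accumarray(idx, a, [], @(v) std(v)/sqrt(numel(v)));   % standard error per group

% Each group-(k+1) element is >= its group-k partner, so the means never decrease
all(diff(grpMeans) >= 0)
```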

basically what i ended up doing was generating a whole bunch
(thousands) of randomized index vectors (the group labels 1:4
repeated out to length(x), then shuffled), and then iterating
through them to find the one that gave the smallest difference
between the largest and smallest group means.  this seems to have
worked rather well: the biggest difference between group means in
my final result is less than 1%.
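The randomized search described above might look roughly like this. It is a sketch under assumptions, not the poster's actual code: the data, the number of tries and the names base, bestLabels and bestRange are hypothetical. Each candidate is a random permutation of the balanced labels 1,2,3,4,1,2,..., and the one with the smallest range of group means is kept.

```matlab
% Hypothetical data: 40 seedling lengths, 4 groups of 10
rng(1);
x = 5 + 2*randn(40,1);
ngroups = 4;
ntries = 5000;

base = mod(0:numel(x)-1, ngroups)' + 1;   % 1,2,3,4,1,2,... (equal group sizes)
bestRange = Inf;
bestLabels = base;
for t = 1:ntries
    labels = base(randperm(numel(x)));    % random balanced assignment
    m = accumarray(labels, x, [], @mean); % group means
    r = max(m) - min(m);                  % largest minus smallest group mean
    if r < bestRange
        bestRange = r;
        bestLabels = labels;
    end
end
bestLabels                   % group number for each object, in the original order of x
100 * bestRange / mean(x)    % remaining spread as a percentage of the overall mean
```

Because the labels are shuffled rather than drawn independently, every candidate keeps equal group sizes.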

thanks,
bryan

cssmwbs (263)
5/28/2008 6:32:01 PM
On May 28, 2:32 pm, "Bryan " <cssm...@gmail.com> wrote:
> note the lack of equivalence of the means... rather, they
> are in ascending order.

A simple modification of one line in my code will cure that.

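y = sort(x);
for i = 1:ngroups
    % z(:,i) = y(i:ngroups:end);
    % each block of 2*ngroups sorted values is split in a serpentine:
    % group i gets the i-th and the (2*ngroups+1-i)-th value of the block
    % (length(y) must be a multiple of 2*ngroups)
    z(:,i) = [y(i:2*ngroups:end); y(2*ngroups+1-i:2*ngroups:end)];
end% for i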
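The loop above works on the sorted values, so it says which lengths go into each group but not which objects. A minimal sketch of the same serpentine assignment that keeps track of the objects follows. It assumes x is a column vector of measured sizes; the data and the names order, g, labels and groupMeans are hypothetical. It produces a group label for every object in its original order. If length(x) is not a multiple of 2*ngroups it still runs, and the group sizes then differ by at most one.

```matlab
% Hypothetical data: 40 seedling lengths to split into 4 groups
rng(0);
x = 5 + 2*randn(40,1);
ngroups = 4;

[~, order] = sort(x);            % order(k) = index of the k-th smallest object
n = numel(x);
pos = (0:n-1)';                  % 0-based rank of each sorted object
blk = floor(pos / ngroups);      % which sweep of ngroups the rank falls in
within = mod(pos, ngroups) + 1;  % position 1..ngroups within that sweep

% Even sweeps run 1..ngroups, odd sweeps run back ngroups..1 (serpentine)
g = within;
odd = mod(blk, 2) == 1;
g(odd) = ngroups + 1 - within(odd);

labels = zeros(n, 1);
labels(order) = g;               % group number for each object, in x's original order

groupMeans = accumarray(labels, x, [], @mean)
```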

> basically what i ended up doing was making a whole bunch
> (thousands) of randomized index variables (looping from 1:4
> through length(x)), and then iteratively running through and
> finding the index that gave the min difference between the
> largest and smallest mean values.  this seems to have worked
> rather well, and the biggest difference in mean values in my
> final result is less than 1% difference.

when

length(y) = 1000
ngroups =  10

I get ~ 0.1%

clear all, clc

n = 1000
nbins = 10
y = sort(rand(n,1));
for i = 1:nbins
    z1(:,i) = y(i:nbins:end);                                  % original method: deal sorted values cyclically
    z2(:,i) = [y(i:2*nbins:end); y(2*nbins+1-i:2*nbins:end)];  % modified method: serpentine
end% for i
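m1   = mean(z1)'          % group means
m10  = mean(m1)           % overall mean
rng1 = max(m1)-min(m1)    % range of the group means
d1   = 100*rng1/m10       % range as a percentage of the overall mean
s1   = std(m1)            % standard deviation of the group means
cv1  = 100*s1/m10         % coefficient of variation, in percent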

m2   = mean(z2)'
m20  = mean(m2)
rng2 = max(m2)-min(m2)
d2   = 100*rng2/m20
s2   = std(m2)
cv2  = 100*s2/m20

resultd = [d1 d2]
resultc = [cv1 cv2]


Hope this helps.

Greg
heath (3983)
5/29/2008 6:39:34 AM
Greg Heath <heath@alumni.brown.edu> wrote in message
<75b8618b-b8eb-4f05-9068-230bf46766ad@j22g2000hsf.googlegroups.com>...
> 
> A simple modification of one line in my code will cure that.
> 
> y = sort(x);
> for i = 1:ngroups
>     % z(:,i) = y(i:ngroups:end);
>     z(:,i) = [y(i:2*ngroups:end); y(2*ngroups+1-i:2*ngroups:end)];
> end% for i
> 
> Hope this helps.
> 
> Greg

hi greg,

thanks so much for the modified version of your code.  it
certainly works much better than the previous version, and
provides an equivalent result to my 'randomized search'
strategy with considerably less computation time.

regards,
bryan


cssmwbs (263)
5/29/2008 5:16:02 PM