Bootstrapping multivariate data

Hi all,

I am looking for a MATLAB function that performs non-parametric bootstrapping of multivariate data. For instance, I have an MxN matrix of sample data, where M is the dimension of the random vector (multivariate data) and N is the number of observations. I want to generate (resample) bootstrap data from this initial multivariate data. Does anyone know of such a function?

Thank you very much for your help.

Best regards, 
CT DO 
CT
8/17/2010 5:59:05 AM
MATLAB does not have a pre-built function for multivariate data. However, on the File Exchange you can find code for this; the function is called 'bstrag'.
Rogelio
8/17/2010 6:33:42 AM
On 8/17/2010 1:59 AM, CT wrote:
> I am searching for a Matlab function that can do the non-parametric
> bootstrapping of multivariate data. For instance, I have a matrix of
> sample data (MxN) where M is the dimension of the random vector
> (multivariate data), and N is the number of observation. I want to
> generate (resample) bootstrap data from this initial multivariate data.
> Does anyone knows this function?

If you have access to the Statistics Toolbox, the BOOTSTRP function does 
what you are asking.  It is documented here:

<http://www.mathworks.com/access/helpdesk/help/toolbox/stat /bootstrp.html>
Peter
8/17/2010 12:34:34 PM
Isn't this just:

X(:,ceil(rand(1,N)*N))

where X is the sample matrix?
Simon
8/17/2010 2:18:05 PM
Thank you all for your replies. I'll try your suggestions and let you know the results.

CT DO
8/17/2010 4:31:11 PM
On 8/17/2010 10:18 AM, Simon Preston wrote:
>> <http://www.mathworks.com/access/helpdesk/help/toolbox/stat
>> /bootstrp.html>

Sorry, for some reason that link was missing an "s"
<http://www.mathworks.com/access/helpdesk/help/toolbox/stats/bootstrp.html>

> Isn't this just:
>
> X(:,ceil(rand(1,N)*N))
>
> where X is the sample matrix?

That's the basis of it, yes.  But:

1) It's kind of tedious to write the same loop over and over, regardless 
of how simple that loop is,
2) There is a good deal of flexibility in the arguments you can pass to 
BOOTSTRP, so a single matrix isn't the only case it handles for you, and
3) (in recent MATLAB releases) There is support for parallelizing the 
computations using PARFOR (if your installation supports that)

Just as an aside, since 2008b you might find it easier to use RANDI to 
generate random integers.
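
As a rough sketch of the RANDI variant (not the BOOTSTRP implementation), the following draws B bootstrap replicates of an M-by-N sample matrix by resampling whole columns with replacement. The data X, the replicate count B, and the array name Xboot are placeholders chosen for illustration.

```matlab
% Illustrative data: M = 3 variables, N = 500 observations (one per column).
M = 3; N = 500; B = 10;          % B = number of bootstrap replicates (assumed)
X = randn(M, N);

Xboot = zeros(M, N, B);          % one M-by-N resampled data set per page
for b = 1:B
    idx = randi(N, 1, N);        % N column indices drawn with replacement
    Xboot(:, :, b) = X(:, idx);  % whole columns are copied, so the M
                                 % components of each observation stay together
end
```

Because entire columns are copied, the dependence between the M components of each observation is preserved in every replicate.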
8/17/2010 5:58:47 PM
Just one thing to point out: you said that M is the dimension of the data. I thought you meant different groups, or different experiments in which the data were collected; after all, that would explain why your data are not of dimension N*M x 1, for instance. If the columns of the matrix represent different groups, then for one reason or another you cannot pool the series. As far as I know, 'bootstrp' does not distinguish among different groups. If this last statement is incorrect, can someone send me a link where I can read about it?
Thanks
Rogelio
8/17/2010 7:31:04 PM
I mean that I have N observations of the random vector x, and x has M elements; these are the seed data. So each variable here is a vector (of M elements). Their probability density function (pdf) might be a multivariate distribution, e.g. a Gaussian mixture model (GMM). Since the bootstrap here is non-parametric, the N observations are used in place of a concrete pdf.

I have tried to use BOOTSTRP to perform the bootstrapping, but it is not easy, perhaps even infeasible (tell me if I am wrong), since the BOOTSTRP documentation in MATLAB is not clear on this case (I think).

If the generated data is only X(:,ceil(rand(1,N)*N)), I don't see anything new that the bootstrap can bring. As I see it, this is only a reordering of the initial data, so we cannot expect anything different from the new data. Am I wrong?
CT
8/18/2010 6:17:24 AM
If you are saying, or have a feeling, that your data might come from a multivariate distribution, then as far as I know 'bootstrp' will pool your data together, assuming they come from the same pdf, which might be an erroneous assumption.
> I have tried to use BOOTSTRP to perform the bootstrapping, but it is not easy, even unfeasible (tell me if I am wrong), since the manual of BOOTSTRP in Matlab is not clear in this case (I think)
Why? Can you tell us what the error is, or post the code?
> As I see, this is only a disorder of the initial data, we cannot expect anything different from the new data, I'm wrong?
What bootstrapping does, roughly speaking, is resample with replacement. We create pseudo-random samples from your original data. Asymptotically, the empirical distribution converges to the true distribution.
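
To see why a bootstrap resample is not just a reordering of the data, a small sketch (illustrative values and variable names only) can compare a permutation with a draw with replacement. In a resample, some observations appear several times and others not at all, which is why statistics computed on it vary from replicate to replicate.

```matlab
N = 500;
X = randn(3, N);                      % placeholder M-by-N data, M = 3

permIdx = randperm(N);                % a pure reordering: every column once
bootIdx = randi(N, 1, N);             % bootstrap: columns drawn with replacement

nUniquePerm = numel(unique(permIdx)); % always N
nUniqueBoot = numel(unique(bootIdx)); % typically about 63% of N

meanOrig = mean(X, 2);
meanPerm = mean(X(:, permIdx), 2);    % same as meanOrig up to rounding
meanBoot = mean(X(:, bootIdx), 2);    % generally differs slightly from meanOrig

fprintf('distinct columns: permutation %d, bootstrap %d\n', ...
        nUniquePerm, nUniqueBoot);
```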
Rogelio
8/18/2010 6:55:23 AM
By the way, what is the statistic that you are bootstrapping? It would be nice if you posted the code.
Rogelio
8/18/2010 7:08:05 AM
On 8/18/2010 2:55 AM, Rogelio wrote:
> If you are saying or have a feeling that your data might come from a
> multivariate distribution, then as far as I know 'bootstrp' will pool
> your data together, assuming they come from the same pdf which might be
> an erronous assumption.

Rogelio, your definition of "multivariate" seems to mean "grouped" or 
"stratified" or "from a mixture distribution".  The usual way to define 
"multivariate" is simply that there are multiple variables.  You are 
correct that BOOTSTRP does not resample with stratification, but it's 
not clear that that is what the OP was asking about.
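
For readers who do need the stratified case distinguished here, one possible sketch resamples with replacement inside each group separately, so that group sizes are preserved. The group label vector g and the data are hypothetical, and this is hand-written indexing rather than anything BOOTSTRP is said to do in this thread.

```matlab
% Hypothetical data: N observations in rows, with a group label per row.
N = 12;
X = randn(N, 3);
g = [1 1 1 1 1 2 2 2 2 3 3 3]';     % assumed group labels

idx = zeros(N, 1);
groups = unique(g);
for k = 1:numel(groups)
    members = find(g == groups(k));            % rows belonging to this group
    pick = randi(numel(members), numel(members), 1);
    idx(g == groups(k)) = members(pick);       % resample within the group only
end

Xstrat = X(idx, :);                 % stratified bootstrap resample
gstrat = g(idx);                    % labels of the resampled rows
```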
Peter
8/18/2010 12:22:25 PM
For instance, I have a matrix X of initial data with M = 3 and N = 500. There are thus N = 500 observations of a trivariate random vector x following the multivariate normal distribution. These data can be generated by the code (note that mvnrnd returns X as 500x3, one observation per row):
mu = [1 -1 -2]; Sigma = [2 -1 1; -1 2 -1; 1 -1 2];
X = mvnrnd(mu, Sigma, 500);
I don't know if I can use 'bootstrp' to generate data of the same nature, i.e. data that follow (asymptotically) the multivariate normal distribution that I used to generate X:
[bootstat, bootsamp] = bootstrp(10, [], X);
(I don't care about the statistics of the data at the moment; I only want the resampled data.)

However, 'bootstrp' returns a matrix bootsamp of dimension 500x10, so has 'bootstrp' handled only a one-dimensional variable? And I don't know whether 'bootstrp' can return statistics for a multivariate distribution (here, the mean vector and covariance matrix).
CT
8/18/2010 3:55:28 PM
Here's some very basic code that might illustrate what's going on.

%%  Generate your original data set
mu = [1 -1 -2]; Sigma = [2 -1 1; -1 2 -1; 1 -1 2];
X = mvnrnd(mu, Sigma, 500);

%%  Sampling with replacement to create a new data set
% Generate an index of row numbers, drawn with replacement
n = size(X, 1);
boot_index = randsample(n, n, true);

% Use the index to create a new dataset
Boot_dataset = X(boot_index, :);

A bootstrap is simply repeating this same operation nboot times and then 
calculating something interesting using this set of new data sets.
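
The "repeat nboot times" step might be sketched as below, here computing the mean vector and covariance matrix of every resampled data set. To stay independent of the Statistics Toolbox, the example data are generated with randn and a Cholesky factor rather than mvnrnd; nboot and the output array names are assumptions for illustration.

```matlab
mu = [1 -1 -2]; Sigma = [2 -1 1; -1 2 -1; 1 -1 2];
n = 500; d = numel(mu);
X = repmat(mu, n, 1) + randn(n, d) * chol(Sigma);   % n-by-d sample, rows = observations

nboot = 200;                         % number of bootstrap replicates (assumed)
bootMean = zeros(nboot, d);          % one mean vector per replicate
bootCov  = zeros(d, d, nboot);       % one covariance matrix per replicate
for b = 1:nboot
    idx = randi(n, n, 1);            % row indices drawn with replacement
    Xb  = X(idx, :);                 % resampled data set, same size as X
    bootMean(b, :)   = mean(Xb, 1);
    bootCov(:, :, b) = cov(Xb);
end

seMean = std(bootMean, 0, 1);        % bootstrap standard error of each mean component
```

The spread of bootMean across replicates is what estimates the sampling variability of the mean vector.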



Jumping back to the whole "multivariate" discussion: each time you're drawing from X, you're extracting an entire row. All of the elements of this row are related in that they are a single output from your original multivariate normal distribution.
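
The same row-wise picture applies to the BOOTSTRP call quoted earlier in the thread. To my understanding of the Statistics Toolbox documentation, the second output holds row indices into X, one column per replicate, rather than resampled values, which would explain its 500x10 size. A hedged sketch, assuming the toolbox is installed (variable names other than those in the thread are illustrative):

```matlab
mu = [1 -1 -2]; Sigma = [2 -1 1; -1 2 -1; 1 -1 2];
X = mvnrnd(mu, Sigma, 500);                 % 500-by-3, one observation per row

% Indices only: bootsam is 500-by-10, each column indexes rows of X.
[~, bootsam] = bootstrp(10, [], X);
X1 = X(bootsam(:, 1), :);                   % first resampled data set, 500-by-3

% Statistics: each row of the output is the statistic for one replicate.
bootMeans = bootstrp(1000, @mean, X);       % 1000-by-3 mean vectors
bootCovs  = bootstrp(1000, @(x) reshape(cov(x), 1, []), X);  % 1000-by-9
Cov1 = reshape(bootCovs(1, :), 3, 3);       % covariance of the first replicate
```

Flattening the covariance matrix into a row inside the function handle is a deliberate choice here, so that each replicate occupies exactly one row of the output.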



All of this assumes that you need to perform a nonparametric bootstrap.

If you have prior knowledge that your population is described by a 
multivariate normal distribution with

mu = [1 -1 -2]

and

Sigma = [2 -1 1; -1 2 -1; 1 -1 2];

then it's often entirely appropriate to use a parametric bootstrap and 
generate your new dataset using mvnrnd.
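
A minimal sketch of that parametric alternative, under the assumption that a multivariate normal model is appropriate: take the known parameters (or, as here, estimates from the data) and simulate new data sets from the model. With the Statistics Toolbox one would call mvnrnd(muHat, SigmaHat, n); the version below uses only randn and chol, and its variable names are illustrative.

```matlab
mu = [1 -1 -2]; Sigma = [2 -1 1; -1 2 -1; 1 -1 2];
n = 500; d = numel(mu);
X = repmat(mu, n, 1) + randn(n, d) * chol(Sigma);   % stand-in for the observed data

muHat    = mean(X, 1);               % fitted mean vector (1-by-d)
SigmaHat = cov(X);                   % fitted covariance matrix (d-by-d)
R        = chol(SigmaHat);           % upper triangular factor, R'*R = SigmaHat

nboot = 100;                         % number of simulated data sets (assumed)
paramMean = zeros(nboot, d);
for b = 1:nboot
    Xb = repmat(muHat, n, 1) + randn(n, d) * R;     % new n-by-d data set from the fitted model
    paramMean(b, :) = mean(Xb, 1);
end
```

Unlike the nonparametric version, every simulated row here is a fresh draw from the fitted distribution rather than a copy of an observed row.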


Richard
8/18/2010 4:56:17 PM
Just a correction: the covariance matrix that I used is only an example to illustrate the generation of multivariate data. A matrix like that might not make sense.
Thank you for the discussions.
CT
8/19/2010 4:13:58 PM