Construct confidence interval for a p-value

When I calculate a p-value of, say, a t-test on two sample distributions, that p-value is a statistic, and that statistic has error associated with it.  (Just like the mean of a distribution has associated error, the "standard error of the mean".)

Does anyone know how to calculate this error?  I have tried resampling techniques, which is how I would normally go about estimating a confidence interval for things like medians (and other percentiles), which do not always have analytical formulas.  [See my resampling code below, where I do a t-test on two normal distributions, slightly offset from each other.]

The resulting distribution of resampled p-values is very wide, and would imply a huge error for the p-value.  I am a bit baffled about what (if anything!) is wrong with this approach.  Or maybe I have a simple coding error.

Any insights will be deeply appreciated.

the cyclist

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [] = confidenceIntervalForPValue()

% Set RNG seed for replicability.
% (RandStream.setDefaultStream is no longer available in recent releases;
% rng requires R2011a or later.)
rng(1,'twister');

% Number in each sample
N = 500;

% How much to offset one random normal from the other,
% to simulate an "effect"
EFFECT_SIZE = 0.1;

% Number of times to resample the original sample, for
% constructing the confidence interval
NRESAMPLE = 20000;

% Generate two random samples.  The second one is offset
% from the first, to simulate an effect

X1 = randn(N,1);
X2 = randn(N,1) + EFFECT_SIZE;

% Perform t-test, and obtain p-value.  [Don't care about hypothesis test.]
[~, p] = ttest2(X1,X2)


% Generate random values for indexing (with replacement) into
% the original sample
resampleIndexIntoX1 = randi(N,[N,NRESAMPLE]);
resampleIndexIntoX2 = randi(N,[N,NRESAMPLE]);

% Create the resamples
resampleX1 = X1(resampleIndexIntoX1);
resampleX2 = X2(resampleIndexIntoX2);

% Perform t-test on resamples, and get resampled p-values
[~, resample_p] = ttest2(resampleX1,resampleX2);

% Plot distribution of p-values
figure
hist(resample_p,0.01:0.01:1);
set(gca,'XLim',[0 1])
yLimits = get(gca,'YLim');  % named yLimits to avoid shadowing the built-in ylim
legend('Distribution of resampled p-values')
hp = line([p p],yLimits);
set(hp,'Color','r')
ht = text(p+0.02,yLimits(2)/2,'p-value of original sample');
set(ht,'Color','r')

% Get mean and percentiles of resample, to estimate CI of original sample
mean_resampled_p  = mean(resample_p)
ci_lo_resampled_p = prctile(resample_p,2.5)
ci_hi_resampled_p = prctile(resample_p,97.5)

end
Reply the 12/7/2010 3:54:05 PM

On 12/7/2010 10:54 AM, the cyclist wrote:
> When I calculate a p-value of, say, a t-test on two sample
> distributions, that p-value is a statistic, and that statistic has error
> associated with it. (Just like the mean of a distribution has associated
> error, the "standard error of the mean".)
>
> Does anyone know how to calculate this error? I have tried resampling
> techniques, which is how I would normally go about estimating a
> confidence interval for things like medians

Cyclist, the following is not very helpful, and a bit rambling.  But I'm 
not sure what you're really asking for.

I agree that a p-value is a random quantity, and even a "statistic" 
(it's a function of the random data).  But it seems like by comparing it 
to the mean, and by asking for a confidence interval, you're making the 
implicit statement that there's some "true value" that the p-value is an 
estimate of.  That would seem to be the only way to make sense of a 
confidence interval in the usual sense.  So what is the "true value" 
that the p-value is estimating?  The expected p-value?  Under the null 
hypothesis, the p-value is uniform [0,1], so we know its expectation. 
The alternative is probably a composite hypothesis, so you have to 
define the expected p-value conditionally (or be a Bayesian).
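
The claim that the p-value is uniform on [0,1] under the null can be checked by simulation. The following is a minimal sketch, not part of the original reply: it reuses N from the original post, while NSIM and the variable names are assumptions made for illustration. It draws many pairs of samples from the same distribution and summarizes the resulting p-values.

```matlab
% Sketch: sampling distribution of the two-sample t-test p-value under the null.
% N matches the original post; NSIM is an arbitrary choice for illustration.
rng(1);                      % fix the seed for replicability (R2011a or later)
N    = 500;                  % observations per sample
NSIM = 10000;                % number of simulated experiments

% Each column is one experiment; both samples share the same distribution.
X1 = randn(N,NSIM);
X2 = randn(N,NSIM);

% ttest2 operates column-wise, returning one p-value per experiment.
[~, pNull] = ttest2(X1,X2);

% For a uniform [0,1] variable the mean is 1/2 and the variance is 1/12.
meanP       = mean(pNull)
varP        = var(pNull)
fracBelow05 = mean(pNull < 0.05)   % should be close to 0.05
```

If the p-values are uniform, the fraction falling below any threshold should be close to that threshold, which is what fracBelow05 checks for 0.05.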

I guess that by bootstrapping, you're trying to non-parametrically 
estimate the distribution of your data, and then simulate the sampling 
distribution of your statistic given that estimate of the data's 
distribution.  Then you're distilling that sampling distribution down to 
a measure of "error" for the p-value.  So what is it you're trying to 
quantify with that measure of "error"?  Ordinarily, for example, a large 
standard error for a statistic is "bad".  Under the null hypothesis, the 
p-value has as unpredictable a distribution as you can get on [0,1], so is 
that somehow bad?  I don't think so.  And presumably, the closer you get 
to the null, the more uniform the sampling distribution of the p-value is.
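
One way to see how the spread of the p-value depends on the distance from the null is to simulate from known distributions rather than bootstrapping a single sample. This is a sketch under assumptions: N is taken from the original post, while NSIM and the list of effect sizes (including the 0.1 used in the post) are illustrative choices.

```matlab
% Sketch: percentiles of the p-value's sampling distribution for several
% true effect sizes. Effect 0 is the null; 0.1 is the value from the post.
rng(1);                        % fix the seed for replicability
N           = 500;             % observations per sample
NSIM        = 5000;            % simulated experiments per effect size
effectSizes = [0 0.1 0.3];     % true differences in means (sigma = 1)

% One row per effect size; columns are the 2.5th, 50th, 97.5th percentiles.
spread = zeros(numel(effectSizes),3);
for k = 1:numel(effectSizes)
    X1 = randn(N,NSIM);
    X2 = randn(N,NSIM) + effectSizes(k);
    [~, p] = ttest2(X1,X2);    % column-wise: one p-value per experiment
    spread(k,:) = reshape(prctile(p,[2.5 50 97.5]),1,3);
end

% Display effect size alongside the three percentiles
disp([effectSizes(:) spread])
```

The interval between the outer percentiles should be widest at effect size 0 and shrink as the effect grows, which suggests that a wide bootstrap distribution for a small effect need not indicate a coding error.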

But p-values aren't designed to have a small SE or MSE, they're designed 
to give a high power under some alternative hypothesis, given a 
specified false rejection rate under the null.  I'm not sure you can 
reconcile those two things.
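
The two quantities named here, the false rejection rate under the null and the power under an alternative, can be approximated for the setup in the original post. This is a rough sketch that replaces the t distribution with a normal approximation (reasonable for N = 500); ALPHA is an assumed significance level that does not appear in the thread, and the helper names are made up for illustration.

```matlab
% Sketch: approximate power of the two-sided, two-sample test using a
% normal approximation in place of the t distribution.
N           = 500;     % observations per sample, as in the original post
EFFECT_SIZE = 0.1;     % true difference in means, in units of sigma = 1
ALPHA       = 0.05;    % assumed significance level (not from the thread)

% Standard normal CDF and critical value, using only base-MATLAB functions
normalCdf = @(x) 0.5*erfc(-x/sqrt(2));
zCrit     = sqrt(2)*erfcinv(ALPHA);     % z such that P(|Z| > z) = ALPHA

% True difference divided by the standard error of the difference in means
delta = EFFECT_SIZE / sqrt(2/N);

% Rejection probability when the true difference is zero, and when it is not
falseRejectionRate = 2*normalCdf(-zCrit)
approxPower        = normalCdf(delta - zCrit) + normalCdf(-delta - zCrit)
```

The test is tuned through ALPHA and the resulting power; nothing in this calculation constrains the variance of the p-value itself.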

Also, what you're doing seems somehow related to something called 
"observed power", which many people claim is completely bogus.
Reply Peter 12/8/2010 9:36:03 PM



Peter Perkins <Peter.Perkins@MathRemoveThisWorks.com> wrote in message <idotp0$h5h$2@fred.mathworks.com>...
> On 12/7/2010 10:54 AM, the cyclist wrote:
> > When I calculate a p-value of, say, a t-test on two sample
> > distributions, that p-value is a statistic, and that statistic has error
> > associated with it. (Just like the mean of a distribution has associated
> > error, the "standard error of the mean".)
> >
> > Does anyone know how to calculate this error? I have tried resampling
> > techniques, which is how I would normally go about estimating a
> > confidence interval for things like medians
> 
> Cyclist, the following is not very helpful, and a bit rambling.  But I'm 
> not sure what you're really asking for.
> 
> I agree that a p-value is a random quantity, and even a "statistic" 
> (it's a function of the random data).  But it seems like by comparing it 
> to the mean, and by asking for a confidence interval, you're making the 
> implicit statement that there's some "true value" that the p-value is an 
> estimate of.  That would seem to be the only way to make sense of a 
> confidence interval in the usual sense.  So what is the "true value" 
> that the p-value is estimating?  The expected p-value?  Under the null 
> hypothesis, the p-value is uniform [0,1], so we know its expectation. 
> The alternative is probably a composite hypothesis, so you have to 
> define the expected p-value conditionally (or be a Bayesian).
> 
> I guess that by bootstrapping, you're trying to non-parametrically 
> estimate the distribution of your data, and then simulate the sampling 
> distribution of your statistic given that estimate of the data's 
> distribution.  Then you're distilling that sampling distribution down to 
> a measure of "error" for the p-value.  So what is it you're trying to 
> quantify with that measure of "error"?  Ordinarily, for example, a large 
> standard error for a statistic is "bad".  Under the null hypothesis, the 
> p-value has as unpredictable a distribution as you can get on [0,1], so is 
> that somehow bad?  I don't think so.  And presumably, the closer you get 
> to the null, the more uniform the sampling distribution of the p-value is.
> 
> But p-values aren't designed to have a small SE or MSE, they're designed 
> to give a high power under some alternative hypothesis, given a 
> specified false rejection rate under the null.  I'm not sure you can 
> reconcile those two things.
> 
> Also, what you're doing seems somehow related to something called 
> "observed power", which many people claim is completely bogus.

That was not so rambling, and definitely had the nuggets of truth I needed.  I think you are spot-on about the fact that there is no "true" p-value, and I think that is the heart of the matter.  Also, I had not realized the fact that p is uniform [0,1] under the null hypothesis (although it is obvious using the retrospectoscope!).

Thanks so much for the substantive reply.

the cyclist
Reply the 12/8/2010 10:08:05 PM
