COMPGROUPS.NET

### Construct confidence interval for a p-value


When I calculate a p-value of, say, a t-test on two sample distributions, that p-value is a statistic, and that statistic has error associated with it.  (Just like the mean of a distribution has an associated error, the "standard error of the mean".)

Does anyone know how to calculate this error?  I have tried resampling techniques, which is how I would normally go about estimating a confidence interval for things like medians (and other percentiles), which do not always have analytical formulas.  [See my resampling code below, where I do a t-test on two normal distributions, slightly offset from each other.]

The resulting resampled distribution is very wide, and would give a huge error for the p-value.  I am a bit baffled about what (if anything!) is wrong in this approach.  Or maybe I have a simple coding error.

Any insights will be deeply appreciated.

the cyclist

```
function [] = confidenceIntervalForPValue()

% Set RNG seed for reproducibility
stream = RandStream('mt19937ar','Seed',1);
RandStream.setDefaultStream(stream);

% Number of observations in each sample
N = 500;

% How much to offset one random normal from the other,
% to simulate an "effect"
EFFECT_SIZE = 0.1;

% Number of times to resample the original sample, for
% constructing the confidence interval
NRESAMPLE = 20000;

% Generate two random samples.  The second one is offset
% from the first, to simulate an effect
X1 = randn(N,1);
X2 = randn(N,1) + EFFECT_SIZE;

% Perform t-test, and obtain the p-value.  [Don't care about the
% hypothesis test decision itself.]
[~, p] = ttest2(X1,X2)

% Generate random indices (with replacement) into
% the original samples
resampleIndexIntoX1 = randi(N,[N,NRESAMPLE]);
resampleIndexIntoX2 = randi(N,[N,NRESAMPLE]);

% Create the resamples
resampleX1 = X1(resampleIndexIntoX1);
resampleX2 = X2(resampleIndexIntoX2);

% Perform a column-wise t-test on the resamples, giving one
% resampled p-value per bootstrap replicate
[~, resample_p] = ttest2(resampleX1,resampleX2);

% Plot the distribution of resampled p-values
figure
hist(resample_p,0.01:0.01:1);
set(gca,'XLim',[0 1])
ylim = get(gca,'YLim');
legend('Distribution of resampled p-values')
hp = line([p p],ylim);
set(hp,'Color','r')
ht = text(p+0.02,ylim(2)/2,'p-value of original sample');
set(ht,'Color','r')

% Get mean and percentiles of the resampled p-values, to estimate
% a CI for the original sample's p-value
mean_resampled_p  = mean(resample_p)
ci_lo_resampled_p = prctile(resample_p,2.5)
ci_hi_resampled_p = prctile(resample_p,97.5)

end
```
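For cross-checking, the same bootstrap experiment can be sketched in Python with numpy/scipy (this sketch is not part of the original post; the parameters mirror the MATLAB above, with fewer replicates to keep it quick):

```python
# Bootstrap the p-value of a two-sample t-test, mirroring the MATLAB post:
# resample each sample with replacement, re-run the test, and look at the
# spread of the resulting p-values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

N = 500            # observations per sample
EFFECT_SIZE = 0.1  # offset of the second sample
NRESAMPLE = 2000   # bootstrap replicates (20000 in the original post)

x1 = rng.standard_normal(N)
x2 = rng.standard_normal(N) + EFFECT_SIZE

# p-value of the original two-sample t-test
_, p_obs = stats.ttest_ind(x1, x2)

# Resample each sample with replacement and recompute the p-value
resample_p = np.empty(NRESAMPLE)
for i in range(NRESAMPLE):
    b1 = rng.choice(x1, size=N, replace=True)
    b2 = rng.choice(x2, size=N, replace=True)
    _, resample_p[i] = stats.ttest_ind(b1, b2)

# Percentile interval from the bootstrap distribution
ci_lo, ci_hi = np.percentile(resample_p, [2.5, 97.5])
print(p_obs, ci_lo, ci_hi)  # the interval is typically very wide
```

This reproduces the phenomenon the post describes: the bootstrap distribution of the p-value spreads over a large part of [0, 1], far wider than the p-value itself.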

On 12/7/2010 10:54 AM, the cyclist wrote:
> When I calculate a p-value of, say, a t-test on two sample
> distributions, that p-value is a statistic, and that statistic has error
> associated with it. (Just like the mean of a distribution has associated
> error, the "standard error of the mean".)
>
> Does anyone know how to calculate this error? I have tried resampling
> techniques, which is how I would normally go about estimating a
> confidence interval for things like medians

Cyclist, the following is not very helpful, and a bit rambling.  But I'm not sure what you're really asking for.

I agree that a p-value is a random quantity, and even a "statistic" (it's a function of the random data).  But by comparing it to the mean, and by asking for a confidence interval, you're making the implicit statement that there is some "true value" that the p-value is an estimate of.  That would seem to be the only way to make sense of a confidence interval in the usual sense.  So what is the "true value" that the p-value is estimating?  The expected p-value?  Under the null hypothesis, the p-value is uniform on [0,1], so we know its expectation.  The alternative is probably a composite hypothesis, so you have to define the expected p-value conditionally (or be a Bayesian).

I guess that by bootstrapping, you're trying to non-parametrically estimate the distribution of your data, and then simulate the sampling distribution of your statistic given that estimate of the data's distribution.  Then you're distilling that sampling distribution down to a measure of "error" for the p-value.  So what is it you're trying to quantify with that measure of "error"?  Ordinarily, for example, a large standard error for a statistic is "bad".  Under the null hypothesis, the p-value has as unpredictable a distribution as you can get on [0,1], so is that somehow bad?  I don't think so.  And presumably, the closer you get to the null, the more uniform the sampling distribution of the p-value is.

But p-values aren't designed to have a small SE or MSE; they're designed to give high power under some alternative hypothesis, given a specified false rejection rate under the null.  I'm not sure you can reconcile those two things.

Also, what you're doing seems somehow related to something called "observed power", which many people claim is completely bogus.
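The claim that the p-value is Uniform(0,1) under the null hypothesis is easy to check by simulation. A quick Python sketch (not from the thread): draw both samples from the same normal distribution many times, and look at the resulting p-values.

```python
# Under the null (both samples from the same N(0,1) distribution),
# t-test p-values should be approximately Uniform(0,1): mean near 1/2,
# and p < alpha with probability alpha.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
NSIM, N = 5000, 50

pvals = np.empty(NSIM)
for i in range(NSIM):
    x = rng.standard_normal(N)
    y = rng.standard_normal(N)
    _, pvals[i] = stats.ttest_ind(x, y)

mean_p = pvals.mean()
frac_reject = (pvals < 0.05).mean()
print(mean_p)       # close to 1/2, the Uniform(0,1) mean
print(frac_reject)  # close to 0.05, the nominal level
```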

Peter Perkins <Peter.Perkins@MathRemoveThisWorks.com> wrote in message <idotp0$h5h$2@fred.mathworks.com>...
> On 12/7/2010 10:54 AM, the cyclist wrote:
> > When I calculate a p-value of, say, a t-test on two sample
> > distributions, that p-value is a statistic, and that statistic has error
> > associated with it. (Just like the mean of a distribution has associated
> > error, the "standard error of the mean".)
> >
> > Does anyone know how to calculate this error? I have tried resampling
> > techniques, which is how I would normally go about estimating a
> > confidence interval for things like medians
>
> Cyclist, the following is not very helpful, and a bit rambling.  But I'm
> not sure what you're really asking for.
>
> I agree that a p-value is a random quantity, and even a "statistic"
> (it's a function of the random data).  But by comparing it to the mean,
> and by asking for a confidence interval, you're making the implicit
> statement that there is some "true value" that the p-value is an
> estimate of.  That would seem to be the only way to make sense of a
> confidence interval in the usual sense.  So what is the "true value"
> that the p-value is estimating?  The expected p-value?  Under the null
> hypothesis, the p-value is uniform on [0,1], so we know its expectation.
> The alternative is probably a composite hypothesis, so you have to
> define the expected p-value conditionally (or be a Bayesian).
>
> I guess that by bootstrapping, you're trying to non-parametrically
> estimate the distribution of your data, and then simulate the sampling
> distribution of your statistic given that estimate of the data's
> distribution.  Then you're distilling that sampling distribution down to
> a measure of "error" for the p-value.  So what is it you're trying to
> quantify with that measure of "error"?  Ordinarily, for example, a large
> standard error for a statistic is "bad".  Under the null hypothesis, the
> p-value has as unpredictable a distribution as you can get on [0,1], so is
> that somehow bad?  I don't think so.  And presumably, the closer you get
> to the null, the more uniform the sampling distribution of the p-value is.
>
> But p-values aren't designed to have a small SE or MSE; they're designed
> to give high power under some alternative hypothesis, given a
> specified false rejection rate under the null.  I'm not sure you can
> reconcile those two things.
>
> Also, what you're doing seems somehow related to something called
> "observed power", which many people claim is completely bogus.

That was not so rambling after all, and it definitely had the nuggets of truth I needed.  I think you are spot-on that there is no "true" p-value, and I think that is the heart of the matter.  Also, I had not realized that p is uniform on [0,1] under the null hypothesis (although it is obvious through the retrospectoscope!).

Thanks so much for the substantive reply.

the cyclist
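The trade-off Peter describes (high power under an alternative, given a fixed false-rejection rate under the null) can also be estimated by simulation. A Python sketch using the same N and effect size as the original post (not from the thread):

```python
# Monte Carlo estimate of the two-sample t-test's power for the thread's
# scenario: N = 500 per group, true mean shift 0.1, two-sided alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N, EFFECT_SIZE, ALPHA, NSIM = 500, 0.1, 0.05, 2000

rejections = 0
for _ in range(NSIM):
    x1 = rng.standard_normal(N)
    x2 = rng.standard_normal(N) + EFFECT_SIZE
    _, p = stats.ttest_ind(x1, x2)
    if p < ALPHA:
        rejections += 1

power = rejections / NSIM
print(power)  # roughly 0.35 for this scenario
```

With an effect this small relative to the noise, the test rejects only about a third of the time, which is consistent with the bootstrap p-value distribution in the original post spanning much of [0, 1].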
