On Mon, 28 Feb 2011 07:05:24 -0800 (PST), DawnBoyer
<dawndboyer@gmail.com> wrote:
>I was told recently that SPSS does NOT calculate degrees of freedom
>correctly for the number of f tests for Stepwise Regression.
>
>In searching the Internet, it seems several researchers somewhat agree,
>but in the journals, not one provides a solution to the issue.
Could you quote what, exactly, they are agreeing on?
Every stat package I have used computes the d.f. for
stepwise in the same way as SPSS. That gives tests that
are "nominally correct" but do not, as everyone agrees, give
a proper compensation for what is (almost always) capitalization
on chance. That is -- if your variables do enter, using stepwise,
in exactly the order that you would have specified by hypothesis,
then the p-values, taken as separate, distinct p-values, are correct.
Stepwise is a lousy technique for most purposes. The problem
with p-values may be less serious than two others: the
inclusion of random, non-useful variables, and the
encouragement of improper inferences.
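To see how much capitalization on chance a single selection step can buy, here is a minimal simulation sketch in Python. Everything in it is an assumption chosen for illustration: 30 cases, 5 candidate predictors that are pure noise, and the tabled critical value t = 2.048 for 28 d.f. at two-sided alpha = .05. It counts how often the best of the candidates passes the nominal test at step one.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def critical_r(t_crit, df):
    """Convert a critical t value into the equivalent critical |r|."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


def first_step_false_positive_rate(n_cases=30, n_candidates=5,
                                   t_crit=2.048, trials=2000, seed=1):
    """Share of trials in which the best noise predictor looks 'significant'.

    t_crit = 2.048 is the tabled two-sided .05 value for 28 d.f.
    (an assumption tied to n_cases = 30).
    """
    rng = random.Random(seed)
    r_crit = critical_r(t_crit, n_cases - 2)
    hits = 0
    for _ in range(trials):
        y = [rng.gauss(0, 1) for _ in range(n_cases)]
        # Stepwise enters the candidate with the largest |r| first.
        best = max(
            abs(pearson_r([rng.gauss(0, 1) for _ in range(n_cases)], y))
            for _ in range(n_candidates)
        )
        if best > r_crit:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("one prespecified predictor:", first_step_false_positive_rate(n_candidates=1))
    print("best of five candidates:   ", first_step_false_positive_rate(n_candidates=5))
```

With independent noise predictors, the rate for five candidates should land near 1 - .95^5, about .23, against about .05 for a single prespecified predictor.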
>
>I did find a few allusions to df with formulas, but wanted to check
>with you as you note you are an expert, but didn't find the solution
>on your website (yet).
>
>According to what I understand, I need to calculate df for Multiple
>Regression by the following for the (assuming 3) Independent
>Variables:
>
>30 sets of data, 5 sets of independent variables, using IV#1 as
>criterion, IV#2 as included in the Regression Model, IV#3 and IV#4 as
>Predictive Variables:
>
> df1 = N - 1 df2 = (N - 1) - 1 - 1 - 1 =
> df1 = 20 - 1 df2 = (19) - 1 -1 -1 = 26
>
That is mainly unintelligible to me. I expect that "19" is a typo for
"29" for df2, since the math then works out to give 26. But "N" seems
to be stated as 20 for df1... a value that appears nowhere in the
problem statement, where I see 30 cases. Or are you saying that 30
and 26 are the typos?
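For reference, the usual bookkeeping for an R-squared-change (block) test can be sketched as below. It assumes one possible reading of the design above: 30 cases, three predictors in the full model (IV#2 plus IV#3 and IV#4), two of them added as the block. Under that reading the nominal test is F(2, 26), since 30 - 3 - 1 = 26. The two R-squared values are made up for illustration; only their difference, .143, comes from the post.

```python
def r2_change_f(r2_full, r2_reduced, n_cases, k_full, q_added):
    """Nominal F test for a block of q_added predictors.

    df_num = number of predictors added in the block
    df_den = n_cases - k_full - 1 (residual d.f. of the full model)
    """
    df_num = q_added
    df_den = n_cases - k_full - 1
    f = ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)
    return f, df_num, df_den


if __name__ == "__main__":
    # Hypothetical R-squared values: .20 without the block, .343 with it.
    f, df_num, df_den = r2_change_f(0.343, 0.20, n_cases=30, k_full=3, q_added=2)
    print(f"F({df_num}, {df_den}) = {f:.3f}")
```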
>So my written answer would be:
>
>At alpha .05, F(4,19) = 1.870, p > .05, R2 change = .143; the results
>did not support the unique predictive quality of IV#3 and IV#4 as a
>block to the model.
>
One conservative approach to testing after stepwise says: set the
numerator d.f. equal to the count of the variables that were
offered for entry, not the count that entered. I have never heard
of a program that implements it.
If you are interested in hypothesis testing, it is better never to
use stepwise regression at all. The proper exploratory usage
generally employs massive amounts of replication and cross-validation
(and thus, samples that are 5 or 25 times larger than the
conventional guideline of 10 or more cases per variable).
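The conservative numerator-d.f. rule mentioned above is easy to apply by hand. Here is a minimal sketch, not anything a package does for you. The numbers are hypothetical (30 cases, 2 variables entered out of 5 offered, R-squared of .30), and keeping the residual d.f. at the fitted model's N - k - 1 is my assumption, since the rule as stated only speaks to the numerator.

```python
def overall_f(r2, df_num, df_den):
    """Overall-model F statistic computed from R-squared and its d.f."""
    return (r2 / df_num) / ((1.0 - r2) / df_den)


def nominal_and_conservative(r2, n_cases, k_entered, p_offered):
    """Return (F, df_num, df_den) under both conventions.

    Nominal:      numerator d.f. = variables that entered.
    Conservative: numerator d.f. = variables offered for entry.
    The denominator d.f. is left at n_cases - k_entered - 1 in both
    cases (an assumption; the rule only specifies the numerator).
    """
    df_den = n_cases - k_entered - 1
    nominal = (overall_f(r2, k_entered, df_den), k_entered, df_den)
    conservative = (overall_f(r2, p_offered, df_den), p_offered, df_den)
    return nominal, conservative


if __name__ == "__main__":
    nom, cons = nominal_and_conservative(0.30, n_cases=30, k_entered=2, p_offered=5)
    print("nominal:      F(%d, %d) = %.3f" % (nom[1], nom[2], nom[0]))
    print("conservative: F(%d, %d) = %.3f" % (cons[1], cons[2], cons[0]))
```

The same R-squared yields a smaller F against a larger numerator d.f., which is where the conservatism comes from. The p-value then has to be looked up against the F distribution with the conservative d.f.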
>
>Would you have the ability to kindly steer me in the right direction?
>I appreciate your assistance in answering this question if you have
>the time and capability!
Here is a reference to Frank Harrell's comments on stepwise,
which he originally posted to one of the sci.stat.* groups:
http://www.stata.com/support/faqs/stat/stepwise.html
His book on logistic regression is a good general reference,
too.
--
Rich Ulrich