Hello,
I have a dataset and a set of a priori models, and I am going to use
model selection with AIC to rank the models. My models have fixed and
random effects. I have two random class variables, year and unit, and
a suite of continuous variables. Below is a simplified sample
dataset. One thing I have to consider is that some, but not all,
experimental units were sampled each year.
From my research with SAS so far, I have found that the default
estimator in PROC MIXED is REML, and that the REML likelihood reflects
only the random-effects (covariance) parameters. Since the formula
for AIC includes a bias correction term based on the number of
parameters, it seems that REML would be inappropriate for comparing
models that differ in their fixed effects. In order to account for
the fixed effects, I need to specify the ML method. I have found that
the ML method counts each unique level of a class variable as a
separate parameter. For example, each year is counted as a separate
parameter in the model. This would seem to inflate the bias
correction term for AIC, as it uses the number of parameters in the
calculation. I am also wondering whether or not SAS is the best
environment to perform model selection, and I plan on calculating AIC
values manually as a check. Any recommendations or insight on how
best to proceed with this analysis are welcome.
Thanks
y    year   unit   x3   x4   x5   x6
43   2005   A      23   37   19   7
34   2005   B      14   48   28   31
50   2005   C      19   24   48   48
4    2005   D      47   9    46   20
28   2005   E      37   36   6    12
7    2005   F      9    27   22   19
40   2005   G      31   9    15   32
45   2006   A      17   4    29   6
24   2006   C      29   23   7    38
37   2006   D      9    26   34   32
18   2006   F      11   45   50   18
18   2006   G      27   10   16   42
17   2007   B      6    34   7    29
49   2007   C      14   2    17   26
27   2007   D      12   13   31   46
18   2007   E      4    22   46   44
28   2007   F      50   45   5    16
5    2007   G      47   23   16   16
22   2007   H      29   5    29   36
40   2007   I      9    45   15   32
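
The unbalanced sampling in the rows above (not every unit appears in every year) can be checked with a quick tally. This is only a minimal sketch in Python rather than SAS; the names SAMPLE, units_by_year and years_by_unit are made up for illustration, and only the 20 sample rows from this post are used.

```python
from collections import defaultdict

# The 20 sample rows from the post, in the order: y year unit x3 x4 x5 x6
SAMPLE = """\
43 2005 A 23 37 19 7
34 2005 B 14 48 28 31
50 2005 C 19 24 48 48
4 2005 D 47 9 46 20
28 2005 E 37 36 6 12
7 2005 F 9 27 22 19
40 2005 G 31 9 15 32
45 2006 A 17 4 29 6
24 2006 C 29 23 7 38
37 2006 D 9 26 34 32
18 2006 F 11 45 50 18
18 2006 G 27 10 16 42
17 2007 B 6 34 7 29
49 2007 C 14 2 17 26
27 2007 D 12 13 31 46
18 2007 E 4 22 46 44
28 2007 F 50 45 5 16
5 2007 G 47 23 16 16
22 2007 H 29 5 29 36
40 2007 I 9 45 15 32
"""

units_by_year = defaultdict(set)  # year -> set of units sampled that year
years_by_unit = defaultdict(set)  # unit -> set of years it was sampled
n_rows = 0

for line in SAMPLE.splitlines():
    y, year, unit, *covariates = line.split()
    units_by_year[year].add(unit)
    years_by_unit[unit].add(year)
    n_rows += 1

print("observations:", n_rows)
for year in sorted(units_by_year):
    print(year, "units sampled:", len(units_by_year[year]))
for unit in sorted(years_by_unit):
    print(unit, "years sampled:", sorted(years_by_unit[unit]))
```

For these rows the tally gives 3 years, 9 units and 20 observations, and only units C, D, F and G were sampled in all three years.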
wyldsoul (25)
3/22/2010 5:45:14 PM
I'm no expert on PROC MIXED, but it seems like you are taking the
right approach. Yes, each level of a class variable, minus one,
should be counted as a parameter (K) when calculating AIC (-2LL + 2K)
from the ML log-likelihood (LL) in your output, and likewise for AICc,
which adds a further small-sample correction to guard against
overfitting models to data. I think you should calculate your own
AICc values in Excel or whatever other program you use - don't just
trust SAS to give you what you think you are getting.
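
As a rough illustration of that hand calculation, here is a minimal sketch in Python (any spreadsheet would do the same job). It uses the AIC formula above and the standard small-sample form AICc = AIC + 2K(K+1)/(n-K-1). The function names and the input values are hypothetical; in practice -2LL would come from the ML fit, and K and n from whatever counting rule you settle on.

```python
def aic(neg2_loglik, k):
    """AIC = -2LL + 2K, where neg2_loglik is -2 times the log-likelihood."""
    return neg2_loglik + 2 * k


def aicc(neg2_loglik, k, n):
    """AICc = AIC + 2K(K+1)/(n-K-1), the small-sample corrected AIC."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > K + 1")
    return aic(neg2_loglik, k) + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical inputs for illustration only (not from a real fit).
neg2ll = 150.0  # -2 log-likelihood reported by an ML fit
k = 5           # number of estimated parameters
n = 20          # sample size used for the correction

print("AIC: ", aic(neg2ll, k))
print("AICc:", round(aicc(neg2ll, k, n), 3))
```

Keeping K and n as explicit arguments makes it easy to compare the result against what the software reports and to see which counting rule it used.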
I recall that the bigger issue I had with PROC MIXED (or PHREG) was
with the sample size used for calculating AICc. At least with
family-group data in PHREG, I recall that I was not satisfied with
what SAS used as the sample size - I thought it was too liberal.
Given those semiparametric, partial-likelihood models, I used a
conservative estimate of sample size: the number of mortality
events. Maybe my memory fails me for PROC MIXED - can someone explain
how sample size is calculated in PROC MIXED? Is it adequately
parsimonious? I think I used the number of individual animals as the
sample size for calculating AICc from the ML log-likelihood (LL)
given by PROC MIXED. Thanks. Shawn
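
To show how much the choice of sample size matters, the sketch below (Python, illustrative only) compares the AICc correction term 2K(K+1)/(n-K-1) for two candidate values of n taken from the sample dataset in the original post: the 20 observations and the 9 distinct units. Treating units as the analogue of "individual animals" is an assumption, and K = 4 is a hypothetical parameter count.

```python
def aicc_correction(k, n):
    """Extra penalty that AICc adds on top of AIC: 2K(K+1)/(n-K-1)."""
    if n - k - 1 <= 0:
        raise ValueError("correction is undefined unless n > K + 1")
    return (2 * k * (k + 1)) / (n - k - 1)


k = 4         # hypothetical number of estimated parameters
n_obs = 20    # rows in the sample dataset
n_units = 9   # distinct units (A through I) in the sample dataset

penalty_obs = aicc_correction(k, n_obs)
penalty_units = aicc_correction(k, n_units)

print("n = observations:", round(penalty_obs, 3))
print("n = units:       ", round(penalty_units, 3))
```

With the smaller n the correction grows sharply, so the more conservative sample size penalizes parameter-rich models much more heavily.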
Shawn
3/23/2010 3:54:13 PM