Model selection in Proc Mixed

Hello,
 I have a dataset and a set of a priori models, and I am going to use
model selection with AIC to rank the models.  My models have fixed and
random effects.  I have two random class variables, year and unit, and
a suite of continuous variables.  Below is a simplified sample
dataset.  One thing I have to consider is that some, but not all,
experimental units were sampled each year.
 From my research with SAS so far, I have found that the default
estimator used in proc mixed is REML, and that the REML likelihood only
considers the random effects.  Since the formula for each AIC value
includes a bias correction term based on the number of parameters, it
seems that the REML method would be inappropriate for comparing models
that differ in their fixed effects.  In order to consider the fixed
effects, I need to specify the ML method.  I have found that the ML
method counts each unique level of a class variable as a separate
parameter.  For example, each year is counted as a separate parameter
in the model.  This would seem to inflate the bias correction term for
AIC, as it uses the number of parameters for the calculation.  I am
also wondering whether or not SAS is the best environment to perform
model selection, and I plan on calculating AIC values manually as a
check.  Any recommendations or insight on how best to proceed with this
analysis are welcome.

Thanks



y	year	unit	x3	x4	x5	x6
43	2005	A	23	37	19	7
34	2005	B	14	48	28	31
50	2005	C	19	24	48	48
4	2005	D	47	9	46	20
28	2005	E	37	36	6	12
7	2005	F	9	27	22	19
40	2005	G	31	9	15	32
45	2006	A	17	4	29	6
24	2006	C	29	23	7	38
37	2006	D	9	26	34	32
18	2006	F	11	45	50	18
18	2006	G	27	10	16	42
17	2007	B	6	34	7	29
49	2007	C	14	2	17	26
27	2007	D	12	13	31	46
18	2007	E	4	22	46	44
28	2007	F	50	45	5	16
5	2007	G	47	23	16	16
22	2007	H	29	5	29	36
40	2007	I	9	45	15	32
Reply wyldsoul (25) 3/22/2010 5:45:14 PM

I'm no expert here on Proc MIXED, but it seems like you are taking the
right approach.  Yes, each level of a class variable fitted as a fixed
effect, minus one, should be counted as a parameter (K) when you
calculate AIC (-2LL + 2K) from your ML or LL output; as far as I
understand it, a class variable in the RANDOM statement adds only its
variance component to K.  AICc then adds a further small-sample
correction to prevent overfitting models to data.  I think you should
calculate your own AICc values in Excel or whatever other program you
use - don't just trust SAS to give you what you think you are getting.
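
For what it's worth, the hand check could be done in a few lines of Python instead of Excel. This is only a minimal sketch, assuming you copy the -2 log-likelihood from the ML fit and decide on K and n yourself; the function names and the numbers in the example are made up for illustration and are not output from the dataset above.

```python
def aic(neg2_loglik, k):
    """AIC = -2LL + 2K, where k is the number of estimated parameters."""
    return neg2_loglik + 2 * k


def aicc(neg2_loglik, k, n):
    """AICc = AIC + 2K(K + 1) / (n - K - 1), the small-sample correction."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > K + 1")
    return aic(neg2_loglik, k) + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical inputs for illustration only (not from a real fit).
neg2_loglik = 150.0  # the "-2 Log Likelihood" value you read off the ML output
k = 4                # parameter count you decided on
n = 20               # sample size you decided on

print("AIC :", aic(neg2_loglik, k))
print("AICc:", aicc(neg2_loglik, k, n))
```

Comparing these numbers with what SAS reports is one way to find out which K and n it is actually using.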

I recall that the bigger issue I had with Proc MIXED (or PHREG) was
with the estimate of sample size used for calculating AICc.  At least
with family-group data in PHREG, I recall that I was not satisfied
with what SAS estimated as a sample size - I thought it was too
liberal.  Given those semiparametric and partial-likelihood models, I
used a conservative estimate of sample size: the number of mortality
events.  Maybe my memory fails me for Proc MIXED - can someone explain
how sample size is calculated in Proc MIXED?  Is it adequately
conservative?  I think I used the number of individual animals as an
estimate of sample size for calculating AICc from the ML (or LL) value
given by Proc MIXED.  Thanks.  Shawn
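
To get a feel for how much the choice of sample size matters, here is a rough sketch using the (year, unit) pairs from the sample dataset posted above. It assumes two candidate definitions of n, the number of rows and the number of distinct units, and a made-up K of 4; it says nothing about which value Proc MIXED itself uses.

```python
# (year, unit) pairs copied from the sample dataset in the original post.
rows = [
    (2005, "A"), (2005, "B"), (2005, "C"), (2005, "D"),
    (2005, "E"), (2005, "F"), (2005, "G"),
    (2006, "A"), (2006, "C"), (2006, "D"), (2006, "F"), (2006, "G"),
    (2007, "B"), (2007, "C"), (2007, "D"), (2007, "E"),
    (2007, "F"), (2007, "G"), (2007, "H"), (2007, "I"),
]

n_obs = len(rows)                            # every row counts as one observation
n_units = len({unit for _, unit in rows})    # distinct experimental units only


def aicc_penalty(k, n):
    """Extra term AICc adds on top of AIC: 2K(K + 1) / (n - K - 1)."""
    return 2 * k * (k + 1) / (n - k - 1)


k = 4  # hypothetical parameter count, for illustration only
for label, n in [("observations", n_obs), ("units", n_units)]:
    print(f"n = {n:2d} ({label}): AICc penalty = {aicc_penalty(k, n):.3f}")
```

With the smaller, more conservative n the correction term is several times larger, so the choice can change how the models rank.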
Reply Shawn 3/23/2010 3:54:13 PM

