Bootstrapping for GLM models with large, unbalanced datasets

Let's say I'm interested in point estimates for the lactation-record
traits A, B, and C. These are correlated parameters of a function
describing the shape of the lactation curve, and they were estimated
for each cow from 4 to 10 sample points of daily milk production.

My hypothesis is that A, B, and C are influenced by:
  Region, and season within region (region and season have 4 classes each)
  MILK: underlying production level (we'll treat this as continuous for now)
  DO: pregnancy (using a continuous measure of time not pregnant as a proxy)
  MILK*season(region) and MILK*DO: the depressive effects of climate or
    pregnancy may influence 'high' producers differently than 'low' producers
  DIM: the length of the record used in estimating A, B, and C (also
    continuous for now)
  TD1: the timing of the first sample point used to estimate A, B, and C
    (continuous)
  Herd (a VERY large number of classes)
  Lactation number (the current plan is to handle this with separate
    analyses for 3 levels)
The desired endpoint is a table of estimates for A, B, and C by
lactation, region, and season (so 3 x 16 estimates for each trait),
adjusted to TD1 = 15 and DIM = 305, with adjustment factors for DO and
MILK.
The dataset is VERY large and completely unbalanced.
I could use ABSORB to deal with the herd effect, but this doesn't seem
to get me to the desired endpoint (i.e., point estimates).
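One way to write the hypothesized model for a single trait y (A, B, or C) within one lactation group, assuming every continuous covariate enters linearly (the code later in the thread also adds a TD1 squared term). Here i indexes region, j season within region, h herd, and k the record; m and d are chosen reference values of MILK and DO, and \bar{H} is notation introduced here for the average herd effect:

$$
\begin{aligned}
y_{ijhk} &= \mu + R_i + S_{j(i)}
  + \left(\beta_1 + \gamma_{j(i)}\right)\mathrm{MILK}_{ijhk}
  + \beta_2\,\mathrm{DO}_{ijhk}
  + \beta_3\,\mathrm{MILK}_{ijhk}\,\mathrm{DO}_{ijhk} \\
 &\quad + \beta_4\,\mathrm{DIM}_{ijhk} + \beta_5\,\mathrm{TD1}_{ijhk} + H_h + e_{ijhk}, \\[4pt]
\hat{y}_{ij} &= \hat\mu + \hat R_i + \hat S_{j(i)}
  + \left(\hat\beta_1 + \hat\gamma_{j(i)}\right) m
  + \hat\beta_2\, d + \hat\beta_3\, m\, d
  + 305\,\hat\beta_4 + 15\,\hat\beta_5 + \bar{H}, \\[4pt]
\frac{\partial \hat{y}_{ij}}{\partial\,\mathrm{MILK}} &= \hat\beta_1 + \hat\gamma_{j(i)} + \hat\beta_3\, d,
\qquad
\frac{\partial \hat{y}_{ij}}{\partial\,\mathrm{DO}} = \hat\beta_2 + \hat\beta_3\, m .
\end{aligned}
$$

The second line is the adjusted estimate for one region/season cell at DIM = 305 and TD1 = 15, and the last line gives the MILK and DO adjustment factors for that cell; \gamma_{j(i)} is the MILK*season(region) slope and \beta_3 the MILK*DO term.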

An alternative that was suggested is to create bootstrap samples from
subsets of herds, run the GLM model on each subsample without including
herd as an effect, and then use the distribution of the results to
generate the estimates.
1) Opinions on this approach?
2) How should I go about choosing the subsample size and the number of
subsamples?
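A minimal SAS sketch of the suggested approach, assuming herds are resampled with replacement as whole clusters (a cluster bootstrap), so that the records of a herd always travel together and the within-herd correlation is carried into every replicate. The data set names (LACT, HERDS, BOOTHERDS, BOOTDATA, BOOTLS, BOOTSUMMARY), the variable DRAW, the seed, REPS=200, and the DO value of 150 are illustrative assumptions, and the toy data step only simulates noise so the sketch runs on its own; the model itself is the one suggested later in the thread, fitted on the log scale.

```sas
/* Toy data (simulated, illustration only) so the sketch runs on its    */
/* own; replace WORK.LACT with the real lactation records.              */
data lact(rename=(daysopen=do));
  call streaminit(2009);
  do herd = 1 to 60;
    region   = mod(herd, 4) + 1;             /* each herd in one region */
    herd_eff = rand('normal', 0, 0.2);
    do cow = 1 to 20;
      lactation = 1 + floor(3*rand('uniform'));
      season    = 1 + floor(4*rand('uniform'));
      milk      = rand('normal', 30, 5);
      daysopen  = 40 + floor(300*rand('uniform'));
      dim       = 250 + floor(100*rand('uniform'));
      td1       = 5 + floor(30*rand('uniform'));
      lna = 1 + 0.01*milk - 0.001*daysopen + herd_eff + rand('normal', 0, 0.3);
      output;
    end;
  end;
  drop cow herd_eff;
run;

/* 1. The resampling unit is the herd: one row per herd.                */
proc sort data=lact(keep=herd) out=herds nodupkey;
  by herd;
run;

/* 2. Draw 200 bootstrap samples of herds with replacement (URS), each  */
/*    with as many herds as the original data (SAMPRATE=1). OUTHITS     */
/*    writes a herd once for every time it is drawn.                    */
proc surveyselect data=herds out=bootherds noprint seed=20091209
                  method=urs samprate=1 outhits reps=200;
run;

data bootherds;
  set bootherds;
  draw = _n_;   /* unique ID per drawn copy; use it instead of HERD if  */
                /* a herd effect is ever put back into the model        */
run;

/* 3. Attach every record of each drawn herd to its replicate.          */
proc sql;
  create table bootdata as
  select b.replicate, b.draw, d.*
  from bootherds as b inner join lact as d
    on b.herd = d.herd
  order by lactation, replicate;
quit;

/* 4. Refit the model in every replicate without a herd effect and keep */
/*    only the LS-means. ODS EXCLUDE ALL suppresses the printed output  */
/*    while ODS OUTPUT still collects the tables.                       */
ods graphics off;
ods exclude all;
proc glm data=bootdata;
  by lactation replicate;
  class region season;
  model lna = td1 td1*td1 dim do do*milk milk
              region season(region) milk*season(region);
  lsmeans season(region) / at (do dim td1) = (150 305 15);
  ods output LSMeans=bootls;
run;
quit;
ods exclude none;

/* 5. Bootstrap mean, SE and 95% percentile interval per cell (log scale). */
proc sort data=bootls;
  by lactation region season;
run;

proc univariate data=bootls noprint;
  by lactation region season;
  var lsmean;
  output out=bootsummary mean=boot_mean std=boot_se
         pctlpts=2.5 97.5 pctlpre=p;
run;
```

On question 2: SAMPRATE=1 draws as many herds as the original data, which is what lets the spread of the replicates estimate the full-data sampling variability; drawing smaller subsets of herds would overstate it. REPS=200 is an arbitrary starting point, and percentile intervals usually need more replicates than a bootstrap standard error alone. Because herd is still left out of the model, this sketch accounts for herd clustering in the uncertainty but does not remove herd effects from the point estimates the way ABSORB would.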
Reply by asanders, 12/9/2009 7:38:49 PM

From off list:

>>Your bootstrap samples should account for the design of your study.  If HERD might affect A, B, and C, your bootstrapping method should account for herd rather than ignore it.
Absolutely, herd is expected to affect the dependents A, B, and C. In
this case, however, the effect of herd is not of any interest; I just
need it washed out (as ABSORB would do).

>>Your description is unclear about the dependent variable(s) in your PROC GLM analyses:  Are they A, B, and C?  Or MILK?  Or what?
Yes, A, B, and C are the dependents.

>>If the dependent variables are A, B, and C, and if you're interested in 48 estimates (3 lactation groups X 4 regions X 4 seasons) for A, B, and C, then you might perform 48 PROC GLMs
>>(multivariate multiple regressions) using a BY-statement for these lactation groups, regions, and seasons:
      proc sort;
        by lactation region season herd;
      run;

      ods listing;
      ods output ParameterEstimates=Parms;
      proc glm;
        by lactation region season;
        absorb herd;
        model a b c=milk do milk*do td1 dim / solution;
      run;
      quit;
      ods listing close;

I don't see how this gets me a table of As, Bs, and Cs for the various
lactation/region/season combinations. LSMEANS for lactation/region/season
are what I'm after, I think, and I don't think you can get those with
ABSORB (or by using BY groups this way).

The code suggested to me is something like this:

* Log-transform the dependents (LACT is a placeholder data set name);
data lact;
  set lact;
  lna = log(a);
  lnb = log(b);
  lnc = log(c);
run;

proc sort data=lact;
  by lactation;
run;

* LSMEANS accepts only CLASS effects, so request season(region);
* MILK is held at its mean and the AT values fix DO, DIM and TD1;
proc glm data=lact;
  by lactation;
  class region season;
  model lna = td1 td1*td1 dim do do*milk milk region season(region)
              milk*season(region) / solution;
  lsmeans season(region) / at (do dim td1) = (70 305 15) stderr;   * early pregnant;
  lsmeans season(region) / at (do dim td1) = (150 305 15) stderr;  * avg pregnant;
  lsmeans season(region) / at (do dim td1) = (240 305 15) stderr;  * late pregnant;
  lsmeans season(region) / at (do dim td1) = (305 305 15) stderr;  * not pregnant;
  ods output ParameterEstimates=param LSMeans=ls;
run;
quit;

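* Then back-transform the estimates and standard errors;
data ls;
  set ls;
  w = exp(stderr*stderr);                   * w = exp(sigma**2), sigma = log-scale StdErr;
  mu = exp(LSMean)*sqrt(w);                 * back-transformed mean;
  stderrMU = sqrt(exp(2*LSMean)*w*(w-1));   * SE: square root of the lognormal variance;
run;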
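The back-transformation data step applies the moments of a lognormal distribution. A short derivation, assuming the log-scale LS-mean is normally distributed with mean \mu (LSMean) and standard deviation \sigma (StdErr), and writing w = e^{\sigma^2}:

$$
\begin{aligned}
E[Y] &= e^{\mu + \sigma^2/2} = e^{\mu}\sqrt{w}, \\
\operatorname{Var}[Y] &= e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right) = e^{2\mu}\,w\,(w - 1), \\
\operatorname{SD}[Y] &= e^{\mu}\sqrt{w\,(w - 1)} .
\end{aligned}
$$

So exp(2*LSMean)*w*(w-1) is a variance on the original scale, and the standard error is its square root.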
Reply by A, 12/11/2009 2:29:24 PM

