Let's say I'm interested in point estimates for three traits, A, B, and
C, of dairy lactation records. These are correlated parameters of a
function estimating the shape of the lactation curve, and were
estimated for each cow based on 4-10 sample points of daily milk.
My hypothesis is that A, B, and C are influenced by:
Region and season within region (where region and season have 4 levels each)
MILK underlying production level (we'll call this continuous for now)
DO pregnancy (using a continuous measure of time-not-pregnant as a
depressive effects of climate or pregnancy influence
'high' producers differently than 'low' producers
DIM the length of the record used in estimating A, B, and C (also
continuous for now)
TD1 the timing of the first sample point used to estimate A, B, and C
herd (VERY large number of classes)
lactation number (current plan is to deal with this with separate
analysis for 3 levels)
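Written out, one possible form of this model, fitted separately for each lactation group, is sketched below. The symbols are my own labels, the quadratic TD1 term is taken from the MODEL statement suggested to me further down, and herd is left out because it is handled separately. Here y stands for A, B, or C (or its log), R for region, and S for season within region.

```latex
y_{ijk} = \mu + R_i + S_{j(i)}
        + \beta_1\,\mathrm{TD1}_{ijk} + \beta_2\,\mathrm{TD1}_{ijk}^{2}
        + \beta_3\,\mathrm{DIM}_{ijk} + \beta_4\,\mathrm{DO}_{ijk}
        + \beta_5\,\mathrm{MILK}_{ijk}
        + \beta_6\,(\mathrm{DO}_{ijk} \times \mathrm{MILK}_{ijk})
        + e_{ijk}
```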
The desirable endpoint is to have a table of estimates for A, B, and C
by lactation, region, and season (so, 3 x 16 estimates) adjusted to
TD1 = 15, DIM = 305 with adjustment factors for DO and MILK.
The dataset is VERY large, and completely unbalanced.
I could use ABSORB to deal with the herd effect, but this doesn't seem
to get me to the desired endpoints (i.e. point estimates).
An alternative suggested is to create bootstrap samples for subsets of
herds, and run the GLM model on the subsamples without including herd
as an effect. Then use the distribution of results to generate the
1) Opinions on this approach?
2) How should I go about choosing subsample size and number?
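For what it's worth, here is a minimal sketch of the herd-level resampling as I understand the suggestion. The data set name lact, the 200 herds per replicate, the 100 replicates, the seed, and the single LSMEANS setting are all placeholders I made up for illustration, not recommendations. It resamples whole herds with replacement so that each herd's records stay together, fits the model without herd in each replicate, and summarises the spread of the LS-means across replicates.

```sas
/* ASSUMPTIONS: data set LACT holds one row per lactation with herd, region,
   season, milk, do, dim, td1 and lna = log(a). All names are hypothetical. */

/* One row per herd, so that herds (not records) are the sampling unit */
proc sort data=lact(keep=herd) out=herds nodupkey;
  by herd;
run;

/* Draw 100 replicates of 200 herds each, with replacement (METHOD=URS).
   OUTHITS writes one row per hit, so a herd drawn twice appears twice. */
proc surveyselect data=herds out=bootherds
                  method=urs n=200 reps=100 outhits seed=12345;
run;

/* Attach every record of each selected herd to its replicate */
proc sql;
  create table bootdata as
  select b.Replicate, l.*
  from bootherds as b inner join lact as l
    on b.herd = l.herd
  order by b.Replicate;
quit;

/* Fit the model without herd, once per replicate, and keep the LS-means */
ods output LSMeans=bootls;
proc glm data=bootdata;
  by Replicate;
  class region season;
  model lna = td1 td1*td1 dim do do*milk milk region season(region);
  lsmeans season(region) / at (do dim td1) = (150 305 15);
run;
quit;

/* Spread of the LS-means across replicates for each region/season cell */
proc sort data=bootls;
  by region season;
run;

proc means data=bootls mean std p5 p95;
  by region season;
  var LSMean;
run;
```

Whether the between-replicate spread is a sensible basis for the estimates and their errors is exactly what I'm asking in (1), and the values of n= and reps= are what I'm asking about in (2).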
12/9/2009 7:38:49 PM
From off list:
>>Your bootstrap samples should account for the design of your study. If HERD might affect A, B, and C, your bootstrapping method should account for herd rather than ignore it.
Absolutely, herd is expected to affect the dependents A, B, and C. The
effect of herd (in this case, however) is not of any interest; I just
need it washed out (as ABSORB might be used).
>>Your description is unclear about the dependent variable(s) in your PROC GLM analyses: Are they A, B, and C? Or MILK? Or what?
Yes, A, B, and C are the dependents.
>>If the dependent variables are A, B, and C, and if you're interested in 48 estimates (3 lactation groups X 4 regions X 4 seasons) for A, B, and C, then you might perform 48 PROC GLMs
>>(multivariate multiple regressions) using a BY-statement for these lactation groups, regions, and seasons:
by lactation region season herd;
ods output ParameterEstimates=Parms;
by lactation region season;
model a b c=milk do milk*do td1 dim / solution;
ods listing close;
I don't see how this gets me a table of As, Bs, and Cs for the various
lactation/region/season combinations. LSMEANS for lactation/region/season
are what I'm after, I think, and you can't get that with ABSORB (or by
using BY groups this way, I think).
The code suggested to me is something like this:
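*log transform dependents;
*so, lna = log(a) etc.;
*the input data set name (lact) is a placeholder;
proc glm data=lact;
class region season;
model lna = td1 td1*td1 dim do do*milk milk region season(region) / solution;
*milk is continuous and not in the CLASS statement, so LS-means are requested;
*for season(region) only, with covariates not listed in AT held at their means;
lsmeans season(region) / at (do dim td1) = (70 305 15) stderr;
lsmeans season(region) / at (do dim td1) = (150 305 15) stderr;
lsmeans season(region) / at (do dim td1) = (240 305 15) stderr;
lsmeans season(region) / at (do dim td1) = (305 305 15) stderr;
ods output ParameterEstimates = param lsmeans = ls;
run;
quit;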
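*then back transform the estimates and errors (output data set name ls_bt is a placeholder);
data ls_bt;
set ls;
w = exp(stderr*stderr);
mu = exp(LSMean)*sqrt(w);
*lognormal variance is exp(2*LSMean)*w*(w-1), so the standard error is its square root;
stderrMU = sqrt(exp(2*LSMean)*w*(w-1));
run;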
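As far as I can tell, the back-transformation rests on the mean and variance of a lognormal distribution. A sketch of the reasoning, assuming the LS-mean on the log scale, m, is treated as normal with standard deviation equal to its standard error, s (m and s are my own shorthand for LSMean and stderr):

```latex
w = e^{s^{2}}, \qquad
\hat{\mu} = e^{\,m + s^{2}/2} = e^{m}\sqrt{w}, \qquad
\widehat{\mathrm{Var}} = e^{\,2m + s^{2}}\left(e^{s^{2}} - 1\right) = e^{2m}\,w\,(w - 1), \qquad
\widehat{\mathrm{SE}} = \sqrt{\widehat{\mathrm{Var}}} = e^{m}\sqrt{w\,(w - 1)}
```

Note that the expression e^{2m} w (w - 1) is a variance, so a standard error on the original scale is its square root.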
12/11/2009 2:29:24 PM
11/30/2013 12:19:17 PM