BlockShapeSize.doc

ST 524
Homework 4
NCSU - Fall 2007
Due: 10/17/07
Question 1.
Data set “uniftrialdata.xls” presents yields of a uniformity trial on winter wheat (simulated data).
Unit-size plots (1.5 m wide × 4.5 m long) are arranged in 6 columns × 48 rows, for a total of 288 plots
(size X = 1). Interest is in exploring the relationship between plot size and the variance among plots (on a
unit basis). Four variables identify plots according to their size: plot2, plot4, plot8, and plot16,
corresponding to plot sizes of 2, 4, 8, and 16 units.
1. Following the approach presented in Swallow and H, a nested analysis of variance on yield is
presented. It is used to estimate VX, the variance among plots of size X, expressed on a
unit basis.
* Nested ANOVA: each plot size is nested within the next larger size;
proc glm data=b3(where=(plot1<49 and col<7));
  class plot1 plot2 plot4 plot8 plot16;
  model newyield = plot16 plot8(plot16) plot4(plot8*plot16) plot2(plot4*plot8*plot16);
  random plot16 plot8(plot16) plot4(plot8*plot16) plot2(plot4*plot8*plot16) / test;
  output out=outglm r=resid student=sres p=pred;
run;
The GLM Procedure
Dependent Variable: newyield

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             143      209919.5978     1467.9692      2.40   <.0001
Error             144       87907.8423      610.4711
Corrected Total   287      297827.4402

R-Square   Coeff Var   Root MSE   newyield Mean
0.704836   6.033877    24.70771   409.4832

Source                      DF     Type I SS   Mean Square   F Value   Pr > F
plot16                      17   50233.17637    2954.89273      4.84   <.0001
plot8(plot16)               18   34266.59997    1903.70000      3.12   <.0001
plot4(plot8*plot16)         36   62756.55429    1743.23762      2.86   <.0001
plot2(plot4*plot8*plot16)   72   62663.26720     870.32316      1.43   0.0370

Source                      DF   Type III SS   Mean Square   F Value   Pr > F
plot16                      17   50233.17637    2954.89273      4.84   <.0001
plot8(plot16)               18   34266.59997    1903.70000      3.12   <.0001
plot4(plot8*plot16)         36   62756.55429    1743.23762      2.86   <.0001
plot2(plot4*plot8*plot16)   72   62663.26720     870.32316      1.43   0.0370

Expected Mean Squares
Source                      Type III Expected Mean Square
plot16                      Var(Error) + 2 Var(plot2(plot4*plot8*plot16)) + 4 Var(plot4(plot8*plot16))
                            + 8 Var(plot8(plot16)) + 16 Var(plot16)
plot8(plot16)               Var(Error) + 2 Var(plot2(plot4*plot8*plot16)) + 4 Var(plot4(plot8*plot16))
                            + 8 Var(plot8(plot16))
plot4(plot8*plot16)         Var(Error) + 2 Var(plot2(plot4*plot8*plot16)) + 4 Var(plot4(plot8*plot16))
plot2(plot4*plot8*plot16)   Var(Error) + 2 Var(plot2(plot4*plot8*plot16))
Tuesday October 9, 2007 Homework 5
Plot size   MS           VX
1            610.4711    V1  = 610.4711
2            870.32316   V2  = (870.32316 - 610.4711) / 2    = 129.9261
4           1743.23762   V4  = (1743.23762 - 870.32316) / 4  = 218.2286
8           1903.70000   V8  = (1903.70000 - 1743.23762) / 8 = 20.0578
16          2954.89273   V16 = (2954.89273 - 1903.70000) / 16 = 65.69954
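The VX column follows from the Type III expected mean squares: each mean square contains every variance component below it in the nesting, so the difference between two consecutive mean squares isolates a single component. A sketch of that step, writing $\sigma^2_X$ for the variance component at plot size X and $MS_X$ for the corresponding mean square (this notation is introduced here for illustration and does not appear in the SAS output):

```latex
\begin{aligned}
E[MS_1]    &= \sigma^2_1 \\
E[MS_2]    &= \sigma^2_1 + 2\sigma^2_2 \\
E[MS_4]    &= \sigma^2_1 + 2\sigma^2_2 + 4\sigma^2_4 \\
E[MS_8]    &= \sigma^2_1 + 2\sigma^2_2 + 4\sigma^2_4 + 8\sigma^2_8 \\
E[MS_{16}] &= \sigma^2_1 + 2\sigma^2_2 + 4\sigma^2_4 + 8\sigma^2_8 + 16\sigma^2_{16} \\[4pt]
\hat{\sigma}^2_1 &= MS_1, \qquad
\hat{\sigma}^2_X = \frac{MS_X - MS_{X/2}}{X}, \quad X = 2, 4, 8, 16
\end{aligned}
```

For example, $\hat{\sigma}^2_2 = (870.32316 - 610.4711)/2 \approx 129.93$.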
Variance components may be obtained directly with PROC MIXED:
proc mixed data=b3(where=(plot1<49 and col<7));
  class plot1 plot2 plot4 plot8 plot16;
  model newyield = / outp=predds;   * intercept-only fixed part;
  random plot16 plot8(plot16) plot4(plot8*plot16) plot2(plot4*plot8*plot16);
run;
Variance components estimates from PROC MIXED
Plot Size   Estimate
1           610.47
2           129.93
4           218.23
8            20.0578
16           65.6995
log Vx   6.0354  0.9127log  X 
The Mixed Procedure
Covariance Parameter Estimates
Cov Parm                    Estimate   Size
plot16                       65.6995     16
plot8(plot16)                20.0578      8
plot4(plot8*plot16)         218.23        4
plot2(plot4*plot8*plot16)   129.93        2
Residual                    610.47        1

2. Next, a regression of Vx on X, on a log scale, is used to get a raw estimate of the coefficient of soil
heterogeneity b (Smith’s b).
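A minimal sketch of how this regression could be set up, assuming the five variance component estimates are typed into a small data set and that natural logarithms are used (the mean of the natural logs of these five estimates is about 4.770, which agrees with the dependent mean reported by PROC REG). The data set name vcomp and the variable names x and vx are hypothetical; log_vx and log_x match the names in the REG output.

```sas
* Hypothetical data set: plot size X and variance component estimate Vx;
data vcomp;
  input x vx;
  log_x  = log(x);    * natural logarithm of plot size;
  log_vx = log(vx);   * natural logarithm of the variance estimate;
  datalines;
1 610.47
2 129.93
4 218.23
8 20.0578
16 65.6995
;
run;

* Simple linear regression of log(Vx) on log(X). The slope estimates -b;
proc reg data=vcomp;
  model log_vx = log_x;
run;
quit;
```

With only five points the fit has 3 error degrees of freedom, so the slope is estimated with little precision.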
The REG Procedure
Model: MODEL1
Dependent Variable: log_vx

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1          4.00265       4.00265      4.67   0.1194
Error              3          2.56906       0.85635
Corrected Total    4          6.57171

Root MSE          0.92539   R-Square   0.6091
Dependent Mean    4.77010   Adj R-Sq   0.4788
Coeff Var        19.39988

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              6.03543          0.71681      8.42     0.0035
log_x        1             -0.91274          0.42218     -2.16     0.1194

Regression equation:
$\log(V_x) = 6.0354 - 0.9127 \log(X)$
Smith’s b = 0.9127
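Values of b close to 1 indicate that adjacent plots are nearly uncorrelated (heterogeneous soil), whereas
values close to 0 indicate strongly correlated adjacent plots (homogeneous soil). A plot size between 2 and 8
seems adequate, since the estimated variance for X = 16 (65.70) is greater than that for X = 8 (20.06).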
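The regression equation is the log-linear form of Smith's empirical variance law. A short sketch of the connection, where $V_1$ denotes the variance among unit-size plots (a symbol introduced here for illustration):

```latex
V_X = \frac{V_1}{X^{b}}
\quad\Longrightarrow\quad
\log V_X = \log V_1 - b \,\log X
```

Matching terms with the fitted line gives $\hat{b} = 0.9127$ and $\log \hat{V}_1 = 6.0354$, that is, a fitted $\hat{V}_1 \approx 418$ if natural logarithms were used.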
3. Additionally, we can analyze the residuals for plot sizes X = 1 and X = 8 to see whether the
use of a larger plot reduces the residual variation.
Check residual distribution on the field

*** fit just an intercept in the model
$y_{ij} = \mu + \varepsilon_{ij}$,
where i and j are the coordinates of each plot (row i = 1, 2, . . ., 48 and column j = 1, 2, 3, 4, 5, 6),
$y_{ij}$ is the yield in plot (i, j), $\mu$ is the overall mean, and $\varepsilon_{ij}$ is the residual value
in plot (i, j).
proc glm data=newtrial;
  model newyield = ;   * intercept-only model;
  output out=outglm r=resid student=sres p=pred;
run;
The GLM Procedure
Dependent Variable: newyield

Source              DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                1      48290839.62   48290839.62   46535.2   <.0001
Error              287        297827.44       1037.73
Uncorrected Total  288      48588667.06

R-Square   Coeff Var   Root MSE   newyield Mean
0.000000   7.866930    32.21376   409.4832

Source      DF     Type I SS   Mean Square   F Value   Pr > F
Intercept    1   48290839.62   48290839.62   46535.2   <.0001

Parameter   Estimate      Standard Error   t Value   Pr > |t|
Intercept   409.4832432       1.89821396    215.72   <.0001
*** residual plot on the field ***;
[Figure: Residual plot]
[Figure: Standardized residual plot]
*** graph a contour plot for residuals on the field ***;
proc g3grid data=outglm out=out2;
  grid row*col = sres;
run;
proc gcontour data=out2;
  plot row*col = sres / levels = -4 -3 -2 -1 0 1 2 3 4; * pattern join;
run;
Question 2
Dylan B. Keon and Patricia S. Muir. Growth of Usnea longissima Across a Variety of Habitats in the Oregon Coast Range.
The Bryologist. Vol 105, No. 2, pp 233-242
Abstract. The sensitive lichen Usnea longissima Ach. has a limited, patchy distribution across forested landscapes in the
U.S. Pacific Northwest. To gain insight into whether the current distribution within the Oregon Coast Range has resulted
from a lack of suitable habitat or from dispersal limitations, we measured growth of U. longissima transplants placed in
four habitats. Transplant study site locations and habitats were determined through an accompanying study that identified
significant U. longissima habitat characteristics, based on the present distribution of the species, and used predictive
modeling to identify areas of apparently suitable habitat within the study area. Transplants were placed in 12 sites,
comprised of three replicates of the four habitats. Ninety transplants were placed in each habitat (n = 360). Growth was
measured as changes in biomass and length after one year. Transplants grew in all habitats, particularly in sites where
habitat was predicted to be least suitable for U. longissima. Although transplants in those sites had mean biomass
increases that were 2.7 to 4.6 times greater than those of transplants placed in the other three habitats, their overall rate
of attrition was 1.5 to 1.8 times higher than transplants in the other three habitats. Increases in length were also greatest
in sites where habitat was predicted to be least suitable. The fact that the transplants grew well in all habitats and actually
thrived in sites where habitat was predicted to be least suitable indicates that dispersal limitations may play a more
significant role than the availability of suitable habitat in determining the distribution of U. longissima in the Oregon Coast
Range. These findings underscore the importance of green tree retention during timber harvests. Trees containing U.
longissima should be retained so that they may inoculate the regenerating stand with U. longissima fragments. It is also
recommended that stands harboring significant populations of U. longissima (typically old stands) be preserved as source
locations of this dispersal-limited species.
Habitat is a fixed-effects factor.
Additive linear model:
$Y_{ijk} = \mu + \alpha_i + \beta_{ij} + \varepsilon_{ijk}$
$\alpha_i$: Habitat, fixed effect, i = 1, 2, 3, 4.
$\beta_{ij}$: Site, random effect, j = 1, 2, 3.
$\varepsilon_{ijk}$: Individuals, random effect, k = 1, 2, …, 30.
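One way the additive model above could be fit in SAS, in the same style as the PROC MIXED call in Question 1. This is only a sketch: the data set name usnea and the variable names growth, habitat, and site are hypothetical and do not come from the paper.

```sas
* Hypothetical data set usnea: one row per surviving transplant;
proc mixed data=usnea method=type3;
  class habitat site;
  model growth = habitat;    * habitat is the fixed effect;
  random site(habitat);      * sites are nested within habitats;
run;
```

With METHOD=TYPE3 the output includes an analysis of variance table with expected mean squares and the error term used for each F test. The residual corresponds to variation among individual transplants within a site.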
1. “This is an Observational Study”. Argue in favor or against this statement.
2. Analyze the following statement. What do you consider the reason(s) that we use a nested
ANOVA to analyze the data? Were the individual transplants independent?
1. Write down null hypotheses to be tested.
2. Write down the analysis of variance table with Sources of Variation, corresponding
degrees of freedom, and Expected Mean Squares column.
3. Initially there were thirty (30) individual transplants within each site, but some were discarded as
the experiment progressed. Compare the degrees of freedom for the ANOVA table in question 2 with
Table 3, below. Note that a separate Error term is specified for testing the Habitat effect, as a result
of missing observations. What should be the degrees of freedom of this F test denominator if
there were no missing observations?
4. Use Table 3 to write down conclusions. Refer to the hypotheses being tested. Do we have an
estimate of the variation among sites? What about the variation among individual transplants
subject to similar conditions (within the same site and habitat)?
5. Read the following paragraph. Would you consider any changes necessary?
6. To assess the quality of the data, consider the following two paragraphs. They explain how the
discarding and retention of transplants was carried out and may give information
about any bias in the data. Comment on the quality of the data.
Surviving Transplants II
Surviving Transplants I
Missing observations were completely missing or they
did not qualify as “surviving transplants”