ST512 Exercises Final Exam SSII 2010

I. I use multiple regression to fit the model

Y = beta_0 + beta_1 X1 + beta_2 X2 + beta_3 X3 + e

and here is part of the PROC REG output, including the Type I (sequential) sums of squares.

Analysis of Variance

Source           DF    Sum of Squares  Mean Square  F Value  Pr > F
Model            (3)   23.14600        7.71533      9.87     0.0098
Error            (6)   4.69000         0.78167
Corrected Total  (9)   27.83600

Parameter Estimates

Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|  Type I SS
Intercept  1   14.20813            3.03548         4.68     0.0034    806.40400
x1         1   -0.39313            (0.3954)        *****    ******    (19.10412)
x2         1   -0.62438            0.29491         -2.12    0.0786    2.57188
x3         1   0.43750             0.31903         1.37     0.2193    1.47000

The X'X matrix and its inverse are

X'X =
   10    25    75    10
   25   145   100    65
   75   100   665    35
   10    65    35    37.68

inverse of X'X =
   11.79   -1.41   -1.14    0.36
   -1.41    0.20    0.13   -0.10
   -1.14    0.13    0.11   -0.03
    0.36  (-0.10)  -0.03    0.13

(A) Fill in the 6 blanks in the display above (the answers appear in parentheses). Do not use the inverse of X'X in calculating elements of X'X - it will not be accurate enough.

DF: the (1,1) element of X'X gives n = 10, so Corrected Total DF = 9, Model DF = 3 and Error DF = 9 - 3 = 6.
s.e.(x1) = sqrt(0.20*0.78167) = 0.3954
Type I SS (x1) = Model SS - Type I SS (x2) - Type I SS (x3) = 23.14600 - 2.57188 - 1.47000 = 19.10412
The missing element of the inverse is -0.10, by symmetry.

(B) Calculate F tests for these hypotheses. In each case H1 is just "not H0."

H0: beta_2 = 0.
F = (-2.12)^2 = 4.4944

H0: beta_2 = beta_3 = 0.
F = [(2.57188 + 1.47000)/2] / 0.78167 = 2.585

H0: beta_1 + beta_2 + beta_3 = 0, beta_2 - beta_3 = 0, and beta_1 - beta_2 = 0.
This set of three null hypotheses is equivalent to H0: beta_1 = beta_2 = beta_3 = 0, so F = 9.87 (the Model F).

(C) Compute the standard error of b1 - 2*b2 where b1 and b2 are the estimates of beta_1 and beta_2.

With A = [0 1 -2 0] and V(b) = MSE (X'X)^{-1},

$$\operatorname{var}(b_1 - 2b_2) = A\,V(b)\,A' = \text{MSE}\cdot A\,(X'X)^{-1}A'$$

$$A\,(X'X)^{-1} = \begin{bmatrix} -1.41 - 2(-1.14) & 0.20 - 2(0.13) & 0.13 - 2(0.11) & -0.10 - 2(-0.03) \end{bmatrix} = \begin{bmatrix} 0.87 & -0.06 & -0.09 & -0.04 \end{bmatrix}$$

$$A\,(X'X)^{-1}A' = (1)(-0.06) + (-2)(-0.09) = 0.12$$

$$\operatorname{se}(b_1 - 2b_2) = \sqrt{0.78167 \times 0.12} = \sqrt{0.0938} = 0.3063$$

The MSE factor is required here for the same reason it appears in s.e.(x1) above.

II. I used 4 drugs, each on three patients.
In addition to the response variable Y, I have these dummy variables X1, X2, and X3 in my dataset. Note that there is no dummy variable for drug C.

drug  X1  X2  X3
A     1   0   0
A     1   0   0
A     1   0   0
B     0   1   0
B     0   1   0
B     0   1   0
C     0   0   0
C     0   0   0
C     0   0   0
D     0   0   1
D     0   0   1
D     0   0   1

X1: drug A; X2: drug B; X3: drug D

I ran PROC REG; MODEL Y = X1 X2 X3; getting this partial output:

Parameter Estimates

Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept  1   11.00000            1.47196         7.47     <.0001
X1         1   9.00000             2.08167         4.32     0.0025
X2         1   6.00000             2.08167         2.88     0.0204
X3         1   8.00000             2.08167         3.84     0.0049

(A) Give the sample average response for drug A: 20.00 and for drug C: 11.00

Since this is a regression on dummy variables with the coefficient for drug C set to zero, the intercept is $\bar{y}_{C\cdot}$ and the parameter estimates for X1, X2, and X3 are the differences $\bar{y}_{i\cdot} - \bar{y}_{C\cdot}$.

sample average for A = 11.00 + 9.00 = 20.00
sample average for C = 11.00

(B) Give a t test for testing the null hypothesis that drugs B and D have the same mean response. t = -0.9608

$H_0: \mu_B = \mu_D$, or equivalently $H_0: \mu_B - \mu_D = 0$; $H_1: \mu_B \neq \mu_D$, or equivalently $H_1: \mu_B - \mu_D \neq 0$.

The standard error 2.08167 reported for each dummy variable is the standard error of a difference between two drug means, so

$$\operatorname{var}(\bar{y}_{i\cdot} - \bar{y}_{j\cdot}) = 2\operatorname{var}(\bar{y}_{i\cdot}) = \frac{2\,\text{MSE}}{r} = 2.08167^2$$

$$t = \frac{\bar{y}_{B\cdot} - \bar{y}_{D\cdot} - 0}{\sqrt{2.08167^2}} = \frac{(11 + 6) - (11 + 8)}{2.08167} = \frac{-2}{2.08167} = -0.960767$$

(C) Find, if possible, the error mean square for this regression. MSE = 6.50

$$2.08167^2 = \operatorname{var}(\bar{y}_{i\cdot} - \bar{y}_{C\cdot}) = \frac{2\,\text{MSE}}{r}, \quad r = 3 \quad\Rightarrow\quad \text{MSE} = \frac{3 \times 2.08167^2}{2} = 6.50$$

(D) I create a t statistic whose numerator is the mean response for drug A minus the average of the B and C means. Complete this formula for the standard error of this linear combination of means (i.e. the denominator of t):

std error = sqrt[(1/2) MSE]

where, as always, MSE is the error mean square.

Contrast: Mean_A - average(Mean_B and Mean_C)

$$\hat{C}_1 = 1\cdot\bar{y}_{A\cdot} - \tfrac{1}{2}\bar{y}_{B\cdot} - \tfrac{1}{2}\bar{y}_{C\cdot} + 0\cdot\bar{y}_{D\cdot}$$

$$\operatorname{var}(\hat{C}_1) = \frac{\text{MSE}}{r}\left[1^2 + \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2 + 0^2\right] = \frac{\text{MSE}}{3}\cdot\frac{6}{4} = \frac{\text{MSE}}{2}$$

III.
An experiment is done to investigate the effects of factor S = salinity and factor T = water temperature on growth of a certain type of underwater plant. Three equally spaced levels of each factor are investigated in a factorial treatment arrangement. Each replicate of the experiment uses 9 containers, one for each treatment combination. The 9 treatments are assigned at random to the containers, each of which has been stocked with young plants. After 1 month the plants are harvested and growth measured. The equipment is cleaned out, new plants inserted and the whole process is repeated the next month and the month after for a total of 3 replicates (REP).

Note: the replicates are separated in time, so the design is a Randomized Complete Block Design.

1. How many observations do we have in this experiment? 3*3*3 = 27

2. Compute the error degrees of freedom resulting from each of these analyses (SAS code is given for clarity - note the CLASS statements).

Note: the CLASS statement identifies a variable as an ANOVA factor, for which 0/1 dummy variables are created. A variable not listed in CLASS enters as a single regression term with 1 df.

a. Factorial ANOVA with interaction, no blocks.
PROC GLM; CLASS S T; MODEL YIELD = S T S*T;
error df = 3*3*3 - 1 - 2 - 2 - 4 = 18

b. Factorial ANOVA with blocks, no interaction.
PROC GLM; CLASS REP S T; MODEL YIELD = REP S T;
error df = 3*3*3 - 1 - 2 - 2 - 2 = 20

c. Linear regression in S and T with blocks, no interaction.
PROC GLM; CLASS REP; MODEL YIELD = REP S T;
error df = 3*3*3 - 1 - 2 - 1 - 1 = 22

d. Linear regression in S and T with (linear by linear) interaction, no blocks.
PROC GLM; MODEL YIELD = S T S*T;
error df = 3*3*3 - 1 - 1 - 1 - 1 = 23

e. Full quadratic surface in S and T, no blocks.
PROC GLM; MODEL YIELD = S S*S T T*T T*S;
error df = 3*3*3 - 1 - 1 - 1 - 1 - 1 - 1 = 21

3. Some of the models in question 2 include block effects (REP) and some don't. Based on the description of the experiment, which is appropriate? (Include REP; the experiment was replicated in blocks of time.)

4. Here are some treatment totals.
You may want to recall that the orthogonal polynomial coefficients for a factor at 3 equally spaced levels are

Linear:    -1  0  1
Quadratic: -1  2 -1

Treatment totals:

              T
          1     2     3   total
S   1    50    44    46    140
    2    45    35    40    120
    3    35    21    24     80
total   130   100   110    340

Compute the sum of squares for linear effect of T within level 1 of S: 2.6667

Q = (-1)*50 + 0*44 + 1*46 = -4
SS(Q) = Q^2/(r*SUM_coef_squared) = (-4)^2 / (3*((-1)^2 + 0^2 + 1^2)) = 16/6 = 2.6667

Compute the sum of squares for the T linear by S linear interaction: 4.0833

Each of the above sums of squares is associated with a contrast in the 9 totals in our table. Are these 2 contrasts orthogonal? (yes, no): no

Cell   Total  Tlin in S1  Slinear  Tlinear  SlinxTlin
S1T1   50     -1          -1       -1        1
S1T2   44      0          -1        0        0
S1T3   46      1          -1        1       -1
S2T1   45      0           0       -1        0
S2T2   35      0           0        0        0
S2T3   40      0           0        1        0
S3T1   35      0           1       -1       -1
S3T2   21      0           1        0        0
S3T3   24      0           1        1        1

Q(Slin*Tlin) = (1)*50 + 0*44 + (-1)*46 + 0*45 + 0*35 + 0*40 + (-1)*35 + 0*21 + 1*24 = -7
SS(Q(Slin*Tlin)) = Q^2/(r*SUM_coef_squared) = (-7)^2 / (3*(1^2 + 0^2 + (-1)^2 + 0^2 + 0^2 + 0^2 + (-1)^2 + 0^2 + 1^2)) = 49/(3*4) = 4.0833

"Tlinear within S1" and "Tlinear" are not orthogonal, since the sum of the products of their coefficients is not 0:
(-1)*(-1) + (0)*(0) + (1)*(1) = 2

"Tlinear within S1" and "Tlinear x Slinear" are not orthogonal, since the sum of the products of their coefficients is not 0:
(-1)*(1) + (0)*(0) + (1)*(-1) = -2

IV. A multiple regression equation

Yt = Beta0 + Beta1 X1t + Beta2 X2t + Beta3 X3t + et

is fit to some data. We obtain:

             Sum of Squares
        df   Type I   Type II
X1      1    180      40
X2      1    90       100
X3      1    50       50
Error   20   300

(X'X)^-1 =
   .03  .05  .04  .01
   .05  .30  .20  .12
   .04  .20  .48  .10
   .01  .12  .10  .01

Give, if possible, the computed F statistic for testing:

(a) H0: Beta2 = Beta3 = 0
X2 and X3 are the last two terms entered, so their Type I sums of squares add up to the reduction due to X2 and X3 after X1.
F = [(90+50)/2]/(300/20) = 4.6667

(b) H0: Beta2 = 0
The Type II sum of squares for X2 is adjusted for both X1 and X3.
F = [(100)/1]/(300/20) = 6.6667

V.
A factorial experiment has quantitative factors A at 3 equally spaced levels and B at 4 equally spaced levels. The 10 replications are in blocks. Here are the totals for A and B, each being a total of 10 original observations:

        b0    b1    b2    b3
a0     530   700   720   850
a1     410   500   620   700
a2     400   470   500   650

Linear Orthogonal Polynomials
4 levels: -3 -1  1  3
3 levels: -1  0  1

BLOCK SS = 5000
Total SS = 28000

Compute the sums of squares for AL = A linear, BL = B linear, and AL x BL.

Cell   Total   AL    BL    ALxBL
a0b0   530     -1    -3     3
a0b1   700     -1    -1     1
a0b2   720     -1     1    -1
a0b3   850     -1     3    -3
a1b0   410      0    -3     0
a1b1   500      0    -1     0
a1b2   620      0     1     0
a1b3   700      0     3     0
a2b0   400      1    -3    -3
a2b1   470      1    -1    -1
a2b2   500      1     1     1
a2b3   650      1     3     3

Q              -780        2750          -200
div            10*8 = 80   10*60 = 600   10*40 = 400
SS             7605        12604.17      100

Here r = 10 and div = r * (sum of squared coefficients over the 12 cells). The B linear coefficients have 9 + 1 + 1 + 9 = 20 per level of A, so the sum is 3*20 = 60 for BL and 2*20 = 40 for ALxBL.

Contrasts Q

QAL = (-1)*530 + (-1)*700 + (-1)*720 + (-1)*850 + (0)*410 + (0)*500 + (0)*620 + (0)*700 + (1)*400 + (1)*470 + (1)*500 + (1)*650 = -780

QBL = (-3)*530 + (-1)*700 + (1)*720 + (3)*850 + (-3)*410 + (-1)*500 + (1)*620 + (3)*700 + (-3)*400 + (-1)*470 + (1)*500 + (3)*650 = 2750

QALxBL = (3)*530 + (1)*700 + (-1)*720 + (-3)*850 + (0)*410 + (0)*500 + (0)*620 + (0)*700 + (-3)*400 + (-1)*470 + (1)*500 + (3)*650 = -200

Sum of Squares

SS(AL) = (QAL^2)/div = (-780)^2/(10*8) = 7605
SS(BL) = (QBL^2)/div = (2750)^2/(10*60) = 12604.17
SS(ALxBL) = (QALxBL^2)/div = (-200)^2/(10*40) = 100

Treatment SS = (530^2 + 700^2 + 720^2 + 850^2 + 410^2 + 500^2 + 620^2 + 700^2 + 400^2 + 470^2 + 500^2 + 650^2)/10 - (530 + 700 + 720 + 850 + 410 + 500 + 620 + 700 + 400 + 470 + 500 + 650)^2/(10*3*4) = 435770 - 414187.5 = 21582.5

Or, using the means:

means   a0      a1      a2      mean
b0      53      41      40      44.67
b1      70      50      47      55.67
b2      72      62      50      61.33
b3      85      70      65      73.33
mean    70.00   55.75   50.50   58.75

Treatment SS = 10*( (53-58.75)^2 + (70-58.75)^2 + (72-58.75)^2 + (85-58.75)^2 + (41-58.75)^2 + (50-58.75)^2 + (62-58.75)^2 + (70-58.75)^2 + (40-58.75)^2 + (47-58.75)^2 + (50-58.75)^2 + (65-58.75)^2 ) = 21582.5

Error SS = Total SS - Block SS - Treatment SS = 28000 - 5000 - 21582.5 = 1417.5

Source           DF                            SS         MS         F
BLOCK            10-1 = 9                      5000
TREATMENTS       3*4-1 = 11                    21582.5    1962.045   137.0317
A                3-1 = 2
  AL (Alinear)   1                             7605       7605
B                4-1 = 3
  BL (Blinear)   1                             12604.17   12604.17
A*B              (3-1)*(4-1) = 6
  ALxBL          1                             100        100
ERROR            (10-1)*(3*4-1) = 9*11 = 99    1417.5     14.31818
TOTAL            10*3*4-1 = 119                28000

Treatment MS: 21582.5/11 = 1962.045
Error MS: 1417.5/99 = 14.31818
Treatment F: 1962.045/14.31818 = 137.0317

The test statistic which tests the null hypothesis that nothing other than the above effects is needed to describe the effects of A and B:

F = hypothesis MS / Error MS = 159.1667/14.31818 = 11.12

We want to test whether the variation due to treatments (main effect of A with 2 df, main effect of B with 3 df, and interaction effects with 6 df) is solely due to the linear effect of A (1 df), the linear effect of B (1 df), and the A_linear by B_linear effect (1 df), i.e., whether the remaining effects are jointly all equal to 0.

Full model with 11 df: Full Model SS = 21582.5
Reduced model with 3 df (AL, BL, ALxBL): Reduced Model SS = 7605 + 12604.17 + 100 = 20309.17
Full Model SS - Reduced Model SS = 21582.5 - 20309.17 = 1273.33 with 11-3 = 8 df
Hypothesis MS = 1273.33/8 = 159.1667
Hypothesis F = hypothesis MS / Error MS = 159.1667/14.31818 = 11.12

How many numerator (8) and denominator (99) degrees of freedom for F?

VI. Eight trees are selected at random and from each tree 12 identical boards are cut for a total of 96 boards. Breaking strengths of the boards are measured. From each tree, three randomly selected boards are broken while dry at temperature 40 degrees F, three are broken wet at 40 degrees F, three dry at 90 degrees F and three wet at 90 degrees F. Let factor A be temperature, B be tree, and C be wetness. Label as random or fixed.
Factor A is (random, fixed): Fixed
Factor B is (random, fixed): Random
Factor C is (random, fixed): Fixed

NOTE: Wetness is not selected at random from a normal population. In a factorial experiment, the experimenter would try to hit the same wetness (saturation level) each time, but might not be successful. Still, half the numbers would be close to 0 and half close to, say, 60% saturation, so we would expect effects that are bimodal, not normal looking. Also, wetness (like drug dosage, fertilizer level, etc.) would likely be analyzed as a "regression" type variable, that is, extension to other levels of wetness would be done by running a (linear) regression rather than by describing the variance of some normal population. These "regression" factors are considered fixed.

Who would be interested in these results? Why was the experiment done? Someone interested in breaking strength of wood in construction applications might realize that items he constructs might be in warm or cold, wet or dry environments and hence would be interested in these (fixed) effects. Wood will come from various trees and tree-to-tree variation is likely of interest, but not the comparison of one tree to another, since these were randomly selected (likely all of one variety - e.g. knotty pine). Anyone interested in this experiment will not want results restricted to these trees, and extrapolation by regression (on tree number ??) would obviously not be the correct way to extrapolate (in contrast to the wetness factor). Trees are thus random.

VII. Trucks deliver loads of corn to a breakfast cereal company. Several trucks were selected at random and from each, six sample jars of corn (from six randomly selected locations in the truck bed) were taken. Three of these (randomly selected) were refrigerated and the other three left at room temperature. Call this factor "storage". After a period of time, an aflatoxin measurement Y was made on each jar.
The company wants to compare these two storage methods in terms of the effect on aflatoxin.

How many trucks _____ were selected? One (there is a total of 6 jars stored)

There is only one fixed effect here. Which is it? STORAGE

VIII. A 2x3 factorial set of treatments, namely 2 wheat varieties (A, B) and 3 levels of fertilizer (1, 2, or 3 fertilizer applications), was set up for research. The experiment was done by selecting 5 farms at random in North Carolina, laying out 6 plots per farm and assigning the 6 treatment combinations to the six plots randomly within each farm.

a. Present the layout of the experiment.

Design is a Randomized Complete Block Design.

Factors
- Blocking factor: Farm, recognizing differences (soil, management, fertility, ...) between farms across North Carolina. Random
- Treatment factors
  o Varieties: 2 (A, B). Fixed
  o Fertilizer applications: 3 (1, 2, 3). Fixed
- Plots (experimental units) nested within each farm. Random
- Number of treatment combinations: 2 x 3 = 6
- Number of blocks (repetitions): 5
- Total number of observations: 2 x 3 x 5 = 30

In any given farm, the six treatment combinations are laid out in random order, for example:

A3  B3  A1
B1  B2  A2

b. ANOVA table: SOURCES and DF. Indicate whether factors are random or fixed.

Linear Model

$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + d_k + e_{ijk}$$

- Variety is a Fixed effect: $\sum_{i=1}^{2} \alpha_i = 0$
- Fertilizer is a Fixed effect: $\sum_{j=1}^{3} \beta_j = 0$
- Variety*Fertilizer is a Fixed effect: $\sum_{i=1}^{2}\sum_{j=1}^{3} (\alpha\beta)_{ij} = 0$
- Farm is a Random effect, all six treatments present in each farm
- Plot is a Random effect, nested on farm and treatment combination
- $e_{ijk}$ and $d_k$ are independent random effects, with $d_k \sim N(0, \sigma^2_{farm})$ and $e_{ijk} \sim N(0, \sigma^2)$

ANOVA Table

Source            DF                          E(MS)
BLOCK (FARMS)     5-1 = 4                     $\sigma^2 + 2\cdot 3\,\sigma^2_{farm}$
TREATMENTS        2*3-1 = 5
  Variety (V)     2-1 = 1                     $\sigma^2 + 5\cdot 3 \sum_i \alpha_i^2 / (2-1)$
  Fertilizer (F)  3-1 = 2                     $\sigma^2 + 5\cdot 2 \sum_j \beta_j^2 / (3-1)$
  V*F             (2-1)*(3-1) = 2             $\sigma^2 + 5 \sum_{i,j} (\alpha\beta)_{ij}^2 / [(2-1)(3-1)]$
ERROR             (5-1)*(2*3-1) = 4*5 = 20    $\sigma^2$
TOTAL             5*2*3-1 = 29
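The degrees-of-freedom bookkeeping in the ANOVA table for problem VIII can be checked with a short script. This is only a sketch: the function name `rcbd_factorial_df` and its argument names are hypothetical, chosen for illustration, and it assumes a two-factor factorial laid out as a randomized complete block design with one observation per treatment per block.

```python
def rcbd_factorial_df(blocks, a_levels, b_levels):
    """Degrees of freedom for an a x b factorial in a randomized
    complete block design (hypothetical helper, for illustration)."""
    treatments = a_levels * b_levels
    df = {
        "block": blocks - 1,
        "treatments": treatments - 1,
        "A": a_levels - 1,
        "B": b_levels - 1,
        "A*B": (a_levels - 1) * (b_levels - 1),
        "error": (blocks - 1) * (treatments - 1),
        "total": blocks * treatments - 1,
    }
    # The treatment df split into main effects and interaction.
    assert df["A"] + df["B"] + df["A*B"] == df["treatments"]
    # Block, treatment and error df account for the total df.
    assert df["block"] + df["treatments"] + df["error"] == df["total"]
    return df


# Problem VIII: 5 farms (blocks), 2 varieties, 3 fertilizer levels.
for source, value in rcbd_factorial_df(blocks=5, a_levels=2, b_levels=3).items():
    print(f"{source:<12}{value}")
```

Calling the same function with `blocks=10, a_levels=3, b_levels=4` gives the 99 error df and 119 total df used in problem V.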