
NCSU ST512
TAKE HOME QUIZ
SUM 2 2011
1. A friend has asked you for help in fitting a line for a class project. He has collected runners' world record times at ten distances for men and women. He needs to fit a single linear trend over all data points to show the relationship between time and distance, ignoring the gender of the runner. The world records at ten distances (outdoor running) are listed for men and women in the table below. The records were taken from the website of the International Association of Athletics Federations (IAAF), http://www.iaaf.org, on July 23, 2011.
The marathon is a long-distance running event with an official distance of 42.195 kilometers (26 miles and 385 yards).
Distance (m)   Male time (sec)   Male date    Female time (sec)   Female date
100                  9.77        14/06/2005         10.49         16/07/1988
200                 19.32        01/08/1996         21.34         29/09/1988
400                 43.18        26/08/1999         47.60         06/10/1985
800                101.11        24/08/1997        113.28         26/07/1983
1500               206.00        14/07/1998        230.46         11/09/1993
3000               440.67        01/09/1996        486.11         13/09/1993
5000               757.35        31/05/2004        864.53         03/06/2006
10000             1577.53        26/08/2005       1771.78         08/09/1993
21097.5           3535.00        15/01/2006       4004.00         15/01/1999
42195             7495.00        28/09/2003       8125.00         13/04/2003
(Dates are dd/mm/yyyy.)
As requested, you run a simple linear regression on this dataset and present the results to your friend.
a)
Write the estimated simple linear regression equation and test whether the slope (regression coefficient) is significantly different from 0.
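A minimal sketch of part a) in plain Python (no statistics library assumed), pooling the 20 records and ignoring gender as the problem asks. The helper name `simple_linear_regression` is hypothetical, and the critical value 2.101 is the tabled $t_{0.975,18}$ for a two-sided 5% test with $n - 2 = 18$ degrees of freedom.

```python
import math

# Record times transcribed from the table above (distance in m, time in sec).
DIST = [100, 200, 400, 800, 1500, 3000, 5000, 10000, 21097.5, 42195]
MALE = [9.77, 19.32, 43.18, 101.11, 206.00, 440.67, 757.35, 1577.53, 3535.00, 7495.00]
FEMALE = [10.49, 21.34, 47.60, 113.28, 230.46, 486.11, 864.53, 1771.78, 4004.00, 8125.00]

# Ignore gender: pool both series into 20 (distance, time) pairs.
x = DIST + DIST
y = MALE + FEMALE


def simple_linear_regression(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns estimates and slope SE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(r ** 2 for r in residuals) / (n - 2)  # error variance, n - 2 df
    se_b1 = math.sqrt(mse / sxx)                     # standard error of the slope
    return b0, b1, se_b1, residuals


b0, b1, se_b1, residuals = simple_linear_regression(x, y)
t_b1 = b1 / se_b1
T_CRIT = 2.101  # t_{0.975, 18}: two-sided 5% test with n - 2 = 18 df

print(f"time = {b0:.3f} + {b1:.5f} * distance")
print(f"t = {t_b1:.1f}; reject H0: beta1 = 0 at the 5% level: {abs(t_b1) > T_CRIT}")
```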
b)
As learned in class, you run a lack-of-fit test on these data to check that the linear fit is adequate. Test the hypothesis that a higher-degree polynomial may be needed.
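A sketch of the classical lack-of-fit test for part b), assuming the version that treats the replicated observations (one male and one female record at each of the 10 distances) as pure error, so that $SSE = SS_{PE} + SS_{LOF}$ with $n - c = 10$ pure-error and $c - 2 = 8$ lack-of-fit degrees of freedom. The function name is hypothetical, 3.07 is the tabled $F_{0.95;8,10}$, and the code reports the statistic rather than assuming a conclusion.

```python
# Records from the table above, pooled over gender (two records per distance).
DIST = [100, 200, 400, 800, 1500, 3000, 5000, 10000, 21097.5, 42195]
MALE = [9.77, 19.32, 43.18, 101.11, 206.00, 440.67, 757.35, 1577.53, 3535.00, 7495.00]
FEMALE = [10.49, 21.34, 47.60, 113.28, 230.46, 486.11, 864.53, 1771.78, 4004.00, 8125.00]
x = DIST + DIST
y = MALE + FEMALE


def lack_of_fit_test(x, y):
    """Lack-of-fit F test for a straight line, using replicates at each x as pure error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

    # Pure error: spread of the replicates around their own mean at each distinct x.
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    ss_pe = 0.0
    for ys in groups.values():
        m = sum(ys) / len(ys)
        ss_pe += sum((yi - m) ** 2 for yi in ys)

    c = len(groups)               # number of distinct distances
    df_lof, df_pe = c - 2, n - c  # lack-of-fit and pure-error degrees of freedom
    ss_lof = sse - ss_pe          # SSE = SS_PE + SS_LOF
    return (ss_lof / df_lof) / (ss_pe / df_pe), df_lof, df_pe


f_stat, df_lof, df_pe = lack_of_fit_test(x, y)
F_CRIT = 3.07  # tabled F_{0.95; 8, 10}
print(f"F = {f_stat:.2f} on ({df_lof}, {df_pe}) df; lack of fit at 5%: {f_stat > F_CRIT}")
```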
Based on the lack-of-fit test, you decided that a linear trend is adequate and prepared a plot of the observed records and the fitted linear trend.
c)
Include the plot of predicted values against distance. Since a look at the residuals is still a good idea, also present a plot of residuals against predicted values and discuss the linear regression fit. Does the plot of predicted values against distance fit the data well? Does the residual plot show whether the linear fit was adequate?
July 23, 2011
d)
After looking at the residuals, you decided to try a power function for this relationship. This power function becomes a linear function after a log10 transformation of both the x- and y-variables, as shown next:
$$\text{time} = M \cdot \text{distance}^{\,b_1}$$
$$\log_{10}(\text{time}) = \log_{10}(M) + b_1 \log_{10}(\text{distance})$$
$$y = b_0 + b_1 x$$
where
$$b_0 = \log_{10}(M), \qquad y = \log_{10}(\text{time}), \qquad x = \log_{10}(\text{distance})$$
e)
Write the power function equation estimated for these data. What are the estimated values of $M$ and $b_1$?
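A sketch for part e): fit the log10-log10 model from part d) by least squares on the pooled records and back-transform the intercept with $M = 10^{b_0}$. The helper name `ols` is illustrative only.

```python
import math

DIST = [100, 200, 400, 800, 1500, 3000, 5000, 10000, 21097.5, 42195]
MALE = [9.77, 19.32, 43.18, 101.11, 206.00, 440.67, 757.35, 1577.53, 3535.00, 7495.00]
FEMALE = [10.49, 21.34, 47.60, 113.28, 230.46, 486.11, 864.53, 1771.78, 4004.00, 8125.00]

# Power model: log10(time) = b0 + b1*log10(distance), pooling men and women.
x = [math.log10(d) for d in DIST + DIST]
y = [math.log10(t) for t in MALE + FEMALE]


def ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


b0, b1 = ols(x, y)
M = 10 ** b0  # back-transform the intercept: b0 = log10(M)
print(f"log10(time) = {b0:.4f} + {b1:.4f} * log10(distance)")
print(f"time = {M:.4f} * distance ** {b1:.4f}")
```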
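f)
Compute the predicted mean time for a distance of 100 meters, $\widehat{\text{time}}_{100}$. Repeat the computation for a distance of 1000 meters, $\widehat{\text{time}}_{1000}$. Note that
$$\widehat{\text{time}}(x_{new}) = 10^{\hat{y}}, \qquad \text{where } \hat{y} = b_0 + b_1 x_{new} \text{ and } x_{new} = \log_{10}(\text{distance}_{new}).$$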
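A sketch for part f), repeating the log-log fit so the snippet runs on its own and back-transforming with $\widehat{\text{time}} = 10^{\hat{y}}$. The function `predicted_time` is a hypothetical helper.

```python
import math

DIST = [100, 200, 400, 800, 1500, 3000, 5000, 10000, 21097.5, 42195]
MALE = [9.77, 19.32, 43.18, 101.11, 206.00, 440.67, 757.35, 1577.53, 3535.00, 7495.00]
FEMALE = [10.49, 21.34, 47.60, 113.28, 230.46, 486.11, 864.53, 1771.78, 4004.00, 8125.00]

# Pooled log10-log10 least-squares fit.
x = [math.log10(d) for d in DIST + DIST]
y = [math.log10(t) for t in MALE + FEMALE]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar


def predicted_time(distance_m):
    """Back-transform the log-scale prediction: 10 ** (b0 + b1 * log10(distance))."""
    x_new = math.log10(distance_m)
    return 10 ** (b0 + b1 * x_new)


time_100 = predicted_time(100)
time_1000 = predicted_time(1000)
print(f"predicted time at 100 m:  {time_100:.2f} s")
print(f"predicted time at 1000 m: {time_1000:.2f} s")
```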
g)
Find the ratio $\widehat{\text{time}}_{1000} / \widehat{\text{time}}_{100}$. Interpret $b_1$.
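Using only the model from part d), with $x = \log_{10}(\text{distance})$, the ratio can be worked out symbolically; a sketch of the algebra:

$$\frac{\widehat{\text{time}}_{1000}}{\widehat{\text{time}}_{100}} = \frac{10^{\,b_0 + b_1 \log_{10} 1000}}{10^{\,b_0 + b_1 \log_{10} 100}} = 10^{\,b_1 (3 - 2)} = 10^{\,b_1}$$

So multiplying the distance by 10 multiplies the predicted time by $10^{b_1}$, whatever the starting distance; more generally, multiplying the distance by a factor $k$ multiplies the predicted time by $k^{b_1}$. A value of $b_1$ above 1 would mean that time grows faster than in proportion to distance, that is, average speed drops over longer races.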
h)
Estimate the mean record time for a distance of 25 km. Calculate the 95% confidence interval for this predicted mean.
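A sketch for part h), assuming the usual confidence interval for a mean response in simple linear regression, computed on the log10 scale and then back-transformed: $\hat{y}_0 \pm t_{0.975,18}\sqrt{MSE\,(1/n + (x_0 - \bar{x})^2/S_{xx})}$ with $x_0 = \log_{10}(25000)$. Strictly speaking, under normal errors on the log scale the back-transformed limits describe the median (geometric mean) record time rather than the arithmetic mean.

```python
import math

DIST = [100, 200, 400, 800, 1500, 3000, 5000, 10000, 21097.5, 42195]
MALE = [9.77, 19.32, 43.18, 101.11, 206.00, 440.67, 757.35, 1577.53, 3535.00, 7495.00]
FEMALE = [10.49, 21.34, 47.60, 113.28, 230.46, 486.11, 864.53, 1771.78, 4004.00, 8125.00]

# Pooled log10-log10 least-squares fit.
x = [math.log10(d) for d in DIST + DIST]
y = [math.log10(t) for t in MALE + FEMALE]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)

x0 = math.log10(25_000)   # 25 km in meters, on the log10 scale
y0_hat = b0 + b1 * x0     # estimated mean of log10(time)
se_mean = math.sqrt(mse * (1 / n + (x0 - x_bar) ** 2 / sxx))
T_CRIT = 2.101            # t_{0.975, 18}
lower = y0_hat - T_CRIT * se_mean
upper = y0_hat + T_CRIT * se_mean

# Back-transform the estimate and both limits to seconds.
estimate = 10 ** y0_hat
ci = (10 ** lower, 10 ** upper)
print(f"estimate: {estimate:.0f} s, 95% CI: ({ci[0]:.0f}, {ci[1]:.0f}) s")
```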
i)
A plot of residuals against the predicted values is presented below. Discuss whether separate fits are needed for males and females.
[Figure: Residual plot. Residual (vertical axis, -0.06 to 0.06) against Predicted Value of LOG_TIME (horizontal axis, 1 to 4), with points marked by gender (F, M).]
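A sketch for part i) that reproduces the idea behind the residual plot numerically: fit the pooled log-log line, then split the residuals by gender. Because each women's record is slower than the men's record at the same distance, the check looks for a systematic offset between the two residual groups; the variable names are illustrative.

```python
import math

DIST = [100, 200, 400, 800, 1500, 3000, 5000, 10000, 21097.5, 42195]
MALE = [9.77, 19.32, 43.18, 101.11, 206.00, 440.67, 757.35, 1577.53, 3535.00, 7495.00]
FEMALE = [10.49, 21.34, 47.60, 113.28, 230.46, 486.11, 864.53, 1771.78, 4004.00, 8125.00]

x_log = [math.log10(d) for d in DIST]
y_male = [math.log10(t) for t in MALE]
y_female = [math.log10(t) for t in FEMALE]

# Pooled fit: one line for both genders, as in parts d) to h).
x = x_log + x_log
y = y_male + y_female
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

# Residuals on the log10 scale, split by gender.
res_male = [yi - (b0 + b1 * xi) for xi, yi in zip(x_log, y_male)]
res_female = [yi - (b0 + b1 * xi) for xi, yi in zip(x_log, y_female)]
mean_male = sum(res_male) / len(res_male)
mean_female = sum(res_female) / len(res_female)
print(f"mean residual, men:   {mean_male:+.4f}")
print(f"mean residual, women: {mean_female:+.4f}")
print("women above men at every distance:", all(f > m for f, m in zip(res_female, res_male)))
```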
2. The following data present the results of a study of the effect of ambient temperature and liquid viscosity on the amount of energy (joules/sec) that honeybees spend while drinking. Temperature levels were 20 and 30°C. Levels of liquid viscosity refer to the percent of sucrose in the total solids dissolved in the liquid. There were two sucrose levels, 20% and 40%. Each of the 4 combinations of temperature and viscosity was repeated three times under controlled conditions, with bees randomly assigned to each of the four experimental groups.
The following variables were used in the analysis to simplify calculations:
$$X_1 = \frac{\text{Temperature} - 25}{5}, \qquad X_2 = \frac{\text{Sucrose} - 30}{10}, \qquad X_3 = X_1 X_2$$
Note:

Temperature   X1
20            -1
30             1

Sucrose   X2
20        -1
40         1

Temperature   Sucrose   X3
20            20         1
20            40        -1
30            20        -1
30            40         1
Data:

Obs   Temperature   Sucrose   Rep   Energy   x1   x2   x3
1          20          20       1     3.1    -1   -1    1
2          20          20       2     3.7    -1   -1    1
3          20          20       3     4.7    -1   -1    1
4          20          40       1     5.5    -1    1   -1
5          20          40       2     6.7    -1    1   -1
6          20          40       3     7.3    -1    1   -1
7          30          20       1     6.0     1   -1   -1
8          30          20       2     6.9     1   -1   -1
9          30          20       3     7.5     1   -1   -1
10         30          40       1    11.5     1    1    1
11         30          40       2    12.9     1    1    1
12         30          40       3    13.4     1    1    1
a) The following regression model was fit to study the effect of temperature, sucrose, and their interaction on the amount of energy spent:
$$y_j = \beta_0 + \beta_1 X_{1j} + \beta_2 X_{2j} + \beta_3 X_{3j} + e_j, \qquad j = 1, \ldots, 12$$
b) Test the following hypotheses:
$$H_0: \beta_1 = \beta_2 = \beta_3 = 0 \qquad \text{vs.} \qquad H_1: \text{not all } \beta_i = 0, \; i = 1, 2, 3$$
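A sketch of the overall F test for part b), in plain Python with the coded variables from the data table. It relies on the $\pm 1$ columns being balanced and mutually orthogonal, so each least-squares slope is $\sum x y / \sum x^2$ and the intercept is the overall mean; with a general design you would solve the normal equations instead. The value 4.07 is the tabled $F_{0.95;3,8}$.

```python
# Energy in observation order 1..12, with the coded variables from the data table.
Y = [3.1, 3.7, 4.7, 5.5, 6.7, 7.3, 6.0, 6.9, 7.5, 11.5, 12.9, 13.4]
X1 = [-1] * 6 + [1] * 6                # temperature: 20 -> -1, 30 -> +1
X2 = ([-1] * 3 + [1] * 3) * 2          # sucrose: 20 -> -1, 40 -> +1
X3 = [a * b for a, b in zip(X1, X2)]   # interaction column

n = len(Y)
y_bar = sum(Y) / n

# Orthogonal, zero-sum columns: each slope is sum(x*y) / sum(x*x).
b = [y_bar] + [sum(x * yi for x, yi in zip(col, Y)) / sum(x * x for x in col)
               for col in (X1, X2, X3)]
fitted = [b[0] + b[1] * x1 + b[2] * x2 + b[3] * x3 for x1, x2, x3 in zip(X1, X2, X3)]

sst = sum((yi - y_bar) ** 2 for yi in Y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(Y, fitted))
ssr = sum((fi - y_bar) ** 2 for fi in fitted)
p = 3                                  # slope parameters under test
f_stat = (ssr / p) / (sse / (n - p - 1))
F_CRIT = 4.07                          # tabled F_{0.95; 3, 8}
print(f"F = {f_stat:.2f} on ({p}, {n - p - 1}) df; reject H0: {f_stat > F_CRIT}")
```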
c) Write the estimated regression equation (replace each regression coefficient by its estimated value).
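For part c), a sketch that first checks the assumptions behind the shortcut used here (every cross-product of two coded columns is 0 and each column sums to 0, so $X'X$ is diagonal and the normal equations decouple) and then prints the estimated equation. The dictionary name `COLUMNS` is illustrative.

```python
from itertools import combinations

Y = [3.1, 3.7, 4.7, 5.5, 6.7, 7.3, 6.0, 6.9, 7.5, 11.5, 12.9, 13.4]
X1 = [-1] * 6 + [1] * 6
X2 = ([-1] * 3 + [1] * 3) * 2
X3 = [a * b for a, b in zip(X1, X2)]
COLUMNS = {"X1": X1, "X2": X2, "X3": X3}

# Check the balanced, orthogonal design that justifies the simple formulas below.
for name, col in COLUMNS.items():
    assert sum(col) == 0, f"{name} does not sum to zero"
for (na, a), (nb, c) in combinations(COLUMNS.items(), 2):
    assert sum(u * v for u, v in zip(a, c)) == 0, f"{na} and {nb} are not orthogonal"

b0 = sum(Y) / len(Y)  # intercept = overall mean
coef = {name: sum(x * yi for x, yi in zip(col, Y)) / sum(x * x for x in col)
        for name, col in COLUMNS.items()}
print(f"y_hat = {b0:.4f} + {coef['X1']:.4f} X1 + {coef['X2']:.4f} X2 + {coef['X3']:.4f} X3")
```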
d) Write the test hypotheses for each parameter, and state your conclusion.
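A sketch of the individual $t$ tests for part d), $H_0: \beta_k = 0$ vs. $H_1: \beta_k \neq 0$. Because $X'X = 12I$ for the intercept column and the three coded columns, every coefficient shares the standard error $\sqrt{MSE/12}$, with $12 - 4 = 8$ error degrees of freedom; 2.306 is the tabled $t_{0.975,8}$.

```python
import math

Y = [3.1, 3.7, 4.7, 5.5, 6.7, 7.3, 6.0, 6.9, 7.5, 11.5, 12.9, 13.4]
X1 = [-1] * 6 + [1] * 6
X2 = ([-1] * 3 + [1] * 3) * 2
X3 = [a * b for a, b in zip(X1, X2)]
n = len(Y)
cols = [[1] * n, X1, X2, X3]  # intercept column plus the three coded columns

# Each column has sum(x*x) = n and the columns are orthogonal, so X'X = n*I:
# every estimate is sum(x*y)/n and every standard error is sqrt(MSE/n).
b = [sum(x * yi for x, yi in zip(col, Y)) / n for col in cols]
fitted = [sum(bk * col[i] for bk, col in zip(b, cols)) for i in range(n)]
mse = sum((yi - fi) ** 2 for yi, fi in zip(Y, fitted)) / (n - len(cols))
se = math.sqrt(mse / n)

T_CRIT = 2.306  # t_{0.975, 8}
t_values = [bk / se for bk in b]
for k, (bk, t) in enumerate(zip(b, t_values)):
    print(f"H0: beta{k} = 0  estimate {bk:.4f}  t = {t:.2f}  reject at 5%: {abs(t) > T_CRIT}")
```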
e) How much does the energy change when the temperature increases from 20 to 40 and the viscosity (sucrose level) is 20%?
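A sketch for part e). Holding sucrose fixed, the fitted model changes by $(b_1 + b_3 X_2)\,\Delta X_1$, because $b_0$ and $b_2 X_2$ cancel. The helper names are hypothetical. Only 20 and 30°C were studied, so a change to 40°C ($X_1 = 3$) is an extrapolation that assumes the model stays linear in temperature; the 20 to 30 change is printed alongside for comparison.

```python
Y = [3.1, 3.7, 4.7, 5.5, 6.7, 7.3, 6.0, 6.9, 7.5, 11.5, 12.9, 13.4]
X1 = [-1] * 6 + [1] * 6
X2 = ([-1] * 3 + [1] * 3) * 2
n = len(Y)
b1 = sum(x * yi for x, yi in zip(X1, Y)) / n
b3 = sum(a * c * yi for a, c, yi in zip(X1, X2, Y)) / n  # X3 = X1 * X2


def code_temperature(temp_c):
    """Hypothetical helper: X1 = (Temperature - 25) / 5."""
    return (temp_c - 25) / 5


def code_sucrose(sucrose_pct):
    """Hypothetical helper: X2 = (Sucrose - 30) / 10."""
    return (sucrose_pct - 30) / 10


def predicted_change(temp_from, temp_to, sucrose_pct):
    """Change in fitted energy when only temperature moves; b0 and b2*X2 cancel."""
    dx1 = code_temperature(temp_to) - code_temperature(temp_from)
    return (b1 + b3 * code_sucrose(sucrose_pct)) * dx1


print(f"20 -> 40 C at 20% sucrose: {predicted_change(20, 40, 20):.4f} J/s")
print(f"20 -> 30 C at 20% sucrose: {predicted_change(20, 30, 20):.4f} J/s")
```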
The table of means for each experimental group is presented next.

Group means (energy, joules/sec)
                Temperature 20   Temperature 30   Mean
Sucrose 20            3.8              6.8          5.3
Sucrose 40            6.5             12.6          9.6
Mean                  5.2              9.7          7.4
f)
Show that the predicted mean for Temperature 20 when sucrose is at its average value is equal
to the observed mean for this temperature over both sucrose levels.
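At Temperature = 20, $X_1 = (20 - 25)/5 = -1$; at the average sucrose level, 30, $X_2 = (30 - 30)/10 = 0$, so $X_3 = X_1 X_2 = 0$. A sketch of the argument, using the fact that in this balanced design with $\pm 1$ coding $b_0$ is the overall mean and $b_1$ is half the difference between the two temperature means:

$$\hat{y} = b_0 + b_1(-1) + b_2(0) + b_3(0) = b_0 - b_1 = \frac{\bar{y}_{T20} + \bar{y}_{T30}}{2} - \frac{\bar{y}_{T30} - \bar{y}_{T20}}{2} = \bar{y}_{T20}$$

Numerically, from the data table, $b_0 = 89.2/12 \approx 7.433$ and $b_1 = 27.2/12 \approx 2.267$, so $\hat{y} \approx 5.167$, matching the Temperature 20 mean of $31.0/6 \approx 5.167$ (shown as 5.2 in the table of means).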
g) Show that the predicted mean for Sucrose = 40 when temperature is at its average value is
equal to the observed mean for that sucrose over both temperature levels.
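At Sucrose = 40, $X_2 = (40 - 30)/10 = 1$; at the average temperature, 25, $X_1 = 0$ and therefore $X_3 = 0$. The same balance argument gives:

$$\hat{y} = b_0 + b_1(0) + b_2(1) + b_3(0) = b_0 + b_2 = \frac{\bar{y}_{S20} + \bar{y}_{S40}}{2} + \frac{\bar{y}_{S40} - \bar{y}_{S20}}{2} = \bar{y}_{S40}$$

Numerically, $b_0 + b_2 = 89.2/12 + 25.4/12 = 114.6/12 = 9.55$, the observed Sucrose 40 mean $57.3/6 = 9.55$ (shown as 9.6 in the table of means).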
h) Show that the predicted value for Temperature = 20 and Sucrose = 30 is equal to the observed mean for the corresponding experimental group.
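A sketch for part h) that checks every experimental group at once: with four parameters and four temperature-sucrose groups, the fitted model reproduces each group mean exactly. The helper `code` and the other variable names are illustrative.

```python
Y = [3.1, 3.7, 4.7, 5.5, 6.7, 7.3, 6.0, 6.9, 7.5, 11.5, 12.9, 13.4]
TEMP = [20] * 6 + [30] * 6
SUCROSE = ([20] * 3 + [40] * 3) * 2


def code(temp_c, sucrose_pct):
    """Coded variables X1, X2, X3 as defined above."""
    x1 = (temp_c - 25) / 5
    x2 = (sucrose_pct - 30) / 10
    return x1, x2, x1 * x2


coded = [code(t, s) for t, s in zip(TEMP, SUCROSE)]
n = len(Y)
b0 = sum(Y) / n
# Orthogonal +/-1 columns: each slope is sum(x*y) / n.
b1, b2, b3 = (sum(c[k] * yi for c, yi in zip(coded, Y)) / n for k in range(3))

# Group observations by experimental group (temperature, sucrose).
cells = {}
for t, s, yi in zip(TEMP, SUCROSE, Y):
    cells.setdefault((t, s), []).append(yi)

gaps = []
for (t, s), ys in sorted(cells.items()):
    x1, x2, x3 = code(t, s)
    predicted = b0 + b1 * x1 + b2 * x2 + b3 * x3
    observed = sum(ys) / len(ys)
    gaps.append(abs(predicted - observed))
    print(f"T={t}, S={s}: predicted {predicted:.4f}, observed mean {observed:.4f}")
```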
i) Use the following graph to explain the significance of X3.
[Figure: Energy against temperature by sucrose level. Energy and mean energy (vertical axes, 0 to 14) against temperature (horizontal axis, 20 to 30), with separate series for sucrose 20 and 40.]
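A sketch of why $X_3$ shows up in the graph as non-parallel lines: holding $X_2$ fixed, the fitted model changes with $X_1$ at rate $b_1 + b_3 X_2$, and moving from 20 to 30 degrees is $\Delta X_1 = 2$, so

$$\Delta\hat{y}_{S40} - \Delta\hat{y}_{S20} = 2(b_1 + b_3) - 2(b_1 - b_3) = 4 b_3$$

With the group means, the temperature effect is $12.6 - 6.5 = 6.1$ at 40% sucrose and $6.8 - 3.833 = 2.967$ at 20% sucrose (using the unrounded mean $11.5/3$); the difference, about 3.13, equals $4 b_3 = 4(9.4/12) \approx 3.13$. If $b_3$ were 0, the two lines would be parallel.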