Lecture Notes for BUSINESS STATISTICS - BMGT 571
Chapters 7 through 13
Professor Ahmadi, Ph.D.
Department of Management
Revised May 2005

Chapter 7 Formulas
SAMPLING AND SAMPLING DISTRIBUTIONS

The number of different simple random samples of size n that can be selected from a finite population of size N:
\[ \frac{N!}{n!\,(N-n)!} \]

Expected Value of \( \bar{x} \)
Finite population: \( E(\bar{x}) = \mu \)
Infinite population: \( E(\bar{x}) = \mu \)
where: \( E(\bar{x}) \) = the expected value of the random variable \( \bar{x} \)
\( \mu \) = the population mean

Standard Deviation of the Distribution of \( \bar{x} \) Values (Standard Error of the Mean)
Finite population: \( \sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}}\,\frac{\sigma}{\sqrt{n}} \)
Infinite population: \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \)

Z Score
\[ Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \qquad \text{where } \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]

Expected Value of \( \bar{p} \)
Finite population: \( E(\bar{p}) = p \)
Infinite population: \( E(\bar{p}) = p \)
where: \( E(\bar{p}) \) = the expected value of the random variable \( \bar{p} \)
p = the population proportion

Standard Deviation of the Distribution of \( \bar{p} \) Values (Standard Error of the Proportion)
Finite population: \( \sigma_{\bar{p}} = \sqrt{\frac{N-n}{N-1}}\,\sqrt{\frac{p(1-p)}{n}} \)
Infinite population: \( \sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} \)

Z Score
\[ Z = \frac{\bar{p} - p}{\sigma_{\bar{p}}} \qquad \text{where } \sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} \]

Professor Ahmadi’s Lecture Notes

Chapter 7
SAMPLING AND SAMPLING DISTRIBUTIONS

Problem 1. Consider a population of four weights identical in appearance but weighing 2, 4, 6, and 8 grams. The mean (\( \mu \)) and the standard deviation (\( \sigma \)) of the population can be computed to be 5 and 2.236 grams respectively, where
\[ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \]
X: 2 4 6 8

Samples of size two (with replacement) are drawn from this population. Show the Sampling Distribution of \( \bar{X} \).
The list of all possible samples and the sample means can be shown as:

Possible Samples    Sample Means
2 & 2               2
2 & 4               3
2 & 6               4
2 & 8               5
4 & 2               3
4 & 4               4
4 & 6               5
4 & 8               6
6 & 2               4
6 & 4               5
6 & 6               6
6 & 8               7
8 & 2               5
8 & 4               6
8 & 6               7
8 & 8               8

The frequency of each mean can be shown as follows:

Possible Sample Means    Frequency
2                        1
3                        2
4                        3
5                        4
6                        3
7                        2
8                        1

Chapter 7
SAMPLING DISTRIBUTION OF \( \bar{X} \)

Problem 2. The average yearly starting salary (µ) of MBAs is $60,000 with a standard deviation (\( \sigma \)) of $16,000. A random sample of 64 MBAs is selected.
a. Show the sampling distribution of the sample means.
b.
What is the probability that the sample mean will be greater than $56,000?

SAMPLING DISTRIBUTION OF \( \bar{P} \)

Problem 3. Twenty percent of the students at UTC are business majors. A random sample of 100 students is selected.
a. Show the sampling distribution of the sample proportions.
b. What is the probability that the sample proportion (the proportion of business majors) is between 0.1 and 0.3?
c. What is the probability that the sample proportion (the proportion of business majors) is more than 0.25?

Chapter 8 Formulas

I. Interval Estimation of a Population Mean (\( \mu \))

A. When the standard deviation of the population is known:
\[ \bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \]
where the standard error of the mean is \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \) and the margin of error = \( Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \)

B. When the standard deviation is unknown:
\[ \bar{x} \pm t_{\alpha/2}\,\frac{S}{\sqrt{n}} \]
where the standard error of the mean is \( S_{\bar{x}} = \frac{S}{\sqrt{n}} \) and the margin of error = \( t_{\alpha/2}\,\frac{S}{\sqrt{n}} \)

Sample Size for an Interval Estimate of a Population Mean
\[ n = \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{E^{2}} \]
where E = the desired margin of error

II. Interval Estimation of a Population Proportion (P)
\[ \bar{p} \pm Z_{\alpha/2}\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \]
where the standard error of the proportion is \( S_{\bar{p}} = \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \) and the margin of error = \( Z_{\alpha/2}\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \)

Sample Size for an Interval Estimate of a Population Proportion
\[ n = \frac{Z_{\alpha/2}^{2}\,p^{*}(1-p^{*})}{E^{2}} \]
If the value of p* is not known and a good estimate of p* is not available, use p* = 0.50.

Chapter 8 – Interval Estimation

I. Interval Estimation of a Population Mean (\( \mu \))

A. The standard deviation of the population (\( \sigma \)) is known:

Problem 1. In order to estimate the average electric usage per month, a sample of 169 houses was selected, and their electric usage was determined.
a. Assume a population standard deviation of 260 kilowatt-hours. Determine the standard error of the mean.
b. With a 0.90 probability, what can be said about the size of the sampling error?
c. If the sample mean is 1834 KWH, what is the 90% confidence interval estimate of the population mean?

B.
The standard deviation of the population (\( \sigma \)) is unknown:

Problem 2. Chattanooga Paper Company makes various types of paper products. One of their products is a 30 mils thick paper. In order to ensure that the thickness of the paper meets the 30 mils specification, random cuts of paper are selected and the thickness of each cut is measured. A sample of 256 cuts had a mean thickness of 30.3 mils with a standard deviation of 4 mils.
a. Develop a 95% confidence interval for the thickness of the paper.
b. The company considers the production in control if the thickness does not deviate from the desired 30 mils by more than ±3%. Is the production in control? Explain.

Problem 3. The cost of a roll of camera film (35 mm, 24 exposure) in a sample of 12 cities worldwide is shown below.

City              Cost (in dollars)
Rio de Janeiro    12.14
Stockholm          7.47
Tokyo              6.56
Moscow             5.69
Paris              5.62
London             5.41
New York           4.33
Mexico City        4.00
Sydney             3.62
Honolulu           3.43
Cairo              3.40
Hong Kong          2.73

a. Using Excel, compute the basic descriptive statistics (the mean, the median, the mode, the standard deviation, and the standard error of the mean) for the cost of film.
b. Determine a 95% confidence interval for the population mean.

II. Interval Estimation of a Population Proportion (P)

Problem 4. Many people who bought Xbox gaming systems have complained about having received defective systems. In a sample of 1200 units sold, 18 units were defective.
a. Determine a 95% confidence interval for the percentage of defective systems.
b. If 1.5 million Xboxes were sold, determine an interval for the number of defectives.

Chapter 9 Formulas

I. Hypothesis Tests about a Population Mean (\( \mu \))

A.
The standard deviation of the population (\( \sigma \)) is known:

Test Statistic:
\[ Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \]

Decision Rule for P-Value Approach: In all cases, reject Ho if P-Value \( \le \alpha \)

Decision Rule for Critical Value Approach:
Lower one-tailed test of the form Ho: \( \mu \ge \mu_0 \), Ha: \( \mu < \mu_0 \). Reject Ho if: \( Z \le -Z_{\alpha} \)
Upper one-tailed test of the form Ho: \( \mu \le \mu_0 \), Ha: \( \mu > \mu_0 \). Reject Ho if: \( Z \ge Z_{\alpha} \)
Two-tailed test of the form Ho: \( \mu = \mu_0 \), Ha: \( \mu \ne \mu_0 \). Reject Ho if: \( Z \le -Z_{\alpha/2} \) or \( Z \ge Z_{\alpha/2} \)

B. The standard deviation of the population (\( \sigma \)) is unknown:

Test Statistic:
\[ t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \]
The decision rules are the same as those shown in Part A (above) with the t statistic substituted for the Z statistic.

CHAPTER FORMULAS (Continued)

II. Hypothesis Tests about a Population Proportion (P)

Test Statistic:
\[ Z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}} \qquad \text{where } \sigma_{\bar{p}} = \sqrt{\frac{p_0(1-p_0)}{n}} \]
thus Z will have the form:
\[ Z = \frac{\bar{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} \]

Decision Rule for P-Value Approach: In all cases, reject Ho if P-Value \( \le \alpha \)

Decision Rule for Critical Value Approach:
Lower one-tailed test of the form Ho: \( p \ge p_0 \), Ha: \( p < p_0 \). Reject Ho if: \( Z \le -Z_{\alpha} \)
Upper one-tailed test of the form Ho: \( p \le p_0 \), Ha: \( p > p_0 \). Reject Ho if: \( Z \ge Z_{\alpha} \)
Two-tailed test of the form Ho: \( p = p_0 \), Ha: \( p \ne p_0 \). Reject Ho if: \( Z \le -Z_{\alpha/2} \) or \( Z \ge Z_{\alpha/2} \)

Chapter 9
HYPOTHESIS TESTING PROCEDURE

Assume we are interested in testing whether or not the mean of the population is 70. Then the null and the alternative hypotheses can be written as:
Ho: µ = 70
Ha: µ ≠ 70

Possible hypothesis-testing errors will be:

                                      SITUATION IN THE POPULATION
DECISION                              Ho is true (µ = 70)    Ho is false (µ ≠ 70)
Do not reject Ho (Conclude µ = 70)    Correct Decision       Type II Error
Reject Ho (Conclude µ ≠ 70)           Type I Error           Correct Decision

Steps of Hypothesis Testing
Step 1: Develop the null and the alternative hypotheses.
Step 2: Specify the level of significance \( \alpha \).
Step 3: Compute the test statistic (t or Z) from the sample data.

Rejection Rule: p-Value Approach
Step 4: Compute the p-value by using the test statistic (t or Z) from step 3.
Step 5: Reject Ho if p-value \( \le \alpha \).
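Steps 3 through 5 of the p-value approach can be sketched in Python for the case where the population standard deviation is known. This is only an illustration and not part of the original notes: the function name `z_test_p_value` and its `tail` argument are assumptions made for the sketch. The sample figures are those of Chapter 9, Problem 1 (sample mean $38,000, hypothesized mean $40,000, population standard deviation $7,000, n = 49), read here as a lower-tailed test at a 0.05 level of significance.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(x_bar, mu_0, sigma, n, tail):
    """Return (Z, p-value) for a test about a mean when sigma is known.

    `tail` is "lower", "upper", or "two" (hypothetical argument values
    chosen for this sketch).
    """
    z = (x_bar - mu_0) / (sigma / sqrt(n))   # Step 3: test statistic
    left_area = NormalDist().cdf(z)          # area to the left of Z
    if tail == "lower":
        p = left_area
    elif tail == "upper":
        p = 1 - left_area
    elif tail == "two":
        p = 2 * min(left_area, 1 - left_area)
    else:
        raise ValueError("tail must be 'lower', 'upper', or 'two'")
    return z, p                              # Step 4: p-value


alpha = 0.05
z, p = z_test_p_value(x_bar=38_000, mu_0=40_000, sigma=7_000, n=49, tail="lower")
reject = p <= alpha                          # Step 5: rejection rule
print(f"Z = {z:.2f}, p-value = {p:.4f}, reject Ho: {reject}")
# prints: Z = -2.00, p-value = 0.0228, reject Ho: True
```

The same function covers upper-tailed and two-tailed tests by changing `tail`; for the σ-unknown case a t distribution would be needed instead of `NormalDist`.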
Rejection Rule: Critical Value Approach
Step 4: Determine the critical value(s) of t or Z at the specified level of significance and set up the rejection rule.
Step 5: Compare the test statistic from step 3 to the critical value(s) from step 4. If the test statistic is beyond the critical value(s), reject the null hypothesis.

Chapter 9

I. Hypothesis Tests about a Population Mean (\( \mu \))

A. THE STANDARD DEVIATION OF THE POPULATION IS KNOWN:

Problem 1. The Chamber of Commerce of a Florida gulf coast community advertises area commercial property available at a mean cost of under $40,000 per acre. A sample of 49 properties provided a sample mean of $38,000 per acre. Assume the standard deviation of the population is known to be $7000.
a. At 95% confidence, test the validity of their advertisement.
Ho:
Ha:
Conclusion:
b. Compute the p-value and interpret its meaning.

B. THE STANDARD DEVIATION OF THE POPULATION IS UNKNOWN:

Problem 2. A soft drink filling machine, when in perfect adjustment, fills the bottles with 12 ounces of soft drink. A random sample of 64 bottles is selected, and the contents are measured. The sample yielded a mean content of 11.88 ounces with a standard deviation of 0.8 ounces.
a. With a 0.05 level of significance (i.e., 95% confidence), test to see if the machine is in perfect adjustment.
Ho:
Ha:
Conclusion:
b. Compute the p-value and interpret its meaning.

Problem 3. Chattanooga public transportation operates a fleet of electric powered shuttle buses for downtown services. Daily mean maintenance costs have been $76 per bus. A recent random sample of 25 buses shows a sample mean maintenance cost of $83.50 per day with a sample standard deviation of $30. Management would like to determine whether or not there has been a significant increase in the mean daily maintenance cost.
a.
At 95% confidence, test to determine whether or not the mean cost has increased.
Ho:
Ha:
Conclusion:
b. Compute the p-value and interpret its meaning.

II. Hypothesis Tests about a Population Proportion (P)

Problem 4. A supplier claims that more than 80% of the parts it supplies meet the product specifications. In a sample of 800 parts received, 664 met the specifications.
a. At 93.7% confidence, test the supplier's claim.
Ho:
Ha:
Conclusion:
b. Compute the p-value and interpret its meaning.

Chapter 9 final examples: Your turn

1. For each of the following, read the t statistic from the table and write its value in the space provided.
a. A two-tailed test, a sample size of 31 at 80% confidence: t =
b. A one-tailed test (upper tail), a sample size of 22 at 99% confidence: t =
c. A one-tailed test (lower tail), a sample size of 16 at 95% confidence: t =

2. For each of the following, read the Z statistic from the table and write its value in the space provided.
a. A two-tailed test at 85.3% confidence: Z =
b. A one-tailed test (lower tail) at 87.7% confidence: Z =
c. A one-tailed test (upper tail) at 97.61% confidence: Z =

3. The average dinner bill for one person in Chattanooga has been $24. It is believed there has been a significant increase in the average dinner prices. A sample of 36 dinner bills showed a mean of $27 with a standard deviation of $9.
a. At 95% confidence, test to determine if there has been a significant increase in the average dinner prices.
Ho:
Ha:
Conclusion:
b. Determine the p-value for the above and use it for the test.

4. The ACT scores of a random sample of 6 UTC students are given below.

Student      1    2    3    4    5    6
ACT Score    28   22   18   23   29   24

At 95% confidence, test to see if the average ACT score of UTC students is significantly different from 27.

CHAPTER 10 FORMULAS

I.
Inferences About the Difference Between Two Population Means: \( \sigma_1 \) and \( \sigma_2 \) Known

Point Estimator of the Difference Between the Means of Two Populations: \( \bar{x}_1 - \bar{x}_2 \)

Standard Error of \( \bar{x}_1 - \bar{x}_2 \) (the Standard Deviation of the sampling distribution of \( \bar{x}_1 - \bar{x}_2 \)):
\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]

A. Interval Estimate of the Difference Between the Means of Two Populations
\[ (\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
\[ \text{Margin of Error} = Z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2} = Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]

B. Hypothesis Testing (Means), Independent Samples
Test Statistic:
\[ Z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]
\( D_0 \) is the hypothesized difference between \( \mu_1 \) and \( \mu_2 \). In most situations, \( D_0 = 0 \).

Decision Rules for P-Value Approach: In all cases, reject Ho if P-Value \( \le \alpha \)

Decision Rules for Critical Value Approach:
Lower one-tailed test of the form Ho: \( \mu_1 - \mu_2 \ge D_0 \), Ha: \( \mu_1 - \mu_2 < D_0 \). Reject Ho if: \( Z \le -Z_{\alpha} \)
Upper one-tailed test of the form Ho: \( \mu_1 - \mu_2 \le D_0 \), Ha: \( \mu_1 - \mu_2 > D_0 \). Reject Ho if: \( Z \ge Z_{\alpha} \)
Two-tailed test of the form Ho: \( \mu_1 - \mu_2 = D_0 \), Ha: \( \mu_1 - \mu_2 \ne D_0 \). Reject Ho if: \( Z \le -Z_{\alpha/2} \) or \( Z \ge Z_{\alpha/2} \)

CHAPTER FORMULAS (Continued)

II. Inferences about the Difference Between Two Population Means: \( \sigma_1 \) and \( \sigma_2 \) Unknown

A. Interval Estimate of the Difference Between the Means of Two Populations
\[ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \]

B. Hypothesis Testing (Means), Independent Samples
Test Statistic:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} \]
The degrees of freedom for t are given by
\[ df = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^{2}}{\dfrac{1}{n_1 - 1}\left(\dfrac{S_1^2}{n_1}\right)^{2} + \dfrac{1}{n_2 - 1}\left(\dfrac{S_2^2}{n_2}\right)^{2}} \]
When computing the degrees of freedom, round to the lower integer value.
Decision rules are the same as those given above for the \( \sigma_1 \) and \( \sigma_2 \) known cases; simply substitute t for Z.

III. Inferences About the Difference Between Two Population Means: Matched Samples

A. Interval Estimate
\[ \bar{d} \pm t_{\alpha/2}\,\frac{S_d}{\sqrt{n}} \]

B. Hypothesis Test
Test statistic:
\[ t = \frac{\bar{d} - \mu_d}{S_d/\sqrt{n}} \qquad \text{where } S_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} \]

CHAPTER FORMULAS (Continued)

IV. Analysis of Variance: Testing for the Equality of k Population Means

Hypotheses to be tested:
Ho: \( \mu_1 = \mu_2 = \dots = \mu_k \)
Ha: Not all the population means are equal
where \( \mu_j \) = the mean of the jth population
k = the number of populations or treatments
\( n_T \) = Total Number of Observations

The General Form of the ANOVA Table - Completely Randomized Design:

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Squares    Test Statistic F
Between Treatments     SSTR              k - 1                 MSTR            MSTR / MSE
Within Treatments      SSE               nT - k                MSE
Total                  SST               nT - 1

Decision rules:
When using the p-value approach, reject Ho if the p-value \( \le \alpha \)
When using the critical value approach, reject Ho if \( F = \frac{MSTR}{MSE} > F_{\alpha} \)

CHAPTER FORMULAS (Continued)

Sample Mean for Treatment j
\[ \bar{x}_j = \frac{\sum_{i=1}^{n_j} x_{ij}}{n_j} \]

Sample Variance for Treatment j
\[ S_j^2 = \frac{\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2}{n_j - 1} \]
where \( x_{ij} \) = the value of observation i for Treatment j
\( n_j \) = the number of observations for treatment j

The Overall Sample Mean (Grand Mean)
\[ \bar{\bar{x}} = \frac{\sum_{j=1}^{k}\sum_{i=1}^{n_j} x_{ij}}{n_T} \qquad \text{where } n_T = n_1 + n_2 + \dots + n_k \]

Mean Square due to Treatments (Between Treatments)
\[ MSTR = \frac{SSTR}{k - 1} \qquad \text{where } SSTR = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2 \]
Therefore:
\[ MSTR = \frac{\sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2}{k - 1} \]

Mean Square due to Error (Within Treatments)
\[ MSE = \frac{SSE}{n_T - k} \qquad \text{where } SSE = \sum_{j=1}^{k} (n_j - 1)S_j^2 \]
Therefore:
\[ MSE = \frac{\sum_{j=1}^{k} (n_j - 1)S_j^2}{n_T - k} \]
SSE also can be computed as:
\[ SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 \]

Total Sum of Squares
\[ SST = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2 \qquad \text{or: } SST = SSTR + SSE \]

General Form of an Interval Estimate for a Population Mean
\[ \bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}} \]

I. Inferences About the Difference Between Two Population Means: \( \sigma_1 \) and \( \sigma_2 \) Known

A. Interval Estimate of the Difference Between the Means of Two Populations

Problem 1. In order to estimate the difference between the age (in months) of computer consulting firms in the East and the West of the United States, the following information is gathered:

                                           East    West
Sample size                                40      45
Sample mean (months)                       70      75
Population standard deviation (months)     5       7

Develop an interval estimate for the difference between the average age of the firms in the East and the West. Let \( \alpha \) = 0.03.

B.
Hypothesis Testing (Means), Independent Samples

Problem 2. Independent random samples taken at two local malls provided the following information regarding purchases of the patrons at the two malls:

                                  Hamilton Place    Northgate
Sample size                       80                75
Average purchase                  $43               $40
Population standard deviation     $8                $6

a. Use the critical value approach and at 95% confidence test to determine whether or not there is a significant difference between the average purchases of the patrons at the two malls.
b. Compute the p-value and interpret its meaning. Use it to answer the question in part “a”.

II. Inferences about the Difference Between Two Population Means: \( \sigma_1 \) and \( \sigma_2 \) Unknown

A. Interval Estimate of the Difference Between the Means of Two Populations

Problem 3. In order to estimate the difference between the average daily sales of two branches of a department store, the following data has been gathered.

                                           Downtown Store         North Mall Store
Sample size                                n1 = 23 days           n2 = 26 days
Sample mean (in $1,000)                    \( \bar{x}_1 \) = 37   \( \bar{x}_2 \) = 34
Sample standard deviation (in $1,000)      S1 = 4                 S2 = 5

Develop a 95% confidence interval for the difference between the two population means.

B. Hypothesis Testing (Means), Independent Samples

Problem 4. Refer to Problem 3 (above) and at 95% confidence test to determine if the average daily sales of the Downtown Store (\( \mu_1 \)) is significantly more than the average sales of the North Mall Store (\( \mu_2 \)). Use both the critical value approach and the p-value approach.

III. Inferences About the Difference Between Two Population Means: Matched Samples

A. Interval Estimate

Problem 5.
The daily production rates of a sample of workers in a factory before and after a training program are shown below:

Worker    Before    After
1         6         10
2         10        13
3         10        9
4         8         11
5         7         9
6         11        12

Provide a 95% confidence interval for the difference between the mean production rates before and after the training program.

B. Hypothesis Test

Problem 6. Refer to Problem 5 (above) and at 95% confidence test to see if the training program was effective. That is, did the training program actually increase the production rates?

IV. Analysis of Variance: Testing for the Equality of k Population Means
Completely Randomized Design

Problem 7. Ahmadi, Inc. uses three types of advertising (radio, newspaper, and television) in three different geographical areas. The company is interested in determining whether there is a significant difference in the effectiveness among the three different methods of advertising. Sales (in $ millions) over a six-day period for the three geographical areas are shown below:

Area 1 (Radio):    48   40   36   50   51   45
Area 2 (Paper):    48   46   42   50   48   48
Area 3 (T.V.):     44   52   54   52   50   60

At 95% confidence, test to determine whether there is a significant difference in the effectiveness among the three different methods of advertising.

Problem 8. Three universities in your state have decided to administer the same comprehensive examination to the recipients of MBA degrees from the three institutions. From each institution, a random sample of MBA recipients has been selected and given the test. The following table shows the scores of the students from each university.

                                   Northern University    Central University    Southern University
                                   56                     62                    94
                                   85                     97                    72
                                   65                     91                    93
                                   86                     82                    78
                                   93                                           54
                                                                                77
Sample Mean (\( \bar{x}_j \))      77.0                   83.0                  78.0
Sample Variance (\( s_j^2 \))      246.5                  234.0                 218.8

At \( \alpha \) = 0.01, test to see if there is any significant difference in the average scores of the students from the three universities.
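The ANOVA quantities from the Chapter 10 formulas (SSTR, SSE, MSTR, MSE, and F) can be sketched in Python for the Problem 8 scores. This is an illustrative sketch rather than the notes' own solution: the grouping of the 15 scores into universities is inferred from the printed sample means and variances (it reproduces 77.0, 83.0, 78.0 and 246.5, 234.0, 218.8), and the variable names are chosen here for illustration.

```python
from statistics import mean, variance

# Scores grouped by university (grouping inferred from the printed
# sample means and variances; treat it as an assumption).
samples = {
    "Northern": [56, 85, 65, 86, 93],
    "Central": [62, 97, 91, 82],
    "Southern": [94, 72, 93, 78, 54, 77],
}

k = len(samples)                                   # number of treatments
n_T = sum(len(x) for x in samples.values())        # total observations
grand_mean = sum(sum(x) for x in samples.values()) / n_T

# SSTR = sum of n_j * (x_bar_j - grand mean)^2
sstr = sum(len(x) * (mean(x) - grand_mean) ** 2 for x in samples.values())
# SSE = sum of (n_j - 1) * S_j^2, using the sample variance of each group
sse = sum((len(x) - 1) * variance(x) for x in samples.values())

mstr = sstr / (k - 1)
mse = sse / (n_T - k)
f_stat = mstr / mse

print(f"SSTR = {sstr:.1f}, SSE = {sse:.1f}, F = {f_stat:.3f}")
# prints: SSTR = 90.0, SSE = 2782.0, F = 0.194
```

The F statistic has k - 1 = 2 numerator and nT - k = 12 denominator degrees of freedom; it is then compared with the table value of F at the chosen level of significance, or converted to a p-value.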
Note that the sample sizes are not equal.

Problem 9. Part of an ANOVA table involving 8 groups for a study is shown below.

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F
Between Treatments     126               ?                     ?              ?
Within Treatments      240               ?                     ?
Total                  ?                 67

a. Complete all the missing values in the above table and fill in the blanks.
b. Use \( \alpha \) = 0.05 to determine if there is any significant difference among the means of the eight groups.

Problem 10. In a completely randomized experimental design, 11 experimental units were used for each of the 4 treatments. Part of the ANOVA table is shown below.

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F
Between Treatments     1500              ?                     ?              ?
Within Treatments      ?                 ?                     ?
Total                  5500

a. Fill in the blanks in the above ANOVA table.
b. Use \( \alpha \) = 0.05 to determine if there is any significant difference among the means of the four groups.

CHAPTER 11 FORMULAS

A. Interval Estimation of the Difference Between the Proportions of Two Populations
\[ (\bar{p}_1 - \bar{p}_2) \pm Z_{\alpha/2}\,S_{\bar{p}_1 - \bar{p}_2} \qquad \text{where } S_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{\bar{p}_1(1-\bar{p}_1)}{n_1} + \frac{\bar{p}_2(1-\bar{p}_2)}{n_2}} \]

B. Hypothesis Test about the Difference Between the Proportions of Two Populations
The Test Statistic “Z” is:
\[ Z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{S_{\bar{p}_1 - \bar{p}_2}} \qquad \text{where } S_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p}(1-\bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]
Assuming \( p_1 = p_2 \), the pooled proportion is computed as
\[ \bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2} = \frac{X_1 + X_2}{n_1 + n_2} \]

C. Goodness of Fit Test
The Test Statistic “\( \chi^2 \)” is:
\[ \chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i} \]

D. Test of Independence
The Test Statistic “\( \chi^2 \)” is:
\[ \chi^2 = \sum_{i}\sum_{j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}} \]

Chapter 11

A. Interval Estimation of the Difference Between the Proportions of Two Populations

Problem 1. In a sample of 400 Democrats, 60 said that they support the president's new tax proposal, while in a sample of 500 Republicans, only 80 said they support it. Determine a 90% confidence interval estimate for the difference between the proportions of the opinions of the individuals in the two parties.

B.
Hypothesis Test about the Difference Between the Proportions of Two Populations

Problem 2. In a sample of 600 Republicans, 480 were in favor of the President's foreign policies, while in a sample of 900 Democrats, 675 were in favor of his policies.
a. At 95% confidence, test to see if there is a significant difference in the proportions of the Democrats and the Republicans who are in favor of the President's foreign policies.
b. Compute the p-value and use it to determine if the percentage of Republicans who favored the president’s foreign policies is significantly more than the percentage of Democrats.

C. Goodness of Fit Test

Problem 3. The AMA Journal reported the following frequencies of deaths due to cardiac arrest for each day of the week.

Cardiac Death by Day of the Week

Day          f
Monday       40
Tuesday      17
Wednesday    16
Thursday     29
Friday       15
Saturday     20
Sunday       17

At 95% confidence, determine whether the number of deaths is uniform over the week.

D. Test of Independence - Contingency Tables

Problem 4. Dr. Ahmadi’s diet pills are supposed to cause significant weight loss. The following table shows the results of a recent study where some individuals took the diet pills and some did not.

                 No weight loss    Weight loss    Total
Diet pills       80                100            180
No Diet pills    20                100            120
Total            100               200            300

With 95% confidence, test to see if losing weight is dependent on taking the diet pills.
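The test-of-independence statistic from the Chapter 11 formulas can be sketched in Python using the Problem 4 contingency table. This is an illustration, not the notes' own solution: the variable names are chosen here, expected frequencies are computed as (row total)(column total)/n, and the p-value shortcut in the code applies only because this 2 × 2 table has 1 degree of freedom.

```python
from math import erfc, sqrt

# Observed frequencies from Problem 4.
# Rows: diet pills, no diet pills. Columns: no weight loss, weight loss.
observed = [[80, 100],
            [20, 100]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency under independence: e_ij = (row i total)(column j total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (f - e) ** 2 / e
    for f_row, e_row in zip(observed, expected)
    for f, e in zip(f_row, e_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# With 1 degree of freedom, chi-square is the square of a standard normal
# variable, so the upper-tail area equals erfc(sqrt(chi_square / 2)).
# This shortcut is valid for df == 1 only.
p_value = erfc(sqrt(chi_square / 2))

critical_value = 3.841  # chi-square table value for alpha = 0.05, df = 1
print(f"chi-square = {chi_square:.2f}, df = {df}")
print("Reject Ho (independence):", chi_square > critical_value)
```

For larger tables the same loop gives the statistic, but the p-value would have to come from a chi-square table or a statistics library rather than from the `erfc` shortcut.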
CHAPTER 12 FORMULAS

Simple Linear Regression Model
\[ y = \beta_0 + \beta_1 x + \varepsilon \]

Simple Linear Regression Equation
\[ E(y) = \beta_0 + \beta_1 x \]

Least Squares Criterion
\[ \min \sum (y_i - \hat{y}_i)^2 \]

Estimated Simple Linear Regression Equation
\[ \hat{y} = b_0 + b_1 x \]
where \( \hat{y} \) = the estimated value of the dependent variable
\( b_1 \) = the slope of the line and \( b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \)
\( b_0 \) = the y-intercept and \( b_0 = \bar{y} - b_1\bar{x} \)

Sum of Squares Due to Regression
\[ SSR = \frac{\left[\sum (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum (x_i - \bar{x})^2} \]

Total Sum of Squares
\[ SST = \sum (y_i - \bar{y})^2 \]

Sum of Squares Due to Error
\[ SSE = \sum (y_i - \hat{y}_i)^2 \]
Also: SST = SSR + SSE

Coefficient of Determination
\[ r^2 = \frac{SSR}{SST} \qquad \text{Also } r^2 = 1 - \frac{SSE}{SST} \]

CHAPTER FORMULAS (Continued)

Sample Correlation Coefficient
\[ r = (\text{the sign of } b_1)\sqrt{\text{Coefficient of Determination}} = \pm\sqrt{r^2} \]
where \( b_1 \) = the slope of the regression equation

Mean Square Error (Estimate of \( \sigma^2 \))
\[ s^2 = MSE = \frac{SSE}{n - 2} \]

Standard Error of the Estimate
\[ s = \sqrt{MSE} \]

t Test for significance of the slope of the regression equation
Ho: \( \beta_1 = 0 \)
Ha: \( \beta_1 \ne 0 \)
t statistic: \( t = \dfrac{b_1}{s_{b_1}} \)
where \( s_{b_1} \) (Estimated Standard Deviation of \( b_1 \)) is:
\[ s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} \]
Reject Ho if \( t < -t_{\alpha/2} \) or \( t > t_{\alpha/2} \) (degrees of freedom = n - p - 1)

CHAPTER FORMULAS (Continued)

F Test for Significance of the Linear Regression Model (ANOVA)
Ho: \( \beta_1 = 0 \) (i.e., the regression model is NOT significant)
Ha: \( \beta_1 \ne 0 \) (the regression model IS significant)

ANOVA Table

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    Test Statistic F
Regression             SSR               p                     MSR            MSR / MSE
Error (Residual)       SSE               n - p - 1             MSE
Total                  SST               n - 1

Where: p = Number of independent variables
n = The sample size
Reject Ho if the Test statistic F > Critical F

Confidence Interval Estimate for the Mean Value of y, that is \( E(y_p) \)
\[ \hat{y}_p \pm t_{\alpha/2}\,s_{\hat{y}_p} \]

Estimated Standard Deviation of \( \hat{y}_p \)
\[ s_{\hat{y}_p} = s\sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}} \]
Remember: \( s = \sqrt{MSE} \)

Chapter 12
Simple (Bivariate) Linear Regression and Correlation

Problem 1. Ahmadi, Inc.
is a microcomputer producer. The following data represent Ahmadi's yearly sales volume and their advertising expenditure over a period of 8 years.

Year    (Y) Sales (In $1,000,000)    (X) Advertising (In $10,000)
1996    15                           32
1997    16                           33
1998    18                           35
1999    17                           34
2000    16                           36
2001    19                           37
2002    19                           39
2003    24                           42

a. Develop a scatter diagram of sales versus advertising.
b. Use the method of least squares to compute an estimated regression line between sales and advertising.
c. If the company's advertising expenditure is $400,000, what is the predicted sales? Give the answer in dollars.
d. What does the slope of the estimated regression line indicate?
e. Compute the coefficient of determination and fully interpret its meaning.
f. Use the F test to determine whether or not the regression model is significant. Let \( \alpha \) = 0.05.
g. Use the t test to determine whether the slope of the regression model is significant. Let \( \alpha \) = 0.05.
h. Explain the basic assumptions about the error term in regression.
i. Develop a 95% confidence interval for predicting the average sales for the years when $400,000 was spent on advertising.
j. Use Excel and solve the above problems.
k. Using Excel, determine the regression equation between sales and time (where 1996 = 1).

CHAPTER 13 FORMULAS

Multiple Regression Model
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon \]

Multiple Regression Equation
\[ E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p \]

Estimated Multiple Regression Equation
\[ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p \]

Least Squares Criterion
\[ \min \sum (y_i - \hat{y}_i)^2 \]

Relationship among SST, SSR, and SSE
SST = SSR + SSE

Multiple Coefficient of Determination
\[ R^2 = \frac{SSR}{SST} \qquad \text{Also } R^2 = 1 - \frac{SSE}{SST} \]

Adjusted Multiple Coefficient of Determination
\[ R_a^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \]

Excel’s ANOVA Table

ANOVA         df           SS     MS                       F                Significance F
Regression    p            SSR    MSR = SSR/p              F = MSR/MSE
Residual      n - p - 1    SSE    MSE = SSE/(n - p - 1)
Total         n - 1        SST

CHAPTER FORMULAS (Continued)

F Test for Overall Significance in Multiple Regression
Ho: \( \beta_1 = \beta_2 = \dots = \beta_p = 0 \) (the model is not significant)
Ha: One or more of the coefficients is not equal to zero (the model is significant)
Test Statistic: \( F = \dfrac{MSR}{MSE} \) (See Excel’s ANOVA table)
When using the p-value approach, reject Ho if the p-value \( \le \alpha \)
When using the critical value approach, reject Ho if the test statistic \( F \ge F_{\alpha} \), where \( F_{\alpha} \) is based on an F distribution with p numerator degrees of freedom and (n - p - 1) denominator degrees of freedom

t Test for Individual Significance in Multiple Regression
Ho: \( \beta_i = 0 \)
Ha: \( \beta_i \ne 0 \)
for any parameter \( \beta_i \)
Test Statistic: \( t = \dfrac{b_i}{s_{b_i}} \)
When using the p-value approach, reject Ho if the p-value \( \le \alpha \)
When using the critical value approach, reject Ho if the test statistic \( t \le -t_{\alpha/2} \) or if \( t \ge t_{\alpha/2} \), where \( t_{\alpha/2} \) is based on a t distribution with (n - p - 1) degrees of freedom

Chapter 13
Multiple Regression and Correlation

Problem 1. Ahmadi, Inc. is a microcomputer producer. The following data represent Ahmadi's yearly sales volume, their advertising expenditure, and the number of individuals in the sales force over a period of 15 years:

Year    (Y) Sales ($1,000,000)    X1 Advertising ($10,000)    X2 Sales Force (100)    X3 Time
1989    15                        32                          10                      1
1990    16                        33                          12                      2
1991    18                        35                          11                      3
1992    17                        34                          14                      4
1993    16                        36                          16                      5
1994    19                        37                          18                      6
1995    19                        39                          17                      7
1996    24                        42                          20                      8
1997    25                        44                          25                      9
1998    27                        40                          22                      10
1999    30                        45                          27                      11
2000    33                        50                          28                      12
2001    38                        49                          30                      13
2002    40                        50                          30                      14
2003    45                        55                          35                      15

a. Using Excel, enter the above data in a file and save the file.
Print the file as well as the results of all of the following parts.
b. Run the correlation analysis relating sales (Y) and all of the independent variables. (Do not include the column of Year.) Explain the results. Discuss the concept of multicollinearity.
c. Run the regression analysis relating sales (Y) and advertising (X1). Explain the results.
d. Run a regression analysis relating sales (Y) and two independent variables X1 and X2. Explain the results.
e. Run a regression analysis relating sales (Y) and two independent variables X1 and X3. Explain the results.
f. Using the model developed in part "e", predict sales for 2004 assuming we are planning to advertise $700,000.
g. Run a regression analysis relating sales (Y) and Time (X3). Explain the results.
h. Using the model developed in part "g", predict sales for 2008.
i. Run a regression analysis relating sales (Y) and three independent variables X1, X2, and X3. Explain the results.

Chapter 13
Multiple Regression & Correlation With Dummy Variables

Problem 2. Ahmadi, Inc. is a microcomputer producer. The following data represent Ahmadi's yearly sales volume, their advertising expenditure, and whether in a given year they used all Television advertising (X2 = 0) or used Multimedia advertising (X2 = 1).

Year    (Y) Sales ($1,000,000)    X1 Advertising ($10,000)    X2 Dummy Variable (0,1)
1989    15                        32                          0
1990    16                        33                          1
1991    18                        35                          1
1992    17                        34                          1
1993    16                        36                          0
1994    19                        37                          1
1995    19                        39                          0
1996    24                        42                          0
1997    25                        44                          1
1998    27                        40                          0
1999    30                        45                          1
2000    33                        50                          1
2001    38                        49                          0
2002    40                        50                          0
2003    45                        55                          1

The regression procedure of Excel was used on the above data, and parts of the results are shown below.
a. Fill in all the blanks in the output below.
b. Write the estimated regression equation.
c. Using the results shown below, predict sales for the year 2004 assuming we are planning to use $700,000 for television advertising only.
d.
Using the results shown below, predict sales for the year 2004 assuming we are planning to use $700,000 for multimedia advertising.

SUMMARY OUTPUT

Multiple R           ___________?
R Square             ___________?
Adjusted R Square    ___________?
Standard Error       2.715
Observations         ___________?

ANOVA
              df              SS              MS              F               Significance F
Regression    ___________?    1243.274        ___________?    ___________?    8.59E-08
Residual      ___________?    ___________?    ___________?
Total         ___________?    ___________?

               Coefficients    Standard Error    t Stat          P-value
Intercept      -28.462401      4.285592715       ___________?    ___________?
Advertising    1.31332227      0.10113336        ___________?    ___________?
Dummy          -0.8296375      1.406029116       ___________?    ___________?

Your Turn – One Final Example
Significance of variables and other issues

Problem 3. Ahmadi, Inc. produces several models of computer printers. Data on a few variables for one of the company’s printers are presented below.

Sales (Y)          Advertising (X1)    Price (X2)    Competitor's Price (X3)    Time (X4)     Rating (X5)
(In $1,000,000)    (In $1,000)         (In $100)     (In $100)                  (In Years)    (0 to 10)
1578               588                 21            20                         1             4
1741               600                 20            22                         2             2
2295               600                 17            19                         3             4
2134               780                 21            21                         4             8
2035               750                 21            21                         5             6
2408               820                 19            21                         6             8
2337               810                 20            20                         7             8
2468               840                 25            22                         8             6
2533               700                 25            24                         9             8
2800               970                 16            18                         10            8
2729               920                 15            21                         11            6
2799               950                 24            23                         12            6
3264               980                 17            23                         13            6
3367               1167                19            17                         14            4
3289               800                 12            18                         15            6
3453               1255                17            16                         16            6
5031               1706                17            25                         17            8
6125               1890                12            26                         18            8
6519               1996                17            28                         19            8
4586               1700                15            18                         20            10
4876               1706                21            24                         21            4
4675               1888                14            23                         22            6
3473               1300                19            24                         23            10
3669               1500                18            21                         24            8
4167               1400                24            23                         25            4

a. Enter the above data into an Excel file and save the file. Print the file and the results of all of the following parts.
b. Run a correlation analysis (among all variables) and print the results. Fully discuss the meaning of the correlation coefficients. Be sure to discuss the concept of multicollinearity.
c.
Run a regression analysis relating sales (Y) and ALL the independent variables. Fully explain the results.
d. Drop the variable(s) that at 95% confidence were not significant in part “c” and run a new regression analysis. Fully explain your results.