Name:                Ms. D’Amato                Date:                Block:

Chapter 3: Displaying and Describing Categorical Data

To compile our data, ________.
Easier to interpret
Shows patterns
Conveys information

Frequency Tables: Making Piles
Use frequency tables to “make piles.” All you have to do is count the totals for each category!
Sometimes we want to know the fraction or proportion of the data in each category. Once we divide the counts by the total, we multiply by 100 to express these proportions as percentages.
A relative frequency table gives percentages (instead of counts) for each category.

The Area Principle
What impression do you get about who was aboard this ship? Is this display a good way to show the Titanic data?
The ship display makes it look like most of the people on the Titanic were crew members, with a few passengers along for the ride.
When we look at each ship, we see the area taken up by the ship, instead of the length of the ship.
The area principle says that the area occupied by a part of the graph should correspond to the magnitude of the value it represents.

Bar Charts
Displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison.
Obeys the area principle. It is not as visually entertaining as the ships, but it does give an accurate visual impression of the distribution.
The bars can be horizontal or vertical, with small spaces in between.
Generally, the ________ displays the categorical data and the ________ displays the quantitative data.
A relative frequency bar chart displays the relative proportion of counts for each category. It replaces the counts with percentages.

Pie Charts
Shows the whole group of cases as a circle.
Use when interested in parts of the whole.
Categories are sliced into pieces whose size is proportional to the fraction of the whole.
Make sure that the categories don’t overlap, so nothing is counted twice.

Contingency Tables: Children and First-Class Ticket Holders First?
A contingency table allows us to look at two categorical variables together. It shows how individuals are distributed along each variable, contingent on the value of the other variable.
Let’s look at an example relating ticket class to whether the person survived on the Titanic:
What does this table tell us?
The margins of the table, both on the right and on the bottom, give totals and the frequency distributions for each of the variables.
Each frequency distribution is called a marginal distribution of its respective variable.
Each cell of the table gives the count for a combination of values of the two variables.
  o For example, the second cell in the crew column tells us that 673 crew members died when the Titanic sank.

Let’s take a look at the following contingency table:

                              Class
Survival                First    Second   Third    Crew     Total
Alive   Count           203      118      178      212      711
        % of Row        28.6%    16.6%    25.0%    29.8%    100%
        % of Column     62.5%    41.4%    25.2%    24.0%    32.3%
        % of Table      9.2%     5.4%     8.1%     9.6%     32.3%
Dead    Count           122      167      528      673      1490
        % of Row        8.2%     11.2%    35.4%    45.2%    100%
        % of Column     37.5%    58.6%    74.8%    76.0%    67.7%
        % of Table      5.6%     7.6%     24.0%    30.6%    67.7%
Total   Count           325      285      706      885      2201
        % of Row        14.8%    12.9%    32.1%    40.2%    100%
        % of Column     100%     100%     100%     100%     100%
        % of Table      14.8%    12.9%    32.1%    40.2%    100%

At first glance, what do you think about this contingency table?

Conditional Distribution
What happens if we are interested only in comparing the survival rate for each class?
Create a conditional distribution, which shows the distribution of one variable for just the individuals who satisfy some condition on another variable.
The conditional distribution of ticket Class, conditional on having survived.
The conditional distribution of ticket Class, conditional on having perished.
Pie Charts of the Two Distributions:
What does this tell us about survival and class?
We see that the distribution of Class for the survivors is different from that of the nonsurvivors. This leads us to believe that Class and Survival are associated, that they are not independent.
The variables would be considered independent when the distribution of one variable in a contingency table is the same for all categories of the other variable.
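The marginal and conditional distributions described above can be checked with a few lines of arithmetic. The sketch below is only an illustration in Python: the counts come from the Titanic contingency table in these notes, while the helper name to_percentages and the variable names are made up for this example. Comparing the two conditional distributions for exact equality is a simplification; with real data the question is whether they differ noticeably.

```python
# Counts from the Titanic contingency table (rows: Survival, columns: Class).
counts = {
    "Alive": {"First": 203, "Second": 118, "Third": 178, "Crew": 212},
    "Dead":  {"First": 122, "Second": 167, "Third": 528, "Crew": 673},
}

def to_percentages(category_counts):
    """Hypothetical helper: divide each count by the total, then multiply by 100."""
    total = sum(category_counts.values())
    return {name: round(100 * n / total, 1) for name, n in category_counts.items()}

# Conditional distributions of Class: look at one survival status at a time.
class_given_alive = to_percentages(counts["Alive"])
class_given_dead = to_percentages(counts["Dead"])

# Marginal distribution of Class: add the two rows first, ignoring survival.
class_totals = {c: counts["Alive"][c] + counts["Dead"][c] for c in counts["Alive"]}
class_marginal = to_percentages(class_totals)

# If Class and Survival were independent, the two conditional
# distributions would match. Here they do not.
same_distribution = class_given_alive == class_given_dead

print("Class | survived:", class_given_alive)
print("Class | perished:", class_given_dead)
print("Class (marginal):", class_marginal)
print("Same distribution for both groups?", same_distribution)
```

The printed percentages should agree with the “% of Row” entries in the table, which is a quick way to confirm that each row of percentages is a distribution that adds up to about 100%.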

Segmented Bar Charts
A segmented bar chart displays the same information as a pie chart, but in the form of bars instead of circles.
Treats each bar as the “whole” and divides it proportionally into segments corresponding to the percentage in each group.
Take a look at the segmented bar chart for ticket Class, by Survival status:
Notice that although the totals for survivors and nonsurvivors are different, the bars are the same height because we have converted the numbers to percentages.
Are the two variables related?

What Can Go Wrong?
Don’t violate the area principle.
  o While some people might like the pie chart on the left better, it is harder to compare fractions of the whole, which a well-done pie chart does.
Keep it honest: make sure your display shows what it says it shows.
  o This plot of the percentage of high-school students who engage in specified dangerous behaviors has a problem. Can you see it?
  o What is wrong with this display?
    [Plot: vertical axis from 0 to 100; horizontal axis labeled 2001-2002, 2003-2004, 2005-2006, 2007, Jan-Jul 2008]
Don’t confuse similar-sounding percentages: pay particular attention to the wording of the context.
Don’t forget to look at the variables separately, too: examine the marginal distributions, since it is important to know how many cases are in each category.
Be sure to use enough individuals.
  o Do not make a report like “We found that 66.67% of the rats improved their performance with training. The other rat died.”
Don’t overstate your case: don’t claim something you can’t.
Don’t use unfair or silly averages: this could lead to Simpson’s paradox, so be careful when you average one variable across different levels of a second variable.

             Moe                      Jill
Day          90 out of 100 (90%)      19 out of 20 (95%)
Night        10 out of 20 (50%)       75 out of 100 (75%)
Overall      100 out of 120 (83%)     94 out of 120 (78%)

What Have We Learned?
We can summarize categorical data by counting the number of cases in each category (expressing these as counts or percents).
We can display the distribution in a bar chart or pie chart.
And, we can examine two-way tables called contingency tables, examining marginal and/or conditional distributions of the variables.
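
The Moe and Jill table under “What Can Go Wrong?” is a good one to verify by hand or with a short script. The Python sketch below is only an illustration: the numbers come from that table, but the notes do not say what is being counted, so the labels “successes” and “attempts” and the helper name percent are assumptions made for this example. The reversal it shows is commonly known as Simpson’s paradox.

```python
# (successes, attempts) for each person and time of day, from the Moe/Jill table.
# "successes" and "attempts" are placeholder labels, not terms from the notes.
records = {
    "Moe":  {"Day": (90, 100), "Night": (10, 20)},
    "Jill": {"Day": (19, 20),  "Night": (75, 100)},
}

def percent(successes, attempts):
    """Hypothetical helper: percentage rounded to the nearest whole number."""
    return round(100 * successes / attempts)

# Percentage within each time of day.
by_condition = {
    person: {time: percent(s, a) for time, (s, a) in by_time.items()}
    for person, by_time in records.items()
}

# Overall percentage: pool the raw counts across day and night.
overall = {}
for person, by_time in records.items():
    total_successes = sum(s for s, _ in by_time.values())
    total_attempts = sum(a for _, a in by_time.values())
    overall[person] = percent(total_successes, total_attempts)

print("By time of day:", by_condition)
print("Overall:", overall)
```

Jill has the higher percentage during the day and at night, yet Moe has the higher overall percentage, because most of Moe’s attempts were in the day (where percentages are high) and most of Jill’s were at night.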