Name: Date: Ms. D`Amato Block: Chapter 3: Displaying and

Name:
Ms. D’Amato
Date:
Block:
Chapter 3: Displaying and Describing Categorical Data
To compile our data,



.
Easier to interpret
Shows patterns
Conveys information
Frequency Tables: Making Piles
Use
each category!
to “make piles.” All you have to do is count the totals for
Sometimes we want to know the fraction or
of the data in each category.
Once we divide the counts by the total, we multiply by 100 to express these proportions as
. A
gives percentages (instead
of counts) for each category.
The Area Principle
What impression do you get about who was aboard
this ship?
Is this display a good way to show the Titanic data?

The ship display makes it look like most of the people on the Titanic were crew members, with
a few passengers along for the ride.

When we look at each ship, we see the area taken up by the ship, instead of the length of the
ship.

The area principle says that the area of graph should correspond to the
of the value it represents.
Bar Charts

Displays the distribution of a
variable, showing the
for each category next to each other for easy comparison.

Obeys the area principle, although it is not as visual entertaining as the ships, but it does give
an


visual impression of the distribution.
The bars can be horizontal or vertical with small spaces in between. Generally, the
displays the categorical data and the
displays the quantitative data.
A
displays the relative proportion of counts for
each category. It replaces the counts with
.
Pie Charts

Shows the whole group of cases as a

Use when interested in parts of the whole.

Categories are sliced into pieces whose size is
the whole.

Make sure that the categories don’t overlap, so nothing is counted twice.
.
to the fraction of
Contingency Tables: Children and First-Class Ticket Holders First?

A contingency tables allows us to look at two categorical variables together.

It shows how individuals are distributed along each variable, contingent on the value of the
other variable.
Let’s look at an example between class and whether the person survived on the Titanic:
What does this table tell us?

The margins of the table, both on the right and on the bottom, give totals and the frequency
distributions for each of the variables.

Each frequency distribution is called a
respective variable.
of its

Each
of the table gives the count for a combination of values of the two values.
o For example, the second cell in the crew column tells us that 673 crew members died
when the Titanic sunk.
Let’s take a look at the following contingency table:
Class
Survival
Alive
Dead
Total
Count
% of Row
% of Column
% of Table
Count
% of Row
% of Column
% of Table
Count
% of Row
% of Column
First
203
28.6%
62.5%
9.2%
122
8.2%
37.5%
5.6%
325
14.8%
100%
Second
118
16.6%
41.4%
5.4%
167
11.2%
58.6%
7.6%
285
12.9%
100%
Third
178
25.0%
25.2%
8.1%
528
35.4%
74.8%
24.0%
706
32.1%
100%
Crew
212
29.8%
24.0%
9.6%
673
45.2%
76.0%
30.6%
885
40.2%
100%
Total
711
100%
32.3%
32.3%
1490
100%
67.7%
67.7%
2201
100%
100%
% of Table
14.8%
12.9%
32.1%
40.2%
100%
At first glance, what do you think about this contingency table?
Conditional Distribution
What happens if we were interested in only comparing the survival rate for each class?

Create a
, which only shows the distribution of one
variable for just the individuals who satisfy some condition on another variable.
The conditional distribution of ticket Class,
conditional on having survived.
The conditional distribution of ticket Class,
conditional on having perished.
Pie Charts of the Two Distributions:
What does this tell about survival and class?

We see that the distribution of Class for the survivors is different from that of the nonsurvivors.

This leads us to believe that Class and Survival are associated, that they are not independent.

The variables would be considered
when the distribution of one
variable in a contingency table is the same for all categories of the other variable.
Segmented Bar Charts



A
but in the form of bars instead of circles.
displays the same information as a pie chart,
Treats each bar as the “whole” and divides it
corresponding to the
into segments
in each group.
Take a look at the segmented bar chart for ticket Class, by Survival status:

Notice that although the totals for
survivors and nonsurvivors are
different, the bars are the same height
because we have converted the numbers
to percentages.

Are the two variables related?
What Can Go Wrong?

Don’t violate the area principle.
o While some people might like the pie chart on the left better, it is harder to compare
fractions of the whole, which a well-done pie chart does.

Keep it honest —make sure your display shows what it says it shows.
o This plot of the percentage of high-school students who engage in specified dangerous
behaviors has a problem. Can you see it?
100
80
What is wrong with this display?
60
40
20
0




2001-2002 2003-2004 2005-2006
2007
Jan-Jul
2008
Don’t confuse similar-sounding percentages — pay particular attention to the wording of the
context.
Don’t forget to look at the variables separately, too — examine the marginal distributions, since
it is important to know how many cases are in each category.
Be sure to use enough individuals.
o Do not make a report like “We found that 66.67 of the rats improved their performance
with training. The other rat died.”
Don’t overstate your case — don’t claim something you can’t.

Don’t use unfair or silly averages — this could lead to
careful when you average one variable across different levels of a second variable.
Moe
Jill
Day
90 out of 100
90%
19 out of 20
95%
Night
10 out of 20
50%
75 out of 100
75%
, so be
Overall
100 out of 120
83%
94 out of 120
78%
What Have We Learned?

We can summarize categorical data by counting the number of cases in each category
(expressing these as counts or percents).

We can display the distribution in a bar chart or pie chart.

And, we can examine two-way tables called contingency tables, examining marginal and/or
conditional distributions of the variables.