Chapter 1

Chapter 1:
Looking at Data—Distributions
Dr. Nahid Sultana
 1.1 Displaying Distributions with Graphs
 1.2 Describing Distributions with Numbers
 1.3 Density Curves and Normal Distributions
1.1 Displaying Distributions with Graphs
Objectives
 Variables
 Types of variables
 Graphs for categorical variables
 Bar graphs
 Pie charts
 Graphs for quantitative variables
 Histograms
 Stemplots
 Stemplots versus histograms
 Interpreting histograms
 Time plots
Variables
 Statistics is the science of learning from data.
 In a study, we collect information—data—from individuals.
 Individuals can be people, animals, plants, or any object of
interest.
 A variable is any characteristic of an individual.
A variable varies among individuals; that is, it can take different
values for different individuals.
Examples: age, height, blood pressure, ethnicity, first language
Two types of variables
 Variables can be either categorical
 Something that falls into one of several categories.
Example: Your blood type (A, B, AB, O), your hair color, your
ethnicity, whether you paid income tax last tax year or not.
 Or quantitative
 Something that takes numerical values for which arithmetic
operations, such as adding and averaging, make sense.
Example: How tall you are, your age, your blood cholesterol level,
the number of credit cards you own.
Two types of variables (Cont…)
Example:
Distribution of a Variable
To examine a single variable, we graphically display its distribution.
 The distribution of a variable tells us what values it takes and
how often it takes these values.
Distributions can be displayed using a variety of graphical tools.
The proper choice of graph depends on the nature of the variable.
Categorical variable: pie chart, bar graph
Quantitative variable: histogram, stemplot
Distribution of Categorical Variables
 The distribution lists the categories and gives the count or percent of individuals
who fall into each category.
 Can be displayed using:
 Bar graphs represent each category as a bar whose height
shows the category count or percent.
A bar graph quickly compares the sizes of the groups.
 Pie charts show the distribution of a categorical variable as a
“pie” whose slices are sized by the counts or percents for the
categories.
A pie chart requires that you include all the categories that make up a whole.
 The size of a slice depends on what percent of the whole this
category represents.
Bar Graphs and Pie Charts
Marital Status   Count (millions)   Percent
Single                       41.8      22.6
Married                     113.3      61.1
Widowed                      13.9       7.5
Divorced                     16.3       8.8
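The percents in the marital status data above can be reproduced from the counts. The following is a minimal Python sketch, not part of the original slides; the function name `to_percents` and the text bar scale (one `#` per 2 percentage points) are assumptions made for illustration.

```python
# Counts (in millions) by marital status, taken from the data above.
counts = {"Single": 41.8, "Married": 113.3, "Widowed": 13.9, "Divorced": 16.3}

def to_percents(counts):
    """Convert category counts to percents of the whole, rounded to 1 decimal."""
    total = sum(counts.values())
    return {category: round(100 * count / total, 1)
            for category, count in counts.items()}

percents = to_percents(counts)

# A crude text bar graph: one '#' per 2 percentage points (an arbitrary scale).
for category, pct in percents.items():
    print(f"{category:<9}{'#' * round(pct / 2)} {pct}")
```

Because the four categories make up the whole, the percents add to 100 (up to rounding), which is the condition a pie chart needs.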
Pie Charts and Bar Graphs (cont…)
Bar Graphs
 Data in the graph can be ordered any way we want (alphabetical, by
increasing value, by year, by personal preference, etc.)
Distribution of Quantitative Variables
 Tells us what values the variable takes on and how often it takes
those values.
 Can be displayed using:
 Histograms
 Stemplots
 Time plots
Histograms and stemplots are summary graphs for a single
variable. They are very useful for understanding the pattern of
variability in the data.
A time plot shows the behavior of observations over time.
Histogram
Histograms show the distribution of a quantitative variable by using
bars whose height represents the number of individuals who take
on a value within a particular class.
To draw a histogram:
 Divide the possible values into equal-size intervals (classes).
 Count how many observations fall into each interval (counts may
be changed to percents).
 Draw a picture representing the distribution; bar heights are
equal to the number (or percent) of observations in each
interval.
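The counting step can be sketched in Python as follows. This is an illustrative sketch only: the function `histogram_counts` is a hypothetical helper, the weights are invented for the example, and each class is assumed to include its lower endpoint but not its upper one.

```python
def histogram_counts(values, start, width, n_classes):
    """Count observations in equal-width classes.

    Class i covers start + i*width up to, but not including,
    start + (i+1)*width. Values outside all classes are ignored.
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts

# Hypothetical weights (lb), invented for illustration only.
weights = [105, 118, 120, 133, 139, 142, 150, 155, 158, 161, 174, 199]
counts = histogram_counts(weights, start=100, width=20, n_classes=5)

# Text version of the histogram: one '*' per observation in the class.
for i, count in enumerate(counts):
    low = 100 + 20 * i
    print(f"{low}-<{low + 20}: {'*' * count}")
```

Note that a value of exactly 120 is counted in the class 120-<140, not in 100-<120.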
Histogram (Cont…)
Example: Weight Data, Introductory Statistics Class
Figure: histogram of the weight data (horizontal axis: Weight; vertical axis: Number of Students)
The first bar represents all students with weights from 100 up to, but not
including, 120. The height of this bar shows how many students’ weights are in this range.
Stemplots
Stemplots separate each observation into a stem and a leaf that are
then plotted to display the distribution while maintaining the original
values of the variable.
To construct a stemplot:
 Separate each observation into a stem (all but the final digit)
and a leaf (the final digit).
 Write the stems in a vertical column; draw a vertical line to the
right of the stems.
 Write each leaf in the row to the right of its stem; order leaves if
desired.
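The construction steps above can be sketched in Python. This is a minimal sketch under stated assumptions: the values are non-negative whole numbers, the leaf is taken to be the final digit, and both the function name `stemplot` and the data are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows: stem = all but the last digit, leaf = last digit."""
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting puts leaves in order on each stem
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    rows = []
    # Include stems with no leaves so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row_leaves = "".join(str(leaf) for leaf in leaves[stem])
        rows.append(f"{stem:>2} | {row_leaves}")
    return rows

# Hypothetical percents, invented for illustration only.
data = [86, 92, 70, 99, 63, 71, 99, 85, 29, 38]
rows = stemplot(data)
print("\n".join(rows))
```

Each original value can be read back from the plot by joining a stem with one of its leaves, which is what distinguishes a stemplot from a histogram.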
Stemplots (Cont…)
Example: Stemplot of the percents of females who are literate.
Stemplots (Cont…)
(a) Write the stems.
(b) Go through the data and write each leaf on the proper stem.
(c) Arrange the leaves on each stem in order out from the stem.
Stemplots (Cont…)
 To compare two related distributions, a back-to-back stemplot
with common stems is useful. Example:
This back-to-back stemplot
compares the distributions of female and
male literacy rates.
Values on the left are the female percents,
ordered out from the stem from right to left.
Values on the right are the male percents.
It is clear that literacy is generally higher
among males than among females in
these countries.
Stemplots (Cont…)
 Stemplots do not work well for large datasets.
 When the observed values have too many digits, trim the numbers
before making a stemplot.
 If there are very few stems (when the data cover only a very small
range of values), then we may want to create more stems by
splitting the original stems.
Example: If all of the data values were between 150 and 179, then
we may choose to use the following stems:
15
15
16
16
17
17
Leaves 0–4 would go on each upper stem
(first “15”), and leaves 5–9 would go on
each lower stem (second “15”).
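Splitting stems can be sketched in Python as follows. The function name `split_stemplot` and the data values are hypothetical, chosen only so that every value lies between 150 and 179 as in the example above.

```python
def split_stemplot(values):
    """Stemplot with each stem split in two.

    Leaves 0-4 go on the first line for a stem, leaves 5-9 on the second.
    """
    rows = {}
    # Create both halves of every stem up front, in display order.
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        rows[(stem, 0)] = []
        rows[(stem, 1)] = []
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[(stem, 0 if leaf <= 4 else 1)].append(str(leaf))
    return [f"{stem} | {''.join(leaves)}" for (stem, _), leaves in rows.items()]

# Hypothetical values between 150 and 179, invented for illustration.
data = [151, 154, 156, 159, 160, 162, 163, 167, 168, 171, 175, 178]
lines = split_stemplot(data)
print("\n".join(lines))
```

With only three unsplit stems the shape of the distribution would be hard to see; six rows spread the same leaves out.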
Examining Distributions
 When describing the distribution of a quantitative variable, we
look for the overall pattern and for striking deviations from that
pattern.
 We can describe the overall pattern of a histogram by its shape,
center, and spread.
Figure: histogram with a smoothed curve highlighting the overall pattern of the distribution.
Examining Distributions
 A distribution is symmetric if the right and left sides of the graph
are approximately mirror images of each other.
 A distribution is skewed to the right (right-skewed) if the right
side of the graph is much longer than the left side.
 It is skewed to the left (left-skewed) if the left side of the graph is
much longer than the right side.
Outliers
 Outliers are observations that lie outside the overall pattern of a
distribution.
The overall pattern is fairly
symmetric except for two
states, Alaska and Florida.
A large gap in the
distribution is typically a
sign of an outlier.
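One way to look for such a gap is to sort the observations and find the widest space between neighbouring values. The sketch below is an illustration only: `largest_gap` is a hypothetical helper and the data are invented. A large gap is a reason to look more closely at a value, not proof that it is an outlier.

```python
def largest_gap(values):
    """Return (gap, low, high) for the widest gap between neighbouring sorted values."""
    ordered = sorted(values)
    # Pair each value with the next larger one and measure the distance.
    gaps = [(high - low, low, high) for low, high in zip(ordered, ordered[1:])]
    return max(gaps)

# Hypothetical data, invented for illustration only.
data = [4, 5, 5, 6, 7, 7, 8, 9, 10, 18]
gap, low, high = largest_gap(data)
print(f"Largest gap: {gap} (between {low} and {high})")
```

Here the value 18 sits well apart from the rest of the data, so it would be the observation to examine.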
Time Plots
 A time plot of a variable plots each observation against the
time at which it was measured.
 Always put time on the horizontal axis and the variable being
measured on the vertical axis of the plot.
 Connecting the data points by lines helps emphasize any
change over time.
Time Plots (cont…)
Scale matters
Look at the scales
Time series and Trend
 We describe time series by looking for an overall pattern and
for striking deviations from that pattern.
 In a time series, a trend is a rise or fall that persists over time,
despite small irregularities.
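A rough numerical check for a trend is to compare the average of the earlier observations with the average of the later ones. This is a crude sketch and not a method from these slides: the function `trend_direction`, its `tolerance` parameter, and the data are all assumptions made for illustration, and a time plot should still be examined by eye.

```python
def trend_direction(series, tolerance=0.0):
    """Compare the mean of the first half of a time-ordered series with the second half.

    Assumes the series is in time order and has at least two observations.
    """
    half = len(series) // 2
    first = sum(series[:half]) / half
    second = sum(series[-half:]) / half
    if second - first > tolerance:
        return "increasing"
    if first - second > tolerance:
        return "decreasing"
    return "no clear trend"

# Hypothetical yearly measurements, invented for illustration only.
observations = [52, 50, 51, 47, 48, 44, 45, 41]
print(trend_direction(observations))
```

The small rises from 50 to 51 and from 47 to 48 are the kind of irregularities that do not change the overall direction.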
o This plot is a graph of
a(n)____________.
Ans: time series
o It shows that there is
(are)_________ in the data.
Ans: A decreasing trend.