Chapter 1: Looking at Data—Distributions
Dr. Nahid Sultana

1.1 Displaying Distributions with Graphs
1.2 Describing Distributions with Numbers
1.3 Density Curves and Normal Distributions

1.1 Displaying Distributions with Graphs

Objectives
o Variables
o Types of variables
o Graphs for categorical variables
   o Bar graphs
   o Pie charts
o Graphs for quantitative variables
   o Histograms
   o Stemplots
o Stemplots versus histograms
o Interpreting histograms
o Time plots

Variables

Statistics is the science of learning from data.
o In a study, we collect information (data) from individuals.
o Individuals can be people, animals, plants, or any object of interest.
o A variable is any characteristic of an individual.
o A variable varies among individuals; that is, it can take different values for different individuals.
Example: age, height, blood pressure, ethnicity, first language.

Two types of variables

Variables can be either categorical or quantitative.
o Categorical: something that falls into one of several categories.
  Example: your blood type (A, B, AB, O), your hair color, your ethnicity, whether or not you paid income tax last tax year.
o Quantitative: something that takes numerical values for which arithmetic operations, such as adding and averaging, make sense.
  Example: how tall you are, your age, your blood cholesterol level, the number of credit cards you own.

Distribution of a Variable

To examine a single variable, we graphically display its distribution.
o The distribution of a variable tells us what values it takes and how often it takes these values.
o Distributions can be displayed using a variety of graphical tools. The proper choice of graph depends on the nature of the variable.
Categorical variable: pie chart, bar graph
Quantitative variable: histogram, stemplot

Distribution of Categorical Variables

Lists the categories and gives the count or percent of individuals who fall into each category.
Can be displayed using:
o Bar graphs represent each category as a bar whose height shows the category count or percent. A bar graph allows a quick comparison of the sizes of the groups.
o Pie charts show the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percents for the categories. A pie chart must include all the categories that make up a whole; the size of a slice depends on what percent of the whole its category represents.

Bar Graphs and Pie Charts

Marital status    Count (millions)    Percent
Single             41.8               22.6
Married           113.3               61.1
Widowed            13.9                7.5
Divorced           16.3                8.8

Pie Charts and Bar Graphs (cont.)

Bar Graphs
Data in the graph can be ordered any way we want (alphabetical, by increasing value, by year, by personal preference, etc.).

Distribution of Quantitative Variables

Tells us what values the variable takes on and how often it takes those values.
Can be displayed using:
o Histograms
o Stemplots
o Time plots
Histograms and stemplots are summary graphs for a single variable. They are very useful for understanding the pattern of variability in the data.
A time plot shows the behavior of observations over time.

Histogram

Histograms show the distribution of a quantitative variable by using bars whose height represents the number of individuals who take on a value within a particular class.
To draw a histogram:
o Divide the possible values into intervals (classes) of equal width.
o Count how many observations fall into each interval (counts may be converted to percents).
o Draw a picture representing the distribution: the bar heights correspond to the number (or percent) of observations in each interval.

Histogram (cont.)

Example: Weight data, introductory statistics class
[Histogram: Weight on the horizontal axis, Number of Students on the vertical axis]
The first bar represents all students with weights from 100 up to, but not including, 120 (written 100-<120).
The height of this bar shows how many students have weights in this range.

Stemplots

Stemplots separate each observation into a stem and a leaf, which are then plotted to display the distribution while preserving the original values of the variable.
To construct a stemplot:
o Separate each observation into a stem (the first part of the number) and a leaf (the remaining part of the number).
o Write the stems in a vertical column; draw a vertical line to the right of the stems.
o Write each leaf in the row to the right of its stem; order the leaves if desired.

Stemplots (cont.)

Example: Stemplot of the percents of females who are literate.
(a) Write the stems.
(b) Go through the data and write each leaf on the proper stem.
(c) Arrange the leaves on each stem in order out from the stem.

Stemplots (cont.)

To compare two related distributions, a back-to-back stemplot with common stems is useful.
Example: Back-to-back stemplot comparing the distributions of female and male literacy rates.
o Values on the left are the female percents, ordered out from the stem from right to left.
o Values on the right are the male percents.
o It is clear that literacy is generally higher among males than among females in these countries.

Stemplots (cont.)

o Stemplots do not work well for large data sets.
o When the observed values have too many digits, trim the numbers before making a stemplot.
o If there are very few stems (when the data cover only a very small range of values), we may want to create more stems by splitting the original stems.
Example: If all of the data values were between 150 and 179, we might choose the following stems:
15
15
16
16
17
17
Leaves 0–4 would go on each upper stem (first “15”), and leaves 5–9 would go on each lower stem (second “15”).

Examining Distributions

When describing the distribution of a quantitative variable, we look for the overall pattern and for striking deviations from that pattern.
We can describe the overall pattern of a histogram by its shape, center, and spread.
[Figure: Histogram with a smoothed curve highlighting the overall pattern of the distribution]

Examining Distributions (cont.)

o A distribution is symmetric if the right and left sides of the graph are approximately mirror images of each other.
o A distribution is skewed to the right (right-skewed) if the right side of the graph is much longer than the left side.
o It is skewed to the left (left-skewed) if the left side of the graph is much longer than the right side.

Outliers

Outliers are observations that lie outside the overall pattern of a distribution.
The overall pattern is fairly symmetric except for two states, Alaska and Florida.
A large gap in the distribution is typically a sign of an outlier.

Time Plots

A time plot of a variable plots each observation against the time at which it was measured.
o Always put time on the horizontal axis and the variable being measured on the vertical axis.
o Connecting the data points with lines helps emphasize any change over time.

Time Plots (cont.)

Scale matters: look at the scales.

Time series and Trend

We describe a time series by looking for an overall pattern and for striking deviations from that pattern.
In a time series, a trend is a rise or fall that persists over time, despite small irregularities.
o This plot is a graph of a(n) ____________. Ans: time series
o It shows that there is (are) _________ in the data. Ans: a decreasing trend
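
Worked sketch (Python)

The three calculations described in words in this section (turning category counts into percents, counting observations in equal-width classes for a histogram, and splitting observations into stems and leaves) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original slides. The marital status counts are copied from the table above; the list `weights` is hypothetical data invented for the example, and the function names `percents`, `class_counts`, and `stemplot` are likewise assumptions made here. The stemplot helper assumes non-negative whole numbers and uses the last digit as the leaf.

```python
from collections import Counter

# Counts (in millions) copied from the marital status table in this section.
counts = {"Single": 41.8, "Married": 113.3, "Widowed": 13.9, "Divorced": 16.3}

# Hypothetical weights, invented only for illustration (not the class data).
weights = [104, 112, 118, 121, 125, 127, 133, 139, 156]


def percents(category_counts):
    """Express each category count as a percent of the whole, to one decimal."""
    total = sum(category_counts.values())
    return {name: round(100 * n / total, 1) for name, n in category_counts.items()}


def class_counts(values, start, width):
    """Count observations per equal-width class, keyed by the class's lower limit.

    A value v is placed in the class [lower, lower + width), e.g. 100-<120.
    """
    tally = Counter(start + width * ((v - start) // width) for v in values)
    return dict(sorted(tally.items()))


def stemplot(values):
    """Return stemplot rows: stem = all digits but the last, leaf = last digit."""
    leaves = {}
    for v in sorted(values):  # sorting first leaves each row's leaves in order
        stem, leaf = divmod(v, 10)
        leaves.setdefault(stem, []).append(str(leaf))
    # Show every stem from the smallest to the largest, even one with no leaves.
    return [
        f"{stem} | {''.join(leaves.get(stem, []))}".rstrip()
        for stem in range(min(leaves), max(leaves) + 1)
    ]


print(percents(counts))
# {'Single': 22.6, 'Married': 61.1, 'Widowed': 7.5, 'Divorced': 8.8}
print(class_counts(weights, 100, 20))
# {100: 3, 120: 5, 140: 1}
print("\n".join(stemplot(weights)))
```

The percents reproduce the Percent column of the marital status table. With the hypothetical weights, the class 100-<120 holds three observations, and the stemplot keeps stem 14 as an empty row, so the gap before 156 stays visible.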