Describing Data Numerically: Measures of Center and Spread

Section 1.3 Describing Data Numerically: Measures of Center and Spread

Introduction.

While describing data sets using graphs gives a good overview, it does not give us the detail that we need to do more advanced analysis. Graphs are meant to give a picture of the data, but in order to make decisions based on a set of data, we need to have numerical summaries.

In this section, we will look at several types of numerical summaries, or measures, which we can compute from a set of quantitative data. These measures include both:

Measures of Center.

A measure of center gives a single number that represents the "typical" value in the set of data. One very familiar measure of center is the average. When you say that the average score on an exam is 80%, you are claiming that the typical student scored 80%. This is not, however, the only way to measure the center of a data set, as we shall see in this section.

Measures of Spread.

Also called a measure of variation, this number or set of numbers indicates how spread out the data is. For example, if three students take an exam and all score 80%, the average would be 80%, but there is no variation between the scores. But if three students scored 60%, 80%, and 100%, the average would still be 80%, but now there is a large amount of variation between their scores: they are much more spread out.
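The arithmetic in this example can be checked with a short Python sketch. It computes the average of each set of scores, along with the gap between the highest and lowest score as one rough indication of spread. The variable names are illustrative only.

```python
from statistics import mean

same_scores = [80, 80, 80]
varied_scores = [60, 80, 100]

for scores in (same_scores, varied_scores):
    average = mean(scores)
    gap = max(scores) - min(scores)  # highest score minus lowest score
    print(f"scores={scores} average={average} gap={gap}")
```

Both sets report the same average, while the gap is zero for the first set and 40 percentage points for the second.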

Each of these values can be computed for data drawn from a sample (a small number of values from a larger population), or for a census of an entire population. We give them different names, however, depending on the source.

Definition 1.3.1.

A measurement describing some characteristic of a sample is called a statistic.

Definition 1.3.2.

A measurement describing some characteristic of a population is called a parameter.

The measures of center that we will look at often come with a related measure of variation or spread, so we will examine these together where appropriate.

Objectives

After finishing this section you should be able to

describe the following terms:
- interquartile range
- mean
- median
- midrange
- mode
- outlier
- parameter
- quartile
- range
- standard deviation
- statistic
- variance
accomplish the following tasks:
- Compute the mode
- Compute the midrange and the range
- Compute the median, 1st and 3rd quartiles, and interquartile range
- Compute the mean and standard deviation
- Select an appropriate measure of center for a given dataset

Subsection 1.3.1 The Mode

We have already seen the term mode used in the context of a graph. Recall that the mode of a histogram or bar graph is the class or category that has the tallest bar. That is, it is the class or category that appears the most in the data set. This seems to be a reasonable way to describe a “typical” element of a data set and leads to the following definition.

Definition 1.3.3.

The mode of a set of values is the value that appears most often. If more than one value is tied for the most appearances, each one is a mode. If no value appears more than once, there is no mode.
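One possible way to turn this definition into code is sketched below in Python. The function name `modes` is a hypothetical helper introduced here for illustration; it returns a list so that ties and the "no mode" case can both be represented.

```python
from collections import Counter

def modes(values):
    """Return a list of modes, following Definition 1.3.3."""
    counts = Counter(values)
    if not counts:
        return []  # an empty data set has no mode
    highest = max(counts.values())
    if highest == 1:
        return []  # no value appears more than once, so there is no mode
    # Every value tied for the most appearances is a mode
    return [value for value, count in counts.items() if count == highest]

print(modes(["a", "b", "a", "c"]))  # one mode
print(modes([1, 1, 2, 2, 3]))       # two values tied, so two modes
print(modes([1, 2, 3]))             # no repeated value, so no mode
```

Because the function only counts appearances, it works equally well on numbers and on labels such as words.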

The mode is unique among our measures of center as it is the only one that applies to both quantitative variables and qualitative variables. Consider the following examples.

Example 1.3.4. The Mode of a Qualitative Variable.

A breakout presentation at a business convention attracts 16 participants. The job title of each of these participants is as follows. Find the mode of this set of data.

manager	owner	H.R. director	owner
manager	H.R. director	manager	manager
owner	manager	manager	H.R. director
manager	owner	owner	manager

Table 1.3.5. Job Titles

Solution

The mode is the most commonly appearing value. One way to find that is to construct a frequency table. Doing so for this data produces the following table, from which we can see that the mode is “manager”.

Value	Frequency
manager	8
H.R. director	3
owner	5

Table 1.3.6. Frequency Table
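As a cross-check, the same frequency table can be built with a short Python sketch using `collections.Counter`. The list below is simply the contents of Table 1.3.5 typed in row by row; the variable names are assumptions made for this illustration.

```python
from collections import Counter

job_titles = [
    "manager", "owner", "H.R. director", "owner",
    "manager", "H.R. director", "manager", "manager",
    "owner", "manager", "manager", "H.R. director",
    "manager", "owner", "owner", "manager",
]

frequency = Counter(job_titles)

# most_common() lists the values from most frequent to least frequent
for title, count in frequency.most_common():
    print(f"{title}\t{count}")
```

The first row printed is the mode, since it has the largest frequency.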

Example 1.3.7. The Mode of a Quantitative Variable.

A survey of sixteen grocery store shoppers asked how many times a month these shoppers visit their favorite grocery store. Find the mode of the resulting data, shown below.

6	2	5	1	3	5	3	2
7	4	1	5	3	6	4	8

Table 1.3.8. Number of Visits Per Month

Solution

Again, the best starting point is to construct a quick frequency table.

Value	1	2	3	4	5	6	7	8
Frequency	2	2	3	2	3	2	1	1

Table 1.3.9. Frequency Table

Notice that in this data set, both the 5 and the 3 appear three times. Every other value appears two or fewer times. This makes this set of data bimodal with modes 3 and 5.
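For data with more than one mode, Python's standard library offers `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for the most appearances. The sketch below applies it to the values in Table 1.3.8; the list name `visits` is chosen here for illustration.

```python
from statistics import multimode

visits = [6, 2, 5, 1, 3, 5, 3, 2,
          7, 4, 1, 5, 3, 6, 4, 8]

# multimode returns the tied values in the order they first appear,
# so sort them for a tidier display
print(sorted(multimode(visits)))
```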

While the mode is an easy way to measure the center, and is in fact the only way we have to measure the center of a set of qualitative data, it is usually not the best choice for measuring the center of a set of quantitative data. To see why this may be so, consider the next example.

Example 1.3.10.

Find the mode of the following sets of numbers.

$\lbrace 1, 2, 5, 7, 12, 15, 19, 22, 50 \rbrace$
$\lbrace 1, 1, 20, 23, 26, 24, 29, 30, 27, 32, 19\rbrace$

Solution

The modes of these sets of numbers are as follows:

The first set contains no repeated values. Therefore, according to the definition, it has no mode.
The second set contains only one repeated value, the 1, so its mode is 1. However, 1 is definitely not the “typical” value in the set, since all other values are between 19 and 32.
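A word of caution when checking answers like these with software: conventions differ. As a sketch, Python's `statistics.multimode` (Python 3.8 and later) treats a data set with no repeated values as one in which every value is tied for most frequent, so it returns all of them, whereas the definition used in this section says such a set has no mode.

```python
from statistics import multimode

first_set = [1, 2, 5, 7, 12, 15, 19, 22, 50]
second_set = [1, 1, 20, 23, 26, 24, 29, 30, 27, 32, 19]

# No repeated values: multimode reports every value as tied
print(multimode(first_set))

# One repeated value: multimode reports that value alone
print(multimode(second_set))

# The remaining values sit far away from the reported mode
others = [v for v in second_set if v != 1]
print(min(others), max(others))
```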

In summary, you should use the mode to measure the center of any qualitative set of data, or as a quick, but not definitive, measure of center for quantitative data. But be careful! The mode may not exist, there may be multiple modes, and if there is a single mode it may be very different from the “typical” values in the data set.

Figure 1.3.11. Finding the Mode I

Figure 1.3.12. Finding the Mode II

Checkpoint 1.3.13. Finding the Mode I.

The following colors of cars were observed in the parking lot of the local grocery store.

blue	green	black	white
white	green	blue	yellow
white	white	blue	black
black	green	red	green

Table 1.3.14. Car Colors

Question: Which color(s) is/are the mode(s) for this set of data?

cat	fish	bird	hamster	ferret
dog	rabbit	rat	gerbil	pig

13.2	10.8	14.6	17.2	17
18.3	6.5	13.4	16.7	11.3
12.3	9.8	9.4	11.6	12.1
8.7	12.6	13	14.3	13.1

\(x_i\)	\((x_i - \overline{x})\)	\((x_i - \overline{x})^2\)
\(4\)	\((4 - 5) = -1\)	\((-1)^2 = 1\)
\(6\)	\((6 - 5) = 1\)	\((1)^2 = 1\)
\(3\)	\((3 - 5) = -2\)	\((-2)^2 = 4\)
\(8\)	\((8 - 5) = 3\)	\((3)^2 = 9\)
\(4\)	\((4 - 5) = -1\)	\((-1)^2 = 1\)
\(s^2 =\)		\(\frac{16}{5-1} = 4\)
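The columns of the table above can be reproduced with a short Python sketch. It assumes the data are the five values listed in the first column and that the divisor is one less than the number of values, as in the last row of the table. The variable names are illustrative.

```python
from statistics import mean, variance

data = [4, 6, 3, 8, 4]

x_bar = mean(data)                                      # the mean
deviations = [x - x_bar for x in data]                  # second column
squared_deviations = [d ** 2 for d in deviations]       # third column
s_squared = sum(squared_deviations) / (len(data) - 1)   # last row

print(x_bar, deviations, squared_deviations, s_squared)

# The standard library's sample variance should agree
print(variance(data))
```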

	Qualitative Data	Quantitative Data	Sensitive to Outliers	Considers Every Value
Mode	sometimes	sometimes	no	no
Midrange and Range	no	yes	yes	no
Median and IQR	no	yes	no	no
Mean and Standard Deviation	no	yes	somewhat	yes
