Math for Liberal Arts MAT 110: Chapter 6 Notes Characterizing Data - - PowerPoint PPT Presentation




MAT 110 ‐ Chapter 6 1

Math for Liberal Arts MAT 110: Chapter 6 Notes

Putting Statistics to Work David J. Gisch

Characterizing Data

Measures of Central Tendency

  • The mean is what we most commonly call the average value. It is defined as follows:

$$\bar{x} = \frac{\sum x}{n}$$

where x is each data value and n is the number of data values.

  • The median is the middle value in the sorted data set (or halfway between the two middle values if the number of values is even).
  • The mode is the most common value (or group of values) in a distribution.

▫ There can be multiple modes or no mode.

Average = Mean
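As a minimal sketch, assuming Python 3.8 or later (needed for `statistics.multimode`), all three measures can be computed with the standard `statistics` module for the odd data set used in the next example. This is only a way to check hand calculations.

```python
from statistics import mean, median, multimode

# Data from the "An Odd Set of Data" example (9 values).
data = [13, 16, 13, 18, 13, 13, 14, 21, 14]

print("mean:  ", mean(data))       # sum of the values divided by n
print("median:", median(data))     # middle value of the sorted data
print("modes: ", multimode(data))  # list of every most-common value
```

`multimode` returns a list because a data set can have several modes; when every value appears equally often it returns all of them, which corresponds to the "no mode" case above.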


An Odd Set of Data

Example: Use the given data to answer the following. 13, 16, 13, 18, 13, 13, 14, 21, 14

(a) What is the mean? (b) What is the mode? (c) What is the median?

13 13 13 13 14 14 16 18 21

An Even Set of Data

Example: Use the given data to answer the following.

(a) What is the mean? (b) What is the mode? (c) What is the median?

2 10 4 5 2 4 6 5 3 6 7 16 3 3 8 10

2 2 3 3 3 4 4 5 5 6 6 7 8 10 10 16
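The only new step with an even number of values is averaging the two middle values. A small hand-rolled sketch of the median rule (the helper name `median_by_hand` is hypothetical) makes that explicit for the data above:

```python
def median_by_hand(values):
    """Median: the middle value of the sorted data, or halfway between
    the two middle values when the number of values is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:  # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values


# Data from the "An Even Set of Data" example (16 values).
even_data = [2, 10, 4, 5, 2, 4, 6, 5, 3, 6, 7, 16, 3, 3, 8, 10]
print("mean:  ", sum(even_data) / len(even_data))
print("median:", median_by_hand(even_data))
```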

Example (On your own)

Example: Use the given data to answer the following.

(a) What is the mean? (b) What is the mode? (c) What is the median?

5 11 4 3 6 4 6 3 3 6 6 16 3 3 8 11

Outliers

  • An outlier is a data value that is much higher or much lower than almost all other values.

Example: To see the effect of an outlier, consider the salaries of people in this DMACC classroom.


Outliers

  • To account for outliers, we tend to use the median.

The following are examples of data that are typically reported using the median.

▫ Income
▫ House Prices
▫ Age
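The classroom salary data is not reproduced in these notes, so as a stand-in this sketch reuses the sorted even data set from earlier and treats its largest value, 16, as the unusually high value (an assumption for illustration only). Removing it shifts the mean noticeably while the median does not move.

```python
from statistics import mean, median

# Sorted even data set from earlier; 16 is treated as the unusually high value.
with_high_value = [2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 10, 10, 16]
without_high_value = with_high_value[:-1]  # same data with the 16 removed

for label, values in [("with 16", with_high_value), ("without 16", without_high_value)]:
    print(f"{label:>10}: mean = {mean(values):.3f}, median = {median(values)}")
```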

Shapes of Distributions

[Figure: two single-peaked (unimodal) distributions and a double-peaked (bimodal) distribution.]

Example: Give an example of a data set for each type of shape.

Shapes of Distributions (Symmetry)

A distribution is symmetric if its left half is a mirror image of its right half.

Example: Give an example of a data set that is symmetric.

Shapes of Distributions (Skewness)

A distribution is left-skewed if its values are more spread out on the left side.

Example: Give an example of a data set that is left-skewed.


Shapes of Distributions (Skewness)

A distribution is right-skewed if its values are more spread out on the right side.

Example: Give an example of a data set that is right-skewed.

Example

Example: For each scenario state if it is (unimodal or bimodal), (skew-left, skew-right, or symmetric), and whether you would use the (mean, median, or mode) to measure the center.

(a) You want to buy a house in a neighborhood and are interested in the prices of other houses in the area. (b) Your office mates are collecting money for Milton’s (“have you seen my stapler”) birthday. You are trying to determine what to give. (c) You are collecting data on the color of cars in the DMACC parking lot.

Measures of Variation

Variation (Spread)

Variation describes how widely data values are spread out about the center of a distribution.

[Figure: three distributions that, from left to right, have increasing variation.]


Why Variation Matters

Example: Consider the following waiting times for 11 customers at 2 banks. Calculate the mean, median and mode for both.

Big Bank (three lines): 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0

Best Bank (one line): 6.6 6.7 6.7 6.9 7.1 7.2 7.3 7.4 7.7 7.8 7.8

Which bank is likely to have more unhappy customers?
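A quick sketch with Python's `statistics` module shows why the question is about variation rather than center: both banks have the same mean and median, but their ranges (max minus min, defined on the next slide) are very different.

```python
from statistics import mean, median, multimode

# Waiting times for the 11 customers at each bank, from the example above.
big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
best_bank = [6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8]

for name, times in [("Big Bank", big_bank), ("Best Bank", best_bank)]:
    spread = max(times) - min(times)  # range = max - min
    print(f"{name}: mean = {mean(times):.1f}, median = {median(times)}, "
          f"modes = {multimode(times)}, range = {spread:.1f}")
```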

Measures of Variation (Spread)

  • The range of a data set is the total spread of the data set:

range = max − min

  • The lower quartile, Q1 (or first quartile), divides the lowest fourth of a data set from the upper three-fourths. It is the median of the data values in the lower half of a data set.
  • The middle quartile, Q2 (or second quartile), is the overall median.
  • The upper quartile, Q3 (or third quartile), divides the lower three-fourths of a data set from the upper fourth. It is the median of the data values in the upper half of a data set.

▫ When the number of data values is odd, the overall median is left out of both halves.

Quartiles (An Odd Set of Data)

Example: Use the given data to answer the following. 76, 65, 100, 85, 68, 70, 74, 87, 90, 80, 92

(a) What is the range? (b) State the quartiles.

65 68 70 74 76 80 85 87 90 92 100

Quartiles (An Even Set of Data)

Example: Use the given data to answer the following.

(a) What is the range? (b) State the quartiles.

2 10 4 5 2 4 6 5 3 6 7 16 3 3 8 10

2 2 3 3 3 4 4 5 5 6 6 7 8 10 10 16
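The quartile definitions can be sketched as "medians of halves" and checked against both examples above. Leaving the median out of both halves for an odd count is an assumption of this sketch, though it matches the bank five-number summaries in these notes. Calculators and software packages use several slightly different quartile conventions, so their answers may differ a little from this method. The helper names are hypothetical.

```python
def median_of(values):
    """Middle value of sorted data, averaging the two middle values if n is even."""
    ordered = sorted(values)
    n, mid = len(ordered), len(ordered) // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    With an odd count, the overall median is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2 :]
    return median_of(lower_half), median_of(ordered), median_of(upper_half)


odd_data = [76, 65, 100, 85, 68, 70, 74, 87, 90, 80, 92]
even_data = [2, 10, 4, 5, 2, 4, 6, 5, 3, 6, 7, 16, 3, 3, 8, 10]
for values in (odd_data, even_data):
    print("range:", max(values) - min(values), "quartiles:", quartiles(values))
```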


The Five-Number Summary

  • The five-number summary for a data set consists of the following five numbers:

low value, lower quartile, median, upper quartile, high value

  • A boxplot (or box-and-whiskers plot) shows the five-number summary visually, with a rectangular box enclosing the lower and upper quartiles, a line marking the median, and whiskers extending to the low and high values.

This boxplot is for Example 6.B.2 data: 65, 68, 70, 74, 76, 80, 85, 87, 90, 92, 100.

[Figure: boxplot of the Example 6.B.2 data marking Low, Q1, Q2, Q3 and High, with the lower half and upper half of the data labeled.]

Five-number summary of the waiting times at each bank:

                    Big Bank   Best Bank
low value (min)        4.1        6.6
lower quartile         5.6        6.7
median                 7.2        7.2
upper quartile         8.5        7.7
high value (max)      11.0        7.8

The corresponding boxplot:

[Figure: boxplots of the waiting times at Best Bank and Big Bank.]
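The bank table above can be reproduced with a short sketch that uses the same median-of-halves quartiles (the function names are hypothetical). Drawing the boxplot itself would need a plotting library, which this sketch leaves out.

```python
def median_of(values):
    """Middle value of sorted data, averaging the two middle values if n is even."""
    ordered = sorted(values)
    n, mid = len(ordered), len(ordered) // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """(low, Q1, median, Q3, high), with quartiles as medians of the halves."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # values before the median position
    upper_half = ordered[(n + 1) // 2 :]  # values after the median position
    return (ordered[0], median_of(lower_half), median_of(ordered),
            median_of(upper_half), ordered[-1])


big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
best_bank = [6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8]
print("Big Bank: ", five_number_summary(big_bank))
print("Best Bank:", five_number_summary(best_bank))
```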

The Five-Number Summary Quick Review

Use the given data to answer the following.

(a) What is the mean? (b) What is the mode? (c) What is the median? (d) State the quartiles

2 2 2 4 5 5 6 7 10 15

Standard Deviation

  • The standard deviation is the single number most commonly used to describe variation.

▫ Think of the standard deviation as a way to take different types of data sets and convert them to a common unit of measure so we can compare their variation (spread).

$$\text{standard deviation} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Here, x is each data point, x̄ is the mean, and n is the number of data points. The symbol ∑, the capital Greek letter sigma, means to “add up” or “to sum up.”


Standard Deviation

  • To guide you, follow these instructions.
  • You will get a blank table and this exact image to use on the tests; see the next example.

Standard Deviation

Example: Calculate the standard deviation of the following data set. 2, 8, 9, 12, 19

x (data value) | x − mean (deviation) | (deviation)²
2              |                      |
8              |                      |
9              |                      |
12             |                      |
19             |                      |
Total          |                      |
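A minimal sketch that fills in the table row by row, then divides the total by n − 1 and takes the square root, following the formula for the standard deviation given earlier:

```python
import math

data = [2, 8, 9, 12, 19]
mean_value = sum(data) / len(data)

# Fill in the table: one row per data value, then the total of the squares.
total = 0.0
print(f"{'x':>6} {'x - mean':>9} {'(x - mean)^2':>13}")
for x in data:
    deviation = x - mean_value
    total += deviation ** 2
    print(f"{x:>6} {deviation:>9.1f} {deviation ** 2:>13.1f}")
print(f"{'Total':>6} {'':>9} {total:>13.1f}")

# Divide the total by n - 1, then take the square root.
std_dev = math.sqrt(total / (len(data) - 1))
print(f"standard deviation = {std_dev:.2f}")
```

Python's `statistics.stdev` uses the same n − 1 divisor, so it can serve as a one-line check on the result.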

Standard Deviation

Example: Two car companies have the same mean (15 years) lifespan for their best selling sedan. However, Company A has a standard deviation of 1.2 years and Company B has a standard deviation of 3.1 years. Which is better?


The Normal Distribution

  • The normal distribution is a symmetric, bell-shaped distribution with a single peak. Its peak corresponds to the mean, median, and mode of the distribution.

Both sets of data (distributions) are normally distributed with a mean of 75, but the graph on the left has a larger variation (spread).

Conditions for a Normal Distribution

A data set satisfying the following criteria is likely to have a nearly normal distribution.

1. Most data values are clustered near the mean, giving the distribution a well-defined single peak.
2. Data values are spread evenly around the mean, making the distribution symmetric.
3. Larger deviations from the mean are increasingly rare, producing the tapering tails of the distribution.
4. Individual data values result from a combination of many different factors.

Normal Distributions?

Example: Describe each of the following distributions by their shapes. (a) Scores on a very easy test. (b) Shoe sizes of adult women. (c) The weight of Doritos bags of the same size. (d) The length of hair donated to “locks of love.”

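The 68-95-99.7 Rule for a Normal Distribution

For a normal distribution, about 68% of the data values lie within 1 standard deviation of the mean, about 95% lie within 2 standard deviations, and about 99.7% lie within 3 standard deviations.

σ is the lowercase Greek letter sigma. It is used for the standard deviation.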


Standard Deviation

  • Notice that this means almost all data lies between 3 standard deviations below and 3 standard deviations above the mean.

Standard Deviation

Example: Two car companies have the same mean (15 years) lifespan for their best selling sedan. However, Company A has a standard deviation of 1.2 years and Company B has a standard deviation of 3.1 years. Which is better?

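Company A: 15 − 3(1.2) = 11.4 and 15 + 3(1.2) = 18.6, so almost all lifespans fall between 11.4 and 18.6 years.

Company B: 15 − 3(3.1) = 5.7 and 15 + 3(3.1) = 24.3, so almost all lifespans fall between 5.7 and 24.3 years.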

68-95-99.7 Rule

Example: A data set is normally distributed with a mean of 84 and a standard deviation of 6. Use the 68-95-99.7 rule to answer each of the following. (a) 68% of the data lies between? (b) 95% of the data lies between? (c) 99.7% of the data lies between?
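The rule's three intervals are just the mean plus or minus 1, 2 and 3 standard deviations. A small sketch (the function name `rule_intervals` is hypothetical) lists them for this example; calling it with a mean of 15 and a standard deviation of 1.2 or 3.1 gives the car-company intervals as well.

```python
def rule_intervals(mean, sd):
    """Intervals around the mean that hold about 68%, 95% and 99.7% of a
    normal distribution (1, 2 and 3 standard deviations)."""
    return {pct: (mean - k * sd, mean + k * sd)
            for k, pct in [(1, 68), (2, 95), (3, 99.7)]}


for pct, (low, high) in rule_intervals(84, 6).items():
    print(f"about {pct}% of the data lies between {low} and {high}")
```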

68-95-99.7 Rule

Example: A data set is normally distributed with a mean of 15 and a standard deviation of 3. Using the 68-95-99.7 rule, what percent of the data lies below 15?


68-95-99.7 Rule

Example: A data set is normally distributed with a mean of 15 and a standard deviation of 3. Using the 68-95-99.7 rule, what percent of the data lies between 12 and 21?

68-95-99.7 Rule

Example: A data set is normally distributed with a mean of 15 and a standard deviation of 3. Using the 68-95-99.7 rule, what percent of the data lies above 9?
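The three questions above can be answered by splitting the rule into halves: for example, 100 − 68 = 32% lies outside 1 standard deviation, so 16% lies below −1. The sketch below builds a cumulative table from that reasoning (the names are hypothetical) and only accepts values that sit exactly on a whole number of standard deviations, which is exactly the limitation raised in the next question.

```python
# Percent of a normal distribution lying below mean + z*sd for whole-number z,
# from the 68-95-99.7 rule: e.g. 100 - 68 = 32% lies outside 1 sd,
# so half of that (16%) lies below -1 sd.
PERCENT_BELOW = {-3: 0.15, -2: 2.5, -1: 16.0, 0: 50.0, 1: 84.0, 2: 97.5, 3: 99.85}


def whole_z(x, mean, sd):
    """z-score of x, which must be a whole number from -3 to 3 for this rule."""
    z = (x - mean) / sd
    if z != round(z) or abs(z) > 3:
        raise ValueError("value must sit 0, 1, 2 or 3 standard deviations from the mean")
    return round(z)


def percent_below(x, mean, sd):
    return PERCENT_BELOW[whole_z(x, mean, sd)]


def percent_above(x, mean, sd):
    return 100 - percent_below(x, mean, sd)


def percent_between(a, b, mean, sd):
    return percent_below(b, mean, sd) - percent_below(a, mean, sd)


print(percent_below(15, 15, 3))        # below the mean
print(percent_between(12, 21, 15, 3))  # from -1 sd to +2 sd
print(percent_above(9, 15, 3))         # above -2 sd
```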

68-95-99.7 Rule

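What if a value does not lie right on one of these marks? For example, what percent is between 8 and 16? What percent is greater than 10? And so on.

[Figure: normal curve with mean 15 and standard deviation 3, with the horizontal axis marked at 6, 9, 12, 15, 18, 21 and 24.]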

Standard Scores (z-scores)

  • The 68-95-99.7 rule is very limited. We would like to be able to find percentiles for any value.
  • The number of standard deviations that a data value lies above or below the mean is called its standard score (or z-score), defined by

$$z = \frac{x - \bar{x}}{\sigma}$$

where x is your data point, x̄ is the mean, and σ is the standard deviation.


Standard Scores (ACT)

Example: If the mean were 21 with a standard deviation of 4.7 for scores on a nationwide test, find the z-score for a score of 30. What does this mean?

Standard Scores (SAT)

Example: If the mean were 1000 with a standard deviation of 200 for scores on a nationwide test, find the z-score for a score of 910. What does this mean?
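The z-score formula translates directly into a one-line function (the name `z_score` is hypothetical). Applied to the ACT and SAT examples, it shows one score above the mean and one below it.

```python
def z_score(x, mean, sd):
    """Standard score: how many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


act = z_score(30, 21, 4.7)     # ACT example
sat = z_score(910, 1000, 200)  # SAT example
print(f"ACT score 30:  z = {act:.2f}")
print(f"SAT score 910: z = {sat:.2f}")
```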

Standard Scores and Percentiles

  • The nth percentile of a data set is the smallest value in the set with the property that n% of the data are less than or equal to it.

For example, if I am in the 95th percentile for height, then 95% of the population is shorter than or as tall as me.

  • We can use z-scores and the chart on the following page to find percentiles.

Standard Scores and Percentiles

Example: Take the ACT result where the z-score was 1.91 for a score of 30. Find that value on the chart and tell me what percentile a score of 30 would have.

So about 97.19% of all test takers scored below a 30.
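The chart itself is not reproduced in these notes. As a substitute, assuming Python 3.8 or later, `statistics.NormalDist` provides the same cumulative area: `cdf(z)` on the standard normal curve gives the fraction of values below z. Printed tables round their entries, so the last digit may differ slightly from a chart lookup.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

z = 1.91  # z-score for an ACT score of 30, from the example above
percentile = standard_normal.cdf(z) * 100  # percent of values at or below z
print(f"z = {z}: about {percentile:.2f}% of scores lie at or below this value")
```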


Standard Scores and Percentiles

Example: Cholesterol levels for men age 18 to 24 are normally distributed with a mean of 178 and a standard deviation of 41. (a) In what percentile is a man with a cholesterol level of 190? (b) In what percentile is a man with a cholesterol level of 120?
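As a sketch using `statistics.NormalDist` in place of the chart, a normal curve can be built with the given mean and standard deviation, and its `cdf` read at each cholesterol level. The code also prints the z-score you would look up by hand; rounding z to two decimals before using a table can shift the answer slightly.

```python
from statistics import NormalDist

# Cholesterol levels for men age 18 to 24, as given in the example.
cholesterol = NormalDist(mu=178, sigma=41)

for level in (190, 120):
    z = (level - cholesterol.mean) / cholesterol.stdev
    percentile = cholesterol.cdf(level) * 100
    print(f"level {level}: z = {z:.2f}, percentile = {percentile:.1f}")
```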

Standard Scores and Percentiles

Example: The heights of American women age 18 to 24 are normally distributed with a mean of 65 inches and a standard deviation of 2.5 inches. (a) In what percentile is a woman with a height of 74 inches? (b) What percent of women are taller than 62 inches?

Standard Scores and Percentiles

(c) In order to serve in the U.S. Army, women must be between 58 and 80 inches tall. What percent of women meet this criterion?
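The three height questions use the three ways of reading a percentile: the area below a value, the area above it (100% minus the area below), and the area between two values (the difference of two areas below). A sketch using `statistics.NormalDist` as a stand-in for the chart:

```python
from statistics import NormalDist

# Heights of American women age 18 to 24, in inches.
heights = NormalDist(mu=65, sigma=2.5)

pct_74 = heights.cdf(74) * 100                          # (a) percentile for 74 inches
taller_than_62 = (1 - heights.cdf(62)) * 100            # (b) percent taller than 62 inches
army_range = (heights.cdf(80) - heights.cdf(58)) * 100  # (c) percent from 58 to 80 inches

print(f"(a) {pct_74:.2f}  (b) {taller_than_62:.2f}%  (c) {army_range:.2f}%")
```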

How this can be used.

Page 1

  • Your handout is a hypothetical list of 50 students applying for admission to a university.
  • All the data is shown, and at the bottom are the mean and standard deviation for each column.
  • Do you notice anything interesting on page 1?

How this can be used.

Page 2

  • Using the mean and standard deviation for each column, we turned each data point into a z-score.
  • Verify the z-score for student A11 HS GPA.

How this can be used.

Page 3

  • At the top of the page are the importance weights for each category at two universities.

▫ Notice that All Saints is big on helping others but cares little about money.
▫ Crass cares deeply about money and donations.

  • The weights were used with the z-scores to calculate each student’s entrance exam score.

▫ For example, the score (using Crass weighting) for student A01

2 .19 1 1.38 1 .55 3 .69 2 2.48 2 6.95 1 .91 18.17

How this can be used.

Page 4

  • We rearranged the students in order from highest to lowest according to their All Saints scores.
  • So who do you now want to accept into the university?
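The notes do not spell out the exact scoring rule. Assuming each entrance score is the sum of weight × z-score across the categories (a plausible reading of the weight and z-score pairs listed for student A01, not confirmed here), the scoring on page 3 and the ranking on page 4 can be sketched as follows. The function names, student IDs and numbers are hypothetical and for illustration only.

```python
def weighted_score(weights, z_scores):
    """Entrance score as the sum of weight * z-score over the categories."""
    if len(weights) != len(z_scores):
        raise ValueError("need exactly one weight per category")
    return sum(w * z for w, z in zip(weights, z_scores))


def rank_students(scores):
    """Student IDs ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical weights and z-scores, for illustration only (not from the handout).
example_weights = [2, 1, 3]
example_z = [0.5, -1.0, 0.2]
print(weighted_score(example_weights, example_z))

# Hypothetical student IDs and scores.
print(rank_students({"S1": 0.6, "S2": 1.2, "S3": -0.3}))
```

Because the z-scores put every category on the same scale, a weight of 3 really does make that category count three times as much as a weight of 1.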