SLIDE 1
ACMS 20340 Statistics for Life Sciences
Chapter 24: One-way Analysis of Variance: Comparing Several Means

Tropical Flowers
Researchers have been studying different varieties of the tropical flower Heliconia on the island of Dominica and
SLIDE 2
SLIDE 3
Tropical Flower Data
SLIDE 4
Comparing Several Means
We’ve compared two means using two-sample t procedures, but what if we have samples from several populations? Here we have three populations of flowers. Each of these has a mean: µ1 for bihai, µ2 for red, and µ3 for yellow. Using what we know, we could perform several two-sample t tests:
◮ Test H0 : µ1 = µ2 to see if the mean lengths of the bihai and red varieties differ.
◮ Test H0 : µ1 = µ3 to see if the mean lengths of the bihai and yellow varieties differ.
◮ Test H0 : µ2 = µ3 to see if the mean lengths of the red and yellow varieties differ.

This method becomes ridiculously cumbersome as we increase the number of populations being compared.
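For concreteness, the three pairwise tests could be run as below. The flower measurements are made up for illustration (the actual Heliconia data are not reproduced in these slides), and `scipy` is one possible tool:

```python
from scipy import stats

# Hypothetical flower-length samples (mm), one per variety
bihai  = [47.1, 46.8, 48.3, 47.5, 46.4]
red    = [39.6, 40.2, 38.9, 39.8, 40.5]
yellow = [36.0, 35.5, 36.8, 36.2, 35.9]

# Three separate two-sample t tests, one per pair of varieties
for name, (a, b) in {
    "bihai vs red":    (bihai, red),
    "bihai vs yellow": (bihai, yellow),
    "red vs yellow":   (red, yellow),
}.items():
    t, p = stats.ttest_ind(a, b)
    print(f"{name}: t = {t:.2f}, P = {p:.4f}")
```

Each call tests one H0 of the form µi = µj, so three calls are needed for three groups; with k groups, the number of pairs grows as k(k − 1)/2.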
SLIDE 5
The Problem with Comparing Several Means
However, there is another issue with this method. Simply put, we cannot safely compare numerous parameters by doing inference for two parameters at a time. Getting three different P-values, one from each test performed, doesn't tell us how likely it is that the three sample means are spread as far apart as they are. Two groups may seem significantly different when considered on their own, but not when all groups are considered together.

For example, consider the tallest and shortest persons in the class. These are simply the extremes of a continuum of heights.
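One way to quantify the problem: if each of the three tests uses significance level α = 0.05, and the tests were independent, the chance of at least one false positive somewhere in the family is 1 − 0.95³ ≈ 0.14, nearly three times the nominal level. A quick check:

```python
# Family-wise error rate for three independent tests at alpha = 0.05
alpha = 0.05
k_tests = 3
fwer = 1 - (1 - alpha) ** k_tests
print(round(fwer, 3))  # 0.143
```

The pairwise tests are not actually independent (each pair shares a sample with the others), so 0.143 is only an approximation, but the qualitative point stands: the more comparisons we make, the more likely some comparison looks "significant" purely by chance.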
SLIDE 6
Handling Multiple Comparisons
Dealing with many comparisons is common in statistics, and an overall measure of all population comparisons is needed.
As with methods such as chi-square, we handle multiple comparisons in two steps:

◮ First, we do an overall test to see if there is good evidence of any differences among the parameters we want to compare.
◮ Second, we perform a detailed follow-up analysis to decide which of the parameters differ and to estimate how large the differences are.

Here we'll consider just the overall test.
SLIDE 7
The Analysis of Variance Hypothesis Test
We consider our tropical flowers once again and formulate the hypotheses. We want to test a null hypothesis of "no difference" among the mean lengths for the three populations of flowers:

H0 : µ1 = µ2 = µ3

As with similar methods of multiple comparisons, we consider an alternative hypothesis that simply states "H0 is not true":

Ha : not all of µ1, µ2, and µ3 are equal

This test is referred to as ANOVA, or ANalysis Of VAriance.
SLIDE 8
Tropical Flowers Conclusion
We can finish up the tropical flower example using software (time to Crunchit!). Crunchit! gives us a test statistic of F = 259.12 and a P-value of P < 0.0001. (More on the F distributions later) There is strong evidence that the population means are not equal. ANOVA doesn’t say which of the means are significantly different, but the bihai variety has longer flowers than the red and yellow varieties.
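For readers without Crunchit!, the same overall test can be run with `scipy.stats.f_oneway`. The measurements below are hypothetical stand-ins (the real data are not reproduced in these slides), so the resulting F will not match the 259.12 reported above:

```python
from scipy import stats

# Hypothetical flower-length samples (mm); the real Heliconia
# data would give F = 259.12
bihai  = [47.1, 46.8, 48.3, 47.5, 46.4]
red    = [39.6, 40.2, 38.9, 39.8, 40.5]
yellow = [36.0, 35.5, 36.8, 36.2, 35.9]

# One overall test of H0: mu1 = mu2 = mu3
f_stat, p_value = stats.f_oneway(bihai, red, yellow)
print(f"F = {f_stat:.2f}, P = {p_value:.6f}")
```

A single call replaces the three pairwise tests and gives one P-value for the overall null hypothesis.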
SLIDE 9
The Idea of ANOVA
When there is this much variation among individuals, we would not be surprised if another set of samples gave different sample means; the observed differences could easily happen by chance. By contrast, when the variation among individuals within each sample is small, it is unlikely that any sample from the first group would have a mean as small as the mean of the second group. That is evidence that there are real differences among the means of the populations.
SLIDE 10
The Idea of ANOVA
Caution: These boxplots and the applet ignore the effect of sample sizes.
◮ Small differences among sample means can be significant if the samples are large.
◮ Large differences among sample means can fail to be significant if the samples are small.

However, the big idea remains: when we ask whether a set of sample means provides evidence for differences among the population means, what matters is not how far apart the sample means are, but how far apart they are relative to the variability of individual observations.
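The big idea can be illustrated by simulation: two sets of samples whose groups are centered at the same values (10, 11, 12), differing only in the spread of individual observations. The group sizes, centers, and spreads are arbitrary choices for this sketch:

```python
import random
from scipy import stats

random.seed(0)  # fixed seed so the sketch is reproducible

def group(mean, sd, n=20):
    """Draw n individuals from a Normal(mean, sd) population."""
    return [random.gauss(mean, sd) for _ in range(n)]

# Same population means, different within-group spread
tight = [group(m, sd=0.5) for m in (10, 11, 12)]  # little individual variation
loose = [group(m, sd=5.0) for m in (10, 11, 12)]  # lots of individual variation

f_tight, p_tight = stats.f_oneway(*tight)
f_loose, p_loose = stats.f_oneway(*loose)
print(f"tight spread: F = {f_tight:.1f}, P = {p_tight:.2g}")
print(f"loose spread: F = {f_loose:.1f}, P = {p_loose:.2g}")
```

The mean-to-mean gaps are the same in both scenarios, but only the tight-spread samples yield a large F: the gaps are large *relative to the variability of individuals* in that case, and small relative to it in the other.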
SLIDE 11
One-Way ANOVA
Analysis of variance is a general method for studying variation in data and has a variety of applications; comparing several means is the simplest. This form of ANOVA is called one-way ANOVA. The name comes from the terminology for completely randomized experimental designs: an experiment has a one-way or completely randomized design if several levels of one factor are being studied and the individuals are randomly assigned to those levels. (There is only one way to group the data.)
◮ One-way: Testing 4 different amounts of fertilizer on seeds to analyze the effect on growth rate.
◮ Two-way: Testing 4 different amounts of fertilizer on 3 different kinds of seeds to analyze the effect on growth rate.
SLIDE 12
The ANOVA F Statistic
As mentioned before, analysis of variance relies on the F distributions to calculate P-values under the assumption that the null hypothesis is true. The ANOVA F statistic takes on the general form

F = (variation among the sample means) / (variation among individuals in the same sample)

The numerator is a variance calculated using the sample means, while the denominator is essentially an average of the sample variances. Like χ2, the F statistic takes on non-negative values; it is zero only when all the sample means are identical. The value of F becomes larger when the sample means are much more variable than individuals in the same sample.
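To make the form of the statistic concrete, here is a sketch that computes the numerator and denominator by hand and checks the result against `scipy.stats.f_oneway`. The data are made up for illustration:

```python
from scipy import stats

groups = [
    [47.1, 46.8, 48.3, 47.5, 46.4],  # hypothetical "bihai" lengths (mm)
    [39.6, 40.2, 38.9, 39.8, 40.5],  # hypothetical "red" lengths
    [36.0, 35.5, 36.8, 36.2, 35.9],  # hypothetical "yellow" lengths
]

k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / N
means = [sum(g) / len(g) for g in groups]

# Numerator: variation among the sample means
# (sum of squares for groups, divided by its degrees of freedom)
ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
msg = ssg / (k - 1)

# Denominator: variation among individuals in the same sample
# (pooled within-group sum of squares, divided by its degrees of freedom)
sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
mse = sse / (N - k)

f_by_hand = msg / mse
f_scipy, _ = stats.f_oneway(*groups)
print(f"F by hand = {f_by_hand:.2f}, scipy F = {f_scipy:.2f}")
```

The two values agree, confirming that `f_oneway` is computing exactly this ratio of mean squares.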
SLIDE 13
The F Distributions
An ANOVA F statistic follows one of the F distributions, a family of right-skewed distributions. Each F distribution has two parameters: the degrees of freedom for the numerator and the degrees of freedom for the denominator. Our notation will be F(df1, df2), where df1 is the numerator degrees of freedom and df2 is the denominator degrees of freedom.
SLIDE 14
Degrees of Freedom
Suppose we wish to compare the means of k populations. For each i ≤ k, we have an SRS of size ni from the ith population. Let N be the total number of observations, N = n1 + n2 + ... + nk. Then
◮ The degrees of freedom in the numerator are k − 1.
◮ The degrees of freedom in the denominator are N − k.
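A quick sketch with hypothetical sample sizes (k = 3 groups of 5 observations each, so N = 15) gives df1 = k − 1 = 2 and df2 = N − k = 12, and a P-value for an observed F can then be read off the F(2, 12) distribution via `scipy.stats.f.sf`:

```python
from scipy import stats

sizes = [5, 5, 5]   # hypothetical sample sizes n_1, n_2, n_3
k = len(sizes)      # number of populations compared
N = sum(sizes)      # total number of observations

df1 = k - 1         # numerator degrees of freedom
df2 = N - k         # denominator degrees of freedom
print(df1, df2)     # 2 12

# P-value for a hypothetical observed F statistic of 6.0:
# the area under the F(df1, df2) curve to the right of 6.0
p = stats.f.sf(6.0, df1, df2)
print(f"P = {p:.4f}")
```

The survival function `f.sf` gives the upper-tail area directly, which is exactly the P-value for the (one-sided) ANOVA F test.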
SLIDE 15
SLIDE 16
SLIDE 17
SLIDE 18
SLIDE 19
SLIDE 20
SLIDE 21
SLIDE 22
Conditions for ANOVA
There are three requirements to be able to apply ANOVA to the data.
◮ We have k independent SRSs, one from each of k populations.
◮ Each of the k populations has a Normal distribution with an unknown mean µi.
◮ All of the populations have the same standard deviation σ, whose value is unknown.

Note that there are k + 1 population parameters we must estimate from the data: the k population means, and the common standard deviation σ.
SLIDE 23
About the Conditions
Independent SRSs: This requirement is familiar to us. Biased sampling or confounding can make inference meaningless. It can be difficult to get a true SRS at times, but getting as close as you can is always important.

Normally distributed populations: This is also familiar to us. No "real" population is exactly Normal, so can we use ANOVA when Normality is violated? Fortunately, like the t procedures, the ANOVA F test is robust: what matters is the Normality of the sampling distribution of the sample means.
SLIDE 24