ANOVA and Multiple Comparisons
Biostatistics BI 45, Saint Anselm College
Background Information
Analysis of Variance - ANOVA
Developed by Fisher while studying agriculture and crop output under different fertilizer treatments. He needed a test that could evaluate differences among three or more means and developed the F distribution for that purpose.
Why - problems with applying the Student's t test to more than 2 sample/population means
number of pairwise comparisons among all means, k(k-1)/2, increases rapidly with an increase in the number of samples/populations (k)
overall alpha level is no longer held at its stated value when more means are added, chances of making at least one Type I error increase with an increase in the number of means being compared
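The two problems above can be put into numbers with a minimal sketch. It assumes, purely for illustration, that the pairwise t tests are independent of one another (they are not strictly independent when they share samples, so the error rates are approximations). The function names are hypothetical.

```python
from math import comb


def pairwise_comparisons(k: int) -> int:
    """Number of distinct pairs among k means: k(k-1)/2."""
    return comb(k, 2)


def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Approximate chance of at least one Type I error when every pair
    of k means is tested at level alpha (assumes independent tests)."""
    c = pairwise_comparisons(k)
    return 1 - (1 - alpha) ** c


# Show how quickly both quantities grow with the number of means k
for k in (2, 3, 4, 5, 10):
    print(k, pairwise_comparisons(k), round(familywise_error_rate(k), 3))
```

With k = 4 feeds there are already 6 pairwise comparisons, and the approximate chance of at least one Type I error at alpha = 0.05 is about 0.26 rather than 0.05.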
Types of ANOVAs
One-way ANOVA
Examines one factor at a time and tests for differences among the levels of that factor. Example from Zar (1999):
Question - is there a difference in the average weight of pigs fed different types of feed?
Factor - pig feed
Levels - different feeds (feed type 1, feed type 2, feed type 3, feed type 4)
Fixed effects or Model I One-way ANOVA
the levels of the factor are specifically chosen by the investigator
example - 4 feed types were specifically chosen
Random effects or Model II One-way ANOVA
the levels of the factor are randomly chosen by the investigator
example - 4 feed types were randomly chosen from many different types of feeds
This course addresses the One-way ANOVA, which can be applied to fixed effects or random effects models
Others
Two-way ANOVA - examines effects of two factors simultaneously, beyond scope of this course
Mechanics of One-way ANOVA
Focus is on analysis of variance - comparison between 2 types of variance
among group variance (variation of the sample means around the grand mean)
note that this is often referred to as the group mean square (MSg) as well as the among group variance
grand mean is the mean for all data
within group variance (pooled variation within each sample, measured around each sample mean)
note that this is often referred to as the error variance or error mean square (MSe) as well as the within group variance (MSw)
evaluation uses a one-tailed F test (F = MSg / MSe) to determine if the among group variance is greater than the within group variance
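The two variances and their ratio can be computed by hand with a short sketch. The weights below are hypothetical numbers chosen to keep the arithmetic easy; they are not the pig data from Zar (1999). The function name one_way_anova is also an assumption made for this illustration.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (MSg, MSe, F, df_among, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total  # mean of all data

    # Among group sum of squares: sample means around the grand mean
    ss_among = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within group (error) sum of squares: each value around its own sample mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_among = k - 1
    df_within = n_total - k
    ms_among = ss_among / df_among      # MSg
    ms_within = ss_within / df_within   # MSe (MSw)
    return ms_among, ms_within, ms_among / ms_within, df_among, df_within


# Hypothetical weights for three feeds (NOT data from Zar 1999)
feeds = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0],
]
msg, mse, f_obs, df1, df2 = one_way_anova(feeds)
print(round(msg, 3), round(mse, 3), round(f_obs, 3), df1, df2)
```

For these numbers MSg = 19, MSe = 1, and F = 19 with 2 and 6 degrees of freedom. The observed F is then compared with the one-tailed critical value of F for those degrees of freedom from a table.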
Assumptions of the ANOVA
homogeneity of variance, the populations sampled have equal variances
can be checked with Bartlett's test for homogeneity of variance
samples come from normal populations
ANOVA is robust enough to handle moderate departures from normality and moderately unequal variances
Problems occur when
heterogeneity of variances is combined with unequal sample sizes
skewness/kurtosis is combined with smaller sample sizes
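The homogeneity of variance assumption listed above can be checked with Bartlett's test. The sketch below computes the test statistic in its natural-logarithm form, including the usual correction factor; textbooks may write the same statistic with base-10 logarithms. The function name and the sample values are hypothetical.

```python
from math import log
from statistics import variance


def bartlett_statistic(groups):
    """Return (corrected Bartlett statistic, degrees of freedom).

    The statistic is compared with a chi-square critical value
    with k - 1 degrees of freedom.
    """
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]         # n_i - 1 for each sample
    variances = [variance(g) for g in groups]  # sample variances s_i^2
    df_total = sum(dfs)                        # N - k

    # Pooled variance: weighted average of the sample variances
    pooled = sum(df * s2 for df, s2 in zip(dfs, variances)) / df_total

    b = df_total * log(pooled) - sum(df * log(s2) for df, s2 in zip(dfs, variances))
    correction = 1 + (sum(1 / df for df in dfs) - 1 / df_total) / (3 * (k - 1))
    return b / correction, k - 1


# Hypothetical samples with sample variances 1 and 4
stat, df = bartlett_statistic([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
print(round(stat, 3), df)
```

If the statistic exceeds the chi-square critical value for k - 1 degrees of freedom, the hypothesis of equal variances is rejected. Note that Bartlett's test is itself sensitive to departures from normality.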
Results of ANOVA
no significant difference between among group variance and within group variance (null hypothesis is not rejected)
this tells us there is no evidence of a difference among means
among group variance is significantly greater than within group variance (null hypothesis is rejected)
this tells us that at least one mean differs from the others, but not which ones are different
How we conduct an ANOVA
Mechanics - Null hypotheses, formulae, interpretation, etc.
Multiple Comparisons Tests - only used if the results of an ANOVA yield a significant difference between among group variance and within group variance
ANOVA results only indicate that a difference exists among means, not which means are different. A multiple comparisons test is needed to detect where the differences lie.
Referred to as post hoc or a posteriori tests because they are used after you know there is a significant difference from the ANOVA
Several types of multiple comparisons tests - can be combined into three broad categories
Generic multiple comparisons tests - evaluate all possible pairs/combinations of means
Tukey's HSD test
Student-Newman-Keuls (SNK) test
unique in that the critical value of q changes with the number of means in the range spanned by the pair being compared, sometimes referred to as the multiple range test
Control group test - evaluates differences between each experimental group and the control group (not differences among experimental groups), can also employ one-tailed tests, which are less common practice with other multiple comparisons tests
Dunnett's test
Multiple contrasts tests - can be used like the traditional tests mentioned above to evaluate differences among pairs of means but are better used to evaluate homogeneous groups of means against other such groups or individual means
Scheffé test
Examples
group of 2 means versus 1 mean
group of 2 means versus group of 2 means
group of 3 means versus 1 mean
group of 3 means versus group of 2 means
etc.
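A contrast such as "group of 2 means versus 1 mean" can be written as a set of coefficients that sum to zero. The sketch below computes the Scheffé test statistic S for one contrast and the critical value it is compared with. The means, sample sizes, and error mean square are hypothetical, the function names are assumptions made for illustration, and the critical F must come from a table.

```python
from math import sqrt


def scheffe_statistic(means, sizes, coeffs, ms_error):
    """S = |sum(c_i * mean_i)| / sqrt(MSe * sum(c_i^2 / n_i))."""
    if abs(sum(coeffs)) > 1e-9:
        raise ValueError("contrast coefficients must sum to zero")
    contrast = sum(c * m for c, m in zip(coeffs, means))
    se = sqrt(ms_error * sum(c * c / n for c, n in zip(coeffs, sizes)))
    return abs(contrast) / se


def scheffe_critical(k, f_critical):
    """Critical S = sqrt((k - 1) * F), where F is the one-tailed critical
    F value for k - 1 and N - k degrees of freedom."""
    return sqrt((k - 1) * f_critical)


# Hypothetical summary values (NOT from Zar 1999): 3 means, n = 3 each, MSe = 1
means = [5.0, 7.0, 10.0]
sizes = [3, 3, 3]

# Group of 2 means (first and second) versus 1 mean (third)
coeffs = [0.5, 0.5, -1.0]

s_obs = scheffe_statistic(means, sizes, coeffs, ms_error=1.0)
s_crit = scheffe_critical(k=3, f_critical=5.14)  # tabled F at alpha 0.05 for 2 and 6 df
print(round(s_obs, 3), round(s_crit, 3), s_obs > s_crit)
```

The other examples in the list follow the same pattern: for a group of 3 means versus a group of 2 means, the coefficients would be 1/3, 1/3, 1/3 against -1/2, -1/2.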
How we conduct Multiple Comparisons tests
General Overview of how to conduct these tests
Procedures in these tests that are similar:
All tests involve pairwise comparisons of means
Rank order the means for comparisons
highest to lowest (more common) or lowest to highest (Dunnett's test)
set up statistical hypotheses for each pairwise comparison
Calculate an observed q value (difference between the two means divided by a standard error), analogous to the t statistic and z score
Compare with a critical value and reject or do not reject the null hypothesis for a pair of means
Use the enclosure rule in all tests (if any pair of means is not different, then all means enclosed within that comparison are also not different)
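The shared steps above can be sketched for a Tukey-style comparison with equal sample sizes, where the standard error is sqrt(MSe / n). The means, n, and MSe are hypothetical, the function name is an assumption, and the critical q is passed in because it must be looked up in a table of the studentized range for the error degrees of freedom and number of means. For brevity the sketch tests every pair and does not apply the enclosure rule.

```python
from itertools import combinations
from math import sqrt


def tukey_pairwise(means, n, ms_error, q_critical):
    """means: dict mapping a label to its sample mean (equal n assumed).

    Returns a list of (higher label, lower label, q, reject H0?) tuples.
    """
    se = sqrt(ms_error / n)  # standard error for equal sample sizes
    # Rank order the means from highest to lowest
    ranked = sorted(means.items(), key=lambda item: item[1], reverse=True)
    results = []
    for (hi_label, hi_mean), (lo_label, lo_mean) in combinations(ranked, 2):
        q = (hi_mean - lo_mean) / se  # observed q for this pair
        results.append((hi_label, lo_label, q, q > q_critical))
    return results


# Hypothetical summary values (NOT from Zar 1999)
sample_means = {"feed 1": 5.0, "feed 2": 7.0, "feed 3": 10.0}

# q_critical here is the tabled value assumed for alpha 0.05, 6 error df, 3 means
for hi, lo, q, reject in tukey_pairwise(sample_means, n=3, ms_error=1.0, q_critical=4.339):
    print(f"{hi} vs {lo}: q = {q:.3f}, reject H0: {reject}")
```

The other tests change the pieces noted in the outline: the SNK test swaps in a critical q that depends on how many means the pair spans, and Dunnett's test compares each experimental mean only with the control.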
Procedures in these tests that are different:
how the means are rank ordered (changes in the Dunnett test or one-tailed tests)
the standard error term differs among tests
Mechanics of each Procedure - Null hypotheses, formulae, interpretation, etc.
Multiple Comparisons - Examples from Zar (1999)
Trademark and Disclaimers
Copyright © 2001 Jay Pitocchelli. All rights reserved. The contents of this page are the intellectual property of Dr. Jay Pitocchelli for distribution to students enrolled in Biostatistics BI 45 at Saint Anselm College. These pages may not be copied, photocopied, reproduced, translated, or published in any electronic or machine-readable form in whole or in part without prior written approval of Jay Pitocchelli. Students enrolled in Biostatistics BI 45 at Saint Anselm College have permission to print this material for their lecture notes. All formulae and critical values from: Zar, J. H. 1999. Biostatistical Analysis. (4th ed.). New Jersey, Prentice Hall.