Choosing the correct statistical test for a data set involves understanding what types of variables you’re testing, such as whether the independent and dependent variables are categorical or continuous, and checking whether the data meet the assumptions of particular statistical tests. Thoroughly checking assumptions is beyond the scope of this online resource, though some assumptions are listed at the end of each chapter in this statistics manual. Before attempting to publish your work, you should consult a statistics text (see the references cited) and/or a professional statistician to be sure you’re using the correct test.
This page focuses on several parametric tests. For now, let's assume that your data meet the assumptions for one of these tests. If so, the table below should help you decide which statistical test is best suited to your data.
General guide to choosing statistical tests based on whether the independent and dependent variables are categorical or continuous.
Independent Variable | Dependent Variable | Statistical Test
categorical (two values or groups being compared) | continuous | t-Test or paired t-Test
categorical (three or more values or groups being compared) | continuous | Anova
continuous | continuous | Regression Analysis
continuous | categorical | logistic regression (not covered in this online guide; see Gotelli and Ellison 2004)
categorical | categorical | chi-square analysis
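The table can be read as a simple decision rule. The Python sketch below encodes it; the function name `suggest_test`, its parameters, and the returned labels are illustrative assumptions rather than part of this guide, and, like the table, it assumes the data already meet the chosen test's assumptions. It cannot choose between a t-Test and a paired t-Test, since that depends on whether the two groups are independent or consist of paired observations.

```python
def suggest_test(independent: str, dependent: str, n_groups: int = 2) -> str:
    """Return the test the table suggests for the given variable types.

    Hypothetical helper for illustration only.
    independent, dependent: "categorical" or "continuous".
    n_groups: number of groups compared when the independent variable is categorical.
    """
    iv, dv = independent.lower(), dependent.lower()
    valid = {"categorical", "continuous"}
    if iv not in valid or dv not in valid:
        raise ValueError("variable types must be 'categorical' or 'continuous'")

    if iv == "categorical" and dv == "continuous":
        if n_groups < 2:
            raise ValueError("at least two groups are needed for a comparison")
        # Two groups -> t-Test (paired or not); three or more -> Anova.
        return "t-Test or paired t-Test" if n_groups == 2 else "Anova"
    if iv == "continuous" and dv == "continuous":
        return "Regression Analysis"
    if iv == "continuous" and dv == "categorical":
        return "logistic regression"
    # Remaining case: both variables categorical.
    return "chi-square analysis"


print(suggest_test("categorical", "continuous", n_groups=4))  # Anova
```

For example, comparing a continuous measurement among four groups, as in the last line, returns "Anova", matching the second row of the table.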
Parametric Statistics
All parametric tests rely on the assumption that the data were sampled from a specified probability distribution. A probability distribution describes how likely each possible outcome is; for a continuous variable it is defined by a mathematical equation called a probability density function (PDF). If data follow a known PDF, they can be characterized with relatively few parameters, and the calculations needed to perform a statistical test are relatively simple. Many parametric tests assume that data follow the normal distribution, and fortunately many biological and ecological measurements are approximately normally distributed.
The Normal Distribution
The PDF for the normal distribution (see graph below) shows the relative frequency of different values. Values near the mean are the most common, while values farther from the mean, in the tails of the distribution, are less common. Consequently, if you collect data on a variable that follows the normal distribution (such as height in humans), most measurements will be relatively close to the mean, and extremely low or high values will be rare.
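To put numbers on "most measurements will be relatively close to the mean", the sketch below uses Python's standard-library `statistics.NormalDist` to compute the probability that a normally distributed value falls within 1, 2, or 3 standard deviations of the mean. The helper name `fraction_within` is a hypothetical choice for this illustration. For any normal distribution these proportions are about 68%, 95%, and 99.7%, whatever the particular mean and standard deviation.

```python
from statistics import NormalDist


def fraction_within(k: float, dist: NormalDist = NormalDist(mu=0.0, sigma=1.0)) -> float:
    """Probability that a value drawn from `dist` lies within k standard deviations of its mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    # The area under the PDF between the two bounds equals the difference of the CDF values.
    return dist.cdf(upper) - dist.cdf(lower)


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {fraction_within(k):.4f}")
# within 1 SD of the mean: 0.6827
# within 2 SD of the mean: 0.9545
# within 3 SD of the mean: 0.9973
```

Only about 0.3% of values lie more than three standard deviations from the mean, which is why extremely low or high measurements are rare.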
Graph of the PDF for the normal distribution. The x-axis corresponds to the measurement of interest (such as height in humans), and the y-axis is the relative frequency (probability density) of each measurement.
Under the normal distribution, the mean value occurs at the center of the curve and has the highest frequency. The standard deviation measures the width or "spread" of the distribution.
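The role of the two parameters is visible in the standard formula for the normal PDF. The symbols are labels introduced here, not taken from this guide: $\mu$ is the mean and $\sigma$ is the standard deviation.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```

The curve peaks at $x = \mu$, where $f(\mu) = 1/(\sigma\sqrt{2\pi})$, and is symmetric about that point. A larger $\sigma$ produces a wider, flatter curve. Apart from $x$ itself, $\mu$ and $\sigma$ are the only quantities in the formula.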
Because data that follow the normal distribution can be characterized by just these two parameters (mean and standard deviation), statistical testing is simpler than for data that don't follow a known distribution and therefore can't be summarized by a few simple parameters. The t-Test and Anova both assume that the data are normally distributed.
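Fitting a normal distribution to data reduces to estimating those two parameters. The sketch below uses Python's standard-library `statistics.NormalDist.from_samples`, which estimates the mean and the sample standard deviation. The five measurements are invented for illustration and are not real data. Fitting the curve does not test whether the normality assumption actually holds; it only assumes it.

```python
from statistics import NormalDist

# Hypothetical measurements (e.g., heights in cm), invented for illustration only.
sample = [160.0, 165.0, 170.0, 175.0, 180.0]

# Estimate the two parameters that fully characterize a normal distribution.
fitted = NormalDist.from_samples(sample)
print(f"mean = {fitted.mean:.1f}, standard deviation = {fitted.stdev:.2f}")
# mean = 170.0, standard deviation = 7.91

# With only these two numbers, the fitted model gives the probability of any range,
# here the chance of a value above 185 if the normal model is correct.
print(f"P(value > 185) = {1 - fitted.cdf(185.0):.3f}")
```

Both printed quantities come from the same two fitted parameters. This is the practical advantage of a known distribution that the paragraph above describes.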