Chapter 1 – Comparing the Means of Two Groups of Numbers:
Scatterplots and the t-Test

INTRODUCTION
One of the most common questions encountered in biological and ecological research is whether the averages or mean values for two groups of numbers are different from each other. (Throughout this manual average and mean both refer to arithmetic mean – for definitions, see glossary.) To answer this type of question, it is tempting to simply calculate the mean values, compare them, and if one seems larger than the other, conclude that the means are different. However, it is not possible to draw sound conclusions about overall differences between two groups of numbers just by calculating mean values. It is important to consider the spread or variation within each group of numbers. Tools for considering variation include simple descriptive statistics, graphing using scatterplots, and the t-Test.

BACKGROUND EXAMPLE

Research Question: Is there a meaningful difference in the average weight of mountain lions in the northern vs. southern Rocky Mountains?

In order to address this question, let's consider a hypothetical study by the US Department of Fish and Wildlife in which mountain lions have been captured, weighed and released. We access the data and calculate means of 34 kgs. for the northern population and 31 kgs. for the southern population. (For this example, let's assume that these captured mountain lions represent an unbiased random sample of the mountain lion populations.) Based on the averages we've calculated, can we conclude that mountain lions in the northern Rockies are larger than those in the south? ABSOLUTELY NOT! It is impossible to make any reasonable conclusions about differences between two groups of numbers based only on mean values – we also need to know something about the spread or variation within each group!

Let's look more closely at the mountain lion data to see how variation can influence our interpretation. We'll consider two hypothetical examples – in both examples the means for the northern and southern populations are 34 kgs. and 31 kgs. However, as Table 1.1 shows, the amount of variation in mountain lion weight is very different in the two examples.  Before we start looking closely at the numbers, let's quickly define range and overlap. Range is defined as the difference between the maximum value and the minimum value in a group of numbers. For example, a group of numbers with a maximum value of 9 and a minimum value of 2 has a range of 7. Range is related to the amount of spread or variation within a group of numbers - in general, the greater the range in a group of numbers, the greater the variation in that group.

Overlap between two groups of numbers is determined by comparing the maximum and minimum values of the two groups. If the minimum value of one group is greater than the maximum value of the other group, there is no overlap. On the other hand, if the maximum and minimum values of two groups of numbers are identical, their ranges overlap entirely. As you might expect, the greater the overlap between two groups of numbers, the less likely they will have significantly different mean values.

Table 1.2 shows the maximum values, the minimum values, and the ranges in the weights for the two examples. First let's consider the data in Example 1. Here the weights vary from 26.0 to 41.0 for both populations (a range of 15), and the ranges overlap completely. By closely examining the raw data (Table 1.1a), it should be clear that individuals from the northern population are not always larger than those from the south. For example, four of the ten mountain lions (data shown in bold in Table 1.1a) in the northern population are smaller than the mean of 31 kgs. for the southern population. Likewise, three of the ten mountain lions (data shown in bold) in the southern population are larger than the mean of 34 kgs. for the northern population. If we choose two mountain lions at random, one from the north and one from the south, there is a reasonable chance that the southern individual would be the larger of the two.
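
As a rough illustration of range and overlap, the following Python sketch computes both for two small groups of numbers. The weights are made up for illustration (they are not the values in Table 1.1), and the helper names `value_range` and `overlap_interval` are hypothetical.

```python
def value_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)


def overlap_interval(group_a, group_b):
    """Return (low, high) of the interval shared by both groups, or None if there is no overlap."""
    low = max(min(group_a), min(group_b))
    high = min(max(group_a), max(group_b))
    return (low, high) if low <= high else None


# Made-up weights in kg (NOT the data from Table 1.1).
north = [26.0, 29.5, 33.0, 36.5, 41.0]
south = [26.0, 28.0, 30.5, 35.0, 41.0]

print(value_range([2, 9]))                           # the example from the text: 9 - 2 = 7
print(value_range(north))                            # 15.0
print(overlap_interval(north, south))                # (26.0, 41.0): the ranges overlap completely
print(overlap_interval([33.0, 35.5], [29.5, 32.5]))  # None: no overlap
```

Comparing only minimum and maximum values is a coarse check. Two groups can share a wide interval even when most of their values are well separated, which is one reason the text goes on to examine the raw data.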

Now let's look at Example 2 and see how the situation differs. The variation in weight in Example 2 is relatively low, as reflected by the range of 3 compared to a range of 15 in Example 1. In addition, there is almost no overlap between the two populations in Example 2 – only at the value 32.5 kgs. (data shown in bold in Table 1.1b) is there any overlap. Furthermore, an examination of the raw data (Table 1.1b) reveals that all of the mountain lions in the northern population are bigger than 31 kgs. (the mean weight of southern mountain lions) and all of the mountain lions in the southern population are smaller than 34 kgs. (the mean weight of the northern mountain lions). Based on these observations, in Example 2 it appears that the difference in the means between the two populations represents a meaningful difference in size. On average, northern mountain lions are in fact larger than southern mountain lions (later we will confirm this with a statistical test). If we select two mountain lions at random, one from the northern population and one from the southern population, chances are very high that the northern mountain lion will be the bigger of the two.

So returning to Example 1, why did the means for the two populations differ even though there is no meaningful size difference between the two populations? In Example 1 the difference between 34 and 31 is probably the result of random chance rather than a real difference between the two populations.

Random Chance
Does this lack of "meaningful difference" suggest that the number 34 is actually not different than the number 31? No . . . it simply suggests that the calculated means of 34 kgs. and 31 kgs. do not represent an overall size difference between these two mountain lion populations. If a different set of mountain lions from the north and south had been weighed, we almost certainly would have ended up with different estimates for the means for each population (our calculated averages would not be exactly 34 and 31 with a different set of mountain lions). In fact, we may have even ended up with a greater mean value for the southern population. This scenario is similar to flipping a coin 100 times. The result will rarely be exactly 50 heads and 50 tails . . . sometimes there will be a few more heads than tails, other times a few more tails than heads. For example, recording 54 heads and 46 tails probably doesn't mean that a coin is biased – we could have just as easily recorded 46 heads and 54 tails. In this case, random chance has simply caused minor differences in the number of heads and tails. For our mountain lions in Example 1, random chance was most likely the cause for the different means in the two populations; there is probably no meaningful size difference. On the other hand, in Example 2 the weights of individuals from the two populations are so different that these differences are probably not the result of random chance – the difference is meaningful, and as we will see, "statistically significant."
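
The coin-flipping analogy can be tried out directly. The short Python sketch below simulates several sets of 100 flips of a fair coin; the function name `count_heads` and the seed value are arbitrary choices made for this illustration.

```python
import random


def count_heads(n_flips, rng):
    """Flip a fair coin n_flips times and return the number of heads."""
    return sum(1 for _ in range(n_flips) if rng.random() < 0.5)


rng = random.Random(1)  # fixed seed (arbitrary) so that a run can be repeated
for trial in range(5):
    heads = count_heads(100, rng)
    print(f"trial {trial + 1}: {heads} heads, {100 - heads} tails")
```

Each trial uses the same unbiased coin, so any departure from 50 heads and 50 tails in the printed counts comes from random chance alone.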

ENTERING AND DESCRIBING DATA
So how do we determine objectively whether 1. the difference between the means for two groups of numbers is due to random chance OR 2. the difference is probably not due to random chance and therefore reflects something meaningful or interesting? The answer is to use graphs to visualize the data and a statistical test (such as a t-Test) to get a quantitative estimate of the likelihood that random chance is causing the differences between the means. In order to use MS Excel to graph and test our data, first we need to enter the data into a spreadsheet.

Entering the Data

Table 1.3. Spreadsheet of raw data corresponding to Example 1.

Describing the Data
Once you have entered the data, it is helpful to calculate some basic descriptive statistics. Before beginning, save a new copy of your data by using the "File, Save As" command and give it the name "data analyses".

To summarize your data, I recommend calculating the statistics shown in Table 1.4. The mean values will summarize the "location" of your data. The standard deviation, min value, and max value will summarize the "spread" or variation in your data (see the Introduction to this manual for a discussion of location and spread). I also recommend doing a "count" on your data. Count is simply the total sample size – the number of observations or data points in the group of numbers. On small data sets the count or sample size may seem obvious, but for larger sample sizes it is helpful to have count appear in your summary table.

Table 1.4. Raw data from Example 1 and formulas for calculating summary statistics.

In order to calculate these summary statistics, you will need to enter formulas into the Excel spreadsheet. If you have already used Excel formulas, this should be relatively straightforward. If not, you will see that with a little practice, it's pretty easy.

So, type in the information shown in Table 1.4 to the right of your raw data. To use formulas, type text into each cell exactly as it appears in the 4th and 5th columns of the table. Once you type "=", MS Excel goes into formula mode and anything you type or click will get entered into the formula. If you type carefully, the formulas in Table 1.4 will work. When first using formulas in spreadsheets, patience and practice are helpful.

A useful shortcut to know when entering text within parentheses (such as "A3:A12") is to click and drag down the range of values you wish to enter. However, regardless of how you enter your formulas, it is important to check that you have entered them correctly. To display your formulas in your spreadsheet, hold down the ctrl and ` buttons (the ` button is usually on the upper left corner of the keyboard). To switch back to viewing the calculated values, hold down ctrl and ` again. Once you have entered your formulas, check to be sure that your calculated values appear as shown in Table 1.5. If not, you have made a mistake in entering your formulas (the formulas will only work if you've set up and entered your data exactly as shown in Tables 1.3 – 1.5).
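
For readers who want to cross-check their spreadsheet values, here is a minimal Python sketch of the same five summary statistics. The weights are made up (they are not the data in Table 1.3), the function name `describe` is hypothetical, and the Excel function names in the comments are the standard worksheet functions for these statistics, offered as an assumption about what Table 1.4 contains.

```python
import statistics

# Made-up weights in kg (NOT the data from Table 1.3).
north = [26.0, 29.5, 31.0, 33.0, 34.5, 36.0, 38.0, 41.0]


def describe(values):
    """Return the summary statistics recommended in the text."""
    return {
        "mean": statistics.mean(values),      # location; Excel: =AVERAGE(range)
        "std_dev": statistics.stdev(values),  # sample standard deviation; Excel: =STDEV(range)
        "min": min(values),                   # Excel: =MIN(range)
        "max": max(values),                   # Excel: =MAX(range)
        "count": len(values),                 # sample size; Excel: =COUNT(range)
    }


for name, value in describe(north).items():
    print(f"{name}: {value}")
```

`statistics.stdev` divides by n - 1 (the sample standard deviation), which is the usual choice when the data are a sample from a larger population.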

Table 1.5. Raw data and descriptive statistics showing calculated values for Example 1.

After summarizing your data in a table, it is extremely helpful to visualize it with a graph. It is much easier to observe range, overlap, and the overall pattern in your raw data with a graph than with a data table. It is also much easier for a reader or an audience to understand your data by looking at a graph than by looking at a table of numbers.

GRAPHING THE DATA
The most straightforward way to visualize the data for two groups of numbers is to use a scatterplot. This type of plot has the advantage of showing all of the raw data rather than grouping it into categories as is done in a frequency distribution or histogram (see Appendix V).

A scatterplot typically includes an independent variable along the x-axis and a dependent variable along the y-axis. In the mountain lion example, the independent variable is the categorical variable source population which has two possible values (northern or southern). The dependent variable is the continuous variable mountain lion weight.

Making the Graph
In order to make a scatterplot in MS Excel, all of the measured values for the dependent variable need to be in the same column. In the mountain lion data set, this means that you need to combine all the weight values for both populations into one column.

I recommend setting up a second worksheet in your MS Excel workbook. Give Sheet2 a new title such as "scatterplot" (remember, you can re-name worksheets by double clicking on the "Sheet" tab at the bottom of your screen). As shown in Table 1.6, title column 1 "category" and column 2 "mountain lion weight". You then need to know how many categories or groups there are (for the mountain lion examples, there are 2 groups – the northern and southern populations). You also need to know how many data points you have within each group (10 mountain lions per population). Enter "1" into the first ten rows in the "category" column and "2" into the next ten rows as shown in Table 1.6. The "1" and "2" are category variables that stand for the northern and southern populations.

Table 1.6. Correct format for creating scatterplot for data from Example 1.

To fill your "mountain lion weight" column, you can copy and paste the data for the northern mountain lion population into the first ten rows of that column and the data for the southern mountain lion population into the next ten rows. There are two very convenient shortcuts that will help at this stage: after highlighting the cells you want to copy by clicking and dragging, hold down the ctrl and c keys (this is the copy shortcut). Next highlight the first cell in the row or column where you want to paste the data and hold down the ctrl and v keys (this is the paste shortcut). Don't forget to periodically save your data by simultaneously holding down the ctrl and s keys! Now your data are formatted correctly for creating a scatterplot.
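
The two-column layout described above (a category code next to each weight) can be sketched in Python as follows. The weights are made up and shortened to three per group for readability, and `to_long_format` is a hypothetical helper name.

```python
# Made-up weights in kg (NOT the data from Table 1.1).
north = [33.0, 35.5, 34.0]
south = [30.0, 32.5, 31.0]


def to_long_format(groups):
    """Stack several groups into (category, value) rows.

    Categories are numbered 1, 2, ... in the order the groups are given.
    """
    rows = []
    for category, values in enumerate(groups, start=1):
        for value in values:
            rows.append((category, value))
    return rows


rows = to_long_format([north, south])
print("category\tmountain lion weight")
for category, weight in rows:
    print(f"{category}\t{weight}")
```

Each printed row corresponds to one spreadsheet row: category 1 stands for the northern population and category 2 for the southern population.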

# Directions for Formatting Your Scatterplot

Figure 1.1. Scatterplot showing mountain lion weight for the two populations in Example 1.

Figure 1.2. Scatterplot showing mountain lion weight for the two populations in Example 2.

There are many types of scatterplots, and interpretation is largely a matter of experience and practice. However, there are some general guidelines that will help you to interpret scatterplots that represent data from two groups of numbers (such as the scatterplots in Figures 1.1 and 1.2 - scroll up to see them). The first thing to look for is whether there is overlap between the two groups of numbers. The more overlap, the less likely the two groups are different; the less overlap, the more likely they are different. It is also helpful to look at the amount of variation or spread in the numbers for each group. In other words, within each group, are the values all clustered near each other, or are they spread out?

It should be relatively clear just by looking at Figure 1.1 that mountain lion weight for the northern population is not consistently greater than for the southern population. There is a lot of overlap between the two populations and the data are spread out. In fact it should be easy to see (as we discovered by examining the raw data from the original data tables) that four of the ten mountain lions in the northern population are smaller than the mean of 31 kgs. for the southern population and three of the ten mountain lions in the southern population are larger than the mean of 34 kgs. for the northern population. On the other hand, Figure 1.2 shows that for Example 2, there is little overlap between the two populations - almost all of the mountain lions from the northern population weigh more than those from the southern population. Of course, scatterplots are most useful for interpreting your data when they are combined with a statistical test like the t-Test described in the following section.

Important Note on Hidden Data
Another extremely important issue to be aware of when creating and interpreting scatterplots is hidden data. When there are two or more identical data points (such as two or more mountain lions that have the same weight), scatterplots created in MS Excel only show one data point. It is critical to find hidden data points and represent them on your graph because multiple identical data points can radically change the interpretation of a scatterplot. You should always visually scan your raw data for identical data points. Appendix VII describes how to find hidden data and modify scatterplots to include them.
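
Scanning by eye becomes unreliable for larger data sets, so a programmatic check can help. The Python sketch below is one possible approach and is not the procedure from Appendix VII; the data are made up and `find_hidden_points` is a hypothetical helper name. It reports every (category, weight) pair that occurs more than once and would therefore be drawn as a single point.

```python
from collections import Counter


def find_hidden_points(rows):
    """Return {(category, value): count} for every point that occurs more than once."""
    counts = Counter(rows)
    return {point: n for point, n in counts.items() if n > 1}


# Made-up (category, weight) rows; two northern lions share the weight 33.0.
rows = [(1, 33.0), (1, 35.5), (1, 33.0), (2, 30.0), (2, 32.5)]
print(find_hidden_points(rows))  # {(1, 33.0): 2}
```

Identical weights in different categories are not reported, because they are plotted at different positions along the x-axis and do not hide each other.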

TESTING THE DATA
Once you've visualized your data with a scatterplot, you should have a much better idea of whether there are real differences between your two groups of numbers. However, to make an objective assessment of whether the differences are meaningful (or in technical terms, to estimate the probability that the difference between the means in your two groups of numbers is the result of random chance), you need to perform a statistical test. For now, let's assume that our data meet the appropriate assumptions and that a t-Test is the correct test.

# Directions for Doing a t-Test

Table 1.7. Example of properly formatted output from a t-Test.

• Mean: These values are the calculated averages or arithmetic means for each of the two groups of numbers.
• Variance: This is a measure of the variation within each group of numbers. The greater the value for variance, the greater the variation. It is also helpful to know that the variance = the standard deviation squared. If you wanted to know the standard deviation, you could take the square root of the variance.
• Observations: This is simply the sample size or the number of data points within each group.
• df: This stands for "degrees of freedom" and is related to sample size. It is necessary when you use a statistical table to find your p-value. Even though you have used the computer to find p, it is still important to report df. In general, the greater your df, the better you are able to detect differences between means.
• p-value: This is the "punch-line" - the output that tells you whether the difference between the means in the two groups of data is statistically significant. If p < 0.05, then the difference between the means is statistically significant. If p > 0.05, the difference between the means is not statistically significant. It is also helpful to know what p stands for: loosely speaking, it is the probability that the difference between the means is due to random chance (see Introduction on Critical Concepts). More precisely, it is the probability of seeing a difference at least as large as the one observed if random chance alone were at work. The lower the value of p, the more likely that something non-random (and possibly biologically or ecologically interesting) is causing the difference between the means. In the comparison of the weights of mountain lions in Example 1, the p-value is 0.20 (which is greater than 0.05), meaning that random chance alone would produce a difference this large about 20% of the time. Therefore, the data do not show a meaningful difference between the two groups of numbers; the difference is not "statistically significant."
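
Several of the output rows described above are simple to reproduce by hand. The Python sketch below computes the mean, variance, and number of observations for each group, plus df for the equal-variance test. The weights are made up (chosen so that the means are 34 and 31; they are not the data from Table 1.1), and `summarize` is a hypothetical helper name.

```python
import math
import statistics

# Made-up weights in kg (NOT the data from Table 1.1).
north = [33.0, 35.5, 34.0, 33.5, 34.0]
south = [30.0, 32.5, 31.0, 30.5, 31.0]


def summarize(values):
    """Mean, sample variance, and number of observations for one group."""
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),
        "observations": len(values),
    }


n_summary = summarize(north)
s_summary = summarize(south)

# For the two-sample test assuming equal variances, df = n1 + n2 - 2.
df = n_summary["observations"] + s_summary["observations"] - 2

print("north:", n_summary)
print("south:", s_summary)
print("df =", df)

# The standard deviation is the square root of the variance.
print(math.sqrt(n_summary["variance"]), statistics.stdev(north))
```

The p-value itself is not computed here because it requires the t distribution, which is not part of Python's standard library; a spreadsheet or a statistics package is the practical route for that step.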

Repeat the t-Test on the data from Example 2. Your formatted output should look like Table 1.8. Why is the result different from Example 1 even though the mean values for the northern and southern populations in the two examples are the same? If you are unsure how to answer this question, I recommend re-reading the Introduction and Chapter 1 and talking to other students and your professor. Once you fully understand these examples and the interpretation, you are well on your way to understanding some of the most important concepts in statistics!

Table 1.8. The output for a t-Test on the data from Example 2. Note that the means are exactly the same as in Table 1.7, but the variances are much lower and the p-value of 0.0000032 shows that the difference between the means is highly significant.

REVIEW OF KEY CONCEPTS
• It is impossible to determine whether two groups of numbers are meaningfully different in terms of their "location" or "average values" just by comparing the calculated mean values. The "spread" or "variation" within each group must also be considered.
• The lower the range and the less the overlap between the two groups of numbers (and therefore the lower the variation), the more likely there is a significant difference between the means.
• Visualizing data by using a scatterplot is a critical tool for data interpretation.
• The lower the probability that the difference between the means is simply due to random chance (in other words, the lower the p-value), the more likely there is something interesting or meaningful causing the difference between the means. In a t-Test, if p < 0.05, the difference between the means is considered statistically significant.

MORE ON THE T-TEST AND ALTERNATIVE TESTS

The Concept Behind the t-Test
What follows is a very brief overview of how the t-Test works. For details, consult Snedecor and Cochran (1980).

In a t-Test an equation is used to estimate a statistical parameter called t. The value of t is influenced by both the difference between the means of the two groups being tested and by the variation within the groups: the greater the difference between the means, the greater the value of t; the less variation within groups, the greater the value of t.

Once t is estimated from the data being tested, it is then compared to a known distribution called the t distribution. If the calculated value of t is greater than a critical t value (found using df from a table of t values), then it is unlikely that the difference between the means of the two groups being tested is due to random chance alone. This is the same as saying that the calculated value of t falls beyond a cutoff within one of the tails of the t distribution; the further the calculated t is from the center of the distribution, the less likely random chance alone is causing the difference between the means.
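
The two paragraphs above can be sketched in Python as follows. This uses the standard pooled-variance form of the two-sample t statistic, which is an assumption about the equation the manual has in mind; the weights are made up, and `pooled_t` is a hypothetical helper name. The critical value 2.306 is the tabulated two-tailed value for df = 8 at the 0.05 level.

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """Return (t, df) for a two-sample t-test assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t, df


# Made-up weights in kg (NOT the data from Table 1.1).
north = [33.0, 35.5, 34.0, 33.5, 34.0]
south = [30.0, 32.5, 31.0, 30.5, 31.0]

t, df = pooled_t(north, south)
CRITICAL_T = 2.306  # two-tailed critical value for df = 8 at the 0.05 level

print(f"t = {t:.3f}, df = {df}")
if abs(t) > CRITICAL_T:
    print("The difference is unlikely to be due to random chance alone.")
else:
    print("The difference could plausibly be due to random chance alone.")
```

The formula shows both influences described in the text: a larger difference between the means raises the numerator, and less variation within the groups shrinks the denominator, so either change increases t.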

Other Types of t-Tests
The t-Test described above is for comparing two groups that have equal variances. Loosely speaking, this means that the variation within one group is similar to the variation within the other group. MS Excel can perform a similar test called a "t-Test: Two-Sample Assuming Unequal Variances" (also found under Data Analysis in the Tools menu). This test should be performed when the variation in the two groups is quite different. Additionally, some texts recommend using the test assuming unequal variances when it is unclear which to use (Gotelli and Ellison, 2004).
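
To show how the unequal-variance version differs, here is a minimal Python sketch of one common formulation (often called Welch's t-test), with df estimated by the Welch-Satterthwaite approximation. Whether MS Excel computes or rounds df in exactly this way is not confirmed here; the data are made up and `welch_t` is a hypothetical helper name.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Return (t, df) for a two-sample t-test that does not assume equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    # Each group's variance of the mean (sample variance divided by sample size).
    v_a = statistics.variance(group_a) / n_a
    v_b = statistics.variance(group_b) / n_b
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(v_a + v_b)
    # Welch-Satterthwaite approximation; df is generally not a whole number.
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))
    return t, df


# Made-up weights in kg: one tightly clustered group, one widely spread group.
tight = [33.0, 35.5, 34.0, 33.5, 34.0]
spread = [26.0, 41.0, 28.0, 35.0, 25.0]

t, df = welch_t(tight, spread)
print(f"t = {t:.3f}, df = {df:.2f}")
```

With these made-up numbers the estimated df comes out well below the n1 + n2 - 2 = 8 used by the equal-variance test, which illustrates how very unequal variation reduces the effective amount of information.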

Another variation on t-Tests is the one-tailed t-Test (the test used on the mountain lion data was a two-tailed test). A two-tailed test is used when there is no prediction about which group should have a higher mean value. It just tests whether the mean values are different from each other. A one-tailed test is used when there is a prediction before the data are collected that one group should have a greater mean value than the other. In MS Excel, the table showing the results of a t-Test includes the p-value for both two-tailed and one-tailed tests. The two-tailed value is shown as "P(T<=t) two-tail" while the one-tailed value is shown as "P(T<=t) one-tail".

Yet another type of t-Test is called a paired t-Test (another option under Data Analysis in the Tools menu). A paired t-Test is often more powerful than other t-Tests but should only be used when data are paired. Pairing refers to situations when observations from the two groups occur in pairs. For example, if you are comparing the size of male vs. female birds and you measure the size of males and females from mating pairs, the data are naturally paired. To perform a paired t-Test, data must be entered so that both observations for a pair occur in the same row. It is then relatively straightforward to use MS Excel's Data Analysis option to perform a "t-Test: Paired Two Sample for Means".
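
The idea behind the paired test is that it works on the difference within each pair rather than on the two groups separately. A minimal Python sketch, using made-up bird sizes (arbitrary units) and a hypothetical helper name `paired_t`:

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, df) for a paired t-test; first[i] and second[i] form one pair."""
    if len(first) != len(second):
        raise ValueError("paired data need the same number of observations in each group")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # t = mean difference divided by the standard error of the differences.
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Made-up sizes for five mating pairs; each position is one pair (one spreadsheet row).
males = [21.0, 23.5, 22.0, 24.0, 20.5]
females = [20.0, 22.0, 21.5, 22.5, 19.5]

t, df = paired_t(males, females)
print(f"t = {t:.3f}, df = {df}")
```

Because only the within-pair differences enter the calculation, variation between pairs (large pairs versus small pairs) does not inflate the denominator, which is why the paired test is often more powerful when the data really are paired.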

When the t-Test is Appropriate
(See Snedecor and Cochran (1980) for thorough discussion of assumptions of the t-Test.)
• When the data are distributed normally (see Appendix III).
• When the data are independent (see glossary for more on independence).
• When the data were collected in an unbiased manner. Though this manual does not go into detail on methods for data collection, it is important to stress that statistical tests cannot correct for data sets that were collected improperly. See Brower et al. (1998) or Krebs (1989) for more on unbiased sampling.

Alternatives to the t-Test
There are several non-parametric tests that have less restrictive assumptions and can be used when data are not distributed normally. For details on how to perform these analyses, see Gotelli and Ellison (2004).
• A Median Test or a Wilcoxon Rank Sum Test is analogous to an unpaired t-Test.
• A Wilcoxon Signed Rank Test is analogous to a paired t-Test.
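
To give a feel for how rank-based tests avoid the normality assumption, the Python sketch below computes the rank sum at the core of the Wilcoxon Rank Sum Test: all values are ranked together and the ranks belonging to one group are added up. It stops short of a p-value, which requires tables or software not covered here; the data are made up and `rank_sum` is a hypothetical helper name.

```python
def rank_sum(group_a, group_b):
    """Sum of the ranks of group_a within the combined data (tied values share their average rank)."""
    combined = sorted(list(group_a) + list(group_b))
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        # Positions i+1 .. j hold the same value; give each the mean of those ranks.
        ranks[combined[i]] = (i + 1 + j) / 2
        i = j
    return sum(ranks[value] for value in group_a)


# Made-up weights in kg.
north = [33.0, 35.5, 34.0]
south = [30.0, 32.5, 31.0]

print(rank_sum(north, south))  # 15.0: north holds the three highest ranks (4 + 5 + 6)
print(rank_sum(south, north))  # 6.0: south holds the three lowest ranks (1 + 2 + 3)
```

Only the ordering of the values matters, so an extreme observation affects the result no more than any other value that is simply the largest.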