APPENDIX VII: Finding and Displaying Hidden Data on Scatterplots
(back to table of contents; back to stats page)

Hidden data occur in scatterplots when two or more data points have the same values for the x and y coordinates; though there are several data points with the same values, the scatterplot only shows one data point on the graph. As we will see below, the presence of hidden data can greatly influence the interpretation of a scatterplot. Therefore, hidden data points need to be identified and displayed.

Note: If you're having problems with hidden data and you're making a scatterplot with an x-axis representing categorical data, you may be better off simply displaying means and error bars (rather than the raw data) as described in Appendix VI. However, if your x-axis represents a continous variable (such as in regression analysis), read on.

Below is a hypothetical data set for fish sampled in lakes with different vegetation complexity (data are different from the example in Chapter 3; data for this example can be found here). Figure VIIa is a scatterplot of the data that does not display hidden data points.

Figure VIIa. The number of fish species plotted against vegetation index. Hidden data are not displayed. Notice that there are only nine data points on the graph even though sixteen lakes were sampled.

In the figure above it appears that thereís no relationship between the number of fish species and vegetation complexity. There is no indication that lakes with a higher vegetation index have consistently more or fewer fish species than lakes with a lower vegetation index. However, this graph is misleading because it doesnít display hidden data. In fact, regression analysis reveals that there's a significant positive relationship between the two variables (p=0.03, R2=0.29). How can this be? Read on.

Itís important to always check your scatterplot against your raw data for evidence of hidden data. The first (and often easiest) way to identify the presence of hidden data is to count the number of points on the scatterplot. In Figure VIIa there are only nine data points even though sixteen lakes were sampled. Clearly there must be data that are hidden.

Finding Hidden Data

These data have been "double-sorted." Note that when sorted, it's easy to identify and count data with identical x and y coordinates (identical data shown in bold with underlines).

Figure VIIb. Scatterplot showing the same data that are graphed in Figure VIIa except that hidden data are shown by numbers in parentheses. Now the scatterplot shows a positive relationship between number of fish species and vegetation complexity as also indicated by regression analysis (p=0.03, R2=0.29).

Clearly hidden data can affect your interpretation of your graphs! Be sure to check for this problem - fortunately, in many data sets it's not an issue.

©2014 EcologyAndEvolution.Org, All rights reserved.