APPENDIX I: The Language of Statistics

One of the trickiest parts of learning statistics is getting used to the language. The formal terms can make it difficult to follow statistical writing or to understand statisticians when they talk. It is also difficult for a beginner to use the terms correctly when referring to results in a written paper or during a talk.

Part of the reason that the terms are so difficult is that it’s important to be precise. Also, scientists try to be careful not to over-state or over-interpret their results. It’s always possible that new information will become available that can change the interpretation of data. As a result, scientists are reluctant to ever say that something has been proven.

Below is a list of definitions and guidelines that hopefully will help the reader to understand and use the language of statistics. Some of these terms and definitions might not be entirely clear upon first reading. It may be worth returning to this section after reading the results section in a paper from the scientific literature, or before reporting your own results in a paper or talk.

Some Common Terms Used in Reporting Results from Statistical Tests
• null hypothesis – a statement that any observed variability or pattern in the data is caused by random chance. Specific examples are shown in Appendix II. Statistical analyses work by testing specific null hypotheses. If the evidence against the null hypothesis is strong enough (by convention, if p < 0.05), we reject or disprove the null hypothesis. If the evidence is not strong enough (if p ≥ 0.05), we accept the null hypothesis or, more precisely, fail to reject it: the data do not rule out random chance as the cause of the observed variability or pattern.
• accept – to accept a hypothesis is to conclude that, based on the current data, it cannot be ruled out and is likely to be true.
• reject – to reject a hypothesis is to say that it is probably untrue.
• disprove – often the term disprove is used instead of the term reject. It is important to note that many scientists take the position that hypotheses can never be proven (only supported) because it’s always possible that new and better information may come along that will change the interpretation of a given set of data.
• statistically significant or significant – If p < 0.05, the results of the statistical test are said to be statistically significant. Often the term significant is used alone when context makes it clear that significant means statistically significant.
• alternative hypothesis – in contrast to a null hypothesis, an alternative hypothesis provides an explanation (other than random chance) for the observed variability or pattern in the data. Often there are several alternative hypotheses, each focusing on a different independent variable and so providing a different possible explanation for the observed variability or pattern in the data.
• research hypothesis – I have used the term research hypothesis in this manual to emphasize that I am not referring to null hypotheses. I use the term to refer to a scientific explanation proposed by a researcher to explain the observed variability or pattern in the data. A research hypothesis includes an independent and dependent variable (defined in the Introduction to the manual and in the glossary). In the research hypothesis, the independent variable (rather than random chance) is the proposed explanation for the observed variability or pattern in the data.
• support or consistent with – If the results of the statistical test reject the null hypothesis, they may be said "to support" or "to be consistent with" the research hypothesis that is being tested. As stated above, it is important to avoid stating that hypotheses have been proven (there is almost always another possible explanation for the results).
• pattern – whenever the observed variability in a dependent variable seems to be related to an independent variable, there is a pattern in the data. If the variability is entirely random, there is no pattern.
• suggest – often the term "suggest" is used in sentences describing the pattern in data. For example – "The data in the scatterplot suggest that the diversity of fish communities is related to lake acidity." Usually the term "suggest" is more appropriate than the terms "show" or "prove".
• data – The word data is plural. The singular form is datum, a term that is almost never used.
• Type I Error – when a true null hypothesis is rejected. This may lead the researcher to support an alternative hypothesis that is incorrect. When there is a Type I Error, the results of the data analysis show statistical significance even though the null hypothesis is true. The significance threshold determines how often this happens: with a threshold of 0.05, about 5% of tests of a true null hypothesis will be statistically significant by chance alone. Using a lower threshold makes Type I Errors less likely.
• Type II Error – when a false null hypothesis is accepted. In this case the researcher may fail to support an alternative hypothesis that is correct. When there is a Type II Error, the results of the data analysis do not show statistical significance even though the null hypothesis is false. In general, Type II Errors are most common when the p-value is greater than but close to 0.05 (e.g., a p-value of 0.06). They are also much more likely when sample size is very small.
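
Several of the terms above (null hypothesis, statistically significant, Type I Error, Type II Error) can be illustrated with a short simulation. The Python sketch below is only an illustration under stated assumptions: the coin-flipping experiment, the function names (coin_p_value, describe, rejection_rate), the bias of 0.7, and the sample sizes of 20 and 100 flips are all hypothetical choices made for this example and do not come from the manual. The null hypothesis is "the coin is fair", and the p-value is computed with an exact two-sided binomial test using only the standard library.

```python
import math
import random
from functools import lru_cache

ALPHA = 0.05  # significance threshold used throughout this appendix


@lru_cache(maxsize=None)
def coin_p_value(heads, flips):
    """Exact two-sided p-value for the null hypothesis 'the coin is fair'."""
    def prob(k):
        # Probability of exactly k heads in `flips` tosses of a fair coin.
        return math.comb(flips, k) / 2 ** flips

    lower = sum(prob(k) for k in range(0, heads + 1))      # P(X <= heads)
    upper = sum(prob(k) for k in range(heads, flips + 1))  # P(X >= heads)
    return min(1.0, 2 * min(lower, upper))


def describe(p_value, alpha=ALPHA):
    """Translate a p-value into the wording defined in the list above."""
    if p_value < alpha:
        return "significant: reject the null hypothesis"
    return "not significant: accept (fail to reject) the null hypothesis"


def rejection_rate(true_heads_prob, flips, experiments=2000, seed=1):
    """Fraction of simulated experiments in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(experiments):
        heads = sum(rng.random() < true_heads_prob for _ in range(flips))
        if coin_p_value(heads, flips) < ALPHA:
            rejections += 1
    return rejections / experiments


# Null hypothesis true (fair coin): every rejection is a Type I Error.
type_i_rate = rejection_rate(0.5, flips=20)

# Null hypothesis false (assumed bias of 0.7): every non-rejection is a Type II Error.
type_ii_small = 1 - rejection_rate(0.7, flips=20)    # small sample
type_ii_large = 1 - rejection_rate(0.7, flips=100)   # larger sample

print(describe(coin_p_value(15, 20)))
print(describe(coin_p_value(12, 20)))
print("Type I Error rate, fair coin, 20 flips:", type_i_rate)
print("Type II Error rate, biased coin, 20 flips:", type_ii_small)
print("Type II Error rate, biased coin, 100 flips:", type_ii_large)
```

With a fair coin, the share of statistically significant results should stay at or below roughly 0.05, which is what the threshold is meant to guarantee. With the biased coin, the Type II Error rate should be far higher for 20 flips than for 100 flips, which matches the point above that Type II Errors are much more likely when sample size is very small.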