Glossary of Statistical Terms and Equations
Census: a measurement of every member of a population, rather than of a sample drawn from it
Central tendency: the center or typical value of a dataset
Correlation coefficient: a number between −1 and +1 calculated to quantify the strength and direction of the relationship between two variables or sets of data. Use Pearson's correlation coefficient for linear relationships and Spearman's rank correlation coefficient for monotonic relationships (meaning that as one variable increases, the other consistently increases or consistently decreases, whether along a line or a curve).
Data set: a collection of related observations or measurements
Descriptive statistic: a value calculated from a data set to summarize the characteristics of that data set
Degrees of freedom (df): the number of values in the final calculation of a statistic that are free to vary. For example, once the sample mean has been calculated, only n − 1 of the values can vary freely, so df = n − 1 for the sample standard deviation.
Dispersion: a measure of how spread out the values in a data set are, that is, the variability within the data set
Inferential statistic: a calculation that uses data from a sample to make predictions or generalizations about a larger population.
Mean (x̄, pronounced "x-bar"): the average, also called the arithmetic mean; the sum of all the values divided by the number of values.
Nonparametric statistics: statistical calculations that do not assume the data follow a particular distribution and are often used for categorical data or continuous data that is not normally distributed. Outliers have less of an effect on the results of nonparametric statistics. Examples are the Kruskal-Wallis test and Spearman's rank correlation.
Normal distribution: a symmetrical distribution of data around its mean value, with most values near the mean and fewer toward either tail. A histogram of normally distributed data has a "bell-shaped curve" when graphed.
Outlier: a data point that differs significantly from other observations in a dataset, lying far outside the typical pattern or distribution of the data
Parametric statistics: statistical calculations that assume the data follow a specific distribution, most often the normal distribution. Parametric tests are often used for continuous data and, when their assumptions are met, are more likely than nonparametric tests to detect an effect if one exists. However, outliers can significantly affect the results of parametric statistics. Examples are the t-test and ANOVA.
Population: all of the actual or possible objects/individuals in a group. N is the total number of members in the population.
Sample: a smaller, more manageable subset of a population, chosen to represent the full population. n is the number of members in the sample.
Standard deviation (s): a descriptive statistic that measures the amount of variation of the values of a sampled variable about its mean. Describes variability within a single sample.
Standard error (SE): an inferential statistic that estimates how much the sample mean would vary across multiple samples of the same size from a population. Also known as standard error of the mean (SEM). Calculated as SE = s/√n.
Skew: a measure of the asymmetry of a data distribution. If the distribution of data for a variable stretches further toward the right or left tail of a frequency distribution than toward the other, then the distribution is characterized as skewed. A normal distribution is symmetrical and has no skew.
95% Confidence Interval (95%CI): a range of values calculated from a sample and constructed so that, if the sampling were repeated many times, about 95% of the intervals calculated this way would contain the true population mean.
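Several of the terms above can be illustrated with a short calculation. The following is a minimal sketch in Python, not a definitive method: the data values and the function names (summarize, pearson, ranks, spearman) are made up for illustration, the critical t value of 2.262 is taken from a standard t table for df = 9, and the simple ranking helper assumes there are no tied values.

```python
import math
import statistics


def summarize(sample, t_crit):
    """Return n, df, mean, s, SE and a 95% CI for a list of numbers.

    t_crit is the two-tailed critical t value for df = n - 1
    (assumption: looked up by the caller in a t table).
    """
    n = len(sample)
    mean = statistics.mean(sample)      # sum of values / number of values
    s = statistics.stdev(sample)        # sample standard deviation (divides by n - 1)
    se = s / math.sqrt(n)               # standard error of the mean
    ci95 = (mean - t_crit * se, mean + t_crit * se)
    return {"n": n, "df": n - 1, "mean": mean, "s": s, "se": se, "ci95": ci95}


def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical sample of n = 10 measurements.
sample = [4, 8, 6, 5, 3, 7, 9, 5, 6, 7]
result = summarize(sample, t_crit=2.262)   # 2.262: t table value for df = 9
print(result)

# Hypothetical paired data with a curved but monotonic relationship.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print("Pearson:", pearson(x, y))
print("Spearman:", spearman(x, y))
```

In the paired example, y always increases as x increases, but along a curve rather than a line, so Spearman's coefficient reaches +1 while Pearson's falls somewhat below it.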