Statistics is a branch of applied mathematics that focuses on:
• Theory and methods of collecting and interpreting quantitative data;
• Probabilistic description of random processes and systems, with corresponding predictive algorithms (this domain is typically referred to as probability theory).
These two domains are tightly interrelated and share similar mathematical models. At the core of mathematical statistics are several fundamental metrics, defined as follows:
Statistical average, or arithmetic mean, of a quantified (numeric) data set is the sum of its elements divided by the element count; its counterpart in probability theory is the expected value. In symbolic form: Avr{x1, x2, x3, … , xn} = (x1 + x2 + x3 + … + xn)/n, where n is the total number of elements in the data set. A numeric example of the statistical average computation follows:
Avr(6, 3, 2, 7) = (6+3+2+7)/4 = 18/4 = 4.5
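The average formula translates almost word for word into code. The following is a minimal Python sketch; the function name `average` is an illustrative choice, not part of any library, and it simply applies "sum divided by count" to a list of numbers.

```python
def average(values):
    """Arithmetic mean: sum of the elements divided by the element count."""
    if not values:
        # The formula divides by n, so an empty data set has no average.
        raise ValueError("average of an empty data set is undefined")
    return sum(values) / len(values)


print(average([6, 3, 2, 7]))  # 4.5
```

Python's standard `statistics.mean` function computes the same value and can serve as a cross-check.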
Statistical median is the value separating the lower half of a numeric data set from the upper half. For a data set with an odd number of elements, the median is simply the middle number of the sorted data set, which splits it into two equal groups. For a data set with an even number of elements, the median is defined as the average of the “central pair” of numbers in the sorted data set, which likewise splits the whole set into two equal groups. Numeric examples follow:
Med(2, 5, 3) = 3
Med(2, 5, 7, 3) = (3+5)/2 = 4
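The odd/even rule for the median can be sketched in a few lines of Python. The name `median` here is a hypothetical helper written for illustration (the standard library's `statistics.median` follows the same rule): sort first, then take either the middle element or the mean of the central pair.

```python
def median(values):
    """Middle value of the sorted data; mean of the central pair if the count is even."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]  # odd count: the single middle element
    return (s[mid - 1] + s[mid]) / 2  # even count: average of the central pair


print(median([2, 5, 3]))     # 3
print(median([2, 5, 7, 3]))  # 4.0
```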
Variance of a numeric data set is the average of the squared deviations of each number from the average value of the whole data set: Var{x1, x2, x3, … , xn} = ((x1-Avr)^2 + (x2-Avr)^2 + (x3-Avr)^2 + … + (xn-Avr)^2)/n, where Avr stands for the average value of the data set as defined above and n is the total number of elements in the data set.
Standard deviation of a numeric data set is the square root of its variance.
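Here is a minimal Python sketch of the variance formula (dividing by n) and of the standard deviation as its square root. The helper names `variance` and `std_dev` are illustrative assumptions. For the data set {6, 3, 2, 7} the average is 4.5, the squared deviations are 2.25, 2.25, 6.25 and 6.25, and their mean is 17/4 = 4.25.

```python
import math


def variance(values):
    """Population variance: mean of squared deviations from the average (divides by n)."""
    n = len(values)
    if n == 0:
        raise ValueError("variance of an empty data set is undefined")
    avr = sum(values) / n
    return sum((x - avr) ** 2 for x in values) / n


def std_dev(values):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(values))


data = [6, 3, 2, 7]
print(variance(data))  # 4.25
print(std_dev(data))   # square root of 4.25
```

The standard library offers the same calculations as `statistics.pvariance` and `statistics.pstdev`.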
Quick summary: standard deviation and variance both serve as measures of statistical dispersion in a numeric data set. In practical variance computation, the so-called unbiased estimator (sample variance) is often used instead; it replaces the element count n with (n-1) in the formula shown above and is the appropriate choice when the data set is a sample drawn from a larger population.
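The effect of replacing n with (n-1) shows up clearly on a small data set. This Python sketch defines an illustrative `sample_variance` helper (a hypothetical name) and compares it with the standard library's `statistics.variance` (divides by n-1) and `statistics.pvariance` (divides by n).

```python
import math
import statistics


def sample_variance(values):
    """Unbiased (sample) variance: squared deviations divided by n - 1 instead of n."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    avr = sum(values) / n
    return sum((x - avr) ** 2 for x in values) / (n - 1)


data = [6, 3, 2, 7]
print(sample_variance(data))             # 17/3, about 5.667
print(statistics.variance(data))         # stdlib sample variance, same value
print(statistics.pvariance(data))        # stdlib population variance (divides by n): 4.25
print(math.sqrt(sample_variance(data)))  # sample standard deviation
```

The difference between the two versions shrinks as n grows, since (n-1)/n approaches 1.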
Quantiles are the points that divide a sorted numeric data set into groups containing equal (or nearly equal) numbers of elements:
- The 2-quantile is called the median
- The 3-quantiles are called tertiles
- The 4-quantiles are called quartiles
- The 5-quantiles are called quintiles
- The 10-quantiles are called deciles
- The 100-quantiles are called percentiles
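Python's standard library can compute q-quantiles directly with `statistics.quantiles` (Python 3.8 and later), where `n` is the number of groups and the function returns the n-1 cut points. The sketch below assumes the `method="inclusive"` convention (linear interpolation between data points); several quantile conventions exist and they can give slightly different cut points on small data sets.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for the 3 cut points that split the sorted data into 4 groups (quartiles).
quartiles = statistics.quantiles(data, n=4, method="inclusive")
print(quartiles)  # [3.0, 5.0, 7.0]

# The single 2-quantile cut point coincides with the median.
print(statistics.quantiles(data, n=2, method="inclusive"))  # [5.0]
print(statistics.median(data))                               # 5

# Deciles: 10 groups, hence 9 cut points.
deciles = statistics.quantiles(data, n=10, method="inclusive")
print(len(deciles))  # 9
```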
Statistical calculations on large numeric data sets can be rather time- and labor-intensive if done with the “paper-and-pencil” tools of the trade (it’s “so XIX century!”). A better alternative is to use a commercial statistics calculator (hardware, which costs some money), a spreadsheet application with statistical functions such as Microsoft Excel™ (software, also with a price tag attached), or a totally free online statistics calculator. Just enter the numbers separated by commas (the list may contain any integers, fractions, mixed numbers and decimals), then press Enter to get the sorted list and all major statistics (see the sample image).
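The same “enter comma-separated numbers, get the sorted list and statistics” workflow can also be reproduced locally. The Python sketch below is a hypothetical stand-in, not the online calculator’s code: the helper names `parse_number` and `summarize`, the mixed-number syntax “1 1/2”, and the particular set of reported statistics are all assumptions. It uses `fractions.Fraction` so that fractions and decimals are handled exactly.

```python
import statistics
from fractions import Fraction


def parse_number(token):
    """Parse an integer, decimal, fraction ("3/4") or mixed number ("1 1/2")."""
    parts = token.split()
    if len(parts) == 2:  # mixed number: whole part followed by a fraction
        whole, frac = Fraction(parts[0]), Fraction(parts[1])
        return whole - frac if whole < 0 else whole + frac
    return Fraction(token)  # Fraction accepts "6", "2.5", "3/4", "-2"


def summarize(text):
    """Parse comma-separated numbers; return the sorted list plus basic statistics."""
    values = sorted(parse_number(t) for t in text.split(",") if t.strip())
    return {
        "sorted": values,
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "pvariance": statistics.pvariance(values),
        "pstdev": statistics.pstdev(values),  # returned as a float
    }


result = summarize("6, 3/4, 1 1/2, 2.5, -2")
print("sorted:", ", ".join(str(v) for v in result["sorted"]))
for key in ("count", "mean", "median", "pvariance", "pstdev"):
    print(f"{key}: {result[key]}")
```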
Copyright © 2009 Alexander Bell