Category: analysis
A frequency distribution (or simply, ‘distribution’) is a descriptive statistical tool for summarizing large sets of data so that they give an accurate picture. It shows how many items fall into each of a set of classes, presenting raw data in a more usable form.
As a rule of thumb, a frequency distribution should have no fewer than 6 and no more than 15 classes. Choose classes that accommodate all the data, with each item fitting into exactly one class, and make the classes as equal in width as possible.
- A numerical or quantitative distribution sorts or groups data according to numerical size.
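The class rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming equal-width classes computed from the data's minimum and maximum; the function name `frequency_distribution` and the sample scores are hypothetical. Classes are half-open, with the maximum value placed in the last class, so each item goes into exactly one class.

```python
def frequency_distribution(data, num_classes):
    """Group numeric data into equal-width classes and count the items in each.

    Each class is half-open [lower, upper), except that the maximum value is
    placed in the last class, so every item lands in exactly one class.
    """
    if not data:
        raise ValueError("data must not be empty")
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_classes or 1.0  # guard against zero width when all values are equal
    counts = [0] * num_classes
    for x in data:
        index = int((x - lo) / width)
        counts[min(index, num_classes - 1)] += 1  # clamp the maximum into the last class
    classes = [(lo + i * width, lo + (i + 1) * width) for i in range(num_classes)]
    return list(zip(classes, counts))


# Hypothetical test scores grouped into 6 classes of width 10
scores = [40, 45, 52, 58, 61, 63, 67, 70, 72, 75, 78, 81, 84, 88, 90, 95, 100]
for (lower, upper), count in frequency_distribution(scores, 6):
    print(f"{lower:5.1f} to {upper:5.1f}: {count}")
```

Six classes sits at the low end of the suggested 6 to 15 range; with more data, a larger `num_classes` would usually be chosen.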
Quartiles divide ordered data so that 25% of the data falls below the first quartile (Q1), 50% falls below the second quartile (Q2), and 75% falls below the third quartile (Q3). To establish Q1, count up 25 percent of the items starting at the bottom of the distribution. Q2 is actually the median.
For more information, visit: http://mathworld.wolfram.com/Quartile.html
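Quartiles can be computed with Python's standard `statistics.quantiles` function (Python 3.8+). Several quartile conventions exist; this sketch relies on the function's default `'exclusive'` method, so other tools may report slightly different Q1 and Q3 for the same data. The wrapper name `quartiles` is illustrative.

```python
import statistics


def quartiles(data):
    """Return (Q1, Q2, Q3): the cut points below which 25%, 50% and 75% of the data fall."""
    q1, q2, q3 = statistics.quantiles(data, n=4)  # default method='exclusive'
    return q1, q2, q3


data = [1, 2, 3, 4, 5, 6, 7, 8]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)                     # 2.25 4.5 6.75
print(q2 == statistics.median(data))  # True: Q2 is the median
```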
Percentiles divide ordered data into 100 equal parts: 1 percent of the data falls below the first percentile (P1), 1 percent falls between P1 and P2, and so on, with the top 1 percent above P99.
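The same standard-library function can produce percentiles by asking for 100 parts, which yields 99 cut points. This is a sketch under the same caveat as for quartiles (the default `'exclusive'` interpolation method); the helper names `percentiles` and `share_below` are hypothetical. It checks that about 1 percent of the data falls below P1 and about 1 percent sits at or above P99.

```python
import statistics


def percentiles(data):
    """Return the 99 cut points P1..P99 that split the ordered data into 100 equal parts."""
    return statistics.quantiles(data, n=100)


def share_below(data, cut):
    """Fraction of items strictly below a cut point."""
    return sum(1 for x in data if x < cut) / len(data)


data = list(range(1, 1001))  # 1,000 evenly spaced values
cuts = percentiles(data)
print(len(cuts))                    # 99
print(share_below(data, cuts[0]))   # 0.01: 1 percent of the data is below P1
print(share_below(data, cuts[98]))  # 0.99: so 1 percent is at or above P99
```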
The median is the middle number in an ordered sequence of numbers. It is used to avoid the problems that occur when an extreme value has a pronounced effect on the mean. Formally, it is the value of the middle item, or the mean of the values of the two middle items, when the data are arranged in increasing or decreasing order of magnitude.
If there is an odd number of items, there is always a single middle item whose value serves as the median. If there is an even number of items, the median is the mean of the values of the two middle items.
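The odd and even cases translate directly into code. A minimal sketch; the function name `median` is illustrative, and Python's own `statistics.median` does the same job.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values when the count is even."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: a single middle item
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle items


print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```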
The median is not easily affected by extreme values, and it can describe the middle of a set of objects, properties, or qualities that can be ranked but do not permit a quantitative description.
In problems of inference (estimation, prediction, etc.), the mean is usually more reliable than the median, because the median is subject to greater chance fluctuations from sample to sample than the mean.
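The claim that the median fluctuates more from sample to sample can be checked with a small simulation. This sketch assumes normally distributed data, a case where the mean is known to be the more stable estimator; the sample size, number of trials, seed, and the function name `sampling_spread` are arbitrary, illustrative choices.

```python
import random
import statistics


def sampling_spread(sample_size=25, trials=2000, seed=1):
    """Standard deviations of sample means and sample medians over repeated samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]  # draws from a standard normal
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.stdev(means), statistics.stdev(medians)


sd_mean, sd_median = sampling_spread()
print(f"spread of sample means:   {sd_mean:.3f}")
print(f"spread of sample medians: {sd_median:.3f}")
```

For normal data the medians should show noticeably more spread than the means; for heavy-tailed data the comparison can go the other way.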
The median is also referred to as the 'midpoint.'
For more information, visit this excellent site: http://mathworld.wolfram.com/Median.html
Mean and median do not always coincide. The median divides the data so that half of the items are less than or equal to the median, while the values of the other half are greater than or equal to the median.
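The weighted mean is used in situations where a simple average would be misleading because some items matter more than others. It gives each item a weight reflecting its relative importance: multiply each value by its weight, add the products, and divide by the sum of the weights.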
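A minimal sketch of the weighted mean. The function name and the grading scenario (a final exam that counts twice as much as each quiz) are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


scores = [70, 80, 90]  # quiz 1, quiz 2, final exam
weights = [1, 1, 2]    # the final exam counts double
print(weighted_mean(scores, weights))  # 82.5
print(sum(scores) / len(scores))       # 80.0 (unweighted average, for comparison)
```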
The average is the sum of all items divided by the number of items, n (the sample size). In statistics the term is often used interchangeably with ‘arithmetic mean’ or ‘mean.’
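The definition as a one-line Python function, as a minimal sketch (the name `average` is illustrative; `statistics.mean` and `statistics.fmean` are the standard-library equivalents):

```python
def average(values):
    """Arithmetic mean: the sum of all items divided by n, the number of items."""
    if not values:
        raise ValueError("average of empty data is undefined")
    return sum(values) / len(values)


print(average([4, 8, 15, 16, 23, 42]))  # 18.0
```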
The advantages of the average are that it can be calculated for any set of numerical data, it is always unique, and it takes every individual item into account.
However, interpret averages with caution: a single extreme item can skew the average and produce a result that is not typical or representative of the data. Consider calculating the weighted mean, and comparing the average with the median, to check that your interpretation of the data is correct.
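A short illustration of how one extreme item skews the average, using hypothetical salary figures. The median, which is not easily affected by extreme values, is shown alongside for comparison; `summarize` is an illustrative helper built on Python's `statistics` module.

```python
import statistics


def summarize(data):
    """Return (average, median) of a list of numbers."""
    return statistics.fmean(data), statistics.median(data)


salaries = [42_000, 45_000, 47_000, 50_000, 52_000]  # hypothetical figures
with_outlier = salaries + [1_000_000]                # one extreme item added

for label, data in (("typical", salaries), ("with outlier", with_outlier)):
    avg, med = summarize(data)
    print(f"{label:>12}: average {avg:,.0f}, median {med:,.0f}")
```

Adding the single extreme salary moves the average from 47,200 to 206,000, while the median only moves from 47,000 to 48,500.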
A histogram is a graphical representation of statistical data in which the class frequencies are plotted on the vertical scale and the measurements or observations (e.g., scores) on the horizontal scale. Histograms differ from bar charts in that bar charts do not display a continuous horizontal scale.
For more information, visit this excellent site: http://mathworld.wolfram.com/Histogram.html
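In practice a plotting library would draw the histogram; as a dependency-free sketch, the function below renders class frequencies as a vertical text histogram. The function name, class labels, and frequencies are hypothetical. Frequencies run up the vertical axis, and adjacent bars touch to reflect the continuous horizontal scale.

```python
def text_histogram(counts, labels):
    """Render class frequencies as a vertical text histogram.

    Frequencies run up the vertical axis and classes sit side by side along the
    horizontal axis, with adjacent bars touching to suggest a continuous scale.
    """
    if len(counts) != len(labels):
        raise ValueError("counts and labels must have the same length")
    width = max(len(label) for label in labels) + 1  # bar width: longest label plus one space
    lines = []
    for level in range(max(counts), 0, -1):  # draw from the tallest bar down
        row = "".join(("#" if count >= level else " ") * width for count in counts)
        lines.append(f"{level:>3} |{row}")
    lines.append("    +" + "-" * (width * len(counts)))  # horizontal axis
    lines.append("     " + "".join(label.center(width) for label in labels))
    return "\n".join(lines)


# Hypothetical class frequencies for test scores
labels = ["40-50", "50-60", "60-70", "70-80", "80-90", "90-100"]
print(text_histogram([2, 2, 3, 4, 3, 3], labels))
```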
A percentage distribution shows the percentage of items that fall into each class, or above or below certain values. To calculate it, divide each class frequency by the total number of items grouped and multiply by 100.
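The calculation in Python, as a minimal sketch with hypothetical class frequencies. A running total of the percentages (via `itertools.accumulate`) gives the share of items in each class or any lower class, which is one way to read off the percentage falling below a given value.

```python
from itertools import accumulate


def percentage_distribution(counts):
    """Percentage of all items in each class: class frequency / total * 100."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot compute percentages of zero items")
    return [100 * count / total for count in counts]


counts = [2, 3, 5, 10]  # hypothetical class frequencies, 20 items in total
percentages = percentage_distribution(counts)
print(percentages)                    # [10.0, 15.0, 25.0, 50.0]
print(list(accumulate(percentages)))  # [10.0, 25.0, 50.0, 100.0]: this class or any lower class
```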
Bias refers to intentional or unintentional distortions in the manner of data collection.
Statistical inference makes generalizations that go beyond the data in order to predict, compare, estimate, or determine a course of action. The resulting statistics, calculated from a sample of the population, are used to estimate unknown values for the whole population (anything from demographic data, to purchase behavior, to psychographic information).
For example, a study to determine the percentage of people in the U.S. who suffer from allergies could not survey every person in the country. Instead, determining the incidence of allergies among a representative sample of U.S. citizens would enable researchers to infer the incidence of allergy sufferers in the U.S. population as a whole.
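A sketch of the allergy example, assuming a simulated population in which the true share of allergy sufferers is known (30% here, a made-up figure) so that the estimate can be compared with it. The function name is hypothetical; the margin of error uses the common normal-approximation formula for a 95% interval and ignores the finite-population correction. A real study would also need a genuinely representative, unbiased sample.

```python
import random


def estimate_proportion(population, sample_size, seed=0):
    """Estimate the share of 1s in a population from a simple random sample.

    Returns the sample proportion and a normal-approximation 95% margin of error.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # random sample without replacement
    p_hat = sum(sample) / sample_size
    margin = 1.96 * (p_hat * (1 - p_hat) / sample_size) ** 0.5
    return p_hat, margin


# Hypothetical population of 100,000 people, 30% of whom have allergies (1 = allergy sufferer)
population = [1] * 30_000 + [0] * 70_000
p_hat, margin = estimate_proportion(population, sample_size=2_000)
print(f"estimated share with allergies: {p_hat:.3f} +/- {margin:.3f}")
```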
Statistical inference is one of the two ways in which data can be analyzed. The other is descriptive statistics.