Showing posts with label Variance. Show all posts

Saturday, October 17, 2015

Statistics (Yeah, this stuff looks like a complicated topic)

All right friends, no doubt some of you are wondering why someone would be writing up a blog post about statistics on a Saturday morning (I blame the current state of Saturday morning cartoons), but here we are. Statistics is a big scary word and it has a lot of jargon that goes with it (statistics and jargon at the same time?), but I'll try to break this off into a bite-size snippet so that you can ease your toes into it.

First I need to define some basic terms so that the rest of this makes sense, and hopefully I choose the right words to explain the concepts.

First up

Data Point:  A discrete, individual numeric value taken for the purpose of analysis or reporting.  This can be a score on a test, a person's height, their shoe size, or one of many other factors that can be categorized independently with a numeric value.

Data Set: This is the term I tend to use when I start a statistical analysis. The data set is the grouping of all data points you are using as the basis of your analysis. It is often referred to as a sample, since it is usually a smaller grouping taken out of a larger population.

Sample: A small grouping of discrete data points taken from a larger population for the purpose of analysis. Because the populations we look at are often so large, it is impractical to collect data on the entire population, so we take a small grouping out of it.

Population: The complete group of potential data points that are available to study.

So, to wrap this up,

The population for a given study could be the student body at William Johns Junior Elementary School in Anytown, USA. Anytown has had healthy growth, and the school currently has 1,200 students listed in its files. Dr. Meanswell, the principal, would like to know how attendance is shaping up for the semester, so he decides to gather information on 60 of his students. Those 60 students are the sample he's taken (hopefully they're a well-selected sample, which we will get to on another boring Saturday). He pulls their absences for the month of September. Individually, these absences are data points, but when taken together as a whole, they represent the data set he would use for analysis.


Now for some number concepts that help explain the math behind how this works.


Mean: In statistics (and in most other math settings), this is the number you come to if you add up all of the numbers in a given group and then divide by the number of numbers in the group. Now I realize that was me using the word numbers too many times, so let's try a visual.

    I have the numbers 9, 3, 7, 10, and 12.

    First thing I want to do is put them in order (not strictly needed for the mean, but it helps with the median later).

    So I have 3, 7, 9, 10, and 12 in order.

    Adding them together, I come to the number 41.

    In order to calculate the mean, I need to divide 41 by the count of numbers, in this case 5.

    So 41/5 = 8.2

    Thus, 8.2 is your mean.
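
If you'd rather let a computer do the adding and dividing, here is a minimal Python sketch of the same steps. The variable names (values, total, count, mean) are just my own picks for illustration.

```python
# The five example numbers from the walkthrough above
values = [9, 3, 7, 10, 12]

total = sum(values)    # add everything together
count = len(values)    # how many numbers there are
mean = total / count   # divide the total by the count

print("Total:", total)
print("Count:", count)
print("Mean:", mean)
```

Sorting the list first would not change the result, since addition doesn't care about order.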

You'll often hear the words mean, median and mode tossed around in the same grouping, and they are similar, but for most statistical work, mean is the critical number to find.

Median: The middle value in a data set once the values are put in order. In the example above, the median is 9, as it has the same number of values (two) on either side of it. In a case where you have an even number of entries in your data set, average the two values in the middle to find the median.

Mode: This one is fairly simple: the mode is the most frequently occurring number in the data set. In the example above, there isn't one, because every number occurs exactly once.
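
Here is a rough Python sketch of the median and mode for the same five numbers, using the standard library's statistics.median and collections.Counter. The extra value 14 in the even-length case is a made-up number purely for illustration, and checking "does any value show up more than once" is just one simple way to decide whether a mode exists.

```python
from collections import Counter
from statistics import median

values = [9, 3, 7, 10, 12]

# median() sorts the values for us and picks the middle one
middle = median(values)

# With an even number of entries, the two middle values are averaged.
# The 14 here is a hypothetical extra data point, not part of the example.
even_values = values + [14]
even_middle = median(even_values)

# Count how often each value occurs to look for a mode
counts = Counter(values)
highest_count = max(counts.values())
has_mode = highest_count > 1  # False when every value occurs only once

print("Median:", middle)
print("Median with an even number of entries:", even_middle)
print("Is there a mode?", has_mode)
```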

Once you have calculated the mean, you can work out another pair of related concepts: variance and standard deviation.

Variance: The average of the squared differences between each data point and the mean. You need to square the differences or the negative values start to cancel the positive ones, and then nothing makes any sense.

Standard Deviation: The standard deviation is a measurement tool that tells you how spread out the numbers in any given set are.

So using our dear friend the example above

     3 - 8.2 = -5.2    Squared = 27.04
     7 - 8.2 = -1.2    Squared =  1.44
     9 - 8.2 =  0.8    Squared =  0.64
    10 - 8.2 =  1.8    Squared =  3.24
    12 - 8.2 =  3.8    Squared = 14.44

Squared differences calculated, now for some mathemagic: add them up and divide by the number of data points

    (27.04 + 1.44 + 0.64 + 3.24 + 14.44) / 5

    Simplified: 46.8 / 5

Variance is 9.36

Standard Deviation is fairly easy to calculate, as it is the square root of the variance, so

Standard Deviation is √9.36

Standard Deviation ≈ 3.06
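
For anyone who wants to check the arithmetic, here is a small Python sketch that follows the same steps by hand and then compares against the standard library. Dividing by the count, as done above, matches what Python's statistics module calls the population variance (pvariance) and population standard deviation (pstdev). The module's variance and stdev functions divide by one less than the count instead, so they would give different answers. The variable names are my own.

```python
from math import sqrt
from statistics import pstdev, pvariance

values = [3, 7, 9, 10, 12]

# Step 1: the mean
mean = sum(values) / len(values)

# Step 2: the squared difference between each value and the mean
squared_diffs = [(x - mean) ** 2 for x in values]

# Step 3: average the squared differences to get the variance
variance = sum(squared_diffs) / len(values)

# Step 4: the standard deviation is the square root of the variance
std_dev = sqrt(variance)

print("Variance:", round(variance, 2))
print("Standard deviation:", round(std_dev, 2))

# Cross-check against the standard library's population versions
print("pvariance:", round(pvariance(values), 2))
print("pstdev:", round(pstdev(values), 2))
```

The rounding is only there because floating point arithmetic can leave tiny leftovers in the last decimal places.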

So now we have our Mean, Median, Mode, Variance, and Standard Deviation. Those are numbers though, and numbers can be hard to explain. Next time we'll look at graphical representations of statistics, and take a look at the Bell Curve and the Normal Distribution.